Chris D’Agostino, global principal technologist at Databricks, explores what's holding data teams back from realising the full potential of their data
Many successful data-driven companies are constantly adapting, innovating and future-proofing their technology investments to remain competitive. In fact, according to recent research, the most innovative companies of 2020 have been recovering from the pandemic faster than others, and many of these are more advanced on their data journey. However, despite the progress that is being made, a recent Databricks and MIT Technology Review Insights report shows that just 13% of organisations are excelling at delivering on their data strategy, while the rest are struggling to some extent.
Many of the struggling organisations face similar challenges: fragmented architectures, data silos and ageing legacy systems. Data teams have a lot to tackle at once while also navigating new technologies and keeping up with the sheer pace of change in data and analytics. Worryingly, many appear unaware of the full potential of their data, in part because teams are unsure of their top priorities and of how exactly to deliver a future-proofed data strategy. So why are data teams struggling, and what can be done to help?
Databricks and MIT’s research revealed a number of areas that organisations believe need addressing as a priority, including improving data quality, training and hiring for the right skills, and democratising data. The most frequently cited priority – named by 48% of the chief data officers who responded to the global survey – is achieving better data management by improving data quality and processing, so that people across an organisation can find the information they need to excel. The second most frequently cited priority is increasing the adoption of cloud platforms, which is exactly what the excelling 13% have done: almost three-quarters of this group run at least half of their data services in a cloud environment. For them, the flexibility of the cloud brings important benefits such as reduced cost, more efficient collaboration and scalability.
In a cloud environment, organisations can apply machine learning more reliably thanks to faster access to, and processing of, their data. However, many organisations are struggling to keep up with the pace of innovation in the marketplace, which shows in a lack of machine learning expertise. Training for and hiring the right skills are key to closing this gap and keeping up with the pace of change. Organisations also need to empower their people to run analytics on their own, rather than consuming analytics produced by someone else. Data democratisation must become an ongoing, core focus: if only a select few people, such as the data team, are working with the data, it is being used at a small fraction of its potential. Making data accessible regardless of technical knowledge, by contrast, helps teams make the most of it – the easier data is to access, the more people will use it and realise its full potential. Taking active steps toward a data culture is crucial and, for that to happen, the data platform and the data within it must be accessible, understood and trusted.
To overcome these obstacles, organisations need a modern data analytics platform that is open, flexible and empowers teams to make faster, better-informed decisions with a cohesive view of all their data.
Data warehouses have been a popular option since the 1980s and revolutionised the data world we live in, enabling business intelligence tools to be plugged in to ask questions about the past. Looking for future insights is more difficult with a warehouse, however, and there are restrictions on the volume and formats of data that can be analysed. Data lakes, on the other hand, allow all data to be stored, cleaned and analysed, and enable artificial intelligence (AI) to be used to ask questions about future scenarios. Their weakness is that, without careful management, they quickly become disorganised ‘data swamps’.

Taking the best of both options, a new data architecture is emerging. Lakehouses are a technological breakthrough that finally lets businesses look to future scenarios and back to the past in the same place, at the same time, revolutionising the future of data capabilities. It is the solution enterprises have been calling out for over the last decade at least: by combining the best elements of the data warehouse and the data lake, the lakehouse enables enterprises to implement a superior data strategy, achieve better data management and squeeze the full potential out of their data.
Another important focus area that cannot be overlooked is the simplicity of data platforms. They need to be easy to use, with architecture flexible enough to meet changing business needs without redesigning, restructuring or extensive data knowledge. There should also be an element of openness within the data platform: open platforms allow other vendors to join and innovate on top of what is already available. Because open platforms can be freely extended and modified, openness and innovation go hand in hand – something very much needed at this time.
The pace of innovation and change in data and analytics is accelerating rapidly – enterprises and their data teams must keep up if they are to capture the vast opportunities data presents and remain competitive. Through open, simple data platforms with more accessible, easy-to-understand data, underpinned by a successful data culture, more and more organisations can excel at delivering on their data strategy and realise the full potential of their data.