Recently, TDWI Research conducted a survey of analytics and data professionals. The intent was to review some of the new directions in data architecture, most notably the convergence of data lakes and data warehouses.
89% of survey respondents see this integration as an opportunity to manage a more diverse range of use cases and data structures. More than half said the top priority is to gain greater business value from their data. Other value drivers include unifying data silos and reducing data storage and costs.
Data warehouses have been around for a while, originally developed for BI queries and reporting using structured data. More recently, they’ve evolved to support NoSQL databases, Hadoop, advanced analytics and deployment in the cloud.
Data lakes are designed to store and process massive volumes of semi-structured and unstructured data including text, images and video, all of which are needed for today’s advanced analytics, AI, machine learning and data science initiatives. Lakes can be run on public, private and hybrid cloud architectures.
Machine learning and unstructured data don’t fit as well with the ordered way information is stored in a data warehouse architecture. As such, it may appear as though data lakes have a leg up in a modern analytics environment.
Yet data warehouses still play an ongoing role in areas like operational reporting and structured queries. 58% of survey respondents said they have a data warehouse on-premises, while 36% deploy a warehouse in the cloud.
Meanwhile, data lakes can become unmanaged, complicated environments (data swamps) and may have issues related to data privacy and compliance. 53% of survey participants agreed that data lakes need better data curation, governance and query optimization.
Data warehouses and data lakes each have their strengths and their limitations. The logical progression is to take the best of both and put them together.
“One of the hallmarks of the unified architecture is its ability to support a wider range of data structures, end user types and business use cases than either of its constituent micro-architectures,” say the researchers.
Survey respondents cited strategies they’re using today to converge capabilities. One example is using a data lake to analyze IoT data and feeding the results into a warehouse for use in dashboards. Another is creating a sandbox environment that combines data from a warehouse and a lake to perform multi-structured analysis.
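To make the first strategy concrete, here is a minimal sketch of the lake-to-warehouse pattern. It is an illustration only, not something taken from the report: the sample records, the device_temperature_summary table and the function names are all hypothetical, a list of JSON strings stands in for the data lake, and an in-memory SQLite database stands in for the warehouse.

```python
import json
import sqlite3
from collections import defaultdict
from statistics import mean

# Hypothetical raw IoT readings as they might sit in a data lake (JSON lines).
RAW_LAKE_RECORDS = [
    '{"device_id": "sensor-a", "temp_c": 21.0}',
    '{"device_id": "sensor-a", "temp_c": 23.0}',
    '{"device_id": "sensor-b", "temp_c": 30.0}',
    '{"device_id": "sensor-b"}',  # semi-structured: this record has no reading
]


def summarize_lake_records(lines):
    """Parse semi-structured records and compute the average reading per device."""
    readings = defaultdict(list)
    for line in lines:
        record = json.loads(line)
        if "temp_c" in record:  # skip records that carry no temperature
            readings[record["device_id"]].append(record["temp_c"])
    return {device: mean(values) for device, values in readings.items()}


def load_into_warehouse(conn, summary):
    """Write the structured summary into a warehouse-style table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS device_temperature_summary ("
        "device_id TEXT PRIMARY KEY, avg_temp_c REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO device_temperature_summary VALUES (?, ?)",
        sorted(summary.items()),
    )
    conn.commit()


def dashboard_query(conn):
    """The kind of structured query a dashboard might run against the table."""
    return conn.execute(
        "SELECT device_id, avg_temp_c FROM device_temperature_summary "
        "ORDER BY device_id"
    ).fetchall()


# SQLite in memory stands in for the warehouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
load_into_warehouse(conn, summarize_lake_records(RAW_LAKE_RECORDS))
rows = dashboard_query(conn)
```

In a real deployment the lake would more likely be object storage and the warehouse a dedicated analytics database, but the division of labor is the same: flexible parsing and analysis on the lake side, a fixed schema and simple queries on the warehouse side.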
Going forward, the core trend is for these platforms to integrate and interoperate architecturally – sharing common storage, workload management, data virtualization, semantic layer, queries, transaction processing, security and more.
Researchers note that more vendors are offering converged solutions, whether as a single platform or separate services that can be integrated as needed. Some no longer distinguish between the two technologies.
Organizations are also using a number of tools and disciplines to support convergence. These include data catalogs, master data management, metadata management, and model and data governance.
Researchers underline the importance of using centralized, modern data pipelines. Pipelines need to be automated and comprehensive, providing end-to-end processes from data ingestion all the way to analysis. They should deliver data at high speed and provide real-time data integration to keep information fresh.
The Next Level in Data Analytics
Data warehouses and data lakes provide an impressive range of capabilities and use cases. By unifying both technologies – along with the right supporting tools and data pipelines – organizations can expect to take their modern analytics to a whole new level.
To read more, download the TDWI Research report Building the Unified Data Warehouse and Data Lake here.