
The Data Daily

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform - Cloudera Blog


We are excited to announce the general availability of Apache Iceberg in Cloudera Data Platform (CDP). Iceberg is a 100% open table format, developed through the Apache Software Foundation, that helps users avoid vendor lock-in. Today’s general availability announcement covers Iceberg running within key data services in CDP, including Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML). These services let analysts and data scientists easily collaborate on the same data with their choice of tools and analytic engines. Companies get the benefits of Iceberg as part of CDP with zero additional effort. There is no more lock-in, no unnecessary data transformation, and no data movement across tools and clouds just to extract insights from the data.

As the first hybrid data platform to offer an open data lakehouse, CDP enables multi-function analytics at petabyte scale on both streaming and stored data. That data can live in cloud-native object stores across multiple clouds and on premises. This gives our customers the freedom to choose their preferred analytic tools. With Cloudera’s vision of hybrid data, enterprises adopting an open data lakehouse get application interoperability and portability to and from on-premises environments and any public cloud, without worrying about data scaling. With Shared Data Experience (SDX), which has been built into CDP from the beginning, customers benefit from a common metadata, security, and governance model across all their data.

At Cloudera, we are unambiguous about our commitment to openness and interoperability. This commitment has driven our many significant contributions to communities like Apache Hive, Apache Spark, Apache NiFi, Apache Impala, Apache YuniKorn, and many more. In February 2022, we introduced Apache Iceberg as a technical preview within CDP.

Over the past decade, Cloudera has enabled multi-function analytics on data lakes through the introduction of the Hive table format and Hive ACID. The lakehouse pattern has since moved to the cloud. However, it remains driven by table formats that are tied to primary engines, and often to single vendors. Companies, on the other hand, continue to demand highly scalable and flexible analytic engines and services on the data lake, without vendor lock-in. Organizations want modern data architectures that evolve at the speed of their business, and we are happy to support them with the first open data lakehouse.

Apache Iceberg, now included as part of CDP, brings significant benefits to a modern data architecture, including:

We integrate Iceberg directly into CDP’s SDX layer, so customers can easily adopt Iceberg and get the productivity and performance benefits of the open table format out of the box. Customers can convert existing tables with a metadata-only migration in a single command, without touching any of the underlying large data sets. This greatly accelerates adoption.
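To make "metadata-only migration" concrete, here is a toy Python model. It is not Cloudera's or Iceberg's implementation. `migrate_in_place`, `TableMetadata`, and `Snapshot` are hypothetical names, and Iceberg's real metadata (manifest lists, manifests, and schemas) is collapsed into a single snapshot that lists file paths. The model registers the table's existing data files in new table metadata and never copies, rewrites, or moves them. For comparison, Apache Iceberg's Spark integration offers a `migrate` stored procedure (`CALL catalog.system.migrate('db.table')`) for in-place migration. The exact single command used in CDP may differ.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # paths of existing files, referenced in place


@dataclass
class TableMetadata:
    name: str
    snapshots: list = field(default_factory=list)


def migrate_in_place(name: str, data_dir: str) -> TableMetadata:
    """Hypothetical metadata-only migration: register existing files, copy nothing."""
    files = tuple(sorted(
        os.path.join(data_dir, f)
        for f in os.listdir(data_dir)
        if f.endswith(".parquet")
    ))
    # Only new metadata is created; the data files themselves are only listed.
    return TableMetadata(name, [Snapshot(snapshot_id=1, data_files=files)])


# Two placeholder "data files" stand in for a large existing Hive table.
data_dir = tempfile.mkdtemp()
for part in ("part-0.parquet", "part-1.parquet"):
    with open(os.path.join(data_dir, part), "wb") as fh:
        fh.write(b"placeholder bytes")

before = {p: os.stat(os.path.join(data_dir, p)).st_mtime_ns for p in os.listdir(data_dir)}
meta = migrate_in_place("sales", data_dir)
after = {p: os.stat(os.path.join(data_dir, p)).st_mtime_ns for p in os.listdir(data_dir)}

print(len(meta.snapshots[0].data_files))  # 2
print(before == after)  # True: no data file was rewritten, moved, or added
```

The cost of such a migration scales with the number of files listed, not with the volume of data inside them. That is why it can be fast even for very large tables.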

The data lakehouse is not new to Cloudera or our customers. For example, IQVIA uses Cloudera to bring together more than two petabytes of data from 250 data warehouses worldwide, spanning Oracle, IBM Netezza, and Teradata systems. That data lands in a global, multi-tenant data lake on which IQVIA runs its analytics. IQVIA has been using the Hive open table format and Cloudera’s pre-integrated, multi-function analytics platform for more than five years. But the current data lakehouse architectural pattern is not enough. We see that companies need a platform across the full data lifecycle that can deliver multiple advanced analytics use cases, complete with data-in-motion and operational database offerings. This is the open data lakehouse, which only Cloudera can offer in a hybrid data platform.

With Apache Iceberg in CDP, Cloudera goes beyond the data lakehouse with an open ecosystem of data and community, combined with enterprise hardening and performance. Our technical preview customers have shared the following feedback:

“After evaluating all the major open-source storage frameworks to build our lakehouse, we chose Apache Iceberg because it’s 100% open and has strong community engagement. Now with Iceberg, CDP supports an open data lakehouse architecture that future-proofs our data platform for all our analytical workloads. We selected change data capture as our first use case on Iceberg. With frequent updates to our data lake, we aim to accelerate reporting and business intelligence, giving our business teams access to current insights. Partition evolution is also a critical capability for us, guaranteeing superior query performance for large-scale data engineering and BI workloads.”

“Modak’s partnership with Cloudera enables us to assist our customers in deploying a lakehouse architecture that unifies all their data while providing common security and governance for any analytic use case: AI, machine learning, SQL, business intelligence reports, dashboards, and more. By certifying Modak Nabu with Cloudera’s CDP Iceberg table format, enterprise customers can accelerate data ingestion, curation, and consumption at petabyte scale for any data, resulting in simplified data management and faster data access,” says Daniel Mantovani, head of innovation at Modak Analytics.

Customers have used partition evolution in CDP to move to finer-grained partitions on their data, gaining more than 10x better query performance. They did this without regenerating or modifying any of the underlying data.
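Partition evolution works without rewriting data because Iceberg tracks which partition spec each data file was written with. A new spec applies only to new writes, and old files keep their original layout. The toy Python model below sketches that idea under simplifying assumptions. The `ToyTable`, `PartitionSpec`, and `DataFile` classes are hypothetical, only month and day granularities are modeled, and scan planning compares exact partition values instead of evaluating arbitrary filters. In Spark with Iceberg's SQL extensions, the real operation is an `ALTER TABLE ... ADD PARTITION FIELD` (or `REPLACE PARTITION FIELD`) statement.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PartitionSpec:
    spec_id: int
    granularity: str  # hypothetical simplification: "month" or "day"

    def partition_value(self, d: date) -> tuple:
        # The coarse spec groups rows by month; the finer spec groups them by day.
        if self.granularity == "month":
            return (d.year, d.month)
        return (d.year, d.month, d.day)


@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int  # spec the file was written with; never changes
    partition: tuple


class ToyTable:
    def __init__(self, spec: PartitionSpec):
        self.specs = {spec.spec_id: spec}
        self.current_spec_id = spec.spec_id
        self.files = []

    def evolve_partitioning(self, granularity: str) -> None:
        # Metadata-only change: register a new spec; existing files are untouched.
        new_id = max(self.specs) + 1
        self.specs[new_id] = PartitionSpec(new_id, granularity)
        self.current_spec_id = new_id

    def write(self, path: str, d: date) -> None:
        # New data always uses the current (latest) spec.
        spec = self.specs[self.current_spec_id]
        self.files.append(DataFile(path, spec.spec_id, spec.partition_value(d)))

    def plan_scan(self, d: date) -> list:
        # Prune each file using the spec it was originally written with.
        return [
            f.path
            for f in self.files
            if f.partition == self.specs[f.spec_id].partition_value(d)
        ]


table = ToyTable(PartitionSpec(0, "month"))
table.write("2022-05.parquet", date(2022, 5, 10))
table.write("2022-06.parquet", date(2022, 6, 1))
table.evolve_partitioning("day")  # switch to finer-grained partitions
table.write("2022-06-14.parquet", date(2022, 6, 14))
table.write("2022-06-15.parquet", date(2022, 6, 15))

print(table.plan_scan(date(2022, 6, 15)))
# ['2022-06.parquet', '2022-06-15.parquet']
```

The file written before the change is still found through the month-level spec. Newer files are pruned at day granularity, so a query for one day skips the other days' files. Nothing was rewritten to make this possible.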

Our integration of Apache Iceberg supercharges CDP’s capabilities beyond the data lakehouse. We can handle any data anywhere, in hybrid and multi-cloud. We work where your data is born, where it lands, and where it’s used.  

Try Cloudera Data Warehouse (CDW), Cloudera Data Engineering (CDE), and Cloudera Machine Learning (CML) by signing up for a 60-day trial, or test drive CDP. If you are interested in chatting about Apache Iceberg in CDP, let your account team know. As always, please provide your feedback in the comments section below.

Thank you to all Cloudera contributors for this article: Navita Sood, Peter Vary, Zoltan Borok-Nagy, Imran Rashid, Justin Hayes, Priyank Patel
