The Data Daily

How To Increase Price/Performance With Lakehouse and Xeon Processors

The Databricks Lakehouse Platform unifies the openness, scalability and flexibility of data lakes with the reliability, governance and performance of data warehouses. In this blog, we look at performance using Databricks Photon, which applies the latest techniques in vectorized query processing, together with the latest Intel 3rd Gen Xeon Scalable processors, which include Intel® Advanced Vector Extensions 512 (Intel® AVX-512).

Before we dive into the numbers and the price/performance improvements, let’s take a moment to consider why these improvements matter. As the volume of your data grows, and delivering insights and making decisions quickly becomes a competitive advantage, the need to process your data quickly grows even faster. While optimizing and refactoring queries or code can help speed up workloads, analysts should focus on functional intent and business questions rather than query optimization. How do you ensure that results improve over time?

When you choose the Databricks Lakehouse Platform, you are choosing a platform that, together with our partners, consistently delivers improvements that give our customers the best value.

To examine these benefits in action, we ran a test derived from the industry-standard TPC-DS power test. We examined the results before and after enabling Photon, and then after switching to the latest Intel 3rd Gen Xeon Scalable processors:

Photon is the native vectorized query engine on Databricks, written to be directly compatible with Apache Spark APIs so it works with your existing code. When you enable Photon, your existing code and queries can take advantage of the latest techniques in vectorized query processing to capitalize on data-level and instruction-level parallelism in CPUs. This allows Photon customers to get a lower TCO and faster SLAs for ETL and interactive queries.

Intel 3rd Gen Xeon Scalable processors include Intel’s latest-generation Single Instruction, Multiple Data (SIMD) instruction set, Intel® AVX-512, which boosts performance and throughput for the most demanding computational tasks, such as data analytics and machine learning.

For the baseline, we used Azure’s E8ds_v3 virtual machines, which have Intel 1st Gen Xeon Scalable processors, and Databricks Runtime (DBR) 10.3 without Photon enabled. We ran the TPC-DS benchmarks during March 2022 at both 1TB and 10TB scales on 20-worker clusters.

We then ran the same workload, without any code changes, on the same machines with Photon enabled.

That alone yielded a 1.9x price-performance increase and a 3.4x performance speedup compared to the baseline.

Unleashing the full potential with Photon and Intel 3rd Gen Xeon Scalable processors

We then ran the same workload again, still without any code changes, this time on Azure’s E8ds_v5 virtual machines, which have Intel 3rd Gen Xeon Scalable processors, with Photon enabled.

That’s a 3x price-performance increase and a 6.7x performance speedup compared to our baseline.
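The post does not publish the cluster configuration it used, so the following is only a rough sketch of what a Photon-enabled cluster specification for this kind of run might look like. The field names follow the Databricks Clusters API as we understand it (`runtime_engine`, `spark_version`, `node_type_id`, `num_workers`); the cluster name, the runtime version string and the Azure node type ID are assumptions chosen to match the setup described above, not values taken from the benchmark.

```python
import json


def photon_cluster_spec(node_type_id: str, num_workers: int = 20) -> dict:
    """Build a cluster spec dict with Photon enabled.

    All concrete values are illustrative assumptions, not the
    configuration used for the benchmark in this post.
    """
    return {
        "cluster_name": "tpcds-photon-sketch",   # hypothetical name
        "spark_version": "10.3.x-scala2.12",     # assumed string for DBR 10.3
        "runtime_engine": "PHOTON",              # "STANDARD" would disable Photon
        "node_type_id": node_type_id,            # assumed Azure VM size ID
        "num_workers": num_workers,              # the post mentions 20 workers
    }


# Assumed node type ID for the 3rd Gen Xeon run described above.
spec = photon_cluster_spec("Standard_E8ds_v5")
print(json.dumps(spec, indent=2))
```

The point of the sketch is that switching engine and hardware is a configuration change only: the workload code itself is not part of the specification, which is consistent with the "no code changes" claim above.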

Putting it all together

By enabling Databricks Photon and using Intel’s 3rd Gen Xeon Scalable processors, without making any code modifications, we were able to cut the cost of our TPC-DS benchmark at 10TB by ⅔ and run it 6.7 times faster. This translates not only to cost savings but also to reduced time-to-insight.
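To see how the speedup and price/performance figures relate, here is a small worked calculation. It assumes that the cost of a run is its runtime multiplied by the cluster's hourly price, and that price/performance is the baseline cost divided by the new cost. Those definitions are our assumption (they are consistent with "3x price/performance" meaning "⅔ of the costs saved"), and the implied hourly price ratio below is derived from the published figures, not stated in the post.

```python
def cost_savings(price_performance: float) -> float:
    """Fraction of the baseline cost saved, given a price/performance gain.

    Assumes price/performance = baseline_cost / new_cost.
    """
    return 1 - 1 / price_performance


def implied_hourly_price_ratio(speedup: float, price_performance: float) -> float:
    """Hourly cluster price relative to baseline implied by the two figures.

    Assumes cost = runtime * hourly_price, so
    price_performance = speedup / hourly_price_ratio.
    """
    return speedup / price_performance


# Figures reported in the post, relative to the non-Photon baseline.
runs = {
    "Photon, 1st Gen Xeon": {"speedup": 3.4, "price_performance": 1.9},
    "Photon, 3rd Gen Xeon": {"speedup": 6.7, "price_performance": 3.0},
}

for name, r in runs.items():
    saved = cost_savings(r["price_performance"])
    ratio = implied_hourly_price_ratio(r["speedup"], r["price_performance"])
    print(f"{name}: {saved:.0%} cost saved, implied hourly price ratio {ratio:.2f}x")
```

Under these assumptions, a cluster that costs more per hour can still be cheaper per workload as long as the speedup exceeds the increase in hourly price, which is what both configurations show.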

3.0x price/performance benefit and 6.7x speedup, compared to the same TPC-DS 10TB benchmark on Intel 1st Gen Xeon processors with DBR 10.3 and without Photon enabled.

Derived from the power test, consisting of all 99 TPC-DS queries run in sequential order within a single stream.

The results shown are not comparable to an official, audited TPC benchmark.
