The Data Daily

Impulse Data Warehousing and OLAP Solution Outperforms Google BigQuery by 3x


We ran a comprehensive benchmark of Accure’s Impulse data warehousing and OLAP solution and compared it against the price-performance of Google BigQuery (GBQ). Averaged across all queries, run on both platforms against the same dataset (described below), Impulse was 3x faster than GBQ. For the same query performance, the Impulse cluster costs about 10% as much as GBQ.

Method: Star Schema Benchmark (SSB), a dataset and query set that has been used to evaluate data warehouse performance since 2007.

Average query performance: 6.5 seconds on Impulse and 20 seconds on BigQuery

We compared the performance of the Impulse data warehouse against BigQuery as follows.

Google BigQuery offers two pricing options: on-demand and flat-rate. For this benchmark we used BigQuery’s most cost-effective option, the flat-rate price of $1,700 per month for 100 reserved slots with a one-year commitment.
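
As a rough aid for reading the cost tables, the flat-rate arithmetic can be sketched as below. This is a minimal sketch that assumes slots are bought only in whole 100-slot batches and that every batch costs the $1,700/month one-year-commitment rate quoted above; the function names are hypothetical and not part of any Google API.

```python
# Flat-rate figures quoted in this article; linear per-batch pricing is an assumption.
SLOTS_PER_BATCH = 100
MONTHLY_PRICE_PER_BATCH_USD = 1_700


def flat_rate_monthly_cost(slots: int) -> int:
    """Monthly cost in USD for a reservation of `slots` slots.

    Assumes slots are purchased only in whole 100-slot batches, each at
    the $1,700/month one-year-commitment rate.
    """
    if slots <= 0 or slots % SLOTS_PER_BATCH != 0:
        raise ValueError("slots must be a positive multiple of 100")
    batches = slots // SLOTS_PER_BATCH
    return batches * MONTHLY_PRICE_PER_BATCH_USD


def committed_yearly_cost(slots: int) -> int:
    """Total spend over the twelve months of the one-year commitment."""
    return 12 * flat_rate_monthly_cost(slots)


if __name__ == "__main__":
    print(flat_rate_monthly_cost(100))  # 1700
    print(committed_yearly_cost(100))   # 20400
```

Under this model a one-year commitment to the 100-slot batch used here amounts to $20,400 in total.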

Impulse was installed on Amazon EC2 instances running Ubuntu 20.04. The cluster configuration, machine types, and monthly costs are described below.

We used a free, publicly available tool, https://github.com/lemire/StarSchemaBenchmark, to generate an SSB-compliant dataset.

The datagen tool created the following tables:

The data files are pipe-delimited (“|”) and follow the schema shown in Figure 1 below.
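
To illustrate what reading such a file involves, here is a minimal sketch using Python’s standard `csv` module. It assumes the generated files end each line with a trailing “|” (as dbgen-style generators commonly do) and that the LINEORDER columns appear in the order listed in the SSB paper; both are assumptions to check against your own generated files. The sample row is made up for illustration.

```python
import csv
import io
from typing import Dict, Iterator, List

# LINEORDER columns in the order given in the SSB paper (assumed file order).
LINEORDER_COLUMNS = [
    "lo_orderkey", "lo_linenumber", "lo_custkey", "lo_partkey", "lo_suppkey",
    "lo_orderdate", "lo_orderpriority", "lo_shippriority", "lo_quantity",
    "lo_extendedprice", "lo_ordtotalprice", "lo_discount", "lo_revenue",
    "lo_supplycost", "lo_tax", "lo_commitdate", "lo_shipmode",
]


def read_pipe_delimited(text: str, columns: List[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of pipe-delimited text.

    A trailing "|" makes csv report one extra empty field, which is dropped.
    Quote characters are treated literally (QUOTE_NONE).
    """
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        if fields[-1] == "":
            fields = fields[:-1]  # drop the empty field after the trailing "|"
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        yield dict(zip(columns, fields))


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = "1|1|7381|155190|828|19960102|5-LOW|0|17|2116823|17366547|4|2032150|74711|2|19960212|TRUCK|\n"
    for row in read_pipe_delimited(sample, LINEORDER_COLUMNS):
        print(row["lo_orderdate"], row["lo_shipmode"])  # 19960102 TRUCK
```

All values are kept as strings here; a real ingester would also cast each column to its schema type before writing Parquet.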

The pipe-delimited data was ingested with Impulse’s “Delimited File” ingester, with Parquet selected as the output format. Impulse stores all data on an HDFS cluster. Impulse was configured to load the data into the data warehouse, which used HDFS as its deep storage system. The data was partitioned by month.
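
The article does not say which column drives the monthly partitioning or how partitions are laid out on HDFS. As a sketch only, the snippet below assumes partitioning on `lo_orderdate` (an SSB date key in YYYYMMDD form) and uses hypothetical Hive-style directory names such as `month=1996-01`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def month_partition(date_key: str) -> str:
    """Map a YYYYMMDD date key (e.g. "19960102") to a hypothetical
    Hive-style partition name (e.g. "month=1996-01")."""
    if len(date_key) != 8 or not date_key.isdigit():
        raise ValueError(f"not a YYYYMMDD date key: {date_key!r}")
    return f"month={date_key[:4]}-{date_key[4:6]}"


def partition_by_month(
    rows: Iterable[Dict[str, str]], date_field: str = "lo_orderdate"
) -> Dict[str, List[Dict[str, str]]]:
    """Group row dicts by the month of `date_field` (assumed partition column)."""
    parts: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        parts[month_partition(row[date_field])].append(row)
    return dict(parts)


if __name__ == "__main__":
    rows = [{"lo_orderdate": "19960102"}, {"lo_orderdate": "19960215"}]
    print(sorted(partition_by_month(rows)))  # ['month=1996-01', 'month=1996-02']
```

Monthly partitions let a query that filters on a date range skip whole directories, which is the usual motivation for this layout.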

The Parquet data was manually copied to a Google Cloud Storage bucket, from which it was read and loaded into BigQuery tables.
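
The article does not say which tool performed the BigQuery load. One common route is the `bq load` command from the Google Cloud SDK with `--source_format=PARQUET`. The sketch below only builds that command, so it runs without credentials. The dataset, table, and bucket names are hypothetical.

```python
import shlex
from typing import List


def bq_load_parquet_command(table: str, gcs_uri: str) -> List[str]:
    """Build a `bq load` argv that loads Parquet files from Cloud Storage.

    Parquet files carry their own schema, so no schema argument is passed.
    """
    if not gcs_uri.startswith("gs://"):
        raise ValueError("source must be a gs:// URI")
    return ["bq", "load", "--source_format=PARQUET", table, gcs_uri]


if __name__ == "__main__":
    # Hypothetical dataset/table and bucket names.
    cmd = bq_load_parquet_command("ssb.lineorder", "gs://example-bucket/ssb/lineorder/*.parquet")
    print(shlex.join(cmd))
    # To execute it: subprocess.run(cmd, check=True)  (needs the Cloud SDK and credentials)
```

Passing an argument list rather than a single shell string keeps the wildcard URI from being expanded by a local shell.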

Impulse consists of several components for end-to-end machine learning, enterprise automation, big data analytics, and visualization. For this benchmark we installed only the components needed for the exercise, which also kept the AWS server cost down. We used the following AWS machine types to create the cluster:

Table 1 shows the average query response times when the Star Schema queries were executed on Impulse and BigQuery. Query IDs match those in the original SSB paper, https://www.cs.umb.edu/~poneil/StarSchemaB.pdf. On average, the total time taken by all 13 queries was 6.5 seconds on Impulse compared to 20 seconds on BigQuery. In other words, Impulse was on average about 3 times faster than BigQuery.
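
The "3 times faster" figure is simply the ratio of the two totals. A minimal sketch of that arithmetic, using the 20 s and 6.5 s figures quoted above (the function name is hypothetical):

```python
def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Totals quoted for Table 1: 20 s on BigQuery vs 6.5 s on Impulse.
    print(f"{speedup(20.0, 6.5):.2f}x")  # 3.08x
```

A ratio of totals gives more weight to the longest-running queries. Computing a per-query speedup for each of the 13 queries and taking their geometric mean is a common alternative when every query should count equally.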

We also ran concurrency tests to compare the cost of an Impulse cluster with the cost of BigQuery. A concurrency of 2 means that two users access the same dataset at the same time. In our experiment we ran the test for a single user and extrapolated the results to up to 64 concurrent query executions on a 24x7 basis.
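
The article does not describe its extrapolation model. The sketch below assumes the simplest one: capacity grows linearly with the number of concurrent users running the query set back-to-back around the clock, rounded up to whole purchasable units such as 100-slot batches or data nodes. The doubling concurrency levels and the `0.25` single-user figure in the usage line are hypothetical.

```python
import math
from typing import List


def concurrency_levels(max_users: int = 64) -> List[int]:
    """Doubling concurrency levels 1, 2, 4, ..., up to max_users (assumed)."""
    levels, users = [], 1
    while users <= max_users:
        levels.append(users)
        users *= 2
    return levels


def units_needed(single_user_units: float, users: int) -> int:
    """Linear extrapolation from a single-user measurement.

    Assumes capacity scales linearly with concurrent users running 24x7,
    rounded up to whole units (e.g. 100-slot batches or data nodes).
    """
    if single_user_units <= 0 or users < 1:
        raise ValueError("need a positive single-user figure and at least one user")
    return math.ceil(single_user_units * users)


if __name__ == "__main__":
    for users in concurrency_levels():
        print(users, units_needed(0.25, users))  # 0.25 is a hypothetical figure
```

Linear scaling ignores caching and contention effects, so measured results at higher concurrency could differ from this projection in either direction.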

Table 2 below shows the price comparison of the two platforms. We used the Google Cloud Console to record slot utilization time and the number of slots. For pricing, we took BigQuery’s most cost-effective offer: $1,700 per month per 100-slot batch with a one-year commitment.

On average, each Impulse data node costs $449.28 per month on on-demand AWS instances. For long-term committed usage, however, we used a rounded price of $200 per month.
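
The per-node figures can be checked with simple arithmetic. This sketch assumes a 720-hour (30-day) billing month, under which the $449.28 on-demand figure works out to exactly $0.624 per hour. AWS’s own pricing calculator commonly assumes 730 hours per month, so treat the hourly rate as illustrative. The function names are hypothetical.

```python
HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def hourly_rate(monthly_cost: float, hours: int = HOURS_PER_MONTH) -> float:
    """Hourly rate implied by a monthly cost."""
    return monthly_cost / hours


def committed_savings(on_demand_monthly: float, committed_monthly: float) -> float:
    """Fractional saving of the committed price relative to on-demand."""
    return 1 - committed_monthly / on_demand_monthly


if __name__ == "__main__":
    print(f"${hourly_rate(449.28):.3f}/hour")                  # $0.624/hour
    print(f"{committed_savings(449.28, 200):.0%} saving")      # 55% saving
```

Under these assumptions the rounded $200 committed price is roughly 55% below the on-demand price per node.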

Table 2 and Figure 3 indicate that the Impulse server cost is about 10% of the BigQuery monthly cost. For example, at 64 concurrent users the monthly cost is $85,000 on BigQuery compared with $9,400 on Impulse.
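
The 64-user figures can be cross-checked against the unit prices given earlier. This sketch divides the quoted monthly totals by those prices. The implied node count holds only if the Impulse total consists entirely of $200 data nodes, which the article does not state.

```python
BQ_BATCH_PRICE = 1_700  # $/month per 100-slot batch (flat rate, one-year commitment)
NODE_PRICE = 200        # $/month per Impulse data node (rounded, long-term commitment)


def cost_ratio(impulse_monthly: float, bigquery_monthly: float) -> float:
    """Impulse monthly cost as a fraction of BigQuery monthly cost."""
    return impulse_monthly / bigquery_monthly


if __name__ == "__main__":
    bq_64, impulse_64 = 85_000, 9_400  # 64-concurrent-user figures quoted above
    print(f"{cost_ratio(impulse_64, bq_64):.1%}")  # 11.1%
    print(bq_64 / BQ_BATCH_PRICE)                  # 50.0 (100-slot batches)
    print(impulse_64 / NODE_PRICE)                 # 47.0 (only if all cost is data nodes)
```

The exact ratio at 64 users is about 11%, consistent with the "about 10%" summary.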

Table 2: BigQuery flat-rate monthly cost (one-year commitment, 100-slot batches) and Impulse server monthly cost (three-year committed usage on AWS)

Figure 3: Price-performance comparison of running the Star Schema Benchmark queries at different concurrency levels on BigQuery and Impulse.

Impulse Data Warehouse is built on technology that provides scale, speed, and consistent query response times under concurrent load. Like BigQuery, it uses a column-based storage system.

While BigQuery is a fully cloud-based platform with shared resources, Impulse can be deployed on any Linux-based hardware, whether on-premises, in a private data center, or on cloud virtual machines.

Accure provides software platforms for data engineers, scientists, analysts, and automation engineers to efficiently solve machine learning problems and automate business processes.

Accure provides products and professional services to prototype, build, deploy and scale enterprise AI.

Accure engineered Impulse to accelerate all phases of AI development. Accure’s professional services help connect all the pieces to build sustainable solutions, so that customers can focus on deriving value from their AI implementation.

To test-drive the Momentum data warehousing solution, contact momentum@accure.ai or sign up for a sandbox environment at https://one.accure.ai:8444.
