Gartner Reprint

Critical Capabilities for Data Management Solutions for Analytics
Published: 16 March 2018 ID: G00328243
Analyst(s):
Rick Greenwald, Adam M. Ronthal, Roxane Edjlali
Summary
Data management solutions for analytics are continuing to improve, with key capabilities consolidating across all vendors. Real-time data analytics has moved to its own distinct use case. Data and analytics leaders can use this research to guide their evaluation of modern DMSA offerings.
Overview
Key Findings
More than 70% of organizations surveyed have heterogeneous data management solutions for analytics (DMSA) environments that contain multiple products.
Cloud-based DMSAs continue to grow, with the estimated sales of the leading cloud-only DBMS vendors accounting for more than $2 billion of a $34 billion market.
Just over half of our surveyed organizations use either a cloud-based solution or a hybrid cloud and on-premises solution.
Hadoop-based vendors are maturing, but are still not approaching the robust capabilities across multiple use cases offered by more-traditional vendors.
Most solution vendors were rated as "meeting requirements" across all four of our use cases. The majority of vendors were ranked very closely in their capabilities for each use case.
Recommendations
For data and analytics leaders tasked with modernizing information infrastructure:
Use cloud-based solutions as options for new DMSA use cases. Existing DMSA use cases may also benefit from hybrid solutions, which can ease the migration path and deliver greater flexibility.
Provide rapid access to raw data for use in building analytic models and performing exploratory analysis by exploiting data lakes.
Implement the logical data warehouse (LDW) through the use of semantic interfaces to unify access to heterogeneous data sources as well as the multimodel capabilities of DMSA offerings.
Use robust use-case examples with production-level workloads in proofs of concept to ensure that a new DMSA offering will be able to meet your actual production needs.
What You Need to Know
Organizations are increasingly using their DMSAs to access data from multiple external sources, as well as combining multiple data types and models together in the same DMSA. These requirements are placing new demands on DMSA offerings in this market as customers look for features and functions that represent a significant augmentation of existing enterprise data warehouse capabilities.
Moreover, expectations are now turning to the cloud as an alternative deployment option because of its elasticity and operational pricing models. As combining cloud-based and on-premises elements into a hybrid configuration becomes the norm, organizations expect vendors to support them with such deployments.
In addition to the increased number of data repositories and models in use, newer use cases (such as using DMSA products for real-time analysis of data) are moving from niche to mainstream positions within many organizations. Although the initial projects may be somewhat limited in scope and depth, these new, rapidly incoming data sources are becoming a part of the overall DMSA landscape.
For this market, a data management solution for analytics is defined as a complete software system that supports and manages data in one or many file management systems (most commonly a database or multiple databases). It includes specific optimization strategies designed to support analytical processing, including, but not limited to, relational processing, nonrelational processing (such as graph processing), and machine learning or programming languages (such as Python or R).
Data is not necessarily stored in a relational structure and can use multiple models (relational, document, key value, text, graph, geospatial and others).
Our definitions also state that a DMSA:
Is a system for storing, accessing, processing and delivering data intended for one or more of the four primary use cases that Gartner identifies as supporting analytics.
Is not limited to a single specific class or type of database management system (DBMS).
May consist of many different data management technologies in combination; however, any offering or combination of offerings must, at its core, exhibit the ability to provide access to the data under management by open-access tools via commonly used APIs.
Must include mechanisms to isolate workload requirements and control various parameters of end-user access within managed instances of data.
Must manage the storage and access of data residing in a type of storage medium, which may include (but is not limited to) hard-disk drives, flash memory, solid-state drives and DRAM.
This Critical Capabilities research is aimed at data and analytics leaders. We have focused on the 12 most important functional (critical) capabilities that are required to support the four major use cases identified. The research combines analysis of product functions and customer experience to evaluate the support offered by each vendor's product for these critical capabilities. User experience is evaluated based on the companion Magic Quadrant reference survey, 1 Gartner inquiries, peer insights, in-depth reference calls and interactions with vendors. In addition to customer experience, capability ratings include Gartner analysis of differentiating product capabilities as described in the capability definitions.
Gartner took into account both the documented capabilities of the products, and the results of the user surveys on the actual adoption of these capabilities. The survey results were given a greater weight than the stated capabilities, as the ultimate proof of use is with the end users themselves. Consequently, the results as reflected in this Critical Capabilities research should be seen as somewhat lagging — especially for emerging use cases — as organizations need time to implement newer functionality into their environments.
Although this research shares survey results gathered for the 2018 "Magic Quadrant for Data Management Solutions for Analytics," the critical capabilities ratings are focused on how well a specific vendor product addresses each of four use cases. They do not offer an overall estimation of the vendor. In addition, the critical capabilities focus on a single product from each vendor, unlike the Magic Quadrant, where all relevant products or services were considered. Additional products that supported the core functionality of the main product were also considered in this body of research, while similar offerings were not.
Survey customers were initially identified by vendors, and a subset of those customers completed the survey. This methodology was employed to provide a level playing field for vendors of different sizes and orientations. The actual composition of the customer base of a vendor may differ in its overall use of a particular solution or capability. The survey helps to determine if a vendor's user base has widely adopted a capability for production use.
Companies see analytics as a path to new business and greater business opportunities, and DMSA offerings are a key part of this drive. According to our Magic Quadrant survey, the biggest reason for purchase was to drive innovation, and 80% of purchases were determined by the features and functions offered by the DMSA.
As stated above, an increased emphasis was placed on data points and trends collected from the survey this year. As customer adoption of new features and technologies is not immediate, the newer use cases (such as real-time data warehouse) have a bias toward incumbent solutions. These are frequently the default choice for new use cases as a market approaches maturity. New solutions are more likely to have new advanced capabilities, but the early adopters willing to accept the associated risk are a smaller part of the market. The main part of the market is more risk-averse and more likely to use incumbent products that are not yet able to fully implement these new capabilities.
This research does not include all of the criteria that should be investigated before selecting a particular DMSA vendor. Many other criteria not included in our analysis will come into play, such as whether the offering is a stand-alone DBMS software package, appliance or cloud solution. Other requirements (pricing, vertical industry offerings and the availability of services, for example) are not included but would need to be part of a formal RFP process (see "Toolkit: RFP Template for Data Warehouse and Data Management Solutions for Analytics"). These aspects do factor into the evaluations for the Magic Quadrant for this market space.
Our research covers the ability of various vendors to provide certain capabilities that are critical to one or more of the use cases described. Be aware that the vendors represented in this Critical Capabilities report range from best-fit solutions to best-of-breed solutions.
Additionally, capabilities are chosen based, in part, on their ability to differentiate between solutions. As vendors approach parity on a capability, the range of scores for that capability narrows and the capability moves toward a status of "basic requirement," which may lead to it being dropped in future Critical Capabilities documents.
Readers should understand that our scores are meant to convey a vendor's standing in relationship to the market at the time the data was finalized. As such, scores for any capability are not absolute from year to year, but relative and only relevant within the context of this specific yearly report.
As detailed below, a score of 3.0 indicates that a product met the requirements for a particular use case. Although vendors are listed in the order of their relevant ranking (and alphabetically in the case of an equivalent score), be aware of the meaning of the individual ratings. Furthermore, Gartner does not recommend using any rating as the sole or primary basis for product selection, as there are many factors outside the scope of this research that can impact the suitability of a product.
In some cases, the overall range of these scores may shift from year to year. These changes are the result of both changing market conditions and refinements in the calculations used to evaluate these capabilities. In this year's research, there have been changes to criteria for evaluating support of external data sources and the replacement of a criterion for repeated queries with a criterion for query optimization.
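The report refers to capabilities carrying a "weighted importance" per use case and to 3.0 as the "meets requirements" level, but it does not give the formula in this section. As a minimal sketch, assuming a use-case score is a weighted average of capability ratings, the arithmetic might look like the following. The ratings and weights are invented for illustration and are not Gartner's figures; only the capability names and the 3.0 threshold come from the text.

```python
from typing import Dict

# Threshold stated in the report: 3.0 means the product met requirements.
MEETS_REQUIREMENTS = 3.0


def use_case_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of capability ratings for one use case.

    Assumption: scores combine as a weighted average; the report itself
    only mentions that capabilities are weighted per use case.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(ratings[cap] * w for cap, w in weights.items()) / total_weight


# Hypothetical ratings for one product (values are made up).
ratings = {
    "data_ingest": 2.5,
    "workload_management": 3.5,
    "advanced_analytics": 4.0,
}

# Hypothetical weights for one use case (values are made up).
weights = {
    "data_ingest": 0.5,
    "workload_management": 0.25,
    "advanced_analytics": 0.25,
}

score = use_case_score(ratings, weights)
meets = score >= MEETS_REQUIREMENTS
print(f"score={score:.3f}, meets requirements: {meets}")
```

Under this assumption, a product can clear the 3.0 threshold for a use case even when one capability rates below 3.0, provided the more heavily weighted capabilities compensate. Changing the weights to reflect a different use case would change the score without any change to the underlying ratings.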
Analysis
Figure 1. Vendors' Product Scores for the Traditional Data Warehouse Use Case
Source: Gartner (March 2018)
Figure 2. Vendors' Product Scores for the Real-Time Data Warehouse Use Case
Source: Gartner (March 2018)
Figure 3. Vendors' Product Scores for the Logical Data Warehouse Use Case
Source: Gartner (March 2018)
Figure 4. Vendors' Product Scores for the Context-Independent Data Warehouse Use Case
Source: Gartner (March 2018)
Vendors
Actian
Actian, which is headquartered in Palo Alto, California, U.S., offers the Actian Vector analytics database and Actian Vector in Hadoop for analytical workloads, and Actian X for combined operational and analytical processing. The Actian Vector analytics platform can also be deployed on AWS and Microsoft Azure with a bring-your-own-license model.
References are most satisfied with the query performance of the Actian Vector analytics platform. This is also demonstrated by the scores for performance optimization for both exploratory and traditional uses. Many of these references use the Actian Vector analytics platform on small data volumes but with highly interactive query workloads. Analytical workloads are performed over a variety of data types that are loaded into the platform. This combination of capabilities makes the Actian Vector analytics platform rate above average for the context-independent data warehouse, logical data warehouse and real-time data warehouse use cases.
Alibaba Cloud (MaxCompute)
Alibaba Cloud is a global cloud computing company headquartered in Hangzhou, China. It offers a wide variety of services including:
ApsaraDB for RDS (relational database service) for MySQL and SQL Server
HybridDB for PostgreSQL (based on the open-source Greenplum Database)
Analytic DB for OLAP analysis
MaxCompute for large data warehouse implementations
E-MapReduce for Hadoop
Apsara Stack (an on-premises private cloud implementation)
Reference customers of Alibaba combine a number of services including Alibaba object storage and MaxCompute but also HybridDB, E-MapReduce or Analytic DB. Alibaba Cloud met the requirements for all four use cases. Over 60% of Alibaba's survey respondents used the company's products for the traditional data warehouse use case. Survey results showed Alibaba delivering above-average support for both traditional and exploratory data warehouse users.
Amazon Web Services (Amazon Redshift)
Amazon Web Services (AWS) offers Amazon Redshift, a data warehouse service in the cloud. AWS is a wholly owned subsidiary of Amazon. This document focuses on a single product from each vendor, while Amazon believes in a best-fit approach, with customers using multiple services together. Amazon has other services (such as DynamoDB, Athena and the Spectrum capability of Redshift) that can also be used for some DMSA scenarios, although Spectrum is still a rather recent introduction to the AWS product portfolio. The ratings in this Critical Capabilities document center on Redshift.
Amazon Redshift's position for the three use cases repeated from last year (traditional data warehouse, logical data warehouse and context-independent data warehouse) remained largely unchanged. Its absolute scores, however, rose across the board, indicating a product that is growing in maturity. The vendor's most notable scoring gains were in the logical data warehouse and context-independent data warehouse use cases. The former increase was due in part to the addition of Spectrum, which allows data in S3 to be accessed in the same SQL statement as data in Redshift.
Amazon continues to embrace a best-fit philosophy, which is different from that of the classic DMSA leaders. With the addition of Glue (a tool that can be used for a broad data catalog) and Spectrum, AWS is starting to give users the ability to access data in multiple formats. The overall revenue from AWS database offerings rose dramatically in the past year, which suggests some degree of market acceptance of the AWS philosophy. This is backed up by the fact that 100% of AWS customers surveyed indicated that product functionality and performance were a factor in their product selection.
Reference customers also scored AWS near the top in terms of value for the money spent and pricing and contract flexibility, which is not surprising based on its spot pricing and other features. Survey respondents also indicated that AWS had one of the fastest times for deployment. This result is partly due to the efficiency of cloud-based deployment, but can also be seen as an indication of the lower complexity of projects being deployed.
Cloudera (Cloudera Enterprise)
Cloudera, which is based in Palo Alto, California, offers Cloudera Enterprise, a platform that includes Cloudera Analytic DB (based on Apache Impala and Hadoop), and Cloudera Data Science & Engineering (based on Hadoop and Spark).
Cloudera also offers additional components such as:
Cloudera Navigator (for data governance).
Cloudera Manager and Cloudera Director (for cluster administration on-premises and in the cloud).
Cloudera Impala (for SQL access).
Kudu (for analytics on transactional data).
Cloudera's platform is available both on-premises and across multiple cloud environments (such as AWS, Microsoft Azure and Google Cloud Platform). It offers cloud-native support for object stores, with Cloudera Altus.
In this year's research, Cloudera achieved a score of 3.0 (meets expectations) across all four use cases. Cloudera's best ranking was in the context-independent data warehouse use case. Cloudera's relative ranking in each of the use cases slipped a bit from last year, which could be seen as a general reassessment of the suitability of Hadoop-based solutions for traditional and logical data warehouse use cases.
GBase (GBase 8a)
GBase, which is based in Beijing, China, offers a number of packages from its QingCloud app center, including:
GBase 8a (a relational massively parallel processing [MPP] data warehousing platform).
GBase Infinidata 8a (a data warehouse appliance).
GBase UP (a logical data warehouse platform supporting data virtualization between GBase 8a, Hadoop and other platforms).
GBase cloud DB
GBase 8a met requirements for all four defined use cases, with the strongest showing in the traditional data warehouse use case. Interestingly, the four capabilities in which GBase 8a scored highest (administration and management, flexible scalability, variety of data types, and exploratory support) are spread across all four use cases in terms of weighted importance. This led to scores that, while they met requirements for all use cases, did not excel in any one use case enough to push GBase 8a into the top half of the rankings.
However, survey respondents gave GBase top scores for traditional user support, and placed it in the top quartile for workload management — both of which are highly weighted categories for the traditional data warehouse use case. GBase is a strong choice for traditional workloads, and as new product capabilities evolve with GBase UP, we anticipate improvements in support for other use cases as well.
Google (BigQuery)
Google, based in Mountain View, California, is a wholly owned subsidiary of the Alphabet holding company. Google Cloud is the part of Google that is focused on delivering solutions and services to the business market.
Google's dbPaaS offerings for DMSA in the Google Cloud Platform include:
BigQuery (a managed data warehouse offering).
Bigtable (a nonrelational wide-column DBMS).
Cloud Dataproc (a managed Spark and Hadoop service).
Cloud Dataflow and Cloud Pub/Sub (both focused on real-time stream and batch data processing).
Google's BigQuery product is specifically designed to address the needs of the DMSA market. BigQuery met the requirements for all four use cases, ranking in the top 10 across the board, with particularly strong showings in the real-time data warehouse and context-independent data warehouse use cases. BigQuery received high scores for flexible scalability, exploratory use-case support and optimization, administration and management, and managing large volumes of data. Lower scores in access to multiple data sources and workload management capabilities (the latter scoring below the "meets requirements" threshold of 3.0) reflect the relative immaturity of the platform compared with more-established competitors. BigQuery is best suited for use cases where all the data will reside on the Google Cloud Platform.
Platform maturity, documentation, and support were cited by references as areas needing improvement. Google is aggressively engaged in building out its capabilities, platform maturity and ecosystem, and remains a sound choice for those seeking a DMSA solution based on a modern cloud platform.
Hortonworks (Hortonworks Data Platform)
Hortonworks is based in Santa Clara, California. It offers a range of products including:
Hortonworks Data Platform (a Hadoop distribution, also known as HDP).
Hortonworks DataFlow (for streaming data delivery and ingestion).
HDInsight (a Hadoop service for Microsoft Azure).
Hortonworks Data Cloud Hadoop (for AWS).
In addition, Hortonworks recently introduced Hortonworks DataPlane Service, a unified architecture to manage, govern, store, process and access datasets across multiple use cases.
Hortonworks Data Platform (HDP) is a Hadoop-based solution that is often used for data lake implementation. Reference customers use HDP primarily for two reasons, mostly in equal proportions. The first is to provide an integrated and consistent dataset across multiple business domains for analysis by all users. This use is a bit surprising, as the rating of HDP for traditional data warehouse use is below average. It is, however, consistent with offloading part of the data warehouse to HDP or using HDP as a staging area for the data warehouse. The second use is as a context-independent data warehouse. This aligns with data lake implementations in support of experimental uses of data and with the good rating for this use case. HDP also rates above average for the real-time data warehouse and the logical data warehouse use cases, although these two use cases combined represented less than 15% of the utilization among reference customers.
IBM (Db2)
IBM, which is based in Armonk, New York, offers a wide range of DMSA solutions, including:
Stand-alone DBMSs (Db2, Db2 for z/OS and Informix).
Appliances (PureData System for Analytics, PureData System for Operational Analytics, Integrated Analytics System, Db2 Analytics Accelerator).
Hadoop solutions (BigInsights, based on the Hortonworks Data Platform since 2017).
Managed data warehouse cloud services (Db2 Warehouse on Cloud).
Private cloud data warehouse capabilities (Db2 Warehouse).
IBM's Db2 Big SQL and Fluid Query provide a consolidated access tier to a wide range of DBMS and Hadoop services. In addition, IBM's Watson Data Platform supports further evolution of the company's hybrid cloud and on-premises deployment and management.
IBM Db2 ranks in the top nine for all use cases, with particular strength in the logical data warehouse use case, where it ranked fourth. This reflects generally strong capabilities across all use cases, as the LDW use case is the only one drawing on every defined critical capability in our model. Nearly two-thirds of survey respondents report connecting to external data, contributing to the strong ranking for logical data warehouse use.
In six of the 12 defined critical capabilities (access to multiple data sources, administration and management, advanced analytics, performance optimization for both traditional and exploratory use and flexible scalability), Db2 rated a 4.0 or above. Scores for data ingest ranked it in the middle of the pack with approximately 15% of respondents reporting continuous data loads. This explains Db2's relatively lower ranking for the real-time data warehouse use case.
MapR Technologies (Converged Data Platform)
MapR Technologies, which is based in Santa Clara, California, offers its Converged Data Platform (CDP) in both free (community) and commercial software editions. CDP features performance and storage optimizations using:
Network File System (NFS).
High-availability features.
Administrative and management tools.
MapR Edge, a small footprint edition of CDP, extends MapR's reach to edge-processing use cases common to Internet of Things (IoT) environments.
MapR met requirements for all four use cases. Once again this year, MapR has maintained its strong ranking for the context-independent use case, largely due to its support for a variety of data types, ability to manage large volumes of data, and support for advanced analytics.
In the other three use cases, MapR scored greater than 3.0 (meets requirements) and ranked in the top half for all except traditional data warehousing. Of note, MapR scored in the top quartile for percentage of data loaded continuously, with just under one-third of survey respondents reporting using the product in that capacity. The company markets MapR as a converged data platform combining analytics capabilities with real-time streaming, so this is not surprising. Ratings from our reference survey for traditional data warehouse performance and user support both influenced the MapR scores for the traditional data warehouse use case, where MapR scored in the bottom third for both. This remains a general weakness of products related to big data and Apache Hadoop.
MarkLogic
MarkLogic, which is based in San Carlos, California, offers a nonrelational multimodel DBMS that it describes as "operational and transactional." The product is available in two editions: Essential Enterprise and a free Developer edition. Essential Enterprise can be deployed on-premises, in the cloud, across a range of hybrid infrastructure environments, and on cloud and virtualization platforms, including those owned by Amazon, Microsoft and Google. MarkLogic can also be deployed using the VMware, Pivotal Cloud Foundry, and Red Hat platforms.
MarkLogic continues to promote its offering as the solution for accessing data across multiple silos, both through shared metadata in MarkLogic, and with its longstanding and flexible document storage orientation. MarkLogic has the second-highest survey score for deployment in a hybrid cloud environment, which is probably a tribute to its usefulness in combining multiple data sources regardless of platform.
MarkLogic improved its absolute score in all three repeated use cases from last year's Critical Capabilities, and placed in the top five vendors in the real-time and LDW categories. Survey results indicated that MarkLogic is usually not the only DMSA product used by an organization. A very high percentage of survey respondents indicated that MarkLogic's combination of functionality and performance was a key factor in their selection — the second highest percentage in the survey.
MarkLogic's survey respondents did indicate fairly long deployment times. However, since difficulty of implementation was not called out by a very high percentage of the same respondents, this could be as much about the uniqueness of the solution, and the complexity of the projects undertaken, as any steep learning curve or overall difficulty of implementation.
MemSQL
MemSQL, which is based in San Francisco, California, offers a distributed scale-out SQL DBMS with an in-memory row store, along with a memory and disk-based column store that supports transaction and analytic use cases. MemSQL extends its DBMS platform by including real-time analytics with streaming data via Apache Spark or Apache Kafka. MemSQL offers a free Developer Edition for nonproduction use, and a paid-for Enterprise Edition that can be deployed on-premises or as a fully managed cloud service running on AWS or Microsoft Azure infrastructure.
MemSQL's strongest showing is in the real-time data warehouse use case. The strength here is not surprising given MemSQL's focus on real-time, low-latency analytics with integrated capabilities for ingesting streaming data built into the product. More than 60% of survey respondents reported loading data either continuously, or multiple times per day, and MemSQL led the survey in continuous loading with 46% of respondents using it in that capacity. MemSQL's scores for the other three use cases are generally lower, though it did meet requirements for all. MemSQL is a strong candidate when real-time analytics or converged infrastructure for transactional and analytics use cases is important.
Micro Focus (Vertica)
Micro Focus, which is based in Newbury, U.K., offers the Vertica Analytics Platform.
It is available as:
Vertica Enterprise (a columnar relational DBMS delivered as a software-only solution for on-premises use).
Vertica in the Clouds (machine images available from the marketplaces for AWS, Microsoft Azure and Google Cloud Platform).
Vertica for SQL (on Hadoop and supporting Hadoop environments).
Vertica shows consistent strength in all four of our defined use cases, meeting requirements for all of them with the strongest showing in the traditional data warehouse use case. Although Vertica's relative ranking has declined this year, this change was due to the movement of the overall market, rather than any diminishment of its capabilities. Vertica's absolute scores actually improved in two out of the three use cases repeated from last year's research.
Vertica customers praised the performance, reliability and scalability of the product and view it as a mature offering capable of handling enterprise workloads. Vertica's high rating for advanced analytics capabilities was particularly notable.
Among survey respondents, Vertica remains a strong standard for DMSA in those organizations that have deployed it, with more than half noting that it is the only accepted and supported DMSA technology within their environments.
Microsoft (SQL Server)
Microsoft, which is based in Redmond, Washington, U.S., offers SQL Server as a software-only solution. It also offers the Analytics Platform System, an MPP data warehouse appliance. In addition, it sells Azure SQL, Azure SQL Data Warehouse (a fully managed, MPP cloud data warehouse), Azure HDInsight (a Hadoop distribution based on Hortonworks) and Azure Data Lake as cloud services. This research is focused on SQL Server.
Microsoft achieved scores at or above the "meets criteria" level of 3.0 across all four use cases, although its scores and rankings dropped slightly from last year's Critical Capabilities ratings. This may be due in part to a fairly small sample size for its reference customers, which means individual ratings can have a larger effect.
Microsoft was in the upper portion of vendors where only one DMSA product is supported by an organization, and garnered the highest score from survey respondents citing a previous relationship with the vendor as a key factor in product selection. It also placed near the top in the percentage of respondents that used its products for traditional data warehouse operations.
Microsoft did not score well in survey questions about continuous data ingestion, and had the highest percentage of survey responses citing software bugs as a main issue with the platform.
Microsoft does have a very broad range of cloud and on-premises offerings, with two cloud and two on-premises products that can be used in the DMSA area, as well as supporting services in the cloud.
Neo4j
Neo4j — which is based in San Mateo, California and Malmö, Sweden — provides a graph platform that includes the Neo4j native graph database, graph analytics, the Cypher graph query language, data integration, and graph visualization and discovery tools. The company offers an open-source Community Edition, Neo4j Desktop (free for developers and data scientists) and a paid-for Enterprise Edition for production deployments.
Neo4j was not in the Critical Capabilities research last year, so no year-to-year comparisons can be made. Neo4j is offered as a graph database and, as such, has less applicability to the more-general use cases covered in this research. However, it did meet expectations for the context-independent data warehouse use case, which comes closest to its specialist focus, and rated nearly as well for the logical data warehouse use case.
The most widely stated reason in the survey for selecting Neo4j was to drive innovation, enabled by the relatively new category of graph capabilities. The largest percentage of survey respondents (by far) chose Neo4j based on its functionality, which is also appropriate for a technology in the early stages of adoption.
Oracle
Oracle, which is based in Redwood Shores, California, provides Oracle Database 12c, Oracle Exadata Database Machine, Oracle Big Data Appliance, Oracle Big Data Management System, Oracle Big Data SQL and Oracle Big Data Connectors.
Oracle Cloud provides Oracle Database Cloud Service, Oracle Database Cloud Exadata Cloud Service, Oracle Big Data Cloud Service, and the upcoming Oracle Autonomous Data Warehouse in the cloud. Additionally, Oracle's cloud portfolio includes on-premises solutions with Oracle Exadata Cloud at Customer and Oracle Big Data Cloud at Customer. This research centers on the Oracle 12c offering.
Oracle placed second in three of our four use cases and first in the context-independent data warehouse use case, an improvement over last year's showing of second place in all use cases. Oracle also scored a 4.0 or above in three out of four use cases this year.
Although reference survey respondents ranked Oracle near the bottom in terms of combined cloud and hybrid deployments, the introduction of a new data warehouse as a service offering, as well as continued momentum in the cloud, should improve that ranking.
Oracle ranked in the upper portion of vendors for customers' overall satisfaction in working with the vendor, and near the top in terms of the survey ratings for overall product capabilities. When customers were asked to rank vendors' critical capabilities, Oracle received ratings of 4.15 or higher for all capabilities, and was tied for the highest score for traditional data warehouse performance.
Pivotal (Greenplum)
Pivotal, which is based in San Francisco, California, offers the Pivotal Greenplum database, an open-source MPP database based on PostgreSQL. This solution is available as software or in the cloud on either AWS or Microsoft Azure infrastructures. Pivotal also offers an appliance in the form of the Dell EMC Data Computing Appliance, as well as GemFire, an in-memory caching data grid product. Pivotal also sells a caching service, Pivotal Cloud Cache, based on GemFire, which runs on its Pivotal Cloud Foundry platform. This research is based on the Greenplum offering.
Pivotal Greenplum was ranked in the middle for all use cases, with its best relative showing in the traditional data warehouse use case. The company met expectations (a rating of 3.0 or better) across all four use cases. A large majority of survey responses indicated use of Pivotal Greenplum on-premises.
Respondents indicated that the most-frequent competitors for Pivotal Greenplum were some of the megavendors — Oracle and IBM — and its closest competitor, Vertica. A very large percentage of survey responses indicated that the key factor in choosing Pivotal Greenplum was its functionality and performance.
Qubole
Qubole is based in Santa Clara, California. It offers the Qubole Data Service Enterprise Edition (QDS), a cloud-based processing engine for data managed under cloud object storage or other data management solutions, such as relational DBMSs. The company's processing engine can utilize Hadoop, Spark, Presto, TensorFlow and Airflow. Qubole also offers Qubole Cloud Agents, which are add-on services that can optimize cloud resource consumption and automate workloads. Qubole is available on AWS, Microsoft Azure and Oracle Cloud Platform.
QDS offers a flexible data management solution that eases access to, and processing of, data not necessarily residing under its own management. It is also optimized for cloud resource consumption, which makes it particularly suitable for the context-independent data warehouse use case. The flexibility of data access it offers makes it particularly appealing for the logical data warehouse use case. Reference customers use QDS for these two use cases in a higher proportion than average. Around one third of the references indicate using the solution for context-independent data warehouse use cases.
SAP HANA
SAP, which is based in Walldorf, Germany, offers the SAP HANA platform, an in-memory column-store DBMS that supports operational and analytical use cases. SAP BW/4HANA is a packaged data warehouse solution. SAP HANA and SAP BW/4HANA are also offered as cloud solutions (public, private and based on the SAP Cloud Platform) and as an appliancelike hardware reference architecture. SAP also offers SAP Cloud Platform Big Data Services, a cloud-based Hadoop distribution, and SAP Vora for Spark and Hadoop processing.
Three out of four SAP references surveyed for this research indicated using SAP BW on SAP HANA for data warehousing with very few SAP BW/4HANA (a different version of BW designed to run only on SAP HANA) implementations so far. Real-time data warehouse and context-independent uses are small. This distribution also reflects the mix of use cases Gartner observes through client interactions. These uses of SAP HANA also align with the good scores of SAP for traditional data warehouse and logical data warehouse use cases.
The logical data warehouse use case is covered by SAP HANA Smart Data Access, which offers access to a wide variety of sources from SAP HANA. Finally, the real-time data warehouse use case, while not highly represented among our references, can be addressed by SAP thanks to its in-memory implementation allowing for both transactional and analytical processing on the same data. However, SAP HANA's use in this area is aligned with the overall market, where real-time data warehouse use cases remain a small proportion of the total.
Snowflake
Snowflake, which is based in San Mateo, California, offers a fully managed data warehouse as a service on AWS infrastructure. It supports ACID-compliant relational processing, as well as native support for document store formats such as JSON, Avro and XML. A native Spark connector, R integration, support for user-defined functions, dynamic elasticity, temporal support, and recently announced data-sharing capabilities round out the core offering. Snowflake is currently available only in the AWS cloud.
Snowflake references demonstrated an interesting split across all four of our use cases, with slightly over 50% using it for the traditional data warehouse use case, and the other three use cases being equally represented in the remainder. This distribution of use aligns with the overall good scores that Snowflake has received across all four use cases with a particularly high score for the traditional data warehouse use case.
Among all four use cases, the logical data warehouse use case rated the lowest because of the requirement to load data in Snowflake. However, this has not prevented references from using the solution to access data in multiple formats. The company's cloud implementation, functional capabilities and separation of storage and compute allow for flexible implementations across all four use cases.
Teradata
Teradata is based in Dayton, Ohio. Its offerings include Teradata Database, a software-only DBMS solution; Teradata IntelliFlex and IntelliBase data warehouse appliances; a range of other appliance offerings; and a cloud data warehouse solution (all with MPP). These are available both on its managed cloud (Teradata IntelliCloud) and on public cloud infrastructure from providers such as AWS and Microsoft. Support for the logical data warehouse comes in the form of Teradata's Unified Data Architecture (UDA). Teradata QueryGrid (part of the UDA) provides multisystem query support via the company's own software, as well as via the open-source platform Presto. Teradata also offers the Aster Analytics platform and Hadoop support for Cloudera, Hortonworks and MapR distributions, as well as analytic consulting services.
Teradata rated in the top three vendors across all four of our use cases, and is the uncontested leader in the traditional data warehouse use case, a historical "sweet spot" for the vendor. The company's real-time data warehouse use case ranking is also particularly high, with a number of references using Teradata for active data warehousing supporting mission-critical use cases. Two thirds of Teradata references indicate using it for the traditional data warehouse use case. One quarter cited the logical data warehouse use case, and the remainder are equally split between the context-independent data warehouse and real-time data warehouse use cases. Teradata references often use multiple products from the Teradata portfolio, in particular Teradata QueryGrid, which provides flexible access to data and Hadoop integration, as well as Aster for context-independent use cases.
Treasure Data
Treasure Data is based in Mountain View, California. Its Customer Data Platform is a fully managed DMSA solution running on AWS infrastructure with availability regions in the U.S. and Japan. Treasure Data Platform provides a cloud data lake, combined with relational data marts. Its ability to ingest data from a wide range of sources, and to feed data to downstream data management platforms, is a focus for Treasure Data.
New to this year's critical capabilities research, Treasure Data scored above a 3.0 (met requirements) for the context-independent data warehouse use case. The relatively poor showing in other use cases is largely due to this vendor's niche focus. Treasure Data's early use cases were largely in the IoT space, which does not align particularly well to any of our four defined use cases for DMSA due to the fairly unique requirements for such platforms.
Treasure Data scored just below the 3.0 threshold in two of the other use cases (real-time data warehouse and logical data warehouse), essentially meeting requirements for both. Scores for the percentage of data used in a continuous ingest capacity were in the top half of our reference survey, and nearly three quarters of surveyed respondents indicated that they were connecting to external data sources. Treasure Data's approach to building data lakes via flexible data integration capabilities makes it a strong candidate for IoT use cases, and other use cases requiring strong advanced analytics capabilities.
Context
This year's Critical Capabilities scores illustrate the increasing breadth of viable solutions for DMSA. Some vendors did significantly better in some use cases than others, based both on their capabilities and on the adoption of their offerings for those use cases. However, most vendors who qualified for this Critical Capabilities research achieved a rating of 3.0 or above, which indicates their product "meets requirements," or higher. Of the 84 ratings across the four use cases, there were only seven scores below 3.0, and three of these were above 2.9. We believe this indicates that customers are using the whole range of offerings for the whole range of use cases, in part because the offerings can address the use cases and in part because of the relative ease of sticking with a known vendor. This might be as expected, considering that the survey data came from reference customers recommended by the vendors.
At the same time, the range of scores (especially if we exclude the first few leaders and the bottom few laggards) narrowed, indicating less differentiation across the use-case capabilities.
The traditional MPP cohort vendors who are not also megavendors — Vertica and Pivotal — saw their relative standings slip somewhat, as other companies, those with cloud-based products in particular, start to offer the same capabilities in areas such as traditional data warehouses. As default functionality in the market advances, differentiators for these vendors become less compelling.
Although the ability to deploy on-premises, in the cloud or in a hybrid model was not considered a critical capability in this research, the market saw wider adoption of cloud-based solutions. This includes cloud-only vendors, such as AWS, Google and Snowflake. The survey showed approximately half of all respondents using a hybrid or cloud-only platform for their DMSA.
As in the Magic Quadrant, Hadoop-based vendors continue to slowly slip relative to the rest of the market in the Critical Capabilities ratings. Although Hadoop-based solutions still address a set of use cases, they have not succeeded in extending their capabilities to support a broader set of use cases.
Inclusion in this research should be seen as a significant accomplishment. Readers should be aware that there are fairly stringent requirements for inclusion in this Critical Capabilities evaluation. Many vendors that were not included failed to meet only a small number of requirements, and may still be acceptable alternatives to the vendors in this research, especially for focused or edge use cases.
This year saw the addition of the real-time data warehouse use case, which is replacing the operational data warehouse use case. As organizations use more and broader sources of data (such as information from IoT systems), the ability to analyze data in near real time has become a source of great value.
Product/Service Class Definition
The various capabilities identified below address the major needs identified above.
Critical Capabilities Definition
Access to Multiple Data Sources
This capability reflects how prevalent it is for customers to query across multiple data types and sources, for all types of queries, and to access data in sources beyond the database management system, such as other RDBMSs or Hadoop distributions.
This capability is also rated on the functionality implemented when accessing external data sources, such as whether some kind of processing (like predicate evaluation) is passed to the external data source for implementation locally. Additionally, offerings could deliver some of this capability through storing multiple data types within their products.
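The idea of passing predicate evaluation to the external source can be illustrated with a minimal sketch. It uses an in-memory SQLite table as a stand-in for a remote source; the table name, columns and data are invented for illustration and do not describe any evaluated product.

```python
import sqlite3


def make_external_source():
    """Create an in-memory SQLite table standing in for a remote data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
    # Hypothetical data: every fourth order belongs to the EMEA region.
    rows = [(i, "EMEA" if i % 4 == 0 else "AMER", float(i)) for i in range(1, 101)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def fetch_without_pushdown(conn, region):
    """Pull every row from the source, then filter in the calling engine."""
    transferred = conn.execute("SELECT id, region, amount FROM orders").fetchall()
    result = [row for row in transferred if row[1] == region]
    return result, len(transferred)


def fetch_with_pushdown(conn, region):
    """Send the predicate to the source so only matching rows come back."""
    transferred = conn.execute(
        "SELECT id, region, amount FROM orders WHERE region = ?", (region,)
    ).fetchall()
    return transferred, len(transferred)


source = make_external_source()
local_rows, local_cost = fetch_without_pushdown(source, "EMEA")
pushed_rows, pushed_cost = fetch_with_pushdown(source, "EMEA")
print(f"rows transferred without pushdown: {local_cost}, with pushdown: {pushed_cost}")
```

Both paths return the same rows; the difference is how many rows leave the source before filtering, which is what this part of the rating considers.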
Administration and Management
This capability demonstrates the product's ease of implementation, upgrade and ease of use as expressed by customers. It covers overall ease of administration and management, not only during implementation but also during ongoing use and upgrade phases.
Scoring is also affected by the complexity of deployment and by vendor history. Some vendors have recent offerings for which upgrades may not yet have been released.
In addition to customer experience, this capability takes into consideration the completeness of vendor administration capabilities, such as role-based activities, advisors, utilization and capacity planning, resource allocation features and user interface, as well as complexity of deployment and management.
Advanced Analytics
This capability reflects the ability to perform advanced analytic operations within the product. It was evaluated on the basis of both the functionality offered in the current version of the product and the functionality actually being used by customers, based on their survey responses.
Data Ingest
This capability represents the prevalence of data being loaded continuously by customers. Some use cases — more than others — require data to be loaded from the operational sources in near real time, making this a key capability in the real-time data warehouse use case (new this year).
This capability was evaluated based on survey responses indicating continuous data loading, as well as responses indicating the amount of data loaded daily and analyst assessments based on briefings and inquiries.
Managing Large Volumes of Data
This capability reflects whether the volume of data managed by customers is large. This applies to data of multiple data structures and formats.
It plays a role in all use cases, but to various degrees, as it may not be equally important for all. In this context, we have defined "small" as being below 10 terabytes (TB) and "large" as being over 150TB, with consideration given to those vendors whose survey respondents reported data stores of 1 petabyte or larger. This year, the median size of survey respondents' data stores was considered, rather than the mean, which avoids result skew based on a small number of very large data stores.
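The size bands, and the sensitivity of summary statistics to a few very large data stores, can be illustrated with a small sketch. The sizes below are invented for illustration, and the "medium" label is a hypothetical name for the band between the two thresholds, which the text leaves unnamed.

```python
from statistics import mean, median

SMALL_BELOW_TB = 10   # "small" is below 10TB, per the definition above
LARGE_ABOVE_TB = 150  # "large" is over 150TB, per the definition above


def size_band(terabytes):
    """Classify a data store size (in TB) into the bands defined above."""
    if terabytes < SMALL_BELOW_TB:
        return "small"
    if terabytes > LARGE_ABOVE_TB:
        return "large"
    return "medium"  # hypothetical label for the unnamed middle band


# Made-up reference sizes in TB: five modest stores and one 2-petabyte store.
sizes_tb = [5, 8, 12, 20, 40, 2000]

mean_tb = mean(sizes_tb)      # pulled upward by the single very large store
median_tb = median(sizes_tb)  # reflects the typical respondent

print(f"mean   = {mean_tb} TB -> {size_band(mean_tb)}")
print(f"median = {median_tb} TB -> {size_band(median_tb)}")
```

With these invented numbers, the two statistics place the same set of respondents in different bands, which is why the choice of statistic matters for this capability.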
In addition to customer experience, this capability takes into consideration the ability of the vendor to address management of query workloads and the availability of price performance optimization options, as well as strategies for query optimization in isolation.
Optimized Performance (Traditional)
This capability reflects the features and functions of a product that was designed to address traditional data warehouse workloads. These features would be more focused on optimization of repeated and complex queries.
Optimized Performance (Exploratory)
This capability reflects the features and functions of a product that was designed to address exploratory data warehouse workloads, such as those used for building models or prescriptive analytics.
These workloads have a different set of requirements from traditional data warehouse workloads, and were consequently evaluated separately.
Flexible Scalability
This capability reflects the ease with which a product can scale both up and down in response to changing workloads or user specifications.
Different products could deliver this capability in different ways. Cloud-based vendors can scale up with little user effort, and the separation of compute and storage can make it easier for a cloud vendor to implement this capability.
Distributed solutions typically can scale out more easily than nondistributed solutions, although there is significant variation even among distributed architectures in this area.
Variety of Data Types
This capability reflects the ability of an offering to support a variety of data types, either by native storage or by accessing those data types through some type of virtualized interface.
Workload Management
This capability evaluates how well a product manages different types and sizes of workloads.
This ability can significantly contribute to a product being able to handle demanding workloads without an excess increase in resources, as well as the ability of the product to handle varying workloads without a corresponding variance in response times.
Traditional Use Support
This capability looks at the overall ability of a product to support traditional data warehouse workloads and their users. These workloads are typically initiated by nontechnical business users and casual users.
The rating in this area was based on both survey data and the functionality within the product itself.
In this year's Critical Capabilities calculations, business analysts and casual users were classified as traditional data warehouse users, while data scientists and data miners were classified as discovery users.
The criteria for traditional data warehouse use were based, in large part, on the relative percentage of users classified as business analysts or casual users. These skill sets were defined as:
Business analyst: Utilizes online analytical processing and dimensional tools to create new objects. Some faculty with computer languages and computer processing techniques.
Casual user: Regularly uses portals and prebuilt interfaces. Minimally capable of designing dimensional analytics (if at all).
We also took into consideration some survey results and product evaluations relating to traditional data warehouse usage.
Exploratory Use Support
This capability looks at the overall ability of a product to support exploratory data warehouse workloads and their users, such as model building, predictive analytics and prescriptive analytics.
These workloads are typically initiated by data science and data miner users. The rating in this area was based on both survey data and the functionality within the product itself.
Use Cases
Traditional Data Warehouse
This use case involves managing historical data coming from various structured sources. Data is mainly loaded through bulk and batch loading.
The traditional data warehouse use case can manage large volumes of data, and is primarily used for standard reporting and dashboarding. To a lesser extent, it is used for free-form querying and mining, or operational queries. It requires high capabilities for system availability, administration and management, given the mixed workload capabilities for queries and user skills breakdown. Query optimization plays a role in this use case, as many of the business-intelligence-style queries are used repetitively, multiplying the effects of the optimization.
Real-Time Data Warehouse
This use case manages structured data that is loaded continuously in support of embedded analytics in applications, real-time data warehousing and operational data stores.
This year, the emphasis for this use case shifted to those capabilities that support real-time operations.
The real-time data warehouse use case primarily supports reporting and automated queries for operational needs, and requires high-availability and disaster recovery capabilities to meet them. In addition, it encompasses scenarios requiring high-concurrency, low-latency queries, usually using more-restricted amounts of data. Managing different types of users or workloads, such as ad hoc querying and mining, is of less importance, as the major driver is operational excellence.
Logical Data Warehouse
This use case manages variety and volume of data for both structured and other content data types.
Besides structured data coming from transactional applications, this use case includes other content data types such as machine data, text documents, images and videos. Because additional content types can drive large data volumes, managing large volumes is an important criterion. The logical data warehouse is also required to meet diverse query capabilities and support diverse user skills.
This use case supports queries reaching into sources other than the data warehouse DBMS alone, as well as the storage of different types of data within the product. The logical data warehouse encompasses the features of the other use cases, combined with the ability to include a variety of data sources in analytic operations.
Context-Independent Data Warehouse
This use case concerns new data values, variants of data form and new relationships. It supports search, graph and other capabilities for discovering new information models.
It is primarily used for free-form queries to support forecasting, predictive modeling or other mining styles, as well as queries supporting multiple data types and sources. It has no operational requirements and favors advanced users such as data scientists or business analysts, resulting in free-form queries across potentially multiple data types. The context-independent data warehouse use case is more likely to use siloed data that has not been integrated with other data or subjected to full data governance practices.
Vendors Added and Dropped
Added
Micro Focus (Vertica)
Dropped
1010data did not demonstrate that at least 10% of its customer base was from outside its home region, and therefore does not meet the inclusion criteria requiring production customers from at least two distinct geographic regions.
EnterpriseDB did not demonstrate that it fully supported at least two of the four defined use cases.
MongoDB did not demonstrate that it fully supported at least two of the four defined use cases.
Transwarp Technology did not demonstrate that at least 10% of its customer base was from outside its home region, and therefore does not meet the inclusion criteria requiring production customers from at least two distinct geographic regions.
Hewlett Packard Enterprise completed the spin-off and merger of its software division to Micro Focus in September 2016. This included the Vertica software product, which remains in this Critical Capabilities under the Micro Focus entry.
Huawei did not supply a sufficient number of verified reference customers.
Inclusion Criteria
To be included in this Critical Capabilities research, vendors had to meet the same criteria that were required for the corresponding "Magic Quadrant for Data Management Solutions for Analytics," which are detailed below. In addition, vendors had to supply a viable number of reference customers, since a large weight is placed on survey responses in this research.
Vendors must have DMSA software generally available for licensing or supported for download for approximately one year (since 1 December 2016). We do not consider beta releases.
We used the most recent release of the software to evaluate each vendor's current technical capabilities. For existing solutions, and direct vendor customer references and reference survey responses, all versions currently used in production were considered. For older versions, we considered whether later releases may have addressed reported issues, but also the rate at which customers refuse to move to newer versions.
Product evaluations included technical capabilities, features and functionality present in the product or supported for download on 1 December 2017. Capabilities, product features or functionality released after this date could be included at Gartner's discretion and in a manner Gartner deemed appropriate to ensure the quality of our research product on behalf of our nonvendor clients. We also considered how such later releases might reasonably impact the end-user experience.
Vendors must provide 30 verifiable production implementations, from distinct organizations, that demonstrate generated revenue and confirm that their data management solutions for analytics are in production.
Vendors must have a minimum of $10 million USD in revenue with a 50% growth rate year over year from calendar year 2015 to 2016.
Or
More than $40 million in revenue.
Revenue can be from licenses, support and/or maintenance. Revenue requirements are unchanged in this year's iteration. We will consider public sources of revenue reporting where possible. If such sources are not available, and Gartner market share data is not sufficient (or is in disagreement with the vendor), we will accept a signed attestation from a senior executive from the organization's finance department or the CEO.
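The two alternative revenue paths can be expressed as a small check. This is a sketch under stated assumptions: the criteria do not say which year's revenue is compared with the $10 million and $40 million thresholds, so the function assumes calendar year 2016, and the function and parameter names are hypothetical.

```python
def meets_revenue_criterion(revenue_2015_musd, revenue_2016_musd):
    """Return True if either revenue path in the inclusion criteria is met.

    Amounts are in millions of U.S. dollars. Assumption: thresholds are
    applied to 2016 revenue, which the criteria do not state explicitly.
    """
    # Path 2: more than $40 million in revenue, regardless of growth.
    if revenue_2016_musd > 40:
        return True
    # Growth is undefined without a positive 2015 base; treat as not met.
    if revenue_2015_musd <= 0:
        return False
    # Path 1: at least $10 million with 50% year-over-year growth.
    growth = (revenue_2016_musd - revenue_2015_musd) / revenue_2015_musd
    return revenue_2016_musd >= 10 and growth >= 0.50


print(meets_revenue_criterion(8, 12))   # small vendor growing by 50%
print(meets_revenue_criterion(20, 25))  # mid-size vendor growing by 25%
print(meets_revenue_criterion(35, 41))  # large vendor, growth not required
```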
The production customer base must include customers from three or more vertical industries (see Note 1).
Customers in production must have deployed data management solutions for analytics that integrate data from at least two operational source systems for more than one end-user community (such as separate business lines or differing levels of analytics).
Vendors must demonstrate production customers from at least two distinct geographic regions (see Note 2). Because clients want to understand vendors' global presence, we clarify that "at least two distinct geographic regions" means that at least 10% of the verified production customer base must be outside of the vendor's home geography.
Any acquired product must have been acquired and offered by the acquiring vendor as of 30 June 2017.
Support for the included data management for analytics product(s) had to be available from the vendor. We also considered products from vendors that control or contribute specific technology components to the engineering of open-source DBMSs and their support.
We included in our assessments the capability of vendors to coordinate data management and processing from additional sources beyond the evaluated DMSA. However, vendors in this research need to offer significant value-added capabilities beyond simply providing an interface to data stored in other sources.
Vendors must provide support for at least two of the four defined use cases. (Note: This is a change from last year's criteria, which required only one use case.)
Vendors must at least provide relational processing. Depth of processing capabilities and variety of analytical processing options are considered advantageous in the evaluation criteria.
Vendors participating in the DMSA market had to demonstrate their ability to deliver the necessary services to support a data management solution for analytics through the establishment and delivery of support processes, professional services and/or committed resources and budget.
Products that exclusively support an integrated front-end tool that reads only from the paired data management system did not qualify for assessment.
We also considered the following factors when deciding whether products were eligible for inclusion:
Relational DBMS.
Europe (Western and Eastern Europe)
The Middle East and Africa (including North Africa)
Asia/Pacific (including Japan)
Critical Capabilities Methodology
This methodology requires analysts to identify the critical capabilities for a class of products or services. Each capability is then weighted in terms of its relative importance for specific product or service use cases. Next, products/services are rated in terms of how well they achieve each of the critical capabilities. A score that summarizes how well they meet the critical capabilities for each use case is then calculated for each product/service.
"Critical capabilities" are attributes that differentiate products/services in a class in terms of their quality and performance. Gartner recommends that users consider the set of critical capabilities as some of the most important criteria for acquisition decisions.
In defining the product/service category for evaluation, the analyst first identifies the leading uses for the products/services in this market. What needs are end-users looking to fulfill, when considering products/services in this market? Use cases should match common client deployment scenarios. These distinct client scenarios define the Use Cases.
The analyst then identifies the critical capabilities. These capabilities are generalized groups of features commonly required by this class of products/services. Each capability is assigned a level of importance in fulfilling that particular need; some sets of features are more important than others, depending on the use case being evaluated.
Each vendor’s product or service is evaluated in terms of how well it delivers each capability, on a five-point scale. These ratings are displayed side-by-side for all vendors, allowing easy comparisons between the different sets of features.
Ratings and summary scores range from 1.0 to 5.0:
1 = Poor or Absent: most or all defined requirements for a capability are not achieved
2 = Fair: some requirements are not achieved
3 = Good: meets requirements
4 = Excellent: meets or exceeds some requirements
5 = Outstanding: significantly exceeds requirements
To determine an overall score for each product in the use cases, the product ratings are multiplied by the weightings to come up with the product score in use cases.
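The scoring step described above can be sketched as a weighted sum. The capability names are taken from this research, but the ratings and weights below are invented for illustration, and the assumption that the weights for a use case sum to 1.0 is ours rather than stated in the text.

```python
def use_case_score(ratings, weights):
    """Combine per-capability ratings (1.0 to 5.0) into one use-case score.

    Assumption: weights are fractions that sum to 1.0 for the use case.
    """
    if set(ratings) != set(weights):
        raise ValueError("ratings and weights must cover the same capabilities")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    # Multiply each rating by its weighting and add up the products.
    return sum(ratings[name] * weights[name] for name in weights)


# Hypothetical ratings for one product and weights for one use case.
ratings = {"Data Ingest": 4.0, "Advanced Analytics": 3.0, "Workload Management": 3.5}
weights = {"Data Ingest": 0.5, "Advanced Analytics": 0.25, "Workload Management": 0.25}

score = use_case_score(ratings, weights)
print(f"use-case score: {score}")
```

Because each use case applies different weights to the same ratings, a single product can score differently across the four use cases.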
The critical capabilities Gartner has selected do not represent all capabilities for any product; therefore, they may not represent those most important for a specific use situation or business objective. Clients should use a critical capabilities analysis as one of several sources of input about a product before making a product/service decision.
© 2018 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. This publication may not be reproduced or distributed in any form without Gartner's prior written permission. If you are authorized to access this publication, your use of it is subject to the Usage Guidelines for Gartner Services posted on gartner.com. The information contained in this publication has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information and shall have no liability for errors, omissions or inadequacies in such information. This publication consists of the opinions of Gartner's research organization and should not be construed as statements of fact. The opinions expressed herein are subject to change without notice. Gartner provides information technology research and advisory services to a wide range of technology consumers, manufacturers and sellers, and may have client relationships with, and derive revenues from, companies discussed herein. Although Gartner research may include a discussion of related legal issues, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner is a public company, and its shareholders may include firms and funds that have financial interests in entities covered in Gartner research. Gartner's Board of Directors may include senior managers of these firms or funds. Gartner research is produced independently by its research organization without input or influence from these firms, funds or their managers. For further information on the independence and integrity of Gartner research, see "Guiding Principles on Independence and Objectivity."
