A Deep Dive into Cloud Data Warehousing | Teradata

The ability to efficiently process, collate, and analyze data has only grown more valuable in recent years. Cloud data warehousing allows large organizations to meet this need with considerable agility, effectiveness, and speed. Gartner has predicted that by 2022, 75% of all databases will have migrated to — or be initially deployed within — the cloud.

All available evidence seems to suggest that enterprise data use will continue to increase, and that the types of data being circulated, and their sources, will become more varied. This evolving diversity of data requires a flexible approach that may be outside the scope of what traditional data warehousing methods can handle, and that's where the cloud comes in.

On-premises infrastructure still offers value for data warehousing, but the cloud-first data warehouse, including hybrid environments, has rapidly become the norm. Trends indicate this shift will continue. Let's take a look at the fundamentals of this technology and examine how best to evaluate and implement a cloud data warehousing solution for your enterprise.

A cloud data warehouse performs all of the functions you would expect of a traditional data warehouse — data processing, collation, integration, cleansing, loading, reporting, and so on — but does so within a public cloud environment. Major examples include Microsoft Azure SQL Data Warehouse, Amazon Redshift, Teradata Vantage, Google Cloud's BigQuery, and Snowflake Cloud Data Platform.

Like its on-premises counterpart, an enterprise data warehouse deployed in the cloud is typically a relational database, focusing on structured and semi-structured data. This is the kind you'd see in various customer relationship management (CRM), enterprise resource planning (ERP), and point-of-sale applications, to name just a few. Unstructured data, meanwhile, is typically aggregated using a data lake framework, which can also be cloud-based.

At the most granular level, the majority of data stored in a warehouse is characterized as either facts, measures, or dimensions:

A cloud database is distinguished by its versatility, and as such it can easily be multi-dimensional. In addition to managing many dimensions of both current and historical big data in a single venue, a modern cloud database can operate on a serverless architecture, which can help minimize an enterprise's data management responsibilities. Alternatively, cloud databases may use the cluster-and-node approach, in which the workload is spread across two or more servers (nodes).
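To make the idea of facts, measures, and dimensions more concrete, here is a minimal sketch of a multi-dimensional model, assuming a simple star schema. It uses Python's built-in `sqlite3` module as a local stand-in for a cloud warehouse, and every table name, column name, and value (`fact_sales`, `dim_store`, `dim_date`, `revenue`) is hypothetical rather than taken from any particular product.

```python
import sqlite3


def build_demo_warehouse():
    """Create an in-memory star schema with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimensions describe the context of an event.
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        -- Each fact row records one event; revenue is the measure.
        CREATE TABLE fact_sales (
            store_id INTEGER REFERENCES dim_store(store_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            revenue REAL
        );
        """
    )
    conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                     [(1, "East"), (2, "West")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, 2020), (2, 2021)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 1, 100.0), (1, 2, 150.0), (2, 1, 200.0), (2, 2, 50.0)])
    return conn


def revenue_by_region_and_year(conn):
    """Aggregate the measure across two dimensions at once."""
    query = """
        SELECT s.region, d.year, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_store AS s ON s.store_id = f.store_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        GROUP BY s.region, d.year
        ORDER BY s.region, d.year
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in revenue_by_region_and_year(build_demo_warehouse()):
        print(row)
```

Adding another dimension, such as product or customer, would mean one more dimension table and one more key column on the fact table, which is roughly what "multi-dimensional" amounts to in practice.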

Aside from being located on-premises as opposed to in the cloud, the most basic differences between traditional enterprise data warehouse tools and warehouses managed by a cloud provider are found in the architecture and modeling:

A traditional data warehouse solution is delineated by tiers. The models seen in traditional data warehousing are as follows:

The types of cloud data warehouse services you'll see can generally be categorized as either a cluster-based or serverless architecture:

Both cloud data warehouse types detailed above can offer very quick query responses. The main difference is management: Enterprises must oversee cluster-based warehousing to a certain extent, requesting that their provider add or subtract nodes based on data traffic. Serverless users will expect their provider to dynamically allocate resources as necessary to maximize query speed.
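The node-management decision described above can be sketched as a small sizing calculation. This is only an illustration under stated assumptions: the function names, the per-node capacity figure, and the two-node minimum are all hypothetical, and a real provider would expose its own resize mechanism and limits.

```python
import math


def nodes_needed(queries_per_minute, capacity_per_node, min_nodes=2):
    """Estimate how many nodes a cluster needs for the current traffic.

    capacity_per_node is a hypothetical figure for how many queries per
    minute a single node can serve; min_nodes is an assumed floor.
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    required = math.ceil(queries_per_minute / capacity_per_node)
    return max(min_nodes, required)


def resize_request(current_nodes, queries_per_minute, capacity_per_node):
    """Return how many nodes to add (positive) or remove (negative)."""
    target = nodes_needed(queries_per_minute, capacity_per_node)
    return target - current_nodes


if __name__ == "__main__":
    # A traffic spike on a 3-node cluster: ask for more nodes.
    print(resize_request(3, 950, 200))
    # A quiet period on a 5-node cluster: hand nodes back.
    print(resize_request(5, 100, 200))
```

In a cluster-based warehouse, something like this logic sits with the enterprise, whether as a person or a script. In a serverless warehouse the provider makes the equivalent decision on the customer's behalf.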

Amassing and collating all of those gigabytes upon gigabytes (and, eventually, terabytes) of data isn't just about storage or operations. The insights that data can reveal can serve as the foundation for strategic development that drives growth and the bottom line, but they must be unlocked with analytics tools.

Running data analytics and reporting on a data warehouse hosted via a cloud solution is quite different from completing the same tasks for an on-premises warehouse. In fact, it's arguably one of the most exciting cloud computing trends in the enterprise world right now.

Whether working in a single public cloud, using a multi-cloud solution, or operating a hybrid cloud deployment tied to on-premises data infrastructure, a cloud data warehouse offers greater and more cost-effective scalability and elasticity for analytics workloads as they expand and contract with an enterprise's shifting priorities. Queries can often run more quickly than they would in an on-premises warehouse, and at a lower overall cost because there is no hardware overhead to carry.

With the right data analytics engine for the cloud, you can give your organization the flexibility to craft and implement algorithms as sophisticated as your circumstances require, using the programming languages you're familiar with, such as SQL, Python, SAS, and R. Scalable analytics in this context brings to bear leading-edge machine learning processes, clustering and segmentation, sentiment parsing, text extraction, graphing, and geospatial or time series analysis.

Additionally, running data warehouse analytics in the cloud allows you to integrate with numerous data management services: Amazon EBS, S3, SageMaker, Glue, and Lambda, as well as Azure Blob Storage, Data Factory, ML Studio, and PowerBI, are just a few examples.

First, you must consider whether a cluster-based or serverless warehouse architecture will be right for your organization's cloud deployment.

Clustered warehouses have more predictable pricing and allow more direct oversight, but the latter advantage comes at the cost of devoting more time and resources to managing elasticity, capacity, and cluster health. By contrast, serverless models are completely overseen by your CSP and elasticity is scaled automatically, but you pay either per query or based on utilization, which can be difficult to predict.
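The pricing trade-off above can be illustrated with a rough comparison of the two billing models. This is a sketch, not a pricing guide: the function names are invented for illustration, the prices are made-up placeholders rather than any vendor's rates, and the serverless side assumes billing by terabytes scanned, which is only one of several usage-based schemes.

```python
HOURS_PER_MONTH = 730  # average hours in a month, used as an approximation


def cluster_monthly_cost(nodes, price_per_node_hour, hours=HOURS_PER_MONTH):
    """Clustered pricing: pay for every node-hour, busy or idle."""
    return nodes * price_per_node_hour * hours


def serverless_monthly_cost(tb_scanned, price_per_tb):
    """Serverless pricing (assumed model): pay per terabyte scanned."""
    return tb_scanned * price_per_tb


def break_even_tb(nodes, price_per_node_hour, price_per_tb):
    """Monthly scan volume at which both models cost the same."""
    return cluster_monthly_cost(nodes, price_per_node_hour) / price_per_tb


if __name__ == "__main__":
    # Hypothetical rates: 1.00 per node-hour and 5.00 per TB scanned.
    print(cluster_monthly_cost(4, 1.0))
    print(serverless_monthly_cost(100, 5.0))
    print(break_even_tb(4, 1.0, 5.0))
```

Under these placeholder numbers, the cluster bill is the same every month regardless of activity, while the serverless bill tracks usage. Below the break-even scan volume the serverless model is cheaper, and above it the cluster is, which is why a steady, heavy workload tends to favor predictable cluster pricing.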

Pricing, in fact, may be the most complicated aspect of choosing a cloud data warehouse, regardless of model. One of the chief advantages of a strong cloud platform is its elasticity, but when data workloads are steady, that elasticity goes unused and you may run into cost inefficiencies. Additionally, it's critical to monitor any costs associated with workflows that move data out of the cloud, and to recognize that budgeting and cost controls can become complicated enough for spending to spin out of control quickly.

Last but not least, initial implementation of a cloud-based warehouse may come with slower-than-expected performance and require users to change their practices to accommodate this early hiccup.

The key to making the most of a cloud data warehouse is using it alongside an agile, scalable, and flexibly priced connected multi-cloud data platform like Teradata Vantage. Vantage is compatible with complementary data tools from major cloud providers, and pricing is based solely on use. Also, the platform works seamlessly in any cloud environment or on-premises, and allows for fluid movement of data and applications back and forth from physical data infrastructure to the cloud — and even between cloud providers in a multi-cloud model — as need be.
