What is a data fabric?
Learn how data fabrics orchestrate data intelligently across a distributed landscape, surfacing it for data consumers
Data fabric is an architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems. Over the last decade, developments within hybrid cloud, artificial intelligence, the internet of things (IoT), and edge computing have led to the exponential growth of big data, creating even more complexity for enterprises to manage. This growth has created significant challenges, such as data silos, security risks, and general bottlenecks to decision making, which has made the unification and governance of data environments an increasing priority. Data management teams are addressing these challenges head on with data fabric solutions. They are leveraging them to unify their disparate data systems, embed governance, strengthen security and privacy measures, and make data more accessible to workers, particularly business users.
These data integration efforts via data fabrics allow for more holistic, data-centric decision-making. Historically, an enterprise may have had different data platforms aligned to specific lines of business. For example, you might have an HR data platform, a supply chain data platform, and a customer data platform, which house data in separate environments despite potential overlaps. However, a data fabric can allow decision-makers to view this data more cohesively to better understand the customer lifecycle, making connections between data that didn’t exist before. By closing these gaps in understanding of customers, products and processes, data fabrics are accelerating digital transformation and automation initiatives across businesses.
Data fabric vs. data virtualization
Data virtualization is one of the technologies that enable a data fabric approach. Rather than physically moving the data from various on-premises and cloud sources using standard ETL (extract, transform, load) processes, a data virtualization tool connects to the different sources, integrating only the metadata required and creating a virtual data layer. This allows users to work with the source data in real time.
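The idea of a virtual data layer can be sketched in a few lines of Python. This is a minimal illustration, not how any particular virtualization product works: the `VirtualSource` and `VirtualDataLayer` classes, the `fetch` callables, and the sample rows are all hypothetical names and data introduced for the example. The layer stores only metadata (source names and columns) at registration time and reads from the source when a query runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


@dataclass
class VirtualSource:
    """Metadata about a source plus a callable that reads it on demand."""
    name: str
    columns: List[str]                  # the only thing stored up front
    fetch: Callable[[], Iterable[Row]]  # invoked at query time, not at registration


class VirtualDataLayer:
    """Hypothetical virtual layer: keeps metadata, never copies source rows."""

    def __init__(self) -> None:
        self._sources: Dict[str, VirtualSource] = {}

    def register(self, source: VirtualSource) -> None:
        self._sources[source.name] = source

    def describe(self) -> Dict[str, List[str]]:
        """Return the catalog of source names and their columns."""
        return {name: src.columns for name, src in self._sources.items()}

    def query(self, name: str, columns: List[str]) -> Iterator[Row]:
        """Stream the requested columns from the live source."""
        source = self._sources[name]
        unknown = set(columns) - set(source.columns)
        if unknown:
            raise KeyError(f"unknown columns for {name}: {sorted(unknown)}")
        for row in source.fetch():
            yield {col: row[col] for col in columns}


# Stand-ins for an on-premises database and a cloud API (illustrative data).
crm_rows = [{"customer_id": 1, "name": "Ada", "region": "EU"}]
orders_rows = [{"order_id": 10, "customer_id": 1, "total": 99.5}]

layer = VirtualDataLayer()
layer.register(VirtualSource("crm", ["customer_id", "name", "region"], lambda: crm_rows))
layer.register(VirtualSource("orders", ["order_id", "customer_id", "total"], lambda: orders_rows))

# The source changes after registration; the next query sees the new row.
orders_rows.append({"order_id": 11, "customer_id": 1, "total": 20.0})
latest_orders = list(layer.query("orders", ["order_id", "total"]))
```

Because nothing is copied at registration, `latest_orders` includes the row appended afterwards. An ETL pipeline would instead need another load run before that row became visible.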
Data fabric architecture
By leveraging data services and APIs, data fabrics pull together data from legacy systems, data lakes, data warehouses, SQL databases, and apps, providing a holistic view into business performance. In contrast to these individual data storage systems, a data fabric aims to create more fluidity across data environments, attempting to counteract the problem of data gravity, i.e. the idea that data becomes more difficult to move as it grows in size. A data fabric abstracts away the technological complexities involved in data movement, transformation and integration, making all data available across the enterprise.
Data fabric architectures operate around the idea of loosely coupling data in platforms with the applications that need it. In one example of a data fabric architecture in a multi-cloud environment, one cloud, like AWS, manages data ingestion and another platform, such as Azure, oversees data transformation and consumption. A third vendor, like IBM Cloud Pak for Data, might then provide analytical services. The data fabric architecture stitches these environments together to create a unified view of data.
That said, this is just one example. There isn’t one single data architecture for a data fabric, as different businesses have different needs. The variety of cloud providers and data infrastructure implementations ensures variation across businesses. However, businesses utilizing this type of data framework exhibit commonalities across their architectures, which are unique to a data fabric. More specifically, they have six fundamental components, which Forrester describes in the “Enterprise Data Fabric Enables DataOps” report. These six layers include the following:
Data Management: This layer is responsible for data governance and the security of data.
Data Ingestion: This layer begins to stitch cloud data together, finding connections between structured and unstructured data.
Data Processing: This layer refines the data to ensure that only relevant data is surfaced for data extraction.
Data Orchestration: This critical layer conducts some of the most important jobs for the data fabric: transforming, integrating, and cleansing the data, making it usable for teams across the business.
Data Discovery: This layer surfaces new opportunities to integrate disparate data sources. For example, it might find ways to connect data in a supply chain data mart and a customer relationship management data system, enabling new opportunities for product offers to clients or ways to improve customer satisfaction.
Data Access: This layer allows for the consumption of data, ensuring the right permissions for certain teams to comply with government regulations. Additionally, this layer helps surface relevant data through the use of dashboards and other data visualization tools.
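As a rough illustration of what the Data Discovery layer does, the sketch below scans catalog metadata for datasets that share a column name and could therefore be joined. It is a toy under stated assumptions: the `catalog` contents and the `find_join_candidates` function are hypothetical, and real data fabrics typically rely on richer signals than exact column-name matches.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical catalog metadata: dataset name -> column names.
catalog: Dict[str, Set[str]] = {
    "supply_chain_mart": {"sku", "warehouse_id", "customer_id", "ship_date"},
    "crm_system": {"customer_id", "email", "segment"},
    "hr_platform": {"employee_id", "department"},
}


def find_join_candidates(metadata: Dict[str, Set[str]]) -> List[Tuple[str, str, List[str]]]:
    """Return pairs of datasets that share at least one column name."""
    candidates = []
    # Sorting the names keeps the output order deterministic.
    for left, right in combinations(sorted(metadata), 2):
        shared = sorted(metadata[left] & metadata[right])
        if shared:
            candidates.append((left, right, shared))
    return candidates


join_candidates = find_join_candidates(catalog)
```

With this sample catalog, the only candidate pair is `crm_system` and `supply_chain_mart`, linked by `customer_id`, which mirrors the supply chain and CRM example in the Data Discovery description.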
Advantages of data fabric architectures
As data fabric providers gain more adoption from businesses in the market, Gartner has noted specific improvements in efficiency, stating that a data fabric can reduce “time for integration design by 30%, deployment by 30%, and maintenance by 70%.” While it’s clear that data fabrics can improve overall productivity, the following benefits have also demonstrated business value for adopters:
Intelligent integration: Data fabrics utilize semantic knowledge graphs, metadata management, and machine learning to unify data across various data types and endpoints. This aids data management teams in clustering related datasets together as well as integrating net new data sources into a business’s data ecosystem. This functionality automates aspects of data workload management, leading to the aforementioned efficiency gains, but it also helps to eliminate silos across data systems, centralize data governance practices, and improve overall data quality.
Democratization of data: Data fabric architectures facilitate self-service applications, broadening access to data beyond more technical resources, such as data engineers, developers, and data analytics teams. The reduction of data bottlenecks subsequently fosters more productivity, enabling business users to make faster business decisions and freeing up technical users to prioritize tasks that better utilize their skillsets.
Better data protection: The broadening of data access also doesn’t mean compromising on data security and privacy measures. In fact, it means that more data governance guardrails are put into place around access controls, ensuring specific data is only available to certain roles. Data fabric architectures also allow technical and security teams to implement data masking and encryption around sensitive and proprietary data, mitigating risks around data sharing and system breaches.
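The combination of role-based access controls and data masking described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `CLEAR_COLUMNS` policy, the role names, the `mask` and `read_as` helpers, and the sample record are all hypothetical, and encryption is left out to keep the example short.

```python
from typing import Dict, List, Set

Row = Dict[str, str]

# Hypothetical policy: which columns each role may see in clear text.
CLEAR_COLUMNS: Dict[str, Set[str]] = {
    "analyst": {"customer_id", "segment"},
    "compliance": {"customer_id", "segment", "email", "iban"},
}


def mask(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a value."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def read_as(role: str, rows: List[Row]) -> List[Row]:
    """Return rows with every column outside the role's policy masked."""
    if role not in CLEAR_COLUMNS:
        raise PermissionError(f"role {role!r} has no access policy")
    allowed = CLEAR_COLUMNS[role]
    return [
        {col: (val if col in allowed else mask(val)) for col, val in row.items()}
        for row in rows
    ]


# Illustrative record; the values are made up.
customers = [{"customer_id": "C-1", "segment": "retail",
              "email": "ada@example.com", "iban": "DE89370400440532013000"}]

analyst_view = read_as("analyst", customers)
compliance_view = read_as("compliance", customers)
```

Here an analyst still gets every row, which keeps self-service access broad, but sees the `email` and `iban` values masked, while the compliance role sees them in clear text. A role with no policy is rejected outright rather than given a default view.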
Use cases of data fabrics
Data fabrics are still in their infancy in terms of adoption, but their data integration capabilities aid businesses in data discovery, allowing them to take on a variety of use cases. While the use cases that a data fabric can handle may not be extremely different from those of other data products, a data fabric differentiates itself by the scope and scale that it can handle as it eliminates data silos. By integrating across various data sources, companies and their data scientists can create a holistic view of their customers, which has been particularly helpful with banking clients. Data fabrics have been more specifically used for:
Customer profiles,
Return-to-work risk models, and more.
Related solutions
IBM Cloud Pak for Data
IBM Cloud Pak for Data is an open, extensible data platform that provides a data fabric to make all data available for AI and analytics, on any cloud.