Independent data and analytics analyst Philip Russom, PhD, offers commentary on the Gartner view of the data fabric from the recent Gartner Data & Analytics Summit 2023.
I had the honor and pleasure of attending the Gartner Data & Analytics Summit, just held on March 19-22, 2023 in Orlando, Florida. It was an impressively large and well-organized event that covered most aspects of data and analytics (D&A), plus their best practices, tools, technologies, and team structures. However, in my opinion, the topic addressed most often and most profoundly was the data fabric. Allow me to summarize the Gartner view of the data fabric, as presented at the Gartner D&A Summit 2023.
Data fabric has become the leading-edge paradigm for data management development, deployment, and automation. Without a fabric, organizations struggle with data availability and access, data standards, governance, data engineer productivity, and time to value for analytics and other data products.
Industry analysts at Gartner define the data fabric as an architecture and set of best practices for unifying and governing multiple data management disciplines, including data integration, quality, active metadata, master data, pipelines, catalogs, orchestration, analytics, DataOps, and much more. “Unified” means that the diverse tools of a data fabric must interoperate deeply, in both development and production. Unification and interoperability across the data fabric tool portfolio may be achieved via a common graphical user interface (GUI), application programming interfaces (APIs), data standards, user methods, shared data products and objects, and shared metadata and other semantics.
To achieve the scale, agility, and productivity required for a production data fabric, the tools used should ideally support automation, ranging from old-fashioned business rules to cutting-edge smart algorithms (perhaps based on machine learning) that recommend or automatically perform data engineering actions. Other data fabric capabilities either required or recommended by Gartner analysts include intelligent orchestration, composable architecture, data catalogs, and knowledge graph databases and analytics (to represent and analyze data objects collected via active metadata).
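To make the “old-fashioned business rules” end of that automation spectrum concrete, here is a minimal sketch of rules that recommend data engineering actions based on dataset metadata. All field names, thresholds, and actions are invented for illustration; they are not drawn from any Gartner material or specific data fabric product.

```python
# Sketch: rule-based automation over dataset metadata.
# Field names and thresholds are illustrative assumptions only.
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class DatasetMetadata:
    name: str
    last_profiled: Optional[date]  # when data-quality profiling last ran
    null_ratio: float              # fraction of null values observed

def recommend_actions(md: DatasetMetadata, today: date) -> List[str]:
    """Apply simple business rules and return recommended actions."""
    actions = []
    # Rule 1: profiling older than 30 days (or never run) is stale.
    if md.last_profiled is None or (today - md.last_profiled).days > 30:
        actions.append(f"re-profile {md.name}: profiling is stale")
    # Rule 2: a high null ratio suggests quality rules need review.
    if md.null_ratio > 0.2:
        actions.append(f"review quality rules for {md.name}: high null ratio")
    return actions

print(recommend_actions(
    DatasetMetadata("orders", last_profiled=None, null_ratio=0.35),
    today=date(2023, 4, 1),
))
```

In a production fabric, the same recommendation step might instead be driven by machine learning models trained on observed usage, as the Gartner analysts describe; the rule-based version above simply shows the shape of the idea.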
The data fabric capabilities mentioned above are numerous. Many are advanced (e.g., knowledge graphs, automation via machine learning) or in a rudimentary state of evolution (active metadata, DataOps). For these reasons, many data management professionals and other technical users find the Gartner definition of the data fabric to be quite challenging to understand and implement.
Hence, most data and analytics leaders don’t know where to begin when designing and deploying a data fabric. Likewise, it is not obvious how to extend existing data management solutions to evolve into a data fabric. Luckily, the Gartner D&A Summit 2023 included some sessions that provided clear explanations of the data fabric and its key component called active metadata, plus how to approach the implementation of these. I will now summarize those sessions.
One of the most useful sessions I attended at the Gartner D&A Summit was “The Practical Data Fabric — How to Architect the Next Generation Data Management Design,” presented by Ehtisham Zaidi, VP Analyst at Gartner Inc. The presenter made the data fabric more understandable, and he stressed the benefits of the fabric.
For example, the data fabric offers something for everyone.
As further proof of fabric benefits, the presentation quoted a recent Gartner prediction: “By 2025, active metadata-assisted automated functions in the data fabric will reduce human effort by half and quadruple data utilization efficiency.”
The greatest contribution of this presentation, however, is the procedure of nine steps that Ehtisham Zaidi shared with us. The procedure answers the tough questions many users ask: Where do we start? In what order should we proceed? What is the operating model for a data fabric? What advanced levels should we aspire toward?
Here is my summary of Ehtisham Zaidi’s nine steps for data fabric design and development:
- Start with known data that can answer known questions, and use standard data integration tools and practices, plus data cataloging and the DataOps method.
- Embrace unknown data and unknown questions. Use the tools and methods of the standard path, but add knowledge graphs (as a representation of data relationships) as an enabler for advanced analytics.
- Satisfy a need for automation, which is critical to data fabric speed, productivity, and depth of analytic insight. To the other two paths, add active metadata and a recommendation engine.
One of the more provocative sessions I attended at the Gartner D&A Summit was “Data Fabric or Data Mesh: Debate on Deciding Your Future Data Management Architecture,” presented by Ehtisham Zaidi, VP Analyst and Robert Thanaraj, Director Analyst at Gartner Inc.
Proponents of the data mesh tout it as a next-generation data management architecture based on domain-driven, distributed data management. The data fabric, on the other hand, provides an enterprise-wide infrastructure for centralized (but not necessarily consolidated) data management. Data mesh consultants regularly recommend that enterprises disassemble centralized competency centers for data management, to be replaced by multiple small teams at the department or business unit level. The data fabric, as defined by Gartner analysts, recommends that enterprises utilize their existing data management infrastructure and team structures, while evolving them to support data fabric technologies and methods from a central “version of the truth” for most enterprise data. Both data mesh and data fabric champion new practices, such as DataOps and data as a product.
The main advantage of this definition of data mesh is that the resulting data products align well with the business domain that the data and analytics team reports to, as compared to data products that come from a distant, central team and its data. There is some truth to this, but the downside is that the meshed teams are alienated from the standards, governance, and shared innovations of a central fabric and its team. And reinventing the wheel across multiple distributed teams is more expensive than the efficiency of a shared pool of data and analytics specialists. Plus, we all know that siloed departmental data has a variety of problems. Users must weigh these trade-offs between mesh and fabric when contemplating the use of either.
Near the end of their session, the presenters, Ehtisham Zaidi and Robert Thanaraj, pointed out that such comparisons lose some relevance once we realize that data mesh and data fabric are very different and target different levels of the technology stack. We can even say they have complementary strengths and weaknesses, such that deploying both can be desirable for some enterprises. For example, a compromise seen with some Gartner clients is that they maintain a central infrastructure and team for shared enterprise data (i.e., data fabric with enterprise governance), but deploy multiple autonomous teams for analytics focused on domain issues and goals (i.e., data mesh with federated governance).
You may have noticed that metadata plays a prominent role in the data fabric, as defined by Gartner analysts. For example, the first four of Ehtisham Zaidi’s nine steps for data fabric design (listed above) are all about metadata, plus new and innovative methods for managing and using metadata, such as active metadata and knowledge graphs.
At the Gartner D&A Summit, Gartner analyst Mark Beyer drilled into a number of metadata-based innovations in his presentation “The Active Metadata Helix: The Benefits of Automating Data Management.”
According to Beyer: “Metadata is generated practically every time data is accessed in any tool, platform or application. This is a largely untapped resource that is real-time documentation of exactly how, when and why any person in the enterprise uses data from any and all assets available to them. ‘Active metadata’ is the conversion of these otherwise passive observations into ML-enabled, automated data management.”
The presenter introduced the idea of the ‘active metadata helix.’ For many business processes and entities, metadata records data as triples: subject, predicate, and object. Combining these triples from across the metadata forms a multi-layered helix, which is a series of data triples. Starting from any common data point, all triples that touch it can be traced. This is valuable to the business because it reveals the details of data utilization, supporting analytics about data lineage, compliant (or non-compliant) usage, user behaviors, levels of consumption (or lack of consumption), and so on. “Metadata is observable evidence of the data experience in an organization,” said Beyer.
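The triple-tracing idea can be sketched in a few lines of code. The triples below are invented examples, not real metadata, and the `trace` function is a deliberately naive stand-in for the graph queries a real active-metadata platform would run.

```python
# Sketch: metadata observations stored as (subject, predicate, object)
# triples, traceable through any shared data point. All triples are
# invented for illustration.
from typing import List, Tuple

Triple = Tuple[str, str, str]

triples: List[Triple] = [
    ("report_revenue", "reads_from", "table_orders"),
    ("table_orders",   "loaded_by",  "pipeline_nightly_etl"),
    ("alice",          "queried",    "table_orders"),
    ("bob",            "edited",     "report_revenue"),
]

def trace(data_point: str) -> List[Triple]:
    """Return every triple that mentions the given data point,
    revealing lineage and usage around it."""
    return [t for t in triples if data_point in (t[0], t[2])]

# Tracing one table surfaces its lineage (the pipeline that loads it)
# and its usage (the report and the user that read it).
for s, p, o in trace("table_orders"):
    print(s, p, o)
```

Following shared data points from triple to triple is what links the layers of the helix: a user connects to a table, the table to a pipeline, the pipeline to its outputs, and so on.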
Near the end of his session, Mark Beyer made a provocative recommendation: “Stop designing data so much. Start observing it, then build alerts that notify people about data and the events and entities it represents.” You need automation for that, which is where active metadata comes in. “Data management tools already ‘do’ metadata. Observe and listen to users by activating metadata.”
Note that Beyer’s vision for the future of metadata management and Gartner’s definition of data fabric both assume that fully automated active metadata will become a common data management tool function and technical user practice. Recent editions of the Gartner Hype Cycle for Data Management project the maturity of active metadata and the data fabric, but not for many years. Both have adoption rates of 5% or less of their addressable markets, according to anecdotal remarks made at the summit by Gartner analysts. So, don’t expect active metadata and the data fabric to be commonplace soon. But rest assured: they are coming.