Challenges in Data Mesh and How to Resolve Them - DZone Big Data


The desire for more and better data encourages enterprises to upgrade their data management landscape, and data mesh is becoming an integral addition to the stack. Put simply, a mesh decentralizes the architecture and puts the domains in charge of capturing and sharing data for the entire enterprise. The domain-driven design fits well with the much-anticipated Web 3.0. As a step beyond traditional fabric solutions, it also aims to bring down management costs, ensure faster data streams, and ultimately treat data as a product.

Being a relatively new concept, mesh poses challenges for data teams and demands a more holistic approach. It made the list of hot trends in data and analytics last year, and its adoption is expected to keep growing through 2022 and the coming year.

Giving more power to the domains, the very thing that defines a mesh, is also a challenge for enterprises at first.

As we know, data mesh architecture hands ownership of data to the concerned domain teams. Since each team is accountable for creating and sharing its data products with everyone else, this responsibility tends to affect their core job roles.

In some cases, data duplication across multiple domains causes redundancy. This happens when the domain data of one unit is repurposed to cater to the business needs of another. Apart from hurting resource utilization, multi-domain duplication drives up management costs.
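One way to spot this kind of redundancy is to fingerprint the content of each data product and look for matches across domains. The sketch below is only an illustration under simplifying assumptions: the product names, the in-memory rows, and the `fingerprint` and `find_duplicates` helpers are all hypothetical, and a real mesh would sample or hash data inside the platform rather than load it into Python.

```python
import hashlib
from collections import defaultdict


def fingerprint(rows):
    """Return a hash of a dataset that ignores row order and key order."""
    canonical = sorted(repr(sorted(row.items())) for row in rows)
    return hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()


def find_duplicates(products):
    """Group product names whose content is identical; keep groups of 2+."""
    groups = defaultdict(list)
    for name, rows in products.items():
        groups[fingerprint(rows)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical data products owned by three different domains.
products = {
    "sales.customers": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}],
    "marketing.contacts": [{"name": "Lin", "id": 2}, {"id": 1, "name": "Ada"}],
    "finance.invoices": [{"id": 10, "total": 99.5}],
}

duplicates = find_duplicates(products)
print(duplicates)
```

Here the sales and marketing products hold the same records in a different order, so they are reported as one duplicate group that the two domains could consolidate.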

A tightly coupled data pipeline can resolve this, but it brings a risk of its own. Data modifications at the application layer can propagate directly as errors into the data lakes and from there into the reports that engineers build. Troubleshooting these issues also consumes time.

These errors often produce friction between the data and the engineering teams.
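A lightweight guard against such errors is to validate records against an agreed schema before they reach the lake, so that an application-layer change is caught at the boundary. The following is a minimal sketch, assuming a contract expressed as a plain mapping of field names to Python types; `ORDER_CONTRACT` and `contract_violations` are hypothetical names and not part of any specific product.

```python
def contract_violations(contract, record):
    """List the ways a record breaks the agreed field/type contract."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            actual = type(record[field]).__name__
            problems.append(
                f"{field}: expected {expected_type.__name__}, got {actual}"
            )
    return problems


# Hypothetical contract agreed between the application and data teams.
ORDER_CONTRACT = {"order_id": int, "amount": float, "currency": str}

good = {"order_id": 7, "amount": 19.99, "currency": "EUR"}
# Simulates an application change: the id became a string, currency was dropped.
bad = {"order_id": "7", "amount": 19.99}

print(contract_violations(ORDER_CONTRACT, good))
print(contract_violations(ORDER_CONTRACT, bad))
```

Rejecting or quarantining records with a non-empty violation list gives both teams a concrete, shared artifact to discuss instead of a broken report.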

Moreover, different domains have different QA standards and may have their own governance policies. This is an additional factor to consider when working with the decentralized model of a mesh.

Enterprise scalability comes with the challenge of growing data volumes, which in turn impacts the mesh architecture. When the domain structure of the enterprise or the type of data within each domain changes, mesh platforms often struggle to keep up.

Therefore, data mesh architecture designs should address future scalability. Adding new domains is straightforward in most products, but each addition increases the workload and can affect the mesh’s overall performance.

An optimal solution here would be to strategically retire data products from the system. As of now, removal happens infrequently and is costly, manual, and difficult. It is also important to examine the dependencies of every data product before removal: if another data product consumes the one being eliminated, removing it can cause a major failure.

Thus, it is imperative to notify the users about the data removal strategy or any other changes.
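The dependency check before removal can be sketched as a walk over a dependency graph. This is an illustrative example only: the `dependencies` and `owners` mappings, the product names, and the `consumers_of` and `plan_removal` helpers are assumptions made for the sketch, and a real platform would read this lineage from its catalog.

```python
def consumers_of(dependencies, product):
    """Return every product that directly or transitively consumes `product`.

    `dependencies` maps each product to the set of products it reads from.
    """
    found = set()
    stack = [product]
    while stack:
        current = stack.pop()
        for name, upstream in dependencies.items():
            if current in upstream and name not in found:
                found.add(name)
                stack.append(name)
    return found


def plan_removal(dependencies, owners, product):
    """Say whether removal is safe and which owning domains must be notified."""
    affected = consumers_of(dependencies, product)
    if affected:
        return {"safe": False, "notify": sorted({owners[p] for p in affected})}
    return {"safe": True, "notify": []}


# Hypothetical lineage: forecast reads revenue, which reads orders.
dependencies = {
    "orders": set(),
    "revenue": {"orders"},
    "forecast": {"revenue"},
    "inventory": set(),
}
owners = {
    "orders": "sales",
    "revenue": "finance",
    "forecast": "planning",
    "inventory": "logistics",
}

print(plan_removal(dependencies, owners, "orders"))
print(plan_removal(dependencies, owners, "inventory"))
```

Removing `orders` is flagged as unsafe because two downstream products depend on it, and the result names the domains to notify first, while `inventory` has no consumers and can be retired directly.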

Ultimately, the transition from the conventional centralized approach to a more democratic, domain-governed data mesh demands careful execution that leaves little room for error.

Now, like all new technologies, data mesh too suffers from a shortage of skilled professionals. With total ownership of the data resting with the respective domains, the already burdened teams hardly have the bandwidth to train new hires. Domain units should build a dedicated team so that their core work is unaffected.

That said, a lack of professional skills is an issue across disciplines in the data management landscape. Gartner asserts that 80% of data science projects would fail due to a shortage of adequate skills.

Apart from in-house learning and development initiatives, data management leaders should implement a smarter data product that fills the gaps. A user-friendly product would not only reduce the dependence on professionals for minor tasks but would also help beginners train themselves.

An appropriate data product platform not only adheres to decentralization but also addresses all the inconsistencies discussed above. Furthermore, it captures data from varied sources and delivers the data products, providing a holistic and real-time view of the analytical and operational workloads.

It also prepares the semantic definition of all data products. Next, it lays down the governance policies and sets the data ingestion methods so that the data is prepared and secured in line with regulations.

Existing data and analytical products should also scale up to support mesh architecture, because building a data management landscape from scratch just to support mesh attracts unforeseen costs. Mesh architecture covers all modules in the data management landscape, such as integration, virtualization, preparation, masking, governance, orchestration, cataloguing, and others.

Next, a reliable data product platform should enable the secure distribution of data to a number of domains. Talend is a popular name in the data products segment. It offers strong integration across different system environments such as on-premise, cloud, and hybrid. Given its prior experience in building data management products, Talend offers broad data engineering capabilities and provides a wide range of connectors to diverse data sources.

K2View, for example, integrates data from multiple sources into different data products. Its data mesh platform strikes a balance by centralizing data modelling, governance, and cataloguing while decentralizing the ownership of data creation and sharing to the respective domains. It supports both analytical and operational workloads. K2View is already known for implementing micro-DB in its fabric product, which provides a single and accurate high-level view of all domains.

Informatica, which is one of the pioneering data management products, offers excellent integration and support for different architectures. It offers you the ability to scale to complex integration scenarios.

An ideal data product platform should deliver a real-time Infrastructure-as-a-Service (IaaS) solution, thereby supporting multi-dimensional data automation and abstraction.

There is no single answer to what makes the perfect data mesh platform. The degree of decentralization, authorization, data ownership, and feeding depends on the business requirement. Furthermore, the data mesh platform will behave differently across sectors such as BFSI, healthcare, and supply chain.

Thus, mesh developers should build in full-stack customization to fit business-level requirements. That is where a data product platform would address the insufficiencies.
