As the user base across devices and applications has grown in recent years, we have seen a huge surge not just in the volume of data we collect but also in the number and variety of its sources. The pandemic has accelerated this trend, and high-quality, consistent data has become mission-critical for business and data leaders who want to drive business outcomes.
If you are part of a data team in any capacity, be it data engineer, data scientist, data product manager or data analyst, you will have heard of common data governance issues, which vary with the type of data you work with and its primary user groups.
A data lake or data warehouse that hosts data for consumer and producer user groups without proper governance in place very quickly leads to chaotic operations and unplanned emergencies. So controls are necessary for data: its content, structure, use, privacy and security. Every organization needs these controls at different levels, depending on the complexity of its datasets, the types of data it handles, and the regulatory requirements around both the data and the way producers and consumers use it.
Data governance as a concept covers all these aspects. Traditionally, it has been an operations function limited to defining the specification of decision rights and an accountability framework to ensure appropriate behaviour in the valuation, creation, consumption and control of data and analytics. However, legacy policies and the formal meetings used to enforce them will not deliver continuous data governance. Lately, data governance has come to be viewed as a bureaucratic way to control data, one that impedes its use and undermines a data-driven decision-making culture. Instead of helping to democratize data, it is seen as a blocking function. Still, there is no doubt that we need data governance to reduce the risk of non-compliance, cut costs through reusability, improve productivity and, in general, give data consumers confidence in their decisions.
So, if we want to treat data as a strategic asset, modern data governance should keep technology at the centre and help drive people, processes and tools so that organizations can formally manage the reliability, accountability, usability, trust and compliance of data to support business objectives, with as much automation and self-serve capability as possible.
Recently, DataOps has also emerged as a concept to help us move in this direction and reimagine data governance by bringing together data engineers, analysts, operations, data scientists, data stewards and business teams. Gartner defines DataOps as “a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization”.
Good data governance has a few primary objectives. The following outlines the data governance framework we currently use:
Accountability involves defining the owners, governance stewards, subject matter or functional experts, and support teams for the full lifecycle of all data.
Usability spans data discovery, profiling, business-friendly definitions, and the flexibility for users to bring in their own tools, along with compute and a menu of other provisioning services as part of the tech stack.
All of this functionality should ideally be part of the data platform's governance and cataloguing, both to support the transparency and accountability of the platform and to keep its tools and processes maintained and up to date, making governance a collective responsibility of all teams involved.
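To make the cataloguing idea more concrete, here is a minimal Python sketch of what a catalogue record carrying ownership and usability metadata might look like. The `CatalogEntry` class, its field names, the `missing_governance_fields` helper and the sample dataset are all hypothetical and chosen purely for illustration; they do not describe any particular catalogue tool or the framework referred to above.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalogue record for one dataset (field names are illustrative)."""
    name: str
    business_definition: str = ""   # business-friendly description for discovery
    owner: str = ""                 # team or person accountable for the data
    steward: str = ""               # governance steward
    subject_matter_experts: list[str] = field(default_factory=list)
    support_team: str = ""          # who to contact when something breaks


def missing_governance_fields(entry: CatalogEntry) -> list[str]:
    """Return the names of governance fields that are still empty."""
    required = [
        "business_definition",
        "owner",
        "steward",
        "subject_matter_experts",
        "support_team",
    ]
    return [name for name in required if not getattr(entry, name)]


# Example: a dataset registered with an owner and steward but no experts or support team.
orders = CatalogEntry(
    name="sales.orders",
    business_definition="One row per confirmed customer order.",
    owner="sales-data-team",
    steward="jane.doe",
)
gaps = missing_governance_fields(orders)
print(gaps)  # ['subject_matter_experts', 'support_team']
```

A check like this could run automatically whenever a dataset is registered, which is one way to move governance from formal review meetings towards the automated, self-serve model described earlier.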