How to use data governance for AI/ML systems


Data governance ensures that data is available, consistent, usable, trusted and secure. It is a concept that organizations struggle with, and the stakes rise when big data and systems like artificial intelligence and machine learning (AI/ML) enter the picture. Organizations quickly realize that AI/ML systems function differently from traditional, fixed-record systems.

With AI/ML, the objective isn’t to return a value or a status for a single transaction. Rather, an AI/ML system sifts through petabytes of data seeking answers to a query or an algorithm that may seem somewhat open-ended. Data is processed in parallel, with multiple threads of data fed into the processor simultaneously. Because such vast amounts of data are processed simultaneously and asynchronously, IT can weed out unneeded data in advance to speed processing.

This data can come from many different internal and external sources. Each source has its own way of collecting, curating and storing data — and it may or may not conform to your own organization’s governance standards. Then there are the recommendations of the AI itself. Do you trust them? These are just some of the questions that companies and their auditors face as they focus on data governance for AI/ML and look for tools that can help them.

If you’re integrating data from internal and external transactional systems, the data should be standardized so that it can be blended with data from other sources. Many systems facilitate this with prebuilt application programming interfaces (APIs) that let them exchange data with other systems. If no APIs are available, you can use extract, transform, load (ETL) tools, which pull data from one system and convert it into a format that another system can read.
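As a rough illustration of the standardization step, the sketch below maps records from two sources onto one common schema. The field names, mappings and timestamp formats are hypothetical, invented only for this example, and a real ETL tool would handle far more cases (missing fields, bad values, time zones).

```python
from datetime import datetime, timezone

# Hypothetical field mappings: source field name -> common field name.
INTERNAL_MAPPING = {"cust_id": "customer_id", "amt": "amount", "ts": "timestamp"}
EXTERNAL_MAPPING = {"CustomerID": "customer_id", "Total": "amount", "Date": "timestamp"}


def to_common_schema(record, mapping, timestamp_format):
    """Rename fields and normalize types so records from different sources can be blended."""
    renamed = {common: record[source] for source, common in mapping.items()}
    parsed = datetime.strptime(renamed["timestamp"], timestamp_format)
    return {
        "customer_id": str(renamed["customer_id"]).strip(),
        "amount": round(float(renamed["amount"]), 2),
        # Assumption: source timestamps are already in UTC; store them as ISO 8601.
        "timestamp": parsed.replace(tzinfo=timezone.utc).isoformat(),
    }


# One record from an internal system and one from an external source.
internal = {"cust_id": 1042, "amt": "19.990", "ts": "2023-03-01 14:30:00"}
external = {"CustomerID": " 1042 ", "Total": 5, "Date": "01/03/2023"}

blended = [
    to_common_schema(internal, INTERNAL_MAPPING, "%Y-%m-%d %H:%M:%S"),
    to_common_schema(external, EXTERNAL_MAPPING, "%d/%m/%Y"),
]

for row in blended:
    print(row)
```

After the transform, both records share the same field names, types and timestamp format, which is what allows them to be combined in a single data set.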

If you’re adding unstructured data such as photographic, video and sound objects, there are object-linking tools that can link and relate these objects to each other. A good example of an object linker is a geographic information system (GIS), which combines photographs, schematics and other types of data to deliver a full geographic context for a particular setting.

We often think of usable data as data that users can access, but it’s more than that. If the data you retain has lost its value because it is obsolete, it should be purged. IT and end business users have to agree on when data should be purged, and that agreement should be documented in data retention policies.

There are also other occasions when AI/ML data should be purged. One is when the data model for an AI system is changed and the existing data no longer fits the model.

In an AI/ML governance audit, examiners will expect to see written policies and procedures for both types of data purges. They will also check to see that your data purge practices are in compliance with industry standards. There are many data purge tools and utilities in the market.
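The two kinds of purge described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the three-year retention period, the `MODEL_FIELDS` set and the record layout are all hypothetical, and a real policy would come from the agreement between IT and the business.

```python
from datetime import date, timedelta

RETENTION_DAYS = 365 * 3  # hypothetical policy: keep data for three years
MODEL_FIELDS = {"customer_id", "amount", "region"}  # hypothetical fields the current model needs


def split_for_purge(records, today):
    """Return (keep, purge); purge holds (record, reason) pairs for the audit trail."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    keep, purge = [], []
    for record in records:
        if record["collected_on"] < cutoff:
            # Purge type 1: the data is older than the retention policy allows.
            purge.append((record, "retention period exceeded"))
        elif not MODEL_FIELDS.issubset(record):
            # Purge type 2: the data no longer fits the current model.
            purge.append((record, "does not fit current model"))
        else:
            keep.append(record)
    return keep, purge


records = [
    {"customer_id": "1", "amount": 10.0, "region": "EU", "collected_on": date(2020, 6, 1)},
    {"customer_id": "2", "amount": 20.0, "collected_on": date(2023, 5, 1)},
    {"customer_id": "3", "amount": 30.0, "region": "US", "collected_on": date(2023, 6, 1)},
]

keep, purge = split_for_purge(records, today=date(2024, 1, 1))
for record, reason in purge:
    print(record["customer_id"], reason)
```

Recording a reason alongside each purged record is a design choice made here so that the output can support the written procedures an examiner would ask to see.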

Circumstances change: An AI/ML system that once worked quite efficiently may begin to lose effectiveness. How do you know this? By regularly checking AI/ML results against past performance and against what is happening in the world around you. If the accuracy of your AI/ML system is drifting away from you, you have to fix it.

The Amazon hiring model is a great example. Amazon’s AI system concluded that it was best to hire male job applicants because the system was looking at past hiring practices, and most of those hired had been men. Going forward, the model failed to adjust to a greater number of highly qualified female applicants. The AI/ML system had drifted away from the truth and had instead begun to introduce hiring bias. From a regulatory standpoint, the AI was out of compliance.

Amazon ultimately decommissioned the system, but companies can avoid these errors if they regularly monitor system performance, check it against past performance and compare it with what is going on in the outside world. If the AI/ML model is out of sync, it can be adjusted.

There are AI/ML tools that data scientists use to measure model drift, but the most direct way for business professionals to check for drift is to cross-compare AI/ML system performance with historical performance. For instance, if you suddenly find weather forecasts to be 30% less accurate, it’s time to check the data and the algorithms that your AI/ML system is running.
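The cross-comparison described above can be approximated with a simple check of current accuracy against a historical baseline. The sketch below is an illustration only: the baseline value, the 10% review threshold and the sample forecasts are made up, and it is not a substitute for the drift-measurement tools that data scientists use.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that matched what actually happened."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("predictions and actuals must be non-empty and the same length")
    hits = sum(1 for p, a in zip(predictions, actuals) if p == a)
    return hits / len(predictions)


def drift_report(baseline_accuracy, predictions, actuals, max_relative_drop=0.10):
    """Compare current accuracy with a historical baseline and flag large drops."""
    current = accuracy(predictions, actuals)
    relative_drop = (baseline_accuracy - current) / baseline_accuracy
    return {
        "current_accuracy": current,
        "relative_drop": relative_drop,
        # Hypothetical rule: review the model if accuracy fell by more than the threshold.
        "needs_review": relative_drop > max_relative_drop,
    }


# Hypothetical recent weather forecasts versus what actually happened.
predictions = ["rain", "sun", "sun", "rain", "sun", "rain", "sun", "sun", "rain", "sun"]
actuals = ["rain", "sun", "rain", "rain", "rain", "rain", "sun", "rain", "sun", "sun"]

report = drift_report(baseline_accuracy=0.90, predictions=predictions, actuals=actuals)
print(report)
```

Here six of the ten forecasts were correct, so accuracy has fallen by about a third relative to the assumed 90% baseline, and the report flags the model for review.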


The Data Daily
