
How to easily check if your ML model is fair?


We live in a world that is getting more divided each day. In some parts of the world, the differences and inequalities between races, ethnicities, and sometimes sexes are worsening. The data we use for modeling is, in large part, a reflection of the world it comes from, and that world can be biased, so the data, and therefore the model, will likely reflect that. We propose a way in which ML engineers can easily check whether their model is biased.

To showcase the abilities of the module, we will be using the well-known German Credit Data dataset to assign a risk level to each credit-seeker. For this simple task, an interpretable decision tree classifier will do.
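The snippet below is a minimal sketch of that setup. It assumes the module in question is the fairness module of the Python dalex package, which bundles a copy of the German Credit Data, plus scikit-learn; the loader name and the risk and sex column names are assumptions about that copy of the dataset.

```python
import dalex as dx
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# German Credit Data bundled with dalex (loader name and column names are assumptions)
data = dx.datasets.load_german()
X = data.drop(columns="risk")   # features
y = data.risk                   # binary risk target (assumed)

# one-hot encode the categorical columns so the tree can consume them
categorical = X.select_dtypes(include="object").columns
preprocess = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical),
    remainder="passthrough",
)

clf = make_pipeline(preprocess, DecisionTreeClassifier(max_depth=7, random_state=123))
clf.fit(X, y)

# wrap the fitted model in an Explainer; fairness metrics will be computed from it later
exp = dx.Explainer(clf, X, y)
```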

Once we have the explainer, we need to execute the fairness method so that it can calculate all the necessary metrics among the subgroups defined by the protected vector, which is an array or a list with the sensitive attribute. Apart from that, we need to point out which subgroup is the most privileged, which is done through the privileged parameter.
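Continuing the sketch (still assuming the dalex API), the call could look as follows; using sex as the protected attribute and "male" as the privileged subgroup is an illustrative choice:

```python
# sensitive attribute used to split the data into subgroups (illustrative choice)
protected = X.sex

# compute group fairness metrics, treating "male" as the privileged subgroup
fobject = exp.model_fairness(protected=protected, privileged="male")
```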

This object has many attributes and we will not go through each and every one of them. A more detailed overview can be found in this vignette. Instead, we will focus on one method and two plots.

So, is the model fair? The question sounds simple, but because of the nature of bias, the response will be: it depends. This method, however, measures bias from different perspectives, so that no biased model slips through. To check fairness, one has to use the fairness_check() method.
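A sketch of that call under the same assumptions, with the threshold left at what is assumed to be its default of 0.8:

```python
# prints, for each fairness metric, whether the subgroup-to-privileged ratio
# stays within the acceptable range, and concludes whether bias was detected
fobject.fairness_check(epsilon=0.8)  # 0.8 is the assumed default threshold
```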

So our model is not fair!

How is this decision made? The answer is tricky, but the adopted way of judging fairness seems to be the best one so far. Generally, the score of each subgroup should be close to the score of the privileged subgroup. To put it in a more mathematical perspective, the ratios between the scores of the unprivileged and privileged subgroups should be close to 1; the closer they are, the fairer the model. To relax this criterion a little, it can be written more formally:
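A sketch of that criterion, writing metric_i for the score of subgroup i, metric_privileged for the score of the privileged subgroup, and ε for the bound on acceptable ratios:

$$\forall_{i \in \{a, b, \ldots, z\}} \qquad \varepsilon < \frac{\mathrm{metric}_i}{\mathrm{metric}_{\mathrm{privileged}}} < \frac{1}{\varepsilon}$$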

Here ε is a value between 0 and 1 that serves as the minimum acceptable value of the ratio. By default it is 0.8, which adheres to the four-fifths rule (80% rule) often looked at in hiring. To go deeper into this topic, I encourage you to check my previous blog post concerning the R package fairmodels.

For now, there are two types of plots available.

The desired type just needs to be passed to the type parameter of the plot method, as sketched below.
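Assuming the type names used by the dalex fairness module (fairness_check as the default and metric_scores), the two plots could be requested like this:

```python
# default plot: ratios of each metric checked against the epsilon bounds
fobject.plot()

# raw metric scores per subgroup (type name is an assumption)
fobject.plot(type="metric_scores")
```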

Multiple models can be put into one plot so that they can easily be compared with each other. Let's add some models and visualize them together, as sketched below.
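A sketch of such a comparison, reusing the preprocessing pipeline and protected vector from above; the extra model choices (a random forest and a logistic regression) are illustrative, and passing them through the objects argument is an assumption about the plot method:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# two additional models, purely for illustration
clf_forest = make_pipeline(preprocess, RandomForestClassifier(random_state=123)).fit(X, y)
clf_logreg = make_pipeline(preprocess, LogisticRegression(max_iter=1000)).fit(X, y)

fobject_forest = dx.Explainer(clf_forest, X, y, verbose=False).model_fairness(
    protected=protected, privileged="male"
)
fobject_logreg = dx.Explainer(clf_logreg, X, y, verbose=False).model_fairness(
    protected=protected, privileged="male"
)

# draw all three fairness objects on one plot for a side-by-side comparison
fobject.plot(objects=[fobject_forest, fobject_logreg])
```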

Every plot is interactive, made with the plotly visualization package. I highly suggest checking it out in this vignette.

The fairness module in dalex is a unified and accessible way to ensure that models are fair. New plot types and bias-mitigation methods will be added in future versions of the package. There is a long-term plan to add support for individual fairness and fairness in regression. Be sure to check it out. You can install the package with:
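Assuming the package is the dalex distribution on PyPI:

```
pip install dalex -U
```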

If you want to learn more about fairness I really recommend:
