By now, you’ve probably heard that bias in AI systems is a huge problem—whether it’s facial recognition failing more often on Black people or AI image generators like DALL-E reproducing racist and sexist stereotypes. But what does algorithmic bias actually look like in the real world, and what causes it to manifest?
A new tool designed by an AI ethics researcher attempts to answer that question by letting anyone query a popular text-to-image system, allowing people to see for themselves how certain word combinations produce biased results.
Hosted on HuggingFace, a popular GitHub-like platform for machine learning projects, the tool was launched this week following the release of Stable Diffusion, an AI model that generates images from text prompts. Called the Stable Diffusion Bias Explorer, the project is one of the first interactive demonstrations of its kind, letting users combine different descriptive terms and see firsthand how the AI model maps them to racial and gender stereotypes.
“When Stable Diffusion got put up on HuggingFace about a month ago, we were like oh, crap,” Sasha Luccioni, a research scientist at HuggingFace who spearheaded the project, told Motherboard. “There weren’t any existing text-to-image bias detection methods, [so] we started playing around with Stable Diffusion and trying to figure out what it represents and what are the latent, subconscious representations that it has.”
To do this, Luccioni came up with a list of 20 descriptive word pairings. Half of them were typically feminine-coded words, like “gentle” and “supportive,” while the other half were masculine-coded, like “assertive” and “decisive.” The tool then lets users combine these descriptors with a list of 150 professions—everything from “pilot” to “CEO” and “cashier.”
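For readers curious how this kind of prompt sweep works under the hood, the sketch below shows one way to pair adjectives with professions and feed each combination to Stable Diffusion using the open-source diffusers library. The word lists, prompt template, and model checkpoint here are illustrative assumptions, not the Bias Explorer's actual configuration.

```python
# Minimal sketch: sweep adjective + profession prompts through Stable Diffusion.
# Word lists and checkpoint are illustrative, not the Bias Explorer's own setup.
import torch
from diffusers import StableDiffusionPipeline

# Assumes a CUDA-capable GPU; drop the dtype/.to("cuda") to run (slowly) on CPU.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

adjectives = ["supportive", "compassionate", "assertive", "decisive"]
professions = ["CEO", "pilot", "cashier"]

for profession in professions:
    for adjective in [""] + adjectives:
        # Build prompts like "a photo of a CEO" or "a photo of a supportive CEO".
        prompt = f"a photo of a {adjective} {profession}".replace("  ", " ")
        image = pipe(prompt).images[0]
        image.save(f"{adjective or 'plain'}_{profession}.png".replace(" ", "_"))
```

Laying the resulting images out in a grid, one row per profession and one column per descriptor, is essentially what the Explorer's interface does for you.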
The results show stark differences in what kinds of faces the model generates depending on which descriptors are used. For example, the prompt "CEO" almost exclusively generates images of men, but the model is more likely to generate women when the accompanying adjectives are terms like "supportive" and "compassionate." Conversely, adding descriptors like "ambitious" and "assertive" across many job categories makes it far more likely that the model will generate pictures of men.
“You really see patterns emerge in terms of what the generations look like,” said Luccioni. “You can really compare ‘pilot’ versus ‘assertive pilot’ and really see the difference between the two.”
The issue of bias in image generation systems has become increasingly urgent in recent months, as tools like DALL-E, Midjourney, and Stable Diffusion hit mainstream levels of hype. Last month, stock image provider Shutterstock announced it would team up with DALL-E creator OpenAI to begin allowing the sale of stock images generated by AI systems, while Getty Images banned AI-generated images, citing copyright concerns. Meanwhile, some artists have spoken out against the AI image tools, which are trained on massive amounts of artistic images scraped from the web without credit or permission from their creators.
Tools like the Stable Diffusion Bias Explorer are a reaction to the increasing complexity of these black-box AI systems, which has made it virtually impossible for scientists to understand how they work beyond looking at what goes into and comes out of them. While it's impossible to fully eliminate human bias from human-made tools, Luccioni believes that tools like hers can at least give regular people an understanding of how bias manifests in AI systems. It could also help researchers reverse-engineer the models' biases by revealing how different words and concepts are correlated with one another.
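One way researchers might probe those correlations, sketched below, is to compare text embeddings from CLIP, the same encoder family that Stable Diffusion conditions its images on. This is a hedged illustration of the general idea, not the method the Bias Explorer itself uses, and the specific phrases are made up for the example.

```python
# Illustrative probe: how close do adjective + profession phrases sit to
# gendered phrases in CLIP's text embedding space? Not the Explorer's method.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

phrases = ["a CEO", "a supportive CEO", "an assertive CEO", "a man", "a woman"]
inputs = tokenizer(phrases, padding=True, return_tensors="pt")

with torch.no_grad():
    embeddings = model.get_text_features(**inputs)
embeddings = embeddings / embeddings.norm(dim=-1, keepdim=True)

# Cosine similarity of each profession phrase against "a man" and "a woman".
similarity = embeddings @ embeddings.T
for i, phrase in enumerate(phrases[:3]):
    print(phrase, "-> man:", round(similarity[i, 3].item(), 3),
          "woman:", round(similarity[i, 4].item(), 3))
```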
“The way that machine learning has been shaped in the past decade has been so computer-science focused, and there’s so much more to it than that,” said Luccioni. “Making tools like this where people can just click around, I think it will help people understand how this works.”