Text-to-image generators such as DALL-E 2 have been pitched as making stock photos obsolete. Specifically, generating cover images for news articles is one of the use cases envisioned for these models. One downside of this application is that, like most AI models, these tools perpetuate the biases and stereotypes in their training data.
But do these models also perpetuate stereotypes about AI, rather than people? After all, stock images are notorious for misleading imagery such as humanoid robots that propagate AI hype.
We tested Stable Diffusion, a recently released text-to-image tool, and found that over 90% of the images generated for AI-related prompts propagate AI hype. Journalists, marketers, artists, and others who use these tools should exercise caution to avoid a feedback loop of hype around AI.
Stock photos of all kinds, not just of AI, are some of the most ridiculed images on the internet.
News articles on AI tend to use stock photos that include humanoid robots, blue backgrounds with floating letters and numbers, and robotic arms shaking hands with humans. This imagery obscures the fact that AI is a tool and doesn’t have agency. Most of the articles in question are about finding patterns in data, not about robots.
A news article’s cover image sets the tone and provides a visual metaphor. Especially in articles on AI, images are used as a scaffold, since most readers do not have a deep understanding of how AI systems work.
Images that do not reflect the content of the article are at best misleading and at worst lead to hype and fear about the technology. For example, a study that surveyed the U.K. public found that a majority couldn’t give a plausible definition of AI and 25% defined it as robots—no doubt in part because of how AI is represented in the media.
Accurately representing emerging technologies such as artificial intelligence is hard. In fact, misleading images of AI have become so widespread that there are several projects to improve AI imagery.
Text-to-image tools such as DALL-E 2, Imagen, Midjourney, and Stable Diffusion have sparked people’s imagination, but also generated discussion about economically productive uses. One potential use is to generate cover images for news articles. A few articles have already used them, and wider use is likely as the tools become open-source and freely available.
Given this prospect, do AI-based text-to-image tools create misleading images of AI, or do they generate images that accurately depict how AI is used today?
We tested one of these systems—Stable Diffusion—with four prompts about AI: “artificial intelligence”, “artificial intelligence in healthcare”, “artificial intelligence in science”, and “cover image for a news article on artificial intelligence”.
For each prompt, we generated 20 images and checked whether they contained humanoid forms or robots. We included the healthcare and science prompts to see whether the resulting images contained domain-specific elements, such as doctors or scientists using AI tools.
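The article does not specify how the images were generated beyond naming the tool and the prompts. Below is a minimal sketch of how one could reproduce the setup, assuming the Hugging Face diffusers library and the CompVis/stable-diffusion-v1-4 checkpoint with default sampler settings; the file naming and the manual-labeling step are our own choices, not part of the original experiment.

```python
# Minimal reproduction sketch (assumptions: diffusers library,
# CompVis/stable-diffusion-v1-4 checkpoint, default sampler, a CUDA GPU).
import torch
from diffusers import StableDiffusionPipeline

prompts = [
    "artificial intelligence",
    "artificial intelligence in healthcare",
    "artificial intelligence in science",
    "cover image for a news article on artificial intelligence",
]

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

# Generate 20 images per prompt and save them for manual labeling
# (does the image contain a humanoid or robot? any domain-specific elements?).
for prompt in prompts:
    for i in range(20):
        image = pipe(prompt).images[0]
        image.save(f"{prompt.replace(' ', '_')}_{i:02d}.png")
```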
The results are stark: for each prompt, over 90% of the images generated using the text-to-image tool contain humanoids. Adding an application domain to the prompt does not improve the outcomes much—only 1 out of 20 images for both the healthcare and science prompts contains anything even remotely related to those domains, and 19 of the 20 images still contain humanoids.
Surprisingly, the results for “artificial intelligence” on the stock image website Shutterstock contain humanoids or robots in only 11 out of the first 20 results; this hints that text-to-image models could amplify stereotypes in images of AI.
We include all 20 images for each prompt here; you can also try out the tool yourself.
Using carefully chosen prompts for AI-based text-to-image tools is pivotal to generating images that reflect the user’s intent. In fact, it is so important that there is now a marketplace for buying prompts to create better images.
Fortunately, changing the prompt is also an easy way to generate images of AI that are not misleading or inaccurate. Using more descriptive prompts about the contents of the image rather than the concepts in the image can result in more appropriate images. For example, instead of using “artificial intelligence in healthcare” as a prompt, you can use a descriptive prompt such as “a photograph of a doctor looking at a medical scan on a computer” to get images where humans are rightly depicted as agents, rather than humanoid robots. If it is necessary to visually emphasize the technology itself, prompts such as “silicon chip” tend to work well, as does “self-driving car” if it is appropriate to the content of the article.
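As a small illustration of rewriting concept-level prompts into content-level prompts, here is a sketch that pairs each AI prompt with a descriptive alternative. Only the healthcare rewrite comes from the article; the other rewrites, and the file names, are our own hypothetical examples.

```python
# Hypothetical concept-to-content prompt rewrites; only the healthcare
# rewrite is taken from the article, the others are illustrative examples.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4"
).to("cuda")  # same pipeline as in the earlier sketch

prompt_rewrites = {
    "artificial intelligence in healthcare":
        "a photograph of a doctor looking at a medical scan on a computer",
    "artificial intelligence in science":
        "a photograph of a scientist analyzing charts on a laptop in a lab",
    "artificial intelligence":
        "a close-up photograph of a silicon chip on a circuit board",
}

for concept, content in prompt_rewrites.items():
    image = pipe(content).images[0]
    image.save(f"rewrite_{concept.replace(' ', '_')}.png")
```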
Surprisingly, if we use “machine learning” instead of “artificial intelligence” as a prompt, robots disappear from the resulting images, though most of the images are unusable because they are filled with illegible characters. Even though recent AI advances rely on some form of machine learning, the resulting images for the two prompts are starkly different, showing how much the prompt matters.
The heavy use of stock images in the training data for text-to-image tools could be one reason for the abundance of humanoid robots in images of AI. The training data for these tools includes images from stock photo repositories—so much so that the tools often end up creating images with stock photo watermarks still intact!
Given the abundance of humanoids in stock images, it is unsurprising that the tools trained using these images generate humanoids and robots in response to prompts about AI.
Stock photos are not just problematic when they are about AI. The very point of a stock image is to quickly bring a specific concept or category to mind—so they often focus on stereotypes and can be derogatory and biased. Using stock images for training text-to-image tools could be one source of misleading or biased images that result from these tools—and not just in articles about AI.
Images used to illustrate AI articles today are already misleading and inaccurate. If news media turn to text-to-image tools to generate illustrations, they risk creating a pernicious feedback loop in which AI tools are used to generate misleading images of AI that feed into the hype—and are then used as training data for future models.
Similar issues have been found in language models, where training datasets are contaminated by machine-generated text. Stable Diffusion does apply an invisible watermark, which could potentially be used to filter out machine-generated images. However, the rapid increase in the number of text-to-image tools and the lack of standardized watermarks could complicate this filtering step. As we suggested above, an alternative way to improve these tools could be to reduce the number of stock images in the training data.
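To make the filtering idea concrete, here is a hedged sketch of what such a check might look like. It assumes the watermark written by Stable Diffusion's reference txt2img script (the string "StableDiffusionV1", embedded with the invisible-watermark package's 'dwtDct' method) and would miss images produced by front ends that skip the watermarking step; the function name and the exact payload check are our own.

```python
# Sketch of filtering out watermarked Stable Diffusion outputs before adding
# images to a training corpus. Assumes the reference txt2img script's payload
# ("StableDiffusionV1", 17 bytes = 136 bits) and imwatermark's 'dwtDct' method;
# images generated through other front ends may carry no watermark at all.
import cv2
from imwatermark import WatermarkDecoder

def looks_machine_generated(path: str) -> bool:
    bgr = cv2.imread(path)
    if bgr is None:
        return False  # unreadable file; treat as "watermark not detected"
    decoder = WatermarkDecoder('bytes', 136)
    payload = decoder.decode(bgr, 'dwtDct')
    return bytes(payload) == b"StableDiffusionV1"

# Example: keep only images where the watermark is not detected.
# clean_corpus = [p for p in image_paths if not looks_machine_generated(p)]
```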
Finally, when writing prompts, describing the contents of the image instead of the concepts in the image and experimenting with different prompts to pick out accurate examples will go a long way.