What's your favorite ice cream flavor? You might say vanilla or chocolate, and if I asked why, you’d probably say it’s because it tastes good. But why does it taste good, and why do you still want to try other flavors sometimes? Rarely do we ever question the basic decisions we make in our everyday lives, but if we did, we might realize that we can’t pinpoint the exact reasons for our preferences, emotions, and desires at any given moment.
There's a similar problem in artificial intelligence: The people who develop AI increasingly struggle to explain how it works and why it produces the outputs it does. Deep neural networks (DNNs), made up of layers and layers of processing systems trained on human-created data to mimic the neural networks of our brains, often seem to mirror not just human intelligence, but also human inexplicability.
Most AI systems are black box models, which are systems that are viewed only in terms of their inputs and outputs. Scientists do not attempt to decipher the “black box,” or the opaque processes that the system undertakes, as long as they receive the outputs they are looking for. For example, if I gave a black box AI model data about every single ice cream flavor, and demographic data about economic, social, and lifestyle factors for millions of people, it could probably guess what your favorite ice cream flavor is or where your favorite ice cream store is, even if it wasn’t programmed with that intention.
These types of AI systems notoriously have issues because the data they are trained on are often inherently biased, mimicking the racial and gender biases that exist within our society. Their haphazard deployment leads to situations where, to use just one example, Black people are disproportionately misidentified by facial recognition technology. It becomes difficult to fix these systems in part because their developers often cannot fully explain how they work, which also undermines accountability. As AI systems become more complex and humans become less able to understand them, AI experts and researchers are warning developers to take a step back and focus more on how and why a system produces certain results than on the fact that the system can accurately and rapidly produce them.
“If all we have is a ‘black box’ it is impossible to understand causes of failure and improve system safety,” Roman V. Yampolskiy, a professor of Computer Science at the University of Louisville, wrote in his paper titled “Unexplainability and Incomprehensibility of Artificial Intelligence.” “Additionally, if we grow accustomed to accepting AI’s answers without an explanation, essentially treating it as an Oracle system, we would not be able to tell if it begins providing wrong or manipulative answers.”
Black box models can be extremely powerful, which is how many scientists and companies justify sacrificing explainability for accuracy. AI systems have been used for autonomous cars, customer service chatbots, and diagnosing disease, and have the power to perform some tasks better than humans can. For example, a machine capable of remembering one trillion items, such as digits, letters, and words, can process and compute information far faster than humans, who on average hold about seven items in short-term memory. Deep learning models come in many varieties, from the systems behind text-to-image generators such as Midjourney to generative adversarial networks (GANs), which are often used to train generative AI models. GANs essentially pit two AI models against each other: a generator produces candidate outputs, such as images, and a discriminator tries to tell them apart from real examples. Each model improves in response to the other, round after round, until the generator becomes very good at its task. The issue is that this creates models that their developers simply can't explain.
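The adversarial setup can be sketched in a few lines of Python. This toy assumes the standard two-model formulation of a GAN, a generator and a discriminator. Everything in it is made up for illustration (the `REAL` numbers, the fixed `NOISE` values, the one-parameter generator, the learning rate), and it is vastly smaller than any real system.

```python
import math

# Hypothetical toy data: "real" examples are numbers near 4.
REAL = [3.5, 4.0, 4.5, 3.8, 4.2]
# Fixed noise values so the example is deterministic.
NOISE = [-0.5, 0.0, 0.5, -0.2, 0.2]

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fake_samples(shift):
    """Generator: turns noise into candidate outputs by adding a learned shift."""
    return [shift + z for z in NOISE]

def d_prob(x, w, c):
    """Discriminator: estimated probability that x is a real example."""
    return sigmoid(w * x + c)

def d_loss(w, c, shift):
    """Low when real examples score high and generated ones score low."""
    real = sum(-math.log(d_prob(x, w, c)) for x in REAL)
    fake = sum(-math.log(1.0 - d_prob(x, w, c)) for x in fake_samples(shift))
    return (real + fake) / len(REAL)

def g_loss(w, c, shift):
    """Low when the discriminator mistakes generated examples for real ones."""
    fakes = fake_samples(shift)
    return sum(-math.log(d_prob(x, w, c)) for x in fakes) / len(fakes)

def adversarial_round(w, c, shift, lr):
    """One gradient step for the discriminator, then one for the generator."""
    gw = gc = 0.0
    for x in REAL:                      # push scores for real examples up
        p = d_prob(x, w, c)
        gw -= (1.0 - p) * x
        gc -= (1.0 - p)
    for x in fake_samples(shift):       # push scores for generated ones down
        p = d_prob(x, w, c)
        gw += p * x
        gc += p
    n = len(REAL)
    w, c = w - lr * gw / n, c - lr * gc / n
    # The generator moves its shift in the direction that fools the
    # updated discriminator.
    fakes = fake_samples(shift)
    g_grad = sum(-(1.0 - d_prob(x, w, c)) * w for x in fakes) / len(fakes)
    return w, c, shift - lr * g_grad

w, c, shift = adversarial_round(0.0, 0.0, 0.0, lr=0.1)
```

Calling `adversarial_round` repeatedly nudges the generated numbers toward the real ones, although adversarial training of this kind is known to oscillate rather than settle neatly.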
“I think in a lot of cases, people look to these black box models as a response to lack of resources. It would be very convenient to have an automated system that can produce the kinds of outputs they're looking for from the kind of inputs they have,” Emily M. Bender, a Professor of Linguistics at the University of Washington, told Motherboard. “If you have a dataset consisting of such inputs and outputs, it's always possible to train a black box system that can produce outputs of the right type, but often much, much harder to evaluate whether they are correct. Furthermore, there are lots of cases where it's impossible to make a system where the outputs would be reliably correct because the inputs just don't contain enough information.”
When we put our trust in a system simply because it gives us answers that fit what we are looking for, we fail to ask key questions: Are these responses reliable, or do they just tell us what we want to hear? Whom do the results ultimately benefit? And who is responsible if the system causes harm?
“If business leaders and data scientists don’t understand why and how AI calculates the outputs it does, that creates potential risk for the business. A lack of explainability limits AI’s potential value, by inhibiting the development and trust in the AI tools that companies deploy,” Beena Ammanath, Executive Director of the Deloitte AI Institute, told Motherboard.
“The risks are that the system may be making decisions using values we disagree with, such as biased (e.g. racist or sexist) decisions. Another risk is that the system may be making a very bad decision, but we cannot intervene because we do not understand its reasoning,” Jeff Clune, an Associate Professor of Computer Science at the University of British Columbia, told Motherboard.
Bias is already deeply entrenched in AI systems, which constantly reproduce it in their output without developers understanding how. In a groundbreaking 2018 study called “Gender Shades,” researchers Joy Buolamwini and Timnit Gebru found that popular commercial facial analysis systems classified the gender of lighter-skinned men most accurately and made the most errors on darker-skinned women. Facial recognition systems, which are skewed against people of color and have been used for everything from housing to policing, deepen pre-existing racial biases by determining who is more likely to get a house or be identified as a criminal, for example. Predictive AI systems can also guess a person’s race based on X-rays and CT scans, but scientists have no idea why or how this is the case. Black and female patients are less likely to receive an accurate diagnosis from automated systems that analyze medical images, and we're not sure why. These are just a few examples of how treating AI-generated results as concrete data without understanding the system’s potential biases creates rippling societal consequences.
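Audits of this kind generally work by breaking a system's error rate down by group instead of reporting a single overall number. Here is a minimal Python sketch of that idea. The records are invented for illustration and are not data from any study, and the function names are hypothetical.

```python
from collections import defaultdict

def error_rate_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, predicted in records:
        totals[group] += 1
        if truth != predicted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}

def overall_error_rate(records):
    wrong = sum(1 for _, truth, predicted in records if truth != predicted)
    return wrong / len(records)

# Made-up records: (group, true label, label the system predicted).
records = [
    ("group_a", "yes", "yes"),
    ("group_a", "no", "no"),
    ("group_a", "yes", "yes"),
    ("group_a", "no", "no"),
    ("group_b", "yes", "no"),
    ("group_b", "no", "no"),
    ("group_b", "yes", "no"),
    ("group_b", "no", "no"),
]

overall = overall_error_rate(records)    # one number hides the gap
by_group = error_rate_by_group(records)  # the breakdown exposes it
```

In this invented example the overall error rate is 25 percent, but every one of the errors falls on `group_b`, whose error rate is 50 percent while `group_a` sees none. An audit like this shows that a disparity exists; it does not explain why the system produces it.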
“There are many tasks right now where black box approaches are far and away better than interpretable models,” Clune said. “The gap can be large enough that black box models are the only option, assuming that an application is not feasible unless the capabilities of the model are good enough. Even in cases where the interpretable models are more competitive, there is usually a tradeoff between capability and interpretability. People are working on closing that gap, but I suspect it will remain for the foreseeable future, and potentially will always exist.”
Though there is already a subset of AI known as Explainable AI (XAI), the general techniques it promotes are often reductive and fail to capture the true breadth of a system's processes, and AI developers have little incentive to adopt them. Part of the problem is that AI systems have become so complex that blanket explanations only widen the power differential between AI systems and their creators, and between AI developers and their users. In other words, adding explainability after an AI system is already in place is harder than designing for it from the start.
“Maybe the answer is to abandon the illusion of explanation, and instead focus on more rigorously testing the reliability, biases, and performance of models, as we try to do with humans,” Clune said.
In recent years there's been a small but real push by some in the industry to develop "white-box models," which are more transparent and whose outputs can be better explained. (It's worth mentioning that the white-box/black-box terminology is itself part of a long history of racially coded terms in science; researchers have pushed to change "blacklist" to "blocklist," for example.) White-box modeling nonetheless remains a relatively new branch of AI research that seeks to make AI more explainable.
AI researchers say that giving the people affected by a system a bigger role in its development is an important first step toward AI systems that are more transparent and that adequately represent user needs.
“A lot of the explanations that people treat as explanations really aren't. They are reductive, they are written to the interests of the developers and what the developers think are important to explain, rather than what the user needs,” Os Keyes, a PhD Candidate at the University of Washington's Department of Human Centered Design & Engineering, told Motherboard. “Arguably, I'd say there are two big changes to AI that would be necessary to change this state of affairs, and the first is that, ultimately, this is in part a problem of the massive gap in practice between developers and actual users.”
“Broader participation in not just building the system, but also asking, what questions are interesting, what things need to be possible for this to really be explainable,” Keyes added. “That would make a massive difference.”
Ammanath agrees that some of the best practices in fostering explainability include tailoring explanations and reporting to the people who will engage with or be impacted by the automated systems. Along the same lines, she said, developers need to first identify the needs and priorities of the people who will be most affected.
A more challenging problem is that many AI systems are designed around the concept of universalism: the idea that “[a] system is good if it works everywhere for everyone at all times,” Keyes explained. “But the problem is that that's not how reality works, different people are going to need different explanations of different things. If we really want AI to be more explainable, we actually have to really, fundamentally change how we imagine and how developers imagine AI.”
In other words, if you build explainable AI with a one-size-fits-all design process, “you end up with something where it has explanations that only make sense to one group of people who are involved in the system in practice,” said Keyes. “The internal change is a [much] broader set of involvement in deciding explainable to whom, and what does explainable mean.”
Keyes’ call for more localized AI systems and their concern about the universality of AI models echo what researchers have been warning about for the past few years. In a 2021 paper co-authored by Bender and Gebru, who was ousted from Google in a dispute over this research, the authors argue that training AI models with big data makes it difficult to audit for embedded biases. They write that big data also fails to represent populations that have less access to the internet and “overrepresents younger users and those from developed countries.”
“If we orient knowledge and AI around big data, then we're always going to bias towards those who have the resources to spin up a thousand servers, or those who have the resources to, you know, get a billion images and train them,” said Keyes. “There's something fundamentally, I'd say undemocratic, but I'd also say just badly incentivized in that.”
“The question first is, what are the conditions under which AI is developed? Who gets to decide when it's deployed? And with what reasoning? Because if we can't answer that, then all good intentions in the world around how do we live with that [AI] are all screwed,” they added. “[I]f we're not participating in those conversations, then it's a losing game. All you can do is have something that works for people with power, and silences the people who don't.”
Debiasing the datasets that AI systems are trained on is nearly impossible in a society whose internet reflects inherent, continuous human bias. Besides using smaller datasets, which give developers more control over what appears in them, experts say a solution is to design with bias in mind, rather than feign impartiality.
“The approach I currently think is the best is to have the system learn to do what we want it to,” Clune said. “That means it tries to do what we ask it to (for example, generate pictures of CEOs) and if it does not do what we like (e.g. generating all white males), we give it negative feedback and it learns and tries again. We repeat that process until it is doing something we approve of (e.g. returning a set of pictures of CEOs that represents the diversity in the world we want to reflect). This is called ‘reinforcement learning through human feedback’, because the system is effectively using trial and error learning to bring its outputs in line with our values. It is far from perfect, and much more research and innovation is required to improve things.”
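The loop Clune describes (try, get feedback, adjust, try again) can be illustrated with a deliberately tiny Python sketch. It is not how production systems are built: those typically train a separate reward model on human ratings and then fine-tune a large model against it. Here the three candidate outputs, the `human_feedback` stand-in for a human rater, and the update factors 1.5 and 0.5 are all hypothetical.

```python
import random

def sample_output(weights, rng):
    """Pick one candidate output with probability proportional to its weight."""
    outputs = list(weights)
    return rng.choices(outputs, weights=[weights[o] for o in outputs], k=1)[0]

def human_feedback(output):
    """Hypothetical stand-in for a human rater: approves one kind of output."""
    return 1 if output == "diverse_set" else -1

def train_with_feedback(weights, rounds, rng):
    """Trial and error: reinforce approved outputs, suppress disapproved ones."""
    weights = dict(weights)  # work on a copy so the caller's weights are untouched
    for _ in range(rounds):
        output = sample_output(weights, rng)
        if human_feedback(output) > 0:
            weights[output] *= 1.5   # positive feedback: more likely next time
        else:
            weights[output] *= 0.5   # negative feedback: less likely next time
    return weights

def probability(weights, output):
    return weights[output] / sum(weights.values())

initial = {"uniform_set": 1.0, "somewhat_varied_set": 1.0, "diverse_set": 1.0}
trained = train_with_feedback(initial, rounds=50, rng=random.Random(0))
```

After training, `probability(trained, "diverse_set")` is higher than the one-in-three chance the system started with. The sketch also makes one limitation visible: the system ends up reflecting whatever the rater approves of, so the values of the people giving feedback matter as much as the algorithm.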
“I think it is absolutely critical to start by keeping in mind that what gets called ‘AI’ isn't any kind of autonomous agent, or intelligence, or thinking entity,” Bender said. “These are tools, which can serve specific purposes. As with any other tools, we should be asking: How well do they work? How suited are they to the task at hand? Who are they designed to be used by? And: How can their use reinforce or disrupt systems of oppression?”