Funding Data and AI That Serve the Social Sector

When I co-founded DataKind in 2011 to connect volunteer data scientists with nonprofits looking to increase their impact, I did so with a vision of data science and AI being used, first and foremost, in the service of humanity. In my eyes, the social sector had a fantastic opportunity to not just catch up to the for-profit uses of tech that were changing industries and disrupting the world, but to instead race ahead with its own vision of success.

More than 10 years later, there has been an influx of money dedicated to data for good and AI for good efforts, and more projects have demonstrated applications of data science in the social sector. However, the impact of these efforts does not yet square with the bullish visions of a Fourth Industrial Revolution that will unlock a new era of human flourishing. Worse, these positive cases don’t seem to counterbalance the horror stories of technology gone wrong that we are regularly treated to, opening our eyes to the ways algorithms reinforce systemic inequities, harm already underprivileged communities, and are used for crimes against humanity by nefarious actors. There is, understandably, a growing tension in social-sector organizations that seek to take advantage of the benefits AI can provide for civil society while not exacerbating the systemic inequities that already permeate our communities. Are our only options to charge ahead with innovation, hoping for the best? Or to shun data science and AI in the social sector as more harm than good?

There must be a third way.

Within the social sector we have an opportunity to design for what we want, but to get there we'll need to realign as a sector on how we use and, most importantly, fund this technology in line with the public interest. Building on research done at data.org, I am working with the Ford Foundation, social-sector community leaders, ethics experts, and foundation program officers to design a guidebook and set of workshops that help funders identify and support high-impact data and AI work that centers human flourishing. The hope is that this framework will begin to illuminate a third path for funders who may feel torn between capitalizing on the benefits of technology and putting their constituents at risk in the process.

While research is still ongoing, this article shares a preview of early results from our work. Specifically, here are five ways funders can think about investing in data and AI that put ownership of responsible practice squarely in the hands of the social sector.

A quick note on terminology: Throughout this article, I use the terms “data science,” “AI,” and “machine learning” interchangeably. For all intents and purposes, the technical terms in this article can be replaced with the phrase “data and computers.” You can take a deeper dive into this language conundrum here and see how others are navigating it.

There is no dearth of guidance on data and AI ethics, and this article is not a replacement for that guidance, merely one additional perspective. As a funder of anything touching data science or AI, you will want to brush up on the core tenets of thoughtful uses of data and AI. Shannon Vallor’s Introduction to Data Ethics contains a rigorous yet easy-to-understand framework rooted in philosophical ideals of human flourishing; Upturn has produced a report on ethical principles specifically for foundations investing in data; and Alix Dunn’s How To Fund Tech is an extremely accessible guide for funders investing in tech projects or technology-driven nonprofits. Mimi Onuoha’s A People’s Guide to Artificial Intelligence is also a very accessible and insightful read on what can otherwise be a bewildering technology. If you’re new to this conversation, welcome! These resources are great starting points to ground yourself.

Under a fellowship with data.org, I researched and developed a landscape mapping of data-for-good and AI-for-good efforts. One point that came up in my research was that funders thought of themselves as more or less aligned with one of two camps: those funding the creation of data-driven technologies for the social sector, and those preventing harms from the use of data-driven technologies, be they from the for-profit or nonprofit sectors. While many funders may make grants in both areas, their risk tolerance for technology seems to lean more to one side than the other. Unfortunately, instead of treating the two theories of change as complementary, a culture may be growing that puts the two in conflict with one another. Anecdotally, when I was at a conference on AI governance, I asked workshop participants how we might think about applying AI to support social-sector efforts. A researcher turned to me with shock and asked, “Why would we ever?” The implication, as I understood it, was that AI had no positive benefit and could only cause harm.

On the other hand, I have witnessed some technopositive program officers treat ethical considerations with an air of annoyance, and while they still carried out ethics checks, they seemed to do so as a matter of compliance. One can see an example of this difference in culture within this very publication. Two SSIR articles about using AI in the social sector, “Investing in AI for Good” and “When Good Algorithms Go Sexist,” use very different language about harms. To be clear, I am a huge fan of the work of all the authors involved and support the conclusions of both articles. What I find interesting is the difference in language used: The first article focuses on the quality of an AI project in terms of its efficiency and potential for scale, and the second focuses on the quality of an AI project in terms of its gender equity. The two need not live in opposition, of course: who wouldn’t want an AI project to be highly efficient while also preserving human rights? But often, data and AI funding strategies seem to be divided along these lines.

If we are to make progress toward a world where the social sector leads in its responsible use of data and AI, funders must start seeing themselves as allies in this fight. Technical rigor needs to be married with humility and guardrails around harms, or else we’re almost sure to create systems that are inefficient, harmful, or both. One example of an initiative blending the value of data-driven solutions with an equity lens is WeAllCount, whose equity framework focuses simultaneously on reducing harms and increasing application of data science.

In my research with data.org, I found that a lot of confusion in data science funding came from a lack of clarity about what data and AI actually do in certain contexts. The two terms, data and AI, are massively overloaded, standing in for inputs, activities, outputs, and outcomes. As a result of this ambiguity, funders may have difficulty knowing how to practically apply available advice on data and AI ethics. For example, it’s become commonplace to acknowledge that data can be biased, and thus any results built from that data will perpetuate that bias. But how is that information practically applied to data in different contexts? What bias means, and what bias is acceptable, would vary a lot between a project to collect a dataset about how many people lack social services and an algorithm that uses that data to automatically match people to social services. One shift I’ve found helpful for mitigating this challenge is to reframe the problem from focusing on the ways data and computing could possibly cause harm to talking about the outcomes we expect to see from computers and humans throughout the project. A framework I use, called “Observe-Reason-Act,” is key in this discussion; it describes a simplified version of the three core activities humans undertake to reach a goal. (Readers might be familiar with the Descriptive-Predictive-Prescriptive framework and can use it interchangeably here.) Specifically, for each goal we want to accomplish, broadly speaking, we must:

- Observe the current state of the world and gather the relevant information.
- Reason about that information to decide what to do next.
- Act on that decision and watch how the world changes.

Done enough times, and strategically enough, this Observe-Reason-Act loop should allow us to reach our goal.

Because these are the steps we have to take to accomplish any goal, the question: “Is this data project doing harm?” can shift to “Who is in charge of each step—a human, a computer, or both—and what outcomes do we expect from them?” This framework has been helpful for thoughtfully assessing social-sector applications of data by shifting the conversation away from technical descriptions of the activities involved, like machine learning, and instead focusing on the outcomes we care about.
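To make this concrete, below is a minimal sketch of how a grant team might write down an Observe-Reason-Act review. The Stage structure, the field names, and the example project are illustrative assumptions of mine rather than part of the framework itself; the point is simply that each stage gets an explicit owner, an expected outcome, and a recourse.

```python
# A minimal sketch, assuming a hypothetical service-matching project.
# Nothing here is prescribed by the Observe-Reason-Act framing itself;
# the structure just forces the questions "who owns this stage?",
# "what outcome do we expect?", and "what is the recourse?"

from dataclasses import dataclass

@dataclass
class Stage:
    name: str              # "observe", "reason", or "act"
    actor: str             # "human", "computer", or "both"
    expected_outcome: str  # what good performance looks like at this stage
    recourse: str          # what happens if that outcome is not met

def review(stages: list) -> None:
    """Print the review so stakeholders can discuss each stage in plain terms."""
    for s in stages:
        print(f"{s.name.upper():<8}| actor: {s.actor:<9}| expect: {s.expected_outcome}")
        print(f"{'':<8}| recourse: {s.recourse}")

review([
    Stage("observe", "both",
          "an accurate, regularly audited picture of who needs services",
          "a data-quality review before any downstream use"),
    Stage("reason", "human",
          "recommendations weighed with an equity lens by a caseworker",
          "the caseworker can override or escalate any recommendation"),
    Stage("act", "human",
          "decisions documented, with outcomes tracked over time",
          "affected people can appeal and trigger a re-review"),
])
```

The value is not in the code but in the conversation it forces: if no one can fill in the recourse for a stage owned by a computer, that gap is worth discussing before funding.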

For example, at DataKind, we once worked on a project in which foster-care nonprofits wanted to build an algorithm to match foster children to the foster families who would provide the safest and healthiest homes for them. Nonprofits, foundations, and members of the foster-care system were excited about and bought into this idea of using technology to achieve better outcomes for foster kids, so the project seemed to have all the right conditions: clear demand and community co-creation. When assessing the risks of the tool, however, reactions seemed to align with people’s existing predispositions about whether to go ahead with the project. For people who wanted to build the technology, stronger security and oversight would mitigate ethical concerns about data privacy or poorly matched kids. For those who were concerned this project could harm kids, every mitigation plan seemed insufficient. Often these ethics conversations collapse to either “We’ll make it work” or “We shouldn’t do it because it’s risky technology” for the very reason that the conversation is focused on the tech: Should it exist or not?

The Observe-Reason-Act framework is useful here because we already have a process for matching children to foster families to use in the discussion. How does a foster-care worker observe the state of the world today? How would a computer do it? How do we expect each of them to reason about matching children to families, taking into account an equity lens? Whose model of the world are we relying on? Once they make a decision, how do we expect them to act responsibly? What outcomes would tell us they were acting well, and what is the recourse if they don’t? This framing grounds us in a set of activities we may already be familiar with, and changes a conversation of pure risk into one of risk-benefit against our known reality, free from jargon like “predictive analytics.” As a result, the team assessed the ways data and computing could best support a foster-care worker matching children to good families at each stage of Observe-Reason-Act, with an eye toward the child’s well-being. Ultimately, the goal of automated matching was tabled, as it didn’t seem that computers would add much to the current process, given the risks automation could introduce. Interestingly, the framing of the question led to an entirely different project: scheduling and routing software to help caseworkers organize their lives more easily, which helps prevent the burnout and turnover that can be devastating to a child’s chances of exiting the foster-care system.

The Observe-Reason-Act model for thinking about outcomes along the way to a goal is far simpler than many rigorous frameworks for assessing risks in algorithm design. As the example of foster-care nonprofits shows, using an outcomes-based framework throughout a project that many people can relate to may help change technology conversations from “How could this tech harm?” to “What do we expect from the actors in this system, and how do we ensure that?”

Some program officers we spoke with lamented that foundations have a chance to assess the design of a data or computing grant at the time of making the grant, but may not have many opportunities after that. However, data has a long lifespan, and algorithms and models, if successful, will continue acting in the world long after the grant starts and ends. Therefore, if the social sector is going to define best practice for data and algorithms, funders and grantees need more avenues for continual learning, oversight, and improvement of the data products they create. More opportunities to evaluate data-driven tools would allow folks to catch any unintended consequences or outcomes that could evolve down the line, as well as increase support for efforts that are working well.

Cases of social-sector algorithms that don’t perform to our standards of equity and human flourishing when used in the real world are too numerous to list and hopefully well known by now (see Automating Inequality or Weapons of Math Destruction for truly devastating stories), so the case for continued oversight of outcomes should be easy to make. The Ada Lovelace Institute and AI Now recently released a report on algorithmic accountability in the public sector with a number of good recommendations. Their suggestions are aimed at policy makers and researchers, but they apply just as well to funders and to those building tech for the social sector. A number of frameworks for contextual algorithmic oversight already exist, like this one from Cambridge University, that funders could consider implementing.
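As one hedged illustration of what continual oversight could look like in practice, the sketch below checks whether an outcome rate has drifted apart across groups after a tool is deployed. The column names, the example data, and the ten-percentage-point threshold are all assumptions for illustration, not recommendations drawn from the reports cited above.

```python
# A minimal sketch of a recurring post-grant oversight check.
# Column names and the 0.10 threshold are illustrative assumptions.

import pandas as pd

def disparity_report(outcomes: pd.DataFrame, group_col: str,
                     success_col: str, threshold: float = 0.10) -> pd.Series:
    """Return the success rate per group and flag large gaps between groups."""
    rates = outcomes.groupby(group_col)[success_col].mean()
    gap = rates.max() - rates.min()
    if gap > threshold:
        print(f"FLAG: {gap:.0%} gap between best- and worst-served groups; "
              "review the model, the data, and the recourse process.")
    return rates

# Made-up monitoring data a grantee might log each review period.
log = pd.DataFrame({
    "group":   ["A", "A", "B", "B", "B", "C"],
    "matched": [1,    1,   0,   1,   0,   1],
})
print(disparity_report(log, "group", "matched"))
```

A check like this is no substitute for the contextual frameworks above, but it gives funders and grantees a concrete artifact to review together at each check-in.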

What is less documented is how funders responsibly support technologies that are performing well in the social sector. One case that recently caught media attention was the news that Crisis Text Line (CTL) was sharing its data with a for-profit entity in return for a share of the profits. The optics of a nonprofit seemingly profiting off of selling data from vulnerable teens in need of mental health services were terrible and drew no shortage of scorn in the media. The lesson for funders of data and algorithms here is less black-and-white, however.

As alluded to in a letter from one of CTL’s board members, danah boyd, and much more pointedly in a Twitter thread by Lucy Bernholz, the pressure on nonprofits playing the role of de facto social services to sustain themselves is enormous, and the stakes are high.

Successful data-driven nonprofits may face a lose-lose situation with regard to sustainability, as incentives are stacked against them and funders inevitably push them to find alternative forms of funding: Given a gap in funding, should they shut down or cut back their services, creating worse outcomes for the communities they serve? Or should they explore fraught market-based solutions to keep their services running?

As Lucy’s thread points out, the challenge is much more interconnected and nuanced than that simple ultimatum, but it hopefully illustrates the challenge at hand. The question of data ethics in this case extends beyond the one we might ask at inception, “How might CTL’s algorithms do harm in use?”, to “How might we sustain a responsible version of CTL’s services, and who is responsible for doing so?” Admittedly, this is fairly new territory for the sector, so best practices aren’t easy to come by. Nevertheless, funders have a huge role to play in this design and an opportunity to shape a positive outcome for the social sector by exploring long-term or innovative funding models that sustain responsible use of algorithms.

Much of the advice above gives guidance on creating more effective and less harmful solutions in the social sector by thinking more specifically about the design of these solutions and how we maintain them. Funders, however, can’t be responsible for implementing solutions or maintaining them. We need skilled professionals responsible for assessing designs for efficiency and equity, for providing technology support to build solutions, for maintaining successful solutions and scaling them, and for creating new guidelines and oversight mechanisms. There are not, at present, many jobs that fulfill these needs. Corporate social responsibility programs at big tech companies provide short-term technical assistance, but not enough to maintain a product through its whole lifecycle. Third-party algorithmic auditing services are beginning to appear, but they primarily focus on the private sector and don’t have the scale to examine all social-sector solutions. Social-sector organizations struggle to pay for technology skills in-house, so often have to cobble together insufficient solutions. Public interest technologists, like those being supported by the Ford Foundation, will be critical in realizing the suggestions above. Funders can help mature this field by funding the skills and talent required to support the ecosystem.

Foundations have a unique and exciting opportunity to shape the use of data science and AI in the social sector. Working together, they can shift the sector away from a false binary of funding tech for tech’s sake or seeing tech as an encroaching harm to other social priorities. While research is still ongoing, these five principles can help funders and social organizations see an opportunity to not just use data and AI better in the social sector, but to redefine how these technologies are fundamentally designed.
