In a time of resurgent fascism, of what feels like a weakening of social justice values, and of global digitization, social justice has never been more integral to interrogating data, technology, and the structure of society itself. Technology surrounds us, but how it is made, who benefits deeply from it, and who does not are still important questions to unpack, research, and critique. Feminist Data Set is an art project that uses intersectional feminism as a critical framework for investigating and critiquing machine learning. The investigation happens through a critical design lens, since the project involves making a chatbot from start to finish using intersectional feminism as a guide. This includes asking: What is intersectional feminist data collection? What is intersectional feminist data? What are intersectional feminist data labeling and data training? Does an intersectional feminist system or product exist for labeling and training data? What does intersectional feminist software look like, and what does it do? Are there intersectional feminist algorithms, and what would they need in order to exist? What does an intersectional chatbot look like, and how does it interact?
Politically and artistically, Feminist Data Set is inspired by the work of the maker movement, critical design, Arte Útil, Data Feminism, Design Justice, the Critical Engineering Manifesto, Xenofeminism, and the Feminist Principles of the Internet. Pedagogically, Feminist Data Set operates in a similar vein to Thomas Thwaites’s Toaster Project, a critical design project in which Thwaites builds a commercial toaster from scratch. Feminist Data Set, however, takes a critical and artistic view on software, particularly machine learning. What does it mean to thoughtfully make machine learning, to carefully consider every angle of making, iterating, and designing? Every step of this process needs to be thoroughly re-examined through a feminist lens.
There is a growing movement to analyze technology through a more critical and ethical lens (although the word ethical itself is becoming controversial and overused). This critical lens is important, especially because it exists to create more equity within technology, both as a practice and as its own entity. Technology is an entity in the sense that it is a specific kind of thing, be it software or hardware; a practice is how an individual or group uses technology as an addition to its own creativity and making. Recent books like Data Feminism by Catherine D’Ignazio and Lauren Klein unpack data and datasets in society and create methodologies for better data practices through a feminist lens. Design Justice by Sasha Costanza-Chock directly situates social justice within the design world, within design making and product design, and in design thinking as an exercise and practice.
Both Data Feminism and Design Justice analyze how capitalist and corporate structures use design and data, and how civic technologists, social justice movements, and activists engage with data and design. By situating design and data in both the corporate and non-corporate worlds, Data Feminism and Design Justice create an expansive, more holistic view of how data and design are used, from a more problematic end (the corporate side) to a more equity-driven end (the social justice and activist side). Both ends of the spectrum, and the spaces in between, must be explored, and this is what makes Data Feminism and Design Justice strong as books, as methodologies, and as use cases. Feminist Data Set takes a similar stance. By interrogating machine learning not just as an artistic practice but as one rooted in product design and the corporate world, Feminist Data Set looks at how technology misuses machine learning, first as a whole field and then in the individual pieces of the machine learning pipeline.
Often the tools I need to make Feminist Data Set as a critical design and art project don’t exist. For example, what is a feminist data training platform? Is there one? In machine learning, when it comes to labeling data and creating a data model for an algorithm, groups will generally use Amazon’s labor force, Mechanical Turk, to label data. Amazon created Mechanical Turk to solve its own machine learning problem of scale: it needed large data sets trained and labeled. Using Mechanical Turk in machine learning projects is standard in the field; everyone from technology companies to research groups uses it to help label data. Mechanical Turk underpays its workers and treats them as gig economy workers rather than full-time employees, which gives them fewer benefits. That is not intersectional feminist, so I cannot use it in my work. I either have to find an alternative or build one. For the past year, I’ve been exploring what an intersectional feminist machine-learning labeling and training system is, and what it would need. I’ve been creating a tool, much like a calculator, that translates the number of tasks a Turker does, and the price of those tasks, into an hourly wage. Say, for example, a client prices ten thousand tasks at four cents each, or four hundred dollars in total: the calculator would translate that into the hours or days of work it would take a Turker to complete all of those tasks, and what their wages would be.
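The arithmetic behind such a calculator can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual tool: the function name `estimate_wage`, the `WageEstimate` record, and the parameters `seconds_per_task`, `unpaid_seconds_per_task`, and `hours_per_day` are all hypothetical, and the thirty seconds per task used in the example is an assumed figure, since the time a task takes is not given here.

```python
from dataclasses import dataclass


@dataclass
class WageEstimate:
    total_pay: float    # dollars paid for the whole batch of tasks
    total_hours: float  # hours of work needed to finish the batch
    work_days: float    # the same time expressed in working days
    hourly_wage: float  # dollars earned per hour of work


def estimate_wage(num_tasks, price_per_task, seconds_per_task,
                  unpaid_seconds_per_task=0.0, hours_per_day=8.0):
    """Translate a batch of priced tasks into time worked and an hourly wage.

    seconds_per_task and unpaid_seconds_per_task (time spent finding and
    setting up work) are assumptions supplied by the caller.
    """
    if num_tasks <= 0 or seconds_per_task <= 0 or hours_per_day <= 0:
        raise ValueError("num_tasks, seconds_per_task and hours_per_day must be positive")
    if price_per_task < 0 or unpaid_seconds_per_task < 0:
        raise ValueError("price_per_task and unpaid_seconds_per_task cannot be negative")

    total_pay = num_tasks * price_per_task
    total_seconds = num_tasks * (seconds_per_task + unpaid_seconds_per_task)
    total_hours = total_seconds / 3600
    return WageEstimate(
        total_pay=total_pay,
        total_hours=total_hours,
        work_days=total_hours / hours_per_day,
        hourly_wage=total_pay / total_hours,
    )


# Ten thousand tasks at four cents each, assuming 30 seconds per task.
estimate = estimate_wage(num_tasks=10_000, price_per_task=0.04, seconds_per_task=30)
print(f"Total pay: ${estimate.total_pay:.2f}")
print(f"Time: {estimate.total_hours:.1f} hours ({estimate.work_days:.1f} working days)")
print(f"Hourly wage: ${estimate.hourly_wage:.2f}")
```

Under that assumed pace, four hundred dollars is spread over roughly eighty-three hours of work. Raising `unpaid_seconds_per_task` above zero lowers the hourly figure further, which is one way to make visible the time a worker spends that the per-task price does not cover.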
In this sense, Feminist Data Set blends art with social justice-driven research and technology, like Turkopticon, made by Professor Lilly Irani and Mechanical Turkers. Turkopticon allows Mechanical Turkers to rate jobs and clients. This add-on solves a real problem that workers face: they often do not know the quality of a client and have no way to share this information with other Turkers.
To create a feminist AI, the labor and payment inequity in machine learning data training platforms needs to be confronted. According to an article in The Atlantic that investigates the treatment of Mechanical Turkers, the problem “is not necessarily that requesters are underpaying for the work. The average requester pays around eleven dollars an hour for the work they get, according to Hara. But there are also many requesters who pay less than that, and there are many requesters who post tasks that take longer than they say to complete. Still, the root of the problem is that these platforms allow requesters to avoid paying workers for the downtime that would arise if workers did these tasks full-time”. A research paper co-authored by researcher and Mechanical Turker Kristy Milland found that the median wage was about two dollars an hour and that only four percent of Turkers earned more than $7.25 per hour.
Pay equality and pay equity are steps towards equity in technology and society. Workers need to be paid for their time: not just for the time spent doing a task, but for all the time spent working, including finding work and setting up work tasks. They should be paid well for their labor. Gig economy companies should be held to labor standards and labor laws.
Making must be thoughtful and critical in order to create equity. It must be open to feedback and interpretation. We must understand the role of data creation and how systems can use, misuse, and benefit from data. Data must be seen as something created by communities and as a reflection of those communities; data ownership is key. Data’s position inside technology systems is political, activated, and intimate. For there to be equity in machine learning, every aspect of the system needs to be examined, taken apart, and put back together. It needs to integrate the realities, contexts, and constraints of all different kinds of people, not just the ones who built the early web. Technology needs to reflect those who are on the web now.