The Data Daily

What do journalists do with text? – Text Data Stories – Medium

Over the last few months, I’ve been talking to journalists about their trials and tribulations with textual sources, trying to get as detailed a picture as possible of their processes, namely:

This inquiry is part of my John S. Knight Journalism Fellowship project at Stanford University, where I’m working on designing text processing solutions for journalists.

What I’ve found so far is fascinating: from tech-savvy reporters who write their own code when they need to analyze a text collection, to old-school investigative journalists convinced that printing and highlighting are the most reliable and effective options — and many shades of approaches in between.

If you’ve ever dug a story out of a pile of text, please let me know using this questionnaire. It doesn’t matter if you’ve used more or less sophisticated tools to do it.

Here are a few reasons and incentives to contribute:

Pieces based on the analysis of text collections are not as common as their structured-data-driven counterparts, and are harder to find all in one place. One of the goals of this survey is to create a database of examples. As with the rest of my work at Stanford University, the data will be publicly available for anyone to use.

Concretely, it will include information about the story:

And details about the production and sources:

I regularly check news websites looking for examples to bookmark, but it’s not always obvious whether a story involved text analysis, and stories often don’t come with an accompanying “how we did it” blog post. It can also be hard to find work from many years ago, or stories that are no longer available online. Please help me find yours.

The details about the production of these stories, especially the software and approaches used, could be a valuable reference for beginners on which skills and tools are most relevant, or for more experienced journalists who want to compare notes with fellow text-data enthusiasts.

One of the goals of my project is to make solutions accessible to reporters who don’t have the time (or desire) to become “text-miner-journalists.” As I wrote in a previous post, the list of skills for this area of expertise is long, and the training is time-consuming. Ultimately, my interest is in finding ways to bring the benefits of these techniques to more reporters.

Finally, although the questionnaire is only useful if you have an example to share, I’m also interested in hearing about less successful cases. Did you have to deal with a group of documents that was too complicated to process? Was there a text or file format that became your nightmare? I want to hear all about it. Email me, and maybe we can put together a list of interesting challenges.
