What are important data skills for a layperson to have? With the data revolution, what should everyone be able to do?
I argue that it is much more than understanding the difference between a median and a mean. People need to be equipped to use their critical thinking skills on data.
Let’s get a good conversation going. I recently had the pleasure of speaking with my sister-in-law about her graduate program. Because I tend to steer conversations toward academic matters, we drifted to data analysis. She explained that she had taken a couple of statistics courses but had found limited value in applying them.
She had also spoken to people already practicing in her chosen field, and they leveled with her: they had not found the courses applicable to their work.
But this is the opposite of what we have been promised by the data revolution. As data becomes more accessible, we have been told that we will all be able to see and understand our data. Statistics courses should be highly applicable to everyday analysis. Even the basics about learning to use medians vs averages, sums, mins, and maxes can be very useful.
I briefly pondered this feedback. What kind of courses would better prepare people for the real world when working with data?
In my experience, one thing has become very clear. Data are an imperfect reflection of the real world. It is rare to be able to measure something directly. It is even rarer to conclusively say much of anything.
I find that decision-makers tend to want concrete answers from spotty, poorly cleaned datasets. The data most people have access to can usually only approach an answer obliquely, circling it roughly and providing incomplete results. There are usually many caveats.
With this in mind, I believe that we need to think of data analysis more as a social science than as a branch of mathematics. The foundational concepts are mathematical. The tests and manipulations are mathematical. But the analysis itself, reading the data and translating it into action, is social science. I have a liberal arts background.
We are taught to think critically, to find confounding variables, and to describe the world in situ. We don’t often get the luxury of control groups. We are primarily focused on the limited information before us and have to describe it, hoping for something meaningful.
That atmosphere is exactly how data functions in most cases. Yet many courses continue to give students perfect sample datasets as if the methods and tests will answer the questions. But they don’t. Data is not neutral, nor is it perfect. By asking students to ignore imperfections and work only with illustrative data, we are losing the meat of what is important.
The best analysts I know work with some of the world’s worst data. They often cannot run statistical tests. They cannot say a single thing for certain. But they can think through the ramifications of their data. They consider ethical questions, use long lateral chains of logic, and temper their recommendations appropriately.
Good analysis is rarely about the data. The data is there to inform and help guide, but it takes a back seat to traditional forms of analysis. The best analysts are credible not because they rely on the data, but because they know the limits of the data.
A classic example is Abraham Wald’s analysis of plane damage during WWII. A statistician working for the Allies during the war, he meticulously mapped out the damage on planes that returned from battle.
It looked something like this: [figure: outline of a plane with the damage marked where returning aircraft had been hit]
He took his findings to the military, and the conversation went something like this.
General: “This data is great! I’ll tell our production line to add armor to these areas.”
Wald: “Hold on there, sir, we should actually figure out how to strengthen the areas without holes.”
General: “I don’t think I follow. You must be mad!”
Wald: “Quite the contrary. This map comes only from the planes that I observed, the planes that returned safely. The empty areas aren’t empty because planes don’t get hit there. They’re empty because the planes that were hit there didn’t make it back.”
And good analysis goes something like that.
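To make Wald’s point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the section names, the loss probabilities, and the assumption that each plane takes exactly one hit, spread evenly across sections. It is not Wald’s actual method or data, just a toy showing how counting only the planes that return can point you at exactly the wrong places.

```python
import random

# Hypothetical chance that a single hit to each section downs the plane.
# These numbers are made up for illustration, not historical figures.
LOSS_PROBABILITY = {
    "engine": 0.8,
    "cockpit": 0.6,
    "fuselage": 0.1,
    "wings": 0.1,
    "tail": 0.15,
}


def simulate(num_planes=10_000, seed=0):
    """Return (hits on all planes, hits seen on returning planes) per section."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sections = list(LOSS_PROBABILITY)
    all_hits = dict.fromkeys(sections, 0)
    observed_hits = dict.fromkeys(sections, 0)

    for _ in range(num_planes):
        # Assumption: one hit per plane, equally likely in any section.
        section = rng.choice(sections)
        all_hits[section] += 1

        # Only planes that survive the hit come home to be inspected.
        survived = rng.random() >= LOSS_PROBABILITY[section]
        if survived:
            observed_hits[section] += 1

    return all_hits, observed_hits


all_hits, observed_hits = simulate()
for section in LOSS_PROBABILITY:
    print(
        f"{section:9s} hits on all planes: {all_hits[section]:5d}   "
        f"hits seen on returning planes: {observed_hits[section]:5d}"
    )
```

In this toy world the hits are spread roughly evenly over every section, yet the returning planes show few engine and cockpit hits. An analyst who sees only the second column is looking at the general’s map, and the sparse rows are the ones that matter.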
We are often left to extrapolate and make hard decisions in grey areas. If analysis isn’t taught with the imperfections of the world in mind, we risk allowing bad analysis to propagate.
It’s like we are teaching people to read, but not giving them the tools to interpret an unreliable narrator. Literacy is the baseline. Critical thought is the more valuable, higher-level skill, and I worry that we are ignoring it.
Teach data analysis Socratically. Ask questions, challenge assumptions, dive into critical thought. Data analysis is not a straightforward application of models and methods. It’s a constant conversation. Data provide us a tiny window into reality. We can peek at it, look at it sideways, but we’ll never get a full view.
We have to accept that limitation and make it a foundational concept for how we think about data. When we accept the imperfections, we can allow expertise to flow in, synthesizing new information with what is known.