Jasmine Samuel works as a data scientist in New York City. She left academia a few years ago for the private sector. Having worked in both academic research and business consulting, she tells me there’s a big difference between what academia wants and what business wants from data science.
“A lot of organizations will just use very basic descriptive statistics to understand performance,” she told me.
Like many young professionals in this growing field, she understands what better predictive modeling can do for decision-making.
The problem is that it’s a challenge to make that clear to business stakeholders.
“Oftentimes the metrics they want are going to be something like ‘Group A performed 60% or 20% better,’” she said. “That’s how they’re gonna make a decision.”
While that type of information is certainly handy to know, it means that many businesses are not fully utilizing the data they have, much less the advanced skills of data scientists.
“Descriptive statistics are not for decision making,” said Jasmine. “They’re not inferential. That’s something I wish more people widely understood.”
This is part of the growing pains the new data science profession is experiencing as it is embraced by businesses that haven’t historically used statistical modeling.
“I think we have a pretty big problem within the space of data,” said Jasmine. “Every organization wants to be data-centric, but organizations don’t necessarily know how to do that. From a leadership perspective, they don’t understand the data and they don’t know how to monetize it.”
Academia, insurance, finance, and pharmaceuticals have relied on statistical modeling for much longer than most industries. They have developed a shared vocabulary and a shared understanding of what researchers, actuaries, economists, and biostatisticians actually do for their organizations.
“That’s a great thing about academia and research,” said Jasmine. “You’re working with people with the same mental models and generally the same skill set. I’ve never had to break down a model, because they understand what it is and what it does.”
Other businesses have not developed that shared understanding yet.
“When I’m running any type of testing, I try to get away from the words hypothesis testing,” said Jasmine. “Because I don’t think people understand that or it ends up becoming a little too academic for that environment.”
While many companies proudly state they’re investing in data and predictive analytics, many still focus on what Jasmine referred to as “descriptive statistics.”
Descriptive statistics usually show up in organizations as key performance indicators (KPIs), which is what most organizations think of when they think of data. Most KPIs are unique to an industry, but they usually rest on simple calculations such as averages, totals, and medians.
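To make the difference between descriptive and inferential statistics concrete, here is a minimal Python sketch. The numbers are invented for illustration and do not come from any real test, and the function names are my own. It computes KPI-style summaries and the kind of “Group A did X% better” figure stakeholders tend to ask for, then runs a simple permutation test, one basic inferential check, to estimate how often a gap that large would appear by chance alone.

```python
import random
from statistics import mean, median

# Hypothetical outcomes for two groups, invented purely for illustration.
group_a = [12, 15, 11, 18, 14, 16, 13, 17]
group_b = [10, 13, 9, 14, 12, 11, 12, 15]


def describe(values):
    """KPI-style descriptive summary: total, average, and median."""
    return {"total": sum(values), "average": mean(values), "median": median(values)}


def relative_lift(a, b):
    """How much better group a's average is than group b's, as a fraction."""
    return (mean(a) - mean(b)) / mean(b)


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in means.

    Repeatedly shuffles the group labels and counts how often the shuffled
    gap is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        shuffled_gap = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if shuffled_gap >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


print("Group A:", describe(group_a))
print("Group B:", describe(group_b))
print(f"Lift of A over B: {relative_lift(group_a, group_b):.1%}")
print(f"Permutation p-value: {permutation_p_value(group_a, group_b):.3f}")
```

With these made-up numbers the lift comes out to roughly 21%. That figure only describes what happened in these two samples. The p-value is a first step toward inference: it estimates how surprising the gap would be if the two groups did not really differ. A real analysis would also have to account for sample size, study design, and which test suits the data.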
KPIs are important to a business. They are a good way to rally teams behind a shared business strategy, but there’s a point where creating new KPIs yields diminishing returns.
Once a business has gotten as much as it can out of KPI reporting, predictive analytics offers more precision and insight for improving further.
Data scientists know this. Like Jasmine, though, they have to adapt their language to help stakeholders see this value.
“I think [what] anyone that’s a data scientist or a data analyst is going to struggle with is trying to communicate any test they’ve run to stakeholders,” said Jasmine. “I think the biggest thing is being able to not only explain what happened but explain what didn’t happen.”
How can data scientists learn to better explain this stuff?
One answer might be better data storytelling.
If you follow data science closely, you’ve probably heard the term “data storytelling.” It is something of a buzzword nowadays. But as with most buzzwords, there is value hidden beneath the surface.
I spoke with Brent Dykes, who authored the book Effective Data Storytelling: How to Drive Change with Data, Narrative, and Visuals, and he explained to me why data science professionals often ignore this skill set.
“I think there’s some hesitancy from data people to be persuasive with their information and I think that’s a mistake,” said Brent. “You can’t be completely objective, especially if we want people to take action on the data. I think we need to guide audiences down a certain path and interpret the information in a certain way.”
Some data scientists may not like this advice. It can sound like manipulation, which is certainly a valid concern. There’s no shortage of instances where people have presented data in a way that misled the general public.
But according to Brent, being good at telling a story does not mean manipulating the audience.
“In Germany,” he said, “there was a group of people that did some research into the benefits of eating chocolate. They ran a bunch of tests to see how chocolate affected the health of these individuals that consume the chocolate and then published their findings in a reputable health scientific journal in Europe.”
He was referring to an intentionally bogus study claiming that chocolate helps you lose weight. The study was published in a respected journal and shared by respectable news outlets all over the world.
“The research was really bad science,” said Brent. “[The researchers] wanted to show how people latch onto these facts and figures that come out of these studies. That’s an example of what looks like a data story on the surface, but it’s actually not a data story because it’s not built on that solid foundation [of] data.”
In other words, data storytelling will never be good if it doesn’t start with good analysis. A data scientist who uses sound methodology, and who doesn’t treat storytelling as a shortcut to a good narrative, isn’t being manipulative at all. Both the analysis and the story are required, according to Brent.
“Obviously, we have their best interest at hand,” said Brent. “We’re not trying to deceive or manipulate [stakeholders] in any way. We’re just trying to show them the insights that they need to pay attention to and the consequences of those insights. Otherwise, what are we doing?”
Even though stakeholders are used to thinking in terms of KPIs, they probably would appreciate the insights from predictive analytics. We just have to learn how to sell those benefits.
That helps explain the problem that Jasmine, the data scientist, highlighted earlier. Most data scientists understand predictive analytics, but stakeholders seem to want only descriptive statistics.
Why? It’s far easier for stakeholders to find a story in descriptive statistics than in predictive analytics.
An analyst who states “Group A did 60% better than Group B” tells more of a story than a data scientist who focuses on how they reduced false positives in their model.
As Brent told me, “[a good] data storyteller understands the audience, they understand the data, and they’re able to communicate in an effective way using a narrative structure.”
Notice, though, that Brent didn’t just advise focusing on a good narrative structure. You also have to consider the audience.
Something data scientists may want to ask themselves is whether their audience really needs predictive analytics at all. Maybe descriptive statistics tell all the data stories their industry will ever need.
The hard truth may be that the audience just doesn’t need the story we’re trying to tell.