

Lighting a Fire Under Enterprise Data Science - DATAVERSITY


How do we characterize a good Data Science professional? A well-trained Data Scientist can deal with dirty data and knows how to cleanse it the right way to avoid compromising analysis at the back end. He or she is the individual whom the CEO should be thanking for preventing a multi-billion-dollar mistake when some random data cleansing artifacts taint an analytical outcome.

Today, such mistakes are more likely to happen because the IT group often has the job of joining data from multiple sources, while a separate statistician group performs analysis on whatever is handed to it. Because the two groups rarely communicate, data cleansing mistakes are seldom caught, which can lead to wrong decisions and lost business value.

The Data Scientist’s ability to own the whole process is incredibly powerful and essential to enabling enterprises to consume what they really want out of Data Science: “Business ideas that generate value,” said Michael Li, founder of The Data Incubator, a Data Science fellowship, hiring, and training organization.

Li discussed the topic of building a Data Science mindset and driving Data Science adoption in businesses during his presentation, “Growing a Data Science Organization,” at the Enterprise Data World 2017 Conference. Li focused much of his attention on helping industries understand how Data Science aligns with the issues confronting them.

In financial services, for example, automation of what often remain incredibly manual processes could make a big dent in reducing risk and identifying upsells. Take an insurer’s underwriting processes: while a provider may ultimately come up with the best quote for a customer, if it shows up two days later than a competitor’s offer, the client may have already moved on. “If you can push and accelerate the time to deliver the quote using Data Science to automate some things that underwriters would do manually, you can win business outright,” Li said.

Other opportunities in the industry lie in using Data Science for lookalike analysis – that is, to quickly identify commonalities among customer groups to determine which of an insurer’s hundreds or thousands of products tend to be used by particular segments, and then building customized offers of those same products to new clients who fit the same profiles. “It’s a prediction problem or recommendation system,” Li said.
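The lookalike idea described above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not Li's method: the customer records, segment labels, and product names are invented, and a real system would derive segments from data rather than take them as given.

```python
from collections import Counter, defaultdict

# Hypothetical customer records; segments and product names are invented.
CUSTOMERS = [
    {"segment": "young_renter", "products": ["renters", "auto"]},
    {"segment": "young_renter", "products": ["renters", "auto", "pet"]},
    {"segment": "young_renter", "products": ["renters"]},
    {"segment": "homeowner", "products": ["home", "auto", "umbrella"]},
    {"segment": "homeowner", "products": ["home", "auto"]},
    {"segment": "homeowner", "products": ["home"]},
]


def top_products_by_segment(customers, k=2):
    """Count product usage per segment and keep the k most common products."""
    usage = defaultdict(Counter)
    for customer in customers:
        usage[customer["segment"]].update(customer["products"])
    return {
        segment: [product for product, _ in counts.most_common(k)]
        for segment, counts in usage.items()
    }


def offers_for(new_client_segment, customers, k=2):
    """Suggest the products most used by existing clients in the same segment."""
    return top_products_by_segment(customers, k).get(new_client_segment, [])


print(offers_for("young_renter", CUSTOMERS))
```

Counting usage per segment is the simplest possible stand-in for a recommendation model; it is meant only to show the shape of the problem, in which existing customers' behavior drives the offer made to a new client with a similar profile.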

In the healthcare sector, pharmaceutical companies can use Data Science to uncover information in unstructured hospital physician notes to identify latent patient populations who could be suitable candidates for new clinical trials. Hospitals could use Analytics of textual data to understand who among their patient populations is likely to require readmission and then provide those patients with additional support in an effort to avert that high-cost occurrence.

In the defense industry, Data Science can be employed to protect troops by using Natural Language Processing and Analytics to identify trends in enemy communications or time series analysis to predict combatant movements.

“We’ve found in working with organizations [that it’s important to] build up use cases, [to show] how to map the core body of knowledge that is Data Science and connect it to what’s happening in these industries,” Li noted.

As important as use cases are to growing a Data Science mindset, so too is ridding companies of misconceptions they may have about what Data Science is or what is necessary to enable it. For example, a lot of emphasis is placed on p-values and hypothesis testing. Li noted that there’s almost a blind reliance on p-values, and one of the biggest issues is that such reliance ignores the fact that statistical significance and business value are not the same thing. “You really care about business significance,” Li said. If the measurements are statistically significant but have no significance to the business, then ignore the results:

“If it is significant to the business but not statistically significant, maybe you just need to collect more data. As you get more data it tells you if it’s true or not true and you get to a statistically significant combination that probably tells you where you want to go and what category you want to act in.”
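To make the distinction between statistical and business significance concrete, here is a small sketch. It assumes a conversion-rate comparison between two variants and uses a two-proportion z-test with a normal approximation; the counts, the `min_lift` threshold, and the function names are all hypothetical and are not taken from Li's talk.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test (normal approximation)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def assess(conv_a, n_a, conv_b, n_b, min_lift, alpha=0.05):
    """Report statistical and business significance side by side."""
    lift = conv_b / n_b - conv_a / n_a
    p_value = two_proportion_p_value(conv_a, n_a, conv_b, n_b)
    return {
        "lift": lift,
        "p_value": p_value,
        "statistically_significant": p_value < alpha,
        # min_lift is an assumed business threshold, set by the business.
        "business_significant": lift >= min_lift,
    }


# Huge sample, tiny lift: statistically significant, but too small to matter.
tiny_lift = assess(100_000, 1_000_000, 101_500, 1_000_000, min_lift=0.01)

# Small sample, large lift: matters to the business, but not yet conclusive.
small_sample = assess(20, 200, 28, 200, min_lift=0.01)

# Same rates with ten times the data: now both kinds of significance hold.
more_data = assess(200, 2_000, 280, 2_000, min_lift=0.01)

for name, result in [("tiny_lift", tiny_lift),
                     ("small_sample", small_sample),
                     ("more_data", more_data)]:
    print(name, result)
```

The second and third cases mirror the advice in the quote: when an effect would matter to the business but the test is inconclusive, collecting more data is what settles the question.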

He also reminded the audience that there is a lot of drudgery that accompanies Data Science – like those potentially troublesome extract, transform, load (ETL) processes. It’s not all “the sexy Machine Learning Analytics part,” he said. And never be fooled into thinking that Data Science is just the data and that the data will tell you everything. “It’s thinking about data cleverly and really pulling out the kind of clever insights around sample biases in data,” he said. Who among your customers, for instance, is not supplying data, and how does that potentially affect your analysis? That’s important to making the data story meaningful in a business context.

On building a data-driven culture, Li commented that evidence-based decision making must take precedence over Highest-Paid Person’s Opinion (HiPPO) decision making. Businesses must leverage constant testing and fail in fast and controlled ways to constantly renew products and services. A default “yes” on permissions – democratizing data by opening up access to it – also feeds a data-driven culture, as does universal data literacy through training and support. “You don’t just want a few people to understand how to use the data you’ve made available,” he said.

Also, it’s important that the Data Scientists not be off in their own little corner, doing analysis devoid of business value, but rather they should be embedded in the business units themselves.

All this is easier to make happen when top executives lead the charge to make the company more data-driven and take steps like recruiting an elite Data Science core team to educate and drive best practices around how the organization will “do data.” They can give those who work with them autonomy but establish the goals and key performance indicators (KPIs) of where the data-driven culture should take the organization. They can be responsible for standardizing the data infrastructure so that teams do not needlessly duplicate one another’s work.

“Let the elite core standardize a connected data infrastructure,” Li said, noting that the fellows coming out of The Data Incubator are often brought into organizations for that purpose.

This elite core should also have the role of maintaining good data hygiene and quality. “They have to constantly be on top of what one data source does, why a value in one column is ‘no’ half the time, what’s wrong with original source data,” he added. “That matters to getting anything useful from data.”
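A check of the kind Li describes, noticing that a column holds the same value suspiciously often, could look roughly like the sketch below. The column name, sample rows, and the 40% threshold are invented for illustration; what counts as suspicious depends on the data source.

```python
from collections import Counter

# Hypothetical rows from a source system; a missing field shows up as None.
ROWS = [
    {"customer_id": 1, "opted_in": "no"},
    {"customer_id": 2, "opted_in": "yes"},
    {"customer_id": 3, "opted_in": "no"},
    {"customer_id": 4},
    {"customer_id": 5, "opted_in": "yes"},
    {"customer_id": 6, "opted_in": "no"},
]


def dominant_values(rows, column, threshold=0.4):
    """Return {value: share} for values making up at least `threshold` of a column."""
    values = [row.get(column) for row in rows]
    counts = Counter(values)
    total = len(values)
    return {
        value: count / total
        for value, count in counts.items()
        if count / total >= threshold
    }


# Flags that 'no' fills half of the column, which prompts the question of
# whether that is a real answer or a default written by the source system.
print(dominant_values(ROWS, "opted_in"))
```

The code only surfaces the pattern; deciding whether the value is genuine or an artifact of the original source still requires someone who knows how that source behaves.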

Finally, building a Data Science Center of Excellence is recommended. This provides a place that defines Data Science in terms of best practices and can also support an enterprise-wide training plan based on the skill sets a business has and what’s missing to meet its goals.

Depending on the situation, such as when a company wants to grow an idea quickly but lacks the talent to do so, developing those skills won’t just be a matter of training, but also of making outright hires who will operate under a strong leader.

A Data Science Center of Excellence can also function as a Data Science project accelerator, providing an opportunity for everyone throughout the organization to pitch Data Science ideas that will be judged by external parties to provide credibility. That, Li believes, can build excitement as people consider how they can contribute to something that may bring them high-profile recognition in the enterprise.

Here is the video of the Enterprise Data World 2017 Presentation:
