The Data Daily

Why The Data Science Venn Diagram Is Misleading

Last updated: 12-03-2019

Data Science is a relatively new industry, even though its components have been around for a long time. So why is it apparently so unclear what Data Science exactly is? Googling “what is data science” yields 1.590.000.000 results, while googling “what is computer science?” yields fewer. Given that Computer Science has been around for far longer than Data Science, this is quite astonishing.

If you have ever wondered what Data Science is, then you have probably come across the so-called “Data Science Venn Diagram”. As far as I could tell, one of the first people to introduce this visualization was Drew Conway, the CEO and founder of Alluvium, in 2010. Another article I stumbled upon during my research, published by Nathan Yau in 2009, elaborates on the components of the visualization in greater detail. Let’s take a look at the “Data Science Venn Diagram”:

The appeal of this visualization is pretty obvious. It is easy to understand and it conveys that data science is a combination of several disciplines. In this Venn diagram, the three components are hacking skills, math & statistics knowledge, and substantive expertise. Now, there are many variations of this Venn diagram on the Internet but, in essence, nearly all of them are based on these same three components.

The Data Science Venn Diagram is not wrong. It mentions essential components of data science while at the same time illustrating that data science takes place at the intersection of these components. Thus, if you have no idea what data science is and just want to get an idea of what it means, this Venn diagram is for you. Should you, however, desire to delve deeper into the seemingly endless realms of data science, then this Venn diagram is a starting point at best and misleading at worst. Let me end this paragraph with a quote from the legendary statistician John Tukey (1962) that summarizes my feelings towards the Data Science Venn Diagram very accurately:

Many articles that attempt to explain what data science is sooner or later use this visualization. Judging from the number of articles describing how to become a data scientist, a decent percentage of the people reading them probably aspire to become data scientists themselves. And this is where the Venn diagram can become problematic.

The Venn diagram is an abstraction. An abstraction, by definition, does not attempt to capture the complexities of reality. Nevertheless, a more appropriate title for this Venn diagram would probably be something along the lines of “Data Science Hard Skills Venn Diagram”. All of the components of the Venn diagram are hard skills, that is, skills that can easily be measured by, for instance, taking a written test. Soft skills, sometimes referred to as interpersonal skills, usually cannot be measured by a written test. They include being able to work effectively in a team, to communicate with other people in the organization (including non-technical employees), and to lead and manage a team.

Now, companies do not hire data scientists to work secluded from the rest of the organization. They hire data scientists because they expect them to extract actionable insights from data that create value. Thus, first and foremost, a data scientist should be familiar with his or her company’s business model and understand how it creates value. Just to make sure we are on the same page: I am referring to making profits. Without adequate soft skills, even the most competent data scientist will have serious trouble achieving this goal.

Let me try to visualize how most companies look at data science:

The reality of working on data science projects today is that the management of most companies does not really care about the details of the data science part in the middle. That is why they hired you. Their job is to make decisions that will help the company increase profits. Therefore, it is absolutely critical to master not only the technical aspects required to practice data science successfully but also the necessary soft skills.

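In my opinion, the three most salient soft skills for successfully completing a data science project are understanding the business problem, working effectively in a team, and communicating with non-technical colleagues.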

Let us begin with the business problem. In order to solve a problem, you must first understand it. In a corporate setting, you will be confronted with a business problem, that is, a problem that, when solved, will lead to value. This could be anything from creating a dashboard that simplifies and accelerates management's decision-making process to using machine learning to predict sales growth. Business executives without a technical background do not necessarily know all of the details about data science (nor do they have to). They have been confronted with a problem and want to explore new ways to solve it. As a result, these business problems will usually not be laid out very specifically in advance. It is the data scientist’s task to identify whether, and which, aspects of the problem can be solved using data science. Since most data scientists do not have a business background, this can prove to be a challenging hurdle. Being able to switch between a business and a technical mindset is an essential skill for effective business problem-solving.

During the second stage of my visualization, the skills named in the Data Science Venn Diagram are absolutely crucial. However, soft skills are still needed. Most importantly, data scientists need to be able to work effectively in teams of any kind. Depending on the organizational structure, a data scientist could be working as the only data scientist within a team or as part of a larger analytics team alongside other data scientists, data warehousing specialists, etc. Efficiently assigning tasks and working towards a common goal as a team is another essential contributor to success.

In this case, communicating refers mainly to interactions with employees within the organization who do not have a technical background. Being able to explain in layman’s terms what the data science team is currently working on bridges the gap between decorative and effective data science. After completing the technical part, data scientists have to communicate their findings to management. Giving captivating and interesting presentations is a skill that has to be earned through repeated practice, and it includes everything from effective slide design to preparing a script that will catch and keep the listener’s attention throughout the presentation. Yet soft skills are rarely covered extensively in the curricula of technical majors.

Working on soft skills can be just as important as working on hard skills when attempting to become an effective data scientist. Therefore, I believe that soft skills should be given more attention in one’s attempt to become a data scientist. Whether through self-study or taking a communications course at college, improving one’s soft skills is always a worthy investment.
