The Data Daily

The Need for an Improved Data Science Project Management Process

Data science managers (and senior leaders managing data science teams) need to think through many questions about how best to execute their data science efforts. For example: How should the team brainstorm ideas? How should the team prioritize those ideas? And, more generally, how can the team help ensure that it delivers actionable insights?

While these challenges are very different from the technical machine learning challenges that most teams focus on, they are equally important to the success of a data science project. In other words, teams need to think not only about which specific algorithm to use, but also about the process they use to work effectively and efficiently on a data science project.

In short, while there are many reasons why data science projects fail, many of them are not "technical" in nature. Common examples include solving the wrong problem and struggling to operationalize a model. Focusing on the team process can therefore increase the value that a data science project delivers.

Having an effective data science team process can minimize these issues. For example, solving the wrong problem is often related to weak stakeholder engagement and poor communication within the data science team. Hence, a collaboration and communication framework can reduce the risk of solving the wrong problem. Similarly, operationalization challenges can be reduced through better communication across the team, by discussing how a model will be used throughout the project, not just at the end.

When data science teams decide they need a defined team process, people often assume that the best path forward is to adopt a software development process framework. In other words, data science teams (or team leaders) might recognize the need for a better process and think, "let's just use what works for software development."

On the surface, data science and software development may seem similar. Both fields produce code. Both are high-tech fields led by skilled professionals. Both are built on top of computer science and mathematics. But there are key differences between data science projects and software development projects. For example, data science projects are often much more exploratory in nature. Taking this into account, data science teams should use a project management (process) framework that is suited to data science projects, not one that simply works well for software development projects.

Some data science teams do use Scrum, the most popular agile framework for software development projects. However, many of these teams find Scrum challenging to apply in a data science context. For example, one key challenge is that in Scrum, iterations (known as sprints) are always the same length. It is often difficult for data science teams to know what can "fit in a sprint," so adhering to Scrum's fixed, time-boxed sprints can be problematic. In addition, a team sometimes benefits from learning from an iteration that is shorter (or longer) than the defined sprint length.

With this in mind, data science teams (and data science leaders) should establish an agile data science project process framework, but should think more broadly than simply assuming they must use Scrum. For example, Data Driven Scrum provides many of the benefits of agility but was defined within a data science context (as opposed to a software development context).

As part of defining a team process, the team should also define a data science life cycle, that is, the steps required to complete a data science project (a life cycle is sometimes called a workflow). A team's life cycle typically includes steps such as obtaining the data, cleaning the data, and then creating a machine learning model.

A data science life cycle is useful, as it helps to make sure the team has a shared mental model (and common vocabulary) of the work required in a data science project. CRISP-DM is the most commonly used framework for defining a data science life cycle. Get an overview of CRISP-DM to understand its strengths and weaknesses.

Wrapping it all up: when working on data science projects, it is important to think through how the team will work together. This team process should include (1) an agile framework that defines how the team will communicate and collaborate, (2) a life cycle framework, and (3) a definition of how these two frameworks are integrated.

For more information on data science project management, browse my blog posts on www.datascience-pm.com.
