With the advent of automated machine learning, data scientists will need to adapt their role in the data science life cycle.
Automated machine learning (autoML) is being adopted more broadly across industries as companies try to get the most out of their data science programs. As this trend continues, many data scientists are questioning their value to the organization and what they can offer that autoML cannot. Answering that question requires a clear view of what autoML is and how it fits into the full data science life cycle.
AutoML is the umbrella term for tools and platforms that automate model selection and hyperparameter optimization to produce the best possible model for a given data set. Libraries such as auto-sklearn and Auto-WEKA provide these capabilities. Cloud platforms in this space provide an entire ecosystem for automating machine learning, including Azure Machine Learning, Amazon Machine Learning, the Google Cloud Platform, and IBM Watson. These cloud offerings fall under the category of machine learning as a service (MLaaS).
The goal of autoML is to shorten the cycle of experimentation and trial and error. It cycles through a large number of models, and the hyperparameters used to configure them, to determine the best model available for the data presented. Done by hand, this is tedious and time-consuming work for even a highly skilled data scientist. AutoML platforms can perform this repetitive task more quickly and more exhaustively, and so reach a good solution sooner.
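The search loop described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how any particular autoML library or platform works: the toy data, the two model families (a majority-class baseline and a one-dimensional k-nearest-neighbours classifier), and the names SEARCH_SPACE and auto_select are all assumptions made for the example.

```python
from collections import Counter
from itertools import product


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def fit_knn(train, k):
    """k-nearest neighbours on 1-D inputs: majority vote among the k closest points."""
    def predict(x):
        nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict


def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


# Hypothetical search space: model family -> (fit function, hyperparameter grid).
SEARCH_SPACE = {
    "majority": (fit_majority, {}),
    "knn": (fit_knn, {"k": [1, 3, 5]}),
}


def auto_select(train, valid, search_space=SEARCH_SPACE):
    """Try every family/hyperparameter combination; keep the best validation score."""
    best = None
    for name, (fit, grid) in search_space.items():
        keys = list(grid)
        # An empty grid still yields one combination: the model with no settings.
        for values in product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = accuracy(fit(train, **params), valid)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


# Toy data (an assumption for illustration): label is 1 when x > 5.
train = [(x, int(x > 5)) for x in (0.0, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0)]
valid = [(x, int(x > 5)) for x in (1.5, 3.5, 4.5, 6.5, 8.5)]

score, name, params = auto_select(train, valid)
print(name, params, score)
```

Real autoML tools search far larger spaces, typically with smarter strategies than exhaustive enumeration, but the shape of the work is the same: fit a candidate, score it on held-out data, and keep the best one found.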
The ultimate value of autoML tools is not to replace data scientists but to offload their routine work and streamline their process. That frees them and their teams to focus on the parts of the process that require a higher level of thinking and creativity. As their priorities change, data scientists need to understand the full life cycle so that they can shift their energy to higher-value tasks and hone the skills that raise their value to their organizations.
The first step in any machine learning initiative is to identify the problem the business has to solve. During problem identification, data science teams evaluate what defines success for the business and determine where machine learning can help it achieve its targets.
In this step, it is vital that data science teams understand the business (and business in general) well. Team members must understand business processes, have expertise in existing and potential markets, know the competitive and regulatory landscape in which the business operates, and be able to navigate the political ecosystem in which the data science program lives.
Such business acumen is not always among the strengths of traditional data scientists -- whose focus has been the mathematical and computer programming aspects of their role -- but it will need to become one. This is an opportunity for a data science team to broaden its composition. The more technical data scientists can coach individuals with more business expertise on the intrinsic value of machine learning. Traditional technical data science teams working jointly with tech-savvy business partners can improve their outreach into (and value to) the business.