Artificial intelligence is a data hog; effectively building and deploying AI and machine learning systems requires large data sets. “The development of a machine learning algorithm depends on large volumes of data, from which the learning process draws many entities, relationships, and clusters,” says Philip Russom of TDWI. “To broaden and enrich the correlations made by the algorithm, machine learning needs data from diverse sources, in diverse formats, about diverse business processes.”
At the same time, AI itself can be instrumental in identifying and preparing the data needed to increase the value of AI-driven or analytics-driven systems. While companies have needed cadres of data scientists or high-level analysts to put AI and machine learning algorithms in place, AI itself may ultimately help automate such roles to a large degree.
“A new generation of enterprise analytics is emerging, and it incorporates some degree of both automation and contextual information,” according to Tom Davenport and Joey Fitts, writing in Harvard Business Review. AI-enhanced analytics systems “can prepare insights and recommendations that can be delivered directly to decision makers without requiring an analyst to prepare them in advance.”
Business intelligence analysts and quantitative professionals “will still have important tasks to perform, but many will no longer have to provide support and training to amateur data users,” according to Davenport and Fitts. “Small to mid-size businesses that haven’t been able to afford data scientists will be able to analyze their own data with higher precision and clearer insight. All that will matter to organizations’ analytical prowess will be a cultural appetite for data, a set of transactional systems that generate data to be analyzed, and a willingness to invest in and deploy these new technologies.”
Of course, the ability to effectively automate data science tasks depends on industry and circumstances. As Matt Przybyla, senior data scientist and author at Towards Data Science, points out, AI and machine learning initiatives often still need trained human guidance, especially if the output is critical to the tasks at hand. “Sure, use an automated data science platform if you already have a data analyst on your team. Or, use the automated solution for predictions that are not harmful if incorrect. Categorizing clothes incorrectly is not the worst thing that can happen, but when you are in the health or finance industry and you classify a disease or large sums of money incorrectly, the harm is undeniable.”
While automated AI data science tools or platforms may be easy and powerful, they also may leave businesses with unanswered questions. “Imagine you are not a data scientist and have not had an academic background in the various types of machine learning algorithms,” Przybyla continues. “You will have to explain these platform model results and implement the suggestions or predictions with regards to your company’s integrations, which could prove to be time-consuming and difficult.”
There may be ways to automate various pieces of data science roles, but the skills category that will still be essential is that of data engineer. There are many tasks required to source, manage, and store data in which data scientists don’t necessarily want to get involved. “To succeed with AI, companies should have an automation environment with reliable historian data,” a McKinsey report observes. Then, companies “will need to adapt their big data into a form that is amenable to AI, often with far fewer variables and with intelligent, first principles–based feature engineering,” the study’s authors, led by Jay Agarwal, state. Data engineering is needed to produce “smart data” that improves predictive accuracy and aids in root-cause analysis. This, along with equipping staff with the right skills, can provide services that can help increase revenues by up to 15 percent, they relate.
“The most important role, the most important first hire, is a data engineer,” says John Mosch, senior manager of analytics, business intelligence, and data science at Cisco. “Without data, there’s nothing to do. These are the people who are going to make the data available and usable. They’re going to collect it and arrange it into a form that can be ultimately useful for analytics that ultimately is used by data scientists. A data scientist can’t find anything, can’t do anything until there’s a good set of data to work from.”
Data scientists and high-level data analysts will continue to be in demand, and are critical to helping enterprises design and test algorithms and data needed to predict trends, automate processes, understand customers, and engage with customers. However, the amount of data flowing into and through enterprises is overwhelming, as are demands for new algorithms and capabilities — beyond what a data scientist can accomplish. AI is opening the door to better and more accessible AI.