Gartner analysts coined the term “citizen data scientist” to mean “a person who creates and generates models that leverage predictive or prescriptive analytics, but whose primary job function is outside of the field of statistics and analytics.” In some circles, this role has been widely promoted as the solution to help organizations accelerate their ML/AI journey.
But as the saying goes, caveat emptor. While there are some benefits to having citizen data scientists, they are no silver bullet – and they certainly aren’t a replacement for true data scientists.
Organizations in almost every sector are actively working to see how they can leverage AI and ML to accelerate their business and achieve greater outcomes. Yet on average, only 54% of AI models move from the pilot to production phase, according to a new survey by Gartner. There are several reasons behind this; for some companies, the lack of a skilled data science team is one of them.
Skilled professionals cost money: you have to hire the right people with the right skills. So the idea of a citizen data scientist is appealing. It’s the notion that someone could quickly build a model using the tool sets they’re given, without a strong understanding of the data, the background of that data, or how to cleanse the data and pick the right features.
This scenario is appealing but highly unlikely. Someone without that background, without that context-specific understanding, can’t just go and build a model. There’s not going to be a good outcome because fundamentally, they don’t know what they have; they’re just throwing things into the system.
What’s more, we’ve seen a number of examples where even well-trained data scientists have inadvertently contributed to bias and drift issues with ML models. Think of Zillow’s failed iBuying algorithms or Facebook’s terrible photo mislabeling incident. In other words, if even the experts can get it wrong, how can we reasonably expect novices to get it right?
If you go this route, you’re going to wind up with a classic “garbage in, garbage out” problem. Let’s say your organization is a bank, and you’re a business analyst with access to a data warehouse. You have access to people’s income levels, demographics, locations, addresses and so on. The idea that some technology vendors are putting forth is that you can just take this data and throw it into their tool, and it will pick the right algorithm for you and hand you the right prediction.
But what often happens in this process is that no one goes in to ensure the data is correct. You have to make sure you’re doing data cleansing and feature engineering. You have to understand what you have. For instance, a loan application has different elements, such as a street address, a phone number and so on. When you’re doing feature engineering, each piece of information can be a feature.
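To make this concrete, here is a minimal sketch in Python of the kind of basic checks that have to happen before a loan record goes anywhere near a modeling tool. The records, the field names and the rules (such as assuming a 10-digit phone number) are made up for illustration; a real cleansing step would be shaped by what you actually know about your own data.

```python
import re

# Hypothetical loan-application records; field names are illustrative only.
applications = [
    {"income": "52000", "phone": "555-201-3344", "street_address": "12 Oak St"},
    {"income": "-1", "phone": "n/a", "street_address": ""},
    {"income": "", "phone": "5552013344", "street_address": "98 Elm Ave"},
]

PHONE_DIGITS = 10  # assumption: phone numbers have exactly 10 digits


def find_problems(record):
    """Return a list of human-readable data-quality problems for one record."""
    problems = []

    # Income must parse as a number and must not be negative.
    income = record.get("income", "").strip()
    try:
        if float(income) < 0:
            problems.append("income is negative")
    except ValueError:
        problems.append("income is missing or not a number")

    # Strip everything except digits, then check the length.
    digits = re.sub(r"\D", "", record.get("phone", ""))
    if len(digits) != PHONE_DIGITS:
        problems.append("phone number is malformed")

    # An empty or whitespace-only address is treated as missing.
    if not record.get("street_address", "").strip():
        problems.append("street address is missing")

    return problems


def split_clean_and_flagged(records):
    """Separate records with no problems from records that need review."""
    clean, flagged = [], []
    for record in records:
        problems = find_problems(record)
        if problems:
            flagged.append((record, problems))
        else:
            clean.append(record)
    return clean, flagged


clean, flagged = split_clean_and_flagged(applications)
print(len(clean), "clean;", len(flagged), "flagged for review")
for record, problems in flagged:
    print(record, "->", "; ".join(problems))
```

Note that the code only catches the problems someone thought to write a rule for. Deciding which rules matter is the part that requires understanding the data.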
Typically, what a data scientist needs to do is determine which element carries the weight in order to predict the outcomes. And a citizen data scientist is unlikely to be able to do this without a lot of training and heavy lifting.
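As a rough illustration of what “carrying the weight” means, the sketch below ranks a few made-up loan features by how strongly each one correlates with repayment. The data, the feature names and the choice of plain Pearson correlation are all assumptions for the example. Correlation is only a crude first look, not a substitute for the judgment a trained data scientist applies.

```python
from math import sqrt

# Hypothetical, made-up history of past loans: one column per feature,
# plus whether each loan was repaid (1) or defaulted (0).
features = {
    "income_thousands": [30, 45, 60, 80, 100, 120],
    "years_at_address": [1, 7, 2, 9, 3, 8],
    "phone_last_digit": [2, 7, 6, 1, 9, 5],
}
repaid = [0, 0, 1, 1, 1, 1]


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant column carries no information; report zero correlation.
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(feature_columns, outcome):
    """Sort features by the absolute strength of their correlation with the outcome."""
    scores = {name: pearson(col, outcome) for name, col in feature_columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


for name, score in rank_features(features, repaid):
    print(f"{name}: {score:+.2f}")
```

On a sample this small, even a meaningless field such as the last digit of a phone number picks up a nonzero score, which is exactly the kind of signal someone needs the background to question.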
If that person doesn’t understand the data and puts it into a tool without proper data cleansing, your input is garbage, and the system will spit out garbage based on a lack of understanding of the data. The tool alone can’t make the data better.
Even with advanced AI/ML tools, you still need trained data scientists who can curate the data and determine what’s good and bad. You need people who know how to do feature engineering. Otherwise, you’re going to end up with models or algorithms that fail you.
A citizen data scientist is a great aspiration, but it’s not a cure-all, and it’s not a replacement for trained data scientists. To put it another way, how would you feel if you were on a plane and were informed that a “citizen pilot” (or even a student pilot or an avid user of flight simulators) would be flying the plane?
It’s just not that simple. And again, this doesn’t mean that the citizen data scientist isn’t a viable concept. It’s just important to understand that these roles need to be supplements to, not replacements for, data scientists.
It’s tempting to think you can use technology to disrupt or replace the need for certain skilled roles, especially as many industries struggle with skills gaps. However, the technology isn’t yet at a level where AI/ML projects can just be handed off to citizen data scientists.
Despite all the talk about no-code or low-code software development, the industry has come to understand how realistic this approach is and where it works. It doesn’t work when you need to develop serious, mission-critical software. It is even more far-fetched, then, to have only citizen data scientists running your AI/ML.
A deep understanding of your data is critical to AI/ML success. This is what professional data scientists bring, along with the necessary contextual insight that determines which data is good and useful and which isn’t. While it’s certainly a positive to bring new voices and ideas into the field, this doesn’t eliminate the need for skilled, professional data scientists. You can’t cut costs by not having professional data scientists in place.
Victor Thu is president of Datatron. Throughout his career, Victor has specialized in product marketing, go-to-market and product management in C-level and director positions for companies such as Petuum, VMware and Citrix.