For data scientists involved in machine learning production, almost half of the work involves re-coding models from Python/R to another language or vice versa. According to Anaconda’s State of Data Science 2021 report, re-coding models was the biggest roadblock to production for those involved in infrastructure.
The survey, which involved more than 4,000 respondents from 140 countries, also found that meeting IT security standards is the top blocker for data engineers, DevOps, product managers, and system admins. For machine learning engineers, the biggest roadblock to getting models to production is access to compute resources.
From working in silos to cleaning massive data sets on legacy infrastructure, the most promising job of the century seems to be losing its charm. There are many reasons why data scientists are unhappy or decide to quit, but there are also many that just piss them off.
Data scientists are expected to do magic with their use of data and just solve all the issues of the company and its sales. This requires long working hours, extremely short deadlines and clearly no scope for error. This is not really a healthy environment for the ‘scientist’ to grow and mature.
According to the Anaconda survey, 25% of respondents said that a lack of data literacy among decision-makers at their organisation limited their team’s ability to impact business decisions. This lack of data literacy at the executive level ultimately hurts the ability to make data-driven business decisions. The data scientists think that they are brought in to write smart machine learning algorithms and create analytic reports. But many times, the company needs a chart to present in board meetings each day. This gap leads to frustration on both sides where the company doesn’t see value being driven ‘quickly enough’ and the data scientist just becomes unhappy in their role.
Many data scientists are hired by companies where there is no setup for analysis at all. The employee needs to hire more people and build the infrastructure from scratch. This makes them more of a product manager, and they don’t really get to work with data and build models. This issue crops up because many companies fail to hire senior, experienced data practitioners before hiring juniors. This is the perfect recipe for an unhappy relationship for both parties.
Many of the inflated, half-baked promises attached to being a data scientist have fallen away. Employees have come to realise that the glamour that drew them to data science doesn’t actually exist, and that it is just another job.
For data to actually show results, it is essential that data scientists and the different departments of a company collaborate closely. But it is quite strange that most of the time, data scientists actually work alone in silos. What data scientists find most difficult is politics. Each team across the organisation needs to be data-centric and collect data in a proper way for algorithms to work. This is not achievable immediately, and many departments also feel that, when asking for data, the data scientists are ‘after them.’ Also, with a lot of politics in the corporate culture, it is difficult for data teams to bring policies into place.
Many companies still rely on legacy systems and do not have proper machine learning tools. The entry-level data scientist comes from an academic world and has worked on platforms like Kaggle and GitHub, and on other open-source projects. They want to work on high-end projects, but in reality, they spend a large part of their time making sense of the data. The Anaconda report states that machine learning engineers find compute resources to be the most significant roadblock to deploying models to production.
A large part of a data scientist’s job is monotonous and requires cleaning and processing raw data. Almost 80 per cent of his or her time is spent doing that. For many companies, the work in fact has to start with digitising data from handwritten records or paper files.
Consider the sales manager who takes all the credit for cracking amazing deals throughout the quarter with the help of data analysis done by a team of fewer than five data scientists, something which took them months to work on. The other side of it involves non-technical executives making many assumptions about the skills of a data scientist if he or she is not an expert in even one of Spark, Hive, Pig, Hadoop, SQL, MySQL, Neo4j, Python, R, TensorFlow, Scala, PyTorch, A/B testing, NLP or anything else in machine learning. Frustrating, isn’t it?
Even with the long working hours, the digging through data, and the office politics to manage, people still find it hard to believe that a data scientist can burn out in the ‘sexiest’ job of the century.