I recently spoke to Bárbara Barbosa, leader of the Creditas Data Science team, about how hard it was for her to find good candidates for the Data Engineering team. The data field is highly competitive: the market is full of inexperienced candidates asking for high salaries, and the candidates who would actually be good fits for our team want to become data scientists. Since then, whenever I talk about this matter, I say:
Well, I don’t need to say that I’m not the only one who thinks so, do I? See this question on Quora about whether we are heading toward a talent bubble in Data Science.
The truth is that to be a true data science talent, you need to be that fantastic creature everyone keeps talking about: someone with hacking skills, math and statistics knowledge, and business expertise. A true unicorn!
Yes, unicorns do exist. True data science talents do exist. People who master everything needed to impact the business with data, from framing the business problem to delivering the production model, really do exist. But they did not start their careers that way. It is simply not possible to gather enough experience for such a complex role after only a few years in the field.
Studying hard, staying dedicated, and taking online courses on all kinds of algorithms and techniques seem to be the standard strategy for becoming a unicorn. But the truth is that you need much more than that to make a real difference in solving real problems. You need experience, and experience only comes with time and work.
In her "Data Science is different now" post, Vicki Boykis, a data scientist since 2012, says that the current market is flooded with candidates for junior data science positions, with as many as 100 applicants competing for a single opening. She advises those who wish to become data scientists to follow this career plan:
Then, regarding this PwC report, she says:
And that is exactly what we see in applications for Data Scientist and Data Engineer vacancies at Creditas: DS vacancies are much more competitive, while we have many more open DE positions.
Even though unicorns are extremely valuable to our teams, we know that relying on them is not the best way to work. Times have changed. Creditas now wants to specialize its Data Science and Data Engineering teams, which improves the efficiency of both.
It is often said that data scientists spend 70% to 80% of their time, or even more, just preparing data. That is a task that clearly belongs to a Data Engineer.
Other examples are organizing data in the data lake, capturing data through crawlers, and even packaging and deploying the trained models (a role nowadays known as machine learning engineer).
We do not need unicorns who can solve problems on their own, because that model is neither sustainable nor scalable. What we need today are data engineers who can build infrastructure that maximizes the time data scientists spend on data analysis and model training.
Data Engineering is part of Data Science, focused mainly on the technological and analytical infrastructure needed to collect and organize data and to enable the necessary analyses. To explain this better, I have created a new version of the Venn diagram of data scientist skills that breaks "hacking skills" down into "programming" and "database," the skills that are specific to and present in the Data Engineering routine.
Looking at the diagram, we see that the only skill not used in Data Engineering is math and statistics. What we do every day is code the flows that move, prepare and transform data, making data access across the organization simple and intuitive.
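To make "flows that move, prepare and transform data" a little more concrete, here is a minimal extract-transform-load sketch in Python using only the standard library's `sqlite3` module. The table names (`raw_requests`, `requests_clean`), the columns, and the cleaning rules (including the assumed dd/mm/yyyy source date format) are hypothetical and chosen only for illustration; this is not Creditas's actual pipeline or technology stack.

```python
import sqlite3
from datetime import datetime

# Hypothetical schema, for illustration only: a raw table as it arrives from a
# source system, and a clean table that analysts and data scientists can query.
RAW_DDL = (
    "CREATE TABLE IF NOT EXISTS raw_requests "
    "(customer_id INTEGER, city TEXT, amount TEXT, created_at TEXT)"
)
CLEAN_DDL = (
    "CREATE TABLE IF NOT EXISTS requests_clean "
    "(customer_id INTEGER, city TEXT, amount REAL, created_on TEXT)"
)


def extract(conn: sqlite3.Connection) -> list[tuple]:
    """Move: read every raw row from the (hypothetical) source table."""
    return conn.execute(
        "SELECT customer_id, city, amount, created_at FROM raw_requests"
    ).fetchall()


def transform(rows: list[tuple]) -> list[tuple]:
    """Prepare and transform: drop incomplete rows and normalize the rest."""
    clean = []
    for customer_id, city, amount, created_at in rows:
        if amount is None:
            continue  # incomplete record: skip it instead of guessing a value
        clean.append((
            customer_id,
            city.strip().title(),  # " são paulo " -> "São Paulo"
            round(float(amount), 2),  # text -> number
            # assumed source format dd/mm/yyyy -> ISO 8601 (yyyy-mm-dd)
            datetime.strptime(created_at, "%d/%m/%Y").date().isoformat(),
        ))
    return clean


def load(conn: sqlite3.Connection, rows: list[tuple]) -> None:
    """Load: write the cleaned rows into the analysis-ready table."""
    conn.execute(CLEAN_DDL)
    conn.executemany("INSERT INTO requests_clean VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn: sqlite3.Connection) -> int:
    """Run extract -> transform -> load and return the number of rows loaded."""
    rows = transform(extract(conn))
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(RAW_DDL)
    conn.executemany(
        "INSERT INTO raw_requests VALUES (?, ?, ?, ?)",
        [
            (1, " são paulo ", "1500.5", "02/05/2019"),
            (2, "CURITIBA", None, "03/05/2019"),
            (3, "belo horizonte", "320", "10/05/2019"),
        ],
    )
    print(run_pipeline(conn))  # 2 rows loaded; row 2 has no amount
    for row in conn.execute("SELECT * FROM requests_clean"):
        print(row)
```

Keeping extract, transform and load as separate functions makes each step easy to test in isolation, and the data scientist only ever has to query the clean table.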
As I said before, the need to deliver faster requires our teams to specialize, and the use of mathematics and statistical models has become a well-defined boundary between DE and DS responsibilities. This is probably a trend for the market as a whole.
It is also worth noting that a Data Engineer is not a Data Scientist who doesn't know mathematics; this is a matter of specialization, not qualification. DE focuses on infrastructure so that DS can focus on research and modeling. A good DE team unlocks the entire company’s productivity in data use, including the DS team's. (This paragraph was suggested by Jéssika Darambaris, thank you!)
In short, everything related to programs that manipulate databases (create, migrate, transform, convert, etc.), such as:
In the list above, the technologies in brackets are those we use here at Creditas.
Professionals from different backgrounds can be valuable to our team in many ways, and the most common backgrounds are:
If you found this article interesting and would like to apply for a vacancy, follow the link: https://jobs.kenoby.com/careers-creditas