As we inch further into the year, I have seen more and more postings for data science positions, especially on LinkedIn and similar job sites. After an expected lull due to current events, companies have settled their budgets and focus. Some of those companies have newer data science positions that they need to fill as soon as possible or in the near future.
There are several reasons for becoming a data scientist. I am going to highlight five main reasons I became a data scientist, and hopefully they align with some of the reasons why you would become one as well.
Like many positions, data science comes with a general set of expected skills, which I will outline below. Of course, there are others, but I will focus on the skills I come across the most at various companies as a data scientist.
Python or R: the choice between Python and R is heavily debated, but ultimately it depends on what the company already uses as its main programming language. Data scientists who work alone, building models and delivering results directly to a stakeholder, often lean toward R. However, in my experience, it has been easier to work cross-functionally with both data engineers and software engineers when using Python. Python is often used for deployment, so it can be easier to use it from the start. The benefit is that in the process of learning data science, you will learn Python or R, which gives you skills that can support you down the road if you choose a different career path, such as software development.
SQL: this is another popular skill for data scientists. Online courses and universities sometimes neglect to stress how widely data scientists use this language. I use it on nearly every project I work on because the dataset is not simply given to you. You have to make your own dataset, and that involves querying your database tables with SQL. Like Python (and to some extent R), SQL is useful not only for data science but for data engineering and data analytics as well.
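To make "building your own dataset" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables, their columns, and every value in them are hypothetical and invented for illustration; a real project would query the company's own database instead.

```python
import sqlite3

# Hypothetical in-memory database; table names, columns, and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'east'), (2, 'west'), (3, 'east');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 30.0), (12, 2, 15.0);
""")

# Build one row per customer by joining the two tables and aggregating orders.
# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL sum into 0.
query = """
    SELECT c.customer_id,
           c.region,
           COUNT(o.order_id)            AS n_orders,
           COALESCE(SUM(o.amount), 0.0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
    ORDER BY c.customer_id
"""
dataset = conn.execute(query).fetchall()
conn.close()

for row in dataset:
    print(row)
```

Each resulting row is one observation (a customer) with features derived from raw tables, which is usually the shape a model needs.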
Business: while this skill is not a programming language, it is still important. Business, which is more of a concept than a tool, is something every data scientist learns. Like SQL, it is not taught in education settings nearly as much as it should be. What I mean by the business is that you need to get used to jumping into situations that are not strictly data science. The business uses data scientists either to make a process more efficient or to find insights that will change the business in the future. Oftentimes, data science education focuses heavily on obtaining the highest accuracy for, say, segmenting different types of customers. It can be great to achieve 98% accuracy, but if you cannot come up with a plan for how you would implement the model and its results afterward, then your model is useless.
You need to know that stakeholders, such as CEOs and other C-suite or senior leaders, will ask what you will do with your results to change the business. In this example, you would want to apply those customer segments to a marketing campaign through targeted emails. Then, you would create a test of some sort, say an A/B test, to see how the emails performed. As you can see, having an extremely accurate model is just one part of the data science and business process. Practicing this business process over and over again is extremely beneficial.
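As a rough illustration of evaluating such an email A/B test, the sketch below runs a two-proportion z-test using only the Python standard library. The function name `two_proportion_z_test` and the conversion counts are hypothetical, and this is one common way to compare two conversion rates, not necessarily the test a given team would choose.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both emails convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up counts: email A converted 120 of 1000 recipients, email B 150 of 1000.
z, p_value = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rates is unlikely to be due to chance alone; whether that difference is large enough to matter is still a business decision.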
Statistics: there was more focus on statistics in school, and it can help solve many problems for a data scientist. Knowing statistics is critical for data scientists, as it is the foundation of machine learning models. Practicing techniques such as analysis of variance or population sampling is useful in several areas of the business, such as marketing campaigns or A/B testing.
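For readers who want to see analysis of variance in practice, here is a minimal sketch that computes the one-way ANOVA F statistic by hand with the standard library. The helper `one_way_anova_f` and the three groups of numbers are hypothetical. In real work you would more likely use a statistics library that also reports a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Made-up spend per customer under three hypothetical marketing campaigns.
campaigns = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat = one_way_anova_f(campaigns)
print(f"F = {f_stat:.2f}")
```

A large F statistic means the campaign averages differ by more than the spread within each campaign would explain, which is the signal you would look for before declaring one campaign better.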