What is Data Science?
Data science seems really exciting, but first, let us get our basics clear! What actually is data science? I’m not going to bore you with a long definition, so here’s a short explanation:
Data Science is an amalgamation of Statistics, Computer Science, and specific domain knowledge.
Statistics and computer science are the generic fundamentals that can be perfected by studying and a little bit of practice. It is the domain knowledge that takes time, research, and effort to gain.
You don’t need to master each vertical but having a decent grip on all will help you in the long run.
Data science is quite a big field in itself. It ranges from simple data reporting activities to advanced predictive modeling using artificial intelligence. As you can observe by looking at the data science spectrum below, the higher the complexity, the higher the business value.
Data science is thrilling! Now, let’s look at the actual role of a data scientist.
What does a data scientist role look like?
Caution: These terms are loosely used in the industry. The exact role can depend on the maturity of your organization in data initiatives.
The role of a data scientist is fairly expansive and will depend largely on the type of project that you are working on. Here, we will discuss the general lifecycle of a data science project.
Understanding the problem statement – Seems really simple, right? Believe me, it isn’t. Understanding the problem statement can make or break the entire project. At this stage, a team of data scientists and the concerned team go over the objectives and expected requirements of the project. This step requires good communication and stakeholder management skills. A good data scientist won’t hesitate to spend an ample amount of time on it. Once the problem statement is clear, the data scientist can move on to collecting the data.
Gathering Data – Once the requirements are obtained and the hypothesis is formed, the data scientist proceeds to gather the needed data. The data can come from various sources, such as the company data warehouse, web scraping, and so on.
Data Cleaning – This is typically the most time-consuming part of a data science project. It may take up to 80% of your time. Here, the data scientist will be munging, manipulating, and wrangling the data. The time and effort are worth it, since the health of your data will be reflected in the health of your output model. During this stage, the data scientist deals with outliers and missing values, corrects data types, and performs many other operations. This is not the most exciting step, but it is the most essential one.
Exploratory Data Analysis (EDA) – This is the step where the data scientist gets a “feel” for the data. At this stage, you analyze a single feature or multiple features in the dataset and check how they behave. You may also analyze the relationships between features. You can expect a lot of data visualization at this stage. Be ready to gain some crucial insights that will help you in the other steps.
Feature Engineering – Feature engineering is not so much a step as an art. It is an iterative process of going through the features one by one and applying operations to improve the performance of the model. For example, you can combine some of the strong features and check whether the model improves. It will require a lot of trial and error.
Model Building – Model building in itself is a relatively fast step, but planning is important. Do you want a model with high accuracy or a model that can report the importance of its features? You will need to think through and select your strategy for model building and evaluation.
Deployment – Once you have built and evaluated your model, it is finally time to deploy it in the real world. This step typically requires the data scientists to work with data engineers or machine learning engineers.
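To make the data cleaning step above a little more concrete, here is a minimal sketch of three operations it mentions: correcting data types, dealing with outliers, and filling in missing values. The records are made up for illustration, and the choices here (the 1.5 × IQR rule for outliers, the median for imputation) are assumptions on my part, not the only way to do it. In practice you would most likely use a library such as pandas; I have stuck to the Python standard library to keep the example self-contained.

```python
from statistics import median, quantiles

# Hypothetical raw records: ages arrive as strings, some are missing,
# and one value ("450") is an obvious data-entry error.
raw_ages = ["34", "29", None, "41", "", "38", "450", "31", "27", "36"]


def to_number(value):
    """Convert a raw string to float, returning None for missing values."""
    if value is None or value.strip() == "":
        return None
    return float(value)


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences using the common k * IQR rule."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


# Step 1: correct the data type (string -> float, blanks -> None).
typed = [to_number(v) for v in raw_ages]

# Step 2: flag outliers among the observed values and treat them as missing.
observed = [v for v in typed if v is not None]
low, high = iqr_bounds(observed)
no_outliers = [v if v is not None and low <= v <= high else None for v in typed]

# Step 3: impute every missing value with the median of the remaining values.
fill = median(v for v in no_outliers if v is not None)
clean_ages = [fill if v is None else v for v in no_outliers]

print(clean_ages)
```

Whether you drop, cap, or impute an outlier depends on the problem statement, which is one more reason the first step of the lifecycle deserves the time you give it.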
Problems solved by data scientists
As I discussed in the earlier section, the role of a data scientist is relevant to all fields and departments, and so are its applications. In this section, I will discuss a couple of problem statements that a data scientist works on.
Build a model to predict which transaction is fraudulent.
Requires real-time decisions on fast-flowing data.
It is a complex problem because more than 99% of transactions are not fraudulent, so the data is heavily imbalanced.
It has a direct impact on the bottom line of the organization.
A vast amount of past customer behavior data is used.
Use vehicle images from accidents to assess the extent of the damage for an insurance company.
Extracting damage information from images is a highly complex task.
It requires automation of the task.
Automation will help the current team to assess the damage better.
A vast amount of image data is required.
These are just a few problem statements; the problems you work on can vary according to the data maturity of the organization.
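Coming back to the fraud example for a moment, here is a small sketch of why the imbalance makes the problem hard to evaluate. The numbers are made up: 1,000 transactions of which 10 are fraudulent, a baseline that labels everything as legitimate, and a hypothetical model that catches 7 of the 10 frauds while raising 20 false alarms.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives, where 1 means fraud."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(actual, predicted):
    """Return accuracy, precision, and recall for the fraud class."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Made-up data: 10 fraudulent transactions followed by 990 legitimate ones.
actual = [1] * 10 + [0] * 990

# Baseline: predict "not fraud" for every transaction.
always_legit = [0] * 1000

# Hypothetical model: catches 7 frauds, misses 3, raises 20 false alarms.
some_model = [1] * 7 + [0] * 3 + [1] * 20 + [0] * 970

baseline = metrics(actual, always_legit)
model = metrics(actual, some_model)

print("baseline:", baseline)
print("model:   ", model)
```

The baseline scores higher on accuracy even though it never catches a single fraud, which is why precision and recall on the fraud class are usually the more useful numbers to look at for a problem like this.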
Data Science-based roles you must know
Data Scientist – Works on complex and specific problems to bring non-linear growth to the company. For example, building a credit risk solution for the banking industry or using images of vehicles to automatically assess the damage for an insurance company.
Data Engineer – Implements the outcomes derived by the data scientist in production by using industry best practices. For example, deploying the machine learning model built for credit risk modeling on banking software.
Business Analyst – Helps in running the business smoothly by assisting the management to make data-driven decisions on a day-to-day basis. This role would be communicating with the IT side and the business side simultaneously.
Again, there are a lot of other roles under the data science umbrella, such as data analyst, statistician, data analytics manager, MIS professional, BI professional, etc. Make sure you do your due diligence before jumping into this space.
I’m excited! What do I need to get started in Data Science?
Getting Started with Data Science and Python: The start of your journey to becoming a data scientist! Understand what a data scientist does, learn the various terms associated with data science, and start getting acquainted with the Python programming language.
Statistics and Mathematics: The backbone of data science. Some of the key concepts you’ll cover are probability and inferential statistics, and you’ll get the hang of how to perform exploratory data analysis (EDA). This will also include the basics of linear algebra (another core machine learning topic).
Machine Learning Basics: Welcome to the world of machine learning! This section is all about introducing you to the basic machine learning algorithms and techniques, including linear regression, logistic regression, decision trees, Naive Bayes, and support vector machines (SVM), among others.
Ensemble Learning: Time to dive deep into advanced machine learning topics. Understand what ensembling is, learn the different ensemble techniques, and start working on datasets to gain hands-on practical experience.
As I mentioned earlier, you can learn all of this in comprehensive detail as part of the BlackBelt+ program.
Tools you must master for Data Science
Microsoft Excel – Excel prevails as the easiest and most popular tool for handling small amounts of data. The maximum number of rows a sheet supports is just a shade over 1 million (1,048,576), and one sheet can handle only up to 16,384 columns. These numbers are simply not enough when the amount of data is big.
SQL – SQL is the standard query language for relational database management systems and has been around since the 1970s. Relational databases were the primary database solution for a few decades. SQL still remains popular, but there’s a drawback: a traditional relational database can become difficult to scale as it continues to grow.
Python – This is one of the most dominant languages for data science in the industry today because of its ease of use, flexibility, and open-source nature. It has gained rapid popularity and acceptance in the ML community.
Tableau – It is amongst the most popular data visualization tools in the market today. It is capable of handling large amounts of data and even offers Excel-like calculation functions and parameters. Tableau is well-liked because of its neat dashboard and story interface.
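To give you a taste of how two of these tools work together, here is a minimal sketch that runs a SQL query from Python using the built-in sqlite3 module. The table name, column names, and rows are all made up for illustration, and an in-memory SQLite database stands in for whatever database your organization actually uses.

```python
import sqlite3

# An in-memory database, used here only so the example is self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical table of customer transactions.
conn.execute("CREATE TABLE transactions (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [
        ("alice", 120.0),
        ("bob", 35.5),
        ("alice", 80.0),
        ("carol", 15.0),
        ("bob", 64.5),
    ],
)

# A typical reporting query: total spend per customer, highest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM transactions GROUP BY customer ORDER BY total DESC"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

SQL does the aggregation close to where the data lives, and Python picks up the result for whatever analysis or modeling comes next.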
Soft Skills for Data Science
Problem-Solving skills – The knowledge of statistics and computer science can be gained by studying, but it is domain knowledge along with problem-solving skills that will help you go a long way. A majority of companies start their data science recruitment with problem-solving tests. You don’t need to be a master at it, but a curious mind will help you in building this skill.
Structured Thinking – The ability to structure your thoughts and map out each of them is certainly a must-have skill. Structured thinking is put to use in the initial steps of the project, where the problem statement and hypothesis are formulated.
Storytelling Skills – A key skill that all the data science and analytics professionals must have is the ability to express the data in a format that is understandable by the stakeholders – a story. It is this step that requires creativity and human skills.
I’m learning. But how do I get recognized by Data Science recruiters?
Build your GitHub profile – GitHub is the place where you keep your projects. Other people can go through your project, add improvements, and so on. It is a great place to get recognized by key people and network with them. Start your project and upload it to GitHub. This will help you in building a strong foundation.
Keep updating your resume – It is a natural human tendency to aim for perfection, but it can be harmful. Instead of adding Python, machine learning, and SQL together to your resume with half-baked knowledge, add skills one by one as you become proficient in each. For example, add Python when you are comfortable with it and only then move on to machine learning.
Participate in competitions – Data science competitions are a surefire way to improve your performance as a data scientist. Although it may take you a while to get adjusted, it will help you in the long run. You can go to the DataHack platform, pick a problem statement of your choice, and get started. Recruiters love candidates who have built their knowledge through practical applications.
Start writing articles – If you have a knack for data science and a passion for writing then what is a better way to express yourself than writing articles? Article writing helps you learn all the hard technical concepts and turn them into easy-to-grasp topics. Article writing is another great way to help you catch the eyes of potential recruiters.
Bonus Tip – Find a mentor and personalize your goals
Finally, let us delve into something you must keep in mind before starting your data science journey. Each and every one of us is unique and comes from a different background. All the above points must be applied in a personalized manner to reap the maximum benefits.
For example, let’s say you have been working in the IT sector for 5 years and you wish to move internally to the newly formed data science team. It turns out that the data science team mostly works on NLP. How will you go about it? You will need to craft your own personal goals.
The whole process might sound easy to follow in a linear fashion (learn Python -> machine learning -> deep learning and so on), but that’s not the case in a real-world scenario. You need that last piece of the puzzle to master data science – a mentor.
A mentor can customize your goals and your learning path and make sure that you gain the industry exposure that is relevant to you. And this is how Analytics Vidhya’s flagship course, BlackBelt+, is crafted.
BlackBelt+ consists of all the courses that you will require to master the art of data science, from basic Python, SQL, and Excel to advanced machine learning and deep learning techniques. What’s the best part? You will always be connected to your mentor. The mentor will devise a learning path according to your own goals and learning objectives.
Following this approach won’t just help you learn; it will help you become an industry-leading professional in what has been termed the “Sexiest Job of the 21st Century”.
Data science is evolving at lightning speed, and that’s what makes the role of a data scientist so exciting. In this article, we discussed some of the key points you must know before delving into this thrilling profession. I hope you’ll have much more insight into the role after going through all the sections.
Start your journey by perfecting the basics, and then move on to advanced activities while building your profile at the same time so that you get recognized by potential recruiters. Make sure that you find the right mentor who will be able to guide you on the right path. If you want to become an industry-ready professional, you can connect with mentors at Analytics Vidhya as part of the flagship course BlackBelt+.