Machine Learning Process

If you would like to get a general introduction to Machine Learning before this, check out this article:

Now that we understand what Machine Learning is, let us learn how it is applied to solve a problem.

This is the basic process used to apply machine learning to any problem:

The first step to solving any machine learning problem is to gather relevant data. It could come from different sources and in different formats: plain text, categorical, or numerical. Data gathering is important because the outcome of this step directly affects the nature of our problem.

In most cases, data is not handed to us on a silver platter, all ready-made; the data we have decided is relevant may not be available right away. It is very much possible that we may have to perform some sort of exercise or controlled experiment to gather data we can work with. We must also make sure the data we are collecting comes from legitimate and legal processes, such that all the parties involved are well aware of what is being collected.

Let us, for the purpose of this article, assume that we have gathered data about cars and that we are trying to predict the price of a new car with the help of machine learning. 
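
If this were a real project, the first step might look something like the snippet below. It is only a sketch: the file name cars.csv and the idea that the data sits in a single CSV are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical file; in practice the data could come from scraping,
# an API, a database dump, or a controlled experiment.
cars = pd.read_csv("cars.csv")

print(cars.shape)    # number of cars (rows) and attributes (columns)
print(cars.head())   # a quick peek at the first few records
print(cars.dtypes)   # which columns are numerical and which are text/categorical
```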

Now that we have gathered data that is relevant to the problem at hand, we must bring it to a homogeneous state. The present form of our data could include datasets of various types, maybe a table made up of a thousand rows and multiple columns of car data, or maybe pictures of cars from different angles. It is always advisable to keep things simple and work with data of one particular type; that is, we should decide before we start working on our algorithm whether we want to work with image data, text data, or video data if we are feeling a little too adventurous!

Like every computer program, Machine Learning algorithms only understand 1s and 0s. So in order to run any such algorithm, we first have to convert the data into a machine-readable format. It simply won't understand if we put on a slideshow of our pictures! We can go with any type of data (numerical, image, video, or text) but we will have to configure it so that it is machine-understandable. We make sure this happens by Encoding the data, a process in which we take all the types of data and represent them numerically.
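
As a rough illustration of what encoding can look like in practice, here is a minimal sketch that one-hot encodes a couple of made-up categorical columns; the column names and values are assumptions, not part of any real dataset.

```python
import pandas as pd

# Tiny made-up cars table, just to show the idea of encoding.
cars = pd.DataFrame({
    "engine_power": [110, 150, 90],
    "color":        ["red", "blue", "red"],
    "car_type":     ["sedan", "suv", "hatchback"],
})

# One-hot encoding: every categorical value becomes its own 0/1 column,
# so the whole table ends up purely numerical.
encoded = pd.get_dummies(cars, columns=["color", "car_type"])
print(encoded)
```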

For a simple and comprehensible introduction to Data Preprocessing and all the steps involved, check out this article:

Before we start building a Machine Learning model, we have to first identify our features and decide on our goal. Features are the attributes of our data which tell us about the different entities in the data. For instance, we could have a huge dataset about cars and want to predict the price of a new car using machine learning. With the cars being the entities, the features in this case might be the engine power, mileage, top speed, color, seating capacity, type of car, and so on. The goal, or the Target variable, in this case would be the price of the car.
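
In code, separating features from the target is usually just a matter of slicing the dataset. The sketch below uses a tiny made-up cars table; the column names and numbers are illustrative only.

```python
import pandas as pd

# Made-up cars data with a "price" column acting as the target variable.
cars = pd.DataFrame({
    "engine_power":     [110, 150, 90],
    "mileage":          [18.5, 12.0, 22.3],
    "seating_capacity": [5, 7, 4],
    "price":            [15000, 32000, 9000],
})

X = cars.drop(columns=["price"])  # features: the attributes describing each car
y = cars["price"]                 # target: the value we want the model to predict
```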

When we work on any machine learning problem, we always split the dataset that we have into a Training set and a Test set, usually a 70/30 or 80/20 split respectively. The Training set, as the name suggests, is used to train the model. When we “train” the model, it tries to understand how all the features of the dataset form the target variable (in the case of supervised learning), or the relationships and correlations between all the features (in the case of unsupervised learning). After this, the Test set is used to find out how well the model has understood the data.
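
Continuing with the X and y from the sketch above, a typical 80/20 split could be done like this; scikit-learn's train_test_split is one common way to do it, and the exact ratio is a judgment call.

```python
from sklearn.model_selection import train_test_split

# Hold back 20% of the records so the model is tested on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```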

After transforming the data so that it is clean and workable, we get a better idea of the solution we will try to implement to solve the problem. This is because it is actually the data that decides what we can and cannot use.

Say we want to build a chatbot. The chatbot will answer according to the user's queries. So the first step in any conversation will be the chatbot trying to identify the intent of the user, and there is our first machine learning problem: Intent Classification.

This problem requires us to use a particular type of data: text-based data. The machine learning algorithm we choose must be a classification algorithm, that is, one that assigns new input data to a certain label class based on the data it has already seen. Before this step, of course, the text from the user will be encoded and go through all the necessary data preprocessing steps, and only then will it be fed into the machine learning algorithm. Although we have to be careful in selecting our machine learning algorithm, it is good to explore the available options and experiment with various appropriate algorithms before selecting the final one; it is considered a best practice anyway.
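
To make the idea concrete, here is a minimal, hypothetical sketch of an intent classifier: a handful of made-up queries and intent labels, encoded as word counts and fed into a simple classifier. None of this is a production chatbot; it only shows the shape of the problem.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up user queries and their intent labels, purely for illustration.
queries = [
    "what time do you open",
    "when are you open on weekends",
    "i want to cancel my order",
    "please cancel the order i placed",
]
intents = ["opening_hours", "opening_hours", "cancel_order", "cancel_order"]

# Encode the text as word counts, then fit a simple classifier on top.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(queries, intents)

print(model.predict(["can i cancel an order"]))  # expected: ['cancel_order']
```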

A Cost Function, in a nutshell, is a mathematical function which gives out cost as a metric, and as you may have heard, there is a cost associated with every decision we take.

This function is used to quantify the penalty corresponding to every step in a procedure. In terms of an optimization problem, we must work on minimizing this cost value.

Let us go through an example. Suppose you are climbing down a cliff. At any point you have several paths you could take to eventually reach the bottom, but you will naturally avoid the ones that take you back up, and you will want to spend as little time and effort as possible.

If we associate going up with a penalty, or a cost, we will be increasing the total cost (in terms of time and effort) every time we go up. So we can keep time and effort as factors if we were to design a mathematical function to quantify this cost metric.

Another example: suppose you are on a road trip, driving from Place A to Place B. Again, we have several paths to reach B, but we will want to get there in as little time as possible and without spending too much on fuel.

If we associate this situation with cost, we will have a high cost if we neglect the two points mentioned above. Here we can keep time and gas money as the factors making up our cost function, and judge any path we take by it.

Any machine learning algorithm must reach an optimal state for it to function properly. A Cost Function helps us determine whether or not our model is at that optimal state. That optimal state is found by the model continuously comparing its hypothesis value to the original value in the training set. Woah… back up! What!? Don't worry, we will go through all the concepts carefully!

Behind any machine learning model is essentially a mathematical function which explains the role the various features in the data play in forming either the target variable or the correlations between different features.

As mentioned before, during training the machine learning model tries to understand how different combinations of values of the training data features form the corresponding target variables. To understand this better, let us take one training record: training essentially means taking all the features of this record and somehow mapping them to this record's target value. A brilliant example would be the cars dataset we were talking about earlier. Notation-wise, the features are taken as X and the Target variable is taken as Y. During this process, the training data is fed into a learning algorithm which is chosen based on the problem we are trying to solve. It could be a classification problem, a regression problem, or maybe something else entirely. It is the job of this learning algorithm to output the Hypothesis Function.

For a two-variable problem, our Hypothesis function could look like this: hθ(x) = θ0 + θ1x1 + θ2x2. All the θ values are parameters, or weights, which are chosen such that we get an estimate closest to the corresponding Target value for each record.

The Hypothesis Function then takes in the features from each training record and tries to estimate the corresponding target value. This function could be a simple linear function or something more complex; it really depends on the data and the type of algorithm being used. And because it is an estimator function, the output values are not expected to be exactly equal to the target values, at least not on the first attempt. Let us take our cars dataset once more: if we put a learning algorithm to work on this dataset and train it using the features, we will get an estimate of the price of each car in the dataset. Now, as this is a training dataset, we already have the price of each car as the Target variable.
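
As a toy illustration of what "estimating" means here, the sketch below applies a linear hypothesis with hand-picked θ values to one made-up car record; every number in it is an assumption chosen for demonstration.

```python
import numpy as np

# One made-up training record: [1 (for θ0), engine power, mileage].
x = np.array([1.0, 110.0, 18.5])

# Hand-picked parameters θ0, θ1, θ2; training is about finding better values.
theta = np.array([2000.0, 120.0, -50.0])

# Linear hypothesis: hθ(x) = θ0 + θ1*x1 + θ2*x2
estimated_price = theta @ x
print(estimated_price)   # the model's guess at this car's price
```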

This is where the Cost Function comes into play. We want the difference between the estimated value and the actual Target value present in the training data to be as low as possible; only then can we say our model is a strong one, meaning that it will give out the correct Target value, or at least a value which is very, very close to it, for a particular training record. So this becomes a minimization problem. The difference is what is called the cost, and the function we minimize is what is called the cost function. There are several ways to achieve a state of minima. We could simply minimize the difference between the estimated value and the target value over the whole training set, or we could take the squared difference, or maybe some other variation to achieve the same thing. One of the most widely accepted, and quite reasonable, cost functions is this one, which you will stumble upon very easily if you are reading up on machine learning algorithms:

J(θ) = 1/(2m) × Σ (hθ(x(i)) − y(i))², with the sum running from i = 1 to m.

This function works well for most regression problems. Yeah, I know, I know, I said I would keep it simple and not scare you with weird equations. Worry not, we are not going to fill up a giant chalkboard with formulas, or formulae if you were to be so formal. Let me give you a quick explanation and make everything crystal clear.

J(θ) – Cost Function
m – Number of training records
hθ – Hypothesis Function
x(i) – the ith training record
hθ(x(i)) – Hypothesis Function value for the ith training record
y(i) – the ith target value
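
Translated into code, the same cost function might look like the sketch below; the handful of training records are made up just so there is something to compute.

```python
import numpy as np

def cost(theta, X, y):
    """Squared-error cost: J(θ) = 1/(2m) * Σ (hθ(x(i)) - y(i))²."""
    m = len(y)
    predictions = X @ theta      # hθ(x(i)) for every training record at once
    errors = predictions - y     # difference between estimate and actual target
    return (errors ** 2).sum() / (2 * m)

# Three made-up records; the first column of 1s pairs with θ0.
X = np.array([[1.0, 110.0], [1.0, 150.0], [1.0, 90.0]])
y = np.array([15000.0, 32000.0, 9000.0])
theta = np.array([2000.0, 150.0])

print(cost(theta, X, y))   # a single number: the cost for this choice of θ
```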

If we plot the hypothesis as a trendline over the data, the main objective of minimizing the cost function is to get a line which covers most of the Target values, or at least comes as close to those points as possible. This is why we calculate the differences and write a cost function to reduce them.

And this is not a one-time process; it is an iterative one in which we choose parameters for the Hypothesis function, calculate the estimated values, and then use the cost function to find out the cost. After that we adjust the parameters to reduce this cost and perform the whole activity again, repeating the calculation until we think we have the most optimized function. We can check the state of the current result at any time by simply plotting the function against the Target values. This iterative process is what lies at the heart of the optimization algorithms in use today, so you don't have to perform it by hand. The most popular one, and you might even have heard about it, is Gradient Descent.
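
A bare-bones version of that loop, written as gradient descent, could look like this. The data is made up and scaled down (engine power in hundreds, price in thousands) so that a fixed learning rate behaves nicely; it is a sketch of the idea, not a tuned implementation.

```python
import numpy as np

def gradient_descent(X, y, theta, learning_rate=0.1, steps=5000):
    """Repeatedly nudge θ in the direction that lowers the squared-error cost J(θ)."""
    m = len(y)
    for _ in range(steps):
        errors = X @ theta - y          # hθ(x(i)) - y(i) for every training record
        gradient = (X.T @ errors) / m   # partial derivative of J(θ) for each θ
        theta = theta - learning_rate * gradient
    return theta

# Made-up, scaled-down data: [1 (for θ0), engine power in hundreds], price in thousands.
X = np.array([[1.0, 1.10], [1.0, 1.50], [1.0, 0.90]])
y = np.array([15.0, 32.0, 9.0])

theta = gradient_descent(X, y, theta=np.zeros(2))
print(theta)   # parameters of the fitted hypothesis hθ(x) = θ0 + θ1*x1
```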

And when we do have the Hypothesis function whose estimated values are closest to the Target values, we can take that function and claim it as the one which fits the data in the best possible manner. And there we have our model!

In this article, I wanted to write about the general process followed when solving any machine learning problem and building a machine learning model. This was more of a theoretical explanation, but I do have more technical guides lined up.

I hope you liked this article; subscribe/like/share for more such content. If you have any concerns regarding the content, let me know in the comments! Thanks for reading!
