Much Needed Mathematics for Machine Learning Algorithms
Pure mathematics is, in its way, the poetry of logical ideas.— Albert Einstein
The fundamental idea behind this series of stories is to share knowledge with fellow data science | ML | AI enthusiasts. I remember my initial days, when I struggled to understand a few concepts just to solve a miniature problem. The sleepless nights spent finishing assignments still scare me at times, and I don’t want my fellow learners to face the same.
Hence, this story talks about the mathematics needed to understand different machine learning algorithms.
Data Science, Business Analytics and Business Intelligence are all birds of the same nest: they share features in common, so it is safe to say they are the same, but different. One of those common features is the set of algorithms and models used to compare, analyse and predict things.
Some of the most commonly used machine learning algorithms, along with the mathematics behind them, are explained below.
Linear regression tries to represent the relationship between two variables by fitting a linear equation, where one variable is explanatory and the other is supposed to be dependent.
Before we implement a linear model, we must check whether there is a relationship between the two variables. A relationship does not mean that one variable causes the outcome of the other. For example, a higher IQ does not guarantee higher grades at graduation; instead, there may simply be a significant association between the variables.
Example of Scatter-plot
Applying a scatter-plot to the variables shows whether the overall trend is increasing or decreasing, which gives us enough leverage to obtain information. If the scatter-plot shows no apparent relationship between the variables, it is not advisable to implement a linear regression model.
The correlation coefficient is a value between -1 and 1 connecting the two variables; it measures the strength of their relationship.
Positive Correlation (left) and Negative Correlation (right)
Math in Linear Regression:
Y=mX+c is the linear equation, where X is the explanatory variable, Y is the dependent variable, m is the slope of the line and c is the intercept.
When X=0, then Y=m(0)+c gives Y=c, the intercept.
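As a sketch, the slope m and intercept c can be recovered from training data with NumPy’s least-squares fit (the data points here are made up for illustration):

```python
import numpy as np

# Hypothetical data lying exactly on the line Y = 2X + 1
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = 2 * X + 1

# Fit a degree-1 polynomial: returns (slope m, intercept c)
m, c = np.polyfit(X, Y, 1)
```

Because the points lie exactly on a line, the fit recovers m = 2 and c = 1.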
Further, the trend line follows a pattern that enables prediction. These trend lines are drawn using models (mathematical functions) that fit the experiment with the help of the training data set. The model used in linear regression is as shown below.
Here, y is the target column that we are trying to predict, the a’s are the coefficients learned from the data set, and the x’s are the columns we chose for the experiment.
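Reconstructing that model from the description (y the target, a’s the coefficients, x’s the chosen columns), the usual multiple-regression form is:

```latex
y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n
```

With a single column x, this reduces to the Y = mX + c line above.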
Logistic regression is a method in which we try to derive the outcome of an experiment from a large set of independent variables in order to determine a column with a binary value, the dependent variable. In short, it predicts a category. For example:
1. To know if an operation will be a failure or a success.
2. Whether a baby will be a boy or a girl.
Dependent variable: the variable whose value needs to be predicted; it takes a binary value (0 or 1).
Independent Variable: The variable that is supposed to influence the value of the dependent variable.
Logistic regression is named for the function used at the heart of the method, the logistic function. Also called the sigmoid function, it was developed by statisticians to describe the characteristics of population growth. Plotted on a graph, it resembles an S shape.
Example of the logistic regression S curve
Math in Logistic Regression:
A good understanding of algebra and probability will be helpful. The sigmoid function is represented as y = 1 / (1 + e^(-z)).
Here y is the predicted output, e is the base of the natural log (Euler’s number), and z is the actual value that we want to transform.
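A minimal sketch of that function in Python (the function name is illustrative):

```python
import math

def sigmoid(z):
    """Map any real value z to the (0, 1) range: y = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))
```

At z = 0 the function returns exactly 0.5, the midpoint of the S curve; large positive z pushes the output toward 1 and large negative z toward 0.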
Naive Bayes classifiers are a collection of algorithms based on Bayes’ theorem. They share a common (“naive”) assumption: every pair of columns, or features, used is independent of each other. This assumption keeps the probability calculations for each hypothesis simple and tractable.
This can be mathematically written as:
P(A|B) = P(B|A)P(A)/P(B)
Here P(B) is not equal to 0; A and B are the events, and P(A) and P(B) are the probabilities of observing A and B independently. P(A|B) is the conditional probability of A occurring given that B is true, and P(B|A) is the conditional probability of B occurring given that A is true.
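A worked example of the formula, with probabilities invented purely for illustration: suppose 1% of emails are spam, 40% of spam emails contain the word “offer”, and 5% of all emails contain it.

```python
# Hypothetical probabilities, chosen only to illustrate Bayes' theorem
p_a = 0.01          # P(A): an email is spam
p_b_given_a = 0.40  # P(B|A): a spam email contains the word "offer"
p_b = 0.05          # P(B): any email contains the word "offer"

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b  # 0.08
```

Seeing the word raises the spam probability from 1% to 8%, which is exactly the kind of update a Naive Bayes classifier performs for each feature.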
The Naive Bayes model is easy to build and especially helpful for very large data sets. It is not only simple but is also known to beat even very complex classification methods.
Neural networks are a collection of machine learning methods modelled on the human brain, often portrayed as ANNs or Artificial Neural Networks. An ANN is a combination of neurons, where the output of each neuron is transformed by the ones that follow it; this is achieved by feeding the output of one neuron as an input to the next.
The larger the neural network, the larger the complexity. We are surrounded by applications of neural networks and yet rarely realize their potential. Some of these applications relate to face recognition, auto-correction, deep learning, etc.
A typical ANN is made of different layers, and every layer has a definite set of neurons. The layer types are the input layer, one or more hidden layers, and the output layer, connected to one another by weights.
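The wiring described above can be sketched with a single artificial neuron: a weighted sum of its inputs plus a bias, squashed by a sigmoid activation. The weights and inputs below are made up for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, squashed through a sigmoid."""
    z = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# The output of one neuron feeds the next as an input,
# which is how layers chain together in an ANN
hidden = neuron([1.0, 0.5], [0.4, -0.2], 0.1)
output = neuron([hidden], [1.5], -0.5)
```

Both values land strictly between 0 and 1 because of the sigmoid; stacking many such neurons per layer, and many layers, gives the full network.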
K-means is, as the name suggests, a clustering technique in which a large pool of data is divided into smaller groups or collections. These collections, or clusters, are formed by identifying similarities.
It is one of the simplest and most effective unsupervised machine learning algorithms.
A cluster is described as a group of data objects aggregated together due to certain similarities.
The motto behind K-means is pretty simple: it tries to group similar things together and find patterns among them. K-means looks for a fixed number (k) of clusters in a given data set.
Every cluster has a centroid, the data point at the centre of the cluster. Every other data point is assigned to the centroid it is closest to, so that the distances within each cluster are kept as small as possible.
Figure representing K-Means clustering
This basically uses very simple math: you just have to know coordinate geometry, or the formula for the distance between two points on a plane, also called the Euclidean metric.
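That distance formula, and the assignment step it drives, can be sketched in a few lines (the points and centroids are invented for illustration):

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# K-means assigns each point to its nearest centroid by this distance
centroids = [(0.0, 0.0), (10.0, 10.0)]
nearest = min(centroids, key=lambda c: euclidean((1.0, 2.0), c))
```

The point (1, 2) is assigned to the centroid at the origin, since it is the closer of the two; K-means repeats this assignment and then recomputes each centroid until nothing moves.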
Decision trees are one of the simplest and most effective forms of supervised learning, and they are widely used. The decisions are usually made through if/else clauses.
The final answer, or target, is obtained by iterating through the tree from top to bottom. Every node branches into two, and the decision at each node is made in a binary fashion (0 or 1), much like logic-gate operations such as OR, NOR, XOR and AND.
OR operation, image by Hackerearth
AND operation, image by Hackerearth
From the decision trees above, you can see that a tree starts from the top and ends at the bottom. The topmost node is the root; every time a node splits in two it forms edges, and the un-split nodes are the leaves, which carry the final decision.
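The root-to-leaf walk described above can be sketched as a tiny hand-written tree of if/else clauses; the feature names and thresholds are invented for illustration.

```python
def decide(humidity, windy):
    """A hand-written two-level decision tree: each node is a binary test."""
    if humidity > 70:          # root node
        return "stay in"       # leaf
    else:
        if windy:              # internal node, reached via the "no" edge
            return "stay in"   # leaf
        else:
            return "go out"    # leaf
```

A learned decision tree works the same way, except the tests and thresholds are chosen automatically from the training data rather than written by hand.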
The math involved is pretty simple; a good knowledge of basic algebra and probability is enough.
The above were some of the techniques from ML | AI | data science. These are very vast topics, and innovation waits for no one. A hyped technology is always a niche, and it is recommended to stay updated with the tech around us. Ultimately, probability and statistics are the backbone of any data science problem. If you are not a mathematics major, the concepts may seem a little unintuitive, but if you already know them, it is just a matter of applying your theoretical knowledge in practice. Since learning data science is no exception, you might want to give it some time to let all the required concepts sink in before you perform your first experiment.