As the dimensionality of data increases, it becomes difficult for traditional machine learning algorithms to process it. Deep learning was introduced to overcome this limitation and is now the game changer of the 21st-century AI world. As the name suggests, deep learning models look deep into the data and extract high-level features from it. A deep learning model contains many layers: an input layer, hidden layers, and an output layer. Because they stack additional layers to extract high-level features from the data, they are called deep learning models. So, what are these layers made up of?
Neurons are the building blocks of neural networks. A single neuron is termed a perceptron. The perceptron takes the inputs and performs mathematical operations to produce an output. The operations involved are a summation function and an activation function. The summation performed here is the sum of the products of each input and its respective weight, plus a bias term. There are several activation functions, and each one is defined by its own mathematical operation.
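To make this concrete, here is a minimal sketch of a single perceptron in Python, assuming NumPy and a sigmoid activation; the inputs, weights, and bias below are made-up values purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # Squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(inputs, weights, bias):
    # Summation: sum of (input * weight) over all inputs, plus the bias
    z = np.dot(inputs, weights) + bias
    # Activation: pass the weighted sum through a nonlinear function
    return sigmoid(z)

# Example values (chosen arbitrarily for demonstration)
x = np.array([0.5, 0.3, 0.2])    # inputs
w = np.array([0.4, 0.7, -0.2])   # one weight per input
b = 0.1                          # bias
print(perceptron(x, w, b))       # a single output in (0, 1)
```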
In a single neural network there can be many layers, and each layer may use a different activation function. Below are the formulations of the activation functions used in neural networks.
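As a rough illustration (not an exhaustive list), here are NumPy sketches of a few of the most widely used activation functions; which ones a given architecture uses will vary.

```python
import numpy as np

def sigmoid(z):
    # Maps any real number into (0, 1); often used for binary outputs
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Maps any real number into (-1, 1); a zero-centred cousin of sigmoid
    return np.tanh(z)

def relu(z):
    # Keeps positive values and zeroes out negative ones; common in hidden layers
    return np.maximum(0.0, z)

def softmax(z):
    # Turns a vector of scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()
```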
To get a better model, we often mix activation functions within a neural network. Up to the summation step, the data is transformed only linearly; once it passes through the activation function, the transformation becomes nonlinear.
Consider a scenario where you create a model that predicts whether a given image is of an alien or a human. The model has been trained and is ready for testing. During testing, you pass in an image of a person who was injured in a fire accident, and your model predicts that this human image is an alien. Where did our model go wrong?
It is because we haven't trained our model on images of people with such injuries or unusual physical features. To make the model understand where it has gone wrong, we use a loss function, which measures the error so that we can minimize the loss and maximize the prediction accuracy. This is where several mathematical formulations and optimization techniques come into play.
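As one example, here is a sketch of binary cross-entropy, a common loss function for a two-class problem such as alien versus human; the labels and predicted probabilities below are invented for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Penalizes confident wrong predictions heavily and correct ones lightly
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Suppose 1 = human and 0 = alien (labels invented for the example)
y_true = np.array([1, 1, 0, 1])
y_pred = np.array([0.9, 0.2, 0.1, 0.8])  # the 0.2 is the misclassified human
print(binary_cross_entropy(y_true, y_pred))  # the mistake dominates the loss
```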
We all might have come across the Naïve Bayes theorem. The naïve assumption behind it is also associated with deep learning: each input contributes individually to the output of a neural network, and the degree to which each input contributes is captured by the weight assigned to that input.
Mathematical optimization deals with minimization and maximization: to maximize one quantity we often minimize another, and vice versa. This process is called optimization. To optimize the model, we adjust the weights and biases using the backpropagation method together with a suitable learning rate. There are several optimizers used in neural networks, such as Gradient Descent, Stochastic Gradient Descent (SGD), RMSProp, and Adam.
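To make the update step concrete, here is a rough sketch, assuming the sigmoid perceptron from earlier and a squared-error loss, of how backpropagation-style gradients adjust the weights and bias with a chosen learning rate; all names and values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative single training example and parameters
x = np.array([0.5, 0.3, 0.2])
y = 1.0                          # target label
w = np.array([0.4, 0.7, -0.2])
b = 0.1
lr = 0.1                         # learning rate

# Forward pass: weighted sum followed by the activation
z = np.dot(x, w) + b
y_hat = sigmoid(z)

# Backward pass: chain rule for a squared-error loss L = (y_hat - y)**2 / 2
dL_dz = (y_hat - y) * y_hat * (1 - y_hat)   # dL/dy_hat * dy_hat/dz
dL_dw = dL_dz * x                           # gradient w.r.t. each weight
dL_db = dL_dz                               # gradient w.r.t. the bias

# Update step: move each parameter against its gradient
w = w - lr * dL_dw
b = b - lr * dL_db
```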
Calculus comes into play in the Gradient Descent algorithm. The various optimizers are essentially different ways of tuning how the learning rate drives our model's updates. The size of the steps taken toward the minimum of the loss function is determined using calculus: the derivative gives the slope of the curve at a particular point, and Gradient Descent follows that slope downhill toward the minimum of the optimization function. The desired magnitude and direction of each small step are given by the gradient, and this is where vectors come into play.
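Here is a small, self-contained sketch of the idea on a toy function, f(x) = (x - 3)^2, whose minimum sits at x = 3; the derivative (slope) at the current point decides both the direction and the size of every step.

```python
def f(x):
    # Toy loss function with its minimum at x = 3
    return (x - 3) ** 2

def df(x):
    # Derivative (slope) of the toy loss at point x
    return 2 * (x - 3)

x = 0.0              # starting point
learning_rate = 0.1

for step in range(50):
    slope = df(x)                   # calculus: slope at the current point
    x = x - learning_rate * slope   # move downhill, proportional to the slope

print(x)  # close to 3, the minimizer of f
```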
The important data structures behind neural networks are scalars, vectors, matrices, and higher-order tensors.
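Assuming those are the structures in question, here is a quick NumPy sketch of how each one shows up in a network; the shapes are arbitrary examples.

```python
import numpy as np

scalar = np.float32(0.1)                  # e.g. a bias term or a learning rate
vector = np.array([0.5, 0.3, 0.2])        # e.g. one input sample or one neuron's weights
matrix = np.array([[0.4, 0.7, -0.2],      # e.g. the weight matrix of a layer:
                   [0.1, -0.5, 0.3]])     # 2 neurons, each with 3 input weights
tensor = np.zeros((32, 28, 28, 3))        # e.g. a batch of 32 RGB images, 28x28 pixels

# A whole layer is just a matrix-vector product plus a bias vector
layer_output = matrix @ vector + np.array([0.1, -0.1])
print(layer_output.shape)  # (2,) -> one value per neuron in the layer
```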
All these mathematical concepts, along with linear algebra and much more, are interconnected, and together they underpin deep learning. Without mathematics, there would be no deep learning.