4 must-know normalization techniques for data scientists
If you have been practicing machine learning and deep learning for some time, you have probably come across the word normalization.
What is normalization, or standardization, and how does it help data scientists and AI engineers all over the world improve their models?
In this article, we will discuss four normalization techniques that are quite popular in the community.
1. Standardization
2. Normalization
3. Batch Normalization
4. Local Response Normalization
Standardization is nothing but converting your data into a standard format. What is this standard format, you may ask? It is the form of the data in which the mean is zero and the standard deviation is one.
Note that this does not confine the data to the range -1 to 1: a point that lies two standard deviations above the mean becomes 2, for example. One more important point is that standardization only shifts and rescales the data, so the shape of the distribution stays the same. The standardized data will look like a Gaussian curve (a bell curve) only if the original data was roughly bell-shaped to begin with.
Image is taken from https://www.advsofteng.com/doc/cdcfdoc/images/histogram.png
So, how can we convert our data into standardized data?
The answer lies in the definition itself. We first find the mean and standard deviation of the data. Then we subtract the mean from every data point and divide the result by the standard deviation of the whole data. (If the data is skewed, raising it to a fractional power such as x^0.4 or x^0.2 can often bring its distribution closer to a normal one.)
Image is taken from https://media.vlpt.us/images/jiselectric/post/8862ef9a-13a2-4402-8c80-1929d7c37083/0_PXGPVYIxyI_IEHP7.png
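The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `standardize` and the sample data are made up for this example, and it assumes the population standard deviation (dividing by n) is the one you want.

```python
import statistics


def standardize(values):
    """Return the z-score of every value: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        raise ValueError("cannot standardize data whose values are all equal")
    return [(x - mean) / std for x in values]


# Hypothetical sample data with mean 5 and standard deviation 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The result has mean 0 and standard deviation 1, and the largest point maps to 2.0 because it sits two standard deviations above the mean. In practice you would likely use a library for this; scikit-learn's `StandardScaler` is one common choice.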
Normalization is much simpler than standardization. It is just a re-scaling of our data into a particular range. Generally that range is 0 to 1, but you can pick any range.
A very popular way to normalize your data is min-max normalization. All we have to do is find the minimum and maximum values in our data.
Then we subtract the minimum from every data point and divide the result by the difference between the maximum and the minimum.
Image is taken from https://androidkt.com/wp-content/uploads/2020/10/Selection_060.png
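Here is a small sketch of min-max normalization in plain Python. The function name `min_max_scale` and its `new_min`/`new_max` parameters are assumptions made for this example; they are there to show how a range other than 0 to 1 could be supported.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale data whose values are all equal")
    return [
        (x - lo) / (hi - lo) * (new_max - new_min) + new_min
        for x in values
    ]


# Hypothetical sample data.
data = [10.0, 20.0, 25.0, 50.0]
print(min_max_scale(data))             # [0.0, 0.25, 0.375, 1.0]
print(min_max_scale(data, -1.0, 1.0))  # [-1.0, -0.5, -0.25, 1.0]
```

Because the minimum and maximum drive the whole result, a single extreme outlier will squeeze all other points into a narrow band. scikit-learn's `MinMaxScaler` offers the same kind of rescaling if you prefer a library.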
So far we have seen two feature scaling methods. Both of them are preprocessing techniques, which means they are applied to the data before it is given to the model. But if you have worked with artificial neural networks, you know that there are many hidden layers between the input and output layers. The idea behind batch normalization is that the hidden layers can also benefit from the standardization technique we just learned.
We perform standardization just like above, except that the mean and standard deviation are computed over the current mini-batch, and we add two learnable parameters according to the given formula. These parameters let the network rescale and shift the standardized values, so each layer can learn the input distribution that works best for its activation function instead of being locked to zero mean and unit variance. (Don't be scared if the above sentence doesn't make complete sense; in most cases you don't need to implement it from scratch.)
The batch normalization formula is taken from https://kratzert.github.io/images/bn_backpass/bn_algorithm.PNG
These gamma (γ) and beta (β) variables are known as the scaling and shifting parameters.
Sorry for this chaotic image :/
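To make the formula a bit more concrete, here is a rough sketch of the batch normalization forward pass for a single feature across one mini-batch, in plain Python. The function name `batch_norm` and the default values are assumptions made for this example. In a real network gamma and beta are learned during training, and frameworks also track running statistics for use at inference time, which this sketch leaves out.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature over a mini-batch, then scale by gamma and shift by beta."""
    m = len(batch)
    mean = sum(batch) / m                            # mini-batch mean
    var = sum((x - mean) ** 2 for x in batch) / m    # mini-batch variance
    # eps keeps the division stable when the variance is close to zero
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in batch]
    return [gamma * xh + beta for xh in x_hat]


# Hypothetical activations of one hidden unit for a mini-batch of four examples.
activations = [1.0, 2.0, 3.0, 4.0]
plain = batch_norm(activations)                          # mean ~0, variance ~1
shifted = batch_norm(activations, gamma=2.0, beta=0.5)   # mean ~0.5, spread doubled
```

With gamma = 1 and beta = 0 this is just the standardization from earlier (plus the small eps term). Other values let the layer move away from zero mean and unit variance when that helps.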
LOCAL RESPONSE NORMALIZATION
Local response normalization was first widely used in the AlexNet architecture by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton.
The idea behind this type of normalization is that when we extract different features from an image, the activations at the same location in neighboring feature maps should be looked at together. Each activation is scaled down according to how strongly its neighbors respond, so a feature that clearly stands out at a location stays prominent relative to the others.
Image is taken from https://cdn-images-1.medium.com/max/1000/1*78qCGHQ7HQwPdCiEQhQJaQ.png
Unlike the techniques above, which work on one feature (one dimension of the numerical data) at a time, LRN (local response normalization) is applied depth-wise, across the channels of a convolution layer. (You need to know about CNNs before reading about LRN.)
Image is taken from https://paperswithcode.com/media/methods/Screen_Shot_2020-06-22_at_3.35.19_PM.png
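As a rough illustration, the across-channel formula can be sketched for a single pixel position in plain Python. The function name `local_response_norm` and the sample values are made up for this example. The defaults (n = 5, k = 2, alpha = 1e-4, beta = 0.75) are the hyperparameters reported in the AlexNet paper, and deep learning frameworks may define the scaling slightly differently.

```python
def local_response_norm(channels, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Apply across-channel LRN at one spatial position.

    `channels` holds the activation of every feature map at the same (x, y) location.
    Each activation is divided by a term built from the squared activations of
    its n nearest channels (itself included).
    """
    num_channels = len(channels)
    half = n // 2
    out = []
    for i, a in enumerate(channels):
        start = max(0, i - half)
        stop = min(num_channels - 1, i + half)
        sq_sum = sum(channels[j] ** 2 for j in range(start, stop + 1))
        out.append(a / (k + alpha * sq_sum) ** beta)
    return out


# Hypothetical activations of five feature maps at one pixel.
pixel = [1.0, 50.0, 2.0, 1.0, 0.5]
normalized = local_response_norm(pixel)
```

A full implementation would repeat this for every (x, y) position of the layer's output. Here it is kept to one position so the windowed sum over neighboring channels is easy to follow.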
And this concludes my discussion of the normalization techniques used in machine learning and deep learning. My knowledge of data and its manipulation is still very limited, so all suggestions are heartily welcome.