Introduction to the Gradient Boosting Algorithm

Boosting is one of the most powerful learning ideas introduced in the last twenty years. Gradient Boosting is a supervised machine learning algorithm used for classification and regression problems. It is an ensemble technique that combines multiple weak learners to produce a single strong model.

Gradient Boosting relies on the intuition that the best possible next model, when combined with the previous models, minimizes the overall prediction error. The key idea is to set the target outcomes for the next model based on the errors of the previous models, so that each new model reduces the remaining error. Gradient Boosting is one of several boosting algorithms (others include AdaBoost and XGBoost).

The loss function tells us how well the algorithm models the data set. In simple terms, it is a measure of the difference between the actual values and the predicted values.

A gradient descent procedure is used to minimize the loss when adding trees.
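To see why fitting residuals amounts to gradient descent, consider the squared-error loss commonly used for regression (an assumption; the article does not name its loss explicitly):

    L(y, F(x)) = (1/2) * (y - F(x))^2

Its negative gradient with respect to the current prediction F(x) is

    -∂L/∂F(x) = y - F(x)

which is exactly the residual. Fitting the next tree to the residuals is therefore a gradient descent step on this loss.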

Weak learners are models applied sequentially, each one reducing the error left by the previous models, so that at the end they combine into a strong model.

Decision trees are used as the weak learners in the gradient boosting algorithm.

In gradient boosting, decision trees are added one at a time (in sequence), and existing trees in the model are not changed.
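A minimal from-scratch sketch of this sequential process, assuming squared-error loss, NumPy arrays, and scikit-learn's DecisionTreeRegressor as the weak learner (the function and variable names here are illustrative, not from the article):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def gradient_boost_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=2):
        # Step 1: start from a constant prediction, the mean of the target
        base = y.mean()
        pred = np.full(len(y), base)
        trees = []
        for _ in range(n_estimators):
            residuals = y - pred                 # Step 2: current errors
            tree = DecisionTreeRegressor(max_depth=max_depth)
            tree.fit(X, residuals)               # Step 3: fit a tree to the residuals
            pred = pred + learning_rate * tree.predict(X)  # update the running prediction
            trees.append(tree)                   # existing trees are never changed
        return base, trees

    def gradient_boost_predict(X, base, trees, learning_rate=0.1):
        # Final prediction: the mean plus the scaled sum of every tree's output
        return base + learning_rate * sum(tree.predict(X) for tree in trees)

Note that each new tree is fit to the errors of the current ensemble, and once added, a tree is never modified.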

This is our data set. Here Age, Sft., and Location are the independent variables, and Price is the dependent (target) variable.

Step 1: Calculate the average/mean of the target variable.

Step 2: Calculate the residuals for each sample.
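As a quick numeric check of Steps 1 and 2 in Python (the article's full table is not reproduced here; only the prices 350 and 480 and the mean 688 come from the article, while the other four prices are hypothetical stand-ins chosen to give the same mean):

    prices = [350, 480, 690, 800, 900, 908]      # only 350 and 480 are from the article
    mean_price = sum(prices) / len(prices)        # Step 1: 688.0
    residuals = [p - mean_price for p in prices]  # Step 2: e.g. 350 - 688 = -338
    print(mean_price, residuals)                  # 688.0 [-338.0, -208.0, 2.0, 112.0, 212.0, 220.0]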

Step 3: Construct a decision tree. We build a tree with the goal of predicting the Residuals.

If there are more residuals than leaf nodes (here there are 6 residuals), some residuals will end up in the same leaf. When this happens, we compute their average and place that average in the leaf.

After this averaging, the tree looks like this.
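In scikit-learn this averaging happens automatically: a regression tree's leaf outputs the mean of the training targets that land in it. A small sketch, reusing the hypothetical residuals from above with an illustrative max_leaf_nodes=4 (the article does not state its tree size):

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.array([[12], [25], [3], [40], [8], [15]])          # hypothetical Age feature
    residuals = np.array([-338., -208., 2., 112., 212., 220.])

    # With 4 leaves for 6 residuals, at least one leaf must hold several
    # residuals; the tree stores their average as that leaf's output.
    tree = DecisionTreeRegressor(max_leaf_nodes=4).fit(X, residuals)
    print(tree.predict(X))   # samples sharing a leaf get the leaf's averaged residual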

Step 4: Predict the target label using all the trees within the ensemble.

Each sample passes through the decision nodes of the newly formed tree until it reaches a given leaf. The residual stored in that leaf is then used to predict the house price.

Using the learning rate of 0.1 that we have taken initially, the predicted price for a sample is the initial mean plus the learning rate times the residual predicted by the tree:

Predicted Price = 688 + 0.1 × (predicted residual)

This is the calculation for the samples with the Step 2 residuals of -338 and -208, i.e. where the actual price is 350 and 480 respectively. The predicted price for the other samples is calculated in the same way.
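The same update in code, using the article's mean (688) and learning rate (0.1), and assuming each of these residuals is the value stored in its leaf:

    mean_price = 688
    learning_rate = 0.1
    for leaf_residual in (-338, -208):
        predicted_price = mean_price + learning_rate * leaf_residual
        print(predicted_price)   # 654.2, then 667.2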

Step 5: Compute the new residuals. With our single leaf predicting the average value (688), we had the original column of residuals; with the decision tree added, we end up with a new column of residuals.
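Continuing the hypothetical numbers from the earlier sketches, the new residuals are simply the errors of the updated prediction:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    X = np.array([[12], [25], [3], [40], [8], [15]])   # same hypothetical data as above
    y = np.array([350., 480., 690., 800., 900., 908.])

    mean_price = y.mean()                              # 688.0, the single-leaf prediction
    residuals = y - mean_price                         # the original residuals
    tree = DecisionTreeRegressor(max_leaf_nodes=4).fit(X, residuals)

    new_pred = mean_price + 0.1 * tree.predict(X)      # prediction after one tree
    new_residuals = y - new_pred                       # Step 5: the new residuals
    print(np.abs(residuals).mean(), np.abs(new_residuals).mean())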

Step 6: Repeat Steps 3 to 5 until the number of iterations matches the number specified by the hyperparameter (the number of estimators).

Step 7: Once trained, use all of the trees in the ensemble to make a final prediction for the value of the target variable. The final prediction is equal to the mean we computed in Step 1, plus all of the residuals predicted by the trees that make up the ensemble, multiplied by the learning rate.
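The whole procedure is available off the shelf in scikit-learn's GradientBoostingRegressor; a usage sketch with the article's learning rate (the data and the other parameter values here are illustrative):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    X = np.array([[12], [25], [3], [40], [8], [15]])   # hypothetical Age feature again
    y = np.array([350., 480., 690., 800., 900., 908.])

    model = GradientBoostingRegressor(
        n_estimators=100,    # Step 6: how many trees to add
        learning_rate=0.1,   # the shrinkage applied to each tree's output
        max_depth=2,         # keep each weak learner shallow
    )
    model.fit(X, y)
    print(model.predict(X))

With the default squared-error loss, scikit-learn also initializes from the training mean, matching Step 1.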

The Gradient Boosting algorithm is a very widely used machine learning and predictive modeling technique (it is a popular choice in Kaggle and other coding competitions).

Hope you like my article!
