Ensemble Techniques!

SagarDhandare · 3 min read · Jan 26, 2022

Ensemble techniques are among the most fundamental approaches to classification and regression in the Machine Learning world.

In an election, the candidate who receives the majority of votes wins. Ensemble techniques follow a similar underlying idea: we aggregate the predictions of a group of predictors (models), which may be classifiers or regressors, and most of the time the aggregated prediction is better than that of any single predictor. Such algorithms are called ensemble methods, and such groups of predictors are called ensembles.

An ensemble is a combination of multiple models, which we can also think of as multiple decision-makers. Instead of relying on one single algorithm, we can use n algorithms to build the model, as in the sketch below.
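To make the voting idea concrete, here is a minimal sketch using scikit-learn's VotingClassifier on a synthetic dataset (the dataset, models, and hyperparameters are illustrative assumptions, not from the article; the later snippets reuse the X_train/y_train defined here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three different "voters"; hard voting returns the majority class.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Test accuracy:", ensemble.score(X_test, y_test))
```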

There are three major types of ensemble techniques:

  1. Bagging
  2. Boosting
  3. Stacking

Let’s look at each one in turn.

Bagging

Bagging is an ensemble technique in which a single training algorithm is fitted on different subsets of the training data, where each subset is sampled with replacement (a bootstrap sample). Once the algorithm has been trained on all the subsets, bagging makes a prediction by aggregating the predictions made on the different subsets. Bagging is short for Bootstrap Aggregation.

In a regression problem, the final prediction is simply the average of all the predictions; in a classification problem, it is the most frequent prediction, i.e. the majority vote among all the predictions.
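A minimal sketch of bagging, assuming scikit-learn and reusing the synthetic X_train/y_train from the voting example above:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 100 decision trees, each fitted on a bootstrap sample
# (drawn with replacement) of the training data; predictions
# are aggregated by majority vote.
bagging = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,  # sampling with replacement -> bagging
    random_state=42,
)
bagging.fit(X_train, y_train)
print("Test accuracy:", bagging.score(X_test, y_test))
```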

Bagging algorithms

  • Random Forest

Advantages of Bagging

  • It helps decrease the variance of the model, since we aggregate the results of n models.
  • If the dataset is large, it can save computational time by training each model on a smaller subset, and it can still improve the accuracy of the model.

Pasting

Pasting is an ensemble technique similar to bagging; the only difference is that the training dataset is sampled without replacement.
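Since pasting differs from bagging only in how the subsets are drawn, the same BaggingClassifier can express it by turning off bootstrap sampling; this sketch again reuses the data from the earlier snippets:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Same setup as bagging, but each tree sees a subset drawn
# WITHOUT replacement (max_samples < 1.0 makes each subset proper).
pasting = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_samples=0.8,   # each tree trains on 80% of the data
    bootstrap=False,   # sampling without replacement -> pasting
    random_state=42,
)
pasting.fit(X_train, y_train)
print("Test accuracy:", pasting.score(X_test, y_test))
```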

Boosting

Boosting is an ensemble technique that starts from a weak learner and keeps building models sequentially, so that the final prediction is a weighted sum of all the weak learners. The weights are assigned based on the performance of each model, and when training the next model, the errors made by the previous one are also taken into account.
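As a sketch of this sequential re-weighting, here is AdaBoost in scikit-learn, again reusing the synthetic data from the earlier snippets (the hyperparameters are illustrative assumptions):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# 100 shallow trees ("decision stumps") fitted sequentially: each
# tree gets a weight based on its accuracy, and the samples the
# previous tree misclassified get a higher weight in the next round.
boosting = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),
    n_estimators=100,
    learning_rate=0.5,
    random_state=42,
)
boosting.fit(X_train, y_train)
print("Test accuracy:", boosting.score(X_test, y_test))
```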

Boosting Algorithms

  • AdaBoost (Adaptive Boosting)
  • Gradient Boosting
  • XGBoost (Extreme Gradient Boosting)

Advantages of Boosting

  • Some implementations (e.g. XGBoost) handle missing values natively.
  • Tree-based boosting is fairly robust to outliers in the input features.
  • Feature scaling is not required for tree-based learners.

Stacking

Stacking is an ensemble technique that combines the predictions of two or more base models by training a final (meta) model on their outputs. Suppose we have a problem statement and we want to use several different models, such as Random Forest, Support Vector Machine, and K-Nearest Neighbours; in that case, we can use stacking, as in the sketch below.
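A minimal stacking sketch, assuming scikit-learn and reusing the synthetic data from the earlier snippets (the choice of base models and meta-model is illustrative):

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Three different base models; a logistic-regression meta-model
# learns how to combine their predictions.
stacking = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svm", SVC(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),
)
stacking.fit(X_train, y_train)
print("Test accuracy:", stacking.score(X_test, y_test))
```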

Summary

In this article, we saw what an ensemble is, the types of ensemble techniques, and the advantages of each one.

Thanks for reading. Please let me know in the comments below, or ping me on LinkedIn, if you have any doubts.

Find me on LinkedIn | GitHub | Email

Happy Learning!!! ^_^
