Ensemble Methods: Get Your Entourage Together

Randy Taylor
Mar 27, 2021 · 3 min read

Ensemble techniques are another Machine Learning methodology, and they are among the best at building in a protective buffer against instability. What I mean is that a single Decision Tree can change drastically with only a small change in the data. Ensemble techniques instead build not just one model but many. Once created, the Ensemble as a whole takes a vote (for a categorical target) or an average (for a continuous target) as the answer. There are two distinct ways to get an Ensemble of models to join in on making the one final prediction: bagging (bootstrap aggregation) and boosting. In terms of robustness, these techniques take a slew of different models and combine their predictions into one final prediction, much like a committee. Let's say we are predicting a category such as malignant or benign: we tally the vote of each doctor on the committee and pick the winner of the vote. With continuous variables we take the average of the predictions.
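
To make the committee idea concrete, here is a minimal sketch using scikit-learn's VotingClassifier. The breast cancer dataset and the three base models are my own choices for illustration; any set of classifiers voting on malignant vs. benign would work the same way.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier

# Malignant vs. benign tumors -- a categorical target, so the ensemble votes.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three "doctors" on the committee, each a different kind of model.
committee = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=5000)),
        ("tree", DecisionTreeClassifier(random_state=42)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="hard",  # hard voting = tally each model's vote and pick the winner
)

committee.fit(X_train, y_train)
print("Committee accuracy:", committee.score(X_test, y_test))
```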

Bagging (bootstrap aggregation) is a very intuitive idea: you take your dataset and bag it into smaller bootstrap sub-datasets, each sampled with replacement from the original. This method happens in parallel, because each model is built independently on its own mini-dataset, a slice of the original data. You then train a different model on each sub-dataset, independently of the others. The classic example is the Random Forest. Scikit-learn's BaggingClassifier builds many decision trees by default, each on its own bootstrap sample, though you do not actually have to use trees; you can plug other machine learning models into the same idea (this is still Ensemble Learning). A Random Forest goes one step further: like bagging, each tree is built on its own sub-dataset, but at every split the tree only considers a random subset of the columns (whereas BaggingClassifier lets each model use all of the columns). Each tree therefore splits on different features, which yields a forest of trees that are each slightly different. From this forest we take the mode (or the average, for a continuous target) as the final prediction. Because of the variation between the trees we no longer worry about pruning; letting individual trees overfit almost becomes a feature, and since each tree only ever sees its own bootstrap sample, the ensemble as a whole does not really overfit. We are essentially creating many different training datasets, training a tree on each, and averaging them. This reminds me of a machine gun: you take so many shots you are bound to hit something. A short sketch comparing the two follows.
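
A minimal sketch of both flavors on the same breast cancer data as above; the number of trees and the side-by-side comparison are my own choices for illustration. BaggingClassifier bags full-feature decision trees, while RandomForestClassifier also randomizes which columns each split can use.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# Plain bagging: each of the 200 trees is trained on its own bootstrap sample
# of the rows, but every split may consider all of the columns.
bagged_trees = BaggingClassifier(n_estimators=200, random_state=42)

# Random Forest: same bootstrap-per-tree idea, but each split only looks at a
# random subset of the columns (the square root of the feature count by default).
forest = RandomForestClassifier(n_estimators=200, random_state=42)

for name, model in [("Bagged trees", bagged_trees), ("Random Forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```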

Boosting is the other methodology, and it is more like a sniper. Here we build a simple decision tree, then look at where it was incorrect. We measure how far off its predictions were from the true labels in the training data and adjust just a little. The next tree is built to correct that inaccuracy: badly predicted observations are given a higher weight and more importance in the next tree, while correctly predicted observations are given a lower weight and thus less importance. This happens successively, each tree improving on the last, until the ensemble as a whole is much better than any single tree. Eventually we have a model that hits the target.
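
A minimal sketch of this reweighting style of boosting using scikit-learn's AdaBoostClassifier, which does exactly the up-weighting of misclassified points described above; the dataset and the number of boosting rounds are again my own choices. staged_score lets you watch the accuracy climb as trees are added.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each round fits a shallow tree (a "stump" by default), then up-weights the
# observations that round got wrong so the next tree focuses on them.
booster = AdaBoostClassifier(n_estimators=100, random_state=42)
booster.fit(X_train, y_train)

# Watch the sniper zero in: test accuracy after selected numbers of rounds.
for rounds, acc in enumerate(booster.staged_score(X_test, y_test), start=1):
    if rounds in (1, 25, 50, 100):
        print(f"{rounds:3d} trees: accuracy {acc:.3f}")
```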
