Boosting 1 - AdaBoost.M1 Overview
A series of blog posts on Boosting Methods
1 Introduction
Boosting is a powerful machine learning technique that transforms a collection of "weak" classifiers into a strong, unified model.
A "weak" classifier performs only slightly better than random guessing, but by combining their outputs, boosting creates a "committee" of classifiers that collectively produce highly accurate predictions.
At its core, boosting aims to improve the performance of weak classifiers, which are slightly better than random guessing, by iteratively focusing on the data points that are hardest to classify.
2 AdaBoost.M1.
AdaBoost.M1 is a prominent implementation of the boosting technique for binary classification.
It adjusts weights to emphasize misclassified examples, effectively combining weak classifiers into a robust model.
While initially designed for two-class problems, AdaBoost.M1 can also be extended to multi-class tasks.
2.1 Two-class Problem
- Output variable \(Y \in\{-1,1\}\)
- Predictor variables \(X\)
- Classifier \(G(X)\) produces a prediction in \(\{-1,1\}\).
2.2 Weighted Error Rate
The (unweighted) error rate on the training sample is \[ \overline{\operatorname{err}}=\frac{1}{N} \sum_{i=1}^{N} I\left(y_{i} \neq G\left(x_{i}\right)\right) \]
The expected error rate on future predictions is \(\mathrm{E}_{X Y} I(Y \neq G(X))\).
In AdaBoost, each weak classifier is instead evaluated with a weighted error rate, where the observation weights \(w_i\) emphasize the examples that are hardest to classify: \[ \operatorname{err}_{m}=\frac{\sum_{i=1}^{N} w_{i} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}{\sum_{i=1}^{N} w_{i}} \]
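As a quick illustration, the weighted error rate takes only a few lines of NumPy; the labels, predictions, and weights below are toy values chosen for the example:

```python
import numpy as np

y    = np.array([ 1,  1, -1, -1,  1])        # true labels y_i in {-1, +1}
pred = np.array([ 1, -1, -1,  1,  1])        # weak-classifier predictions G_m(x_i)
w    = np.array([0.1, 0.2, 0.3, 0.2, 0.2])   # observation weights w_i

# err_m = sum_i w_i * I(y_i != G_m(x_i)) / sum_i w_i
err_m = np.sum(w * (y != pred)) / np.sum(w)
print(err_m)  # 0.4 -- the two misclassified points carry 40% of the total weight
```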
2.3 Weighted Majority Vote Classifier (Prediction)
\[G(x)=\operatorname{sign}\left( \sum_{m=1}^{M} \alpha_{m} G_{m}(x)\right)\]
where \(\alpha_1,\ldots,\alpha_M>0\) are weights computed by the boosting algorithm; they weight the contribution of each respective \(G_{m}(x)\).
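For example, with three weak classifiers and toy weights \(\alpha_m\) (all values chosen only for illustration), the committee prediction is just the sign of the weighted sum:

```python
import numpy as np

alphas = np.array([0.8, 0.5, 1.2])      # classifier weights alpha_1, alpha_2, alpha_3
preds = np.array([[ 1, -1,  1],         # G_1(x) on three samples
                  [-1, -1,  1],         # G_2(x)
                  [ 1,  1, -1]])        # G_3(x)

# G(x) = sign( sum_m alpha_m * G_m(x) )
G = np.sign(alphas @ preds)
print(G)  # [ 1. -1.  1.] -- on the third sample, G_1 and G_2 together outvote the heavier G_3
```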
2.4 Key Idea
- Observation weights: misclassified observations \((y_i \neq G_m(x_i))\) have their weights \(w_i\) scaled up by \(\exp(\alpha_m)\) (greater than 1 whenever \(\operatorname{err}_m < 1/2\)) before the next step \[ w_i \leftarrow w_i \cdot \exp\left[\alpha_m \cdot I(y_i \neq G_m(x_i))\right] \]
- Classifier weights: give higher influence to the more accurate classifiers (see the worked example below) \[\alpha_{m}=\log \left(\left(1-\operatorname{err}_{m}\right) / \operatorname{err}_{m}\right)\]
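As a concrete example (numbers chosen purely for illustration), a weak classifier with a weighted error of 30% receives \[ \alpha_m = \log\!\left(\frac{1 - 0.3}{0.3}\right) \approx 0.847, \qquad \exp(\alpha_m) \approx 2.33, \] so every observation it misclassified carries roughly 2.33 times its previous weight in the next round, while correctly classified observations keep their weights unchanged.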
3 Algorithm AdaBoost.M1.
1. Initialize the observation weights \(w_{i}=1 / N,\ i=1,2, \ldots, N\).
2. For \(m=1\) to \(M\):
   (a) Fit a classifier \(G_{m}(x)\) to the training data using weights \(w_{i}\).
   (b) Compute the weighted error rate \[ \operatorname{err}_{m}=\frac{\sum_{i=1}^{N} w_{i} I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)}{\sum_{i=1}^{N} w_{i}} \]
   (c) Compute the classifier weight \(\alpha_{m}=\log \left(\left(1-\operatorname{err}_{m}\right) / \operatorname{err}_{m}\right)\).
       - a larger error gives a smaller \(\alpha_m\), so the more accurate classifiers receive higher influence in the final vote
   (d) Set \(w_{i} \leftarrow w_{i} \cdot \exp \left[\alpha_{m} \cdot I\left(y_{i} \neq G_{m}\left(x_{i}\right)\right)\right],\ i=1,2, \ldots, N\).
       - observations misclassified by \(G_m(x)\) have their weights scaled up by a factor \(\exp(\alpha_m) > 1\)
3. Output \[ G(x)=\operatorname{sign}\left[\sum_{m=1}^{M} \alpha_{m} G_{m}(x)\right]. \]
4 Code
4.1 Code from Package
```python
import numpy as np
```
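Building on that import, here is a minimal sketch of how the comparison could be run, assuming scikit-learn's `AdaBoostClassifier`, `DecisionTreeClassifier`, and `make_classification` with a 70/30 train/test split; these settings are illustrative and may not reproduce the exact numbers reported below:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score

# Synthetic two-class data; swap in the low- or high-dimensional settings discussed below.
X, y = make_classification(n_samples=1000, n_features=2, n_informative=2,
                           n_redundant=0, random_state=6)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=6)

# Boosted committee of weak learners vs. a single decision tree.
ada = AdaBoostClassifier(n_estimators=100, random_state=6).fit(X_train, y_train)
tree = DecisionTreeClassifier(random_state=6).fit(X_train, y_train)

for name, model in [("AdaBoost", ada), ("Decision Tree", tree)]:
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} Accuracy: {acc:.2f}")
    print(f"{name} Error Rate: {1 - acc:.2f}")
```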
When the feature dimension is low (`n_features=2, n_informative=2, n_redundant=0, random_state=6`), AdaBoost achieves slightly better accuracy, and its decision boundary correctly classifies more of the samples:

```
AdaBoost Accuracy: 0.91
AdaBoost Error Rate: 0.09
Decision Tree Accuracy: 0.89
Decision Tree Error Rate: 0.11
```

When the feature dimension is high, especially with redundant features (`n_features=20, n_informative=15, n_redundant=5`), AdaBoost shows a significant advantage:

```
AdaBoost Accuracy: 0.82
AdaBoost Error Rate: 0.18
Decision Tree Accuracy: 0.66
Decision Tree Error Rate: 0.34
```
4.2 Code from Scratch
```python
class AdaBoost:
```
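Filling in the class, here is a minimal sketch following the algorithm in Section 3, assuming depth-1 scikit-learn decision trees as the weak learners and labels coded as −1/+1; the constructor argument and attribute names are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class AdaBoost:
    """Sketch of AdaBoost.M1 with decision stumps as weak learners (labels in {-1, +1})."""

    def __init__(self, n_estimators=50):
        self.n_estimators = n_estimators
        self.stumps = []   # fitted weak classifiers G_m
        self.alphas = []   # classifier weights alpha_m

    def fit(self, X, y):
        N = X.shape[0]
        w = np.full(N, 1.0 / N)                 # 1. initialize w_i = 1/N
        for m in range(self.n_estimators):      # 2. for m = 1, ..., M
            stump = DecisionTreeClassifier(max_depth=1)
            stump.fit(X, y, sample_weight=w)                 # (a) fit G_m using weights w_i
            miss = (stump.predict(X) != y).astype(float)     # I(y_i != G_m(x_i))
            err = np.sum(w * miss) / np.sum(w)               # (b) weighted error rate err_m
            err = np.clip(err, 1e-10, 1 - 1e-10)             # keep alpha_m finite
            alpha = np.log((1.0 - err) / err)                # (c) alpha_m = log((1 - err_m) / err_m)
            w = w * np.exp(alpha * miss)                     # (d) up-weight misclassified points
            self.stumps.append(stump)
            self.alphas.append(alpha)
        return self

    def predict(self, X):
        # 3. weighted majority vote: G(x) = sign( sum_m alpha_m * G_m(x) )
        agg = sum(a * s.predict(X) for a, s in zip(self.alphas, self.stumps))
        return np.sign(agg)
```

Clipping `err` away from 0 and 1 keeps \(\alpha_m\) finite when a stump happens to classify every weighted observation correctly (or incorrectly).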
5 Reference
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer. https://doi.org/10.1007/978-0-387-84858-7