Python / Python Mathematical Intuition and Scikit Learn Interview Questions
Derive mathematically why bagging (bootstrap aggregating) reduces variance, and under what condition it does NOT help.
Suppose you have B models whose predictions each have the same variance σ² and the same expected value, so averaging leaves the bias unchanged. If the predictions were truly independent, the variance of their average would be Var(average) = σ²/B: it shrinks in inverse proportion to the number of models and approaches zero as B grows. This is the textbook justification for averaging predictions.
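For reference, the independent case can be written out in a few lines. The symbol f_i for the i-th model's prediction is notation assumed here, not taken from the question.

```latex
\begin{align*}
\operatorname{Var}\!\left(\frac{1}{B}\sum_{i=1}^{B} f_i\right)
  &= \frac{1}{B^2}\operatorname{Var}\!\left(\sum_{i=1}^{B} f_i\right)
     && \text{since } \operatorname{Var}(aX) = a^2\operatorname{Var}(X) \\
  &= \frac{1}{B^2}\sum_{i=1}^{B}\operatorname{Var}(f_i)
     && \text{independence: all covariances vanish} \\
  &= \frac{B\sigma^2}{B^2} = \frac{\sigma^2}{B}.
\end{align*}
```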
In practice, bagged models are trained on bootstrap samples drawn from the same original dataset, so their predictions are not independent but share some pairwise correlation ρ. Expanding the variance of the sum gives B variance terms of σ² and B(B-1) covariance terms of ρσ², so Var(average) = [Bσ² + B(B-1)ρσ²]/B² = ρσ² + (1-ρ)σ²/B. As B → ∞ this converges to ρσ², not zero, so bagging's benefit is capped by how correlated the base models are. Bagging therefore helps little in two situations: when the base models are highly correlated (ρ close to 1, for example trees that all split on the same few dominant features), and when the base learner is already stable (small σ², for example linear regression or k-nearest neighbours with a large k), because there is little variance left to average away. It also does nothing for bias, since the average has the same expected prediction as each individual model. This is exactly why random forests add feature-level randomness on top of bagging: to drive ρ down further.
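The formula Var(average) = ρσ² + (1-ρ)σ²/B can be checked numerically. The sketch below is a toy simulation, not real bagging: it assumes Gaussian "predictions" made equicorrelated through a shared noise component, and the values of sigma, rho and n_trials are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, rho = 1.0, 0.3   # illustrative values, not taken from any real model
n_trials = 5_000        # number of simulated "datasets"

def simulate_average_variance(B):
    """Empirical variance of the mean of B equicorrelated predictions."""
    # The shared component gives every pair of models correlation rho;
    # the private component is independent noise for each model.
    shared = rng.normal(size=(n_trials, 1))
    private = rng.normal(size=(n_trials, B))
    preds = sigma * (np.sqrt(rho) * shared + np.sqrt(1 - rho) * private)
    return preds.mean(axis=1).var()

def theoretical_variance(B):
    """rho * sigma^2 + (1 - rho) * sigma^2 / B."""
    return rho * sigma**2 + (1 - rho) * sigma**2 / B

for B in (1, 10, 100, 500):
    print(f"B={B:>3}  empirical={simulate_average_variance(B):.4f}  "
          f"theory={theoretical_variance(B):.4f}")
```

With these settings the empirical variance should level off near rho * sigma**2 rather than continuing toward zero, which is the plateau the formula predicts.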
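from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data so the snippet runs end to end; substitute your own X, y
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Plain bagging: bootstrap samples only, no feature randomness
# (`estimator` requires scikit-learn >= 1.2; older versions call it `base_estimator`)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,
    max_features=1.0,  # each tree is trained on ALL features, so trees stay more correlated
    random_state=0,
)

# Random forest: bootstrap samples AND feature randomness
forest = RandomForestClassifier(
    n_estimators=100,
    max_features='sqrt',  # only sqrt(n_features) candidate features per split, so lower correlation
    random_state=0,
)

bagging_scores = cross_val_score(bagging, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)
print('Bagging mean / std:', bagging_scores.mean(), bagging_scores.std())
print('Forest mean / std:', forest_scores.mean(), forest_scores.std())
# The std across folds is only a rough proxy for model variance;
# decorrelated trees often, but not always, give the forest the more stable score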
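The correlation ρ between base models can also be estimated directly instead of being inferred from cross-validation scores. The sketch below is one possible way to do that, under several assumptions: the data is synthetic, plain bagging is emulated with RandomForestClassifier(max_features=None) so that both ensembles expose their trees the same way, and the helper mean_pairwise_correlation is a hypothetical name introduced for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def mean_pairwise_correlation(ensemble, X_eval):
    """Average Pearson correlation between the member trees' P(class 1)."""
    probs = np.array([tree.predict_proba(X_eval)[:, 1]
                      for tree in ensemble.estimators_])
    corr = np.corrcoef(probs)                       # (n_trees, n_trees) matrix
    off_diagonal = corr[~np.eye(len(probs), dtype=bool)]
    return off_diagonal.mean()

# max_features=None: every split considers all features (bagged trees)
bagged_trees = RandomForestClassifier(n_estimators=50, max_features=None,
                                      random_state=0).fit(X_train, y_train)
# max_features='sqrt': each split considers a random subset of features
forest = RandomForestClassifier(n_estimators=50, max_features='sqrt',
                                random_state=0).fit(X_train, y_train)

rho_bagged = mean_pairwise_correlation(bagged_trees, X_test)
rho_forest = mean_pairwise_correlation(forest, X_test)
print(f"mean pairwise correlation, bagged trees: {rho_bagged:.3f}")
print(f"mean pairwise correlation, random forest: {rho_forest:.3f}")
```

If feature subsampling is decorrelating the trees as described, the forest's estimate would be expected to come out lower, though the size of the gap depends on the dataset.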
