What is the statistical rationale behind k-fold cross-validation, and why are k=5 or k=10 commonly used?
Cross-validation estimates how well a model generalises to unseen data by repeatedly splitting the available data into a training portion and a validation portion, fitting on the former, scoring on the latter, and averaging the results. K-fold CV divides the data into k roughly equal partitions (folds). Each fold serves once as the validation set while the model is trained on the remaining k-1 folds, which gives k performance estimates that are then averaged.
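The procedure can be written out by hand to make the mechanics concrete. The following is a minimal sketch, not a replacement for `cross_val_score`: the synthetic dataset, the choice of `LogisticRegression`, and the variable names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

# Synthetic data, used only so the sketch runs on its own.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

model = LogisticRegression(max_iter=1000)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, val_idx in kf.split(X):
    fold_model = clone(model)  # fresh, unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])  # train on the other k-1 folds
    fold_scores.append(fold_model.score(X[val_idx], y[val_idx]))  # score on the held-out fold

fold_scores = np.array(fold_scores)
print(f"Per-fold accuracy: {np.round(fold_scores, 3)}")
print(f"Mean: {fold_scores.mean():.3f}")
```

Every sample lands in a validation fold exactly once, so each observation contributes to the final estimate without ever being scored by a model that saw it during training.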
This addresses a fundamental statistical tension. With a larger k, each training set is larger and closer to the full dataset, which reduces the (pessimistic) bias of the performance estimate. But the k training sets then overlap heavily, so the k estimates become more correlated with each other, which can increase the variance of the averaged estimate. The extreme case, k=n (leave-one-out CV), has very low bias but can have high variance, and it requires n model fits, which is computationally expensive. Empirically, k=5 or k=10 has been found to offer a good bias-variance balance for the estimate itself while remaining computationally tractable.
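A simplified calculation shows why correlation between fold estimates limits the benefit of averaging. As an idealisation that is assumed here for illustration, suppose each fold estimate e_i has the same variance σ² and every pair of estimates has the same correlation ρ. Then the variance of the averaged estimate is:

```latex
\operatorname{Var}\!\left(\frac{1}{k}\sum_{i=1}^{k} e_i\right)
= \frac{1}{k^2}\left(k\sigma^2 + k(k-1)\rho\sigma^2\right)
= \frac{\sigma^2}{k} + \frac{k-1}{k}\,\rho\,\sigma^2
```

With ρ = 0 the variance shrinks as σ²/k, but as ρ approaches 1 it stays close to σ² however many folds are averaged. Real fold estimates do not satisfy these assumptions exactly, so this is a rough guide to the effect rather than a precise model.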
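from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, LeaveOneOut, StratifiedKFold, cross_val_score
# Synthetic imbalanced data so the snippet runs on its own (a stand-in for your own X, y)
X, y = make_classification(n_samples=500, n_features=20, weights=[0.9, 0.1], random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)
# Standard k-fold; shuffle so fold membership does not depend on row order
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf)
print(f'Mean: {scores.mean():.3f}, Std: {scores.std():.3f}')
# StratifiedKFold preserves class proportions in each fold,
# which matters most for imbalanced classification problems.
# (Passing an integer cv with a classifier already stratifies, but without shuffling.)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
stratified_scores = cross_val_score(model, X, y, cv=skf)
print(f'Stratified mean: {stratified_scores.mean():.3f}, Std: {stratified_scores.std():.3f}')
# Leave-one-out: k=n, very low bias, often high variance, and n model fits
# loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())  # slow!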
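To check what stratification does, the class balance of each validation fold can be inspected directly. This is a sketch under assumed conditions: the synthetic dataset with roughly 10% positives and the helper `positive_rates` are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold

# Synthetic data with roughly 10% positives; purely illustrative.
X, y = make_classification(
    n_samples=500, n_features=20, weights=[0.9, 0.1], random_state=42
)

def positive_rates(splitter):
    """Fraction of positive labels in each validation fold (hypothetical helper)."""
    return np.array([y[val_idx].mean() for _, val_idx in splitter.split(X, y)])

kf_rates = positive_rates(KFold(n_splits=5, shuffle=True, random_state=42))
skf_rates = positive_rates(StratifiedKFold(n_splits=5, shuffle=True, random_state=42))

print(f"Overall positive rate:    {y.mean():.3f}")
print(f"KFold per fold:           {np.round(kf_rates, 3)}")
print(f"StratifiedKFold per fold: {np.round(skf_rates, 3)}")
```

With plain `KFold`, the positive rate can drift from fold to fold by chance. `StratifiedKFold` keeps each fold's rate as close to the overall rate as the fold size allows, so metrics that are sensitive to class balance are compared on similar footing across folds.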
