Why is PCA sensitive to feature scaling while decision tree feature importance is not, mathematically?
PCA's objective is to find directions of maximum variance in the data, computed from the covariance matrix. Variance is measured in squared units of the original feature, so a feature measured in large units (e.g. salary in dollars, variance in the millions) will dominate the covariance matrix and consequently the principal components, regardless of whether that feature is actually more informative than a feature measured in small units (e.g. age in years, variance in the tens). This makes PCA fundamentally scale-dependent.
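The scale dependence can be made explicit with a short derivation. The symbols $\Sigma$, $D$, $c_j$, $\sigma_j$ and $w$ are notation introduced for this sketch, and it assumes each feature is multiplied by its own positive constant (a change of units).

```latex
\begin{aligned}
x' &= D x, \qquad D = \operatorname{diag}(c_1, \dots, c_d), \quad c_j > 0 \\
\Sigma' &= \operatorname{Cov}(x') = D \, \Sigma \, D, \qquad \Sigma'_{jk} = c_j c_k \, \Sigma_{jk} \\
w_1' &= \arg\max_{\lVert w \rVert = 1} \; w^\top \Sigma' w \\[4pt]
\text{Two uncorrelated features, } & \Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2), \; D = \operatorname{diag}(1, c): \\
\frac{\lambda_{\max}(\Sigma')}{\operatorname{tr}(\Sigma')}
  &= \frac{\max(\sigma_1^2, \, c^2 \sigma_2^2)}{\sigma_1^2 + c^2 \sigma_2^2}
  \;\longrightarrow\; 1 \quad \text{as } c \to \infty
\end{aligned}
```

Because the eigenvectors of $D \Sigma D$ are in general not a rescaling of the eigenvectors of $\Sigma$, changing units changes the principal directions themselves, not just their lengths. Standardization is the special case $c_j = 1/\sigma_j$, which turns $\Sigma'$ into the correlation matrix.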
Decision trees, by contrast, choose splits by threshold comparisons (feature ≤ t) and score each candidate by its impurity reduction. Neither step depends on the numeric scale of the feature: the comparison depends only on the ordering of the feature's values, and the impurity depends only on which samples land on each side of the split. Multiplying a feature by 1000 multiplies the best threshold by 1000 as well, but leaves the resulting partition, and hence the impurity reduction, unchanged. Tree-based feature importance (the total impurity reduction attributable to a feature across all splits and trees) is therefore invariant to rescaling by any positive constant, and more generally to any strictly increasing transformation of a feature.
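The same argument in symbols, as a sketch: $S$ is the set of samples at a node, $S_L$ and $S_R$ are the two children of a split on feature $j$ at threshold $t$, and $I(\cdot)$ is any impurity measure computed from the targets alone (Gini, entropy, variance). These names are notation assumed for this illustration.

```latex
\begin{aligned}
S_L(j, t) &= \{\, i \in S : x_{ij} \le t \,\}, \qquad S_R(j, t) = S \setminus S_L(j, t) \\
\Delta I(j, t) &= I(S) - \frac{|S_L|}{|S|} \, I(S_L) - \frac{|S_R|}{|S|} \, I(S_R) \\[4pt]
\text{For } c > 0: \quad
\{\, i \in S : c \, x_{ij} \le c \, t \,\} &= \{\, i \in S : x_{ij} \le t \,\}
\;\;\Longrightarrow\;\; \Delta I'(j, c \, t) = \Delta I(j, t)
\end{aligned}
```

Every candidate split on the rescaled feature corresponds to a split on the original feature with the same children, so the best achievable impurity reduction, and the importance accumulated from it, is the same.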
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Demonstrating PCA's scale sensitivity
X_unscaled = np.column_stack([
    rng.standard_normal(100) * 1,     # small-variance feature
    rng.standard_normal(100) * 1000,  # huge-variance feature (different units)
])
# Synthetic labels (an assumption: the original snippet left y undefined).
# Both features contribute equally once the unit difference is removed.
y = (X_unscaled[:, 0] + X_unscaled[:, 1] / 1000 > 0).astype(int)

pca_unscaled = PCA(n_components=2).fit(X_unscaled)
print(pca_unscaled.explained_variance_ratio_)
# Almost entirely dominated by the large-variance feature!

pca_scaled = PCA(n_components=2).fit(StandardScaler().fit_transform(X_unscaled))
print(pca_scaled.explained_variance_ratio_)
# Roughly 50/50: after standardization both features have unit variance,
# and here they are independent, so neither direction dominates

# Tree-based feature importance is scale-invariant, so no scaling is needed
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_unscaled, y)
print(rf.feature_importances_)  # unaffected by the artificial scale difference
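The snippet above asserts that tree importances ignore scale but never compares two fits. A minimal check, assuming synthetic data and a single `DecisionTreeClassifier` with a fixed `random_state` (both choices are illustrative, not part of the original answer), is to fit the same tree before and after multiplying one feature by 1000 and compare the results:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Two features on very different scales; the labels depend on both.
X = np.column_stack([
    rng.standard_normal(300),
    rng.standard_normal(300) * 1000,
])
y = (X[:, 0] + X[:, 1] / 1000 > 0).astype(int)

# Same data with the first feature multiplied by 1000 (a pure change of units).
X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000

# Simple holdout split, applied identically to both versions of the data.
X_train, X_test = X[:200], X[200:]
Xr_train, Xr_test = X_rescaled[:200], X_rescaled[200:]
y_train = y[:200]

tree_a = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
tree_b = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Xr_train, y_train)

# Compare importances, held-out predictions, and the feature used at each node.
same_importances = np.allclose(tree_a.feature_importances_, tree_b.feature_importances_)
same_predictions = np.array_equal(tree_a.predict(X_test), tree_b.predict(Xr_test))
same_split_features = np.array_equal(tree_a.tree_.feature, tree_b.tree_.feature)
print(same_importances, same_predictions, same_split_features)
```

Only the stored thresholds differ between the two trees (by the factor of 1000); the partitions they induce are the same. Fixing `random_state` matters here because scikit-learn permutes the feature order when searching for splits, and the comparison should not be confounded by that randomness.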
