Python / Python Mathematical Intuition and Scikit Learn Interview Questions
Why is a log transformation commonly applied to skewed numerical features before modeling, mathematically?
Many real-world quantities (income, population, word frequencies, prices) follow a right-skewed distribution with a long right tail, often approximately log-normal. The logarithm is useful here because it compresses large values much more than small ones: with the natural log, log(1000) - log(100) ≈ 2.3 and log(100) - log(10) ≈ 2.3, even though the raw gaps are 900 and 90. Equal ratios become equal differences after a log transform. This converts multiplicative relationships into additive ones, since log(ab) = log(a) + log(b), and pulls in the long tail, making the distribution closer to symmetric (and exactly normal if the data are log-normal).
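The "equal ratios become equal differences" property can be checked numerically. The snippet below is a minimal sketch using made-up values (`values`, `a`, `b` are illustrative, not from any dataset):

```python
import numpy as np

# Each value is 10x the previous one (equal ratios).
values = np.array([10.0, 100.0, 1000.0, 10000.0])

raw_gaps = np.diff(values)          # gaps grow tenfold: 90, 900, 9000
log_gaps = np.diff(np.log(values))  # every gap equals ln(10), about 2.3026

print("raw gaps:", raw_gaps)
print("log gaps:", log_gaps)

# Multiplication on the raw scale becomes addition on the log scale.
a, b = 250.0, 40.0
print(np.isclose(np.log(a * b), np.log(a) + np.log(b)))
```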
This matters for linear models, though not because OLS requires normally distributed features or targets; it does not. Classical OLS inference assumes the errors have constant variance (homoscedasticity) and, for exact small-sample tests, that they are normally distributed. A strongly skewed target often produces heteroscedastic residuals, where prediction error grows with the magnitude of the target, and a skewed feature produces high-leverage points that can dominate the fit. It also matters for distance-based and gradient-based methods, where a few extreme values on the raw scale would otherwise dominate the loss or distance calculations.
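To illustrate the heteroscedasticity point, here is a rough sketch on synthetic data with multiplicative noise. The data-generating process and the helper `residual_spread_ratio` are assumptions made for this example. The helper compares the residual spread for large fitted values against the spread for small ones; a ratio near 1 suggests roughly constant variance.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=2000).reshape(-1, 1)
# Assumed process: the noise multiplies the signal, so the error
# grows with the size of the target on the raw scale.
y = np.exp(1.0 + 1.2 * x[:, 0] + rng.normal(0, 0.4, size=2000))

def residual_spread_ratio(X, target):
    """Residual std for the upper half of fitted values divided by
    the residual std for the lower half."""
    model = LinearRegression().fit(X, target)
    fitted = model.predict(X)
    resid = target - fitted
    high = fitted > np.median(fitted)
    return resid[high].std() / resid[~high].std()

raw_ratio = residual_spread_ratio(x, y)          # well above 1
log_ratio = residual_spread_ratio(x, np.log(y))  # close to 1

print("spread ratio, raw target:", raw_ratio)
print("spread ratio, log target:", log_ratio)
```

On the raw scale the residual spread is much larger for large predictions; after the log transform the noise becomes additive and the spread is roughly constant.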
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer, PowerTransformer
# Simulate right-skewed income data (seeded for reproducibility)
rng = np.random.default_rng(0)
income = rng.lognormal(mean=10, sigma=1, size=1000)
X = pd.DataFrame({'income': income})  # feature matrix used below
print('Skewness before:', X['income'].skew())  # strongly positive
log_income = np.log1p(income)  # log1p computes log(1 + x), so zero values are handled safely
print('Skewness after:', pd.Series(log_income).skew())  # close to 0
# Integrate into a scikit-learn pipeline
log_transformer = FunctionTransformer(np.log1p, validate=True)
X_log = log_transformer.fit_transform(X[['income']])
# Box-Cox / Yeo-Johnson: more general power transforms that
# estimate the transformation parameter (lambda) from the data
# Yeo-Johnson also handles zero and negative values; Box-Cox needs strictly positive data
pt = PowerTransformer(method='yeo-johnson')
X_transformed = pt.fit_transform(X)
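When the skewed variable is the target rather than a feature, scikit-learn's `TransformedTargetRegressor` can apply the log transform during fitting and invert it at prediction time. The following is a minimal sketch on synthetic data; `X_demo`, `y_demo`, and the coefficients used to generate them are illustrative assumptions.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 2))
# Assumed right-skewed target: linear on the log scale.
y_demo = np.exp(
    2.0 + 0.5 * X_demo[:, 0] - 0.3 * X_demo[:, 1]
    + rng.normal(0, 0.2, size=500)
)

# The regressor is fit on log1p(y); predictions are mapped back
# to the original scale with expm1.
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X_demo, y_demo)

preds = model.predict(X_demo[:5])  # predictions on the original scale
print(preds)
print(model.regressor_.coef_)      # coefficients on the log1p scale
```

Keeping the transform inside the estimator means callers never have to remember to invert it, and it is applied consistently during cross-validation.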
