How does batch size affect deep learning training mathematically and practically?
Batch size controls the trade-off between gradient estimate quality and compute cost per update. With batch size B, the gradient is estimated as the average of the per-sample loss gradients over B samples. For independently sampled examples, the variance of this estimate scales as σ²/B, where σ² is the per-sample gradient variance. Larger batches give lower-variance (more accurate) gradient estimates, but with diminishing returns: doubling the batch size halves the variance yet shrinks the standard error only by a factor of √2, while the compute cost per step doubles.
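The σ²/B relationship can be checked numerically. The sketch below is a toy simulation, not a real training run: it assumes per-sample "gradients" are independent Gaussian draws with σ = 2, and the helper name `batch_mean_variance` is made up for illustration.

```python
import random
import statistics

def batch_mean_variance(batch_size, sigma=2.0, trials=2000, seed=0):
    """Empirical variance of the mean of `batch_size` simulated per-sample gradients."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(batch_size))
        for _ in range(trials)
    ]
    return statistics.pvariance(means)

# Compare the measured variance with the theoretical sigma^2 / B (sigma^2 = 4 here)
for b in (1, 4, 16, 64):
    measured = batch_mean_variance(b)
    print(f"B={b:3d}  measured={measured:.4f}  sigma^2/B={4.0 / b:.4f}")
```

Each fourfold increase in B should cut the measured variance by roughly a factor of four, while the simulated work per batch grows fourfold.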
Generalisation effect: empirically, large batches often converge to sharper minima that generalise worse than the flatter minima found by small batches. The noise in small-batch SGD acts as implicit regularisation: the stochastic gradient trajectory tends to settle in broader minima, which are more robust to small perturbations of the weights. This is the 'large batch training problem'. Common mitigations are the linear scaling rule (scale the lr proportionally with batch size) and learning-rate warmup. Gradient accumulation is a related tool rather than a mitigation: it reproduces the gradient of a large batch when that batch does not fit in memory, so it behaves like large-batch training and does not retain small-batch noise.
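One point worth verifying about gradient accumulation is what gradient it actually produces. The sketch below uses a hypothetical one-parameter linear model with MSE loss in plain Python (the helper `mse_grad` and the synthetic data are assumptions for illustration). It compares the gradient of one batch of 64 samples with the sum of four micro-batch gradients, each scaled by 1/K.

```python
import random

def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(64)]
ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
w = 0.5

# Gradient computed on the full batch of 64 samples
full_grad = mse_grad(w, xs, ys)

# Same samples split into K = 4 micro-batches of 16, each gradient scaled by 1/K
micro, k = 16, 4
accumulated = 0.0
for i in range(k):
    chunk = slice(i * micro, (i + 1) * micro)
    accumulated += mse_grad(w, xs[chunk], ys[chunk]) / k

print(full_grad, accumulated)
```

The two values agree up to floating-point rounding. This holds when the loss is a mean over samples, the micro-batches are equal in size, and the model has no batch-dependent layers such as batch normalisation.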
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Linear(10, 1)
criterion = nn.MSELoss()

# Synthetic stand-in data (an assumption; the original leaves `loader` undefined)
dataset = TensorDataset(torch.randn(2048, 10), torch.randn(2048, 1))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Gradient accumulation: simulate batch_size=1024 with micro_batch=32
accumulation_steps = 32  # effective_batch_size = 32 * 32 = 1024
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
optimizer.zero_grad()

for step, (X, y) in enumerate(loader):
    # Forward and backward every micro-batch
    loss = criterion(model(X), y) / accumulation_steps  # scale by 1/K
    loss.backward()  # gradients accumulate in .grad until zero_grad() is called
    if (step + 1) % accumulation_steps == 0:
        # Clip and step only after accumulating K micro-batches
        nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        optimizer.zero_grad(set_to_none=True)

# Linear scaling rule: if you double batch size, double the lr
base_lr = 1e-3
base_batch = 256
new_batch = 1024
new_lr = base_lr * (new_batch / base_batch) # 4e-3
# But use warmup to stabilise the larger lr at the start
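The warmup mentioned in the last comment can be sketched as a linear ramp. This is a minimal illustration, not a prescribed schedule: the function name `warmup_lr` and the choice of 5 warmup steps are assumptions, and real runs typically warm up over hundreds or thousands of steps.

```python
def warmup_lr(step, target_lr, warmup_steps):
    """Linearly ramp the lr up to target_lr over warmup_steps, then hold it."""
    if warmup_steps <= 0:
        return target_lr
    return target_lr * min(1.0, (step + 1) / warmup_steps)

# Target lr from the linear scaling rule: 1e-3 * (1024 / 256)
base_lr, base_batch, new_batch = 1e-3, 256, 1024
target_lr = base_lr * (new_batch / base_batch)

# Learning rate for the first 8 optimizer steps with a 5-step warmup
schedule = [warmup_lr(s, target_lr, warmup_steps=5) for s in range(8)]
print(schedule)
```

In PyTorch, a multiplier of this shape could be supplied to `torch.optim.lr_scheduler.LambdaLR`, which scales the optimizer's initial lr by the returned factor at each scheduler step.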
