Compare SGD, SGD with momentum, RMSProp, and Adam optimizers. When do you choose each?
All these optimizers share the same goal — updating parameters to reduce loss — but differ in how they use gradient history to adapt the update step. Understanding the mechanics helps diagnose slow training and poor generalisation.
| Optimizer | Update rule (simplified) | Key advantage | Limitation |
|---|---|---|---|
| SGD | θ ← θ - η·g | Simple, no memory overhead | Slow convergence, sensitive to lr |
| SGD + Momentum | v ← βv + g; θ ← θ - η·v | Accelerates consistent directions, damps oscillation | Still one global lr for every parameter |
| RMSProp | E[g²] ← ρ·E[g²] + (1-ρ)·g²; θ ← θ - η·g / √(E[g²]+ε) | Adapts lr per parameter; good for RNNs | No momentum term in its basic form |
| Adam | m ← β₁m + (1-β₁)g; v ← β₂v + (1-β₂)g²; θ ← θ - η·m̂ / (√v̂ + ε), with m̂, v̂ bias-corrected | Robust default; fast convergence | Can generalise worse than SGD on some tasks |
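The update rules in the table can be sketched as plain Python functions, which makes it easier to see what each optimizer remembers between steps. This is a minimal illustration rather than how `torch.optim` is implemented: the function names, the `state` dict layout, the toy quadratic and the hyperparameter values are all assumptions chosen for the example.

```python
import math

def new_state(n):
    # Hypothetical container for optimizer memory (all zeros at the start).
    return {"vel": [0.0] * n, "sq": [0.0] * n, "m": [0.0] * n, "v": [0.0] * n, "t": 0}

def sgd_step(theta, grad, state, lr):
    # theta <- theta - lr * g  (state is unused; kept so all four share a signature)
    return [t - lr * g for t, g in zip(theta, grad)]

def momentum_step(theta, grad, state, lr, beta=0.9):
    # vel <- beta * vel + g ; theta <- theta - lr * vel
    state["vel"] = [beta * v + g for v, g in zip(state["vel"], grad)]
    return [t - lr * v for t, v in zip(theta, state["vel"])]

def rmsprop_step(theta, grad, state, lr, rho=0.99, eps=1e-8):
    # sq is the running average E[g^2]; each parameter gets its own step size
    state["sq"] = [rho * s + (1 - rho) * g * g for s, g in zip(state["sq"], grad)]
    return [t - lr * g / math.sqrt(s + eps)
            for t, g, s in zip(theta, grad, state["sq"])]

def adam_step(theta, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    # Running averages of the gradient (m) and the squared gradient (v)
    state["t"] += 1
    k = state["t"]
    state["m"] = [beta1 * m + (1 - beta1) * g for m, g in zip(state["m"], grad)]
    state["v"] = [beta2 * v + (1 - beta2) * g * g for v, g in zip(state["v"], grad)]
    new_theta = []
    for t, m, v in zip(theta, state["m"], state["v"]):
        m_hat = m / (1 - beta1 ** k)  # bias correction: m and v start at zero
        v_hat = v / (1 - beta2 ** k)
        new_theta.append(t - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_theta

def loss_fn(theta):
    # Toy ill-conditioned quadratic: much steeper along theta[1] than theta[0]
    return 0.5 * (theta[0] ** 2 + 25.0 * theta[1] ** 2)

def grad_fn(theta):
    return [theta[0], 25.0 * theta[1]]

# Illustrative learning rates, not tuned recommendations
for name, step, lr in [("sgd", sgd_step, 0.03), ("momentum", momentum_step, 0.03),
                       ("rmsprop", rmsprop_step, 0.01), ("adam", adam_step, 0.05)]:
    theta, state = [1.0, 1.0], new_state(2)
    for _ in range(100):
        theta = step(theta, grad_fn(theta), state, lr)
    print(f"{name:9s} loss after 100 steps: {loss_fn(theta):.6f}")
```

Plain SGD keeps no state, momentum keeps one extra value per parameter, RMSProp keeps one, and Adam keeps two plus a step counter. That is the memory overhead the table refers to.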
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# SGD: baseline; works, but needs careful lr tuning
opt_sgd = torch.optim.SGD(model.parameters(), lr=0.01)

# SGD + Momentum: adds velocity; β=0.9 is standard
opt_mom = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9,
                          weight_decay=1e-4)  # L2 regularisation

# Adam: adaptive learning rate + momentum; best default for DL
opt_adam = torch.optim.Adam(model.parameters(),
                            lr=1e-3,             # default, usually works
                            betas=(0.9, 0.999),  # decay rates for the first- and second-moment averages
                            eps=1e-8,
                            weight_decay=1e-5)   # L2 penalty, added to the gradient

# AdamW: Adam with decoupled weight decay (better than Adam + L2)
opt_adamw = torch.optim.AdamW(model.parameters(), lr=1e-3,
                              weight_decay=1e-2)

# Learning rate schedulers: change lr during training
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(opt_adam, T_max=100)
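for epoch in range(100):
    # ... training loop ...
    scheduler.step()  # once per epoch, after the optimizer updates; lr follows a cosine curve

When to choose: Adam is the safe default for most deep learning tasks. SGD with momentum often reaches better final generalisation on image classification, provided the learning rate and schedule are tuned carefully. That gap was part of the motivation for AdamW, which decouples weight decay from the adaptive gradient update. AdamW is now the standard choice for training and fine-tuning transformer models, including large language models. RMSProp remains a reasonable option for recurrent networks, and plain SGD mainly serves as a baseline.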
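To make the "decoupled weight decay" distinction concrete, the sketch below applies one step of Adam with an L2 penalty and one step of AdamW to a single scalar weight. It is a toy illustration under stated assumptions, not the library code: the helper names (`adam_update`, `adam_l2_step`, `adamw_step`, `fresh`) are hypothetical, and the loss gradient is set to zero so that only the decay term acts.

```python
import math

def fresh():
    # Hypothetical per-parameter Adam state
    return {"m": 0.0, "v": 0.0, "t": 0}

def adam_update(g, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the Adam step (to be subtracted from the weight) for gradient g."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return lr * m_hat / (math.sqrt(v_hat) + eps)

def adam_l2_step(theta, grad, state, lr, wd):
    # L2 penalty: the decay term joins the gradient and is rescaled by Adam
    return theta - adam_update(grad + wd * theta, state, lr)

def adamw_step(theta, grad, state, lr, wd):
    # Decoupled decay: shrink the weight directly, then apply the Adam step
    theta = theta * (1 - lr * wd)
    return theta - adam_update(grad, state, lr)

# With a zero loss gradient, only the weight-decay term moves the weight.
theta_l2 = adam_l2_step(1.0, 0.0, fresh(), lr=0.1, wd=0.01)
theta_w = adamw_step(1.0, 0.0, fresh(), lr=0.1, wd=0.01)
print("Adam + L2:", theta_l2)
print("AdamW:    ", theta_w)
```

In this extreme case the L2 variant moves the weight by roughly a full `lr`, whatever value `wd` takes, because Adam normalises the magnitude of whatever gradient it receives. AdamW shrinks the weight by the factor `1 - lr * wd`, so the decay strength stays under the direct control of `wd`.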
