Python / PyTorch Fundamentals Interview Questions
What optimizers does PyTorch provide and how do you choose between them?
An optimizer updates model parameters based on computed gradients. PyTorch provides the most widely used optimizers in torch.optim. The choice of optimizer and its hyperparameters (learning rate above all) can significantly affect both training speed and final performance.
| Optimizer | Class | Key parameters | Best for |
|---|---|---|---|
| SGD | optim.SGD | lr, momentum, weight_decay, nesterov | Image classification (with momentum); can generalise better than Adam |
| SGD + Momentum | optim.SGD(momentum=0.9) | momentum=0.9 standard | Most vision tasks |
| Adam | optim.Adam | lr=1e-3, betas=(0.9,0.999), eps=1e-8 | Default choice; fast convergence |
| AdamW | optim.AdamW | lr, weight_decay (decoupled) | Fine-tuning transformers; decoupled weight decay instead of an L2 penalty |
| RMSprop | optim.RMSprop | lr, alpha=0.99 | RNNs |
| Adagrad | optim.Adagrad | lr | Sparse features, NLP |
import torch
import torch.nn as nn
import torch.optim as optim
model = nn.Linear(10, 1)
# SGD with momentum (common for vision)
sgd = optim.SGD(
    model.parameters(),
    lr=0.01,
    momentum=0.9,
    weight_decay=1e-4,  # L2 regularisation
    nesterov=True,
)
# Adam (default for most tasks)
adam = optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,  # NOTE: in Adam, weight decay is added to the gradient (L2 penalty), so it is coupled with the adaptive scaling
)
# AdamW: decoupled weight decay (usually preferred over Adam + weight_decay)
adamw = optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=0.01,  # applied directly to the parameters, not via the gradient
)
# Per-parameter-group learning rates (useful for fine-tuning)
# Here weight and bias stand in for "pretrained layers" and "new head".
optimizer = optim.AdamW([
    {"params": [model.weight], "lr": 1e-4},  # lower lr, e.g. for pretrained layers
    {"params": [model.bias], "lr": 1e-3},    # higher lr, e.g. for a new head
], weight_decay=0.01)
# Standard training step
optimizer.zero_grad()  # clear gradients from the previous step
loss = nn.MSELoss()(model(torch.randn(8, 10)), torch.randn(8, 1))
loss.backward()        # compute gradients
optimizer.step()       # update parameters
Adam vs AdamW: In standard Adam, weight_decay is added to the gradient as an L2 penalty, so it passes through the adaptive per-parameter scaling; parameters with a large gradient history end up decayed less than intended. AdamW instead applies the decay directly to the parameters, separate from the gradient-based update. This decoupled weight decay is not the same thing as L2 regularisation under Adam, and it is now the standard choice for transformer fine-tuning.
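The difference between coupled and decoupled decay can be seen in a single optimizer step. The sketch below is an illustration only, not taken from the original answer: the helper name one_step and the values lr=0.1, weight_decay=0.1 are assumptions chosen to make the effect visible. The loss is built so that its gradient is exactly zero, which means any change in the parameter comes from weight decay alone.

```python
import torch

def one_step(optimizer_cls):
    """Run one optimizer step on w = 1.0 with a zero loss gradient.

    Hypothetical helper for illustration; lr and weight_decay are
    arbitrary values, not recommended settings.
    """
    w = torch.nn.Parameter(torch.tensor([1.0]))
    opt = optimizer_cls([w], lr=0.1, weight_decay=0.1)
    loss = (w * 0.0).sum()  # gradient w.r.t. w is exactly zero
    loss.backward()
    opt.step()
    return w.item()

w_adam = one_step(torch.optim.Adam)    # decay is folded into the gradient
w_adamw = one_step(torch.optim.AdamW)  # decay is applied to w directly

print(f"Adam:  {w_adam:.4f}")
print(f"AdamW: {w_adamw:.4f}")
```

With these settings AdamW shrinks the parameter by the factor 1 - lr * weight_decay, giving 0.99. Adam instead treats the decay term as a gradient and normalises it, so the first step has a size of roughly lr and the parameter lands near 0.90, largely independent of the weight_decay value. That rescaling is what "coupled" means in practice.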
