Python / PyTorch Fundamentals Interview Questions

What optimizers does PyTorch provide and how do you choose between them?

An optimizer updates model parameters using the gradients computed by backpropagation. PyTorch ships the most widely used optimizers in torch.optim. The choice of optimizer and its hyperparameters (learning rate above all) can strongly affect both training speed and final performance.
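As a minimal sketch of what "updating parameters from gradients" means, the snippet below applies one plain SGD step by hand and compares it with optim.SGD. The toy loss_fn and the variable names are assumptions made for illustration; they are not part of the original answer.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)
lr = 0.1

# Two identical copies of the same parameter vector
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)

def loss_fn(w):
    # Toy loss (an assumption for illustration): sum of squares
    return (w ** 2).sum()

# Manual update: w <- w - lr * grad
loss_fn(w_manual).backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# The same update performed by torch.optim (no momentum, no weight decay)
opt = optim.SGD([w_optim], lr=lr)
opt.zero_grad()
loss_fn(w_optim).backward()
opt.step()

print(torch.allclose(w_manual, w_optim))
```

With momentum, weight decay, or an adaptive optimizer such as Adam, the update rule differs, but the zero_grad / backward / step pattern stays the same.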

Common PyTorch optimizers
Optimizer	Class	Key parameters	Best for
SGD	optim.SGD	lr, momentum, weight_decay, nesterov	Image classification (with momentum); can generalise better than Adam
SGD + Momentum	optim.SGD(momentum=0.9)	momentum=0.9 standard	Most vision tasks
Adam	optim.Adam	lr=1e-3, betas=(0.9,0.999), eps=1e-8	Default choice; fast convergence
AdamW	optim.AdamW	lr, weight_decay (decoupled)	Fine-tuning transformers; weight decay applied separately from the adaptive step
RMSprop	optim.RMSprop	lr, alpha=0.99	RNNs
Adagrad	optim.Adagrad	lr	Sparse features, NLP

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)

# SGD with momentum (common for vision)
sgd = optim.SGD(
    model.parameters(),
    lr=0.01,
    momentum=0.9,
    weight_decay=1e-4,   # L2 regularisation
    nesterov=True,
)

# Adam (default for most tasks)
adam = optim.Adam(
    model.parameters(),
    lr=1e-3,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0,      # NOTE: Adam adds weight_decay * param to the gradient (L2 penalty), not decoupled
)

# AdamW: weight decay decoupled from the adaptive gradient step
adamw = optim.AdamW(
    model.parameters(),
    lr=1e-3,
    weight_decay=0.01,   # decoupled from gradient update
)

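# Per-parameter-group learning rates (useful for fine-tuning).
# Here weight and bias merely stand in for "pretrained layers" and "new head".
optimizer = optim.AdamW([
    {"params": [model.weight], "lr": 1e-4},   # lower lr, e.g. for pretrained layers
    {"params": [model.bias],   "lr": 1e-3},   # higher lr, e.g. for a new head
], weight_decay=0.01)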

# Standard training step
optimizer.zero_grad()
loss = nn.MSELoss()(model(torch.randn(8,10)), torch.randn(8,1))
loss.backward()
optimizer.step()
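
To check that per-group settings were picked up as intended, one option is to inspect optimizer.param_groups. The sketch below rebuilds a two-group AdamW optimizer on a small nn.Linear model; the names group_lrs and group_wds are illustrative assumptions.

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 1)

# Two parameter groups with their own learning rates;
# weight_decay is given once and becomes the default for both groups.
optimizer = optim.AdamW([
    {"params": [model.weight], "lr": 1e-4},
    {"params": [model.bias],   "lr": 1e-3},
], weight_decay=0.01)

# Each entry of param_groups is a dict holding that group's hyperparameters
group_lrs = [g["lr"] for g in optimizer.param_groups]
group_wds = [g["weight_decay"] for g in optimizer.param_groups]

print(group_lrs, group_wds)
```

Any option not set inside a group dict falls back to the keyword argument passed to the optimizer constructor, which is why both groups share the same weight_decay here.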

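Adam vs AdamW: In standard Adam, weight_decay is implemented as an L2 penalty: weight_decay * param is added to the gradient before the moment estimates are computed, so the decay is rescaled by the adaptive per-parameter step size, and parameters with a large gradient history are decayed less. AdamW instead shrinks each parameter directly by lr * weight_decay at every step, separately from the adaptive gradient update. This is decoupled weight decay, which is not equivalent to L2 regularisation under Adam, and it is now the standard choice for transformer fine-tuning.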
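
The difference can be made visible with a small experiment, sketched here under a deliberately artificial assumption: the loss gradient is forced to zero, so only the weight decay term moves the parameters. The helper one_step and the chosen values (lr=0.1, weight_decay=0.1) are hypothetical and exist only for this illustration.

```python
import torch
import torch.optim as optim

def one_step(opt_cls, **kwargs):
    # Two weights of very different magnitude
    p = torch.nn.Parameter(torch.tensor([1.0, 100.0]))
    opt = opt_cls([p], lr=0.1, **kwargs)
    # Pretend the loss gradient is exactly zero, so only weight decay acts
    p.grad = torch.zeros_like(p)
    opt.step()
    return p.detach().clone()

adam_p = one_step(optim.Adam, weight_decay=0.1)
adamw_p = one_step(optim.AdamW, weight_decay=0.1)

# Adam: the decay term passes through the adaptive normalisation,
# so both weights move by roughly lr, regardless of their size.
print(adam_p)
# AdamW: each weight is multiplied by (1 - lr * weight_decay),
# so larger weights shrink by a proportionally larger amount.
print(adamw_p)
```

In this setup Adam subtracts about 0.1 from both weights, while AdamW scales both by 0.99, which is the proportional shrinkage usually expected from weight decay.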

Quiz

What is the key difference between Adam and AdamW?

✗ AdamW is faster than Adam
✓ AdamW decouples weight decay from the gradient update. In Adam, weight decay is added to the gradient and is therefore rescaled by the adaptive learning rate; AdamW applies it directly to the parameters.
✗ Adam uses a different momentum formula
✗ AdamW is only available for PyTorch models, not custom modules

When might SGD with momentum outperform Adam for a vision model?

✗ SGD always outperforms Adam
✗ Never. Adam always converges to a better solution
✓ SGD with careful tuning can find flatter minima that generalise better on i.i.d. image datasets; several papers report SGD with momentum beating Adam on CIFAR and ImageNet despite Adam's faster early convergence
✗ SGD with momentum is only useful when training on CPU

Copyright © 2026, javapedia.net, all rights reserved.
