Python / PyTorch Fundamentals Interview Questions

How do you debug a PyTorch training loop where the loss is not decreasing or is NaN?

Diagnosing a stuck or diverging training loop is one of the most valuable practical PyTorch skills. The shape of the loss curve and a few targeted checks usually reveal the root cause.

Common training failure modes
Symptom	Likely cause	Fix
Loss is NaN from step 1	Exploding gradients, bad data (inf/NaN inputs), lr too high	Check input data, add gradient clipping, lower lr
Loss never decreases	Vanishing gradients, lr too low, forgot optimizer.step()	Check gradient norms, raise lr, verify training loop order
Loss decreases then plateaus high	Model too small, lr too high for fine convergence	Increase capacity, add lr scheduler
Train loss low, val loss high	Overfitting	Add dropout, weight decay, more data, early stopping
Loss oscillates wildly	lr too high, batch size too small	Lower lr, increase batch size, use lr warmup
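
Two of the fixes in the table, gradient clipping and lr warmup, can be sketched as follows. This is a minimal illustration rather than a recommended recipe: the toy model, the deliberately oversized inputs, `warmup_steps = 10` and `max_grad_norm = 1.0` are all assumed values chosen for the example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: 10 input features, 5 classes
model = nn.Linear(10, 5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Linear warmup: scale the base lr from 1/warmup_steps up to 1.0, then hold it
warmup_steps = 10  # assumed value for illustration
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_steps)
)

max_grad_norm = 1.0  # assumed clipping threshold
X = torch.randn(32, 10) * 100.0  # deliberately large inputs to provoke big gradients
y = torch.randint(0, 5, (32,))

pre_clip_norms, post_clip_norms, lrs = [], [], []
for step in range(20):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    # clip_grad_norm_ rescales gradients in place and returns the norm before clipping
    pre = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    post = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))
    pre_clip_norms.append(pre.item())
    post_clip_norms.append(post.item())
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()

print(f"first-step grad norm before clipping: {pre_clip_norms[0]:.2f}")
print(f"largest grad norm after clipping:     {max(post_clip_norms):.4f}")
print(f"lr at step 0: {lrs[0]:.1e}, lr at step 19: {lrs[-1]:.1e}")
```

Logging the value returned by `clip_grad_norm_` is a cheap way to monitor gradient norms: if it sits far above the threshold on every step, clipping is masking a deeper problem such as unnormalized inputs.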


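import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data so the snippet runs end to end; swap in your own DataLoader
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 5, (256,)))
loader  = DataLoader(dataset, batch_size=32, shuffle=True)

model     = nn.Linear(10, 5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()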

for step, (X, y) in enumerate(loader):
    optimizer.zero_grad()
    logits = model(X)
    loss = criterion(logits, y)

    # ── Check 1: is the loss finite?
    if not torch.isfinite(loss):
        print(f"Step {step}: non-finite loss = {loss.item()}")
        print("Input contains NaN:", torch.isnan(X).any().item())
        print("Input contains Inf:", torch.isinf(X).any().item())
        break

    loss.backward()

    # ── Check 2: gradient norms — are gradients flowing at all?
    total_norm = sum(
        p.grad.norm().item() ** 2 for p in model.parameters() if p.grad is not None
    ) ** 0.5
    if step % 50 == 0:
        print(f"Step {step}: loss={loss.item():.4f} grad_norm={total_norm:.4f}")

    # ── Check 3: are any gradients None? (means that param was unused!)
    for name, p in model.named_parameters():
        if p.grad is None:
            print(f"WARNING: {name} has no gradient — is it used in forward()?")

    optimizer.step()

# ── Check 4: verify model output shape and range make sense
with torch.no_grad():
    sample_out = model(X[:1])
    print("Output shape:", tuple(sample_out.shape))  # expect (1, 5): one row of 5 class logits
    print("Output range:", sample_out.min().item(), sample_out.max().item())

# ── Check 5: overfit a tiny batch to sanity-check the architecture
# If the loss does not fall steadily toward zero on 5 examples, suspect a bug
tiny_X, tiny_y = X[:5], y[:5]
for _ in range(200):
    optimizer.zero_grad()
    loss = criterion(model(tiny_X), tiny_y)
    loss.backward()
    optimizer.step()
# Should trend toward 0; run more steps or raise lr if it is still falling
print(f"Tiny-batch overfit loss: {loss.item():.6f}")
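
When Check 1 fires but the inputs are clean, the next question is which layer first produces the non-finite value. One way to find out, sketched below, is to attach forward hooks to every submodule. The `UnsafeLog` layer and the small `nn.Sequential` model are hypothetical, built only to plant a numerical bug; `register_forward_hook` and `named_modules` are standard `nn.Module` methods.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class UnsafeLog(nn.Module):
    """Hypothetical layer with a numerical bug: log of a value that can be zero."""
    def forward(self, x):
        return torch.log(x)  # log(0) = -inf, log(negative) = NaN

# ReLU outputs exact zeros, so the UnsafeLog that follows produces -inf
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), UnsafeLog(), nn.Linear(4, 2))

bad_modules = []  # names of modules whose output contained NaN/Inf, in call order

def make_hook(name):
    def hook(module, inputs, output):
        if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
            bad_modules.append(name)
    return hook

# Register a hook on every submodule; the empty name is the root container
handles = [
    m.register_forward_hook(make_hook(name))
    for name, m in model.named_modules()
    if name
]

out = model(torch.randn(8, 4))

# Remove the hooks once the diagnosis is done so they do not slow training
for h in handles:
    h.remove()

print("First non-finite output at module:", bad_modules[0] if bad_modules else None)
```

The first entry in `bad_modules` is where the problem originates; later entries are usually just propagation. For NaNs that first appear in the backward pass, `torch.autograd.set_detect_anomaly(True)` is the built-in alternative, at a noticeable runtime cost.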

Quiz

If a training loss is NaN starting from the very first step, what should you check first?

✗ Whether the model has enough layers
✓ Whether the input data itself contains NaN or Inf values, and whether the learning rate is too high. These are among the most common immediate causes of NaN loss.
✗ Whether the batch size is too large
✗ Whether the validation set is correctly split

What does the 'overfit a tiny batch' sanity check (training on 5 examples until loss ≈ 0) verify?

✗ That the model will generalise well to the full dataset
✓ That the model architecture, loss function, and training loop are wired correctly. If a model cannot memorise even 5 examples, there is a bug somewhere in the implementation, not in the data or hyperparameters.
✗ That the learning rate is optimal for the full dataset
✗ That the model is not overfitting
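
The first quiz answer puts the input data at the top of the checklist. A small pre-training scan, sketched below, makes that check concrete. The helper name `scan_loader` and the planted defects are hypothetical; the label range check is an extra assumption added here because out-of-range class indices are another common data bug in classification setups.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def scan_loader(loader, num_classes):
    """Return a list of (batch_index, problem) tuples found in the loader."""
    problems = []
    for i, (X, y) in enumerate(loader):
        if torch.isnan(X).any():
            problems.append((i, "NaN in inputs"))
        if torch.isinf(X).any():
            problems.append((i, "Inf in inputs"))
        if (y < 0).any() or (y >= num_classes).any():
            problems.append((i, "label out of range"))
    return problems

# Hypothetical data with two planted defects
X = torch.randn(8, 10)
y = torch.randint(0, 5, (8,))
X[5, 3] = float("nan")  # falls in batch 1 with batch_size=4 and shuffle=False
y[2] = 7                # falls in batch 0; valid labels are 0..4

loader = DataLoader(TensorDataset(X, y), batch_size=4, shuffle=False)
problems = scan_loader(loader, num_classes=5)
print(problems)  # [(0, 'label out of range'), (1, 'NaN in inputs')]
```

Running a scan like this once before training is cheaper than discovering the bad batch hours into a run.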



About Javapedia.net: Javapedia.net is for Java and J2EE developers, technologists, and college students preparing for interviews. The site also includes many practical examples. It was developed using J2EE technologies by Steve Antony, a senior developer/lead at a logistics company.
Contact: javatutorials2016[at]gmail[dot]com
Copyright © 2026, javapedia.net, all rights reserved.