Python / PyTorch Fundamentals Interview Questions

What is Batch Normalization in PyTorch and how does it differ from Layer Normalization?

Normalization layers stabilise training by re-centring and re-scaling activations. PyTorch provides several variants; Batch Normalization (BatchNorm) and Layer Normalization (LayerNorm) are the two most widely used, but they normalise over different dimensions and suit different architectures.

BatchNorm vs LayerNorm
Feature	BatchNorm (nn.BatchNorm1d/2d)	LayerNorm (nn.LayerNorm)
Normalises over	Batch dimension, plus spatial dimensions in BatchNorm2d (per-feature/channel statistics)	Feature dimension(s) given by normalized_shape (per-sample statistics)
Statistics at train	Computed from current mini-batch	Computed from current sample's features
Statistics at eval	Uses running mean/var accumulated during training	Always computed fresh from current input
Batch size dependency	Noisy with very small batches (< 8)	Independent of batch size — works with batch=1
Best for	CNNs (image models)	Transformers, RNNs, NLP models
Parameters	gamma (scale), beta (shift), one pair per feature/channel	gamma (scale), beta (shift), one pair per element of normalized_shape
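
To make the "normalises over" row concrete, the sketch below recomputes both layers by hand on a toy 2-D input and compares the result with the built-in modules. It assumes freshly constructed layers (gamma = 1, beta = 0) and BatchNorm in train mode; the tensor shape and the variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4)  # (batch=8, features=4)

# BatchNorm1d in train mode: one mean/variance per feature, taken over the batch (dim=0)
bn = nn.BatchNorm1d(4)
bn.train()
bn_manual = (x - x.mean(dim=0)) / torch.sqrt(x.var(dim=0, unbiased=False) + bn.eps)

# LayerNorm: one mean/variance per sample, taken over the features (dim=-1)
ln = nn.LayerNorm(4)
ln_manual = (x - x.mean(dim=-1, keepdim=True)) / torch.sqrt(
    x.var(dim=-1, unbiased=False, keepdim=True) + ln.eps
)

# Compare the hand-written versions with the library modules
bn_matches = torch.allclose(bn(x), bn_manual, atol=1e-5)
ln_matches = torch.allclose(ln(x), ln_manual, atol=1e-5)
print(bn_matches, ln_matches)
```

The only difference between the two hand-written versions is the axis the mean and variance are taken over: dim=0 for BatchNorm, dim=-1 for LayerNorm.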


import torch
import torch.nn as nn

# ── BatchNorm — for feedforward / CNN models
class BNModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(20, 64)
        self.bn1 = nn.BatchNorm1d(64)   # 64 features
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = torch.relu(self.bn1(self.fc1(x)))
        return self.fc2(x)

# BatchNorm behaves differently in train vs eval mode!
# train: normalise using batch mean/var, update running stats
# eval:  use accumulated running_mean / running_var
model = BNModel()
model.train()   # must be in train mode during training!

# ── LayerNorm — for transformers and sequence models
class LNModel(nn.Module):
    def __init__(self, d_model=64):
        super().__init__()
        self.fc1 = nn.Linear(20, d_model)
        self.ln1 = nn.LayerNorm(d_model)   # normalise over last dim
        self.fc2 = nn.Linear(d_model, 10)

    def forward(self, x):
        x = torch.relu(self.ln1(self.fc1(x)))
        return self.fc2(x)

# LayerNorm produces the SAME result at train and eval
ln_model = LNModel()
ln_model.train()
x = torch.randn(8, 20)
out_train = ln_model(x)
ln_model.eval()
out_eval  = ln_model(x)
print(torch.allclose(out_train, out_eval))  # True — LayerNorm is mode-independent!

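Common bug: forgetting to switch modes when a model contains BatchNorm. Call model.train() before training and model.eval() before validation or inference. A new nn.Module starts in train mode, so if model.eval() is never called, inference normalises with the current batch's statistics and predictions change with the composition of each batch. Conversely, if the model is left in eval mode during training, running_mean and running_var are never updated from their initial values (0 and 1), so the layer normalises with statistics that do not reflect the data.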
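
The mode dependence can be checked directly on a single BatchNorm1d layer. This is a minimal sketch under assumed toy data (the scale, shift, and variable names are illustrative): it records running_mean before and after a forward pass in each mode and compares the two outputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
x = torch.randn(16, 3) * 5 + 2   # toy batch with non-zero mean and large variance

# Freshly constructed: running_mean is all zeros, running_var is all ones
initial_mean = bn.running_mean.clone()

bn.train()
out_train = bn(x)                # uses batch statistics and updates the running stats
updated_mean = bn.running_mean.clone()

bn.eval()
with torch.no_grad():
    out_eval = bn(x)             # uses the running stats and does not update them
mean_after_eval = bn.running_mean.clone()

stats_updated_in_train = not torch.equal(initial_mean, updated_mean)
stats_frozen_in_eval = torch.equal(updated_mean, mean_after_eval)
outputs_differ = not torch.allclose(out_train, out_eval)
print(stats_updated_in_train, stats_frozen_in_eval, outputs_differ)
```

All three flags are expected to be True: the running statistics move only in train mode, and after a single training step they are still far from the batch statistics, so the same input produces different outputs in the two modes.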

Quiz

Why does BatchNorm produce different outputs depending on whether the model is in train() or eval() mode?

✗ eval() mode disables the learnable gamma and beta parameters
✓ In train mode, BatchNorm normalises using the current mini-batch's mean and variance; in eval mode, it uses the running_mean and running_var accumulated during training. Without calling model.eval(), inference uses noisy batch statistics rather than stable running statistics
✗ BatchNorm applies dropout in train mode but not eval mode
✗ eval() mode doubles the batch size internally to compute more stable statistics

For which type of architecture is LayerNorm preferred over BatchNorm, and why?

✗ CNNs: LayerNorm handles spatial dimensions better
✓ Transformers and sequence models: LayerNorm normalises per sample (independent of batch size), making it well suited to variable-length sequences and tasks where the batch size may be very small (e.g. language model fine-tuning)
✗ Generative models: LayerNorm generates better image quality
✗ Any model with more than 5 layers: BatchNorm is only stable in shallow networks
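
The batch-size point can be illustrated with a single sample. In the sketch below (shapes and names are illustrative), LayerNorm handles batch=1 without complaint, whereas BatchNorm1d in train mode is expected to reject it; in the PyTorch releases this was written against, that rejection is a ValueError, which is treated here as an assumption about the installed version.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64)           # a single sample with 64 features

ln = nn.LayerNorm(64)
ln_out = ln(x)                   # per-sample statistics, so batch=1 is fine

bn = nn.BatchNorm1d(64)
bn.train()
try:
    bn(x)                        # only one value per feature: no usable batch variance
    bn_train_failed = False
except ValueError:               # assumed exception type, see the note above
    bn_train_failed = True

bn.eval()
bn_eval_out = bn(x)              # eval mode uses running statistics, so batch=1 works
print(ln_out.shape, bn_train_failed, bn_eval_out.shape)
```

BatchNorm is therefore usable at inference time with one sample, but it cannot be trained on batches of one, which is one reason LayerNorm is the usual choice for small-batch fine-tuning.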



About Javapedia.net: Javapedia.net is for Java and J2EE developers, technologists, and college students preparing for interviews. The site also includes many practical examples. It is developed using J2EE technologies by Steve Antony, a senior developer/lead at a logistics company.
	contact: javatutorials2016[at]gmail[dot]com
	Copyright © 2026, javapedia.net, all rights reserved. privacy policy.