What are vanishing and exploding gradients, and what techniques are used to address them?
Vanishing gradients occur when gradients shrink exponentially as they are backpropagated through many layers. By the chain rule, the gradient at an early layer is a product with one factor per later layer. The product of many small numbers (e.g. sigmoid derivatives, which are at most 0.25) approaches zero, so early-layer weights barely update. Exploding gradients are the opposite: the product of many large factors grows exponentially, producing huge updates that destabilise training and can overflow to inf or NaN.
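A quick numeric check of these products, in plain Python. The sigmoid derivative σ(x)(1 − σ(x)) peaks at exactly 0.25 at x = 0, so even in the best case ten stacked sigmoids scale the gradient by at most 0.25¹⁰. Treating each layer as a single scalar factor is a simplification (real layers also multiply by weight matrices). The factor 1.5 used for the exploding case is an arbitrary illustrative value.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # largest at x = 0, where it equals 0.25

def chained_factor(per_layer: float, depth: int) -> float:
    """Product of `depth` identical per-layer derivative factors."""
    return per_layer ** depth

print(sigmoid_grad(0.0))         # 0.25
print(chained_factor(0.25, 10))  # about 9.5e-07: vanishing
print(chained_factor(1.5, 50))   # about 6.4e+08: exploding
```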
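Both problems worsen with depth. In RNNs they also worsen with sequence length, because unrolling through time acts like extra depth. The root cause is that backpropagation multiplies the gradient by each layer's Jacobian. For a plain MLP layer this Jacobian is W_lᵀ diag(σ′(z_l)), the transposed weight matrix combined with the activation derivatives at the pre-activation z_l. If these Jacobians consistently have singular values below 1, the gradient norm shrinks geometrically with depth. If they are consistently above 1, it grows geometrically. Several techniques address this: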
| Technique | Addresses | How it helps |
|---|---|---|
| ReLU / Leaky ReLU | Vanishing | Derivative is 1 for positive inputs, so the activation adds no shrinkage; Leaky ReLU also keeps a small non-zero slope for negative inputs |
| Batch Normalisation | Both | Normalises layer inputs; stabilises gradient magnitude |
| Residual connections (ResNet) | Vanishing | For y = x + F(x), ∂L/∂x = ∂L/∂y · (I + ∂F/∂x); the identity term passes the gradient back undiminished |
| Gradient clipping | Exploding | Caps gradient norm before the update step |
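| Careful weight init (Xavier/He) | Both | Scales initial weights so activation and gradient variance stay roughly constant across layers |
| LSTM/GRU gates | Vanishing (RNNs) | Additive, gated state updates (e.g. the LSTM cell state with its forget gate near 1) let gradients flow across many time steps |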
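The Xavier/He row can be checked with a small pure-Python simulation (no framework assumed). It pushes a random input through a bias-free ReLU MLP and records the mean square of each layer's pre-activations. With He initialisation, Var(W) = 2/fan_in, and the factor 2 offsets ReLU zeroing roughly half the units, so the signal scale holds steady. With Xavier's variance (1/fan_in for a square layer) the signal roughly halves at every layer. A deliberately oversized variance makes it roughly double. This is the singular-value argument in miniature: the weight scale sets each layer's gain. The width (256), depth (10) and seed are arbitrary choices, and the sketch only tracks the forward pass.

```python
import math
import random

def relu(v: list[float]) -> list[float]:
    return [max(0.0, a) for a in v]

def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    return [sum(w * a for w, a in zip(row, v)) for row in m]

def mean_square(v: list[float]) -> float:
    return sum(a * a for a in v) / len(v)

def forward_mean_squares(std: float, width: int = 256, depth: int = 10, seed: int = 0) -> list[float]:
    """Mean square of each layer's pre-activations in a bias-free ReLU MLP
    whose weights are drawn i.i.d. from N(0, std**2)."""
    rng = random.Random(seed)
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]  # input, mean square ~1
    history = []
    for _ in range(depth):
        w = [[rng.gauss(0.0, std) for _ in range(width)] for _ in range(width)]
        z = matvec(w, h)               # pre-activation of this layer
        history.append(mean_square(z))
        h = relu(z)                    # ReLU zeroes roughly half the units
    return history

fan_in = 256
xavier = forward_mean_squares(math.sqrt(1.0 / fan_in))   # Glorot, square layer: 2 / (fan_in + fan_out)
he = forward_mean_squares(math.sqrt(2.0 / fan_in))       # He: 2 / fan_in
too_big = forward_mean_squares(math.sqrt(4.0 / fan_in))  # deliberately 2x the He variance

for name, qs in (("xavier", xavier), ("he", he), ("too_big", too_big)):
    print(name, f"first={qs[0]:.3g}", f"last={qs[-1]:.3g}")
```

All three runs reuse the same seed, so they see the same random draws and differ only in weight scale. Across the ten layers, expect the Xavier value to fall by roughly 2⁹, the He value to stay of the same order, and the oversized one to grow by roughly 2⁹. Exact numbers depend on the seed.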
```python
import torch
import torch.nn as nn

# Gradient clipping: applied AFTER backward(), BEFORE optimizer.step()
model = nn.LSTM(input_size=10, hidden_size=128, num_layers=3, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 20, 10)  # (batch, seq_len, input_size)
output, _ = model(x)
loss = output.sum()

optimizer.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # clip!
optimizer.step()

# Residual connection in code:
class ResidualBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(),
            nn.Linear(dim, dim)
        )

    def forward(self, x):
        return x + self.net(x)  # gradient flows through x directly
```
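A scalar toy shows why the skip connection keeps the gradient alive. It is written in plain Python with no autograd. Each "layer" is tanh(w·x) with one hypothetical shared weight w = 0.1, and the chain rule multiplies one local derivative per layer. Without the skip connection every factor is at most w, so the product collapses. With y = x + tanh(w·x), each factor is d/dx[x + F(x)] = 1 + F′(x), and the 1 comes from the identity path. Depth, weight and starting point are arbitrary illustrative values.

```python
import math

def plain_input_grad(depth: int, w: float, x: float = 0.5) -> float:
    """d(output)/d(input) for the map x -> tanh(w * x) applied `depth` times."""
    grad = 1.0
    for _ in range(depth):
        pre = w * x
        grad *= w * (1.0 - math.tanh(pre) ** 2)        # local derivative of tanh(w * x)
        x = math.tanh(pre)
    return grad

def residual_input_grad(depth: int, w: float, x: float = 0.5) -> float:
    """d(output)/d(input) for the residual map x -> x + tanh(w * x) applied `depth` times."""
    grad = 1.0
    for _ in range(depth):
        pre = w * x
        grad *= 1.0 + w * (1.0 - math.tanh(pre) ** 2)  # the identity path contributes the 1
        x = x + math.tanh(pre)
    return grad

print(plain_input_grad(30, 0.1))     # on the order of 1e-30: vanished
print(residual_input_grad(30, 0.1))  # at least 1.0, since every factor is >= 1 when w > 0
```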
