What is mixed precision training and how does it speed up deep learning with torch.cuda.amp?
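Modern NVIDIA GPUs have Tensor Cores that execute 16-bit floating-point matrix math much faster than FP32: FP16 since Volta, and BFloat16 (BF16) since Ampere. For matrix multiplications this can be 2–8× faster than FP32. Mixed precision training runs the forward and backward passes in FP16 (or BF16) where it is safe, while keeping a master copy of the weights in FP32 so that small optimizer updates are not rounded away. In PyTorch AMP the parameters themselves stay in FP32: autocast casts them to 16-bit on the fly for eligible ops such as matmuls and convolutions, and keeps precision-sensitive ops (reductions, softmax, most losses) in FP32.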
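To see why the FP32 master weights matter, the sketch below emulates FP16 and FP32 rounding with Python's standard `struct` module (format codes `e` and `f`) instead of a GPU. It applies the same hypothetical update of 1e-4 to a weight of 1.0 one hundred times. FP16's spacing near 1.0 is 2⁻¹⁰ ≈ 0.00098, so each update is rounded away, while the FP32 copy accumulates it.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    """Round a Python float to the nearest single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

update = 1e-4        # hypothetical per-step update (lr * grad)
w16 = to_fp16(1.0)   # weight stored only in FP16
w32 = to_fp32(1.0)   # FP32 master copy of the same weight

for _ in range(100):
    w16 = to_fp16(w16 + update)  # 1.0001 rounds back to 1.0 in FP16
    w32 = to_fp32(w32 + update)  # FP32 spacing near 1.0 is ~1.2e-7, so the update survives

print(w16)            # 1.0
print(round(w32, 4))  # 1.01
```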
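Loss scaling addresses a key challenge: FP16's limited dynamic range (the smallest positive subnormal is ≈ 6×10⁻⁸ and the smallest normal value ≈ 6.1×10⁻⁵) can cause small gradient values to underflow to zero. The scaler multiplies the loss by a large factor S before the backward pass; by the chain rule every gradient is then also multiplied by S, which lifts small gradients into FP16's representable range. The gradients are divided by S again before the optimizer step. PyTorch's GradScaler automates this and adjusts S dynamically: by default it starts at 2¹⁶, halves S and skips the optimizer step whenever the gradients contain inf/NaN, and doubles S after 2000 consecutive steps without overflow.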
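Both the underflow and the dynamic scale adjustment can be sketched without a GPU by emulating FP16 with `struct`. `ToyLossScaler` is a hypothetical, simplified stand-in for GradScaler's update rule that reuses GradScaler's default constants; it is not PyTorch's implementation, and multiplying each gradient by the scale stands in for a scaled backward pass.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round to the nearest FP16 value; values beyond FP16's range become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # struct refuses to pack finite values too large for FP16
        return math.copysign(math.inf, x)

# 1) Underflow and static loss scaling
grad = 1e-8                     # hypothetical tiny gradient
print(to_fp16(grad))            # 0.0 -> the gradient is lost in FP16
scale = 2.0 ** 16
scaled = to_fp16(grad * scale)  # ~6.55e-4, inside FP16's normal range
recovered = scaled / scale      # unscale in higher precision: close to 1e-8 again

# 2) Dynamic scaling, loosely modeled on GradScaler's skip/backoff/growth behavior
class ToyLossScaler:
    """Hypothetical, simplified loss scaler (not PyTorch's implementation)."""

    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def step(self, scaled_grads):
        """Return unscaled gradients, or None if this step must be skipped."""
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            self.scale *= self.backoff_factor  # overflow: shrink the scale, skip the update
            self._clean_steps = 0
            return None
        unscaled = [g / self.scale for g in scaled_grads]  # divide by the scale used in backward
        self._clean_steps += 1
        if self._clean_steps == self.growth_interval:
            self.scale *= self.growth_factor  # no overflow for a while: try a larger scale
            self._clean_steps = 0
        return unscaled

scaler = ToyLossScaler(growth_interval=3)  # tiny interval just for the demo
true_grads = [1e-8, 3.0]                   # hypothetical raw gradients
results, scales = [], []
for _ in range(6):
    scaled_grads = [to_fp16(g * scaler.scale) for g in true_grads]  # stand-in for scaled backward
    results.append(scaler.step(scaled_grads))
    scales.append(scaler.scale)

print(scales)  # [32768.0, 16384.0, 16384.0, 16384.0, 32768.0, 16384.0]
```

Because 3.0 × 2¹⁶ and 3.0 × 2¹⁵ both exceed FP16's maximum of 65504, the first two steps are skipped and the scale backs off to 2¹⁴; after three clean steps it grows back to 2¹⁵, overflows again, and backs off. With the default growth interval of 2000, this probing for a larger scale happens far less often.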
```python
import torch
import torch.nn as nn
# PyTorch >= 2.3; torch.cuda.amp.autocast() has no device_type argument
from torch.amp import autocast, GradScaler

model = nn.Linear(1024, 512).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = GradScaler("cuda")  # manages loss scaling automatically
loss_fn = nn.MSELoss()

x = torch.randn(256, 1024, device="cuda")
y = torch.randn(256, 512, device="cuda")

for step in range(100):
    optimizer.zero_grad()
    # autocast runs eligible ops (e.g. matmul) in FP16; precision-sensitive ops stay in FP32
    with autocast(device_type="cuda", dtype=torch.float16):
        y_hat = model(x)          # FP16 matrix multiply
        loss = loss_fn(y_hat, y)  # mse_loss is autocast to FP32
    # Scale loss -> backward -> unscale gradients -> clip -> optimizer step
    scaler.scale(loss).backward()  # inflate loss to prevent gradient underflow
    scaler.unscale_(optimizer)     # divide gradients by the scale in place
    nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip true (unscaled) magnitudes
    scaler.step(optimizer)  # skips optimizer.step() if gradients contain inf/NaN
    scaler.update()         # adjust scale factor for next step

# BFloat16: supported on Ampere (A100, RTX 30xx) and newer GPUs
# - Same 8-bit exponent as FP32, so gradients rarely underflow -> GradScaler usually unnecessary
# - Less precision: 7 explicit mantissa bits vs 10 for FP16
for step in range(100):
    optimizer.zero_grad()
    with autocast(device_type="cuda", dtype=torch.bfloat16):
        y_hat = model(x)  # BF16 matrix multiply
        loss = loss_fn(y_hat, y)
    loss.backward()   # no scaler needed with BF16
    optimizer.step()
```
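The BF16 trade-off (FP32's exponent range, fewer mantissa bits) can be checked with a small CPU emulation. `to_bf16` is a hypothetical helper that keeps the top 16 bits of the FP32 bit pattern with round-to-nearest-even, which mirrors how the BF16 format is laid out relative to FP32; NaN and Inf handling is omitted for brevity.

```python
import struct

def to_fp16(x: float) -> float:
    """Round to IEEE 754 half precision (5 exponent bits, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    """Round to bfloat16 by keeping the top 16 bits of the FP32 pattern
    (round-to-nearest-even; NaN/Inf handling omitted)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)  # ties round to an even low bit
    bits = (bits + rounding_bias) & 0xFFFF0000   # drop the low 16 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

tiny_grad = 1e-8
print(to_fp16(tiny_grad))  # 0.0 -> underflows in FP16
print(to_bf16(tiny_grad))  # nonzero, close to 1e-8 (FP32-like range)
print(to_fp16(1.001))      # 1.0009765625 (FP16 spacing near 1.0 is 2**-10)
print(to_bf16(1.001))      # 1.0 (BF16 spacing near 1.0 is 2**-7)
```

Range is what makes BF16 safe without a scaler; the coarser spacing is why FP32 master weights are still kept for the optimizer update.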
