What is torch.compile and how does it speed up PyTorch model execution?
Introduced in PyTorch 2.0, torch.compile applies just-in-time (JIT) compilation to a PyTorch model or function. Rather than executing each operation eagerly (PyTorch's default), it captures the computation as a graph (via TorchDynamo), optimises it (fusing operations, eliminating redundant memory reads/writes), and compiles it to efficient code using a backend (TorchInductor by default, which generates Triton kernels for GPUs and C++ code for CPUs).
The primary benefit is kernel fusion: instead of launching a separate GPU kernel for each operation (e.g. separate kernels for a bias add, an activation, and a residual add), the compiler fuses them into a single kernel that reads its inputs from GPU memory once and writes the result once. Memory bandwidth is often the bottleneck for the elementwise and normalisation parts of transformer-style models, so reducing memory round-trips translates directly into throughput gains. Speedups of roughly 10–50% for training and inference on modern GPUs are commonly cited, but the actual figure depends heavily on the model, batch size, and hardware.
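To make the memory-traffic argument concrete, here is a rough back-of-the-envelope sketch. It is a simplified model, not a measurement: it assumes each kernel reads its input once and writes its output once, and it ignores caches and small extra operands. The helper name `elementwise_traffic_bytes` and its parameters are hypothetical, chosen only for this illustration.

```python
def elementwise_traffic_bytes(num_elements, num_ops, bytes_per_element=4, fused=False):
    """Estimate bytes moved to and from GPU memory by a chain of elementwise ops.

    Simplified model (an assumption): every kernel reads its input tensor once
    and writes its output tensor once. A fused chain runs as a single kernel.
    """
    kernels = 1 if fused else num_ops
    return kernels * 2 * num_elements * bytes_per_element


# A 256 x 1024 float32 activation passing through two elementwise ops
# (for example a bias add followed by an activation).
n = 256 * 1024
unfused = elementwise_traffic_bytes(n, num_ops=2)
fused = elementwise_traffic_bytes(n, num_ops=2, fused=True)
print(unfused, fused, unfused / fused)
```

Under this model, fusing two elementwise operations halves the traffic, and longer chains save proportionally more. Real gains are smaller or larger depending on how much of the runtime is bandwidth-bound.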
import torch
import torch.nn as nn
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.GELU(),
    nn.Linear(1024, 512), nn.GELU(),
    nn.Linear(512, 10),
)
# Move the model to the GPU, then wrap it. torch.compile itself returns quickly;
# the actual compilation happens on the first call (may take 30s+)
model = model.cuda()
compiled_model = torch.compile(model)
# Usage is identical to a regular model
x = torch.randn(256, 1024, device='cuda')
out = compiled_model(x)  # warm-up: triggers compilation
out = compiled_model(x)  # later calls with the same input shape reuse the compiled kernels
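# Compilation modes (trade off compile time against runtime speed)
model_default = torch.compile(model)  # mode='default': balanced compile time and speedup
model_reduce = torch.compile(model, mode='reduce-overhead')  # uses CUDA graphs to cut per-call launch overhead; helps small batches, may use more memory
model_max = torch.compile(model, mode='max-autotune')  # autotunes kernels: slowest to compile, usually fastest to run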
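# Measure speedup
import time
x = torch.randn(512, 1024, device='cuda')
for _ in range(5): model(x)  # warm-up
torch.cuda.synchronize()  # drain queued GPU work before starting the timer
t0 = time.perf_counter()
for _ in range(100): model(x)
torch.cuda.synchronize()  # wait for all kernels to finish before stopping the timer
print('Eager:', time.perf_counter() - t0)
for _ in range(5): compiled_model(x)  # warm-up (may recompile for the new batch size)
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100): compiled_model(x)
torch.cuda.synchronize()
print('Compiled:', time.perf_counter() - t0)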
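The eager and compiled measurements above follow the same warm-up, synchronise, time, synchronise pattern, so it can be factored into a small helper. The following is a minimal sketch; `time_calls` and its `sync` parameter are hypothetical names introduced here, and the default `sync` is a no-op so the helper also runs without a GPU.

```python
import time


def time_calls(fn, *args, warmup=5, iters=100, sync=lambda: None):
    """Return the average wall-clock seconds per call of fn(*args).

    `sync` is called before starting and before stopping the timer so that
    asynchronous work (such as queued GPU kernels) is included correctly.
    """
    for _ in range(warmup):
        fn(*args)  # warm-up calls are not timed
    sync()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    sync()
    return (time.perf_counter() - t0) / iters
```

With the objects defined earlier, a call such as `time_calls(compiled_model, x, sync=torch.cuda.synchronize)` would time the compiled model, and passing `model` instead would time eager execution.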
