How does PyTorch's Dataset and DataLoader pipeline work, and what are the key performance considerations?
PyTorch's data loading follows a clean two-class design: Dataset encapsulates how to access a single sample (index → (X, y)), and DataLoader wraps a Dataset to handle batching, shuffling, and parallel data loading. Separating these responsibilities makes it easy to write dataset-specific logic once and reuse the same efficient loading infrastructure.
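The division of labour can be illustrated without PyTorch at all. The sketch below is a simplified pure-Python stand-in, not PyTorch's actual implementation: `ToyDataset`, `toy_collate`, and `toy_loader` are hypothetical names invented for this example, and the real DataLoader adds samplers, tensor collation, and worker processes on top of the same idea.

```python
import random
from typing import Iterator, List, Optional, Sequence, Tuple


class ToyDataset:
    """Map-style dataset: knows its length and how to fetch ONE sample."""

    def __init__(self, xs: Sequence[List[float]], ys: Sequence[int]):
        assert len(xs) == len(ys), "features and labels must align"
        self.xs = xs
        self.ys = ys

    def __len__(self) -> int:
        return len(self.xs)

    def __getitem__(self, idx: int) -> Tuple[List[float], int]:
        return self.xs[idx], self.ys[idx]  # index -> (x, y)


def toy_collate(samples: List[Tuple[List[float], int]]):
    """Turn a list of (x, y) samples into (list of x, list of y)."""
    xs, ys = zip(*samples)
    return list(xs), list(ys)


def toy_loader(
    dataset: ToyDataset,
    batch_size: int,
    shuffle: bool = False,
    drop_last: bool = False,
    seed: Optional[int] = None,
) -> Iterator[Tuple[List[List[float]], List[int]]]:
    """Loader side: choose indices, group them into batches, fetch, collate."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # new order on each call
    for start in range(0, len(indices), batch_size):
        batch_idx = indices[start:start + batch_size]
        if drop_last and len(batch_idx) < batch_size:
            break  # skip the incomplete final batch
        yield toy_collate([dataset[i] for i in batch_idx])


# Usage: 10 samples in batches of 4
data = ToyDataset([[float(i)] for i in range(10)], list(range(10)))
for xb, yb in toy_loader(data, batch_size=4):
    print(len(xb), yb)
```

The dataset never needs to know about batch sizes or shuffling, and the loader never needs to know where a sample comes from, which is why the same loading code can be reused across datasets.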
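The most critical performance consideration is that the data loading pipeline must keep the GPU continuously fed: the GPU should never sit idle waiting for the next batch. Key knobs: num_workers launches worker subprocesses that prepare batches in parallel with GPU computation (0, the default, loads everything in the main process); pin_memory=True places batch tensors in pinned (page-locked) CPU memory, which speeds up CPU→GPU transfers via DMA and is what allows .cuda(non_blocking=True) to overlap the copy with computation; prefetch_factor sets how many batches each worker loads ahead (2 by default, and only applicable when num_workers > 0); persistent_workers=True keeps the workers alive between epochs instead of respawning them.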
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class TabularDataset(Dataset):
    def __init__(self, X: np.ndarray, y: np.ndarray):
        # Convert to tensors once at construction (not per __getitem__)
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self):
        # Required: the sampler uses len(dataset) to know which indices exist
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]  # single sample


# Placeholder data for illustration only (assumed: 10,000 rows, 20 features, 3 classes)
X_train = np.random.rand(10_000, 20)
y_train = np.random.randint(0, 3, size=10_000)

dataset = TabularDataset(X_train, y_train)
# Note: on platforms that spawn worker processes (Windows, macOS), run the code
# below inside an `if __name__ == "__main__":` guard when num_workers > 0.
loader = DataLoader(
    dataset,
    batch_size=256,
    shuffle=True,             # reshuffle at the start of each epoch
    num_workers=4,            # parallel data loading in 4 subprocesses
    pin_memory=True,          # faster CPU->GPU transfer
    drop_last=True,           # drop incomplete final batch
    persistent_workers=True,  # keep workers alive between epochs
)

# Training loop (requires a CUDA-capable GPU)
for X_batch, y_batch in loader:
    X_batch = X_batch.cuda(non_blocking=True)  # async copy from pinned memory
    y_batch = y_batch.cuda(non_blocking=True)
    # ... forward, backward, step
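To check whether the loader is actually keeping up, one option is to time how long each iteration waits for the next batch versus how long the training step takes. The helper below is a minimal sketch under stated assumptions: `time_data_wait` and `step_fn` are hypothetical names, not part of PyTorch, and the function accepts any iterable, so a DataLoader can be passed in directly.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def time_data_wait(
    loader: Iterable[Any], step_fn: Callable[[Any], None]
) -> Tuple[float, float]:
    """Return (seconds spent waiting for batches, seconds spent in step_fn).

    Assumption: step_fn blocks until its work is done. CUDA kernels run
    asynchronously, so for GPU training step_fn should end with
    torch.cuda.synchronize(), otherwise the step time is under-reported.
    """
    wait_s = 0.0
    step_s = 0.0
    iterator = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(iterator)  # blocks if no batch is ready yet
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        t2 = time.perf_counter()
        wait_s += t1 - t0
        step_s += t2 - t1
    return wait_s, step_s


# Usage with a deliberately slow stand-in for a loader
def slow_batches(n: int, delay: float):
    for i in range(n):
        time.sleep(delay)  # simulates expensive loading or decoding
        yield i


wait_s, step_s = time_data_wait(slow_batches(3, 0.02), lambda batch: None)
print(f"waiting on data: {wait_s:.3f}s, training step: {step_s:.3f}s")
```

If the wait time is a large share of the total, the pipeline is input-bound, and raising num_workers or prefetch_factor, or moving work out of `__getitem__`, is worth trying. If it is close to zero, the loader is already keeping pace and further tuning will not help.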
