Python / PyTorch Fundamentals Interview Questions

What are Dataset and DataLoader in PyTorch and how do they work together?

PyTorch's data pipeline follows a clean two-class design: Dataset defines how to access a single sample (index → data), and DataLoader wraps a Dataset to handle batching, shuffling, and parallel loading.

import torch
from torch.utils.data import Dataset, DataLoader
import numpy as np

class TabularDataset(Dataset):
    def __init__(self, X: np.ndarray, y: np.ndarray):
        # Convert once at construction — not inside __getitem__!
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.long)

    def __len__(self) -> int:
        """Required — tells DataLoader how many samples exist."""
        return len(self.X)

    def __getitem__(self, idx: int):
        """Required — return a single (features, label) sample."""
        return self.X[idx], self.y[idx]

# Synthetic data
X = np.random.randn(1000, 20).astype(np.float32)
y = np.random.randint(0, 3, size=1000)

dataset = TabularDataset(X, y)
print(len(dataset))          # 1000
print(dataset[0])            # (tensor of 20 features, tensor scalar label)

loader = DataLoader(
    dataset,
    batch_size=32,
    shuffle=True,            # reshuffle every epoch; standard for training
    num_workers=4,           # worker subprocesses for parallel loading
    pin_memory=True,         # page-locked host memory; speeds up CPU→GPU copies
    drop_last=True,          # drop the incomplete final batch (1000 % 32 = 8 samples)
)

# Iterate over batches
for X_batch, y_batch in loader:
    print(X_batch.shape, y_batch.shape)  # torch.Size([32, 20]) torch.Size([32])
    break

# torchvision pre-built datasets
from torchvision import datasets, transforms
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=transform)
mnist_loader = DataLoader(mnist, batch_size=64, shuffle=True)
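
The num_workers and pin_memory settings used earlier matter most when batches are moved to a GPU. The following is a minimal sketch, not a definitive training loop: the function name iterate_once, the synthetic tensor sizes, and the worker count are illustrative assumptions. It enables pinned memory only when CUDA is available, and it keeps the worker-based loader behind an `if __name__ == "__main__":` guard, which platforms that start workers with the spawn method (Windows, macOS) require.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def iterate_once(num_workers: int = 0) -> int:
    """Run one pass over a synthetic dataset and return the number of samples seen."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Synthetic stand-in data (assumption): 256 samples, 20 features, 3 classes.
    data = TensorDataset(torch.randn(256, 20), torch.randint(0, 3, (256,)))

    loader = DataLoader(
        data,
        batch_size=32,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=(device.type == "cuda"),  # pinning only pays off with CUDA
    )

    seen = 0
    for X_batch, y_batch in loader:
        # non_blocking=True lets the copy overlap with compute when the
        # source tensor lives in pinned memory; on CPU it has no effect.
        X_batch = X_batch.to(device, non_blocking=True)
        y_batch = y_batch.to(device, non_blocking=True)
        seen += X_batch.shape[0]
    return seen


if __name__ == "__main__":
    # Worker subprocesses are started here, inside the guard.
    print(iterate_once(num_workers=2))
```

Setting num_workers=0 loads data in the main process, which is often the easiest configuration to debug before turning on parallel loading.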

Dataset and DataLoader responsibilities
Component	Responsibility	Required methods
Dataset	Defines how to access ONE sample by index	__len__, __getitem__
DataLoader	Batches samples, shuffles, parallelises loading	None to implement; wraps any Dataset object
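
The division of labour in the table can be sketched by hand. The snippet below is a simplified illustration of what a DataLoader does for a map-style dataset, not the library's actual implementation: it picks an index order, fetches one sample per index through __getitem__, and stacks the samples into a batch. SquaresDataset is a hypothetical toy class, and default_collate is assumed to be importable from torch.utils.data (true in recent PyTorch releases).

```python
import torch
from torch.utils.data import Dataset, default_collate


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i * i)."""

    def __len__(self) -> int:
        return 10

    def __getitem__(self, idx: int):
        return torch.tensor(float(idx)), torch.tensor(idx * idx)


dataset = SquaresDataset()
batch_size = 4

# Step 1: choose an index order (roughly what shuffle=True does each epoch).
indices = torch.randperm(len(dataset)).tolist()

batches = []
for start in range(0, len(indices), batch_size):
    chunk = indices[start:start + batch_size]
    # Step 2: the Dataset answers "give me sample i" for each index.
    samples = [dataset[i] for i in chunk]
    # Step 3: collate the list of (x, y) pairs into stacked batch tensors.
    batches.append(default_collate(samples))

for x_batch, y_batch in batches:
    print(x_batch.shape, y_batch.shape)  # batch sizes are 4, 4, then 2
```

The last batch holds only 2 samples because 10 is not a multiple of 4; this is the batch that drop_last=True would discard.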


Quiz

What two methods must a custom PyTorch Dataset class implement?

A. __init__ and __call__ (incorrect)
B. __len__ (number of samples) and __getitem__ (return one sample by index) (correct)
C. __iter__ and __next__ (incorrect)
D. load() and fetch() (incorrect)

Why is shuffle=True important when creating a DataLoader for training (but typically False for validation)?

A. Shuffling speeds up data loading (incorrect)
B. Without shuffling, the model sees the data in the same fixed order every epoch and can learn spurious patterns tied to that order rather than the underlying signal. Shuffling each epoch prevents this. Validation order doesn't affect learning, so shuffle is left False for reproducibility (correct)
C. Shuffling reduces GPU memory usage (incorrect)
D. DataLoader requires shuffle=True to support batching (incorrect)
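
The shuffle behaviour discussed in the quiz can be observed directly. This is a small sketch under stated assumptions: the ten-element TensorDataset, the helper epoch_order, and the seed value are illustrative and do not come from the original answer. It records the order in which samples arrive over one epoch for a non-shuffled validation loader and a shuffled training loader.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset whose samples are simply the integers 0..9.
data = TensorDataset(torch.arange(10))


def epoch_order(loader):
    """Return the sample values in the order the loader yields them."""
    return [int(x) for (batch,) in loader for x in batch]


# Validation: fixed order, identical on every pass.
val_loader = DataLoader(data, batch_size=4, shuffle=False)

# Training: a new permutation each epoch; the seeded generator makes
# the sequence of permutations repeatable across runs.
g = torch.Generator().manual_seed(0)
train_loader = DataLoader(data, batch_size=4, shuffle=True, generator=g)

val_order = epoch_order(val_loader)
first_epoch = epoch_order(train_loader)
second_epoch = epoch_order(train_loader)

print(val_order)     # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(first_epoch)   # some permutation of 0..9
print(second_epoch)  # usually a different permutation of 0..9
```

Every sample is still visited exactly once per epoch in both loaders; only the order changes.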

About Javapedia.net: Javapedia.net is for Java and J2EE developers, technologists, and college students preparing for interviews. The site also includes many practical examples. It was developed using J2EE technologies by Steve Antony, a senior developer/lead at a logistics company.
Contact: javatutorials2016[at]gmail[dot]com
Copyright © 2026, javapedia.net, all rights reserved.