What is self-supervised learning and how do contrastive methods like SimCLR learn representations?

Self-supervised learning (SSL) is a form of unsupervised learning where the model is trained on a pretext task whose targets are derived entirely from the data itself, with no human-provided labels. The learned representations can then be transferred to downstream tasks with few or no labels, typically via a linear probe or fine-tuning.

Contrastive methods like SimCLR define a pretext task based on augmentation invariance: for each input, create two random augmented views (crops, colour jitter, flips) and train the model so that representations of the two views of the same image are similar (positive pair), while representations of views from different images are dissimilar (negative pairs). The NT-Xent loss (normalised temperature-scaled cross-entropy) implements this: for a batch of N images (2N views), each view must pick out its single positive from the other 2N-1 views, of which 2(N-1) are negatives.
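
Written out, the loss for one positive pair (i, j) can be sketched as below. The notation is an assumption added here for illustration: z_i is the projected representation of view i, sim is cosine similarity, and τ is the temperature.

```latex
\ell_{i,j} = -\log
  \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}
       {\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)},
\qquad
\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}
```

The batch loss is the average of this term over all 2N views, each paired with its positive. The sum in the denominator runs over the 2N-1 other views, so it is a softmax cross-entropy in which the positive is the correct class.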

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as T

# Augmentation pipeline: two random views of the same image
augment = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.ColorJitter(0.4, 0.4, 0.4, 0.1),
    T.RandomGrayscale(p=0.2),
    T.GaussianBlur(kernel_size=23),
    T.ToTensor(),
])

class SimCLRLoss(nn.Module):
    def __init__(self, temperature=0.07):
        super().__init__()
        self.tau = temperature

    def forward(self, z1, z2):
        # L2-normalise projections to unit sphere
        z1 = F.normalize(z1, dim=1)
        z2 = F.normalize(z2, dim=1)
        # All 2N representations as rows
        z  = torch.cat([z1, z2], dim=0)   # (2N, d)
        # Pairwise cosine similarities / temperature
        sim_matrix = z @ z.T / self.tau   # (2N, 2N)
        # Positive for row i is the other view of the same image: i+N or i-N
        n = z1.size(0)
        labels = torch.cat([torch.arange(n, 2*n), torch.arange(n)]).to(z.device)
        # Exclude self-similarity by setting the diagonal to -inf, which keeps
        # the column indices of sim_matrix aligned with `labels`
        self_mask = torch.eye(2*n, dtype=torch.bool, device=z.device)
        sim_matrix = sim_matrix.masked_fill(self_mask, float('-inf'))
        return F.cross_entropy(sim_matrix, labels)

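# After pretraining: linear evaluation
# Freeze the backbone and train only a linear head on the downstream task
from torchvision.models import resnet50

num_classes = 10                   # placeholder: set to the downstream task's class count
backbone = resnet50(weights=None)  # load your SSL-pretrained weights into this model
backbone.fc = nn.Identity()        # expose the 2048-d pooled features
for p in backbone.parameters():
    p.requires_grad = False
backbone.eval()                    # keep BatchNorm statistics fixed during the probe
linear_head = nn.Linear(2048, num_classes)
optimizer = torch.optim.Adam(linear_head.parameters(), lr=1e-3)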
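
To see how the pieces fit together, here is a minimal sketch of one contrastive pretraining step followed by a sanity check on the loss. Everything in it is illustrative rather than the SimCLR reference setup: `nt_xent` is a standalone helper written for this example, the encoder and projection head are tiny MLPs, additive noise stands in for real image augmentations, and `tau=0.5` is an arbitrary choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def nt_xent(z1, z2, tau=0.5):
    """NT-Xent over a batch of N positive pairs (2N views in total)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, d), unit norm
    sim = z @ z.T / tau                                  # scaled cosine similarities
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))      # a view is never its own candidate
    # Row i's positive is the other view of the same image: i+N or i-N.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)


torch.manual_seed(0)

# Hypothetical stand-ins: a tiny MLP encoder plus projection head, and
# additive noise in place of real image augmentations.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
projector = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 16))
params = list(encoder.parameters()) + list(projector.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)

x = torch.randn(8, 32)                                   # batch of N = 8 inputs
view1 = x + 0.1 * torch.randn_like(x)
view2 = x + 0.1 * torch.randn_like(x)

# One pretraining step: encode both views, project, compare, update.
loss = nt_xent(projector(encoder(view1)), projector(encoder(view2)))
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Sanity check on the loss itself: correctly matched pairs should score
# better (lower loss) than the same embeddings with the pairing shifted.
z = torch.randn(8, 16)
aligned = nt_xent(z, z + 0.01 * torch.randn_like(z))
shuffled = nt_xent(z, z.roll(shifts=1, dims=0))
print(f"step loss={loss.item():.3f}  aligned={aligned.item():.3f}  shuffled={shuffled.item():.3f}")
```

Comparing a correctly paired batch against a deliberately mispaired one is a cheap way to catch label-indexing mistakes in a contrastive loss before starting a long pretraining run. After pretraining, the projection head is usually discarded and the encoder output is what gets evaluated.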
