What is the self-attention mechanism in Transformers and why did it replace RNNs for sequence modeling?
Self-attention computes, for every position, a weighted sum of value vectors taken from all positions in the sequence, where the weight between positions i and j reflects how much position i should 'attend to' position j. Concretely, the input vectors are linearly projected into queries (Q), keys (K), and values (V), and the attention output is: Attention(Q, K, V) = softmax(QKᵀ/√dₖ) · V. The division by √dₖ matters because the magnitude of the dot products grows with the key dimension dₖ; without it, large scores would push softmax into saturation, where nearly all the weight lands on one position and gradients become very small.
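The effect of the √dₖ scaling can be checked numerically. The sketch below is only an illustration under a simplifying assumption: query and key components are drawn as independent standard normal values, in which case the dot product q·k has variance dₖ and the scaled version has variance close to 1. The helper name dot_product_variance is made up for this example.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so repeated runs give the same numbers


def dot_product_variance(d_k, n_samples=2000, scale=False):
    """Sample variance of q.k for vectors with i.i.d. standard normal entries."""
    dots = []
    for _ in range(n_samples):
        q = [random.gauss(0.0, 1.0) for _ in range(d_k)]
        k = [random.gauss(0.0, 1.0) for _ in range(d_k)]
        dot = sum(qi * ki for qi, ki in zip(q, k))
        # Optionally apply the 1/sqrt(d_k) factor used in scaled dot-product attention.
        dots.append(dot / math.sqrt(d_k) if scale else dot)
    return statistics.variance(dots)


raw_var = dot_product_variance(64)
scaled_var = dot_product_variance(64, scale=True)
# Expect roughly 64 and roughly 1; the exact values depend on the random samples.
print(round(raw_var, 1), round(scaled_var, 2))
```

Real queries and keys are learned projections, not independent Gaussian draws, so this shows the trend that motivates the scaling, not the exact statistics inside a trained model.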
Multi-head attention runs H attention heads in parallel, each with its own Q/K/V projections, then concatenates their outputs and applies a final linear projection. Each head can therefore learn to attend to a different type of relationship. The critical advantage over RNNs: within a single layer, self-attention connects any two positions in O(1) sequential operations regardless of their distance, while an RNN needs O(n) sequential steps to carry information between positions n apart. Because no step depends on the result of a previous time step, all positions can be processed in parallel during training, which is what made training on much larger datasets practical. The trade-off is that attention compares every pair of positions, so its cost per layer grows as O(n²) in the sequence length.
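The difference in sequential steps can be made concrete with a small sketch. It is a simplified illustration, not a benchmark: the attention part skips the learned Q/K/V projections and uses the raw inputs for all three roles, and the sizes are arbitrary.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, d_model = 6, 8
x = torch.randn(seq_len, d_model)  # one sequence, no batch dimension

# RNN: h_t depends on h_{t-1}, so the time steps must run one after another.
cell = nn.RNNCell(d_model, d_model)
h = torch.zeros(1, d_model)
sequential_steps = 0
for t in range(seq_len):
    h = cell(x[t].unsqueeze(0), h)
    sequential_steps += 1

# Self-attention: a single matrix product scores every pair of positions at once.
scores = (x @ x.T) / math.sqrt(d_model)  # (seq_len, seq_len)
weights = torch.softmax(scores, dim=-1)
out = weights @ x                        # (seq_len, d_model)

print(sequential_steps)  # 6
print(weights.shape)     # torch.Size([6, 6])
```

The loop length grows with the sequence, while the attention path stays a fixed number of matrix operations whose (seq_len, seq_len) score matrix is where the quadratic cost shows up.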
import torch
import torch.nn as nn
import math
class ScaledDotProductAttention(nn.Module):
    def forward(self, Q, K, V, mask=None):
        d_k = Q.shape[-1]
        # Pairwise similarity between queries and keys, scaled by sqrt(d_k)
        scores = (Q @ K.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            # Positions where mask == 0 get zero weight after softmax
            scores = scores.masked_fill(mask == 0, float('-inf'))
        weights = torch.softmax(scores, dim=-1)
        return weights @ V, weights
# PyTorch's built-in multi-head attention
mha = nn.MultiheadAttention(
    embed_dim=512,
    num_heads=8,  # 8 heads, each with dim=64
    dropout=0.1,
    batch_first=True
)
seq_len, batch, d_model = 20, 4, 512
x = torch.randn(batch, seq_len, d_model)
out, attn_weights = mha(x, x, x)  # Q=K=V=x for self-attention
print(out.shape)           # (4, 20, 512)
print(attn_weights.shape)  # (4, 20, 20): weight of each position pair, averaged over heads by default
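The built-in module hides how the heads are split and merged. A minimal from-scratch sketch of the "project, split into H heads, attend, concatenate, project" steps is given below. The class name MultiHeadSelfAttention and the fused qkv_proj layer are choices made for this illustration; this is not how nn.MultiheadAttention is implemented internally, and dropout is omitted.

```python
import math
import torch
import torch.nn as nn


class MultiHeadSelfAttention(nn.Module):
    """Illustrative multi-head self-attention (hypothetical class for this example)."""

    def __init__(self, embed_dim, num_heads):
        super().__init__()
        if embed_dim % num_heads != 0:
            raise ValueError("embed_dim must be divisible by num_heads")
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One fused projection produces Q, K and V for all heads at once.
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x, mask=None):
        batch, seq_len, embed_dim = x.shape
        qkv = self.qkv_proj(x)  # (batch, seq_len, 3 * embed_dim)
        qkv = qkv.view(batch, seq_len, 3, self.num_heads, self.head_dim)
        # Each of q, k, v: (batch, heads, seq_len, head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4).unbind(0)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        if mask is not None:
            # mask must broadcast to (batch, heads, seq_len, seq_len)
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)
        context = weights @ v  # (batch, heads, seq_len, head_dim)
        # Concatenate the heads back into a single embed_dim vector per position.
        context = context.transpose(1, 2).reshape(batch, seq_len, embed_dim)
        return self.out_proj(context), weights


torch.manual_seed(0)
x = torch.randn(4, 20, 512)
attn = MultiHeadSelfAttention(embed_dim=512, num_heads=8)
causal_mask = torch.tril(torch.ones(20, 20))  # position i may attend only to j <= i
out, weights = attn(x, mask=causal_mask)
print(out.shape)      # torch.Size([4, 20, 512])
print(weights.shape)  # torch.Size([4, 8, 20, 20])
```

Unlike the built-in call, this sketch returns the per-head weights without averaging, which makes it easy to inspect what each head attends to. The causal mask in the usage lines is the kind used in decoder-style models; for encoder-style self-attention the mask argument can be left as None.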
