Why do Transformers need positional encodings and how does sinusoidal encoding work?
Self-attention is permutation equivariant — swapping two positions in the input produces the same output with those two positions swapped, because attention treats all positions symmetrically. Without positional information, a transformer cannot distinguish 'The dog bit the man' from 'The man bit the dog'. Positional encodings inject sequence order information into the token embeddings before they enter the transformer.
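The equivariance claim can be checked with a toy example. The sketch below is plain Python rather than PyTorch so the arithmetic stays visible, and it assumes identity Q/K/V projections for brevity; the function name `self_attention` and the toy vectors are illustrative only. Learned projections act on each token separately, so they would not change the conclusion.

```python
import math

def self_attention(xs):
    """Single-head self-attention over a list of vectors.

    Assumption for brevity: the Q, K and V projections are the identity,
    so each token vector serves as its own query, key and value.
    """
    d = len(xs[0])
    outputs = []
    for q in xs:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in xs]
        # Numerically stable softmax over the scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, xs)) for j in range(d)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy "embeddings"
swapped = [tokens[2], tokens[1], tokens[0]]    # swap positions 0 and 2

out = self_attention(tokens)
out_swapped = self_attention(swapped)

# Swapping inputs 0 and 2 only swaps outputs 0 and 2; nothing else changes.
same = all(
    math.isclose(a, b, abs_tol=1e-9)
    for row_a, row_b in zip(out_swapped, [out[2], out[1], out[0]])
    for a, b in zip(row_a, row_b)
)
print(same)
```

Because each output row depends only on the set of input vectors and not on where they sit, the model has no way to recover word order unless position is added to the inputs.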
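The original 'Attention Is All You Need' paper uses sinusoidal encodings: PE(pos, 2i) = sin(pos / 10000^{2i/d}) and PE(pos, 2i+1) = cos(pos / 10000^{2i/d}), where pos is the position, d is the model dimension, and i indexes the sin/cos pairs (0 ≤ i < d/2). Each pair oscillates at a different frequency, with wavelengths growing geometrically from 2π up to roughly 10000·2π, so together they give every position a distinctive fingerprint. The key properties: (1) each position has a unique encoding; (2) for a fixed offset k, the encoding for position pos+k is a linear function (a rotation of each sin/cos pair) of the encoding for position pos, which makes it easy for the model to attend by relative distance; (3) the encoding can be computed for any position, so it may let the model extrapolate to sequences longer than those seen during training, although the paper presents this as a hypothesis rather than a guarantee.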
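Property (2) follows from the angle-addition identities: for a pair with frequency w, sin(w(pos+k)) = sin(w·pos)cos(w·k) + cos(w·pos)sin(w·k) and cos(w(pos+k)) = cos(w·pos)cos(w·k) − sin(w·pos)sin(w·k), so the shift is a 2x2 rotation whose entries depend only on k. A minimal numerical check in plain Python, assuming an even d_model (the helper names `encoding` and `shift` are illustrative, not from the paper):

```python
import math

def encoding(pos, d_model):
    """Sinusoidal encoding of one position as a list of length d_model (d_model even)."""
    pe = []
    for i in range(d_model // 2):
        freq = 1.0 / (10000 ** (2 * i / d_model))
        pe += [math.sin(pos * freq), math.cos(pos * freq)]
    return pe

def shift(pe, k, d_model):
    """Map PE(pos) to PE(pos + k) using only k: one 2x2 rotation per sin/cos pair."""
    shifted = []
    for i in range(d_model // 2):
        freq = 1.0 / (10000 ** (2 * i / d_model))
        s, c = pe[2 * i], pe[2 * i + 1]
        cos_k, sin_k = math.cos(k * freq), math.sin(k * freq)
        # Angle-addition identities written as a linear map on (sin, cos).
        shifted += [s * cos_k + c * sin_k, c * cos_k - s * sin_k]
    return shifted

d_model, pos, k = 8, 5, 3
direct = encoding(pos + k, d_model)
via_shift = shift(encoding(pos, d_model), k, d_model)

matches = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(direct, via_shift))
print(matches)
```

Note that `shift` never looks at `pos`: the same linear map moves any position forward by k, which is what makes relative offsets easy to express.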
import math

import torch
import torch.nn as nn


def sinusoidal_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) tensor of sinusoidal position encodings."""
    pe = torch.zeros(max_len, d_model)
    position = torch.arange(0, max_len).unsqueeze(1).float()  # (max_len, 1)
    # 1 / 10000^(2i/d_model) for each sin/cos pair, computed in log space
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * -(math.log(10000.0) / d_model)
    )
    pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sin
    # odd dims: cos (the slice keeps shapes aligned when d_model is odd)
    pe[:, 1::2] = torch.cos(position * div_term[: d_model // 2])
    return pe  # (max_len, d_model)


class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=512, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        pe = sinusoidal_positional_encoding(max_len, d_model)
        self.register_buffer('pe', pe)  # not a parameter; saved with model

    def forward(self, x):  # x: (batch, seq_len, d_model); seq_len <= max_len
        x = x + self.pe[:x.size(1)]  # add pos encoding to each embedding
        return self.dropout(x)

# Modern alternative: Rotary Position Embeddings (RoPE)
# Used in LLaMA and Mistral; attention scores depend on relative rather than absolute position
# Applied to the query and key vectors in each attention layer, before the dot product
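The relative-position behaviour of RoPE can be illustrated with a simplified sketch. It is plain Python, not the implementation used by LLaMA or Mistral: it rotates adjacent dimension pairs (one common convention; some libraries pair dimensions differently), and the function names `rope` and `dot` and the example vectors are illustrative assumptions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each adjacent pair (vec[2i], vec[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(vec)  # assumed even
    out = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        x, y = vec[2 * i], vec[2 * i + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.3, -1.2, 0.7, 0.5]   # toy query vector
k = [1.1, 0.4, -0.6, 0.9]   # toy key vector

# Same offset (3) at two different absolute positions.
score_near = dot(rope(q, 5), rope(k, 2))
score_far = dot(rope(q, 105), rope(k, 102))

same_score = math.isclose(score_near, score_far, abs_tol=1e-9)
print(same_score)
```

Each vector is rotated by its own absolute position, but the dot product of two rotated vectors depends only on the difference between their rotation angles, so the attention score is a function of the offset between query and key.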
