How do RNNs work and why did LSTMs solve the long-range dependency problem?
A vanilla RNN processes a sequence step by step, maintaining a hidden state hₜ = tanh(Wₓxₜ + Wₕhₜ₋₁ + b) that acts as a compressed memory of everything seen so far. Because this single state is overwritten at every step, backpropagation through time (BPTT) multiplies the gradient at each step by the Jacobian ∂hₜ/∂hₜ₋₁ = diag(1 − hₜ²)·Wₕ. Over k steps the gradient therefore contains a product of k such factors. If the largest singular value of Wₕ is below 1, that product shrinks exponentially and the gradient vanishes (the tanh derivative, which is at most 1, only makes this worse); if it is above 1, the gradient can explode. In practice, vanilla RNNs struggle to learn dependencies longer than roughly 10–20 steps.
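To make the repeated multiplication concrete, here is a minimal sketch for a one-dimensional RNN, where the weight matrix reduces to a single number. The function name `scalar_rnn_gradient`, the weight 0.9, and the constant inputs are illustrative assumptions, not taken from any library or from a trained model.

```python
import math

def scalar_rnn_gradient(w_h, inputs, w_x=1.0):
    """Return d h_T / d h_0 for h_t = tanh(w_x * x_t + w_h * h_{t-1}), with h_0 = 0."""
    h = 0.0
    grad = 1.0  # running product of per-step derivatives (chain rule)
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        # d h_t / d h_{t-1} = w_h * tanh'(pre-activation) = w_h * (1 - h_t^2)
        grad *= w_h * (1.0 - h * h)
    return grad

inputs = [0.1] * 50  # hypothetical constant input sequence
grad_short = scalar_rnn_gradient(0.9, inputs[:5])
grad_long = scalar_rnn_gradient(0.9, inputs)
print(f"gradient through 5 steps:  {grad_short:.3e}")
print(f"gradient through 50 steps: {grad_long:.3e}")
```

Every factor in the product has magnitude below 0.9 here, so the gradient through 50 steps is bounded by 0.9⁵⁰ and is far smaller than the gradient through 5 steps.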
LSTMs introduce a separate cell state cₜ (the long-term memory) and three gates (forget, input, and output), each a sigmoid layer whose outputs lie in (0, 1). The forget gate fₜ = σ(Wf[hₜ₋₁, xₜ] + bf) decides what to erase from cₜ₋₁; the input gate iₜ decides how much of the candidate c̃ₜ = tanh(Wc[hₜ₋₁, xₜ] + bc) to write; the output gate oₜ controls what the hidden state hₜ = oₜ⊙tanh(cₜ) exposes. The key mathematical insight is that the cell state update is additive: cₜ = fₜ⊙cₜ₋₁ + iₜ⊙c̃ₜ. Along this direct path the gradient ∂cₜ/∂cₜ₋₁ is just fₜ, with no repeated multiplication by a weight matrix or squashing nonlinearity, so when the forget gate stays close to 1 the gradient flows across many steps nearly unchanged. This greatly mitigates the vanishing gradient problem over long sequences, although it does not eliminate it.
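The gate equations above can be sketched as a single hand-written LSTM step and checked against PyTorch's `nn.LSTMCell`. This assumes the documented PyTorch layout, in which the four gate blocks are stacked in the order input, forget, candidate, output inside `weight_ih` and `weight_hh`; the helper name `manual_lstm_step` and the tiny sizes are illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, batch = 4, 3, 2
cell = nn.LSTMCell(input_size, hidden_size)

x = torch.randn(batch, input_size)
h_prev = torch.randn(batch, hidden_size)
c_prev = torch.randn(batch, hidden_size)

def manual_lstm_step(cell, x, h_prev, c_prev):
    """One LSTM step written out gate by gate, reusing the cell's weights."""
    gates = (x @ cell.weight_ih.T + cell.bias_ih
             + h_prev @ cell.weight_hh.T + cell.bias_hh)
    # PyTorch stacks the gate blocks as: input, forget, candidate, output
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)              # candidate cell state
    c = f * c_prev + i * g         # additive cell-state update
    h = o * torch.tanh(c)          # hidden state exposed by the output gate
    return h, c

with torch.no_grad():
    h_manual, c_manual = manual_lstm_step(cell, x, h_prev, c_prev)
    h_ref, c_ref = cell(x, (h_prev, c_prev))

matches = (torch.allclose(h_manual, h_ref, atol=1e-5)
           and torch.allclose(c_manual, c_ref, atol=1e-5))
print("manual step matches nn.LSTMCell:", matches)
```

Note that PyTorch keeps separate input and recurrent weight matrices rather than one matrix applied to the concatenation [hₜ₋₁, xₜ]; the two formulations are mathematically equivalent.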
import torch
import torch.nn as nn
# LSTM usage in PyTorch
lstm = nn.LSTM(
    input_size=64,
    hidden_size=128,
    num_layers=2,         # stacked LSTM
    batch_first=True,     # input shape: (batch, seq, features)
    dropout=0.2,          # applied between stacked layers (training mode only)
    bidirectional=False,
)
x = torch.randn(32, 50, 64) # (batch=32, seq_len=50, input=64)
output, (h_n, c_n) = lstm(x)
print(output.shape) # (32, 50, 128) — all time-step hidden states
print(h_n.shape) # (2, 32, 128) — final hidden state, both layers
print(c_n.shape) # (2, 32, 128) — final cell state, both layers
# GRU: simplified LSTM with only 2 gates — often comparable quality
gru = nn.GRU(input_size=64, hidden_size=128, batch_first=True)
out_gru, h_gru = gru(x)
# For classification, use the top layer's hidden state at the last time step.
# For a unidirectional LSTM on unpadded input this equals h_n[-1].
last_h = output[:, -1, :] # (32, 128)
classifier = nn.Linear(128, 5)
logits = classifier(last_h)
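Taking `output[:, -1, :]` as the sequence summary is only valid in the unidirectional, unpadded case. The sketch below, using hypothetical tensor sizes, checks that it matches `h_n[-1]` there, and shows one common way to build the classifier features for a bidirectional LSTM, where the backward direction finishes at time step 0 rather than at the last step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 20, 64)  # (batch, seq_len, features), no padding

# Unidirectional: last time step of the output equals the top layer's final state
uni = nn.LSTM(64, 128, num_layers=2, batch_first=True)
out_u, (h_u, _) = uni(x)
same_uni = torch.allclose(out_u[:, -1, :], h_u[-1], atol=1e-6)

# Bidirectional: h_n holds (layer, direction) pairs; the last two entries are
# the top layer's forward and backward final states
bi = nn.LSTM(64, 128, num_layers=2, batch_first=True, bidirectional=True)
out_b, (h_b, _) = bi(x)
fwd_final, bwd_final = h_b[-2], h_b[-1]
features = torch.cat([fwd_final, bwd_final], dim=1)  # (8, 256)

# The forward state sits at the last time step, the backward state at the first
fwd_matches = torch.allclose(out_b[:, -1, :128], fwd_final, atol=1e-6)
bwd_matches = torch.allclose(out_b[:, 0, 128:], bwd_final, atol=1e-6)
print(same_uni, fwd_matches, bwd_matches, tuple(features.shape))
```

With padded batches, the last time step may be padding, so the final states in `h_n` (after packing the sequences) are generally the safer choice.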
