What are the most common activation functions and why did ReLU replace sigmoid/tanh as the default?
Activation functions introduce non-linearity — without them, stacking linear layers would collapse into a single linear transformation. Several families exist, each with different mathematical properties that affect training dynamics.
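The collapse of stacked linear layers can be checked numerically. The following is a minimal sketch, assuming PyTorch is installed; the layer sizes, the seed, and the names `layer1`, `layer2`, and `merged` are illustrative choices, not part of the original answer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible random weights

# Two stacked linear layers with no activation in between (sizes are arbitrary).
layer1 = nn.Linear(4, 8).double()
layer2 = nn.Linear(8, 3).double()

# Equivalent single layer: W = W2 @ W1 and b = W2 @ b1 + b2.
merged = nn.Linear(4, 3).double()
with torch.no_grad():
    merged.weight.copy_(layer2.weight @ layer1.weight)
    merged.bias.copy_(layer2.weight @ layer1.bias + layer2.bias)

x = torch.randn(5, 4, dtype=torch.float64)
stacked_out = layer2(layer1(x))
merged_out = merged(x)
print(torch.allclose(stacked_out, merged_out))  # the two-layer stack equals one linear layer

# With a ReLU between the layers, the single merged layer no longer reproduces the output.
nonlinear_out = layer2(torch.relu(layer1(x)))
print(torch.allclose(nonlinear_out, merged_out))
```

Double precision is used only so the comparison is not blurred by float32 rounding.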
| Function | Formula | Range | Key property |
|---|---|---|---|
| Sigmoid | 1/(1+e⁻ˣ) | (0, 1) | Saturates for large \|x\|; causes vanishing gradients |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | (-1, 1) | Zero-centred; still saturates |
| ReLU | max(0, x) | [0, ∞) | Non-saturating for x>0; sparse; fast |
| Leaky ReLU | max(αx, x), α ≈ 0.01 | (-∞, ∞) | Mitigates ReLU's dying-neuron problem |
| GELU | x·Φ(x), Φ = standard normal CDF | ≈ [-0.17, ∞) | Used in BERT/GPT; smooth probabilistic gate |
| Softmax | e^(xᵢ) / Σⱼ e^(xⱼ) | (0, 1), sums to 1 | Multi-class output; yields a probability distribution |
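
As a sanity check on the formulas in the table, they can be written out by hand and compared with PyTorch's built-in functions. This is a sketch under the assumption that `F.gelu` is called with its default (exact, erf-based) setting; the `*_manual` names and the `checks` dictionary are illustrative.

```python
import math
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7, dtype=torch.float64)
alpha = 0.01  # Leaky ReLU slope, as in the table

# Hand-written versions of the table's formulas.
sigmoid_manual = 1 / (1 + torch.exp(-x))
tanh_manual = (torch.exp(x) - torch.exp(-x)) / (torch.exp(x) + torch.exp(-x))
relu_manual = torch.clamp(x, min=0)
leaky_manual = torch.maximum(alpha * x, x)
# Phi is the standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
gelu_manual = x * 0.5 * (1 + torch.erf(x / math.sqrt(2)))

logits = torch.tensor([2.0, 1.0, 0.1], dtype=torch.float64)
softmax_manual = torch.exp(logits) / torch.exp(logits).sum()

# Compare each hand-written formula with the library implementation.
checks = {
    "sigmoid": torch.allclose(sigmoid_manual, torch.sigmoid(x)),
    "tanh": torch.allclose(tanh_manual, torch.tanh(x)),
    "relu": torch.allclose(relu_manual, F.relu(x)),
    "leaky_relu": torch.allclose(leaky_manual, F.leaky_relu(x, negative_slope=alpha)),
    "gelu": torch.allclose(gelu_manual, F.gelu(x)),
    "softmax": torch.allclose(softmax_manual, F.softmax(logits, dim=0)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

The naive softmax here can overflow for large logits; library implementations typically subtract the maximum logit first, which is one reason to prefer `F.softmax` in real code.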
import torch
import torch.nn.functional as F
x = torch.linspace(-3, 3, 7)
print(F.relu(x)) # [0, 0, 0, 0, 1, 2, 3] (zeroes negatives)
print(torch.sigmoid(x)) # values in (0, 1); saturates toward 0 and 1 at the extremes
print(torch.tanh(x)) # values in (-1, 1); saturates toward ±1
print(F.leaky_relu(x, negative_slope=0.01)) # small slope for x<0
print(F.gelu(x)) # smooth variant used in transformers
# Softmax: multi-class final layer
logits = torch.tensor([2.0, 1.0, 0.1])
probs = F.softmax(logits, dim=0)
print(probs) # [0.659, 0.242, 0.099] — sums to 1.0
# In a model: prefer nn.ReLU() (in-place optional with inplace=True)
import torch.nn as nn
act = nn.ReLU() # stateless, so one instance can be shared across layers

Why ReLU replaced sigmoid: in deep networks, the vanishing gradient problem made sigmoid/tanh networks very hard to train. For a neuron deep in the network, the gradient arriving from backprop has already been multiplied by one activation derivative per layer. The sigmoid derivative is at most 0.25 (the tanh derivative reaches 1 only at x = 0 and is smaller everywhere else), so the gradient tends to shrink exponentially with depth. ReLU's derivative is exactly 1 for positive inputs, so active units pass gradients through deep networks without this decay. The trade-off is the 'dying ReLU' problem: a neuron whose pre-activation is negative for every input receives zero gradient and can stay stuck outputting zero. Leaky ReLU and ELU address this by keeping a non-zero gradient for negative inputs.
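
The depth argument can be made concrete with a few lines of arithmetic. The sketch below is a simplification: it multiplies only the activation derivatives along one path and ignores the weight matrices, which also scale gradients in a real network. The helper names (`sigmoid_grad`, `relu_grad`, `leaky_relu_grad`) and the chosen depths are illustrative.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    # d/dz sigmoid(z) = s * (1 - s); its maximum, 0.25, is reached at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def relu_grad(z: float) -> float:
    # 1 for positive inputs, 0 otherwise (the "dead" region).
    return 1.0 if z > 0 else 0.0

def leaky_relu_grad(z: float, alpha: float = 0.01) -> float:
    # Same as ReLU for positive inputs, but a small non-zero slope elsewhere.
    return 1.0 if z > 0 else alpha

# Best case for sigmoid: every layer sits at z = 0, where the derivative peaks.
factors = {}
for depth in (5, 10, 20):
    factors[depth] = {
        "sigmoid": sigmoid_grad(0.0) ** depth,  # shrinks as 0.25 ** depth
        "relu": relu_grad(1.0) ** depth,        # stays 1 along an active path
    }
    print(depth, factors[depth])

# Dying ReLU: a negative pre-activation blocks the gradient entirely,
# while Leaky ReLU still lets a small fraction through.
print(relu_grad(-2.0), leaky_relu_grad(-2.0))
```

Even in this best case the sigmoid factor falls below one in a million by depth 10, while the ReLU factor along an active path stays at 1.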
