
Python / Data Science Essentials Interview Questions

How is NumPy linear algebra used in data science applications?

Linear algebra underpins much of machine learning, from computing gradients to PCA to solving systems of equations. NumPy's linalg submodule provides LAPACK-backed implementations of the core operations.

import numpy as np

# --- Solving a system of linear equations: Ax = b ---
# 2x + y = 8
# x + 3y = 11
A = np.array([[2, 1], [1, 3]])
b = np.array([8, 11])
x = np.linalg.solve(A, b)
print(x)   # [2.6  2.8]  — verify: A @ x ≈ b

# --- Matrix decompositions ---
M = np.array([[3, 1], [1, 3]], dtype=float)

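# Eigenvalue decomposition
eigenvalues, eigenvectors = np.linalg.eig(M)
# eigenvalues are 4 and 2 (eig does not guarantee any ordering);
# column i of eigenvectors is the unit eigenvector for eigenvalues[i].
# M is symmetric, so np.linalg.eigh(M) also applies and returns eigenvalues in ascending order.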

# Singular Value Decomposition — used in PCA, recommendation systems
X = np.random.default_rng(42).random((100, 5))   # 100 samples, 5 features
X -= X.mean(axis=0)                               # centre
U, S, Vt = np.linalg.svd(X, full_matrices=False)
# S = singular values (square roots of eigenvalues of X^T X)
# Vt rows = principal components
# Project onto first 2 components:
X_pca = X @ Vt[:2].T    # shape (100, 2)

# --- Norms ---
v = np.array([3.0, 4.0])
np.linalg.norm(v)        # 5.0 — L2 norm
np.linalg.norm(v, ord=1) # 7.0 — L1 norm

# --- Matrix rank, determinant, inverse ---
np.linalg.matrix_rank(A)  # 2 (full rank)
np.linalg.det(A)          # ≈ 5.0 (2*3 - 1*1, up to floating-point error)
np.linalg.inv(A)          # only for square, non-singular matrices
np.linalg.pinv(A)         # Moore-Penrose pseudoinverse; also defined for non-square or singular matrices
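To make the difference between solving and inverting concrete, here is a minimal sketch that reuses the 2x2 system from the listing above. The variable names x_solve, x_inv and residual are illustrative only and are not part of the original answer.

```python
import numpy as np

# Same system as above: 2x + y = 8, x + 3y = 11
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([8.0, 11.0])

# Preferred: solve the system directly (LU factorisation under the hood)
x_solve = np.linalg.solve(A, b)

# Works for this small, well-conditioned matrix, but forms the explicit
# inverse first, which is slower and less accurate in general
x_inv = np.linalg.inv(A) @ b

# Verify the solution numerically instead of comparing floats with ==
residual = np.linalg.norm(A @ x_solve - b)
print(np.allclose(A @ x_solve, b))   # True
print(np.allclose(x_solve, x_inv))   # True

# The condition number indicates how sensitive the solution is to
# small changes in A or b; large values signal an ill-conditioned system
print(np.linalg.cond(A))
```

For a single right-hand side the two approaches agree here, but np.linalg.solve is usually the safer default because it never materialises the inverse.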

SVD is the engine behind PCA: for centred data, the right singular vectors (rows of Vt) are the principal components, and the squared singular values, divided by n - 1, give the variance each component explains. Using full_matrices=False (economy SVD) matters for tall matrices: it avoids computing the full n x n matrix U, whose extra columns PCA never uses.
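The link between singular values and explained variance can be checked directly. The sketch below reuses the random 100 x 5 data from the listing; explained_variance, explained_variance_ratio and eigvals are names chosen for illustration, and the cross-check against np.cov assumes its default ddof=1 normalisation.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((100, 5))          # 100 samples, 5 features
X = X - X.mean(axis=0)            # PCA requires centred data

# Economy SVD: U is (100, 5) rather than (100, 100)
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Variance explained by each component: squared singular values / (n - 1)
n_samples = X.shape[0]
explained_variance = S**2 / (n_samples - 1)
explained_variance_ratio = explained_variance / explained_variance.sum()
print(explained_variance_ratio)   # fractions that sum to 1

# Cross-check: these equal the eigenvalues of the sample covariance matrix.
# eigvalsh returns them in ascending order, so reverse to match S.
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]
print(np.allclose(explained_variance, eigvals))   # True

# Projecting onto the first two components is the same as scaling
# the first two columns of U by their singular values
X_pca = X @ Vt[:2].T
print(np.allclose(X_pca, U[:, :2] * S[:2]))       # True
```

Because the singular values come back sorted in descending order, the first entries of explained_variance_ratio always belong to the leading components.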

In PCA implemented via SVD, what do the rows of the Vt matrix represent?
Which NumPy function solves the linear system Ax = b without computing the inverse of A?


