Python / Data Science Essentials Interview Questions
How do you process large CSV files that don't fit in memory using Pandas?
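When a CSV is larger than available RAM, loading it with a plain pd.read_csv call typically fails with a MemoryError (or pushes the machine into heavy swapping). Three read_csv features address this: selective loading (usecols, nrows), dtype optimisation (dtype, parse_dates), and chunking (chunksize, which returns an iterator of DataFrames instead of one large frame).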
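A quick way to check what dtype optimisation buys is to compare memory_usage(deep=True) before and after downcasting. The sketch below uses synthetic data (hypothetical values standing in for the log columns) rather than the real file:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the log file (hypothetical data, not from big_log.csv)
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    'user_id': rng.integers(0, 50_000, size=n),                  # int64 by default
    'event': rng.choice(['click', 'view', 'purchase'], size=n),  # repeated strings
    'amount': rng.random(n) * 2000,                              # float64
})

# deep=True counts the actual string payloads, not just the pointers
before = df.memory_usage(deep=True).sum()

# Narrower numeric types, and 'category' stores each distinct string once
optimised = df.astype({'user_id': 'int32', 'event': 'category', 'amount': 'float32'})
after = optimised.memory_usage(deep=True).sum()

print(f'{before / 1e6:.1f} MB -> {after / 1e6:.1f} MB')
```

In a real pipeline, the same mapping would be passed to pd.read_csv(dtype=...) so the wide default dtypes are never allocated. Note that int32 only works when the column has no missing values and every value fits in 32 bits.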
import pandas as pd
import numpy as np

# --- Strategy 1: Read only necessary columns and rows ---
df = pd.read_csv(
    'big_log.csv',
    usecols=['timestamp', 'user_id', 'event', 'amount'],  # skip unneeded cols
    dtype={'user_id': 'int32', 'amount': 'float32'},      # smaller dtypes
    parse_dates=['timestamp'],
    nrows=500_000,  # read a sample first for exploration
)

# --- Strategy 2: Process in chunks ---
chunk_size = 100_000
results = []
for chunk in pd.read_csv('big_log.csv', chunksize=chunk_size,
                         usecols=['user_id', 'amount']):
    # Process each chunk independently
    summary = chunk.groupby('user_id')['amount'].sum()
    results.append(summary)

# Combine partial results
final = pd.concat(results).groupby(level=0).sum()

# --- Strategy 3: Filter while reading with chunksize ---
high_value_chunks = []
for chunk in pd.read_csv('big_log.csv', chunksize=chunk_size):
    filtered = chunk[chunk['amount'] > 1000]
    high_value_chunks.append(filtered)
high_value_df = pd.concat(high_value_chunks, ignore_index=True)
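
# --- Alternative: Parquet format (much faster to read than CSV) ---
# Convert once. df above holds only the nrows sample, so stream the full CSV
# into Parquet batch by batch instead of calling df.to_parquet().
import pyarrow as pa
import pyarrow.csv as pv
import pyarrow.parquet as pq

reader = pv.open_csv('big_log.csv')  # streaming reader: one record batch at a time
# Column types are inferred from the first block; pass
# convert_options=pv.ConvertOptions(column_types=...) if later rows differ.
with pq.ParquetWriter('big_log.parquet', reader.schema) as writer:
    for batch in reader:
        writer.write_table(pa.Table.from_batches([batch]))

# Then read efficiently: Parquet supports column projection and row filters
table = pq.read_table('big_log.parquet',
                      columns=['user_id', 'amount'],
                      filters=[('amount', '>', 1000)])
high_value = table.to_pandas()

For truly large-scale work (tens of GB), consider switching from CSV to Parquet (columnar, compressed, fast column projection) and using Dask or Polars instead of Pandas. Dask splits a DataFrame into partitions and evaluates a lazy task graph; Polars' lazy API (pl.scan_csv, pl.scan_parquet) builds a query plan with projection and predicate pushdown and can run it with its streaming engine. Either way, the whole file does not have to be loaded into memory at once.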
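Strategy 2 works because per-user sums from different chunks can simply be added together. A mean cannot be combined that way: averaging per-chunk means weights each chunk equally rather than each row. A minimal sketch, using a tiny in-memory CSV (hypothetical data) in place of big_log.csv, keeps a sum and a count per chunk and divides only at the end:

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for big_log.csv (hypothetical data)
csv_text = """user_id,amount
1,10
1,20
1,60
2,40
2,10
3,5
"""

partials = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Sum and count are both additive, so they merge correctly across chunks
    partials.append(chunk.groupby('user_id')['amount'].agg(['sum', 'count']))

# Add up the partial sums and counts per user, then divide once
combined = pd.concat(partials).groupby(level=0).sum()
mean_per_user = combined['sum'] / combined['count']

# Wrong approach for comparison: average the per-chunk means
naive = pd.concat([p['sum'] / p['count'] for p in partials]).groupby(level=0).mean()

print(mean_per_user)
print(naive)
```

With chunks of two rows, user 1 appears as (10, 20) in the first chunk and (60) in the second, so its true mean is 30.0 while averaging the chunk means gives 37.5.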