Python / Data Science Essentials Interview Questions
How do you detect and remove duplicate rows in a Pandas DataFrame?
Duplicate rows silently inflate counts, distort means, and can cause data leakage between training and test sets. Pandas provides duplicated() to flag repeated rows and drop_duplicates() to remove them.
import pandas as pd
df = pd.DataFrame({
'order_id': [1, 2, 2, 3, 4, 4],
'product': ['A', 'B', 'B', 'C', 'D', 'D'],
'amount': [100, 200, 200, 150, 80, 90], # last pair differs!
})
# --- Detecting duplicates ---
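df.duplicated() # True for each repeat after the first occurrence (the first is marked False)
df.duplicated(keep='last') # True for each repeat except the last occurrence
df.duplicated(keep=False) # True for every occurrence of a duplicated row
print(df.duplicated().sum()) # count of fully duplicated rows (1 here: only the order_id 2 row repeats exactly)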
# Duplicate check on a subset of columns only
df.duplicated(subset=['order_id', 'product'])
# True where order_id AND product are repeated (ignores amount diff)
# --- Removing duplicates ---
df.drop_duplicates() # removes all but first occurrence
df.drop_duplicates(keep='last') # keeps last occurrence
df.drop_duplicates(keep=False) # removes all occurrences of any duplicate
# Subset-based deduplication — keep first by order_id
df.drop_duplicates(subset=['order_id'], keep='first')
# Sort before deduplicating to control which row is 'first'
# (e.g., keep the highest amount per order)
df.sort_values('amount', ascending=False).drop_duplicates(subset=['order_id'])
When deduplicating on a subset of columns, think carefully about which row to keep. Sorting the DataFrame first (by timestamp, version, or a quality metric) ensures drop_duplicates(keep='first') retains the most appropriate record, not just whatever happened to be first in the file.
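The sort-then-deduplicate advice can be sketched as follows. This is a minimal illustration, not a definitive recipe: the `updated_at` column and its dates are hypothetical and do not appear in the example data above. It first lists every row that shares an `order_id` with another row, so conflicts can be inspected, then keeps the most recent row per order. Passing `kind='stable'` is a precaution so that rows with equal timestamps keep their original relative order.

```python
import pandas as pd

# Hypothetical data: 'updated_at' is an assumed column added for illustration.
orders = pd.DataFrame({
    'order_id': [1, 2, 2, 3, 4, 4],
    'amount': [100, 200, 200, 150, 80, 90],
    'updated_at': pd.to_datetime([
        '2024-01-01', '2024-01-02', '2024-01-03',
        '2024-01-04', '2024-01-05', '2024-01-06',
    ]),
})

# Inspect every row whose order_id appears more than once before dropping anything.
conflicts = orders[orders.duplicated(subset=['order_id'], keep=False)]

# Sort oldest to newest, keep the last (most recent) row per order_id,
# then restore the original row order.
latest = (
    orders.sort_values('updated_at', kind='stable')
          .drop_duplicates(subset=['order_id'], keep='last')
          .sort_index()
)

print(conflicts)
print(latest)
```

Sorting ascending and using keep='last' is equivalent to sorting descending and using keep='first'; either way, the sort key decides which record is kept, not file order.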
