BigData / Apache Parquet Interview Questions
How do you perform upserts (MERGE INTO) on Parquet-based tables in Delta Lake?
Raw Parquet files are immutable: you cannot update individual rows in place. Delta Lake adds MERGE INTO support, which implements upserts by reading the affected Parquet files, rewriting them with the changes applied, and recording the result as a transaction in the Delta log:
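from delta.tables import DeltaTable

# Assumes an active SparkSession `spark` with Delta Lake enabled and a
# DataFrame `updates_df` holding the incoming customer rows.
delta_table = DeltaTable.forPath(spark, "/delta/customers")

(
    delta_table.alias("target")
    .merge(updates_df.alias("source"), "target.customer_id = source.customer_id")
    .whenMatchedUpdateAll()     # matched rows: set every column from the source row
    .whenNotMatchedInsertAll()  # unmatched source rows: insert as new rows
    .execute()
)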
Or in SQL:
MERGE INTO customers AS target
USING updates AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
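To make the row-level result of the statement concrete, here is a minimal pure-Python sketch (a toy model, not Delta's code) of what `WHEN MATCHED THEN UPDATE SET *` and `WHEN NOT MATCHED THEN INSERT *` compute on a small table. It assumes rows are dicts with the same columns on both sides; the function name `merge_upsert` is hypothetical, and for simplicity it rejects duplicate keys in the source instead of modeling Delta's exact rules for that case.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_upsert(target: List[Row], source: List[Row], key: str) -> List[Row]:
    """Toy model of MERGE ... WHEN MATCHED UPDATE SET * / WHEN NOT MATCHED INSERT *."""
    source_by_key: Dict[object, Row] = {}
    for row in source:
        if row[key] in source_by_key:
            # Simplification for this sketch: refuse ambiguous duplicate keys.
            raise ValueError(f"duplicate source key: {row[key]!r}")
        source_by_key[row[key]] = row

    result: List[Row] = []
    matched = set()
    for row in target:
        src = source_by_key.get(row[key])
        if src is not None:
            result.append(dict(src))  # matched: every column taken from the source row
            matched.add(row[key])
        else:
            result.append(dict(row))  # no source row for this key: unchanged
    for k, src in source_by_key.items():
        if k not in matched:
            result.append(dict(src))  # not matched: inserted as a new row
    return result


customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bob"}]
updates = [{"customer_id": 2, "name": "Robert"}, {"customer_id": 3, "name": "Cy"}]
print(merge_upsert(customers, updates, "customer_id"))
# [{'customer_id': 1, 'name': 'Ann'}, {'customer_id': 2, 'name': 'Robert'}, {'customer_id': 3, 'name': 'Cy'}]
```

The list order here is only for illustration; a Delta table does not guarantee any row order.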
Under the hood: Delta identifies which Parquet files contain rows that match the ON condition, rewrites only those files with the updated values, writes any inserted rows into new files, and records in the transaction log that the old files were removed and the new files added. Files with no matching rows are left untouched.
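The file-level bookkeeping can be modeled with a toy in-memory table, assuming each "file" is a list of dict rows that is never modified after it is written, and the log is a list of commits made of ("add", file) / ("remove", file) actions. `ToyDeltaTable`, its methods, and the `part-NNNNN.parquet` names are hypothetical; the sketch follows the classic copy-on-write idea (a touched file is rewritten whole) and does not reproduce Delta's actual log format or optimizations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Dict[str, object]
Action = Tuple[str, str]  # ("add" | "remove", file name)


@dataclass
class ToyDeltaTable:
    """Hypothetical model: immutable data files plus a log of add/remove commits."""
    files: Dict[str, List[Row]] = field(default_factory=dict)  # every file ever written
    log: List[List[Action]] = field(default_factory=list)      # one action list per commit

    def _write_file(self, rows: List[Row]) -> str:
        name = f"part-{len(self.files):05d}.parquet"  # hypothetical naming scheme
        self.files[name] = [dict(r) for r in rows]    # never modified after this
        return name

    def live_files(self) -> List[str]:
        """Replay the log: a file is live if it was added and not later removed."""
        live: List[str] = []
        for commit in self.log:
            for action, name in commit:
                if action == "add":
                    live.append(name)
                else:
                    live.remove(name)
        return live

    def append(self, rows: List[Row]) -> None:
        self.log.append([("add", self._write_file(rows))])

    def snapshot(self) -> List[Row]:
        return [row for name in self.live_files() for row in self.files[name]]

    def merge(self, source: List[Row], key: str) -> None:
        updates = {row[key]: row for row in source}  # assumes unique source keys
        actions: List[Action] = []
        matched = set()
        for name in self.live_files():
            rows = self.files[name]
            hits = {r[key] for r in rows if r[key] in updates}
            if not hits:
                continue  # file has no matching rows: left untouched
            matched |= hits
            # Copy-on-write: the whole file is rewritten, unchanged rows included.
            new_rows = [updates[r[key]] if r[key] in updates else r for r in rows]
            actions += [("remove", name), ("add", self._write_file(new_rows))]
        inserts = [row for k, row in updates.items() if k not in matched]
        if inserts:
            actions.append(("add", self._write_file(inserts)))  # WHEN NOT MATCHED rows
        if actions:
            self.log.append(actions)  # all removes and adds land in one commit


t = ToyDeltaTable()
t.append([{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bob"}])  # part-00000
t.append([{"customer_id": 3, "name": "Cy"}])                                      # part-00001
t.merge([{"customer_id": 2, "name": "Robert"}, {"customer_id": 4, "name": "Di"}], "customer_id")
print(t.log[-1])
# [('remove', 'part-00000.parquet'), ('add', 'part-00002.parquet'), ('add', 'part-00003.parquet')]
print(t.live_files())
# ['part-00001.parquet', 'part-00002.parquet', 'part-00003.parquet']
```

In this run `part-00001.parquet` holds no matching key and never appears in the merge commit, while the removed `part-00000.parquet` is still present in `files`: removal is a log entry, not a physical delete. That is also why Delta can still read older table versions until `VACUUM` cleans up unreferenced files.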
