BigData / Apache Parquet Interview Questions
What is Delta Lake and how does it extend Parquet for ACID transactions?
Delta Lake is an open-source storage layer, originally developed by Databricks and now a Linux Foundation project, that adds transactional guarantees on top of Parquet files stored in object storage. The core idea: data files are immutable Parquet, so every change (insert, update, delete) writes new files instead of modifying existing ones, and a JSON transaction log (the _delta_log directory) records which files make up each version of the table.
ACID properties in Delta Lake:
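- Atomicity: a transaction becomes visible only when its log entry is written. If the writer fails before that point, no log entry exists and readers ignore any Parquet files it already wrote.
- Consistency: schema enforcement rejects writes whose schema is incompatible with the table's, so malformed data never enters the table.
- Isolation: Optimistic Concurrency Control (OCC) lets writers proceed without locks and detects conflicts between concurrent writers at commit time.
- Durability: committed log entries and data files live in object storage, which is highly durable.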
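The atomicity and isolation points above can be illustrated with a small, self-contained sketch. It is not Delta Lake's implementation: it uses a local directory as a stand-in for object storage, and the helper names (`try_commit`, `latest_version`, `CommitConflict`) are hypothetical. The only idea it demonstrates is that a commit is the exclusive creation of the next numbered log file, so two writers that start from the same version cannot both succeed.

```python
import json
import os
import tempfile


class CommitConflict(Exception):
    """Raised when another writer has already claimed the target version."""


def latest_version(table_dir):
    """Return the highest committed version, or -1 for an empty log."""
    log_dir = os.path.join(table_dir, "_delta_log")
    versions = [int(n[:-5]) for n in os.listdir(log_dir) if n.endswith(".json")]
    return max(versions, default=-1)


def try_commit(table_dir, read_version, actions):
    """Try to commit `actions` as version read_version + 1 (hypothetical helper)."""
    version = read_version + 1
    # Commit files are named by zero-padded version number.
    target = os.path.join(table_dir, "_delta_log", f"{version:020d}.json")
    try:
        # Mode "x" fails if the file already exists. This is a local
        # stand-in for a put-if-absent operation on object storage.
        with open(target, "x") as f:
            for action in actions:
                f.write(json.dumps(action) + "\n")
    except FileExistsError:
        raise CommitConflict(f"version {version} was committed by another writer") from None
    return version


# Demo: two writers both read version -1 (empty table) and race to commit.
table = tempfile.mkdtemp()
os.makedirs(os.path.join(table, "_delta_log"))
start = latest_version(table)

try_commit(table, start, [{"add": {"path": "part-a.parquet"}}])  # writer A wins version 0
try:
    try_commit(table, start, [{"add": {"path": "part-b.parquet"}}])  # writer B loses
except CommitConflict:
    # Writer B re-reads the log and retries against the new latest version.
    try_commit(table, latest_version(table), [{"add": {"path": "part-b.parquet"}}])

print(latest_version(table))  # prints 1
```

In this simplified version every collision is treated as a conflict and blindly retried. A real OCC implementation would first check whether the winning commit actually touched data the losing transaction read, and only then decide between retrying and failing.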
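Data files remain Parquet; Delta adds a _delta_log/ directory with one numbered JSON commit file per transaction, recording which Parquet files that transaction added or removed. Periodic checkpoint files (Parquet snapshots of the log state) speed up log replay, because a reader can start from the latest checkpoint instead of replaying every commit.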
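To make log replay concrete, here is a minimal sketch in plain Python. It assumes commit files are newline-delimited JSON in which each line is a single action keyed by `add` or `remove` with a `path` field; other action types that a real log contains are skipped, and checkpoints are not handled. The function names `live_files` and `write_commit` are hypothetical.

```python
import json
import os
import tempfile


def live_files(table_dir, version=None):
    """Replay commits in order and return the set of live data file paths.

    If `version` is given, replay stops after that commit, which is
    conceptually what a time-travel read does.
    """
    log_dir = os.path.join(table_dir, "_delta_log")
    # Zero-padded names sort lexicographically in version order.
    commits = sorted(n for n in os.listdir(log_dir) if n.endswith(".json"))
    files = set()
    for name in commits:
        if version is not None and int(name[:-5]) > version:
            break
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return files


def write_commit(table_dir, version, actions):
    """Write one commit file for the demo (hypothetical helper)."""
    path = os.path.join(table_dir, "_delta_log", f"{version:020d}.json")
    with open(path, "w") as f:
        f.writelines(json.dumps(a) + "\n" for a in actions)


table = tempfile.mkdtemp()
os.makedirs(os.path.join(table, "_delta_log"))
write_commit(table, 0, [{"add": {"path": "part-0.parquet"}}])
write_commit(table, 1, [{"add": {"path": "part-1.parquet"}}])
# An update rewrites part-0 as part-2: remove the old file, add the new one.
write_commit(table, 2, [{"remove": {"path": "part-0.parquet"}},
                        {"add": {"path": "part-2.parquet"}}])

print(sorted(live_files(table)))             # ['part-1.parquet', 'part-2.parquet']
print(sorted(live_files(table, version=1)))  # ['part-0.parquet', 'part-1.parquet']
```

Note that part-0.parquet is never deleted by the update itself. It simply stops being referenced from version 2 onward, which is why older versions remain readable until the old files are physically cleaned up.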
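# Assumes an active SparkSession `spark` with Delta Lake configured and an existing DataFrame `df`
# Append to a Delta table; the write is rejected if df's schema is incompatible with the table's
df.write.format("delta").mode("append").save("/delta/events")
# Time travel: read the table as it was at version 5
events_v5 = spark.read.format("delta").option("versionAsOf", 5).load("/delta/events")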
