BigData / Apache Parquet Interview Questions
What is Apache Iceberg and how does it use Parquet?
Apache Iceberg is a high-performance open table format for huge analytic datasets, designed to replace traditional Hive table management. Iceberg stores actual data in Parquet (or ORC/Avro) files and adds a metadata layer on top.
Iceberg adds to raw Parquet:
- ACID transactions: snapshot isolation for concurrent reads and writes.
- Hidden partitioning: partition transforms (bucket, truncate, year/month/day) are derived from column values automatically, so users do not specify partition paths or filter on separate partition columns.
- Full schema evolution: columns can be renamed, reordered, or dropped safely because they are tracked by field ID rather than by name or position.
- Time travel: query any historical snapshot by its ID (Spark SQL syntax shown):
  SELECT * FROM events VERSION AS OF 12345;
- Metadata statistics: manifest files track per-file min/max values and null counts, so query planning can skip files that cannot match a filter.
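The metadata statistics idea can be illustrated with a small sketch. This is a simplified pure-Python model of min/max file pruning, not Iceberg's actual API: `DataFile`, `plan_files`, the `event_id` column, and the file paths are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one manifest entry describing a Parquet file."""
    path: str
    min_event_id: int  # lower bound of the event_id column in this file
    max_event_id: int  # upper bound of the event_id column in this file


def plan_files(files, lo, hi):
    """Return the files whose [min, max] range overlaps the query range [lo, hi].

    A file whose range does not overlap cannot contain matching rows,
    so a planner can skip it without opening the Parquet file at all.
    """
    return [f for f in files if f.max_event_id >= lo and f.min_event_id <= hi]


# Assumed example metadata: three files with disjoint event_id ranges.
manifest = [
    DataFile("data/00000.parquet", 1, 1000),
    DataFile("data/00001.parquet", 1001, 2000),
    DataFile("data/00002.parquet", 2001, 3000),
]

# Rough equivalent of: SELECT * FROM events WHERE event_id BETWEEN 1500 AND 1600
selected = plan_files(manifest, 1500, 1600)
print([f.path for f in selected])
```

In this toy example only one of the three files survives planning. A real planner applies the same overlap test per column and also uses partition values and null counts.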
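Parquet is Iceberg's default file format for writes and the most common choice across engines such as Spark, Flink, Trino, and Hive. Because all table state lives in Iceberg's metadata layer, Parquet data files stay immutable: changes write new files and commit a new snapshot, which also makes it straightforward to compact many small files into fewer large ones.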
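One way to picture why immutable files, time travel, and compaction fit together is the following minimal sketch. It is a toy model under stated assumptions, not Iceberg's implementation or API: the `Table` class, its methods, and the file names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical, simplified model of a table's snapshot log."""
    snapshots: dict = field(default_factory=dict)  # snapshot_id -> tuple of file paths
    current_id: int = 0

    def _commit(self, files):
        # Every commit records a complete new file list; older snapshots are untouched.
        self.current_id += 1
        self.snapshots[self.current_id] = tuple(files)
        return self.current_id

    def append(self, *new_files):
        current = self.snapshots.get(self.current_id, ())
        return self._commit(current + new_files)

    def compact(self, small_files, merged_file):
        # Compaction swaps several small files for one rewritten file in a
        # new snapshot; the small files themselves are never modified.
        current = self.snapshots[self.current_id]
        kept = tuple(f for f in current if f not in small_files)
        return self._commit(kept + (merged_file,))

    def scan(self, version=None):
        # version=None reads the latest snapshot; a snapshot ID "time travels".
        return self.snapshots[self.current_id if version is None else version]


t = Table()
s1 = t.append("a.parquet")
s2 = t.append("b.parquet")
s3 = t.compact({"a.parquet", "b.parquet"}, "ab-merged.parquet")

print(t.scan())            # file list of the latest snapshot
print(t.scan(version=s2))  # file list as of the snapshot before compaction
```

Since no file is ever rewritten in place, a reader pinned to an older snapshot keeps seeing a consistent file list while compaction commits a new one. In Iceberg itself, older snapshots stay queryable until they are expired.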
