BigData / Data Lake Interview questions
What is Apache Iceberg and how does it improve Data Lake table management?
Apache Iceberg is an open table format for huge analytic datasets, designed to solve challenges in managing petabyte-scale tables in data lakes. Originally developed at Netflix and now an Apache top-level project, Iceberg provides reliable, high-performance table semantics on top of object storage like S3, ADLS, or Google Cloud Storage.
Iceberg was created to address limitations of traditional Hive tables: metadata that scales poorly because planning depends on directory listings, partition layouts that users must manage by hand, and schema evolution that is limited and error-prone. It works with multiple compute engines, including Spark, Trino, Flink, Hive, and Presto.
Key Features of Apache Iceberg:
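1. Hidden Partitioning: In Hive, partition columns are separate columns that users must populate on write and filter on explicitly in queries. Iceberg instead derives partition values from existing columns through transforms (for example, the day of a timestamp or a hash bucket of an ID) and records that relationship in table metadata. Queries filter on the source column and Iceberg prunes partitions automatically, which prevents the common mistake of a full table scan when a partition filter is forgotten.
2. Partition Evolution: Iceberg allows changing the partitioning scheme without rewriting existing data. You can start with daily partitions and later switch to hourly without a data migration, because Iceberg tracks which partition spec applies to which files. Existing files keep the old layout and new writes use the new one.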
3. Snapshot Isolation and Time Travel: Every write creates an immutable snapshot. Readers always see a consistent view of the table, even during concurrent writes. Time travel enables querying historical states for audit, debugging, and rollback scenarios.
4. Schema Evolution: Iceberg supports safe schema changes including adding columns, dropping columns, renaming fields, reordering columns, and promoting types. Column IDs track fields across schema versions, ensuring queries work correctly across schema changes.
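5. Scalable Metadata: Iceberg tracks data files in a metadata tree (a table metadata file, a manifest list per snapshot, and manifest files) rather than relying on directory listings as Hive does. Each manifest list entry records the range of partition values in its manifest, so the planner can skip whole manifests, and per-file column statistics allow further pruning. Planning cost therefore depends on the metadata a query actually touches rather than on the total number of files, which keeps planning fast even on tables with millions of files.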
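The partitioning, snapshot, and planning features above can be illustrated with a small in-memory sketch. This is a toy model, not the Iceberg API: ToyTable, DataFile, Snapshot, SPECS, by_day, by_hour, and the file names are all hypothetical, and real Iceberg stores this information in metadata and manifest files rather than Python objects. It only aims to show how partition values can be derived from a timestamp, how files written under different partition specs can coexist, and how a scan can be planned against an older snapshot.

```python
from dataclasses import dataclass
from datetime import datetime

# Toy model only: every name here is hypothetical, not the Iceberg API.

def by_day(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%d")

def by_hour(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%d-%H")

# Partition specs keyed by spec id; each derives a partition value from ts.
SPECS = {0: by_day, 1: by_hour}

@dataclass(frozen=True)
class DataFile:
    path: str
    spec_id: int      # spec the file was written under
    partition: str    # derived value, never supplied by the writer

@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: tuple      # immutable sequence of DataFile entries

class ToyTable:
    def __init__(self):
        # Snapshot ids are sequential, so an id doubles as a list index.
        self.snapshots = [Snapshot(0, ())]
        self.current_spec = 0

    def evolve_spec(self, spec_id: int) -> None:
        # Only affects future writes; existing files keep their spec.
        self.current_spec = spec_id

    def append(self, path: str, timestamps: list) -> int:
        transform = SPECS[self.current_spec]
        partitions = {transform(ts) for ts in timestamps}
        if len(partitions) != 1:
            raise ValueError("toy model: one partition per file")
        new_file = DataFile(path, self.current_spec, partitions.pop())
        previous = self.snapshots[-1]
        snapshot = Snapshot(previous.snapshot_id + 1, previous.files + (new_file,))
        self.snapshots.append(snapshot)  # earlier snapshots are never modified
        return snapshot.snapshot_id

    def plan_scan(self, start: datetime, end: datetime, snapshot_id=None) -> list:
        """Paths that may hold rows with start <= ts <= end."""
        snapshot = self.snapshots[-1 if snapshot_id is None else snapshot_id]
        planned = []
        for f in snapshot.files:
            transform = SPECS[f.spec_id]  # use the spec the file was written with
            if transform(start) <= f.partition <= transform(end):
                planned.append(f.path)
        return planned

table = ToyTable()
table.append("a.parquet", [datetime(2024, 1, 1, 3)])
v2 = table.append("b.parquet", [datetime(2024, 1, 2, 10)])
table.evolve_spec(1)  # switch from daily to hourly partitions
table.append("c.parquet", [datetime(2024, 1, 3, 5)])
table.append("d.parquet", [datetime(2024, 1, 3, 9)])

start, end = datetime(2024, 1, 2), datetime(2024, 1, 3, 6)
print(table.plan_scan(start, end))                  # ['b.parquet', 'c.parquet']
print(table.plan_scan(start, end, snapshot_id=v2))  # ['b.parquet']
```

The caller only ever filters on the timestamp range. The planner translates that range into partition values using whichever spec each file was written under, so daily and hourly files are pruned correctly in the same scan, and passing an older snapshot id reproduces the table as it was at that point.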
Iceberg has been adopted by major cloud platforms including AWS (Athena, EMR), Azure (Synapse), Google Cloud (BigQuery), and Snowflake, making it one of the most widely supported open table formats.
