What is Schema Evolution in Parquet?
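Schema evolution is the ability to change a Parquet dataset's schema over time without rewriting existing files. Each Parquet file stores its own schema in its footer, so readers can reconcile files that were written with different schemas. Commonly supported changes: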
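- Adding columns: new columns appear as null in older files when read together with newer files.
- Renaming columns: supported via field IDs (used by table formats like Iceberg on top of Parquet); readers that match columns by name treat a renamed column as a new one.
- Widening types: e.g., INT32 → INT64 is safe because every INT32 value fits in INT64; narrowing is not, since values can overflow.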
In Apache Spark, set the mergeSchema read option to true to merge schemas across multiple Parquet files automatically (it is off by default because merging is relatively expensive):
df = spark.read.option("mergeSchema", "true").parquet("s3://bucket/data/")
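To make the merge behavior concrete, here is a minimal pure-Python sketch of what schema merging does conceptually: take the union of column names, fill missing columns with null, and accept only safe type widenings. It is not Spark's implementation. The schema representation, the `SAFE_WIDENINGS` table, and the helper names `merge_types`, `merge_schemas`, and `read_with_schema` are all assumptions made for illustration, and whether a given engine actually widens types on merge depends on the engine and its version.

```python
from typing import Dict, List

# Hypothetical representation: column name -> type name.
Schema = Dict[str, str]
Row = Dict[str, object]

# Assumed widening rules, for illustration only: (narrow, wide) pairs.
SAFE_WIDENINGS = {("int32", "int64")}


def merge_types(a: str, b: str) -> str:
    """Return the wider of two compatible types, or raise if incompatible."""
    if a == b:
        return a
    if (a, b) in SAFE_WIDENINGS:
        return b
    if (b, a) in SAFE_WIDENINGS:
        return a
    raise ValueError(f"incompatible types: {a} vs {b}")


def merge_schemas(schemas: List[Schema]) -> Schema:
    """Union the columns of all schemas, keeping first-seen column order."""
    merged: Schema = {}
    for schema in schemas:
        for name, type_name in schema.items():
            if name in merged:
                merged[name] = merge_types(merged[name], type_name)
            else:
                merged[name] = type_name
    return merged


def read_with_schema(rows: List[Row], schema: Schema) -> List[Row]:
    """Project rows onto the merged schema; missing columns become None."""
    return [{name: row.get(name) for name in schema} for row in rows]


# An older file without "email", and a newer file that adds it and widens "id".
old_schema = {"id": "int32", "name": "string"}
new_schema = {"id": "int64", "name": "string", "email": "string"}
old_rows = [{"id": 1, "name": "a"}]
new_rows = [{"id": 2, "name": "b", "email": "b@example.com"}]

merged = merge_schemas([old_schema, new_schema])
combined = read_with_schema(old_rows, merged) + read_with_schema(new_rows, merged)
```

Rows from the older file come back with `email` set to `None`, mirroring the null-filling described above, while an incompatible pair such as string and int32 raises an error instead of being silently coerced.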
Schema evolution is critical for long-lived data lakes where upstream producers add new fields without coordinating with all downstream consumers.
