BigData / Apache Parquet Interview Questions
What is the Parquet file footer and why does the reader fetch it first?
The Parquet file footer is a serialised Thrift structure at the end of every Parquet file. It contains:
- The full file schema (field names, types, nesting).
- Per-row-group metadata: byte offsets, compressed/uncompressed sizes, row counts.
- Per-column-chunk statistics: min value, max value, null count, distinct count (all optional; writers may omit any of them, and distinct count is often absent).
- Encoding and compression codec per column chunk.
- Bloom filter offsets (if present).
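Readers fetch the footer first because it is usually small relative to the data (often kilobytes, though it grows with the number of columns and row groups) and provides the complete map needed to plan which row groups and column chunks to read. The schema and the offset of every column chunk are stored only in the footer, so a reader that skipped it would have no reliable way to locate or decode the data pages.
The last 4 bytes of a Parquet file are the magic bytes PAR1; the 4 bytes before that are a 32-bit little-endian integer giving the footer length in bytes. Readers therefore seek to the end of the file first, read these 8 bytes, and then step back by the footer length to read the footer itself.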
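The tail layout described above can be sketched with the Python standard library alone. This is a minimal illustration, not how any particular Parquet library implements it: the function name locate_footer is hypothetical, it only finds the footer's position and length, and it does not decode the Thrift metadata.

```python
import io
import struct

MAGIC = b"PAR1"  # trailing magic bytes of a Parquet file


def locate_footer(f):
    """Return (footer_offset, footer_length) for a seekable binary file object.

    Hypothetical helper for illustration: it reads only the last 8 bytes
    and does not parse the Thrift-encoded footer itself.
    """
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    # Smallest possible layout: leading magic + length field + trailing magic.
    if file_size < 12:
        raise ValueError("too small to be a Parquet file")

    # Last 8 bytes: 4-byte little-endian footer length, then 4-byte magic.
    f.seek(file_size - 8)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("missing PAR1 trailer")
    footer_len = struct.unpack("<I", tail[:4])[0]

    # The footer sits immediately before the 8-byte tail.
    footer_offset = file_size - 8 - footer_len
    if footer_offset < 4:  # must not overlap the leading magic
        raise ValueError("footer length exceeds file size")
    return footer_offset, footer_len


# Usage with a synthetic byte layout (the 7 "footer" bytes are placeholders,
# not valid Thrift metadata).
fake = MAGIC + b"\x00" * 10 + b"F" * 7 + struct.pack("<I", 7) + MAGIC
offset, length = locate_footer(io.BytesIO(fake))
```

With a real file, the same function could be called on open(path, "rb"); a reader would then seek to the returned offset and read that many bytes to obtain the serialised footer.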
