BigData / Apache Parquet Interview Questions
How are Parquet files structured (Row Groups, Column Chunks, Pages)?
Parquet organises data in a three-level hierarchy:
- Row Group: a horizontal slice of the dataset, typically 128 MB–1 GB of data. Each row group contains one column chunk per column.
- Column Chunk: all values for a single column within a row group, stored contiguously in the file. This is the unit of I/O: a reader fetches only the chunks for the columns it needs. The compression codec is recorded per column chunk.
- Page: the indivisible unit of encoding and compression inside a column chunk (default 1 MB). Pages can be data pages, dictionary pages, or index pages.
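The hierarchy can be made concrete with a small in-memory sketch in Python. This is a simplified model, not a Parquet writer. The class names and `build_row_groups` are hypothetical. The sketch also sizes row groups and pages by row and value count, whereas real writers size them in bytes and encode and compress each page.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical, simplified model: real writers size row groups and pages
# in bytes (e.g. ~128 MB and ~1 MB), not by row/value counts as done here.

@dataclass
class Page:
    values: list[Any]

@dataclass
class ColumnChunk:
    column: str
    pages: list[Page] = field(default_factory=list)

@dataclass
class RowGroup:
    num_rows: int
    chunks: dict[str, ColumnChunk] = field(default_factory=dict)


def build_row_groups(rows: list[dict[str, Any]], columns: list[str],
                     rows_per_group: int, values_per_page: int) -> list[RowGroup]:
    """Slice rows horizontally into row groups, then store each column of a
    row group as its own chunk, split into fixed-size pages."""
    groups = []
    for start in range(0, len(rows), rows_per_group):
        batch = rows[start:start + rows_per_group]      # horizontal slice
        group = RowGroup(num_rows=len(batch))
        for col in columns:
            col_values = [row[col] for row in batch]    # one column of one row group
            chunk = ColumnChunk(column=col)
            for p in range(0, len(col_values), values_per_page):
                chunk.pages.append(Page(col_values[p:p + values_per_page]))
            group.chunks[col] = chunk
        groups.append(group)
    return groups


# Toy data: 10 rows, 4 rows per row group, 3 values per page.
rows = [{"id": i, "name": f"user{i}", "amount": i * 1.5} for i in range(10)]
groups = build_row_groups(rows, ["id", "name", "amount"],
                          rows_per_group=4, values_per_page=3)
print(len(groups), [g.num_rows for g in groups])  # 3 [4, 4, 2]
```

Every row group holds a chunk for every column, even when the last group is smaller. This mirrors the "one column chunk per column" rule.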
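At the end of the file, a footer stores the schema, the location of every row group and column chunk, and per-column-chunk statistics (min, max, null count, and optionally distinct count, which many writers leave unset). Readers fetch the footer first and use these statistics to skip row groups that cannot match a filter. Skipping individual pages additionally requires the optional page index (column and offset indexes), which is stored just before the footer.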
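Row-group skipping reduces to an interval check against each chunk's min/max statistics. The sketch below assumes the statistics have already been decoded from the footer into a hypothetical `ChunkStats` record. `row_groups_to_read` is likewise illustrative and is not part of any Parquet library.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for decoded per-column-chunk footer statistics;
# field names are illustrative, not a real library API.
@dataclass
class ChunkStats:
    min_value: Optional[float]
    max_value: Optional[float]
    null_count: int


def row_groups_to_read(stats_per_group: list[dict[str, ChunkStats]],
                       column: str, low: float, high: float) -> list[int]:
    """Return indices of row groups that may contain rows with
    low <= column <= high. A group is skipped only when its [min, max]
    range cannot overlap the predicate; missing stats force a read."""
    selected = []
    for i, group in enumerate(stats_per_group):
        s = group.get(column)
        if s is None or s.min_value is None or s.max_value is None:
            selected.append(i)   # no usable statistics: read to be safe
        elif s.max_value < low or s.min_value > high:
            continue             # provably no matching rows: skip
        else:
            selected.append(i)   # ranges overlap: the group might match
    return selected


footer_stats = [
    {"amount": ChunkStats(0.0, 99.5, 0)},
    {"amount": ChunkStats(100.0, 499.0, 2)},
    {"amount": ChunkStats(500.0, 950.0, 0)},
]
# Filter: amount BETWEEN 120 AND 300, so only row group 1 can match.
print(row_groups_to_read(footer_stats, "amount", 120.0, 300.0))  # [1]
```

The check is conservative. Statistics can prove that a group has no matches, but they can never prove that it has one, so overlapping groups must still be read and filtered row by row.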
Parquet File
├── Row Group 1 (128 MB)
│   ├── Column Chunk: id
│   ├── Column Chunk: name
│   └── Column Chunk: amount
├── Row Group 2
│   └── ...
└── Footer (schema + statistics)
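Because the footer sits at the end, a reader can locate it with one small read of the file's tail. In the Parquet format, a file starts and ends with the 4-byte magic `PAR1`, and the footer length is stored as a 4-byte little-endian integer just before the trailing magic. The standard-library sketch below returns the raw footer bytes. `read_footer_bytes` is a hypothetical helper, and decoding the Thrift-encoded footer is out of scope.

```python
import io
import struct

MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def read_footer_bytes(f) -> bytes:
    """Locate the raw footer in a seekable binary file object.

    Tail layout: <footer> <4-byte little-endian footer length> b"PAR1".
    The footer itself is Thrift-encoded metadata; it is returned undecoded."""
    f.seek(0)
    if f.read(4) != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    f.seek(-8, io.SEEK_END)
    tail = f.read(8)
    if tail[4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    (footer_len,) = struct.unpack("<I", tail[:4])   # footer size in bytes
    f.seek(-8 - footer_len, io.SEEK_END)            # jump back over the footer
    return f.read(footer_len)


# Toy "file": the row group data and footer are placeholder bytes,
# not real encoded Parquet content.
fake_footer = b"<thrift footer>"
buf = io.BytesIO(MAGIC + b"<row group data>" + fake_footer
                 + struct.pack("<I", len(fake_footer)) + MAGIC)
print(read_footer_bytes(buf))  # b'<thrift footer>'
```

This layout lets a reader learn the schema and statistics without scanning the data.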
