BigData / Apache Parquet Interview Questions
What is Z-ordering (Z-order clustering) and how does it help Parquet queries?
Z-ordering is a multi-dimensional data-skipping technique that physically co-locates related rows across multiple filter columns within Parquet files. It maps multiple column values to a single Z-order curve value and sorts data along that curve.
Problem it solves: Standard partitioning and sorting work well for one column. When queries filter on two or more independent columns (e.g., WHERE region='EU' AND product_category='Electronics'), sorting by a single column clusters rows on that column only, so values of the other column stay scattered across files and their min/max statistics cannot be used to skip data.
Z-ordering interleaves the bits of the filter column values so that rows with similar values in multiple columns are physically adjacent.
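The bit interleaving described above can be sketched in a few lines of Python. This is only an illustration of the idea (a Morton code over two integer columns), not how Delta Lake implements it internally. The function name interleave_bits and the small integer encoding of region and product_category are assumptions made for the example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x land on even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y land on odd positions
    return z


# Hypothetical rows: (region_id, category_id), already encoded as small integers.
rows = [(x, y) for x in range(4) for y in range(4)]

# Sorting by the interleaved value walks the grid in a "Z" pattern, so rows that
# are close in BOTH columns end up next to each other.
rows_by_z = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))
print(rows_by_z[:4])
```

The first four rows in Z-order form a 2x2 block of the grid, which is the co-location property that data skipping relies on.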
In Delta Lake (Databricks):
OPTIMIZE events ZORDER BY (region, product_category);
After Z-ordering, the same query skips far more files and row groups because the min/max ranges for both columns are tight within each file, so most files can be ruled out from their statistics alone.
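To make the skipping effect concrete, here is a small simulation, assuming a toy table of 64 x 64 integer pairs split into "files" of 16 rows each. It counts how many files a min/max check would have to read for a two-column equality filter. The helper names (interleave_bits, files_to_scan), the file size, and the fixed-seed shuffle that stands in for arbitrary row order are all assumptions for illustration. Real engines keep such statistics per file or per row group rather than recomputing them.

```python
import random


def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def files_to_scan(rows, rows_per_file, x_val, y_val):
    """Count files whose min/max ranges on both columns could contain the target."""
    scanned = 0
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        if min(xs) <= x_val <= max(xs) and min(ys) <= y_val <= max(ys):
            scanned += 1
    return scanned


# Toy table: every (x, y) pair once, in arbitrary order (fixed seed for repeatability).
rows = [(x, y) for x in range(64) for y in range(64)]
random.Random(0).shuffle(rows)

sorted_by_x = sorted(rows, key=lambda r: r[0])                          # single-column sort
sorted_by_z = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))   # Z-order

scan_x = files_to_scan(sorted_by_x, 16, x_val=5, y_val=9)
scan_z = files_to_scan(sorted_by_z, 16, x_val=5, y_val=9)
print(f"sorted by x only: {scan_x} of 256 files; Z-ordered: {scan_z} of 256 files")
```

In the Z-ordered layout each 16-row file covers a 4 x 4 block of the grid, so only the one block containing the target pair passes the min/max check. With the single-column sort, the y values inside each x group are unordered, so the y ranges of those files tend to be wide and prune little.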
Trade-off: Z-order OPTIMIZE is a full rewrite of affected files and should be scheduled periodically, not on every write.
