BigData / Apache Parquet Interview Questions
If a Spark query on Parquet is slow, what optimisation steps would you take?
Diagnosing and tuning slow Parquet queries in Spark follows a layered approach:
- Check partitioning: ensure the table is partitioned on low-to-moderate cardinality columns that queries commonly filter on (e.g., date, region). Without partitions, Spark scans all files; partitioning on a very high-cardinality column instead produces many tiny files.
  df.write.partitionBy("date", "region").parquet("s3://path/")
- Verify column pruning: use SELECT col1, col2 rather than SELECT * so only required column chunks are read.
- Align filters with statistics: filter on columns whose row groups have narrow, non-overlapping min/max ranges so row group skipping is effective. Sort data by the filter column before writing (Z-ordering or file-level sorting).
- Right-size row groups: row groups that are too small add per-group metadata and I/O overhead; row groups that are too large make skipping coarse. Target 128 MB–512 MB per row group.
- Check compression codec: Snappy (fast decompression) suits CPU-bound queries; ZSTD (higher compression ratio) suits I/O-bound scenarios.
- Enable Adaptive Query Execution (AQE): spark.sql.adaptive.enabled=true reoptimises joins and coalesces small shuffle partitions at runtime (on by default since Spark 3.2).
- Inspect the Spark UI: look at the scan stage and compare the bytes and files actually read with the table's total size; the gap tells you how effective partition pruning and pushdown are.
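
To make the "align filters with statistics" step concrete, here is a minimal pure-Python sketch of min/max row group skipping. It is a simulation, not the Parquet or Spark API: the helper names (row_group_stats, groups_to_read), the group size of 10, and the toy data are all assumptions chosen for illustration. It compares how many row groups a range filter must touch when the same values are written unsorted versus sorted.

```python
from typing import List, Tuple


def row_group_stats(values: List[int], group_size: int) -> List[Tuple[int, int]]:
    """Split values into fixed-size row groups and return (min, max) per group."""
    stats = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        stats.append((min(group), max(group)))
    return stats


def groups_to_read(stats: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count row groups whose [min, max] range overlaps the filter lo <= v <= hi."""
    return sum(1 for gmin, gmax in stats if gmin <= hi and gmax >= lo)


# Hypothetical column: the values 0..99 in a scrambled order
# (37 is coprime with 100, so this is a permutation of 0..99).
unsorted_values = [(i * 37) % 100 for i in range(100)]
sorted_values = sorted(unsorted_values)

# Filter: WHERE col BETWEEN 40 AND 49, with 10 rows per row group.
unsorted_reads = groups_to_read(row_group_stats(unsorted_values, 10), 40, 49)
sorted_reads = groups_to_read(row_group_stats(sorted_values, 10), 40, 49)

print(f"unsorted: {unsorted_reads} of 10 row groups read")
print(f"sorted:   {sorted_reads} of 10 row groups read")
```

With the scrambled layout every row group's min/max range overlaps the filter, so nothing can be skipped; after sorting, only the one group holding 40 to 49 needs to be read. Real Parquet readers apply the same overlap test to the statistics stored in each file's footer.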
