How does AWS Athena query Parquet files in S3?
AWS Athena is a serverless interactive query service built on Presto/Trino (engine version 3 is Trino-based). It reads Parquet files directly from S3 with a columnar Parquet reader, so the data does not have to be loaded into Athena first.
Steps to query Parquet in Athena:
- Define a Glue Data Catalog table pointing to the S3 prefix:
CREATE EXTERNAL TABLE events (
  user_id    BIGINT,
  event_type STRING,
  revenue    DOUBLE,
  event_date DATE
)
STORED AS PARQUET
LOCATION 's3://my-bucket/events/'
TBLPROPERTIES ("parquet.compress"="SNAPPY");
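- Query with standard SQL. Athena automatically reads only the columns the query references (column pruning) and uses Parquet row-group statistics to skip data that cannot match the filter (predicate pushdown).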
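As a rough illustration of the query step, the statement below runs against the events table defined above. It references only two of the four columns, and its filter is a simple comparison that can be checked against row-group statistics. The literal 'purchase' is a made-up value used only for this sketch.

```sql
-- Only event_type and revenue are referenced, so the user_id and
-- event_date column chunks do not need to be read (column pruning).
SELECT event_type,
       SUM(revenue) AS total_revenue
FROM events
WHERE event_type = 'purchase'  -- hypothetical value; a candidate for predicate pushdown
GROUP BY event_type;
```

Replacing the column list with SELECT * would force every column to be read and would increase the bytes scanned.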
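Cost optimisation: Athena charges per byte scanned. Because Parquet is compressed and a query reads only the columns it references, querying Parquet can cut costs by 90% or more compared with querying the same logical data as CSV. The exact saving depends on how many columns a query touches and how well the data compresses. For instance, a query that reads 2 of 10 similarly sized columns from data that Parquet compresses 3:1 scans roughly 1/15 of the bytes.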
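Partitioning reduces scans further. If the table is partitioned (for example, by declaring event_date in PARTITIONED BY rather than as a regular column, as it is in the table above), a filter such as WHERE event_date BETWEEN ... lets Athena skip whole partitions, which reduces both scan size and latency. Partition projection goes one step further: Athena computes partition locations from table properties instead of looking them up in the Glue Data Catalog, which shortens query planning on tables with many partitions.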
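A minimal sketch of a partitioned variant with partition projection might look like the following. The table name events_by_date, the S3 prefix, the Hive-style event_date=... folder layout, and the start of the date range are all assumptions made for illustration. The partition column is declared as STRING here, which is a common choice for date-typed projection.

```sql
-- Hypothetical partitioned variant of the events table.
-- Assumes files are laid out as s3://my-bucket/events-by-date/event_date=YYYY-MM-DD/
CREATE EXTERNAL TABLE events_by_date (
  user_id    BIGINT,
  event_type STRING,
  revenue    DOUBLE
)
PARTITIONED BY (event_date STRING)
STORED AS PARQUET
LOCATION 's3://my-bucket/events-by-date/'
TBLPROPERTIES (
  'projection.enabled'           = 'true',
  'projection.event_date.type'   = 'date',
  'projection.event_date.format' = 'yyyy-MM-dd',
  'projection.event_date.range'  = '2024-01-01,NOW',  -- assumed first day of data
  'storage.location.template'    = 's3://my-bucket/events-by-date/event_date=${event_date}/'
);

-- The filter on the partition column limits the scan to January's prefixes.
SELECT SUM(revenue) AS january_revenue
FROM events_by_date
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31';
```

With projection enabled, no MSCK REPAIR TABLE or ALTER TABLE ADD PARTITION step is needed when new daily prefixes appear, as long as they fall within the configured range.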
