BigData / Apache Parquet Interview Questions

What are the supported data types in Parquet?

Parquet defines primitive types (physical storage) and logical types (semantic meaning layered on top).

Primitive types: BOOLEAN, INT32, INT64, INT96 (legacy timestamps), FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY.

Common logical types (annotations on primitives):

Logical Type	Physical Mapping	Example
STRING	BYTE_ARRAY (UTF-8)	"London"
DATE	INT32 (days since epoch)	2026-01-01
TIMESTAMP_MILLIS	INT64 (milliseconds since epoch)	1704067200000
DECIMAL	INT32/INT64/BYTE_ARRAY/FIXED_LEN_BYTE_ARRAY	99.99
LIST	Repeated group	[1, 2, 3]
MAP	Repeated key-value group	{k: v}
ENUM	BYTE_ARRAY	"ACTIVE"
UUID	FIXED_LEN_BYTE_ARRAY(16)	RFC-4122
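
To make a few rows of the table concrete, here is a minimal standard-library sketch of how the logical values for TIMESTAMP_MILLIS, DECIMAL, and UUID reduce to physical values. The helper names (timestamp_to_millis, decimal_to_unscaled, narrowest_decimal_type) and the sample UUID are hypothetical and are not part of any Parquet library; a real writer such as Spark or PyArrow does this conversion internally and may choose a wider physical type than the narrowest one shown.

```python
import uuid
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_millis(ts: datetime) -> int:
    """TIMESTAMP_MILLIS: whole milliseconds since the Unix epoch (INT64)."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def decimal_to_unscaled(value: Decimal, scale: int) -> int:
    """DECIMAL: the stored number is the unscaled integer value * 10**scale."""
    scaled = value.scaleb(scale)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} does not fit scale {scale}")
    return int(scaled)


def narrowest_decimal_type(precision: int) -> str:
    """Narrowest physical type able to hold a DECIMAL of this precision."""
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    return "BYTE_ARRAY or FIXED_LEN_BYTE_ARRAY"


millis = timestamp_to_millis(datetime(2024, 1, 1, tzinfo=timezone.utc))
unscaled = decimal_to_unscaled(Decimal("99.99"), scale=2)
# Hypothetical sample UUID; its 16 raw bytes are what FIXED_LEN_BYTE_ARRAY(16) holds.
uuid_bytes = uuid.UUID("123e4567-e89b-12d3-a456-426614174000").bytes

print(millis)                      # 1704067200000
print(unscaled)                    # 9999
print(narrowest_decimal_type(4))   # INT32
print(len(uuid_bytes))             # 16
```

The precision and scale of a DECIMAL column live in the schema, not in each value, which is why only the unscaled integer needs to be stored.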

Understanding the primitive/logical split matters when debugging type mismatch errors between Spark, Hive, and other readers.
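
As an illustration of why the split matters, the sketch below decodes the same raw INT64 under two different timestamp units. This is a hypothetical mismatch built for demonstration, assuming a value written as milliseconds and a reader that wrongly assumes microseconds; it does not describe the behaviour of any particular engine. The function decode_timestamp is an invented helper, not a library call.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_timestamp(raw: int, unit: str) -> datetime:
    """Interpret a raw INT64 as a timestamp in the given unit."""
    if unit == "millis":
        return EPOCH + timedelta(milliseconds=raw)
    if unit == "micros":
        return EPOCH + timedelta(microseconds=raw)
    raise ValueError(f"unsupported unit: {unit}")


raw = 1704067200000  # physical INT64 value from the table above

as_written = decode_timestamp(raw, "millis")  # unit the writer intended
misread = decode_timestamp(raw, "micros")     # unit a reader wrongly assumed

print(as_written.isoformat())  # a date in 2024
print(misread.isoformat())     # a date in January 1970
```

The physical bytes are identical in both cases; only the logical annotation tells a reader which interpretation is correct.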

How does Parquet store a DATE column at the physical level?

✓ As INT32 representing days since the Unix epoch
✗ As a UTF-8 string in ISO 8601 format
✗ As FLOAT representing fractional Julian days
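
The day-count arithmetic behind the DATE answer can be sketched with the standard library alone. The function names date_to_days and days_to_date are hypothetical helpers used for illustration; they mirror the conversion a Parquet writer and reader perform, assuming the Unix epoch of 1970-01-01.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Logical DATE -> physical INT32: days elapsed since the Unix epoch."""
    return (d - UNIX_EPOCH).days


def days_to_date(days: int) -> date:
    """Physical INT32 -> logical DATE."""
    return UNIX_EPOCH + timedelta(days=days)


stored = date_to_days(date(2026, 1, 1))
before_epoch = date_to_days(date(1969, 12, 31))

print(stored)                # 20454
print(days_to_date(stored))  # 2026-01-01
print(before_epoch)          # -1, dates before the epoch are negative
```

Because the value is a signed integer, dates before 1970 are represented as negative day counts.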

What physical Parquet type is used to represent a STRING logical type?

✗ INT64
✓ BYTE_ARRAY with UTF-8 annotation
✗ FIXED_LEN_BYTE_ARRAY
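
A short sketch of why STRING sits on the variable-length BYTE_ARRAY: UTF-8 byte length differs from character count and varies from value to value, whereas FIXED_LEN_BYTE_ARRAY declares a single length for every value in the column. The helper string_to_byte_array is a hypothetical name, and "Zürich" is an example value chosen here to show a multi-byte character.

```python
def string_to_byte_array(text: str) -> bytes:
    """Logical STRING -> physical BYTE_ARRAY payload (UTF-8 encoded bytes)."""
    return text.encode("utf-8")


london = string_to_byte_array("London")
zurich = string_to_byte_array("Zürich")

print(len("London"), len(london))  # 6 6: ASCII is one byte per character
print(len("Zürich"), len(zurich))  # 6 7: "ü" takes two bytes in UTF-8
print(zurich.decode("utf-8"))      # Zürich
```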



Copyright © 2026, javapedia.net, all rights reserved.
