BigData / Data Lake Interview questions
What is the difference between ETL and ELT in the context of Data Lakes?
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two approaches to data integration that differ in where and when transformation happens. ELT has become the preferred pattern for cloud data lakes because these platforms combine inexpensive, scalable storage with elastic compute.
ETL (Traditional Approach): Data is extracted from sources, transformed in an intermediate processing layer (often on separate ETL servers), then loaded into the target system. ETL dominated when storage was expensive and target systems (data warehouses) had limited compute. Transformation logic runs outside the target platform.
ETL Characteristics:
- Transformation occurs before loading
- Requires separate ETL servers/tools
- Only transformed data reaches the target
- Slower initial load due to transformation overhead
- Difficult to reprocess with different logic (source data not preserved)
- Common tools: Informatica, Talend, SSIS, DataStage
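The ETL flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: the sample rows, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the target warehouse. The point it shows is that malformed rows are dropped before loading, so the target never sees them.

```python
import sqlite3

def extract():
    # Hypothetical source rows; a real pipeline would read from an operational system.
    return [
        {"order_id": 1, "amount": "19.99", "country": "us"},
        {"order_id": 2, "amount": "5.00", "country": "de"},
        {"order_id": 3, "amount": "bad", "country": "us"},
    ]

def transform(rows):
    # Transformation runs outside the target, before anything is loaded.
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # malformed row is discarded and never reaches the target
        clean.append((row["order_id"], amount, row["country"].upper()))
    return clean

def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT order_id, amount, country FROM orders ORDER BY order_id"
    ).fetchall()
```

Because only the cleaned tuples are stored, changing the transformation rule later (for example, keeping malformed rows for inspection) would require going back to the source system.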
ELT (Modern Cloud Approach): Raw data is extracted and immediately loaded into the data lake, then transformed using the lake's native compute engines (Spark, Presto, Athena). ELT leverages cloud scalability and preserves raw data for flexibility.
ELT Characteristics:
- Load raw data first, transform later
- Use data lake's compute for transformation (Spark, SQL engines)
- Preserves complete raw data history
- Faster initial ingestion
- Easy reprocessing with different transformation logic
- Fits medallion architecture: Bronze (raw load), Silver/Gold (transformation)
- Common tools: dbt, Databricks, Snowflake, BigQuery
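For contrast, here is a minimal ELT sketch under the same assumptions: the table names `bronze_orders` and `silver_orders`, the sample rows, and the function names are hypothetical, and in-memory SQLite stands in for the lake's SQL engine. Raw rows are loaded untouched first, and the transformation then runs as SQL inside the target.

```python
import sqlite3

# Hypothetical raw rows, kept as strings exactly as they arrived from the source.
RAW_ROWS = [
    ("1", "19.99", "us"),
    ("2", "5.00", "de"),
    ("3", "bad", "us"),
]

def load_raw(conn, rows):
    # Bronze layer: load first, with no cleaning or type casting.
    conn.execute(
        "CREATE TABLE bronze_orders (order_id TEXT, amount TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO bronze_orders VALUES (?, ?, ?)", rows)
    conn.commit()

def build_silver(conn):
    # Silver layer: transform inside the target using its own SQL engine.
    # The GLOB filter keeps only amounts made of digits and dots.
    conn.execute("DROP TABLE IF EXISTS silver_orders")
    conn.execute(
        """
        CREATE TABLE silver_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(amount AS REAL)      AS amount,
               UPPER(country)            AS country
        FROM bronze_orders
        WHERE amount <> '' AND amount NOT GLOB '*[^0-9.]*'
        """
    )
    conn.commit()

def run_elt():
    conn = sqlite3.connect(":memory:")
    load_raw(conn, RAW_ROWS)
    build_silver(conn)
    return conn
```

Since `bronze_orders` still holds all three raw rows, `build_silver` can be edited and rerun with different rules without touching the source system again.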
Why ELT Works for Data Lakes:
- Cheap Storage: Cloud object storage is inexpensive, so storing raw data is affordable
- Elastic Compute: Spin up massive compute clusters for transformation, pay only for usage
- Schema-on-Read: No need to define schemas before loading; structure is applied when the data is read or queried
- Reprocessability: Raw data enables rerunning transformations with new logic
- Parallel Processing: Distributed engines handle transformation at scale
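Schema-on-read and reprocessability can be illustrated with a small sketch. The event fields, the `RAW_EVENTS` sample, and the `read_with_schema` helper are all hypothetical; a real lake would use an engine such as Spark rather than plain Python. The raw JSON lines are stored as-is, and each reader decides which fields and types it wants at read time.

```python
import json

# Hypothetical raw events, stored unchanged as JSON lines.
RAW_EVENTS = [
    '{"user": "a", "amount": "10.5", "currency": "USD"}',
    '{"user": "b", "amount": "3", "currency": "EUR", "coupon": "X1"}',
]

def read_with_schema(lines, schema):
    """Apply a schema (field name -> cast function) at read time.

    Fields missing from a record become None; fields not in the schema are ignored.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        rows.append(
            {
                field: (cast(record[field]) if field in record else None)
                for field, cast in schema.items()
            }
        )
    return rows

# First version of the transformation only needs user and amount.
v1 = read_with_schema(RAW_EVENTS, {"user": str, "amount": float})

# Later requirements add the coupon field; the same raw data is simply reread.
v2 = read_with_schema(RAW_EVENTS, {"user": str, "amount": float, "coupon": str})
```

The second read needs no re-ingestion: the `coupon` field was always present in the raw data, it just was not part of the first schema.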
When to Use ETL vs ELT:
- Use ETL: Limited target storage, sensitive data requiring pre-load filtering, legacy systems, compliance restrictions on raw data storage
- Use ELT: Cloud data lakes, need for data exploration, changing requirements, auditable full history, leveraging cloud-native compute
Modern architectures often blend both: ELT for most workloads, ETL for specific sources requiring heavy transformation or data privacy controls.
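One way such a blend might look is a light pre-load step that handles only the sensitive fields, leaving everything else raw for later ELT. The sketch below is an assumption-laden illustration: the `SENSITIVE_FIELDS` set, the `pre_load_filter` helper, and the hard-coded salt are hypothetical, and salted hashing is pseudonymization rather than full anonymization, so actual compliance requirements may call for something stronger.

```python
import hashlib

SENSITIVE_FIELDS = {"email"}  # hypothetical list of fields that must not land raw

def pseudonymize(value, salt):
    # Salted SHA-256 digest; deterministic, so joins on the field still work.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

def pre_load_filter(record, salt="demo-salt"):
    # ETL-style step: transform only sensitive fields before loading.
    # All other fields pass through untouched for later in-lake transformation.
    return {
        key: pseudonymize(value, salt) if key in SENSITIVE_FIELDS else value
        for key, value in record.items()
    }

masked = pre_load_filter({"email": "a@example.com", "amount": "10"})
```

In a real deployment the salt would come from a secret store rather than a default argument.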
