BigData / Data pipeline interview questions

Difference between Data pipeline and ETL pipeline.

Data Pipelines and ETL Pipelines both refer to processes for moving data from one system to another, but they are not the same thing. Below are three key differences:

Data Pipeline Is an Umbrella Term of Which ETL Pipelines Are a Subset. An ETL Pipeline ends with loading the data into a database or data warehouse. A Data Pipeline does not always end with a load step: the load can instead trigger further processes and flows, for example by calling webhooks in other systems.
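
To make the first difference concrete, here is a minimal sketch in Python. The function names, the in-memory lists standing in for a warehouse, and the `on_loaded` callbacks standing in for webhooks are all assumptions made for illustration; a real pipeline would write to a database and call other systems over the network.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def etl_pipeline(source: List[Record], warehouse: List[Record]) -> None:
    """Extract, transform, load. The run is finished once the data is loaded."""
    for record in source:                                    # extract
        cleaned = {**record, "name": str(record["name"]).strip().title()}  # transform
        warehouse.append(cleaned)                            # load (final step)


def data_pipeline(
    source: List[Record],
    sink: List[Record],
    on_loaded: List[Callable[[Record], None]],
) -> None:
    """Move data, then notify downstream systems after each load.

    The callbacks are a hypothetical stand-in for webhooks.
    """
    for record in source:
        sink.append(record)          # load
        for hook in on_loaded:       # the load triggers further processing
            hook(record)
```

In the second function the load is not the end of the flow: each stored record is handed on to whatever is registered in `on_loaded`.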

ETL Pipelines Always Involve Transformation. ETL is a series of processes extracting data from a source, transforming it, and then loading it into the output destination. Data Pipelines also involve moving data between different systems but do not necessarily include transforming it.
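
The role of the transform step can be sketched as follows. This is an illustrative example only: the sample CSV data, the `amount_cents` field, and the function names are assumptions, and a plain list stands in for the output destination.

```python
import csv
import io
from typing import Dict, List

# Hypothetical source data for the example.
RAW_CSV = "id,amount\n1,10.50\n2,3.25\n"


def extract(text: str) -> List[Dict[str, str]]:
    """Read rows from the source; every value is still a string."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, int]]:
    """Convert types and units before loading."""
    return [
        {"id": int(r["id"]), "amount_cents": round(float(r["amount"]) * 100)}
        for r in rows
    ]


def load(rows: list, destination: list) -> None:
    """Append rows to the destination (a list standing in for a table)."""
    destination.extend(rows)


def run_etl(text: str, destination: list) -> None:
    """ETL pipeline: the transform step is always present."""
    load(transform(extract(text)), destination)


def run_copy(text: str, destination: list) -> None:
    """Data pipeline without transformation: rows are moved as they are."""
    load(extract(text), destination)
```

Both functions move the same rows, but only `run_etl` changes their shape on the way; `run_copy` delivers the raw string values unchanged.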

ETL Pipelines Usually Run in Batches While Data Pipelines Often Run in Real Time. ETL Pipelines usually run in batches, where data is moved in chunks on a regular schedule. For example, the pipeline might run twice per day, or at a set time when general system traffic is low. Data Pipelines often run as a real-time process with streaming computation, meaning that the data is updated continuously.
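
The contrast between batch and streaming processing can be sketched in a few lines of Python. The functions below are hypothetical and use plain integers as events; in practice a scheduler would start the batch job and a message queue or similar source would feed the stream.

```python
from typing import Iterable, Iterator, List


def run_batch(events: List[int], chunk_size: int) -> List[int]:
    """Process already-collected events in fixed-size chunks.

    Returns one total per chunk, as a scheduled job might produce.
    """
    totals = []
    for start in range(0, len(events), chunk_size):
        chunk = events[start:start + chunk_size]
        totals.append(sum(chunk))
    return totals


def run_streaming(events: Iterable[int]) -> Iterator[int]:
    """Update a running total as each event arrives and emit it immediately."""
    running = 0
    for event in events:
        running += event
        yield running
```

The batch version needs the whole list before it starts and reports once per chunk. The streaming version consumes any iterable lazily and produces an updated result after every event.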

