
Apache NiFi Interview Questions

What is the difference between GetFile and ListFile + FetchFile processors?

Both approaches ingest files from a local filesystem, but they differ in architecture, parallelism, and operational characteristics.

GetFile: The older, simpler, single-processor approach. It polls a directory, picks up files matching the configured File Filter, reads each one into a FlowFile, and by default deletes the source file afterwards (setting Keep Source File to true leaves it in place). Critical limitation: it has no coordination across cluster nodes. If every node runs GetFile against the same shared directory (for example, an NFS mount), two nodes can pick up the same file before either has removed it, which produces duplicate FlowFiles or read errors. On a cluster, GetFile should therefore run on the Primary Node only, so a single node does all of the reading.
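
The duplication risk can be illustrated with a minimal Python sketch. This is not NiFi code: list_and_read and delete_sources are hypothetical helpers that model one GetFile-style poll, a temporary directory stands in for a shared NFS mount, and the script forces one unlucky interleaving where two nodes both read a file before either deletes it.

```python
import os
import tempfile


def list_and_read(directory):
    """Hypothetical model of the first half of a GetFile-style poll:
    list the directory and read every file, with no cross-node coordination."""
    batch = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as f:
            batch.append((name, f.read()))
    return batch


def delete_sources(directory, batch):
    """Second half of the poll: delete the source files (GetFile's default).
    Returns the names this poller actually managed to delete."""
    deleted = []
    for name, _content in batch:
        try:
            os.remove(os.path.join(directory, name))
            deleted.append(name)
        except FileNotFoundError:
            pass  # another node already removed it, but its copy was already read
    return deleted


with tempfile.TemporaryDirectory() as shared_dir:  # stands in for an NFS mount
    with open(os.path.join(shared_dir, "orders.csv"), "wb") as f:
        f.write(b"id,amount\n1,9.99\n")

    # Unlucky interleaving: both nodes list and read before either deletes.
    node_a = list_and_read(shared_dir)
    node_b = list_and_read(shared_dir)
    delete_sources(shared_dir, node_a)
    delete_sources(shared_dir, node_b)

    ingested = node_a + node_b
    print(len(ingested))  # prints 2: the same file became two FlowFiles
```

Restricting GetFile to the Primary Node removes the second poller, which is why that setting avoids the duplicate at the cost of single-node throughput.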

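ListFile + FetchFile: The modern, recommended approach. ListFile scans the directory and emits one zero-content FlowFile per file found, carrying only metadata attributes (filename, path, absolute.path, file.size, file.lastModifiedTime, and others). It uses State Management to avoid re-listing: with timestamp-based tracking, it stores the latest last-modified timestamp it has listed and, on the next run, emits only files newer than that. FetchFile then reads the actual file content from disk (by default from ${absolute.path}/${filename}) and, according to its Completion Strategy, leaves, moves, or deletes the source file. This separation enables: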

  • Parallel fetching: multiple concurrent FetchFile tasks read files simultaneously
  • Clear separation of concerns: listing happens once; fetching can be retried independently per file
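  • Works correctly in clustered NiFi: for a shared directory, ListFile runs on the Primary Node only, and a load-balanced connection (for example, Round Robin) distributes the listing FlowFiles so FetchFile runs on every node with no Primary Node restriction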
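
The list/fetch split can be modelled with a short Python sketch, assuming a simplified version of timestamp tracking. It is not NiFi's actual listing algorithm. ListingState, list_files, round_robin, and fetch_file are hypothetical names, the "FlowFiles" are plain dicts, and mtime_ns is a sketch-only attribute. The state keeps the newest modification time plus the names listed at exactly that time, so files sharing a timestamp are neither skipped nor re-listed. round_robin is a simplified stand-in for a Round Robin load-balanced connection.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class ListingState:
    """Hypothetical stand-in for ListFile's managed state: the newest
    modification time listed so far and the names listed at exactly that time."""
    latest_mtime_ns: int = -1
    names_at_latest: set = field(default_factory=set)


def list_files(directory, state):
    """Simplified timestamp tracking: return metadata-only 'FlowFiles' (dicts)
    for files newer than the stored timestamp, or tied with it but unseen."""
    listings = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            info = entry.stat()
            mtime = info.st_mtime_ns
            is_new = mtime > state.latest_mtime_ns or (
                mtime == state.latest_mtime_ns
                and entry.name not in state.names_at_latest
            )
            if is_new:
                listings.append({
                    "filename": entry.name,
                    "absolute.path": directory,
                    "file.size": info.st_size,
                    "mtime_ns": mtime,  # sketch-only attribute used for state
                })
    listings.sort(key=lambda item: item["filename"])

    # Advance the state only after the whole listing has been built.
    if listings:
        newest = max(item["mtime_ns"] for item in listings)
        tied = {item["filename"] for item in listings if item["mtime_ns"] == newest}
        if newest == state.latest_mtime_ns:
            state.names_at_latest |= tied
        else:
            state.latest_mtime_ns, state.names_at_latest = newest, tied
    return listings


def round_robin(listings, node_count):
    """Simplified Round Robin connection: listing i goes to node i % node_count."""
    queues = [[] for _ in range(node_count)]
    for i, item in enumerate(listings):
        queues[i % node_count].append(item)
    return queues


def fetch_file(listing):
    """FetchFile-style step: read one file's content. A missing file is routed
    to 'not.found' on its own, without affecting any other listing."""
    path = os.path.join(listing["absolute.path"], listing["filename"])
    try:
        with open(path, "rb") as f:
            return "success", f.read()
    except FileNotFoundError:
        return "not.found", None


with tempfile.TemporaryDirectory() as input_dir:
    for name in ("a.csv", "b.csv", "c.csv"):
        path = os.path.join(input_dir, name)
        with open(path, "w") as f:
            f.write(name)
        os.utime(path, (10, 10))  # identical mtimes exercise the tie set

    state = ListingState()
    first = list_files(input_dir, state)   # a.csv, b.csv, c.csv
    second = list_files(input_dir, state)  # nothing new: empty list

    new_path = os.path.join(input_dir, "d.csv")
    with open(new_path, "w") as f:
        f.write("d.csv")
    os.utime(new_path, (20, 20))
    third = list_files(input_dir, state)   # only d.csv

    # Listing happened once; the connection spreads the fetch work over 2 nodes.
    queues = round_robin(first, node_count=2)
    os.remove(os.path.join(input_dir, "b.csv"))  # b.csv vanishes before fetch
    results = [fetch_file(item) for queue in queues for item in queue]
    print([relationship for relationship, _ in results])
    # prints ['success', 'success', 'not.found']
```

The model also shows the trade-off of timestamp tracking: a file that arrives with a modification time older than the stored timestamp (for example, copied with its original timestamp preserved) is never listed.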

GetFile remains appropriate for simple single-node use cases. For new development and cluster deployments, ListFile + FetchFile is preferred.

Why is GetFile not recommended for use with multiple concurrent tasks in a NiFi cluster?
What mechanism does ListFile use to avoid re-listing files it has already seen on subsequent runs?


More related questions:

  • What is Apache NiFi and what problem does it solve?
  • What is a FlowFile in Apache NiFi?
  • What are the three NiFi repositories and what does each store?
  • What is a Processor in Apache NiFi and what are the main processor categories?
  • What is a Connection in NiFi and how does back-pressure work?
  • What is NiFi Expression Language and where can it be used?
  • What is data provenance in Apache NiFi and how do you access it?
  • What is a Process Group in NiFi and why is it used?
  • What is NiFi Registry and how does it integrate with NiFi?
  • How does NiFi clustering work and what is the role of ZooKeeper?
  • What is a Controller Service in NiFi and how is it different from a Processor?
  • What is the GenerateTableFetch and QueryDatabaseTable pattern for incremental database ingestion?
  • What is the Record-based processing model in NiFi and why is it preferred?
  • What is State Management in NiFi and what types of state scope exist?
  • What is NiFi Site-to-Site (S2S) and when do you use it?
  • What is NiFi and how does it relate to Apache NiFi?
  • What is NiFi Parameter Context and how does it differ from Variables?
  • How does NiFi handle security: TLS, authentication, and authorization?
  • What is the NiFi NAR (NiFi Archive) classloading model?
  • What are Reporting Tasks in NiFi and what are common use cases?
  • How do you handle errors and failures in a NiFi flow?
  • What is the SplitText processor and how do you control split behavior?
  • What is the MergeContent processor and how is it used?
  • What is the InvokeHTTP processor and what are key configuration considerations?
  • What is the PublishKafka and ConsumeKafka processor pair and what are key configuration options?
  • What is the ExecuteScript processor and what scripting languages does it support?
  • What is the JoltTransformJSON processor and how do you use it?
  • What is the PutDatabaseRecord processor and how does it differ from ExecuteSQL?
  • What is the ListSFTP and FetchSFTP processor pattern and how does it work?
  • What is the LookupRecord processor used for?
  • What is the PartitionRecord processor and what is a common use case?
  • What is the ConvertRecord processor and how is it used for format conversion?
  • What are the NiFi processor scheduling strategies?
  • What is the difference between EvaluateJsonPath and FlattenJson processors?
  • How does NiFi integrate with Apache Hadoop and HDFS?
  • What is the UpdateAttribute processor and how is its Advanced Mode used?
  • How do you implement deduplication in a NiFi flow?
  • What is the HandleHttpRequest and HandleHttpResponse processor pair used for?
  • How does NiFi achieve guaranteed delivery and what are its durability guarantees?
  • What is the Funnel component in NiFi and when do you use it?
  • What is the difference between GetFile and ListFile + FetchFile processors?
  • How does NiFi support schema evolution in data pipelines?
  • What is the RouteText processor and how does it differ from RouteOnContent?
  • What performance tuning options are available in NiFi and what are common bottleneck patterns?
  • How does NiFi integrate with cloud storage services like Amazon S3?