Integration / Apache NiFi Interview Questions

What is the MergeContent processor and how is it used?

MergeContent is a NiFi processor that combines multiple FlowFiles into a single FlowFile. It is the counterpart to split processors such as SplitText and SplitJson, and together they enable a scatter-gather pattern: split a large FlowFile into pieces for parallel processing, then merge the results back together.

MergeContent supports two merge strategies:

Defragment: Reassembles fragments produced by a split operation. It groups FlowFiles by the fragment.identifier attribute, waits until the number of fragments given by fragment.count has arrived for that identifier, and then merges them in fragment.index order. This mode requires the fragment attributes to be present on every incoming FlowFile.

Bin-Packing Algorithm: Collects FlowFiles into bins and merges a bin when one of several triggers fires: the minimum and maximum FlowFile count, the minimum and maximum bin size in bytes, or a maximum wait time. It is used for batching many small FlowFiles into a larger one for efficient downstream writing (e.g., batching records before writing to S3 as Parquet).
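The two strategies above can be sketched as a toy model in Python. This is not NiFi's implementation: the `FlowFile` class and the `defragment` and `bin_pack` functions are hypothetical names invented for illustration, the bin-packing rule is a simple greedy approximation, and the time-based trigger is left out.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a NiFi FlowFile: byte content plus string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def defragment(flowfiles):
    """Merge each fragment group only once all of its pieces are present."""
    groups = {}
    for ff in flowfiles:
        groups.setdefault(ff.attributes["fragment.identifier"], []).append(ff)

    merged = {}
    for identifier, fragments in groups.items():
        expected = int(fragments[0].attributes["fragment.count"])
        if len(fragments) < expected:
            continue  # incomplete group: keep waiting instead of merging
        fragments.sort(key=lambda ff: int(ff.attributes["fragment.index"]))
        merged[identifier] = b"".join(ff.content for ff in fragments)
    return merged


def bin_pack(flowfiles, min_entries=1, max_entries=1000, max_bytes=None):
    """Greedy model: close a bin when it is full by count or would exceed max_bytes."""
    bins, current, size = [], [], 0
    for ff in flowfiles:
        too_big = (max_bytes is not None and bool(current)
                   and size + len(ff.content) > max_bytes)
        if too_big or len(current) >= max_entries:
            bins.append(current)
            current, size = [], 0
        current.append(ff)
        size += len(ff.content)
    # The leftover bin is merged only if it meets the minimum entry count.
    # A real flow would also flush it after a maximum wait time (not modeled).
    if current and len(current) >= min_entries:
        bins.append(current)
    return [b"".join(ff.content for ff in b) for b in bins]


# Fragments arrive out of order; group "f2" is still missing a piece.
parts = [
    FlowFile(b"c", {"fragment.identifier": "f1", "fragment.index": "3", "fragment.count": "3"}),
    FlowFile(b"a", {"fragment.identifier": "f1", "fragment.index": "1", "fragment.count": "3"}),
    FlowFile(b"b", {"fragment.identifier": "f1", "fragment.index": "2", "fragment.count": "3"}),
    FlowFile(b"x", {"fragment.identifier": "f2", "fragment.index": "1", "fragment.count": "2"}),
]
print(defragment(parts))  # {'f1': b'abc'}

small = [FlowFile(b"r%d\n" % i) for i in range(5)]
print(bin_pack(small, max_entries=2))  # [b'r0\nr1\n', b'r2\nr3\n', b'r4\n']
```

In the model, the incomplete group `f2` is never emitted, which mirrors the "wait until all fragments have arrived" behavior. The bin-packing half shows why the thresholds matter: lowering `max_entries` or `max_bytes` produces more, smaller merged outputs.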

Output format options include Binary Concatenation (concatenate raw content), TAR (create a TAR archive), ZIP (create a ZIP archive), and FlowFile Stream, v3 (NiFi's internal format, which preserves all attributes of each constituent FlowFile). The FlowFile Stream format pairs with UnpackContent, which can later unpack the merged FlowFile back into individual FlowFiles with their attributes intact.
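To make the difference between these formats concrete, the following sketch uses only Python's standard library to imitate three of them on in-memory data. It is an illustration, not NiFi code: the `flowfiles` list and the `merge_*` and `unpack_tar` helpers are hypothetical, and the FlowFile Stream format is NiFi-specific and not reproduced here.

```python
import io
import tarfile
import zipfile

# Hypothetical inputs: (filename, content) pairs standing in for FlowFiles.
flowfiles = [("a.txt", b"alpha\n"), ("b.txt", b"beta\n")]


def merge_concat(items, demarcator=b""):
    """Binary concatenation: contents are joined and entry boundaries are lost."""
    return demarcator.join(content for _, content in items)


def merge_tar(items):
    """TAR: each input becomes a named archive entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in items:
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()


def merge_zip(items):
    """ZIP: each input becomes a named archive entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w") as zf:
        for name, content in items:
            zf.writestr(name, content)
    return buf.getvalue()


def unpack_tar(data):
    """Recover the individual (name, content) pairs from a TAR archive."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r") as tar:
        return [(m.name, tar.extractfile(m).read()) for m in tar.getmembers()]


print(merge_concat(flowfiles))           # b'alpha\nbeta\n'
print(unpack_tar(merge_tar(flowfiles)))  # [('a.txt', b'alpha\n'), ('b.txt', b'beta\n')]
```

Concatenation yields one undifferentiated byte stream, while the archive formats keep each entry separable by name. Neither archive format in this sketch carries arbitrary per-FlowFile attributes, which is the gap the FlowFile Stream format is described as filling.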

In the Defragment merge strategy, what attribute pair does MergeContent use to know when all fragments of a split have arrived?
Which merge output format preserves all FlowFile attributes when packing multiple FlowFiles so they can later be individually restored by UnpackContent?
