Integration / Apache NiFi Interview Questions

What is the Record-based processing model in NiFi and why is it preferred?

NiFi's record-based processing model treats FlowFile content as a structured stream of records rather than an opaque blob. A record is one logical row: one JSON object, one CSV row, one Avro record, or one database row. Record-aware processors operate on the individual records within a FlowFile, which enables format-agnostic transformations.

The model relies on three Controller Service types:

RecordReader: Parses the FlowFile content and produces a stream of records. Implementations include JsonTreeReader, CSVReader, AvroReader, ParquetReader, XMLReader, and GrokReader (for unstructured log parsing).

RecordWriter: Serializes records back to bytes. Implementations include JsonRecordSetWriter, CSVRecordSetWriter, AvroRecordSetWriter, and ParquetRecordSetWriter.

Schema Registry: Optionally provides Avro schemas that readers and writers use to interpret and validate records. NiFi includes an embedded AvroSchemaRegistry.
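To make the schema's role concrete, here is a minimal Python sketch of checking records against an Avro-style schema. It is an illustration only, not NiFi's implementation: the schema text, the `User` record, its field names, and the `conforms` helper are all hypothetical, and only a few Avro primitive types are handled.

```python
import json

# Hypothetical Avro schema text, of the kind that could be registered under a
# name in a schema registry. Record and field names are made up for this sketch.
SCHEMA_TEXT = """
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "active", "type": "boolean"}
  ]
}
"""

# Only a handful of Avro primitive types are mapped here.
AVRO_TO_PYTHON = {"int": int, "long": int, "string": str, "boolean": bool, "double": float}


def conforms(record, schema):
    """Return True if the record has exactly the schema's fields with matching types."""
    fields = schema["fields"]
    if set(record) != {f["name"] for f in fields}:
        return False
    for f in fields:
        value = record[f["name"]]
        expected = AVRO_TO_PYTHON[f["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if expected is not bool and isinstance(value, bool):
            return False
        if not isinstance(value, expected):
            return False
    return True


schema = json.loads(SCHEMA_TEXT)
good = {"id": 1, "name": "Ada", "active": True}
bad = {"id": "1", "name": "Ada", "active": True}  # id is a string, not a long
```

With these definitions, `conforms(good, schema)` is true and `conforms(bad, schema)` is false. A reader or writer that shares one schema can interpret every record in a FlowFile the same way.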

Key record-aware processors: ConvertRecord (format conversion), QueryRecord (apply SQL SELECT against FlowFile records using Apache Calcite), LookupRecord (enrich records from external sources), UpdateRecord (modify field values), and PartitionRecord (group records into separate FlowFiles by the value of one or more fields).
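As a rough analogy for two of these processors, the Python sketch below runs a SQL SELECT over a set of records and then groups the same records by a field value. It assumes SQLite stands in for Apache Calcite, which is not what NiFi uses. The sample records and the `partition_records` helper are hypothetical. The table name `FLOWFILE` mirrors the name QueryRecord exposes for the incoming records.

```python
import sqlite3
from collections import defaultdict

# Hypothetical records, as a reader might produce them from one FlowFile.
records = [
    {"id": 1, "name": "Ada", "country": "US"},
    {"id": 2, "name": "Linus", "country": "FI"},
    {"id": 3, "name": "Grace", "country": "US"},
]

# QueryRecord-style: treat the records as a table and run a SELECT over them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FLOWFILE (id INTEGER, name TEXT, country TEXT)")
conn.executemany("INSERT INTO FLOWFILE VALUES (:id, :name, :country)", records)
us_rows = conn.execute(
    "SELECT id, name FROM FLOWFILE WHERE country = 'US' ORDER BY id"
).fetchall()
conn.close()


# PartitionRecord-style: one group of records per distinct field value.
def partition_records(recs, field):
    groups = defaultdict(list)
    for rec in recs:
        groups[rec[field]].append(rec)
    return dict(groups)


partitions = partition_records(records, "country")
```

Here `us_rows` holds the two US rows, and `partitions` has one entry per country. In NiFi each such group would become its own outgoing FlowFile.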

The key advantage is format independence: changing from JSON to CSV input requires only swapping the RecordReader Controller Service, with no changes to processor logic. The model also avoids materializing entire FlowFiles in memory by streaming records one at a time.
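The idea of swapping only the reader can be sketched in plain Python with generators. This is a conceptual model, not NiFi code: `read_json_lines`, `read_csv`, `write_json_lines`, and `convert` are hypothetical names, and the JSON reader assumes newline-delimited JSON for simplicity.

```python
import csv
import io
import json


def read_json_lines(stream):
    """Yield one record per non-empty line of newline-delimited JSON."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


def read_csv(stream):
    """Yield one record per CSV row, taking field names from the header row."""
    yield from csv.DictReader(stream)


def write_json_lines(records, out):
    """Serialize records one at a time, never holding the full set in memory."""
    for record in records:
        out.write(json.dumps(record) + "\n")


def convert(stream, reader, out):
    """The processing logic: identical whichever reader is plugged in."""
    write_json_lines(reader(stream), out)


json_input = io.StringIO('{"id": "1", "name": "Ada"}\n{"id": "2", "name": "Grace"}\n')
csv_input = io.StringIO("id,name\n1,Ada\n2,Grace\n")

from_json, from_csv = io.StringIO(), io.StringIO()
convert(json_input, read_json_lines, from_json)
convert(csv_input, read_csv, from_csv)
```

Both calls to `convert` produce the same output even though the inputs differ in format. Because the readers are generators, each record is parsed, written, and released before the next one is read.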

How would you change a record-based NiFi flow from processing JSON input to processing CSV input?
Which record-aware processor allows you to apply SQL SELECT statements against the records within a FlowFile?
