Database / CouchDB Interview Questions

How does CouchDB handle large document sets — what are the performance trade-offs of large vs many small documents?

CouchDB enforces a configurable per-document size limit, max_document_size. It defaults to 8 MB (8,000,000 bytes) since CouchDB 3.0; 2.x releases defaulted to 4 GB. Well below that limit, though, the performance trade-offs between storing data as a small number of large documents versus many small documents are significant.

Large documents (e.g., one document per entity with thousands of nested items):

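  • Every update rewrites the whole document, even if only one nested field changed. Because the database file is append-only, each rewrite appends a full new copy. This amplifies write I/O, lengthens the revision history, and grows the database file until compaction runs.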
  • Replication transfers the entire document body on every change. For a 5MB document that changes frequently, this saturates replication bandwidth quickly.
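  • MVCC conflicts are more likely, because any two concurrent edits to the same document collide even when they touch unrelated fields. They are also more costly to resolve: CouchDB does not merge conflicting revisions, so the application must fetch and compare the full bodies.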
  • Reading the document always loads the full JSON, even if only one field is needed (CouchDB has no projection at the storage layer — Mango fields projection happens after the document is loaded).
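
The write-amplification point in the list above can be made concrete with a rough sketch. It assumes the bytes sent per update are approximately the size of the serialized JSON body, ignoring HTTP and on-disk overhead. The document shapes, IDs, and function names are hypothetical, and nothing here talks to a real CouchDB server.

```python
import json


def body_size(doc: dict) -> int:
    """Approximate bytes sent in a PUT: the UTF-8 encoded JSON body."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def make_large_doc(n_items: int) -> dict:
    """Hypothetical entity document with n_items nested items."""
    return {
        "_id": "entity:1",
        "type": "entity",
        "items": [{"sku": f"sku-{i:06d}", "qty": 1} for i in range(n_items)],
    }


def make_small_doc(i: int) -> dict:
    """Hypothetical per-item document, one per nested item."""
    return {
        "_id": f"item:1:{i:06d}",
        "type": "item",
        "entity": "entity:1",
        "sku": f"sku-{i:06d}",
        "qty": 1,
    }


large = make_large_doc(5000)
small = make_small_doc(42)

# Changing one quantity still means re-sending the whole large body,
# because a CouchDB update replaces the full document.
large["items"][42]["qty"] = 2
small["qty"] = 2

large_bytes = body_size(large)
small_bytes = body_size(small)
print(f"large doc update: {large_bytes} bytes")
print(f"small doc update: {small_bytes} bytes")
print(f"amplification: ~{large_bytes // small_bytes}x")
```

The same ratio applies to replication, since the replicator also ships the full body of every changed document.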

Many small documents (one document per event/record):

  • Updates are small and cheap; conflicts affect only the specific document touched.
  • Replication is incremental — only changed documents are transferred.
  • Mango and view queries can filter and paginate efficiently.
  • The trade-off: each document has overhead (~200 bytes for metadata). A database with 100 million tiny 50-byte documents will have metadata overhead larger than the data itself.
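
The overhead claim in the last bullet is simple arithmetic, sketched below. The 200-byte default is the approximate figure quoted above, not a measured constant, and the function name is hypothetical.

```python
def storage_estimate(doc_count: int, payload_bytes: int,
                     overhead_bytes: int = 200) -> dict:
    """Estimate total payload vs per-document metadata.

    overhead_bytes defaults to the ~200 byte figure quoted in the text;
    treat it as an assumption and measure your own database.
    """
    payload_total = doc_count * payload_bytes
    overhead_total = doc_count * overhead_bytes
    return {
        "payload_gb": payload_total / 1e9,
        "overhead_gb": overhead_total / 1e9,
        "overhead_ratio": overhead_total / payload_total,
    }


estimate = storage_estimate(doc_count=100_000_000, payload_bytes=50)
print(estimate)
```

Under these assumptions, 100 million 50-byte documents carry about 5 GB of payload and about 20 GB of metadata, so the overhead is four times the data.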

The recommended pattern: keep documents to a natural entity size (an order with its line items, not an order with the entire customer history). Avoid designs that require updating a single document at a rate faster than ~100 writes/second — high-frequency counters belong in Redis, not in a CouchDB document.
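
As a minimal sketch of the "natural entity size" pattern, the function below splits a hypothetical customer-with-history document into one customer document plus one document per order, keeping line items embedded in their order. The field names and the _id scheme are illustrative assumptions. The resulting payload has the {"docs": [...]} shape that the _bulk_docs endpoint accepts, but the example does not send it anywhere.

```python
import json


def split_customer(customer: dict) -> list:
    """Split a customer-with-history document into entity-sized documents."""
    customer_id = customer["_id"]
    docs = [{
        "_id": customer_id,
        "type": "customer",
        "name": customer["name"],
    }]
    for order in customer["orders"]:
        docs.append({
            "_id": f"order:{order['number']}",
            "type": "order",
            "customer_id": customer_id,
            # Line items stay embedded: they are part of the order entity.
            "line_items": order["line_items"],
        })
    return docs


# Hypothetical oversized document: one customer with all orders nested inside.
oversized = {
    "_id": "customer:42",
    "name": "Ada",
    "orders": [
        {"number": 1001, "line_items": [{"sku": "A", "qty": 2}]},
        {"number": 1002, "line_items": [{"sku": "B", "qty": 1},
                                        {"sku": "C", "qty": 5}]},
    ],
}

docs = split_customer(oversized)
payload = json.dumps({"docs": docs})  # request body for POST /{db}/_bulk_docs
print([d["_id"] for d in docs])
```

With this layout, adding a new order creates a new small document instead of rewriting the customer document, and a view or Mango index on customer_id can bring a customer's orders back together at query time.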

Why is storing frequently-updated data as one large CouchDB document problematic for replication?
What is the primary trade-off when splitting data into millions of tiny CouchDB documents?
