
Database / CouchDB Interview Questions

1. What is Apache CouchDB and what makes it different from relational databases?
2. What data model does CouchDB use and how is a document structured?
3. What is the CouchDB HTTP REST API and how do you perform basic CRUD operations?
4. What is MVCC (Multi-Version Concurrency Control) in CouchDB and how does it handle write conflicts?
5. What is the _rev field in CouchDB and why is it required for updates and deletes?
6. What is the CouchDB storage engine (B-tree) and how does its append-only write work?
7. What is database compaction in CouchDB and when should you run it?
8. What are CouchDB attachments and when would you use them?
9. What is the difference between CouchDB and Couchbase?
10. What are the CAP theorem trade-offs for CouchDB — is it CP or AP?
11. What are CouchDB design documents and what do they contain?
12. What are MapReduce views in CouchDB and how do you define a map function?
13. How does the reduce function work in CouchDB views and what are the built-in reduce functions?
14. What are view indexes in CouchDB and how are they built and updated, including stale options?
15. What is the Mango query language in CouchDB and how does it differ from MapReduce views?
16. How do you create and use a Mango index in CouchDB (json and text indexes)?
17. What are the query operators available in the Mango selector syntax?
18. What is the _all_docs endpoint in CouchDB and how does it differ from a custom view?
19. How do you paginate results in CouchDB views using startkey, endkey, and skip/limit?
20. What is a list function in CouchDB and when would you use it?
21. How does CouchDB replication work and what is the replication protocol?
22. What is the difference between one-shot and continuous replication in CouchDB?
23. What is filtered replication in CouchDB and how do you implement it?
24. What is CouchDB Cluster mode (CouchDB 2.x+) and how does it differ from single-node CouchDB 1.x?
25. How does CouchDB cluster sharding work — what are the Q, n, r, and w parameters?
26. What is the _node and _cluster_setup API used for in CouchDB clustering?
27. How does CouchDB handle replication conflicts and what strategies exist to resolve them?
28. What is the CouchDB winning revision algorithm for conflict resolution?
29. What is PouchDB and how does it enable offline-first applications with CouchDB sync?
30. What is Couchbase Sync Gateway and how does it relate to CouchDB's replication model?
31. How does CouchDB implement authentication — cookie auth, JWT, and proxy auth?
32. What is CouchDB's permission model — admin party, database admins, and database readers?
33. How do you implement document-level security in CouchDB using validate_doc_update functions?
34. What is a CouchDB _security object and how do you configure roles and members?
35. How do you enable SSL/TLS in CouchDB and what configuration is required?
36. How do you monitor CouchDB performance using the _stats and _active_tasks endpoints?
37. What are the key CouchDB configuration parameters to tune for production (max_dbs_open, os_process_limit, etc.)?
38. How does CouchDB handle large document sets — what are the performance trade-offs of large vs many small documents?
39. What is the CouchDB _changes feed and how do you use it for real-time event streaming?
40. What are CouchDB update handlers and how do they differ from direct PUT operations?
41. What are CouchDB show functions and when were they deprecated?
42. How do you back up and restore a CouchDB database?
43. How does CouchDB compare to MongoDB for document storage use cases?
44. What are common CouchDB anti-patterns and how do you avoid them?
45. How do you migrate data between CouchDB versions or instances?

1. What is Apache CouchDB and what makes it different from relational databases?

Apache CouchDB is an open-source NoSQL document database that stores all data as self-contained JSON documents and exposes its entire API over plain HTTP/HTTPS. No proprietary wire protocol or special client driver is required. It was created by Damien Katz, open-sourced in 2005, and graduated to an Apache Software Foundation top-level project in 2008. The 2.x line introduced a native clustered mode built on the existing Erlang/OTP foundation.

Three properties set CouchDB apart from relational databases:

  • Schema-free JSON documents — there are no tables or fixed column definitions. Each document in a database can have a completely different structure, and adding a field to one document has zero effect on any other.
  • HTTP as the primary interface — every operation (CRUD, querying, replication, admin) is a plain HTTP request. You can interact with CouchDB using curl, a browser, or any HTTP library without installing a database driver.
  • Built-in, protocol-level replication — CouchDB's replication is a peer-to-peer HTTP protocol where any node can replicate to or from any other node, supporting master-master setups, offline sync, and mobile clients natively.

Relational databases enforce ACID across multi-row, multi-table transactions using row-level locks and a shared transaction log. CouchDB provides ACID at the single-document level through MVCC and an append-only B-tree engine. There are no JOINs; relationships are denormalized or referenced by document ID.

Which primary interface does CouchDB use for all database operations, including replication?
At what granularity does CouchDB guarantee ACID properties?
2. What data model does CouchDB use and how is a document structured?

CouchDB uses a document data model: all data is stored as discrete JSON objects grouped into databases. There is no enforced schema — documents in the same database can have entirely different fields.

Every CouchDB document has two mandatory system fields:

  • _id — the unique primary key. If omitted, CouchDB generates a UUID. Typed prefix conventions like "order:2024-001" co-locate related documents in the B-tree.
  • _rev — the current revision token in the format {generation}-{md5hash}. You must supply the current _rev on every update or delete.

{
  "_id": "order:2024-00188",
  "_rev": "2-7d3a9f012b4e8c56ab1d2ef3",
  "type": "order",
  "customer_id": "user:42",
  "items": [
    { "sku": "WIDGET-01", "qty": 3, "price": 9.99 },
    { "sku": "GADGET-07", "qty": 1, "price": 49.00 }
  ],
  "total": 78.97,
  "status": "pending",
  "created_at": "2024-03-15T08:22:00Z",
  "_attachments": {
    "invoice.pdf": {
      "content_type": "application/pdf",
      "length": 48312,
      "stub": true
    }
  }
}

Values can be any valid JSON type: strings, numbers, booleans, arrays, or nested objects. Binary data is stored as attachments under the reserved _attachments key. Special system documents — design documents (_design/) and local documents (_local/) — live in the same database namespace but are handled differently by the replication engine.
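
The typed-prefix _id convention mentioned above can be illustrated with a short sketch. The comparison mirrors the range scan you would request with `_all_docs?startkey="order:"&endkey="order:\ufff0"`; the `\ufff0` high-codepoint sentinel is a common convention for "everything starting with this prefix", not a special CouchDB feature:

```javascript
// Because _all_docs is sorted by _id, documents sharing a prefix sit next to
// each other, and a startkey/endkey pair selects exactly that prefix range.
// This is plain string comparison, approximating the B-tree range scan:
const ids = ["customer:7", "order:2024-001", "order:2024-002", "user:42"];
const orders = ids.filter((id) => id >= "order:" && id <= "order:\ufff0");
// orders -> ["order:2024-001", "order:2024-002"]
```

The same startkey/endkey idea is how view pagination works (see Q19); the prefix convention simply lets the primary index double as a cheap "by type" query.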

What format does a CouchDB _rev field follow?
What are _design/ documents in CouchDB used for?
3. What is the CouchDB HTTP REST API and how do you perform basic CRUD operations?

CouchDB maps every database operation to a standard HTTP method and URL. The server root is typically http://localhost:5984. No driver installation is required — curl or any HTTP client works directly.

# Create a database
curl -X PUT http://admin:pass@localhost:5984/inventory
# {"ok":true}

# Create a document with a specific _id (PUT)
curl -X PUT http://admin:pass@localhost:5984/inventory/item:001 \
  -H "Content-Type: application/json" \
  -d '{"type":"item","name":"Widget","stock":150}'
# {"ok":true,"id":"item:001","rev":"1-3c6a8..."}

# Create a document with auto-generated _id (POST)
curl -X POST http://admin:pass@localhost:5984/inventory \
  -H "Content-Type: application/json" \
  -d '{"type":"item","name":"Gadget","stock":30}'

# Read a document
curl http://admin:pass@localhost:5984/inventory/item:001
# Returns JSON with _id and _rev

# Update — _rev is mandatory in the body
curl -X PUT http://admin:pass@localhost:5984/inventory/item:001 \
  -H "Content-Type: application/json" \
  -d '{"_rev":"1-3c6a8...","type":"item","name":"Widget","stock":200}'
# {"ok":true,"id":"item:001","rev":"2-9b4f1..."}

# Delete — pass _rev as query param
curl -X DELETE "http://admin:pass@localhost:5984/inventory/item:001?rev=2-9b4f1..."
# {"ok":true,"id":"item:001","rev":"3-d2e79..."}

# Bulk upsert
curl -X POST http://admin:pass@localhost:5984/inventory/_bulk_docs \
  -H "Content-Type: application/json" \
  -d '{"docs":[{"type":"item","name":"Part-A"},{"type":"item","name":"Part-B"}]}'

HTTP status codes follow REST conventions: 201 Created on success, 200 OK for reads, 404 Not Found, and 409 Conflict when the supplied _rev does not match the server's current revision. The _bulk_docs endpoint accepts thousands of documents per request, dramatically reducing round-trips for bulk loads.

Which HTTP method and URL does CouchDB use to create a document with a caller-specified ID?
What HTTP status code does CouchDB return when a write fails because the supplied _rev is stale?
4. What is MVCC (Multi-Version Concurrency Control) in CouchDB and how does it handle write conflicts?

MVCC in CouchDB means every write produces a new immutable version of the document instead of modifying data in place. Readers always see a consistent snapshot from the moment they start reading; no read locks are acquired. The append-only B-tree storage engine keeps old revisions on disk until compaction removes them.

The conflict mechanism works like this: when two concurrent writers both read a document at revision 2-abc and both try to PUT with _rev: "2-abc", only the first writer to reach the storage engine succeeds. The second receives HTTP 409 Conflict immediately.

# Both clients read revision 2-abc...

# Client A succeeds:
curl -X PUT http://localhost:5984/db/doc1 \
  -d '{"_rev":"2-abc","value":10}'
# 201: {"ok":true,"rev":"3-xyz"}

# Client B fails — same _rev already superseded:
curl -X PUT http://localhost:5984/db/doc1 \
  -d '{"_rev":"2-abc","value":20}'
# 409: {"error":"conflict","reason":"Document update conflict."}

The standard resolution is a read-modify-write retry loop: on 409, re-read the document to get the latest _rev, apply the business logic to the fresh body, and retry the PUT. This is optimistic locking enforced at the protocol level.
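
A minimal sketch of that retry loop, assuming hypothetical fetchDoc/saveDoc helpers that wrap your HTTP client and expose the status code on a failed PUT:

```javascript
// Read-modify-write retry on 409 (sketch; fetchDoc(id) resolves to the current
// document, saveDoc(doc) resolves to {ok, rev} or rejects with err.status = 409
// when the supplied _rev is stale).
async function putWithRetry(fetchDoc, saveDoc, id, mutate, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const fresh = await fetchDoc(id);      // re-read to pick up the latest _rev
    const updated = mutate({ ...fresh });  // apply business logic to the fresh body
    try {
      return await saveDoc(updated);       // PUT including the current _rev
    } catch (err) {
      if (err.status !== 409) throw err;   // only a conflict is retryable
      // 409: another writer won the race; loop re-reads and tries again
    }
  }
  throw new Error("conflict retries exhausted for " + id);
}
```

The key detail is that the business logic runs against the freshly read body on every attempt, never against the stale copy that lost the race.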

In multi-master replication scenarios, two nodes can independently accept writes to the same document. These produce open conflicts stored as sibling revisions, visible via ?conflicts=true. The application must explicitly merge or discard the losing revision to resolve them (see Q27).

In CouchDB's MVCC model, what happens to the previous document revision on a write?
What is the recommended strategy when a CouchDB PUT returns HTTP 409?
5. What is the _rev field in CouchDB and why is it required for updates and deletes?

The _rev field is CouchDB's revision token — a unique identifier for a specific version of a document. Its format is {generation}-{hash} where generation is a monotonically increasing integer (starting at 1) and hash is an MD5 of the document body. Example: "1-967a00dff5e02add41819138abb3284d". After one update it becomes something like "2-7051cbe5c8faecd085a3fa619e6e6337".

Why it is mandatory for updates and deletes: The _rev is the MVCC optimistic-lock token. CouchDB compares the supplied _rev against what the storage engine holds. A match means the write is based on the current state — the operation proceeds and a new revision is assigned. A mismatch means another writer updated the document since you read it — CouchDB returns HTTP 409 Conflict, preventing silent data loss.

# Read — always capture the returned _rev
curl http://localhost:5984/db/doc1
# {"_id":"doc1","_rev":"1-abc","name":"Alice"}

# Update: supply the exact current _rev in the body
curl -X PUT http://localhost:5984/db/doc1 \
  -H "Content-Type: application/json" \
  -d '{"_rev":"1-abc","name":"Alice Smith"}'
# Response: {"ok":true,"rev":"2-def"}

# Delete: supply _rev as a query parameter
curl -X DELETE "http://localhost:5984/db/doc1?rev=2-def"
# Writes a tombstone: {"_id":"doc1","_rev":"3-xyz","_deleted":true}

Deleting a document does not remove it physically. CouchDB writes a tombstone — a minimal document with _deleted: true at the next revision — so the deletion event replicates correctly to other nodes. Tombstones are only removed by the _purge API, which bypasses the replication system and should be used with care.

When CouchDB deletes a document, what physically happens to the document record?
What does the generation number in a _rev (e.g. "3" in "3-abc") represent?
6. What is the CouchDB storage engine (B-tree) and how does its append-only write work?

CouchDB stores each database as a single file on disk structured around append-only B-trees. The database file holds two B-trees: a by-id tree mapping document IDs to their revisions and a by-seq tree ordered by update sequence (view indexes live in separate files that follow the same append-only model). Every write, whether a new document or an updated revision, is appended to the end of the file. The existing bytes are never modified in place. New B-tree nodes are written at the end, and a small database header at the tail of the file is atomically updated to point to the new B-tree roots.

Three important consequences of this design:

  • Crash safety without a WAL — a crash mid-write at most leaves an incomplete append at the tail. On restart, CouchDB scans backward for the last valid database header and discards any partial write. No separate Write-Ahead Log is needed.
  • No read locks — the previous B-tree root remains valid until the header atomically advances. Concurrent readers always see a consistent snapshot, which is the physical basis of MVCC.
  • Simple fsync durability — CouchDB calls fsync after writing each committed transaction before returning 201 to the client, guaranteeing data is on stable storage.

The trade-off: the file grows with every write because old revisions accumulate as unreachable B-tree nodes. This is why compaction is essential in write-heavy workloads. In CouchDB 3.x each shard of a clustered database is its own append-only file following the same model.
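
A toy model of the append-only design (illustrative only, not CouchDB internals; the "file" is an array of records rather than real bytes) shows why scanning backward for the last header recovers from a crash with no WAL:

```javascript
// Every commit appends a data record followed by a header record pointing at
// it; the header append is the atomic commit point. Recovery walks backward
// from the tail to the last complete header and ignores any partial append.
class AppendOnlyStore {
  constructor() {
    this.file = []; // records are only ever pushed; existing entries never change
  }
  commit(docs) {
    const root = this.file.push({ type: "node", docs }) - 1;
    this.file.push({ type: "header", root }); // atomic commit point
  }
  crashMidWrite(docs) {
    this.file.push({ type: "node", docs }); // node written, crash before header
  }
  recover() {
    for (let i = this.file.length - 1; i >= 0; i--) {
      if (this.file[i].type === "header") return this.file[this.file[i].root].docs;
    }
    return null; // empty database
  }
}
```

Note how a crash mid-write leaves earlier commits untouched: the old header, and everything it points to, is still intact further up the file.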

Why does CouchDB not require a separate Write-Ahead Log for crash recovery?
What happens to old B-tree nodes from previous document revisions in a CouchDB database file?
7. What is database compaction in CouchDB and when should you run it?

Database compaction rewrites a CouchDB database file from scratch, retaining only the current (winning) revision of each document and discarding all stale revisions and orphaned B-tree nodes. Because CouchDB uses an append-only storage engine, every update grows the file. A database with millions of updates can be orders of magnitude larger than the size of its live data. Compaction reclaims that space.

When to run compaction:

  • After a bulk data migration or large import that produced deep revision chains.
  • When disk usage grows significantly faster than the document count (high update churn).
  • On a regular nightly schedule in write-heavy production systems.
  • Automatic compaction — CouchDB 3.x ships smoosh, the built-in compaction daemon, which fires when the ratio of live data to total file size drops below a configurable threshold (CouchDB 1.x/2.x used an earlier compaction daemon configured in the ini file).

# Manually trigger compaction on a database
curl -X POST http://admin:pass@localhost:5984/mydb/_compact
# {"ok":true}

# Compact view indexes of a specific design document
curl -X POST http://admin:pass@localhost:5984/mydb/_compact/my_ddoc
# {"ok":true}

# Monitor progress
curl http://admin:pass@localhost:5984/_active_tasks
# Shows compaction tasks with "progress" percentage

During compaction the database stays fully online — CouchDB continues serving reads and writes from the old file and atomically switches to the new file when compaction finishes. View compaction is separate from document compaction; each design document's index has its own compaction command.

Is the CouchDB database accessible for reads and writes while compaction is running?
What does the _compact/{design_doc} endpoint compact compared to the plain _compact endpoint?

8. What are CouchDB attachments and when would you use them?

Attachments in CouchDB are binary blobs stored directly alongside a document under the reserved _attachments key. Each attachment has a filename, a MIME content type, byte length, and an MD5 digest. They are stored in the same database file as the document but transferred separately — a GET on the document returns only metadata stubs by default, not the binary payload.

# Attach a PDF to an existing document (must supply current _rev)
curl -X PUT \
  "http://admin:pass@localhost:5984/contracts/contract:1001/agreement.pdf?rev=2-abc" \
  -H "Content-Type: application/pdf" \
  --data-binary @agreement.pdf
# {"ok":true,"id":"contract:1001","rev":"3-xyz"}

# Fetch just the raw binary
curl http://admin:pass@localhost:5984/contracts/contract:1001/agreement.pdf

# Inline all attachment data in the document response
curl "http://admin:pass@localhost:5984/contracts/contract:1001?attachments=true"

Good use cases for attachments:

  • Small binary files that must replicate alongside their parent document — thumbnails, QR codes, small PDFs.
  • Offline-first mobile apps using PouchDB where images must sync alongside document metadata.

When to avoid them: Each attachment write bumps the document's _rev, making concurrent updates prone to 409 conflicts. Large attachments (over ~1 MB) bloat the database file and increase compaction time significantly. For large media, store the file in an object store (S3, MinIO, Cloudflare R2) and keep only the URL in the CouchDB document.

What does CouchDB return in the _attachments field when you GET a document without the ?attachments=true parameter?
Why does writing an attachment to a document create a potential concurrency problem?
9. What is the difference between CouchDB and Couchbase?

CouchDB and Couchbase are two distinct products. CouchDB is an Apache project. Couchbase emerged from the 2011 merger of CouchOne (the company founded around CouchDB) and Membase (a Memcached-compatible store). The products diverged sharply afterward and now target different use cases with different architectures.

Apache CouchDB vs Couchbase Server

Aspect | Apache CouchDB | Couchbase Server
Primary use case | Offline-first sync, HTTP-native document store | High-performance operational database with caching
Query language | MapReduce views + Mango (MongoDB-style) | N1QL — SQL for JSON
Primary API | HTTP REST — no driver needed | Language SDKs (Java, .NET, Node.js, etc.)
Replication / mobile sync | HTTP peer-to-peer; PouchDB for offline-first | XDCR for cross-datacenter; Couchbase Lite + Sync Gateway for mobile
In-memory caching | None built in | Managed RAM cache (Memcached heritage)
Indexing | Incremental MapReduce B-tree; Mango JSON indexes | Global Secondary Indexes, Full-Text Search, Analytics Service
Licensing | Apache 2.0 — fully open source | Community Edition (OSS) + Enterprise (commercial)

Pick CouchDB for a lightweight, HTTP-accessible document store with excellent offline/mobile sync via PouchDB. Pick Couchbase when you need sub-millisecond latency, high concurrent throughput, N1QL SQL analytics, or the integrated Couchbase Lite mobile platform.

Which SQL-compatible query language does Couchbase provide that Apache CouchDB does not?
What heritage gives Couchbase its in-memory caching capability that CouchDB lacks?
10. What are the CAP theorem trade-offs for CouchDB — is it CP or AP?

CouchDB is an AP system — it prioritizes Availability and Partition tolerance over strict Consistency. When a network partition occurs, CouchDB nodes on either side continue accepting reads and writes rather than refusing requests to maintain linearizability. The result is that two nodes can hold diverged versions of the same document (called open conflicts) until replication heals the partition.

CouchDB's consistency model is eventual consistency: after a partition heals and replication runs, all nodes converge. The conflict resolution mechanism — the deterministic winning-revision algorithm plus application-level merge — is how convergence is achieved without a global coordinator.

The AP versus CP distinction surfaces in these specific scenarios:

  • Multi-master replication — both nodes independently accept writes to the same document. The replication protocol syncs them and surfaces the conflict for application resolution.
  • CouchDB 3.x cluster quorum settings — the write quorum w and read quorum r default to a majority of n replicas. Raising both to n makes the cluster refuse writes when a node is down, shifting behavior toward CP at the cost of availability.

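The quorum arithmetic behind that second bullet can be sketched in a few lines (illustrative only; the parameter names mirror the n/r/w settings, not CouchDB internals):

```javascript
// With n copies of each shard, CouchDB defaults to a majority quorum.
const majority = (n) => Math.floor(n / 2) + 1;

// A request succeeds when at least the quorum of replicas responds.
const writeAccepted = (liveReplicas, w) => liveReplicas >= w;

// n = 3, default w = majority(3) = 2: with one node down, 2 live replicas
// still satisfy the quorum, so writes are accepted (AP behavior).
// Raising w to n = 3 makes the same single-node failure refuse writes
// (CP-leaning behavior, trading availability for consistency).
```
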
This AP design is the reason CouchDB excels in offline-first and mobile applications via PouchDB: the mobile client writes locally (always available) and syncs to the server when connectivity returns, with conflicts resolved deterministically.

Under the CAP theorem, which guarantees does CouchDB prioritize in a multi-master distributed setup?
How can you shift a CouchDB 3.x cluster toward CP behavior during node failures?
11. What are CouchDB design documents and what do they contain?

Design documents are special CouchDB documents whose IDs begin with _design/. They live in the same database as regular documents but hold server-side JavaScript code that CouchDB's query server executes. Updating a design document invalidates and rebuilds all its associated indexes.

A design document can contain the following sections:

  • views — MapReduce index definitions. Each view has a map function and an optional reduce function.
  • indexes — Mango (json/text) index definitions for the _find endpoint.
  • validate_doc_update — a JavaScript function that runs before any document is saved; throw an error to reject the write.
  • filters — JavaScript functions used to filter which documents are replicated or streamed via the _changes feed.
  • updates — update handler functions that let you perform server-side document transformations via a POST request.
  • lists and shows — legacy functions (deprecated in 3.x) for server-side rendering of view results and individual documents as HTML/XML/text.

{
  "_id": "_design/orders",
  "views": {
    "by_status": {
      "map": "function(doc){ if(doc.type==='order') emit(doc.status, doc.total); }",
      "reduce": "_sum"
    }
  },
  "validate_doc_update": "function(newDoc, oldDoc, userCtx){ if(!newDoc.type) throw({forbidden:'type required'}); }",
  "filters": {
    "pending_only": "function(doc, req){ return doc.type==='order' && doc.status==='pending'; }"
  }
}

Design documents are versioned just like regular documents and replicate alongside data documents. Changing a design document in a replicated database will propagate the new index definitions to all replica nodes.

What naming convention identifies a design document in CouchDB?
What happens to a MapReduce view index when its parent design document is updated?
12. What are MapReduce views in CouchDB and how do you define a map function?

MapReduce views are CouchDB's primary indexing mechanism. A view has a map phase (mandatory) and an optional reduce phase. The map function is a JavaScript function that CouchDB runs against every document in the database. For each document it emits zero or more key-value pairs. CouchDB stores these emissions in a B-tree index, kept sorted by key. The reduce function (when present) aggregates values within a key range.

Views are defined inside design documents under the views key:

{
  "_id": "_design/products",
  "views": {
    "by_category": {
      "map": "function(doc) { if (doc.type === 'product' && doc.category) { emit(doc.category, { name: doc.name, price: doc.price }); } }"
    },
    "price_by_category": {
      "map": "function(doc) { if (doc.type === 'product') { emit(doc.category, doc.price); } }",
      "reduce": "_sum"
    },
    "by_compound_key": {
      "map": "function(doc) { if (doc.type === 'order') { emit([doc.year, doc.month, doc.day], 1); } }"
    }
  }
}

Key points about map functions:

  • The emit(key, value) call adds an entry to the index. A single document can emit multiple times, creating multiple index entries.
  • Keys can be strings, numbers, arrays, or null. Arrays support compound-key queries — range queries on [year, month] work naturally.
  • The value can be any JSON. Emitting null as the value and using include_docs=true in the query avoids duplicating the full document in the index.
  • Map functions must be pure (no side effects, no external HTTP calls) and deterministic.

Views are built lazily on first query and updated incrementally on subsequent queries — only documents changed since the last index update are re-processed.
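
A simulation of the map phase makes the emit-to-index relationship concrete (illustrative only: CouchDB runs map functions inside its query server and sorts keys with ICU collation, which this sketch approximates with string comparison):

```javascript
// Run a map function over every document, collect emit() rows, sort by key.
function buildView(docs, mapFn) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value }); // one index entry per emit
  for (const doc of docs) mapFn(doc, emit); // real map functions get emit as a global
  // Simplified collation: compare JSON-encoded keys (CouchDB uses ICU rules).
  return rows.sort((a, b) => (JSON.stringify(a.key) < JSON.stringify(b.key) ? -1 : 1));
}

// The price_by_category map from the design document above, with emit passed in:
const mapFn = (doc, emit) => {
  if (doc.type === "product") emit(doc.category, doc.price);
};
```

Documents that emit nothing (here, anything that is not a product) simply contribute no rows, which is how views double as type filters.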

What function call inside a CouchDB map function adds an entry to the view index?
When is a CouchDB MapReduce view index built and updated?
13. How does the reduce function work in CouchDB views and what are the built-in reduce functions?

The reduce function in a CouchDB MapReduce view aggregates the values emitted by the map function within a key range. CouchDB implements reduce using a rereduce mechanism: values are first reduced in small groups (reduce pass), then those partial results are reduced again (rereduce pass) until a single value remains. This makes reduce scalable across large datasets but also means your reduce function must handle rereduce correctly.

CouchDB provides three built-in reduce functions implemented natively in Erlang (much faster than JavaScript):

  • _sum — sums all emitted values. Input values must be numbers or arrays of numbers.
  • _count — counts the number of emitted key-value pairs regardless of value.
  • _stats — returns a statistics object with sum, count, min, max, and sumsqr (for standard deviation).

# Query a view with reduce (default: group_level=0, returns grand total)
curl "http://localhost:5984/sales/_design/reports/_view/revenue_by_region"
# {"rows":[{"key":null,"value":1482390.50}]}

# Group by exact key (group=true)
curl "http://localhost:5984/sales/_design/reports/_view/revenue_by_region?group=true"
# {"rows":[{"key":"APAC","value":312450},{"key":"EMEA","value":589120},...]}

# Group by first element of a compound key array
curl "http://localhost:5984/sales/_design/reports/_view/by_date?group_level=1"
# Groups by year only when key is [year, month, day]

Custom JavaScript reduce functions are allowed but must handle the rereduce boolean parameter: when rereduce=true, the input values are partial reduce results rather than raw map values. Incorrect rereduce handling is a common source of wrong aggregation results.
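
A classic example is computing an average: the reduce output must stay mergeable across rereduce passes, so it carries {sum, count} instead of a finished average (a sketch; in a design document this function would be serialized as a string):

```javascript
// Average via reduce/rereduce. Returning a plain average from the first pass
// would break the math, because averages of averages weight groups wrongly.
function averageReduce(keys, values, rereduce) {
  if (rereduce) {
    // Inputs are partial {sum, count} results from earlier reduce passes.
    return values.reduce((a, v) => ({ sum: a.sum + v.sum, count: a.count + v.count }));
  }
  // Inputs are raw numbers emitted by the map function.
  return { sum: values.reduce((a, v) => a + v, 0), count: values.length };
}
// The client divides sum by count to get the final average.
```
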

Which CouchDB built-in reduce function returns sum, count, min, max, and sumsqr statistics?
In a custom CouchDB reduce function, when rereduce=true, what are the input values?
14. What are view indexes in CouchDB and how are they built and updated, including stale options?

A view index in CouchDB is a persistent B-tree file on disk that stores all the key-value pairs emitted by a view's map function across all documents in the database. It is stored separately from the main database file, in its own index file. The index is sorted by emitted key, enabling efficient range queries.

Build and update lifecycle:

  • First query — if the index does not exist, CouchDB processes every document through the map function and builds the index from scratch. This can be slow for large databases.
  • Subsequent queries — CouchDB checks the database's update sequence number. Documents changed since the last index update are re-run through the map function incrementally. The index reflects all committed documents before returning results.
  • Design document change — any modification to the design document invalidates the entire index; full rebuild required.

The stale query parameter (CouchDB 1.x) or update parameter (2.x+) controls this behavior:

# Default: wait for index to be fully up to date before returning
GET /db/_design/ddoc/_view/my_view

# Return stale (potentially outdated) results immediately; trigger index update in background
GET /db/_design/ddoc/_view/my_view?stale=update_after   # 1.x style
GET /db/_design/ddoc/_view/my_view?update=lazy          # 2.x+ style

# Return whatever is in the index right now, do not update
GET /db/_design/ddoc/_view/my_view?stale=ok             # 1.x
GET /db/_design/ddoc/_view/my_view?update=false         # 2.x+

Using stale=update_after is a common pattern for dashboard queries where slightly stale data is acceptable and you want to avoid blocking the user while the index refreshes.

What triggers a full rebuild of a CouchDB view index?
What does the stale=update_after (or update=lazy in 2.x+) parameter do when querying a CouchDB view?
15. What is the Mango query language in CouchDB and how does it differ from MapReduce views?

Mango is CouchDB's declarative, MongoDB-inspired query language introduced in CouchDB 2.0. Instead of writing JavaScript map functions, you POST a JSON selector document to the _find endpoint. CouchDB evaluates the selector against a Mango index (or falls back to a full scan) and returns matching documents.

POST /mydb/_find
{
  "selector": {
    "type": "order",
    "status": "pending",
    "total": { "$gt": 100 }
  },
  "fields": ["_id", "customer_id", "total", "created_at"],
  "sort": [{ "created_at": "desc" }],
  "limit": 20,
  "skip": 0
}

Key differences between Mango and MapReduce views:

Mango vs MapReduce Views

Aspect | Mango (_find) | MapReduce Views
Syntax | JSON selector — no JavaScript required | JavaScript map/reduce functions
Primary use | Ad-hoc filtering and sorting on arbitrary fields | Pre-aggregated sorted indexes; efficient range queries
Aggregation | No built-in aggregation; returns documents | Yes — _sum, _count, _stats reduce functions
Index type | Mango JSON index or full-text index | Persistent sorted B-tree
Fallback without index | Full database scan (slow — avoid in production) | N/A — view always has an index
Best for | Flexible ad-hoc queries, REST APIs, search | Reporting, aggregations, sorted lookups by known key

Mango is generally the right choice for new applications because it requires no JavaScript and works well for the typical document-filtering use cases. Use MapReduce views when you need server-side aggregation (sums, counts) or must query by a complex compound key with range semantics.

What CouchDB endpoint does the Mango query language use?
Which capability does MapReduce have that Mango _find queries do not natively support?
16. How do you create and use a Mango index in CouchDB (json and text indexes)?

Mango supports two index types: json indexes (B-tree, for equality and range queries on specific fields) and text indexes (full-text Lucene-backed, for free-text search on string fields). Both are created via POST to /_index.

# Create a JSON index on status + created_at for the orders collection
curl -X POST http://admin:pass@localhost:5984/mydb/_index \
  -H "Content-Type: application/json" \
  -d '{
    "index": {
      "fields": ["type", "status", "created_at"]
    },
    "name": "idx-orders-status-date",
    "type": "json",
    "ddoc": "_design/mango_indexes"
  }'
# {"result":"created","id":"_design/mango_indexes","name":"idx-orders-status-date"}

# Create a text (full-text) index
curl -X POST http://admin:pass@localhost:5984/mydb/_index \
  -H "Content-Type: application/json" \
  -d '{
    "index": {
      "default_field": { "enabled": true, "analyzer": "standard" }
    },
    "name": "idx-fulltext",
    "type": "text"
  }'

# Query using the json index (CouchDB picks the index automatically)
curl -X POST http://admin:pass@localhost:5984/mydb/_find \
  -H "Content-Type: application/json" \
  -d '{
    "selector": { "type": "order", "status": "pending" },
    "sort": [{ "created_at": "desc" }],
    "limit": 10
  }'

# List all indexes
curl http://admin:pass@localhost:5984/mydb/_index

CouchDB automatically selects the best available index for a _find query. If no suitable index exists, the _find response body includes a "warning" field telling you so; to confirm which index was chosen, POST the same query body to /{db}/_explain. Without an appropriate index, CouchDB falls back to a full database scan, which is tolerable in development but unacceptable in production.

Which endpoint do you POST to in order to create a new Mango index in CouchDB?
How can you verify which Mango index CouchDB chose for a _find query?
17. What are the query operators available in the Mango selector syntax?

Mango selectors are JSON objects where each key is a document field or a Mango operator. Operators begin with $. They fall into five groups: comparison, logical, element, array, and evaluation operators.

POST /mydb/_find
{
  "selector": {
    "$and": [
      { "type": { "$eq": "product" } },
      { "price": { "$gte": 10, "$lte": 100 } },
      { "tags": { "$elemMatch": { "$eq": "sale" } } },
      { "discontinued": { "$exists": false } },
      { "name": { "$regex": "^Widget" } }
    ]
  }
}
Mango Selector Operators

Category   | Operators                          | Description
Comparison | $eq, $ne, $lt, $lte, $gt, $gte     | Equality and range comparisons
Logical    | $and, $or, $not, $nor              | Boolean combinations of conditions
Element    | $exists, $type                     | Check field presence or JSON type
Array      | $in, $nin, $all, $elemMatch, $size | Match values in or against arrays
Evaluation | $regex, $mod                       | Regex match; modulo arithmetic

Important constraints: $regex queries cannot use B-tree json indexes; they require a text (Lucene) index, or else every candidate document must be scanned. Compound conditions using $and can use a json index when the leading fields of the index are covered by equality conditions. Field order inside the selector itself does not affect index selection; what matters is that the selector's equality conditions cover the leading fields of a json index.
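A minimal Python sketch of how a selector like the one above matches a document, assuming a hypothetical evaluator that supports only a handful of operators (CouchDB's real matcher is implemented in Erlang and covers the full operator set):

```python
def matches(selector, doc):
    """Return True if doc satisfies the Mango-style selector."""
    for key, cond in selector.items():
        if key == "$and":
            if not all(matches(s, doc) for s in cond):
                return False
        elif key == "$or":
            if not any(matches(s, doc) for s in cond):
                return False
        elif not field_matches(doc.get(key), cond):
            return False
    return True

def field_matches(value, cond):
    """Apply operator conditions (or a bare value, meaning $eq) to one field."""
    if not isinstance(cond, dict):
        return value == cond
    for op, arg in cond.items():
        if op == "$eq":
            if value != arg:
                return False
        elif op == "$gte":
            if value is None or value < arg:
                return False
        elif op == "$lte":
            if value is None or value > arg:
                return False
        elif op == "$exists":
            if (value is not None) != arg:
                return False
        elif op == "$elemMatch":
            if not isinstance(value, list) or not any(field_matches(v, arg) for v in value):
                return False
    return True

doc = {"type": "product", "price": 42, "tags": ["sale", "new"]}
sel = {"$and": [
    {"type": {"$eq": "product"}},
    {"price": {"$gte": 10, "$lte": 100}},
    {"tags": {"$elemMatch": {"$eq": "sale"}}},
    {"discontinued": {"$exists": False}},
]}
print(matches(sel, doc))   # -> True
```

The $and branch here mirrors how each clause must independently hold, which is exactly why an index can only help with clauses on its leading fields.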

Which Mango array operator checks that at least one element in an array field satisfies a nested condition?
Why should you avoid using $regex in a high-throughput Mango query without a text index?
18. What is the _all_docs endpoint in CouchDB and how does it differ from a custom view?

The _all_docs endpoint is a built-in view that CouchDB automatically maintains for every database. It returns all non-deleted documents sorted by their _id (ascending by default). Internally it is backed by the same document B-tree that stores the documents themselves, so it is always up to date with zero additional index maintenance cost.

# Retrieve all documents (just metadata by default)
curl "http://admin:pass@localhost:5984/mydb/_all_docs?limit=10"

# Include full document bodies
curl "http://admin:pass@localhost:5984/mydb/_all_docs?include_docs=true&limit=10"

# Range query by _id prefix (using the collation order of strings)
curl "http://admin:pass@localhost:5984/mydb/_all_docs?startkey=%22order:%22&endkey=%22order:%EF%BF%B0%22&include_docs=true"

# Fetch specific documents by ID (bulk read equivalent to GET on each)
curl -X POST http://admin:pass@localhost:5984/mydb/_all_docs?include_docs=true \
  -H "Content-Type: application/json" \
  -d '{"keys":["order:001","order:002","user:42"]}'

Differences from a custom view:

  • _all_docs is always current — no lazy build delay on first access.
  • It is keyed only by _id. You cannot query by any other field — for that you need a view or Mango index.
  • A custom view can emit any key (category, date, compound key) and can aggregate values via reduce. _all_docs cannot.
  • _all_docs includes design documents. A common trick is requesting startkey="a" to skip them (design doc IDs begin with _design/, and _ sorts before lowercase letters), but note this also skips any document IDs starting with digits or uppercase letters.
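The prefix-range trick in the third example works because _all_docs rows are sorted by _id, so a prefix scan is just a range over the sorted ID space. A short sketch (Python's codepoint ordering is used here as an approximation of _all_docs ordering; \ufff0 sorts after every character that appears in normal IDs):

```python
# A mix of IDs as they would sit in the _all_docs index, sorted by _id.
ids = sorted(["order:001", "order:002", "user:42", "_design/reports", "Order:99"])

# startkey/endkey pair for an "order:" prefix scan.
startkey, endkey = "order:", "order:\ufff0"

# The range selects exactly the IDs with the prefix.
prefix_rows = [i for i in ids if startkey <= i <= endkey]
print(prefix_rows)   # -> ['order:001', 'order:002']
```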
By what field does CouchDB's _all_docs endpoint sort its results?
What is the main limitation of _all_docs compared to a custom MapReduce view?
19. How do you paginate results in CouchDB views using startkey, endkey, and skip/limit?

CouchDB views are sorted B-trees, so efficient pagination uses key-based cursoring rather than offset-based skipping. Two approaches exist: offset pagination (simpler but slow at large offsets) and key-based pagination (efficient at any depth).

# ── Approach 1: Offset-based (avoid for deep pages) ──
# Page 1
GET /db/_design/ddoc/_view/by_date?limit=10

# Page 2 (skip=10 forces a full scan of the first 10 rows — slow at scale)
GET /db/_design/ddoc/_view/by_date?limit=10&skip=10

# ── Approach 2: Key-based cursor (recommended for production) ──
# Page 1: fetch limit+1 to detect whether a next page exists
GET /db/_design/ddoc/_view/by_date?limit=11&descending=false

# From the response, take the last row's key and doc ID as the cursor:
# last key = "2024-03-15", last id = "order:0099"

# Page 2: start from the cursor using startkey + startkey_docid
GET /db/_design/ddoc/_view/by_date?startkey=%222024-03-15%22\
  &startkey_docid=order%3A0099&limit=11

# Range query — all orders between two dates
GET /db/_design/ddoc/_view/by_date?startkey=%222024-01-01%22\
  &endkey=%222024-03-31%22&include_docs=true

skip is implemented by scanning and discarding leading rows — O(n) in the skipped count. At page 500 with page size 20, skip=10000 forces CouchDB to read 10000 index entries before returning 20 results. For large datasets always use the key-cursor approach. The startkey_docid parameter resolves ties when multiple documents share the same emitted key, ensuring the cursor lands on the exact right row.
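The cursor pattern can be simulated over an in-memory sorted row list, assuming rows are (key, doc_id) pairs sorted the way the view B-tree sorts them (fetch_page is a hypothetical helper standing in for the view query parameters):

```python
# Seven view rows: emitted date key plus doc ID, pre-sorted like a view B-tree.
rows = sorted((f"2024-03-{d:02d}", f"order:{i:04d}")
              for i, d in enumerate([1, 1, 2, 3, 3, 3, 4], start=1))

def fetch_page(rows, limit, startkey=None, startkey_docid=None):
    """Mimic ?startkey=&startkey_docid=&limit=N+1: return one page and a cursor."""
    if startkey is not None:
        cursor = (startkey, startkey_docid or "")
        rows = [r for r in rows if r >= cursor]   # tuple compare = (key, then id)
    page = rows[:limit + 1]                       # fetch limit+1 to detect a next page
    has_next = len(page) > limit
    next_cursor = page[limit] if has_next else None   # first row of the next page
    return page[:limit], next_cursor

page1, cursor = fetch_page(rows, limit=3)
page2, cursor2 = fetch_page(rows, limit=3,
                            startkey=cursor[0], startkey_docid=cursor[1])
print(len(page1), cursor, len(page2))
```

Note how the (key, doc_id) tuple comparison is what startkey_docid provides in real CouchDB: it disambiguates rows that share the same emitted key.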

Why is using a large skip value for deep pagination in CouchDB views slow?
What parameter resolves ties when multiple documents share the same emitted key in key-cursor pagination?
20. What is a list function in CouchDB and when would you use it?

A list function is a server-side JavaScript function stored in a design document under the lists key. It acts as a streaming transformer for view query results — instead of returning raw JSON rows, it lets you produce any output format (HTML, XML, CSV, plain text) directly from CouchDB without an intermediary application server.

When called, the list function receives the view result rows one at a time via the getRow() function and can write arbitrary output using send(), building up the response incrementally. This streaming model means large result sets do not need to be buffered in memory.

// In _design/reports, "lists" section:
{
  "as_csv": "function(head, req) { start({'headers':{'Content-Type':'text/csv'}}); send('id,status,total\\n'); var row; while(row = getRow()) { send(row.id+','+row.value.status+','+row.value.total+'\\n'); } }"
}
# Call the list function against a view
GET /db/_design/reports/_list/as_csv/by_status?include_docs=false

When to use list functions: They were popular in CouchApps (self-contained web apps hosted entirely inside CouchDB) where HTML was served from list functions. They can also transform view output to feed legacy systems expecting XML or CSV without an application layer.

Deprecation status: List functions are deprecated in CouchDB 3.x along with show functions. The recommended replacement is to query views from your application server and perform the transformation there. The JavaScript query server adds latency and complexity compared to doing the same transformation in your application code.

What does a CouchDB list function do with the view result rows it receives?
What is the recommended replacement for CouchDB list functions in CouchDB 3.x?
21. How does CouchDB replication work and what is the replication protocol?

CouchDB replication is a document-level sync protocol that copies documents from a source database to a target database using standard HTTP. Either or both of source and target can be local or remote CouchDB instances. Replication is initiated by posting a replication document to the _replicator database or to the /_replicate endpoint directly.

The protocol works through these concrete steps:

  1. Get peer info — the replicator calls GET /target to confirm the target is reachable and retrieves its UUID.
  2. Read checkpoint — it reads the last replication checkpoint stored in a _local/ document on both source and target to know the last source sequence number already replicated.
  3. Get changes — it calls GET /source/_changes?since={last_seq} to fetch all document IDs and their current revisions changed since the last checkpoint.
  4. Check target revisions — it POSTs the changed IDs to POST /target/_revs_diff to find which revisions the target is missing.
  5. Fetch missing docs — it fetches the missing document bodies from the source (with attachments if any) and POSTs them in bulk to /target/_bulk_docs.
  6. Save checkpoint — it writes the new sequence number to _local/ docs on both peers so the next replication starts from there.
// Replication document in the _replicator database:
{
  "_id": "sync-orders-to-replica",
  "source": "http://admin:pass@primary:5984/orders",
  "target": "http://admin:pass@replica:5984/orders",
  "continuous": false,
  "create_target": true
}

The protocol is idempotent: re-running a replication never loses data or creates duplicates. Because it uses standard HTTP and checkpoint documents, any two CouchDB instances can replicate without special network infrastructure — making it practical for cloud-to-edge and offline-mobile scenarios.
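The six-step loop above can be sketched as a toy in-memory simulation, assuming each database is a plain dict and sequence numbers are a simple change counter (illustration only, not the HTTP protocol itself):

```python
source = {"docs": {}, "changes": [], "seq": 0}
target = {"docs": {}}
checkpoint = {"last_seq": 0}          # stands in for the _local/ checkpoint doc

def write(db, doc_id, rev, body):
    """Write a doc and append an entry to the changes log."""
    db["docs"][doc_id] = (rev, body)
    db["seq"] += 1
    db["changes"].append((db["seq"], doc_id, rev))

def replicate():
    since = checkpoint["last_seq"]                              # 2. read checkpoint
    changed = [c for c in source["changes"] if c[0] > since]    # 3. _changes?since=
    missing = [(i, r) for _, i, r in changed
               if target["docs"].get(i, (None,))[0] != r]       # 4. _revs_diff
    for doc_id, rev in missing:                                 # 5. bulk copy
        target["docs"][doc_id] = source["docs"][doc_id]
    if changed:
        checkpoint["last_seq"] = changed[-1][0]                 # 6. save checkpoint
    return len(missing)

write(source, "order:001", "1-a", {"total": 10})
write(source, "order:002", "1-b", {"total": 20})
print(replicate(), replicate())   # -> 2 0  (second run is a no-op: idempotent)
```

The second call copies nothing because the checkpoint already points past both changes, which is the property that makes re-running a replication safe.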

What does CouchDB call to determine which document revisions the target is missing before fetching them?
Where does CouchDB's replication protocol store its checkpoint to resume from the last sync position?
22. What is the difference between one-shot and continuous replication in CouchDB?

CouchDB supports two replication modes: one-shot (the default) and continuous. The mode is set by the continuous boolean in the replication document.

One-shot replication syncs all documents changed since the last checkpoint, then completes. The replication job disappears once finished. It is appropriate for scheduled batch syncs, point-in-time backups, or bootstrapping a new replica.

Continuous replication runs indefinitely after initial sync. It keeps a long-lived _changes feed connection open to the source, processing new changes as they arrive in near real-time. The replication job persists in the _replicator database and is restarted automatically after node restarts.

# One-shot replication via _replicator database
curl -X POST http://admin:pass@localhost:5984/_replicator \
  -H "Content-Type: application/json" \
  -d '{
    "_id": "one-time-backup",
    "source": "http://localhost:5984/mydb",
    "target": "http://replica:5984/mydb",
    "continuous": false,
    "create_target": true
  }'

# Continuous replication
curl -X POST http://admin:pass@localhost:5984/_replicator \
  -H "Content-Type: application/json" \
  -d '{
    "_id": "live-sync-to-replica",
    "source": "http://localhost:5984/orders",
    "target": "http://replica:5984/orders",
    "continuous": true
  }'

# Check replication status
curl http://admin:pass@localhost:5984/_scheduler/jobs

Continuous replication introduces a persistent connection that consumes resources on both nodes. For high-volume databases, monitor the scheduler via /_scheduler/jobs and /_scheduler/docs to detect stalled or crashing replication jobs. A job that enters a crash loop usually indicates a network issue, an authentication problem, or an unfixable conflict on the target.

What is the key behavioral difference between one-shot and continuous replication in CouchDB?
Which CouchDB endpoint lets you monitor the status of running replication jobs?
23. What is filtered replication in CouchDB and how do you implement it?

Filtered replication allows you to replicate only a subset of documents from a source database, rather than copying every document. This reduces bandwidth, storage on the target, and replication lag. There are two ways to filter: using a filter function (server-side JavaScript) or using a Mango selector in the replication document (CouchDB 2.x+).

Option 1 — Filter function in a design document:

// In _design/replication_filters:
{
  "filters": {
    "by_type": "function(doc, req) { return doc.type === req.query.type; }"
  }
}
# Replicate only order documents
curl -X POST http://admin:pass@localhost:5984/_replicator \
  -H "Content-Type: application/json" \
  -d '{
    "_id": "orders-only",
    "source": "http://localhost:5984/mydb",
    "target": "http://replica:5984/orders",
    "continuous": true,
    "filter": "replication_filters/by_type",
    "query_params": { "type": "order" }
  }'

Option 2 — Mango selector (preferred in 2.x+, avoids a round-trip through the JavaScript query server):

curl -X POST http://admin:pass@localhost:5984/_replicator \
  -H "Content-Type: application/json" \
  -d '{
    "_id": "active-orders",
    "source": "http://localhost:5984/mydb",
    "target": "http://replica:5984/active_orders",
    "continuous": true,
    "selector": { "type": "order", "status": { "$in": ["pending","processing"] } }
  }'

The Mango selector approach is more efficient because the filter is evaluated against the changes feed using an in-process Erlang evaluator rather than spawning a JavaScript OS process for each document. It is the recommended approach for all new replication setups.

Which filtered replication approach in CouchDB 2.x+ is more efficient: a JavaScript filter function or a Mango selector?
When using a JavaScript filter function for replication, where is the filter definition stored?
24. What is CouchDB Cluster mode (CouchDB 2.x+) and how does it differ from single-node CouchDB 1.x?

CouchDB 2.0 (released 2016) absorbed the BigCouch clustering code from Cloudant and made clustered operation the default architecture. CouchDB 3.x continues this model. A CouchDB cluster consists of multiple Erlang nodes that cooperate via a distributed hash ring (using consistent hashing) to shard and replicate data automatically.

CouchDB 1.x Single Node vs 2.x/3.x Cluster

Aspect                 | CouchDB 1.x (single node)             | CouchDB 2.x/3.x (cluster)
Horizontal scalability | None — single process, single machine | Add nodes; data shards distributed automatically
Fault tolerance        | Single point of failure               | Configurable replica count (n) per database
Shard distribution     | No sharding — one database file       | Configurable Q shards per database, each with n copies
Quorum reads/writes    | N/A                                   | Configurable r (read quorum) and w (write quorum)
Admin interface        | Futon                                 | Fauxton (modern React UI)
Single-node deployment | Default mode                          | Supported via single_node config option in 3.x
Database creation      | PUT /db                               | PUT /db?q=8&n=3 (control shards and replicas)

In a cluster, each database is split into Q shards (default 8). Each shard has n copies (default 3) stored on different nodes. When a node is added to the cluster, CouchDB uses the _cluster_setup API to join it to the ring and the rebalancing happens via standard replication. There is no external ZooKeeper or etcd dependency — cluster membership and topology are stored in the _dbs and _nodes internal databases.

What was the source of the CouchDB clustering code introduced in CouchDB 2.0?
In a CouchDB 3.x cluster, where is cluster topology (node membership and shard placement) stored?
25. How does CouchDB cluster sharding work — what are the Q, n, r, and w parameters?

In a CouchDB cluster, each database is divided into Q shards (also called range partitions). The key space of document IDs is divided into Q equal hash ranges. Each shard is stored as an independent database file on a node. Each shard has n replicas — copies stored on n different nodes for fault tolerance. CouchDB 2.x defaults to Q=8 shards (CouchDB 3.x lowered the default to Q=2) and n=3 replicas; with Q=8 and n=3 that is 24 shard files in total, spread across the nodes of a 3-node cluster.

# Create a database with 4 shards and 2 replicas
curl -X PUT "http://admin:pass@localhost:5984/mydb?q=4&n=2"

# The cluster places shard copies according to the ring
# Check shard placement:
curl http://admin:pass@localhost:5984/mydb/_shards
curl http://admin:pass@localhost:5984/mydb/_shards/doc1  # which shard holds doc1

When a client writes a document, CouchDB hashes the document's _id to determine which shard it belongs to, then writes to all n replicas of that shard. The write succeeds when w replicas acknowledge the write. When a client reads, it contacts the relevant shard replicas and returns when r replicas agree.

CouchDB Cluster Quorum Parameters

Parameter | Meaning                                               | Default
Q         | Number of shards per database                         | 8 (2 in CouchDB 3.x)
n         | Number of replica copies per shard                    | 3
w         | Write quorum — replicas that must acknowledge a write | 2 (majority of n=3)
r         | Read quorum — replicas that must respond to a read    | 2

Setting w=1 maximizes write availability at the cost of potential data loss if the one acknowledging node crashes immediately after. Setting w=n requires all replicas to be available for every write — maximum durability but reduced availability. The default majority quorum (w=2 out of n=3) is the recommended balance for most production clusters.
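Shard selection and quorum counting can be sketched under the defaults above; crc32 is a stand-in hash for illustration, not CouchDB's actual shard-mapping function:

```python
import zlib

Q, N, W = 8, 3, 2   # defaults: 8 shards, 3 copies each, majority write quorum

def shard_for(doc_id, q=Q):
    """Map a document _id to one of q shard ranges (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode()) % q

def write_doc(doc_id, acks_received, w=W):
    """A clustered write succeeds once w replica acknowledgements arrive."""
    return {"shard": shard_for(doc_id), "ok": acks_received >= w}

print(write_doc("order:001", acks_received=2)["ok"])   # -> True  (quorum met)
print(write_doc("order:001", acks_received=1)["ok"])   # -> False (below w=2)
```

Raising or lowering w in this sketch shows the availability/durability trade-off directly: w=1 succeeds with a single ack, w=N requires every replica.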

In a CouchDB cluster with n=3 replicas, what is the default write quorum (w)?
What does the Q parameter control when creating a CouchDB database in cluster mode?
26. What is the _node and _cluster_setup API used for in CouchDB clustering?

The _node and _cluster_setup APIs are the two primary endpoints for managing a CouchDB cluster's topology. They are distinct in scope: _node operates on individual node configuration, while _cluster_setup orchestrates the multi-step process of forming or extending a cluster.

The _node API (/_node/{node-name}/) provides per-node operations:

  • GET /_node/_local/_config — read the running configuration of the local node.
  • PUT /_node/_local/_config/{section}/{key} — change a configuration value live without restart.
  • GET /_node/_local/_stats — performance counters for the local node.
  • GET /_node/_local/_system — Erlang VM stats (memory, processes, ports).

The _cluster_setup API (/_cluster_setup) provides a guided wizard for cluster formation:

# Step 1: Enable cluster mode on node 1
curl -X POST http://admin:pass@node1:5984/_cluster_setup \
  -H "Content-Type: application/json" \
  -d '{"action":"enable_cluster","username":"admin","password":"pass",
       "node_count":3,"bind_address":"0.0.0.0","port":5984}'

# Step 2: Add node 2 to the cluster
curl -X POST http://admin:pass@node1:5984/_cluster_setup \
  -H "Content-Type: application/json" \
  -d '{"action":"add_node","username":"admin","password":"pass",
       "host":"node2","port":5984}'

# Step 3: Finish cluster setup
curl -X POST http://admin:pass@node1:5984/_cluster_setup \
  -d '{"action":"finish_cluster"}'

# Check cluster membership
curl http://admin:pass@localhost:5984/_membership

After cluster formation, GET /_membership returns the list of all nodes in the ring. Nodes are identified by their Erlang node name, typically couchdb@hostname. The _node/_local shortcut always refers to the node receiving the request, which is convenient in scripting.

Which CouchDB API endpoint provides a guided wizard for forming or extending a CouchDB cluster?
What does the shortcut _node/_local refer to in the CouchDB _node API?
27. How does CouchDB handle replication conflicts and what strategies exist to resolve them?

A replication conflict in CouchDB occurs when two nodes have independently updated the same document (same _id) and neither update knew about the other. This is the normal result of multi-master or offline-sync workflows — it is not an error, it is an expected state that the application must handle.

CouchDB stores both conflicting revisions in the database. The document is still readable via its _id, but one revision is designated the winning revision (see Q28 for the algorithm). You can see all conflicting revisions by requesting ?conflicts=true:

# Detect a conflict
curl "http://localhost:5984/mydb/doc1?conflicts=true"
# {
#   "_id":"doc1","_rev":"3-winner...","name":"Alice",
#   "_conflicts":["3-loser..."]
# }

# Fetch the losing revision
curl "http://localhost:5984/mydb/doc1?rev=3-loser..."

# Resolution strategy: pick winning revision, DELETE the losing one
curl -X DELETE "http://localhost:5984/mydb/doc1?rev=3-loser..."

# OR: merge both and save merged version, then DELETE loser
curl -X PUT http://localhost:5984/mydb/doc1 \
  -d '{"_rev":"3-winner...","name":"Alice Smith","merged":true}'
curl -X DELETE "http://localhost:5984/mydb/doc1?rev=3-loser..."

Common resolution strategies:

  • Last-write-wins — keep the winning revision (already done automatically), delete losers. Simple but may discard valid changes.
  • Application-level merge — read both revisions, merge fields using domain logic (e.g., take the higher stock count), write the merged result as the new winning revision, delete the loser.
  • Conflict-free design — model data to avoid conflicts: use separate documents per event (append-only log) instead of updating a shared document; use CouchDB's _local documents for per-device state that does not replicate.
Which query parameter do you add to a CouchDB GET request to see all conflicting revisions of a document?
What is the conflict-free design pattern for CouchDB documents that are updated frequently by multiple nodes?
28. What is the CouchDB winning revision algorithm for conflict resolution?

When CouchDB has two or more conflicting revisions of the same document, it must deterministically pick one as the winning revision — the one returned by a normal GET request without ?conflicts=true. The algorithm is deterministic so that all cluster nodes independently arrive at the same winner without any coordination.

The winning revision is chosen by these rules in order:

  1. Prefer non-deleted revisions over deleted ones. A live document always beats a tombstone, regardless of generation count. This prevents a delete from silently "winning" over an edit that arrived at a replica later.
  2. Among non-deleted (or all-deleted) revisions, prefer the one with the higher generation number (the integer prefix in _rev). Generation 5 beats generation 3.
  3. If generation numbers tie, compare the revision hash strings lexicographically. The hash that sorts later (higher string order) wins. This is the tiebreaker of last resort and is arbitrary from a business logic perspective — which is why applications should not rely on the winning revision for semantically meaningful merges.

The consequence of this algorithm is that the winner is not necessarily the revision with the "most recent" wall-clock timestamp, nor the one with the most recent change. It is entirely possible for an older edit to "win" if its revision chain is longer. This is why CouchDB recommends application-level conflict detection and resolution rather than trusting the automatic winner for data that matters.

You can trigger a re-evaluation of which revision wins by deleting the current winner — the next-highest-generation conflict becomes the new winner automatically.
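The three rules above can be sketched directly, assuming each revision is represented as a (generation, hash, deleted) tuple (a simplification of the real revision tree):

```python
def parse_rev(rev, deleted=False):
    """Split a _rev string like '3-abc' into (generation, hash, deleted)."""
    gen, h = rev.split("-", 1)
    return (int(gen), h, deleted)

def winner(revs):
    """Pick the winning revision deterministically."""
    # Rule 1: live revisions beat tombstones, regardless of generation.
    live = [r for r in revs if not r[2]]
    pool = live if live else revs
    # Rules 2 + 3: highest generation wins; ties broken by the hash that
    # sorts later lexicographically.
    return max(pool, key=lambda r: (r[0], r[1]))

revs = [parse_rev("5-aaa", deleted=True),   # higher generation, but a tombstone
        parse_rev("3-zzz"),
        parse_rev("3-abc")]
gen, h, _ = winner(revs)
print(f"{gen}-{h}")   # -> 3-zzz  (live beats the gen-5 tombstone; zzz > abc)
```

Because the function depends only on the revision set itself, every node computes the same winner with no coordination, which is the whole point of the algorithm.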

In CouchDB's winning revision algorithm, what happens when one conflicting revision is deleted (tombstone) and the other is a live document?
When two conflicting CouchDB revisions have the same generation number, what is the final tiebreaker?
29. What is PouchDB and how does it enable offline-first applications with CouchDB sync?

PouchDB is an open-source JavaScript database that runs entirely inside the browser (using IndexedDB or WebSQL as the local storage backend) or in Node.js (using LevelDB). It implements the CouchDB replication protocol, which means it can sync bidirectionally with any CouchDB-compatible server — including Apache CouchDB and IBM Cloudant — using the same HTTP-based protocol.

The offline-first pattern works as follows:

  1. The browser app reads and writes to the local PouchDB instance — always available, zero latency, no network required.
  2. When connectivity is available, PouchDB syncs to the remote CouchDB server using the replication protocol: it pushes local changes and pulls remote changes.
  3. If two users edited the same document while offline, PouchDB surfaces the conflict the same way CouchDB does, and the application resolves it.
// Create a local PouchDB database
const localDB = new PouchDB('myapp');

// Write offline — works even without network
await localDB.put({ _id: 'order:001', type: 'order', total: 99.99 });

// Set up continuous two-way sync when online
const sync = localDB.sync('https://mycouch.example.com/myapp', {
  live: true,       // continuous
  retry: true,      // reconnect automatically on network failure
  filter: 'myapp/user_docs',   // optional: sync only relevant docs
});

sync.on('change', (change) => console.log('Synced:', change));
sync.on('error', (err) => console.error('Sync error:', err));

PouchDB is the canonical choice for progressive web apps, React Native apps, and any scenario where users need to work offline and sync reliably. The CouchDB replication protocol's idempotent, checkpoint-based design means a sync can be interrupted and resumed without data loss or duplicates.

What local storage backend does PouchDB use when running inside a web browser?
What property of the CouchDB replication protocol ensures a PouchDB sync can be interrupted and resumed without data loss?
30. What is Couchbase Sync Gateway and how does it relate to CouchDB's replication model?

Couchbase Sync Gateway is the replication middleware layer in the Couchbase mobile stack. It sits between mobile clients running Couchbase Lite (the embedded mobile database) and a Couchbase Server cluster, handling authentication, authorization, and document routing. Historically it implemented a subset of the CouchDB replication protocol so that CouchDB-compatible clients could sync against it, but Couchbase has since moved toward its own DCP-based (Database Change Protocol) sync approach in newer versions.

The relationship to CouchDB's replication model:

  • Early versions of Couchbase Sync Gateway exposed a CouchDB-compatible REST API and replication endpoint. This meant PouchDB could sync to Sync Gateway using exactly the same protocol it uses with CouchDB.
  • Sync Gateway adds access control channels — each document is tagged with channels and each user is granted access to specific channels. This is a layer that CouchDB itself does not provide natively (CouchDB's access control is at the database level, not document level).
  • From Couchbase Mobile 2.x onward, Couchbase Lite uses a proprietary BLIP-based WebSocket protocol (not the CouchDB HTTP replication protocol) for sync, diverging from CouchDB compatibility.

For CouchDB users, Sync Gateway is mainly relevant as a comparison point: if you need per-document access control with mobile sync, Sync Gateway's channel model is a more mature solution than CouchDB's validate_doc_update-based approach. Pure CouchDB users achieve similar results by combining PouchDB sync with per-user databases or filtered replication.

What was the historical relationship between early Couchbase Sync Gateway and the CouchDB replication protocol?
What access control feature does Couchbase Sync Gateway provide that standard CouchDB replication lacks natively?
31. How does CouchDB implement authentication — cookie auth, JWT, and proxy auth?

CouchDB supports four authentication mechanisms, configurable simultaneously. Each request is checked against the enabled handlers in order.

1. Basic Authentication — HTTP Basic Auth over HTTPS. Credentials are sent with every request. Simple to implement but requires HTTPS in production to avoid credential exposure.

curl -u admin:password http://localhost:5984/_session

2. Cookie (Session) Authentication — the most common for web apps. POST credentials to /_session to receive a session cookie, then use that cookie for subsequent requests. The cookie has a configurable timeout.

# Login and get a session cookie
curl -X POST http://localhost:5984/_session \
  -H "Content-Type: application/json" \
  -d '{"name":"alice","password":"s3cret"}'
# Set-Cookie: AuthSession=abc123...; Version=1; Secure; HttpOnly

# Use the session
curl -b "AuthSession=abc123..." http://localhost:5984/mydb/_all_docs

# Logout
curl -X DELETE http://localhost:5984/_session -b "AuthSession=abc123..."

3. JWT Authentication (CouchDB 3.0+) — validates a JSON Web Token in the Authorization: Bearer {token} header. The JWT must contain a sub claim (the username) and optionally _couchdb.roles. CouchDB verifies the signature using a configured HMAC secret or RSA public key — it does not issue JWTs, only validates them.

[jwt_auth]
required_claims = exp
[jwt_keys]
hmac:default = aGVsbG93b3JsZA==

4. Proxy Authentication — for reverse-proxy setups (nginx, HAProxy). The proxy authenticates the user externally and forwards the identity in headers (X-Auth-CouchDB-UserName, X-Auth-CouchDB-Roles, X-Auth-CouchDB-Token). CouchDB trusts the headers if the correct HMAC token is present.
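For proxy authentication, the trusted proxy computes the token from the shared secret. A sketch, assuming the token is the hex HMAC-SHA1 of the username keyed with the [couch_httpd_auth] secret (verify the exact scheme against your CouchDB version's documentation; the secret below is a made-up value):

```python
import hashlib
import hmac

def proxy_token(username: str, secret: str) -> str:
    """Hex HMAC-SHA1 of the username, keyed with the couch_httpd_auth secret."""
    return hmac.new(secret.encode(), username.encode(), hashlib.sha1).hexdigest()

# Headers a reverse proxy would forward after authenticating the user itself.
headers = {
    "X-Auth-CouchDB-UserName": "alice",
    "X-Auth-CouchDB-Roles": "editor,viewer",
    "X-Auth-CouchDB-Token": proxy_token("alice", "example-shared-secret"),
}
print(headers["X-Auth-CouchDB-Token"])
```

CouchDB recomputes the same HMAC from its configured secret and rejects the request if the token header does not match.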

Which CouchDB endpoint do you POST to in order to establish a session and receive a cookie?
In CouchDB JWT authentication, does CouchDB issue the JWT token to the client?
32. What is CouchDB's permission model — admin party, database admins, and database readers?

CouchDB has a two-tier permission hierarchy: server-level admins and database-level members. Understanding each tier and the dangerous default state ("admin party") is essential before deploying any CouchDB instance.

Admin Party — when CouchDB is first installed, there are no server admins configured. In this state, every request (including anonymous HTTP calls) has full admin privileges. This is the admin party. You must immediately create at least one server admin via Fauxton or the API to exit admin party mode. CouchDB 3.x requires an admin to be set during installation and will not start without one.

# Create the first server admin (exits admin party)
curl -X PUT http://localhost:5984/_node/_local/_config/admins/admin \
  -d '"mys3cretpass"'

Server Admins — stored in the CouchDB config file (not the _users database). They can create/delete databases, manage all users, and access all databases. There is no per-database restriction for server admins.

Database-level Security — set via the _security document on each database. Contains two lists:

  • admins — users and roles that can write design documents and manage the database's security settings.
  • members — users and roles that can read and write regular documents. If the members list is empty, the database is public-read.

Regular users are stored in the _users database as documents with IDs like org.couchdb.user:{username}. Roles are arbitrary strings assigned to users and checked against the database security object.

What is "admin party" in CouchDB?
Where are CouchDB server admin credentials stored?
33. How do you implement document-level security in CouchDB using validate_doc_update functions?

The validate_doc_update (VDU) function is a JavaScript function stored in a design document that CouchDB calls before every document write to that database. If the function throws an error, the write is rejected with the specified HTTP status and message. This is the primary mechanism for enforcing document-level business rules and security policies.

// In _design/security:
{
  "validate_doc_update": "function(newDoc, oldDoc, userCtx, secObj) {
    // Reject if not logged in
    if (!userCtx.name) {
      throw({ unauthorized: 'You must be logged in to write documents.' });
    }
    // Enforce required fields
    if (!newDoc.type) {
      throw({ forbidden: 'Documents must have a type field.' });
    }
    // Prevent changing the owner field after creation
    if (oldDoc && oldDoc.owner !== newDoc.owner) {
      throw({ forbidden: 'Cannot change document owner.' });
    }
    // Only admins can set status to archived
    if (newDoc.status === 'archived' && userCtx.roles.indexOf('_admin') === -1) {
      throw({ forbidden: 'Only admins can archive documents.' });
    }
  }"
}

The function receives four arguments:

  • newDoc — the document being written (the new version).
  • oldDoc — the existing document (null if this is a new document creation).
  • userCtx — the user context: { name, roles, db }. Roles include _admin for server admins plus any custom roles assigned in the user's document in the _users database.
  • secObj — the database's _security object.

Throw { unauthorized: "message" } to return HTTP 401 (authentication required). Throw { forbidden: "message" } to return HTTP 403 (permission denied). Any other JavaScript throw returns HTTP 500.
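The rejection semantics above can be modeled as a small local test harness. This is plain Python with hypothetical exception names, useful for unit-testing validation rules before porting them into the design document — the real function runs inside CouchDB's JavaScript query server:

```python
# Local model of the validate_doc_update rules above (not CouchDB code).
# Unauthorized maps to HTTP 401, Forbidden to HTTP 403.
class Unauthorized(Exception): pass
class Forbidden(Exception): pass

def validate_doc_update(new_doc, old_doc, user_ctx):
    if not user_ctx.get("name"):
        raise Unauthorized("You must be logged in to write documents.")
    if not new_doc.get("type"):
        raise Forbidden("Documents must have a type field.")
    if old_doc is not None and old_doc.get("owner") != new_doc.get("owner"):
        raise Forbidden("Cannot change document owner.")
    if new_doc.get("status") == "archived" and "_admin" not in user_ctx.get("roles", []):
        raise Forbidden("Only admins can archive documents.")

def http_status(exc):
    # Map a validation error to the HTTP status CouchDB would return.
    return 401 if isinstance(exc, Unauthorized) else 403
```

An anonymous write trips the first check (401); a member changing the owner field trips the third (403).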

What HTTP status code does CouchDB return when a validate_doc_update function throws { forbidden: "..." }?
What is the value of oldDoc in a validate_doc_update function when a brand-new document is being created?
34. What is a CouchDB _security object and how do you configure roles and members?

The _security object is a special per-database object read and written at /db/_security. It is not a regular document — it has no _rev and is not replicated. It defines which users and roles can act as admins (write design documents, change security) or members (read and write regular documents) for that specific database. Every database has one.

{
  "admins": {
    "names": ["alice", "bob"],
    "roles": ["db_admin_role"]
  },
  "members": {
    "names": ["charlie"],
    "roles": ["viewer", "editor"]
  }
}
# Set the _security object
curl -X PUT http://admin:pass@localhost:5984/mydb/_security \
  -H "Content-Type: application/json" \
  -d '{
    "admins":  { "names": ["alice"], "roles": ["db_admin_role"] },
    "members": { "names": [],        "roles": ["editor","viewer"] }
  }'

# Read the current _security object
curl http://admin:pass@localhost:5984/mydb/_security

Key behaviors:

  • If the members list is empty (both names and roles), the database is public — any user, authenticated or anonymous, can read and write regular documents.
  • Server admins bypass the _security object entirely — they always have full access to every database.
  • Roles are arbitrary strings. They are assigned to users in the _users database under the roles array in the user document. CouchDB does not provide a built-in role management UI; roles are managed by updating user documents.
  • Only server admins and database admins can modify the _security object.
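The access rules above can be condensed into a small predicate. This is an illustrative Python model of the documented behavior, not CouchDB internals:

```python
# Illustrative model of CouchDB's per-database read-access check.
def can_read(user_ctx, security):
    """user_ctx: {'name': ..., 'roles': [...]}; security: the _security object."""
    if "_admin" in user_ctx.get("roles", []):
        return True  # server admins bypass the _security object entirely
    members = security.get("members", {})
    names = members.get("names", [])
    roles = members.get("roles", [])
    if not names and not roles:
        return True  # empty members list: the database is public
    return (user_ctx.get("name") in names or
            any(r in roles for r in user_ctx.get("roles", [])))
```

With an empty members list even an anonymous user passes; once names or roles are set, only listed users, matching roles, and server admins get in.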
What access is granted when the members list in a CouchDB _security object is completely empty?
Where are roles assigned to a CouchDB user?
35. How do you enable SSL/TLS in CouchDB and what configuration is required?

CouchDB has a built-in HTTPS listener that can be enabled by adding a [ssl] section to the CouchDB configuration (local.ini or local.d/*.ini). No reverse proxy is required for basic TLS, though using nginx in front is common in production for certificate management and connection pooling.

[ssl]
enable = true
port = 6984
cert_file = /etc/couchdb/ssl/couchdb.pem
key_file  = /etc/couchdb/ssl/privkey.pem
# Optional: require and verify client certificates against this CA
cacert_file = /etc/couchdb/ssl/cacert.pem
verify_ssl_certificates = true
fail_if_no_peer_cert = true

# Additional Erlang SSL socket options
ssl_options = [{secure_renegotiate, true}]

Configuration steps:

  1. Generate or obtain a certificate and private key (Let's Encrypt, self-signed, or a commercial CA).
  2. Place the PEM files in a directory readable by the CouchDB process (but not world-readable).
  3. Add the [ssl] section to local.ini. CouchDB listens on port 6984 for HTTPS by default (the plaintext port 5984 continues to work unless you disable it).
  4. Restart CouchDB and verify: curl https://localhost:6984/
  5. In production, disable the plaintext listener by setting [chttpd] bind_address = 127.0.0.1 and routing all external traffic through the HTTPS port or a TLS-terminating reverse proxy.

For clustered setups, TLS should be configured both for client-facing traffic and for node-to-node replication traffic. The inter-node Erlang distribution channel can be secured using Erlang TLS distribution, though this requires additional Erlang configuration beyond the CouchDB config file.

What is the default HTTPS port CouchDB listens on when SSL is enabled?
What is the recommended way to prevent external plaintext HTTP access in a CouchDB production deployment?
36. How do you monitor CouchDB performance using the _stats and _active_tasks endpoints?

CouchDB exposes two key monitoring endpoints out of the box — the per-node _stats endpoint and /_active_tasks — which together give a real-time snapshot of server health and ongoing operations.

GET /_node/{node-name}/_stats (use _local as an alias for the node you are connected to; the root /_stats path is CouchDB 1.x only) returns a JSON object of cumulative performance counters organized by category. Key metrics to watch:

  • httpd.requests.value — total HTTP requests processed.
  • httpd_request_methods.{GET,PUT,POST,DELETE}.value — breakdown by HTTP verb.
  • httpd_status_codes.{200,201,400,404,409,500}.value — response code breakdown; rising 409s may indicate conflict storms; rising 500s indicate bugs.
  • couchdb.open_databases.value — number of databases currently open (compare to max_dbs_open).
  • couchdb.request_time.value — mean, min, max request latency.
curl http://admin:pass@localhost:5984/_node/_local/_stats | python3 -m json.tool | head -60

# On a cluster, query per node:
curl http://admin:pass@localhost:5984/_node/couchdb@node1/_stats

GET /_active_tasks returns a live array of currently running background tasks. Each task has a type field:

  • database_compaction — compaction progress as a percentage.
  • view_compaction — view index compaction progress.
  • indexer — a view index being built or incrementally updated.
  • replication — active replication job with checkpoint seq and docs per second.
curl http://admin:pass@localhost:5984/_active_tasks
# [{"type":"indexer","node":"couchdb@node1","design_document":"_design/orders",
#   "view":"by_status","started_on":1700000000,"updated_on":1700000010,
#   "progress":45}]

For production monitoring, both endpoints integrate with Prometheus via the community couchdb-exporter, allowing dashboards in Grafana alongside alerting on queue depth, error rates, and compaction lag.
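A monitoring script typically groups a captured _active_tasks response by task type and reports progress. A minimal sketch — the task field names follow the example above, and the sample data is hypothetical:

```python
# Group a parsed /_active_tasks response by task type, collecting
# the progress percentages reported by each task.
def summarize_tasks(tasks):
    summary = {}
    for t in tasks:
        summary.setdefault(t["type"], []).append(t.get("progress"))
    # Drop tasks that report no progress field (e.g. some replication jobs).
    return {k: sorted(p for p in v if p is not None) for k, v in summary.items()}

tasks = [
    {"type": "indexer", "design_document": "_design/orders", "progress": 45},
    {"type": "database_compaction", "database": "mydb", "progress": 80},
    {"type": "replication", "docs_written": 1200},
]
print(summarize_tasks(tasks))
# → {'indexer': [45], 'database_compaction': [80], 'replication': []}
```

Alerting on a compaction task stuck at the same progress value across polls is a cheap way to catch compaction lag.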

Which _stats metric would you watch to detect an unusual number of write conflicts in CouchDB?
What task type in _active_tasks represents an incremental view index update?
37. What are the key CouchDB configuration parameters to tune for production (max_dbs_open, os_process_limit, etc.)?

CouchDB's default configuration targets a single-developer workstation. Production deployments require tuning several parameters across different configuration sections:

Key CouchDB production configuration parameters:

  • [couchdb] max_dbs_open (default 500) — maximum number of database files open simultaneously. Each open database holds a file descriptor. Increase for servers with many databases; ensure OS ulimits allow it.
  • [couchdb] os_process_limit (default 100) — maximum JavaScript OS processes for the query server (views, VDU). Each concurrent JavaScript request consumes one process. Increase for high-concurrency view workloads.
  • [chttpd] workers (default 100) — HTTP request handler pool size. Increase for high concurrent request rates.
  • [couch_httpd_auth] timeout (default 600) — session cookie timeout in seconds.
  • [smoosh] (various) — auto-compaction daemon thresholds. min_priority controls when a database qualifies for compaction based on its data/file size ratio.
  • [rexi] buffer_count (default 2000) — internal message buffer for cluster inter-node RPC. Increase if you see rexi_buffer errors in logs.
  • [fabric] request_timeout (default 60000 ms) — timeout for cluster-level requests. Increase for slow queries over large datasets.

OS-level tuning is equally important: raise ulimit -n to at least 65535 file descriptors (each open database and each view index file counts). On Linux, set vm.swappiness=1 to keep the Erlang heap from being swapped. For high write throughput, mount the data filesystem with noatime to avoid inode-update I/O on every read.

What does the CouchDB [couchdb] max_dbs_open parameter control?
What resource does each active CouchDB JavaScript query server process consume that requires os_process_limit tuning?
38. How does CouchDB handle large document sets — what are the performance trade-offs of large vs many small documents?

CouchDB does not impose a low hard document size limit, but the max_document_size setting caps uploads — the default is 8 MB as of CouchDB 3.0 (earlier versions effectively allowed up to 4 GB). Either way, the performance trade-offs between storing data as a small number of large documents versus many small documents are significant.

Large documents (e.g., one document per entity with thousands of nested items):

  • Every update requires a full rewrite of the document, even if only one nested field changed. This amplifies write I/O and revision chain growth.
  • Replication transfers the entire document body on every change. For a 5MB document that changes frequently, this saturates replication bandwidth quickly.
  • MVCC conflicts are more likely and more costly to merge because the entire body must be transferred and compared.
  • Reading the document always loads the full JSON, even if only one field is needed (CouchDB has no projection at the storage layer — Mango fields projection happens after the document is loaded).

Many small documents (one document per event/record):

  • Updates are small and cheap; conflicts affect only the specific document touched.
  • Replication is incremental — only changed documents are transferred.
  • Mango and view queries can filter and paginate efficiently.
  • The trade-off: each document has overhead (~200 bytes for metadata). A database with 100 million tiny 50-byte documents will have metadata overhead larger than the data itself.

The recommended pattern: keep documents to a natural entity size (an order with its line items, not an order with the entire customer history). Avoid designs that require updating a single document at a rate faster than ~100 writes/second — high-frequency counters belong in Redis, not in a CouchDB document.
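The metadata-overhead trade-off is easy to quantify with back-of-envelope arithmetic, using the ~200-byte per-document figure above:

```python
# Back-of-envelope: per-document metadata overhead vs payload size.
DOC_OVERHEAD = 200  # approximate bytes of metadata per document

def overhead_ratio(doc_count, avg_doc_bytes):
    """Metadata bytes divided by payload bytes for a database."""
    data_bytes = doc_count * avg_doc_bytes
    meta_bytes = doc_count * DOC_OVERHEAD
    return meta_bytes / data_bytes

# 100 million 50-byte documents: metadata is 4x the data itself.
print(overhead_ratio(100_000_000, 50))   # → 4.0
# 1 million 5 KB documents: metadata is only ~4% of the data.
print(overhead_ratio(1_000_000, 5_000))  # → 0.04
```

The ratio depends only on average document size, which is why "natural entity size" documents (a few KB) sit in the sweet spot.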

Why is storing frequently-updated data as one large CouchDB document problematic for replication?
What is the primary trade-off when splitting data into millions of tiny CouchDB documents?
39. What is the CouchDB _changes feed and how do you use it for real-time event streaming?

The _changes feed is CouchDB's built-in event stream. It reports every document change (create, update, delete) in a database as a sequence of events, each with a sequence number (seq), document ID (id), list of changed revisions (changes), and optionally the full document body. It is the mechanism that powers replication and can also drive event-driven application architectures.

# One-shot: get all changes since the beginning
curl "http://admin:pass@localhost:5984/mydb/_changes"

# Long-polling: block until at least one change arrives
curl "http://admin:pass@localhost:5984/mydb/_changes?feed=longpoll&since=now"

# Continuous streaming feed (server-sent events style)
curl "http://admin:pass@localhost:5984/mydb/_changes?feed=continuous&since=now&heartbeat=5000"

# Include full document body in each change event
curl "http://admin:pass@localhost:5984/mydb/_changes?feed=continuous&include_docs=true&since=now"

# Resume from a checkpoint (since= is the last seq you processed)
curl "http://admin:pass@localhost:5984/mydb/_changes?feed=continuous&since=45-g1AAAA..."

# Filter by Mango selector (2.x+)
curl -X POST "http://admin:pass@localhost:5984/mydb/_changes?feed=continuous&since=now" \
  -H "Content-Type: application/json" \
  -d '{"selector":{"type":"order","status":"pending"}}'

Each event looks like:

{"seq":"46-g1AAAA...","id":"order:001","changes":[{"rev":"3-abc"}]}

The seq value is your cursor: always persist the last-processed seq to your consumer state store so you can resume without reprocessing. The continuous feed sends a newline heartbeat at regular intervals to keep HTTP connections alive through proxies. Eventsource (feed=eventsource) wraps changes in the Server-Sent Events format for direct browser consumption.
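The checkpoint discipline can be sketched as a consumer loop. This is plain Python over a simulated event list — a real consumer would read the HTTP feed, and real seq values in 2.x+ are opaque strings ("46-g1AAAA..."), so they are compared only for equality against your checkpoint, not numerically as here:

```python
# Sketch of a _changes consumer that persists its cursor after each event.
checkpoint = {"last_seq": None}  # stands in for a durable state store

def process(change):
    pass  # application logic for one change event goes here

def consume(events, since=None):
    """Process events after `since`, persisting last_seq as we go."""
    for ev in events:
        if since is not None and ev["seq"] <= since:
            continue  # already processed before the restart
        process(ev)
        checkpoint["last_seq"] = ev["seq"]  # persist AFTER processing succeeds
    return checkpoint["last_seq"]

events = [{"seq": 1, "id": "a"}, {"seq": 2, "id": "b"}, {"seq": 3, "id": "c"}]
consume(events)           # first run processes everything
consume(events, since=2)  # after a restart, resume past seq 2
```

Persisting the checkpoint only after process() succeeds gives at-least-once delivery; persisting before would risk losing events on a crash.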

Which _changes feed parameter value creates a persistent streaming connection that continuously delivers events as they occur?
Why should a _changes consumer always persist the last-processed seq value?
40. What are CouchDB update handlers and how do they differ from direct PUT operations?

Update handlers are server-side JavaScript functions stored in a design document under the updates key. They allow you to perform document transformations atomically on the server without a client round-trip — the client sends a POST request, and the update handler reads the current document, applies business logic, and returns the modified document in a single operation.

// In _design/handlers:
{
  "updates": {
    "increment_stock": "function(doc, req) {
      if (!doc) { doc = { _id: req.id, stock: 0, type: 'item' }; }
      var body = JSON.parse(req.body);
      doc.stock = (doc.stock || 0) + (body.amount || 1);
      doc.last_updated = new Date().toISOString();
      return [doc, toJSON({ ok: true, new_stock: doc.stock })];
    }"
  }
}
# Call the update handler
curl -X POST \
  "http://admin:pass@localhost:5984/mydb/_design/handlers/_update/increment_stock/item:001" \
  -H "Content-Type: application/json" \
  -d '{"amount": 5}'
# {"ok":true,"new_stock":155}

Differences from a direct PUT:

  • No client round-trip — the client does not need to first GET the document to read the current _rev and current field values; the handler receives both and returns the updated document.
  • Atomic transformation — the read, compute, and write happen within a single server-side operation, reducing MVCC conflict probability for frequently-updated counters or timestamps.
  • Custom response body — the handler can return any JSON in the response, not just the standard {"ok":true,"rev":...}.
  • When not to use them — update handlers are deprecated in CouchDB 3.x (along with list and show functions). The same logic implemented in your application using a read-modify-write loop is more maintainable and testable.
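The recommended application-side replacement is a read-modify-write loop that retries on a 409 conflict. A sketch against an in-memory stand-in for the document store (the get/put helpers are hypothetical, not a CouchDB client):

```python
# Read-modify-write with retry on conflict, modeled in memory.
class Conflict(Exception): pass  # stands in for HTTP 409

store = {"item:001": {"_rev": "1-a", "stock": 150}}

def get(doc_id):
    return dict(store[doc_id])

def put(doc_id, doc):
    if store[doc_id]["_rev"] != doc["_rev"]:
        raise Conflict()  # stale _rev: CouchDB would return HTTP 409
    doc["_rev"] = str(int(doc["_rev"].split("-")[0]) + 1) + "-x"
    store[doc_id] = doc

def increment_stock(doc_id, amount, max_retries=5):
    for _ in range(max_retries):
        doc = get(doc_id)       # read current _rev and field values
        doc["stock"] += amount  # apply the change
        try:
            put(doc_id, doc)    # write; fails if _rev went stale meanwhile
            return doc["stock"]
        except Conflict:
            continue            # a concurrent writer won; re-read and retry
    raise Conflict("gave up after retries")

print(increment_stock("item:001", 5))  # → 155
```

The loop lives in testable application code rather than in a design document, which is exactly the trade the deprecation pushes you toward.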
What is the main operational advantage of a CouchDB update handler over a direct client PUT for frequently updated fields?
What does the update handler function return to CouchDB?
41. What are CouchDB show functions and when were they deprecated?

Show functions are server-side JavaScript functions stored in a design document under the shows key. They transform a single document into any output format (HTML, XML, plain text) directly from CouchDB, without requiring a separate application server. When a client calls GET /db/_design/ddoc/_show/func_name/doc_id, CouchDB fetches the document, passes it to the show function, and returns the function's output as the HTTP response.

// In _design/render:
{
  "shows": {
    "as_html": "function(doc, req) {
      if (!doc) { return { code: 404, body: 'Not found' }; }
      return {
        headers: { 'Content-Type': 'text/html' },
        body: '<h1>' + doc.name + '</h1><p>' + doc.description + '</p>'
      };
    }"
  }
}
GET /mydb/_design/render/_show/as_html/product:001

Show functions were primarily used in CouchApps — self-contained web applications where HTML pages, CSS, JavaScript, and data were all served from a single CouchDB database. The appeal was zero-infrastructure: the database was the entire application stack. Show functions rendered individual documents as HTML pages; list functions (Q20) rendered view query results.

Deprecation: Show functions were officially deprecated in CouchDB 3.0 (released 2020) along with list functions and the legacy JavaScript-based rewrites system, and they are slated for removal in a future major version. The recommended approach is to handle document rendering in your application layer — any web framework can fetch a document via the REST API and render it. The JavaScript query server overhead, security isolation challenges, and limited debugging tooling made CouchApps impractical at scale.

In which CouchDB version were show functions officially deprecated?
What application architecture pattern relied heavily on CouchDB show and list functions?
42. How do you back up and restore a CouchDB database?

CouchDB does not have a dedicated backup command like mysqldump. The recommended backup approaches depend on your deployment type and RPO requirements:

1. Replication-based backup (recommended for live systems) — replicate the database to a dedicated backup CouchDB instance (local or remote). Because CouchDB replication is idempotent and incremental, subsequent backup runs only copy changed documents. Schedule it via the _replicator database or a cron job calling /_replicate.

# One-shot backup to a backup server
curl -X POST http://admin:pass@localhost:5984/_replicate \
  -H "Content-Type: application/json" \
  -d '{
    "source": "http://localhost:5984/production_db",
    "target": "http://backup:5984/production_db_backup_2024_03",
    "create_target": true
  }'

2. File-system snapshot — stop CouchDB (or freeze I/O via OS-level snapshot), copy the .couch database files from the data directory (/var/lib/couchdb/ on Linux), then restart. Simple but requires downtime or snapshot coordination.

3. couchdb-backup / couchdbdump tools — community tools like couchdb-backup or couchdbdump serialize all documents to a JSON/ndjson file using the _all_docs or _changes endpoint.

# Dump all documents to ndjson using the _all_docs feed
curl "http://admin:pass@localhost:5984/mydb/_all_docs?include_docs=true" \
  | python3 -c "import sys,json; [print(json.dumps(r['doc'])) for r in json.load(sys.stdin)['rows'] if not r['id'].startswith('_design')]" \
  > mydb_backup_$(date +%F).ndjson

# Restore by posting the dump to _bulk_docs. new_edits:false tells CouchDB
# to store the documents with their original _rev values instead of
# rejecting them as conflicting edits against an empty database.
jq -sc '{docs: ., new_edits: false}' mydb_backup_2024-03-15.ndjson | \
  curl -X POST http://admin:pass@localhost:5984/mydb_restore/_bulk_docs \
  -H "Content-Type: application/json" -d @-

For clustered deployments, replicate per-database since there is no single database file to snapshot. Always verify backups by doing a test restore periodically.

What is the recommended backup approach for a live CouchDB production database?
What CouchDB endpoint is most commonly used to export all documents for a JSON-file backup?
43. How does CouchDB compare to MongoDB for document storage use cases?

Both CouchDB and MongoDB are JSON document databases, but they make fundamentally different architectural choices that determine where each excels.

CouchDB vs MongoDB for document storage:

  • Query interface — CouchDB: HTTP REST, usable from any HTTP client, plus Mango JSON queries. MongoDB: binary wire protocol; requires a language driver.
  • Query power — CouchDB: Mango (limited) plus MapReduce; no aggregation pipeline. MongoDB: rich aggregation pipeline, $lookup (join), text search, geospatial.
  • Replication / sync — CouchDB: first-class HTTP peer-to-peer; PouchDB offline-first. MongoDB: replica sets and change streams; no offline-first protocol.
  • Conflict handling — CouchDB: multi-master; conflicts surfaced and resolved by the app. MongoDB: single primary per replica set; no write conflicts by design.
  • Transactions — CouchDB: ACID per document; no multi-document transactions. MongoDB: ACID multi-document, multi-collection transactions (4.0+).
  • Write throughput — CouchDB: lower (fsync per write, append-only B-tree). MongoDB: higher (WiredTiger storage with group commit).
  • Mobile / offline sync — CouchDB: excellent; PouchDB is production-grade. MongoDB: Atlas Device Sync (commercial); no free equivalent.
  • Deployment simplicity — CouchDB: single binary, zero dependencies. MongoDB: more complex; requires mongod plus a replica set for production HA.

Choose CouchDB when the offline-first / mobile sync use case is central, when HTTP-native access matters (IoT, edge devices, no-driver environments), or when you need a simple embedded-friendly document store. Choose MongoDB when you need a rich aggregation pipeline, multi-document ACID transactions, geospatial queries, or high write throughput at scale.

Which feature gives MongoDB a significant query advantage over CouchDB for complex analytical queries?
For which primary use case is CouchDB clearly superior to MongoDB?
44. What are common CouchDB anti-patterns and how do you avoid them?

Several CouchDB anti-patterns cause performance degradation, excessive conflicts, or runaway disk usage. Understanding them helps you design applications that work with CouchDB's architecture rather than against it.

  • High-frequency counter documents — updating a single document hundreds of times per second (e.g., a page-view counter) creates a conflict storm and exponential revision chain growth. Solution: batch counter increments, use a reduce view for aggregation, or keep counters in Redis.
  • Using skip for deep pagination — skip=10000&limit=20 forces a full scan of 10000 index rows per page load. Solution: use key-cursor pagination with startkey + startkey_docid.
  • Querying views without indexes — running Mango _find queries without a matching json index causes full database scans. Always create a Mango index for fields used in production selectors and verify with _explain.
  • Storing large binaries as attachments — attachments over ~1 MB bloat the database file and slow compaction. Use an object store (S3, MinIO) and store the URL in the document.
  • Changing design documents frequently — every design document change triggers a full view index rebuild. In high-write databases this causes prolonged indexing load. Batch design document changes; test index changes in staging with realistic data volumes before deploying.
  • Never running compaction — an update-heavy database that is never compacted can grow to 10-100x the size of its live data. Configure the smoosh auto-compaction daemon or schedule nightly compaction.
  • Ignoring unresolved conflicts — conflicts from multi-master replication accumulate silently. Build a conflict-detection routine into your application and resolve them regularly.
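The key-cursor alternative to skip can be sketched over an in-memory sorted index. This is illustrative only — a real query passes the cursor back as startkey and startkey_docid on the view endpoint, and the B-tree seeks to it directly:

```python
# Key-cursor pagination: resume from the last (key, doc_id) pair
# instead of skipping N rows on every page.
rows = sorted(("pending", f"order:{i:03d}") for i in range(10))  # fake view index

def page(rows, limit, start=None):
    """Return one page of rows plus the cursor for the next page."""
    if start is not None:
        rows = [r for r in rows if r > start]  # a real B-tree seeks here in O(log n)
    chunk = rows[:limit]
    next_cursor = chunk[-1] if len(chunk) == limit else None  # None on the last page
    return chunk, next_cursor

p1, cur = page(rows, 4)
p2, cur = page(rows, 4, start=cur)
print([r[1] for r in p2])  # → ['order:004', 'order:005', 'order:006', 'order:007']
```

Each page costs the same regardless of depth, which is exactly what skip-based pagination fails to guarantee.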
What is the recommended solution for a CouchDB document that needs to track a high-frequency counter (hundreds of increments per second)?
What operational consequence follows from frequently updating a CouchDB design document in a high-write production database?
45. How do you migrate data between CouchDB versions or instances?

CouchDB provides several migration paths depending on whether you are upgrading in place, moving to a new cluster, or changing data structure during migration.

1. Replication-based migration (zero-downtime, recommended)

# Step 1: Replicate from old instance to new
curl -X POST http://admin:pass@new-couch:5984/_replicator \
  -H "Content-Type: application/json" \
  -d '{
    "_id": "migrate-orders",
    "source": "http://admin:pass@old-couch:5984/orders",
    "target": "http://admin:pass@new-couch:5984/orders",
    "continuous": true,
    "create_target": true
  }'

# Step 2: Monitor until caught up
curl http://admin:pass@new-couch:5984/_scheduler/docs

# Step 3: Stop writes to old instance, verify new instance is current
# (compare document counts and last seq numbers)

# Step 4: Switch application connection string to new instance
# Step 5: Stop the replication job and decommission old instance

2. In-place upgrade (CouchDB 1.x to 2.x/3.x) — CouchDB 2.x can read CouchDB 1.x database files, but they appear as node-local (non-clustered) databases; the couchup utility shipped with 2.x replicates them into proper clustered databases. The cluster setup and configuration format changed significantly between major versions, so review the upgrade guide for your specific version pair before pointing a new install at an old data directory.

3. Document transformation during migration — if the schema changes (adding required fields, renaming fields), write a migration script that reads from the source using _all_docs or _changes, transforms each document, and writes to the target via _bulk_docs. Process in batches of 100-500 documents to avoid memory pressure.
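The batching in step 3 can be sketched as follows. The transform and batch size are illustrative; a real script would page through _all_docs on the source and POST each payload to the target's _bulk_docs:

```python
# Transform documents in fixed-size batches to keep memory bounded.
def batches(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

def transform(doc):
    doc = dict(doc, schema_version=2)  # example schema change
    doc.pop("_rev", None)              # let the target assign fresh revisions
    return doc

source_docs = ({"_id": f"doc:{i}", "_rev": "1-a"} for i in range(1234))
written = 0
for batch in batches(source_docs, 500):
    payload = {"docs": [transform(d) for d in batch]}
    # requests.post(f"{target_url}/_bulk_docs", json=payload) in a real script
    written += len(payload["docs"])
print(written)  # → 1234
```

Using a generator for the source keeps only one batch in memory at a time, which matters when migrating millions of documents.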

# Estimate progress: compare doc_count on both sides
curl http://admin:pass@old:5984/orders/ | python3 -c "import sys,json; d=json.load(sys.stdin); print('old:', d['doc_count'])"
curl http://admin:pass@new:5984/orders/ | python3 -c "import sys,json; d=json.load(sys.stdin); print('new:', d['doc_count'])"

Always verify the migration by comparing document counts, running a sample of queries on both instances, and doing a test cutover before the production switch.

Why is replication-based migration preferred over file-copy migration for moving between CouchDB instances?
What is the recommended batch size for writing transformed documents to the target during a CouchDB schema-migration script?