Database / CouchDB Interview Questions
Apache CouchDB is an open-source NoSQL document database that stores all data as self-contained JSON documents and exposes its entire API over plain HTTP/HTTPS. No proprietary wire protocol or special client driver is required. It was created by Damien Katz, open-sourced in 2005, and graduated to an Apache Software Foundation top-level project in 2008. The 2.x line introduced native clustering (adopted from Cloudant's BigCouch) on the existing Erlang/OTP foundation, and the 3.x line builds on it.
Three properties set CouchDB apart from relational databases:
- Schema-free JSON documents — there are no tables or fixed column definitions. Each document in a database can have a completely different structure, and adding a field to one document has zero effect on any other.
- HTTP as the primary interface — every operation (CRUD, querying, replication, admin) is a plain HTTP request. You can interact with CouchDB using curl, a browser, or any HTTP library without installing a database driver.
- Built-in, protocol-level replication — CouchDB's replication is a peer-to-peer HTTP protocol where any node can replicate to or from any other node, supporting master-master setups, offline sync, and mobile clients natively.
Relational databases enforce ACID across multi-row, multi-table transactions using row-level locks and a shared transaction log. CouchDB provides ACID at the single-document level through MVCC and an append-only B-tree engine. There are no JOINs; relationships are denormalized or referenced by document ID.
CouchDB uses a document data model: all data is stored as discrete JSON objects grouped into databases. There is no enforced schema — documents in the same database can have entirely different fields.
Every CouchDB document has two mandatory system fields:
- `_id` — the unique primary key. If omitted, CouchDB generates a UUID. Typed prefix conventions like `"order:2024-001"` co-locate related documents in the B-tree.
- `_rev` — the current revision token in the format `{generation}-{md5hash}`. You must supply the current `_rev` on every update or delete.
{
"_id": "order:2024-00188",
"_rev": "2-7d3a9f012b4e8c56ab1d2ef3",
"type": "order",
"customer_id": "user:42",
"items": [
{ "sku": "WIDGET-01", "qty": 3, "price": 9.99 },
{ "sku": "GADGET-07", "qty": 1, "price": 49.00 }
],
"total": 78.97,
"status": "pending",
"created_at": "2024-03-15T08:22:00Z",
"_attachments": {
"invoice.pdf": {
"content_type": "application/pdf",
"length": 48312,
"stub": true
}
}
}
Values can be any valid JSON type: strings, numbers, booleans, arrays, or nested objects. Binary data is stored as attachments under the reserved _attachments key. Special system documents — design documents (_design/) and local documents (_local/) — live in the same database namespace but are handled differently by the replication engine.
CouchDB maps every database operation to a standard HTTP method and URL. The server root is typically http://localhost:5984. No driver installation is required — curl or any HTTP client works directly.
# Create a database
curl -X PUT http://admin:pass@localhost:5984/inventory
# {"ok":true}
# Create a document with a specific _id (PUT)
curl -X PUT http://admin:pass@localhost:5984/inventory/item:001 \
-H "Content-Type: application/json" \
-d '{"type":"item","name":"Widget","stock":150}'
# {"ok":true,"id":"item:001","rev":"1-3c6a8..."}
# Create a document with auto-generated _id (POST)
curl -X POST http://admin:pass@localhost:5984/inventory \
-H "Content-Type: application/json" \
-d '{"type":"item","name":"Gadget","stock":30}'
# Read a document
curl http://admin:pass@localhost:5984/inventory/item:001
# Returns JSON with _id and _rev
# Update — _rev is mandatory in the body
curl -X PUT http://admin:pass@localhost:5984/inventory/item:001 \
-H "Content-Type: application/json" \
-d '{"_rev":"1-3c6a8...","type":"item","name":"Widget","stock":200}'
# {"ok":true,"id":"item:001","rev":"2-9b4f1..."}
# Delete — pass _rev as query param
curl -X DELETE "http://admin:pass@localhost:5984/inventory/item:001?rev=2-9b4f1..."
# {"ok":true,"id":"item:001","rev":"3-d2e79..."}
# Bulk upsert
curl -X POST http://admin:pass@localhost:5984/inventory/_bulk_docs \
-H "Content-Type: application/json" \
-d '{"docs":[{"type":"item","name":"Part-A"},{"type":"item","name":"Part-B"}]}'
HTTP status codes follow REST conventions: 201 Created on success, 200 OK for reads, 404 Not Found, and 409 Conflict when the supplied _rev does not match the server's current revision. The _bulk_docs endpoint accepts thousands of documents per request, dramatically reducing round-trips for bulk loads.
MVCC in CouchDB means every write produces a new immutable version of the document instead of modifying data in place. Readers always see a consistent snapshot from the moment they start reading; no read locks are acquired. The append-only B-tree storage engine keeps old revisions on disk until compaction removes them.
The conflict mechanism works like this: when two concurrent writers both read a document at revision 2-abc and both try to PUT with _rev: "2-abc", only the first writer to reach the storage engine succeeds. The second receives HTTP 409 Conflict immediately.
# Both clients read revision 2-abc...
# Client A succeeds:
curl -X PUT http://localhost:5984/db/doc1 \
-d '{"_rev":"2-abc","value":10}'
# 201: {"ok":true,"rev":"3-xyz"}
# Client B fails — same _rev already superseded:
curl -X PUT http://localhost:5984/db/doc1 \
-d '{"_rev":"2-abc","value":20}'
# 409: {"error":"conflict","reason":"Document update conflict."}
The standard resolution is a read-modify-write retry loop: on 409, re-read the document to get the latest _rev, apply the business logic to the fresh body, and retry the PUT. This is optimistic locking enforced at the protocol level.
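The retry loop can be sketched in Python against a toy in-memory store that mimics CouchDB's rev check (all names here are illustrative — a real client would issue the GET/PUT calls over HTTP):

```python
import uuid

class ConflictError(Exception):
    """Stands in for CouchDB's HTTP 409 response."""

class MiniStore:
    """Toy in-memory stand-in for CouchDB's per-document rev check."""
    def __init__(self):
        self.docs = {}  # _id -> (rev, body)

    def get(self, doc_id):
        rev, body = self.docs.get(doc_id, (None, {}))
        return rev, dict(body)

    def put(self, doc_id, rev, body):
        current_rev, _ = self.docs.get(doc_id, (None, None))
        if rev != current_rev:                      # stale _rev -> 409 Conflict
            raise ConflictError(doc_id)
        gen = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        new_rev = f"{gen}-{uuid.uuid4().hex[:8]}"   # {generation}-{hash}
        self.docs[doc_id] = (new_rev, body)
        return new_rev

def update_with_retry(store, doc_id, mutate, max_attempts=5):
    """Read-modify-write loop: on conflict, re-read the latest rev and retry."""
    for _ in range(max_attempts):
        rev, body = store.get(doc_id)
        mutate(body)                                # business logic on the fresh body
        try:
            return store.put(doc_id, rev, body)
        except ConflictError:
            continue                                # another writer won; re-read and retry
    raise RuntimeError("gave up after repeated conflicts")
```

The key point is that `mutate` is re-applied to the freshly read body on every attempt, so no writer's change is silently overwritten.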
In multi-master replication scenarios, two nodes can independently accept writes to the same document. These produce open conflicts stored as sibling revisions, visible via ?conflicts=true. The application must explicitly merge or discard the losing revision to resolve them (see Q27).
The _rev field is CouchDB's revision token — a unique identifier for a specific version of a document. Its format is {generation}-{hash} where generation is a monotonically increasing integer (starting at 1) and hash is an MD5 of the document body. Example: "1-967a00dff5e02add41819138abb3284d". After one update it becomes something like "2-7051cbe5c8faecd085a3fa619e6e6337".
Why it is mandatory for updates and deletes: The _rev is the MVCC optimistic-lock token. CouchDB compares the supplied _rev against what the storage engine holds. A match means the write is based on the current state — the operation proceeds and a new revision is assigned. A mismatch means another writer updated the document since you read it — CouchDB returns HTTP 409 Conflict, preventing silent data loss.
# Read — always capture the returned _rev
curl http://localhost:5984/db/doc1
# {"_id":"doc1","_rev":"1-abc","name":"Alice"}
# Update: supply the exact current _rev in the body
curl -X PUT http://localhost:5984/db/doc1 \
-H "Content-Type: application/json" \
-d '{"_rev":"1-abc","name":"Alice Smith"}'
# Response: {"ok":true,"rev":"2-def"}
# Delete: supply _rev as a query parameter
curl -X DELETE "http://localhost:5984/db/doc1?rev=2-def"
# Writes a tombstone: {"_id":"doc1","_rev":"3-xyz","_deleted":true}
Deleting a document does not remove it physically. CouchDB writes a tombstone — a minimal document with _deleted: true at the next revision — so the deletion event replicates correctly to other nodes. Tombstones are only removed by the _purge API, which bypasses the replication system and should be used with care.
CouchDB stores each database as a single file on disk structured around an append-only B-tree. There are multiple B-trees per database file: one for documents and one for each view index. Every write — new document, updated revision, or index update — is appended to the end of the file. The existing bytes are never modified in place. New B-tree nodes are written at the end, and a small database header near the end of the file is atomically updated to point to the new B-tree root.
Three important consequences of this design:
- Crash safety without a WAL — a crash mid-write at most leaves an incomplete append at the tail. On restart, CouchDB scans backward for the last valid database header and discards any partial write. No separate Write-Ahead Log is needed.
- No read locks — the previous B-tree root remains valid until the header atomically advances. Concurrent readers always see a consistent snapshot, which is the physical basis of MVCC.
- Simple fsync durability — CouchDB calls fsync after writing each committed transaction before returning 201 to the client, guaranteeing data is on stable storage.
The trade-off: the file grows with every write because old revisions accumulate as unreachable B-tree nodes. This is why compaction is essential in write-heavy workloads. In CouchDB 3.x each shard of a clustered database is its own append-only file following the same model.
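A toy model of the append-only design makes the "no read locks" property concrete (illustrative Python, not the real Erlang internals — a list stands in for the file and a dict for the B-tree root):

```python
class AppendOnlyDB:
    """Toy model of the append-only file: writes append at the tail, then a
    header pointer is swapped atomically; readers keep the root they started with."""
    def __init__(self):
        self.log = []      # the "file": an append-only list of (doc_id, body)
        self.header = {}   # the "B-tree root": doc_id -> offset of current revision

    def write(self, doc_id, body):
        self.log.append((doc_id, body))      # 1. append new bytes, never overwrite
        new_header = dict(self.header)       # 2. build a new root pointing at them
        new_header[doc_id] = len(self.log) - 1
        self.header = new_header             # 3. atomic pointer swap (fsync here)

    def snapshot(self):
        return self.header                   # a reader pins the root it sees now

    def read(self, snap, doc_id):
        return self.log[snap[doc_id]][1]     # old roots keep resolving old offsets
```

Because a writer never mutates existing entries or an old header, a reader holding an old snapshot keeps seeing a consistent view for free.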
Database compaction rewrites a CouchDB database file from scratch, retaining only the current (winning) revision of each document and discarding all stale revisions and orphaned B-tree nodes. Because CouchDB uses an append-only storage engine, every update grows the file. A database with millions of updates can be orders of magnitude larger than the size of its live data. Compaction reclaims that space.
When to run compaction:
- After a bulk data migration or large import that produced deep revision chains.
- When disk usage grows significantly faster than the document count (high update churn).
- On a regular nightly schedule in write-heavy production systems.
- Modern CouchDB supports automatic compaction via `smoosh` (the built-in compaction daemon, enabled by default since 3.0), which fires when the ratio of live data to total file size drops below a configurable threshold.
# Manually trigger compaction on a database
curl -X POST http://admin:pass@localhost:5984/mydb/_compact
# {"ok":true}
# Compact view indexes of a specific design document
curl -X POST http://admin:pass@localhost:5984/mydb/_compact/my_ddoc
# {"ok":true}
# Monitor progress
curl http://admin:pass@localhost:5984/_active_tasks
# Shows compaction tasks with "progress" percentage
During compaction the database stays fully online — CouchDB continues serving reads and writes from the old file and atomically switches to the new file when compaction finishes. View compaction is separate from document compaction; each design document's index has its own compaction command.
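The effect of compaction can be sketched as a simple rewrite pass (illustrative Python — the real `/_compact` walks B-trees, not a flat list):

```python
def compact(log):
    """Rewrite an append-only log keeping only each document's winning
    (highest-generation) revision; stale revisions are simply not copied over."""
    winners = {}
    for doc_id, gen, body in log:                 # scan the old file
        if doc_id not in winners or gen > winners[doc_id][0]:
            winners[doc_id] = (gen, body)
    # the "new file": one entry per live document, old revisions discarded
    return [(doc_id, gen, body) for doc_id, (gen, body) in sorted(winners.items())]
```

A document updated a thousand times contributes a thousand entries to the old file but exactly one to the compacted file, which is why high-churn databases shrink dramatically.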
Attachments in CouchDB are binary blobs stored directly alongside a document under the reserved _attachments key. Each attachment has a filename, a MIME content type, byte length, and an MD5 digest. They are stored in the same database file as the document but transferred separately — a GET on the document returns only metadata stubs by default, not the binary payload.
# Attach a PDF to an existing document (must supply current _rev)
curl -X PUT \
"http://admin:pass@localhost:5984/contracts/contract:1001/agreement.pdf?rev=2-abc" \
-H "Content-Type: application/pdf" \
--data-binary @agreement.pdf
# {"ok":true,"id":"contract:1001","rev":"3-xyz"}
# Fetch just the raw binary
curl http://admin:pass@localhost:5984/contracts/contract:1001/agreement.pdf
# Inline all attachment data in the document response
curl "http://admin:pass@localhost:5984/contracts/contract:1001?attachments=true"
Good use cases for attachments:
- Small binary files that must replicate alongside their parent document — thumbnails, QR codes, small PDFs.
- Offline-first mobile apps using PouchDB where images must sync alongside document metadata.
When to avoid them: Each attachment write bumps the document's _rev, making concurrent updates prone to 409 conflicts. Large attachments (over ~1 MB) bloat the database file and increase compaction time significantly. For large media, store the file in an object store (S3, MinIO, Cloudflare R2) and keep only the URL in the CouchDB document.
CouchDB and Couchbase are two distinct products. CouchDB is an Apache project. Couchbase emerged from the 2011 merger of CouchOne (the company behind CouchDB) and Membase (a Memcached-compatible store). The products diverged sharply afterward and now target different use cases with different architectures.
| Aspect | Apache CouchDB | Couchbase Server |
|---|---|---|
| Primary use case | Offline-first sync, HTTP-native document store | High-performance operational database with caching |
| Query language | MapReduce views + Mango (MongoDB-style) | N1QL — SQL for JSON |
| Primary API | HTTP REST — no driver needed | Language SDKs (Java, .NET, Node.js, etc.) |
| Replication / mobile sync | HTTP peer-to-peer; PouchDB for offline-first | XDCR for cross-datacenter; Couchbase Lite + Sync Gateway for mobile |
| In-memory caching | None built in | Managed RAM cache (Memcached heritage) |
| Indexing | Incremental MapReduce B-tree; Mango JSON indexes | Global Secondary Indexes, Full-Text Search, Analytics Service |
| Licensing | Apache 2.0 — fully open source | Community Edition (OSS) + Enterprise (commercial) |
Pick CouchDB for a lightweight, HTTP-accessible document store with excellent offline/mobile sync via PouchDB. Pick Couchbase when you need sub-millisecond latency, high concurrent throughput, N1QL SQL analytics, or the integrated Couchbase Lite mobile platform.
CouchDB is an AP system — it prioritizes Availability and Partition tolerance over strict Consistency. When a network partition occurs, CouchDB nodes on either side continue accepting reads and writes rather than refusing requests to maintain linearizability. The result is that two nodes can hold diverged versions of the same document (called open conflicts) until replication heals the partition.
CouchDB's consistency model is eventual consistency: after a partition heals and replication runs, all nodes converge. The conflict resolution mechanism — the deterministic winning-revision algorithm plus application-level merge — is how convergence is achieved without a global coordinator.
The AP versus CP distinction surfaces in these specific scenarios:
- Multi-master replication — both nodes independently accept writes to the same document. The replication protocol syncs them and surfaces the conflict for application resolution.
- CouchDB 3.x cluster quorum settings — the write quorum `w` and read quorum `r` default to a majority of the `n` replicas. Raising both to `n` makes the cluster refuse writes when a node is down, shifting behavior toward CP at the cost of availability.
This AP design is the reason CouchDB excels in offline-first and mobile applications via PouchDB: the mobile client writes locally (always available) and syncs to the server when connectivity returns, with conflicts resolved deterministically.
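The quorum arithmetic behind the AP/CP trade-off is small enough to sketch (hedged: real clusters also involve sharding; these helper names are made up):

```python
def default_quorums(n):
    """CouchDB-style defaults: read quorum r and write quorum w are each a
    majority of the n replica copies."""
    majority = n // 2 + 1
    return majority, majority

def write_accepted(w, replicas_up):
    """A write succeeds only if at least w replica copies can be written."""
    return replicas_up >= w
```

With the default `n=3`, `w=2`, one node can be down and writes still succeed (availability); setting `w=3` trades that availability for stronger consistency.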
Design documents are special CouchDB documents whose IDs begin with _design/. They live in the same database as regular documents but hold server-side JavaScript code that CouchDB's query server executes. Updating a design document invalidates and rebuilds all its associated indexes.
A design document can contain the following sections:
- `views` — MapReduce index definitions. Each view has a `map` function and an optional `reduce` function.
- `indexes` — Mango (json/text) index definitions for the `_find` endpoint.
- `validate_doc_update` — a JavaScript function that runs before any document is saved; throw an error to reject the write.
- `filters` — JavaScript functions used to filter which documents are replicated or streamed via the `_changes` feed.
- `updates` — update handler functions that let you perform server-side document transformations via a POST request.
- `lists` and `shows` — legacy functions (deprecated in 3.x) for server-side rendering of view results and individual documents as HTML/XML/text.
{
"_id": "_design/orders",
"views": {
"by_status": {
"map": "function(doc){ if(doc.type==='order') emit(doc.status, doc.total); }",
"reduce": "_sum"
}
},
"validate_doc_update": "function(newDoc, oldDoc, userCtx){ if(!newDoc.type) throw({forbidden:'type required'}); }",
"filters": {
"pending_only": "function(doc, req){ return doc.type==='order' && doc.status==='pending'; }"
}
}
Design documents are versioned just like regular documents and replicate alongside data documents. Changing a design document in a replicated database will propagate the new index definitions to all replica nodes.
MapReduce views are CouchDB's primary indexing mechanism. A view has a map phase (mandatory) and an optional reduce phase. The map function is a JavaScript function that CouchDB runs against every document in the database. For each document it emits zero or more key-value pairs. CouchDB stores these emissions in a B-tree index, kept sorted by key. The reduce function (when present) aggregates values within a key range.
Views are defined inside design documents under the views key:
{
"_id": "_design/products",
"views": {
"by_category": {
"map": "function(doc) { if (doc.type === 'product' && doc.category) { emit(doc.category, { name: doc.name, price: doc.price }); } }"
},
"price_by_category": {
"map": "function(doc) { if (doc.type === 'product') { emit(doc.category, doc.price); } }",
"reduce": "_sum"
},
"by_compound_key": {
"map": "function(doc) { if (doc.type === 'order') { emit([doc.year, doc.month, doc.day], 1); } }"
}
}
}
Key points about map functions:
- The `emit(key, value)` call adds an entry to the index. A single document can emit multiple times, creating multiple index entries.
- Keys can be strings, numbers, arrays, or null. Arrays support compound-key queries — range queries on `[year, month]` work naturally.
- The value can be any JSON. Emitting `null` as the value and using `include_docs=true` in the query avoids duplicating the full document in the index.
- Map functions must be pure (no side effects, no external HTTP calls) and deterministic.
Views are built lazily on first query and updated incrementally on subsequent queries — only documents changed since the last index update are re-processed.
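The lazy, incremental build can be modeled in a few lines (illustrative Python — real CouchDB tracks a per-index update sequence against the database's change sequence, and the index is an on-disk B-tree, not a dict):

```python
def build_index(docs, map_fn, index=None, since_seq=0):
    """Lazy/incremental view indexing: only documents whose change sequence is
    newer than the index's last-seen sequence are re-run through map()."""
    index = dict(index or {})                 # doc_id -> that doc's emissions
    max_seq = since_seq
    for doc in docs:
        if doc["_seq"] > since_seq:           # unchanged docs are skipped
            index[doc["_id"]] = list(map_fn(doc))
            max_seq = max(max_seq, doc["_seq"])
    rows = sorted(kv for ems in index.values() for kv in ems)  # sorted by key
    return index, rows, max_seq
```

On the second call only the changed document is re-mapped, which is why querying a view after a handful of writes is cheap even on a large database.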
The reduce function in a CouchDB MapReduce view aggregates the values emitted by the map function within a key range. CouchDB implements reduce using a rereduce mechanism: values are first reduced in small groups (reduce pass), then those partial results are reduced again (rereduce pass) until a single value remains. This makes reduce scalable across large datasets but also means your reduce function must handle rereduce correctly.
CouchDB provides three built-in reduce functions implemented natively in Erlang (much faster than JavaScript):
- `_sum` — sums all emitted values. Input values must be numbers or arrays of numbers.
- `_count` — counts the number of emitted key-value pairs regardless of value.
- `_stats` — returns a statistics object with `sum`, `count`, `min`, `max`, and `sumsqr` (for standard deviation).
# Query a view with reduce (default: group_level=0, returns grand total)
curl "http://localhost:5984/sales/_design/reports/_view/revenue_by_region"
# {"rows":[{"key":null,"value":1482390.50}]}
# Group by exact key (group=true)
curl "http://localhost:5984/sales/_design/reports/_view/revenue_by_region?group=true"
# {"rows":[{"key":"APAC","value":312450},{"key":"EMEA","value":589120},...]}
# Group by first element of a compound key array
curl "http://localhost:5984/sales/_design/reports/_view/by_date?group_level=1"
# Groups by year only when key is [year, month, day]
Custom JavaScript reduce functions are allowed but must handle the `rereduce` boolean parameter: when `rereduce` is true, the input values are partial reduce results rather than raw map values. Incorrect rereduce handling is a common source of wrong aggregation results.
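A minimal Python model of the two-phase reduce/rereduce evaluation makes the failure mode concrete (function names and the batch size are illustrative):

```python
def couch_style_reduce(values, reduce_fn, batch=3):
    """Two-phase evaluation sketch: reduce small batches of raw map values,
    then rereduce the partial results until one value remains."""
    partials = [reduce_fn(None, values[i:i + batch], False)       # reduce pass
                for i in range(0, len(values), batch)]
    while len(partials) > 1:
        partials = [reduce_fn(None, partials[i:i + batch], True)  # rereduce pass
                    for i in range(0, len(partials), batch)]
    return partials[0]

def correct_count(keys, values, rereduce):
    # on rereduce the inputs are partial counts, so they must be summed
    return sum(values) if rereduce else len(values)

def broken_count(keys, values, rereduce):
    # classic bug: ignores rereduce and counts the partial results themselves
    return len(values)
```

Running both over ten emitted rows shows `correct_count` returning 10 while `broken_count` collapses to the number of partial batches — exactly the kind of silently wrong aggregate the rereduce flag exists to prevent.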
A view index in CouchDB is a persistent B-tree file on disk that stores all the key-value pairs emitted by a view's map function across all documents in the database. It is stored separately from the main database file, as its own per-design-document index file. The index is sorted by emitted key, enabling efficient range queries.
Build and update lifecycle:
- First query — if the index does not exist, CouchDB processes every document through the map function and builds the index from scratch. This can be slow for large databases.
- Subsequent queries — CouchDB checks the database's update sequence number. Documents changed since the last index update are re-run through the map function incrementally. The index reflects all committed documents before returning results.
- Design document change — any modification to the design document invalidates the entire index; full rebuild required.
The stale query parameter (CouchDB 1.x) or update parameter (2.x+) controls this behavior:
# Default: wait for index to be fully up to date before returning
GET /db/_design/ddoc/_view/my_view
# Return stale (potentially outdated) results immediately; trigger index update in background
GET /db/_design/ddoc/_view/my_view?stale=update_after # 1.x style
GET /db/_design/ddoc/_view/my_view?update=lazy # 2.x+ style
# Return whatever is in the index right now, do not update
GET /db/_design/ddoc/_view/my_view?stale=ok # 1.x
GET /db/_design/ddoc/_view/my_view?update=false # 2.x+
Using stale=update_after is a common pattern for dashboard queries where slightly stale data is acceptable and you want to avoid blocking the user while the index refreshes.
Mango is CouchDB's declarative, MongoDB-inspired query language introduced in CouchDB 2.0. Instead of writing JavaScript map functions, you POST a JSON selector document to the _find endpoint. CouchDB evaluates the selector against a Mango index (or falls back to a full scan) and returns matching documents.
POST /mydb/_find
{
"selector": {
"type": "order",
"status": "pending",
"total": { "$gt": 100 }
},
"fields": ["_id", "customer_id", "total", "created_at"],
"sort": [{ "created_at": "desc" }],
"limit": 20,
"skip": 0
}
Key differences between Mango and MapReduce views:
| Aspect | Mango (_find) | MapReduce Views |
|---|---|---|
| Syntax | JSON selector — no JavaScript required | JavaScript map/reduce functions |
| Primary use | Ad-hoc filtering and sorting on arbitrary fields | Pre-aggregated sorted indexes; efficient range queries |
| Aggregation | No built-in aggregation; returns documents | Yes — _sum, _count, _stats reduce functions |
| Index type | Mango JSON index or full-text index | Persistent sorted B-tree |
| Fallback without index | Full database scan (slow — avoid in production) | N/A — view always has an index |
| Best for | Flexible ad-hoc queries, REST APIs, search | Reporting, aggregations, sorted lookups by known key |
Mango is generally the right choice for new applications because it requires no JavaScript and works well for the typical document-filtering use cases. Use MapReduce views when you need server-side aggregation (sums, counts) or must query by a complex compound key with range semantics.
Mango supports two index types: json indexes (B-tree, for equality and range queries on specific fields) and text indexes (Lucene-backed full-text search on string fields; these require CouchDB's optional search component). Both are created via POST to /_index.
# Create a JSON index on status + created_at for the orders collection
curl -X POST http://admin:pass@localhost:5984/mydb/_index \
-H "Content-Type: application/json" \
-d '{
"index": {
"fields": ["type", "status", "created_at"]
},
"name": "idx-orders-status-date",
"type": "json",
"ddoc": "_design/mango_indexes"
}'
# {"result":"created","id":"_design/mango_indexes","name":"idx-orders-status-date"}
# Create a text (full-text) index
curl -X POST http://admin:pass@localhost:5984/mydb/_index \
-H "Content-Type: application/json" \
-d '{
"index": {
"default_field": { "enabled": true, "analyzer": "standard" }
},
"name": "idx-fulltext",
"type": "text"
}'
# Query using the json index (CouchDB picks the index automatically)
curl -X POST http://admin:pass@localhost:5984/mydb/_find \
-H "Content-Type: application/json" \
-d '{
"selector": { "type": "order", "status": "pending" },
"sort": [{ "created_at": "desc" }],
"limit": 10
}'
# List all indexes
curl http://admin:pass@localhost:5984/mydb/_index
CouchDB automatically selects the best available index for a _find query. Check the warning field in the _find response, and POST the same selector to the _explain endpoint to confirm which index was chosen. Without an appropriate index, CouchDB falls back to a full database scan, which is tolerable in development but unacceptable in production.
Mango selectors are JSON objects where each key is a document field or a Mango operator. Operators begin with $. They fall into four groups: comparison, logical, element, and array operators.
POST /mydb/_find
{
"selector": {
"$and": [
{ "type": { "$eq": "product" } },
{ "price": { "$gte": 10, "$lte": 100 } },
{ "tags": { "$elemMatch": { "$eq": "sale" } } },
{ "discontinued": { "$exists": false } },
{ "name": { "$regex": "^Widget" } }
]
}
}
| Category | Operators | Description |
|---|---|---|
| Comparison | $eq, $ne, $lt, $lte, $gt, $gte | Equality and range comparisons |
| Logical | $and, $or, $not, $nor | Boolean combinations of conditions |
| Element | $exists, $type | Check field presence or JSON type |
| Array | $in, $nin, $all, $elemMatch, $size | Match values in or against arrays |
| Evaluation | $regex, $mod | Regex match; modulo arithmetic |
Important constraints: $regex queries do not use B-tree json indexes — they require a full-text (Lucene) index or fall back to a full scan. Compound conditions using $and can use a json index if all fields in the index prefix are covered by equality conditions. For best performance, structure selectors so the most selective equality conditions come first and match the leading fields of a json index.
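A tiny selector evaluator, sketched in Python for a handful of operators, shows the mechanics (an illustrative subset only — the real Mango grammar is much larger, and the server evaluates selectors against indexes rather than in-memory dicts):

```python
def matches(doc, selector):
    """Evaluate a small subset of Mango selectors against one document.
    Top-level fields combine with an implicit $and, as in Mango itself."""
    ops = {
        "$eq":  lambda f, a: f == a,
        "$ne":  lambda f, a: f != a,
        "$gt":  lambda f, a: f is not None and f > a,
        "$gte": lambda f, a: f is not None and f >= a,
        "$lt":  lambda f, a: f is not None and f < a,
        "$lte": lambda f, a: f is not None and f <= a,
    }
    for key, cond in selector.items():
        if key == "$and":
            if not all(matches(doc, c) for c in cond):
                return False
        elif isinstance(cond, dict):
            for op, arg in cond.items():
                if op == "$exists":
                    if (key in doc) != arg:
                        return False
                elif not ops[op](doc.get(key), arg):
                    return False
        elif doc.get(key) != cond:          # a bare value is shorthand for $eq
            return False
    return True
```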
The _all_docs endpoint is a built-in view that CouchDB automatically maintains for every database. It returns all non-deleted documents sorted by their _id (ascending by default). Internally it is backed by the same document B-tree that stores the documents themselves, so it is always up to date with zero additional index maintenance cost.
# Retrieve all documents (just metadata by default)
curl "http://admin:pass@localhost:5984/mydb/_all_docs?limit=10"
# Include full document bodies
curl "http://admin:pass@localhost:5984/mydb/_all_docs?include_docs=true&limit=10"
# Range query by _id prefix — endkey appends a high Unicode sentinel so the
# whole "order:" prefix range is covered by string collation
curl "http://admin:pass@localhost:5984/mydb/_all_docs?startkey=%22order:%22&endkey=%22order:%EF%BF%B0%22&include_docs=true"
# Fetch specific documents by ID (bulk read equivalent to GET on each)
curl -X POST http://admin:pass@localhost:5984/mydb/_all_docs?include_docs=true \
-H "Content-Type: application/json" \
-d '{"keys":["order:001","order:002","user:42"]}'
Differences from a custom view:
- `_all_docs` is always current — no lazy build delay on first access.
- It is keyed only by `_id`. You cannot query by any other field — for that you need a view or Mango index.
- A custom view can emit any key (category, date, compound key) and can aggregate values via reduce; `_all_docs` cannot.
- `_all_docs` includes design documents; you can filter them out by requesting `startkey="a"` (design doc ids start with `_`, which sorts before alphabetic characters in CouchDB's collation).
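The prefix-range pattern used against `_all_docs` generalizes; a small helper (hypothetical, using the conventional `\ufff0` high sentinel) builds the startkey/endkey pair:

```python
def prefix_range(prefix):
    """Build a startkey/endkey pair for an _id prefix scan. '\\ufff0' is a
    high Unicode sentinel that collates after any character you are likely
    to store, so the range covers every id beginning with the prefix."""
    return prefix, prefix + "\ufff0"
```

Python's string comparison approximates CouchDB's collation closely enough for ASCII ids to illustrate which documents fall inside the range.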
CouchDB views are sorted B-trees, so efficient pagination uses key-based cursoring rather than offset-based skipping. Two approaches exist: offset pagination (simpler but slow at large offsets) and key-based pagination (efficient at any depth).
# ── Approach 1: Offset-based (avoid for deep pages) ──
# Page 1
GET /db/_design/ddoc/_view/by_date?limit=10
# Page 2 (skip=10 forces a full scan of the first 10 rows — slow at scale)
GET /db/_design/ddoc/_view/by_date?limit=10&skip=10
# ── Approach 2: Key-based cursor (recommended for production) ──
# Page 1: fetch limit+1 to detect whether a next page exists
GET /db/_design/ddoc/_view/by_date?limit=11&descending=false
# From the response, take the last row's key and doc ID as the cursor:
# last key = "2024-03-15", last id = "order:0099"
# Page 2: start from the cursor using startkey + startkey_docid
GET /db/_design/ddoc/_view/by_date?startkey=%222024-03-15%22\
&startkey_docid=order%3A0099&limit=11
# Range query — all orders between two dates
GET /db/_design/ddoc/_view/by_date?startkey=%222024-01-01%22\
&endkey=%222024-03-31%22&include_docs=true
skip is implemented by scanning and discarding leading rows — O(n) in the skipped count. At page 500 with page size 20, skip=10000 forces CouchDB to read 10000 index entries before returning 20 results. For large datasets always use the key-cursor approach. The startkey_docid parameter resolves ties when multiple documents share the same emitted key, ensuring the cursor lands on the exact right row.
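The cursor mechanics can be modeled in a few lines of Python (illustrative — the `rows` list stands in for view rows already sorted by (key, doc_id), which is the order the B-tree returns them in):

```python
def fetch_page(rows, limit, cursor=None):
    """Key-cursor pagination over view rows sorted by (key, doc_id): request
    limit+1 rows; the extra row, if present, becomes the next page's cursor
    (its key/doc_id map to startkey/startkey_docid, which are inclusive)."""
    start = 0
    if cursor is not None:
        start = next(i for i, r in enumerate(rows) if r >= cursor)
    window = rows[start:start + limit + 1]    # limit+1 detects a next page
    page = window[:limit]
    next_cursor = window[limit] if len(window) > limit else None
    return page, next_cursor
```

Each page starts exactly where the previous one stopped, with no O(n) skip cost and no gaps or duplicates even when several rows share the same key.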
A list function is a server-side JavaScript function stored in a design document under the lists key. It acts as a streaming transformer for view query results — instead of returning raw JSON rows, it lets you produce any output format (HTML, XML, CSV, plain text) directly from CouchDB without an intermediary application server.
When called, the list function receives the view result rows one at a time via the getRow() function and can write arbitrary output using send(), building up the response incrementally. This streaming model means large result sets do not need to be buffered in memory.
// In _design/reports, "lists" section:
{
"as_csv": "function(head, req) { start({'headers':{'Content-Type':'text/csv'}}); send('id,status,total\\n'); var row; while(row = getRow()) { send(row.id+','+row.value.status+','+row.value.total+'\\n'); } }"
}
# Call the list function against a view
GET /db/_design/reports/_list/as_csv/by_status?include_docs=false
When to use list functions: They were popular in CouchApps (self-contained web apps hosted entirely inside CouchDB) where HTML was served from list functions. They can also transform view output to feed legacy systems expecting XML or CSV without an application layer.
Deprecation status: List functions are deprecated in CouchDB 3.x along with show functions. The recommended replacement is to query views from your application server and perform the transformation there. The JavaScript query server adds latency and complexity compared to doing the same transformation in your application code.
CouchDB replication is a document-level sync protocol that copies documents from a source database to a target database using standard HTTP. Either or both of source and target can be local or remote CouchDB instances. Replication is initiated by posting a replication document to the _replicator database or to the /_replicate endpoint directly.
The protocol works through these concrete steps:
- Get peer info — the replicator calls `GET /target` to confirm the target is reachable and retrieves its UUID.
- Read checkpoint — it reads the last replication checkpoint stored in a `_local/` document on both source and target to learn the last source sequence number already replicated.
- Get changes — it calls `GET /source/_changes?since={last_seq}` to fetch all document IDs and their current revisions changed since the last checkpoint.
- Check target revisions — it POSTs the changed IDs to `POST /target/_revs_diff` to find which revisions the target is missing.
- Fetch missing docs — it fetches the missing document bodies from the source (with attachments if any) and POSTs them in bulk to `/target/_bulk_docs`.
- Save checkpoint — it writes the new sequence number to `_local/` docs on both peers so the next replication starts from there.
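The revisions-diff step is essentially a set difference; a sketch (illustrative Python — the real `_revs_diff` endpoint also reports possible ancestor revisions):

```python
def revs_diff(changed, target_revs):
    """Given {doc_id: [revs]} gathered from the source's _changes feed and the
    revisions the target already holds, report which revisions must be copied."""
    missing = {}
    for doc_id, revs in changed.items():
        have = set(target_revs.get(doc_id, []))
        need = [r for r in revs if r not in have]
        if need:
            missing[doc_id] = {"missing": need}
    return missing
```

Because only revisions absent from the target are fetched and pushed, re-running the same replication transfers nothing, which is the basis of the protocol's idempotence.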
// Replication document in the _replicator database:
{
"_id": "sync-orders-to-replica",
"source": "http://admin:pass@primary:5984/orders",
"target": "http://admin:pass@replica:5984/orders",
"continuous": false,
"create_target": true
}
The protocol is idempotent: re-running a replication never loses data or creates duplicates. Because it uses standard HTTP and checkpoint documents, any two CouchDB instances can replicate without special network infrastructure — making it practical for cloud-to-edge and offline-mobile scenarios.
CouchDB supports two replication modes: one-shot (the default) and continuous. The mode is set by the continuous boolean in the replication document.
One-shot replication syncs all documents changed since the last checkpoint, then completes. The replication job disappears once finished. It is appropriate for scheduled batch syncs, point-in-time backups, or bootstrapping a new replica.
Continuous replication runs indefinitely after initial sync. It keeps a long-lived _changes feed connection open to the source, processing new changes as they arrive in near real-time. The replication job persists in the _replicator database and is restarted automatically after node restarts.
# One-shot replication via _replicator database
curl -X POST http://admin:pass@localhost:5984/_replicator \
-H "Content-Type: application/json" \
-d '{
"_id": "one-time-backup",
"source": "http://localhost:5984/mydb",
"target": "http://replica:5984/mydb",
"continuous": false,
"create_target": true
}'
# Continuous replication
curl -X POST http://admin:pass@localhost:5984/_replicator \
-H "Content-Type: application/json" \
-d '{
"_id": "live-sync-to-replica",
"source": "http://localhost:5984/orders",
"target": "http://replica:5984/orders",
"continuous": true
}'
# Check replication status
curl http://admin:pass@localhost:5984/_scheduler/jobs
Continuous replication introduces a persistent connection that consumes resources on both nodes. For high-volume databases, monitor the scheduler via /_scheduler/jobs and /_scheduler/docs to detect stalled or crashing replication jobs. A job that enters a crash loop usually indicates a network issue, authentication problem, or an unfixable conflict on the target.
Filtered replication allows you to replicate only a subset of documents from a source database, rather than copying every document. This reduces bandwidth, storage on the target, and replication lag. There are two ways to filter: using a filter function (server-side JavaScript) or using a Mango selector in the replication document (CouchDB 2.x+).
Option 1 — Filter function in a design document:
// In _design/replication_filters:
{
"filters": {
"by_type": "function(doc, req) { return doc.type === req.query.type; }"
}
}
# Replicate only order documents
curl -X POST http://admin:pass@localhost:5984/_replicator \
-H "Content-Type: application/json" \
-d '{
"_id": "orders-only",
"source": "http://localhost:5984/mydb",
"target": "http://replica:5984/orders",
"continuous": true,
"filter": "replication_filters/by_type",
"query_params": { "type": "order" }
}'
Option 2 — Mango selector (preferred in 2.x+, avoids a round-trip through the JavaScript query server):
curl -X POST http://admin:pass@localhost:5984/_replicator \
-H "Content-Type: application/json" \
-d '{
"_id": "active-orders",
"source": "http://localhost:5984/mydb",
"target": "http://replica:5984/active_orders",
"continuous": true,
"selector": { "type": "order", "status": { "$in": ["pending","processing"] } }
}'
The Mango selector approach is more efficient because the filter is evaluated against the changes feed using an in-process Erlang evaluator rather than spawning a JavaScript OS process for each document. It is the recommended approach for all new replication setups.
CouchDB 2.0 (released 2016) absorbed the BigCouch clustering code from Cloudant and made clustered operation the default architecture. CouchDB 3.x continues this model. A CouchDB cluster consists of multiple Erlang nodes that cooperate via a distributed hash ring (using consistent hashing) to shard and replicate data automatically.
| Aspect | CouchDB 1.x (single node) | CouchDB 2.x/3.x (cluster) |
|---|---|---|
| Horizontal scalability | None — single process, single machine | Add nodes; data shards distributed automatically |
| Fault tolerance | Single point of failure | Configurable replica count (n) per database |
| Shard distribution | No sharding — one database file | Configurable Q shards per database, each with n copies |
| Quorum reads/writes | N/A | Configurable r (read quorum) and w (write quorum) |
| Admin interface | Futon | Fauxton (modern React UI) |
| Single-node deployment | Default mode | Supported via single_node config option in 3.x |
| Database creation | PUT /db | PUT /db?q=8&n=3 (control shards and replicas) |
In a cluster, each database is split into Q shards (default 8). Each shard has n copies (default 3) stored on different nodes. When a node is added to the cluster, CouchDB uses the _cluster_setup API to join it to the ring and the rebalancing happens via standard replication. There is no external ZooKeeper or etcd dependency — cluster membership and topology are stored in the _dbs and _nodes internal databases.
In a CouchDB cluster, each database is divided into Q shards (also called range partitions). The key space of document IDs is divided into Q equally-sized ranges using consistent hashing. Each shard is stored as an independent database file on a node. Each shard has n replicas — copies stored on n different nodes for fault tolerance. The default is Q=8 shards and n=3 replicas, giving 24 shard files total for a 3-node cluster.
# Create a database with 4 shards and 2 replicas
curl -X PUT "http://admin:pass@localhost:5984/mydb?q=4&n=2"
# The cluster places shard copies according to the ring
# Check shard placement:
curl http://admin:pass@localhost:5984/mydb/_shards
curl http://admin:pass@localhost:5984/mydb/_shards/doc1 # which shard holds doc1
When a client writes a document, CouchDB hashes the document's _id to determine which shard it belongs to, then writes to all n replicas of that shard. The write succeeds when w replicas acknowledge the write. When a client reads, it contacts the relevant shard replicas and returns when r replicas agree.
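As a rough sketch of the _id-to-shard mapping described above, the helper below buckets a CRC32 hash of the ID into Q equal ranges of the 32-bit keyspace. This is illustrative only: CouchDB's mem3 layer also uses a CRC32-based hash, but the real range bookkeeping lives in the database's shard map.

```python
import zlib

Q = 4  # shards for this hypothetical database (CouchDB default is 8)

def shard_for(doc_id, q=Q):
    """Map a document _id to one of q contiguous hash ranges.

    Illustrative sketch of consistent-hash shard selection; not the
    actual mem3 implementation."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # unsigned 32-bit hash
    range_size = 2 ** 32 // q
    return min(h // range_size, q - 1)

for doc_id in ["order:001", "order:002", "user:alice"]:
    print(doc_id, "-> shard", shard_for(doc_id))
```

Because the hash is deterministic, every node independently computes the same shard for a given _id, with no coordinator in the read or write path.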
| Parameter | Meaning | Default |
|---|---|---|
| Q | Number of shards per database | 8 |
| n | Number of replica copies per shard | 3 |
| w | Write quorum — replicas that must acknowledge a write | 2 (majority of n=3) |
| r | Read quorum — replicas that must respond to a read | 2 |
Setting w=1 maximizes write availability at the cost of potential data loss if the one acknowledging node crashes immediately after. Setting w=n requires all replicas to be available for every write — maximum durability but reduced availability. The default majority quorum (w=2 out of n=3) is the recommended balance for most production clusters.
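The quorum arithmetic above can be sketched in a few lines; the helper names are illustrative:

```python
def default_quorum(n):
    """CouchDB's default read and write quorum is a majority of n."""
    return n // 2 + 1

def write_succeeds(n, w, nodes_up):
    """A write needs acknowledgements from w of the n shard replicas."""
    return nodes_up >= w

n = 3
w = r = default_quorum(n)                  # 2 for n=3
print(w, r)                                # 2 2
print(write_succeeds(n, w, nodes_up=2))    # True: a majority is reachable
print(write_succeeds(n, 3, nodes_up=2))    # False: w=n needs every replica up
```

The last line shows the availability cost of w=n: losing a single replica makes every write fail, which is why the majority default is usually the right trade-off.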
The _node and _cluster_setup APIs are the two primary endpoints for managing a CouchDB cluster's topology. They are distinct in scope: _node operates on individual node configuration, while _cluster_setup orchestrates the multi-step process of forming or extending a cluster.
The _node API (/_node/{node-name}/) provides per-node operations:
- GET /_node/_local/_config — read the running configuration of the local node.
- PUT /_node/_local/_config/{section}/{key} — change a configuration value live without restart.
- GET /_node/_local/_stats — performance counters for the local node.
- GET /_node/_local/_system — Erlang VM stats (memory, processes, ports).
The _cluster_setup API (/_cluster_setup) provides a guided wizard for cluster formation:
# Step 1: Enable cluster mode on node 1
curl -X POST http://admin:pass@node1:5984/_cluster_setup \
-H "Content-Type: application/json" \
-d '{"action":"enable_cluster","username":"admin","password":"pass",
"node_count":3,"bind_address":"0.0.0.0","port":5984}'
# Step 2: Add node 2 to the cluster
curl -X POST http://admin:pass@node1:5984/_cluster_setup \
-H "Content-Type: application/json" \
-d '{"action":"add_node","username":"admin","password":"pass",
"host":"node2","port":5984}'
# Step 3: Finish cluster setup
curl -X POST http://admin:pass@node1:5984/_cluster_setup \
-d '{"action":"finish_cluster"}'
# Check cluster membership
curl http://admin:pass@localhost:5984/_membership
After cluster formation, GET /_membership returns the list of all nodes in the ring. Nodes are identified by their Erlang node name, typically couchdb@hostname. The _node/_local shortcut always refers to the node receiving the request, which is convenient in scripting.
A replication conflict in CouchDB occurs when two nodes have independently updated the same document (same _id) and neither update knew about the other. This is the normal result of multi-master or offline-sync workflows — it is not an error, it is an expected state that the application must handle.
CouchDB stores both conflicting revisions in the database. The document is still readable via its _id, but one revision is designated the winning revision (see Q28 for the algorithm). You can see all conflicting revisions by requesting ?conflicts=true:
# Detect a conflict
curl "http://localhost:5984/mydb/doc1?conflicts=true"
# {
# "_id":"doc1","_rev":"3-winner...","name":"Alice",
# "_conflicts":["3-loser..."]
# }
# Fetch the losing revision
curl "http://localhost:5984/mydb/doc1?rev=3-loser..."
# Resolution strategy: pick winning revision, DELETE the losing one
curl -X DELETE "http://localhost:5984/mydb/doc1?rev=3-loser..."
# OR: merge both and save merged version, then DELETE loser
curl -X PUT http://localhost:5984/mydb/doc1 \
-d '{"_rev":"3-winner...","name":"Alice Smith","merged":true}'
curl -X DELETE "http://localhost:5984/mydb/doc1?rev=3-loser..."
Common resolution strategies:
- Last-write-wins — keep the winning revision (already done automatically), delete losers. Simple but may discard valid changes.
- Application-level merge — read both revisions, merge fields using domain logic (e.g., take the higher stock count), write the merged result as the new winning revision, delete the loser.
- Conflict-free design — model data to avoid conflicts: use separate documents per event (append-only log) instead of updating a shared document; use CouchDB's _local documents for per-device state that does not replicate.
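The application-level merge strategy above, "take the higher stock count", can be written as a pure function. The field names here are hypothetical:

```python
def merge_stock(winner, loser):
    """Domain merge for conflicting revisions of an inventory document:
    keep the winning revision's shape, but take the higher stock count
    seen in either conflicting revision."""
    merged = dict(winner)
    merged["stock"] = max(winner.get("stock", 0), loser.get("stock", 0))
    return merged

winner = {"_id": "item:001", "_rev": "3-bbb", "stock": 12}
loser  = {"_id": "item:001", "_rev": "3-aaa", "stock": 15}
merged = merge_stock(winner, loser)
print(merged["stock"])  # 15
```

The application would then PUT the merged document (against the winner's _rev) and DELETE the losing revision, leaving a single, semantically correct survivor.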
When CouchDB has two or more conflicting revisions of the same document, it must deterministically pick one as the winning revision — the one returned by a normal GET request without ?conflicts=true. The algorithm is deterministic so that all cluster nodes independently arrive at the same winner without any coordination.
The winning revision is chosen by these rules in order:
- Prefer non-deleted revisions over deleted ones. A live document always beats a tombstone, regardless of generation count. This prevents a delete from silently "winning" over an edit that arrived at a replica later.
- Among non-deleted (or all-deleted) revisions, prefer the one with the higher generation number (the integer prefix in _rev). Generation 5 beats generation 3.
- If generation numbers tie, compare the revision hash strings lexicographically. The hash that sorts later (higher string order) wins. This is the tiebreaker of last resort and is arbitrary from a business logic perspective — which is why applications should not rely on the winning revision for semantically meaningful merges.
The consequence of this algorithm is that the winner is not necessarily the revision with the "most recent" wall-clock timestamp, nor the one with the most recent change. It is entirely possible for an older edit to "win" if its revision chain is longer. This is why CouchDB recommends application-level conflict detection and resolution rather than trusting the automatic winner for data that matters.
You can trigger a re-evaluation of which revision wins by deleting the current winner — the next-highest-generation conflict becomes the new winner automatically.
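The three rules collapse naturally into a single sort key. This is a sketch of the selection logic only, not CouchDB's internal Erlang implementation:

```python
def winning_rev(revisions):
    """Pick the deterministic winner among conflicting revisions.

    revisions: list of dicts like {"rev": "3-abc", "deleted": False}.
    Rules, in priority order: live beats deleted; higher generation
    wins; the lexicographically larger hash breaks ties."""
    def sort_key(r):
        gen, _, rev_hash = r["rev"].partition("-")
        return (not r["deleted"], int(gen), rev_hash)
    return max(revisions, key=sort_key)["rev"]

revs = [
    {"rev": "3-bbb", "deleted": False},
    {"rev": "3-aaa", "deleted": False},  # loses the tie: "aaa" < "bbb"
    {"rev": "5-zzz", "deleted": True},   # loses despite higher generation
]
print(winning_rev(revs))  # 3-bbb
```

Because the key depends only on the revision data itself, every node computes the same winner with zero coordination, which is the whole point of the design.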
PouchDB is an open-source JavaScript database that runs entirely inside the browser (using IndexedDB or WebSQL as the local storage backend) or in Node.js (using LevelDB). It implements the CouchDB replication protocol, which means it can sync bidirectionally with any CouchDB-compatible server — including Apache CouchDB and IBM Cloudant — using the same HTTP-based protocol.
The offline-first pattern works as follows:
- The browser app reads and writes to the local PouchDB instance — always available, zero latency, no network required.
- When connectivity is available, PouchDB syncs to the remote CouchDB server using the replication protocol: it pushes local changes and pulls remote changes.
- If two users edited the same document while offline, PouchDB surfaces the conflict the same way CouchDB does, and the application resolves it.
// Create a local PouchDB database
const localDB = new PouchDB('myapp');
// Write offline — works even without network
await localDB.put({ _id: 'order:001', type: 'order', total: 99.99 });
// Set up continuous two-way sync when online
const sync = localDB.sync('https://mycouch.example.com/myapp', {
live: true, // continuous
retry: true, // reconnect automatically on network failure
filter: 'myapp/user_docs', // optional: sync only relevant docs
});
sync.on('change', (change) => console.log('Synced:', change));
sync.on('error', (err) => console.error('Sync error:', err));
PouchDB is the canonical choice for progressive web apps, React Native apps, and any scenario where users need to work offline and sync reliably. The CouchDB replication protocol's idempotent, checkpoint-based design means a sync can be interrupted and resumed without data loss or duplicates.
Couchbase Sync Gateway is the replication middleware layer in the Couchbase mobile stack. It sits between mobile clients running Couchbase Lite (the embedded mobile database) and a Couchbase Server cluster, handling authentication, authorization, and document routing. Historically it implemented a subset of the CouchDB replication protocol so that CouchDB-compatible clients could sync against it, but Couchbase has since moved toward its own DCP-based (Database Change Protocol) sync approach in newer versions.
The relationship to CouchDB's replication model:
- Early versions of Couchbase Sync Gateway exposed a CouchDB-compatible REST API and replication endpoint. This meant PouchDB could sync to Sync Gateway using exactly the same protocol it uses with CouchDB.
- Sync Gateway adds access control channels — each document is tagged with channels and each user is granted access to specific channels. This is a layer that CouchDB itself does not provide natively (CouchDB's access control is at the database level, not document level).
- From Couchbase Mobile 3.x onward, Couchbase Lite uses a proprietary BLIP WebSocket protocol (not the CouchDB HTTP replication protocol) for sync, diverging from CouchDB compatibility.
For CouchDB users, Sync Gateway is mainly relevant as a comparison point: if you need per-document access control with mobile sync, Sync Gateway's channel model is a more mature solution than CouchDB's validate_doc_update-based approach. Pure CouchDB users achieve similar results by combining PouchDB sync with per-user databases or filtered replication.
CouchDB supports four authentication mechanisms, configurable simultaneously. Each request is checked against the enabled handlers in order.
1. Basic Authentication — HTTP Basic Auth over HTTPS. Credentials are sent with every request. Simple to implement but requires HTTPS in production to avoid credential exposure.
curl -u admin:password http://localhost:5984/_session
2. Cookie (Session) Authentication — the most common for web apps. POST credentials to /_session to receive a session cookie, then use that cookie for subsequent requests. The cookie has a configurable timeout.
# Login and get a session cookie
curl -X POST http://localhost:5984/_session \
-H "Content-Type: application/json" \
-d '{"name":"alice","password":"s3cret"}'
# Set-Cookie: AuthSession=abc123...; Version=1; Secure; HttpOnly
# Use the session
curl -b "AuthSession=abc123..." http://localhost:5984/mydb/_all_docs
# Logout
curl -X DELETE http://localhost:5984/_session -b "AuthSession=abc123..."
3. JWT Authentication (CouchDB 3.3+) — validates a JSON Web Token in the Authorization: Bearer {token} header. The JWT must contain a sub claim (the username) and optionally _couchdb.roles. CouchDB verifies the signature using a configured HMAC secret or RSA public key — it does not issue JWTs, only validates them.
[jwt_auth]
required_claims = exp
[jwt_keys]
hmac:default = aGVsbG93b3JsZA==
4. Proxy Authentication — for reverse-proxy setups (nginx, HAProxy). The proxy authenticates the user externally and forwards the identity in headers (X-Auth-CouchDB-UserName, X-Auth-CouchDB-Roles, X-Auth-CouchDB-Token). CouchDB trusts the headers if the correct HMAC token is present.
CouchDB has a two-tier permission hierarchy: server-level admins and database-level members. Understanding each tier and the dangerous default state ("admin party") is essential before deploying any CouchDB instance.
Admin Party — when CouchDB is first installed, there are no server admins configured. In this state, every request (including anonymous HTTP calls) has full admin privileges. This is the admin party. You must immediately create at least one server admin via Fauxton or the API to exit admin party mode. CouchDB 3.x requires an admin to be set during installation and will not start without one.
# Create the first server admin (exits admin party)
curl -X PUT http://localhost:5984/_node/_local/_config/admins/admin \
-d '"mys3cretpass"'
Server Admins — stored in the CouchDB config file (not the _users database). They can create/delete databases, manage all users, and access all databases. There is no per-database restriction for server admins.
Database-level Security — set via the _security document on each database. Contains two lists:
- admins — users and roles that can write design documents and manage the database's security settings.
- members — users and roles that can read and write regular documents. If the members list is empty, the database is public-read.
Regular users are stored in the _users database as documents with IDs like org.couchdb.user:{username}. Roles are arbitrary strings assigned to users and checked against the database security object.
The validate_doc_update (VDU) function is a JavaScript function stored in a design document that CouchDB calls before every document write to that database. If the function throws an error, the write is rejected with the specified HTTP status and message. This is the primary mechanism for enforcing document-level business rules and security policies.
// In _design/security:
{
"validate_doc_update": "function(newDoc, oldDoc, userCtx, secObj) {
// Reject if not logged in
if (!userCtx.name) {
throw({ unauthorized: 'You must be logged in to write documents.' });
}
// Enforce required fields
if (!newDoc.type) {
throw({ forbidden: 'Documents must have a type field.' });
}
// Prevent changing the owner field after creation
if (oldDoc && oldDoc.owner !== newDoc.owner) {
throw({ forbidden: 'Cannot change document owner.' });
}
// Only admins can set status to archived
if (newDoc.status === 'archived' && userCtx.roles.indexOf('_admin') === -1) {
throw({ forbidden: 'Only admins can archive documents.' });
}
}"
}
The function receives four arguments:
- newDoc — the document being written (the new version).
- oldDoc — the existing document (null if this is a new document creation).
- userCtx — the user context: { name, roles, db }. Roles include _admin for server admins and _reader, _writer, or custom roles from the user's profile.
- secObj — the database's _security object.
Throw { unauthorized: "message" } to return HTTP 401 (authentication required). Throw { forbidden: "message" } to return HTTP 403 (permission denied). Any other JavaScript throw returns HTTP 500.
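The throw-to-status mapping above can be summarized as a small dispatcher, sketched here in Python rather than the query server's JavaScript:

```python
def status_for(thrown):
    """Map a validate_doc_update throw object to the HTTP status
    CouchDB returns: unauthorized -> 401, forbidden -> 403,
    anything else -> 500."""
    if "unauthorized" in thrown:
        return 401, thrown["unauthorized"]
    if "forbidden" in thrown:
        return 403, thrown["forbidden"]
    return 500, str(thrown)

print(status_for({"unauthorized": "You must be logged in."}))  # (401, ...)
print(status_for({"forbidden": "Documents must have a type."}))  # (403, ...)
print(status_for({"oops": "unexpected"})[0])  # 500
```

The practical rule: use unauthorized when the fix is "log in", forbidden when the user is known but not allowed, and treat any 500 as a bug in the VDU function itself.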
The _security object is a special document stored at /db/_security. It defines which users and roles can act as admins (write design documents, change security) or members (read and write regular documents) for that specific database. Every database has one.
{
"admins": {
"names": ["alice", "bob"],
"roles": ["db_admin_role"]
},
"members": {
"names": ["charlie"],
"roles": ["viewer", "editor"]
}
}
# Set the _security object
curl -X PUT http://admin:pass@localhost:5984/mydb/_security \
-H "Content-Type: application/json" \
-d '{
"admins": { "names": ["alice"], "roles": ["db_admin_role"] },
"members": { "names": [], "roles": ["editor","viewer"] }
}'
# Read the current _security object
curl http://admin:pass@localhost:5984/mydb/_security
Key behaviors:
- If the members list is empty (both names and roles), the database is readable by any authenticated user or even anonymously (public database).
- Server admins bypass the _security object entirely — they always have full access to every database.
- Roles are arbitrary strings. They are assigned to users in the _users database under the roles array in the user document. CouchDB does not provide a built-in role management UI; roles are managed by updating user documents.
- Only server admins and database admins can modify the _security object.
CouchDB has a built-in HTTPS listener that can be enabled by adding a [ssl] section to the CouchDB configuration (local.ini or local.d/*.ini). No reverse proxy is required for basic TLS, though using nginx in front is common in production for certificate management and connection pooling.
[ssl]
enable = true
port = 6984
cert_file = /etc/couchdb/ssl/couchdb.pem
key_file = /etc/couchdb/ssl/privkey.pem
# Optional: require client certificates
cacert_file = /etc/couchdb/ssl/cacert.pem
verify_ssl_certificates = true
# Restrict to strong cipher suites
ssl_options = [{secure_renegotiate, true}]
Configuration steps:
- Generate or obtain a certificate and private key (Let's Encrypt, self-signed, or a commercial CA).
- Place the PEM files in a directory readable by the CouchDB process (but not world-readable).
- Add the [ssl] section to local.ini. CouchDB listens on port 6984 for HTTPS by default (the plaintext port 5984 continues to work unless you disable it).
- Restart CouchDB and verify: curl https://localhost:6984/
- In production, disable the plaintext listener by setting [chttpd] bind_address = 127.0.0.1 and routing all external traffic through the HTTPS port or a TLS-terminating reverse proxy.
For clustered setups, TLS should be configured both for client-facing traffic and for node-to-node replication traffic. The inter-node Erlang distribution channel can be secured using Erlang TLS distribution, though this requires additional Erlang configuration beyond the CouchDB config file.
CouchDB exposes two key monitoring endpoints out of the box — /_stats and /_active_tasks — which together give a real-time snapshot of server health and ongoing operations.
GET /_stats returns a JSON object of cumulative performance counters organized by category. Key metrics to watch:
- httpd.requests.value — total HTTP requests processed.
- httpd_request_methods.{GET,PUT,POST,DELETE}.value — breakdown by HTTP verb.
- httpd_status_codes.{200,201,400,404,409,500}.value — response code breakdown; rising 409s may indicate conflict storms; rising 500s indicate bugs.
- couchdb.open_databases.value — number of databases currently open (compare to max_dbs_open).
- couchdb.request_time.value — mean, min, max request latency.
curl http://admin:pass@localhost:5984/_stats | python3 -m json.tool | head -60
# On a cluster, query per node:
curl http://admin:pass@localhost:5984/_node/couchdb@node1/_stats
GET /_active_tasks returns a live array of currently running background tasks. Each task has a type field:
- database_compaction — compaction progress as a percentage.
- view_compaction — view index compaction progress.
- indexer — a view index being built or incrementally updated.
- replication — active replication job with checkpoint seq and docs per second.
curl http://admin:pass@localhost:5984/_active_tasks
# [{"type":"indexer","node":"couchdb@node1","design_document":"_design/orders",
# "view":"by_status","started_on":1700000000,"updated_on":1700000010,
# "progress":45}]
For production monitoring, both endpoints integrate with Prometheus via the community couchdb-exporter, allowing dashboards in Grafana alongside alerting on queue depth, error rates, and compaction lag.
CouchDB's default configuration targets a single-developer workstation. Production deployments require tuning several parameters across different configuration sections:
| Section / Key | Default | What it controls |
|---|---|---|
| [couchdb] max_dbs_open | 500 | Maximum number of database files open simultaneously. Each open database holds a file descriptor. Increase for servers with many databases; ensure OS ulimits allow it. |
| [couchdb] os_process_limit | 100 | Maximum JavaScript OS processes for the query server (views, VDU). Each concurrent JavaScript request consumes one process. Increase for high-concurrency view workloads. |
| [chttpd] workers | 100 | HTTP request handler pool size. Increase for high concurrent request rates. |
| [couch_httpd_auth] timeout | 600 | Session cookie timeout in seconds. |
| [smoosh] * | various | Auto-compaction daemon thresholds. min_priority controls when a database qualifies for compaction based on data/file size ratio. |
| [rexi] buffer_count | 2000 | Internal message buffer for cluster inter-node RPC. Increase if you see rexi_buffer errors in logs. |
| [fabric] request_timeout | 60000ms | Timeout for cluster-level requests. Increase for slow queries over large datasets. |
OS-level tuning is equally important: raise the file-descriptor limit (ulimit -n) to at least 65535, since each open database and each view index file consumes a descriptor. On Linux, set vm.swappiness=1 to keep the Erlang heap from being swapped out. For high write throughput, mount the data volume with the noatime option to avoid inode-update I/O on every read.
CouchDB does not have a hard document size limit (the default max_document_size is 4GB), but the performance trade-offs between storing data as a small number of large documents versus many small documents are significant.
Large documents (e.g., one document per entity with thousands of nested items):
- Every update requires a full rewrite of the document, even if only one nested field changed. This amplifies write I/O and revision chain growth.
- Replication transfers the entire document body on every change. For a 5MB document that changes frequently, this saturates replication bandwidth quickly.
- MVCC conflicts are more likely and more costly to merge because the entire body must be transferred and compared.
- Reading the document always loads the full JSON, even if only one field is needed (CouchDB has no projection at the storage layer — Mango fields projection happens after the document is loaded).
Many small documents (one document per event/record):
- Updates are small and cheap; conflicts affect only the specific document touched.
- Replication is incremental — only changed documents are transferred.
- Mango and view queries can filter and paginate efficiently.
- The trade-off: each document has overhead (~200 bytes for metadata). A database with 100 million tiny 50-byte documents will have metadata overhead larger than the data itself.
The recommended pattern: keep documents to a natural entity size (an order with its line items, not an order with the entire customer history). Avoid designs that require updating a single document at a rate faster than ~100 writes/second — high-frequency counters belong in Redis, not in a CouchDB document.
The _changes feed is CouchDB's built-in event stream. It reports every document change (create, update, delete) in a database as a sequence of events, each with a sequence number (seq), document ID (id), list of changed revisions (changes), and optionally the full document body. It is the mechanism that powers replication and can also drive event-driven application architectures.
# One-shot: get all changes since the beginning
curl "http://admin:pass@localhost:5984/mydb/_changes"
# Long-polling: block until at least one change arrives
curl "http://admin:pass@localhost:5984/mydb/_changes?feed=longpoll&since=now"
# Continuous streaming feed (server-sent events style)
curl "http://admin:pass@localhost:5984/mydb/_changes?feed=continuous&since=now&heartbeat=5000"
# Include full document body in each change event
curl "http://admin:pass@localhost:5984/mydb/_changes?feed=continuous&include_docs=true&since=now"
# Resume from a checkpoint (since= is the last seq you processed)
curl "http://admin:pass@localhost:5984/mydb/_changes?feed=continuous&since=45-g1AAAA..."
# Filter by Mango selector (2.x+)
curl -X POST "http://admin:pass@localhost:5984/mydb/_changes?feed=continuous&since=now" \
-H "Content-Type: application/json" \
-d '{"selector":{"type":"order","status":"pending"}}'
Each event looks like:
{"seq":"46-g1AAAA...","id":"order:001","changes":[{"rev":"3-abc"}]}
The seq value is your cursor: always persist the last-processed seq to your consumer state store so you can resume without reprocessing. The continuous feed sends a newline heartbeat at regular intervals to keep HTTP connections alive through proxies. Eventsource (feed=eventsource) wraps changes in the Server-Sent Events format for direct browser consumption.
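A minimal consumer sketch of the cursor-persistence pattern described above, assuming newline-delimited continuous-feed output (the sample events are fabricated):

```python
import json

def process_feed(lines, state):
    """Consume continuous-feed lines, skipping heartbeat keep-alives,
    and persist the last seq in state as the resume cursor."""
    handled = []
    for line in lines:
        line = line.strip()
        if not line:                 # heartbeat: a bare newline keeps the socket alive
            continue
        event = json.loads(line)
        handled.append(event["id"])
        state["last_seq"] = event["seq"]   # checkpoint after each event
    return handled

feed = [
    '{"seq":"46-g1AAAA","id":"order:001","changes":[{"rev":"3-abc"}]}',
    "",                                    # heartbeat line
    '{"seq":"47-g1AAAB","id":"order:002","changes":[{"rev":"1-def"}]}',
]
state = {"last_seq": "45-g1AAAz"}
print(process_feed(feed, state))  # ['order:001', 'order:002']
print(state["last_seq"])          # 47-g1AAAB — pass this as ?since= on reconnect
```

On reconnect, the stored last_seq goes straight into the ?since= query parameter, so the consumer never reprocesses events it has already handled.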
Update handlers are server-side JavaScript functions stored in a design document under the updates key. They allow you to perform document transformations atomically on the server without a client round-trip — the client sends a POST request, and the update handler reads the current document, applies business logic, and returns the modified document in a single operation.
// In _design/handlers:
{
"updates": {
"increment_stock": "function(doc, req) {
if (!doc) { doc = { _id: req.id, stock: 0, type: 'item' }; }
var body = JSON.parse(req.body);
doc.stock = (doc.stock || 0) + (body.amount || 1);
doc.last_updated = new Date().toISOString();
return [doc, toJSON({ ok: true, new_stock: doc.stock })];
}"
}
}
# Call the update handler
curl -X POST \
"http://admin:pass@localhost:5984/mydb/_design/handlers/_update/increment_stock/item:001" \
-H "Content-Type: application/json" \
-d '{"amount": 5}'
# {"ok":true,"new_stock":155}
Differences from a direct PUT:
- No client round-trip — the client does not need to first GET the document to read the current _rev and current field values; the handler receives both and returns the updated document.
- Atomic transformation — the read, compute, and write happen within a single server-side operation, reducing MVCC conflict probability for frequently-updated counters or timestamps.
- Custom response body — the handler can return any JSON in the response, not just the standard {"ok":true,"rev":...}.
- When not to use them — update handlers are deprecated in CouchDB 3.x (along with list and show functions). The same logic implemented in your application using a read-modify-write loop is more maintainable and testable.
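The recommended read-modify-write replacement can be sketched as a retry loop. The FakeDB class below is a toy stand-in for CouchDB's 409-on-stale-_rev behavior, not a real client library:

```python
import itertools

class ConflictError(Exception):
    """Stands in for CouchDB's 409 Conflict response."""

class FakeDB:
    """In-memory stand-in for a CouchDB database with MVCC checks."""
    def __init__(self):
        self.docs = {}
        self.gen = itertools.count(1)

    def get(self, doc_id):
        return dict(self.docs.get(doc_id) or {"_id": doc_id, "_rev": None})

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        if current and current["_rev"] != doc["_rev"]:
            raise ConflictError(doc["_id"])       # stale _rev -> 409
        doc["_rev"] = f"{next(self.gen)}-rev"
        self.docs[doc["_id"]] = doc
        return doc["_rev"]

def increment_stock(db, doc_id, amount, retries=5):
    """Read-modify-write with retry on conflict: the application-side
    equivalent of the deprecated increment_stock update handler."""
    for _ in range(retries):
        doc = db.get(doc_id)
        doc["stock"] = doc.get("stock", 0) + amount
        try:
            db.put(doc)
            return doc["stock"]
        except ConflictError:
            continue                              # someone else won; re-read and retry
    raise RuntimeError("gave up after repeated conflicts")

db = FakeDB()
print(increment_stock(db, "item:001", 5))  # 5
print(increment_stock(db, "item:001", 3))  # 8
```

The retry-on-409 loop is the standard pattern with any real CouchDB HTTP client: GET, mutate, PUT with the fetched _rev, and start over if the PUT returns a conflict.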
Show functions are server-side JavaScript functions stored in a design document under the shows key. They transform a single document into any output format (HTML, XML, plain text) directly from CouchDB, without requiring a separate application server. When a client calls GET /db/_design/ddoc/_show/func_name/doc_id, CouchDB fetches the document, passes it to the show function, and returns the function's output as the HTTP response.
// In _design/render:
{
"shows": {
"as_html": "function(doc, req) {
if (!doc) { return { code: 404, body: 'Not found' }; }
return {
headers: { 'Content-Type': 'text/html' },
body: '<h1>' + doc.name + '</h1><p>' + doc.description + '</p>'
};
}"
}
}
GET /mydb/_design/render/_show/as_html/product:001
Show functions were primarily used in CouchApps — self-contained web applications where HTML pages, CSS, JavaScript, and data were all served from a single CouchDB database. The appeal was zero-infrastructure: the database was the entire application stack. Show functions rendered individual documents as HTML pages; list functions (Q20) rendered view query results.
Deprecation: Show functions were officially deprecated in CouchDB 3.0 (2020) along with list functions and the legacy JavaScript-based rewrites system. They are disabled by default in CouchDB 3.x and will be removed in a future major version. The recommended approach is to handle document rendering in your application layer — any web framework can fetch a document via the REST API and render it. The JavaScript query server overhead, security isolation challenges, and limited debugging tooling made CouchApps impractical at scale.
CouchDB does not have a dedicated backup command like mysqldump. The recommended backup approaches depend on your deployment type and RPO requirements:
1. Replication-based backup (recommended for live systems) — replicate the database to a dedicated backup CouchDB instance (local or remote). Because CouchDB replication is idempotent and incremental, subsequent backup runs only copy changed documents. Schedule it via the _replicator database or a cron job calling /_replicate.
# One-shot backup to a backup server
curl -X POST http://admin:pass@localhost:5984/_replicate \
-H "Content-Type: application/json" \
-d '{
"source": "http://localhost:5984/production_db",
"target": "http://backup:5984/production_db_backup_2024_03",
"create_target": true
}'
2. File-system snapshot — stop CouchDB (or freeze I/O via OS-level snapshot), copy the .couch database files from the data directory (/var/lib/couchdb/ on Linux), then restart. Simple but requires downtime or snapshot coordination.
3. Dump tools — community tools such as couchdb-backup or couchdbdump serialize all documents to a JSON/ndjson file via the _all_docs or _changes endpoint.
# Dump all documents to ndjson using the _all_docs feed
curl "http://admin:pass@localhost:5984/mydb/_all_docs?include_docs=true" \
| python3 -c "import sys,json; [print(json.dumps(r['doc'])) for r in json.load(sys.stdin)['rows'] if not r['id'].startswith('_design')]" \
> mydb_backup_$(date +%F).ndjson
# Restore by posting each line to _bulk_docs
jq -sc '{docs:.}' mydb_backup_2024-03-15.ndjson | \
curl -X POST http://admin:pass@localhost:5984/mydb_restore/_bulk_docs \
-H "Content-Type: application/json" -d @-
For clustered deployments, replicate per-database since there is no single database file to snapshot. Always verify backups by doing a test restore periodically.
Both CouchDB and MongoDB are JSON document databases, but they make fundamentally different architectural choices that determine where each excels.
| Aspect | CouchDB | MongoDB |
|---|---|---|
| Query interface | HTTP REST — any HTTP client; Mango JSON queries | MongoDB wire protocol; requires language driver |
| Query power | Mango (limited) + MapReduce; no aggregation pipeline | Rich aggregation pipeline; $lookup (join); text search; geospatial |
| Replication / sync | First-class HTTP peer-to-peer; PouchDB offline-first | Replica sets; change streams; no offline-first protocol |
| Conflict handling | Multi-master conflicts surfaced and resolved by app | Replica set — single primary; no write conflicts by design |
| Transactions | ACID per document; no multi-document transactions | ACID multi-document, multi-collection transactions (4.0+) |
| Write throughput | Lower — fsync per write; append-only B-tree | Higher — WiredTiger storage with group commit |
| Mobile / offline sync | Excellent — PouchDB is production-grade | Atlas Device Sync (commercial); no free equivalent |
| Deployment simplicity | Single binary, zero dependencies | More complex; requires mongod + replica set for production HA |
Choose CouchDB when the offline-first / mobile sync use case is central, when HTTP-native access matters (IoT, edge devices, no-driver environments), or when you need a simple embedded-friendly document store. Choose MongoDB when you need a rich aggregation pipeline, multi-document ACID transactions, geospatial queries, or high write throughput at scale.
Several CouchDB anti-patterns cause performance degradation, excessive conflicts, or runaway disk usage. Understanding them helps you design applications that work with CouchDB's architecture rather than against it.
- High-frequency counter documents — updating a single document hundreds of times per second (e.g., a page-view counter) creates a conflict storm and exponential revision chain growth. Solution: batch counter increments, use a reduce view for aggregation, or keep counters in Redis.
- Using skip for deep pagination — skip=10000&limit=20 forces CouchDB to walk past 10,000 index rows on every page load. Solution: use key-cursor pagination with startkey + startkey_docid.
- Querying without a matching index — running Mango _find queries without a matching JSON index causes full database scans. Always create a Mango index for fields used in production selectors and verify with _explain.
- Storing large binaries as attachments — attachments over ~1 MB bloat the database file and slow compaction. Use an object store (S3, MinIO) and store the URL in the document.
- Changing design documents frequently — every design document change triggers a full view index rebuild. In high-write databases this causes prolonged indexing load. Batch design document changes; test index changes in staging with realistic data volumes before deploying.
- Never running compaction — an update-heavy database that is never compacted can grow to 10-100x the size of its live data. Configure the smoosh auto-compaction daemon or schedule nightly compaction.
- Ignoring unresolved conflicts — conflicts from multi-master replication accumulate silently. Build a conflict-detection routine into your application and resolve them regularly.
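The key-cursor pagination recommended above follows a standard CouchDB pattern: fetch `limit + 1` rows and use the extra row as the cursor for the next request. A sketch of the client-side bookkeeping (the row dicts mimic CouchDB view rows; `paginate` is a hypothetical helper name):

```python
def paginate(rows, limit):
    """Split a view result fetched with limit+1 into (page, next_params).

    `rows` are CouchDB view rows ({"id": ..., "key": ..., "value": ...}).
    Returns `limit` rows plus the query params for the next page,
    or None when this was the last page.
    """
    page = rows[:limit]
    if len(rows) <= limit:
        return page, None                    # no extra row: last page
    cursor = rows[limit]                     # first row of the next page
    return page, {
        "startkey": cursor["key"],
        "startkey_docid": cursor["id"],      # tie-breaker for duplicate keys
        "limit": limit + 1,                  # fetch one extra row again
    }
```

Unlike skip, this stays O(limit) per page regardless of how deep the reader scrolls, because startkey lets the B-tree seek directly to the cursor.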
CouchDB provides several migration paths depending on whether you are upgrading in place, moving to a new cluster, or changing data structure during migration.
1. Replication-based migration (zero-downtime, recommended)
# Step 1: Replicate from old instance to new
curl -X POST http://admin:pass@new-couch:5984/_replicator \
-H "Content-Type: application/json" \
-d '{
"_id": "migrate-orders",
"source": "http://admin:pass@old-couch:5984/orders",
"target": "http://admin:pass@new-couch:5984/orders",
"continuous": true,
"create_target": true
}'
# Step 2: Monitor until caught up
curl http://admin:pass@new-couch:5984/_scheduler/docs
# Step 3: Stop writes to old instance, verify new instance is current
# (compare document counts and last seq numbers)
# Step 4: Switch application connection string to new instance
# Step 5: Stop the replication job and decommission old instance
2. In-place upgrade (CouchDB 1.x to 2.x/3.x) — CouchDB 2.x can read CouchDB 1.x database files directly (the 1.x on-disk format remains readable). Install the new version over the old one and point it at the same data directory. However, the cluster setup and configuration format changed significantly — review the upgrade guide for your specific version pair.
3. Document transformation during migration — if the schema changes (adding required fields, renaming fields), write a migration script that reads from the source using _all_docs or _changes, transforms each document, and writes to the target via _bulk_docs. Process in batches of 100-500 documents to avoid memory pressure.
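The batched transform in option 3 can be sketched as a generic loop. Here `fetch_batch` and `write_batch` are hypothetical stand-ins for paged `GET /_all_docs?include_docs=true` requests and `POST /_bulk_docs`; the `"\ufff0"` suffix is the usual CouchDB trick for a startkey that sorts just after the last key seen.

```python
def migrate(fetch_batch, write_batch, transform, batch_size=200):
    """Stream documents from source to target in fixed-size batches.

    fetch_batch(start_id, n) -> up to n source docs ordered by _id,
                                starting at start_id (None = beginning)
    write_batch(docs)        -> POST the batch to the target's _bulk_docs
    transform(doc)           -> new-schema doc, or None to skip it
    Returns the number of documents written.
    """
    start_id, migrated = None, 0
    while True:
        docs = fetch_batch(start_id, batch_size)
        if not docs:
            return migrated
        out = []
        for doc in docs:
            new = transform(dict(doc))
            if new is not None:
                new.pop("_rev", None)    # target assigns its own revisions
                out.append(new)
        if out:
            write_batch(out)
        migrated += len(out)
        start_id = docs[-1]["_id"] + "\ufff0"  # resume just past the last key
```

Keeping batches small bounds memory on both the script and the server, and resuming by `_id` means an interrupted run can be restarted from the last completed batch.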
# Estimate progress: compare total_rows on both sides
curl http://admin:pass@old:5984/orders/ | python3 -c "import sys,json; d=json.load(sys.stdin); print('old:', d['doc_count'])"
curl http://admin:pass@new:5984/orders/ | python3 -c "import sys,json; d=json.load(sys.stdin); print('new:', d['doc_count'])"
Always verify the migration by comparing document counts, running a sample of queries on both instances, and doing a test cutover before the production switch.
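The sample-query comparison can be automated: pull the same set of documents from both instances and diff them by `_id`. A sketch (the two list arguments stand in for parsed JSON responses from each side; `verify_sample` is a hypothetical helper name):

```python
def verify_sample(old_docs, new_docs, ignore=("_rev",)):
    """Compare documents by _id, ignoring fields expected to differ.

    Replication preserves _rev, but a transforming migration does not,
    so _rev is ignored by default. Returns the list of mismatched _ids.
    """
    def strip(doc):
        return {k: v for k, v in doc.items() if k not in ignore}
    new_by_id = {d["_id"]: d for d in new_docs}
    mismatched = []
    for doc in old_docs:
        other = new_by_id.get(doc["_id"])
        if other is None or strip(doc) != strip(other):
            mismatched.append(doc["_id"])
    return mismatched
```

An empty result for a decent-sized random sample, together with matching doc_count values, is a reasonable green light for the cutover.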
