Database / ChromaDB Interview Questions
When should you use upsert() instead of add() in ChromaDB, and what are common patterns?
upsert() is the idempotent write operation in ChromaDB: it inserts a document if the ID does not exist, or updates it if the ID already exists. This makes it safe to call repeatedly without checking whether a document has been indexed before — a critical property for ETL pipelines, scheduled sync jobs, and incremental indexing.
import chromadb
from datetime import datetime
client = chromadb.PersistentClient(path="./upsert_demo")
col = client.get_or_create_collection("products")
# Pattern 1: Safe initial load
# Can re-run the script without duplicate ID errors
def sync_products(products: list[dict]):
    col.upsert(
        documents=[p["description"] for p in products],
        ids=[str(p["id"]) for p in products],
        metadatas=[
            {"name": p["name"], "price": p["price"], "updated": int(datetime.now().timestamp())}
            for p in products
        ],
    )
products_v1 = [
    {"id": 1, "name": "Widget", "description": "A blue widget", "price": 9.99},
    {"id": 2, "name": "Gadget", "description": "A red gadget", "price": 14.99},
]
sync_products(products_v1) # inserts both
print(col.count()) # 2 on a fresh collection (the persistent store keeps earlier runs)
# Product 1 description changed — upsert handles it cleanly
products_v2 = [
    {"id": 1, "name": "Widget", "description": "An improved blue widget v2", "price": 11.99},
    {"id": 3, "name": "Doohickey", "description": "A green doohickey", "price": 4.99},
]
sync_products(products_v2) # updates id=1, inserts id=3
print(col.count()) # 3
# Verify the update
result = col.get(ids=["1"])
print(result["documents"][0]) # "An improved blue widget v2"
print(result["metadatas"][0]["price"]) # 11.99
# Pattern 2: Incremental indexing — only upsert changed documents
def incremental_sync(items, last_sync_ts: int):
    # Keep only items modified since the previous sync
    changed = [i for i in items if i["updated_at"] > last_sync_ts]
    if changed:
        col.upsert(
            documents=[i["body"] for i in changed],
            ids=[str(i["id"]) for i in changed],  # ChromaDB IDs must be strings
            metadatas=[{"updated_at": i["updated_at"]} for i in changed],
        )

| Scenario | Use |
|---|---|
| First-time bulk load with guaranteed unique IDs | add(): faster, and it never overwrites an existing record; depending on the ChromaDB version, an already-present ID raises an error or is skipped |
| Recurring sync job (daily/hourly) | upsert(): safe to re-run without cleanup |
| User-triggered document update | upsert(): no need to check whether the document exists first |
| Append-only event log | add(): an existing ID must never be overwritten by a later write |
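
Pattern 2 above takes last_sync_ts as an input but does not show how that value advances between runs. The sketch below is one possible way to do it, not the only one: the function returns a new watermark that the caller stores and passes to the next run. To keep it runnable without a database, it uses InMemoryCollection, a hypothetical stand-in written for this example that mimics only the insert-or-overwrite behavior of upsert(); a real ChromaDB collection could presumably be passed in its place.

```python
from typing import Any


class InMemoryCollection:
    """Hypothetical stand-in for a ChromaDB collection (not part of chromadb)."""

    def __init__(self) -> None:
        self.records: dict[str, dict[str, Any]] = {}

    def upsert(self, ids, documents, metadatas) -> None:
        # Insert new IDs, overwrite existing ones
        for id_, doc, meta in zip(ids, documents, metadatas):
            self.records[id_] = {"document": doc, "metadata": meta}

    def count(self) -> int:
        return len(self.records)


def incremental_sync(col, items: list[dict], last_sync_ts: int) -> int:
    """Upsert items changed since last_sync_ts and return the new watermark."""
    changed = [i for i in items if i["updated_at"] > last_sync_ts]
    if not changed:
        return last_sync_ts  # nothing to write, watermark stays put
    col.upsert(
        ids=[str(i["id"]) for i in changed],
        documents=[i["body"] for i in changed],
        metadatas=[{"updated_at": i["updated_at"]} for i in changed],
    )
    return max(i["updated_at"] for i in changed)


col = InMemoryCollection()
items = [
    {"id": "a", "body": "first draft", "updated_at": 100},
    {"id": "b", "body": "second doc", "updated_at": 200},
]

watermark = incremental_sync(col, items, last_sync_ts=0)          # writes a and b
watermark = incremental_sync(col, items, last_sync_ts=watermark)  # writes nothing

items[0] = {"id": "a", "body": "revised draft", "updated_at": 300}
watermark = incremental_sync(col, items, last_sync_ts=watermark)  # writes only a
```

Because the write is an upsert, a run that crashes before the watermark is saved can simply be repeated: the same records are written again with the same IDs and nothing is duplicated.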
