Database / ChromaDB Interview Questions
What is a production readiness checklist for a ChromaDB-based application?
Moving a ChromaDB application from prototype to production involves several architectural decisions around storage, concurrency, reliability, and observability. This checklist covers the key concerns.
| Area | Recommendation |
|---|---|
| Storage mode | Use HttpClient connected to a ChromaDB server, not PersistentClient, in multi-process apps |
| Embedding consistency | Store the embedding model name in collection metadata; always pass the same embedding function to get_collection() |
| Distance metric | Set hnsw:space='cosine' at collection creation for text; cannot change later |
| Backups | Schedule regular directory snapshots or SQLite online backups; test restore procedure |
| Telemetry | Set ANONYMIZED_TELEMETRY=False for privacy |
| Batching | Insert in batches of 100–500; use upsert() for idempotent pipelines |
| Error handling | Catch IDAlreadyExistsError, InvalidCollectionException; implement retry logic for HttpClient |
| HNSW tuning | Increase hnsw:construction_ef to 200 and hnsw:search_ef to 50–100 for large collections |
| Metadata schema | Use ints for dates/booleans; document schema in collection metadata |
| Security | Run server behind a reverse proxy with TLS; add auth headers for HttpClient |
| Monitoring | Log query latency, collection size, and embedding function errors |
| Scale planning | Plan ~1.5 KB/doc for 384-dim vectors + 25% HNSW overhead; consider alternatives above 10M docs |
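The arithmetic behind the scale-planning row can be sketched as follows. This is a rough estimate only: it assumes each vector component is stored as a 4-byte float (which is where ~1.5 KB for 384 dimensions comes from) and treats the 25% HNSW overhead as a flat multiplier. The function name and defaults are illustrative, and the estimate excludes document text and metadata.

```python
def estimate_vector_storage_bytes(
    num_docs: int,
    dim: int = 384,
    bytes_per_float: int = 4,      # assumption: float32 components
    hnsw_overhead: float = 0.25,   # the 25% figure from the checklist
) -> int:
    """Rough size of the vectors plus index overhead, in bytes."""
    if num_docs < 0 or dim <= 0:
        raise ValueError("num_docs must be >= 0 and dim must be > 0")
    raw_per_doc = dim * bytes_per_float  # 384 * 4 = 1536 bytes, about 1.5 KB
    return round(num_docs * raw_per_doc * (1 + hnsw_overhead))


if __name__ == "__main__":
    for n in (100_000, 1_000_000, 10_000_000):
        gb = estimate_vector_storage_bytes(n) / 1e9
        print(f"{n:>10,} docs -> ~{gb:.2f} GB for vectors and index")
```

Under these assumptions, 10M documents need roughly 19 GB for vectors and index alone, which is one way to read the checklist's suggestion to consider alternatives at that scale.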
# Minimal production-ready ChromaDB setup
import logging
import os

import chromadb
from chromadb.config import Settings
from chromadb.utils import embedding_functions

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

EMBEDDING_MODEL = "text-embedding-3-small"
COLLECTION_NAME = "prod_knowledge_base"


def create_client():
    return chromadb.HttpClient(
        host=os.environ["CHROMA_HOST"],
        port=int(os.environ.get("CHROMA_PORT", 8000)),
        settings=Settings(anonymized_telemetry=False),
    )


def get_collection(client):
    ef = embedding_functions.OpenAIEmbeddingFunction(
        api_key=os.environ["OPENAI_API_KEY"],
        model_name=EMBEDDING_MODEL,
    )
    return client.get_or_create_collection(
        name=COLLECTION_NAME,
        embedding_function=ef,
        metadata={
            "hnsw:space": "cosine",
            "hnsw:construction_ef": 200,
            "hnsw:search_ef": 100,
            "embedding_model": EMBEDDING_MODEL,
        },
    )


client = create_client()
client.heartbeat()  # fail fast if server is unreachable
collection = get_collection(client)
logger.info(f"Connected to collection with {collection.count()} documents")
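The Batching and Error handling rows of the checklist could be combined along the following lines. This is a minimal sketch, not a definitive implementation: `upsert_in_batches`, `with_retry`, and `batch_ranges` are hypothetical helper names, the batch size of 200 is just a value inside the 100–500 range, and the exception types worth retrying depend on the chromadb version and its underlying HTTP library, so `retry_on` is left as a parameter. The only ChromaDB call used is `collection.upsert(ids=..., documents=..., metadatas=...)`.

```python
import time
from typing import Any, Callable, Iterator, Optional, Sequence, Tuple, Type


def batch_ranges(total: int, batch_size: int) -> Iterator[range]:
    """Yield index ranges that cover 0..total in chunks of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        yield range(start, min(start + batch_size, total))


def with_retry(
    fn: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 0.5,
    # Assumption: replace with the exceptions your client actually raises.
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Any:
    """Call fn, retrying with exponential backoff on the given exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)


def upsert_in_batches(
    collection: Any,
    ids: Sequence[str],
    documents: Sequence[str],
    metadatas: Optional[Sequence[dict]] = None,
    batch_size: int = 200,
) -> int:
    """Upsert records in fixed-size batches; returns the number written."""
    if len(ids) != len(documents):
        raise ValueError("ids and documents must have the same length")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError("metadatas must match ids in length")
    written = 0
    for r in batch_ranges(len(ids), batch_size):
        chunk = slice(r.start, r.stop)
        kwargs = {"ids": list(ids[chunk]), "documents": list(documents[chunk])}
        if metadatas is not None:
            kwargs["metadatas"] = list(metadatas[chunk])
        # upsert is idempotent per ID, so retrying a failed batch is safe.
        with_retry(lambda: collection.upsert(**kwargs))
        written += len(kwargs["ids"])
    return written
```

With the `collection` object from the setup above, a call such as `upsert_in_batches(collection, ids, documents)` would send the records in batches of 200. Because `upsert()` overwrites existing IDs rather than failing on them, rerunning the pipeline after a partial failure does not create duplicates.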
