Database / ChromaDB Interview Questions
What are best practices for structuring ChromaDB collection metadata for production use?
Collection-level metadata (set via create_collection(metadata=...)) stores configuration about the collection itself. Document-level metadata (set per document via add(metadatas=[...])) enables filtered retrieval. Both need thoughtful design for maintainable production systems.
import chromadb
from datetime import datetime
client = chromadb.PersistentClient(path="./prod_db")
# Good collection-level metadata: document operational details
collection = client.get_or_create_collection(
name="support_tickets_v2",
metadata={
# HNSW config
"hnsw:space": "cosine",
"hnsw:construction_ef": 200,
"hnsw:search_ef": 100,
# Operational metadata
"embedding_model": "text-embedding-3-small",
"embedding_dims": "1536",
"schema_version": "2",
"created_at": "2024-01-15",
"description": "Customer support ticket embeddings for semantic search",
},
)
# Good document-level metadata: filterable, flat, typed
def add_ticket(ticket: dict):
collection.upsert(
documents=[ticket["description"]],
ids=[f"ticket-{ticket['id']}"],
metadatas=[{
# Filterable dimensions
"status": ticket["status"], # "open"/"closed"/"pending"
"priority": ticket["priority"], # "low"/"medium"/"high"
"category": ticket["category"], # "billing"/"technical"/"general"
"agent_id": ticket["agent_id"], # str identifier
# Date as Unix timestamp (int) — enables $gt/$lt range queries
"created_ts": int(datetime.fromisoformat(ticket["created_at"]).timestamp()),
"year": int(ticket["created_at"][:4]),
# Boolean as int — ChromaDB does not support bool type
"is_escalated": int(ticket.get("escalated", False)),
}],
)
# Effective compound filter
results = collection.query(
query_texts=["payment failed cannot checkout"],
n_results=10,
where={"$and": [
{"status": "open"},
{"priority": {"$in": ["high", "medium"]}},
{"category": "billing"},
{"created_ts":{"$gte": int(datetime(2024, 1, 1).timestamp())}},
]},
)Key rules: store dates as Unix timestamps (int) for range filtering. Store booleans as 0/1 integers. Keep metadata keys short and snake_case. Document your schema in collection-level metadata so future developers know what fields exist.
Invest now in Acorns!!! 🚀
Join Acorns and get your $5 bonus!
Acorns is a micro-investing app that automatically invests your "spare change" from daily purchases into diversified, expert-built portfolios of ETFs. It is designed for beginners, allowing you to start investing with as little as $5. The service automates saving and investing. Disclosure: I may receive a referral bonus.
Invest now!!! Get Free equity stock (US, UK only)!
Use Robinhood app to invest in stocks. It is safe and secure. Use the Referral link to claim your free stock when you sign up!.
The Robinhood app makes it easy to trade stocks, crypto and more.
Webull! Receive free stock by signing up using the link: Webull signup.
More Related questions...
