Database / ChromaDB Interview Questions
How do you efficiently add large numbers of documents to ChromaDB using batching?
Adding tens of thousands of documents one at a time is slow because every add() call pays the fixed cost of an embedding pass and an index write. Batch the documents into groups of roughly 100–500 and add each group with a single add() call, which amortises the embedding overhead and index writes across the whole batch.
import chromadb
from chromadb.utils import embedding_functions
from typing import List
client = chromadb.PersistentClient(path="./bulk_db")
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
collection = client.get_or_create_collection(
    "large_corpus", embedding_function=ef
)
# Simulate a large list of documents
documents = [f"Article about topic {i}" for i in range(10_000)]
ids = [f"doc-{i}" for i in range(10_000)]
metadatas = [{"index": i, "batch": i // 500} for i in range(10_000)]
# Efficient batch insertion
BATCH_SIZE = 500
for start in range(0, len(documents), BATCH_SIZE):
    end = start + BATCH_SIZE
    collection.add(
        documents=documents[start:end],
        ids=ids[start:end],
        metadatas=metadatas[start:end],
    )
    print(f"Added batch {start // BATCH_SIZE + 1}, total: {collection.count()}")
print(f"Final count: {collection.count()}") # 10000
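
The slicing loop above can also be factored into a small reusable helper. The following is a minimal sketch, not part of ChromaDB: `iter_batches` is a hypothetical name, and the helper only assumes that documents, IDs, and optional metadata are parallel lists. It checks that the lists line up before yielding anything, which catches the most common bulk-load mistake (misaligned IDs) early.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_batches(
    documents: List[str],
    ids: List[str],
    metadatas: Optional[List[Dict[str, Any]]] = None,
    batch_size: int = 500,
) -> Iterator[Dict[str, Any]]:
    """Yield one keyword-argument dict per batch, with aligned slices.

    Hypothetical helper for illustration; it is not a ChromaDB function.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if len(documents) != len(ids):
        raise ValueError("documents and ids must have the same length")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError("metadatas and ids must have the same length")

    for start in range(0, len(ids), batch_size):
        end = start + batch_size  # slicing past the end of a list is safe
        batch: Dict[str, Any] = {
            "documents": documents[start:end],
            "ids": ids[start:end],
        }
        if metadatas is not None:
            batch["metadatas"] = metadatas[start:end]
        yield batch


# Example with 1,050 records: two full batches and one partial batch.
docs = [f"Article about topic {i}" for i in range(1_050)]
doc_ids = [f"doc-{i}" for i in range(1_050)]
batches = list(iter_batches(docs, doc_ids, batch_size=500))
print([len(b["ids"]) for b in batches])  # [500, 500, 50]
```

Each yielded dict uses the same keyword names as the add() call in the loop above, so a batch could be passed along as `collection.add(**batch)`.
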
# Alternative: provide pre-computed embeddings to skip re-embedding
# (useful when you already called the embedding API).
# Note: this replaces the loop above rather than following it. The IDs below
# were already inserted by that loop, so run this against a fresh collection
# or switch to upsert().
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")
docs_batch = documents[:500]
vectors = model.encode(docs_batch, batch_size=64, show_progress_bar=True)
collection.add(
    embeddings=vectors.tolist(),
    documents=docs_batch,
    ids=ids[:500],
)

| Tip | Reason |
|---|---|
| Batch size 100–500 | Balances memory use and embedding throughput |
| Pre-compute embeddings externally | Avoid re-embedding if you already have vectors from an API call |
| Use GPU for local models | SentenceTransformer encoding is typically far faster on a GPU than on a CPU; the exact speed-up depends on hardware, model, and batch size |
| Upsert instead of add in loops | upsert() is safe to re-run; add() does not overwrite existing IDs (depending on the ChromaDB version, it raises an error or skips them) |
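
To make the last tip concrete, here is a minimal sketch of why an upsert-based loader is safe to re-run after a crash or partial load. `InMemoryCollection` and `load_in_batches` are hypothetical names: the class is a dictionary-backed stand-in, used only so the example runs without a database, and it mimics just the `upsert(ids=..., documents=..., metadatas=...)` and `count()` calls. With a real ChromaDB collection, the same loop shape should apply.

```python
from typing import Any, Dict, List


class InMemoryCollection:
    """Hypothetical stand-in for a ChromaDB collection (not part of chromadb)."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}

    def upsert(
        self,
        ids: List[str],
        documents: List[str],
        metadatas: List[Dict[str, Any]],
    ) -> None:
        # Insert new IDs and overwrite existing ones, like an upsert.
        for id_, doc, meta in zip(ids, documents, metadatas):
            self._records[id_] = {"document": doc, "metadata": meta}

    def count(self) -> int:
        return len(self._records)


def load_in_batches(
    collection: Any,
    documents: List[str],
    ids: List[str],
    metadatas: List[Dict[str, Any]],
    batch_size: int = 500,
) -> int:
    """Upsert records in batches and return the number of batches sent."""
    sent = 0
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.upsert(
            ids=ids[start:end],
            documents=documents[start:end],
            metadatas=metadatas[start:end],
        )
        sent += 1
    return sent


sample_docs = [f"Article about topic {i}" for i in range(1_200)]
sample_ids = [f"doc-{i}" for i in range(1_200)]
sample_meta = [{"index": i} for i in range(1_200)]

store = InMemoryCollection()
first_run = load_in_batches(store, sample_docs, sample_ids, sample_meta)
# Simulate re-running the whole job, e.g. after an interrupted load.
second_run = load_in_batches(store, sample_docs, sample_ids, sample_meta)
print(first_run, second_run, store.count())  # 3 3 1200
```

The second run sends the same three batches, but the record count stays at 1,200 because every ID is overwritten rather than duplicated. The trade-off is that a re-run repeats work for records that were already loaded.
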
