Database / ChromaDB Interview Questions
How do you ensure the correct embedding function is used when reopening a persistent ChromaDB collection?
ChromaDB persists document text and vectors, but you should not rely on it to restore the embedding function that produced them. When you reopen a collection through a PersistentClient, pass the same embedding function to the collection again. Otherwise ChromaDB may fall back to its default model, and query embeddings will no longer match the stored vectors.
import os

import chromadb
from chromadb.utils import embedding_functions

DB_PATH = "./persistent_ef_demo"

# === SESSION 1: Create and populate collection ===
client1 = chromadb.PersistentClient(path=DB_PATH)
ef_openai = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)
col1 = client1.get_or_create_collection(
    name="my_docs",
    embedding_function=ef_openai,  # set the EF
    metadata={
        "hnsw:space": "cosine",
        "embedding_model": "text-embedding-3-small",  # document it
    },
)
col1.add(documents=["ChromaDB is great"], ids=["d1"])
print("Session 1 done, process exits...")
del client1, col1  # simulate the process exiting

# === SESSION 2: Reopen; MUST re-supply the same embedding function ===
client2 = chromadb.PersistentClient(path=DB_PATH)

# WRONG: without embedding_function, ChromaDB may fall back to its default,
# all-MiniLM-L6-v2 (384-dim). The stored vectors came from
# text-embedding-3-small (1536-dim), so a text query fails with a dimension
# mismatch. If the dimensions happened to match, the query would run and
# silently return poor results instead.
# col_wrong = client2.get_collection("my_docs")  # DO NOT DO THIS

# CORRECT: Re-supply the exact same embedding function
ef_openai_v2 = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",  # must match session 1
)
col2 = client2.get_collection(
    name="my_docs",
    embedding_function=ef_openai_v2,  # required!
)
results = col2.query(query_texts=["vector databases"], n_results=1)
print(results["documents"])  # nearest stored document

# TIP: Read model name from collection metadata to avoid hardcoding
saved_model = col2.metadata.get("embedding_model", "all-MiniLM-L6-v2")
print(f"Using model: {saved_model}")

| Scenario | Problem | Solution |
|---|---|---|
| Reopen collection without EF | Defaults to all-MiniLM-L6-v2, mismatches stored vectors | Always pass embedding_function= on get_collection() |
| Upgrade embedding model | Old vectors incompatible with new model | Create new collection, re-embed all docs, migrate |
| Team member uses different EF | Silent quality degradation | Store model name in collection metadata, document in README |
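
The metadata convention in the last row can also be enforced in code rather than only documented. The following is a minimal sketch, not part of ChromaDB itself: `open_collection_checked`, `EmbeddingModelMismatch`, and `MODEL_KEY` are hypothetical names introduced here. It assumes the client behaves like `chromadb.PersistentClient` (it exposes `get_collection(name=..., embedding_function=...)`) and that the collection exposes a `metadata` mapping containing the `embedding_model` key written in session 1.

```python
from typing import Any, Optional

# Metadata key used in the example above; the name is a project convention,
# not something ChromaDB interprets.
MODEL_KEY = "embedding_model"


class EmbeddingModelMismatch(ValueError):
    """Raised when the recorded model differs from the one about to be used."""


def open_collection_checked(
    client: Any,
    name: str,
    embedding_function: Any,
    expected_model: str,
) -> Any:
    """Open an existing collection and verify its recorded embedding model.

    Hypothetical helper: `client` is assumed to expose
    get_collection(name=..., embedding_function=...), and the returned
    collection is assumed to expose `metadata` (a mapping or None).
    """
    collection = client.get_collection(
        name=name, embedding_function=embedding_function
    )
    metadata = collection.metadata or {}
    recorded: Optional[str] = metadata.get(MODEL_KEY)

    if recorded is None:
        # Nothing was recorded at creation time, so the model cannot be verified.
        raise EmbeddingModelMismatch(
            f"collection {name!r} has no {MODEL_KEY!r} metadata"
        )
    if recorded != expected_model:
        raise EmbeddingModelMismatch(
            f"collection {name!r} was built with {recorded!r}, "
            f"not {expected_model!r}"
        )
    return collection
```

With the names from the example, session 2 would call `open_collection_checked(client2, "my_docs", ef_openai_v2, "text-embedding-3-small")`. The check only compares the recorded string with the expected one; it cannot detect an embedding function object that was constructed with a different model than the string passed in, so the two should come from a single shared constant.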
