What is embedding consistency and why is it critical in ChromaDB applications?
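Embedding consistency means using the exact same embedding model and version for both indexing (adding documents) and querying. If you embed documents with model A but query with model B, the resulting vectors live in incompatible geometric spaces: similarity distances become meaningless and retrieval quality collapses. When the two models have different output dimensions, ChromaDB rejects the query with a dimension mismatch error; when the dimensions happen to match, nothing fails and the results are silently wrong, which is the more dangerous case.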
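
To make the "incompatible spaces" point concrete, here is a toy sketch in plain Python. It does not use ChromaDB or a real model: `VOCAB_A` and `VOCAB_B` are hypothetical stand-ins for two embedding models that produce vectors of the same size but assign meaning to the axes differently.

```python
import math

# Two hypothetical "models": same words, same dimension, different axis layout.
VOCAB_A = {"cat": 0, "dog": 1, "car": 2}
VOCAB_B = {"car": 0, "cat": 1, "dog": 2}


def embed(text, vocab):
    """Toy bag-of-words embedding: one axis per known word."""
    vec = [0.0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Documents were indexed with model A.
doc_cat = embed("cat", VOCAB_A)
doc_dog = embed("dog", VOCAB_A)

# Consistent: the query is embedded with model A as well.
same_model = cosine(embed("cat", VOCAB_A), doc_cat)

# Inconsistent: the query is embedded with model B.
mixed = cosine(embed("cat", VOCAB_B), doc_cat)
wrong_hit = cosine(embed("cat", VOCAB_B), doc_dog)

print(same_model, mixed, wrong_hit)
```

With a consistent model the query "cat" matches the "cat" document perfectly. With the mixed setup the same query scores zero against "cat" and a perfect match against "dog", and no error is raised because the dimensions agree.
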
```python
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./consistency_demo")

# CORRECT: same embedding function for add and query
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)
collection = client.get_or_create_collection(
    "correct_usage",
    embedding_function=ef,  # attached to the collection object
)
collection.add(
    documents=["Hello world"],
    ids=["d1"],
)
# query() embeds the query text with the same ef attached to the collection
results = collection.query(query_texts=["greetings"], n_results=1)
# Works correctly: ef is applied to both document and query

# ---
# PITFALL 1: switching models between sessions
# Session 1: add with all-MiniLM-L6-v2 (384 dims)
# Session 2: accidentally use all-mpnet-base-v2 (768 dims) → dimension mismatch error!
# (Two different models with the SAME dimension raise no error and fail silently.)

# PITFALL 2: updating embedding model version
# Model v1.0 and v1.1 may produce different vector spaces
# Always re-embed ALL documents when upgrading the embedding model

# BEST PRACTICE: store the model name in collection metadata
collection_safe = client.get_or_create_collection(
    "safe_collection",
    embedding_function=ef,
    metadata={
        "hnsw:space": "cosine",
        "embedding_model": "all-MiniLM-L6-v2",  # document which model was used
        "embedding_dim": "384",
    },
)

# On load, verify the model matches what is stored:
meta = collection_safe.metadata
print(meta["embedding_model"])  # "all-MiniLM-L6-v2"
print(meta["embedding_dim"])  # "384"

# When you need to upgrade the embedding model:
# 1. Create a NEW collection with the new model
# 2. Re-embed and re-insert all documents
# 3. Run validation queries to confirm quality
# 4. Delete the old collection
```

| Check | Why |
|---|---|
| Same model name | Different models produce vectors in different spaces |
| Same model version | Even minor version updates can shift the vector space |
| Same preprocessing | Lowercasing, truncation, etc. must be identical |
| Store model name in metadata | Documents which model was used for future reference |
| Re-embed on model upgrade | Old and new vectors cannot coexist in the same collection |
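
The "store the model name in metadata" check only helps if something reads it back and fails loudly on a mismatch. Below is a minimal sketch of such a guard, assuming the metadata keys `embedding_model` and `embedding_dim` used in the example above. The function name `check_embedding_config` and the exception `EmbeddingConfigMismatch` are hypothetical helpers, not part of ChromaDB.

```python
from typing import Any, Mapping, Optional


class EmbeddingConfigMismatch(RuntimeError):
    """Raised when the stored embedding config differs from the expected one."""


def check_embedding_config(
    stored_metadata: Optional[Mapping[str, Any]],
    expected_model: str,
    expected_dim: int,
) -> None:
    """Compare a collection's metadata dict against the model in use.

    `stored_metadata` is a plain mapping such as `collection.metadata`;
    it may be None when the collection was created without metadata.
    """
    stored = stored_metadata or {}
    problems = []

    stored_model = stored.get("embedding_model")
    if stored_model is None:
        problems.append("no 'embedding_model' recorded in metadata")
    elif stored_model != expected_model:
        problems.append(f"model {stored_model!r} != expected {expected_model!r}")

    stored_dim = stored.get("embedding_dim")
    # The example above stores the dimension as a string, so normalise to int.
    if stored_dim is not None and int(stored_dim) != expected_dim:
        problems.append(f"dimension {stored_dim} != expected {expected_dim}")

    if problems:
        raise EmbeddingConfigMismatch("; ".join(problems))


# Usage with a metadata dict shaped like the one stored above:
meta = {"hnsw:space": "cosine", "embedding_model": "all-MiniLM-L6-v2", "embedding_dim": "384"}
check_embedding_config(meta, "all-MiniLM-L6-v2", 384)  # passes silently
```

In an application this would be called once at startup, for example with `collection_safe.metadata`, before any `add()` or `query()` call. Treating a missing model name as an error is a design choice: it forces older collections to be labelled before they are queried.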
