ChromaDB Interview Questions

What is the HNSW index in ChromaDB and what parameters can you tune?

ChromaDB uses HNSW (Hierarchical Navigable Small World) as its Approximate Nearest Neighbour (ANN) index. HNSW builds a layered graph in which each node is linked to its closest neighbours. A query enters at the sparse top layer, greedily moves toward the query vector, and descends layer by layer, so it finds approximate nearest neighbours in roughly O(log n) time instead of an exhaustive O(n) linear scan.
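To make the traversal concrete, here is a deliberately simplified, single-layer sketch of the best-first search that HNSW runs within a layer. It is not ChromaDB's internal code: the graph is a brute-force k-NN graph rather than a true multi-layer HNSW graph, and the names `build_knn_graph`, `greedy_search`, and `exact_search`, plus the parameters `m` (standing in for `hnsw:M`) and `ef` (standing in for `hnsw:search_ef`), are illustrative assumptions.

```python
import heapq
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_knn_graph(points, m):
    """Link every node to its m nearest neighbours by brute force (O(n^2)).

    Simplification: real HNSW inserts nodes incrementally into several
    layers; a single-layer k-NN graph keeps this sketch short.
    """
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: squared_l2(p, points[j]),
        )
        graph[i] = others[:m]
    return graph


def greedy_search(points, graph, query, entry, ef, k):
    """Best-first graph search that keeps at most `ef` candidate results.

    `ef` plays the role of hnsw:search_ef; it should be >= k.
    """
    d0 = squared_l2(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexplored node first
    results = [(-d0, entry)]     # max-heap via negation: worst kept result on top
    while candidates:
        dist, node = heapq.heappop(candidates)
        if dist > -results[0][0]:
            break  # closest unexplored node is farther than every kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d = squared_l2(query, points[nb])
            if len(results) < ef or d < -results[0][0]:
                heapq.heappush(candidates, (d, nb))
                heapq.heappush(results, (-d, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    ranked = sorted((-neg_d, idx) for neg_d, idx in results)
    return [idx for _, idx in ranked[:k]]


def exact_search(points, query, k):
    """Exhaustive O(n) scan, used as ground truth."""
    return sorted(range(len(points)), key=lambda i: squared_l2(query, points[i]))[:k]


random.seed(0)
points = [[random.random(), random.random()] for _ in range(300)]
graph = build_knn_graph(points, m=8)
query = [0.5, 0.5]
print("approx:", greedy_search(points, graph, query, entry=0, ef=50, k=5))
print("exact: ", exact_search(points, query, k=5))
```

Because `greedy_search` stops once its closest unexplored candidate is farther than the worst of the `ef` results it is holding, a larger `ef` explores more of the graph: slower, but less likely to miss a true neighbour. That is the same trade-off the tuning guide describes for `hnsw:search_ef`.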

import chromadb

client = chromadb.Client()

# HNSW parameters are set as metadata at collection creation
collection = client.create_collection(
    name="tuned_collection",
    metadata={
        "hnsw:space":           "cosine",   # distance metric
        "hnsw:construction_ef": 200,         # default 100
        # Controls quality of index during construction.
        # Higher = better recall, slower inserts.

        "hnsw:search_ef":       100,         # default 10
        # Controls quality of search at query time.
        # Higher = better recall, slower queries.

        "hnsw:M":               16,          # default 16
        # Number of bi-directional links per node.
        # Higher = better recall + more memory + slower inserts.
        # Typical range: 4-64.
    },
)

# Note: HNSW parameters cannot be changed after collection creation
# You would need to recreate the collection and re-insert data

collection.add(
    documents=[f"Document number {i}" for i in range(10000)],
    ids=[str(i) for i in range(10000)],
)
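Because these settings are fixed once the collection exists, it can pay to validate them in one place before calling `create_collection`. The helper below is a hypothetical convenience, not part of the ChromaDB API; `hnsw_metadata`, `HNSW_DEFAULTS`, and `VALID_SPACES` are illustrative names, and the helper deliberately covers only the four keys discussed in this section.

```python
# Defaults for the four hnsw:* keys covered in this section (see the tuning guide).
HNSW_DEFAULTS = {
    "hnsw:space": "l2",
    "hnsw:construction_ef": 100,
    "hnsw:search_ef": 10,
    "hnsw:M": 16,
}
VALID_SPACES = {"l2", "cosine", "ip"}


def hnsw_metadata(**overrides):
    """Build a complete hnsw:* metadata dict, rejecting typos and bad values.

    Keyword names drop the "hnsw:" prefix, e.g. hnsw_metadata(space="cosine", M=32).
    """
    meta = dict(HNSW_DEFAULTS)
    for short_key, value in overrides.items():
        key = f"hnsw:{short_key}"
        if key not in HNSW_DEFAULTS:
            raise KeyError(f"unsupported HNSW setting: {key}")
        meta[key] = value
    if meta["hnsw:space"] not in VALID_SPACES:
        raise ValueError(f"hnsw:space must be one of {sorted(VALID_SPACES)}")
    for key in ("hnsw:construction_ef", "hnsw:search_ef", "hnsw:M"):
        value = meta[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return meta


print(hnsw_metadata(space="cosine", construction_ef=200, search_ef=100))
```

The result would then be passed as `metadata=hnsw_metadata(space="cosine", construction_ef=200, search_ef=100)` in the `client.create_collection(...)` call shown above.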

HNSW tuning guide
Parameter	Default	Effect of increasing	Effect of decreasing
hnsw:space	l2	n/a: choose l2 (squared L2), cosine, or ip (inner product)	n/a
hnsw:M	16	Better recall, more memory, slower inserts	Faster inserts, less memory, lower recall
hnsw:construction_ef	100	Better index quality, slower inserts	Faster inserts, lower quality graph
hnsw:search_ef	10	Better recall, slower queries	Faster queries, lower recall
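The three `hnsw:space` options correspond to the distance formulas in ChromaDB's documentation: squared L2, 1 minus the inner product, and 1 minus cosine similarity. The pure-Python functions below are a minimal sketch of those formulas (the function names are illustrative), useful for checking by hand what a returned distance means.

```python
import math


def l2(a, b):
    """"l2": squared Euclidean distance (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip(a, b):
    """"ip": 1 minus the inner product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """"cosine": 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a, b = [1.0, 0.0], [0.6, 0.8]  # both vectors have unit length
print(l2(a, b), ip(a, b), cosine(a, b))
```

For unit-length vectors, `ip` equals the cosine distance and `l2` equals twice the cosine distance, so all three rank neighbours identically; the choice matters mainly when embeddings are not normalised.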

For most RAG use cases, the defaults work well for collections under ~100K documents. For large collections or when recall matters, increase hnsw:search_ef to 50–200 and set hnsw:construction_ef to at least 200 when building the index.
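One way to decide whether the defaults are good enough is to measure recall@k against an exact search on a sample of queries. The sketch below assumes that, for each query, you already have the IDs returned by `collection.query(...)["ids"]` and the IDs of the true top-k from a brute-force scan (for example over embeddings fetched with `collection.get(include=["embeddings"])`); `recall_at_k` and `mean_recall` are hypothetical helper names.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbours that the ANN search also returned."""
    if not exact_ids:
        raise ValueError("exact_ids must not be empty")
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def mean_recall(approx_results, exact_results):
    """Average recall@k over several queries, given one ID list per query."""
    if len(approx_results) != len(exact_results) or not exact_results:
        raise ValueError("need the same, non-zero number of approximate and exact result lists")
    scores = [recall_at_k(a, e) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)


# Hypothetical IDs for two queries with k = 4; in practice the first list would
# come from collection.query(...)["ids"] and the second from a brute-force scan.
approx = [["3", "7", "1", "9"], ["2", "5", "8", "4"]]
exact = [["3", "7", "1", "6"], ["2", "5", "8", "4"]]
print(mean_recall(approx, exact))  # → 0.875
```

If mean recall falls short of your target, increase `hnsw:search_ef` (and, if needed, `hnsw:construction_ef` or `hnsw:M`) and measure again.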

Quiz

What does the hnsw:search_ef parameter control in ChromaDB?
✗ The number of dimensions in stored embedding vectors
✓ The size of the dynamic candidate list during query-time search; higher values give better recall at the cost of slower queries
✗ The number of results returned per query
✗ The distance metric used during nearest-neighbour search

What type of algorithm is HNSW and why does ChromaDB use it instead of exact search?
✗ A sorting algorithm: it sorts vectors before comparison
✓ An Approximate Nearest Neighbour algorithm: it finds very close (but not always the exact closest) neighbours in O(log n) time, making large-scale similarity search practical
✗ A compression algorithm: it reduces vector dimensions before storing
✗ A hashing algorithm: it assigns vectors to buckets for O(1) lookup



About Javapedia.net: Javapedia.net is for Java and J2EE developers, technologists, and college students preparing for interviews. The site also includes many practical examples. It is developed using J2EE technologies by Steve Antony, a senior developer/lead at a logistics company.
contact: javatutorials2016[at]gmail[dot]com
Copyright © 2026, javapedia.net, all rights reserved.