Spring / Spring AI interview questions

What is semantic caching in Spring AI and how would you implement it?

Semantic caching is an optimisation that caches LLM responses by semantic similarity rather than by exact query-string match. If a new question is close enough in meaning to one answered earlier, you return the cached answer instead of calling the LLM again, which saves both latency and token cost. AI workloads are a good fit because users phrase the same question in many different ways ("How do I reset my password?" vs "I forgot my password, what now?"). A semantic cache gets a far higher hit rate on that traffic than a traditional string-equality cache, which only matches identical text.
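To make the lookup rule concrete, here is a minimal, dependency-free sketch. It assumes embeddings are already available as plain double[] vectors; in Spring AI they would come from an EmbeddingModel and live in a VectorStore. The class name InMemorySemanticCache and the toy 3-dimensional vectors are hypothetical. A lookup returns the answer of the most similar stored question only when its cosine similarity clears the threshold. Anything below the threshold is a miss, and the caller would then invoke the LLM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical illustration of semantic-cache lookup by cosine similarity. */
public class InMemorySemanticCache {

    // One cached entry: the embedding of the QUESTION plus the answer to return.
    private record Entry(double[] questionEmbedding, String answer) {}

    private final List<Entry> entries = new ArrayList<>();
    private final double threshold;

    public InMemorySemanticCache(double threshold) {
        this.threshold = threshold;
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1]. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Linear scan for the closest cached question; a hit only if it clears the threshold. */
    public Optional<String> lookup(double[] queryEmbedding) {
        Entry best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Entry e : entries) {
            double score = cosine(queryEmbedding, e.questionEmbedding());
            if (score > bestScore) {
                bestScore = score;
                best = e;
            }
        }
        return (best != null && bestScore >= threshold) ? Optional.of(best.answer()) : Optional.empty();
    }

    public void put(double[] questionEmbedding, String answer) {
        entries.add(new Entry(questionEmbedding, answer));
    }

    public static void main(String[] args) {
        InMemorySemanticCache cache = new InMemorySemanticCache(0.92);
        // Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
        cache.put(new double[] {0.9, 0.1, 0.0}, "Use the 'Forgot password' link on the login page.");

        double[] paraphrase = {0.88, 0.15, 0.02}; // close in meaning to the cached question
        double[] unrelated = {0.0, 0.2, 0.98};    // a different topic

        System.out.println(cache.lookup(paraphrase).orElse("MISS -> call the LLM")); // prints the cached answer
        System.out.println(cache.lookup(unrelated).orElse("MISS -> call the LLM"));  // prints "MISS -> call the LLM"
    }
}
```

Here the paraphrase scores about 0.998 against the cached question and is served from the cache, while the unrelated query scores about 0.02 and misses. With a threshold of 0.999 even the paraphrase would miss. A real vector store replaces the linear scan with an approximate nearest-neighbour index.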

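Spring AI does not ship a built-in semantic cache, but it provides all the building blocks (a VectorStore, an EmbeddingModel, and the Advisor API) to build one cleanly as a custom advisor. The sketch below targets the Spring AI 1.0 CallAdvisor interface. Its adviseCall method can return a cached response without calling chain.nextCall, so the model is never invoked on a hit. Two details matter. First, the cached Document's text is the question, so a similarity search compares new questions with old questions rather than with answers. Second, the answer is stored in the Document's metadata:

import java.util.List;
import java.util.Map;

import org.springframework.ai.chat.client.ChatClientRequest;
import org.springframework.ai.chat.client.ChatClientResponse;
import org.springframework.ai.chat.client.advisor.api.CallAdvisor;
import org.springframework.ai.chat.client.advisor.api.CallAdvisorChain;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Component;

@Component
public class SemanticCacheAdvisor implements CallAdvisor {

    private static final String ANSWER_KEY = "cached_answer";
    private static final double THRESHOLD = 0.92;

    // Ideally a VectorStore dedicated to the cache, separate from the RAG corpus.
    private final VectorStore cacheStore;

    public SemanticCacheAdvisor(VectorStore cacheStore) {
        this.cacheStore = cacheStore;
    }

    @Override
    public ChatClientResponse adviseCall(ChatClientRequest request, CallAdvisorChain chain) {
        String question = request.prompt().getUserMessage().getText();

        // Look for a previously answered question that is semantically close enough.
        List<Document> hits = cacheStore.similaritySearch(SearchRequest.builder()
                .query(question).topK(1).similarityThreshold(THRESHOLD).build());

        if (!hits.isEmpty()) {
            // Cache hit: build the response here and skip chain.nextCall, so the model is not called.
            String cached = (String) hits.get(0).getMetadata().get(ANSWER_KEY);
            ChatResponse cachedResponse = new ChatResponse(List.of(new Generation(new AssistantMessage(cached))));
            return ChatClientResponse.builder().chatResponse(cachedResponse).context(request.context()).build();
        }

        // Cache miss: call the model, then cache the result. The document text is the QUESTION
        // (so future searches compare question to question); the answer goes in metadata.
        ChatClientResponse response = chain.nextCall(request);
        ChatResponse chatResponse = response.chatResponse();
        if (chatResponse != null && chatResponse.getResult() != null) {
            String answer = chatResponse.getResult().getOutput().getText();
            if (answer != null) {
                cacheStore.add(List.of(new Document(question, Map.of(ANSWER_KEY, answer))));
            }
        }
        return response;
    }

    @Override
    public String getName() {
        return "SemanticCacheAdvisor";
    }

    @Override
    public int getOrder() {
        return 0; // lower values run earlier in the advisor chain
    }
}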

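The similarity threshold (typically 0.90 to 0.95, though the right value depends on the embedding model) is the key tunable. If it is too low, semantically different questions share cached answers. If it is too high, the hit rate drops toward zero because only near-verbatim rephrasings still match. For time-sensitive data, add a TTL: store a timestamp in each entry's metadata when caching, and on retrieval treat entries older than the TTL as misses so the answer is regenerated.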
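A minimal sketch of that freshness check follows. It assumes each cache entry is written with a hypothetical "cached_at" metadata value in epoch milliseconds, stored alongside cached_answer. The key name and the CacheTtl class are illustrative and are not part of Spring AI. The code uses only the JDK, so the advisor could call it on each hit. It accepts any Number rather than only Long as a defensive choice, since some stores may return numeric metadata as a different boxed type after serialization.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

/** Hypothetical TTL helper for semantic-cache entries. */
public final class CacheTtl {

    // Assumed metadata key written at cache time; not a Spring AI constant.
    public static final String CACHED_AT_KEY = "cached_at";

    private CacheTtl() {}

    /** True if the entry was cached within ttl of now; entries without a timestamp count as stale. */
    public static boolean isFresh(Map<String, Object> metadata, Duration ttl, Instant now) {
        Object raw = metadata.get(CACHED_AT_KEY);
        if (!(raw instanceof Number n)) {
            return false;
        }
        Instant cachedAt = Instant.ofEpochMilli(n.longValue());
        return !cachedAt.plus(ttl).isBefore(now);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-01-01T12:00:00Z");
        Map<String, Object> recent = Map.of(CACHED_AT_KEY, now.minus(Duration.ofMinutes(30)).toEpochMilli());
        Map<String, Object> old = Map.of(CACHED_AT_KEY, now.minus(Duration.ofHours(3)).toEpochMilli());

        Duration ttl = Duration.ofHours(1);
        System.out.println(isFresh(recent, ttl, now)); // true
        System.out.println(isFresh(old, ttl, now));    // false
    }
}
```

In the advisor, a hit that fails isFresh would be handled as a miss. The advisor would call the model, write a new entry with the current timestamp, and optionally remove the stale one with cacheStore.delete(List.of(hit.getId())).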


About Javapedia.net: Javapedia.net is for Java and J2EE developers, technologists and college students preparing for interviews. The site also includes many practical examples. It was developed using J2EE technologies by Steve Antony, a senior developer/lead at a logistics company.
Copyright © 2026, javapedia.net, all rights reserved.

Spring / Spring AI interview questions

What is semantic caching in Spring AI and how would you implement it?

Comments & Discussions

Recently added...