
Spring / Spring AI interview questions

How do you use Spring AI with Spring WebFlux for a reactive AI endpoint?

Spring AI integrates naturally with Spring WebFlux's reactive pipeline. Because LLM streaming returns a Flux<String> or Flux<ChatResponse>, you can return it directly from a WebFlux controller with zero blocking, delivering tokens to the browser as Server-Sent Events (SSE) as fast as the model produces them.

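import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController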
@RequestMapping("/ai")
public class AiStreamController {

    private final ChatClient chatClient;

    public AiStreamController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@RequestParam String message) {
        return chatClient.prompt()
            .user(message)
            .stream()
            .content();
    }

    // Full ChatResponse per chunk: metadata such as finish reason and,
    // where the provider reports it, token usage
    @GetMapping(value = "/stream/full", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ChatResponse> streamFull(@RequestParam String message) {
        return chatClient.prompt()
            .user(message)
            .stream()
            .chatResponse();
    }
}
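
On the client side, each element of the Flux arrives as one text/event-stream event. As a rough illustration of what a non-browser client has to do with that text, the sketch below extracts the data payloads from raw SSE content using only the JDK. The class name SseDataParser and the method parseData are hypothetical, not part of Spring AI, and the parser deliberately handles only data: fields.

```java
import java.util.ArrayList;
import java.util.List;

public class SseDataParser {

    /**
     * Extracts the data payload of each complete event from raw
     * text/event-stream content. Hypothetical helper for illustration only.
     */
    public static List<String> parseData(String stream) {
        List<String> events = new ArrayList<>();
        StringBuilder current = null;
        for (String line : stream.split("\r?\n", -1)) {
            if (line.isEmpty()) {
                // A blank line terminates the current event.
                if (current != null) {
                    events.add(current.toString());
                    current = null;
                }
            } else if (line.startsWith("data:")) {
                String value = line.substring(5);
                // The SSE format allows one optional space after the colon.
                if (value.startsWith(" ")) {
                    value = value.substring(1);
                }
                if (current == null) {
                    current = new StringBuilder(value);
                } else {
                    // Multiple data lines in one event are joined with newlines.
                    current.append('\n').append(value);
                }
            }
            // Other fields (event:, id:, retry:) and comment lines are ignored here.
        }
        // An event without a terminating blank line is incomplete and is dropped.
        return events;
    }
}
```

A real client would feed this kind of logic incrementally from the response body instead of from a complete string, so that tokens can be shown as they arrive.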

From a browser or curl, the client reads the event stream as tokens arrive. Backpressure is handled by Project Reactor: if the client cannot consume fast enough, its reduced demand propagates upstream through the Flux. For SSE with Spring MVC (not WebFlux), SseEmitter combined with Flux.subscribe() and a manually managed emitter thread achieves the same result, though WebFlux is cleaner.
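
The demand signalling behind backpressure can be illustrated without Reactor. The sketch below uses the JDK's java.util.concurrent.Flow interfaces, which mirror the Reactive Streams contract that Flux implements. The names DemandSketch, ListPublisher, and SlowSubscriber are made up for illustration, and the publisher is a simplified synchronous stand-in, not how Reactor or Spring AI works internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

public class DemandSketch {

    /** Hypothetical publisher that emits list items only in response to request(n). */
    static final class ListPublisher implements Flow.Publisher<String> {
        private final List<String> items;

        ListPublisher(List<String> items) {
            this.items = items;
        }

        @Override
        public void subscribe(Flow.Subscriber<? super String> subscriber) {
            subscriber.onSubscribe(new Flow.Subscription() {
                private int next = 0;
                private long demand = 0;
                private boolean emitting = false;
                private boolean done = false;

                @Override
                public void request(long n) {
                    if (done || n <= 0) return;
                    demand += n;
                    if (emitting) return; // re-entrant call: the outer loop picks up the new demand
                    emitting = true;
                    while (demand > 0 && next < items.size() && !done) {
                        demand--;
                        subscriber.onNext(items.get(next++));
                    }
                    emitting = false;
                    if (!done && next == items.size()) {
                        done = true;
                        subscriber.onComplete();
                    }
                }

                @Override
                public void cancel() {
                    done = true;
                }
            });
        }
    }

    /** Hypothetical slow consumer: asks for exactly one item at a time. */
    static final class SlowSubscriber implements Flow.Subscriber<String> {
        final List<String> log = new ArrayList<>();
        private Flow.Subscription subscription;

        @Override
        public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            log.add("request(1)");
            s.request(1);
        }

        @Override
        public void onNext(String token) {
            log.add("onNext(" + token + ")");
            log.add("request(1)"); // only now is the next token allowed through
            subscription.request(1);
        }

        @Override
        public void onError(Throwable t) {
            log.add("onError");
        }

        @Override
        public void onComplete() {
            log.add("onComplete");
        }
    }

    /** Runs the publisher against the slow subscriber and returns the event log. */
    public static List<String> run(List<String> tokens) {
        SlowSubscriber subscriber = new SlowSubscriber();
        new ListPublisher(tokens).subscribe(subscriber);
        return subscriber.log;
    }
}
```

In the resulting log, every onNext is preceded by a request(1), which is the essence of the contract: the publisher never emits more than the subscriber has asked for.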
