LangChain4j and Java AI Integration
LangChain4j is the primary Java LLM framework — annotation-driven AI Services, tool calling, RAG pipeline, and MCP integration; Java 21 virtual threads make parallel LLM calls practical without async complexity.
The Java ecosystem for building LLM-powered applications. LangChain4j is the primary framework; Spring AI provides Spring-native integration.
LangChain4j
The most complete LLM framework for Java. Mirrors LangChain Python's abstractions but Java-idiomatic: interfaces, annotations, builders.
Core Concepts
ChatLanguageModel — the LLM interface:
ChatLanguageModel model = AnthropicChatModel.builder()
.apiKey(System.getenv("ANTHROPIC_API_KEY"))
.modelName("claude-sonnet-4-6")
.maxTokens(1024)
.build();
String response = model.generate("What is the capital of France?");

StreamingChatLanguageModel — streaming:
StreamingChatLanguageModel streamingModel = AnthropicStreamingChatModel.builder()
.apiKey(System.getenv("ANTHROPIC_API_KEY"))
.modelName("claude-sonnet-4-6")
.build();
streamingModel.generate("Tell me a story", new StreamingResponseHandler<AiMessage>() {
@Override
public void onNext(String token) { System.out.print(token); }
@Override
public void onComplete(Response<AiMessage> response) { System.out.println("\nDone"); }
@Override
public void onError(Throwable error) { error.printStackTrace(); }
});

AI Services
The most ergonomic API. Define an interface and LangChain4j generates the implementation:
interface AssistantService {
@SystemMessage("You are a helpful customer support assistant.")
String chat(@MemoryId String userId, @UserMessage String userMessage);
@UserMessage("Summarise the following text in {{language}}: {{text}}")
String summarise(String text, String language);
}
AssistantService assistant = AiServices.builder(AssistantService.class)
.chatLanguageModel(model)
.chatMemoryProvider(userId -> MessageWindowChatMemory.withMaxMessages(10))
.build();
String response = assistant.chat("user123", "Hello, I have a billing question.");

Tool Use (Function Calling)
@Tool("Get the current weather for a location")
public String getCurrentWeather(@P("City name") String city) {
return weatherService.getWeather(city);
}
// Register tools with the AI service
AssistantService assistant = AiServices.builder(AssistantService.class)
.chatLanguageModel(model)
.tools(new WeatherTools())
.build();

RAG Pipeline
EmbeddingModel embeddingModel = new AllMiniLmL6V2EmbeddingModel(); // local
EmbeddingStore<TextSegment> store = new InMemoryEmbeddingStore<>();
// Ingest
EmbeddingStoreIngestor.ingest(
Document.from("LangChain4j is a Java LLM framework..."),
store
);
// Retrieve
EmbeddingStoreContentRetriever retriever = EmbeddingStoreContentRetriever.builder()
.embeddingStore(store)
.embeddingModel(embeddingModel)
.maxResults(3)
.build();
// Augment AI service with retrieval
AssistantService rag = AiServices.builder(AssistantService.class)
.chatLanguageModel(model)
.contentRetriever(retriever)
.build();

MCP Integration
LangChain4j has a Java MCP SDK:
McpClient client = new McpClient.Builder()
.transport(new StdioMcpTransport("python", "-m", "my_mcp_server"))
.build();
client.initialize();
// Use MCP tools as LangChain4j tools
List<ToolSpecification> mcpTools = client.listTools();

Spring AI
Spring Boot integration for LLM applications. Follows Spring conventions: auto-configuration, @Autowired, application.properties configuration.
@SpringBootApplication
public class AiApplication {
@Autowired
private ChatClient chatClient; // Auto-configured from properties
public String chat(String message) {
return chatClient.prompt()
.user(message)
.call()
.content();
}
}

# application.properties
spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model=claude-sonnet-4-6
spring.ai.anthropic.chat.options.max-tokens=1024
VectorStore — Spring AI abstraction over vector stores (pgvector, Weaviate, Pinecone, etc.):
@Autowired
VectorStore vectorStore;
vectorStore.add(List.of(new Document("content here", Map.of("source", "docs.pdf"))));
List<Document> results = vectorStore.similaritySearch("query text");

When to use Spring AI vs LangChain4j:
- Existing Spring Boot project: Spring AI (natural fit)
- New Java project, need full LLM framework: LangChain4j
- Need MCP, advanced agent patterns, RAG pipeline: LangChain4j
Java 21 Features for AI Workloads
Virtual Threads (Project Loom):
// Run multiple LLM calls without blocking OS threads
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
var futures = List.of("question 1", "question 2", "question 3").stream()
.map(q -> executor.submit(() -> model.generate(q)))
.toList();
// All 3 calls run in parallel on virtual threads
}

Virtual threads make blocking LLM calls cheap: each call parks a lightweight virtual thread instead of tying up an OS thread, so you can parallelise many calls with plain synchronous code, without async/CompletableFuture machinery.
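The block above submits the calls but never collects the answers. A self-contained sketch of the full fan-out/collect pattern, with a stubbed callModel standing in for a real model.generate call (swap in an actual ChatLanguageModel):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCalls {
    // Stand-in for model.generate(prompt); replace with a real LLM call.
    static String callModel(String prompt) {
        return "answer to: " + prompt;
    }

    // Fan out one virtual thread per question, then collect in submission order.
    public static List<String> askAll(List<String> questions) throws Exception {
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = questions.stream()
                    .map(q -> executor.submit(() -> callModel(q)))
                    .toList();
            List<String> answers = new ArrayList<>();
            for (Future<String> f : futures) {
                answers.add(f.get()); // blocks a cheap virtual thread, not an OS worker
            }
            return answers;
        } // try-with-resources waits for all submitted tasks before closing
    }
}
```

Future.get in order preserves the question order even though the calls complete in any order.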
Pattern Matching:
sealed interface LLMResult permits SuccessResult, ErrorResult {}
record SuccessResult(String text, int tokens) implements LLMResult {}
record ErrorResult(String message, int statusCode) implements LLMResult {}
String display = switch (result) {
case SuccessResult(String text, int tokens) -> "OK: " + text;
case ErrorResult(String message, int code) -> "Error " + code + ": " + message;
};

Build: Gradle with Kotlin DSL
// build.gradle.kts
dependencies {
implementation("dev.langchain4j:langchain4j-anthropic:0.36.0")
implementation("dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2-q:0.36.0")
testImplementation("org.junit.jupiter:junit-jupiter:5.10.0")
}
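The langchain4j-bom mentioned under Connections can keep all LangChain4j module versions in sync; a sketch assuming the 0.36.0 BOM coordinates:

```kotlin
// build.gradle.kts — import the BOM so module versions stay aligned
dependencies {
    implementation(platform("dev.langchain4j:langchain4j-bom:0.36.0"))
    implementation("dev.langchain4j:langchain4j-anthropic")                    // version from BOM
    implementation("dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2-q") // version from BOM
}
```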
Key Facts
- LangChain4j v0.36.0 (latest as of 2026-04-29)
- AI Services: define an interface, LangChain4j generates the implementation automatically
- @MemoryId annotation on a method parameter enables per-user conversation memory
- MessageWindowChatMemory.withMaxMessages(10) keeps the last N messages per user
- MCP Java SDK: StdioMcpTransport for subprocess MCP servers
- Java 21 virtual threads: Executors.newVirtualThreadPerTaskExecutor() parallelises LLM calls without blocking OS threads
- Spring AI: use for existing Spring Boot projects; LangChain4j for standalone or richer agent capabilities
Common Failure Cases
AiServices.builder(AssistantService.class) throws at runtime because the interface method return type is not supported by LangChain4j's proxy
Why: LangChain4j's AI Services proxy supports a limited set of return types (String, AiMessage, extracted POJO via @ExtractWith); returning a custom class without a registered extractor causes an UnsupportedReturnTypeException that only appears at invocation time, not at build time.
Detect: the AiServices.builder(...).build() call succeeds but the first method invocation throws UnsupportedReturnTypeException; adding print statements confirms the proxy was created successfully.
Fix: either return String and parse the JSON manually, or annotate the method with @ExtractWith(MyClass.class) and ensure MyClass has a no-args constructor and public fields that match the model's JSON output.
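The "return String and parse manually" fallback can look like the sketch below. The Sentiment record and the flat JSON shape are hypothetical; hand-rolled regex parsing is an illustration only, and anything beyond this trivial case should use a real JSON library such as Jackson:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ManualExtraction {
    // Hypothetical target shape we want back from the model.
    public record Sentiment(String label, double score) {}

    private static final Pattern LABEL = Pattern.compile("\"label\"\\s*:\\s*\"([^\"]+)\"");
    private static final Pattern SCORE = Pattern.compile("\"score\"\\s*:\\s*([0-9.]+)");

    // Parse a known, flat JSON shape by hand; fail loudly on anything unexpected.
    public static Sentiment parse(String json) {
        Matcher l = LABEL.matcher(json);
        Matcher s = SCORE.matcher(json);
        if (!l.find() || !s.find()) {
            throw new IllegalArgumentException("unexpected model output: " + json);
        }
        return new Sentiment(l.group(1), Double.parseDouble(s.group(1)));
    }
}
```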
Chat memory leaks across users because @MemoryId is on the wrong parameter or the chatMemoryProvider is not configured
Why: if chatMemoryProvider is omitted from AiServices.builder(), LangChain4j uses a single shared in-memory store for all calls; every user's conversation history is merged into one shared context window, causing cross-user data leakage.
Detect: conversation history from user A appears in user B's responses; removing the @MemoryId annotation and using a constant ID reproduces the single-shared-memory behaviour.
Fix: always configure chatMemoryProvider(userId -> MessageWindowChatMemory.withMaxMessages(20)) when using @MemoryId; verify isolation by calling the service with two different user IDs in a test and confirming separate history.
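The per-ID isolation that a chatMemoryProvider gives can be illustrated without LangChain4j. A plain-Java sketch of a windowed, per-user memory (illustration of the contract only, not the library's implementation):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PerUserMemory {
    private final int maxMessages;
    private final Map<String, Deque<String>> memories = new ConcurrentHashMap<>();

    public PerUserMemory(int maxMessages) { this.maxMessages = maxMessages; }

    // One window per memory ID, created lazily — the same contract a
    // chatMemoryProvider lambda gives LangChain4j.
    public void add(String memoryId, String message) {
        Deque<String> window = memories.computeIfAbsent(memoryId, id -> new ArrayDeque<>());
        window.addLast(message);
        while (window.size() > maxMessages) {
            window.removeFirst(); // evict oldest, like MessageWindowChatMemory
        }
    }

    public List<String> history(String memoryId) {
        return List.copyOf(memories.getOrDefault(memoryId, new ArrayDeque<>()));
    }
}
```

The isolation test described above is then just: write as two users, read back each history, and assert they never mix.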
MCP StdioMcpTransport subprocess fails silently when the Python MCP server prints to stderr before the JSON handshake
Why: the stdio transport reads JSON-RPC messages from the subprocess's stdout; any non-JSON output on stdout (debug prints, import warnings) before the handshake breaks the protocol parser; stderr output is discarded silently.
Detect: client.initialize() hangs or throws a JSON parse exception; adding python -W ignore or checking the server's startup sequence reveals non-JSON output on stdout.
Fix: ensure the MCP server writes only valid JSON-RPC to stdout; redirect all debug/log output to stderr or a log file; run the server manually and pipe its output through jq . to verify the first bytes are valid JSON.
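A minimal diagnostic for the stdout-pollution case: read the first line the subprocess writes to stdout and check it can open a JSON-RPC object. This is a sketch for quick triage; real validation should parse the full message:

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class StdoutCheck {
    // A line can only start a JSON-RPC message if its first non-whitespace
    // character opens a JSON object. Debug prints or import warnings fail this.
    public static boolean firstLineIsJson(InputStream stdout) throws Exception {
        try (var reader = new BufferedReader(new InputStreamReader(stdout, StandardCharsets.UTF_8))) {
            String line = reader.readLine();
            return line != null && line.strip().startsWith("{");
        }
    }
}
```

Pass it process.getInputStream() from the launched server to see whether the handshake bytes are clean before handing the process to the MCP client.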
EmbeddingStoreIngestor.ingest() creates duplicate embeddings on repeated ingestion because the store has no deduplication
Why: InMemoryEmbeddingStore and most vector store implementations do not deduplicate on insert; calling ingest() twice with the same documents doubles the stored vectors, causing search results to return duplicate chunks with inflated similarity scores.
Detect: search results show identical content chunks appearing multiple times; the embedding store size doubles with each ingest run; cosine similarity scores are correct but the same text appears twice.
Fix: either clear the store before re-ingestion, or track ingested document IDs and skip already-present documents; for production stores, use a content-hash as the vector ID to enforce deduplication at the store level.
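The content-hash fix can be sketched in plain Java: use the SHA-256 of the chunk text as the store key, so re-ingesting the same chunk overwrites instead of duplicating. The Map here is a stand-in for a vector store keyed by ID:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupIngest {
    // Deterministic ID from chunk content: identical text always maps to the
    // same key, which makes ingestion idempotent at the store level.
    public static String contentId(String chunk) throws Exception {
        byte[] hash = MessageDigest.getInstance("SHA-256")
                .digest(chunk.getBytes(StandardCharsets.UTF_8));
        return HexFormat.of().formatHex(hash);
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> store = new LinkedHashMap<>(); // stand-in for a vector store
        String chunk = "LangChain4j is a Java LLM framework...";
        store.put(contentId(chunk), chunk);
        store.put(contentId(chunk), chunk); // second insert is a no-op overwrite
        System.out.println(store.size());   // prints 1 — duplicates collapsed
    }
}
```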
Connections
- apis/anthropic-api — the underlying API LangChain4j calls for Claude models
- protocols/mcp — MCP Java SDK integrates MCP servers as LangChain4j tools
- infra/vector-stores — InMemoryEmbeddingStore for dev; Qdrant/pgvector for production
- rag/pipeline — Java-agnostic RAG concepts implemented by LangChain4j pipeline
- java/spring-ai — Spring Boot alternative; comparison on Spring integration and agent maturity
- java/build-tools — Maven and Gradle setup for LangChain4j projects, including the langchain4j-bom BOM
- java/what-is-java — JVM fundamentals, virtual threads, static typing — context for why Java in AI
Open Questions
- When will LangChain4j reach feature parity with Python LangChain on agentic patterns?
- How does LangChain4j's MCP Java SDK compare in capability to the TypeScript SDK?
- What is the performance overhead of LangChain4j's AI Services interface proxy vs direct model calls?
Related reading