Beginner · AI Engineer

Trace an LLM call end-to-end in Langfuse

Add Langfuse instrumentation to an existing chatbot or LLM script. You will create traces, nest spans for retrieval and generation steps, capture latency and token cost per call, and use the Langfuse dashboard to identify the most expensive query across a simulated 10-turn session.

Why this matters

You cannot improve what you cannot measure. Latency and cost in LLM systems are almost never distributed evenly; one query type often accounts for 40% of spend. Langfuse makes this visible in minutes, and the habit of adding traces from day one prevents the situation where a production system is burning money in a way no one can explain.

Before you start

You need a Langfuse account (cloud or self-hosted) with a public and secret API key, the Langfuse Python SDK installed, and an existing chatbot or LLM script to instrument.

Step-by-step guide

  1. Initialise the Langfuse client

    Import Langfuse and create a client using your public and secret keys. Verify the connection by running langfuse.auth_check(); it will raise an error if the credentials are wrong. Add this to your script's initialisation block (a minimal sketch appears after this list).

  2. Wrap each user turn in a trace

    For each user message, call langfuse.trace(name='chat-turn', input=user_message). This creates the top-level entry in the Langfuse UI. Pass a session_id so all 10 turns from your simulated session appear grouped together (see the per-turn sketch after this list).

  3. Add a generation span for the LLM call

    Inside the trace, call trace.generation(name='claude', model='claude-sonnet-4-6', input=messages) before your API call. After the call returns, call generation.end(output=response.content, usage=response.usage). This captures the token counts and latency for the LLM step specifically (see the generation sketch after this list).

  4. Simulate 10 turns and check the dashboard

    Run your chatbot through 10 varied queries (a driver loop is sketched after this list). Open the Langfuse dashboard, navigate to your session, and look at the timeline. Identify which turn has the highest total cost and which has the highest latency; these are usually different turns.

  5. Add a retrieval span if using RAG

    If your chatbot fetches context before calling Claude, wrap the retrieval step in trace.span(name='retrieval'). Record how many chunks were retrieved and what the search query was. Langfuse shows retrieval latency separately from generation latency; often retrieval is the bottleneck, not the model (see the RAG sketch after this list).
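
Putting it into code

The sketches below walk the five steps in Python. They assume the v2-style Langfuse Python SDK and the Anthropic Python SDK; parameter names can differ slightly between SDK versions, and every key, session id, query, and helper name is a placeholder rather than a required value.

For step 1, initialise the client once at startup. The keys can also be supplied via the LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST environment variables instead of being passed in code.

    from langfuse import Langfuse

    # Placeholder keys; prefer environment variables over hard-coding in real scripts.
    langfuse = Langfuse(
        public_key="pk-lf-...",
        secret_key="sk-lf-...",
        host="https://cloud.langfuse.com",  # or your self-hosted URL
    )

    # Fails loudly at startup if the credentials are wrong.
    langfuse.auth_check()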
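
For step 2, each user message becomes its own trace, and the session_id ties the ten turns together in the dashboard (the client continues from the previous sketch).

    user_message = "What does Langfuse record for a chat turn?"  # example input

    # One trace per user turn; session_id groups the whole simulated conversation.
    trace = langfuse.trace(
        name="chat-turn",
        input=user_message,
        session_id="demo-session-01",  # any stable string for this session
    )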
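
For step 3, the generation span brackets only the Claude call, so its latency and token usage are attributed to the model rather than to the turn as a whole. The usage dict shown uses the generic input/output token format; the exact shape accepted can vary by SDK version.

    import anthropic

    claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    messages = [{"role": "user", "content": user_message}]

    # Open the generation span just before the API call...
    generation = trace.generation(name="claude", model="claude-sonnet-4-6", input=messages)
    response = claude.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=messages,
    )
    answer = response.content[0].text

    # ...and close it with the output and token counts once the call returns.
    generation.end(
        output=answer,
        usage={
            "input": response.usage.input_tokens,
            "output": response.usage.output_tokens,
        },
    )
    trace.update(output=answer)  # also record the final answer on the trace itself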
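
For step 4, a small driver replays ten queries under one session id; answer_turn is just the previous two sketches folded into a function, and the queries are illustrative. langfuse.flush() matters in short scripts because events are sent in the background and can be dropped if the process exits first.

    def answer_turn(user_message: str, session_id: str) -> str:
        trace = langfuse.trace(name="chat-turn", input=user_message, session_id=session_id)
        messages = [{"role": "user", "content": user_message}]
        generation = trace.generation(name="claude", model="claude-sonnet-4-6", input=messages)
        response = claude.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)
        answer = response.content[0].text
        generation.end(
            output=answer,
            usage={"input": response.usage.input_tokens, "output": response.usage.output_tokens},
        )
        trace.update(output=answer)
        return answer

    queries = [
        "What is Langfuse?",                        # short and cheap
        "Summarise this 2,000-word article: ...",   # long input, a likely cost outlier
        "Translate 'good morning' into French.",
        # ... seven more varied queries to make ten turns
    ]
    for q in queries:
        answer_turn(q, session_id="demo-session-01")

    # Events are buffered and sent asynchronously; flush before exiting.
    langfuse.flush()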
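
For step 5, the retrieval span sits inside the same trace, before the generation span. retrieve_chunks below is a hypothetical stand-in for whatever vector store or search call your chatbot actually makes; recording its query and chunk count lets the dashboard show retrieval and generation latency side by side.

    def answer_turn_rag(user_message: str, session_id: str) -> str:
        trace = langfuse.trace(name="chat-turn", input=user_message, session_id=session_id)

        # Retrieval span: its latency appears alongside, not inside, the generation span.
        span = trace.span(name="retrieval", input=user_message)
        chunks = retrieve_chunks(user_message)  # hypothetical: your own retriever
        span.end(output={"num_chunks": len(chunks), "query": user_message})

        context = "\n\n".join(chunks)
        messages = [{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {user_message}",
        }]
        generation = trace.generation(name="claude", model="claude-sonnet-4-6", input=messages)
        response = claude.messages.create(model="claude-sonnet-4-6", max_tokens=1024, messages=messages)
        answer = response.content[0].text
        generation.end(
            output=answer,
            usage={"input": response.usage.input_tokens, "output": response.usage.output_tokens},
        )
        trace.update(output=answer)
        return answer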


