Test a streaming LLM endpoint with Playwright
Write a Playwright test that calls a Server-Sent Events streaming endpoint, captures each chunk as it arrives rather than waiting for the full response, reconstructs the complete content, and asserts that it contains expected content and follows the correct SSE format throughout the stream.
Why this matters
Streaming endpoints are increasingly common in AI applications, and they fail in ways that regular API tests miss entirely: partial chunks, malformed event data, SSE formatting that breaks mid-stream, or content that is correct token by token but wrong as a whole. Testing streaming behaviour therefore requires a different approach from standard request-response testing.
Before you start
- Playwright installed with TypeScript
- A streaming endpoint to test (an LLM chat API or a simple SSE server you write yourself; a minimal server sketch follows this list)
- Understanding of what Server-Sent Events are and the data: / event: / id: format
- Basic Playwright knowledge: page.route, page.evaluate
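If you don't already have a streaming endpoint to point the tests at, a small Node SSE server is enough for local runs. The sketch below is a hypothetical stand-in: the /api/chat/stream path and the delta JSON shape deliberately mirror the examples later in this guide, and the server emits a few tokens followed by a [DONE] sentinel in the data: / event: / id: format.

```ts
// minimal-sse-server.ts: hypothetical local endpoint for exercising the tests below
import { createServer } from "node:http";

const tokens = ["Hello", ",", " world", "!"];

createServer((req, res) => {
  if (req.url !== "/api/chat/stream") {
    res.statusCode = 404;
    res.end();
    return;
  }
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  let i = 0;
  const timer = setInterval(() => {
    if (i < tokens.length) {
      // One SSE event: id and event lines, a data line, then a blank line
      res.write(`id: ${i}\n`);
      res.write("event: token\n");
      res.write(`data: ${JSON.stringify({ delta: { text: tokens[i] } })}\n\n`);
      i++;
    } else {
      res.write("data: [DONE]\n\n");
      clearInterval(timer);
      res.end();
    }
  }, 100);
}).listen(3000);
```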
Step-by-step guide
- 1
Intercept the streaming response
Use page.route() to intercept requests to your streaming endpoint. In the route handler, call route.fetch() to get the actual response. Note that route.fetch() waits for the complete response, so response.body() resolves to a Buffer containing the entire SSE payload, which you then split into chunks yourself; to observe chunks truly as they arrive, read the stream in the browser via page.evaluate, as shown after the example below.
```ts
import { test, expect } from "@playwright/test";

test("streaming endpoint returns SSE chunks", async ({ page }) => {
  const chunks: string[] = [];

  await page.route("**/api/chat/stream", async (route) => {
    const response = await route.fetch();
    // body() resolves to a Buffer with the full SSE payload;
    // convert it to a string and split on newlines
    const body = await response.body();
    const text = body.toString("utf-8");
    chunks.push(...text.split("\n").filter(Boolean));
    // Forward the response unchanged so the page still renders
    await route.fulfill({ response });
  });

  await page.goto("/chat");
});
```
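If you need to capture chunks as they actually arrive (for example, to timestamp each token), skip route interception and read the response stream inside the browser with page.evaluate. This is a sketch; the /api/chat/stream path and the request body shape are assumptions carried over from the example above.

```ts
test("captures SSE chunks as they arrive", async ({ page }) => {
  await page.goto("/chat");

  // Runs in the browser: fetch the stream and read it chunk by chunk
  const chunks = await page.evaluate(async () => {
    const received: string[] = [];
    const res = await fetch("/api/chat/stream", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: "Say hello in one word." }),
    });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      // Each chunk arrives as a Uint8Array; decode it incrementally
      received.push(decoder.decode(value, { stream: true }));
    }
    return received;
  });

  // More than one chunk indicates the response actually streamed
  // rather than arriving in a single piece
  expect(chunks.length).toBeGreaterThan(1);
});
```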
- 2
Parse the SSE format
When you read the stream in the browser, each chunk is a Uint8Array that you decode with TextDecoder; with the Buffer from route.fetch() you already have a string. Either way, split the text on newlines: lines starting with 'data: ' contain the payload, 'event: ' and 'id: ' lines carry metadata, and a blank line signals the end of an event. Parse each event into a structured object and collect them in an array.
```ts
interface SSEEvent {
  data: string;
  event?: string;
  id?: string;
}

function parseSSEChunks(raw: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  let current: Partial<SSEEvent> = {};

  // Split on \n or \r\n so CRLF streams parse the same way
  for (const line of raw.split(/\r?\n/)) {
    if (line.startsWith("data: ")) {
      // Don't trim the payload: LLM tokens often carry meaningful leading spaces
      current.data = line.slice(6);
    } else if (line.startsWith("event: ")) {
      current.event = line.slice(7).trim();
    } else if (line.startsWith("id: ")) {
      current.id = line.slice(4).trim();
    } else if (line === "") {
      // blank line = end of event
      if (current.data !== undefined) events.push(current as SSEEvent);
      current = {};
    }
  }
  // Flush a final event that wasn't followed by a blank line
  if (current.data !== undefined) events.push(current as SSEEvent);
  return events;
}
```
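As a quick sanity check, here is what the parser returns for a small hand-written payload (the delta JSON shape is just an illustration, not any specific provider's format):

```ts
const sample =
  'event: token\ndata: {"delta":{"text":"Hello"}}\n\n' +
  'event: token\ndata: {"delta":{"text":" world"}}\n\n' +
  "data: [DONE]\n\n";

const events = parseSSEChunks(sample);
// => [
//   { event: "token", data: '{"delta":{"text":"Hello"}}' },
//   { event: "token", data: '{"delta":{"text":" world"}}' },
//   { data: "[DONE]" },
// ]
```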
- 3
Reconstruct the full response
Concatenate the data fields from each event to reconstruct the full response text. For LLM streaming responses, each event typically contains a token or a JSON object with a delta field. Write a helper that handles both formats so your test is not brittle to minor API changes.
```ts
function reconstructResponse(events: SSEEvent[]): string {
  return events
    .filter((e) => e.data !== "[DONE]")
    .map((e) => {
      // Handle plain-text token format
      if (!e.data.startsWith("{")) return e.data;
      // Handle JSON delta format: {"delta": {"text": "..."}}
      try {
        const parsed = JSON.parse(e.data);
        return parsed?.delta?.text ?? parsed?.text ?? "";
      } catch {
        return e.data;
      }
    })
    .join("");
}
```
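Reusing the sample payload from step 2, a quick check that the helper handles both formats:

```ts
// JSON delta format (the sample from step 2)
reconstructResponse(parseSSEChunks(sample)); // => "Hello world"

// Plain-text token format goes through the same helper.
// Note the double space after the colon in the second event: its payload is
// " world", and the leading space must survive parsing.
const plain = "data: Hello\n\ndata:  world\n\ndata: [DONE]\n\n";
reconstructResponse(parseSSEChunks(plain)); // => "Hello world"
```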
- 4
Assert chunk-level and full-response behaviour
Assert four things: the response arrives quickly (with the buffered request API below this is total response time, a coarse upper bound on time-to-first-token), each event's data field is plain text or valid JSON rather than malformed, the reconstructed response contains the expected content, and the stream terminates with the correct done signal (data: [DONE] or similar).
test("SSE stream assertions", async ({ page, request }) => { const startTime = Date.now(); const chunks: string[] = []; // Use APIRequestContext to stream the response directly const response = await request.post("/api/chat/stream", { data: { message: "Say hello in one word." }, }); expect(response.status()).toBe(200); expect(response.headers()["content-type"]).toContain("text/event-stream"); const body = await response.text(); const events = parseSSEChunks(body); // First chunk should arrive quickly (time-to-first-token) expect(Date.now() - startTime).toBeLessThan(2000); // Every event should have a data field expect(events.every((e) => e.data !== undefined)).toBe(true); // Last event should be [DONE] expect(events[events.length - 1].data).toBe("[DONE]"); const full = reconstructResponse(events); expect(full.toLowerCase()).toContain("hello"); }); - 5
- 5
Assert the UI reflects the stream
Navigate to a page that renders the streaming response. Assert that text appears progressively: poll for visible text and check that the UI does not stay blank beyond the first 3 seconds. This tests the front-end streaming rendering, not just the API.
test("UI renders streaming tokens progressively", async ({ page }) => { await page.goto("/chat"); // Type a message and submit await page.getByRole("textbox", { name: /message/i }).fill("Say hello."); await page.getByRole("button", { name: /send/i }).click(); // Assert text starts appearing within 3 seconds (not blank) const messageLocator = page.locator('[data-testid="assistant-message"]'); await expect(messageLocator).not.toBeEmpty({ timeout: 3000 }); // Wait for streaming to finish (done indicator disappears) await expect(page.locator('[data-testid="streaming-indicator"]')).toBeHidden({ timeout: 15000, }); // Assert the final message contains expected content const finalText = await messageLocator.textContent(); expect(finalText?.toLowerCase()).toContain("hello"); });