Test a streaming LLM endpoint with Playwright
Write a Playwright test that calls a Server-Sent Events streaming endpoint, captures each chunk as it arrives rather than waiting for the full response, reconstructs the complete content, and asserts that it contains expected content and follows the correct SSE format throughout the stream.
Why this matters
Streaming endpoints are increasingly common in AI applications, and they fail in ways that regular API tests miss entirely: partial chunks, malformed event data, SSE formatting that breaks mid-stream, or content that is correct token by token but wrong as a whole. Testing streaming behaviour therefore requires a different approach from standard request-response testing.
Before you start
- Playwright installed with TypeScript
- A streaming endpoint to test (an LLM chat API or a simple SSE server you write yourself; a minimal server sketch follows this list)
- Understanding of what Server-Sent Events are and the data: / event: / id: format
- Basic Playwright knowledge: page.route, page.evaluate
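If you don't already have a streaming endpoint to point the tests at, a small Node SSE server is enough for local runs. The sketch below is a hypothetical stand-in: the /api/chat/stream path and the delta JSON shape deliberately mirror the examples later in this guide, and the server emits a few tokens followed by a [DONE] sentinel in the data: / event: / id: format.

```ts
// minimal-sse-server.ts: hypothetical local endpoint for exercising the tests below
import { createServer } from "node:http";

const tokens = ["Hello", ",", " world", "!"];

createServer((req, res) => {
  if (req.url !== "/api/chat/stream") {
    res.statusCode = 404;
    res.end();
    return;
  }
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });

  let i = 0;
  const timer = setInterval(() => {
    if (i < tokens.length) {
      // One SSE event: id and event lines, a data line, then a blank line
      res.write(`id: ${i}\n`);
      res.write("event: token\n");
      res.write(`data: ${JSON.stringify({ delta: { text: tokens[i] } })}\n\n`);
      i++;
    } else {
      res.write("data: [DONE]\n\n");
      clearInterval(timer);
      res.end();
    }
  }, 100);
}).listen(3000);
```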
Step-by-step guide
- 1
Intercept the streaming response
Use page.route() to intercept requests to your streaming endpoint. In the route handler, call route.fetch() to get the actual response. Note that route.fetch() waits for the complete response, so response.body() resolves to a Buffer containing the entire SSE payload, which you then split into chunks yourself; to observe chunks truly as they arrive, read the stream in the browser via page.evaluate, as shown after the example below.
```ts
import { test, expect } from "@playwright/test";

test("streaming endpoint returns SSE chunks", async ({ page }) => {
  const chunks: string[] = [];

  await page.route("**/api/chat/stream", async (route) => {
    const response = await route.fetch();
    // body() resolves to a Buffer with the full SSE payload;
    // convert it to a string and split on newlines
    const body = await response.body();
    const text = body.toString("utf-8");
    chunks.push(...text.split("\n").filter(Boolean));
    // Forward the response unchanged so the page still renders
    await route.fulfill({ response });
  });

  await page.goto("/chat");
});
```
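If you need to capture chunks as they actually arrive (for example, to timestamp each token), skip route interception and read the response stream inside the browser with page.evaluate. This is a sketch; the /api/chat/stream path and the request body shape are assumptions carried over from the example above.

```ts
test("captures SSE chunks as they arrive", async ({ page }) => {
  await page.goto("/chat");

  // Runs in the browser: fetch the stream and read it chunk by chunk
  const chunks = await page.evaluate(async () => {
    const received: string[] = [];
    const res = await fetch("/api/chat/stream", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message: "Say hello in one word." }),
    });
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      // Each chunk arrives as a Uint8Array; decode it incrementally
      received.push(decoder.decode(value, { stream: true }));
    }
    return received;
  });

  // More than one chunk indicates the response actually streamed
  // rather than arriving in a single piece
  expect(chunks.length).toBeGreaterThan(1);
});
```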
- 2
Parse the SSE format
When you read the stream in the browser, each chunk is a Uint8Array that you decode with TextDecoder; with the Buffer from route.fetch() you already have a string. Either way, split the text on newlines: lines starting with 'data: ' contain the payload, 'event: ' and 'id: ' lines carry metadata, and a blank line signals the end of an event. Parse each event into a structured object and collect them in an array.
```ts
interface SSEEvent {
  data: string;
  event?: string;
  id?: string;
}

function parseSSEChunks(raw: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  let current: Partial<SSEEvent> = {};

  // Split on \n or \r\n so CRLF streams parse the same way
  for (const line of raw.split(/\r?\n/)) {
    if (line.startsWith("data: ")) {
      // Don't trim the payload: LLM tokens often carry meaningful leading spaces
      current.data = line.slice(6);
    } else if (line.startsWith("event: ")) {
      current.event = line.slice(7).trim();
    } else if (line.startsWith("id: ")) {
      current.id = line.slice(4).trim();
    } else if (line === "") {
      // blank line = end of event
      if (current.data !== undefined) events.push(current as SSEEvent);
      current = {};
    }
  }
  // Flush a final event that wasn't followed by a blank line
  if (current.data !== undefined) events.push(current as SSEEvent);
  return events;
}
```
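As a quick sanity check, here is what the parser returns for a small hand-written payload (the delta JSON shape is just an illustration, not any specific provider's format):

```ts
const sample =
  'event: token\ndata: {"delta":{"text":"Hello"}}\n\n' +
  'event: token\ndata: {"delta":{"text":" world"}}\n\n' +
  "data: [DONE]\n\n";

const events = parseSSEChunks(sample);
// => [
//   { event: "token", data: '{"delta":{"text":"Hello"}}' },
//   { event: "token", data: '{"delta":{"text":" world"}}' },
//   { data: "[DONE]" },
// ]
```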
- 3
Reconstruct the full response
Concatenate the data fields from each event to reconstruct the full response text. For LLM streaming responses, each event typically contains a token or a JSON object with a delta field. Write a helper that handles both formats so your test is not brittle to minor API changes.
```ts
function reconstructResponse(events: SSEEvent[]): string {
  return events
    .filter((e) => e.data !== "[DONE]")
    .map((e) => {
      // Handle plain-text token format
      if (!e.data.startsWith("{")) return e.data;
      // Handle JSON delta format: {"delta": {"text": "..."}}
      try {
        const parsed = JSON.parse(e.data);
        return parsed?.delta?.text ?? parsed?.text ?? "";
      } catch {
        return e.data;
      }
    })
    .join("");
}
```
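Reusing the sample payload from step 2, a quick check that the helper handles both formats:

```ts
// JSON delta format (the sample from step 2)
reconstructResponse(parseSSEChunks(sample)); // => "Hello world"

// Plain-text token format goes through the same helper.
// Note the double space after the colon in the second event: its payload is
// " world", and the leading space must survive parsing.
const plain = "data: Hello\n\ndata:  world\n\ndata: [DONE]\n\n";
reconstructResponse(parseSSEChunks(plain)); // => "Hello world"
```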
- 4
Assert chunk-level and full-response behaviour
Assert four things: the response arrives quickly (with the buffered request API below this is total response time, a coarse upper bound on time-to-first-token), each event's data field is plain text or valid JSON rather than malformed, the reconstructed response contains the expected content, and the stream terminates with the correct done signal (data: [DONE] or similar).
test("SSE stream assertions", async ({ page, request }) => { const startTime = Date.now(); const chunks: string[] = []; // Use APIRequestContext to stream the response directly const response = await request.post("/api/chat/stream", { data: { message: "Say hello in one word." }, }); expect(response.status()).toBe(200); expect(response.headers()["content-type"]).toContain("text/event-stream"); const body = await response.text(); const events = parseSSEChunks(body); // First chunk should arrive quickly (time-to-first-token) expect(Date.now() - startTime).toBeLessThan(2000); // Every event should have a data field expect(events.every((e) => e.data !== undefined)).toBe(true); // Last event should be [DONE] expect(events[events.length - 1].data).toBe("[DONE]"); const full = reconstructResponse(events); expect(full.toLowerCase()).toContain("hello"); }); - 5
- 5
Assert the UI reflects the stream
Navigate to a page that renders the streaming response. Assert that text appears progressively: poll for visible text and check that the UI does not stay blank beyond the first 3 seconds. This tests the front-end streaming rendering, not just the API.
test("UI renders streaming tokens progressively", async ({ page }) => { await page.goto("/chat"); // Type a message and submit await page.getByRole("textbox", { name: /message/i }).fill("Say hello."); await page.getByRole("button", { name: /send/i }).click(); // Assert text starts appearing within 3 seconds (not blank) const messageLocator = page.locator('[data-testid="assistant-message"]'); await expect(messageLocator).not.toBeEmpty({ timeout: 3000 }); // Wait for streaming to finish (done indicator disappears) await expect(page.locator('[data-testid="streaming-indicator"]')).toBeHidden({ timeout: 15000, }); // Assert the final message contains expected content const finalText = await messageLocator.textContent(); expect(finalText?.toLowerCase()).toContain("hello"); });