Debug: Cache Inconsistency

Runbook for diagnosing stale, wrong, or missing data caused by cache inconsistency.


debugging cache redis stale-data runbook

Symptom: Users see stale data after an update, different users see different versions of the same record, or data is correct in the database but wrong in the response.

Quick Diagnosis

Pattern	Likely cause
Stale data after write, clears itself eventually	TTL too long, no cache invalidation on write
Different users see different values	Cache key unintentionally includes user context, or invalidation clears only some of the keys that hold the record
Data correct in DB but wrong in response	Cache not invalidated after update
Cache miss storm after deploy	Keys all expire simultaneously (thundering herd)
Stale data only on some nodes	Local in-process cache not invalidated across instances
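
The last row can be reproduced without any infrastructure. The sketch below assumes two application nodes that each keep a local dict cache in front of a shared DB; `Instance` and `InvalidationBus` are hypothetical names, and the bus is an in-process stand-in for whatever pub/sub channel you actually run.

```python
class InvalidationBus:
    """In-process stand-in for a pub/sub channel (illustrative only)."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, key):
        # Deliver the invalidation to every subscribed node.
        for callback in self._subscribers:
            callback(key)


class Instance:
    """One application node with its own local (in-process) cache."""

    def __init__(self, db, bus=None):
        self.db = db
        self.local_cache = {}
        self.bus = bus
        if bus is not None:
            bus.subscribe(self._evict)

    def _evict(self, key):
        self.local_cache.pop(key, None)

    def read(self, key):
        # Read-through: fill the local cache on a miss.
        if key not in self.local_cache:
            self.local_cache[key] = self.db[key]
        return self.local_cache[key]

    def write(self, key, value):
        self.db[key] = value
        if self.bus is not None:
            self.bus.publish(key)  # every subscribed node evicts, including this one
        else:
            self._evict(key)       # only the writing node's cache is cleared
```

Without a bus, a write on node A leaves node B serving its old local copy until that copy expires or the process restarts. With a shared bus, both nodes evict the key and reload it on the next read.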

Likely Causes (ranked by frequency)

Write updates DB but does not invalidate or update the cache key
TTL too long — stale data outlives its usefulness
Cache key is wrong — different keys for the same logical data
In-process (local) cache not cleared on other instances after write
Race condition — read-through populates cache between write and invalidation
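
The race in the last item is the least intuitive cause, so here is one interleaving that produces it: a reader misses the cache and loads the old row, the writer then commits and invalidates, and the reader finally stores the value it loaded earlier. This is a deterministic replay under the assumption that plain dicts stand in for the DB and the cache; `replay_race` and the key name are hypothetical.

```python
def replay_race():
    """Replay a read-through race step by step and return (db_value, cached_value)."""
    key = "product:42"
    db = {key: 100}
    cache = {}

    # Step 1: the reader misses the cache and loads the current row.
    reader_value = cache.get(key)
    if reader_value is None:
        reader_value = db[key]  # reader now holds 100

    # Step 2: the writer commits the new value and invalidates the key.
    db[key] = 120
    cache.pop(key, None)

    # Step 3: the reader finishes its read-through with the value from step 1.
    cache[key] = reader_value

    return db[key], cache[key]


print(replay_race())  # (120, 100)
```

The invalidation ran, yet the cache holds the old value until the TTL expires. A short TTL bounds the damage; it does not remove the race.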

First Checks (fastest signal first)

Confirm the value in the DB matches what the cache returns — is this a stale cache or a bad write?
Check whether the write path invalidates or updates the cache key — is invalidation missing entirely?
Confirm the cache key is deterministic — does the same request always produce the same key?
Check TTL on the affected key — is it longer than the acceptable staleness window?
If multiple instances: confirm invalidation is broadcast to all nodes, not just the writing instance

Signal example: Product price updated in DB, but API returns old price for 60 seconds — write path updates DB only, TTL is 60s, no explicit invalidation on price change.
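
The first, second, and fourth checks can be folded into one small classifier. This is a sketch, not a tool: `diagnose` and its return labels are hypothetical, and how you fetch the three inputs depends on your stack (for Redis, the `TTL` command reports the remaining seconds for a key).

```python
from typing import Any, Optional


def diagnose(
    db_value: Any,
    cached_value: Any,
    ttl_seconds: Optional[float],
    max_staleness_seconds: float,
) -> str:
    """Classify one key from the DB value, the cached value, and the remaining TTL.

    ttl_seconds is the remaining lifetime of the cached entry, or None if the
    entry never expires. cached_value is None when the key is not in the cache.
    """
    if cached_value is None:
        # Nothing cached: a wrong response points at the write or the DB read.
        return "cache_miss"
    if cached_value == db_value:
        return "consistent"
    if ttl_seconds is None:
        # Stale and will never clear without an explicit invalidation.
        return "stale_no_expiry"
    if ttl_seconds > max_staleness_seconds:
        # Stale for longer than the tolerated window.
        return "stale_ttl_exceeds_window"
    # Stale, but the entry expires inside the tolerated window.
    return "stale_within_window"


# The signal example: new price in the DB, old price cached, 45s of a 60s TTL left,
# and an assumed tolerance of 5s.
print(diagnose(db_value=120, cached_value=100, ttl_seconds=45, max_staleness_seconds=5))
# stale_ttl_exceeds_window
```

Any of the three stale results means the write path did not remove or refresh the key, so the second check (does the write path invalidate?) is the next place to look.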

Drill Paths

Suspect	Go to
Cache invalidation missing on write	cs-fundamentals/caching
Redis key inspection and TTL debugging	infra/vector-stores
Thundering herd after mass expiry	cs-fundamentals/error-handling-patterns
In-process cache across instances	cs-fundamentals/distributed-systems
Cache layer in the full request path	synthesis/request-flow-anatomy

Fix Patterns

Invalidate or update cache immediately on write — do not rely on TTL alone for correctness
Use write-through caching for high-consistency data — update cache and DB in the same operation
Set TTL appropriate to staleness tolerance — not a default; choose per data type
Add jitter to TTL to prevent mass expiry at the same time
For multi-instance deployments: broadcast invalidations over pub/sub (Redis keyspace notifications or a message bus) so every node evicts its local copy
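
The first and fourth patterns can be sketched together. The example assumes a dict-backed DB and a small in-memory cache standing in for a shared cache; `TTLCache`, `jittered_ttl`, `read_price`, `update_price`, the key format, and the 10% spread are all illustrative choices, not values taken from this runbook.

```python
import random
import time


class TTLCache:
    """Tiny in-memory stand-in for a shared cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def delete(self, key):
        self._data.pop(key, None)


def jittered_ttl(base_seconds, spread=0.1, rng=random):
    """Return base_seconds varied by up to +/- spread so keys do not expire together."""
    return base_seconds * (1 + rng.uniform(-spread, spread))


def read_price(db, cache, product_id, base_ttl=60):
    """Read-through: serve from the cache, otherwise load from the DB and cache it."""
    key = f"price:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db[product_id]
    cache.set(key, value, jittered_ttl(base_ttl))
    return value


def update_price(db, cache, product_id, new_price):
    """Write path: commit to the DB first, then invalidate the cache key."""
    db[product_id] = new_price
    cache.delete(f"price:{product_id}")
```

With this write path, a read issued after `update_price` returns reloads from the DB instead of waiting out the TTL. The TTL then acts only as a safety net for invalidations that were missed or raced.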

When This Is Not the Issue

If the cache is being invalidated correctly and TTLs are appropriate but inconsistency persists:

The write itself may not be completing — check DB transaction logs for rollbacks
A second writer may be overwriting the DB after the cache is set
The cache key may be correct, but serialisation may not be deterministic, so the same cached record produces a different value on each read

Pivot to cs-fundamentals/database-transactions to confirm the write is actually committed before the cache is populated.

Connections

cs-fundamentals/error-handling-patterns · cs-fundamentals/distributed-systems · cs-fundamentals/database-transactions · synthesis/request-flow-anatomy

Open Questions

What has changed since this synthesis was written that would alter the conclusions?
What evidence would cause you to revise the key recommendation here?