cache-invalidation
$ npx skills add blunotech-dev/agents --skill cache-invalidation
Implement cache invalidation strategies for data consistency. Use when handling stale data, invalidating on updates, designing cache flows, or working with SWR, cache tags, or event-driven invalidation across Redis, CDNs, or client caches.
| name | description | category |
|---|---|---|
| cache-invalidation | Implement cache invalidation strategies for data consistency. Use when handling stale data, invalidating on updates, designing cache flows, or working with SWR, cache tags, or event-driven invalidation across Redis, CDNs, or client caches. | Backend |
Cache Invalidation
Three patterns, each with different consistency guarantees and complexity. Pick the right one before writing any code.
Phase 1: Discovery
Before recommending a pattern, establish:
What is being cached?
- Per-user vs shared data — shared data is harder; one user's write invalidates everyone's cache
- Aggregate/computed data (counts, totals, ranked lists) — these have fan-out invalidation problems
- Relational data — a single entity change may invalidate N cache keys that join against it
Where is the cache?
- In-process (Node.js Map, Python dict) — fastest, per-instance, no shared invalidation
- Shared cache (Redis, Memcached) — consistent across instances, but network latency
- HTTP/CDN cache (Cache-Control headers, Cloudflare) — hardest to invalidate, most impact
- Client cache (React Query, SWR, browser cache) — requires client cooperation or full TTL expiry
What is the write pattern?
- Who writes? Single service, multiple services, external webhooks?
- How frequent? High-write systems often shouldn't cache at all, or should cache with very short TTL
- Are writes transactional? If yes, invalidation must happen after commit, not before
What consistency level is acceptable?
- Strict: a read never returns stale data; after a write, the next read misses the cache and loads the current value (event-driven, synchronous invalidation)
- Eventual: stale data for up to N seconds is OK (TTL-based, SWR)
- Read-your-writes: user sees their own changes immediately, others may see stale (per-user invalidation only)
Phase 2: Pattern Selection
Pattern A: Event-driven invalidation
Use when: writes are infrequent, consistency requirements are strict, or cache fan-out is bounded.
Pattern B: Tag-based invalidation
Use when: one entity change invalidates multiple related cache keys (e.g., updating a product invalidates product page, category page, search results, related product lists).
Pattern C: Stale-while-revalidate (SWR)
Use when: eventual consistency is acceptable, read volume is very high, and the cost of a stale response is low (not financial, not security-sensitive).
These patterns compose. A CDN cache often needs SWR at the edge and event-driven invalidation at the application layer simultaneously.
Phase 3: Implementation
Pattern A: Event-driven invalidation
The non-obvious part: invalidate after commit, never before or during.
// WRONG: cache cleared before the write commits
async function updateUser(id: string, data: UserUpdate) {
  await cache.del(`user:${id}`)   // cleared too early: a concurrent read in this gap re-caches the old row
  await db.users.update(id, data) // and if this fails, the cache was cleared for nothing
}

// INCOMPLETE: right order, but the replica race is ignored
async function updateUser(id: string, data: UserUpdate) {
  await db.users.update(id, data)
  await cache.del(`user:${id}`)
  // a request racing in here can read the old row from a lagging replica and repopulate a stale entry
}

// CORRECT: same order, with the replication lag window accepted and handled explicitly
async function updateUser(id: string, data: UserUpdate) {
  await db.users.update(id, data) // resolves only after the write is confirmed
  await cache.del(`user:${id}`)
  // if read-your-writes is required, callers should read from the primary DB for the next request
}
Fan-out invalidation — the hidden cost: A write to one entity may need to invalidate dozens of keys. Enumerate them explicitly:
async function invalidateProduct(productId: string, categoryId: string) {
  // Exact keys can be deleted directly
  const keys = [
    `product:${productId}`,
    `category:${categoryId}:products`,
    `category:${categoryId}:count`,
    `homepage:featured`,
  ]
  // Wildcard patterns need SCAN + pipelined delete; never use KEYS in production (KEYS blocks the server)
  await deleteByScan(cache, `search:*`)
  await cache.del(...keys)
}
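The snippet above calls deleteByScan without defining it. A minimal sketch of such a helper follows, assuming an ioredis-style client whose scan call returns a [cursor, keys] pair. The ScanCapableCache interface and the batchSize parameter are illustrative names, not part of the original skill.

```typescript
// Assumed slice of an ioredis-style client: only the two calls used below.
interface ScanCapableCache {
  scan(
    cursor: string,
    matchToken: 'MATCH',
    pattern: string,
    countToken: 'COUNT',
    count: number
  ): Promise<[string, string[]]>
  del(...keys: string[]): Promise<number>
}

// Walk the keyspace incrementally with SCAN and delete each batch of matches.
// Returns the number of keys the cache reported as deleted.
async function deleteByScan(
  cache: ScanCapableCache,
  pattern: string,
  batchSize = 100 // COUNT is a hint to Redis, not a hard page size
): Promise<number> {
  let cursor = '0'
  let deleted = 0
  do {
    const [next, keys] = await cache.scan(cursor, 'MATCH', pattern, 'COUNT', batchSize)
    if (keys.length > 0) {
      deleted += await cache.del(...keys)
    }
    cursor = next
  } while (cursor !== '0') // SCAN signals completion by returning cursor "0"
  return deleted
}
```

SCAN still visits the whole keyspace, so on large datasets a tag set (Pattern B) is usually cheaper than a wildcard sweep.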
Transactional outbox pattern — when the cache and DB must stay consistent across service boundaries:
- Write to DB + outbox table in one transaction
- A background worker reads the outbox and fires cache invalidation
- Guarantees at-least-once invalidation even if the app crashes between write and cache.del
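The worker side of the outbox steps above can be sketched as one polling pass. OutboxRow, OutboxStore, KeyDeleter, and drainOutbox are hypothetical names; the storage calls stand in for whatever queries your outbox table uses.

```typescript
// Hypothetical shapes: a row in the outbox table and the storage calls the worker needs.
interface OutboxRow {
  id: number
  cacheKey: string
}
interface OutboxStore {
  fetchPending(limit: number): Promise<OutboxRow[]> // oldest unprocessed rows first
  markProcessed(ids: number[]): Promise<void>       // delete the rows or flag them as done
}
interface KeyDeleter {
  del(...keys: string[]): Promise<number>
}

// One polling pass: invalidate first, mark processed second.
// A crash between the two steps replays the rows on the next pass,
// which is safe because deleting a cache key is idempotent.
async function drainOutbox(
  store: OutboxStore,
  cache: KeyDeleter,
  batchSize = 100
): Promise<number> {
  const rows = await store.fetchPending(batchSize)
  if (rows.length === 0) return 0
  const keys = Array.from(new Set(rows.map(r => r.cacheKey))) // dedupe repeated keys
  await cache.del(...keys)
  await store.markProcessed(rows.map(r => r.id))
  return rows.length
}
```

Marking rows processed only after the delete succeeds is what gives the at-least-once guarantee; reversing the order would make it at-most-once.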
Pattern B: Tag-based invalidation
Tags map a logical group to a set of cache keys. When the group is invalidated, all associated keys are cleared.
Redis implementation:
// Store a tag→keys mapping: one Redis Set per tag, holding the keys that carry it
async function setWithTags(key: string, value: unknown, tags: string[], ttl: number) {
  const pipeline = cache.pipeline()
  pipeline.set(key, JSON.stringify(value), 'EX', ttl)
  for (const tag of tags) {
    pipeline.sadd(`tag:${tag}`, key)
    // Tag set should outlive the keys it tracks. Caveat: EXPIRE overwrites the previous TTL,
    // so a later write with a shorter ttl can shorten the set's lifetime.
    pipeline.expire(`tag:${tag}`, ttl * 2)
  }
  await pipeline.exec()
}

async function invalidateByTag(tag: string) {
  const keys = await cache.smembers(`tag:${tag}`)
  if (keys.length === 0) return
  const pipeline = cache.pipeline()
  pipeline.del(...keys)      // drop every value carrying the tag
  pipeline.del(`tag:${tag}`) // then drop the tag set itself
  await pipeline.exec()
}
Non-obvious failure mode: tag sets can contain dead keys (the value's TTL expired but the tag set still references it). This is harmless for correctness, but the set keeps growing until it expires or is invalidated. Two mitigations:
- Set the tag set TTL longer than the value TTL (shown above), so the tag set eventually expires on its own
- Periodically prune dead members from long-lived tag sets; any call to invalidateByTag deletes the whole set anyway
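For tag sets that are rarely invalidated, the periodic pruning mentioned above could look like the sketch below. It assumes ioredis-style smembers, exists, and srem calls; TagSetCache and pruneTag are illustrative names.

```typescript
// Assumed slice of an ioredis-style client.
interface TagSetCache {
  smembers(key: string): Promise<string[]>
  exists(...keys: string[]): Promise<number>
  srem(key: string, ...members: string[]): Promise<number>
}

// Remove members of tag:<tag> whose value key has already expired.
// Returns how many dead references were removed.
async function pruneTag(cache: TagSetCache, tag: string): Promise<number> {
  const tagKey = `tag:${tag}`
  const members = await cache.smembers(tagKey)
  const dead: string[] = []
  for (const member of members) {
    // EXISTS returns 0 when the value key is gone
    if ((await cache.exists(member)) === 0) dead.push(member)
  }
  if (dead.length === 0) return 0
  return cache.srem(tagKey, ...dead)
}
```

This makes one round trip per member, which is fine for a background job on small sets; for large sets, batching the EXISTS calls in a pipeline would be the natural next step.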
Next.js / Vercel edge cache tags:
// Tag cache entries at fetch time
const res = await fetch('/api/products', {
next: { tags: ['products', `category:${categoryId}`] }
})
// Invalidate from server action or route handler
import { revalidateTag } from 'next/cache'
revalidateTag('products') // purges all entries tagged 'products' from the edge
Cloudflare Cache-Tag header:
Cache-Tag: product-123,category-45,homepage-featured
Invalidate via API: POST /zones/{zone}/purge_cache with { "tags": ["product-123"] }
Note: purge by Cache-Tag was long limited to Cloudflare Enterprise plans. Check the purge options your plan currently offers; where tag purge is unavailable, use purge_cache by URL instead.
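As a rough sketch, the purge call described above can be built as a plain request object and passed to fetch. The api.cloudflare.com/client/v4 base URL and Bearer token header follow Cloudflare's v4 API conventions; buildPurgeByTagRequest and its parameter names are illustrative, and the zone ID and token values are placeholders you supply.

```typescript
interface PurgeRequest {
  url: string
  init: { method: string; headers: Record<string, string>; body: string }
}

// Build (but do not send) a purge-by-tag request for one zone.
function buildPurgeByTagRequest(zoneId: string, apiToken: string, tags: string[]): PurgeRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/zones/${encodeURIComponent(zoneId)}/purge_cache`,
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiToken}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ tags }),
    },
  }
}

// Usage (not executed here):
//   const { url, init } = buildPurgeByTagRequest(zoneId, token, ['product-123'])
//   const res = await fetch(url, init)
```

Keeping request construction separate from the network call makes the payload easy to unit test and to log before sending.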
Pattern C: Stale-while-revalidate
SWR serves stale data immediately and triggers a background refresh. The key insight is that the TTL is split into two windows: fresh (serve without revalidation) and stale (serve stale, revalidate in background).
HTTP Cache-Control:
Cache-Control: max-age=60, stale-while-revalidate=300
Means: fresh for 60s, serve stale for up to 5min while revalidating. After 6min total, the cache must wait for a fresh response.
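The two windows can be written out as a small classifier. This is only an illustration of the arithmetic above, not how any particular CDN implements it; the function and type names are made up for the example.

```typescript
type Freshness = 'fresh' | 'stale-revalidate' | 'expired'

// Classify a cached response by its age, given the two Cache-Control directives (all in seconds).
function classifyAge(ageSeconds: number, maxAge: number, staleWhileRevalidate: number): Freshness {
  if (ageSeconds < maxAge) return 'fresh'                                   // serve as-is
  if (ageSeconds < maxAge + staleWhileRevalidate) return 'stale-revalidate' // serve stale, refresh in background
  return 'expired'                                                          // must wait for the origin
}

// With max-age=60, stale-while-revalidate=300:
//   classifyAge(30, 60, 300)  is 'fresh'
//   classifyAge(120, 60, 300) is 'stale-revalidate'
//   classifyAge(360, 60, 300) is 'expired' (60 + 300 = 360s, i.e. 6 minutes)
```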
Application-level SWR (Redis):
async function getWithSWR<T>(
  key: string,
  fetcher: () => Promise<T>,
  freshTTL: number, // seconds to serve without revalidation
  staleTTL: number  // total entry lifetime in seconds (must exceed freshTTL); past freshTTL, serve stale + revalidate
): Promise<T> {
  const raw = await cache.get(key)
  if (raw) {
    const { value, cachedAt } = JSON.parse(raw)
    const age = Date.now() - cachedAt
    if (age < freshTTL * 1000) {
      return value // fresh, return immediately
    }
    if (age < staleTTL * 1000) {
      // stale but within window: return stale, revalidate in background
      setImmediate(async () => {
        try {
          const fresh = await fetcher()
          await cache.set(key, JSON.stringify({ value: fresh, cachedAt: Date.now() }), 'EX', staleTTL)
        } catch (err) {
          // a failed refresh must not become an unhandled rejection; the stale entry stays until it expires
          console.error(`SWR revalidation failed for ${key}`, err)
        }
      })
      return value
    }
  }
  // cache miss or beyond stale window: fetch synchronously
  const fresh = await fetcher()
  await cache.set(key, JSON.stringify({ value: fresh, cachedAt: Date.now() }), 'EX', staleTTL)
  return fresh
}
Where SWR breaks down:
- Data with security or financial implications (permissions, pricing, inventory): a user might act on stale data
- High-write entities: entries go stale almost as soon as they are cached, so mutations show up late and few hits return current data
- Aggregates that must be accurate: counts, totals, balances
Phase 4: Non-obvious failure modes to explicitly address
Cache stampede (thundering herd): When a popular cache key expires, N concurrent requests all miss and hit the DB simultaneously. Fix: probabilistic early expiration or a per-key lock.
const sleep = (ms: number) => new Promise<void>(resolve => setTimeout(resolve, ms))

// Lock-based stampede prevention
async function getWithLock<T>(key: string, fetcher: () => Promise<T>, ttl: number): Promise<T> {
  const cached = await cache.get(key)
  if (cached) return JSON.parse(cached)
  const lockKey = `lock:${key}`
  // NX = only if not exists; the 5s expiry frees the lock if this process dies mid-fetch
  const acquired = await cache.set(lockKey, '1', 'EX', 5, 'NX')
  if (!acquired) {
    // another request is recomputing: short poll, then retry (or return a default instead)
    await sleep(50)
    return getWithLock(key, fetcher, ttl)
  }
  try {
    const value = await fetcher()
    await cache.set(key, JSON.stringify(value), 'EX', ttl)
    return value
  } finally {
    // release even if fetcher throws; otherwise waiters spin until the lock expires
    await cache.del(lockKey)
  }
}
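The other fix named above, probabilistic early expiration, avoids locks by letting each reader occasionally refresh a key shortly before it expires. The sketch below follows the commonly described "XFetch" formulation; the function name, parameter names, and injectable random source are assumptions made for the example.

```typescript
// Decide whether this reader should recompute the value before it actually expires.
//   nowMs       current time
//   expiresAtMs when the cached value expires
//   recomputeMs how long the last recomputation took (slower values refresh earlier)
//   beta        values above 1 favor earlier refreshes, below 1 later ones
//   rand        source of uniform numbers in [0, 1); injectable for testing
function shouldRefreshEarly(
  nowMs: number,
  expiresAtMs: number,
  recomputeMs: number,
  beta = 1,
  rand: () => number = Math.random
): boolean {
  const u = 1 - rand() // map [0, 1) to (0, 1] so Math.log never receives 0
  // -ln(u) is non-negative and exponentially distributed, so most readers see a small
  // head start and only a few see a large one; those few refresh before expiry.
  const headStartMs = -recomputeMs * beta * Math.log(u)
  return nowMs + headStartMs >= expiresAtMs
}
```

A reader that gets true recomputes and rewrites the key while everyone else keeps serving the cached value, so the expiry moment rarely arrives with a cold key.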
Negative caching: If a DB lookup returns null, cache the null explicitly with a short TTL. Otherwise every request for a missing record goes straight to the DB. For example, write cache.set(key, 'NULL', 'EX', 30) and check for the sentinel on read.
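A read-through helper with a null sentinel might look like the following sketch. StringCache mirrors the ioredis-style get and set calls used elsewhere in this document; getWithNegativeCache and negativeTtl are illustrative names.

```typescript
// Assumed slice of an ioredis-style client.
interface StringCache {
  get(key: string): Promise<string | null>
  set(key: string, value: string, mode: 'EX', seconds: number): Promise<unknown>
}

// 'NULL' is not valid JSON, so it cannot collide with any JSON.stringify output.
const NULL_SENTINEL = 'NULL'

async function getWithNegativeCache<T>(
  cache: StringCache,
  key: string,
  loader: () => Promise<T | null>, // e.g. a DB lookup that returns null when the row is missing
  ttl: number,
  negativeTtl = 30 // keep "not found" results only briefly
): Promise<T | null> {
  const raw = await cache.get(key)
  if (raw === NULL_SENTINEL) return null          // cached "not found"
  if (raw !== null) return JSON.parse(raw) as T   // cached value
  const value = await loader()
  if (value === null) {
    await cache.set(key, NULL_SENTINEL, 'EX', negativeTtl)
    return null
  }
  await cache.set(key, JSON.stringify(value), 'EX', ttl)
  return value
}
```

When the record is later created, the write path should delete the key so the sentinel does not hide the new row for the rest of its TTL.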
Replica lag invalidation hole: If you invalidate and then immediately read from a DB read replica, you may re-populate the cache with stale data. After a write, route the cache-repopulation read to the primary, or skip immediate repopulation and let the next request populate from the replica after lag clears.
In-process cache in multi-instance deployments: Local caches (Node.js Map) are invisible to other instances. A write on instance A invalidates only A's cache; instances B and C keep serving stale data until their TTL expires. Either use a shared cache for anything that must be consistent across instances, or accept the TTL window and document it.
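If you accept the TTL window, it helps to make the bound explicit in code. The sketch below is a minimal in-process cache where every entry expires after a fixed TTL, so cross-instance staleness can never exceed that value. LocalTtlCache and its injectable clock are illustrative, not a specific library.

```typescript
// Per-instance cache with a hard TTL: the TTL is the documented upper bound on staleness.
class LocalTtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>()
  private readonly ttlMs: number
  private readonly now: () => number

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs
    this.now = now // injectable clock, useful in tests
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key) // expired: drop it and report a miss
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs })
  }

  // Only affects this instance; other instances wait for their own TTL.
  delete(key: string): void {
    this.entries.delete(key)
  }
}
```

With a 5 second TTL, a write handled by one instance is visible everywhere within 5 seconds at worst, which is the number to put in the documentation.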