Multi-Level Caching - A Caching Solution for 100-Million-Scale Traffic
Reference: Multi-Level Caching
Below is a review of the three-layer caching stack: Nginx/OpenResty, Redis, and the JVM's in-process cache (Caffeine).
The core idea
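Multi-level caching exploits the latency/capacity tradeoff between levels. Local caches (the Nginx shared dict on each host, the Caffeine cache in each JVM) are the fastest but small and private to one node; Redis is larger and shared by every node but costs a network round trip; the database is the slowest tier and the source of truth. You serve the hottest data from the nearest cache and fall back progressively.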
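The fall-back idea can be sketched as a chain of tiers tried nearest-first, where a hit at a lower tier back-fills the nearer tiers that missed. This is a minimal Java sketch under simplifying assumptions: Tier, MapTier, and TieredCache are hypothetical names, plain HashMaps stand in for the Nginx shared dict, Redis, and Caffeine, and a Function stands in for the database query.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** One cache level (hypothetical abstraction, not a real library API). */
interface Tier {
    Optional<String> get(String key);
    void put(String key, String value);
}

/** HashMap-backed stand-in for a real tier (shared dict, Redis, Caffeine). */
class MapTier implements Tier {
    final Map<String, String> data = new HashMap<>();
    public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
    public void put(String key, String value) { data.put(key, value); }
}

/** Hypothetical read path: try tiers nearest-first, back-fill nearer tiers on a hit. */
class TieredCache {
    private final List<Tier> tiers;                  // index 0 = nearest to the client
    private final Function<String, String> database; // source of truth
    int dbReads = 0;

    TieredCache(List<Tier> tiers, Function<String, String> database) {
        this.tiers = tiers;
        this.database = database;
    }

    String get(String key) {
        for (int i = 0; i < tiers.size(); i++) {
            Optional<String> hit = tiers.get(i).get(key);
            if (hit.isPresent()) {
                backfill(key, hit.get(), i);         // populate tiers 0..i-1 only
                return hit.get();
            }
        }
        dbReads++;
        String value = database.apply(key);          // full miss: go to the DB
        backfill(key, value, tiers.size());          // populate every tier
        return value;
    }

    private void backfill(String key, String value, int upTo) {
        for (int j = 0; j < upTo; j++) tiers.get(j).put(key, value);
    }
}
```

In the real topology each layer lives in a different process, so the back-filling is done by each layer's own code; the sketch collapses them into one object only to show the control flow.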
Layer 1 — Nginx / OpenResty
OpenResty embeds LuaJIT into Nginx, letting you run cache logic right in the request pipeline — before any connection reaches your JVM service.
The key primitive is lua_shared_dict: a zone of shared memory, declared in nginx.conf and accessed from Lua as ngx.shared.<name>, that all Nginx worker processes on the host can read and write. You write Lua handlers that check this dict first; on a hit, the response is returned without touching a single backend socket.
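The Lua handler itself would call the dict's get and set(key, value, exptime) methods. The Java sketch below only models those get/set-with-expiry semantics and the "dict first, backend on miss" handler logic so they can be tested in isolation. ExpiringDict and getOrLoad are hypothetical names, and a ConcurrentHashMap inside one JVM is only an analogy for memory shared across Nginx worker processes.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical JVM model of a shared dict whose entries carry an optional expiry. */
class ExpiringDict {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;   // 0 means "never expires"
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Entry> map = new ConcurrentHashMap<>();
    private final LongSupplier clock;   // injectable so tests can control time

    ExpiringDict(LongSupplier clock) { this.clock = clock; }

    /** Modeled on set(key, value, exptime): exptime 0 means no expiry. */
    void set(String key, String value, double ttlSeconds) {
        long expiresAt = ttlSeconds == 0 ? 0 : clock.getAsLong() + (long) (ttlSeconds * 1000);
        map.put(key, new Entry(value, expiresAt));
    }

    /** Returns null on a miss or after expiry (the expired entry is removed lazily). */
    String get(String key) {
        Entry e = map.get(key);
        if (e == null) return null;
        if (e.expiresAtMillis != 0 && clock.getAsLong() >= e.expiresAtMillis) {
            map.remove(key, e);   // remove only if no one has replaced it meanwhile
            return null;
        }
        return e.value;
    }

    /** Handler logic: serve from the dict; call the backend only on a miss. */
    String getOrLoad(String key, Function<String, String> backend, double ttlSeconds) {
        String cached = get(key);
        if (cached != null) return cached;
        String fresh = backend.apply(key);
        set(key, fresh, ttlSeconds);
        return fresh;
    }
}
```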
The main production concern is the cache stampede: when a popular key expires, thousands of concurrent requests all miss at once and hammer the backend. The standard fix is lua-resty-lock (resty.lock): on a miss, one request acquires a per-key lock, fetches the value, repopulates the dict, and releases the lock; the other requests wait on the lock, then re-check the dict and reuse the fresh value instead of hitting the backend.
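resty.lock is a Lua library, so it cannot be shown directly here. The same "one loader, many waiters" idea on the JVM side is often written as a per-key single-flight; the sketch below is that swapped-in Java analogue, not resty.lock itself. The first caller for a key runs the loader, and concurrent callers for the same key wait on the same CompletableFuture. SingleFlight is a hypothetical name.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-key single-flight: concurrent misses for one key share one load. */
class SingleFlight<V> {
    private final ConcurrentHashMap<String, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    V load(String key, Function<String, V> loader) {
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing.join();            // someone else is loading: wait for their result
        }
        try {
            V value = loader.apply(key);       // only this thread hits the backend
            mine.complete(value);
            return value;
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);     // waiters see the same failure
            throw e;
        } finally {
            inFlight.remove(key, mine);        // the next miss after this starts a fresh load
        }
    }
}
```

For the Caffeine layer specifically, Cache.get(key, mappingFunction) already applies the mapping function at most once per key, so a helper like this is mainly useful in front of plain maps or remote calls.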
Layer 2 — Redis cluster
Redis sits between Nginx and your JVM services as the shared, distributed store for cached data. Every node talks to the same Redis, so unlike the per-node local caches, its contents do not diverge between instances.
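In a cluster, data is sharded across master nodes by hash slot: every key maps to one of 16,384 slots (CRC16 of the key, mod 16384), and each master owns a subset of those slots. Each master has one or more replicas for failover. Cluster-aware clients such as Lettuce or Jedis (via JedisCluster) compute the slot and route each command to the owning master transparently.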
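Redis Cluster computes the slot as CRC16(key) mod 16384; if the key contains a non-empty {...} hash tag, only the tag is hashed, which is how related keys are kept on the same slot. Cluster-aware clients do this internally, so you would not normally write it yourself; the Java sketch below just makes the mapping concrete. HashSlot is a hypothetical class name.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: Redis Cluster key-to-slot mapping, CRC16 (XMODEM) mod 16384. */
final class HashSlot {
    static final int SLOTS = 16384;

    /** CRC16 XMODEM variant: polynomial 0x1021, initial value 0, no bit reflection. */
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? ((crc << 1) ^ 0x1021) & 0xFFFF : (crc << 1) & 0xFFFF;
            }
        }
        return crc;
    }

    /** If the key has a non-empty {hash tag}, only the tag is hashed. */
    static int slot(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) key = key.substring(open + 1, close);
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) & (SLOTS - 1);  // same as % 16384
    }
}
```

For example, HashSlot.slot("foo") evaluates to 12182, which should match CLUSTER KEYSLOT foo on a real cluster, and keys such as user:{42}:profile and user:{42}:orders land on the same slot.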
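The dominant consistency pattern in production is cache-aside (lazy loading): read Redis; on a miss, read the DB and write the value back to Redis with a TTL. Write-through (writing the DB and the cache synchronously on every update) is used when stale reads are especially unacceptable. For invalidation, deleting the key on write and letting the next read repopulate it avoids dual-write races, such as two concurrent writers updating the DB and the cache in opposite orders and leaving the cache holding the older value.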
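Cache-aside reads plus delete-on-write can be sketched as follows, assuming HashMaps stand in for Redis and the database and expiry is checked against an injectable clock rather than real Redis TTLs. CacheAsideRepo is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical cache-aside repository; HashMaps stand in for Redis and the DB. */
class CacheAsideRepo {
    private static final class Cached {
        final String value;
        final long expiresAt;
        Cached(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    final Map<String, String> db = new HashMap<>();            // source of truth
    private final Map<String, Cached> redis = new HashMap<>();  // stand-in for Redis
    private final LongSupplier clock;
    private final long ttlMillis;
    int dbReads = 0;

    CacheAsideRepo(LongSupplier clock, long ttlMillis) {
        this.clock = clock;
        this.ttlMillis = ttlMillis;
    }

    /** Read path: cache, then (on a miss) the DB, then write back with a TTL. */
    String read(String key) {
        Cached c = redis.get(key);
        if (c != null && clock.getAsLong() < c.expiresAt) return c.value;  // cache hit
        dbReads++;
        String value = db.get(key);
        if (value != null) redis.put(key, new Cached(value, clock.getAsLong() + ttlMillis));
        return value;
    }

    /** Write path with invalidation: update the DB, then delete (not update) the cached key. */
    void write(String key, String value) {
        db.put(key, value);
        redis.remove(key);   // the next read repopulates from the DB
    }
}
```

Against a real Redis, read would use GET and SET key value EX seconds for the write-back, and write would issue DEL after the DB update.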
Layer 3 — JVM / Caffeine
Caffeine is a near-optimal in-process cache for the JVM. Because it lives on the heap, reads are nanoseconds and require zero serialization — you get back the actual Java object reference.
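The "zero serialization" point can be illustrated by contrasting an on-heap map, standing in for Caffeine (which likewise stores object references), with a Java-serialization round trip standing in for a remote cache. Real Redis clients use whatever codec you configure, so the round trip is only an analogy. HeapVsRemote is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical demo: on-heap lookups return the stored reference, remote lookups return a copy. */
final class HeapVsRemote {
    /** Stand-in for a network cache: serialize on write, deserialize on read. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();   // a brand-new object graph
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns {heap hit is same reference, remote hit is same reference, remote hit is equal}. */
    static boolean[] demo() {
        Map<String, ArrayList<String>> heapCache = new ConcurrentHashMap<>();   // stand-in for Caffeine
        ArrayList<String> tags = new ArrayList<>(List.of("a", "b"));
        heapCache.put("tags", tags);
        ArrayList<String> fromHeap = heapCache.get("tags");
        ArrayList<String> fromRemote = roundTrip(tags);
        return new boolean[] {fromHeap == tags, fromRemote == tags, fromRemote.equals(tags)};
    }
}
```

Because every caller shares the cached instance, values in an on-heap cache are best treated as immutable: mutating one mutates it for every reader in that JVM.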
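The eviction policy, W-TinyLFU (Window TinyLFU), combines a small LRU admission window, a main region managed as a segmented LRU, and a TinyLFU admission filter backed by a compact frequency sketch: an entry leaving the window only displaces the main region's eviction victim if the sketch estimates it is accessed more often. It achieves near-optimal hit rates with very low memory overhead and typically outperforms plain LRU on skewed workloads, which most real traffic is.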
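To show what the frequency sketch contributes, here is a deliberately simplified TinyLFU admission filter: a count-min sketch with periodic halving, plus the "admit only if more frequent than the victim" rule. It covers only the admission step (no window or segmented-LRU regions), and Caffeine's own FrequencySketch differs in detail (4-bit counters, different hashing, extra heuristics). TinyLfuFilter and its hash seeds are hypothetical choices.

```java
/**
 * Simplified TinyLFU admission filter (hypothetical, for illustration only):
 * a count-min sketch estimates access frequency, and all counters are halved
 * after sampleSize increments so old popularity fades.
 */
final class TinyLfuFilter {
    private static final int DEPTH = 4;
    private static final int[] SEEDS = {0x9E3779B9, 0x85EBCA6B, 0xC2B2AE35, 0x27D4EB2F};
    private final int[][] table;
    private final int width;        // must be a power of two
    private final int sampleSize;
    private int additions = 0;

    TinyLfuFilter(int width, int sampleSize) {
        if (Integer.bitCount(width) != 1) throw new IllegalArgumentException("width must be a power of two");
        this.width = width;
        this.sampleSize = sampleSize;
        this.table = new int[DEPTH][width];
    }

    private int index(Object key, int row) {
        int h = key.hashCode() * SEEDS[row];
        h ^= h >>> 16;                  // mix high bits into the low bits used for indexing
        return h & (width - 1);
    }

    /** Record one access. */
    void increment(Object key) {
        for (int row = 0; row < DEPTH; row++) table[row][index(key, row)]++;
        if (++additions >= sampleSize) reset();
    }

    /** Count-min estimate: the minimum across rows (never below the true count before a reset). */
    int frequency(Object key) {
        int min = Integer.MAX_VALUE;
        for (int row = 0; row < DEPTH; row++) min = Math.min(min, table[row][index(key, row)]);
        return min;
    }

    /** Aging: halve every counter so the sketch tracks recent popularity. */
    private void reset() {
        for (int[] row : table) for (int i = 0; i < row.length; i++) row[i] >>>= 1;
        additions /= 2;
    }

    /** Admit the candidate (e.g. an entry leaving the window) only if it beats the victim. */
    boolean admit(Object candidate, Object victim) {
        return frequency(candidate) > frequency(victim);
    }
}
```

The design choice this illustrates: a one-hit wonder leaving the window cannot push out a frequently used entry, which is why TinyLFU-style admission helps on skewed traffic where plain LRU would evict the popular entry.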
The key tradeoff: each JVM instance has its own copy, so invalidation is local. In a multi-node deployment you need either very short TTLs or a Redis pub/sub channel that broadcasts invalidation events to all nodes when data changes.
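The pub/sub invalidation pattern can be sketched in-process, assuming a simple subscriber list stands in for a Redis channel and a ConcurrentHashMap stands in for each node's Caffeine cache. InvalidationBus and ServiceNode are hypothetical names; a real deployment would use its Redis client's PUBLISH/SUBSCRIBE support instead.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical in-process stand-in for a Redis pub/sub channel. */
class InvalidationBus {
    private final List<Consumer<String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<String> onInvalidate) { subscribers.add(onInvalidate); }

    /** Like PUBLISH: deliver the changed key to every current subscriber. */
    void publish(String key) { subscribers.forEach(s -> s.accept(key)); }
}

/** One service instance with its own local (Caffeine-like) cache. */
class ServiceNode {
    final Map<String, String> localCache = new ConcurrentHashMap<>();

    ServiceNode(InvalidationBus bus) {
        bus.subscribe(localCache::remove);   // evict the local copy whenever a change is announced
    }
}
```

Redis pub/sub has at-most-once delivery, so a node that is disconnected when the message is published never sees it; this is one reason to keep a short local TTL as a backstop even when broadcasting.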
How the layers interact in real projects
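The TTL hierarchy is critical: Caffeine TTL < Redis TTL, with the DB as the untimed source of truth. A write usually invalidates only the Redis key, so each local Caffeine copy keeps serving the old value until it expires; its TTL is therefore the upper bound on local staleness. If the Caffeine TTL were longer than the Redis TTL, a node could keep serving a value that Redis has already evicted and the DB has since updated, which is a stale read.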
A typical write path:
- Write to DB (source of truth)
- Invalidate or update the Redis key
- Let Caffeine expire naturally (short TTL), or broadcast an invalidation event via Redis pub/sub
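The TTL hierarchy and the write path above can be checked with a small simulation. This is a minimal sketch assuming a manual millisecond clock, HashMaps for the local cache, Redis, and the DB, and a boolean flag standing in for the pub/sub broadcast; LayeredStore is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical local cache -> Redis -> DB store driven by a manual clock. */
class LayeredStore {
    private static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    long now = 0;                                   // manual clock, in milliseconds
    final long localTtl, redisTtl;
    final Map<String, String> db = new HashMap<>();
    private final Map<String, Entry> redis = new HashMap<>();
    private final Map<String, Entry> local = new HashMap<>();

    LayeredStore(long localTtl, long redisTtl) {
        if (localTtl >= redisTtl) throw new IllegalArgumentException("keep local TTL < Redis TTL");
        this.localTtl = localTtl;
        this.redisTtl = redisTtl;
    }

    String read(String key) {
        Entry e = local.get(key);
        if (e != null && now < e.expiresAt) return e.value;           // local hit
        Entry r = redis.get(key);
        String value = (r != null && now < r.expiresAt) ? r.value : null;
        if (value == null) {                                           // Redis miss: go to the DB
            value = db.get(key);
            redis.put(key, new Entry(value, now + redisTtl));
        }
        local.put(key, new Entry(value, now + localTtl));              // back-fill the local cache
        return value;
    }

    /** Write path: 1) DB, 2) delete the Redis key, 3) optionally broadcast a local eviction. */
    void write(String key, String value, boolean broadcast) {
        db.put(key, value);
        redis.remove(key);
        if (broadcast) local.remove(key);   // stands in for a pub/sub message to every node
    }
}
```

With a 5-second local TTL, a write at t=1000 ms without a broadcast leaves the local copy stale until t=5000 ms, a 4-second window; in general the window is bounded by the local TTL, and the broadcast removes it.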