Caffeine Deep Dive
A high-performance Java in-process cache (W-TinyLFU eviction + lock-free concurrency). This document covers the architecture, internals, API, a Guava → Caffeine migration guide, and a deep-dive Q&A reference.
Table of Contents
Foundations
- [1. Overview](#1. Overview)
- [2. Core Architecture](#2. Core Architecture)
The W-TinyLFU Eviction Policy
- [3. W-TinyLFU Eviction Strategy](#3. W-TinyLFU Eviction Strategy)
Concurrency & Internals
- [4. Concurrency Design](#4. Concurrency Design)
- [5. Expiration Strategies](#5. Expiration Strategies)
- [7. Adaptive Adjustment (Hill Climbing)](#7. Adaptive Adjustment (Hill Climbing))
API & Features
- [6. Async Support](#6. Async Support)
- [8. Usage Examples](#8. Usage Examples)
- [9. Scheduler (Proactive Expiration Cleanup)](#9. Scheduler (Proactive Expiration Cleanup))
Migrating from Guava
- [10. Migration from Guava Cache](#10. Migration from Guava Cache)
- [11. Performance Comparison](#11. Performance Comparison)
- [12. Common Pitfalls](#12. Common Pitfalls)
Deep Dives & Reference
- [13. W-TinyLFU & FrequencySketch --- Q&A Deep Dive](#13. W-TinyLFU & FrequencySketch — Q&A Deep Dive)
- [14. Interview Key Points](#14. Interview Key Points)
Reading paths: New to Caffeine? Read §1--§3, then §8. Tuning/operating it? §4, §5, §7, §9. Migrating from Guava? §10--§12. Going deep (or prepping for interviews)? §13--§14. Note that §13 intentionally re-walks the §3 material in Q&A form with more worked examples.
1. Overview
Caffeine is a high-performance Java local cache library developed by Ben Manes, who also contributed to the design of Guava Cache. It uses the W-TinyLFU eviction policy (near-optimal hit rate) and a mostly lock-free concurrent design (striped/MPSC buffers + CAS + an asynchronous maintenance task), which gives it substantially higher throughput than Guava Cache (see §4.3).
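- Spring integration: supported by Spring's cache abstraction (Spring Framework 5 replaced its Guava support with Caffeine) and auto-configured by Spring Boot (`spring-boot-starter-cache`) when Caffeine is on the classpath
- Memory: supports up to `maximumSize(Long.MAX_VALUE)`, but in practice is limited by the JVM heap
- Positioning: extreme-performance in-process cache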
2. Core Architecture
┌──────────────────────────────────────────────────────┐
│ Caffeine │
├──────────────────────────────────────────────────────┤
│ BoundedLocalCache (Core implementation) │
│ ├── ConcurrentHashMap<K, Node<K,V>> data │
│ │ (Storage layer - JDK8 CHM, fully concurrent)│
│ ├── Window (admission window, ~1% capacity, │
│ │ adaptive 0.2%-80%) │
│ │ └── AccessOrderDeque (LRU) │
│ ├── Probation (probation zone) │
│ │ └── AccessOrderDeque (LRU) │
│ ├── Protected (protected zone, ~80% of Main) │
│ │ └── AccessOrderDeque (LRU) │
│ ├── FrequencySketch (Count-Min Sketch + Doorkeeper)│
│ │ └── long[] table (4-bit counters, block layout)│
│ ├── ReadBuffer (BoundedBuffer: striped lossy rings) │
│ ├── WriteBuffer (MpscGrowableArrayQueue) │
│ ├── TimerWheel (hierarchical timer wheel, expiry) │
│ └── Maintenance (ForkJoinPool / custom Executor) │
└──────────────────────────────────────────────────────┘
Key design principles:
- All eviction-policy mutations are asynchronous: read/write operations record changes to buffers; a single maintenance task batch-processes them.
- Lock-free read path: the hot path is only `ConcurrentHashMap.get()` + `afterRead()` writing to a buffer.
3. W-TinyLFU Eviction Strategy
3.1 Overall Structure
New entry → [Window Cache] → Admission Filter → [Main Cache (SLRU)]
(LRU) (TinyLFU) Probation + Protected
Default proportions:
Window: 1%, Main: 99% (Protected ≈ 80% of Main ≈ 79.2% of total; Probation ≈ 20% of Main ≈ 19.8% of total)
Window can adaptively adjust via Hill Climbing (0.2% - 80%)
Purpose of the three zones:
- Window : New entries enter here first, LRU eviction. Absorbs burst traffic and new hotspots, preventing "one-time accesses" from directly polluting the main cache.
- Probation : Entries evicted from Window enter here after passing admission filter. Accessed again → promoted to Protected.
- Protected: Stable zone for frequently accessed entries. When Protected is full, LRU evicts from head to Probation tail.
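As a worked example of the default proportions, the sketch below computes the three zone budgets for a given `maximumSize`. The class name `ZoneSizes` and the integer rounding are assumptions made for illustration; only the 1% window and the "80% of Main" protected share come from the text above.

```java
final class ZoneSizes {
  /** Returns {window, probation, protected} entry budgets for a given maximumSize. */
  static long[] of(long maximumSize) {
    long window = maximumSize / 100;        // ~1% admission window
    long main = maximumSize - window;       // ~99% main region
    long protectedMax = (main * 80) / 100;  // ~80% of the main region
    long probation = main - protectedMax;   // the remaining ~20% of main
    return new long[] {window, probation, protectedMax};
  }

  public static void main(String[] args) {
    long[] z = of(10_000);
    // prints "window=100 probation=1980 protected=7920"
    System.out.println("window=" + z[0] + " probation=" + z[1] + " protected=" + z[2]);
  }
}
```

Note that the protected and probation shares are fractions of the main region, so as fractions of the whole cache they come to roughly 79.2% and 19.8%.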
3.2 Admission Filter
java
// When the cache is over capacity, compare the Window's candidate with Probation's LRU victim
boolean admit(K candidateKey, K victimKey) {
  int victimFreq = frequencySketch.frequency(victimKey);
  int candidateFreq = frequencySketch.frequency(candidateKey);
  if (candidateFreq > victimFreq) return true; // Candidate is more popular: admit
  if (candidateFreq >= ADMIT_HASHDOS_THRESHOLD) { // Warm candidate (threshold is 6 in Caffeine's source)
    // Occasionally admit anyway (about 1 in 128) so that hash collisions cannot be
    // exploited to keep an artificially hot victim in the cache forever
    return (ThreadLocalRandom.current().nextInt() & 127) == 0;
  }
  return false; // Ties and colder candidates are rejected
}
Design intent : Core improvement over Guava's LRU --- considers not just "was it accessed recently" but also "historical access frequency." Particularly effective against scan-loop adversarial access patterns.
3.3 FrequencySketch (Count-Min Sketch)
java
// 4-bit saturating counters (max value 15)
// Each key uses 4 hashes, takes minimum as frequency estimate
// Block layout: each long (64 bit) stores 16 4-bit counters
final class FrequencySketch<K> {
long[] table; // Block layout
int sampleSize; // Sample window size = 10 * maximumSize
int size; // Current cumulative increment count
// Increment frequency (add 1 at each of 4 hash positions, saturate at 15)
public void increment(K key) {
int hash = spread(key.hashCode());
int start = (hash & 3) << 2; // Counter start position within block
boolean added = false;
for (int i = 0; i < 4; i++) {
int index = indexOf(hash, i);
// 4 counters in same block are cache-friendly
added |= incrementAt(index, start + i);
}
// Periodic decay: all counters right-shift by 1 (= divide by 2)
if (added && ++size >= sampleSize) reset();
}
// Query frequency (take minimum of 4 positions, guaranteed no underestimate)
public int frequency(K key) {
int hash = spread(key.hashCode());
int start = (hash & 3) << 2;
int frequency = Integer.MAX_VALUE;
for (int i = 0; i < 4; i++) {
int index = indexOf(hash, i);
frequency = Math.min(frequency, countAt(index, start + i));
}
return frequency;
}
// Decay: all counters divided by 2, size halved
void reset() {
for (int i = 0; i < table.length; i++) {
table[i] = (table[i] >>> 1) & RESET_MASK;
}
size = (size >>> 1);
}
}
Summary:
| Dimension | Description |
|---|---|
| Space | ~4 bit × 4 = 16 bit per key (amortized) |
| Time | O(1) --- 4 hash lookups |
| Precision | Allows overestimation , never underestimates (Count-Min property) |
| Decay | Right-shift by 1 when sample is full, adapts to access pattern changes |
| Cache-friendly | Block layout: the 4 counters live in 4 distinct longs of the same 8-long block (= one 64-byte cache line), not the same long. See §13.5 for the exact index math. |
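
The pseudocode above leaves `indexOf`, `incrementAt`, and `countAt` undefined, so here is a minimal runnable sketch of the same idea: four hashed 4-bit saturating counters per key, with the minimum taken as the estimate. It deliberately uses a simplified layout (any counter in any long, no block locality, no aging), and the class name `TinySketch`, the seed constants, and the `mix` function are assumptions of this example rather than Caffeine's code.

```java
final class TinySketch {
  // Arbitrary odd 64-bit constants used to derive 4 hash functions (an assumption of this sketch).
  private static final long[] SEEDS = {
      0xc3a5c85c97cb3127L, 0xb492b66fbe98f273L, 0x9ae16a3b2f90404fL, 0xcbf29ce484222325L};

  private final long[] table; // each long packs 16 four-bit counters
  private final int mask;

  TinySketch(int tableLength) {
    if (Integer.bitCount(tableLength) != 1) {
      throw new IllegalArgumentException("table length must be a power of two");
    }
    table = new long[tableLength];
    mask = tableLength - 1;
  }

  /** Adds 1 to each of the key's 4 counters, saturating at 15. */
  void increment(Object key) {
    for (int i = 0; i < 4; i++) {
      long h = mix(key.hashCode(), i);
      int index = ((int) h) & mask;                 // which long
      int shift = (((int) (h >>> 32)) & 15) << 2;   // which nibble inside it
      long count = (table[index] >>> shift) & 0xFL;
      if (count < 15) table[index] += (1L << shift);
    }
  }

  /** Returns the minimum of the key's 4 counters: may overestimate, never underestimates. */
  int frequency(Object key) {
    int min = 15;
    for (int i = 0; i < 4; i++) {
      long h = mix(key.hashCode(), i);
      int index = ((int) h) & mask;
      int shift = (((int) (h >>> 32)) & 15) << 2;
      min = Math.min(min, (int) ((table[index] >>> shift) & 0xFL));
    }
    return min;
  }

  private static long mix(int hash, int i) {
    long h = (hash + SEEDS[i]) * SEEDS[i];
    return h ^ (h >>> 29);
  }
}
```

Because every increment touches all four of a key's counters, the minimum can only be inflated by other keys colliding on a counter, which is the Count-Min "never underestimates" property in the table above.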
How reset() halves all counters: (word >>> 1) & RESET_MASK
A long packs 16 four-bit counters. The decay halves every counter with just one shift + one mask --- no per-counter loop:
- `word >>> 1` shifts the entire 64-bit word. Inside each nibble this correctly computes `value / 2`, but each counter's low bit slides into the top bit of its right neighbor's nibble, polluting it: `[c1=0001][c0=0001] --(>>> 1)--> [...0][1 000...]` ← c1's low bit leaked into the top of c0
- `& RESET_MASK`, where `RESET_MASK = 0x7777777777777777L` (`0111` × 16), forces the top bit of every nibble to 0, erasing exactly those leaked bits: a raw shifted nibble `1xxx` (stray top bit from the left neighbor) `& 0111` gives `0xxx` ← stray bit cleared → clean `value / 2`
- Example: `15 (1111) >>> 1 = 7 (0111)`, and the mask keeps `0111` → 7 ✓ (integer 15/2).
Each iteration ages 16 counters with 2 ALU ops. (Real Caffeine also adjusts size by the count of cleared low-bits to keep aging accounting exact.)
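The shift-and-mask step can be checked on a concrete word. This is a small sketch with a hypothetical class name (`NibbleAging`); it covers only the per-word halving, not the `size` bookkeeping mentioned above.

```java
final class NibbleAging {
  static final long RESET_MASK = 0x7777777777777777L;

  /** Halves all 16 four-bit counters packed in one long with one shift and one mask. */
  static long halveAll(long word) {
    return (word >>> 1) & RESET_MASK;
  }

  /** Reads counter i, where 0 is the lowest nibble. */
  static int counterAt(long word, int i) {
    return (int) ((word >>> (i << 2)) & 0xFL);
  }

  public static void main(String[] args) {
    long word = 0xF931L;          // counters: c0=1, c1=3, c2=9, c3=15
    long aged = halveAll(word);   // expected: c0=0, c1=1, c2=4, c3=7
    System.out.println(Long.toHexString(aged)); // prints "7410"
  }
}
```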
⚠️ Pseudocode simplification: the `start = (hash & 3) << 2` / `indexOf(hash, i)` / `start + i` shown above is an illustrative model. In real Caffeine each of the 4 counters gets its own `long` index (different per `i`, spread via `+ (i << 1)`) and its own independently chosen nibble (see §13.5). The 4 counters share a block (one cache line), not a single long.
3.4 Complete W-TinyLFU Flow
Trigger : This whole flow runs only inside the maintenance task's evictEntries() step (see §4.2), and only when total cache size exceeds maximumSize . A non-full cache never compares or evicts --- new entries just accumulate in the Window. The window entry is not "evicted first and then compared"; it is selected as a candidate because an eviction is required, and the comparison decides which of the two (candidate or victim) is actually evicted.
1. New entry inserted → enters Window zone (LRU) via drainWriteBuffer
2. Insert pushes total cache size > maximumSize → eviction needed
3. evictEntries() runs on the maintenance thread:
candidate = Window LRU head (oldest in window)
victim = Probation LRU head (coldest main entry)
compare freq(candidate) vs freq(victim):
freq(cand) > freq(victim) → candidate enters Probation, victim evicted
freq(cand) <= freq(victim) → candidate evicted, victim stays in Probation
(exception: a warm candidate, freq >= 6, is admitted at random about 1 time in 128 as hash-flooding protection)
4. Total size is back to maximumSize
5. Probation entry accessed again → promoted to Protected
6. Protected full → head LRU-evicted to Probation tail
Note : It is not the Window being individually "full" that triggers this --- it is the whole cache exceeding maximumSize. The eviction logic then decides whether to pull from the Window or the main region based on their target proportions (tuned by Hill Climbing, §7). The candidate-vs-victim comparison above is specifically the Window-overflow case.
3.5 W-TinyLFU in a Nutshell
The problem it solves --- classic policies each have a blind spot:
| Policy | Keeps | Strength | Weakness |
|---|---|---|---|
| LRU (Least Recently Used) | Recently-touched entries | Great for recency workloads | A scan of one-time keys floods the cache and evicts hot entries |
| LFU (Least Frequently Used) | Frequently-touched entries | Great for stable popularity | Adapts slowly: stale hot items linger, new hot items struggle to earn a spot |
W-TinyLFU = Windowed + TinyLFU + aging --- it combines both: frequency decides what deserves to stay, and a small recency window catches newly-popular items quickly.
┌──────────────┐
new entry ───> │ Window (LRU) │ ~1% of capacity
└──────┬───────┘
│ evicted from window
▼
┌───────────┐ TinyLFU admission filter
│ candidate │ ──── compare frequency ────┐
└───────────┘ │
▼
┌──────────────────────────────┐ keep higher-freq
│ Main / Probationary (SLRU) │ <─────────────────
│ ~99% of capacity │
└──────────────────────────────┘
How the pieces fit together:
- Admission window (LRU, ~1%) --- every new entry lands here first, giving fresh/bursty items a chance to prove themselves before being judged on frequency they haven't had time to build.
- TinyLFU admission filter --- when an entry is pushed out of the window, TinyLFU compares its estimated frequency against the victim it would replace in the main region. Only the more valuable one is admitted. This is the scan resistance: one-shot keys never accumulate frequency, so they get rejected.
- Frequency sketch (Count-Min Sketch) --- frequencies are estimated, not stored exactly, using a few 4-bit counters per entry, so tracking the whole keyspace costs very little memory.
- Aging / reset --- counters are periodically halved, so yesterday's hot item decays. This is what makes it adaptive rather than a static LFU.
- Main region uses SLRU (probationary + protected segments) so entries that get re-hit are promoted and protected.
Why it wins : near-optimal hit rates with bounded memory, scan resistance against cold-key floods, and fast adaptation to newly-popular keys. You don't configure any of this --- it's the built-in policy whenever a Caffeine cache is bounded via maximumSize or maximumWeight; you just set size and TTL.
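To make the scan-resistance claim concrete, the toy simulation below replays a small hot set followed by a scan of one-time keys, once with plain LRU and once with a frequency-based admission check. It is a sketch under simplifying assumptions, not Caffeine's implementation: there is no admission window, an exact `HashMap` counter stands in for the frequency sketch, ties are rejected so the run is deterministic, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class ScanResistanceDemo {
  /** Replays hot keys, then a one-shot scan; returns how many hot keys are still cached. */
  static int hotSurvivors(boolean useAdmissionFilter) {
    int capacity = 4;
    // accessOrder=true makes iteration order least-recently-used first.
    LinkedHashMap<String, Boolean> cache = new LinkedHashMap<>(16, 0.75f, true);
    Map<String, Integer> freq = new HashMap<>(); // exact counts stand in for the sketch

    String[] hot = {"A", "B", "C", "D"};
    for (int round = 0; round < 5; round++) {
      for (String key : hot) access(cache, freq, key, capacity, useAdmissionFilter);
    }
    for (int i = 0; i < 100; i++) { // scan of keys that are each touched once
      access(cache, freq, "scan-" + i, capacity, useAdmissionFilter);
    }
    int survivors = 0;
    for (String key : hot) {
      if (cache.containsKey(key)) survivors++;
    }
    return survivors;
  }

  private static void access(LinkedHashMap<String, Boolean> cache, Map<String, Integer> freq,
      String key, int capacity, boolean useAdmissionFilter) {
    freq.merge(key, 1, Integer::sum);
    if (cache.get(key) != null) return;                 // hit: get() refreshes recency
    if (cache.size() < capacity) { cache.put(key, true); return; }
    String victim = cache.keySet().iterator().next();   // least recently used entry
    if (useAdmissionFilter && freq.get(key) <= freq.get(victim)) return; // reject candidate
    cache.remove(victim);
    cache.put(key, true);
  }

  public static void main(String[] args) {
    System.out.println("LRU only:       " + hotSurvivors(false) + " hot keys survive");
    System.out.println("with admission: " + hotSurvivors(true) + " hot keys survive");
  }
}
```

With plain LRU the scan flushes every hot key; with the admission check each scan key (frequency 1) loses to the victim (frequency 5), so the hot set is untouched.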
4. Concurrency Design
4.1 Read/Write Buffers
java
// Read operation hot path (lock-free)
V get(K key) {
Node<K, V> node = data.get(key); // CHM.get() - fully concurrent
if (node == null) return null;
V value = node.getValue();
afterRead(node, now, recordHit); // Write to ReadBuffer (lock-free)
return value;
}
void afterRead(Node<K, V> node, long now, boolean recordHit) {
if (recordHit) statsCounter.recordHits(1);
// Write to MPSC ring buffer; drop on failure (lossy recording, doesn't affect correctness)
boolean delayable = readBuffer.offer(node) != Buffer.FULL;
if (shouldDrainBuffers(delayable)) scheduleDrainBuffers();
}
// Write operation
void afterWrite(Runnable task) {
for (int i = 0; i < WRITE_BUFFER_RETRIES; i++) { // bounded retry count (WRITE_BUFFER_MAX is the buffer's size cap)
if (writeBuffer.offer(task)) {
scheduleAfterWrite();
return;
}
scheduleDrainBuffers(); // Buffer full, proactively trigger drain
Thread.onSpinWait();
}
// Fallback: execute synchronously
lock.lock();
try { task.run(); } finally { lock.unlock(); }
}
Key details:
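- ReadBuffer tolerates loss: read events are only recency/frequency hints for the policy; losing some can slightly affect hit rate, never correctness.
- WriteBuffer cannot lose data: the value is already in the CHM, but each write task must still be applied to the eviction metadata. When the buffer is full, the writer triggers a drain and, if it still cannot enqueue, acquires the eviction lock and runs the task itself.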
What the write task is, and why the fallback task.run() is lossless-by-necessity
The Runnable task in afterWrite wraps one pending structural mutation to the eviction bookkeeping (the value itself is already in the CHM --- these tasks update the ordering metadata the CHM doesn't track):
| Operation | Task type | What run() does |
|---|---|---|
| insert new key | AddTask | link node into the Window deque, add to weightedSize, frequencySketch.increment(key), trigger eviction if over capacity |
| update existing value | UpdateTask | adjust weightedSize by the weight delta, move node to MRU in its region, increment frequency |
| explicit removal | RemovalTask | unlink node from its region deque, subtract from weightedSize |
Normally the task is enqueued (`writeBuffer.offer(task)`) and applied later by the maintenance thread in `drainWriteBuffer()`. But the WriteBuffer is bounded and lossless: a dropped AddTask/RemovalTask would desync the eviction state from the CHM. So when the buffer is still full after the bounded retry loop in `afterWrite` is exhausted, the writer falls back to running the task inline:
java
lock.lock();
try { task.run(); } // apply the mutation now, under the eviction lock
finally { lock.unlock(); }
The eviction lock preserves the single-writer invariant (the deques are still only mutated under exclusive access). This is the exact opposite of the ReadBuffer:
ReadBuffer full → DROP the event (lossy OK --- only a recency hint)
WriteBuffer full → spin, then task.run() under lock (lossless required --- structural change)
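The lossless fallback can be sketched with JDK classes only. This is an illustration of the control flow, not Caffeine's code: `ArrayBlockingQueue` stands in for the MPSC write buffer (it is not lock-free), and the class name, the `RETRIES` value, and the `appliedInline` counter are assumptions added for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

final class WriteFallbackSketch {
  private static final int RETRIES = 3; // hypothetical retry bound for this sketch
  private final ArrayBlockingQueue<Runnable> writeBuffer;
  private final ReentrantLock evictionLock = new ReentrantLock();
  int appliedInline; // how many tasks took the fallback path

  WriteFallbackSketch(int capacity) {
    writeBuffer = new ArrayBlockingQueue<>(capacity);
  }

  /** Enqueues the task if possible; otherwise applies it immediately under the lock. */
  void afterWrite(Runnable task) {
    for (int i = 0; i < RETRIES; i++) {
      if (writeBuffer.offer(task)) return; // queued for the next maintenance pass
      Thread.onSpinWait();
    }
    evictionLock.lock();
    try {
      task.run();       // never dropped: applied now, under exclusive access
      appliedInline++;
    } finally {
      evictionLock.unlock();
    }
  }

  /** Stand-in for the maintenance drain; returns the number of tasks applied. */
  int drain() {
    int drained = 0;
    evictionLock.lock();
    try {
      for (Runnable task; (task = writeBuffer.poll()) != null; drained++) {
        task.run();
      }
    } finally {
      evictionLock.unlock();
    }
    return drained;
  }
}
```

With a capacity of 2 and no drain in between, five writes leave two tasks queued and three applied inline; none is lost, which is the property the ReadBuffer intentionally gives up.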
Where afterWrite(task) is called
afterWrite is the write-side counterpart to afterRead, invoked at the tail end of every mutating map operation --- after the value is already in (or removed from) the ConcurrentHashMap:
| Map operation | Branch | Task passed |
|---|---|---|
| put / putIfAbsent | key absent → new node | new AddTask(node, weight) |
| put / replace | key existed, value replaced | new UpdateTask(node, weightDelta) |
| remove / invalidate | node unlinked from CHM | new RemovalTask(node) |
| computeIfAbsent | loader created a node | AddTask |
| compute / merge | created / updated / removed | AddTask / UpdateTask / RemovalTask |
Every write path is: (1) mutate the CHM (value now visible to readers) → (2) afterWrite(task) enqueues the matching eviction-metadata task onto the WriteBuffer → (3) the maintenance thread later runs task.run() in drainWriteBuffer() (or the writer runs it inline under the lock if the buffer stays full). Reads call afterRead(node, now, recordHit) instead --- lossy, just a recency hint.
Node vs. Task --- easy to conflate, but different roles:
| | Node | Task |
|---|---|---|
| Nature | data (a noun) | a deferred command (a verb) |
| What it is | the stored entry inside the CHM: key, value, deque prev/next pointers, weight, expiration timestamps, current region | a short-lived Runnable (AddTask / UpdateTask / RemovalTask) that mutates a node's eviction metadata |
| Lifetime | long-lived: persists for the entry's whole life; what readers see via data.get(key) | one-shot: created by a write, run once by maintenance, then discarded |
| Carries | the actual cached value | a reference to the node it operates on |
The split exists because step 1 (CHM mutation) is concurrent/lock-free, while step 2 (deque + sketch updates) must run single-threaded under the eviction lock. The task is how a write hands that second piece off to the maintenance thread.
Why reads don't use a task: a read always does the same single action (reorder the node toward MRU), so the buffer carries the bare `Node` and the drain applies that one fixed op, with no per-`get()` `Runnable` allocation and no dispatch. Writes have three distinct actions (add/update/remove) and must always be applied, so they need the polymorphic, lossless task.
4.2 Maintenance Task
java
// Single-threaded maintenance (CAS ensures only one thread enters)
// Default runs on ForkJoinPool.commonPool, customizable via Executor
void maintenance() {
drainReadBuffer(); // 1. Batch-apply read events (update LRU positions)
drainWriteBuffer(); // 2. Batch-apply write events (insert/delete/update)
expireEntries(); // 3. Remove expired entries (time-ordered deques for fixed TTLs, TimerWheel for per-entry Expiry)
evictEntries(); // 4. Execute eviction (W-TinyLFU)
climb(); // 5. Hill Climbing to adjust Window proportion
}
4.3 Concurrency Model Comparison with Guava Cache
| Feature | Guava Cache | Caffeine |
|---|---|---|
| Lock model | Segment locks (ReentrantLock) | CAS + MPSC Buffer + single-thread drain |
| Data storage | Custom Segment[] | JDK8 ConcurrentHashMap |
| Read operations | May wait for lock (during cleanup) | Fully lock-free (CHM.get + buffer.offer) |
| Write operations | Acquires Segment lock | CAS + buffer + async maintenance |
| Eviction execution | Synchronous during write | Async maintenance thread batch execution |
| Throughput | ~33M ops/s | ~100M+ ops/s (~3x) |
4.4 ReadBuffer Internals --- How It Stays Lock-Free
The ReadBuffer absorbs an offer(node) from every get() across many threads, yet never takes a lock. Four cooperating techniques make this work:
| Technique | What it buys |
|---|---|
| Lossy semantics | offer never blocks/retries --- dropping an event is safe (see below) |
| Thread-striped ring buffers | Producers spread across stripes (by thread probe hash) → CAS on different cache lines → near-zero contention. Stripe count grows with detected contention. |
| Per-slot CAS, fixed-size ring | Lock-free MPSC offer, no allocation on the hot path |
| Single-consumer drain (CAS-gated) | Only one maintenance thread drains → the AccessOrderDeques need no locks |
Two distinct drop causes (both silent, both safe):
java
final int offer(E e) {
long head = readCounter;
long tail = writeCounterOpaque;
if ((tail - head) >= maximum) return FULL; // cause 1: buffer full
if (casWriteCounter(tail, tail + 1)) { // claim a slot
buffer.lazySet((int)(tail & mask), e);
return SUCCESS;
}
return FAILED; // cause 2: lost the CAS race
}
- FULL --- stripe at capacity (maintenance hasn't drained yet). Mitigated by draining + dynamic stripe growth.
- FAILED --- two threads on the same stripe raced the same tail slot; the CAS loser drops its event with no retry loop. Mitigated by striping (different threads usually hit different stripes).
A retry loop is deliberately avoided --- it would reintroduce the contention striping exists to remove. Both drops are safe because read events only feed the recency heuristic; a missing hint makes LRU ordering slightly stale, never wrong.
4.5 Read Ordering Under Striping
Striping means there is no global read-operation order --- and Caffeine doesn't need one:
- Cross-stripe order is lost: events from different stripes are drained independently, so the real-time interleaving across threads is not reconstructed.
- Per-stripe FIFO is preserved: each ring buffer drains in insertion order, and a given thread (stable probe hash) usually maps to the same stripe --- so one thread's own reads stay roughly ordered.
- Value correctness is independent of the buffer: `data.get(key)` reads the CHM directly (happens-before via volatile semantics). "Did I read the latest value?" never depends on buffer ordering; the buffer only carries a recency hint.
- Drain order is fixed per pass: `maintenance()` always runs `drainReadBuffer()` before `drainWriteBuffer()` (§4.2).
A global sequence number would force every read to CAS one shared counter --- exactly the serialization point striping eliminates. The eviction policy only needs approximate recency, so relaxed ordering costs a tiny bit of hit-rate accuracy and nothing in correctness.
4.6 ReadBuffer vs Guava's recencyQueue
Same concept, different implementation. Both record which entries were read (a reference/hint, not the value --- the value lives in the CHM / entry table) so LRU reordering can be batched off the hot path.
| Aspect | Guava recencyQueue | Caffeine ReadBuffer |
|---|---|---|
| Scope | one per segment (default 4, set by concurrencyLevel); a key routes to its segment by hash | thread-striped (by thread probe) |
| Structure | unbounded ConcurrentLinkedQueue | bounded fixed-size ring buffer(s) |
| Loss | never drops: grows until drained | lossy: drops on FULL or lost-CAS |
| Drain trigger | any lock-holding op: writes, eviction/cleanup, every 64th read (the DRAIN_THRESHOLD mask), and reads that take the lock (e.g. load-on-miss) | dedicated maintenance task (CAS-gated) |
| Lock interaction | drained under the segment ReentrantLock | single-consumer, no lock on the deque |
| Failure mode | bloat when neither writes nor the read-drain threshold fire often enough (queue grows) | bounded: worst case is dropped hints, not memory growth |
Two clarifications people often get wrong: (1) `recencyQueue` is not one global queue: there is one per segment, each with its own lock and LRU `accessQueue`. (2) It's not drained only on writes: Guava's `postReadCleanup` drains it from a read once every 64 reads (`DRAIN_THRESHOLD` is the mask `0x3F`, so the check is `(readCount & DRAIN_THRESHOLD) == 0` → `tryLock` + drain). Draining always happens under the segment lock, which is the cost Caffeine avoids by using a single lock-free consumer.
Two practical wins for Caffeine: bounded (can't bloat like Guava's queue in read-heavy, write-light traffic) and striped by thread, not segment (hot keys in one segment don't funnel through a single queue + lock).
4.7 When Maintenance Actually Runs (shouldDrainBuffers + scheduleDrainBuffers)
A read records a recency hint but usually does not run maintenance --- work is amortized. Two calls decide when the deferred drain fires:
java
boolean delayable = (readBuffer.offer(node) != Buffer.FULL);
if (shouldDrainBuffers(delayable)) scheduleDrainBuffers();
delayable = "can we defer?" --- true if the buffer accepted the event (no urgency); false if the buffer was FULL (the hint was dropped → draining is now urgent).
shouldDrainBuffers(delayable) consults a drainStatus state machine (CAS-updated):
| State | Meaning | Decision |
|---|---|---|
| IDLE | nothing scheduled/running | drain only if !delayable (buffer full / urgent); otherwise skip and amortize |
| REQUIRED | work pending (e.g. a write flagged it) | drain now |
| PROCESSING_TO_IDLE | a drain is already running | don't start another; a write arriving mid-drain CASes the state to PROCESSING_TO_REQUIRED so the running drain does one more pass |
| PROCESSING_TO_REQUIRED | running + re-run already noted | do nothing |
The key efficiency point: the common hot-path read hits IDLE + delayable → returns false → no maintenance triggered. Events pile up and get batched, keeping reads nearly free.
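The decision column of the table reduces to a small switch. The sketch below is a stand-alone model: the class name is hypothetical, and the numeric state values are assumptions chosen only so that the two "running" states compare greater than the two "not running" states.

```java
final class DrainStatusSketch {
  // Assumed ordering: both PROCESSING_* states sort after IDLE and REQUIRED.
  static final int IDLE = 0;
  static final int REQUIRED = 1;
  static final int PROCESSING_TO_IDLE = 2;
  static final int PROCESSING_TO_REQUIRED = 3;

  /** Returns true when the caller should try to schedule a maintenance run. */
  static boolean shouldDrainBuffers(int drainStatus, boolean delayable) {
    switch (drainStatus) {
      case IDLE:
        return !delayable;           // only when the hint could not be buffered
      case REQUIRED:
        return true;                 // pending work was already flagged
      case PROCESSING_TO_IDLE:
      case PROCESSING_TO_REQUIRED:
        return false;                // a drain is running; never start a second one
      default:
        throw new IllegalStateException("Invalid drain status: " + drainStatus);
    }
  }
}
```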
scheduleDrainBuffers() launches the drain at most once, without blocking:
java
void scheduleDrainBuffers() {
if (drainStatus() >= PROCESSING_TO_IDLE) return; // already running
if (evictionLock.tryLock()) { // tryLock --- never blocks the caller
try {
if (drainStatus() >= PROCESSING_TO_IDLE) return; // re-check under lock
setDrainStatusRelease(PROCESSING_TO_IDLE);
executor.execute(drainBuffersTask); // run maintenance() async
} finally { evictionLock.unlock(); }
}
// tryLock failed → another thread is already scheduling → just return
}
- `tryLock()` not `lock()`: the hot path never stalls; if someone else is scheduling, this thread returns immediately.
- Single-flight: the double `>= PROCESSING_TO_IDLE` checks ensure only one drain runs at a time. This is why the `AccessOrderDeque`s need no fine-grained locking of their own: exactly one thread mutates them at a time.
- Async hand-off: `executor.execute(...)` runs `maintenance()` (drain read → drain write → expire → evict → climb, §4.2) off the calling thread (default `ForkJoinPool.commonPool()`).
End-to-end on a read: IDLE + buffer-not-full → no drain (amortize); buffer FULL or REQUIRED → one async drain; concurrent triggers collapse into a single running drain (with a re-run flag if work arrives mid-drain).
4.8 Ring Buffer Counter Mechanics
Each ReadBuffer stripe is a ring buffer: a fixed-size array plus two monotonically increasing 64-bit counters, write (producers) and read (the maintenance consumer). The physical slot is `counter & mask` where `mask = capacity - 1` (capacity is a power of two, so the AND is a cheap modulo). The WriteBuffer (`MpscGrowableArrayQueue`) indexes its array chunks with the same counter-and-mask scheme, but it can grow up to a configured maximum instead of staying fixed.
capacity = 8, mask = 7
slots: [0][1][2][3][4][5][6][7]
counters are absolute and only ever climb; slot = counter & 7
write++ → 8 → slot (8 & 7) = 0 ← only the SLOT INDEX wraps, never the counter
Who advances which counter (mind the terminology inversion):
| Action | Counter advanced | Why |
|---|---|---|
| cache get() → offer(node) into the buffer | write ↑ | the get is a producer into the ring |
| maintenance drains / consumes a node | read ↑ | maintenance is the consumer |
A cache read advances the ring's write counter (it produces a hint into the ring); the ring's read counter belongs to the maintenance consumer. So get → write++, drain → read++.
The fullness invariant --- gap = write − read = number of unconsumed entries, capped at capacity:
java
if ((tail - head) >= maximum) return FULL; // maximum = ring capacity
This guarantees the producer can never overwrite an unread slot. Write can only wrap onto slot 0 again once read has advanced past it:
read=8 write=8 gap=0 empty
get → produce: read=8 write=9 gap=1 (node at slot 9&7 = 1)
get → produce: read=8 write=10 gap=2
maintenance drains: read=10 write=10 gap=0 (read catches up to write → room restored)
If read=0, write=8 the check 8 - 0 >= 8 is true → FULL → the event is dropped (ReadBuffer) rather than clobbering slot 0's still-unread entry. For the ReadBuffer, FULL also flags !delayable and triggers a drain to advance read (§4.7).
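The counter mechanics above fit in a few lines of JDK-only code. This is a single-stripe sketch, not Caffeine's `BoundedBuffer`: the class name, the use of `AtomicLong`/`AtomicReferenceArray` instead of VarHandles, and the result constants are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

final class RingSketch<E> {
  static final int SUCCESS = 0, FAILED = -1, FULL = 1; // result codes for this sketch

  private final AtomicReferenceArray<E> slots;
  private final AtomicLong writeCounter = new AtomicLong(); // advanced by producers
  private final AtomicLong readCounter = new AtomicLong();  // advanced by the single consumer
  private final int capacity;
  private final int mask;

  RingSketch(int capacity) { // capacity must be a power of two
    this.capacity = capacity;
    this.mask = capacity - 1;
    this.slots = new AtomicReferenceArray<>(capacity);
  }

  /** Lossy multi-producer offer: never blocks and never retries. */
  int offer(E e) {
    long head = readCounter.get();
    long tail = writeCounter.get();
    if (tail - head >= capacity) return FULL;        // gap check protects unread slots
    if (writeCounter.compareAndSet(tail, tail + 1)) {
      slots.lazySet((int) (tail & mask), e);         // only the slot index wraps
      return SUCCESS;
    }
    return FAILED;                                    // lost the race: drop the hint
  }

  /** Single-consumer drain; returns the number of consumed entries. */
  int drain() {
    long head = readCounter.get();
    long tail = writeCounter.get();
    int consumed = 0;
    while (head != tail) {
      int index = (int) (head & mask);
      if (slots.get(index) == null) break;           // slot claimed but not yet published
      slots.lazySet(index, null);
      head++;
      consumed++;
    }
    readCounter.lazySet(head);
    return consumed;
  }

  long writeCount() {
    return writeCounter.get();
  }
}
```

With capacity 8, eight offers succeed, the ninth returns `FULL`, and after a drain the next offer lands in slot `8 & 7 = 0` while the write counter itself keeps climbing to 9.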
Two different "maximums":
| "Maximum" | Bounds | Value |
|---|---|---|
| maximum in offer | the gap write − read (unconsumed entries) | ring capacity: small, fixed (e.g. ~16/stripe) |
| counter ceiling | the read / write counters themselves | Long.MAX_VALUE: never reached (~centuries at 1B ops/s) |
Counter overflow is a non-issue: the fullness check compares the delta tail - head (two's-complement subtraction), not absolute values, so it stays correct even across a wraparound --- the same trick System.nanoTime() comparisons use.
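A quick way to convince yourself of the wraparound claim is to place the counters on either side of `Long.MAX_VALUE`. The class below is a hypothetical helper written for this check only.

```java
final class CounterWraparound {
  /** Number of unconsumed entries, computed as a two's-complement delta. */
  static long gap(long head, long tail) {
    return tail - head;
  }

  public static void main(String[] args) {
    long head = Long.MAX_VALUE - 1;
    long tail = head + 3;                       // overflows to a negative value
    System.out.println(tail < head);            // prints "true": absolute comparison is misleading
    System.out.println(gap(head, tail));        // prints "3": the delta is still correct
  }
}
```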
FULL is per-stripe, not global (ReadBuffer): each afterRead resolves to one stripe via the thread's probe hash and calls offer on that stripe only. return FULL means that one stripe is full --- other stripes may be empty. A single stripe's FULL is enough to drop the hint and schedule a drain, which then flushes all stripes.
What grows the stripe count: contention, not fullness. BoundedBuffer is a StripedBuffer (same pattern as the JDK's Striped64/LongAdder). It adds stripes only when a producer loses the CAS on its slot --- the FAILED outcome --- capped at 4 × ceilingPowerOfTwo(NCPU). A FULL result does not grow the table; it just drops the hint. Each individual ring stays a fixed size (16) --- Caffeine adds more rings (to spread contending producers across distinct cache lines), it never enlarges one ring.
| Stripe offer result | Meaning | Grows stripe count? |
|---|---|---|
| SUCCESS | enqueued | No |
| FULL | this stripe's ring is at capacity | No: drops the hint |
| FAILED | lost the CAS race (contention) | Yes |
5. Expiration Strategies
5.1 Timer Wheel (TimerWheel)
java
// Caffeine uses a hierarchical timer wheel for variable (per-entry) expiration
// O(1) insertion and batch expiration; fixed TTLs use time-ordered deques instead
final class TimerWheel<K, V> {
// 5-tier timer wheel, bucket sizes increase per tier
// Approximately: 1.07s → 1.14min → 1.22hr → 1.63day → 6.5day
Node<K, V>[][] wheel;
long[] nanos; // Current time pointer per tier
// Schedule expiration: O(1) insertion
void schedule(Node<K, V> node) {
long delay = node.getExpirationTime() - nanos[0];
int tier = findTier(delay);
int bucket = findBucket(tier, node.getExpirationTime());
link(wheel[tier][bucket], node); // Insert into bucket linked list
}
// Advance time: cascade entries from coarser (higher) tiers down to finer (lower) tiers
void advance(long currentTime) { /* ... */ }
}
Principle: a timer wheel schedules huge numbers of timeouts with O(1) insert, remove, and expire, beating sorted structures (O(log n)) and a full scan of all entries (O(n)). Caffeine uses it for variable per-entry expiration (`Expiry`, §5.2); the fixed `expireAfterWrite` / `expireAfterAccess` policies only need time-ordered deques, which is also how Guava handles its fixed durations.
- Clock-with-buckets: a circular array of buckets, each a doubly-linked list of entries; a time pointer sweeps around it. `bucket = (expirationTime / tickDuration) mod wheelSize`.
  - Insert → compute index, append to that bucket → O(1).
  - Remove → unlink (entries hold their own prev/next) → O(1).
  - Expire → when the pointer reaches a bucket, everything in it is due; no scanning of other buckets, no sorting.
- Hierarchical (5 levels): one wheel can't cover both "1s" and "6 days" at fine resolution, so Caffeine nests wheels at coarser granularities (like a clock's second/minute/hour hands, or an odometer): ~1.07s → ~1.14min → ~1.22hr → ~1.63day → ~6.5day per bucket. An entry goes in the level matching its distance in the future, keeping every wheel small while covering a vast range.
- Cascading (re-bucketing): a coarse bucket only knows "expires sometime this ~hour," not the exact second, so entries don't fire from coarse levels directly. As the pointer advances and a higher-level bucket comes due, its entries are redistributed down into the finer level that can now place them precisely. An entry trickles from its starting level down toward Level 0 over its life (at most 4→3→2→1→0), each move O(1), only a few moves total. This is the trade-off vs. a single fine wheel: a little re-bucketing work in exchange for bounded memory + fine resolution across a huge range.
- Lazily advanced inside maintenance: there is no dedicated ticking thread. `expireEntries()` calls `timerWheel.advance(now)`, sweeping the pointer from its last position to now, cascading coarse→fine and firing due Level-0 buckets. It piggybacks on the single-threaded, batched maintenance pass (§4.2). The optional `Scheduler` (§9) just nudges a maintenance run near the next deadline so entries still expire with no traffic (otherwise `advance` runs only when something else triggers maintenance).
| Approach | Insert | Find/expire next | Drawback |
|---|---|---|---|
| Sorted list / PriorityQueue | O(log n) | O(log n) | log-factor per op; reorder on every change |
| Linear scan of all entries | O(1) | O(n) scan | must scan all entries to find expired ones |
| Time-ordered queue (Guava; Caffeine for fixed TTLs) | O(1) | O(1) | only works when every entry shares one duration |
| Hierarchical timer wheel | O(1) | O(1) amortized | small re-bucketing cost on cascade (cheap) |
O(1) scheduling + O(1) "what expired now" is what makes Caffeine's per-entry TTL (Expiry, §5.2) practical at scale, something Guava's single-duration queues cannot express.
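The level and bucket arithmetic can be sketched with shifts and masks. The per-bucket spans below are the powers of two (in nanoseconds) that match the approximate figures quoted above (2^30 ns ≈ 1.07 s, 2^36 ≈ 1.14 min, 2^42 ≈ 1.22 hr, 2^47 ≈ 1.63 day, 2^49 ≈ 6.5 day); the bucket counts per level and all names are assumptions of this sketch, chosen so that each level spans exactly one bucket of the next.

```java
final class WheelIndexSketch {
  // log2 of each level's tick duration in nanoseconds.
  static final int[] SHIFT = {30, 36, 42, 47, 49};
  // Assumed bucket counts (powers of two): 64 * 2^30 = 2^36, 64 * 2^36 = 2^42, and so on.
  static final int[] BUCKETS = {64, 64, 32, 4, 1};

  /** Picks the finest level whose wheel still covers the given delay. */
  static int findLevel(long delayNanos) {
    for (int level = 0; level < SHIFT.length - 1; level++) {
      if (delayNanos < (1L << SHIFT[level + 1])) return level;
    }
    return SHIFT.length - 1; // everything farther out lands in the coarsest level
  }

  /** bucket = (expirationTime / tickDuration) mod wheelSize, done with a shift and a mask. */
  static int findBucket(int level, long expirationNanos) {
    long ticks = expirationNanos >>> SHIFT[level];
    return (int) (ticks & (BUCKETS[level] - 1));
  }

  public static void main(String[] args) {
    long tenMinutes = 600_000_000_000L;
    System.out.println(findLevel(tenMinutes)); // prints "1": between ~1.14 min and ~1.22 hr
  }
}
```

Because every tick duration and bucket count is a power of two, neither the division nor the modulo needs an actual divide instruction.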
5.2 Expiration Types
java
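// Fixed policies: one duration for every entry (these two may be combined)
Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10))   // TTL after write
    .expireAfterAccess(Duration.ofMinutes(5));  // TTL after last access

// Variable policy: per-entry TTL. Use a separate builder, because expireAfter(Expiry)
// cannot be combined with expireAfterWrite / expireAfterAccess (IllegalStateException).
Caffeine.newBuilder()
    .expireAfter(new Expiry<K, V>() {
      @Override public long expireAfterCreate(K k, V v, long now) {
        return v.getTtl().toNanos(); // Independent TTL per entry (getTtl() is a hypothetical accessor)
      }
      @Override public long expireAfterUpdate(K k, V v, long now, long currentDuration) {
        return currentDuration; // Update doesn't change the remaining lifetime
      }
      @Override public long expireAfterRead(K k, V v, long now, long currentDuration) {
        return currentDuration; // Read doesn't change the remaining lifetime
      }
    });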
Key feature : Expiry supports per-entry TTL, which Guava Cache cannot do.
6. Async Support
java
// AsyncLoadingCache
AsyncLoadingCache<String, Widget> cache = Caffeine.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(Duration.ofMinutes(10))
.buildAsync(key -> loadFromDB(key)); // CacheLoader, executed asynchronously on the cache's executor
CompletableFuture<Widget> future = cache.get("key");
// AsyncCacheLoader: build the future yourself on the executor the cache passes in
AsyncLoadingCache<String, Widget> viaExecutor = Caffeine.newBuilder()
.buildAsync((key, executor) ->
CompletableFuture.supplyAsync(() -> loadFromDB(key), executor));
// Manual AsyncCache (without loader)
AsyncCache<String, Widget> manual = Caffeine.newBuilder().buildAsync();
CompletableFuture<Widget> f = manual.get("key", (k, exec) -> ...);
Semantics:
- get() on an async cache returns a CompletableFuture; other threads requesting the same key get the same future while it is loading (prevents cache stampede).
- On load failure, the future is automatically removed from the cache (the next request retries loading).
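Both semantics can be reproduced with nothing but a ConcurrentHashMap of futures. The sketch below is a simplified model under stated assumptions, not Caffeine's implementation: `FutureCacheSketch` and its `get`/`contains` methods are hypothetical names, and there is no eviction or expiration.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Minimal model of an async cache: the map stores futures, not values. */
final class FutureCacheSketch<K, V> {
  private final ConcurrentHashMap<K, CompletableFuture<V>> futures = new ConcurrentHashMap<>();

  CompletableFuture<V> get(K key, Function<K, CompletableFuture<V>> loader) {
    boolean[] created = {false};
    // computeIfAbsent is atomic per key, so concurrent callers share one in-flight future.
    CompletableFuture<V> future = futures.computeIfAbsent(key, k -> {
      created[0] = true;
      return Objects.requireNonNull(loader.apply(k), "loader returned a null future");
    });
    if (created[0]) {
      // Drop failed (or null-valued) loads so the next request triggers a fresh load.
      future.whenComplete((value, error) -> {
        if (error != null || value == null) {
          futures.remove(key, future);
        }
      });
    }
    return future;
  }

  boolean contains(K key) {
    return futures.containsKey(key);
  }

  public static void main(String[] args) {
    FutureCacheSketch<String, String> cache = new FutureCacheSketch<>();
    CompletableFuture<String> slow = new CompletableFuture<>();
    CompletableFuture<String> first = cache.get("key", k -> slow);
    CompletableFuture<String> second = cache.get("key", k -> new CompletableFuture<>());
    System.out.println(first == second);       // same in-flight future is shared
    slow.completeExceptionally(new IllegalStateException("load failed"));
    System.out.println(cache.contains("key")); // failed load was removed
  }
}
```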
7. Adaptive Adjustment (Hill Climbing)
7.1 The Problem It Solves
The W-TinyLFU split between Window (LRU) and Main (SLRU, frequency-gated) has no single "right" ratio:
- A recency-heavy / churny workload (lots of newly-popular keys) wants a bigger Window, so fresh items survive long enough to build frequency before being judged by the admission filter.
- A frequency-heavy / scan-resistant workload (stable hot set + cold scans) wants a smaller Window, so the LFU main region dominates and cold one-shot keys get rejected fast.
Rather than forcing you to tune this, Caffeine treats the Window-vs-Main ratio as a knob it optimizes online, using hit rate as the objective function. That is hill climbing: nudge the knob, measure whether hit rate improved, then keep moving in the direction that helped.
7.2 The Algorithm
The climb() step runs at the end of each maintenance() pass (§4.2), but the actual adjustment fires only once per sample interval, not on every maintenance run.
java
// Window and Main proportions dynamically adjusted (Hill Climbing algorithm)
// Goal: maximize hit rate
// Range: Window proportion between 0.2% - 80%
void climb() {
double hitRate = currentHitRate();
double delta = hitRate - previousHitRate; // empirical slope of the "hill"
if (delta >= 0) {
// Correct direction (hit rate did not drop after the last adjustment, whichever way it moved)
stepSize *= 2; // accelerate --- climb faster
} else {
// Wrong direction (hit rate got worse) --- overshot or went the wrong way
stepSize /= 2; // brake (this is what makes it converge)
stepSize = -stepSize; // reverse
}
adjustWindowSize(stepSize);
previousHitRate = hitRate;
}
Step by step:
- Sampling window. Caffeine accumulates requests until it has seen roughly 10 × maximumSize of them (same scale as the FrequencySketch sample). Hit rate over too few requests is noisy, so adjustments are batched.
- Measure the gradient. delta = currentHitRate − previousHitRate is the empirical derivative of hit rate with respect to the last size change: the slope of the hill.
- Decide direction & step:
  - Hit rate improved → we're climbing → keep the direction and double the step to climb faster.
  - Hit rate worsened → halve the step and flip direction. Shrinking the step is what makes the search converge instead of oscillating forever.
- Apply the move. adjustWindowSize shifts capacity between the Window and Main's Protected region. Growing the window steals slots from Protected; shrinking it gives them back. Entries aren't discarded: the deque boundaries move and normal eviction reconciles sizes on later passes.
- Clamp. Window is bounded to 0.2% to 80% of total capacity: it never collapses to zero (always keep a recency buffer for bursts) and never fully starves the frequency-based main region.
7.3 Production Refinements (Beyond the Pseudocode)
The climb() above is the conceptual model. The real HillClimberAdaptiveScheme adds:
- Adaptive restart on convergence. When stepSize decays below a tiny threshold, the optimizer has "settled." Caffeine periodically resets the step back up so it can re-explore if the workload shifts later; otherwise a cache that converged at hour 1 could never adapt to a different access pattern at hour 5.
- Step size proportional to capacity. The initial step (hillClimberStepPercent, default ~6.25%) is a fraction of maximumSize and shrinks by a fixed decay rate (hillClimberStepDecayRate, ~0.98) per sample, so behavior is consistent for a 1K cache and a 1M cache.
- Restart on large hit-rate movement. A big swing in hit rate (a workload regime change) can trigger re-exploration rather than waiting for the slow periodic reset.
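To see how a decaying step settles on the peak, here is a small simulation of the decay-based variant. It is a sketch under loud assumptions: the hit-rate curve `hitRate(window)` is invented (a parabola peaking at a 30% window), the window is expressed as a fraction of capacity, and `HillClimbSketch` is a hypothetical name. Only the 6.25% initial step, the 0.98 decay, and the 0.2% to 80% clamp come from the text.

```java
/** Simulates step-decay hill climbing against a made-up unimodal hit-rate curve. */
final class HillClimbSketch {
  static final double STEP_PERCENT = 0.0625; // initial step, from the text
  static final double DECAY = 0.98;          // per-sample decay, from the text
  static final double MIN_WINDOW = 0.002;    // 0.2% lower clamp
  static final double MAX_WINDOW = 0.80;     // 80% upper clamp

  /** Hypothetical workload: hit rate peaks when the window is 30% of capacity. */
  static double hitRate(double window) {
    double distance = window - 0.30;
    return 0.80 - distance * distance;
  }

  /** Runs the given number of sample intervals and returns the final window fraction. */
  static double climb(int samples) {
    double window = 0.01;                    // start at the default 1% window
    double step = STEP_PERCENT;              // signed: positive grows the window
    double previous = hitRate(window);
    for (int i = 0; i < samples; i++) {
      window = Math.max(MIN_WINDOW, Math.min(MAX_WINDOW, window + step));
      double current = hitRate(window);
      if (current < previous) {
        step = -step;                        // the last move hurt: reverse direction
      }
      step *= DECAY;                         // shrink the step so the search settles
      previous = current;
    }
    return window;
  }

  public static void main(String[] args) {
    System.out.printf("window after 500 samples: %.4f%n", climb(500));
  }
}
```

Because the step shrinks geometrically, the window ends up oscillating in an ever tighter band around the peak; a restart rule would reset `step` to `STEP_PERCENT` when the measured hit rate jumps.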
7.4 Why Hill Climbing Specifically
Hit rate as a function of window ratio is, empirically, roughly unimodal (one broad peak) for most real workloads --- the ideal shape for hill climbing, since a local optimum is the global optimum. A cheap gradient-follow finds it without expensive global search, costing essentially two ALU ops per sample plus a counter read (negligible against the maintenance work already happening).
The trade-off: on a multi-modal or rapidly-thrashing workload, hill climbing can sit in a local optimum or chase a moving target. The periodic restart mitigates this but doesn't fully eliminate it --- that's the theoretical weakness.
7.5 Convergence Behavior
| Workload | Window trend | Reason |
|---|---|---|
| Scan / loop over cold keys | → small (toward 0.2%) | One-shot keys never build frequency; a big window just delays evicting garbage. LFU main region dominates → scan resistance. |
| High new-hotspot churn (trending content) | → large (toward 80%) | Fresh items need recency runway to accumulate hits before the admission filter judges them. LRU-ish behavior wins. |
| Stable hot set (Zipfian, fixed popularity) | → mid/small | Frequency sketch already separates hot from cold; the window just absorbs the occasional burst. |
| Mixed / shifting | oscillates then re-converges | Restart logic lets it track regime changes. |
8. Usage Examples
java
// Synchronous LoadingCache
LoadingCache<String, Widget> cache = Caffeine.newBuilder()
.maximumSize(10_000)
// NOTE: maximumSize and maximumWeight are MUTUALLY EXCLUSIVE --- shown together
// here only to list the options. Pick ONE (see §8.1). Using both throws.
.maximumWeight(1_000_000)
.weigher((Weigher<String, Widget>) (k, v) -> v.size())
.expireAfterWrite(Duration.ofMinutes(10))
.expireAfterAccess(Duration.ofMinutes(5))
.refreshAfterWrite(Duration.ofMinutes(1))
.scheduler(Scheduler.systemScheduler()) // Proactive expiration cleanup
.executor(Executors.newFixedThreadPool(4)) // Maintenance thread pool
.recordStats()
.evictionListener((k, v, cause) -> // Eviction (size/expired)
log.info("Evicted {} cause={}", k, cause))
.removalListener((k, v, cause) -> // All removals (including explicit)
log.info("Removed {} cause={}", k, cause))
.build(key -> loadFromDB(key));
// Manual Cache
Cache<String, Widget> manual = Caffeine.newBuilder()
.maximumSize(10_000)
.build();
manual.get("key", k -> loadFromDB(k)); // computeIfAbsent semantics
manual.put("key", widget);
manual.invalidate("key");
manual.policy().eviction().ifPresent(e -> e.setMaximum(20_000)); // Runtime adjustment
// Statistics
CacheStats stats = cache.stats();
stats.hitRate();
stats.evictionCount();
stats.averageLoadPenalty();
8.1 Sizing: maximumSize vs maximumWeight
Two ways to bound a cache --- use exactly one (configuring both throws at build time):
| Option | Bounds by | When to use |
|---|---|---|
| maximumSize(n) | entry count: at most n entries | Entries are roughly uniform in cost. The simple, common default. |
| maximumWeight(w) + weigher | summed weight of all entries | Entry sizes vary a lot; you want to cap total footprint (bytes, item count, cost score), not count. |
maximumWeight requires a weigher (and vice versa) --- the weigher assigns each entry's cost, and W-TinyLFU evicts once the sum of weights exceeds the budget:
java
LoadingCache<String, Widget> cache = Caffeine.newBuilder()
.maximumWeight(1_000_000) // total weight budget (no maximumSize!)
.weigher((String k, Widget v) -> v.sizeInBytes()) // int >= 0, cost of one entry
.build(key -> loadFromDB(key));
Rules and gotchas:
- Mutually exclusive with maximumSize: pick one.
- Weigher returns an int, must be ≥ 0. Returning 0 means the entry never counts toward the limit (effectively unbounded for that entry; use deliberately).
- Weight is static: computed once at insert (and on update), never re-read. Don't weigh a mutable size; if v.sizeInBytes() changes later, the cache won't notice.
- Keep the weigher fast and side-effect-free: it runs on the write path. Prefer a precomputed size field over a deep traversal.
- An entry heavier than the maximum is admitted but immediately evictable, so set the budget above your largest expected entry.
- Weight ≠ exact bytes: it's any unit you choose (estimated bytes, collection element count, relative cost).
Runtime adjustment via the policy API (works for both modes):
java
cache.policy().eviction().ifPresent(e -> {
long limit = e.getMaximum(); // current size/weight limit
e.setMaximum(2_000_000); // resize live
long used = e.weightedSize().orElse(0); // current total weight (weight mode)
});
evictionListener vs removalListener
| Listener | Trigger Timing | Execution Thread |
|---|---|---|
| evictionListener | Only SIZE / EXPIRED / COLLECTED (not user-initiated) | Same thread as maintenance task (synchronous) |
| removalListener | All removals (including EXPLICIT / REPLACED) | Asynchronous (specified executor) |
Recommendations:
- Need immediate response to eviction (e.g., cleanup associated resources) → evictionListener
- Need logging, event publishing → removalListener (async, doesn't block maintenance)
The five RemovalCauses --- split by who caused the removal (RemovalCause.wasEvicted() returns true for the first three):
| Cause | What happened | Automatic? | evictionListener | removalListener |
|---|---|---|---|---|
| SIZE | policy evicted a victim (over maximumSize/maximumWeight) | ✅ | ✅ | ✅ |
| EXPIRED | TTL passed (expireAfter* / Expiry) | ✅ | ✅ | ✅ |
| COLLECTED | key/value GC'd (weakKeys / weakValues / softValues) | ✅ | ✅ | ✅ |
| EXPLICIT | you called invalidate / remove | ❌ | ❌ | ✅ |
| REPLACED | you overwrote the value (put / replace / updating compute) | ❌ | ❌ | ✅ |
evictionListener fires only for automatic removals --- the ones Caffeine itself decides. It skips EXPLICIT/REPLACED because the caller already knows about those.
Who invokes evictionListener, and on which thread
Automatic removals are usually discovered inside maintenance() (evictEntries() / expireEntries(), §4.2). The callback runs inline, synchronously, while the eviction lock is held --- not handed off to an executor. So yes, the maintenance task invokes evictionListener directly.
The precise guarantee is: synchronous, under the eviction lock, on whatever thread performed the removal --- usually the maintenance thread, but it can also be a user thread when eviction/expiration is detected on the request path (e.g. a get hitting an expired entry, or a put that triggers synchronous eviction via the afterWrite lock fallback, §4.1).
maintenance() (or a user thread under the eviction lock)
evictEntries() / expireEntries() picks an entry
→ unlink from deque + remove from CHM
→ evictionListener.onRemoval(k, v, cause) ← inline, blocking, under lock
→ removalListener → dispatched async to the Executor (not inline)
⚠️ Because it runs inside maintenance under the lock, a slow or throwing evictionListener stalls maintenance --- blocking further eviction, expiration, and buffer drains. Keep it fast and non-throwing; put heavy work (I/O, logging, event publishing) in removalListener so it runs off the maintenance thread.
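The inline-versus-handed-off distinction can be modeled in a few lines. This is a sketch, not Caffeine's code: `ListenerDispatchSketch`, its local `Cause` enum, and the `Listener` interface are stand-ins invented for the example, and the eviction lock is omitted.

```java
import java.util.concurrent.Executor;

/** Models how a removal is reported: eviction listener inline, removal listener via an executor. */
final class ListenerDispatchSketch {
  /** Local stand-in for the five removal causes described above. */
  enum Cause {
    SIZE, EXPIRED, COLLECTED, EXPLICIT, REPLACED;

    boolean wasEvicted() {
      return this == SIZE || this == EXPIRED || this == COLLECTED;
    }
  }

  interface Listener {
    void onRemoval(String key, Cause cause);
  }

  private final Listener evictionListener;
  private final Listener removalListener;
  private final Executor executor;

  ListenerDispatchSketch(Listener evictionListener, Listener removalListener, Executor executor) {
    this.evictionListener = evictionListener;
    this.removalListener = removalListener;
    this.executor = executor;
  }

  void notifyRemoval(String key, Cause cause) {
    if (cause.wasEvicted()) {
      // Runs on the calling (maintenance) thread: a slow listener stalls the caller.
      evictionListener.onRemoval(key, cause);
    }
    // Handed off: the caller continues without waiting for this listener.
    executor.execute(() -> removalListener.onRemoval(key, cause));
  }

  public static void main(String[] args) {
    ListenerDispatchSketch dispatch = new ListenerDispatchSketch(
        (key, cause) -> System.out.println("inline eviction: " + key + " " + cause),
        (key, cause) -> System.out.println("async removal: " + key + " " + cause),
        Runnable::run); // same-thread executor, only to keep the demo deterministic
    dispatch.notifyRemoval("a", Cause.SIZE);
    dispatch.notifyRemoval("b", Cause.EXPLICIT);
  }
}
```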
9. Scheduler (Proactive Expiration Cleanup)
java
// Solves Guava's lazy expiration problem (expired entries not cleaned without access)
LoadingCache<K, V> cache = Caffeine.newBuilder()
.expireAfterWrite(Duration.ofMinutes(10))
.scheduler(Scheduler.systemScheduler()) // Recommended for JDK 9+
// Or supply a custom ScheduledExecutorService instead (scheduler() can only be set once):
// .scheduler(Scheduler.forScheduledExecutorService(scheduledExecutor))
.build(loader);
Problem without Scheduler: If the cache sees no read/write operations for a long time, expired entries are not cleaned up and keep occupying heap memory.
With Scheduler: Caffeine proactively schedules a maintenance run (the drainBuffers task) for the nearest expiration time, so expired entries are cleaned promptly.
9.1 How It Works --- Event-Driven, Not Polling
The Scheduler does not periodically scan the cache for expired entries. It schedules one targeted wake-up at the next known deadline, which the timer wheel (§5.1) already tracks. After firing it cleans what's due and re-arms for the new nearest deadline, forming a self-rescheduling, one-timer-at-a-time loop:
1. entry written → scheduled in the timer wheel at its deadline
2. maintenance ends → ask the wheel "when is the next expiration?" → time T
3. Scheduler.schedule(executor, drainBuffersTask, T - now, NANOS) ← single delayed task, no loop
4. at T → task fires → maintenance() → expireEntries() advances the wheel
→ due entries removed, evictionListener fired (EXPIRED)
5. maintenance ends → next deadline T' → schedule again
So it is event-driven and self-rescheduling, never a fixed-interval poll.
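The loop above can be modeled with a simulated clock. In this sketch (assumptions: `RearmingSchedulerSketch` is a hypothetical class, a TreeMap stands in for the timer wheel, and `fire(now)` is called by the test harness instead of a real timer thread) there is never more than one armed wake-up, and it always points at the nearest deadline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.OptionalLong;
import java.util.TreeMap;

/** One-timer-at-a-time scheduling: arm for the nearest deadline, fire, clean, re-arm. */
final class RearmingSchedulerSketch {
  // Stand-in for the timer wheel: deadline -> keys expiring at that time.
  private final TreeMap<Long, List<String>> deadlines = new TreeMap<>();
  private final List<String> expired = new ArrayList<>();
  private OptionalLong armedAt = OptionalLong.empty(); // the single pending wake-up

  void write(String key, long deadline) {
    deadlines.computeIfAbsent(deadline, d -> new ArrayList<>()).add(key);
    rearm(); // end of "maintenance": ask for the next expiration
  }

  /** Invoked when the armed wake-up fires at time now. */
  void fire(long now) {
    NavigableMap<Long, List<String>> due = deadlines.headMap(now, true);
    due.values().forEach(expired::addAll); // expire everything that is due
    due.clear();                           // the view clears the backing map too
    rearm();                               // schedule the next single wake-up
  }

  private void rearm() {
    armedAt = deadlines.isEmpty()
        ? OptionalLong.empty()
        : OptionalLong.of(deadlines.firstKey());
  }

  OptionalLong armedAt() {
    return armedAt;
  }

  List<String> expired() {
    return expired;
  }

  public static void main(String[] args) {
    RearmingSchedulerSketch scheduler = new RearmingSchedulerSketch();
    scheduler.write("a", 100);
    scheduler.write("b", 50);
    System.out.println(scheduler.armedAt()); // armed for the nearest deadline
    scheduler.fire(50);
    System.out.println(scheduler.expired() + " next=" + scheduler.armedAt());
  }
}
```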
9.2 What the Scheduler Triggers vs. Runs
The scheduled task only triggers a maintenance pass; the actual expiry work still runs on the cache's Executor (default ForkJoinPool.commonPool), preserving the single-threaded maintenance model. The Scheduler is just the alarm clock --- maintenance() does the work.
- Scheduler.systemScheduler() (JDK 9+, recommended): backed by a single shared system timer thread, so it's cheap (no per-cache thread).
- Scheduler.forScheduledExecutorService(ses): supply your own.
9.3 Scope and Caveats
- Time-based expiration only. The Scheduler matters only with expireAfterWrite / expireAfterAccess / Expiry. Size-based eviction (maximumSize) is always handled inline during writes/maintenance and never needs a scheduled wake-up.
- Tick granularity. Wake-ups ride on the timer wheel's resolution (finest tier ≈ 1.07s), so cleanup is prompt but not nanosecond-exact.
- Correctness vs. promptness. Lazy on-access checking already guarantees an expired entry is never served. The Scheduler only adds prompt reclamation when idle: it frees heap sooner and fires EXPIRED listeners near the real deadline. If you only care that stale data isn't served, you don't strictly need it.
10. Migration from Guava Cache
java
// Guava
LoadingCache<K, V> guava = CacheBuilder.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(10, TimeUnit.MINUTES)
.build(new CacheLoader<K, V>() {
public V load(K key) { return fetch(key); }
});
// Caffeine (nearly identical API)
LoadingCache<K, V> caffeine = Caffeine.newBuilder()
.maximumSize(10_000)
.expireAfterWrite(Duration.ofMinutes(10))
.build(key -> fetch(key));
Key API differences:
- CacheLoader → supports Function / AsyncCacheLoader
- Time parameters use Duration
- New Expiry interface (per-entry TTL)
- New AsyncLoadingCache / AsyncCache
- New Scheduler (proactive expiration)
- New evictionListener
- refreshAfterWrite is truly async by default (Guava defaults to synchronous)
- No concurrencyLevel setting (lock-free design)
⚠️ Migration pitfalls:
- Guava's RemovalListener corresponds to Caffeine's removalListener, but Caffeine executes it asynchronously, so behavior differs.
- Guava doesn't allow CacheLoader.load() to return null (throws exception); Caffeine allows returning null to mean "no such value" (not cached).
11. Performance Comparison
| Scenario | Guava Cache | Caffeine | Improvement |
|---|---|---|---|
| Read (8 threads) | ~33M ops/s | ~100M ops/s | ~3x |
| Write (8 threads) | ~12M ops/s | ~45M ops/s | ~3.7x |
| Hit rate (Zipf distribution) | ~65% (LRU) | ~78% (W-TinyLFU) | +13 points |
| Hit rate (scan loop) | ~0% (LRU) | ~45% (W-TinyLFU) | Significant |
| Hit rate (search workload) | ~35% | ~48% | +13 points |
Data from Caffeine benchmarks.
12. Common Pitfalls
12.1 refreshAfterWrite does NOT mean automatic background refresh
java
// Common misconception: "refreshAfterWrite(1min) will auto-refresh in background every minute"
// Reality: Refresh is only triggered when **that key is accessed**
Caffeine.newBuilder()
.refreshAfterWrite(Duration.ofMinutes(1))
.build(loader);
// If a key is not accessed for a long time, it won't be refreshed
// For true background refresh, you need to schedule it yourself
12.2 Relationship between Scheduler and expireAfter
java
// Only expireAfterWrite configured, no Scheduler
Caffeine.newBuilder()
.expireAfterWrite(Duration.ofMinutes(10))
.build();
// If cache has no access for 1 hour, expired entries still occupy memory → memory leak risk
// Configure Scheduler to ensure expired entries are cleaned promptly
Caffeine.newBuilder()
.expireAfterWrite(Duration.ofMinutes(10))
.scheduler(Scheduler.systemScheduler())
.build();
12.3 evictionListener blocking affects maintenance task
java
// ❌ evictionListener runs on same thread as maintenance, blocking = delays all eviction
.evictionListener((k, v, cause) -> {
httpClient.notify(k); // Network call, potentially slow
})
// ✅ Put complex logic in removalListener (async by default)
.removalListener((k, v, cause) -> httpClient.notify(k))
12.4 buildAsync() loader cannot return null future
java
// AsyncLoadingCache requires loader to return non-null CompletableFuture
// The value inside the future can be null, but the future itself cannot be null
Caffeine.newBuilder().buildAsync((key, executor) -> null); // ❌ NPE once the loader is invoked (on get), not at build time
// Correct: return CompletableFuture.completedFuture(null) to mean "no such value"
Caffeine.newBuilder().buildAsync((key, executor) ->
CompletableFuture.completedFuture(null));
12.5 maximumWeight must be paired with weigher
java
// ❌ maximumWeight without a weigher is rejected at build time (it does not fall back to weight=1)
Caffeine.newBuilder().maximumWeight(1_000_000).build(); // IllegalStateException
// ✅ Pair with weigher
Caffeine.newBuilder()
.maximumWeight(1_000_000)
.weigher((String k, Widget v) -> v.estimateSize())
.build();
12.6 Atomic overhead of statistics
java
// recordStats() has atomic counting overhead (LongAdder), visible at extremely high QPS
Caffeine.newBuilder().recordStats().build();
// If only partial metrics needed, customize StatsCounter to reduce overhead
Caffeine.newBuilder()
.recordStats(() -> new MyStatsCounter())
.build();
12.7 Executor configuration
java
// Default maintenance task uses ForkJoinPool.commonPool
// If business code also runs on commonPool (e.g., parallel streams), they may interfere
// (ThreadFactoryBuilder below is Guava's com.google.common.util.concurrent.ThreadFactoryBuilder)
Caffeine.newBuilder()
.executor(Executors.newFixedThreadPool(4, new ThreadFactoryBuilder()
.setNameFormat("caffeine-maintenance-%d").setDaemon(true).build()))
.build();
13. W-TinyLFU & FrequencySketch --- Q&A Deep Dive
This section consolidates a walkthrough of how W-TinyLFU behaves and how the underlying Count-Min Sketch addresses, increments, finds, ages, and collides on counters.
13.1 Worked Example: Why W-TinyLFU Beats LRU and LFU
Setup: capacity = 10 entries → Window (LRU) = 1 slot (the default ~1% share cannot go below one entry at this toy size), Main (SLRU) = 9 slots. A frequency sketch counts how often every key is requested, including keys not currently cached.
Workload (news site): a few hot articles A, B, C hit constantly, plus a crawler scanning thousands of cold one-time articles X1, X2, X3, ...:
A B A C A B X1 A B C X2 A C X3 X4 A B X5 ...
Plain LRU fails: each new cold Xn is "most recent," so LRU keeps inserting them and evicts hot A, B, C. The scan destroys the cache.
W-TinyLFU survives (scan resistance):
- X1 enters the Window (1 LRU slot).
- X2 arrives → Window full → X1 evicted from window, becomes a candidate for Main.
- Admission filter compares candidate vs the weakest Main victim:
  - freq(X1) = 1 ← scanned once
  - freq(C) = 15 ← hit constantly (counters saturate at 15)
  - freq(X1) < freq(C) → X1 rejected, never pollutes Main. (Counters are 4-bit saturating, so frequency is capped at 15; see §3.3 / §13.3.)
- Every cold key (freq 1) gets bounced the same way. The scan churns only through the window; hot keys stay safe.
Catching a newly-viral item (fast adaptation, the LFU weakness solved):
- New article N starts at freq 0, so pure LFU would never let it in.
- N sits in the window and keeps getting hit while trending: freq(N): 1 → 5 → 12 → 15 (4-bit counters saturate at 15).
- When N is pushed out of the window, its frequency now beats a victim → admitted. The window bought it time to prove itself.
Mental model:
- Window = "audition stage" for new/bursty keys.
- Admission filter + sketch = "bouncer" that only admits a key if it's more popular than whoever it would evict.
- Aging = the bouncer slowly forgets old fame.
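The scan-resistance argument can be checked with a toy simulation. Everything here is a simplification chosen for the example: `AdmissionSketch` is a hypothetical class, exact per-key counts (capped at 15) stand in for the frequency sketch, there is no aging, Main is a plain LRU rather than an SLRU, and the workload is three hot keys followed by four never-repeated cold keys per round.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Compares plain LRU with "1-slot window + frequency admission" on a scan-heavy workload. */
final class AdmissionSketch {
  /** Each round: hot keys A, B, C, then four cold keys that never repeat. */
  static List<String> workload(int rounds) {
    List<String> keys = new ArrayList<>();
    int cold = 0;
    for (int round = 0; round < rounds; round++) {
      keys.addAll(List.of("A", "B", "C"));
      for (int i = 0; i < 4; i++) keys.add("X" + (++cold));
    }
    return keys;
  }

  static int lruHits(List<String> keys, int capacity) {
    Map<String, Boolean> lru = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
      @Override protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
        return size() > capacity;
      }
    };
    int hits = 0;
    for (String key : keys) {
      if (lru.get(key) != null) hits++; else lru.put(key, true);
    }
    return hits;
  }

  static int filteredHits(List<String> keys, int mainCapacity) {
    Map<String, Integer> freq = new HashMap<>();            // exact counts, capped at 15
    LinkedHashMap<String, Boolean> main = new LinkedHashMap<>(16, 0.75f, true);
    String window = null;                                   // the single window slot
    int hits = 0;
    for (String key : keys) {
      freq.merge(key, 1, (a, b) -> Math.min(15, a + b));
      if (main.get(key) != null || key.equals(window)) { hits++; continue; }
      String candidate = window;                            // pushed out of the window
      window = key;
      if (candidate == null) continue;
      if (main.size() < mainCapacity) { main.put(candidate, true); continue; }
      String victim = main.keySet().iterator().next();      // least recently used in Main
      if (freq.get(candidate) > freq.get(victim)) {         // admit only if more popular
        main.remove(victim);
        main.put(candidate, true);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    List<String> keys = workload(100);
    System.out.println("LRU hits (capacity 4):          " + lruHits(keys, 4));
    System.out.println("Window + admission hits (1 + 3): " + filteredHits(keys, 3));
  }
}
```

Both policies hold four entries in total; the only difference is that the second one asks "is the candidate more popular than the victim?" before letting anything into Main.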
13.2 When Are Counters Halved (Aging)?
Halving (reset()) is triggered by traffic volume, not a wall-clock timer:
java
int sampleSize; // = 10 * maximumSize
int size; // running count of successful increments
public void increment(K key) {
// ... bump the key's 4 counters ...
if (added && ++size >= sampleSize) reset(); // halve everything
}
- size counts successful increments across all keys (not the number of cached entries).
- The added flag: each of the key's 4 counters is a 4-bit value saturating at 15. The increment touches all 4, and added is the OR of the four per-counter results: true if at least one counter was below 15 and got bumped. So added is false only when all 4 counters are already at 15 (i.e. the key's estimated frequency, which is the minimum of the 4, has reached 15). A fully-saturated key adds no information, so size does not advance. Note: it's the minimum hitting 15, not the max; if even one counter is still below 15, added is true.
- When size reaches sampleSize = 10 × maximumSize, every counter is right-shifted by 1 (÷2) and size is halved.
Example: maximumSize = 10_000 → sampleSize = 100_000. After ~100k recorded increments, all counters halve: 14→7, 12→6, 3→1, 1→0 (counters are 4-bit, so each is at most 15 before halving).
Why: keeps 4-bit counters in range, and is the aging mechanism. A counter means "recent frequency" (old hits exponentially discounted), so stale hot keys decay and fresh keys can overtake them. Busy caches age fast; idle caches age slowly.
Why halve size (not reset it to 0)? size is the odometer mirroring the total count stored in the counters. Since reset() halves every counter, it halves the total frequency mass, so size must be halved to stay consistent with what the table actually holds. This also fixes the decay cadence: after the first reset, size becomes sampleSize/2, so each subsequent decay fires after another ~sampleSize/2 new increments, giving a steady rhythm:
size: 0 ──(100k)──> 100k=sampleSize → reset, size→50k
50k ──(50k)──> 100k → reset, size→50k
50k ──(50k)──> 100k → reset ... (steady: decay every ~sampleSize/2 increments)
If it reset to 0 instead, each cycle would need a full sampleSize increments and the intervals would stretch relative to the surviving counts, letting old history dominate too long. Halving keeps a stable exponential-decay window ≈ the last sampleSize accesses.
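The cadence in the diagram can be reproduced by counting increments. This sketch follows the simplified model in the text (every increment counts as "added", and size is exactly halved on reset); `AgingCadenceSketch` and its method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Reproduces the reset rhythm: first reset after sampleSize increments, then every sampleSize/2. */
final class AgingCadenceSketch {
  /** Returns the 1-based increment numbers at which reset() would fire. */
  static List<Long> resetPoints(int sampleSize, long increments) {
    List<Long> points = new ArrayList<>();
    int size = 0;
    for (long n = 1; n <= increments; n++) {
      if (++size >= sampleSize) {
        points.add(n);
        size /= 2; // halve the odometer, mirroring the halved counters
      }
    }
    return points;
  }

  /** What reset() does to one 4-bit counter: an unsigned right shift by one. */
  static int halve(int counter) {
    return counter >>> 1;
  }

  public static void main(String[] args) {
    System.out.println(resetPoints(100_000, 250_000));
    System.out.println(halve(14) + " " + halve(12) + " " + halve(3) + " " + halve(1));
  }
}
```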
13.3 How a Key Maps to 4 Counters (Add / Find)
A key does not own one counter --- it maps to 4 counters in one block (block = a group of longs; each long = 16 four-bit counters). All 4 live in the same block for cache-line friendliness.
java
// FIND: read 4 counters, return the MINIMUM
public int frequency(K key) {
int hash = spread(key.hashCode());
int freq = Integer.MAX_VALUE;
for (int i = 0; i < 4; i++) {
int index = indexOf(hash, i);
freq = Math.min(freq, countAt(index, ...));
}
return freq;
}
// ADD: increment all 4 counters, saturating at 15
public void increment(K key) {
int hash = spread(key.hashCode());
boolean added = false;
for (int i = 0; i < 4; i++) {
int index = indexOf(hash, i);
added |= incrementAt(index, ...); // +1 unless already 15
}
if (added && ++size >= sampleSize) reset();
}
Why minimum on read? A collision can only push a counter up, never down. The minimum of the 4 is the counter least polluted by other keys → closest to truth. This is the Count-Min guarantee: over-estimate possible, under-estimate never.
13.4 How Different Keys Collide
Collisions are unavoidable by the pigeonhole principle: the sketch is sized to maximumSize (e.g. ~128 KB for 10k entries, see §13.7), but the keyspace is effectively unlimited (millions of IDs). Far more keys than counters → keys must share slots.
A collision occurs when two keys select the same (longIndex, nibble) position, which can happen because:
- Low-bit hash collision: the table size is a power of two, so only low bits pick the position; high bits are discarded.
- Birthday effect: with many more keys than slots, overlaps are statistically guaranteed.
key "A" → block 7 → slots {1, 5, 8, 12}
key "Z" → block 7 → slots {3, 5, 9, 14}
↑ both touch slot 5 → collision there only
The saving grace: collisions are usually partial. A and Z collided on slot 5 but not the other 3, so:
A's slots: {1:6, 5:15(polluted/saturated), 8:6, 12:6}
freq(A) = min(6,15,6,6) = 6 ✅ still correct
A wrong read needs a full 4-way collision; if a single-slot collision chance is p, full collision ≈ p⁴, which is tiny. That's why 4 counters instead of 1.
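The A/Z example can be replayed directly. In this sketch the slot assignments are the hypothetical ones from the example above (not real hash output), the counters are stored unpacked in an int array for readability, and `MinOnReadSketch` is an invented name.

```java
/** Replays the partial-collision example: one shared slot does not corrupt the min-of-4 read. */
final class MinOnReadSketch {
  private final int[] counters = new int[16]; // one block's worth of 4-bit counters, unpacked

  /** Bumps each of the key's slots, saturating at 15. */
  void increment(int[] slots) {
    for (int slot : slots) {
      if (counters[slot] < 15) counters[slot]++;
    }
  }

  /** Count-Min read: the smallest of the key's counters. */
  int frequency(int[] slots) {
    int min = 15;
    for (int slot : slots) min = Math.min(min, counters[slot]);
    return min;
  }

  public static void main(String[] args) {
    int[] a = {1, 5, 8, 12}; // key "A" (slots taken from the example)
    int[] z = {3, 5, 9, 14}; // key "Z" shares only slot 5 with "A"
    MinOnReadSketch sketch = new MinOnReadSketch();
    for (int i = 0; i < 6; i++) sketch.increment(a);
    for (int i = 0; i < 20; i++) sketch.increment(z);
    System.out.println("freq(A) = " + sketch.frequency(a)); // slot 5 is polluted, the min is not
    System.out.println("freq(Z) = " + sketch.frequency(z)); // saturated at the 4-bit cap
  }
}
```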
13.5 Confirming the Exact Slot Position for a Key
The mapping is pure deterministic bit math --- same key + same table size → same 4 slots every time. Two fixed mixers drive it:
java
static int spread(int x) { // picks the BLOCK
x ^= x >>> 17; x *= 0xed5ad4bb;
x ^= x >>> 11; x *= 0xac4c1b51;
x ^= x >>> 15; return x;
}
static int rehash(int x) { // picks the 4 COUNTERS
x *= 0x31848bab; x ^= x >>> 14; return x;
}
The table is carved into blocks of 8 longs:
java
blockMask = (table.length >>> 3) - 1;
int blockHash = spread(key.hashCode());
int counterHash = rehash(blockHash);
int block = (blockHash & blockMask) << 3; // base long index = blockNumber * 8
for (int i = 0; i < 4; i++) {
int h = counterHash >>> (i << 3); // byte i of counterHash
int index = (h >>> 1) & 15; // which nibble (0..15) in the long
int offset = h & 1; // which of the 2 longs in the pair
int slot = block + offset + (i << 1); // which long in the array
}
Each counter i is identified by (slot, index). The + (i << 1) forces counter i into a distinct long-pair (longs 0/1, 2/3, 4/5, 6/7), keeping the 4 counters apart while the whole 8-long block (64 bytes) fits one cache line.
Verifying empirically: reflect into the package-private FrequencySketch.table, increment a key N times (N ≤ 15, on a fresh sketch so that no other key or reset() interferes), and assert frequency(key) == N; or copy the pure spread/rehash/index math into a standalone class and print the 4 (slot, index) pairs (deterministic, matches the live cache). Two keys collide iff any (slot, index) pair is shared.
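The standalone-class approach might look like the following. The mixers and index math are copied from the listings above; the class name `SketchSlots`, the helper names, and the `int[4][2]` return shape are choices made for this sketch, and the sample key is arbitrary.

```java
import java.util.Arrays;

/** Standalone copy of the slot math: prints the 4 (slot, index) pairs for a key. */
final class SketchSlots {
  static int spread(int x) {            // picks the block
    x ^= x >>> 17; x *= 0xed5ad4bb;
    x ^= x >>> 11; x *= 0xac4c1b51;
    x ^= x >>> 15; return x;
  }

  static int rehash(int x) {            // picks the 4 counters
    x *= 0x31848bab; x ^= x >>> 14; return x;
  }

  static int pairOffset(int h) {        // bit 0: which long of the pair
    return h & 1;
  }

  static int nibbleIndex(int h) {       // bits 1..4: which 4-bit counter in the long
    return (h >>> 1) & 15;
  }

  /** Returns 4 rows of {slot, index}; tableLength must be a power of two >= 8. */
  static int[][] positions(Object key, int tableLength) {
    int blockMask = (tableLength >>> 3) - 1;
    int blockHash = spread(key.hashCode());
    int counterHash = rehash(blockHash);
    int block = (blockHash & blockMask) << 3;   // base long index of the 8-long block
    int[][] result = new int[4][2];
    for (int i = 0; i < 4; i++) {
      int h = counterHash >>> (i << 3);         // byte i of counterHash
      result[i][0] = block + pairOffset(h) + (i << 1);
      result[i][1] = nibbleIndex(h);
    }
    return result;
  }

  public static void main(String[] args) {
    // 16_384 longs is the table length for maximumSize = 10_000.
    System.out.println(Arrays.deepToString(positions("user:42", 16_384)));
  }
}
```

Running it twice for the same key and table length prints the same pairs; printing two keys side by side shows directly whether they share any (slot, index) position.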
13.6 Why index = (h >>> 1) & 15 Shifts by 1 First
One hash byte h feeds two independent decisions, so they must use disjoint bits:
h byte: b7 b6 b5 b4 b3 b2 b1 b0
└──┬──┘ └┬┘
index: bits 1..4 offset: bit 0
- offset = h & 1 consumes bit 0 (which long in the pair).
- index = (h >>> 1) & 15 shifts bit 0 away, then takes bits 1-4 (nibble 0-15).
If you used h & 15 instead, the index would reuse bit 0 --- the bit offset already depends on --- making the two choices correlated and producing more structured collisions. The >>> 1 keeps them independent → more uniform spreading.
Quick check, h = 0x51 (0101_0001): offset = 1, index = (0x51>>>1)&15 = 0x28&15 = 8. h = 0x06 (0000_0110): offset = 0, index = (0x06>>>1)&15 = 3.
13.7 Common Misconceptions --- Clarified
A consolidation of subtle points that are easy to get wrong about W-TinyLFU and the FrequencySketch.
Is a key's frequency "stored" in 4 counters?
No. Frequency is estimated, not stored. A key does not own 4 private counters:
- Increment: hash the key → +1 to each of 4 shared counter positions (saturating at 15).
- Read: hash the key → return the minimum of those 4 positions.
- The 4 positions are shared with other keys (collisions). The min is taken because a collision can only push a counter up, so the smallest of the 4 is the least-polluted estimate. This is the Count-Min guarantee: over-estimate possible, under-estimate never.
Counters saturate at 15 (4-bit)
frequency(key) can never exceed 15. Any narrative showing values like 20, 47, or 60 is wrong: counters are 4-bit saturating, so a hot key climbs ... → 12 → 15 and stops. Periodic halving (reset()) keeps real values churning well under the cap anyway.
maximumSize is an entry count, not a key size
- maximumSize = max number of entries (key→value pairs) the cache holds. It has nothing to do with how large a key or value is.
- To bound by memory/weight instead, use maximumWeight + weigher.
- A key's byte length is irrelevant to the sketch: only its hashCode() matters, which collapses any key to a single int before hashing into the table.
Sketch table sizing: tied to maximumSize, NOT the keyspace
java
table = new long[Math.max(ceilingPowerOfTwo(maximumSize), 8)]; // longs
sampleSize = 10 * maximumSize; // aging cadence
- Each long packs 16 four-bit counters → total counters = 16 × table.length ≈ 16 × maximumSize.
- Example (maximumSize = 10_000): table.length = 16_384 longs, 262_144 counters, ~128 KB.
- The table is sized to rank the ~maximumSize keys contending for cache slots, NOT to fit the entire keyspace. The keyspace (millions of IDs) vastly exceeds the counters, so by the pigeonhole principle collisions are guaranteed, and that's accepted by design.
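The sizing arithmetic is easy to check for any maximumSize. The sketch below mirrors the two lines of sizing code above; `SketchSizing` and its helper names are illustrative, and `ceilingPowerOfTwo` is a local reimplementation rather than the library's own helper.

```java
/** Recomputes the sketch footprint for a given maximumSize. */
final class SketchSizing {
  /** Smallest power of two >= x (local helper for this sketch). */
  static int ceilingPowerOfTwo(int x) {
    return (x <= 1) ? 1 : Integer.highestOneBit(x - 1) << 1;
  }

  static int tableLength(int maximumSize) {
    return Math.max(ceilingPowerOfTwo(maximumSize), 8); // number of longs
  }

  static long counters(int maximumSize) {
    return 16L * tableLength(maximumSize);              // 16 four-bit counters per long
  }

  static long bytes(int maximumSize) {
    return 8L * tableLength(maximumSize);               // 8 bytes per long
  }

  static int sampleSize(int maximumSize) {
    return 10 * maximumSize;                            // aging cadence
  }

  public static void main(String[] args) {
    int max = 10_000;
    System.out.println("table.length = " + tableLength(max));
    System.out.println("counters     = " + counters(max));
    System.out.println("bytes        = " + bytes(max) + " (" + bytes(max) / 1024 + " KiB)");
    System.out.println("sampleSize   = " + sampleSize(max));
  }
}
```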
"Disjoint capacity" is a myth --- and more keys means MORE collisions
- If counters were assigned disjointly, the table could track 262_144 / 4 = 65_536 keys cleanly. But the 4 positions are chosen by hashing (random with replacement), not partitioned, so collisions begin far earlier (birthday paradox), not at "fill."
- Adding more distinct keys increases collisions, never reduces them. What keeps error low is (a) over-provisioning counters relative to the contending set (~16×) and (b) 4 counters + take-min (a wrong read needs all 4 to collide ≈ p⁴).
Admission comparison runs during maintenance, only when over capacity
- The candidate-vs-victim comparison happens inside the maintenance task's evictEntries() step (§4.2), and only when total cache size > maximumSize. A non-full cache never compares or evicts.
- The window entry is not "evicted first then compared." It is selected as a candidate because an eviction is required; the comparison decides which of the two is evicted.
- candidate = Window LRU head; victim = Probation LRU tail. The trigger is the whole cache exceeding capacity (not the window alone); region selection follows Hill-Climbing-tuned proportions.
14. Interview Key Points
- W-TinyLFU: Window(1%) + SLRU(Probation 20% + Protected 79%), admission filter uses Count-Min Sketch to compare frequencies.
- Count-Min Sketch: 4-bit saturating counters × 4 hashes, block layout improves cache hits, periodic right-shift decay.
- Lock-free concurrency: MPSC ReadBuffer + WriteBuffer, async maintenance thread batch-applies changes; read path is only CHM.get + buffer.offer.
- vs Guava:
  - Higher hit rate (W-TinyLFU > LRU)
  - Higher throughput (lock-free > segment locks) ~3x
  - Supports async (AsyncLoadingCache)
  - Supports Scheduler for proactive expiration
  - Supports per-entry TTL (Expiry)
- TimerWheel: 5-tier hierarchical timer wheel, O(1) insertion, batch expiration.
- Adaptive: Hill Climbing dynamically adjusts Window/Main proportion (0.2%-80%).
- Scheduler: Solves lazy expiration problem, promptly cleans expired entries with no recent access.
- evictionListener vs removalListener: Former handles only eviction (synchronous), latter handles all removals (asynchronous).
- ReadBuffer can lose data: Lost read events don't affect correctness, only hit rate; WriteBuffer must be reliable.