Caffeine Deep Dive

A high-performance Java in-process cache (W-TinyLFU eviction + lock-free concurrency). This document covers the architecture, internals, API, a Guava → Caffeine migration guide, and a deep-dive Q&A reference.

Table of Contents

Foundations

  • [1. Overview](#1. Overview)
  • [2. Core Architecture](#2. Core Architecture)

The W-TinyLFU Eviction Policy

  • [3. W-TinyLFU Eviction Strategy](#3. W-TinyLFU Eviction Strategy)

Concurrency & Internals

  • [4. Concurrency Design](#4. Concurrency Design)
  • [5. Expiration Strategies](#5. Expiration Strategies)
  • [7. Adaptive Adjustment (Hill Climbing)](#7. Adaptive Adjustment (Hill Climbing))

API & Features

  • [6. Async Support](#6. Async Support)
  • [8. Usage Examples](#8. Usage Examples)
  • [9. Scheduler (Proactive Expiration Cleanup)](#9. Scheduler (Proactive Expiration Cleanup))

Migrating from Guava

  • [10. Migration from Guava Cache](#10. Migration from Guava Cache)
  • [11. Performance Comparison](#11. Performance Comparison)
  • [12. Common Pitfalls](#12. Common Pitfalls)

Deep Dives & Reference

  • [13. W-TinyLFU & FrequencySketch --- Q&A Deep Dive](#13. W-TinyLFU & FrequencySketch — Q&A Deep Dive)
  • [14. Interview Key Points](#14. Interview Key Points)

Reading paths: New to Caffeine? Read §1--§3, then §8. Tuning/operating it? §4, §5, §7, §9. Migrating from Guava? §10--§12. Going deep (or prepping for interviews)? §13--§14. Note that §13 intentionally re-walks the §3 material in Q&A form with more worked examples.


1. Overview

Caffeine is a high-performance in-process cache library for Java, written by Ben Manes, who also worked on Guava Cache. It uses the W-TinyLFU eviction policy (near-optimal hit rate) and a low-contention concurrent design (striped read buffers + CAS + an asynchronous maintenance task), and generally delivers much higher throughput than Guava Cache (see §11).

  • Spring integration: Spring's cache abstraction supports it through CaffeineCacheManager, and Spring Boot auto-configures it when Caffeine is on the classpath (Spring 5 dropped its Guava cache support)
  • Memory: Supports up to maximumSize(Long.MAX_VALUE), but typically limited by JVM heap
  • Positioning: Extreme-performance in-process cache

2. Core Architecture

┌──────────────────────────────────────────────────────┐
│                     Caffeine                          │
├──────────────────────────────────────────────────────┤
│  BoundedLocalCache (Core implementation)             │
│    ├── ConcurrentHashMap<K, Node<K,V>> data          │
│    │     (Storage layer - JDK8 CHM, fully concurrent)│
│    ├── Window (admission window, ~1% capacity,       │
│    │          adaptive 0.2%-80%)                     │
│    │     └── AccessOrderDeque (LRU)                  │
│    ├── Probation (probation zone)                    │
│    │     └── AccessOrderDeque (LRU)                  │
│    ├── Protected (protected zone, ~80% of Main)      │
│    │     └── AccessOrderDeque (LRU)                  │
│    ├── FrequencySketch (Count-Min Sketch + Doorkeeper)│
│    │     └── long[] table (4-bit counters, block layout)│
│    ├── ReadBuffer (striped BoundedBuffer)            │
│    ├── WriteBuffer (MpscGrowableArrayQueue)          │
│    ├── TimerWheel (hierarchical timer wheel, expiry) │
│    └── Maintenance (ForkJoinPool / custom Executor)  │
└──────────────────────────────────────────────────────┘

Key design principles:

  • All mutations are asynchronous: Read/write operations record changes to buffers; a single maintenance thread batch-processes them.
  • Lock-free read path: Hot path is only ConcurrentHashMap.get() + afterRead() writing to buffer.

3. W-TinyLFU Eviction Strategy

3.1 Overall Structure

New entry → [Window Cache] → Admission Filter → [Main Cache (SLRU)]
               (LRU)          (TinyLFU)          Probation + Protected

Default proportions:
  Window: 1%, Main: 99% (= Probation 20% + Protected 79%)
  Window can adaptively adjust via Hill Climbing (0.2% - 80%)

Purpose of the three zones:

  • Window: New entries enter here first, LRU eviction. Absorbs burst traffic and new hotspots, preventing "one-time accesses" from directly polluting the main cache.
  • Probation: Entries leaving the Window land here, subject to the admission filter once the cache is full. Accessed again → promoted to Protected.
  • Protected: Stable zone for frequently accessed entries. When Protected is full, its LRU entry (head) is demoted to the MRU end (tail) of Probation.

3.2 Admission Filter

// When an eviction is required, compare the Window candidate with the Probation LRU victim
boolean admit(K candidateKey, K victimKey) {
    int candidateFreq = frequencySketch.frequency(candidateKey);
    int victimFreq = frequencySketch.frequency(victimKey);

    if (candidateFreq > victimFreq) return true;   // Candidate has higher frequency, admit
    // Candidate is not more frequent. Caffeine rejects it, except that a "warm" candidate
    // (frequency >= ADMIT_HASHDOS_THRESHOLD, i.e. 6) is admitted at random about 1 time in 128,
    // so an attacker cannot exploit hash collisions to pin an artificially hot victim forever.
    if (candidateFreq >= ADMIT_HASHDOS_THRESHOLD) {
        return (ThreadLocalRandom.current().nextInt() & 127) == 0;
    }
    return false;                                  // Cold candidate, reject
}

Design intent: Core improvement over Guava's LRU --- considers not just "was it accessed recently" but also "historical access frequency." Particularly effective against scan-loop adversarial access patterns.

3.3 FrequencySketch (Count-Min Sketch)

// 4-bit saturating counters (max value 15)
// Each key uses 4 hashes, takes minimum as frequency estimate
// Block layout: each long (64 bit) stores 16 4-bit counters
final class FrequencySketch<K> {
    long[] table;        // Block layout
    int sampleSize;      // Sample window size = 10 * maximumSize
    int size;            // Current cumulative increment count

    // Increment frequency (add 1 at each of 4 hash positions, saturate at 15)
    public void increment(K key) {
        int hash = spread(key.hashCode());
        int start = (hash & 3) << 2;  // Counter start position within block

        boolean added = false;
        for (int i = 0; i < 4; i++) {
            int index = indexOf(hash, i);
            // 4 counters in same block are cache-friendly
            added |= incrementAt(index, start + i);
        }

        // Periodic decay: all counters right-shift by 1 (= divide by 2)
        if (added && ++size >= sampleSize) reset();
    }

    // Query frequency (take minimum of 4 positions, guaranteed no underestimate)
    public int frequency(K key) {
        int hash = spread(key.hashCode());
        int start = (hash & 3) << 2;
        int frequency = Integer.MAX_VALUE;
        for (int i = 0; i < 4; i++) {
            int index = indexOf(hash, i);
            frequency = Math.min(frequency, countAt(index, start + i));
        }
        return frequency;
    }

    // Decay: all counters divided by 2, size halved
    void reset() {
        for (int i = 0; i < table.length; i++) {
            table[i] = (table[i] >>> 1) & RESET_MASK;
        }
        size = (size >>> 1);
    }
}

Summary:

| Dimension | Description |
| --- | --- |
| Space | ~4 bit × 4 = 16 bit per key (amortized) |
| Time | O(1) --- 4 hash lookups |
| Precision | Allows overestimation, never underestimates (Count-Min property) |
| Decay | Right-shift by 1 when sample is full, adapts to access pattern changes |
| Cache-friendly | Block layout: the 4 counters live in 4 distinct longs of the same 8-long block (= one 64-byte cache line), not the same long. See §13.5 for the exact index math. |

How reset() halves all counters: (word >>> 1) & RESET_MASK

A long packs 16 four-bit counters. The decay halves every counter with just one shift + one mask --- no per-counter loop:

  • word >>> 1 shifts the entire 64-bit word. Inside each nibble this correctly computes value / 2, but each counter's low bit slides into the top bit of its right neighbor's nibble, polluting it:

    [c1=0001][c0=0001]  --(>>> 1)-->  [...0][1 000...]   ← c1's low bit leaked into top of c0
  • & RESET_MASK where RESET_MASK = 0x7777777777777777L (0111 × 16) forces the top bit of every nibble to 0, erasing exactly those leaked bits:

    raw shift nibble:  1xxx   (stray top bit from left neighbor)
    & 0111          :  0xxx   ← stray bit cleared → clean value/2
  • Example: 15 (1111) >>> 1 = 7 (0111); the mask keeps 0111 → 7 ✓ (integer 15/2).

Each iteration ages 16 counters with 2 ALU ops. (Real Caffeine also adjusts size by the count of cleared low-bits to keep aging accounting exact.)

⚠️ Pseudocode simplification: the start = (hash & 3) << 2 / indexOf(hash, i) / start + i shown above is an illustrative model. In real Caffeine each of the 4 counters gets its own long index (different per i, spread via + (i << 1)) and its own independently-chosen nibble --- see §13.5. The 4 counters share a block (one cache line), not a single long.
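To make the counter mechanics concrete, here is a minimal, self-contained Count-Min sketch with packed 4-bit counters. It is an illustrative model and not Caffeine's FrequencySketch: the class name TinySketch, the SEEDS multipliers, and the way a word and nibble are picked from the hash are all assumptions made for readability, and the block layout and exact aging bookkeeping are left out.

```java
/** Simplified Count-Min sketch with packed 4-bit counters (illustrative, not Caffeine's class). */
final class TinySketch {
    static final long RESET_MASK = 0x7777777777777777L;   // 0111 repeated 16 times
    // Arbitrary odd multipliers used to derive 4 hash positions (an assumption of this sketch).
    static final long[] SEEDS = {
        0xc3a5c85c97cb3127L, 0xb492b66fbe98f273L, 0x9ae16a3b2f90404fL, 0xcbf29ce484222325L };

    final long[] table;     // each long packs 16 four-bit counters
    final int sampleSize;   // number of increments before aging
    int size;

    TinySketch(int maximumSize) {
        int words = Integer.highestOneBit(Math.max(maximumSize, 8) - 1) << 1; // power of two >= 8
        table = new long[words];
        sampleSize = 10 * Math.max(maximumSize, 1);
    }

    private long mix(Object key, int i) {
        long h = (key.hashCode() + 0x9E3779B97F4A7C15L) * SEEDS[i];
        return h ^ (h >>> 32);
    }

    private int word(long h)  { return (int) (h & (table.length - 1)); }   // which long
    private int shift(long h) { return ((int) ((h >>> 40) & 15)) << 2; }   // which nibble, as a bit offset

    /** Adds 1 at each of the 4 positions, saturating at 15; ages the table when the sample is full. */
    void increment(Object key) {
        boolean added = false;
        for (int i = 0; i < 4; i++) {
            long h = mix(key, i);
            int w = word(h);
            int s = shift(h);
            long mask = 0xFL << s;
            if ((table[w] & mask) != mask) {   // counter below 15, so adding 1 cannot carry out
                table[w] += 1L << s;
                added = true;
            }
        }
        if (added && ++size >= sampleSize) reset();
    }

    /** Minimum of the 4 counters: collisions can only inflate a counter, never shrink it. */
    int frequency(Object key) {
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < 4; i++) {
            long h = mix(key, i);
            min = Math.min(min, (int) ((table[word(h)] >>> shift(h)) & 0xF));
        }
        return min;
    }

    /** Halves all 16 counters of every word with one shift and one mask. */
    void reset() {
        for (int i = 0; i < table.length; i++) {
            table[i] = (table[i] >>> 1) & RESET_MASK;
        }
        size >>>= 1;
    }

    public static void main(String[] args) {
        TinySketch sketch = new TinySketch(64);
        for (int i = 0; i < 20; i++) sketch.increment("hot");
        System.out.println(sketch.frequency("hot"));  // 15: the counters saturate
        sketch.reset();
        System.out.println(sketch.frequency("hot"));  // 7: every counter halved
    }
}
```

Because the estimate is a minimum over four counters, another key has to collide on all four positions before it can inflate this key's estimate, which is why such a small table still ranks hot and cold keys usefully.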

3.4 Complete W-TinyLFU Flow

Trigger: This whole flow runs only inside the maintenance task's evictEntries() step (see §4.2), and the frequency comparison happens only when total cache size exceeds maximumSize. A non-full cache never compares or evicts; entries that overflow the Window simply move on into Probation. The window entry is not "evicted first and then compared"; it is selected as a candidate because an eviction is required, and the comparison decides which of the two (candidate or victim) is actually evicted.

1. New entry inserted → enters Window zone (LRU) via drainWriteBuffer
2. Insert pushes total cache size > maximumSize → eviction needed
3. evictEntries() runs on the maintenance thread:
     candidate = Window LRU head (oldest in window)
     victim    = Probation LRU head (coldest main entry)
     compare freq(candidate) vs freq(victim):
       freq(cand) >  freq(victim) → candidate enters Probation, victim evicted
       freq(cand) <  freq(victim) → candidate evicted, victim stays in Probation
       freq(cand) == freq(victim) → candidate usually evicted (a warm candidate is occasionally admitted at random)
4. Total size is back to maximumSize
5. Probation entry accessed again → promoted to Protected
6. Protected full → head (LRU) demoted to Probation tail (MRU)

Note: It is not the Window being individually "full" that triggers this --- it is the whole cache exceeding maximumSize. The eviction logic then decides whether to pull from the Window or the main region based on their target proportions (tuned by Hill Climbing, §7). The candidate-vs-victim comparison above is specifically the Window-overflow case.
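The flow above can be exercised with a toy model. The sketch below is a deliberately reduced simulation, not Caffeine's code: the class ToyWTinyLfu is hypothetical, an exact HashMap of counts stands in for the frequency sketch, the main region is a single LRU list (no Protected zone), and ties always reject the candidate.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

/** Toy model of the flow above: LRU window + frequency-gated main region. */
final class ToyWTinyLfu {
    private final int maximumSize;
    private final int windowMax;
    private final LinkedHashSet<String> window = new LinkedHashSet<>(); // iteration order: LRU first
    private final LinkedHashSet<String> main = new LinkedHashSet<>();
    private final Map<String, Integer> freq = new HashMap<>();          // exact counts replace the sketch

    ToyWTinyLfu(int maximumSize, int windowMax) {
        if (windowMax < 1 || windowMax >= maximumSize) throw new IllegalArgumentException();
        this.maximumSize = maximumSize;
        this.windowMax = windowMax;
    }

    void access(String key) {
        freq.merge(key, 1, Integer::sum);
        if (main.remove(key)) { main.add(key); return; }      // hit in main: move to MRU position
        if (window.remove(key)) { window.add(key); return; }  // hit in window: move to MRU position
        window.add(key);                                      // miss: new entries enter the window
        evict();
    }

    private void evict() {
        while (window.size() > windowMax) {
            String candidate = window.iterator().next();      // window LRU
            window.remove(candidate);
            if (window.size() + main.size() < maximumSize) {  // cache not full: no comparison
                main.add(candidate);
                continue;
            }
            String victim = main.iterator().next();           // main LRU
            if (freq.get(candidate) > freq.get(victim)) {     // admit only a strictly hotter candidate
                main.remove(victim);
                main.add(candidate);
            }                                                 // otherwise the candidate is dropped
        }
    }

    boolean contains(String key) { return window.contains(key) || main.contains(key); }

    public static void main(String[] args) {
        ToyWTinyLfu cache = new ToyWTinyLfu(4, 1);
        for (String hot : new String[] {"a", "b", "c"}) {
            for (int i = 0; i < 3; i++) cache.access(hot);
        }
        for (int i = 0; i < 10; i++) cache.access("scan-" + i);  // one-shot keys
        System.out.println(cache.contains("a"));  // true: the scan could not displace a hot key
    }
}
```

A plain LRU of the same size would have lost a, b and c to the ten one-shot keys; here each scan key reaches the comparison with a count of 1 and loses to a victim with a count of 3.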

3.5 W-TinyLFU in a Nutshell

The problem it solves --- classic policies each have a blind spot:

| Policy | Keeps | Strength | Weakness |
| --- | --- | --- | --- |
| LRU (Least Recently Used) | Recently-touched entries | Great for recency workloads | A scan of one-time keys floods the cache and evicts hot entries |
| LFU (Least Frequently Used) | Frequently-touched entries | Great for stable popularity | Adapts slowly: stale hot items linger, new hot items struggle to earn a spot |

W-TinyLFU = Windowed + TinyLFU + aging --- it combines both: frequency decides what deserves to stay, and a small recency window catches newly-popular items quickly.

                  ┌──────────────┐
   new entry ───> │ Window (LRU) │  ~1% of capacity
                  └──────┬───────┘
                         │ evicted from window
                         ▼
                   ┌───────────┐      TinyLFU admission filter
                   │ candidate │ ──── compare frequency ────┐
                   └───────────┘                            │
                                                            ▼
                  ┌──────────────────────────────┐   keep higher-freq
                  │   Main / Probationary (SLRU)  │ <─────────────────
                  │       ~99% of capacity        │
                  └──────────────────────────────┘

How the pieces fit together:

  1. Admission window (LRU, ~1%) --- every new entry lands here first, giving fresh/bursty items a chance to prove themselves before being judged on frequency they haven't had time to build.
  2. TinyLFU admission filter --- when an entry is pushed out of the window, TinyLFU compares its estimated frequency against the victim it would replace in the main region. Only the more valuable one is admitted. This is the scan resistance: one-shot keys never accumulate frequency, so they get rejected.
  3. Frequency sketch (Count-Min Sketch) --- frequencies are estimated, not stored exactly, using a few 4-bit counters per entry, so tracking the whole keyspace costs very little memory.
  4. Aging / reset --- counters are periodically halved, so yesterday's hot item decays. This is what makes it adaptive rather than a static LFU.
  5. Main region uses SLRU (probationary + protected segments) so entries that get re-hit are promoted and protected.

Why it wins: near-optimal hit rates with bounded memory, scan resistance against cold-key floods, and fast adaptation to newly-popular keys. You don't configure any of this --- it's the built-in policy whenever a Caffeine cache is bounded via maximumSize or maximumWeight; you just set size and TTL.

4. Concurrency Design

4.1 Read/Write Buffers

// Read operation hot path (lock-free)
V get(K key) {
    Node<K, V> node = data.get(key);  // CHM.get() - fully concurrent
    if (node == null) return null;
    V value = node.getValue();
    long now = expirationTicker().read();        // current time for expiration bookkeeping
    afterRead(node, now, /* recordHit */ true);  // Write to ReadBuffer (lock-free)
    return value;
}

void afterRead(Node<K, V> node, long now, boolean recordHit) {
    if (recordHit) statsCounter.recordHits(1);
    // Write to MPSC ring buffer; drop on failure (lossy recording, doesn't affect correctness)
    boolean delayable = readBuffer.offer(node) != Buffer.FULL;
    if (shouldDrainBuffers(delayable)) scheduleDrainBuffers();
}

// Write operation
void afterWrite(Runnable task) {
    for (int i = 0; i < WRITE_BUFFER_RETRIES; i++) {  // bounded number of attempts
        if (writeBuffer.offer(task)) {
            scheduleAfterWrite();
            return;
        }
        scheduleDrainBuffers();  // Buffer full, proactively trigger drain
        Thread.onSpinWait();
    }
    // Fallback: execute synchronously
    lock.lock();
    try { task.run(); } finally { lock.unlock(); }
}

Key details:

  • ReadBuffer tolerates loss: Read events are statistics/LRU information; loss only affects hit rate, not correctness.
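  • WriteBuffer cannot lose data: each task carries a structural update to the eviction policy that must be applied. When the buffer is full, the writer triggers a drain and retries, then falls back to acquiring the lock and running the task itself.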

What the write task is, and why the fallback task.run() is lossless-by-necessity

The Runnable task in afterWrite wraps one pending structural mutation to the eviction bookkeeping (the value itself is already in the CHM --- these tasks update the ordering metadata the CHM doesn't track):

| Operation | Task type | What run() does |
| --- | --- | --- |
| insert new key | AddTask | link node into the Window deque, add to weightedSize, frequencySketch.increment(key), trigger eviction if over capacity |
| update existing value | UpdateTask | adjust weightedSize by the weight delta, move node to MRU in its region, increment frequency |
| explicit removal | RemovalTask | unlink node from its region deque, subtract from weightedSize |

Normally the task is enqueued (writeBuffer.offer(task)) and applied later by the maintenance thread in drainWriteBuffer(). But the WriteBuffer is bounded and lossless --- a dropped AddTask/RemovalTask would desync the eviction state from the CHM. So when the buffer stays full after the bounded retry loop (WRITE_BUFFER_RETRIES attempts), the writer falls back to running it inline:

lock.lock();
try { task.run(); }      // apply the mutation now, under the eviction lock
finally { lock.unlock(); }

The eviction lock preserves the single-writer invariant (the deques are still only mutated under exclusive access). This is the exact opposite of the ReadBuffer:

ReadBuffer full  → DROP the event              (lossy OK --- only a recency hint)
WriteBuffer full → spin, then task.run() under lock   (lossless required --- structural change)
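The lossless fallback can be sketched with JDK classes only. This is a simplified model rather than Caffeine's implementation: WriteBufferDemo, its RETRIES bound, and the inlineRuns counter are hypothetical, and an ArrayBlockingQueue stands in for the real MPSC queue. In this sketch the fallback drains the queued tasks before running the new one, so tasks are applied in submission order.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.locks.ReentrantLock;

/** Bounded, lossless write buffer: enqueue when possible, otherwise apply inline under the lock. */
final class WriteBufferDemo {
    static final int RETRIES = 3;                       // hypothetical retry bound
    final ArrayBlockingQueue<Runnable> writeBuffer;
    final ReentrantLock evictionLock = new ReentrantLock();
    int inlineRuns;                                     // tasks that took the fallback path

    WriteBufferDemo(int capacity) {
        writeBuffer = new ArrayBlockingQueue<>(capacity);
    }

    void afterWrite(Runnable task) {
        for (int i = 0; i < RETRIES; i++) {
            if (writeBuffer.offer(task)) return;        // normal path: maintenance applies it later
            Thread.onSpinWait();
        }
        evictionLock.lock();                            // fallback: never drop a structural change
        try {
            drainLocked();                              // apply older tasks first
            task.run();
            inlineRuns++;
        } finally {
            evictionLock.unlock();
        }
    }

    /** What the maintenance pass would do. */
    void drain() {
        evictionLock.lock();
        try {
            drainLocked();
        } finally {
            evictionLock.unlock();
        }
    }

    private void drainLocked() {
        Runnable task;
        while ((task = writeBuffer.poll()) != null) task.run();
    }
}
```

With a capacity of 2 and no drain in between, the third afterWrite call finds the buffer full, takes the lock, applies the two queued tasks, and then applies its own.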

Where afterWrite(task) is called

afterWrite is the write-side counterpart to afterRead, invoked at the tail end of every mutating map operation --- after the value is already in (or removed from) the ConcurrentHashMap:

| Map operation | Branch | Task passed |
| --- | --- | --- |
| put / putIfAbsent | key absent → new node | new AddTask(node, weight) |
| put / replace | key existed, value replaced | new UpdateTask(node, weightDelta) |
| remove / invalidate | node unlinked from CHM | new RemovalTask(node) |
| computeIfAbsent | loader created a node | AddTask |
| compute / merge | created / updated / removed | AddTask / UpdateTask / RemovalTask |

Every write path is: (1) mutate the CHM (value now visible to readers) → (2) afterWrite(task) enqueues the matching eviction-metadata task onto the WriteBuffer → (3) the maintenance thread later runs task.run() in drainWriteBuffer() (or the writer runs it inline under the lock if the buffer stays full). Reads call afterRead(node, now, recordHit) instead --- lossy, just a recency hint.

Node vs. Task --- easy to conflate, but different roles:

|  | Node | Task |
| --- | --- | --- |
| Nature | data (a noun) | a deferred command (a verb) |
| What it is | the stored entry inside the CHM: key, value, deque prev/next pointers, weight, expiration timestamps, current region | a short-lived Runnable (AddTask / UpdateTask / RemovalTask) that mutates a node's eviction metadata |
| Lifetime | long-lived --- persists for the entry's whole life; what readers see via data.get(key) | one-shot --- created by a write, run once by maintenance, then discarded |
| Carries | the actual cached value | a reference to the node it operates on |

The split exists because step 1 (CHM mutation) is concurrent/lock-free, while step 2 (deque + sketch updates) must run single-threaded under the eviction lock. The task is how a write hands that second piece off to the maintenance thread.

Why reads don't use a task: a read always does the same single action (reorder the node toward MRU), so the buffer carries the bare Node and the drain applies that one fixed op --- no per-get() Runnable allocation and no dispatch. Writes have three distinct actions (add/update/remove) and must always be applied, so they need the polymorphic, lossless task.

4.2 Maintenance Task

// Single-threaded maintenance (CAS ensures only one thread enters)
// Default runs on ForkJoinPool.commonPool, customizable via Executor
void maintenance() {
    drainReadBuffer();    // 1. Batch-apply read events (update LRU positions)
    drainWriteBuffer();   // 2. Batch-apply write events (insert/delete/update)
    expireEntries();      // 3. Clean expired entries (via TimerWheel)
    evictEntries();       // 4. Execute eviction (W-TinyLFU)
    climb();              // 5. Hill Climbing to adjust Window proportion
}

4.3 Concurrency Model Comparison with Guava Cache

| Feature | Guava Cache | Caffeine |
| --- | --- | --- |
| Lock model | Segment locks (ReentrantLock) | CAS + MPSC Buffer + single-thread drain |
| Data storage | Custom Segment[] | JDK8 ConcurrentHashMap |
| Read operations | May wait for lock (during cleanup) | Fully lock-free (CHM.get + buffer.offer) |
| Write operations | Acquires Segment lock | CAS + buffer + async maintenance |
| Eviction execution | Synchronous during write | Async maintenance thread batch execution |
| Throughput | ~33M ops/s | ~100M+ ops/s (~3x) |

4.4 ReadBuffer Internals --- How It Stays Lock-Free

The ReadBuffer absorbs an offer(node) from every get() across many threads, yet never takes a lock. Four cooperating techniques make this work:

| Technique | What it buys |
| --- | --- |
| Lossy semantics | offer never blocks/retries --- dropping an event is safe (see below) |
| Thread-striped ring buffers | Producers spread across stripes (by thread probe hash) → CAS on different cache lines → near-zero contention. Stripe count grows with detected contention. |
| Per-slot CAS, fixed-size ring | Lock-free MPSC offer, no allocation on the hot path |
| Single-consumer drain (CAS-gated) | Only one maintenance thread drains → the AccessOrderDeques need no locks |

Two distinct drop causes (both silent, both safe):

final int offer(E e) {
    long head = readCounter;
    long tail = writeCounterOpaque;
    if ((tail - head) >= maximum) return FULL;       // cause 1: buffer full
    if (casWriteCounter(tail, tail + 1)) {           // claim a slot
        buffer.lazySet((int)(tail & mask), e);
        return SUCCESS;
    }
    return FAILED;                                    // cause 2: lost the CAS race
}
  1. FULL --- stripe at capacity (maintenance hasn't drained yet). Mitigated by draining + dynamic stripe growth.
  2. FAILED --- two threads on the same stripe raced the same tail slot; the CAS loser drops its event with no retry loop. Mitigated by striping (different threads usually hit different stripes).

A retry loop is deliberately avoided --- it would reintroduce the contention striping exists to remove. Both drops are safe because read events only feed the recency heuristic; a missing hint makes LRU ordering slightly stale, never wrong.
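For readers who want to run the offer logic, here is a minimal single-ring sketch built on java.util.concurrent.atomic. It mirrors the three outcomes described above but is not Caffeine's BoundedBuffer: the class LossyRing, its constant values, and the drainTo method are assumptions for illustration, and striping is omitted.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Consumer;

/** One lossy ring: many producers may offer, exactly one consumer may drain. */
final class LossyRing<E> {
    static final int SUCCESS = 0, FAILED = -1, FULL = 1;   // illustrative result codes
    static final int CAPACITY = 16;                        // power of two
    static final int MASK = CAPACITY - 1;

    private final AtomicReferenceArray<E> buffer = new AtomicReferenceArray<>(CAPACITY);
    private final AtomicLong writeCounter = new AtomicLong();
    private final AtomicLong readCounter = new AtomicLong();

    int offer(E e) {
        long head = readCounter.get();
        long tail = writeCounter.get();
        if (tail - head >= CAPACITY) return FULL;          // cause 1: ring at capacity, drop
        if (writeCounter.compareAndSet(tail, tail + 1)) {  // claim the slot at 'tail'
            buffer.lazySet((int) (tail & MASK), e);
            return SUCCESS;
        }
        return FAILED;                                     // cause 2: lost the race, drop, no retry
    }

    /** Single-consumer drain: hands entries to the consumer in insertion order. */
    void drainTo(Consumer<E> consumer) {
        long head = readCounter.get();
        long tail = writeCounter.get();
        while (head != tail) {
            int index = (int) (head & MASK);
            E e = buffer.get(index);
            if (e == null) break;                          // slot claimed but not yet published
            buffer.lazySet(index, null);
            consumer.accept(e);
            head++;
        }
        readCounter.lazySet(head);
    }
}
```

Note that a caller treats FULL and FAILED the same way for the event itself (it is simply not recorded); they differ only in what they signal to the surrounding machinery.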

4.5 Read Ordering Under Striping

Striping means there is no global read-operation order --- and Caffeine doesn't need one:

  • Cross-stripe order is lost: events from different stripes are drained independently, so the real-time interleaving across threads is not reconstructed.
  • Per-stripe FIFO is preserved: each ring buffer drains in insertion order, and a given thread (stable probe hash) usually maps to the same stripe --- so one thread's own reads stay roughly ordered.
  • Value correctness is independent of the buffer: data.get(key) reads the CHM directly (happens-before via volatile semantics). "Did I read the latest value?" never depends on buffer ordering --- the buffer only carries a recency hint.
  • Drain order is fixed per pass: maintenance() always runs drainReadBuffer() before drainWriteBuffer() (§4.2).

A global sequence number would force every read to CAS one shared counter --- exactly the serialization point striping eliminates. The eviction policy only needs approximate recency, so relaxed ordering costs a tiny bit of hit-rate accuracy and nothing in correctness.

4.6 ReadBuffer vs Guava's recencyQueue

Same concept, different implementation. Both record which entries were read (a reference/hint, not the value --- the value lives in the CHM / entry table) so LRU reordering can be batched off the hot path.

| Aspect | Guava recencyQueue | Caffeine ReadBuffer |
| --- | --- | --- |
| Scope | one per segment (default 4, set by concurrencyLevel); a key routes to its segment by hash | thread-striped (by thread probe) |
| Structure | unbounded ConcurrentLinkedQueue | bounded fixed-size ring buffer(s) |
| Loss | never drops --- grows until drained | lossy --- drops on FULL or lost-CAS |
| Drain trigger | any lock-holding op: writes, eviction/cleanup, every 64th read (the DRAIN_THRESHOLD mask), and reads that take the lock (e.g. load-on-miss) | dedicated maintenance task (CAS-gated) |
| Lock interaction | drained under the segment ReentrantLock | single-consumer, no lock on the deque |
| Failure mode | bloat when neither writes nor the read-drain threshold fire often enough (queue grows) | bounded --- worst case is dropped hints, not memory growth |

Two clarifications people often get wrong: (1) recencyQueue is not one global queue --- there is one per segment, each with its own lock and LRU accessQueue. (2) It's not drained only on writes --- Guava's postReadCleanup also drains it from the read path once every 64 reads ((readCount & DRAIN_THRESHOLD) == 0, with DRAIN_THRESHOLD = 0x3F → tryLock + drain). Draining always happens under the segment lock, which is the cost Caffeine avoids by using a single lock-free consumer.

Two practical wins for Caffeine: bounded (can't bloat like Guava's queue in read-heavy, write-light traffic) and striped by thread, not segment (hot keys in one segment don't funnel through a single queue + lock).

4.7 When Maintenance Actually Runs (shouldDrainBuffers + scheduleDrainBuffers)

A read records a recency hint but usually does not run maintenance --- work is amortized. Two calls decide when the deferred drain fires:

boolean delayable = (readBuffer.offer(node) != Buffer.FULL);
if (shouldDrainBuffers(delayable)) scheduleDrainBuffers();

delayable = "can we defer?" --- true if the buffer accepted the event (no urgency); false if the buffer was FULL (the hint was dropped → draining is now urgent).

shouldDrainBuffers(delayable) consults a drainStatus state machine (CAS-updated):

| State | Meaning | Decision |
| --- | --- | --- |
| IDLE | nothing scheduled/running | drain only if !delayable (buffer full / urgent); otherwise skip and amortize |
| REQUIRED | work pending (e.g. a write flagged it) | drain now |
| PROCESSING_TO_IDLE | a drain is already running | don't start another; a concurrent write CASes the state to PROCESSING_TO_REQUIRED so the running drain is followed by one more pass |
| PROCESSING_TO_REQUIRED | running + re-run already noted | do nothing |

The key efficiency point: the common hot-path read hits IDLE + delayable → returns false → no maintenance triggered. Events pile up and get batched, keeping reads nearly free.

scheduleDrainBuffers() launches the drain at most once, without blocking:

void scheduleDrainBuffers() {
    if (drainStatus() >= PROCESSING_TO_IDLE) return;   // already running
    if (evictionLock.tryLock()) {                      // tryLock --- never blocks the caller
        try {
            if (drainStatus() >= PROCESSING_TO_IDLE) return;  // re-check under lock
            setDrainStatusRelease(PROCESSING_TO_IDLE);
            executor.execute(drainBuffersTask);        // run maintenance() async
        } finally { evictionLock.unlock(); }
    }
    // tryLock failed → another thread is already scheduling → just return
}
  • tryLock() not lock() --- the hot path never stalls; if someone else is scheduling, this thread returns immediately.
  • Single-flight --- the double >= PROCESSING_TO_IDLE checks ensure only one drain runs at a time. This is why the AccessOrderDeques need no locks: exactly one thread mutates them.
  • Async hand-off --- executor.execute(...) runs maintenance() (drain read → drain write → expire → evict → climb, §4.2) off the calling thread (default ForkJoinPool.commonPool).

End-to-end on a read: IDLE + buffer-not-full → no drain (amortize); buffer FULL or REQUIRED → one async drain; concurrent triggers collapse into a single running drain (with a re-run flag if work arrives mid-drain).
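The state machine can be modeled in a few lines. The sketch below is an approximation for experimentation, not Caffeine's code: the class DrainStatusDemo, the numeric state values, and the helper methods noteWrite, beginDrain and finishDrain are assumptions introduced here; only the decision table they encode comes from the description above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal model of the drainStatus state machine. */
final class DrainStatusDemo {
    static final int IDLE = 0, REQUIRED = 1, PROCESSING_TO_IDLE = 2, PROCESSING_TO_REQUIRED = 3;
    final AtomicInteger status = new AtomicInteger(IDLE);

    /** Read path: should this caller try to schedule a drain? */
    boolean shouldDrainBuffers(boolean delayable) {
        switch (status.get()) {
            case IDLE:     return !delayable;   // only when the buffer was full
            case REQUIRED: return true;         // pending work: drain now
            default:       return false;        // a drain is already running
        }
    }

    /** Write path (assumed helper): flag that there is work for the next pass. */
    void noteWrite() {
        if (status.compareAndSet(IDLE, REQUIRED)) return;
        status.compareAndSet(PROCESSING_TO_IDLE, PROCESSING_TO_REQUIRED);
    }

    /** Returns true if this caller won the right to run the drain (single-flight). */
    boolean beginDrain() {
        int s = status.get();
        return s < PROCESSING_TO_IDLE && status.compareAndSet(s, PROCESSING_TO_IDLE);
    }

    /** Returns true if work arrived mid-drain and another pass is needed. */
    boolean finishDrain() {
        if (status.compareAndSet(PROCESSING_TO_IDLE, IDLE)) return false;
        status.set(REQUIRED);
        return true;
    }
}
```

The key property to observe is that a second beginDrain call fails while a drain is in progress, and that a write arriving mid-drain leaves the state at REQUIRED afterwards instead of being forgotten.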

4.8 Ring Buffer Counter Mechanics

Both the ReadBuffer (per stripe) and the WriteBuffer are ring buffers: a fixed-size array plus two monotonically-increasing 64-bit counters --- write (producers) and read (the maintenance consumer). The physical slot is counter & mask where mask = capacity - 1 (capacity is a power of two, so the AND is a cheap modulo).

capacity = 8, mask = 7
slots:   [0][1][2][3][4][5][6][7]
counters are absolute and only ever climb; slot = counter & 7
write++ → 8 → slot (8 & 7) = 0   ← only the SLOT INDEX wraps, never the counter

Who advances which counter (mind the terminology inversion):

| Action | Counter advanced | Why |
| --- | --- | --- |
| cache get() → offer(node) into the buffer | write | the get is a producer into the ring |
| maintenance drains / consumes a node | read | maintenance is the consumer |

A cache read advances the ring's write counter (it produces a hint into the ring); the ring's read counter belongs to the maintenance consumer. So get → write++, drain → read++.

The fullness invariant --- gap = write − read = number of unconsumed entries, capped at capacity:

if ((tail - head) >= maximum) return FULL;   // maximum = ring capacity

This guarantees the producer can never overwrite an unread slot. Write can only wrap onto slot 0 again once read has advanced past it:

read=8 write=8  gap=0  empty
get → produce:        read=8  write=9   gap=1   (node stored at slot 8&7 = 0, then write advances)
get → produce:        read=8  write=10  gap=2   (node stored at slot 9&7 = 1)
maintenance drains:   read=10 write=10  gap=0   (read catches up to write → room restored)

If read=0, write=8 the check 8 - 0 >= 8 is true → FULL → the event is dropped (ReadBuffer) rather than clobbering slot 0's still-unread entry. For the ReadBuffer, FULL also flags !delayable and triggers a drain to advance read (§4.7).

Two different "maximums":

| "Maximum" | Bounds | Value |
| --- | --- | --- |
| maximum in offer | the gap write − read (unconsumed entries) | ring capacity --- small, fixed (e.g. ~16/stripe) |
| counter ceiling | the read / write counters themselves | Long.MAX_VALUE --- never reached (~centuries at 1B ops/s) |

Counter overflow is a non-issue: the fullness check compares the delta tail - head (two's-complement subtraction), not absolute values, so it stays correct even across a wraparound --- the same trick System.nanoTime() comparisons use.
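A short worked example shows why comparing the delta is safe where comparing absolute values is not. The class and method names (RingCounterDemo, gap, slot) are made up for this illustration.

```java
/** Demonstrates overflow-safe gap computation and slot indexing for ring counters. */
final class RingCounterDemo {
    /** Number of unconsumed entries; correct even if 'write' has wrapped past Long.MAX_VALUE. */
    static long gap(long write, long read) {
        return write - read;
    }

    /** Physical slot for an absolute counter; capacity must be a power of two. */
    static int slot(long counter, int capacity) {
        return (int) (counter & (capacity - 1));
    }

    public static void main(String[] args) {
        long read = Long.MAX_VALUE - 1;
        long write = read + 3;                  // overflows to a negative value
        System.out.println(write < read);       // true: comparing absolute values is misleading
        System.out.println(gap(write, read));   // 3: the two's-complement delta is still exact
        System.out.println(slot(8, 8));         // 0: only the slot index wraps
        System.out.println(slot(9, 8));         // 1
    }
}
```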

FULL is per-stripe, not global (ReadBuffer): each afterRead resolves to one stripe via the thread's probe hash and calls offer on that stripe only. return FULL means that one stripe is full --- other stripes may be empty. A single stripe's FULL is enough to drop the hint and schedule a drain, which then flushes all stripes.

What grows the stripe count: contention, not fullness. BoundedBuffer is a StripedBuffer (same pattern as the JDK's Striped64/LongAdder). It adds stripes only when a producer loses the CAS on its slot --- the FAILED outcome --- capped at 4 × ceilingPowerOfTwo(NCPU). A FULL result does not grow the table; it just drops the hint. Each individual ring stays a fixed size (16) --- Caffeine adds more rings (to spread contending producers across distinct cache lines), it never enlarges one ring.

| Stripe offer result | Meaning | Grows stripe count? |
| --- | --- | --- |
| SUCCESS | enqueued | No |
| FULL | this stripe's ring is at capacity | No --- drops the hint |
| FAILED | lost the CAS race (contention) | Yes |

5. Expiration Strategies

5.1 Timer Wheel (TimerWheel)

// Caffeine uses a hierarchical timer wheel for variable (per-entry) expiration
// O(1) scheduling and batch expiration, with no scan over the live entries
final class TimerWheel<K, V> {
    // 5-tier timer wheel, bucket sizes increase per tier
    // Approximately: 1.07s → 1.14min → 1.22hr → 1.63day → 6.5day
    Node<K, V>[][] wheel;
    long[] nanos;  // Current time pointer per tier

    // Schedule expiration: O(1) insertion
    void schedule(Node<K, V> node) {
        long delay = node.getExpirationTime() - nanos[0];
        int tier = findTier(delay);
        int bucket = findBucket(tier, node.getExpirationTime());
        link(wheel[tier][bucket], node);  // Insert into bucket linked list
    }

    // Advance time: cascade entries from coarser tiers down to finer ones and expire due buckets
    void advance(long currentTime) { /* ... */ }
}

Principle --- a timer wheel schedules huge numbers of timeouts with O(1) insert, remove, and expire, beating sorted structures (O(log n)) and a naive scan over all entries (O(n)). Caffeine relies on it for variable, per-entry expiration (Expiry, §5.2).

  1. Clock-with-buckets: a circular array of buckets, each a doubly-linked list of entries; a time pointer sweeps around it. bucket = (expirationTime / tickDuration) mod wheelSize.

    • Insert → compute index, append to that bucket → O(1).
    • Remove → unlink (entries hold their own prev/next) → O(1).
    • Expire → when the pointer reaches a bucket, everything in it is due; no scanning of other buckets, no sorting.
  2. Hierarchical (5 levels) --- one wheel can't cover both "1s" and "6 days" at fine resolution, so Caffeine nests wheels at coarser granularities (like a clock's second/minute/hour hands, or an odometer): ~1.07s → ~1.14min → ~1.22hr → ~1.63day → ~6.5day per bucket. An entry goes in the level matching its distance in the future, keeping every wheel small while covering a vast range.

  3. Cascading (re-bucketing) --- a coarse bucket only knows "expires sometime this ~hour," not the exact second, so entries don't fire from coarse levels directly. As the pointer advances and a higher-level bucket comes due, its entries are redistributed down into the finer level that can now place them precisely. An entry trickles Level 4→3→2→1→0 over its life, each move O(1), only a few moves total. This is the trade-off vs. a single fine wheel: a little re-bucketing work in exchange for bounded memory + fine resolution across a huge range.

  4. Lazily advanced inside maintenance --- there is no dedicated ticking thread. expireEntries() calls timerWheel.advance(now), sweeping the pointer from its last position to now, cascading coarse→fine and firing due Level-0 buckets. It piggybacks on the single-threaded, batched maintenance pass (§4.2). The optional Scheduler (§9) just nudges a maintenance run near the next deadline so entries still expire with no traffic (otherwise advance runs only when something else triggers maintenance).

| Approach | Insert | Find/expire next | Drawback |
|---|---|---|---|
| Sorted list / PriorityQueue | O(log n) | O(log n) | log-factor per op; reorder on every change |
| Linear scan (Guava-style) | O(1) | O(n) scan | must scan all entries to find expired ones |
| Hierarchical timer wheel | O(1) | O(1) amortized | small re-bucketing cost on cascade (cheap) |

O(1) scheduling + O(1) "what expired now" is what makes Caffeine's per-entry TTL (Expiry, §5.2) practical at scale --- something Guava's coarse scan can't do efficiently.

5.2 Expiration Types

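// Fixed policies: expireAfterWrite and expireAfterAccess may be combined
Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10))   // TTL after write
    .expireAfterAccess(Duration.ofMinutes(5));  // TTL after last access

// Custom per-entry TTL: expireAfter(Expiry) cannot be combined with the two
// fixed policies above (the builder throws IllegalStateException)
Caffeine.newBuilder()
    .expireAfter(new Expiry<K, V>() {
        @Override public long expireAfterCreate(K k, V v, long now) {
            return v.getTtl().toNanos();  // Independent TTL per entry (getTtl() is the value's own method)
        }
        @Override public long expireAfterUpdate(K k, V v, long now, long currentDuration) {
            return currentDuration;  // Keep the remaining time: an update doesn't move the deadline
        }
        @Override public long expireAfterRead(K k, V v, long now, long currentDuration) {
            return currentDuration;  // Reads don't move the deadline either
        }
    });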

Key feature : Expiry supports per-entry TTL, which Guava Cache cannot do.

6. Async Support

// AsyncLoadingCache from a synchronous CacheLoader (the load runs on the cache's executor)
AsyncLoadingCache<String, Widget> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(Duration.ofMinutes(10))
    .buildAsync(key -> loadFromDB(key));

CompletableFuture<Widget> future = cache.get("key");

// AsyncCacheLoader: the loader builds the future itself, using the executor it is handed
AsyncLoadingCache<String, Widget> asyncCache = Caffeine.newBuilder()
    .buildAsync((key, executor) ->
        CompletableFuture.supplyAsync(() -> loadFromDB(key), executor));

// Manual AsyncCache (without loader)
AsyncCache<String, Widget> manual = Caffeine.newBuilder().buildAsync();
CompletableFuture<Widget> f = manual.get("key", (k, exec) -> ...);

Semantics:

  • buildAsync() returns an AsyncCache / AsyncLoadingCache whose get() returns a CompletableFuture; while a load is in flight, other threads asking for the same key receive the same future (prevents cache stampede).
  • On load failure, the future is automatically removed from cache (next request retries loading).
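Both semantics can be reproduced with a plain ConcurrentHashMap of futures. The sketch below is a hypothetical stand-in (SharedFutureCache is not a Caffeine class) that has no eviction or expiration and assumes the loader never returns a null future; it only shows why concurrent callers share one load and why a failed load is retried.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for an async cache; illustrates the two semantics only. */
final class SharedFutureCache<K, V> {
    private final ConcurrentHashMap<K, CompletableFuture<V>> futures = new ConcurrentHashMap<>();

    /** Returns the in-flight or completed future for the key, starting a load if absent. */
    CompletableFuture<V> get(K key, Function<K, CompletableFuture<V>> loader) {
        boolean[] created = {false};
        // computeIfAbsent is atomic per key, so only one caller invokes the loader.
        CompletableFuture<V> future = futures.computeIfAbsent(key, k -> {
            created[0] = true;
            return loader.apply(k);  // assumed to return a non-null future
        });
        if (created[0]) {
            // A failed load must not stay cached, otherwise the error would be served forever.
            future.whenComplete((value, error) -> {
                if (error != null) {
                    futures.remove(key, future);
                }
            });
        }
        return future;
    }

    boolean contains(K key) {
        return futures.containsKey(key);
    }
}
```

Two get calls for the same key while the first load is pending return the identical future object, and completing that future exceptionally removes the mapping so the next get starts a fresh load.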

7. Adaptive Adjustment (Hill Climbing)

7.1 The Problem It Solves

The W-TinyLFU split between Window (LRU) and Main (SLRU, frequency-gated) has no single "right" ratio:

  • A recency-heavy / churny workload (lots of newly-popular keys) wants a bigger Window, so fresh items survive long enough to build frequency before being judged by the admission filter.
  • A frequency-heavy / scan-resistant workload (stable hot set + cold scans) wants a smaller Window, so the LFU main region dominates and cold one-shot keys get rejected fast.

Rather than forcing you to tune this, Caffeine treats the Window-vs-Main ratio as a knob it optimizes online , using hit rate as the objective function. That is hill climbing: nudge the knob, measure whether hit rate improved, then keep moving in the direction that helped.

7.2 The Algorithm

The climb() step runs at the end of each maintenance() pass (§4.2), but the actual adjustment fires only once per sample interval, not on every maintenance run.

// Window and Main proportions dynamically adjusted (Hill Climbing algorithm)
// Goal: maximize hit rate
// Range: Window proportion between 0.2% - 80%

void climb() {
    double hitRate = currentHitRate();
    double delta = hitRate - previousHitRate;   // empirical slope of the "hill"

    if (delta >= 0) {
        // Right direction: hit rate held or improved after the last adjustment
        stepSize *= 2;                           // accelerate: keep the direction, climb faster
    } else {
        // Wrong direction (hit rate got worse): overshot the peak or went the wrong way
        stepSize = -stepSize / 2;                // brake and reverse (braking is what makes it converge)
    }
    adjustWindowSize(stepSize);                  // stepSize is signed: positive grows the Window
    previousHitRate = hitRate;
}

Step by step:

  1. Sampling window. Caffeine accumulates requests until it has seen roughly 10 × maximumSize of them (same scale as the FrequencySketch sample). Hit rate over too few requests is noisy, so adjustments are batched.
  2. Measure the gradient. delta = currentHitRate − previousHitRate is the empirical derivative of hit rate with respect to the last size change, i.e. the slope of the hill.
  3. Decide direction & step:
    • Hit rate held or improved → we're climbing → keep the direction and double the step to climb faster.
    • Hit rate dropped → halve the step and flip direction . Shrinking the step is what makes the search converge instead of oscillating forever.
  4. Apply the move. adjustWindowSize shifts capacity between the Window and Main's Protected region. Growing the window steals slots from Protected; shrinking it gives them back. Entries aren't discarded --- the deque boundaries move and normal eviction reconciles sizes on later passes.
  5. Clamp. Window is bounded to 0.2% -- 80% of total capacity: never collapses to zero (always keep a recency buffer for bursts) and never fully starves the frequency-based main region.
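The steps above can be exercised against a synthetic hit-rate curve. The sketch below is a toy under stated assumptions, not Caffeine's code: HillClimberSketch and its fields are hypothetical, the window is a fraction rather than an entry count, the hit-rate curve is supplied by the caller, and the step shrinks by a constant factor of 0.98 per sample (instead of doubling and halving) so that the search visibly settles.

```java
import java.util.function.DoubleUnaryOperator;

/** Toy hill climber over a synthetic hit-rate curve; hypothetical, not Caffeine's implementation. */
final class HillClimberSketch {
    static final double MIN_WINDOW = 0.002;  // clamp: 0.2% of capacity
    static final double MAX_WINDOW = 0.80;   // clamp: 80% of capacity

    private double window;            // current Window fraction of total capacity
    private double step;              // signed step: positive grows the Window
    private final double decay;       // factor applied to the step on every sample
    private double previousHitRate;   // hit rate observed in the previous sample

    HillClimberSketch(double initialWindow, double initialStep, double decay) {
        this.window = initialWindow;
        this.step = initialStep;
        this.decay = decay;
    }

    /** One sample: measure the slope, pick a direction, move the boundary, clamp. */
    double climb(double hitRate) {
        double delta = hitRate - previousHitRate;
        if (delta < 0) {
            step = -step;             // the last move hurt, so reverse
        }
        step *= decay;                // shrink the step so the search settles
        window = Math.max(MIN_WINDOW, Math.min(MAX_WINDOW, window + step));
        previousHitRate = hitRate;
        return window;
    }

    /** Runs the climber for a number of samples against a hit-rate curve of the window fraction. */
    static double run(DoubleUnaryOperator hitRateOf, int samples) {
        HillClimberSketch climber = new HillClimberSketch(0.01, 0.0625, 0.98);
        for (int i = 0; i < samples; i++) {
            climber.climb(hitRateOf.applyAsDouble(climber.window));
        }
        return climber.window;
    }
}
```

For a single-peaked curve such as `w -> 1.0 - (w - 0.3) * (w - 0.3)`, a few hundred samples leave the window oscillating tightly around 0.3. The climber never sees the curve itself, only the sequence of measured hit rates.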

7.3 Production Refinements (Beyond the Pseudocode)

The climb() above is the conceptual model. Caffeine's actual implementation (the climb() logic in BoundedLocalCache) differs in the details:

  • Geometric step decay. Rather than doubling and halving, the step shrinks by a constant factor (HILL_CLIMBER_STEP_DECAY_RATE, ~0.98) on each sample, and its sign flips when the hit rate drops, so the search settles gradually.
  • Step size proportional to capacity. The initial step (HILL_CLIMBER_STEP_PERCENT, ~6.25%) is a fraction of maximumSize, so behavior is consistent for a 1K cache and a 1M cache.
  • Restart on large hit-rate movement. A big swing in hit rate between samples (a workload regime change) resets the step to its initial size so the optimizer re-explores. Without this, a cache that converged at hour 1 could never adapt to a different access pattern at hour 5.

7.4 Why Hill Climbing Specifically

Hit rate as a function of window ratio is, empirically, roughly unimodal (one broad peak) for most real workloads --- the ideal shape for hill climbing, since a local optimum is the global optimum. A cheap gradient-follow finds it without expensive global search, costing essentially two ALU ops per sample plus a counter read (negligible against the maintenance work already happening).

The trade-off: on a multi-modal or rapidly-thrashing workload, hill climbing can sit in a local optimum or chase a moving target. The restart logic mitigates this but doesn't fully eliminate it; that's the theoretical weakness.

7.5 Convergence Behavior

| Workload | Window trend | Reason |
|---|---|---|
| Scan / loop over cold keys | → small (toward 0.2%) | One-shot keys never build frequency; a big window just delays evicting garbage. LFU main region dominates → scan resistance. |
| High new-hotspot churn (trending content) | → large (toward 80%) | Fresh items need recency runway to accumulate hits before the admission filter judges them. LRU-ish behavior wins. |
| Stable hot set (Zipfian, fixed popularity) | → mid/small | Frequency sketch already separates hot from cold; the window just absorbs the occasional burst. |
| Mixed / shifting | oscillates then re-converges | Restart logic lets it track regime changes. |

8. Usage Examples

// Synchronous LoadingCache
LoadingCache<String, Widget> cache = Caffeine.newBuilder()
    .maximumSize(10_000)
    // NOTE: maximumSize and maximumWeight are MUTUALLY EXCLUSIVE. The weight-based
    // alternative is commented out so the example runs. Pick ONE (see §8.1); setting both throws.
    // .maximumWeight(1_000_000)
    // .weigher((String k, Widget v) -> v.size())
    .expireAfterWrite(Duration.ofMinutes(10))
    .expireAfterAccess(Duration.ofMinutes(5))
    .refreshAfterWrite(Duration.ofMinutes(1))
    .scheduler(Scheduler.systemScheduler())   // Proactive expiration cleanup
    .executor(Executors.newFixedThreadPool(4)) // Maintenance thread pool
    .recordStats()
    .evictionListener((k, v, cause) ->         // Eviction only (size/expired/collected)
        log.info("Evicted {} cause={}", k, cause))
    .removalListener((k, v, cause) ->          // All removals (including explicit)
        log.info("Removed {} cause={}", k, cause))
    .build(key -> loadFromDB(key));

// Manual Cache
Cache<String, Widget> manual = Caffeine.newBuilder()
    .maximumSize(10_000)
    .build();

manual.get("key", k -> loadFromDB(k));  // computeIfAbsent semantics
manual.put("key", widget);
manual.invalidate("key");
manual.policy().eviction().ifPresent(e -> e.setMaximum(20_000));  // Runtime adjustment

// Statistics
CacheStats stats = cache.stats();
stats.hitRate();
stats.evictionCount();
stats.averageLoadPenalty();

8.1 Sizing: maximumSize vs maximumWeight

Two ways to bound a cache. Use exactly one: setting both makes the builder throw IllegalStateException as soon as the second one is configured.

| Option | Bounds by | When to use |
|---|---|---|
| maximumSize(n) | entry count (at most n entries) | Entries are roughly uniform in cost. The simple, common default. |
| maximumWeight(w) + weigher | summed weight of all entries | Entry sizes vary a lot; you want to cap total footprint (bytes, item count, cost score), not count. |

maximumWeight requires a weigher (and vice versa) --- the weigher assigns each entry's cost, and W-TinyLFU evicts once the sum of weights exceeds the budget:

LoadingCache<String, Widget> cache = Caffeine.newBuilder()
    .maximumWeight(1_000_000)                  // total weight budget (no maximumSize!)
    .weigher((String k, Widget v) -> v.sizeInBytes())  // int >= 0, cost of one entry
    .build(key -> loadFromDB(key));

Rules and gotchas:

  • Mutually exclusive with maximumSize --- pick one.
  • Weigher returns an int, must be ≥ 0. Returning 0 means the entry never counts toward the limit (effectively unbounded for that entry --- use deliberately).
  • Weight is static --- computed once at insert (and on update), never re-read. Don't weigh a mutable size; if v.sizeInBytes() changes later, the cache won't notice.
  • Keep the weigher fast and side-effect-free --- it runs on the write path. Prefer a precomputed size field over a deep traversal.
  • An entry heavier than the maximum is admitted but immediately evictable, so set the budget above your largest expected entry.
  • Weight ≠ exact bytes --- it's any unit you choose (estimated bytes, collection element count, relative cost).
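The weight rules above can be seen in a small weight-bounded LRU. This is a sketch under stated assumptions, not how Caffeine evicts: WeightedLruSketch is hypothetical, it uses a plain access-ordered LinkedHashMap instead of W-TinyLFU, and it is not thread-safe. It does show the weigher being called once per write, the running weight total, and zero-weight entries being skipped by size eviction.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntBiFunction;

/** Weight-bounded LRU; a hypothetical stand-in that ignores W-TinyLFU admission. */
final class WeightedLruSketch<K, V> {
    /** A value together with the weight computed when it was written. */
    private static final class Weighted<V> {
        final V value;
        final int weight;

        Weighted(V value, int weight) {
            this.value = value;
            this.weight = weight;
        }
    }

    private final LinkedHashMap<K, Weighted<V>> map = new LinkedHashMap<>(16, 0.75f, true);
    private final long maximumWeight;
    private final ToIntBiFunction<K, V> weigher;
    private long weightedSize;

    WeightedLruSketch(long maximumWeight, ToIntBiFunction<K, V> weigher) {
        this.maximumWeight = maximumWeight;
        this.weigher = weigher;
    }

    void put(K key, V value) {
        int weight = weigher.applyAsInt(key, value);  // computed once per write, never re-read
        if (weight < 0) {
            throw new IllegalArgumentException("weight must be >= 0");
        }
        Weighted<V> old = map.put(key, new Weighted<>(value, weight));
        if (old != null) {
            weightedSize -= old.weight;               // an update replaces the old weight
        }
        weightedSize += weight;
        // Evict from the least-recently-used end until the total fits the budget.
        Iterator<Map.Entry<K, Weighted<V>>> it = map.entrySet().iterator();
        while (weightedSize > maximumWeight && it.hasNext()) {
            int victimWeight = it.next().getValue().weight;
            if (victimWeight == 0) {
                continue;                             // zero-weight entries never free any budget
            }
            weightedSize -= victimWeight;
            it.remove();
        }
    }

    V get(K key) {
        Weighted<V> entry = map.get(key);             // also marks the entry as recently used
        return (entry == null) ? null : entry.value;
    }

    boolean containsKey(K key) {
        return map.containsKey(key);
    }

    long weightedSize() {
        return weightedSize;
    }
}
```

With a budget of 10 and a weigher of `(k, v) -> v.length()`, two 4-character values fit, and a third pushes the total to 12, so the least-recently-used entry is evicted to bring it back to 8.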

Runtime adjustment via the policy API (works for both modes):

cache.policy().eviction().ifPresent(e -> {
    long limit = e.getMaximum();                 // current size/weight limit
    e.setMaximum(2_000_000);                      // resize live
    long used  = e.weightedSize().orElse(0);      // current total weight (weight mode)
});

evictionListener vs removalListener

| Listener | Fires for | Execution thread |
|---|---|---|
| evictionListener | Only SIZE / EXPIRED / COLLECTED (not user-initiated) | Same thread as the maintenance task (synchronous) |
| removalListener | All removals (including EXPLICIT / REPLACED) | Asynchronous (specified executor) |

Recommendations:

  • Need immediate response to eviction (e.g., cleanup associated resources) → evictionListener
  • Need logging, event publishing → removalListener (async, doesn't block maintenance)

The five RemovalCauses --- split by who caused the removal (RemovalCause.wasEvicted() returns true for the first three):

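| Cause | What happened | Automatic? | evictionListener | removalListener |
|---|---|---|---|---|
| SIZE | policy evicted a victim (over maximumSize/maximumWeight) | yes | fires | fires |
| EXPIRED | TTL passed (expireAfter* / Expiry) | yes | fires | fires |
| COLLECTED | key/value GC'd (weakKeys / weakValues / softValues) | yes | fires | fires |
| EXPLICIT | you called invalidate / remove | no | skipped | fires |
| REPLACED | you overwrote the value (put / replace / updating compute) | no | skipped | fires |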

evictionListener fires only for automatic removals --- the ones Caffeine itself decides. It skips EXPLICIT/REPLACED because the caller already knows about those.
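The routing rule is small enough to write down. In the sketch below the Cause enum is a stand-in that mirrors the five causes and their wasEvicted() values as described above; it is not Caffeine's RemovalCause class, and ListenerDispatchSketch is a hypothetical name. Threading (inline versus executor) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows which listener sees which removal cause; a hypothetical illustration. */
final class ListenerDispatchSketch {
    /** Stand-in for the five removal causes, split by who caused the removal. */
    enum Cause {
        SIZE(true), EXPIRED(true), COLLECTED(true), EXPLICIT(false), REPLACED(false);

        private final boolean evicted;

        Cause(boolean evicted) {
            this.evicted = evicted;
        }

        /** True for automatic removals decided by the cache itself. */
        boolean wasEvicted() {
            return evicted;
        }
    }

    /** Returns the listeners that would be notified for a removal with the given cause. */
    static List<String> notified(Cause cause) {
        List<String> listeners = new ArrayList<>();
        if (cause.wasEvicted()) {
            listeners.add("evictionListener");  // automatic removals only
        }
        listeners.add("removalListener");       // every removal
        return listeners;
    }
}
```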

Who invokes evictionListener, and on which thread

Automatic removals are usually discovered inside maintenance() (evictEntries() / expireEntries(), §4.2). The callback runs inline, synchronously, while the eviction lock is held --- not handed off to an executor. So yes, the maintenance task invokes evictionListener directly.

The precise guarantee is: synchronous, under the eviction lock, on whatever thread performed the removal --- usually the maintenance thread, but it can also be a user thread when eviction/expiration is detected on the request path (e.g. a get hitting an expired entry, or a put that triggers synchronous eviction via the afterWrite lock fallback, §4.1).

maintenance()  (or a user thread under the eviction lock)
   evictEntries() / expireEntries() picks an entry
        → unlink from deque + remove from CHM
        → evictionListener.onRemoval(k, v, cause)   ← inline, blocking, under lock
        → removalListener  → dispatched async to the Executor (not inline)

⚠️ Because it runs inside maintenance under the lock, a slow or throwing evictionListener stalls maintenance --- blocking further eviction, expiration, and buffer drains. Keep it fast and non-throwing; put heavy work (I/O, logging, event publishing) in removalListener so it runs off the maintenance thread.

9. Scheduler (Proactive Expiration Cleanup)

// Solves Guava's lazy expiration problem (expired entries not cleaned without access)
LoadingCache<K, V> cache = Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10))
    .scheduler(Scheduler.systemScheduler())  // Recommended for JDK 9+
    // Or a custom ScheduledExecutorService instead (configure only one scheduler):
    // .scheduler(Scheduler.forScheduledExecutorService(scheduledExecutor))
    .build(loader);

Problem without Scheduler : If the cache sees no reads or writes for a long time, expired entries are not cleaned up and keep occupying heap memory.

With Scheduler : Caffeine proactively schedules a maintenance run (the drainBuffers task) for the nearest expiration time, so expired entries are cleaned up promptly.

9.1 How It Works --- Event-Driven, Not Polling

The Scheduler does not periodically scan the cache for expired entries. It schedules one targeted wake-up at the next known deadline , which the timer wheel (§5.1) already tracks. After firing it cleans what's due and re-arms for the new nearest deadline --- a self-rescheduling, one-timer-at-a-time loop:

1. entry written → scheduled in the timer wheel at its deadline
2. maintenance ends → ask the wheel "when is the next expiration?" → time T
3. Scheduler.schedule(executor, drainBuffersTask, T - now, NANOS)   ← single delayed task, no loop
4. at T → task fires → maintenance() → expireEntries() advances the wheel
          → due entries removed, evictionListener fired (EXPIRED)
5. maintenance ends → next deadline T' → schedule again

So it is event-driven and self-rescheduling, never a fixed-interval poll.
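The one-timer-at-a-time loop can be modeled deterministically with a manual clock. The sketch below is hypothetical throughout: RearmingExpirer is not a Caffeine class, a TreeMap stands in for the timer wheel, and instead of a real delayed task it just records the deadline the single pending wake-up is armed for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Models the self-rescheduling wake-up with a manual clock; hypothetical and single-threaded. */
final class RearmingExpirer {
    private final TreeMap<Long, List<String>> deadlines = new TreeMap<>();  // deadline → keys
    private long armedFor = -1;  // deadline of the single pending wake-up, or -1 if none

    /** Step 1: a write registers its deadline, then the wake-up is re-armed. */
    void write(String key, long deadline) {
        deadlines.computeIfAbsent(deadline, d -> new ArrayList<>()).add(key);
        rearm();
    }

    /** Step 4: the wake-up fires, due entries are expired, and the next deadline is armed. */
    List<String> fire(long now) {
        List<String> expired = new ArrayList<>();
        while (!deadlines.isEmpty() && deadlines.firstKey() <= now) {
            expired.addAll(deadlines.pollFirstEntry().getValue());
        }
        rearm();
        return expired;
    }

    /** The deadline of the pending wake-up, or -1 when nothing is scheduled. */
    long armedFor() {
        return armedFor;
    }

    /** Steps 2, 3 and 5: ask for the nearest deadline and arm exactly one wake-up for it. */
    private void rearm() {
        armedFor = deadlines.isEmpty() ? -1 : deadlines.firstKey();
    }
}
```

At no point is there more than one pending wake-up, and when the structure is empty nothing is scheduled at all, which is the difference from a fixed-interval poll.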

9.2 What the Scheduler Triggers vs. Runs

The scheduled task only triggers a maintenance pass; the actual expiry work still runs on the cache's Executor (default ForkJoinPool.commonPool), preserving the single-threaded maintenance model. The Scheduler is just the alarm clock --- maintenance() does the work.

  • Scheduler.systemScheduler() (JDK 9+, recommended) --- backed by a single shared system timer thread, so it's cheap (no per-cache thread).
  • Scheduler.forScheduledExecutorService(ses) --- supply your own.

9.3 Scope and Caveats

  • Time-based expiration only. The Scheduler matters only with expireAfterWrite / expireAfterAccess / Expiry. Size-based eviction (maximumSize) is always handled inline during writes/maintenance and never needs a scheduled wake-up.
  • Tick granularity. Wake-ups ride on the timer wheel's resolution (finest tier ≈ 1.07s), so cleanup is prompt but not nanosecond-exact.
  • Correctness vs. promptness. Lazy on-access checking already guarantees an expired entry is never served . The Scheduler only adds prompt reclamation when idle --- free heap sooner and fire EXPIRED listeners near the real deadline. If you only care that stale data isn't served, you don't strictly need it.

10. Migration from Guava Cache

// Guava
LoadingCache<K, V> guava = CacheBuilder.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .build(new CacheLoader<K, V>() {
        public V load(K key) { return fetch(key); }
    });

// Caffeine (nearly identical API)
LoadingCache<K, V> caffeine = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(Duration.ofMinutes(10))
    .build(key -> fetch(key));

Key API differences:

  1. CacheLoader is a functional interface, so a lambda or method reference works; AsyncCacheLoader is added for async loading
  2. Time parameters use Duration
  3. New Expiry interface (per-entry TTL)
  4. New AsyncLoadingCache / AsyncCache
  5. New Scheduler (proactive expiration)
  6. New evictionListener
  7. refreshAfterWrite is truly async by default (Guava defaults to synchronous)
  8. No concurrencyLevel setting (the lock-free design doesn't need one)

⚠️ Migration pitfalls:

  • Guava's RemovalListener corresponds to Caffeine's removalListener, but Caffeine executes it asynchronously --- behavior differs.
  • Guava doesn't allow CacheLoader.load() to return null (throws exception); Caffeine allows returning null to mean "no such value" (not cached).

11. Performance Comparison

| Scenario | Guava Cache | Caffeine | Improvement |
|---|---|---|---|
| Read (8 threads) | ~33M ops/s | ~100M ops/s | ~3x |
| Write (8 threads) | ~12M ops/s | ~45M ops/s | ~3.7x |
| Hit rate (Zipf distribution) | ~65% (LRU) | ~78% (W-TinyLFU) | +13 points |
| Hit rate (scan loop) | ~0% (LRU) | ~45% (W-TinyLFU) | Significant |
| Hit rate (search workload) | ~35% | ~48% | +13 points |

Data from Caffeine benchmarks.

12. Common Pitfalls

12.1 refreshAfterWrite does NOT mean automatic background refresh

// Common misconception: "refreshAfterWrite(1min) will auto-refresh in background every minute"
// Reality: Refresh is only triggered when **that key is accessed**
Caffeine.newBuilder()
    .refreshAfterWrite(Duration.ofMinutes(1))
    .build(loader);

// If a key is not accessed for a long time, it won't be refreshed
// For true background refresh, you need to schedule it yourself

12.2 Relationship between Scheduler and expireAfter

// Only expireAfterWrite configured, no Scheduler
Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10))
    .build();
// If cache has no access for 1 hour, expired entries still occupy memory → memory leak risk

// Configure Scheduler to ensure expired entries are cleaned promptly
Caffeine.newBuilder()
    .expireAfterWrite(Duration.ofMinutes(10))
    .scheduler(Scheduler.systemScheduler())
    .build();

12.3 evictionListener blocking affects maintenance task

// ❌ evictionListener runs on same thread as maintenance, blocking = delays all eviction
.evictionListener((k, v, cause) -> {
    httpClient.notify(k);  // Network call, potentially slow
})

// ✅ Put complex logic in removalListener (async by default)
.removalListener((k, v, cause) -> httpClient.notify(k))

12.4 buildAsync() loader cannot return null future

// AsyncLoadingCache requires the loader to return a non-null CompletableFuture
// The value inside the future can be null, but the future itself cannot be null
Caffeine.newBuilder().buildAsync((key, executor) -> null);  // ❌ NPE once a get() invokes the loader

// Correct: return CompletableFuture.completedFuture(null) to mean "no such value"
Caffeine.newBuilder().buildAsync((key, executor) ->
    CompletableFuture.completedFuture(null));

12.5 maximumWeight must be paired with weigher

// ❌ maximumWeight without a weigher is rejected; it does NOT fall back to weight=1
Caffeine.newBuilder().maximumWeight(1_000_000).build();  // IllegalStateException at build()

// ✅ Pair with weigher (explicit lambda types let the builder infer K and V)
Caffeine.newBuilder()
    .maximumWeight(1_000_000)
    .weigher((String k, Widget v) -> v.estimateSize())
    .build();

12.6 Atomic overhead of statistics

// recordStats() has atomic counting overhead (LongAdder), visible at extremely high QPS
Caffeine.newBuilder().recordStats().build();

// If only partial metrics needed, customize StatsCounter to reduce overhead
Caffeine.newBuilder()
    .recordStats(() -> new MyStatsCounter())
    .build();

12.7 Executor configuration

// Default maintenance task uses ForkJoinPool.commonPool
// If business code also runs on commonPool (e.g., parallel streams), they may interfere
Caffeine.newBuilder()
    .executor(Executors.newFixedThreadPool(4, new ThreadFactoryBuilder()
        .setNameFormat("caffeine-maintenance-%d").setDaemon(true).build()))
    .build();

13. W-TinyLFU & FrequencySketch --- Q&A Deep Dive

This section consolidates a walkthrough of how W-TinyLFU behaves and how the underlying Count-Min Sketch addresses, increments, finds, ages, and collides on counters.

13.1 Worked Example: Why W-TinyLFU Beats LRU and LFU

Setup : capacity = 10 entries → Window (LRU) = 1 slot (10% here to keep the example small; the default is ~1%), Main (SLRU) = 9 slots. A frequency sketch counts how often every key is requested, including keys not currently cached.

Workload (news site): a few hot articles A, B, C hit constantly, plus a crawler scanning thousands of cold one-time articles X1, X2, X3, ...:

A B A C A B  X1  A B C  X2  A C  X3  X4  A B  X5 ...

Plain LRU fails : each new cold Xn is "most recent," so LRU keeps inserting them and evicts hot A, B, C. The scan destroys the cache.

W-TinyLFU survives (scan resistance):

  1. X1 enters the Window (1 LRU slot).

  2. X2 arrives → Window full → X1 evicted from window, becomes a candidate for Main.

  3. Admission filter compares candidate vs the weakest Main victim:

    freq(X1) = 1     ← scanned once
    freq(C)  = 15    ← hit constantly (counters saturate at 15)

    freq(X1) < freq(C) → X1 rejected, never pollutes Main.

    (Counters are 4-bit saturating, so frequency is capped at 15 --- see §3.3 / §13.3.)

  4. Every cold key (freq 1) gets bounced the same way. The scan churns only through the window; hot keys stay safe.

Catching a newly-viral item (fast adaptation, the LFU weakness solved):

  • New article N starts at freq 0 --- pure LFU would never let it in.
  • N sits in the window and keeps getting hit while trending: freq(N): 1 → 5 → 12 → 15 (4-bit counters saturate at 15).
  • When N is pushed out of the window, its frequency now beats a victim → admitted. The window bought it time to prove itself.

Mental model:

Window = "audition stage" for new/bursty keys. Admission filter + sketch = "bouncer" that only admits a key if it's more popular than whoever it would evict. Aging = the bouncer slowly forgets old fame.

13.2 When Are Counters Halved (Aging)?

Halving (reset()) is triggered by traffic volume, not a wall-clock timer:

int sampleSize;   // = 10 * maximumSize
int size;         // running count of successful increments

public void increment(K key) {
    // ... bump the key's 4 counters ...
    if (added && ++size >= sampleSize) reset();   // halve everything
}
  • size counts successful increments across all keys (not the number of cached entries).
  • The added flag: each of the key's 4 counters is a 4-bit value saturating at 15. The increment touches all 4, and added is the OR of the four per-counter results --- true if at least one counter was below 15 and got bumped. So added is false only when all 4 counters are already at 15 (i.e. the key's estimated frequency, which is the minimum of the 4, has reached 15). A fully-saturated key adds no information, so size does not advance. Note: it's the minimum hitting 15, not the max --- if even one counter is still below 15, added is true.
  • When size reaches sampleSize = 10 × maximumSize, every counter is right-shifted by 1 (÷2) and size is halved.

Example: maximumSize = 10_000 → sampleSize = 100_000. After ~100k recorded increments, all counters halve: 14→7, 12→6, 3→1, 1→0 (counters are 4-bit, so each is at most 15 before halving).

Why : keeps 4-bit counters in range, and is the aging mechanism --- a counter means "recent frequency" (old hits exponentially discounted), so stale hot keys decay and fresh keys can overtake them. Busy caches age fast; idle caches age slowly.

Why halve size (not reset it to 0)? size is the odometer mirroring the total count stored in the counters . Since reset() halves every counter, it halves the total frequency mass --- so size must be halved to stay consistent with what the table actually holds. This also fixes the decay cadence : after the first reset, size becomes sampleSize/2, so each subsequent decay fires after another ~sampleSize/2 new increments --- a steady rhythm:

size: 0 ──(100k)──> 100k=sampleSize → reset, size→50k
      50k ──(50k)──> 100k → reset, size→50k
      50k ──(50k)──> 100k → reset ...   (steady: decay every ~sampleSize/2 increments)

If it reset to 0 instead, each cycle would need a full sampleSize increments and the intervals would stretch relative to the surviving counts, letting old history dominate too long. Halving keeps a stable exponential-decay window ≈ the last sampleSize accesses.
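The cadence can be checked with a few lines that track only the size odometer. This is a simplified model, not the sketch itself: AgingCadenceSketch is a hypothetical name, every increment is assumed to count (no saturated keys), and reset() is reduced to halving size.

```java
/** Tracks only the size odometer to show the reset rhythm; a simplified, hypothetical model. */
final class AgingCadenceSketch {
    /** Returns how many increments elapsed before each of the first few resets. */
    static int[] incrementsBetweenResets(int sampleSize, int resets) {
        int[] gaps = new int[resets];
        int size = 0;             // running count of recorded increments
        int sinceLastReset = 0;   // increments seen since the previous halving
        int seen = 0;
        while (seen < resets) {
            sinceLastReset++;
            if (++size >= sampleSize) {  // same trigger as in increment()
                size /= 2;               // reset() halves the odometer along with the counters
                gaps[seen++] = sinceLastReset;
                sinceLastReset = 0;
            }
        }
        return gaps;
    }
}
```

For sampleSize = 100_000 the first three gaps are 100_000, 50_000 and 50_000, matching the diagram: one full sample to warm up, then a reset every half sample.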

13.3 How a Key Maps to 4 Counters (Add / Find)

A key does not own one counter --- it maps to 4 counters in one block (block = a group of longs; each long = 16 four-bit counters). All 4 live in the same block for cache-line friendliness.

// FIND: read 4 counters, return the MINIMUM
public int frequency(K key) {
    int hash = spread(key.hashCode());
    int freq = Integer.MAX_VALUE;
    for (int i = 0; i < 4; i++) {
        int index = indexOf(hash, i);
        freq = Math.min(freq, countAt(index, ...));
    }
    return freq;
}

// ADD: increment all 4 counters, saturating at 15
public void increment(K key) {
    int hash = spread(key.hashCode());
    boolean added = false;
    for (int i = 0; i < 4; i++) {
        int index = indexOf(hash, i);
        added |= incrementAt(index, ...);   // +1 unless already 15
    }
    if (added && ++size >= sampleSize) reset();
}

Why minimum on read? A collision can only push a counter up , never down. The minimum of the 4 is the counter least polluted by other keys → closest to truth. This is the Count-Min guarantee: over-estimate possible, under-estimate never.
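A complete, if simplified, version of the add/find pair fits in a few dozen lines. The sketch below is an illustration under stated assumptions: TinyCountMin is hypothetical, the mix function is an arbitrary mixer chosen for the example rather than Caffeine's spread/rehash, the four positions may fall anywhere in the table (no block layout), and there is no aging.

```java
/** Minimal Count-Min sketch with 4-bit saturating counters; simplified hashing, not Caffeine's layout. */
final class TinyCountMin {
    private final long[] table;  // each long packs 16 four-bit counters
    private final int mask;

    TinyCountMin(int longs) {
        if (Integer.bitCount(longs) != 1) {
            throw new IllegalArgumentException("table length must be a power of two");
        }
        table = new long[longs];
        mask = longs - 1;
    }

    /** Derives the i-th hash of a key (an assumed mixer, used only for illustration). */
    private static int mix(int hash, int i) {
        int h = hash + i * 0x9E3779B9;
        h ^= h >>> 16;
        h *= 0x85EBCA6B;
        h ^= h >>> 13;
        return h;
    }

    /** FIND: read the key's 4 counters and return the minimum (never an under-estimate). */
    int frequency(Object key) {
        int min = 15;
        for (int i = 0; i < 4; i++) {
            int h = mix(key.hashCode(), i);
            int shift = (h >>> 28) << 2;  // nibble 0..15 → bit offset 0..60
            min = Math.min(min, (int) ((table[h & mask] >>> shift) & 0xFL));
        }
        return min;
    }

    /** ADD: bump each of the key's 4 counters unless it is already 15; true if any changed. */
    boolean increment(Object key) {
        boolean added = false;
        for (int i = 0; i < 4; i++) {
            int h = mix(key.hashCode(), i);
            int shift = (h >>> 28) << 2;
            int slot = h & mask;
            if (((table[slot] >>> shift) & 0xFL) < 15) {
                table[slot] += 1L << shift;  // cannot carry into the neighbouring nibble
                added = true;
            }
        }
        return added;
    }
}
```

After three increments of one key its frequency is at least 3 (collisions can only inflate it), and once all four counters reach 15 further increments return false, which is the added flag from §13.2.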

13.4 How Different Keys Collide

Collisions are unavoidable by the pigeonhole principle : the sketch is sized to maximumSize (e.g. ~128 KB for 10k entries), but the keyspace is effectively unlimited (millions of IDs). Far more keys than counters → keys must share slots.

A collision occurs when two keys select the same (longIndex, nibble) position, which can happen because:

  • Low-bit hash collision --- the table size is a power of two, so only low bits pick the position; high bits are discarded.

  • Birthday effect --- with many more keys than slots, overlaps are statistically guaranteed.

    key "A" → block 7 → slots {1, 5, 8, 12}
    key "Z" → block 7 → slots {3, 5, 9, 14}
    ↑ both touch slot 5 → collision there only

The saving grace : collisions are usually partial. A and Z collided on slot 5 but not the other 3, so:

A's slots: {1:6, 5:15(polluted/saturated), 8:6, 12:6}
freq(A) = min(6,15,6,6) = 6   ✅ still correct

A wrong read needs a full 4-way collision ; if a single-slot collision chance is p, full collision ≈ p⁴ --- tiny. That's why 4 counters instead of 1.

13.5 Confirming the Exact Slot Position for a Key

The mapping is pure deterministic bit math --- same key + same table size → same 4 slots every time. Two fixed mixers drive it:

static int spread(int x) {        // picks the BLOCK
    x ^= x >>> 17; x *= 0xed5ad4bb;
    x ^= x >>> 11; x *= 0xac4c1b51;
    x ^= x >>> 15; return x;
}
static int rehash(int x) {        // picks the 4 COUNTERS
    x *= 0x31848bab; x ^= x >>> 14; return x;
}

The table is carved into blocks of 8 longs:

blockMask = (table.length >>> 3) - 1;

int blockHash   = spread(key.hashCode());
int counterHash = rehash(blockHash);
int block = (blockHash & blockMask) << 3;   // base long index = blockNumber * 8

for (int i = 0; i < 4; i++) {
    int h      = counterHash >>> (i << 3);  // byte i of counterHash
    int index  = (h >>> 1) & 15;            // which nibble (0..15) in the long
    int offset = h & 1;                     // which of the 2 longs in the pair
    int slot   = block + offset + (i << 1); // which long in the array
}

Each counter i is identified by (slot, index). The + (i << 1) forces counter i into a distinct long-pair (longs 0/1, 2/3, 4/5, 6/7), keeping the 4 counters apart while the whole 8-long block (64 bytes) fits one cache line.

Verifying empirically : reflect into the package-private FrequencySketch.table, increment a key N times, assert frequency(key) == N; or copy the pure spread/rehash/index math into a standalone class and print the 4 (slot, index) pairs (deterministic, matches the live cache). Two keys collide iff any (slot, index) pair is shared.
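The standalone-class approach might look like the sketch below. It copies the spread/rehash/index math shown above into a hypothetical SketchSlots class; the positions method and its int[][] return shape are assumptions made for the example, and the table length is assumed to be a power of two of at least 8.

```java
/** Standalone copy of the slot math above; the class and method shape are hypothetical. */
final class SketchSlots {
    static int spread(int x) {  // mixes the hash that picks the block
        x ^= x >>> 17;
        x *= 0xed5ad4bb;
        x ^= x >>> 11;
        x *= 0xac4c1b51;
        x ^= x >>> 15;
        return x;
    }

    static int rehash(int x) {  // derives the hash that picks the 4 counters
        x *= 0x31848bab;
        x ^= x >>> 14;
        return x;
    }

    /** Returns four {slot, index} pairs for a hash code and a power-of-two table length >= 8. */
    static int[][] positions(int hashCode, int tableLength) {
        int blockMask = (tableLength >>> 3) - 1;
        int blockHash = spread(hashCode);
        int counterHash = rehash(blockHash);
        int block = (blockHash & blockMask) << 3;  // base long index of the 8-long block
        int[][] result = new int[4][];
        for (int i = 0; i < 4; i++) {
            int h = counterHash >>> (i << 3);      // byte i of counterHash in the low bits
            int index = (h >>> 1) & 15;            // nibble within the long
            int offset = h & 1;                    // which long of the pair
            int slot = block + offset + (i << 1);  // long index in the table
            result[i] = new int[] {slot, index};
        }
        return result;
    }
}
```

Calling `SketchSlots.positions("user:42".hashCode(), 16_384)` twice yields identical pairs, all four slots fall inside one 8-long block, and counter i always lands in long-pair i of that block.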

13.6 Why index = (h >>> 1) & 15 Shifts by 1 First

One hash byte h feeds two independent decisions, so they must use disjoint bits:

h byte:   b7 b6 b5 b4 b3 b2 b1 b0
                   └────┬────┘ ↑
              index: bits 1..4 offset: bit 0
  • offset = h & 1 consumes bit 0 (which long in the pair).
  • index = (h >>> 1) & 15 shifts bit 0 away, then takes bits 1--4 (nibble 0--15).

If you used h & 15 instead, the index would reuse bit 0 --- the bit offset already depends on --- making the two choices correlated and producing more structured collisions. The >>> 1 keeps them independent → more uniform spreading.

Quick check, h = 0x51 (0101_0001): offset = 1, index = (0x51>>>1)&15 = 0x28&15 = 8. h = 0x06 (0000_0110): offset = 0, index = (0x06>>>1)&15 = 3.
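The same arithmetic as runnable code, with the rejected h & 15 variant alongside for contrast. NibbleSplit and its method names are hypothetical; only the bit expressions come from the text above.

```java
/** The bit split from this section as code; class and method names are hypothetical. */
final class NibbleSplit {
    /** Bit 0: which long of the pair. */
    static int offset(int h) {
        return h & 1;
    }

    /** Bits 1..4: which nibble (0..15) inside the long. */
    static int index(int h) {
        return (h >>> 1) & 15;
    }

    /** The rejected alternative: bits 0..3, which reuses the offset bit. */
    static int naiveIndex(int h) {
        return h & 15;
    }
}
```

With the naive variant, the parity of the index always equals the offset, so an odd nibble could only ever appear in the second long of a pair. The shifted version has no such coupling.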

13.7 Common Misconceptions --- Clarified

A consolidation of subtle points that are easy to get wrong about W-TinyLFU and the FrequencySketch.

Is a key's frequency "stored" in 4 counters?

No. Frequency is estimated, not stored. A key does not own 4 private counters:

  • Increment: hash the key → +1 to each of 4 shared counter positions (saturating at 15).
  • Read : hash the key → return the minimum of those 4 positions.
  • The 4 positions are shared with other keys (collisions). The min is taken because a collision can only push a counter up , so the smallest of the 4 is the least-polluted estimate. This is the Count-Min guarantee: over-estimate possible, under-estimate never.
Counters saturate at 15 (4-bit)

frequency(key) can never exceed 15 . Any narrative showing values like 20, 47, or 60 is wrong --- counters are 4-bit saturating, so a hot key climbs ... → 12 → 15 and stops. Periodic halving (reset()) keeps real values churning well under the cap anyway.

maximumSize is an entry count, not a key size
  • maximumSize = max number of entries (key→value pairs) the cache holds. It has nothing to do with how large a key or value is.
  • To bound by memory/weight instead, use maximumWeight + weigher.
  • A key's byte length is irrelevant to the sketch --- only its hashCode() matters, which collapses any key to a single int before hashing into the table.
Sketch table sizing: tied to maximumSize, NOT the keyspace
table = new long[Math.max(ceilingPowerOfTwo(maximumSize), 8)];  // longs
sampleSize = 10 * maximumSize;                                   // aging cadence
  • Each long packs 16 four-bit counters → total counters = 16 × table.length ≥ 16 × maximumSize.
  • Example (maximumSize = 10_000): table.length = 16_384 longs, 262_144 counters, ~128 KB.
  • The table is sized to rank the ~maximumSize keys contending for cache slots , NOT to fit the entire keyspace. The keyspace (millions of IDs) vastly exceeds the counters, so by the pigeonhole principle collisions are guaranteed --- and that's accepted by design.
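The sizing arithmetic for the example can be reproduced directly. SketchSizing and its helper names are hypothetical; ceilingPowerOfTwo here is a plain reimplementation of "round up to a power of two", written for int inputs of at least 1.

```java
/** Reproduces the sizing arithmetic above; class and helper names are hypothetical. */
final class SketchSizing {
    /** Smallest power of two >= x, for x >= 1. */
    static int ceilingPowerOfTwo(int x) {
        return (x <= 1) ? 1 : Integer.highestOneBit(x - 1) << 1;
    }

    /** Number of longs in the sketch table for a given maximumSize. */
    static int tableLength(int maximumSize) {
        return Math.max(ceilingPowerOfTwo(maximumSize), 8);
    }

    /** Total four-bit counters: 16 per long. */
    static long counters(int maximumSize) {
        return 16L * tableLength(maximumSize);
    }

    /** Table footprint in bytes: 8 per long. */
    static long bytes(int maximumSize) {
        return 8L * tableLength(maximumSize);
    }
}
```

For maximumSize = 10_000 this gives 16_384 longs, 262_144 counters and 131_072 bytes (128 KB), regardless of how many distinct keys the workload contains.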
"Disjoint capacity" is a myth --- and more keys means MORE collisions
  • If counters were assigned disjointly, the table could track 262_144 / 4 = 65_536 keys cleanly. But the 4 positions are chosen by hashing (random with replacement), not partitioned --- so collisions begin far earlier (birthday paradox), not at "fill."
  • Adding more distinct keys increases collisions, never reduces them. What keeps error low is (a) over-provisioning counters relative to the contending set (~16×) and (b) 4 counters + take-min (a wrong read needs all 4 to collide ≈ p⁴).
Admission comparison runs during maintenance, only when over capacity
  • The candidate-vs-victim comparison happens inside the maintenance task's evictEntries() step (§4.2), and only when total cache size > maximumSize. A non-full cache never compares or evicts.
  • The window entry is not "evicted first then compared." It is selected as a candidate because an eviction is required; the comparison decides which of the two is evicted.
  • candidate = the Window's least-recently-used entry; victim = Probation's least-recently-used entry. The trigger is the whole cache exceeding capacity (not the window alone); region selection follows Hill-Climbing-tuned proportions.

14. Interview Key Points

  1. W-TinyLFU : Window(1%) + SLRU(Probation 20% + Protected 79%), admission filter uses Count-Min Sketch to compare frequencies.
  2. Count-Min Sketch: 4-bit saturating counters × 4 hashes, block layout improves cache hits, periodic right-shift decay.
  3. Lock-free concurrency: MPSC ReadBuffer + WriteBuffer, async maintenance thread batch-applies changes; read path is only CHM.get + buffer.offer.
  4. vs Guava :
    • Higher hit rate (W-TinyLFU > LRU)
    • Higher throughput (lock-free > segment locks) ~3x
    • Supports async (AsyncLoadingCache)
    • Supports Scheduler for proactive expiration
    • Supports per-entry TTL (Expiry)
  5. TimerWheel: 5-tier hierarchical timer wheel, O(1) insertion, batch expiration.
  6. Adaptive: Hill Climbing dynamically adjusts Window/Main proportion (0.2%-80%).
  7. Scheduler: Solves lazy expiration problem, promptly cleans expired entries with no recent access.
  8. evictionListener vs removalListener: Former handles only eviction (synchronous), latter handles all removals (asynchronous).
  9. ReadBuffer can lose data: Lost read events don't affect correctness, only hit rate; WriteBuffer must be reliable.