Netty Deep Dive, Memory Series: PoolThreadCache Source Code Analysis

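This article is a contracted first-release piece for the 稀土掘金 (Juejin) technical community. Reposting is prohibited within 30 days of publication, and prohibited without authorization after 30 days. Infringement will be pursued.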

Hi everyone, I'm 大明哥, a hardcore programmer focused on writing the「死磕 Java」(Java Deep Dive) series.

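This article is also collected on my technical website: www.skjava.com, which hosts my article series, full-stack Java documentation, and complete interview write-ups from major tech companies.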

After the previous few articles, you should have a fairly solid understanding of Netty's memory management. This article covers the last component of Netty's memory model: PoolThreadCache.

What is PoolThreadCache?

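From the previous articles we know that Netty uses a memory pool to allocate ByteBuf instances so that allocation and reclamation are cheaper. Pooling improves Netty's performance to a degree, but for a framework that pushes performance as hard as Netty does, it is not enough. If every allocation and release goes straight to the pool, each new ByteBuf still has to run the pool's allocation algorithm, which is comparatively slow. Is there a better way? Yes: PoolThreadCache.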

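With PoolThreadCache, a ByteBuf that is released is not returned to the memory pool right away. Its memory is parked in the PoolThreadCache instead, and the next request for a ByteBuf of the same size (the same size class) can be served directly from there. This avoids repeatedly allocating from and releasing to the pool, so it is faster.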

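PoolThreadCache can be read as a thread-local cache in front of the memory pool. Why "local"? Because Netty keeps a separate PoolThreadCache object per thread. When a thread requests memory, it tries its own PoolThreadCache first and only falls back to Netty's memory pool if the cache cannot satisfy the request.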

So what does PoolThreadCache actually cache? Not the ByteBuf itself, but the chunk and handle that correspond to the ByteBuf being released.
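The idea of "try the thread's own cache first, then fall back to the shared pool" can be sketched in a few lines of plain Java. This is only an illustration under simplifying assumptions: the class `ThreadLocalCacheSketch`, its fields, and the use of raw `byte[]` blocks are all hypothetical and are not Netty's types.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; none of these names come from Netty.
final class ThreadLocalCacheSketch {
    static final int BLOCK_SIZE = 1024;

    // Shared pool: every access goes through a concurrent structure.
    static final ConcurrentLinkedQueue<byte[]> SHARED_POOL = new ConcurrentLinkedQueue<>();
    // Counts how often a request had to fall through to the shared pool.
    static final AtomicInteger SHARED_POOL_REQUESTS = new AtomicInteger();

    // One private deque per thread; using it needs no synchronization.
    static final ThreadLocal<ArrayDeque<byte[]>> LOCAL_CACHE =
            ThreadLocal.withInitial(ArrayDeque::new);

    static byte[] allocate() {
        byte[] cached = LOCAL_CACHE.get().poll();
        if (cached != null) {
            return cached; // fast path: served by the thread-local cache
        }
        SHARED_POOL_REQUESTS.incrementAndGet(); // slow path: go to the shared pool
        byte[] pooled = SHARED_POOL.poll();
        return pooled != null ? pooled : new byte[BLOCK_SIZE];
    }

    static void release(byte[] block) {
        // Park the block in the local cache instead of returning it to the shared pool.
        LOCAL_CACHE.get().offer(block);
    }

    public static void main(String[] args) {
        byte[] first = allocate();   // cache is empty, falls through to the shared pool
        release(first);
        byte[] second = allocate();  // served from the thread-local cache
        System.out.println(first == second);              // true
        System.out.println(SHARED_POOL_REQUESTS.get());   // 1
    }
}
```

The second allocation never touches the shared structure, which is the whole point of putting a per-thread cache in front of the pool.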

PoolThreadCache data structure

PoolThreadCache maintains four arrays internally:


private final MemoryRegionCache<byte[]>[] smallSubPageHeapCaches;
private final MemoryRegionCache<ByteBuffer>[] smallSubPageDirectCaches;
private final MemoryRegionCache<byte[]>[] normalHeapCaches;
private final MemoryRegionCache<ByteBuffer>[] normalDirectCaches;

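These four arrays hold the cached data for the Small and Normal size classes, for heap and direct memory respectively, and every element is a MemoryRegionCache.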

MemoryRegionCache is where PoolThreadCache actually stores cached data. It is an abstract class with two subclasses: SubPageMemoryRegionCache and NormalMemoryRegionCache.

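Each MemoryRegionCache holds a bounded queue. By default the queue length is 256 for Small and 64 for Normal. When memory is released, the cached data is stored in this queue; if the queue is already full, the memory block is released back to the PoolArena.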
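The overflow behaviour described above can be sketched with a standard bounded queue. This is a minimal illustration, not Netty's implementation: `BoundedRegionCacheSketch` and its members are hypothetical names, `ArrayBlockingQueue` is only a stand-in for the queue Netty uses internally, and a `long[]` stands in for a cached memory block.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical illustration; ArrayBlockingQueue is a stand-in for Netty's internal queue.
final class BoundedRegionCacheSketch {
    private final Queue<long[]> queue;
    int releasedToArena; // blocks that did not fit and went back to the "arena"

    BoundedRegionCacheSketch(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Tries to cache a freed block; on overflow the block is handed back to the arena. */
    boolean free(long[] block) {
        if (queue.offer(block)) { // offer() returns false instead of blocking when full
            return true;
        }
        releasedToArena++;
        return false;
    }

    int cached() {
        return queue.size();
    }

    public static void main(String[] args) {
        BoundedRegionCacheSketch small = new BoundedRegionCacheSketch(256);
        for (int i = 0; i < 300; i++) {
            small.free(new long[] {i});
        }
        System.out.println(small.cached());        // 256
        System.out.println(small.releasedToArena); // 44
    }
}
```

Bounding the queue keeps a single thread from hoarding an unlimited amount of pooled memory: anything beyond the capacity goes straight back to where other threads can use it.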


private abstract static class MemoryRegionCache<T> {
  private final int size;
  private final Queue<Entry<T>> queue;
  private final SizeClass sizeClass;
  private int allocations;
        
  //...
}

Every element of the bounded queue is an Entry, which is structured as follows:


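static final class Entry<T> {
  // Recycler handle, used to return this Entry to the object pool
  final Handle<Entry<?>> recyclerHandle;
  // The PoolChunk that this memory block comes from
  PoolChunk<T> chunk;
  // For direct memory, the ByteBuffer object that holds this memory block
  ByteBuffer nioBuffer;
  // The handle of the memory block within the PoolChunk or PoolSubpage
  long handle = -1;
  int normCapacity;

  // ...
}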

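Every memory block held by PoolThreadCache is ultimately cached as an Entry object. Looking at the Entry structure, chunk and handle together describe the block: chunk records the PoolChunk the block lives in, and handle records the basic information about the block within that chunk. There is also a recyclerHandle (for details, see the article 深入理解 Netty 的对象池：Recycler, "Understanding Netty's Object Pool: Recycler").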

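At this point we have a fairly clear picture of the overall structure of PoolThreadCache, as shown in the figure below:

[Figure: overall structure of PoolThreadCache]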

PoolThreadCache initialization

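The overall structure of PoolThreadCache is not complicated; its core is the maintenance of the four arrays. Let's look at initialization first. The entry point is initialValue() in PoolThreadLocalCache, which is called to create the value when the current thread has no PoolThreadCache in PoolThreadLocalCache yet: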


        protected synchronized PoolThreadCache initialValue() {
            // Pick the least-used PoolArena from each PoolArena array
            final PoolArena<byte[]> heapArena = leastUsedArena(heapArenas);
            final PoolArena<ByteBuffer> directArena = leastUsedArena(directArenas);

            final Thread current = Thread.currentThread();
            final EventExecutor executor = ThreadExecutorMap.currentExecutor();
            
            // useCacheForAllThreads: enable the cache for every thread
            if (useCacheForAllThreads || current instanceof FastThreadLocalThread || executor != null) {
                // Initialize the cache
                final PoolThreadCache cache = new PoolThreadCache(
                        heapArena, directArena, smallCacheSize, normalCacheSize,
                        DEFAULT_MAX_CACHED_BUFFER_CAPACITY, DEFAULT_CACHE_TRIM_INTERVAL);

                if (DEFAULT_CACHE_TRIM_INTERVAL_MILLIS > 0) {
                    if (executor != null) {
                        executor.scheduleAtFixedRate(trimTask, DEFAULT_CACHE_TRIM_INTERVAL_MILLIS,
                                DEFAULT_CACHE_TRIM_INTERVAL_MILLIS, TimeUnit.MILLISECONDS);
                    }
                }
                return cache;
            }
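            // Otherwise (caching is not enabled for all threads, the thread is not a FastThreadLocalThread,
            // and it is not bound to an EventExecutor), create an empty PoolThreadCache.
            // It caches nothing, because every size is 0.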
            return new PoolThreadCache(heapArena, directArena, 0, 0, 0, 0);
        }

The constructor is then called to initialize the PoolThreadCache:


    PoolThreadCache(PoolArena<byte[]> heapArena, PoolArena<ByteBuffer> directArena,
                    int smallCacheSize, int normalCacheSize, int maxCachedBufferCapacity,
                    int freeSweepAllocationThreshold) {
        // ...
        if (directArena != null) {
            smallSubPageDirectCaches = createSubPageCaches(
                    smallCacheSize, directArena.numSmallSubpagePools);

            normalDirectCaches = createNormalCaches(
                    normalCacheSize, maxCachedBufferCapacity, directArena);

            directArena.numThreadCaches.getAndIncrement();
        } else {
            // ...
        }
        if (heapArena != null) {
            smallSubPageHeapCaches = createSubPageCaches(
                    smallCacheSize, heapArena.numSmallSubpagePools);

            normalHeapCaches = createNormalCaches(
                    normalCacheSize, maxCachedBufferCapacity, heapArena);

            heapArena.numThreadCaches.getAndIncrement();
        } else {
            // ...
        }

        // ...
    }

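The most important part of the constructor is the four createXxxCaches() calls (createSubPageCaches() and createNormalCaches(), each called once for direct memory and once for heap memory). Looking at one of them is enough: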


    private static <T> MemoryRegionCache<T>[] createNormalCaches(
            int cacheSize, int maxCachedBufferCapacity, PoolArena<T> area) {
        if (cacheSize > 0 && maxCachedBufferCapacity > 0) {
            int max = Math.min(area.chunkSize, maxCachedBufferCapacity);
            
            List<MemoryRegionCache<T>> cache = new ArrayList<MemoryRegionCache<T>>() ;
            for (int idx = area.numSmallSubpagePools; idx < area.nSizes && area.sizeIdx2size(idx) <= max ; idx++) {
                cache.add(new NormalMemoryRegionCache<T>(cacheSize));
            }
            return cache.toArray(new MemoryRegionCache[0]);
        } else {
            return null;
        }
    }

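Internally it simply builds an array of MemoryRegionCache objects, one for each Normal size class whose size does not exceed the maximum cached capacity.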

As shown above, when a PoolThreadCache is initialized, Netty picks the PoolArena in each array that is currently used by the fewest threads and wraps it in the newly created PoolThreadCache.
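The source of leastUsedArena() is not shown in this article, so the following is only a sketch of how such a selection could work, assuming each arena exposes a thread-cache counter like the numThreadCaches field incremented in the constructor above. `LeastUsedArenaSketch`, `Arena`, and `bindNewThreadCache` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; Arena is a stand-in for PoolArena.
final class LeastUsedArenaSketch {
    static final class Arena {
        final String name;
        final AtomicInteger numThreadCaches = new AtomicInteger();

        Arena(String name) {
            this.name = name;
        }
    }

    /** Returns the arena bound to the fewest thread caches, or null if there are none. */
    static Arena leastUsed(Arena[] arenas) {
        if (arenas == null || arenas.length == 0) {
            return null;
        }
        Arena min = arenas[0];
        for (int i = 1; i < arenas.length; i++) {
            if (arenas[i].numThreadCaches.get() < min.numThreadCaches.get()) {
                min = arenas[i];
            }
        }
        return min;
    }

    /** Mimics creating a thread cache: pick an arena, then bump its counter. */
    static Arena bindNewThreadCache(Arena[] arenas) {
        Arena arena = leastUsed(arenas);
        arena.numThreadCaches.getAndIncrement();
        return arena;
    }

    public static void main(String[] args) {
        Arena[] arenas = { new Arena("a0"), new Arena("a1"), new Arena("a2") };
        StringBuilder order = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            order.append(bindNewThreadCache(arenas).name).append(' ');
        }
        System.out.println(order.toString().trim()); // a0 a1 a2 a0 a1
    }
}
```

Because each new thread cache increments the counter of the arena it picked, successive threads spread roughly evenly across the arenas, which reduces contention on any single one.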

Allocating memory from PoolThreadCache

PoolThreadCache has two allocation methods, allocateSmall() and allocateNormal(): one for Small and one for Normal.


    // Allocate a Small memory block
    boolean allocateSmall(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int sizeIdx) {
        return allocate(cacheForSmall(area, sizeIdx), buf, reqCapacity);
    }
    
    // Allocate a Normal memory block
    boolean allocateNormal(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int sizeIdx) {
        return allocate(cacheForNormal(area, sizeIdx), buf, reqCapacity);
    }

cacheForXxx() is called to get the MemoryRegionCache. Taking Small as an example:


    private MemoryRegionCache<?> cacheForSmall(PoolArena<?> area, int sizeIdx) {
        if (area.isDirect()) {
            return cache(smallSubPageDirectCaches, sizeIdx);
        }
        return cache(smallSubPageHeapCaches, sizeIdx);
    }
    
    private static <T> MemoryRegionCache<T> cache(MemoryRegionCache<T>[] cache, int sizeIdx) {
        if (cache == null || sizeIdx > cache.length - 1) {
            return null;
        }
        return cache[sizeIdx];
    }

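This just fetches the MemoryRegionCache object at the given position (sizeIdx) in the MemoryRegionCache[] array, and returns null if the array does not exist or the index is out of range.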

Once the MemoryRegionCache object has been obtained, allocate() is called to allocate the memory:


    private boolean allocate(MemoryRegionCache<?> cache, PooledByteBuf buf, int reqCapacity) {
        if (cache == null) {
            return false;
        }
        // Call cache.allocate() to take a memory block from the cache
        boolean allocated = cache.allocate(buf, reqCapacity, this);
        // If the number of allocation requests on this PoolThreadCache has reached the threshold, call trim() to tidy up
        if (++ allocations >= freeSweepAllocationThreshold) {
            allocations = 0;
            trim();
        }
        return allocated;
    }

It first calls MemoryRegionCache.allocate() to allocate the memory, which returns true on success and false otherwise.


        public final boolean allocate(PooledByteBuf<T> buf, int reqCapacity, PoolThreadCache threadCache) {
            // Take an entry straight from the queue
            Entry<T> entry = queue.poll();
            if (entry == null) {
                return false;
            }
            // On success, use the entry to initialize the ByteBuf; the allocation has succeeded
            initBuf(entry.chunk, entry.nioBuffer, entry.handle, buf, reqCapacity, threadCache);
            // Recycle the Entry object so it can be reused
            entry.recycle();

            // Count the successful allocation
            ++ allocations;
            return true;
        }

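This simply polls the queue. If an entry is found, initBuf() is called to initialize the ByteBuf object, and recycle() then recycles the Entry object. Allocation and release are extremely frequent operations, so the Entry objects that carry the cached memory are well worth recycling.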

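Finally there is ++ allocations. This is an important step: inside MemoryRegionCache it counts how many allocations that cache has served in the current round. PoolThreadCache keeps its own allocations counter as well, incremented on every call to its allocate(); once that counter reaches the threshold (freeSweepAllocationThreshold, 8192 by default), trim() is called to tidy up the caches: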


    void trim() {
        trim(smallSubPageDirectCaches);
        trim(normalDirectCaches);
        trim(smallSubPageHeapCaches);
        trim(normalHeapCaches);
    }
    
    private static void trim(MemoryRegionCache<?>[] caches) {
        if (caches == null) {
            return;
        }
        for (MemoryRegionCache<?> c: caches) {
            trim(c);
        }
    }
    
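    // trim(c) above ends up in MemoryRegionCache#trim():
    public final void trim() {
        // size: the maximum capacity of the queue in this MemoryRegionCache
        // allocations: the number of allocations served by this MemoryRegionCache since the last trim
        int free = size - allocations;
        allocations = 0;

        // If fewer allocations were served than the queue can hold, release up to `free` cached blocks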
        if (free > 0) {
          free(free, false);
        }
    }

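free = size - allocations. If free > 0, the memory cached in this MemoryRegionCache has seen relatively little use, so it should be released back to the PoolArena where other threads can request it. free() does the releasing: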


        private int free(int max, boolean finalizer) {
            int numFreed = 0;
            for (; numFreed < max; numFreed++) {
                Entry<T> entry = queue.poll();
                if (entry != null) {
                    freeEntry(entry, finalizer);
                } else {
                    return numFreed;
                }
            }
            return numFreed;
        }

Entries are polled from the queue one at a time, and freeEntry() is called to release each one:


        private void freeEntry(Entry entry, boolean finalizer) {
            // Read the PoolChunk and the corresponding handle from the Entry
            PoolChunk chunk = entry.chunk;
            long handle = entry.handle;
            ByteBuffer nioBuffer = entry.nioBuffer;
            int normCapacity = entry.normCapacity;

            if (!finalizer) {
                // Recycle the Entry object
                entry.recycle();
            }
            
            // Call PoolArena's freeChunk() to release the memory
            chunk.arena.freeChunk(chunk, handle, normCapacity, sizeClass, nioBuffer, finalizer);
        }

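The allocation side of PoolThreadCache is not hard to follow: use sizeIdx to find the MemoryRegionCache object in the MemoryRegionCache[] array, then poll an Entry from that cache's queue. If an Entry is returned, the allocation has succeeded and the ByteBuf object can be initialized directly.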
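The lookup-then-poll flow summarized above can be condensed into a small sketch. It is a simplification under stated assumptions: `CacheLookupSketch` is a hypothetical class, a String stands in for the cached (chunk, handle) pair, and a plain `ArrayDeque` replaces the bounded queue.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical illustration; a String stands in for the cached (chunk, handle) pair.
final class CacheLookupSketch {
    private final Queue<String>[] caches; // one queue per size class

    @SuppressWarnings("unchecked")
    CacheLookupSketch(int numSizeClasses) {
        caches = new Queue[numSizeClasses];
        for (int i = 0; i < numSizeClasses; i++) {
            caches[i] = new ArrayDeque<>();
        }
    }

    /** Caches a block under its size class; sizeIdx must be within range. */
    void add(int sizeIdx, String block) {
        caches[sizeIdx].offer(block);
    }

    /** Returns a cached block, or null on a miss (the caller falls back to the arena). */
    String allocate(int sizeIdx) {
        if (sizeIdx > caches.length - 1) {
            return null; // this size class is not cached at all
        }
        return caches[sizeIdx].poll(); // null when the queue is empty
    }

    public static void main(String[] args) {
        CacheLookupSketch cache = new CacheLookupSketch(4);
        cache.add(2, "block@2");
        System.out.println(cache.allocate(2)); // block@2
        System.out.println(cache.allocate(2)); // null
        System.out.println(cache.allocate(9)); // null
    }
}
```

Both kinds of miss (a size class with no cache, and an empty queue) look the same to the caller, which simply continues with the regular arena allocation.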

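Throughout allocation, the allocations counter in PoolThreadCache tracks the number of allocation requests. When it reaches the threshold freeSweepAllocationThreshold (8192), the usage of the cached memory is re-evaluated. The goal is to spot cached memory blocks that are rarely used: low usage means the PoolThreadCache is not being used much, and the blocks are better returned to the PoolArena so that other threads can request them.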
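The trim arithmetic (free = size - allocations) is easier to see with concrete numbers. The sketch below reproduces only that arithmetic; `TrimSketch` and its members are hypothetical, integers stand in for cached blocks, and the counter returnedToArena replaces the real call back into the arena.

```java
import java.util.ArrayDeque;

// Hypothetical illustration of the trim arithmetic; not Netty's classes.
final class TrimSketch {
    private final int size;                 // queue capacity
    private final ArrayDeque<Integer> queue = new ArrayDeque<>();
    private int allocations;                // cache hits since the last trim
    int returnedToArena;

    TrimSketch(int size) {
        this.size = size;
    }

    boolean add(int block) {
        if (queue.size() >= size) {
            return false; // full: the caller would release the block to the arena
        }
        return queue.offer(block);
    }

    boolean allocate() {
        if (queue.poll() == null) {
            return false;
        }
        allocations++;
        return true;
    }

    /** Frees up to (size - allocations) cached blocks, then starts a new round. */
    void trim() {
        int free = size - allocations;
        allocations = 0;
        for (int i = 0; i < free && queue.poll() != null; i++) {
            returnedToArena++;
        }
    }

    int cached() {
        return queue.size();
    }

    public static void main(String[] args) {
        TrimSketch cache = new TrimSketch(8);
        for (int i = 0; i < 8; i++) cache.add(i);     // 8 blocks cached
        for (int i = 0; i < 3; i++) cache.allocate(); // 3 hits, 5 blocks left
        cache.trim();                                 // free = 8 - 3 = 5
        System.out.println(cache.returnedToArena);    // 5
        System.out.println(cache.cached());           // 0
    }
}
```

A cache that served at least as many hits as its capacity during the round gets free <= 0 and keeps everything, so only cold caches give memory back.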

Caching memory in PoolThreadCache

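On the allocation side, PoolThreadCache#allocate() fetches memory blocks from the cache. On the release side, calling release() on a ByteBuf object delegates to PoolArena's free(). PoolArena first checks whether the memory is pooled; if it is, it calls PoolThreadCache.add() to add the block to the PoolThreadCache: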


    boolean add(PoolArena<?> area, PoolChunk chunk, ByteBuffer nioBuffer,
                long handle, int normCapacity, SizeClass sizeClass) {
        int sizeIdx = area.size2SizeIdx(normCapacity);
        MemoryRegionCache<?> cache = cache(area, sizeIdx, sizeClass);
        if (cache == null) {
            return false;
        }
        return cache.add(chunk, nioBuffer, handle, normCapacity);
    }

The size of the released memory, normCapacity, is mapped to its sizeIdx; sizeIdx selects the MemoryRegionCache from the MemoryRegionCache[] array; and MemoryRegionCache's add() then puts the block on the queue:


        public final boolean add(PoolChunk<T> chunk, ByteBuffer nioBuffer, long handle, int normCapacity) {
            Entry<T> entry = newEntry(chunk, nioBuffer, handle, normCapacity);
            boolean queued = queue.offer(entry);
            if (!queued) {
                // The entry could not be queued, so it cannot be used; recycle it right away
                entry.recycle();
            }

            return queued;
        }

newEntry() returns an Entry object, which is then offered to the queue. If the offer fails, the Entry is recycled and add() returns false.