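Netty Source Code, Part 7: How ByteBuf Works (II)

Outline

9. Netty memory size classes

10. Cache data structures

11. Allocation flow on a cache hit

12. Key concepts in Netty memory allocation

13. Page-level memory allocation

14. SubPage-level memory allocation

15. Reclaiming a ByteBuf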

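9. Netty memory size classes

(1) The four size classes

(2) Unit of memory requests

(1) The four size classes

I. tiny: sizes from 0 up to 512 bytes

II. small: sizes from 512 bytes up to 8K

III. normal: sizes from 8K up to 16M

IV. huge: sizes larger than 16M

(2) Unit of memory requests

Netty requests memory from the operating system in units of a Chunk, and every later allocation is carved out of such a Chunk. To allocate 1M, for example, Netty first requests a 16M Chunk and then takes a contiguous 1M region out of that Chunk and hands it to a ByteBuf.

Note: a Chunk is 16M, a Page is 8K, a SubPage is anywhere from 0 to 8K, and one Chunk can be split into 2048 Pages.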
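To make the boundaries concrete, here is a minimal sketch that classifies a request size using the limits listed above. The names SizeClassSketch, classify and pagesPerChunk are made up for illustration and are not Netty APIs. How the exact boundary values are treated (512 counts as small, 8K and 16M count as normal) is an assumption taken from the comparisons in the PoolArena.allocate() code quoted later in this article.

```java
final class SizeClassSketch {
    static final int PAGE_SIZE = 8 * 1024;           // one Page: 8K
    static final int CHUNK_SIZE = 16 * 1024 * 1024;  // one Chunk: 16M

    // Maps a size in bytes to the name of its size class.
    static String classify(int size) {
        if (size < 512) {
            return "tiny";
        }
        if (size < PAGE_SIZE) {
            return "small";
        }
        if (size <= CHUNK_SIZE) {
            return "normal";
        }
        return "huge";
    }

    // Number of Pages that fit into one Chunk.
    static int pagesPerChunk() {
        return CHUNK_SIZE / PAGE_SIZE;
    }
}
```

With these constants, pagesPerChunk() gives 16M / 8K = 2048, matching the note above.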

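10. Cache data structures

(1) What a MemoryRegionCache consists of

(2) Types of MemoryRegionCache

(3) MemoryRegionCache source code

(1) What a MemoryRegionCache consists of

The cache-related data structure in Netty is MemoryRegionCache, a cache of memory regions. A MemoryRegionCache has three parts: queue, sizeClass, and size.

I. queue

queue is a queue whose elements are instances of Entry, an inner class of MemoryRegionCache. Each Entry holds a chunk and a handle. All memory in Netty is allocated out of Chunks, and each handle points to exactly one contiguous region of memory. A chunk plus a handle therefore pins down both the size and the location of the memory behind an Entry, and the Entry objects together form a chain of cached regions.

II. sizeClass

sizeClass is the Netty size class. It takes one of three values: tiny (0~512B), small (512B~8K), or normal (8K~16M). huge allocations bypass the cache entirely, so huge is not a sizeClass value.

III. size

Every ByteBuf cached by a given MemoryRegionCache has the same fixed size. If a MemoryRegionCache caches 1K ByteBufs, then every element in its queue is a 1K ByteBuf. The fixed sizes are: for tiny, multiples of 16B up to 496B; for small, 512B, 1K, 2K, and 4K; for normal, 8K, 16K, and 32K. Anything larger than 32K is not cached.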

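(2) Types of MemoryRegionCache

The full set of MemoryRegionCache sizes in Netty is listed below; each node corresponds to one MemoryRegionCache.

The tiny class has 32 nodes: index 0 is the 0B slot, followed by 16B, 32B, 48B, ..., 496B. Each node is a MemoryRegionCache, and each MemoryRegionCache has its own queue. To allocate a 16B ByteBuf, Netty first locates the second node (index 1) of the tiny class and then polls an Entry from that node's queue. The Entry tells it which chunk and which handle to use, and the memory is assigned from there.

The small class has 4 nodes: 512B, 1K, 2K, and 4K. Each node is a MemoryRegionCache, and each MemoryRegionCache has its own queue. To allocate a 1K ByteBuf, Netty first locates the second node of the small class and then polls an Entry from that node's queue. The 1K ByteBuf is built on top of that Entry, so there is no need to search a Chunk for a free region.

The normal class has 3 nodes: 8K, 16K, and 32K. Allocating a normal-sized ByteBuf works the same way.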

(3) MemoryRegionCache source code

Every thread has its own PoolThreadCache object, and every PoolThreadCache holds caches for the tiny, small, and normal size classes. Each size class comes in a heap and a direct flavor, so a PoolThreadCache has 6 kinds of cache, maintained as 6 MemoryRegionCache arrays. For example:

The array tinySubPageHeapCaches has 32 MemoryRegionCache elements; the element at index n caches ByteBufs of size n * 16B.

The array smallSubPageHeapCaches has 4 MemoryRegionCache elements; the element at index n caches ByteBufs of size 2^n * 512B.

The array normalHeapCaches has 3 MemoryRegionCache elements; the element at index n caches ByteBufs of size 2^n * 8K.

Each MemoryRegionCache in tinySubPageHeapCaches can cache at most tinyCacheSize ByteBufs, which is 512.

Each MemoryRegionCache in smallSubPageHeapCaches can cache at most smallCacheSize ByteBufs, which is 256.

Each MemoryRegionCache in normalHeapCaches can cache at most normalCacheSize ByteBufs, which is 64.
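The index rules above can be written out as three small functions. This is a sketch under the assumption that sizes are already normalized (multiples of 16B for tiny, powers of two otherwise) and that a Page is 8K. CacheIndexSketch and normalIdx are hypothetical names; tinyIdx, smallIdx and log2 borrow the names used in the quoted Netty source, but the bodies here are simplified restatements of the rules in the text, not copies of the library code.

```java
final class CacheIndexSketch {
    // Integer log base 2 for a positive power of two (same loop as the log2() helper quoted below).
    static int log2(int val) {
        int res = 0;
        while (val > 1) {
            val >>= 1;
            res++;
        }
        return res;
    }

    // tiny: index n caches n * 16B, so divide by 16.
    static int tinyIdx(int normCapacity) {
        return normCapacity >>> 4;
    }

    // small: index n caches 2^n * 512B, so take log2(size / 512).
    static int smallIdx(int normCapacity) {
        return log2(normCapacity >>> 9);
    }

    // normal: index n caches 2^n * 8K, so take log2(size / 8192).
    static int normalIdx(int normCapacity) {
        return log2(normCapacity >>> 13);
    }
}
```

For instance, a 1K request maps to smallIdx(1024) = 1, which is the second node of the small row described earlier.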

final class PoolThreadCache {
    //The memory actually handed out is just a byte[] or a ByteBuffer, so an allocation boils down to obtaining a numeric handle that locates the region
    final PoolArena<byte[]> heapArena;
    final PoolArena<ByteBuffer> directArena;

    //Hold the caches for the different size classes, which are tiny, small and normal.
    //32 MemoryRegionCache elements, holding SubPage-level memory of 16B, 32B, 48B, ..., 480B, 496B
    private final MemoryRegionCache<byte[]>[] tinySubPageHeapCaches;
    //4 MemoryRegionCache elements, holding SubPage-level memory of 512B, 1K, 2K, 4K
    private final MemoryRegionCache<byte[]>[] smallSubPageHeapCaches;
    //3 MemoryRegionCache elements, holding Page-level memory of 8K, 16K, 32K
    private final MemoryRegionCache<byte[]>[] normalHeapCaches;
    private final MemoryRegionCache<ByteBuffer>[] tinySubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] smallSubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] normalDirectCaches;
    
    PoolThreadCache(PoolArena<byte[]> heapArena, PoolArena<ByteBuffer> directArena, 
            int tinyCacheSize, int smallCacheSize, int normalCacheSize,
            int maxCachedBufferCapacity, int freeSweepAllocationThreshold) {
        ...
        this.freeSweepAllocationThreshold = freeSweepAllocationThreshold;
        this.heapArena = heapArena;
        this.directArena = directArena;
        if (directArena != null) {
            tinySubPageDirectCaches = createSubPageCaches(tinyCacheSize, PoolArena.numTinySubpagePools, SizeClass.Tiny);
            smallSubPageDirectCaches = createSubPageCaches(smallCacheSize, directArena.numSmallSubpagePools, SizeClass.Small);
            numShiftsNormalDirect = log2(directArena.pageSize);
            normalDirectCaches = createNormalCaches(normalCacheSize, maxCachedBufferCapacity, directArena);
            directArena.numThreadCaches.getAndIncrement();
        } else {
            //No directArea is configured so just null out all caches
            tinySubPageDirectCaches = null;
            smallSubPageDirectCaches = null;
            normalDirectCaches = null;
            numShiftsNormalDirect = -1;
        }
        if (heapArena != null) {
            //Create the caches for the heap allocations
            tinySubPageHeapCaches = createSubPageCaches(tinyCacheSize, PoolArena.numTinySubpagePools, SizeClass.Tiny);
            smallSubPageHeapCaches = createSubPageCaches(smallCacheSize, heapArena.numSmallSubpagePools, SizeClass.Small);
            numShiftsNormalHeap = log2(heapArena.pageSize);
            normalHeapCaches = createNormalCaches(normalCacheSize, maxCachedBufferCapacity, heapArena);
            heapArena.numThreadCaches.getAndIncrement();
        } else {
            //No heapArea is configured so just null out all caches
            tinySubPageHeapCaches = null;
            smallSubPageHeapCaches = null;
            normalHeapCaches = null;
            numShiftsNormalHeap = -1;
        }
        //The thread-local cache will keep a list of pooled buffers which must be returned to the pool when the thread is not alive anymore.
        ThreadDeathWatcher.watch(thread, freeTask);
    }
    
    private static <T> MemoryRegionCache<T>[] createSubPageCaches(int cacheSize, int numCaches, SizeClass sizeClass) {
        if (cacheSize > 0) {
            @SuppressWarnings("unchecked")
            MemoryRegionCache<T>[] cache = new MemoryRegionCache[numCaches];
            for (int i = 0; i < cache.length; i++) {
                cache[i] = new SubPageMemoryRegionCache<T>(cacheSize, sizeClass);
            }
            return cache;
        } else {
            return null;
        }
    }

    private static <T> MemoryRegionCache<T>[] createNormalCaches(int cacheSize, int maxCachedBufferCapacity, PoolArena<T> area) {
        if (cacheSize > 0) {
            int max = Math.min(area.chunkSize, maxCachedBufferCapacity);
            int arraySize = Math.max(1, log2(max / area.pageSize) + 1);

            @SuppressWarnings("unchecked")
            MemoryRegionCache<T>[] cache = new MemoryRegionCache[arraySize];
            for (int i = 0; i < cache.length; i++) {
                cache[i] = new NormalMemoryRegionCache<T>(cacheSize);
            }
            return cache;
        } else {
            return null;
        }
    }
    
    private static final class SubPageMemoryRegionCache<T> extends MemoryRegionCache<T> {
        SubPageMemoryRegionCache(int size, SizeClass sizeClass) {
            super(size, sizeClass);
        }
        ...
    }

    private static int log2(int val) {
        int res = 0;
        while (val > 1) {
            val >>= 1;
            res++;
        }
        return res;
    }
    
    ...
    
    private abstract static class MemoryRegionCache<T> {
        private final int size;
        private final Queue<Entry<T>> queue;
        private final SizeClass sizeClass;

        MemoryRegionCache(int size, SizeClass sizeClass) {
            this.size = MathUtil.safeFindNextPositivePowerOfTwo(size);
            queue = PlatformDependent.newFixedMpscQueue(this.size);
            this.sizeClass = sizeClass;
        }
        ...
        
        static final class Entry<T> {
            final Handle<Entry<?>> recyclerHandle;
            PoolChunk<T> chunk;
            long handle = -1;

            Entry(Handle<Entry<?>> recyclerHandle) {
                this.recyclerHandle = recyclerHandle;
            }

            void recycle() {
                chunk = null;
                handle = -1;
                recyclerHandle.recycle(this);
            }
        }
    }
}

abstract class PoolArena<T> implements PoolArenaMetric {
    enum SizeClass {
        Tiny,
        Small,
        Normal
    }
    ...
}

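11. Allocation flow on a cache hit

(1) Entry point of memory allocation

(2) First, normalize the requested size

(3) Then allocate from the cache

(1) Entry point of memory allocation

Allocation starts in the newHeapBuffer() or newDirectBuffer() method of the PooledByteBufAllocator. These two methods call heapArena.allocate() and directArena.allocate() respectively, so the real entry point is PoolArena.allocate().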

public class PooledByteBufAllocator extends AbstractByteBufAllocator {
    private final PoolThreadLocalCache threadCache;
    private final PoolArena<byte[]>[] heapArenas;//each thread is bound to one PoolArena
    private final PoolArena<ByteBuffer>[] directArenas;//each thread is bound to one PoolArena
    ...
    @Override
    protected ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity) {
        PoolThreadCache cache = threadCache.get();
        PoolArena<byte[]> heapArena = cache.heapArena;
        ByteBuf buf;
        if (heapArena != null) {
            //allocate heap memory
            buf = heapArena.allocate(cache, initialCapacity, maxCapacity);
        } else {
            buf = new UnpooledHeapByteBuf(this, initialCapacity, maxCapacity);
        }
        return toLeakAwareBuffer(buf);
    }

    @Override
    protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
        PoolThreadCache cache = threadCache.get();
        PoolArena<ByteBuffer> directArena = cache.directArena;
        ByteBuf buf;
        if (directArena != null) {
            //allocate direct memory
            buf = directArena.allocate(cache, initialCapacity, maxCapacity);
        } else {
            if (PlatformDependent.hasUnsafe()) {
                buf = UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity);
            } else {
                buf = new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
            }
        }
        return toLeakAwareBuffer(buf);
    }
    ...
}

abstract class PoolArena<T> implements PoolArenaMetric {
    ...
    PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
        PooledByteBuf<T> buf = newByteBuf(maxCapacity);//create the ByteBuf object
        allocate(cache, buf, reqCapacity);//allocate memory for the ByteBuf, trying the PoolThreadCache first
        return buf;
    }
    
    private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
        //1.Normalize reqCapacity to the standard size of its range
        final int normCapacity = normalizeCapacity(reqCapacity);
        if (isTinyOrSmall(normCapacity)) {//capacity < pageSize, i.e. the request is smaller than 8K
            int tableIdx;
            PoolSubpage<T>[] table;
            boolean tiny = isTiny(normCapacity);
            if (tiny) {//< 512
                //2.Try to allocate from the cache
                if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                    //cache hit: was able to allocate out of the cache so move on
                    return;
                }
                tableIdx = tinyIdx(normCapacity);
                table = tinySubpagePools;
            } else {
                //2.Try to allocate from the cache
                if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                    //cache hit: was able to allocate out of the cache so move on
                    return;
                }
                tableIdx = smallIdx(normCapacity);
                table = smallSubpagePools;
            }

            final PoolSubpage<T> head = table[tableIdx];

            //Synchronize on the head. 
            //This is needed as PoolChunk#allocateSubpage(int) and PoolChunk#free(long) may modify the doubly linked list as well.
            synchronized (head) {
                final PoolSubpage<T> s = head.next;
                if (s != head) {
                    assert s.doNotDestroy && s.elemSize == normCapacity;
                    long handle = s.allocate();
                    assert handle >= 0;
                    s.chunk.initBufWithSubpage(buf, handle, reqCapacity);
                    if (tiny) {
                        allocationsTiny.increment();
                    } else {
                        allocationsSmall.increment();
                    }
                    return;
                }
            }
            //cache miss, and no existing subpage in the pool could serve the request
            allocateNormal(buf, reqCapacity, normCapacity);
            return;
        }
        if (normCapacity <= chunkSize) {//the request is at least 8K and at most 16M
            //2.Try to allocate from the cache
            if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
                //cache hit: was able to allocate out of the cache so move on
                return;
            }
            //cache miss
            allocateNormal(buf, reqCapacity, normCapacity);
        } else {//the request is larger than 16M
            //Huge allocations are never served via the cache so just call allocateHuge
            allocateHuge(buf, reqCapacity);
        }
    }
    
    //Normalize reqCapacity to the standard size of its range
    int normalizeCapacity(int reqCapacity) {
        if (reqCapacity < 0) {
            throw new IllegalArgumentException("capacity: " + reqCapacity + " (expected: 0+)");
        }
        if (reqCapacity >= chunkSize) {
            return reqCapacity;
        }
        if (!isTiny(reqCapacity)) { // >= 512
            int normalizedCapacity = reqCapacity;
            normalizedCapacity --;
            normalizedCapacity |= normalizedCapacity >>>  1;
            normalizedCapacity |= normalizedCapacity >>>  2;
            normalizedCapacity |= normalizedCapacity >>>  4;
            normalizedCapacity |= normalizedCapacity >>>  8;
            normalizedCapacity |= normalizedCapacity >>> 16;
            normalizedCapacity ++;
            if (normalizedCapacity < 0) {
                normalizedCapacity >>>= 1;
            }
            return normalizedCapacity;
        }
        if ((reqCapacity & 15) == 0) {
            return reqCapacity;
        }
        return (reqCapacity & ~15) + 16;
    }
    ...
}

final class PoolThreadCache {
    final PoolArena<byte[]> heapArena;
    final PoolArena<ByteBuffer> directArena;

    //Hold the caches for the different size classes, which are tiny, small and normal.
    //32 MemoryRegionCache elements, holding SubPage-level memory of 16B, 32B, 48B, ..., 480B, 496B
    private final MemoryRegionCache<byte[]>[] tinySubPageHeapCaches;
    //4 MemoryRegionCache elements, holding SubPage-level memory of 512B, 1K, 2K, 4K
    private final MemoryRegionCache<byte[]>[] smallSubPageHeapCaches;
    //3 MemoryRegionCache elements, holding Page-level memory of 8K, 16K, 32K
    private final MemoryRegionCache<byte[]>[] normalHeapCaches;
    private final MemoryRegionCache<ByteBuffer>[] tinySubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] smallSubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] normalDirectCaches;
    ...
    
    //Try to allocate a tiny buffer out of the cache. Returns true if successful false otherwise
    boolean allocateTiny(PoolArena<?> area, PooledByteBuf<?> buf, int reqCapacity, int normCapacity) {
        //First call cacheForTiny() to find the MemoryRegionCache for the requested size,
        //then call allocate() to give the ByteBuf memory out of that MemoryRegionCache
        return allocate(cacheForTiny(area, normCapacity), buf, reqCapacity);
    }
    
    //Find the MemoryRegionCache for the requested size
    private MemoryRegionCache<?> cacheForTiny(PoolArena<?> area, int normCapacity) {
        int idx = PoolArena.tinyIdx(normCapacity);
        if (area.isDirect()) {
            return cache(tinySubPageDirectCaches, idx);
        }
        return cache(tinySubPageHeapCaches, idx);
    }
    
    //Return the MemoryRegionCache element at the given index of the cache array
    private static <T> MemoryRegionCache<T> cache(MemoryRegionCache<T>[] cache, int idx) {
        if (cache == null || idx > cache.length - 1) {
            return null;
        }
        return cache[idx];
    }
    
    //Allocate memory for the ByteBuf out of a MemoryRegionCache
    private boolean allocate(MemoryRegionCache<?> cache, PooledByteBuf buf, int reqCapacity) {
        if (cache == null) {
            return false;
        }
        //Call MemoryRegionCache.allocate() to give buf a region of reqCapacity bytes
        boolean allocated = cache.allocate(buf, reqCapacity);
        if (++ allocations >= freeSweepAllocationThreshold) {
            allocations = 0;
            trim();
        }
        return allocated;
    }
    ...
    private abstract static class MemoryRegionCache<T> {
        private final int size;
        private final Queue<Entry<T>> queue;
        private final SizeClass sizeClass;
        private int allocations;
        ...
        //Allocate something out of the cache if possible and remove the entry from the cache.
        public final boolean allocate(PooledByteBuf<T> buf, int reqCapacity) {
            //Step 1: poll an Entry from the queue
            Entry<T> entry = queue.poll();
            if (entry == null) {
                return false;
            }
            //Step 2: initialize buf
            initBuf(entry.chunk, entry.handle, buf, reqCapacity);
            //Step 3: return the polled Entry to the object pool for reuse
            entry.recycle();

            //allocations is not thread-safe which is fine as this is only called from the same thread all time.
            ++ allocations;
            return true;
        }
     
        //Init the PooledByteBuf using the provided chunk and handle with the capacity restrictions.
        protected abstract void initBuf(PoolChunk<T> chunk, long handle, PooledByteBuf<T> buf, int reqCapacity);
      
        static final class Entry<T> {
            final Handle<Entry<?>> recyclerHandle;
            PoolChunk<T> chunk;
            long handle = -1;

            Entry(Handle<Entry<?>> recyclerHandle) {
                this.recyclerHandle = recyclerHandle;
            }

            void recycle() {
                chunk = null;
                handle = -1;
                recyclerHandle.recycle(this);
            }
        }
    }
}

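(2) First, normalize the requested size

normalizeCapacity() rounds reqCapacity up to the standard size of its range. The point is that once the memory has been handed out, a later release can drop it straight into the matching cache instead of actually freeing it.

When reqCapacity is in the tiny range, it is rounded up to the next multiple of 16B.

When reqCapacity is in the small range, it is rounded up to the next power of two, that is, 512B times 2^n.

When reqCapacity is in the normal range, it is likewise rounded up to the next power of two, that is, 8K times 2^n.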
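A few worked values make the rounding rules easier to check. The sketch below restates the normalizeCapacity() logic quoted earlier as a standalone static method; NormalizeSketch and normalize are illustrative names, the chunk size is hard-coded to 16M, and the overflow guard of the original is left out because inputs below 16M cannot overflow.

```java
final class NormalizeSketch {
    static final int CHUNK_SIZE = 16 * 1024 * 1024; // assumed default chunk size

    static int normalize(int reqCapacity) {
        if (reqCapacity < 0) {
            throw new IllegalArgumentException("capacity: " + reqCapacity + " (expected: 0+)");
        }
        if (reqCapacity >= CHUNK_SIZE) {
            return reqCapacity; // huge sizes are left as they are
        }
        if (reqCapacity >= 512) {
            // small and normal: round up to the next power of two
            int n = reqCapacity - 1;
            n |= n >>> 1;
            n |= n >>> 2;
            n |= n >>> 4;
            n |= n >>> 8;
            n |= n >>> 16;
            return n + 1;
        }
        // tiny: round up to the next multiple of 16
        if ((reqCapacity & 15) == 0) {
            return reqCapacity;
        }
        return (reqCapacity & ~15) + 16;
    }
}
```

Under these rules a 100B request becomes 112B, a 600B request becomes 1K, and a 9000B request becomes 16K, while sizes that already sit on a boundary (16B, 512B, 8K) are returned unchanged.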

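(3) Then allocate from the cache

Cache allocation comes in three variants, one per size class:

The first is cache.allocateTiny().

The second is cache.allocateSmall().

The third is cache.allocateNormal().

All three work in much the same way. The walkthrough below uses cache.allocateTiny() to show how memory is allocated on a cache hit.

Step 1:

Find the MemoryRegionCache for the requested size. In other words, locate a node in the PoolThreadCache, where a node is one MemoryRegionCache element of a cache array.

PoolThreadCache.cacheForTiny() maps the normalized size to its MemoryRegionCache node. It divides the size by 16 to get the index into the tiny cache array, then fetches the node at that index.

Step 2:

Poll an Entry from the queue and use it to initialize the ByteBuf. Each Entry stands for a contiguous region inside some Chunk. Initializing the ByteBuf assigns that region to it, so that the ByteBuf can read and write data on top of that memory. Concretely, queue.poll() returns an Entry, and initBuf() then performs the initialization. The key part of the initialization is assigning the PooledByteBuf's fields: chunk says which Chunk the memory comes from, and handle says which region of that Chunk it is. A chunk and a handle together are enough for a ByteBuf to identify its memory.

Step 3:

Return the polled Entry to the object pool for reuse. Once polled, the Entry is no longer needed, yet it is still an object. After the PooledByteBuf has been initialized nothing uses the Entry any more, and an unused object would normally be left for the garbage collector.

To reuse objects as much as possible, Netty calls entry.recycle() instead, which puts the Entry into the RECYCLER object pool. Later, when a ByteBuf is released, an Entry can be taken straight from the RECYCLER pool and its chunk and handle pointed at the memory of the released ByteBuf, so the Entry object is reused.

Netty reuses objects wherever it can. Pooling them through RECYCLER avoids creating and destroying the same kind of object over and over, which in turn reduces GC pressure.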
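The three steps, together with the reuse of Entry objects, can be sketched with two plain queues. This is a deliberately simplified, single-threaded model: RegionCacheSketch and its methods are hypothetical, an ArrayDeque stands in for both the fixed-size MPSC queue and the Recycler pool, and the chunk is just an opaque Object.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class RegionCacheSketch {
    // Stands in for MemoryRegionCache.Entry: a (chunk, handle) pair.
    static final class Entry {
        Object chunk;
        long handle = -1;
    }

    private final int capacity;                                // max cached regions
    private final Queue<Entry> queue = new ArrayDeque<>();     // cached regions
    private final Queue<Entry> entryPool = new ArrayDeque<>(); // recycled Entry objects

    RegionCacheSketch(int capacity) {
        this.capacity = capacity;
    }

    // Release path: remember a freed region, reusing a pooled Entry if one exists.
    boolean add(Object chunk, long handle) {
        if (queue.size() >= capacity) {
            return false; // cache is full, caller must really free the region
        }
        Entry e = entryPool.poll();
        if (e == null) {
            e = new Entry();
        }
        e.chunk = chunk;
        e.handle = handle;
        return queue.offer(e);
    }

    // Allocation path: returns the cached handle, or -1 on a cache miss.
    long allocate() {
        Entry e = queue.poll();   // step 1: poll an Entry
        if (e == null) {
            return -1;
        }
        long handle = e.handle;   // step 2: the caller would init its buffer from chunk + handle
        e.chunk = null;           // step 3: clear the Entry and put it back into the pool
        e.handle = -1;
        entryPool.offer(e);
        return handle;
    }

    int pooledEntries() {
        return entryPool.size();
    }
}
```

The design point is that neither path allocates a new Entry in the steady state: allocate() feeds the pool and add() drains it.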

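12. Key concepts in Netty memory allocation

(1) PoolArena

(2) PoolChunk

(3) Page and SubPage

(4) Summary

(1) PoolArena

I. What a PoolArena does

When a thread uses the PooledByteBufAllocator to create a PooledByteBuf, it first obtains its own PoolThreadCache through a ThreadLocal. It then creates a PooledByteBuf object with PoolArena.newByteBuf() and calls PoolArena.allocate() to give that ByteBuf memory, going through the PoolThreadCache.

A PoolThreadCache has two groups of fields: the MemoryRegionCache arrays for the different size classes, and the PoolArenas. The PoolArenas are heapArena and directArena. Through a PoolArena, a contiguous region can be carved out of a PoolChunk and assigned to a ByteBuf. Unlike a MemoryRegionCache, which caches a region that was carved out earlier, a PoolArena carves out a fresh region.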

final class PoolThreadCache {
    final PoolArena<byte[]> heapArena;
    final PoolArena<ByteBuffer> directArena;

    //Hold the caches for the different size classes, which are tiny, small and normal.
    //32 MemoryRegionCache elements, holding SubPage-level memory of 16B, 32B, 48B, ..., 480B, 496B
    private final MemoryRegionCache<byte[]>[] tinySubPageHeapCaches;
    //4 MemoryRegionCache elements, holding SubPage-level memory of 512B, 1K, 2K, 4K
    private final MemoryRegionCache<byte[]>[] smallSubPageHeapCaches;
    //3 MemoryRegionCache elements, holding Page-level memory of 8K, 16K, 32K
    private final MemoryRegionCache<byte[]>[] normalHeapCaches;
    private final MemoryRegionCache<ByteBuffer>[] tinySubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] smallSubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] normalDirectCaches;
    ...
}

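II. The PoolArena data structure

A PoolArena contains a doubly linked list in which every node is a PoolChunkList. Each PoolChunkList in turn contains a doubly linked list in which every node is a PoolChunk. The PoolChunk, 16M in size, is the smallest unit in which Netty requests memory from the operating system.

(2) PoolChunk

Why does a PoolArena link its PoolChunkLists into a doubly linked list, and why does each PoolChunkList link its PoolChunks the same way? Because Netty tracks the usage of every PoolChunk as it changes: a 16M chunk with 8M allocated has a usage of 50%, for example. PoolChunks whose usage falls in the same range are kept in the same PoolChunkList. When a ByteBuf needs memory, Netty can pick a PoolChunkList according to a fixed order and then choose a PoolChunk from that list.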

abstract class PoolArena<T> implements PoolArenaMetric {
    ...
    private final PoolChunkList<T> qInit;//PoolChunks with usage in the 0~25% range
    private final PoolChunkList<T> q000;//PoolChunks with usage in the 1%~50% range
    private final PoolChunkList<T> q025;//PoolChunks with usage in the 25%~75% range
    private final PoolChunkList<T> q050;//PoolChunks with usage in the 50%~100% range
    private final PoolChunkList<T> q075;//PoolChunks with usage in the 75%~100% range
    private final PoolChunkList<T> q100;//PoolChunks with usage of 100%
    
    protected PoolArena(PooledByteBufAllocator parent, int pageSize, int maxOrder, int pageShifts, int chunkSize) {
        ...
        //each list receives its successor in the constructor, so the lists are created from q100 backwards
        q100 = new PoolChunkList<T>(null, 100, Integer.MAX_VALUE, chunkSize);
        q075 = new PoolChunkList<T>(q100, 75, 100, chunkSize);
        q050 = new PoolChunkList<T>(q075, 50, 100, chunkSize);
        q025 = new PoolChunkList<T>(q050, 25, 75, chunkSize);
        q000 = new PoolChunkList<T>(q025, 1, 50, chunkSize);
        qInit = new PoolChunkList<T>(q000, Integer.MIN_VALUE, 25, chunkSize);

        q100.prevList(q075);
        q075.prevList(q050);
        q050.prevList(q025);
        q025.prevList(q000);
        q000.prevList(null);
        qInit.prevList(qInit);
        ...
    }
    
    ...
}

final class PoolChunkList<T> implements PoolChunkListMetric {
    private final PoolChunkList<T> nextList;
    private PoolChunkList<T> prevList;
    private PoolChunk<T> head;

    private final int minUsage;
    private final int maxUsage;
    private final int maxCapacity;
    ...
    PoolChunkList(PoolChunkList<T> nextList, int minUsage, int maxUsage, int chunkSize) {
        assert minUsage <= maxUsage;
        this.nextList = nextList;
        this.minUsage = minUsage;
        this.maxUsage = maxUsage;
        this.maxCapacity = calculateMaxCapacity(minUsage, chunkSize);
    }
    void prevList(PoolChunkList<T> prevList) {
        assert this.prevList == null;
        this.prevList = prevList;
    }
    ...
}

final class PoolChunk<T> implements PoolChunkMetric {
    final PoolArena<T> arena;
    final T memory;//the backing memory
    PoolChunkList<T> parent;
    PoolChunk<T> prev;
    PoolChunk<T> next;
    ...
}

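(3) Page and SubPage

A PoolChunk is 16M, and handing out 16M for every allocation is not an option, so Netty divides a PoolChunk into equally sized Pages. With 8K as the unit, one PoolChunk yields 2048 Pages, and memory can then be allocated in units of a Page.

To allocate 16K, for example, it is enough to find two contiguous Pages in a PoolChunk. To allocate 2K, however, taking a whole 8K Page each time would waste 6K. Netty therefore divides a Page further into SubPages: some Pages are divided into 2K SubPages, others into 1K SubPages, and so on.

A PoolArena has two PoolSubpage arrays. tinySubpagePools has 32 elements, standing for SubPages of 16B, 32B, 48B, ..., 480B, 496B. smallSubpagePools has 4 elements, standing for SubPages of 512B, 1K, 2K, and 4K.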

abstract class PoolArena<T> implements PoolArenaMetric {
    ...
    //The SubPage sizes match those of PoolThreadCache's tinySubPageHeapCaches
    //32 elements: 16B, 32B, 48B, ..., 480B, 496B
    private final PoolSubpage<T>[] tinySubpagePools;
    //4 elements: 512B, 1K, 2K, 4K
    private final PoolSubpage<T>[] smallSubpagePools;
    ...
}

final class PoolChunk<T> implements PoolChunkMetric {
    final PoolArena<T> arena;
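    final T memory;//the backing memory
    //size of one Page, e.g. 8K
    private final int pageSize;
    //byte array of 4096 elements recording how contiguous runs of each size are allocated; think of it as a binary tree
    private final byte[] memoryMap;
    //array of 2048 elements recording which Pages of the Chunk are used as SubPages
    //a 16M PoolChunk is split into 8K Pages, so there are 16 * 1024 / 8 = 2048 Pages
    private final PoolSubpage<T>[] subpages;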
    ...
}

final class PoolSubpage<T> implements PoolSubpageMetric {
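    final PoolChunk<T> chunk;//the PoolChunk this SubPage belongs to
    int elemSize;//the element size this SubPage is divided into
    private final long[] bitmap;//records which elements of this SubPage are allocated
    private final int memoryMapIdx;//index of the Page in the chunk's memoryMap
    private final int pageSize;//Page size
    private final int runOffset;//offset of this SubPage's Page within the chunk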
    PoolSubpage<T> prev;
    PoolSubpage<T> next;
    ...
}

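In PoolSubpage, chunk identifies the PoolChunk the SubPage belongs to, elemSize is the element size the SubPage is divided into, and bitmap records which elements of the SubPage are allocated. Through prev and next, PoolSubpages of the same elemSize are linked into a doubly linked list whose head is the matching element of tinySubpagePools or smallSubpagePools.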
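How a bitmap can track the elements of one Page is easiest to see in a small model. The sketch below is an assumption-laden simplification, not PoolSubpage itself: SubpageBitmapSketch and its methods are hypothetical, the search for a free bit is a plain linear scan, and the returned value is a slot index rather than Netty's encoded handle.

```java
final class SubpageBitmapSketch {
    private final int elemSize;     // size of each element, e.g. 2K
    private final int maxNumElems;  // how many elements fit into the Page
    private final long[] bitmap;    // one bit per element, 1 = allocated

    SubpageBitmapSketch(int pageSize, int elemSize) {
        this.elemSize = elemSize;
        this.maxNumElems = pageSize / elemSize;
        this.bitmap = new long[(maxNumElems + 63) / 64];
    }

    // Marks the first free element as used and returns its index, or -1 if the Page is full.
    int allocate() {
        for (int i = 0; i < maxNumElems; i++) {
            int word = i >>> 6;           // which long holds bit i
            long mask = 1L << (i & 63);   // which bit inside that long
            if ((bitmap[word] & mask) == 0) {
                bitmap[word] |= mask;
                return i;
            }
        }
        return -1;
    }

    // Clears the bit so the element can be handed out again.
    void free(int index) {
        bitmap[index >>> 6] &= ~(1L << (index & 63));
    }

    // Byte offset of an element relative to the start of its Page.
    int offsetOf(int index) {
        return index * elemSize;
    }

    int maxNumElems() {
        return maxNumElems;
    }
}
```

An 8K Page divided into 2K elements has 4 slots, so the fifth allocate() call fails until one slot is freed; divided into 16B elements the same Page has 512 slots.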

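(4) Summary of Netty memory allocation

First, a PoolArena is obtained from the thread's PoolThreadCache, and a Chunk is taken from one of the PoolArena's ChunkLists. When allocating on that Chunk, Netty checks whether the requested size is at least one Page. If it is, memory is allocated in units of whole Pages. If the requested size is smaller than a Page, Netty picks one Page, divides it into SubPages, and allocates one of those.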

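13. Page-level memory allocation

(1) Entry point of Page-level allocation

(2) Flow of Page-level allocation

(3) Trying to allocate from an existing PoolChunk

(4) Creating a PoolChunk and allocating from it

(5) Initializing the PooledByteBuf

(1) Entry point of Page-level allocation

The following three lines can be used to trace the call stack of a Page-level allocation: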

PooledByteBufAllocator allocator = PooledByteBufAllocator.DEFAULT;
ByteBuf byteBuf = allocator.heapBuffer(8192);//allocate 8192B of memory
byteBuf.release();

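When PooledByteBufAllocator.heapBuffer() reaches heapArena.allocate() through newHeapBuffer(), the call chain is: PoolArena.allocate(), then PoolArena.allocateNormal(), then PoolChunk.allocate(), and finally PoolChunk.allocateRun(), which performs the Page-level allocation. A Page-level allocation always hands out whole Pages, never part of a Page.

Note: the memory actually handed out is just a byte[] or a ByteBuffer, so an allocation boils down to obtaining a numeric handle that locates the region for later writes.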

public class PooledByteBufAllocator extends AbstractByteBufAllocator {
    private final PoolThreadLocalCache threadCache;
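    private final PoolArena<byte[]>[] heapArenas;//each thread is bound to one PoolArena
    private final PoolArena<ByteBuffer>[] directArenas;//each thread is bound to one PoolArena
    ...
    //As the code below shows, the memory actually handed out is just a byte[] or a ByteBuffer,
    //so an allocation boils down to obtaining a numeric handle that locates the region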
    @Override
    protected ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity) {
        PoolThreadCache cache = threadCache.get();
        PoolArena<byte[]> heapArena = cache.heapArena;
        ByteBuf buf;
        if (heapArena != null) {
            //allocate heap memory
            buf = heapArena.allocate(cache, initialCapacity, maxCapacity);
        } else {
            buf = new UnpooledHeapByteBuf(this, initialCapacity, maxCapacity);
        }
        return toLeakAwareBuffer(buf);
    }
    
    @Override
    protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity) {
        PoolThreadCache cache = threadCache.get();
        PoolArena<ByteBuffer> directArena = cache.directArena;

        ByteBuf buf;
        if (directArena != null) {
            buf = directArena.allocate(cache, initialCapacity, maxCapacity);
        } else {
            if (PlatformDependent.hasUnsafe()) {
                buf = UnsafeByteBufUtil.newUnsafeDirectByteBuf(this, initialCapacity, maxCapacity);
            } else {
                buf = new UnpooledDirectByteBuf(this, initialCapacity, maxCapacity);
            }
        }
        return toLeakAwareBuffer(buf);
    }
    ...
}

abstract class PoolArena<T> implements PoolArenaMetric {
    final PooledByteBufAllocator parent;
    final int pageSize;//8192 = 8K by default
    private final int maxOrder;//11 by default
    final int pageShifts;//13 by default
    final int chunkSize;//16M by default
    
    private final PoolChunkList<T> qInit;
    private final PoolChunkList<T> q000;
    private final PoolChunkList<T> q025;
    private final PoolChunkList<T> q050;
    private final PoolChunkList<T> q075;
    private final PoolChunkList<T> q100;

    //Metrics for allocations and deallocations
    private long allocationsNormal;
    ...
    
    protected PoolArena(PooledByteBufAllocator parent, int pageSize, int maxOrder, int pageShifts, int chunkSize) {
        this.parent = parent;
        this.pageSize = pageSize;
        this.maxOrder = maxOrder;
        this.pageShifts = pageShifts;
        this.chunkSize = chunkSize;
        ...
        //each list receives its successor in the constructor, so the lists are created from q100 backwards
        q100 = new PoolChunkList<T>(null, 100, Integer.MAX_VALUE, chunkSize);
        q075 = new PoolChunkList<T>(q100, 75, 100, chunkSize);
        q050 = new PoolChunkList<T>(q075, 50, 100, chunkSize);
        q025 = new PoolChunkList<T>(q050, 25, 75, chunkSize);
        q000 = new PoolChunkList<T>(q025, 1, 50, chunkSize);
        qInit = new PoolChunkList<T>(q000, Integer.MIN_VALUE, 25, chunkSize);

        q100.prevList(q075);
        q075.prevList(q050);
        q050.prevList(q025);
        q025.prevList(q000);
        q000.prevList(null);
        qInit.prevList(qInit);
        ...
    }
    
    PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
        PooledByteBuf<T> buf = newByteBuf(maxCapacity);//create the ByteBuf object
        allocate(cache, buf, reqCapacity);//allocate memory for the ByteBuf, trying the PoolThreadCache first
        return buf;
    }
    
    private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
        //1.Normalize reqCapacity to the standard size of its range
        final int normCapacity = normalizeCapacity(reqCapacity);
        if (isTinyOrSmall(normCapacity)) {//capacity < pageSize, i.e. the request is smaller than 8K
            ...
        }
        if (normCapacity <= chunkSize) {//the request is at least 8K and at most 16M
            //2.Try to allocate from the cache
            if (cache.allocateNormal(this, buf, reqCapacity, normCapacity)) {
                //was able to allocate out of the cache so move on
                return;
            }
            allocateNormal(buf, reqCapacity, normCapacity);
        } else {//the request is larger than 16M
            //Huge allocations are never served via the cache so just call allocateHuge
            allocateHuge(buf, reqCapacity);
        }
    }
    
    private synchronized void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
        //1.Try to allocate from an existing PoolChunk
        if (q050.allocate(buf, reqCapacity, normCapacity) || q025.allocate(buf, reqCapacity, normCapacity) ||
            q000.allocate(buf, reqCapacity, normCapacity) || qInit.allocate(buf, reqCapacity, normCapacity) ||
            q075.allocate(buf, reqCapacity, normCapacity)) {
            ++allocationsNormal;
            return;
        }

        //2.Create a new PoolChunk and allocate from it
        PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
        //handle points to a contiguous region inside the PoolChunk
        long handle = c.allocate(normCapacity);
        ++allocationsNormal;
        assert handle > 0;
        //3.Initialize the PooledByteBuf
        c.initBuf(buf, handle, reqCapacity);
        //4.Add the new PoolChunk to the PoolArena's qInit PoolChunkList
        qInit.add(c);
    }
    
    protected abstract PoolChunk<T> newChunk(int pageSize, int maxOrder, int pageShifts, int chunkSize);
    ...
}

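(2) Flow of Page-level allocation

I. Try to allocate from an existing PoolChunk

A PoolArena holds a doubly linked list of PoolChunkLists, each standing for the PoolChunks in one usage range. PoolArena.allocateNormal() first tries to get a PoolChunk from the PoolChunkLists, in the order q050, q025, q000, qInit, q075. If none of them yields a PoolChunk that can serve the request, a new PoolChunk is created.

II. Create a PoolChunk and allocate from it

PoolArena.newChunk() creates the PoolChunk, and PoolChunk.allocate() is then called on it. That call ends up in PoolChunk.allocateRun(), which picks a contiguous region inside the PoolChunk. The region is identified by a long value named handle.

III. Initialize the PooledByteBuf

Once a contiguous region of a PoolChunk has been obtained (the handle returned by allocateRun()), PoolChunk.initBuf() is called to store the handle in the PooledByteBuf.

IV. Add the new PoolChunk to a PoolChunkList

The new PoolChunk is added to qInit, one of the PoolArena's PoolChunkLists.

(3) Trying to allocate from an existing PoolChunk

In PoolChunkList.allocate(), the list is walked starting from its head node, and PoolChunk.allocate() is tried on each PoolChunk in turn. If PoolChunk.allocate() returns a non-negative handle, PoolChunk.initBuf() is called to initialize the PooledByteBuf. If the PoolChunk's usage has then reached the maximum usage of its PoolChunkList, the PoolChunk is removed from that list and added to the next PoolChunkList.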

final class PoolChunkList<T> implements PoolChunkListMetric {
    private PoolChunk<T> head;
    ...
    //reqCapacity is the requested size; normCapacity is the normalized size
    boolean allocate(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
        if (head == null || normCapacity > maxCapacity) {
            //Either this PoolChunkList is empty or the requested capacity is larger then the capacity 
            //which can be handled by the PoolChunks that are contained in this PoolChunkList.
            return false;
        }
      
        //Traverse downward starting from the head node of the PoolChunkList
        for (PoolChunk<T> cur = head;;) {
            //Try PoolChunk.allocate() on each PoolChunk
            long handle = cur.allocate(normCapacity);
            if (handle < 0) {
                cur = cur.next;
                if (cur == null) {
                    return false;
                }
            } else {
                //Call PoolChunk.initBuf() to initialize the PooledByteBuf
                cur.initBuf(buf, handle, reqCapacity);
                //If the PoolChunk's usage has reached the maximum usage of its PoolChunkList
                if (cur.usage() >= maxUsage) {
                    //Remove the PoolChunk from its current PoolChunkList
                    remove(cur);
                    //Add the PoolChunk to the next PoolChunkList
                    nextList.add(cur);
                }
                return true;
            }
        }
    }
    ...
}

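(4) Creating a PoolChunk for memory allocation

I. Arguments and constructor used when creating a PoolChunk

When PoolArena's allocateNormal() method performs a Page-level allocation, it calls newChunk(pageSize, maxOrder, pageShifts, chunkSize) with pageSize = 8K, maxOrder = 11, pageShifts = 13, and chunkSize = 16M. newChunk() is implemented by DirectArena and HeapArena, which are both nested classes and subclasses of PoolArena.

The newChunk() methods of HeapArena and DirectArena each create a new PoolChunk. Besides storing the arguments passed from newChunk(), the PoolChunk constructor initializes two arrays, memoryMap and depthMap, which track how contiguous blocks of each size are allocated. memoryMap is a byte array with 4096 elements, and depthMap starts out with exactly the same contents as memoryMap.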

abstract class PoolArena<T> implements PoolArenaMetric {
    ...
    private synchronized void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
        //1. Try to allocate from an existing PoolChunk
        if (q050.allocate(buf, reqCapacity, normCapacity) || q025.allocate(buf, reqCapacity, normCapacity) ||
            q000.allocate(buf, reqCapacity, normCapacity) || qInit.allocate(buf, reqCapacity, normCapacity) ||
            q075.allocate(buf, reqCapacity, normCapacity)) {
            ++allocationsNormal;
            return;
        }

        //2. Create a PoolChunk and allocate memory from it; pageSize = 8K, maxOrder = 11, pageShifts = 13, chunkSize = 16M
        PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
        //handle points to a contiguous block of memory inside the PoolChunk
        long handle = c.allocate(normCapacity);
        ++allocationsNormal;
        assert handle > 0;
        //3. Initialize the PooledByteBuf object
        c.initBuf(buf, handle, reqCapacity);
        //4. Add the new PoolChunk to qInit, one of PoolArena's PoolChunkLists
        qInit.add(c);
    }
    
    protected abstract PoolChunk<T> newChunk(int pageSize, int maxOrder, int pageShifts, int chunkSize);
    
    static final class HeapArena extends PoolArena<byte[]> {
        ...
        //pageSize = 8K, maxOrder = 11, pageShifts = 13, chunkSize = 16M
        @Override
        protected PoolChunk<byte[]> newChunk(int pageSize, int maxOrder, int pageShifts, int chunkSize) {
            //The actual memory, new byte[chunkSize], is stored in PoolChunk's memory field
            return new PoolChunk<byte[]>(this, new byte[chunkSize], pageSize, maxOrder, pageShifts, chunkSize);
        }
        ...
    }
    
    static final class DirectArena extends PoolArena<ByteBuffer> {
        ...
        //pageSize = 8K, maxOrder = 11, pageShifts = 13, chunkSize = 16M
        @Override
        protected PoolChunk<ByteBuffer> newChunk(int pageSize, int maxOrder, int pageShifts, int chunkSize) {
            //The actual memory, returned by allocateDirect(chunkSize) below, is stored in PoolChunk's memory field
            return new PoolChunk<ByteBuffer>(this, allocateDirect(chunkSize), pageSize, maxOrder, pageShifts, chunkSize);
        }
     
        private static ByteBuffer allocateDirect(int capacity) {
            return PlatformDependent.useDirectBufferNoCleaner() ?
                PlatformDependent.allocateDirectNoCleaner(capacity) : ByteBuffer.allocateDirect(capacity);
        }
        ...
    }
    ...
}

final class PoolChunk<T> implements PoolChunkMetric {
    final PoolArena<T> arena;
    final T memory;//the memory
    //Size of one Page, e.g. 8K
    private final int pageSize;
    //Byte array with 4096 elements that tracks how contiguous blocks of each size are allocated; think of it as a binary tree
    private final byte[] memoryMap;
    private final byte[] depthMap;
    //Array with 2048 elements that records which Pages of the Chunk are used as SubPages
    //A PoolChunk is 16M and is split into 8K Pages, so there are 16 * 1024 / 8 = 2048 Pages
    private final PoolSubpage<T>[] subpages;
    //Used to mark memory as unusable
    private final byte unusable;//12 by default
    ...
    //pageSize = 8K, maxOrder = 11, pageShifts = 13, chunkSize = 16M
    PoolChunk(PoolArena<T> arena, T memory, int pageSize, int maxOrder, int pageShifts, int chunkSize) {
        unpooled = false;
        this.arena = arena;
        this.memory = memory;
        this.pageSize = pageSize;//8K by default
        this.pageShifts = pageShifts;//13 by default
        this.maxOrder = maxOrder;//11 by default
        this.chunkSize = chunkSize;//16M by default
        unusable = (byte) (maxOrder + 1);//12 by default
        log2ChunkSize = log2(chunkSize);//24 by default
        subpageOverflowMask = ~(pageSize - 1);
        freeBytes = chunkSize;

        assert maxOrder < 30 : "maxOrder should be < 30, but is: " + maxOrder;
        maxSubpageAllocs = 1 << maxOrder;

        //Generate the memory map.
        memoryMap = new byte[maxSubpageAllocs << 1];
        depthMap = new byte[memoryMap.length];
        int memoryMapIndex = 1;
        for (int d = 0; d <= maxOrder; ++ d) {//move down the tree one level at a time
            int depth = 1 << d;
            for (int p = 0; p < depth; ++ p) {
                //in each level traverse left to right and set value to the depth of subtree
                memoryMap[memoryMapIndex] = (byte) d;
                depthMap[memoryMapIndex] = (byte) d;
                memoryMapIndex ++;
            }
        }
        subpages = newSubpageArray(maxSubpageAllocs);
    }
    ...
}

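II. Data structure of PoolChunk's memoryMap field

A PoolChunk organizes its memory in Pages. Each node of memoryMap records whether a contiguous block of memory has been allocated, and each node of depthMap records the depth of that node in the tree. The element at index n of the memoryMap array corresponds to the n-th node of a balanced binary tree of contiguous memory with 12 levels (depths 0 to 11). For example, the node covering 0-4M is node 4 of the tree, and its depth is 2. Note that the 0-16M root sits at index 1 of memoryMap; the element at index 0 is unused.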
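The layout described above can be sketched in a few lines. This is an illustration, not Netty code: the class name MemoryMapSketch is hypothetical, the build loop mirrors the PoolChunk constructor shown earlier, and the runLength and runOffset formulas are assumptions derived from the tree layout (the source calls methods with these names but does not show their bodies).

```java
// Illustrative sketch only; the class name and the two formulas are assumptions, not Netty's code.
final class MemoryMapSketch {
    static final int MAX_ORDER = 11;        // depth of the leaves (one leaf per 8K page)
    static final int LOG2_CHUNK_SIZE = 24;  // chunk size = 16M = 2^24

    // Same loop shape as the PoolChunk constructor: index = node id, value = depth; root is index 1.
    static byte[] buildDepthMap() {
        int leaves = 1 << MAX_ORDER;          // 2048 pages
        byte[] map = new byte[leaves << 1];   // 4096 entries, index 0 unused
        int idx = 1;
        for (int d = 0; d <= MAX_ORDER; d++) {
            int nodesAtDepth = 1 << d;
            for (int p = 0; p < nodesAtDepth; p++) {
                map[idx++] = (byte) d;
            }
        }
        return map;
    }

    // Bytes covered by a node at the given depth: chunkSize / 2^depth.
    static int runLength(int depth) {
        return 1 << (LOG2_CHUNK_SIZE - depth);
    }

    // Byte offset of node id inside the chunk: its position within its level times the run length.
    static int runOffset(int id, int depth) {
        int positionInLevel = id ^ (1 << depth); // clears the leading bit of id
        return positionInLevel * runLength(depth);
    }

    public static void main(String[] args) {
        byte[] depthMap = buildDepthMap();
        for (int id : new int[] {1, 4, 5, 2048, 2049}) {
            int d = depthMap[id];
            System.out.printf("id=%d depth=%d length=%d offset=%d%n", id, d, runLength(d), runOffset(id, d));
        }
    }
}
```

Under these assumptions node 4 comes out at depth 2 with a 4M run starting at offset 0, which matches the 0-4M example in the text.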

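III. Allocating memory from the newly created PoolChunk

After PoolArena's allocateNormal() method has created a PoolChunk, it must allocate a block of memory from it, so it calls PoolChunk's allocate() method. Because this is a Page-level allocation, the call ends up in PoolChunk's allocateRun() method.

PoolChunk's allocateRun() method first uses the requested size to compute the depth, in the balanced binary tree, of the contiguous-memory node to allocate. It then calls PoolChunk's allocateNode() method to obtain the index id of that node in the memoryMap array.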
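As a worked example of the depth computation, the sketch below evaluates d = maxOrder - (log2(normCapacity) - pageShifts) for a few normalized sizes. The class RunDepthSketch and its log2 helper are assumptions made for illustration; only the formula itself comes from the allocateRun() code in this section.

```java
// Illustrative sketch; RunDepthSketch and its log2 helper are hypothetical names.
final class RunDepthSketch {
    static final int MAX_ORDER = 11;   // default from the text
    static final int PAGE_SHIFTS = 13; // 8K = 2^13

    // floor(log2(v)) for a positive int
    static int log2(int v) {
        return 31 - Integer.numberOfLeadingZeros(v);
    }

    // Depth of the tree level whose nodes are exactly normCapacity bytes wide.
    static int depthFor(int normCapacity) {
        return MAX_ORDER - (log2(normCapacity) - PAGE_SHIFTS);
    }

    public static void main(String[] args) {
        int[] sizes = {8 * 1024, 16 * 1024, 4 * 1024 * 1024, 16 * 1024 * 1024};
        for (int size : sizes) {
            System.out.println(size + " bytes -> depth " + depthFor(size));
        }
    }
}
```

An 8K request lands on the leaf level (depth 11), and each doubling of the size moves one level up, until a full 16M request maps to the root at depth 0.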

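In other words, allocateNode(d) returns the index of a contiguous-memory node that sits at depth d of the balanced binary tree and has not been used yet. Once it has this index, it sets the memoryMap element at that index to unusable = 12 to mark the node as unavailable, and then updates the ancestor nodes level by level (each parent takes the smaller value of its two children). Inside allocateNode(), the XOR operation id ^= 1 moves from a left child to its right sibling, which is the same as adding 1 because a left child's id is always even.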

final class PoolChunk<T> implements PoolChunkMetric {
    ...
    long allocate(int normCapacity) {
        if ((normCapacity & subpageOverflowMask) != 0) { // >= pageSize
            //Page-level memory allocation
            return allocateRun(normCapacity);
        } else {
            //SubPage-level memory allocation
            return allocateSubpage(normCapacity);
        }
    }
    
    //Allocate a run of pages (>=1)
    //@param normCapacity normalized capacity
    //@return index in memoryMap
    private long allocateRun(int normCapacity) {
        //maxOrder = 11, pageShifts = 13
        //For example, if normCapacity = 8K, then d = 11 - (13 - 13) = 11
        int d = maxOrder - (log2(normCapacity) - pageShifts);
        int id = allocateNode(d);
        if (id < 0) {
            return id;
        }
        freeBytes -= runLength(id);
        return id;
    }
    
    //Algorithm to allocate an index in memoryMap when we query for a free node at depth d
    //@param d depth
    //@return index in memoryMap
    private int allocateNode(int d) {
        int id = 1;
        int initial = - (1 << d);//has last d bits = 0 and rest all = 1
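        //Read the value of memoryMap[id]
        byte val = value(id);
        if (val > d) {//val = unusable = 12
            return -1;
        }
        //val < d: the subtree rooted at id still has free space, so keep descending
        while (val < d || (id & initial) == 0) {//id & initial == 1 << d for all ids at depth d, for < d it is 0
            id <<= 1;//multiply by 2 on each iteration, i.e. move to the left child
            val = value(id);
            if (val > d) {//the node for the current id cannot serve this request
                id ^= 1;//XOR adds 1 to id, i.e. move to the right sibling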
                val = value(id);
            }
        }
        byte value = value(id);
        assert value == d && (id & initial) == 1 << d : String.format("val = %d, id & initial = %d, d = %d", value, id & initial, d);
        setValue(id, unusable);//mark as unusable = 12
        updateParentsAlloc(id);//update the ancestor nodes level by level
        return id;
    }
    
    private byte value(int id) {
        return memoryMap[id];
    }
    
    //Update method used by allocate.
    //This is triggered only when a successor is allocated and all its predecessors need to update their state.
    //The minimal depth at which subtree rooted at id has some free space.
    private void updateParentsAlloc(int id) {
        while (id > 1) {
            int parentId = id >>> 1;
            byte val1 = value(id);
            byte val2 = value(id ^ 1);
            byte val = val1 < val2 ? val1 : val2;
            setValue(parentId, val);
            id = parentId;
        }
    }
    ...
}
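To see how allocateNode() and updateParentsAlloc() interact, the following sketch runs the same logic on a much smaller tree (maxOrder = 3, so 8 leaves). The class BuddyTreeSketch is hypothetical and omits everything else PoolChunk does; the descent loop and the parent update follow the code shown above.

```java
// Illustrative sketch on a tiny tree; BuddyTreeSketch is a hypothetical name, not a Netty class.
final class BuddyTreeSketch {
    private final byte unusable;
    private final byte[] memoryMap;

    BuddyTreeSketch(int maxOrder) {
        this.unusable = (byte) (maxOrder + 1);
        this.memoryMap = new byte[(1 << maxOrder) << 1]; // index 0 unused, root at index 1
        int idx = 1;
        for (int d = 0; d <= maxOrder; d++) {
            for (int p = 0; p < (1 << d); p++) {
                memoryMap[idx++] = (byte) d; // a free node stores its own depth
            }
        }
    }

    // Returns the id of a free node at depth d, or -1 if none is left.
    int allocateNode(int d) {
        int id = 1;
        int initial = -(1 << d); // last d bits are 0, all higher bits are 1
        byte val = memoryMap[id];
        if (val > d) {
            return -1; // nothing large enough is free anywhere in the tree
        }
        while (val < d || (id & initial) == 0) {
            id <<= 1; // go to the left child
            val = memoryMap[id];
            if (val > d) {
                id ^= 1; // left child cannot serve the request, use the right sibling
                val = memoryMap[id];
            }
        }
        memoryMap[id] = unusable;
        // Walk up: each parent records the smaller value of its two children.
        for (int child = id; child > 1; child >>>= 1) {
            byte a = memoryMap[child];
            byte b = memoryMap[child ^ 1];
            memoryMap[child >>> 1] = a < b ? a : b;
        }
        return id;
    }

    public static void main(String[] args) {
        BuddyTreeSketch tree = new BuddyTreeSketch(3);
        System.out.println(tree.allocateNode(3)); // first leaf
        System.out.println(tree.allocateNode(3)); // its sibling
        System.out.println(tree.allocateNode(2)); // next free node one level up
    }
}
```

Because a parent only becomes unusable once both children are, a partially used subtree stays reachable for smaller requests while larger requests skip over it.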

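(5) Initializing the PooledByteBuf object

PoolChunk's initBuf() method first derives memoryMapIdx and bitmapIdx from the handle. Because this is a Page-level allocation, bitmapIdx is 0, so buf.init() is called next to initialize the PooledByteBuf.

PooledByteBuf's init() method first sets the PoolChunk, because obtaining a writable block of memory requires a PoolChunk. It then sets handle, which identifies the contiguous block inside the PoolChunk, that is, which node of the balanced binary tree or which element of memoryMap. Next it sets memory, the chunk's underlying storage, which was obtained either from the heap or as direct memory when the PoolChunk was created. It also sets offset to runOffset(memoryMapIdx), the start of the allocated run within the chunk; a Page-level allocation has no additional SubPage offset. Finally it sets length, the amount of memory this allocation actually requested (reqCapacity), and maxLength, the size of the run that was reserved.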

final class PoolChunk<T> implements PoolChunkMetric {
    ...
    void initBuf(PooledByteBuf<T> buf, long handle, int reqCapacity) {
        int memoryMapIdx = memoryMapIdx(handle);
        int bitmapIdx = bitmapIdx(handle);
        if (bitmapIdx == 0) {
            byte val = value(memoryMapIdx);
            assert val == unusable : String.valueOf(val);
            buf.init(this, handle, runOffset(memoryMapIdx), reqCapacity, runLength(memoryMapIdx), arena.parent.threadCache()); 
        } else {
            initBufWithSubpage(buf, handle, bitmapIdx, reqCapacity);
        }
    }
    
    private static int memoryMapIdx(long handle) {
        return (int) handle;
    }

    private static int bitmapIdx(long handle) {
        return (int) (handle >>> Integer.SIZE);//i.e. handle >>> 32
    }
    ...
}

abstract class PooledByteBuf<T> extends AbstractReferenceCountedByteBuf {
    private final Recycler.Handle<PooledByteBuf<T>> recyclerHandle;
    protected PoolChunk<T> chunk;
    protected long handle;
    protected T memory;
    protected int offset;
    protected int length;
    int maxLength;
    PoolThreadCache cache;
    private ByteBuffer tmpNioBuf;

    @SuppressWarnings("unchecked")
    protected PooledByteBuf(Recycler.Handle<? extends PooledByteBuf<T>> recyclerHandle, int maxCapacity) {
        super(maxCapacity);
        this.recyclerHandle = (Handle<PooledByteBuf<T>>) recyclerHandle;
    }

    void init(PoolChunk<T> chunk, long handle, int offset, int length, int maxLength, PoolThreadCache cache) {
        assert handle >= 0;
        assert chunk != null;

        this.chunk = chunk;
        this.handle = handle;
        this.memory = chunk.memory;
        this.offset = offset;
        this.length = length;
        this.maxLength = maxLength;
        this.tmpNioBuf = null;
        this.cache = cache;
    }
    ...
}

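14. SubPage-level memory allocation

(1) Entry point of SubPage-level memory allocation

(2) Flow of SubPage-level memory allocation

(3) Locating a SubPage object

(4) Initializing the SubPage object

(5) Allocating via subpage.allocate()

(1) Entry point of SubPage-level memory allocation

The following three lines can be used to trace the call stack of a SubPage-level memory allocation: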

PooledByteBufAllocator allocator = PooledByteBufAllocator.DEFAULT;
ByteBuf byteBuf = allocator.directBuffer(16);//allocate 16B of memory
byteBuf.release();

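When PooledByteBufAllocator's directBuffer() method runs directArena.allocate() through newDirectBuffer(), it first calls PoolArena's allocate() method, then PoolArena's allocateNormal() method, then PoolChunk's allocate() method, and finally PoolChunk's allocateSubpage() method, which performs the SubPage-level allocation. A SubPage-level allocation never takes multiple Pages; it takes only part of a single Page.

In PoolArena's allocate() method, tinyIdx(16) first yields tableIdx, here tableIdx = 16 >>> 4 = 1. The PoolSubpage at index tableIdx of PoolArena's tinySubpagePools array is then taken as the head of the list for that size. Like the MemoryRegionCache arrays, tinySubpagePools has 32 elements by default: 16B, 32B, 48B, ..., 480B, 496B. The n-th element (counting from 1) is dedicated to SubPages of size (n - 1) * 16B; in other words, index i corresponds to i * 16B. If neither the thread cache nor an existing PoolSubpage in that list can serve the request, PoolArena.allocateNormal() leads to PoolChunk.allocateSubpage().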
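The index arithmetic can be checked in isolation. The sketch below copies the bodies of tinyIdx() and smallIdx() from the PoolArena code in this section into a hypothetical wrapper class, SizeIndexSketch, so they can be run without Netty on the classpath.

```java
// Illustrative sketch; SizeIndexSketch is a hypothetical wrapper around the two index functions.
final class SizeIndexSketch {
    // Tiny sizes are multiples of 16B, so dividing by 16 gives the slot.
    static int tinyIdx(int normCapacity) {
        return normCapacity >>> 4;
    }

    // Small sizes are 512B, 1K, 2K, 4K; count how many times (size / 1024) can be halved.
    static int smallIdx(int normCapacity) {
        int tableIdx = 0;
        int i = normCapacity >>> 10;
        while (i != 0) {
            i >>>= 1;
            tableIdx++;
        }
        return tableIdx;
    }

    public static void main(String[] args) {
        System.out.println("tinyIdx(16)    = " + tinyIdx(16));
        System.out.println("tinyIdx(496)   = " + tinyIdx(496));
        System.out.println("smallIdx(512)  = " + smallIdx(512));
        System.out.println("smallIdx(4096) = " + smallIdx(4096));
    }
}
```

The four small sizes map to indices 0 through 3, which is why smallSubpagePools needs only four elements.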

abstract class PoolArena<T> implements PoolArenaMetric {
    //32 elements: 16B, 32B, 48B, ..., 480B, 496B
    private final PoolSubpage<T>[] tinySubpagePools;
    //4 elements: 512B, 1K, 2K, 4K
    private final PoolSubpage<T>[] smallSubpagePools;
    ...    
    PooledByteBuf<T> allocate(PoolThreadCache cache, int reqCapacity, int maxCapacity) {
        PooledByteBuf<T> buf = newByteBuf(maxCapacity);//create the ByteBuf object
        allocate(cache, buf, reqCapacity);//allocate memory for the ByteBuf object via the PoolThreadCache
        return buf;
    }
    
    private void allocate(PoolThreadCache cache, PooledByteBuf<T> buf, final int reqCapacity) {
        //1. Normalize reqCapacity into its size class
        final int normCapacity = normalizeCapacity(reqCapacity);
        if (isTinyOrSmall(normCapacity)) {//capacity < pageSize, i.e. the requested memory is smaller than 8K
            int tableIdx;
            PoolSubpage<T>[] table;
            boolean tiny = isTiny(normCapacity);
            if (tiny) {//capacity < 512
                if (cache.allocateTiny(this, buf, reqCapacity, normCapacity)) {
                    //was able to allocate out of the cache so move on
                    return;
                }
                //Compute tableIdx from the normalized size normCapacity
                tableIdx = tinyIdx(normCapacity);
                table = tinySubpagePools;
            } else {
                if (cache.allocateSmall(this, buf, reqCapacity, normCapacity)) {
                    //was able to allocate out of the cache so move on
                    return;
                }
                //Compute tableIdx from the normalized size normCapacity
                tableIdx = smallIdx(normCapacity);
                table = smallSubpagePools;
            }

            final PoolSubpage<T> head = table[tableIdx];

            //Synchronize on the head. 
            //This is needed as PoolChunk#allocateSubpage(int) and PoolChunk#free(long) may modify the doubly linked list as well. 
            synchronized (head) {
                final PoolSubpage<T> s = head.next;
                if (s != head) {//PoolArena's tinySubpagePools or smallSubpagePools array has a PoolSubpage that can serve this allocation
                    assert s.doNotDestroy && s.elemSize == normCapacity;
                    //Call the SubPage's allocate() method to allocate memory
                    long handle = s.allocate();
                    assert handle >= 0;
                    s.chunk.initBufWithSubpage(buf, handle, reqCapacity);

                    if (tiny) {
                        allocationsTiny.increment();
                    } else {
                        allocationsSmall.increment();
                    }
                    return;
                }
            }
            allocateNormal(buf, reqCapacity, normCapacity);
            return;
        }
        ...
    }
    
    static int tinyIdx(int normCapacity) {
        return normCapacity >>> 4;
    }

    static int smallIdx(int normCapacity) {
        int tableIdx = 0;
        int i = normCapacity >>> 10;
        while (i != 0) {
            i >>>= 1;
            tableIdx ++;
        }
        return tableIdx;
    }
    
    private synchronized void allocateNormal(PooledByteBuf<T> buf, int reqCapacity, int normCapacity) {
        //1. Try to allocate from an existing PoolChunk
        if (q050.allocate(buf, reqCapacity, normCapacity) || q025.allocate(buf, reqCapacity, normCapacity) ||
            q000.allocate(buf, reqCapacity, normCapacity) || qInit.allocate(buf, reqCapacity, normCapacity) ||
            q075.allocate(buf, reqCapacity, normCapacity)) {
            ++allocationsNormal;
            return;
        }

        //2. Create a PoolChunk and allocate memory from it
        PoolChunk<T> c = newChunk(pageSize, maxOrder, pageShifts, chunkSize);
        //handle points to a contiguous block of memory inside the PoolChunk
        long handle = c.allocate(normCapacity);
        ++allocationsNormal;
        assert handle > 0;
        //3. Initialize the PooledByteBuf object
        c.initBuf(buf, handle, reqCapacity);
        //4. Add the new PoolChunk to qInit, one of PoolArena's PoolChunkLists
        qInit.add(c);
    }
    ...
}

final class PoolChunk<T> implements PoolChunkMetric {
    ...
    long allocate(int normCapacity) {
        if ((normCapacity & subpageOverflowMask) != 0) {//normCapacity >= pageSize
            //Page-level memory allocation
            return allocateRun(normCapacity);
        } else {
            //SubPage-level memory allocation
            return allocateSubpage(normCapacity);
        }
    }
    ...
}
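The branch in PoolChunk.allocate() relies on subpageOverflowMask = ~(pageSize - 1), as set in the PoolChunk constructor shown earlier. The sketch below isolates that test; the class PageMaskSketch and the method name isPageLevel are hypothetical, and the page size is fixed at the 8K default assumed throughout the text.

```java
// Illustrative sketch; PageMaskSketch and isPageLevel are hypothetical names.
final class PageMaskSketch {
    static final int PAGE_SIZE = 8192;
    // All bits at or above the page-size bit are 1, all lower bits are 0.
    static final int SUBPAGE_OVERFLOW_MASK = ~(PAGE_SIZE - 1);

    // True when normCapacity >= pageSize, i.e. the request needs at least one whole page.
    static boolean isPageLevel(int normCapacity) {
        return (normCapacity & SUBPAGE_OVERFLOW_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println("16    -> " + isPageLevel(16));
        System.out.println("4096  -> " + isPageLevel(4096));
        System.out.println("8192  -> " + isPageLevel(8192));
        System.out.println("16384 -> " + isPageLevel(16384));
    }
}
```

A single AND replaces the comparison normCapacity >= pageSize, which works because pageSize is a power of two.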

(2) Flow of SubPage-level memory allocation

The main steps of PoolChunk.allocateSubpage():

I. Locate a SubPage object

II. Initialize the SubPage object

III. Call the SubPage's allocate() method to allocate

final class PoolChunk<T> implements PoolChunkMetric {
    final PoolArena<T> arena;
    final T memory;//the memory
    //Size of one Page, e.g. 8K
    private final int pageSize;
    //Byte array with 4096 elements that tracks how contiguous blocks of each size are allocated; think of it as a binary tree
    private final byte[] memoryMap;
    private final byte[] depthMap;
    //Array with 2048 elements that records which Pages of the Chunk are used as SubPages
    //A PoolChunk is 16M and is split into 8K Pages, so there are 16 * 1024 / 8 = 2048 Pages
    private final PoolSubpage<T>[] subpages;
    //Used to mark memory as unusable
    private final byte unusable;//12 by default
    ...
    //Create/ initialize a new PoolSubpage of normCapacity
    //Any PoolSubpage created/ initialized here is added to subpage pool in the PoolArena that owns this PoolChunk
    //@param normCapacity normalized capacity
    //@return index in memoryMap
    private long allocateSubpage(int normCapacity) {
        //Obtain the head of the PoolSubPage pool that is owned by the PoolArena and synchronize on it.
        //This is need as we may add it back and so alter the linked-list structure.
        //1. Locate a SubPage object
        //PoolArena's findSubpagePoolHead() divides by 16 to find the PoolSubpage node that holds 16B SubPages
        PoolSubpage<T> head = arena.findSubpagePoolHead(normCapacity);
        synchronized (head) {
            //A SubPage can only be taken from depth 11 of memoryMap, because only at that depth is every node 8K
            int d = maxOrder;//subpages are only be allocated from pages i.e., leaves
            //1. Locate a SubPage object: allocate a node at depth 11 of the balanced binary tree
            //That is, allocateNode(d) finds the index id of a Page in the PoolChunk
            int id = allocateNode(d);
            if (id < 0) {
                return id;
            }

            final PoolSubpage<T>[] subpages = this.subpages;
            final int pageSize = this.pageSize;
            freeBytes -= pageSize;
           
            //1. Locate a SubPage object: determine which Page of the PoolChunk will be used as a SubPage
            int subpageIdx = subpageIdx(id);
            PoolSubpage<T> subpage = subpages[subpageIdx];
            //2. Initialize the SubPage object
            if (subpage == null) {
                subpage = new PoolSubpage<T>(head, this, id, runOffset(id), pageSize, normCapacity);
                subpages[subpageIdx] = subpage;
            } else {
                subpage.init(head, normCapacity);
            }
            //3. Call the SubPage's allocate() method to allocate
            return subpage.allocate();
        }
    }
    ...
}

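(3) Locating a SubPage object

A SubPage is carved out of a single Page. Whether the allocation comes from an existing SubPage object or a SubPage has to be created first, the first step is always to locate a SubPage object.

In PoolChunk's allocateSubpage() method, PoolArena's findSubpagePoolHead() method first finds the head PoolSubpage node in PoolArena's tinySubpagePools that holds 16B SubPages. A Page node is then allocated at depth 11 of the balanced binary tree of contiguous memory; that is, allocateNode(d) returns the index id of a Page in the PoolChunk. Next, subpageIdx(id) determines that Page number subpageIdx of the PoolChunk will be used as a SubPage, which locates the PoolSubpage element at index subpageIdx of the PoolChunk's subpages array for the SubPage-level allocation.

Note: if the PoolSubpage element at some index of the PoolChunk's subpages array is null, the corresponding Page of the PoolChunk has either been used for a Page-level allocation or has not been allocated yet.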

abstract class PoolArena<T> implements PoolArenaMetric {
    //32 elements: 16B, 32B, 48B, ..., 480B, 496B
    private final PoolSubpage<T>[] tinySubpagePools;
    //4 elements: 512B, 1K, 2K, 4K
    private final PoolSubpage<T>[] smallSubpagePools;
    ...
    //Find the head PoolSubpage node for elemSize (e.g. 16B) in PoolArena's tinySubpagePools or smallSubpagePools
    PoolSubpage<T> findSubpagePoolHead(int elemSize) {
        int tableIdx;
        PoolSubpage<T>[] table;
        if (isTiny(elemSize)) {//< 512
            tableIdx = elemSize >>> 4;
            table = tinySubpagePools;
        } else {
            tableIdx = 0;
            elemSize >>>= 10;
            while (elemSize != 0) {
                elemSize >>>= 1;
                tableIdx ++;
            }
            table = smallSubpagePools;
        }
        return table[tableIdx];
    }
    ...
}

final class PoolChunk<T> implements PoolChunkMetric {
    ...
    //Algorithm to allocate an index in memoryMap when we query for a free node at depth d
    //@param d depth
    //@return index in memoryMap
    private int allocateNode(int d) {
        int id = 1;
        int initial = - (1 << d);//has last d bits = 0 and rest all = 1
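        //Read the value of memoryMap[id]
        byte val = value(id);
        if (val > d) {//val = unusable = 12
            return -1;
        }
        //val < d: the subtree rooted at id still has free space, so keep descending
        while (val < d || (id & initial) == 0) {//id & initial == 1 << d for all ids at depth d, for < d it is 0
            id <<= 1;//multiply by 2 on each iteration, i.e. move to the left child
            val = value(id);
            if (val > d) {//the node for the current id cannot serve this request
                id ^= 1;//XOR adds 1 to id, i.e. move to the right sibling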
                val = value(id);
            }
        }
        byte value = value(id);
        assert value == d && (id & initial) == 1 << d : String.format("val = %d, id & initial = %d, d = %d", value, id & initial, d);
        setValue(id, unusable);//mark as unusable = 12
        updateParentsAlloc(id);//update the ancestor nodes level by level
        return id;
    }
    
    private int subpageIdx(int memoryMapIdx) {
        return memoryMapIdx ^ maxSubpageAllocs;//remove highest set bit, to get offset
    }
    ...
}
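The mapping from a leaf's memoryMap index to its slot in the subpages array is a single XOR. A minimal sketch, assuming the default maxSubpageAllocs = 1 << 11 = 2048 (the wrapper class SubpageIndexSketch is hypothetical):

```java
// Illustrative sketch; SubpageIndexSketch is a hypothetical wrapper class.
final class SubpageIndexSketch {
    static final int MAX_SUBPAGE_ALLOCS = 1 << 11; // 2048 leaves, ids 2048..4095

    // Leaf ids all have bit 11 set; XOR clears it and leaves the page number 0..2047.
    static int subpageIdx(int memoryMapIdx) {
        return memoryMapIdx ^ MAX_SUBPAGE_ALLOCS;
    }

    public static void main(String[] args) {
        System.out.println(subpageIdx(2048)); // first page
        System.out.println(subpageIdx(2049)); // second page
        System.out.println(subpageIdx(4095)); // last page
    }
}
```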

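(4) Initializing the SubPage object

If the PoolSubpage element at index subpageIdx of the PoolChunk's subpages array is null, a PoolSubpage object is created and initialized. Initialization takes the Page found in the PoolChunk and divides it into SubPages of the requested size. Once the PoolSubpage object is initialized, its allocate() method can be used to allocate memory, and the resulting memory information is then set on the PooledByteBuf object.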

final class PoolSubpage<T> implements PoolSubpageMetric {
    final PoolChunk<T> chunk;
    private final int memoryMapIdx;
    private final int runOffset;
    private final int pageSize;
    private final long[] bitmap;
    ...
    PoolSubpage(PoolSubpage<T> head, PoolChunk<T> chunk, int memoryMapIdx, int runOffset, int pageSize, int elemSize) {
        this.chunk = chunk;
        this.memoryMapIdx = memoryMapIdx;
        this.runOffset = runOffset;
        this.pageSize = pageSize;
        bitmap = new long[pageSize >>> 10];//pageSize / 16 / 64
        init(head, elemSize);
    }

    void init(PoolSubpage<T> head, int elemSize) {
        doNotDestroy = true;
        this.elemSize = elemSize;
        if (elemSize != 0) {
            maxNumElems = numAvail = pageSize / elemSize;
            nextAvail = 0;
            bitmapLength = maxNumElems >>> 6;
            if ((maxNumElems & 63) != 0) {
                bitmapLength ++;
            }
            for (int i = 0; i < bitmapLength; i ++) {
                bitmap[i] = 0;
            }
        }
        //Add this PoolSubpage to the PoolSubpage linked list in PoolArena's tinySubpagePools array
        //that serves allocations of this size (e.g. 16B)
        addToPool(head);
    }
    
    private void addToPool(PoolSubpage<T> head) {
        assert prev == null && next == null;
        prev = head;
        next = head.next;
        next.prev = this;
        head.next = this;
    }
    ...
}

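When the PoolSubpage constructor calls init(), it does the following:

I. Sets elemSize, the size of one SubPage, which is the normalized requested size such as 16B or 32B.

II. Resets bitmap, which records which SubPages of the Page have been allocated after the Page is divided: 0 means free and 1 means allocated.

III. Calls addToPool() to add this PoolSubpage object to the PoolSubpage linked list for its size in PoolArena's tinySubpagePools (or smallSubpagePools) array.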
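As a worked example of the sizing done in init(), the sketch below computes how many elements a Page holds and how many long words of the bitmap are actually used. The class SubpageBitmapSketch is hypothetical; the two calculations follow the init() code shown above.

```java
// Illustrative sketch; SubpageBitmapSketch is a hypothetical name.
final class SubpageBitmapSketch {
    // Number of fixed-size elements that fit in one page (integer division, remainder is wasted).
    static int maxNumElems(int pageSize, int elemSize) {
        return pageSize / elemSize;
    }

    // Number of 64-bit words needed to hold one bit per element, rounded up.
    static int bitmapLength(int maxNumElems) {
        int length = maxNumElems >>> 6;
        if ((maxNumElems & 63) != 0) {
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        int pageSize = 8192;
        for (int elemSize : new int[] {16, 48, 4096}) {
            int n = maxNumElems(pageSize, elemSize);
            System.out.println(elemSize + "B -> " + n + " elements, " + bitmapLength(n) + " long(s)");
        }
    }
}
```

The bitmap array itself is allocated once with pageSize >>> 10 = 8 longs, enough for the worst case of 16B elements, so a PoolSubpage can be re-initialized for another element size without reallocating it.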

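(5) Allocating via the SubPage's allocate() method

allocate() first looks in bitmap for an unused SubPage. If no SubPage remains available after this allocation, the PoolSubpage object is removed from the linked list for its size in PoolArena's tinySubpagePools array. The bitmapIdx of the chosen SubPage is then converted into a handle: a 64-bit value with bitmapIdx in the high 32 bits and memoryMapIdx in the low 32 bits. The resulting handle identifies which SubPage of which Page in the PoolChunk was allocated, so the contiguous memory can be located and used to initialize the PooledByteBuf.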

final class PoolSubpage<T> implements PoolSubpageMetric {
    int elemSize;
    private final long[] bitmap;
    private int numAvail;
    private final int memoryMapIdx;
    ...
    //Returns the bitmap index of the subpage allocation.
    long allocate() {
        if (elemSize == 0) {
            return toHandle(0);
        }
        if (numAvail == 0 || !doNotDestroy) {
            return -1;
        }
        //1. Find an unused SubPage in the bitmap
        final int bitmapIdx = getNextAvail();
        int q = bitmapIdx >>> 6;
        int r = bitmapIdx & 63;
        assert (bitmap[q] >>> r & 1) == 0;
        bitmap[q] |= 1L << r;
      
        //If no SubPage is available any more
        if (-- numAvail == 0) {
            //Remove this PoolSubpage from the linked list for its size class in PoolArena's tinySubpagePools array
            removeFromPool();
        }
        //2. Convert bitmapIdx into a handle
        return toHandle(bitmapIdx);
    }
    
    private long toHandle(int bitmapIdx) {
        //Pack into 64 bits: bitmapIdx becomes the high 32 bits, memoryMapIdx the low 32 bits
        return 0x4000000000000000L | (long) bitmapIdx << 32 | memoryMapIdx;
    }
    ...
}
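The handle packing can be tried out on its own. The sketch below reuses the toHandle(), memoryMapIdx() and bitmapIdx() expressions from the code in this article inside a hypothetical class, SubpageHandleSketch. The 0x3FFFFFFF mask used to recover the plain bitmap index is an assumption about how the marker bit is stripped; it is not shown in this section.

```java
// Illustrative sketch; SubpageHandleSketch is hypothetical and the 0x3FFFFFFF mask is an assumption.
final class SubpageHandleSketch {
    // High 32 bits: marker bit plus bitmapIdx. Low 32 bits: memoryMapIdx.
    static long toHandle(int bitmapIdx, int memoryMapIdx) {
        return 0x4000000000000000L | (long) bitmapIdx << 32 | memoryMapIdx;
    }

    static int memoryMapIdx(long handle) {
        return (int) handle; // keeps the low 32 bits
    }

    static int bitmapIdx(long handle) {
        return (int) (handle >>> Integer.SIZE); // high 32 bits, marker bit included
    }

    // Strips the marker bit to get the element index inside the page.
    static int elementIndex(long handle) {
        return bitmapIdx(handle) & 0x3FFFFFFF;
    }

    public static void main(String[] args) {
        long handle = toHandle(0, 2048); // first element of the first page
        System.out.println("memoryMapIdx = " + memoryMapIdx(handle));
        System.out.println("raw bitmapIdx is zero? " + (bitmapIdx(handle) == 0));
        System.out.println("element index = " + elementIndex(handle));
    }
}
```

The marker bit explains why initBuf() can test bitmapIdx == 0 to detect a Page-level handle: even the first SubPage of a Page (element index 0) produces a non-zero value in the high 32 bits.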

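15. Reclaiming a ByteBuf

(1) How pooled memory is released

(2) Adding a contiguous memory region to the cache

(3) Marking a contiguous memory region as unused

(4) Adding the ByteBuf object to the object pool

(1) How pooled memory is released

For example, byteBuf.release() ends up in PooledByteBuf's deallocate() method. That method first resets the PooledByteBuf's handle and memory fields. It then calls PoolArena's free() method to release the memory at handle in the corresponding PoolChunk, which means adding the contiguous region to the cache or, if that fails, marking the region as unused. Finally it calls PooledByteBuf's recycle() method so the PooledByteBuf object itself can be reused.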

public abstract class AbstractReferenceCountedByteBuf extends AbstractByteBuf {
    ...
    //Calling release() on a byteBuf releases its memory
    @Override
    public boolean release() {
        return release0(1);
    }
    
    private boolean release0(int decrement) {
        for (;;) {
            int refCnt = this.refCnt;
            if (refCnt < decrement) {
                throw new IllegalReferenceCountException(refCnt, -decrement);
            }

            if (refCntUpdater.compareAndSet(this, refCnt, refCnt - decrement)) {
                if (refCnt == decrement) {
                    deallocate();
                    return true;
                }
                return false;
            }
        }
    }

    protected abstract void deallocate();
    ...
}

abstract class PooledByteBuf<T> extends AbstractReferenceCountedByteBuf {
    ...
    protected PoolChunk<T> chunk;
    protected long handle;
    protected T memory;
    ...
    @Override
    protected final void deallocate() {
        if (handle >= 0) {
            //1. Reset the handle and memory fields of the PooledByteBuf
            final long handle = this.handle;
            this.handle = -1;
            memory = null;
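            //2. Call PoolArena's free() method to release the memory
            //That is, add the contiguous memory region to the cache, or mark it as unused if caching fails
            chunk.arena.free(chunk, handle, maxLength, cache);
            //3. Call PooledByteBuf's recycle() method to put the PooledByteBuf object back into the object pool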
            recycle();
        }
    }
}
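The compare-and-set loop in release0() can be reproduced with the JDK alone. The following is a simplified sketch, assuming a hypothetical class RefCountedSketch that only tracks whether deallocation happened; it uses IllegalStateException in place of Netty's IllegalReferenceCountException and a simplified retain().

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

// Illustrative sketch; RefCountedSketch is hypothetical and much simpler than Netty's class.
class RefCountedSketch {
    private static final AtomicIntegerFieldUpdater<RefCountedSketch> UPDATER =
            AtomicIntegerFieldUpdater.newUpdater(RefCountedSketch.class, "refCnt");

    private volatile int refCnt = 1; // a new buffer starts with one reference
    boolean deallocated;             // stands in for the real deallocate() work

    void retain() {
        UPDATER.incrementAndGet(this); // simplified: no overflow or use-after-free check
    }

    // Returns true only for the call that drops the count to zero.
    boolean release() {
        for (;;) {
            int current = refCnt;
            if (current < 1) {
                throw new IllegalStateException("refCnt: " + current);
            }
            // Retry if another thread changed refCnt between the read and the CAS.
            if (UPDATER.compareAndSet(this, current, current - 1)) {
                if (current == 1) {
                    deallocated = true;
                    return true;
                }
                return false;
            }
        }
    }

    public static void main(String[] args) {
        RefCountedSketch buf = new RefCountedSketch();
        buf.retain();
        System.out.println(buf.release()); // one reference still held
        System.out.println(buf.release()); // last reference dropped
    }
}
```

Only the thread whose compare-and-set takes the count from 1 to 0 runs the deallocation, so the memory is handed back exactly once even under concurrent release() calls.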

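(2) Adding a contiguous memory region to the cache

The first step of an allocation is to check whether the cache holds a contiguous memory region of the right size; if it does, that region is taken and used directly.

If the region is successfully added to the cache when memory is released, the next allocation of a PooledByteBuf of the same size can take it straight from the cache.

If adding the region to the cache fails on release, for example because the cache queue is full, the contiguous region behind the PooledByteBuf is marked as unused instead.

PoolArena's free() method first calls PoolThreadCache's add() method to add the released region to the cache. Only if that fails does it call PoolArena's freeChunk() method to mark the region as unused.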

abstract class PoolArena<T> implements PoolArenaMetric {
    private final PoolSubpage<T>[] tinySubpagePools;
    private final PoolSubpage<T>[] smallSubpagePools;
    
    private final PoolChunkList<T> qInit;
    private final PoolChunkList<T> q000;
    private final PoolChunkList<T> q025;
    private final PoolChunkList<T> q050;
    private final PoolChunkList<T> q075;
    private final PoolChunkList<T> q100;
    ...
    void free(PoolChunk<T> chunk, long handle, int normCapacity, PoolThreadCache cache) {
        if (chunk.unpooled) {
            int size = chunk.chunkSize();
            destroyChunk(chunk);
            activeBytesHuge.add(-size);
            deallocationsHuge.increment();
        } else {
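            //Determine which size class the memory being released belongs to
            SizeClass sizeClass = sizeClass(normCapacity);
            //1. Call PoolThreadCache's add() method to add the released contiguous region to the cache
            if (cache != null && cache.add(this, chunk, handle, normCapacity, sizeClass)) {
                //cached so not free it.
                return;
            }
            //2. Call PoolArena's freeChunk() method to release the memory, i.e. mark the contiguous region as unused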
            freeChunk(chunk, handle, sizeClass);
        }
    }
    
    private SizeClass sizeClass(int normCapacity) {
        if (!isTinyOrSmall(normCapacity)) {
            return SizeClass.Normal;
        }
        return isTiny(normCapacity) ? SizeClass.Tiny : SizeClass.Small;
    }
    ...
}

final class PoolThreadCache {
    private final MemoryRegionCache<byte[]>[] tinySubPageHeapCaches;
    private final MemoryRegionCache<byte[]>[] smallSubPageHeapCaches;
    private final MemoryRegionCache<byte[]>[] normalHeapCaches;
    private final MemoryRegionCache<ByteBuffer>[] tinySubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] smallSubPageDirectCaches;
    private final MemoryRegionCache<ByteBuffer>[] normalDirectCaches;
    ...
    //Add PoolChunk and handle to the cache if there is enough room.
    //Returns true if it fit into the cache false otherwise.
    @SuppressWarnings({ "unchecked", "rawtypes" })
    boolean add(PoolArena<?> area, PoolChunk chunk, long handle, int normCapacity, SizeClass sizeClass) {
        MemoryRegionCache<?> cache = cache(area, normCapacity, sizeClass);
        if (cache == null) {
            return false;
        }
        //Add the PoolChunk's contiguous memory region to the cache
        return cache.add(chunk, handle);
    }

    private MemoryRegionCache<?> cache(PoolArena<?> area, int normCapacity, SizeClass sizeClass) {
        switch (sizeClass) {
        case Normal:
            return cacheForNormal(area, normCapacity);
        case Small:
            return cacheForSmall(area, normCapacity);
        case Tiny:
            return cacheForTiny(area, normCapacity);
        default:
            throw new Error();
        }
    }
    
    private MemoryRegionCache<?> cacheForTiny(PoolArena<?> area, int normCapacity) {
        int idx = PoolArena.tinyIdx(normCapacity);
        if (area.isDirect()) {
            return cache(tinySubPageDirectCaches, idx);
        }
        return cache(tinySubPageHeapCaches, idx);
    }

    private MemoryRegionCache<?> cacheForSmall(PoolArena<?> area, int normCapacity) {
        int idx = PoolArena.smallIdx(normCapacity);
        if (area.isDirect()) {
            return cache(smallSubPageDirectCaches, idx);
        }
        return cache(smallSubPageHeapCaches, idx);
    }

    private MemoryRegionCache<?> cacheForNormal(PoolArena<?> area, int normCapacity) {
        if (area.isDirect()) {
            int idx = log2(normCapacity >> numShiftsNormalDirect);
            return cache(normalDirectCaches, idx);
        }
        int idx = log2(normCapacity >> numShiftsNormalHeap);
        return cache(normalHeapCaches, idx);
    }

    private static <T> MemoryRegionCache<T> cache(MemoryRegionCache<T>[] cache, int idx) {
        if (cache == null || idx > cache.length - 1) {
            return null;
        }
        return cache[idx];
    }
    ...
    private abstract static class MemoryRegionCache<T> {
        private final Queue<Entry<T>> queue;
        ...
        //Add to cache if not already full.
        //Add the PoolChunk's contiguous memory region to the cache
        @SuppressWarnings("unchecked")
        public final boolean add(PoolChunk<T> chunk, long handle) {
            Entry<T> entry = newEntry(chunk, handle);
            boolean queued = queue.offer(entry);
            if (!queued) {
                //If it was not possible to cache the chunk, immediately recycle the entry
                entry.recycle();
            }
            return queued;
        }
        ...
    }
}
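
The cacheForTiny(), cacheForSmall() and cacheForNormal() methods above all reduce a normalized capacity to an array index. The following is a minimal sketch of that arithmetic, assuming the helpers behave as in the Netty 4.1 line this article reads (tinyIdx shifts by 4, smallIdx counts halvings of capacity / 1024, normal caches shift by log2 of an 8K page). The class name CacheIndexSketch and the constant NUM_SHIFTS_NORMAL are hypothetical names used only for illustration.

```java
// Sketch only: reproduces the index arithmetic under the assumptions stated above.
final class CacheIndexSketch {
    // Assumption: page size is 8K, so normal caches shift by log2(8192) = 13.
    static final int NUM_SHIFTS_NORMAL = 13;

    // Tiny sizes are multiples of 16B, so the slot is capacity / 16.
    static int tinyIdx(int normCapacity) {
        return normCapacity >>> 4;
    }

    // Small sizes are 512B, 1K, 2K, 4K: count how many times capacity / 1024 can be halved.
    static int smallIdx(int normCapacity) {
        int idx = 0;
        int i = normCapacity >>> 10;
        while (i != 0) {
            i >>>= 1;
            idx++;
        }
        return idx;
    }

    // Normal sizes are 8K, 16K, 32K: log2 of the number of pages.
    static int normalIdx(int normCapacity) {
        return log2(normCapacity >> NUM_SHIFTS_NORMAL);
    }

    // Integer log2 for positive values.
    static int log2(int val) {
        int res = 0;
        while (val > 1) {
            val >>= 1;
            res++;
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println("tiny 16B   -> slot " + tinyIdx(16));
        System.out.println("small 1K   -> slot " + smallIdx(1024));
        System.out.println("normal 32K -> slot " + normalIdx(32 * 1024));
        System.out.println("normal 64K -> slot " + normalIdx(64 * 1024));
    }
}
```

With three normal slots (8K, 16K, 32K), a 64K request yields index 3, which is past the end of the array. That is the case in which cache() returns null and the region is not cached.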

(3) Marking the contiguous memory region as unused

How a region is marked depends on its level: Page-level regions are marked through the binary tree, while SubPage-level regions are marked through the bitmap.
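
Before reading the Netty source, the Page-level scheme can be pictured on a toy tree. The sketch below is a simplified model, not Netty's class: PageTreeSketch, markAllocated and markFree are hypothetical names, and the tree has only 8 leaf pages instead of 2048. It assumes each node stores the shallowest depth at which a free node still exists in its subtree, and that freeing restores the node to its own depth and then repairs the ancestors.

```java
// Toy model of the Page-level binary tree; node ids start at 1 (root), children are 2*id and 2*id+1.
final class PageTreeSketch {
    static final int MAX_ORDER = 3;                       // 2^3 = 8 leaf pages
    static final byte UNUSABLE = (byte) (MAX_ORDER + 1);  // value of a fully allocated node
    final byte[] memoryMap = new byte[1 << (MAX_ORDER + 1)];

    PageTreeSketch() {
        // Initially every node is free, so its value equals its own depth.
        for (int id = 1; id < memoryMap.length; id++) {
            memoryMap[id] = (byte) depth(id);
        }
    }

    static int depth(int id) {
        return 31 - Integer.numberOfLeadingZeros(id);
    }

    // Mark a node as allocated and propagate the minimum of both children upwards.
    void markAllocated(int id) {
        memoryMap[id] = UNUSABLE;
        while (id > 1) {
            int parent = id >>> 1;
            memoryMap[parent] = (byte) Math.min(memoryMap[id], memoryMap[id ^ 1]);
            id = parent;
        }
    }

    // Mark a node as free again and repair its ancestors.
    void markFree(int id) {
        memoryMap[id] = (byte) depth(id);
        int logChild = depth(id) + 1;
        while (id > 1) {
            int parent = id >>> 1;
            byte v1 = memoryMap[id];
            byte v2 = memoryMap[id ^ 1];
            logChild -= 1; // depth of the children examined in this iteration
            if (v1 == logChild && v2 == logChild) {
                // Both children are completely free, so the parent is completely free too.
                memoryMap[parent] = (byte) (logChild - 1);
            } else {
                memoryMap[parent] = (byte) Math.min(v1, v2);
            }
            id = parent;
        }
    }

    public static void main(String[] args) {
        PageTreeSketch tree = new PageTreeSketch();
        tree.markAllocated(8); // first leaf page
        System.out.println("root after alloc: " + tree.memoryMap[1]);
        tree.markFree(8);
        System.out.println("root after free:  " + tree.memoryMap[1]);
    }
}
```

The special case for two fully free children is what lets a freed run merge back into a larger allocatable block instead of staying fragmented.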

abstract class PoolArena<T> implements PoolArenaMetric {
    ...
    void freeChunk(PoolChunk<T> chunk, long handle, SizeClass sizeClass) {
        final boolean destroyChunk;
        synchronized (this) {
            switch (sizeClass) {
            case Normal:
                ++deallocationsNormal;
                break;
            case Small:
                ++deallocationsSmall;
                break;
            case Tiny:
                ++deallocationsTiny;
                break;
            default:
                throw new Error();
            }
            //Call free() on the PoolChunk's parent, i.e. its PoolChunkList, to release the memory region
            destroyChunk = !chunk.parent.free(chunk, handle);
        }
        if (destroyChunk) {
            //destroyChunk not need to be called while holding the synchronized lock.
            destroyChunk(chunk);
        }
    }
    ...
}

final class PoolChunkList<T> implements PoolChunkListMetric {
    private PoolChunk<T> head;
    private PoolChunkList<T> prevList;
    ...
    boolean free(PoolChunk<T> chunk, long handle) {
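        //Mark the contiguous memory region in the PoolChunk as unused
        chunk.free(handle);
        //If the usage of the PoolChunk has dropped below the minimum usage of the current PoolChunkList
        if (chunk.usage() < minUsage) {
            //Remove the PoolChunk from the current PoolChunkList
            remove(chunk);
            //Move the PoolChunk to the previous PoolChunkList (prevList, which holds less-used chunks)
            return move0(chunk);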
        }
        return true;
    }
    
    private void remove(PoolChunk<T> cur) {
        if (cur == head) {
            head = cur.next;
            if (head != null) {
                head.prev = null;
            }
        } else {
            PoolChunk<T> next = cur.next;
            cur.prev.next = next;
            if (next != null) {
                next.prev = cur.prev;
            }
        }
    }
    
    //Moves the PoolChunk down the PoolChunkList linked-list so it will end up in the right PoolChunkList 
    //that has the correct minUsage / maxUsage in respect to PoolChunk#usage().
    private boolean move0(PoolChunk<T> chunk) {
        if (prevList == null) {
            //There is no previous PoolChunkList so return false which result in having the PoolChunk destroyed and
            //all memory associated with the PoolChunk will be released.
            assert chunk.usage() == 0;
            return false;
        }
        return prevList.move(chunk);
    }
    ...
}
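
The free(), move0() and move() chain above walks a chunk down the PoolChunkList chain until its usage fits. A rough sketch of that walk follows. It assumes the minimum-usage thresholds of Netty 4.1's lists (q000 = 1, q025 = 25, q050 = 50, q075 = 75, q100 = 100) and that q000 has no prevList; these values are recalled from the PoolArena constructor and should be checked against the version in use. ChunkListSketch and afterFree are hypothetical names.

```java
// Simplified model: only names, minUsage and the prevList links are kept.
final class ChunkListSketch {
    final String name;
    final int minUsage;        // chunks whose usage falls below this leave the list
    ChunkListSketch prevList;  // neighbour holding less-used chunks; null for q000

    ChunkListSketch(String name, int minUsage) {
        this.name = name;
        this.minUsage = minUsage;
    }

    // Builds q000..q100 (index 0..4) with the assumed thresholds and links them.
    static ChunkListSketch[] build() {
        String[] names = {"q000", "q025", "q050", "q075", "q100"};
        int[] mins = {1, 25, 50, 75, 100};
        ChunkListSketch[] lists = new ChunkListSketch[names.length];
        for (int i = 0; i < lists.length; i++) {
            lists[i] = new ChunkListSketch(names[i], mins[i]);
            lists[i].prevList = i == 0 ? null : lists[i - 1];
        }
        return lists;
    }

    // Returns the name of the list that holds the chunk after a free that left it
    // at the given usage, or null when the chunk would be destroyed.
    String afterFree(int usage) {
        ChunkListSketch list = this;
        while (usage < list.minUsage) {
            if (list.prevList == null) {
                return null; // nothing below q000: the chunk is destroyed
            }
            list = list.prevList;
        }
        return list.name;
    }

    public static void main(String[] args) {
        ChunkListSketch[] lists = build();
        System.out.println(lists[2].afterFree(10)); // walks q050 -> q025 -> q000
        System.out.println(lists[0].afterFree(0));  // null: the chunk is destroyed
    }
}
```

The null result corresponds to free() returning false, which is what makes PoolArena.freeChunk() call destroyChunk().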

final class PoolChunk<T> implements PoolChunkMetric {
    final PoolArena<T> arena;
    private final PoolSubpage<T>[] subpages;
    ...
    //Free a subpage or a run of pages 
    //When a subpage is freed from PoolSubpage, it might be added back to subpage pool of the owning PoolArena
    //If the subpage pool in PoolArena has at least one other PoolSubpage of given elemSize, 
    //we can completely free the owning Page so it is available for subsequent allocations
    //@param handle handle to free
    void free(long handle) {
        int memoryMapIdx = memoryMapIdx(handle);
        int bitmapIdx = bitmapIdx(handle);

        if (bitmapIdx != 0) { // free a subpage
            PoolSubpage<T> subpage = subpages[subpageIdx(memoryMapIdx)];
            assert subpage != null && subpage.doNotDestroy;

            //Obtain the head of the PoolSubPage pool that is owned by the PoolArena and synchronize on it.
            //This is need as we may add it back and so alter the linked-list structure.
            PoolSubpage<T> head = arena.findSubpagePoolHead(subpage.elemSize);
            synchronized (head) {
                //2. The SubPage level is marked through the bitmap
                if (subpage.free(head, bitmapIdx & 0x3FFFFFFF)) {
                    return;
                }
            }
        }
        //1. The Page level is marked through the binary tree
        freeBytes += runLength(memoryMapIdx);
        setValue(memoryMapIdx, depth(memoryMapIdx));
        updateParentsFree(memoryMapIdx);
    }
    ...
}
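
PoolChunk.free() decides between the two paths by splitting the handle. The sketch below shows one way to read that split, assuming the layout used by Netty 4.1's PoolSubpage: memoryMapIdx in the low 32 bits, bitmapIdx in the high 32 bits, and bit 62 set as a marker for subpage handles. HandleSketch, toHandle and freeBit are illustrative names, and the bitmap is a bare long[] rather than a real PoolSubpage.

```java
// Sketch of handle packing and of clearing one bit in a subpage bitmap.
final class HandleSketch {
    // Pack a subpage handle; the marker bit keeps bitmapIdx(handle) non-zero even for element 0.
    static long toHandle(int memoryMapIdx, int bitmapIdx) {
        return 0x4000000000000000L | (long) bitmapIdx << 32 | memoryMapIdx;
    }

    static int memoryMapIdx(long handle) {
        return (int) handle;
    }

    static int bitmapIdx(long handle) {
        return (int) (handle >>> 32);
    }

    // Flip the bit of one element back to 0; each long covers 64 elements.
    static void freeBit(long[] bitmap, int bitmapIdx) {
        int q = bitmapIdx >>> 6; // which long
        int r = bitmapIdx & 63;  // which bit inside that long
        bitmap[q] ^= 1L << r;
    }

    public static void main(String[] args) {
        long subpageHandle = toHandle(2048, 5);
        long pageHandle = 2048L; // a run of pages carries only the tree index
        System.out.println(bitmapIdx(subpageHandle) != 0); // subpage path
        System.out.println(bitmapIdx(pageHandle) != 0);    // page path
        System.out.println(bitmapIdx(subpageHandle) & 0x3FFFFFFF);
    }
}
```

Masking with 0x3FFFFFFF strips the marker bit, which is why free() passes bitmapIdx & 0x3FFFFFFF to the subpage.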

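(4) Adding the ByteBuf object to the object pool

Initially the object pool holds no PooledByteBuf objects. When a PooledByteBuf is released, it is not destroyed immediately but added to the object pool.

This way, whenever Netty needs a PooledByteBuf, it first tries the object pool; once an object has been obtained, memory can be allocated to it and it can be initialized.

PooledByteBuf objects are requested and released very frequently. Under high QPS, a large number of them may be created, and constantly creating and discarding them consumes resources and hurts performance.

Netty therefore uses an object pool to reduce GC pressure: when a PooledByteBuf is requested, it is taken from the pool whenever possible; when a PooledByteBuf is released, it is returned to the pool so that the object can be reused.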

abstract class PooledByteBuf<T> extends AbstractReferenceCountedByteBuf {
    private final Recycler.Handle<PooledByteBuf<T>> recyclerHandle;
    ...
    private void recycle() {
        recyclerHandle.recycle(this);
    }
    ...
}
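
The reuse idea behind recycle() can be shown with a much simpler pool than Netty's Recycler. The sketch below is a hypothetical, single-threaded stand-in (SimplePoolSketch is not a Netty class, and Netty's Recycler works per thread with its own handle type); it only illustrates that a released object is handed out again instead of being re-created.

```java
import java.util.ArrayDeque;
import java.util.function.Supplier;

// Minimal, non-thread-safe object pool used purely for illustration.
final class SimplePoolSketch<T> {
    private final ArrayDeque<T> pool = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxSize;

    SimplePoolSketch(Supplier<T> factory, int maxSize) {
        this.factory = factory;
        this.maxSize = maxSize;
    }

    // Take a pooled object if one exists, otherwise create a new one.
    T get() {
        T obj = pool.pollFirst();
        return obj != null ? obj : factory.get();
    }

    // Return an object to the pool; refuse when the pool is already full.
    boolean recycle(T obj) {
        if (pool.size() >= maxSize) {
            return false;
        }
        pool.addFirst(obj);
        return true;
    }

    public static void main(String[] args) {
        SimplePoolSketch<StringBuilder> pool = new SimplePoolSketch<>(StringBuilder::new, 2);
        StringBuilder first = pool.get();   // pool is empty, so a new object is created
        pool.recycle(first);                // released objects go back into the pool
        StringBuilder second = pool.get();  // the same instance is handed out again
        System.out.println(first == second);
    }
}
```

A real pool also has to reset the object's state before reuse, which is why Netty obtains the PooledByteBuf first and only then allocates memory and initializes it.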