通过学习sync.Pool的实现,可以学习下到很多性能优化的方向和技巧
- 通过内存填充可以避免伪共享 形如
pad [128 - unsafe.Sizeof(poolLocalInternal{})%128]byte
- 双端队列Push和Pop分别从两头进行,最大概率的降低并发冲突概率
- 为了避免Pool中一直缓存的对象长时间引用,影响可用内存,在GC期间,这些临时对象会被清空,缓解内存压力。此外为了降低GC后对Get造成太大的性能突刺,引用victim的概念,一个临时对象在第二次GC的时候才会被清除
- Put操作会向双端队列的队头添加元素 。Get操作获取元素时,如果是从本地队列中获取,同样也是从队头获取元素 (因为同一个P只能被一个G占据,所以不存在并发问题,添加和获取都操作队头,可以很好地利用局部性原理 ),而如果是从其他P的队列获取元素,可能会存在并发,所以为了降低并发获取的概率,则是从队列尾部获取
- 底层元素的存储使用
eface
布局,这样的话可以保存任何元素
go
type eface struct {
typ, val unsafe.Pointer
}
Pool
A Pool is a set of temporary objects that may be individually saved and retrieved. Any item stored in the Pool may be removed automatically at any time without notification. If the Pool holds the only reference when this happens, the item might be deallocated. A Pool is safe for use by multiple goroutines simultaneously. Pool's purpose is to cache allocated but unused items for later reuse, relieving pressure on the garbage collector. That is, it makes it easy to build efficient, thread-safe free lists. However, it is not suitable for all free lists.
An appropriate use of a Pool is to manage a group of temporary items silently shared among and potentially reused by concurrent independent clients of a package. Pool provides a way to amortize allocation overhead across many clients.
go
type Pool struct {
noCopy noCopy
local unsafe.Pointer // local fixed-size per-P pool, actual type is [P]poolLocal
localSize uintptr // size of the local array
victim unsafe.Pointer // local from previous cycle
victimSize uintptr // size of victims array
// New optionally specifies a function to generate
// a value when Get would otherwise return nil.
// It may not be changed concurrently with calls to Get.
New func() any
}
type poolLocal struct {
poolLocalInternal
// Prevents false sharing on widespread platforms with
// 128 mod (cache line size) = 0 .
// 内存填充 避免伪共享问题
pad [ 128 - unsafe.Sizeof(poolLocalInternal{})% 128 ] byte
}
// Local per-P Pool appendix.
type poolLocalInternal struct {
private any // Can be used only by the respective P.
shared poolChain // Local P can pushHead/popHead; any P can popTail.
}
一言以蔽之,Pool
一般是用来存储一些临时对象,避免每次使用都需要重新申请内存 。该结构体中存储的临时对象不保证不会被垃圾回收 。Put
方法和Get
方法可以并发安全的使用
Pool
中关键字段为local
和localSize
。local类型为unsafe.Pointer
,其真正指向的是一块连续内存-[p]poolLocal
其中p=runtime.GOMAXPROCS(0)
。
- Put操作会往该P对应的
poolLocal
中存放临时对象,如果队列满,则会申请一个新的队列(容量是原来的两倍),通过链表的方式进行链接 - poolLocal是临时缓存对象真正的承载载体,private字段是
any
,这是快速路径: 基于该字段存放和提取都能够快速返回。shared字段是一个长度可变的双端队列(dynamically-sized version of poolDequeue
),能够动态扩容。实现上是通过pre和next指针串联多个deque - Get操作会优先从自己的队列中获取元素,如果获取失败,则尝试从
victim
中获取(因为GC的时候,会先清空victim,然后将 local赋值给victim,并且清空local和localSize字段。这样做的好处就是一个临时对象经历过两次GC才会被清空,降低因为GC造成的性能抖动)。如果上述获取还是没有拿到,则会尝试获取从其他P的poolLocal中获取。 - 值得一提的是,Put操作会向双端队列的队头添加元素。Get操作获取元素时,如果是从本地队列中获取,同样也是从队头获取元素(因为同一个P只能被一个G占据,所以不存在并发问题,添加和获取都操作队头,可以很好地利用局部性原理 ),而如果是从其他P的队列获取元素,可能会存在并发,所以为了降低并发获取的概率,则是从队列尾部获取。
poolCleanup 向runtime 注册的清理函数,GC时会被调用
go
// Implemented in runtime.
func runtime_registerPoolCleanup(cleanup func())
func init() {
runtime_registerPoolCleanup(poolCleanup)
}
var (
allPoolsMu Mutex
// allPools is the set of pools that have non-empty primary
// caches. Protected by either 1) allPoolsMu and pinning or 2)
// STW.
allPools []*Pool
// oldPools is the set of pools that may have non-empty victim
// caches. Protected by STW.
oldPools []*Pool
)
func poolCleanup() {
// This function is called with the world stopped, at the beginning of a garbage collection.
// It must not allocate and probably should not call any runtime functions.
// Because the world is stopped, no pool user can be in a
// pinned section (in effect, this has all Ps pinned).
// Drop victim caches from all pools.
for _, p := range oldPools {
p.victim = nil
p.victimSize = 0
}
// Move primary cache to victim cache.
for _, p := range allPools {
p.victim = p.local
p.victimSize = p.localSize
p.local = nil
p.localSize = 0
}
// The pools with non-empty primary caches now have non-empty
// victim caches and no pools have primary caches.
oldPools, allPools = allPools, nil
}
poolChain
Pool 底层的实现便是 可变容量的双端队列 poolChain
poolChain是通过链接多个固定容量的双端队列poolDequeue实现的
-
poolChain is a dynamically-sized version of poolDequeue
-
Once a dequeue fills up, this allocates a new one and only ever pushes to the latest dequeue.
go
// poolChain is a dynamically-sized version of poolDequeue.
//
// This is implemented as a doubly-linked list queue of poolDequeues
// where each dequeue is double the size of the previous one. Once a
// dequeue fills up, this allocates a new one and only ever pushes to
// the latest dequeue. Pops happen from the other end of the list and
// once a dequeue is exhausted, it gets removed from the list.
type poolChain struct {
// head is the poolDequeue to push to. This is only accessed
// by the producer, so doesn't need to be synchronized.
head *poolChainElt
// tail is the poolDequeue to popTail from. This is accessed
// by consumers, so reads and writes must be atomic.
tail *poolChainElt
}
type poolChainElt struct {
poolDequeue
// next and prev link to the adjacent poolChainElts in this
// poolChain.
//
// next is written atomically by the producer and read
// atomically by the consumer. It only transitions from nil to
// non-nil.
//
// prev is written atomically by the consumer and read
// atomically by the producer. It only transitions from
// non-nil to nil.
next, prev *poolChainElt
}
pushHead
go
func (c *poolChain) pushHead(val any) {
d := c.head
if d == nil {
// Initialize the chain.
const initSize = 8 // Must be a power of 2
d = new(poolChainElt)
d.vals = make([]eface, initSize)
c.head = d
storePoolChainElt(&c.tail, d)
}
if d.pushHead(val) {
return
}
// The current dequeue is full. Allocate a new one of twice
// the size.
newSize := len(d.vals) * 2
if newSize >= dequeueLimit {
// Can't make it any bigger.
newSize = dequeueLimit
}
d2 := &poolChainElt{prev: d}
d2.vals = make([]eface, newSize)
c.head = d2
// 原子操作
storePoolChainElt(&d.next, d2)
d2.pushHead(val)
}
popHead
go
func (c *poolChain) popHead() (any, bool) {
d := c.head
for d != nil {
if val, ok := d.popHead(); ok {
return val, ok
}
// There may still be unconsumed elements in the
// previous dequeue, so try backing up.
// 从相临的节点 尝试获取
d = loadPoolChainElt(&d.prev)
}
return nil, false
}
popTail
go
func (c *poolChain) popTail() (any, bool) {
d := loadPoolChainElt(&c.tail)
if d == nil {
return nil, false
}
for {
// It's important that we load the next pointer
// *before* popping the tail. In general, d may be
// transiently empty, but if next is non-nil before
// the pop and the pop fails, then d is permanently
// empty, which is the only condition under which it's
// safe to drop d from the chain.
d2 := loadPoolChainElt(&d.next)
if val, ok := d.popTail(); ok {
return val, ok
}
if d2 == nil {
// This is the only dequeue. It's empty right
// now, but could be pushed to in the future.
return nil, false
}
// 能够走到这里说明 d为空 但是 next队列可能不为空 尝试从下一个队列pop元素
// The tail of the chain has been drained, so move on
// to the next dequeue. Try to drop it from the chain
// so the next pop doesn't have to look at the empty
// dequeue again.
if atomic.CompareAndSwapPointer((*unsafe.Pointer)(unsafe.Pointer(&c.tail)), unsafe.Pointer(d), unsafe.Pointer(d2)) {
// We won the race. Clear the prev pointer so
// the garbage collector can collect the empty
// dequeue and so popHead doesn't back up
// further than necessary.
storePoolChainElt(&d2.prev, nil)
}
d = d2
}
}
poolDequeue
实现无锁 队列
队头push元素
,队尾pop元素
,从而最大程度地降低并发概率。内部主要有三个方法
- pushHead 往队头添加元素 不可并发
- popHead 取出队头元素 不可并发
- popTail 从队尾取出元素 可并发(通过CAS实现)
go
// poolDequeue is a lock-free fixed-size single-producer,
// multi-consumer queue. The single producer can both push and pop
// from the head, and consumers can pop from the tail.
//
// It has the added feature that it nils out unused slots to avoid
// unnecessary retention of objects. This is important for sync.Pool,
// but not typically a property considered in the literature.
type poolDequeue struct {
// headTail packs together a 32-bit head index and a 32-bit
// tail index. Both are indexes into vals modulo len(vals)-1.
//
// tail = index of oldest data in queue
// head = index of next slot to fill
//
// Slots in the range [tail, head) are owned by consumers.
// A consumer continues to own a slot outside this range until
// it nils the slot, at which point ownership passes to the
// producer.
// 高位 存放Head Head是自增 所以可能会溢出 但是是可接受的
// The head index is stored in the most-significant bits so
// that we can atomically add to it and the overflow is
// harmless.
headTail uint64
// vals is a ring buffer of interface{} values stored in this
// dequeue. The size of this must be a power of 2.
//
// vals[i].typ is nil if the slot is empty and non-nil
// otherwise. A slot is still in use until *both* the tail
// index has moved beyond it and typ has been set to nil. This
// is set to nil atomically by the consumer and read
// atomically by the producer.
vals []eface
}
type eface struct {
typ, val unsafe.Pointer
}
const dequeueBits = 32
// dequeueLimit is the maximum size of a poolDequeue.
//
// This must be at most (1<<dequeueBits)/2 because detecting fullness
// depends on wrapping around the ring buffer without wrapping around
// the index. We divide by 4 so this fits in an int on 32-bit.
const dequeueLimit = (1 << dequeueBits) / 4
// dequeueNil is used in poolDequeue to represent interface{}(nil).
// Since we use nil to represent empty slots, we need a sentinel value
// to represent nil.
// 充当哨兵 代替 nil
type dequeueNil *struct{}
func (d *poolDequeue) unpack(ptrs uint64) (head, tail uint32) {
const mask = 1<<dequeueBits - 1
head = uint32((ptrs >> dequeueBits) & mask)
tail = uint32(ptrs & mask)
return
}
func (d *poolDequeue) pack(head, tail uint32) uint64 {
const mask = 1<<dequeueBits - 1
return (uint64(head) << dequeueBits) |
uint64(tail&mask)
}
pushHead
- 不允许并发调用
- 当发现对应槽内
typ!=nil
说明这槽暂时不可用
go
// pushHead adds val at the head of the queue. It returns false if the
// queue is full. It must only be called by a single producer.
func (d *poolDequeue) pushHead(val any) bool {
ptrs := atomic.LoadUint64(&d.headTail)
head, tail := d.unpack(ptrs)
// 队列满判断标识
if (tail+uint32(len(d.vals)))&(1<<dequeueBits-1) == head {
// Queue is full.
return false
}
slot := &d.vals[head&uint32(len(d.vals)-1)]
// Check if the head slot has been released by popTail.
typ := atomic.LoadPointer(&slot.typ)
// 具体参看 popTail
if typ != nil {
// Another goroutine is still cleaning up the tail, so
// the queue is actually still full.
return false
}
// The head slot is free, so we own it.
if val == nil {
// 因为nil 对应的typ字段 与上述判断冲突 所以通过哨兵结构体代替nil
val = dequeueNil(nil)
}
*(*any)(unsafe.Pointer(slot)) = val
// Increment head. This passes ownership of slot to popTail
// and acts as a store barrier for writing the slot.
atomic.AddUint64(&d.headTail, 1<<dequeueBits)
return true
}
popHead
- 从队头取出元素
- 不允许并发调用
- 取出元素后 会将该槽位清空
*slot = eface{}
go
// popHead removes and returns the element at the head of the queue.
// It returns false if the queue is empty. It must only be called by a
// single producer.
func (d *poolDequeue) popHead() (any, bool) {
var slot *eface
for {
ptrs := atomic.LoadUint64(&d.headTail)
head, tail := d.unpack(ptrs)
if tail == head {
// Queue is empty.
return nil, false
}
// Confirm tail and decrement head. We do this before
// reading the value to take back ownership of this
// slot.
head--
ptrs2 := d.pack(head, tail)
// 为什么这里也需要使用CAS 因为可能会和 popTail 操作同一个槽位
if atomic.CompareAndSwapUint64(&d.headTail, ptrs, ptrs2) {
// We successfully took back slot.
slot = &d.vals[head&uint32(len(d.vals)-1)]
break
}
}
val := *(*any)(unsafe.Pointer(slot))
if val == dequeueNil(nil) {
val = nil
}
// Zero the slot. Unlike popTail, this isn't racing with
// pushHead, so we don't need to be careful here.
// 清空该槽位 对应的typ 和val都为 nil
*slot = eface{}
return val, true
}
popTail
- 从队尾取出元素
- 允许并发
- 通过原子的设置
slot.typ
来保障一个槽只能被一个协程占据,其他协程如果发现该槽位slot.type!=nil
说明这个槽位处于pop的中间状态
go
// popTail removes and returns the element at the tail of the queue.
// It returns false if the queue is empty. It may be called by any
// number of consumers.
func (d *poolDequeue) popTail() (any, bool) {
var slot *eface
for {
ptrs := atomic.LoadUint64(&d.headTail)
head, tail := d.unpack(ptrs)
if tail == head {
// Queue is empty.
return nil, false
}
// Confirm head and tail (for our speculative check
// above) and increment tail. If this succeeds, then
// we own the slot at tail.
ptrs2 := d.pack(head, tail+1)
// CAS 设置成功 说明可以取出对应的元素
if atomic.CompareAndSwapUint64(&d.headTail, ptrs, ptrs2) {
slot = &d.vals[tail&uint32(len(d.vals)-1)]
break
}
}
// We now own slot.
val := *(*any)(unsafe.Pointer(slot))
if val == dequeueNil(nil) {
val = nil
}
// Tell pushHead that we're done with this slot. Zeroing the
// slot is also important so we don't leave behind references
// that could keep this object live longer than necessary.
//
// We write to val first and then publish that we're done with
// this slot by atomically writing to typ.
slot.val = nil
// 只有成功清空 typ字段 这个槽位才可以被生产者写入数据
atomic.StorePointer(&slot.typ, nil)
// At this point pushHead owns the slot.
return val, true
}