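I: Background
1. The story
A few days ago a friend reached out to me on WeChat. His program's memory had ballooned, he had not been able to work out why, and he asked me to find out what was going on. I had him capture a dump so we could take a reading from it.
II: Analyzing the memory surge
1. Why did memory surge
To find out where the growth comes from, a quick bisection is enough: compare the process-wide picture from !address -summary with the managed heap from !eeheap -gc. The !eeheap -gc output is shown first: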
C#
0:000> !eeheap -gc
DATAS =
========================================
Number of GC Heaps: 1
----------------------------------------
generation 0 starts at 2e8fc4b9da8
generation 1 starts at 2e8fc4b99a0
generation 2 starts at 2e780001000
ephemeral segment allocation context: none
Small object heap
segment begin allocated committed allocated size committed size
02e780000000 02e780001000 02e78fffffd0 02e790000000 0xfffefd0 (268431312) 0x10000000 (268435456)
...
02e8b8150000 02e8b8151000 02e8c814ff58 02e8c8150000 0xfffef58 (268431192) 0x10000000 (268435456)
02e8e0150000 02e8e0151000 02e8f014ff90 02e8f0150000 0xfffef90 (268431248) 0x10000000 (268435456)
02ec45c40000 02ec45c41000 02ec55c3fe90 02ec55c40000 0xfffee90 (268430992) 0x10000000 (268435456)
02e8f0150000 02e8f0151000 02e8fc865dc0 02e8fce40000 0xc714dc0 (208752064) 0xccf0000 (214892544)
Large object heap starts at 2e790001000
segment begin allocated committed allocated size committed size
02e790000000 02e790001000 02e7960253a0 02e796046000 0x60243a0 (100811680) 0x6046000 (100950016)
02e7a29d0000 02e7a29d1000 02e7a47242e8 02e7a4745000 0x1d532e8 (30749416) 0x1d75000 (30887936)
02e8c8150000 02e8c8151000 02e8dcae0b50 02e8dcae1000 0x1498fb50 (345570128) 0x14991000 (345575424)
Pinned object heap starts at 2e798001000
segment begin allocated committed allocated size committed size
02e798000000 02e798001000 02e79806d3f0 02e79806e000 0x6c3f0 (443376) 0x6e000 (450560)
------------------------------
GC Allocated Heap Size: Size: 0x128e77b30 (4981226288) bytes.
GC Committed Heap Size: Size: 0x1294aa000 (4987723776) bytes.
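The bisection can be made concrete by comparing the GC committed size above with the process-wide MEM_COMMIT figure from the !address -summary output that appears later in this section. The following is a minimal sketch; the helper name `parse_windbg_hex` is made up for illustration, and the two hex values are copied from the debugger output.

```python
def parse_windbg_hex(text: str) -> int:
    """Parse a WinDbg hex number, with or without the 64-bit backtick separator."""
    return int(text.replace("`", ""), 16)


GIB = 1024 ** 3

# "GC Committed Heap Size" from !eeheap -gc
gc_committed = parse_windbg_hex("0x1294aa000")
# "MEM_COMMIT" total size from !address -summary
process_commit = parse_windbg_hex("1`a4926000")

# Fraction of all committed memory that belongs to the GC heap
managed_share = gc_committed / process_commit

print(f"GC committed  : {gc_committed / GIB:.3f} GiB")
print(f"Process commit: {process_commit / GIB:.3f} GiB")
print(f"Managed share : {managed_share:.1%}")
```

With these inputs the GC heap accounts for roughly 70% of the committed memory, which is why the managed side is the place to dig.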
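From the reading, this is clearly a managed-memory surge: the GC heap has committed about 4.65 GB of the 6.571 GB committed by the whole process (see MEM_COMMIT below). What next? Look at which objects sit on the managed heap and drill down with !dumpheap -stat. The block below first shows the !address -summary output mentioned earlier, followed by !dumpheap -stat: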
C#
0:000> !address -summary
--- Usage Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
Free 934 7dfd`28767000 ( 125.989 TB) 98.43%
<unknown> 2136 202`6b0c5000 ( 2.009 TB) 99.92% 1.57%
Heap 104 0`4fac0000 ( 1.245 GB) 0.06% 0.00%
Image 1343 0`15f8a000 ( 351.539 MB) 0.02% 0.00%
Stack 219 0`06b00000 ( 107.000 MB) 0.01% 0.00%
Other 16 0`001e7000 ( 1.902 MB) 0.00% 0.00%
TEB 73 0`00092000 ( 584.000 kB) 0.00% 0.00%
PEB 1 0`00001000 ( 4.000 kB) 0.00% 0.00%
--- State Summary ---------------- RgnCount ----------- Total Size -------- %ofBusy %ofTotal
MEM_FREE 934 7dfd`28767000 ( 125.989 TB) 98.43%
MEM_RESERVE 529 201`32f63000 ( 2.005 TB) 99.68% 1.57%
MEM_COMMIT 3363 1`a4926000 ( 6.571 GB) 0.32% 0.01%
0:000> !dumpheap -stat
Statistics:
MT Count TotalSize Class Name
...
7fff3d73b460 135,722 60,558,432 System.Windows.EffectiveValueEntry[]
7fff3d495b58 6,933,877 166,413,048 System.WeakReference
7fff3df42c28 6,737,027 323,377,296 MS.Internal.Data.DataBindEngine+Task
7fff3d46c878 1,719 347,902,656 System.Collections.Hashtable+bucket[]
7fff3df40c18 6,749,752 971,964,288 System.Windows.Data.BindingExpression
02e7e69b9b80 5,342,750 2,859,160,528 Free
Total 28,339,839 objects, 4,980,676,386 bytes
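This reading is interesting: Free takes the largest share (2,859,160,528 of 4,980,676,386 bytes), which means the managed heap is fragmented. Some readers may be curious what this fragmentation actually looks like. For that, JetBrains' well-known dotMemory does the job.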

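In the chart, Gen2 is crisscrossed by a large number of small gray segments. These are the Free blocks inside the heap, propping up phantom memory. At this point we know the memory surge is closely tied to Free.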
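The weight of Free in the !dumpheap -stat output can be quantified with a small parser. This is only a sketch: it assumes rows in the `MT Count TotalSize Class Name` shape shown above, the sample contains just a few rows copied from that output, and `parse_stat` is an illustrative name rather than part of any tool.

```python
import re

# A few rows copied from the !dumpheap -stat output
SAMPLE = """\
7fff3d495b58 6,933,877 166,413,048 System.WeakReference
7fff3df42c28 6,737,027 323,377,296 MS.Internal.Data.DataBindEngine+Task
02e7e69b9b80 5,342,750 2,859,160,528 Free
Total 28,339,839 objects, 4,980,676,386 bytes
"""

ROW = re.compile(r"^([0-9a-f]+)\s+([\d,]+)\s+([\d,]+)\s+(.+)$")
TOTAL = re.compile(r"^Total\s+([\d,]+)\s+objects,\s+([\d,]+)\s+bytes$")


def to_int(text: str) -> int:
    """Convert a number with thousands separators to int."""
    return int(text.replace(",", ""))


def parse_stat(text: str):
    """Return ({class name: (count, total size)}, total heap bytes)."""
    rows = {}
    total_bytes = None
    for line in text.splitlines():
        line = line.strip()
        total = TOTAL.match(line)
        if total:
            total_bytes = to_int(total.group(2))
            continue
        row = ROW.match(line)
        if row:
            rows[row.group(4)] = (to_int(row.group(2)), to_int(row.group(3)))
    return rows, total_bytes


rows, total_bytes = parse_stat(SAMPLE)
free_count, free_bytes = rows["Free"]

free_share = free_bytes / total_bytes       # fraction of the heap that is Free
avg_free_block = free_bytes / free_count    # average Free block size in bytes

print(f"Free share of heap    : {free_share:.1%}")
print(f"Average Free block    : {avg_free_block:.0f} bytes")
```

More than half of the heap is Free, spread over millions of blocks that average only a few hundred bytes each, which is the signature of fragmentation rather than of a few large holes.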
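2. Why are there so many Free blocks
To answer this, look at which objects sit before and after the Free blocks. Here is an arbitrary excerpt from one segment for reference: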
C#
0:000> !dumpheap 02e8e0151000 02e8f014ff90
Address MT Size
02e8e09da8f0 02e7e69b9b80 872 Free
02e8e09dac58 7fff3df40c18 144
02e8e09dace8 7fff3d495b58 24
02e8e09dad00 7fff3df42c28 48
02e8e09dad30 02e7e69b9b80 1,000 Free
02e8e09db118 7fff3df40c18 144
02e8e09db1a8 7fff3d495b58 24
02e8e09db1c0 7fff3df42c28 48
02e8e09db1f0 02e7e69b9b80 656 Free
02e8e09db480 7fff3df40c18 144
02e8e09db510 7fff3d495b58 24
02e8e09db528 7fff3df42c28 48
02e8e09db558 02e7e69b9b80 760 Free
02e8e09db850 7fff3df40c18 144
02e8e09db8e0 7fff3d495b58 24
02e8e09db8f8 7fff3df42c28 48
02e8e09db928 02e7e69b9b80 608 Free
02e8e09dbb88 7fff3df40c18 144
02e8e09dbc18 7fff3d495b58 24
02e8e09dbc30 7fff3df42c28 48
02e8e09dbc60 02e7e69b9b80 480 Free
02e8e09dbe40 7fff3df40c18 144
02e8e09dbed0 7fff3d495b58 24
02e8e09dbee8 7fff3df42c28 48
02e8e09dbf18 02e7e69b9b80 656 Free
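The pattern in this excerpt can be checked mechanically. The sketch below parses the rows exactly as listed, confirms that every object starts where the previous one ends, and totals the Free bytes against the live bytes. The function name `parse_rows` is illustrative only.

```python
# Rows copied from the !dumpheap excerpt: address, MT, size, optional "Free"
EXCERPT = """\
02e8e09da8f0 02e7e69b9b80 872 Free
02e8e09dac58 7fff3df40c18 144
02e8e09dace8 7fff3d495b58 24
02e8e09dad00 7fff3df42c28 48
02e8e09dad30 02e7e69b9b80 1,000 Free
02e8e09db118 7fff3df40c18 144
02e8e09db1a8 7fff3d495b58 24
02e8e09db1c0 7fff3df42c28 48
02e8e09db1f0 02e7e69b9b80 656 Free
02e8e09db480 7fff3df40c18 144
02e8e09db510 7fff3d495b58 24
02e8e09db528 7fff3df42c28 48
02e8e09db558 02e7e69b9b80 760 Free
02e8e09db850 7fff3df40c18 144
02e8e09db8e0 7fff3d495b58 24
02e8e09db8f8 7fff3df42c28 48
02e8e09db928 02e7e69b9b80 608 Free
02e8e09dbb88 7fff3df40c18 144
02e8e09dbc18 7fff3d495b58 24
02e8e09dbc30 7fff3df42c28 48
02e8e09dbc60 02e7e69b9b80 480 Free
02e8e09dbe40 7fff3df40c18 144
02e8e09dbed0 7fff3d495b58 24
02e8e09dbee8 7fff3df42c28 48
02e8e09dbf18 02e7e69b9b80 656 Free
"""


def parse_rows(text: str):
    """Return a list of (address, size, is_free) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        address = int(parts[0], 16)
        size = int(parts[2].replace(",", ""))
        is_free = len(parts) > 3 and parts[3] == "Free"
        rows.append((address, size, is_free))
    return rows


rows = parse_rows(EXCERPT)

# Each object should begin exactly where the previous one ends
contiguous = all(addr + size == nxt for (addr, size, _), (nxt, _, _) in zip(rows, rows[1:]))

free_bytes = sum(size for _, size, is_free in rows if is_free)
live_bytes = sum(size for _, size, is_free in rows if not is_free)

print(f"contiguous: {contiguous}")
print(f"free bytes: {free_bytes}, live bytes: {live_bytes}")
print(f"free share: {free_bytes / (free_bytes + live_bytes):.1%}")
```

In this excerpt each 216-byte group of live objects is separated by a Free block several times its size, so most of the range is Free even though no object is large.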
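Judging by the object distribution, the layout is quite regular: every Free block is followed by the same three objects of 144, 24 and 48 bytes, whose MT values match BindingExpression, WeakReference and DataBindEngine+Task in the !dumpheap -stat output. Let's start with the address 02e8e09dbb88 and inspect it with !gcroot.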
C#
0:000> !gcroot 02e8e09dbb88
Caching GC roots, this may take a while.
Subsequent runs of this command will be faster.
HandleTable:
000002e7e68d1340 (strong handle)
-> 02e780019858 System.Object[]
-> 02e780019588 System.Windows.Threading.Dispatcher
-> 02e7800196c0 System.Windows.Threading.PriorityQueue<System.Windows.Threading.DispatcherOperation>
-> 02e8fc49c608 System.Windows.Threading.PriorityItem<System.Windows.Threading.DispatcherOperation>
-> 02e8fc4a5c10 System.Windows.Threading.PriorityItem<System.Windows.Threading.DispatcherOperation>
-> 02e8fc4a3d00 System.Windows.Threading.PriorityItem<System.Windows.Threading.DispatcherOperation>
-> 02e8fc4a3bd8 System.Windows.Threading.DispatcherOperation
-> 02e8fc4a3b98 System.Action
-> 02e8fc492380 xxx.UiViewModelBase+<>c__DisplayClass375_0
-> 02e780ed3ea0 xxx.xxx.EFEMViewModel
-> 02e780ed4518 System.Collections.ObjectModel.ObservableCollection<System.String>
-> 02e781581de0 System.Collections.Specialized.NotifyCollectionChangedEventHandler
-> 02e781581c48 System.Windows.Data.ListCollectionView
-> 02e7804095d8 MS.Internal.Data.DataBindEngine
-> 02e780409a70 System.Collections.Specialized.HybridDictionary
-> 02e7869fa948 System.Collections.Hashtable
-> 02e8c8151020 System.Collections.Hashtable+bucket[]
-> 02e8e09dbb88 System.Windows.Data.BindingExpression
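Taking in the whole reading, we see a familiar face: Dispatcher, the scheduler behind the message loop. Next, take a look at its internal PriorityQueue collection. Screenshot below: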

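Wow, 8949 operations are backed up unprocessed, and the result is a badly fragmented Gen2. Honestly, this is the first time I have seen a backlog lead to this. In the past the worst it did was make the UI sluggish or freeze it outright. I learned something new.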
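3. Who is pushing so frantically
To find out, look at what each thread is doing and check through which code paths the suspicious threads call Invoke. Output and screenshot below: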
C#
0:000> ~*e !clrstack
OS Thread Id: 0x4060 (22)
Child SP IP Call Site
000000F80A67EBD8 00007fffb316e0f4 [HelperMethodFrame: 000000f80a67ebd8] System.Threading.WaitHandle.WaitOneCore(IntPtr, Int32)
000000F80A67ECE0 00007fff3e830687 System.Threading.WaitHandle.WaitOneNoCheck(Int32) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/WaitHandle.cs @ 139]
000000F80A67ED40 00007fff3e91e335 System.Windows.Threading.DispatcherOperation+DispatcherOperationEvent.WaitOne() [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/DispatcherOperation.cs @ 659]
000000F80A67EDB0 00007fff3e912dd3 System.Windows.Threading.DispatcherOperation.Wait(System.TimeSpan) [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/DispatcherOperation.cs @ 220]
000000F80A67EDF0 00007fff3e91ddb7 System.Windows.Threading.Dispatcher.InvokeImpl(System.Windows.Threading.DispatcherOperation, System.Threading.CancellationToken, System.TimeSpan) [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/Dispatcher.cs @ 1384]
000000F80A67EE80 00007fff3e91da7e System.Windows.Threading.Dispatcher.Invoke(System.Action, System.Windows.Threading.DispatcherPriority, System.Threading.CancellationToken, System.TimeSpan) [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/Dispatcher.cs @ 627]
000000F80A67EF00 00007fff3e91d7f5 System.Windows.Threading.Dispatcher.Invoke(System.Action) [/_/src/Microsoft.DotNet.Wpf/src/WindowsBase/System/Windows/Threading/Dispatcher.cs @ 509]
000000F80A67EF40 00007fff3ebe6acd xxx.UiViewModelBase.GetDataAndUpdate(System.Collections.Generic.IEnumerable`1<System.String>)
000000F80A67F0A0 00007fff3ebe54ff xxx.UiViewModelBase.Poll()
000000F80A67F310 00007fff3e1253a7 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/ExecutionContext.cs @ 183]
000000F80A67F380 00007fff3e906d7e System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef, System.Threading.Thread) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/Task.cs @ 2333]
000000F80A67F430 00007fff9c48b32a System.Threading.Tasks.ThreadPoolTaskScheduler+c.<.cctor>b__10_0(System.Object) [/_/src/libraries/System.Private.CoreLib/src/System/Threading/Tasks/ThreadPoolTaskScheduler.cs @ 35]
000000F80A67F460 00007fff9c461d41 System.Threading.Thread.StartCallback() [/_/src/coreclr/System.Private.CoreLib/src/System/Threading/Thread.CoreCLR.cs @ 105]
000000F80A67F6F0 00007fff9cd1a573 [DebuggerU2MCatchHandlerFrame: 000000f80a67f6f0]
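When `~*e !clrstack` prints many threads, scanning by eye gets tedious. The sketch below shows one way to post-process a saved copy of that output: it groups frames by OS thread and reports the frame that called `Dispatcher.Invoke`. The sample is an abridged copy of the thread shown above (file paths and some frames removed), and the function names are illustrative, not part of any tool.

```python
import re

# Abridged frames from the thread shown above
SAMPLE = """\
OS Thread Id: 0x4060 (22)
Child SP IP Call Site
000000F80A67EDF0 00007fff3e91ddb7 System.Windows.Threading.Dispatcher.InvokeImpl(System.Windows.Threading.DispatcherOperation, System.Threading.CancellationToken, System.TimeSpan)
000000F80A67EF00 00007fff3e91d7f5 System.Windows.Threading.Dispatcher.Invoke(System.Action)
000000F80A67EF40 00007fff3ebe6acd xxx.UiViewModelBase.GetDataAndUpdate(System.Collections.Generic.IEnumerable`1<System.String>)
000000F80A67F0A0 00007fff3ebe54ff xxx.UiViewModelBase.Poll()
"""

THREAD = re.compile(r"^OS Thread Id: (0x[0-9a-fA-F]+) \((\d+)\)")
FRAME = re.compile(r"^[0-9A-Fa-f]{16} [0-9A-Fa-f]{16} (.+)$")
MARKER = "System.Windows.Threading.Dispatcher.Invoke("


def split_threads(text: str) -> dict:
    """Group call-site strings by OS thread id."""
    threads = {}
    current = None
    for line in text.splitlines():
        header = THREAD.match(line)
        if header:
            current = header.group(1)
            threads[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
    return threads


def invoke_callers(threads: dict) -> dict:
    """Map thread id to the frame directly below the outermost Dispatcher.Invoke."""
    callers = {}
    for thread_id, frames in threads.items():
        hits = [i for i, f in enumerate(frames) if f.startswith(MARKER)]
        if hits and hits[-1] + 1 < len(frames):
            callers[thread_id] = frames[hits[-1] + 1]
    return callers


callers = invoke_callers(split_threads(SAMPLE))
for thread_id, caller in callers.items():
    print(thread_id, "->", caller)
```

For the thread above this points at `xxx.UiViewModelBase.GetDataAndUpdate`, reached from `Poll()`, as the code path that pushes work onto the Dispatcher.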

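The remaining homework is left to my friend: optimize the code logic and bring the PriorityQueue backlog down. In principle, since he never reported any freezes, the program probably runs unattended, so nobody sees what a sorry state the UI thread is in.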
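The article does not say which fix was eventually applied. One common way to keep such a queue short is to coalesce updates, so that at most one UI callback is pending and it always carries the newest data. The following is a language-neutral model of that idea in Python, not WPF code: `queue.Queue` stands in for the Dispatcher queue, and `CoalescingPoster` is a hypothetical name. In the real application the same idea would have to be expressed in C# inside the view model.

```python
import queue
import threading


class CoalescingPoster:
    """Keeps at most one pending UI callback, always carrying the newest value."""

    def __init__(self, ui_queue, apply):
        self._ui_queue = ui_queue    # stand-in for the dispatcher's queue
        self._apply = apply          # the actual UI update
        self._lock = threading.Lock()
        self._latest = None
        self._scheduled = False

    def post(self, value):
        """Called by the polling side; never queues more than one callback."""
        with self._lock:
            self._latest = value
            if self._scheduled:
                return               # the queued callback will pick up the new value
            self._scheduled = True
        self._ui_queue.put(self._drain)

    def _drain(self):
        """Runs on the UI side; applies only the most recent value."""
        with self._lock:
            value = self._latest
            self._scheduled = False
        self._apply(value)


ui_queue = queue.Queue()
applied = []
poster = CoalescingPoster(ui_queue, applied.append)

# The poller produces 8949 updates before the UI side gets to run
for i in range(8949):
    poster.post(i)
backlog = ui_queue.qsize()

# The UI side finally drains its queue
while not ui_queue.empty():
    ui_queue.get()()

print(f"backlog: {backlog}, applied: {applied}")
```

In this model the backlog stays at one entry no matter how fast the poller runs, at the cost of skipping intermediate values, which is usually acceptable when the UI only needs to show the latest state.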
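III: Summary
Analyzing this production incident added a small highlight to my dump-analysis journey, and it taught me something new along the way. Looking forward to the next interesting encounter.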
