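I: Background
1. Telling the story
A while ago a student from my training camp came to me and said their software had crashed at a customer site. They could not find the cause, were rather anxious, and asked me to take a look. Since my students get free dump analysis for life, I had to cast a reading for him.
II: Crash analysis
1. Why did it crash
The routine for analyzing a crash dump was worked out in the training camp long ago: start with !analyze -v to analyze the crash cause automatically. The simplified output is as follows: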
0:000> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
CONTEXT: (.ecxr)
eax=15c96638 ebx=010fecb0 ecx=00000000 edx=000109a8 esi=000109a8 edi=0000001c
eip=02f1d218 esp=010fec7c ebp=010feca8 iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
02f1d218 8b410c mov eax,dword ptr [ecx+0Ch] ds:002b:0000000c=????????
Resetting default scope
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 02f1d218
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: 0000000c
Attempt to read from address 0000000c
STACK_TEXT:
WARNING: Frame IP not in any known module. Following frames may be wrong.
010feca8 758d139b 000109a8 0000001c 00000000 0x2f1d218
010fecd4 758c836a 15c9664e 000109a8 0000001c user32!_InternalCallWinProc+0x2b
010fedb8 758c7f6a 15c9664e 00000000 0000001c user32!UserCallWinProcCheckWow+0x33a
010fee1c 758cbb2f 01aef180 00000000 0000001c user32!DispatchClientMessage+0xea
010fee58 77a64f5d 010fee74 00000020 010ff110 user32!__fnDWORD+0x3f
010feee0 758cbdca 010fefb8 00000000 00000000 ntdll!KiUserCallbackDispatcher+0x4d
010feee0 758cbd3e 00000000 00000000 00000000 user32!_PeekMessage+0x2a
010fef1c 6f8a707c 010fefb8 00000000 00000000 user32!PeekMessageW+0x16e
010fef68 6f85443a 00000000 00000000 00000000 System_Windows_Forms_ni+0x22707c
010feffc 6f8540d1 00000000 ffffffff 00000000 System_Windows_Forms_ni!System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop+0x1b6
010ff050 6f853f23 00000000 00000000 00000000 System_Windows_Forms_ni!System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner+0x175
010ff07c 6f82c83d 00000000 00000000 00000000 System_Windows_Forms_ni!System.Windows.Forms.Application.ThreadContext.RunMessageLoop+0x4f
010ff094 02fa0b04 00000000 00000000 00000000 System_Windows_Forms_ni!System.Windows.Forms.Application.Run+0x35
010ff0f8 7337f066 00000000 00000000 00000000 xxx!xxx.Program.Main+0x2bc
...
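Judging from DispatchClientMessage in the reading, a message taken from the message queue was being dispatched when an access violation occurred at 0x2f1d218. The next question is: which message was being processed?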
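To make the EXCEPTION_RECORD above easier to read, here is a minimal Python sketch that interprets the two parameters of an access violation. The helper name describe_access_violation and the 64 KB "near null" threshold are illustrative assumptions, not part of WinDbg or of the original analysis; the meaning of Parameter[0] (0 = read, 1 = write, 8 = execute) follows the Windows documentation for EXCEPTION_ACCESS_VIOLATION.

```python
# Sketch: interpret the parameters of an access-violation exception record.
ACCESS_VIOLATION = 0xC0000005

# Parameter[0] of an access violation tells what kind of access failed.
ACCESS_KINDS = {0: "read", 1: "write", 8: "execute"}


def describe_access_violation(code, params):
    """Return a short description of an access violation (hypothetical helper)."""
    if code != ACCESS_VIOLATION or len(params) < 2:
        return "not an access violation"
    kind = ACCESS_KINDS.get(params[0], "unknown access")
    address = params[1]
    # Assumed heuristic: an address in the first 64 KB usually means
    # "null pointer plus a small field offset".
    hint = " (null pointer plus a small offset)" if address < 0x10000 else ""
    return f"{kind} of address 0x{address:08x}{hint}"


# Values copied from the EXCEPTION_RECORD in the !analyze -v output.
print(describe_access_violation(0xC0000005, [0x00000000, 0x0000000C]))
```

For this dump the sketch reports a read of address 0x0000000c, which matches the "Attempt to read from address 0000000c" line and already hints that some pointer was null.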
2. Which message was actually being processed
To answer this, use !dso to look for the MSG structure on the call stack. The simplified output is as follows:
0:000> !dso
OS Thread Id: 0x20b0 (0)
ESP/REG Object Name
010FEF9C 175ea6ec System.Windows.Forms.NativeMethods+MSG[]
0:000> !mdt -e:2 175ea6ec
175ea6ec (System.Windows.Forms.NativeMethods+MSG[], Elements: 1, ElementMT=6f688e60)
[0] (System.Windows.Forms.NativeMethods+MSG) VALTYPE (MT=6f688e60, ADDR=175ea6f4)
hwnd:00140488 (System.IntPtr)
message:0x113 (System.Int32)
wParam:00000531 (System.IntPtr)
lParam:00000000 (System.IntPtr)
time:0xfbf4f32 (System.Int32)
pt_x:0x118 (System.Int32)
pt_y:0x42d (System.Int32)
Judging from message:0x113 in the reading, this is the classic WM_TIMER message, i.e. a timer event. In C# terms that is the form's Timer control; see the MSDN screenshot.
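The lookup from a raw message number to its name can be sketched in a few lines of Python. The table below holds only a handful of well-known identifiers from winuser.h, and the names WINDOW_MESSAGES and message_name are illustrative, not an existing API.

```python
# Sketch: map a raw Win32 message identifier to its symbolic name.
# Only a few well-known values from winuser.h are listed here.
WINDOW_MESSAGES = {
    0x000F: "WM_PAINT",
    0x0010: "WM_CLOSE",
    0x0100: "WM_KEYDOWN",
    0x0113: "WM_TIMER",
    0x0200: "WM_MOUSEMOVE",
}


def message_name(msg_id):
    """Return the symbolic name of a message id (hypothetical helper)."""
    return WINDOW_MESSAGES.get(msg_id, f"unknown (0x{msg_id:04x})")


# Fields copied from the MSG structure dumped by !mdt.
msg = {"hwnd": 0x00140488, "message": 0x113, "wParam": 0x531, "lParam": 0x0}
print(message_name(msg["message"]))
```

For WM_TIMER the wParam field carries the timer identifier, so the 0x531 in this MSG is the ID of the timer that fired.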

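The next focus is the assembly code at the crash site. Use the ub command to disassemble backwards from the faulting address; the output is as follows: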
0:000> .ecxr
eax=15c96638 ebx=010fecb0 ecx=00000000 edx=000109a8 esi=000109a8 edi=0000001c
eip=02f1d218 esp=010fec7c ebp=010feca8 iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
02f1d218 8b410c mov eax,dword ptr [ecx+0Ch] ds:002b:0000000c=????????
0:000> ub 02f1d218 La
02f1d200 50 push eax
02f1d201 107567 adc byte ptr [ebp+67h],dh
02f1d204 51 push ecx
02f1d205 83ec04 sub esp,4
02f1d208 ff7304 push dword ptr [ebx+4]
02f1d20b ff7308 push dword ptr [ebx+8]
02f1d20e ff730c push dword ptr [ebx+0Ch]
02f1d211 8b13 mov edx,dword ptr [ebx]
02f1d213 8b4808 mov ecx,dword ptr [eax+8]
02f1d216 8b09 mov ecx,dword ptr [ecx]
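Since no function name is shown at 02f1d218, experience suggests this is a small function generated dynamically by the JIT, probably with 02f1d204 as its entry point. The program crashed because ecx was 0, so the read of [ecx+0Ch] touched address 0000000c. Next, trace ecx back to its source to see whether anything new turns up. The output is as follows: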
0:000> dp 15c96638+0x8 L1
15c96640 015a8658
0:000> dp 015a8658 L1
015a8658 00000000
0:000> !do 015a8658
<Note: this object has an invalid CLASS field>
Invalid object
0:000> !dumpmd 015a8658
015a8658 is not a MethodDesc
0:000> !dumpmt 015a8658
015a8658 is not a MethodTable
The reading reveals nothing: 015a8658 is neither an object, nor a MethodTable, nor a MethodDesc. That plunged me straight into the abyss...
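The pointer chain traced above can be replayed in a toy model to see exactly where it breaks. This Python sketch uses only the two memory words shown by the dp commands and treats every other address as unmapped; the names MEMORY, read_dword and replay are illustrative assumptions.

```python
# Sketch: replay the last three instructions before the crash on a toy memory.
class AccessViolation(Exception):
    pass


# Memory words copied from the dp output; anything else counts as unmapped.
MEMORY = {
    0x15C96640: 0x015A8658,  # dp 15c96638+0x8 L1
    0x015A8658: 0x00000000,  # dp 015a8658 L1
}


def read_dword(address):
    """Read one 32-bit word, raising if the address is not in the toy memory."""
    if address not in MEMORY:
        raise AccessViolation(f"read of address 0x{address:08x}")
    return MEMORY[address]


def replay(eax):
    ecx = read_dword(eax + 0x8)   # mov ecx,dword ptr [eax+8]
    ecx = read_dword(ecx)         # mov ecx,dword ptr [ecx]   -> ecx becomes 0
    return read_dword(ecx + 0xC)  # mov eax,dword ptr [ecx+0Ch]  (faulting read)


try:
    replay(0x15C96638)  # eax value from the .ecxr context
except AccessViolation as exc:
    print(exc)
```

The replay fails on the third read with address 0x0000000c, the same address reported in the exception record, which confirms that the zero stored at 015a8658 is what gets dereferenced.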
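3. Looking for hope in despair
With no good idea for the moment, I went to the doorway to think it over with a cigarette. message:0x113 is a Win32 timer, so the timer callback most likely crashed unexpectedly inside the JIT-generated function. In principle, the corresponding C# Timer should be findable in memory near the crash site. With that idea in mind I searched the memory around 015a8658, and really did find it: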
0:000> dp 015a8658 L4
015a8658 00000000 2d61d1a8 2d4ef48c 00000000
0:000> !do 2d61d1a8
Name: System.Windows.Forms.NativeMethods+WndProc
MethodTable: 6f687200
EEClass: 6f681458
Size: 32(0x20) bytes
File: C:\windows\Microsoft.Net\assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll
Fields:
MT Field Offset Type VT Attr Value Name
71ec2734 40002f3 4 System.Object 0 instance 2d61d164 _target
71ec2734 40002f4 8 System.Object 0 instance 00000000 _methodBase
71ec7b18 40002f5 c System.IntPtr 1 instance 5b73c34 _methodPtr
71ec7b18 40002f6 10 System.IntPtr 1 instance 0 _methodPtrAux
71ec2734 4000300 14 System.Object 0 instance 00000000 _invocationList
71ec7b18 4000301 18 System.IntPtr 1 instance 0 _invocationCount
0:000> !do 2d61d164
Name: System.Windows.Forms.Timer+TimerNativeWindow
MethodTable: 6f6995e4
EEClass: 6f6ede04
Size: 56(0x38) bytes
File: C:\windows\Microsoft.Net\assembly\GAC_MSIL\System.Windows.Forms\v4.0_4.0.0.0__b77a5c561934e089\System.Windows.Forms.dll
Fields:
MT Field Offset Type VT Attr Value Name
71ec2734 40005ba 4 System.Object 0 instance 00000000 __identity
71ec7b18 4001cf9 18 System.IntPtr 1 instance 0 handle
6f687200 4001cfa 8 ...veMethods+WndProc 0 instance 2d61d1a8 windowProc
71ec7b18 4001cfb 1c System.IntPtr 1 instance 15ca1bee windowProcPtr
71ec7b18 4001cfc 20 System.IntPtr 1 instance 77a77f70 defWindowProc
71ec878c 4001cfd 28 System.Boolean 1 instance 1 suppressedGC
71ec878c 4001cfe 29 System.Boolean 1 instance 0 ownHandle
6f685da8 4001cff c ...orms.NativeWindow 0 instance 00000000 previousWindow
6f685da8 4001d00 10 ...orms.NativeWindow 0 instance 00000000 nextWindow
71ec6018 4001d01 14 System.WeakReference 0 instance 2d61d19c weakThisPtr
70229854 4001d02 24 System.Int32 1 instance 0 windowDpiAwarenessContext
713fe7cc 4001ce3 b88 ...stics.TraceSwitch 0 static 00000000 WndProcChoice
71ec426c 4001ce4 b8c System.Int32[] 0 static 03111988 primes
71ec878c 4001ceb 1312 System.Boolean 1 static 1 anyHandleCreatedInApp
71ec42a8 4001ced 1304 System.Int32 1 static 1786 handleCount
71ec42a8 4001cee 1308 System.Int32 1 static 2915 hashLoadSize
6f685e9c 4001cef b90 ...ow+HandleBucket[] 0 static 2c7b5f14 hashBuckets
71ec7b18 4001cf0 130c System.IntPtr 1 static 77a77f70 userDefWindowProc
71ec3a08 4001cf3 1313 System.Byte 1 static 0 userSetProcFlagsForApp
71ec882c 4001cf4 1310 System.Int16 1 static 1 globalID
71f1c594 4001cf5 b94 ...ntPtr, mscorlib]] 0 static 03111bc8 hashForIdHandle
71f1c6d0 4001cf6 b98 ...Int16, mscorlib]] 0 static 03111c3c hashForHandleId
71ec2734 4001cf7 b9c System.Object 0 static 03111b90 internalSyncObject
71ec2734 4001cf8 ba0 System.Object 0 static 03111b9c createWindowSyncObject
71ec878c 4001cea 979 System.Boolean 1 TLstatic anyHandleCreated
>> Thread:Value 20b0:1 <<
71ec3a08 4001cf1 97a System.Byte 1 TLstatic wndProcFlags
>> Thread:Value 20b0:1 <<
71ec3a08 4001cf2 97b System.Byte 1 TLstatic userSetProcFlags
>> Thread:Value 20b0:1 <<
6f69ad98 400415e 2c ...ndows.Forms.Timer 0 instance 14462858 _owner
71ec42a8 400415f 30 System.Int32 1 instance 0 _timerID
71ec878c 4004161 2a System.Boolean 1 instance 0 _stoppingTimer
71ec42a8 4004160 190c System.Int32 1 static 2462 TimerID
Fate works in mysterious ways... After the burst of joy, the next step is to find out where this Timer comes from; !gcroot 2d61d164 does the job.
0:000> !gcroot 2d61d164
Thread 20b0:
010fef80 6f85443a System.Windows.Forms.Application+ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr, Int32, Int32)
ebx: (interior)
-> 040e5568 System.Object[]
-> 031ced2c System.Windows.Forms.FormCollection
-> 031ced44 System.Collections.ArrayList
-> 1df9aaec System.Object[]
...
-> 144625fc DevComponents.DotNetBar.Controls.ComboBoxEx
-> 14462858 System.Windows.Forms.Timer
-> 2d61d164 System.Windows.Forms.Timer+TimerNativeWindow
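From the reference chain in the reading, the Timer hangs under the DevComponents.DotNetBar.Controls.ComboBoxEx control. I went straight to the source code of that control; screenshot below: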

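Damn, the code turned out to be encrypted, which left me speechless. Since this code lives in the DevComponents component, I checked the component's version and found it dates back to 2002, "the first snow of 2002", 23 years ago. It would be strange if it had no bugs... Screenshot below: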

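My final advice to my friend was to upgrade DevComponents or find a replacement.
III: Summary
Some say bug analysis is a kind of forensic science: you keep looking for hope in despair. As the old poem goes, sifting a thousand times is hard work, but only when the sand is blown away does the gold appear!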