Golang Program Startup
Environment

System Architecture | Go Version
---|---
Amd64 | Go 1.18
The main Function

The main function is the program's entry point; it initializes the program and runs its core logic. In Go, main is special: when the program starts, the runtime executes it as a goroutine (the main goroutine) through the scheduler. That goroutine is scheduled on a specific GMP trio, normally made up of g0, p0 and m0. If we treat main as the dividing line of program execution, these structures are not created by the runtime after main starts: this special scheduling trio is initialized by assembly code before the Go program is ready to schedule the main goroutine, prepared specifically for running main. Any other goroutines started from main are scheduled on the GMP structures managed by the runtime.
Why main Runs as a Goroutine

By running main as a goroutine, Go executes it with the same machinery it uses for all concurrency and asynchronous work. Functions launched from main run as goroutines on different GMP units and execute concurrently, which is exactly what enables concurrent programming.
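As a minimal illustration (ordinary user code, nothing runtime-internal assumed): goroutines started inside main are scheduled on the regular GMP machinery while the main goroutine itself waits.

go
package main

import (
	"fmt"
	"sync"
)

func main() { // runs as the main goroutine, started by runtime.main
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) { // each go statement creates a new g via runtime.newproc
			defer wg.Done()
			fmt.Println("worker", id)
		}(i)
	}
	wg.Wait() // main blocks here while the workers run on the Ps
}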
Assembly Code

Stage 1: Initialize g0 and m0

This stage initializes g0 and m0 and makes them reference each other.

asm
TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
	// copy arguments forward on an even stack
	MOVQ	DI, AX		// argc
	MOVQ	SI, BX		// argv
	SUBQ	$(5*8), SP	// 3args 2auto
	ANDQ	$~15, SP
	MOVQ	AX, 24(SP)
	MOVQ	BX, 32(SP)

	// Initialize g0 here and carve out its stack space.
	// create istack out of the given (operating system) stack.
	// _cgo_init may update stackguard.
	MOVQ	$runtime·g0(SB), DI
	LEAQ	(-64*1024+104)(SP), BX
	MOVQ	BX, g_stackguard0(DI)
	MOVQ	BX, g_stackguard1(DI)
	MOVQ	BX, (g_stack+stack_lo)(DI)
	MOVQ	SP, (g_stack+stack_hi)(DI)

	// ... CPU feature detection code omitted ...

	// Initialize m0's TLS: DI = &m0.tls (address of m0's tls array)
	LEAQ	runtime·m0+m_tls(SB), DI
	// settls installs thread-local storage; afterwards m0.tls is
	// reachable through the FS segment register.
	CALL	runtime·settls(SB)
	// store through it, to make sure it works
	// get_tls is compiler-generated: load the FS base (the address of
	// m0.tls[1]) into BX.
	get_tls(BX)
	// g(BX) is compiler-generated as well: it addresses m0.tls[0] (FS-8).
	MOVQ	$0x123, g(BX)
	// AX = m0.tls[0]; check that thread-local storage actually works.
	MOVQ	runtime·m0+m_tls(SB), AX
	CMPQ	AX, $0x123
	// If the values match, skip the next instruction (the abort).
	JEQ	2(PC)
	CALL	runtime·abort(SB)
ok:
	// set the per-goroutine and per-mach "registers"
	// BX = FS base, i.e. the address of m0.tls[1]
	get_tls(BX)
	// CX = &g0
	LEAQ	runtime·g0(SB), CX
	// m0.tls[0] = &g0
	MOVQ	CX, g(BX)
	// AX = &m0
	LEAQ	runtime·m0(SB), AX

	// Here g0 and m0 are made to reference each other.
	// save m->g0 = g0: m0.g0 = &g0
	MOVQ	CX, m_g0(AX)
	// save m0 to g0->m: g0.m = &m0
	MOVQ	AX, g_m(CX)
	// chain: FS -> tls[1] -> g() -> tls[0] -> g0; then g0.m = &m0 and m0.g0 = &g0
	CLD	// convention is D is always left cleared

	// ... (further setup omitted) ...
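In Go terms, the net effect of this stage looks roughly like the sketch below. The struct fields mirror runtime.g and runtime.m, but the types and the function are illustrative stand-ins, not the real runtime definitions.

go
package main

import "unsafe"

// Illustrative stand-ins for runtime.g and runtime.m (simplified).
type stack struct{ lo, hi uintptr }

type g struct {
	stack stack
	m     *m
}

type m struct {
	g0  *g
	tls [6]uintptr // tls[0] holds the current g, published via settls
}

var (
	g0 g
	m0 m
)

// rt0goSketch restates what the stage-one assembly does.
func rt0goSketch(sp uintptr) {
	g0.stack.hi = sp
	g0.stack.lo = sp - 64*1024 + 104 // mirrors LEAQ (-64*1024+104)(SP), BX
	m0.tls[0] = uintptr(unsafe.Pointer(&g0)) // m0.tls[0] = &g0
	m0.g0 = &g0                              // m->g0 = g0
	g0.m = &m0                               // g0->m = m0
}

func main() {
	rt0goSketch(0x7fff_0000) // a made-up stack pointer, for illustration only
}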
Stage 2: Build the Environment

This stage initializes the runtime environment: program startup arguments, the global variable ncpu (the number of CPU cores), the Ps, the heap allocator, the stack allocator, and the garbage collector.

asm
	CALL	runtime·check(SB)

	MOVL	24(SP), AX	// copy argc
	MOVL	AX, 0(SP)
	MOVQ	32(SP), AX	// copy argv
	MOVQ	AX, 8(SP)
	// process the startup arguments
	CALL	runtime·args(SB)
	// osinit: determine the number of CPU cores (sets the global ncpu)
	CALL	runtime·osinit(SB)
	// schedinit: initialize the scheduler
	CALL	runtime·schedinit(SB)
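The ncpu value that osinit records is the same one user code later observes through runtime.NumCPU, for example:

go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// runtime.NumCPU reports the core count the runtime detected at startup.
	fmt.Println("CPU cores seen by the runtime:", runtime.NumCPU())
}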
runtime·schedinit
go
// The bootstrap sequence is:
//
// call osinit
// call schedinit
// make & queue new G
// call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
lockInit(&sched.lock, lockRankSched)
lockInit(&sched.sysmonlock, lockRankSysmon)
lockInit(&sched.deferlock, lockRankDefer)
lockInit(&sched.sudoglock, lockRankSudog)
lockInit(&deadlock, lockRankDeadlock)
lockInit(&paniclk, lockRankPanic)
lockInit(&allglock, lockRankAllg)
lockInit(&allpLock, lockRankAllp)
lockInit(&reflectOffs.lock, lockRankReflectOffs)
lockInit(&finlock, lockRankFin)
lockInit(&trace.bufLock, lockRankTraceBuf)
lockInit(&trace.stringsLock, lockRankTraceStrings)
lockInit(&trace.lock, lockRankTrace)
lockInit(&cpuprof.lock, lockRankCpuprof)
lockInit(&trace.stackTab.lock, lockRankTraceStackTab)
// Enforce that this lock is always a leaf lock.
// All of this lock's critical sections should be
// extremely short.
lockInit(&memstats.heapStats.noPLock, lockRankLeafRank)
// raceinit must be the first call to race detector.
// In particular, it must be done before mallocinit below calls racemapshadow.
_g_ := getg()
if raceenabled {
_g_.racectx, raceprocctx0 = raceinit()
}
sched.maxmcount = 10000
// The world starts stopped.
worldStopped()
moduledataverify()
stackinit()
mallocinit()
cpuinit() // must run before alginit
alginit() // maps, hash, fastrand must not be used before this call
fastrandinit() // must run before mcommoninit
mcommoninit(_g_.m, -1)
modulesinit() // provides activeModules
typelinksinit() // uses maps, activeModules
itabsinit() // uses activeModules
stkobjinit() // must run before GC starts
sigsave(&_g_.m.sigmask)
initSigmask = _g_.m.sigmask
if offset := unsafe.Offsetof(sched.timeToRun); offset%8 != 0 {
println(offset)
throw("sched.timeToRun not aligned to 8 bytes")
}
goargs()
goenvs()
parsedebugvars()
gcinit()
lock(&sched.lock)
sched.lastpoll = uint64(nanotime())
procs := ncpu
if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
procs = n
}
if procresize(procs) != nil {
throw("unknown runnable goroutine during bootstrap")
}
unlock(&sched.lock)
// World is effectively started now, as P's can run.
worldStarted()
// For cgocheck > 1, we turn on the write barrier at all times
// and check all pointer writes. We can't do this until after
// procresize because the write barrier needs a P.
if debug.cgocheck > 1 {
writeBarrier.cgo = true
writeBarrier.enabled = true
for _, p := range allp {
p.wbBuf.reset()
}
}
if buildVersion == "" {
// Condition should never trigger. This code just serves
// to ensure runtime·buildVersion is kept in the resulting binary.
buildVersion = "unknown"
}
if len(modinfo) == 1 {
// Condition should never trigger. This code just serves
// to ensure runtime·modinfo is kept in the resulting binary.
modinfo = ""
}
}
// --------------------------------- initializing the Ps: procresize ---------------------------------
// Change number of processors.
//
// sched.lock must be held, and the world must be stopped.
//
// gcworkbufs must not be being modified by either the GC or the write barrier
// code, so the GC must not be running if the number of Ps actually changes.
//
// Returns list of Ps with local work, they need to be scheduled by the caller.
func procresize(nprocs int32) *p {
assertLockHeld(&sched.lock)
assertWorldStopped()
old := gomaxprocs
if old < 0 || nprocs <= 0 {
throw("procresize: invalid arg")
}
if trace.enabled {
traceGomaxprocs(nprocs)
}
// update statistics
now := nanotime()
if sched.procresizetime != 0 {
sched.totaltime += int64(old) * (now - sched.procresizetime)
}
sched.procresizetime = now
maskWords := (nprocs + 31) / 32
// Grow allp if necessary.
if nprocs > int32(len(allp)) {
// Synchronize with retake, which could be running
// concurrently since it doesn't run on a P.
lock(&allpLock)
if nprocs <= int32(cap(allp)) {
allp = allp[:nprocs]
} else {
nallp := make([]*p, nprocs)
// Copy everything up to allp's cap so we
// never lose old allocated Ps.
copy(nallp, allp[:cap(allp)])
allp = nallp
}
if maskWords <= int32(cap(idlepMask)) {
idlepMask = idlepMask[:maskWords]
timerpMask = timerpMask[:maskWords]
} else {
nidlepMask := make([]uint32, maskWords)
// No need to copy beyond len, old Ps are irrelevant.
copy(nidlepMask, idlepMask)
idlepMask = nidlepMask
ntimerpMask := make([]uint32, maskWords)
copy(ntimerpMask, timerpMask)
timerpMask = ntimerpMask
}
unlock(&allpLock)
}
// initialize new P's
for i := old; i < nprocs; i++ {
pp := allp[i]
if pp == nil {
pp = new(p)
}
pp.init(i)
atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
}
_g_ := getg()
if _g_.m.p != 0 && _g_.m.p.ptr().id < nprocs {
// continue to use the current P
_g_.m.p.ptr().status = _Prunning
_g_.m.p.ptr().mcache.prepareForSweep()
} else {
// release the current P and acquire allp[0].
//
// We must do this before destroying our current P
// because p.destroy itself has write barriers, so we
// need to do that from a valid P.
if _g_.m.p != 0 {
if trace.enabled {
// Pretend that we were descheduled
// and then scheduled again to keep
// the trace sane.
traceGoSched()
traceProcStop(_g_.m.p.ptr())
}
_g_.m.p.ptr().m = 0
}
_g_.m.p = 0
p := allp[0]
p.m = 0
p.status = _Pidle
acquirep(p)
if trace.enabled {
traceGoStart()
}
}
// g.m.p is now set, so we no longer need mcache0 for bootstrapping.
mcache0 = nil
// release resources from unused P's
for i := nprocs; i < old; i++ {
p := allp[i]
p.destroy()
// can't free P itself because it can be referenced by an M in syscall
}
// Trim allp.
if int32(len(allp)) != nprocs {
lock(&allpLock)
allp = allp[:nprocs]
idlepMask = idlepMask[:maskWords]
timerpMask = timerpMask[:maskWords]
unlock(&allpLock)
}
var runnablePs *p
for i := nprocs - 1; i >= 0; i-- {
p := allp[i]
if _g_.m.p.ptr() == p {
continue
}
p.status = _Pidle
if runqempty(p) {
pidleput(p)
} else {
p.m.set(mget())
p.link.set(runnablePs)
runnablePs = p
}
}
stealOrder.reset(uint32(nprocs))
var int32p *int32 = &gomaxprocs // make compiler check that gomaxprocs is an int32
atomic.Store((*uint32)(unsafe.Pointer(int32p)), uint32(nprocs))
return runnablePs
}
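schedinit defaults the P count to ncpu and lets the GOMAXPROCS environment variable override it; calling runtime.GOMAXPROCS later stops the world and runs procresize again with the new count. A small demonstration:

go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) only queries the current P count chosen by schedinit.
	fmt.Println("Ps after schedinit:", runtime.GOMAXPROCS(0))

	// GOMAXPROCS(n) with n > 0 re-runs procresize with the new count
	// and returns the previous setting.
	prev := runtime.GOMAXPROCS(2)
	fmt.Println("previous:", prev, "now:", runtime.GOMAXPROCS(0))
}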
Stage 3: Start main

Create a g for runtime.main, put it on a run queue, and start scheduling.

asm
	// create a new goroutine to start program
	MOVQ	$runtime·mainPC(SB), AX	// entry: mainPC is the address of runtime.main
	PUSHQ	AX
	CALL	runtime·newproc(SB)
	POPQ	AX

	// start this M
	CALL	runtime·mstart(SB)

	CALL	runtime·abort(SB)	// mstart should never return
	RET
runtime·newproc
go
// newproc: the runtime entry point for creating a goroutine.
// Create a new g running fn.
// Put it on the queue of g's waiting to run.
// The compiler turns a go statement into a call to this.
func newproc(fn *funcval) {
gp := getg()
pc := getcallerpc()
systemstack(func() {
		// newproc1 returns a freshly initialized newg
newg := newproc1(fn, gp, pc)
		// fetch the current P and put newg on its local run queue
_p_ := getg().m.p.ptr()
runqput(_p_, newg, true)
if mainStarted {
wakep()
}
})
}
// Create a new g in state _Grunnable, starting at fn. callerpc is the
// address of the go statement that created this. The caller is responsible
// for adding the new g to the scheduler.
func newproc1(fn *funcval, callergp *g, callerpc uintptr) *g {
_g_ := getg()
if fn == nil {
_g_.m.throwing = -1 // do not dump full stacks
throw("go of nil func value")
}
acquirem() // disable preemption because it can be holding p in a local var
_p_ := _g_.m.p.ptr()
newg := gfget(_p_)
	// Try to reuse a dead g from the P's free list; if there is none,
	// allocate a new g with the minimum 2KB stack (_StackMin).
if newg == nil {
newg = malg(_StackMin)
casgstatus(newg, _Gidle, _Gdead)
allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
}
if newg.stack.hi == 0 {
throw("newproc1: newg missing stack")
}
if readgstatus(newg) != _Gdead {
throw("newproc1: new g is not Gdead")
}
totalSize := uintptr(4*goarch.PtrSize + sys.MinFrameSize) // extra space in case of reads slightly beyond frame
totalSize = alignUp(totalSize, sys.StackAlign)
sp := newg.stack.hi - totalSize
spArg := sp
if usesLR {
// caller's LR
*(*uintptr)(unsafe.Pointer(sp)) = 0
prepGoExitFrame(sp)
spArg += sys.MinFrameSize
}
memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
newg.sched.sp = sp
newg.stktopsp = sp
// +PCQuantum so that previous instruction is in same function
newg.sched.pc = abi.FuncPCABI0(goexit) + sys.PCQuantum
newg.sched.g = guintptr(unsafe.Pointer(newg))
gostartcallfn(&newg.sched, fn)
newg.gopc = callerpc
newg.ancestors = saveAncestors(callergp)
newg.startpc = fn.fn
if isSystemGoroutine(newg, false) {
atomic.Xadd(&sched.ngsys, +1)
} else {
// Only user goroutines inherit pprof labels.
if _g_.m.curg != nil {
newg.labels = _g_.m.curg.labels
}
}
// Track initial transition?
newg.trackingSeq = uint8(fastrand())
if newg.trackingSeq%gTrackingPeriod == 0 {
newg.tracking = true
}
casgstatus(newg, _Gdead, _Grunnable)
gcController.addScannableStack(_p_, int64(newg.stack.hi-newg.stack.lo))
if _p_.goidcache == _p_.goidcacheend {
// Sched.goidgen is the last allocated id,
// this batch must be [sched.goidgen+1, sched.goidgen+GoidCacheBatch].
// At startup sched.goidgen=0, so main goroutine receives goid=1.
_p_.goidcache = atomic.Xadd64(&sched.goidgen, _GoidCacheBatch)
_p_.goidcache -= _GoidCacheBatch - 1
_p_.goidcacheend = _p_.goidcache + _GoidCacheBatch
}
newg.goid = int64(_p_.goidcache)
_p_.goidcache++
if raceenabled {
newg.racectx = racegostart(callerpc)
}
if trace.enabled {
traceGoCreate(newg, newg.startpc)
}
releasem(_g_.m)
return newg
}
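To connect this back to everyday code: every go statement is compiled into a call to newproc, which builds a _Grunnable g as shown above and puts it on the current P's local run queue.

go
package main

import "fmt"

func main() {
	done := make(chan struct{})
	go func() { // lowered by the compiler to: CALL runtime·newproc(SB)
		fmt.Println("hello from a g created by newproc")
		close(done)
	}()
	<-done // wait so main does not exit before the new g runs
}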
runtime·mstart
go
// mstart is the entry-point for new Ms.
// It is written in assembly, uses ABI0, is marked TOPFRAME, and calls mstart0.
func mstart()
// mstart0 is the Go entry-point for new Ms.
// This must not split the stack because we may not even have stack
// bounds set up yet.
//
// May run during STW (because it doesn't have a P yet), so write
// barriers are not allowed.
//
//go:nosplit
//go:nowritebarrierrec
func mstart0() {
_g_ := getg()
osStack := _g_.stack.lo == 0
if osStack {
// Initialize stack bounds from system stack.
// Cgo may have left stack size in stack.hi.
// minit may update the stack bounds.
//
// Note: these bounds may not be very accurate.
// We set hi to &size, but there are things above
// it. The 1024 is supposed to compensate this,
// but is somewhat arbitrary.
size := _g_.stack.hi
if size == 0 {
size = 8192 * sys.StackGuardMultiplier
}
_g_.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
_g_.stack.lo = _g_.stack.hi - size + 1024
}
// Initialize stack guard so that we can start calling regular
// Go code.
_g_.stackguard0 = _g_.stack.lo + _StackGuard
// This is the g0, so we can also call go:systemstack
// functions, which check stackguard1.
_g_.stackguard1 = _g_.stackguard0
mstart1()
// Exit this thread.
if mStackIsSystemAllocated() {
// Windows, Solaris, illumos, Darwin, AIX and Plan 9 always system-allocate
// the stack, but put it in _g_.stack before mstart,
// so the logic above hasn't set osStack yet.
osStack = true
}
mexit(osStack)
}
// The go:noinline is to guarantee the getcallerpc/getcallersp below are safe,
// so that we can set up g0.sched to return to the call of mstart1 above.
//go:noinline
func mstart1() {
_g_ := getg()
if _g_ != _g_.m.g0 {
throw("bad runtime·mstart")
}
// Set up m.g0.sched as a label returning to just
// after the mstart1 call in mstart0 above, for use by goexit0 and mcall.
// We're never coming back to mstart1 after we call schedule,
// so other calls can reuse the current frame.
// And goexit0 does a gogo that needs to return from mstart1
// and let mstart0 exit the thread.
_g_.sched.g = guintptr(unsafe.Pointer(_g_))
_g_.sched.pc = getcallerpc()
_g_.sched.sp = getcallersp()
asminit()
minit()
// Install signal handlers; after minit so that minit can
// prepare the thread to be able to handle the signals.
if _g_.m == &m0 {
mstartm0()
}
if fn := _g_.m.mstartfn; fn != nil {
fn()
}
if _g_.m != &m0 {
acquirep(_g_.m.nextp.ptr())
_g_.m.nextp = 0
}
schedule()
}
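After mstart1 finishes its setup it calls schedule, which never returns: the M now loops forever picking runnable goroutines. Below is a heavily simplified, purely illustrative sketch of that loop; the queue types and helper names are stand-ins, and the real scheduler also handles preemption, timers, netpoll and work stealing.

go
package main

import "fmt"

type gStub struct{ id int }

// Stand-ins for p.runq and sched.runq.
var localRunq []*gStub
var globalRunq []*gStub

func runqGet(q *[]*gStub) *gStub {
	if len(*q) == 0 {
		return nil
	}
	gp := (*q)[0]
	*q = (*q)[1:]
	return gp
}

// scheduleSketch mimics the shape of runtime.schedule: find a runnable
// g, then execute it (the real runtime does gogo(&gp.sched) to switch
// off g0 onto gp's stack).
func scheduleSketch() {
	for {
		gp := runqGet(&localRunq) // 1. the current P's local run queue
		if gp == nil {
			gp = runqGet(&globalRunq) // 2. the global run queue
		}
		if gp == nil {
			return // the real runtime would steal work or park the M
		}
		fmt.Println("executing g", gp.id) // stands in for execute(gp)
	}
}

func main() {
	localRunq = []*gStub{{id: 1}, {id: 2}}
	globalRunq = []*gStub{{id: 3}}
	scheduleSketch()
}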
The Role of g0

In Go, g0 is a special goroutine that primarily serves as the stack on which the scheduler runs its scheduling loop. For any thread (M), g0 is always the first goroutine it owns.

g0's responsibilities include:

- Scheduling: g0 hosts the scheduling loop and is where the scheduler picks the next goroutine to run.
- System calls and blocking operations: when the current goroutine enters a system call or a blocking operation, the associated bookkeeping runs on g0, and the original goroutine is resumed afterwards.
- Garbage collection: g0 takes part in GC work such as marking and scanning.
- Stack growth: when a goroutine needs to grow its stack, the copying work runs on g0.
- Other special tasks: for example creating new goroutines (newproc runs on the system stack, as shown above) and handling the defer machinery.

In short, g0 plays a central role in Go: it provides the stack on which the other goroutines are scheduled, and it carries the system-call bookkeeping, garbage-collection, and stack-growth work.
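A user program never touches g0 directly, but its effect is visible whenever the scheduler runs. For instance, runtime.Gosched parks the current goroutine and re-enters the scheduler (on g0, via mcall), which then picks the next runnable g:

go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	done := make(chan struct{})
	go func() {
		fmt.Println("picked by the scheduler running on g0")
		close(done)
	}()
	runtime.Gosched() // yield: switch to g0 and let it schedule another g
	<-done
	fmt.Println("main resumed")
}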
How the g0s Differ

In Go, the g0 that belongs to m0 (the thread that runs the bootstrap and eventually schedules runtime.main) and the g0 created for every later M play the same role, but they are set up differently. Note also that g0 is not the main goroutine: the main goroutine is an ordinary g created by newproc to run runtime.main.

- The main goroutine is the first user-visible goroutine created after bootstrap. The runtime creates it on m0 (via the newproc call in stage three), puts it on a P's local run queue, and once scheduled it executes runtime.main, which in turn calls the user's main function.
- Every M owns a g0, created together with the M, whose job is scheduling, garbage-collection helpers, and similar runtime work. m0's g0 is special: its stack is carved out of the main OS thread's stack (roughly 64KB, as the stage-one assembly shows). The g0 of every other M instead gets a runtime-allocated stack, 8KB by default.

In short: m0's g0 is wired up by hand in assembly, specifically for bootstrapping the program, while the g0 of each later M is created dynamically along with that M. They differ in how they are created and where their stacks live, but serve the same scheduling role.
Are Goroutine Stacks Allocated on the Heap or the Stack?

In Go, goroutine stacks are allocated dynamically on the heap. Each goroutine starts with a small heap-allocated stack, and that stack grows or shrinks as needed.

- Concretely, the runtime keeps two important global variables, runtime.stackpool and runtime.stackLarge: the global small-stack cache and the large-stack cache. The former serves stacks smaller than 32KB, the latter stacks of 32KB and above. When a goroutine is created, it gets a small initial stack from the heap. As its call chain deepens or its local variables need more space, the runtime goes through runtime.morestack and runtime.newstack to allocate a larger stack and copies the old frames into it (since Go 1.3, goroutine stacks are contiguous rather than linked segments).
- Note that frequent stack allocation and deallocation can be very expensive. Under the old segmented-stack scheme, a function call inside a tight loop that happened to straddle a segment boundary would allocate and free a segment on every iteration (the "hot split" problem). To mitigate this, Go 1.2 raised the minimum stack size from 4KB to 8KB; after the switch to the contiguous-stack strategy it was reduced back to 2KB.
- At the same time, Go's memory management pairs allocation with escape analysis and garbage collection, freeing developers from manual memory management and letting them spend more of their attention on software design.
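The growth path is easy to trigger: deep recursion overflows the current stack's guard, and the morestack/newstack path copies the frames onto a larger heap-allocated stack. A small demonstration (the recursion depth and padding size are arbitrary choices, only there to force growth):

go
package main

import "fmt"

// grow forces repeated stack growth: each frame carries some padding,
// so a deep recursion quickly exceeds the initial 2KB stack.
func grow(n int) int {
	var pad [128]byte // enlarge the frame so growth happens sooner
	pad[0] = byte(n)  // touch the padding so it is not optimized away
	if n == 0 {
		return int(pad[0])
	}
	return grow(n-1) + 1
}

func main() {
	// Runs fine even though 100000 frames vastly exceed 2KB: the runtime
	// transparently reallocates and copies the stack as it grows.
	fmt.Println(grow(100000))
}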