Golang Program Startup

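Environment

Architecture    Go version
amd64           Go 1.8 (likely a typo for 1.18: the runtime source quoted below uses identifiers such as goarch.PtrSize and the one-argument newproc, which do not exist in 1.8)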

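The main function

The main function is the program's entry point: it initializes the program and runs its main logic. In Go, main is special because the runtime does not call it directly at startup; it runs as a goroutine (the main goroutine) that the scheduler picks up. That goroutine is first scheduled on one specific set of G-M-P structures: the thread m0, its scheduling goroutine g0, and the first P (allp[0], referred to here as p0). If we take the start of main as the dividing line, these structures are not created by the runtime after main starts. They are initialized by assembly code, and the bootstrap functions it calls, before the main goroutine is scheduled, specifically so that main can be scheduled. Any other goroutines started from main are scheduled on the Gs, Ms, and Ps that the runtime manages while the program runs.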

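Why main runs as a goroutine

Running main as a goroutine means the program's entry point goes through the same scheduler as all other Go code. Functions that main launches with the go statement run as goroutines on other G-M-P combinations and can execute concurrently with it, which is what makes concurrent programming possible from inside main.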
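
To make this concrete, the following minimal sketch prints the ID of the goroutine that executes main. The helper goroutineID is hypothetical (it is not part of any Go API); it parses the header line that runtime.Stack writes, which is a debugging trick rather than something production code should rely on.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"strconv"
)

// goroutineID parses the ID out of the first line of the current goroutine's
// stack trace, which has the form "goroutine 1 [running]:". Illustration
// only: Go deliberately exposes no API for goroutine IDs.
func goroutineID() int {
	buf := make([]byte, 64)
	buf = buf[:runtime.Stack(buf, false)]
	fields := bytes.Fields(buf)
	if len(fields) < 2 {
		return -1
	}
	id, err := strconv.Atoi(string(fields[1]))
	if err != nil {
		return -1
	}
	return id
}

func main() {
	// main itself is executed by a goroutine, the first one the runtime creates.
	fmt.Println("main runs on goroutine", goroutineID())

	// A goroutine started from main gets its own, different ID.
	done := make(chan int)
	go func() { done <- goroutineID() }()
	fmt.Println("child runs on goroutine", <-done)
}
```

The first line reports goroutine 1, matching the runtime comment quoted later in this article ("main goroutine receives goid=1"). The child's ID depends on how many goroutines the runtime has already started, so it is not fixed.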

Assembly code

Stage 1: initialize g0 and m0

This stage initializes g0 and m0 and makes them reference each other.

TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
   // copy arguments forward on an even stack
   MOVQ   DI, AX    // argc
   MOVQ   SI, BX    // argv
   SUBQ   $(5*8), SP    // 3args 2auto
   ANDQ   $~15, SP
   MOVQ   AX, 24(SP)
   MOVQ   BX, 32(SP)
​
   // Initialize g0 here: carve g0's stack bounds out of the OS thread stack
   // create istack out of the given (operating system) stack.
   // _cgo_init may update stackguard.
   MOVQ   $runtime·g0(SB), DI
   LEAQ   (-64*1024+104)(SP), BX
   MOVQ   BX, g_stackguard0(DI)
   MOVQ   BX, g_stackguard1(DI)
   MOVQ   BX, (g_stack+stack_lo)(DI)
   MOVQ   SP, (g_stack+stack_hi)(DI)
​
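  // NOTE: CPU feature detection code (with its JMPs) elided
  ...
  ...
  ...
  // NOTE: CPU feature detection code (with its JMPs) elided

  // Initialize m0's TLS: DI = &m0.tls, the address of m0's tls field
  LEAQ    runtime·m0+m_tls(SB), DI
  // NOTE: call settls to set up thread-local storage; afterwards m0.tls can be reached through the FS segment register
  CALL    runtime·settls(SB)
  // store through it, to make sure it works
  // NOTE: load the FS segment base into BX; this is the address of m0.tls[1]. get_tls is a macro expanded by the assembler
  get_tls(BX)
  // NOTE: g(BX) resolves to the address minus 8, i.e. m0.tls[0]
  MOVQ    $0x123, g(BX)
  // NOTE: AX = m0.tls[0]
  MOVQ    runtime·m0+m_tls(SB), AX // check that thread-local storage was set up correctly
  CMPQ    AX, $0x123
  // NOTE: on success, jump two instructions ahead, skipping the abort call
  JEQ 2(PC)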
  CALL    runtime·abort(SB)
ok:
  // set the per-goroutine and per-mach "registers"
  // NOTE: load the FS segment base (the address of m0.tls[1]) into BX
  get_tls(BX)
  // NOTE: load the address of g0 into CX
  LEAQ    runtime·g0(SB), CX
  // NOTE: m0.tls[0] = &g0
  MOVQ    CX, g(BX)
  // NOTE: load the address of m0 into AX
  LEAQ    runtime·m0(SB), AX
  // g0 and m0 are made to reference each other here
  // save m->g0 = g0
  // NOTE: m0.g0 = &g0
  MOVQ    CX, m_g0(AX)
  // save m0 to g0->m
  // NOTE: g0.m = &m0
  MOVQ    AX, g_m(CX)
  // Lookup chain: FS ==> tls[1] ==> g() ==> tls[0] ==> g0 ==> g0.m = &m0 ==> m0.g0 = &g0
  CLD             // convention is D is always left cleared
​
​
  //TODO 
  ...
  ...
  ...
  //TODO 
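
The wiring performed by the assembly above can be modeled in a few lines of Go. This is a toy sketch: the g and m types below are hypothetical stand-ins that keep only the fields touched in the snippet, and a one-element array plays the role of the TLS slot. It is not how the runtime declares these structures.

```go
package main

import "fmt"

// Toy stand-ins for the runtime's g and m structs; only the fields touched
// by the rt0_go snippet are modeled.
type g struct {
	name string
	m    *m
}

type m struct {
	g0  *g
	tls [1]*g // tls[0] plays the role of the TLS slot that holds the current g
}

// Statically allocated, like runtime·g0 and runtime·m0.
var (
	g0 = g{name: "g0"}
	m0 m
)

// bootstrap mirrors the three stores in the assembly:
// tls[0] = &g0, m0.g0 = &g0, g0.m = &m0.
func bootstrap() {
	m0.tls[0] = &g0
	m0.g0 = &g0
	g0.m = &m0
}

// getg models reading the current g from thread-local storage.
func getg() *g { return m0.tls[0] }

func main() {
	bootstrap()
	cur := getg()
	// Follow the chain: TLS -> g0 -> m0 -> back to g0.
	fmt.Println(cur.name, cur.m == &m0, cur.m.g0 == cur) // g0 true true
}
```

After bootstrap, any code that can read the TLS slot can reach g0, and from g0 it can reach m0 and come back again, which is the lookup chain noted in the last comment of the assembly.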
​

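Stage 2: build the environment

This stage initializes the runtime environment: the program's startup arguments, the global variable ncpu (the number of CPU cores), the Ps, the heap and stack allocators, and the garbage collector.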

   CALL   runtime·check(SB)
   MOVL   24(SP), AX    // copy argc
   MOVL   AX, 0(SP)
   MOVQ   32(SP), AX    // copy argv
   MOVQ   AX, 8(SP)
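   // NOTE: initialize the arguments; argc and argv are passed in the 16 spare bytes at the bottom of the frame
   CALL    runtime·args(SB)
   // NOTE: initialize the number of CPU cores (ncpu)
   CALL    runtime·osinit(SB)
   // NOTE: start initializing the scheduler
   CALL    runtime·schedinit(SB)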
  
runtime·schedinit
// The bootstrap sequence is:
//
//  call osinit
//  call schedinit
//  make & queue new G
//  call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
  lockInit(&sched.lock, lockRankSched)
  lockInit(&sched.sysmonlock, lockRankSysmon)
  lockInit(&sched.deferlock, lockRankDefer)
  lockInit(&sched.sudoglock, lockRankSudog)
  lockInit(&deadlock, lockRankDeadlock)
  lockInit(&paniclk, lockRankPanic)
  lockInit(&allglock, lockRankAllg)
  lockInit(&allpLock, lockRankAllp)
  lockInit(&reflectOffs.lock, lockRankReflectOffs)
  lockInit(&finlock, lockRankFin)
  lockInit(&trace.bufLock, lockRankTraceBuf)
  lockInit(&trace.stringsLock, lockRankTraceStrings)
  lockInit(&trace.lock, lockRankTrace)
  lockInit(&cpuprof.lock, lockRankCpuprof)
  lockInit(&trace.stackTab.lock, lockRankTraceStackTab)
  // Enforce that this lock is always a leaf lock.
  // All of this lock's critical sections should be
  // extremely short.
  lockInit(&memstats.heapStats.noPLock, lockRankLeafRank)
​
  // raceinit must be the first call to race detector.
  // In particular, it must be done before mallocinit below calls racemapshadow.
  _g_ := getg()
  if raceenabled {
    _g_.racectx, raceprocctx0 = raceinit()
  }
​
  sched.maxmcount = 10000
​
  // The world starts stopped.
  worldStopped()
​
  moduledataverify()
  stackinit()
  mallocinit()
  cpuinit()      // must run before alginit
  alginit()      // maps, hash, fastrand must not be used before this call
  fastrandinit() // must run before mcommoninit
  mcommoninit(_g_.m, -1)
  modulesinit()   // provides activeModules
  typelinksinit() // uses maps, activeModules
  itabsinit()     // uses activeModules
  stkobjinit()    // must run before GC starts
​
  sigsave(&_g_.m.sigmask)
  initSigmask = _g_.m.sigmask
​
  if offset := unsafe.Offsetof(sched.timeToRun); offset%8 != 0 {
    println(offset)
    throw("sched.timeToRun not aligned to 8 bytes")
  }
​
  goargs()
  goenvs()
  parsedebugvars()
  gcinit()
​
  lock(&sched.lock)
  sched.lastpoll = uint64(nanotime())
  procs := ncpu
  if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
    procs = n
  }
  if procresize(procs) != nil {
    throw("unknown runnable goroutine during bootstrap")
  }
  unlock(&sched.lock)
​
  // World is effectively started now, as P's can run.
  worldStarted()
​
  // For cgocheck > 1, we turn on the write barrier at all times
  // and check all pointer writes. We can't do this until after
  // procresize because the write barrier needs a P.
  if debug.cgocheck > 1 {
    writeBarrier.cgo = true
    writeBarrier.enabled = true
    for _, p := range allp {
      p.wbBuf.reset()
    }
  }
​
  if buildVersion == "" {
    // Condition should never trigger. This code just serves
    // to ensure runtime·buildVersion is kept in the resulting binary.
    buildVersion = "unknown"
  }
  if len(modinfo) == 1 {
    // Condition should never trigger. This code just serves
    // to ensure runtime·modinfo is kept in the resulting binary.
    modinfo = ""
  }
}
​
// --------------------------------------- Initialize the Ps ---------------------------------------
// Change number of processors.
//
// sched.lock must be held, and the world must be stopped.
//
// gcworkbufs must not be being modified by either the GC or the write barrier
// code, so the GC must not be running if the number of Ps actually changes.
//
// Returns list of Ps with local work, they need to be scheduled by the caller.
func procresize(nprocs int32) *p {
  assertLockHeld(&sched.lock)
  assertWorldStopped()
​
  old := gomaxprocs
  if old < 0 || nprocs <= 0 {
    throw("procresize: invalid arg")
  }
  if trace.enabled {
    traceGomaxprocs(nprocs)
  }
​
  // update statistics
  now := nanotime()
  if sched.procresizetime != 0 {
    sched.totaltime += int64(old) * (now - sched.procresizetime)
  }
  sched.procresizetime = now
​
  maskWords := (nprocs + 31) / 32
​
  // Grow allp if necessary.
  if nprocs > int32(len(allp)) {
    // Synchronize with retake, which could be running
    // concurrently since it doesn't run on a P.
    lock(&allpLock)
    if nprocs <= int32(cap(allp)) {
      allp = allp[:nprocs]
    } else {
      nallp := make([]*p, nprocs)
      // Copy everything up to allp's cap so we
      // never lose old allocated Ps.
      copy(nallp, allp[:cap(allp)])
      allp = nallp
    }
​
    if maskWords <= int32(cap(idlepMask)) {
      idlepMask = idlepMask[:maskWords]
      timerpMask = timerpMask[:maskWords]
    } else {
      nidlepMask := make([]uint32, maskWords)
      // No need to copy beyond len, old Ps are irrelevant.
      copy(nidlepMask, idlepMask)
      idlepMask = nidlepMask
​
      ntimerpMask := make([]uint32, maskWords)
      copy(ntimerpMask, timerpMask)
      timerpMask = ntimerpMask
    }
    unlock(&allpLock)
  }
​
  // initialize new P's
  for i := old; i < nprocs; i++ {
    pp := allp[i]
    if pp == nil {
      pp = new(p)
    }
    pp.init(i)
    atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
  }
​
  _g_ := getg()
  if _g_.m.p != 0 && _g_.m.p.ptr().id < nprocs {
    // continue to use the current P
    _g_.m.p.ptr().status = _Prunning
    _g_.m.p.ptr().mcache.prepareForSweep()
  } else {
    // release the current P and acquire allp[0].
    //
    // We must do this before destroying our current P
    // because p.destroy itself has write barriers, so we
    // need to do that from a valid P.
    if _g_.m.p != 0 {
      if trace.enabled {
        // Pretend that we were descheduled
        // and then scheduled again to keep
        // the trace sane.
        traceGoSched()
        traceProcStop(_g_.m.p.ptr())
      }
      _g_.m.p.ptr().m = 0
    }
    _g_.m.p = 0
    p := allp[0]
    p.m = 0
    p.status = _Pidle
    acquirep(p)
    if trace.enabled {
      traceGoStart()
    }
  }
​
  // g.m.p is now set, so we no longer need mcache0 for bootstrapping.
  mcache0 = nil
​
  // release resources from unused P's
  for i := nprocs; i < old; i++ {
    p := allp[i]
    p.destroy()
    // can't free P itself because it can be referenced by an M in syscall
  }
​
  // Trim allp.
  if int32(len(allp)) != nprocs {
    lock(&allpLock)
    allp = allp[:nprocs]
    idlepMask = idlepMask[:maskWords]
    timerpMask = timerpMask[:maskWords]
    unlock(&allpLock)
  }
​
  var runnablePs *p
  for i := nprocs - 1; i >= 0; i-- {
    p := allp[i]
    if _g_.m.p.ptr() == p {
      continue
    }
    p.status = _Pidle
    if runqempty(p) {
      pidleput(p)
    } else {
      p.m.set(mget())
      p.link.set(runnablePs)
      runnablePs = p
    }
  }
  stealOrder.reset(uint32(nprocs))
  var int32p *int32 = &gomaxprocs // make compiler check that gomaxprocs is an int32
  atomic.Store((*uint32)(unsafe.Pointer(int32p)), uint32(nprocs))
  return runnablePs
}
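
Two small calculations in the code above are easy to try in isolation: how schedinit picks the number of Ps, and how procresize sizes the per-P bitmasks. The sketch below approximates both. The function names resolveProcs and maskWords are hypothetical, and strconv.Atoi stands in for the runtime's internal atoi32, so edge cases may differ from the real runtime.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
)

// resolveProcs approximates the decision in schedinit: start from the CPU
// count and let a positive, parseable GOMAXPROCS value override it.
func resolveProcs(ncpu int, env string) int {
	if n, err := strconv.Atoi(env); err == nil && n > 0 {
		return n
	}
	return ncpu
}

// maskWords is the number of uint32 words needed to hold one bit per P,
// using the same rounding-up division as procresize.
func maskWords(nprocs int) int {
	return (nprocs + 31) / 32
}

func main() {
	fmt.Println("NumCPU:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS in effect:", runtime.GOMAXPROCS(0))
	fmt.Println("resolved:", resolveProcs(runtime.NumCPU(), os.Getenv("GOMAXPROCS")))
	fmt.Println("mask words for 33 Ps:", maskWords(33)) // 2
}
```

Calling runtime.GOMAXPROCS with 0 only reads the current setting. An unset, non-numeric, or non-positive GOMAXPROCS leaves the CPU count in place, which is why the snippet in schedinit checks both ok and n > 0.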
​

Stage 3: start main

Create a g for runtime.main, add it to the run queue, and start scheduling.

  // create a new goroutine to start program
   MOVQ   $runtime·mainPC(SB), AX       // entry
   PUSHQ  AX
  // 
   CALL   runtime·newproc(SB)
   POPQ   AX
  // start this M
   CALL   runtime·mstart(SB)
   CALL   runtime·abort(SB)  // mstart should never return
   RET
runtime·newproc
// newproc is the runtime function called to create a goroutine (it is not an OS system call)
// Create a new g running fn.
// Put it on the queue of g's waiting to run.
// The compiler turns a go statement into a call to this.
func newproc(fn *funcval) {
  gp := getg()
  pc := getcallerpc()
  systemstack(func() {
    // newproc1 returns a new g
    newg := newproc1(fn, gp, pc)
    // get the current P and put newg on its local run queue
    _p_ := getg().m.p.ptr()
    runqput(_p_, newg, true)
    if mainStarted {
      wakep()
    }
  })
}
​
// Create a new g in state _Grunnable, starting at fn. callerpc is the
// address of the go statement that created this. The caller is responsible
// for adding the new g to the scheduler.
// NOTE: returns a new g that is ready to be scheduled.
func newproc1(fn *funcval, callergp *g, callerpc uintptr) *g {
  _g_ := getg()
​
  if fn == nil {
    _g_.m.throwing = -1 // do not dump full stacks
    throw("go of nil func value")
  }
  acquirem() // disable preemption because it can be holding p in a local var
​
  _p_ := _g_.m.p.ptr()
  newg := gfget(_p_)
  // Try to reuse a free g; if there is none, create a new one with a 2 KB stack (_StackMin)
  if newg == nil {
    newg = malg(_StackMin)
    casgstatus(newg, _Gidle, _Gdead)
    allgadd(newg) // publishes with a g->status of Gdead so GC scanner doesn't look at uninitialized stack.
  }
  if newg.stack.hi == 0 {
    throw("newproc1: newg missing stack")
  }
​
  if readgstatus(newg) != _Gdead {
    throw("newproc1: new g is not Gdead")
  }
​
  totalSize := uintptr(4*goarch.PtrSize + sys.MinFrameSize) // extra space in case of reads slightly beyond frame
  totalSize = alignUp(totalSize, sys.StackAlign)
  sp := newg.stack.hi - totalSize
  spArg := sp
  if usesLR {
    // caller's LR
    *(*uintptr)(unsafe.Pointer(sp)) = 0
    prepGoExitFrame(sp)
    spArg += sys.MinFrameSize
  }
​
  memclrNoHeapPointers(unsafe.Pointer(&newg.sched), unsafe.Sizeof(newg.sched))
  newg.sched.sp = sp
  newg.stktopsp = sp
  // +PCQuantum so that previous instruction is in same function
  newg.sched.pc = abi.FuncPCABI0(goexit) + sys.PCQuantum
  newg.sched.g = guintptr(unsafe.Pointer(newg))
  gostartcallfn(&newg.sched, fn)
  newg.gopc = callerpc
  newg.ancestors = saveAncestors(callergp)
  newg.startpc = fn.fn
  if isSystemGoroutine(newg, false) {
    atomic.Xadd(&sched.ngsys, +1)
  } else {
    // Only user goroutines inherit pprof labels.
    if _g_.m.curg != nil {
      newg.labels = _g_.m.curg.labels
    }
  }
  // Track initial transition?
  newg.trackingSeq = uint8(fastrand())
  if newg.trackingSeq%gTrackingPeriod == 0 {
    newg.tracking = true
  }
  casgstatus(newg, _Gdead, _Grunnable)
  gcController.addScannableStack(_p_, int64(newg.stack.hi-newg.stack.lo))
​
  if _p_.goidcache == _p_.goidcacheend {
    // Sched.goidgen is the last allocated id,
    // this batch must be [sched.goidgen+1, sched.goidgen+GoidCacheBatch].
    // At startup sched.goidgen=0, so main goroutine receives goid=1.
    _p_.goidcache = atomic.Xadd64(&sched.goidgen, _GoidCacheBatch)
    _p_.goidcache -= _GoidCacheBatch - 1
    _p_.goidcacheend = _p_.goidcache + _GoidCacheBatch
  }
  newg.goid = int64(_p_.goidcache)
  _p_.goidcache++
  if raceenabled {
    newg.racectx = racegostart(callerpc)
  }
  if trace.enabled {
    traceGoCreate(newg, newg.startpc)
  }
  releasem(_g_.m)
​
  return newg
}
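
The goroutine ID batching at the end of newproc1 can be reproduced with a toy model. The sched and p types below are hypothetical, simplified stand-ins, and a plain addition replaces the atomic add the runtime uses. The batch size of 16 matches the runtime's _GoidCacheBatch.

```go
package main

import "fmt"

const goidCacheBatch = 16 // same value as the runtime's _GoidCacheBatch

// Toy versions of the scheduler state and a P; not the runtime's real types.
type sched struct{ goidgen uint64 }

type p struct{ goidcache, goidcacheend uint64 }

// nextGoid hands out the next ID from this P's cached range and reserves a
// fresh batch from the global counter when the range is exhausted.
func (pp *p) nextGoid(s *sched) uint64 {
	if pp.goidcache == pp.goidcacheend {
		s.goidgen += goidCacheBatch // an atomic add in the runtime
		pp.goidcache = s.goidgen - (goidCacheBatch - 1)
		pp.goidcacheend = pp.goidcache + goidCacheBatch
	}
	id := pp.goidcache
	pp.goidcache++
	return id
}

func main() {
	var s sched
	var p0, p1 p
	fmt.Println(p0.nextGoid(&s)) // 1: the first goroutine, i.e. the main goroutine
	fmt.Println(p1.nextGoid(&s)) // 17: a second P reserves the next batch
	fmt.Println(p0.nextGoid(&s)) // 2: p0 continues from its own cache
}
```

Reserving IDs in batches means a P touches the shared counter once per 16 goroutines instead of once per goroutine. A side effect is that IDs are unique but not handed out in global creation order.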
​
​
runtime·mstart
// mstart is the entry-point for new Ms.
// It is written in assembly, uses ABI0, is marked TOPFRAME, and calls mstart0.
func mstart()
​
// mstart0 is the Go entry-point for new Ms.
// This must not split the stack because we may not even have stack
// bounds set up yet.
//
// May run during STW (because it doesn't have a P yet), so write
// barriers are not allowed.
//
//go:nosplit
//go:nowritebarrierrec
func mstart0() {
  _g_ := getg()
​
  osStack := _g_.stack.lo == 0
  if osStack {
    // Initialize stack bounds from system stack.
    // Cgo may have left stack size in stack.hi.
    // minit may update the stack bounds.
    //
    // Note: these bounds may not be very accurate.
    // We set hi to &size, but there are things above
    // it. The 1024 is supposed to compensate this,
    // but is somewhat arbitrary.
    size := _g_.stack.hi
    if size == 0 {
      size = 8192 * sys.StackGuardMultiplier
    }
    _g_.stack.hi = uintptr(noescape(unsafe.Pointer(&size)))
    _g_.stack.lo = _g_.stack.hi - size + 1024
  }
  // Initialize stack guard so that we can start calling regular
  // Go code.
  _g_.stackguard0 = _g_.stack.lo + _StackGuard
  // This is the g0, so we can also call go:systemstack
  // functions, which check stackguard1.
  _g_.stackguard1 = _g_.stackguard0
  mstart1()
​
  // Exit this thread.
  if mStackIsSystemAllocated() {
    // Windows, Solaris, illumos, Darwin, AIX and Plan 9 always system-allocate
    // the stack, but put it in _g_.stack before mstart,
    // so the logic above hasn't set osStack yet.
    osStack = true
  }
  mexit(osStack)
}
​
// The go:noinline is to guarantee the getcallerpc/getcallersp below are safe,
// so that we can set up g0.sched to return to the call of mstart1 above.
//go:noinline
func mstart1() {
  _g_ := getg()
​
  if _g_ != _g_.m.g0 {
    throw("bad runtime·mstart")
  }
​
  // Set up m.g0.sched as a label returning to just
  // after the mstart1 call in mstart0 above, for use by goexit0 and mcall.
  // We're never coming back to mstart1 after we call schedule,
  // so other calls can reuse the current frame.
  // And goexit0 does a gogo that needs to return from mstart1
  // and let mstart0 exit the thread.
  _g_.sched.g = guintptr(unsafe.Pointer(_g_))
  _g_.sched.pc = getcallerpc()
  _g_.sched.sp = getcallersp()
​
  asminit()
  minit()
​
  // Install signal handlers; after minit so that minit can
  // prepare the thread to be able to handle the signals.
  if _g_.m == &m0 {
    mstartm0()
  }
​
  if fn := _g_.m.mstartfn; fn != nil {
    fn()
  }
​
  if _g_.m != &m0 {
    acquirep(_g_.m.nextp.ptr())
    _g_.m.nextp = 0
  }
  schedule()
}
​

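The role of g0

In Go, g0 is a special goroutine whose main purpose at run time is to provide the stack on which the scheduler runs its scheduling loop. For a thread (M), g0 is always the first goroutine created for it.

g0 is used for:

  1. Scheduling: the scheduling loop runs on g0's stack, where it picks the next goroutine to execute.
  2. Blocking and system-call transitions: when the current goroutine blocks or gives up the processor, the M switches to g0 to park it and look for other work; the goroutine resumes later, once it is runnable again.
  3. Garbage collection: parts of the collector, such as marking and scanning steps, run on g0's stack.
  4. Stack growth: when a goroutine needs a larger stack, the growth code runs on g0.
  5. Other runtime tasks: for example, the part of goroutine creation that newproc wraps in systemstack, as shown in the code above.

In short, g0 provides the stack on which other goroutines are scheduled, and the runtime uses it for scheduling transitions, garbage collection, and stack growth.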
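
The scheduling role can be pictured with a toy loop. The sketch below is only an analogy and rests on several assumptions: the task type and the schedule function are hypothetical, tasks are cooperative, and each one runs for exactly one slice before control returns to the loop. The real scheduler switches stacks and contexts, which this model does not attempt.

```go
package main

import "fmt"

// task is a toy goroutine that needs a number of time slices to finish.
// steps is assumed to be at least 1.
type task struct {
	name  string
	steps int
}

// schedule is a toy version of the loop that runs on g0: take the next
// runnable task from the local queue, run it for one slice, and requeue
// it if it still has work left. It returns the order in which tasks ran.
func schedule(runq []*task) []string {
	var trace []string
	for len(runq) > 0 {
		t := runq[0]
		runq = runq[1:]
		t.steps-- // "run" the task for one slice
		trace = append(trace, t.name)
		if t.steps > 0 {
			runq = append(runq, t) // not finished: back of the queue
		}
	}
	return trace
}

func main() {
	order := schedule([]*task{{"a", 2}, {"b", 1}, {"c", 3}})
	fmt.Println(order) // [a b c a c c]
}
```

The point of the analogy is the control flow: every time a task stops running, execution returns to the same loop, just as an M returns to g0's stack to decide what runs next.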

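How the g0s differ

The g0 that belongs to m0 and the g0 that belongs to every M created later at run time are two different things. Neither of them is the main goroutine: the main goroutine is an ordinary g, created by newproc, that executes runtime.main.

  1. m0's g0 is the global runtime·g0, which rt0_go sets up before the scheduler exists. Its stack is carved out of the main thread's OS stack and is about 64 KB, as the assembly in stage 1 shows. During procresize, m0 acquires allp[0]; newproc then puts the main goroutine on that P's local run queue, and it starts executing once mstart enters schedule.
  2. Every other M gets its own g0 when the M is created at run time. Where the runtime allocates this stack itself, it is 8 KB by default; on platforms where the OS allocates thread stacks, or in programs that use cgo, g0 uses the OS-provided stack instead. Its job is the same as that of m0's g0: scheduling goroutines, garbage collection work, and so on.

In summary, m0's g0 is statically allocated and prepared by assembly code specifically to bootstrap the program's entry point, while the g0 of every other M is created dynamically together with that M. They differ in how they are created, in stack size, and in lifetime, but not in what they are used for.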

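Are run-time goroutine stacks allocated on the heap or on the stack?

In Go, a goroutine's stack is allocated dynamically from heap memory managed by the runtime, not from the OS thread stack. Each goroutine starts with a small stack, which grows or shrinks as needed.

  1. Specifically, the runtime keeps two important global variables, runtime.stackpool and runtime.stackLarge: the global stack cache and the large-stack cache. The former serves stacks smaller than 32 KB and the latter serves stacks of 32 KB or larger. When a goroutine is created, it receives a small initial stack (2 KB, the _StackMin seen in newproc1). As its call depth increases or its local variables need more room, runtime.morestack and runtime.newstack allocate a larger stack, copy the existing frames into it, and release the old one. Since Go 1.3 a goroutine's stack is therefore a single contiguous block; the earlier segmented design instead linked discontiguous segments together in a doubly linked list.
  2. Frequent stack allocation and release can be very expensive. With segmented stacks, a call inside a tight loop that kept crossing a segment boundary allocated and freed a segment on every iteration. To reduce this cost, Go 1.2 raised the minimum stack size from 4 KB to 8 KB; after contiguous stacks were adopted in Go 1.3, Go 1.4 lowered it to 2 KB.
  3. Go's memory management also takes explicit allocation and release off the developer's hands: escape analysis decides whether a value lives on the stack or the heap, and the garbage collector reclaims heap memory, so developers can spend their effort on software design.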
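
Stack growth is easy to observe indirectly: a goroutine that starts with a small stack can still recurse very deeply. The sketch below runs a deep recursion on a fresh goroutine and adds a toy doubling model of contiguous growth. The names depth, runDeep, and grownSize are hypothetical, and grownSize ignores the guard space and frame copying that the real runtime.newstack handles.

```go
package main

import "fmt"

// depth recurses n levels. The addition happens after the recursive call
// returns, so every level keeps its own frame and the goroutine's stack has
// to grow well past its small initial size.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// runDeep performs the recursion on a fresh goroutine and returns the result.
func runDeep(n int) int {
	result := make(chan int)
	go func() { result <- depth(n) }()
	return <-result
}

// grownSize is a toy model of contiguous stack growth: keep doubling the
// stack size until the needed bytes fit. size must be positive.
func grownSize(size, needed int) int {
	for size < needed {
		size *= 2
	}
	return size
}

func main() {
	fmt.Println(runDeep(100000))       // 100000
	fmt.Println(grownSize(2048, 5000)) // 8192
}
```

Doubling keeps the number of copies small: a stack that ends up at several megabytes is reached from 2 KB in roughly a dozen growth steps, rather than one allocation per extra frame.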