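Introduction: This article introduces the core concepts of coroutines and, using Kotlin as the entry point, explains how coroutines are used in Kotlin and how they are implemented.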

Concepts

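The coroutine, a term coined by Melvin Edward Conway in 1958, is a kind of program component. Its defining feature is that it can be suspended and resumed during execution, and a suspended coroutine does not block the thread it was running on. Compared with threads, coroutines are more general, more lightweight, and more flexible, and they are implemented in user space rather than in the kernel. Switching between coroutines involves no system call or blocking call and needs no support from the operating system (no OS API calls). Coroutines that take turns cooperatively on one thread also need no synchronization primitives such as mutexes or semaphores to guard critical sections against each other. Scheduling a coroutine (suspending and resuming it) is something the process controls entirely in user space, with no system-level switching overhead, whereas creating, running, switching, and stopping a thread all rely on the operating system and require the process to trap into the kernel. That is why suspending and resuming (switching) coroutines is so cheap.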

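Coroutines are a concurrent programming paradigm supported today by Go, Kotlin, C#, JavaScript, Python, and other languages. This article takes Kotlin as the entry point and uses its coroutine implementation to examine how coroutines are designed.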

Using Coroutines

Coroutines became stable in Kotlin 1.3 (they were experimental before that) and ship as a separate library, so add the library dependency before using them:

implementation("org.jetbrains.kotlinx:kotlinx-coroutines-core:1.5.2")

With the dependency in place, create the simplest possible coroutine:

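import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch

fun main(args: Array<String>) {

    CoroutineScope(Dispatchers.Default).launch {
        // This coroutine only prints one log line
        println("${Thread.currentThread().name}: coroutine test 1")
    }
    // The default pool uses daemon threads, so keep the main thread alive long enough for the coroutine to run
    Thread.sleep(500)
}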

Output:

DefaultDispatcher-worker-1: coroutine test 1

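In Kotlin every coroutine must belong to a scope (CoroutineScope). We first call CoroutineScope(Dispatchers.Default) to create a scope that has the default dispatcher. A dispatcher decides which thread or thread pool a coroutine runs on; choosing the default dispatcher sends the coroutines of this scope to the coroutine library's default thread pool. With the scope in hand, CoroutineScope.launch creates and dispatches a coroutine, whose only job here is to print the name of the thread that executes it.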

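CoroutineScope.launch returns a Job object, which exposes information about the coroutine and controls its execution state. Job provides three properties that describe the coroutine's state: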

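/**
 * Whether the coroutine is active: true while it is executing or waiting for its child coroutines to complete
 */
isActive

/**
 * Whether the coroutine has completed
 */
isCompleted

/**
 * Whether the coroutine has been cancelled
 */
isCancelled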

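Job.start starts the coroutine (it has no effect if the coroutine has already started), and Job.cancel cancels it. Cancellation is cooperative, so whether it takes effect depends on how the coroutine body is written.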
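
The three flags and the effect of Job.start and cancellation can be observed directly. The following is a minimal sketch, assuming kotlinx-coroutines-core is on the classpath; describe and jobStates are hypothetical helper names introduced only for this illustration.

```kotlin
import kotlinx.coroutines.CoroutineStart
import kotlinx.coroutines.Job
import kotlinx.coroutines.cancelAndJoin
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical helper: formats the three state flags of a Job.
fun describe(job: Job): String =
    "active=${job.isActive} completed=${job.isCompleted} cancelled=${job.isCancelled}"

// Hypothetical helper: records the flags before start, after start, and after cancellation.
fun jobStates(): List<String> = runBlocking {
    val states = mutableListOf<String>()
    // LAZY keeps the coroutine in the "new" state until it is started explicitly
    val job = launch(start = CoroutineStart.LAZY) { delay(1_000) }
    states.add(describe(job))
    job.start()
    states.add(describe(job))
    // cancelAndJoin requests cancellation and waits until the job has completed
    job.cancelAndJoin()
    states.add(describe(job))
    states
}

fun main() {
    jobStates().forEach(::println)
}
```

The three recorded lines correspond to a job that is new, then active, then cancelled and completed.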

Dispatching and Scheduling Coroutines

Start Strategies

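Dispatching and scheduling are governed mainly by the dispatcher and the start strategy. The dispatcher (CoroutineDispatcher) is one of the elements stored in the coroutine context (CoroutineContext) and is also an indirect subtype of CoroutineContext, so it can be used to create a coroutine scope: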

// CoroutineDispatcher.kt

public abstract class CoroutineDispatcher :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {
    // ...
}

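A CoroutineContext is also the first parameter of launch, the function that creates a new coroutine. The dispatcher decides which thread or thread pool the coroutine is dispatched to, and Kotlin provides four dispatchers out of the box: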

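// See the kotlinx.coroutines.Dispatchers class

/**
 * The default dispatcher: dispatches coroutines to the default thread pool for scheduling and execution
 */
Default

/**
 * The main-thread dispatcher: dispatches coroutines to the main thread. The core library does not implement it; the host platform must supply an implementation. On Android, add the kotlinx-coroutines-android runtime dependency
 */
Main

/**
 * A dispatcher that is not confined to any thread: the coroutine starts in the calling thread, and after a suspension it resumes in whatever thread the suspend function resumes it on
 */
Unconfined

/**
 * The IO dispatcher: shares its threads with the Default dispatcher but caps the number of coroutines running in parallel. The cap comes from the "kotlinx.coroutines.io.parallelism" system property and defaults to the larger of 64 and the number of processor cores available to the JVM
 */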
IO
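
To see where each dispatcher actually runs code, the thread name can be captured under each one. This is a minimal sketch, assuming kotlinx-coroutines-core on the JVM; threadNames is a hypothetical helper, and Dispatchers.Main is left out because it needs a platform-specific implementation.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical helper: records which thread runs a block under each dispatcher.
fun threadNames(): Map<String, String> = runBlocking {
    mapOf(
        "caller" to Thread.currentThread().name,
        "Default" to withContext(Dispatchers.Default) { Thread.currentThread().name },
        "IO" to withContext(Dispatchers.IO) { Thread.currentThread().name },
        // Unconfined does not switch threads at the start, so the block runs on the caller's thread
        "Unconfined" to withContext(Dispatchers.Unconfined) { Thread.currentThread().name }
    )
}

fun main() {
    threadNames().forEach { (dispatcher, thread) -> println("$dispatcher -> $thread") }
}
```

Default and IO should both report worker threads of the shared pool, while Unconfined should report the same thread as the caller.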

The start strategy is the second parameter of launch and determines when the coroutine gets scheduled. Kotlin also provides several strategies out of the box:

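// See the kotlinx.coroutines.CoroutineStart class

/**
 * The default strategy: immediately schedules the coroutine onto the task queue of the target thread/thread pool, where it waits to run. It can be cancelled with Job.cancel before it actually starts executing.
 */
DEFAULT

/**
 * The lazy strategy: the coroutine is scheduled only when it is needed; the caller can trigger scheduling with Job.start
 */
LAZY

/**
 * Atomic scheduling: like DEFAULT, schedules the coroutine onto the task queue immediately, but the coroutine cannot be cancelled before it starts executing
 */
@ExperimentalCoroutinesApi // Since 1.0.0, no ETA on stability
ATOMIC

/**
 * Executes the coroutine immediately in the calling thread until its first suspension point; on resumption, the dispatcher in the coroutine context decides which thread/thread pool continues the execution.
 * Similar to the Unconfined dispatcher, except that after suspension and resumption the executing thread is chosen by the coroutine's dispatcher.
 */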
UNDISPATCHED
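
The difference between the strategies shows up in the order in which coroutine bodies run. Below is a minimal sketch, assuming kotlinx-coroutines-core; startOrder is a hypothetical helper, and runBlocking is used so that everything runs on a single thread and the order is deterministic.

```kotlin
import kotlinx.coroutines.CoroutineStart
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Hypothetical helper: records the order in which differently started coroutines run.
fun startOrder(): List<String> = runBlocking {
    val events = mutableListOf<String>()
    // LAZY: nothing runs until the job is started or joined
    val lazyJob = launch(start = CoroutineStart.LAZY) { events.add("lazy") }
    // UNDISPATCHED: the body runs right here, inside the launch call
    launch(start = CoroutineStart.UNDISPATCHED) { events.add("undispatched") }
    // DEFAULT: the body is queued and runs once the current coroutine suspends
    val defaultJob = launch { events.add("default") }
    events.add("after launch calls")
    defaultJob.join()
    lazyJob.join() // join starts a lazy job and waits for it
    events
}

fun main() {
    println(startOrder())
}
```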

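How Scheduling Works

Having created a coroutine and covered the default dispatchers and start strategies, we now look at how scheduling works, that is, how a coroutine gets executed after it is created. Take the code from the beginning as the example: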

CoroutineScope(Dispatchers.Default).launch {
    // This coroutine only prints one log line
    println("${Thread.currentThread().name}: coroutine test 1")
}

Step into CoroutineScope.launch:

public fun CoroutineScope.launch(
    context: CoroutineContext = EmptyCoroutineContext,
    start: CoroutineStart = CoroutineStart.DEFAULT,
    // block is the coroutine body, i.e. the lambda that actually gets executed
    block: suspend CoroutineScope.() -> Unit
): Job {
    // Combine the scope's coroutine context with the context passed to launch into a new context
    val newContext = newCoroutineContext(context)
    // Create the matching kind of coroutine for the start strategy
    val coroutine = if (start.isLazy)
        LazyStandaloneCoroutine(newContext, block) else
        StandaloneCoroutine(newContext, active = true)
    // Start the coroutine, passing in the start strategy
    coroutine.start(start, coroutine, block)
    return coroutine
}

Next, follow coroutine.start:

public fun <R> start(start: CoroutineStart, receiver: R, block: suspend R.() -> T) {
    start(block, receiver, this)
}

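Here the start parameter, of type CoroutineStart, is called as if it were a function. This works through overloading of the function-call operator (()); the call actually invokes CoroutineStart.invoke on that argument: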

public operator fun <R, T> invoke(block: suspend R.() -> T, receiver: R, completion: Continuation<T>): Unit =
    when (this) {
        DEFAULT -> block.startCoroutineCancellable(receiver, completion)
        ATOMIC -> block.startCoroutine(receiver, completion)
        UNDISPATCHED -> block.startCoroutineUndispatched(receiver, completion)
        LAZY -> Unit // will start lazily
    }
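
The call-an-enum-like-a-function trick is plain Kotlin operator overloading. Here is a minimal sketch with a hypothetical RunMode enum, loosely modelled on CoroutineStart; it is not the library's code and needs only the standard library.

```kotlin
// Hypothetical enum that can be called like a function.
enum class RunMode {
    NOW, NEVER;

    // The invoke operator lets an instance be used with call syntax: mode(block)
    operator fun invoke(block: () -> String): String? = when (this) {
        NOW -> block()   // run the block immediately
        NEVER -> null    // skip the block, much like LAZY does at this point
    }
}

// Calling the enum value directly is shorthand for mode.invoke { ... }
fun runWith(mode: RunMode): String? = mode { "block executed" }

fun main() {
    println(runWith(RunMode.NOW))
    println(runWith(RunMode.NEVER))
}
```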

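At this point the logic already branches on the start strategy. Our code uses the DEFAULT strategy (the default value of the start parameter), so execution enters block.startCoroutineCancellable: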

internal fun <R, T> (suspend (R) -> T).startCoroutineCancellable(
    receiver: R, completion: Continuation<T>,
    onCancellation: ((cause: Throwable) -> Unit)? = null
) =
    runSafely(completion) {
        createCoroutineUnintercepted(receiver, completion).intercepted().resumeCancellableWith(Result.success(Unit), onCancellation)
    }

startCoroutineCancellable is an extension function that the coroutine library provides for suspend lambdas. It in turn calls another extension function, createCoroutineUnintercepted:

@SinceKotlin("1.3")
public actual fun <R, T> (suspend R.() -> T).createCoroutineUnintercepted(
    receiver: R,
    completion: Continuation<T>
): Continuation<Unit> {
    val probeCompletion = probeCoroutineCreated(completion)
    return if (this is BaseContinuationImpl)
        create(receiver, probeCompletion)
    else {
        createCoroutineFromSuspendFunction(probeCompletion) {
            (this as Function2<R, Continuation<T>, Any?>).invoke(receiver, it)
        }
    }
}
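
The same two steps, building a Continuation from a suspend lambda and then resuming it, can be tried with the public standard-library function createCoroutine. This is a minimal sketch using only kotlin.coroutines; runManually is a hypothetical helper, and the empty context means no dispatcher is involved.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.createCoroutine
import kotlin.coroutines.resume

// Hypothetical helper: starts a suspend lambda by hand and returns what it produced.
fun runManually(): String {
    var outcome = "not finished"
    val block: suspend () -> String = { "hello from the lambda" }
    // The completion continuation receives the result when the lambda finishes
    val completion = object : Continuation<String> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<String>) {
            outcome = result.getOrThrow()
        }
    }
    // createCoroutine only builds the continuation; nothing runs yet
    val coroutine: Continuation<Unit> = block.createCoroutine(completion)
    // Resuming the continuation for the first time starts the lambda body
    coroutine.resume(Unit)
    return outcome
}

fun main() {
    println(runManually())
}
```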

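This function ends up creating a Continuation object, and it takes the if branch above. There is a hidden detail here: at run time a suspend lambda is actually an instance of a subclass of SuspendLambda, which can be verified by setting a breakpoint at this point.

SuspendLambda in turn is a subclass of BaseContinuationImpl (through ContinuationImpl):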

@SinceKotlin("1.3")
// Suspension lambdas inherit from this class
internal abstract class SuspendLambda(
    public override val arity: Int,
    completion: Continuation<Any?>?
) : ContinuationImpl(completion), FunctionBase<Any?>, SuspendFunction {
    // ...
}

@SinceKotlin("1.3")
// State machines for named suspend functions extend from this class
internal abstract class ContinuationImpl(
    completion: Continuation<Any?>?,
    private val _context: CoroutineContext?
) : BaseContinuationImpl(completion) {
    // ...
}

The comment on SuspendLambda says the same: suspend lambdas inherit from this class. Moving on, Continuation.intercepted is called next:

@SinceKotlin("1.3")
public actual fun <T> Continuation<T>.intercepted(): Continuation<T> =
    (this as? ContinuationImpl)?.intercepted() ?: this

Step into ContinuationImpl.intercepted:

public fun intercepted(): Continuation<Any?> =
        intercepted
            ?: (context[ContinuationInterceptor]?.interceptContinuation(this) ?: this)
                .also { intercepted = it }

This ends up calling interceptContinuation on the ContinuationInterceptor stored in the coroutine context, and that ContinuationInterceptor is normally the coroutine dispatcher:

// The coroutine dispatcher implements the ContinuationInterceptor interface
public abstract class CoroutineDispatcher :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor

So the call lands in the dispatcher's interceptContinuation:

public final override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        DispatchedContinuation(this, continuation)
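
The interception hook is not specific to dispatchers: any context element with the ContinuationInterceptor key can wrap a continuation. The sketch below uses only the standard library; RecordingInterceptor and interceptedRun are hypothetical names, and the wrapper merely logs before delegating, where a dispatcher would hand the resumption to another thread.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.Continuation
import kotlin.coroutines.ContinuationInterceptor
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical interceptor: wraps every continuation so that each resumption is recorded.
class RecordingInterceptor(private val log: MutableList<String>) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        object : Continuation<T> {
            override val context: CoroutineContext get() = continuation.context
            override fun resumeWith(result: Result<T>) {
                log.add("intercepted resume")
                // A dispatcher would schedule this call on another thread instead
                continuation.resumeWith(result)
            }
        }
}

// Hypothetical helper: runs a suspend lambda whose context contains the interceptor.
fun interceptedRun(): List<String> {
    val log = mutableListOf<String>()
    val block: suspend () -> Unit = { log.add("body") }
    val completion = Continuation<Unit>(RecordingInterceptor(log)) { log.add("completed") }
    block.startCoroutine(completion)
    return log
}

fun main() {
    println(interceptedRun())
}
```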

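The dispatcher's interceptContinuation creates a DispatchedContinuation and passes the dispatcher itself as the first argument, to be used later for the actual dispatch. Finally, once this Continuation has been produced, its resumeCancellableWith extension function is called: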

@InternalCoroutinesApi
public fun <T> Continuation<T>.resumeCancellableWith(
    result: Result<T>,
    onCancellation: ((cause: Throwable) -> Unit)? = null
): Unit = when (this) {
    is DispatchedContinuation -> resumeCancellableWith(result, onCancellation)
    else -> resumeWith(result)
}

Since this is a DispatchedContinuation here, its resumeCancellableWith member is called:

@Suppress("NOTHING_TO_INLINE")
inline fun resumeCancellableWith(
    result: Result<T>,
    noinline onCancellation: ((cause: Throwable) -> Unit)?
) {
    val state = result.toState(onCancellation)
    // Does the dispatcher need to dispatch for this coroutine context?
    if (dispatcher.isDispatchNeeded(context)) {
        _state = state
        resumeMode = MODE_CANCELLABLE
        dispatcher.dispatch(context, this)
    } else {
        executeUnconfined(state, MODE_CANCELLABLE) {
            if (!resumeCancelled(state)) {
                resumeUndispatchedWith(result)
            }
        }
    }
}

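The dispatcher here is the one passed as the first argument when the Continuation was created. The code calls dispatcher.isDispatchNeeded, which returns true by default, so in our case execution reaches dispatcher.dispatch. This dispatcher is the Dispatchers.Default dispatcher we passed in at the very beginning: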

public actual val Default: CoroutineDispatcher = createDefaultDispatcher()

// By default, createDefaultDispatcher returns the DefaultScheduler object
internal actual fun createDefaultDispatcher(): CoroutineDispatcher =
    if (useCoroutinesScheduler) DefaultScheduler else CommonPool
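
The isDispatchNeeded and dispatch pair is also the whole contract for a custom dispatcher. The following is a minimal sketch, assuming kotlinx-coroutines-core on the JVM; SingleThreadDispatcher and runOnSketchDispatcher are hypothetical names, and the executor is never shut down because its thread is a daemon.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import kotlin.coroutines.CoroutineContext

// Hypothetical dispatcher that sends every coroutine to one dedicated daemon thread.
class SingleThreadDispatcher : CoroutineDispatcher() {
    private val executor = Executors.newSingleThreadExecutor { task ->
        Thread(task, "sketch-dispatcher").apply { isDaemon = true }
    }
    val dispatchCount = AtomicInteger()

    // isDispatchNeeded is not overridden, so it keeps returning true and dispatch is always used
    override fun dispatch(context: CoroutineContext, block: Runnable) {
        dispatchCount.incrementAndGet()
        executor.execute(block)
    }
}

// Hypothetical helper: reports the executing thread and how often dispatch was called.
fun runOnSketchDispatcher(): Pair<String, Int> {
    val dispatcher = SingleThreadDispatcher()
    val threadName = runBlocking {
        withContext(dispatcher) { Thread.currentThread().name }
    }
    return threadName to dispatcher.dispatchCount.get()
}

fun main() {
    println(runOnSketchDispatcher())
}
```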

So we step into the dispatch method of DefaultScheduler:

override fun dispatch(context: CoroutineContext, block: Runnable): Unit =
    try {
        coroutineScheduler.dispatch(block)
    } catch (e: RejectedExecutionException) {
        // CoroutineScheduler only rejects execution when it is being closed and this behavior is reserved
        // for testing purposes, so we don't have to worry about cancelling the affected Job here.
        DefaultExecutor.dispatch(context, block)
    }

It calls coroutineScheduler.dispatch, where coroutineScheduler is an instance of CoroutineScheduler:

internal class CoroutineScheduler(
    @JvmField val corePoolSize: Int,
    @JvmField val maxPoolSize: Int,
    @JvmField val idleWorkerKeepAliveNs: Long = IDLE_WORKER_KEEP_ALIVE_NS,
    @JvmField val schedulerName: String = DEFAULT_SCHEDULER_NAME
) : Executor, Closeable {

    fun dispatch(
        block: Runnable,
        taskContext: TaskContext = NonBlockingContext,
        tailDispatch: Boolean = false
    ) {
        trackTask() // this is needed for virtual time support
        // Create a task object
        val task = createTask(block, taskContext)
        // try to submit the task to the local queue and act depending on the result
        val currentWorker = currentWorker()
        // Add the task to the worker's local queue
        val notAdded = currentWorker.submitToLocalQueue(task, tailDispatch)
        if (notAdded != null) {
            if (!addToGlobalQueue(notAdded)) {
                // Global queue is closed in the last step of close/shutdown -- no more tasks should be accepted
                throw RejectedExecutionException("$schedulerName was terminated")
            }
        }
        val skipUnpark = tailDispatch && currentWorker != null
        // Checking 'task' instead of 'notAdded' is completely okay
        if (task.mode == TASK_NON_BLOCKING) {
            if (skipUnpark) return
            // A new task was queued, so try to wake up a thread
            signalCpuWork()
        } else {
            // Increment blocking tasks anyway
            signalBlockingWork(skipUnpark = skipUnpark)
        }
    }

    fun signalCpuWork() {
        if (tryUnpark()) return
        if (tryCreateWorker()) return
        tryUnpark()
    }

    /**
     * The thread that actually executes tasks
     */
    internal inner class Worker private constructor() : Thread() {
        init {
            // Mark it as a daemon thread
            isDaemon = true
        }
        
        override fun run() = runWorker()

        private fun runWorker() {
            var rescanned = false
            while (!isTerminated && state != WorkerState.TERMINATED) {
                // Take a task from the task queues
                val task = findTask(mayHaveLocalTasks)
                // Task found. Execute and repeat
                if (task != null) {
                    rescanned = false
                    minDelayUntilStealableTaskNs = 0L
                    // Execute the task
                    executeTask(task)
                    continue
                } else {
                    mayHaveLocalTasks = false
                }
                /*
                 * No tasks were found:
                 * 1) Either at least one of the workers has stealable task in its FIFO-buffer with a stealing deadline.
                 *    Then its deadline is stored in [minDelayUntilStealableTask]
                 *
                 * Then just park for that duration (ditto re-scanning).
                 * While it could potentially lead to short (up to WORK_STEALING_TIME_RESOLUTION_NS ns) starvations,
                 * excess unparks and managing "one unpark per signalling" invariant become unfeasible, instead we are going to resolve
                 * it with "spinning via scans" mechanism.
                 * NB: this short potential parking does not interfere with `tryUnpark`
                 */
                if (minDelayUntilStealableTaskNs != 0L) {
                    if (!rescanned) {
                        rescanned = true
                    } else {
                        rescanned = false
                        tryReleaseCpu(WorkerState.PARKING)
                        interrupted()
                        LockSupport.parkNanos(minDelayUntilStealableTaskNs)
                        minDelayUntilStealableTaskNs = 0L
                    }
                    continue
                }
                /*
                 * 2) Or no tasks available, time to park and, potentially, shut down the thread.
                 * Add itself to the stack of parked workers, re-scans all the queues
                 * to avoid missing wake-up (requestCpuWorker) and either starts executing discovered tasks or parks itself awaiting for new tasks.
                 */
                tryPark()
            }
            tryReleaseCpu(WorkerState.TERMINATED)
        }

        private fun executeTask(task: Task) {
            val taskMode = task.mode
            idleReset(taskMode)
            beforeTask(taskMode)
            // Execute the task
            runSafely(task)
            afterTask(taskMode)
        }
    }

    fun runSafely(task: Task) {
        try {
            // This runs the wrapped coroutine task
            task.run()
        } catch (e: Throwable) {
            val thread = Thread.currentThread()
            thread.uncaughtExceptionHandler.uncaughtException(thread, e)
        } finally {
            unTrackTask()
        }
    }
}

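It turns out that CoroutineScheduler is essentially another thread-pool implementation. Its main logic wraps the coroutine in a Task, puts it on a task queue, and wakes up a thread to take tasks from the queue and execute them. We will not go deeper here and instead look directly at the run method that actually gets executed: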

internal abstract class DispatchedTask<in T>(
    @JvmField public var resumeMode: Int
) : SchedulerTask() {

    public final override fun run() {
        assert { resumeMode != MODE_UNINITIALIZED } // should have been set before dispatching
        val taskContext = this.taskContext
        var fatalException: Throwable? = null
        try {
            val delegate = delegate as DispatchedContinuation<T>
            val continuation = delegate.continuation
            withContinuationContext(continuation, delegate.countOrElement) {
                val context = continuation.context
                val state = takeState() // NOTE: Must take state in any case, even if cancelled
                val exception = getExceptionalResult(state)
                /*
                 * Check whether continuation was originally resumed with an exception.
                 * If so, it dominates cancellation, otherwise the original exception
                 * will be silently lost.
                 */
                val job =
                    if (exception == null && resumeMode.isCancellableMode) context[Job] else null
                // Resume the continuation according to the state
                if (job != null && !job.isActive) {
                    val cause = job.getCancellationException()
                    cancelCompletedResult(state, cause)
                    continuation.resumeWithStackTrace(cause)
                } else {
                    if (exception != null) {
                        continuation.resumeWithException(exception)
                    } else {
                        continuation.resume(getSuccessfulResult(state))
                    }
                }
            }
        } catch (e: Throwable) {
            // This instead of runCatching to have nicer stacktrace and debug experience
            fatalException = e
        } finally {
            val result = runCatching { taskContext.afterTask() }
            handleFatalException(fatalException, result.exceptionOrNull())
        }
    }
}

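Execution ends up in continuation.resume. resume is an extension function that wraps the value in a Result and calls resumeWith, the method declared on the Continuation interface, so we go straight to the implementation of resumeWith: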

internal abstract class BaseContinuationImpl(
    // This is `public val` so that it is private on JVM and cannot be modified by untrusted code, yet
    // it has a public getter (since even untrusted code is allowed to inspect its call stack).
    public val completion: Continuation<Any?>?
) : Continuation<Any?>, CoroutineStackFrame, Serializable {
    // This implementation is final. This fact is used to unroll resumeWith recursion.
    public final override fun resumeWith(result: Result<Any?>) {
        // This loop unrolls recursion in current.resumeWith(param) to make saner and shorter stack traces on resume
        var current = this
        var param = result
        while (true) {
            // Invoke "resume" debug probe on every resumed continuation, so that a debugging library infrastructure
            // can precisely track what part of suspended callstack was already resumed
            probeCoroutineResumed(current)
            with(current) {
                val completion =
                    completion!! // fail fast when trying to resume continuation without completion
                val outcome: Result<Any?> =
                    try {
                        // This is the entry point that actually invokes the lambda body!
                        val outcome = invokeSuspend(param)
                        if (outcome === COROUTINE_SUSPENDED) return
                        Result.success(outcome)
                    } catch (exception: Throwable) {
                        Result.failure(exception)
                    }
                releaseIntercepted() // this state machine instance is terminating
                if (completion is BaseContinuationImpl) {
                    // unrolling recursion via loop
                    current = completion
                    param = outcome
                } else {
                    // top-level completion reached -- invoke and return
                    completion.resumeWith(outcome)
                    return
                }
            }
        }
    }

    /**
     * This abstract method holds the actual body of the coroutine lambda
     */
    protected abstract fun invokeSuspend(result: Result<Any?>): Any?
}

So we are back in the resumeWith implementation of BaseContinuationImpl. As mentioned above, a suspend lambda is a subclass of SuspendLambda, and SuspendLambda is a subclass of BaseContinuationImpl.

resumeWith calls invokeSuspend, which is the entry point of our coroutine body. Why? Let's decompile our coroutine code:

fun main(args: Array<String>) {
    CoroutineScope(Dispatchers.Default).launch {
        println("${Thread.currentThread().name}: coroutine test 1")
    }
}

After decompiling:

public final class CoroutineTestKt {
    public static final void main(@NotNull String[] args) {
        Intrinsics.checkNotNullParameter(args, "args");
        BuildersKt.launch$default(CoroutineScopeKt.CoroutineScope((CoroutineContext)Dispatchers.getDefault()), (CoroutineContext)null, (CoroutineStart)null, (Function2)(new Function2((Continuation)null) {
            int label;

            @Nullable
            public final Object invokeSuspend(@NotNull Object var1) {
                Object var3 = IntrinsicsKt.getCOROUTINE_SUSPENDED();
                switch (this.label) {
                    case 0:
                        ResultKt.throwOnFailure(var1);
                        // This is the business code we wrote
                        StringBuilder var10000 = new StringBuilder();
                        Thread var10001 = Thread.currentThread();
                        Intrinsics.checkNotNullExpressionValue(var10001, "Thread.currentThread()");
                        String var2 = var10000.append(var10001.getName()).append(": coroutine test 1").toString();
                        System.out.println(var2);
                        return Unit.INSTANCE;
                    default:
                        throw new IllegalStateException("call to 'resume' before 'invoke' with coroutine");
                }
            }

            @NotNull
            public final Continuation create(@Nullable Object value, @NotNull Continuation completion) {
                Intrinsics.checkNotNullParameter(completion, "completion");
                Function2 var3 = new <anonymous constructor>(completion);
                return var3;
            }

            public final Object invoke(Object var1, Object var2) {
                return ((<undefinedtype>)this.create(var1, (Continuation)var2)).invokeSuspend(Unit.INSTANCE);
            }
        }), 3, (Object)null);
    }
}
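
The decompiled lambda above has a single case because its body contains no suspension point. As a rough, hand-written sketch (not actual compiler output), a body with one suspension point can be modelled with a label that remembers where to continue. HandWrittenStateMachine and runStateMachine are hypothetical names; only the standard library is used.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED
import kotlin.coroutines.resume

// Hypothetical hand-written state machine for a body of the form:
//   log("step 1"); <suspension point>; log("step 2")
class HandWrittenStateMachine(private val log: MutableList<String>) : Continuation<Unit> {
    override val context: CoroutineContext = EmptyCoroutineContext
    private var label = 0

    // Plays the role of invokeSuspend: runs one segment and reports whether it suspended
    private fun invokeSuspend(): Any? = when (label) {
        0 -> {
            log.add("step 1")
            label = 1               // remember where to continue
            COROUTINE_SUSPENDED     // give the thread back to the caller
        }
        1 -> {
            log.add("step 2")
            label = 2
            Unit                    // the body has finished
        }
        else -> error("already completed")
    }

    // Every resumption re-enters the state machine at the saved label
    override fun resumeWith(result: Result<Unit>) {
        result.getOrThrow()
        if (invokeSuspend() !== COROUTINE_SUSPENDED) log.add("completed")
    }
}

fun runStateMachine(): List<String> {
    val log = mutableListOf<String>()
    val machine = HandWrittenStateMachine(log)
    machine.resume(Unit)                     // first entry: runs up to the suspension point
    log.add("caller continues meanwhile")    // the thread was not blocked
    machine.resume(Unit)                     // resumption: continues after the suspension point
    return log
}

fun main() {
    println(runStateMachine())
}
```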

As we can see, the Kotlin compiler wraps the code of our lambda in the invokeSuspend method of the generated object, and invokeSuspend is exactly the abstract method declared in BaseContinuationImpl:

@SinceKotlin("1.3")
internal abstract class BaseContinuationImpl(
    // This is `public val` so that it is private on JVM and cannot be modified by untrusted code, yet
    // it has a public getter (since even untrusted code is allowed to inspect its call stack).
    public val completion: Continuation<Any?>?
) : Continuation<Any?>, CoroutineStackFrame, Serializable {

    // ...
    
    protected abstract fun invokeSuspend(result: Result<Any?>): Any?
}

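In the end this method is called by a thread of the target thread pool. This completes the overall walk through a coroutine's creation, dispatch, scheduling, and execution.

The flow above shows that the dispatcher is mainly responsible for placing the coroutine on the right executor (thread pool), while the exact moment of execution is decided jointly by the coroutine's start strategy and the executor's implementation. Developers can also define a custom CoroutineDispatcher to control how coroutines are dispatched.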

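Suspension and Resumption

As mentioned at the beginning, coroutines support suspension and resumption: suspending does not block the thread running the coroutine, and everything happens in user space without extra system calls. How does Kotlin achieve this? Let's start with an example of a suspending coroutine: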

fun main(args: Array<String>) {
    CoroutineScope(Dispatchers.Default).launch {
        println("${Thread.currentThread().id}: coroutine1 log 1")
        // Use the delay suspend function to suspend the current coroutine for 200 ms
        delay(200)
        println("${Thread.currentThread().id}: coroutine1 log 2")
    }
    Thread.sleep(50)
    CoroutineScope(Dispatchers.Default).launch {
        println("${Thread.currentThread().id}: coroutine2 log 1")
    }
    // The threads in the default coroutine pool are daemon threads, so the main thread must sleep here to let the coroutines finish
    Thread.sleep(500)
}

Output:

31: coroutine1 log 1
31: coroutine2 log 1
31: coroutine1 log 2

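This code creates two coroutines. The first prints one log line and then suspends itself for 200 ms with delay. Meanwhile the main thread sleeps for 50 ms and then creates the second coroutine, which prints another line. Finally, the first coroutine resumes when its delay ends and prints one more line.

The output shows both coroutines running on the same thread (same thread id). This is preliminary evidence that thread 31 was not blocked by delay after coroutine 1 suspended: once coroutine 2 was created and dispatched, the thread ran coroutine 2, and it went back to coroutine 1 when the suspension ended.

A function that can suspend a coroutine without blocking the executing thread is called a suspend function, and a place where a suspend function is called is a suspension point.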

Suspend Functions

A suspend function is declared like an ordinary function, with the suspend keyword added in front of the declaration:

suspend fun suspendFun() {
}

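However, adding the suspend keyword to a declaration is not enough to make a useful suspend function. Whether the function really suspends the coroutine without blocking the calling thread depends on its implementation. Consider this example: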

suspend fun suspendFun() {
    // This suspend function simulates a slow operation by putting the thread to sleep for 200 ms
    Thread.sleep(200)
}

fun main(args: Array<String>) {
    CoroutineScope(Dispatchers.Default).launch {
        println("${Thread.currentThread().id}: coroutine1 log 1")
        suspendFun()
        println("${Thread.currentThread().id}: coroutine1 log 2")
    }
    Thread.sleep(50)
    CoroutineScope(Dispatchers.Default).launch {
        println("${Thread.currentThread().id}: coroutine2 log 1")
    }
    // The threads in the default coroutine pool are daemon threads, so the main thread must sleep here to let the coroutines finish
    Thread.sleep(500)
}

The code is the same except that the delay call is replaced by our own suspend function suspendFun. The result:

30: coroutine1 log 1
31: coroutine2 log 1
30: coroutine1 log 2

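Thread 30 was blocked inside suspendFun, so when coroutine 2 was dispatched the thread pool had to create a new thread to run it. In other words, suspendFun is not a proper suspend function: it blocks the caller's thread.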
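
One common way to keep such a blocking call from stalling the caller is to move it to another dispatcher with withContext. This is a minimal sketch, assuming kotlinx-coroutines-core; offloadedSuspendFun and offloadDemo are hypothetical names. Note that the sleeping call still occupies an IO thread, so this only frees the caller's thread and is not true non-blocking suspension.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical variant of suspendFun: the blocking call runs on an IO thread,
// so the calling coroutine suspends and its thread is free in the meantime.
suspend fun offloadedSuspendFun() = withContext(Dispatchers.IO) {
    Thread.sleep(200)
}

fun offloadDemo(): List<String> {
    val events = mutableListOf<String>()
    // runBlocking runs both child coroutines on the single calling thread
    runBlocking {
        launch {
            events.add("coroutine1 log 1")
            offloadedSuspendFun()
            events.add("coroutine1 log 2")
        }
        launch { events.add("coroutine2 log 1") }
    }
    return events
}

fun main() {
    println(offloadDemo())
}
```

Because coroutine 2 runs while coroutine 1 is waiting, the single thread of runBlocking was evidently not blocked by the sleep.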

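How Suspension Works

So how do we write a suspend function that does not block the calling thread? Let's look at how the library's own delay suspend function is implemented: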

public suspend fun delay(timeMillis: Long) {
    if (timeMillis <= 0) return // don't delay
    return suspendCancellableCoroutine sc@ { cont: CancellableContinuation<Unit> ->
        // if timeMillis == Long.MAX_VALUE then just wait forever like awaitCancellation, don't schedule.
        if (timeMillis < Long.MAX_VALUE) {
            cont.context.delay.scheduleResumeAfterDelay(timeMillis, cont)
        }
    }
}
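
The same pattern works for user-written suspend functions: capture the continuation with suspendCancellableCoroutine, hand it to something that will call resume later, and return. Below is a minimal sketch, assuming kotlinx-coroutines-core on the JVM; sketchTimer, pause, and pauseDemo are hypothetical names, and a ScheduledExecutorService stands in for the library's own timer.

```kotlin
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.suspendCancellableCoroutine
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.coroutines.resume

// Hypothetical timer thread that resumes waiting coroutines when their time is up.
val sketchTimer = Executors.newSingleThreadScheduledExecutor { task ->
    Thread(task, "sketch-timer").apply { isDaemon = true }
}

// Hypothetical suspend function: suspends the coroutine and lets the timer resume it later.
suspend fun pause(millis: Long): Unit = suspendCancellableCoroutine { cont ->
    val future = sketchTimer.schedule(Runnable { cont.resume(Unit) }, millis, TimeUnit.MILLISECONDS)
    // If the coroutine is cancelled while waiting, drop the pending timer task
    cont.invokeOnCancellation { future.cancel(false) }
}

fun pauseDemo(): List<String> {
    val events = mutableListOf<String>()
    // Both coroutines share the single thread of runBlocking
    runBlocking {
        launch {
            events.add("coroutine1 log 1")
            pause(100)
            events.add("coroutine1 log 2")
        }
        launch { events.add("coroutine2 log 1") }
    }
    return events
}

fun main() {
    println(pauseDemo())
}
```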

It first calls suspendCancellableCoroutine to obtain a cancellable Continuation (CancellableContinuation); the delay itself is implemented by Delay.scheduleResumeAfterDelay. Next, see how context.delay is obtained:

internal val CoroutineContext.delay: Delay get() = get(ContinuationInterceptor) as? Delay ?: DefaultDelay

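The Delay object is first looked up as the context element whose key is ContinuationInterceptor. As noted above, that element is normally the coroutine dispatcher, and the default coroutine dispatcher does not implement the Delay interface: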

public abstract class CoroutineDispatcher :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor

So the Delay object obtained here is DefaultDelay:

internal actual val DefaultDelay: Delay = DefaultExecutor

That is DefaultExecutor. Here is the implementation that DefaultExecutor uses:

public override fun scheduleResumeAfterDelay(timeMillis: Long, continuation: CancellableContinuation<Unit>) {
    val timeNanos = delayToNanos(timeMillis)
    if (timeNanos < MAX_DELAY_NS) {
        val now = nanoTime()
        DelayedResumeTask(now + timeNanos, continuation).also { task ->
            continuation.disposeOnCancellation(task)
            schedule(now, task)
        }
    }
}

public fun schedule(now: Long, delayedTask: DelayedTask) {
    when (scheduleImpl(now, delayedTask)) {
        SCHEDULE_OK -> if (shouldUnpark(delayedTask)) unpark()
        SCHEDULE_COMPLETED -> reschedule(now, delayedTask)
        SCHEDULE_DISPOSED -> {} // do nothing -- task was already disposed
        else -> error("unexpected result")
    }
}

private fun scheduleImpl(now: Long, delayedTask: DelayedTask): Int {
    if (isCompleted) return SCHEDULE_COMPLETED
    val delayedQueue = _delayed.value ?: run {
        _delayed.compareAndSet(null, DelayedTaskQueue(now))
        _delayed.value!!
    }
    return delayedTask.scheduleTask(now, delayedQueue, this)
}

After a few detours, scheduleImpl ends up calling DelayedTask.scheduleTask to do the work:

@Synchronized
fun scheduleTask(now: Long, delayed: DelayedTaskQueue, eventLoop: EventLoopImplBase): Int {
    if (_heap === DISPOSED_TASK) return SCHEDULE_DISPOSED // don't add -- was already disposed
    // Add the task to the queue
    delayed.addLastIf(this) { firstTask ->
        if (eventLoop.isCompleted) return SCHEDULE_COMPLETED // non-local return from scheduleTask
        /**
         * We are about to add new task and we have to make sure that [DelayedTaskQueue]
         * invariant is maintained. The code in this lambda is additionally executed under
         * the lock of [DelayedTaskQueue] and working with [DelayedTaskQueue.timeNow] here is thread-safe.
         */
        if (firstTask == null) {
            /**
             * When adding the first delayed task we simply update queue's [DelayedTaskQueue.timeNow] to
             * the current now time even if that means "going backwards in time". This makes the structure
             * self-correcting in spite of wild jumps in `nanoTime()` measurements once all delayed tasks
             * are removed from the delayed queue for execution.
             */
            delayed.timeNow = now
        } else {
            /**
             * Carefully update [DelayedTaskQueue.timeNow] so that it does not sweep past first's tasks time
             * and only goes forward in time. We cannot let it go backwards in time or invariant can be
             * violated for tasks that were already scheduled.
             */
            val firstTime = firstTask.nanoTime
            // compute min(now, firstTime) using a wrap-safe check
            val minTime = if (firstTime - now >= 0) now else firstTime
            // update timeNow only when going forward in time
            if (minTime - delayed.timeNow > 0) delayed.timeNow = minTime
        }
        /**
         * Here [DelayedTaskQueue.timeNow] was already modified and we have to double-check that newly added
         * task does not violate [DelayedTaskQueue] invariant because of that. Note also that this scheduleTask
         * function can be called to reschedule from one queue to another and this might be another reason
         * where new task's time might now violate invariant.
         * We correct invariant violation (if any) by simply changing this task's time to now.
         */
        if (nanoTime - delayed.timeNow < 0) nanoTime = delayed.timeNow
        true
    }
    return SCHEDULE_OK
}

The work is finally done by DelayedTaskQueue.addLastIf. Here is the relevant part of the DelayedTaskQueue implementation:

// @Synchronized // NOTE! NOTE! NOTE! inline fun cannot be @Synchronized
// Condition also receives current first node in the heap
public inline fun addLastIf(node: T, cond: (T?) -> Boolean): Boolean = synchronized(this) {
    if (cond(firstImpl())) {
        addImpl(node)
        true
    } else {
        false
    }
}

@PublishedApi
internal fun addImpl(node: T) {
    assert { node.heap == null }
    node.heap = this
    val a = realloc()
    val i = size++
    a[i] = node
    node.index = i
    siftUpFrom(i)
}

private tailrec fun siftUpFrom(i: Int) {
    if (i <= 0) return
    val a = a!!
    val j = (i - 1) / 2
    // Compare the current node with its parent; if the current node is smaller than the parent, swap them below
    if (a[j]!! <= a[i]!!) return
    swap(i, j)
    siftUpFrom(j)
}

Internally, a min-heap orders the added tasks. Look at the compareTo method of DelayedTask:

override fun compareTo(other: DelayedTask): Int {
    val dTime = nanoTime - other.nanoTime
    return when {
        dTime > 0 -> 1
        dTime < 0 -> -1
        else -> 0
    }
}
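
The effect of this ordering can be reproduced with java.util.PriorityQueue, which is also a binary min-heap. The sketch below is not the library's ThreadSafeHeap; TimedTask and drainInOrder are hypothetical names that reuse the same comparison logic.

```kotlin
import java.util.PriorityQueue

// Hypothetical stand-in for DelayedTask: ordered by the time at which it is due.
class TimedTask(val name: String, val nanoTime: Long) : Comparable<TimedTask> {
    override fun compareTo(other: TimedTask): Int {
        val dTime = nanoTime - other.nanoTime
        return when {
            dTime > 0 -> 1
            dTime < 0 -> -1
            else -> 0
        }
    }
}

// Hypothetical helper: inserts tasks in arbitrary order and removes them earliest-first.
fun drainInOrder(): List<String> {
    val heap = PriorityQueue<TimedTask>()
    heap.add(TimedTask("resume in 300 ms", 300_000_000L))
    heap.add(TimedTask("resume in 100 ms", 100_000_000L))
    heap.add(TimedTask("resume in 200 ms", 200_000_000L))
    val order = mutableListOf<String>()
    while (true) {
        // poll removes the smallest element, or returns null when the heap is empty
        val next = heap.poll() ?: break
        order.add(next.name)
    }
    return order
}

fun main() {
    println(drainInOrder())
}
```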

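Comparing the timestamps at which two tasks are due to resume ensures that the task at the top of the heap is the one due soonest.

Where tasks are enqueued they must also be dequeued, so let's look at the dequeue logic: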

internal actual object DefaultExecutor : EventLoopImplBase(), Runnable {
    
    override val thread: Thread
        get() = _thread ?: createThreadSync()

    @Synchronized
    private fun createThreadSync(): Thread {
        return _thread ?: Thread(this, THREAD_NAME).apply {
            _thread = this
            isDaemon = true
            start()
        }
    }

    override fun run() {
        ThreadLocalEventLoop.setEventLoop(this)
        registerTimeLoopThread()
        try {
            var shutdownNanos = Long.MAX_VALUE
            if (!notifyStartup()) return
            while (true) {
                Thread.interrupted() // just reset interruption flag
                // Here the event at the head of the queue is taken and processed
                var parkNanos = processNextEvent()
                if (parkNanos == Long.MAX_VALUE) {
                    // nothing to do, initialize shutdown timeout
                    val now = nanoTime()
                    if (shutdownNanos == Long.MAX_VALUE) shutdownNanos = now + KEEP_ALIVE_NANOS
                    val tillShutdown = shutdownNanos - now
                    if (tillShutdown <= 0) return // shut thread down
                    parkNanos = parkNanos.coerceAtMost(tillShutdown)
                } else
                    shutdownNanos = Long.MAX_VALUE
                if (parkNanos > 0) {
                    // check if shutdown was requested and bail out in this case
                    if (isShutdownRequested) return
                    parkNanos(this, parkNanos)
                }
            }
        } finally {
            _thread = null // this thread is dead
            acknowledgeShutdownIfNeeded()
            unregisterTimeLoopThread()
            // recheck if queues are empty after _thread reference was set to null (!!!)
            if (!isEmpty) thread // recreate thread if it is needed
        }
    }

    override fun processNextEvent(): Long {
        // unconfined events take priority
        if (processUnconfinedEvent()) return 0
        // queue all delayed tasks that are due to be executed
        val delayed = _delayed.value
        if (delayed != null && !delayed.isEmpty) {
            val now = nanoTime()
            while (true) {
                // make sure that moving from delayed to queue removes from delayed only after it is added to queue
                // to make sure that 'isEmpty' and `nextTime` that check both of them
                // do not transiently report that both delayed and queue are empty during move
                delayed.removeFirstIf {
                    // Check against the current timestamp whether the head task is due; if so, move it to the other queue
                    if (it.timeToExecute(now)) {
                        enqueueImpl(it)
                    } else
                        false
                } ?: break // quit loop when nothing more to remove or enqueueImpl returns false on "isComplete"
            }
        }
        // then process one event from queue
        // Take the head task of that ready queue and run it
        val task = dequeue()
        if (task != null) {
            task.run()
            return 0
        }
        return nextTime
    }
}
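
Stripped of shutdown handling and unconfined events, the loop above comes down to a time-ordered heap plus a ready queue. The following is a minimal sketch of that idea, not the library's code: MiniEventLoop, DelayedTask and runEventLoopDemo are hypothetical names, and a virtual clock stands in for nanoTime() and thread parking.

```kotlin
import java.util.PriorityQueue

// Hypothetical miniature of a delayed-task queue; not kotlinx.coroutines API.
class DelayedTask(val dueNanos: Long, val action: () -> Unit)

class MiniEventLoop {
    // Min-heap ordered by due time, so the earliest task is always at the head.
    private val delayed = PriorityQueue<DelayedTask>(compareBy<DelayedTask> { it.dueNanos })
    private val ready = ArrayDeque<DelayedTask>()

    fun schedule(dueNanos: Long, action: () -> Unit) {
        delayed.add(DelayedTask(dueNanos, action))
    }

    /**
     * Runs at most one task. Returns 0 if a task ran, otherwise the time until
     * the next task is due (Long.MAX_VALUE when nothing is scheduled).
     */
    fun processNextEvent(now: Long): Long {
        // Move every task whose time has come from the heap to the ready queue.
        while (true) {
            val head = delayed.peek() ?: break
            if (head.dueNanos > now) break
            ready.addLast(delayed.poll())
        }
        val task = ready.removeFirstOrNull()
        if (task != null) {
            task.action()
            return 0
        }
        val next = delayed.peek() ?: return Long.MAX_VALUE
        return next.dueNanos - now
    }
}

fun runEventLoopDemo(): List<String> {
    val log = mutableListOf<String>()
    val loop = MiniEventLoop()
    loop.schedule(200) { log += "B" }
    loop.schedule(100) { log += "A" }
    var now = 0L // virtual clock instead of nanoTime()
    while (true) {
        val park = loop.processNextEvent(now)
        if (park == Long.MAX_VALUE) break
        now += park // pretend the thread parked for this long
    }
    return log
}

fun main() {
    println(runEventLoopDemo()) // [A, B]
}
```

Tasks run in due-time order regardless of the order in which they were scheduled, which is the property the heap provides.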

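DefaultExecutor runs a single thread that keeps polling the task at the top of the heap. Once that task's timestamp is due (the delay has elapsed), the task is run (task.run). Let's follow what running the task does: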

private inner class DelayedResumeTask(
    nanoTime: Long,
    private val cont: CancellableContinuation<Unit>
) : DelayedTask(nanoTime) {
    override fun run() { with(cont) { resumeUndispatched(Unit) } }
    override fun toString(): String = super.toString() + cont.toString()
}

resumeUndispatched is declared on the CancellableContinuation interface and implemented in CancellableContinuationImpl:

override fun CoroutineDispatcher.resumeUndispatched(value: T) {
    val dc = delegate as? DispatchedContinuation
    resumeImpl(value, if (dc?.dispatcher === this) MODE_UNDISPATCHED else resumeMode)
}

This eventually calls the dispatchResume method:

private fun dispatchResume(mode: Int) {
    if (tryResume()) return // completed before getResult invocation -- bail out
    // otherwise, getResult has already commenced, i.e. completed later or in other thread
    dispatch(mode)
}
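
The tryResume() check arbitrates a race: the result may arrive before the caller has actually suspended, in which case nothing needs to be dispatched. A minimal sketch of such an arbitration, assuming a single atomic three-state flag (the Decision class and its constants are hypothetical simplifications, and the real CancellableContinuationImpl is more involved):

```kotlin
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical miniature of the suspend/resume race; names are illustrative.
private const val UNDECIDED = 0
private const val SUSPENDED = 1
private const val RESUMED = 2

class Decision {
    private val state = AtomicInteger(UNDECIDED)

    /** Called by the suspending side; true means "really suspend and wait". */
    fun trySuspend(): Boolean = state.compareAndSet(UNDECIDED, SUSPENDED)

    /** Called by the resuming side; true means the result came first, so no dispatch is needed. */
    fun tryResume(): Boolean = state.compareAndSet(UNDECIDED, RESUMED)
}

fun resumeFirst(): String {
    val decision = Decision()
    val resumedEarly = decision.tryResume() // result arrives before the caller suspends
    val suspended = decision.trySuspend()   // caller finds the result ready and does not suspend
    return "resumedEarly=$resumedEarly suspended=$suspended"
}

fun suspendFirst(): String {
    val decision = Decision()
    val suspended = decision.trySuspend()   // caller suspends first
    val resumedEarly = decision.tryResume() // resumer must now dispatch the continuation
    return "resumedEarly=$resumedEarly suspended=$suspended"
}

fun main() {
    println(resumeFirst())  // resumedEarly=true suspended=false
    println(suspendFirst()) // resumedEarly=false suspended=true
}
```

Only one of the two compare-and-set calls can win, so exactly one side takes responsibility for continuing the coroutine.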

The task is then dispatched through the DispatchedTask.dispatch method:

internal fun <T> DispatchedTask<T>.dispatch(mode: Int) {
    assert { mode != MODE_UNINITIALIZED } // invalid mode value for this method
    val delegate = this.delegate
    val undispatched = mode == MODE_UNDISPATCHED
    if (!undispatched && delegate is DispatchedContinuation<*> && mode.isCancellableMode == resumeMode.isCancellableMode) {
        // dispatch directly using this instance's Runnable implementation
        val dispatcher = delegate.dispatcher
        val context = delegate.context
        if (dispatcher.isDispatchNeeded(context)) {
            // Dispatch the coroutine to the thread pool through the dispatcher
            dispatcher.dispatch(context, this)
        } else {
            resumeUnconfined()
        }
    } else {
        // delegate is coming from 3rd-party interceptor implementation (and does not support cancellation)
        // or undispatched mode was requested
        resume(delegate, undispatched)
    }
}

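Once again the task is dispatched through dispatcher.dispatch. Once it is in the thread pool, the rest of the flow is the same as the coroutine's first dispatch described earlier: the pool takes the task from its task list and runs it, execution reaches BaseContinuationImpl.invokeSuspend, and the coroutine continues.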
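
The branch on isDispatchNeeded can be illustrated in isolation. This is a sketch under stated assumptions: MiniDispatcher, resumeVia and runDispatchDemo are hypothetical names that only mimic the shape of CoroutineDispatcher, and a single-thread executor stands in for the coroutine thread pool.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Hypothetical miniature of a dispatcher; not the kotlinx.coroutines interface.
interface MiniDispatcher {
    fun isDispatchNeeded(): Boolean
    fun dispatch(block: Runnable)
}

// Mirrors the decision in dispatch(): hand the block to the dispatcher, or run it in place.
fun resumeVia(dispatcher: MiniDispatcher, block: Runnable) {
    if (dispatcher.isDispatchNeeded()) dispatcher.dispatch(block) else block.run()
}

fun runDispatchDemo(): Pair<String, String> {
    val executor = Executors.newSingleThreadExecutor { r -> Thread(r, "mini-worker") }
    val pooled = object : MiniDispatcher {
        override fun isDispatchNeeded() = true
        override fun dispatch(block: Runnable) { executor.execute(block) }
    }
    val unconfined = object : MiniDispatcher {
        override fun isDispatchNeeded() = false
        override fun dispatch(block: Runnable) { error("never called") }
    }
    val viaPool = CompletableFuture<String>()
    val inPlace = CompletableFuture<String>()
    resumeVia(pooled, Runnable { viaPool.complete(Thread.currentThread().name) })
    resumeVia(unconfined, Runnable { inPlace.complete(Thread.currentThread().name) })
    try {
        return viaPool.get(1, TimeUnit.SECONDS) to inPlace.get(1, TimeUnit.SECONDS)
    } finally {
        executor.shutdown()
    }
}

fun main() {
    val (pooled, inPlace) = runDispatchDemo()
    println("dispatched block ran on: $pooled")   // mini-worker
    println("undispatched block ran on: $inPlace") // the calling thread
}
```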

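One question remains: the entry point after resumption is also invokeSuspend, so how does a coroutine make sure that execution resumes at the previous suspension point? Decompiling the coroutine code again answers this: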

public final class CoroutineSuspendKt {
    public static final void main(@NotNull String[] args) {
        Intrinsics.checkNotNullParameter(args, "args");
        BuildersKt.launch$default(CoroutineScopeKt.CoroutineScope((CoroutineContext) Dispatchers.getDefault()), (CoroutineContext) null, (CoroutineStart) null, (Function2) (new Function2((Continuation) null) {
            // Note the label field here!
            int label;

            @Nullable
            public final Object invokeSuspend(@NotNull Object $result) {
                Object var3 = IntrinsicsKt.getCOROUTINE_SUSPENDED();
                StringBuilder var10000;
                Thread var10001;
                String var2;
                switch (this.label) {
                    // label is 0 initially
                    case 0:
                        ResultKt.throwOnFailure($result);
                        var10000 = new StringBuilder();
                        var10001 = Thread.currentThread();
                        Intrinsics.checkNotNullExpressionValue(var10001, "Thread.currentThread()");
                        var2 = var10000.append(var10001.getId()).append(": coroutine1 log 1").toString();
                        System.out.println(var2);
                        // The first log line has been printed; before calling delay to suspend the coroutine, set label to 1
                        this.label = 1;
                        // If the coroutine actually suspended, return immediately
                        if (DelayKt.delay(200L, this) == var3) {
                            return var3;
                        }
                        break;
                    // Resumed after suspension: label is now 1. After checking for failure (cancellation, exception, etc.), break out of the switch
                    case 1:
                        ResultKt.throwOnFailure($result);
                        break;
                    default:
                        throw new IllegalStateException("call to 'resume' before 'invoke' with coroutine");
                }

                // Code that runs after resumption: print the second log line
                var10000 = new StringBuilder();
                var10001 = Thread.currentThread();
                Intrinsics.checkNotNullExpressionValue(var10001, "Thread.currentThread()");
                var2 = var10000.append(var10001.getId()).append(": coroutine1 log 2").toString();
                System.out.println(var2);
                return Unit.INSTANCE;
            }

            @NotNull
            public final Continuation create(@Nullable Object value, @NotNull Continuation completion) {
                Intrinsics.checkNotNullParameter(completion, "completion");
                Function2 var3 = new <anonymous constructor>(completion);
                return var3;
            }

            public final Object invoke(Object var1, Object var2) {
                return ((<undefinedtype>) this.create(var1, (Continuation) var2)).invokeSuspend(Unit.INSTANCE);
            }
        }), 3, (Object) null);
        
        // ....
    }
}

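To save space, only the decompiled code of the first coroutine is shown above. The key point is that the compiler adds a label field to the coroutine. Before each call to a suspend function, label is incremented, which slices the code into segments before and after each suspension point. This is how every call to invokeSuspend starts executing at exactly the right place.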

The idea can be condensed into pseudocode:

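var label = 0
fun invokeSuspend() {
    when (label) {
        0 -> {
            // code segment 1 ...
            label = 1
            delay(200) // suspension point: invokeSuspend returns here
        }
        1 -> {
            // code segment 2 ...
            label = 2
            delay(200) // suspension point: invokeSuspend returns here
        }
        else -> {
            // code segment 3 ...
        }
    }
}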

In essence, the coroutine maintains an internal state machine and uses the label field to restore its position across repeated calls.
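
To make the state machine concrete, here is a hand-written version that actually runs. It is only a sketch: ManualStateMachine and runStateMachine are hypothetical names, and returning true stands in for returning COROUTINE_SUSPENDED.

```kotlin
// Hypothetical hand-written counterpart of the compiler-generated state machine.
class ManualStateMachine(private val log: MutableList<String>) {
    private var label = 0

    /** Returns true if the body "suspended" and expects another call, false when it has finished. */
    fun invokeSuspend(): Boolean = when (label) {
        0 -> {
            log += "segment 1"
            label = 1 // remember where to continue
            true      // "suspend": hand the thread back to the caller
        }
        1 -> {
            log += "segment 2"
            label = 2
            true
        }
        2 -> {
            log += "segment 3"
            label = 3
            false     // finished
        }
        else -> error("already completed")
    }
}

fun runStateMachine(): List<String> {
    val log = mutableListOf<String>()
    val machine = ManualStateMachine(log)
    // Each iteration plays the role of one resumption by the dispatcher.
    while (machine.invokeSuspend()) {
        log += "-- resumed by caller --"
    }
    return log
}

fun main() {
    runStateMachine().forEach(::println)
}
```

Every call enters the same method, yet each call runs a different segment, because label records how far the previous call got.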

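With this analysis of delay, we can see why it suspends a coroutine without blocking the calling thread: at the suspension point it puts a task into the event queue and ends the current invocation. When the delay elapses, the event loop takes the task out and runs the coroutine dispatch logic, which sends the coroutine back to its executor (thread pool) to continue.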
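
The same mechanism can be reproduced with nothing but the Kotlin standard library. The sketch below is not how kotlinx.coroutines implements delay: miniDelay and runMiniDelay are hypothetical names, a ScheduledExecutorService stands in for DefaultExecutor, and no dispatcher is installed, so the coroutine simply continues on the timer thread.

```kotlin
import java.util.Collections
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Hypothetical timer thread standing in for DefaultExecutor.
private val timer = Executors.newSingleThreadScheduledExecutor { r ->
    Thread(r, "mini-timer").apply { isDaemon = true }
}

// Hypothetical stand-in for delay(): queue the resumption, then suspend.
suspend fun miniDelay(millis: Long): Unit = suspendCoroutine { cont ->
    timer.schedule(Runnable { cont.resume(Unit) }, millis, TimeUnit.MILLISECONDS)
}

fun runMiniDelay(): List<String> {
    val events: MutableList<String> = Collections.synchronizedList(mutableListOf<String>())
    val done = CountDownLatch(1)
    val body: suspend () -> Unit = {
        events += "before on ${Thread.currentThread().name}"
        miniDelay(50)
        events += "after on ${Thread.currentThread().name}"
    }
    // Runs the body on the calling thread up to the first suspension point, then returns.
    body.startCoroutine(Continuation(EmptyCoroutineContext) { done.countDown() })
    // The calling thread was not blocked by miniDelay; it only waits here for the demo's sake.
    done.await(2, TimeUnit.SECONDS)
    return events.toList()
}

fun main() {
    runMiniDelay().forEach(::println)
}
```

The first event is recorded on the calling thread and the second on mini-timer, which shows that the suspended body was parked as a task and picked up again later rather than holding a thread while it waited.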

Custom suspend functions

We now rework our suspendFun suspend function along the lines of delay:

import kotlin.concurrent.thread
import kotlin.coroutines.resume
import kotlinx.coroutines.*

suspend fun suspendFun(): String {
    return suspendCancellableCoroutine<String> { con ->
        // Run the slow operation on a new thread. It could instead go into another coroutine to avoid
        // creating a standalone thread; that is not done here so the suspension is easy to observe.
        thread {
            // Simulate a slow operation
            Thread.sleep(200)
            // The slow operation is done: resume the coroutine
            con.resume("This is suspendFun result")
        }
    }
}

fun main(args: Array<String>) {
    CoroutineScope(Dispatchers.Default).launch {
        println("${Thread.currentThread().id}: coroutine1 log 1, timeStamp = ${System.currentTimeMillis()}")
        println(suspendFun())
        println("${Thread.currentThread().id}: coroutine1 log 2, timeStamp = ${System.currentTimeMillis()}")
    }
    Thread.sleep(50)
    CoroutineScope(Dispatchers.Default).launch {
        println("${Thread.currentThread().id}: coroutine2 log 1")
    }
    // Threads in the default coroutine thread pool are daemon threads, so the main thread sleeps here to wait for the coroutines to finish
    Thread.sleep(500)
}

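In suspendFun, suspendCancellableCoroutine gives us an object of type CancellableContinuation. When the slow task finishes, we call its resume method to resume the suspended coroutine. resume eventually reaches the coroutine task's dispatch method, and the coroutine dispatcher then dispatches the task to the corresponding executor (thread pool). The full flow is not traced again here. Output: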

30: coroutine1 log 1, timeStamp = 1700709773170
30: coroutine2 log 1
This is suspendFun result
30: coroutine1 log 2, timeStamp = 1700709773374

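After the change, both coroutines in main run on thread 30, which shows that suspendFun does not block the calling thread. The return value of a suspend function is whatever is passed to resume when the coroutine is resumed, and the moment of resumption is up to the developer. This lets us write code that reads like a synchronous call while achieving the same effect as an asynchronous callback. For comparison, here is suspendFun written as an ordinary callback-based asynchronous method: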

fun suspendFun(callback: (String) -> Unit) {
    thread {
        // Simulate a slow operation
        Thread.sleep(200)
        callback("This is suspendFun result")
    }
}

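Any asynchronous method built on a callback like this can be turned into a synchronous-looking call with coroutines, which avoids deeply nested callbacks. With plain callbacks you also have to care which thread the callback runs on: in Android development, for example, after fetching network data asynchronously you must switch back to the main thread before updating the UI. Coroutines avoid this problem, because resumption also goes through the dispatcher, which dispatches the task to the corresponding executor (thread pool or thread). The dispatcher is the same before and after suspension, so the code before and after a suspension point is guaranteed to run on the same executor (thread pool or thread).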

Together, these two properties reduce the work of handling asynchronous tasks and make the code easier to read.
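
Both properties can be seen in a small standard-library-only sketch. Everything here is an illustrative assumption rather than kotlinx.coroutines code: fetchAsync is a made-up callback API, fetch is its suspend wrapper, and ExecutorInterceptor is a hypothetical miniature dispatcher that re-posts every resumption to one executor.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executor
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import kotlin.concurrent.thread
import kotlin.coroutines.*

// Made-up callback-style API; the callback fires on a worker thread.
fun fetchAsync(callback: (String) -> Unit) {
    thread(name = "io-worker") {
        Thread.sleep(50) // simulate slow work
        callback("payload")
    }
}

// Suspend wrapper: the callback simply resumes the captured continuation.
suspend fun fetch(): String = suspendCoroutine { cont ->
    fetchAsync { value -> cont.resume(value) }
}

// Hypothetical miniature dispatcher: every resumption is re-posted to one executor.
class ExecutorInterceptor(private val executor: Executor) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {
    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        object : Continuation<T> {
            override val context: CoroutineContext get() = continuation.context
            override fun resumeWith(result: Result<T>) {
                executor.execute { continuation.resumeWith(result) }
            }
        }
}

fun runFetch(): List<String> {
    val mainLike = Executors.newSingleThreadExecutor { r -> Thread(r, "mini-main") }
    val outcome = CompletableFuture<List<String>>()
    val body: suspend () -> List<String> = {
        val before = Thread.currentThread().name
        val value = fetch() // reads like a synchronous call
        listOf(before, value, Thread.currentThread().name)
    }
    body.startCoroutine(Continuation<List<String>>(ExecutorInterceptor(mainLike)) { result ->
        outcome.complete(result.getOrThrow())
    })
    try {
        return outcome.get(2, TimeUnit.SECONDS)
    } finally {
        mainLike.shutdown()
    }
}

fun main() {
    println(runFetch()) // [mini-main, payload, mini-main]
}
```

Although the callback fires on io-worker, the code after fetch() runs on mini-main again, because the resumption passes through the interceptor installed in the coroutine's context.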

suspend functions

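In the example above, suspendFun uses suspendCancellableCoroutine to obtain a Continuation object that can resume the outer coroutine, and resumes the suspended outer coroutine at the right moment. How does this work? Let's decompile suspendFun and look at its internals, starting from this simplified version: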

suspend fun suspendFun(): String {
    return suspendCancellableCoroutine<String> { con ->
        // Resume the outer coroutine right away
        con.resume("This is suspendFun result")
    }
}

fun main(args: Array<String>) {
    CoroutineScope(Dispatchers.Default).launch {
        println(suspendFun())
    }
    // Threads in the default coroutine thread pool are daemon threads, so the main thread sleeps here to wait for the coroutine to finish
    Thread.sleep(500)
}

To keep the demonstration focused, all other code has been removed. The decompiled result:

public final class CoroutineSuspendKt {
    /**
     * An extra parameter of type Continuation has been added
     */
   @Nullable
   public static final Object suspendFun(@NotNull Continuation $completion) {
      int $i$f$suspendCancellableCoroutine = false;
      int var3 = false;
      // First a CancellableContinuationImpl object is created, wrapping the $completion parameter
      CancellableContinuationImpl cancellable$iv = new CancellableContinuationImpl(IntrinsicsKt.intercepted($completion), 1);
      cancellable$iv.initCancellability();
      CancellableContinuation con = (CancellableContinuation)cancellable$iv;
      int var6 = false;
      Continuation var7 = (Continuation)con;
      String var8 = "This is suspendFun result";
      Result.Companion var10001 = Result.Companion;
      // Call CancellableContinuationImpl#resumeWith to resume $completion
      var7.resumeWith(Result.constructor-impl(var8));
      Object var10000 = cancellable$iv.getResult();
      if (var10000 == IntrinsicsKt.getCOROUTINE_SUSPENDED()) {
         DebugProbesKt.probeCoroutineSuspended($completion);
      }

      return var10000;
   }

   public static final void main(@NotNull String[] args) {
      Intrinsics.checkNotNullParameter(args, "args");
      BuildersKt.launch$default(CoroutineScopeKt.CoroutineScope((CoroutineContext)Dispatchers.getDefault()), (CoroutineContext)null, (CoroutineStart)null, (Function2)(new Function2((Continuation)null) {
         int label;

         @Nullable
         public final Object invokeSuspend(@NotNull Object $result) {
            Object var3 = IntrinsicsKt.getCOROUTINE_SUSPENDED();
            Object var10000;
            switch (this.label) {
               case 0:
                  ResultKt.throwOnFailure($result);
                  this.label = 1;
                  // suspendFun is called with this, i.e. the SuspendLambda object
                  var10000 = CoroutineSuspendKt.suspendFun(this);
                  if (var10000 == var3) {
                     return var3;
                  }
                  break;
               case 1:
                  ResultKt.throwOnFailure($result);
                  var10000 = $result;
                  break;
               default:
                  throw new IllegalStateException("call to 'resume' before 'invoke' with coroutine");
            }

            Object var2 = var10000;
            System.out.println(var2);
            return Unit.INSTANCE;
         }

         @NotNull
         public final Continuation create(@Nullable Object value, @NotNull Continuation completion) {
            Intrinsics.checkNotNullParameter(completion, "completion");
            Function2 var3 = new <anonymous constructor>(completion);
            return var3;
         }

         public final Object invoke(Object var1, Object var2) {
            return ((<undefinedtype>)this.create(var1, (Continuation)var2)).invokeSuspend(Unit.INSTANCE);
         }
      }), 3, (Object)null);
      Thread.sleep(500L);
   }
}

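As you can see, the decompiled suspendFun has an extra parameter of type Continuation, and the resumeWith call inside it ultimately resumes the Continuation passed in as that parameter ($completion). When the outer coroutine calls suspendFun, the Continuation it passes is this, that is, the outer coroutine task itself. So what suspendFun ends up resuming is the outer coroutine.

This also explains why a function declared with the suspend keyword can only be called from a coroutine or from another suspend function: an ordinary method has no Continuation object to pass in. The suspendXXXCoroutine family of methods are the utilities Kotlin coroutines provide for capturing a suspend function's Continuation parameter (which does not appear in the function's declaration in source code).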
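
The hidden parameter becomes visible when a plain function supplies the Continuation by hand. The sketch below uses only the standard library's startCoroutine; greeting and callFromPlainCode are hypothetical names chosen for illustration.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

suspend fun greeting(): String = suspendCoroutine { cont ->
    // cont is the caller's continuation: the hidden parameter, made visible.
    cont.resume("hello from suspend")
}

fun callFromPlainCode(): String {
    var received: String? = null
    // A plain function has no continuation of its own, so we build one by hand.
    val completion = Continuation<String>(EmptyCoroutineContext) { result ->
        received = result.getOrThrow()
    }
    val block: suspend () -> String = { greeting() }
    block.startCoroutine(completion)
    // greeting() resumes immediately, so the coroutine has already completed at this point.
    return received ?: error("coroutine has not completed yet")
}

fun main() {
    println(callFromPlainCode()) // hello from suspend
}
```

Writing greeting() directly inside callFromPlainCode would not compile, since there would be no continuation for the compiler to pass along.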

Coroutine context

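We have already met the coroutine context (CoroutineContext): the coroutine dispatcher (CoroutineDispatcher) is itself a coroutine context element. Other kinds of context elements include the coroutine name (CoroutineName), the exception handler (CoroutineExceptionHandler), and the associated Job (Job). Internally, the coroutine context uses operator overloading and a linked structure to merge multiple pieces of context data into a single CoroutineContext object. Here is an example: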

// Combine the elements into a new CoroutineContext object with the + operator and launch the coroutine
CoroutineScope(Dispatchers.Default).launch(Dispatchers.IO + CoroutineName("My coroutine")) {
    println("${Thread.currentThread().name}: coroutine test 1")
}

Here is the implementation of CoroutineContext:

@SinceKotlin("1.3")
public interface CoroutineContext {
    /**
     * Returns the element with the given [key] from this context or `null`.
     */
    public operator fun <E : Element> get(key: Key<E>): E?

    /**
     * Accumulates entries of this context starting with [initial] value and applying [operation]
     * from left to right to current accumulator value and each element of this context.
     */
    public fun <R> fold(initial: R, operation: (R, Element) -> R): R

    /**
     * Returns a context containing elements from this context and elements from  other [context].
     * The elements from this context with the same key as in the other one are dropped.
     */
    public operator fun plus(context: CoroutineContext): CoroutineContext =
        if (context === EmptyCoroutineContext) this else // fast path -- avoid lambda creation
            context.fold(this) { acc, element ->
                val removed = acc.minusKey(element.key)
                if (removed === EmptyCoroutineContext) element else {
                    // make sure interceptor is always last in the context (and thus is fast to get when present)
                    val interceptor = removed[ContinuationInterceptor]
                    if (interceptor == null) CombinedContext(removed, element) else {
                        val left = removed.minusKey(ContinuationInterceptor)
                        if (left === EmptyCoroutineContext) CombinedContext(
                            element,
                            interceptor
                        ) else
                            CombinedContext(CombinedContext(left, element), interceptor)
                    }
                }
            }
    
    // ...
}

CoroutineContext overloads the + operator (the plus method) and merges the two CoroutineContext objects into a CombinedContext object. Its implementation:

// this class is not exposed, but is hidden inside implementations
// this is a left-biased list, so that `plus` works naturally
@SinceKotlin("1.3")
internal class CombinedContext(
    private val left: CoroutineContext,
    private val element: Element
) : CoroutineContext, Serializable {

    override fun <E : Element> get(key: Key<E>): E? {
        var cur = this
        while (true) {
            cur.element[key]?.let { return it }
            val next = cur.left
            if (next is CombinedContext) {
                cur = next
            } else {
                return next[key]
            }
        }
    }
    
    //...
}

/**
 * CoroutineContext.kt
 */
public interface Element : CoroutineContext {
    /**
     * A key of this coroutine context element.
     */
    public val key: Key<*>

    public override operator fun <E : Element> get(key: Key<E>): E? =
        @Suppress("UNCHECKED_CAST")
        if (this.key == key) this as E else null
    
    // ....
}

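CombinedContext implements the [] operator (the get method). When looking up an element of a given type, it checks its element field first; Element's own lookup simply checks whether the key passed in equals its own key. The key of each context element type is a singleton: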

public data class CoroutineName(
    /**
     * User-defined coroutine name.
     */
    val name: String
) : AbstractCoroutineContextElement(CoroutineName) {
    /**
     * Key for [CoroutineName] instance in the coroutine context.
     */
    public companion object Key : CoroutineContext.Key<CoroutineName>
}

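This guarantees that each kind of context element is unique within a context. Finally, if the lookup misses in element, the same process continues in the left field, so the structure is essentially a linked list.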
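
The key-based lookup and the replace-on-same-key behavior of plus can be tried out with two custom elements. RequestId and UserName are hypothetical element types written for this sketch; only CoroutineContext and AbstractCoroutineContextElement come from the standard library.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext

// Hypothetical context elements; each type uses its companion object as the singleton key.
class RequestId(val id: String) : AbstractCoroutineContextElement(RequestId) {
    companion object Key : CoroutineContext.Key<RequestId>
}

class UserName(val name: String) : AbstractCoroutineContextElement(UserName) {
    companion object Key : CoroutineContext.Key<UserName>
}

fun describe(context: CoroutineContext): String =
    "request=${context[RequestId]?.id}, user=${context[UserName]?.name}"

fun main() {
    val combined = RequestId("r-1") + UserName("alice")
    println(describe(combined))                    // request=r-1, user=alice
    // Adding an element whose key is already present replaces the old one.
    println(describe(combined + RequestId("r-2"))) // request=r-2, user=alice
    println(describe(combined.minusKey(UserName))) // request=r-1, user=null
}
```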

Next, let's look at how two other common context element types are used.

Exception handling

Code inside a coroutine may throw an exception. With an exception-handler context element, we can customize how coroutine exceptions are handled:

import kotlinx.coroutines.*

fun main(args: Array<String>) {
    // Create an exception-handler context element
    val coroutineExceptionHandler = CoroutineExceptionHandler { coroutineContext, throwable ->
        // Read the CoroutineName from the context
        val coroutineName = coroutineContext[CoroutineName]?.name
        println("$coroutineName occurred exception: $throwable")
    }
    // Combine the exception handler and the coroutine name into a new CoroutineContext with the + operator and launch the coroutine
    CoroutineScope(Dispatchers.Default).launch(coroutineExceptionHandler + CoroutineName("My coroutine")) {
        println("${Thread.currentThread().name}: coroutine test 1")
        throw RuntimeException("test exception")
    }

    Thread.sleep(1000)
}

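We deliberately throw an exception inside the coroutine. Because an exception-handler context element was passed in when the coroutine was launched, the exception is forwarded to that handler. Output: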

DefaultDispatcher-worker-1: coroutine test 1
My coroutine occurred exception: java.lang.RuntimeException: test exception

For reference, here is the default exception handling in the coroutine library:

private open class StandaloneCoroutine(
    parentContext: CoroutineContext,
    active: Boolean
) : AbstractCoroutine<Unit>(parentContext, initParentJob = true, active = active) {
    /**
     * Called back when the job finishes with an exception
     */
    override fun handleJobException(exception: Throwable): Boolean {
        handleCoroutineException(context, exception)
        return true
    }
}

/**
 * CoroutineExceptionHandler.kt
 */
public fun handleCoroutineException(context: CoroutineContext, exception: Throwable) {
    // Invoke an exception handler from the context if present
    try {
        // First try the exception handler from the context and invoke it
        context[CoroutineExceptionHandler]?.let {
            it.handleException(context, exception)
            return
        }
    } catch (t: Throwable) {
        handleCoroutineExceptionImpl(context, handlerException(exception, t))
        return
    }
    // No exception handler in the coroutine context: fall back to the default handling
    // If a handler is not present in the context or an exception was thrown, fallback to the global handler
    handleCoroutineExceptionImpl(context, exception)
}

/**
 * CoroutineExceptionHandlerImpl.kt
 * Fallback logic for handling coroutine exceptions
 */
internal actual fun handleCoroutineExceptionImpl(context: CoroutineContext, exception: Throwable) {
    // use additional extension handlers
    for (handler in handlers) {
        try {
            handler.handleException(context, exception)
        } catch (t: Throwable) {
            // Use thread's handler if custom handler failed to handle exception
            val currentThread = Thread.currentThread()
            currentThread.uncaughtExceptionHandler.uncaughtException(currentThread, handlerException(exception, t))
        }
    }

    // use thread's handler
    val currentThread = Thread.currentThread()
    currentThread.uncaughtExceptionHandler.uncaughtException(currentThread, exception)
}

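So when an exception occurs during execution, the exception handler passed in through the coroutine context is used first, and the default mechanism serves as the fallback.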
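
The lookup-then-fallback order can be sketched without the coroutine library. MiniExceptionHandler and handleMiniException are hypothetical stand-ins for CoroutineExceptionHandler and handleCoroutineException, and the fallback lambda takes the place of the thread's uncaught-exception handler.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext

// Hypothetical stand-in for CoroutineExceptionHandler.
class MiniExceptionHandler(val onError: (CoroutineContext, Throwable) -> Unit) :
    AbstractCoroutineContextElement(MiniExceptionHandler) {
    companion object Key : CoroutineContext.Key<MiniExceptionHandler>
}

fun handleMiniException(context: CoroutineContext, exception: Throwable, fallback: (Throwable) -> Unit) {
    val element = context[MiniExceptionHandler]
    if (element == null) {
        fallback(exception) // no handler installed: use the fallback
        return
    }
    try {
        element.onError(context, exception)
    } catch (t: Throwable) {
        // The handler itself failed: report that through the fallback instead.
        fallback(RuntimeException("Exception while handling ${exception.message}", t))
    }
}

fun runHandlerDemo(): List<String> {
    val log = mutableListOf<String>()
    val fallback: (Throwable) -> Unit = { log += "fallback: ${it.message}" }
    val boom = IllegalStateException("boom")

    val withHandler = MiniExceptionHandler { _, e -> log += "handler: ${e.message}" }
    handleMiniException(withHandler, boom, fallback)

    handleMiniException(EmptyCoroutineContext, boom, fallback)

    val broken = MiniExceptionHandler { _, _ -> throw IllegalArgumentException("handler failed") }
    handleMiniException(broken, boom, fallback)
    return log
}

fun main() {
    runHandlerDemo().forEach(::println)
}
```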

Coroutine scope

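At the start of the article we introduced coroutine scopes and stressed that a coroutine can only be created inside one. This design makes it easy to manage coroutines in bulk. Every scope has a root Job, and each coroutine created in the scope is added to the root Job's children as a child Job. Here is how a CoroutineScope is created: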

@Suppress("FunctionName")
public fun CoroutineScope(context: CoroutineContext): CoroutineScope =
    ContextScope(if (context[Job] != null) context else context + Job())

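The factory checks whether the context passed in already contains a Job element and creates a new Job if it does not. When a coroutine is created, it is attached to its parent Job for management: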

@InternalCoroutinesApi
public abstract class AbstractCoroutine<in T>(
    parentContext: CoroutineContext,
    initParentJob: Boolean,
    active: Boolean
) : JobSupport(active), Job, Continuation<T>, CoroutineScope {

    init {
        /*
         * Setup parent-child relationship between the parent in the context and the current coroutine.
         * It may cause this coroutine to become _cancelling_ if the parent is already cancelled.
         * It is dangerous to install parent-child relationship here if the coroutine class
         * operates its state from within onCancelled or onCancelling
         * (with exceptions for rx integrations that can't have any parent)
         */
        if (initParentJob) initParentJob(parentContext[Job])
    }
}

/**
 * JobSupport.kt
 */
public open class JobSupport constructor(active: Boolean) : Job, ChildJob, ParentJob, SelectClause0 {
    // ...
    
    protected fun initParentJob(parent: Job?) {
        assert { parentHandle == null }
        if (parent == null) {
            parentHandle = NonDisposableHandle
            return
        }
        parent.start() // make sure the parent is started
        @Suppress("DEPRECATION")
        // Attach to the parent Job
        val handle = parent.attachChild(this)
        parentHandle = handle
        // now check our state _after_ registering (see tryFinalizeSimpleState order of actions)
        if (isCompleted) {
            handle.dispose()
            parentHandle = NonDisposableHandle // release it just in case, to aid GC
        }
    }
}

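Once the parent-child relationship is established, child coroutines can be managed in bulk through a parent or ancestor coroutine. For example, cancelling a parent coroutine also cancels all of its children. A common scenario is cancelling every coroutine in a page's scope when the user leaves that page, to avoid wasting resources. For reasons of space, we will not trace the source code any further here.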
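
Downward propagation of cancellation can be pictured with a tiny tree. MiniJob is a hypothetical, single-threaded simplification: the real JobSupport is lock-free and also handles child failures, completion and disposal, none of which is modeled here.

```kotlin
// Hypothetical miniature of the parent-child Job tree; not thread-safe.
class MiniJob(parent: MiniJob? = null) {
    private val children = mutableListOf<MiniJob>()
    var isCancelled = false
        private set

    init {
        // Mirrors initParentJob: register with the parent when the job is created.
        parent?.attachChild(this)
    }

    private fun attachChild(child: MiniJob) {
        children += child
        // A child attached to an already cancelled parent is cancelled immediately.
        if (isCancelled) child.cancel()
    }

    fun cancel() {
        if (isCancelled) return
        isCancelled = true
        children.forEach { it.cancel() } // propagate downwards only
    }
}

fun runJobDemo(): List<Boolean> {
    val root = MiniJob()
    val child = MiniJob(root)
    val grandchild = MiniJob(child)
    val sibling = MiniJob(root)
    child.cancel()
    return listOf(root.isCancelled, child.isCancelled, grandchild.isCancelled, sibling.isCancelled)
}

fun main() {
    println(runJobDemo()) // [false, true, true, false]
}
```

Cancelling child takes grandchild with it but leaves root and sibling untouched, while cancelling root would cancel the whole tree.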

Using coroutines on Android

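Android officially recommends coroutines for asynchronous work: developer.android.com/kotlin/coro... It also provides libraries that simplify application development. We use lifecycle-runtime-ktx as an example, an official Android library that ties component lifecycles to coroutines.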

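The case mentioned above, cancelling all coroutines created in a page when the user leaves it, can be implemented directly with this library. First add the dependencies: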

// Android runtime support for Kotlin coroutines; it mainly adds the main-thread dispatcher
implementation 'org.jetbrains.kotlinx:kotlinx-coroutines-android:1.5.0'

implementation "androidx.lifecycle:lifecycle-runtime-ktx:2.4.0"

MainActivity implementation:

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        lifecycleScope.launch {
            val textContent = mockReadTextFile()
            // After the slow operation completes, continue only if the coroutine is still active (not cancelled)
            if (isActive) {
                Toast.makeText(this@MainActivity, textContent, Toast.LENGTH_SHORT).show()
            }
        }
    }

    private suspend fun mockReadTextFile(): String {
        return suspendCoroutine<String> {
            lifecycleScope.launch(Dispatchers.IO) {
                // Simulate the slow operation of reading a file
                Thread.sleep(3000)
                // Reading is done: resume the outer coroutine
                it.resume("This is text file content.")
            }.invokeOnCompletion {
                Log.d(TAG, "mockReadTextFile() completed, throw = $it")
            }
        }
    }
}

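In MainActivity#onCreate we use the lifecycleScope coroutine scope to launch a coroutine that runs on the main thread. It calls the suspend method mockReadTextFile to simulate reading a file. mockReadTextFile launches a coroutine on a background thread to do the actual slow work and resumes the suspended outer coroutine when the work is done.

If MainActivity is exited within 3 seconds of launch, no toast is shown, and the log reports that the coroutine was cancelled.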

With the Lifecycle library, cancelling all coroutines launched in an Activity's coroutine scope when that Activity is destroyed comes for free. The library's implementation is not complicated either; see the LifecycleCoroutineScope source:

internal class LifecycleCoroutineScopeImpl(
    override val lifecycle: Lifecycle,
    override val coroutineContext: CoroutineContext
) : LifecycleCoroutineScope(), LifecycleEventObserver {
    // ...
    
    /**
     * Called when the lifecycle state of a lifecycle-aware component (Activity, Fragment) changes
     */
    override fun onStateChanged(source: LifecycleOwner, event: Lifecycle.Event) {
        if (lifecycle.currentState <= Lifecycle.State.DESTROYED) {
            lifecycle.removeObserver(this)
            // If the component has been destroyed, cancel all coroutines in this scope
            coroutineContext.cancel()
        }
    }
}
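
The observer pattern behind this class can be exercised off-device. The sketch below is a plain-Kotlin imitation, not the androidx API: MiniState, MiniLifecycle, MiniObserver and MiniLifecycleScope are hypothetical names, and flipping isActive stands in for coroutineContext.cancel().

```kotlin
// Declared so that DESTROYED is the lowest state, which makes `<= DESTROYED` meaningful.
enum class MiniState { DESTROYED, CREATED, RESUMED }

interface MiniObserver {
    fun onStateChanged(source: MiniLifecycle)
}

// Hypothetical miniature of a lifecycle owner's Lifecycle object.
class MiniLifecycle {
    var currentState = MiniState.CREATED
        private set
    private val observers = mutableListOf<MiniObserver>()

    fun addObserver(observer: MiniObserver) { observers += observer }
    fun removeObserver(observer: MiniObserver) { observers -= observer }

    fun moveTo(state: MiniState) {
        currentState = state
        // Iterate over a copy because observers may remove themselves while being notified.
        observers.toList().forEach { it.onStateChanged(this) }
    }
}

// Hypothetical miniature of LifecycleCoroutineScopeImpl.
class MiniLifecycleScope(lifecycle: MiniLifecycle) : MiniObserver {
    var isActive = true
        private set

    init {
        lifecycle.addObserver(this)
    }

    override fun onStateChanged(source: MiniLifecycle) {
        if (source.currentState <= MiniState.DESTROYED) {
            source.removeObserver(this)
            isActive = false // stands in for coroutineContext.cancel()
        }
    }
}

fun runLifecycleDemo(): List<Boolean> {
    val lifecycle = MiniLifecycle()
    val scope = MiniLifecycleScope(lifecycle)
    val states = mutableListOf(scope.isActive)
    lifecycle.moveTo(MiniState.RESUMED)
    states += scope.isActive
    lifecycle.moveTo(MiniState.DESTROYED)
    states += scope.isActive
    return states
}

fun main() {
    println(runLifecycleDemo()) // [true, true, false]
}
```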

Besides the official Android libraries, a number of third-party Android libraries also support coroutines, such as the networking libraries OkHttp and Retrofit.

Summary

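This article introduced the concept and core ideas of coroutines and, taking Kotlin's implementation as the entry point, focused on how Kotlin coroutines work: dispatching and scheduling strategy, the mechanics of suspension and resumption, exception handling, and coroutine management. Overall, coroutines are a task-scheduling framework that sits between the developer and operating-system threads. With coroutines, developers can write asynchronous tasks without touching thread APIs. Compared with working with threads directly, coroutines have the following advantages: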

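Coroutines live entirely in user mode. Suspending a coroutine requires no system API call and does not block the calling thread, so it incurs none of the system cost of thread blocking or thread switching.
A coroutine is just an object (of type Job) in user-mode memory. It is far lighter than a thread in memory footprint, so the maximum number of coroutines is far higher than the maximum number of threads.
By default, coroutines are backed by a fixed thread pool for asynchronous tasks. For scattered, fragmentary slow tasks this avoids creating temporary threads and increases thread reuse, which improves overall program performance. Developers can also choose the thread a coroutine runs on through a custom coroutine dispatcher.
The suspend and resume mechanism lets code written as synchronous calls achieve the effect of asynchronous callbacks, and guarantees that the code before and after a suspension runs on the same executor (thread pool or thread). This improves readability and lowers development cost.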

As time goes on and the ecosystem matures, coroutines will be used ever more widely for asynchronous tasks.

References

book.kotlincn.net/text/d-kotl...

developer.android.com/kotlin/coro...

developer.android.com/topic/libra...
