Understanding Kotlin Coroutines in Depth (3): Back to Basics, Exploring the Foundations of the Coroutine Infrastructure

Preface

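Unlike in many other languages, the implementation of Kotlin coroutines is split into two layers:

Infrastructure layer: the coroutine API in the standard library, which provides the basic conceptual and semantic support for coroutines.
Business framework layer: higher-level framework support built on top of it, such as the commonly used kotlinx.coroutines library.

This article focuses on the infrastructure layer. To keep the two apart, coroutines created directly with the infrastructure are called simple coroutines, and coroutines wrapped by the business framework are called composite coroutines.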

Reading guide

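After reading this article, you will be able to answer the following five core questions:

How exactly is a coroutine created and executed?
Why can't a normal function call a suspend function directly?
What is the essence of "suspension"? On resumption, how does the coroutine know where to continue?
How is configuration such as the coroutine name and the exception handler passed along?
How do business frameworks (for example Dispatchers) take over and switch the thread a coroutine runs on?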

Constructing and starting a coroutine

The underlying mechanism, from creation to execution

In Kotlin, creating a simple coroutine takes only this:

val continuation = suspend { // coroutine body
    println("Within the coroutine")
    0
}.createCoroutine(
    // completion callback of the coroutine
    completion = object : Continuation<Int> {
        override fun resumeWith(result: Result<Int>) {
            println("The coroutine ends, and the result is $result")
        }

        override val context = EmptyCoroutineContext
    }
)

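We use the standard library function createCoroutine to create the coroutine. The coroutine does not run immediately, though. As with Lua coroutines, you have to call resume(Unit) on the returned control instance to start it.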

continuation.resume(Unit)

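Why does calling resume trigger execution of the coroutine body?

createCoroutine actually creates and returns a SafeContinuation instance. SafeContinuation is only a proxy wrapper around the real Continuation instance, and it exists so that developers can use the continuation safely.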

public fun <T> (suspend () -> T).createCoroutine(
    completion: Continuation<T>
): Continuation<Unit> =
    SafeContinuation(createCoroutineUnintercepted(completion).intercepted(), COROUTINE_SUSPENDED)

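createCoroutineUnintercepted(completion) is the code that really creates the coroutine instance. Right after creation, intercepted() is called on it, which wraps the instance with the ContinuationInterceptor found in the context, if there is one (see the interceptor section). Neither call runs the coroutine body: nothing executes until someone calls resume, which is why the coroutine does not start immediately.

Stepping through with a debugger shows that the coroutine instance has a class name like <FileName>Kt$<FunctionName>$continuation$1. It is an anonymous inner class that the Kotlin compiler generates by compiling and transforming the coroutine body (the suspend lambda). The class extends SuspendLambda and also implements the Continuation interface.

Turning the function into a class is straightforward: the compiler generates a class and puts the coroutine code into its invokeSuspend method (invokeSuspend is an abstract function declared in the parent class BaseContinuationImpl). The reason for using a class is that a plain function has nowhere to keep state between calls, whereas a class instance can hold the execution state.

You can follow along with the Show Kotlin Bytecode tool; the original video covers this from 10'35" to 14'55".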

The full hierarchy of the anonymous class:

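┌─────────────────────────────────────────────┐
│          Continuation (interface)           │  ← the core coroutine interface
└───────────────────┬─────────────────────────┘
                    │ implements
┌───────────────────▼─────────────────────────┐
│    BaseContinuationImpl (abstract class)    │  ← declares the abstract invokeSuspend method
└───────────────────┬─────────────────────────┘
                    │ extends
┌───────────────────▼─────────────────────────┐
│      ContinuationImpl (abstract class)      │  ← holds the context and caches intercepted()
└───────────────────┬─────────────────────────┘
                    │ extends
┌───────────────────▼─────────────────────────┐
│       SuspendLambda (abstract class)        │  ← intermediate class used only for suspend lambdas
└───────────────────┬─────────────────────────┘
                    │ extends
┌───────────────────▼─────────────────────────┐
│  Compiler-generated class (coroutine body)  │  ← the final class
└─────────────────────────────────────────────┘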

Now it is clear: the Continuation instance returned by createCoroutine is essentially the wrapped coroutine body, so calling resume on it triggers execution of the coroutine body.
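The hierarchy can also be checked at runtime. The sketch below walks the superclass chain of a suspend lambda on the JVM. superclassChain is a hypothetical helper name, and the name of the generated class depends on the compiler version, so only the kotlin.coroutines.jvm.internal base classes are worth relying on.

```kotlin
// Hypothetical helper: lists the class of a suspend lambda followed by all of its superclasses (JVM only).
fun superclassChain(block: suspend () -> Unit): List<String> =
    generateSequence<Class<*>>((block as Any).javaClass) { it.superclass }
        .map { it.name }
        .toList()
```

Printing superclassChain { } should list the generated class first, followed by SuspendLambda, ContinuationImpl, BaseContinuationImpl and java.lang.Object.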

The full call path:

Continuation.resume() => SafeContinuation.resumeWith() => BaseContinuationImpl.resumeWith() => invokeSuspend() implemented by the generated anonymous class => the coroutine body we wrote

As for which "safety measures" SafeContinuation actually applies, the source code has the answer. Its core value lies in handling race conditions in concurrent environments.

internal actual class SafeContinuation<in T>
internal actual constructor(
    private val delegate: Continuation<T>,
    initialResult: Any?
) : Continuation<T>, CoroutineStackFrame {
    // ...

    @Volatile
    private var result: Any? = initialResult

    private companion object {
        @Suppress("UNCHECKED_CAST")
        private val RESULT = AtomicReferenceFieldUpdater.newUpdater<SafeContinuation<*>, Any?>(
            SafeContinuation::class.java, Any::class.java as Class<Any?>, "result"
        )
    }

    public actual override fun resumeWith(result: Result<T>) {
        while (true) { // lock-free loop
            val cur = this.result // atomic read
            when {
                cur === UNDECIDED -> if (RESULT.compareAndSet(this, UNDECIDED, result.value)) return
                cur === COROUTINE_SUSPENDED -> if (RESULT.compareAndSet(this, COROUTINE_SUSPENDED, RESUMED)) {
                    delegate.resumeWith(result)
                    return
                }
                else -> throw IllegalStateException("Already resumed")
            }
        }
    }

    // ... 
}

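The key is the resumeWith function: it uses a CAS (compareAndSet) retry loop to validate the coroutine state safely. This guarantees that, whether the asynchronous result arrives immediately (synchronous completion) or much later, the delegate is resumed at most once. A second resume call fails fast: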

continuation.resume(Unit)
continuation.resume(Unit) // java.lang.IllegalStateException: Already resumed
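To see this guard in action without crashing the program, the second call can be wrapped in runCatching. This is a minimal sketch; secondResumeFailure is a hypothetical helper name and is not part of the standard library.

```kotlin
import kotlin.coroutines.*

// Hypothetical helper: resumes the same continuation twice and returns what the second call throws.
fun secondResumeFailure(): Throwable? {
    var runs = 0
    val continuation = suspend {
        runs += 1
        runs
    }.createCoroutine(object : Continuation<Int> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<Int>) {}
    })

    continuation.resume(Unit) // first call: runs the coroutine body
    val failure = runCatching { continuation.resume(Unit) }.exceptionOrNull()
    check(runs == 1) { "the coroutine body must run exactly once" }
    return failure
}
```

The returned value is the IllegalStateException raised by SafeContinuation, and the body still runs only once.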

Usually we want a coroutine to run as soon as it is created, and the standard library provides a one-step API for that: startCoroutine.

public fun <T> (suspend () -> T).startCoroutine(
    completion: Continuation<T>
) {
    createCoroutineUnintercepted(completion).intercepted().resume(Unit)
}

As you can see, once the coroutine has been created and passed through intercepted(), resume(Unit) is called right away to start it.

Coroutine bodies with a receiver (scope)

There is a second set of APIs for creating and starting coroutines:

public fun <R, T> (suspend R.() -> T).createCoroutine(
    receiver: R,
    completion: Continuation<T>
): Continuation<Unit> =
    SafeContinuation(createCoroutineUnintercepted(receiver, completion).intercepted(), COROUTINE_SUSPENDED)
    
public fun <R, T> (suspend R.() -> T).startCoroutine(
    receiver: R,
    completion: Continuation<T>
) {
    createCoroutineUnintercepted(receiver, completion).intercepted().resume(Unit)
}

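The difference is the extra receiver: R parameter. It gives the coroutine body a scope, so the lambda can directly use functions or state provided by that scope.

For convenience, let's wrap this in launchCoroutine, a function that starts a coroutine with a receiver: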

fun <R, T> launchCoroutine(receiver: R, block: suspend R.() -> T) {
    block.startCoroutine(
        receiver = receiver,
        completion = object : Continuation<T> {
            override fun resumeWith(result: Result<T>) {
                println("The coroutine ends, and the result is $result")
            }

            override val context = EmptyCoroutineContext
        }
    )
}

To use it, create a scope and define the suspend functions that can be called inside it:

class MyScope {
    suspend fun myDelay(delay: Long) {
        println("Latency of $delay milliseconds")
    }
}

fun main() {
    launchCoroutine(MyScope()) {
        println("Within the coroutine")
        myDelay(3000)
    }
}

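Besides providing functions, a scope can also impose restrictions. For example, once the scope class is annotated with RestrictsSuspension, the coroutine body may only call suspend functions defined on that scope.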

@RestrictsSuspension
class MyScope {
    suspend fun myDelay(delay: Long) {
        println("Latency of $delay milliseconds")
    }
}

fun main() {
    launchCoroutine(MyScope()) {
        println("Within the coroutine")
        myDelay(3000)
        delay(2000) // Error: Restricted suspending functions can invoke member or extension suspending functions only on their restricted coroutine scope.
    }
}

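Note: the delay function is not part of the standard library. It belongs to the business framework layer, so kotlinx.coroutines has to be added as a dependency; see its documentation.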

Demystifying the suspendable main function

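Kotlin 1.3 added a feature: the main function can be declared as a suspend function directly. This gives us a coroutine right at the program entry point.

This suspendable main is not the real entry point, however, because the JVM knows nothing about Kotlin coroutines. Decompiling shows that the Kotlin compiler generates a real, ordinary main function for us, which calls runSuspend to execute our logic.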

suspend fun suspendMain() {
    // ...our logic
}

// The real JVM entry point (a sketch; runSuspend is internal to the standard library)
fun main() {
    runSuspend {
        suspendMain()
    }
}

@SinceKotlin("1.3")
internal fun runSuspend(block: suspend () -> Unit) {
    val run = RunSuspend()
    // start the coroutine
    block.startCoroutine(run)
    run.await()
}

private class RunSuspend : Continuation<Unit> {
    override val context: CoroutineContext
        get() = EmptyCoroutineContext

    var result: Result<Unit>? = null

    override fun resumeWith(result: Result<Unit>) = synchronized(this) {
        this.result = result
        // notifyAll() must be called while holding this object's monitor
        @Suppress("PLATFORM_CLASS_MAPPED_TO_KOTLIN") (this as Object).notifyAll()
    }

    fun await() = synchronized(this) {
        while (true) {
            when (val result = this.result) {
                null -> @Suppress("PLATFORM_CLASS_MAPPED_TO_KOTLIN") (this as Object).wait()
                else -> {
                    result.getOrThrow()
                    return
                }
            }
        }
    }
}

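The RunSuspend class implements the Continuation interface and acts as the completion callback for the whole program. It blocks the main thread with Object.wait() (inside run.await()). When the coroutine finishes, its resumeWith function is called and triggers notifyAll(), which lets the main thread stop waiting so the program can exit normally.

The original video covers this from 19'13" to 25'26".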
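The same "block the caller until the completion callback fires" idea can be sketched with a CountDownLatch instead of wait/notifyAll. This is only an illustration of the pattern, not how the standard library implements it; runBlockingSketch is a hypothetical name.

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.coroutines.*

// Hypothetical helper: starts the block as a coroutine and blocks the calling thread until it completes.
fun <T> runBlockingSketch(block: suspend () -> T): T {
    val done = CountDownLatch(1)
    var outcome: Result<T>? = null

    block.startCoroutine(object : Continuation<T> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<T>) {
            outcome = result   // made visible to the waiting thread by countDown()/await()
            done.countDown()
        }
    })

    done.await()                   // plays the role of RunSuspend.await()
    return outcome!!.getOrThrow()  // rethrows the failure if the coroutine ended with an exception
}
```

Unlike runSuspend, this variant also hands the coroutine's result back to the caller, for example runBlockingSketch { 21 * 2 }.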

A closer look at function suspension

What is a suspend function

A function marked with the suspend keyword is a suspend function. It can only be called from a coroutine body or from another suspend function.

suspend fun suspendFunc(number: Int) = suspendCoroutine<Int> { continuation ->
    thread {
        continuation.resumeWith(Result.success(number))
    }
}

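In suspendFunc, we use the standard library function suspendCoroutine to obtain the Continuation<T> instance of the current coroutine. When execution reaches this point and hits a real asynchronous operation (such as starting a thread), the current flow of execution enters a waiting state.

Suspending a coroutine really means giving up control of the thread when an asynchronous call happens during execution. Note: calling a suspend function does not necessarily mean that a real suspension happens. It only provides an environment in which suspension is allowed.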

Suspension points and real suspension

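As the code above shows, suspending requires a Continuation instance. Where does this instance come from?

We learned at the beginning that the coroutine body itself is a Continuation instance. That is exactly why suspend functions can run inside a coroutine body: the Continuation instance is passed down implicitly, layer by layer, from the outermost coroutine body.

A place inside a coroutine where a suspend function is called is known as a suspension point. If an asynchronous operation occurs at a suspension point, the current coroutine is suspended, and it resumes only when the corresponding Continuation.resume is called.

So how does a function know whether it should suspend and give up control?

This is thanks to the SafeContinuation mentioned earlier, which has a fast path and a slow path. When we call suspendCoroutine, it runs the lambda we pass in synchronously and then immediately calls SafeContinuation.getOrThrow() to fetch the result.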

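At this point there are two possible paths:

1. Fast path (completes synchronously right away)

In the code below, the lambda contains only synchronous logic and calls resume with the result 0 right away.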

suspend fun notSuspendFunc() = suspendCoroutine<Int> { continuation -> // this is a SafeContinuation instance
    continuation.resumeWith(Result.success(value = 0))
}

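When getOrThrow() is called, it finds that the result is already there and returns the actual value (0) straight to the outer state machine. Since the return value is not the suspension marker, the coroutine does not return and give up the thread. It simply carries on, as if the call had returned synchronously.

2. Slow path (real asynchronous suspension)

If the lambda starts a new thread or performs real I/O, the lambda itself returns almost immediately, but resume has not been called yet.

When getOrThrow() is then called to fetch the result, it finds the state still UNDECIDED, switches it to COROUTINE_SUSPENDED with a CAS, and returns the COROUTINE_SUSPENDED marker to the outer state machine. As soon as the outer code receives this special marker it returns, truly giving up the thread and suspending.

The slow path, and with it a real asynchronous suspension, is taken only when resume runs on a different call stack from the suspend function call (for example after a thread switch), or when resume runs at some point after the current suspend function has returned (for example in an event loop).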
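One observable difference between the two paths is the thread on which the code after the suspension point runs. The sketch below records this for both cases. staysOnCallerThread is a hypothetical helper, and the 100 ms sleep is an assumption used to keep the background resume from racing with getOrThrow().

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread
import kotlin.coroutines.*

// Hypothetical helper: reports whether execution is still on the calling thread
// after a fast-path call (first value) and after a slow-path call (second value).
fun staysOnCallerThread(): Pair<Boolean, Boolean> {
    val caller = Thread.currentThread()
    var afterFast = false
    var afterSlow = false
    val done = CountDownLatch(1)

    val body: suspend () -> Unit = {
        // Fast path: resume is called before the lambda returns, so nothing suspends.
        suspendCoroutine<Unit> { c -> c.resume(Unit) }
        afterFast = Thread.currentThread() === caller

        // Slow path: resume is called later, from another thread.
        suspendCoroutine<Unit> { c ->
            thread {
                Thread.sleep(100)
                c.resume(Unit)
            }
        }
        afterSlow = Thread.currentThread() === caller
    }

    body.startCoroutine(object : Continuation<Unit> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<Unit>) { done.countDown() }
    })
    done.await()
    return afterFast to afterSlow
}
```

With no interceptor in the context, the fast path keeps running on the caller's thread, while after the slow path the rest of the body runs on whichever thread called resume.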

Compile-time magic: CPS transformation and the state machine

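CPS transformation (transformation into continuation-passing style) is a technique that controls the flow of asynchronous calls by passing a Continuation along.

When a Kotlin coroutine suspends, it stores the information about the suspension point in the Continuation object. We said earlier that the Continuation instance inside suspendCoroutine is passed in implicitly from outside, and we can verify this by calling the function from Java code: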

public class Test {
    public static void main(String[] args) {
        // Seen from Java, notSuspendFunc has an extra Continuation parameter and its return type has become Object
        Object result = DemoKt.notSuspendFunc(new Continuation<>() {
            @Override
            public @NotNull CoroutineContext getContext() {
                return EmptyCoroutineContext.INSTANCE;
            }

            @Override
            public void resumeWith(@NotNull Object o) {
                // ...
            }
        });
    }
}

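The return value result can mean two things:

If the suspend function returns synchronously, the value is the business result we want (0 in this code);
If the suspend function really suspends, the return value is a suspension marker, and the real result has to be obtained later from the resumeWith callback.

The suspension marker is a constant defined in Intrinsics.kt: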

@SinceKotlin("1.3")
public val COROUTINE_SUSPENDED: Any get() = CoroutineSingletons.COROUTINE_SUSPENDED

@SinceKotlin("1.3")
@PublishedApi
internal enum class CoroutineSingletons { COROUTINE_SUSPENDED, UNDECIDED, RESUMED }
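The two meanings of the return value can also be observed from Kotlin, without going through Java, by using the standard library intrinsic startCoroutineUninterceptedOrReturn, which hands back the raw value. The helper names rawReturnValue, fastCase and slowCase are hypothetical, and the 100 ms sleep is an assumption that makes the second case suspend reliably.

```kotlin
import kotlin.concurrent.thread
import kotlin.coroutines.*
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED
import kotlin.coroutines.intrinsics.startCoroutineUninterceptedOrReturn

// Hypothetical helper: starts the block and returns the raw value handed back by the state machine.
fun rawReturnValue(block: suspend () -> Int): Any? =
    block.startCoroutineUninterceptedOrReturn(object : Continuation<Int> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<Int>) {
            // Only reached when the block really suspended and was resumed later.
            println("resumed later with $result")
        }
    })

// Synchronous completion: the raw value is the business result itself.
fun fastCase(): Any? = rawReturnValue { suspendCoroutine<Int> { c -> c.resume(0) } }

// Real suspension: the raw value is the COROUTINE_SUSPENDED marker.
fun slowCase(): Any? = rawReturnValue {
    suspendCoroutine<Int> { c ->
        thread {
            Thread.sleep(100)
            c.resume(1)
        }
    }
}
```

fastCase() returns 0, whereas slowCase() returns the marker and the actual result arrives afterwards through resumeWith.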

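One key question remains: after suspending, the coroutine has given up the thread (that is, the function has returned). When it is resumed later, how does it know which line of code to continue from?

This brings us to the core mechanism of the compiler magic: the state machine.

The compiler not only adds a Continuation parameter to the suspend function and changes its return type. It also splits the sequential coroutine code we wrote into several state blocks at the suspension points, and uses a label variable to record the current progress of execution.

Take the following code: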

suspend fun fetchUser() {
    println("Start")
    val data = requestNetwork() // a real suspension point
    println("The data obtained from the network is $data")
    println("End: $data")
}

suspend fun requestNetwork(): Int = suspendCoroutine {
    thread {
        Thread.sleep(3000)
        it.resume(Random.nextInt())
    }
}

After the compiler's CPS transformation, the code looks roughly like this pseudocode:

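fun fetchUser(cont: Continuation<Any?>): Any? {
    // wrap the caller's continuation in a state machine, or reuse the existing one on re-entry
    val sm = cont as? StateMachine ?: StateMachine(cont)

    while (true) {
        when (sm.label) {
            0 -> {
                println("Start")
                sm.label = 1 // advance the state
                // pass the state machine itself as the Continuation to the next suspend function
                val result = requestNetwork(sm)
                // on a real suspension, return the marker and give up control
                if (result === COROUTINE_SUSPENDED) {
                    return COROUTINE_SUSPENDED
                }
                // fast path: the result is already available, so continue with state 1
                sm.result = result
            }

            1 -> {
                // on the slow path, sm.resumeWith() stores the result and re-enters this function with label == 1
                val data = sm.result
                println("The data obtained from the network is $data")
                println("End: $data")
                return Unit
            }
        }
    }
}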

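Now it all makes sense: compared with a normal function, a suspend function simply takes one extra implicit parameter, the Continuation state machine instance. And because the compiler has to pass this parameter implicitly, a normal function, which has no Continuation to pass, cannot call a suspend function directly.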
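To make the mechanism tangible, here is a hand-written state machine that compiles and runs as ordinary Kotlin, with no suspend keyword involved. It is a simplified sketch of the idea, not the code the compiler actually generates. requestNetworkCps, FetchUserMachine and runMachine are hypothetical names, and error handling is reduced to getOrThrow().

```kotlin
import java.util.concurrent.CountDownLatch
import kotlin.concurrent.thread
import kotlin.coroutines.*
import kotlin.coroutines.intrinsics.COROUTINE_SUSPENDED

// Hand-written CPS function: takes the continuation explicitly and always suspends.
fun requestNetworkCps(cont: Continuation<Int>): Any? {
    thread {
        Thread.sleep(50)
        cont.resume(42)        // resume later, from another thread
    }
    return COROUTINE_SUSPENDED // tell the caller that a real suspension happened
}

// Hand-written stand-in for the compiler-generated class: label records the progress.
class FetchUserMachine(private val completion: Continuation<String>) : Continuation<Any?> {
    override val context: CoroutineContext get() = completion.context
    private var label = 0
    val log = mutableListOf<String>()

    override fun resumeWith(result: Result<Any?>) {
        val outcome = step(result.getOrThrow())
        if (outcome !== COROUTINE_SUSPENDED) completion.resume(outcome as String)
    }

    private fun step(input: Any?): Any? {
        var value = input
        while (true) {
            when (label) {
                0 -> {
                    log += "Start"
                    label = 1
                    val r = requestNetworkCps(this) // pass ourselves as the continuation
                    if (r === COROUTINE_SUSPENDED) return COROUTINE_SUSPENDED
                    value = r                       // fast path: fall through to state 1
                }
                1 -> {
                    log += "End: $value"
                    return "done with $value"
                }
                else -> error("unexpected label $label")
            }
        }
    }
}

// Starts the machine, waits for completion, and returns the final value plus the log.
fun runMachine(): Pair<String, List<String>> {
    val done = CountDownLatch(1)
    var finalValue = ""
    val machine = FetchUserMachine(object : Continuation<String> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<String>) {
            finalValue = result.getOrThrow()
            done.countDown()
        }
    })
    machine.resume(Unit) // initial resume: runs state 0
    done.await()
    return finalValue to machine.log
}
```

The first resume runs state 0 and returns at the suspension point; the second resume, issued by the background thread, re-enters with label 1 and finishes the work.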

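Side note: Jetpack Compose borrows the same idea. Composables are defined with the @Composable annotation, and the compiler likewise passes a parameter implicitly under the hood, so Composable functions can only be called from within a @Composable context.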

Coroutine context (CoroutineContext)

A collection-like data structure

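The job of the coroutine context is to provide resources for the execution of a coroutine. Its data structure is implemented much like a singly linked list, and in use it closely resembles collections such as List and Map.

We have already seen the empty coroutine context many times: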

val emptyContext: CoroutineContext = EmptyCoroutineContext

The concrete elements of a coroutine context are of type Element:

public interface Element : CoroutineContext {
    public val key: Key<*>

    public override operator fun <E : Element> get(key: Key<E>): E? =
        @Suppress("UNCHECKED_CAST")
        if (this.key == key) this as E else null

    public override fun <R> fold(initial: R, operation: (R, Element) -> R): R =
        operation(initial, this)

    public override fun minusKey(key: Key<*>): CoroutineContext =
        if (this.key == key) EmptyCoroutineContext else this
}

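As you can see, Element itself implements the CoroutineContext interface. This looks odd at first, but it makes sense once you consider that a linked-list node and the list head can share the same abstraction.

The main purpose is to ensure that an element represents only itself and holds no other data, while keeping the usage of a single element and a collection of elements uniform (both can be combined directly with the + operator).

There is also a key property, key, which is the unique identifier used to look up a context element in the collection.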

Custom context elements

We can extend the AbstractCoroutineContextElement abstract class to implement custom coroutine context elements:

// coroutine name
class CoroutineName(val name: String) : AbstractCoroutineContextElement(Key) {
    companion object Key : CoroutineContext.Key<CoroutineName>
}

// coroutine exception handler
class CoroutineExceptionHandler(
    val onErrorHandler: (Throwable) -> Unit
) : AbstractCoroutineContextElement(Key) {
    companion object Key : CoroutineContext.Key<CoroutineExceptionHandler>

    fun onError(error: Throwable) {
        error.printStackTrace()
        onErrorHandler(error)
    }
}
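The collection-like behavior of the context (combining with +, lookup by key, removal, iteration) can be tried out with two small elements. The classes Label and Owner and the helpers keysOf and contextDemo are hypothetical and exist only for this sketch.

```kotlin
import kotlin.coroutines.*

// Two hypothetical elements; each companion object serves as the element's Key.
class Label(val text: String) : AbstractCoroutineContextElement(Label) {
    companion object Key : CoroutineContext.Key<Label>
}

class Owner(val id: Int) : AbstractCoroutineContextElement(Owner) {
    companion object Key : CoroutineContext.Key<Owner>
}

// fold visits every element in order, much like iterating a list.
fun keysOf(context: CoroutineContext): List<CoroutineContext.Key<*>> =
    context.fold(emptyList<CoroutineContext.Key<*>>()) { acc, element -> acc + element.key }

fun contextDemo(): List<Any?> {
    val context = EmptyCoroutineContext + Label("first") + Owner(7)
    val replaced = context + Label("second") // same Key: the new element replaces the old one
    val removed = context.minusKey(Label)    // drop an element by its Key
    return listOf(
        context[Label]?.text,  // "first"
        context[Owner]?.id,    // 7
        replaced[Label]?.text, // "second"
        removed[Label],        // null
        keysOf(context) == listOf<CoroutineContext.Key<*>>(Label, Owner) // true
    )
}
```

Lookup by key behaves like a Map, while + and fold make the context feel like a List in which each key appears at most once.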

Using the context in practice

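When the result is delivered (that is, in the resumeWith function of the completion callback passed in at the start), if an exception has occurred we can fetch the CoroutineExceptionHandler from the context and hand the exception to it.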

fun main() {
    var emptyContext: CoroutineContext = EmptyCoroutineContext
    // add elements as you would with a collection
    emptyContext += CoroutineName(name = "co-01")
    emptyContext += CoroutineExceptionHandler(onErrorHandler = { t ->
        // exception handling logic
        println("When encountering exception ${t.message}, return directly")
    })

    // attach the context to the coroutine
    val continuation = suspend {
        println("Within the coroutine")
        val num = Random.nextInt(1, 11)
        if (num > 5) {
            throw Exception("the number is too large")
        } else {
            num
        }
    }.createCoroutine(object : Continuation<Int> {
        override val context: CoroutineContext
            get() = emptyContext

        override fun resumeWith(result: Result<Int>) {
            result.onFailure { exception ->
                // look up the exception handler in the context by its Key
                context[CoroutineExceptionHandler]?.onError(exception)
            }.onSuccess {
                // look up the coroutine name by its Key
                val coroutineName = context[CoroutineName]?.name
                println("The coroutine name is $coroutineName, and the result is $result")
            }
        }
    })

    continuation.resume(Unit)
}

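Inside the coroutine body, we can also read the context of the current coroutine directly through the standard library's coroutineContext property (currentCoroutineContext() is the equivalent function in kotlinx.coroutines):

suspend {
    val coroutineName = coroutineContext[CoroutineName]?.name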
    println("Inside the coroutine, the coroutine name is $coroutineName")
}

Coroutine interceptors (Interceptor)

When the interceptor comes into play

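The standard library provides a core component: the interceptor. It lets us intercept every resume call (resumeWith) on the coroutine's underlying Continuation. Because it can hook into the moment of resumption, it is commonly used to encapsulate thread switching (this is the core principle behind the Dispatcher in the business framework).

The inner Continuation's resume is called only when a real asynchronous suspension has happened and the coroutine is resumed afterwards. In addition, resume is called once when the coroutine first starts, to run the code up to the first suspension point. So over the lifetime of a coroutine there are n+1 resume calls, where n is the number of real asynchronous suspensions inside the coroutine body.

For example, the code below involves three resume calls: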

fun main() {
    suspend {
        suspendFunc(Random.nextInt(3, 5)) // suspension point 1
        suspendFunc(Random.nextInt(2, 3)) // suspension point 2
    }.startCoroutine(object : Continuation<Unit> {
        override val context: CoroutineContext
            get() = EmptyCoroutineContext

        override fun resumeWith(result: Result<Unit>) {}
    })
}

suspend fun suspendFunc(number: Int) = suspendCoroutine<Unit> { continuation ->
    thread {
        val mills = (number * 1000).toLong()
        Thread.sleep(mills)
        continuation.resumeWith(Result.success(Unit))
    }
}
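The n+1 rule can be checked by counting the resume calls with an interceptor (the interceptor API itself is covered in the next section). This is a sketch under assumptions: CountingInterceptor, asyncStep and countResumes are hypothetical names, and the 20 ms sleep makes sure both calls really suspend.

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicInteger
import kotlin.concurrent.thread
import kotlin.coroutines.*

// Hypothetical interceptor that counts every resumeWith call passing through it.
class CountingInterceptor :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    val resumes = AtomicInteger()

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        object : Continuation<T> {
            override val context: CoroutineContext get() = continuation.context
            override fun resumeWith(result: Result<T>) {
                resumes.incrementAndGet()
                continuation.resumeWith(result)
            }
        }
}

// A suspend function that really suspends: resume is called later from another thread.
suspend fun asyncStep(): Unit = suspendCoroutine { c ->
    thread {
        Thread.sleep(20)
        c.resume(Unit)
    }
}

// Runs a coroutine with two real suspensions and returns how many resumes were intercepted.
fun countResumes(): Int {
    val interceptor = CountingInterceptor()
    val done = CountDownLatch(1)
    val body: suspend () -> Unit = {
        asyncStep() // suspension point 1
        asyncStep() // suspension point 2
    }
    body.startCoroutine(object : Continuation<Unit> {
        override val context: CoroutineContext = interceptor
        override fun resumeWith(result: Result<Unit>) { done.countDown() }
    })
    done.await()
    return interceptor.resumes.get()
}
```

With two real suspensions the counter ends at 3: one resume to start the coroutine plus one per suspension. The final call to the completion callback does not go through the interceptor.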

Writing a logging interceptor

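An interceptor is itself one of the coroutine context element implementations. We can implement the interceptor interface ContinuationInterceptor to intercept the resumption at a suspension point and add extra AOP-style behavior: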

// logging interceptor
class LogInterceptor : ContinuationInterceptor {
    // the key function
    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> {
        // return a proxied Continuation instance
        return LogContinuation(continuation)
    }

    override val key: CoroutineContext.Key<*>
        get() = ContinuationInterceptor // the fixed Key for interceptors
}

// static proxy for the continuation
class LogContinuation<T>(
    private val continuation: Continuation<T>
) : Continuation<T> by continuation {
    override fun resumeWith(result: Result<T>) {
        println("Called before resumeWith")
        continuation.resumeWith(result) // let it through by calling the original resumeWith
        println("Called after resumeWith")
    }
}

Example code:

fun main() {
    suspend {
        suspendFunc(Random.nextInt(0, 3))
    }.startCoroutine(completion = object : Continuation<Unit> {
        override val context: CoroutineContext
            get() = LogInterceptor() // put the interceptor into the context

        override fun resumeWith(result: Result<Unit>) {
            println("Coroutine End, result is $result")
        }
    })
}

How execution flows through the interceptor

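At the beginning we said that SafeContinuation is a proxy wrapper whose delegate property represents the coroutine body itself.

Once an interceptor is added, delegate is no longer the coroutine body. It becomes the Continuation instance produced by the interceptor's wrapping (the LogContinuation above).

To resume a coroutine through SafeContinuation, the call first passes SafeContinuation's safety check, then the interceptor's AOP logic, and only then reaches the generated coroutine body class (the state machine).

1. After the coroutine is created, intercepted() is called: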

// Continuation.kt
@SinceKotlin("1.3")
@Suppress("UNCHECKED_CAST")
public fun <T> (suspend () -> T).startCoroutine(
    completion: Continuation<T>
) {
    createCoroutineUnintercepted(completion).intercepted().resume(Unit)
}

This function casts the underlying coroutine body to its parent type ContinuationImpl and calls intercepted() on it:

// IntrinsicsJvm.kt
@SinceKotlin("1.3")
public actual fun <T> Continuation<T>.intercepted(): Continuation<T> =
    (this as? ContinuationImpl)?.intercepted() ?: this

Finally, that method takes the interceptor we set from the context and calls its interceptContinuation() to do the wrapping:

// ContinuationImpl.kt
public fun intercepted(): Continuation<Any?> =
    intercepted
        ?: (context[ContinuationInterceptor]?.interceptContinuation(this) ?: this)
            .also { intercepted = it } // cache it to avoid wrapping again
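Since every resume passes through the wrapped Continuation, switching threads only requires the wrapper to hand the resume over to another thread. The sketch below does this with a Java Executor. It illustrates the principle only and is not how kotlinx.coroutines implements its dispatchers; ExecutorInterceptor, threadsSeen and the thread name sketch-worker are hypothetical.

```kotlin
import java.util.Collections
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executor
import java.util.concurrent.Executors
import kotlin.concurrent.thread
import kotlin.coroutines.*

// Hypothetical dispatcher-like interceptor: every resume is re-posted to the given executor.
class ExecutorInterceptor(private val executor: Executor) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        object : Continuation<T> {
            override val context: CoroutineContext get() = continuation.context
            override fun resumeWith(result: Result<T>) {
                executor.execute { continuation.resumeWith(result) }
            }
        }
}

// Records the thread name before and after a suspension point.
fun threadsSeen(): List<String> {
    val executor = Executors.newSingleThreadExecutor { task -> Thread(task, "sketch-worker") }
    val seen = Collections.synchronizedList(mutableListOf<String>())
    val done = CountDownLatch(1)

    val body: suspend () -> Unit = {
        seen.add(Thread.currentThread().name) // before the suspension point
        suspendCoroutine<Unit> { c -> thread(name = "io") { c.resume(Unit) } }
        seen.add(Thread.currentThread().name) // after the suspension point
    }
    body.startCoroutine(object : Continuation<Unit> {
        override val context: CoroutineContext = ExecutorInterceptor(executor)
        override fun resumeWith(result: Result<Unit>) { done.countDown() }
    })

    done.await()
    executor.shutdown()
    return seen.toList()
}
```

Although the coroutine is started from the calling thread and resumed from the io thread, both parts of the body run on sketch-worker, because the initial resume and the later one are both routed through the executor.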