Don't Use Kotlin's runBlocking Carelessly!

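Kotlin is the preferred language for Android development, so developers need to understand how it works under the hood.

One of Kotlin's most prominent features is coroutines: language-level support for asynchronous, non-blocking programming. They give developers a powerful tool for building efficient, responsive applications.

Coroutines in Kotlin are created with coroutine builders, functions designed specifically to start and manage coroutines. Common builders include launch, async, and runBlocking. Of these, runBlocking shows up often in the official Kotlin documentation to demonstrate coroutines, for example in a main() function: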


import kotlinx.coroutines.*

fun main() = runBlocking { // this: CoroutineScope
    launch { doWorld() }
    println("Hello")
}

// this is your first suspending function
suspend fun doWorld() {
    delay(1000L)
    println("World!")
}

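This sample comes from the official Kotlin documentation. If you just want a lightweight way to try coroutines out, runBlocking is the obvious choice, and many Android developers are already familiar with this usage.

So how does runBlocking work under the hood?

In this article we dig into its internals and, through a few examples (Android ones in particular), look at why it should be used with care.

Understanding

As the code above shows, the runBlocking usage in the official Kotlin documentation boils down to this template: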


fun main() = runBlocking {
  // call suspending functions here
}

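The documentation typically shows runBlocking inside the main function.

runBlocking is convenient because it is not itself a suspending function: you can call it from ordinary code without already being inside a coroutine scope. That may lead you to think: "Wow! I can call suspending functions without a coroutine scope and get the result directly. How convenient!"

Today, most developers know to launch coroutines safely with viewModelScope in Android, so that work is cancelled automatically along with the ViewModel's lifecycle.

When coroutines were first introduced, however, developers often made mistakes like this one: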


class MainViewModel(private val mainRepository: MainRepository) : ViewModel() {

  // Don't do this: runBlocking blocks the calling thread until the fetch finishes
  private fun fetchPosters() = runBlocking {
    mainRepository.fetchPosters()
  }
}

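This approach appears to work without noticeable delay, because fetching a small amount of data over the network is usually fast (especially during testing, when you are close to the office and many test services are on the intranet).

So why is it a problem?

According to the official documentation, runBlocking starts a new coroutine and blocks the current thread until that coroutine completes.

Yes, that's right: runBlocking blocks the current thread!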
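A quick way to see this on a plain JVM, outside Android, is to time a runBlocking call and record which thread ran its body. This is only a minimal sketch: the helper name blockingProbe is made up for illustration, and the measured time is approximate.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Hypothetical helper: returns (did the body run on the caller's thread?,
// roughly how many milliseconds the caller was blocked).
fun blockingProbe(): Pair<Boolean, Long> {
    val caller = Thread.currentThread()
    var sameThread = false
    val blockedMs = measureTimeMillis {
        runBlocking {
            // With no dispatcher given, the body runs on the thread that called runBlocking.
            sameThread = Thread.currentThread() === caller
            delay(300)
        }
    }
    return sameThread to blockedMs
}

fun main() {
    val (sameThread, blockedMs) = blockingProbe()
    println("same thread: $sameThread, caller blocked for about $blockedMs ms")
}
```

The caller gets control back only after the delay has elapsed, which is exactly what you do not want on a UI thread.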

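On Android, the main thread is responsible for rendering the screen and handling UI interaction. If the main thread is blocked, or is used for work such as I/O, the screen can freeze and the app may even trigger an Application Not Responding (ANR) error.

Using runBlocking in Android UI code for I/O work such as querying a database or fetching data from the network is therefore very risky and should be avoided, even when the I/O itself runs on an I/O thread.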
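For contrast, here is a minimal sketch of the non-blocking shape. On Android you would normally call viewModelScope.launch; to keep the example runnable off-device it uses a plain CoroutineScope as a stand-in. FakeRepository, PosterPresenter, and clear() are hypothetical names, not part of any library.

```kotlin
import kotlinx.coroutines.*

// Hypothetical stand-in for the article's MainRepository.
class FakeRepository {
    suspend fun fetchPosters(): List<String> {
        delay(200) // pretend network latency
        return listOf("poster-1", "poster-2")
    }
}

// Plain-JVM stand-in for a ViewModel; `scope` plays the role of viewModelScope.
class PosterPresenter(private val repository: FakeRepository) {
    private val scope = CoroutineScope(SupervisorJob() + Dispatchers.Default)

    @Volatile
    var posters: List<String> = emptyList()
        private set

    // Returns immediately; the caller's thread is never blocked.
    fun fetchPosters(): Job = scope.launch {
        posters = repository.fetchPosters()
    }

    // Plays the role of ViewModel.onCleared(): cancels any in-flight work.
    fun clear() = scope.cancel()
}

fun main() = runBlocking {
    val presenter = PosterPresenter(FakeRepository())
    val job = presenter.fetchPosters()
    println("right after the call: ${presenter.posters}")
    job.join() // only the demo waits; real UI code would observe state instead
    println("after completion: ${presenter.posters}")
    presenter.clear()
}
```

The fetch still takes just as long, but the thread that asked for it is free to keep rendering while it runs.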

So how does runBlocking actually block the current thread?

A Look Inside

Let's look inside runBlocking to see how it works under the hood:


public actual fun <T> runBlocking(context: CoroutineContext, block: suspend CoroutineScope.() -> T): T {
    contract {
        callsInPlace(block, InvocationKind.EXACTLY_ONCE)
    }
    val currentThread = Thread.currentThread()
    val contextInterceptor = context[ContinuationInterceptor]
    val eventLoop: EventLoop?
    val newContext: CoroutineContext
    if (contextInterceptor == null) {
        // create or use private event loop if no dispatcher is specified
        eventLoop = ThreadLocalEventLoop.eventLoop
        newContext = GlobalScope.newCoroutineContext(context + eventLoop) 
    } else {
        // See if context's interceptor is an event loop that we shall use (to support TestContext)
        // or take an existing thread-local event loop if present to avoid blocking it (but don't create one)
        eventLoop = (contextInterceptor as? EventLoop)?.takeIf { it.shouldBeProcessedFromContext() }
            ?: ThreadLocalEventLoop.currentOrNull()
        newContext = GlobalScope.newCoroutineContext(context)
    }
    val coroutine = BlockingCoroutine<T>(newContext, currentThread, eventLoop)
    coroutine.start(CoroutineStart.DEFAULT, coroutine, block)
    return coroutine.joinBlocking()
}

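If you read the implementation, you will see that runBlocking builds the new coroutine's context with GlobalScope.newCoroutineContext and, when no dispatcher is specified, runs the coroutine on the current thread's event loop.

It then creates a BlockingCoroutine instance. A look at BlockingCoroutine, and at its joinBlocking method in particular, shows that this method occupies the current thread until all the work is done.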


fun joinBlocking(): T {
    registerTimeLoopThread()
    try {
        eventLoop?.incrementUseCount()
        try {
            while (true) {
                val parkNanos = eventLoop?.processNextEvent() ?: Long.MAX_VALUE
                // note: process next even may loose unpark flag, so check if completed before parking
                if (isCompleted) break
                parkNanos(this, parkNanos)
                if (Thread.interrupted()) cancelCoroutine(InterruptedException())
            }
        } finally { // paranoia
            eventLoop?.decrementUseCount()
        }
    } finally { // paranoia
        unregisterTimeLoopThread()
    }
    // now return result
    val state = this.state.unboxState()
    (state as? CompletedExceptionally)?.let { throw it.cause }
    return state as T
}

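As the code shows, BlockingCoroutine blocks the current thread with a while (true) loop. It keeps processing events from the current thread's event loop, parking the thread whenever there is nothing to process, and breaks out of the loop only once the coroutine has completed.

It is a synchronous call: the current thread stays blocked until the launched coroutine finishes.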
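The loop is easier to follow in miniature. The toy model below is not the library's code: ToyBlockingLoop, post, and complete are invented names, and it uses only the JDK. It reproduces the same shape: process an event, check for completion, otherwise park the thread.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue
import java.util.concurrent.locks.LockSupport

// Toy model only. Must be created on the thread that will call joinBlocking().
class ToyBlockingLoop {
    private val owner: Thread = Thread.currentThread()
    private val queue = ConcurrentLinkedQueue<() -> Unit>()
    @Volatile private var completed = false

    // Enqueue an event and wake the owner thread if it is parked.
    fun post(task: () -> Unit) {
        queue.add(task)
        LockSupport.unpark(owner)
    }

    // Mark the "coroutine" as finished and wake the owner thread.
    fun complete() {
        completed = true
        LockSupport.unpark(owner)
    }

    // Occupies the calling thread until complete() has been called.
    fun joinBlocking() {
        while (true) {
            queue.poll()?.invoke()                      // process one pending event, if any
            if (completed) break                        // check completion before parking
            if (queue.isEmpty()) LockSupport.park(this) // nothing to do: park the thread
        }
    }
}

fun runToy(): List<String> {
    val log = mutableListOf<String>()
    val loop = ToyBlockingLoop()
    loop.post { log.add("first") }
    Thread {
        Thread.sleep(100)
        loop.post { log.add("second"); loop.complete() }
    }.start()
    loop.joinBlocking() // the calling thread is stuck here until complete() runs
    log.add("unblocked")
    return log
}

fun main() {
    println(runToy()) // [first, second, unblocked]
}
```

Both posted tasks run on the blocked thread itself, which is why nothing else on that thread can make progress in the meantime.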

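runBlocking must therefore be used with care so that it never blocks the Android main thread. Otherwise it can cause an ANR and seriously hurt the app's performance and user experience.

Pitfalls

Let's work through some sample code to see why careless use of runBlocking causes problems on Android.

Case 1

You may be wondering what happens if you launch the coroutine with the dispatcher set to Dispatchers.IO, as in the following example: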


// onCreate and onResume also log; that code is omitted here and in later samples

fun sample1() = runBlocking(Dispatchers.IO) {
  val currentThread = Thread.currentThread()
  Log.d("tag_main", "currentThread: $currentThread")
  delay(3000) // suspend for 3 seconds
  Log.d("tag_main", "job completed")
}

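It looks as if everything should work as expected, since we have switched to Dispatchers.IO to run the coroutine on a background thread.

When you run the function, it logs something like this: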


09:38:20.304 10614-10614 onCreate
09:38:20.319 10614-10713 currentThread: Thread[DefaultDispatcher-worker-1,5,main]
09:38:23.322 10614-10713 job completed
09:38:23.389 10614-10614 tag_main onResume

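The log makes it clear that 3 seconds pass before job completed is printed.

Although delay(3000) runs on a worker thread (the middle two lines show a thread ID different from the main thread's), the main thread is still blocked waiting for the coroutine to finish: onResume is not logged until the job completes. The entire UI therefore freezes for 3 seconds and the app is unresponsive during that time.

So using runBlocking with Dispatchers.IO to run the coroutine on another thread does not give you truly asynchronous behavior.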
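The same effect can be reproduced off-device. In this minimal sketch (ioProbe is a made-up helper name), the body runs on a different thread, yet the caller still waits for the whole delay.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import kotlin.system.measureTimeMillis

// Hypothetical helper: returns (name of the thread that ran the body,
// roughly how many milliseconds the caller was blocked).
fun ioProbe(): Pair<String, Long> {
    var bodyThread = ""
    val blockedMs = measureTimeMillis {
        runBlocking(Dispatchers.IO) {
            bodyThread = Thread.currentThread().name // a dispatcher worker, not the caller
            delay(300)
        }
    }
    return bodyThread to blockedMs
}

fun main() {
    val (bodyThread, blockedMs) = ioProbe()
    println("caller: ${Thread.currentThread().name}, body ran on: $bodyThread")
    println("caller was still blocked for about $blockedMs ms")
}
```

Moving the body to another dispatcher changes where the work runs, not whether the calling thread waits for it.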

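Case 2

What happens if you pass Dispatchers.Main to runBlocking instead of Dispatchers.IO? Since runBlocking called from the main thread already runs its body on the main thread by default, in theory the following code should work as expected: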


fun sample2() = runBlocking(Dispatchers.Main) {
  val currentThread = Thread.currentThread()
  Log.d("tag_main", "currentThread: $currentThread")
  delay(3000)
  Log.d("tag_main", "job completed")
}

If you run it, however, this is all you see in the log:


09:40:29.944 11218-11218 onCreate

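Don't bother waiting: the app is already stuck.

Interestingly, even the log message about the current thread is never printed, which shows that the function is blocked inside runBlocking(Dispatchers.Main) before the body ever runs. The UI also stays frozen and no layout is rendered on screen.

This happens because runBlocking blocks the main thread while it waits for the new coroutine, but the coroutine is dispatched to Dispatchers.Main and can therefore only run on that same main thread. Since the main thread is already occupied by runBlocking, it can never process the coroutine, and the result is a complete deadlock.

The main thread stays blocked indefinitely, which is even worse than the Dispatchers.IO case.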
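Dispatchers.Main is not available on a plain JVM, but the same deadlock can be sketched with a single-threaded executor standing in for the main thread. This is an assumption-laden model, not Android's dispatcher: the thread name fake-main and the function deadlocksOnSingleThread are invented, and the worker is a daemon thread so the stuck thread does not keep the JVM alive.

```kotlin
import kotlinx.coroutines.asCoroutineDispatcher
import kotlinx.coroutines.runBlocking
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Returns true if the task is still stuck after the timeout.
fun deadlocksOnSingleThread(): Boolean {
    // One thread plays the role of the main thread.
    val executor = Executors.newSingleThreadExecutor { r ->
        Thread(r, "fake-main").apply { isDaemon = true }
    }
    val dispatcher = executor.asCoroutineDispatcher()

    // Running ON that single thread, block it with runBlocking while asking
    // the same thread (through its dispatcher) to run the coroutine body.
    val future = executor.submit(Runnable {
        runBlocking(dispatcher) {
            println("never printed")
        }
    })

    return try {
        future.get(500, TimeUnit.MILLISECONDS)
        false
    } catch (e: TimeoutException) {
        true // the body is queued behind the very task that is waiting for it
    }
}

fun main() {
    println("deadlocked: ${deadlocksOnSingleThread()}")
}
```

The body sits in the executor's queue behind the task that is waiting for it, so neither can ever finish.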

Case 3

Now let's look at another scenario. Since runBlocking blocks the current thread, what happens if we call it on a worker thread? Consider the following sample:


fun sample3() = CoroutineScope(Dispatchers.IO).launch {
    // The current thread is an IO thread, so runBlocking blocks that IO thread.
    Log.d("tag_main", "out currentThread: ${Thread.currentThread()}")
    val result = runBlocking {
        // Still on the IO thread
        Log.d("tag_main", "inner currentThread: ${Thread.currentThread()}")
        // Suspend for 3000 ms
        delay(3000)
        // Work finished
        Log.d("tag_main", "job completed")
    }
    Log.d("tag_main", "Result: $result")
}

If you run this function, you will see the following log:


09:44:06.864 11409-11409 onCreate
09:44:06.879 11409-11454 out currentThread: Thread[DefaultDispatcher-worker-1,5,main]
09:44:06.881 11409-11454 inner currentThread: Thread[DefaultDispatcher-worker-1,5,main]
09:44:06.945 11409-11409 onResume
09:44:09.885 11409-11454 job completed
09:44:09.886 11409-11454 Result: 1

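Creating the scope with CoroutineScope(Dispatchers.IO) moves the thread that calls runBlocking to a worker thread, so runBlocking blocks only that worker. That is why onCreate and onResume are unaffected in the log.

Execution still takes 3 seconds, but it happens entirely on the worker thread, so the main thread is never blocked and the UI does not freeze.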

The Right Way

So when is it safe to use runBlocking?

The official documentation says:

it is designed to bridge regular blocking code to libraries that are written in suspending style, to be used in main functions and in tests.

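In other words, when you need to call a suspending function from an ordinary, non-suspending function, you can use runBlocking; it blocks the ordinary function until the suspending function finishes.

Here are two common examples: unit tests and synchronous tasks.

Unit tests

One of the most common uses of runBlocking is in unit test code, where it is often used to test suspending functions or coroutine-based code in a blocking manner, as in the following example: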


private fun awaitUntil(timeoutSeconds: Long, predicate: () -> Boolean) {
    runBlocking {
        val timeoutMs = timeoutSeconds * 1_000
        var waited = 0L
        while (waited < timeoutMs) {
            if (predicate()) {
                return@runBlocking
            }

            delay(100)
            waited += 100
        }

        throw AssertionError("Predicate was not fulfilled within ${timeoutMs}ms")
    }
}

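This approach is especially useful in synchronous test environments where the coroutine context is under your control, because it keeps assertions predictable.

In unit tests, however, runTest is usually the better choice. On the JVM and Native platforms it behaves much like runBlocking, with the added advantage of skipping delays. You can call delay without lengthening the test run, which makes tests faster.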
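To illustrate the delay-skipping behaviour, here is a minimal sketch that assumes the kotlinx-coroutines-test artifact is on the classpath. The names slowAnswer and virtualTimeDemo are hypothetical, and the wall-clock figure will vary by machine.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.test.runTest
import kotlin.system.measureTimeMillis

// Hypothetical suspending function under test: ten seconds of (virtual) waiting.
suspend fun slowAnswer(): Int {
    delay(10_000)
    return 42
}

// Returns the computed answer and the real time the run took.
fun virtualTimeDemo(): Pair<Int, Long> {
    var answer = 0
    val wallClockMs = measureTimeMillis {
        // runTest advances virtual time instead of really waiting for delay().
        runTest {
            answer = slowAnswer()
        }
    }
    return answer to wallClockMs
}

fun main() {
    val (answer, wallClockMs) = virtualTimeDemo()
    println("answer = $answer, real time spent: about $wallClockMs ms")
}
```

In a real project the runTest call would be the body of a test method; it is invoked from main here only to keep the sketch self-contained.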

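Synchronous tasks

The second use case for runBlocking arises when you can be certain that the operation runs on an I/O thread. Because runBlocking blocks the current thread until the coroutine finishes, it can be appropriate for synchronous work on an I/O thread, where blocking is acceptable.

For example, the Stream Video SDK uses runBlocking to implement socket reconnection, because once the socket has fully disconnected, its publisher and subscriber must be closed properly. The SDK takes care that the prepareRejoin method runs only on an I/O thread, to preserve thread safety and reliability.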
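The general pattern can be sketched as follows. This is not the Stream Video SDK's code: closeConnection and closeOnWorker are hypothetical names, and a JDK executor stands in for whatever I/O thread the real code runs on.

```kotlin
import kotlinx.coroutines.delay
import kotlinx.coroutines.runBlocking
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Future

// Hypothetical suspending cleanup step.
suspend fun closeConnection(): String {
    delay(200) // pretend the teardown takes a while
    return "closed"
}

// Runs the suspending cleanup synchronously, but on a worker thread:
// runBlocking blocks that worker, never the thread that calls closeOnWorker.
fun closeOnWorker(worker: ExecutorService): Future<String> =
    worker.submit(Callable { runBlocking { closeConnection() } })

fun main() {
    val worker = Executors.newSingleThreadExecutor()
    val pending = closeOnWorker(worker)
    println("caller keeps going while the worker is blocked")
    println("worker result: ${pending.get()}")
    worker.shutdown()
}
```

The guarantee that matters is which thread calls runBlocking; once that is pinned to a worker, the blocking is contained there.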

Alternatively, you can use Job.join() instead of runBlocking to wait for a coroutine.

Let's take a closer look at Job.join().

Start with launch(), which returns a Job instance, as shown here:


fun nonBlockingSample() = CoroutineScope(Dispatchers.IO).launch {
  // The current thread is an IO thread; launch does not block it.
  Log.d("tag_main", "out currentThread: ${Thread.currentThread()}")
  val result = launch {
    Log.d("tag_main", "inner currentThread: ${Thread.currentThread()}") // also an IO thread
    delay(3000)
    Log.d("tag_main", "end launch")
  }
  Log.d("tag_main", "job completed: $result")
}


09:47:23.311 11586-11586 onCreate
09:47:23.329 11586-11637 out currentThread: Thread[DefaultDispatcher-worker-1,5,main]
09:47:23.330 11586-11637 job completed: StandaloneCoroutine{Active}@6fc75b1
09:47:23.330 11586-11639 inner currentThread: Thread[DefaultDispatcher-worker-3,5,main]
09:47:23.392 11586-11586 onResume
09:47:26.340 11586-11639 end launch

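Notice that the job completed message is printed before the end launch message.

This follows from how launch() works: it starts a new coroutine without blocking the current thread and returns a Job that refers to it, so the coroutine runs asynchronously while the code after it carries on. But what if you want the outer coroutine to wait for that job to finish?

The solution is Job.join(), which suspends the calling coroutine until the job completes. It resumes normally once the job has completed for any reason, provided the calling coroutine's own Job is still active. You can use Job.join() for synchronization as shown below: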


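fun suspending() = CoroutineScope(Dispatchers.IO).launch {
  // The current thread is an IO thread; join() suspends this coroutine rather than blocking the thread.
  Log.d("tag_main", "out currentThread: ${Thread.currentThread()}")
  val result = launch {
    Log.d("tag_main", "inner currentThread: ${Thread.currentThread()}") // also an IO thread
    delay(3000)
    Log.d("tag_main", "end launch")
  }
  // Suspends the coroutine until this job is complete. The call resumes normally (without
  // an exception) when the job completes for any reason and the calling coroutine's Job is
  // still active. It also starts the corresponding coroutine if the job is still in the new state.
  result.join()
  Log.d("tag_main", "job completed: $result")
}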

If you run the sample above, you will see the following log messages:


09:49:14.935 11765-11765 onCreate
09:49:14.951 11765-11824 out currentThread: Thread[DefaultDispatcher-worker-1,5,main]
09:49:14.953 11765-11826 inner currentThread: Thread[DefaultDispatcher-worker-3,5,main]
09:49:15.016 11765-11765 onResume
09:49:17.964 11765-11826 end launch
09:49:17.969 11765-11826 job completed: StandaloneCoroutine{Completed}@c7151ae

As the log shows, the end launch message appears after 3 seconds, followed by the job completed message.
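The two orderings can be compared side by side on a plain JVM. In this minimal sketch, eventOrder is a made-up helper, and runBlocking appears only as the host for the demo, in the spirit of its main-function and test use.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking
import java.util.Collections

// Records the order of events with or without joining the child job.
fun eventOrder(waitForChild: Boolean): List<String> = runBlocking {
    val events = Collections.synchronizedList(mutableListOf<String>())
    val parent = launch(Dispatchers.IO) {
        val child = launch {
            delay(200)
            events.add("end launch")
        }
        if (waitForChild) child.join() // suspends this coroutine; no thread is blocked
        events.add("job completed")
    }
    parent.join() // a parent job completes only after its children have completed
    events.toList()
}

fun main() {
    println(eventOrder(waitForChild = false)) // [job completed, end launch]
    println(eventOrder(waitForChild = true))  // [end launch, job completed]
}
```

Because join() suspends instead of blocking, the worker thread is free to run other coroutines while the outer coroutine waits.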

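In summary

In this article we explored why runBlocking should be used with care, especially on Android.

Coroutines have become hugely popular in recent years, and one of the most widely used tools, because they handle asynchronous work at the language level. Using them effectively in a project, though, requires understanding exactly what they do, how they work internally, and how to apply them correctly.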