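Is Kotlin Sequence Really That Bad?

Hi everyone, have you eaten today? What did you have? A while back (February 28, 2025), the official Kotlin WeChat account published its February tech digest, "Kotlin 技术月报 | 2025 年 2 月" (Kotlin Monthly Tech Report | February 2025). One of the articles it mentioned caught my attention: Should you use Kotlin Sequences for Performance?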

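The original article

Why did this article make me curious? Because its conclusion is very counterintuitive. As the title suggests, it looks at Kotlin's Sequence and its performance. The interesting part is the final conclusion: the larger the data set and the more intermediate operations there are, the less efficient Sequence becomes.

Borrowing the code from the original article, suppose we have the following: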


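data class DbModel(val id: Int, val isEnabled: Boolean)
data class UiModel(val id: Int)

object Db {
    // Placeholder data source (an assumption; the original omits the body).
    fun getItems(): List<DbModel> =
        List(100) { DbModel(id = it, isEnabled = it % 2 == 0) }
}

fun getItemsList(): List<UiModel> {
    return Db.getItems()
        // The calls below are the "intermediate operations";
        // each one eagerly builds a new List.
        .filter { it.isEnabled }
        .map { UiModel(it.id) }
}

fun getItemsListUsingSequence(): List<UiModel> {
    return Db.getItems()
        .asSequence()
        // The calls below are the "intermediate operations";
        // they are lazy and run only when toList() is called.
        .filter { it.isEnabled }
        .map { UiModel(it.id) }
        .toList()
}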

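According to the original article's conclusion, the more elements there are in the List returned by Db.getItems, the less efficient Sequence is; and the more intermediate operations there are, the less efficient Sequence is.

Here are the relevant excerpts from the original article: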

Benchmark Results: Sequences can be slow

Here's the results on my MacBook Pro M1, running on Temurin JDK 21:

Test	Operations per second
List	1,636,222
Sequence	1,491,436
Flow	1,192,928

So here's me eating humble pie: using a sequence for simple chained operations is about 9% slower than not.

So I went ahead and tweaked each function to be more extreme, and perform a bunch more filtering and mapping. I count 7 intermediary collections created in this example, but the Flow and Sequence versions should still be creating zero. With this in mind, I expected the sequence version to pull ahead...
fun getItemsList(): List<UiModel> {
  return Db.getItems()
      .filter { it.isEnabled }
      .map { UiModel(it.id) }
      .filter { true }
      .map { UiModel(it.id) }
      .filter { true }
      .map { UiModel(it.id) }
      .filter { true }
      .map { UiModel(it.id) }
}
Test	Operations per second
List	663,391
Sequence	364,947
Flow	671,243

Lessons Learnt

Sequences can be slower due to per-element function call overhead. I'd go as far to say that they are nearly always slower today. The more complex your operation, the higher the cost.

Flows can optimize some chained operations better than expected, but don't use them for that. Use them for their asynchronousity.

Collections are often the best fastest choice for performance.

Eating humble pie 🥧

Apologies for the many times I've asked my coworkers in code reviews to use asSequence to improve performance.

...

Update 2: Large data set

As there were few people commenting to the effect of "that list is too small", I re-ran the benchmark using 100,000 items (instead of 100). The differences grew...

Test	Operations per second
List	623
Sequence	245
Flow	792
ImmutableArray	757

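Raising doubts

If you know Sequence reasonably well, you will know that it is a lazily evaluated iterable type; you can think of it as roughly the counterpart of Stream in Java.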

Along with collections, the Kotlin standard library contains another type -- sequences (Sequence). Unlike collections, sequences don't contain elements, they produce them while iterating. Sequences offer the same functions as Iterable but implement another approach to multi-step collection processing.

From the official documentation: Sequences
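To make "produce them while iterating" concrete, here is a minimal sketch that logs the order in which each stage sees each element. The helper names traceList and traceSequence are made up for this illustration and do not come from the original article or its benchmark.

```kotlin
// Records the order in which the pipeline stages see each element.
fun traceList(items: List<Int>): List<String> {
    val log = mutableListOf<String>()
    items
        .filter { log += "filter $it"; it % 2 == 1 } // runs over the whole list first
        .map { log += "map $it"; it * 10 }           // then runs over the filtered list
    return log
}

fun traceSequence(items: List<Int>): List<String> {
    val log = mutableListOf<String>()
    items.asSequence()
        .filter { log += "filter $it"; it % 2 == 1 }
        .map { log += "map $it"; it * 10 }
        .toList() // terminal operation: nothing above runs until this call
    return log
}

fun main() {
    // [filter 1, filter 2, filter 3, map 1, map 3]
    println(traceList(listOf(1, 2, 3)))
    // [filter 1, map 1, filter 2, filter 3, map 3]
    println(traceSequence(listOf(1, 2, 3)))
}
```

The List version finishes each stage before starting the next, while the Sequence version pushes one element through every stage before touching the next element.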

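This is exactly why the original conclusion is hard to accept. If Sequence were far less efficient than List across the board, and especially for complex intermediate operations on large amounts of data, wouldn't Sequence be useless apart from reducing memory usage? And IDEA would have even less reason to proactively suggest asSequence in these cases as a performance optimization.

There are other oddities too. For example, with 100,000 elements, Flow comes out about 27% faster than List (792 vs. 623 ops/s), and is even the fastest of the four benchmarks.

With a touch of skepticism, I decided to try it myself.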

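The original benchmark

The good news is that the original author kindly provided the benchmark code he used: gist.github.com/chrisbanes/...

After copying the code, my first impression was that something in it was not quite right. I'll set that aside for now and first run it exactly as written. The results were just what I expected: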


Benchmark                                    (size)   Mode  Cnt        Score       Error  Units
OriginalBenchmark.flow                          N/A  thrpt   10  1197880.963 ± 14766.372  ops/s
OriginalBenchmark.list                          N/A  thrpt   10  1344318.180 ± 38601.373  ops/s
OriginalBenchmark.sequence                      N/A  thrpt   10  1572128.363 ± 10080.851  ops/s

Test	ops/s
flow	1,197,880
list	1,344,318
sequence	1,572,128

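In my local run, with 100 elements, the ranking is Sequence (+17%) > List (+12%) > Flow. In other words, Sequence is about 17% faster than List, and List is about 12% faster than Flow, which is exactly the opposite of the original post's conclusion.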
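For reference, the percentages in these rankings are plain throughput ratios. A minimal sketch of the arithmetic, using the ops/s scores from the run above (speedupPercent is a helper name invented for this example):

```kotlin
// Relative throughput advantage of `faster` over `slower`, in percent.
fun speedupPercent(faster: Double, slower: Double): Double =
    (faster / slower - 1.0) * 100.0

fun main() {
    // Scores (ops/s) copied from the OriginalBenchmark output above.
    val flow = 1_197_880.963
    val list = 1_344_318.180
    val sequence = 1_572_128.363

    // Roughly 17% and 12% respectively.
    println("Sequence vs List: " + speedupPercent(sequence, list))
    println("List vs Flow:     " + speedupPercent(list, flow))
}
```

The same formula is what the "(+N%)" annotations in the later rankings stand for: each entry's margin over the entry to its right.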

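A simple improvement

Running the original benchmark as-is already contradicts the original post's conclusion, but let's go back to the problem I mentioned earlier. What I found questionable is that the original benchmark includes the construction of the List inside the measured region. Building the test List is clearly not what we want to measure, so it may well skew the final results.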
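The idea can be sketched in plain Kotlin as follows. This is only an illustration of keeping fixture construction out of the timed region, not the code from either gist; buildItems and pipeline are hypothetical names, and a single measureNanoTime call is no substitute for a proper JMH run with warmup (in JMH, the usual way to do this is to build the fixture in a setup method of a state class).

```kotlin
import kotlin.system.measureNanoTime

data class DbModel(val id: Int, val isEnabled: Boolean)
data class UiModel(val id: Int)

// Hypothetical fixture builder; the element layout is an assumption.
fun buildItems(size: Int): List<DbModel> =
    List(size) { DbModel(id = it, isEnabled = it % 2 == 0) }

// The code we actually want to measure.
fun pipeline(items: List<DbModel>): List<UiModel> =
    items.asSequence()
        .filter { it.isEnabled }
        .map { UiModel(it.id) }
        .toList()

fun main() {
    // Variant A: list construction is inside the timed region,
    // so its cost is mixed into the number we read.
    val withSetup = measureNanoTime { pipeline(buildItems(100_000)) }

    // Variant B: build the fixture once, then time only the pipeline.
    val items = buildItems(100_000)
    val withoutSetup = measureNanoTime { pipeline(items) }

    println("with setup: $withSetup ns, without setup: $withoutSetup ns")
}
```

When the pipeline itself is cheap, the construction cost in variant A can be a large share of the measured time, which shrinks the visible difference between implementations or shifts it.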

So I made a simple modification in gist.github.com/ForteScarle... and ran the tests again:


Benchmark                                    (size)   Mode  Cnt        Score        Error  Units
big_operation:flow                             100  thrpt   10   158615.807 ±   6689.494  ops/s
big_operation:list                             100  thrpt   10   309514.040 ±  10804.965  ops/s
big_operation:sequence                         100  thrpt   10   297239.770 ±   5363.794  ops/s
big_operation:flow                          100000  thrpt   10      198.088 ±      9.203  ops/s
big_operation:list                          100000  thrpt   10      248.923 ±      7.346  ops/s
big_operation:sequence                      100000  thrpt   10      304.232 ±      4.394  ops/s
small_operation:flow                           100  thrpt   10  1343646.075 ±  57013.270  ops/s
small_operation:list                           100  thrpt   10  1905597.088 ± 122824.654  ops/s
small_operation:sequence                       100  thrpt   10  2333803.105 ± 146253.368  ops/s
small_operation:flow                        100000  thrpt   10     2231.133 ±     38.463  ops/s
small_operation:list                        100000  thrpt   10     1876.287 ±     27.674  ops/s
small_operation:sequence                    100000  thrpt   10     2413.689 ±     48.375  ops/s

These break down into four sets of benchmark results.

More intermediate operations, 100 elements:

Test	ops/s
flow	158,615
list	309,514
sequence	297,239

Conclusion: List (+4%) > Sequence (+87%) > Flow

More intermediate operations, 100,000 elements:

Test	ops/s
flow	198
list	248
sequence	304

Conclusion: Sequence (+22%) > List (+26%) > Flow

Fewer intermediate operations, 100 elements:

Test	ops/s
flow	1,343,646
list	1,905,597
sequence	2,333,803

Conclusion: Sequence (+22%) > List (+42%) > Flow

Fewer intermediate operations, 100,000 elements:

Test	ops/s
flow	2,231
list	1,876
sequence	2,413

Conclusion: Sequence (+8%) > Flow (+19%) > List

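Final summary

Whether I use the original author's benchmark or my modified one, the results point the same way: the more elements or the more intermediate operations there are, the more Sequence pays off. In my runs Sequence was the fastest in every configuration except many operations on 100 elements, where List was ahead by about 4%.

This conclusion matches intuition and is backed by benchmark results, so it is not hard to understand.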

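Intermediate operations on a Sequence generally do not produce any extra lists. Take the 100,000-element case: three intermediate operations on a List mean three separate traversals, up to 3*100,000 element iterations in total, and a new List is allocated at each step (three in all, the last one being the result). A Sequence needs only one traversal of the source, and the only List allocated is the one built by the final toList(). That saves memory and also means fewer passes over the data.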
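As a rough mental model, the two styles behave like the hand-written loops below. This is a conceptual sketch only: the real Sequence implementation chains iterator wrappers rather than fusing everything into one loop, and the function names and the filter/map steps are chosen just for this illustration.

```kotlin
// Roughly what a List chain does: one pass and one new list per step.
fun listStyle(items: List<Int>): List<Int> {
    val filtered = ArrayList<Int>()
    for (x in items) if (x % 2 == 0) filtered.add(x) // pass 1, list 1
    val doubled = ArrayList<Int>()
    for (x in filtered) doubled.add(x * 2)           // pass 2, list 2
    val shifted = ArrayList<Int>()
    for (x in doubled) shifted.add(x + 1)            // pass 3, list 3 (the result)
    return shifted
}

// Roughly what a Sequence chain amounts to: one pass, one result list.
fun sequenceStyle(items: List<Int>): List<Int> {
    val result = ArrayList<Int>()
    for (x in items) {
        if (x % 2 == 0) result.add(x * 2 + 1)
    }
    return result
}

fun main() {
    val items = (1..6).toList()
    // Both match the corresponding standard-library pipelines.
    println(listStyle(items) == items.filter { it % 2 == 0 }.map { it * 2 }.map { it + 1 })
    println(sequenceStyle(items) ==
        items.asSequence().filter { it % 2 == 0 }.map { it * 2 }.map { it + 1 }.toList())
}
```

The single pass is not free: each element still goes through one function call per stage, which is the per-element overhead the original article points to.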

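Back to the title. My view and conclusion is this: Kotlin Sequence is not that bad. Used correctly, it really can bring a performance improvement. As for why the original blog post reached the opposite conclusion, honestly, I can't figure it out. I went to the original author's post on Bluesky and did not see anyone running the same test or raising any doubts.

I also left a comment with my differing view, but by then more than twenty days had passed since his original post, and I have not received a reply yet. So... this is one more unresolved mystery.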