Chapter 5: Concurrency and Race Conditions, Part 13: Fine- Versus Coarse-Grained Locking

Continuing from the previous part, Chapter 5: Concurrency and Race Conditions, Part 12: Locking Traps, let's go on.

Fine- Versus Coarse-Grained Locking

The first Linux kernel that supported multiprocessor systems was 2.0; it contained exactly one spinlock. The big kernel lock (BKL) turned the entire kernel into one large critical section; only one CPU could be executing kernel code at any given time. This lock solved the concurrency problem well enough to allow the kernel developers to address all of the other issues involved in supporting SMP (symmetric multiprocessing). But it did not scale very well. Even a two-processor system could spend a significant amount of time simply waiting for the big kernel lock. The performance of a four-processor system was not even close to that of four independent machines.

So, subsequent kernel releases have included finer-grained locking. In 2.2, one spinlock controlled access to the block I/O subsystem; another worked for networking, and so on. A modern kernel can contain thousands of locks, each protecting one small resource. This sort of fine-grained locking can be good for scalability; it allows each processor to work on its specific task without contending for locks used by other processors. Very few people miss the big kernel lock.

Fine-grained locking comes at a cost, however. In a kernel with thousands of locks, it can be very hard to know which locks you need, and in which order you should acquire them, to perform a specific operation. Remember that locking bugs can be very difficult to find; more locks provide more opportunities for truly nasty locking bugs to creep into the kernel. Fine-grained locking can bring a level of complexity that, over the long term, can have a large, adverse effect on the maintainability of the kernel.
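The ordering problem can be illustrated with a minimal user-space sketch. It uses POSIX mutexes as a stand-in for kernel locks, and every name in it (`struct res`, `lock_pair`, `transfer`, `run_ordering_demo`) is hypothetical. The rule it applies, always taking the lock at the lower address first, is one common convention assumed here for illustration; a real driver would document whatever order it chooses.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resource with its own lock (user-space stand-in). */
struct res {
    pthread_mutex_t lock;
    long units;
};

/* Take both locks in one fixed global order (lower address first), so two
 * threads that name the same pair in opposite order cannot deadlock. */
static void lock_pair(struct res *a, struct res *b)
{
    if ((uintptr_t)a > (uintptr_t)b) {
        struct res *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(&a->lock);
    pthread_mutex_lock(&b->lock);
}

static void unlock_pair(struct res *a, struct res *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

/* Move units between two distinct resources while holding both locks. */
static void transfer(struct res *from, struct res *to, long n)
{
    assert(from != to);
    lock_pair(from, to);
    from->units -= n;
    to->units += n;
    unlock_pair(from, to);
}

#define TRANSFERS 100000

static struct res res_a = { PTHREAD_MUTEX_INITIALIZER, 1000 };
static struct res res_b = { PTHREAD_MUTEX_INITIALIZER, 1000 };

static void *a_to_b(void *unused)
{
    (void)unused;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(&res_a, &res_b, 1);
    return NULL;
}

static void *b_to_a(void *unused)
{
    (void)unused;
    for (int i = 0; i < TRANSFERS; i++)
        transfer(&res_b, &res_a, 1);
    return NULL;
}

/* Run both directions concurrently; returns the combined unit count. */
long run_ordering_demo(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, a_to_b, NULL);
    pthread_create(&t2, NULL, b_to_a, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return res_a.units + res_b.units;
}
```

Calling `run_ordering_demo()` from a `main()` and building with `-pthread` is enough to try it. If `lock_pair` instead locked its arguments in the order given, the two threads could each hold one lock and wait forever for the other.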

Locking in a device driver is usually relatively straightforward; you can have a single lock that covers everything you do, or you can create one lock for every device you manage. As a general rule, you should start with relatively coarse locking unless you have a real reason to believe that contention could be a problem. Resist the urge to optimize prematurely; the real performance constraints often show up in unexpected places.
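As a rough illustration of the "one lock for every device" choice, here is a user-space sketch with POSIX mutexes standing in for kernel locks. The structure `struct demo_dev`, the function `demo_write`, and the constants are hypothetical and are not taken from any real driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NDEVS 4
#define WRITES_PER_THREAD 50000

/* Hypothetical per-device state; the lock covers every field after it. */
struct demo_dev {
    pthread_mutex_t lock;
    unsigned long bytes_written;
};

static struct demo_dev devs[NDEVS];

/* The coarser alternative would be a single driver-wide lock taken here
 * instead of dev->lock; the body of the function would stay the same. */
static void demo_write(struct demo_dev *dev, unsigned long count)
{
    pthread_mutex_lock(&dev->lock);
    dev->bytes_written += count;
    pthread_mutex_unlock(&dev->lock);
}

static void *writer(void *arg)
{
    struct demo_dev *dev = arg;

    for (int i = 0; i < WRITES_PER_THREAD; i++)
        demo_write(dev, 1);
    return NULL;
}

/* Two writer threads per device; returns the total across all devices. */
unsigned long run_per_device_demo(void)
{
    pthread_t threads[NDEVS * 2];
    unsigned long total = 0;

    for (int i = 0; i < NDEVS; i++) {
        pthread_mutex_init(&devs[i].lock, NULL);
        devs[i].bytes_written = 0;
    }
    for (int i = 0; i < NDEVS * 2; i++) {
        int rc = pthread_create(&threads[i], NULL, writer, &devs[i % NDEVS]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NDEVS * 2; i++)
        pthread_join(threads[i], NULL);
    for (int i = 0; i < NDEVS; i++)
        total += devs[i].bytes_written;
    return total;
}
```

With per-device locks, only the two threads sharing a device ever contend with each other. Switching to one driver-wide lock would keep the result correct but serialize all eight writers, which is often acceptable until measurement says otherwise.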

If you do suspect that lock contention is hurting performance, you may find the lockmeter tool useful. This patch (available at http://oss.sgi.com/projects/lockmeter/) instruments the kernel to measure time spent waiting in locks. By looking at the report, you are able to determine quickly whether lock contention is truly the problem or not.
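The idea of instrumenting a lock to measure waiting time can be sketched in user space as follows. This is not how lockmeter is implemented; it is a minimal analogy using a POSIX mutex and `clock_gettime`, and `struct timed_lock`, `timed_lock_acquire`, and `run_lock_timing_demo` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical instrumented lock: a mutex plus wait statistics. */
struct timed_lock {
    pthread_mutex_t mutex;
    uint64_t wait_ns;       /* total time spent in acquire calls */
    uint64_t acquisitions;  /* number of successful acquisitions */
};

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static void timed_lock_acquire(struct timed_lock *l)
{
    uint64_t start = now_ns();

    pthread_mutex_lock(&l->mutex);
    /* The statistics are protected by the mutex we now hold. */
    l->wait_ns += now_ns() - start;
    l->acquisitions++;
}

static void timed_lock_release(struct timed_lock *l)
{
    pthread_mutex_unlock(&l->mutex);
}

#define NTHREADS 4
#define ITERS 20000

static struct timed_lock big_lock = { PTHREAD_MUTEX_INITIALIZER, 0, 0 };
static unsigned long shared_counter;

static void *worker(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERS; i++) {
        timed_lock_acquire(&big_lock);
        shared_counter++;
        timed_lock_release(&big_lock);
    }
    return NULL;
}

/* Runs the workers and returns the number of recorded acquisitions;
 * big_lock.wait_ns then holds the accumulated waiting time. */
uint64_t run_lock_timing_demo(void)
{
    pthread_t threads[NTHREADS];

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    return big_lock.acquisitions;
}
```

Dividing `big_lock.wait_ns` by `big_lock.acquisitions` after a run gives an average cost per acquisition. The figure includes the uncontended cost of taking the mutex and of reading the clock, so it is only meaningful when compared against a single-threaded baseline.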

Supplementary notes:

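Coarse-grained versus fine-grained locking: key differences

Property	Coarse-grained locking	Fine-grained locking
Number of locks	Few (one to a handful)	Many (one per small resource)
Implementation complexity	Low (easy to design and maintain)	High (error-prone, hard to debug)
Likelihood of contention	High (large critical sections, frequent conflicts)	Low (small critical sections, few conflicts)
Scalability	Poor (a bottleneck on SMP systems)	Good (suits many cores and high concurrency)
Deadlock risk	Low (few locks, simple ordering)	High (many locks, complex ordering)
Lock operation overhead	Low (few acquire/release operations per task)	High (many acquire/release operations per task)
Memory overhead	Low (few lock structures)	High (many lock structures)
Typical use	Simple drivers, low concurrency, tightly coupled resources	High-performance drivers, high concurrency, separable resources
Debugging difficulty	Low (problems are localized)	High (problems are scattered, lock interactions are complex)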

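Choosing lock granularity in a device driver
- Getting started / simple drivers: prefer coarse-grained locking (for example, one spinlock or semaphore per device) to keep development and debugging costs low;
- High-performance / high-concurrency drivers: split locks into finer-grained ones only after measurement confirms that lock contention is degrading performance (for example, splitting a device-wide lock into a register lock, a data buffer lock, and so on);
- Avoid over-splitting: even with fine-grained locking, keep logically related resources under the same lock (for example, the resources of a single function share one lock) so that the number of locks does not explode.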
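The historical role of the big kernel lock (BKL)

The BKL was a compromise from the early days of the Linux kernel; its design goal was to get SMP support working quickly, not to deliver high performance. It was phased out over many releases and removed from the mainline kernel entirely in 2.6.39. New drivers cannot rely on it and must implement their own locking.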
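Alternatives to lockmeter

lockmeter is no longer maintained. On modern kernels, lock contention can be analyzed in the following ways:
- perf lock: a subcommand of the perf performance analysis tool, which ships in the kernel source tree; it can report lock wait times and contention counts;
- ftrace: traces lock acquisition and release, helping to locate code that holds a lock for too long or contends frequently;
- Dynamic debug (dynamic_debug): enables pr_debug() and dev_dbg() messages at run time; if you place such messages around your lock acquire and release paths, the resulting log helps you check the locking logic.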
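The risk of premature optimization

In driver development, splitting locks into finer-grained ones ahead of time to avoid contention tends to introduce more bugs (such as deadlocks and lock-ordering errors), and in most cases the optimization buys nothing, because the contention may never occur in practice. The better approach is to implement coarse-grained locking first, locate the bottleneck through performance testing, and then optimize that specific spot.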