linux tag: v6.8-rc1

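Split lock is one kind of bus lock, and it often hurts system performance. Intel CPUs provide a mechanism to detect it. When a split lock occurs, the kernel can print a warning or send SIGBUS, which prompts developers to fix the offending code.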

1. bus lock

1.1 Introduction to bus lock

SDM vol3, 9.1.2 Bus Locking

Intel 64 and IA-32 processors provide a LOCK# signal that is asserted automatically during certain critical memory operations to lock the system bus or equivalent link. Assertion of this signal is called a bus lock. While this output signal is asserted, requests from other processors or bus agents for control of the bus are blocked.

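In other words, while LOCK# is asserted, one processor holds the bus (or equivalent link) exclusively, and requests from other processors or bus agents to use it are blocked. Software requests a locked operation by placing the LOCK prefix in front of an instruction, for example: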

```c
/* Atomically increment *ptr_temp. "+m" tells the compiler the operand is read and written. */
asm volatile ("lock addl $1, %0"
              : "+m" (*ptr_temp));
```

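On most x86 processors, such as the P6 and newer families, LOCK# is generally not asserted when the memory accessed by a locked operation is cached inside the processor (that is, no bus lock occurs). In that case the lock is applied only to the processor's cache.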

The processor still generates a bus lock in the following two cases:

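split lock: a locked access that spans more than one cache line.
UC lock: a locked access to memory whose type is anything other than WB, including UC, WC, WP and WT. The most common case is a locked access to uncached (Strong Uncacheable) memory, hence the name UC lock.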
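The split lock condition comes down to a simple address check. The following is a minimal sketch that assumes 64-byte cache lines, which is typical of current x86 parts but not guaranteed. The helper name is_split_access() is hypothetical. It only checks whether an access of a given size crosses a cache-line boundary, which is what turns a locked access into a split lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

/*
 * Hypothetical helper: true if an access of `size` bytes (size >= 1)
 * starting at `addr` touches more than one cache line. A LOCK-prefixed
 * access with this shape would be a split lock.
 */
bool is_split_access(uint64_t addr, uint64_t size)
{
	uint64_t first_line = addr / CACHE_LINE_SIZE;
	uint64_t last_line = (addr + size - 1) / CACHE_LINE_SIZE;

	return first_line != last_line;
}
```

For example, a 4-byte counter at offset 62 of a 64-byte-aligned buffer covers bytes 62..65 and spans two lines. The same counter at offset 60 stays within one line.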

The memory types are described as follows (SDM vol3, 12.3 METHODS OF CACHING AVAILABLE):

Strong Uncacheable (UC) --- System memory locations are not cached. All reads and writes appear on the system bus and are executed in program order without reordering. This type of cache-control is useful for memory-mapped I/O devices.
Uncacheable (UC-) --- Has same characteristics as the strong uncacheable (UC) memory type, except that this memory type can be overridden by programming the MTRRs for the WC memory type.
Write Combining (WC) --- System memory locations are not cached (as with uncacheable memory) and coherency is not enforced by the processor's bus coherency protocol.
Write-through (WT) --- Writes and reads to and from system memory are cached. All writes are written to a cache line (when possible) and through to system memory.
Write-back (WB) --- Writes and reads to and from system memory are cached. Writes are performed entirely in the cache, when possible.
Write protected (WP) --- Reads come from cache lines when possible, and read misses cause cache fills.

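A locked access to uncached memory causes a bus lock because the processor must own the memory bus exclusively to guarantee atomicity. A locked access to cached data that crosses a cache-line boundary also locks the bus. The reason is explained in the article 深入剖析 split locks，i++ 可能导致的灾难 (An in-depth look at split locks: the disaster i++ can cause). In short, the split lock degrades into a bus lock.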

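To reduce the performance cost of bus locks, Intel introduced cache locking on P6 and later processors. With cache locking, the cache coherency protocol guarantees the atomicity and consistency of locked accesses from multiple CPU cores without locking the memory bus. However, the coherency protocol operates at cache-line granularity, so cache locking cannot cover an atomic operand that spans two cache lines. The processor then falls back to a bus lock to keep the data consistent, and that case is a split lock.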

1.2 bus lock disable

Because bus locks often hurt system performance, Intel processors provide two features that disable them: split-lock disable and UC-lock disable.

Support for each feature is enumerated in IA32_CORE_CAPABILITIES (MSR index CFH). Bit 5 indicates split-lock disable and bit 4 indicates UC-lock disable.

The features are then turned on or off through MSR_MEMORY_CTRL (MSR index 33H). Bit 28 controls UC-lock disable and bit 29 controls split-lock disable.

With split-lock disable enabled, a split lock raises #AC (alignment-check exception) instead of causing a bus lock.
With UC-lock disable enabled, a UC lock raises #GP (general-protection exception).
- Upcoming Intel Xeon processors based on the Sierra Forest or Grand Ridge microarchitecture will enumerate UC-lock disable through CPUID.(EAX=07H, ECX=2):EDX[bit 6] and will raise #AC instead.
- IA32_CORE_CAPABILITIES[4] and CPUID.(EAX=07H, ECX=2):EDX[bit 6] are never both 1. A processor enumerates the feature in only one of the two ways.
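The enumeration and control bits above can be expressed as plain bit arithmetic. This is only a sketch, with a few limits:

- The macro and function names are hypothetical.
- The functions operate on raw 64-bit MSR values that you would have to read separately. Reading these MSRs requires ring 0, for example through the Linux msr driver.
- The CPUID-based enumeration used by Sierra Forest and Grand Ridge is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the text; the macro names are illustrative. */
#define CORE_CAPS_UC_LOCK_DISABLE_SUPPORTED    (1ULL << 4)  /* IA32_CORE_CAPABILITIES[4] */
#define CORE_CAPS_SPLIT_LOCK_DISABLE_SUPPORTED (1ULL << 5)  /* IA32_CORE_CAPABILITIES[5] */
#define MEMORY_CTRL_UC_LOCK_DISABLE            (1ULL << 28) /* MSR_MEMORY_CTRL[28] */
#define MEMORY_CTRL_SPLIT_LOCK_DISABLE         (1ULL << 29) /* MSR_MEMORY_CTRL[29] */

/* Does a raw IA32_CORE_CAPABILITIES (MSR 0xCF) value advertise split-lock disable? */
bool split_lock_disable_supported(uint64_t core_caps)
{
	return (core_caps & CORE_CAPS_SPLIT_LOCK_DISABLE_SUPPORTED) != 0;
}

/* Same check for UC-lock disable (MSR-enumerated variant only). */
bool uc_lock_disable_supported(uint64_t core_caps)
{
	return (core_caps & CORE_CAPS_UC_LOCK_DISABLE_SUPPORTED) != 0;
}

/* New MSR_MEMORY_CTRL (MSR 0x33) value with split-lock disable on or off;
 * every other bit is preserved. */
uint64_t memory_ctrl_split_lock_disable(uint64_t memory_ctrl, bool on)
{
	if (on)
		return memory_ctrl | MEMORY_CTRL_SPLIT_LOCK_DISABLE;
	return memory_ctrl & ~MEMORY_CTRL_SPLIT_LOCK_DISABLE;
}
```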

1.3 More on `#AC`

Besides signaling split-lock disable, #AC is traditionally the user-space alignment-check exception (check alignment of data). This use is active when CR0.AM = 1, EFLAGS.AC = 1 and CPL = 3. So there are two kinds of #AC:

legacy alignment check #AC
split lock #AC

See SDM vol3, Interrupt 17---Alignment Check Exception (#AC).
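The rule for telling the two kinds of #AC apart can be sketched as a small classifier. It assumes split-lock disable is enabled and simplifies one case. When all three legacy conditions hold, the #AC is treated as a legacy alignment check, mirroring the kernel's own checks: a split access is always misaligned, so the legacy check takes precedence. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for the two kinds of #AC listed above. */
enum ac_kind { AC_LEGACY_ALIGNMENT, AC_SPLIT_LOCK };

/*
 * Assumes split-lock disable is enabled. Legacy alignment checking is armed
 * only when CR0.AM = 1, EFLAGS.AC = 1 and CPL = 3; if it is armed, the #AC is
 * treated as legacy, otherwise it can only be a split lock #AC.
 */
enum ac_kind classify_ac(bool cr0_am, bool eflags_ac, int cpl)
{
	if (cr0_am && eflags_ac && cpl == 3)
		return AC_LEGACY_ALIGNMENT;
	return AC_SPLIT_LOCK;
}
```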

2. split lock detection

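As noted above, split lock is one kind of bus lock. The #AC raised by split-lock disable can be used to detect split locks. At that point the bus lock has not happened yet, because #AC is a fault delivered before the instruction executes. If the #AC handler turns split-lock disable off, the bus lock happens only when the faulting instruction is re-executed after the handler returns.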

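User-mode bus locks can be detected with #DB. For bus locks acquired at CPL > 0, setting IA32_DEBUGCTL.BLD[bit 2] enables OS bus-lock detection. This raises a #DB (bus-lock detection debug exception) whenever such a bus lock occurs (see SDM vol3, 18.3.1.6 OS Bus-Lock Detection). Unlike the split lock #AC, this #DB is a trap: it is delivered after the instruction that caused the bus lock has executed.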
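The OS bus-lock detection condition can be sketched the same way. The bit position (IA32_DEBUGCTL bit 2) and the CPL > 0 rule come from the text. The helper names are hypothetical, and they operate on a raw DEBUGCTL value rather than on the real MSR.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBUGCTL_BLD (1ULL << 2)  /* IA32_DEBUGCTL.BLD, bit 2 */

/* Hypothetical helper: DEBUGCTL value with OS bus-lock detection turned on. */
uint64_t debugctl_enable_bld(uint64_t debugctl)
{
	return debugctl | DEBUGCTL_BLD;
}

/* With BLD set, a bus lock acquired at CPL > 0 raises a #DB trap;
 * bus locks at CPL 0 are not reported this way. */
bool bus_lock_raises_db(uint64_t debugctl, int cpl)
{
	return (debugctl & DEBUGCTL_BLD) != 0 && cpl > 0;
}
```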

2.1 The split_lock_detect parameter

Fenghua Yu submitted the split lock detection (SLD) and bus lock detection patches upstream, and they were merged into Linux some time ago. The kernel parameter split_lock_detect selects the detection level. The #AC and #DB handlers then implement the corresponding split lock detection and bus lock detection behavior.

```text
// Documentation/admin-guide/kernel-parameters.txt
// Documentation/arch/x86/buslock.rst
Software handling
=================
The kernel #AC and #DB handlers handle bus lock based on the kernel
parameter "split_lock_detect". Here is a summary of different options:
+------------------+----------------------------+-----------------------+
|split_lock_detect=|#AC for split lock          |#DB for bus lock       |
+------------------+----------------------------+-----------------------+
|off               |Do nothing                  |Do nothing             |
+------------------+----------------------------+-----------------------+
|warn              |Kernel OOPs                 |Warn once per task and |
|(default)         |Warn once per task, add a   |continue to run.       |
|                  |delay, add synchronization  |                       |
|                  |to prevent more than one    |                       |
|                  |core from executing a       |                       |
|                  |split lock in parallel.     |                       |
|                  |sysctl split_lock_mitigate  |                       |
|                  |can be used to avoid the    |                       |
|                  |delay and synchronization   |                       |
|                  |When both features are      |                       |
|                  |supported, warn in #AC      |                       |
+------------------+----------------------------+-----------------------+
|fatal             |Kernel OOPs                 |Send SIGBUS to user.   |
|                  |Send SIGBUS to user         |                       |
|                  |When both features are      |                       |
|                  |supported, fatal in #AC     |                       |
+------------------+----------------------------+-----------------------+
|ratelimit:N       |Do nothing                  |Limit bus lock rate to |
|(0 < N <= 1000)   |                            |N bus locks per second |
|                  |                            |system wide and warn on|
|                  |                            |bus locks.             |
+------------------+----------------------------+-----------------------+
```
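Here is a minimal sketch of how the values in this table could be parsed. It is a hypothetical stand-alone parser written for illustration, not the kernel's own command-line handling. It accepts only the four forms listed in the table and enforces 0 < N <= 1000 for ratelimit:N.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of the split_lock_detect= values above. */
enum sld_mode { SLD_OFF, SLD_WARN, SLD_FATAL, SLD_RATELIMIT };

struct sld_config {
	enum sld_mode mode;
	int ratelimit;          /* bus locks per second; only for SLD_RATELIMIT */
};

/* Returns false for input the table does not allow; *cfg is then unchanged. */
bool parse_split_lock_detect(const char *arg, struct sld_config *cfg)
{
	static const char prefix[] = "ratelimit:";
	const size_t plen = sizeof(prefix) - 1;

	if (strcmp(arg, "off") == 0) {
		cfg->mode = SLD_OFF;
	} else if (strcmp(arg, "warn") == 0) {
		cfg->mode = SLD_WARN;
	} else if (strcmp(arg, "fatal") == 0) {
		cfg->mode = SLD_FATAL;
	} else if (strncmp(arg, prefix, plen) == 0) {
		char *end;
		long n = strtol(arg + plen, &end, 10);

		/* Require a complete integer with 0 < N <= 1000. */
		if (end == arg + plen || *end != '\0' || n <= 0 || n > 1000)
			return false;
		cfg->mode = SLD_RATELIMIT;
		cfg->ratelimit = (int)n;
		return true;
	} else {
		return false;
	}
	cfg->ratelimit = 0;
	return true;
}
```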

For SLD there are three levels:

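split_lock_detect=off: split lock detection is disabled.
split_lock_detect=warn:
- If the kernel triggers the split lock, the kernel OOPSes.
- If user space triggers it, the kernel warns once per task. It then turns off split-lock disable so the split lock can proceed, but adds delay to the offending task. The goal is to get developers' attention and push them to fix programs that generate split locks. Setting the sysctl kernel.split_lock_mitigate=0 removes the delay and serialization that SLD adds.
split_lock_detect=fatal:
- If the kernel triggers the split lock, the kernel OOPSes.
- If user space triggers it, the kernel sends SIGBUS to the user program that caused the split lock, which terminates it by default.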

2.2 exc_alignment_check()

The #AC handler exc_alignment_check is defined with DEFINE_IDTENTRY_ERRORCODE.

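The macro DEFINE_IDTENTRY_ERRORCODE(func) defines func, the C entry function for the IDT entry. func calls __##func between irqentry_enter() and irqentry_exit(). The braces written after the macro invocation become the body of __##func.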

```c
#define DEFINE_IDTENTRY_ERRORCODE(func)					\
static __always_inline void __##func(struct pt_regs *regs,		\
				     unsigned long error_code);		\
									\
__visible noinstr void func(struct pt_regs *regs,			\
			    unsigned long error_code)			\
{									\
	irqentry_state_t state = irqentry_enter(regs);			\
									\
	instrumentation_begin();					\
	__##func (regs, error_code);					\
	instrumentation_end();						\
	irqentry_exit(regs, state);					\
}									\
									\
static __always_inline void __##func(struct pt_regs *regs,		\
				     unsigned long error_code)
```

Defining the #AC handler:

```c
// Declare that vector X86_TRAP_AC is handled by exc_alignment_check
DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_AC, exc_alignment_check);

// Defines both exc_alignment_check() and __exc_alignment_check()
DEFINE_IDTENTRY_ERRORCODE(exc_alignment_check)
{
	char *str = "alignment check";

	if (notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_AC, SIGBUS) == NOTIFY_STOP)
		return;
	// An #AC in kernel space means a split lock was detected
	// (legacy alignment checking applies only at CPL 3), so the kernel OOPSes
	if (!user_mode(regs))
		die("Split lock detected\n", regs, error_code);

	// From here on, the #AC came from user space
	local_irq_enable();

	// If the user split lock was handled (warn mode), we are done.
	// Otherwise, send SIGBUS to the program that raised the #AC.
	if (handle_user_split_lock(regs, error_code))
		goto out;

	do_trap(X86_TRAP_AC, SIGBUS, "alignment check", regs,
		error_code, BUS_ADRALN, NULL);

out:
	local_irq_disable();
}

// Returns true if the #AC was a user-space split lock with split_lock_detect=warn,
// in which case the warn logic has already run
bool handle_user_split_lock(struct pt_regs *regs, long error_code)
{
	// If the #AC is (1) a legacy alignment check (EFLAGS.AC set), or
	// (2) a split lock #AC with split_lock_detect=fatal, return false
	if ((regs->flags & X86_EFLAGS_AC) || sld_state == sld_fatal)
		return false;
	// Otherwise split_lock_detect=warn: run the SLD warn logic
	split_lock_warn(regs->ip);
	return true;
}
```
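Together, exc_alignment_check() and handle_user_split_lock() reduce to a small decision table. The sketch below models only that branching. notify_die(), interrupt enabling and the actual signal delivery are omitted. The enum names are hypothetical; in the kernel, the current mode is held in the variable sld_state.

```c
#include <stdbool.h>

/* Hypothetical names; in the kernel the mode lives in the variable sld_state. */
enum sld_mode { SLD_MODE_OFF, SLD_MODE_WARN, SLD_MODE_FATAL };
enum ac_action { AC_DIE, AC_WARN_AND_CONTINUE, AC_SIGBUS };

/* Branch structure of exc_alignment_check() + handle_user_split_lock(). */
enum ac_action ac_outcome(bool user_mode, bool eflags_ac, enum sld_mode mode)
{
	if (!user_mode)
		return AC_DIE;                /* die("Split lock detected\n", ...) */
	if (eflags_ac || mode == SLD_MODE_FATAL)
		return AC_SIGBUS;             /* do_trap(X86_TRAP_AC, SIGBUS, ...) */
	return AC_WARN_AND_CONTINUE;          /* split_lock_warn(regs->ip) */
}
```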

2.3 split_lock_warn()

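In warn mode, the handler proceeds as follows:

1. It prints a warning.
2. It adds delay by putting the current thread to sleep.
3. It disables SLD on this CPU so the split lock can complete.
4. It schedules delayed work to re-enable SLD shortly afterward.

When the mitigation is on, a semaphore ensures that only one core at a time runs with SLD disabled.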

```c
static void split_lock_warn(unsigned long ip)
{
	struct delayed_work *work;
	int cpu;

	// Warn about split locks only once per task
	if (!current->reported_split_lock)
		pr_warn_ratelimited("#AC: %s/%d took a split_lock trap at address: 0x%lx\n",
				    current->comm, current->pid, ip);
	current->reported_split_lock = 1;

	// sysctl_sld_mitigate defaults to 1: add delay and serialization
	if (sysctl_sld_mitigate) {
		/*
		 * misery factor #1:
		 * sleep 10ms before trying to execute split lock.
		 */
		// If the sleep is interrupted by a signal, return without
		// disabling SLD; the instruction traps again when retried
		if (msleep_interruptible(10) > 0)
			return;
		/*
		 * Misery factor #2:
		 * only allow one buslocked disabled core at a time.
		 */
		// Only one core at a time may run with SLD disabled
		if (down_interruptible(&buslock_sem) == -EINTR)
			return;
		// Work to run afterwards: re-enable SLD and up(buslock_sem) to release the lock
		work = &sl_reenable_unlock;
	} else {
		// No extra delay requested: the work only re-enables SLD
		work = &sl_reenable;
	}

	cpu = get_cpu();
	// Schedule the re-enable work on this CPU, 2 jiffies from now
	schedule_delayed_work_on(cpu, work, 2);

	/* Disable split lock detection on this CPU to make progress */
	sld_update_msr(false);
	put_cpu();
}
```

sl_reenable_unlock and sl_reenable are nearly identical. The former additionally releases the semaphore.

```c
static void __split_lock_reenable_unlock(struct work_struct *work)
{
	sld_update_msr(true);
	up(&buslock_sem);
}
static DECLARE_DELAYED_WORK(sl_reenable_unlock, __split_lock_reenable_unlock);

static void __split_lock_reenable(struct work_struct *work)
{
	sld_update_msr(true);
}
static DECLARE_DELAYED_WORK(sl_reenable, __split_lock_reenable);

// Turns SLD on/off by setting/clearing MSR_MEMORY_CTRL[29]
// (MSR_TEST_CTRL in the kernel is the same MSR as MSR_MEMORY_CTRL).
// msr_test_ctrl_cache holds the MSR value read during setup with bit 29
// cleared, so "off" simply writes it back.
static void sld_update_msr(bool on)
{
	u64 test_ctrl_val = msr_test_ctrl_cache;

	if (on)
		test_ctrl_val |= MSR_TEST_CTRL_SPLIT_LOCK_DETECT;

	wrmsrl(MSR_TEST_CTRL, test_ctrl_val);
}
```
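Two details of the warn path are easy to check in isolation: the MSR value that sld_update_msr() writes, and the warn-once-per-task rule. The sketch below models both in user space. struct task_model is a hypothetical stand-in for the reported_split_lock flag in task_struct. The sketch assumes that the cached value was captured with bit 29 clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define SPLIT_LOCK_DETECT_BIT (1ULL << 29)   /* MSR_TEST_CTRL / MSR_MEMORY_CTRL bit 29 */

/* Same computation as sld_update_msr(): start from the cached value
 * (assumed to have bit 29 clear) and set bit 29 only when turning SLD on. */
uint64_t sld_msr_value(uint64_t cache, bool on)
{
	uint64_t val = cache;

	if (on)
		val |= SPLIT_LOCK_DETECT_BIT;
	return val;
}

/* Hypothetical stand-in for task_struct's reported_split_lock flag. */
struct task_model {
	bool reported_split_lock;
};

/* Returns true only on the task's first split lock, like split_lock_warn(). */
bool split_lock_should_warn(struct task_model *t)
{
	bool first = !t->reported_split_lock;

	t->reported_split_lock = true;
	return first;
}
```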

3. Virtualization of split lock detection

SLD virtualization provides the handling logic for split locks that occur inside a guest. The patches were sent upstream as two patchsets:

basic split lock #AC handling
virtualization support of split lock detection

3.1 basic split lock `#AC` handling

The first patchset supports legacy alignment check #AC in the guest. It simply treats a split lock #AC raised in the guest like a user-space #AC on the host, which KVM handles entirely. This patchset has been merged into Linux. [patch 0/3] x86/kvm: Basic split lock #AC handling - Thomas Gleixner

AC_VECTOR

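When a guest #AC causes a VM exit, handle_exception_nmi() selects the handling path for the exception. vmx_guest_inject_ac() first decides whether to inject the #AC into the guest. If it does, the guest handles the exception in its own #AC handler, exc_alignment_check(). If not, KVM handles it by calling handle_guest_split_lock().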

```c
// arch/x86/kvm/vmx/vmx.c (excerpt)
static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
	[EXIT_REASON_EXCEPTION_NMI] = handle_exception_nmi,
	/* ... */
};

static int handle_exception_nmi(struct kvm_vcpu *vcpu)
{
	/* ... */
	switch (ex_no) {
	/* ... */
	case AC_VECTOR:
		// Decide whether to inject the #AC into the guest
		if (vmx_guest_inject_ac(vcpu)) {
			// Inject #AC
			kvm_queue_exception_e(vcpu, AC_VECTOR, error_code);
			return 1;
		}

		/*
		 * Handle split lock. Depending on detection mode this will
		 * either warn and disable split lock detection for this
		 * task or force SIGBUS on it.
		 */
		// KVM handles the #AC itself
		if (handle_guest_split_lock(kvm_rip_read(vcpu)))
			return 1;
		fallthrough;
	default:
		/* ... exit to userspace with KVM_EXIT_EXCEPTION ... */
	}
	/* ... */
}
```

vmx_guest_inject_ac()

To inject only legacy alignment check #AC into the guest, KVM has to determine what raised the exception:

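If the host does not support SLD, the #AC must come from alignment checking. Split-lock disable is never enabled on the hardware, and the guest cannot set MSR_MEMORY_CTRL[29] itself. So the function returns true and the #AC is injected.
If the host supports SLD, the function returns true only for a legacy alignment check #AC. Otherwise it returns false, and KVM handles the split lock.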

```c
/*
 * If the host has split lock detection disabled, then #AC is
 * unconditionally injected into the guest, which is the pre split lock
 * detection behaviour.
 *
 * If the host has split lock detection enabled then #AC is
 * only injected into the guest when:
 *  - Guest CPL == 3 (user mode)
 *  - Guest has #AC detection enabled in CR0
 *  - Guest EFLAGS has AC bit set
 */
bool vmx_guest_inject_ac(struct kvm_vcpu *vcpu)
{
	// Host does not support split lock detection: always inject #AC into the guest
	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
		return true;
	// Legacy alignment check: return true so #AC is injected into the guest
	return vmx_get_cpl(vcpu) == 3 && kvm_is_cr0_bit_set(vcpu, X86_CR0_AM) &&
	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
}
```
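The injection rule can be restated as a pure function over a snapshot of guest state. This is a sketch: struct guest_ac_ctx and inject_ac_into_guest() are hypothetical, and each field stands in for the KVM helper noted in its comment.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state vmx_guest_inject_ac() reads. */
struct guest_ac_ctx {
	bool host_has_sld;  /* boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) */
	int  cpl;           /* vmx_get_cpl(vcpu) */
	bool cr0_am;        /* kvm_is_cr0_bit_set(vcpu, X86_CR0_AM) */
	bool eflags_ac;     /* kvm_get_rflags(vcpu) & X86_EFLAGS_AC */
};

/* true: queue #AC into the guest; false: KVM calls handle_guest_split_lock(). */
bool inject_ac_into_guest(const struct guest_ac_ctx *c)
{
	if (!c->host_has_sld)
		return true;
	return c->cpl == 3 && c->cr0_am && c->eflags_ac;
}
```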

handle_guest_split_lock()

How KVM handles a guest split lock #AC:

```c
bool handle_guest_split_lock(unsigned long ip)
{
	// If the host has split_lock_detect=warn, take the warn path split_lock_warn()
	if (sld_state == sld_warn) {
		split_lock_warn(ip);
		return true;
	}
	// Otherwise send SIGBUS to the current task (the VMM's vCPU thread),
	// which normally kills the VM
	pr_warn_once("#AC: %s/%d %s split_lock trap at address: 0x%lx\n",
		     current->comm, current->pid,
		     sld_state == sld_fatal ? "fatal" : "bogus", ip);

	current->thread.error_code = 0;
	current->thread.trap_nr = X86_TRAP_AC;
	force_sig_fault(SIGBUS, BUS_ADRALN, NULL);
	return false;
}
```

3.2 virtualization of split lock detection

tag: v5.9-rc1

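The second patchset supports split lock #AC inside the guest in a paravirtual (PV) way. With it, the guest can configure split_lock_detect and choose its own detection level. Unfortunately, it has not been accepted upstream; the last version, v10, is based on v5.9-rc1. [PATCH v10 0/9] KVM: Add virtualization support of split lock detection - Xiaoyao Li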

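SLD is special because MSR_MEMORY_CTRL, the register that turns it on and off, has core scope. With simultaneous multithreading (SMT), a core has two hardware threads, say pCPU1 and pCPU2. If a vCPU running on pCPU1 turns SLD off, tasks running on pCPU2 are affected as well.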

This per-core scope adds constraints to SLD virtualization. It must take into account whether the host has SMT enabled (smt/nosmt) and how host SLD is configured:

If host SLD is off, the guest does not support SLD.
If host SLD is warn:
- with host nosmt, the guest supports SLD and can set it to off/warn/fatal;
- with host smt, the guest does not support SLD.
If host SLD is fatal, the guest supports SLD but can only set it to off/fatal.

Exposing the SLD feature bits to the guest

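The implementation first adds two KVM feature bits for the guest. One indicates whether KVM supports SLD; the other indicates whether KVM only allows SLD in fatal mode. The guest enumerates them through CPUID.40000001H:EAX[15] and CPUID.40000001H:EDX[1].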

```c
#define KVM_FEATURE_SPLIT_LOCK_DETECT	15 /* set: KVM supports SLD */
#define KVM_HINTS_SLD_FATAL		1  /* set: KVM only lets the guest use SLD fatal; clear: no restriction */
```

There are also two feature flags. They indicate whether SLD is enabled on the running system and whether its level is fatal:

```c
#define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
#define X86_FEATURE_SLD_FATAL		(11*32+ 7) /* split lock detection in fatal mode */
```

The userspace VMM calls the KVM_GET_SUPPORTED_CPUID ioctl to obtain the CPUID that KVM supports before configuring the guest CPUID. At that point KVM fills in the SLD-related KVM feature bits:

```c
// Call chain (pseudocode)
kvm_dev_ioctl
	default:
	kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
		case KVM_GET_SUPPORTED_CPUID:
		case KVM_GET_EMULATED_CPUID:
		kvm_dev_ioctl_get_cpuid
			get_cpuid_func
				do_cpuid_func
					__do_cpuid_func

static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
{
	/* ... */
	switch (function) {
	/* ... */
	case KVM_CPUID_FEATURES:
		// ...

		// If KVM supports SLD, expose KVM_FEATURE_SPLIT_LOCK_DETECT to the guest
		if (kvm_split_lock_detect_supported())
			entry->eax |= (1 << KVM_FEATURE_SPLIT_LOCK_DETECT);

		// If host SLD is fatal, KVM only allows guest SLD fatal:
		// expose KVM_HINTS_SLD_FATAL to the guest
		if (boot_cpu_has(X86_FEATURE_SLD_FATAL))
			entry->edx |= (1 << KVM_HINTS_SLD_FATAL);
		break;
	/* ... */
	}
	/* ... */
}

// As described above, KVM supports SLD when:
// (1) host SLD is fatal, or
// (2) host SLD is warn and SMT is not possible (nosmt)
static inline bool kvm_split_lock_detect_supported(void)
{
	return boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) &&
	       (boot_cpu_has(X86_FEATURE_SLD_FATAL) ||
		!cpu_smt_possible());
}
```
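The host side of this enumeration can be sketched in user space as follows. struct host_sld and the function names are hypothetical. The bit numbers are the ones defined by the v10 patchset quoted above, which is not in mainline.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers from the v10 patchset (not in mainline). */
#define KVM_FEATURE_SPLIT_LOCK_DETECT 15
#define KVM_HINTS_SLD_FATAL           1

/* Hypothetical snapshot of the host state KVM looks at. */
struct host_sld {
	bool sld;            /* X86_FEATURE_SPLIT_LOCK_DETECT */
	bool fatal;          /* X86_FEATURE_SLD_FATAL */
	bool smt_possible;   /* cpu_smt_possible() */
};

/* Same condition as kvm_split_lock_detect_supported(). */
bool host_sld_supported(const struct host_sld *h)
{
	return h->sld && (h->fatal || !h->smt_possible);
}

/* Fill the SLD bits of the KVM_CPUID_FEATURES leaf (0x40000001). */
void fill_sld_cpuid_bits(const struct host_sld *h, uint32_t *eax, uint32_t *edx)
{
	if (host_sld_supported(h))
		*eax |= 1u << KVM_FEATURE_SPLIT_LOCK_DETECT;
	if (h->fatal)
		*edx |= 1u << KVM_HINTS_SLD_FATAL;
}
```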

split_lock_setup()

The SLD setup function is called twice: once during kernel boot, and a second time only inside a guest.

```c
// Call chain (pseudocode)
setup_arch
	early_cpu_init
		early_identify_cpu
			cpu_set_core_cap_bits
				split_lock_setup(false); // the kernel sets up SLD
	x86_init.hyper.guest_late_init(); // only runs in a guest
		==> kvm_guest_init
				// Based on the KVM feature bits, call split_lock_setup() to configure guest SLD
				if (kvm_para_has_feature(KVM_FEATURE_SPLIT_LOCK_DETECT))
					split_lock_setup(kvm_para_has_hint(KVM_HINTS_SLD_FATAL));

// If KVM only supports SLD fatal (KVM_HINTS_SLD_FATAL), the guest cannot choose
// its SLD level: it is forced to fatal, and X86_FEATURE_SLD_FATAL is set as well
void __init split_lock_setup(bool fatal)
{
	if (fatal) {
		state = sld_fatal;
		pr_info("forced on, sending SIGBUS on user-space split_locks\n");
		goto set_cap;
	}
	// ...

set_cap:
	setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT);
	if (state == sld_fatal)
		setup_force_cpu_cap(X86_FEATURE_SLD_FATAL);
}
```
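On the guest side, the two CPUID bits map onto three outcomes. This sketch mirrors the kvm_guest_init() logic above, using hypothetical enum and function names and the bit numbers from the v10 patchset:

- Without the feature bit, there is no guest SLD.
- With the fatal hint, the guest is forced into split_lock_setup(true).
- Otherwise, the guest presumably chooses its level from its own split_lock_detect= parameter.

```c
#include <stdint.h>

/* Bit numbers from the v10 patchset (not in mainline). */
#define KVM_FEATURE_SPLIT_LOCK_DETECT 15  /* CPUID.40000001H:EAX[15] */
#define KVM_HINTS_SLD_FATAL           1   /* CPUID.40000001H:EDX[1] */

/* Hypothetical names for the three guest-side outcomes. */
enum guest_sld_policy {
	GUEST_SLD_UNAVAILABLE,    /* split_lock_setup() is not called again */
	GUEST_SLD_FROM_CMDLINE,   /* split_lock_setup(false): level from split_lock_detect= */
	GUEST_SLD_FORCED_FATAL,   /* split_lock_setup(true) */
};

enum guest_sld_policy decode_guest_sld(uint32_t eax, uint32_t edx)
{
	if (!(eax & (1u << KVM_FEATURE_SPLIT_LOCK_DETECT)))
		return GUEST_SLD_UNAVAILABLE;
	if (edx & (1u << KVM_HINTS_SLD_FATAL))
		return GUEST_SLD_FORCED_FATAL;
	return GUEST_SLD_FROM_CMDLINE;
}
```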

guest_inject_ac()

Note: guest_inject_ac() in the older kernel corresponds to vmx_guest_inject_ac() in newer kernels.

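When a guest #AC causes a VM exit, KVM's decision to inject it back into the guest now also considers guest SLD. With host SLD enabled, the #AC is injected back into the guest when:

legacy alignment check #AC, or
the guest has enabled SLD, so it may be an SLD #AC (new).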

```c
/*
 * If the host has split lock detection disabled, then #AC is
 * unconditionally injected into the guest, which is the pre split lock
 * detection behaviour.
 *
 * If the host has split lock detection enabled then #AC is
 * injected into the guest when:
 * 1) guest has alignment check enabled;
 * or 2) guest has split lock detection enabled;
 */
static inline bool guest_inject_ac(struct kvm_vcpu *vcpu)
{
	if (!boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
		return true;

	/*
	 * A split lock access must be an unaligned access, so we should check
	 * guest_alignment_check_enabled() first.
	 */
	return guest_alignment_check_enabled(vcpu) || guest_sld_on(to_vmx(vcpu));
}
```

The two conditions:

```c
static inline bool guest_alignment_check_enabled(struct kvm_vcpu *vcpu)
{
	return vmx_get_cpl(vcpu) == 3 && kvm_read_cr0_bits(vcpu, X86_CR0_AM) &&
	       (kvm_get_rflags(vcpu) & X86_EFLAGS_AC);
}

// KVM tells whether the guest enabled SLD from bit 29 of the MSR value it recorded
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT	29
#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT		BIT(MSR_TEST_CTRL_SPLIT_LOCK_DETECT_BIT)
static inline bool guest_sld_on(struct vcpu_vmx *vmx)
{
	return vmx->msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT;
}
```
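The patched decision differs from vmx_guest_inject_ac() only in the extra guest_sld_on() term. Here is a sketch with hypothetical names that takes the guest's recorded MSR_TEST_CTRL value directly:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_TEST_CTRL_SPLIT_LOCK_DETECT (1ULL << 29)

/* Hypothetical restatement of guest_inject_ac() from the v10 patchset. */
bool guest_inject_ac_model(bool host_sld, int cpl, bool cr0_am, bool eflags_ac,
			   uint64_t guest_msr_test_ctrl)
{
	bool legacy, guest_sld;

	if (!host_sld)
		return true;   /* pre-SLD behaviour: always inject */

	legacy = cpl == 3 && cr0_am && eflags_ac;
	guest_sld = (guest_msr_test_ctrl & MSR_TEST_CTRL_SPLIT_LOCK_DETECT) != 0;
	return legacy || guest_sld;
}
```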

KVM emulation of MSR_MEMORY_CTRL

Enabling SLD in the guest necessarily means reading and writing MSR_MEMORY_CTRL[29]. These accesses cause VM exits, and KVM emulates them.

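When KVM updates the register value, the guest's MSR_MEMORY_CTRL[29] is not written to the physical MSR in every case. For example, with guest SLD fatal, the physical MSR is not updated. The value is, however, always recorded in vcpu_vmx->msr_test_ctrl.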

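```c
static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u32 msr_index = msr_info->index;
	u64 data = msr_info->data;
	/* ... */
	switch (msr_index) {
	case MSR_TEST_CTRL:
		/* ... */
		vmx->msr_test_ctrl = data;
		// Update guest SLD; the physical MSR is not necessarily written
		vmx_update_guest_sld(vmx);
		break;
	/* ... */
	}
	/* ... */
}

static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	/* ... */
	switch (msr_info->index) {
	case MSR_TEST_CTRL:
		// Return the guest's view recorded by KVM
		msr_info->data = vmx->msr_test_ctrl;
		break;
	/* ... */
	}
	/* ... */
}
```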

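The host's split lock setting must stay unaffected when the vCPU moves to another pCPU, or when the vCPU and other tasks switch on the same pCPU. For this, the physical MSR_MEMORY_CTRL must be restored to the host's configuration, and KVM keeps the related state in struct vcpu_vmx: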

```c
struct vcpu_vmx {
	/* ... */
	bool                  guest_has_sld; // whether the guest has the SLD feature (from its CPUID)
	bool                  host_sld_on;   // host SLD state saved when switching to the guest

	u64                   msr_test_ctrl; // the guest's view of MSR_MEMORY_CTRL
	/* ... */
};
```

When the userspace VMM configures the vCPU's features with the KVM_SET_CPUID ioctl, KVM checks whether the SLD feature bit is exposed to the guest. It records the result in vcpu_vmx->guest_has_sld.

```c
// Call chain (pseudocode)
kvm_vcpu_ioctl
	default:
	kvm_arch_vcpu_ioctl
		case KVM_SET_CPUID:
		kvm_vcpu_ioctl_set_cpuid
			kvm_vcpu_ioctl_set_cpuid2
				kvm_vcpu_after_set_cpuid
					kvm_x86_ops.vcpu_after_set_cpuid
					==> vmx_vcpu_after_set_cpuid
// arch/x86/kvm/vmx/vmx.c
static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	struct kvm_cpuid_entry2 *best;

	vmx->guest_has_sld = false;
	// Only if KVM supports SLD
	if (kvm_split_lock_detect_supported()) {
		best = kvm_find_cpuid_entry(vcpu, KVM_CPUID_FEATURES, 0);
		// Check whether the VMM exposed the SLD feature bit to the guest
		if (best && (best->eax & (1 << KVM_FEATURE_SPLIT_LOCK_DETECT)))
			vmx->guest_has_sld = true;
	}
	/* ... */
}
```

When switching between guest and host, KVM updates MSR_MEMORY_CTRL accordingly:

```c
// Guest -> host: restore the host's MSR_MEMORY_CTRL[29]
vmx_prepare_switch_to_host
	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && vmx->guest_has_sld)
		split_lock_restore_host(vmx->host_sld_on);

// Host -> guest: save the host's MSR_MEMORY_CTRL[29], then load the guest's
vmx_prepare_switch_to_guest
	if (static_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT) && vmx->guest_has_sld)
		vmx->host_sld_on = split_lock_set_guest(guest_sld_on(vmx));
```
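The save/restore around guest entry and exit can be modeled with a single variable that stands in for the per-core hardware bit. This is a conceptual sketch with hypothetical names. It does not reproduce the internals of the real split_lock_set_guest() and split_lock_restore_host(), such as any special handling for host fatal mode.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-core hardware bit MSR_MEMORY_CTRL[29]. */
static bool hw_sld_bit;

/* Hypothetical subset of struct vcpu_vmx. */
struct vcpu_model {
	bool guest_has_sld;  /* SLD feature exposed via KVM_SET_CPUID */
	bool guest_sld_on;   /* bit 29 of the guest's msr_test_ctrl */
	bool host_sld_on;    /* host bit saved at guest entry */
};

/* Host -> guest: remember the host bit, then load the guest's. */
void model_switch_to_guest(struct vcpu_model *v)
{
	if (!v->guest_has_sld)
		return;
	v->host_sld_on = hw_sld_bit;
	hw_sld_bit = v->guest_sld_on;
}

/* Guest -> host: put the saved host bit back. */
void model_switch_to_host(struct vcpu_model *v)
{
	if (!v->guest_has_sld)
		return;
	hw_sld_bit = v->host_sld_on;
}
```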

4. patches

split lock detection:

Bare metal:

[PATCH v9 00/17] x86/split_lock: Enable split lock detection - Fenghua Yu

KVM:

[patch 0/3] x86/kvm: Basic split lock #AC handling - Thomas Gleixner

[PATCH v10 0/9] KVM: Add virtualization support of split lock detection - Xiaoyao Li

QEMU:

[Qemu-devel] [PATCH v2] target/i386: define a new MSR based feature word

or: [PATCH v2] target/i386: define a new MSR based feature word - FEAT_CORE_CAPABILITY - Xiaoyao Li

bus lock detection:

Bare metal:

[PATCH v6 0/3] x86/bus_lock: Enable bus lock detection - Fenghua Yu

[PATCH 0/4] x86/bus_lock: Set rate limit for bus lock - Fenghua Yu

5. How to test

Detecting split locks

Reference: 【翻译】split lock检测与处理 ([Translation] Split lock detection and handling) - Alibaba Cloud Developer Community

```bash
# Count split locks system-wide, printing the count every 1000 ms.
# event=0xf4,umask=0x10 is the PMU event sq_misc.split_lock
# (the raw encoding is model-specific).
perf stat -e cpu/event=0xf4,umask=0x10/ -a -I 1000
```
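perf only counts split locks that something actually performs. The following minimal sketch generates some on purpose. It assumes x86, 64-byte cache lines, and a C11 compiler with GNU inline asm, and the function names are hypothetical. It places a 4-byte counter across a cache-line boundary and increments it with lock addl. Misaligned pointer access is undefined behavior in ISO C, although x86 hardware handles it, so treat this as a test tool rather than production code. With the default split_lock_detect=warn, running it should produce the "#AC: ... took a split_lock trap" warning printed by split_lock_warn(). With fatal, the process should receive SIGBUS.

```c
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/*
 * Hypothetical helper: given a CACHE_LINE-aligned buffer of at least
 * 2 * CACHE_LINE bytes, return a 4-byte slot whose first two bytes are in
 * one cache line and last two bytes in the next.
 */
uint32_t *split_lock_target(unsigned char *buf)
{
	return (uint32_t *)(buf + CACHE_LINE - 2);
}

/* One LOCK-prefixed add on *p; on non-x86 builds this does nothing. */
void do_split_lock(uint32_t *p)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ volatile("lock addl $1, %0" : "+m"(*p) : : "memory");
#else
	(void)p;
#endif
}

/* Perform n split-locked increments; returns -1 if allocation fails. */
int run_split_locks(int n)
{
	unsigned char *buf = aligned_alloc(CACHE_LINE, 2 * CACHE_LINE);
	uint32_t *p;

	if (!buf)
		return -1;
	p = split_lock_target(buf);
	*p = 0;
	for (int i = 0; i < n; i++)
		do_split_lock(p);
	free(buf);
	return 0;
}
```

Calling run_split_locks(1000) from a small main() while the perf command above is running should make the sq_misc.split_lock count rise, on CPUs where that event encoding applies.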

Author: 文七安

Original link: 从 bus lock 到 split lock detection (From bus lock to split lock detection) - 掘金 (juejin.cn)

Copyright notice: unless otherwise stated, all articles on this blog are licensed under BY-NC-SA. Please credit the source when reposting.
