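Revisiting how volatile works

Exploring volatile

Why do we still need the volatile keyword when we have MESI?

Using JITWatch to inspect the assembly and observe how code changes affect the compiled output on JDK 17

Preface

For Java developers, volatile mostly comes up in interview trivia and is rarely used in day-to-day business code. It is nevertheless a good entry point for digging deeper into Java.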

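1. What volatile is

What volatile is probably needs no further introduction.

For performance, CPUs that implement MESI add store buffers and invalidate queues, which can make memory operations appear out of order.
The volatile keyword introduces memory barriers: at the software level it imposes a strict ordering, which resolves both the reordering and the visibility problems.
However, a compound write such as i++ (read, modify, write back) is not atomic, so volatile alone still cannot guarantee thread safety.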
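The atomicity point can be illustrated with a small sketch. The class and method names below (`VolatileCounterSketch`, `runWorkers`) are made up for this example; it only assumes the standard `java.util.concurrent.atomic.AtomicInteger`. The volatile counter is incremented with `++`, which is a separate read and write, while the `AtomicInteger` uses an atomic read-modify-write.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class VolatileCounterSketch {
    // volatile makes every read and write visible to other threads, but
    // volatileCounter++ is still three steps: read, add one, write back.
    static volatile int volatileCounter = 0;
    static final AtomicInteger atomicCounter = new AtomicInteger();

    // Hypothetical helper: starts `threads` workers, each incrementing both counters.
    static void runWorkers(int threads, int incrementsPerThread) throws InterruptedException {
        volatileCounter = 0;
        atomicCounter.set(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    volatileCounter++;               // not atomic: concurrent updates may be lost
                    atomicCounter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        runWorkers(4, 100_000);
        System.out.println("expected = " + (4 * 100_000));
        System.out.println("volatile = " + volatileCounter + " (may be smaller)");
        System.out.println("atomic   = " + atomicCounter.get());
    }
}
```

On a multi-core machine the volatile counter usually ends up below the expected total, although how many updates are lost varies from run to run; the atomic counter always matches.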
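A Zhihu article I read recently covers the same issues, with more diagrams than my earlier post, so it is easier to follow.
That article points out that, with JITWatch, you can see that what looks like a visibility problem in running Java code is not necessarily caused by the hardware effects (CPU caches, store buffers) that volatile is usually said to address; it may instead come from C2 compilation. This post reproduces that finding and looks into the questions the article left open.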

2. Reproduction steps

2.1. Baseline case

The Java code:

public class TestVolatile {

    private static boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        int i = 0;
        Thread thread = new Thread(() -> {
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
            }
            running = false;
        });
        thread.setDaemon(true);
        thread.start();
        while (running) {
            i++;
        }
        System.err.println(i);
    }
}

C2 output as shown in JITWatch:

  0x0000000116d84c04: add    sp, sp, #0x10   ;*getstatic running {reexecute=0 rethrow=0 return_oop=0}
                                                ; - SimpleInliningTest::main@24 (line 20)

  0x0000000116d84c08: mov    x19, #0xb7a0                    // #47008
                                                ;   {oop(a 'java/lang/Class'{0x00000006b1e1b7a0} = 'SimpleInliningTest')}
  0x0000000116d84c0c: movk    x19, #0xb1e1, lsl #16
  0x0000000116d84c10: movk    x19, #0x6, lsl #32
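  												; The mov/movk sequence materializes in x19 the address of the java.lang.Class mirror of SimpleInliningTest (the oop(a 'java/lang/Class'...) annotation marks it as an object pointer).
  0x0000000116d84c14: ldrb    w10, [x19, #112]
  												; Load one byte (ldrb) from offset 112 of the Class mirror into w10. This is the value of the static field running (a boolean takes 1 byte).
  0x0000000116d84c18: cbz    w10, 0x0000000116d84c2c
 												 ; Conditional branch: if w10 is 0 (running == false), jump to 0x0000000116d84c2c (the loop-exit path); otherwise fall through into the loop body.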
  0x0000000116d84c1c: ldr    x10, [x28, #264]  ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                ; - SimpleInliningTest::main@33 (line 21)
                                                ; Load the safepoint polling address into x10 from offset 264 of the current thread's data (x28).

  0x0000000116d84c20: add    w21, w21, #0x1  ; ImmutableOopMap{}
                                                ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                ; - SimpleInliningTest::main@33 (line 21)
                                                ; Increment the loop counter (w21 holds the counter, i.e. i++ in the Java source).

  0x0000000116d84c24: ldr    wzr, [x10]      ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                ; - SimpleInliningTest::main@33 (line 21)
                                                ;   {poll}
                                                ; Safepoint poll ({poll}): read the memory x10 points to, so that the JVM can interrupt the loop and bring the thread to a safepoint.
  0x0000000116d84c28: b    0x0000000116d84c1c      ;*getstatic running {reexecute=0 rethrow=0 return_oop=0}
                                                ; - SimpleInliningTest::main@24 (line 20)
                                                ; Unconditional branch back to the top of the loop at 0x0000000116d84c1c.
  0x0000000116d84c2c: mov    w1, #0xffffff7e                // #-130
  0x0000000116d84c30: mov    w29, w21
  0x0000000116d84c34: bl    0x000000010f2e8d00  ; ImmutableOopMap{}
                                                ;*getstatic running {reexecute=1 rethrow=0 return_oop=0}
                                                ; - SimpleInliningTest::main@24 (line 20)
                                                ;   {runtime_call UncommonTrapBlob}
  0x0000000116d84c38: dcps1    #0xdeae
  0x0000000116d84c3c: udf    #0

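Here is the surprising part: the instruction at 0x0000000116d84c28 branches back to 0x0000000116d84c1c, not to 0x0000000116d84c14 where running is loaded. C2 has hoisted the read of running out of the loop: the field is checked once before the loop, and the loop body never reads it again. After C2 compilation this code is effectively an infinite loop.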
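To make the shape of that compiled loop concrete, here is a minimal Java sketch of the two loop forms. It is a model, not what C2 literally emits: the names (`HoistingSketch`, `rereadEachIteration`, `readOnce`) are invented for illustration, the flag is passed as a `BooleanSupplier` so reads can be counted, and an iteration cap stands in for "forever" so the example terminates.

```java
import java.util.function.BooleanSupplier;

public class HoistingSketch {
    /** Loop as written in the source: the flag is read before every iteration. */
    static int rereadEachIteration(BooleanSupplier flag, int cap) {
        int i = 0;
        while (flag.getAsBoolean() && i < cap) {
            i++;
        }
        return i;
    }

    /** Hoisted shape: the flag is read once, then the loop never looks at it again. */
    static int readOnce(BooleanSupplier flag, int cap) {
        int i = 0;
        if (flag.getAsBoolean()) {
            while (i < cap) { // the cap stands in for an endless loop
                i++;
            }
        }
        return i;
    }

    public static void main(String[] args) {
        int[] reads = {0};
        // Returns true for the first five reads, false afterwards.
        BooleanSupplier flipsAfterFiveReads = () -> reads[0]++ < 5;

        System.out.println(rereadEachIteration(flipsAfterFiveReads, 1000)); // prints 5

        reads[0] = 0;
        System.out.println(readOnce(flipsAfterFiveReads, 1000)); // prints 1000
    }
}
```

The first form notices the flag change after five iterations; the second only ever sees the first value and runs until the cap, which is the behavior the branch target in the listing implies.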

2.2. With volatile

The Java code:

public class TestVolatile {

	// Note the change on this line
    private static volatile boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        int i = 0;
        Thread thread = new Thread(() -> {
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
            }
            running = false;
        });
        thread.setDaemon(true);
        thread.start();
        while (running) {
            i++;
        }
        System.err.println(i);
    }
}

C2 output as shown in JITWatch:

0x0000000116194504: add sp, sp, #0x10   ;*getstatic running {reexecute=0 rethrow=0 return_oop=0}
                                ; - SimpleInliningTest::main@24 (line 20)
0x0000000116194508: mov x19, #0xb738                	// #46904
                             ;   {oop(a 'java/lang/Class'{0x00000006b1e1b738} = 'SimpleInliningTest')}
0x000000011619450c: movk x19, #0xb1e1, lsl #16
0x0000000116194510: movk x19, #0x6, lsl #32
									; The mov/movk sequence materializes in x19 the address of the java.lang.Class mirror of SimpleInliningTest (the oop(a 'java/lang/Class'...) annotation marks it as an object pointer).
0x0000000116194514: add x20, x19, #0x70  ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                  ; - SimpleInliningTest::main@33 (line 21)
                                  ; x19 holds the Class mirror address, and offset #0x70 (112 bytes) is where the static field running is stored. This instruction puts the address of running into x20 so the loop does not recompute the offset.
0x0000000116194518: b 0x000000011619452c
								; Unconditional branch to the loop-condition check (0x452c) for the first test of the condition.
0x000000011619451c: nop
0x0000000116194520: ldr x10, [x28, #264]
								; x28 points to the current thread's data; load the safepoint polling address from offset 264 into x10.
0x0000000116194524: add w22, w22, #0x1  ; ImmutableOopMap{r19=Oop resp=Derived_oop_r19 }
                                  ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                  ; - SimpleInliningTest::main@33 (line 21)
                                  ; Increment the loop counter (w22 = w22 + 1), i.e. i++ in the Java source.
                                  ; The ImmutableOopMap{r19=Oop ...} annotation tells the GC which registers hold object pointers at this point: here only x19 (the Class mirror), so the other registers need not be scanned.
0x0000000116194528: ldr wzr, [x10]  ;   {poll} *** SAFEPOINT POLL ***
									; Load a 32-bit value from the address in x10 into wzr (the zero register, so the loaded value is discarded).
									; The {poll} *** SAFEPOINT POLL *** annotation marks this as the safepoint check. When the JVM needs to stop threads (for example for GC), it makes the polled address unreadable; the load then faults and the thread is brought to a safepoint, which is how the JVM keeps control over thread execution.
0x000000011619452c: ldarb w11, [x20]  ;*getstatic running {reexecute=0 rethrow=0 return_oop=0}
                                      ; - SimpleInliningTest::main@24 (line 20)
                                      ; ldarb is Load-Acquire Register Byte: a one-byte load with acquire semantics, so later memory accesses cannot be reordered before it. It loads running from the address in x20 into w11 (a boolean takes 1 byte).
0x0000000116194530: cbnz w11, 0x0000000116194520  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                  ; - SimpleInliningTest::main@27 (line 20)
                                                  ; The key conditional branch: if w11 is non-zero (running == true), jump back to the top of the loop body (0x4520); if w11 is 0 (running == false), fall through to the exit logic.
0x0000000116194534: mov w1, #0xffffff4d            	// #-179
0x0000000116194538: mov w29, w22
0x000000011619453c: str w11, [sp]
0x0000000116194540: bl 0x000000010e6f8d00  ; ImmutableOopMap{}
                                           ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
                                           ; - SimpleInliningTest::main@27 (line 20)
                                           ;   {runtime_call UncommonTrapBlob}

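With volatile added, the C2-compiled loop executes the instructions at 0x000000011619452c-0x0000000116194530 on every iteration, right after the safepoint poll. In other words, the value of running really is reloaded and re-tested every time: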

0x000000011619452c: ldarb w11, [x20]  ;*getstatic running {reexecute=0 rethrow=0 return_oop=0}
                                      ; - SimpleInliningTest::main@24 (line 20)
                                      ; ldarb is Load-Acquire Register Byte: a one-byte load with acquire semantics. It loads running from the address in x20 into w11 (a boolean takes 1 byte).
0x0000000116194530: cbnz w11, 0x0000000116194520  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                  ; - SimpleInliningTest::main@27 (line 20)
                                                  ; The key conditional branch: if w11 is non-zero (running == true), jump back to the top of the loop body (0x4520).
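The acquire load in the listing has a source-level counterpart in the `java.lang.invoke.VarHandle` API (available since Java 9). The sketch below is an illustration rather than a recommendation over plain volatile: the class name `AcquireFlagSketch` and the method `spinUntilStopped` are invented here, the field is deliberately left non-volatile, and the exact instructions C2 generates for `getAcquire` are not verified in this post.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class AcquireFlagSketch {
    // Deliberately not volatile: ordering comes from the VarHandle access modes below.
    private static boolean running = true;
    private static final VarHandle RUNNING;

    static {
        try {
            RUNNING = MethodHandles.lookup()
                    .findStaticVarHandle(AcquireFlagSketch.class, "running", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Spins until another thread clears the flag; returns the iteration count. */
    static long spinUntilStopped(long delayMillis) throws InterruptedException {
        RUNNING.setRelease(true);
        Thread stopper = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            RUNNING.setRelease(false); // release store, pairs with the acquire load below
        });
        stopper.setDaemon(true);
        stopper.start();

        long i = 0;
        while ((boolean) RUNNING.getAcquire()) { // acquire load on every iteration
            i++;
        }
        stopper.join();
        return i;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("iterations before stop: " + spinUntilStopped(1000L));
    }
}
```

Unlike the baseline in 2.1, this loop is expected to terminate, because the compiler may not replace an acquire-mode read with a value cached before the loop.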

2.3. Calling a native method

The Java code:

public class TestVolatile {

	// Note: the volatile keyword has been removed here
    private static boolean running = true;

    public static void main(String[] args) throws InterruptedException {
        int i = 0;
        Thread thread = new Thread(() -> {
            try {
                Thread.sleep(1000L);
            } catch (InterruptedException e) {
            }
            running = false;
        });
        thread.setDaemon(true);
        thread.start();
        while (running) {
            i++;
            // Call a native method that has no business meaning (readers of the RocketMQ source have probably seen this idiom)
            Thread.sleep(0);
        }
        System.err.println(i);
    }
}

The conclusion first: this code does stop!

C2 output as shown in JITWatch:

0x0000000114ed4f84: add sp, sp, #0x10
0x0000000114ed4f88: b 0x0000000114ed4f9c
									; Unconditional branch to 0x0000000114ed4f9c, where the loop condition is checked.
0x0000000114ed4f8c: nop  ;*getstatic running {reexecute=0 rethrow=0 return_oop=0}
                         ; - SimpleInliningTest::main@24 (line 20)
                         ;
0x0000000114ed4f90: add w29, w29, #0x1  ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                  ; - SimpleInliningTest::main@30 (line 21)
                                  ; Perform the i++ increment.
0x0000000114ed4f94: mov x1, xzr
0x0000000114ed4f98: bl 0x000000010d43a100  ; ImmutableOopMap{}
                                           ;*invokestatic sleep {reexecute=0 rethrow=0 return_oop=0}
                                           ; - SimpleInliningTest::main@34 (line 22)
                                           ;   {static_call}
                                           ; Call Thread.sleep.
0x0000000114ed4f9c: mov x10, #0xb708                	// #46856
                             ;   {oop(a 'java/lang/Class'{0x00000006b1e1b708} = 'SimpleInliningTest')}
                             ; Build the address of the Class mirror in x10.
0x0000000114ed4fa0: movk x10, #0xb1e1, lsl #16
0x0000000114ed4fa4: movk x10, #0x6, lsl #32
0x0000000114ed4fa8: ldrb w11, [x10, #112]  ;*getstatic running {reexecute=0 rethrow=0 return_oop=0}
                                     ; - SimpleInliningTest::main@24 (line 20)
                                     ; Load the byte at offset 112 of the Class mirror, i.e. the running field.
0x0000000114ed4fac: cbnz w11, 0x0000000114ed4f90  ;*ifeq {reexecute=0 rethrow=0 return_oop=0}
                                                  ; - SimpleInliningTest::main@27 (line 20)
                                                  ; If w11 is non-zero, branch back to the top of the loop body at 0x0000000114ed4f90.
0x0000000114ed4fb0: mov w1, #0xffffff4d            	// #-179
0x0000000114ed4fb4: str w11, [sp]
0x0000000114ed4fb8: bl 0x000000010d438d00  ; ImmutableOopMap{}
                                           ;*ifeq {reexecute=1 rethrow=0 return_oop=0}
                                           ; - SimpleInliningTest::main@27 (line 20)
                                           ;   {runtime_call UncommonTrapBlob}

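2.3.1 Reading the assembly

With Thread.sleep(0) added, the compiled assembly differs from the baseline in 2.1 in two ways:

1. On every iteration the running field is reloaded and re-tested, so the loop can exit.
2. The safepoint polling address is no longer loaded at the start of the loop body, and there is no safepoint poll at the end of the loop body either.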

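2.3.2 Thoughts and questions

The first change is what lets the main thread's loop stop. The second looks problematic: if the main thread no longer polls for safepoints, can GC no longer run? That obviously cannot be the case, so I asked Doubao for an explanation.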

2.3.2.1 Why does the variable become visible once a native method is called?

On the first point, Doubao answered:

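In the JIT-compiled code, Thread.sleep(0) causes the running field to be re-read from the class object's memory on every iteration. The core reason is closely tied to the JVM's memory-visibility guarantees and to limits on compiler optimization, and can be analyzed from the following angles:
1. Thread.sleep(0) triggers "memory barrier" semantics in the JVM
Thread.sleep(long) is a native method, and the JVM defines explicit memory-visibility rules for it:
Before and after a call to sleep(), the thread must synchronize its working memory with main memory: variables written in working memory are flushed to main memory, and reads are reloaded from main memory rather than served from a cached copy in working memory.
These semantics resemble the memory-barrier effect of synchronized or volatile (though the implementation differs) and force the compiler to give up "caching" running (for example, keeping its value in a register for a long time). So every time sleep() returns, the JVM must re-read the latest value of running from the class object's main-memory address (x10 + 112) instead of reusing a stale value held in a register.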
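2. Avoiding "visibility problems caused by loop optimization"
Without Thread.sleep(0), the JIT compiler may optimize the loop as follows:
If it decides that running is a loop invariant (for example, it has not been modified for a long time), it may cache the value in a register to reduce main-memory accesses (for example, the "merge the condition check with the safepoint" optimization in the earlier code).
This optimization is harmless in a single thread, but in a multi-threaded scenario, after another thread modifies running, the current thread may not notice the change because it is using the register copy, so the loop cannot exit (a visibility problem).
The presence of Thread.sleep(0) effectively tells the compiler: "there may be thread interaction here, so reads must be fresh." The JIT therefore disables the caching optimization for running and forces every iteration to re-read the value from main memory.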
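3. sleep() involves thread scheduling and implicitly hints that "state may have changed"
Although Thread.sleep(0) sleeps for zero time, it still involves the thread scheduler: the current thread briefly yields the CPU, and other threads may get a chance to run (including the thread that modifies running).
When analyzing the code, the JIT compiler recognizes such "scheduling points" and infers that "after a scheduling point, the state of any shared variable may have been changed by another thread." To keep the logic correct it must therefore re-read the latest value of shared variables such as running and cannot rely on an earlier cached copy.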
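Summary
Thread.sleep(0) does not directly "force the variable to be read", but it leads to that behavior indirectly through two mechanisms:
Memory-barrier semantics: the JVM specification requires working memory and main memory to be synchronized around sleep(), which avoids variable caching;
Thread-scheduling hint: it tells the compiler that "another thread may modify shared variables here", so aggressive caching must be abandoned.
As a result, the running field is re-read from the class object's main-memory address on every iteration, which ensures visibility in a multi-threaded environment. Even though sleep(0) seems to "do nothing", its implicit JVM semantics strongly affect the compiler's optimization strategy.

One caveat on this answer: it should not be read as a guarantee. JLS §17.3 states that neither Thread.sleep nor Thread.yield has any synchronization semantics, and that the compiler does not have to reload values cached in registers after such a call. A more likely explanation for the reload seen here is that the call to sleep is not inlined, so C2 cannot prove that the callee leaves running unchanged and therefore cannot keep the value in a register across the call. That is a side effect of the current implementation, not something the specification promises.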

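2.3.2.2 With the safepoint poll gone, is GC no longer supported?

Answer: of course not! Then why does the assembly no longer contain an explicit safepoint poll? Let's look at the source. I read the jdk8u192 code (note that the assembly above comes from JDK 17, so details may differ). The comment in safepoint.cpp lists five thread states, each of which is brought to a safepoint by a different mechanism: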

// Begin the process of bringing the system to a safepoint.
  // Java threads can be in several different states and are
  // stopped by different mechanisms:
  //
  //  1. Running interpreted
  //     The interpeter dispatch table is changed to force it to
  //     check for a safepoint condition between bytecodes.
  //  2. Running in native code
  //     When returning from the native code, a Java thread must check
  //     the safepoint _state to see if we must block.  If the
  //     VM thread sees a Java thread in native, it does
  //     not wait for this thread to block.  The order of the memory
  //     writes and reads of both the safepoint state and the Java
  //     threads state is critical.  In order to guarantee that the
  //     memory writes are serialized with respect to each other,
  //     the VM thread issues a memory barrier instruction
  //     (on MP systems).  In order to avoid the overhead of issuing
  //     a memory barrier for each Java thread making native calls, each Java
  //     thread performs a write to a single memory page after changing
  //     the thread state.  The VM thread performs a sequence of
  //     mprotect OS calls which forces all previous writes from all
  //     Java threads to be serialized.  This is done in the
  //     os::serialize_thread_states() call.  This has proven to be
  //     much more efficient than executing a membar instruction
  //     on every call to native code.
  //  3. Running compiled Code
  //     Compiled code reads a global (Safepoint Polling) page that
  //     is set to fault if we are trying to get to a safepoint.
  //  4. Blocked
  //     A thread which is blocked will not be allowed to return from the
  //     block condition until the safepoint operation is complete.
  //  5. In VM or Transitioning between states
  //     If a Java thread is currently running in the VM or transitioning
  //     between states, the safepointing code will wait for the thread to
  //     block itself when it attempts transitions to a new state.
  //

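Case 2 states explicitly that a Java thread returning from native code must check the safepoint state to see whether it has to block. So why does our C2-compiled loop contain no poll instruction? My guess is that C2 reasons that, since the JVM already performs this check when the native method returns, the loop does not need its own explicit poll.

The same comment also talks about ordering. To keep the writes to the thread state and the safepoint state ordered without a memory barrier on every native call, each Java thread writes to a dedicated memory page after changing its state, and the VM thread calls os::serialize_thread_states(), whose mprotect system calls force all earlier writes from Java threads to be serialized, which has the effect of a memory barrier. This may also contribute to the visibility seen above, although the comment itself only concerns thread-state writes, not Java fields.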


// Serialize all thread state variables
void os::serialize_thread_states() {
  // On some platforms such as Solaris & Linux, the time duration of the page
  // permission restoration is observed to be much longer than expected  due to
  // scheduler starvation problem etc. To avoid the long synchronization
  // time and expensive page trap spinning, 'SerializePageLock' is used to block
  // the mutator thread if such case is encountered. See bug 6546278 for details.
  Thread::muxAcquire(&SerializePageLock, "serialize_thread_states");
  os::protect_memory((char *)os::get_memory_serialize_page(),
                     os::vm_page_size(), MEM_PROT_READ);
  os::protect_memory((char *)os::get_memory_serialize_page(),
                     os::vm_page_size(), MEM_PROT_RW);
  Thread::muxRelease(&SerializePageLock);
}
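The five cases in the HotSpot comment above can be summarized as a small lookup table. This is a toy model of my reading of that comment, not HotSpot code: the enum, the method names, and in particular the `vmThreadWaitsFor` rule (that the VM thread does not wait for threads that are in native code or blocked) are assumptions made for illustration.

```java
public class SafepointStatesSketch {
    // The five thread states named in the safepoint.cpp comment.
    enum ThreadState { INTERPRETED, IN_NATIVE, COMPILED, BLOCKED, IN_VM }

    /** How a thread in the given state is stopped, paraphrasing the comment. */
    static String stopMechanism(ThreadState state) {
        switch (state) {
            case INTERPRETED:
                return "dispatch table is switched so a check runs between bytecodes";
            case IN_NATIVE:
                return "thread checks the safepoint state when returning from native code";
            case COMPILED:
                return "compiled code reads a polling page that is set to fault";
            case BLOCKED:
                return "thread may not return from the block until the safepoint ends";
            case IN_VM:
                return "thread blocks itself when it attempts a state transition";
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    /** Assumption: native and blocked threads are already safe, so nobody waits for them. */
    static boolean vmThreadWaitsFor(ThreadState state) {
        return state != ThreadState.IN_NATIVE && state != ThreadState.BLOCKED;
    }

    public static void main(String[] args) {
        for (ThreadState s : ThreadState.values()) {
            System.out.println(s + " -> " + stopMechanism(s)
                    + " (VM thread waits: " + vmThreadWaitsFor(s) + ")");
        }
    }
}
```

Read this way, the loop in 2.3 is covered by the IN_NATIVE row while it is inside Thread.sleep, which fits the guess that an extra poll in the loop body is unnecessary.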

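Summary

After reading the Zhihu article, and having used JITWatch before, I did some reproduction and verification work. Along the way I picked up some understanding of safepoints and learned how native methods affect safepoint handling. Next I plan to look into how the safepoint mechanism itself works.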