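1. Dealing with bugs in concurrent programming
Software is how we respond to the needs of the real world.
BARRIER (memory barrier)
- Prevents compiler reordering. To generate faster code, the compiler may reorder instructions as long as the single-threaded result is equivalent. A barrier forbids the compiler from moving memory accesses across it, so the code runs in the order it "appears" to be written.
- Controls CPU execution order. Modern CPUs may execute instructions out of order to improve performance. A memory barrier forces the memory accesses before it to complete and become visible to other processors before any access after it is performed, which guarantees the visibility of data between threads.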
cpp
#include "thread.h"  // helper header providing create(); assumed to also pull in stdio/assert/stdatomic

#define A 1
#define B 2
#define BARRIER __sync_synchronize()

atomic_int nested;
atomic_long count;

// Aborts if more than one thread is ever inside at the same time.
void critical_section() {
  long cnt = atomic_fetch_add(&count, 1);
  int i = atomic_fetch_add(&nested, 1) + 1;
  if (i != 1) {
    printf("%d threads in the critical section @ count=%ld\n", i, cnt);
    assert(0);
  }
  atomic_fetch_add(&nested, -1);
}

int volatile x = 0, y = 0, turn;

void TA() {
  while (1) {
    x = 1;                   BARRIER;
    turn = B;                BARRIER; // <- this is critical for x86
    while (1) {
      if (!y) break;         BARRIER;
      if (turn != B) break;  BARRIER;
    }
    critical_section();
    x = 0;                   BARRIER;
  }
}

void TB() {
  while (1) {
    y = 1;                   BARRIER;
    turn = A;                BARRIER;
    while (1) {
      if (!x) break;         BARRIER;
      if (turn != A) break;  BARRIER;
    }
    critical_section();
    y = 0;                   BARRIER;
  }
}

int main() {
  create(TA);
  create(TB);
}
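To isolate what the two kinds of barrier do, here is a minimal sketch that is separate from the mutual-exclusion code above. The names COMPILER_BARRIER, FULL_BARRIER, publish and try_consume are hypothetical, and the sketch assumes GCC or Clang with exactly one producer and one consumer. It follows the same volatile-plus-barrier style as the notes; portable C11 code would normally use <stdatomic.h> instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Compiler-only barrier: the compiler may not move memory accesses
   across it, but no CPU instruction is emitted. */
#define COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* Full barrier: additionally orders the CPU's loads and stores. */
#define FULL_BARRIER() __sync_synchronize()

static volatile int payload; /* the data being handed over */
static volatile int ready;   /* 1 once payload may be read */

/* Producer side: write the data first, then raise the flag. */
void publish(int value) {
  payload = value;
  FULL_BARRIER(); /* the payload store is ordered before the ready store */
  ready = 1;
}

/* Consumer side: returns true and fills *out if a value was published. */
bool try_consume(int *out) {
  assert(out);
  if (!ready) return false;
  FULL_BARRIER(); /* the ready load is ordered before the payload load */
  *out = payload;
  ready = 0;
  return true;
}
```

Without the barrier in publish, the compiler or a weakly ordered CPU could make ready visible before payload, and the consumer would read stale data. COMPILER_BARRIER alone only rules out the compiler half of that problem.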
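Basic idea: assume that your own code is wrong.
Use assert to make your code check its own assumptions. (Using assertions in an interview can leave a better impression: nobody writes perfect code, and assertions explain your code to other people.)
Assertions also help you catch memory problems such as buffer overflows and corrupted pointers.
Defensive programming example: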
cpp
#define CHECK_INT(x, cond) \
  ({ panic_on(!((x) cond), "int check fail: " #x " " #cond); })
// panic_on takes a message as its second argument, so CHECK_HEAP passes one too
#define CHECK_HEAP(ptr) \
  ({ panic_on(!IN_RANGE((ptr), heap), "heap check fail: " #ptr); })

CHECK_INT(waitlist->count, >= 0);
CHECK_INT(pid, < MAX_PROCS);
CHECK_HEAP(ctx->rip);
CHECK_HEAP(ctx->cr3);
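The macros above rely on panic_on, IN_RANGE and heap, which the notes do not define. The following is a minimal user-space sketch of what they might look like. struct area, arena and the exact message format are assumptions made for illustration, not the definitions used by the original code base.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Compile-time sanity check: the range test below casts to uintptr_t. */
static_assert(sizeof(uintptr_t) >= sizeof(void *), "uintptr_t must hold a pointer");

/* Hypothetical descriptor of a memory region [start, end). */
struct area { void *start, *end; };

char arena[64];                                   /* stand-in for the real heap */
struct area heap = { arena, arena + sizeof arena };

/* Nonzero if ptr lies inside [area.start, area.end). */
#define IN_RANGE(ptr, area) \
  ((uintptr_t)(area).start <= (uintptr_t)(ptr) && \
   (uintptr_t)(ptr) < (uintptr_t)(area).end)

/* Unlike assert(), this check stays enabled under NDEBUG. */
#define panic_on(cond, msg) \
  do { \
    if (cond) { \
      fprintf(stderr, "panic: %s @ %s:%d\n", msg, __FILE__, __LINE__); \
      abort(); \
    } \
  } while (0)

#define CHECK_INT(x, cond) panic_on(!((x) cond), "int check fail: " #x " " #cond)
#define CHECK_HEAP(ptr)    panic_on(!IN_RANGE((ptr), heap), "heap check fail: " #ptr)
```

A call such as CHECK_HEAP(arena + 8) passes silently, while CHECK_HEAP(arena + 64) prints the failing expression with its file and line and then aborts.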
2. Facing concurrency bugs: deadlock
AA-type deadlock: a deadlock between a thread and an interrupt handler that preempts it
cpp
void os_run() {
  spin_lock(&list_lock);
  spin_lock(&xxx);
  spin_unlock(&xxx);        // ---------+
}                           //          |
                            //          |
void on_interrupt() {       //          |
  spin_lock(&list_lock);    // <--------+
  spin_unlock(&list_lock);
}
ABBA-type deadlock: the deadlock of the dining philosophers problem
cpp
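void swap(int i, int j) {
  // swap(1, 2) and swap(2, 1) running concurrently take the locks
  // in opposite orders and can deadlock (ABBA)
  spin_lock(&lock[i]);
  spin_lock(&lock[j]);
  void *tmp = arr[i];  // keep the old value so that it is not lost
  arr[i] = arr[j];
  arr[j] = tmp;
  spin_unlock(&lock[j]);
  spin_unlock(&lock[i]);
}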
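The four necessary conditions for deadlock:
1. Mutual exclusion: a resource can be used by only one process at a time
2. Hold and wait: a process that blocks while requesting a resource does not release the resources it already holds
3. No preemption: resources that a process has acquired cannot be taken away by force
4. Circular wait: several processes form a head-to-tail cycle, each waiting for a resource held by the next
3. Ways to avoid deadlock
3.1 For AA-type deadlocks:
These are relatively easy to detect: report early, fix early.
Defensive programming in spinlock-xv6:
panic: the program crashes immediately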
cpp
if (holding(lk)) panic();
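Trigger conditions: a panic is usually raised when the program hits a situation it cannot handle, for example:
1. Out-of-bounds array or pointer access
2. Division by zero
3. Null pointer dereference
4. Assertion (assert) failure
5. An explicit call to a panic function to terminate the program (such as panic() in some languages)
Behavior: when a panic occurs, the program immediately stops its current flow of execution,
may perform some cleanup (releasing resources, calling destructors),
then prints an error message (cause, stack trace) and finally exits.
3.2 For ABBA-type deadlocks
At any moment the number of locks in the system is finite. If every thread must acquire locks in one fixed global order, deadlock is avoided: among all held locks, the thread holding the highest-numbered one only ever requests locks with even higher numbers, which nobody holds, so it can always continue and eventually release its locks. No circular wait can form.
The best lock is an encapsulated one: other code cannot see it, so it cannot misuse it.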
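The fixed-order rule can be sketched for the swap example as follows. This is an illustration under assumptions: lock_tbl, arr, NLOCKS and swap_ordered are hypothetical names, and the spinlock is a bare C11 atomic exchange rather than the kernel spin_lock used in the notes.

```c
#include <assert.h>
#include <stdatomic.h>

#define NLOCKS 4

static atomic_int lock_tbl[NLOCKS];            /* 0 = free, 1 = held */
static int arr[NLOCKS] = { 10, 20, 30, 40 };   /* arr[k] is protected by lock_tbl[k] */

static void spin_lock(atomic_int *lk)   { while (atomic_exchange(lk, 1)) { /* spin */ } }
static void spin_unlock(atomic_int *lk) { atomic_store(lk, 0); }

/* Swap arr[i] and arr[j], always taking the lower-indexed lock first. */
void swap_ordered(int i, int j) {
  assert(0 <= i && i < NLOCKS && 0 <= j && j < NLOCKS);
  if (i == j) return;              /* nothing to swap; avoids locking twice (AA) */
  int lo = i < j ? i : j;
  int hi = i < j ? j : i;
  spin_lock(&lock_tbl[lo]);
  spin_lock(&lock_tbl[hi]);
  int tmp = arr[i];
  arr[i] = arr[j];
  arr[j] = tmp;
  spin_unlock(&lock_tbl[hi]);
  spin_unlock(&lock_tbl[lo]);
}
```

Because swap_ordered(1, 2) and swap_ordered(2, 1) both lock index 1 before index 2, the two calls can no longer hold one lock each while waiting for the other.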
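3.3 Concurrency bugs: data races
If we never take a lock there can be no deadlock, but does that solve the problem? No: it leads to data races.
A data race is two threads accessing the same address at the same time, with at least one of the accesses being a write. The two threads race, and the result of the program depends on who wins, which is unpredictable.
Correct lock-free concurrent programs are extremely hard to write, so protect shared data with mutexes wherever possible and eliminate every data race.
Typical mistakes that cause data races:
Taking the wrong lock, or forgetting to lock.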
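As a small illustration of protecting shared data with a mutex, the sketch below has two threads increment a shared counter. The names Tsum, run_sum and NITER are assumptions for this example, and it uses POSIX threads directly (compile with -pthread on older toolchains). Removing the lock and unlock calls turns sum++ into a data race, and the final value becomes unpredictable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NITER 100000

static long sum;                                            /* shared data */
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER; /* protects sum */

/* Each thread adds 1 to sum NITER times, one locked increment at a time. */
static void *Tsum(void *arg) {
  (void)arg;
  for (int i = 0; i < NITER; i++) {
    pthread_mutex_lock(&sum_lock);
    sum++;
    pthread_mutex_unlock(&sum_lock);
  }
  return NULL;
}

/* Runs two Tsum threads to completion and returns the final sum. */
long run_sum(void) {
  pthread_t t1, t2;
  sum = 0;
  int rc1 = pthread_create(&t1, NULL, Tsum, NULL);
  int rc2 = pthread_create(&t2, NULL, Tsum, NULL);
  assert(rc1 == 0 && rc2 == 0);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  return sum;
}
```

With the lock in place, run_sum() returns exactly 2 * NITER on every run.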
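Tools for concurrency control:
1. Mutex (lock/unlock) -- atomicity
2. Condition variable (wait/signal) -- synchronization
Forgetting to lock -- atomicity violation (AV)
Forgetting to synchronize -- order violation (OV)
3.4 Runtime deadlock checking
Each lock records the source location (file and line) where it was initialized. Logging every acquisition then yields the order in which locks are taken; if the resulting lock-order graph contains a cycle, there is a potential deadlock.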
cpp
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

typedef struct lock {
  int locked;
  const char *site;  // "file:line" of the LOCK_INIT() that created this lock
} lock_t;

#define STRINGIFY(s) #s
#define TOSTRING(s) STRINGIFY(s)
#define LOCK_INIT() \
  ( (lock_t) { .locked = 0, .site = __FILE__ ":" TOSTRING(__LINE__), } )

lock_t lk1 = LOCK_INIT();
lock_t lk2 = LOCK_INIT();

// lock/unlock only log the event here; the trace is analyzed afterwards
void lock(lock_t *lk) {
  printf("LOCK %s\n", lk->site);
}

void unlock(lock_t *lk) {
  printf("UNLOCK %s\n", lk->site);
}

struct some_object {
  lock_t lock;
  int data;
};

void object_init(struct some_object *obj) {
  obj->lock = LOCK_INIT();
}

int main() {
  lock(&lk1);
  lock(&lk2);
  unlock(&lk1);
  unlock(&lk2);

  struct some_object *obj = malloc(sizeof(struct some_object));
  assert(obj);
  object_init(obj);
  lock(&obj->lock);

  lock(&lk2);
  lock(&lk1);
}
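The program above only prints the LOCK/UNLOCK trace. A minimal sketch of the analysis step could look like the following: it adds an edge "held lock -> newly acquired lock" on every acquisition and then searches the graph for a cycle with a depth-first search. Locks are identified by small integers here instead of site strings, and on_lock, on_unlock and has_cycle are hypothetical names. The sketch tracks a single thread's held set; a real checker would keep one held set per thread.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCKS 16

static bool edge[MAX_LOCKS][MAX_LOCKS]; /* edge[a][b]: b was acquired while a was held */
static int held[MAX_LOCKS];             /* ids of the locks currently held */
static int nheld;

/* Record an acquisition: every held lock now precedes id in the lock order. */
void on_lock(int id) {
  assert(0 <= id && id < MAX_LOCKS && nheld < MAX_LOCKS);
  for (int k = 0; k < nheld; k++) edge[held[k]][id] = true;
  held[nheld++] = id;
}

/* Record a release: drop id from the held set (its order in the set is irrelevant). */
void on_unlock(int id) {
  for (int k = 0; k < nheld; k++)
    if (held[k] == id) { held[k] = held[--nheld]; return; }
}

/* color: 0 = unvisited, 1 = on the current DFS path, 2 = finished */
static bool dfs(int u, int *color) {
  color[u] = 1;
  for (int v = 0; v < MAX_LOCKS; v++) {
    if (!edge[u][v]) continue;
    if (color[v] == 1) return true;                  /* back edge: cycle found */
    if (color[v] == 0 && dfs(v, color)) return true;
  }
  color[u] = 2;
  return false;
}

/* True if the recorded lock order contains a cycle, i.e. a potential deadlock. */
bool has_cycle(void) {
  int color[MAX_LOCKS] = { 0 };
  for (int u = 0; u < MAX_LOCKS; u++)
    if (color[u] == 0 && dfs(u, color)) return true;
  return false;
}
```

Replaying the trace of main with lk1 = 0, lk2 = 1 and obj->lock = 2 first records the edge 0 -> 1, and later records 1 -> 0 when lk1 is taken while lk2 is held, which closes the cycle. Acquiring a lock that is already held records a self-edge, so the AA case is reported as well.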
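3.5 Runtime data race checking
Use graph theory to solve the problem.
If two threads are not ordered by a lock (or other synchronization), the order of their operations on the same address is not guaranteed.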
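One common way to represent this ordering graph compactly is with vector clocks. The notes do not name a technique, so this choice is an assumption. The sketch below handles two threads, one lock and one shared variable, and checks writes only; vc_reset, vc_acquire, vc_release and vc_write are hypothetical names. A write races with the previous write if the previous write's clock is not less than or equal to the current thread's clock, meaning no chain of release/acquire connects them.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NTHREADS 2

typedef struct { int c[NTHREADS]; } vclock_t;

static vclock_t thread_clk[NTHREADS]; /* current clock of each thread */
static vclock_t lock_clk;             /* clock saved at the last release of the lock */
static vclock_t last_write_clk;       /* clock of the last write to the variable */
static int last_writer = -1;          /* thread of the last write, -1 if none */

void vc_reset(void) {
  memset(thread_clk, 0, sizeof thread_clk);
  memset(&lock_clk, 0, sizeof lock_clk);
  memset(&last_write_clk, 0, sizeof last_write_clk);
  last_writer = -1;
  for (int t = 0; t < NTHREADS; t++) thread_clk[t].c[t] = 1;
}

/* a happened before (or equals) b iff every component of a is <= that of b. */
static bool vc_leq(const vclock_t *a, const vclock_t *b) {
  for (int i = 0; i < NTHREADS; i++)
    if (a->c[i] > b->c[i]) return false;
  return true;
}

/* Acquire: thread t learns everything known at the last release. */
void vc_acquire(int t) {
  assert(0 <= t && t < NTHREADS);
  for (int i = 0; i < NTHREADS; i++)
    if (lock_clk.c[i] > thread_clk[t].c[i]) thread_clk[t].c[i] = lock_clk.c[i];
}

/* Release: publish t's clock through the lock, then advance t's own component. */
void vc_release(int t) {
  assert(0 <= t && t < NTHREADS);
  lock_clk = thread_clk[t];
  thread_clk[t].c[t]++;
}

/* Record a write by thread t; returns true if it races with the previous write. */
bool vc_write(int t) {
  assert(0 <= t && t < NTHREADS);
  bool race = last_writer >= 0 && last_writer != t &&
              !vc_leq(&last_write_clk, &thread_clk[t]);
  last_write_clk = thread_clk[t];
  last_writer = t;
  return race;
}
```

When both threads write inside acquire/release pairs, the second writer's clock dominates the first write and no race is reported. When they write without the lock, the clocks are incomparable and vc_write returns true.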
3.6 Dynamic program analysis
GCC ships with dynamic analysis tools called sanitizers.
3.6.1 Illegal memory access analysis: gcc uaf.c -fsanitize=address
cpp
lxy@lxy-JIAOLONG-Series:/media/lxy/工作区/TASK_2025/20251105operat/m8$ gcc uaf.c -fsanitize=address
lxy@lxy-JIAOLONG-Series:/media/lxy/工作区/TASK_2025/20251105operat/m8$ ./a.out
=================================================================
==5474==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000000010 at pc 0x55be7fe45267 bp 0x7ffdbc085100 sp 0x7ffdbc0850f0
WRITE of size 4 at 0x602000000010 thread T0
#0 0x55be7fe45266 in main (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x1266)
#1 0x7f39c639a082 in __libc_start_main ../csu/libc-start.c:308
#2 0x55be7fe4510d in _start (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x110d)
0x602000000010 is located 0 bytes inside of 4-byte region [0x602000000010,0x602000000014)
freed by thread T0 here:
#0 0x7f39c667540f in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:122
#1 0x55be7fe4522f in main (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x122f)
#2 0x7f39c639a082 in __libc_start_main ../csu/libc-start.c:308
previously allocated by thread T0 here:
#0 0x7f39c6675808 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:144
#1 0x55be7fe451de in main (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x11de)
#2 0x7f39c639a082 in __libc_start_main ../csu/libc-start.c:308
SUMMARY: AddressSanitizer: heap-use-after-free (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x1266) in main
Shadow bytes around the buggy address:
0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[fd]fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==5474==ABORTING
3.6.2 Data race analysis:
(
On Ubuntu 20.04 linking may fail with: /usr/bin/ld: cannot find libtsan_preinit.o: No such file or directory
collect2: error: ld returned 1 exit status
The following commands fix the problem:
sudo apt install libgcc-10-dev
sudo ln -s /usr/lib/gcc/x86_64-linux-gnu/10/libtsan_preinit.o /usr/lib/libtsan_preinit.o
)
cpp
lxy@lxy-JIAOLONG-Series:/media/lxy/工作区/TASK_2025/20251105operat/m8$ gcc sum.c -lpthread -fsanitize=thread
lxy@lxy-JIAOLONG-Series:/media/lxy/工作区/TASK_2025/20251105operat/m8$ ./a.out
==================
WARNING: ThreadSanitizer: data race (pid=9153)
Read of size 8 at 0x5596adf43028 by thread T2:
#0 Tsum <null> (a.out+0x159f)
#1 wrapper <null> (a.out+0x12fb)
Previous write of size 8 at 0x5596adf43028 by thread T1:
#0 Tsum <null> (a.out+0x15b6)
#1 wrapper <null> (a.out+0x12fb)
Location is global 'sum' of size 8 at 0x5596adf43028 (a.out+0x000000004028)
Thread T2 (tid=9156, running) created by main thread at:
#0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
#1 create <null> (a.out+0x1459)
#2 main <null> (a.out+0x1608)
Thread T1 (tid=9155, running) created by main thread at:
#0 pthread_create ../../../../src/libsanitizer/tsan/tsan_interceptors_posix.cpp:962 (libtsan.so.0+0x5ea79)
#1 create <null> (a.out+0x1459)
#2 main <null> (a.out+0x15fc)
SUMMARY: ThreadSanitizer: data race (/media/lxy/工作区/TASK_2025/20251105operat/m8/a.out+0x159f) in Tsum
==================
sum = 199598468
ThreadSanitizer: reported 1 warnings
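Tip: a.out writes to standard output (stdout) by default. The output can be discarded by redirecting it to /dev/null (redirecting to "null" would only create a file with that name).
cpp
./a.out > /dev/null
3.6.3 Canaries in computer systems
Sacrifice a few memory cells to get an early warning of memory errors. Example: a canary protecting stack space: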
cpp
#define MAGIC 0x55555555
#define BOTTOM (STK_SZ / sizeof(u32) - 1)
struct stack { char data[STK_SZ]; };

void canary_init(struct stack *s) {
  u32 *ptr = (u32 *)s;
  for (int i = 0; i < CANARY_SZ; i++)
    ptr[BOTTOM - i] = ptr[i] = MAGIC;
  // Write a magic number at both ends of the stack, then check from time
  // to time that these words still hold it
}

void canary_check(struct stack *s) {
  u32 *ptr = (u32 *)s;
  for (int i = 0; i < CANARY_SZ; i++) {
    panic_on(ptr[BOTTOM - i] != MAGIC, "underflow");
    panic_on(ptr[i] != MAGIC, "overflow");
  }
}
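The canary code above depends on STK_SZ, CANARY_SZ, u32 and panic_on, which are defined elsewhere. A self-contained sketch with assumed values for those constants is shown below. Instead of panicking it returns a status, so that the check can be exercised in a test. canary_probe and enum canary_status are hypothetical names, and the union replaces the pointer cast so that the word accesses are well defined in standard C. As in the notes, damage at the low end is called overflow, which assumes a stack that grows downward.

```c
#include <assert.h>
#include <stdint.h>

#define STK_SZ    4096            /* assumed stack size in bytes */
#define CANARY_SZ 4               /* assumed number of canary words at each end */
#define MAGIC     0x55555555u
#define NWORDS    (STK_SZ / sizeof(uint32_t))

struct stack {
  union {
    char     data[STK_SZ];        /* the stack as bytes */
    uint32_t words[NWORDS];       /* the same memory viewed as 32-bit words */
  };
};

enum canary_status { CANARY_OK, CANARY_OVERFLOW, CANARY_UNDERFLOW };

/* Fill the first and last CANARY_SZ words with the magic number. */
void canary_init(struct stack *s) {
  assert(s);
  for (unsigned i = 0; i < CANARY_SZ; i++)
    s->words[NWORDS - 1 - i] = s->words[i] = MAGIC;
}

/* Report whether either canary region has been overwritten. */
enum canary_status canary_probe(const struct stack *s) {
  assert(s);
  for (unsigned i = 0; i < CANARY_SZ; i++) {
    if (s->words[i] != MAGIC) return CANARY_OVERFLOW;               /* low end damaged */
    if (s->words[NWORDS - 1 - i] != MAGIC) return CANARY_UNDERFLOW; /* high end damaged */
  }
  return CANARY_OK;
}
```

A kernel would typically call the probe at points such as every context switch or timer interrupt and panic on any result other than CANARY_OK.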
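MSVC debug-mode guard/fence/canary values:
- Uninitialized stack: 0xcccccccc
- Uninitialized heap: 0xcdcdcdcd
- Object head and tail: 0xfdfdfdfd
- Freed memory: 0xdddddddd

A low-budget lockdep:
Ignore lock ordering entirely: if a spinlock has been spinning for more than some large number of iterations, something has gone wrong.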
cpp
int spin_cnt = 0;
while (xchg(&locked, 1)) {
  if (spin_cnt++ > SPIN_LIMIT) {
    printf("Too many spin @ %s:%d\n", __FILE__, __LINE__);
  }
}
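A self-contained variant of this idea is sketched below, with atomic_exchange from <stdatomic.h> standing in for xchg. Unlike the fragment above, which keeps spinning after printing, this version gives up and returns -1 so that the caller (or a test) can react. That behavior, the SPIN_LIMIT value and the names spin_lock_limited and SPIN_LOCK are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

#define SPIN_LIMIT 1000000   /* assumed threshold; tune for the target system */

/* Returns 0 once the lock is acquired, or -1 if more than SPIN_LIMIT
   spins pass without success. */
int spin_lock_limited(atomic_int *locked, const char *file, int line) {
  assert(locked);
  int spin_cnt = 0;
  while (atomic_exchange(locked, 1)) {
    if (spin_cnt++ > SPIN_LIMIT) {
      fprintf(stderr, "Too many spin @ %s:%d\n", file, line);
      return -1;
    }
  }
  return 0;
}

void spin_unlock(atomic_int *locked) {
  atomic_store(locked, 0);
}

/* Captures the call site so that the report points at the stuck lock(). */
#define SPIN_LOCK(lk) spin_lock_limited((lk), __FILE__, __LINE__)
```

Taking the same lock twice on one thread (the AA case) now produces a report after SPIN_LIMIT iterations instead of hanging silently.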
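A low-budget sanitizer:
Maintain a data structure that tracks the heap: malloc marks a range of memory as occupied, and free releases it.
malloc paints the range it returns red; if malloc ever returns an address that is already red, an error has occurred.
free paints the range blue; if free is called on memory that is already blue, an error has occurred.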
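The painting scheme can be sketched with a shadow array that stores one color per byte of a small arena. This is an illustration under assumptions: the arena size, the use of offsets instead of real pointers, and the names shadow_on_malloc and shadow_on_free are all hypothetical. A real tool would call these hooks from wrappers around malloc and free.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SZ 256

enum color { BLUE = 0, RED = 1 };       /* BLUE = free, RED = allocated */

static unsigned char shadow[ARENA_SZ];  /* one color per arena byte, initially all BLUE */

/* Called when malloc returns [off, off + len).
   Returns -1 if any byte is already RED (overlapping allocation), else paints RED. */
int shadow_on_malloc(size_t off, size_t len) {
  assert(off <= ARENA_SZ && len <= ARENA_SZ - off);
  for (size_t i = off; i < off + len; i++)
    if (shadow[i] == RED) return -1;
  for (size_t i = off; i < off + len; i++)
    shadow[i] = RED;
  return 0;
}

/* Called when free releases [off, off + len).
   Returns -1 if any byte is already BLUE (double or invalid free), else paints BLUE. */
int shadow_on_free(size_t off, size_t len) {
  assert(off <= ARENA_SZ && len <= ARENA_SZ - off);
  for (size_t i = off; i < off + len; i++)
    if (shadow[i] == BLUE) return -1;
  for (size_t i = off; i < off + len; i++)
    shadow[i] = BLUE;
  return 0;
}
```

Both hooks check the whole range before painting, so a rejected call leaves the shadow state unchanged.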