Design of a Deadlock Detection Component

Table of Contents

- 1. What Is a Deadlock?
- 2. The Basic Idea of Deadlock Detection
- 3. Component Design Overview
- 4. Data Structures and Core Code Walkthrough
  - 4.1 Node and Graph Definitions
  - 4.2 The Relation Table
  - 4.3 Hook Implementation
  - 4.4 The Detection Thread
  - 4.5 Initialization and Startup
- 5. Test Case
- 6. Summary

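In multithreaded programming, a **deadlock** is a common and hard-to-debug problem. It occurs when several threads each wait for a resource held by another, so that none of them can make progress. This article covers the basic concept of deadlock and the principle behind detecting it, then presents a complete implementation of a deadlock detection component based on a graph algorithm. The component uses hooks to intercept calls to pthread_mutex_lock, pthread_mutex_unlock, and pthread_create, builds a thread wait-for graph dynamically, and periodically checks it for cycles so that a deadlock is reported soon after it forms.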

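1. What Is a Deadlock?

A deadlock is a situation in which two or more threads, while competing for resources, end up waiting on one another; without outside intervention, none of them can proceed. A classic example:

- Thread A holds lock 1 and waits for lock 2, which is held by thread B;
- Thread B holds lock 2 and waits for lock 3, which is held by thread C;
- Thread C holds lock 3 and waits for lock 1, which is held by thread A.

This forms a circular wait chain: A→B→C→A. Operating systems courses usually summarize the four necessary conditions for deadlock as:

- Mutual exclusion: a resource can be held by only one thread at a time;
- Hold and wait: a thread holds resources while waiting for others;
- No preemption: a resource can be released only voluntarily by its holder;
- Circular wait: there is a set of threads in which each thread waits for a resource held by the next one.

Breaking any one of these conditions makes deadlock impossible. In real projects, however, complex lock dependencies are hard to analyze by hand, so an automatic detection mechanism is needed.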
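One common way to break the circular-wait condition is to acquire locks in a fixed global order. The sketch below is a minimal illustration and not part of the component described in this article: lock_pair and unlock_pair are hypothetical helper names, and ordering by mutex address is just one possible convention.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper: lock two mutexes in ascending address order.
 * If every thread takes its locks in the same global order, no cycle
 * of waiting threads can form. */
int lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    if (a == b) return pthread_mutex_lock(a);
    if ((uintptr_t)a > (uintptr_t)b) {      /* swap so that a has the lower address */
        pthread_mutex_t *t = a; a = b; b = t;
    }
    int rc = pthread_mutex_lock(a);
    if (rc != 0) return rc;
    rc = pthread_mutex_lock(b);
    if (rc != 0) pthread_mutex_unlock(a);   /* do not keep a half-acquired pair */
    return rc;
}

/* Unlock order does not matter for deadlock avoidance. */
int unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    if (a == b) return pthread_mutex_unlock(a);
    int rc1 = pthread_mutex_unlock(a);
    int rc2 = pthread_mutex_unlock(b);
    return rc1 != 0 ? rc1 : rc2;
}
```

With such a helper, lock_pair(&m1, &m2) and lock_pair(&m2, &m1) acquire the two mutexes in the same order, so two threads calling them concurrently cannot end up waiting on each other.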

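2. The Basic Idea of Deadlock Detection

The core of deadlock detection is cycle detection on a resource allocation graph (or a thread wait-for graph). In a resource allocation graph, nodes represent threads or resources, and edges represent "request" or "hold" relationships. A simplified form keeps only thread nodes and uses an edge "thread A waits for thread B" to express a wait relationship; such a graph is called a thread wait-for graph. Because each mutex has exactly one holder, a cycle in this graph means a deadlock has occurred.

A common detection algorithm is depth-first search (DFS): run a DFS from each node, and if the search reaches a node that is already on the current path, a cycle exists. An implementation typically maintains a visited-state array and a path stack.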
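The idea can be sketched on a toy graph as follows. This is a minimal illustration that uses three-state marking (unvisited, on the current path, fully explored) rather than a visited array plus a path stack; the adjacency matrix, the constant N, and the function names are assumptions made for the example.

```c
#define N 4   /* number of nodes in this toy graph (assumption) */

enum { WHITE, GRAY, BLACK };   /* unvisited, on the current path, fully explored */

/* adj[u][v] != 0 means there is an edge u -> v ("u waits for v").
 * Returns 1 if a cycle is reachable from u, 0 otherwise. */
int dfs_has_cycle(int adj[N][N], int u, int color[N]) {
    color[u] = GRAY;
    for (int v = 0; v < N; v++) {
        if (!adj[u][v]) continue;
        if (color[v] == GRAY) return 1;   /* back edge: v is on the current path */
        if (color[v] == WHITE && dfs_has_cycle(adj, v, color)) return 1;
    }
    color[u] = BLACK;                     /* everything reachable from u is cycle-free */
    return 0;
}

/* Runs the DFS from every node that has not been explored yet. */
int graph_has_cycle(int adj[N][N]) {
    int color[N] = {WHITE};               /* all entries start as WHITE (0) */
    for (int u = 0; u < N; u++) {
        if (color[u] == WHITE && dfs_has_cycle(adj, u, color)) return 1;
    }
    return 0;
}
```

Separating "on the current path" from "fully explored" matters when two paths lead to the same node: reaching an already explored node a second time is not a cycle, while reaching a node on the current path is.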

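3. Component Design Overview

The deadlock detection component is built on the following ideas:

- Use hooks to intercept calls to pthread_mutex_lock and pthread_mutex_unlock, and record lock ownership before and after each lock operation (pthread_create is hooked as well, to register each new thread as a vertex).
- Use a global relation table to record which thread currently holds each mutex.
- When a thread tries to lock a mutex already held by another thread, add an edge to the wait-for graph from the current thread to the holder (meaning the current thread is waiting for the holder to release the lock).
- When a thread acquires a lock or releases one, update the relation table and remove the corresponding wait edge.
- Start a separate detection thread that periodically runs DFS on the wait-for graph and prints deadlock information if it finds a cycle.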

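4. Data Structures and Core Code Walkthrough

4.1 Node and Graph Definitions

To build the directed graph, we define a node type source_type that distinguishes threads (PROCESS) from resources (RESOURCE). This implementation uses only thread nodes; the resource type is kept for future extension.

enum Type {PROCESS, RESOURCE};

struct source_type {
    uint64 id;          // thread ID or resource ID
    enum Type type;
    uint64 lock_id;     // reserved
    int degress;        // reserved
};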

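The graph is stored as an adjacency list. Each vertex is the head of a linked list, and the nodes that follow it represent the edges leaving that vertex.

struct vertex {
    struct source_type s;
    struct vertex *next;
};

struct task_graph {
    struct vertex list[MAX];    // vertex array
    int num;                    // current number of vertices
    pthread_mutex_t mutex;      // lock protecting the graph (unused in this implementation, for simplicity)
};

The global graph variable tg is initialized in start_check.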

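4.2 The Relation Table

The relation table rele_node records which thread currently holds each mutex. It is a fixed-size array, and entries are found by a linear search on the lock address.

struct rele_node_s {
    pthread_mutex_t *mtx;   // lock address
    pthread_t pid;          // ID of the holding thread
} rele_node[MAX];

Three operations are provided:

- search_rela_table: look up the holding thread by lock address;
- add_rela_table: record that the lock is held by the current thread;
- del_rela_table: delete the record on unlock.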
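A minimal sketch of such a table follows. It uses hypothetical names (owner_table, owner_lookup, owner_add, owner_del) and reports "not found" through a return code rather than through a special thread ID, since POSIX treats pthread_t as an opaque type. The appendix code instead returns 0 to mean "no holder".

```c
#include <pthread.h>
#include <stddef.h>

#define TABLE_MAX 100   /* assumed capacity, mirroring MAX in the article */

struct owner_entry {
    pthread_mutex_t *mtx;   /* lock address; NULL marks a free slot */
    pthread_t owner;        /* ID of the holding thread */
};

static struct owner_entry owner_table[TABLE_MAX];   /* zero-initialized: all slots free */

/* Returns 1 and stores the holder in *out if the lock has a record, else 0. */
int owner_lookup(pthread_mutex_t *mtx, pthread_t *out) {
    for (int i = 0; i < TABLE_MAX; i++) {
        if (owner_table[i].mtx == mtx) {
            *out = owner_table[i].owner;
            return 1;
        }
    }
    return 0;
}

/* Records tid as the holder of mtx. Returns 0 on success, -1 if the table is full. */
int owner_add(pthread_mutex_t *mtx, pthread_t tid) {
    for (int i = 0; i < TABLE_MAX; i++) {
        if (owner_table[i].mtx == NULL) {
            owner_table[i].mtx = mtx;
            owner_table[i].owner = tid;
            return 0;
        }
    }
    return -1;
}

/* Deletes the record for mtx. Returns 0 on success, -1 if there was none. */
int owner_del(pthread_mutex_t *mtx) {
    for (int i = 0; i < TABLE_MAX; i++) {
        if (owner_table[i].mtx == mtx) {
            owner_table[i].mtx = NULL;
            return 0;
        }
    }
    return -1;
}
```

Like the table in the article, this sketch is not synchronized; callers would need to serialize access if several threads update it concurrently.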

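4.3 Hook Implementation

dlsym(RTLD_NEXT, ...) is used to obtain the addresses of the original functions, and wrapper functions with the same names are defined. When the application calls pthread_mutex_lock, it enters our wrapper first, which runs the detection logic and then calls the real locking function.

int pthread_mutex_lock(pthread_mutex_t *mutex) {
    pthread_t selfid = pthread_self();
    lock_before(selfid, mutex);               // before locking: may add a wait edge
    int ret = pthread_mutex_lock_f(mutex);    // the real lock call
    lock_after(selfid, mutex);                // after locking: record ownership, remove the wait edge
    return ret;                               // pass the real result back to the caller
}

int pthread_mutex_unlock(pthread_mutex_t *mutex) {
    int ret = pthread_mutex_unlock_f(mutex);  // the real unlock call
    pthread_t selfid = pthread_self();
    unlock_after(selfid, mutex);              // after unlocking: delete the record from the relation table
    return ret;
}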

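lock_before logic:

- Check whether the lock is already held by another thread (search_rela_table);
- If it is, add a wait edge from the current thread to the holder.

lock_after logic:

- Look up the lock's recorded holder again; if one is found and a wait edge to it exists, remove that edge (the current thread now holds the lock);
- Update the relation table to record that the current thread holds the lock.

unlock_after simply deletes the lock's record from the relation table. Note that if the previous holder has already deleted its record by the time the waiting thread reaches lock_after, the lookup returns 0 and the wait edge stays in the graph.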
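Because a thread blocked in pthread_mutex_lock is waiting for exactly one mutex, each thread has at most one outgoing wait edge at a time. The sketch below models that bookkeeping with a plain array instead of the adjacency list; the small integer thread indices, NTHREADS, and all function names are hypothetical and chosen only for illustration.

```c
#define NTHREADS 8          /* assumed upper bound on tracked threads */
#define NOT_WAITING (-1)

/* waits_for[t] is the thread that t is blocked on, or NOT_WAITING. */
int waits_for[NTHREADS];

void waits_init(void) {
    for (int t = 0; t < NTHREADS; t++) waits_for[t] = NOT_WAITING;
}

/* Before the real lock call: holder is the recorded owner of the lock,
 * or NOT_WAITING if the lock is currently free. */
void on_lock_before(int self, int holder) {
    if (holder != NOT_WAITING && holder != self) waits_for[self] = holder;
}

/* After the real lock call returns: the thread is no longer blocked,
 * so its wait edge is cleared unconditionally. */
void on_lock_after(int self) {
    waits_for[self] = NOT_WAITING;
}

/* Follows the single outgoing edge of each thread; coming back to
 * start means start is part of a circular wait. */
int in_wait_cycle(int start) {
    int cur = waits_for[start];
    for (int steps = 0; steps < NTHREADS && cur != NOT_WAITING; steps++) {
        if (cur == start) return 1;
        cur = waits_for[cur];
    }
    return 0;
}
```

Clearing the edge without consulting the owner table means the result does not depend on whether the previous holder has already removed its record.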

4.4 The Detection Thread

Every 5 seconds, the detection thread walks through all vertices and starts a DFS from each one to look for a cycle.

void check_dead_lock(void) {
    for (int i = 0; i < tg->num; i++) {
        search_for_cycle(i);
    }
}

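search_for_cycle initializes the visited flags and the path stack, then runs DFS along each outgoing edge of the start vertex in turn, beginning with the first.

The DFS function is the core:

- If the current node has already been visited (visited[idx] == 1), a cycle has been found; record the path and print it;
- Otherwise mark it visited, append it to the path, and recurse into every adjacent node;
- On backtracking, shrink the path length.

Example output:

cycle : 140392425447680 --> 140392408662784 --> 140392400270080 --> 140392391877376 --> 140392425447680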

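4.5 Initialization and Startup

At the very start of main, before any other thread is created, init_hook is called to obtain the addresses of the original functions, and then start_check is called to initialize the graph and start the detection thread.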
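If the component were built as a shared library and loaded with LD_PRELOAD (an assumption; the article compiles everything into one executable), it would have no main of its own from which to call the setup. One option in that setting is a constructor function, a GCC/Clang extension that runs before main. The sketch below only sets a flag where the real setup calls would go; deadlock_component_init and component_is_ready are hypothetical names.

```c
/* Set to 1 once the setup has run. */
static int component_ready = 0;

/* GCC/Clang extension: functions marked with the constructor attribute
 * run automatically before main. In the real component, this is where
 * init_hook() and start_check() would be called (assumption). */
__attribute__((constructor))
static void deadlock_component_init(void) {
    component_ready = 1;
}

/* Lets other code check that the setup has already happened. */
int component_is_ready(void) {
    return component_ready;
}
```

The attribute is compiler-specific, so code that must build with other compilers would still need an explicit initialization call.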

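5. Test Case

The end of the code contains a typical deadlock example: four threads hold locks 1, 2, 3, and 4 respectively, and each then requests the next lock, forming a circular wait.

pthread_mutex_t mtx1, mtx2, mtx3, mtx4;

void *thread1(void *arg) {
    pthread_mutex_lock(&mtx1);
    sleep(1);
    pthread_mutex_lock(&mtx2);   // waits for lock 2, held by thread 2
    // ...
}

When the program runs, the detection thread finds the cycle on its first pass, about 5 seconds after startup, and prints the deadlock information.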

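6. Summary

Starting from the concept of deadlock, this article described the principle of detecting deadlocks through cycle detection on a graph and gave a complete implementation of a lightweight deadlock detection component. By hooking the thread synchronization functions, building the wait-for graph dynamically, and running DFS periodically to look for cycles, the component can detect deadlocks among pthread mutexes in a multithreaded program.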

Appendix: Complete Code

// build: gcc -o deadlock deadlock.c -lpthread -ldl
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned long int uint64;


#define MAX		100

struct rele_node_s {
    pthread_mutex_t *mtx;
    pthread_t pid;
};
struct rele_node_s rele_node[MAX];
enum Type {PROCESS, RESOURCE};

struct source_type {
	uint64 id;
	enum Type type;

	uint64 lock_id;
	int degress;
};

struct vertex {

	struct source_type s;
	struct vertex *next;

};

struct task_graph {
	struct vertex list[MAX];
	int num;

	struct source_type locklist[MAX];
	int lockidx; //

	pthread_mutex_t mutex;
};

struct task_graph *tg = NULL;
int path[MAX+1];
int visited[MAX];
int k = 0;
int deadlock = 0;

struct vertex *create_vertex(struct source_type type) {

	struct vertex *tex = (struct vertex *)malloc(sizeof(struct vertex ));
	tex->s = type;
	tex->next = NULL;
	return tex;
}

int search_vertex(struct source_type type) {

	int i = 0;
	for (i = 0;i < tg->num;i ++) {
		if (tg->list[i].s.type == type.type && tg->list[i].s.id == type.id) {
			return i;
		}

	}
	return -1;
}
void add_vertex(struct source_type type) {

	if (search_vertex(type) == -1) {
		tg->list[tg->num].s = type;
		tg->list[tg->num].next = NULL;
		tg->num ++;
	}

}

int add_edge(struct source_type from, struct source_type to) {
	add_vertex(from);
	add_vertex(to);

	struct vertex *v = &(tg->list[search_vertex(from)]);

	while (v->next != NULL) {
		v = v->next;
	}

	v->next = create_vertex(to);
	return 0;	// the function is declared int, so return a value
}

int verify_edge(struct source_type i, struct source_type j) {
	if (tg->num == 0) return 0;

	int idx = search_vertex(i);
	if (idx == -1) {
		return 0;
	}

	struct vertex *v = &(tg->list[idx]);

	while (v != NULL) {
		if (v->s.id == j.id) return 1;
		v = v->next;
	}
	return 0;
}

int remove_edge(struct source_type from, struct source_type to) {

	int idxi = search_vertex(from);
	int idxj = search_vertex(to);

	if (idxi != -1 && idxj != -1) {
		struct vertex *v = &tg->list[idxi];
		struct vertex *remove;

		while (v->next != NULL) {
			if (v->next->s.id == to.id) {
				remove = v->next;
				v->next = v->next->next;

				free(remove);
				break;
			}
			v = v->next;
		}
	}
	return 0;
}

void print_deadlock(void) {

	int i = 0;
	printf("cycle : ");
	for (i = 0;i < k-1;i ++) {
		printf("%lu --> ", tg->list[path[i]].s.id);	// id is unsigned long, so use %lu
	}
	printf("%lu\n", tg->list[path[i]].s.id);
}

int DFS(int idx) {
	struct vertex *ver = &tg->list[idx];
	if (visited[idx] == 1) {
		path[k++] = idx;
		print_deadlock();
		deadlock = 1;
		return 0;
	}

	visited[idx] = 1;
	path[k++] = idx;

	while (ver->next != NULL) {
		DFS(search_vertex(ver->next->s));
		k --;
		ver = ver->next;
	}
	return 1;
}

int search_for_cycle(int idx) {

	struct vertex *ver = &tg->list[idx];
	visited[idx] = 1;
	k = 0;
	path[k++] = idx;

	while (ver->next != NULL) {
		int i = 0;
		for (i = 0;i < tg->num;i ++) {
			if (i == idx) continue;
			visited[i] = 0;
		}

		for (i = 1;i <= MAX;i ++) {
			path[i] = -1;
		}
		k = 1;

		DFS(search_vertex(ver->next->s));
		ver = ver->next;
	}
	return 0;
}

pthread_t search_rela_table(pthread_mutex_t *mtx) {
    int i = 0;
    for (i = 0; i < MAX; i++) {
        if(rele_node[i].mtx == mtx) {
            return rele_node[i].pid;
        }
    }
    return 0;
}

pthread_t add_rela_table(pthread_mutex_t *mtx, pthread_t pid) {
    int i = 0;

    for (i = 0; i < MAX; i++) {
        if(rele_node[i].mtx == NULL) {
            rele_node[i].mtx = mtx;
            rele_node[i].pid = pid;
            return pid;
        }
    }
    return 0;
}

pthread_t del_rela_table(pthread_mutex_t *mtx, pthread_t pid) {
    for (int i = 0; i < MAX; i++) {
        if(rele_node[i].mtx == mtx) {
            rele_node[i].mtx = NULL;
            rele_node[i].pid = 0;
            return pid;
        }
    }
    return 0;
}

void lock_before(pthread_t tid, pthread_mutex_t *mutex) {
    pthread_t otherid = search_rela_table(mutex);
    if (otherid != 0) {
        struct source_type from;
        from.id = tid;
        from.type = PROCESS;

        struct source_type to;
        to.id = otherid;
        to.type = PROCESS;

        add_edge(from, to);
    }
}


void lock_after(pthread_t tid, pthread_mutex_t *mutex) {
    pthread_t otherid = search_rela_table(mutex);
    if (otherid != 0) {
        struct source_type from;
        from.id = tid;
        from.type = PROCESS;

        struct source_type to;
        to.id = otherid;
        to.type = PROCESS;

        if(verify_edge(from, to)) remove_edge(from, to);
    }

    add_rela_table(mutex, tid);
}

void unlock_after(pthread_t tid, pthread_mutex_t *mutex) {
    del_rela_table(mutex, tid);
}

void check_dead_lock(void) {
	int i = 0;
	for (i = 0;i < tg->num;i ++) {
		search_for_cycle(i);
	}
}

static void *thread_routine(void *args) {
	while (1) {
		sleep(5);
		check_dead_lock();
	}

}

void start_check(void) {

	tg = (struct task_graph*)malloc(sizeof(struct task_graph));
	tg->num = 0;
	tg->lockidx = 0;
	
	pthread_t tid;
	pthread_create(&tid, NULL, thread_routine, NULL);
}

typedef int (* pthread_mutex_lock_t)(pthread_mutex_t *mutex);
pthread_mutex_lock_t pthread_mutex_lock_f = NULL;

typedef int (* pthread_mutex_unlock_t)(pthread_mutex_t *mutex);
pthread_mutex_unlock_t pthread_mutex_unlock_f = NULL;

typedef int (* pthread_create_t)(pthread_t *thread, const pthread_attr_t *attr,
                          void *(*start_routine) (void *), void *arg);
pthread_create_t pthread_create_f = NULL;

int pthread_mutex_lock(pthread_mutex_t *mutex) {
    pthread_t selfid = pthread_self();
    lock_before(selfid, mutex);
    int ret = pthread_mutex_lock_f(mutex);
    lock_after(selfid, mutex);
    return ret;     // pass the real result back to the caller
}

int pthread_mutex_unlock(pthread_mutex_t *mutex) {
    int ret = pthread_mutex_unlock_f(mutex);
    pthread_t selfid = pthread_self();
    unlock_after(selfid, mutex);
    return ret;     // pass the real result back to the caller
}

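int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                          void *(*start_routine) (void *), void *arg) {
    int ret = pthread_create_f(thread, attr, start_routine, arg);
    if (ret != 0) return ret;   // creation failed: *thread is not valid

    struct source_type v1;
    v1.id = *thread;
    v1.type = PROCESS;
    add_vertex(v1);             // register the new thread as a graph vertex
    return ret;
}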

int init_hook(void) {
    if (!pthread_mutex_lock_f) {
        pthread_mutex_lock_f = dlsym(RTLD_NEXT, "pthread_mutex_lock");
    }

    if (!pthread_mutex_unlock_f) {
        pthread_mutex_unlock_f = dlsym(RTLD_NEXT, "pthread_mutex_unlock");
    }

    if (!pthread_create_f) {
        pthread_create_f = dlsym(RTLD_NEXT, "pthread_create");
    }
    return 0;
}


# if 1  // deadlock test case
pthread_mutex_t mtx1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mtx2 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mtx3 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t mtx4 = PTHREAD_MUTEX_INITIALIZER;

void *thread1(void *arg) {
    pthread_mutex_lock(&mtx1);
    sleep(1);
    pthread_mutex_lock(&mtx2);

    printf("thread1:\n");
    pthread_mutex_unlock(&mtx2);
    pthread_mutex_unlock(&mtx1);
    return NULL;
}

void *thread2(void *arg) {
    pthread_mutex_lock(&mtx2);
    sleep(1);
    pthread_mutex_lock(&mtx3);

    printf("thread2:\n");
    pthread_mutex_unlock(&mtx3);
    pthread_mutex_unlock(&mtx2);
    return NULL;
}

void *thread3(void *arg) {
    pthread_mutex_lock(&mtx3);
    sleep(1);
    pthread_mutex_lock(&mtx4);

    printf("thread3:\n");
    pthread_mutex_unlock(&mtx4);
    pthread_mutex_unlock(&mtx3);
    return NULL;
}

void *thread4(void *arg) {
    pthread_mutex_lock(&mtx4);
    sleep(1);
    pthread_mutex_lock(&mtx1);

    printf("thread4:\n");
    pthread_mutex_unlock(&mtx1);
    pthread_mutex_unlock(&mtx4);
    return NULL;
}

#endif

# if 1
int main() {
	init_hook();
	start_check();

	pthread_t t1, t2, t3, t4;

	pthread_create(&t1, NULL, thread1, NULL);
	pthread_create(&t2, NULL, thread2, NULL);

	pthread_create(&t3, NULL, thread3, NULL);
	pthread_create(&t4, NULL, thread4, NULL);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	pthread_join(t3, NULL);
	pthread_join(t4, NULL);

	printf("complete\n");

}
#endif

#if 0
int main() {

	tg = (struct task_graph*)malloc(sizeof(struct task_graph));
	tg->num = 0;

	struct source_type v1;
	v1.id = 1;
	v1.type = PROCESS;
	add_vertex(v1);

	struct source_type v2;
	v2.id = 2;
	v2.type = PROCESS;
	add_vertex(v2);

	struct source_type v3;
	v3.id = 3;
	v3.type = PROCESS;
	add_vertex(v3);

	struct source_type v4;
	v4.id = 4;
	v4.type = PROCESS;
	add_vertex(v4);

	struct source_type v5;
	v5.id = 5;
	v5.type = PROCESS;
	add_vertex(v5);

	add_edge(v1, v2);
	add_edge(v2, v3);
	add_edge(v3, v4);
	add_edge(v4, v5);
	add_edge(v3, v1);
	
	search_for_cycle(search_vertex(v1));

}
#endif

Note: the pthread and dl libraries must be linked at compile time:

gcc -o deadlock deadlock.c -lpthread -ldl

https://github.com/0voice