[Linux] Thread Concepts and Thread Control

I. The Concept of Threads in Linux
II. Linux Processes vs. Threads
III. Thread Control

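I. The Concept of Threads in Linux

1. What is a thread?

The standard definition of a thread: a thread is an execution flow inside a process, and it is the basic unit of CPU scheduling.
A process, by contrast, is the basic unit of system resource allocation.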

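Other facts about threads:

Every process has at least one thread of execution.
A thread runs inside a process, which in essence means it runs inside that process's address space.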

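About processes

Creating a process means creating its process control block (task_struct), its process address space (mm_struct), and its page table; the page table is what maps virtual addresses to physical addresses.

Each process has its own address space and its own page table, which is why processes are independent of one another at run time.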

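About threads

Now suppose that when we create a "process" we create only a task_struct, and require the new task_struct to share the address space and page table of its parent task_struct. The result is several task_structs that all point at the same address space and the same page table.

What we have created in that case is, in effect, a group of threads (four in the original illustration), and the original process can be viewed as the main thread.

Each of these threads is one execution flow of the current process, which is what people mean by "a thread is an execution branch inside a process".
We can also see that a thread runs inside a process, that is, inside the process's address space, so almost every resource the process has acquired is shared by all of its threads.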
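The sharing described above can be observed directly. The following is a minimal sketch, not part of the original article: the names g_shared_value, write_shared_value, and read_value_written_by_thread are made up for illustration, and it relies on pthread_create and pthread_join, which are covered in section III.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical global: it lives in the process's data segment, so every
// thread of the process sees the same variable.
static int g_shared_value = 0;

static void* write_shared_value(void*)
{
    g_shared_value = 42;   // written by the new thread
    return nullptr;
}

// Starts one thread, waits for it, and returns what the calling thread
// then reads from the shared global.
int read_value_written_by_thread()
{
    g_shared_value = 0;
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, write_shared_value, nullptr);
    assert(rc == 0);
    rc = pthread_join(tid, nullptr);   // joining orders the write before our read
    assert(rc == 0);
    return g_shared_value;
}
```

Calling read_value_written_by_thread() from main yields 42: the value stored by the new thread is visible to the caller because both run in the same address space.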

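How should we now understand the processes we studied earlier?

In the original figure, everything enclosed in the blue box, taken as a whole, is what we call a process.

A process is therefore not measured by its task_struct alone. Besides the task_struct, a process also includes the address space, the page table, code and data, open files, signal state, and so on; all of these together make up a process.

From the kernel's point of view, a process is the basic entity to which system resources are allocated.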

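In Linux, can the CPU tell whether the task_struct it is currently scheduling belongs to a process or a thread?

No, and it does not need to, because the CPU only cares about individual execution flows. Whether a process contains one execution flow or many, scheduling is always done per task_struct.

A process with a single execution flow being scheduled:

A process with multiple execution flows being scheduled:

An operating system runs many processes, and each process contains one or more threads, so there are at least as many threads as processes. Because one process's work can be split across several threads, a thread is a finer-grained unit of execution than a process.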

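If an operating system supports threads as a separate kernel concept, it has to manage them: creating, terminating, scheduling, and switching threads, allocating resources to them, and releasing and reclaiming those resources. All of this has to be built separately from process management, as a thread-management module that runs in parallel with it.

Supporting threads in this way therefore makes the operating system's design more complex. Linux takes the view that the control block describing a thread is much the same as the one describing a process, so it does not define a new data structure for threads and simply reuses the process control block. This is why every execution flow in Linux is called a lightweight process (LWP), and a lightweight process is, in practice, a thread.

Some operating systems do give threads their own kernel structures, Windows being one example, and the thread-related part of such a design is correspondingly more involved than in Linux.

Since Linux has no separate thread entity in the kernel, it also has no system calls that are dedicated to threads in that sense.

What Linux does provide is an interface for creating lightweight processes, that is, for creating a process that shares its address space with its creator.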

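The native thread library pthread

From the kernel's point of view Linux has no thread-specific interface, but a user who wants to create a thread expects something like a thread_create function rather than a call such as vfork. The system therefore provides a native thread library, pthread, at user level.

The native thread library wraps the system calls that create lightweight processes and presents a set of thread interfaces at user level.

For us, then, learning threads on Linux means learning this user-level set of interfaces rather than interfaces of the operating system itself.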

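2. Two-level page tables

Take a 32-bit platform as an example. It has 2^32 addresses, so 2^32 addresses need to be mapped.

If the page table were just a single flat table, it would have to hold 2^32 virtual-to-physical mappings, that is, 2^32 entries.

Besides the virtual address and the physical address it maps to, each entry also needs permission information. The distinction between user-level and kernel-level page tables, for example, is made through these permission bits.

A flat table of that size is not practical, so the page table is not a single table. Instead, the mapping is split into levels: the high bits of a virtual address select an entry in a page directory, that entry points to a page table, the next bits select an entry in that page table, and the remaining low bits are the offset within the physical page.

This is the two-level page table: the page directory is the first level, and the page tables it points to are the second level.

The whole translation is carried out by the MMU (Memory Management Unit), a piece of hardware integrated into the CPU. The page table is a software-maintained mapping and the MMU performs the lookup in hardware, so virtual-to-physical translation is a combination of software and hardware.

Note: on Linux, 32-bit platforms typically use a two-level page table, while 64-bit platforms use page tables with more levels.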
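To make the lookup concrete, here is a small sketch of how a 32-bit virtual address could be split for a two-level table. It assumes the classic 32-bit x86 layout with 4 KiB pages (10 bits of page-directory index, 10 bits of page-table index, 12 bits of offset); the article does not state the split, and the type and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (classic 32-bit x86, 4 KiB pages):
// | 10 bits: directory index | 10 bits: table index | 12 bits: page offset |
struct VirtualAddressParts {
    std::uint32_t directory_index;  // selects one of 1024 page-directory entries
    std::uint32_t table_index;      // selects one of 1024 entries in that page table
    std::uint32_t page_offset;      // byte offset inside the 4 KiB page
};

// Splits a virtual address the way the MMU would index the two levels.
VirtualAddressParts split_virtual_address(std::uint32_t vaddr)
{
    VirtualAddressParts parts;
    parts.directory_index = vaddr >> 22;
    parts.table_index = (vaddr >> 12) & 0x3FF;
    parts.page_offset = vaddr & 0xFFF;
    return parts;
}

// Reassembles the address; the asserts document the valid range of each field.
std::uint32_t join_virtual_address(const VirtualAddressParts& parts)
{
    assert(parts.directory_index < 1024);
    assert(parts.table_index < 1024);
    assert(parts.page_offset < 4096);
    return (parts.directory_index << 22) | (parts.table_index << 12) | parts.page_offset;
}
```

For the address 0x12345678 this gives directory index 0x48, table index 0x345, and offset 0x678. Under the same assumptions the page directory has 1024 entries and each page table has 1024 entries, and page tables only need to exist for the parts of the address space that are actually in use.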

3. Advantages of threads

4. Disadvantages of threads

5. Thread exceptions

6. Uses of threads

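II. Linux Processes vs. Threads

1. Processes and threads

A process is the basic entity to which system resources are allocated; a thread is the basic unit of scheduling.

Note: be careful to distinguish the stack from the heap when comparing processes and threads.

Every thread has its own stack.
All threads in a process share that process's heap.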
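The difference between the private stack and the shared heap can be checked with a short experiment. This is an illustrative sketch only; StackHeapProbe, probe_worker, and worker_shares_heap_but_not_stack are hypothetical names, and it uses pthread_create and pthread_join from section III.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <pthread.h>

struct StackHeapProbe {
    int* heap_cell;                  // allocated by the creating thread
    std::uintptr_t worker_local_at;  // address of a local variable in the worker
};

static void* probe_worker(void* arg)
{
    StackHeapProbe* probe = static_cast<StackHeapProbe*>(arg);
    int local = 0;  // lives on the worker's own stack
    probe->worker_local_at = reinterpret_cast<std::uintptr_t>(&local);
    *probe->heap_cell = 7;  // same heap as the creating thread
    return nullptr;
}

// Returns true if the worker's write to the heap is visible to the caller and
// the worker's local variable lived at a different address than the caller's.
bool worker_shares_heap_but_not_stack()
{
    int caller_local = 0;
    StackHeapProbe probe;
    probe.heap_cell = static_cast<int*>(std::malloc(sizeof(int)));
    assert(probe.heap_cell != nullptr);
    *probe.heap_cell = 0;
    probe.worker_local_at = 0;

    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, probe_worker, &probe);
    assert(rc == 0);
    rc = pthread_join(tid, nullptr);
    assert(rc == 0);

    bool heap_shared = (*probe.heap_cell == 7);
    bool stacks_differ =
        (probe.worker_local_at != reinterpret_cast<std::uintptr_t>(&caller_local));
    std::free(probe.heap_cell);
    return heap_shared && stacks_differ;
}
```

Note that "own stack" means a separate region, not a protected one: the worker here reaches the caller's probe object through a pointer, because all stacks still live in the same address space.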

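2. What the threads of a process share

3. The relationship between processes and threads

The relationship between processes and threads is shown in the figure below:

Everything we worked with before this point was a process with a single thread of execution, that is, a single-threaded process.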

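III. Thread Control

1. The POSIX thread library

The pthread library is the native, application-level thread library:

"Application-level" means the library is not provided directly by the system-call interface; it is supplied as a user-space library.
"Native" means that most Linux systems ship with it by default.
The thread-related functions form a complete family, and almost all of their names begin with "pthread_".
To use them, include the header <pthread.h>.
When linking against the thread library, pass the "-lpthread" option to the compiler.

Error checking: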

2. Creating threads

The function that creates a thread is pthread_create.

Its prototype is:

int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine) (void *), void *arg);

Return value:

On success pthread_create returns 0; on failure it returns an error number.
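Because the error number comes back as the return value rather than through errno, a caller can pass it straight to strerror. The sketch below shows one way to check both pthread_create and pthread_join; the helper names do_nothing and create_and_join_checked are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <pthread.h>

static void* do_nothing(void*) { return nullptr; }

// Creates and joins one thread, reporting any failure on stderr.
// Returns 0 on success, otherwise the error number of the call that failed.
int create_and_join_checked()
{
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, do_nothing, nullptr);
    if (rc != 0) {
        // The pthread functions return the error number directly.
        std::fprintf(stderr, "pthread_create: %s\n", std::strerror(rc));
        return rc;
    }
    rc = pthread_join(tid, nullptr);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_join: %s\n", std::strerror(rc));
    }
    return rc;
}
```

The examples in the rest of this article omit these checks for brevity.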

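Having the main thread create a new thread

When a program starts, the operating system creates a process, and one thread starts running at the same time. That thread is called the main thread.

The main thread is the thread that creates the other threads.
The main thread usually needs to finish last so that it can perform final work such as cleanup and shutdown.

Below, the main thread calls pthread_create to create new threads. Each new thread goes off to run its own routine, while the main thread continues with the code that follows.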

#include<iostream>
#include<pthread.h>
#include<unistd.h>



void* threadStart(void *args)
{
    while(true)
    {
        sleep(1);
        std::cout<<" new thread is running... "<< " ,pid: "<<getpid()<< std::endl;
    }

}

int main()
{
    pthread_t tid1;
    pthread_create(&tid1,nullptr,threadStart,(void *)"thread-new");

    pthread_t tid2;
    pthread_create(&tid2,nullptr,threadStart,(void *)"thread-new");

    pthread_t tid3;
    pthread_create(&tid3,nullptr,threadStart,(void *)"thread-new");

    // main thread
    while(true)
    {
        std::cout<<" main is running..."<< " ,pid: "<< getpid()<<std::endl;
        sleep(1);
    }

    return 0;
}
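The example above passes the string "thread-new" as the fourth argument of pthread_create, but the routine never reads it. The following sketch shows one way a routine can receive its argument and hand a result back through the same structure. NameTask, measure_name, and measure_name_in_thread are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <pthread.h>

// Hypothetical argument bundle: one input field and one output field.
struct NameTask {
    const char* name;     // input: the thread's label
    std::size_t length;   // output: filled in by the thread
};

static void* measure_name(void* args)
{
    NameTask* task = static_cast<NameTask*>(args);  // undo the void* conversion
    task->length = std::strlen(task->name);
    return nullptr;
}

// Runs measure_name in a new thread and returns the length it computed.
std::size_t measure_name_in_thread(const char* name)
{
    NameTask task = {name, 0};  // must outlive the thread, so we join before returning
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, measure_name, &task);
    assert(rc == 0);
    rc = pthread_join(tid, nullptr);
    assert(rc == 0);
    return task.length;
}
```

Passing a pointer to a structure is a common choice here because the argument is a single void*; the main constraint is that the object it points to stays alive for as long as the thread uses it.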

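The ps -aL command lists the current lightweight processes.

Without -L, ps shows one line per process.
With -L, it shows the lightweight processes inside each process.

Note: in Linux, application-level threads correspond one-to-one to kernel LWPs. The scheduler works with LWPs rather than PIDs. The processes we saw before were all single-threaded, so their PID and LWP were equal, and for a single-threaded process it makes no difference whether we think of scheduling in terms of the PID or the LWP.

Getting the thread ID

There are two common ways to get a thread ID:

The output parameter of pthread_create, filled in when the thread is created.
A call to pthread_self.

The prototype of pthread_self is: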

pthread_t pthread_self(void);

Calling pthread_self returns the ID of the calling thread, much as getpid returns the ID of the calling process.

For example:

#include<iostream>
#include<pthread.h>
#include<unistd.h>



void* threadStart(void *args)
{
    while(true)
    {
        sleep(1);
        //std::cout<<" new thread is running... "<< " ,pid: "<<getpid()<< std::endl;
        pthread_t tid=pthread_self();
        std::cout<<" my tid = "<<tid<<std::endl;
    }

}

int main()
{
    pthread_t tid1;
    pthread_create(&tid1,nullptr,threadStart,(void *)"thread-new");

    pthread_t tid2;
    pthread_create(&tid2,nullptr,threadStart,(void *)"thread-new");

    pthread_t tid3;
    pthread_create(&tid3,nullptr,threadStart,(void *)"thread-new");

    // main thread
    while(true)
    {
        std::cout<<" main is running..."<< " ,pid: "<< getpid()<<std::endl;
        sleep(1);
    }

    return 0;
}

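Note: the thread ID returned by pthread_self is not equal to the kernel's LWP value. pthread_self returns the ID used by the user-level native thread library, whereas the LWP is the kernel's lightweight-process ID. The two correspond one-to-one.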
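To see the kernel-side ID from inside a program, one option on Linux is the gettid system call. The sketch below invokes it through syscall(SYS_gettid), since older glibc versions do not provide a gettid() wrapper; current_lwp, report_lwp, and lwp_of_new_thread are hypothetical helper names.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Kernel thread ID (the LWP column of ps -L) of the calling thread.
pid_t current_lwp()
{
    return static_cast<pid_t>(syscall(SYS_gettid));
}

static void* report_lwp(void* out)
{
    *static_cast<pid_t*>(out) = current_lwp();
    return nullptr;
}

// Runs one thread and returns the LWP that the thread observed for itself.
pid_t lwp_of_new_thread()
{
    pid_t lwp = 0;
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, report_lwp, &lwp);
    assert(rc == 0);
    rc = pthread_join(tid, nullptr);
    assert(rc == 0);
    return lwp;
}
```

Called from the main thread, current_lwp() equals getpid(), which matches the earlier remark that PID and LWP coincide for a single-threaded process; lwp_of_new_thread() returns a different value.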

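3. Waiting for threads

A thread, once created, needs to be waited for, just like a process. If the main thread does not wait for a new thread, that thread's resources are not reclaimed. Not waiting therefore causes a problem similar to a zombie process: a resource leak.

The function that waits for a thread is pthread_join.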

int pthread_join(pthread_t thread, void **retval);

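The calling thread blocks until the thread whose ID is thread terminates. The termination status obtained through pthread_join depends on how that thread terminated:

If the thread returned from its thread function, the location pointed to by retval holds the function's return value.
If the thread was cancelled by another thread calling pthread_cancel, the location pointed to by retval holds the constant PTHREAD_CANCELED.
If the thread terminated itself by calling pthread_exit, the location pointed to by retval holds the argument passed to pthread_exit.
If you are not interested in the thread's termination status, pass NULL for retval.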
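The first and third cases in the list above can be sketched as follows. The routine and helper names are made up for this illustration, and the exit values 11 and 22 are arbitrary; std::intptr_t is used so that the integer survives the round trip through void* on 64-bit systems.

```cpp
#include <cassert>
#include <cstdint>
#include <pthread.h>

// Ends by returning from the thread function.
void* finish_by_return(void*)
{
    return reinterpret_cast<void*>(static_cast<std::intptr_t>(11));
}

// Ends by calling pthread_exit, which does not return.
void* finish_by_pthread_exit(void*)
{
    pthread_exit(reinterpret_cast<void*>(static_cast<std::intptr_t>(22)));
}

// Runs routine in a new thread and returns its exit value as an integer.
std::intptr_t join_and_get_code(void* (*routine)(void*))
{
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, routine, nullptr);
    assert(rc == 0);
    void* ret = nullptr;
    rc = pthread_join(tid, &ret);  // ret receives the thread's exit value
    assert(rc == 0);
    return reinterpret_cast<std::intptr_t>(ret);
}
```

join_and_get_code(finish_by_return) yields 11 and join_and_get_code(finish_by_pthread_exit) yields 22, so from the joiner's side the two ways of finishing look the same.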

Searching with grep shows that PTHREAD_CANCELED is a macro defined in the header <pthread.h>, and its value is essentially -1 (cast to void*).

[cl@VM-0-15-centos thread]$ grep -ER "PTHREAD_CANCELED" /usr/include/

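In the code below we do not yet care about the thread's exit information, so we pass NULL as the second argument of pthread_join. After each wait we print the thread's index and its thread ID.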

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>


void* Routine(void* arg)
{
    char* msg = (char*)arg;
	int count = 0;
	while (count < 5){
		printf("I am %s...pid: %d, ppid: %d, tid: %lu\n", msg, getpid(), getppid(), pthread_self());
		sleep(1);
		count++;
	}
	return NULL;

}


int main()
{
    pthread_t tid[5];
    for(int i=0;i<5;i++)
    {
        char* buffer=(char*)malloc(64);
        sprintf(buffer," thread %d ",i);
        pthread_create(&tid[i],NULL,Routine,buffer);
        printf("%s tid is %lu\n", buffer, tid[i]);
    }

    printf("I am main thread...pid: %d, ppid: %d, tid: %lu\n", getpid(), getppid(), pthread_self());
    // wait for the threads
    for(int i=0;i<5;i++)
    {
        pthread_join(tid[i],NULL);
        printf("thread %d[%lu]...quit\n", i, tid[i]);
    }
    return 0;
}

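Note: pthread_join waits in a blocking manner by default.

When we wait for a process, the status output parameter of wait or waitpid gives us the exit code, the terminating signal, and the core dump flag of the process that exited.

Why, then, do we get only an exit code when we wait for a thread? Can a thread not terminate abnormally?

Threads certainly can fail at run time. Just like a process, a thread can end in one of three ways:

The code runs to completion and the result is correct.
The code runs to completion and the result is incorrect.
The code terminates abnormally.

So abnormal termination does have to be considered, but pthread_join cannot report it. A thread is an execution branch inside a process; if one thread in the process crashes, the whole process crashes with it. At that point pthread_join never gets to return, because the entire process has already exited.

For example, the thread routine below deliberately divides by zero. When a thread reaches that line it crashes, and the whole process crashes with it.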

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* Routine(void* arg)
{
	char* msg = (char*)arg;
	int count = 0;
	while (count < 5){
		printf("I am %s...pid: %d, ppid: %d, tid: %lu\n", msg, getpid(), getppid(), pthread_self());
		sleep(1);
		count++;
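		volatile int zero = 0; // volatile so the compiler cannot fold the division away
		int a = 1 / zero;      // deliberate error: integer division by zero
		(void)a;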
	}
	return (void*)2022;
}
int main()
{
	pthread_t tid[5];
	for (int i = 0; i < 5; i++){
		char* buffer = (char*)malloc(64);
		sprintf(buffer, "thread %d", i);
		pthread_create(&tid[i], NULL, Routine, buffer);
		printf("%s tid is %lu\n", buffer, tid[i]);
	}
	printf("I am main thread...pid: %d, ppid: %d, tid: %lu\n", getpid(), getppid(), pthread_self());
	for (int i = 0; i < 5; i++){
		void* ret = NULL;
		pthread_join(tid[i], &ret);
		printf("thread %d[%lu]...quit, exitcode: %ld\n", i, tid[i], (long)ret);
	}
	return 0;
}

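Running the code shows that as soon as one thread crashes, the whole process goes down and the main thread never gets the chance to wait for the new threads. This shows that multithreaded programs are less robust in this respect: if any one thread in a process crashes, the whole process crashes. From the outside we also cannot tell which thread caused the crash; we only know that the process crashed.

So pthread_join can only retrieve the exit code of a thread that exited normally, which is used to judge whether the thread's result was correct. It cannot obtain information about abnormal termination.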
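The information that pthread_join cannot deliver is still available one level up, to whoever waits for the process. The sketch below runs the crashing scenario in a child process and reads the terminating signal with waitpid. It is an illustration under two assumptions: it uses abort() (SIGABRT) rather than a division by zero, because integer division by zero does not trap on every architecture, and the function names are hypothetical.

```cpp
#include <cassert>
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void* crash_on_purpose(void*)
{
    std::abort();  // raises SIGABRT; the default action ends the whole process
}

// Forks a child whose second thread aborts. Returns the signal that
// terminated the child process, or 0 if it was not killed by a signal.
int signal_that_killed_child()
{
    std::fflush(nullptr);  // keep buffered output from being duplicated in the child
    pid_t child = fork();
    assert(child >= 0);
    if (child == 0) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, crash_on_purpose, nullptr) != 0) {
            _exit(1);
        }
        pthread_join(tid, nullptr);  // never completes: the process dies first
        _exit(0);
    }
    int status = 0;
    pid_t waited = waitpid(child, &status, 0);
    assert(waited == child);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The parent sees SIGABRT for the child process as a whole, which is consistent with the point above: the crash is reported per process, not per thread.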

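4. Terminating threads

To terminate a single thread without terminating the whole process, there are three options:

Return from the thread function.
The thread calls pthread_exit to terminate itself.
One thread calls pthread_cancel to terminate another thread in the same process.

Exiting with return

A return in a thread function ends only that thread, but a return in main ends the whole process. In other words, once the main thread returns from main, the whole process exits.

For example, in the code below the main thread returns immediately after creating five new threads, so the whole process exits.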

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* Routine(void* arg)
{
	char* msg = (char*)arg;
	while (1){
		printf("I am %s...pid: %d, ppid: %d, tid: %lu\n", msg, getpid(), getppid(), pthread_self());
		sleep(1);
	}
	return (void*)0;
}
int main()
{
	pthread_t tid[5];
	for (int i = 0; i < 5; i++){
		char* buffer = (char*)malloc(64);
		sprintf(buffer, "thread %d", i);
		pthread_create(&tid[i], NULL, Routine, buffer);
		printf("%s tid is %lu\n", buffer, tid[i]);
	}
	printf("I am main thread...pid: %d, ppid: %d, tid: %lu\n", getpid(), getppid(), pthread_self());

	return 0;
}

The pthread_exit function

pthread_exit terminates the calling thread. Its prototype is:

void pthread_exit(void *retval);

For example, the code below terminates each thread with pthread_exit and sets the thread's exit code to 6666.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* Routine(void* arg)
{
	char* msg = (char*)arg;
	int count = 0;
	while (count < 5){
		printf("I am %s...pid: %d, ppid: %d, tid: %lu\n", msg, getpid(), getppid(), pthread_self());
		sleep(1);
		count++;
	}
	pthread_exit((void*)6666);
}
int main()
{
	pthread_t tid[5];
	for (int i = 0; i < 5; i++){
		char* buffer = (char*)malloc(64);
		sprintf(buffer, "thread %d", i);
		pthread_create(&tid[i], NULL, Routine, buffer);
		printf("%s tid is %lu\n", buffer, tid[i]);
	}
	printf("I am main thread...pid: %d, ppid: %d, tid: %lu\n", getpid(), getppid(), pthread_self());
	for (int i = 0; i < 5; i++){
		void* ret = NULL;
		pthread_join(tid[i], &ret);
		printf("thread %d[%lu]...quit, exitcode: %ld\n", i, tid[i], (long)ret);
	}
	return 0;
}

Note: the exit() function terminates the process, so a call to exit from any thread terminates the whole process.
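This can be observed from outside the process. In the sketch below, which is an illustration with hypothetical function names and an arbitrary exit code of 7, a child process starts one thread that calls exit while the child's main thread only waits. The parent then reads the child's exit status.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void* call_exit(void*)
{
    std::exit(7);  // ends the whole process, not just this thread
}

// Forks a child in which a non-main thread calls exit(7).
// Returns the child's exit status, or -1 if it did not exit normally.
int exit_status_when_thread_calls_exit()
{
    std::fflush(nullptr);  // keep buffered output from being duplicated in the child
    pid_t child = fork();
    assert(child >= 0);
    if (child == 0) {
        pthread_t tid;
        if (pthread_create(&tid, nullptr, call_exit, nullptr) != 0) {
            _exit(1);
        }
        for (;;) {
            pause();  // the child's main thread never returns on its own
        }
    }
    int status = 0;
    pid_t waited = waitpid(child, &status, 0);
    assert(waited == child);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Although the child's main thread would wait forever, the child terminates with status 7 as soon as the other thread calls exit.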

The pthread_cancel function

A thread can be cancelled. We can cancel a particular thread with pthread_cancel, whose prototype is:

int pthread_cancel(pthread_t thread);

For example, in the code below each thread cancels itself after its first print:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* Routine(void* arg)
{
	char* msg = (char*)arg;
	int count = 0;
	while (count < 5){
		printf("I am %s...pid: %d, ppid: %d, tid: %lu\n", msg, getpid(), getppid(), pthread_self());
		sleep(1);
		count++;
		pthread_cancel(pthread_self());
	}
	pthread_exit((void*)6666);
}
int main()
{
	pthread_t tid[5];
	for (int i = 0; i < 5; i++){
		char* buffer = (char*)malloc(64);
		sprintf(buffer, "thread %d", i);
		pthread_create(&tid[i], NULL, Routine, buffer);
		printf("%s tid is %lu\n", buffer, tid[i]);
	}
	printf("I am main thread...pid: %d, ppid: %d, tid: %lu\n", getpid(), getppid(), pthread_self());
	for (int i = 0; i < 5; i++){
		void* ret = NULL;
		pthread_join(tid[i], &ret);
		printf("thread %d[%lu]...quit, exitcode: %ld\n", i, tid[i], (long)ret);
	}
	return 0;
}

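Running the code shows that each thread exits shortly after its first print. Its exit code is not the 6666 we set but -1, because the thread was cancelled before it reached pthread_exit.

A thread can cancel itself with pthread_cancel, but this is rarely done. pthread_cancel is normally used by one thread to cancel another, for example the main thread cancelling a new thread.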
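The more typical direction, one thread cancelling another, can be sketched like this. The function names are hypothetical; the worker loops on sleep, which is a cancellation point, so the pending cancel request takes effect there.

```cpp
#include <cassert>
#include <pthread.h>
#include <unistd.h>

// Worker that never finishes on its own.
void* sleep_forever(void*)
{
    for (;;) {
        sleep(1);  // cancellation point: a pending cancel is acted on here
    }
}

// Creates a worker, cancels it from the calling thread, and reports whether
// pthread_join delivered PTHREAD_CANCELED as the worker's exit value.
bool cancelled_thread_reports_canceled()
{
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, sleep_forever, nullptr);
    assert(rc == 0);
    rc = pthread_cancel(tid);  // only sends the request; it does not wait
    assert(rc == 0);
    void* ret = nullptr;
    rc = pthread_join(tid, &ret);  // a cancelled thread still has to be joined
    assert(rc == 0);
    return ret == PTHREAD_CANCELED;
}
```

Comparing against PTHREAD_CANCELED instead of a literal -1 keeps the check independent of how the macro happens to be defined.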

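A new thread can also cancel the main thread.

Note:

With this kind of cancellation the main thread and the new threads are peers: cancelling one thread does not stop the others from running to completion; the main thread simply stops executing its remaining code.
Normally the main thread controls the new threads, which matches the usual logic of thread control. Experiments show that a new thread can cancel the main thread, but doing so is not recommended.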

5. Detaching threads

The function that detaches a thread is pthread_detach.

Its prototype is:

int pthread_detach(pthread_t thread);

For example, below we create five new threads and have each of them detach itself, after which the main thread no longer needs to join them.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/types.h>

void* Routine(void* arg)
{
	pthread_detach(pthread_self());
	char* msg = (char*)arg;
	int count = 0;
	while (count < 5){
		printf("I am %s...pid: %d, ppid: %d, tid: %lu\n", msg, getpid(), getppid(), pthread_self());
		sleep(1);
		count++;
	}
	pthread_exit((void*)6666);
}
int main()
{
	pthread_t tid[5];
	for (int i = 0; i < 5; i++){
		char* buffer = (char*)malloc(64);
		sprintf(buffer, "thread %d", i);
		pthread_create(&tid[i], NULL, Routine, buffer);
		printf("%s tid is %lu\n", buffer, tid[i]);
	}
	while (1){
		printf("I am main thread...pid: %d, ppid: %d, tid: %lu\n", getpid(), getppid(), pthread_self());
		sleep(1);
	}
	return 0;
}
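The example above has each thread call pthread_detach on itself. An alternative, not shown in the original article, is to create the thread in the detached state by passing a thread attribute object to pthread_create. The sketch below assumes that approach; the flag g_detached_worker_done and the function names are hypothetical, and the flag exists only because a detached thread cannot be joined.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <unistd.h>

static std::atomic<bool> g_detached_worker_done{false};

static void* detached_worker(void*)
{
    g_detached_worker_done.store(true);
    return nullptr;  // the value is discarded: nobody joins a detached thread
}

// Starts a thread that is detached from the moment it is created, then polls
// a flag until the thread has run. Returns the detach state stored in the
// attribute object that was passed to pthread_create.
int start_detached_thread()
{
    g_detached_worker_done.store(false);

    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    assert(rc == 0);

    int state = -1;
    rc = pthread_attr_getdetachstate(&attr, &state);
    assert(rc == 0);

    pthread_t tid;
    rc = pthread_create(&tid, &attr, detached_worker, nullptr);
    assert(rc == 0);
    pthread_attr_destroy(&attr);

    while (!g_detached_worker_done.load()) {
        usleep(1000);  // wait without join
    }
    return state;
}
```

Creating the thread detached avoids the short window in which a thread that detaches itself is still joinable; which style to use is a design choice, and the self-detaching form in the article works as well.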