In a Java program, a thread is a java.lang.Thread object at the JVM level, but the entity that actually executes code is a lightweight process (LWP) scheduled by the operating system kernel. For OpenJDK 17 on Linux, the thread-creation path is roughly: JVM → pthread_create (glibc) → the clone / clone3 system call → the kernel. Based on the sources of OpenJDK 17 and glibc 2.34+, this article walks down that path layer by layer, following a Java thread from its birth to its acceptance by the kernel.
1. The JVM layer: os::create_thread, preparing the pthread attributes
The function where OpenJDK talks to the operating system to create a thread is os::create_thread (in os_posix.cpp). It builds the pthread attribute object and finally calls pthread_create. Let's look at the key work it does.
```cpp
bool os::create_thread(Thread* thread, ThreadType thr_type, size_t req_stack_size) {
  // 1. Create the OSThread object and attach it to the JVM Thread
  OSThread* osthread = new OSThread(NULL, NULL);
  osthread->set_thread_type(thr_type);
  osthread->set_state(ALLOCATED);
  thread->set_osthread(osthread);
  // 2. Initialize the pthread attributes
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
  // 3. Set the contention scope (only AIX needs PTHREAD_SCOPE_SYSTEM;
  //    on Linux, system-wide contention is already the default)
  // 4. Start the new thread SUSPENDED; wake it once the JVM is ready
  pthread_attr_setsuspendstate_np(&attr, PTHREAD_CREATE_SUSPENDED_NP);
  // 5. Determine the stack size (JDK-8187028 workaround: add 64K for small stacks)
  size_t stack_size = os::Posix::get_initial_stack_size(thr_type, req_stack_size);
  if (stack_size < 4096 * K) stack_size += 64 * K;
  pthread_attr_setstacksize(&attr, stack_size);
  // 6. For Java and compiler threads, disable the OS guard page
  //    (the JVM manages its own)
  if (thr_type == java_thread || thr_type == compiler_thread)
    pthread_attr_setguardsize(&attr, 0);
  // 7. Actually create the thread
  pthread_t tid;
  int ret = pthread_create(&tid, &attr, thread_native_entry, thread);
  // ... error handling and logging
}
```
Noteworthy design decisions:

- Suspended start: PTHREAD_CREATE_SUSPENDED_NP (a non-standard extension) keeps the new thread from running immediately after creation; the JVM finishes some initialization (such as setting the Thread object's state) and only then wakes it explicitly. This prevents the thread from entering run() while JVM state is not yet ready.
- Stack-size compensation: under some glibc versions, a small requested stack (< 4 MB) can come out as much as 64 KB smaller than asked for, so the JVM proactively adds 64 KB. This workaround is a product of wrestling with implementation details of the underlying C library.
- Guard pages: the JVM installs its own guard pages on Java thread stacks, so it tells pthread not to add another one, saving memory.
pthread_create is the standard POSIX thread-creation interface, but glibc does not issue the clone system call directly; it goes through a series of wrappers first. Let's dig into the glibc sources.
2. The glibc layer: from pthread_create to the clone system call
2.1 __pthread_create_2_1: overall control of thread creation
glibc implements pthread_create in nptl/pthread_create.c. It goes through the following main steps (simplified):
1. Process the thread attributes (fetch the defaults if NULL was passed in).
2. Allocate the stack and the thread descriptor (struct pthread *pd) via allocate_stack.
3. Initialize the fields of pd: the start function start_routine, its argument arg, the signal mask, the scheduling policy, and so on.
4. If a debugger is tracing thread creation, set stopped_start = true, so the new thread blocks after creation and waits for the debugger to attach.
5. Call create_thread, which actually issues the system call.
2.2 create_thread: preparing the clone arguments
create_thread (in the same file) builds the clone_args structure and calls __clone_internal.
```cpp
static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
                          bool *stopped_start, void *stackaddr,
                          size_t stacksize, bool *thread_ran) {
  // clone_flags determine which resources the new task (thread) shares
  // with its parent
  const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM
                           | CLONE_SIGHAND | CLONE_THREAD
                           | CLONE_SETTLS | CLONE_PARENT_SETTID
                           | CLONE_CHILD_CLEARTID | 0);
  TLS_DEFINE_INIT_TP (tp, pd);  // obtain the thread-local storage (TLS) pointer
  struct clone_args args = {
    .flags = clone_flags,
    .pidfd = (uintptr_t) &pd->tid,
    .parent_tid = (uintptr_t) &pd->tid,
    .child_tid = (uintptr_t) &pd->tid,
    .stack = (uintptr_t) stackaddr,
    .stack_size = stacksize,
    .tls = (uintptr_t) tp,
  };
  int ret = __clone_internal (&args, &start_thread, pd);
  // ...
}
```
Each bit of clone_flags names a resource the new thread shares with its parent:
- CLONE_VM: share the memory address space (the defining property of a thread).
- CLONE_FS: share filesystem information (current working directory, umask, etc.).
- CLONE_FILES: share the file descriptor table.
- CLONE_SIGHAND: share the table of signal handlers.
- CLONE_THREAD: put the new task into the same thread group (getpid() returns the same value).
- CLONE_SETTLS: set up the thread-local storage area.
- CLONE_PARENT_SETTID: write the thread ID to an address supplied by the parent (pd->tid).
- CLONE_CHILD_CLEARTID: clear pd->tid when the thread exits, which is what wakes up pthread_join.
2.3 __clone_internal: prefer clone3, fall back to clone
glibc 2.34 adopted the more modern clone3 system call, which passes its arguments in a structure and is easier to extend than the old clone (which passes arguments in registers). __clone_internal implements the automatic selection:
```cpp
int __clone_internal (struct clone_args *cl_args, int (*func)(void *), void *arg) {
#ifdef HAVE_CLONE3_WRAPPER
  int saved_errno = errno;
  int ret = __clone3_internal (cl_args, func, arg);
  if (ret != -1 || errno != ENOSYS)
    return ret;
  __set_errno (saved_errno);
#endif
  return __clone_internal_fallback (cl_args, func, arg); // legacy clone
}
```
__clone3_internal checks whether the kernel supports clone3 (caching the verdict once a first call has returned ENOSYS); if it does, it calls the __clone3 assembly wrapper, otherwise it degrades to the traditional clone.
2.4 __clone3 in assembly: trapping into the kernel
On x86-64, the assembly implementation of __clone3 (from sysdeps/unix/sysv/linux/x86_64/clone3.S):
```assembly
ENTRY (__clone3)
    // arguments: rdi = cl_args, rsi = size, rdx = func, rcx = arg
    test %RDI_LP, %RDI_LP        // cl_args must not be NULL
    jz SYSCALL_ERROR_LABEL
    test %RDX_LP, %RDX_LP        // func must not be NULL
    jz SYSCALL_ERROR_LABEL
    mov %RCX_LP, %R8_LP          // stash arg in r8 (preserved across the syscall)
    movl $SYS_ify(clone3), %eax  // syscall number 435
    syscall
    test %RAX_LP, %RAX_LP
    jl SYSCALL_ERROR_LABEL
    jz L(thread_start)           // return value 0 means we are the child (new thread)
    ret                          // parent thread returns
L(thread_start):
    xorl %ebp, %ebp              // clear the frame pointer to mark the outermost frame
    mov %R8_LP, %RDI_LP          // argument: arg
    call *%rdx                   // call func(arg)
    movq %rax, %rdi              // exit(return value of func)
    movl $SYS_ify(exit), %eax
    syscall
END (__clone3)
```
Key points:

- The child (the new thread) starts at L(thread_start): it calls func(arg) directly and then terminates via the exit system call; it never returns into pthread_create.
- The parent (original) thread continues after syscall; its ret returns to __clone_internal.
The traditional clone system call passes the stack pointer, flags, and so on in registers, which limits the number of parameters; clone3 uses a structure, which is cleaner and leaves room for future extension. Linux has supported clone3 since kernel 5.3.
3. The kernel layer: where clone / clone3 ends up
When the syscall instruction executes, the CPU traps into the kernel, which dispatches on the system call number (435) to sys_clone3, or to sys_clone for the legacy interface. The kernel then:
1. Copies the current task's task_struct.
2. Decides which resources to share according to clone_flags (CLONE_VM → share the memory descriptor, CLONE_FILES → share the file table, and so on).
3. Allocates a kernel stack for the new task and sets up TLS (if CLONE_SETTLS is set).
4. Puts the new task on a run queue; the scheduling policy decides whether it preempts immediately.
For a thread (CLONE_THREAD), the kernel points the new task_struct's group_leader at the parent's thread-group leader, which is why getpid() returns the same PID. The tgid (thread group ID) equals the parent's, while the pid (thread ID) is newly allocated.
When the new thread is scheduled for the first time, it returns from kernel mode to user mode at the instruction right after syscall, just like the parent, but on its freshly assigned stack and with a return value of 0. The jz in the wrapper then branches it to L(thread_start) (the entry glibc prepared). That is what "the child starts executing at L(thread_start)" in the assembly above really means: the divergence happens in the glibc wrapper, keyed off the syscall's return value.
4. What is special about JVM thread creation
Back in the JVM: after pthread_create returns, the new thread sits suspended (because of PTHREAD_CREATE_SUSPENDED_NP). The JVM then:
1. Sets up the state of the thread's Thread object.
2. Attaches the thread to its ThreadGroup if necessary.
3. Finally calls os::start_thread(thread), which wakes the new thread via pthread_cond_signal or another synchronization primitive, letting it begin executing thread_native_entry.
thread_native_entry calls Thread::call_run(), which ultimately executes the Java run() method.
5. Summary and reflections
From the JVM down to the kernel, thread creation crosses several abstraction layers, each with its own responsibilities:
| Layer | Responsibility | Key actions |
|---|---|---|
| JVM | Manage Java thread objects, control when the thread starts | Set pthread attributes (suspended start, stack-size compensation, disabling guard pages) |
| glibc | Implement POSIX thread semantics, allocate the user stack and TCB | Build clone_args, invoke the clone3/clone system call |
| Linux kernel | Create the kernel task_struct, schedule it | Share resources per the flags, set up TLS, return to user mode to run the function |
A few design points worth dwelling on:
- Why a suspended start is necessary: the JVM must guarantee that the new thread runs no Java code before it is fully initialized, or it could observe inconsistent internal JVM state. A suspended start is lighter than guarding a critical section with a mutex and avoids lock contention.
- Precise control over stack size: the JVM compensates for stack space that glibc may "steal", reflecting fine-grained memory management. It also disables the OS guard page because it implements its own protection (yellow and red zones) and can handle stack overflow more efficiently.
- The graceful evolution of clone3: through the ENOSYS fallback, glibc exploits new-kernel features while staying compatible with old kernels. This "detect at runtime" pattern is common in systems programming.
Understanding this whole pipeline not only helps when diagnosing tricky multithreading problems (stack overflows, thread-creation failures), it also gives a deeper appreciation of the intricate cooperation between user space and the kernel. The next time you write new Thread(() -> {...}).start(), picture the long chain of machinery it sets in motion underneath.
## Source code

os::create_thread (OpenJDK 17):
bool os::create_thread(Thread* thread, ThreadType thr_type,
size_t req_stack_size) {
assert(thread->osthread() == NULL, "caller responsible");
// Allocate the OSThread object.
OSThread* osthread = new OSThread(NULL, NULL);
if (osthread == NULL) {
return false;
}
// Set the correct thread state.
osthread->set_thread_type(thr_type);
// Initial state is ALLOCATED but not INITIALIZED
osthread->set_state(ALLOCATED);
thread->set_osthread(osthread);
// Init thread attributes.
pthread_attr_t attr;
pthread_attr_init(&attr);
guarantee(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0, "???");
// Make sure we run in 1:1 kernel-user-thread mode.
if (os::Aix::on_aix()) {
guarantee(pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) == 0, "???");
guarantee(pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED) == 0, "???");
}
// Start in suspended state, and in os::thread_start, wake the thread up.
guarantee(pthread_attr_setsuspendstate_np(&attr, PTHREAD_CREATE_SUSPENDED_NP) == 0, "???");
// Calculate stack size if it's not specified by caller.
size_t stack_size = os::Posix::get_initial_stack_size(thr_type, req_stack_size);
// JDK-8187028: It was observed that on some configurations (4K backed thread stacks)
// the real thread stack size may be smaller than the requested stack size, by as much as 64K.
// This very much looks like a pthread lib error. As a workaround, increase the stack size
// by 64K for small thread stacks (arbitrarily choosen to be < 4MB)
if (stack_size < 4096 * K) {
stack_size += 64 * K;
}
// On Aix, pthread_attr_setstacksize fails with huge values and leaves the
// thread size in attr unchanged. If this is the minimal stack size as set
// by pthread_attr_init this leads to crashes after thread creation. E.g. the
// guard pages might not fit on the tiny stack created.
int ret = pthread_attr_setstacksize(&attr, stack_size);
if (ret != 0) {
log_warning(os, thread)("The %sthread stack size specified is invalid: " SIZE_FORMAT "k",
(thr_type == compiler_thread) ? "compiler " : ((thr_type == java_thread) ? "" : "VM "),
stack_size / K);
thread->set_osthread(NULL);
delete osthread;
return false;
}
// Save some cycles and a page by disabling OS guard pages where we have our own
// VM guard pages (in java threads). For other threads, keep system default guard
// pages in place.
if (thr_type == java_thread || thr_type == compiler_thread) {
ret = pthread_attr_setguardsize(&attr, 0);
}
pthread_t tid = 0;
if (ret == 0) {
ret = pthread_create(&tid, &attr, (void* (*)(void*)) thread_native_entry, thread);
}
if (ret == 0) {
char buf[64];
log_info(os, thread)("Thread started (pthread id: " UINTX_FORMAT ", attributes: %s). ",
(uintx) tid, os::Posix::describe_pthread_attr(buf, sizeof(buf), &attr));
} else {
char buf[64];
log_warning(os, thread)("Failed to start thread - pthread_create failed (%d=%s) for attributes: %s.",
ret, os::errno_name(ret), os::Posix::describe_pthread_attr(buf, sizeof(buf), &attr));
// Log some OS information which might explain why creating the thread failed.
log_info(os, thread)("Number of threads approx. running in the VM: %d", Threads::number_of_threads());
LogStream st(Log(os, thread)::info());
os::Posix::print_rlimit_info(&st);
os::print_memory_info(&st);
}
pthread_attr_destroy(&attr);
if (ret != 0) {
// Need to clean up stuff we've allocated so far.
thread->set_osthread(NULL);
delete osthread;
return false;
}
// OSThread::thread_id is the pthread id.
osthread->set_thread_id(tid);
return true;
}
sysdeps/unix/sysv/linux/x86_64/clone3.S (glibc):
#define __NR_clone3 435
/* The clone3 syscall wrapper. Linux/x86-64 version.
Copyright (C) 2021-2024 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */
/* clone3() is even more special than fork() as it mucks with stacks
and invokes a function in the right context after its all over. */
#include <sysdep.h>
/* The userland implementation is:
int clone3 (struct clone_args *cl_args, size_t size,
int (*func)(void *arg), void *arg);
the kernel entry is:
int clone3 (struct clone_args *cl_args, size_t size);
The parameters are passed in registers from userland:
rdi: cl_args
rsi: size
rdx: func
rcx: arg
The kernel expects:
rax: system call number
rdi: cl_args
rsi: size */
.text
ENTRY (__clone3)
/* Sanity check arguments. */
movl $-EINVAL, %eax
test %RDI_LP, %RDI_LP /* No NULL cl_args pointer. */
jz SYSCALL_ERROR_LABEL
test %RDX_LP, %RDX_LP /* No NULL function pointer. */
jz SYSCALL_ERROR_LABEL
/* Save the cl_args pointer in R8 which is preserved by the
syscall. */
mov %RCX_LP, %R8_LP
/* Do the system call. */
movl $SYS_ify(clone3), %eax
/* End FDE now, because in the child the unwind info will be
wrong. */
cfi_endproc
syscall
test %RAX_LP, %RAX_LP
jl SYSCALL_ERROR_LABEL
jz L(thread_start)
ret
L(thread_start):
cfi_startproc
/* Clearing frame pointer is insufficient, use CFI. */
cfi_undefined (rip)
/* Clear the frame pointer. The ABI suggests this be done, to mark
the outermost frame obviously. */
xorl %ebp, %ebp
/* Set up arguments for the function call. */
mov %R8_LP, %RDI_LP /* Argument. */
call *%rdx /* Call function. */
/* Call exit with return value from function call. */
movq %rax, %rdi
movl $SYS_ify(exit), %eax
syscall
cfi_endproc
cfi_startproc
PSEUDO_END (__clone3)
libc_hidden_def (__clone3)
weak_alias (__clone3, clone3)
nptl/pthread_create.c and related glibc sources:
/* The clone3 syscall provides a superset of the functionality of the clone
interface. The kernel might extend __CL_ARGS struct in the future, with
each version with a different __SIZE. If the child is created, it will
start __FUNC function with __ARG arguments.
Different than kernel, the implementation also returns EINVAL for an
invalid NULL __CL_ARGS or __FUNC (similar to __clone).
All callers are responsible for correctly aligning the stack. The stack is
not aligned prior to the syscall (this differs from the exported __clone).
This function is only implemented if the ABI defines HAVE_CLONE3_WRAPPER.
*/
extern int __clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
bool *stopped_start, void *stackaddr,
size_t stacksize, bool *thread_ran)
{
/* Determine whether the newly created threads has to be started
stopped since we have to set the scheduling parameters or set the
affinity. */
bool need_setaffinity = (attr != NULL && attr->extension != NULL
&& attr->extension->cpuset != 0);
if (attr != NULL
&& (__glibc_unlikely (need_setaffinity)
|| __glibc_unlikely ((attr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0)))
*stopped_start = true;
pd->stopped_start = *stopped_start;
if (__glibc_unlikely (*stopped_start))
lll_lock (pd->lock, LLL_PRIVATE);
/* We rely heavily on various flags the CLONE function understands:
CLONE_VM, CLONE_FS, CLONE_FILES
These flags select semantics with shared address space and
file descriptors according to what POSIX requires.
CLONE_SIGHAND, CLONE_THREAD
This flag selects the POSIX signal semantics and various
other kinds of sharing (itimers, POSIX timers, etc.).
CLONE_SETTLS
The sixth parameter to CLONE determines the TLS area for the
new thread.
CLONE_PARENT_SETTID
The kernels writes the thread ID of the newly created thread
into the location pointed to by the fifth parameters to CLONE.
Note that it would be semantically equivalent to use
CLONE_CHILD_SETTID but it is be more expensive in the kernel.
CLONE_CHILD_CLEARTID
The kernels clears the thread ID of a thread that has called
sys_exit() in the location pointed to by the seventh parameter
to CLONE.
The termination signal is chosen to be zero which means no signal
is sent. */
const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM
| CLONE_SIGHAND | CLONE_THREAD
| CLONE_SETTLS | CLONE_PARENT_SETTID
| CLONE_CHILD_CLEARTID
| 0);
TLS_DEFINE_INIT_TP (tp, pd);
struct clone_args args =
{
.flags = clone_flags,
.pidfd = (uintptr_t) &pd->tid,
.parent_tid = (uintptr_t) &pd->tid,
.child_tid = (uintptr_t) &pd->tid,
.stack = (uintptr_t) stackaddr,
.stack_size = stacksize,
.tls = (uintptr_t) tp,
};
int ret = __clone_internal (&args, &start_thread, pd);
if (__glibc_unlikely (ret == -1))
return errno;
/* It's started now, so if we fail below, we'll have to let it clean itself
up. */
*thread_ran = true;
/* Now we have the possibility to set scheduling parameters etc. */
if (attr != NULL)
{
/* Set the affinity mask if necessary. */
if (need_setaffinity)
{
assert (*stopped_start);
int res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
attr->extension->cpusetsize,
attr->extension->cpuset);
if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
return INTERNAL_SYSCALL_ERRNO (res);
}
/* Set the scheduling parameters. */
if ((attr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0)
{
assert (*stopped_start);
int res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
pd->schedpolicy, &pd->schedparam);
if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
return INTERNAL_SYSCALL_ERRNO (res);
}
}
return 0;
}
int
__clone3_internal (struct clone_args *cl_args, int (*func) (void *args),
void *arg)
{
#ifdef HAVE_CLONE3_WRAPPER
# if __ASSUME_CLONE3
return __clone3 (cl_args, sizeof (*cl_args), func, arg);
# else
static int clone3_supported = 1;
if (atomic_load_relaxed (&clone3_supported) == 1)
{
int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
if (ret != -1 || errno != ENOSYS)
return ret;
atomic_store_relaxed (&clone3_supported, 0);
}
# endif
#endif
__set_errno (ENOSYS);
return -1;
}
int
__clone_internal (struct clone_args *cl_args,
int (*func) (void *arg), void *arg)
{
#ifdef HAVE_CLONE3_WRAPPER
int saved_errno = errno;
int ret = __clone3_internal (cl_args, func, arg);
if (ret != -1 || errno != ENOSYS)
return ret;
/* NB: Restore errno since errno may be checked against non-zero
return value. */
__set_errno (saved_errno);
#endif
return __clone_internal_fallback (cl_args, func, arg);
}
int
__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg)
{
void *stackaddr = NULL;
size_t stacksize = 0;
/* Avoid a data race in the multi-threaded case, and call the
deferred initialization only once. */
if (__libc_single_threaded_internal)
{
late_init ();
__libc_single_threaded_internal = 0;
/* __libc_single_threaded can be accessed through copy relocations, so
it requires to update the external copy. */
__libc_single_threaded = 0;
}
const struct pthread_attr *iattr = (struct pthread_attr *) attr;
union pthread_attr_transparent default_attr;
bool destroy_default_attr = false;
bool c11 = (attr == ATTR_C11_THREAD);
if (iattr == NULL || c11)
{
int ret = __pthread_getattr_default_np (&default_attr.external);
if (ret != 0)
return ret;
destroy_default_attr = true;
iattr = &default_attr.internal;
}
struct pthread *pd = NULL;
int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize);
int retval = 0;
if (__glibc_unlikely (err != 0))
/* Something went wrong. Maybe a parameter of the attributes is
invalid or we could not allocate memory. Note we have to
translate error codes. */
{
retval = err == ENOMEM ? EAGAIN : err;
goto out;
}
/* Initialize the TCB. All initializations with zero should be
performed in 'get_cached_stack'. This way we avoid doing this if
the stack freshly allocated with 'mmap'. */
#if TLS_TCB_AT_TP
/* Reference to the TCB itself. */
pd->header.self = pd;
/* Self-reference for TLS. */
pd->header.tcb = pd;
#endif
/* Store the address of the start routine and the parameter. Since
we do not start the function directly the stillborn thread will
get the information from its thread descriptor. */
pd->start_routine = start_routine;
pd->arg = arg;
pd->c11 = c11;
/* Copy the thread attribute flags. */
struct pthread *self = THREAD_SELF;
pd->flags = ((iattr->flags & ~(ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
| (self->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)));
/* Inherit rseq registration state. Without seccomp filters, rseq
registration will either always fail or always succeed. */
if ((int) THREAD_GETMEM_VOLATILE (self, rseq_area.cpu_id) >= 0)
pd->flags |= ATTR_FLAG_DO_RSEQ;
/* Initialize the field for the ID of the thread which is waiting
for us. This is a self-reference in case the thread is created
detached. */
pd->joinid = iattr->flags & ATTR_FLAG_DETACHSTATE ? pd : NULL;
/* The debug events are inherited from the parent. */
pd->eventbuf = self->eventbuf;
/* Copy the parent's scheduling parameters. The flags will say what
is valid and what is not. */
pd->schedpolicy = self->schedpolicy;
pd->schedparam = self->schedparam;
/* Copy the stack guard canary. */
#ifdef THREAD_COPY_STACK_GUARD
THREAD_COPY_STACK_GUARD (pd);
#endif
/* Copy the pointer guard value. */
#ifdef THREAD_COPY_POINTER_GUARD
THREAD_COPY_POINTER_GUARD (pd);
#endif
/* Setup tcbhead. */
tls_setup_tcbhead (pd);
/* Verify the sysinfo bits were copied in allocate_stack if needed. */
#ifdef NEED_DL_SYSINFO
CHECK_THREAD_SYSINFO (pd);
#endif
/* Determine scheduling parameters for the thread. */
if (__builtin_expect ((iattr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0, 0)
&& (iattr->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)) != 0)
{
/* Use the scheduling parameters the user provided. */
if (iattr->flags & ATTR_FLAG_POLICY_SET)
{
pd->schedpolicy = iattr->schedpolicy;
pd->flags |= ATTR_FLAG_POLICY_SET;
}
if (iattr->flags & ATTR_FLAG_SCHED_SET)
{
/* The values were validated in pthread_attr_setschedparam. */
pd->schedparam = iattr->schedparam;
pd->flags |= ATTR_FLAG_SCHED_SET;
}
if ((pd->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
!= (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
collect_default_sched (pd);
}
if (__glibc_unlikely (__nptl_nthreads == 1))
_IO_enable_locks ();
/* Pass the descriptor to the caller. */
*newthread = (pthread_t) pd;
LIBC_PROBE (pthread_create, 4, newthread, attr, start_routine, arg);
/* One more thread. We cannot have the thread do this itself, since it
might exist but not have been scheduled yet by the time we've returned
and need to check the value to behave correctly. We must do it before
creating the thread, in case it does get scheduled first and then
might mistakenly think it was the only thread. In the failure case,
we momentarily store a false value; this doesn't matter because there
is no kosher thing a signal handler interrupting us right here can do
that cares whether the thread count is correct. */
atomic_fetch_add_relaxed (&__nptl_nthreads, 1);
/* Our local value of stopped_start and thread_ran can be accessed at
any time. The PD->stopped_start may only be accessed if we have
ownership of PD (see CONCURRENCY NOTES above). */
bool stopped_start = false; bool thread_ran = false;
/* Block all signals, so that the new thread starts out with
signals disabled. This avoids race conditions in the thread
startup. */
internal_sigset_t original_sigmask;
internal_signal_block_all (&original_sigmask);
if (iattr->extension != NULL && iattr->extension->sigmask_set)
/* Use the signal mask in the attribute. The internal signals
have already been filtered by the public
pthread_attr_setsigmask_np interface. */
internal_sigset_from_sigset (&pd->sigmask, &iattr->extension->sigmask);
else
{
/* Conceptually, the new thread needs to inherit the signal mask
of this thread. Therefore, it needs to restore the saved
signal mask of this thread, so save it in the startup
information. */
pd->sigmask = original_sigmask;
/* Reset the cancellation signal mask in case this thread is
running cancellation. */
internal_sigdelset (&pd->sigmask, SIGCANCEL);
}
/* Start the thread. */
if (__glibc_unlikely (report_thread_creation (pd)))
{
stopped_start = true;
/* We always create the thread stopped at startup so we can
notify the debugger. */
retval = create_thread (pd, iattr, &stopped_start, stackaddr,
stacksize, &thread_ran);
if (retval == 0)
{
/* We retain ownership of PD until (a) (see CONCURRENCY NOTES
above). */
/* Assert stopped_start is true in both our local copy and the
PD copy. */
assert (stopped_start);
assert (pd->stopped_start);
/* Now fill in the information about the new thread in
the newly created thread's data structure. We cannot let
the new thread do this since we don't know whether it was
already scheduled when we send the event. */
pd->eventbuf.eventnum = TD_CREATE;
pd->eventbuf.eventdata = pd;
/* Enqueue the descriptor. */
do
pd->nextevent = __nptl_last_event;
while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
pd, pd->nextevent)
!= 0);
/* Now call the function which signals the event. See
CONCURRENCY NOTES for the nptl_db interface comments. */
__nptl_create_event ();
}
}
else
retval = create_thread (pd, iattr, &stopped_start, stackaddr,
stacksize, &thread_ran);
/* Return to the previous signal mask, after creating the new
thread. */
internal_signal_restore_set (&original_sigmask);
if (__glibc_unlikely (retval != 0))
{
if (thread_ran)
/* State (c) and we not have PD ownership (see CONCURRENCY NOTES
above). We can assert that STOPPED_START must have been true
because thread creation didn't fail, but thread attribute setting
did. */
{
assert (stopped_start);
/* Signal the created thread to release PD ownership and early
exit so it could be joined. */
pd->setup_failed = 1;
lll_unlock (pd->lock, LLL_PRIVATE);
/* Similar to pthread_join, but since thread creation has failed at
startup there is no need to handle all the steps. */
pid_t tid;
while ((tid = atomic_load_acquire (&pd->tid)) != 0)
__futex_abstimed_wait_cancelable64 ((unsigned int *) &pd->tid,
tid, 0, NULL, LLL_SHARED);
}
/* State (c) or (d) and we have ownership of PD (see CONCURRENCY
NOTES above). */
/* Oops, we lied for a second. */
atomic_fetch_add_relaxed (&__nptl_nthreads, -1);
/* Free the resources. */
__nptl_deallocate_stack (pd);
/* We have to translate error codes. */
if (retval == ENOMEM)
retval = EAGAIN;
}
else
{
/* We don't know if we have PD ownership. Once we check the local
stopped_start we'll know if we're in state (a) or (b) (see
CONCURRENCY NOTES above). */
if (stopped_start)
/* State (a), we own PD. The thread blocked on this lock either
because we're doing TD_CREATE event reporting, or for some
other reason that create_thread chose. Now let it run
free. */
lll_unlock (pd->lock, LLL_PRIVATE);
/* We now have for sure more than one thread. The main thread might
not yet have the flag set. No need to set the global variable
again if this is what we use. */
THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
}
out:
if (destroy_default_attr)
__pthread_attr_destroy (&default_attr.external);
return retval;
}
versioned_symbol (libc, __pthread_create_2_1, pthread_create, GLIBC_2_34);
libc_hidden_ver (__pthread_create_2_1, __pthread_create)
#ifndef SHARED
strong_alias (__pthread_create_2_1, __pthread_create)
#endif