In modern high-performance Java applications, thread management is one of the JVM's core responsibilities. Creating and scheduling a Java thread ultimately relies on the underlying operating system's thread implementation. On Linux, JVM thread creation goes through the POSIX threads (pthread) API and, below that, the kernel's clone/clone3 system calls. This article walks through Java thread creation in depth, combining the OpenJDK 17 sources with the thread-creation machinery of the Linux 6.8.12 kernel.
## 1. The JVM-level thread-creation entry point
In OpenJDK, every Java thread corresponds to a Thread object, and the JVM additionally maintains an OSThread object per thread that encapsulates the OS-level thread state. os::create_thread is the JVM's main thread-creation entry point on Linux/Unix systems:
bool os::create_thread(Thread* thread, ThreadType thr_type, size_t req_stack_size);
The core flow can be summarized as follows:

- Allocate the OSThread object.

  OSThread* osthread = new OSThread(NULL, NULL);
  thread->set_osthread(osthread);
  osthread->set_state(ALLOCATED);

  The JVM first creates its internal per-thread object, OSThread, with the initial state ALLOCATED.

- Initialize the thread attributes (pthread_attr_t).

  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

  The POSIX pthread API marks the thread as detached, so its resources are released automatically when it exits rather than leaking.

- Set the thread stack size.

  size_t stack_size = os::Posix::get_initial_stack_size(thr_type, req_stack_size);
  if (stack_size < 4096 * K) stack_size += 64 * K;
  pthread_attr_setstacksize(&attr, stack_size);

  The JVM pads small stacks to sidestep a known pthread library problem where the real stack can come out smaller than requested.

- Create the thread.

  pthread_create(&tid, &attr, (void* (*)(void*)) thread_native_entry, thread);

  The JVM ultimately calls pthread_create. The start routine, thread_native_entry, completes the JVM-internal thread initialization and then calls into the Java-level entry point.
On AIX, the JVM additionally pins the contention scope and scheduling inheritance:
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
## 2. Thread creation in the Linux kernel: the clone3 system call
The Linux kernel exposes the low-level thread-creation interfaces clone and clone3, which let a user-space thread share process resources such as the address space, file descriptors and signal handlers. clone3 is the extensible successor to clone, taking its arguments through a struct and supporting more flags and more flexible TLS setup.
### 2.1 The __clone3 wrapper

In the GNU C Library (glibc), __clone3 wraps the system call:
extern int __clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
The kernel takes its arguments in registers; the user-space wrapper stashes the thread entry function and its argument, then traps into the kernel:

movl $SYS_ify(clone3), %eax
syscall

The kernel creates the new thread; in the child, the wrapper calls the specified function func(arg), and when the thread exits the kernel cleans up its TID and related state.
### 2.2 Inside create_thread
glibc's create_thread builds the clone_args and sets the clone flags:

const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM
                         | CLONE_SIGHAND | CLONE_THREAD
                         | CLONE_SETTLS | CLONE_PARENT_SETTID
                         | CLONE_CHILD_CLEARTID);

- CLONE_VM: share the address space
- CLONE_FILES: share the file descriptor table
- CLONE_THREAD: place the new thread in the caller's thread group
- CLONE_SETTLS: set up TLS (thread-local storage) for the new thread
- CLONE_PARENT_SETTID / CLONE_CHILD_CLEARTID: maintain the TID from the parent's and the child's side
It then calls:

int ret = __clone_internal (&args, &start_thread, pd);

Once running, the new thread executes start_thread, which initializes the thread control block (TCB), the signal mask, the scheduling policy, and so on.
## 3. How the JVM and kernel layers fit together
Combining the JVM and Linux kernel sources, the complete thread-creation flow is:

- The JVM allocates the OSThread object and sets the thread type and initial state.
- The JVM configures the pthread attributes: detached state, stack size, guard pages.
- The JVM calls pthread_create, which inside glibc flows __pthread_create_2_1 → create_thread → __clone_internal; the kernel then creates the thread from clone_args, setting up TLS, the TID and the shared resources.
- The new thread runs thread_native_entry / start_thread, which:
  - initializes the JVM-level thread object,
  - sets the scheduling policy and signal mask,
  - and finally invokes the Java-level run() method.

This flow gives every Java thread its own kernel thread (the 1:1 threading model), while the JVM layers its own thread management on top: GC threads, daemon threads, and so on.
## 4. JVM-specific handling and optimizations
- Stack-size adjustment: to work around a pthread_attr_setstacksize problem with small stacks, the JVM adds 64 KiB to any requested stack smaller than 4 MiB.
- Guard-page optimization: for Java threads the JVM disables the OS-level guard page, because the VM already installs its own guard pages; this saves a page of memory per thread.
- Scheduling and CPU affinity: the scheduling policy, priority and CPU affinity of a thread can be set through pthread_attr_t or the JVM's internal attributes.
## 5. Summary
Thread creation spans several layers between the JVM and the Linux kernel:

- JVM layer: manages the OSThread object, the thread stack and guard pages, and calls the POSIX API
- glibc layer: wraps the pthread interface, assembles the clone arguments and handles signal blocking
- Linux kernel: performs the low-level thread creation and resource sharing, implementing lightweight threads via clone3

Understanding this flow matters for JVM performance tuning, scheduler optimization and multi-threaded debugging. Reading the OpenJDK 17 and Linux 6.8.12 sources side by side reveals the careful design and OS cooperation behind every Java thread.
## Source code
bool os::create_thread(Thread* thread, ThreadType thr_type,
size_t req_stack_size) {
assert(thread->osthread() == NULL, "caller responsible");
// Allocate the OSThread object.
OSThread* osthread = new OSThread(NULL, NULL);
if (osthread == NULL) {
return false;
}
// Set the correct thread state.
osthread->set_thread_type(thr_type);
// Initial state is ALLOCATED but not INITIALIZED
osthread->set_state(ALLOCATED);
thread->set_osthread(osthread);
// Init thread attributes.
pthread_attr_t attr;
pthread_attr_init(&attr);
guarantee(pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) == 0, "???");
// Make sure we run in 1:1 kernel-user-thread mode.
if (os::Aix::on_aix()) {
guarantee(pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM) == 0, "???");
guarantee(pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED) == 0, "???");
}
// Start in suspended state, and in os::thread_start, wake the thread up.
guarantee(pthread_attr_setsuspendstate_np(&attr, PTHREAD_CREATE_SUSPENDED_NP) == 0, "???");
// Calculate stack size if it's not specified by caller.
size_t stack_size = os::Posix::get_initial_stack_size(thr_type, req_stack_size);
// JDK-8187028: It was observed that on some configurations (4K backed thread stacks)
// the real thread stack size may be smaller than the requested stack size, by as much as 64K.
// This very much looks like a pthread lib error. As a workaround, increase the stack size
// by 64K for small thread stacks (arbitrarily choosen to be < 4MB)
if (stack_size < 4096 * K) {
stack_size += 64 * K;
}
// On Aix, pthread_attr_setstacksize fails with huge values and leaves the
// thread size in attr unchanged. If this is the minimal stack size as set
// by pthread_attr_init this leads to crashes after thread creation. E.g. the
// guard pages might not fit on the tiny stack created.
int ret = pthread_attr_setstacksize(&attr, stack_size);
if (ret != 0) {
log_warning(os, thread)("The %sthread stack size specified is invalid: " SIZE_FORMAT "k",
(thr_type == compiler_thread) ? "compiler " : ((thr_type == java_thread) ? "" : "VM "),
stack_size / K);
thread->set_osthread(NULL);
delete osthread;
return false;
}
// Save some cycles and a page by disabling OS guard pages where we have our own
// VM guard pages (in java threads). For other threads, keep system default guard
// pages in place.
if (thr_type == java_thread || thr_type == compiler_thread) {
ret = pthread_attr_setguardsize(&attr, 0);
}
pthread_t tid = 0;
if (ret == 0) {
ret = pthread_create(&tid, &attr, (void* (*)(void*)) thread_native_entry, thread);
}
if (ret == 0) {
char buf[64];
log_info(os, thread)("Thread started (pthread id: " UINTX_FORMAT ", attributes: %s). ",
(uintx) tid, os::Posix::describe_pthread_attr(buf, sizeof(buf), &attr));
} else {
char buf[64];
log_warning(os, thread)("Failed to start thread - pthread_create failed (%d=%s) for attributes: %s.",
ret, os::errno_name(ret), os::Posix::describe_pthread_attr(buf, sizeof(buf), &attr));
// Log some OS information which might explain why creating the thread failed.
log_info(os, thread)("Number of threads approx. running in the VM: %d", Threads::number_of_threads());
LogStream st(Log(os, thread)::info());
os::Posix::print_rlimit_info(&st);
os::print_memory_info(&st);
}
pthread_attr_destroy(&attr);
if (ret != 0) {
// Need to clean up stuff we've allocated so far.
thread->set_osthread(NULL);
delete osthread;
return false;
}
// OSThread::thread_id is the pthread id.
osthread->set_thread_id(tid);
return true;
}
#define __NR_clone3 435
/* The clone3 syscall wrapper. Linux/x86-64 version.
Copyright (C) 2021-2024 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<https://www.gnu.org/licenses/>. */
/* clone3() is even more special than fork() as it mucks with stacks
and invokes a function in the right context after its all over. */
#include <sysdep.h>
/* The userland implementation is:
int clone3 (struct clone_args *cl_args, size_t size,
int (*func)(void *arg), void *arg);
the kernel entry is:
int clone3 (struct clone_args *cl_args, size_t size);
The parameters are passed in registers from userland:
rdi: cl_args
rsi: size
rdx: func
rcx: arg
The kernel expects:
rax: system call number
rdi: cl_args
rsi: size */
.text
ENTRY (__clone3)
/* Sanity check arguments. */
movl $-EINVAL, %eax
test %RDI_LP, %RDI_LP /* No NULL cl_args pointer. */
jz SYSCALL_ERROR_LABEL
test %RDX_LP, %RDX_LP /* No NULL function pointer. */
jz SYSCALL_ERROR_LABEL
/* Save the cl_args pointer in R8 which is preserved by the
syscall. */
mov %RCX_LP, %R8_LP
/* Do the system call. */
movl $SYS_ify(clone3), %eax
/* End FDE now, because in the child the unwind info will be
wrong. */
cfi_endproc
syscall
test %RAX_LP, %RAX_LP
jl SYSCALL_ERROR_LABEL
jz L(thread_start)
ret
L(thread_start):
cfi_startproc
/* Clearing frame pointer is insufficient, use CFI. */
cfi_undefined (rip)
/* Clear the frame pointer. The ABI suggests this be done, to mark
the outermost frame obviously. */
xorl %ebp, %ebp
/* Set up arguments for the function call. */
mov %R8_LP, %RDI_LP /* Argument. */
call *%rdx /* Call function. */
/* Call exit with return value from function call. */
movq %rax, %rdi
movl $SYS_ify(exit), %eax
syscall
cfi_endproc
cfi_startproc
PSEUDO_END (__clone3)
libc_hidden_def (__clone3)
weak_alias (__clone3, clone3)
/* The clone3 syscall provides a superset of the functionality of the clone
interface. The kernel might extend __CL_ARGS struct in the future, with
each version with a different __SIZE. If the child is created, it will
start __FUNC function with __ARG arguments.
Different than kernel, the implementation also returns EINVAL for an
invalid NULL __CL_ARGS or __FUNC (similar to __clone).
All callers are responsible for correctly aligning the stack. The stack is
not aligned prior to the syscall (this differs from the exported __clone).
This function is only implemented if the ABI defines HAVE_CLONE3_WRAPPER.
*/
extern int __clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
static int create_thread (struct pthread *pd, const struct pthread_attr *attr,
bool *stopped_start, void *stackaddr,
size_t stacksize, bool *thread_ran)
{
/* Determine whether the newly created threads has to be started
stopped since we have to set the scheduling parameters or set the
affinity. */
bool need_setaffinity = (attr != NULL && attr->extension != NULL
&& attr->extension->cpuset != 0);
if (attr != NULL
&& (__glibc_unlikely (need_setaffinity)
|| __glibc_unlikely ((attr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0)))
*stopped_start = true;
pd->stopped_start = *stopped_start;
if (__glibc_unlikely (*stopped_start))
lll_lock (pd->lock, LLL_PRIVATE);
/* We rely heavily on various flags the CLONE function understands:
CLONE_VM, CLONE_FS, CLONE_FILES
These flags select semantics with shared address space and
file descriptors according to what POSIX requires.
CLONE_SIGHAND, CLONE_THREAD
This flag selects the POSIX signal semantics and various
other kinds of sharing (itimers, POSIX timers, etc.).
CLONE_SETTLS
The sixth parameter to CLONE determines the TLS area for the
new thread.
CLONE_PARENT_SETTID
The kernels writes the thread ID of the newly created thread
into the location pointed to by the fifth parameters to CLONE.
Note that it would be semantically equivalent to use
CLONE_CHILD_SETTID but it is be more expensive in the kernel.
CLONE_CHILD_CLEARTID
The kernels clears the thread ID of a thread that has called
sys_exit() in the location pointed to by the seventh parameter
to CLONE.
The termination signal is chosen to be zero which means no signal
is sent. */
const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SYSVSEM
| CLONE_SIGHAND | CLONE_THREAD
| CLONE_SETTLS | CLONE_PARENT_SETTID
| CLONE_CHILD_CLEARTID
| 0);
TLS_DEFINE_INIT_TP (tp, pd);
struct clone_args args =
{
.flags = clone_flags,
.pidfd = (uintptr_t) &pd->tid,
.parent_tid = (uintptr_t) &pd->tid,
.child_tid = (uintptr_t) &pd->tid,
.stack = (uintptr_t) stackaddr,
.stack_size = stacksize,
.tls = (uintptr_t) tp,
};
int ret = __clone_internal (&args, &start_thread, pd);
if (__glibc_unlikely (ret == -1))
return errno;
/* It's started now, so if we fail below, we'll have to let it clean itself
up. */
*thread_ran = true;
/* Now we have the possibility to set scheduling parameters etc. */
if (attr != NULL)
{
/* Set the affinity mask if necessary. */
if (need_setaffinity)
{
assert (*stopped_start);
int res = INTERNAL_SYSCALL_CALL (sched_setaffinity, pd->tid,
attr->extension->cpusetsize,
attr->extension->cpuset);
if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
return INTERNAL_SYSCALL_ERRNO (res);
}
/* Set the scheduling parameters. */
if ((attr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0)
{
assert (*stopped_start);
int res = INTERNAL_SYSCALL_CALL (sched_setscheduler, pd->tid,
pd->schedpolicy, &pd->schedparam);
if (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (res)))
return INTERNAL_SYSCALL_ERRNO (res);
}
}
return 0;
}
int
__clone3_internal (struct clone_args *cl_args, int (*func) (void *args),
void *arg)
{
#ifdef HAVE_CLONE3_WRAPPER
# if __ASSUME_CLONE3
return __clone3 (cl_args, sizeof (*cl_args), func, arg);
# else
static int clone3_supported = 1;
if (atomic_load_relaxed (&clone3_supported) == 1)
{
int ret = __clone3 (cl_args, sizeof (*cl_args), func, arg);
if (ret != -1 || errno != ENOSYS)
return ret;
atomic_store_relaxed (&clone3_supported, 0);
}
# endif
#endif
__set_errno (ENOSYS);
return -1;
}
int
__clone_internal (struct clone_args *cl_args,
int (*func) (void *arg), void *arg)
{
#ifdef HAVE_CLONE3_WRAPPER
int saved_errno = errno;
int ret = __clone3_internal (cl_args, func, arg);
if (ret != -1 || errno != ENOSYS)
return ret;
/* NB: Restore errno since errno may be checked against non-zero
return value. */
__set_errno (saved_errno);
#endif
return __clone_internal_fallback (cl_args, func, arg);
}
int
__pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
void *(*start_routine) (void *), void *arg)
{
void *stackaddr = NULL;
size_t stacksize = 0;
/* Avoid a data race in the multi-threaded case, and call the
deferred initialization only once. */
if (__libc_single_threaded_internal)
{
late_init ();
__libc_single_threaded_internal = 0;
/* __libc_single_threaded can be accessed through copy relocations, so
it requires to update the external copy. */
__libc_single_threaded = 0;
}
const struct pthread_attr *iattr = (struct pthread_attr *) attr;
union pthread_attr_transparent default_attr;
bool destroy_default_attr = false;
bool c11 = (attr == ATTR_C11_THREAD);
if (iattr == NULL || c11)
{
int ret = __pthread_getattr_default_np (&default_attr.external);
if (ret != 0)
return ret;
destroy_default_attr = true;
iattr = &default_attr.internal;
}
struct pthread *pd = NULL;
int err = allocate_stack (iattr, &pd, &stackaddr, &stacksize);
int retval = 0;
if (__glibc_unlikely (err != 0))
/* Something went wrong. Maybe a parameter of the attributes is
invalid or we could not allocate memory. Note we have to
translate error codes. */
{
retval = err == ENOMEM ? EAGAIN : err;
goto out;
}
/* Initialize the TCB. All initializations with zero should be
performed in 'get_cached_stack'. This way we avoid doing this if
the stack freshly allocated with 'mmap'. */
#if TLS_TCB_AT_TP
/* Reference to the TCB itself. */
pd->header.self = pd;
/* Self-reference for TLS. */
pd->header.tcb = pd;
#endif
/* Store the address of the start routine and the parameter. Since
we do not start the function directly the stillborn thread will
get the information from its thread descriptor. */
pd->start_routine = start_routine;
pd->arg = arg;
pd->c11 = c11;
/* Copy the thread attribute flags. */
struct pthread *self = THREAD_SELF;
pd->flags = ((iattr->flags & ~(ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
| (self->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)));
/* Inherit rseq registration state. Without seccomp filters, rseq
registration will either always fail or always succeed. */
if ((int) THREAD_GETMEM_VOLATILE (self, rseq_area.cpu_id) >= 0)
pd->flags |= ATTR_FLAG_DO_RSEQ;
/* Initialize the field for the ID of the thread which is waiting
for us. This is a self-reference in case the thread is created
detached. */
pd->joinid = iattr->flags & ATTR_FLAG_DETACHSTATE ? pd : NULL;
/* The debug events are inherited from the parent. */
pd->eventbuf = self->eventbuf;
/* Copy the parent's scheduling parameters. The flags will say what
is valid and what is not. */
pd->schedpolicy = self->schedpolicy;
pd->schedparam = self->schedparam;
/* Copy the stack guard canary. */
#ifdef THREAD_COPY_STACK_GUARD
THREAD_COPY_STACK_GUARD (pd);
#endif
/* Copy the pointer guard value. */
#ifdef THREAD_COPY_POINTER_GUARD
THREAD_COPY_POINTER_GUARD (pd);
#endif
/* Setup tcbhead. */
tls_setup_tcbhead (pd);
/* Verify the sysinfo bits were copied in allocate_stack if needed. */
#ifdef NEED_DL_SYSINFO
CHECK_THREAD_SYSINFO (pd);
#endif
/* Determine scheduling parameters for the thread. */
if (__builtin_expect ((iattr->flags & ATTR_FLAG_NOTINHERITSCHED) != 0, 0)
&& (iattr->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET)) != 0)
{
/* Use the scheduling parameters the user provided. */
if (iattr->flags & ATTR_FLAG_POLICY_SET)
{
pd->schedpolicy = iattr->schedpolicy;
pd->flags |= ATTR_FLAG_POLICY_SET;
}
if (iattr->flags & ATTR_FLAG_SCHED_SET)
{
/* The values were validated in pthread_attr_setschedparam. */
pd->schedparam = iattr->schedparam;
pd->flags |= ATTR_FLAG_SCHED_SET;
}
if ((pd->flags & (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
!= (ATTR_FLAG_SCHED_SET | ATTR_FLAG_POLICY_SET))
collect_default_sched (pd);
}
if (__glibc_unlikely (__nptl_nthreads == 1))
_IO_enable_locks ();
/* Pass the descriptor to the caller. */
*newthread = (pthread_t) pd;
LIBC_PROBE (pthread_create, 4, newthread, attr, start_routine, arg);
/* One more thread. We cannot have the thread do this itself, since it
might exist but not have been scheduled yet by the time we've returned
and need to check the value to behave correctly. We must do it before
creating the thread, in case it does get scheduled first and then
might mistakenly think it was the only thread. In the failure case,
we momentarily store a false value; this doesn't matter because there
is no kosher thing a signal handler interrupting us right here can do
that cares whether the thread count is correct. */
atomic_fetch_add_relaxed (&__nptl_nthreads, 1);
/* Our local value of stopped_start and thread_ran can be accessed at
any time. The PD->stopped_start may only be accessed if we have
ownership of PD (see CONCURRENCY NOTES above). */
bool stopped_start = false; bool thread_ran = false;
/* Block all signals, so that the new thread starts out with
signals disabled. This avoids race conditions in the thread
startup. */
internal_sigset_t original_sigmask;
internal_signal_block_all (&original_sigmask);
if (iattr->extension != NULL && iattr->extension->sigmask_set)
/* Use the signal mask in the attribute. The internal signals
have already been filtered by the public
pthread_attr_setsigmask_np interface. */
internal_sigset_from_sigset (&pd->sigmask, &iattr->extension->sigmask);
else
{
/* Conceptually, the new thread needs to inherit the signal mask
of this thread. Therefore, it needs to restore the saved
signal mask of this thread, so save it in the startup
information. */
pd->sigmask = original_sigmask;
/* Reset the cancellation signal mask in case this thread is
running cancellation. */
internal_sigdelset (&pd->sigmask, SIGCANCEL);
}
/* Start the thread. */
if (__glibc_unlikely (report_thread_creation (pd)))
{
stopped_start = true;
/* We always create the thread stopped at startup so we can
notify the debugger. */
retval = create_thread (pd, iattr, &stopped_start, stackaddr,
stacksize, &thread_ran);
if (retval == 0)
{
/* We retain ownership of PD until (a) (see CONCURRENCY NOTES
above). */
/* Assert stopped_start is true in both our local copy and the
PD copy. */
assert (stopped_start);
assert (pd->stopped_start);
/* Now fill in the information about the new thread in
the newly created thread's data structure. We cannot let
the new thread do this since we don't know whether it was
already scheduled when we send the event. */
pd->eventbuf.eventnum = TD_CREATE;
pd->eventbuf.eventdata = pd;
/* Enqueue the descriptor. */
do
pd->nextevent = __nptl_last_event;
while (atomic_compare_and_exchange_bool_acq (&__nptl_last_event,
pd, pd->nextevent)
!= 0);
/* Now call the function which signals the event. See
CONCURRENCY NOTES for the nptl_db interface comments. */
__nptl_create_event ();
}
}
else
retval = create_thread (pd, iattr, &stopped_start, stackaddr,
stacksize, &thread_ran);
/* Return to the previous signal mask, after creating the new
thread. */
internal_signal_restore_set (&original_sigmask);
if (__glibc_unlikely (retval != 0))
{
if (thread_ran)
/* State (c) and we not have PD ownership (see CONCURRENCY NOTES
above). We can assert that STOPPED_START must have been true
because thread creation didn't fail, but thread attribute setting
did. */
{
assert (stopped_start);
/* Signal the created thread to release PD ownership and early
exit so it could be joined. */
pd->setup_failed = 1;
lll_unlock (pd->lock, LLL_PRIVATE);
/* Similar to pthread_join, but since thread creation has failed at
startup there is no need to handle all the steps. */
pid_t tid;
while ((tid = atomic_load_acquire (&pd->tid)) != 0)
__futex_abstimed_wait_cancelable64 ((unsigned int *) &pd->tid,
tid, 0, NULL, LLL_SHARED);
}
/* State (c) or (d) and we have ownership of PD (see CONCURRENCY
NOTES above). */
/* Oops, we lied for a second. */
atomic_fetch_add_relaxed (&__nptl_nthreads, -1);
/* Free the resources. */
__nptl_deallocate_stack (pd);
/* We have to translate error codes. */
if (retval == ENOMEM)
retval = EAGAIN;
}
else
{
/* We don't know if we have PD ownership. Once we check the local
stopped_start we'll know if we're in state (a) or (b) (see
CONCURRENCY NOTES above). */
if (stopped_start)
/* State (a), we own PD. The thread blocked on this lock either
because we're doing TD_CREATE event reporting, or for some
other reason that create_thread chose. Now let it run
free. */
lll_unlock (pd->lock, LLL_PRIVATE);
/* We now have for sure more than one thread. The main thread might
not yet have the flag set. No need to set the global variable
again if this is what we use. */
THREAD_SETMEM (THREAD_SELF, header.multiple_threads, 1);
}
out:
if (destroy_default_attr)
__pthread_attr_destroy (&default_attr.external);
return retval;
}
versioned_symbol (libc, __pthread_create_2_1, pthread_create, GLIBC_2_34);
libc_hidden_ver (__pthread_create_2_1, __pthread_create)
#ifndef SHARED
strong_alias (__pthread_create_2_1, __pthread_create)
#endif