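Let's start with what the parameter does and what the official documentation says.

The official documentation is the most authoritative source, after all. If it is hard to follow at first, read it a few more times.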

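Purpose

Thread idle time (keep-alive time): when the pool holds more threads than corePoolSize, surplus threads that stay idle longer than this setting are terminated and removed. This keeps the pool from holding on to a large number of idle threads indefinitely.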

Official documentation

Parameter documentation (constructor Javadoc)

keepAliveTime - when the number of threads is greater than the core, this is the maximum time that excess idle threads will wait for new tasks before terminating.

Class documentation

Keep-alive times
If the pool currently has more than corePoolSize threads, excess threads will be terminated if they have been idle for more than the keepAliveTime (see getKeepAliveTime(TimeUnit)). This provides a means of reducing resource consumption when the pool is not being actively used. If the pool becomes more active later, new threads will be constructed. This parameter can also be changed dynamically using method setKeepAliveTime(long, TimeUnit). Using a value of Long.MAX_VALUE TimeUnit.NANOSECONDS effectively disables idle threads from ever terminating prior to shut down. By default, the keep-alive policy applies only when there are more than corePoolSize threads. But method allowCoreThreadTimeOut(boolean) can be used to apply this time-out policy to core threads as well, so long as the keepAliveTime value is non-zero.
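The behaviour described above can be observed directly. The following is a minimal sketch, not taken from the original article: the class name KeepAliveDemo, the pool sizes (core 1, maximum 3) and the 200 ms keep-alive value are all assumptions chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: shows surplus threads being retired after keepAliveTime.
class KeepAliveDemo {
    /** Returns {pool size while all tasks are running, pool size after the idle period}. */
    static int[] run() {
        // Assumed settings: core = 1, max = 3, surplus threads may idle for 200 ms.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 3, 200, TimeUnit.MILLISECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            // A SynchronousQueue has no capacity, so each blocked task forces a new thread.
            for (int i = 0; i < 3; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            int busy = pool.getPoolSize();
            release.countDown(); // let all three tasks finish

            // Give the two surplus threads time to exceed keepAliveTime and exit.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (pool.getPoolSize() > 1 && System.nanoTime() < deadline) {
                Thread.sleep(50);
            }
            return new int[] {busy, pool.getPoolSize()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println("busy=" + sizes[0] + ", after idle=" + sizes[1]);
    }
}
```

With these settings the pool should grow to three threads while the tasks are blocked and fall back to the single core thread once the surplus threads have been idle for longer than the keep-alive time.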

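Summary

First, the name is not a great one. At first glance, keepAliveTime reads as "the time for which something is kept alive".

And "keep alive" is not an intuitive phrase here either.

Put simply, it means the maximum idle time of a thread. If a thread has sat idle for too long without being given any work, it should be reclaimed instead of holding on to resources it is not using, because threads are fairly expensive objects.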

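How it works

A thread that has been idle for too long is, at heart, the familiar timeout problem.

Setting aside how the JDK does it, how is a simple timeout check usually implemented? By comparing two timestamps. Take payments: an order that is not paid in time gets closed. To implement that, subtract the creation time from the current time, and if the difference exceeds the timeout, close the order.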
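The timestamp comparison just described might look like the sketch below. It is only an illustration: the class OrderTimeoutCheck, the method isExpired and the 30-minute payment window are hypothetical and do not come from any real order system.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: the "compare two timestamps" style of timeout check.
class OrderTimeoutCheck {
    /** True when more than {@code timeout} has elapsed between createdAt and now. */
    static boolean isExpired(Instant createdAt, Instant now, Duration timeout) {
        return Duration.between(createdAt, now).compareTo(timeout) > 0;
    }

    public static void main(String[] args) {
        Instant createdAt = Instant.parse("2024-01-01T10:00:00Z");
        Duration timeout = Duration.ofMinutes(30); // assumed payment window

        // 10 minutes after creation: still payable.
        System.out.println(isExpired(createdAt, createdAt.plus(Duration.ofMinutes(10)), timeout));
        // 31 minutes after creation: the order should be closed.
        System.out.println(isExpired(createdAt, createdAt.plus(Duration.ofMinutes(31)), timeout));
    }
}
```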

So how does the JDK thread pool check for a timeout? Let's go straight to the source.

Source code analysis

java.util.concurrent.ThreadPoolExecutor#getTask

/**
 * Fetches the next task from the blocking queue
 *
 * ---
 * Performs blocking or timed wait for a task, depending on
 * current configuration settings, or returns null if this worker
 * must exit because of any of:
 * 1. There are more than maximumPoolSize workers (due to
 *    a call to setMaximumPoolSize).
 * 2. The pool is stopped.
 * 3. The pool is shutdown and the queue is empty.
 * 4. This worker timed out waiting for a task, and timed-out
 *    workers are subject to termination (that is,
 *    {@code allowCoreThreadTimeOut || workerCount > corePoolSize})
 *    both before and after the timed wait, and if the queue is
 *    non-empty, this worker is not the last thread in the pool.
 *
 * @return task, or null if the worker must exit, in which case
 *         workerCount is decremented
 */
private Runnable getTask() {
    boolean timedOut = false; // Did the last poll() time out?

    for (;;) {
        int c = ctl.get();
        int rs = runStateOf(c);

        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
            decrementWorkerCount();
            return null;
        }

        int wc = workerCountOf(c);

        // Are workers subject to culling?
        boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

        if ((wc > maximumPoolSize || (timed && timedOut))
            && (wc > 1 || workQueue.isEmpty())) {
            if (compareAndDecrementWorkerCount(c))
                return null;
            continue;
        }

        try {
            // Fetch a task from the blocking queue
            Runnable r = timed ?
                workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                workQueue.take();
            if (r != null)
                return r;
            timedOut = true; // poll returned null: no task arrived within keepAliveTime, so this worker is idle and the next loop iteration may retire it
        } catch (InterruptedException retry) {
            timedOut = false;
        }
    }
}

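Before going further, here is what the method's Javadoc says, restated:

Performs a blocking or timed wait for a task, depending on the current configuration. Returns null, meaning the worker must exit, in any of the following cases:

1、There are more than maximumPoolSize workers (because of a call to setMaximumPoolSize).
2、The pool is stopped.
3、The pool is shut down and the queue is empty.
4、The worker timed out waiting for a task, and timed-out workers are subject to termination (that is, allowCoreThreadTimeOut || workerCount > corePoolSize) both before and after the timed wait. If the queue is non-empty, the worker is only terminated when it is not the last thread in the pool.

Return value:

The task, if one was obtained.
null if the worker must exit, in which case workerCount is decremented.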
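Case 4 depends on allowCoreThreadTimeOut, and its effect is easy to see in a small experiment. This is a sketch under assumed settings (two core threads, 100 ms keep-alive); the class name CoreTimeoutDemo is hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: core threads only time out once allowCoreThreadTimeOut(true) is set.
class CoreTimeoutDemo {
    /** Returns {pool size with the default policy, pool size after enabling core timeout}. */
    static int[] run() {
        // Assumed settings: core = max = 2, keepAliveTime = 100 ms (must be non-zero).
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 100, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        try {
            pool.prestartAllCoreThreads();
            Thread.sleep(400); // well past keepAliveTime
            int defaultPolicy = pool.getPoolSize(); // core threads are not subject to the timeout

            pool.allowCoreThreadTimeOut(true); // now the timeout also applies to core threads
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
                Thread.sleep(50);
            }
            return new int[] {defaultPolicy, pool.getPoolSize()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        int[] sizes = run();
        System.out.println("default=" + sizes[0] + ", with core timeout=" + sizes[1]);
    }
}
```

Under the default policy both core threads should survive the idle period; after allowCoreThreadTimeOut(true) the pool is expected to shrink to zero threads.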

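The documentation can be a little dizzying. The point of reading it is not to settle everything in one pass but to get a rough picture first, and since it is the authoritative source, at least it will not mislead you.

Enough preamble. The core code is this: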

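try {
    // Fetch a task from the blocking queue
    Runnable r = timed ?
        workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) : // timed wait: returns null once keepAliveTime elapses
        workQueue.take();
    if (r != null) // a task arrived in time, so return it
        return r;
    timedOut = true; // poll returned null: no task arrived within keepAliveTime, so this worker is idle and the next loop iteration may retire it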
} catch (InterruptedException retry) {
    timedOut = false;
}

The single most important line is:

workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) : // timed wait: null means the timeout elapsed

In other words, the pool does not compare two timestamps itself. It relies on the timed poll method of the blocking queue.
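To make the behaviour of the timed poll concrete, here is a small standalone sketch. It uses only the standard BlockingQueue API; the class TimedPollDemo and its method names are hypothetical, and the 200 ms timeout is an arbitrary value.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: how BlockingQueue.poll(timeout, unit) signals a timeout.
class TimedPollDemo {
    /** Polls an empty queue; returns the elapsed milliseconds if poll returned null, else -1. */
    static long pollEmptyQueue(long timeoutMillis) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        long start = System.nanoTime();
        try {
            Runnable task = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            long elapsed = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            return task == null ? elapsed : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Polls a queue that already holds a task; true if that same task comes back. */
    static boolean pollReturnsQueuedTask() {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        Runnable task = () -> { };
        queue.add(task);
        try {
            return queue.poll(200, TimeUnit.MILLISECONDS) == task;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("empty queue: null after ~" + pollEmptyQueue(200) + " ms");
        System.out.println("queued task returned: " + pollReturnsQueuedTask());
    }
}
```

A null result is the queue's way of saying "nothing arrived within the timeout", which is exactly the signal getTask uses to set timedOut.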

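Why does fetching tasks from a blocking queue give us idle-thread timeouts?

Because a worker thread is started as soon as it is created, and from then on it loops, fetching tasks from the blocking queue and running them. That loop is where getTask() is called.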

java.util.concurrent.ThreadPoolExecutor#runWorker

/**
 * Core steps
 * 1. Fetch a task from the blocking queue
 * 2. Run the task
 *
 * ---
 * Main worker run loop.  Repeatedly gets tasks from queue and
 * executes them, while coping with a number of issues:
 *
 * 1. We may start out with an initial task, in which case we
 * don't need to get the first one. Otherwise, as long as pool is
 * running, we get tasks from getTask. If it returns null then the
 * worker exits due to changed pool state or configuration
 * parameters.  Other exits result from exception throws in
 * external code, in which case completedAbruptly holds, which
 * usually leads processWorkerExit to replace this thread.
 *
 * 2. Before running any task, the lock is acquired to prevent
 * other pool interrupts while the task is executing, and then we
 * ensure that unless pool is stopping, this thread does not have
 * its interrupt set.
 *
 * 3. Each task run is preceded by a call to beforeExecute, which
 * might throw an exception, in which case we cause thread to die
 * (breaking loop with completedAbruptly true) without processing
 * the task.
 *
 * 4. Assuming beforeExecute completes normally, we run the task,
 * gathering any of its thrown exceptions to send to afterExecute.
 * We separately handle RuntimeException, Error (both of which the
 * specs guarantee that we trap) and arbitrary Throwables.
 * Because we cannot rethrow Throwables within Runnable.run, we
 * wrap them within Errors on the way out (to the thread's
 * UncaughtExceptionHandler).  Any thrown exception also
 * conservatively causes thread to die.
 *
 * 5. After task.run completes, we call afterExecute, which may
 * also throw an exception, which will also cause thread to
 * die. According to JLS Sec 14.20, this exception is the one that
 * will be in effect even if task.run throws.
 *
 * The net effect of the exception mechanics is that afterExecute
 * and the thread's UncaughtExceptionHandler have as accurate
 * information as we can provide about any problems encountered by
 * user code.
 *
 * @param w the worker
 */
final void runWorker(Worker w) {
    Thread wt = Thread.currentThread();
    Runnable task = w.firstTask;
    w.firstTask = null;
    w.unlock(); // allow interrupts
    boolean completedAbruptly = true;
    try {
        // Fetch a task: the worker's first task, or the next one from the blocking queue
        while (task != null || (task = getTask()) != null) {
            w.lock();
            // If pool is stopping, ensure thread is interrupted;
            // if not, ensure thread is not interrupted.  This
            // requires a recheck in second case to deal with
            // shutdownNow race while clearing interrupt
            if ((runStateAtLeast(ctl.get(), STOP) ||
                 (Thread.interrupted() &&
                  runStateAtLeast(ctl.get(), STOP))) &&
                !wt.isInterrupted())
                wt.interrupt();
            try {
                beforeExecute(wt, task);
                Throwable thrown = null;
                try {
                    // Run the task on this worker thread
                    task.run();
                } catch (RuntimeException x) {
                    thrown = x; throw x;
                } catch (Error x) {
                    thrown = x; throw x;
                } catch (Throwable x) {
                    thrown = x;
                    throw new Error(x); // anything thrown by the task propagates out of the loop, so this worker thread dies
                } finally {
                    afterExecute(task, thrown);
                }
            } finally {
                task = null;
                w.completedTasks++;
                w.unlock();
            }
        }
        completedAbruptly = false;
    } finally {
        // Reached whenever the worker exits: getTask() returned null (for example on idle timeout) or a task threw
        processWorkerExit(w, completedAbruptly);
    }
}

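Core code

// Fetch a task: the worker's first task, or the next one from the blocking queue
while (task != null || (task = getTask()) != null) {

In short: as long as there are tasks, keep running them.

But what if getTask() returns null? In the idle-timeout case it means requests are scarce and the pool has more threads than it needs, so the idle worker is reclaimed.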

Code that reclaims the worker

// Reached whenever the worker exits: getTask() returned null (for example on idle timeout) or a task threw
processWorkerExit(w, completedAbruptly);

Full code of processWorkerExit

/**
 * Cleans up after an exiting worker, for example one whose idle wait timed out
 * ---
 * Performs cleanup and bookkeeping for a dying worker. Called
 * only from worker threads. Unless completedAbruptly is set,
 * assumes that workerCount has already been adjusted to account
 * for exit.  This method removes thread from worker set, and
 * possibly terminates the pool or replaces the worker if either
 * it exited due to user task exception or if fewer than
 * corePoolSize workers are running or queue is non-empty but
 * there are no workers.
 *
 * @param w the worker
 * @param completedAbruptly if the worker died due to user exception
 */
private void processWorkerExit(Worker w, boolean completedAbruptly) {
    if (completedAbruptly) // If abrupt, then workerCount wasn't adjusted
        decrementWorkerCount();

    final ReentrantLock mainLock = this.mainLock;
    mainLock.lock();
    try {
        completedTaskCount += w.completedTasks;
        // Remove the exiting worker from the pool's worker set
        workers.remove(w);
    } finally {
        mainLock.unlock();
    }

    tryTerminate();

    int c = ctl.get();
    if (runStateLessThan(c, STOP)) {
        if (!completedAbruptly) {
            int min = allowCoreThreadTimeOut ? 0 : corePoolSize;
            if (min == 0 && ! workQueue.isEmpty())
                min = 1;
            if (workerCountOf(c) >= min)
                return; // replacement not needed
        }
        addWorker(null, false);
    }
}

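Core code

// Remove the exiting worker from the pool's worker set
workers.remove(w);

This is nothing more than removing an element from a set. The thread itself ends because runWorker returns once getTask() yields null, so the thread's run method completes.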

/**
 * The pool's data structure is a Set
 * Its elements are worker threads: Worker
 * A worker's job: run the tasks submitted by business code
 * ---
 * Set containing all worker threads in pool. Accessed only when
 * holding mainLock.
 */
private final HashSet<Worker> workers = new HashSet<Worker>(); // the pool itself: each Worker wraps one pool thread

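Overall flow

That was a long walk through the source, so here is the whole picture. The core steps are:

1、Create a worker thread and start it so that it can get to work.

2、What work does it do? It runs tasks, that is, the Runnables submitted by business code.

3、Where do the tasks come from? From the blocking queue.

Those are roughly the core steps. The analysis above went backwards, from the keepAliveTime parameter into the source, whereas steps 1 to 3 here follow the call order.

Seen as a call stack (execute → addWorker → Thread.start → Worker.run → runWorker → getTask), it is much clearer.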
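The three steps can be condensed into a toy worker loop. To be clear, this is a deliberately simplified sketch and not the JDK implementation: the class MiniWorkerLoop and its methods are hypothetical, there is no core/maximum distinction, and the 200 ms keep-alive value is arbitrary.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of runWorker + getTask (not the JDK code).
class MiniWorkerLoop {
    /** Runs queued tasks until the queue stays empty for keepAliveMillis; returns tasks run. */
    static int workUntilIdle(BlockingQueue<Runnable> queue, long keepAliveMillis) {
        int completed = 0;
        try {
            Runnable task;
            // A null from the timed poll means "idle for too long" and ends the loop.
            while ((task = queue.poll(keepAliveMillis, TimeUnit.MILLISECONDS)) != null) {
                task.run();
                completed++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed; // returning here lets the worker thread finish
    }

    /** Starts one worker, feeds it three tasks, waits for it to retire; returns tasks run or -1. */
    static int demo() {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger ran = new AtomicInteger();
        Thread worker = new Thread(() -> workUntilIdle(queue, 200), "mini-worker");
        worker.start();                        // step 1: create and start the worker
        for (int i = 0; i < 3; i++) {
            queue.add(ran::incrementAndGet);   // steps 2 and 3: tasks arrive via the queue
        }
        try {
            worker.join(5000);                 // the worker exits by itself once it has been idle
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.isAlive() ? -1 : ran.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks run before the worker retired: " + demo());
    }
}
```

The real ThreadPoolExecutor adds state checks, locking and worker-count bookkeeping around this loop, but the idle-timeout mechanism has the same shape.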

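Worker threads and business threads

The terms "worker thread" and "business thread" used above can be confusing.

A little more explanation.

Business threads

A "business thread" is the argument. When business code uses a thread pool, it calls the pool to execute a task, and that argument is what this article calls the business thread. Strictly speaking it is a Runnable task rather than a thread: the worker thread invokes its run() method directly, and no extra thread is started for it.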
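A quick way to convince yourself that the submitted Runnable runs on the worker thread is to record the name of the executing thread. The sketch below assumes a single-thread pool and a made-up thread name, "pool-worker"; the class TaskRunsOnWorkerDemo is hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: the submitted Runnable is executed by the pool's worker thread.
class TaskRunsOnWorkerDemo {
    /** Returns the name of the thread that actually ran the submitted Runnable. */
    static String threadThatRanTask() {
        // "pool-worker" is an assumed name, chosen so the worker is easy to recognise.
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> new Thread(r, "pool-worker"));
        CompletableFuture<String> ranOn = new CompletableFuture<>();
        try {
            // The Runnable is only a task object; the worker calls its run() method.
            pool.execute(() -> ranOn.complete(Thread.currentThread().getName()));
            return ranOn.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("caller thread: " + Thread.currentThread().getName());
        System.out.println("task ran on:   " + threadThatRanTask());
    }
}
```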

Worker threads

So when are worker threads created? The entry point is the same method that submits a task for execution.

java.util.concurrent.ThreadPoolExecutor#execute

/**
 * Executes a task
 * 
 * ---
 * Executes the given task sometime in the future.  The task
 * may execute in a new thread or in an existing pooled thread.
 *
 * If the task cannot be submitted for execution, either because this
 * executor has been shutdown or because its capacity has been reached,
 * the task is handled by the current {@code RejectedExecutionHandler}.
 *
 * @param command the task to execute
 * @throws RejectedExecutionException at discretion of
 *         {@code RejectedExecutionHandler}, if the task
 *         cannot be accepted for execution
 * @throws NullPointerException if {@code command} is null
 */
public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task.  The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     *
     * 2. If a task can be successfully queued, then we still need
     * to double-check whether we should have added a thread
     * (because existing ones died since last checking) or that
     * the pool shut down since entry into this method. So we
     * recheck state and if necessary roll back the enqueuing if
     * stopped, or start a new thread if there are none.
     *
     * 3. If we cannot queue task, then we try to add a new
     * thread.  If it fails, we know we are shut down or saturated
     * and so reject the task.
     */
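    int c = ctl.get(); // e.g. 0 workers on the very first call

    // Fewer workers than corePoolSize
    if (workerCountOf(c) < corePoolSize) { // e.g. 0 < 10
        if (addWorker(command, true)) // create a new worker thread and add it to the pool
            return;
        c = ctl.get();
    }

    // Core threads are all in use: try to enqueue, which succeeds while the queue has room
    if (isRunning(c) && workQueue.offer(command)) { // put the task on the blocking queue
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    } // Queue is full: try a non-core worker, bounded by maximumPoolSize
    else if (!addWorker(command, false)) // create a new worker thread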
        reject(command);
}

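After that, depending on the situation, it decides whether to create a new worker thread.

If the blocking queue is full, a new worker thread is created as well, provided the pool is still below maximumPoolSize. The core code is:

// Queue is full: try a non-core worker, bounded by maximumPoolSize
else if (!addWorker(command, false)) // create a new worker thread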
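The three branches of execute can be traced with a deliberately tiny pool. This is a sketch under assumed settings (core 1, maximum 2, queue capacity 1, default rejection policy); the class ExecutePathsDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: core worker, then queue, then non-core worker, then rejection.
class ExecutePathsDemo {
    /** Submits four blocking tasks and records "poolSize/queueSize" (or "rejected") after each. */
    static String run() {
        // Assumed settings: core = 1, max = 2, queue capacity = 1, default AbortPolicy.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 1, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keep the worker busy so later tasks cannot be picked up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        List<String> steps = new ArrayList<>();
        try {
            for (int i = 0; i < 4; i++) {
                try {
                    pool.execute(blocker);
                    steps.add(pool.getPoolSize() + "/" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    steps.add("rejected"); // queue full and pool already at maximumPoolSize
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return String.join(",", steps);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The first task starts the core worker, the second waits in the queue, the third finds the queue full and starts a non-core worker, and the fourth has nowhere to go and is rejected.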

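Full code of addWorker. Core steps:

1、Create a new worker thread

2、Add the worker thread to the pool

3、Start the worker thread

/**
 * Main steps
 * 1. Create a new worker thread (a Worker)
 * 2. Add the new worker to the pool
 * 3. Start the worker thread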
 *
 * ---
 * Checks if a new worker can be added with respect to current
 * pool state and the given bound (either core or maximum). If so,
 * the worker count is adjusted accordingly, and, if possible, a
 * new worker is created and started, running firstTask as its
 * first task. This method returns false if the pool is stopped or
 * eligible to shut down. It also returns false if the thread
 * factory fails to create a thread when asked.  If the thread
 * creation fails, either due to the thread factory returning
 * null, or due to an exception (typically OutOfMemoryError in
 * Thread.start()), we roll back cleanly.
 *
 * @param firstTask the task the new thread should run first (or
 * null if none). Workers are created with an initial first task
 * (in method execute()) to bypass queuing when there are fewer
 * than corePoolSize threads (in which case we always start one),
 * or when the queue is full (in which case we must bypass queue).
 * Initially idle threads are usually created via
 * prestartCoreThread or to replace other dying workers.
 *
 * @param core if true use corePoolSize as bound, else
 * maximumPoolSize. (A boolean indicator is used here rather than a
 * value to ensure reads of fresh values after checking other pool
 * state).
 * @return true if successful
 */
private boolean addWorker(Runnable firstTask, boolean core) {
    retry:
    for (;;) {
        int c = ctl.get();
        int rs = runStateOf(c);

        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN &&
            ! (rs == SHUTDOWN &&
               firstTask == null &&
               ! workQueue.isEmpty()))
            return false;

        for (;;) {
            int wc = workerCountOf(c);
            if (wc >= CAPACITY ||
                wc >= (core ? corePoolSize : maximumPoolSize))
                return false;
            // Increment the pool's worker count by 1
            if (compareAndIncrementWorkerCount(c))
                break retry;
            c = ctl.get();  // Re-read ctl
            if (runStateOf(c) != rs)
                continue retry;
            // else CAS failed due to workerCount change; retry inner loop
        }
    }

    boolean workerStarted = false;
    boolean workerAdded = false;
    Worker w = null;
    try {
        // Create a new worker thread
        w = new Worker(firstTask);
        final Thread t = w.thread;
        if (t != null) {
            final ReentrantLock mainLock = this.mainLock;
            mainLock.lock();
            try {
                // Recheck while holding lock.
                // Back out on ThreadFactory failure or if
                // shut down before lock acquired.
                int rs = runStateOf(ctl.get());

                if (rs < SHUTDOWN ||
                    (rs == SHUTDOWN && firstTask == null)) {
                    if (t.isAlive()) // precheck that t is startable
                        throw new IllegalThreadStateException();
                    // Add the new worker to the pool
                    workers.add(w);
                    int s = workers.size();
                    if (s > largestPoolSize)
                        largestPoolSize = s;
                    workerAdded = true;
                }
            } finally {
                mainLock.unlock();
            }
            if (workerAdded) {
                // Start the Worker's thread
                t.start();
                workerStarted = true;
            }
        }
    } finally {
        if (! workerStarted)
            addWorkerFailed(w);
    }
    return workerStarted;
}

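Once started, a worker thread will:

1、Keep fetching tasks from the blocking queue in a loop

2、Run each task

In short, it invokes the task's run() method, which executes the business code we submitted.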

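Conclusion

Core steps:

1、A new task comes in.

2、Depending on the situation, the pool decides whether to create a new worker thread.

3、If a new worker thread is created, it runs the task;

if not, the task is placed on the blocking queue for now.

4、When do the tasks on the blocking queue get run?

Worker threads keep fetching them in a loop. From the moment it is created, a worker thread is running, in the sense that it loops, taking tasks from the blocking queue and executing them, rather than sitting unused like an entry in a cache. So an "idle" worker is simply one that is blocked waiting on the queue. If that wait lasts longer than keepAliveTime without a task arriving, and the worker is eligible for timeout (the pool is above corePoolSize, or allowCoreThreadTimeOut is set), the worker is reclaimed. There are too few tasks to need that many workers, and reclaiming them saves resources.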
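That an idle worker is parked on the queue rather than spinning can be checked by looking at its thread state. The sketch below is an illustration under assumptions: a single core thread, an unbounded LinkedBlockingQueue, and a hypothetical class name IdleWorkerStateDemo with a made-up thread name.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: an idle core worker is blocked in the queue's take(), not busy-looping.
class IdleWorkerStateDemo {
    /** Returns the state of the pool's only worker thread once it has nothing left to do. */
    static Thread.State idleState() {
        AtomicReference<Thread> workerRef = new AtomicReference<>();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<>(),
                r -> {
                    Thread t = new Thread(r, "idle-demo-worker"); // assumed name
                    workerRef.set(t); // remember the worker so its state can be inspected
                    return t;
                });
        try {
            pool.execute(() -> { }); // a trivial task, so the worker becomes idle right away
            Thread worker = workerRef.get();
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (worker.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return worker.getState();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("idle worker state: " + idleState());
    }
}
```

A core worker without a timeout blocks in take() and should show up as WAITING; a worker that is subject to keepAliveTime blocks in the timed poll instead and would be expected to appear as TIMED_WAITING.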
