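Ways to shut down Spring Boot gracefully
- When Kubernetes stops a Pod, it first sends SIGTERM so that the application process can shut down gracefully. If the process has not exited within the graceful termination timeout, terminationGracePeriodSeconds (30 seconds by default), Kubernetes sends SIGKILL to force-kill it.
- Manual stop: send a POST request to the Spring Boot Actuator shutdown endpoint, /actuator/shutdown. Spring Boot closes the web ApplicationContext and then exits, which gives a graceful shutdown.
The kill -TERM approach
During a graceful shutdown, Spring Boot calls the methods annotated with @PreDestroy.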
java
@PreDestroy
public void cleanup() {
// Perform cleanup work
log.info("Received shutdown event. Performing cleanup and shutting down gracefully.");
}
After sending SIGTERM to the Spring Boot process, the log line printed by cleanup() shows the name of the thread that runs the shutdown work: SpringApplicationShutdownHook.
csharp
[2023-09-21 08:29:34.232] INFO [SpringApplicationShutdownHook] - Received shutdown event. Performing cleanup and shutting down gracefully.
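A global search for that thread name shows that Spring Boot calls Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook")) to register a ShutdownHook with the JVM. A ShutdownHook lets you run cleanup or wrap-up tasks when the JVM is about to shut down.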
java
class SpringApplicationShutdownHook implements Runnable {
void addRuntimeShutdownHook() {
try {
Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook"));
}
catch (AccessControlException ex) {
// Not allowed in some environments
}
}
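To see the mechanism in isolation, here is a minimal plain-JDK sketch of a named shutdown hook. ShutdownHookDemo and the hook name are made up for illustration; only Runtime#addShutdownHook is the real API being demonstrated.

```java
final class ShutdownHookDemo {

    // Registers a named JVM shutdown hook and returns its Thread so it can be removed again.
    static Thread register(String name, Runnable cleanup) {
        Thread hook = new Thread(cleanup, name);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        // The hook runs when the JVM shuts down: on normal exit as well as on SIGTERM.
        register("demo-shutdown-hook", () -> System.out.println(
                "cleanup on thread " + Thread.currentThread().getName()));
        System.out.println("main finished");
    }
}
```

Running main prints "main finished" first and the cleanup line afterwards, because hooks only start once the JVM begins shutting down. A SIGKILL gives the JVM no chance to run them at all.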
In SpringApplication#run(), the ShutdownHook is registered with the JVM before applicationContext.refresh() is executed.
The AtomicBoolean shutdownHookAdded variable ensures that, when several threads run concurrently, only one of them can actually add the SpringApplicationShutdownHook.
The ConfigurableApplicationContext context object is added to the Set<ConfigurableApplicationContext> contexts collection, and its close() method is called later to close it. For a Spring Boot web application the corresponding implementation is AnnotationConfigServletWebServerApplicationContext.
java
class SpringApplicationShutdownHook implements Runnable {
private final Set<ConfigurableApplicationContext> contexts = new LinkedHashSet<>();
private final AtomicBoolean shutdownHookAdded = new AtomicBoolean();
SpringApplicationShutdownHandlers getHandlers() {
return this.handlers;
}
void registerApplicationContext(ConfigurableApplicationContext context) {
addRuntimeShutdownHookIfNecessary();
synchronized (SpringApplicationShutdownHook.class) {
assertNotInProgress();
context.addApplicationListener(this.contextCloseListener);
this.contexts.add(context);
}
}
private void addRuntimeShutdownHookIfNecessary() {
if (this.shutdownHookAdded.compareAndSet(false, true)) {
addRuntimeShutdownHook();
}
}
void addRuntimeShutdownHook() {
try {
Runtime.getRuntime().addShutdownHook(new Thread(this, "SpringApplicationShutdownHook"));
}
catch (AccessControlException ex) {
// Not allowed in some environments
}
}
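The compareAndSet(false, true) guard is what makes the registration happen exactly once. A small plain-JDK sketch of the same idea follows. RegisterOnceDemo is a hypothetical class, and the counter merely stands in for the real Runtime#addShutdownHook call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class RegisterOnceDemo {
    private final AtomicBoolean hookAdded = new AtomicBoolean();
    private final AtomicInteger registrations = new AtomicInteger();

    // Only the thread that flips the flag from false to true performs the registration.
    void registerIfNecessary() {
        if (hookAdded.compareAndSet(false, true)) {
            registrations.incrementAndGet(); // stand-in for Runtime.getRuntime().addShutdownHook(...)
        }
    }

    // Calls registerIfNecessary() from many threads at once and reports how often it really ran.
    static int raceRegistrations(int threads) {
        RegisterOnceDemo demo = new RegisterOnceDemo();
        CountDownLatch start = new CountDownLatch(1);
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    start.await(); // line all threads up before racing
                    demo.registerIfNecessary();
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(worker);
            worker.start();
        }
        start.countDown();
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return demo.registrations.get();
    }
}
```

However many threads race, raceRegistrations reports a single registration, since only one compareAndSet call can observe false.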
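The Javadoc describes SpringApplicationShutdownHook as "A Runnable to be used as a shutdown hook to perform graceful shutdown of Spring Boot applications." Its run() method does two important things:
- contexts.forEach(this::closeAndWait): closes each ConfigurableApplicationContext and waits for the context to become inactive. The wait has a timeout of 10 minutes (TIMEOUT = TimeUnit.MINUTES.toMillis(10)). If context.close() itself contains a very slow synchronous operation, this timeout does not help: the thread blocks inside context.close(), and the timed wait only starts after close() returns.
- actions.forEach(Runnable::run): user-defined shutdown actions can be added to this.handlers, and SpringApplicationShutdownHook calls them back while it performs the shutdown. Logback's graceful shutdown relies on this mechanism, as described later.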
java
class SpringApplicationShutdownHook implements Runnable {
private static final int SLEEP = 50;
private static final long TIMEOUT = TimeUnit.MINUTES.toMillis(10);
@Override
public void run() {
Set<ConfigurableApplicationContext> contexts;
Set<ConfigurableApplicationContext> closedContexts;
Set<Runnable> actions;
synchronized (SpringApplicationShutdownHook.class) {
this.inProgress = true;
contexts = new LinkedHashSet<>(this.contexts);
closedContexts = new LinkedHashSet<>(this.closedContexts);
actions = new LinkedHashSet<>(this.handlers.getActions());
}
contexts.forEach(this::closeAndWait);
closedContexts.forEach(this::closeAndWait);
actions.forEach(Runnable::run);
}
// Call ConfigurableApplicationContext.close() and wait until the context becomes inactive.
// We can't assume that just because the close method returns that the context is actually inactive.
// It could be that another thread is still in the process of disposing beans.
// Close the ConfigurableApplicationContext, then wait for it to become inactive (TIMEOUT is 10 minutes)
private void closeAndWait(ConfigurableApplicationContext context) {
if (!context.isActive()) {
return;
}
context.close();
try {
int waited = 0;
while (context.isActive()) {
if (waited > TIMEOUT) {
throw new TimeoutException();
}
Thread.sleep(SLEEP);
waited += SLEEP;
}
}
catch (InterruptedException ex) {
Thread.currentThread().interrupt();
logger.warn("Interrupted waiting for application context " + context + " to become inactive");
}
catch (TimeoutException ex) {
logger.warn("Timed out waiting for application context " + context + " to become inactive", ex);
}
}
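The polling loop in closeAndWait() can be reduced to the following plain-JDK sketch. WaitForInactive and its parameters are hypothetical names; the BooleanSupplier stands in for context.isActive(). Note that the timeout only bounds this polling phase, so a close() call that blocks before the loop is not covered by it.

```java
import java.util.function.BooleanSupplier;

final class WaitForInactive {

    // Polls isActive until it reports false or the timeout elapses.
    // Returns true if the target became inactive in time, false on timeout or interruption.
    static boolean await(BooleanSupplier isActive, long timeoutMillis, long sleepMillis) {
        long waited = 0;
        while (isActive.getAsBoolean()) {
            if (waited > timeoutMillis) {
                return false; // gave up: still active after the timeout
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
                return false;
            }
            waited += sleepMillis;
        }
        return true;
    }
}
```

As in the original, elapsed time is approximated by summing the sleep intervals rather than by reading a clock.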
Notes on ConfigurableApplicationContext#close(), from its Javadoc:
Close this application context, releasing all resources and locks that the implementation might hold. This includes destroying all cached singleton beans.
Note: Does not invoke close on a parent context; parent contexts have their own, independent lifecycle.
This method can be called multiple times without side effects: Subsequent close calls on an already closed context will be ignored.
The ShutdownEndpoint approach
Add the following configuration to the yml file to enable and expose the Spring Actuator shutdown endpoint, /actuator/shutdown.
yaml
management:
endpoint:
shutdown:
enabled: true
endpoints:
web:
exposure:
include: '*'
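With the endpoint exposed, shutdown is triggered by a POST request. Here is a minimal sketch using the JDK's java.net.http client (Java 11+). It assumes the application listens on http://localhost:8080 with the default /actuator base path; ShutdownEndpointClient is a made-up name.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class ShutdownEndpointClient {

    // Builds a body-less POST request for the shutdown endpoint under the given base URL.
    static HttpRequest shutdownRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/shutdown"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumption: the app runs locally on port 8080 unless a base URL is passed in.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:8080";
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(shutdownRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        } catch (IOException ex) {
            System.out.println("Request failed: " + ex.getMessage());
        }
    }
}
```

Because this endpoint stops the application, exposing it with include: '*' is only reasonable where access to the actuator endpoints is otherwise restricted.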
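How ShutdownEndpoint works: after receiving the request, it returns the response and starts a new thread that, after a short sleep, executes this.context.close().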
java
@Endpoint(id = "shutdown", enableByDefault = false)
public class ShutdownEndpoint implements ApplicationContextAware {
private static final Map<String, String> NO_CONTEXT_MESSAGE = Collections
.unmodifiableMap(Collections.singletonMap("message", "No context to shutdown."));
private static final Map<String, String> SHUTDOWN_MESSAGE = Collections
.unmodifiableMap(Collections.singletonMap("message", "Shutting down, bye..."));
private ConfigurableApplicationContext context;
@WriteOperation
public Map<String, String> shutdown() {
if (this.context == null) {
return NO_CONTEXT_MESSAGE;
}
try {
return SHUTDOWN_MESSAGE;
}
finally {
Thread thread = new Thread(this::performShutdown);
thread.setContextClassLoader(getClass().getClassLoader());
thread.start();
}
}
private void performShutdown() {
try {
Thread.sleep(500L);
}
catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
this.context.close();
}
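Note: executing this.context.close() also ends up triggering SpringApplicationShutdownHook#run() asynchronously. I have not traced exactly how. The most likely explanation is that once the context is closed, the application's last non-daemon threads exit, the JVM starts its normal shutdown sequence, and that sequence runs the registered shutdown hooks.
Compared with the SIGTERM path, the context here is closed before the hook runs. SpringApplicationShutdownHook uses ApplicationContextClosedListener to move a context from contexts to closedContexts when its ContextClosedEvent fires, so the hook does not needlessly call context#close() again on a context that has already been closed.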
java
class SpringApplicationShutdownHook implements Runnable {
private final Set<ConfigurableApplicationContext> contexts = new LinkedHashSet<>();
private final Set<ConfigurableApplicationContext> closedContexts = Collections.newSetFromMap(new WeakHashMap<>());
private final ApplicationContextClosedListener contextCloseListener = new ApplicationContextClosedListener();
// ApplicationListener to track closed contexts.
private class ApplicationContextClosedListener implements ApplicationListener<ContextClosedEvent> {
@Override
public void onApplicationEvent(ContextClosedEvent event) {
// The ContextClosedEvent is fired at the start of a call to {@code close()}
// and if that happens in a different thread then the context may still be
// active. Rather than just removing the context, we add it to a {@code
// closedContexts} set. This is weak set so that the context can be GC'd once
// the {@code close()} method returns.
synchronized (SpringApplicationShutdownHook.class) {
ApplicationContext applicationContext = event.getApplicationContext();
SpringApplicationShutdownHook.this.contexts.remove(applicationContext);
SpringApplicationShutdownHook.this.closedContexts
.add((ConfigurableApplicationContext) applicationContext);
}
}
}
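Graceful shutdown of Tomcat in Spring Boot
In the Spring Boot version examined here, Tomcat is terminated immediately by default when Spring Boot receives a stop signal, without waiting for in-flight requests to complete. Adding server.shutdown=GRACEFUL to the configuration file makes Tomcat wait for the current requests to finish, which gives a graceful shutdown.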
yaml
server:
shutdown: GRACEFUL
With server.shutdown=IMMEDIATE: if the Spring Boot application is stopped while an HTTP request is still in flight, the console fills up with errors.
less
[2023-09-22 09:11:08.533] INFO [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:11:10.842] INFO [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Stopping ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:11:10.863] ERROR [http-nio-8080-exec-3] [bcb01a34-9721-461f-ad3d-2f71c386ff10] [TID: N/A] - controller system exception, java.nio.channels.ClosedChannelException
org.apache.catalina.connector.ClientAbortException: java.nio.channels.ClosedChannelException
at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:353)
at org.apache.catalina.connector.OutputBuffer.flushByteBuffer(OutputBuffer.java:784)
at org.apache.catalina.connector.OutputBuffer.doFlush(OutputBuffer.java:299)
[2023-09-22 09:11:10.915] INFO [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Destroying ProtocolHandler ["http-nio-8080"]
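With server.shutdown=GRACEFUL: if the Spring Boot application is stopped while an HTTP request is still in flight, the console prints "Commencing graceful shutdown. Waiting for active requests to complete", and the Spring Boot process waits for the active requests to complete before it exits.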
css
[2023-09-22 09:18:12.507] INFO [Thread-5] org.springframework.boot.web.embedded.tomcat.GracefulShutdown 53 [] [TID: N/A] - Commencing graceful shutdown. Waiting for active requests to complete
[2023-09-22 09:18:12.507] INFO [tomcat-shutdown] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:18:17.633] INFO [tomcat-shutdown] org.springframework.boot.web.embedded.tomcat.GracefulShutdown 78 [] [TID: N/A] - Graceful shutdown complete
[2023-09-22 09:18:17.637] INFO [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Pausing ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:18:17.645] INFO [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Stopping ProtocolHandler ["http-nio-8080"]
[2023-09-22 09:18:17.657] INFO [Thread-5] org.apache.coyote.http11.Http11NioProtocol 173 [] [TID: N/A] - Destroying ProtocolHandler ["http-nio-8080"]
Following the trail leads to the graceful shutdown code in the TomcatWebServer source. If shutdown == Shutdown.GRACEFUL, a GracefulShutdown instance is created to handle the graceful shutdown work through this.gracefulShutdown.shutDownGracefully(callback). Otherwise the server stops immediately once the stop signal arrives: callback.shutdownComplete(GracefulShutdownResult.IMMEDIATE).
TomcatWebServer#shutDownGracefully() is called from the WebServerGracefulShutdownLifecycle#stop() lifecycle method.
java
public class TomcatWebServer implements WebServer {
private final Tomcat tomcat;
private final boolean autoStart;
private final GracefulShutdown gracefulShutdown;
public TomcatWebServer(Tomcat tomcat, boolean autoStart, Shutdown shutdown) {
Assert.notNull(tomcat, "Tomcat Server must not be null");
this.tomcat = tomcat;
this.autoStart = autoStart;
this.gracefulShutdown = (shutdown == Shutdown.GRACEFUL) ? new GracefulShutdown(tomcat) : null;
initialize();
}
@Override
public void shutDownGracefully(GracefulShutdownCallback callback) {
if (this.gracefulShutdown == null) {
callback.shutdownComplete(GracefulShutdownResult.IMMEDIATE);
return;
}
this.gracefulShutdown.shutDownGracefully(callback);
}
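GracefulShutdown#shutDownGracefully() starts a new thread that runs doShutdown() asynchronously. doShutdown() gets all Connectors, pauses each one and calls connector.getProtocolHandler().closeServerSocketGraceful(), which closes the server socket so that no new connections are accepted. It then waits in a while loop, sleeping 50 ms at a time, until every TomcatEmbeddedContext becomes inactive, and finally invokes the callback with GracefulShutdownResult.IDLE.
If the graceful shutdown does not finish within the allowed time, this.aborted is set to true (by abort(), which TomcatWebServer#stop() calls), and doShutdown() reports GracefulShutdownResult.REQUESTS_ACTIVE through the callback and returns.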
java
// Handles Tomcat graceful shutdown.
final class GracefulShutdown {
private final Tomcat tomcat;
private volatile boolean aborted = false;
GracefulShutdown(Tomcat tomcat) {
this.tomcat = tomcat;
}
void shutDownGracefully(GracefulShutdownCallback callback) {
logger.info("Commencing graceful shutdown. Waiting for active requests to complete");
new Thread(() -> doShutdown(callback), "tomcat-shutdown").start();
}
private void doShutdown(GracefulShutdownCallback callback) {
List<Connector> connectors = getConnectors();
connectors.forEach(this::close);
try {
for (Container host : this.tomcat.getEngine().findChildren()) {
for (Container context : host.findChildren()) {
while (isActive(context)) {
if (this.aborted) {
logger.info("Graceful shutdown aborted with one or more requests still active");
callback.shutdownComplete(GracefulShutdownResult.REQUESTS_ACTIVE);
return;
}
Thread.sleep(50);
}
}
}
}
catch (InterruptedException ex) {
Thread.currentThread().interrupt();
}
logger.info("Graceful shutdown complete");
callback.shutdownComplete(GracefulShutdownResult.IDLE);
}
private void close(Connector connector) {
connector.pause();
connector.getProtocolHandler().closeServerSocketGraceful();
}
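Stripped of the Tomcat types, the wait-or-abort logic of doShutdown() looks roughly like the plain-JDK sketch below. GracefulDrain and its request counter are hypothetical stand-ins for Tomcat's own notion of an active context; only the IDLE and REQUESTS_ACTIVE result names come from the source.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class GracefulDrain {
    enum Result { IDLE, REQUESTS_ACTIVE }

    private final AtomicInteger activeRequests = new AtomicInteger();
    private volatile boolean aborted = false;

    void requestStarted() { activeRequests.incrementAndGet(); }

    void requestFinished() { activeRequests.decrementAndGet(); }

    // Called by the forced-stop path to tell the waiting thread to give up.
    void abort() { this.aborted = true; }

    // Blocks until no request is active (IDLE), or until abort() is seen
    // while requests are still running (REQUESTS_ACTIVE).
    Result awaitDrained() {
        try {
            while (activeRequests.get() > 0) {
                if (aborted) {
                    return Result.REQUESTS_ACTIVE;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException ex) {
            // Same as the excerpt above: restore the flag and fall through to IDLE.
            Thread.currentThread().interrupt();
        }
        return Result.IDLE;
    }
}
```

The aborted flag is volatile because abort() is called from a different thread than the one polling in awaitDrained().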
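The heart of the graceful shutdown is GracefulShutdown#close(Connector), which leads to AbstractEndpoint#closeServerSocketGraceful(). That code is too low-level for me to dig through in detail.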
java
public abstract class AbstractEndpoint<S,U> {
// Close the server socket (to prevent further connections) if the server socket was originally bound on start() (rather than on init()).
public final void closeServerSocketGraceful() {
if (bindState == BindState.BOUND_ON_START) {
// Stop accepting new connections
acceptor.stop(-1);
// Release locks that may be preventing the acceptor from stopping
releaseConnectionLatch();
unlockAccept();
// Signal to any multiplexed protocols (HTTP/2) that they may wish
// to stop accepting new streams
getHandler().pause();
// Update the bindState. This has the side-effect of disabling
// keep-alive for any in-progress connections
bindState = BindState.SOCKET_CLOSED_ON_STOP;
try {
doCloseServerSocket();
} catch (IOException ioe) {
getLog().warn(sm.getString("endpoint.serverSocket.closeFailed", getName()), ioe);
}
}
}
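TomcatWebServer#stop() runs afterwards. If Tomcat's GracefulShutdown has not finished its work within the allowed time (spring.lifecycle.timeout-per-shutdown-phase, 30 seconds by default), TomcatWebServer#stop() aborts it and stops Tomcat forcibly.
TomcatWebServer#stop() is called from the WebServerStartStopLifecycle#stop() lifecycle method.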
java
public class TomcatWebServer implements WebServer {
@Override
public void stop() throws WebServerException {
synchronized (this.monitor) {
boolean wasStarted = this.started;
try {
this.started = false;
try {
if (this.gracefulShutdown != null) {
this.gracefulShutdown.abort();
}
stopTomcat();
this.tomcat.destroy();
}
catch (LifecycleException ex) {
// swallow and continue
}
}
catch (Exception ex) {
throw new WebServerException("Unable to stop embedded Tomcat", ex);
}
finally {
if (wasStarted) {
containerCounter.decrementAndGet();
}
}
}
}
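The interplay between the graceful phase and the forced stop can be sketched with plain JDK classes: wait for the graceful work up to a timeout, then run the forced stop either way. This is only an illustration of the pattern under that assumption, not Spring's lifecycle code; TwoPhaseStop and its parameters are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class TwoPhaseStop {

    // Runs gracefulPhase on its own thread and waits up to timeoutMillis for it to finish.
    // forceStop always runs afterwards; returns true if the graceful phase completed in time.
    static boolean stop(Runnable gracefulPhase, Runnable forceStop, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                gracefulPhase.run();
            } finally {
                done.countDown();
            }
        }, "graceful-phase");
        worker.setDaemon(true); // a stuck graceful phase must not keep the JVM alive
        worker.start();

        boolean completed = false;
        try {
            completed = done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        forceStop.run(); // in the Tomcat case this is where abort() and stopTomcat() happen
        return completed;
    }
}
```

In practice the Kubernetes terminationGracePeriodSeconds should be longer than this per-phase timeout; otherwise SIGKILL arrives before the forced stop has a chance to run.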
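Logback graceful shutdown: making sure no logs are lost
There are two common ways to improve an application's logging performance:
- Set OutputStreamAppender#immediateFlush = false. immediateFlush defaults to true, which forces a flush for every log event. With immediateFlush set to false, not every log event triggers a flush, so the number of IO flushes goes down. The downside is that when the Pod restarts or stops, buffered log content that has not been flushed yet may be lost. This is why logback needs a graceful shutdown driven by a ShutdownHook.
- Use AsyncAppender. Logback writes logs synchronously by default: within one process, every thread has to acquire a lock before it can write to the outputStream, so lock contention can become a performance problem when many threads log at the same time. Wrapping the original Appender in an AsyncAppender makes logging asynchronous, with a single worker thread doing all the writes to the outputStream. The problem is the same as above: when the Pod restarts or stops, the log events still sitting in the BlockingQueue may be lost, so logback again needs a graceful shutdown driven by a ShutdownHook.
The good news: Spring Boot has already built this wheel for us, and it is enabled by default (logging.register-shutdown-hook defaults to true). In other words, we do not have to write any code; we only need to make sure that Spring Boot actually receives the SIGTERM signal. Honestly, I could cry.
How the logback shutdown callback is registered: when LoggingApplicationListener#onApplicationEvent() receives an ApplicationEnvironmentPreparedEvent, it calls SpringApplication.getShutdownHandlers().add(shutdownHandler) to register the logback shutdownHandler with SpringApplication.getShutdownHandlers(). SpringApplicationShutdownHook#run() later calls this shutdownHandler back.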
java
public class LoggingApplicationListener implements GenericApplicationListener {
@Override
public void onApplicationEvent(ApplicationEvent event) {
// ...
else if (event instanceof ApplicationEnvironmentPreparedEvent) {
onApplicationEnvironmentPreparedEvent((ApplicationEnvironmentPreparedEvent) event);
}
// ...
}
private void registerShutdownHookIfNecessary(Environment environment, LoggingSystem loggingSystem) {
if (environment.getProperty(REGISTER_SHUTDOWN_HOOK_PROPERTY, Boolean.class, true)) {
Runnable shutdownHandler = loggingSystem.getShutdownHandler();
if (shutdownHandler != null && shutdownHookRegistered.compareAndSet(false, true)) {
registerShutdownHook(shutdownHandler);
}
}
}
void registerShutdownHook(Runnable shutdownHandler) {
SpringApplication.getShutdownHandlers().add(shutdownHandler);
}
The Logback ShutdownHandler added by the code above is defined in the LogbackLoggingSystem class:
java
public class LogbackLoggingSystem extends Slf4JLoggingSystem {
public Runnable getShutdownHandler() {
return () -> getLoggerContext().stop();
}
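The LifeCycle interface is the lifecycle contract for logback components, and stop() is the method that tears a component down. The Appender interface extends LifeCycle, so calling Appender#stop() shuts an Appender instance down cleanly.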
java
public interface Appender<E> extends LifeCycle, ContextAware, FilterAttachable<E> {
getLoggerContext().stop() --> reset() calls root.recursiveReset(). This root is a ch.qos.logback.classic.Logger object and corresponds to the logback <root> element.
xml
<root level="INFO">
<appender-ref ref="STDOUT"/>
<appender-ref ref="FILE"/>
</root>
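The root logger object aggregates two appender objects, the ConsoleAppender and the RollingFileAppender set up in the logback configuration. AppenderAttachableImpl#detachAndStopAllAppenders() iterates over the Appender objects and calls stop() on each of them to tear the instances down.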
java
public final class Logger implements org.slf4j.Logger, LocationAwareLogger, AppenderAttachable<ILoggingEvent>, Serializable {
transient private AppenderAttachableImpl<ILoggingEvent> aai;
public void detachAndStopAllAppenders() {
if (aai != null) {
aai.detachAndStopAllAppenders();
}
}
public class AppenderAttachableImpl<E> implements AppenderAttachable<E> {
final private COWArrayList<Appender<E>> appenderList = new COWArrayList<Appender<E>>(new Appender[0]);
public void detachAndStopAllAppenders() {
for (Appender<E> a : appenderList) {
a.stop();
}
appenderList.clear();
}
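When OutputStreamAppender stops, it closes its output stream. Closing the stream first pushes out any log content that has not been flushed from this.outputStream yet, and then closes it.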
java
public class OutputStreamAppender<E> extends UnsynchronizedAppenderBase<E> {
public void stop() {
lock.lock();
try {
closeOutputStream();
super.stop();
} finally {
lock.unlock();
}
}
protected void closeOutputStream() {
if (this.outputStream != null) {
try {
// before closing we have to output out layout's footer
encoderClose();
this.outputStream.close();
this.outputStream = null;
} catch (IOException e) {
addStatus(new ErrorStatus("Could not close output stream for OutputStreamAppender.", this, e));
}
}
}
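Why closing matters when immediateFlush is false can be shown with a plain BufferedOutputStream. This is only an analogy for the appender's stream, under the assumption that it buffers in a similar way; BufferedFlushDemo is a made-up name.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BufferedFlushDemo {

    // Writes a line through a buffer without flushing, and returns how many bytes
    // had reached the underlying stream before close() and after close().
    static int[] bytesBeforeAndAfterClose(String line) {
        ByteArrayOutputStream target = new ByteArrayOutputStream();
        try {
            OutputStream buffered = new BufferedOutputStream(target, 8192);
            buffered.write(line.getBytes(StandardCharsets.UTF_8));
            int before = target.size(); // still sitting in the buffer
            buffered.close();           // close() flushes the buffer first
            return new int[] { before, target.size() };
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

If the process is killed between the write and the close, the buffered bytes never reach the target, which is exactly the loss the shutdown handler guards against.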
When AsyncAppenderBase stops, it waits for the worker thread with worker.join(maxFlushTime). The default maxFlushTime is 1000 ms, that is, 1 s.
java
public class AsyncAppenderBase<E> extends UnsynchronizedAppenderBase<E> implements AppenderAttachable<E> {
// The default maximum queue flush time allowed during appender stop.
// If the worker takes longer than this time it will exit, discarding any remaining items in the queue
public static final int DEFAULT_MAX_FLUSH_TIME = 1000;
int maxFlushTime = DEFAULT_MAX_FLUSH_TIME;
@Override
public void stop() {
if (!isStarted())
return;
// mark this appender as stopped so that Worker can also processPriorToRemoval if it is invoking
// aii.appendLoopOnAppenders
// and sub-appenders consume the interruption
super.stop();
// interrupt the worker thread so that it can terminate. Note that the interruption can be consumed
// by sub-appenders
worker.interrupt();
InterruptUtil interruptUtil = new InterruptUtil(context);
try {
interruptUtil.maskInterruptFlag();
worker.join(maxFlushTime);
// check to see if the thread ended and if not add a warning message
if (worker.isAlive()) {
addWarn("Max queue flush timeout (" + maxFlushTime + " ms) exceeded. Approximately " + blockingQueue.size()
+ " queued events were possibly discarded.");
} else {
addInfo("Queue flush finished successfully within timeout.");
}
} catch (InterruptedException e) {
int remaining = blockingQueue.size();
addError("Failed to join worker thread. " + remaining + " queued events may be discarded.", e);
} finally {
interruptUtil.unmaskInterruptFlag();
}
}
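The interrupt-then-join pattern above can be tried out with a small plain-JDK sketch of an asynchronous sink. AsyncSink is a hypothetical simplification: it keeps events in a list instead of writing them, and it rejects events when the queue is full rather than blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class AsyncSink {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(256);
    private final List<String> written = new ArrayList<>(); // only the worker writes here
    private final Thread worker = new Thread(this::drainLoop, "async-sink-worker");
    private volatile boolean started = true;

    AsyncSink() {
        worker.setDaemon(true);
        worker.start();
    }

    // Queues an event; returns false if the sink is stopped or the queue is full.
    boolean append(String event) {
        return started && queue.offer(event);
    }

    private void drainLoop() {
        while (started) {
            try {
                written.add(queue.take());
            } catch (InterruptedException ex) {
                break; // stop() interrupts the worker to wake it from take()
            }
        }
        queue.drainTo(written); // flush whatever is still queued before exiting
    }

    // Stops the sink and waits up to maxFlushMillis for the worker to drain the queue.
    // Returns true if the worker finished in time.
    boolean stop(long maxFlushMillis) {
        started = false;
        worker.interrupt();
        try {
            worker.join(maxFlushMillis);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    // Safe to read once stop() has returned true.
    List<String> written() {
        return written;
    }
}
```

If draining takes longer than the flush time, stop() returns false and the remaining events are at risk, which corresponds to the "Max queue flush timeout" warning in the logback code.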