Background
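Because of recent staffing changes, I took over a long-running internal system. One day, the Redis instance used by other projects' development and test environments began frequently throwing the following timeout error:
```text
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 6 second(s)
```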
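The operations team found that this system had opened more than ten thousand connections to Redis, using up all available connections, so other services could no longer use Redis normally.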
Troubleshooting
Initial hypothesis
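Because the system is internal and has very little concurrent traffic, I suspected a connection leak. Possible causes included:
- Scheduled tasks running too frequently
- Incorrect Redis usage (for example, connections not being closed properly)
- The service running for a long time without a restart, accumulating many unreleased connections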
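A first look at the code showed that the JedisCluster used by the business logic was already configured with a connection pool (JedisPoolConfig), so it seemed fine.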
```java
@Bean
public JedisCluster jedisCluster() {
    logger.info("******** Redis configuration injection started ********");
    try {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxIdle(poolMaxIdle);
        poolConfig.setMinIdle(poolMinIdle);
        String[] nodes = clusterNodes.split(",");
        if (nodes.length < 1) {
            throw new IllegalArgumentException("Configure at least one clusterNodes !");
        }
        Set<HostAndPort> list = new HashSet<>();
        for (String node : nodes) {
            String[] urls = node.split(":");
            if (urls.length < 2) {
                throw new IllegalArgumentException("The redis url must contain both host and port !");
            }
            list.add(new HostAndPort(urls[0], Integer.parseInt(urls[1])));
        }
        logger.info("******** Redis configuration injected successfully ********");
        // connectionTimeout = 10000 ms, soTimeout = 10000 ms, maxAttempts = 5
        return new JedisCluster(list, 10000, 10000, 5, password, poolConfig);
    } catch (Exception e) {
        logger.error(e.toString(), e);
        throw e;
    }
}
```
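Log analysis
- The log was flooded with connection failures, showing that the service kept trying to reconnect to Redis.
- After the service was restarted, two kinds of errors stood out:
    - RedisMessageListenerContainer hit connection errors while running its Redis subscription task and retried continuously.
    - Spring Session's scheduled cleanup of expired sessions threw Redis connection errors, and Lettuce reported an "immutable collection" error while parsing a reply from Redis.
The key part of the stack trace shows that the exception ultimately comes from io.lettuce.core.output.ArrayOutput.set, where a call to AbstractList.add throws UnsupportedOperationException.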
```text
ERROR[redisMessageListenerContainer-9] RedisMessageListenerContainer.handleSubscriptionException(651) - Connection failure occurred. Restarting subscription task after 5000 ms
ERROR[redisMessageListenerContainer-10] RedisMessageListenerContainer.handleSubscriptionException(651) - Connection failure occurred. Restarting subscription task after 5000 ms
ERROR[pool-15-thread-1] TaskUtils$LoggingErrorHandler.handleError(96) - Unexpected error occurred in scheduled task.
org.springframework.data.redis.RedisConnectionFailureException: Unable to connect to Redis; nested exception is io.lettuce.core.RedisConnectionException: Unable to connect
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1112)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getConnection(LettuceConnectionFactory.java:1091)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getClusterConnection(LettuceConnectionFactory.java:369)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory.getConnection(LettuceConnectionFactory.java:339)
    at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:134)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:97)
    at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:84)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:212)
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:185)
    at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96)
    at org.springframework.data.redis.core.DefaultSetOperations.members(DefaultSetOperations.java:158)
    at org.springframework.data.redis.core.DefaultBoundSetOperations.members(DefaultBoundSetOperations.java:152)
    at org.springframework.session.data.redis.RedisSessionExpirationPolicy.cleanExpiredSessions(RedisSessionExpirationPolicy.java:132)
    at org.springframework.session.data.redis.RedisOperationsSessionRepository.cleanupExpiredSessions(RedisOperationsSessionRepository.java:430)
    at org.springframework.session.data.redis.config.annotation.web.http.RedisHttpSessionConfiguration.lambda$configureTasks$0(RedisHttpSessionConfiguration.java:248)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:93)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: io.lettuce.core.RedisConnectionException: Unable to connect
    at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:94)
    at io.lettuce.core.AbstractRedisClient.getConnection(AbstractRedisClient.java:262)
    at io.lettuce.core.cluster.RedisClusterClient.connect(RedisClusterClient.java:348)
    at org.springframework.data.redis.connection.lettuce.ClusterConnectionProvider.getConnection(ClusterConnectionProvider.java:85)
    at org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory$SharedConnection.getNativeConnection(LettuceConnectionFactory.java:1110)
    ... 23 common frames omitted
Caused by: java.lang.UnsupportedOperationException: null
    at java.util.AbstractList.add(AbstractList.java:148)
    at java.util.AbstractList.add(AbstractList.java:108)
    at io.lettuce.core.output.ArrayOutput.set(ArrayOutput.java:54)
    at io.lettuce.core.protocol.RedisStateMachine.safeSet(RedisStateMachine.java:357)
    at io.lettuce.core.protocol.RedisStateMachine.decode(RedisStateMachine.java:138)
    at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:708)
    at io.lettuce.core.protocol.CommandHandler.decode0(CommandHandler.java:672)
    at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:658)
    at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:587)
    at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:556)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    ... 1 common frames omitted
```
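- Counting Redis connections from inside the Pod confirmed that the number kept growing, which supports the connection-leak hypothesis:
```shell
netstat -anp | grep {redis-port} | wc -l
```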
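Source code tracing: why weren't the connections released?
In Spring (Spring Boot 2.x), the default Redis client is Lettuce. For brevity, only the key code is shown below.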
RedisMessageListenerContainer
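The exception points to RedisMessageListenerContainer.handleSubscriptionException, which leads to the relevant source:
- When the subscription task (SubscriptionTask) fails, it calls handleSubscriptionException, which tries to close the connection and then retries.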
```java
public class RedisMessageListenerContainer implements InitializingBean, DisposableBean, BeanNameAware, SmartLifecycle {
    // ...
    private class SubscriptionTask implements SchedulingAwareRunnable {
        // ...
        public void run() {
            synchronized (localMonitor) {
                subscriptionTaskRunning = true;
            }
            try {
                // Key point 1: obtain a connection from connectionFactory
                connection = connectionFactory.getConnection();
                if (connection.isSubscribed()) {
                    throw new IllegalStateException("Retrieved connection is already subscribed; aborting listening");
                }
                boolean asyncConnection = ConnectionUtils.isAsync(connectionFactory);
                // NB: sync drivers' Xsubscribe calls block, so we notify the RDMLC before performing the actual subscription.
                if (!asyncConnection) {
                    synchronized (monitor) {
                        monitor.notify();
                    }
                }
                SubscriptionPresentCondition subscriptionPresent = eventuallyPerformSubscription();
                if (asyncConnection) {
                    SpinBarrier.waitFor(subscriptionPresent, getMaxSubscriptionRegistrationWaitingTime());
                    synchronized (monitor) {
                        monitor.notify();
                    }
                }
            } catch (Throwable t) {
                // Key point 2: any exception is handed to handleSubscriptionException
                handleSubscriptionException(t);
            } finally {
                // this block is executed once the subscription thread has ended, this may or may not mean
                // the connection has been unsubscribed, depending on driver
                synchronized (localMonitor) {
                    subscriptionTaskRunning = false;
                    localMonitor.notify();
                }
            }
        }
    }
}
```
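handleSubscriptionException source
handleSubscriptionException closes the connection and, if the exception is a RedisConnectionFailureException, sleeps for the recovery interval (5 seconds by default) before retrying.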
```java
protected void handleSubscriptionException(Throwable ex) {
    listening = false;
    // 1. Close the connection
    subscriptionTask.closeConnection();
    if (ex instanceof RedisConnectionFailureException) {
        if (isRunning()) {
            logger.error("Connection failure occurred. Restarting subscription task after " + recoveryInterval + " ms");
            // 2. Sleep for the recovery interval (5 s by default)
            sleepBeforeRecoveryAttempt();
            // 3. Retry
            lazyListen();
        }
    } else {
        logger.error("SubscriptionTask aborted with exception:", ex);
    }
}

// SubscriptionTask.closeConnection()
void closeConnection() {
    if (connection != null) {
        logger.trace("Closing connection");
        try {
            connection.close();
        } catch (Exception e) {
            logger.warn("Error closing subscription connection", e);
        }
        connection = null;
    }
}
```
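To make this close / sleep / retry cycle concrete, here is a minimal, hypothetical model in plain Java. It is not Spring's code: the Connection interface, the Supplier-based factory and the maxAttempts bound are assumptions for the demo (Spring keeps retrying while the container is running, and only for RedisConnectionFailureException). What it shows is that when obtaining the connection itself throws, the connection field is never assigned, so closeConnection() has nothing to close.

```java
import java.util.function.Supplier;

/**
 * Hypothetical, simplified model of SubscriptionTask.run() plus
 * handleSubscriptionException(): get a connection; on failure close, sleep, retry.
 * maxAttempts exists only so the demo terminates.
 */
final class SubscriptionRetryModel {

    /** Stand-in for RedisConnection; close() declares no checked exception. */
    interface Connection extends AutoCloseable {
        @Override
        void close();
    }

    private final Supplier<Connection> connectionFactory;
    private final long recoveryIntervalMs;
    private Connection connection; // mirrors SubscriptionTask.connection
    int attempts;
    int closesPerformed;

    SubscriptionRetryModel(Supplier<Connection> connectionFactory, long recoveryIntervalMs) {
        this.connectionFactory = connectionFactory;
        this.recoveryIntervalMs = recoveryIntervalMs;
    }

    /** Returns true once a connection is obtained, false after maxAttempts failures. */
    boolean listen(int maxAttempts) throws InterruptedException {
        for (int i = 0; i < maxAttempts; i++) {
            attempts++;
            try {
                // If get() throws, the assignment never happens and connection stays null.
                connection = connectionFactory.get();
                return true;
            } catch (RuntimeException ex) {
                closeConnection();                // nothing to close: connection is null
                Thread.sleep(recoveryIntervalMs); // like sleepBeforeRecoveryAttempt()
            }
        }
        return false;
    }

    private void closeConnection() {
        if (connection != null) {
            connection.close();
            closesPerformed++;
            connection = null;
        }
    }
}
```

With a factory that always throws, listen(3) returns false after three attempts and closesPerformed stays at 0: the retries run, but there is never anything for the close step to act on.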
Spring Session cleanup task
The logic of cleanExpiredSessions:
- It reads the expired sessions and deletes them; under the hood this goes through RedisTemplate's execute method.
```java
public void cleanExpiredSessions() {
    long now = System.currentTimeMillis();
    long prevMin = roundDownMinute(now);
    if (logger.isDebugEnabled()) {
        logger.debug("Cleaning up sessions expiring at " + new Date(prevMin));
    }
    String expirationKey = getExpirationKey(prevMin);
    // 1. Query the sessions recorded under this expiration key
    Set<Object> sessionsToExpire = this.redis.boundSetOps(expirationKey).members();
    // 2. Delete the expiration key
    this.redis.delete(expirationKey);
    for (Object session : sessionsToExpire) {
        String sessionKey = getSessionKey((String) session);
        // 3. Touch each session key so that expired keys are removed promptly (Redis expires keys lazily)
        touch(sessionKey);
    }
}
```
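The minute bucket that cleanExpiredSessions reads can be sketched with plain arithmetic. This is a simplification under stated assumptions: Spring Session itself rounds down with a Calendar rather than a modulo, and the key format below assumes the default "spring:session:" namespace, so check it against the version you run.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of how a cleanup run picks its expiration bucket. */
final class ExpirationBuckets {

    // Assumption: default Spring Session namespace.
    static final String NAMESPACE = "spring:session:";
    private static final long MINUTE_MS = TimeUnit.MINUTES.toMillis(1);

    private ExpirationBuckets() {
    }

    /** Drops seconds and milliseconds from an epoch-millis timestamp. */
    static long roundDownMinute(long epochMillis) {
        return epochMillis - Math.floorMod(epochMillis, MINUTE_MS);
    }

    /** Assumed key layout: namespace + "expirations:" + minute bucket. */
    static String expirationKey(long minuteBucket) {
        return NAMESPACE + "expirations:" + minuteBucket;
    }
}
```

For example, roundDownMinute(125_000L) yields 120_000L, and that bucket maps to the key "spring:session:expirations:120000". Every run of the job therefore issues at least one SMEMBERS and one DEL, each of which needs a connection.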
The execute method:
- The cleanup task runs through RedisTemplate.execute, which always calls RedisConnectionUtils.releaseConnection in its finally block to release the connection.
```java
// RedisTemplate
public <T> T execute(RedisCallback<T> action, boolean exposeConnection, boolean pipeline) {
    RedisConnectionFactory factory = getRequiredConnectionFactory();
    RedisConnection conn = null;
    try {
        // 1. Obtain a connection
        if (enableTransactionSupport) {
            // only bind resources in case of potential transaction synchronization
            conn = RedisConnectionUtils.bindConnection(factory, enableTransactionSupport);
        } else {
            conn = RedisConnectionUtils.getConnection(factory);
        }
        boolean existingConnection = TransactionSynchronizationManager.hasResource(factory);
        RedisConnection connToUse = preProcessConnection(conn, existingConnection);
        boolean pipelineStatus = connToUse.isPipelined();
        if (pipeline && !pipelineStatus) {
            connToUse.openPipeline();
        }
        RedisConnection connToExpose = (exposeConnection ? connToUse : createRedisConnectionProxy(connToUse));
        T result = action.doInRedis(connToExpose);
        // close pipeline
        if (pipeline && !pipelineStatus) {
            connToUse.closePipeline();
        }
        // TODO: any other connection processing?
        return postProcessResult(result, connToUse, existingConnection);
    } finally {
        // 2. Always release the connection
        RedisConnectionUtils.releaseConnection(conn, factory, enableTransactionSupport);
    }
}

// RedisConnectionUtils
public static RedisConnection doGetConnection(RedisConnectionFactory factory, boolean allowCreate, boolean bind,
        boolean transactionSupport) {
    // ... other code omitted
    RedisConnection conn = factory.getConnection();
    // ... other code omitted
    return conn;
}
```
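The shape that matters here is "acquire inside try, release in finally". A minimal, hypothetical model (not RedisTemplate itself; TemplateModel and its fields are made up for the demo) shows what the finally block receives when acquisition throws: the variable is still null, so there is nothing to release. To my understanding Spring's releaseConnection likewise simply returns for a null connection.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical model of the acquire / finally-release shape of RedisTemplate.execute. */
final class TemplateModel<C> {

    private final Supplier<C> factory;
    boolean releaseCalled;
    C lastReleased; // what the finally block was handed

    TemplateModel(Supplier<C> factory) {
        this.factory = factory;
    }

    <T> T execute(Function<C, T> action) {
        C conn = null;
        try {
            conn = factory.get(); // may throw before conn is assigned
            return action.apply(conn);
        } finally {
            release(conn);        // receives null when acquisition failed
        }
    }

    private void release(C conn) {
        releaseCalled = true;
        lastReleased = conn;
        // With conn == null there is no handle to close, whatever the factory opened internally.
    }
}
```

If the factory throws, execute still runs release, but release(null) can do nothing about a socket the factory opened before failing.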
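Judging from the source, both paths (handleSubscriptionException and RedisConnectionUtils.releaseConnection) are designed to close or release the connection when an exception occurs. So why were the connections never actually closed?
Stepping through both exception-handling paths in a debugger, I found that the connection object was null in each case. Since there was no connection object at all, none of the code meant to close it could ever run.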
[Figure: debugging handleSubscriptionException.closeConnection()]
[Figure: debugging RedisConnectionUtils.releaseConnection()]
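How Spring obtains Lettuce connections
- Spring uses Lettuce as its Redis client by default and shares a single native connection (shared connection mode).
- Each time a connection is requested, it checks whether a shared connection already exists and creates a new one if not.
- Debugging showed that right after creating the connection, Lettuce fetches the server's command details (COMMAND), and this step fails with UnsupportedOperationException. Because the code below only swallows RedisCommandExecutionException, that exception propagates upward, so connection is never assigned and stays null.
- The TCP connection to Redis, however, has already been opened at that point, and no object that could close it is ever returned. Every request therefore creates a new connection, the half-initialized ones are never closed, and they accumulate into a leak.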
```java
// LettuceConnectionFactory
@Override
public RedisClusterConnection getClusterConnection() {
    if (!isClusterAware()) {
        throw new InvalidDataAccessApiUsageException("Cluster is not configured!");
    }
    RedisClusterClient clusterClient = (RedisClusterClient) client;
    return getShareNativeConnection()
            ? new LettuceClusterConnection(
                    (StatefulRedisClusterConnection<byte[], byte[]>) getOrCreateSharedConnection().getConnection(),
                    connectionProvider, clusterClient, clusterCommandExecutor, clientConfiguration.getCommandTimeout())
            : new LettuceClusterConnection(null, connectionProvider, clusterClient, clusterCommandExecutor,
                    clientConfiguration.getCommandTimeout());
}

// LettuceConnectionFactory.SharedConnection
@Nullable
StatefulConnection<E, E> getConnection() {
    synchronized (this.connectionMonitor) {
        if (this.connection == null) {
            // if getNativeConnection() throws, this.connection stays null
            this.connection = getNativeConnection();
        }
        if (getValidateConnection()) {
            validateConnection();
        }
        return this.connection;
    }
}

// Lettuce RedisClusterClient
private <K, V> CompletableFuture<StatefulRedisClusterConnection<K, V>> connectClusterAsync(RedisCodec<K, V> codec) {
    // ... omitted
    return connectionMono
            .flatMap(c -> c.reactive().command().collectList() // fetch the command details supported by the server
                    .map(CommandDetailParser::parse) //
                    .doOnNext(detail -> c.setState(new RedisState(detail))) //
                    .doOnError(e -> c.setState(new RedisState(Collections.emptyList()))).then(Mono.just(c))
                    // only RedisCommandExecutionException is recovered; any other error fails the connect
                    .onErrorResume(RedisCommandExecutionException.class, e -> Mono.just(c)))
            .doOnNext(
                    c -> connection.registerCloseables(closeableResources, clusterWriter, pooledClusterConnectionProvider))
            .map(it -> (StatefulRedisClusterConnection<K, V>) it).toFuture();
}
```
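The sequence described in the list above can be reproduced with a small, hypothetical model in plain Java (no Lettuce involved; every name here is made up). Conn counts "open sockets", connect() opens one and then runs a post-connect step that fails the same way the COMMAND parsing did (adding to an immutable list), and getConnection() caches the connection only on success, like SharedConnection.getConnection().

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of a connect that fails after the socket is already open. */
final class HalfOpenLeakModel {

    /** Number of "sockets" opened and not yet closed. */
    final AtomicInteger openSockets = new AtomicInteger();

    /** Mirrors SharedConnection.connection: assigned only when connect() returns. */
    private Conn shared;

    final class Conn implements AutoCloseable {
        Conn() {
            openSockets.incrementAndGet(); // models the TCP connect succeeding
        }

        @Override
        public void close() {
            openSockets.decrementAndGet();
        }
    }

    Conn getConnection() {
        if (shared == null) {
            shared = connect(); // throws, so shared stays null and the next call reconnects
        }
        return shared;
    }

    private Conn connect() {
        Conn c = new Conn();   // the socket is open from here on
        loadCommandDetails();  // fails, and nothing ever closes c
        return c;
    }

    /** Stand-in for parsing the COMMAND reply into a list that turns out to be immutable. */
    private void loadCommandDetails() {
        List<String> details = Collections.emptyList();
        details.add("command-detail"); // throws UnsupportedOperationException
    }
}
```

Calling getConnection() five times produces five UnsupportedOperationExceptions and leaves openSockets at 5, which is the same pattern as the steadily growing netstat count.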
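Conclusion
Root cause
The root cause is a compatibility bug in Lettuce 5.1.8:
- In this version, Lettuce queries the server's supported commands (COMMAND) immediately after creating a connection, and while parsing the reply it tries to modify an immutable collection, which throws UnsupportedOperationException.
- This exception prevents the connection object from being initialized, so Spring can neither manage nor release it.
- The company had recently upgraded its Redis cluster to 7.x, while this service still ran the older Spring Boot 2.1.21 (which bundles Lettuce 5.1.8); the incompatibility between the two triggered the bug.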
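The bottom frames of the stack trace (AbstractList.add at lines 108 and 148) are what the JDK does for any list that extends AbstractList without overriding add. The trace doesn't show which list Lettuce actually used; Collections.emptyList() and Arrays.asList are simply two JDK lists that behave this way, used here as a minimal reproduction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Minimal reproduction of the UnsupportedOperationException at the bottom of the trace. */
final class ImmutableListAdd {

    private ImmutableListAdd() {
    }

    /** Returns true if adding an element to the list throws UnsupportedOperationException. */
    static boolean addFails(List<String> list) {
        try {
            list.add("element");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addFails(Collections.emptyList())); // true
        System.out.println(addFails(new ArrayList<>()));       // false
    }
}
```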
Solution
Upgrading Spring Boot to 2.3.11.RELEASE (whose managed Lettuce version fixes this problem) and redeploying brought the connection count back to normal.
Reference: https://blog.gitcode.com/00e88817cf6867be0a9fce2b1efdfcc8.html
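Summary and reflections
Summary
The underlying problem is a bug in Lettuce 5.1.8:
- When handling certain Redis command replies, Lettuce tries to modify an immutable list structure, which is a design flaw.
The upgraded Lettuce:
- fixes the attempt in ArrayOutput to modify an immutable list
- no longer fetches the command list from Redis eagerly when creating a connection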
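Reflections
- Framework versions must be kept in step with the middleware they talk to: when Redis is upgraded, the client libraries need to be checked and upgraded too.
- When we build foundational services as capability providers, we have to clean up after ourselves rather than just propagating exceptions upward. From Spring Data Redis's point of view, once Lettuce threw a connection exception, it was reasonable to treat the connection as never established; it is the layer that opened the socket that should close it.
- In general, prefer using a connection pool.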
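The "clean up before rethrowing" point can be illustrated with a small, hypothetical helper (not Lettuce's actual fix; ConnectSafely and openAndInit are made-up names). It opens a resource, runs an initialization step, and closes the resource if that step fails before rethrowing, so a caller that never receives the object cannot leak it.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical helper: never let a half-initialized resource escape without closing it. */
final class ConnectSafely {

    private ConnectSafely() {
    }

    /**
     * Opens a resource and initializes it. If initialization fails, the resource
     * is closed before the original exception is rethrown.
     */
    static <C extends AutoCloseable> C openAndInit(Supplier<C> opener, Consumer<? super C> init) {
        C resource = opener.get();
        try {
            init.accept(resource);
            return resource;
        } catch (RuntimeException | Error e) {
            try {
                resource.close();
            } catch (Exception closeFailure) {
                e.addSuppressed(closeFailure); // keep the original failure as the primary one
            }
            throw e;
        }
    }
}
```

In the scenario from this article, the init step would be the post-connect COMMAND lookup: the exception still reaches Spring, but the socket is closed instead of piling up on the Redis server.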