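Preface
- Background
Every Jenkins deployment caused a few seconds of failed API calls. The logs showed that traffic was still being routed to a service instance that had already gone offline. Even after the restart script was changed to call the Nacos instance-deregistration API before stopping the application, the problem still appeared briefly. The project is built on Spring Cloud Alibaba, with microservices calling each other through OpenFeign, so I suspected the LoadBalancer cache.
- Dependency versions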
xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.alibaba.cloud</groupId>
            <artifactId>spring-cloud-alibaba-dependencies</artifactId>
            <version>2021.0.1.0</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-dependencies</artifactId>
            <version>2.6.3</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>2021.0.1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>com.alibaba.cloud</groupId>
        <artifactId>spring-cloud-starter-alibaba-nacos-discovery</artifactId>
        <exclusions>
            <exclusion>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-starter-netflix-ribbon</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-openfeign</artifactId>
        <version>3.1.1</version>
    </dependency>
</dependencies>
- LoadBalancer configuration
yaml
spring:
  cloud:
    loadbalancer:
      # Requires the Spring Retry dependency
      retry:
        enabled: true
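How Spring Cloud LoadBalancer caching works
- On application startup, the Caffeine first-level cache is assembled first. It caches service instances, which reduces load on the registry and improves performance.
As the figure above shows, the first-level cache can be turned off by setting spring.cloud.loadbalancer.cache.enabled to false; it is enabled by default.
- The first time Feign fetches service instances from the LoadBalancer, it triggers the assembly of the ServiceInstanceListSupplier.
Service instances are then read from the first-level cache: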
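Solutions
Based on the source analysis above, the root cause is that after an application goes offline in Nacos, the LoadBalancer's first-level cache does not remove the offline instance. There are three ways to deal with this:
- After the restart script deregisters the instance from Nacos, wait for the first-level cache to expire (35s by default) before restarting the application
- Disable the first-level cache (not recommended)
- Listen for Nacos offline events and remove the instance manually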
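The difference between waiting for the cache to expire and evicting an entry explicitly can be sketched with a small TTL cache. This is only an illustration under stated assumptions: it is not Spring Cloud LoadBalancer's actual implementation (which is backed by Caffeine), and the TtlCache class, the fake clock, the service name and the instance addresses are all hypothetical. Only the 35-second TTL comes from the default mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical TTL cache; a stand-in for the first-level instance cache. */
class TtlCache {

    private static final class Entry {
        final List<String> instances;
        final long expiresAtMillis;

        Entry(List<String> instances, long expiresAtMillis) {
            this.instances = instances;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so the demo does not have to sleep

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached list while it is fresh; otherwise reloads it from the registry. */
    List<String> get(String serviceId, Supplier<List<String>> registry) {
        long now = clock.getAsLong();
        Entry entry = entries.get(serviceId);
        if (entry == null || now >= entry.expiresAtMillis) {
            entry = new Entry(registry.get(), now + ttlMillis);
            entries.put(serviceId, entry);
        }
        return entry.instances;
    }

    /** Drops the entry so that the next get() goes back to the registry. */
    void evict(String serviceId) {
        entries.remove(serviceId);
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, in milliseconds
        List<String> registry = new ArrayList<>(Arrays.asList("10.0.0.1:8080", "10.0.0.2:8080"));
        TtlCache cache = new TtlCache(35_000L, () -> now[0]);

        // The first call loads both instances from the registry.
        cache.get("product-service", () -> new ArrayList<>(registry));

        // One instance is deregistered 5 seconds later, well within the TTL.
        registry.remove("10.0.0.1:8080");
        now[0] = 5_000L;
        System.out.println("before evict: " + cache.get("product-service", () -> new ArrayList<>(registry)));

        // Evicting the entry forces a reload, so the offline instance disappears.
        cache.evict("product-service");
        System.out.println("after evict:  " + cache.get("product-service", () -> new ArrayList<>(registry)));
    }
}
```

Without the evict call, the stale entry would keep being returned until the clock reached the 35-second mark, which is the window in which requests fail during a deployment.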
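Implementation
- Chosen approach
Listen for Nacos offline events and remove the instance manually
- Code implementation
- Idea
Subscribe in Nacos to the names (serviceName) of the services whose cache entries need to be evicted. When an application that is going offline calls the Nacos instance-deregistration API, the Nacos server triggers the custom subscription callback.
- Nacos subscription source analysis
- Idea
As the figure above shows, by default only the current service's own name is subscribed. This is why the following code did not trigger a callback when another application took itself offline.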
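Why a subscription to the application's own name never fires for a downstream service can be shown with a toy in-memory registry. This is a sketch only: ToyRegistry, its methods and the service names are hypothetical and do not reflect the Nacos client's real classes. The only assumption carried over from the text is that callbacks are keyed by service name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical in-memory registry that notifies subscribers per service name. */
class ToyRegistry {

    private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();

    /** Registers a callback that fires only for changes to the given service name. */
    void subscribe(String serviceName, Consumer<String> listener) {
        listeners.computeIfAbsent(serviceName, key -> new ArrayList<>()).add(listener);
    }

    /** Simulates an instance of the given service going offline. */
    void deregister(String serviceName) {
        List<Consumer<String>> subscribed = listeners.getOrDefault(serviceName, Collections.emptyList());
        for (Consumer<String> listener : subscribed) {
            listener.accept(serviceName);
        }
    }

    public static void main(String[] args) {
        ToyRegistry registry = new ToyRegistry();

        // Local instance cache of the calling application (order-service).
        Map<String, String> cache = new HashMap<>();
        cache.put("product-service", "10.0.0.1:8080");

        // Default behaviour: only the application's own name is subscribed.
        registry.subscribe("order-service", name -> cache.remove(name));
        registry.deregister("product-service");
        System.out.println("own-name subscription only, still cached: " + cache.containsKey("product-service"));

        // Explicitly subscribing to the downstream service makes the callback fire.
        registry.subscribe("product-service", name -> cache.remove(name));
        registry.deregister("product-service");
        System.out.println("downstream subscribed, still cached: " + cache.containsKey("product-service"));
    }
}
```

The same reasoning motivates subscribing explicitly to each downstream service whose cached instances should be evicted.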
- Write the Nacos subscription for the specified service and the logic that evicts its cached instances
java
package com.xxx.xxx.feign.listener;

import com.alibaba.cloud.nacos.NacosDiscoveryProperties;
import com.alibaba.cloud.nacos.NacosServiceManager;
import com.alibaba.nacos.api.naming.NamingService;
import com.alibaba.nacos.api.naming.listener.NamingEvent;
import lombok.SneakyThrows;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.boot.autoconfigure.AutoConfigureAfter;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.cache.Cache;
import org.springframework.cloud.loadbalancer.cache.LoadBalancerCacheManager;
import org.springframework.cloud.loadbalancer.config.LoadBalancerCacheAutoConfiguration;
import org.springframework.cloud.loadbalancer.core.CachingServiceInstanceListSupplier;
import org.springframework.context.annotation.Configuration;

import javax.annotation.Resource;
import java.util.Arrays;

/**
 * @description Nacos application listener
 * @date 2024/7/29
 */
@Configuration
// The cache is enabled by default, so also match when the property is not set explicitly.
@ConditionalOnProperty(name = "spring.cloud.loadbalancer.cache.enabled", havingValue = "true", matchIfMissing = true)
// Order after the auto-configuration that creates the cache manager, not after the properties class.
@AutoConfigureAfter(LoadBalancerCacheAutoConfiguration.class)
public class NacosInstanceListener implements InitializingBean {

    @Resource
    private NacosServiceManager nacosServiceManager;

    @Resource
    private NacosDiscoveryProperties properties;

    @Resource
    private LoadBalancerCacheManager caffeineLoadBalancerCacheManager;

    @Override
    @SneakyThrows
    public void afterPropertiesSet() {
        NamingService namingService = nacosServiceManager.getNamingService(properties.getNacosProperties());
        // Subscribe to the downstream service whose cached instances should be evicted on change.
        namingService.subscribe("xxx-product-xxx", properties.getGroup(), Arrays.asList(properties.getClusterName()), event -> {
            if (event instanceof NamingEvent) {
                NamingEvent namingEvent = (NamingEvent) event;
                String svrName = namingEvent.getServiceName();
                Cache cache = caffeineLoadBalancerCacheManager.getCache(CachingServiceInstanceListSupplier.SERVICE_INSTANCE_CACHE_NAME);
                if (cache != null) {
                    // Drop the cached instance list so the next call reloads it from Nacos.
                    cache.evict(svrName);
                }
                // Print the event to confirm that the callback fired.
                System.out.println(event);
            }
        });
    }
}
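- Have the service that is going offline call the Nacos instance-deregistration API and observe the result
As the figure above shows, the callback that evicts the cached service instances was triggered successfully. Because there is a delay between the Nacos API call and the execution of the code above, the restart script should wait about 1 second after the Nacos API call succeeds before stopping the service.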
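The deregister-then-wait step could look roughly like the sketch below. It assumes the Nacos v1 Open API endpoint for deregistering an instance (DELETE /nacos/v1/ns/instance with serviceName, ip and port); the NacosDeregister class, the server address, the IP and the port are placeholders, and the original restart script is not shown in this post, so this is not a reproduction of it.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

/** Hypothetical helper for a restart script: deregister from Nacos, then pause briefly. */
class NacosDeregister {

    /** URL-encodes a query parameter value as UTF-8. */
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds the URL of the assumed v1 endpoint for deregistering an instance. */
    static String deregisterUrl(String serverAddr, String serviceName, String ip, int port) {
        return "http://" + serverAddr + "/nacos/v1/ns/instance"
                + "?serviceName=" + enc(serviceName)
                + "&ip=" + enc(ip)
                + "&port=" + port;
    }

    /** Sends the DELETE request and returns the HTTP status code. */
    static int deregister(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("DELETE");
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws Exception {
        // All values below are placeholders for illustration.
        String url = deregisterUrl("127.0.0.1:8848", "xxx-product-xxx", "10.0.0.1", 8080);
        int status = deregister(url);
        if (status != 200) {
            throw new IllegalStateException("Deregistration failed with HTTP status " + status);
        }
        // Give subscribers about a second to receive the event and evict their caches.
        Thread.sleep(1000);
        // The script can now stop the application.
    }
}
```

If the instance is registered under a non-default group, cluster or namespace, the same endpoint accepts groupName, clusterName and namespaceId query parameters, which would need to be appended to the URL.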