situation
Most people who use a gateway end up writing synchronous calls inside it. This violates Spring Cloud Gateway's conventions and cuts gateway throughput by a large factor. A typical authentication scenario looks like the code below (simplified to keep only the rough logic). Because Spring Cloud Gateway sits on WebFlux, whose underlying threading model is Netty's reactor model, a synchronous call here blocks the Netty event-loop thread and completely defeats the efficiency of the event loop.
I have worked at two companies whose business gateways were both built on Spring Cloud Gateway. One of them has well over ten thousand employees and a business of considerable scale, yet code like the following still showed up, which I found quite surprising.
java
public class AuthFilter implements GatewayFilter, Ordered {

    @Autowired
    AuthService authService;

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        String token = exchange.getRequest().getHeaders().getFirst(AUTH_KEY);
        // Blocking auth call executed directly on a Netty event-loop thread -- this is the anti-pattern
        Response response = authService.checkToken(token);
        if (response.code >= 400) {
            throw new UnAuthorizationException();
        }
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return 0;
    }
}
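A blocking call like the one above is easy to miss in review. One way to catch it during development is Reactor's BlockHound agent (the io.projectreactor.tools:blockhound dependency); the snippet below is only a sketch of how it could be enabled, and the class name GatewayApplication is made up for illustration, not part of the original project.
java
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import reactor.blockhound.BlockHound;

@SpringBootApplication
public class GatewayApplication {
    public static void main(String[] args) {
        // Dev/test only: once installed, any blocking call (e.g. a synchronous
        // HTTP request or Thread.sleep) on a Netty event-loop thread fails fast
        // with a BlockingOperationError instead of silently stalling the loop.
        BlockHound.install();
        SpringApplication.run(GatewayApplication.class, args);
    }
}
Note that on recent JDKs BlockHound may additionally require the -XX:+AllowRedefinitionToAddDeleteMethods JVM flag.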
task
Rework the existing non-idiomatic Spring Cloud Gateway usage into proper WebFlux style, run load tests, and compare the results.
action
Load-test the same endpoint behind three gateway filter implementations: WebFlux client, async, and synchronous.
Preparation
Machine: a Windows PC with an AMD Ryzen 7 3700X (8 cores / 16 threads).
Gateway
JDK version: 21
Gateway pom dependencies
xml
<properties>
    <maven.compiler.source>21</maven.compiler.source>
    <maven.compiler.target>21</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-gateway</artifactId>
        <version>3.1.5</version>
    </dependency>
    <dependency>
        <groupId>com.squareup.okhttp3</groupId>
        <artifactId>okhttp</artifactId>
        <version>4.12.0</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
        <version>2.7.10</version>
    </dependency>
</dependencies>
Backend setup
Also on JDK 21.
pom dependencies
xml
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>3.4.5</version>
</parent>
<properties>
    <maven.compiler.source>21</maven.compiler.source>
    <maven.compiler.target>21</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
</dependencies>
The backend sleeps a random 5-12 ms per request. In practice the auth service and the gateway usually sit in the same data center, often with a cache in front, so latency is low; for simple business logic the check typically finishes within about 10 ms.
java
@SpringBootApplication
@RestController
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }

    @GetMapping("/hello")
    public String hello() {
        try {
            // Simulate an auth check that takes 5-12 ms
            int sleep = ThreadLocalRandom.current().nextInt(5, 13);
            Thread.sleep(sleep);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return "Hello World";
    }
}
Gateway filter with a synchronous call
java
public class SyncFilter implements GatewayFilter, Ordered {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        Request get = new Request.Builder()
                .get()
                .url("http://127.0.0.1:8080/hello")
                .build();
        // Blocking HTTP call executed directly on the Netty event-loop thread
        try (Response response = HttpClientUtil.client.newCall(get).execute()) {
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return chain.filter(exchange);
    }

    @Override
    public int getOrder() {
        return 0;
    }
}
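HttpClientUtil is referenced above but its source is not part of the original post; it is assumed to expose a single shared OkHttpClient. A minimal sketch of what it might look like, with the pool size and timeouts being illustrative values only:
java
import java.util.concurrent.TimeUnit;
import okhttp3.ConnectionPool;
import okhttp3.OkHttpClient;

public class HttpClientUtil {
    // One shared client so connections are pooled and reused across requests
    public static final OkHttpClient client = new OkHttpClient.Builder()
            .connectionPool(new ConnectionPool(200, 5, TimeUnit.MINUTES))
            .connectTimeout(1, TimeUnit.SECONDS)
            .readTimeout(2, TimeUnit.SECONDS)
            .build();
}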
Gateway filter with an async (boundedElastic) call
java
public class AsyncFilter implements GatewayFilter, Ordered {

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        Request get = new Request.Builder()
                .get()
                .url("http://127.0.0.1:8080/hello")
                .build();
        return Mono.fromCallable(() -> {
                    // Still a blocking call, but it now runs on the boundedElastic pool
                    // instead of the Netty event loop
                    try (Response response = HttpClientUtil.client.newCall(get).execute()) {
                        return response.isSuccessful();
                    } catch (IOException e) {
                        return false;
                    }
                })
                .subscribeOn(Schedulers.boundedElastic())
                .flatMap(authResponse -> chain.filter(exchange));
    }

    @Override
    public int getOrder() {
        return 0;
    }
}
Gateway filter with the WebFlux WebClient
java
public class WebfluxFilter implements GatewayFilter, Ordered {

    private final WebClient webClient;

    public WebfluxFilter(WebClient.Builder webClientBuilder) {
        this.webClient = webClientBuilder.baseUrl("http://127.0.0.1:8080").build();
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, GatewayFilterChain chain) {
        // Fully non-blocking call on the shared Netty event loop
        return webClient.get()
                .uri("/hello")
                .retrieve()
                .bodyToMono(String.class)
                .flatMap(authResponse -> chain.filter(exchange));
    }

    @Override
    public int getOrder() {
        return 0;
    }
}
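The original code does not show how these filters are attached to a route, so the following is one possible wiring, assuming the backend listens on 127.0.0.1:8080; the route id and path are made up for illustration, and only one of the three filters would be registered per test run.
java
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.reactive.function.client.WebClient;

@Configuration
public class RouteConfig {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder, WebClient.Builder webClientBuilder) {
        return builder.routes()
                .route("hello-route", r -> r.path("/hello")
                        // Swap in new SyncFilter() or new AsyncFilter() for the other runs
                        .filters(f -> f.filter(new WebfluxFilter(webClientBuilder)))
                        .uri("http://127.0.0.1:8080"))
                .build();
    }
}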
Load-testing tool
The tool used here is wrk. Since wrk cannot run natively on Windows, it is run through Docker; the reference commands are shown below.
bash
docker pull williamyeh/wrk
docker run --rm williamyeh/wrk -t5 -c200 -d10s --latency $url
Running the load tests
Results: direct to the backend

text
  5 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    14.92ms    3.81ms   44.74ms   70.19%
    Req/Sec     2.68k    128.63     3.05k    65.40%
  Latency Distribution
     50%   14.49ms
     75%   17.05ms
     90%   19.84ms
     99%   26.41ms
  133613 requests in 10.07s, 15.95MB read
Requests/sec:  13264.07
Transfer/sec:      1.58MB
Results: synchronous call

text
  5 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   118.64ms   31.65ms  272.09ms   70.07%
    Req/Sec    337.11     42.72    450.00     73.49%
  Latency Distribution
     50%  118.10ms
     75%  138.95ms
     90%  158.75ms
     99%  194.14ms
  16762 requests in 10.06s, 2.03MB read
Requests/sec:   1665.97
Transfer/sec:    206.63KB
Results: async (boundedElastic) call

text
  5 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    36.91ms    7.47ms   79.53ms   68.12%
    Req/Sec     1.08k    108.83     1.43k    65.73%
  Latency Distribution
     50%   36.46ms
     75%   41.73ms
     90%   46.67ms
     99%   56.32ms
  54119 requests in 10.10s, 6.56MB read
Requests/sec:   5358.98
Transfer/sec:    664.68KB
Results: WebFlux WebClient

text
  5 threads and 200 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    34.35ms    6.42ms   66.74ms   68.38%
    Req/Sec     1.17k     95.28     1.40k    69.14%
  Latency Distribution
     50%   33.93ms
     75%   38.43ms
     90%   42.84ms
     99%   51.02ms
  58155 requests in 10.09s, 7.05MB read
Requests/sec:   5761.72
Transfer/sec:    714.77KB
result
The results above make it clear that switching to the async or WebFlux client approach improves gateway throughput several-fold, lowers average latency, and makes better use of the CPU. The gap between the WebFlux client and the async (boundedElastic) approach is small, and the WebFlux client does require more background knowledge and a larger code change, but I still recommend it, for the following reasons:
- Under the hood the WebFlux client runs on the same Netty event loop, so it needs far fewer threads than a boundedElastic scheduler and the gateway pays less thread context-switching overhead.
- The WebFlux client natively supports backpressure, whereas boundedElastic is backed by a thread pool whose default cap is 10 × the number of CPU cores reported by the JVM; under a traffic spike that pool can be saturated, i.e. its threads exhausted (see the sizing sketch after this list).
- The non-blocking model avoids thread switching and queueing, so latency stays lower, especially under high concurrency.
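If the boundedElastic approach is kept anyway, it is worth sizing the pool explicitly instead of relying on the default cap. A minimal sketch, assuming a dedicated scheduler is acceptable; the class name, thread cap, and queue cap below are illustrative values, not something measured in this benchmark:
java
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

public class AuthSchedulers {
    // Dedicated pool for blocking auth calls: at most 64 threads and 1000 queued
    // tasks; beyond that, submissions are rejected instead of growing unbounded.
    public static final Scheduler AUTH_CHECK =
            Schedulers.newBoundedElastic(64, 1000, "auth-check");
}
With this in place, the .subscribeOn(Schedulers.boundedElastic()) call in AsyncFilter would become .subscribeOn(AuthSchedulers.AUTH_CHECK), so a spike rejects excess tasks quickly instead of piling up work behind the gateway.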