Hands-on: Timeout and Retry Mechanisms of the Spring Cloud Load-Balancing Components loadbalancer and Ribbon in Microservices

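I. Overview

1.1 Goals

Service A calls services B1 and B2 (B1 and B2 provide the same service). While B1/B2 is being stopped and redeployed, or when one of B1/B2 has failed:

  • Service A must still be able to call service B normally, so that releases go unnoticed by callers (high availability of service B)
  • Requests from service A must be load balanced, so that no single B node is overloaded (load balancing of service B)
  • The main purpose is to verify the timeout and retry mechanisms of service calls

Note: Nacos is used as the service registration and discovery component.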

1.2 Environment

        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>

        <spring.boot.version>2.2.2.RELEASE</spring.boot.version>
        <spring.cloud.version>Hoxton.SR1</spring.cloud.version>
        <spring.alibaba.version>2.1.0.RELEASE</spring.alibaba.version>

Service consumer: Ribbon has been excluded, and the officially recommended loadbalancer is used instead.

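II. Service call timeout and retry case study

2.1 Service provider: provider-user

Service details as registered in Nacos

Note: two provider-user instances are started: provider-user--3015 and provider-user--4015

Server-side code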
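The provider code itself is not reproduced here. Judging from the log lines later in this article (a `uuid=` entry followed by `sleep= Ns`, with N between 4 and 7), the endpoint appears to sleep for a random number of seconds before answering. The sketch below imitates that behaviour with the JDK's built-in `com.sun.net.httpserver`. It is a hypothetical stand-in, not the article's Spring controller: `SlowProviderSketch`, `pickSleepSeconds`, and the response body are all assumptions.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for the provider-user endpoint (assumption, not the original code).
final class SlowProviderSketch {

    // Picks a delay between 4 and 7 seconds inclusive, the range seen in the logs.
    static int pickSleepSeconds() {
        return ThreadLocalRandom.current().nextInt(4, 8);
    }

    // Handles one request: log a uuid and the delay, sleep, then answer 200.
    static void handle(HttpExchange exchange) throws IOException {
        String uuid = UUID.randomUUID().toString();
        int sleepSeconds = pickSleepSeconds();
        System.out.println("uuid= " + uuid);
        System.out.println("sleep= " + sleepSeconds + "s");
        try {
            Thread.sleep(sleepSeconds * 1000L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        byte[] body = ("ok " + uuid).getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 3015 mirrors one of the article's instances; pass 4015 for the other.
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 3015;
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/user/api/v1/retry", SlowProviderSketch::handle);
        server.setExecutor(Executors.newCachedThreadPool());
        server.start();
    }
}
```

Because every response takes at least 4 seconds, any client read timeout below that always fires, which makes the endpoint convenient for observing retries.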

2.2 Service consumer: provider-order

The retry endpoint uses the default configuration: PoolingHttpClientConnectionManager

The retry2 endpoint uses a custom configuration: RestTemplate

Configuration

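2.3 Load-balancing test

Start one consumer instance, provider-order--3017.

Call retry and retry2 on provider-order--3017 several times. The logs confirm that the default round-robin strategy is used to call provider-user--3015 and provider-user--4015.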
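Round-robin selection is easy to picture in a few lines. The following is a minimal sketch of the idea, not the Spring Cloud LoadBalancer or Ribbon implementation: `RoundRobinSketch` and `choose` are hypothetical names, and the instance list is hard-coded where a real client would obtain it from Nacos.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative round-robin chooser (assumption: a fixed, non-empty instance list).
final class RoundRobinSketch {
    private final List<String> instances;
    private final AtomicInteger position = new AtomicInteger();

    RoundRobinSketch(List<String> instances) {
        this.instances = instances;
    }

    // Returns the next instance in order, wrapping around at the end of the list.
    String choose() {
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch(
                Arrays.asList("provider-user--3015", "provider-user--4015"));
        for (int i = 0; i < 4; i++) {
            System.out.println(lb.choose());
        }
    }
}
```

With two instances the calls alternate between 3015 and 4015, which is the pattern to look for in the consumer logs.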

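2.4 High-availability test

Stop one of the instances, provider-user-4015, and confirm that when round-robin selects the stopped instance, the request is automatically and successfully retried on the instance that is still running.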

2.5 ribbon.restclient.enabled

1. When ribbon.restclient.enabled=true is not set

provider-order--3017: the /retry endpoint times out with an error and does not retry.

    /** todo: times out with an error after 5 seconds; uses the shared PoolingHttpClientConnectionManager
     * 2024-08-05 20:40:53.150[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
     * 2024-08-05 20:40:53.155[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
     */

provider-order--3017: the /retry2 endpoint does not fail even at 7 seconds, and it does not retry.

    @GetMapping("/retry2") // todo: retry2 does not fail even at 7 seconds; uses the separately configured RestTemplate; 2024-08-05 20:43:57.405[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String


         *2024-08-05 20:29:41.267[] user [http-nio-0.0.0.0-3015-exec-9] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
         * 2024-08-05 20:29:51.358[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:30:23.498[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:30:31.393[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:30:38.764[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
         * 2024-08-05 20:31:00.140[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:31:07.552[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:31:15.993[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:31:24.517[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 6s
         *

2. When ribbon.restclient.enabled=true is set, there are three cases

    * Case 1: only one provider-user instance is running
     * With ribbon.restclient.enabled=true, retry still times out without retrying, while retry2 is called 6 times in total (timeout shown in the log: -240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000)
     * todo: the log contains "RestClient sending new Request(GET" 6 times in total: com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
     *
     * 2024-08-05 21:04:15.198[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.servlet.DispatcherServlet-91- GET "/order/api/v1/retry2?name=String", parameters={masked}
     * 2024-08-05 21:04:15.201[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG o.s.w.s.m.m.a.RequestMappingHandlerMapping-412- Mapped to com.zxx.study.cloud.order.controller.RestfulApiController#retry2(String)
     * 2024-08-05 21:04:15.206[] order [http-nio-0.0.0.0-3017-exec-7] INFO  c.z.s.cloud.order.controller.RestfulApiController-255- name=String
     * 2024-08-05 21:04:15.207[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String
     * 2024-08-05 21:04:15.209[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- Accept=[application/json, application/*+json]
     * 2024-08-05 21:04:15.210[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.ZoneAwareLoadBalancer-112- Zone aware logic disabled or there is only one zone
     * 2024-08-05 21:04:15.211[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.LoadBalancerContext-551- using LB returned Server: 192.168.1.4:3015 for request: http://provider-user/user/api/v1/retry?name=String
     * 2024-08-05 21:04:15.212[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
     * 2024-08-05 21:04:15.213[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.http4.MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000
     *
     * todo: only one provider-user instance is running
     * todo: the first call (sleep 5s) timed out and was followed by 5 retries, 6 calls in total; MaxAutoRetries: 3 + MaxAutoRetriesNextServer: 2
     * 2024-08-05 21:04:15.227[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= eaa55510-140d-4f5d-bf23-8adf9a620646
     * 2024-08-05 21:04:15.228[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
     * 2024-08-05 21:04:18.264[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ba0e6cc8-4cf6-41fe-91eb-42ec3d2e60d2
     * 2024-08-05 21:04:18.265[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     * 2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ff3fba7e-4798-44b8-a25e-b84e75fb828a
     * 2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= f261909b-d752-42ac-b1f9-47a1747481cc
     * 2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     * 2024-08-05 21:04:27.386[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= fcb7d6f7-a997-4629-9901-ebf894758a02
     * 2024-08-05 21:04:27.387[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:04:30.409[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= e6f13159-5111-4f0a-babe-3f9b6d8eff61
     * 2024-08-05 21:04:30.410[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     *
   * Case 2: only one provider-user instance is running
     * With ribbon.restclient.enabled=true, retry still times out without retrying, while retry2 is called 3 times in total (timeout shown in the log: MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 5000)
     * todo: the log contains "RestClient sending new Request(GET" 3 times in total: com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
     *
     * todo: retry still times out without retrying.
     * 2024-08-05 21:24:05.636[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
     * 2024-08-05 21:24:05.637[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
     * 2024-08-05 21:24:05.643[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] <--- ERROR SocketTimeoutException: Read timed out (5085ms)
     * 2024-08-05 21:24:05.648[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] java.net.SocketTimeoutException: Read timed out
     *
     * todo: the first call (sleep 6s) timed out and was followed by 2 retries, 3 calls in total; MaxAutoRetries: 2 + MaxAutoRetriesNextServer: 1
     * 2024-08-05 21:18:22.255[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= dc2edb32-25a6-465c-9577-07b59388670f
     * 2024-08-05 21:18:22.256[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 05c593da-6b35-4f7b-8df2-15e9dab7b391
     * 2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 07fc8c11-5d77-4ffe-a81c-9851f68a647e
     * 2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     *
   * Case 3: two provider-user instances are running
     *
     * The log contains com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) 5 times in total
     * 5 calls in total; MaxAutoRetries: 2 + MaxAutoRetriesNextServer: 1
     * That is, the first call to provider-user-4015 (sleep 5s) timed out and was retried twice on provider-user-4015, then two more calls went to provider-user-3015; 5 calls in total.
     *  provider-user-3015: 2 calls
     *  2024-08-05 21:34:14.407[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 38c1f932-e99f-49c1-889d-aa79af316089
     * 2024-08-05 21:34:14.409[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:34:19.456[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 410f7b33-46ac-4102-a0ff-3c19c18d2b52
     * 2024-08-05 21:34:19.457[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     *
     *  provider-user-4015: 3 calls
     *  2024-08-05 21:33:59.263[] user [http-nio-0.0.0.0-4015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 97936999-48f0-4257-9fce-7a78081afa4b
     * 2024-08-05 21:33:59.264[] user [http-nio-0.0.0.0-4015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
     * 2024-08-05 21:34:04.308[] user [http-nio-0.0.0.0-4015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ac766122-6a7a-4535-b1fb-928e3a9a5f7f
     * 2024-08-05 21:34:04.309[] user [http-nio-0.0.0.0-4015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 7s
     * 2024-08-05 21:34:09.338[] user [http-nio-0.0.0.0-4015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 02283f9e-fddb-480a-a9d2-68eb3da988ac
     * 2024-08-05 21:34:09.339[] user [http-nio-0.0.0.0-4015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
     * */
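One way to read cases 2 and 3 is to replay the provider delays from the logs against a retry budget. The sketch below is a model built on assumptions, not Ribbon's code. It assumes the commonly cited rule that each server is tried up to MaxAutoRetries + 1 times and that at most MaxAutoRetriesNextServer further servers are tried, that the read timeout is 5 seconds (suggested by the roughly 5-second spacing between attempts in those logs), and that retrying stops at the first response that arrives in time. `RetryBudgetSketch`, `maxAttempts`, and `replay` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative replay of logged provider delays against an assumed retry budget.
final class RetryBudgetSketch {

    // Upper bound on attempts under the assumed rule.
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    // Replays the delays in order and returns the server index used for each attempt.
    // A delay equal to the timeout counts as a timeout, since the reply arrives too late.
    static List<Integer> replay(int[] sleepSeconds, int readTimeoutSeconds,
                                int maxAutoRetries, int maxAutoRetriesNextServer,
                                int serverCount) {
        List<Integer> serversTried = new ArrayList<>();
        int attempt = 0;
        for (int hop = 0; hop <= maxAutoRetriesNextServer; hop++) {
            int server = hop % serverCount; // assumed round-robin move to the next server
            for (int same = 0; same <= maxAutoRetries; same++) {
                if (attempt >= sleepSeconds.length) {
                    return serversTried; // no more logged delays to replay
                }
                serversTried.add(server);
                boolean timedOut = sleepSeconds[attempt++] >= readTimeoutSeconds;
                if (!timedOut) {
                    return serversTried; // first success ends the retry sequence
                }
            }
        }
        return serversTried;
    }

    public static void main(String[] args) {
        System.out.println(maxAttempts(2, 1));
        System.out.println(replay(new int[] {6, 6, 4}, 5, 2, 1, 1));       // case 2 delays
        System.out.println(replay(new int[] {5, 7, 5, 6, 4}, 5, 2, 1, 2)); // case 3 delays
    }
}
```

Under these assumptions the budget for MaxAutoRetries: 2 and MaxAutoRetriesNextServer: 1 is 6 attempts, and the replay stops earlier in both cases because an attempt with sleep= 4s returns inside the timeout: after 3 calls in case 2, and after 5 calls (3 on the first instance, 2 on the second) in case 3. Case 1 is not modeled here.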

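2.6 Summary

1. Use the retry mechanism with caution, even for GET; for other methods, retries are best avoided. With OkToRetryOnAllOperations: false, retrying on any error applies only to GET; with true it also applies to POST, PUT, DELETE, and so on.

2. If retries are really needed, configure them per service, and make sure the endpoints are idempotent.

3. In the tests above, ribbon.restclient.enabled=true acted as the switch that turned retries on for the RestTemplate-based retry2 endpoint; the retry endpoint did not retry with either setting.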

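III. FeignLoadBalancer analysis

Tracing the source code shows that FeignLoadBalancer configures the retry policy. If ribbon.OkToRetryOnAllOperations is true, requests of any method are retried. If it is false, GET requests are still retried, while non-GET requests are retried only on connection errors.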

@Override
public RequestSpecificRetryHandler getRequestSpecificRetryHandler (
        RibbonRequest request, IClientConfig requestConfig){
    // If OkToRetryOnAllOperations is true, retry for any request method and any exception
    if (this.ribbon.isOkToRetryOnAllOperations()) {
        return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(),
                requestConfig);
    }
    // When OkToRetryOnAllOperations is false (the default):
    // non-GET requests are retried only on connection errors
    if (!request.toRequest().method().equals("GET")) {
        return new RequestSpecificRetryHandler(true, false, this.getRetryHandler(),
                requestConfig);
    } else {
        // GET requests are retried in any situation, on any exception
        return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(),
                requestConfig);
    }
}

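From the analysis above, setting ribbon.OkToRetryOnAllOperations=false does not mean that no retries happen: Ribbon still retries GET requests. Our system has no special configuration for Ribbon's retry mechanism, so the default values apply.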
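The branching described above can be condensed into a small decision function. This is a simplified model of the article's description, not the Spring Cloud or Ribbon source: `RetryDecisionSketch`, `Failure`, and `mayRetry` are hypothetical names, and the real handlers consider more conditions than these two failure kinds.

```java
// Simplified model of which failures are eligible for retry (assumption, not library code).
final class RetryDecisionSketch {

    enum Failure { CONNECT_ERROR, READ_TIMEOUT }

    // Returns true when a failed request is eligible for retry under the described rules.
    static boolean mayRetry(String httpMethod, boolean okToRetryOnAllOperations, Failure failure) {
        if (okToRetryOnAllOperations) {
            return true; // any method, any failure
        }
        if ("GET".equals(httpMethod)) {
            return true; // GET is retried on any failure even when the flag is false
        }
        return failure == Failure.CONNECT_ERROR; // non-GET: connection errors only
    }

    public static void main(String[] args) {
        System.out.println(mayRetry("GET", false, Failure.READ_TIMEOUT));
        System.out.println(mayRetry("POST", false, Failure.READ_TIMEOUT));
        System.out.println(mayRetry("POST", false, Failure.CONNECT_ERROR));
        System.out.println(mayRetry("POST", true, Failure.READ_TIMEOUT));
    }
}
```

Eligibility is only half of the picture: whether a retry actually happens also depends on the remaining MaxAutoRetries and MaxAutoRetriesNextServer budget.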

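Ribbon's default retry configuration is as follows:

# Maximum number of retries on the same instance, not counting the first call. Default: 0
ribbon.MaxAutoRetries = 0
# Maximum number of other instances of the same service to retry on, not counting the first instance called. Default: 1
ribbon.MaxAutoRetriesNextServer = 1
# Whether retries are allowed for all operations. Default: false
ribbon.OkToRetryOnAllOperations = false

Because MaxAutoRetriesNextServer defaults to 1 and our import endpoint happens to be a GET request, Ribbon automatically retries once when the business service times out while processing the data.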
