Hands-on: timeout and retry mechanisms of the Spring Cloud load-balancing components LoadBalancer and Ribbon in microservices

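I. Overview

1.1 Goal

Service A calls services B1 and B2 (B1 and B2 provide the same service). While B1/B2 is being stopped and redeployed, or when one of them fails:

  • Service A must keep calling service B successfully, so that deployments are transparent to callers (high availability of service B)
  • Requests from service A must be load-balanced so that no single B node is overloaded (load balancing of service B)
  • The main purpose is to verify the timeout and retry mechanisms of service calls

Note: Nacos is used as the service registration and discovery component.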

1.2 Environment

XML
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>

        <spring.boot.version>2.2.2.RELEASE</spring.boot.version>
        <spring.cloud.version>Hoxton.SR1</spring.cloud.version>
        <spring.alibaba.version>2.1.0.RELEASE</spring.alibaba.version>

Service consumer: Ribbon has been excluded, and the officially recommended Spring Cloud LoadBalancer is used instead.

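II. Timeout and retry test case

2.1 Service provider: provider-user

Service details as registered in Nacos

Note: provider-user runs two instances: provider-user--3015 and provider-user--4015

Server-side code

2.2 Service consumer: provider-order

The retry endpoint uses the default configuration: PoolingHttpClientConnectionManager

The retry2 endpoint uses a custom configuration: RestTemplate

Configuration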

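2.3 Load-balancing test

Start one consumer instance, provider-order--3017.

Call the retry and retry2 endpoints of provider-order--3017 several times. The logs confirm that the default round-robin strategy is used to call provider-user--3015 and provider-user--4015.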
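The round-robin behaviour seen in the logs can be illustrated with a minimal plain-Java sketch. This is not the Ribbon or Spring Cloud LoadBalancer implementation; the class name `RoundRobinSketch` is hypothetical, and the instance labels are simply the two provider names used in this test.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin instance selection; not library code.
class RoundRobinSketch {
    private final List<String> instances;
    private final AtomicInteger position = new AtomicInteger(0);

    RoundRobinSketch(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = instances;
    }

    // Returns the next instance, cycling through the list in order.
    // floorMod keeps the index non-negative even if the counter overflows.
    String choose() {
        int index = Math.floorMod(position.getAndIncrement(), instances.size());
        return instances.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch lb = new RoundRobinSketch(
                Arrays.asList("provider-user--3015", "provider-user--4015"));
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + lb.choose());
        }
    }
}
```

Successive calls alternate between the two instances, which is the pattern to look for in the consumer logs.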

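2.4 High-availability test

Stop one provider-user instance (provider-user-4015) and confirm that, when round-robin selects the stopped instance, the request is automatically retried and succeeds on the instance that is still running.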

2.5 ribbon.restclient.enabled

1. Without ribbon.restclient.enabled=true

provider-order--3017: the /retry endpoint times out and fails directly, with no retry.

    /** todo: times out with an error after 5 seconds; uses the shared PoolingHttpClientConnectionManager
     * 2024-08-05 20:40:53.150[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
     * 2024-08-05 20:40:53.155[] order [http-nio-0.0.0.0-3017-exec-3] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
     */

provider-order--3017: the /retry2 endpoint does not fail even after 7 seconds, and no retry is performed.

    @GetMapping("/retry2") // todo: retry2 does not fail even after 7 seconds; uses a separately configured RestTemplate; 2024-08-05 20:43:57.405[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String


         *2024-08-05 20:29:41.267[] user [http-nio-0.0.0.0-3015-exec-9] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
         * 2024-08-05 20:29:51.358[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:30:23.498[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:30:31.393[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:30:38.764[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 7s
         * 2024-08-05 20:31:00.140[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:31:07.552[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:31:15.993[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 5s
         * 2024-08-05 20:31:24.517[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-65- sleep= 6s
         *

2. With ribbon.restclient.enabled=true, there are three cases

    * Case 1: only one provider-user instance is running
     * With ribbon.restclient.enabled=true, retry still times out directly without retrying, whereas retry2 is called 6 times in total (timeout as logged: -240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000)
     * todo: the log contains 6 occurrences of "RestClient sending new Request(GET": com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
     *
     * 2024-08-05 21:04:15.198[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.servlet.DispatcherServlet-91- GET "/order/api/v1/retry2?name=String", parameters={masked}
     * 2024-08-05 21:04:15.201[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG o.s.w.s.m.m.a.RequestMappingHandlerMapping-412- Mapped to com.zxx.study.cloud.order.controller.RestfulApiController#retry2(String)
     * 2024-08-05 21:04:15.206[] order [http-nio-0.0.0.0-3017-exec-7] INFO  c.z.s.cloud.order.controller.RestfulApiController-255- name=String
     * 2024-08-05 21:04:15.207[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- HTTP GET http://provider-user/user/api/v1/retry?name=String
     * 2024-08-05 21:04:15.209[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG org.springframework.web.client.RestTemplate-147- Accept=[application/json, application/*+json]
     * 2024-08-05 21:04:15.210[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.ZoneAwareLoadBalancer-112- Zone aware logic disabled or there is only one zone
     * 2024-08-05 21:04:15.211[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.loadbalancer.LoadBalancerContext-551- using LB returned Server: 192.168.1.4:3015 for request: http://provider-user/user/api/v1/retry?name=String
     * 2024-08-05 21:04:15.212[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
     * 2024-08-05 21:04:15.213[] order [http-nio-0.0.0.0-3017-exec-7] DEBUG com.netflix.http4.MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 2000
     *
     * todo: only one provider-user instance is running
     * todo: the first call (sleep= 5s) timed out and was followed by 5 retries, 6 calls in total; MaxAutoRetries: 3 + MaxAutoRetriesNextServer: 2
     * 2024-08-05 21:04:15.227[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= eaa55510-140d-4f5d-bf23-8adf9a620646
     * 2024-08-05 21:04:15.228[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
     * 2024-08-05 21:04:18.264[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ba0e6cc8-4cf6-41fe-91eb-42ec3d2e60d2
     * 2024-08-05 21:04:18.265[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     * 2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ff3fba7e-4798-44b8-a25e-b84e75fb828a
     * 2024-08-05 21:04:21.299[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= f261909b-d752-42ac-b1f9-47a1747481cc
     * 2024-08-05 21:04:24.335[] user [http-nio-0.0.0.0-3015-exec-6] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     * 2024-08-05 21:04:27.386[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= fcb7d6f7-a997-4629-9901-ebf894758a02
     * 2024-08-05 21:04:27.387[] user [http-nio-0.0.0.0-3015-exec-7] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:04:30.409[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= e6f13159-5111-4f0a-babe-3f9b6d8eff61
     * 2024-08-05 21:04:30.410[] user [http-nio-0.0.0.0-3015-exec-8] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     *
   * Case 2: only one provider-user instance is running
     * With ribbon.restclient.enabled=true, retry still times out directly without retrying, whereas retry2 is called 3 times in total (timeout as logged: MonitoredConnectionManager-240- Get connection: {}->http://192.168.1.4:3015, timeout = 5000)
     * todo: the log contains 3 occurrences of "RestClient sending new Request(GET": com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: ) http://192.168.1.4:3015/user/api/v1/retry?name=String
     *
     * todo: retry still times out directly without retrying.
     * 2024-08-05 21:24:05.636[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG o.a.h.impl.conn.PoolingHttpClientConnectionManager-349- Connection released: [id: 0][route: {}->http://192.168.1.4:3015][total kept alive: 0; route allocated: 0 of 50; total allocated: 0 of 200]
     * 2024-08-05 21:24:05.637[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG c.n.loadbalancer.reactive.LoadBalancerCommand-314- Got error java.net.SocketTimeoutException: Read timed out when executed on server 192.168.1.4:3015
     * 2024-08-05 21:24:05.643[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] <--- ERROR SocketTimeoutException: Read timed out (5085ms)
     * 2024-08-05 21:24:05.648[] order [http-nio-0.0.0.0-3017-exec-4] DEBUG com.zxx.study.cloud.api.user.UserRestfulApiClient-72- [UserRestfulApiClient#retry] java.net.SocketTimeoutException: Read timed out
     *
     * todo: the first call (sleep= 6s) timed out and was followed by 2 retries, 3 calls in total; MaxAutoRetries: 2 + MaxAutoRetriesNextServer: 1
     * 2024-08-05 21:18:22.255[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= dc2edb32-25a6-465c-9577-07b59388670f
     * 2024-08-05 21:18:22.256[] user [http-nio-0.0.0.0-3015-exec-1] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 05c593da-6b35-4f7b-8df2-15e9dab7b391
     * 2024-08-05 21:18:27.295[] user [http-nio-0.0.0.0-3015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 07fc8c11-5d77-4ffe-a81c-9851f68a647e
     * 2024-08-05 21:18:32.318[] user [http-nio-0.0.0.0-3015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     *
   * Case 3: two provider-user instances are running
     *
     * 5 occurrences in total of com.netflix.niws.client.http.RestClient-588- RestClient sending new Request(GET: )
     * 5 calls in total; MaxAutoRetries: 2 + MaxAutoRetriesNextServer: 1
     *  That is, the first call on provider-user-4015 (sleep= 5s) timed out, followed by 2 retries on provider-user-4015 and 2 further calls on provider-user-3015; 5 calls in total.
     *  provider-user-3015: 2 calls
     *  2024-08-05 21:34:14.407[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 38c1f932-e99f-49c1-889d-aa79af316089
     * 2024-08-05 21:34:14.409[] user [http-nio-0.0.0.0-3015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 6s
     * 2024-08-05 21:34:19.456[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 410f7b33-46ac-4102-a0ff-3c19c18d2b52
     * 2024-08-05 21:34:19.457[] user [http-nio-0.0.0.0-3015-exec-5] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 4s
     *
     *  provider-user-4015: 3 calls
     *  2024-08-05 21:33:59.263[] user [http-nio-0.0.0.0-4015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 97936999-48f0-4257-9fce-7a78081afa4b
     * 2024-08-05 21:33:59.264[] user [http-nio-0.0.0.0-4015-exec-2] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
     * 2024-08-05 21:34:04.308[] user [http-nio-0.0.0.0-4015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= ac766122-6a7a-4535-b1fb-928e3a9a5f7f
     * 2024-08-05 21:34:04.309[] user [http-nio-0.0.0.0-4015-exec-3] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 7s
     * 2024-08-05 21:34:09.338[] user [http-nio-0.0.0.0-4015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-67- uuid= 02283f9e-fddb-480a-a9d2-68eb3da988ac
     * 2024-08-05 21:34:09.339[] user [http-nio-0.0.0.0-4015-exec-4] INFO  c.z.s.cloud.user.controller.RestfulApiController-68- sleep= 5s
     * */
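The call counts in cases 2 and 3 can be reproduced with a small plain-Java simulation of a "retry on the same instance, then move to the next instance" loop. This is a sketch under assumptions, not Ribbon code: the class `RetrySimulation` is hypothetical, the read timeout of 5 seconds is inferred from the roughly 5-second spacing of the provider log timestamps, and a call is treated as timed out when its logged sleep value is greater than or equal to that timeout. The sleep values are copied from the provider logs above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of "retry on the same server, then move to the next server".
// It is not Ribbon code; names and the timeout rule are assumptions for illustration.
class RetrySimulation {

    // Returns the server chosen for each attempt. Stops at the first attempt whose
    // simulated processing time is shorter than the read timeout, or when the
    // retry limits or the recorded sleep values are exhausted.
    static List<String> simulate(List<String> servers, int[] sleepSeconds,
                                 int readTimeoutSeconds,
                                 int maxAutoRetries, int maxAutoRetriesNextServer) {
        List<String> attempts = new ArrayList<>();
        int call = 0;
        for (int s = 0; s <= maxAutoRetriesNextServer; s++) {
            // With a single instance, "next server" resolves to the same instance.
            String server = servers.get(s % servers.size());
            for (int r = 0; r <= maxAutoRetries; r++) {
                if (call >= sleepSeconds.length) {
                    return attempts;
                }
                attempts.add(server);
                boolean timedOut = sleepSeconds[call] >= readTimeoutSeconds;
                call++;
                if (!timedOut) {
                    return attempts;
                }
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Case 2: one instance, sleep values 6s, 6s, 4s from the provider log.
        System.out.println(simulate(Arrays.asList("3015"),
                new int[] {6, 6, 4}, 5, 2, 1));
        // Case 3: two instances, sleep values 5s, 7s, 5s, 6s, 4s from the provider logs.
        System.out.println(simulate(Arrays.asList("4015", "3015"),
                new int[] {5, 7, 5, 6, 4}, 5, 2, 1));
    }
}
```

Under these assumptions the simulation yields 3 attempts on 3015 for case 2, and 3 attempts on 4015 followed by 2 on 3015 for case 3. In both cases the sequence ends because the last call slept 4 seconds and finished within the assumed timeout, not because the retry limits were exhausted.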

2.6 Summary

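1. Use retries with caution, even for GET requests; for other methods, retries are best avoided. With OkToRetryOnAllOperations: false, retries on any error apply only to GET (non-GET requests are retried only on connection errors, see section III); with true, they also apply to POST, PUT, DELETE, and so on.

2. If retries are required, configure them per service, and make sure the endpoints are idempotent.

3. In these tests, ribbon.restclient.enabled=true acted as the switch for retries on the RestTemplate-based retry2 endpoint; the retry endpoint did not retry with either setting.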
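Point 2 asks for idempotent endpoints when retries are enabled. One common way to get there is to deduplicate by a request ID, sketched below in plain Java. Everything here is an assumption for illustration: the class `IdempotentHandler`, the idea that the caller sends a unique request ID, and the in-memory map (a real deployment with several instances would need a shared store).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical in-memory idempotency guard; not part of the tested services.
class IdempotentHandler {
    private final ConcurrentMap<String, String> results = new ConcurrentHashMap<>();

    // Runs the action at most once per request id. A retried request with the same id
    // gets the stored result instead of executing the action again.
    // The action must not return null, otherwise nothing is stored for that id.
    String handle(String requestId, Supplier<String> action) {
        return results.computeIfAbsent(requestId, id -> action.get());
    }

    public static void main(String[] args) {
        IdempotentHandler handler = new IdempotentHandler();
        AtomicInteger executions = new AtomicInteger();

        // The same request id arrives twice, as it would after an automatic retry.
        String first = handler.handle("req-1", () -> "order-" + executions.incrementAndGet());
        String second = handler.handle("req-1", () -> "order-" + executions.incrementAndGet());

        System.out.println(first + " / " + second + " / executions=" + executions.get());
    }
}
```

With this guard, a retry that reaches the same instance after the first call has completed does not repeat the side effect.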

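III. FeignLoadBalancer analysis

Tracing the source code shows that FeignLoadBalancer configures the retry policy. If ribbon.OkToRetryOnAllOperations is true, requests of any method are retried. If it is false, GET requests are still retried, while non-GET requests are retried only on connection errors.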

Java
@Override
public RequestSpecificRetryHandler getRequestSpecificRetryHandler(
        RibbonRequest request, IClientConfig requestConfig) {
    // If OkToRetryOnAllOperations is true, retry for any request method and any exception
    if (this.ribbon.isOkToRetryOnAllOperations()) {
        return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(),
                requestConfig);
    }
    // OkToRetryOnAllOperations is false (the default)
    // Non-GET requests: retry only on connection errors
    if (!request.toRequest().method().equals("GET")) {
        return new RequestSpecificRetryHandler(true, false, this.getRetryHandler(),
                requestConfig);
    } else {
        // GET requests: retry in any case, on any exception
        return new RequestSpecificRetryHandler(true, true, this.getRetryHandler(),
                requestConfig);
    }
}

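The analysis above shows that setting ribbon.OkToRetryOnAllOperations=false does not disable retries: Ribbon still retries GET requests. Our system applies no special configuration to Ribbon's retry mechanism, so the default values are in effect.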
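The branch logic in that method can be restated as a small decision table in plain Java. This is a sketch for illustration only: the class `RetryDecision` is hypothetical, and the two flags are named after the first two boolean arguments passed to RequestSpecificRetryHandler (retry on connection errors, retry on all errors).

```java
// Hypothetical plain-Java restatement of the branch logic quoted above; not Spring Cloud code.
class RetryDecision {
    final boolean okToRetryOnConnectErrors;
    final boolean okToRetryOnAllErrors;

    RetryDecision(boolean okToRetryOnConnectErrors, boolean okToRetryOnAllErrors) {
        this.okToRetryOnConnectErrors = okToRetryOnConnectErrors;
        this.okToRetryOnAllErrors = okToRetryOnAllErrors;
    }

    // Mirrors the three branches: all operations allowed, non-GET, and GET.
    static RetryDecision decide(boolean okToRetryOnAllOperations, String httpMethod) {
        if (okToRetryOnAllOperations) {
            return new RetryDecision(true, true);
        }
        if (!"GET".equals(httpMethod)) {
            return new RetryDecision(true, false);
        }
        return new RetryDecision(true, true);
    }

    public static void main(String[] args) {
        String[] methods = {"GET", "POST"};
        boolean[] settings = {false, true};
        for (boolean setting : settings) {
            for (String method : methods) {
                RetryDecision d = decide(setting, method);
                System.out.println("OkToRetryOnAllOperations=" + setting
                        + " method=" + method
                        + " connectErrors=" + d.okToRetryOnConnectErrors
                        + " allErrors=" + d.okToRetryOnAllErrors);
            }
        }
    }
}
```

The only combination that restricts retries to connection errors is a non-GET request with OkToRetryOnAllOperations=false; a read timeout on a GET request is retried under either setting.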

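Ribbon's default retry configuration is as follows:

# Maximum number of retries on the same instance, excluding the first call. Default: 0
ribbon.MaxAutoRetries = 0
# Maximum number of other instances of the same service to retry on, excluding the first instance called. Default: 1
ribbon.MaxAutoRetriesNextServer = 1
# Whether all operations may be retried. Default: false
ribbon.OkToRetryOnAllOperations = false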

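Because MaxAutoRetriesNextServer defaults to 1 and our import endpoint happens to be a GET request, Ribbon automatically retries once when the business service's data processing times out.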
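The effect of these two settings on the worst-case number of calls can be estimated with the commonly cited bound (MaxAutoRetries + 1) × (MaxAutoRetriesNextServer + 1). The sketch below is an assumption-based calculation, not Ribbon code; the class `RetryBudget` is hypothetical, and the bound should be checked against the logs of the version in use.

```java
// Hypothetical helper for estimating the worst-case number of calls; not Ribbon code.
class RetryBudget {

    // Upper bound on total attempts, including the first call, assuming every attempt fails:
    // (retries on the same instance + 1) for each of (next-server retries + 1) instances.
    static int maxAttempts(int maxAutoRetries, int maxAutoRetriesNextServer) {
        if (maxAutoRetries < 0 || maxAutoRetriesNextServer < 0) {
            throw new IllegalArgumentException("retry counts must not be negative");
        }
        return (maxAutoRetries + 1) * (maxAutoRetriesNextServer + 1);
    }

    public static void main(String[] args) {
        // Defaults listed above: MaxAutoRetries=0, MaxAutoRetriesNextServer=1
        System.out.println("defaults: " + maxAttempts(0, 1));
        // Settings noted for cases 2 and 3: MaxAutoRetries=2, MaxAutoRetriesNextServer=1
        System.out.println("2 and 1:  " + maxAttempts(2, 1));
    }
}
```

With the defaults the bound is 2 attempts, that is, the first call plus one automatic retry, which matches the single retry described above. With MaxAutoRetries=2 and MaxAutoRetriesNextServer=1 the bound is 6; the 3 and 5 calls observed in cases 2 and 3 stay below it.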
