HTTP Client TCP Connect Failed Retry

Why retry on TCP connect failure?

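In a Kubernetes deployment with multiple replicas, if a service consumer's HTTP client hits an unhealthy pod of the service provider, the client throws an exception. That exception usually propagates all the way to the front-end page, which then shows a message such as "System busy, please try again later" or "Network error, please try again later".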

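Inside a Kubernetes cluster, pods usually talk to each other through a Service. If the consumer's HTTP client fails the TCP handshake, it can simply retry against the same Service address: a Service is backed by its Endpoints and load-balances connections across them, so each retry has a chance of landing the TCP connection on a healthy pod.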
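As a rough back-of-the-envelope illustration of why retrying through the Service address helps, assume (this is a simplification of ours, not something a Service guarantees) that every new connection is routed to one of n backend pods uniformly at random and independently of earlier attempts. If u of those pods currently refuse connections, the chance that all k attempts hit an unhealthy pod is (u/n)^k. The class and method names below are made up for this sketch.

```java
/** Hypothetical helper for illustration only; not part of Kubernetes or Feign. */
class RetryOdds {

    /**
     * Probability that every one of {@code attempts} connection attempts lands on an
     * unhealthy pod, ASSUMING uniform and independent pod selection per attempt.
     */
    static double allAttemptsFail(int totalPods, int unhealthyPods, int attempts) {
        if (totalPods <= 0 || unhealthyPods < 0 || unhealthyPods > totalPods || attempts < 1) {
            throw new IllegalArgumentException("invalid pod counts or attempts");
        }
        return Math.pow((double) unhealthyPods / totalPods, attempts);
    }

    public static void main(String[] args) {
        // 1 of 3 pods is down: compare a single attempt with up to 5 attempts.
        System.out.printf("1 attempt : %.4f%n", allAttemptsFail(3, 1, 1));
        System.out.printf("5 attempts: %.4f%n", allAttemptsFail(3, 1, 5));
    }
}
```

Under that assumption, a handful of attempts already makes it unlikely that a single bad replica causes a user-visible error.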

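Why retry only on TCP connect failure and not on HTTP read (or write) timeouts? Retrying after a read timeout is only safe if every endpoint of the service provider is idempotent, and guaranteeing that is expensive. As a compromise, we retry only when the TCP connection fails and let read-timeout exceptions propagate to the front end.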
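To see why a read timeout is not safe to retry blindly, consider a toy in-memory model (all names here are hypothetical). The provider applies a non-idempotent deduction and then the response is lost, so the consumer sees a timeout although the side effect already happened. A refused connection, by contrast, means the request never reached the provider.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

class ReadTimeoutRetryDemo {

    /** Toy stand-in for a service provider with a non-idempotent endpoint. */
    static class AccountService {
        long balance = 100;

        /** The deduction is applied, but the response never reaches the caller. */
        void deductThenLoseResponse(long amount) throws SocketTimeoutException {
            balance -= amount;
            throw new SocketTimeoutException("Read timed out");
        }

        /** The handshake fails, so the deduction is never processed. */
        void refuseConnection(long amount) throws ConnectException {
            throw new ConnectException("Connection refused");
        }
    }

    /** Sends the same deduction twice, the second time as a blind retry after a read timeout. */
    static long balanceAfterRetryingReadTimeout() {
        AccountService service = new AccountService();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                service.deductThenLoseResponse(10);
            } catch (SocketTimeoutException ignored) {
                // The caller cannot tell whether the deduction happened.
            }
        }
        return service.balance;
    }

    /** Sends the same deduction twice, both times failing during the TCP handshake. */
    static long balanceAfterRetryingConnectFailure() {
        AccountService service = new AccountService();
        for (int attempt = 0; attempt < 2; attempt++) {
            try {
                service.refuseConnection(10);
            } catch (ConnectException ignored) {
                // Nothing was sent, so retrying cannot duplicate a side effect.
            }
        }
        return service.balance;
    }

    public static void main(String[] args) {
        System.out.println("after retrying read timeout:    " + balanceAfterRetryingReadTimeout());
        System.out.println("after retrying connect failure: " + balanceAfterRetryingConnectFailure());
    }
}
```

In the read-timeout case the amount is deducted twice, which is exactly what the idempotency requirement is meant to prevent.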

Retrying a Feign Client after a TCP connection failure

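Feign provides the Retryer interface together with two built-in implementations:

  1. NEVER_RETRY: a no-op implementation that never retries.
  2. Default: the default implementation, which retries up to a configured number of attempts.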
java
/**
 * Cloned for each invocation to {@link Client#execute(Request, feign.Request.Options)}.
 * Implementations may keep state to determine if retry operations should continue or not.
 */
public interface Retryer extends Cloneable {

  /**
   * if retry is permitted, return (possibly after sleeping). Otherwise propagate the exception.
   */
  void continueOrPropagate(RetryableException e);

  Retryer clone();

  class Default implements Retryer {

    private final int maxAttempts;
    private final long period;
    private final long maxPeriod;
    int attempt;
    long sleptForMillis;

    public Default() {
      this(100, SECONDS.toMillis(1), 5);
    }

    public Default(long period, long maxPeriod, int maxAttempts) {
      this.period = period;
      this.maxPeriod = maxPeriod;
      this.maxAttempts = maxAttempts;
      this.attempt = 1;
    }

    // visible for testing;
    protected long currentTimeMillis() {
      return System.currentTimeMillis();
    }

    public void continueOrPropagate(RetryableException e) {
      if (attempt++ >= maxAttempts) {
        throw e;
      }

      long interval;
      if (e.retryAfter() != null) {
        interval = e.retryAfter().getTime() - currentTimeMillis();
        if (interval > maxPeriod) {
          interval = maxPeriod;
        }
        if (interval < 0) {
          return;
        }
      } else {
        interval = nextMaxInterval();
      }
      try {
        Thread.sleep(interval);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
        throw e;
      }
      sleptForMillis += interval;
    }

    /**
     * Calculates the time interval to a retry attempt. <br>
     * The interval increases exponentially with each attempt, at a rate of nextInterval *= 1.5
     * (where 1.5 is the backoff factor), to the maximum interval.
     *
     * @return time in milliseconds from now until the next attempt.
     */
    long nextMaxInterval() {
      long interval = (long) (period * Math.pow(1.5, attempt - 1));
      return interval > maxPeriod ? maxPeriod : interval;
    }

    @Override
    public Retryer clone() {
      return new Default(period, maxPeriod, maxAttempts);
    }
  }

  /**
   * Implementation that never retries request. It propagates the RetryableException.
   */
  Retryer NEVER_RETRY = new Retryer() {

    @Override
    public void continueOrPropagate(RetryableException e) {
      throw e;
    }

    @Override
    public Retryer clone() {
      return this;
    }
  };
}
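To make the numbers concrete, here is a small standalone sketch (the class and method names are ours, and it deliberately does not depend on Feign) that replays the attempt counter and interval arithmetic of Default for exceptions that carry no retryAfter date. With the no-arg defaults (period 100 ms, maxPeriod 1 s, maxAttempts 5) it should yield four sleeps of roughly 150, 225, 337 and 506 ms, which means five attempts in total.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative re-implementation of the backoff arithmetic; not Feign code. */
class DefaultBackoffSchedule {

    /**
     * Returns the sleep (in ms) taken before each retry. The number of attempts
     * is the size of the returned list plus one (the initial call).
     */
    static List<Long> sleepsBeforeRetries(long period, long maxPeriod, int maxAttempts) {
        List<Long> sleeps = new ArrayList<>();
        int attempt = 1;
        while (true) {
            // Same check as Default#continueOrPropagate: give up once maxAttempts is reached.
            if (attempt++ >= maxAttempts) {
                return sleeps;
            }
            // Same formula as Default#nextMaxInterval: period * 1.5^(attempt - 1), capped at maxPeriod.
            long interval = (long) (period * Math.pow(1.5, attempt - 1));
            sleeps.add(Math.min(interval, maxPeriod));
        }
    }

    public static void main(String[] args) {
        System.out.println(sleepsBeforeRetries(100, 1000, 5));
    }
}
```

Note that maxAttempts counts the first call as well, so maxAttempts = 1 behaves like NEVER_RETRY.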

With Spring Boot, the OpenFeign auto-configuration uses NEVER_RETRY by default. To switch to Default, either configure it in yml (a nested class is referenced with the $ separator) or register a Retryer bean.

yml
feign:
  client:
    config:
      default:
        connectTimeout: 1000
        readTimeout: 5000
        retryer: feign.Retryer$Default

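Default is a good reference implementation, but it does not meet our requirement: it retries on every RetryableException it is handed, regardless of the underlying cause, whereas we only want to retry when the TCP handshake fails.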

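Our experiments showed two ways the TCP handshake can fail:

  1. The server is not listening on the port. The handshake fails immediately and a ConnectException is thrown, typically with the message "Connection refused", meaning the TCP connection was rejected.
  2. The handshake times out. Once connectTimeout elapses, a SocketTimeoutException is thrown with the message "connect timed out", meaning the TCP connection timed out. The exact wording and casing of this message may vary across JDK versions and HTTP clients.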

Note: a SocketTimeoutException whose message is "Read timed out" means an HTTP read timeout occurred, and the request must not be retried.

For reference, the JDK Javadoc describes the two exception types as follows:

  1. ConnectException: Signals that an error occurred while attempting to connect a socket to a remote address and port. Typically, the connection was refused remotely (e.g., no process is listening on the remote address/port).
  2. SocketTimeoutException: Signals that a timeout has occurred on a socket read or accept.
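The classification rule above can be sketched independently of any HTTP client. The helper below is a minimal illustration with names of our own choosing. It relies on the exception message to tell a connect timeout from a read timeout, which is not part of any API contract, so treat the message strings as an assumption based on the observations above. The comparison ignores case as a precaution against wording differences.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

/** Hypothetical helper that decides whether a failure happened during the TCP handshake. */
class ConnectFailures {

    /** True only for failures that occurred before the request could have been sent. */
    static boolean isConnectFailure(Throwable cause) {
        if (cause instanceof ConnectException) {
            // Connection refused (or otherwise rejected) during the handshake.
            return true;
        }
        if (cause instanceof SocketTimeoutException) {
            // ASSUMPTION: connect timeouts are recognisable by this message text.
            return "connect timed out".equalsIgnoreCase(cause.getMessage());
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isConnectFailure(new ConnectException("Connection refused")));
        System.out.println(isConnectFailure(new SocketTimeoutException("Read timed out")));
    }
}
```

Anything that is not positively identified as a connect failure, including a null cause, is treated as non-retryable, which errs on the side of not repeating a request that may already have been processed.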

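ConnectFailedRetryer uses the decorator pattern: it holds a delegate, Retryer retryer, that performs the actual retry. ConnectFailedRetryer#continueOrPropagate() only forwards to retryer#continueOrPropagate() for the exception types listed above; in every other case it rethrows the exception immediately.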

java
package com.oneby.common.feign;

import static java.util.concurrent.TimeUnit.SECONDS;

import feign.Request;
import feign.RetryableException;
import feign.Retryer;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class ConnectFailedRetryer implements Retryer {

    /** Delegate that decides how often to retry and how long to back off. */
    private final Retryer retryer;

    /** No-arg constructor so the class can be referenced by name in yml. */
    public ConnectFailedRetryer() {
        this(100, SECONDS.toMillis(1), 5);
    }

    public ConnectFailedRetryer(long period, long maxPeriod, int maxAttempts) {
        this(new Default(period, maxPeriod, maxAttempts));
    }

    public ConnectFailedRetryer(Retryer retryer) {
        this.retryer = retryer;
    }

    @Override
    public void continueOrPropagate(RetryableException e) {
        Request request = e.request();
        String url = request.url();
        Request.HttpMethod httpMethod = request.httpMethod();
        Throwable throwable = e.getCause();
        if (throwable instanceof ConnectException) {
            // TCP connection refused: the request was never sent, so a retry is safe.
            log.warn("url: {}, httpMethod: {}, connectException: {}", url, httpMethod, throwable.getMessage());
            retryer.continueOrPropagate(e);
        } else if (throwable instanceof SocketTimeoutException) {
            String timeoutExceptionMessage = throwable.getMessage();
            log.warn("url: {}, httpMethod: {}, socketTimeoutException: {}", url, httpMethod, timeoutExceptionMessage);
            // Retry only on connect timeouts, never on read timeouts. The comparison ignores
            // case because the message casing may differ between JDK versions.
            if ("connect timed out".equalsIgnoreCase(timeoutExceptionMessage)) {
                retryer.continueOrPropagate(e);
            } else {
                throw e;
            }
        } else {
            throw e;
        }
    }

    @Override
    public Retryer clone() {
        // Feign clones the retryer per invocation, so the stateful delegate must be cloned too.
        return new ConnectFailedRetryer(retryer.clone());
    }

}

Configure the custom ConnectFailedRetryer in yml:

yml
feign:
  client:
    config:
      default:
        connectTimeout: 1000
        readTimeout: 5000
        retryer: com.oneby.common.feign.ConnectFailedRetryer

Retrying an OkHttp Client after a TCP connection failure

With OkHttp, the custom retry logic has to be implemented in Interceptor#intercept().

java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import lombok.extern.slf4j.Slf4j;
import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;

@Slf4j
public class ConnectFailedRetryInterceptor implements Interceptor {

    /** Maximum number of attempts, including the first call. */
    private static final int MAX_RETRY = 5;

    @Override
    public Response intercept(Chain chain) {
        Request request = chain.request();
        // Responses are never retried (response -> false); only connect failures are.
        return RetryUtils.retryWithCondition(() -> chain.proceed(request), response -> false, throwable -> {
            if (throwable instanceof ConnectException) {
                log.warn("url: {}, httpMethod: {}, connectException: {}", request.url(), request.method(),
                        throwable.getMessage());
                return true;
            } else if (throwable instanceof SocketTimeoutException) {
                String timeoutExceptionMessage = throwable.getMessage();
                log.warn("url: {}, httpMethod: {}, socketTimeoutException: {}", request.url(), request.method(),
                        timeoutExceptionMessage);
                // Case-insensitive because the message casing may differ between JDK versions.
                return "connect timed out".equalsIgnoreCase(timeoutExceptionMessage);
            }
            return false;
        }, MAX_RETRY);
    }

}

RetryUtils here is a simple hand-rolled helper; for a mature solution, consider Spring Retry or guava-retrying.

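java
import java.util.concurrent.Callable;
import java.util.function.Predicate;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public abstract class RetryUtils {

    /** Fixed delay between attempts, in milliseconds. */
    private static final int RETRY_DELAY = 500;

    /**
     * Calls {@code callable} until it returns an accepted value, fails with a non-retryable
     * throwable, or {@code maxRetryCount} attempts have been made (the first call counts as one).
     */
    @SneakyThrows
    public static <T> T retryWithCondition(Callable<T> callable, Predicate<T> retValRetryPredicate,
            Predicate<Throwable> throwableRetryPredicate, int maxRetryCount) {
        int retryCount = 0;
        while (true) {
            T ret = null;
            Throwable failure = null;
            try {
                ret = callable.call();
                if (!retValRetryPredicate.test(ret)) {
                    return ret;
                }
                log.warn("return value: {} does not match expected value", ret);
            } catch (Throwable throwable) {
                // Only throwables accepted by the predicate are retried; all others propagate at once.
                if (!throwableRetryPredicate.test(throwable)) {
                    throw throwable;
                }
                failure = throwable;
                log.warn("retryable throwable, message: {}", throwable.getMessage());
            }
            // Count the attempt on both retry paths so the loop always terminates.
            retryCount++;
            if (retryCount >= maxRetryCount || !sleepBeforeRetry()) {
                if (failure != null) {
                    throw failure;
                }
                return ret;
            }
        }
    }

    /** Sleeps for RETRY_DELAY; returns false (with the interrupt flag restored) if interrupted. */
    private static boolean sleepBeforeRetry() {
        try {
            Thread.sleep(RETRY_DELAY);
            return true;
        } catch (InterruptedException ignored) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

}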
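The intended behaviour of such a retry helper can be checked without a network. The sketch below is a stripped-down, dependency-free variant written for illustration (no Lombok, no delay between attempts, names of our own choosing), together with a simulated call that refuses the connection a given number of times before succeeding. It is not the RetryUtils class itself.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

class RetryLoopDemo {

    /** Makes at most maxAttempts calls, retrying only failures accepted by the predicate. */
    static <T> T retry(Callable<T> call, Predicate<Exception> retryable, int maxAttempts) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (Exception e) {
                // Non-retryable failures and the final attempt both propagate the exception.
                if (!retryable.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }

    /** Simulated call: refuses the connection connectFailures times, then returns "ok". */
    static Callable<String> flaky(int connectFailures, AtomicInteger calls) {
        return () -> {
            if (calls.incrementAndGet() <= connectFailures) {
                throw new ConnectException("Connection refused");
            }
            return "ok";
        };
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        String result = retry(flaky(2, calls), e -> e instanceof ConnectException, 5);
        System.out.println(result + " after " + calls.get() + " calls");
    }
}
```

Two refused connections followed by a success should take three calls, while a failure the predicate rejects (for example a read timeout) should surface after the first call.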