HTTP Client TCP Connect Failed Retry

Why Do We Need TCP Connect Failed Retry?

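In a multi-replica K8S deployment, if the service consumer's HTTP client reaches an unhealthy pod (the service provider), the client throws an exception. That exception usually travels back to the front-end page, which then shows a message such as "System busy, please try again later" or "Network error, please try again later".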

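Inside a K8S cluster, pods usually talk to each other through a Service. If the consumer's HTTP client fails the TCP handshake, it can simply retry against the same Service address: a Service is backed by Endpoints and load-balances across them, so a retry has a chance of landing the TCP connection on a healthy pod.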
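To put a rough number on "has a chance", here is a minimal sketch of the arithmetic. It assumes each attempt independently picks one pod uniformly at random, which is a simplification and not a guarantee of every Service proxy mode. RetryOdds and allAttemptsFail are hypothetical names used only for illustration.

```java
/** Hypothetical helper: odds that every connect attempt lands on an unhealthy pod. */
class RetryOdds {

    /**
     * Assumption of this sketch: each attempt independently selects one of
     * totalPods uniformly at random.
     */
    static double allAttemptsFail(int totalPods, int unhealthyPods, int attempts) {
        if (totalPods <= 0 || unhealthyPods < 0 || unhealthyPods > totalPods || attempts < 0) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // Probability that a single attempt hits an unhealthy pod.
        double perAttempt = (double) unhealthyPods / totalPods;
        // Independent attempts multiply.
        return Math.pow(perAttempt, attempts);
    }
}
```

Under that assumption, with 3 replicas of which 1 is unhealthy, all 5 attempts fail with probability 1/243, roughly 0.4%.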

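Why retry only on TCP connect failure, and not on HTTP read (write) timeout? Retrying after a read timeout is only safe if every provider endpoint is idempotent, and guaranteeing that is expensive. As a compromise, we retry only on TCP connect failure and let the remaining read timeout exceptions propagate to the front end.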

Feign Client: Retry After a Failed TCP Connection

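Feign Client provides the Retryer interface along with two built-in implementations:

  1. NEVER_RETRY: a no-op implementation that never retries.
  2. Default: the default implementation, which retries up to a given number of attempts.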
java
/**
 * Cloned for each invocation to {@link Client#execute(Request, feign.Request.Options)}.
 * Implementations may keep state to determine if retry operations should continue or not.
 */
public interface Retryer extends Cloneable {

  /**
   * if retry is permitted, return (possibly after sleeping). Otherwise propagate the exception.
   */
  void continueOrPropagate(RetryableException e);

  Retryer clone();

  class Default implements Retryer {

    private final int maxAttempts;
    private final long period;
    private final long maxPeriod;
    int attempt;
    long sleptForMillis;

    public Default() {
      this(100, SECONDS.toMillis(1), 5);
    }

    public Default(long period, long maxPeriod, int maxAttempts) {
      this.period = period;
      this.maxPeriod = maxPeriod;
      this.maxAttempts = maxAttempts;
      this.attempt = 1;
    }

    // visible for testing;
    protected long currentTimeMillis() {
      return System.currentTimeMillis();
    }

    public void continueOrPropagate(RetryableException e) {
      if (attempt++ >= maxAttempts) {
        throw e;
      }

      long interval;
      if (e.retryAfter() != null) {
        interval = e.retryAfter().getTime() - currentTimeMillis();
        if (interval > maxPeriod) {
          interval = maxPeriod;
        }
        if (interval < 0) {
          return;
        }
      } else {
        interval = nextMaxInterval();
      }
      try {
        Thread.sleep(interval);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
        throw e;
      }
      sleptForMillis += interval;
    }

    /**
     * Calculates the time interval to a retry attempt. <br>
     * The interval increases exponentially with each attempt, at a rate of nextInterval *= 1.5
     * (where 1.5 is the backoff factor), to the maximum interval.
     *
     * @return time in milliseconds from now until the next attempt.
     */
    long nextMaxInterval() {
      long interval = (long) (period * Math.pow(1.5, attempt - 1));
      return interval > maxPeriod ? maxPeriod : interval;
    }

    @Override
    public Retryer clone() {
      return new Default(period, maxPeriod, maxAttempts);
    }
  }

  /**
   * Implementation that never retries request. It propagates the RetryableException.
   */
  Retryer NEVER_RETRY = new Retryer() {

    @Override
    public void continueOrPropagate(RetryableException e) {
      throw e;
    }

    @Override
    public Retryer clone() {
      return this;
    }
  };
}
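
To make the backoff in Default concrete, the following JDK-only sketch reproduces its interval arithmetic. BackoffSchedule and sleeps are hypothetical names, not part of Feign, and the Retry-After branch is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that mirrors the interval arithmetic of Retryer.Default. */
class BackoffSchedule {

    /**
     * Returns the sleep in milliseconds before each retry.
     * The last failed attempt is rethrown without sleeping, so it adds no entry.
     */
    static List<Long> sleeps(long period, long maxPeriod, int maxAttempts) {
        List<Long> result = new ArrayList<>();
        // Default starts at attempt = 1 and increments it before computing the
        // interval, so the first retry already uses an exponent of 1.
        for (int attempt = 2; attempt <= maxAttempts; attempt++) {
            long interval = (long) (period * Math.pow(1.5, attempt - 1));
            result.add(Math.min(interval, maxPeriod));
        }
        return result;
    }
}
```

With the no-arg defaults (period 100 ms, maxPeriod 1000 ms, maxAttempts 5) this gives sleeps of 150, 225, 337 and 506 ms, spread across five attempts in total.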

Spring Boot's OpenFeign auto-configuration uses NEVER_RETRY by default. To switch to Default, configure it in yml (nested classes use the $ separator) or register a Retryer bean.

yml
feign:
  client:
    config:
      default:
        connectTimeout: 1000
        readTimeout: 5000
        retryer: feign.Retryer$Default

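Default is a good reference implementation, but it does not meet our needs: it retries on every RetryableException, whatever the underlying cause, whereas we only want to retry when the TCP handshake fails.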

Experiments show that the TCP handshake fails in two ways:

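  1. The server is not listening on the port. The TCP handshake fails immediately and a ConnectException is thrown, usually with the message Connection refused, meaning the TCP connection was rejected.
  2. The TCP handshake times out. Once connectTimeout elapses, a SocketTimeoutException is thrown with the message connect timed out, meaning the TCP connection attempt timed out.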

Note: if a SocketTimeoutException is thrown with the message Read timed out, an HTTP read timeout has occurred and the request must not be retried.

The JDK Javadoc describes the two exceptions as follows:

  1. ConnectException: Signals that an error occurred while attempting to connect a socket to a remote address and port. Typically, the connection was refused remotely (e.g., no process is listening on the remote address/port).
  2. SocketTimeoutException: Signals that a timeout has occurred on a socket read or accept.
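
The two cases and the read-timeout exception can be folded into one predicate. This is a minimal JDK-only sketch; ConnectFailures and isConnectFailure are hypothetical names. It matches on the exception message, which is fragile, and compares case-insensitively on the assumption that the casing of the message may differ between JDK versions.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

/** Hypothetical helper: tells connect failures apart from read timeouts. */
class ConnectFailures {

    static boolean isConnectFailure(Throwable t) {
        if (t instanceof ConnectException) {
            // Case 1: the connection was rejected, e.g. "Connection refused".
            return true;
        }
        if (t instanceof SocketTimeoutException) {
            // Case 2: only a connect timeout qualifies; "Read timed out" does not.
            String message = t.getMessage();
            return message != null && message.equalsIgnoreCase("connect timed out");
        }
        return false;
    }
}
```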

ConnectFailedRetryer uses the decorator pattern. It holds a Retryer retryer member that performs the actual retry. ConnectFailedRetryer#continueOrPropagate() calls retryer#continueOrPropagate() only for the selected exceptions and rethrows the exception in every other case.

java
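import static java.util.concurrent.TimeUnit.SECONDS;

import feign.Request;
import feign.RetryableException;
import feign.Retryer;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import lombok.extern.slf4j.Slf4j;

/**
 * Decorator that delegates to another Retryer only when the TCP connection
 * could not be established; every other failure is rethrown immediately.
 */
@Slf4j
public class ConnectFailedRetryer implements Retryer {

    // Compared case-insensitively: the casing of this message may differ between JDK versions.
    private static final String CONNECT_TIMED_OUT = "connect timed out";

    private final Retryer retryer;

    // The no-arg constructor is required when the class is configured by name in yml.
    public ConnectFailedRetryer() {
        this(100, SECONDS.toMillis(1), 5);
    }

    public ConnectFailedRetryer(long period, long maxPeriod, int maxAttempts) {
        this(new Retryer.Default(period, maxPeriod, maxAttempts));
    }

    public ConnectFailedRetryer(Retryer retryer) {
        this.retryer = retryer;
    }

    @Override
    public void continueOrPropagate(RetryableException e) {
        Request request = e.request();
        String url = request.url();
        Request.HttpMethod httpMethod = request.httpMethod();
        Throwable throwable = e.getCause();
        if (throwable instanceof ConnectException) {
            // Connection refused: safe to retry, the request never reached the provider.
            log.warn("url: {}, httpMethod: {}, connectException: {}", url, httpMethod, throwable.getMessage());
            retryer.continueOrPropagate(e);
        } else if (throwable instanceof SocketTimeoutException) {
            String timeoutExceptionMessage = throwable.getMessage();
            log.warn("url: {}, httpMethod: {}, socketTimeoutException: {}", url, httpMethod, timeoutExceptionMessage);
            if (CONNECT_TIMED_OUT.equalsIgnoreCase(timeoutExceptionMessage)) {
                // Connect timeout: safe to retry.
                retryer.continueOrPropagate(e);
            } else {
                // Read timeout: the provider may already have processed the request.
                throw e;
            }
        } else {
            throw e;
        }
    }

    @Override
    public Retryer clone() {
        // Feign clones the retryer per invocation, so the delegate's state must be cloned too.
        return new ConnectFailedRetryer(retryer.clone());
    }

}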

Configure the custom ConnectFailedRetryer in yml:

yml
feign:
  client:
    config:
      default:
        connectTimeout: 1000
        readTimeout: 5000
        retryer: com.oneby.common.feign.ConnectFailedRetryer

OkHttp Client: Retry After a Failed TCP Connection

With OkHttp, custom retry logic has to be implemented in Interceptor#intercept().

java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import lombok.extern.slf4j.Slf4j;
import okhttp3.Interceptor;
import okhttp3.Request;
import okhttp3.Response;

@Slf4j
public class ConnectFailedRetryInterceptor implements Interceptor {

    // Total number of attempts, including the first one.
    private static final int MAX_RETRY = 5;

    @Override
    public Response intercept(Chain chain) {
        Request request = chain.request();
        // response -> false: never retry based on the response, only on connect failures.
        return RetryUtils.retryWithCondition(() -> chain.proceed(request), response -> false, throwable -> {
            if (throwable instanceof ConnectException) {
                log.warn("url: {}, httpMethod: {}, connectException: {}", request.url(), request.method(),
                        throwable.getMessage());
                return true;
            } else if (throwable instanceof SocketTimeoutException) {
                String timeoutExceptionMessage = throwable.getMessage();
                log.warn("url: {}, httpMethod: {}, socketTimeoutException: {}", request.url(), request.method(),
                        timeoutExceptionMessage);
                // Case-insensitive: the casing of this message may differ between JDK versions.
                return "connect timed out".equalsIgnoreCase(timeoutExceptionMessage);
            }
            return false;
        }, MAX_RETRY);
    }

}

The interceptor relies on a small hand-rolled RetryUtils helper. For a mature solution, consider Spring Retry or guava-retrying.

java
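import java.util.concurrent.Callable;
import java.util.function.Predicate;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public abstract class RetryUtils {

    private static final int RETRY_DELAY = 500;

    /**
     * Runs callable until it yields an acceptable result or the attempts are used up.
     *
     * @param retValRetryPredicate    returns true when the result should be retried
     * @param throwableRetryPredicate returns true when the failure should be retried
     * @param maxRetryCount           maximum number of attempts in total
     */
    @SneakyThrows
    public static <T> T retryWithCondition(Callable<T> callable, Predicate<T> retValRetryPredicate,
            Predicate<Throwable> throwableRetryPredicate, int maxRetryCount) {
        int retryCount = 0;
        while (true) {
            // Count every attempt, whether it ends with a result or an exception.
            retryCount++;
            T ret;
            try {
                ret = callable.call();
            } catch (Throwable throwable) {
                // Rethrow at once when the failure is not retryable or attempts are exhausted.
                if (!throwableRetryPredicate.test(throwable) || retryCount >= maxRetryCount) {
                    throw throwable;
                }
                log.warn("throwableRetryPredicate pass, retry later, throwable message: {}", throwable.getMessage());
                try {
                    Thread.sleep(RETRY_DELAY);
                } catch (InterruptedException ignored) {
                    Thread.currentThread().interrupt();
                    throw throwable;
                }
                continue;
            }
            if (!retValRetryPredicate.test(ret) || retryCount >= maxRetryCount) {
                return ret;
            }
            log.warn("return value: {} does not match expected value, retry later", ret);
            try {
                Thread.sleep(RETRY_DELAY);
            } catch (InterruptedException ignored) {
                // Stop retrying once interrupted and hand back the last result.
                Thread.currentThread().interrupt();
                return ret;
            }
        }
    }

}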
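
As a quick check of the intended behaviour, here is a trimmed-down, JDK-only stand-in for the helper (no Lombok, no logging, exception path only). RetryDemo, retry and callsMade are hypothetical names for this sketch. It retries only failures accepted by the predicate and counts how many calls were made.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical, simplified stand-in for RetryUtils. */
class RetryDemo {

    /** Calls task up to maxAttempts times, retrying only failures accepted by retryOn. */
    static <T> T retry(Callable<T> task, Predicate<Throwable> retryOn, int maxAttempts, long delayMillis)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                // Non-retryable failure or no attempts left: give up immediately.
                if (!retryOn.test(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMillis);
            }
        }
    }

    /** Runs a task that throws failure for the first `failures` calls, then succeeds; returns the call count. */
    static int callsMade(Exception failure, int failures, int maxAttempts) {
        AtomicInteger calls = new AtomicInteger();
        try {
            retry(() -> {
                if (calls.incrementAndGet() <= failures) {
                    throw failure;
                }
                return "ok";
            }, t -> t instanceof ConnectException, maxAttempts, 1L);
        } catch (Exception expected) {
            // Not retryable or attempts ran out; the call count is still reported.
        }
        return calls.get();
    }
}
```

In this sketch, a task that fails twice with ConnectException succeeds on the third call, while a read timeout is rethrown after the first call.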