Gateway API in Practice (4): The Retry Feature of FSM Gateway

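A gateway's retry feature is an important network communication mechanism for improving the reliability and fault tolerance of service calls. It lets the gateway automatically resend a request when the first attempt fails, reducing the impact of transient problems (such as network jitter or momentary service overload) on the end-user experience.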

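It works as follows: when the gateway sends a request to a backend service and hits a specific type of failure (such as a connection error, a timeout, or a 5xx error), it does not return the error to the client immediately but resends the request according to a preconfigured policy.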
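
The behaviour described above can be sketched as a client-side loop. This is only an illustration of the idea, not how FSM Gateway implements it: the function names `is_retryable`, `with_retry`, and `fetch_status`, the `RETRY_DELAY` variable, and the fixed pause between attempts are all assumptions made for the example.

```shell
#!/usr/bin/env bash
# Illustrative retry loop; every name in this block is hypothetical.

# is_retryable STATUS: succeed for a connection failure (curl reports 000)
# or for any 5xx status code.
is_retryable() {
  case "$1" in
    000|5[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# with_retry MAX_RETRIES CMD...: run CMD, which must print an HTTP status code.
# Re-run it while the status is retryable, for at most MAX_RETRIES extra
# attempts, then print the last status seen.
with_retry() {
  local max_retries=$1; shift
  local attempt=0 status
  while :; do
    status=$("$@")
    if ! is_retryable "$status" || [ "$attempt" -ge "$max_retries" ]; then
      echo "$status"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-0}"   # fixed pause between attempts (assumption)
  done
}

# fetch_status URL: print only the status code of a GET request.
fetch_status() {
  curl -s -o /dev/null -w '%{http_code}' "$1"
}
```

For instance, `with_retry 5 fetch_status "http://example.invalid/echo"` would make up to six attempts in total before printing the final status; the URL is a placeholder.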

Retry is one of the many Policy Attachments implemented by FSM Gateway.

This article walks through the retry feature of FSM Gateway.

Prerequisites

A Kubernetes cluster
The kubectl tool

Environment Setup

Install FSM Gateway

To install FSM Gateway, refer to the installation documentation. Here we install it with the CLI.

Download the FSM CLI.

# Detect the OS and CPU architecture to pick the matching release archive.
system=$(uname -s | tr '[:upper:]' '[:lower:]')
arch=$(uname -m | sed -E 's/x86_/amd/' | sed -E 's/aarch/arm/')
release=v1.2.0
# Download and unpack the CLI, check that it runs, then install it on the PATH.
curl -L https://github.com/flomesh-io/fsm/releases/download/$release/fsm-$release-$system-$arch.tar.gz | tar -vxzf -
./$system-$arch/fsm version
sudo cp ./$system-$arch/fsm /usr/local/bin/fsm

Enable FSM Gateway when installing FSM; it is disabled by default.

fsm install \
    --set=fsm.fsmGateway.enabled=true

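Deploy the Sample Application

We use fortio server as the sample application. Its response status code can be set through the status request parameter, together with the probability of that code being returned.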

kubectl create namespace server
kubectl apply -n server -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: fortio
  labels:
    app: fortio
    service: fortio
spec:
  ports:
  - port: 8080
    name: http-8080
  selector:
    app: fortio
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fortio
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fortio
  template:
    metadata:
      labels:
        app: fortio
    spec:
      containers:
      - name: fortio
        image: fortio/fortio:latest_release
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: http
EOF

Create the Gateway and Route

kubectl apply -n server -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: simple-fsm-gateway
spec:
  gatewayClassName: fsm-gateway-cls
  listeners:
  - protocol: HTTP
    port: 8000
    name: http
    allowedRoutes:
      namespaces:
        from: Same
---
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: fortio-route
spec:
  parentRefs:
  - name: simple-fsm-gateway
    port: 8000
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: fortio
      port: 8080
EOF

Check that the application is reachable.

export GATEWAY_IP=$(kubectl get svc -n server -l app=fsm-gateway -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')

curl -i http://$GATEWAY_IP:8000/echo
HTTP/1.1 200 OK
date: Fri, 05 Jan 2024 07:02:17 GMT
content-length: 0
connection: keep-alive

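Testing the Retry Policy

Before setting a retry policy, add the parameter status=503:10 so that fortio server returns 503 with a 10% probability. Use fortio load to generate load: of the 100 requests sent, close to 10% receive a 503 response.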

fortio load -quiet -c 1 -n 100 http://$GATEWAY_IP:8000/echo\?status\=503:10

Code 200 : 89 (89.0 %)
Code 503 : 11 (11.0 %)
All done 100 calls (plus 0 warmup) 1.054 ms avg, 8.0 qps
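
If the fortio CLI is not installed locally, a similar tally can be approximated with curl. The sketch below is an assumption-laden stand-in, not part of the original walkthrough: `send_requests` and `tally_codes` are hypothetical helper names, and the output only mimics the `Code ... : ...` lines printed by fortio.

```shell
# tally_codes: read one HTTP status code per line on stdin and print
# a count and percentage for each code, sorted by code.
tally_codes() {
  sort | uniq -c | awk '
    { count[$2] = $1; total += $1; order[++n] = $2 }
    END {
      for (i = 1; i <= n; i++)
        printf "Code %s : %d (%.1f %%)\n", order[i], count[order[i]], 100 * count[order[i]] / total
    }'
}

# send_requests N URL: issue N sequential GET requests and print
# the status code of each one on its own line.
send_requests() {
  local n=$1 url=$2 i
  for i in $(seq "$n"); do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  done
}
```

Usage would look like `send_requests 100 "http://$GATEWAY_IP:8000/echo?status=503:10" | tally_codes`.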

Next, set the retry policy.

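targetRef specifies the target resource the policy applies to. For a retry policy, the target can only be a Service from the Kubernetes core API or a ServiceImport from flomesh.io (the latter is used for multi-cluster). Here we specify fortio in the server namespace.
ports is the list of service ports. Because a service may expose several ports, a retry policy can be set for each port separately.
- port is the service port, set to 8080 of the fortio service in this example.
- config is the core configuration of the retry policy.
  - retryOn is the list of retryable response codes; for example, 5xx matches 500-599, while 500 matches only 500.
  - numRetries is the number of retries.
  - backoffBaseInterval is the base interval (in seconds) used to calculate the backoff, that is, the time to wait between consecutive retry attempts. Its main purpose is to avoid putting extra pressure on a service that is already having problems.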
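
The matching rule described for retryOn can be illustrated with a small shell sketch. It only mirrors the rule as stated in the list above (a class pattern such as 5xx versus an exact code such as 500); `status_matches` and `should_retry` are hypothetical names, and nothing here is taken from the FSM Gateway source.

```shell
# status_matches STATUS PATTERN: succeed if STATUS matches a retryOn-style
# pattern. A class pattern such as "5xx" covers 500-599; an exact code
# such as "500" matches only itself.
status_matches() {
  local status=$1 pattern=$2
  case "$pattern" in
    [1-5]xx) [ "${#status}" -eq 3 ] && [ "${status%??}" = "${pattern%xx}" ] ;;
    *)       [ "$status" = "$pattern" ] ;;
  esac
}

# should_retry STATUS PATTERN...: succeed if any of the patterns matches.
should_retry() {
  local status=$1 p; shift
  for p in "$@"; do
    status_matches "$status" "$p" && return 0
  done
  return 1
}
```

With these helpers, `should_retry 503 5xx` succeeds, while `should_retry 501 500 502` fails because neither exact code matches.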

For the full retry policy configuration, refer to the official RetryPolicy documentation.

kubectl apply -n server -f - <<EOF
apiVersion: gateway.flomesh.io/v1alpha1
kind: RetryPolicy
metadata:
  name: retry-policy-sample
spec:
  targetRef:
    kind: Service
    name: fortio
    namespace: server
  ports:
  - port: 8080
    config:
      retryOn:
      - 5xx
      numRetries: 5
      backoffBaseInterval: 2
EOF

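Once the policy takes effect, send the same 100 requests; every response is now a 200. Note that the average response time is longer, because the retries add latency.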

fortio load -quiet -c 1 -n 100 http://$GATEWAY_IP:8000/echo\?status\=503:10

Code 200 : 100 (100.0 %)
All done 100 calls (plus 0 warmup) 160.820 ms avg, 5.8 qps
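
A rough calculation suggests why no 503 reaches the client. It assumes that each attempt fails independently with probability p = 0.1 (the status=503:10 setting) and that numRetries: 5 allows up to five additional attempts after the first one; both are assumptions made for this estimate rather than statements from the FSM documentation.

```latex
P(\text{request fails}) = p^{\,1 + 5} = 0.1^{6} = 10^{-6},
\qquad
E[\text{failed requests out of } 100] = 100 \times 10^{-6} = 10^{-4}

E[\text{attempts per request}] = \sum_{k=0}^{5} p^{k} = \frac{1 - p^{6}}{1 - p} \approx 1.11
```

Under these assumptions roughly one request in nine needs at least one retry, which is consistent with the higher average latency in the output above.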

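About Flomesh

Flomesh (易衡科技) was founded in 2018. It independently developed and open-sourced Pipy (https://github.com/flomesh-io/pipy), a high-performance programmable proxy. Building on Pipy, Flomesh developed two software products: a software load balancer and a service mesh. They are certified by the Ministry of Industry and Information Technology (MIIT) as trusted cloud products and trusted open source projects.

Flomesh's core competitiveness comes from Pipy, a core component developed entirely in-house. Pipy is high-performance, highly reliable, low-latency, programmable, extensible, and has few dependencies. It is written in C++, embeds a self-developed JS engine, and supports extension development with JS scripts. It supports CPU architectures including x86, arm, Loongson, and Hygon, and operating systems built on various kernels, including Linux, FreeBSD, and OpenWrt.

Since its founding, Flomesh has been grounded in technology and oriented toward its customers. Its products are used in many scenarios by customers that include the head offices of leading joint-stock commercial banks, large insurance companies, the headquarters of telecom operators, and research institutes.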