Troubleshooting unexpected 502 errors when nginx is configured with proxy_next_upstream

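When nginx proxies several gateway instances and a GET endpoint of the requested service fails with one of the conditions that proxy_next_upstream can match (error, timeout, invalid_header, http_500, http_502, http_503, http_504), nginx may respond with a 502 status code.

My previous understanding was that nginx only forwards the backend's response and generally does not change the status code.

The nginx configuration is as follows: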

worker_processes  1;
daemon off;
master_process off; 
error_log  logs/error.log  debug; 
events {
    worker_connections  1024;
}
http {
    include       mime.types;
    default_type  application/octet-stream;
     log_format apm '[$time_local]\tclient=$remote_addr\t'
               'upstream_addr=$upstream_addr\t'
               'upstream_status=$upstream_status\t'
               'document_root="$document_root"\t'
               'fastcgi_script_name="$fastcgi_script_name"\t'
               'request_filename="$request_filename"\t'
               'request_time=$request_time\t'
               'upstream_response_time=$upstream_response_time\t'
               'upstream_connect_time=$upstream_connect_time\t'
               'upstream_header_time=$upstream_header_time\t';
    access_log  logs/access.log  apm;
    sendfile        on; 
    keepalive_timeout  65;
    upstream gateway {
        server 192.168.2.102:12012;
        server 192.168.2.102:12011;
    }
    server {
        listen       80;
        server_name  localhost; 
        location / {
            root   html;
            index  index.html index.htm;
        }
        location /api/ {
            proxy_pass http://gateway/;
            proxy_next_upstream error http_503 http_502;
        } 
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        } 
    }
}

Sample test code:

    // Test endpoint: sleeps 200 ms, then always answers 503.
    @GetMapping("/excep503")
    public ResponseEntity<String>  excep503(HttpServletRequest request, Integer times) throws InterruptedException {
        Thread.sleep(200);
        return ResponseEntity.status(HttpStatus.SERVICE_UNAVAILABLE).body("Service unavailable");
    }

Test method:

Send repeated GET requests to the failing endpoint.

Observed behavior:

The response is sometimes 502 and sometimes 503.


When 503 is returned

upstream_addr in access_log contains two addresses: 192.168.2.102:12012, 192.168.2.102:12011

error_log shows both gateways being requested in turn:

First nginx connects to 192.168.2.102:12011;

192.168.2.102:12011 returns 503 Service Unavailable

and nginx logs the error

upstream server temporarily disabled while reading response header from upstream

Then nginx moves on and connects to 192.168.2.102:12012

192.168.2.102:12012 also returns 503 Service Unavailable

When 502 is returned

upstream_addr in access_log contains only one address: upstream_addr=192.168.2.102:12011

error_log shows only one request to a gateway:

nginx connects to 192.168.2.102:12011;

192.168.2.102:12011 returns 503 Service Unavailable

and nginx logs the errors

upstream server temporarily disabled while reading response header from upstream
no live upstreams while connecting to upstream

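Why 502 is returned

According to the material I consulted, the ft_type passed in is 40000000 (hexadecimal, which is NGX_HTTP_UPSTREAM_FT_NOLIVE in the nginx source). It has no case of its own and falls through to default, so the final status is NGX_HTTP_BAD_GATEWAY, i.e. 502.

nginx-1.24.0\src\http\ngx_http_upstream.c (ngx_http_upstream_next), line 4370: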

switch (ft_type) {

    case NGX_HTTP_UPSTREAM_FT_TIMEOUT:
    case NGX_HTTP_UPSTREAM_FT_HTTP_504:
        status = NGX_HTTP_GATEWAY_TIME_OUT;
        break;

    case NGX_HTTP_UPSTREAM_FT_HTTP_500:
        status = NGX_HTTP_INTERNAL_SERVER_ERROR;
        break;

    case NGX_HTTP_UPSTREAM_FT_HTTP_503:
        status = NGX_HTTP_SERVICE_UNAVAILABLE;
        break;

    /*
     * NGX_HTTP_UPSTREAM_FT_BUSY_LOCK and NGX_HTTP_UPSTREAM_FT_MAX_WAITING
     * never reach here
     */

    default:
        status = NGX_HTTP_BAD_GATEWAY;
    }

Where the 502 and 503 paths diverge:

nginx-1.24.0\src\http\ngx_http_upstream_round_robin.c (ngx_http_upstream_get_round_robin_peer), line 449:

    peers = rrp->peers;
    ngx_http_upstream_rr_peers_wlock(peers);

    if (peers->single) {
        peer = peers->peer;

        if (peer->down) {
            goto failed;
        }

        if (peer->max_conns && peer->conns >= peer->max_conns) {
            goto failed;
        }

        rrp->current = peer;

    } else {

        peer = ngx_http_upstream_get_peer(rrp);

        if (peer == NULL) {
            goto failed;
        }

        ngx_log_debug2(NGX_LOG_DEBUG_HTTP, pc->log, 0,
                       "get rr peer, current: %p %i",
                       peer, peer->current_weight);
    }

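The single flag marks an upstream group that has only one server. Two servers are configured here, so the else branch runs: when ngx_http_upstream_get_peer finds no usable peer it returns NULL and the code jumps to failed, which is the "no live upstreams" case that ends in 502.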

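So the question now is:

Why does upstream_addr sometimes contain two addresses and sometimes only one?

Debugging the nginx source

At startup nginx gives every backend node a default timeout of 10s (the fail_timeout parameter of the server directive; max_fails defaults to 1).

When a failure occurs, the node is marked unavailable and is skipped until that timeout has passed:

nginx-1.24.0/src/http/ngx_http_upstream_round_robin.c (ngx_http_upstream_get_peer), line 522: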

    for (peer = rrp->peers->peer, i = 0;
         peer;
         peer = peer->next, i++)
    {
        n = i / (8 * sizeof(uintptr_t));
        m = (uintptr_t) 1 << i % (8 * sizeof(uintptr_t));

        if (rrp->tried[n] & m) {
            continue;
        }

        if (peer->down) {
            continue;
        }

        if (peer->max_fails
            && peer->fails >= peer->max_fails
            && now - peer->checked <= peer->fail_timeout)
        {
            continue;
        }

        if (peer->max_conns && peer->conns >= peer->max_conns) {
            continue;
        }

        peer->current_weight += peer->effective_weight;
        total += peer->effective_weight;

        if (peer->effective_weight < peer->weight) {
            peer->effective_weight++;
        }

        if (best == NULL || peer->current_weight > best->current_weight) {
            best = peer;
            p = i;
        }
    }

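Verification

Requesting the endpoint continuously shows that the 503 error comes back every 10 seconds, which matches the hypothesis.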

