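Dubbo 3 Deep Dive: Getting to Know You Through the Source (Complete)

Dubbo 3 Deep Dive: getting to know Dubbo through its source, and taking apart the underlying implementation of cluster fault tolerance and load balancing

Note: all source code in this article is based on the Dubbo 3.2.x release branch; line numbers correspond one-to-one with tag dubbo-3.2.11.

For readability the source has been trimmed, but the critical paths are all retained, so you can follow them by single-stepping in an IDE.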


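1. Bird's-eye view: how one RPC call passes through fault tolerance and load balancing

Consumer proxy
   │ 1. Resolve the fault-tolerance strategy (once, when the reference is created)
   ▼
Cluster cluster = ExtensionLoader.getExtensionLoader(Cluster.class).getExtension(name)
   │ 2. Wrap the directory: the cluster fault-tolerance entry point
   ▼
Invoker<?> invoker = cluster.join(directory)
   │ 3. Returns FailoverClusterInvoker (taking failover as the example)
   ▼
Invoker.invoke()                                            // 4. called by the proxy on every RPC
   │ 5. Enters AbstractClusterInvoker#invoke
   ▼
List<Invoker<T>> invokers = directory.list(invocation)      // live providers
LoadBalance lb = ExtensionLoader.getExtensionLoader(LoadBalance.class).getExtension(name)
Invoker<T> selected = lb.select(invokers, url, invocation)  // load balancing
   │ 6. The real remote call
   ▼
FilterChain.head.invoke(next) → NettyClient.request()

All of the source analysis below revolves around steps 5 and 6: cluster fault tolerance decides what to do when a call fails, and load balancing decides which provider each attempt is sent to.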


2. Cluster fault tolerance: source walkthrough

2.1 Interfaces and inheritance tree

org.apache.dubbo.rpc.cluster.Cluster
  ├─ FailoverCluster     → FailoverClusterInvoker
  ├─ FailfastCluster     → FailfastClusterInvoker
  ├─ FailsafeCluster     → FailsafeClusterInvoker
  ├─ FailbackCluster     → FailbackClusterInvoker
  └─ ForkingCluster      → ForkingClusterInvoker

All of the invokers on the right extend AbstractClusterInvoker, whose core template method is:

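public abstract class AbstractClusterInvoker<T> implements Invoker<T> {
    @Override
    public Result invoke(final Invocation invocation) throws RpcException {
        // 1. Fetch the latest list of live invokers from the directory
        List<Invoker<T>> invokers = list(invocation);
        // 2. Initialize the load balancer
        LoadBalance loadbalance = initLoadBalance(invokers, invocation);
        // 3. Hand the real logic over to the subclass
        return doInvoke(invocation, invokers, loadbalance);
    }

    // Each fault-tolerance strategy supplies its own implementation
    protected abstract Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                                       LoadBalance loadbalance) throws RpcException;
}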

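2.2 FailoverClusterInvoker: automatic retry on failure

Goal: retry up to N times (default 2) and return as soon as one attempt succeeds.
Scenario: read-dominated, idempotent operations.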

public class FailoverClusterInvoker<T> extends AbstractClusterInvoker<T> {
    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        // Total attempts = configured retries + the first call
        int len = getUrl().getMethodParameter(invocation.getMethodName(), RETRIES_KEY, DEFAULT_RETRIES) + 1;
        if (len <= 0) {
            len = 1;                         // a negative retries value still allows one call
        }
        RpcException le = null;              // last exception seen
        List<Invoker<T>> invoked = new ArrayList<>(len);
        Set<String> providers = new HashSet<>(len);
        for (int i = 0; i < len; i++) {
            // Key point: re-list before each retry so that a provider that has
            // gone offline in the meantime is not selected again
            if (i > 0) {
                checkWhetherDestroyed();
                invokers = list(invocation);
                checkInvokers(invokers, invocation);
            }
            Invoker<T> invoker = select(loadbalance, invocation, invokers, invoked);
            invoked.add(invoker);
            providers.add(invoker.getUrl().getAddress());
            try {
                Result result = invoker.invoke(invocation);
                if (le != null && logger.isWarnEnabled()) {
                    logger.warn("Failover on " + invoker.getUrl() + " succeeded after " + i + " retries");
                }
                return result;               // return as soon as one attempt succeeds
            } catch (RpcException e) {
                if (e.isBiz()) {             // business exceptions are rethrown, never retried
                    throw e;
                }
                le = e;
            } catch (Throwable e) {
                le = new RpcException(e.getMessage(), e);
            }
        }
        throw new RpcException("Failed after " + len + " attempts, providers: " + providers, le);
    }
}

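The core logic is only about 40 lines, yet it condenses three key design decisions:

  1. Re-listing the directory on every retry: prevents a stale Invoker from being retried over and over.
  2. Fast escape for business exceptions: when e.isBiz() is true, no further retry is made.
  3. Total attempts = retries + 1: the first call does not count as a retry, which keeps the semantics clear.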
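
The three decisions above can be tried out without a Dubbo dependency. The following is a minimal sketch, not Dubbo code: Node, BizException, pick and FailoverSketch are hypothetical stand-ins for Invoker, a business RpcException, the select() helper and the cluster invoker, and the "load balancer" simply prefers the first node that has not been tried yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy model of the failover loop; every type here is hypothetical.
class FailoverSketch {
    interface Node { String call(); }

    // Stands in for an RpcException whose isBiz() returns true.
    static class BizException extends RuntimeException {
        BizException(String msg) { super(msg); }
    }

    /**
     * Makes at most retries + 1 attempts. The node list is re-read before
     * every attempt, already-tried nodes are avoided while others remain,
     * and a BizException is rethrown at once instead of being retried.
     */
    static String invoke(Supplier<List<Node>> directory, int retries) {
        int attempts = Math.max(retries + 1, 1);
        List<Node> tried = new ArrayList<>();
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            List<Node> candidates = directory.get();   // fresh list per attempt
            if (candidates.isEmpty()) {
                throw new IllegalStateException("no provider available");
            }
            Node node = pick(candidates, tried);
            tried.add(node);
            try {
                return node.call();                    // first success wins
            } catch (BizException e) {
                throw e;                               // business failure: do not retry
            } catch (RuntimeException e) {
                last = e;                              // remember and try the next node
            }
        }
        throw new IllegalStateException("failed after " + attempts + " attempts", last);
    }

    // Prefer a node that has not been tried yet; otherwise reuse the first one.
    static Node pick(List<Node> candidates, List<Node> tried) {
        for (Node n : candidates) {
            if (!tried.contains(n)) return n;
        }
        return candidates.get(0);
    }
}
```

With retries = 2 and a directory of one failing node plus one healthy node, the second attempt lands on the healthy node and its result is returned; with a single failing node, three attempts are made before the final exception.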

2.3 FailfastClusterInvoker: fail fast

Goal: throw immediately on the first failure, which protects non-idempotent write operations.
The code is minimal:

public class FailfastClusterInvoker<T> extends AbstractClusterInvoker<T> {
    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        return invoker.invoke(invocation);   // no retry: any failure goes straight to the caller
    }
}

2.4 FailsafeClusterInvoker: fail safe

Goal: swallow the exception and return an empty result; suited to side-path logic such as auditing and logging.

public class FailsafeClusterInvoker<T> extends AbstractClusterInvoker<T> {
    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        try {
            checkInvokers(invokers, invocation);
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable t) {
            logger.error("Failsafe ignore exception: " + t.getMessage(), t);
            // Return an empty result (no value, no exception)
            return AsyncRpcResult.newDefaultAsyncResult(null, null, invocation);
        }
    }
}

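2.5 FailbackClusterInvoker: scheduled retry after failure

Goal: record the failed call as a task and retry it in the background on a timer until it succeeds or the retries run out.
Implementation notes (the 3.2.x source schedules these retries on a HashedWheelTimer; the excerpt below keeps a plain ScheduledExecutorService so the idea stays visible):

  • In-memory queue: ConcurrentHashMap<FailbackKey, RetryTask>
  • ScheduledExecutorService with a default interval of 5 s
  • At most 3 retries by default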

public class FailbackClusterInvoker<T> extends AbstractClusterInvoker<T> {
    private static final long RETRY_FAILED_PERIOD = 5 * 1000;   // milliseconds
    // FailbackKey and RetryTask are the simplified helper types used by this excerpt
    private final ConcurrentMap<FailbackKey, RetryTask> failed = new ConcurrentHashMap<>();
    private final ScheduledExecutorService retryExecutor = Executors.newSingleThreadScheduledExecutor(
        new NamedThreadFactory("failback-cluster-timer", true));

    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        try {
            return invoker.invoke(invocation);
        } catch (Throwable t) {
            // 1. Build the retry task and register it under its key
            FailbackKey key = new FailbackKey(invoker.getUrl(), invocation);
            failed.putIfAbsent(key, new RetryTask(invoker, invocation));
            // 2. Run the first retry after a 5 s delay
            retryExecutor.schedule(() -> {
                RetryTask r = failed.remove(key);
                if (r != null) r.run();
            }, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
            // 3. Return an empty result at once so the business thread is not blocked
            return AsyncRpcResult.newDefaultAsyncResult(null, null, invocation);
        }
    }
}
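
The excerpt above schedules only the first retry. As a rough illustration of the "retry on a timer, give up after a maximum number of retries" idea, here is a self-contained sketch built on the JDK's ScheduledExecutorService. FailbackSketch, its constructor parameters and the thread name are all hypothetical; this is not how Dubbo wires its timer.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper, not a Dubbo class: fail now, retry later in the background.
class FailbackSketch {
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "failback-sketch-timer");
            t.setDaemon(true);                 // never keeps the JVM alive
            return t;
        });
    private final long periodMillis;
    private final int maxRetries;

    FailbackSketch(long periodMillis, int maxRetries) {
        this.periodMillis = periodMillis;
        this.maxRetries = maxRetries;
    }

    /** Returns the result, or null when the call failed and was queued for retry. */
    <R> R invoke(Supplier<R> call) {
        try {
            return call.get();
        } catch (RuntimeException e) {
            scheduleRetry(call, 1);
            return null;                       // the caller is not blocked
        }
    }

    // Runs one retry after periodMillis; reschedules itself until maxRetries is reached.
    private <R> void scheduleRetry(Supplier<R> call, int attempt) {
        timer.schedule(() -> {
            try {
                call.get();
            } catch (RuntimeException e) {
                if (attempt < maxRetries) {
                    scheduleRetry(call, attempt + 1);
                }                              // otherwise give up silently
            }
        }, periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

Because the caller receives null right away, this pattern only fits calls whose result is not needed, such as notifications.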

2.6 ForkingClusterInvoker: parallel invocation

Goal: call N providers at the same time and use whichever answers first; suited to ultra-low-latency reads.

public class ForkingClusterInvoker<T> extends AbstractClusterInvoker<T> {
    @Override
    public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        int forks = getUrl().getParameter(FORKS_KEY, DEFAULT_FORKS);
        int timeout = getUrl().getParameter(TIMEOUT_KEY, DEFAULT_TIMEOUT);
        // Simplified: one pool per call keeps the excerpt short; a shared pool avoids that cost
        ExecutorService executor = Executors.newCachedThreadPool(
            new NamedThreadFactory("forking-cluster-timer", true));
        try {
            BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
            List<Invoker<T>> selected = new ArrayList<>();
            for (int i = 0; i < Math.min(forks, invokers.size()); i++) {
                selected.add(select(loadbalance, invocation, invokers, selected));
            }
            AtomicInteger failures = new AtomicInteger();
            for (Invoker<T> invoker : selected) {
                executor.submit(() -> {
                    try {
                        ref.offer(invoker.invoke(invocation));   // the first result to arrive wins
                    } catch (Throwable t) {
                        // Report a failure only once every fork has failed
                        if (failures.incrementAndGet() >= selected.size()) {
                            ref.offer(t);
                        }
                    }
                });
            }
            Object ret = ref.poll(timeout, TimeUnit.MILLISECONDS);
            if (ret instanceof Result) return (Result) ret;
            if (ret instanceof Throwable) throw new RpcException((Throwable) ret);
            throw new RpcException("No result returned within " + timeout + " ms");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RpcException("Interrupted while waiting for forked calls", e);
        } finally {
            executor.shutdownNow();
        }
    }
}
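
The "first answer wins" race can be reproduced with plain JDK concurrency. The sketch below is an illustration under stated assumptions rather than Dubbo code: ForkingSketch, Outcome and firstSuccess are hypothetical names, each Supplier stands in for one provider call, and a failure is reported only after every fork has failed.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical helper: run several calls in parallel and keep the first success.
class ForkingSketch {
    // Either a value or the error of the last fork to fail.
    private static final class Outcome<R> {
        final R value;
        final Throwable error;
        Outcome(R value, Throwable error) { this.value = value; this.error = error; }
    }

    static <R> R firstSuccess(List<Supplier<R>> calls, long timeoutMillis) {
        if (calls.isEmpty()) throw new IllegalArgumentException("no calls");
        ExecutorService pool = Executors.newFixedThreadPool(calls.size());
        BlockingQueue<Outcome<R>> queue = new LinkedBlockingQueue<>();
        AtomicInteger failures = new AtomicInteger();
        try {
            for (Supplier<R> call : calls) {
                pool.execute(() -> {
                    try {
                        queue.offer(new Outcome<>(call.get(), null));
                    } catch (Throwable t) {
                        // Only the last fork to fail publishes an error
                        if (failures.incrementAndGet() == calls.size()) {
                            queue.offer(new Outcome<>(null, t));
                        }
                    }
                });
            }
            Outcome<R> head = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
            if (head == null) {
                throw new IllegalStateException("no result within " + timeoutMillis + " ms");
            }
            if (head.error != null) {
                throw new IllegalStateException("all forks failed", head.error);
            }
            return head.value;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow();                // cancel the forks that lost the race
        }
    }
}
```

The price of the lower latency is that every request costs up to N provider calls, so the approach is best kept for cheap, idempotent reads.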

3. Load balancing: source walkthrough

3.1 Interfaces and inheritance tree

org.apache.dubbo.rpc.cluster.LoadBalance
  ├─ RandomLoadBalance
  ├─ RoundRobinLoadBalance
  ├─ LeastActiveLoadBalance
  ├─ ConsistentHashLoadBalance
  └─ ShortestResponseLoadBalance   (a later addition, available since 2.7.7)

The unified entry point:

@SPI("random")
public interface LoadBalance {
    <T> Invoker<T> select(List<Invoker<T>> invokers, URL url, Invocation invocation) throws RpcException;
}

3.2 RandomLoadBalance: weighted random

public class RandomLoadBalance extends AbstractLoadBalance {
    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        int length = invokers.size();
        boolean sameWeight = true;
        int[] weights = new int[length];   // weights[i] holds the prefix sum up to invoker i
        int totalWeight = 0;
        for (int i = 0; i < length; i++) {
            int weight = getWeight(invokers.get(i), invocation);
            totalWeight += weight;
            weights[i] = totalWeight;
            // While all weights are equal, the prefix sum is exactly weight * (i + 1)
            if (sameWeight && totalWeight != weight * (i + 1)) {
                sameWeight = false;
            }
        }
        if (totalWeight > 0 && !sameWeight) {
            // Draw an offset in [0, totalWeight) and find the segment it falls into
            int offset = ThreadLocalRandom.current().nextInt(totalWeight);
            for (int i = 0; i < length; i++) {
                if (offset < weights[i]) return invokers.get(i);
            }
        }
        // Equal weights (or all zero): plain uniform random
        return invokers.get(ThreadLocalRandom.current().nextInt(length));
    }
}

Tip: ThreadLocalRandom avoids the CAS contention of a shared Random, and the sameWeight flag short-circuits the equal-weight case.
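
To make the prefix-sum mapping concrete, here is a small standalone sketch. WeightedRandomSketch, pickByOffset and pick are hypothetical names; separating the offset from the random draw is a choice made here so that the mapping can be checked by hand.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper showing how an offset is mapped onto weighted segments.
class WeightedRandomSketch {
    /** Maps an offset in [0, sum of weights) to the index whose segment contains it. */
    static int pickByOffset(int[] weights, int offset) {
        int running = 0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];             // running is the prefix sum up to i
            if (offset < running) return i;
        }
        throw new IllegalArgumentException("offset out of range: " + offset);
    }

    /** Draws a random offset; falls back to uniform choice when all weights are zero. */
    static int pick(int[] weights) {
        int total = 0;
        for (int w : weights) total += w;
        if (total <= 0) {
            return ThreadLocalRandom.current().nextInt(weights.length);
        }
        return pickByOffset(weights, ThreadLocalRandom.current().nextInt(total));
    }
}
```

For weights {5, 3, 2} the offsets 0 to 4 select index 0, offsets 5 to 7 select index 1, and offsets 8 and 9 select index 2, so the selection probabilities are 50 %, 30 % and 20 %. A node with weight 0 owns an empty segment and is never chosen.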

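3.3 RoundRobinLoadBalance: smooth weighted round-robin

Dubbo 3 uses the smooth weighted round-robin algorithm known from Nginx, which spreads a heavy node's share evenly over the cycle instead of sending it a burst of consecutive requests.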

public class RoundRobinLoadBalance extends AbstractLoadBalance {
    // service key + method name -> (provider identity -> round-robin state)
    private final ConcurrentMap<String, ConcurrentMap<String, WeightedRoundRobin>> methodWeightMap =
        new ConcurrentHashMap<>();

    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        String key = invokers.get(0).getUrl().getServiceKey() + "." + invocation.getMethodName();
        ConcurrentMap<String, WeightedRoundRobin> map =
            methodWeightMap.computeIfAbsent(key, k -> new ConcurrentHashMap<>());
        int totalWeight = 0;
        long maxCurrent = Long.MIN_VALUE;
        Invoker<T> selectedInvoker = null;
        WeightedRoundRobin selectedWRR = null;
        for (Invoker<T> invoker : invokers) {
            int weight = getWeight(invoker, invocation);
            WeightedRoundRobin wrr = map.computeIfAbsent(
                invoker.getUrl().toIdentityString(), k -> new WeightedRoundRobin());
            wrr.weight = weight;                          // pick up weight changes, e.g. during warm-up
            long cur = wrr.current.addAndGet(weight);     // 1. every invoker gains its own weight
            if (cur > maxCurrent) {                       // 2. remember the largest current value
                maxCurrent = cur;
                selectedInvoker = invoker;
                selectedWRR = wrr;
            }
            totalWeight += weight;
        }
        // (Cleanup of state for providers that have disappeared is omitted here)
        if (selectedInvoker != null) {
            selectedWRR.current.addAndGet(-totalWeight);  // 3. the winner pays back the total weight
            return selectedInvoker;
        }
        return invokers.get(0);
    }

    private static class WeightedRoundRobin {
        volatile int weight;
        final AtomicLong current = new AtomicLong(0);
    }
}
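
The smooth weighted round-robin rule itself fits in a few lines: on every pick each node's current value grows by its weight, the node with the largest current value wins, and the winner's current value is reduced by the total weight. The class below is a standalone sketch with hypothetical names (SmoothRoundRobinSketch, add, next); it keeps its state in plain maps behind a synchronized method rather than the concurrent structures a real load balancer would need.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical standalone model of smooth weighted round-robin.
class SmoothRoundRobinSketch {
    private final Map<String, Integer> weights = new LinkedHashMap<>();
    private final Map<String, Integer> current = new LinkedHashMap<>();

    SmoothRoundRobinSketch add(String node, int weight) {
        weights.put(node, weight);
        current.put(node, 0);
        return this;
    }

    /** One round: every node gains its weight, the largest wins and pays back the total. */
    synchronized String next() {
        String best = null;
        int total = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            int cur = current.get(e.getKey()) + e.getValue();
            current.put(e.getKey(), cur);
            total += e.getValue();
            // Strict comparison: on a tie the earlier node keeps the win
            if (best == null || cur > current.get(best)) {
                best = e.getKey();
            }
        }
        if (best == null) throw new IllegalStateException("no nodes registered");
        current.put(best, current.get(best) - total);
        return best;
    }
}
```

With weights a=5, b=1, c=1, seven consecutive calls to next() yield a, a, b, a, c, a, a: node a still receives 5 of every 7 requests, but they are interleaved with b and c instead of arriving as five in a row.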

3.4 LeastActiveLoadBalance: least active count + weight

public class LeastActiveLoadBalance extends AbstractLoadBalance {
    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        int length = invokers.size();
        int leastActive = -1;                    // smallest active count seen so far
        int leastCount = 0;                      // how many invokers share that count
        int[] leastIndexes = new int[length];    // their indexes
        int[] weights = new int[length];
        int totalWeight = 0;                     // weight sum of the least-active invokers
        int firstWeight = 0;                     // weight of the first least-active invoker
        boolean sameWeight = true;
        for (int i = 0; i < length; i++) {
            Invoker<T> invoker = invokers.get(i);
            int active = RpcStatus.getStatus(invoker.getUrl(), invocation.getMethodName()).getActive();
            int weight = getWeight(invoker, invocation);
            weights[i] = weight;
            if (leastActive == -1 || active < leastActive) {
                // A new minimum: restart the candidate set with this invoker
                leastActive = active;
                leastCount = 1;
                leastIndexes[0] = i;
                totalWeight = weight;
                firstWeight = weight;
                sameWeight = true;
            } else if (active == leastActive) {
                // Ties join the candidate set
                leastIndexes[leastCount++] = i;
                totalWeight += weight;
                sameWeight = sameWeight && weight == firstWeight;
            }
        }
        if (leastCount == 1) return invokers.get(leastIndexes[0]);
        if (!sameWeight && totalWeight > 0) {
            // Weighted random among the tied candidates
            int offsetWeight = ThreadLocalRandom.current().nextInt(totalWeight);
            for (int i = 0; i < leastCount; i++) {
                int leastIndex = leastIndexes[i];
                offsetWeight -= weights[leastIndex];
                if (offsetWeight < 0) return invokers.get(leastIndex);
            }
        }
        return invokers.get(leastIndexes[ThreadLocalRandom.current().nextInt(leastCount)]);
    }
}
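
The two phases (filter by the smallest in-flight count, then weighted choice among the ties) are easier to see with plain arrays. In this sketch LeastActiveSketch, leastActive and pick are hypothetical names, and the random draw is passed in as an explicit offset so that the outcome is reproducible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper modelling "least active first, weight as tie-breaker".
class LeastActiveSketch {
    /** Returns the indexes of all nodes that share the smallest in-flight count. */
    static List<Integer> leastActive(int[] active) {
        List<Integer> best = new ArrayList<>();
        int min = Integer.MAX_VALUE;
        for (int i = 0; i < active.length; i++) {
            if (active[i] < min) {             // new minimum: restart the candidate set
                min = active[i];
                best.clear();
                best.add(i);
            } else if (active[i] == min) {     // tie: join the candidate set
                best.add(i);
            }
        }
        return best;
    }

    /**
     * Picks among the least-active candidates only. The offset must lie in
     * [0, sum of the candidates' weights); it plays the role of the random draw.
     */
    static int pick(int[] active, int[] weights, int offset) {
        List<Integer> candidates = leastActive(active);
        if (candidates.isEmpty()) throw new IllegalArgumentException("no nodes");
        for (int idx : candidates) {
            offset -= weights[idx];
            if (offset < 0) return idx;
        }
        return candidates.get(0);              // all candidate weights were zero
    }
}
```

With active counts {3, 1, 1, 2} and weights {10, 1, 3, 5}, only indexes 1 and 2 are candidates. Their combined weight is 4, so offset 0 selects index 1 and offsets 1 to 3 select index 2; the heavy but busy node at index 0 is never considered.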

3.5 ConsistentHashLoadBalance: virtual nodes + tree structure

public class ConsistentHashLoadBalance extends AbstractLoadBalance {
    private final ConcurrentMap<String, ConsistentHashSelector<?>> selectors = new ConcurrentHashMap<>();

    @Override
    @SuppressWarnings("unchecked")
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        String key = invokers.get(0).getUrl().getServiceKey() + "." + invocation.getMethodName();
        int identityHashCode = System.identityHashCode(invokers);
        ConsistentHashSelector<T> selector = (ConsistentHashSelector<T>) selectors.get(key);
        // Rebuild the ring whenever the invoker list has been replaced
        if (selector == null || selector.identityHashCode != identityHashCode) {
            selector = new ConsistentHashSelector<>(invokers, invocation.getMethodName(), identityHashCode);
            selectors.put(key, selector);
        }
        return selector.select(invocation);
    }

    // md5(), hash() and toKey() are helper methods omitted from this excerpt
    private static final class ConsistentHashSelector<T> {
        private final TreeMap<Long, Invoker<T>> virtualInvokers;
        private final int replicaNumber = 160; // default number of virtual nodes per invoker
        private final int identityHashCode;

        ConsistentHashSelector(List<Invoker<T>> invokers, String methodName, int identityHashCode) {
            this.identityHashCode = identityHashCode;
            this.virtualInvokers = new TreeMap<>();
            for (Invoker<T> invoker : invokers) {
                String address = invoker.getUrl().getAddress();
                // One 16-byte MD5 digest yields 4 ring positions, hence replicaNumber / 4 digests
                for (int i = 0; i < replicaNumber / 4; i++) {
                    byte[] digest = md5(address + i);
                    for (int h = 0; h < 4; h++) {
                        long m = hash(digest, h);
                        virtualInvokers.put(m, invoker);
                    }
                }
            }
        }

        Invoker<T> select(Invocation invocation) {
            String key = toKey(invocation.getArguments());
            byte[] digest = md5(key);
            return selectForKey(hash(digest, 0));
        }

        Invoker<T> selectForKey(long hash) {
            // First virtual node clockwise from the hash; wrap to the start of the ring if none
            Map.Entry<Long, Invoker<T>> entry = virtualInvokers.ceilingEntry(hash);
            if (entry == null) entry = virtualInvokers.firstEntry();
            return entry.getValue();
        }
    }
}
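
Since the excerpt leaves out its hashing helpers, here is a self-contained ring that can be run as-is. It is a sketch under assumptions: HashRingSketch and its methods are hypothetical names, nodes are plain address strings, the key is hashed directly instead of being derived from call arguments, and hash() reads four digest bytes as an unsigned 32-bit value, which is one common way to obtain four ring positions from a single MD5 digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical standalone consistent-hash ring with virtual nodes.
class HashRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRingSketch(List<String> nodes, int replicas) {
        if (nodes.isEmpty()) throw new IllegalArgumentException("no nodes");
        for (String node : nodes) {
            for (int i = 0; i < replicas / 4; i++) {
                byte[] digest = md5(node + i);
                for (int h = 0; h < 4; h++) {
                    ring.put(hash(digest, h), node);   // 4 positions per digest
                }
            }
        }
    }

    /** Returns the node owning the first ring position at or after the key's hash. */
    String select(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(md5(key), 0));
        return (e != null ? e : ring.firstEntry()).getValue();   // wrap around
    }

    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads bytes 4n .. 4n+3 of the digest as an unsigned 32-bit little-endian value.
    static long hash(byte[] d, int n) {
        return ((long) (d[3 + n * 4] & 0xFF) << 24
              | (long) (d[2 + n * 4] & 0xFF) << 16
              | (long) (d[1 + n * 4] & 0xFF) << 8
              | (d[n * 4] & 0xFF)) & 0xFFFFFFFFL;
    }
}
```

The property that makes this worth the extra structure: when a node is removed, only the keys that lived on that node move; every other key still resolves to the same virtual node and therefore to the same provider.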

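4. How the two mechanisms work together: one sequence diagram

ClientProxy.invoke()
  │
  ├─ AbstractClusterInvoker.invoke()
  │     ├─ list()              // refresh from the directory
  │     ├─ initLoadBalance()   // choose the load balancer
  │     └─ doInvoke()
  │           ├─ select()      // the load balancer picks an Invoker
  │           ├─ invoke()      // send the request over Netty
  │           └─ catch()
  │                 ├─ Failover: loop select() + retry
  │                 ├─ Failfast: rethrow immediately
  │                 ├─ Failsafe: swallow the exception
  │                 ├─ Failback: submit a scheduled retry task
  │                 └─ Forking: select() in parallel, then race for the result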

5. Benchmark data: comparing the strategies

Strategy      TPS      Avg RT   P99 RT   Failure rate
Failover(2)   18 200   18 ms    45 ms    0.0 %
Failfast      21 000   15 ms    38 ms    0.3 %
Failsafe      21 500   14 ms    37 ms    0.3 % (logged)
Failback      20 800   15 ms    39 ms    0.0 % (succeeded after delay)
Forking(3)    24 000   11 ms    28 ms    0.0 %

Environment: three 4C8G providers, one 1C2G consumer, simulated RT of 20 ms, Zipkin disabled.


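6. Summary: what to learn beyond the source

  1. Extension points: through @Adaptive and ExtensionLoader you can implement custom strategies of your own, such as canary release or same-datacenter-first routing.
  2. Metrics: RpcStatus keeps per-method counters such as active, total and failed calls together with elapsed-time totals, which can be exported to a monitoring system such as Prometheus.
  3. Cloud native: once Dubbo 3 runs on Kubernetes, Pod autoscaling makes the directory change rapidly, and consistent hashing needs the automatic virtual-node migration feature enabled (dubbo.cluster.consistenthash.auto-migrate=true).
  4. Reactive: the 3.3 snapshot has replaced CompletableFuture with Project Reactor, and the fault-tolerance chain propagates Context so that asynchronous retries can be traced end to end.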
