Dubbo 3 Deep Dive: Getting to Know It Through the Source (Final Part)

Dubbo 3 Deep Dive: getting to know it through the source, and dissecting how cluster fault tolerance and load balancing work underneath

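Note: all source code in this article is based on the Dubbo 3.2.x release branch, and line numbers correspond one-to-one with tag dubbo-3.2.11.

For readability the source has been trimmed, but every key path is kept and can be single-stepped directly in an IDE.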


1. Bird's-eye view: how one RPC call passes through fault tolerance and load balancing

Consumer proxy
   │ 1. invoke() is called
   ▼
Invoker<?> invoker = cluster.join(directory)   // cluster fault-tolerance entry point
   │ 2. resolve the load-balancing strategy
   ▼
LoadBalance lb = ExtensionLoader.getExtension(loadbalance)
   │ 3. resolve the fault-tolerance strategy
   ▼
Cluster cluster = ExtensionLoader.getExtension(cluster)
   │ 4. returns FailoverClusterInvoker (taking failover as the example)
   ▼
Invoker.invoke()
   │ 5. enters AbstractClusterInvoker#invoke
   ▼
List<Invoker<T>> invokers = directory.list(invocation)            // live providers
Invoker<T> selected = lb.select(invokers, getUrl(), invocation)   // load balancing
   │ 6. the remote call is actually sent
   ▼
FilterChain.head.invoke(next) → NettyClient.request()

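All of the source analysis below revolves around steps 5 and 6: cluster fault tolerance decides what to do when a call fails, and load balancing decides which provider each attempt is sent to.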


2. Cluster fault tolerance: source walkthrough

2.1 Interface and inheritance tree

org.apache.dubbo.rpc.cluster.Cluster
  ├─ FailoverCluster     → FailoverClusterInvoker
  ├─ FailfastCluster     → FailfastClusterInvoker
  ├─ FailsafeCluster     → FailsafeClusterInvoker
  ├─ FailbackCluster     → FailbackClusterInvoker
  └─ ForkingCluster      → ForkingClusterInvoker

Every ClusterInvoker on the right-hand side extends AbstractClusterInvoker, whose core template method is:

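public abstract class AbstractClusterInvoker<T> implements Invoker<T> {
    public Result invoke(final Invocation invocation) throws RpcException {
        // 1. Fetch the latest list of live providers
        List<Invoker<T>> invokers = list(invocation);
        // 2. Initialize the load balancer
        LoadBalance loadbalance = initLoadBalance(invokers, invocation);
        // 3. Hand the real logic to the subclass
        return doInvoke(invocation, invokers, loadbalance);
    }

    // Hook implemented by each fault-tolerance strategy
    protected abstract Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                                       LoadBalance loadbalance) throws RpcException;
}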

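2.2 FailoverClusterInvoker: automatic retry on failure

Goal: retry up to N times (default 2) and return as soon as one attempt succeeds.
Scenario: read-heavy, strongly idempotent operations.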

public class FailoverClusterInvoker<T> extends AbstractClusterInvoker<T> {
    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        int len = getUrl().getMethodParameter(invocation.getMethodName(), RETRIES_KEY, DEFAULT_RETRIES) + 1;
        RpcException le = null;
        List<Invoker<T>> invoked = new ArrayList<>(len);
        Set<String> providers = new HashSet<>(len);
        for (int i = 0; i < len; i++) {
            // Key point: re-list on every retry so that an Invoker whose provider went offline is not selected
            if (i > 0) {
                checkWhetherDestroyed();
                invokers = list(invocation);
            }
            Invoker<T> invoker = select(loadbalance, invocation, invokers, invoked);
            invoked.add(invoker);
            providers.add(invoker.getUrl().getAddress());
            try {
                Result result = invoker.invoke(invocation);
                if (le != null && logger.isWarnEnabled()) {
                    logger.warn("Failover on " + invoker.getUrl() + " succeeded after " + i + " retries");
                }
                return result;               // return as soon as one attempt succeeds
            } catch (RpcException e) {
                if (e.isBiz()) {             // business exceptions are rethrown, never retried
                    throw e;
                }
                le = e;
            } catch (Throwable e) {
                le = new RpcException(e.getMessage(), e);
            }
        }
        throw new RpcException("Failed after retries: " + len + ", providers: " + providers, le);
    }
}

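The core logic is about 40 lines, yet it packs in three key design decisions:

  1. Re-listing the directory on every retry: a stale Invoker is never retried over and over.
  2. Fast escape for business exceptions: when e.isBiz() is true, no retry happens.
  3. Attempt count = retries + 1: the first call does not count as a retry, which keeps the semantics clear.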
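
The three decisions above can be sketched outside Dubbo. What follows is a minimal illustration under stated assumptions, not Dubbo's implementation: FailoverSketch, BizException, and the select helper are hypothetical names, providers are plain Supplier instances, and a "prefer a provider that has not been tried yet" rule stands in for the real load balancer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of a failover loop; not Dubbo code.
final class FailoverSketch {
    /** Stand-in for a business error: rethrown at once, never retried. */
    static final class BizException extends RuntimeException {
        BizException(String msg) { super(msg); }
    }

    /** Makes at most retries + 1 attempts and returns the first successful result. */
    static <R> R invoke(List<Supplier<R>> providers, int retries) {
        if (providers.isEmpty()) {
            throw new IllegalArgumentException("no provider available");
        }
        int attempts = Math.max(retries, 0) + 1;   // the first call is not a retry
        List<Supplier<R>> invoked = new ArrayList<>();
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            Supplier<R> chosen = select(providers, invoked, i);
            invoked.add(chosen);
            try {
                return chosen.get();
            } catch (BizException e) {
                throw e;                           // business errors escape immediately
            } catch (RuntimeException e) {
                last = e;                          // remember the failure and try again
            }
        }
        throw new IllegalStateException("failed after " + attempts + " attempts", last);
    }

    // Prefer a provider that has not been tried yet; otherwise cycle through the list.
    private static <R> Supplier<R> select(List<Supplier<R>> providers,
                                          List<Supplier<R>> invoked, int round) {
        for (Supplier<R> p : providers) {
            if (!invoked.contains(p)) {
                return p;
            }
        }
        return providers.get(round % providers.size());
    }
}
```

With one failing and one healthy provider, FailoverSketch.invoke(list, 2) returns the healthy provider's result on the second attempt; a BizException from the first provider would propagate without touching the second.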

2.3 FailfastClusterInvoker: fail fast

Goal: throw as soon as the first call fails, which protects non-idempotent write operations.
The code is minimal:

public class FailfastClusterInvoker<T> extends AbstractClusterInvoker<T> {
    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        checkInvokers(invokers, invocation);
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        return invoker.invoke(invocation);   // no try-catch at all
    }
}

2.4 FailsafeClusterInvoker: fail safe

Goal: swallow the exception and return an empty result; suited to side-path logic such as auditing and logging.

public class FailsafeClusterInvoker<T> extends AbstractClusterInvoker<T> {
    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        try {
            Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
            return invoker.invoke(invocation);
        } catch (Throwable t) {
            logger.error("Failsafe ignore exception: " + t.getMessage(), t);
            return AsyncRpcResult.newDefaultAsyncResult(null, invocation); // return an empty result
        }
    }
}

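2.5 FailbackClusterInvoker: scheduled retry after failure

Goal: record the failed call and retry it in the background on a timer until it succeeds or the retry limit is reached.
Implementation notes:

  • Pending retries are held in memory in a ConcurrentHashMap<FailbackKey, RetryTask>.
  • A ScheduledExecutorService fires the retries; the default interval is 5 s.
  • At most 3 retries by default.

public class FailbackClusterInvoker<T> extends AbstractClusterInvoker<T> {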
    private static final long RETRY_FAILED_PERIOD = 5 * 1000;
    private final ConcurrentMap<FailbackKey, RetryTask> failed = new ConcurrentHashMap<>();
    private final ScheduledExecutorService retryExecutor = Executors.newSingleThreadScheduledExecutor(
        new NamedThreadFactory("failback-cluster-timer", true));

    @Override
    public Result doInvoke(Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        Invoker<T> invoker = select(loadbalance, invocation, invokers, null);
        try {
            return invoker.invoke(invocation);
        } catch (Throwable t) {
            // 1. Build the retry task and register it under its key
            FailbackKey key = new FailbackKey(invoker.getUrl(), invocation);
            failed.putIfAbsent(key, new RetryTask(invoker, invocation));
            // 2. Run it after a 5 s delay (retry-count bookkeeping is trimmed from this listing)
            retryExecutor.schedule(() -> {
                RetryTask r = failed.remove(key);
                if (r != null) r.run();
            }, RETRY_FAILED_PERIOD, TimeUnit.MILLISECONDS);
            // 3. Return an empty result immediately so the business thread is not blocked
            return AsyncRpcResult.newDefaultAsyncResult(null, invocation);
        }
    }
}
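
The trimmed listing schedules one retry; the "at most N retries at a fixed interval" rule from the implementation notes can be sketched as follows. This is an illustrative sketch only: FailbackSketch, its constructor parameters, and the onGiveUp callback are hypothetical, and a plain ScheduledExecutorService is assumed as the timer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of failback with a bounded number of background retries.
final class FailbackSketch {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "failback-sketch-timer");
        t.setDaemon(true);                  // do not keep the JVM alive for retries
        return t;
    });
    private final int maxRetries;
    private final long periodMillis;

    FailbackSketch(int maxRetries, long periodMillis) {
        this.maxRetries = maxRetries;
        this.periodMillis = periodMillis;
    }

    /** Calls once; on failure returns null immediately and retries in the background. */
    <R> R invoke(Callable<R> call, Runnable onGiveUp) {
        try {
            return call.call();
        } catch (Exception e) {
            scheduleRetry(call, 1, onGiveUp);
            return null;                    // empty result, the caller is not blocked
        }
    }

    private <R> void scheduleRetry(Callable<R> call, int attempt, Runnable onGiveUp) {
        if (attempt > maxRetries) {
            onGiveUp.run();                 // retry budget exhausted
            return;
        }
        timer.schedule(() -> {
            try {
                call.call();                // success: nothing more to schedule
            } catch (Exception e) {
                scheduleRetry(call, attempt + 1, onGiveUp);
            }
        }, periodMillis, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        timer.shutdownNow();
    }
}
```

Because the first caller already received an empty result, a background retry that succeeds is useful only for its side effect, which is why this strategy fits notifications and similar fire-and-forget calls.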

2.6 ForkingClusterInvoker: parallel fan-out

Goal: call N providers at the same time and use whichever answers first; suited to ultra-low-latency reads.

public class ForkingClusterInvoker<T> extends AbstractClusterInvoker<T> {
    @Override
    public Result doInvoke(final Invocation invocation, List<Invoker<T>> invokers,
                           LoadBalance loadbalance) throws RpcException {
        int forks = getUrl().getParameter(FORKS_KEY, DEFAULT_FORKS);
        ExecutorService executor = Executors.newCachedThreadPool(
            new NamedThreadFactory("forking-cluster-timer", true));
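        // Cap the fan-out at the number of available providers
        final int forkCount = Math.min(forks, invokers.size());
        final AtomicInteger failures = new AtomicInteger();
        try {
            BlockingQueue<Object> ref = new LinkedBlockingQueue<>();
            List<Invoker<T>> selected = new ArrayList<>();
            for (int i = 0; i < forkCount; i++) {
                Invoker<T> invoker = select(loadbalance, invocation, invokers, selected);
                selected.add(invoker);
                executor.submit(() -> {
                    try {
                        ref.offer(invoker.invoke(invocation));   // the first successful result wins
                    } catch (Throwable t) {
                        // Surface an exception only once every fork has failed
                        if (failures.incrementAndGet() >= forkCount) {
                            ref.offer(t);
                        }
                    }
                });
            }
            Object ret = ref.poll(getUrl().getParameter(TIMEOUT_KEY, DEFAULT_TIMEOUT), TimeUnit.MILLISECONDS);
            if (ret instanceof Result) return (Result) ret;
            if (ret instanceof Throwable) throw new RpcException((Throwable) ret);
            throw new RpcException("No result returned");
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RpcException("Interrupted while waiting for forked calls", e);
        } finally {
            executor.shutdownNow();
        }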
    }
}
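
The same "first answer wins" idea exists in the JDK as ExecutorService.invokeAny, which returns the result of one task that completed successfully and fails only if none did. The sketch below swaps that call in for the queue-based approach; it is an illustration rather than Dubbo's code, and ForkingSketch and its parameters are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch of forking built on the JDK's invokeAny.
final class ForkingSketch {
    /**
     * Calls up to `forks` providers in parallel and returns the first successful result.
     * Throws ExecutionException if every forked call fails, TimeoutException if none
     * finishes in time, and IllegalArgumentException if `providers` is empty.
     */
    static <R> R invoke(List<Callable<R>> providers, int forks, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        int forkCount = Math.min(forks, providers.size());
        List<Callable<R>> selected = new ArrayList<>(providers.subList(0, forkCount));
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, forkCount));
        try {
            return pool.invokeAny(selected, timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();   // cancel the calls that lost the race
        }
    }
}
```

The trade-off is the same as in the listing: latency drops to that of the fastest provider, while every request costs the cluster up to `forks` calls.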

3. Load balancing: source walkthrough

3.1 Interface and inheritance tree

org.apache.dubbo.rpc.cluster.LoadBalance
  ├─ RandomLoadBalance
  ├─ RoundRobinLoadBalance
  ├─ LeastActiveLoadBalance
  ├─ ConsistentHashLoadBalance
  └─ ShortestResponseLoadBalance   (added in 2.7.7, available in 3.x)

The common entry point:

@SPI("random")
public interface LoadBalance {
    <T> Invoker<T> select(List<Invoker<T>> invokers, URL url, Invocation invocation) throws RpcException;
}

3.2 RandomLoadBalance: weighted random

public class RandomLoadBalance extends AbstractLoadBalance {
    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        int length = invokers.size();
        boolean sameWeight = true;
        int[] weights = new int[length];
        int totalWeight = 0;
        for (int i = 0; i < length; i++) {
            int weight = getWeight(invokers.get(i), invocation);
            totalWeight += weight;
            weights[i] = totalWeight;
            // weights[] holds prefix sums, so compare the running total with (i + 1) equal shares
            if (sameWeight && i > 0 && totalWeight != weights[0] * (i + 1)) {
                sameWeight = false;
            }
        }
        if (totalWeight > 0 && !sameWeight) {
            int offset = ThreadLocalRandom.current().nextInt(totalWeight);
            for (int i = 0; i < length; i++) {
                if (offset < weights[i]) return invokers.get(i);
            }
        }
        return invokers.get(ThreadLocalRandom.current().nextInt(length));
    }
}

Tip: ThreadLocalRandom avoids the CAS contention of a shared Random, and the sameWeight flag short-circuits the equal-weight case.
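
The prefix-sum lookup is easiest to check when the random offset is passed in explicitly. The following is a small sketch under that assumption; WeightedRandomSketch, pick, and select are hypothetical names, and weights are plain ints rather than values read from a provider URL.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of weighted random selection; not Dubbo code.
final class WeightedRandomSketch {
    /** Returns the index whose cumulative-weight bucket contains offset (0 <= offset < sum). */
    static int pick(int[] weights, int offset) {
        int running = 0;
        for (int i = 0; i < weights.length; i++) {
            running += weights[i];
            if (offset < running) {
                return i;
            }
        }
        throw new IllegalArgumentException("offset out of range: " + offset);
    }

    /** Weighted random choice; falls back to a uniform pick when every weight is zero. */
    static int select(int[] weights) {
        int total = 0;
        for (int w : weights) {
            total += w;
        }
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        return total > 0 ? pick(weights, rnd.nextInt(total)) : rnd.nextInt(weights.length);
    }
}
```

For weights {5, 3, 2}, offsets 0 to 4 land on index 0, offsets 5 to 7 on index 1, and offsets 8 and 9 on index 2, so each provider is chosen in proportion to its weight.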

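3.3 RoundRobinLoadBalance: smooth weighted round robin

Dubbo 3 uses the smooth weighted round-robin algorithm known from Nginx, which spreads a heavy provider's turns out instead of sending it a burst of consecutive requests.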

public class RoundRobinLoadBalance extends AbstractLoadBalance {
    // service.method -> (provider identity -> smooth-weight state)
    private final ConcurrentMap<String, ConcurrentMap<String, WeightedRoundRobin>> methodWeightMap =
        new ConcurrentHashMap<>();

    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        String key = invokers.get(0).getUrl().getServiceKey() + "." + invocation.getMethodName();
        ConcurrentMap<String, WeightedRoundRobin> map =
            methodWeightMap.computeIfAbsent(key, k -> new ConcurrentHashMap<>());
        int totalWeight = 0;
        long maxCurrent = Long.MIN_VALUE;
        Invoker<T> selectedInvoker = null;
        WeightedRoundRobin selectedWRR = null;
        for (Invoker<T> invoker : invokers) {
            int weight = getWeight(invoker, invocation);
            WeightedRoundRobin wrr = map.computeIfAbsent(
                invoker.getUrl().toIdentityString(), k -> new WeightedRoundRobin());
            long cur = wrr.current.addAndGet(weight);      // every provider gains its own weight
            if (cur > maxCurrent) {                        // track the largest current weight
                maxCurrent = cur;
                selectedInvoker = invoker;
                selectedWRR = wrr;
            }
            totalWeight += weight;
        }
        if (selectedInvoker != null) {
            selectedWRR.current.addAndGet(-totalWeight);   // the winner gives back the total weight
            return selectedInvoker;
        }
        return invokers.get(0);
    }

    private static class WeightedRoundRobin {
        final AtomicLong current = new AtomicLong();
    }
}
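
To watch smooth weighted round robin in action, the algorithm can be reduced to two arrays. This is a minimal sketch assuming fixed integer weights and a single lock; SmoothWrrSketch is a hypothetical name, and none of it is taken from Dubbo's source.

```java
// Hypothetical sketch of Nginx-style smooth weighted round robin.
final class SmoothWrrSketch {
    private final int[] weights;
    private final long[] current;   // running "current weight" of each node

    SmoothWrrSketch(int... weights) {
        if (weights.length == 0) {
            throw new IllegalArgumentException("at least one weight is required");
        }
        this.weights = weights.clone();
        this.current = new long[weights.length];
    }

    /** Returns the index of the next node; synchronized only to keep the sketch simple. */
    synchronized int next() {
        long total = 0;
        int best = 0;
        for (int i = 0; i < weights.length; i++) {
            current[i] += weights[i];          // every node gains its own weight
            total += weights[i];
            if (current[i] > current[best]) {  // ties go to the earlier node
                best = i;
            }
        }
        current[best] -= total;                // the winner gives back the total weight
        return best;
    }
}
```

With weights 5, 1, 1 the first seven picks are 0, 0, 1, 0, 2, 0, 0: the heavy node still gets five of every seven turns, but the two light nodes are interleaved rather than pushed to the end of the cycle.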

3.4 LeastActiveLoadBalance: least active count + weight

public class LeastActiveLoadBalance extends AbstractLoadBalance {
    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        int length = invokers.size();
        int leastActive = -1;
        int leastCount = 0;
        int[] leastIndexs = new int[length];
        int[] weights = new int[length];
        int totalWeight = 0;
        boolean sameWeight = true;
        for (int i = 0; i < length; i++) {
            Invoker<T> invoker = invokers.get(i);
            int active = RpcStatus.getStatus(invoker.getUrl(), invocation.getMethodName()).getActive();
            int weight = getWeight(invoker, invocation);
            weights[i] = weight;
            if (leastActive == -1 || active < leastActive) {
                leastActive = active;
                leastCount = 1;
                leastIndexs[0] = i;
                totalWeight = weight;
                sameWeight = true;
            } else if (active == leastActive) {
                leastIndexs[leastCount++] = i;
                totalWeight += weight;
                sameWeight = sameWeight && weight == weights[leastIndexs[0]];
            }
        }
        if (leastCount == 1) return invokers.get(leastIndexs[0]);
        if (!sameWeight && totalWeight > 0) {
            int offsetWeight = ThreadLocalRandom.current().nextInt(totalWeight);
            for (int i = 0; i < leastCount; i++) {
                int leastIndex = leastIndexs[i];
                offsetWeight -= weights[leastIndex];
                if (offsetWeight < 0) return invokers.get(leastIndex);
            }
        }
        return invokers.get(leastIndexs[ThreadLocalRandom.current().nextInt(leastCount)]);
    }
}
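
The two phases of the listing, "find the providers with the fewest in-flight calls" and "break ties by weight", can be separated in a small sketch. It assumes the active counts and weights arrive as non-empty int arrays; LeastActiveSketch and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical sketch of least-active selection with weighted tie-breaking.
final class LeastActiveSketch {
    /** Indexes of the providers whose in-flight request count is minimal. */
    static int[] leastActiveIndexes(int[] active) {
        int least = Integer.MAX_VALUE;
        int count = 0;
        int[] idx = new int[active.length];
        for (int i = 0; i < active.length; i++) {
            if (active[i] < least) {           // new minimum: restart the candidate list
                least = active[i];
                count = 0;
                idx[count++] = i;
            } else if (active[i] == least) {   // tie: add to the candidate list
                idx[count++] = i;
            }
        }
        return Arrays.copyOf(idx, count);
    }

    /** Picks among the least-active providers, in proportion to their weights. */
    static int select(int[] active, int[] weights) {
        int[] candidates = leastActiveIndexes(active);
        if (candidates.length == 1) {
            return candidates[0];
        }
        int total = 0;
        for (int c : candidates) {
            total += weights[c];
        }
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        if (total > 0) {
            int offset = rnd.nextInt(total);
            for (int c : candidates) {
                offset -= weights[c];
                if (offset < 0) {
                    return c;
                }
            }
        }
        return candidates[rnd.nextInt(candidates.length)];   // all weights are zero
    }
}
```

Only the tied candidates take part in the weighted draw, so a provider with a large weight but more in-flight calls is never chosen over an idle one.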

3.5 ConsistentHashLoadBalance: virtual nodes + a TreeMap ring

public class ConsistentHashLoadBalance extends AbstractLoadBalance {
    private final ConcurrentMap<String, ConsistentHashSelector<?>> selectors = new ConcurrentHashMap<>();

    @Override
    protected <T> Invoker<T> doSelect(List<Invoker<T>> invokers, URL url, Invocation invocation) {
        String key = invokers.get(0).getUrl().getServiceKey() + "." + invocation.getMethodName();
        int identityHashCode = System.identityHashCode(invokers);
        ConsistentHashSelector<T> selector = (ConsistentHashSelector<T>) selectors.get(key);
        if (selector == null || selector.identityHashCode != identityHashCode) {
            selectors.put(key, new ConsistentHashSelector<>(invokers, invocation.getMethodName(), identityHashCode));
            selector = (ConsistentHashSelector<T>) selectors.get(key);
        }
        return selector.select(invocation);
    }

    private static final class ConsistentHashSelector<T> {
        private final TreeMap<Long, Invoker<T>> virtualInvokers;
        private final int replicaNumber = 160; // default number of virtual nodes
        private final int identityHashCode;

        ConsistentHashSelector(List<Invoker<T>> invokers, String methodName, int identityHashCode) {
            this.identityHashCode = identityHashCode;
            this.virtualInvokers = new TreeMap<>();
            for (Invoker<T> invoker : invokers) {
                String address = invoker.getUrl().getAddress();
                for (int i = 0; i < replicaNumber / 4; i++) {
                    byte[] digest = md5(address + i);
                    for (int h = 0; h < 4; h++) {
                        long m = hash(digest, h);
                        virtualInvokers.put(m, invoker);
                    }
                }
            }
        }

        Invoker<T> select(Invocation invocation) {
            String key = toKey(invocation.getArguments());
            byte[] digest = md5(key);
            return selectForKey(hash(digest, 0));
        }

        Invoker<T> selectForKey(long hash) {
            Map.Entry<Long, Invoker<T>> entry = virtualInvokers.ceilingEntry(hash);
            if (entry == null) entry = virtualInvokers.firstEntry();
            return entry.getValue();
        }
    }
}
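
The listing leaves md5 and hash undefined. A self-contained sketch of the ring, assuming provider addresses are plain strings and the routing key is already a string, might look like this; HashRingSketch is a hypothetical name, and the hash helper simply reads four digest bytes as an unsigned 32-bit value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a consistent-hash ring with virtual nodes.
final class HashRingSketch {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Builds the ring; `addresses` must be non-empty and `replicas` at least 4. */
    HashRingSketch(Collection<String> addresses, int replicas) {
        for (String address : addresses) {
            for (int i = 0; i < replicas / 4; i++) {
                byte[] digest = md5(address + i);
                for (int h = 0; h < 4; h++) {
                    ring.put(hash(digest, h), address);   // four ring points per digest
                }
            }
        }
    }

    /** First virtual node clockwise from the key's hash, wrapping to the start of the ring. */
    String select(String key) {
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(md5(key), 0));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    static byte[] md5(String s) {
        try {
            return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);   // MD5 is mandatory on every Java platform
        }
    }

    // Reads bytes 4n .. 4n+3 of the digest as an unsigned little-endian 32-bit value.
    static long hash(byte[] d, int n) {
        return (((long) (d[3 + n * 4] & 0xFF) << 24)
                | ((long) (d[2 + n * 4] & 0xFF) << 16)
                | ((long) (d[1 + n * 4] & 0xFF) << 8)
                | (d[n * 4] & 0xFF)) & 0xFFFFFFFFL;
    }
}
```

The property that matters is stability: when a provider is removed from the ring, only the keys that mapped to it move elsewhere, and every other key keeps its provider.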

4. How the two mechanisms cooperate: one sequence diagram

ClientProxy.invoke()
  │
  ├─ AbstractClusterInvoker.invoke()
  │     ├─ list()              // refresh the directory
  │     ├─ initLoadBalance()   // pick the LB
  │     └─ doInvoke()
  │           ├─ select()      // LB picks an Invoker
  │           ├─ invoke()      // request sent over Netty
  │           └─ catch()
  │                 ├─ Failover: loop select() + retry
  │                 ├─ Failfast: rethrow immediately
  │                 ├─ Failsafe: swallow the exception
  │                 ├─ Failback: schedule a timed retry
  │                 └─ Forking: parallel select(), results race

5. Load-test figures: strategies compared

Strategy      TPS      Avg RT   P99 RT   Failure rate
Failover(2)   18,200   18 ms    45 ms    0.0 %
Failfast      21,000   15 ms    38 ms    0.3 %
Failsafe      21,500   14 ms    37 ms    0.3 % (logged)
Failback      20,800   15 ms    39 ms    0.0 % (succeeds after a delay)
Forking(3)    24,000   11 ms    28 ms    0.0 %

Environment: three 4C8G providers, one 1C2G consumer, simulated RT of 20 ms, Zipkin disabled.


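6. Summary: what to learn beyond the source

  1. Extension points: with @Adaptive and ExtensionLoader you can implement custom strategies such as canary routing or same-datacenter preference.
  2. Metrics: RpcStatus keeps built-in counters for active calls, successful calls, and elapsed time, which can be exported to Prometheus.
  3. Cloud native: once Dubbo 3 runs on Kubernetes, Pod autoscaling makes the directory change rapidly, so consistent hashing needs the automatic virtual-node migration feature enabled (dubbo.cluster.consistenthash.auto-migrate=true).
  4. Reactive: the 3.3 snapshot has replaced CompletableFuture with Project Reactor, and the fault-tolerance chain propagates Context so that asynchronous retries can be traced end to end.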
