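Paimon Source Code Walkthrough -- Compaction-9. SortMergeReaderWithLoserTree

Preface

An earlier article, "Paimon Source Code Walkthrough -- Compaction-1. MergeTreeCompactTask", covered the overall flow of Paimon's compaction stage.

Paimon's compaction is made up of the following parts:

  1. SingleFileWriter and RollingFileWriter write the data and roll files -- see "Paimon Source Code Walkthrough -- Compaction-2. SingleFileWriter and RollingFileWriter"
  2. ReducerMergeFunctionWrapper runs the aggregation logic -- see "Paimon Source Code Walkthrough -- PartialUpdateMerge"
  3. readerForMergeTree() finally hands over to SortMergeReader, which runs the configured merge algorithm to sort, merge and rewrite the files -- for the call chain see "Paimon Source Code Walkthrough -- Compaction-3. MergeSorter"

I. SortMergeReader

1. Entry point

SortMergeReader is created from MergeSorter.mergeSortNoSpill(), as follows:

// Merge without spilling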
public <T> RecordReader<T> mergeSortNoSpill(
        List<? extends ReaderSupplier<KeyValue>> lazyReaders,
        Comparator<InternalRow> keyComparator,
        @Nullable FieldsComparator userDefinedSeqComparator,
        MergeFunctionWrapper<T> mergeFunction)
        throws IOException {
    List<RecordReader<KeyValue>> readers = new ArrayList<>(lazyReaders.size());
    for (ReaderSupplier<KeyValue> supplier : lazyReaders) {
        try {
            // Under the hood this calls MergeTreeReaders.readerForRun(), which builds one RecordReader
            // over all data files of the current sorted run
            readers.add(supplier.get());
        } catch (IOException e) {
            // if one of the readers creating failed, we need to close them all.
            readers.forEach(IOUtils::closeQuietly);
            throw e;
        }
    }
    // Create the sort-merge reader; this is the entry point to the sort-merge algorithm
    return SortMergeReader.createSortMergeReader(
            readers, keyComparator, userDefinedSeqComparator, mergeFunction, sortEngine); 
}

2. SortMergeReader itself

public interface SortMergeReader<T> extends RecordReader<T> {

    // Called by MergeSorter.mergeSortNoSpill(); this is the entry point that creates the sort-merge reader
    static <T> SortMergeReader<T> createSortMergeReader(
            List<RecordReader<KeyValue>> readers,
            Comparator<InternalRow> userKeyComparator,
            @Nullable FieldsComparator userDefinedSeqComparator,
            MergeFunctionWrapper<T> mergeFunctionWrapper,
            SortEngine sortEngine) {
        // Pick the sort-merge reader that matches the configured 'sort-engine'; the default is loser-tree
        switch (sortEngine) {
            case MIN_HEAP: // min-heap algorithm
                return new SortMergeReaderWithMinHeap<>(
                        readers, userKeyComparator, userDefinedSeqComparator, mergeFunctionWrapper);
            case LOSER_TREE: // loser-tree algorithm
                return new SortMergeReaderWithLoserTree<>(
                        readers, userKeyComparator, userDefinedSeqComparator, mergeFunctionWrapper);
            default:
                throw new UnsupportedOperationException("Unsupported sort engine: " + sortEngine);
        }
    }
}

II. SortMergeReaderWithLoserTree -- the loser-tree SortMergeReader

1. How the source works

(1) Core fields and constructor

SortMergeReaderWithLoserTree<T>
  ├── MergeFunctionWrapper<T> mergeFunctionWrapper  // merge function wrapper
  └── LoserTree<KeyValue> loserTree                 // loser tree
       ├── int[] tree                                // array of loser indexes
       ├── List<LeafIterator<KeyValue>> leaves       // list of leaf nodes
       ├── Comparator<KeyValue> firstComparator      // key comparator
       └── Comparator<KeyValue> secondComparator     // sequence comparator

SortMergeIterator (inner class)
  └── boolean released                               // whether the batch has been released

private final MergeFunctionWrapper<T> mergeFunctionWrapper; // wrapper around the MergeFunction, chosen by the 'merge-engine' option
private final LoserTree<KeyValue> loserTree; // the loser-tree data structure

public SortMergeReaderWithLoserTree(
        List<RecordReader<KeyValue>> readers,
        Comparator<InternalRow> userKeyComparator,
        @Nullable FieldsComparator userDefinedSeqComparator,
        MergeFunctionWrapper<T> mergeFunctionWrapper) {
    this.mergeFunctionWrapper = mergeFunctionWrapper;
    // Build the loser tree
    this.loserTree =
            new LoserTree<>(
                    readers, // one RecordReader per leaf of the loser tree
                    (e1, e2) -> userKeyComparator.compare(e2.key(), e1.key()), // key comparator (arguments swapped, so the smaller key wins)
                    createSequenceComparator(userDefinedSeqComparator)); // sequence comparator
}

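(2) createSequenceComparator() -- builds the sequence comparator

// Build the sequence comparator
private Comparator<KeyValue> createSequenceComparator(
        @Nullable FieldsComparator userDefinedSeqComparator) {
    if (userDefinedSeqComparator == null) {
        // Default: compare sequenceNumber with swapped arguments. In LoserTree a positive result
        // means the first argument wins, so the smaller sequenceNumber is popped first
        return (e1, e2) -> Long.compare(e2.sequenceNumber(), e1.sequenceNumber());
    }
    // User-defined sequence field: compare the user fields first (also swapped), then fall back to sequenceNumber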
    return (o1, o2) -> {
        int result = userDefinedSeqComparator.compare(o2.value(), o1.value());
        if (result != 0) {
            return result;
        }
        return Long.compare(o2.sequenceNumber(), o1.sequenceNumber());
    };
}
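
To make the swapped arguments concrete, here is a minimal sketch of the default branch. `Rec` and `wins` are hypothetical names used only for illustration and are not part of Paimon; the only thing taken from the source is the shape of the lambda and the convention that a positive result means the first argument wins.

```java
import java.util.Comparator;

class SeqComparatorSketch {
    // Hypothetical stand-in for KeyValue: only the sequence number matters here.
    static final class Rec {
        final long sequenceNumber;

        Rec(long sequenceNumber) {
            this.sequenceNumber = sequenceNumber;
        }
    }

    // Same shape as the default branch above: the arguments are swapped on purpose.
    static final Comparator<Rec> REVERSED_SEQ =
            (e1, e2) -> Long.compare(e2.sequenceNumber, e1.sequenceNumber);

    // LoserTree convention: compare(a, b) > 0 means a wins and is popped first.
    static boolean wins(Rec a, Rec b) {
        return REVERSED_SEQ.compare(a, b) > 0;
    }

    public static void main(String[] args) {
        Rec older = new Rec(100);
        Rec newer = new Rec(200);
        System.out.println(wins(older, newer)); // true: the older record is popped first
        System.out.println(wins(newer, older)); // false
    }
}
```

Popping the older record first is what lets a merge function see records of one key in write order, so the last one added is the newest.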

(3) readBatch()

// Compared with min-heap, loser-tree only ever produces one batch
// Read a batch
@Nullable
@Override
public RecordIterator<T> readBatch() throws IOException {
    loserTree.initializeIfNeeded(); // initialize the loser tree on first use
    // Check whether any data is left:
    // peekWinner() returns the current global winner; null means nothing is left to read, so return null, otherwise return a SortMergeIterator
    return loserTree.peekWinner() == null ? null : new SortMergeIterator();
}

(4) SortMergeIterator -- the core

private class SortMergeIterator implements RecordIterator<T> {

    private boolean released = false;

    @Nullable
    @Override
    public T next() throws IOException {
        while (true) {
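            // 1. Adjust the tree so that the next winner is in place
            loserTree.adjustForNextLoop();
            // 2. Pop that winner; the tree then promotes the next record with the same key, if any
            KeyValue winner = loserTree.popWinner();
            if (winner == null) {
                return null;
            }
            // 3. Call reset() on the merge function to clear its state
            mergeFunctionWrapper.reset();
            // 4. Feed the winner into the merge function
            // Remember: smaller keys are popped first; within one key, older records are popped before newer ones
            mergeFunctionWrapper.add(winner);
            // 5. Merge every record with the same key and return the merged result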
            T result = merge();
            if (result != null) {
                return result;
            }
        }
    }

    private T merge() {
        Preconditions.checkState(
                !released, "SortMergeIterator#nextImpl is called after release");
        // Keep popping winners that share the same key; each pop re-adjusts the tree
        while (loserTree.peekWinner() != null) {
            // merge it in
            mergeFunctionWrapper.add(loserTree.popWinner());
        }
        // Return the merged result
        return mergeFunctionWrapper.getResult();
    }

    @Override
    public void releaseBatch() {
        released = true;
    }
}
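
The reset / add / getResult cycle in next() and merge() can be sketched without the tree. The sketch below assumes the records already arrive in the order the loser tree pops them (by key, then by ascending sequence). `Rec`, `DedupMerge` and `mergeSorted` are hypothetical names for illustration, and `DedupMerge` only imitates a deduplicate-style merge function; it is not Paimon's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GroupMergeSketch {
    // Hypothetical stand-in for KeyValue.
    static final class Rec {
        final int key;
        final long seq;
        final String value;

        Rec(int key, long seq, String value) {
            this.key = key;
            this.seq = seq;
            this.value = value;
        }
    }

    // Hypothetical deduplicate-style merge function: the last record added wins.
    static final class DedupMerge {
        private Rec latest;

        void reset() { latest = null; }
        void add(Rec r) { latest = r; }
        Rec getResult() { return latest; }
    }

    // Assumes the input is ordered by key, then by sequence ascending.
    static List<Rec> mergeSorted(List<Rec> sorted) {
        List<Rec> out = new ArrayList<>();
        DedupMerge fn = new DedupMerge();
        int i = 0;
        while (i < sorted.size()) {
            Rec first = sorted.get(i++);
            fn.reset();
            fn.add(first);
            // keep adding while the next record has the same key
            while (i < sorted.size() && sorted.get(i).key == first.key) {
                fn.add(sorted.get(i++));
            }
            out.add(fn.getResult());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> merged = mergeSorted(Arrays.asList(
                new Rec(1, 100, "A"), new Rec(1, 150, "C"),
                new Rec(1, 200, "B"), new Rec(2, 50, "X")));
        for (Rec r : merged) {
            System.out.println(r.key + " -> " + r.value); // 1 -> B, then 2 -> X
        }
    }
}
```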

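2. LoserTree -- the loser tree

The key point: records with a smaller key and a smaller sequence are popped and merged first; records with a larger key and a larger sequence are popped and merged last.

(1) Core fields, enum and helper classes

<1> Fields

private final int[] tree; // stores the indexes of the losers; tree[0] holds the index of the current winner
private final int size; // number of leaf nodes
private final List<LeafIterator<T>> leaves; // list of leaf nodes

// Note: if comparator.compare('a', 'b') > 0 then 'a' is the winner; in the calls below 'a' is always the record of the parent node and 'b' the record of the node moving up
private final Comparator<T> firstComparator; // key comparator, applied first

/** same as firstComparator, but mainly used to compare sequenceNumber. */
private final Comparator<T> secondComparator; // sequence comparator, applied second

private boolean initialized; // whether the tree has been initialized

<2> Constructor

public LoserTree(
        List<RecordReader<T>> nextBatchReaders,  // all RecordReaders to merge
        Comparator<T> firstComparator,           // key comparator
        Comparator<T> secondComparator) {        // sequence comparator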
    this.size = nextBatchReaders.size();
    this.leaves = new ArrayList<>(size);
    this.tree = new int[size]; // the tree array has as many slots as there are leaves
    
    // compare(a, b) > 0: a is the winner (a is preferred)
    // compare(a, b) < 0: b is the winner (b is preferred)
    // compare(a, b) = 0: fall through to secondComparator
    
    // null handling:
    // e1 == null => returns -1 (e1 loses)
    // e2 == null => returns 1 (e1 wins)
    this.firstComparator =
            (e1, e2) -> e1 == null ? -1 : (e2 == null ? 1 : firstComparator.compare(e1, e2));
    this.secondComparator =
            (e1, e2) -> e1 == null ? -1 : (e2 == null ? 1 : secondComparator.compare(e1, e2));
    this.initialized = false;
    // Wrap every RecordReader in a LeafIterator
    for (RecordReader<T> reader : nextBatchReaders) {
        LeafIterator<T> iterator = new LeafIterator<>(reader);
        this.leaves.add(iterator);
    }
}
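
The null handling is easy to check in isolation. The sketch below wraps a comparator the same way the constructor does; `nullLoses` is a hypothetical helper name and `Integer` stands in for the record type.

```java
import java.util.Comparator;

class NullLosesSketch {
    // Same wrapping as in the constructor above: a null element never wins.
    static <T> Comparator<T> nullLoses(Comparator<T> inner) {
        return (e1, e2) -> e1 == null ? -1 : (e2 == null ? 1 : inner.compare(e1, e2));
    }

    public static void main(String[] args) {
        // Swapped arguments, as in SortMergeReaderWithLoserTree: the smaller value wins.
        Comparator<Integer> smallerWins =
                NullLosesSketch.<Integer>nullLoses((a, b) -> Integer.compare(b, a));

        System.out.println(smallerWins.compare(3, 7) > 0);    // true: 3 beats 7
        System.out.println(smallerWins.compare(null, 7) < 0); // true: an exhausted leaf loses
        System.out.println(smallerWins.compare(3, null) > 0); // true: a live leaf beats an exhausted one
    }
}
```

Because an exhausted leaf holds null and always loses, the tree never needs to shrink: finished readers simply sink to the loser slots.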
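
<3> State enum -- the core

The whole algorithm hinges on this state: it decides whether a node is a winner or a loser, and the tree slots are swapped accordingly. "Same key" means the same key as the current global winner.

private enum State {
    LOSER_WITH_NEW_KEY(false),  // loser, new key
    LOSER_WITH_SAME_KEY(false), // loser, same key
    LOSER_POPPED(false),        // loser, already popped
    WINNER_WITH_NEW_KEY(true),  // winner, new key
    WINNER_WITH_SAME_KEY(true), // winner, same key
    WINNER_POPPED(true);        // winner, already popped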

    private final boolean winner;

    State(boolean winner) {
        this.winner = winner;
    }

    public boolean isWinner() {
        return winner;
    }
}

<4> LeafIterator, the leaf node iterator

// Leaf node iterator
private static class LeafIterator<T> implements Closeable {
    private final RecordReader<T> reader; // underlying RecordReader

    private RecordReader.RecordIterator<T> iterator; // iterator over the current batch

    private T kv; // current KeyValue

    private boolean endOfInput; // whether the reader is exhausted

    private int firstSameKeyIndex; // tree index of the first same-key loser met while moving up from this leaf

    private State state; // node state

    private LeafIterator(RecordReader<T> reader) {
        this.reader = reader;
        this.endOfInput = false;
        this.firstSameKeyIndex = -1;
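        this.state = State.WINNER_WITH_NEW_KEY; // every leaf starts as a winner with a new key; adjust() then compares it against the losers on its path
    }
    // Returns the current KeyValue of this leaf
    public T peek() {
        return kv;
    }
    // Pops the current KeyValue of this leaf and marks the node as WINNER_POPPED
    public T pop() {
        this.state = State.WINNER_POPPED;
        return kv;
    }
    // Records the tree index of the first node with the same key (only the first call takes effect)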
    public void setFirstSameKeyIndex(int index) {
        if (firstSameKeyIndex == -1) {
            firstSameKeyIndex = index;
        }
    }

    // Advance to the next record
    public void advanceIfAvailable() throws IOException {
        // reset the state
        this.firstSameKeyIndex = -1;
        this.state = State.WINNER_WITH_NEW_KEY;
        // try to read the next record from the current iterator
        if (iterator == null || (kv = iterator.next()) == null) {
            // the current batch is exhausted (or there is none yet), so read a new batch
            while (!endOfInput) {
                if (iterator != null) {
                    iterator.releaseBatch(); // release the current batch
                    iterator = null;
                }
                // read the next batch
                iterator = reader.readBatch();
                if (iterator == null) {
                    // all data has been read: mark the end of input and clear kv
                    endOfInput = true;
                    kv = null;
                    reader.close();
                } else if ((kv = iterator.next()) != null) {
                    // got a record, stop looping
                    break;
                }
            }
        }
    }

    @Override
    public void close() throws IOException {
        if (this.iterator != null) {
            this.iterator.releaseBatch();
            this.iterator = null;
        }
        this.reader.close();
    }
}
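
The batch-skipping loop in advanceIfAvailable() can be sketched with plain collections. This is a simplified illustration under stated assumptions: a list of lists stands in for the batches a RecordReader would return, `BatchAdvanceSketch` and `advance` are hypothetical names, and releasing batches and closing the reader are left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class BatchAdvanceSketch {
    private final Iterator<List<Integer>> batches; // stand-in for repeated readBatch() calls
    private Iterator<Integer> current;             // stand-in for the current RecordIterator
    private boolean endOfInput;

    BatchAdvanceSketch(List<List<Integer>> batches) {
        this.batches = batches.iterator();
    }

    // Returns the next element, skipping empty batches, or null once everything is consumed.
    Integer advance() {
        if (current != null && current.hasNext()) {
            return current.next();
        }
        while (!endOfInput) {
            if (!batches.hasNext()) {
                endOfInput = true; // no more batches: remember it so later calls return null directly
                current = null;
                return null;
            }
            current = batches.next().iterator();
            if (current.hasNext()) {
                return current.next();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        BatchAdvanceSketch leaf = new BatchAdvanceSketch(Arrays.asList(
                Arrays.asList(1, 2), Collections.<Integer>emptyList(), Arrays.asList(3)));
        for (Integer v = leaf.advance(); v != null; v = leaf.advance()) {
            System.out.println(v); // 1, 2, 3
        }
    }
}
```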

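(2) initializeIfNeeded() -- initializes the loser tree

Example

Initial data:
Leaf 0: key=5, seq=100
Leaf 1: key=3, seq=200
Leaf 2: key=7, seq=150

Execution:
1. i=2: advanceIfAvailable() → key=7
        adjust(2) → tree[2] = 2 (stored as a loser)
        
2. i=1: advanceIfAvailable() → key=3
        adjust(1) → compared with Leaf 2
        key=3 < key=7 → Leaf 1 is preferred (winner); it moves up and is parked in the empty slot tree[1]
        
3. i=0: advanceIfAvailable() → key=5
        adjust(0) → compared with Leaf 1, the winner so far
        key=3 < key=5 → Leaf 1 remains the winner, Leaf 0 is stored in tree[1]

Final result:
tree[0] = 1 (Leaf 1 is the global winner; key=3 is the smallest and is popped first)

public void initializeIfNeeded() throws IOException {
    if (!initialized) {
        Arrays.fill(tree, -1); // fill the loser array with -1: no losers or winners yet
        // Starting from the last leaf, read one record per leaf and build the loser tree
        // Why back to front? The loser tree is built from the leaves towards the root
        for (int i = size - 1; i >= 0; i--) {
            leaves.get(i).advanceIfAvailable(); // read the leaf's first KeyValue; this also sets state=WINNER_WITH_NEW_KEY and firstSameKeyIndex=-1
            adjust(i); // adjust the tree so that this leaf takes part in the comparisons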
        }
        initialized = true;
    }
}
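
For comparison, here is a minimal sketch of a classic loser tree, without Paimon's same-key state machine. It reuses the back-to-front initialization and the (leaf + size) / 2 parent formula, and it is run on the example keys 5, 3 and 7 extended with a few more values. `SimpleLoserTreeSketch`, `beats`, `winnerLeaf` and `pop` are hypothetical names; this is an illustration of the general technique, not the Paimon class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SimpleLoserTreeSketch {
    private final int size;
    private final int[] tree;      // tree[0] = winner leaf, tree[1..] = loser leaves
    private final Integer[] heads; // current head of each sorted run, null when exhausted
    private final List<Iterator<Integer>> runs = new ArrayList<>();

    SimpleLoserTreeSketch(List<List<Integer>> sortedRuns) {
        size = sortedRuns.size();
        tree = new int[size];
        heads = new Integer[size];
        Arrays.fill(tree, -1);
        for (List<Integer> run : sortedRuns) {
            runs.add(run.iterator());
        }
        // Back to front, as in initializeIfNeeded()
        for (int i = size - 1; i >= 0; i--) {
            advance(i);
            adjust(i);
        }
    }

    private void advance(int leaf) {
        Iterator<Integer> it = runs.get(leaf);
        heads[leaf] = it.hasNext() ? it.next() : null;
    }

    // True when leaf a should be popped before leaf b: smaller value wins, null always loses.
    private boolean beats(int a, int b) {
        if (heads[a] == null) return false;
        if (heads[b] == null) return true;
        return heads[a] <= heads[b];
    }

    private void adjust(int winner) {
        for (int parent = (winner + size) / 2; parent > 0 && winner >= 0; parent /= 2) {
            int stored = tree[parent];
            // Empty slot (stored == -1) or the climbing leaf loses: park it here and let the other move up
            if (stored == -1 || !beats(winner, stored)) {
                tree[parent] = winner;
                winner = stored;
            }
        }
        tree[0] = winner;
    }

    int winnerLeaf() {
        return size == 0 ? -1 : tree[0];
    }

    // Pops the smallest remaining value, or returns null when every run is exhausted.
    Integer pop() {
        if (size == 0) return null;
        int w = tree[0];
        Integer value = heads[w];
        if (value != null) {
            advance(w);
            adjust(w);
        }
        return value;
    }

    public static void main(String[] args) {
        SimpleLoserTreeSketch t = new SimpleLoserTreeSketch(Arrays.asList(
                Arrays.asList(5, 8), Arrays.asList(3, 9), Arrays.asList(7)));
        System.out.println(t.winnerLeaf()); // 1: the leaf holding key 3
        for (Integer v = t.pop(); v != null; v = t.pop()) {
            System.out.println(v); // 3, 5, 7, 8, 9
        }
    }
}
```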

(3) adjustForNextLoop() -- adjusts the tree and finds the next winner

public void adjustForNextLoop() throws IOException {
    LeafIterator<T> winner = leaves.get(tree[0]);
    // Keep adjusting until the winner's state is no longer WINNER_POPPED
    while (winner.state == State.WINNER_POPPED) {
        winner.advanceIfAvailable();             // read the next KV
        adjust(tree[0]);                         // re-adjust the tree so that the leaf's freshly read record competes for the winner slot
        winner = leaves.get(tree[0]);
    }
}

(4) popWinner() -- pops the current winner and marks it WINNER_POPPED

// Pop the current winner and mark it WINNER_POPPED
public T popWinner() {
    LeafIterator<T> winner = leaves.get(tree[0]);
    // If the current winner has already been popped, return null
    if (winner.state == State.WINNER_POPPED) {
        // if the winner has already been popped, it means that all the same key has been
        // processed.
        return null;
    }
    // Otherwise pop the winner, which marks it WINNER_POPPED, and re-adjust the tree
    T result = winner.pop();
    adjust(tree[0]); // re-adjust: promote the next node holding the same key to the winner slot so the rest of that key can be popped
    return result;
}

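(5) peekWinner() -- looks at the current winner

Returns the current winner's record as long as that winner has not been popped yet; otherwise returns null.

public T peekWinner() {
    // Return the current winner unless it has already been popped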
    return leaves.get(tree[0]).state != State.WINNER_POPPED ? leaves.get(tree[0]).peek() : null;
}

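(6) adjust() -- the heart of the algorithm

Note on adjust(tree[0]): it re-adjusts the tree starting from the current winner tree[0]. If the winner has been popped and another node holds the same key, that node is swapped into the winner position; otherwise the winner is compared with the loser stored at each parent slot on the way up, and the winner is updated.

private void adjust(int winner) {
    // Main loop: walk from the leaf up to the root; winner starts out as the leaf being adjusted
    // Parent slot: (winner + size) / 2
    // Leaf i conceptually sits at position i + size of a complete binary tree, so its parent is (i + size) / 2 and every further parent is parent / 2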
    for (int parent = (winner + this.size) / 2; parent > 0 && winner >= 0; parent /= 2) {
        LeafIterator<T> winnerNode = leaves.get(winner);
        LeafIterator<T> parentNode;
        // CASE-1: initialization, this slot holds no loser yet, so park the current winner here as a new-key loser for later comparisons
        if (this.tree[parent] == -1) {
            // mark the winner node as LOSER_WITH_NEW_KEY
            winnerNode.state = State.LOSER_WITH_NEW_KEY;
        }
        // CASE-2: regular adjustment, the slot already holds a loser
        else {
            // the loser currently stored at the parent slot
            parentNode = leaves.get(this.tree[parent]);
            // choose the strategy based on the winner's current state
            switch (winnerNode.state) {
                case WINNER_WITH_NEW_KEY:
                    // the winner holds a new key: full comparison in adjustWithNewWinnerKey()
                    adjustWithNewWinnerKey(parent, parentNode, winnerNode);
                    break;
                case WINNER_WITH_SAME_KEY:
                    // the winner holds the same key: only the sequence is compared, in adjustWithSameWinnerKey()
                    adjustWithSameWinnerKey(parent, parentNode, winnerNode);
                    break;
                case WINNER_POPPED:
                    // the winner has been popped: fast path
                    if (winnerNode.firstSameKeyIndex < 0) {
                        // no more records with the same key, stop adjusting
                        parent = -1;
                    } else {
                        // fast path: jump straight to the slot recorded in firstSameKeyIndex, the first node with the same key
                        // mark the popped winner LOSER_POPPED and the node in that slot WINNER_WITH_SAME_KEY; they are swapped below
                        parent = winnerNode.firstSameKeyIndex;
                        parentNode = leaves.get(this.tree[parent]);
                        winnerNode.state = State.LOSER_POPPED;
                        parentNode.state = State.WINNER_WITH_SAME_KEY;
                    }
                    break;
                default:
                    throw new UnsupportedOperationException(
                            "unknown state for " + winnerNode.state.name());
            }
        }

        // if the winner loses, exchange nodes.
        // if the node that came in as the winner is no longer a winner, swap:
        // the loser stored at parent becomes the new winner and keeps moving up
        if (!winnerNode.state.isWinner()) {
            int tmp = winner;
            winner = this.tree[parent];
            this.tree[parent] = tmp;
        }
    }
    // the loop is over: whoever is left as winner goes to the root
    this.tree[0] = winner;
}
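
The index arithmetic in the for loop can be checked on its own. The sketch below lists the tree slots that adjust(leaf) would visit when nothing stops the loop early; `ParentPathSketch` and `path` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ParentPathSketch {
    // Tree slots visited by adjust(leaf) when the loop is not cut short.
    static List<Integer> path(int leaf, int size) {
        List<Integer> slots = new ArrayList<>();
        for (int parent = (leaf + size) / 2; parent > 0; parent /= 2) {
            slots.add(parent);
        }
        return slots;
    }

    public static void main(String[] args) {
        // With size = 3, as in the walkthrough later in this article
        System.out.println(path(0, 3)); // [1]
        System.out.println(path(1, 3)); // [2, 1]
        System.out.println(path(2, 3)); // [2, 1]
    }
}
```

Slot 0 never appears in a path: it is reserved for the overall winner and is written once after the loop.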

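(7) adjustWithNewWinnerKey() -- handling a winner with a new key

private void adjustWithNewWinnerKey(
        int index, LeafIterator<T> parentNode, LeafIterator<T> winnerNode) {
    // Scenario: the node moving up holds a new key (different from the key of the last global winner)
    // Branch on the state of the loser stored at the parent slot
    switch (parentNode.state) {
        // CASE-1: parentNode also holds a new key, so a full comparison is needed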
        case LOSER_WITH_NEW_KEY:
            // when the new winner is also a new key, it needs to be compared.
            T parentKey = parentNode.peek();
            T childKey = winnerNode.peek();
            /* 1. Compare the keys first.
            Note: the key comparator passed in by SortMergeReaderWithLoserTree is (e1, e2) -> userKeyComparator.compare(e2.key(), e1.key())
            firstComparator.compare(parentKey, childKey)
                > 0: the child's key is larger, the parent wins
                < 0: the parent's key is larger, the child wins
                = 0: the keys are equal, compare the sequence next
             */
            int firstResult = firstComparator.compare(parentKey, childKey);
            // SUB-CASE-1: equal keys, compare the sequence
            if (firstResult == 0) {
                // Note: by default secondComparator is (e1, e2) -> Long.compare(e2.sequenceNumber(), e1.sequenceNumber());
                int secondResult = secondComparator.compare(parentKey, childKey);

                // the parent's sequence is larger, so the child is the older record: the child wins and is popped first
                if (secondResult < 0) {
                    parentNode.state = State.LOSER_WITH_SAME_KEY;
                    winnerNode.setFirstSameKeyIndex(index); // 记录相同 Key 的位置
                }
                // the child's sequence is larger or equal, so the parent is the older record: the parent wins and is popped first
                else {
                    winnerNode.state = State.LOSER_WITH_SAME_KEY;
                    parentNode.state = State.WINNER_WITH_NEW_KEY;
                    parentNode.setFirstSameKeyIndex(index);
                }
            }
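            // SUB-CASE-2: the child's key is larger, so the parent wins and both states are updated
            // (if the parent's key is larger, the child stays the winner and nothing changes)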
            else if (firstResult > 0) {
                // the two keys are completely different and just need to update the state.
                parentNode.state = State.WINNER_WITH_NEW_KEY;
                winnerNode.state = State.LOSER_WITH_NEW_KEY;
            }
            return;
        // CASE-2: cannot happen, because a winner with the same key goes through adjustWithSameWinnerKey(); throw
        case LOSER_WITH_SAME_KEY:
            throw new RuntimeException(
                    "This is a bug. Please file an issue. A node in the WINNER_WITH_NEW_KEY "
                            + "state cannot encounter a node in the LOSER_WITH_SAME_KEY state.");
        // CASE-3: only happens during adjustForNextLoop
        // the parent has already been popped and a freshly read record is now moving up
        case LOSER_POPPED:
            parentNode.state = State.WINNER_POPPED;
            parentNode.firstSameKeyIndex = -1;
            winnerNode.state = State.LOSER_WITH_NEW_KEY;
            return;
        default:
            throw new UnsupportedOperationException(
                    "unknown state for " + parentNode.state.name());
    }
}

(8) adjustWithSameWinnerKey() -- handling a winner with the same key

private void adjustWithSameWinnerKey(
        int index, LeafIterator<T> parentNode, LeafIterator<T> winnerNode) {
    switch (parentNode.state) {
        // CASE-1: the parent holds the same key, so only the sequence needs comparing
        case LOSER_WITH_SAME_KEY:
            // the key of the previous loser is the same as the key of the current winner,
            // only the sequence needs to be compared.
            T parentKey = parentNode.peek();
            T childKey = winnerNode.peek();
            // compare the sequence
            int secondResult = secondComparator.compare(parentKey, childKey);
            // the child's sequence is larger, so the parent is older and must be popped first: swap the roles of the two nodes
            if (secondResult > 0) {
                parentNode.state = State.WINNER_WITH_SAME_KEY;
                winnerNode.state = State.LOSER_WITH_SAME_KEY;
                parentNode.setFirstSameKeyIndex(index);
            }
            // the parent's sequence is larger or equal, so the child is older and is popped first: no state change, only the index is recorded
            else {
                winnerNode.setFirstSameKeyIndex(index);
            }
            return;
        // CASE-2: nothing to do here: a same-key winner beats a loser that holds a different key or has already been popped
        case LOSER_WITH_NEW_KEY:
        case LOSER_POPPED:
            return;
        default:
            throw new UnsupportedOperationException(
                    "unknown state for " + parentNode.state.name());
    }
}

3. Walkthrough

(1) Initial state

tree = [-1, -1, -1]
size = 3

Initial state of every leaf:
Leaf 0: kv=null, state=WINNER_WITH_NEW_KEY, firstSameKeyIndex=-1
Leaf 1: kv=null, state=WINNER_WITH_NEW_KEY, firstSameKeyIndex=-1
Leaf 2: kv=null, state=WINNER_WITH_NEW_KEY, firstSameKeyIndex=-1

(2) initializeIfNeeded phase

<1> Step 1: i=2, adjust(2)

advanceIfAvailable()

Leaf 2 reads its first record
Leaf 2: kv={key=1, seq=150, value="C"}, state=WINNER_WITH_NEW_KEY, firstSameKeyIndex=-1

adjust(2)

winner = 2
parent = (2 + 3) / 2 = 2

Loop round 1: parent=2, winner=2
  winnerNode = Leaf 2
  tree[2] == -1  // initialization branch
  → winnerNode.state = LOSER_WITH_NEW_KEY
  
  !winnerNode.state.isWinner() = true (it is a LOSER)
  → swap:
     tmp = 2
     winner = tree[2] = -1
     tree[2] = 2
  
  winner = -1
  parent /= 2 → parent = 1
  loop condition: 1 > 0 && -1 >= 0 → false

tree[0] = -1

Result

tree = [-1, -1, 2]
Leaf 2: state=LOSER_WITH_NEW_KEY

<2> Step 2: i=1, adjust(1)

advanceIfAvailable()

Leaf 1: kv={key=1, seq=200, value="B"}, state=WINNER_WITH_NEW_KEY, firstSameKeyIndex=-1

adjust(1)

winner = 1
parent = (1 + 3) / 2 = 2

Loop round 1: parent=2, winner=1
  winnerNode = Leaf 1 (key=1, seq=200, state=WINNER_WITH_NEW_KEY)
  tree[2] == 2
  parentNode = Leaf 2 (key=1, seq=150, state=LOSER_WITH_NEW_KEY)
  
  winnerNode.state = WINNER_WITH_NEW_KEY
  → call adjustWithNewWinnerKey(2, Leaf 2, Leaf 1)
  
  adjustWithNewWinnerKey():
    parentNode.state = LOSER_WITH_NEW_KEY
    parentKey = {key=1, seq=150}
    childKey = {key=1, seq=200}
    
    firstResult = firstComparator.compare(parentKey, childKey)
                = userKeyComparator.compare(childKey.key, parentKey.key)
                = userKeyComparator.compare({key=1}, {key=1})
                = 0
    
    // equal keys, compare the sequence
    secondResult = secondComparator.compare(parentKey, childKey)
                 = Long.compare(childKey.seq, parentKey.seq)
                 = Long.compare(200, 150)
                 = 1 > 0
    
    // secondResult > 0, take the else branch
    // the parent's seq is smaller, so the parent wins
    winnerNode.state = LOSER_WITH_SAME_KEY
    parentNode.state = WINNER_WITH_NEW_KEY
    parentNode.setFirstSameKeyIndex(2)  // record index=2
    
    // Leaf 2.firstSameKeyIndex == -1, so it is set to 2
    Leaf 2.firstSameKeyIndex = 2
  
  // back in adjust(), check the state
  !winnerNode.state.isWinner() = true (Leaf 1 is a LOSER)
  → swap:
     tmp = 1
     winner = tree[2] = 2
     tree[2] = 1
  
  winner = 2
  parent /= 2 → parent = 1

Loop round 2: parent=1, winner=2
  winnerNode = Leaf 2 (key=1, seq=150, state=WINNER_WITH_NEW_KEY)
  tree[1] == -1  // initialization branch
  → winnerNode.state = LOSER_WITH_NEW_KEY
  
  !isWinner() = true
  → swap:
     tmp = 2
     winner = tree[1] = -1
     tree[1] = 2
  
  winner = -1
  parent = 0
  loop condition: 0 > 0 → false

tree[0] = -1

Result

tree = [-1, 2, 1]
Leaf 1: state=LOSER_WITH_SAME_KEY, firstSameKeyIndex=-1
Leaf 2: state=LOSER_WITH_NEW_KEY, firstSameKeyIndex=2

<3> Step 3: i=0, adjust(0)

advanceIfAvailable()

Leaf 0: kv={key=1, seq=100, value="A"}, state=WINNER_WITH_NEW_KEY, firstSameKeyIndex=-1

adjust(0)

winner = 0
parent = (0 + 3) / 2 = 1

Loop round 1: parent=1, winner=0
  winnerNode = Leaf 0 (key=1, seq=100, state=WINNER_WITH_NEW_KEY)
  tree[1] == 2
  parentNode = Leaf 2 (key=1, seq=150, state=LOSER_WITH_NEW_KEY)
  
  call adjustWithNewWinnerKey(1, Leaf 2, Leaf 0):
    parentKey = {key=1, seq=150}
    childKey = {key=1, seq=100}
    
    firstResult = compare({key=1}, {key=1}) = 0
    
    secondResult = Long.compare(100, 150) = -1 < 0
    
    // secondResult < 0, take the if branch
    // the child's seq is smaller, so the child wins
    parentNode.state = LOSER_WITH_SAME_KEY
    winnerNode.setFirstSameKeyIndex(1)
    
    Leaf 0.firstSameKeyIndex = 1
  
  // winnerNode is still a WINNER
  !isWinner() = false
  → no swap
  
  winner = 0
  parent = 0
  loop condition: 0 > 0 → false

tree[0] = 0

<4> Initialization complete

tree = [0, 2, 1]
- tree[0] = 0 → global winner Leaf 0 (key=1, seq=100)
- tree[1] = 2 → loser Leaf 2 (key=1, seq=150)
- tree[2] = 1 → loser Leaf 1 (key=1, seq=200)

Leaf 0: kv={key=1, seq=100}, state=WINNER_WITH_NEW_KEY, firstSameKeyIndex=1
Leaf 1: kv={key=1, seq=200}, state=LOSER_WITH_SAME_KEY, firstSameKeyIndex=-1
Leaf 2: kv={key=1, seq=150}, state=LOSER_WITH_SAME_KEY, firstSameKeyIndex=2

(3) Running SortMergeIterator.next()

First call to next()

public T next() throws IOException {
    while (true) {
        loserTree.adjustForNextLoop();       // step 1
        KeyValue winner = loserTree.popWinner(); // step 2
        if (winner == null) return null;
        
        mergeFunctionWrapper.reset();        // step 3
        mergeFunctionWrapper.add(winner);    // step 4
        
        T result = merge();                  // step 5
        if (result != null) return result;
    }
}

<1> Step 1: adjustForNextLoop()

public void adjustForNextLoop() throws IOException {
    LeafIterator<T> winner = leaves.get(tree[0]);
    while (winner.state == State.WINNER_POPPED) {
        winner.advanceIfAvailable();
        adjust(tree[0]);
        winner = leaves.get(tree[0]);
    }
}

winner = leaves.get(0) = Leaf 0
Leaf 0.state = WINNER_WITH_NEW_KEY ≠ WINNER_POPPED
→ the loop condition is false, return immediately

<2> Step 2: popWinner()

public T popWinner() {
    LeafIterator<T> winner = leaves.get(tree[0]);
    if (winner.state == State.WINNER_POPPED) {
        return null;
    }
    T result = winner.pop();
    adjust(tree[0]);
    return result;
}

winner = Leaf 0
Leaf 0.state = WINNER_WITH_NEW_KEY ≠ WINNER_POPPED

result = Leaf 0.pop()
  Leaf 0.state = WINNER_POPPED
  return {key=1, seq=100, value="A"}

call adjust(0):
  winner = 0
  parent = 1
  
  Loop round 1:
    winnerNode = Leaf 0 (state=WINNER_POPPED, firstSameKeyIndex=1)
    tree[1] = 2
    parentNode = Leaf 2
    
    // WINNER_POPPED branch
    firstSameKeyIndex = 1 ≥ 0
    → fast path
    parent = 1
    parentNode = Leaf 2 (tree[1])
    
    Leaf 0.state = LOSER_POPPED
    Leaf 2.state = WINNER_WITH_SAME_KEY
    
    // check !winnerNode.state.isWinner()
    // Leaf 0 is now LOSER_POPPED (not a winner)
    !isWinner() = true
    → swap:
       tmp = 0
       winner = tree[1] = 2
       tree[1] = 0
    
    winner = 2
    parent = 0
    loop condition: 0 > 0 → false
  
  tree[0] = 2

returns: {key=1, seq=100, value="A"}

Current state

tree = [2, 0, 1]
- tree[0] = 2 → Leaf 2 is the new global winner

Leaf 0: state=LOSER_POPPED, firstSameKeyIndex=1
Leaf 1: state=LOSER_WITH_SAME_KEY, firstSameKeyIndex=-1
Leaf 2: state=WINNER_WITH_SAME_KEY, firstSameKeyIndex=2

<3> Steps 3-4: reset() and add()

mergeFunctionWrapper.reset();
  latestKv = null

mergeFunctionWrapper.add({key=1, seq=100, value="A"});
  latestKv = {key=1, seq=100, value="A"}

<4> Step 5: merge()

private T merge() {
    while (loserTree.peekWinner() != null) {
        mergeFunctionWrapper.add(loserTree.popWinner());
    }
    return mergeFunctionWrapper.getResult();
}

《1》 Iteration 1

peekWinner():
  tree[0] = 2
  Leaf 2.state = WINNER_WITH_SAME_KEY ≠ WINNER_POPPED
  return Leaf 2.peek() = {key=1, seq=150, value="C"}  ✅

popWinner():
  winner = Leaf 2
  Leaf 2.state ≠ WINNER_POPPED
  
  result = Leaf 2.pop()
    Leaf 2.state = WINNER_POPPED
    return {key=1, seq=150, value="C"}
  
  adjust(2):
    winner = 2
    parent = 2
    
    winnerNode = Leaf 2 (WINNER_POPPED, firstSameKeyIndex=2)
    tree[2] = 1
    parentNode = Leaf 1
    
    firstSameKeyIndex = 2 ≥ 0
    parent = 2
    parentNode = Leaf 1 (tree[2])
    
    Leaf 2.state = LOSER_POPPED
    Leaf 1.state = WINNER_WITH_SAME_KEY
    
    !isWinner() = true
    → swap:
       tmp = 2
       winner = tree[2] = 1
       tree[2] = 2
    
    winner = 1
    parent = 1
    
    Loop round 2:
      winnerNode = Leaf 1 (WINNER_WITH_SAME_KEY)
      tree[1] = 0
      parentNode = Leaf 0 (LOSER_POPPED)
      
      // adjustWithSameWinnerKey
      parentNode.state = LOSER_POPPED
      → return (nothing to do)
      
      !isWinner() = false (Leaf 1 is a WINNER)
      → no swap
      
      parent = 0
      loop condition: 0 > 0 → false
    
    tree[0] = 1

add({key=1, seq=150, value="C"}):
  latestKv = {key=1, seq=150, value="C"} // the record just popped

Current state

tree = [1, 0, 2]

Leaf 0: state=LOSER_POPPED, firstSameKeyIndex=1
Leaf 1: state=WINNER_WITH_SAME_KEY, firstSameKeyIndex=-1
Leaf 2: state=LOSER_POPPED, firstSameKeyIndex=2

《2》 Iteration 2

peekWinner():
  tree[0] = 1
  Leaf 1.state = WINNER_WITH_SAME_KEY ≠ WINNER_POPPED
  return {key=1, seq=200, value="B"}  ✅

popWinner():
  result = Leaf 1.pop()
    Leaf 1.state = WINNER_POPPED
    return {key=1, seq=200, value="B"}
  
  adjust(1):
    winner = 1
    parent = 2
    
    winnerNode = Leaf 1 (WINNER_POPPED, firstSameKeyIndex=-1)
    
    firstSameKeyIndex = -1 < 0
    → parent = -1 (stop adjusting)
    
    !isWinner() = false → no swap
    
    parent /= 2 → parent = 0 (integer division of -1 by 2)
    loop condition: 0 > 0 → false
    
    tree[0] = 1

add({key=1, seq=200, value="B"}):
  latestKv = {key=1, seq=200, value="B"}  // the record just popped

Current state

tree = [1, 0, 2]

Leaf 0: state=LOSER_POPPED, firstSameKeyIndex=1
Leaf 1: state=WINNER_POPPED, firstSameKeyIndex=-1
Leaf 2: state=LOSER_POPPED, firstSameKeyIndex=2

《3》 Iteration 3

peekWinner():
  tree[0] = 1
  Leaf 1.state = WINNER_POPPED
  return null  ❌

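The loop ends

getResult():
  return latestKv = {key=1, seq=200, value="B"}

(4) Final result

Taking the deduplicate merge engine (the last record overwrites) as the example:
next() returns: {key=1, seq=200, value="B"}

Every record with the same key was collected:
1. seq=100 is added first
2. seq=150 overwrites it
3. seq=200 overwrites last

The record that is kept is the one with the largest (latest) sequence

For the official example, see PIP-2: Optimize SortMergeReader with LoserTree - Paimon - Apache Software Foundation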
