LevelDB's SkipList Implementation

LevelDB's SkipList is implemented in db/skiplist.h.

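Private members:

struct Node stores the data. Unlike the Java implementation above, the key held in a Node carries both the lookup key and the value; this follows from the storage format.
enum { kMaxHeight = 12 }; caps the number of levels at 12.
compare_ is the comparator used to order keys.
arena_ allocates the memory for every Node.
head_ is the pointer to the head node.
std::atomic<int> max_height_; is the current height of the list, read and written atomically.
rnd_ is the random number generator mentioned above.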

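The explicit keyword is normally applied to constructors. It tells the compiler that the constructor may only be called explicitly and must not be used for implicit conversions.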
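
As a small illustration of that note, here is a sketch with a hypothetical wrapper type (Height and Doubled are made-up names, not part of LevelDB). It only shows what explicit forbids and what it still allows.

```cpp
#include <cassert>

// Hypothetical wrapper type, used only to illustrate `explicit`.
struct Height {
  explicit Height(int v) : value(v) {}
  int value;
};

// Takes a Height by value; callers have to build one explicitly.
inline int Doubled(Height h) { return h.value * 2; }

// Doubled(3);          // would not compile: no implicit int -> Height
// Doubled(Height(3));  // compiles: the conversion is spelled out
```

Calling Doubled(Height(3)) yields 6, while the commented-out Doubled(3) is rejected at compile time.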

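Public members:

An explicit constructor, which rules out implicit conversions.
void Insert(const Key& key); inserts a key.
bool Contains(const Key& key) const; reports whether a key is present.
Iterator is the iterator implementation. It offers several ways to search, including seeking to a given key (covered in detail later). It has private members of its own: a pointer to the SkipList it walks and the Node it is currently positioned at.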

The Node struct holds a Key, the same Key that appears in template <typename Key, class Comparator> earlier, together with an array:

std::atomic<Node*> next_[1];

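This works as a flexible array: it is declared with length 1, but its real length varies, because NewNode allocates extra slots directly behind the struct. next_ is an array of pointers and plays the role of IndexNode in the Java code above: it stores no key or value of its own and only serves as the index. Each slot points to a Node that holds a Key, and the array has as many slots as the node has levels, so slot i records the node's successor on level i.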
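
The over-allocation trick can be sketched on its own. The version below is a simplified stand-in, not LevelDB's code: DemoNode and NewDemoNode are hypothetical names, the key is a plain int, and std::malloc replaces the Arena. Like the original, it indexes past the declared length of next_, which works on mainstream compilers but goes beyond what the C++ standard strictly guarantees.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>
#include <new>

// Simplified stand-in for LevelDB's Node; int keys for brevity.
struct DemoNode {
  explicit DemoNode(int k) : key(k) {}
  int key;
  // Declared with length 1. The usable length equals the node's height,
  // because the extra slots are allocated right behind the struct.
  std::atomic<DemoNode*> next_[1];
};

// Allocates one block large enough for the node plus (height - 1) extra
// next_ slots, then constructs the node in place. LevelDB takes this memory
// from its Arena; std::malloc is an assumption made to keep the sketch
// standalone.
inline DemoNode* NewDemoNode(int key, int height) {
  assert(height >= 1);
  void* mem = std::malloc(sizeof(DemoNode) +
                          sizeof(std::atomic<DemoNode*>) * (height - 1));
  assert(mem != nullptr);
  DemoNode* n = new (mem) DemoNode(key);
  for (int i = 0; i < height; i++) {
    n->next_[i].store(nullptr, std::memory_order_relaxed);
  }
  return n;
}
```

A node created with NewDemoNode(7, 4) has four usable next_ slots and is released with std::free, since the sketch does not use an arena.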

Constructing a SkipList takes two arguments: a comparator, stored in compare_, and a pointer to the memory allocator, stored in arena_.

Lookup

Unlike the implementation above, LevelDB puts the lookup in the Iterator's Seek method:

template <typename Key, class Comparator>
inline void SkipList<Key, Comparator>::Iterator::Seek(const Key& target) {
  node_ = list_->FindGreaterOrEqual(target, nullptr);
}

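Seek delegates to the SkipList's own FindGreaterOrEqual method, which returns either the node equal to the target or, failing that, the first node greater than it. The method takes two parameters: the key to look for, and an array that receives the predecessor node at each level visited, that is, the update array from the earlier implementation.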

template <typename Key, class Comparator>
bool SkipList<Key, Comparator>::KeyIsAfterNode(const Key& key, Node* n) const {
  // null n is considered infinite
  return (n != nullptr) && (compare_(n->key, key) < 0);
}

template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindGreaterOrEqual(const Key& key,
                                              Node** prev) const {
  Node* x = head_;  // x is the search cursor; it starts at the head node
  int level = GetMaxHeight() - 1;  // levels are 0-based, so the top level is height - 1
  while (true) {
    Node* next = x->Next(level);  // successor of x on the current level
    if (KeyIsAfterNode(key, next)) {  // true when next is non-null and next->key < key
      // Keep searching in this list
      x = next;  // move right
    } else {
      // next is nullptr or next->key >= key. Record x as the predecessor on
      // this level if the caller asked for it: lookups pass nullptr for prev,
      // Insert() passes a real array.
      if (prev != nullptr) prev[level] = x;
      if (level == 0) {
        // Bottom level: KeyIsAfterNode was false, so next is either nullptr
        // or the first node whose key is >= key. This holds only on level 0.
        return next;
      } else {
        // Switch to next list
        level--;
      }
    }
  }
}

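The logic is the same as in the earlier implementation: traversal starts at the top level and works downward. Next(n) returns the node's successor on level n. It has a sibling, NoBarrier_Next, and the two differ in the memory ordering they enforce, that is, in how much reordering they permit. In Java, the volatile keyword is what rules out such reordering; C++ does it as follows (code taken directly from LevelDB):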

  Node* Next(int n) {
    assert(n >= 0);
    // Use an 'acquire load' so that we observe a fully initialized
    // version of the returned Node.
    return next_[n].load(std::memory_order_acquire);
  }
  void SetNext(int n, Node* x) {
    assert(n >= 0);
    // Use a 'release store' so that anybody who reads through this
    // pointer observes a fully initialized version of the inserted node.
    next_[n].store(x, std::memory_order_release);
  }

  // No-barrier variants that can be safely used in a few locations.
  Node* NoBarrier_Next(int n) {
    assert(n >= 0);
    return next_[n].load(std::memory_order_relaxed);
  }
  void NoBarrier_SetNext(int n, Node* x) {
    assert(n >= 0);
    next_[n].store(x, std::memory_order_relaxed);
  }

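The memory_order argument determines which reorderings are allowed around the operation:

std::memory_order_acquire: used on load operations. Reads and writes that come after the load in program order cannot be reordered before it. If the load observes a value written by a release store, everything the storing thread wrote before that store is visible too. This is why a node reached through Next() is seen fully initialized.
std::memory_order_release: used on store operations. Reads and writes that come before the store in program order cannot be reordered after it. This ensures that x is fully initialized before another thread can reach it through next_[n].
std::memory_order_relaxed: the weakest ordering, usable on both load and store. The operation itself is still atomic, but it imposes no ordering on the surrounding memory operations. It is used where no ordering is needed, to avoid the cost of a barrier.

So SetNext publishes a node in a way that guarantees any thread that later reads the pointer through Next sees the node's contents fully written, never a half-initialized node. The two NoBarrier variants give no such guarantee and are safe only where ordering is established some other way.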
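
The release/acquire pairing can be tried in isolation. The sketch below is not LevelDB code: Payload and PublishAndRead are hypothetical names, and two std::thread objects stand in for a writer that inserts and a reader that traverses. It shows the pattern SetNext and Next rely on: initialize first, publish with a release store, read with an acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload standing in for a skip list node.
struct Payload {
  int key = 0;
};

// Publishes a Payload from one thread and reads it from another.
// Returns the key the reader saw through the published pointer.
inline int PublishAndRead() {
  std::atomic<Payload*> slot{nullptr};
  Payload node;
  int seen = -1;

  std::thread writer([&] {
    node.key = 42;                                 // initialize first
    slot.store(&node, std::memory_order_release);  // then publish
  });
  std::thread reader([&] {
    Payload* p = nullptr;
    // Spin until the pointer becomes visible.
    while ((p = slot.load(std::memory_order_acquire)) == nullptr) {
      std::this_thread::yield();
    }
    // The acquire load synchronizes with the release store, so the write
    // to node.key is visible here.
    seen = p->key;
  });

  writer.join();
  reader.join();
  return seen;
}
```

With relaxed ordering on both sides, the reader would still eventually see the pointer, but nothing would guarantee that it also sees key == 42.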

Here is an example.

Suppose the list currently looks like this:

head2--->3-> 

head1--->3->   5->   9->

head0--->3->4->5->7->9->

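With the three levels above, suppose we look up 8:

x is set to head_ and level to 2. Entering the loop, next is head's successor on level 2, which is 3.
3 is less than the target 8, so x moves to node 3 and the loop repeats.
The successor of 3 on level 2 is null, so KeyIsAfterNode returns false and we enter the else branch. prev is nullptr, so nothing is recorded. We are not yet on level 0, so level-- takes us down to level 1.
x is still node 3. Its successor on level 1 is 5, and 5 is less than 8, so x moves to 5 and the loop repeats.
x is 5 and its successor on level 1 is 9. 9 is greater than 8, so x stays put and we drop to level 0.
On level 0, x is 5 and next is 7. 7 is less than 8, so x moves to 7 and the loop repeats.
x is 7 and next is 9. 9 is not less than 8 and this is level 0, so next (9) is returned.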

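In other words, the method does what its name says: it returns the first node whose key is equal to or greater than the target. Iterator::Seek stores the result in its node_ member, and the caller still has to inspect it. If it is nullptr, the Iterator's Valid() returns false.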
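
The walk through the example list can be replayed with a small sketch. It assumes a simplified node (SNode, a hypothetical name) with plain pointers and int keys in place of LevelDB's atomics and comparator, and it hard-codes the three-level example list (3, 4, 5, 7, 9). The control flow follows FindGreaterOrEqual.

```cpp
#include <cassert>
#include <vector>

// Simplified node: next[i] is the successor on level i.
struct SNode {
  int key;
  std::vector<SNode*> next;
  SNode(int k, int height) : key(k), next(height, nullptr) {}
};

// Same control flow as FindGreaterOrEqual, without atomics or prev.
inline SNode* FindGreaterOrEqual(SNode* head, int max_height, int key) {
  SNode* x = head;
  int level = max_height - 1;
  while (true) {
    SNode* next = x->next[level];
    if (next != nullptr && next->key < key) {
      x = next;       // keep moving right on this level
    } else if (level == 0) {
      return next;    // first node with key >= target, or nullptr
    } else {
      level--;        // drop down one level
    }
  }
}

// Builds the example list and returns the key found for `target`,
// or -1 when no key >= target exists.
inline int LookupInExample(int target) {
  SNode head(0, 3), n3(3, 3), n4(4, 1), n5(5, 2), n7(7, 1), n9(9, 2);
  head.next = {&n3, &n3, &n3};
  n3.next = {&n4, &n5, nullptr};
  n4.next = {&n5};
  n5.next = {&n7, &n9};
  n7.next = {&n9};
  SNode* found = FindGreaterOrEqual(&head, 3, target);
  return found == nullptr ? -1 : found->key;
}
```

LookupInExample(8) follows the path head, 3, 5, 7 and returns 9, while a target larger than every key, such as 10, runs off the end and yields -1.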

Insertion

Insertion also relies on FindGreaterOrEqual:

template <typename Key, class Comparator>
void SkipList<Key, Comparator>::Insert(const Key& key) {
  // TODO(opt): We can use a barrier-free variant of FindGreaterOrEqual()
  // here since Insert() is externally synchronized.
  Node* prev[kMaxHeight];
  Node* x = FindGreaterOrEqual(key, prev);

  // Our data structure does not allow duplicate insertion
  assert(x == nullptr || !Equal(key, x->key));

  int height = RandomHeight();
  if (height > GetMaxHeight()) {
    for (int i = GetMaxHeight(); i < height; i++) {
      prev[i] = head_;
    }
    // It is ok to mutate max_height_ without any synchronization
    // with concurrent readers.  A concurrent reader that observes
    // the new value of max_height_ will see either the old value of
    // new level pointers from head_ (nullptr), or a new value set in
    // the loop below.  In the former case the reader will
    // immediately drop to the next level since nullptr sorts after all
    // keys.  In the latter case the reader will use the new node.
    max_height_.store(height, std::memory_order_relaxed);
  }

  x = NewNode(key, height);
  for (int i = 0; i < height; i++) {
    // NoBarrier_SetNext() suffices since we will add a barrier when
    // we publish a pointer to "x" in prev[i].
    x->NoBarrier_SetNext(i, prev[i]->NoBarrier_Next(i));
    prev[i]->SetNext(i, x);
  }
}

template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node* SkipList<Key, Comparator>::NewNode(
    const Key& key, int height) {
  char* const node_memory = arena_->AllocateAligned(
      sizeof(Node) + sizeof(std::atomic<Node*>) * (height - 1));
  return new (node_memory) Node(key);
}

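Insertion first locates the node that already holds the key, or else the key's successor. Note that LevelDB does not allow inserting a key when an identical key is already present. Every insertion in LevelDB carries a sequence number, so keys normally never collide. The only way I can think of to get an exact duplicate is a fault in how the sequence number is obtained, leaving one key with the same sequence twice; we will come back to this later.

Next, a random height is chosen (RandomHeight never returns more than 12). If it exceeds the current maximum height, the prev entries for the new levels, which the search did not fill in, are set to head_, and max_height_ is updated. NewNode then wraps the key and the height into a new Node, which mostly amounts to allocating the memory. Finally, the node is linked in on every level from 0 to height - 1.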
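
The insertion steps can be condensed into a single-threaded sketch. It is a simplification under stated assumptions, not LevelDB's code: TinySkipList and SNode are hypothetical names, keys are ints, nodes come from new rather than an Arena, there are no atomics, and std::mt19937 with a one-in-four promotion chance stands in for LevelDB's own random generator.

```cpp
#include <cassert>
#include <random>
#include <vector>

constexpr int kMaxHeight = 12;
constexpr int kBranching = 4;  // assumed: promote a level with chance 1/4

struct SNode {
  int key;
  std::vector<SNode*> next;  // next[i] is the successor on level i
  SNode(int k, int height) : key(k), next(height, nullptr) {}
};

class TinySkipList {
 public:
  TinySkipList()
      : head_(new SNode(0, kMaxHeight)), max_height_(1), rng_(301) {}
  TinySkipList(const TinySkipList&) = delete;
  TinySkipList& operator=(const TinySkipList&) = delete;
  ~TinySkipList() {
    for (SNode* x = head_; x != nullptr;) {
      SNode* following = x->next[0];
      delete x;
      x = following;
    }
  }

  void Insert(int key) {
    SNode* prev[kMaxHeight];
    SNode* x = FindGreaterOrEqual(key, prev);
    assert(x == nullptr || x->key != key);  // duplicates are not allowed
    int height = RandomHeight();
    if (height > max_height_) {
      // The search never visited the new levels; their predecessor is head_.
      for (int i = max_height_; i < height; i++) prev[i] = head_;
      max_height_ = height;
    }
    x = new SNode(key, height);
    for (int i = 0; i < height; i++) {
      x->next[i] = prev[i]->next[i];  // point the new node at its successor
      prev[i]->next[i] = x;           // then link it in behind prev[i]
    }
  }

  bool Contains(int key) const {
    SNode* x = FindGreaterOrEqual(key, nullptr);
    return x != nullptr && x->key == key;
  }

 private:
  int RandomHeight() {
    std::uniform_int_distribution<int> pick(0, kBranching - 1);
    int height = 1;
    while (height < kMaxHeight && pick(rng_) == 0) height++;
    return height;
  }

  SNode* FindGreaterOrEqual(int key, SNode** prev) const {
    SNode* x = head_;
    int level = max_height_ - 1;
    while (true) {
      SNode* next = x->next[level];
      if (next != nullptr && next->key < key) {
        x = next;
      } else {
        if (prev != nullptr) prev[level] = x;
        if (level == 0) return next;
        level--;
      }
    }
  }

  SNode* const head_;
  int max_height_;
  std::mt19937 rng_;
};
```

The two-step link in the loop mirrors the order in Insert: the new node's own pointer is set before the predecessor is redirected, which in the concurrent original is what lets the release store in SetNext publish a complete node.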

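Deletion

A delete in LevelDB only adds a tombstone marker; the entry is physically removed later, when files are compacted. The SkipList therefore has no delete operation.

Iteration

The SkipList iterator provides these methods: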

    bool Valid() const;  // true iff the iterator is positioned at a node (node_ != nullptr)
    // Advances to the next position.
    // REQUIRES: Valid()
    void Next();  // move to the successor of the current node

    // Advances to the previous position.
    // REQUIRES: Valid()
    void Prev();  // move to the predecessor of the current node

    // Advance to the first entry with a key >= target
    void Seek(const Key& target);  // position at the first key >= target

    // Position at the first entry in list.
    // Final state of iterator is Valid() iff list is not empty.
    void SeekToFirst();  // smallest key

    // Position at the last entry in list.
    // Final state of iterator is Valid() iff list is not empty.
    void SeekToLast();  // largest key

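Valid, Next and Seek have already been covered in outline. That leaves Prev, SeekToFirst and SeekToLast.

SeekToFirst simply takes the first node on level 0. SeekToLast starts at the top level, moves right until there is no successor, then drops a level, and repeats until it reaches the end of level 0. Because the list is sorted, these two methods amount to fetching the smallest and the largest key.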
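
Both ends can be sketched on the example list used earlier (3, 4, 5, 7, 9 on three levels). As before, this is a simplified illustration: SNode, FindLast as a free function, Ends and EndsOfExample are assumptions for the sketch, with plain pointers and int keys.

```cpp
#include <cassert>
#include <vector>

// Simplified node: next[i] is the successor on level i.
struct SNode {
  int key;
  std::vector<SNode*> next;
  SNode(int k, int height) : key(k), next(height, nullptr) {}
};

// Walks right as far as possible on each level, top to bottom.
// Returns head itself when the list is empty.
inline SNode* FindLast(SNode* head, int max_height) {
  SNode* x = head;
  int level = max_height - 1;
  while (true) {
    SNode* next = x->next[level];
    if (next != nullptr) {
      x = next;     // keep moving right
    } else if (level == 0) {
      return x;     // nothing further on the bottom level
    } else {
      level--;      // drop down one level
    }
  }
}

struct Ends {
  int first;
  int last;
};

// Builds the example list and returns its smallest and largest key.
inline Ends EndsOfExample() {
  SNode head(0, 3), n3(3, 3), n4(4, 1), n5(5, 2), n7(7, 1), n9(9, 2);
  head.next = {&n3, &n3, &n3};
  n3.next = {&n4, &n5, nullptr};
  n4.next = {&n5};
  n5.next = {&n7, &n9};
  n7.next = {&n9};
  SNode* first = head.next[0];        // what SeekToFirst positions at
  SNode* last = FindLast(&head, 3);   // what SeekToLast positions at
  return {first->key, last->key};
}
```

On this list the last-node walk visits 3, 5 and 9 only, skipping 4 and 7 thanks to the upper levels. On an empty list the walk returns the head node, which an iterator has to treat as "not valid".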

Prev finds the last node whose key is less than the current key:

template <typename Key, class Comparator>
typename SkipList<Key, Comparator>::Node*
SkipList<Key, Comparator>::FindLessThan(const Key& key) const {
  Node* x = head_;
  int level = GetMaxHeight() - 1;
  while (true) {
    assert(x == head_ || compare_(x->key, key) < 0);
    Node* next = x->Next(level);
    if (next == nullptr || compare_(next->key, key) >= 0) {
      if (level == 0) {
        return x;
      } else {
        // Switch to next list
        level--;
      }
    } else {
      x = next;
    }
  }
}

The core of it is FindLessThan above. It closely mirrors FindGreaterOrEqual, which has already been covered in detail, so it is not repeated here.
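
For comparison, here is the same search run on the earlier example list (3, 4, 5, 7, 9 on three levels). It is a simplified sketch: SNode and PrevKeyInExample are hypothetical names, and plain pointers and int keys replace LevelDB's atomics and comparator.

```cpp
#include <cassert>
#include <vector>

// Simplified node: next[i] is the successor on level i.
struct SNode {
  int key;
  std::vector<SNode*> next;
  SNode(int k, int height) : key(k), next(height, nullptr) {}
};

// Returns the last node whose key is < key, or head if there is none.
inline SNode* FindLessThan(SNode* head, int max_height, int key) {
  SNode* x = head;
  int level = max_height - 1;
  while (true) {
    assert(x == head || x->key < key);
    SNode* next = x->next[level];
    if (next == nullptr || next->key >= key) {
      if (level == 0) return x;
      level--;  // switch to the next list down
    } else {
      x = next;
    }
  }
}

// Builds the example list and returns the key just before `key`,
// or -1 when no smaller key exists (the search ended on head).
inline int PrevKeyInExample(int key) {
  SNode head(0, 3), n3(3, 3), n4(4, 1), n5(5, 2), n7(7, 1), n9(9, 2);
  head.next = {&n3, &n3, &n3};
  n3.next = {&n4, &n5, nullptr};
  n4.next = {&n5};
  n5.next = {&n7, &n9};
  n7.next = {&n9};
  SNode* found = FindLessThan(&head, 3, key);
  return found == &head ? -1 : found->key;
}
```

The difference from FindGreaterOrEqual is what gets returned on level 0: the cursor x itself rather than its successor. When the result is the head node, there is no predecessor, and an iterator calling Prev from the first entry ends up invalid.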