Build Your Own Database From Scratch: 07. Free List: Reusing Pages

Since our B-tree is immutable, every update to the KV store creates new nodes along the path instead of modifying existing nodes, which leaves some nodes unreachable from the latest version. We need to reuse these unreachable nodes from old versions; otherwise, the database file will grow indefinitely.

7.1 Design the Free List


To reuse these pages, we'll add a persistent free list to keep track of unused pages. Update operations reuse pages from the list before appending new pages, and unused pages from the current version are added to the list.


The list is used as a stack (first-in-last-out). Each update operation can both remove items from and add items to the top of the list.

// number of items in the list
func (fl *FreeList) Total() int
// get the nth pointer
func (fl *FreeList) Get(topn int) uint64
// remove `popn` pointers and add some new pointers
func (fl *FreeList) Update(popn int, freed []uint64)
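
To make the stack semantics of this interface concrete, here is a minimal in-memory sketch. The memFreeList type is hypothetical and keeps everything in a slice; it only models the behavior of Total, Get, and Update, not the on-disk node layout described next.

```go
package main

// memFreeList is a hypothetical in-memory stand-in for the on-disk free list.
// The last element of items is the top of the stack.
type memFreeList struct {
	items []uint64
}

// Total returns the number of items in the list.
func (fl *memFreeList) Total() int { return len(fl.items) }

// Get returns the nth pointer counted from the top (0 is the top).
func (fl *memFreeList) Get(topn int) uint64 {
	return fl.items[len(fl.items)-1-topn]
}

// Update removes popn pointers from the top, then pushes the freed pointers.
func (fl *memFreeList) Update(popn int, freed []uint64) {
	fl.items = fl.items[:len(fl.items)-popn]
	fl.items = append(fl.items, freed...)
}
```

An update that consumed two pages and freed one would be expressed as Update(2, []uint64{ptr}) in this model.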

The free list is also immutable like our B-tree. Each node contains:


  1. Multiple pointers to unused pages.
  2. The link to the next node.
  3. The total number of items in the list. This only applies to the head node.

|   node1   |     |   node2   |     |   node3   |
+-----------+     +-----------+     +-----------+
| total=xxx |     |           |     |           |
|  next=yyy | ==> |  next=qqq | ==> |  next=eee | ==> ...
|  size=zzz |     |  size=ppp |     |  size=rrr |
|  pointers |     |  pointers |     |  pointers |

The node format:

| type | size | total | next |  pointers |
|  2B  |  2B  |   8B  |  8B  | size * 8B |

const BNODE_FREE_LIST = 3
const FREE_LIST_HEADER = 4 + 8 + 8
const FREE_LIST_CAP = (BTREE_PAGE_SIZE - FREE_LIST_HEADER) / 8
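
As a quick worked example of these constants, the sketch below assumes 4096-byte pages (the page size is not stated in this chapter, so treat that value as an assumption). The nodesNeeded helper is hypothetical and only illustrates how the capacity translates into a node count.

```go
package main

// Assumption: 4096-byte pages; the page size is not stated in this chapter.
const BTREE_PAGE_SIZE = 4096

// type (2B) + size (2B) + total (8B) + next (8B)
const FREE_LIST_HEADER = 4 + 8 + 8

// (4096 - 20) / 8 = 509 pointers per node with integer division.
const FREE_LIST_CAP = (BTREE_PAGE_SIZE - FREE_LIST_HEADER) / 8

// nodesNeeded is a hypothetical helper: the number of list nodes
// required to hold n pointers (rounding up).
func nodesNeeded(n int) int {
	return (n + FREE_LIST_CAP - 1) / FREE_LIST_CAP
}
```

Under that assumption, one node holds 509 pointers, so a list node is only consumed for roughly every 509 freed pages.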

Functions for accessing the list node:

func flnSize(node BNode) int
func flnNext(node BNode) uint64
func flnPtr(node BNode, idx int) uint64
func flnSetPtr(node BNode, idx int, ptr uint64)
func flnSetHeader(node BNode, size uint16, next uint64)
func flnSetTotal(node BNode, total uint64)
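
The bodies of these accessors are not shown here. One possible sketch follows directly from the node format above; it assumes little-endian integers, a BNode that wraps a byte slice in a field named data, and that flnSetHeader also writes the node type. All three are assumptions rather than details confirmed by this chapter.

```go
package main

import "encoding/binary"

// BNode is assumed to wrap a page-sized byte slice.
type BNode struct {
	data []byte
}

const BNODE_FREE_LIST = 3

// Layout: | type 2B | size 2B | total 8B | next 8B | pointers size*8B |
const FREE_LIST_HEADER = 4 + 8 + 8

func flnSize(node BNode) int {
	return int(binary.LittleEndian.Uint16(node.data[2:4]))
}

func flnNext(node BNode) uint64 {
	return binary.LittleEndian.Uint64(node.data[12:20])
}

func flnPtr(node BNode, idx int) uint64 {
	offset := FREE_LIST_HEADER + 8*idx
	return binary.LittleEndian.Uint64(node.data[offset:])
}

func flnSetPtr(node BNode, idx int, ptr uint64) {
	offset := FREE_LIST_HEADER + 8*idx
	binary.LittleEndian.PutUint64(node.data[offset:], ptr)
}

func flnSetHeader(node BNode, size uint16, next uint64) {
	// Assumption: the header setter also stamps the node type.
	binary.LittleEndian.PutUint16(node.data[0:2], BNODE_FREE_LIST)
	binary.LittleEndian.PutUint16(node.data[2:4], size)
	binary.LittleEndian.PutUint64(node.data[12:20], next)
}

func flnSetTotal(node BNode, total uint64) {
	binary.LittleEndian.PutUint64(node.data[4:12], total)
}
```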

7.2 The Free List Datatype


The FreeList type consists of the pointer to the head node and callbacks for managing disk pages.

type FreeList struct {
    head uint64
    // callbacks for managing on-disk pages
    get func(uint64) BNode  // dereference a pointer
    new func(BNode) uint64  // append a new page
    use func(uint64, BNode) // reuse a page
}

These callbacks differ from the B-tree's because the pages used by the list are managed by the list itself.

  • The new callback is only for appending new pages since the free list must reuse pages from itself.
  • There is no del callback because the free list adds unused pages to itself.
  • The use callback registers a pending update to a reused page.

For comparison, the B-tree type and its callbacks:

type BTree struct {
    // pointer (a nonzero page number)
    root uint64
    // callbacks for managing on-disk pages
    get func(uint64) BNode // dereference a pointer
    new func(BNode) uint64 // allocate a new page
    del func(uint64)       // deallocate a page
}

7.3 The Free List Implementation


Getting the nth item from the list is just a simple list traversal.

func (fl *FreeList) Get(topn int) uint64 {
    assert(0 <= topn && topn < fl.Total())
    node := fl.get(fl.head)
    for flnSize(node) <= topn {
        topn -= flnSize(node)
        next := flnNext(node)
        assert(next != 0)
        node = fl.get(next)
    }
    return flnPtr(node, flnSize(node)-topn-1)
}
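
The indexing in Get is easier to follow on a small example. The sketch below replays the same traversal over hypothetical in-memory nodes (memNode is not part of the database code): whole nodes are skipped while topn lies beyond them, and the final index counts back from the end of the node, where the top of the stack lives.

```go
package main

// memNode is a hypothetical in-memory stand-in for a free list node.
type memNode struct {
	ptrs []uint64
	next *memNode
}

// get mirrors the traversal in FreeList.Get.
// The caller must ensure that topn is less than the total number of items.
func get(head *memNode, topn int) uint64 {
	node := head
	for len(node.ptrs) <= topn {
		topn -= len(node.ptrs) // skip this node entirely
		node = node.next
	}
	return node.ptrs[len(node.ptrs)-topn-1]
}
```

With a head node holding [4, 5] and a second node holding [1, 2, 3], item 0 is 5, item 1 is 4, and item 2 is 3, the last pointer of the second node.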

Updating the list is tricky. It first removes popn items from the list and then adds the freed items to the list. The work can be divided into 3 phases:

  1. If the head node holds no more than popn items (popn >= size), remove the whole node. The page of the node itself will be added to the list later. Repeat this step until it is no longer possible.
  2. We may need to remove some items from the list and possibly add some new items to the list. Updating the list head requires new pages, and new pages should be reused from the items of the list itself. Pop some items from the list one by one until there are enough pages to reuse for the next phase.
  3. Modify the list by adding new nodes.

// remove `popn` pointers and add some new pointers
func (fl *FreeList) Update(popn int, freed []uint64) {
    assert(popn <= fl.Total())
    if popn == 0 && len(freed) == 0 {
        return // nothing to do
    }

    // prepare to construct the new list
    total := fl.Total()
    reuse := []uint64{}
    for fl.head != 0 && len(reuse)*FREE_LIST_CAP < len(freed) {
        node := fl.get(fl.head)
        freed = append(freed, fl.head) // recycle the node itself
        if popn >= flnSize(node) {
            // phase 1
            // remove all pointers in this node
            popn -= flnSize(node)
        } else {
            // phase 2:
            // remove some pointers
            remain := flnSize(node) - popn
            popn = 0
            // reuse pointers from the free list itself
            for remain > 0 && len(reuse)*FREE_LIST_CAP < len(freed)+remain {
                remain--
                reuse = append(reuse, flnPtr(node, remain))
            }
            // move the node into the `freed` list
            for i := 0; i < remain; i++ {
                freed = append(freed, flnPtr(node, i))
            }
        }
        // discard the node and move to the next node
        total -= flnSize(node)
        fl.head = flnNext(node)
    }
    assert(len(reuse)*FREE_LIST_CAP >= len(freed) || fl.head == 0)

    // phase 3: prepend new nodes
    flPush(fl, freed, reuse)

    // done
    flnSetTotal(fl.get(fl.head), uint64(total+len(freed)))
}

func flPush(fl *FreeList, freed []uint64, reuse []uint64) {
    for len(freed) > 0 {
        new := BNode{make([]byte, BTREE_PAGE_SIZE)}

        // construct a new node
        size := len(freed)
        if size > FREE_LIST_CAP {
            size = FREE_LIST_CAP
        }
        flnSetHeader(new, uint16(size), fl.head)
        for i, ptr := range freed[:size] {
            flnSetPtr(new, i, ptr)
        }
        freed = freed[size:]

        if len(reuse) > 0 {
            // reuse a pointer from the list
            fl.head, reuse = reuse[0], reuse[1:]
            fl.use(fl.head, new)
        } else {
            // or append a page to house the new node
            fl.head = fl.new(new)
        }
    }
    assert(len(reuse) == 0)
}

7.4 Manage Disk Pages


Step 1: Modify the Data Structure


The KV data structure is modified to track pending page updates. Temporary pages are kept in a map keyed by their assigned page numbers. Deallocated page numbers are stored in the same map with a nil value.

type KV struct {
    // omitted...
    page struct {
        flushed uint64 // database size in number of pages
        nfree   int    // number of pages taken from the free list
        nappend int    // number of pages to be appended
        // newly allocated or deallocated pages keyed by the pointer.
        // nil value denotes a deallocated page.
        updates map[uint64][]byte
    }
}

Step 2: Page Management for B-Tree


The pageGet function is modified to also return temporary pages because the free list code depends on this behavior.

// callback for BTree & FreeList, dereference a pointer.
func (db *KV) pageGet(ptr uint64) BNode {
    if page, ok := db.page.updates[ptr]; ok {
        assert(page != nil)
        return BNode{page} // for new pages
    }
    return pageGetMapped(db, ptr) // for written pages
}

func pageGetMapped(db *KV, ptr uint64) BNode {
    start := uint64(0)
    for _, chunk := range db.mmap.chunks {
        end := start + uint64(len(chunk))/BTREE_PAGE_SIZE
        if ptr < end {
            offset := BTREE_PAGE_SIZE * (ptr - start)
            return BNode{chunk[offset : offset+BTREE_PAGE_SIZE]}
        }
        start = end
    }
    panic("bad ptr")
}

The function for allocating a B-tree page is changed to reuse pages from the free list first.

// callback for BTree, allocate a new page.
func (db *KV) pageNew(node BNode) uint64 {
    assert(len(node.data) <= BTREE_PAGE_SIZE)
    ptr := uint64(0)
    if db.page.nfree < db.free.Total() {
        // reuse a deallocated page
        ptr = db.free.Get(db.page.nfree)
        db.page.nfree++
    } else {
        // append a new page
        ptr = db.page.flushed + uint64(db.page.nappend)
        db.page.nappend++
    }
    db.page.updates[ptr] = node.data
    return ptr
}
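
The allocation order can be checked with a small model. The allocator type below is hypothetical: it replaces db.free with a plain slice whose index 0 is the top of the list, and it returns page numbers without storing any page data.

```go
package main

// allocator is a hypothetical model of the counters kept in KV.page.
type allocator struct {
	flushed  uint64   // pages already in the file
	nfree    int      // pages taken from the free list so far
	nappend  int      // pages to be appended so far
	freeList []uint64 // stand-in for db.free; index 0 is the top
}

// alloc mirrors pageNew: take from the free list first, then append.
func (a *allocator) alloc() uint64 {
	if a.nfree < len(a.freeList) {
		ptr := a.freeList[a.nfree]
		a.nfree++
		return ptr
	}
	ptr := a.flushed + uint64(a.nappend)
	a.nappend++
	return ptr
}
```

With 10 flushed pages and a free list of [7, 3], four allocations yield 7, 3, 10, and 11. Note that nothing is removed from the list at this point; nfree only records how many items to pop when the list is updated later.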

Removed pages are marked for the free list update later.

// callback for BTree, deallocate a page.
func (db *KV) pageDel(ptr uint64) {
    db.page.updates[ptr] = nil
}

Step 3: Page Management for the Free List


Callbacks for appending a new page and reusing a page for the free list:

// callback for FreeList, allocate a new page.
func (db *KV) pageAppend(node BNode) uint64 {
    assert(len(node.data) <= BTREE_PAGE_SIZE)
    ptr := db.page.flushed + uint64(db.page.nappend)
    db.page.nappend++
    db.page.updates[ptr] = node.data
    return ptr
}

// callback for FreeList, reuse a page.
func (db *KV) pageUse(ptr uint64, node BNode) {
    db.page.updates[ptr] = node.data
}

Step 4: Update the Free List


Before extending the file and writing pages to disk, we must update the free list first since it also creates pending writes.

func writePages(db *KV) error {
    // update the free list
    freed := []uint64{}
    for ptr, page := range db.page.updates {
        if page == nil {
            freed = append(freed, ptr)
        }
    }
    db.free.Update(db.page.nfree, freed)

    // extend the file & mmap if needed
    // omitted...

    // copy pages to the file
    for ptr, page := range db.page.updates {
        if page != nil {
            copy(pageGetMapped(db, ptr).data, page)
        }
    }
    return nil
}
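
One detail worth knowing: Go does not specify the iteration order of maps, so the order of the freed slice built above can vary between runs. That does not affect correctness, but a reproducible on-disk layout can help with testing and debugging. A possible variation, shown here as an optional sketch with a hypothetical collectFreed helper, sorts the pointers before passing them to the free list.

```go
package main

import "sort"

// collectFreed returns the pointers marked as deallocated (nil value),
// sorted so that the result does not depend on map iteration order.
func collectFreed(updates map[uint64][]byte) []uint64 {
	freed := []uint64{}
	for ptr, page := range updates {
		if page == nil {
			freed = append(freed, ptr)
		}
	}
	sort.Slice(freed, func(i, j int) bool { return freed[i] < freed[j] })
	return freed
}
```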

The pointer to the list head is added to the master page:

| sig | btree_root | page_used | free_list |
| 16B |     8B     |     8B    |     8B    |
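
Reading and writing this layout could look like the sketch below. It assumes little-endian integers, and DB_SIG is a placeholder value since the real signature is not given in this chapter; the master struct and both function names are likewise hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// DB_SIG is a placeholder signature (exactly 16 bytes), not the real one.
const DB_SIG = "ExampleDBSig0001"

// master holds the three 8-byte fields that follow the signature.
type master struct {
	root     uint64 // btree_root
	used     uint64 // page_used
	freeList uint64 // free_list
}

// encodeMaster serializes the fields as | sig | btree_root | page_used | free_list |.
func encodeMaster(m master) []byte {
	var data [40]byte
	copy(data[:16], DB_SIG)
	binary.LittleEndian.PutUint64(data[16:24], m.root)
	binary.LittleEndian.PutUint64(data[24:32], m.used)
	binary.LittleEndian.PutUint64(data[32:40], m.freeList)
	return data[:]
}

// decodeMaster verifies the signature and reads the fields back.
func decodeMaster(data []byte) (master, error) {
	if len(data) < 40 || string(data[:16]) != DB_SIG {
		return master{}, errors.New("bad master page")
	}
	return master{
		root:     binary.LittleEndian.Uint64(data[16:24]),
		used:     binary.LittleEndian.Uint64(data[24:32]),
		freeList: binary.LittleEndian.Uint64(data[32:40]),
	}, nil
}
```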

Step 5: Done

The KV store is finished. It is persistent and crash resistant, although it can only be accessed sequentially.


There is more to learn in part II of the book:


  • Relational DB on the KV store.
  • Concurrent access to the database and transactions.