Writing an OS in Rust: Allocator Designs (Part 2)

Previous part: Writing an OS in Rust: Allocator Designs (Part 1)

4.1 Introduction

The idea behind a fixed-size block allocator is the following: Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, an allocation of 48 bytes a 64-byte block, and an allocation of 128 bytes a 512-byte block.

Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each size class. Each list then only stores blocks of a single size. For example, with block sizes of 16, 64, and 512, there would be three separate linked lists in memory:

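[Figure: three separate linked lists in memory, one each for the 16-, 64-, and 512-byte block sizes.]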

Instead of a single head pointer, we have the three head pointers head_16, head_64, and head_512 that each point to the first unused block of the corresponding size. All nodes in a single list have the same size. For example, the list started by the head_16 pointer only contains 16-byte blocks. This means that we no longer need to store the size in each list node since it is already specified by the name of the head pointer.

Since each element in a list has the same size, each list element is equally suitable for an allocation request. This means that we can very efficiently perform an allocation using the following steps:

  • Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example.

  • Retrieve the head pointer for the list, e.g., for block size 16, we need to use head_16.

  • Remove the first block from the list and return it.

Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator.
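The three steps can be modeled outside the kernel in a few lines. The sketch below is an illustration only: it assumes the example block sizes 16, 64, and 512, and it represents each free list as a `Vec<usize>` of block addresses instead of a linked list stored in the free memory itself. The names `SIZES`, `FreeLists`, and `size_class` are made up for this example.

```rust
/// Example block sizes from the text; one free list per size.
const SIZES: [usize; 3] = [16, 64, 512];

/// Toy model: each free list is a stack of block start addresses.
/// The end of each `Vec` plays the role of the list head.
/// (The real design threads the list through the free blocks themselves.)
pub struct FreeLists {
    pub lists: [Vec<usize>; 3],
}

/// Step 1: round up to the next block size, returned as an index into `SIZES`.
pub fn size_class(requested: usize) -> Option<usize> {
    SIZES.iter().position(|&size| size >= requested)
}

impl FreeLists {
    /// Steps 2 and 3: select the list for the size class and remove its head.
    /// No traversal happens, so the cost does not depend on the list length.
    pub fn alloc(&mut self, requested: usize) -> Option<usize> {
        let class = size_class(requested)?;
        self.lists[class].pop()
    }
}
```

In this model, a 12-byte request maps to size class 0 (16 bytes) and simply takes whatever block is currently at the head of that list.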

Block Sizes and Wasted Memory

Depending on the block sizes, we lose a lot of memory by rounding up. For example, when a 512-byte block is returned for a 128-byte allocation, three-quarters of the allocated memory is unused. By defining reasonable block sizes, it is possible to limit the amount of wasted memory to some degree. For example, when using the powers of 2 (4, 8, 16, 32, 64, 128, ...) as block sizes, we can limit the memory waste to half of the allocation size in the worst case and a quarter of the allocation size in the average case.

It is also common to optimize block sizes based on common allocation sizes in a program. For example, we could additionally add block size 24 to improve memory usage for programs that often perform allocations of 24 bytes. This way, the amount of wasted memory can often be reduced without losing the performance benefits.
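To make the effect of the chosen block sizes concrete, the following sketch computes how many bytes a single request loses to rounding. The helper name `wasted_bytes` is hypothetical, and it assumes the block sizes are passed in ascending order.

```rust
/// Bytes lost to rounding when `requested` bytes are served from the smallest
/// fitting entry of `block_sizes` (which must be sorted in ascending order).
/// Returns `None` if no block size is large enough.
pub fn wasted_bytes(requested: usize, block_sizes: &[usize]) -> Option<usize> {
    block_sizes
        .iter()
        .find(|&&size| size >= requested)
        .map(|&size| size - requested)
}
```

With the sizes 16, 64, and 512, a 128-byte request wastes 384 bytes, which is three-quarters of the 512-byte block. With the sizes 8, 16, and 32, a 24-byte request wastes 8 bytes; adding a 24-byte size class brings that down to 0.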

Deallocation

Much like allocation, deallocation is also very performant. It involves the following steps:

  • Round up the freed allocation size to the next block size. This is required since the compiler only passes the requested allocation size to dealloc, not the size of the block that was returned by alloc. By using the same size-adjustment function in both alloc and dealloc, we can make sure that we always free the correct amount of memory.

  • Retrieve the head pointer for the list.

  • Add the freed block to the front of the list by updating the head pointer.

Most notably, no traversal of the list is required for deallocation either. This means that the time required for a dealloc call stays the same regardless of the list length.
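The constant-time behavior follows from only ever touching the head of the list. Below is a simplified model of one such list, written with safe Rust and `Box` so it can run outside a kernel. The types `Node` and `FreeList` are illustrative assumptions; the real allocator writes the node into the freed block instead of allocating it separately.

```rust
/// One node per free block. In the real allocator the node lives inside the
/// freed block; here it is a heap-allocated `Box` that records the address.
struct Node {
    block_addr: usize,
    next: Option<Box<Node>>,
}

/// A singly linked free list for one block size.
pub struct FreeList {
    head: Option<Box<Node>>,
}

impl FreeList {
    pub fn new() -> Self {
        FreeList { head: None }
    }

    /// Deallocation: the freed block becomes the new head.
    pub fn push_front(&mut self, block_addr: usize) {
        let old_head = self.head.take();
        self.head = Some(Box::new(Node { block_addr, next: old_head }));
    }

    /// Allocation: unlink the head and hand out its block.
    pub fn pop_front(&mut self) -> Option<usize> {
        let node = self.head.take()?;
        self.head = node.next;
        Some(node.block_addr)
    }
}
```

Both operations update a single pointer, so the most recently freed block is also the first one to be reused.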

Fallback Allocator

Given that large allocations (>2 KB) are often rare, especially in operating system kernels, it might make sense to fall back to a different allocator for these allocations. For example, we could fall back to a linked list allocator for allocations greater than 2048 bytes in order to reduce memory waste. Since only very few allocations of that size are expected, the linked list would stay small and the (de)allocations would still be reasonably fast.

Creating new Blocks

Above, we assumed that there are always enough blocks of a specific size in the list to fulfill all allocation requests. However, at some point, the linked list for a given block size becomes empty. At this point, there are two ways we can create new unused blocks of a specific size to fulfill an allocation request:

  • Allocate a new block from the fallback allocator (if there is one).

  • Split a larger block from a different list. This works best if block sizes are powers of two. For example, a 32-byte block can be split into two 16-byte blocks.

For our implementation, we will allocate new blocks from the fallback allocator since the implementation is much simpler.
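Although the implementation in this post does not split blocks, the second option can be sketched briefly for comparison. The function `split_block` is hypothetical and works on plain addresses; it assumes power-of-2 block sizes.

```rust
/// Splits a free block of `size` bytes starting at `addr` into two halves.
/// Returns the start addresses of both halves, or `None` if `size` is not
/// a power of 2 that can be halved.
pub fn split_block(addr: usize, size: usize) -> Option<(usize, usize)> {
    if !size.is_power_of_two() || size < 2 {
        return None;
    }
    let half = size / 2;
    // If the original block is aligned to `size`, both halves are aligned to `half`.
    Some((addr, addr + half))
}
```

Splitting a 32-byte block at address 0x1000 would yield two 16-byte blocks at 0x1000 and 0x1010. One half can satisfy the request while the other is pushed onto the 16-byte list.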

4.2 Implementation

Now that we know how a fixed-size block allocator works, we can start our implementation. We won't depend on the implementation of the linked list allocator created in the previous section, so you can follow this part even if you skipped the linked list allocator implementation.

List Node

We start our implementation by creating a ListNode type in a new allocator::fixed_size_block module:

```rust
// in src/allocator.rs

pub mod fixed_size_block;
```

```rust
// in src/allocator/fixed_size_block.rs

struct ListNode {
    next: Option<&'static mut ListNode>,
}
```

This type is similar to the ListNode type of our linked list allocator implementation, with the difference that we don't have a size field. It isn't needed because every block in a list has the same size with the fixed-size block allocator design.

Block Sizes

Next, we define a constant BLOCK_SIZES slice with the block sizes used for our implementation:

```rust
// in src/allocator/fixed_size_block.rs

/// The block sizes to use.
///
/// The sizes must each be a power of 2 because they are also used as
/// the block alignment (alignments must always be powers of 2).
const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];
```

As block sizes, we use powers of 2, starting from 8 up to 2048. We don't define any block sizes smaller than 8 because each block must be capable of storing a 64-bit pointer to the next block when freed. For allocations greater than 2048 bytes, we will fall back to a linked list allocator.

To simplify the implementation, we define the size of a block as its required alignment in memory. So a 16-byte block is always aligned on a 16-byte boundary and a 512-byte block is aligned on a 512-byte boundary. Since alignments always need to be powers of 2, this rules out any other block sizes. If we need block sizes that are not powers of 2 in the future, we can still adjust our implementation for this (e.g., by defining a second BLOCK_ALIGNMENTS array).
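The power-of-2 requirement can be checked mechanically. The sketch below is one possible way to do so and is not part of the original implementation: a constant block makes compilation fail if a size is not a power of 2, and the hypothetical helper `block_layout` builds a layout whose alignment equals its size. It uses `std::alloc::Layout` so it runs on a host; in the kernel the same type comes from `alloc::alloc`.

```rust
use std::alloc::Layout;

const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];

// Compile-time check: the build fails if any block size is not a power of 2.
const _: () = {
    let mut i = 0;
    while i < BLOCK_SIZES.len() {
        assert!(BLOCK_SIZES[i].is_power_of_two());
        i += 1;
    }
};

/// Builds the layout of a block whose alignment equals its size.
/// Returns `None` if `block_size` is not a valid alignment.
pub fn block_layout(block_size: usize) -> Option<Layout> {
    Layout::from_size_align(block_size, block_size).ok()
}
```

A size such as 24 is rejected by `Layout::from_size_align` because 24 is not a valid alignment, which is why a separate alignment table would be needed for such sizes.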

The Allocator Type

Using the ListNode type and the BLOCK_SIZES slice, we can now define our allocator type:

```rust
// in src/allocator/fixed_size_block.rs

pub struct FixedSizeBlockAllocator {
    list_heads: [Option<&'static mut ListNode>; BLOCK_SIZES.len()],
    fallback_allocator: linked_list_allocator::Heap,
}
```

The list_heads field is an array of head pointers, one for each block size. This is implemented by using the len() of the BLOCK_SIZES slice as the array length. As a fallback allocator for allocations larger than the largest block size, we use the allocator provided by the linked_list_allocator crate. We could also use the LinkedListAllocator we implemented ourselves instead, but it has the disadvantage that it does not merge freed blocks.

For constructing a FixedSizeBlockAllocator, we provide the same new and init functions that we implemented for the other allocator types too:

```rust
// in src/allocator/fixed_size_block.rs

impl FixedSizeBlockAllocator {
    /// Creates an empty FixedSizeBlockAllocator.
    pub const fn new() -> Self {
        const EMPTY: Option<&'static mut ListNode> = None;
        FixedSizeBlockAllocator {
            list_heads: [EMPTY; BLOCK_SIZES.len()],
            fallback_allocator: linked_list_allocator::Heap::empty(),
        }
    }

    /// Initialize the allocator with the given heap bounds.
    ///
    /// This function is unsafe because the caller must guarantee that the given
    /// heap bounds are valid and that the heap is unused. This method must be
    /// called only once.
    pub unsafe fn init(&mut self, heap_start: usize, heap_size: usize) {
        self.fallback_allocator.init(heap_start, heap_size);
    }
}
```

The new function just initializes the list_heads array with empty nodes and creates an empty linked list allocator as fallback_allocator. The EMPTY constant is needed to tell the Rust compiler that we want to initialize the array with a constant value. Initializing the array directly as [None; BLOCK_SIZES.len()] does not work, because then the compiler requires Option<&'static mut ListNode> to implement the Copy trait, which it does not. This is a current limitation of the Rust compiler, which might go away in the future.

If you haven't done so already for the LinkedListAllocator implementation, you also need to add #![feature(const_mut_refs)] to the top of your lib.rs. The reason is that, at the time of writing, any use of mutable reference types in const functions is unstable, including the Option<&'static mut ListNode> array element type of the list_heads field (even if we set it to None). This restriction was lifted in Rust 1.83, so the attribute should only be necessary on older toolchains.

The unsafe init function only calls the init function of the fallback_allocator without doing any additional initialization of the list_heads array. Instead, we will initialize the lists lazily on alloc and dealloc calls.

For convenience, we also create a private fallback_alloc method that allocates using the fallback_allocator:

```rust
// in src/allocator/fixed_size_block.rs

use alloc::alloc::Layout;
use core::ptr;

impl FixedSizeBlockAllocator {
    /// Allocates using the fallback allocator.
    fn fallback_alloc(&mut self, layout: Layout) -> *mut u8 {
        match self.fallback_allocator.allocate_first_fit(layout) {
            Ok(ptr) => ptr.as_ptr(),
            Err(_) => ptr::null_mut(),
        }
    }
}
```

The Heap type of the linked_list_allocator crate does not implement GlobalAlloc (as it's not possible without locking). Instead, it provides an allocate_first_fit method that has a slightly different interface. Instead of returning a *mut u8 and using a null pointer to signal an error, it returns a Result<NonNull<u8>, ()>. The NonNull type is an abstraction for a raw pointer that is guaranteed to not be a null pointer. By mapping the Ok case to the NonNull::as_ptr method and the Err case to a null pointer, we can easily translate this back to a *mut u8 type.
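The same translation can also be written with a combinator instead of a match. The following standalone sketch uses a hypothetical helper named `into_raw` and only types from the standard library, so it does not depend on the linked_list_allocator crate; it is an equivalent formulation, not the code used in this post.

```rust
use std::ptr::{self, NonNull};

/// Converts the `Result` style used by `allocate_first_fit` into the
/// null-on-failure convention that `GlobalAlloc::alloc` expects.
pub fn into_raw(result: Result<NonNull<u8>, ()>) -> *mut u8 {
    // `Ok(ptr)` becomes the raw pointer, `Err(())` becomes a null pointer.
    result.map_or(ptr::null_mut(), NonNull::as_ptr)
}
```

Which form to use is a matter of taste; the explicit match in fallback_alloc makes the two cases easier to see.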

Calculating the List Index

Before we implement the GlobalAlloc trait, we define a list_index helper function that finds the smallest block size that fits a given Layout and returns its index:

```rust
// in src/allocator/fixed_size_block.rs

/// Choose an appropriate block size for the given layout.
///
/// Returns an index into the `BLOCK_SIZES` array.
fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}
```

The block must have at least the size and alignment required by the given Layout. Since we defined that the block size is also its alignment, this means that the required_block_size is the maximum of the layout's size() and align() attributes. To find the next-larger block in the BLOCK_SIZES slice, we first use the iter() method to get an iterator and then the position() method to find the index of the first block that is at least as large as the required_block_size.

Note that we don't return the block size itself, but the index into the BLOCK_SIZES slice. The reason is that we want to use the returned index as an index into the list_heads array.
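A few worked inputs show how size and alignment interact. The sketch below repeats list_index so that it is self-contained and adds a hypothetical wrapper, `chosen_block_size`, that maps the index back to a block size for readability. It uses `std::alloc::Layout` so it can run on a host.

```rust
use std::alloc::Layout; // `alloc::alloc::Layout` in the kernel

const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];

fn list_index(layout: &Layout) -> Option<usize> {
    let required_block_size = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required_block_size)
}

/// Convenience wrapper for the examples: the block size chosen for a
/// layout with the given size and alignment, or `None` for the fallback.
pub fn chosen_block_size(size: usize, align: usize) -> Option<usize> {
    let layout = Layout::from_size_align(size, align).ok()?;
    list_index(&layout).map(|index| BLOCK_SIZES[index])
}
```

A 12-byte layout with 4-byte alignment lands in the 16-byte list. A 4-byte layout with 64-byte alignment needs the 64-byte list because the alignment dominates. Layouts of 2049 bytes, or with an alignment of 4096, match no block size and go to the fallback allocator.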

Implementing GlobalAlloc

The last step is to implement the GlobalAlloc trait:

```rust
// in src/allocator/fixed_size_block.rs

use super::Locked;
use alloc::alloc::GlobalAlloc;

unsafe impl GlobalAlloc for Locked<FixedSizeBlockAllocator> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        todo!();
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        todo!();
    }
}
```

As with the other allocators, we don't implement the GlobalAlloc trait directly for our allocator type, but use the Locked wrapper to add synchronized interior mutability. Since the alloc and dealloc implementations are relatively large, we introduce them one by one below.

Implementing alloc

The implementation of the alloc method looks like this:

```rust
// in `impl` block in src/allocator/fixed_size_block.rs

unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            match allocator.list_heads[index].take() {
                Some(node) => {
                    allocator.list_heads[index] = node.next.take();
                    node as *mut ListNode as *mut u8
                }
                None => {
                    // no block exists in list => allocate new block
                    let block_size = BLOCK_SIZES[index];
                    // only works if all block sizes are a power of 2
                    let block_align = block_size;
                    let layout = Layout::from_size_align(block_size, block_align)
                        .unwrap();
                    allocator.fallback_alloc(layout)
                }
            }
        }
        None => allocator.fallback_alloc(layout),
    }
}
```

Let's go through it step by step:

First, we use the Locked::lock method to get a mutable reference to the wrapped allocator instance. Next, we call the list_index function we just defined to calculate the appropriate block size for the given layout and get the corresponding index into the list_heads array. If this index is None, no block size fits the allocation, so we allocate from the fallback_allocator through the fallback_alloc function.

If the list index is Some, we try to remove the first node in the corresponding list started by list_heads[index] using the Option::take method. If the list is not empty, we enter the Some(node) branch of the match statement, where we point the head pointer of the list to the successor of the popped node (by using take again). Finally, we return the popped node pointer as a *mut u8.

If the list head is None, it indicates that the list of blocks is empty. This means that we need to construct a new block as described above. For that, we first get the current block size from the BLOCK_SIZES slice and use it as both the size and the alignment for the new block. Then we create a new Layout from it and call the fallback_alloc method to perform the allocation. The reason for adjusting the size and alignment is that the block will be added to the block list on deallocation.

Implementing dealloc

The implementation of the dealloc method looks like this:

```rust
// in src/allocator/fixed_size_block.rs

use core::{mem, ptr::NonNull};

// inside the `unsafe impl GlobalAlloc` block

unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
    let mut allocator = self.lock();
    match list_index(&layout) {
        Some(index) => {
            let new_node = ListNode {
                next: allocator.list_heads[index].take(),
            };
            // verify that block has size and alignment required for storing node
            assert!(mem::size_of::<ListNode>() <= BLOCK_SIZES[index]);
            assert!(mem::align_of::<ListNode>() <= BLOCK_SIZES[index]);
            let new_node_ptr = ptr as *mut ListNode;
            new_node_ptr.write(new_node);
            allocator.list_heads[index] = Some(&mut *new_node_ptr);
        }
        None => {
            let ptr = NonNull::new(ptr).unwrap();
            allocator.fallback_allocator.deallocate(ptr, layout);
        }
    }
}
```

As in alloc, we first use the lock method to get a mutable allocator reference and then the list_index function to get the block list corresponding to the given Layout. If the index is None, no fitting block size exists in BLOCK_SIZES, which indicates that the allocation was created by the fallback allocator. Therefore, we use its deallocate method to free the memory again. The method expects a NonNull instead of a *mut u8, so we need to convert the pointer first. (The unwrap call only fails when the pointer is null, which should never happen when the compiler calls dealloc.)

If list_index returns a block index, we need to add the freed memory block to the list. For that, we first create a new ListNode that points to the current list head (by using Option::take again). Before we write the new node into the freed memory block, we first assert that the current block size specified by index has the required size and alignment for storing a ListNode. Then we perform the write by converting the given *mut u8 pointer to a *mut ListNode pointer and then calling the unsafe write method on it. The last step is to set the head pointer of the list, which is currently None since we called take on it, to our newly written ListNode. For that, we convert the raw new_node_ptr to a mutable reference.
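To watch alloc and dealloc cooperate without booting a kernel, the same logic can be modeled on a host system. The sketch below is an approximation under several assumptions: the type `HostedBlockAllocator` is made up for this example, it uses raw pointers instead of `&'static mut` references, it has no locking, and it substitutes `std::alloc::System` for the linked_list_allocator fallback.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::ptr;

const BLOCK_SIZES: &[usize] = &[8, 16, 32, 64, 128, 256, 512, 1024, 2048];

struct ListNode {
    next: *mut ListNode,
}

/// Hosted model: raw-pointer free lists, `System` as the fallback allocator.
pub struct HostedBlockAllocator {
    list_heads: [*mut ListNode; BLOCK_SIZES.len()],
}

fn list_index(layout: &Layout) -> Option<usize> {
    let required = layout.size().max(layout.align());
    BLOCK_SIZES.iter().position(|&s| s >= required)
}

impl HostedBlockAllocator {
    pub fn new() -> Self {
        HostedBlockAllocator { list_heads: [ptr::null_mut(); BLOCK_SIZES.len()] }
    }

    /// Safety: same contract as `GlobalAlloc::alloc`.
    pub unsafe fn alloc(&mut self, layout: Layout) -> *mut u8 {
        match list_index(&layout) {
            Some(index) => {
                let head = self.list_heads[index];
                if !head.is_null() {
                    // Pop the first block: the head moves to its successor.
                    self.list_heads[index] = unsafe { (*head).next };
                    head as *mut u8
                } else {
                    // Empty list: create a new block with size == alignment.
                    let size = BLOCK_SIZES[index];
                    let block = Layout::from_size_align(size, size).unwrap();
                    unsafe { System.alloc(block) }
                }
            }
            None => unsafe { System.alloc(layout) },
        }
    }

    /// Safety: `ptr` must come from `alloc` on this allocator with the same `layout`.
    pub unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
        match list_index(&layout) {
            Some(index) => {
                // Write the node into the freed block and make it the new head.
                let node = ptr as *mut ListNode;
                unsafe { node.write(ListNode { next: self.list_heads[index] }) };
                self.list_heads[index] = node;
            }
            None => unsafe { System.dealloc(ptr, layout) },
        }
    }
}
```

In this model, freeing a 12-byte allocation and then requesting 10 bytes returns the same 16-byte block, because both layouts map to the same list. Blocks kept in the lists are never handed back to `System`.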

There are a few things worth noting:

  • We don't differentiate between blocks allocated from a block list and blocks allocated from the fallback allocator. This means that new blocks created in alloc are added to the block list on dealloc, thereby increasing the number of blocks of that size.

  • The alloc method is the only place where new blocks are created in our implementation. This means that we initially start with empty block lists and only fill these lists lazily when allocations of their block size are performed.

  • We don't need unsafe blocks in alloc and dealloc, even though we perform some unsafe operations. The reason is that Rust currently treats the complete body of unsafe functions as one large unsafe block. Since using explicit unsafe blocks has the advantage that it's obvious which operations are unsafe and which are not, there is a proposed RFC to change this behavior.

4.3 Using it

To use our new FixedSizeBlockAllocator, we need to update the ALLOCATOR static in the allocator module:

```rust
// in src/allocator.rs

use fixed_size_block::FixedSizeBlockAllocator;

#[global_allocator]
static ALLOCATOR: Locked<FixedSizeBlockAllocator> = Locked::new(
    FixedSizeBlockAllocator::new());
```

Since the init function behaves the same for all allocators we implemented, we don't need to modify the init call in init_heap.

When we now run our heap_allocation tests again, all tests should still pass:

```
> cargo test --test heap_allocation
simple_allocation... [ok]
large_vec... [ok]
many_boxes... [ok]
many_boxes_long_lived... [ok]
```

Our new allocator seems to work!

4.4 Discussion

While the fixed-size block approach has much better performance than the linked list approach, it wastes up to half of the memory when using powers of 2 as block sizes. Whether this tradeoff is worth it heavily depends on the application type. For an operating system kernel, where performance is critical, the fixed-size block approach seems to be the better choice.

On the implementation side, there are various things that we could improve in our current implementation:

  • Instead of only allocating blocks lazily using the fallback allocator, it might be better to pre-fill the lists to improve the performance of initial allocations.

  • To simplify the implementation, we only allowed block sizes that are powers of 2 so that we could also use them as the block alignment. By storing (or calculating) the alignment in a different way, we could also allow arbitrary other block sizes. This way, we could add more block sizes, e.g., for common allocation sizes, in order to minimize the wasted memory.

  • We currently only create new blocks, but never free them again. This results in fragmentation and might eventually result in allocation failure for large allocations. It might make sense to enforce a maximum list length for each block size. When the maximum length is reached, subsequent deallocations are freed using the fallback allocator instead of being added to the list.

  • Instead of falling back to a linked list allocator, we could have a special allocator for allocations greater than 4 KiB. The idea is to utilize paging, which operates on 4 KiB pages, to map a contiguous block of virtual memory to non-contiguous physical frames. This way, fragmentation of unused memory is no longer a problem for large allocations.

  • With such a page allocator, it might make sense to add block sizes up to 4 KiB and drop the linked list allocator completely. The main advantages of this would be reduced fragmentation and improved performance predictability, i.e., better worst-case performance.

It's important to note that the implementation improvements outlined above are only suggestions. Allocators used in operating system kernels are typically highly optimized for the specific workload of the kernel, which is only possible through extensive profiling.
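As an illustration of one of these suggestions, the maximum list length could be modeled as follows. This is only a sketch under simplifying assumptions: `BoundedFreeList` and `Released` are hypothetical names, and a `Vec<usize>` of block addresses stands in for the linked list stored in the free blocks.

```rust
/// What happened to a block passed to `BoundedFreeList::release`.
#[derive(Debug, PartialEq)]
pub enum Released {
    /// The block was cached in the free list for reuse.
    Cached,
    /// The list is full; the caller should return the block to the fallback allocator.
    ToFallback(usize),
}

/// A free list for one block size that holds at most `max_len` blocks.
pub struct BoundedFreeList {
    blocks: Vec<usize>, // stand-in for the linked list of free blocks
    max_len: usize,
}

impl BoundedFreeList {
    pub fn new(max_len: usize) -> Self {
        BoundedFreeList { blocks: Vec::new(), max_len }
    }

    /// Called on deallocation: cache the block unless the list is already full.
    pub fn release(&mut self, block_addr: usize) -> Released {
        if self.blocks.len() < self.max_len {
            self.blocks.push(block_addr);
            Released::Cached
        } else {
            Released::ToFallback(block_addr)
        }
    }

    /// Called on allocation: reuse a cached block if one is available.
    pub fn take(&mut self) -> Option<usize> {
        self.blocks.pop()
    }
}
```

In the real allocator, a block handed back to the fallback allocator would have to be freed with the block-sized layout (size and alignment equal to the block size), since that is the layout it was allocated with.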

4.5 Variations

There are also many variations of the fixed-size block allocator design. Two popular examples are the slab allocator and the buddy allocator, which are also used in popular kernels such as Linux. In the following, we give a short introduction to these two designs.

4.5.1 Slab Allocator

The idea behind a slab allocator is to use block sizes that directly correspond to selected types in the kernel. This way, allocations of those types fit a block size exactly and no memory is wasted. Sometimes, it might even be possible to preinitialize type instances in unused blocks to further improve performance.

Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an object pool pattern on top of a single large allocation.
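The object pool pattern mentioned above can be sketched in safe Rust. The `Pool` type below is a hypothetical, heavily simplified model: it reserves all slots up front and tracks free slots by index. A real slab allocator would work on raw memory instead of `Option<T>` slots.

```rust
/// A minimal typed object pool: every slot holds one `T`.
/// (`Option<T>` may add a tag to each slot; a real slab uses raw memory.)
pub struct Pool<T> {
    slots: Vec<Option<T>>, // one large up-front allocation
    free: Vec<usize>,      // indices of unused slots
}

impl<T> Pool<T> {
    pub fn with_capacity(capacity: usize) -> Self {
        Pool {
            slots: (0..capacity).map(|_| None).collect(),
            // Reversed so that slot 0 is handed out first.
            free: (0..capacity).rev().collect(),
        }
    }

    /// Stores `value` in a free slot and returns the slot index,
    /// or gives the value back if the pool is full.
    pub fn insert(&mut self, value: T) -> Result<usize, T> {
        match self.free.pop() {
            Some(index) => {
                self.slots[index] = Some(value);
                Ok(index)
            }
            None => Err(value),
        }
    }

    /// Frees a slot and returns the object that was stored in it.
    pub fn remove(&mut self, index: usize) -> Option<T> {
        let value = self.slots.get_mut(index)?.take()?;
        self.free.push(index);
        Some(value)
    }
}
```

Because every slot has the same type, a freed slot can always be reused for the next object of that type.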

4.5.2 Buddy Allocator

Instead of using a linked list to manage freed blocks, the buddy allocator design uses a binary tree data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger sized block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, its neighbor block in the tree is analyzed. If the neighbor is also free, the two blocks are joined back together to form a block of twice the size.

The advantage of this merge process is that external fragmentation is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to internal fragmentation. For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks.
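Finding the neighbor block is cheap when block sizes are powers of 2. The text above does not say how implementations locate it; a commonly used approach, shown here as an assumption, is to flip a single bit of the block's offset. The helper names `buddy_of` and `merged_offset` are hypothetical, offsets are relative to the start of the managed region, and each offset is assumed to be a multiple of the block size.

```rust
/// Offset of the buddy of the block at `offset`. Both offsets are relative
/// to the start of the managed region, and `size` is a power of 2.
pub fn buddy_of(offset: usize, size: usize) -> usize {
    // Flipping the bit that corresponds to `size` switches between the two halves.
    offset ^ size
}

/// Offset of the double-sized block formed by merging a block with its buddy.
pub fn merged_offset(offset: usize, size: usize) -> usize {
    // Clearing that bit yields the start of the lower half.
    offset & !size
}
```

For 32-byte blocks, the blocks at offsets 64 and 96 are buddies, and merging them yields a 64-byte block at offset 64.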

5. Summary

This post gave an overview of different allocator designs. We learned how to implement a basic bump allocator, which hands out memory linearly by increasing a single next pointer. While bump allocation is very fast, it can only reuse memory after all allocations have been freed. For this reason, it is rarely used as a global allocator.

Next, we created a linked list allocator that uses the freed memory blocks themselves to create a linked list, the so-called free list. This list makes it possible to store an arbitrary number of freed blocks of different sizes. While no memory waste occurs, the approach suffers from poor performance because an allocation request might require a complete traversal of the list. Our implementation also suffers from external fragmentation because it does not merge adjacent freed blocks back together.

To fix the performance problems of the linked list approach, we created a fixed-size block allocator that predefines a fixed set of block sizes. For each block size, a separate free list exists so that allocations and deallocations only need to insert/pop at the front of the list and are thus very fast. Since each allocation is rounded up to the next larger block size, some memory is wasted due to internal fragmentation.

There are many more allocator designs with different tradeoffs. Slab allocation works well to optimize the allocation of common fixed-size structures, but is not applicable in all situations. Buddy allocation uses a binary tree to merge freed blocks back together, but wastes a large amount of memory because it only supports power-of-2 block sizes. It's also important to remember that each kernel implementation has a unique workload, so there is no "best" allocator design that fits all cases.

6. What's next?

With this post, we conclude our memory management implementation for now. Next, we will start exploring multitasking, starting with cooperative multitasking in the form of async/await. In subsequent posts, we will then explore threads, multiprocessing, and processes.
