Virtual memory, memory segmentation, paging, and zero-copy in CUDA programming

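[1 Virtual addresses](#1 Virtual addresses)

[2 Segmentation](#2 Segmentation)

[2.1 Why segmentation causes fragmentation](#2.1 Why segmentation causes fragmentation)

[3 Paging](#3 Paging)

[4 Zero-copy in CUDA programming](#4 Zero-copy in CUDA programming)

References: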

Sections 1 to 4 below are my own informal understanding, not official terminology.

1 Virtual addresses

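The addresses the CPU issues are virtual addresses; the MMU maps them to physical addresses. I already knew this, so I won't go into detail here.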

2 Segmentation

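With segmentation, the operating system splits the virtual memory a program needs into segments such as a code segment, a data segment, and a heap/stack segment, and then allocates one large contiguous block of physical memory for each segment. The program itself only ever uses virtual addresses. A virtual address consists of a segment number and an offset within that segment; the CPU looks up the segment number in the segment table to get the segment's base address, then adds the offset to obtain the physical address.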
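The segment-table lookup can be sketched in a few lines of C++. This is only an illustration under my own assumptions: `SegmentEntry`, its `limit` field, and `translateSegmented` are hypothetical names, and a real MMU does this in hardware.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry of a hypothetical segment table: where the segment starts in
// physical memory and how many bytes it spans.
struct SegmentEntry {
    std::uint64_t base;   // physical start address of the segment
    std::uint64_t limit;  // segment length in bytes
};

// Translate (segment number, offset) into a physical address.
// Returns false when the segment does not exist or the offset lies outside
// the segment; real hardware would raise a fault in that case.
bool translateSegmented(const std::vector<SegmentEntry>& table,
                        std::size_t segment, std::uint64_t offset,
                        std::uint64_t* physical) {
    if (!physical || segment >= table.size())
        return false;
    const SegmentEntry& entry = table[segment];
    if (offset >= entry.limit)
        return false;
    *physical = entry.base + offset;  // base address + offset
    return true;
}
```

For example, with a table whose segment 0 has base `0x1000` and length `0x400`, offset `0x10` translates to `0x1010`, while offset `0x400` is rejected because it falls outside the segment.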

2.1 Why segmentation causes fragmentation

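This is easy to picture. Say one program uses 100 MB and later frees it, and a while later another program asks for 200 MB. Because a segment must be contiguous, the freed 100 MB hole is too small for it and may end up unusable, even when the free memory added up across all holes would be enough. That simple picture is all you need.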
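A small sketch makes the problem concrete. It assumes a simple first-fit search over a list of free holes; `Hole`, `totalFree`, and `firstFit` are names I made up for illustration, not how any particular OS does it.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A free region of physical memory (hypothetical bookkeeping structure).
struct Hole {
    std::uint64_t start;  // start address of the free region
    std::uint64_t size;   // length of the free region in bytes
};

// Sum of all free bytes, regardless of where they are.
std::uint64_t totalFree(const std::vector<Hole>& holes) {
    std::uint64_t sum = 0;
    for (const Hole& h : holes)
        sum += h.size;
    return sum;
}

// First-fit: index of the first hole that can hold `size` contiguous bytes,
// or -1 if no single hole is large enough.
int firstFit(const std::vector<Hole>& holes, std::uint64_t size) {
    for (std::size_t i = 0; i < holes.size(); ++i) {
        if (holes[i].size >= size)
            return static_cast<int>(i);
    }
    return -1;
}
```

With one 100 MB hole and one 150 MB hole, `totalFree` reports 250 MB, yet `firstFit` for a 200 MB segment returns -1: the memory is there, just not in one piece.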

3 Paging

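Segmentation suffers from fragmentation, and I think the root cause is that each segment needs contiguous physical memory. Paging addresses exactly that. Physical memory is divided into fixed-size pages (4 KB here), and a program's virtual address space is managed in pages of the same size. A program that needs 100 MB then no longer has to get one large contiguous block: each 4 KB virtual page can be mapped to any free physical page, so the small leftover pieces scattered around memory get used as well. This removes the fragmentation described above. The only waste left is the unused tail of the last page of an allocation, which is always less than one page.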
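Here is a minimal sketch of page-based translation, assuming a flat single-level page table where `pageTable[vpn]` holds the physical frame number. Real systems use multi-level tables and TLBs; `kPageSize`, `pagesNeeded`, `internalWaste`, and `translatePaged` are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPageSize = 4096;  // 4 KB pages, as in the text

// Number of pages needed to hold `bytes` bytes (rounded up).
std::uint64_t pagesNeeded(std::uint64_t bytes) {
    return (bytes + kPageSize - 1) / kPageSize;
}

// Bytes left unused in the last page of an allocation of `bytes` bytes.
std::uint64_t internalWaste(std::uint64_t bytes) {
    return pagesNeeded(bytes) * kPageSize - bytes;
}

// Translate a virtual address through a flat page table.
// pageTable[vpn] is the physical frame number backing virtual page `vpn`;
// frames do not have to be contiguous or in order.
bool translatePaged(const std::vector<std::uint64_t>& pageTable,
                    std::uint64_t virtualAddr, std::uint64_t* physicalAddr) {
    const std::uint64_t vpn = virtualAddr / kPageSize;     // virtual page number
    const std::uint64_t offset = virtualAddr % kPageSize;  // offset inside the page
    if (!physicalAddr || vpn >= pageTable.size())
        return false;  // unmapped page; real hardware would raise a page fault
    *physicalAddr = pageTable[vpn] * kPageSize + offset;
    return true;
}
```

With the page table `{7, 2, 9}`, virtual address `4096 + 5` falls in virtual page 1, which maps to frame 2, so the physical address is `2 * 4096 + 5`. A 100 MB program needs `pagesNeeded(100 * 1024 * 1024)` = 25600 pages, and those can sit anywhere in physical memory.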

4 Zero-copy in CUDA programming

    /**
    * Allocate ZeroCopy mapped memory, shared between CUDA and CPU.
    *
    * @note although two pointers are returned, one for CPU and GPU, they both resolve to the same physical memory.
    *
    * @param[out] cpuPtr Returned CPU pointer to the shared memory.
    * @param[out] gpuPtr Returned GPU pointer to the shared memory.
    * @param[in] size Size (in bytes) of the shared memory to allocate.
    *
    * @returns `0` if the allocation succeeded, otherwise failed.
    * @ingroup cudaMemory
    */
    int cudaAllocMapped(void** cpuPtr, void** gpuPtr, size_t size) {
        if (!cpuPtr || !gpuPtr || size == 0)
            return -1;

        CUDA_SAFECALL(cudaHostAlloc(cpuPtr, size, cudaHostAllocMapped), "cudaHostAlloc failed", -1);
        CUDA_SAFECALL(cudaHostGetDevicePointer(gpuPtr, *cpuPtr, 0), "cudaHostGetDevicePointer failed", -1);

        memset(*cpuPtr, 0, size);
        VLOG(3) << "[InferServer] cudaAllocMapped " << size << " bytes, CPU " << *cpuPtr << " GPU " << *gpuPtr;
        return 0;
    }


    /**
    * Allocate ZeroCopy mapped memory, shared between CUDA and CPU.
    *
    * @note this overload of cudaAllocMapped returns one pointer, assumes that the
    *       CPU and GPU addresses will match (as is the case with any recent CUDA version).
    *
    * @param[out] ptr Returned pointer to the shared CPU/GPU memory.
    * @param[in] size Size (in bytes) of the shared memory to allocate.
    *
    * @returns `0` if the allocation succeeded, otherwise failed.
    * @ingroup cudaMemory
    */
    int cudaAllocMapped(void** ptr, size_t size) {
        void* cpuPtr{};
        void* gpuPtr{};

        if (!ptr || size == 0)
            return cudaErrorInvalidValue;

        auto error = cudaAllocMapped(&cpuPtr, &gpuPtr, size);
        if (error != cudaSuccess)
            return error;

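        // CUDA_SAFECALL is a project macro whose definition is not shown here. Assumption: it treats
        // a non-zero first argument as a failure, so this returns an error when the pointers differ.
        CUDA_SAFECALL(cpuPtr != gpuPtr, "cudaAllocMapped() - addresses of CPU and GPU pointers don't match", cudaErrorMemoryAllocation);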

        *ptr = gpuPtr;
        return cudaSuccess;
    }

In short, memory allocated with cudaAllocMapped can be accessed by both the CPU and the GPU, so no explicit copy is needed.

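cudaHostAlloc: allocates page-locked (pinned) host memory. The cudaHostAllocMapped flag additionally maps the allocation into the device address space, so the GPU can access it directly. The resulting buffer is shared: both the CPU and the GPU can access it. This access pattern is usually called "zero-copy". It is often mentioned together with Unified Virtual Addressing (UVA), but UVA is a separate feature that concerns pointer values, not the mapping itself.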

cudaHostGetDevicePointer: returns the device pointer that corresponds to the mapped page-locked host memory.

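cpuPtr and gpuPtr refer to the same physical memory, because the pinned host buffer is mapped into the GPU's address space. Unified Virtual Addressing (UVA) is what lets the two pointers have the same value: modern CUDA versions support UVA, under which the CPU and the GPU share a single virtual address space. Together, this means the same physical memory can be accessed by both the CPU and the GPU without an explicit data copy. Note that the buffer still lives in host memory, so on a discrete GPU every device access to it travels over the bus.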

References:

一篇文带你搞懂，虚拟内存、内存分页、分段、段页式内存管理（超详细） (CSDN blog)

CUDA：cudaHostAlloc() (CSDN blog)

CUDA编程------cudaHostAlloc (CSDN blog)

CUDA编程------zero copy (CSDN blog)