Chapter 0: GPU Memory Is Virtual Memory — Why This Column Exists

The Observation

Open include/drm/ttm/ttm_tt.h in the Linux kernel and you will find this:

struct ttm_tt {
    struct page **pages;
    uint32_t page_flags;
    struct file *swap_storage;   /* Pointer to shmem struct file for swap storage */
    ...
};

#define TTM_TT_FLAG_SWAPPED  BIT(0)

Now open include/linux/swap.h:

/* The in-memory structure used to track swap areas. */
struct swap_info_struct {
    struct file *swap_file;
    ...
};

The terminology is not a coincidence. The GPU memory subsystem in Linux is a virtual memory system. It has virtual address spaces, page tables, demand paging, LRU eviction, swap-out to slower storage, and swap-in on re-access. It just manages memory on behalf of a different processor.

This column exists because understanding that structural identity will make you a fundamentally better GPU systems engineer.

The Thesis

The Linux GPU memory management subsystem (TTM/GEM/GPUVM) is a domain-specific re-derivation of classical CPU virtual memory --- not merely inspired by it, but architecturally isomorphic to it.

Both subsystems solve the same abstract problem:

Given a processor with a virtual address space larger than its fast local memory, multiplex limited physical storage among competing consumers using indirection (page tables), lazy allocation (demand paging), and capacity management (eviction/swap).

The CPU side solves this for processes competing for DRAM. The GPU side solves this for buffer objects competing for VRAM. The algorithms are the same. The data structures are the same. In many cases, even the function names echo each other.
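To make the abstract problem concrete, here is a minimal userspace sketch of all three mechanisms in one place: a page table for indirection, first-touch allocation for lazy allocation, and LRU eviction to a slow store for capacity management. Every name in it (toy_vm, toy_pte, toy_access, and so on) is hypothetical and invented for illustration; it is a toy model, not kernel code, and it applies equally to "processes and DRAM" or "buffer objects and VRAM".

```c
#include <assert.h>
#include <string.h>

#define N_VPAGES   8    /* virtual pages visible to the consumer */
#define N_FRAMES   2    /* fast local memory: far fewer frames than pages */
#define PAGE_BYTES 16

struct toy_pte {
    int frame;               /* index into fast[], or -1 if not resident */
    int touched;             /* has this page ever been allocated? */
    unsigned long last_use;  /* timestamp used for LRU selection */
};

struct toy_vm {
    struct toy_pte pt[N_VPAGES];               /* indirection */
    unsigned char fast[N_FRAMES][PAGE_BYTES];  /* "DRAM" or "VRAM" */
    unsigned char slow[N_VPAGES][PAGE_BYTES];  /* "swap" */
    int frame_owner[N_FRAMES];                 /* vpage in frame, or -1 */
    unsigned long clock, faults, evictions;
};

static void toy_vm_init(struct toy_vm *vm)
{
    memset(vm, 0, sizeof(*vm));
    for (int i = 0; i < N_VPAGES; i++)
        vm->pt[i].frame = -1;
    for (int f = 0; f < N_FRAMES; f++)
        vm->frame_owner[f] = -1;
}

/* Capacity management: return a free frame, else evict the LRU page. */
static int toy_get_frame(struct toy_vm *vm)
{
    int victim = 0;

    for (int f = 0; f < N_FRAMES; f++) {
        if (vm->frame_owner[f] < 0)
            return f;
        if (vm->pt[vm->frame_owner[f]].last_use <
            vm->pt[vm->frame_owner[victim]].last_use)
            victim = f;
    }

    int vp = vm->frame_owner[victim];
    memcpy(vm->slow[vp], vm->fast[victim], PAGE_BYTES);  /* swap out */
    vm->pt[vp].frame = -1;
    vm->evictions++;
    return victim;
}

/* Demand paging: resolve a virtual page, faulting it in if needed. */
static unsigned char *toy_access(struct toy_vm *vm, int vp)
{
    assert(vp >= 0 && vp < N_VPAGES);
    struct toy_pte *pte = &vm->pt[vp];

    if (pte->frame < 0) {                        /* "page fault" */
        int f = toy_get_frame(vm);

        vm->faults++;
        if (pte->touched)
            memcpy(vm->fast[f], vm->slow[vp], PAGE_BYTES);  /* swap in */
        else
            memset(vm->fast[f], 0, PAGE_BYTES);  /* lazy first-touch alloc */
        pte->touched = 1;
        pte->frame = f;
        vm->frame_owner[f] = vp;
    }
    pte->last_use = ++vm->clock;
    return vm->fast[pte->frame];
}
```

Touching three pages with only two frames forces one eviction, and touching the evicted page again brings its contents back from the slow store. Real implementations on both sides differ in almost every detail (locking, asynchronous copies, multi-level tables), but the control flow has this shape.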

The Evidence (Preview)

Here is what we will demonstrate across this column:

Naming

CPU MM Term	GPU MM (DRM/TTM) Direct Equivalent
`swap_info_struct.swap_file` (swap backing file)	`ttm_tt.swap_storage` (shmem backing file)
`do_swap_page()` (swap-in handler)	`ttm_tt_swapin()` (GPU swap-in handler)
`shrink_inactive_list()`	`drm_gem_lru_scan()`
`handle_mm_fault()`	`ttm_tt_populate()`
`MADV_DONTNEED`	`DRM_GEM_OBJECT_PURGEABLE`

These are not metaphors. They are the actual symbol names in the kernel source.
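The CPU half of the last row is easy to observe from userspace. The sketch below assumes Linux semantics for private anonymous mappings, where a page dropped with MADV_DONTNEED reads back as zeros on the next access; the helper name dontneed_demo() is our own. The GPU half is reached through driver-specific interfaces and is not shown here.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write to an anonymous page, drop it with MADV_DONTNEED, and return the
 * first byte seen on re-access. Returns -1 on any setup failure.
 */
static int dontneed_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    p[0] = 0x5A;            /* first touch: demand-allocates a page frame */
    assert(p[0] == 0x5A);   /* sanity check: the write landed */

    /* Hint: the contents are disposable, so the frame can be reclaimed. */
    if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    int value = p[0];       /* re-access faults in a fresh page */
    munmap(p, (size_t)page);
    return value;
}
```

On Linux the function returns 0: the old contents are gone and the mapping is still valid. A purgeable GEM object makes the same trade, in that the owner keeps the handle but gives up any guarantee about the contents.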

Structure

CPU: mm_struct ──contains──→ vm_area_struct ──maps──→ struct page (in RAM)
                                                         ↕ eviction
                                                     swap entry (on disk)

GPU: drm_gpuvm ──contains──→ drm_gpuva ──maps──→ ttm_resource (in VRAM)
                                                       ↕ eviction
                                                   swap_storage (in shmem)

Algorithm

Stripped of its error handling, the ttm_tt_swapin() function in drivers/gpu/drm/ttm/ttm_tt.c reads:

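int ttm_tt_swapin(struct ttm_tt *ttm)
{
    struct file *swap_storage = ttm->swap_storage;
    struct address_space *swap_space = swap_storage->f_mapping;
    gfp_t gfp_mask = mapping_gfp_mask(swap_space);
    struct page *from_page, *to_page;
    int i;

    for (i = 0; i < ttm->num_pages; ++i) {
        /* Read page i back from the shmem backing file. */
        from_page = shmem_read_mapping_page_gfp(swap_space, i, gfp_mask);
        /* IS_ERR(from_page) and NULL to_page checks omitted here */
        to_page = ttm->pages[i];
        copy_highpage(to_page, from_page);
        put_page(from_page);
    }

    fput(swap_storage);     /* drop the reference to the backing file */
    ttm->swap_storage = NULL;
    ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED;
    return 0;
}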

Compare with the CPU's do_swap_page() logic:

Read page from swap backing store
Copy into allocated page frame
Clear swap state

The structure is identical. The difference is only which processor will access the result.
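The three steps can be exercised outside the kernel. What follows is a userspace sketch under stated assumptions: a heap buffer stands in for the shmem backing file, malloc() stands in for page allocation, and toy_tt, toy_swapout(), and toy_swapin() are hypothetical names modeled loosely on the TTM ones. Unlike the kernel, where the pages are populated before swap-in runs, this toy allocates them inside toy_swapin().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE    4096
#define TOY_FLAG_SWAPPED (1u << 0)   /* modeled on TTM_TT_FLAG_SWAPPED */

struct toy_tt {
    unsigned char **pages;        /* per-page buffers, like ttm_tt.pages */
    unsigned int num_pages;
    unsigned int page_flags;
    unsigned char *swap_storage;  /* heap buffer standing in for shmem */
};

/* Copy every page to the backing store and release the fast copies. */
static int toy_swapout(struct toy_tt *tt)
{
    unsigned char *store = malloc((size_t)tt->num_pages * TOY_PAGE_SIZE);

    if (!store)
        return -1;
    for (unsigned int i = 0; i < tt->num_pages; i++) {
        memcpy(store + (size_t)i * TOY_PAGE_SIZE, tt->pages[i],
               TOY_PAGE_SIZE);
        free(tt->pages[i]);
        tt->pages[i] = NULL;
    }
    tt->swap_storage = store;
    tt->page_flags |= TOY_FLAG_SWAPPED;
    return 0;
}

/* Read each page back, copy it into a fresh frame, clear swap state. */
static int toy_swapin(struct toy_tt *tt)
{
    assert(tt->swap_storage != NULL);

    for (unsigned int i = 0; i < tt->num_pages; i++) {
        tt->pages[i] = malloc(TOY_PAGE_SIZE);
        if (!tt->pages[i])
            goto fail;
        memcpy(tt->pages[i],
               tt->swap_storage + (size_t)i * TOY_PAGE_SIZE,
               TOY_PAGE_SIZE);
    }
    free(tt->swap_storage);
    tt->swap_storage = NULL;
    tt->page_flags &= ~TOY_FLAG_SWAPPED;
    return 0;

fail:
    /* Undo partial work; the object stays swapped out. */
    for (unsigned int j = 0; j < tt->num_pages; j++) {
        free(tt->pages[j]);
        tt->pages[j] = NULL;
    }
    return -1;
}
```

A round trip through toy_swapout() and toy_swapin() leaves the page contents unchanged and the swapped flag clear, which is the invariant both the CPU and GPU paths have to preserve.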

Why GPU Developers Should Care

1. You already know more than you think

If you understand TTM eviction, you understand CPU page reclaim. If you understand drm_gpuvm, you understand mm_struct. The conceptual framework transfers directly --- but only if you see the connection explicitly.

2. The two worlds are merging

HMM (Heterogeneous Memory Management), SVM (Shared Virtual Memory), and ATS/PRI are erasing the boundary between CPU and GPU memory. Device-private pages already appear as swap entries in CPU page tables. GPU fault handlers already call into CPU MM infrastructure. The next generation of GPU drivers is CPU MM code.
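To see what "appear as swap entries" means mechanically, consider a toy page-table entry in which a clear present bit turns the remaining bits into a (type, offset) pair. The layout below is hypothetical and deliberately not the real x86 or arm64 format, and every toy_ name is invented; it sketches only the idea that one non-present encoding can describe both "on disk" and "in device memory".

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical toy layout, NOT a real architecture's PTE format:
 *   bit 0       present
 *   bits 1..5   entry type (only meaningful when not present)
 *   bits 6..63  offset     (only meaningful when not present)
 */
#define TOY_PTE_PRESENT  UINT64_C(1)
#define TOY_TYPE_SHIFT   1
#define TOY_TYPE_MASK    UINT64_C(0x1f)
#define TOY_OFFSET_SHIFT 6

enum toy_type {
    TOY_SWAP_DISK = 0,        /* contents live in a swap area */
    TOY_DEVICE_PRIVATE = 30,  /* contents live in device memory */
};

enum toy_action {
    TOY_ACT_NONE,             /* entry is present, no fault */
    TOY_ACT_SWAP_IN,          /* read the page back from swap */
    TOY_ACT_MIGRATE_TO_RAM,   /* pull the page back from the device */
};

static uint64_t toy_make_nonpresent(enum toy_type type, uint64_t offset)
{
    return ((uint64_t)type << TOY_TYPE_SHIFT) | (offset << TOY_OFFSET_SHIFT);
}

static int toy_is_present(uint64_t pte)
{
    return (pte & TOY_PTE_PRESENT) != 0;
}

static enum toy_type toy_pte_type(uint64_t pte)
{
    assert(!toy_is_present(pte));
    return (enum toy_type)((pte >> TOY_TYPE_SHIFT) & TOY_TYPE_MASK);
}

static uint64_t toy_pte_offset(uint64_t pte)
{
    assert(!toy_is_present(pte));
    return pte >> TOY_OFFSET_SHIFT;
}

/* What a CPU fault on this entry would have to do. */
static enum toy_action toy_fault_action(uint64_t pte)
{
    if (toy_is_present(pte))
        return TOY_ACT_NONE;
    if (toy_pte_type(pte) == TOY_DEVICE_PRIVATE)
        return TOY_ACT_MIGRATE_TO_RAM;
    return TOY_ACT_SWAP_IN;
}
```

The fault handler dispatches on the type field, so adding device memory as a place pages can live costs one more branch on the same path that already handles swap.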

3. CPU MM has 30 years of hard-won lessons

LRU thrashing mitigation, working set estimation, memory pressure signaling, shrinker priority, preemptible reclaim --- CPU MM has solved these problems through decades of production use. Every one of these solutions applies directly to GPU memory management, but only if you recognize the parallel.

4. Better debugging through analogy

When your GPU driver hits an eviction deadlock, the debugging techniques from CPU MM (lock ordering analysis, reclaim context tracing, memory pressure simulation) apply directly. When you see -ENOMEM from ttm_bo_validate(), you're looking at the GPU equivalent of an OOM condition --- and the resolution strategies are the same.
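The OOM analogy is easier to reason about with the evict-and-retry loop written out. This is a simplified userspace sketch, not the logic of ttm_bo_validate() itself: toy_pool, toy_bo, and toy_validate() are hypothetical names, placement is reduced to a single byte counter, and eviction simply marks an object non-resident. It shows only why -ENOMEM means "nothing evictable is left".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_BOS 8

struct toy_bo {
    size_t size;
    int resident;            /* currently occupies fast memory */
    int pinned;              /* must not be evicted */
    unsigned long last_use;  /* timestamp used for LRU selection */
};

struct toy_pool {
    size_t capacity, used;
    struct toy_bo *bos[TOY_MAX_BOS];
    int count;
    unsigned long clock;
};

/* Least recently used object that is resident, unpinned, and not skip. */
static struct toy_bo *toy_pick_victim(struct toy_pool *p,
                                      const struct toy_bo *skip)
{
    struct toy_bo *victim = NULL;

    for (int i = 0; i < p->count; i++) {
        struct toy_bo *bo = p->bos[i];

        if (bo == skip || !bo->resident || bo->pinned)
            continue;
        if (!victim || bo->last_use < victim->last_use)
            victim = bo;
    }
    return victim;
}

/* Make bo resident, evicting LRU objects as needed. 0 or -ENOMEM. */
static int toy_validate(struct toy_pool *p, struct toy_bo *bo)
{
    assert(p->used <= p->capacity);

    if (!bo->resident) {
        while (p->capacity - p->used < bo->size) {
            struct toy_bo *victim = toy_pick_victim(p, bo);

            if (!victim)
                return -ENOMEM;  /* nothing left to evict: OOM analogue */
            victim->resident = 0;
            p->used -= victim->size;
        }
        bo->resident = 1;
        p->used += bo->size;
    }
    bo->last_use = ++p->clock;
    return 0;
}
```

Note that a failed call may already have evicted other objects before giving up. The questions to ask are the ones you would ask about a CPU OOM: what is pinned, what is the working set, and is the request larger than the pool can ever satisfy?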

What This Column Is NOT

Not a CPU MM tutorial (though we explain enough CPU MM for each chapter to be self-contained)
Not a DRM driver tutorial (we assume you already write or maintain GPU drivers)
Not a hardware architecture comparison (we focus on the software abstractions in the kernel)
Not limited to one GPU vendor (TTM, GEM, and GPUVM are vendor-neutral frameworks)

The Column Structure

We progress from the most visible parallel (address spaces) to the deepest (convergence through HMM):

Part I:   Address Spaces         mm_struct ↔ drm_gpuvm
Part II:  Physical Storage       buddy alloc ↔ TTM resources
Part III: Page Tables            PGD→PTE ↔ GPU page tables
Part IV:  Eviction & Swap        kswapd/LRU ↔ TTM eviction     ← centerpiece
Part V:   Demand Paging          page fault ↔ populate/validate
Part VI:  Hints & Policies       madvise ↔ GEM purgeable
Part VII: Convergence            HMM/SVM/ATS --- the merge
Part VIII: Engineering           debug, perf, design checklists

Each chapter follows a consistent four-part structure:

GPU Concept --- What you already know from DRM/TTM
CPU Mirror --- The exact CPU MM equivalent, with code
Why They Converge --- The architectural reason the parallel exists
Practical Implications --- What this means for your driver code

Prerequisites

To get the most from this column, you should have:

Working familiarity with the DRM subsystem (GEM objects, TTM buffer objects, IOCTLs)
Basic understanding of virtual memory (what a page table is, what a TLB does)
Ability to read Linux kernel C code
Access to a kernel source tree (6.x recommended) for following along

You do not need:

Deep CPU MM expertise (we build that as we go)
Knowledge of any specific GPU hardware architecture
Prior experience with HMM or ZONE_DEVICE

A Note on Convergence

When TTM was merged into the mainline kernel (2009), the GPU memory subsystem was a separate world. Buffers were explicitly allocated, explicitly placed, and explicitly bound. The "swap" analogy was implicit --- a design pattern, not a shared code path.

Today, that separation is dissolving:

drm_gpuvm (2023) explicitly models GPU VA spaces with the same abstractions as mm_struct
drm_gem_lru_scan() lets a driver's shrinker callback walk GEM LRU lists, tying GEM objects into the kernel's shrinker framework
GPU SVM implementations (drm_gpusvm, AMD KFD SVM) call hmm_range_fault(), which calls handle_mm_fault()
Device-private pages live in the CPU's page tables as swap-like entries

The GPU memory subsystem didn't just borrow terminology from CPU MM. It is rejoining CPU MM. Understanding both sides of this convergence is no longer optional for GPU systems engineers --- it is the job.

Let's Begin

In the next chapter, we start with the big picture: A Tale of Two Memory Hierarchies. We'll compare the CPU and GPU memory hierarchies side by side --- from register files to swap devices --- and establish why both inevitably arrive at virtual memory as the solution.

Then we dive in, one concept at a time, until the full isomorphism is laid bare.