The Observation
Open include/drm/ttm/ttm_tt.h in the Linux kernel and you will find this:
```c
struct ttm_tt {
	struct page **pages;
	uint32_t page_flags;
	struct file *swap_storage; /* Pointer to shmem struct file for swap storage */
	...
};
#define TTM_TT_FLAG_SWAPPED BIT(0)
```
Now open include/linux/swap.h:
```c
/* The in-memory structure used to track swap areas */
struct swap_info_struct {
	struct file *swap_file;
	...
};
```
The shared vocabulary is not a coincidence: both structures hold a struct file that stores pages pushed out of fast memory. The GPU memory subsystem in Linux is a virtual memory system. It has virtual address spaces, page tables, demand paging, LRU eviction, swap-out to slower storage, and swap-in on re-access. It just runs on a different processor.
This column exists because understanding that structural identity will make you a fundamentally better GPU systems engineer.
The Thesis
The Linux GPU memory management subsystem (TTM/GEM/GPUVM) is a domain-specific re-derivation of classical CPU virtual memory --- not merely inspired by it, but architecturally isomorphic to it.
Both subsystems solve the same abstract problem:
Given a processor with a virtual address space larger than its fast local memory, multiplex limited physical storage among competing consumers using indirection (page tables), lazy allocation (demand paging), and capacity management (eviction/swap).
The CPU solves this for processes competing for DRAM. The GPU solves this for buffer objects competing for VRAM. The algorithms are the same. The data structures are the same. In many cases, even the names echo each other.
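To make the abstract problem concrete, here is a deliberately tiny user-space model in C (not kernel code) that exercises all three mechanisms: a page table for indirection, zero-fill on first touch for lazy allocation, and eviction to a backing array for capacity management. Every name (toy_vm, toy_access) and every size is hypothetical, and round-robin FIFO replacement stands in for the LRU-style policies both kernels actually use.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy sizes: 8 virtual pages share 2 fast physical frames. */
#define TOY_NVPAGES 8
#define TOY_NFRAMES 2
#define TOY_PAGE_SZ 16

struct toy_pte {
	bool present;   /* contents are resident in frames[frame] */
	bool swapped;   /* contents live in the backing store */
	int frame;      /* valid only when present */
};

struct toy_vm {
	struct toy_pte pt[TOY_NVPAGES];          /* indirection: the page table */
	char frames[TOY_NFRAMES][TOY_PAGE_SZ];   /* scarce fast memory (RAM or VRAM) */
	int frame_owner[TOY_NFRAMES];            /* virtual page in each frame, -1 if free */
	int next_victim;                         /* round-robin; FIFO since frames are never freed */
	char backing[TOY_NVPAGES][TOY_PAGE_SZ];  /* slow backing store ("swap") */
	int faults, evictions;
};

static void toy_vm_init(struct toy_vm *vm)
{
	memset(vm, 0, sizeof(*vm));
	for (int i = 0; i < TOY_NFRAMES; i++)
		vm->frame_owner[i] = -1;
}

/* Capacity management: return a free frame, evicting its owner if needed. */
static int toy_get_frame(struct toy_vm *vm)
{
	for (int i = 0; i < TOY_NFRAMES; i++)
		if (vm->frame_owner[i] < 0)
			return i;

	int f = vm->next_victim;
	int victim = vm->frame_owner[f];
	vm->next_victim = (f + 1) % TOY_NFRAMES;
	memcpy(vm->backing[victim], vm->frames[f], TOY_PAGE_SZ);  /* swap out */
	vm->pt[victim].present = false;
	vm->pt[victim].swapped = true;
	vm->evictions++;
	return f;
}

/* Translate a virtual page to its frame, faulting it in when not present. */
static char *toy_access(struct toy_vm *vm, int vpage)
{
	struct toy_pte *pte = &vm->pt[vpage];

	if (!pte->present) {
		int f = toy_get_frame(vm);
		vm->faults++;
		if (pte->swapped)
			memcpy(vm->frames[f], vm->backing[vpage], TOY_PAGE_SZ);  /* swap in */
		else
			memset(vm->frames[f], 0, TOY_PAGE_SZ);  /* demand zero-fill */
		pte->present = true;
		pte->swapped = false;
		pte->frame = f;
		vm->frame_owner[f] = vpage;
	}
	assert(vm->frame_owner[pte->frame] == vpage);
	return vm->frames[pte->frame];
}
```

Nothing in the model says which processor issues the accesses; a CPU or GPU instantiation would differ only in sizes, replacement policy, and what the backing store is.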
The Evidence (Preview)
Here is what we will demonstrate across this column:
Naming
| CPU MM Term | GPU MM (DRM/TTM) Direct Equivalent |
|---|---|
| swap_info_struct.swap_file (swap backing file) | ttm_tt.swap_storage (shmem backing file) |
| do_swap_page() (swap-in handler) | ttm_tt_swapin() (GPU swap-in handler) |
| shrink_inactive_list() | drm_gem_lru_scan() |
| handle_mm_fault() | ttm_tt_populate() |
| MADV_DONTNEED | DRM_GEM_OBJECT_PURGEABLE |
These are not metaphors. They are the actual symbol names in the kernel source.
Structure
```
CPU:  mm_struct ──contains──→ vm_area_struct ──maps──→ struct page (in RAM)
                                                           ↕ eviction
                                                       swap entry (on disk)

GPU:  drm_gpuvm ──contains──→ drm_gpuva ──maps──→ ttm_resource (in VRAM)
                                                      ↕ eviction
                                                  swap_storage (in shmem)
```
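Operationally, the "contains" arrow in both diagrams is a range lookup: given an address, find the mapping that covers it (find_vma() on the CPU side; drm_gpuvm offers lookups such as drm_gpuva_find_first()). The sketch below uses a sorted array and binary search purely for illustration; current kernels keep VMAs in a maple tree and drm_gpuvm keeps its mappings in an interval tree. The names toy_mapping, toy_as, and toy_find are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for vm_area_struct / drm_gpuva: a [start, end) range. */
struct toy_mapping {
	uint64_t start, end;   /* half-open virtual address range */
	int obj_id;            /* backing object: anonymous memory or a GEM BO */
};

/* Hypothetical stand-in for mm_struct / drm_gpuvm: sorted, non-overlapping. */
struct toy_as {
	const struct toy_mapping *maps;
	size_t n;
};

/* Binary search for the mapping containing addr; NULL means "unmapped". */
static const struct toy_mapping *toy_find(const struct toy_as *as, uint64_t addr)
{
	size_t lo = 0, hi = as->n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct toy_mapping *m = &as->maps[mid];

		if (addr < m->start)
			hi = mid;
		else if (addr >= m->end)
			lo = mid + 1;
		else
			return m;
	}
	return NULL;
}
```

On either processor, a NULL result at fault time is the "no mapping here" case (a segfault on the CPU, an invalid GPU access), while a hit tells the fault handler which backing object to consult.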
Algorithm
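The ttm_tt_swapin() function in drivers/gpu/drm/ttm/ttm_tt.c reads, lightly abridged:
```c
int ttm_tt_swapin(struct ttm_tt *ttm)
{
	struct file *swap_storage = ttm->swap_storage;
	struct address_space *swap_space = swap_storage->f_mapping;
	gfp_t gfp_mask = mapping_gfp_mask(swap_space);
	struct page *from_page, *to_page;
	int i;

	for (i = 0; i < ttm->num_pages; ++i) {
		/* Destination page must already be populated */
		to_page = ttm->pages[i];
		if (unlikely(!to_page))
			return -ENOMEM;
		/* Read page i back from the shmem backing file */
		from_page = shmem_read_mapping_page_gfp(swap_space, i, gfp_mask);
		if (IS_ERR(from_page))
			return PTR_ERR(from_page);
		/* Copy into the TTM page, then drop the shmem page reference */
		copy_highpage(to_page, from_page);
		put_page(from_page);
	}

	/* Release the shmem file and clear the swapped state */
	fput(swap_storage);
	ttm->swap_storage = NULL;
	ttm->page_flags &= ~TTM_TT_FLAG_SWAPPED;
	return 0;
}
```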
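Compare with the CPU's do_swap_page() logic:
- Find the page in the swap cache, or allocate a page frame and read it in from the swap backing store
- Install a page table entry pointing at the now-resident frame
- Drop the reference on the swap entry and clear the swap state
The shape is the same: fetch from the backing store, land the data in a resident page, clear the swapped state. The differences are granularity and plumbing (do_swap_page() services one faulting page and reads directly into the frame, while ttm_tt_swapin() restores a whole object by copying out of shmem pages) and which processor will access the result.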
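That shared shape can be sketched as a user-space round trip. This is an analogy under stated assumptions, not TTM code: malloc() stands in for page allocation, a heap buffer stands in for the shmem file, and the hypothetical TOY_FLAG_SWAPPED plays the role of TTM_TT_FLAG_SWAPPED. The sketch folds population and copy-back into one function; in TTM, ttm_tt_populate() allocates the pages first and then calls ttm_tt_swapin() when the swapped flag is set.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64
#define TOY_FLAG_SWAPPED (1u << 0)   /* hypothetical; mirrors TTM_TT_FLAG_SWAPPED */

/* Hypothetical user-space analogue of struct ttm_tt. */
struct toy_tt {
	unsigned char **pages;        /* resident pages, NULL while swapped out */
	unsigned char *swap_storage;  /* stands in for the shmem file */
	uint32_t num_pages;
	uint32_t page_flags;
};

/* Copy every page to backing storage, then free the resident pages. */
static int toy_tt_swapout(struct toy_tt *tt)
{
	unsigned char *store = malloc((size_t)tt->num_pages * TOY_PAGE_SIZE);

	if (!store)
		return -ENOMEM;
	for (uint32_t i = 0; i < tt->num_pages; i++) {
		memcpy(store + (size_t)i * TOY_PAGE_SIZE, tt->pages[i], TOY_PAGE_SIZE);
		free(tt->pages[i]);
		tt->pages[i] = NULL;
	}
	tt->swap_storage = store;
	tt->page_flags |= TOY_FLAG_SWAPPED;
	return 0;
}

/* Repopulate pages, copy data back, release the backing store, clear the flag. */
static int toy_tt_swapin(struct toy_tt *tt)
{
	assert(tt->page_flags & TOY_FLAG_SWAPPED);
	for (uint32_t i = 0; i < tt->num_pages; i++) {
		tt->pages[i] = malloc(TOY_PAGE_SIZE);   /* "populate" */
		if (!tt->pages[i]) {
			while (i--) {                       /* undo partial population */
				free(tt->pages[i]);
				tt->pages[i] = NULL;
			}
			return -ENOMEM;
		}
		memcpy(tt->pages[i], tt->swap_storage + (size_t)i * TOY_PAGE_SIZE,
		       TOY_PAGE_SIZE);
	}
	free(tt->swap_storage);                     /* analogue of fput() */
	tt->swap_storage = NULL;
	tt->page_flags &= ~TOY_FLAG_SWAPPED;
	return 0;
}
```

On failure, swapin leaves the backing store and the swapped flag intact, so a caller can retry later; TTM similarly unpopulates the object when ttm_tt_swapin() fails inside ttm_tt_populate().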
Why GPU Developers Should Care
1. You already know more than you think
If you understand TTM eviction, you understand CPU page reclaim. If you understand drm_gpuvm, you understand mm_struct. The conceptual framework transfers directly --- but only if you see the connection explicitly.
2. The two worlds are merging
HMM (Heterogeneous Memory Management), SVM (Shared Virtual Memory), and PCIe ATS and PRI (Address Translation Services and Page Request Interface) are erasing the boundary between CPU and GPU memory. Device-private pages already appear as special swap entries in CPU page tables. GPU fault handlers already call into CPU MM infrastructure. Much of the next generation of GPU driver code is CPU MM code.
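The "device-private pages as swap entries" point rests on one trick: a non-present PTE can carry any payload, so the kernel encodes a (type, offset) pair in it and reserves some type values for special cases. The sketch below uses a made-up 64-bit layout, not any real architecture's; in the kernel the portable form is swp_entry_t, and helpers such as is_device_private_entry() do the classification. All TOY_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE layout, NOT a real architecture's:
 * bit 0 = present, bits 1..5 = swap type, bits 6..63 = swap offset. */
#define TOY_PTE_PRESENT      (1ull << 0)
#define TOY_SWP_TYPE_SHIFT   1
#define TOY_SWP_TYPE_BITS    5
#define TOY_SWP_OFFSET_SHIFT (TOY_SWP_TYPE_SHIFT + TOY_SWP_TYPE_BITS)
#define TOY_SWP_TYPE_MASK    ((1ull << TOY_SWP_TYPE_BITS) - 1)

/* Hypothetical: reserve the highest type value for device-private memory. */
#define TOY_SWP_DEVICE_PRIVATE ((unsigned)TOY_SWP_TYPE_MASK)

/* Build a non-present PTE that records where the page's data lives. */
static uint64_t toy_make_swap_pte(unsigned type, uint64_t offset)
{
	assert(type <= TOY_SWP_TYPE_MASK);
	assert(offset < (1ull << (64 - TOY_SWP_OFFSET_SHIFT)));
	return ((uint64_t)type << TOY_SWP_TYPE_SHIFT) |
	       (offset << TOY_SWP_OFFSET_SHIFT);
}

static unsigned toy_swap_type(uint64_t pte)
{
	return (unsigned)((pte >> TOY_SWP_TYPE_SHIFT) & TOY_SWP_TYPE_MASK);
}

static uint64_t toy_swap_offset(uint64_t pte)
{
	return pte >> TOY_SWP_OFFSET_SHIFT;
}

/* A fault handler dispatches on the entry type: disk swap vs. device memory. */
static bool toy_is_device_private(uint64_t pte)
{
	return !(pte & TOY_PTE_PRESENT) &&
	       toy_swap_type(pte) == TOY_SWP_DEVICE_PRIVATE;
}
```

Because both kinds of entry share one encoding, the CPU fault path needs only a type check to route a device-private fault to the driver that owns the memory, instead of to a swap device.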
3. CPU MM has 30 years of hard-won lessons
LRU thrashing mitigation, working set estimation, memory pressure signaling, shrinker priority, preemptible reclaim --- CPU MM has solved these problems through decades of production use. Every one of these solutions applies directly to GPU memory management, but only if you recognize the parallel.
4. Better debugging through analogy
When your GPU driver hits an eviction deadlock, the debugging techniques from CPU MM (lock ordering analysis, reclaim context tracing, memory pressure simulation) apply directly. When you see -ENOMEM from ttm_bo_validate(), you're looking at the GPU equivalent of an OOM condition --- and the resolution strategies are the same.
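The -ENOMEM/OOM parallel is easiest to see in the shape of the placement loop: try to fit the object, evict least-recently-used unpinned residents, and fail only when nothing evictable remains. The following is a heavily simplified, hypothetical sketch (a single capacity counter instead of a range allocator, eviction as a flag flip, no fences or ww_mutex locking), not TTM's actual ttm_bo_validate() path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BOS 8

/* Hypothetical buffer object: size in pages, placement, pin state. */
struct toy_bo {
	unsigned size;
	bool in_vram;
	bool pinned;              /* pinned BOs are never evicted */
	unsigned long last_use;   /* LRU timestamp */
};

/* Hypothetical VRAM manager tracking capacity with a single counter. */
struct toy_vram {
	unsigned capacity, used;
	struct toy_bo *bos[TOY_MAX_BOS];
	size_t nbos;
	unsigned long clock;
};

static void toy_vram_track(struct toy_vram *v, struct toy_bo *bo)
{
	assert(v->nbos < TOY_MAX_BOS);
	v->bos[v->nbos++] = bo;
}

/* Least recently used, resident, unpinned BO other than 'skip'. */
static struct toy_bo *toy_lru_victim(struct toy_vram *v, struct toy_bo *skip)
{
	struct toy_bo *victim = NULL;

	for (size_t i = 0; i < v->nbos; i++) {
		struct toy_bo *bo = v->bos[i];

		if (bo == skip || !bo->in_vram || bo->pinned)
			continue;
		if (!victim || bo->last_use < victim->last_use)
			victim = bo;
	}
	return victim;
}

/* Make 'bo' resident in VRAM, evicting LRU objects to system memory as needed. */
static int toy_validate(struct toy_vram *v, struct toy_bo *bo)
{
	v->clock++;
	if (bo->in_vram) {
		bo->last_use = v->clock;   /* just touch it on the LRU */
		return 0;
	}
	if (bo->size > v->capacity)
		return -ENOMEM;            /* can never fit */
	while (v->capacity - v->used < bo->size) {
		struct toy_bo *victim = toy_lru_victim(v, bo);

		if (!victim)
			return -ENOMEM;        /* nothing evictable: GPU-side "OOM" */
		victim->in_vram = false;   /* "evict" to system memory */
		v->used -= victim->size;
	}
	bo->in_vram = true;
	bo->last_use = v->clock;
	v->used += bo->size;
	return 0;
}
```

Read -ENOMEM here as "reclaim made no progress," which is roughly the point at which the CPU side escalates to the OOM killer; the usual remedies (unpin, shrink the working set, lower the priority of background users) carry over.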
What This Column Is NOT
- Not a CPU MM tutorial (though we explain enough CPU MM for each chapter to be self-contained)
- Not a DRM driver tutorial (we assume you already write or maintain GPU drivers)
- Not a hardware architecture comparison (we focus on the software abstractions in the kernel)
- Not limited to one GPU vendor (TTM, GEM, and GPUVM are vendor-neutral frameworks)
The Column Structure
We progress from the most visible parallel (address spaces) to the deepest (convergence through HMM):
| Part | Topic | Focus |
|---|---|---|
| Part I | Address Spaces | mm_struct ↔ drm_gpuvm |
| Part II | Physical Storage | buddy alloc ↔ TTM resources |
| Part III | Page Tables | PGD→PTE ↔ GPU page tables |
| Part IV | Eviction & Swap | kswapd/LRU ↔ TTM eviction (centerpiece) |
| Part V | Demand Paging | page fault ↔ populate/validate |
| Part VI | Hints & Policies | madvise ↔ GEM purgeable |
| Part VII | Convergence | HMM/SVM/ATS: the merge |
| Part VIII | Engineering | debug, perf, design checklists |
Each chapter follows a consistent four-part structure:
- GPU Concept --- What you already know from DRM/TTM
- CPU Mirror --- The exact CPU MM equivalent, with code
- Why They Converge --- The architectural reason the parallel exists
- Practical Implications --- What this means for your driver code
Prerequisites
To get the most from this column, you should have:
- Working familiarity with the DRM subsystem (GEM objects, TTM buffer objects, IOCTLs)
- Basic understanding of virtual memory (what a page table is, what a TLB does)
- Ability to read Linux kernel C code
- Access to a kernel source tree (6.x recommended) for following along
You do not need:
- Deep CPU MM expertise (we build that as we go)
- Knowledge of any specific GPU hardware architecture
- Prior experience with HMM or ZONE_DEVICE
A Note on Convergence
When TTM was merged into the mainline kernel (2009), the GPU memory subsystem was a separate world. Buffers were explicitly allocated, explicitly placed, and explicitly bound. The "swap" analogy was implicit: a design pattern, not a shared code path.
Today, that separation is dissolving:
- drm_gpuvm (2023) explicitly models GPU VA spaces with abstractions that mirror mm_struct and its VMAs
- drm_gem_lru_scan() lets drivers walk an LRU of GEM objects from their shrinker callbacks, plugging GEM into the kernel's shrinker framework
- GPU SVM drivers (drm_gpusvm, AMD KFD SVM) call hmm_range_fault(), which calls handle_mm_fault()
- Device-private pages live in the CPU's page tables as swap-like entries
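The shrinker bullet maps onto a two-callback contract: the kernel's shrinker asks a count callback how much could be freed, then asks a scan callback to free up to nr_to_scan, and drm_gem_lru_scan() is a helper for the scan side. Below is a hypothetical user-space sketch of that contract over a GEM-like LRU, purging only objects marked DONTNEED, much as driver madvise paths do. It omits locking, and real drivers may also evict or swap out objects that are still needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical madvise-style states for a GEM-like object. */
enum toy_madv { TOY_WILLNEED, TOY_DONTNEED, TOY_PURGED };

struct toy_obj {
	unsigned npages;
	enum toy_madv madv;
	struct toy_obj *next;   /* singly linked LRU, oldest first */
};

/* count_objects analogue: pages that could be reclaimed right now. */
static unsigned long toy_count(const struct toy_obj *lru)
{
	unsigned long n = 0;

	for (; lru; lru = lru->next)
		if (lru->madv == TOY_DONTNEED)
			n += lru->npages;
	return n;
}

/* scan_objects analogue: walk oldest-first, purging DONTNEED objects until
 * at least nr_to_scan pages are freed; returns the pages actually freed. */
static unsigned long toy_scan(struct toy_obj *lru, unsigned long nr_to_scan)
{
	unsigned long freed = 0;

	for (; lru && freed < nr_to_scan; lru = lru->next) {
		if (lru->madv != TOY_DONTNEED)
			continue;               /* still needed: skipped in this sketch */
		freed += lru->npages;
		lru->madv = TOY_PURGED;     /* contents discarded, never written back */
	}
	return freed;
}
```

Purging is the cheapest reclaim on either side: as with MADV_DONTNEED'd anonymous memory, the contents are simply dropped, so no swap I/O is needed.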
The GPU memory subsystem didn't just borrow terminology from CPU MM. It is rejoining CPU MM. Understanding both sides of this convergence is no longer optional for GPU systems engineers --- it is the job.
Let's Begin
In the next chapter, we start with the big picture: A Tale of Two Memory Hierarchies. We'll compare the CPU and GPU memory hierarchies side by side --- from register files to swap devices --- and establish why both inevitably arrive at virtual memory as the solution.
Then we dive in, one concept at a time, until the full isomorphism is laid bare.