The Refault Distance Algorithm Explained

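1. Background and Origin of the Problem

1.1 Limitations of Traditional LRU

Page cache management in the Linux kernel has long faced one core problem: cache thrashing. Under memory pressure, a traditional LRU evicts pages by list position alone, with no regard for how soon they will be needed again. An active page that has just been evicted is then accessed again almost immediately and must be re-read from disk, which causes performance jitter.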

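1.2 Core Questions

How can "incidentally accessed" pages be told apart from "genuinely active" ones?
How can the kernel avoid evicting pages that are about to be accessed again?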

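2. Underlying Logic and Core Idea

2.1 Basic Concept

Refault distance: the amount of page cache activity that takes place between a page's eviction and its next access (the refault). The kernel measures it with an age counter that advances on every eviction from, and activation out of, the inactive list; the difference between two readings is a lower bound on the number of inactive pages accessed in between.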

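2.2 Core Data Structures

    // Simplified illustration, not the kernel's actual definitions
    struct page {
        unsigned long flags;      // page flag bits
        struct list_head lru;     // LRU list linkage
        unsigned long private;    // page-private data
        // ... other fields
    };

    // Key statistics, shown here as plain fields of struct zone for simplicity;
    // the kernel keeps them as VM counters, and their location varies by version
    struct zone {
        // ...
        unsigned long workingset_refault;   // refault count
        unsigned long workingset_activate;  // activation count
        unsigned long inactive_age;         // age counter of the inactive list
        // ...
    };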

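2.3 Core Logic of the Algorithm

2.3.1 Three Key Points in Time

First access: the page is read into memory and added to the tail of the inactive list
Eviction: the page is removed from the head of the inactive list, and the value of the age counter at that moment is recorded
Refault: the page is accessed again and misses the cache, and the distance since its eviction is computed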
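The three points in time can be sketched with a toy model. This is only an illustration under simplifying assumptions, not kernel code: the class `ToyPageCache` and its method names are hypothetical, the `shadow` dictionary stands in for whatever record is kept about an evicted page, and because the toy has no active list only evictions advance the age counter.

```python
from collections import OrderedDict


class ToyPageCache:
    """Toy model of an inactive list with eviction stamps (hypothetical names)."""

    def __init__(self, capacity):
        self.capacity = capacity        # max resident pages on the inactive list
        self.inactive = OrderedDict()   # resident pages, oldest first
        self.shadow = {}                # evicted page -> age counter at eviction
        self.inactive_age = 0           # advanced on each eviction

    def access(self, page):
        """Return the refault distance if `page` refaults, else None."""
        if page in self.inactive:       # cache hit: nothing to measure
            return None
        distance = None
        if page in self.shadow:         # point 3: refault, compute the distance
            distance = self.inactive_age - self.shadow.pop(page)
        if len(self.inactive) >= self.capacity:
            victim, _ = self.inactive.popitem(last=False)
            self.shadow[victim] = self.inactive_age   # point 2: eviction stamp
            self.inactive_age += 1
        self.inactive[page] = True      # point 1: page enters the inactive list
        return distance


cache = ToyPageCache(capacity=2)
for name in ["A", "B", "C", "D"]:       # C evicts A, D evicts B
    cache.access(name)
distance_a = cache.access("A")          # A refaults after two evictions
print(distance_a)
```

Storing only the counter value at eviction is what keeps the bookkeeping cheap: the distance falls out of a single subtraction when the page comes back.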

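2.3.2 Key Formulas

    refault_distance = current_inactive_age - eviction_inactive_age

    active_threshold = inactive_list_size * 3/4

The 3/4 factor is the simplified model used throughout this article. Mainline mm/workingset.c instead compares the refault distance against the size of the active list (the working set size in later kernels).

Decision rule:

If `refault_distance < active_threshold`: the page is still part of the active working set and goes on the active list
If `refault_distance >= active_threshold`: the page has cooled down and goes on the inactive list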
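The decision rule can be written as a few lines of Python. This is a minimal sketch of this article's simplified rule with its 3/4 threshold; the function name `classify_refault` and its parameters are made up for illustration and do not correspond to a kernel interface.

```python
def classify_refault(current_age, eviction_age, inactive_list_size):
    """Apply the article's simplified rule; returns 'active' or 'inactive'."""
    refault_distance = current_age - eviction_age
    # The 3/4 factor is the article's model, not a kernel constant.
    active_threshold = inactive_list_size * 3 // 4
    return "active" if refault_distance < active_threshold else "inactive"


# With a 400-page inactive list the threshold is 300.
near = classify_refault(current_age=350, eviction_age=100, inactive_list_size=400)
boundary = classify_refault(current_age=400, eviction_age=100, inactive_list_size=400)
print(near, boundary)
```

The second call sits exactly on the threshold (distance 300) and is classified as inactive, because the rule uses a strict `<` comparison.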

3. Worked Example

3.1 Scenario

Assumed system configuration:

Inactive list size of the memory zone: 1000 pages
active_threshold = 1000 × 3/4 = 750
Initial inactive_age counter: 0

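3.2 Tracking a Page's Life Cycle

Time points below are labelled with the value of inactive_age.

Step 1: first access to page A

Time point T0:

Page A is read from disk
It is added to the tail of the inactive list
inactive_age = 0; no eviction stamp exists for A yet, since the stamp is recorded at eviction time

Step 2: memory pressure builds and reclaim starts

Time point T100:

inactive_age has grown to 100
Page A is still on the inactive list
Other pages keep being evicted and activated, so inactive_age keeps growing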

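Step 3: page A is evicted

Time point T500:

inactive_age = 500
Page A is removed from the head of the inactive list
A.evicted_at_age = 500 is recorded
The page's memory is freed; only a small shadow entry holding the eviction stamp remains in the page cache index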

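Step 4: page A is accessed again

Time point T800:

Current inactive_age = 800
A process accesses page A again; the access misses the cache and the page is read back from disk
refault_distance = 800 - 500 = 300
Comparison: 300 < 750 (active_threshold)

Step 5: the decision

Because 300 < 750, the algorithm concludes:

Page A was accessed again shortly after its eviction
A therefore belongs to the active working set
A is placed directly at the tail of the active list, so it is not immediately evicted again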

3.3 Contrast Case: an Inactive Page

Consider page B:

Evicted at T600 (evicted_at_age = 600)
Accessed again at T1500
refault_distance = 1500 - 600 = 900
900 >= 750, so B is placed on the inactive list
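The arithmetic for pages A and B can be checked with a short script. It is a sketch that only replays the numbers from this scenario; the variable names are illustrative, and the last line additionally derives the latest counter value at which page A would still have been activated under the same threshold.

```python
INACTIVE_LIST_SIZE = 1000
ACTIVE_THRESHOLD = INACTIVE_LIST_SIZE * 3 // 4   # 750, the article's simplified threshold

# name -> (age counter at eviction, age counter at refault)
pages = {"A": (500, 800), "B": (600, 1500)}

results = {}
for name, (evicted_at, refaulted_at) in pages.items():
    distance = refaulted_at - evicted_at
    target = "active" if distance < ACTIVE_THRESHOLD else "inactive"
    results[name] = (distance, target)
    print(f"page {name}: distance={distance} -> {target} list")

# Latest counter value at which page A would still be activated.
last_active_age_for_a = pages["A"][0] + ACTIVE_THRESHOLD - 1
print(last_active_age_for_a)
```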

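4. Implementation Details

4.1 Key Call Chains

Call sites differ between kernel versions. The outline below matches 4.x-era kernels and should be checked against the source tree you are reading.

    Eviction:   shrink_page_list() → __remove_mapping() → workingset_eviction()   // stores the shadow entry
    Refault:    add_to_page_cache_lru() → workingset_refault()                    // decision point
    Activation: mark_page_accessed() → workingset_activation()                    // advances the age counter
    Swap-in of anonymous pages (newer kernels): do_swap_page() → workingset_refault()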

4.2 Core Code Logic (Simplified)

    // Simplified sketch of the decision; not the kernel's actual signature or locking
    void workingset_refault(struct page *page, unsigned long refault_distance)
    {
        unsigned long inactive_file;   // size of the inactive list
        unsigned long threshold;

        // Current list size (the real helper also takes a node argument)
        inactive_file = node_page_state(NR_INACTIVE_FILE);

        // Threshold as defined in section 2.3.2
        threshold = inactive_file * 3 / 4;

        // The key decision
        if (refault_distance < threshold) {
            // Part of the active working set
            SetPageActive(page);
            workingset_activate++;     // statistics counter
            list_add(&page->lru, &lru_active);
        } else {
            // Not part of the active working set
            ClearPageActive(page);
            list_add(&page->lru, &lru_inactive);
        }
    }

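5. Usage and Monitoring

5.1 Monitoring Metrics

Viewing the workingset statistics:

    # Show the key counters
    grep workingset /proc/vmstat

    # Example output:
    # workingset_refault 123456
    # workingset_activate 98765
    # workingset_nodereclaim 0

Newer kernels report the refault and activation counters split into `_anon` and `_file` variants.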
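To turn these counters into a single figure, the output can be parsed and the share of refaults that led to an activation computed. The snippet below is a sketch that works on the sample output shown here rather than on a live system; `parse_workingset` is a hypothetical helper name, and on a real machine the text would come from /proc/vmstat.

```python
SAMPLE_VMSTAT = """\
workingset_refault 123456
workingset_activate 98765
workingset_nodereclaim 0
"""


def parse_workingset(text):
    """Return {counter_name: value} for lines whose name starts with 'workingset_'."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("workingset_"):
            counters[parts[0]] = int(parts[1])
    return counters


counters = parse_workingset(SAMPLE_VMSTAT)
activation_ratio = counters["workingset_activate"] / counters["workingset_refault"]
print(f"{activation_ratio:.1%} of refaults were activated")
```

A high ratio suggests that many evicted pages were still part of the working set, which is a hint that the cache is too small for the workload.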

    # Show the state of the page cache
    grep -E "(Active|Inactive).*file" /proc/meminfo

    # Example output:
    # Active(file): 123456 kB
    # Inactive(file): 789012 kB
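The list sizes reported by /proc/meminfo can be converted to pages and fed into this article's simplified threshold formula. The sketch below parses the sample values shown here; the 4 KiB page size and the helper name `file_lru_pages` are assumptions made for illustration.

```python
import re

SAMPLE_MEMINFO = """\
Active(file):     123456 kB
Inactive(file):   789012 kB
"""

PAGE_SIZE_KB = 4  # assumption: 4 KiB pages


def file_lru_pages(text):
    """Return (active_pages, inactive_pages) parsed from meminfo-style text."""
    sizes = {}
    pattern = r"^(Active|Inactive)\(file\):\s+(\d+) kB"
    for name, kb in re.findall(pattern, text, re.M):
        sizes[name] = int(kb) // PAGE_SIZE_KB
    return sizes["Active"], sizes["Inactive"]


active_pages, inactive_pages = file_lru_pages(SAMPLE_MEMINFO)
threshold = inactive_pages * 3 // 4   # the article's simplified threshold
print(active_pages, inactive_pages, threshold)
```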

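Tracing with tracepoints (this assumes your kernel provides a workingset event group; check that the directory below exists first):

    # Enable refault tracing
    echo 1 > /sys/kernel/debug/tracing/events/workingset/workingset_refault/enable

    # View the trace output
    cat /sys/kernel/debug/tracing/trace_pipe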

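5.2 Tuning Parameters

Related parameter: `vfs_cache_pressure`. Note that it controls how readily the kernel reclaims dentry and inode caches, not page cache pages, so it influences the refault behaviour described in this article only indirectly.

    # Show the current value
    cat /proc/sys/vm/vfs_cache_pressure

    # Adjust the value (default = 100)
    # Larger values make the kernel reclaim dentry and inode caches more aggressively
    echo 150 > /proc/sys/vm/vfs_cache_pressure

Meaning of the values:

`vfs_cache_pressure = 100`: default behaviour
`vfs_cache_pressure < 100`: the kernel prefers to keep dentry and inode caches
`vfs_cache_pressure > 100`: the kernel reclaims dentry and inode caches more aggressively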

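5.3 Practical Recommendations

Scenario 1: database server

    # The database has a large working set: reduce reclaim of filesystem metadata caches
    echo 50 > /proc/sys/vm/vfs_cache_pressure

    # Monitor the refault rate
    watch -n 1 'grep workingset /proc/vmstat'

Scenario 2: memory-constrained environment

    # Reclaim filesystem metadata caches more aggressively
    echo 200 > /proc/sys/vm/vfs_cache_pressure

    # Monitor paging activity at the same time
    vmstat 1   # watch the si/so and bi/bo columns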

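5.4 Diagnostic Tools

Deeper analysis with `trace-cmd` (again assuming the kernel exposes workingset trace events):

    # Record workingset events
    trace-cmd record -e 'workingset:*'

    # Analyse the result
    trace-cmd report | grep -A5 -B5 "refault"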

Writing a custom monitoring script:

    #!/usr/bin/env python3
    import time

    def read_counter(prefix):
        """Sum all /proc/vmstat counters whose name starts with prefix."""
        total = 0
        with open('/proc/vmstat', 'r') as f:
            for line in f:
                name, value = line.split()
                if name.startswith(prefix):
                    total += int(value)
        return total

    def monitor_workingset():
        prev_refault = prev_activate = None
        while True:
            # Extract the key counters (also covers the _anon/_file variants)
            refault = read_counter('workingset_refault')
            activate = read_counter('workingset_activate')
            if prev_refault is not None:
                # Per-second rates, given the 1 s sleep below
                refault_rate = refault - prev_refault
                activate_rate = activate - prev_activate
                ratio = activate_rate / max(refault_rate, 1) * 100
                print(f"refaults: {refault_rate}/s, activations: {activate_rate}/s, "
                      f"activation ratio: {ratio:.1f}%")
            prev_refault, prev_activate = refault, activate
            time.sleep(1)

    if __name__ == '__main__':
        monitor_workingset()

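6. Strengths and Limitations

6.1 Strengths

Adaptive: adjusts dynamically to the actual access pattern
Thrash-resistant: reduces eviction thrashing of the page cache
Low overhead: only a small amount of metadata has to be maintained
Predictive: uses recent refault behaviour to estimate how likely a page is to be accessed again

6.2 Limitations

Cold start: a freshly booted system needs time to build up an access history
Abrupt pattern changes: when the working set shifts suddenly, an adjustment period may be needed
Memory overhead: the shadow entries kept for evicted pages consume memory of their own, and the kernel has to reclaim them in turn (counted by workingset_nodereclaim)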

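7. Summary

By quantifying the distance between a page's eviction and its next access, the refault distance algorithm distinguishes incidentally accessed pages from genuinely active ones. Its implementation in the Linux kernel reflects the following design principles:

Evidence-based decisions: it relies on observed access data rather than fixed heuristics
Lightweight tracking: it keeps the performance overhead to a minimum
Self-tuning: it adapts to the characteristics of different workloads

In practice, understanding the algorithm helps with:

Diagnosing cache performance problems
Optimising memory-sensitive applications
Designing more efficient memory usage strategies
Understanding the deeper mechanisms of Linux memory management

With appropriate monitoring and tuning, the refault distance algorithm can improve memory efficiency and overall performance on workloads that would otherwise thrash the page cache.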