An Analysis of How Kernel Memory Locking Interacts with User-Space Memory Locking

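On Linux, the mlock/mlockall system calls guarantee that user-space memory stays resident in physical RAM. When an application triggers a kernel-side allocation through a system call such as ioctl, however, the locking behaviour of that kernel-allocated memory has to be analysed separately. The sections below cover the boundary between the two address spaces, the concrete allocation scenarios, the locking implementation, and a test with its performance implications.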

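I. The memory-management boundary between user space and kernel space

1. Address-space isolation

Linux splits every virtual address space into a user part and a kernel part. On 32-bit x86 the default split is 0-3 GB for user space and 3-4 GB for the kernel; on x86-64 both halves are far larger. The CR3 register selects the active page tables and is reloaded when the scheduler switches between processes, which isolates processes from one another. The kernel half is mapped identically in every process, so entering the kernel through a system call normally needs no page-table switch (kernels built with page-table isolation, KPTI, do switch to a separate kernel copy on entry). Kernel memory accessed in kernel mode therefore belongs to a global address space that is not tied to any particular user process.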
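As a rough illustration of the split, the sketch below classifies a virtual address as user or kernel. It assumes two common x86 layouts: the default 3 GB/1 GB split on 32-bit, and 48-bit canonical addressing on x86-64 with 4-level paging. The function name and the layout table are illustrative, and other configurations (5-level paging, non-default VMSPLIT options) use different boundaries.

```python
# Boundaries for two common x86 layouts (assumed; other configurations differ).
LAYOUTS = {
    "x86_32": {"user_max": 0xBFFF_FFFF,
               "kernel_min": 0xC000_0000,
               "addr_max": 0xFFFF_FFFF},
    "x86_64": {"user_max": 0x0000_7FFF_FFFF_FFFF,
               "kernel_min": 0xFFFF_8000_0000_0000,
               "addr_max": 0xFFFF_FFFF_FFFF_FFFF},
}


def classify_address(addr: int, arch: str = "x86_64") -> str:
    """Return 'user', 'kernel', or 'non-canonical' for a virtual address."""
    layout = LAYOUTS[arch]
    if not 0 <= addr <= layout["addr_max"]:
        raise ValueError("address out of range for this architecture")
    if addr <= layout["user_max"]:
        return "user"
    if addr >= layout["kernel_min"]:
        return "kernel"
    return "non-canonical"  # the unused hole between the two halves on x86-64


if __name__ == "__main__":
    print(classify_address(0x7F8E_6C00_0000))             # a typical mmap address
    print(classify_address(0xFFFF_8880_0000_0000))        # inside the kernel half
    print(classify_address(0xC010_0000, arch="x86_32"))   # kernel half on 32-bit
```

An address in the kernel half never appears in a process's VMA list, which is why the user-space locking calls discussed below cannot reach it.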

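2. Differences in allocation paths

  • User-space allocation: malloc → brk/mmap → page fault → the kernel allocates a physical page → a user page-table mapping is established
  • Kernel-space allocation: kmalloc takes objects from the slab allocator, whose pages lie in the kernel's already-mapped direct mapping; vmalloc takes pages from the buddy system and creates new kernel page-table mappings for them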

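3. Scope of the locking mechanism

mlockall(MCL_CURRENT) locks every page that is mapped into the process address space at the time of the call. The kernel tracks a process's memory through struct mm_struct and implements the lock by setting the VM_LOCKED flag on each VMA and faulting in its pages; mappings created later are covered only if MCL_FUTURE is also passed. VM_LOCKED applies to user VMAs only.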
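The per-process scope can be observed from user space. The following is a minimal sketch, assuming the MCL_* values used on x86 and ARM Linux and calling libc through ctypes; the helper names are illustrative. It compares the VmLck line of /proc/self/status before and after the call. The call may fail with ENOMEM or EPERM when RLIMIT_MEMLOCK is too small, in which case the counter stays unchanged.

```python
import ctypes
import re

MCL_CURRENT = 1  # value from <sys/mman.h> on x86 and ARM Linux (assumed here)


def parse_vmlck_kb(status_text: str) -> int:
    """Extract the VmLck value (in kB) from /proc/<pid>/status text."""
    match = re.search(r"^VmLck:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def lock_current_mappings() -> bool:
    """Call mlockall(MCL_CURRENT); return True on success."""
    libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process
    if libc.mlockall(MCL_CURRENT) != 0:
        print("mlockall failed, errno =", ctypes.get_errno())
        return False
    return True


def current_vmlck_kb() -> int:
    with open("/proc/self/status") as f:
        return parse_vmlck_kb(f.read())


if __name__ == "__main__":
    before = current_vmlck_kb()
    lock_current_mappings()
    print(f"VmLck before: {before} kB, after: {current_vmlck_kb()} kB")
```

VmLck counts only pages in this process's own VMAs, so memory that a driver allocates on the process's behalf never contributes to it.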

II. Kernel memory allocation scenarios

1. Direct kernel allocation

When a driver's ioctl handler allocates memory with kmalloc(GFP_KERNEL):

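```c
/* Typical driver snippet; BUF_SIZE is a driver-defined constant */
static long my_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    void __user *user_buf = (void __user *)arg;
    void *kernel_buf = kmalloc(BUF_SIZE, GFP_KERNEL);

    if (!kernel_buf)
        return -ENOMEM;
    /* copy_from_user returns the number of bytes it could not copy */
    if (copy_from_user(kernel_buf, user_buf, BUF_SIZE)) {
        kfree(kernel_buf);
        return -EFAULT;
    }
    /* process the data */
    kfree(kernel_buf);
    return 0;
}
```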

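Memory of this kind:

  • Comes from low memory (ZONE_NORMAL, or ZONE_DMA/ZONE_DMA32), which is permanently mapped in the kernel's direct mapping; kmalloc never returns ZONE_HIGHMEM pages
  • Is not mapped by any user page table
  • Is served by the slab allocator, which in turn obtains its pages from the buddy system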
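Because such memory belongs to no process, it is accounted system-wide rather than in any process's RSS. The sketch below, with illustrative helper names, pulls the kernel-side fields out of /proc/meminfo; running it before and after a burst of driver allocations is one hedged way to see where the memory went.

```python
def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo field name to its number (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kernel_memory_summary(meminfo: dict) -> dict:
    """Pick the fields that account for kernel-side allocations."""
    keys = ("Slab", "SReclaimable", "SUnreclaim",
            "KernelStack", "PageTables", "VmallocUsed")
    return {key: meminfo.get(key, 0) for key in keys}


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        summary = kernel_memory_summary(parse_meminfo(f.read()))
    for key, value in summary.items():
        print(f"{key:>14}: {value} kB")
```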

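2. DMA buffer allocation

With the dma_alloc_coherent interface:

```c
void *dma_buf = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
```

In this case:

  • The memory may come from the DMA zone (ZONE_DMA), depending on the device's DMA mask
  • The returned pointer is already a usable kernel virtual address, so no kmap call is needed
  • The bus address for the device is returned in dma_handle; the allocation does not add an entry to /proc/iomem, which lists registered physical address ranges such as system RAM and device MMIO regions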

3. Kernel memory accessed directly from user space

Direct access from user space is implemented through mmap:

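```c
/* Driver mmap implementation; pfn is the page frame number of the
 * driver's buffer and is assumed to be set up elsewhere in the driver */
static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;

    /* Propagate the result instead of discarding it */
    return remap_pfn_range(vma, vma->vm_start, pfn, size,
                           vma->vm_page_prot);
}
```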

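In this case:

  • The user page table maps the kernel's physical pages
  • The memory is still managed by the kernel
  • The mapping is a user VMA, but remap_pfn_range marks it VM_IO | VM_PFNMAP, and mlock/mlockall skip such special VMAs without setting VM_LOCKED. The pages stay resident regardless, because they are not on the LRU lists and are never candidates for reclaim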
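To check how a driver mapping is actually flagged, the VmFlags line of /proc/<pid>/smaps can be decoded. This is a minimal sketch, assuming the two-letter codes documented for smaps (pf = pure PFN range, io = memory-mapped I/O area, mm = mixed map, lo = pages locked); the function names are illustrative.

```python
import re

HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")


def vma_flags(smaps_text: str) -> dict:
    """Map each 'start-end' range in smaps text to its set of VmFlags codes."""
    result, current = {}, None
    for line in smaps_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = f"{header.group(1)}-{header.group(2)}"
        elif line.startswith("VmFlags:") and current:
            result[current] = set(line.split()[1:])
    return result


def describe(flags: set) -> str:
    """Summarise the flags that matter for locking."""
    kind = "special mapping" if flags & {"pf", "io", "mm"} else "ordinary mapping"
    state = "locked" if "lo" in flags else "not locked"
    return f"{kind}, {state}"


if __name__ == "__main__":
    with open("/proc/self/smaps") as f:
        for vma_range, flags in vma_flags(f.read()).items():
            print(vma_range, "->", describe(flags))
```

Running it against a process that has both mmap'ed a driver buffer and called mlockall shows, per VMA, whether the lock flag was applied.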

III. Comparison of the locking implementations

1. The user-space locking flow

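```c
/* Simplified outline of the mlockall path in mm/mlock.c (not literal kernel code) */
SYSCALL_DEFINE1(mlockall, int, flags)
{
    /* flag validation and the RLIMIT_MEMLOCK check are omitted */
    ret = apply_mlockall_flags(flags);  /* mlock_fixup() sets VM_LOCKED per VMA */
    if (!ret && (flags & MCL_CURRENT))
        mm_populate(0, TASK_SIZE);      /* fault in the existing pages now */
    return ret;
}
```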

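Key steps:

  • Walk all VMAs of the process
  • Call mlock_fixup on each VMA to set the VM_LOCKED flag
  • Fault in the existing pages right away (mm_populate), so that they are resident and moved to the unevictable list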
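The steps can be modelled in a few lines. This is a toy user-space model, not kernel code: the Vma class and the mlockall_current function are hypothetical, the flag values mirror include/linux/mm.h on recent kernels and are used here as illustrative constants, and the skip of special VMAs reflects the check made in mlock_fixup.

```python
from dataclasses import dataclass

# Flag values as found in include/linux/mm.h on recent kernels
# (treated here as illustrative constants).
VM_PFNMAP = 0x00000400
VM_LOCKED = 0x00002000
VM_IO = 0x00004000
VM_DONTEXPAND = 0x00040000
VM_MIXEDMAP = 0x10000000
VM_SPECIAL = VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP


@dataclass
class Vma:
    start: int
    end: int
    flags: int = 0


def mlockall_current(vmas: list) -> int:
    """Set VM_LOCKED on every ordinary VMA; return the number of bytes locked."""
    locked_bytes = 0
    for vma in vmas:
        if vma.flags & VM_SPECIAL:
            continue  # device-style mappings are left untouched
        vma.flags |= VM_LOCKED
        locked_bytes += vma.end - vma.start  # the kernel would fault these in
    return locked_bytes


if __name__ == "__main__":
    vmas = [Vma(0x1000, 0x3000), Vma(0x5000, 0x6000, VM_IO | VM_PFNMAP)]
    print("bytes locked:", mlockall_current(vmas))
```

The model makes the scope explicit: the loop only ever sees entries of the process's VMA list, so memory without a VMA cannot be reached by it.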

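2. Locking properties of kernel memory

Kernel pages have the following properties by default:

  • Their page-table entries keep _PAGE_PRESENT set for as long as the mapping exists
  • They are not placed on the LRU lists (PG_lru is never set), so page reclaim does not scan them
  • They take no part in the access tracking that mark_page_accessed performs for LRU pages
  • Some critical pages are marked PG_reserved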

3. Observing the effect of locking

/proc/<pid>/smaps shows:

```text
7f8e6c000000-7f8e6c021000 rw-p 00000000 00:00 0
Size:                132 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Locked:                0 kB  # kernel-allocated pages add nothing to the locked count
```
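For a whole process it is more convenient to total the Locked: fields than to read them one VMA at a time. A minimal sketch (the function name is illustrative):

```python
import re


def locked_kb(smaps_text: str) -> int:
    """Sum the 'Locked:' fields (kB) over all VMAs in smaps text."""
    values = re.findall(r"^Locked:\s+(\d+)\s+kB", smaps_text, re.MULTILINE)
    return sum(int(kb) for kb in values)


if __name__ == "__main__":
    with open("/proc/self/smaps") as f:
        print("Locked total:", locked_kb(f.read()), "kB")
```

If this total is the same before and after an ioctl that allocates kernel memory, the allocation did not become part of the process's locked set.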

IV. Testing and performance impact

1. Test design

The following module is used for verification:

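```c
/* Test driver module */
static long test_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
    struct page *page = alloc_pages(GFP_KERNEL, 0);

    if (!page)
        return -ENOMEM;
    /* Record the physical page frame for later inspection */
    pr_info("test: allocated pfn %lu\n", page_to_pfn(page));
    /* The page is deliberately not freed in this test;
     * a real driver must keep the pointer and release it */
    return 0;
}

/* User program (fragment) */
mlockall(MCL_CURRENT);
ioctl(fd, ALLOC_CMD);
/* Read /proc/<pid>/pagemap to verify the page state */
```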

2. Analysis of the results

The entries are parsed with a pagemap script:

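```python
# pagemap parsing script; pid and vpn (virtual page number) come from the caller
import struct

with open(f'/proc/{pid}/pagemap', 'rb') as f:
    f.seek(vpn * 8)                      # one 64-bit entry per virtual page
    entry = struct.unpack('Q', f.read(8))[0]
    pfn = entry & 0x7fffffffffffff       # bits 0-54; reads as 0 without CAP_SYS_ADMIN
    swapped = (entry >> 62) & 1          # bit 62: page is in swap
    present = (entry >> 63) & 1          # bit 63: page is present in RAM
```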
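The fragment above leaves the process and the virtual page number to the caller. A self-contained variant, sketched under the same bit layout (PFN in bits 0-54, swapped in bit 62, present in bit 63), looks up the entry for a buffer in the current process. The helper names are illustrative, and since Linux 4.2 the PFN field reads as zero unless the caller has CAP_SYS_ADMIN.

```python
import ctypes
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
PFN_MASK = (1 << 55) - 1  # bits 0-54


def decode_entry(entry: int) -> dict:
    """Split a 64-bit pagemap entry into the fields of interest."""
    present = bool((entry >> 63) & 1)
    return {
        "present": present,
        "swapped": bool((entry >> 62) & 1),
        # Bits 0-54 hold a PFN only when the page is present in RAM.
        "pfn": entry & PFN_MASK if present else None,
    }


def lookup(pid, vaddr: int) -> dict:
    """Read and decode the pagemap entry that covers vaddr."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)  # one 8-byte entry per virtual page
        return decode_entry(struct.unpack("Q", f.read(8))[0])


if __name__ == "__main__":
    buf = ctypes.create_string_buffer(PAGE_SIZE)  # zero-filled on creation
    print(lookup("self", ctypes.addressof(buf)))
```

The lookup is keyed by a user virtual address, so a page that the driver allocated but never mapped into the process has no entry to look up at all.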

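The test shows:

  • The kernel-allocated page does not appear in any user-space VMA
  • The page has no user virtual address, so no pagemap entry of the process refers to its PFN
  • The nr_mlock counter in /proc/vmstat does not change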
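The nr_mlock check can be scripted as a before/after comparison. A minimal sketch with illustrative function names; the step that triggers the driver is left as a placeholder because the device node and command are specific to the test module.

```python
def read_vmstat_counter(text: str, name: str) -> int:
    """Return the named counter from /proc/vmstat text, or 0 if it is absent."""
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key == name:
            return int(value)
    return 0


def nr_mlock() -> int:
    """Current number of mlocked pages, system-wide."""
    with open("/proc/vmstat") as f:
        return read_vmstat_counter(f.read(), "nr_mlock")


if __name__ == "__main__":
    before = nr_mlock()
    # ... trigger the driver's ioctl here (device and command are test-specific) ...
    after = nr_mlock()
    print(f"nr_mlock: {before} -> {after} pages")
```

The counter is system-wide, so the comparison is only meaningful on an otherwise quiet machine.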

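3. Performance impact

When heavy kernel allocation puts the system under memory pressure:

  • Locked user-space memory is kept out of reclaim; RLIMIT_MEMLOCK only caps how much an unprivileged process may lock
  • Allocations that fall below the zone watermarks enter direct reclaim, and the resulting stalls are reported through PSI (/proc/pressure/memory)
  • Contention on mmap_lock may add scheduling latency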
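Both the lock limit and the pressure figures can be read from user space. The sketch below uses the standard resource module for RLIMIT_MEMLOCK and parses /proc/pressure/memory, which exists only on kernels built with PSI support (4.20 or later); the function names are illustrative.

```python
import resource


def memlock_limit() -> tuple:
    """Return the (soft, hard) RLIMIT_MEMLOCK values in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def parse_psi(text: str) -> dict:
    """Parse /proc/pressure/memory text into {'some': {...}, 'full': {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *pairs = line.split()
        result[kind] = {key: float(value) for key, value in
                        (pair.split("=") for pair in pairs)}
    return result


if __name__ == "__main__":
    soft, _hard = memlock_limit()
    print("RLIMIT_MEMLOCK soft:",
          "unlimited" if soft == resource.RLIM_INFINITY else f"{soft} bytes")
    try:
        with open("/proc/pressure/memory") as f:
            print(parse_psi(f.read()))
    except OSError:
        print("PSI not available on this kernel")
```

A rising avg10 value on the "some" line while the driver allocates is a sign that tasks are stalling on memory, even though the locked pages themselves stay resident.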

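V. Conclusions and best practices

The analysis above leads to the following conclusions:

  1. Scope isolation: mlockall only affects pages mapped through user-space VMAs; kernel-allocated memory is outside its control
  2. Different lifetimes: kernel memory is managed by the slab allocator and the buddy system, independently of any process lifetime
  3. Security boundary: user space cannot use memory locking to interfere with kernel memory management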

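For scenarios that need kernel memory to stay resident, the recommendations are:

  • Drivers should use GFP_NOIO or GFP_NOFS where an allocation must not recurse into I/O or filesystem code during reclaim
  • Critical data structures allocated with kmalloc or vmalloc need no extra locking, since kernel memory is never swapped out; apply mlock to the ordinary user-space buffers that exchange data with the driver
  • For DMA, dma_alloc_attrs with DMA_ATTR_NO_KERNEL_MAPPING is an option when the CPU never needs a kernel virtual mapping of the buffer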

The overall architecture is sketched below:

```
+-------------------+     +-------------------+
|  User Space       |     |  Kernel Space     |
|                   |     |                   |
|  mlock() region   |     |  kmalloc pool     |
|  (VM_LOCKED)      |     |  (no lock flag)   |
+--------+----------+     +---------+---------+
         |                          |
         |          Page Table      |
         +--------------------------> PFN management
                                     |
                              +------v------+
                              | Physical    |
                              | memory      |
                              | (DRAM)      |
                              +-------------+
```
