文章目录
- 前言
- 一、text_poke_early
-
- [1.1 text_poke_early简介](#1.1 text_poke_early简介)
- [1.2 用途](#1.2 用途)
- 二、text_poke_smp
-
- [2.1 简介](#2.1 简介)
-
- [2.1.1 text_poke_smp函数](#2.1.1 text_poke_smp函数)
- [2.2.2 stop_machine_text_poke简介](#2.2.2 stop_machine_text_poke简介)
- [2.2.3 text_poke函数](#2.2.3 text_poke函数)
- [2.2 用途](#2.2 用途)
- [三、text_poke_smp 内核hook](#三、text_poke_smp 内核hook)
前言
Linux x86_64平台指令替换函数有两种类型:
(1)在内核启动阶段指令替换函数:text_poke_early
(2)在内核运行阶段指令替换函数:text_poke_smp/bp
在3.12.0版本以后,内核运行阶段指令替换函数由text_poke_smp改为text_poke_bp。
这里我以centos7 3.10.0为例主要介绍text_poke_smp函数。
一、text_poke_early
1.1 text_poke_early简介
c
// linux-3.10/arch/x86/kernel/alternative.c
/**
* text_poke_early - Update instructions on a live kernel at boot time
* @addr: address to modify
* @opcode: source of the copy
* @len: length to copy
*
* When you use this code to patch more than one byte of an instruction
* you need to make sure that other CPUs cannot execute this code in parallel.
* Also no thread must be currently preempted in the middle of these
* instructions. And on the local CPU you need to be protected again NMI or MCE
* handlers seeing an inconsistent instruction while you patch.
*/
void *__init_or_module text_poke_early(void *addr, const void *opcode,
size_t len)
{
unsigned long flags;
local_irq_save(flags);
memcpy(addr, opcode, len);
sync_core();
local_irq_restore(flags);
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */
return addr;
}
这段代码实现了在系统引导时(boot time)修改内核指令的函数text_poke_early()。它用于在引导过程中更新内核指令,以便在早期阶段进行修补。
函数的参数包括:
c
addr:要修改的指令的地址。
opcode:要复制的指令的源地址。
len:要复制的指令长度。
函数的实现步骤如下:
(1)使用local_irq_save()保存当前CPU的中断状态,并禁用中断。
(2)使用memcpy()将源指令(opcode)的内容复制到目标地址(addr)。
(3)调用sync_core()进行同步,以确保指令修改生效。
(4)使用local_irq_restore()恢复之前保存的中断状态,重新启用中断。
1.2 用途
主要用于以下几个地方:
(1)alternatives 指令替换:具体可以参考:Linux x86_64架构 动态替换 altinstructions
内核启动时进行alternatives 指令替换。
c
/* Replace instructions with better alternatives for this CPU type.
This runs before SMP is initialized to avoid SMP problems with
self modifying code. This implies that asymmetric systems where
APs have less capabilities than the boot processor are not handled.
Tough. Make sure you disable such features by hand. */
void __init_or_module apply_alternatives(struct alt_instr *start,
struct alt_instr *end)
{
struct alt_instr *a;
u8 *instr, *replacement;
u8 insnbuf[MAX_PATCH_LEN];
DPRINTK("%s: alt table %p -> %p\n", __func__, start, end);
/*
* The scan order should be from start to end. A later scanned
* alternative code can overwrite a previous scanned alternative code.
* Some kernel functions (e.g. memcpy, memset, etc) use this order to
* patch code.
*
* So be careful if you want to change the scan order to any other
* order.
*/
for (a = start; a < end; a++) {
instr = (u8 *)&a->instr_offset + a->instr_offset;
replacement = (u8 *)&a->repl_offset + a->repl_offset;
BUG_ON(a->replacementlen > a->instrlen);
BUG_ON(a->instrlen > sizeof(insnbuf));
BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
if (!boot_cpu_has(a->cpuid))
continue;
memcpy(insnbuf, replacement, a->replacementlen);
/* 0xe8 is a relative jump; fix the offset. */
if (*insnbuf == 0xe8 && a->replacementlen == 5)
*(s32 *)(insnbuf + 1) += replacement - instr;
add_nops(insnbuf + a->replacementlen,
a->instrlen - a->replacementlen);
text_poke_early(instr, insnbuf, a->instrlen);
}
}
(2)Linux jump label机制:具体可参考:Linux Static Keys和jump label机制
内核启动时修改jump label。
c
// linux-3.10/init/main.c
start_kernel()
-->jump_label_init()
-->arch_jump_label_transform_static()
c
__init_or_module void arch_jump_label_transform_static(struct jump_entry *entry,
enum jump_label_type type)
{
__jump_label_transform(entry, type, text_poke_early);
}
c
static void __jump_label_transform(struct jump_entry *entry,
enum jump_label_type type,
void *(*poker)(void *, const void *, size_t))
{
union jump_code_union code;
if (type == JUMP_LABEL_ENABLE) {
code.jump = 0xe9;
code.offset = entry->target -
(entry->code + JUMP_LABEL_NOP_SIZE);
} else
memcpy(&code, ideal_nops[NOP_ATOMIC5], JUMP_LABEL_NOP_SIZE);
(*poker)((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
}
(3)Linux Static calls机制:Linux Static calls机制
c
// linux-5.15/arch/x86/kernel/static_call.c
static void __ref __static_call_transform(void *insn, enum insn_type type, void *func)
{
const void *emulate = NULL;
int size = CALL_INSN_SIZE;
const void *code;
switch (type) {
case CALL:
code = text_gen_insn(CALL_INSN_OPCODE, insn, func);
if (func == &__static_call_return0) {
emulate = code;
code = &xor5rax;
}
break;
case NOP:
code = x86_nops[5];
break;
case JMP:
code = text_gen_insn(JMP32_INSN_OPCODE, insn, func);
break;
case RET:
code = text_gen_insn(RET_INSN_OPCODE, insn, func);
size = RET_INSN_SIZE;
break;
}
if (memcmp(insn, code, size) == 0)
return;
if (unlikely(system_state == SYSTEM_BOOTING))
return text_poke_early(insn, code, size);
text_poke_bp(insn, code, size, emulate);
}
(4)ftrace
二、text_poke_smp
2.1 简介
2.1.1 text_poke_smp函数
c
/**
* text_poke_smp - Update instructions on a live kernel on SMP
* @addr: address to modify
* @opcode: source of the copy
* @len: length to copy
*
* Modify multi-byte instruction by using stop_machine() on SMP. This allows
* user to poke/set multi-byte text on SMP. Only non-NMI/MCE code modifying
* should be allowed, since stop_machine() does _not_ protect code against
* NMI and MCE.
*
* Note: Must be called under get_online_cpus() and text_mutex.
*/
void *__kprobes text_poke_smp(void *addr, const void *opcode, size_t len)
{
struct text_poke_params tpp;
struct text_poke_param p;
p.addr = addr;
p.opcode = opcode;
p.len = len;
tpp.params = &p;
tpp.nparams = 1;
atomic_set(&stop_machine_first, 1);
wrote_text = 0;
/* Use __stop_machine() because the caller already got online_cpus. */
__stop_machine(stop_machine_text_poke, (void *)&tpp, cpu_online_mask);
return addr;
}
这段代码实现了在SMP(对称多处理器)系统上更新内核中的指令的函数text_poke_smp()。它使用stop_machine()函数在SMP系统上修改多字节指令。这允许用户在SMP系统上修改多字节的内核指令。但需要注意的是,由于stop_machine()函数无法保护代码免受NMI(非屏蔽中断)和MCE(机器检查异常)的影响,因此只应允许非NMI和MCE的代码进行修改。
函数的参数包括:
c
addr:要修改的指令的地址。
opcode:要复制的指令的源地址。
len:要复制的指令长度。
主要调用stop_machine_text_poke函数。
2.2.2 stop_machine_text_poke简介
c
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
*/
static atomic_t stop_machine_first;
static int wrote_text;
struct text_poke_params {
struct text_poke_param *params;
int nparams;
};
static int __kprobes stop_machine_text_poke(void *data)
{
struct text_poke_params *tpp = data;
struct text_poke_param *p;
int i;
if (atomic_xchg(&stop_machine_first, 0)) {
for (i = 0; i < tpp->nparams; i++) {
p = &tpp->params[i];
text_poke(p->addr, p->opcode, p->len);
}
smp_wmb(); /* Make sure other cpus see that this has run */
wrote_text = 1;
} else {
while (!wrote_text)
cpu_relax();
smp_mb(); /* Load wrote_text before following execution */
}
for (i = 0; i < tpp->nparams; i++) {
p = &tpp->params[i];
flush_icache_range((unsigned long)p->addr,
(unsigned long)p->addr + p->len);
}
/*
* Intel Archiecture Software Developer's Manual section 7.1.3 specifies
* that a core serializing instruction such as "cpuid" should be
* executed on _each_ core before the new instruction is made visible.
*/
sync_core();
return 0;
}
这段代码实现了使用stop_machine()函数进行内核文本交叉修改的功能。它通过stop_machine_text_poke()函数实现了在停机状态下修改内核文本的操作。
代码中使用了以下全局变量:
c
atomic_t类型的stop_machine_first:用于标记是否为第一次执行stop_machine_text_poke()函数。
int类型的wrote_text:表示是否已经修改了文本。
结构体text_poke_params定义了传递给stop_machine_text_poke()函数的参数:
c
params:指向text_poke_param结构体数组的指针,用于存储要修改的参数。
nparams:参数数组的长度。
stop_machine_text_poke()函数的实现步骤如下:
(1)通过传递的参数data获取text_poke_params结构体指针tpp。
(2)如果atomic_xchg()函数将stop_machine_first的值交换为0,表示当前是第一次执行stop_machine_text_poke()函数,则执行以下操作:
a:遍历tpp->params数组,依次获取text_poke_param结构体指针p。
b:调用text_poke()函数,将p->addr地址处的指令替换为p->opcode指令,长度为p->len。
c: 使用smp_wmb()确保其他CPU能够看到此次修改。
d:将wrote_text设置为1,表示已经修改了文本。
(3)否则,即不是第一次执行stop_machine_text_poke()函数,则进入循环,直到wrote_text变为非0。
(4)使用smp_mb()在执行后加载wrote_text之前进行内存屏障操作。
(5)遍历tpp->params数组,依次获取text_poke_param结构体指针p。
(6)调用flush_icache_range()函数,刷新从p->addr到(p->addr + p->len)范围内的指令缓存。
(7)使用sync_core()函数进行核心同步,确保新指令在每个核心上都可见。
2.2.3 text_poke函数
c
/**
* text_poke - Update instructions on a live kernel
* @addr: address to modify
* @opcode: source of the copy
* @len: length to copy
*
* Only atomic text poke/set should be allowed when not doing early patching.
* It means the size must be writable atomically and the address must be aligned
* in a way that permits an atomic write. It also makes sure we fit on a single
* page.
*
* Note: Must be called under text_mutex.
*/
void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
{
unsigned long flags;
char *vaddr;
struct page *pages[2];
int i;
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
} else {
pages[0] = virt_to_page(addr);
WARN_ON(!PageReserved(pages[0]));
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
BUG_ON(!pages[0]);
local_irq_save(flags);
set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
if (pages[1])
set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
clear_fixmap(FIX_TEXT_POKE0);
if (pages[1])
clear_fixmap(FIX_TEXT_POKE1);
local_flush_tlb();
sync_core();
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */
for (i = 0; i < len; i++)
BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
local_irq_restore(flags);
return addr;
}
这段代码实现了在实时内核上更新指令的函数text_poke()。它用于在实时内核中修改指定地址的指令。
函数的参数包括:
c
addr:要修改的指令的地址。
opcode:要复制的指令的源地址。
len:要复制的指令的长度。
函数的实现步骤如下:
(1)根据地址addr的特性,判断是否位于核心内核文本区域。如果不是核心内核文本区域,使用vmalloc_to_page()将地址转换为对应的内存页,并存储在pages数组中的第一个位置和第二个位置。
(2)如果地址位于核心内核文本区域,使用virt_to_page()将地址转换为对应的内存页,并存储在pages数组中的第一个位置。同时,使用WARN_ON()检查该页是否为保留页。
(3)使用BUG_ON()检查pages[0]是否为空。
(4)使用local_irq_save()保存当前中断状态,并禁用中断。
(5)使用set_fixmap()将pages[0]与FIX_TEXT_POKE0进行映射,将其物理地址设置为页表项的值。
(6)如果pages[1]不为空,使用set_fixmap()将pages[1]与FIX_TEXT_POKE1进行映射,将其物理地址设置为页表项的值。
(7)将fix_to_virt()的结果赋给vaddr,即获取FIX_TEXT_POKE0映射的虚拟地址。
(8)使用memcpy()将opcode的内容复制到vaddr和addr在页内的相对偏移位置上,长度为len。
(9)使用clear_fixmap()清除FIX_TEXT_POKE0的映射。
(10)如果pages[1]不为空,使用clear_fixmap()清除FIX_TEXT_POKE1的映射。
(11)使用local_flush_tlb()刷新TLB(转换后备缓冲)以确保新的页表项生效。
(12)使用sync_core()进行核心同步,确保新指令在每个核心上都可见。
(13)使用一个循环遍历检查修改后的指令是否和原始指令一致。如果不一致,使用BUG_ON()触发错误。
(14)使用local_irq_restore()恢复之前保存的中断状态。
(15)返回被修改的地址。
该函数用于在实时内核中修改指令。它将指定地址的指令替换为新的指令,并进行了一系列的同步操作和错误检查,以确保修改的指令能够正确执行。需要注意的是,在调用text_poke()函数之前,必须在text_mutex的保护下进行,以确保互斥访问。
2.2 用途
(1)Linux jump label机制:具体可参考:Linux Static Keys和jump label机制
内核运行时修改jump label。
c
static_key_slow_inc/dec
-->jump_label_update()
-->__jump_label_update()
-->arch_jump_label_transform()
c
void arch_jump_label_transform(struct jump_entry *entry,
enum jump_label_type type)
{
get_online_cpus();
mutex_lock(&text_mutex);
__jump_label_transform(entry, type, text_poke_smp);
mutex_unlock(&text_mutex);
put_online_cpus();
}
(2)kprobe机制:优化kprobe,使用jmp指令替换int3指令。
c
/*
* Replace breakpoints (int3) with relative jumps.
* Caller must call with locking kprobe_mutex and text_mutex.
*/
void __kprobes arch_optimize_kprobes(struct list_head *oplist)
{
struct optimized_kprobe *op, *tmp;
int c = 0;
list_for_each_entry_safe(op, tmp, oplist, list) {
WARN_ON(kprobe_disabled(&op->kp));
/* Setup param */
setup_optimize_kprobe(&jump_poke_params[c],
jump_poke_bufs[c].buf, op);
list_del_init(&op->list);
if (++c >= MAX_OPTIMIZE_PROBES)
break;
}
/*
* text_poke_smp doesn't support NMI/MCE code modifying.
* However, since kprobes itself also doesn't support NMI/MCE
* code probing, it's not a problem.
*/
text_poke_smp_batch(jump_poke_params, c);
}
(3)Linux Static calls机制:Linux Static calls机制
c
// linux-5.15/arch/x86/kernel/static_call.c
static void __ref __static_call_transform(void *insn, enum insn_type type, void *func)
{
const void *emulate = NULL;
int size = CALL_INSN_SIZE;
const void *code;
switch (type) {
case CALL:
code = text_gen_insn(CALL_INSN_OPCODE, insn, func);
if (func == &__static_call_return0) {
emulate = code;
code = &xor5rax;
}
break;
case NOP:
code = x86_nops[5];
break;
case JMP:
code = text_gen_insn(JMP32_INSN_OPCODE, insn, func);
break;
case RET:
code = text_gen_insn(RET_INSN_OPCODE, insn, func);
size = RET_INSN_SIZE;
break;
}
if (memcmp(insn, code, size) == 0)
return;
if (unlikely(system_state == SYSTEM_BOOTING))
return text_poke_early(insn, code, size);
text_poke_bp(insn, code, size, emulate);
}
(4)ftrace
三、text_poke_smp 内核hook
text_poke_smp 可以在内核运行时修改内存指令,因此可以用来进行内核hook。
例子请参考:
https://blog.csdn.net/bin_linux96/article/details/105776231
https://blog.csdn.net/dog250/article/details/105254739
https://blog.csdn.net/dog250/article/details/105787199
https://blog.csdn.net/dog250/article/details/84201114
https://blog.csdn.net/qq_21792169/article/details/84583275
https://richardweiyang-2.gitbook.io/kernel-exploring/00-index/04-ftrace_internal