Contents
- Introduction
- readelf -S
- kprobe/handle_mm_fault (SHT_PROGBITS)
- .rel*
  - How bpftool handles it
- .maps
  - How bpftool finds maps
- .BTF
- .BTF.ext
- license
- Other sections
- Sections unrelated to eBPF
  - .llvm_addrsig .rel.debug_* .debug_* .rel.BTF .rel.BTF.ext
- References
Article URL: https://gitee.com/kiraskyler/Articles/blob/master/eBPF/elf.o 文件内容.md
Introduction
This article analyzes the .o file built from the simple_bpf repository on gitee. The environment is openEuler 24.03 LTS SP2.
readelf -S
# readelf -S build/src/trace.bpf.o -W
There are 23 section headers, starting at offset 0xe17e8:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 0e16f3 0000f0 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
[ 3] kprobe/handle_mm_fault PROGBITS 0000000000000000 000040 0001c8 00 AX 0 0 8
[ 4] .relkprobe/handle_mm_fault REL 0000000000000000 08b150 000010 10 I 22 3 8
[ 5] .maps PROGBITS 0000000000000000 000208 000018 00 WA 0 0 8
[ 6] license PROGBITS 0000000000000000 000220 000004 00 WA 0 0 1
[ 7] .debug_loc PROGBITS 0000000000000000 000224 00013f 00 0 0 1
[ 8] .debug_abbrev PROGBITS 0000000000000000 000363 00038a 00 0 0 1
[ 9] .debug_info PROGBITS 0000000000000000 0006ed 052e39 00 0 0 1
[10] .rel.debug_info REL 0000000000000000 08b160 0563e0 10 I 22 9 8
[11] .debug_ranges PROGBITS 0000000000000000 053526 000060 00 0 0 1
[12] .debug_str PROGBITS 0000000000000000 053586 0302c8 01 MS 0 0 1
[13] .BTF PROGBITS 0000000000000000 083850 007511 00 0 0 4
[14] .rel.BTF REL 0000000000000000 0e1540 000020 10 I 22 13 8
[15] .BTF.ext PROGBITS 0000000000000000 08ad64 00019c 00 0 0 4
[16] .rel.BTF.ext REL 0000000000000000 0e1560 000160 10 I 22 15 8
[17] .debug_frame PROGBITS 0000000000000000 08af00 000028 00 0 0 8
[18] .rel.debug_frame REL 0000000000000000 0e16c0 000020 10 I 22 17 8
[19] .debug_line PROGBITS 0000000000000000 08af28 0000ec 00 0 0 1
[20] .rel.debug_line REL 0000000000000000 0e16e0 000010 10 I 22 19 8
[21] .llvm_addrsig LOOS+0xfff4c03 0000000000000000 0e16f0 000003 00 E 22 0 1
[22] .symtab SYMTAB 0000000000000000 08b018 000138 18 1 10 8
kprobe/handle_mm_fault (SHT_PROGBITS)
The program is declared with the SEC macro:
simple_bpf/src/trace.bpf.c: 13
SEC("kprobe/handle_mm_fault")
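The macro expands to __attribute__((section(name), used)), which places the function in a separate section whose name is the string passed to SEC (not the function name).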
/usr/include/bpf/bpf_helpers.h: 36
#define SEC(name) \
_Pragma("GCC diagnostic push") \
_Pragma("GCC diagnostic ignored \"-Wignored-attributes\"") \
__attribute__((section(name), used)) \
_Pragma("GCC diagnostic pop") \
#endif
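The section is 8-byte aligned (Al = 8), which matches BPF's fixed 8-byte instruction slots (a 64-bit immediate load occupies two slots). Its size of 0x1c8 = 456 bytes therefore holds 57 slots.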
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 3] kprobe/handle_mm_fault PROGBITS 0000000000000000 000040 0001c8 00 AX 0 0 8
Disassembly:
# llvm-objdump -d build/src/trace.bpf.o --section=kprobe/handle_mm_fault
0000000000000000 <kprobe_handle>:
0: bf 16 00 00 00 00 00 00 r6 = r1
1: 85 00 00 00 23 00 00 00 call 0x23
2: bf 07 00 00 00 00 00 00 r7 = r0
3: b7 01 00 00 28 00 00 00 r1 = 0x28
Instruction encoding
/usr/include/linux/bpf.h: 72
struct bpf_insn {
__u8 code; /* opcode */
__u8 dst_reg:4; /* dest register */
__u8 src_reg:4; /* source register */
__s16 off; /* signed offset */
__s32 imm; /* signed immediate constant */
};
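The encoding can be checked by hand against the disassembly above. The following is a minimal sketch, assuming a little-endian object file; `struct insn_fields`, `decode_insn` and `insn_count` are illustrative names, not libbpf or kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative type; mirrors the fields of struct bpf_insn without bitfields. */
struct insn_fields {
    uint8_t code;     /* opcode, byte 0 */
    uint8_t dst_reg;  /* low nibble of byte 1 */
    uint8_t src_reg;  /* high nibble of byte 1 */
    int16_t off;      /* bytes 2-3, little-endian */
    int32_t imm;      /* bytes 4-7, little-endian */
};

/* Decode one 8-byte instruction slot (little-endian layout assumed). */
static struct insn_fields decode_insn(const uint8_t raw[8])
{
    struct insn_fields f;
    uint16_t off;
    uint32_t imm;

    assert(raw != NULL);
    f.code = raw[0];
    f.dst_reg = raw[1] & 0x0f;
    f.src_reg = raw[1] >> 4;
    off = (uint16_t)(raw[2] | (raw[3] << 8));
    imm = (uint32_t)raw[4] | ((uint32_t)raw[5] << 8) |
          ((uint32_t)raw[6] << 16) | ((uint32_t)raw[7] << 24);
    f.off = (int16_t)off;
    f.imm = (int32_t)imm;
    return f;
}

/* Number of 8-byte slots in a program section of the given byte size. */
static size_t insn_count(size_t section_size)
{
    return section_size / 8;
}
```

Feeding it the first instruction, `bf 16 00 00 00 00 00 00`, yields opcode 0xbf with dst_reg 6 and src_reg 1, which is the `r6 = r1` shown by llvm-objdump. `insn_count(0x1c8)` gives 57, the same value as sec_insn_cnt in the debugger output further down.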
SEC(abc...) / SEC(?abc)
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf.c: 758
/* libbpf's convention for SEC("?abc...") is that it's just like
* SEC("abc...") but the corresponding bpf_program starts out with
* autoload set to false.
*/
if (sec_name[0] == '?') {
prog->autoload = false;
/* from now on forget there was ? in section name */
sec_name++;
} else {
prog->autoload = true;
}
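How libbpf handles it
libbpf looks for sections of type SHT_PROGBITS. The section name is not restricted and does not have to be .text; in this example it is kprobe/handle_mm_fault.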
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf.c: 3352
static int bpf_object__elf_collect(struct bpf_object *obj)
{
......
else if (sh->sh_type == SHT_PROGBITS && data->d_size > 0) {
if (sh->sh_flags & SHF_EXECINSTR) { // executable
if (strcmp(name, ".text") == 0)
obj->efile.text_shndx = idx;
err = bpf_object__add_programs(obj, data, name, idx); // non-empty, executable SHT_PROGBITS section: load its programs and add them to obj->programs
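When libbpf loads programs it scans the symbol table for FUNC symbols located in a SHT_PROGBITS section flagged SHF_EXECINSTR. The symbol name (not the section name) becomes prog->name. In the output below the symbol is GLOBAL; as far as I can tell from this libbpf version, static (LOCAL) functions outside .text are rejected as entry programs. Value is the program's byte offset within its section, as in any relocatable object, rather than a load address.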
# readelf -s -W build/src/trace.bpf.o
Symbol table '.symtab' contains 13 entries:
Num: Value Size Type Bind Vis Ndx Name
10: 0000000000000000 456 FUNC GLOBAL DEFAULT 3 kprobe_handle
-exec p obj->programs[0]
$1 = {name = 0x5555556d5ad0 "kprobe_handle", sec_name = 0x5555556d5ab0 "kprobe/handle_mm_fault", sec_idx = 3, sec_def = 0x0, sec_insn_off = 0, sec_insn_cnt = 57, sub_insn_off = 0, insns = 0x5555556d9fe0, insns_cnt = 57......
Resolving the program type
find_sec_def looks up the section name kprobe/handle_mm_fault in section_defs, a table that defines the several dozen supported program types.
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf.c: 8652
static const struct bpf_sec_def section_defs[] = {
SEC_DEF("socket", SOCKET_FILTER, 0, SEC_NONE),
SEC_DEF("sk_reuseport/migrate", SK_REUSEPORT, BPF_SK_REUSEPORT_SELECT_OR_MIGRATE, SEC_ATTACHABLE),
SEC_DEF("sk_reuseport", SK_REUSEPORT, BPF_SK_REUSEPORT_SELECT, SEC_ATTACHABLE),
SEC_DEF("kprobe+", KPROBE, 0, SEC_NONE, attach_kprobe),
SEC_DEF("uprobe+", KPROBE, 0, SEC_NONE, attach_uprobe),
SEC_DEF("uprobe.s+", KPROBE, 0, SEC_SLEEPABLE, attach_uprobe),
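The trailing `+` in entries such as "kprobe+" is what lets SEC("kprobe/handle_mm_fault") match. The sketch below shows the matching rule as I understand it from libbpf; `sec_pattern_matches` is a hypothetical helper name, and the exact behavior should be confirmed against the libbpf version in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumed pattern rules (a sketch, not libbpf's actual code):
 *   "type/"  requires extras after the slash  -> prefix match
 *   "type+"  exact "type" or well-formed "type/extras"
 *   "type"   exact match only
 */
static bool sec_pattern_matches(const char *pattern, const char *sec_name)
{
    size_t len;

    assert(pattern != NULL && sec_name != NULL);
    len = strlen(pattern);
    if (len == 0)
        return sec_name[0] == '\0';

    if (pattern[len - 1] == '/')
        return strncmp(sec_name, pattern, len) == 0;

    if (pattern[len - 1] == '+') {
        len--; /* drop the '+' marker */
        if (strncmp(sec_name, pattern, len) != 0)
            return false;
        /* either an exact match or a '/' separator must follow */
        return sec_name[len] == '\0' || sec_name[len] == '/';
    }

    return strcmp(sec_name, pattern) == 0;
}
```

Under these rules "kprobe+" accepts both "kprobe" and "kprobe/handle_mm_fault" but not "kprobes", while a plain entry such as "socket" accepts only the exact name.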
.rel*
# readelf -S -W build/src/trace.bpf.o
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 3] kprobe/handle_mm_fault PROGBITS 0000000000000000 000040 0001c8 00 AX 0 0 8
[ 4] .relkprobe/handle_mm_fault REL 0000000000000000 08b5b8 000010 10 I 22 3 8
[ 9] .debug_info PROGBITS 0000000000000000 0006d7 052d9f 00 0 0 1
[10] .rel.debug_info REL 0000000000000000 08b5c8 056aa0 10 I 22 9 8
[13] .BTF PROGBITS 0000000000000000 083c20 0075a6 00 0 0 4
[14] .rel.BTF REL 0000000000000000 0e2068 000020 10 I 22 13 8
[15] .BTF.ext PROGBITS 0000000000000000 08b1c8 00019c 00 0 0 4
[16] .rel.BTF.ext REL 0000000000000000 0e2088 000160 10 I 22 15 8
[17] .debug_frame PROGBITS 0000000000000000 08b368 000028 00 0 0 8
[18] .rel.debug_frame REL 0000000000000000 0e21e8 000020 10 I 22 17 8
[19] .debug_line PROGBITS 0000000000000000 08b390 0000ec 00 0 0 1
[20] .rel.debug_line REL 0000000000000000 0e2208 000010 10 I 22 19 8
Every .rel* section has an entry size (ES) of 0x10 = sizeof(Elf64_Rel).
typedef struct
{
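Elf64_Addr r_offset; /* Address: byte offset within the target section; for a program section, divide by 8 to get the instruction index */
Elf64_Xword r_info; /* Relocation type and symbol index: the upper 32 bits are the symbol table index */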
} Elf64_Rel;
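The two fields can be unpacked with a few shifts. This is a minimal sketch with a locally defined copy of the layout so that it does not depend on <elf.h>; the helper names are illustrative, and the division by 8 assumes the relocation targets a BPF program section.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as Elf64_Rel, defined locally for the sketch. */
struct rel64 {
    uint64_t r_offset;
    uint64_t r_info;
};

/* Upper 32 bits: index into .symtab. */
static uint32_t rel_sym_index(uint64_t r_info)
{
    return (uint32_t)(r_info >> 32);
}

/* Lower 32 bits: relocation type. */
static uint32_t rel_type(uint64_t r_info)
{
    return (uint32_t)(r_info & 0xffffffffu);
}

/* Number of entries in a SHT_REL section (Size / ES in readelf terms). */
static uint64_t rel_entry_count(uint64_t sh_size, uint64_t sh_entsize)
{
    assert(sh_entsize != 0);
    return sh_size / sh_entsize;
}

/* Instruction slot patched by a relocation against a program section. */
static uint64_t rel_insn_index(uint64_t r_offset)
{
    return r_offset / 8;
}
```

For .relkprobe/handle_mm_fault, `rel_entry_count(0x10, 0x10)` is 1, so the program section has a single relocation.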
How bpftool handles it
Only four kinds of relocation sections are processed: those that target an executable section, plus .rel.struct_ops, .rel.struct_ops.link and .rel.maps.
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf.c: 3352
static int bpf_object__elf_collect(struct bpf_object *obj)
{
......
} else if (sh->sh_type == SHT_REL) {
int targ_sec_idx = sh->sh_info; /* points to other section */
/* Only do relo for section with exec instructions */
if (!section_have_execinstr(obj, targ_sec_idx) &&
strcmp(name, ".rel" STRUCT_OPS_SEC) &&
strcmp(name, ".rel" STRUCT_OPS_LINK_SEC) &&
strcmp(name, ".rel" MAPS_ELF_SEC)) {
pr_info("elf: skipping relo section(%d) %s for section(%d) %s\n",
idx, name, targ_sec_idx,
elf_sec_name(obj, elf_sec_by_idx(obj, targ_sec_idx)) ?: "<?>");
continue;
.maps
This section holds every map defined with SEC(".maps").
/root/simple_bpf/src/trace.bpf.c:7
struct {
__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
__uint(key_size, sizeof(u32));
__uint(value_size, sizeof(u32));
}events SEC(".maps");
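The newer __uint style merely declares null pointers. It expands to int (*name)[val], a pointer to an array of val ints, not int *name[val], which would declare an array of val pointers. Each __uint line therefore takes up one pointer, and three pointers total 0x18 = 24 bytes.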
/usr/include/bpf/bpf_helpers.h:13
#define __uint(name, val) int (*name)[val]
That is why all the data in .maps is zero:
# readelf -x .maps build/src/trace.bpf.o
Hex dump of section '.maps':
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 00000000 ........
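The pointer-to-array trick can be reproduced in ordinary C. The sketch below uses renamed stand-ins (`DEMO_UINT`, `struct demo_map_def`, `DEMO_UINT_VALUE`) instead of the libbpf headers; it assumes a platform where all object pointers have the same size, and it recovers val with sizeof, whereas libbpf, to my understanding, reads the same number from the array type recorded in BTF.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as libbpf's __uint, renamed so the sketch stands alone. */
#define DEMO_UINT(name, val) int (*name)[val]

/* Stand-in for the map definition; 4 is BPF_MAP_TYPE_PERF_EVENT_ARRAY. */
struct demo_map_def {
    DEMO_UINT(type, 4);
    DEMO_UINT(key_size, sizeof(unsigned int));
    DEMO_UINT(value_size, sizeof(unsigned int));
};

/* Three pointers and nothing else: the value lives in the type, not the data. */
static_assert(sizeof(struct demo_map_def) == 3 * sizeof(int (*)[4]),
              "map definition should be exactly three pointers");

/* Recover val from the pointed-to array type. The pointer is never
 * dereferenced at run time because sizeof is evaluated at compile time. */
#define DEMO_UINT_VALUE(def, field) \
    (sizeof(*((def).field)) / sizeof(int))

static size_t demo_map_def_size(void)
{
    return sizeof(struct demo_map_def);
}
```

On a 64-bit target `demo_map_def_size()` is 24, the same as the size of .maps above, and `DEMO_UINT_VALUE(def, type)` yields 4 even though every pointer in `def` is null.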
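The location of each map definition inside .maps is found through the symbol table. Here there is only one map, events, at offset 0 of section 5, with size 24.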
# readelf -s -W build/src/trace.bpf.o
Symbol table '.symtab' contains 13 entries:
Num: Value Size Type Bind Vis Ndx Name
11: 0000000000000000 24 OBJECT GLOBAL DEFAULT 5 events
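The members of a map definition (for example, the name and offset of key_size) are recorded in .BTF. Each member's type_id refers to a pointer-to-array type, and the array's element count carries the val from #define __uint(name, val).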
# bpftool btf dump file build/src/trace.bpf.o
[7] STRUCT '(anon)' size=24 vlen=3
'type' type_id=1 bits_offset=0
'key_size' type_id=1 bits_offset=64
'value_size' type_id=5 bits_offset=128
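How bpftool finds maps
bpftool locates maps through .BTF: the DATASEC entry named .maps lists the map variables.
- Finding the map
  - Entry 354 is the DATASEC. Its size is fixed up to the size of the .maps section, and vlen says it holds 1 item.
  - That item points to BTF entry 6, the VAR events. Its offset is fixed up to the value of events in the symbol table.
- Finding the map definition
  - The type_id of entry 6 names the entry that holds the attributes, here entry 5.
# bpftool btf dump file build/src/trace.bpf.o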
[5] STRUCT '(anon)' size=24 vlen=3
'type' type_id=1 bits_offset=0
'key_size' type_id=1 bits_offset=64
'value_size' type_id=1 bits_offset=128
[6] VAR 'events' type_id=5, linkage=global
[354] DATASEC '.maps' size=0 vlen=1
type_id=6 offset=0 size=24 (VAR 'events')
For the implementation, see bpf_object__init_user_btf_maps:
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf.c: 2581
static int bpf_object__init_user_btf_maps(struct bpf_object *obj, bool strict,
const char *pin_root_path)
.BTF
Layout
btf_header (header)
types area
    struct btf_type
        the size of each record varies with the kind and vlen stored in btf_type->info
    more struct btf_type records ...
strings area (may come before or after the types area)
btf_header
/usr/include/linux/btf.h: 11
struct btf_header {
__u16 magic; // 0xeB9F; the whole /sys/kernel/btf/vmlinux file is laid out like a .BTF section
__u8 version;
__u8 flags;
__u32 hdr_len;
/* All offsets are in bytes relative to the end of this header */
__u32 type_off; /* offset of type section */
__u32 type_len; /* length of type section */
__u32 str_off; /* offset of string section, relative to the end of the header */
__u32 str_len; /* length of string section */
};
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/btf.c: 895
btf->hdr = btf->raw_data; // points directly at the contents of the .BTF section
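Because both offsets are relative to the end of the header, a string lookup has to add hdr_len first. Here is a minimal sketch, assuming the data is in host byte order and that strings are NUL-terminated inside the string area; `struct demo_btf_header` and `demo_btf_str` are illustrative names, not libbpf API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BTF_MAGIC 0xeB9F

/* Local copy of the header layout shown above. */
struct demo_btf_header {
    uint16_t magic;
    uint8_t  version;
    uint8_t  flags;
    uint32_t hdr_len;
    uint32_t type_off;
    uint32_t type_len;
    uint32_t str_off;
    uint32_t str_len;
};

/* Resolve a name_off to a string. Returns NULL when the header is
 * invalid or the offset falls outside the string area. */
static const char *demo_btf_str(const uint8_t *data, size_t size,
                                uint32_t name_off)
{
    struct demo_btf_header hdr;
    uint64_t str_start;

    assert(data != NULL);
    if (size < sizeof(hdr))
        return NULL;
    memcpy(&hdr, data, sizeof(hdr)); /* avoids unaligned access */
    if (hdr.magic != DEMO_BTF_MAGIC || hdr.hdr_len < sizeof(hdr))
        return NULL;

    /* Offsets count from the end of the header, not from byte 0. */
    str_start = (uint64_t)hdr.hdr_len + hdr.str_off;
    if (name_off >= hdr.str_len || str_start + hdr.str_len > size)
        return NULL;
    return (const char *)data + str_start + name_off;
}
```

The same arithmetic with type_off and type_len locates the first struct btf_type record.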
btf_type
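For a STRUCT or UNION, a type record is a btf_type header followed immediately by vlen struct btf_member entries. btf_type carries the basic attributes and each btf_member describes one member.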
/usr/include/linux/btf.h: 31
struct btf_type {
__u32 name_off; // offset into the string area of this .BTF section; resolve with btf__name_by_offset
/* "info" bits arrangement
* bits 0-15: vlen (e.g. # of struct's members)
* bits 16-23: unused
* bits 24-28: kind (e.g. int, ptr, array...etc)
* bits 29-30: unused
* bit 31: kind_flag, currently used by
* struct, union, enum, fwd and enum64
*/
__u32 info;
/* "size" is used by INT, ENUM, STRUCT, UNION, DATASEC and ENUM64.
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
* FUNC, FUNC_PROTO, VAR, DECL_TAG and TYPE_TAG.
* "type" is a type_id referring to another type.
*/
union {
__u32 size;
__u32 type;
};
};
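The bit layout of info determines both the kind and how much data trails the 12-byte header. The following sketch decodes info and computes the record size; the kind numbers and trailing-structure sizes are copied from linux/btf.h as I recall them, and all `demo_` names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the header so its size can be checked at compile time. */
struct demo_btf_type {
    uint32_t name_off;
    uint32_t info;
    uint32_t size_or_type;
};
static_assert(sizeof(struct demo_btf_type) == 12, "btf_type header is 12 bytes");

/* Kind numbers from linux/btf.h (subset that carries trailing data). */
enum demo_btf_kind {
    DEMO_KIND_INT = 1, DEMO_KIND_ARRAY = 3, DEMO_KIND_STRUCT = 4,
    DEMO_KIND_UNION = 5, DEMO_KIND_ENUM = 6, DEMO_KIND_FUNC_PROTO = 13,
    DEMO_KIND_VAR = 14, DEMO_KIND_DATASEC = 15, DEMO_KIND_DECL_TAG = 17,
    DEMO_KIND_ENUM64 = 19,
};

static uint32_t demo_info_vlen(uint32_t info)  { return info & 0xffff; }
static uint32_t demo_info_kind(uint32_t info)  { return (info >> 24) & 0x1f; }
static uint32_t demo_info_kflag(uint32_t info) { return info >> 31; }

/* Total bytes of one type record: header plus kind-specific trailer. */
static uint32_t demo_type_record_size(uint32_t info)
{
    const uint32_t base = sizeof(struct demo_btf_type);
    uint32_t vlen = demo_info_vlen(info);

    switch (demo_info_kind(info)) {
    case DEMO_KIND_INT:
    case DEMO_KIND_VAR:
    case DEMO_KIND_DECL_TAG:
        return base + 4;          /* one trailing __u32 */
    case DEMO_KIND_ARRAY:
        return base + 12;         /* struct btf_array */
    case DEMO_KIND_STRUCT:
    case DEMO_KIND_UNION:
    case DEMO_KIND_DATASEC:
    case DEMO_KIND_ENUM64:
        return base + vlen * 12;  /* btf_member, btf_var_secinfo, btf_enum64 */
    case DEMO_KIND_ENUM:
    case DEMO_KIND_FUNC_PROTO:
        return base + vlen * 8;   /* btf_enum, btf_param */
    default:
        return base;              /* PTR, TYPEDEF, FUNC, FWD, ... */
    }
}
```

A STRUCT with vlen=3, such as the anonymous map definition struct in the dump above, therefore occupies 12 + 3 * 12 = 48 bytes in the types area.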
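The actual size of each record depends on the kind of the btf_type. Take the definition of an int: BTF_TYPE_ENC produces the struct btf_type itself, and one more line of attributes has to follow it.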
/usr/src/debug/libbpf-1.2.2-11.oe2403sp2.x86_64/src/libbpf.c: 452
static const char strs[] = "\0int";
__u32 types[] = {
/* int */
BTF_TYPE_INT_ENC(1, BTF_INT_SIGNED, 0, 32, 4),
};
/usr/src/debug/libbpf-1.2.2-11.oe2403sp2.x86_64/src/libbpf_internal.h: 65
#define BTF_TYPE_INT_ENC(name, encoding, bits_offset, bits, sz) \
BTF_TYPE_ENC(name, BTF_INFO_ENC(BTF_KIND_INT, 0, 0), sz), \
BTF_INT_ENC(encoding, bits_offset, bits)
BTF_INT_ENC packs its three arguments (encoding, bits_offset, bits) into a single trailing __u32. It is not a struct btf_member; that structure, described in the next section, follows only STRUCT and UNION records.
btf_member
A STRUCT or UNION btf_type is followed by vlen members, where vlen is the low 16 bits of btf_type->info.
/usr/include/linux/btf.h: 122
struct btf_member {
__u32 name_off;
__u32 type; // type id of this member
/* If the type info kind_flag is set, the btf_member offset
* contains both member bitfield size and bit offset. The
* bitfield size is set for bitfield members. If the type
* info kind_flag is not set, the offset contains only bit
* offset.
*/
__u32 offset; // offset of this member within the enclosing type, in bits; when kind_flag is set, bits 24 and above hold the bitfield size
};
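The two interpretations of offset can be written out as a pair of helpers. This is a small sketch that follows the comment in the struct above; the `demo_` names are illustrative, and kind_flag is assumed to have been read from bit 31 of the parent btf_type's info.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit offset of a member inside its struct or union. */
static uint32_t demo_member_bit_offset(uint32_t offset, bool kind_flag)
{
    return kind_flag ? (offset & 0xffffff) : offset;
}

/* Bitfield width in bits, or 0 for a regular member. */
static uint32_t demo_member_bitfield_size(uint32_t offset, bool kind_flag)
{
    return kind_flag ? (offset >> 24) : 0;
}

/* Byte offset of a regular (non-bitfield) member. */
static uint32_t demo_member_byte_offset(uint32_t offset, bool kind_flag)
{
    uint32_t bits = demo_member_bit_offset(offset, kind_flag);

    assert(bits % 8 == 0); /* only meaningful for byte-aligned members */
    return bits / 8;
}
```

With the values from the map definition dump, bits_offset=64 and bits_offset=128 correspond to byte offsets 8 and 16, one pointer apart.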
Within a struct btf_type record, the member data immediately follows the btf_type header.
/usr/src/debug/libbpf-1.2.2-11.oe2403sp2.x86_64/src/btf.h: 522
static inline struct btf_member *btf_members(const struct btf_type *t) // members start right after the header
{
return (struct btf_member *)(t + 1);
/* Get bit offset of a member with specified index. */
static inline __u32 btf_member_bit_offset(const struct btf_type *t,
__u32 member_idx)
{
const struct btf_member *m = btf_members(t) + member_idx; // the member_idx-th member
.BTF.ext
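This section is the key to CO-RE: it records the relocation information the program uses.
Layout
See the libbpf function that parses this section: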
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/btf.c: 2832
struct btf_ext *btf_ext__new(const __u8 *data, __u32 size)
Layout overview:
btf_ext_header (header)
func_info area:
    ---> starts btf_ext_header->func_info_off bytes after the end of btf_ext_header <---
    record_size (u32)
    btf_ext_info_sec
    btf_ext_info_sec->num_info records of record_size bytes each
    more btf_ext_info_sec headers and records ...
line_info area:
    same structure as func_info; the line/func/core_relo areas may appear in any order
core_relo_info area:
    same structure as func_info and line_info; the line/func/core_relo areas may appear in any order
btf_ext_header
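core_relo_off and core_relo_len belong to CO-RE. A core_relo_len greater than 0 means the object carries CO-RE relocations, so the target kernel's BTF (vmlinux) has to be parsed.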
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf_internal.h: 423
struct btf_ext_header {
__u16 magic;
__u8 version;
__u8 flags;
__u32 hdr_len;
/* All offsets are in bytes relative to the end of this header */
__u32 func_info_off;
__u32 func_info_len;
__u32 line_info_off;
__u32 line_info_len;
/* optional part of .BTF.ext header */
__u32 core_relo_off;
__u32 core_relo_len;
};
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/btf.c: 2785
static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
{
const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
btf_ext_info_sec
One entry within the func_info, line_info or core_relo_info area.
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf_internal.h: 451
struct btf_ext_info_sec {
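__u32 sec_name_off; // offset into the string area of .BTF (not .BTF.ext); see btf__name_by_offset
__u32 num_info; // number of records that follow (struct bpf_core_relo in the core_relo area)
/* Followed by num_info * record_size number of bytes */
__u8 data[]; // record payload (struct bpf_core_relo in the core_relo area)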
};
record_size is the size of each record that follows, for example struct bpf_core_relo in the core_relo area.
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/btf.c: 2655
static int btf_ext_setup_info(struct btf_ext *btf_ext,
struct btf_ext_sec_setup_param *ext_sec)
{
const struct btf_ext_info_sec *sinfo;
void *info;
info = btf_ext->data + btf_ext->hdr->hdr_len + ext_sec->off; // locate the func_info area; ext_sec->off = btf_ext->hdr->func_info_off
record_size = *(__u32 *)info;
sinfo = info + sizeof(__u32); // skip the record_size field
while (info_left) {
unsigned int sec_hdrlen = sizeof(struct btf_ext_info_sec); // the rest of the area is n * (struct btf_ext_info_sec + btf_ext_info_sec->num_info * record_size)
num_records = sinfo->num_info;
total_record_size = sec_hdrlen + (__u64)num_records * record_size;
info_left -= total_record_size;
sinfo = (void *)sinfo + total_record_size;
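The walk over one info area can be reproduced outside libbpf. The sketch below counts the records in an area given as a raw byte buffer; it assumes host byte order, `demo_count_ext_records` is an illustrative name, and it omits some of libbpf's validation (for instance, the sketch tolerates a section header with num_info equal to 0).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Count all records in one info area (func_info, line_info or core_relo).
 * `area` points at the leading record_size word and `len` is the area
 * length taken from the header. Returns -1 on malformed input. */
static long demo_count_ext_records(const uint8_t *area, uint32_t len)
{
    const uint32_t sec_hdrlen = 8; /* sec_name_off + num_info */
    uint32_t record_size, num_info, pos;
    uint64_t total;
    long count = 0;

    assert(area != NULL);
    if (len < sizeof(record_size))
        return -1;
    memcpy(&record_size, area, sizeof(record_size));
    pos = sizeof(record_size);

    while (pos < len) {
        if (len - pos < sec_hdrlen)
            return -1;
        /* num_info is the second word of btf_ext_info_sec */
        memcpy(&num_info, area + pos + 4, sizeof(num_info));
        total = sec_hdrlen + (uint64_t)num_info * record_size;
        if (total > len - pos)
            return -1;
        count += num_info;
        pos += (uint32_t)total;
    }
    return count;
}
```

Each loop iteration consumes one btf_ext_info_sec header plus its num_info records, which mirrors the total_record_size bookkeeping in btf_ext_setup_info.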
bpf_core_relo
This is where the CO-RE information lives: which instruction needs patching (insn_off) and which field access path it refers to (access_str_off).
/usr/include/linux/bpf.h: 7338
struct bpf_core_relo {
__u32 insn_off;
__u32 type_id; // type id in the types area of .BTF
__u32 access_str_off; // offset into the string area of .BTF; the string is the access path, e.g. 0:1:2:3
enum bpf_core_relo_kind kind;
};
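The access string is a colon-separated list of indices. As a hedged illustration of its shape only, the sketch below splits such a string into numbers; `demo_parse_access_str` is a hypothetical helper, and how each index is then applied to the BTF type (member index versus array element) is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a "0:1:2:3" style access string into indices.
 * Returns the number of indices written to `out`, or -1 if the string
 * is malformed or holds more than `max` indices. */
static int demo_parse_access_str(const char *spec, unsigned int *out,
                                 size_t max)
{
    size_t n = 0;
    char *end;

    assert(spec != NULL && out != NULL);
    while (*spec) {
        unsigned long v;

        if (*spec < '0' || *spec > '9' || n == max)
            return -1;
        v = strtoul(spec, &end, 10);
        out[n++] = (unsigned int)v;
        if (*end == ':' && end[1] != '\0')
            spec = end + 1;   /* another index follows */
        else if (*end == '\0')
            spec = end;       /* done */
        else
            return -1;        /* trailing ':' or unexpected character */
    }
    return (int)n;
}
```

For the example in the comment above, "0:1:2:3" parses into four indices.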
The bpf_core_relo records start at btf_ext_info_sec->data. Each record is record_size bytes, and there are btf_ext_info_sec->num_info of them.
/usr/src/debug/libbpf-1.2.2-11.oe2403sp2.x86_64/src/libbpf_internal.h: 397
#define for_each_btf_ext_rec(seg, sec, i, rec) \
for (i = 0, rec = (void *)&(sec)->data; \
i < (sec)->num_info; \
i++, rec = (void *)rec + (seg)->rec_size)
license
/root/simple_bpf/src/trace.bpf.c: 33
char _license[] SEC("license") = "GPL";
Likewise, placement in this section is controlled by the SEC macro.
# hexdump -C -s 0x220 -n 0x4 build/src/trace.bpf.o
00000220 47 50 4c 00 |GPL.|
Other sections
version
bpftool can parse this section, but the example object does not contain it.
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf.c: 1413
static int
bpf_object__init_kversion(struct bpf_object *obj, void *data, size_t size)
{
__u32 kver;
if (!data || size != sizeof(kver)) {
pr_warn("invalid kver section in %s\n", obj->path);
return -LIBBPF_ERRNO__FORMAT;
}
memcpy(&kver, data, sizeof(kver));
obj->kern_version = kver;
maps
Without the leading dot, this legacy section is no longer supported.
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf.c: 3352
static int bpf_object__elf_collect(struct bpf_object *obj)
{
......
} else if (strcmp(name, "maps") == 0) {
pr_warn("elf: legacy map definitions in 'maps' section are not supported by libbpf v1.0+\n");
Sections unrelated to eBPF
.llvm_addrsig .rel.debug_* .debug_* .rel.BTF .rel.BTF.ext
bpftool does not use these sections.
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf.c: 3300
static bool is_sec_name_dwarf(const char *name)
{
/* approximation, but the actual list is too long */
return str_has_pfx(name, ".debug_");
static bool ignore_elf_section(Elf64_Shdr *hdr, const char *name)
{
/* no special handling of .strtab */
if (hdr->sh_type == SHT_STRTAB)
return true;
/* ignore .llvm_addrsig section as well */
if (hdr->sh_type == SHT_LLVM_ADDRSIG)
return true;
/* no subprograms will lead to an empty .text section, ignore it */
if (hdr->sh_type == SHT_PROGBITS && hdr->sh_size == 0 &&
strcmp(name, ".text") == 0)
return true;
/* DWARF sections */
if (is_sec_name_dwarf(name)) // .debug_*
return true;
if (str_has_pfx(name, ".rel")) {
name += sizeof(".rel") - 1;
/* DWARF section relocations */
if (is_sec_name_dwarf(name)) // .rel.debug_*
return true;
/* .BTF and .BTF.ext don't need relocations */
if (strcmp(name, BTF_ELF_SEC) == 0 ||
strcmp(name, BTF_EXT_ELF_SEC) == 0) // .rel.BTF .rel.BTF.ext
return true;
}