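"Structure determines properties, and properties determine uses." Without understanding a thing's structure, it is hard to truly understand it.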
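Let's look at the structure of a Mach-O file through a sample executable.
Basic structure of Mach-O
- Header: file type, target architecture, and similar metadata
- Load Commands: describe the file's logical structure and layout in virtual memory
- Data: the data of the segments defined in the Load Commands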

Header

The Header structure is defined in loader.h (`<mach-o/loader.h>`).
```c
/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    // Magic number: a 64-bit Mach-O has two possible values
    // #define MH_MAGIC_64 0xfeedfacf -- byte order matches the host (e.g. a little-endian arm64/x86_64 file on an Intel or Apple silicon Mac)
    // #define MH_CIGAM_64 0xcffaedfe -- byte-swapped relative to the host (e.g. a big-endian file from when macOS ran on PowerPC, read on a little-endian host)
    uint32_t magic;         /* mach magic number identifier */
    // CPU type, defined in machine.h
    // The example shows 0x0100000C, i.e. CPU_TYPE_ARM64: per the definitions below, 0x0000000C | 0x01000000 = 0x0100000C
    // #define CPU_ARCH_ABI64 0x01000000 /* 64 bit ABI */
    // #define CPU_TYPE_ARM   ((cpu_type_t) 12)
    // #define CPU_TYPE_ARM64 (CPU_TYPE_ARM | CPU_ARCH_ABI64)
    int32_t cputype;        /* cpu specifier */
    /*
     * ARM64 subtypes: the specific ARM64 variant.
     * The example shows 0, i.e. CPU_SUBTYPE_ARM64_ALL
     */
    // #define CPU_SUBTYPE_ARM64_ALL ((cpu_subtype_t) 0)
    // #define CPU_SUBTYPE_ARM64_V8  ((cpu_subtype_t) 1)
    // #define CPU_SUBTYPE_ARM64E    ((cpu_subtype_t) 2)
    int32_t cpusubtype;     /* machine specifier */
    // File type
    /**
     * #define MH_OBJECT   0x1 -- .o file (a .a is a collection of .o files)
     * #define MH_EXECUTE  0x2 -- executable
     * #define MH_DYLIB    0x6 -- dynamic library
     * #define MH_DYLINKER 0x7 -- the dyld dynamic linker
     * #define MH_DSYM     0xa -- symbol file (dSYM)
     */
    // The example has 2, i.e. MH_EXECUTE, an executable
    uint32_t filetype;      /* type of file */
    // Number of load commands; 23 in the example
    uint32_t ncmds;         /* number of load commands */
    // Total size of the Load Commands region; 2864 bytes in the example
    uint32_t sizeofcmds;    /* the size of all the load commands */
    // Mach-O flags, defined as bit flags; the example sets:
    /**
     * #define MH_NOUNDEFS 0x1      -- no undefined references
     * #define MH_DYLDLINK 0x4      -- already statically linked; input for the dynamic linker
     * #define MH_TWOLEVEL 0x80     -- two-level namespace: symbols are bound by library name + symbol name, reducing name clashes (see Reference 1)
     * #define MH_PIE      0x200000 -- the main executable is loaded at a random address each time, for security
     */
    uint32_t flags;         /* flags */
    // Reserved
    uint32_t reserved;      /* reserved */
};
```
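To make the fields concrete, here is a minimal sketch that copies the header out of a file already loaded into memory and checks the values discussed above. It assumes the file's byte order matches the host (the MH_MAGIC_64 case) and, so that it also compiles off macOS, mirrors the struct and constants locally instead of including `<mach-o/loader.h>`; the names `mach_header_64_t`, `read_header_64` and `is_arm64_executable` are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of struct mach_header_64 (32 bytes, no padding). */
typedef struct {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
} mach_header_64_t;

#define MH_MAGIC_64    0xfeedfacfu
#define MH_CIGAM_64    0xcffaedfeu
#define CPU_ARCH_ABI64 0x01000000
#define CPU_TYPE_ARM   12
#define CPU_TYPE_ARM64 (CPU_TYPE_ARM | CPU_ARCH_ABI64)
#define MH_EXECUTE     0x2u
#define MH_PIE         0x200000u

/* Copies the header out of buf. Returns 0 on success, -1 if buf is too short
 * or the magic is not MH_MAGIC_64 in host byte order (byte-swapped
 * MH_CIGAM_64 files are rejected rather than swapped, for brevity). */
int read_header_64(const uint8_t *buf, size_t len, mach_header_64_t *out) {
    if (len < sizeof *out) return -1;
    memcpy(out, buf, sizeof *out);   /* memcpy avoids unaligned reads */
    return out->magic == MH_MAGIC_64 ? 0 : -1;
}

/* Nonzero for an arm64 executable like the one in this article. */
int is_arm64_executable(const mach_header_64_t *h) {
    return h->cputype == CPU_TYPE_ARM64 && h->filetype == MH_EXECUTE;
}
```

For the example executable, `read_header_64` would succeed and `is_arm64_executable` would return 1, with `ncmds` 23 and `sizeofcmds` 2864. A byte-swapped file is simply rejected here; a real parser would swap every field instead.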
Load Commands
Each load command has its own corresponding struct.
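Every load command begins with the same two fields, `cmd` and `cmdsize` (`struct load_command` in loader.h), so a parser can step through all of them without knowing every command type, much like a type-length-value format. The sketch below assumes a 64-bit image already in memory whose `ncmds` and `sizeofcmds` have been read from the header, with byte order matching the host; `load_command_t`, `lc_visitor` and `walk_load_commands` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* The common prefix of every load command. */
typedef struct {
    uint32_t cmd;       /* e.g. 0x19 for LC_SEGMENT_64 */
    uint32_t cmdsize;   /* total size of this command, including these two fields */
} load_command_t;

#define MACH_HEADER_64_SIZE 32u

/* Called once per command with its file offset. */
typedef void (*lc_visitor)(const load_command_t *lc, size_t file_offset, void *ctx);

/* Visits each load command. Returns the number visited, or -1 if a command
 * would run past sizeofcmds or past the end of the buffer. */
int walk_load_commands(const uint8_t *buf, size_t len, uint32_t ncmds,
                       uint32_t sizeofcmds, lc_visitor visit, void *ctx) {
    size_t off = MACH_HEADER_64_SIZE;                 /* commands follow the header */
    size_t end = MACH_HEADER_64_SIZE + (size_t)sizeofcmds;
    if (end > len) return -1;
    for (uint32_t i = 0; i < ncmds; i++) {
        load_command_t lc;
        if (off + sizeof lc > end) return -1;
        memcpy(&lc, buf + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || off + lc.cmdsize > end) return -1;
        if (visit) visit(&lc, off, ctx);
        off += lc.cmdsize;                            /* jump by the declared length */
    }
    return (int)ncmds;
}
```

In the example binary the first command (`__PAGEZERO`'s LC_SEGMENT_64) starts at offset 0x20, right after the 32-byte header, and the next one at 0x20 + its `cmdsize`.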
LC_SEGMENT_64
```c
/*
 * The 64-bit segment load command indicates that a part of this file is to be
 * mapped into a 64-bit task's address space. If the 64-bit segment has
 * sections then section_64 structures directly follow the 64-bit segment
 * command and their size is reflected in cmdsize.
 */
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t cmd;           /* LC_SEGMENT_64 */
    uint32_t cmdsize;       /* includes sizeof section_64 structs */
    char     segname[16];   /* segment name */
    uint64_t vmaddr;        /* memory address of this segment */
    uint64_t vmsize;        /* memory size of this segment */
    uint64_t fileoff;       /* file offset of this segment */
    uint64_t filesize;      /* amount to map from the file */
    int32_t  maxprot;       /* maximum VM protection */
    int32_t  initprot;      /* initial VM protection */
    uint32_t nsects;        /* number of sections in segment */
    uint32_t flags;         /* flags */
};
```
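Segments that use the segment_command_64 struct
Segment: __PAGEZERO
Used to catch NULL pointer dereferences: it reserves the lowest 4 GB of the address space with no access permissions, so dereferencing NULL (or any small address) faults.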

```c
#define LC_SEGMENT_64 0x19 // a 64-bit segment
// vm_prot.h
typedef int vm_prot_t;
#define VM_PROT_NONE    ((vm_prot_t) 0x00)
// read / write / execute
#define VM_PROT_READ    ((vm_prot_t) 0x01) /* read permission */
#define VM_PROT_WRITE   ((vm_prot_t) 0x02) /* write permission */
#define VM_PROT_EXECUTE ((vm_prot_t) 0x04) /* execute permission */
/* ... */
```
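Field | Value | Description |
---|---|---|
cmd | 0x19 | load command type (LC_SEGMENT_64) |
cmdsize | 0x48 | size of this load command: 0x48 = 0x00000068 - 0x00000020 (it starts right after the 32-byte header at 0x20, and the next command starts at 0x68) |
segname | 0x5F5F504147455A45524F000000000000 | segment name, here __PAGEZERO; in ASCII 5F = '_', 50 = 'P', 41 = 'A', ..., 4F = 'O' |
vmaddr | 0 | start address of the segment in virtual memory; 8 bytes (uint64_t) |
vmsize | 0x0000000100000000 | size of the segment in virtual memory: 2^32 = 4 GB, i.e. the first 4 GB of the 64-bit address space belong to __PAGEZERO |
fileoff | 0 | offset in the file (on-disk view) |
filesize | 0 | bytes occupied in the file (on-disk view); __PAGEZERO takes no disk space at all |
maxprot | 0 | maximum VM protection: none, i.e. not readable, not writable, and not executable |
initprot | 0 | initial VM protection: none |
nsects | 0 | number of sections in the segment; 0 here |
flags | 0 | flags; none |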
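`maxprot` and `initprot` are bit masks of the VM_PROT_* values above. A small helper that renders them in `ls -l` style makes tables like this easier to read: `__PAGEZERO`'s 0 becomes `---`, and a segment whose `initprot` includes read and execute, like `__TEXT`, would show `r-x`. This is a sketch; `prot_to_string` is a hypothetical name and the constants are copied locally.

```c
typedef int vm_prot_t;
#define VM_PROT_NONE    ((vm_prot_t) 0x00)
#define VM_PROT_READ    ((vm_prot_t) 0x01)
#define VM_PROT_WRITE   ((vm_prot_t) 0x02)
#define VM_PROT_EXECUTE ((vm_prot_t) 0x04)

/* Renders a protection mask as "rwx"-style text; out must hold 4 chars. */
void prot_to_string(vm_prot_t prot, char out[4]) {
    out[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```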
Segment: __TEXT (code)
Describes the code segment.

It is also a segment_command_64. Its initprot includes VM_PROT_EXECUTE, declaring that this part can be executed. The segment contains 9 sections.
Section: __text
Each section is described by the following struct:
```c
struct section_64 { /* for 64-bit architectures */
    char     sectname[16];  /* name of this section */
    char     segname[16];   /* segment this section goes in */
    uint64_t addr;          /* memory address of this section */
    uint64_t size;          /* size in bytes of this section */
    uint32_t offset;        /* file offset of this section */
    uint32_t align;         /* section alignment (power of 2) */
    uint32_t reloff;        /* file offset of relocation entries */
    uint32_t nreloc;        /* number of relocation entries */
    uint32_t flags;         /* flags (section type and attributes)*/
    uint32_t reserved1;     /* reserved (for offset or index) */
    uint32_t reserved2;     /* reserved (for count or sizeof) */
    uint32_t reserved3;     /* reserved */
};
```

```c
#define S_REGULAR                0x0        /* regular section */
#define S_ATTR_PURE_INSTRUCTIONS 0x80000000 /* section contains only machine instructions */
#define S_ATTR_SOME_INSTRUCTIONS 0x00000400 /* section contains some machine instructions */
```
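Field | Value | Description |
---|---|---|
sectname | 0x5F5F7465787400000000000000000000 | section name, __text |
segname | 0x5F5F5445585400000000000000000000 | name of the segment the section belongs to, __TEXT |
addr | 0x0000000100005F04 | start address in virtual memory |
size | 0x0000000000000564 | size of the section in bytes |
offset | 0x5F04 | file offset of the code; differs from app to app |
align | 4 | alignment |
reloff | 0 | file offset of relocation entries used for static linking; nonzero values can be seen e.g. in __objc_const inside a .a file |
nreloc | 0 | number of relocation entries for static linking |
flags | 0x80000400 | flags: type S_REGULAR plus the attributes S_ATTR_PURE_INSTRUCTIONS (0x80000000) and S_ATTR_SOME_INSTRUCTIONS (0x400); see loader.h |
reserved1 | | reserved; for stub and symbol-pointer sections, the index into the indirect symbol table |
reserved2 | | reserved; for stub sections, the size in bytes of each stub |
reserved3 | | reserved |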
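`section_64.flags` packs the section type into its low byte and attribute bits into the remaining bits. A minimal decoding sketch, with the masks copied from loader.h into local macros (the helper names are hypothetical): for `__text`, 0x80000400 decodes to type S_REGULAR with both instruction attributes set.

```c
#include <stdint.h>

#define SECTION_TYPE             0x000000ffu  /* low byte: section type */
#define SECTION_ATTRIBUTES       0xffffff00u  /* remaining bits: attributes */
#define S_REGULAR                0x0u
#define S_CSTRING_LITERALS       0x2u
#define S_SYMBOL_STUBS           0x8u
#define S_ATTR_PURE_INSTRUCTIONS 0x80000000u
#define S_ATTR_SOME_INSTRUCTIONS 0x00000400u

uint32_t section_type(uint32_t flags)       { return flags & SECTION_TYPE; }
uint32_t section_attributes(uint32_t flags) { return flags & SECTION_ATTRIBUTES; }

/* Nonzero if the section is marked as containing machine code. */
int section_has_code(uint32_t flags) {
    return (flags & (S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS)) != 0;
}
```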

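__TEXT is mapped from file offset 0 to the virtual address 0x0000000100000000, immediately after the 4 GB occupied by __PAGEZERO, so a section's file offset is its virtual address minus 0x100000000. The app's machine code therefore starts at file offset 0x5F04 (0x100005F04 - 0x100000000); the bytes before it are taken up by the header, the load commands, and padding. The screenshot above confirms this.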
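The same translation works for any address, as long as you use the segment that contains it: file offset = address - `vmaddr` + `fileoff`. A sketch under the assumption that the relevant `segment_command_64` fields have already been read; `segment_range_t` and `vmaddr_to_fileoff` are hypothetical names, and addresses are unslid (as stored in the file, before ASLR).

```c
#include <stdint.h>

/* The segment_command_64 fields needed for address translation. */
typedef struct {
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
} segment_range_t;

/* Translates an unslid virtual address into a file offset. Returns 0 on
 * success, -1 if the address is not backed by file bytes of this segment
 * (e.g. anything in __PAGEZERO, whose filesize is 0). */
int vmaddr_to_fileoff(const segment_range_t *seg, uint64_t addr, uint64_t *out) {
    if (addr < seg->vmaddr || addr - seg->vmaddr >= seg->filesize) return -1;
    *out = seg->fileoff + (addr - seg->vmaddr);
    return 0;
}
```

With `__TEXT` at `vmaddr` 0x100000000 and `fileoff` 0, the `__text` address 0x100005F04 maps to 0x5F04, matching the table above.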
Section: __stubs
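These are the dynamically linked symbols. reserved2 here is 12: for a symbol-stubs section it holds the size in bytes of each stub (on arm64 a stub is three 4-byte instructions), so the number of stubs is size / 12. The section's address in the binary is 0x0000000100006468, right after __text (0x100005F04 + 0x564 = 0x100006468).

Go to 0x0000000100006468 to inspect it.

It holds the entries for symbols that are resolved at run time from the system and other dynamic libraries.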
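For S_SYMBOL_STUBS sections loader.h gives reserved1 a second meaning as well: it is the index, in the indirect symbol table, of the first stub's entry. A sketch of mapping a stub address to its indirect-symbol-table index under that layout; `stubs_section_t`, `stub_count` and `stub_indirect_index` are hypothetical names, and the section size and reserved1 used in any test are made up.

```c
#include <stdint.h>

/* The section_64 fields that matter for a symbol-stubs section. */
typedef struct {
    uint64_t addr;       /* virtual address of the first stub */
    uint64_t size;       /* total bytes in the section */
    uint32_t reserved1;  /* indirect symbol table index of the first stub */
    uint32_t reserved2;  /* size in bytes of one stub */
} stubs_section_t;

uint64_t stub_count(const stubs_section_t *s) {
    return s->reserved2 ? s->size / s->reserved2 : 0;
}

/* Returns the indirect symbol table index for the stub starting at
 * stub_addr, or -1 if the address is outside the section or mid-stub. */
int64_t stub_indirect_index(const stubs_section_t *s, uint64_t stub_addr) {
    if (s->reserved2 == 0 || stub_addr < s->addr || stub_addr >= s->addr + s->size)
        return -1;
    uint64_t delta = stub_addr - s->addr;
    if (delta % s->reserved2 != 0) return -1;
    return (int64_t)(s->reserved1 + delta / s->reserved2);
}
```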
Section: __stub_helper
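Loading dynamic libraries involves binding symbols, for example the external symbols that __stubs above jumps to; __stub_helper contains the helper code that lets this process complete (it triggers lazy binding the first time such a symbol is called).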
Section: __objc_stubs
__objc_stubs is a section in iOS binaries that contains stub functions for Objective-C message sends.
iOS apps compiled with recent versions of Xcode can generate stubs for msgSend calls, where each stub is just a call to the actual msgSend address after setting a specific selector.
This appears to be an optimization in newer toolchains: the selector-loading code moves out of every call site into one shared stub per selector, which mainly reduces code size, while the method lookup itself still happens inside objc_msgSend. To be explored further later.
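Section: __objc_methname
Objective-C method names (selectors), stored as C strings:
```c
#define S_CSTRING_LITERALS 0x2 /* section with only literal C strings*/
```

Section: __objc_classname
Objective-C class names, organized the same way as __objc_methname.
Section: __objc_methtype
Objective-C method signatures (type encodings).
The actual stored strings can be found in the Data part.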

Section: __cstring
C string constants.
Section: __unwind_info
Stores the information used to unwind the stack when handling exceptions.
Segment: __DATA (data)
Describes how the data part is organized; this segment also contains several sections.
Section: __got
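Non-lazy symbol pointers: dyld binds the symbols in this table immediately at load time.

dyld_stub_binder (which binds symbols) and objc_msgSend (which sends messages) live here because binding them lazily would make no sense: dyld_stub_binder is what performs lazy binding in the first place, and objc_msgSend is needed by practically every Objective-C call.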
Section: __la_symbol_ptr
By contrast, these are lazy symbol pointers: every pointer in the table initially points into __TEXT.__stub_helper and is bound on first use.
Section: __cfstring
Core Foundation string constants (CFString objects).
Section: __objc_classlist
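Records all classes defined in the app (metaclasses are not listed separately; each is reached through its class's isa). The section stores a sequence of pointers, each pointing to the address of a class structure.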

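Here the entry's address is 0x100008090. Subtracting 0x100000000 (the size of __PAGEZERO) gives file offset 0x8090; this shortcut works because, in this binary, each segment is mapped at its file offset plus 0x100000000.

The value stored there is 0x00000001000091A0, a pointer. Following it to 0x91A0 leads to __DATA.__objc_data, where the actual Objective-C class structures are stored.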
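The two hops above can be sketched in code: read the 8-byte little-endian pointer at the entry's file offset, then convert that pointer back into a file offset. This assumes, as in this binary, that pointers are stored as plain addresses and that subtracting one base (0x100000000 here) gives file offsets; binaries built with chained fixups store encoded values instead, which this sketch does not handle. `read_u64_le` and `class_offset_from_classlist` are hypothetical names.

```c
#include <stdint.h>
#include <stddef.h>

/* Reads a little-endian uint64 at a file offset (arm64 and x86_64 Mach-O
 * files are little-endian). Returns 0 on success, -1 if out of bounds. */
int read_u64_le(const uint8_t *buf, size_t len, uint64_t off, uint64_t *out) {
    if (off > len || len - off < 8) return -1;
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | buf[off + i];
    *out = v;
    return 0;
}

/* Follows one __objc_classlist entry: entry_addr is the entry's unslid
 * address, image_base the value to subtract to get file offsets.
 * Stores the class structure's file offset in *class_off. */
int class_offset_from_classlist(const uint8_t *buf, size_t len, uint64_t entry_addr,
                                uint64_t image_base, uint64_t *class_off) {
    uint64_t ptr;
    if (entry_addr < image_base) return -1;
    if (read_u64_le(buf, len, entry_addr - image_base, &ptr) != 0) return -1;
    if (ptr < image_base) return -1;
    *class_off = ptr - image_base;
    return 0;
}
```

With the walkthrough's values, entry 0x100008090 and base 0x100000000 would yield 0x91A0.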

Section: __objc_protolist

Following 0x1000080A8 gives 0x0000000100009298, which lands in __DATA.__data (presumably where the protocol structures themselves are stored).


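Section: __objc_imageinfo
Holds the Objective-C image info (a version field and flags); mainly used to tell whether the Objective-C version is 1.0 or 2.0.
Section: __objc_const
Records the immutable content used when the Objective-C runtime initializes classes in memory, such as method_t structures.
Section: __objc_selrefs
Marks which selector (SEL) strings are referenced.
Section: __objc_classrefs
Marks which classes are referenced.
Section: __objc_superrefs
Objective-C superclass references.
Section: __objc_ivar
Stores the offsets of the program's instance variables (ivars).
Section: __objc_data
Holds the data Objective-C classes need, chiefly the class structures, which point to addresses in __objc_const where the rest of each class's data is found.
Section: __data
Initialized mutable data.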
Segment: __LINKEDIT
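Its fileOffset is 0xc000 and its size is 0x7850; together they give 0x13850. The figure below shows that everything from Dynamic Loader Info to Code Signature lies in this range: which symbols dyld must load from dynamic libraries, the symbol table, and the binary's signature. So after the load commands, the actual content of the executable is __TEXT, __DATA, and __LINKEDIT; __PAGEZERO is only a placeholder.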
```sh
# size lists exactly these 4 segments for a Mach-O executable
$ size -x -m path/to/macho-execute
```



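Commands that use other structs
Command: LC_DYLD_INFO_ONLY
Describes which dynamic-library symbols dyld must bind and whether each binding is strong or weak; it also locates the rebase, lazy-binding, and export information.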
```c
/*
 * The dyld_info_command contains the file offsets and sizes of
 * the new compressed form of the information dyld needs to
 * load the image. This information is used by dyld on Mac OS X
 * 10.6 and later. All information pointed to by this command
 * is encoded using byte streams, so no endian swapping is needed
 * to interpret it.
 */
struct dyld_info_command {
    uint32_t cmd;             /* LC_DYLD_INFO or LC_DYLD_INFO_ONLY */
    uint32_t cmdsize;         /* sizeof(struct dyld_info_command) */
    uint32_t rebase_off;      /* file offset to rebase info */
    uint32_t rebase_size;     /* size of rebase info */
    uint32_t bind_off;        /* file offset to binding info */
    uint32_t bind_size;       /* size of binding info */
    uint32_t weak_bind_off;   /* file offset to weak binding info */
    uint32_t weak_bind_size;  /* size of weak binding info */
    uint32_t lazy_bind_off;   /* file offset to lazy binding info */
    uint32_t lazy_bind_size;  /* size of lazy binding info */
    uint32_t export_off;      /* file offset to export info */
    uint32_t export_size;     /* size of export info */
};
```
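The comment says this information is encoded as byte streams. In the binding streams each byte packs an opcode into its high nibble and a small immediate operand into its low nibble; larger operands follow as ULEB128 values or strings. A sketch of that first decoding step, with a few opcode values copied from loader.h (`bind_op_t` and `split_bind_byte` are hypothetical names). A full decoder would loop over the stream and consume whatever operands each opcode expects.

```c
#include <stdint.h>

#define BIND_OPCODE_MASK                  0xF0
#define BIND_IMMEDIATE_MASK               0x0F
#define BIND_OPCODE_DONE                  0x00
#define BIND_OPCODE_SET_DYLIB_ORDINAL_IMM 0x10
#define BIND_OPCODE_DO_BIND               0x90

typedef struct {
    uint8_t opcode;  /* high nibble, still in place (compare with BIND_OPCODE_*) */
    uint8_t imm;     /* low nibble, e.g. a dylib ordinal */
} bind_op_t;

bind_op_t split_bind_byte(uint8_t b) {
    bind_op_t op = { (uint8_t)(b & BIND_OPCODE_MASK), (uint8_t)(b & BIND_IMMEDIATE_MASK) };
    return op;
}
```

For example, the byte 0x11 decodes as BIND_OPCODE_SET_DYLIB_ORDINAL_IMM with immediate 1, i.e. "the following symbols come from the first LC_LOAD_DYLIB library".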
Command: LC_SYMTAB
Describes the Mach-O file's symbol table and string table.
```c
/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * "stab" style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
    uint32_t cmd;       /* LC_SYMTAB */
    uint32_t cmdsize;   /* sizeof(struct symtab_command) */
    uint32_t symoff;    /* symbol table offset */
    uint32_t nsyms;     /* number of symbol table entries */
    uint32_t stroff;    /* string table offset */
    uint32_t strsize;   /* string table size in bytes */
};
```
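`symoff` points at an array of `nsyms` `nlist_64` entries (defined in `<mach-o/nlist.h>`), and each entry's `n_strx` is an offset into the string table at `stroff`. A sketch of looking up a symbol's name; the struct is mirrored locally with `n_strx` flattened out of its union, and `symbol_name` is a hypothetical helper.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of struct nlist_64 (16 bytes). */
typedef struct {
    uint32_t n_strx;   /* offset of the name in the string table */
    uint8_t  n_type;   /* type flags */
    uint8_t  n_sect;   /* section number, or 0 (NO_SECT) */
    uint16_t n_desc;
    uint64_t n_value;  /* usually the symbol's address */
} nlist_64_t;

/* Returns the name of symbol i, or NULL if the index or string offset is
 * out of range. symtab is the data at symoff, strtab the data at stroff. */
const char *symbol_name(const nlist_64_t *symtab, uint32_t nsyms,
                        const char *strtab, uint32_t strsize, uint32_t i) {
    if (i >= nsyms) return NULL;
    uint32_t strx = symtab[i].n_strx;
    if (strx >= strsize) return NULL;
    /* guard against a name that is not NUL-terminated inside the table */
    if (memchr(strtab + strx, '\0', strsize - strx) == NULL) return NULL;
    return strtab + strx;
}
```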
Command: LC_DYSYMTAB
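Symbol table information for dynamic linking: it divides the symbol table into local, externally defined, and undefined symbols (those imported from the dynamic libraries the file depends on), and locates the indirect symbol table used by __stubs and the symbol-pointer sections.
Command: LC_LOAD_DYLINKER
Specifies the dynamic linker (dyld) used to load the program.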
```c
/*
 * A program that uses a dynamic linker contains a dylinker_command to identify
 * the name of the dynamic linker (LC_LOAD_DYLINKER). And a dynamic linker
 * contains a dylinker_command to identify the dynamic linker (LC_ID_DYLINKER).
 * A file can have at most one of these.
 * This struct is also used for the LC_DYLD_ENVIRONMENT load command and
 * contains string for dyld to treat like environment variable.
 */
struct dylinker_command {
    uint32_t     cmd;       /* LC_ID_DYLINKER, LC_LOAD_DYLINKER or
                               LC_DYLD_ENVIRONMENT */
    uint32_t     cmdsize;   /* includes pathname string */
    union lc_str name;      /* dynamic linker's path name */
};
```
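In the file, `union lc_str` holds an offset measured from the start of the load command, and the path string follows the fixed fields, padded up to `cmdsize`. A sketch of resolving it for `dylinker_command` (the same approach works for the `name` of a `dylib_command`); `dylinker_command_t` and `dylinker_path` are hypothetical names, and the mirrored struct spells the union as a plain offset field.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    uint32_t cmd;          /* LC_LOAD_DYLINKER is 0xE */
    uint32_t cmdsize;      /* includes the padded path string */
    uint32_t name_offset;  /* lc_str.offset, from the start of this command */
} dylinker_command_t;

/* Returns the path stored inside the command, or NULL if the offset or the
 * terminating NUL falls outside cmdsize. lc points at the command's first byte. */
const char *dylinker_path(const uint8_t *lc) {
    dylinker_command_t c;
    memcpy(&c, lc, sizeof c);
    if (c.name_offset < sizeof c || c.name_offset >= c.cmdsize) return NULL;
    const char *s = (const char *)lc + c.name_offset;
    if (memchr(s, '\0', c.cmdsize - c.name_offset) == NULL) return NULL;
    return s;
}
```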

Command: LC_UUID
A 128-bit random number generated by the static linker to identify the Mach-O file.
```c
/*
 * The uuid load command contains a single 128-bit unique random number that
 * identifies an object produced by the static link editor.
 */
struct uuid_command {
    uint32_t cmd;       /* LC_UUID */
    uint32_t cmdsize;   /* sizeof(struct uuid_command) */
    uint8_t  uuid[16];  /* the 128-bit uuid */
};
```
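The 16 `uuid` bytes are conventionally printed in the 8-4-4-4-12 hex form, which is how tools such as `dwarfdump --uuid` display them and how a binary is matched with its dSYM. A small formatting sketch; `format_uuid` is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats the 16 bytes as XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX;
 * out must hold at least 37 chars (36 + NUL). */
void format_uuid(const uint8_t uuid[16], char out[37]) {
    char *p = out;
    for (int i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10) *p++ = '-';
        p += sprintf(p, "%02X", uuid[i]);   /* two hex digits per byte */
    }
    *p = '\0';
}
```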
Command: LC_VERSION_MIN_IPHONEOS
Specifies the minimum OS version (together with the SDK version).
```c
/*
 * The version_min_command contains the min OS version on which this
 * binary was built to run.
 */
struct version_min_command {
    uint32_t cmd;       /* LC_VERSION_MIN_MACOSX or
                           LC_VERSION_MIN_IPHONEOS or
                           LC_VERSION_MIN_WATCHOS or
                           LC_VERSION_MIN_TVOS */
    uint32_t cmdsize;   /* sizeof(struct min_version_command) */
    uint32_t version;   /* X.Y.Z is encoded in nibbles xxxx.yy.zz */
    uint32_t sdk;       /* X.Y.Z is encoded in nibbles xxxx.yy.zz */
};
```
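`version` and `sdk` pack X.Y.Z as `xxxx.yy.zz` in nibbles: 16 bits for X, 8 for Y and 8 for Z. A decode/encode sketch (hypothetical names); the `current_version` and `compatibility_version` fields of `struct dylib` below appear to use the same packing.

```c
#include <stdint.h>

typedef struct { unsigned major, minor, patch; } xyz_version_t;

/* 0x000E0501 -> 14.5.1 */
xyz_version_t decode_xyz(uint32_t v) {
    xyz_version_t r = { v >> 16, (v >> 8) & 0xff, v & 0xff };
    return r;
}

uint32_t encode_xyz(unsigned major, unsigned minor, unsigned patch) {
    return (uint32_t)(major << 16) | ((minor & 0xff) << 8) | (patch & 0xff);
}
```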
Command: LC_SOURCE_VERSION
The version of the source code used to build the binary (an optional command).
```c
/*
 * The source_version_command is an optional load command containing
 * the version of the sources used to build the binary.
 */
struct source_version_command {
    uint32_t cmd;       /* LC_SOURCE_VERSION */
    uint32_t cmdsize;   /* 16 */
    uint64_t version;   /* A.B.C.D.E packed as a24.b10.c10.d10.e10 */
};
```
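Following the `a24.b10.c10.d10.e10` comment, the 64-bit `version` splits into a 24-bit A and four 10-bit fields. A decoding sketch; `decode_source_version` is a hypothetical name.

```c
#include <stdint.h>

/* Unpacks A.B.C.D.E from a source_version_command.version value. */
void decode_source_version(uint64_t v, unsigned out[5]) {
    out[0] = (unsigned)(v >> 40);            /* A: top 24 bits */
    out[1] = (unsigned)((v >> 30) & 0x3ff);  /* B: 10 bits */
    out[2] = (unsigned)((v >> 20) & 0x3ff);  /* C: 10 bits */
    out[3] = (unsigned)((v >> 10) & 0x3ff);  /* D: 10 bits */
    out[4] = (unsigned)(v & 0x3ff);          /* E: 10 bits */
}
```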
Command: LC_MAIN
The program's entry point.
```c
/*
 * The entry_point_command is a replacement for thread_command.
 * It is used for main executables to specify the location (file offset)
 * of main(). If -stack_size was used at link time, the stacksize
 * field will contain the stack size need for the main thread.
 */
struct entry_point_command {
    uint32_t cmd;        /* LC_MAIN only used in MH_EXECUTE filetypes */
    uint32_t cmdsize;    /* 24 */
    uint64_t entryoff;   /* file (__TEXT) offset of main() */
    uint64_t stacksize;  /* if not zero, initial stack size */
};
```

entryoff is 0x6120; looking up that offset shows it is the address of the _main function.
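`entryoff` is a file offset, not an address. Since `__TEXT` starts at file offset 0 and virtual address 0x100000000 in this example, `main` would sit at 0x100006120 before the ASLR slide (enabled by MH_PIE) is added at load time. A sketch of that arithmetic; `entry_address` is a hypothetical helper and the slide is whatever the loader chose.

```c
#include <stdint.h>

/* Converts LC_MAIN's entryoff into a runtime address, using the __TEXT
 * segment (which contains main) and the ASLR slide. */
uint64_t entry_address(uint64_t text_vmaddr, uint64_t text_fileoff,
                       uint64_t entryoff, uint64_t slide) {
    return text_vmaddr + (entryoff - text_fileoff) + slide;
}
```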

Command: LC_ENCRYPTION_INFO_64
```c
/*
 * The encryption_info_command_64 contains the file offset and size of an
 * encrypted segment. It is the 64-bit variant of encryption_info_command
 * (LC_ENCRYPTION_INFO) with an extra pad field.
 */
struct encryption_info_command_64 {
    uint32_t cmd;        /* LC_ENCRYPTION_INFO_64 */
    uint32_t cmdsize;    /* sizeof(struct encryption_info_command_64) */
    uint32_t cryptoff;   /* file offset of encrypted range */
    uint32_t cryptsize;  /* file size of encrypted range */
    uint32_t cryptid;    /* which encryption system,
                            0 means not-encrypted yet */
    uint32_t pad;        /* padding to make this struct's size a multiple
                            of 8 bytes */
};
```
The encrypted range is Crypt Offset 0x4000 and Crypt Size 0x4000, so it ends at 0x8000. As the figure below shows, what is actually encrypted is the content of the code segment.
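Checking whether a given file offset falls inside the encrypted range is a simple bounds test. With the values above, the `__text` offset 0x5F04 lies inside [0x4000, 0x8000), consistent with the code segment being encrypted. A sketch with a locally mirrored struct; `is_encrypted_offset` is a hypothetical name.

```c
#include <stdint.h>

typedef struct {
    uint32_t cmd, cmdsize, cryptoff, cryptsize, cryptid, pad;
} encryption_info_command_64_t;

/* Nonzero if file_off lies in [cryptoff, cryptoff + cryptsize) and the
 * image is still encrypted (cryptid != 0). */
int is_encrypted_offset(const encryption_info_command_64_t *e, uint64_t file_off) {
    return e->cryptid != 0 &&
           file_off >= e->cryptoff &&
           file_off - e->cryptoff < e->cryptsize;
}
```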


Command: LC_LOAD_DYLIB
There are several of these commands, one for each system or app dynamic library the program links against.
```c
/*
 * Dynamically linked shared libraries are identified by two things. The
 * pathname (the name of the library as found for execution), and the
 * compatibility version number. The pathname must match and the compatibility
 * number in the user of the library must be greater than or equal to the
 * library being used. The time stamp is used to record the time a library was
 * built and copied into user so it can be used to determine if the library used
 * at runtime is exactly the same as used to build the program.
 */
struct dylib {
    union lc_str name;               /* library's path name */
    uint32_t timestamp;              /* library's build time stamp */
    uint32_t current_version;        /* library's current version number */
    uint32_t compatibility_version;  /* library's compatibility vers number*/
};
/*
 * A dynamically linked shared library (filetype == MH_DYLIB in the mach header)
 * contains a dylib_command (cmd == LC_ID_DYLIB) to identify the library.
 * An object that uses a dynamically linked shared library also contains a
 * dylib_command (cmd == LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, or
 * LC_REEXPORT_DYLIB) for each library it uses.
 */
struct dylib_command {
    uint32_t cmd;        /* LC_ID_DYLIB, LC_LOAD_{,WEAK_}DYLIB,
                            LC_REEXPORT_DYLIB */
    uint32_t cmdsize;    /* includes pathname string */
    struct dylib dylib;  /* the library identification */
};
```

The name field gives the load path.
Command: LC_RPATH
The dylib names above may contain the @rpath variable; its value is specified here.
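dyld substitutes each LC_RPATH value for the `@rpath` prefix in turn and uses the first resulting path that exists. A sketch of the substitution step only: `expand_rpath` is a hypothetical helper, and it neither expands `@executable_path`/`@loader_path` inside the rpath itself nor checks that the file exists.

```c
#include <stdio.h>
#include <string.h>

/* Builds the candidate path for one LC_RPATH entry. Returns 0 on success,
 * -1 if install_name does not start with "@rpath/" or out is too small. */
int expand_rpath(const char *install_name, const char *rpath, char *out, size_t out_size) {
    static const char prefix[] = "@rpath/";
    size_t plen = sizeof prefix - 1;
    if (strncmp(install_name, prefix, plen) != 0) return -1;
    int n = snprintf(out, out_size, "%s/%s", rpath, install_name + plen);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For instance, a hypothetical `@rpath/Foo.framework/Foo` combined with an rpath of `@executable_path/Frameworks` gives `@executable_path/Frameworks/Foo.framework/Foo`.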
Command: LC_FUNCTION_STARTS
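Describes the start addresses of functions. It points to Function Starts in the link-edit segment (__LINKEDIT), which defines a table of function start addresses; debuggers and other programs can use this table to easily tell whether an address lies inside a function.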
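Function Starts is a sequence of ULEB128-encoded deltas: the first is the offset of the first function from the start of `__TEXT`, each following one is the distance to the next function, and a 0 ends the list. A decoding sketch under that reading of the format (`read_uleb128` and `decode_function_starts` are hypothetical names). For example, the bytes `84 BE 01 10 00` decode to functions at `__TEXT` + 0x5F04 and `__TEXT` + 0x5F14.

```c
#include <stdint.h>
#include <stddef.h>

/* Decodes one ULEB128 value at *p and advances *p. Returns 0 on success,
 * -1 on truncated input or a value wider than 64 bits. */
static int read_uleb128(const uint8_t **p, const uint8_t *end, uint64_t *out) {
    uint64_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift >= 64) return -1;
        result |= (uint64_t)(byte & 0x7f) << shift;   /* low 7 bits carry data */
        if ((byte & 0x80) == 0) { *out = result; return 0; }
        shift += 7;
    }
    return -1;
}

/* Writes up to max function start addresses into out and returns how many
 * were decoded in total, or -1 on malformed data. */
int decode_function_starts(const uint8_t *data, size_t size, uint64_t text_vmaddr,
                           uint64_t *out, size_t max) {
    const uint8_t *p = data, *end = data + size;
    uint64_t addr = text_vmaddr;
    size_t n = 0;
    while (p < end) {
        uint64_t delta;
        if (read_uleb128(&p, end, &delta) != 0) return -1;
        if (delta == 0) break;                  /* terminator / padding */
        addr += delta;
        if (n < max) out[n] = addr;
        n++;
    }
    return (int)n;
}
```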
Command: LC_DATA_IN_CODE
This command uses a struct linkedit_data_command to point to an array of data_in_code_entry; each element of the array describes a region of the code segment that holds data rather than instructions.
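`linkedit_data_command` gives the array's file offset (`dataoff`) and size (`datasize`), so it holds `datasize / 8` entries of the 8-byte `data_in_code_entry`. A sketch of checking whether an offset falls in such a data region, e.g. so a disassembler can skip it; the struct and the kind constants are mirrored from loader.h, and `dice_lookup` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint32_t offset;  /* from the mach_header to the start of the data range */
    uint16_t length;  /* number of bytes in the data range */
    uint16_t kind;    /* a DICE_KIND_* value */
} data_in_code_entry_t;

#define DICE_KIND_DATA        0x0001
#define DICE_KIND_JUMP_TABLE8 0x0002

/* Returns the entry covering offset (relative to the mach_header),
 * or NULL if that byte is ordinary code. */
const data_in_code_entry_t *dice_lookup(const data_in_code_entry_t *entries, size_t count,
                                        uint32_t offset) {
    for (size_t i = 0; i < count; i++) {
        if (offset >= entries[i].offset && offset - entries[i].offset < entries[i].length)
            return &entries[i];
    }
    return NULL;
}
```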
Command: LC_CODE_SIGNATURE
Describes the code signature; this shows that a binary's signature is stored inside the file itself.
Data
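The Load Commands describe how the Mach-O file is organized, for example how long the code part is, much like passing a length along with an array in C. Taking the idea further: network protocols control data transmission through packet formats, and in the same way the commands above control how the Data that follows is parsed.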