Adding a dynamic library to a Mach-O file with optool

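With a basic understanding of the Mach-O format in place, let's look at how optool implements features such as adding a dynamic library to a Mach-O file.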

optool source code

Environment

macOS 13.4 (22F66)

Xcode 14.3.1

0x0 Building

Download the source

sh
$ git clone --recurse-submodules https://github.com/alexzielenski/optool.git

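Change the Deployment Target, for example to 11.0; without this change the build fails with an error.

Alternatively, you can copy libarclite_macos.x from an older Xcode version into the corresponding directory.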

After building with ⌘ + B, locate the compiled binary in Derived Data.

The static library libArgumentParser-Static.a mainly parses command-line arguments. It is linked into the optool binary, so it can be ignored.

0x1 From the command to the source code

Injecting a dynamic library

sh
# optool install -c <command> -p <payload> -t <target>
$ optool install -c load -p <path to the dylib> -t <Mach-O file>

Build a very simple app and inspect it with MachOView: the LC_LOAD_DYLIB entries under Load Commands load system libraries such as Foundation, libobjc.A.dylib ... UIKit.

Now append LearnOptoolFramework:

sh
./optool install -c load -p LearnOptoolFramework.framework/LearnOptoolFramework -t LearnOptool_patch_cmd

Inspecting the file with MachOView again shows that Load Commands now include LearnOptoolFramework.

Source code analysis

Open the Xcode project; the code structure is simple.

To debug the process of adding a dynamic library, pass the command-line arguments as launch arguments.

Start from the main function; the first part parses the arguments.

objc
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        BOOL showHelp = NO;

        // Flags
        XPMArgumentSignature *weak = [XPMArgumentSignature argumentSignatureWithFormat:@"[-w --weak]"];
        XPMArgumentSignature *resign = [XPMArgumentSignature argumentSignatureWithFormat:@"[--resign]"];
        XPMArgumentSignature *target = [XPMArgumentSignature argumentSignatureWithFormat:@"[-t --target]={1,1}"];
        ...
        XPMArgumentPackage *package = [[NSProcessInfo processInfo] xpmargs_parseArgumentsWithSignatures:@[resign, command, strip, restore, install, uninstall, output, backup, aslr, help, unrestrict, rename]];

        NSString *targetPath = [package firstObjectForSignature:target];
        ...

The code above sets up the command-line parsing rules, which are written as format strings such as [-t --target]={1,1}.

objc
XPMArgumentSignature *target = [XPMArgumentSignature argumentSignatureWithFormat:@"[-t --target]={1,1}"];

Stepping into it:

objc
+ (id)argumentSignatureWithFormat:(NSString *)format, ...
{
	va_list args;
	va_start(args, format);
    
	XPMArgumentSignature * signature = [XPMArgumentSignature argumentSignatureWithFormat:format arguments:args];
	
	va_end(args);
	
	return signature;
}

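This method uses variadic arguments.

Variadic arguments

  1. va_list args: declares an object that tracks the current position in the variable argument list;
  2. va_start(args, format): format is the last fixed parameter before the first optional argument; va_start makes args refer to the first optional argument;
  3. va_arg(args, type): returns the argument that args currently refers to, interpreted as type, and advances args to the next argument;
  4. va_end(args): cleans up the argument list and invalidates args.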
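
The four steps above can be sketched in plain C. The helper name sum_ints and its count parameter are hypothetical, chosen only for illustration; they are not part of optool or the argument parser.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments passed after `count`. */
int sum_ints(int count, ...) {
    assert(count >= 0);

    va_list args;            /* tracks the position in the argument list */
    va_start(args, count);   /* `count` is the last fixed parameter */

    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);  /* read one int, then advance */
    }

    va_end(args);            /* args must not be used after this */
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) walks the three optional arguments in order. The callee has no way to discover how many arguments were passed, which is why the count (or, in the parser's case, the format string) has to carry that information.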

Following the flow, the main steps are:

objc
xpmargs_ScanFormatCtorHead(scanner, foundSwitches, foundAliases, &foundRange, &didFindRange);

...
for (NSString * s in [enclosedString componentsSeparatedByString:@" "]) {
		if ([s hasPrefix:@"--"]) {
			[switches addObject:[s substringFromIndex:2]];
		} else if ([s hasPrefix:@"-"]) {
			[switches addObject:[s substringFromIndex:1]];
		} else {
			[aliases addObject:s];
		}
	}

xpmargs_ScanFormatCtorTail(scanner, valueRange, didFindRange);

The whole job is to recognize a format such as [-t --target]={1,1} so that the arguments passed on the command line can be matched against it.
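
The loop in the excerpt above can be approximated in C as follows. This is only a sketch of the prefix-stripping idea: the struct parsed_names type, the parse_names function, and the fixed-size limits are hypothetical, and strtok collapses repeated spaces, which componentsSeparatedByString: does not.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 8
#define MAX_NAME_LEN 32

/* Hypothetical container for the names found between the brackets. */
struct parsed_names {
    char switches[MAX_NAMES][MAX_NAME_LEN];
    int num_switches;
    char aliases[MAX_NAMES][MAX_NAME_LEN];
    int num_aliases;
};

/* Splits e.g. "-t --target" on spaces; names starting with "--" or "-"
 * become switches (prefix removed), everything else becomes an alias. */
void parse_names(const char *enclosed, struct parsed_names *out) {
    assert(enclosed != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    char buf[256];
    snprintf(buf, sizeof(buf), "%s", enclosed);  /* strtok needs a writable copy */

    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        const char *name;
        char (*dest)[MAX_NAME_LEN];
        int *count;

        if (strncmp(tok, "--", 2) == 0) {
            name = tok + 2; dest = out->switches; count = &out->num_switches;
        } else if (tok[0] == '-') {
            name = tok + 1; dest = out->switches; count = &out->num_switches;
        } else {
            name = tok; dest = out->aliases; count = &out->num_aliases;
        }

        if (*count < MAX_NAMES) {  /* silently drop names beyond the limit */
            snprintf(dest[*count], MAX_NAME_LEN, "%s", name);
            (*count)++;
        }
    }
}
```

For the string "-t --target" this yields the two switches t and target and no aliases, which is what lets both -t and --target match the same signature.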

Command-line arguments

The arguments for this invocation come from the arguments property of the NSProcessInfo object:

objc
@property (readonly, copy) NSArray<NSString *> *arguments;

Printing it gives roughly the following:

sh
<__NSFrozenArrayM 0x600000c00c00>(
xxx/optool,
install,
-c,
load,
-p,
LearnOptoolFramework.framework/LearnOptoolFramework,
-t,
LearnOptool_patch_cmd
)

targetPath is then parsed out according to the rules:

objc
NSString *targetPath = [package firstObjectForSignature:target];

This is the value passed with -t.
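
As a rough illustration of what resolving -t amounts to, a hypothetical option_value helper can scan an argv-style array for a flag and return the element that follows it. This is not how XPMArgumentParser is implemented (it matches against the signatures built earlier); it only shows the lookup being performed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the argument that follows `flag`,
 * or NULL when the flag is absent or has no value after it.
 * argv[0] (the program path) is skipped. */
const char *option_value(int argc, const char *argv[], const char *flag) {
    assert(argv != NULL && flag != NULL);

    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], flag) == 0) {
            return argv[i + 1];
        }
    }
    return NULL;
}
```

With the argument array printed above, looking up -t gives LearnOptool_patch_cmd and looking up -c gives load.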

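As shown below, the tool supports optional backups, so it uses a ({ ... }) statement expression (a GNU C extension, not an anonymous function) to compute the backup path: _backup is appended to the executable path, followed by the bundle version as an extension when the target is a bundle. The backup is not used in this walkthrough, so we skip it.

objc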
...
NSBundle *bundle = [NSBundle bundleWithPath:targetPath];
NSString *executablePath = [[bundle.executablePath ?: targetPath stringByExpandingTildeInPath] stringByResolvingSymlinksInPath];
NSString *backupPath = ({
    NSString *bkp = [executablePath stringByAppendingString:@"_backup"];
    if (bundle) {
    NSString *vers = [bundle objectForInfoDictionaryKey:(NSString *)kCFBundleVersionKey];
    if (vers)
        bkp = [bkp stringByAppendingPathExtension:vers];
    }
    bkp;
});;
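
The naming rule in that block can be sketched in portable C. The function backup_path is hypothetical, and treating stringByAppendingPathExtension: as "append a dot and the version" is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "<executable>_backup" into `out`, or
 * "<executable>_backup.<version>" when a non-empty version is given.
 * Returns 0 on success, -1 if the result does not fit in `out`. */
int backup_path(const char *executable, const char *version,
                char *out, size_t out_size) {
    assert(executable != NULL && out != NULL && out_size > 0);

    int n;
    if (version != NULL && version[0] != '\0') {
        n = snprintf(out, out_size, "%s_backup.%s", executable, version);
    } else {
        n = snprintf(out, out_size, "%s_backup", executable);
    }
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

For a plain executable such as LearnOptool_patch_cmd there is no bundle and therefore no version, so the backup would simply be LearnOptool_patch_cmd_backup.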

Next comes some guard code, which we also skip. The file at executablePath is then read into an NSData object and a mutable copy is created; all later operations work on binary.

objc
...
NSData *originalData = [NSData dataWithContentsOfFile:executablePath];
NSMutableData *binary = originalData.mutableCopy;
...

// sanity check
if (!binary)
    return OPErrorRead;

The code then defines a custom thin_header struct:

objc
// we pass around this header which includes some extra information
// and a 32-bit header which we used for both 32-bit and 64-bit files
// since the 64-bit just adds an extra field to the end which we don't need
struct thin_header {
    uint32_t offset;
    uint32_t size;
    struct mach_header header;
};
objc
struct thin_header headers[4];
uint32_t numHeaders = 0;
headersFromBinary(headers, binary, &numHeaders);

if (numHeaders == 0) {
    LOG("No compatible architecture found");
    return OPErrorIncompatibleBinary;
}
    ...

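The function to focus on is headersFromBinary:

objc
// This function checks the magic number of the Mach-O file to decide whether it is a FAT file or a single-architecture file, then fills in the custom thin_header structs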
struct thin_header *headersFromBinary(struct thin_header *headers, NSData *binary, uint32_t *amount) {
    // In a MachO/FAT binary the first 4 bytes is a magic number
    // which gives details about the type of binary it is
    // CIGAM and co. mean the target binary has a byte order
    // in reverse relation to the host machine so we have to swap the bytes
    uint32_t magic = [binary intAtOffset:0];
    bool shouldSwap = magic == MH_CIGAM || magic == MH_CIGAM_64 || magic == FAT_CIGAM;
#define SWAP(NUM) shouldSwap ? CFSwapInt32(NUM) : NUM
    
    uint32_t numArchs = 0;

    // a FAT file is basically a collection of thin MachO binaries
    if (magic == FAT_CIGAM || magic == FAT_MAGIC) {
        LOG("Found FAT Header");
        
        // WE GOT A FAT ONE
        struct fat_header fat = *(struct fat_header *)binary.bytes;
        fat.nfat_arch = SWAP(fat.nfat_arch);
        int offset = sizeof(struct fat_header);

        // Loop through the architectures within the FAT binary to find
        // a thin macho header that we can work with (x86 or x86_64)
        for (int i = 0; i < fat.nfat_arch; i++) {
            struct fat_arch arch;
            arch = *(struct fat_arch *)([binary bytes] + offset);
            arch.cputype = SWAP(arch.cputype);
            arch.offset = SWAP(arch.offset);

            struct thin_header macho = headerAtOffset(binary, arch.offset);
            if (macho.size > 0) {
                LOG("Found thin header...");

                headers[numArchs] = macho;
                numArchs++;
            }
            
            offset += sizeof(struct fat_arch);
        }
    // The binary is thin, meaning it contains only one architecture
    } else if (magic == MH_MAGIC || magic == MH_MAGIC_64) {
        // the sample binary in this walkthrough takes this branch; fill in its thin_header
        struct thin_header macho = headerAtOffset(binary, 0);
        if (macho.size > 0) {
            LOG("Found thin header...");

            numArchs++;
            headers[0] = macho;
        }
        
    } else {
        LOG("No headers found.");
    }
    
    *amount = numArchs;
    
    return headers;
}
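
The magic-number check and the byte swap can be tried without macOS headers. In the sketch below the constants are copied from <mach-o/loader.h> and <mach-o/fat.h>, while enum binary_kind, classify, fat_arch_count, and swap32 are hypothetical names. It assumes, like the source, that the first four bytes are read in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from <mach-o/loader.h> and <mach-o/fat.h>. */
#define MH_MAGIC     0xfeedfaceu
#define MH_CIGAM     0xcefaedfeu
#define MH_MAGIC_64  0xfeedfacfu
#define MH_CIGAM_64  0xcffaedfeu
#define FAT_MAGIC    0xcafebabeu
#define FAT_CIGAM    0xbebafecau

enum binary_kind { KIND_UNKNOWN, KIND_THIN, KIND_FAT };

/* Reverses the byte order of a 32-bit value. */
static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Classifies a buffer by its first 4 bytes and reports whether
 * multi-byte header fields need to be byte-swapped. */
enum binary_kind classify(const uint8_t *bytes, size_t len, int *should_swap) {
    assert(bytes != NULL && should_swap != NULL);
    *should_swap = 0;
    if (len < sizeof(uint32_t)) return KIND_UNKNOWN;

    uint32_t magic;
    memcpy(&magic, bytes, sizeof magic);  /* host byte order, no alignment issues */
    *should_swap = (magic == MH_CIGAM || magic == MH_CIGAM_64 || magic == FAT_CIGAM);

    if (magic == FAT_MAGIC || magic == FAT_CIGAM) return KIND_FAT;
    /* Like the excerpt above, only native-order thin headers are accepted. */
    if (magic == MH_MAGIC || magic == MH_MAGIC_64) return KIND_THIN;
    return KIND_UNKNOWN;
}

/* Reads nfat_arch, the second 32-bit field of a FAT header. */
uint32_t fat_arch_count(const uint8_t *bytes, size_t len, int should_swap) {
    assert(bytes != NULL);
    if (len < 2 * sizeof(uint32_t)) return 0;

    uint32_t n;
    memcpy(&n, bytes + sizeof(uint32_t), sizeof n);
    return should_swap ? swap32(n) : n;
}
```

FAT headers are stored big-endian on disk, so on a little-endian host the magic reads as FAT_CIGAM and every field taken from the fat_header and fat_arch structs has to go through the swap, which is what the SWAP macro does in the source.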

headerAtOffset fills in a thin_header:

objc
struct thin_header headerAtOffset(NSData *binary, uint32_t offset) {
    struct thin_header macho;
    macho.offset = offset;
    macho.header = *(struct mach_header *)(binary.bytes + offset);
    if (macho.header.magic == MH_MAGIC || macho.header.magic == MH_CIGAM) {
        macho.size = sizeof(struct mach_header);
    } else {
        macho.size = sizeof(struct mach_header_64);
    }
    if (macho.header.cputype != CPU_TYPE_X86_64 && macho.header.cputype != CPU_TYPE_I386 && macho.header.cputype != CPU_TYPE_ARM && macho.header.cputype != CPU_TYPE_ARM64){
        macho.size = 0;
    }
    
    return macho;
}

With this preparation done, the binary can now be modified.

The code iterates over the thin_header array prepared above; each entry contains a mach_header struct.

objc
for (uint32_t i = 0; i < numHeaders; i++) {
    struct thin_header macho = headers[i];
    ...
    
    else if ([package booleanValueForSignature:install]) {
        NSString *lc = [package firstObjectForSignature:command];
        uint32_t command = LC_LOAD_DYLIB;
        if (lc)
            command = COMMAND(lc);
        if (command == -1) {
            LOG("Invalid load command.");
            return OPErrorInvalidLoadCommand;
        }

        if (insertLoadEntryIntoBinary(dylibPath, binary, macho, command)) {
            LOG("Successfully inserted a %s command for %s", LC(command),   CPU(macho.header.cputype));
        } else {
            LOG("Failed to insert a %s command for %s", LC(command), CPU(macho.header.cputype));
            return OPErrorInsertFailure;
        }

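Per the COMMAND macro below, the argument load maps to LC_LOAD_DYLIB, that is, a command that loads a dynamic library. The dylib path, the binary data, the Mach-O header information, and the load command type are all passed to insertLoadEntryIntoBinary.

objc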
#define COMMAND(str) ({ \
    uint32_t cmd = -1; \
    if ([str isEqualToString: @"reexport"]) \
        cmd = LC_REEXPORT_DYLIB; \
    else if ([str isEqualToString: @"weak"]) \
        cmd = LC_LOAD_WEAK_DYLIB; \
    else if ([str isEqualToString: @"upward"]) \
        cmd = LC_LOAD_UPWARD_DYLIB; \
    else if ([str isEqualToString: @"load"]) \
        cmd = LC_LOAD_DYLIB; \
    cmd; \
})
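
The same string-to-constant mapping can be written as a lookup table in C. The constant values are copied from <mach-o/loader.h>; command_from_name and LC_INVALID are hypothetical names, with LC_INVALID standing in for the -1 that the macro stores in a uint32_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from <mach-o/loader.h>. */
#define LC_REQ_DYLD          0x80000000u
#define LC_LOAD_DYLIB        0xcu
#define LC_LOAD_WEAK_DYLIB   (0x18u | LC_REQ_DYLD)
#define LC_REEXPORT_DYLIB    (0x1fu | LC_REQ_DYLD)
#define LC_LOAD_UPWARD_DYLIB (0x23u | LC_REQ_DYLD)

/* Hypothetical sentinel: the same bit pattern as (uint32_t)-1. */
#define LC_INVALID UINT32_MAX

/* Maps the value of the -c option to a load command constant. */
uint32_t command_from_name(const char *name) {
    assert(name != NULL);

    static const struct { const char *name; uint32_t cmd; } table[] = {
        { "reexport", LC_REEXPORT_DYLIB },
        { "weak",     LC_LOAD_WEAK_DYLIB },
        { "upward",   LC_LOAD_UPWARD_DYLIB },
        { "load",     LC_LOAD_DYLIB },
    };

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(name, table[i].name) == 0) {
            return table[i].cmd;
        }
    }
    return LC_INVALID;
}
```

Because the result is unsigned, the comparison command == -1 in the calling code works only through integer conversion of -1 to 0xFFFFFFFF; comparing against a named sentinel makes that explicit.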

Stepping into insertLoadEntryIntoBinary: this is the logic that edits the Mach-O data to add the dylib load command.

objc
BOOL insertLoadEntryIntoBinary(NSString *dylibPath, NSMutableData *binary, struct thin_header macho, uint32_t type) {
    // guard against unsupported command types
    if (type != LC_REEXPORT_DYLIB &&
        type != LC_LOAD_WEAK_DYLIB &&
        type != LC_LOAD_UPWARD_DYLIB &&
        type != LC_LOAD_DYLIB) {
        LOG("Invalid load command type");
        return NO;
    }
    // parse load commands to see if our load command is already there
    uint32_t lastOffset = 0;
    // is this dylib already present?
    if (binaryHasLoadCommandForDylib(binary, dylibPath, &lastOffset, macho)) {
        // there already exists a load command for this payload so change the command type
        uint32_t originalType = *(uint32_t *)(binary.bytes + lastOffset);
        if (originalType != type) {
            LOG("A load command already exists for %s. Changing command type from %s to desired %s", dylibPath.UTF8String, LC(originalType), LC(type));
            [binary replaceBytesInRange:NSMakeRange(lastOffset, sizeof(type)) withBytes:&type];
        } else {
            LOG("Load command already exists");
        }
        
        return YES;
    }
    
    // create a new load command
    // not present yet: build a new dylib load command
    unsigned int length = (unsigned int)sizeof(struct dylib_command) + (unsigned int)dylibPath.length;
    // in my run dylibPath.length = 51, so length = 24 + 51 = 75 and padding = 5
    unsigned int padding = (8 - (length % 8));
    
    // check if data we are replacing is null
    // read the bytes that currently sit right after the existing load commands
    NSData *occupant = [binary subdataWithRange:NSMakeRange(macho.header.sizeofcmds + macho.offset + macho.size,
                                                            length + padding)];

    // All operations in optool try to maintain a constant byte size of the executable
    // so we don't want to append new bytes to the binary (that would break the executable
    // since everything is offset-based--we'd have to go in and adjust every offset)
    // So instead take advantage of the huge amount of padding after the load commands
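    // If this region is not empty we have reached real content (such as code),
    // so there is no room: a new command only fits after the existing load
    // commands and before that content. Note that strcmp against "\0"
    // effectively inspects only the first byte of the region.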
    if (strcmp([occupant bytes], "\0")) {
        NSLog(@"cannot inject payload into %s because there is no room", dylibPath.fileSystemRepresentation);
        return NO;
    }
    
    LOG("Inserting a %s command for architecture: %s", LC(type), CPU(macho.header.cputype));
    
    struct dylib_command command;
    struct dylib dylib;
    
    /*
 * A variable length string in a load command is represented by an lc_str
 * union.  The strings are stored just after the load command structure and
 * the offset is from the start of the load command structure.  The size
 * of the string is reflected in the cmdsize field of the load command.
 * Once again any padded bytes to bring the cmdsize field to a multiple
 * of 4 bytes must be zero.
 *  (the comment on lc_str)
 */
    // so the name offset is the size of struct dylib_command
    dylib.name.offset = sizeof(struct dylib_command);
    dylib.timestamp = 2; // load commands I've seen use 2 for some reason
    dylib.current_version = 0;
    dylib.compatibility_version = 0;
    command.cmd = type;
    command.dylib = dylib;
    // cmdsize = struct + path + alignment padding
    command.cmdsize = length + padding;
    
    unsigned int zeroByte = 0;
    NSMutableData *commandData = [NSMutableData data];
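    // the dylib_command struct itself
    [commandData appendBytes:&command length:sizeof(struct dylib_command)];
    // the dylib path string
    [commandData appendData:[dylibPath dataUsingEncoding:NSASCIIStringEncoding]];
    // zero padding (note that padding can be up to 8 while zeroByte is only 4 bytes wide)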
    [commandData appendBytes:&zeroByte length:padding];
    
    // remove enough null bytes to account of our inserted data
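    // passing a replacement length of 0 deletes this range, so commandData.length
    // zero bytes are removed from the padding after the load commands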
    [binary replaceBytesInRange:NSMakeRange(macho.offset + macho.header.sizeofcmds + macho.size, commandData.length)
                      withBytes:0
                         length:0];
    // insert the data
    // insert the new command at lastOffset, restoring the original file size
    [binary replaceBytesInRange:NSMakeRange(lastOffset, 0) withBytes:commandData.bytes length:commandData.length];
    
    // fix the existing header
    // update the header metadata
    macho.header.ncmds += 1;
    macho.header.sizeofcmds += command.cmdsize;
    
    // this is safe to do in 32bit because the 4 bytes after the header are still being put back
    // write the modified mach header back
    [binary replaceBytesInRange:NSMakeRange(macho.offset, sizeof(macho.header)) withBytes:&macho.header];
    
    return YES;
}
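
The size arithmetic and the "delete padding, then insert" trick can be reproduced on a plain byte buffer. The functions padded_cmdsize and insert_keeping_size are hypothetical; the sketch assumes sizeof(struct dylib_command) is 24 bytes (six 32-bit fields) and, unlike the source, checks every padding byte rather than only the first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same rule as the source: pad up to the next multiple of 8. A length that
 * is already aligned still gets 8 bytes, so the path is always followed by
 * at least one zero byte. */
uint32_t padded_cmdsize(uint32_t struct_size, uint32_t path_len) {
    uint32_t length = struct_size + path_len;
    uint32_t padding = 8 - (length % 8);
    return length + padding;
}

/* Inserts `cmd` at `insert_at` while keeping the buffer length unchanged:
 * the bytes in [insert_at, pad_start) shift right by cmd_len, consuming
 * cmd_len zero bytes that start at pad_start.
 * Returns 0 on success, -1 if there is not enough zero padding. */
int insert_keeping_size(uint8_t *buf, size_t len, size_t insert_at,
                        size_t pad_start, const uint8_t *cmd, size_t cmd_len) {
    assert(buf != NULL && cmd != NULL);

    if (insert_at > pad_start || pad_start > len || cmd_len > len - pad_start) {
        return -1;
    }
    for (size_t i = 0; i < cmd_len; i++) {
        if (buf[pad_start + i] != 0) {
            return -1;  /* would overwrite real data */
        }
    }

    /* Equivalent to deleting the padding range and then inserting at insert_at. */
    memmove(buf + insert_at + cmd_len, buf + insert_at, pad_start - insert_at);
    memcpy(buf + insert_at, cmd, cmd_len);
    return 0;
}
```

With the numbers from the walkthrough, padded_cmdsize(24, 51) gives 75 + 5 = 80. In insert_keeping_size, pad_start corresponds to macho.offset + macho.size + macho.header.sizeofcmds and insert_at to lastOffset; everything located after the consumed padding keeps its original file offset, which is why no other offsets in the Mach-O need fixing.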

Result

The new command ends up in the LC_LOAD_DYLIB group.

This comes from insertLoadEntryIntoBinary:

objc
// the insertion uses lastOffset
    [binary replaceBytesInRange:NSMakeRange(lastOffset, 0) withBytes:commandData.bytes length:commandData.length];
    
// lastOffset is computed earlier, during the duplicate check
    if (binaryHasLoadCommandForDylib(binary, dylibPath, &lastOffset, macho)) {
objc
...
// only this LC_LOAD_DYLIB case updates loadOffset
case LC_LOAD_DYLIB: {
                struct dylib_command command = *(struct dylib_command *)(binary.bytes + binary.currentOffset);
                char *name = (char *)[[binary subdataWithRange:NSMakeRange(binary.currentOffset + command.dylib.name.offset, command.cmdsize - command.dylib.name.offset)] bytes];
                
                if ([@(name) isEqualToString:dylib]) {
                    *lastOffset = (unsigned int)binary.currentOffset;
                    return YES;
                }
                
                binary.currentOffset += size;
                loadOffset = (unsigned int)binary.currentOffset;
                break;
            }
... 
// so lastOffset ends up just past the end of the last LC_LOAD_DYLIB command
if (lastOffset != NULL)
        *lastOffset = loadOffset;
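
The scan that produces this offset can be sketched on a raw buffer of load commands. struct load_command_head and offset_after_last_load_dylib are hypothetical names; the struct mirrors the first two fields shared by every load command (cmd and cmdsize), offsets are relative to the first load command, and returning 0 when no LC_LOAD_DYLIB exists is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LC_LOAD_DYLIB 0xcu  /* value copied from <mach-o/loader.h> */

/* The two fields every load command starts with. */
struct load_command_head {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Walks `ncmds` load commands stored back to back in `cmds` and returns the
 * offset just past the last LC_LOAD_DYLIB command (0 if there is none). */
size_t offset_after_last_load_dylib(const uint8_t *cmds, size_t len, uint32_t ncmds) {
    assert(cmds != NULL);

    size_t offset = 0;
    size_t result = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command_head head;
        if (len - offset < sizeof head) break;          /* truncated header */
        memcpy(&head, cmds + offset, sizeof head);
        if (head.cmdsize < sizeof head || head.cmdsize > len - offset) break;

        offset += head.cmdsize;                         /* step to the next command */
        if (head.cmd == LC_LOAD_DYLIB) {
            result = offset;                            /* end of this dylib command */
        }
    }
    return result;
}
```

Inserting at this offset places the new command directly after the existing LC_LOAD_DYLIB entries, which matches where MachOView shows LearnOptoolFramework.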

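optool can only insert a command when there is free padding, but the space between the end of the load commands and the start of the code is usually enough for a few extra commands.

0x2 References

  1. Using va_list, va_start and va_end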
  2. replaceBytes(in:withBytes:length:)