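Introduction
Anyone who has done Android development has dealt with crashes. After a crash you get a very detailed stack trace that shows the call context at the moment of the crash, which is extremely valuable when analyzing the problem. In this article we look at how that stack trace is obtained.
Introduction to Unwinding
Unwinding is the process of recovering the call stack from the contents of memory. The open-source libunwind project is a well-known implementation, and studying the process reveals a good deal about how computers work. Let's look at unwinding on Android.
Android provides AndroidLocalUnwinder and AndroidRemoteUnwinder. The former collects stack traces for the current process, the latter for another process. We start with the former.
cpp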
class AndroidLocalUnwinder : public AndroidUnwinder {
public:
AndroidLocalUnwinder() : AndroidUnwinder(getpid()) {
initial_map_names_to_skip_.emplace_back(kUnwindstackLib);
}
AndroidLocalUnwinder(std::shared_ptr<Memory>& process_memory)
: AndroidUnwinder(getpid(), process_memory) {
initial_map_names_to_skip_.emplace_back(kUnwindstackLib);
}
AndroidLocalUnwinder(const std::vector<std::string>& initial_map_names_to_skip)
: AndroidUnwinder(getpid(), initial_map_names_to_skip) {
initial_map_names_to_skip_.emplace_back(kUnwindstackLib);
}
AndroidLocalUnwinder(const std::vector<std::string>& initial_map_names_to_skip,
const std::vector<std::string>& map_suffixes_to_ignore)
: AndroidUnwinder(getpid(), initial_map_names_to_skip, map_suffixes_to_ignore) {
initial_map_names_to_skip_.emplace_back(kUnwindstackLib);
}
virtual ~AndroidLocalUnwinder() = default;
protected:
static constexpr const char* kUnwindstackLib = "libunwindstack.so";
bool InternalInitialize(ErrorData& error) override;
bool InternalUnwind(std::optional<pid_t> tid, AndroidUnwinderData& data) override;
};
An application only needs to create an AndroidLocalUnwinder object and call its Unwind method. For example:
cpp
AndroidLocalUnwinder unwinder;
AndroidUnwinderData data;
unwinder.Unwind(tid, data);
After the call, data contains the stack information. First, the structure of AndroidUnwinderData:
cpp
struct AndroidUnwinderData {
AndroidUnwinderData() = default;
explicit AndroidUnwinderData(const size_t max_frames) : max_frames(max_frames) {}
explicit AndroidUnwinderData(const bool show_all_frames) : show_all_frames(show_all_frames) {}
void DemangleFunctionNames();
std::string GetErrorString();
std::vector<FrameData> frames;
ErrorData error;
std::optional<std::unique_ptr<Regs>> saved_initial_regs;
const std::optional<size_t> max_frames;
const bool show_all_frames = false;
};
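Each FrameData here corresponds to one function call record; it is covered in detail later. As the example above shows, the entry point is Unwind, so let's see what Unwind does:
cpp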
bool AndroidUnwinder::Unwind(std::optional<pid_t> tid, AndroidUnwinderData& data) {
if (!Initialize(data.error)) {
return false;
}
return InternalUnwind(tid, data);
}
The entry point has only two steps, Initialize and then InternalUnwind, but there is a lot going on inside. First, Initialize:
cpp
bool AndroidUnwinder::Initialize(ErrorData& error) {
// Android stores the jit and dex file location only in the library
// libart.so or libartd.so.
static std::vector<std::string> search_libs [[clang::no_destroy]] = {"libart.so", "libartd.so"};
std::call_once(initialize_, [this, &error]() {
if (!InternalInitialize(error)) {
initialize_status_ = false;
return;
}
jit_debug_ = CreateJitDebug(arch_, process_memory_, search_libs);
#if defined(DEXFILE_SUPPORT)
dex_files_ = CreateDexFiles(arch_, process_memory_, search_libs);
#endif
initialize_status_ = true;
});
return initialize_status_;
}
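The [[clang::no_destroy]] attribute means the destructor of search_libs is never run at program exit. Next is InternalInitialize, which also runs only once because it is called from inside the std::call_once above:
cpp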
bool AndroidLocalUnwinder::InternalInitialize(ErrorData& error) {
arch_ = Regs::CurrentArch();
maps_.reset(new LocalUpdatableMaps);
if (!maps_->Parse()) {
error.code = ERROR_MAPS_PARSE;
return false;
}
if (process_memory_ == nullptr) {
process_memory_ = Memory::CreateProcessMemoryThreadCached(getpid());
}
return true;
}
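This is where the memory information starts to be read. It first determines the current architecture (most Android devices today are arm64) and then parses the memory maps of the process. Let's see how LocalUpdatableMaps::Parse works:
cpp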
bool LocalUpdatableMaps::Parse() {
pthread_rwlock_wrlock(&maps_rwlock_);
bool parsed = Maps::Parse();
pthread_rwlock_unlock(&maps_rwlock_);
return parsed;
}
Following the call one level down:
cpp
bool Maps::Parse() {
std::shared_ptr<MapInfo> prev_map;
return android::procinfo::ReadMapFile(GetMapsFile(),
[&](const android::procinfo::MapInfo& mapinfo) {
// Mark a device map in /dev/ and not in /dev/ashmem/ specially.
auto flags = mapinfo.flags;
if (strncmp(mapinfo.name.c_str(), "/dev/", 5) == 0 &&
strncmp(mapinfo.name.c_str() + 5, "ashmem/", 7) != 0) {
flags |= unwindstack::MAPS_FLAGS_DEVICE_MAP;
}
maps_.emplace_back(
MapInfo::Create(prev_map, mapinfo.start, mapinfo.end, mapinfo.pgoff, flags, mapinfo.name));
prev_map = maps_.back();
});
}
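The device-map check inside the lambda above can be tried in isolation. The following is only a sketch: MarkDeviceMap and kDeviceMapFlag are hypothetical names, and the flag value is made up because the real value of MAPS_FLAGS_DEVICE_MAP is not shown in this article.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical flag bit, chosen only for this sketch.
constexpr uint64_t kDeviceMapFlag = 0x8000;

// Returns flags with the device bit set when name lives under /dev/
// but not under /dev/ashmem/, mirroring the two strncmp calls above.
uint64_t MarkDeviceMap(const std::string& name, uint64_t flags) {
  const char* s = name.c_str();
  if (std::strncmp(s, "/dev/", 5) == 0 && std::strncmp(s + 5, "ashmem/", 7) != 0) {
    flags |= kDeviceMapFlag;
  }
  return flags;
}
```

Mappings flagged this way matter later: the unwind loop stops stepping when the pc or sp falls inside a device map.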
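This reads the memory maps of the process and converts each entry into the predefined MapInfo format. The file being read is "/proc/self/maps", which records the memory segments of the process. Its format looks like this:
text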
emu64a:/proc/4584 # cat maps |more
12c00000-5ac00000 rw-p 00000000 00:00 0 [anon:dalvik-main space (region space)]
6f0d5000-6f363000 rw-p 00000000 00:00 0 [anon:dalvik-/system/framework/boot.art]
6f363000-6f3a5000 rw-p 00000000 00:00 0 [anon:dalvik-/system/framework/boot-core-libart.art]
6f3a5000-6f3ce000 rw-p 00000000 00:00 0 [anon:dalvik-/system/framework/boot-okhttp.art]
6f3ce000-6f410000 rw-p 00000000 00:00 0 [anon:dalvik-/system/framework/boot-bouncycastle.art]
6f410000-6f411000 rw-p 00000000 00:00 0 [anon:dalvik-/system/framework/boot-apache-xml.art]
6f411000-6f4a4000 r--p 00000000 fe:00 1299 /system/framework/arm64/boot.oat
6f4a4000-6f788000 r-xp 00093000 fe:00 1299 /system/framework/arm64/boot.oat
6f788000-6f789000 rw-p 00000000 00:00 0 [anon:.bss]
6f789000-6f79c000 rw-p 00000000 fe:00 1300 /system/framework/arm64/boot.vdex
6f79c000-6f79d000 r--p 00377000 fe:00 1299 /system/framework/arm64/boot.oat
6f79d000-6f79e000 rw-p 00378000 fe:00 1299 /system/framework/arm64/boot.oat
6f79e000-6f7ac000 r--p 00000000 fe:00 1275 /system/framework/arm64/boot-core-libart.oat
6f7ac000-6f7f0000 r-xp 0000e000 fe:00 1275 /system/framework/arm64/boot-core-libart.oat
6f7f0000-6f7f1000 rw-p 00000000 00:00 0 [anon:.bss]
6f7f1000-6f7f4000 rw-p 00000000 fe:00 1276 /system/framework/arm64/boot-core-libart.vdex
6f7f4000-6f7f5000 r--p 00052000 fe:00 1275 /system/framework/arm64/boot-core-libart.oat
6f7f5000-6f7f6000 rw-p 00053000 fe:00 1275 /system/framework/arm64/boot-core-libart.oat
6f7f6000-6f802000 r--p 00000000 fe:00 1290 /system/framework/arm64/boot-okhttp.oat
6f802000-6f836000 r-xp 0000c000 fe:00 1290 /system/framework/arm64/boot-okhttp.oat
6f836000-6f837000 rw-p 00000000 00:00 0 [anon:.bss]
6f837000-6f839000 rw-p 00000000 fe:00 1291 /system/framework/arm64/boot-okhttp.vdex
6f839000-6f83a000 r--p 00040000 fe:00 1290 /system/framework/arm64/boot-okhttp.oat
6f83a000-6f83b000 rw-p 00041000 fe:00 1290 /system/framework/arm64/boot-okhttp.oat
6f83b000-6f843000 r--p 00000000 fe:00 1269 /system/framework/arm64/boot-bouncycastle.oat
6f843000-6f858000 r-xp 00008000 fe:00 1269 /system/framework/arm64/boot-bouncycastle.oat
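Each line contains the start and end of the virtual address range, the permissions, the offset of the mapped content within the backing file, the device number, the inode number, and finally the name of the mapping (a file path or a label such as [anon:.bss]).
The next step is to parse these fields, as follows:
cpp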
inline bool ReadMapFileContent(char* content, const MapInfoParamsCallback& callback) {
uint64_t start_addr;
uint64_t end_addr;
uint16_t flags;
uint64_t pgoff;
ino_t inode;
char* line_start = content;
char* next_line;
char* name;
bool shared;
while (line_start != nullptr && *line_start != '\0') {
bool parsed = ParseMapsFileLine(line_start, start_addr, end_addr, flags, pgoff,
inode, &name, shared, &next_line);
if (!parsed) {
return false;
}
line_start = next_line;
callback(start_addr, end_addr, flags, pgoff, inode, name, shared);
}
return true;
}
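That is the loop that walks the maps file. Each line is handled by ParseMapsFileLine; reading it side by side with the sample output above makes the meaning of each field clearer:
cpp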
// Parses the given line p pointing at proc/<pid>/maps content buffer and returns true on success
// and false on failure parsing. The first new line character of line will be replaced by the
// null character and *next_line will point to the character after the null.
//
// Example of how a parsed line look line:
// 00400000-00409000 r-xp 00000000 fc:00 426998 /usr/lib/gvfs/gvfsd-http
static inline bool ParseMapsFileLine(char* p, uint64_t& start_addr, uint64_t& end_addr, uint16_t& flags,
uint64_t& pgoff, ino_t& inode, char** name, bool& shared, char** next_line) {
// Make the first new line character null.
*next_line = strchr(p, '\n');
if (*next_line != nullptr) {
**next_line = '\0';
(*next_line)++;
}
char* end;
// start_addr
start_addr = strtoull(p, &end, 16);
if (end == p || *end != '-') {
return false;
}
p = end + 1;
// end_addr
end_addr = strtoull(p, &end, 16);
if (end == p) {
return false;
}
p = end;
if (!PassSpace(&p)) {
return false;
}
// flags
flags = 0;
if (*p == 'r') {
flags |= PROT_READ;
} else if (*p != '-') {
return false;
}
p++;
if (*p == 'w') {
flags |= PROT_WRITE;
} else if (*p != '-') {
return false;
}
p++;
if (*p == 'x') {
flags |= PROT_EXEC;
} else if (*p != '-') {
return false;
}
p++;
if (*p != 'p' && *p != 's') {
return false;
}
shared = *p == 's';
p++;
if (!PassSpace(&p)) {
return false;
}
// pgoff
pgoff = strtoull(p, &end, 16);
if (end == p) {
return false;
}
p = end;
if (!PassSpace(&p)) {
return false;
}
// major:minor
if (!PassXdigit(&p) || *p++ != ':' || !PassXdigit(&p) || !PassSpace(&p)) {
return false;
}
// inode
inode = strtoull(p, &end, 10);
if (end == p) {
return false;
}
p = end;
if (*p != '\0' && !PassSpace(&p)) {
return false;
}
// Assumes that the first new character was replaced with null.
*name = p;
return true;
}
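For experimenting with the format, a much shorter parser can be written with sscanf. This is a sketch and not the AOSP implementation: MapsLine and ParseLineSketch are hypothetical names, and it trades the careful character-by-character validation above for brevity.

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical record for one line of /proc/<pid>/maps.
struct MapsLine {
  uint64_t start = 0;
  uint64_t end = 0;
  char perms[5] = {};
  uint64_t offset = 0;
  unsigned dev_major = 0;
  unsigned dev_minor = 0;
  uint64_t inode = 0;
  std::string name;
};

// Parses "start-end perms offset major:minor inode [name]".
// Returns false if any of the seven mandatory fields is missing.
bool ParseLineSketch(const std::string& line, MapsLine* out) {
  int name_pos = -1;  // Set by %n to the offset where the name begins.
  int matched = std::sscanf(line.c_str(),
                            "%" SCNx64 "-%" SCNx64 " %4s %" SCNx64 " %x:%x %" SCNu64 " %n",
                            &out->start, &out->end, out->perms, &out->offset,
                            &out->dev_major, &out->dev_minor, &out->inode, &name_pos);
  if (matched != 7) {
    return false;
  }
  out->name = name_pos >= 0 ? line.substr(static_cast<size_t>(name_pos)) : std::string();
  return true;
}
```

Feeding it the r-xp line for boot.oat from the sample output gives start 0x6f4a4000, offset 0x93000 and inode 1299.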
Each parsed line is represented by a MapInfo, so it helps to read the two together:
cpp
// Represents virtual memory map (as obtained from /proc/*/maps).
//
// Note that we have to be surprisingly careful with memory usage here,
// since in system-wide profiling this data can take considerable space.
// (for example, 400 process * 400 maps * 128 bytes = 20 MB + string data).
class MapInfo {
public:
MapInfo(std::shared_ptr<MapInfo>& prev_map, uint64_t start, uint64_t end, uint64_t offset,
uint64_t flags, SharedString name)
: start_(start),
end_(end),
offset_(offset),
flags_(flags),
name_(name),
elf_fields_(nullptr),
prev_map_(prev_map) {}
MapInfo(uint64_t start, uint64_t end, uint64_t offset, uint64_t flags, SharedString name)
: start_(start),
end_(end),
offset_(offset),
flags_(flags),
name_(name),
elf_fields_(nullptr) {}
static inline std::shared_ptr<MapInfo> Create(std::shared_ptr<MapInfo>& prev_map,
uint64_t start, uint64_t end, uint64_t offset,
uint64_t flags, SharedString name) {
auto map_info = std::make_shared<MapInfo>(prev_map, start, end, offset, flags, name);
if (prev_map) {
prev_map->next_map_ = map_info;
}
return map_info;
}
Next, the memory reader:
cpp
std::shared_ptr<Memory> Memory::CreateProcessMemoryThreadCached(pid_t pid) {
if (pid == getpid()) {
return std::shared_ptr<Memory>(new MemoryThreadCache(new MemoryLocal()));
}
return std::shared_ptr<Memory>(new MemoryThreadCache(new MemoryRemote(pid)));
}
This only creates a Memory object; nothing is actually read yet. So what is the object used for? To find out, go back to the rest of Initialize:
cpp
bool AndroidUnwinder::Initialize(ErrorData& error) {
// Android stores the jit and dex file location only in the library
// libart.so or libartd.so.
static std::vector<std::string> search_libs [[clang::no_destroy]] = {"libart.so", "libartd.so"};
std::call_once(initialize_, [this, &error]() {
if (!InternalInitialize(error)) {
initialize_status_ = false;
return;
}
jit_debug_ = CreateJitDebug(arch_, process_memory_, search_libs);
#if defined(DEXFILE_SUPPORT)
dex_files_ = CreateDexFiles(arch_, process_memory_, search_libs);
#endif
initialize_status_ = true;
});
return initialize_status_;
}
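Because the JIT and dex information for Java code is recorded in the shared library of the runtime, it has to be located through the ART libraries (libart.so or libartd.so).
Next, CreateJitDebug:
cpp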
std::unique_ptr<JitDebug> CreateJitDebug(ArchEnum arch, std::shared_ptr<Memory>& memory,
std::vector<std::string> search_libs) {
return CreateGlobalDebugImpl<Elf>(arch, memory, search_libs, "__jit_debug_descriptor");
}
template <typename Symfile>
std::unique_ptr<GlobalDebugInterface<Symfile>> CreateGlobalDebugImpl(
ArchEnum arch, std::shared_ptr<Memory>& memory, std::vector<std::string> search_libs,
const char* global_variable_name) {
CHECK(arch != ARCH_UNKNOWN);
// The interface needs to see real-time changes in memory for synchronization with the
// concurrently running ART JIT compiler. Skip caching and read the memory directly.
std::shared_ptr<Memory> jit_memory;
MemoryCacheBase* cached_memory = memory->AsMemoryCacheBase();
if (cached_memory != nullptr) {
jit_memory = cached_memory->UnderlyingMemory();
} else {
jit_memory = memory;
}
switch (arch) {
case ARCH_X86: {
using Impl = GlobalDebugImpl<Symfile, uint32_t, Uint64_P>;
static_assert(offsetof(typename Impl::JITCodeEntry, symfile_size) == 12, "layout");
static_assert(offsetof(typename Impl::JITCodeEntry, seqlock) == 28, "layout");
static_assert(sizeof(typename Impl::JITCodeEntry) == 32, "layout");
static_assert(sizeof(typename Impl::JITDescriptor) == 48, "layout");
return std::make_unique<Impl>(arch, jit_memory, search_libs, global_variable_name);
}
case ARCH_ARM: {
using Impl = GlobalDebugImpl<Symfile, uint32_t, Uint64_A>;
static_assert(offsetof(typename Impl::JITCodeEntry, symfile_size) == 16, "layout");
static_assert(offsetof(typename Impl::JITCodeEntry, seqlock) == 32, "layout");
static_assert(sizeof(typename Impl::JITCodeEntry) == 40, "layout");
static_assert(sizeof(typename Impl::JITDescriptor) == 48, "layout");
return std::make_unique<Impl>(arch, jit_memory, search_libs, global_variable_name);
}
case ARCH_ARM64:
case ARCH_X86_64:
case ARCH_RISCV64: {
using Impl = GlobalDebugImpl<Symfile, uint64_t, Uint64_A>;
static_assert(offsetof(typename Impl::JITCodeEntry, symfile_size) == 24, "layout");
static_assert(offsetof(typename Impl::JITCodeEntry, seqlock) == 40, "layout");
static_assert(sizeof(typename Impl::JITCodeEntry) == 48, "layout");
static_assert(sizeof(typename Impl::JITDescriptor) == 56, "layout");
return std::make_unique<Impl>(arch, jit_memory, search_libs, global_variable_name);
}
default:
abort();
}
}
A GlobalDebugImpl is created according to the architecture; we only care about ARM64 here.
First, what AsMemoryCacheBase does:
cpp
MemoryCacheBase* AsMemoryCacheBase() override { return this; }
It simply returns itself. Next is UnderlyingMemory:
cpp
const std::shared_ptr<Memory>& UnderlyingMemory() { return impl_; }
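In our case this returns the MemoryLocal object. After that, the GlobalDebugImpl object is created.
Some background helps here: how does gdb debug Java code? At run time the JIT compiles Java code into directly executable machine code, so a debugger needs mapping information, such as the address and range of the code that corresponds to a given Java symbol.
That information is described by the following structures:
cpp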
struct JITCodeEntry {
Uintptr_T next;
Uintptr_T prev;
Uintptr_T symfile_addr;
Uint64_T symfile_size;
// Android-specific fields:
Uint64_T timestamp;
uint32_t seqlock;
};
struct JITDescriptor {
uint32_t version;
uint32_t action_flag;
Uintptr_T relevant_entry;
Uintptr_T first_entry;
// Android-specific fields:
uint8_t magic[8];
uint32_t flags;
uint32_t sizeof_descriptor;
uint32_t sizeof_entry;
uint32_t seqlock;
Uint64_T timestamp;
};
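The entries form a doubly linked list that hangs off the descriptor's first_entry. A minimal in-process sketch of walking that list follows. The struct and function names are hypothetical, and it uses plain pointers, whereas the real unwinder reads the entries out of the target process through a Memory object and uses the seqlock fields to detect concurrent changes (both omitted here).

```cpp
#include <cstdint>
#include <vector>

// Simplified in-process mirror of the basic GDB JIT interface fields.
struct JitCodeEntrySketch {
  JitCodeEntrySketch* next;
  JitCodeEntrySketch* prev;
  const char* symfile_addr;
  uint64_t symfile_size;
};

struct JitDescriptorSketch {
  uint32_t version;
  uint32_t action_flag;
  JitCodeEntrySketch* relevant_entry;
  JitCodeEntrySketch* first_entry;
};

// Collects the size of every registered symfile by following the next pointers.
std::vector<uint64_t> CollectSymfileSizes(const JitDescriptorSketch& desc) {
  std::vector<uint64_t> sizes;
  for (const JitCodeEntrySketch* e = desc.first_entry; e != nullptr; e = e->next) {
    sizes.push_back(e->symfile_size);
  }
  return sizes;
}
```

Each symfile is a small in-memory ELF image, which is why JitDebug is instantiated with Elf as its Symfile type.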
Next, DexFile:
cpp
std::unique_ptr<DexFiles> CreateDexFiles(ArchEnum arch, std::shared_ptr<Memory>& memory,
std::vector<std::string> search_libs) {
return CreateGlobalDebugImpl<DexFile>(arch, memory, search_libs, "__dex_debug_descriptor");
}
The flow is the same as for JitDebug and can be explored in more depth later.
At this point the Initialize flow is complete, and the actual unwinding begins:
cpp
bool AndroidLocalUnwinder::InternalUnwind(std::optional<pid_t> tid, AndroidUnwinderData& data) {
if (!tid) {
tid = android::base::GetThreadId();
}
if (static_cast<uint64_t>(*tid) == android::base::GetThreadId()) {
// Unwind current thread.
std::unique_ptr<Regs> regs(Regs::CreateFromLocal());
RegsGetLocal(regs.get());
return AndroidUnwinder::Unwind(regs.get(), data);
}
ThreadUnwinder unwinder(data.max_frames.value_or(max_frames_), maps_.get(), process_memory_);
unwinder.SetJitDebug(jit_debug_.get());
unwinder.SetDexFiles(dex_files_.get());
std::unique_ptr<Regs>* initial_regs = nullptr;
if (data.saved_initial_regs) {
initial_regs = &data.saved_initial_regs.value();
}
unwinder.UnwindWithSignal(kThreadUnwindSignal, *tid, initial_regs,
data.show_all_frames ? nullptr : &initial_map_names_to_skip_,
&map_suffixes_to_ignore_);
data.frames = unwinder.ConsumeFrames();
data.error = unwinder.LastError();
return data.frames.size() != 0;
}
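It first checks whether the unwind is for the current thread. If so, it unwinds directly; if not, it has to go through a signal. The former can be seen as a special case of the latter, so we follow the latter path.
A ThreadUnwinder object is constructed first, followed by a series of assignments with no real logic. Now for the main event, UnwindWithSignal:
cpp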
void ThreadUnwinder::UnwindWithSignal(int signal, pid_t tid, std::unique_ptr<Regs>* initial_regs,
const std::vector<std::string>* initial_map_names_to_skip,
const std::vector<std::string>* map_suffixes_to_ignore) {
ClearErrors();
if (tid == static_cast<pid_t>(android::base::GetThreadId())) {
last_error_.code = ERROR_UNSUPPORTED;
return;
}
if (!Init()) {
return;
}
ThreadEntry* entry = SendSignalToThread(signal, tid);
if (entry == nullptr) {
return;
}
std::unique_ptr<Regs> regs(Regs::CreateFromUcontext(Regs::CurrentArch(), entry->GetUcontext()));
if (initial_regs != nullptr) {
initial_regs->reset(regs->Clone());
}
SetRegs(regs.get());
UnwinderFromPid::Unwind(initial_map_names_to_skip, map_suffixes_to_ignore);
// Tell the signal handler to exit and release the entry.
entry->Wake();
// Wait for the thread to indicate it is done with the ThreadEntry.
// If this fails, the Wait command will log an error message.
entry->Wait(WAIT_FOR_THREAD_TO_RESTART);
ThreadEntry::Remove(entry);
}
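From the earlier logic we know that tid here is the tid of the target thread, which differs from the current thread. So the next steps are initialization and then sending a signal to the target thread.
First, initialization:
cpp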
bool UnwinderFromPid::Init() {
CHECK(arch_ != ARCH_UNKNOWN);
if (initted_) {
return true;
}
initted_ = true;
if (maps_ == nullptr) {
if (pid_ == getpid()) {
maps_ptr_.reset(new LocalMaps());
} else {
maps_ptr_.reset(new RemoteMaps(pid_));
}
if (!maps_ptr_->Parse()) {
ClearErrors();
last_error_.code = ERROR_INVALID_MAP;
return false;
}
maps_ = maps_ptr_.get();
}
if (process_memory_ == nullptr) {
if (pid_ == getpid()) {
// Local unwind, so use thread cache to allow multiple threads
// to cache data even when multiple threads access the same object.
process_memory_ = Memory::CreateProcessMemoryThreadCached(pid_);
} else {
// Remote unwind should be safe to cache since the unwind will
// be occurring on a stopped process.
process_memory_ = Memory::CreateProcessMemoryCached(pid_);
}
}
// jit_debug_ and dex_files_ may have already been set, for example in
// AndroidLocalUnwinder::InternalUnwind.
if (jit_debug_ == nullptr) {
jit_debug_ptr_ = CreateJitDebug(arch_, process_memory_);
SetJitDebug(jit_debug_ptr_.get());
}
#if defined(DEXFILE_SUPPORT)
if (dex_files_ == nullptr) {
dex_files_ptr_ = CreateDexFiles(arch_, process_memory_);
SetDexFiles(dex_files_ptr_.get());
}
#endif
return true;
}
Here maps_, process_memory_, jit_debug_ and dex_files_ have already been set up, so nothing needs to be initialized again. Next, the signal is sent:
cpp
ThreadEntry* ThreadUnwinder::SendSignalToThread(int signal, pid_t tid) {
static std::mutex action_mutex;
std::lock_guard<std::mutex> guard(action_mutex);
ThreadEntry* entry = ThreadEntry::Get(tid);
entry->Lock();
struct sigaction new_action = {.sa_sigaction = SignalHandler,
.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK};
struct sigaction old_action = {};
sigemptyset(&new_action.sa_mask);
if (sigaction(signal, &new_action, &old_action) != 0) { // Install the handler; once the signal is delivered, it runs on the target thread
Log::AsyncSafe("sigaction failed: %s", strerror(errno));
ThreadEntry::Remove(entry);
last_error_.code = ERROR_SYSTEM_CALL;
return nullptr;
}
if (tgkill(getpid(), tid, signal) != 0) { // Send the signal to the target thread
// Do not emit an error message, this might be expected. Set the
// error and let the caller decide.
if (errno == ESRCH) {
last_error_.code = ERROR_THREAD_DOES_NOT_EXIST;
} else {
last_error_.code = ERROR_SYSTEM_CALL;
}
sigaction(signal, &old_action, nullptr);
ThreadEntry::Remove(entry);
return nullptr;
}
// Wait for the thread to get the ucontext. The number indicates
// that we are waiting for the first Wake() call made by the thread.
bool wait_completed = entry->Wait(WAIT_FOR_UCONTEXT); // The current thread waits for the first Wake() from the target thread
if (wait_completed) {
return entry;
}
if (old_action.sa_sigaction == nullptr) {
// If the wait failed, it could be that the signal could not be delivered
// within the timeout. Add a signal handler that's simply going to log
// something so that we don't crash if the signal eventually gets
// delivered. Only do this if there isn't already an action set up.
struct sigaction log_action = {.sa_sigaction = SignalLogOnly,
.sa_flags = SA_RESTART | SA_SIGINFO | SA_ONSTACK};
sigemptyset(&log_action.sa_mask);
sigaction(signal, &log_action, nullptr);
} else {
sigaction(signal, &old_action, nullptr);
}
// Check to see if the thread has disappeared.
if (tgkill(getpid(), tid, 0) == -1 && errno == ESRCH) { // Signal 0 delivers nothing; it only probes whether the thread still exists
last_error_.code = ERROR_THREAD_DOES_NOT_EXIST;
} else {
last_error_.code = ERROR_THREAD_TIMEOUT;
}
ThreadEntry::Remove(entry);
return nullptr;
}
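The flow here is to install a signal handler for the current process (signal dispositions are process-wide), have it run on the target thread, and let the current thread wait for the notification. Communication between the two threads relies on ThreadEntry. Its relevant code is short:
cpp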
bool ThreadEntry::Wait(WaitType type) {
static const std::chrono::duration wait_time(std::chrono::seconds(10));
std::unique_lock<std::mutex> lock(wait_mutex_);
if (wait_cond_.wait_for(lock, wait_time, [this, type] { return wait_value_ == type; })) {
return true;
} else {
Log::AsyncSafe("Timeout waiting for %s", GetWaitTypeName(type));
return false;
}
}
void ThreadEntry::Wake() {
wait_mutex_.lock();
wait_value_++;
wait_mutex_.unlock();
wait_cond_.notify_one();
}
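The counter-based Wait and Wake pair is easy to try with two ordinary threads. The sketch below is an illustration rather than the libunwindstack code: Handshake and RunHandshake are hypothetical names, a plain std::thread stands in for the signal handler, and the values 1, 2 and 3 play the roles of WAIT_FOR_UCONTEXT, WAIT_FOR_UNWIND_TO_COMPLETE and WAIT_FOR_THREAD_TO_RESTART.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Minimal counterpart of the Wait/Wake pair: Wake bumps a counter,
// Wait blocks until the counter reaches the expected value or times out.
class Handshake {
 public:
  bool Wait(int expected) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cond_.wait_for(lock, std::chrono::seconds(10),
                          [this, expected] { return value_ == expected; });
  }
  void Wake() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++value_;
    }
    cond_.notify_one();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  int value_ = 0;
};

// Runs the three-step exchange and returns the steps in the order they happened.
std::vector<std::string> RunHandshake() {
  Handshake hs;
  std::vector<std::string> log;
  std::thread target([&] {
    log.push_back("target: context saved");
    hs.Wake();   // Counter becomes 1: the context is ready.
    hs.Wait(2);  // Stay parked until the unwind is finished.
    log.push_back("target: resuming");
    hs.Wake();   // Counter becomes 3: the target is done with the shared state.
  });
  hs.Wait(1);
  log.push_back("unwinder: unwinding");
  hs.Wake();     // Counter becomes 2: the unwind is complete.
  hs.Wait(3);
  target.join();
  return log;
}
```

Because every step waits for one specific counter value, the two threads can only advance in lockstep, which is what keeps the target thread parked while its stack is being read.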
Next is SignalHandler. Keep in mind that this handler runs on the target thread:
cpp
static void SignalHandler(int, siginfo_t*, void* sigcontext) {
android::base::ErrnoRestorer restore;
ThreadEntry* entry = ThreadEntry::Get(android::base::GetThreadId(), false);
if (!entry) {
return;
}
entry->CopyUcontextFromSigcontext(sigcontext);
// Indicate the ucontext is now valid.
entry->Wake();
// Pause the thread until the unwind is complete. This avoids having
// the thread run ahead causing problems.
// The number indicates that we are waiting for the second Wake() call
// overall which is made by the thread requesting an unwind.
if (entry->Wait(WAIT_FOR_UNWIND_TO_COMPLETE)) {
// Do not remove the entry here because that can result in a deadlock
// if the code cannot properly send a signal to the thread under test.
entry->Wake();
}
// If the wait fails, the entry might have been freed, so only exit.
}
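The flow now looks like a small state machine. The unwinding thread starts by waiting; the target thread saves its interrupted context and wakes the unwinding thread; the target thread then waits for the unwinding thread to wake it once the unwind is done; finally the target thread wakes the unwinding thread again.
The entry is where the context is saved:
cpp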
void ThreadEntry::CopyUcontextFromSigcontext(void* sigcontext) {
ucontext_t* ucontext = reinterpret_cast<ucontext_t*>(sigcontext);
// The only thing the unwinder cares about is the mcontext data.
memcpy(&ucontext_.uc_mcontext, &ucontext->uc_mcontext, sizeof(ucontext->uc_mcontext));
}
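Capturing a ucontext from inside a handler can be tried on Linux with a few lines. This is a sketch under simplifying assumptions: the signal is raised on the calling thread with raise() instead of being sent to another thread with tgkill, and CaptureHandler, CaptureOwnContext and the two globals are hypothetical names.

```cpp
#include <signal.h>
#include <ucontext.h>

#include <cstring>

static ucontext_t g_saved_context;            // Copy of the interrupted register state.
static volatile sig_atomic_t g_captured = 0;  // Set once the handler has run.

// With SA_SIGINFO, the third argument points at the ucontext of the interrupted code.
static void CaptureHandler(int, siginfo_t*, void* sigcontext) {
  const ucontext_t* uc = static_cast<const ucontext_t*>(sigcontext);
  std::memcpy(&g_saved_context.uc_mcontext, &uc->uc_mcontext, sizeof(uc->uc_mcontext));
  g_captured = 1;
}

// Installs the handler, raises the signal on the calling thread,
// then restores the previous action. Returns true if the context was captured.
bool CaptureOwnContext(int signal_number) {
  struct sigaction new_action = {};
  new_action.sa_sigaction = CaptureHandler;
  new_action.sa_flags = SA_RESTART | SA_SIGINFO;
  sigemptyset(&new_action.sa_mask);
  struct sigaction old_action = {};
  if (sigaction(signal_number, &new_action, &old_action) != 0) {
    return false;
  }
  g_captured = 0;
  raise(signal_number);
  sigaction(signal_number, &old_action, nullptr);
  return g_captured == 1;
}
```

Calling CaptureOwnContext(SIGUSR1) leaves the machine context of the interrupted code in g_saved_context, which is the same kind of data ThreadEntry stores for the unwinding thread.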
So what is mcontext? The glibc definition for x86_64 is shown here as an example:
c
/* Structure to describe FPU registers. */
typedef struct _libc_fpstate *fpregset_t;
/* Context to describe whole processor state. */
typedef struct
{
gregset_t gregs;
/* Note that fpregs is a pointer. */
fpregset_t fpregs;
__extension__ unsigned long long __reserved1 [8];
} mcontext_t;
/* Userlevel context. */
typedef struct ucontext
{
unsigned long int uc_flags;
struct ucontext *uc_link;
stack_t uc_stack;
mcontext_t uc_mcontext;
__sigset_t uc_sigmask;
struct _libc_fpstate __fpregs_mem;
} ucontext_t;
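In other words, it holds the values of the general-purpose and floating-point registers. The general-purpose registers are laid out as follows:
c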
/* Type for general register. */
__extension__ typedef long long int greg_t;
/* Number of general registers. */
#define NGREG 23
/* Container for all general registers. */
typedef greg_t gregset_t[NGREG];
#ifdef __USE_GNU
/* Number of each register in the `gregset_t' array. */
enum
{
REG_R8 = 0,
# define REG_R8 REG_R8
REG_R9,
# define REG_R9 REG_R9
REG_R10,
# define REG_R10 REG_R10
REG_R11,
# define REG_R11 REG_R11
REG_R12,
# define REG_R12 REG_R12
REG_R13,
# define REG_R13 REG_R13
REG_R14,
# define REG_R14 REG_R14
REG_R15,
# define REG_R15 REG_R15
REG_RDI,
# define REG_RDI REG_RDI
REG_RSI,
# define REG_RSI REG_RSI
REG_RBP,
# define REG_RBP REG_RBP
REG_RBX,
# define REG_RBX REG_RBX
REG_RDX,
# define REG_RDX REG_RDX
REG_RAX,
# define REG_RAX REG_RAX
REG_RCX,
# define REG_RCX REG_RCX
REG_RSP,
# define REG_RSP REG_RSP
REG_RIP,
# define REG_RIP REG_RIP
REG_EFL,
# define REG_EFL REG_EFL
REG_CSGSFS, /* Actually short cs, gs, fs, __pad0. */
# define REG_CSGSFS REG_CSGSFS
REG_ERR,
# define REG_ERR REG_ERR
REG_TRAPNO,
# define REG_TRAPNO REG_TRAPNO
REG_OLDMASK,
# define REG_OLDMASK REG_OLDMASK
REG_CR2
# define REG_CR2 REG_CR2
};
#endif
The floating-point registers are represented as follows:
c
struct _libc_fpxreg
{
unsigned short int significand[4];
unsigned short int exponent;
unsigned short int padding[3];
};
struct _libc_xmmreg
{
__uint32_t element[4];
};
struct _libc_fpstate
{
/* 64-bit FXSAVE format. */
__uint16_t cwd;
__uint16_t swd;
__uint16_t ftw;
__uint16_t fop;
__uint64_t rip;
__uint64_t rdp;
__uint32_t mxcsr;
__uint32_t mxcr_mask;
struct _libc_fpxreg _st[8];
struct _libc_xmmreg _xmm[16];
__uint32_t padding[24];
};
/* Structure to describe FPU registers. */
typedef struct _libc_fpstate *fpregset_t;
With this in place, the unwinding thread has the register context of the target thread.
Now let's see how the unwinding thread stores the register information:
cpp
Regs* Regs::CreateFromUcontext(ArchEnum arch, void* ucontext) {
switch (arch) {
case ARCH_X86:
return RegsX86::CreateFromUcontext(ucontext);
case ARCH_X86_64:
return RegsX86_64::CreateFromUcontext(ucontext);
case ARCH_ARM:
return RegsArm::CreateFromUcontext(ucontext);
case ARCH_ARM64:
return RegsArm64::CreateFromUcontext(ucontext);
case ARCH_RISCV64:
return RegsRiscv64::CreateFromUcontext(ucontext);
case ARCH_UNKNOWN:
default:
return nullptr;
}
}
We only care about ARM64, so continue there:
cpp
Regs* RegsArm64::CreateFromUcontext(void* ucontext) {
arm64_ucontext_t* arm64_ucontext = reinterpret_cast<arm64_ucontext_t*>(ucontext);
RegsArm64* regs = new RegsArm64();
memcpy(regs->RawData(), &arm64_ucontext->uc_mcontext.regs[0], ARM64_REG_LAST * sizeof(uint64_t));
return regs;
}
A cast first reinterprets the pointer as the arm64 context, then the register values are copied out.
The arm64 context structure looks like this:
cpp
struct arm64_mcontext_t {
uint64_t fault_address; // __u64
uint64_t regs[ARM64_REG_LAST]; // __u64
uint64_t pstate; // __u64
// Nothing else is used, so don't define it.
};
struct arm64_ucontext_t {
uint64_t uc_flags; // unsigned long
uint64_t uc_link; // struct ucontext*
arm64_stack_t uc_stack;
arm64_sigset_t uc_sigmask;
// The kernel adds extra padding after uc_sigmask to match glibc sigset_t on ARM64.
char __padding[128 - sizeof(arm64_sigset_t)];
// The full structure requires 16 byte alignment, but our partial structure
// doesn't, so force the alignment.
arm64_mcontext_t uc_mcontext __attribute__((aligned(16)));
};
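The register copy can be sketched with plain structs. The layout below assumes the ordering used by the Linux arm64 sigcontext (x0 to x30, then sp, then pc, all stored back to back); SigcontextSketch, RegsSketch, CopyRegs and the index constants are hypothetical names for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed ordering: x0..x30 occupy slots 0..30, sp is slot 31, pc is slot 32.
constexpr size_t kRegSp = 31;
constexpr size_t kRegPc = 32;
constexpr size_t kRegCount = 33;

struct SigcontextSketch {
  uint64_t fault_address;
  uint64_t regs[kRegCount];
  uint64_t pstate;
};

struct RegsSketch {
  uint64_t raw[kRegCount];
  uint64_t sp() const { return raw[kRegSp]; }
  uint64_t pc() const { return raw[kRegPc]; }
};

// Copies the whole register block in one memcpy, as CreateFromUcontext does.
RegsSketch CopyRegs(const SigcontextSketch& ctx) {
  RegsSketch regs;
  std::memcpy(regs.raw, &ctx.regs[0], kRegCount * sizeof(uint64_t));
  return regs;
}
```

After the copy, pc and sp are ordinary array lookups, which is all the unwind loop needs to get started.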
Back in UnwindWithSignal, the registers are then handed to the unwinder through SetRegs:
cpp
void SetRegs(Regs* regs) {
regs_ = regs;
arch_ = regs_ != nullptr ? regs->Arch() : ARCH_UNKNOWN;
}
Nothing is interpreted at this point, so we continue on to Unwind:
cpp
void UnwinderFromPid::Unwind(const std::vector<std::string>* initial_map_names_to_skip,
const std::vector<std::string>* map_suffixes_to_ignore) {
if (!Init()) {
return;
}
Unwinder::Unwind(initial_map_names_to_skip, map_suffixes_to_ignore);
}
We have already seen Init, so now we reach the core:
cpp
void Unwinder::Unwind(const std::vector<std::string>* initial_map_names_to_skip,
const std::vector<std::string>* map_suffixes_to_ignore) {
CHECK(arch_ != ARCH_UNKNOWN);
ClearErrors();
frames_.clear();
// Clear any cached data from previous unwinds.
process_memory_->Clear();
if (maps_->Find(regs_->pc()) == nullptr) {
regs_->fallback_pc();
}
bool return_address_attempt = false;
bool adjust_pc = false;
for (; frames_.size() < max_frames_;) {
uint64_t cur_pc = regs_->pc();
uint64_t cur_sp = regs_->sp();
std::shared_ptr<MapInfo> map_info = maps_->Find(regs_->pc());
uint64_t pc_adjustment = 0;
uint64_t step_pc;
uint64_t rel_pc;
Elf* elf;
bool ignore_frame = false;
if (map_info == nullptr) {
step_pc = regs_->pc();
rel_pc = step_pc;
// If we get invalid map via return_address_attempt, don't hide error for the previous frame.
if (!return_address_attempt || last_error_.code == ERROR_NONE) {
last_error_.code = ERROR_INVALID_MAP;
last_error_.address = step_pc;
}
elf = nullptr;
} else {
ignore_frame =
initial_map_names_to_skip != nullptr &&
std::find(initial_map_names_to_skip->begin(), initial_map_names_to_skip->end(),
android::base::Basename(map_info->name())) != initial_map_names_to_skip->end();
if (!ignore_frame && ShouldStop(map_suffixes_to_ignore, map_info->name())) {
break;
}
elf = map_info->GetElf(process_memory_, arch_);
step_pc = regs_->pc();
rel_pc = elf->GetRelPc(step_pc, map_info.get());
// Everyone except elf data in gdb jit debug maps uses the relative pc.
if (!(map_info->flags() & MAPS_FLAGS_JIT_SYMFILE_MAP)) {
step_pc = rel_pc;
}
if (adjust_pc) {
pc_adjustment = GetPcAdjustment(rel_pc, elf, arch_);
} else {
pc_adjustment = 0;
}
step_pc -= pc_adjustment;
// If the pc is in an invalid elf file, try and get an Elf object
// using the jit debug information.
if (!elf->valid() && jit_debug_ != nullptr && (map_info->flags() & PROT_EXEC)) {
uint64_t adjusted_jit_pc = regs_->pc() - pc_adjustment;
Elf* jit_elf = jit_debug_->Find(maps_, adjusted_jit_pc);
if (jit_elf != nullptr) {
// The jit debug information requires a non relative adjusted pc.
step_pc = adjusted_jit_pc;
elf = jit_elf;
}
}
}
FrameData* frame = nullptr;
if (!ignore_frame) {
if (regs_->dex_pc() != 0) {
// Add a frame to represent the dex file.
FillInDexFrame();
// Clear the dex pc so that we don't repeat this frame later.
regs_->set_dex_pc(0);
// Make sure there is enough room for the real frame.
if (frames_.size() == max_frames_) {
last_error_.code = ERROR_MAX_FRAMES_EXCEEDED;
break;
}
}
frame = FillInFrame(map_info, elf, rel_pc, pc_adjustment);
// Once a frame is added, stop skipping frames.
initial_map_names_to_skip = nullptr;
}
adjust_pc = true;
bool stepped = false;
bool in_device_map = false;
bool finished = false;
if (map_info != nullptr) {
if (map_info->flags() & MAPS_FLAGS_DEVICE_MAP) {
// Do not stop here, fall through in case we are
// in the speculative unwind path and need to remove
// some of the speculative frames.
in_device_map = true;
} else {
auto sp_info = maps_->Find(regs_->sp());
if (sp_info != nullptr && sp_info->flags() & MAPS_FLAGS_DEVICE_MAP) {
// Do not stop here, fall through in case we are
// in the speculative unwind path and need to remove
// some of the speculative frames.
in_device_map = true;
} else {
bool is_signal_frame = false;
if (elf->StepIfSignalHandler(rel_pc, regs_, process_memory_.get())) {
stepped = true;
is_signal_frame = true;
} else if (elf->Step(step_pc, regs_, process_memory_.get(), &finished,
&is_signal_frame)) {
stepped = true;
}
if (is_signal_frame && frame != nullptr) {
// Need to adjust the relative pc because the signal handler
// pc should not be adjusted.
frame->rel_pc = rel_pc;
frame->pc += pc_adjustment;
step_pc = rel_pc;
}
elf->GetLastError(&last_error_);
}
}
}
if (frame != nullptr) {
if (!resolve_names_ ||
!elf->GetFunctionName(step_pc, &frame->function_name, &frame->function_offset)) {
frame->function_name = "";
frame->function_offset = 0;
}
}
if (finished) {
break;
}
if (!stepped) {
if (return_address_attempt) {
// Only remove the speculative frame if there are more than two frames
// or the pc in the first frame is in a valid map.
// This allows for a case where the code jumps into the middle of
// nowhere, but there is no other unwind information after that.
if (frames_.size() > 2 || (frames_.size() > 0 && maps_->Find(frames_[0].pc) != nullptr)) {
// Remove the speculative frame.
frames_.pop_back();
}
break;
} else if (in_device_map) {
// Do not attempt any other unwinding, pc or sp is in a device
// map.
break;
} else {
// Steping didn't work, try this secondary method.
if (!regs_->SetPcFromReturnAddress(process_memory_.get())) {
break;
}
return_address_attempt = true;
}
} else {
return_address_attempt = false;
if (max_frames_ == frames_.size()) {
last_error_.code = ERROR_MAX_FRAMES_EXCEEDED;
}
}
// If the pc and sp didn't change, then consider everything stopped.
if (cur_pc == regs_->pc() && cur_sp == regs_->sp()) {
last_error_.code = ERROR_REPEATED_FRAME;
break;
}
}
}
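The overall shape of this loop (record a frame, step to the caller, stop on a limit or a repeated frame) can be seen in a much smaller form. Note that the sketch below swaps in a different technique, frame-pointer walking over a simulated memory, instead of the ELF unwind information that elf->Step consumes. FakeMemory, WalkResult and WalkFramePointers are hypothetical names, and each frame record is assumed to hold the caller's fp at [fp] and the return address at [fp + 8].

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Simulated process memory: address -> 64-bit word.
using FakeMemory = std::map<uint64_t, uint64_t>;

struct WalkResult {
  std::vector<uint64_t> pcs;
  bool repeated_frame = false;
  bool max_frames_hit = false;
};

WalkResult WalkFramePointers(const FakeMemory& mem, uint64_t pc, uint64_t fp,
                             size_t max_frames) {
  WalkResult result;
  while (result.pcs.size() < max_frames) {
    result.pcs.push_back(pc);  // Record the current frame.
    auto next_fp = mem.find(fp);
    auto next_pc = mem.find(fp + 8);
    if (fp == 0 || next_fp == mem.end() || next_pc == mem.end()) {
      return result;  // No readable frame record: the walk is finished.
    }
    if (next_pc->second == pc && next_fp->second == fp) {
      result.repeated_frame = true;  // Nothing changed, so stop instead of looping forever.
      return result;
    }
    pc = next_pc->second;
    fp = next_fp->second;
  }
  result.max_frames_hit = true;  // A further frame existed but the limit was reached.
  return result;
}
```

The two flags play the roles of ERROR_REPEATED_FRAME and ERROR_MAX_FRAMES_EXCEEDED in the real loop.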
After all of this, we have the unwinding result. Let's go through it piece by piece:
cpp
if (maps_->Find(regs_->pc()) == nullptr) {
regs_->fallback_pc();
}
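It first looks up the memory segment that contains the current pc. maps_ holds the memory mappings of the process, including those of the loaded shared libraries, so Find presumably does a binary search to see which mapping the pc falls in:
cpp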
std::shared_ptr<MapInfo> Maps::Find(uint64_t pc) {
if (maps_.empty()) {
return nullptr;
}
size_t first = 0;
size_t last = maps_.size();
while (first < last) {
size_t index = (first + last) / 2;
const auto& cur = maps_[index];
if (pc >= cur->start() && pc < cur->end()) {
return cur;
} else if (pc < cur->start()) {
last = index;
} else {
first = index + 1;
}
}
return nullptr;
}
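The same lookup can be reproduced on a plain vector of half-open ranges. This is a minimal sketch that assumes the ranges are sorted by start address and do not overlap, as the entries in /proc/self/maps are; Range and FindRange are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One mapping covering the half-open interval [start, end).
struct Range {
  uint64_t start;
  uint64_t end;
  std::string name;
};

// Returns the index of the range containing pc, or -1 if pc falls in a gap.
int FindRange(const std::vector<Range>& ranges, uint64_t pc) {
  size_t first = 0;
  size_t last = ranges.size();
  while (first < last) {
    size_t index = first + (last - first) / 2;
    const Range& cur = ranges[index];
    if (pc >= cur.start && pc < cur.end) {
      return static_cast<int>(index);
    }
    if (pc < cur.start) {
      last = index;       // Continue in the lower half.
    } else {
      first = index + 1;  // Continue in the upper half.
    }
  }
  return -1;
}
```

An address that lands in a gap between mappings returns -1, which corresponds to the nullptr case handled by fallback_pc.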
Just as expected. regs_->pc() simply reads the pc value:
cpp
uint64_t RegsArm64::pc() {
return regs_[ARM64_REG_PC];
}
And what if nothing is found?
cpp
void RegsArm64::fallback_pc() {
// As a last resort, try stripping the PC of the pointer
// authentication code.
regs_[ARM64_REG_PC] = strip_pac(regs_[ARM64_REG_PC], pac_mask_);
}
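Some Arm features store extra information in the upper bits of an address. HWASan, for example, keeps a tag in the high bits, and pointer authentication (PAC) places an authentication code there. The idea here is the same: strip those bits to recover a plain pc:
cpp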
static uint64_t strip_pac(uint64_t pc, uint64_t mask) {
// If the target is aarch64 then the return address may have been
// signed using the Armv8.3-A Pointer Authentication extension. The
// original return address can be restored by stripping out the
// authentication code using a mask or xpaclri. xpaclri is a NOP on
// pre-Armv8.3-A architectures.
if (mask) {
pc &= ~mask;
} else {
#if defined(__BIONIC__)
pc = __bionic_clear_pac_bits(pc);
#endif
}
return pc;
}
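The mask branch is easy to try on its own. The sketch below is only an illustration: kExampleMask (bits 39 to 54) and both addresses are made up, since the real mask depends on how the kernel configures the address space and is not given in this article.

```cpp
#include <cstdint>

// Hypothetical mask covering bits 39..54, where a PAC might live on a
// configuration with 39-bit virtual addresses. Not taken from real hardware.
constexpr uint64_t kExampleMask = 0x007fff8000000000ULL;

// Clears every bit that is set in `mask`, leaving the plain address bits.
constexpr uint64_t StripWithMask(uint64_t pc, uint64_t mask) {
  return pc & ~mask;
}

// A made-up "signed" return address: the real address 0x7123456789 with
// extra bits set inside the masked region.
constexpr uint64_t kSignedPc = 0x002a007123456789ULL;
static_assert(StripWithMask(kSignedPc, kExampleMask) == 0x7123456789ULL,
              "masked bits are cleared, address bits are kept");
```

An address with no bits set in the masked region passes through unchanged, so stripping an unsigned pc is harmless.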
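Next, the unwinder uses the map info to walk the stack until the number of frames reaches the limit. Android's default maximum is 512 frames, which is plenty in practice.
How would we unwind if we reasoned it through ourselves? Start by reading pc and sp, then find the mapping that contains the pc. That is exactly what the code does: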
java
uint64_t cur_pc = regs_->pc();
uint64_t cur_sp = regs_->sp();
std::shared_ptr<MapInfo> map_info = maps_->Find(regs_->pc());
If map_info is null, the frame cannot be resolved. Here is how that case is handled:
java
if (map_info == nullptr) {
step_pc = regs_->pc();
rel_pc = step_pc;
// If we get invalid map via return_address_attempt, don't hide error for the previous frame.
if (!return_address_attempt || last_error_.code == ERROR_NONE) {
last_error_.code = ERROR_INVALID_MAP;
last_error_.address = step_pc;
}
elf = nullptr;
}
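Normally some map info always contains the pc: the pc must lie within the process's virtual address space, and every mapping in that address space is listed in maps. Can the pc fall outside all of them? Certainly, for example when the code jumps to a wild address.
Next, the case where a matching map info is found: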
java
ignore_frame =
initial_map_names_to_skip != nullptr &&
std::find(initial_map_names_to_skip->begin(), initial_map_names_to_skip->end(),
android::base::Basename(map_info->name())) != initial_map_names_to_skip->end();
if (!ignore_frame && ShouldStop(map_suffixes_to_ignore, map_info->name())) {
break;
}
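Some shared objects should not show up in the stack trace, and initial_map_names_to_skip lets the caller name them: a private library, for instance, or libunwindstack.so itself, since having the unwinder unwind through its own frames is prone to problems. If the frame is not one of these, parsing continues: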
java
elf = map_info->GetElf(process_memory_, arch_);
step_pc = regs_->pc();
rel_pc = elf->GetRelPc(step_pc, map_info.get());
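The conversion done by the GetRelPc call above can be pictured with a small sketch. Mapping and ToRelPc are hypothetical names, and the formula is an assumption about what the conversion amounts to (offset of the pc within the mapping, plus the mapping's offset into the ELF, plus the load bias); the real GetRelPc may differ in detail.

```cpp
#include <cstdint>

// Hypothetical, simplified view of one mapping.
struct Mapping {
  uint64_t start;       // Virtual address where the mapping begins.
  uint64_t elf_offset;  // Where this mapping begins inside the ELF image.
};

// Assumption: rel_pc = (pc - map start) + offset into the ELF + load bias.
inline uint64_t ToRelPc(uint64_t pc, const Mapping& map, uint64_t load_bias) {
  return pc - map.start + map.elf_offset + load_bias;
}
```

For a mapping that starts at 0x7000001000 and sits 0x1000 bytes into the ELF, a pc of 0x7000001234 with zero load bias gives 0x1234, which is the kind of address the ELF's own tables are expressed in.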
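The code first obtains the ELF and then computes the relative pc. The address in pc is an absolute address in the process's virtual address space, while everything inside a .so is expressed relative to the .so itself, so a conversion is needed. First, how the ELF is obtained: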
java
Elf* MapInfo::GetElf(const std::shared_ptr<Memory>& process_memory, ArchEnum expected_arch) {
// Make sure no other thread is trying to add the elf to this map.
std::lock_guard<std::mutex> guard(elf_mutex());
if (elf().get() != nullptr) {
return elf().get();
}
ScopedElfCacheLock elf_cache_lock;
if (Elf::CachingEnabled() && !name().empty()) {
if (Elf::CacheGet(this)) {
return elf().get();
}
}
elf().reset(new Elf(CreateMemory(process_memory)));
// If the init fails, keep the elf around as an invalid object so we
// don't try to reinit the object.
elf()->Init();
if (elf()->valid() && expected_arch != elf()->arch()) {
// Make the elf invalid, mismatch between arch and expected arch.
elf()->Invalidate();
}
if (!elf()->valid()) {
set_elf_start_offset(offset());
} else if (auto prev_real_map = GetPrevRealMap(); prev_real_map != nullptr &&
prev_real_map->flags() == PROT_READ &&
prev_real_map->offset() < offset()) {
// If there is a read-only map then a read-execute map that represents the
// same elf object, make sure the previous map is using the same elf
// object if it hasn't already been set. Locking this should not result
// in a deadlock as long as the invariant that the code only ever tries
// to lock the previous real map holds true.
std::lock_guard<std::mutex> guard(prev_real_map->elf_mutex());
if (prev_real_map->elf() == nullptr) {
// Need to verify if the map is the previous read-only map.
prev_real_map->set_elf(elf());
prev_real_map->set_memory_backed_elf(memory_backed_elf());
prev_real_map->set_elf_start_offset(elf_start_offset());
prev_real_map->set_elf_offset(prev_real_map->offset() - elf_start_offset());
} else if (prev_real_map->elf_start_offset() == elf_start_offset()) {
// Discard this elf, and use the elf from the previous map instead.
set_elf(prev_real_map->elf());
}
}
// Cache the elf only after all of the above checks since we might
// discard the original elf we created.
if (Elf::CachingEnabled()) {
Elf::CacheAdd(this);
}
return elf().get();
}
The function starts by calling elf(); if an Elf object already exists, it is returned right away. Here is the implementation:
java
inline std::shared_ptr<Elf>& elf() { return GetElfFields().elf_; }
MapInfo::ElfFields& MapInfo::GetElfFields() {
ElfFields* elf_fields = elf_fields_.load(std::memory_order_acquire);
if (elf_fields != nullptr) {
return *elf_fields;
}
// Allocate and initialize the field in thread-safe way.
std::unique_ptr<ElfFields> desired(new ElfFields());
ElfFields* expected = nullptr;
// Strong version is reliable. Weak version might randomly return false.
if (elf_fields_.compare_exchange_strong(expected, desired.get())) {
return *desired.release(); // Success: we transferred the pointer ownership to the field.
} else {
return *expected; // Failure: 'expected' is updated to the value set by the other thread.
}
}
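This allocates the ElfFields that cache the ELF information. It is a good moment to mention the difference between the strong and weak forms of atomic compare-exchange. On ARM, atomic operations are implemented with LL/SC (load-linked/store-conditional) instructions: after the load, the later store takes effect only if nothing has written to that address in the meantime. This is a looser condition than a traditional CAS but performs better. The weak form corresponds to this behavior and can occasionally fail spuriously, returning false even though the value equals the expected one. The strong form rules that out by adding a layer of protection on top of weak.
If that is hard to keep in mind, a simple rule works: use strong whenever the call is not inside a retry loop, and consider weak when it is.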
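The rule of thumb can be sketched with std::atomic. Fields, GetOrCreate and AddOne are hypothetical names for illustration; GetOrCreate follows the same one-shot publish pattern as GetElfFields, and AddOne shows the retry-loop case.

```cpp
#include <atomic>

// Hypothetical lazily created object, standing in for ElfFields.
struct Fields {
  int value = 0;
};

// One-shot publish: there is a single attempt, so use the strong form.
// A spurious failure here would be mistaken for "another thread won".
Fields& GetOrCreate(std::atomic<Fields*>& slot) {
  Fields* current = slot.load(std::memory_order_acquire);
  if (current != nullptr) {
    return *current;
  }
  Fields* desired = new Fields();
  Fields* expected = nullptr;
  if (slot.compare_exchange_strong(expected, desired)) {
    return *desired;  // We published our object.
  }
  delete desired;     // Another thread published first; use its object.
  return *expected;
}

// Retry loop: a spurious failure only costs one more iteration, so the
// weak form is acceptable.
int AddOne(std::atomic<int>& counter) {
  int old_value = counter.load(std::memory_order_relaxed);
  while (!counter.compare_exchange_weak(old_value, old_value + 1)) {
    // old_value now holds the current value; try again.
  }
  return old_value + 1;
}
```

On failure both forms write the value they actually observed into the expected argument, which is why GetOrCreate can return *expected and why the loop in AddOne does not need to reload the counter.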
Back to the logic above: the first call to elf() necessarily returns null, so the next question is how the Elf object gets built.
java
if (Elf::CachingEnabled() && !name().empty()) {
if (Elf::CacheGet(this)) {
return elf().get();
}
}
The Elf cache is disabled by default, so nothing comes back from it here and the object has to be created explicitly:
java
elf().reset(new Elf(CreateMemory(process_memory)));
// If the init fails, keep the elf around as an invalid object so we
// don't try to reinit the object.
elf()->Init();
if (elf()->valid() && expected_arch != elf()->arch()) {
// Make the elf invalid, mismatch between arch and expected arch.
elf()->Invalidate();
}
Now CreateMemory:
java
Memory* MapInfo::CreateMemory(const std::shared_ptr<Memory>& process_memory) {
if (end() <= start()) {
return nullptr;
}
set_elf_offset(0);
// Fail on device maps.
if (flags() & MAPS_FLAGS_DEVICE_MAP) {
return nullptr;
}
// First try and use the file associated with the info.
if (!name().empty()) {
Memory* memory = GetFileMemory();
if (memory != nullptr) {
return memory;
}
}
if (process_memory == nullptr) {
return nullptr;
}
set_memory_backed_elf(true);
// Need to verify that this elf is valid. It's possible that
// only part of the elf file to be mapped into memory is in the executable
// map. In this case, there will be another read-only map that includes the
// first part of the elf file. This is done if the linker rosegment
// option is used.
std::unique_ptr<MemoryRange> memory(new MemoryRange(process_memory, start(), end() - start(), 0));
if (Elf::IsValidElf(memory.get())) {
set_elf_start_offset(offset());
auto next_real_map = GetNextRealMap();
// Might need to peek at the next map to create a memory object that
// includes that map too.
if (offset() != 0 || next_real_map == nullptr || offset() >= next_real_map->offset()) {
return memory.release();
}
// There is a possibility that the elf object has already been created
// in the next map. Since this should be a very uncommon path, just
// redo the work. If this happens, the elf for this map will eventually
// be discarded.
MemoryRanges* ranges = new MemoryRanges;
ranges->Insert(new MemoryRange(process_memory, start(), end() - start(), 0));
ranges->Insert(new MemoryRange(process_memory, next_real_map->start(),
next_real_map->end() - next_real_map->start(),
next_real_map->offset() - offset()));
return ranges;
}
auto prev_real_map = GetPrevRealMap();
// Find the read-only map by looking at the previous map. The linker
// doesn't guarantee that this invariant will always be true. However,
// if that changes, there is likely something else that will change and
// break something.
if (offset() == 0 || prev_real_map == nullptr || prev_real_map->offset() >= offset()) {
set_memory_backed_elf(false);
return nullptr;
}
// Make sure that relative pc values are corrected properly.
set_elf_offset(offset() - prev_real_map->offset());
// Use this as the elf start offset, otherwise, you always get offsets into
// the r-x section, which is not quite the right information.
set_elf_start_offset(prev_real_map->offset());
std::unique_ptr<MemoryRanges> ranges(new MemoryRanges);
if (!ranges->Insert(new MemoryRange(process_memory, prev_real_map->start(),
prev_real_map->end() - prev_real_map->start(), 0))) {
return nullptr;
}
if (!ranges->Insert(new MemoryRange(process_memory, start(), end() - start(), elf_offset()))) {
return nullptr;
}
return ranges.release();
}
The first step is to load the contents of the backing .so file, which is what GetFileMemory does:
java
Memory* MapInfo::GetFileMemory() {
// Fail on device maps.
if (flags() & MAPS_FLAGS_DEVICE_MAP) {
return nullptr;
}
std::unique_ptr<MemoryFileAtOffset> memory(new MemoryFileAtOffset);
if (offset() == 0) {
if (memory->Init(name(), 0)) {
return memory.release();
}
return nullptr;
}
// These are the possibilities when the offset is non-zero.
// - There is an elf file embedded in a file, and the offset is the
// the start of the elf in the file.
// - There is an elf file embedded in a file, and the offset is the
// the start of the executable part of the file. The actual start
// of the elf is in the read-only segment preceeding this map.
// - The whole file is an elf file, and the offset needs to be saved.
//
// Map in just the part of the file for the map. If this is not
// a valid elf, then reinit as if the whole file is an elf file.
// If the offset is a valid elf, then determine the size of the map
// and reinit to that size. This is needed because the dynamic linker
// only maps in a portion of the original elf, and never the symbol
// file data.
//
// For maps with MAPS_FLAGS_JIT_SYMFILE_MAP, the map range is for a JIT function,
// which can be smaller than elf header size. So make sure map_size is large enough
// to read elf header.
uint64_t map_size = std::max<uint64_t>(end() - start(), sizeof(ElfTypes64::Ehdr));
if (!memory->Init(name(), offset(), map_size)) {
return nullptr;
}
// Check if the start of this map is an embedded elf.
uint64_t max_size = 0;
if (Elf::GetInfo(memory.get(), &max_size)) {
set_elf_start_offset(offset());
if (max_size > map_size) {
if (memory->Init(name(), offset(), max_size)) {
return memory.release();
}
// Try to reinit using the default map_size.
if (memory->Init(name(), offset(), map_size)) {
return memory.release();
}
set_elf_start_offset(0);
return nullptr;
}
return memory.release();
}
// No elf at offset, try to init as if the whole file is an elf.
if (memory->Init(name(), 0) && Elf::IsValidElf(memory.get())) {
set_elf_offset(offset());
return memory.release();
}
// See if the map previous to this one contains a read-only map
// that represents the real start of the elf data.
if (InitFileMemoryFromPreviousReadOnlyMap(memory.get())) {
return memory.release();
}
// Failed to find elf at start of file or at read-only map, return
// file object from the current map.
if (memory->Init(name(), offset(), map_size)) {
return memory.release();
}
return nullptr;
}
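This maps the .so file and has to cover several cases: the whole file is a complete ELF, only part of the file is an ELF, or the ELF actually starts in the preceding map info. If the whole file is a complete ELF, the path is: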
java
if (offset() == 0) {
if (memory->Init(name(), 0)) {
return memory.release();
}
return nullptr;
}
If only part of the file is an ELF, the path is:
less
uint64_t map_size = std::max<uint64_t>(end() - start(), sizeof(ElfTypes64::Ehdr));
if (!memory->Init(name(), offset(), map_size)) {
return nullptr;
}
Here offset() is taken as the start of the ELF, and the goal is to find the ELF file header, which holds the following fields:
java
typedef struct
{
unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
Elf64_Half e_type; /* Object file type */
Elf64_Half e_machine; /* Architecture */
Elf64_Word e_version; /* Object file version */
Elf64_Addr e_entry; /* Entry point virtual address */
Elf64_Off e_phoff; /* Program header table file offset */
Elf64_Off e_shoff; /* Section header table file offset */
Elf64_Word e_flags; /* Processor-specific flags */
Elf64_Half e_ehsize; /* ELF header size in bytes */
Elf64_Half e_phentsize; /* Program header table entry size */
Elf64_Half e_phnum; /* Program header table entry count */
Elf64_Half e_shentsize; /* Section header table entry size */
Elf64_Half e_shnum; /* Section header table entry count */
Elf64_Half e_shstrndx; /* Section header string table index */
} Elf64_Ehdr;
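Once the header has been read from the file, the .so can be parsed, so the immediate goal is to locate that ELF header.
Next, check whether what was mapped really is an ELF: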
java
// Check if the start of this map is an embedded elf.
uint64_t max_size = 0;
if (Elf::GetInfo(memory.get(), &max_size)) {
set_elf_start_offset(offset());
if (max_size > map_size) {
if (memory->Init(name(), offset(), max_size)) {
return memory.release();
}
// Try to reinit using the default map_size.
if (memory->Init(name(), offset(), map_size)) {
return memory.release();
}
set_elf_start_offset(0);
return nullptr;
}
return memory.release();
}
GetInfo inspects the mapped data to see whether it is an ELF and, if so, works out values such as the size of the ELF:
java
bool Elf::GetInfo(Memory* memory, uint64_t* size) {
if (!IsValidElf(memory)) {
return false;
}
*size = 0;
uint8_t class_type;
if (!memory->ReadFully(EI_CLASS, &class_type, 1)) {
return false;
}
// Get the maximum size of the elf data from the header.
if (class_type == ELFCLASS32) {
ElfInterface32::GetMaxSize(memory, size);
} else if (class_type == ELFCLASS64) {
ElfInterface64::GetMaxSize(memory, size);
} else {
return false;
}
return true;
}
As a reminder, a file is recognized as an ELF by checking the magic number in its header:
java
bool Elf::IsValidElf(Memory* memory) {
if (memory == nullptr) {
return false;
}
// Verify that this is a valid elf file.
uint8_t e_ident[SELFMAG + 1];
if (!memory->ReadFully(0, e_ident, SELFMAG)) {
return false;
}
if (memcmp(e_ident, ELFMAG, SELFMAG) != 0) {
return false;
}
return true;
}
Exactly as expected.
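The magic check, together with the EI_CLASS read that GetInfo performs after it, can be tried on a plain byte buffer. ClassifyElf and ElfClass are hypothetical names for this sketch; the byte values are the standard ELF ones (magic 0x7f 'E' 'L' 'F' at offset 0, class byte at offset 4 with 1 for 32-bit and 2 for 64-bit).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class ElfClass { kInvalid, k32, k64 };

// Classifies a buffer by its first five bytes: the ELF magic at offset 0,
// then the EI_CLASS byte at offset 4.
ElfClass ClassifyElf(const uint8_t* data, size_t size) {
  static const uint8_t kMagic[4] = {0x7f, 'E', 'L', 'F'};
  if (data == nullptr || size < 5) {
    return ElfClass::kInvalid;  // Too short to hold magic plus class byte.
  }
  if (std::memcmp(data, kMagic, sizeof(kMagic)) != 0) {
    return ElfClass::kInvalid;  // Wrong magic: not an ELF at all.
  }
  switch (data[4]) {
    case 1:
      return ElfClass::k32;     // ELFCLASS32
    case 2:
      return ElfClass::k64;     // ELFCLASS64
    default:
      return ElfClass::kInvalid;
  }
}
```

A shell script that starts with "#!/bin" fails the magic comparison, while a buffer with the right magic but an unknown class byte is rejected in the switch.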
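If the magic number does not match, GetInfo returns false immediately and the data is not treated as an ELF. If it does match, GetInfo goes on to read the EI_CLASS byte of e_ident, which tells a 32-bit ELF (ELFCLASS32) from a 64-bit one (ELFCLASS64); any other value is rejected. With the class known, the size can be computed. But how do you get the size of an ELF? It cannot simply be the file size, because when the ELF is embedded in a larger file the ELF is smaller than the file. Instead, the size is derived from the ELF format itself: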
java
// This is an estimation of the size of the elf file using the location
// of the section headers and size. This assumes that the section headers
// are at the end of the elf file. If the elf has a load bias, the size
// will be too large, but this is acceptable.
template <typename ElfTypes>
void ElfInterfaceImpl<ElfTypes>::GetMaxSize(Memory* memory, uint64_t* size) {
EhdrType ehdr;
if (!memory->ReadFully(0, &ehdr, sizeof(ehdr))) {
*size = 0;
return;
}
// If this winds up as zero, the PT_LOAD reading will get a better value.
uint64_t elf_size = ehdr.e_shoff + ehdr.e_shentsize * ehdr.e_shnum;
// Search through the PT_LOAD values and if any result in a larger elf
// size, use that.
uint64_t offset = ehdr.e_phoff;
for (size_t i = 0; i < ehdr.e_phnum; i++, offset += ehdr.e_phentsize) {
PhdrType phdr;
if (!memory->ReadFully(offset, &phdr, sizeof(phdr))) {
break;
}
if (phdr.p_type == PT_LOAD) {
uint64_t end_offset;
if (__builtin_add_overflow(phdr.p_offset, phdr.p_memsz, &end_offset)) {
continue;
}
if (end_offset > elf_size) {
elf_size = end_offset;
}
}
}
*size = elf_size;
}
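See the trick? The section header table usually sits at the end of an ELF file, so its offset plus its size gives a first estimate. But an ELF has two views, a linking view and an execution view. Section headers serve the linking view and are not necessarily present at run time, whereas program headers are always there at run time. So the code makes a second pass over the PT_LOAD program headers and keeps the larger of the two values, which makes the estimate robust.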
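The two-pass estimate can be reproduced with plain numbers. LoadSegment and EstimateElfSize are hypothetical names for this sketch, which takes the header fields as arguments instead of reading them from memory.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, trimmed-down view of a PT_LOAD program header.
struct LoadSegment {
  uint64_t offset;  // p_offset: where the segment starts in the file.
  uint64_t memsz;   // p_memsz: size of the segment in memory.
};

// First estimate: end of the section header table. Second pass: the
// furthest end of any PT_LOAD segment. The larger value wins.
uint64_t EstimateElfSize(uint64_t shoff, uint16_t shentsize, uint16_t shnum,
                         const std::vector<LoadSegment>& loads) {
  uint64_t elf_size = shoff + static_cast<uint64_t>(shentsize) * shnum;
  for (const LoadSegment& seg : loads) {
    uint64_t end_offset = seg.offset + seg.memsz;
    if (end_offset < seg.offset) {
      continue;  // Unsigned wrap-around: ignore this bogus segment.
    }
    if (end_offset > elf_size) {
      elf_size = end_offset;
    }
  }
  return elf_size;
}
```

With section headers at 0x2000 (10 entries of 64 bytes) and PT_LOAD segments ending at 0x1800, the section header estimate of 0x2280 wins. If the section header fields are all zero, the first estimate is 0 and the PT_LOAD pass supplies 0x1800 instead.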
Here is the structure of a program header table entry:
java
typedef struct
{
Elf64_Word p_type; /* Segment type */
Elf64_Word p_flags; /* Segment flags */
Elf64_Off p_offset; /* Segment file offset */
Elf64_Addr p_vaddr; /* Segment virtual address */
Elf64_Addr p_paddr; /* Segment physical address */
Elf64_Xword p_filesz; /* Segment size in file */
Elf64_Xword p_memsz; /* Segment size in memory */
Elf64_Xword p_align; /* Segment alignment */
} Elf64_Phdr;
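With the ELF size in hand, the code tries to map the .so again and checks whether that succeeds. If this path does not work out, the remaining cases have to be tried. For example, the offset we have may not point at an ELF at all: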
java
// No elf at offset, try to init as if the whole file is an elf.
if (memory->Init(name(), 0) && Elf::IsValidElf(memory.get())) {
set_elf_offset(offset());
return memory.release();
}
Or perhaps the previous read-only map info with the same name is where the ELF header really is:
java
// See if the map previous to this one contains a read-only map
// that represents the real start of the elf data.
if (InitFileMemoryFromPreviousReadOnlyMap(memory.get())) {
return memory.release();
}
Here is InitFileMemoryFromPreviousReadOnlyMap:
java
bool MapInfo::InitFileMemoryFromPreviousReadOnlyMap(MemoryFileAtOffset* memory) {
// One last attempt, see if the previous map is read-only with the
// same name and stretches across this map.
auto prev_real_map = GetPrevRealMap();
if (prev_real_map == nullptr || prev_real_map->flags() != PROT_READ ||
prev_real_map->offset() >= offset()) {
return false;
}
uint64_t map_size = end() - prev_real_map->end();
if (!memory->Init(name(), prev_real_map->offset(), map_size)) {
return false;
}
uint64_t max_size;
if (!Elf::GetInfo(memory, &max_size) || max_size < map_size) {
return false;
}
if (!memory->Init(name(), prev_real_map->offset(), max_size)) {
return false;
}
set_elf_offset(offset() - prev_real_map->offset());
set_elf_start_offset(prev_real_map->offset());
return true;
}
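The offset bookkeeping in this function is easier to follow with concrete numbers. The sketch below only mirrors the arithmetic visible in the code above; MapRange, ElfPlacement and PlacementFromPrevMap are hypothetical names, and the flag and name checks are left out.

```cpp
#include <cstdint>

// Hypothetical view of one entry in the maps list.
struct MapRange {
  uint64_t start;   // First virtual address of the mapping.
  uint64_t end;     // One past the last virtual address.
  uint64_t offset;  // File offset the mapping starts at.
};

struct ElfPlacement {
  uint64_t elf_start_offset;  // File offset where the ELF begins.
  uint64_t elf_offset;        // Offset of the current map within the ELF.
  uint64_t initial_map_size;  // Size used for the first mapping attempt.
};

// Mirrors the arithmetic above for a read-only map `prev` followed by the
// executable map `cur`. Assumes prev.offset < cur.offset, which the real
// function checks before doing anything else.
ElfPlacement PlacementFromPrevMap(const MapRange& prev, const MapRange& cur) {
  ElfPlacement placement;
  placement.elf_start_offset = prev.offset;
  placement.elf_offset = cur.offset - prev.offset;
  placement.initial_map_size = cur.end - prev.end;
  return placement;
}
```

For a read-only map at file offset 0 covering 0x7000000000 to 0x7000001000, followed by an executable map at file offset 0x1000 covering 0x7000001000 to 0x7000003000, the ELF starts at file offset 0, the executable map sits 0x1000 bytes into it, and the first mapping attempt uses 0x2000 bytes.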