A Thoroughly Satisfying Debugging Session (How C++ Standard Library Exceptions Are Implemented)

PS: Please credit the source when reposting. All rights reserved by the author.

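PS: This post is based purely on my own understanding.

If it conflicts with your principles or views, please bear with me and go easy on the criticism.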

Environment

Preface


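While integrating and customizing the llama.cpp project I did a lot of work and ran into a lot of problems. Most were solved quickly and a few took some time, but two related problems left the deepest impression on me. I am writing this post to organize what I found and to dig into their root causes. While writing it, I also put together and submitted a related PR to llama.cpp (https://github.com/ggml-org/llama.cpp/pull/17653).

First, consider the following code sample: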

c++
try{
    {
        // ... ...
        if (!std::filesystem::exists("/bugreports")) {
            // ... ...
        }

        // ... ...
    }

    {
        std::filesystem::directory_iterator dir_it("/", std::filesystem::directory_options::skip_permission_denied);
        for (const auto & entry : dir_it) {
            // ... ...
        }
        // ... ...
    }

    return ;
}
catch (const std::exception& e){
    printf("exception: %s\n", e.what());
    return ;
}
catch(...){
    printf("Fatal Error, Unkown exception\n");
    return ;
}

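With the sample above, in the same execution environment (software and hardware) but under different build conditions, all three code paths get taken: the try block and both catch blocks. This gave me quite a headache. Here is the output of the two catch blocks: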

shell
exception: filesystem error: in posix_stat: failed to determine attributes for the specified path: Permission denied ["/bugreports"]
shell
Fatal Error, Unkown exception

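These three code paths correspond to the following questions:

  • Why does the same code on the same device take three different paths under different conditions? In particular, when does it run normally and when does it throw?
  • When do std::filesystem::exists and std::filesystem::directory_iterator throw?
  • For exceptions thrown by std::filesystem::exists/std::filesystem::directory_iterator, why does the catch path differ (that is, why is the filesystem error sometimes not caught as std::exception)?

We'll go through these questions one by one, using std::filesystem::exists as the example.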

Initial analysis


Why does the same code on the same device run normally or throw, depending on build conditions?

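In my case, actual testing showed that two different configurations in build.gradle, [compileSdk = 34, minSdk = 26, ndk = 26] and [compileSdk = 34, minSdk = 34, ndk = 26], produce different results: with minSdk = 26 the code throws, and with minSdk = 34 it runs normally.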

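From the analysis and tests above we can make an educated guess (very likely correct): since the NDK version is the same, the standard library implementation is the same too, so the main cause is that the different build conditions make the POSIX API return different results when we access the /bugreports directory.

As for the deeper reason why the POSIX API returns different results, I am not familiar with the low-level details of Android, so I will not dig further here. Maybe another time.

Next, let's look at the C++ standard library's implementation of std::filesystem::exists to see where the exception comes from.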

When does std::filesystem::exists throw?

Let's first check https://en.cppreference.com/w/cpp/filesystem/exists.html, which gives these declarations:

c++
bool exists( std::filesystem::file_status s ) noexcept; (1)	(since C++17)
bool exists( const std::filesystem::path& p ); (2)	(since C++17)
bool exists( const std::filesystem::path& p, std::error_code& ec ) noexcept; (3)	(since C++17)

/*
    Exceptions
        Any overload not marked noexcept may throw std::bad_alloc if memory allocation fails.

        2) Throws std::filesystem::filesystem_error on underlying OS API errors, constructed with p as the first path argument and the OS error code as the error code argument.
*/

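So for the usage above (overload 2), an error in the underlying OS API results in an exception, which is consistent with the standard.

Now let's look at how exists is actually implemented (libcxx):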
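If a permission error on a single path should not abort the surrounding logic, overload (3) is the non-throwing alternative. The following is only a minimal sketch: `path_exists_or_false` is a hypothetical helper name, and treating an OS error as "not found" is an assumption about what the caller wants, not something the original code does.

```cpp
#include <filesystem>
#include <system_error>

// Hypothetical helper: wraps the noexcept overload (3) of std::filesystem::exists.
// An OS-level failure such as "Permission denied" is reported through `ec`
// instead of being thrown as std::filesystem::filesystem_error.
bool path_exists_or_false(const std::filesystem::path& p, std::error_code& ec) noexcept {
    const bool found = std::filesystem::exists(p, ec);
    // If ec is set, the status could not be determined; this sketch treats that as "not found".
    return !ec && found;
}
```

A caller can still inspect `ec.message()` afterwards to log why the lookup failed.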

c++
inline _LIBCPP_HIDE_FROM_ABI bool exists(const path& __p) { return exists(__status(__p)); }

_LIBCPP_EXPORTED_FROM_ABI file_status __status(const path&, error_code* __ec = nullptr); 

file_status __status(const path& p, error_code* ec) { return detail::posix_stat(p, ec); }


inline file_status posix_stat(path const& p, error_code* ec) {
  StatT path_stat;
  return posix_stat(p, path_stat, ec);
}

inline file_status posix_stat(path const& p, StatT& path_stat, error_code* ec) {
  error_code m_ec;
  if (detail::stat(p.c_str(), &path_stat) == -1)
    m_ec = detail::capture_errno();
  return create_file_status(m_ec, p, path_stat, ec);
}

namespace detail {
using ::stat; //<sys/stat.h>
} // end namespace detail

inline file_status create_file_status(error_code& m_ec, path const& p, const StatT& path_stat, error_code* ec) {
  if (ec)
    *ec = m_ec;
  if (m_ec && (m_ec.value() == ENOENT || m_ec.value() == ENOTDIR)) {
    return file_status(file_type::not_found);
  } else if (m_ec) {
    ErrorHandler<void> err("posix_stat", ec, &p);
    err.report(m_ec, "failed to determine attributes for the specified path");
    return file_status(file_type::none);
  }

  // ... ... other code
}

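So the root cause of exists() throwing is that the call to detail::stat failed with Permission denied (EACCES). Because exists(p) passes no error_code (ec is nullptr), create_file_status then throws the exception through err.report.

Why is the std::filesystem::filesystem_error exception caught in different places?

Going back over the whole build with the minimal test code above, I found the following: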
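The decision made in create_file_status can be mimicked outside the library. The sketch below is a POSIX-only approximation and not libc++'s actual code: `exists_like_posix_stat` is a hypothetical name, and the message string is only modelled on the output shown earlier.

```cpp
#include <cerrno>
#include <filesystem>
#include <system_error>
#include <sys/stat.h>

namespace fs = std::filesystem;

// Hypothetical helper that follows the same three-way decision as the code above:
//   stat() succeeds           -> the path exists
//   ENOENT / ENOTDIR          -> "not found", no exception
//   any other errno (EACCES…) -> throw filesystem_error
bool exists_like_posix_stat(const fs::path& p) {
    struct stat st {};
    if (::stat(p.c_str(), &st) == -1) {
        const int err = errno;  // capture errno right after the failing call
        if (err == ENOENT || err == ENOTDIR)
            return false;
        throw fs::filesystem_error(
            "in posix_stat: failed to determine attributes for the specified path",
            p, std::error_code(err, std::generic_category()));
    }
    return true;
}
```

Only the last branch corresponds to the "Permission denied" case from the log; a missing path never throws.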

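  • On Android, when the code above lives in a shared object (.so) built with -Wl,--version-script, the vtable and typeinfo symbols are not exported.
  • When building the same sample on x86 with -Wl,--version-script enabled, the vtable and typeinfo symbols are still exported by default.

This puzzled me. After checking the compiler, the linker, the compile flags, the link flags and the symbols, I finally noticed something odd in one place: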

shell
  #  readelf -sW build/libnativelib.so|grep fs10filesystem16filesystem_errorE
  # The .so below catches the exception in catch (const std::exception& e); nm -CD also lists the fs10filesystem16filesystem_errorE symbols
    12: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _ZTINSt6__ndk14__fs10filesystem16filesystem_errorE
    18: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _ZTVNSt6__ndk14__fs10filesystem16filesystem_errorE
   235: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _ZTINSt6__ndk14__fs10filesystem16filesystem_errorE
   241: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  UND _ZTVNSt6__ndk14__fs10filesystem16filesystem_errorE

  # The .so below only catches the exception in catch(...); nm -CD lists no fs10filesystem16filesystem_errorE symbols
   393: 0000000000036340    24 OBJECT  LOCAL  DEFAULT   17 _ZTINSt6__ndk14__fs10filesystem16filesystem_errorE
   395: 0000000000036318    40 OBJECT  LOCAL  DEFAULT   17 _ZTVNSt6__ndk14__fs10filesystem16filesystem_errorE
   410: 000000000000ad5a    47 OBJECT  LOCAL  DEFAULT   11 _ZTSNSt6__ndk14__fs10filesystem16filesystem_errorE

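From the above we can see that in the working .so the typeinfo/vtable symbols are GLOBAL and undefined (UND), so their definitions should live in libc++.so or libstdc++.so. In the failing .so the typeinfo/vtable symbols are LOCAL and already defined.

After some searching, the difference turned out to be that ANDROID_STL defaults to c++_static in CMake (https://developer.android.com/ndk/guides/cpp-support?hl=zh-cn#selecting_a_c_runtime). In that case the C++ standard library is linked into my .so as a static library, so the related symbols end up local. Switching to c++_shared fixed the inconsistent catch paths.

In addition, when I kept building with c++_static and only manually made the typeinfo/vtable symbols resolve against libc++.so or libstdc++.so, the exception was also caught correctly.

The above only locates where the problem comes from. It does not answer why only catch(...) catches the exception when nm -CD shows no typeinfo/vtable symbols for fs10filesystem16filesystem_errorE. To answer that, we need a first look at how the C++ exception mechanism is implemented, which is what we do next.

A brief analysis of how C++ standard library exceptions are implemented

To stay as close as possible to my scenario and keep debugging easy, and because exception handling may differ between ABIs, the analysis below uses clang on x64 to walk through the basics of C++ exception handling (Itanium C++ ABI).

First, what assembly is executed when we throw an exception?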

c++
extern "C" __attribute__((visibility("default"))) void pp()
{
  throw std::runtime_error("test_exception");
}
shell
   0x00007ffff7f9a380 <+0>:     push   %rbp
   0x00007ffff7f9a381 <+1>:     mov    %rsp,%rbp
   0x00007ffff7f9a384 <+4>:     sub    $0x20,%rsp
   0x00007ffff7f9a388 <+8>:     mov    $0x10,%edi
=> 0x00007ffff7f9a38d <+13>:    call   0x7ffff7fb48e0 <__cxa_allocate_exception>
   0x00007ffff7f9a392 <+18>:    mov    %rax,%rdi
   0x00007ffff7f9a395 <+21>:    mov    %rdi,%rax
   0x00007ffff7f9a398 <+24>:    mov    %rax,-0x18(%rbp)
   0x00007ffff7f9a39c <+28>:    lea    -0x902d(%rip),%rsi        # 0x7ffff7f91376
   0x00007ffff7f9a3a3 <+35>:    call   0x7ffff7fb5e80 <_ZNSt13runtime_errorC2EPKc>
   0x00007ffff7f9a3a8 <+40>:    jmp    0x7ffff7f9a3ad <pp()+45>
   0x00007ffff7f9a3ad <+45>:    mov    -0x18(%rbp),%rdi
   0x00007ffff7f9a3b1 <+49>:    lea    0x1d158(%rip),%rsi        # 0x7ffff7fb7510 <_ZTISt13runtime_error>
   0x00007ffff7f9a3b8 <+56>:    lea    0xb1(%rip),%rdx        # 0x7ffff7f9a470 <_ZNSt15underflow_errorD2Ev>
   0x00007ffff7f9a3bf <+63>:    call   0x7ffff7fb4b00 <__cxa_throw>
   0x00007ffff7f9a3c4 <+68>:    mov    -0x18(%rbp),%rdi
   0x00007ffff7f9a3c8 <+72>:    mov    %rax,%rcx
   0x00007ffff7f9a3cb <+75>:    mov    %edx,%eax
   0x00007ffff7f9a3cd <+77>:    mov    %rcx,-0x8(%rbp)
   0x00007ffff7f9a3d1 <+81>:    mov    %eax,-0xc(%rbp)
   0x00007ffff7f9a3d4 <+84>:    call   0x7ffff7fb49c0 <__cxa_free_exception>
   0x00007ffff7f9a3d9 <+89>:    mov    -0x8(%rbp),%rdi
   0x00007ffff7f9a3dd <+93>:    call   0x7ffff7fb6160 <_Unwind_Resume@plt>

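From the disassembly we can see three steps. First, __cxa_allocate_exception allocates memory for the exception object in a dedicated area (not the ordinary stack, so unwinding cannot clobber it). Next, the std::runtime_error constructor runs on that memory, which is in effect a placement new. Finally, __cxa_throw starts stack unwinding and the search for a handler. The C++ standard's description of this flow is summarized at https://en.cppreference.com/w/cpp/language/throw.html.

Next, let's read the source of __cxa_throw to see how libc++ (libc++abi) implements exception unwinding.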
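The three calls in the disassembly can be reproduced by hand, which makes the lowering of a throw expression easier to see. This is an illustrative sketch for Itanium-ABI toolchains only (GCC or Clang on Linux, where `<cxxabi.h>` declares these entry points); `throw_by_hand` and `destroy_runtime_error` are hypothetical names, and real code should simply write `throw`.

```cpp
#include <cxxabi.h>
#include <new>
#include <stdexcept>
#include <typeinfo>

// Destructor thunk handed to __cxa_throw; the runtime calls it when the
// exception object is finally released.
static void destroy_runtime_error(void* obj) {
    static_cast<std::runtime_error*>(obj)->~runtime_error();
}

// Hypothetical helper doing by hand roughly what
// `throw std::runtime_error(msg);` compiles to.
void throw_by_hand(const char* msg) {
    // 1. Allocate storage for the exception object.
    void* storage = abi::__cxa_allocate_exception(sizeof(std::runtime_error));
    try {
        // 2. Construct the object in that storage (placement new).
        new (storage) std::runtime_error(msg);
    } catch (...) {
        // Mirrors the __cxa_free_exception cleanup path in the disassembly.
        abi::__cxa_free_exception(storage);
        throw;
    }
    // 3. Pass object, typeinfo and destructor to the runtime; this call does not return.
    abi::__cxa_throw(storage,
                     const_cast<std::type_info*>(&typeid(std::runtime_error)),
                     destroy_runtime_error);
}
```

A caller catches the result like any other exception, for example with `catch (const std::exception& e)`.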

libcxxabi\src\cxa_exception.cpp

c++
void
__cxa_throw(void *thrown_object, std::type_info *tinfo, void (_LIBCXXABI_DTOR_FUNC *dest)(void *)) {
    __cxa_eh_globals *globals = __cxa_get_globals();
    __cxa_exception* exception_header = cxa_exception_from_thrown_object(thrown_object);

    exception_header->unexpectedHandler = std::get_unexpected();
    exception_header->terminateHandler  = std::get_terminate();
    exception_header->exceptionType = tinfo;
    exception_header->exceptionDestructor = dest;
    setOurExceptionClass(&exception_header->unwindHeader);
    exception_header->referenceCount = 1;  // This is a newly allocated exception, no need for thread safety.
    globals->uncaughtExceptions += 1;   // Not atomically, since globals are thread-local

    exception_header->unwindHeader.exception_cleanup = exception_cleanup_func;

#if __has_feature(address_sanitizer)
    // Inform the ASan runtime that now might be a good time to clean stuff up.
    __asan_handle_no_return();
#endif

#ifdef __USING_SJLJ_EXCEPTIONS__
    _Unwind_SjLj_RaiseException(&exception_header->unwindHeader);
#else
    _Unwind_RaiseException(&exception_header->unwindHeader);
#endif
    //  This only happens when there is no handler, or some unexpected unwinding
    //     error happens.
    failed_throw(exception_header);
}

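Here the three parameters are: the std::runtime_error object that was just constructed, the typeinfo of the exception object, and the destructor for std::runtime_error. Unwinding then starts, using whichever exception model is in effect. One thing is worth noting: exceptionType is clearly related to the problem in this post. If the corresponding typeinfo is not exported, the exception may well fail to match elsewhere.

One more detail: there are roughly three common exception models, SJLJ (setjmp/longjmp), DWARF, and SEH (Windows). Linux-like systems currently use the DWARF-based model.

Following the execution flow above, let's look at the implementation of _Unwind_RaiseException.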
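The exceptionType stored in the header can be read back while an exception is being handled, which is handy when only catch(...) fires and the real type is unknown. A small debugging sketch for Itanium-ABI toolchains follows; `current_exception_mangled_name` is a hypothetical helper name, while `abi::__cxa_current_exception_type` is the entry point declared in `<cxxabi.h>`.

```cpp
#include <cxxabi.h>
#include <stdexcept>
#include <string>
#include <typeinfo>

// Hypothetical debugging helper; call it only from inside a catch block.
// Returns the mangled type name recorded for the exception currently being handled.
std::string current_exception_mangled_name() {
    const std::type_info* thrown = abi::__cxa_current_exception_type();
    return thrown ? thrown->name() : "<no active exception>";
}
```

Calling it from the catch(...) branch of the original sample would show which type actually arrived, for example a name containing `filesystem_error`.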

libunwind\src\UnwindLevel1.c

c
/// Called by __cxa_throw.  Only returns if there is a fatal error.
_LIBUNWIND_EXPORT _Unwind_Reason_Code
_Unwind_RaiseException(_Unwind_Exception *exception_object) {
  _LIBUNWIND_TRACE_API("_Unwind_RaiseException(ex_obj=%p)",
                       static_cast<void *>(exception_object));
  unw_context_t uc;
  unw_cursor_t cursor;
  __unw_getcontext(&uc);

  // This field for is for compatibility with GCC to say this isn't a forced
  // unwind. EHABI #7.2
  exception_object->unwinder_cache.reserved1 = 0;

  // phase 1: the search phase
  _Unwind_Reason_Code phase1 = unwind_phase1(&uc, &cursor, exception_object);
  if (phase1 != _URC_NO_REASON)
    return phase1;

  // phase 2: the clean up phase
  return unwind_phase2(&uc, &cursor, exception_object, false);
}

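From this we see that unwinding has two phases, phase 1 and phase 2; according to the comments they are search and cleanup. Let's look at what unwind_phase1 does first, followed by unwind_phase2.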

libunwind\src\UnwindLevel1.c

c
static _Unwind_Reason_Code
unwind_phase1(unw_context_t *uc, unw_cursor_t *cursor, _Unwind_Exception *exception_object) {
  __unw_init_local(cursor, uc);

  // Walk each frame looking for a place to stop.
  while (true) {
    // Ask libunwind to get next frame (skip over first which is
    // _Unwind_RaiseException).
    int stepResult = __unw_step(cursor);
    // ... ...

    // See if frame has code to run (has personality routine).
    unw_proc_info_t frameInfo;
    unw_word_t sp;
    if (__unw_get_proc_info(cursor, &frameInfo) != UNW_ESUCCESS) {
        // ... ...
    }

    // ... ...

    // If there is a personality routine, ask it if it will want to stop at
    // this frame.
    if (frameInfo.handler != 0) {
      _Unwind_Personality_Fn p =
          (_Unwind_Personality_Fn)(uintptr_t)(frameInfo.handler);
      _LIBUNWIND_TRACE_UNWINDING(
          "unwind_phase1(ex_ojb=%p): calling personality function %p",
          (void *)exception_object, (void *)(uintptr_t)p);
      _Unwind_Reason_Code personalityResult =
          (*p)(1, _UA_SEARCH_PHASE, exception_object->exception_class,
               exception_object, (struct _Unwind_Context *)(cursor));
      switch (personalityResult) {
      case _URC_HANDLER_FOUND:
        // found a catch clause or locals that need destructing in this frame
        // stop search and remember stack pointer at the frame
        __unw_get_reg(cursor, UNW_REG_SP, &sp);
        exception_object->private_2 = (uintptr_t)sp;
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase1(ex_ojb=%p): _URC_HANDLER_FOUND",
            (void *)exception_object);
        return _URC_NO_REASON;

      case _URC_CONTINUE_UNWIND:
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase1(ex_ojb=%p): _URC_CONTINUE_UNWIND",
            (void *)exception_object);
        // continue unwinding
        break;

      default:
        // something went wrong
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase1(ex_ojb=%p): _URC_FATAL_PHASE1_ERROR",
            (void *)exception_object);
        return _URC_FATAL_PHASE1_ERROR;
      }
    }
  }
  return _URC_NO_REASON;
}
c++
static _Unwind_Reason_Code
unwind_phase2(unw_context_t *uc, unw_cursor_t *cursor, _Unwind_Exception *exception_object) {
  __unw_init_local(cursor, uc);

  _LIBUNWIND_TRACE_UNWINDING("unwind_phase2(ex_ojb=%p)",
                             (void *)exception_object);

  // uc is initialized by __unw_getcontext in the parent frame. The first stack
  // frame walked is unwind_phase2.
  unsigned framesWalked = 1;
  // Walk each frame until we reach where search phase said to stop.
  while (true) {

    // Ask libunwind to get next frame (skip over first which is
    // _Unwind_RaiseException).
    int stepResult = __unw_step(cursor);
    // ... ...

    // Get info about this frame.
    unw_word_t sp;
    unw_proc_info_t frameInfo;
    __unw_get_reg(cursor, UNW_REG_SP, &sp);
    if (__unw_get_proc_info(cursor, &frameInfo) != UNW_ESUCCESS) {
        // ... ...
    }

    // ... ...

    ++framesWalked;
    // If there is a personality routine, tell it we are unwinding.
    if (frameInfo.handler != 0) {
      _Unwind_Personality_Fn p =
          (_Unwind_Personality_Fn)(uintptr_t)(frameInfo.handler);
      _Unwind_Action action = _UA_CLEANUP_PHASE;
      if (sp == exception_object->private_2) {
        // Tell personality this was the frame it marked in phase 1.
        action = (_Unwind_Action)(_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME);
      }
       _Unwind_Reason_Code personalityResult =
          (*p)(1, action, exception_object->exception_class, exception_object,
               (struct _Unwind_Context *)(cursor));
      switch (personalityResult) {
      case _URC_CONTINUE_UNWIND:
        // Continue unwinding
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase2(ex_ojb=%p): _URC_CONTINUE_UNWIND",
            (void *)exception_object);
        if (sp == exception_object->private_2) {
          // Phase 1 said we would stop at this frame, but we did not...
          _LIBUNWIND_ABORT("during phase1 personality function said it would "
                           "stop here, but now in phase2 it did not stop here");
        }
        break;
      case _URC_INSTALL_CONTEXT:
        _LIBUNWIND_TRACE_UNWINDING(
            "unwind_phase2(ex_ojb=%p): _URC_INSTALL_CONTEXT",
            (void *)exception_object);
        // Personality routine says to transfer control to landing pad.
        // We may get control back if landing pad calls _Unwind_Resume().
        if (_LIBUNWIND_TRACING_UNWINDING) {
          unw_word_t pc;
          __unw_get_reg(cursor, UNW_REG_IP, &pc);
          __unw_get_reg(cursor, UNW_REG_SP, &sp);
          _LIBUNWIND_TRACE_UNWINDING("unwind_phase2(ex_ojb=%p): re-entering "
                                     "user code with ip=0x%" PRIxPTR
                                     ", sp=0x%" PRIxPTR,
                                     (void *)exception_object, pc, sp);
        }

        __unw_phase2_resume(cursor, framesWalked);
        // __unw_phase2_resume() only returns if there was an error.
        return _URC_FATAL_PHASE2_ERROR;
      default:
        // Personality routine returned an unknown result code.
        _LIBUNWIND_DEBUG_LOG("personality function returned unknown result %d",
                             personalityResult);
        return _URC_FATAL_PHASE2_ERROR;
      }
    }
  }

  // Clean up phase did not resume at the frame that the search phase
  // said it would...
  return _URC_FATAL_PHASE2_ERROR;
}

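The code is quite clear: it gets the current stack frame's info, casts frameInfo.handler to an _Unwind_Personality_Fn, and calls that function. There are two cases:

  • unwind_phase1: with action = _UA_SEARCH_PHASE, we are searching for a catch block through _Unwind_Personality_Fn. When a handler is found, the personality routine returns _URC_HANDLER_FOUND and exception_object->private_2 is set to that frame's stack pointer, so that phase 2 knows where to stop.
  • unwind_phase2: when exception_object->private_2 == sp, action = (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME). The personality routine is called to set up the matching catch block and returns _URC_INSTALL_CONTEXT, and __unw_phase2_resume then transfers control to the handler.

Also, __unw_init_local does something very important here: it locates .eh_frame. A quick look at the code path: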
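The visible effect of the cleanup phase can be observed from ordinary C++ without touching libunwind. In the sketch below all names (`events`, `Guard`, `inner`, `middle`, `run_unwind_order_demo`) are made up for illustration; it only records the order in which destructors and the handler run, not the internals described above.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Global event log used to record the order of cleanups and the handler.
std::vector<std::string>& events() {
    static std::vector<std::string> log;
    return log;
}

// A local object whose destructor is run by a cleanup landing pad in phase 2.
struct Guard {
    const char* name;
    ~Guard() { events().push_back(std::string("cleanup:") + name); }
};

void inner()  { Guard g{"inner"};  throw std::runtime_error("boom"); }
void middle() { Guard g{"middle"}; inner(); }

void run_unwind_order_demo() {
    events().clear();
    try {
        middle();
    } catch (const std::runtime_error&) {
        // Runs only after every frame between the throw and this handler was cleaned up.
        events().push_back("handler");
    }
}
```

After `run_unwind_order_demo()` the log holds the inner cleanup, then the middle cleanup, then the handler, matching the frame-by-frame walk of unwind_phase2.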

c++
inline bool LocalAddressSpace::findUnwindSections(pint_t targetAddr,
                                                  UnwindInfoSections &info) {

    // ... ...

    info.dso_base = 0;
    // Bare metal is statically linked, so no need to ask the dynamic loader
    info.dwarf_section_length = (size_t)(&__eh_frame_end - &__eh_frame_start);
    info.dwarf_section =        (uintptr_t)(&__eh_frame_start);

    // ... ...
}

template <typename A, typename R>
void UnwindCursor<A, R>::setInfoBasedOnIPRegister(bool isReturnAddress) {

  // ... ...
  // Ask address space object to find unwind sections for this pc.
  UnwindInfoSections sects;
  if (_addressSpace.findUnwindSections(pc, sects)) 
  // ... ...
}

// template <typename A, typename R>
// int UnwindCursor<A, R>::step() {
//     // ... ...
//     this->setInfoBasedOnIPRegister(true);
//     // ... ...
// }

_LIBUNWIND_HIDDEN int __unw_init_local(unw_cursor_t *cursor,
                                       unw_context_t *context) {
  // ... ...
  // Use "placement new" to allocate UnwindCursor in the cursor buffer.
  new (reinterpret_cast<UnwindCursor<LocalAddressSpace, REGISTER_KIND> *>(cursor))
      UnwindCursor<LocalAddressSpace, REGISTER_KIND>(
          context, LocalAddressSpace::sThisAddressSpace);
#undef REGISTER_KIND
  AbstractUnwindCursor *co = (AbstractUnwindCursor *)cursor;
  co->setInfoBasedOnIPRegister();

  return UNW_ESUCCESS;
}

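The _Unwind_Personality_Fn here is defined by the Itanium C++ ABI; the documentation is at https://itanium-cxx-abi.github.io/cxx-abi/abi-eh.html#cxx-throw. Its job is the C++-specific part of stack unwinding. In gcc/clang this function is called __gxx_personality_v0, so let's go straight to its source.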

libcxxabi\src\cxa_personality.cpp

c++
#if !defined(_LIBCXXABI_ARM_EHABI)
#if defined(__SEH__) && !defined(__USING_SJLJ_EXCEPTIONS__)
static _Unwind_Reason_Code __gxx_personality_imp
#else
_LIBCXXABI_FUNC_VIS _Unwind_Reason_Code
#ifdef __USING_SJLJ_EXCEPTIONS__
__gxx_personality_sj0
#elif defined(__MVS__)
__zos_cxx_personality_v2
#else
__gxx_personality_v0
#endif
#endif
                    (int version, _Unwind_Action actions, uint64_t exceptionClass,
                     _Unwind_Exception* unwind_exception, _Unwind_Context* context)
{
    if (version != 1 || unwind_exception == 0 || context == 0)
        return _URC_FATAL_PHASE1_ERROR;

    bool native_exception = (exceptionClass     & get_vendor_and_language) ==
                            (kOurExceptionClass & get_vendor_and_language);
    scan_results results;
    // Process a catch handler for a native exception first.
    if (actions == (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME) &&
        native_exception) {
        // Reload the results from the phase 1 cache.
        __cxa_exception* exception_header =
            (__cxa_exception*)(unwind_exception + 1) - 1;
        results.ttypeIndex = exception_header->handlerSwitchValue;
        results.actionRecord = exception_header->actionRecord;
        results.languageSpecificData = exception_header->languageSpecificData;
        results.landingPad =
            reinterpret_cast<uintptr_t>(exception_header->catchTemp);
        results.adjustedPtr = exception_header->adjustedPtr;

        // Jump to the handler.
        set_registers(unwind_exception, context, results);
        // Cache base for calculating the address of ttype in
        // __cxa_call_unexpected.
        if (results.ttypeIndex < 0) {
#if defined(_AIX)
          exception_header->catchTemp = (void *)_Unwind_GetDataRelBase(context);
#else
          exception_header->catchTemp = 0;
#endif
        }
        return _URC_INSTALL_CONTEXT;
    }

    // In other cases we need to scan LSDA.
    scan_eh_tab(results, actions, native_exception, unwind_exception, context);
    if (results.reason == _URC_CONTINUE_UNWIND ||
        results.reason == _URC_FATAL_PHASE1_ERROR)
        return results.reason;

    if (actions & _UA_SEARCH_PHASE)
    {
        // Phase 1 search:  All we're looking for in phase 1 is a handler that
        //   halts unwinding
        assert(results.reason == _URC_HANDLER_FOUND);
        if (native_exception) {
            // For a native exception, cache the LSDA result.
            __cxa_exception* exc = (__cxa_exception*)(unwind_exception + 1) - 1;
            exc->handlerSwitchValue = static_cast<int>(results.ttypeIndex);
            exc->actionRecord = results.actionRecord;
            exc->languageSpecificData = results.languageSpecificData;
            exc->catchTemp = reinterpret_cast<void*>(results.landingPad);
            exc->adjustedPtr = results.adjustedPtr;
        }
        return _URC_HANDLER_FOUND;
    }

    assert(actions & _UA_CLEANUP_PHASE);
    assert(results.reason == _URC_HANDLER_FOUND);
    set_registers(unwind_exception, context, results);
    // Cache base for calculating the address of ttype in __cxa_call_unexpected.
    if (results.ttypeIndex < 0) {
      __cxa_exception* exception_header =
            (__cxa_exception*)(unwind_exception + 1) - 1;
#if defined(_AIX)
      exception_header->catchTemp = (void *)_Unwind_GetDataRelBase(context);
#else
      exception_header->catchTemp = 0;
#endif
    }
    return _URC_INSTALL_CONTEXT;
}

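Looking at this code as a whole, both phase 1 and phase 2 end up here:

  • phase 1, action = _UA_SEARCH_PHASE: calls scan_eh_tab to look for a catch block and returns _URC_HANDLER_FOUND when one is found.
  • phase 2, action = (_UA_CLEANUP_PHASE | _UA_HANDLER_FRAME): uses set_registers to set up the matching catch block and returns _URC_INSTALL_CONTEXT; __unw_phase2_resume then jumps to that catch block.

From this implementation, scan_eh_tab is the core: it is where the search for and matching of a handler happens. Its source is as follows: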

c++
static void scan_eh_tab(scan_results &results, _Unwind_Action actions,
                        bool native_exception,
                        _Unwind_Exception *unwind_exception,
                        _Unwind_Context *context) {
    // Initialize results to found nothing but an error
    results.ttypeIndex = 0;
    results.actionRecord = 0;
    results.languageSpecificData = 0;
    results.landingPad = 0;
    results.adjustedPtr = 0;
    results.reason = _URC_FATAL_PHASE1_ERROR;
    // Check for consistent actions
    // ... ...

    // Start scan by getting exception table address.
    const uint8_t *lsda = (const uint8_t *)_Unwind_GetLanguageSpecificData(context);
    if (lsda == 0)
    {
        // There is no exception table
        results.reason = _URC_CONTINUE_UNWIND;
        return;
    }
    results.languageSpecificData = lsda;
#if defined(_AIX)
    uintptr_t base = _Unwind_GetDataRelBase(context);
#else
    uintptr_t base = 0;
#endif
    // Get the current instruction pointer and offset it before next
    // instruction in the current frame which threw the exception.
    uintptr_t ip = _Unwind_GetIP(context) - 1;
    // Get beginning current frame's code (as defined by the
    // emitted dwarf code)
    uintptr_t funcStart = _Unwind_GetRegionStart(context);
#ifdef __USING_SJLJ_EXCEPTIONS__
    if (ip == uintptr_t(-1))
    {
        // no action
        results.reason = _URC_CONTINUE_UNWIND;
        return;
    }
    else if (ip == 0)
        call_terminate(native_exception, unwind_exception);
    // ip is 1-based index into call site table
#else  // !__USING_SJLJ_EXCEPTIONS__
    uintptr_t ipOffset = ip - funcStart;
#endif // !defined(_USING_SLJL_EXCEPTIONS__)
    const uint8_t* classInfo = NULL;
    // Note: See JITDwarfEmitter::EmitExceptionTable(...) for corresponding
    //       dwarf emission
    // Parse LSDA header.
    uint8_t lpStartEncoding = *lsda++;
    const uint8_t* lpStart =
        (const uint8_t*)readEncodedPointer(&lsda, lpStartEncoding, base);
    if (lpStart == 0)
        lpStart = (const uint8_t*)funcStart;
    uint8_t ttypeEncoding = *lsda++;
    if (ttypeEncoding != DW_EH_PE_omit)
    {
        // Calculate type info locations in emitted dwarf code which
        // were flagged by type info arguments to llvm.eh.selector
        // intrinsic
        uintptr_t classInfoOffset = readULEB128(&lsda);
        classInfo = lsda + classInfoOffset;
    }
    // Walk call-site table looking for range that
    // includes current PC.
    uint8_t callSiteEncoding = *lsda++;
#ifdef __USING_SJLJ_EXCEPTIONS__
    (void)callSiteEncoding;  // When using SjLj exceptions, callSiteEncoding is never used
#endif
    uint32_t callSiteTableLength = static_cast<uint32_t>(readULEB128(&lsda));
    const uint8_t* callSiteTableStart = lsda;
    const uint8_t* callSiteTableEnd = callSiteTableStart + callSiteTableLength;
    const uint8_t* actionTableStart = callSiteTableEnd;
    const uint8_t* callSitePtr = callSiteTableStart;
    while (callSitePtr < callSiteTableEnd)
    {
        // There is one entry per call site.
#ifndef __USING_SJLJ_EXCEPTIONS__
        // The call sites are non-overlapping in [start, start+length)
        // The call sites are ordered in increasing value of start
        uintptr_t start = readEncodedPointer(&callSitePtr, callSiteEncoding);
        uintptr_t length = readEncodedPointer(&callSitePtr, callSiteEncoding);
        uintptr_t landingPad = readEncodedPointer(&callSitePtr, callSiteEncoding);
        uintptr_t actionEntry = readULEB128(&callSitePtr);
        if ((start <= ipOffset) && (ipOffset < (start + length)))
#else  // __USING_SJLJ_EXCEPTIONS__
        // ip is 1-based index into this table
        uintptr_t landingPad = readULEB128(&callSitePtr);
        uintptr_t actionEntry = readULEB128(&callSitePtr);
        if (--ip == 0)
#endif // __USING_SJLJ_EXCEPTIONS__
        {
            // Found the call site containing ip.
#ifndef __USING_SJLJ_EXCEPTIONS__
            if (landingPad == 0)
            {
                // No handler here
                results.reason = _URC_CONTINUE_UNWIND;
                return;
            }
            landingPad = (uintptr_t)lpStart + landingPad;
#else  // __USING_SJLJ_EXCEPTIONS__
            ++landingPad;
#endif // __USING_SJLJ_EXCEPTIONS__
            results.landingPad = landingPad;
            if (actionEntry == 0)
            {
                // Found a cleanup
                results.reason = actions & _UA_SEARCH_PHASE
                                     ? _URC_CONTINUE_UNWIND
                                     : _URC_HANDLER_FOUND;
                return;
            }
            // Convert 1-based byte offset into
            const uint8_t* action = actionTableStart + (actionEntry - 1);
            bool hasCleanup = false;
            // Scan action entries until you find a matching handler, cleanup, or the end of action list
            while (true)
            {
                const uint8_t* actionRecord = action;
                int64_t ttypeIndex = readSLEB128(&action);
                if (ttypeIndex > 0)
                {
                    // Found a catch, does it actually catch?
                    // First check for catch (...)
                    const __shim_type_info* catchType =
                        get_shim_type_info(static_cast<uint64_t>(ttypeIndex),
                                           classInfo, ttypeEncoding,
                                           native_exception, unwind_exception,
                                           base);
                    if (catchType == 0)
                    {
                        // Found catch (...) catches everything, including
                        // foreign exceptions. This is search phase, cleanup
                        // phase with foreign exception, or forced unwinding.
                        assert(actions & (_UA_SEARCH_PHASE | _UA_HANDLER_FRAME |
                                          _UA_FORCE_UNWIND));
                        results.ttypeIndex = ttypeIndex;
                        results.actionRecord = actionRecord;
                        results.adjustedPtr =
                            get_thrown_object_ptr(unwind_exception);
                        results.reason = _URC_HANDLER_FOUND;
                        return;
                    }
                    // Else this is a catch (T) clause and will never
                    //    catch a foreign exception
                    else if (native_exception)
                    {
                        __cxa_exception* exception_header = (__cxa_exception*)(unwind_exception+1) - 1;
                        void* adjustedPtr = get_thrown_object_ptr(unwind_exception);
                        const __shim_type_info* excpType =
                            static_cast<const __shim_type_info*>(exception_header->exceptionType);
                        if (adjustedPtr == 0 || excpType == 0)
                        {
                            // Something very bad happened
                            call_terminate(native_exception, unwind_exception);
                        }
                        if (catchType->can_catch(excpType, adjustedPtr))
                        {
                            // Found a matching handler. This is either search
                            // phase or forced unwinding.
                            assert(actions &
                                   (_UA_SEARCH_PHASE | _UA_FORCE_UNWIND));
                            results.ttypeIndex = ttypeIndex;
                            results.actionRecord = actionRecord;
                            results.adjustedPtr = adjustedPtr;
                            results.reason = _URC_HANDLER_FOUND;
                            return;
                        }
                    }
                    // Scan next action ...
                }
                else if (ttypeIndex < 0)
                {
                    // Found an exception specification.
                    if (actions & _UA_FORCE_UNWIND) {
                        // Skip if forced unwinding.
                    } else if (native_exception) {
                        // Does the exception spec catch this native exception?
                        __cxa_exception* exception_header = (__cxa_exception*)(unwind_exception+1) - 1;
                        void* adjustedPtr = get_thrown_object_ptr(unwind_exception);
                        const __shim_type_info* excpType =
                            static_cast<const __shim_type_info*>(exception_header->exceptionType);
                        if (adjustedPtr == 0 || excpType == 0)
                        {
                            // Something very bad happened
                            call_terminate(native_exception, unwind_exception);
                        }
                        if (exception_spec_can_catch(ttypeIndex, classInfo,
                                                     ttypeEncoding, excpType,
                                                     adjustedPtr,
                                                     unwind_exception, base))
                        {
                            // Native exception caught by exception
                            // specification.
                            assert(actions & _UA_SEARCH_PHASE);
                            results.ttypeIndex = ttypeIndex;
                            results.actionRecord = actionRecord;
                            results.adjustedPtr = adjustedPtr;
                            results.reason = _URC_HANDLER_FOUND;
                            return;
                        }
                    } else {
                        // foreign exception caught by exception spec
                        results.ttypeIndex = ttypeIndex;
                        results.actionRecord = actionRecord;
                        results.adjustedPtr =
                            get_thrown_object_ptr(unwind_exception);
                        results.reason = _URC_HANDLER_FOUND;
                        return;
                    }
                    // Scan next action ...
                } else {
                    hasCleanup = true;
                }
                const uint8_t* temp = action;
                int64_t actionOffset = readSLEB128(&temp);
                if (actionOffset == 0)
                {
                    // End of action list. If this is phase 2 and we have found
                    // a cleanup (ttypeIndex=0), return _URC_HANDLER_FOUND;
                    // otherwise return _URC_CONTINUE_UNWIND.
                    results.reason = hasCleanup && actions & _UA_CLEANUP_PHASE
                                         ? _URC_HANDLER_FOUND
                                         : _URC_CONTINUE_UNWIND;
                    return;
                }
                // Go to next action
                action += actionOffset;
            }  // there is no break out of this loop, only return
        }
#ifndef __USING_SJLJ_EXCEPTIONS__
        else if (ipOffset < start)
        {
            // There is no call site for this ip
            // Something bad has happened.  We should never get here.
            // Possible stack corruption.
            call_terminate(native_exception, unwind_exception);
        }
#endif // !__USING_SJLJ_EXCEPTIONS__
    }  // there might be some tricky cases which break out of this loop

    // It is possible that no eh table entry specify how to handle
    // this exception. By spec, terminate it immediately.
    call_terminate(native_exception, unwind_exception);
}

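As the code shows, the core of this routine is to fetch the LSDA (via _Unwind_GetLanguageSpecificData; the data lives in the .gcc_except_table section) and match it against the thrown exception, which is passed in through the unwind context. If a match is found, the corresponding catch handler has been located, and the routine returns so that the handler can run. If the action list for the current call site ends without a match, the result is _URC_CONTINUE_UNWIND (or _URC_HANDLER_FOUND for a cleanup in phase 2). If no table entry covers the current ip at all, the only thing left is call_terminate, which ends in std::terminate.

Parsing the LSDA is what locates the matching catch block, so we need a rough picture of how the LSDA is laid out: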

c++
/*
    Exception Handling Table Layout:

+-----------------+--------+
| lpStartEncoding | (char) |
+---------+-------+--------+---------------+-----------------------+
| lpStart | (encoded with lpStartEncoding) | defaults to funcStart |
+---------+-----+--------+-----------------+---------------+-------+
| ttypeEncoding | (char) | Encoding of the type_info table |
+---------------+-+------+----+----------------------------+----------------+
| classInfoOffset | (ULEB128) | Offset to type_info table, defaults to null |
+-----------------++--------+-+----------------------------+----------------+
| callSiteEncoding | (char) | Encoding for Call Site Table |
+------------------+--+-----+-----+------------------------+--------------------------+
| callSiteTableLength | (ULEB128) | Call Site Table length, used to find Action table |
+---------------------+-----------+---------------------------------------------------+
+---------------------+-----------+------------------------------------------------+
| Beginning of Call Site Table            The current ip is a 1-based index into   |
| ...                                     this table.  Or it is -1 meaning no      |
|                                         action is needed.  Or it is 0 meaning    |
|                                         terminate.                               |
| +-------------+---------------------------------+------------------------------+ |
| | landingPad  | (ULEB128)                       | offset relative to lpStart   | |
| | actionEntry | (ULEB128)                       | Action Table Index 1-based   | |
| |             |                                 | actionEntry == 0 -> cleanup  | |
| +-------------+---------------------------------+------------------------------+ |
| ...                                                                              |
+----------------------------------------------------------------------------------+
+---------------------------------------------------------------------+
| Beginning of Action Table       ttypeIndex == 0 : cleanup           |
| ...                             ttypeIndex  > 0 : catch             |
|                                 ttypeIndex  < 0 : exception spec    |
| +--------------+-----------+--------------------------------------+ |
| | ttypeIndex   | (SLEB128) | Index into type_info Table (1-based) | |
| | actionOffset | (SLEB128) | Offset into next Action Table entry  | |
| +--------------+-----------+--------------------------------------+ |
| ...                                                                 |
+---------------------------------------------------------------------+-----------------+
| type_info Table, but classInfoOffset does *not* point here!                           |
| +----------------+------------------------------------------------+-----------------+ |
| | Nth type_info* | Encoded with ttypeEncoding, 0 means catch(...) | ttypeIndex == N | |
| +----------------+------------------------------------------------+-----------------+ |
| ...                                                                                   |
| +----------------+------------------------------------------------+-----------------+ |
| | 1st type_info* | Encoded with ttypeEncoding, 0 means catch(...) | ttypeIndex == 1 | |
| +----------------+------------------------------------------------+-----------------+ |
| +---------------------------------------+-----------+------------------------------+  |
| | 1st ttypeIndex for 1st exception spec | (ULEB128) | classInfoOffset points here! |  |
| | ...                                   | (ULEB128) |                              |  |
| | Mth ttypeIndex for 1st exception spec | (ULEB128) |                              |  |
| | 0                                     | (ULEB128) |                              |  |
| +---------------------------------------+------------------------------------------+  |
| ...                                                                                   |
| +---------------------------------------+------------------------------------------+  |
| | 0                                     | (ULEB128) | throw()                      |  |
| +---------------------------------------+------------------------------------------+  |
| ...                                                                                   |
| +---------------------------------------+------------------------------------------+  |
| | 1st ttypeIndex for Nth exception spec | (ULEB128) |                              |  |
| | ...                                   | (ULEB128) |                              |  |
| | Mth ttypeIndex for Nth exception spec | (ULEB128) |                              |  |
| | 0                                     | (ULEB128) |                              |  |
| +---------------------------------------+------------------------------------------+  |
+---------------------------------------------------------------------------------------+
*/

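From this layout, the LSDA lookup boils down to the following: walk the Call Site Table to obtain the Action Table index, read ttypeIndex from that Action Table entry, then use ttypeIndex to look up the type_info Table and check whether the thrown object matches the type named in the catch clause. If it matches, return. If it does not, follow the chain of action records in the Action Table until the chain is exhausted.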
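The lookup described above can be sketched in a few lines. This is a simplified model, not the libcxxabi implementation: the type_info table is a plain vector of strings (an empty string stands for catch(...)), the thrown object is a list of type names (itself plus its bases), and find_handler as well as the "@libc++" / "@libuser" suffixes are names invented for illustration. Only the SLEB128 decoding and the chain-walking arithmetic follow the layout above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode one signed LEB128 value and advance *p past it.
static int64_t read_sleb128(const uint8_t** p) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        result |= static_cast<uint64_t>(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    // Sign-extend if the sign bit of the last byte is set.
    if ((byte & 0x40) && shift < 64) {
        result |= ~static_cast<uint64_t>(0) << shift;
    }
    return static_cast<int64_t>(result);
}

// Hypothetical model of the action-chain walk.
// actions      : raw Action Table bytes (ttypeIndex, actionOffset pairs, both SLEB128)
// action_entry : 1-based value taken from the Call Site Table (0 means cleanup only)
// types        : simplified type table, types[i - 1] belongs to ttypeIndex i
// thrown_chain : names of the thrown type and its bases
// Returns the matching ttypeIndex, or 0 if no catch clause matches.
static int64_t find_handler(const std::vector<uint8_t>& actions,
                            uint64_t action_entry,
                            const std::vector<std::string>& types,
                            const std::vector<std::string>& thrown_chain) {
    if (action_entry == 0) return 0;
    const uint8_t* action = actions.data() + (action_entry - 1);
    while (true) {
        const int64_t ttype_index = read_sleb128(&action);
        if (ttype_index > 0) {
            assert(static_cast<std::size_t>(ttype_index) <= types.size());
            const std::string& catch_type = types[static_cast<std::size_t>(ttype_index) - 1];
            if (catch_type.empty()) return ttype_index;  // catch(...)
            for (const std::string& t : thrown_chain) {
                if (t == catch_type) return ttype_index;  // typed catch matches
            }
        }
        // ttype_index <= 0 (cleanup / exception spec) is ignored in this sketch.
        const uint8_t* temp = action;
        const int64_t next = read_sleb128(&temp);
        if (next == 0) return 0;  // end of the action chain
        action += next;           // offset is relative to the actionOffset field
    }
}
```

With a hand-written action table {0x01, 0x00, 0x02, 0x7d} and action_entry 3, the walk starts at the record for type 2 and chains back by -3 to the record for type 1. A type table {"", "std::exception@libc++"} with a thrown chain containing "std::exception@libc++" yields 2. Changing the table entry to "std::exception@libuser" makes the typed match fail, and the walk ends on the catch(...) entry and yields 1.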

Why the exception is caught differently in this case

Based on the analysis above, the problem must lie in the LSDA's Action Table and type_info Table.

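c++
#include <cstdio>
#include <exception>

// p() lives in another shared object (it is called through the PLT below).
void p();

int main(int argc, char* argv[])
{
        try{
                p();
        }
        catch(std::exception& e){
                // Expected path: the thrown object derives from std::exception.
                printf("std::exception: %s\n", e.what());
        }
        catch(...){
                // Fallback path: taken when the type match above fails.
                printf("unknown exception\n");
        }
        return 0;
}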
shell
# objdump -d --disassemble=main ./build/test
# In this build the std exception is caught normally
0000000000001a70 <main>:
    1a70:       55                      push   %rbp
    1a71:       48 89 e5                mov    %rsp,%rbp
    1a74:       48 83 ec 30             sub    $0x30,%rsp
    1a78:       c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
    1a7f:       89 7d f8                mov    %edi,-0x8(%rbp)
    1a82:       48 89 75 f0             mov    %rsi,-0x10(%rbp)
    1a86:       e8 35 01 00 00          call   1bc0 <p@plt>
    1a8b:       e9 00 00 00 00          jmp    1a90 <main+0x20>
    1a90:       e9 51 00 00 00          jmp    1ae6 <main+0x76>
    1a95:       48 89 c1                mov    %rax,%rcx
    1a98:       89 d0                   mov    %edx,%eax
    1a9a:       48 89 4d e8             mov    %rcx,-0x18(%rbp)
    1a9e:       89 45 e4                mov    %eax,-0x1c(%rbp)
    1aa1:       8b 45 e4                mov    -0x1c(%rbp),%eax
    1aa4:       b9 02 00 00 00          mov    $0x2,%ecx
    1aa9:       39 c8                   cmp    %ecx,%eax
    1aab:       0f 85 3d 00 00 00       jne    1aee <main+0x7e>
    1ab1:       48 8b 7d e8             mov    -0x18(%rbp),%rdi
    1ab5:       e8 16 01 00 00          call   1bd0 <__cxa_begin_catch@plt>
    1aba:       48 89 45 d8             mov    %rax,-0x28(%rbp)
    1abe:       48 8b 7d d8             mov    -0x28(%rbp),%rdi
    1ac2:       48 8b 07                mov    (%rdi),%rax
    1ac5:       48 8b 40 10             mov    0x10(%rax),%rax
    1ac9:       ff d0                   call   *%rax
    1acb:       48 89 c6                mov    %rax,%rsi
    1ace:       48 8d 3d a1 ed ff ff    lea    -0x125f(%rip),%rdi        # 876 <_IO_stdin_used+0x16>
    1ad5:       31 c0                   xor    %eax,%eax
    1ad7:       e8 04 01 00 00          call   1be0 <printf@plt>
    1adc:       e9 00 00 00 00          jmp    1ae1 <main+0x71>
    1ae1:       e8 0a 01 00 00          call   1bf0 <__cxa_end_catch@plt>
    1ae6:       31 c0                   xor    %eax,%eax
    1ae8:       48 83 c4 30             add    $0x30,%rsp
    1aec:       5d                      pop    %rbp
    1aed:       c3                      ret
    1aee:       48 8b 7d e8             mov    -0x18(%rbp),%rdi
    1af2:       e8 d9 00 00 00          call   1bd0 <__cxa_begin_catch@plt>
    1af7:       48 8d 3d 66 ed ff ff    lea    -0x129a(%rip),%rdi        # 864 <_IO_stdin_used+0x4>
    1afe:       31 c0                   xor    %eax,%eax
    1b00:       e8 db 00 00 00          call   1be0 <printf@plt>
    1b05:       e9 00 00 00 00          jmp    1b0a <main+0x9a>
    1b0a:       e8 e1 00 00 00          call   1bf0 <__cxa_end_catch@plt>
    1b0f:       e9 d2 ff ff ff          jmp    1ae6 <main+0x76>
    1b14:       48 89 c1                mov    %rax,%rcx
    1b17:       89 d0                   mov    %edx,%eax
    1b19:       48 89 4d e8             mov    %rcx,-0x18(%rbp)
    1b1d:       89 45 e4                mov    %eax,-0x1c(%rbp)
    1b20:       e8 cb 00 00 00          call   1bf0 <__cxa_end_catch@plt>
    1b25:       e9 00 00 00 00          jmp    1b2a <main+0xba>
    1b2a:       e9 1b 00 00 00          jmp    1b4a <main+0xda>
    1b2f:       48 89 c1                mov    %rax,%rcx
    1b32:       89 d0                   mov    %edx,%eax
    1b34:       48 89 4d e8             mov    %rcx,-0x18(%rbp)
    1b38:       89 45 e4                mov    %eax,-0x1c(%rbp)
    1b3b:       e8 b0 00 00 00          call   1bf0 <__cxa_end_catch@plt>
    1b40:       e9 00 00 00 00          jmp    1b45 <main+0xd5>
    1b45:       e9 00 00 00 00          jmp    1b4a <main+0xda>
    1b4a:       48 8b 7d e8             mov    -0x18(%rbp),%rdi
    1b4e:       e8 ad 00 00 00          call   1c00 <_Unwind_Resume@plt>
    1b53:       48 89 c7                mov    %rax,%rdi
    1b56:       e8 05 00 00 00          call   1b60 <__clang_call_terminate>

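The landing pad receives the handler selector in %edx, stores it at -0x1c(%rbp), and compares it with 2. When the exception is caught correctly, eax holds 2 at cmp %ecx,%eax and execution enters the std::exception handler. When it is caught incorrectly, eax holds 1 there, so the jne is taken and execution enters the catch(...) handler at 1aee. This means that in the failing case get_shim_type_info (in scan_eh_tab) returned 0, which is how the table encodes catch(...). (Note: the first lookup did find a type, but it did not match; the walk then moved on to the next record in the action chain, which matched catch(...).)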
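To make the selector comparison concrete, the landing pad at 0x1a95 behaves roughly like the function below. The enum and function names are invented for this sketch; the value 2 is taken from the mov $0x2,%ecx in the disassembly.

```cpp
// Which catch block the landing pad ends up running.
enum class Handler { StdException, CatchAll };

// Selector value the generated code compares against (mov $0x2,%ecx).
constexpr int kStdExceptionSelector = 2;

// Hypothetical C++ rendering of the cmp/jne pair in main's landing pad:
// a selector equal to 2 falls through to the std::exception handler,
// anything else jumps to the catch(...) handler.
constexpr Handler pick_handler(int selector) {
    return selector == kStdExceptionSelector ? Handler::StdException
                                             : Handler::CatchAll;
}
```

In the working build the personality routine reports selector 2, so pick_handler(2) corresponds to the std::exception branch. In the failing build it reports 1, which corresponds to the catch(...) branch.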

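That was our hypothesis. To confirm it, we rebuilt debug versions of libcxx/libcxxabi, rebuilt the test program against them, and stepped into scan_eh_tab, which gave the key result shown in the figure below:

From this we can see that the different build configurations leave the C++ runtime unable to dynamic_cast between the two class types. The match therefore fails, and control ends up in the catch(...) block. Anyone interested can trace the underlying implementation of dynamic_cast, which is the following function: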

c++
extern "C" void*
__dynamic_cast(const void *static_ptr, const __class_type_info *static_type,
               const __class_type_info *dst_type,
               std::ptrdiff_t src2dst_offset);

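In other words, the root cause is the __class_type_info objects. Depending on whether the C++ runtime is linked statically or dynamically, the same type can end up with two type_info symbols, one in libc++.so and one in libuser.so. Their definitions are identical, but they are distinct symbols at different addresses, so the cast between them cannot succeed. That behavior is reasonable.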
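The effect can be illustrated with a minimal sketch, assuming the runtime identifies types by the address of their type_info object, which is what the behavior observed here suggests. FakeTypeInfo, the two global objects and the helper functions are hypothetical stand-ins, not the real std::type_info or libcxxabi code.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a type_info object: only the mangled name is modeled.
struct FakeTypeInfo {
    const char* name;
};

// Identity comparison: equal only if both pointers refer to the same object.
inline bool same_by_address(const FakeTypeInfo* a, const FakeTypeInfo* b) {
    return a == b;
}

// Name comparison: equal if the mangled names match, even for distinct objects.
inline bool same_by_name(const FakeTypeInfo* a, const FakeTypeInfo* b) {
    return a == b || std::strcmp(a->name, b->name) == 0;
}

// Two copies of "the same" type_info, as if one had been emitted into
// libc++.so and the other into libuser.so.
static const FakeTypeInfo in_libcxx  = {"St9exception"};
static const FakeTypeInfo in_libuser = {"St9exception"};

// Shows the mismatch: same name, different identity.
inline void demo() {
    assert(!same_by_address(&in_libcxx, &in_libuser));  // the match that fails
    assert(same_by_name(&in_libcxx, &in_libuser));      // the names still agree
}
```

Under the address-based comparison the catch (std::exception&) clause never matches an object whose type_info lives in the other library, which is consistent with falling through to catch(...).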

Afterword


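In summary, the material above answers the following two questions:

  • Why is an exception thrown at all: probably because the build configuration makes the underlying Android system apply different access control to certain APIs (still an unverified guess).
  • Why, with all symbols present, a different catch path is taken: the core reason is that the typeinfo objects cannot be dynamic_cast to each other.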

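This investigation deepened my understanding of stl_static/stl_shared, of how C++ is implemented under the hood, and of the internal structure of compilers such as gcc/clang.

Also, having wrestled with the llvm source this time, the next dive into the C++ internals will be much quicker and easier.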



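To tip, subscribe, bookmark, or toss a banana or a coin, please follow the WeChat official account (攻城狮的搬砖之路).

PS: Please respect original work; if you don't like it, please don't flame.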


PS: If you have questions, please leave a comment; I will reply as soon as I see it.