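Implementing a Try/Catch Exception Handling Mechanism in the Android Native Layer

In the Java layer, we can catch and handle exceptions with try/catch statements. In Android's native layer (C/C++ code), however, there is no built-in way to catch crashes: C has no exceptions, and C++ exceptions do not cover faults such as invalid memory accesses. This article describes how to implement a try/catch-like mechanism for such crashes in the Android native layer.

1. How It Works

The key to exception handling in the native layer is the combination of signal handling and non-local jumps. When a program faults (for example by accessing invalid memory, or by dividing an integer by zero on platforms where that traps), the operating system sends the process a signal such as SIGSEGV or SIGFPE. We can install a signal handler that runs specific code when the signal arrives.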

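A non-local jump transfers control to another point in the program instead of following the normal control flow. In C, this is done with setjmp and longjmp. setjmp saves the current execution context (the stack pointer and other register state, not the contents of the stack) and returns 0. longjmp restores the context saved by setjmp and makes setjmp return a second time, now with a non-zero value. We can exploit this by calling longjmp from the signal handler, which jumps back to where setjmp was called and lets us catch and handle the crash there. The code in this article uses the POSIX variants sigsetjmp and siglongjmp, which can also save and restore the signal mask.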

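2. Implementation

2.1 Defining a structure to hold the thread's exception handling state

First we define a structure, native_code_handler_struct, that holds the thread's exception handling state. It contains ctx, a sigjmp_buf that stores the context saved by sigsetjmp; ctx_is_set, a flag that says whether that context is valid; reenter, a nesting counter; and further fields for the alternate stack, the signal details, and custom assertion failures.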

/* Thread-specific crash handler structure. */
typedef struct native_code_handler_struct {
  /* Restore point context. */
  sigjmp_buf ctx;
  int ctx_is_set;
  int reenter;

  /* Alternate stack. */
  char *stack_buffer;
  size_t stack_buffer_size;
  stack_t stack_old;

  /* Signal code and info. */
  int code;
  siginfo_t si;
  ucontext_t uc;

  /* Custom assertion failures. */
  const char *expression;
  const char *file;
  int line;

  /* Alarm was fired. */
  int alarm;
} native_code_handler_struct;

native_code_handler_struct* native_code_handler_g;

static native_code_handler_struct* getCrashHandler() {
    return native_code_handler_g;
}

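2.2 Implementing try/catch semantics

Next we define a set of functions and macros that provide the try/catch semantics. The COFFEE_TRY macro checks (via inside()) whether we are already in a try block; if not, it installs the signal handlers (via setupSignalHandler()) and saves the execution context (via sigsetjmp()). Because sigsetjmp() returns a non-zero value when it is reached through siglongjmp(), the if condition fails after a crash and control falls through to the else branch, which is the catch block. The COFFEE_CATCH and COFFEE_END macros mark the catch block and the end of the try/catch construct.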

/** Internal functions & definitions, not to be used directly. **/
#include <setjmp.h>
extern int inside(void);
extern int setupSignalHandler(int);
extern sigjmp_buf* get_ctx(void);
extern void cleanup(void);
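/* NOTE: setupSignalHandler() is declared with an int id parameter. The
 * article does not show which id callers pass, so 0 is a placeholder. */
#define COFFEE_TRY()                                \
  if (inside() || \
      (setupSignalHandler(0) == 0 \
       && sigsetjmp(*get_ctx(), 1) == 0))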
#define COFFEE_CATCH() else
#define COFFEE_END() cleanup()
/** End of internal functions & definitions. **/

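2.3 Checking the current thread's exception handling state

The inside function checks the current thread's exception handling state. If we are already in a try block, it increments the reenter counter and returns 1; otherwise it returns 0.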

/**
 * Returns 1 if we are already inside a coffeecatch block, 0 otherwise.
 */
int inside() {
    native_code_handler_struct *const t = getCrashHandler();
    if (t != NULL && t->reenter > 0) {
        t->reenter++;
        return 1;
    }
    return 0;
}

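2.4 Installing the signal handlers

The setupSignalHandler function installs the signal handlers and increments the reenter counter to record that a new try block has been entered.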

/**
 * Calls handler_setup(1) to setup a crash handler, mark the
 * context as valid, and return 0 upon success.
 */
int setupSignalHandler(int id) {
    if (handler_setup(1, id) == 0) {
        native_code_handler_struct *const t = getCrashHandler();
        assert(t != NULL);
        t->reenter++;
        t->ctx_is_set = 1;
        LOGD("setup reenter:%d", t->reenter);
        return 0;
    } else {
        return -1;
    }
}

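handler_setup sets up the crash handler, covering both global and per-thread resources. It first calls handler_setup_global(id) under a mutex to initialize the global resources, then initializes the local resources for the current thread if they do not exist yet.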

/**
 * Acquire the crash handler for the current thread.
 * The handler_cleanup() must be called to release allocated
 * resources.
 **/
static int handler_setup(int setup_thread, int id) {
    int code;
    
    DEBUG(print("setup for a new handler\n"));
    
    /* Initialize globals. */
    if (pthread_mutex_lock(&native_code_g.mutex) != 0) {
        return -1;
    }
    
    code = handler_setup_global(id);
    
    if (pthread_mutex_unlock(&native_code_g.mutex) != 0) {
        return -1;
    }
    
    /* Global initialization failed. */
    if (code != 0) {
        return -1;
    }
    
    /* Initialize locals. */
    if (setup_thread && getCrashHandler() == NULL) {
        native_code_handler_struct *const t = native_code_handler_struct_init();
        
        if (t == NULL) {
            return -1;
        }
        
        native_code_handler_g = t;
        
        DEBUG(print("installed thread alternative stack\n"));
    }
    
    /* OK. */
    return 0;
}

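handler_setup_global initializes the global resources: it allocates memory and installs the signal handlers. On the first call it allocates native_code_g.sa_old and native_code_g.id. It then fills in a sigaction structure that points to coffeecatch_signal_pass and installs it for every signal listed in native_sig_catch, saving the previous handlers.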

/* Internal globals initialization. */
static int handler_setup_global(int id) {
    int curInitCount = native_code_g.initialized;
    size_t i;
    struct sigaction sa_pass;
    
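    if (native_code_g.initialized++ == 0) {
        /* calloc() takes the element count first, then the element size. */
        native_code_g.sa_old = calloc(MAX_SIGNAL_HANDLER_SETUP_TIMES, sizeof(struct sigaction*));
        if (native_code_g.sa_old == NULL) {
            return -1;
        }
        native_code_g.id = calloc(MAX_SIGNAL_HANDLER_SETUP_TIMES, sizeof(int));
        if (native_code_g.id == NULL) {
            return -1;
        }
    }
    /* Check the slot index before writing to it. */
    assert(curInitCount >= 0 && curInitCount < MAX_SIGNAL_HANDLER_SETUP_TIMES);
    native_code_g.id[curInitCount] = id;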
    
    if (native_code_g.initialized > native_code_g.maxInitialized) {
        native_code_g.maxInitialized = native_code_g.initialized;
        assert(native_code_g.maxInitialized <= MAX_SIGNAL_HANDLER_SETUP_TIMES);
    }

    DEBUG(print("installing global signal handlers\n"));

    /* Setup handler structure. */
    memset(&sa_pass, 0, sizeof(sa_pass));
    sigemptyset(&sa_pass.sa_mask);
    sa_pass.sa_sigaction = coffeecatch_signal_pass;
    sa_pass.sa_flags = SA_SIGINFO | SA_ONSTACK;

    /* Allocate */
    native_code_g.sa_old[curInitCount] = calloc(sizeof(struct sigaction), SIG_NUMBER_MAX);
    if (native_code_g.sa_old[curInitCount] == NULL) {
        return -1;
    }

    /* Setup signal handlers for SIGABRT (Java calls abort()) and others. **/
    for (i = 0; native_sig_catch[i] != 0; i++) {
        const int sig = native_sig_catch[i];
        const struct sigaction * const action = &sa_pass;
        assert(sig < SIG_NUMBER_MAX);
        if (sigaction(sig, action, &native_code_g.sa_old[curInitCount][sig]) != 0) {
            return -1;
        }
    }

    DEBUG(print("installed global signal handlers\n"));

    /* OK. */
    return 0;
}

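2.5 Signal handling and the non-local jump

The two functions coffeecatch_signal_pass and coffeecatch_try_jump_userland handle the signal and perform the non-local jump, so that crashes in the native layer can ultimately be reported to the Java layer.

2.5.1 The signal handler

coffeecatch_signal_pass is the signal handler that runs when a signal is caught. It first calls the original Java signal handler, then resets the signal to its default action and starts a timer to guard against deadlock. Next it checks whether a context is available. If there is one, it saves the signal information and context into the native_code_handler_struct structure and tries to jump back to the restore point. If no context is available, or the jump is not possible, the function calls abort() to terminate the program.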

static void coffeecatch_signal_pass(const int code, siginfo_t *const si,
                                    void *const sc) {
  native_code_handler_struct *t;

  DEBUG(print("caught signal\n"));

  /* Call the "real" Java handler for JIT and internals. */
  coffeecatch_call_old_signal_handler(code, si, sc);

  /* Ensure we do not deadlock. Default of ALRM is to die.
   * (signal() and alarm() are signal-safe) */
  signal(code, SIG_DFL);
  coffeecatch_start_alarm();

  /* Available context ? */
  t = coffeecatch_get();
  if (t != NULL) {
    /* An alarm() call was triggered. */
    coffeecatch_mark_alarm(t);

    /* Take note of the signal. */
    coffeecatch_copy_context(t, code, si, sc);

    /* Back to the future. */
    coffeecatch_try_jump_userland(t, code, si, sc);
  }

  /* Nope. (abort() is signal-safe) */
  DEBUG(print("calling abort()\n"));
  signal(SIGABRT, SIG_DFL);
  abort();
}

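2.5.2 Jumping back to the restore point

coffeecatch_try_jump_userland tries to return execution to the saved restore point in user code. It first checks whether there is a valid context. If there is, it invalidates the context, reverts the alternate stack, and calls siglongjmp() to jump back to the previously saved execution environment.

Note that calling siglongjmp() from a signal handler needs care: if the signal interrupted a function that is not async-signal-safe (malloc(), for example), that function's internal state may be left inconsistent after the jump.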

/* Try to jump to userland. */
static void coffeecatch_try_jump_userland(native_code_handler_struct*
                                                 const t,
                                                 const int code,
                                                 siginfo_t *const si,
                                                 void * const sc) {
  (void) si; /* UNUSED */
  (void) sc; /* UNUSED */

  /* Valid context ? */
  if (t != NULL && t->ctx_is_set) {
    DEBUG(print("calling siglongjmp()\n"));

    /* Invalidate the context */
    t->ctx_is_set = 0;

    /* We need to revert the alternate stack before jumping. */
    coffeecatch_revert_alternate_stack();

    siglongjmp(t->ctx, code);
  }
}

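Together, these two functions record the signal details when a signal is caught and then restore the saved execution environment.

2.6 Cleaning up exception handling resources

The cleanup function decrements the reenter counter to record that a try block has been left. When the counter reaches zero, it invalidates the context and calls handler_cleanup() to release the exception handling resources.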

/**
 * Calls handler_cleanup()
 */
void cleanup() {
    revert_alternate_stack();
    
    native_code_handler_struct *const t = getCrashHandler();
    assert(t != NULL);
    assert(t->reenter > 0);
    t->reenter--;
    if (t->reenter == 0) {
        t->ctx_is_set = 0;
        handler_cleanup();
    }
}

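revert_alternate_stack() restores the thread's stack state. It queries the current alternate-stack settings with the sigaltstack() system call and clears the SS_ONSTACK flag, indicating that the alternate stack is no longer in use.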

/* Unflag "on stack" */
static void revert_alternate_stack(void) {
#ifndef NO_USE_SIGALTSTACK
    stack_t ss;
    if (sigaltstack(NULL, &ss) == 0) {
        ss.ss_flags &= ~SS_ONSTACK;
        sigaltstack (&ss, NULL);
    }
#endif
}

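handler_cleanup() releases the resources used for exception handling:

Free the current thread's exception handling state and restore the thread's stack.
Lock and unlock the global resources with pthread_mutex_lock() and pthread_mutex_unlock() to stay safe in a multithreaded environment.
Once no setup is outstanding, iterate over all caught signals and use sigaction() to restore each handler to the old handler saved by the very first setup.
Free all allocated memory and delete the thread-local storage key with pthread_key_delete().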

static int handler_cleanup() {
    /* Cleanup locals. */
    native_code_handler_struct *const t = getCrashHandler();
    if (t != NULL) {
        DEBUG(print("removing thread alternative stack\n"));
        
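        /* Erase thread-specific value now (detach). */
        if (pthread_setspecific(native_code_thread, NULL) != 0) {
            assert(! "pthread_setspecific() failed");
        }
        /* getCrashHandler() reads this global, so clear it before the
         * free below to avoid leaving a dangling pointer behind. */
        native_code_handler_g = NULL;
        
        /* Free handler and reset alternate stack */
        if (native_code_handler_struct_free(t) != 0) {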
            return -1;
        }
        
        DEBUG(print("removed thread alternative stack\n"));
    }
    
    /* Cleanup globals. */
    if (pthread_mutex_lock(&native_code_g.mutex) != 0) {
        assert(! "pthread_mutex_lock() failed");
    }
    assert(native_code_g.maxInitialized != 0);
    /* Drop this setup's reference; only the last one restores the handlers. */
    if (--native_code_g.initialized == 0) {
        
        size_t i;
        /* Restore signal handler. */
        for(i = 0; native_sig_catch[i] != 0; i++) {
            const int sig = native_sig_catch[i];
            assert(sig < SIG_NUMBER_MAX);
            /* Restore directly to the old handler saved by the very first setup. */
            if (sigaction(sig, &native_code_g.sa_old[0][sig], NULL) != 0) {
                return -1;
            }
        }
        
        /* Free old structure. */
        for (i = 0;i < native_code_g.maxInitialized;i++) {
            free(native_code_g.sa_old[i]);
            native_code_g.sa_old[i] = NULL;
        }
                
        /* Free old structure. */
        free(native_code_g.sa_old);
        native_code_g.sa_old = NULL;
        free(native_code_g.id);
        native_code_g.id = NULL;
        LOGV("cleanup signal handler");

        /* Delete thread var. */
        if (pthread_key_delete(native_code_thread) != 0) {
            assert(! "pthread_key_delete() failed");
        }
    }
    if (pthread_mutex_unlock(&native_code_g.mutex) != 0) {
        assert(! "pthread_mutex_unlock() failed");
    }
    
    return 0;
}

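3. Usage

3.1 Example

The implementation above lets us recover from signals (such as SIGSEGV and SIGBUS) much as we would from a Java exception. It cannot recover from problems that leave allocators or mutexes in a broken state, but most crashes (null pointer dereferences, integer division by zero, stack overflows, and so on) should be manageable.

To get correct stack traces from every binary, all libraries must be compiled with -funwind-tables. On ARM, the --no-merge-exidx-entries linker switch can also be used to work around stack unwinding problems. On Android, this can be done by adding the following line to each library's Android.mk file: LOCAL_CFLAGS := -funwind-tables -Wl,--no-merge-exidx-entries

The following simple example shows how to use the try/catch mechanism implemented above in the Android native layer.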

COFFEE_TRY() {
  call_some_native_function();
} COFFEE_CATCH() {
  const char *const message = get_message();
  jclass cls = (*env)->FindClass(env, "java/lang/RuntimeException");
  if (cls != NULL) {
    /* ThrowNew() copies the message, so no strdup() is needed. */
    (*env)->ThrowNew(env, cls, message);
  }
} COFFEE_END();

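When a crash occurs, the program skips the rest of the try block and enters the catch block directly. This lets us catch and handle the crash instead of letting the process die.

With the code above we get a try/catch mechanism in the Android native layer that resembles Java's, which helps the stability and maintainability of native code. Note that this approach does not catch every kind of exception; exceptions thrown by C++ code, for example, are not covered. In practice, we need to choose the exception handling strategy that best fits the requirements and the situation.

3.2 Getting more exception details in the native layer

In the catch block we can also retrieve and process the details of the crash, for example the signal type, the faulting address, and the register state.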

const char* get_message() {
    const int error = errno; // Save the current thread's errno value
    const native_code_handler_struct* const t = getCrashHandler(); // Handler state for the current thread

    // A valid handler state was found
    if (t != NULL) {
        char * const buffer = t->stack_buffer; // Buffer that receives the error message
        const size_t buffer_len = t->stack_buffer_size; // Size of the buffer
        size_t buffer_offs = 0; // Write offset used when appending strings

        const char* const posix_desc =
        desc_sig(t->si.si_signo, t->si.si_code); // Description string for the signal

        // Assertion failure
        if ((t->code == SIGABRT
#ifdef __ANDROID__
             // On Android, because of BUG #16672, a failed assertion may raise SIGSEGV
             || (t->code == SIGSEGV && (uintptr_t) t->si.si_addr == 0xdeadbaad)
#endif
             ) && t->expression != NULL) {
            // Format the assertion failure into the buffer
            snprintf(&buffer[buffer_offs], buffer_len - buffer_offs,
                     "assertion '%s' failed at %s:%d",
                     t->expression, t->file, t->line);
            buffer_offs += strlen(&buffer[buffer_offs]);
        }
        // Any other signal
        else {
            // Append the signal number
            snprintf(&buffer[buffer_offs], buffer_len - buffer_offs, "signal %d", t->si.si_signo);
            buffer_offs += strlen(&buffer[buffer_offs]);

            // Append the signal description
            snprintf(&buffer[buffer_offs], buffer_len - buffer_offs, " (%s)", posix_desc);
            buffer_offs += strlen(&buffer[buffer_offs]);

            // For illegal instructions and segmentation faults, append the faulting address
            if (t->si.si_signo == SIGILL || t->si.si_signo == SIGSEGV) {
                snprintf(&buffer[buffer_offs], buffer_len - buffer_offs, " at address %p", t->si.si_addr);
                buffer_offs += strlen(&buffer[buffer_offs]);
            }
        }

        // If the signal carries a non-zero errno, append the matching error text
        if (t->si.si_errno != 0) {
            snprintf(&buffer[buffer_offs], buffer_len - buffer_offs, ": ");
            buffer_offs += strlen(&buffer[buffer_offs]);
            // Assumes the XSI strerror_r(), which returns 0 on success;
            // fall back to a fixed text only when the lookup fails
            if (strerror_r(t->si.si_errno, &buffer[buffer_offs], buffer_len - buffer_offs) != 0) {
                snprintf(&buffer[buffer_offs], buffer_len - buffer_offs, "unknown error");
            }
            buffer_offs += strlen(&buffer[buffer_offs]);
        }

        // For SIGCHLD, append the ID of the sending process
        if (t->si.si_signo == SIGCHLD && t->si.si_pid != 0) {
            snprintf(&buffer[buffer_offs], buffer_len - buffer_offs, " (sent by pid %d)", (int) t->si.si_pid);
            buffer_offs += strlen(&buffer[buffer_offs]);
        }

        // Terminate and return the message string
        buffer[buffer_offs] = '\0';
        return t->stack_buffer;
    } else {
        // Static buffer for errors raised while the crash handler was being set up
        static char buffer[256];
#ifdef _GNU_SOURCE
        return strerror_r(error, &buffer[0], sizeof(buffer));
#else
        const int code = strerror_r(error, &buffer[0], sizeof(buffer));
        errno = error;
        if (code == 0) {
            return buffer;
        } else {
            return "unknown error during crash handler setup";
        }
#endif
    }
}

This function collects the details of a caught crash for use in the handling code, which makes error reports from the Android native layer more detailed and accurate.

Note that while handling a crash we should avoid operations that might trigger another one, such as touching invalid memory or calling unsafe functions.

3.3 Limitations

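The mechanism described here does not catch every kind of failure. For example, it cannot recover from a crash that leaves the allocator corrupted or a mutex locked, and a stack overflow can only be handled because the signal handler runs on an alternate stack (SA_ONSTACK). Such cases need other methods for handling and debugging.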
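On some architectures and compilers, setjmp and longjmp may not behave exactly as described in this article. Verify that the mechanism works on the target platform before relying on it.
The mechanism may affect application performance, because every outermost try block installs signal handlers and saves a context at run time, and a crash triggers a non-local jump. Use it with care in performance-sensitive code.

3.4 Notes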

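When using this mechanism, make sure the signal handlers are installed and cleaned up correctly. Signal handlers themselves are process-wide, but the handler state (the jump context and the alternate stack) must be set up and cleaned up for each thread separately. The getCrashHandler() shown in section 2.1 returns a single global pointer, so the listing as given keeps only one such state for the whole process.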
In the catch block, avoid code that may cause a new crash, because a crash that occurs inside the catch block may not be caught and handled.
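In the catch block, the details of the crash, such as the signal number and the faulting address, can be retrieved with the get_message() function from section 3.2. This information is very useful for debugging and error reporting.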
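Keep in mind that this mechanism is no substitute for sound error handling and resource management. When writing native code, always handle error conditions properly and release allocated resources at the appropriate time.

4. Catching and handling C++ exceptions in the native layer

The previous sections showed how to implement a Java-like try/catch mechanism in the Android native layer and how to obtain the details of a crash. We now turn to catching and handling exceptions thrown by C++ code in the native layer.

C++ exception handling differs from signal handling and non-local jumps in C. A C++ exception is raised with a throw statement and can be caught and handled by a catch clause. Because the two mechanisms are not compatible, C++ exceptions have to be caught with C++ language features.

The following simple example shows how to catch and handle a C++ exception in the Android native layer: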

#include <iostream>
#include <stdexcept>

void native_function() {
    try {
        // Deliberately throw a C++ exception
        throw std::runtime_error("An error occurred.");
        std::cout << "This line will not be executed." << std::endl;
    } catch (const std::exception &e) {
        std::cout << "Caught an exception: " << e.what() << std::endl;
    } catch (...) {
        std::cout << "Caught an unknown exception." << std::endl;
    }
}

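In this example we use the C++ try and catch statements to catch and handle the exception. When the exception is thrown, the program skips the rest of the try block and enters the matching catch block, so the exception is handled instead of terminating the program.

Note that C++ exception handling is not compatible with the signal-based mechanism described earlier. In projects that mix C and C++ code, signal-based crashes and C++ exceptions have to be handled separately.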

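5. Summary

To sum up, implementing exception handling in the Android native layer involves the following points:

Use signal handling and non-local jumps to build a Java-like try/catch mechanism that catches crashes in C code (invalid memory accesses, floating-point exceptions, and so on).
Collect the details of the crash in the signal handler (signal type, faulting address, register state, and so on) and process them in the catch block.
For exceptions thrown by C++ code, use the C++ try/catch statements.

With these methods, native code on Android can be made more stable and maintainable. In practice, we need to choose the exception handling strategy that best fits the requirements and the situation.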