Locating and Fixing a HarmonyOS JSVM Crash

Background

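Our cross-platform framework needs to execute JS code dynamically on HarmonyOS, similar to RN. HarmonyOS provides JSVM for this, and JSVM is a wrapper around V8. At runtime, however, we hit a crash inside JSVM itself. This post records how a debugger and a little assembly knowledge were enough to pin down the cause of a crash inside a system library.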

Symptoms

Running a JS script with JSVM crashed in WhiteToGreyAndPush. Two different stacks were captured.

Stack 1:
Reason:Signal:SIGSEGV(SEGV_MAPERR)@0xbebebebebebebec6 
[libjsvm.so] v8::internal::MarkingBarrier::WhiteToGreyAndPush(v8::internal::HeapObject) Disassembly:201
[libjsvm.so] v8::internal::MarkingBarrier::Write(v8::internal::HeapObject, v8::internal::FullHeapObjectSlot, v8::internal::HeapObject) 0x0000005f572087bc
[libjsvm.so] v8::internal::Factory::CodeBuilder::BuildInternal(bool) 0x0000005f5718766c
[libjsvm.so] v8::internal::compiler::CodeGenerator::FinalizeCode() 0x0000005f578bd380
[libjsvm.so] auto v8::internal::compiler::PipelineImpl::Run<v8::internal::compiler::FinalizeCodePhase>() 0x0000005f57b25c3c
[libjsvm.so] v8::internal::compiler::PipelineImpl::FinalizeCode(bool) 0x0000005f57b19208
[libjsvm.so] v8::internal::compiler::PipelineCompilationJob::FinalizeJobImpl(v8::internal::Isolate*) 0x0000005f57b1903c
[libjsvm.so] v8::internal::Compiler::FinalizeTurbofanCompilationJob(v8::internal::TurbofanCompilationJob*, v8::internal::Isolate*) 0x0000005f57065b40
[libjsvm.so] v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() 0x0000005f570b3b24
[libjsvm.so] v8::internal::StackGuard::HandleInterrupts() 0x0000005f57149b20
[libjsvm.so] v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) 0x0000005f57592604
[libjsvm.so] Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit 0x0000005f56af20b8
[libjsvm.so] Builtins_ProxyConstructor 0x0000005f56b53190
[libjsvm.so] Builtins_JSBuiltinsConstructStub 0x0000005f56a6677c
[??] ?? 0x0000005f3f4c9b84
Stack 2:
Reason:Signal:SIGSEGV(SEGV_MAPERR)@0xbebebebebebebec6 
[libjsvm.so] v8::internal::MarkingBarrier::WhiteToGreyAndPush(v8::internal::HeapObject) Disassembly:201
[libjsvm.so] v8::internal::MarkingBarrier::Write(v8::internal::HeapObject, v8::internal::FullHeapObjectSlot, v8::internal::HeapObject) 0x0000005f37f887bc
[libjsvm.so] v8::internal::Dictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::SetEntry(v8::internal::InternalIndex, v8::internal::Object, v8::internal::Object, v8::internal::PropertyDetails) 0x0000005f381d2c0c
[libjsvm.so] v8::internal::Handle<v8::internal::NameDictionary> v8::internal::Dictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::Add<v8::internal::Isolate, (v8::internal::AllocationType)0>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NameDictionary>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, v8::internal::InternalIndex*) 0x0000005f381d32dc
[libjsvm.so] v8::internal::BaseNameDictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::Add(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NameDictionary>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, v8::internal::InternalIndex*) 0x0000005f381d2e64
[libjsvm.so] v8::internal::JSObject::MigrateToMap(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Map>, int) 0x0000005f3815bd74
[libjsvm.so] v8::internal::JSObject::OptimizeAsPrototype(v8::internal::Handle<v8::internal::JSObject>, bool) 0x0000005f3815fa68
[libjsvm.so] v8::internal::Map::SetPrototype(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::HeapObject>, bool) 0x0000005f381b389c
[libjsvm.so] v8::internal::JSFunction::SetInitialMap(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>) 0x0000005f3814a2a4
[libjsvm.so] v8::internal::ApiNatives::CreateApiFunction(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NativeContext>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::InstanceType, v8::internal::MaybeHandle<v8::internal::Name>) 0x0000005f37d14208
[libjsvm.so] v8::internal::(anonymous namespace)::InstantiateFunction(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NativeContext>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::MaybeHandle<v8::internal::Name>) 0x0000005f37d122ac
[libjsvm.so] v8::internal::ApiNatives::InstantiateFunction(v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::MaybeHandle<v8::internal::Name>) 0x0000005f37d12c4c
[libjsvm.so] v8::FunctionTemplate::GetFunction(v8::Local<v8::Context>) 0x0000005f37d318a4
[libjsvm.so] v8::Function::New(v8::Local<v8::Context>, void (*)(v8::FunctionCallbackInfo<v8::Value> const&), v8::Local<v8::Value>, int, v8::ConstructorBehavior, v8::SideEffectType) 0x0000005f37d317e0
[libjsvm.so] OH_JSVM_CreateFunction 0x0000005f37569780

Investigation

A bad MarkingBarrier

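After reproducing the crash, the disassembly shows that the faulting instruction is ldr x20, [x23, #0x8]. Printing x23 shows that it holds an invalid pointer, 0xbebebebebebebebe.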

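Together with the preceding instruction, ldr x23, [x0, #0x50], this means x23 was loaded from offset 0x50 of x0. Under the AArch64 calling convention, a C++ non-static member function receives this in x0 as its implicit first argument, so x0 points to the MarkingBarrier on which MarkingBarrier::WhiteToGreyAndPush was called.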
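The two loads can be checked against the signal report with a little arithmetic. The sketch below only models the address computation of an ldr with an immediate offset; EffectiveAddress and kPoisonedX23 are names made up for this illustration and are not part of V8 or JSVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the address touched by `ldr xN, [base, #offset]`.
constexpr std::uint64_t EffectiveAddress(std::uint64_t base,
                                         std::uint64_t offset) {
  return base + offset;
}

// Value observed in x23 at the crash, i.e. whatever was stored at
// offset 0x50 of the MarkingBarrier that x0 pointed to.
constexpr std::uint64_t kPoisonedX23 = 0xbebebebebebebebeULL;

// `ldr x20, [x23, #0x8]` dereferences x23 + 0x8, which is exactly the
// fault address in the report: SIGSEGV(SEGV_MAPERR)@0xbebebebebebebec6.
static_assert(EffectiveAddress(kPoisonedX23, 0x8) == 0xbebebebebebebec6ULL,
              "fault address should be x23 + 0x8");
```

The match confirms that the faulting access is the second load, and that the bad value came from the field at offset 0x50 rather than from x0 itself.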

Set a breakpoint on MarkingBarrier::WhiteToGreyAndPush and inspect a MarkingBarrier in the normal case:

As above, x0 is the first argument, i.e. this, which inside WhiteToGreyAndPush is the current MarkingBarrier. Viewing the MarkingBarrier (0x0000006185bc3640) in the memory view:

The value at offset 0x50 of this MarkingBarrier is valid, so this particular call will not crash. Set a watchpoint on that field to see whether, and when, it gets overwritten:

(lldb) watchpoint set expression 0x0000006185bc3690
Watchpoint created: Watchpoint 1: addr = 0x6185bc3690 size = 8 state = enabled type = w
    new value: 418855532128

Watchpoint address = 0x0000006185bc3640 (MarkingBarrier) + 0x50 = 0x0000006185bc3690

The current value at 0x0000006185bc3690 is 418855532128, which is 0x6185bc3660 in hexadecimal and matches what the memory view shows.
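Both numbers above are easy to double-check outside the debugger. The following is a small sketch that recomputes the watchpoint address and converts lldb's decimal output to hexadecimal; WatchAddress and ToHex are helper names invented for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: address of a field at `offset` inside an object.
constexpr std::uint64_t WatchAddress(std::uint64_t object,
                                     std::uint64_t offset) {
  return object + offset;
}

// Hypothetical helper: format a 64-bit value the way `p/x` would show it,
// without leading zeros.
std::string ToHex(std::uint64_t value) {
  char buffer[2 + 16 + 1];  // "0x" + up to 16 hex digits + terminator
  std::snprintf(buffer, sizeof(buffer), "0x%llx",
                static_cast<unsigned long long>(value));
  return std::string(buffer);
}

// MarkingBarrier at 0x6185bc3640, field at offset 0x50.
static_assert(WatchAddress(0x0000006185bc3640ULL, 0x50) ==
                  0x0000006185bc3690ULL,
              "watchpoint address mismatch");

// The decimal "new value" printed by lldb is the same number as the
// pointer seen in the memory view.
static_assert(418855532128ULL == 0x6185bc3660ULL, "decimal/hex mismatch");
```

Calling ToHex(418855532128ULL) yields the string 0x6185bc3660, the value seen in the memory view.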

When the watchpoint is hit, the instruction just before the stop location is str x12, [x11, #0x50], which stores x12 at offset 0x50 from x11. Print x11 and x12:

(lldb) p/x $x12
(unsigned long) $3 = 0x0000006185bc3660

(lldb) p/x $x11
(unsigned long) $4 = 0x0000006185bc3640

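x11 points to the same MarkingBarrier object as before, and x12 equals the value already stored at 0x0000006185bc3690. The store simply writes the same value again, so it can be ignored. Continuing execution then triggers the crash: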

Printing x0 (the MarkingBarrier) shows that the current MarkingBarrier is at 0x00000061860d8f40, a different object from the one at 0x0000006185bc3640 that we had been watching. Viewing the memory at 0x00000061860d8f40:

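The field at offset 0x50 of this second MarkingBarrier is invalid. In other words, the original MarkingBarrier was never corrupted; the code is now using a different MarkingBarrier whose contents are bad.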

Where the MarkingBarrier comes from

Download the V8 source: github.com/v8/v8. Here I used ... version 12.0.46.

Look at the source of Dictionary::SetEntry from crash stack 2 to trace where the MarkingBarrier comes from:

template <typename Derived, typename Shape>
void Dictionary<Derived, Shape>::SetEntry(InternalIndex entry,
                                          Tagged<Object> key,
                                          Tagged<Object> value,
                                          PropertyDetails details) {
  DCHECK(Dictionary::kEntrySize == 2 || Dictionary::kEntrySize == 3);
  DCHECK(!IsName(key) || details.dictionary_index() > 0);
  int index = DerivedHashTable::EntryToIndex(entry);
  DisallowGarbageCollection no_gc;
  WriteBarrierMode mode = this->GetWriteBarrierMode(no_gc);
  this->set(index + Derived::kEntryKeyIndex, key, mode);
  this->set(index + Derived::kEntryValueIndex, value, mode);
  if (Shape::kHasDetails) DetailsAtPut(entry, details);
}

this->set is implemented as follows (inheritance chain: Dictionary -> HashTable -> HashTableBase -> FixedArray):

void FixedArray::set(int index, Tagged<Object> value, WriteBarrierMode mode) {
  DCHECK_NE(map(), GetReadOnlyRoots().fixed_cow_array_map());
  DCHECK_LT(static_cast<unsigned>(index), static_cast<unsigned>(length()));
  int offset = OffsetOfElementAt(index);
  RELAXED_WRITE_FIELD(*this, offset, value);
  CONDITIONAL_WRITE_BARRIER(*this, offset, value, mode);
}

CONDITIONAL_WRITE_BARRIER is a macro, defined as:

#define CONDITIONAL_WRITE_BARRIER(object, offset, value, mode)             \
  do {                                                                     \
    DCHECK_NOT_NULL(GetHeapFromWritableObject(object));                    \
    CombinedWriteBarrier(object, (object)->RawField(offset), value, mode); \
  } while (false)

The key parts of CombinedWriteBarrier and the functions it calls:

inline void CombinedWriteBarrier(Tagged<HeapObject> host, MaybeObjectSlot slot,
                                 MaybeObject value, WriteBarrierMode mode) {
  ...
  heap_internals::CombinedWriteBarrierInternal(host, HeapObjectSlot(slot), value_object, mode);
}

inline void CombinedWriteBarrierInternal(Tagged<HeapObject> host,
                                         HeapObjectSlot slot,
                                         Tagged<HeapObject> value,
                                         WriteBarrierMode mode) {
...
  if (V8_UNLIKELY(is_marking)) {
    WriteBarrier::MarkingSlow(host, HeapObjectSlot(slot), value);
  }
}

void WriteBarrier::MarkingSlow(Tagged<HeapObject> host, HeapObjectSlot slot,
                               Tagged<HeapObject> value) {
  MarkingBarrier* marking_barrier = CurrentMarkingBarrier(host);
  marking_barrier->Write(host, slot, value);
}

The MarkingBarrier object is obtained through CurrentMarkingBarrier. Its implementation:

namespace {
thread_local MarkingBarrier* current_marking_barrier = nullptr;
}  // namespace

MarkingBarrier* WriteBarrier::CurrentMarkingBarrier(
    Tagged<HeapObject> verification_candidate) {
  MarkingBarrier* marking_barrier = current_marking_barrier;
  DCHECK_NOT_NULL(marking_barrier);
  ...
  return marking_barrier;
}

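The MarkingBarrier is stored in a thread_local variable. So the current_marking_barrier pointer must have been changed to point at the bad MarkingBarrier, which led to the crash above. Next, find where current_marking_barrier is assigned: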

MarkingBarrier* WriteBarrier::SetForThread(MarkingBarrier* marking_barrier) {
  MarkingBarrier* existing = current_marking_barrier;
  current_marking_barrier = marking_barrier;
  return existing;
}
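Because the slot is thread_local, every thread has its own copy, and a write on one thread is invisible to the others. The sketch below imitates the shape of SetForThread with a stand-in type to show this; FakeBarrier, Current, and WorkerWriteIsInvisibleToCaller are hypothetical names for this illustration, and this local SetForThread is a reimplementation, not V8's code.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for v8::internal::MarkingBarrier.
struct FakeBarrier {
  int id;
};

namespace {
// One slot per thread, mirroring the shape of current_marking_barrier.
thread_local FakeBarrier* current_barrier = nullptr;
}  // namespace

// Same contract as the function quoted above: install a new pointer for
// the calling thread and return the previous one.
FakeBarrier* SetForThread(FakeBarrier* barrier) {
  FakeBarrier* existing = current_barrier;
  current_barrier = barrier;
  return existing;
}

FakeBarrier* Current() { return current_barrier; }

// Returns true if a worker thread starts with an empty slot and its own
// SetForThread call leaves the caller's slot untouched.
bool WorkerWriteIsInvisibleToCaller() {
  FakeBarrier caller_barrier{1};
  FakeBarrier worker_barrier{2};
  FakeBarrier* saved = SetForThread(&caller_barrier);

  bool worker_started_empty = false;
  std::thread worker([&] {
    worker_started_empty = (Current() == nullptr);
    SetForThread(&worker_barrier);  // only affects the worker's slot
  });
  worker.join();

  bool caller_unchanged = (Current() == &caller_barrier);
  SetForThread(saved);  // restore whatever was installed before
  return worker_started_empty && caller_unchanged;
}
```

This is why only calls to SetForThread made on the crashing thread can explain a bad pointer on that thread.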

Set a breakpoint on WriteBarrier::SetForThread:

The crash is on the main thread, so the many hits that modify current_marking_barrier on other threads can be skipped. Continue until current_marking_barrier is modified on the main thread:

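The debugger shows no call stack here, but register read gives the value of the lr register, which holds the return address and therefore identifies the caller: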

(lldb) register read
General Purpose Registers:
        x0 = 0x0000000000000000
        x1 = 0x0000000000000020
        x2 = 0x0000000000000020
        x3 = 0x0000005aeb606a40  ld-musl-aarch64.so.1`memset
        x4 = 0x0000005cee8a7f30
        x5 = 0x0000000000000001
        x6 = 0x0000007fddc54000
        x7 = 0x0000000000000001
        x8 = 0x0000000000000000
        x9 = 0x0000000000000000
       x10 = 0x0000005aeb606a40  ld-musl-aarch64.so.1`memset
       x11 = 0x000000000b18d225
       x12 = 0xffffffffffffffff
       x13 = 0x0000000000000000
       x14 = 0x0000007fffffffff
       x15 = 0x0000007fffffffff
       x16 = 0x0000005aecaf4fd8  
       x17 = 0x0000005aeb71d450  ld-musl-aarch64.so.1`tss_get
       x18 = 0xffff000000000006
       x19 = 0x0000007b6f867800
       x20 = 0x0000005cee8a7e50
       x21 = 0x0000007b6f874c40
       x22 = 0x0000005cf0298650
       x23 = 0x0000007b76a68800
       x24 = 0x0000000000000000
       x25 = 0x0000005aeb96c570  ld-musl-aarch64.so.1`ohos_malloc_hook_shared_library
       x26 = 0x0000000000000007
       x27 = 0x0000007b714fb7e8  libace_napi.z.so`ArkNativeEngine::napiProfilerEnabled
       x28 = 0x0000007fde44d3a0
        fp = 0x0000007fde44a880
        lr = 0x0000007c95cc05a8  libjsvm.so`v8::internal::Isolate::Enter() + 232
        sp = 0x0000007fde44a880
        pc = 0x0000007c95d2c2ac  libjsvm.so`v8::internal::WriteBarrier::SetForThread(v8::internal::MarkingBarrier*)
      cpsr = 0x20001000

Continuing execution, the lr register yields the following callers in turn:

libjsvm.so`v8::internal::Isolate::Enter() + 232

libjsvm.so`v8::internal::Isolate::Init(v8::internal::SnapshotData*, v8::internal::SnapshotData*, v8::internal::Snapshot

libjsvm.so`v8::Isolate::Initialize(v8::Isolate*, v8::Isolate::CreateParams const&) + 496

libjsvm.so`OH_JSVM_CreateVM + 332

Besides V8's own Isolate logic, a familiar caller shows up: OH_JSVM_CreateVM. The full call stack for OH_JSVM_CreateVM:

    [libjsvm.so] v8::internal::WriteBarrier::SetForThread(v8::internal::MarkingBarrier*) Disassembly:401
    [??] ?? 0x004f007b25fa494c(OH_JSVM_CreateVM)
    [libmylib.so] MyEngineInstance::MyEngineInstance(napi_env__*, napi_value__*, napi_value__*, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>> const&, unsigned long const&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>> const&) MyEngineInstance.cpp:686
    [libmylib.so] std::__n1::__shared_ptr_emplace<MyEngineInstance, std::__n1::allocator<MyEngineInstance>>::__shared_ptr_emplace[abi:v15004]<napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&>(std::__n1::allocator<MyEngineInstance>, napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&) shared_ptr.h:294
    [libmylib.so] std::__n1::shared_ptr<MyEngineInstance> std::__n1::allocate_shared[abi:v15004]<MyEngineInstance, std::__n1::allocator<MyEngineInstance>, napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&, void>(std::__n1::allocator<MyEngineInstance> const&, napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&) shared_ptr.h:953
    [libmylib.so] std::__n1::shared_ptr<MyEngineInstance> std::__n1::make_shared[abi:v15004]<MyEngineInstance, napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&, void>(napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&) shared_ptr.h:962
    [libmylib.so] createMyEngine(napi_env__*, napi_callback_info__*) napi_init.cpp:82
    [libace_napi.z.so] panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*) 0x0000007a07ebdebc
    [JIT(0x777c880400)] RTStub_PushCallRangeAndDispatchNative 0x0000007a1c874eb0
    [JIT(0x777c880400)] BCStubInterpreterRoutine 0x0000007a1c48c6bc

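Here createMyEngine constructs a new MyEngineInstance, which in turn creates a new JSVM VM. The new VM has its own Isolate, and creating it repoints the thread_local current_marking_barrier. When several JSVM VMs run on the main thread at the same time, they all share that single current_marking_barrier, which is what caused the problem.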
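The sharing can be illustrated with a toy model. It assumes only what the analysis above established: creating a VM installs its barrier in a per-thread slot, and heap writes look the barrier up from that slot rather than from the VM being written to. FakeVm, FakeBarrier, and FirstVmUsesForeignBarrier are hypothetical names; none of this is V8 or JSVM code, and it does not model why the second barrier's contents were bad.

```cpp
#include <cassert>

// Hypothetical stand-in for v8::internal::MarkingBarrier.
struct FakeBarrier {
  int owner_id = 0;
};

namespace {
// One slot per thread, as with current_marking_barrier.
thread_local const FakeBarrier* current_barrier = nullptr;
}  // namespace

// Hypothetical stand-in for a JSVM VM and its Isolate.
class FakeVm {
 public:
  explicit FakeVm(int id) {
    barrier_.owner_id = id;
    // Mimics the create path ending in WriteBarrier::SetForThread.
    current_barrier = &barrier_;
  }

  const FakeBarrier* own_barrier() const { return &barrier_; }

  // Mimics a heap write: the barrier is taken from the thread's slot,
  // with no check that it belongs to this VM.
  const FakeBarrier* BarrierUsedForWrite() const { return current_barrier; }

 private:
  FakeBarrier barrier_;
};

// Returns true if, after a second VM is created on the same thread,
// writes for the first VM pick up the second VM's barrier.
bool FirstVmUsesForeignBarrier() {
  FakeVm vm_a(1);
  FakeVm vm_b(2);  // overwrites the slot that vm_a installed
  const FakeBarrier* used = vm_a.BarrierUsedForWrite();
  bool foreign = (used == vm_b.own_barrier()) && (used != vm_a.own_barrier());
  current_barrier = nullptr;  // do not leave a dangling pointer behind
  return foreign;
}
```

In the model, the first VM silently ends up with a barrier it does not own, which matches the observation that the crashing MarkingBarrier was a different object from the one watched earlier.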

Solution

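Use multiple threads, with each thread running exactly one JSVM VM. JSVM wraps V8, and judging from this analysis, V8 as we used it through JSVM cannot safely run multiple VM instances on the same thread.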
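A minimal sketch of the one-VM-per-thread layout, using a toy model in place of the real API: each thread creates its own VM, waits until every other thread has created one too, and then checks that its per-thread slot still points at its own barrier. FakeVm, FakeBarrier, and CountThreadsWithOwnBarrier are hypothetical names; real code would call the JSVM creation and execution functions on each thread instead.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for v8::internal::MarkingBarrier.
struct FakeBarrier {
  int owner_id = 0;
};

namespace {
thread_local const FakeBarrier* current_barrier = nullptr;
}  // namespace

// Hypothetical stand-in for a JSVM VM; creation installs its barrier in
// the calling thread's slot.
class FakeVm {
 public:
  FakeVm() { current_barrier = &barrier_; }
  ~FakeVm() { current_barrier = nullptr; }
  const FakeBarrier* own_barrier() const { return &barrier_; }
  const FakeBarrier* BarrierUsedForWrite() const { return current_barrier; }

 private:
  FakeBarrier barrier_;
};

// Starts `thread_count` threads with one VM each and returns how many of
// them still see their own barrier after all VMs exist.
int CountThreadsWithOwnBarrier(int thread_count) {
  std::atomic<int> created{0};
  std::atomic<int> intact{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < thread_count; ++i) {
    threads.emplace_back([&] {
      FakeVm vm;  // the only VM this thread ever creates
      created.fetch_add(1);
      // Wait until every thread has created its VM.
      while (created.load() < thread_count) std::this_thread::yield();
      if (vm.BarrierUsedForWrite() == vm.own_barrier()) intact.fetch_add(1);
    });
  }
  for (std::thread& t : threads) t.join();
  return intact.load();
}
```

With one VM per thread, no thread's slot is ever overwritten by another VM, so CountThreadsWithOwnBarrier(n) returns n. The cost is that each VM must only be used from the thread that created it, so calls from other threads have to be forwarded to that thread.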
