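Notes on Locating and Fixing a HarmonyOS JSVM Crash

Background

Our cross-platform solution needs to execute JS code dynamically on HarmonyOS, similar to RN (React Native). HarmonyOS provides JSVM for this, and JSVM is a wrapper around V8. At runtime, however, we hit a crash inside JSVM itself. This article records how we used debugger techniques and a little assembly knowledge to track down the cause of a crash in a system library.

Symptoms

Executing a JS script with JSVM crashed in WhiteToGreyAndPush, with the following stacks: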

Crash stack 1:
Reason:Signal:SIGSEGV(SEGV_MAPERR)@0xbebebebebebebec6 
[libjsvm.so] v8::internal::MarkingBarrier::WhiteToGreyAndPush(v8::internal::HeapObject) Disassembly:201
[libjsvm.so] v8::internal::MarkingBarrier::Write(v8::internal::HeapObject, v8::internal::FullHeapObjectSlot, v8::internal::HeapObject) 0x0000005f572087bc
[libjsvm.so] v8::internal::Factory::CodeBuilder::BuildInternal(bool) 0x0000005f5718766c
[libjsvm.so] v8::internal::compiler::CodeGenerator::FinalizeCode() 0x0000005f578bd380
[libjsvm.so] auto v8::internal::compiler::PipelineImpl::Run<v8::internal::compiler::FinalizeCodePhase>() 0x0000005f57b25c3c
[libjsvm.so] v8::internal::compiler::PipelineImpl::FinalizeCode(bool) 0x0000005f57b19208
[libjsvm.so] v8::internal::compiler::PipelineCompilationJob::FinalizeJobImpl(v8::internal::Isolate*) 0x0000005f57b1903c
[libjsvm.so] v8::internal::Compiler::FinalizeTurbofanCompilationJob(v8::internal::TurbofanCompilationJob*, v8::internal::Isolate*) 0x0000005f57065b40
[libjsvm.so] v8::internal::OptimizingCompileDispatcher::InstallOptimizedFunctions() 0x0000005f570b3b24
[libjsvm.so] v8::internal::StackGuard::HandleInterrupts() 0x0000005f57149b20
[libjsvm.so] v8::internal::Runtime_StackGuard(int, unsigned long*, v8::internal::Isolate*) 0x0000005f57592604
[libjsvm.so] Builtins_CEntry_Return1_ArgvOnStack_NoBuiltinExit 0x0000005f56af20b8
[libjsvm.so] Builtins_ProxyConstructor 0x0000005f56b53190
[libjsvm.so] Builtins_JSBuiltinsConstructStub 0x0000005f56a6677c
[??] ?? 0x0000005f3f4c9b84
Crash stack 2:
Reason:Signal:SIGSEGV(SEGV_MAPERR)@0xbebebebebebebec6 
[libjsvm.so] v8::internal::MarkingBarrier::WhiteToGreyAndPush(v8::internal::HeapObject) Disassembly:201
[libjsvm.so] v8::internal::MarkingBarrier::Write(v8::internal::HeapObject, v8::internal::FullHeapObjectSlot, v8::internal::HeapObject) 0x0000005f37f887bc
[libjsvm.so] v8::internal::Dictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::SetEntry(v8::internal::InternalIndex, v8::internal::Object, v8::internal::Object, v8::internal::PropertyDetails) 0x0000005f381d2c0c
[libjsvm.so] v8::internal::Handle<v8::internal::NameDictionary> v8::internal::Dictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::Add<v8::internal::Isolate, (v8::internal::AllocationType)0>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NameDictionary>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, v8::internal::InternalIndex*) 0x0000005f381d32dc
[libjsvm.so] v8::internal::BaseNameDictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::Add(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NameDictionary>, v8::internal::Handle<v8::internal::Name>, v8::internal::Handle<v8::internal::Object>, v8::internal::PropertyDetails, v8::internal::InternalIndex*) 0x0000005f381d2e64
[libjsvm.so] v8::internal::JSObject::MigrateToMap(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Map>, int) 0x0000005f3815bd74
[libjsvm.so] v8::internal::JSObject::OptimizeAsPrototype(v8::internal::Handle<v8::internal::JSObject>, bool) 0x0000005f3815fa68
[libjsvm.so] v8::internal::Map::SetPrototype(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::HeapObject>, bool) 0x0000005f381b389c
[libjsvm.so] v8::internal::JSFunction::SetInitialMap(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Map>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>) 0x0000005f3814a2a4
[libjsvm.so] v8::internal::ApiNatives::CreateApiFunction(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NativeContext>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::InstanceType, v8::internal::MaybeHandle<v8::internal::Name>) 0x0000005f37d14208
[libjsvm.so] v8::internal::(anonymous namespace)::InstantiateFunction(v8::internal::Isolate*, v8::internal::Handle<v8::internal::NativeContext>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::MaybeHandle<v8::internal::Name>) 0x0000005f37d122ac
[libjsvm.so] v8::internal::ApiNatives::InstantiateFunction(v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::MaybeHandle<v8::internal::Name>) 0x0000005f37d12c4c
[libjsvm.so] v8::FunctionTemplate::GetFunction(v8::Local<v8::Context>) 0x0000005f37d318a4
[libjsvm.so] v8::Function::New(v8::Local<v8::Context>, void (*)(v8::FunctionCallbackInfo<v8::Value> const&), v8::Local<v8::Value>, int, v8::ConstructorBehavior, v8::SideEffectType) 0x0000005f37d317e0
[libjsvm.so] OH_JSVM_CreateFunction 0x0000005f37569780

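Diagnosis

A bad MarkingBarrier

Reproducing the crash shows that it happens at the instruction ldr x20, [x23, #0x8]. Printing x23 shows that it holds an invalid pointer, 0xbebebebebebebebe, which matches the fault address 0xbebebebebebebec6 (x23 + 0x8) in the crash report.

Combined with the preceding instruction ldr x23, [x0, #0x50], we know that x23 is the field at offset 0x50 from x0. Under the AArch64 calling convention used for C++ member functions, x0 carries the first argument of MarkingBarrier::WhiteToGreyAndPush, which is the this pointer, that is, the MarkingBarrier.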
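The two loads can be modelled in a few lines of C++. This is only a sketch: FakeMarkingBarrier, LoadX23 and SecondLoadAddress are hypothetical names, and the real layout of MarkingBarrier is not known here; the only facts taken from the disassembly are the offsets 0x50 and 0x8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the real object; only the field position matters.
struct FakeMarkingBarrier {
  unsigned char unknown_prefix[0x50];  // whatever precedes the field (not known here)
  std::uint64_t field_at_0x50;         // the value loaded into x23
};

static_assert(offsetof(FakeMarkingBarrier, field_at_0x50) == 0x50,
              "the field must sit at offset 0x50 to match the disassembly");

// Models `ldr x23, [x0, #0x50]`, where x0 holds `this`.
inline std::uint64_t LoadX23(const FakeMarkingBarrier* self) {
  assert(self != nullptr);
  return self->field_at_0x50;
}

// Address that `ldr x20, [x23, #0x8]` would read from for a given x23.
constexpr std::uint64_t SecondLoadAddress(std::uint64_t x23) {
  return x23 + 0x8;
}
```

Feeding the bad value 0xbebebebebebebebe through SecondLoadAddress gives 0xbebebebebebebec6, the address reported in the SIGSEGV line.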

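Set a breakpoint on MarkingBarrier::WhiteToGreyAndPush to inspect the contents of a MarkingBarrier in the normal case:

As above, x0 is the first argument, i.e. this, which inside WhiteToGreyAndPush is the current MarkingBarrier. Look at the MarkingBarrier (0x0000006185bc3640) in the memory view:

The value at offset 0x50 of the current MarkingBarrier is valid, so this call will not crash. Next we set a watchpoint to see whether, and when, the value at offset 0x50 gets corrupted: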

(lldb) watchpoint set expression 0x0000006185bc3690
Watchpoint created: Watchpoint 1: addr = 0x6185bc3690 size = 8 state = enabled type = w
    new value: 418855532128

Watchpoint address = 0x0000006185bc3640 (MarkingBarrier) + 0x50 = 0x0000006185bc3690

The current value at 0x0000006185bc3690 is 418855532128, which is 0x6185bc3660 in hexadecimal and matches what we saw in the memory view.
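The address arithmetic and the decimal-to-hex conversion are easy to double-check with a small helper. The function names below are made up for this sketch; the numbers are the ones from the lldb session above.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Address of a field, given the object's base address and the field offset.
constexpr std::uint64_t FieldAddress(std::uint64_t base, std::uint64_t offset) {
  return base + offset;
}

// Formats a value as lowercase hexadecimal with a 0x prefix (no zero padding).
inline std::string ToHex(std::uint64_t value) {
  std::ostringstream out;
  out << "0x" << std::hex << value;
  return out.str();
}

// Re-checks the two values used in the watchpoint step.
inline void CheckWatchpointValues() {
  assert(FieldAddress(0x6185bc3640ULL, 0x50) == 0x6185bc3690ULL);
  assert(ToHex(418855532128ULL) == "0x6185bc3660");
}
```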

After the watchpoint is hit, we can see:

The instruction just before the stop location, str x12, [x11, #0x50], stores x12 at offset 0x50 from x11. Print x11 and x12:

(lldb) p/x $x12
(unsigned long) $3 = 0x0000006185bc3660

(lldb) p/x $x11
(unsigned long) $4 = 0x0000006185bc3640

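x11 points to the MarkingBarrier object from before, and x12 equals the value previously stored at 0x0000006185bc3690, so this store just writes the same value again and can be ignored. Continuing execution then triggers the crash:

Printing x0 (the MarkingBarrier) shows that the current MarkingBarrier address is 0x00000061860d8f40, which is a different object from the 0x0000006185bc3640 we had been watching. Inspect the memory at 0x00000061860d8f40:

The value at offset 0x50 of this later MarkingBarrier is bad. In other words, the MarkingBarrier we were watching was not corrupted; a different, bad MarkingBarrier was put in its place.

Where the MarkingBarrier comes from

Download the V8 source from github.com/v8/v8; here I used ... version 12.0.46.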

Look at the source of Dictionary::SetEntry from crash stack 2 to find where the MarkingBarrier comes from:

template <typename Derived, typename Shape>
void Dictionary<Derived, Shape>::SetEntry(InternalIndex entry,
                                          Tagged<Object> key,
                                          Tagged<Object> value,
                                          PropertyDetails details) {
  DCHECK(Dictionary::kEntrySize == 2 || Dictionary::kEntrySize == 3);
  DCHECK(!IsName(key) || details.dictionary_index() > 0);
  int index = DerivedHashTable::EntryToIndex(entry);
  DisallowGarbageCollection no_gc;
  WriteBarrierMode mode = this->GetWriteBarrierMode(no_gc);
  this->set(index + Derived::kEntryKeyIndex, key, mode);
  this->set(index + Derived::kEntryValueIndex, value, mode);
  if (Shape::kHasDetails) DetailsAtPut(entry, details);
}

The this->set method is implemented as follows (Dictionary inheritance chain: Dictionary -> HashTable -> HashTableBase -> FixedArray):

void FixedArray::set(int index, Tagged<Object> value, WriteBarrierMode mode) {
  DCHECK_NE(map(), GetReadOnlyRoots().fixed_cow_array_map());
  DCHECK_LT(static_cast<unsigned>(index), static_cast<unsigned>(length()));
  int offset = OffsetOfElementAt(index);
  RELAXED_WRITE_FIELD(*this, offset, value);
  CONDITIONAL_WRITE_BARRIER(*this, offset, value, mode);
}

CONDITIONAL_WRITE_BARRIER is a macro, defined as follows:

#define CONDITIONAL_WRITE_BARRIER(object, offset, value, mode)             \
  do {                                                                     \
    DCHECK_NOT_NULL(GetHeapFromWritableObject(object));                    \
    CombinedWriteBarrier(object, (object)->RawField(offset), value, mode); \
  } while (false)

The key parts of CombinedWriteBarrier and the related functions are:

inline void CombinedWriteBarrier(Tagged<HeapObject> host, MaybeObjectSlot slot,
                                 MaybeObject value, WriteBarrierMode mode) {
  ...
  heap_internals::CombinedWriteBarrierInternal(host, HeapObjectSlot(slot), value_object, mode);
}

inline void CombinedWriteBarrierInternal(Tagged<HeapObject> host,
                                         HeapObjectSlot slot,
                                         Tagged<HeapObject> value,
                                         WriteBarrierMode mode) {
...
  if (V8_UNLIKELY(is_marking)) {
    WriteBarrier::MarkingSlow(host, HeapObjectSlot(slot), value);
  }
}

void WriteBarrier::MarkingSlow(Tagged<HeapObject> host, HeapObjectSlot slot,
                               Tagged<HeapObject> value) {
  MarkingBarrier* marking_barrier = CurrentMarkingBarrier(host);
  marking_barrier->Write(host, slot, value);
}

The MarkingBarrier object is obtained through CurrentMarkingBarrier. Here is its implementation:

namespace {
thread_local MarkingBarrier* current_marking_barrier = nullptr;
}  // namespace

MarkingBarrier* WriteBarrier::CurrentMarkingBarrier(
    Tagged<HeapObject> verification_candidate) {
  MarkingBarrier* marking_barrier = current_marking_barrier;
  DCHECK_NOT_NULL(marking_barrier);
  ...
  return marking_barrier;
}

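The MarkingBarrier is stored in a thread_local variable. So the current_marking_barrier pointer must have been changed to point at a bad MarkingBarrier, which caused the crash above. Next, look for where current_marking_barrier is assigned: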

MarkingBarrier* WriteBarrier::SetForThread(MarkingBarrier* marking_barrier) {
  MarkingBarrier* existing = current_marking_barrier;
  current_marking_barrier = marking_barrier;
  return existing;
}
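
SetForThread returns the previous pointer, presumably so that a caller can put it back later. The sketch below shows that save-and-restore idea with a hypothetical RAII helper. FakeBarrier, ScopedBarrier and Current are invented for illustration; this is not how V8 or JSVM actually manage the pointer.

```cpp
#include <cassert>

struct FakeBarrier { int id; };  // hypothetical stand-in for MarkingBarrier

namespace {
thread_local FakeBarrier* current_barrier = nullptr;
}  // namespace

// Same shape as WriteBarrier::SetForThread in the excerpt above.
FakeBarrier* SetForThread(FakeBarrier* barrier) {
  FakeBarrier* existing = current_barrier;
  current_barrier = barrier;
  return existing;
}

FakeBarrier* Current() { return current_barrier; }

// Hypothetical helper: installs a barrier for the current thread and
// restores the previously installed one when the scope ends.
class ScopedBarrier {
 public:
  explicit ScopedBarrier(FakeBarrier* barrier)
      : previous_(SetForThread(barrier)) {}
  ~ScopedBarrier() { SetForThread(previous_); }
  ScopedBarrier(const ScopedBarrier&) = delete;
  ScopedBarrier& operator=(const ScopedBarrier&) = delete;

 private:
  FakeBarrier* previous_;
};

// Returns true if leaving the inner scope brings the outer barrier back.
bool NestedScopesRestore() {
  FakeBarrier a{1};
  FakeBarrier b{2};
  ScopedBarrier outer(&a);
  {
    ScopedBarrier inner(&b);
    if (Current() != &b) return false;
  }
  return Current() == &a;
}
```

If every switch of the thread-local pointer is paired with a restore like this, the pointer always matches the object currently in use. The crash in this article is what it looks like when that pairing does not hold.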

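Set a breakpoint on WriteBarrier::SetForThread:

The crash is on the main thread, so the many current_marking_barrier changes on other threads can be ignored. Keep continuing until a change to the main thread's current_marking_barrier is hit:

No call stack is shown here, but lr holds the return address, so we can use register read and look at the lr register to identify the caller: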

(lldb) register read
General Purpose Registers:
        x0 = 0x0000000000000000
        x1 = 0x0000000000000020
        x2 = 0x0000000000000020
        x3 = 0x0000005aeb606a40  ld-musl-aarch64.so.1`memset
        x4 = 0x0000005cee8a7f30
        x5 = 0x0000000000000001
        x6 = 0x0000007fddc54000
        x7 = 0x0000000000000001
        x8 = 0x0000000000000000
        x9 = 0x0000000000000000
       x10 = 0x0000005aeb606a40  ld-musl-aarch64.so.1`memset
       x11 = 0x000000000b18d225
       x12 = 0xffffffffffffffff
       x13 = 0x0000000000000000
       x14 = 0x0000007fffffffff
       x15 = 0x0000007fffffffff
       x16 = 0x0000005aecaf4fd8  
       x17 = 0x0000005aeb71d450  ld-musl-aarch64.so.1`tss_get
       x18 = 0xffff000000000006
       x19 = 0x0000007b6f867800
       x20 = 0x0000005cee8a7e50
       x21 = 0x0000007b6f874c40
       x22 = 0x0000005cf0298650
       x23 = 0x0000007b76a68800
       x24 = 0x0000000000000000
       x25 = 0x0000005aeb96c570  ld-musl-aarch64.so.1`ohos_malloc_hook_shared_library
       x26 = 0x0000000000000007
       x27 = 0x0000007b714fb7e8  libace_napi.z.so`ArkNativeEngine::napiProfilerEnabled
       x28 = 0x0000007fde44d3a0
        fp = 0x0000007fde44a880
        lr = 0x0000007c95cc05a8  libjsvm.so`v8::internal::Isolate::Enter() + 232
        sp = 0x0000007fde44a880
        pc = 0x0000007c95d2c2ac  libjsvm.so`v8::internal::WriteBarrier::SetForThread(v8::internal::MarkingBarrier*)
      cpsr = 0x20001000

Continuing execution, we collected the following callers from the lr register:

libjsvm.so`v8::internal::Isolate::Enter() + 232

libjsvm.so`v8::internal::Isolate::Init(v8::internal::SnapshotData*, v8::internal::SnapshotData*, v8::internal::Snapshot

libjsvm.so`v8::Isolate::Initialize(v8::Isolate*, v8::Isolate::CreateParams const&) + 496

libjsvm.so`OH_JSVM_CreateVM + 332

Besides V8's own Isolate logic, a familiar caller shows up here: OH_JSVM_CreateVM. The full call stack for OH_JSVM_CreateVM is:

    [libjsvm.so] v8::internal::WriteBarrier::SetForThread(v8::internal::MarkingBarrier*) Disassembly:401
    [??] ?? 0x004f007b25fa494c(OH_JSVM_CreateVM)
    [libmylib.so] MyEngineInstance::MyEngineInstance(napi_env__*, napi_value__*, napi_value__*, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>> const&, unsigned long const&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>> const&) MyEngineInstance.cpp:686
    [libmylib.so] std::__n1::__shared_ptr_emplace<MyEngineInstance, std::__n1::allocator<MyEngineInstance>>::__shared_ptr_emplace[abi:v15004]<napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&>(std::__n1::allocator<MyEngineInstance>, napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&) shared_ptr.h:294
    [libmylib.so] std::__n1::shared_ptr<MyEngineInstance> std::__n1::allocate_shared[abi:v15004]<MyEngineInstance, std::__n1::allocator<MyEngineInstance>, napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&, void>(std::__n1::allocator<MyEngineInstance> const&, napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&) shared_ptr.h:953
    [libmylib.so] std::__n1::shared_ptr<MyEngineInstance> std::__n1::make_shared[abi:v15004]<MyEngineInstance, napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&, void>(napi_env__*&, napi_value__*&, napi_value__*&, char const (&) [1], unsigned long&, std::__n1::basic_string<char, std::__n1::char_traits<char>, std::__n1::allocator<char>>&) shared_ptr.h:962
    [libmylib.so] createMyEngine(napi_env__*, napi_callback_info__*) napi_init.cpp:82
    [libace_napi.z.so] panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*) 0x0000007a07ebdebc
    [JIT(0x777c880400)] RTStub_PushCallRangeAndDispatchNative 0x0000007a1c874eb0
    [JIT(0x777c880400)] BCStubInterpreterRoutine 0x0000007a1c48c6bc

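Here createMyEngine constructs a new MyEngineInstance, which also creates a new JSVM VM. The new VM has its own Isolate, and creating it repoints the thread_local current_marking_barrier. When several JSVM VMs run on the main thread at the same time, they all share a single current_marking_barrier, which is what caused the problem.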
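
The failure mode can be reproduced in miniature without V8. In the sketch below, FakeVm and FakeBarrier are hypothetical stand-ins: each VM owns a barrier and, like the VM creation path traced above, repoints the thread-local pointer when it is created.

```cpp
#include <cassert>

struct FakeBarrier { int owner_vm; };  // hypothetical stand-in for MarkingBarrier

namespace {
thread_local FakeBarrier* current_barrier = nullptr;
}  // namespace

// Hypothetical VM: creating one makes its barrier the thread's current barrier.
class FakeVm {
 public:
  explicit FakeVm(int id) : barrier_{id} { current_barrier = &barrier_; }
  FakeVm(const FakeVm&) = delete;
  FakeVm& operator=(const FakeVm&) = delete;

  // Models a write barrier: reports which VM's barrier a write would go through.
  int BarrierUsedForWrite() const {
    assert(current_barrier != nullptr);
    return current_barrier->owner_vm;
  }

 private:
  FakeBarrier barrier_;
};

// Returns true if creating a second VM on the same thread makes the first
// VM's writes go through the second VM's barrier.
bool SecondVmHijacksBarrier() {
  FakeVm first(1);
  const bool ok_before = first.BarrierUsedForWrite() == 1;
  FakeVm second(2);  // same thread: the thread-local now points at VM 2
  const bool hijacked = first.BarrierUsedForWrite() == 2;
  current_barrier = nullptr;  // avoid leaving a dangling pointer behind
  return ok_before && hijacked;
}
```

In this model the first VM merely uses the wrong barrier. In the real crash the barrier it ended up with also contained an invalid field, so the first dereference faulted.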

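Solution

Use a multi-threaded design in which each thread runs only one JSVM VM. JSVM is a wrapper around V8, and judging from this analysis, running several VM instances on the same thread is not safe, at least through the JSVM API as we used it.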
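
The reason one VM per thread helps is that every thread gets its own copy of a thread_local variable. The sketch below demonstrates only that property, using hypothetical names (FakeBarrier, RunVmOnThisThread); it does not call the real JSVM API.

```cpp
#include <cassert>
#include <thread>

struct FakeBarrier { int owner_vm; };  // hypothetical stand-in for MarkingBarrier

namespace {
thread_local FakeBarrier* current_barrier = nullptr;
}  // namespace

// Runs one hypothetical VM on the calling thread and records whether the
// thread-local pointer kept pointing at this VM's own barrier throughout.
void RunVmOnThisThread(int vm_id, bool* saw_own_barrier) {
  FakeBarrier barrier{vm_id};
  current_barrier = &barrier;
  bool ok = true;
  for (int i = 0; i < 1000; ++i) {
    std::this_thread::yield();  // give the other thread a chance to run
    ok = ok && current_barrier == &barrier &&
         current_barrier->owner_vm == vm_id;
  }
  current_barrier = nullptr;
  *saw_own_barrier = ok;
}

// Returns true if two VMs on two threads never see each other's barrier.
bool OneVmPerThreadIsIsolated() {
  bool first_ok = false;
  bool second_ok = false;
  std::thread t1(RunVmOnThisThread, 1, &first_ok);
  std::thread t2(RunVmOnThisThread, 2, &second_ok);
  t1.join();
  t2.join();
  return first_ok && second_ok;
}
```

The trade-off is that calls into each VM now have to be marshalled onto that VM's thread.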
