Approach to analyzing a preview freeze followed by a crash:
-
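If a screen freeze is caused by the camera, the usual symptom is that the HAL stops returning frames, so the preview hangs. There are other possible causes, but we start with frame returns: requests and results both look normal at first, then after 12:53:05.443038 no more frames come back and the screen freezes. We keep digging from there.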
Line 53944: 01-01 12:53:03.429684 1825 16891 I CamX : [CONFIG][HAL ] camxhal3.cpp:1810 process_capture_request() output_buffers[0] : 0xb400007c12beeb80, buffer: 0x7e15dbeb40, status: 00000000, stream: 0xb400007d52135ea8
Line 53945: 01-01 12:53:03.429696 1825 16891 I CamX : [CONFIG][HAL ] camxhal3.cpp:1810 process_capture_request() output_buffers[1] : 0xb400007c12beeba0, buffer: 0x7e15dbeb40, status: 00000000, stream: 0xb400007d52135ba8
Line 56194: 01-01 12:53:04.423164 1825 3582 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 process_capture_result() frame_number 584, partial_result 1, result 0xb400007d52c0d780, num_physcam_metadata 0, num_output_buffers 0
Line 56385: 01-01 12:53:04.446711 1825 3582 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 process_capture_result() frame_number 581, partial_result 2, result 0xb400007d21c2b0c0, num_physcam_metadata 1, num_output_buffers 0
Line 56442: 01-01 12:53:04.456509 1825 3582 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 process_capture_result() frame_number 581, partial_result 0, result 0x0, num_physcam_metadata 0, num_output_buffers 1
Line 56443: 01-01 12:53:04.456514 1825 3582 I CamX : [CONFIG][HAL ] camxhal3.cpp:2106 process_capture_result() output_buffers[0] : 0xb400007ca6da5700, buffer: 0xb400007caa52f758, status: 00000000, stream: 0xb400007d52135ea8
Line 56495: 01-01 12:53:04.457988 1825 3580 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 process_capture_result() frame_number 580, partial_result 0, result 0x0, num_physcam_metadata 0, num_output_buffers 1
Line 56496: 01-01 12:53:04.457994 1825 3580 I CamX : [CONFIG][HAL ] camxhal3.cpp:2106 process_capture_result() output_buffers[0] : 0xb400007ca6da5700, buffer: 0xb400007ca72d9178, status: 00000000, stream: 0xb400007d52135ba8
Line 58262: 01-01 12:53:05.443033 1825 3581 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 process_capture_result() frame_number 605, partial_result 0, result 0x0, num_physcam_metadata 0, num_output_buffers 1
Line 58263: 01-01 12:53:05.443038 1825 3581 I CamX : [CONFIG][HAL ] camxhal3.cpp:2106 process_capture_result() output_buffers[0] : 0xb400007ca6da5700, buffer: 0xb400007ca63e6918, status: 00000000, stream: 0xb400007d52135ba8
-
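The frame-return check above can be scripted instead of scrolling through the log by hand. The following is a minimal sketch, assuming the logcat format shown in the excerpt; the function name `last_returned_buffer` and the regular expression are our own illustration, not part of any CamX tooling.

```python
import re

# Matches the HAL result summary line, for example:
# "01-01 12:53:05.443033 1825 3581 I CamX : ... process_capture_result() frame_number 605, ..."
RESULT_RE = re.compile(
    r"(\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+\d+\s+\d+ .*"
    r"process_capture_result\(\) frame_number (\d+),.*num_output_buffers (\d+)"
)


def last_returned_buffer(lines):
    """Return (timestamp, frame_number) of the last result that carried an output buffer."""
    last = None
    for line in lines:
        m = RESULT_RE.search(line)
        # Metadata-only results (num_output_buffers 0) do not update the preview.
        if m and int(m.group(3)) > 0:
            last = (m.group(1), int(m.group(2)))
    return last


# Three lines taken from the excerpt above.
sample = [
    "01-01 12:53:04.423164 1825 3582 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 "
    "process_capture_result() frame_number 584, partial_result 1, "
    "result 0xb400007d52c0d780, num_physcam_metadata 0, num_output_buffers 0",
    "01-01 12:53:04.457988 1825 3580 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 "
    "process_capture_result() frame_number 580, partial_result 0, "
    "result 0x0, num_physcam_metadata 0, num_output_buffers 1",
    "01-01 12:53:05.443033 1825 3581 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 "
    "process_capture_result() frame_number 605, partial_result 0, "
    "result 0x0, num_physcam_metadata 0, num_output_buffers 1",
]

print(last_returned_buffer(sample))
```

On a full log, the returned timestamp marks where the preview stopped being fed, which is the starting point for everything that follows.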
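The log also contains a crash record: the CamX HAL process crashed at 12:53:05.931. By then the problem had already started (the last frame was returned at 12:53:05.443), so the freeze came first and the crash followed.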
01-01 12:53:06.362630 24257 24257 F DEBUG : Timestamp: 2024-01-01 12:53:05.931813851+0800
01-01 12:53:06.362634 24257 24257 F DEBUG : Process uptime: 0s
01-01 12:53:06.362642 24257 24257 F DEBUG : Cmdline: /vendor/bin/hw/[email protected]_64
01-01 12:53:06.362646 24257 24257 F DEBUG : pid: 1825, tid: 3581, name: SoloTMgr_1 >>> /vendor/bin/hw/[email protected]_64 <<<
01-01 12:53:06.362652 24257 24257 F DEBUG : uid: 1047
01-01 12:53:06.362658 24257 24257 F DEBUG : tagged_addr_ctrl: 0000000000000001 (PR_TAGGED_ADDR_ENABLE)
01-01 12:53:06.362663 24257 24257 F DEBUG : pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
01-01 12:53:06.362669 24257 24257 F DEBUG : signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000000000000000
01-01 12:53:06.362674 24257 24257 F DEBUG : Cause: null pointer dereference
01-01 12:53:06.362678 24257 24257 F DEBUG : Abort message: 'otp info index = 0 :value = 6500, (0.523098, 0.535230), (0.522952, 0.531020) !'
01-01 12:53:06.362689 24257 24257 F DEBUG : x0 b400007bc959f000 x1 0000000000000001 x2 0000000000000000 x3 0000000000000002
01-01 12:53:06.362694 24257 24257 F DEBUG : x4 b400007c25350618 x5 0000000000000000 x6 0000000000000000 x7 00000000ffffffff
01-01 12:53:06.362700 24257 24257 F DEBUG : x8 0000007d92de5000 x9 0000007d91afcdf0 x10 0000000000000dfd x11 0000000000000150
01-01 12:53:06.362705 24257 24257 F DEBUG : x12 000000000000000d x13 0000007c25200000 x14 0000000000000020 x15 0000000000000000
01-01 12:53:06.362710 24257 24257 F DEBUG : x16 0000000000000001 x17 0000007e1792ccac x18 b400007cb2cadc40 x19 0000000000000000
01-01 12:53:06.362715 24257 24257 F DEBUG : x20 0000000000000001 x21 0000007d35f0afd0 x22 0000000000000001 x23 0000007d93a283a8
01-01 12:53:06.362720 24257 24257 F DEBUG : x24 00000000000009a0 x25 b400007bc95bf340 x26 0000007d35e890a0 x27 0000000000000000
01-01 12:53:06.362726 24257 24257 F DEBUG : x28 b400007bc95b7060 x29 0000007d35e52b90
01-01 12:53:06.362731 24257 24257 F DEBUG : lr 0044e67d9203f61c sp 0000007d35e52b70 pc 0000007d92c34a68 pst 0000000080001000
01-01 12:53:06.362737 24257 24257 F DEBUG : 8 total frames
01-01 12:53:06.362741 24257 24257 F DEBUG : backtrace:
01-01 12:53:06.362749 24257 24257 F DEBUG : #00 pc 000000000140aa68 /vendor/lib64/hw/camera.qcom.so (CamX::Node::GetCmdBufferForRequest(unsigned long, CamX::CmdBufferManager*)+48) (BuildId: bf5bd295751ef5290f0cfc3d8c2839df)
01-01 12:53:06.362757 24257 24257 F DEBUG : #01 pc 0000000000815618 /vendor/lib64/hw/camera.qcom.so (CamX::IPENode::ExecuteProcessRequest(CamX::ExecuteProcessRequestData*)+71296) (BuildId: bf5bd295751ef5290f0cfc3d8c2839df)
01-01 12:53:06.362762 24257 24257 F DEBUG : #02 pc 00000000013ee268 /vendor/lib64/hw/camera.qcom.so (CamX::Node::ProcessRequest(CamX::NodeProcessRequestData*, unsigned long)+9720) (BuildId: bf5bd295751ef5290f0cfc3d8c2839df)
01-01 12:53:06.362767 24257 24257 F DEBUG : #03 pc 000000000135ff60 /vendor/lib64/hw/camera.qcom.so (CamX::DeferredRequestQueue::DeferredWorkerWrapper(void*) (.cfi)+680) (BuildId: bf5bd295751ef5290f0cfc3d8c2839df)
01-01 12:53:06.362773 24257 24257 F DEBUG : #04 pc 000000000002d8f4 /vendor/lib64/libcamxcommonutils.so (CamX::ThreadCore::DispatchJob(CamX::RuntimeJob*)+628) (BuildId: b72d1bc00ef4147f5e1d08acf77e8bd9)
01-01 12:53:06.362778 24257 24257 F DEBUG : #05 pc 000000000002e8b4 /vendor/lib64/libcamxcommonutils.so (CamX::ThreadCore::WorkerThreadBody
-
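The claim that the freeze precedes the crash can be checked by subtracting the two timestamps. This is a small sketch, assuming both the logcat lines and the tombstone `Timestamp:` field are in the same local time zone; `parse_ts` is a helper name we made up for the example.

```python
from datetime import datetime


def parse_ts(text):
    """Parse HH:MM:SS.fraction, trimming the fraction to microseconds."""
    # logcat prints microseconds, the tombstone prints nanoseconds.
    clock, frac = text.split(".")
    return datetime.strptime(f"{clock}.{frac[:6]}", "%H:%M:%S.%f")


last_frame = parse_ts("12:53:05.443038")   # last output buffer returned by the HAL
crash = parse_ts("12:53:05.931813851")     # "Timestamp:" field of the tombstone

gap_ms = (crash - last_frame).total_seconds() * 1000
print(f"frames stopped {gap_ms:.3f} ms before the crash")
```

A gap of roughly half a second means about 15 preview frames at 30 fps were already missing when the process died, so the crash is a consequence rather than the start of the problem.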
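Next we look at what the crashing thread (tid 3581) was doing around this buffer-related crash. Nothing in its log stands out as an error.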
Line 131: 01-01 12:53:05.451298 1825 3581 I CamX : [ INFO][CORE ] camxpipeline.cpp:1998 ProcessRequest() Pipeline:MultiCameraCustomSATEIS0_0_cam_2 requestId:608, Tuning mode: default 0, sensor 24, usecase 0, feature1, 23 feature2 0, scene 0, effect 0
Line 133: 01-01 12:53:05.451302 1825 3581 I CamX : [ INFO][CORE ] camxpipeline.cpp:2046 ProcessRequest() Pipeline::MultiCameraCustomSATEIS0_0_cam_2 last submitted request updated to 608 pointer:0xb400007c900ff000
Line 135: 01-01 12:53:05.451334 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence... Node: cb0a0000, Node::MultiCameraCustomSATEIS0_com.arcsoft.node.smooth_transition1_cam2, fence: b4e62020(9), Port: 0, portIndex 0 reqId: 608, ImgBuf: 0x0, portidx/groupID 1 result: 0
Line 137: 01-01 12:53:05.451346 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9595 SetupRequestOutputPortFence() CreatePrivateFence...Node: cb0a0000, Node::MultiCameraCustomSATEIS0_com.arcsoft.node.smooth_transition1_cam2, fence: b4e65820(12), ImgBuf:0x0 reqId:608 result: 0
Line 138: 01-01 12:53:05.451385 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence... Node: 845eb000, Node::MultiCameraCustomSATEIS0_com.qti.node.gpu0_cam2, fence: b4e54020(14), Port: 1, portIndex 0 reqId: 608, ImgBuf: 0x0, portidx/groupID 1 result: 0
Line 139: 01-01 12:53:05.451392 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9595 SetupRequestOutputPortFence() CreatePrivateFence...Node: 845eb000, Node::MultiCameraCustomSATEIS0_com.qti.node.gpu0_cam2, fence: b4e57820(19), ImgBuf:0x0 reqId:608 result: 0
Line 143: 01-01 12:53:05.451786 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence... Node: 845eb000, Node::MultiCameraCustomSATEIS0_com.qti.node.gpu0_cam2, fence: b4e5b020(21), Port: 2, portIndex 1 reqId: 608, ImgBuf: 0x0, portidx/groupID 2 result: 0
Line 144: 01-01 12:53:05.451796 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9595 SetupRequestOutputPortFence() CreatePrivateFence...Node: 845eb000, Node::MultiCameraCustomSATEIS0_com.qti.node.gpu0_cam2, fence: b4e5e820(22), ImgBuf:0x0 reqId:608 result: 0
Line 145: 01-01 12:53:05.451831 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence... Node: bd2a0000, Node::MultiCameraCustomSATEIS0_com.arcsoft.node.eisv23_cam2, fence: b4e70020(24), Port: 0, portIndex 0 reqId: 608, ImgBuf: 0x0, portidx/groupID 1 result: 0
Line 146: 01-01 12:53:05.451838 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9595 SetupRequestOutputPortFence() CreatePrivateFence...Node: bd2a0000, Node::MultiCameraCustomSATEIS0_com.arcsoft.node.eisv23_cam2, fence: b4e73820(26), ImgBuf:0x0 reqId:608 result: 0
Line 147: 01-01 12:53:05.451852 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence... Node: bd2a0000, Node::MultiCameraCustomSATEIS0_com.arcsoft.node.eisv23_cam2, fence: b4e77020(37), Port: 1, portIndex 1 reqId: 608, ImgBuf: 0x0, portidx/groupID 2 result: 0
Line 148: 01-01 12:53:05.451858 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9595 SetupRequestOutputPortFence() CreatePrivateFence...Node: bd2a0000, Node::MultiCameraCustomSATEIS0_com.arcsoft.node.eisv23_cam2, fence: b4e7a820(38), ImgBuf:0x0 reqId:608 result: 0
Line 149: 01-01 12:53:05.451875 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence... Node: 21436000, Node::MultiCameraCustomSATEIS0_EVA10_cam2, fence: b5d87c20(39), Port: 1, portIndex 0 reqId: 608, ImgBuf: 0x0, portidx/groupID 2 result: 0
Line 150: 01-01 12:53:05.451885 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence... Node: 21436000, Node::MultiCameraCustomSATEIS0_EVA10_cam2, fence: b5d89420(39), Port: 2, portIndex 1 reqId: 608, ImgBuf: 0x0, portidx/groupID 2 result: 0
Line 152: 01-01 12:53:05.451902 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence... Node: 2140d000, Node::MultiCameraCustomSATEIS0_IPE10_cam2, fence: b4e80b20(43), Port: 8, portIndex 0 reqId: 608, ImgBuf: 0x0, portidx/groupID 1 result: 0
Line 154: 01-01 12:53:05.451920 1825 3581 I CamX : [ INFO][CORE ] camxnode.cpp:9466 SetupRequestOutputPortFence() CreatePrivateFence
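Pulling out the crashing thread's lines between the last returned frame and the crash is easy to automate. Below is a rough sketch, assuming the standard logcat "threadtime" layout seen in this log; `thread_lines` and `LOGCAT_RE` are names invented for this example, and the sample lines are abbreviated copies of lines quoted in this note.

```python
import re

# date, time-of-day, pid, tid, priority letter
LOGCAT_RE = re.compile(
    r"(\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+)\s+(\d+)\s+([VDIWEF])\s"
)


def thread_lines(lines, pid, tid, start, end):
    """Yield logcat lines of one thread whose time-of-day lies in [start, end].

    Times are compared as strings, which works because logcat prints
    fixed-width HH:MM:SS.ffffff values.
    """
    for line in lines:
        m = LOGCAT_RE.search(line)
        if not m:
            continue
        if int(m.group(3)) == pid and int(m.group(4)) == tid and start <= m.group(2) <= end:
            yield line


sample = [
    "01-01 12:53:04.456509 1825 3582 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 "
    "process_capture_result() frame_number 581",
    "01-01 12:53:05.443033 1825 3581 I CamX : [CONFIG][HAL ] camxhal3.cpp:2075 "
    "process_capture_result() frame_number 605",
    "01-01 12:53:05.451298 1825 3581 I CamX : [ INFO][CORE ] camxpipeline.cpp:1998 "
    "ProcessRequest() Pipeline:MultiCameraCustomSATEIS0_0_cam_2 requestId:608",
]

# pid/tid come from the tombstone; the window runs from the last frame to the crash.
hits = list(thread_lines(sample, 1825, 3581, "12:53:05.443038", "12:53:05.931813"))
# Keep only error (E) and fatal (F) priority lines from that window.
errors = [l for l in hits if LOGCAT_RE.search(l).group(5) in "EF"]
print(len(hits), "lines in window,", len(errors), "with priority E or F")
```

An empty `errors` list matches what we saw by eye: the thread was only setting up fences for request 608 and logged no error before it crashed.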
-
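We then look at the code at the crash site. It is stock Qualcomm code, which normally does not fail here. A null pointer dereference inside GetCmdBufferForRequest therefore suggests the real fault is upstream, for example a command buffer resource that was never set up because an earlier allocation failed.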
CmdBuffer* Node::GetCmdBufferForRequest(
    UINT64            requestId,
    CmdBufferManager* pCmdBufferManager)
{
    CAMX_ASSERT(NULL != pCmdBufferManager);

    PacketResource* pPacketResource = NULL;

    if (CamxResultSuccess == pCmdBufferManager->GetBufferForRequest(GetCSLSyncId(requestId), &pPacketResource))
    {
        CAMX_ASSERT(TRUE == pPacketResource->GetUsageFlags().cmdBuffer);
    }

    // We know pPacketResource actually points to a CmdBuffer so we may static_cast
    return static_cast<CmdBuffer*>(pPacketResource);
}
-
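Since user space shows no obvious error, we check the kernel (KMD) log. It shows that the Qualcomm camera IOVA space ran out: a 10620928-byte mapping in the shared region of the icp context bank failed with rc=-12 (-ENOMEM) while only 9601024 bytes were free. As a result the ISP request stayed in the wait list, the KMD CRM kept reporting requests as not ready on the link, and user space ended up waiting as well.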
<6>[ 6646.900716][T503585] CAM_ERR: CAM-SMMU: cam_smmu_map_buffer_validate: 2169 IOVA alloc failed for shared memory, size=10620928, idx=2, handle=162019
<6>[ 6646.900733][T503585] CAM_ERR: CAM-SMMU: cam_smmu_map_buffer_and_add_to_list: 2310 buffer validation failure
<6>[ 6646.900735][T503585] CAM_ERR: CAM-SMMU: cam_smmu_map_user_iova: 3311 mapping or add list fail cb:icp idx=2, fd=803, region=1, rc=-12
<6>[ 6646.900737][T503585] CAM_ERR: CAM-SMMU: cam_smmu_dump_cb_info: 610 ********** 4:53:5:639 Context bank dump for icp **********
<6>[ 6646.900739][T503585] CAM_ERR: CAM-SMMU: cam_smmu_dump_cb_info: 616 Usage: shared_usage=254640128 io_usage=705449984 shared_free=9601024 io_free=3290673152
<6>[ 6646.900742][T503585] CAM_ERR: CAM-MEM: cam_mem_util_map_hw_va: 880 Failed secured map to smmu, i=0, fd=803, dir=0, mmu_hdl=162019, rc=-12
<6>[ 6646.900744][T503585] CAM_ERR: CAM-MEM: cam_mem_mgr_alloc_and_map: 987 Failed in map_hw_va len=10620928, flags=0x859, fd=803, region=1, num_hdl=1, rc=-12
<6>[ 6646.911541][ C2] CAM_INFO: CAM-ISP: __cam_isp_ctx_check_deferred_buf_done: 2203 Buf done with no active request but with req in wait list, req 615 last apply id:615 last err id:0
<6>[ 6646.935116][T221544] CAM_INFO: CAM-CRM: __cam_req_mgr_find_dev_name: 245 Skip Frame: req: 617 not ready on link: 0x490126 for pd: 2 dev: cam-sensor open_req count: 3
-
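The numbers in the CAM-SMMU lines above are enough to confirm the exhaustion. Here is a minimal sketch that extracts them, assuming the message wording shown in this log (it may differ between kernel versions); `parse_smmu_failure` is a hypothetical helper name.

```python
import re

MIB = 1024 * 1024


def parse_smmu_failure(lines):
    """Collect the failed request size and the context bank usage counters."""
    info = {}
    for line in lines:
        m = re.search(r"IOVA alloc failed for shared memory, size=(\d+)", line)
        if m:
            info["requested"] = int(m.group(1))
        m = re.search(
            r"shared_usage=(\d+) io_usage=(\d+) shared_free=(\d+) io_free=(\d+)", line
        )
        if m:
            keys = ("shared_usage", "io_usage", "shared_free", "io_free")
            info.update(zip(keys, map(int, m.groups())))
    return info


kmd = [
    "CAM_ERR: CAM-SMMU: cam_smmu_map_buffer_validate: 2169 IOVA alloc failed "
    "for shared memory, size=10620928, idx=2, handle=162019",
    "CAM_ERR: CAM-SMMU: cam_smmu_dump_cb_info: 616 Usage: shared_usage=254640128 "
    "io_usage=705449984 shared_free=9601024 io_free=3290673152",
]

info = parse_smmu_failure(kmd)
total = info["shared_usage"] + info["shared_free"]

print(f"requested {info['requested'] / MIB:.2f} MiB, free {info['shared_free'] / MIB:.2f} MiB")
print("request exceeds free space:", info["requested"] > info["shared_free"])
print(f"shared region size at failure: {total // MIB} MiB ({hex(total)})")
```

Usage plus free space adds up to 0xfc00000 bytes (252 MiB), which gives the size of the shared region before the fix. Note that an allocation can also fail while the free total is still larger than the request if the free space is fragmented; here the total itself was already too small.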
Analysis shows that the size of the Qualcomm IOVA region is fixed in the kernel DTS. We increased it, and the retest passed.
dma-coherent;
icp_iova_mem_map: iova-mem-map {
    iova-mem-region-shared {
        /* Shared region is ~350MB long */
        iova-region-name = "shared";
        iova-region-start = <0x800000>;
        iova-region-len = <0x16000000>; /* increasing this length is enough */
        iova-region-id = <0x1>;
        status = "ok";
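A quick sanity check on the new length helps before flashing a build. The sketch below only does arithmetic on values quoted in this note (the DTS cells and the counters from the kernel log); treating "usage at failure plus the failed request" as the minimum requirement is our own assumption, not a Qualcomm sizing rule.

```python
MIB = 1 << 20

region_start = 0x800000     # iova-region-start from the DTS fragment
new_len = 0x16000000        # iova-region-len after the change

shared_usage = 254640128    # shared_usage reported when the mapping failed
failed_request = 10620928   # size of the mapping that failed

needed = shared_usage + failed_request
headroom = new_len - needed
region_end = region_start + new_len

print(f"new shared region: {new_len // MIB} MiB")
print(f"needed at failure: {needed / MIB:.2f} MiB")
print(f"headroom:          {headroom / MIB:.2f} MiB")
print(f"region ends at:    {hex(region_end)}")
```

The new length of 352 MiB leaves roughly 99 MiB above what the failing scenario needed. The region end address should also be compared against the start of the next IOVA region in the same context bank so that the two do not overlap; that layout is not shown in the fragment above, so it has to be checked in the actual DTS.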