Notes on a StarRocks BE crash caused by abnormal memory growth

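BE memory suddenly spiked until the system killed the process. The BE memory metrics are shown below: starrocks_be_update_mem_bytes is maxed out, and restarting the BE gives the same result.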

```bash
[root@localhost bin]# curl -XGET -s http://192.168.1.49:8040/metrics | grep "^starrocks_be_.*_mem_bytes\|^starrocks_be_tcmalloc_bytes_in_use"
starrocks_be_bitmap_index_mem_bytes 0
starrocks_be_bloom_filter_index_mem_bytes 0
starrocks_be_chunk_allocator_mem_bytes 0
starrocks_be_clone_mem_bytes 0
starrocks_be_column_metadata_mem_bytes 5185856
starrocks_be_column_pool_mem_bytes 0
starrocks_be_column_zonemap_index_mem_bytes 127232
starrocks_be_compaction_mem_bytes 1550597312
starrocks_be_consistency_mem_bytes 0
starrocks_be_datacache_mem_bytes 0
starrocks_be_load_mem_bytes 0
starrocks_be_metadata_mem_bytes 172205561
starrocks_be_ordinal_index_mem_bytes 4896744
starrocks_be_process_mem_bytes 59815309344
starrocks_be_query_mem_bytes 0
starrocks_be_rowset_metadata_mem_bytes 66151306
starrocks_be_schema_change_mem_bytes 0
starrocks_be_segment_metadata_mem_bytes 96028
starrocks_be_segment_zonemap_mem_bytes 72196
starrocks_be_short_key_index_mem_bytes 0
starrocks_be_storage_page_cache_mem_bytes 0
starrocks_be_tablet_metadata_mem_bytes 100772371
starrocks_be_tablet_schema_mem_bytes 1618363
starrocks_be_update_mem_bytes 40682742067
```
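Raw byte counts are hard to compare at a glance. The sketch below converts each metric to GiB and sorts largest first; mem_gib is a hypothetical helper name, and it assumes the unlabeled "name value" lines shown above. On this output it puts starrocks_be_update_mem_bytes at about 37.89 GiB of a process total of about 55.71 GiB.

```bash
# Hypothetical helper: read "name value" metric lines on stdin,
# print each value in GiB, largest first. Non-numeric lines are skipped.
mem_gib() {
  awk '$2 ~ /^[0-9]+$/ { printf "%s %.2f GiB\n", $1, $2 / 1073741824 }' | sort -k2,2 -nr
}

# Usage against the BE from the curl command above:
# curl -s http://192.168.1.49:8040/metrics | grep "^starrocks_be_.*_mem_bytes" | mem_gib
```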

dmesg -T shows the kernel OOM killer killing the process, again each time the BE came back up:

```bash
# dmesg -T | grep starrocks

[Thu May 29 12:07:24 2025] Killed process 28647 (starrocks_be), UID 0, total-vm:170796752kB, anon-rss:67733148kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 12:28:55 2025] [31816]     0 31816 43159618 16927419   53727        0             0 starrocks_be
[Thu May 29 12:28:55 2025] Out of memory: Kill process 31816 (starrocks_be) score 724 or sacrifice child
[Thu May 29 12:28:55 2025] Killed process 31816 (starrocks_be), UID 0, total-vm:172638472kB, anon-rss:67709676kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 12:55:49 2025] [ 2682]     0  2682 53296564 16972830   63852        0             0 starrocks_be
[Thu May 29 12:55:49 2025] Out of memory: Kill process 2682 (starrocks_be) score 727 or sacrifice child
[Thu May 29 12:55:49 2025] Killed process 2682 (starrocks_be), UID 0, total-vm:213186256kB, anon-rss:67891320kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 13:09:03 2025] [ 4756]     0  4756 52227527 17808095   67753   667099             0 starrocks_be
[Thu May 29 13:09:03 2025] Out of memory: Kill process 4756 (starrocks_be) score 791 or sacrifice child
[Thu May 29 13:09:03 2025] Killed process 4756 (starrocks_be), UID 0, total-vm:208910108kB, anon-rss:71232380kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 13:21:18 2025] [ 8048]     0  8048 55023047 18406542   63982        0             0 starrocks_be
[Thu May 29 13:21:18 2025] Out of memory: Kill process 8048 (starrocks_be) score 788 or sacrifice child
[Thu May 29 13:21:18 2025] Killed process 8048 (starrocks_be), UID 0, total-vm:220092188kB, anon-rss:73626168kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 13:39:41 2025] [10765]     0 10765 62032082 18145670   79366   299756             0 starrocks_be
[Thu May 29 13:39:41 2025] Out of memory: Kill process 10765 (starrocks_be) score 790 or sacrifice child
[Thu May 29 13:39:41 2025] Killed process 10765 (starrocks_be), UID 0, total-vm:248128328kB, anon-rss:72
```
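To see how large the process grew before each kill, this small sketch (oom_rss is a hypothetical name) extracts the PID and anon-rss from the "Killed process" lines and converts kB to GiB. For the complete lines above it reports roughly 64.6 to 70.2 GiB per kill; the last line is cut off before its unit and is skipped.

```bash
# Hypothetical helper: turn kernel OOM "Killed process" lines for starrocks_be
# into "<pid> <anon-rss in GiB>". Lines without a complete "anon-rss:<n>kB" are skipped.
oom_rss() {
  sed -n 's/.*Killed process \([0-9]*\) (starrocks_be).*anon-rss:\([0-9]*\)kB.*/\1 \2/p' |
    awk '{ printf "%s %.1f GiB\n", $1, $2 / 1048576 }'
}

# Usage: dmesg -T | oom_rss
```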

be.INFO keeps reporting "Memory of process exceed limit. Start execute plan f Used: 83343295392, Limit: 61847529062. Mem usage has exceed the limit of BE", i.e. about 77.6 GiB used against a 57.6 GiB limit:
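The Limit value also accounts for the 34.56 GB ceiling printed in the update_compaction_state lines later in the log. A worked sketch, assuming the BE parameter update_memory_limit_percent is at what I believe is its default of 60: 60% of 61847529062 bytes is 37108517437 bytes, or 34.56 GiB. The same arithmetic suggests the log's "GB" are really GiB, since starrocks_be_update_mem_bytes 40682742067 / 2^30 ≈ 37.89, matching "total:37.89 GB".

```bash
# Worked arithmetic: derive the update (primary key) memory ceiling from the process limit.
# Assumption: update_memory_limit_percent = 60 (believed default; not shown in the logs).
process_limit=61847529062   # "Limit:" from the be.INFO message
update_pct=60
update_limit=$(( process_limit * update_pct / 100 ))
echo "update limit: ${update_limit} bytes"
awk -v b="$update_limit" 'BEGIN { printf "update limit: %.2f GiB\n", b / 1073741824 }'
```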

```bash
I0529 10:44:04.416954 10989 starrocks_be.cpp:231] BE start step 11: start brpc server successfully
I0529 10:44:04.423513 10989 starrocks_be.cpp:240] BE start step 12: start http server successfully
I0529 10:44:04.423936 10989 thrift_server.cpp:380] heartbeat has started listening port on 9050
I0529 10:44:04.423982 10989 starrocks_be.cpp:259] BE start step 13: start heartbeat server successfully
I0529 10:44:04.423985 10989 starrocks_be.cpp:261] BE started successfully
I0529 10:44:04.545176 11741 tablet_manager.cpp:816] Found the best tablet to compact. compaction_type=update tablet_id=3544752 highest_score=655
I0529 10:44:04.545372 11741 tablet_updates.cpp:2725] update compaction start tablet:3544752 version:11 score:17605201920 merge levels:3 pick:3/valid:3/all:4 248,282,283 #pick_segments:68 #valid_segments:68 #rows:119341438->119341434 bytes:106.37 MB->106.37 MB(estimate)
I0529 10:44:06.347834 11837 heartbeat_server.cpp:77] get heartbeat from FE.host:192.168.1.49, port:9020, cluster id:274557974, run_mode:SHARED_NOTHING, counter:1
I0529 10:44:06.347885 11837 heartbeat_server.cpp:99] Updating master info: TMasterInfo(network_address=TNetworkAddress(hostname=192.168.1.49, port=9020), cluster_id=274557974, epoch=29, token=8400b357-a521-425d-a338-3c5e7deea427, backend_ip=192.168.1.49, http_port=8030, heartbeat_flags=0, backend_id=10006, min_active_txn_id=395207, run_mode=SHARED_NOTHING)
I0529 10:44:06.347919 11837 heartbeat_server.cpp:104] Master FE is changed or restarted. report tablet and disk info immediately
W0529 10:44:06.406687 11097 mem_hook.cpp:249] large memory alloc, query_id:00000000-0000-0000-0000-000000000000 instance: 00000000-0000-0000-0000-000000000000 acquire:1828867984 bytes, stack:
    @          0x2dbffed  malloc
    @          0x8b3a0b5  operator new()
    @          0x505ab4d  std::vector<>::_M_range_insert<>()
    @          0x505c676  starrocks::PrimaryKeyEncoder::encode()
    @          0x55fc5a5  starrocks::CompactionState::_load_segments()
    @          0x55fd42b  starrocks::CompactionState::_do_load()
    @          0x55fd4d5  _ZZSt9call_onceIZN9starrocks15CompactionState4loadEPNS0_6RowsetEEUlvE_JEEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv
    @     0x2abe0386020b  __pthread_once_slow
    @          0x55fb788  starrocks::CompactionState::load()
    @          0x5137ad5  starrocks::TabletUpdates::_apply_compaction_commit()
    @          0x513ef25  starrocks::TabletUpdates::do_apply()
    @          0x2e79fdd  starrocks::ThreadPool::dispatch_thread()
    @          0x2e739fa  starrocks::Thread::supervise_thread()
    @     0x2abe03861ea5  start_thread
    @     0x2abe0449cb0d  __clone
    @              (nil)  (unknown)
...
E0529 10:39:18.652560  8960 update_compaction_state.cpp:129]  memory limit exceeded when loading compaction state pk tablet_id:3544754 rowset #rows:201887404 size:537369140 seg:0/1 #rows:201887404 memory:20095316174 stats:index:510.85 MB rowset:0 compaction:37.39 GB delvec:8.00 B dcg:0 total:37.89 GB/34.56 GB
W0529 10:39:18.652825  8960 mem_hook.cpp:249] large memory alloc, query_id:00000000-0000-0000-0000-000000000000 instance: 00000000-0000-0000-0000-000000000000 acquire:1615099232 bytes, stack:
    @          0x2dbffed  malloc
    @          0x8b3a0b5  operator new()
    @          0x5034ee6  std::vector<>::reserve()
    @          0x502336c  starrocks::PrimaryIndex::_replace_persistent_index()
    @          0x502354e  starrocks::PrimaryIndex::try_replace()
    @          0x513838c  starrocks::TabletUpdates::_apply_compaction_commit()
    @          0x513ef25  starrocks::TabletUpdates::do_apply()
    @          0x2e79fdd  starrocks::ThreadPool::dispatch_thread()
    @          0x2e739fa  starrocks::Thread::supervise_thread()
    @     0x2b2a272baea5  start_thread
    @     0x2b2a27ef5b0d  __clone
    @              (nil)  (unknown)
```
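The update_compaction_state lines carry the decisive numbers. This minimal sketch (summarize_compaction_state is a hypothetical name) extracts the tablet ID, the compaction state size, and the total against the limit, and flags tablets that are over. It assumes sizes are printed in GB, as in this excerpt; other units are not handled. For the E-level line above it flags tablet 3544754 with a total of 37.89 GB against a 34.56 GB limit.

```bash
# Hypothetical helper: summarize update_compaction_state log lines read from stdin.
# Prints one deduplicated line per tablet/size combination, marked OVER when total > limit.
summarize_compaction_state() {
  sed -n 's/.*tablet_id:\([0-9]*\) .*compaction:\([0-9.]*\) GB .*total:\([0-9.]*\) GB\/\([0-9.]*\) GB.*/\1 \2 \3 \4/p' |
    awk '{
      flag = ($3 > $4) ? "OVER" : "ok"
      printf "tablet=%s compaction=%sGB total=%sGB limit=%sGB %s\n", $1, $2, $3, $4, flag
    }' |
    sort -u
}

# Usage: grep update_compaction_state be.INFO | summarize_compaction_state
```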

Every time the BE restarts, it picks up tablet 3544744 again (load persistent index, then update_compaction):

```bash
I0529 12:45:48.295147  2709 daemon.cpp:197] Current memory statistics: process(1433574152), query_pool(0), load(0), metadata(168087184), compaction(116601792), schema_change(0), column_pool(0), page_cache(0), update(8), chunk_allocator(0), clone(0), consistency(0), datacache(0)
I0529 12:45:49.596513  2799 persistent_index.cpp:4975] load persistent index tablet:3544744 version:11 size: 225867285 l0_size: 0 l0_capacity:0 #shard: 2233 l1_size:23864293 l2_size:4437070901 memory: 261692378 status: OK time:23875ms
...
I0529 12:46:23.093927  2799 update_compaction_state.cpp:137]  loading large compaction state tablet_id:3544744 rowset #rows:225867285 size:661735103 seg:0/1 #rows:225867285 memory:20051758160 stats:index:510.85 MB rowset:0 compaction:18.67 GB delvec:8.00 B dcg:0 total:19.17 GB/34.56 GB
...
E0529 12:46:27.941511  2800 update_compaction_state.cpp:129]  memory limit exceeded when loading compaction state pk tablet_id:3544754 rowset #rows:201887404 size:537369140 seg:0/1 #rows:201887404 memory:20095316174 stats:index:510.85 MB rowset:0 compaction:37.39 GB delvec:8.00 B dcg:0 total:37.89 GB/34.56 GB
I0529 12:46:27.941589  2800 update_compaction_state.cpp:137]  loading large compaction state tablet_id:3544754 rowset #rows:201887404 size:537369140 seg:0/1 #rows:201887404 memory:20095316174 stats:index:510.85 MB rowset:0 compaction:37.39 GB delvec:8.00 B dcg:0 total:37.89 GB/34.56 GB
```
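To check whether update memory climbs back toward the limit after each restart, one can follow a single category of the periodic "Current memory statistics" line over time. A minimal sketch: mem_stat is a hypothetical helper, and it assumes the glog line prefix and the key(value) layout shown above.

```bash
# Hypothetical helper: print "<time> <bytes>" for one category (e.g. update, compaction,
# process) from every "Current memory statistics" line read on stdin.
mem_stat() {
  key="$1"
  sed -n "s/^[IWEF][0-9]\{4\} \([0-9:.]*\) .*Current memory statistics:.*[ ,]${key}(\([0-9]*\)).*/\1 \2/p"
}

# Usage: mem_stat update < be.INFO
```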
Resolution

Reference: https://forum.mirrorship.cn/t/topic/5086/2

/data/app/sr/be/lib/starrocks_be: error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory

Fix: add the directory containing libjvm.so to LD_LIBRARY_PATH, the system library search path.
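A minimal sketch of pointing LD_LIBRARY_PATH at libjvm.so, assuming JAVA_HOME is set to the JDK the BE uses; find_libjvm_dir is a hypothetical helper. It checks the two usual locations (lib/server for JDK 11 and later, jre/lib/amd64/server for JDK 8 on x86_64) and prepends the first match. The export only affects the current shell, so meta_tool.sh has to be run from that same shell.

```bash
# Hypothetical helper: print the directory under a JDK home that contains libjvm.so.
# Checks the JDK 11+ layout first, then the JDK 8 (x86_64) layout.
find_libjvm_dir() {
  for d in "$1/lib/server" "$1/jre/lib/amd64/server"; do
    if [ -f "$d/libjvm.so" ]; then
      echo "$d"
      return 0
    fi
  done
  return 1
}

# Assumption: JAVA_HOME points at the BE's JDK. Prepend without leaving a stray ":".
if jvm_dir="$(find_libjvm_dir "${JAVA_HOME:-}")"; then
  export LD_LIBRARY_PATH="${jvm_dir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
```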

Delete the problematic tablet's metadata
```bash
[root@localhost bin]# ./meta_tool.sh --operation=delete_persistent_index_meta --root_path=/data/dbdata --tablet_id=3544754
------------------------------------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0529 15:24:45.140825 30221 data_dir.cpp:135] path: /data/dbdata, hash: 1903728691121462593
delete tablet persistent index meta success, tablet_id: 3544754

[root@localhost bin]# ./meta_tool.sh --operation=delete_meta --root_path=/data/dbdata --tablet_id=3544754
```
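The two meta_tool.sh calls above can be wrapped so they always run in the same order for a given tablet. This is only a sketch: purge_tablet_meta and its DRY_RUN switch are hypothetical, the meta_tool.sh options are exactly the ones used above, it must be run from the BE bin directory like the commands above, and the pgrep guard assumes meta_tool.sh should only be run with the BE stopped.

```bash
# Hypothetical helper: run delete_persistent_index_meta, then delete_meta, for one tablet.
# Set DRY_RUN=1 to print the commands instead of executing them.
purge_tablet_meta() {
  root_path="$1"
  tablet_id="$2"
  run=""
  if [ "${DRY_RUN:-0}" = 1 ]; then
    run="echo"
  fi
  # Assumption: meta_tool.sh needs the BE stopped, so refuse while starrocks_be is alive.
  if [ -z "$run" ] && pgrep -x starrocks_be >/dev/null; then
    echo "starrocks_be is still running; stop the BE first" >&2
    return 1
  fi
  for op in delete_persistent_index_meta delete_meta; do
    $run ./meta_tool.sh --operation="$op" --root_path="$root_path" --tablet_id="$tablet_id" || return 1
  done
}

# Usage: DRY_RUN=1 purge_tablet_meta /data/dbdata 3544754
```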

Related:

Compaction keeps running even though no data is being loaded

Starrocks-BE v3.2.3 drives disk I/O to 100% every night, with a huge performance cost
