Notes on a StarRocks BE crash caused by abnormal memory usage

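BE memory suddenly spiked until the OS killed the process. The BE memory metrics are shown below: starrocks_be_update_mem_bytes is saturated, and the same thing happens after every restart.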

[root@localhost bin]# curl -XGET -s http://192.168.1.49:8040/metrics | grep "^starrocks_be_.*_mem_bytes\|^starrocks_be_tcmalloc_bytes_in_use"
starrocks_be_bitmap_index_mem_bytes 0
starrocks_be_bloom_filter_index_mem_bytes 0
starrocks_be_chunk_allocator_mem_bytes 0
starrocks_be_clone_mem_bytes 0
starrocks_be_column_metadata_mem_bytes 5185856
starrocks_be_column_pool_mem_bytes 0
starrocks_be_column_zonemap_index_mem_bytes 127232
starrocks_be_compaction_mem_bytes 1550597312
starrocks_be_consistency_mem_bytes 0
starrocks_be_datacache_mem_bytes 0
starrocks_be_load_mem_bytes 0
starrocks_be_metadata_mem_bytes 172205561
starrocks_be_ordinal_index_mem_bytes 4896744
starrocks_be_process_mem_bytes 59815309344
starrocks_be_query_mem_bytes 0
starrocks_be_rowset_metadata_mem_bytes 66151306
starrocks_be_schema_change_mem_bytes 0
starrocks_be_segment_metadata_mem_bytes 96028
starrocks_be_segment_zonemap_mem_bytes 72196
starrocks_be_short_key_index_mem_bytes 0
starrocks_be_storage_page_cache_mem_bytes 0
starrocks_be_tablet_metadata_mem_bytes 100772371
starrocks_be_tablet_schema_mem_bytes 1618363
starrocks_be_update_mem_bytes 40682742067
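
To see at a glance which memory tracker dominates, the metrics output can be ranked by size. This is a minimal sketch, assuming the plain `name value` line format shown above; `rank_mem_metrics` is a hypothetical helper name, not part of StarRocks.

```shell
# rank_mem_metrics: hypothetical helper, not part of StarRocks.
# Reads "metric_name value" lines on stdin and prints every
# starrocks_be_*_mem_bytes metric with its value converted to GiB,
# largest first. Lines in any other format are ignored.
rank_mem_metrics() {
  LC_ALL=C awk '$1 ~ /^starrocks_be_.*_mem_bytes$/ {
    printf "%s %.2f\n", $1, $2 / (1024 * 1024 * 1024)
  }' | LC_ALL=C sort -k2,2nr
}

# Usage (host and port taken from the command above):
#   curl -s http://192.168.1.49:8040/metrics | rank_mem_metrics | head -n 5
```

With the values above, starrocks_be_update_mem_bytes comes out at roughly 37.9 GiB, second only to the process total.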

dmesg -T shows that the process was killed by the OOM killer:

# dmesg -T | grep starrocks

[Thu May 29 12:07:24 2025] Killed process 28647 (starrocks_be), UID 0, total-vm:170796752kB, anon-rss:67733148kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 12:28:55 2025] [31816]     0 31816 43159618 16927419   53727        0             0 starrocks_be
[Thu May 29 12:28:55 2025] Out of memory: Kill process 31816 (starrocks_be) score 724 or sacrifice child
[Thu May 29 12:28:55 2025] Killed process 31816 (starrocks_be), UID 0, total-vm:172638472kB, anon-rss:67709676kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 12:55:49 2025] [ 2682]     0  2682 53296564 16972830   63852        0             0 starrocks_be
[Thu May 29 12:55:49 2025] Out of memory: Kill process 2682 (starrocks_be) score 727 or sacrifice child
[Thu May 29 12:55:49 2025] Killed process 2682 (starrocks_be), UID 0, total-vm:213186256kB, anon-rss:67891320kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 13:09:03 2025] [ 4756]     0  4756 52227527 17808095   67753   667099             0 starrocks_be
[Thu May 29 13:09:03 2025] Out of memory: Kill process 4756 (starrocks_be) score 791 or sacrifice child
[Thu May 29 13:09:03 2025] Killed process 4756 (starrocks_be), UID 0, total-vm:208910108kB, anon-rss:71232380kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 13:21:18 2025] [ 8048]     0  8048 55023047 18406542   63982        0             0 starrocks_be
[Thu May 29 13:21:18 2025] Out of memory: Kill process 8048 (starrocks_be) score 788 or sacrifice child
[Thu May 29 13:21:18 2025] Killed process 8048 (starrocks_be), UID 0, total-vm:220092188kB, anon-rss:73626168kB, file-rss:0kB, shmem-rss:0kB
[Thu May 29 13:39:41 2025] [10765]     0 10765 62032082 18145670   79366   299756             0 starrocks_be
[Thu May 29 13:39:41 2025] Out of memory: Kill process 10765 (starrocks_be) score 790 or sacrifice child
[Thu May 29 13:39:41 2025] Killed process 10765 (starrocks_be), UID 0, total-vm:248128328kB, anon-rss:72 
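
The repeated kills are easier to compare once each one is reduced to a single line. The sketch below assumes the `Killed process ... anon-rss:<n>kB` format shown above; `oom_kills` is a hypothetical helper name. Lines where the anon-rss value is cut off, like the last one above, are skipped rather than guessed.

```shell
# oom_kills: hypothetical helper for summarising `dmesg -T` output.
# For every "Killed process" line on stdin it prints the timestamp,
# the pid, and the anonymous RSS converted from kB to GiB.
oom_kills() {
  LC_ALL=C awk '/Killed process/ {
    ts = $0; sub(/\].*/, "", ts); sub(/^\[/, "", ts)
    pid = $0; sub(/.*Killed process /, "", pid); sub(/ .*/, "", pid)
    # "anon-rss:" is 9 characters and "kB" is 2, hence the offsets.
    if (match($0, /anon-rss:[0-9]+kB/)) {
      kb = substr($0, RSTART + 9, RLENGTH - 11)
      printf "%s pid=%s anon_rss=%.1fGiB\n", ts, pid, kb / 1048576
    }
  }'
}

# Usage:
#   dmesg -T | grep starrocks | oom_kills
```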

be.INFO keeps logging: Memory of process exceed limit. Start execute plan f Used: 83343295392, Limit: 61847529062. Mem usage has exceed the limit of BE

I0529 10:44:04.416954 10989 starrocks_be.cpp:231] BE start step 11: start brpc server successfully
I0529 10:44:04.423513 10989 starrocks_be.cpp:240] BE start step 12: start http server successfully
I0529 10:44:04.423936 10989 thrift_server.cpp:380] heartbeat has started listening port on 9050
I0529 10:44:04.423982 10989 starrocks_be.cpp:259] BE start step 13: start heartbeat server successfully
I0529 10:44:04.423985 10989 starrocks_be.cpp:261] BE started successfully
I0529 10:44:04.545176 11741 tablet_manager.cpp:816] Found the best tablet to compact. compaction_type=update tablet_id=3544752 highest_score=655
I0529 10:44:04.545372 11741 tablet_updates.cpp:2725] update compaction start tablet:3544752 version:11 score:17605201920 merge levels:3 pick:3/valid:3/all:4 248,282,283 #pick_segments:68 #valid_segments:68 #rows:119341438->119341434 bytes:106.37 MB->106.37 MB(estimate)
I0529 10:44:06.347834 11837 heartbeat_server.cpp:77] get heartbeat from FE.host:192.168.1.49, port:9020, cluster id:274557974, run_mode:SHARED_NOTHING, counter:1
I0529 10:44:06.347885 11837 heartbeat_server.cpp:99] Updating master info: TMasterInfo(network_address=TNetworkAddress(hostname=192.168.1.49, port=9020), cluster_id=274557974, epoch=29, token=8400b357-a521-425d-a338-3c5e7deea427, backend_ip=192.168.1.49, http_port=8030, heartbeat_flags=0, backend_id=10006, min_active_txn_id=395207, run_mode=SHARED_NOTHING)
I0529 10:44:06.347919 11837 heartbeat_server.cpp:104] Master FE is changed or restarted. report tablet and disk info immediately
W0529 10:44:06.406687 11097 mem_hook.cpp:249] large memory alloc, query_id:00000000-0000-0000-0000-000000000000 instance: 00000000-0000-0000-0000-000000000000 acquire:1828867984 bytes, stack:
    @          0x2dbffed  malloc
    @          0x8b3a0b5  operator new()
    @          0x505ab4d  std::vector<>::_M_range_insert<>()
    @          0x505c676  starrocks::PrimaryKeyEncoder::encode()
    @          0x55fc5a5  starrocks::CompactionState::_load_segments()
    @          0x55fd42b  starrocks::CompactionState::_do_load()
    @          0x55fd4d5  _ZZSt9call_onceIZN9starrocks15CompactionState4loadEPNS0_6RowsetEEUlvE_JEEvRSt9once_flagOT_DpOT0_ENUlvE0_4_FUNEv
    @     0x2abe0386020b  __pthread_once_slow
    @          0x55fb788  starrocks::CompactionState::load()
    @          0x5137ad5  starrocks::TabletUpdates::_apply_compaction_commit()
    @          0x513ef25  starrocks::TabletUpdates::do_apply()
    @          0x2e79fdd  starrocks::ThreadPool::dispatch_thread()
    @          0x2e739fa  starrocks::Thread::supervise_thread()
    @     0x2abe03861ea5  start_thread
    @     0x2abe0449cb0d  __clone
    @              (nil)  (unknown)
...
E0529 10:39:18.652560  8960 update_compaction_state.cpp:129]  memory limit exceeded when loading compaction state pk tablet_id:3544754 rowset #rows:201887404 size:537369140 seg:0/1 #rows:201887404 memory:20095316174 stats:index:510.85 MB rowset:0 compaction:37.39 GB delvec:8.00 B dcg:0 total:37.89 GB/34.56 GB
W0529 10:39:18.652825  8960 mem_hook.cpp:249] large memory alloc, query_id:00000000-0000-0000-0000-000000000000 instance: 00000000-0000-0000-0000-000000000000 acquire:1615099232 bytes, stack:
    @          0x2dbffed  malloc
    @          0x8b3a0b5  operator new()
    @          0x5034ee6  std::vector<>::reserve()
    @          0x502336c  starrocks::PrimaryIndex::_replace_persistent_index()
    @          0x502354e  starrocks::PrimaryIndex::try_replace()
    @          0x513838c  starrocks::TabletUpdates::_apply_compaction_commit()
    @          0x513ef25  starrocks::TabletUpdates::do_apply()
    @          0x2e79fdd  starrocks::ThreadPool::dispatch_thread()
    @          0x2e739fa  starrocks::Thread::supervise_thread()
    @     0x2b2a272baea5  start_thread
    @     0x2b2a27ef5b0d  __clone
    @              (nil)  (unknown)

On every restart, the BE brings tablet 3544744 back up: it loads the persistent index and then runs update compaction again:

I0529 12:45:48.295147  2709 daemon.cpp:197] Current memory statistics: process(1433574152), query_pool(0), load(0), metadata(168087184), compaction(116601792), schema_change(0), column_pool(0), page_cache(0), update(8), chunk_allocator(0), clone(0), consistency(0), datacache(0)
I0529 12:45:49.596513  2799 persistent_index.cpp:4975] load persistent index tablet:3544744 version:11 size: 225867285 l0_size: 0 l0_capacity:0 #shard: 2233 l1_size:23864293 l2_size:4437070901 memory: 261692378 status: OK time:23875ms
...
I0529 12:46:23.093927  2799 update_compaction_state.cpp:137]  loading large compaction state tablet_id:3544744 rowset #rows:225867285 size:661735103 seg:0/1 #rows:225867285 memory:20051758160 stats:index:510.85 MB rowset:0 compaction:18.67 GB delvec:8.00 B dcg:0 total:19.17 GB/34.56 GB
...
E0529 12:46:27.941511  2800 update_compaction_state.cpp:129]  memory limit exceeded when loading compaction state pk tablet_id:3544754 rowset #rows:201887404 size:537369140 seg:0/1 #rows:201887404 memory:20095316174 stats:index:510.85 MB rowset:0 compaction:37.39 GB delvec:8.00 B dcg:0 total:37.89 GB/34.56 GB
I0529 12:46:27.941589  2800 update_compaction_state.cpp:137]  loading large compaction state tablet_id:3544754 rowset #rows:201887404 size:537369140 seg:0/1 #rows:201887404 memory:20095316174 stats:index:510.85 MB rowset:0 compaction:37.39 GB delvec:8.00 B dcg:0 total:37.89 GB/34.56 GB
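
To identify which tablets keep hitting the limit, the error lines can be counted per tablet. This is a sketch that assumes the `memory limit exceeded when loading compaction state ... tablet_id:<id>` wording shown in the log above; `tablets_over_limit` is a hypothetical helper name.

```shell
# tablets_over_limit: hypothetical helper for scanning be.INFO.
# Counts "memory limit exceeded when loading compaction state" lines
# per tablet id and prints "tablet_id count", most frequent first.
tablets_over_limit() {
  LC_ALL=C awk '
    /memory limit exceeded when loading compaction state/ &&
    match($0, /tablet_id:[0-9]+/) {
      # "tablet_id:" is 10 characters long.
      count[substr($0, RSTART + 10, RLENGTH - 10)]++
    }
    END { for (id in count) print id, count[id] }
  ' | LC_ALL=C sort -k2,2nr -k1,1n
}

# Usage (the log path is an assumption; adjust to the BE log directory):
#   tablets_over_limit < be.INFO
```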
Resolution

Reference: https://forum.mirrorship.cn/t/topic/5086/2

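If starrocks_be fails with the error below, the dynamic linker cannot find libjvm.so:

/data/app/sr/be/lib/starrocks_be: error while loading shared libraries: libjvm.so: cannot open shared object file: No such file or directory

Add the directory that contains libjvm.so to LD_LIBRARY_PATH, the system library search path.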
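
One way to set LD_LIBRARY_PATH is to locate libjvm.so under the JDK and prepend its directory. This is a minimal sketch, assuming `JAVA_HOME` points at the JDK used by the BE; `find_libjvm_dir` is a hypothetical helper name, and the location of libjvm.so varies between JDK versions.

```shell
# find_libjvm_dir: hypothetical helper.
# Prints the first directory under the given root that contains
# libjvm.so; returns non-zero if none is found.
find_libjvm_dir() {
  so=$(find "$1" -name libjvm.so 2>/dev/null | head -n 1)
  [ -n "$so" ] && dirname "$so"
}

# Usage (JAVA_HOME is an assumption about the environment):
#   dir=$(find_libjvm_dir "$JAVA_HOME") &&
#     export LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```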

Delete the metadata of the problem tablet:
[root@localhost bin]# ./meta_tool.sh --operation=delete_persistent_index_meta --root_path=/data/dbdata --tablet_id=3544754
------------------------------------------
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0529 15:24:45.140825 30221 data_dir.cpp:135] path: /data/dbdata, hash: 1903728691121462593
delete tablet persistent index meta success, tablet_id: 3544754

[root@localhost bin]# ./meta_tool.sh --operation=delete_meta --root_path=/data/dbdata --tablet_id=3544754
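
Because both operations are destructive, it can help to print the commands for review before running anything. The sketch below only reuses the two `--operation` values and the flags from the session above; `print_tablet_cleanup` is a hypothetical helper name, and it prints the commands without executing them.

```shell
# print_tablet_cleanup: hypothetical helper, not part of StarRocks.
# Prints (does not run) the two meta_tool.sh commands used above for
# each tablet id, so they can be checked before anything is deleted.
print_tablet_cleanup() {
  root_path=$1
  shift
  for tablet_id in "$@"; do
    for op in delete_persistent_index_meta delete_meta; do
      printf './meta_tool.sh --operation=%s --root_path=%s --tablet_id=%s\n' \
        "$op" "$root_path" "$tablet_id"
    done
  done
}

# Usage (values from the session above):
#   print_tablet_cleanup /data/dbdata 3544754
```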

Related:

Compaction keeps running even though no data is being loaded

StarRocks BE v3.2.3 drives disk I/O to 100% every night, with a severe performance cost
