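StarRocks Cluster Installation and Deployment Guide (CentOS)
📋 Pre-deployment Preparation
1. System Requirements
Hardware requirements
- FE node: at least 4 cores and 8 GB of memory
- BE node: at least 8 cores and 16 GB of memory (16 cores and 32 GB recommended)
- Disk: SSD recommended, with at least 100 GB of free space
- Network: 10 Gigabit Ethernet, with inter-node latency below 1 ms
Software requirements
- Operating system: CentOS 7.x / Ubuntu 16.04+
- Java: JDK 8 or 11
- CPU: must support the AVX2 instruction set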
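2. Environment Check Script
Create the environment check script check_environment.sh:
bash
#!/bin/bash
echo "=== StarRocks Environment Check ==="
# Check the operating system
echo -e "\n📋 Operating system information:"
cat /etc/os-release
# Check whether the CPU supports AVX2
echo -e "\n🔍 CPU AVX2 support check:"
if grep -q avx2 /proc/cpuinfo; then
    echo "✅ CPU supports AVX2"
else
    echo "❌ CPU does not support AVX2; StarRocks requires AVX2"
fi
# Check memory
echo -e "\n💾 Memory information:"
free -h
# Check disks
echo -e "\n💿 Disk information:"
df -h
# Check Java (java -version writes to stderr, so redirect it to stdout)
echo -e "\n☕ Java version:"
if command -v java >/dev/null 2>&1; then
    java -version 2>&1
else
    echo "Java is not installed"
fi
# Check the firewall
echo -e "\n🔥 Firewall status:"
systemctl status firewalld 2>/dev/null | grep Active || echo "firewalld is not running or not installed"
# Check time synchronization
echo -e "\n⏰ Time synchronization status:"
timedatectl status 2>/dev/null | grep "System clock" || ntpstat
# Check network connectivity
echo -e "\n🌐 Network connectivity:"
ping -c 2 www.starrocks.io > /dev/null && echo "✅ Internet access is working" || echo "❌ Internet access failed"
echo -e "\n✅ Environment check complete"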
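The hardware minimums from the system requirements can also be checked mechanically. The following is a minimal sketch, assuming the thresholds stated in the requirements list (FE: 4 cores / 8 GB, BE: 8 cores / 16 GB). The function name `meets_minimum` is hypothetical and not part of StarRocks.

```shell
#!/bin/bash
# Hypothetical helper: compare a core count and memory size (GB) with the
# minimums from the requirements list.
# Usage: meets_minimum ROLE CORES MEM_GB   (ROLE is "fe" or "be")
meets_minimum() {
    local role="$1" cores="$2" mem_gb="$3"
    local min_cores min_mem
    case "$role" in
        fe) min_cores=4; min_mem=8 ;;
        be) min_cores=8; min_mem=16 ;;
        *)  echo "unknown role: $role" >&2; return 2 ;;
    esac
    # Succeed only when both values reach the minimum for the role.
    [ "$cores" -ge "$min_cores" ] && [ "$mem_gb" -ge "$min_mem" ]
}
```

On a live host the inputs could come from `nproc` and `free -g`, for example `meets_minimum be "$(nproc)" "$(free -g | awk '/^Mem:/ {print $2}')"`. Note that `free -g` rounds down, so a nominal 16 GB machine may report 15.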
🚀 Example Cluster Plan
Assume a 3-node cluster:
- Node 1: FE Leader + BE
- Node 2: FE Follower + BE
- Node 3: FE Observer + BE
| Node | IP address | Roles | Specification |
|---|---|---|---|
| sr-node1 | 192.168.1.101 | FE (Leader), BE | 8 cores, 16 GB |
| sr-node2 | 192.168.1.102 | FE (Follower), BE | 8 cores, 16 GB |
| sr-node3 | 192.168.1.103 | FE (Observer), BE | 8 cores, 16 GB |
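The plan names each node (sr-node1 to sr-node3) but the guide addresses nodes by IP only. If hostname resolution is wanted, which is an assumption and not something this guide requires, the table can be turned into /etc/hosts entries. A minimal sketch with a hypothetical helper `print_hosts_entries`:

```shell
# Node plan copied from the table: one "hostname ip" pair per line.
NODES="sr-node1 192.168.1.101
sr-node2 192.168.1.102
sr-node3 192.168.1.103"

# Print /etc/hosts-style lines ("ip hostname") for every planned node.
print_hosts_entries() {
    printf '%s\n' "$NODES" | while read -r name ip; do
        printf '%s %s\n' "$ip" "$name"
    done
}
```

The output could be reviewed and then appended on each node, for example with `print_hosts_entries | sudo tee -a /etc/hosts`.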
📥 Installation and Deployment Steps
1. Download the Installation Package
bash
# Run on all nodes
cd /opt
wget https://releases.starrocks.io/starrocks/StarRocks-3.1.0.tar.gz
tar -xzf StarRocks-3.1.0.tar.gz
mv StarRocks-3.1.0 starrocks
cd starrocks
2. Configure Environment Variables
bash
# Create /etc/profile.d/starrocks.sh
sudo tee /etc/profile.d/starrocks.sh << 'EOF'
export STARROCKS_HOME=/opt/starrocks
export PATH=$STARROCKS_HOME/fe/bin:$STARROCKS_HOME/be/bin:$PATH
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk # adjust to the actual path
EOF
source /etc/profile.d/starrocks.sh
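Because the JAVA_HOME path has to be adjusted per host, it is worth confirming that it really points at a JDK before starting any service. A minimal sketch; the helper name `check_java_home` is hypothetical:

```shell
# Hypothetical helper: succeed only if DIR/bin/java exists and is executable.
# Usage: check_java_home DIR
check_java_home() {
    local dir="$1"
    if [ -z "$dir" ]; then
        echo "JAVA_HOME is empty" >&2
        return 1
    fi
    if [ ! -x "$dir/bin/java" ]; then
        echo "no executable java under $dir/bin" >&2
        return 1
    fi
    return 0
}
```

A typical call would be `check_java_home "$JAVA_HOME" && "$JAVA_HOME/bin/java" -version`.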
3. FE Node Configuration
Node 1 (FE Leader) configuration:
bash
# Create the FE configuration and metadata directories
mkdir -p /opt/starrocks/fe/conf
mkdir -p /opt/starrocks/fe/meta
# Create fe.conf
sudo tee /opt/starrocks/fe/conf/fe.conf << 'EOF'
# Metadata directory
meta_dir = /opt/starrocks/fe/meta
# Network settings
priority_networks = 192.168.1.0/24
http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
# JVM settings (these GC flags target JDK 8; PrintGCDateStamps and UseParNewGC
# are not accepted by JDK 11, so adjust them when running on JDK 11)
JAVA_OPTS = "-Xmx8192m -Xms8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0"
# Cluster settings
cluster_id = 1
# Other settings
sys_log_level = INFO
EOF
Node 2 (FE Follower) configuration:
bash
# Create fe.conf (same as node 1, but without cluster_id)
mkdir -p /opt/starrocks/fe/conf /opt/starrocks/fe/meta
sudo tee /opt/starrocks/fe/conf/fe.conf << 'EOF'
meta_dir = /opt/starrocks/fe/meta
priority_networks = 192.168.1.0/24
http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
JAVA_OPTS = "-Xmx8192m -Xms8192m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0"
sys_log_level = INFO
EOF
Node 3 (FE Observer) configuration:
bash
# Create fe.conf (same as node 2, except for a smaller 4 GB heap)
mkdir -p /opt/starrocks/fe/conf /opt/starrocks/fe/meta
sudo tee /opt/starrocks/fe/conf/fe.conf << 'EOF'
meta_dir = /opt/starrocks/fe/meta
priority_networks = 192.168.1.0/24
http_port = 8030
rpc_port = 9020
query_port = 9030
edit_log_port = 9010
JAVA_OPTS = "-Xmx4096m -Xms4096m -XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0"
sys_log_level = INFO
EOF
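The `priority_networks = 192.168.1.0/24` setting only helps if each node actually has an address inside that range. The sketch below checks IPv4 membership in a CIDR block using plain shell arithmetic; `ip_to_int` and `ip_in_cidr` are hypothetical helper names, and the code assumes well-formed dotted-quad input without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    # Split on dots into the positional parameters $1..$4.
    set -- $(printf '%s' "$1" | tr '.' ' ')
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if IP lies inside the CIDR block (IPv4 only).
# Usage: ip_in_cidr IP NETWORK/PREFIX
ip_in_cidr() {
    local ip="$1" net="${2%/*}" prefix="${2#*/}" mask
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        # Keep the top PREFIX bits of a 32-bit mask.
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    fi
    [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, `ip_in_cidr 192.168.1.102 192.168.1.0/24 && echo "address matches priority_networks"` could be run on node 2 before starting the FE.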
4. BE Node Configuration
Configure BE on all nodes:
bash
# Create the BE configuration and storage directories
mkdir -p /opt/starrocks/be/conf
mkdir -p /opt/starrocks/be/storage
# Create be.conf
sudo tee /opt/starrocks/be/conf/be.conf << 'EOF'
# Storage directory
storage_root_path = /opt/starrocks/be/storage
# Network settings
priority_networks = 192.168.1.0/24
be_port = 9060
be_http_port = 8040
heartbeat_service_port = 9050
brpc_port = 8060
# Resource limits
mem_limit = 80%
storage_page_cache_limit = 40%
# Other settings
sys_log_level = INFO
EOF
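In this plan every node runs an FE and a BE together, so the FE heap (`-Xmx`) and the BE `mem_limit` draw on the same RAM. The sketch below does the arithmetic, assuming that the `mem_limit` percentage is taken relative to total machine memory; it ignores operating system and JVM off-heap overhead. The helper name `memory_budget_ok` is hypothetical.

```shell
# Hypothetical helper: succeed if the FE heap plus the BE memory limit fits
# into the machine's memory.
# Usage: memory_budget_ok TOTAL_MB FE_HEAP_MB BE_LIMIT_PERCENT
memory_budget_ok() {
    local total_mb="$1" fe_heap_mb="$2" be_percent="$3"
    local be_limit_mb=$(( total_mb * be_percent / 100 ))
    echo "FE heap: ${fe_heap_mb} MB, BE limit: ${be_limit_mb} MB, total: ${total_mb} MB"
    [ $(( fe_heap_mb + be_limit_mb )) -le "$total_mb" ]
}
```

With the example values in this guide, `memory_budget_ok 16384 8192 80` fails: an 8 GB heap plus 80% of 16 GB exceeds the 16 GB available, so either the heap or `mem_limit` would need to be lowered on co-located nodes.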
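5. Start the Cluster
Start the FEs (in order):
bash
# On node 1: start the FE Leader
cd /opt/starrocks/fe
./bin/start_fe.sh --daemon
# Check that the FE on node 1 is up
curl http://192.168.1.101:8030/api/health
# Register node 2 as a Follower with the Leader (requires a MySQL client)
mysql -h 192.168.1.101 -P 9030 -uroot -e "ALTER SYSTEM ADD FOLLOWER '192.168.1.102:9010'"
# On node 2: the first start must point at the Leader with --helper
cd /opt/starrocks/fe
./bin/start_fe.sh --helper 192.168.1.101:9010 --daemon
# Register node 3 as an Observer with the Leader
mysql -h 192.168.1.101 -P 9030 -uroot -e "ALTER SYSTEM ADD OBSERVER '192.168.1.103:9010'"
# On node 3: the first start must also use --helper
cd /opt/starrocks/fe
./bin/start_fe.sh --helper 192.168.1.101:9010 --daemon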
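An FE started with `--daemon` returns immediately, so the health check may run before the HTTP port is listening. A small retry loop avoids registering further nodes too early. This is a generic sketch; `retry` is a hypothetical helper, not a StarRocks tool.

```shell
# Run a command until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between tries.
# Usage: retry ATTEMPTS DELAY COMMAND [ARGS...]
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$(( i + 1 ))
        # Sleep only if another attempt follows.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    return 1
}
```

For example, `retry 30 2 curl -sf http://192.168.1.101:8030/api/health` waits up to about a minute for the Leader before the `ALTER SYSTEM ADD FOLLOWER` statement is issued.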
Start all BE nodes:
bash
# On every node: start the BE
cd /opt/starrocks/be
./bin/start_be.sh --daemon
# Register the BE nodes through the FE Leader (host:heartbeat_service_port)
mysql -h 192.168.1.101 -P 9030 -uroot -e "
ALTER SYSTEM ADD BACKEND '192.168.1.101:9050';
ALTER SYSTEM ADD BACKEND '192.168.1.102:9050';
ALTER SYSTEM ADD BACKEND '192.168.1.103:9050';
"
6. Verify the Cluster Status
Create the verification script check_cluster_status.sh:
bash
#!/bin/bash
echo "=== StarRocks Cluster Status Check ==="
# Check FE status
echo -e "\n📊 FE node status:"
mysql -h 192.168.1.101 -P 9030 -uroot -e "SHOW PROC '/frontends'" 2>/dev/null
# Check BE status
echo -e "\n💾 BE node status:"
mysql -h 192.168.1.101 -P 9030 -uroot -e "SHOW PROC '/backends'" 2>/dev/null
# Check the cluster version
echo -e "\n🔍 Cluster version:"
mysql -h 192.168.1.101 -P 9030 -uroot -e "SELECT CURRENT_VERSION()" 2>/dev/null
# Run a simple test query
echo -e "\n✅ Simple query test:"
mysql -h 192.168.1.101 -P 9030 -uroot -e "SELECT 1 as test_result" 2>/dev/null
echo -e "\n🎉 Cluster status check complete"
Run the verification:
bash
chmod +x check_cluster_status.sh
./check_cluster_status.sh
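The script prints the node tables for a person to read. For an automated check, the number of live nodes can be counted from the same output. The sketch below assumes the result contains a column named `Alive` holding `true` or `false`, and that the client is run in batch mode (`mysql -B`) so the output is tab-separated with a header row. `count_alive` is a hypothetical helper.

```shell
# Read tab-separated rows with a header line on stdin and print how many
# rows have "true" in the column named "Alive".
count_alive() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Alive") col = i; next }
        col && $col == "true" { n++ }
        END { print n + 0 }
    '
}
```

A possible use is `mysql -h 192.168.1.101 -P 9030 -uroot -B -e "SHOW PROC '/backends'" | count_alive`, comparing the result with the expected three BEs.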
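🔧 System Service Configuration
Create systemd services
FE service:
bash
sudo tee /etc/systemd/system/starrocks-fe.service << 'EOF'
[Unit]
Description=StarRocks FE
After=network.target
[Service]
Type=forking
User=root
Group=root
# systemd does not read /etc/profile.d, so set JAVA_HOME here (adjust the path)
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk
ExecStart=/opt/starrocks/fe/bin/start_fe.sh --daemon
ExecStop=/opt/starrocks/fe/bin/stop_fe.sh
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
BE service:
bash
sudo tee /etc/systemd/system/starrocks-be.service << 'EOF'
[Unit]
Description=StarRocks BE
After=network.target
[Service]
Type=forking
User=root
Group=root
# systemd does not read /etc/profile.d, so set JAVA_HOME here (adjust the path)
Environment=JAVA_HOME=/usr/lib/jvm/java-11-openjdk
ExecStart=/opt/starrocks/be/bin/start_be.sh --daemon
ExecStop=/opt/starrocks/be/bin/stop_be.sh
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
Enable the services:
bash
# If FE/BE were started manually earlier, stop them first (stop_fe.sh / stop_be.sh)
# so that systemd manages the processes
sudo systemctl daemon-reload
sudo systemctl enable starrocks-fe starrocks-be
sudo systemctl start starrocks-fe starrocks-be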
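The FE and BE unit files differ only in the role name, so they can be generated from one template instead of being maintained twice. A minimal sketch that reproduces the unit layout from this guide; `render_unit` is a hypothetical helper.

```shell
# Print a systemd unit for ROLE ("fe" or "be"), following the layout of the
# unit files in this guide.
render_unit() {
    local role="$1" upper
    case "$role" in
        fe|be) ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
    # Upper-case role name for the Description line.
    upper=$(printf '%s' "$role" | tr 'a-z' 'A-Z')
    cat << EOF
[Unit]
Description=StarRocks ${upper}
After=network.target
[Service]
Type=forking
User=root
Group=root
ExecStart=/opt/starrocks/${role}/bin/start_${role}.sh --daemon
ExecStop=/opt/starrocks/${role}/bin/stop_${role}.sh
Restart=on-failure
RestartSec=10
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
}
```

The output can be written with `render_unit fe | sudo tee /etc/systemd/system/starrocks-fe.service`, followed by `sudo systemctl daemon-reload`.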
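📝 Common Management Commands
Cluster management
sql
-- Show FE status
SHOW PROC '/frontends';
-- Show BE status
SHOW PROC '/backends';
-- List databases
SHOW DATABASES;
-- Create a user
-- Note: GRANT syntax differs between StarRocks versions; confirm it against the reference for your release
CREATE USER 'test' IDENTIFIED BY 'password';
GRANT ALL ON *.* TO 'test';
Service management
bash
# Start the services
systemctl start starrocks-fe
systemctl start starrocks-be
# Stop the services
systemctl stop starrocks-fe
systemctl stop starrocks-be
# Show service status
systemctl status starrocks-fe
systemctl status starrocks-be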
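🐛 Troubleshooting
Common Problems and Fixes
- FE fails to start
bash
# Check the FE log
tail -f /opt/starrocks/fe/log/fe.log
# Check whether the ports are already in use
netstat -tulpn | grep -E ':(8030|9030|9010)'
- BE cannot register
bash
# Check the BE log
tail -f /opt/starrocks/be/log/be.INFO
# Check network connectivity
telnet 192.168.1.101 9030
- Insufficient disk space
sql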
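-- Reclaim space sooner by shortening how long dropped tables and partitions stay in the FE recycle bin (in seconds; 3600 is an example value)
ADMIN SET FRONTEND CONFIG ("catalog_trash_expire_second" = "3600");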
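Before changing any setting for a disk-space problem, it helps to confirm how full the BE storage filesystem actually is. A minimal sketch using `df -P`; the helper names are hypothetical, and the parsing assumes the filesystem name contains no spaces.

```shell
# Print the usage percentage (without the % sign) of the filesystem that
# holds the given path.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed if the filesystem holding PATH is at or above THRESHOLD percent.
# Usage: disk_usage_over PATH THRESHOLD
disk_usage_over() {
    local used
    used=$(disk_usage_percent "$1")
    [ -n "$used" ] || return 2
    [ "$used" -ge "$2" ]
}
```

For example, `disk_usage_over /opt/starrocks/be/storage 85 && echo "BE storage is above 85%"` could be run on each BE node; the 85% threshold is only an illustration.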
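📊 Monitoring Configuration
Enable monitoring metrics
bash
# Enable metrics in the BE configuration
echo "enable_metric_calculator = true" >> /opt/starrocks/be/conf/be.conf
# Restart the BE service
systemctl restart starrocks-be
Access the monitoring interface: http://192.168.1.101:8030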
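✅ Post-deployment Checklist
- Time is synchronized on all nodes
- Firewall ports are open
- FE nodes have started normally
- BE nodes have registered normally
- Cluster status is healthy
- Simple query test passes
- System services are configured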
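The checklist asks for open firewall ports, and the ports involved are the ones set in fe.conf and be.conf. The sketch below prints, without running them, the firewalld commands that would open those ports, so they can be reviewed first. It assumes firewalld is the firewall in use; the helper names are hypothetical.

```shell
# Print the ports a role listens on, as set in fe.conf / be.conf in this guide.
ports_for_role() {
    case "$1" in
        fe) echo "8030 9020 9030 9010" ;;
        be) echo "9060 8040 9050 8060" ;;
        *)  echo "unknown role: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the firewalld commands that would open the ports for the
# given roles, followed by a reload.
print_firewall_commands() {
    local role port
    for role in "$@"; do
        for port in $(ports_for_role "$role"); do
            echo "firewall-cmd --permanent --add-port=${port}/tcp"
        done
    done
    echo "firewall-cmd --reload"
}
```

After review, the commands could be executed with `print_firewall_commands fe be | sudo sh` on each node.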
Congratulations! The StarRocks cluster deployment is complete! 🎉