TiDB 单机部署与监控完整指南 🖥️
一、为什么单机部署?
适用场景:
- 开发测试环境
- 学习验证 TiDB
- 功能演示
- 小规模应用(< 50GB 数据)
不适用:
- 生产环境
- 性能测试
- 高可用场景
二、单机部署详细步骤
第1步:环境准备
1.1 硬件要求(最低配置)
makefile
# 检查你的机器配置
CPU: 4核+(推荐8核)
内存: 8GB+(推荐16GB)
磁盘: 100GB+ SSD(必须SSD!)
操作系统: CentOS 7.3+/Ubuntu 18.04+
# 检查命令
lscpu
free -h
df -h
lsblk
1.2 系统配置优化
bash
# 1. 关闭防火墙
systemctl stop firewalld
systemctl disable firewalld
# 2. 关闭SELinux
setenforce 0
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# 3. 调整文件句柄数
echo "fs.file-max = 1000000" >> /etc/sysctl.conf
echo "fs.nr_open = 1000000" >> /etc/sysctl.conf
sysctl -p
# 4. 设置用户限制
cat >> /etc/security/limits.conf << EOF
* soft nofile 1000000
* hard nofile 1000000
* soft nproc 65536
* hard nproc 65536
* soft stack 10240
EOF
# 5. 禁用透明大页
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# 6. 创建部署目录
mkdir -p /tidb-data
mkdir -p /tidb-deploy
chmod 755 /tidb-data /tidb-deploy
第2步:安装 TiUP(一键式工具)
bash
# 1. 安装TiUP
curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
# 2. 设置环境变量
echo 'export PATH=$HOME/.tiup/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
# 3. 验证安装
tiup --version
# 4. 安装cluster组件
tiup cluster
# 5. 查看可安装版本
tiup list tidb
第3步:编写单机拓扑配置
创建文件 single.yaml:
yaml
# 单机部署拓扑配置
global:
user: "root" # 用root用户部署
ssh_port: 22
deploy_dir: "/tidb-deploy"
data_dir: "/tidb-data"
arch: "amd64"
pd_servers:
- host: 127.0.0.1
name: "pd-1"
client_port: 2379
peer_port: 2380
deploy_dir: "/tidb-deploy/pd-2379"
data_dir: "/tidb-data/pd-2379"
log_dir: "/tidb-deploy/pd-2379/log"
tidb_servers:
- host: 127.0.0.1
port: 4000
status_port: 10080
deploy_dir: "/tidb-deploy/tidb-4000"
log_dir: "/tidb-deploy/tidb-4000/log"
tikv_servers:
- host: 127.0.0.1
port: 20160
status_port: 20180
deploy_dir: "/tidb-deploy/tikv-20160"
data_dir: "/tidb-data/tikv-20160"
log_dir: "/tidb-deploy/tikv-20160/log"
monitoring_servers:
- host: 127.0.0.1
port: 9090
deploy_dir: "/tidb-deploy/prometheus-9090"
data_dir: "/tidb-data/prometheus-9090"
log_dir: "/tidb-deploy/prometheus-9090/log"
grafana_servers:
- host: 127.0.0.1
port: 3000
deploy_dir: "/tidb-deploy/grafana-3000"
alertmanager_servers:
- host: 127.0.0.1
web_port: 9093
cluster_port: 9094
deploy_dir: "/tidb-deploy/alertmanager-9093"
data_dir: "/tidb-data/alertmanager-9093"
log_dir: "/tidb-deploy/alertmanager-9093/log"
第4步:执行部署
vbnet
# 1. 检查环境(单机可跳过检查)
tiup cluster check single.yaml
# 2. 部署集群(使用v7.5.0版本)
tiup cluster deploy tidb-single v7.5.0 single.yaml
# 3. 启动集群
tiup cluster start tidb-single
# 4. 查看状态
tiup cluster display tidb-single
预期输出:
bash
Cluster type: tidb
Cluster name: tidb-single
Cluster version: v7.5.0
SSH type: builtin
Dashboard URL: http://127.0.0.1:2379/dashboard
Grafana URL: http://127.0.0.1:3000
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
127.0.0.1:9093 alertmanager 127.0.0.1 9093/9094 linux/x86_64 Up /tidb-data/alertmanager /tidb-deploy/alertmanager
127.0.0.1:3000 grafana 127.0.0.1 3000 linux/x86_64 Up - /tidb-deploy/grafana
127.0.0.1:2379 pd 127.0.0.1 2379/2380 linux/x86_64 Up /tidb-data/pd /tidb-deploy/pd
127.0.0.1:9090 prometheus 127.0.0.1 9090 linux/x86_64 Up /tidb-data/prometheus /tidb-deploy/prometheus
127.0.0.1:4000 tidb 127.0.0.1 4000/10080 linux/x86_64 Up - /tidb-deploy/tidb
127.0.0.1:20160 tikv 127.0.0.1 20160/20180 linux/x86_64 Up /tidb-data/tikv /tidb-deploy/tikv
第5步:验证部署
sql
# 1. 连接到TiDB(类似MySQL)
mysql -h 127.0.0.1 -P 4000 -u root
# 2. 执行测试SQL
SELECT VERSION();
SHOW DATABASES;
CREATE DATABASE test;
USE test;
CREATE TABLE t1 (id INT PRIMARY KEY, name VARCHAR(20));
INSERT INTO t1 VALUES (1, 'tidb'), (2, 'mysql');
SELECT * FROM t1;
DROP DATABASE test;
EXIT;
第6步:配置优化(单机特调)
arduino
# 编辑配置
tiup cluster edit-config tidb-single
在编辑器中添加:
yaml
server_configs:
pd:
replication.max-replicas: 1 # 单机设置为1副本
schedule.leader-schedule-limit: 4
schedule.region-schedule-limit: 2048
schedule.replica-schedule-limit: 64
tikv:
storage.block-cache.capacity: "2GB" # 根据内存调整
raftstore.apply-pool-size: 2
raftstore.store-pool-size: 2
coprocessor.region-max-size: "200MB"
coprocessor.region-max-keys: 2000000
tidb:
performance.max-procs: 4
mem-quota-query: 2147483648 # 2GB
log.slow-threshold: 500
prepared-plan-cache.enabled: true
prepared-plan-cache.capacity: 50
保存后重新加载配置:
vbnet
tiup cluster reload tidb-single
三、监控系统配置
3.1 访问监控面板
markdown
1. Grafana监控: http://127.0.0.1:3000
- 用户名: admin
- 密码: admin(首次登录会要求修改)
2. TiDB Dashboard: http://127.0.0.1:2379/dashboard
- 用户名: root
- 密码: (无密码,直接登录)
3. Prometheus: http://127.0.0.1:9090
3.2 导入监控面板
在Grafana中导入以下面板:
| 面板名称 | ID | 用途 |
|---|---|---|
| TiDB集群概览 | 12599 | 核心监控 |
| TiDB详细指标 | 12600 | TiDB详情 |
| TiKV详细指标 | 12603 | TiKV详情 |
| PD详细指标 | 12605 | PD详情 |
| Node Exporter | 11074 | 主机监控 |
导入步骤:
- 访问 http://127.0.0.1:3000
- 点击 ➕ → Import
- 输入面板ID → Load
- 选择 Prometheus 数据源 → Import
3.3 关键监控指标
1. 集群概览面板
markdown
1. Query Duration (P99): 查询延迟(< 1秒正常)
2. QPS: 每秒查询数
3. Storage Capacity: 存储使用率
4. Connection Count: 连接数
5. Region Health: Region健康度
2. 单机特有关注点
sql
-- 查看内存使用
SELECT * FROM information_schema.processlist ORDER BY MEM DESC LIMIT 10;
-- 查看锁信息
SELECT * FROM information_schema.deadlocks;
-- 查看慢查询
SELECT * FROM information_schema.slow_query
WHERE time > NOW() - INTERVAL 1 HOUR
ORDER BY Time DESC
LIMIT 10;
3.4 设置告警规则
创建告警配置文件 alerts.yml:
yaml
groups:
- name: tidb-single-alerts
rules:
- alert: HighCPUUsage
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "高CPU使用率"
description: "CPU使用率超过80%持续5分钟"
- alert: HighMemoryUsage
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "高内存使用率"
description: "内存使用率超过85%持续5分钟"
- alert: LowDiskSpace
expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 20
for: 5m
labels:
severity: critical
annotations:
summary: "磁盘空间不足"
description: "根分区可用空间小于20%"
- alert: TiDBDown
expr: up{job="tidb"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "TiDB服务宕机"
description: "TiDB实例已宕机超过1分钟"
- alert: TiKVDown
expr: up{job="tikv"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "TiKV服务宕机"
description: "TiKV实例已宕机超过1分钟"
导入告警规则:
bash
cp alerts.yml /tidb-deploy/prometheus-9090/conf/
tiup cluster reload tidb-single -R prometheus
四、常用操作命令
4.1 启停管理
bash
# 启动集群
tiup cluster start tidb-single
# 停止集群
tiup cluster stop tidb-single
# 重启集群
tiup cluster restart tidb-single
# 查看状态
tiup cluster display tidb-single
# 查看日志
tiup cluster audit tidb-single
tail -f /tidb-deploy/tidb-4000/log/tidb.log
tail -f /tidb-deploy/tikv-20160/log/tikv.log
4.2 备份恢复
bash
# 1. 安装BR工具
tiup install br:v7.5.0
# 2. 全量备份
br backup full \
--pd "127.0.0.1:2379" \
--storage "local:///tmp/backup" \
--log-file /tmp/backup.log
# 3. 恢复备份
br restore full \
--pd "127.0.0.1:2379" \
--storage "local:///tmp/backup" \
--log-file /tmp/restore.log
4.3 性能测试
ini
# 1. 安装sysbench
curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.rpm.sh | sudo bash
sudo yum -y install sysbench
# 2. 准备数据
sysbench oltp_common \
--db-driver=mysql \
--mysql-host=127.0.0.1 \
--mysql-port=4000 \
--mysql-user=root \
--mysql-db=sbtest \
prepare
# 3. 运行测试
sysbench oltp_read_write \
--db-driver=mysql \
--mysql-host=127.0.0.1 \
--mysql-port=4000 \
--mysql-user=root \
--mysql-db=sbtest \
--threads=8 \
--time=300 \
run
五、故障排查
5.1 常见问题
问题1:启动失败
bash
# 检查端口占用
netstat -tlnp | grep -E "4000|2379|2380|20160|9090|3000"
# 检查目录权限
ls -la /tidb-*
# 查看详细日志
cat /tidb-deploy/pd-2379/log/pd.log | tail -50
问题2:连接失败
bash
# 检查TiDB是否运行
tiup cluster display tidb-single
# 测试连接
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT 1"
# 查看TiDB日志
tail -f /tidb-deploy/tidb-4000/log/tidb.log
问题3:内存不足
sql
-- 查看内存使用
SELECT * FROM information_schema.processlist ORDER BY MEM DESC LIMIT 5;
-- 终止占用内存的查询
KILL TIDB [query_id];
5.2 性能优化
sql
-- 1. 分析慢查询
SELECT query, digest, avg_process_time, max_process_time, exec_count
FROM information_schema.slow_query
ORDER BY max_process_time DESC
LIMIT 10;
-- 2. 检查索引
ANALYZE TABLE your_table;
SHOW INDEX FROM your_table;
-- 3. 调整TiDB参数
SET GLOBAL tidb_mem_quota_query = 1073741824; -- 1GB
SET GLOBAL tidb_enable_fast_analyze = ON;
六、维护脚本
6.1 健康检查脚本
bash
#!/bin/bash
# tidb_health_check.sh
echo "=== TiDB 单机健康检查 ==="
echo "检查时间: $(date)"
echo ""
# 1. 检查进程
echo "1. 进程状态:"
tiup cluster display tidb-single 2>/dev/null | grep -E "127.0.0.1.*(Up|Down)"
# 2. 检查端口
echo -e "\n2. 端口监听:"
for port in 4000 2379 2380 20160 9090 3000; do
if netstat -tln | grep ":$port " > /dev/null; then
echo "端口 $port: ✓ 已监听"
else
echo "端口 $port: ✗ 未监听"
fi
done
# 3. 检查磁盘空间
echo -e "\n3. 磁盘空间:"
df -h /tidb-data / 2>/dev/null | awk '{print $6, $5}'
# 4. 检查内存
echo -e "\n4. 内存使用:"
free -h | grep -E "^Mem:"
# 5. 测试连接
echo -e "\n5. 数据库连接:"
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT VERSION();" 2>/dev/null && echo "连接: ✓ 成功" || echo "连接: ✗ 失败"
6.2 备份脚本
bash
#!/bin/bash
# backup_tidb.sh
BACKUP_DIR="/tmp/tidb_backup_$(date +%Y%m%d_%H%M%S)"
LOG_FILE="/tmp/backup_$(date +%Y%m%d).log"
echo "开始备份: $(date)" | tee -a $LOG_FILE
# 创建备份目录
mkdir -p $BACKUP_DIR
# 执行备份
tiup br backup full \
--pd "127.0.0.1:2379" \
--storage "local://$BACKUP_DIR" \
--ratelimit 10 \
--log-file "$BACKUP_DIR/backup.log" 2>&1 | tee -a $LOG_FILE
if [ $? -eq 0 ]; then
echo "备份成功: $BACKUP_DIR" | tee -a $LOG_FILE
# 清理旧备份(保留最近7天)
find /tmp -name "tidb_backup_*" -type d -mtime +7 -exec rm -rf {} ;
else
echo "备份失败" | tee -a $LOG_FILE
exit 1
fi
七、升级TiDB版本
perl
# 1. 查看当前版本
tiup cluster display tidb-single | grep "Cluster version"
# 2. 查看可用版本
tiup list tidb
# 3. 备份数据
br backup full --pd "127.0.0.1:2379" --storage "local:///tmp/backup_before_upgrade"
# 4. 升级到新版本
tiup cluster upgrade tidb-single v7.6.0
# 5. 验证升级
tiup cluster display tidb-single
mysql -h 127.0.0.1 -P 4000 -u root -e "SELECT VERSION();"
八、卸载清理
bash
# 1. 停止集群
tiup cluster stop tidb-single
# 2. 销毁集群
tiup cluster destroy tidb-single
# 3. 清理数据目录
rm -rf /tidb-data /tidb-deploy
# 4. 清理TiUP数据
rm -rf ~/.tiup
# 5. 卸载TiUP
rm -rf ~/.tiup
sed -i '/tiup/d' ~/.bashrc
九、监控大屏示例
9.1 监控指标清单
markdown
✅ 必须监控:
1. 查询延迟(P95/P99)
2. QPS/TPS
3. 连接数
4. 内存使用率
5. CPU使用率
6. 磁盘使用率
7. TiKV Region数
8. 慢查询数
9. 错误日志
10. 备份状态
9.2 Grafana面板截图
ruby
集群概览 → http://127.0.0.1:3000/d/12599
TiDB详情 → http://127.0.0.1:3000/d/12600
TiKV详情 → http://127.0.0.1:3000/d/12603
PD详情 → http://127.0.0.1:3000/d/12605
十、快速命令速查
bash
# 连接数据库
mysql -h 127.0.0.1 -P 4000 -u root
# 查看集群状态
tiup cluster display tidb-single
# 重启TiDB
tiup cluster restart tidb-single -R tidb
# 查看日志
tail -f /tidb-deploy/tidb-4000/log/tidb.log
# 监控地址
# Grafana: http://127.0.0.1:3000
# Dashboard: http://127.0.0.1:2379/dashboard
# Prometheus: http://127.0.0.1:9090
# 备份
br backup full --pd "127.0.0.1:2379" --storage "local:///tmp/backup"
# 恢复
br restore full --pd "127.0.0.1:2379" --storage "local:///tmp/backup"
总结
单机部署TiDB适合学习和测试,记住几个要点:
- 必须用SSD,否则性能极差
- 内存至少8GB,推荐16GB
- 定期备份,单机无高可用
- 监控是必须的,有问题先看监控
- 版本选择,用最新稳定版
按照这个指南,你可以在30分钟内完成TiDB单机部署和监控配置。开始你的TiDB之旅吧!