MongoDB 高可用部署:Replica Set 搭建与故障转移测试
- 第一章:高可用架构基础与设计原理
  - [1.1 高可用性核心概念与理论基础](#11-高可用性核心概念与理论基础)
  - [1.2 副本集架构深度解析](#12-副本集架构深度解析)
  - [1.3 选举算法与故障检测机制](#13-选举算法与故障检测机制)
- 第二章:副本集部署实战
  - [2.1 环境准备与系统配置](#21-环境准备与系统配置)
  - [2.2 MongoDB 安装与配置](#22-mongodb-安装与配置)
  - [2.3 副本集初始化与配置](#23-副本集初始化与配置)
- 第三章:高级配置与优化
  - [3.1 读写分离与读偏好设置](#31-读写分离与读偏好设置)
  - [3.2 写关注(Write Concern)配置](#32-写关注write-concern配置)
  - [3.3 连接池与性能优化](#33-连接池与性能优化)
- 第四章:监控与维护
  - [4.1 监控指标与告警系统](#41-监控指标与告警系统)
  - [4.2 备份与恢复策略](#42-备份与恢复策略)
- 第五章:故障转移测试与验证
  - [5.1 自动化测试框架](#51-自动化测试框架)
  - [5.2 网络分区测试](#52-网络分区测试)
- 第六章:生产环境最佳实践
  - [6.1 多数据中心部署策略](#61-多数据中心部署策略)
  - [6.2 安全加固配置](#62-安全加固配置)
- 第七章:故障排查与恢复
  - [7.1 常见问题处理指南](#71-常见问题处理指南)
  - [7.2 数据一致性验证工具](#72-数据一致性验证工具)
第一章:高可用架构基础与设计原理
1.1 高可用性核心概念与理论基础
在分布式系统设计中,高可用性(High Availability,HA)是一个核心的质量属性,它衡量系统提供服务的时间比例。MongoDB 通过副本集(Replica Set)机制实现高可用性,其设计基于以下几个关键理论:
CAP 定理的应用:
MongoDB 副本集在 CAP 定理中主要提供 CP(一致性和分区容错性)特性。在网络分区发生时,系统优先保证数据一致性,可能会暂时牺牲部分可用性。这种设计选择确保了数据的正确性,避免了脑裂情况下的数据不一致问题。
Raft 共识算法:
MongoDB 的选举机制基于 Raft 算法的变种实现。Raft 通过以下几个机制确保一致性:
- 领导者选举:当主节点失效时,副本集自动选举新的主节点
- 日志复制:所有写操作首先记录到操作日志(oplog),然后复制到从节点
- 安全性保证:确保只有包含最新数据的节点才能成为主节点
副本集的设计目标:
- 自动故障转移:在主节点失效时,系统能在秒级内自动切换到新主节点
- 数据冗余:数据在多个节点间复制,提供物理层面的冗余保障
- 读写分离:支持将读请求分发到从节点,提高系统吞吐量
- 灾难恢复:通过地理分布的节点部署,提供地域级别的容灾能力
- 零数据丢失:通过写关注(Write Concern)机制确保数据持久化
1.2 副本集架构深度解析
MongoDB 副本集采用主从复制架构,但其实现机制比传统主从复制更加智能和自动化。副本集的整体架构和数据流如下(原文此处为架构示意图):客户端应用程序通过驱动/查询路由连接副本集;写请求只发往主节点(Primary),主节点通过 oplog 复制把数据同步到两个从节点(Secondary);所有节点(包括仲裁节点 Arbiter)之间相互进行心跳检测;读请求可按读偏好配置路由到从节点。
节点类型与角色详解:
- Primary(主节点):
- 唯一接受所有写操作的节点
- 将写操作记录到 oplog(操作日志)
- 处理所有读请求(除非配置了读偏好)
- 通过心跳机制监控其他节点状态
- Secondary(从节点):
- 异步复制主节点的数据
- 可以处理读请求(根据读偏好配置)
- 参与主节点选举投票
- 可以配置为特殊角色:
- Hidden:对客户端不可见,用于专门任务
- Delayed:延迟复制,用于数据恢复
- Priority 0:不能成为主节点
- Arbiter(仲裁节点):
- 不存储数据,仅参与投票
- 用于解决偶数节点数的投票平局问题
- 资源需求低,适合资源受限环境
数据复制机制:
Oplog(操作日志)是 MongoDB 复制的核心机制,它具有以下特性:
- 是一个 capped collection(固定大小集合)
- 存储在 local 数据库中(local.oplog.rs)
- 每个操作在 Oplog 中都是幂等的
- 默认大小约为可用磁盘空间的 5%(WiredTiger 下,下限 990MB、上限 50GB)
- 可以通过 oplogSizeMB 参数调整大小
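结合上面的特性,可以粗略估算 oplog 的时间窗口——即一个从节点最长可以离线多久仍能通过 oplog 追上进度。下面是一个简单的估算草稿(纯演示:oplog_size_mb 取配置值,write_mb_per_hour 需按实际写入量测得,例如参考 db.getReplicationInfo() 的输出):

```python
def oplog_window_hours(oplog_size_mb: float, write_mb_per_hour: float) -> float:
    """估算 oplog 时间窗口(小时):oplog 大小 / 每小时写入 oplog 的数据量。"""
    if write_mb_per_hour <= 0:
        raise ValueError("write_mb_per_hour 必须为正数")
    return oplog_size_mb / write_mb_per_hour

# 例:20GB oplog,每小时约 1GB 的 oplog 写入量 => 约 20 小时的复制窗口
print(oplog_window_hours(20480, 1024))  # 20.0
```

从节点离线维护的时长一旦超过这个窗口,就只能做全量初始同步,因此窗口估算常用于制定维护计划。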
复制过程:
- 主节点将写操作记录到自己的 oplog
- 从节点定期查询主节点的 oplog 获取新操作
- 从节点按顺序重放这些操作
- 从节点将应用的操作记录到自己的 oplog
1.3 选举算法与故障检测机制
MongoDB 使用基于 Raft 协议变种的选举算法,确保在节点故障时能够快速选出新的主节点。
选举触发条件:
- 主节点不可达(心跳超时)
- 主节点执行 rs.stepDown() 主动退位
- 新节点加入副本集
- 网络分区导致节点间通信中断
- 副本集重新配置
心跳机制:
- 默认每 2 秒发送一次心跳
- 心跳超时时间为 10 秒
- 可通过 settings.heartbeatTimeoutSecs 调整
投票规则:
- 每个节点最多只能投一票
- 只有 votes 为 1 的节点拥有投票权(一个副本集最多 7 个投票成员)
- 候选人需要获得多数票(⌊N/2⌋ + 1,N 为投票成员数)才能成为主节点
- 仲裁节点不存储数据但参与投票
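上述多数票规则可以用一小段脚本直观体会:N 个投票成员需要 ⌊N/2⌋+1 票才能选主,能容忍的故障节点数是 N 减去多数票(纯演示代码,非 MongoDB 内部实现):

```python
def majority(voting_members: int) -> int:
    """当选主节点所需的最少票数。"""
    return voting_members // 2 + 1

def fault_tolerance(voting_members: int) -> int:
    """在仍能选出主节点的前提下,最多可失效的投票成员数。"""
    return voting_members - majority(voting_members)

for n in (3, 4, 5, 7):
    print(f"{n} 个投票成员: 多数 = {majority(n)}, 容错 = {fault_tolerance(n)}")
# 注意 4 个投票成员的容错(1)与 3 个相同 —— 这正是偶数投票成员没有收益、
# 需要仲裁节点把票数凑成奇数的原因
```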
选举过程:
- 从节点检测到主节点失效
- 节点等待随机时间(防止多个节点同时发起选举)
- 节点向其他节点发送投票请求
- 其他节点检查候选人的数据新旧程度
- 获得多数票的节点成为新的主节点
第二章:副本集部署实战
2.1 环境准备与系统配置
服务器规划示例:
假设我们部署一个包含 3 个数据节点和 1 个仲裁节点的副本集。需要说明的是:3 个数据节点本身已是奇数投票(3 票),再加仲裁节点会变成 4 票(多数仍为 3 票),并不会提高容错能力;生产中一般只在数据节点为偶数时才添加仲裁节点,这里保留它主要是为了演示仲裁节点的部署方式:
| 节点 | IP地址 | 角色 | 数据目录 | 建议配置 |
|---|---|---|---|---|
| mongo1 | 192.168.1.101 | Primary(初始) | /data/mongodb/db1 | 16CPU, 32GB RAM, SSD |
| mongo2 | 192.168.1.102 | Secondary | /data/mongodb/db2 | 16CPU, 32GB RAM, SSD |
| mongo3 | 192.168.1.103 | Secondary | /data/mongodb/db3 | 16CPU, 32GB RAM, SSD |
| mongo-arbiter | 192.168.1.104 | Arbiter | 无数据 | 2CPU, 4GB RAM |
系统优化配置:
bash
# 配置系统限制 - 在所有节点执行
echo "mongodb soft nofile 64000" >> /etc/security/limits.conf
echo "mongodb hard nofile 64000" >> /etc/security/limits.conf
echo "mongodb soft nproc 32000" >> /etc/security/limits.conf
echo "mongodb hard nproc 32000" >> /etc/security/limits.conf
# 禁用透明大页 - 在所有节点执行
echo 'never' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'never' > /sys/kernel/mm/transparent_hugepage/defrag
# 创建系统服务文件,确保重启后设置依然生效
cat > /etc/systemd/system/disable-thp.service << EOF
[Unit]
Description=Disable Transparent Huge Pages
[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag'
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable disable-thp.service
# 调整内核参数
cat >> /etc/sysctl.conf << EOF
# MongoDB 优化参数
vm.swappiness = 1
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_max_syn_backlog = 4096
EOF
sysctl -p
# 创建数据目录和权限设置
mkdir -p /data/mongodb/{db1,db2,db3,logs}
chown -R mongodb:mongodb /data/mongodb
chmod 755 /data/mongodb
2.2 MongoDB 安装与配置
安装 MongoDB:
bash
# 在 Ubuntu 20.04 上安装 MongoDB 6.0
wget -qO - https://www.mongodb.org/static/pgp/server-6.0.asc | sudo apt-key add -
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/6.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-6.0.list
sudo apt-get update
sudo apt-get install -y mongodb-org
# 检查安装版本
mongod --version
配置 mongod.conf:
yaml
# /etc/mongod.conf - 主节点配置示例
systemLog:
  destination: file
  logAppend: true
  path: /data/mongodb/logs/mongod.log
  logRotate: reopen
  verbosity: 0                # 0 为默认级别,1-5 为逐级更详细的调试日志

storage:
  dbPath: /data/mongodb/db1
  journal:
    # MongoDB 6.0 起 journal 始终开启,journal.enabled 选项已弃用
    commitIntervalMs: 100
  wiredTiger:
    engineConfig:
      cacheSizeGB: 16         # WiredTiger 默认为 (内存-1GB) 的 50%,32GB 内存约 15-16GB
      journalCompressor: snappy
      directoryForIndexes: false
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true

net:
  port: 27017
  bindIp: 192.168.1.101,127.0.0.1   # 生产环境应指定IP
  maxIncomingConnections: 65536
  wireObjectCheck: true
  ipv6: false

replication:
  replSetName: "rs0"
  oplogSizeMB: 20480          # 根据写负载调整,建议能容纳至少 24 小时操作
  enableMajorityReadConcern: true   # 5.0 起恒为 true,不可关闭

processManagement:
  fork: true                  # 若由 systemd 管理进程,应设为 false
  pidFilePath: /var/run/mongodb/mongod.pid
  timeZoneInfo: /usr/share/zoneinfo

# 性能调优参数
setParameter:
  enableLocalhostAuthBypass: false
  logLevel: 0
  tcpFastOpenServer: true
  notablescan: false
  authenticationMechanisms: SCRAM-SHA-1,SCRAM-SHA-256

# 操作 profiling
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
  slowOpSampleRate: 1.0       # rateLimit 为 Percona 发行版专有选项,社区版使用 slowOpSampleRate
配置差异处理:
对于从节点和仲裁节点,需要调整以下参数:
- dbPath:对应节点的数据目录
- bindIp:对应节点的IP地址
- 仲裁节点可以省略 storage 部分的大部分配置
2.3 副本集初始化与配置
初始化副本集:
javascript
// 在其中一个节点上执行初始化(初始化前还没有主节点)
rs.initiate({
  _id: "rs0",
  version: 1,
  members: [
    {
      _id: 0,
      host: "192.168.1.101:27017",
      priority: 2,   // 更高的优先级,倾向成为主节点
      tags: { "dc": "east", "role": "primary" }
    },
    {
      _id: 1,
      host: "192.168.1.102:27017",
      priority: 1,
      tags: { "dc": "east", "role": "secondary" }
    },
    {
      _id: 2,
      host: "192.168.1.103:27017",
      priority: 1,
      tags: { "dc": "west", "role": "secondary" }
    },
    {
      _id: 3,
      host: "192.168.1.104:27017",
      arbiterOnly: true,   // 标记为仲裁节点
      priority: 0,
      tags: { "dc": "central", "role": "arbiter" }
    }
  ],
  settings: {
    chainingAllowed: true,            // 允许从节点从其他从节点同步
    heartbeatIntervalMillis: 2000,
    heartbeatTimeoutSecs: 10,
    electionTimeoutMillis: 10000,
    catchUpTimeoutMillis: 60000,
    catchUpTakeoverDelayMillis: 30000,
    getLastErrorModes: {},
    // 注意: getLastErrorDefaults 自 4.4 起已弃用,
    // 默认读写关注建议改用 setDefaultRWConcern 命令设置
    getLastErrorDefaults: {
      w: "majority",
      wtimeout: 5000
    }
  }
})

// 等待初始化完成
sleep(10000)

// 查看副本集状态
rs.status()

// 查看详细的配置信息
rs.conf()

// 检查各节点状态
db.adminCommand({ replSetGetStatus: 1 })

// 验证写关注设置
db.adminCommand({ getDefaultRWConcern: 1 })
副本集状态分析:
javascript
// 详细的副本集状态分析函数
function analyzeReplicaSetStatus() {
  const status = rs.status();
  const config = rs.conf();
  print("=== 副本集状态分析 ===");
  // 分析每个成员的状态(priority 来自 rs.conf(),rs.status() 不含该字段)
  status.members.forEach(member => {
    const cfg = config.members.find(c => c.host === member.name);
    print(`节点: ${member.name}`);
    print(`  状态: ${member.stateStr}`);
    print(`  落后当前时间: ${member.optimeDate ? new Date() - member.optimeDate : 'N/A'} ms`);
    print(`  最后一次心跳: ${member.lastHeartbeat}`);
    print(`  投票权: ${cfg ? cfg.votes : 'N/A'}`);
    print(`  优先级: ${cfg ? cfg.priority : 'N/A'}`);
    print(`  配置版本: ${member.configVersion}`);
    print("---");
  });
  // 分析选举信息(electionTime/electionDate 只出现在主节点成员上)
  const primary = status.members.find(m => m.state === 1);
  if (primary) {
    print(`当前主节点: ${primary.name}`);
    print(`上次选举时间: ${primary.electionDate}`);
  }
  // 检查健康状况
  const unhealthyMembers = status.members.filter(m =>
    m.state !== 1 && m.state !== 2 && m.state !== 7 // 非 PRIMARY/SECONDARY/ARBITER
  );
  if (unhealthyMembers.length > 0) {
    print("警告: 发现不健康节点");
    unhealthyMembers.forEach(m => print(`  - ${m.name}: ${m.stateStr}`));
  }
  return status;
}

// 执行分析
analyzeReplicaSetStatus();
第三章:高级配置与优化
3.1 读写分离与读偏好设置
读偏好(Read Preference)策略详解:
javascript
// 连接字符串中指定读偏好
const connectionString = "mongodb://192.168.1.101:27017,192.168.1.102:27017,192.168.1.103:27017/" +
"?replicaSet=rs0" +
"&readPreference=secondaryPreferred" +
"&maxStalenessSeconds=120" +
"&readPreferenceTags=dc:east" +
"&connectTimeoutMS=30000" +
"&socketTimeoutMS=60000" +
"&serverSelectionTimeoutMS=30000";
// 在驱动程序中配置读偏好(Node.js 驱动 4.x)
const { MongoClient, ReadPreference } = require('mongodb');
const client = new MongoClient(connectionString, {
  readPreference: ReadPreference.SECONDARY_PREFERRED,
  readPreferenceTags: [{ dc: 'east' }],
  maxStalenessSeconds: 120,
  maxPoolSize: 50,          // 4.x 驱动中 poolSize 已更名为 maxPoolSize
  minPoolSize: 10,
  maxIdleTimeMS: 30000,
  waitQueueTimeoutMS: 10000
});
// 在集合级别设置读偏好与读写关注
// (4.x 驱动通过 options 参数传入,而不是链式的 withReadPreference 方法)
const collection = db.collection('users', {
  readPreference: ReadPreference.SECONDARY,
  readConcern: { level: 'majority' },
  writeConcern: { w: 'majority' }
});
// 读偏好选项详细说明:
const readPreferences = {
primary: {
description: "只从主节点读取,提供强一致性",
useCase: "需要最新数据的读写操作",
consistency: "强一致性"
},
primaryPreferred: {
description: "优先从主节点,不可用时从副本节点",
useCase: "可以容忍短暂陈旧数据的应用",
consistency: "最终一致性"
},
secondary: {
description: "只从副本节点读取",
useCase: "报表查询、数据分析等后台任务",
consistency: "陈旧数据"
},
secondaryPreferred: {
description: "优先从副本节点读取",
useCase: "读多写少的应用,减轻主节点压力",
consistency: "陈旧数据"
},
nearest: {
description: "从网络延迟最低的节点读取",
useCase: "地理分布式应用,追求最低延迟",
consistency: "不确定"
}
};
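readPreferenceTags 的匹配语义大致是:按顺序尝试每个标签文档,选出自身标签完全包含该文档的节点;第一个有匹配结果的标签文档即生效,空文档 {} 匹配任意节点。下面用纯 Python 模拟这一选择逻辑(示意草稿,示例节点与标签均为假设值,并非驱动的内部实现):

```python
def select_by_tags(members, tag_sets):
    """模拟 readPreferenceTags:依次尝试每个标签集,返回第一批匹配到的节点。

    members:  [{"host": ..., "tags": {...}}, ...]
    tag_sets: [{"dc": "east"}, {}]  # 空字典 {} 匹配任意节点(常用作兜底)
    """
    for tags in tag_sets:
        matched = [m for m in members
                   if all(m["tags"].get(k) == v for k, v in tags.items())]
        if matched:
            return matched
    return []  # 所有标签集都无匹配:驱动此时会报无可用服务器

members = [
    {"host": "192.168.1.102:27017", "tags": {"dc": "east"}},
    {"host": "192.168.1.103:27017", "tags": {"dc": "west"}},
]
# 优先选 east 的节点,匹配不到时退回任意节点
print([m["host"] for m in select_by_tags(members, [{"dc": "east"}, {}])])
```

这也解释了为什么连接串里通常要在具体标签后追加一个空标签集:否则标签全部失配时读请求会直接失败。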
3.2 写关注(Write Concern)配置
写关注级别详解:
javascript
// 不同写关注级别的实际应用
db.orders.insertOne(
{
orderId: "12345",
customer: "John Doe",
amount: 100.50,
items: ["item1", "item2"]
},
{
writeConcern: {
w: "majority", // 需要多数节点确认
j: true, // 等待日志持久化
wtimeout: 5000 // 5秒超时
}
}
);
// 基于标签的写关注
// w 不能直接内嵌标签条件,必须引用 settings.getLastErrorModes 中
// 定义的自定义写关注模式名(见下方的副本集配置)
db.reviews.insertOne(
  {
    productId: "p123",
    rating: 5,
    comment: "Excellent product"
  },
  {
    writeConcern: {
      w: "multipleDCs",   // 自定义模式: 需要跨多个数据中心的节点确认
      wtimeout: 3000
    }
  }
);
// 在副本集配置中定义自定义写关注模式
// 注意: getLastErrorModes 的取值格式是 { 标签名: 不同标签值的数量 },
// 值必须是数字,形如 { dc: "east" } 的写法是非法的。
// 若要表达"东部数据中心至少 2 个节点确认",需为东部各节点设置取值互不相同的
// 专用标签(如 eastRack: rack1/rack2),再定义 { eastRack: 2 }。
const cfg = rs.conf();
cfg.settings.getLastErrorModes = {
  // 需要持有 2 个不同 dc 标签值的节点确认(即写入跨越两个数据中心)
  multipleDCs: { dc: 2 }
};
rs.reconfig(cfg);
// 自 MongoDB 4.4 起 getLastErrorDefaults 已弃用,
// 默认写关注应改用下面的 setDefaultRWConcern 命令设置
// 验证写关注设置
db.adminCommand({
getDefaultRWConcern: 1
});
// 临时修改写关注
db.adminCommand({
setDefaultRWConcern: 1,
defaultWriteConcern: {
w: "majority",
wtimeout: 5000
}
});
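需要注意,wtimeout 超时并不代表写入失败——数据可能仍会在后台完成复制,因此重试必须建立在幂等写(如按 _id upsert)之上。下面是一个通用的重试骨架(纯 Python 示意:这里的 WTimeoutError 是本例自定义的占位异常,使用 pymongo 时对应 pymongo.errors.WTimeoutError):

```python
import time

class WTimeoutError(Exception):
    """占位异常:模拟写关注超时(pymongo 中为 pymongo.errors.WTimeoutError)。"""

def write_with_retry(do_write, max_retries=3, backoff_secs=0.0):
    """对可能抛出 WTimeoutError 的写操作做有限次退避重试。

    do_write 必须是幂等操作(例如按 _id 做 upsert),
    否则"超时但实际已写入"的情况会在重试时造成重复数据。
    """
    for attempt in range(1, max_retries + 1):
        try:
            return do_write()
        except WTimeoutError:
            if attempt == max_retries:
                raise
            time.sleep(backoff_secs * attempt)  # 线性退避

# 演示:前两次"超时",第三次成功
attempts = []
def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise WTimeoutError("waiting for replication")
    return "ok"

print(write_with_retry(flaky_write))  # ok
```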
3.3 连接池与性能优化
高级连接池配置:
java
// Java 驱动(4.x)连接配置示例
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import com.mongodb.ReadConcern;
import com.mongodb.ReadPreference;
import com.mongodb.WriteConcern;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.connection.ConnectionPoolSettings;
import java.util.concurrent.TimeUnit;

public class MongoConnectionManager {
    private static final String CONNECTION_STRING =
        "mongodb://192.168.1.101:27017,192.168.1.102:27017,192.168.1.103:27017/" +
        "?replicaSet=rs0" +
        "&readPreference=secondaryPreferred" +
        "&maxStalenessSeconds=120" +
        "&w=majority" +
        "&journal=true" +
        "&wtimeoutMS=5000";

    public static MongoClient createMongoClient() {
        // 注意: 连接池/套接字/心跳相关的 builder 方法都需要显式的 TimeUnit 参数
        MongoClientSettings settings = MongoClientSettings.builder()
            .applyConnectionString(new ConnectionString(CONNECTION_STRING))
            .applyToConnectionPoolSettings(builder -> builder
                .maxSize(100)                                        // 最大连接数
                .minSize(10)                                         // 最小连接数
                .maxWaitTime(120, TimeUnit.SECONDS)                  // 获取连接的最大等待时间
                .maxConnectionLifeTime(30, TimeUnit.MINUTES)         // 连接最大生命周期
                .maxConnectionIdleTime(10, TimeUnit.MINUTES)         // 最大空闲时间
                .maintenanceFrequency(60, TimeUnit.SECONDS)          // 维护频率
                .maintenanceInitialDelay(0, TimeUnit.MILLISECONDS))  // 初始延迟
            .applyToSocketSettings(builder -> builder
                .connectTimeout(30, TimeUnit.SECONDS)                // 连接超时
                .readTimeout(60, TimeUnit.SECONDS))                  // 读取超时
            .applyToServerSettings(builder -> builder
                .heartbeatFrequency(10, TimeUnit.SECONDS)            // 心跳频率
                .minHeartbeatFrequency(500, TimeUnit.MILLISECONDS))  // 最小心跳间隔
            .readPreference(ReadPreference.secondaryPreferred())
            .writeConcern(WriteConcern.MAJORITY.withWTimeout(5000, TimeUnit.MILLISECONDS))
            .readConcern(ReadConcern.MAJORITY)
            .retryReads(true)      // 启用读重试
            .retryWrites(true)     // 启用写重试
            .build();
        return MongoClients.create(settings);
    }

    // 打印连接池配置
    // (运行时统计需通过 ConnectionPoolListener 事件或 JMX 获取,MongoClient 不直接暴露)
    public static void printPoolSettings(MongoClientSettings settings) {
        ConnectionPoolSettings pool = settings.getConnectionPoolSettings();
        System.out.println("连接池配置:");
        System.out.println("  最大连接数: " + pool.getMaxSize());
        System.out.println("  最小连接数: " + pool.getMinSize());
        System.out.println("  最大等待时间: " + pool.getMaxWaitTime(TimeUnit.MILLISECONDS) + "ms");
        System.out.println("  最大生命周期: " + pool.getMaxConnectionLifeTime(TimeUnit.MILLISECONDS) + "ms");
    }
}
第四章:监控与维护
4.1 监控指标与告警系统
关键监控指标体系:
bash
#!/bin/bash
# MongoDB 副本集监控脚本
# 监控函数
monitor_replica_set() {
local host=${1:-localhost}
local port=${2:-27017}
echo "=== MongoDB 副本集监控 - $(date) ==="
# 1. 检查副本集状态
echo "1. 副本集状态:"
mongosh --host $host --port $port --eval "
try {
const status = rs.status();
print('状态: OK');
print('主节点: ' + status.members.find(m => m.state === 1).name);
print('健康节点数: ' + status.members.filter(m => m.health === 1).length);
print('总节点数: ' + status.members.length);
} catch (e) {
print('状态: ERROR - ' + e.message);
}
"
# 2. 检查复制延迟
echo "2. 复制延迟:"
mongosh --host $host --port $port --eval "
const status = rs.status();
const primary = status.members.find(m => m.state === 1);
status.members.forEach(member => {
if (member.state === 2) { // SECONDARY
const lagSecs = (primary.optimeDate - member.optimeDate) / 1000;
print(member.name + ': ' + lagSecs + ' 秒延迟');
}
});
"
# 3. 检查操作日志状态(getReplicationInfo 直接给出大小与时间窗口)
echo "3. 操作日志状态:"
mongosh --host $host --port $port --eval "
const info = db.getReplicationInfo();
print('配置大小: ' + info.logSizeMB + ' MB');
print('已用大小: ' + info.usedMB + ' MB');
print('时间窗口: ' + info.timeDiffHours + ' 小时');
"
# 4. 检查连接数
echo "4. 连接数统计:"
mongosh --host $host --port $port --eval "
const stats = db.serverStatus();
print('当前连接: ' + stats.connections.current);
print('可用连接: ' + stats.connections.available);
print('总创建连接: ' + stats.connections.totalCreated);
"
# 5. 检查内存使用(mem.mapped 仅 MMAPv1 存在,WiredTiger 下不再输出)
echo "5. 内存使用:"
mongosh --host $host --port $port --eval "
const mem = db.serverStatus().mem;
print(' resident: ' + mem.resident + ' MB');
print(' virtual: ' + mem.virtual + ' MB');
"
}
# 执行监控
monitor_replica_set "192.168.1.101" 27017
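脚本里的延迟计算也可以放到应用侧完成:从 replSetGetStatus 的返回中取主节点与各从节点的 optimeDate 求差即可。下面用一份构造的状态文档演示该计算(字段结构与 rs.status() 一致,节点与时间均为假设值):

```python
from datetime import datetime, timedelta

def replication_lags(status):
    """返回 {从节点名: 落后主节点的秒数},基于 members[].optimeDate。"""
    primary = next(m for m in status["members"] if m["state"] == 1)
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in status["members"] if m["state"] == 2  # state 2 = SECONDARY
    }

now = datetime(2024, 1, 1, 12, 0, 0)
status = {"members": [
    {"name": "192.168.1.101:27017", "state": 1, "optimeDate": now},
    {"name": "192.168.1.102:27017", "state": 2, "optimeDate": now - timedelta(seconds=3)},
    {"name": "192.168.1.103:27017", "state": 2, "optimeDate": now - timedelta(seconds=8)},
]}
print(replication_lags(status))
```

实际使用时把构造的 status 换成 `client.admin.command('replSetGetStatus')` 的返回即可;告警阈值可以直接比较返回的秒数。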
Prometheus + Grafana 监控配置:
yaml
# prometheus.yml 配置(假设各节点已部署 mongodb_exporter 与 node-exporter)
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'mongodb'
    metrics_path: /metrics
    static_configs:
      - targets: ['192.168.1.101:9216', '192.168.1.102:9216', '192.168.1.103:9216']
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
      - source_labels: [__address__]
        regex: '(.*):9216'
        target_label: host
        replacement: '$1'
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['192.168.1.101:9100', '192.168.1.102:9100', '192.168.1.103:9100']
4.2 备份与恢复策略
逻辑备份策略:
bash
#!/bin/bash
# MongoDB 逻辑备份脚本
# 配置参数
BACKUP_DIR="/backup/mongodb"
LOG_DIR="/var/log/mongodb"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=7
# 创建备份目录
mkdir -p $BACKUP_DIR/$DATE
mkdir -p $LOG_DIR
# 执行备份
echo "开始 MongoDB 备份: $DATE" | tee -a $LOG_DIR/backup.log
# 使用 mongodump 进行备份
mongodump \
--host="rs0/192.168.1.101:27017,192.168.1.102:27017,192.168.1.103:27017" \
--readPreference="secondary" \
--out=$BACKUP_DIR/$DATE \
--gzip \
--oplog \
--numParallelCollections=4 \
--verbose 2>&1 | tee -a $LOG_DIR/backup.log
# 检查备份结果
if [ $? -eq 0 ]; then
echo "备份成功完成: $DATE" | tee -a $LOG_DIR/backup.log
# 计算备份大小
BACKUP_SIZE=$(du -sh $BACKUP_DIR/$DATE | cut -f1)
echo "备份大小: $BACKUP_SIZE" | tee -a $LOG_DIR/backup.log
# 清理旧备份
find $BACKUP_DIR -type d -mtime +$RETENTION_DAYS -exec rm -rf {} \; 2>/dev/null
echo "已清理超过 $RETENTION_DAYS 天的旧备份" | tee -a $LOG_DIR/backup.log
else
echo "备份失败: $DATE" | tee -a $LOG_DIR/backup.log
exit 1
fi
# 备份验证
echo "开始备份验证..." | tee -a $LOG_DIR/backup.log
mongorestore --dryRun \
--objcheck \
$BACKUP_DIR/$DATE 2>&1 | tee -a $LOG_DIR/backup.log
if [ $? -eq 0 ]; then
echo "备份验证成功" | tee -a $LOG_DIR/backup.log
else
echo "备份验证失败" | tee -a $LOG_DIR/backup.log
exit 1
fi
物理备份策略:
bash
#!/bin/bash
# MongoDB 物理备份(文件系统快照)
# 配置参数
SNAPSHOT_DIR="/snapshots/mongodb"
DATA_DIR="/data/mongodb"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=3
# 检查是否在主节点上运行(注: 生产中更常见的做法是在 Secondary/Hidden 节点
# 上打快照以免影响主节点,此处沿用在主节点执行的示例;hello 是 isMaster 的新名称)
IS_PRIMARY=$(mongosh --host localhost --eval "db.hello().isWritablePrimary" --quiet)
if [ "$IS_PRIMARY" != "true" ]; then
    echo "错误: 本脚本要求在主节点上执行" >&2
    exit 1
fi
# 创建快照目录
mkdir -p $SNAPSHOT_DIR
# 锁定数据库写入
echo "锁定数据库写入..."
mongosh --host localhost --eval "db.fsyncLock()"
# 创建文件系统快照
echo "创建快照..."
lvcreate --size 10G --snapshot --name mongo-snap-$DATE /dev/vg0/mongo-lv
# 解锁数据库
echo "解锁数据库..."
mongosh --host localhost --eval "db.fsyncUnlock()"
# 挂载快照并备份
echo "备份快照..."
mkdir -p /mnt/snapshot
mount /dev/vg0/mongo-snap-$DATE /mnt/snapshot
# 使用 rsync 进行备份
rsync -av --delete \
--exclude="*.lock" \
--exclude="mongod.lock" \
/mnt/snapshot/ $SNAPSHOT_DIR/$DATE/
# 卸载和清理
umount /mnt/snapshot
lvremove -f /dev/vg0/mongo-snap-$DATE
# 清理旧快照
find $SNAPSHOT_DIR -type d -mtime +$RETENTION_DAYS -exec rm -rf {} \;
echo "物理备份完成: $SNAPSHOT_DIR/$DATE"
第五章:故障转移测试与验证
5.1 自动化测试框架
完整的故障转移测试套件:
python
#!/usr/bin/env python3
import pymongo
import time
import logging
import subprocess
import json
from datetime import datetime

class ComprehensiveFailoverTester:
    def __init__(self, connection_string, test_duration=3600):
        self.connection_string = connection_string
        self.test_duration = test_duration
        self.setup_logging()

    def setup_logging(self):
        """配置日志记录"""
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s - %(levelname)s - %(message)s',
            handlers=[
                logging.FileHandler('failover_test.log'),
                logging.StreamHandler()
            ]
        )
        self.logger = logging.getLogger(__name__)

    def run_comprehensive_test(self):
        """执行完整的故障转移测试套件"""
        test_results = {}
        try:
            # 测试序列(其余测试方法的实现方式与 test_primary_failover 类似)
            tests = [
                self.test_initial_connection,
                self.test_normal_operations,
                self.test_primary_failover,
                self.test_network_partition,
                self.test_secondary_failure,
                self.test_arbiter_failure,
                self.test_rolling_restart,
                self.test_config_change
            ]
            for test in tests:
                test_name = test.__name__
                self.logger.info(f"开始测试: {test_name}")
                start_time = time.time()
                result = test()
                duration = time.time() - start_time
                test_results[test_name] = {
                    'result': result,
                    'duration': duration,
                    'timestamp': datetime.now().isoformat()
                }
                if not result.get('success', False):
                    self.logger.error(f"测试失败: {test_name}")
                    break
                time.sleep(30)  # 测试间间隔
            # 生成测试报告
            self.generate_test_report(test_results)
        except Exception as e:
            self.logger.error(f"测试套件执行失败: {e}")
            raise

    def test_primary_failover(self):
        """测试主节点故障转移"""
        result = {'success': False, 'details': {}}
        try:
            # 1. 获取当前主节点(hello 是 isMaster 命令的新名称)
            client = pymongo.MongoClient(self.connection_string)
            primary_before = client.admin.command('hello')['primary']
            self.logger.info(f"当前主节点: {primary_before}")
            # 2. 停止主节点
            self.logger.info("停止主节点服务...")
            stop_command = f"ssh {primary_before.split(':')[0]} 'systemctl stop mongod'"
            subprocess.run(stop_command, shell=True, check=True)
            # 3. 监控故障转移
            start_time = time.time()
            timeout = 120  # 2分钟超时
            new_primary = None
            while time.time() - start_time < timeout:
                try:
                    # 使用其他节点连接
                    alt_client = pymongo.MongoClient(
                        self.connection_string.replace(primary_before, '192.168.1.102:27017'),
                        serverSelectionTimeoutMS=5000
                    )
                    hello = alt_client.admin.command('hello')
                    if hello.get('isWritablePrimary'):
                        new_primary = hello['primary']
                        break
                    time.sleep(2)
                except Exception as e:
                    self.logger.warning(f"等待故障转移: {e}")
                    time.sleep(5)
            if not new_primary:
                raise Exception("故障转移超时")
            # 4. 验证新主节点
            self.logger.info(f"新主节点: {new_primary}")
            result['details']['old_primary'] = primary_before
            result['details']['new_primary'] = new_primary
            result['details']['failover_time'] = time.time() - start_time
            # 5. 验证数据一致性
            self.verify_data_consistency()
            # 6. 恢复原主节点
            self.logger.info("恢复原主节点...")
            start_command = f"ssh {primary_before.split(':')[0]} 'systemctl start mongod'"
            subprocess.run(start_command, shell=True, check=True)
            # 等待节点重新加入
            time.sleep(30)
            result['success'] = True
        except Exception as e:
            self.logger.error(f"主节点故障转移测试失败: {e}")
            result['error'] = str(e)
        return result

    def verify_data_consistency(self):
        """验证数据一致性"""
        client = pymongo.MongoClient(self.connection_string)
        # 检查所有数据库的集合
        databases = client.list_database_names()
        for db_name in databases:
            if db_name in ['admin', 'local', 'config']:
                continue
            db = client[db_name]
            collections = db.list_collection_names()
            for collection_name in collections:
                # 检查文档数量一致性
                count_primary = db[collection_name].count_documents({})
                # 从其他节点检查(pymongo 以字符串指定读偏好)
                alt_client = pymongo.MongoClient(
                    self.connection_string,
                    readPreference='secondary'
                )
                count_secondary = alt_client[db_name][collection_name].count_documents({})
                if count_primary != count_secondary:
                    raise Exception(f"数据不一致: {db_name}.{collection_name}")

    def generate_test_report(self, results):
        """生成详细的测试报告"""
        report = {
            'timestamp': datetime.now().isoformat(),
            'test_duration': self.test_duration,
            'results': results,
            'summary': {
                'total_tests': len(results),
                'passed_tests': sum(1 for r in results.values() if r['result']['success']),
                'failed_tests': sum(1 for r in results.values() if not r['result']['success']),
                'total_duration': sum(r['duration'] for r in results.values())
            }
        }
        with open('failover_test_report.json', 'w') as f:
            json.dump(report, f, indent=2)
        self.logger.info(f"测试报告生成完成: {report['summary']}")

if __name__ == "__main__":
    connection_str = "mongodb://192.168.1.101:27017,192.168.1.102:27017,192.168.1.103:27017/?replicaSet=rs0"
    tester = ComprehensiveFailoverTester(connection_str)
    tester.run_comprehensive_test()
5.2 网络分区测试
网络分区模拟与测试:
python
def test_network_partition(self):
    """测试网络分区场景"""
    result = {'success': False, 'details': {}}
    try:
        client = pymongo.MongoClient(self.connection_string)
        # 1. 获取当前拓扑
        status = client.admin.command('replSetGetStatus')
        members = status['members']
        # 2. 模拟网络分区(将主节点与部分节点隔离)
        primary = next(m for m in members if m['stateStr'] == 'PRIMARY')
        secondaries = [m for m in members if m['stateStr'] == 'SECONDARY']
        # 隔离主节点与一个从节点
        isolated_secondary = secondaries[0]
        self.logger.info(f"模拟网络分区: 隔离 {primary['name']} 和 {isolated_secondary['name']}")
        # 使用 iptables 模拟网络分区
        isolate_commands = [
            f"ssh {primary['name'].split(':')[0]} 'iptables -A INPUT -s {isolated_secondary['name'].split(':')[0]} -j DROP'",
            f"ssh {isolated_secondary['name'].split(':')[0]} 'iptables -A INPUT -s {primary['name'].split(':')[0]} -j DROP'"
        ]
        for cmd in isolate_commands:
            subprocess.run(cmd, shell=True, check=True)
        # 3. 观察系统行为
        time.sleep(30)  # 等待系统反应
        # 4. 检查是否发生故障转移
        try:
            # 尝试从其他节点连接
            other_secondary = secondaries[1]['name']
            alt_client = pymongo.MongoClient(
                f"mongodb://{other_secondary}/?replicaSet=rs0",
                serverSelectionTimeoutMS=5000
            )
            new_status = alt_client.admin.command('replSetGetStatus')
            new_primary = next(m for m in new_status['members'] if m['stateStr'] == 'PRIMARY')
            result['details']['original_primary'] = primary['name']
            result['details']['new_primary'] = new_primary['name']
            result['details']['partition_duration'] = 30
        except Exception as e:
            self.logger.warning(f"网络分区期间无法连接: {e}")
        # 5. 恢复网络
        restore_commands = [
            f"ssh {primary['name'].split(':')[0]} 'iptables -D INPUT -s {isolated_secondary['name'].split(':')[0]} -j DROP'",
            f"ssh {isolated_secondary['name'].split(':')[0]} 'iptables -D INPUT -s {primary['name'].split(':')[0]} -j DROP'"
        ]
        for cmd in restore_commands:
            subprocess.run(cmd, shell=True, check=True)
        # 6. 验证数据一致性
        time.sleep(60)  # 等待数据同步
        self.verify_data_consistency()
        result['success'] = True
    except Exception as e:
        self.logger.error(f"网络分区测试失败: {e}")
        result['error'] = str(e)
    return result
第六章:生产环境最佳实践
6.1 多数据中心部署策略
跨数据中心副本集配置:
yaml
# 三数据中心生产环境配置
# 数据中心1: 主数据中心 (东部)
# 数据中心2: 备份数据中心 (西部)
# 数据中心3: 仲裁节点数据中心 (中部)

# mongod.conf 主节点配置
replication:
  replSetName: "rs0"
  oplogSizeMB: 51200   # 50GB oplog,支持更长的复制窗口
javascript
// 副本集配置(2+2 个数据节点加 1 个仲裁节点,共 5 票,保持奇数投票)
rs.reconfig({
  _id: "rs0",
  version: 2,
  members: [
    // 数据中心1 - 主站点
    {
      _id: 0,
      host: "dc1-mongo1:27017",
      priority: 3,   // 最高优先级
      tags: { "dc": "east", "rack": "rack1" }
    },
    {
      _id: 1,
      host: "dc1-mongo2:27017",
      priority: 2,
      tags: { "dc": "east", "rack": "rack2" }
    },
    // 数据中心2 - 备份站点
    {
      _id: 2,
      host: "dc2-mongo1:27017",
      priority: 1,   // 较低优先级
      tags: { "dc": "west", "rack": "rack1" },
      hidden: false,
      secondaryDelaySecs: 0   // 5.0 起 slaveDelay 已更名为 secondaryDelaySecs
    },
    {
      _id: 3,
      host: "dc2-mongo2:27017",
      priority: 1,
      tags: { "dc": "west", "rack": "rack2" },
      hidden: false
    },
    // 数据中心3 - 仲裁站点
    {
      _id: 4,
      host: "dc3-arbiter1:27017",
      arbiterOnly: true,
      priority: 0,
      tags: { "dc": "central", "role": "arbiter" }
    }
  ],
  settings: {
    chainingAllowed: false,        // 禁止链式复制
    heartbeatIntervalMillis: 2000,
    heartbeatTimeoutSecs: 20,      // 跨数据中心需要更长的超时
    electionTimeoutMillis: 10000,
    catchUpTimeoutMillis: 120000,
    // getLastErrorModes 的值是 { 标签名: 不同标签值的数量 },必须为数字;
    // 形如 { "dc": "east" } 的写法是非法的
    getLastErrorModes: {
      multiDC: { "dc": 2 }         // 需要跨两个数据中心的节点确认
    },
    // 自 4.4 起 getLastErrorDefaults 已弃用,建议改用 setDefaultRWConcern
    getLastErrorDefaults: {
      w: "majority",
      wtimeout: 10000              // 跨数据中心需要更长的超时
    }
  }
})
6.2 安全加固配置
全面的安全配置:
yaml
# mongod.conf 安全配置
# 注: enableEncryption(静态加密)为 Enterprise/Percona 发行版功能
security:
  authorization: enabled
  keyFile: /etc/mongodb/keyfile
  javascriptEnabled: false
  redactClientLogData: true
  enableEncryption: true
  encryptionCipherMode: AES256-CBC
  encryptionKeyFile: /etc/mongodb/encryption-keyfile

net:
  tls:                             # 4.2 起 net.ssl 系列选项已更名为 net.tls
    mode: requireTLS
    certificateKeyFile: /etc/mongodb/ssl/mongodb.pem
    CAFile: /etc/mongodb/ssl/ca.pem
    allowConnectionsWithoutCertificates: false
    allowInvalidCertificates: false
    allowInvalidHostnames: false
    disabledProtocols: TLS1_0,TLS1_1   # 禁用不安全的TLS版本
  compression:
    compressors: snappy,zlib

setParameter:
  enableLocalhostAuthBypass: false
  logLevel: 1
  tcpFastOpenServer: true
  notablescan: true                # 禁止全表扫描(会令无索引查询直接报错,生产启用需谨慎)
  internalQueryMaxBlockingSortMemoryUsageBytes: 104857600  # 4.4 起替代 internalQueryExecMaxBlockingSortBytes
密钥文件管理:
bash
#!/bin/bash
# 密钥文件生成和管理脚本
# 生成密钥文件(在其中一个节点执行)
openssl rand -base64 756 > /etc/mongodb/keyfile
chmod 400 /etc/mongodb/keyfile
chown mongodb:mongodb /etc/mongodb/keyfile
# 分发到所有节点
NODES=("192.168.1.101" "192.168.1.102" "192.168.1.103" "192.168.1.104")
for node in "${NODES[@]}"; do
scp /etc/mongodb/keyfile mongodb@$node:/etc/mongodb/keyfile
ssh mongodb@$node "chmod 400 /etc/mongodb/keyfile && chown mongodb:mongodb /etc/mongodb/keyfile"
done
# 生成加密密钥文件
openssl rand -base64 32 > /etc/mongodb/encryption-keyfile
chmod 400 /etc/mongodb/encryption-keyfile
chown mongodb:mongodb /etc/mongodb/encryption-keyfile
# 分发加密密钥文件
for node in "${NODES[@]}"; do
scp /etc/mongodb/encryption-keyfile mongodb@$node:/etc/mongodb/encryption-keyfile
ssh mongodb@$node "chmod 400 /etc/mongodb/encryption-keyfile && chown mongodb:mongodb /etc/mongodb/encryption-keyfile"
done
第七章:故障排查与恢复
7.1 常见问题处理指南
节点状态异常处理:
javascript
// 节点状态检查与恢复函数
function diagnoseAndRecoverNode(nodeAddress) {
print(`诊断节点: ${nodeAddress}`);
try {
const conn = new Mongo(nodeAddress);
const status = conn.adminCommand({ replSetGetStatus: 1 });
// 检查节点状态
const member = status.members.find(m => m.name === nodeAddress);
if (!member) {
print("错误: 节点不在副本集配置中");
return false;
}
print(`当前状态: ${member.stateStr}`);
print(`健康状态: ${member.health === 1 ? '健康' : '不健康'}`);
print(`最后一次心跳: ${member.lastHeartbeat}`);
// rs.status() 并无 optimeLag 字段,用 optimeDate 与当前时间估算
print(`复制延迟: ${member.optimeDate ? (new Date() - member.optimeDate) / 1000 : 'N/A'} 秒`);
// 根据状态采取相应措施
switch (member.state) {
case 1: // PRIMARY
print("节点是主节点,状态正常");
return true;
case 2: // SECONDARY
if (member.health === 0) {
print("警告: 从节点不健康");
// 检查复制延迟(由 optimeDate 估算)
const lagSecs = member.optimeDate ? (new Date() - member.optimeDate) / 1000 : 0;
if (lagSecs > 60) {
print("复制延迟过高,可能需要重新同步");
return restartReplication(nodeAddress);
}
}
return true;
case 3: // RECOVERING
print("节点正在恢复中,请等待...");
return checkRecoveryProgress(nodeAddress);
case 4: // FATAL
print("节点处于致命状态,需要完全重新同步");
return forceResync(nodeAddress);
case 5: // STARTUP2
print("节点正在初始化,请等待...");
return waitForInitialSync(nodeAddress);
case 6: // UNKNOWN
print("节点状态未知,检查网络连接");
return checkNetworkConnectivity(nodeAddress);
case 7: // ARBITER
print("仲裁节点,状态正常");
return true;
case 8: // DOWN
print("节点宕机,检查服务状态");
return restartMongoService(nodeAddress);
case 9: // ROLLBACK
print("节点正在回滚操作");
return monitorRollbackProgress(nodeAddress);
case 10: // REMOVED
print("节点已被移除,需要重新添加");
return reAddNodeToReplicaSet(nodeAddress);
default:
print("未知状态,需要进一步调查");
return false;
}
} catch (e) {
print(`连接节点失败: ${e.message}`);
return false;
}
}
// 强制重新同步节点
function forceResync(nodeAddress) {
print(`强制重新同步节点: ${nodeAddress}`);
try {
// 1. 从副本集移除节点
const config = rs.conf();
const memberId = config.members.findIndex(m => m.host === nodeAddress);
if (memberId === -1) {
print("错误: 节点不在配置中");
return false;
}
config.members.splice(memberId, 1);
rs.reconfig(config, { force: true });
// 2. 清理节点数据
// 注意: 这需要在节点服务器上执行
print("请在目标服务器上执行: sudo systemctl stop mongod");
print("请在目标服务器上执行: sudo rm -rf /data/mongodb/*");
print("请在目标服务器上执行: sudo systemctl start mongod");
// 3. 重新添加节点
rs.add(nodeAddress);
// 4. 等待初始同步完成
print("等待初始同步完成...");
waitForInitialSync(nodeAddress);
return true;
} catch (e) {
print(`重新同步失败: ${e.message}`);
return false;
}
}
7.2 数据一致性验证工具
高级数据一致性检查:
javascript
// 全面的数据一致性验证工具
function comprehensiveConsistencyCheck() {
print("开始全面的数据一致性检查...");
const startTime = new Date();
const inconsistencies = [];
try {
// 获取副本集状态
const status = rs.status();
const primary = status.members.find(m => m.state === 1);
if (!primary) {
throw new Error("没有找到主节点");
}
// 连接到主节点
const primaryConn = new Mongo(primary.name);
const primaryDB = primaryConn.getDB("admin");
// 检查所有数据库
const databases = primaryDB.adminCommand({ listDatabases: 1 }).databases;
for (const dbInfo of databases) {
const dbName = dbInfo.name;
// 跳过系统数据库
if (['admin', 'local', 'config'].includes(dbName)) {
continue;
}
print(`检查数据库: ${dbName}`);
const db = primaryConn.getDB(dbName);
const collections = db.getCollectionNames();
for (const collectionName of collections) {
if (collectionName.startsWith('system.')) {
continue;
}
print(` 检查集合: ${collectionName}`);
// 检查主节点数据
const primaryCount = db[collectionName].countDocuments({});
const primaryHash = calculateCollectionHash(db[collectionName]);
// 检查所有从节点
for (const member of status.members) {
if (member.state === 2) { // SECONDARY
try {
const secondaryConn = new Mongo(member.name);
const secondaryDB = secondaryConn.getDB(dbName);
const secondaryCount = secondaryDB[collectionName].countDocuments({});
const secondaryHash = calculateCollectionHash(secondaryDB[collectionName]);
if (primaryCount !== secondaryCount) {
inconsistencies.push({
database: dbName,
collection: collectionName,
node: member.name,
issue: `文档数量不一致: 主节点=${primaryCount}, 从节点=${secondaryCount}`
});
}
if (primaryHash !== secondaryHash) {
inconsistencies.push({
database: dbName,
collection: collectionName,
node: member.name,
issue: `数据哈希不一致: 主节点=${primaryHash}, 从节点=${secondaryHash}`
});
}
} catch (e) {
inconsistencies.push({
database: dbName,
collection: collectionName,
node: member.name,
issue: `连接失败: ${e.message}`
});
}
}
}
}
}
// 检查操作日志一致性(checkOplogConsistency 为另行实现的辅助函数)
if (typeof checkOplogConsistency === 'function') {
const oplogStatus = checkOplogConsistency();
if (oplogStatus.inconsistencies.length > 0) {
inconsistencies.push(...oplogStatus.inconsistencies);
}
}
// 生成报告
const duration = new Date() - startTime;
print(`一致性检查完成,耗时: ${duration}ms`);
if (inconsistencies.length === 0) {
print("所有节点数据一致");
return { success: true, inconsistencies: [] };
} else {
print(`发现 ${inconsistencies.length} 个不一致问题:`);
inconsistencies.forEach(issue => {
print(` - ${issue.database}.${issue.collection} @ ${issue.node}: ${issue.issue}`);
});
return { success: false, inconsistencies: inconsistencies };
}
} catch (e) {
print(`一致性检查失败: ${e.message}`);
return { success: false, error: e.message };
}
}
// 计算集合数据哈希
function calculateCollectionHash(collection) {
const hashResult = collection.aggregate([
{ $sort: { _id: 1 } }, // 确保顺序一致
{ $project: {
hashInput: { $objectToArray: "$$ROOT" }
}},
{ $unwind: "$hashInput" },
{ $group: {
_id: null,
totalHash: {
$sum: {
$function: {
// 注意: $function 在服务器端执行,引用不到客户端定义的 hashCode,
// 因此在此处内联同样的哈希逻辑
body: function(k, v) {
const str = k + JSON.stringify(v);
let hash = 0;
for (let i = 0; i < str.length; i++) {
hash = ((hash << 5) - hash) + str.charCodeAt(i);
hash |= 0;
}
return hash;
},
args: ["$hashInput.k", "$hashInput.v"],
lang: "js"
}
}
}
}}
]).next();
return hashResult ? hashResult.totalHash : 0;
}
// 简单的哈希函数
function hashCode(str) {
let hash = 0;
for (let i = 0; i < str.length; i++) {
hash = ((hash << 5) - hash) + str.charCodeAt(i);
hash |= 0;
}
return hash;
}
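除了上面的自定义聚合哈希,MongoDB 还内置了 dbHash 命令(db.runCommand({ dbHash: 1 })),它直接返回每个集合的 MD5,跨节点对比更简单。下面的纯 Python 函数演示对比两份 dbHash 结果的逻辑(输入为返回文档中的 collections 字段,示例哈希值为假设值):

```python
def diff_dbhash(primary_hashes, secondary_hashes):
    """对比主从两份 dbHash 的 collections 映射,返回不一致项的描述列表。"""
    issues = []
    for coll, h in primary_hashes.items():
        if coll not in secondary_hashes:
            issues.append(f"{coll}: 从节点缺失该集合")
        elif secondary_hashes[coll] != h:
            issues.append(f"{coll}: 哈希不一致")
    for coll in secondary_hashes:
        if coll not in primary_hashes:
            issues.append(f"{coll}: 主节点缺失该集合")
    return issues

# 假设这是分别在主、从节点上执行 dbHash 得到的 collections 字段
primary = {"users": "d41d8cd9", "orders": "aabbccdd"}
secondary = {"users": "d41d8cd9", "orders": "11223344", "tmp": "ffee0011"}
print(diff_dbhash(primary, secondary))
```

注意 dbHash 在大集合上开销不小,且要求对比时两节点的数据处于同一复制点,适合在低峰期或停写窗口内使用。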
通过这个全面的 MongoDB 副本集部署指南,您应该能够构建一个高可用、高性能的 MongoDB 集群,并具备完善的监控、备份和故障处理能力。记得定期进行故障转移测试,确保系统在真实故障时能够正常工作。