1. Why Automating MongoDB Operations Matters
1.1 Operational Challenges
As data volumes grow, the operational workload of running MongoDB in production increases significantly. Operations teams face the following challenges:
- Repetitive tasks: backups, monitoring, and index management must run on a regular schedule
- Human error: manual operations are error-prone, especially complex ones
- Time consumption: operators spend much of their time on routine tasks
- Inconsistency: the same task performed by different people can produce different results
- Slow response: manual monitoring makes it hard to detect and react to problems promptly
Statistics: according to a Gartner report, 68% of database problems stem from operational mistakes, while automation can reduce the operational error rate by 85%.
1.2 The Value of Automated Operations
| Dimension | Value | Quantified metric |
|---|---|---|
| Efficiency | Less manual intervention, faster task execution | Task time reduced by 70% |
| Fewer errors | Eliminates manual operation mistakes | Error rate reduced by 85% |
| Consistency | Standardized procedures | 100% task consistency |
| Scalability | Handles larger data volumes with ease | Supports 10x data growth |
| Cost optimization | Lower operations staffing needs | Operations cost reduced by 40% |
1.3 An Automation Maturity Model
| Level | Characteristics | Indicator |
|---|---|---|
| Initial | Manual operations, no scripts | 0-10% of tasks automated |
| Repeatable | Scripts exist but are inconsistent | 10-30% of tasks automated |
| Defined | Standardized scripts and procedures | 30-60% of tasks automated |
| Managed | Comprehensive automation with monitoring | 60-90% of tasks automated |
| Optimizing | Predictive, AI-assisted automation | 90-100% of tasks automated |
2. Classifying Routine Maintenance Tasks and Batching Them
2.1 Core Maintenance Task Categories
| Task type | Frequency | Manual difficulty | Suitability for automation |
|---|---|---|---|
| Backup and recovery | Daily/hourly | High | 100% |
| Performance monitoring | Continuous | Medium | 100% |
| Index management | Daily/weekly | High | 80% |
| Data cleanup | Daily/weekly | Medium | 90% |
| Security auditing | Daily/weekly | High | 75% |
| Configuration management | Occasional | Low | 60% |
| Cluster management | Occasional | High | 85% |
2.2 Core Ideas Behind Batch Processing
2.2.1 Task Decomposition
Break complex tasks into small, repeatable units:
- Atomic operations: each script does exactly one thing
- Parameterization: control environments and scenarios via parameters
- Modularity: build reusable function libraries
2.2.2 Process Standardization
- Standardized input: uniform input formats and validation
- Standardized output: uniform log and report formats
- Standardized error handling: uniform error codes and handling procedures
2.2.3 Designing for Maintainability
- Documentation: every script has clear documentation
- Version control: manage scripts with Git
- Testing: verify script reliability
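The decomposition principles above can be sketched as a small reusable function library. This is an illustrative sketch; the file name `mongo-lib.sh` and the function names are assumptions, not part of any standard tooling:

```shell
#!/bin/bash
# mongo-lib.sh -- illustrative reusable function library (names are hypothetical)

# Atomic operation: one function, one job.
# Parameterized: environment and paths arrive as arguments, not hard-coded values.
build_backup_path() {
    local base_dir=$1      # e.g. /backup/mongodb
    local env_name=$2      # e.g. production, staging
    local stamp
    stamp=$(date +%Y%m%d-%H%M%S)
    echo "${base_dir}/${env_name}/${stamp}"
}

# Standardized input validation: fail fast with a uniform error message.
require_nonempty() {
    local name=$1 value=$2
    if [ -z "${value}" ]; then
        echo "ERROR: parameter '${name}' must not be empty" >&2
        return 1
    fi
}

# Modular reuse: other scripts source this file and compose the pieces, e.g.
#   source mongo-lib.sh
#   require_nonempty "env" "$ENV" && path=$(build_backup_path /backup/mongodb "$ENV")
```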
3. Script Implementations for Core Maintenance Tasks
3.1 Automating Backup and Recovery
3.1.1 Full Backup Script
```bash
#!/bin/bash
# mongodb-backup.sh
# MongoDB full backup script
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/backup/mongodb/full"
RETENTION_DAYS=30

# Create the backup directory if it does not exist
mkdir -p "${BACKUP_DIR}/${DATE}"
echo "[$(date)] Starting backup to ${BACKUP_DIR}/${DATE}"

# Run the backup (mongodump skips internal system collections by default)
if ! mongodump --uri "mongodb://backup_user:backup_password@localhost:27017" \
    --gzip \
    --archive="${BACKUP_DIR}/${DATE}/full-backup-${DATE}.gz"; then
    echo "[$(date)] Backup FAILED" >> "${BACKUP_DIR}/backup.log"
    exit 1
fi

# Verify the archive is readable (--dryRun parses the archive without restoring)
echo "[$(date)] Verifying backup integrity..."
mongorestore --dryRun \
    --gzip \
    --archive="${BACKUP_DIR}/${DATE}/full-backup-${DATE}.gz" \
    --noIndexRestore

# Remove backups older than the retention window
find "${BACKUP_DIR}" -maxdepth 1 -type d -mtime +${RETENTION_DAYS} -exec rm -rf {} \;

# Record backup metadata
{
    echo "Backup completed: ${DATE}"
    echo "Location: ${BACKUP_DIR}/${DATE}"
    echo "Size: $(du -sh "${BACKUP_DIR}/${DATE}")"
} >> "${BACKUP_DIR}/backup.log"
echo "[$(date)] Backup completed successfully"
exit 0
```
3.1.2 Incremental Backup Script (Oplog-Based)
```bash
#!/bin/bash
# mongodb-incremental-backup.sh
# MongoDB incremental backup script (oplog-based)
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/backup/mongodb/incremental"
RETENTION_HOURS=24

# Find the most recent backup directory (named YYYYMMDD-HHMMSS)
LATEST_BACKUP=$(ls -t "${BACKUP_DIR}" 2>/dev/null | head -n 1)
if [ -z "${LATEST_BACKUP}" ]; then
    echo "No previous backup found. Running full backup instead..."
    /backup/mongodb/mongodb-backup.sh
    exit $?
fi

# Convert the directory name back into an epoch timestamp
LATEST_TIMESTAMP=$(date -d "${LATEST_BACKUP:0:8} ${LATEST_BACKUP:9:2}:${LATEST_BACKUP:11:2}:${LATEST_BACKUP:13:2}" +%s)
CURRENT_TIMESTAMP=$(date +%s)
DIFF_HOURS=$(( (CURRENT_TIMESTAMP - LATEST_TIMESTAMP) / 3600 ))

# If more than 24 hours have passed, fall back to a full backup
if [ ${DIFF_HOURS} -gt 24 ]; then
    echo "Time since last backup (${DIFF_HOURS} hours) exceeds 24 hours. Running full backup..."
    /backup/mongodb/mongodb-backup.sh
    exit $?
fi

# Dump only the oplog entries newer than the previous backup; replaying them with
# mongorestore --oplogReplay against the last full backup gives point-in-time recovery
mkdir -p "${BACKUP_DIR}/${DATE}"
mongodump --uri "mongodb://backup_user:backup_password@localhost:27017" \
    --db local --collection oplog.rs \
    --query "{\"ts\": {\"\$gt\": {\"\$timestamp\": {\"t\": ${LATEST_TIMESTAMP}, \"i\": 0}}}}" \
    --gzip \
    --archive="${BACKUP_DIR}/${DATE}/incremental-backup-${DATE}.gz"

# Record incremental backup metadata
{
    echo "Incremental backup: ${DATE}"
    echo "Based on: ${LATEST_BACKUP}"
    echo "Timestamp: ${LATEST_TIMESTAMP}"
} >> "${BACKUP_DIR}/backup.log"

# Remove incremental backups past retention (RETENTION_HOURS is in hours; -mmin takes minutes)
find "${BACKUP_DIR}" -maxdepth 1 -type d -mmin +$(( RETENTION_HOURS * 60 )) -exec rm -rf {} \;
echo "[$(date)] Incremental backup completed"
exit 0
```
3.1.3 Automated Recovery Test Script
```bash
#!/bin/bash
# backup-recovery-test.sh
# Backup recovery test script
DATE=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="/backup/mongodb/full"

# Pick the most recent full backup
LATEST_BACKUP=$(ls -t "${BACKUP_DIR}" | head -n 1)
if [ -z "${LATEST_BACKUP}" ]; then
    echo "No backups found. Cannot run recovery test."
    exit 1
fi
echo "[$(date)] Testing backup: ${BACKUP_DIR}/${LATEST_BACKUP}"

# Restore the test namespace into a scratch database (test.* -> test_recovery.*)
mongorestore --gzip \
    --archive="${BACKUP_DIR}/${LATEST_BACKUP}/full-backup-${LATEST_BACKUP}.gz" \
    --drop \
    --noIndexRestore \
    --nsInclude="test.*" \
    --nsFrom="test.*" \
    --nsTo="test_recovery.*"

# Verify the restore
DOC_COUNT=$(mongo --quiet --eval "db.getSiblingDB('test_recovery').sample.count()")
if [ "${DOC_COUNT}" -eq 0 ]; then
    echo "Recovery test FAILED: No documents found"
    echo "[$(date)] Recovery test FAILED" >> /var/log/mongodb/recovery-test.log
    exit 1
else
    echo "Recovery test PASSED: ${DOC_COUNT} documents restored"
    echo "[$(date)] Recovery test PASSED" >> /var/log/mongodb/recovery-test.log
fi

# Clean up the scratch database
mongo --quiet --eval "db.getSiblingDB('test_recovery').dropDatabase()"

# Send the test report
echo "MongoDB Backup Recovery Test - $(date)" | \
    mail -s "MongoDB Backup Recovery Test Result" dba@yourcompany.com \
    -a "/var/log/mongodb/recovery-test.log"
exit 0
```
3.2 Performance Monitoring and Analysis
3.2.1 Real-Time Performance Monitoring Script
```bash
#!/bin/bash
# mongo-performance-monitor.sh
# Real-time MongoDB performance monitoring script

# Configuration
INTERVAL=15          # seconds between samples
DURATION=3600        # total monitoring time in seconds
OUTPUT_DIR="/var/log/mongodb/performance"
THRESHOLDS='{"connection_utilization": 0.8, "query_latency_95": 500, "lock_contention": 0.1}'

mkdir -p "${OUTPUT_DIR}"
LOG_FILE="${OUTPUT_DIR}/performance-$(date +%Y%m%d).log"

# Main monitoring loop
end_time=$(( $(date +%s) + DURATION ))
while [ "$(date +%s)" -lt ${end_time} ]; do
    # Collect metrics (--quiet suppresses shell banners so the output is pure JSON)
    METRICS=$(mongo --quiet --eval '
        var stats = db.serverStatus();
        var locks = stats.locks;
        var connections = stats.connections;
        var lockContention = (locks.Global && locks.Global.acquireWaitCount) ?
            locks.Global.acquireWaitCount.W / locks.Global.acquireCount.W : 0;
        var connectionUtilization = connections.current / (connections.current + connections.available);
        // Latest query latency from the profiler (requires profiling enabled on this db)
        var queryLatency = 0;
        try {
            var profile = db.system.profile.find({ op: "query" }, { millis: 1 })
                .sort({ ts: -1 }).limit(1).toArray()[0];
            if (profile) { queryLatency = profile.millis; }
        } catch (e) { /* profiling not enabled; ignore */ }
        print(JSON.stringify({
            timestamp: new Date().toISOString(),
            connection_utilization: connectionUtilization,
            lock_contention: lockContention,
            query_latency: queryLatency,
            opcounters: stats.opcounters,
            connections: connections,
            network: stats.network
        }));
    ')

    # Compare the sample against the thresholds
    ANALYSIS=$(echo "${METRICS}" | python3 -c '
import json, sys
metrics = json.load(sys.stdin)
thresholds = json.loads(sys.argv[1])
anomalies = []
if metrics["connection_utilization"] > thresholds["connection_utilization"]:
    anomalies.append("High connection utilization: %.2f > %s" % (metrics["connection_utilization"], thresholds["connection_utilization"]))
if metrics["query_latency"] > thresholds["query_latency_95"]:
    anomalies.append("High query latency: %sms > %sms" % (metrics["query_latency"], thresholds["query_latency_95"]))
if metrics["lock_contention"] > thresholds["lock_contention"]:
    anomalies.append("High lock contention: %.2f%% > %.2f%%" % (metrics["lock_contention"] * 100, thresholds["lock_contention"] * 100))
result = {
    "timestamp": metrics["timestamp"],
    "anomalies": anomalies,
    "metrics": {
        "connection_utilization": "%.2f" % metrics["connection_utilization"],
        "query_latency": "%sms" % metrics["query_latency"],
        "lock_contention": "%.2f%%" % (metrics["lock_contention"] * 100),
    },
}
print(json.dumps(result))
' "${THRESHOLDS}")

    # Record the sample and its analysis
    echo "${METRICS}" >> "${LOG_FILE}"
    echo "${ANALYSIS}" >> "${LOG_FILE}"
    sleep ${INTERVAL}
done
echo "[$(date)] Performance monitoring completed"
```
3.2.2 Monitoring Business-Critical Queries
```javascript
// critical-queries-monitor.js
const { MongoClient } = require('mongodb');
const fs = require('fs');
const path = require('path');

// Configuration
const MONGODB_URI = "mongodb://app_user:app_password@localhost:27017";
const CRITICAL_QUERIES = [
    {
        name: "CustomerOrderSummary",
        query: { userId: { $exists: true } },
        collection: "orders",
        threshold: 1000 // ms
    },
    {
        name: "ProductSearch",
        query: { searchTerm: { $exists: true } },
        collection: "products",
        threshold: 500
    }
];
const OUTPUT_DIR = "/var/log/mongodb/critical-queries";
const LOG_FILE = path.join(OUTPUT_DIR, `query-monitor-${new Date().toISOString().split('T')[0]}.log`);

// Create the output directory
if (!fs.existsSync(OUTPUT_DIR)) {
    fs.mkdirSync(OUTPUT_DIR, { recursive: true });
}

async function monitorCriticalQueries() {
    const client = new MongoClient(MONGODB_URI);
    try {
        await client.connect();
        console.log(`Connected to MongoDB at ${new Date()}`);
        for (const query of CRITICAL_QUERIES) {
            const start = Date.now();
            const result = await client.db().collection(query.collection).find(query.query).limit(1).toArray();
            const duration = Date.now() - start;
            // Check against the latency threshold
            if (duration > query.threshold) {
                const logEntry = {
                    timestamp: new Date().toISOString(),
                    query: query.name,
                    duration: duration,
                    threshold: query.threshold,
                    resultCount: result.length,
                    warning: "Query exceeded threshold"
                };
                // Record the warning
                fs.appendFileSync(LOG_FILE, JSON.stringify(logEntry) + '\n');
                console.warn(`WARNING: ${query.name} took ${duration}ms (threshold: ${query.threshold}ms)`);
            }
        }
    } catch (error) {
        console.error("Error monitoring queries:", error);
        const errorLog = {
            timestamp: new Date().toISOString(),
            error: error.message,
            stack: error.stack
        };
        fs.appendFileSync(LOG_FILE, JSON.stringify(errorLog) + '\n');
    } finally {
        await client.close();
    }
}

// Run the check every 30 seconds
setInterval(monitorCriticalQueries, 30000);
// Initial run
monitorCriticalQueries();
```
3.3 Automating Index Management
3.3.1 Index Analysis and Optimization Script
```bash
#!/bin/bash
# index-optimizer.sh
# MongoDB index analysis and optimization script
OUTPUT_DIR="/var/log/mongodb/index-optimization"
DATE=$(date +%Y%m%d)

mkdir -p "${OUTPUT_DIR}/${DATE}"
REPORT_FILE="${OUTPUT_DIR}/${DATE}/index-optimization-report-${DATE}.txt"

# Analyze index usage across all user databases
mongo --quiet --eval '
    var adminDb = db.getSiblingDB("admin");
    var indexStats = [];
    // List user databases, skipping the internal ones
    var dbs = adminDb.adminCommand({ listDatabases: 1 }).databases
        .filter(function(d) { return ["admin", "local", "config"].indexOf(d.name) === -1; });
    dbs.forEach(function(dbInfo) {
        var userDb = db.getSiblingDB(dbInfo.name);
        userDb.getCollectionNames().forEach(function(collectionName) {
            try {
                // $indexStats reports per-index access counters since server start
                var stats = userDb[collectionName].aggregate([{ $indexStats: {} }]).toArray();
                stats.forEach(function(s) {
                    s.db = dbInfo.name;
                    s.collection = collectionName;
                });
                indexStats = indexStats.concat(stats);
            } catch (e) {
                // views and special collections do not support $indexStats; skip them
            }
        });
    });
    var analysis = {
        timestamp: new Date().toISOString(),
        totalIndexes: indexStats.length,
        unusedIndexes: indexStats.filter(function(i) { return i.accesses.ops == 0; }).length,
        indexStats: indexStats
    };
    print(JSON.stringify(analysis));
' > "${REPORT_FILE}"

# Generate optimization recommendations
echo -e "\n--- Optimization Recommendations ---" >> "${REPORT_FILE}"
UNUSED_COUNT=$(grep -o '"unusedIndexes":[0-9]*' "${REPORT_FILE}" | grep -o '[0-9]*$')
if [ "${UNUSED_COUNT:-0}" -gt 0 ]; then
    echo "Recommendation: review and remove ${UNUSED_COUNT} unused indexes" >> "${REPORT_FILE}"
fi

# Email the report
mail -s "MongoDB Index Optimization Report - $(date +%Y-%m-%d)" dba@yourcompany.com < "${REPORT_FILE}"
```
3.3.2 Index Rebuild Script
```bash
#!/bin/bash
# index-rebuilder.sh
# MongoDB index rebuild script

# Configuration: indexes to (re)build, expressed as shell commands
INDEXES_TO_REBUILD=(
    "db.getSiblingDB('production').orders.createIndex({ orderDate: -1 })"
    "db.getSiblingDB('production').products.createIndex({ category: 1, price: -1 })"
)

for indexCmd in "${INDEXES_TO_REBUILD[@]}"; do
    echo "[$(date)] Rebuilding index: ${indexCmd}"
    # createIndex is a no-op if an identical index already exists;
    # drop the old index first when a genuine rebuild is required
    mongo --quiet --eval "${indexCmd}"
    # Report any in-progress index builds
    mongo --quiet --eval '
        var ops = db.currentOp({ "msg": /Index Build/ }).inprog;
        print(JSON.stringify(ops));
    '
    echo "[$(date)] Index command issued: ${indexCmd}"
done

# Send completion notification
echo "Index rebuild completed on $(date)" | \
    mail -s "MongoDB Index Rebuild Complete" dba@yourcompany.com
```
3.4 Data Cleanup and Archiving
3.4.1 Data Archiving Script
```bash
#!/bin/bash
# data-archiver.sh
# MongoDB data archiving script
SOURCE_DB="production"
SOURCE_COLLECTION="events"
ARCHIVE_DB="archive"
ARCHIVE_COLLECTION="events"
RETENTION_DAYS=365

# Compute the cutoff date
THRESHOLD_DATE=$(date -d "-${RETENTION_DAYS} days" +%Y-%m-%d)

echo "[$(date)] Starting data archiving for ${SOURCE_DB}.${SOURCE_COLLECTION}"
mongo --quiet --eval "
    var thresholdDate = new Date('${THRESHOLD_DATE}');
    var source = db.getSiblingDB('${SOURCE_DB}').${SOURCE_COLLECTION};
    var count = source.count({ eventTime: { \$lt: thresholdDate } });
    print('Found ' + count + ' documents to archive');
    // \$merge appends into the archive collection (creating it if needed);
    // unlike \$out it does not overwrite archives from earlier runs
    source.aggregate([
        { \$match: { eventTime: { \$lt: thresholdDate } } },
        { \$merge: { into: { db: '${ARCHIVE_DB}', coll: '${ARCHIVE_COLLECTION}' },
                     on: '_id', whenMatched: 'keepExisting', whenNotMatched: 'insert' } }
    ]);
    // Delete the archived documents from the source collection
    source.deleteMany({ eventTime: { \$lt: thresholdDate } });
    print('Archived ' + count + ' documents');
"
```
3.4.2 Space Reclamation Script
```bash
#!/bin/bash
# space-reclaimer.sh
# MongoDB space reclamation script

# Configuration
DATABASES=("production" "analytics")
THRESHOLD_PERCENT=70

for db_name in "${DATABASES[@]}"; do
    echo "[$(date)] Processing database: ${db_name}"
    # Check storage utilization (freeStorageSize requires a reasonably recent server)
    STORAGE_INFO=$(mongo --quiet --eval "
        var stats = db.getSiblingDB('${db_name}').stats(1024);
        var free = stats.freeStorageSize || 0;
        print(JSON.stringify({
            name: '${db_name}',
            dataSize: stats.dataSize,
            storageSize: stats.storageSize,
            freeStorage: free,
            storageUtilization: (stats.storageSize - free) / stats.storageSize
        }));
    ")
    STORAGE_UTILIZATION=$(echo "${STORAGE_INFO}" | grep -o '"storageUtilization":[0-9.]*' | cut -d':' -f2)

    # Decide whether reclamation is needed
    if (( $(echo "${STORAGE_UTILIZATION} > ${THRESHOLD_PERCENT}/100" | bc -l) )); then
        echo "[$(date)] Storage utilization (${STORAGE_UTILIZATION}) exceeds threshold (${THRESHOLD_PERCENT}%)"
        # Compact every collection in the database
        echo "[$(date)] Starting compaction for ${db_name}..."
        mongo --quiet --eval "
            var target = db.getSiblingDB('${db_name}');
            target.getCollectionNames().forEach(function(c) {
                printjson(target.runCommand({ compact: c }));
            });
        "
        # Verify the effect of compaction
        echo "[$(date)] Verifying compaction results..."
        mongo --quiet --eval "
            var stats = db.getSiblingDB('${db_name}').stats(1024);
            var free = stats.freeStorageSize || 0;
            print('New storage utilization: ' + ((stats.storageSize - free) / stats.storageSize));
        "
    else
        echo "[$(date)] Storage utilization (${STORAGE_UTILIZATION}) below threshold (${THRESHOLD_PERCENT}%) - no action needed"
    fi
done
```
3.5 Automating Security Audits
3.5.1 User Privilege Audit Script
```bash
#!/bin/bash
# user-audit.sh
# MongoDB user privilege audit script
AUDIT_DIR="/var/log/mongodb/security-audit"
DATE=$(date +%Y%m%d)
REPORT_FILE="${AUDIT_DIR}/user-audit-${DATE}.txt"

mkdir -p "${AUDIT_DIR}"

# Run the user privilege audit
mongo --quiet --eval '
    var users = db.getSiblingDB("admin").system.users.find().toArray();
    var audit = {
        timestamp: new Date().toISOString(),
        totalUsers: users.length,
        users: users.map(function(user) {
            return {
                user: user.user,
                db: user.db,
                roles: user.roles,
                restrictions: user.authenticationRestrictions || []
            };
        })
    };
    // Flag high-risk users (root or dbOwner roles)
    var highRiskUsers = users.filter(function(user) {
        return user.roles.some(function(role) {
            return role.role === "root" || role.role === "dbOwner";
        });
    });
    audit.highRiskUsers = highRiskUsers.map(function(user) {
        return { user: user.user, db: user.db, roles: user.roles };
    });
    print(JSON.stringify(audit));
' > "${REPORT_FILE}"

# Generate the compliance summary (parse the JSON instead of grepping it)
echo -e "\n--- Compliance Report ---" >> "${REPORT_FILE}"
python3 -c '
import json, sys
audit = json.loads(open(sys.argv[1]).readline())
print("Total users:", audit["totalUsers"])
print("High-risk users:", len(audit["highRiskUsers"]))
' "${REPORT_FILE}" >> "${REPORT_FILE}"

# Email the report
mail -s "MongoDB User Audit Report - $(date +%Y-%m-%d)" security@yourcompany.com < "${REPORT_FILE}"
```
3.5.2 Configuration Security Audit Script
```bash
#!/bin/bash
# config-audit.sh
# MongoDB configuration security audit script
AUDIT_DIR="/var/log/mongodb/security-audit"
DATE=$(date +%Y%m%d)
REPORT_FILE="${AUDIT_DIR}/config-audit-${DATE}.txt"

mkdir -p "${AUDIT_DIR}"

# Audit startup options via getCmdLineOpts (these are startup options, not getParameter parameters)
mongo --quiet --eval '
    var opts = db.adminCommand({ getCmdLineOpts: 1 }).parsed;
    var security = opts.security || {};
    var net = opts.net || {};
    var config = {
        timestamp: new Date().toISOString(),
        security: {
            authorization: security.authorization,
            keyFile: security.keyFile,
            tls: net.tls ? net.tls.mode : (net.ssl ? net.ssl.mode : undefined)
        },
        network: {
            bindIp: net.bindIp
        },
        audit: opts.auditLog
    };
    // Flag security issues
    var issues = [];
    if (config.security.authorization !== "enabled") {
        issues.push("Authentication not enabled");
    }
    if (config.network.bindIp === "0.0.0.0") {
        issues.push("Bound to all interfaces");
    }
    if (!config.security.tls || config.security.tls === "disabled") {
        issues.push("TLS not enabled");
    }
    if (!config.audit) {
        issues.push("Audit logging not configured");
    }
    config.issues = issues;
    print(JSON.stringify(config));
' > "${REPORT_FILE}"

# Generate the risk summary
echo -e "\n--- Security Risk Report ---" >> "${REPORT_FILE}"
if grep -q '"issues":\[\]' "${REPORT_FILE}"; then
    echo "No security issues found" >> "${REPORT_FILE}"
else
    echo "Security issues found:" >> "${REPORT_FILE}"
    grep -o '"issues":\[[^]]*\]' "${REPORT_FILE}" >> "${REPORT_FILE}"
fi

# Email the report
mail -s "MongoDB Configuration Audit Report - $(date +%Y-%m-%d)" security@yourcompany.com < "${REPORT_FILE}"
```
4. Script Framework and Best Practices
4.1 Script Template Framework
```bash
#!/bin/bash
# script-name.sh
# Purpose description

# 1. Header comment block
#    - Author
#    - Purpose
#    - Parameter descriptions
#    - Dependencies

# 2. Configuration section (tunable parameters)
CONFIG_DIR="/path/to/config"
LOG_DIR="/path/to/logs"
THRESHOLD=70

# 3. Dependency checks
function check_dependencies() {
    # Verify required tools are installed
    local missing=false
    for cmd in mongodump mongo mongorestore; do
        if ! command -v "${cmd}" &> /dev/null; then
            echo "Error: ${cmd} is required but not installed"
            missing=true
        fi
    done
    if [ "${missing}" = true ]; then
        exit 1
    fi
}

# 4. Function definitions
function function_name() {
    : # function logic goes here (':' keeps the empty body syntactically valid)
}

# 5. Main flow
function main() {
    : # main logic goes here
}

# 6. Error handling
function error_handler() {
    echo "Error on line ${BASH_LINENO[0]}" >&2
}
trap error_handler ERR

# 7. Script entry point
check_dependencies
main
```
4.2 Designing a Shared Function Library
4.2.1 Configuration Loader
```bash
# config-loader.sh
function load_config() {
    local config_file=$1
    local default_value=$2
    if [ -f "${config_file}" ]; then
        cat "${config_file}"
    else
        echo "${default_value}"
    fi
}

# Usage examples
BACKUP_DIR=$(load_config "/etc/mongodb/backup-dir.conf" "/backup/mongodb")
RETENTION_DAYS=$(load_config "/etc/mongodb/retention.conf" "30")
```
4.2.2 Error Handling Functions
```bash
# error-handler.sh
function log_error() {
    local message=$1
    local code=${2:-1}
    echo "[$(date)] ERROR: ${message}" | tee -a "${LOG_FILE}"
    exit "${code}"
}
function log_warning() {
    local message=$1
    echo "[$(date)] WARNING: ${message}" | tee -a "${LOG_FILE}"
}
function log_info() {
    local message=$1
    echo "[$(date)] INFO: ${message}" | tee -a "${LOG_FILE}"
}
```
4.2.3 Email Notification Function
```bash
# notification.sh
function send_alert() {
    local subject=$1
    local message=$2
    local severity=${3:-"warning"}
    # Email template (severity is prepended to the subject line)
    local email_body="Subject: [${severity^^}] ${subject}
From: mongodb-monitor@yourcompany.com
To: dba@yourcompany.com
MIME-Version: 1.0
Content-Type: text/html

<div style='font-family: Arial, sans-serif;'>
  <h2>${subject}</h2>
  <p>${message}</p>
  <p style='color: #666; font-size: 0.9em;'>Timestamp: $(date)</p>
</div>
"
    # Send via sendmail (-t reads recipients from the headers above)
    echo "${email_body}" | sendmail -t
}
```
4.3 Logging Best Practices
4.3.1 Log Levels
| Level | Purpose | Example |
|---|---|---|
| DEBUG | Development/debugging | Detailed operation steps |
| INFO | Routine operations | Task start/end |
| WARN | Potential problems | Non-critical issues that may have impact |
| ERROR | Serious problems | Failures that abort a task |
| FATAL | Fatal errors | System-level errors |
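One way to make these levels actionable in bash is a logger that suppresses messages below a configured minimum level. This is an illustrative sketch, separate from the scripts above; the `LOG_LEVEL` variable and function names are assumptions:

```shell
#!/bin/bash
# leveled-logger.sh -- sketch of level-filtered logging (names are hypothetical)
LOG_LEVEL=${LOG_LEVEL:-INFO}   # minimum level to emit

# Map level names to numeric severity
level_num() {
    case $1 in
        DEBUG) echo 0 ;;
        INFO)  echo 1 ;;
        WARN)  echo 2 ;;
        ERROR) echo 3 ;;
        FATAL) echo 4 ;;
        *)     echo 1 ;;
    esac
}

log_at() {
    local level=$1; shift
    # Drop messages below the configured threshold
    if [ "$(level_num "${level}")" -lt "$(level_num "${LOG_LEVEL}")" ]; then
        return 0
    fi
    echo "$(date +"%Y-%m-%d %H:%M:%S") | ${level} | $*"
}

# Usage:
#   LOG_LEVEL=WARN ./script.sh   -> DEBUG and INFO messages are suppressed
```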
4.3.2 Log Formatting
```bash
# Log formatting function
function log() {
    local level=$1
    local message=$2
    local timestamp=$(date +"%Y-%m-%d %H:%M:%S")
    # Log format: timestamp | level | message
    echo "${timestamp} | ${level} | ${message}"
}

# Usage examples
log "INFO" "Starting backup process"
log "WARN" "Only 10% disk space remaining"
log "ERROR" "Backup failed: Permission denied"
```
5. Error Handling and Logging
5.1 Error Handling Strategy
5.1.1 Error Classification and Handling
| Error type | Handling strategy | Example |
|---|---|---|
| Recoverable errors | Retry mechanism | 3 retries with exponential backoff |
| Configuration errors | Log details, abort execution | Exit and record the configuration problem |
| Data errors | Process partially, record the bad data | Skip problem documents, continue processing |
| Resource exhaustion | Defer execution, notify the administrator | Adjust strategy to avoid failure |
5.1.2 Implementing a Retry Mechanism
```bash
#!/bin/bash
# retry-mechanism.sh
function run_with_retry() {
    local max_attempts=3
    local backoff=2
    local attempt=0
    local exit_code=0
    while [ ${attempt} -lt ${max_attempts} ]; do
        # Run the command
        "$@"
        exit_code=$?
        # Stop on success
        if [ ${exit_code} -eq 0 ]; then
            break
        fi
        # Exponential backoff before the next attempt
        attempt=$((attempt+1))
        if [ ${attempt} -lt ${max_attempts} ]; then
            sleep $((backoff ** attempt))
        fi
    done
    return ${exit_code}
}

# Usage example
run_with_retry mongodump --uri "mongodb://user:pass@localhost" --out /backup
```
5.2 Logging Best Practices
5.2.1 Structured Logs
```bash
# structured-logging.sh
function log_structured() {
    local level=$1
    local message=$2
    local extra_data=$3
    local timestamp=$(date +"%Y-%m-%dT%H:%M:%S.%3N")
    local log_entry="{
  \"timestamp\": \"${timestamp}\",
  \"level\": \"${level}\",
  \"message\": \"${message}\",
  ${extra_data}
}"
    echo "${log_entry}" >> "${LOG_FILE}"
}

# Usage examples
log_structured "INFO" "Backup completed" '"backup_id":"20230515-103000", "size":"1.2GB"'
log_structured "WARN" "Low disk space" '"remaining_space":"10%", "threshold":"20%"'
```
5.2.2 Log Rotation and Archiving
```bash
# log-rotation.sh
LOG_DIR="/var/log/mongodb"
RETENTION_DAYS=30

# Rotate each log file
for log_file in "${LOG_DIR}"/*.log; do
    # Rename the current log with a date suffix, then compress the rotated copy
    mv "${log_file}" "${log_file}.$(date +%Y%m%d)"
    gzip "${log_file}.$(date +%Y%m%d)"
done

# Remove rotated logs past the retention window (once, outside the loop)
find "${LOG_DIR}" -name "*.gz" -mtime +${RETENTION_DAYS} -delete
```
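In practice the system `logrotate` utility can replace a hand-rolled rotation loop. The sketch below installs an illustrative drop-in policy; the target path and the options shown are assumed defaults, not requirements:

```shell
#!/bin/bash
# install-logrotate-policy.sh -- illustrative logrotate drop-in (path and options are assumptions)

install_logrotate_policy() {
    local target=${1:-/etc/logrotate.d/mongodb-scripts}
    cat > "${target}" <<'EOF'
/var/log/mongodb/*.log {
    daily
    # keep 30 rotated copies
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    # let running scripts keep appending to their open file handle
    copytruncate
}
EOF
    echo "Installed logrotate policy at ${target}"
}
```

`copytruncate` lets long-running scripts keep writing to the same file descriptor after rotation, at the cost of possibly losing a few lines during the copy.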
6. Scheduling and Execution Strategy
6.1 Scheduling Strategy
6.1.1 Cron Examples
```bash
# Cron configuration for MongoDB maintenance tasks
0 2 * * * /backup/mongodb/mongodb-backup.sh
*/15 * * * * /monitoring/mongodb-performance-monitor.sh
0 3 * * * /scripts/index-optimizer.sh
0 4 * * * /scripts/data-archiver.sh
0 5 * * * /security/user-audit.sh
```
6.1.2 Priority-Based Scheduling
+-------------------+ +-------------------+
| High Priority | | Normal Priority |
| (Every 5 min) | | (Every 1 hour) |
| - Real-time | | - Backup |
| monitoring | | - Index analysis |
| - Critical query | | - Data archiving |
| checks | | |
+-------------------+ +-------------------+
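One simple way to enforce the two tiers from cron is a wrapper that runs normal-priority jobs under reduced CPU and I/O priority, so they cannot starve the high-priority checks. This is a sketch; the wrapper name and the choice of `nice`/`ionice` are illustrative:

```shell
#!/bin/bash
# run-normal-priority.sh -- illustrative wrapper for lower-priority maintenance jobs

run_normal_priority() {
    # nice lowers CPU priority; ionice (when available) lowers disk I/O priority
    if command -v ionice >/dev/null 2>&1; then
        nice -n 10 ionice -c 2 -n 7 "$@"
    else
        nice -n 10 "$@"
    fi
}

# Example crontab usage:
#   0 2 * * * /scripts/run-normal-priority.sh /backup/mongodb/mongodb-backup.sh
if [ "$#" -gt 0 ]; then
    run_normal_priority "$@"
fi
```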
6.2 Managing Task Execution
6.2.1 Mutually Exclusive Execution
```bash
#!/bin/bash
# mutex-lock.sh
# Ensure only one instance of a script runs at a time
LOCK_FILE="/tmp/mongodb-backup.lock"

# Check for an existing lock
if [ -e "${LOCK_FILE}" ]; then
    # Is the recorded process still alive?
    PID=$(cat "${LOCK_FILE}")
    if kill -0 "${PID}" 2>/dev/null; then
        echo "Another instance is running (PID: ${PID})"
        exit 1
    fi
    # Stale lock left by a dead process: fall through and take it over
fi

# Take the lock and remove it on exit
# (for a fully race-free lock, consider flock(1) instead of a PID file)
echo $$ > "${LOCK_FILE}"
trap 'rm -f "${LOCK_FILE}"' INT TERM EXIT

# Main logic
echo "Running with PID $$"
sleep 30 # simulate work

# Normal exit
exit 0
```
6.2.2 Managing Task Dependencies
```bash
#!/bin/bash
# dependency-manager.sh
# Manage dependencies between tasks
BACKUP_COMPLETE="/tmp/backup-complete.flag"

# Task A: backup
run_backup() {
    /backup/mongodb/mongodb-backup.sh
    if [ $? -eq 0 ]; then
        touch "${BACKUP_COMPLETE}"
    fi
}

# Task B: analysis (runs only after a successful backup)
run_analysis() {
    if [ -f "${BACKUP_COMPLETE}" ]; then
        /analysis/mongodb-backup-analysis.sh
        rm -f "${BACKUP_COMPLETE}"
    else
        echo "Backup not completed. Skipping analysis."
    fi
}

# Execute
run_backup
run_analysis
```
7. Advanced Techniques and Practical Examples
7.1 Dynamic Parameter Passing
```bash
#!/bin/bash
# dynamic-parameters.sh

# Defaults
DB_HOST="localhost"
DB_PORT="27017"
BACKUP_DIR="/backup/mongodb"

# Parse command-line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        --host=*)
            DB_HOST="${1#*=}"
            shift
            ;;
        --port=*)
            DB_PORT="${1#*=}"
            shift
            ;;
        --backup-dir=*)
            BACKUP_DIR="${1#*=}"
            shift
            ;;
        *)
            echo "Unknown option: $1"
            exit 1
            ;;
    esac
done

# Use the parameters
echo "Backing up to ${BACKUP_DIR} from ${DB_HOST}:${DB_PORT}"
```
Usage example:
```bash
./dynamic-parameters.sh --host=192.168.1.10 --port=27017 --backup-dir=/backup/alt
```
7.2 A Transactional Operation Pattern
```bash
#!/bin/bash
# transactional-operations.sh
# Transaction-style data migration: load into a temp collection, verify, then swap
SOURCE_DB="source_db"
TARGET_DB="target_db"
MIGRATION_ID=$(date +%Y%m%d%H%M%S)

# 1. Create the temporary collection
mongo --quiet --eval "db.getSiblingDB('${TARGET_DB}').createCollection('temp_${MIGRATION_ID}')" || exit 1

# 2. Copy the data
mongo --quiet --eval "
    db.getSiblingDB('${SOURCE_DB}').collection.find().forEach(function(doc) {
        db.getSiblingDB('${TARGET_DB}').getCollection('temp_${MIGRATION_ID}').insertOne(doc);
    });
"

# 3. Verify data integrity
SOURCE_COUNT=$(mongo --quiet --eval "db.getSiblingDB('${SOURCE_DB}').collection.count()")
TARGET_COUNT=$(mongo --quiet --eval "db.getSiblingDB('${TARGET_DB}').getCollection('temp_${MIGRATION_ID}').count()")
if [ "${SOURCE_COUNT}" -ne "${TARGET_COUNT}" ]; then
    echo "Data integrity check failed. Rolling back..."
    # Drop only the temporary collection, not the whole target database
    mongo --quiet --eval "db.getSiblingDB('${TARGET_DB}').getCollection('temp_${MIGRATION_ID}').drop()"
    exit 1
fi

# 4. Atomic switch (dropTarget=true replaces the old collection)
mongo --quiet --eval "
    db.getSiblingDB('${TARGET_DB}').getCollection('temp_${MIGRATION_ID}').renameCollection('collection', true);
"
echo "Migration completed successfully"
```
7.3 Adapting to Cloud Environments
7.3.1 AWS-Specific Functions
```bash
#!/bin/bash
# aws-specific-functions.sh

# Upload a backup to S3
function upload_to_s3() {
    local backup_file=$1
    local s3_bucket=$2
    aws s3 cp "${backup_file}" "s3://${s3_bucket}/$(basename "${backup_file}")" \
        --storage-class STANDARD_IA
    # Verify the upload
    if [ $? -eq 0 ]; then
        echo "Successfully uploaded to S3: ${s3_bucket}"
        return 0
    else
        echo "Failed to upload to S3"
        return 1
    fi
}

# Usage example
if upload_to_s3 "/backup/mongodb/full/20230515/full-backup-20230515.gz" "my-mongodb-backups"; then
    echo "Backup successfully uploaded to S3"
else
    # Handle failure
    echo "Backup failed to upload to S3. Using local backup as fallback"
    exit 1
fi
```
7.3.2 Kubernetes Integration
```bash
#!/bin/bash
# k8s-integration.sh

# Run a command inside a Kubernetes pod
function k8s_exec() {
    local pod=$1
    local command=$2
    local namespace=${3:-"default"}
    # bash -c keeps quoted commands intact instead of word-splitting them
    kubectl -n "${namespace}" exec "${pod}" -- bash -c "${command}"
}

# Usage example (assumes pods are annotated with their replica-set role)
PRIMARY_POD=$(kubectl get pods -l app=mongo -o jsonpath='{.items[?(@.metadata.annotations.role=="primary")].metadata.name}')
k8s_exec "${PRIMARY_POD}" "mongo --eval 'rs.status()'"
```
8. Conclusions and Recommendations
8.1 Implementation Roadmap
1. Assessment phase (1-2 weeks):
   - Identify key maintenance tasks
   - Evaluate the current level of automation
   - Draw up a prioritized plan
2. Development phase (2-4 weeks):
   - Develop scripts for the core maintenance tasks
   - Establish the base framework
   - Write test cases
3. Testing phase (1-2 weeks):
   - Validate in a test environment
   - Fix issues
   - Tune performance
4. Deployment phase (1 week):
   - Roll out gradually to production
   - Monitor execution
   - Collect feedback
5. Optimization phase (ongoing):
   - Improve continuously based on feedback
   - Extend the scope of automation
   - Optimize performance and reliability
8.2 Key Success Factors
| Factor | Description | Implementation advice |
|---|---|---|
| Incremental rollout | Start small | Automate high-value tasks first |
| Robust error handling | Prevent cascading script failures | Build solid error-handling mechanisms |
| Thorough documentation | Enables team collaboration | Document every script in detail |
| Thorough testing | Ensures script reliability | Automate the tests themselves |
| Monitoring and alerting | Catch problems early | Add monitoring to critical scripts |
8.3 Looking Ahead
- AI-assisted operations: predict potential problems with machine learning
- Automated decision-making: execute fixes automatically based on analysis results
- Cloud-native integration: deep integration with platforms such as Kubernetes
- Visual workflows: manage maintenance tasks through a visual interface
Key takeaway: automated operations is not a "set it and forget it" project but a continuously evolving process. As the business grows and technology advances, your automation scripts must be refined and extended as well. The real value of automation lies not only in reducing manual work but in improving the overall reliability and maintainability of the system.
By applying the automation strategies in this guide, your MongoDB environment will see a significant boost in operational efficiency along with lower operational risk. Remember: a good automation script should behave like a good DBA: reliable, efficient, and easy to manage.
Appendix: Quick Reference for Common Maintenance Scripts
```bash
# Quick command reference
# Run a backup
/backup/mongodb/mongodb-backup.sh
# Run index optimization manually
/index-optimizer.sh
# Check the latest backup status
tail -n 10 /backup/mongodb/full/backup.log
# Run performance analysis
/monitoring/mongodb-performance-monitor.sh
# Send a test email
echo "Test message" | mail -s "Test" dba@yourcompany.com
# Check running maintenance scripts
ps aux | grep -E "backup|monitor"
```