ReadLatency: alert when read latency stays above 20 ms for 3 consecutive 5-minute periods
```bash
# Alert when average read latency exceeds 20 ms.
# The metric is reported in seconds, so the threshold is 0.02.
aws cloudwatch put-metric-alarm \
  --alarm-name "rds-mydb-read-latency-high" \
  --alarm-description "RDS read latency above 20ms" \
  --metric-name ReadLatency \
  --namespace AWS/RDS \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 0.02 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=DBInstanceIdentifier,Value=my-database \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:rds-critical-alerts
```
WriteLatency: alert when write latency stays above 20 ms for 3 consecutive 5-minute periods
```bash
# Alert when average write latency exceeds 20 ms.
# The metric is reported in seconds, so the threshold is 0.02.
aws cloudwatch put-metric-alarm \
  --alarm-name "rds-mydb-write-latency-high" \
  --alarm-description "RDS write latency above 20ms" \
  --metric-name WriteLatency \
  --namespace AWS/RDS \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 0.02 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=DBInstanceIdentifier,Value=my-database \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:rds-critical-alerts
```
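The two latency alarms use a threshold of 0.02 because CloudWatch reports ReadLatency and WriteLatency in seconds, not milliseconds. A minimal sketch of that conversion, in case you want a different limit; the helper name `ms_to_seconds` is hypothetical and not part of the AWS CLI:

```bash
# Hypothetical helper: convert a latency limit in milliseconds
# to the seconds value that --threshold expects.
ms_to_seconds() {
  awk -v ms="$1" 'BEGIN { printf "%g\n", ms / 1000 }'
}

ms_to_seconds 20   # prints 0.02, the value used in the alarms above
ms_to_seconds 5    # prints 0.005
```

The result can be passed straight to the flag, for example `--threshold "$(ms_to_seconds 20)"`.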
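DBLoad: alert when average database load stays above 80% of the vCPU count for 3 consecutive 5-minute periods
```bash
# Alert when average DBLoad exceeds 1.6 (80% of 2 vCPUs).
aws cloudwatch put-metric-alarm \
  --alarm-name "aurora-dbload-writer-high" \
  --alarm-description "DBLoad above 1.6 (80% of vCPUs)" \
  --metric-name "DBLoad" \
  --namespace "AWS/RDS" \
  --statistic "Average" \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 1.6 \
  --comparison-operator "GreaterThanThreshold" \
  --dimensions Name=DBInstanceIdentifier,Value=your-writer-instance-id
```
Where does 1.6 come from? My AWS database instance is a db.r5.large, which has only 2 vCPUs, so 2 × 0.8 = 1.6. DBLoad is measured in average active sessions, so a value above 1.6 means the load is more than 80% of what the 2 vCPUs can serve.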
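The same arithmetic applies to other instance sizes: multiply the vCPU count by the target fraction. A small sketch, assuming you look up the vCPU count for your instance class yourself; the function name `dbload_threshold` is hypothetical:

```bash
# Hypothetical helper: DBLoad threshold = vCPU count * target percentage.
# $1 = number of vCPUs, $2 = target utilisation in percent.
dbload_threshold() {
  awk -v vcpus="$1" -v pct="$2" 'BEGIN { printf "%.1f\n", vcpus * pct / 100 }'
}

dbload_threshold 2 80   # db.r5.large (2 vCPUs): prints 1.6
dbload_threshold 4 80   # db.r5.xlarge (4 vCPUs): prints 3.2
```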
AuroraVolumeBytesLeftTotal: alert when remaining storage drops to 50 GB or less
```bash
# Alert when the remaining Aurora volume space is at or below 50 GiB
# (53687091200 bytes) for 3 consecutive 5-minute periods.
aws cloudwatch put-metric-alarm \
  --alarm-name "Aurora-Storage-Left-Critical-50GB" \
  --alarm-description "Aurora remaining storage below 50GB, urgent!" \
  --metric-name AuroraVolumeBytesLeftTotal \
  --namespace AWS/RDS \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 53687091200 \
  --comparison-operator LessThanOrEqualToThreshold \
  --dimensions Name=DBClusterIdentifier,Value=your-cluster-name \
  --alarm-actions arn:aws:sns:us-west-2:123456789012:MyAlarmTopic \
  --ok-actions arn:aws:sns:us-west-2:123456789012:MyAlarmTopic \
  --treat-missing-data notBreaching
```
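The threshold 53687091200 is 50 GiB expressed in bytes (50 × 1024³), since the metric is reported in bytes. A minimal sketch of the conversion for other limits; `gib_to_bytes` is a hypothetical helper name:

```bash
# Hypothetical helper: convert GiB to bytes for the --threshold flag.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 50   # prints 53687091200, the threshold used above
```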