Sending AWS Alarm Notifications to DingTalk (Also Applicable to WeChat Work)

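AWS alarm notifications delivered through SNS cannot be posted to a DingTalk robot webhook in the message format the robot expects, so a Lambda function is used as a relay: it parses the SNS message and pushes it to DingTalk (or WeChat Work).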

Architecture

CloudWatch alarm -> SNS topic -> Lambda function -> DingTalk robot (or WeChat Work)

Deployment steps

Create the DingTalk robot

The steps are as follows:

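  1. Create a DingTalk group for notifications (a regular group is free and does not consume any enterprise resources).
  2. Open "Group Settings" -> "Robots" -> "Add Robot".
  3. ‼️ In the security settings, add the keyword: Alarm (the notification content produced by the code below includes this word by default).
  4. Copy the access_token from the Webhook URL (it is needed later in the Lambda code).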
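
Before wiring up Lambda, it can help to confirm that the robot accepts messages at all. The following is a minimal sketch rather than part of the original setup: the helper names extract_access_token and build_text_payload are made up for illustration, and it assumes the robot accepts the plain text message type at the same oapi.dingtalk.com endpoint used later in this post. The network call is left commented out so nothing is sent by accident.

```python
import json
from urllib.parse import parse_qs, urlparse


def extract_access_token(webhook_url):
    """Pull the access_token query parameter out of a robot Webhook URL."""
    query = parse_qs(urlparse(webhook_url).query)
    return query["access_token"][0]


def build_text_payload(content, keyword="Alarm"):
    """Build a DingTalk text message body, making sure the security keyword is present."""
    if keyword not in content:
        content = f"{keyword}: {content}"
    payload = {"msgtype": "text", "text": {"content": content}}
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


# Example with a placeholder URL (replace it with the one copied from the robot settings):
# import urllib.request
# url = "https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"
# req = urllib.request.Request(url, data=build_text_payload("robot test"),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req, timeout=10).read().decode("utf-8"))
```

If the keyword is missing from the content, the robot is expected to reject the message, which is why the helper prepends it.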

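Deploy the DingTalk notification function on Lambda

The configuration in Lambda roughly goes as follows:

  1. 📦 Package the code and its dependencies locally (Python is used here; the code file may even be empty for now, since it can be edited directly in the Lambda console later):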
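# 1) Create the packaging directory
mkdir -p lambda_pkg && cd lambda_pkg

# 2) Install dependencies into the current directory (requests is the only dependency here)
pip install --upgrade pip
pip install requests -t .

# 3) Add your code file
cp /path/to/lambda_function.py .

# 4) Build the archive (note: the zip must contain the files themselves, not the enclosing folder)
zip -r ../lambda.zip .

# 5) Go back up one level (the resulting lambda.zip is ready to upload)
cd ..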
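  2. Create the function in the Lambda console: "Functions" -> "Create function" in the navigation bar.
  3. Upload the lambda.zip built in step 1.
  4. Configure the environment variable token and set it to the access_token obtained when creating the DingTalk robot.
  5. (Optional) Modify the code.
  6. Deploy and test.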
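
A common packaging mistake is zipping the enclosing folder, which leaves lambda_function.py one level too deep for Lambda to find. The sketch below is one way to check the archive layout before uploading. The function name check_lambda_zip is hypothetical, and the defaults assume the file and dependency names used in the packaging step above.

```python
import zipfile


def check_lambda_zip(zip_path, handler_file="lambda_function.py", package="requests"):
    """Return a list of layout problems found in the archive (an empty list means it looks fine)."""
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()

    problems = []
    # The handler module must sit at the top level of the archive
    if handler_file not in names:
        problems.append(f"{handler_file} is not at the root of the archive")
    # Dependencies installed with `pip install -t .` must be at the top level too
    if not any(name.startswith(f"{package}/") for name in names):
        problems.append(f"package {package} is not at the root of the archive")
    return problems
```

Calling check_lambda_zip("lambda.zip") after step 5 of the packaging commands should return an empty list when the archive was built from inside lambda_pkg.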

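Create the SNS topic and subscription

(Optional) Create a topic

If a topic already exists, skip this and go straight to creating the subscription.

A "Standard" SNS topic is sufficient; leave the other settings at their defaults unless you have specific requirements.

Create a subscription

  1. For the protocol, choose "AWS Lambda".
  2. For the endpoint, choose the Lambda function created in the previous section.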
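
The post uses the console for this step. For readers who prefer scripting it, here is a hedged sketch of the same subscription through the AWS SDK. It assumes boto3-style clients are passed in, and the function name subscribe_lambda_to_topic and the statement id are made up for illustration. When the subscription is created through the API, the function also needs a resource policy statement that lets SNS invoke it, which is why add_permission is called first.

```python
def subscribe_lambda_to_topic(sns_client, lambda_client, topic_arn, function_arn):
    """Allow the topic to invoke the function, then subscribe the function to the topic."""
    # Grant SNS permission to invoke the function, scoped to this one topic
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowSNSInvoke",  # hypothetical id; must be unique within the function policy
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )
    # Protocol "lambda" with the function ARN as endpoint mirrors the two console choices above
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="lambda",
        Endpoint=function_arn,
    )
    return response["SubscriptionArn"]
```

With boto3 installed and credentials configured, the two clients would come from boto3.client("sns") and boto3.client("lambda"); the ARNs are whatever the console shows for the topic and the function.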

Appendix

Code

import json
import os

import requests


def send_msg(title, msg):
    """Post a markdown message to the DingTalk robot and return the response body."""
    # The robot's access_token comes from the Lambda environment variable `token`
    token = os.environ['token']
    url = "https://oapi.dingtalk.com/robot/send?access_token=" + token
    headers = {'Content-Type': 'application/json'}
    # Build the body with json.dumps so quotes and newlines in msg are escaped correctly
    values = json.dumps(
        {"msgtype": "markdown", "markdown": {"title": title, "text": msg}},
        ensure_ascii=False,
    )

    print(values)
    # Encode explicitly as UTF-8 so non-ASCII text is transmitted unambiguously
    response = requests.post(url, data=values.encode('utf-8'), headers=headers, timeout=10)
    return response.text


def lambda_handler(event, context):
    print(event)
    sns = event['Records'][0]['Sns']
    raw_message = sns['Message']
    timestamp = sns['Timestamp']

    try:
        # SNS delivers Message as a string; also accept an already-parsed object (handy for test events)
        message = json.loads(raw_message) if isinstance(raw_message, str) else raw_message
        alarm_name = message['AlarmName']
        alarm_description = message['AlarmDescription']
        new_state_value = message['NewStateValue']
        new_state_reason = message['NewStateReason']
        region = message['Region']

        title = alarm_name
        msg = f"**Alarm**: {alarm_name}\n\n**Region**: {region}\n\n**Current state**: {new_state_value}\n\n**Reason**: {new_state_reason}\n\n**Time**: {timestamp}\n\n**Description**: {alarm_description}"
    except json.JSONDecodeError:
        # The message is not JSON, so it did not come from a CloudWatch alarm: forward it as is
        title, msg = "Alarm notification", raw_message
    except (KeyError, TypeError) as e:
        # A required field is missing (or the JSON is not an object): report a parse failure
        print(f"Missing or malformed field: {e}")
        title = "Alarm message parsing failed"
        msg = f"**Alarm message parsing failed**\n\n**Raw message**: {raw_message}\n\n**Time**: {timestamp}\n\n**Error**: missing or malformed field {e}"

    print(msg)
    return send_msg(title, msg)
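
The title says the same approach works for WeChat Work, but the post only shows the DingTalk call. As a hedged sketch of what the adaptation might look like: a WeChat Work group robot takes a key in its webhook URL, and its markdown message carries a single content field rather than a title and text. The helper name build_wecom_request is hypothetical, and the endpoint and payload shape are stated here as assumptions to verify against the robot's own documentation.

```python
import json

# Assumed webhook endpoint of a WeChat Work group robot; the key comes from the robot settings
WECOM_WEBHOOK = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key="


def build_wecom_request(key, msg):
    """Return the (url, body) pair for posting a markdown message to a WeChat Work robot."""
    url = WECOM_WEBHOOK + key
    payload = {"msgtype": "markdown", "markdown": {"content": msg}}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return url, body
```

The returned pair can be passed to requests.post(url, data=body, headers={'Content-Type': 'application/json'}, timeout=10), leaving the SNS parsing logic in the handler unchanged.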

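SNS events for testing

CloudWatch metric alarm

The event below simulates what flows through CloudWatch -> SNS -> Lambda. In a real SNS delivery the Message field is a JSON-encoded string; it is shown here as an expanded object, so serialize it to a string first if the handler expects one.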

{
    "Records": [
        {
            "EventSource": "aws:sns",
            "EventVersion": "1.0",
            "EventSubscriptionArn": "arn:aws:sns:********",
            "Sns": {
                "Type": "Notification",
                "MessageId": "b46076bd-********-17a94cdbfbd5",
                "TopicArn": "arn:aws:sns:********",
                "Message": {
                    "AlarmName": "test_1 high memory utilization alarm",
                    "AlarmDescription": null,
                    "AWSAccountId": "514986213302",
                    "AlarmConfigurationUpdatedTimestamp": "2025-08-12T07:39:49.733+0000",
                    "NewStateValue": "ALARM",
                    "NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [42.067474122272536 (12/08/25 07:36:00)] was greater than or equal to the threshold (40.0) (minimum 1 datapoint for OK -> ALARM transition).",
                    "StateChangeTime": "2025-08-12T07:41:05.608+0000",
                    "Region": "US West (Oregon)",
                    "AlarmArn": "arn:aws:cloudwatch:********",
                    "OldStateValue": "OK",
                    "OKActions": [

                    ],
                    "AlarmActions": [
                        "arn:aws:sns:********"
                    ],
                    "InsufficientDataActions": [

                    ],
                    "Trigger": {
                        "MetricName": "mem_used_percent",
                        "Namespace": "CWAgent",
                        "StatisticType": "ExtendedStatistic",
                        "ExtendedStatistic": "p90",
                        "Unit": null,
                        "Dimensions": [
                            {
                                "value": "i-********",
                                "name": "InstanceId"
                            }
                        ],
                        "Period": 300,
                        "EvaluationPeriods": 1,
                        "DatapointsToAlarm": 1,
                        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
                        "Threshold": 40,
                        "TreatMissingData": "missing",
                        "EvaluateLowSampleCountPercentile": ""
                    }
                },
                "Timestamp": "2025-08-12T07:41:05.649Z",
                "SignatureVersion": "1",
                "Signature": "********",
                "SigningCertUrl": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-********",
                "Subject": "ALARM: \"test_\" in US West (Oregon)",
                "UnsubscribeUrl": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:********",
                "MessageAttributes": {

                }
            }
        }
    ]
}
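
Because SNS hands Lambda the Message field as a string, a test event is closer to production when the alarm body is serialized before being embedded. The sketch below builds such an event. The function name make_sns_event and the default timestamp are placeholders for illustration, and only the fields the handler reads are included.

```python
import json


def make_sns_event(message, subject="", timestamp="1970-01-01T00:00:00.000Z"):
    """Wrap a message in a minimal SNS-to-Lambda event; non-string messages become a JSON string."""
    if not isinstance(message, str):
        message = json.dumps(message, ensure_ascii=False)
    return {
        "Records": [
            {
                "EventSource": "aws:sns",
                "EventVersion": "1.0",
                "Sns": {
                    "Type": "Notification",
                    "Subject": subject,
                    "Message": message,
                    "Timestamp": timestamp,
                    "MessageAttributes": {},
                },
            }
        ]
    }


# Example: print an event that can be pasted into the Lambda console's test dialog
alarm = {"AlarmName": "test", "AlarmDescription": None, "NewStateValue": "ALARM",
         "NewStateReason": "Threshold Crossed", "Region": "US West (Oregon)"}
print(json.dumps(make_sns_event(alarm, subject="ALARM: test"), indent=2))
```

Passing a plain markdown string instead of a dict yields an event for the non-JSON path of the handler.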

CloudWatch log alarm

The event below simulates what a scheduled Lambda pushes to SNS after running a query against CloudWatch Logs.

{
  "Records": [
    {
      "EventSource": "aws:sns",
      "EventVersion": "1.0",
      "EventSubscriptionArn": "arn:aws:sns:********",
      "Sns": {
        "Type": "Notification",
        "MessageId": "f664004c-xxxxxxxxx-615af8115e52",
        "TopicArn": "arn:aws:sns:********",
        "Message": "# 🔥 CloudWatch Logs Insights Alarm\n\n---\n\n### 1 alert in total\n\n---\n\n#### ERROR(hits = 1)\n> **msg**: test error\n- **line**:330\n- **logger**:com.amzless.ads.dispatch.task.ad.AdSyncHandler\n- **method**:getPostData\n- **firstSeen**:2025-08-13 04:51:36.463\n",
        "Timestamp": "2025-08-13T05:10:33.276Z",
        "SignatureVersion": "1",
        "Signature": "xxxxxxxxxx",
        "SigningCertUrl": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-xxxxxxxx.pem",
        "Subject": "CloudWatch Logs Insights Alarm",
        "UnsubscribeUrl": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:xxxxxxxx",
        "MessageAttributes": {}
      }
    }
  ]
}