Routing AWS Alarm Notifications to DingTalk (Also Works for WeCom)

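AWS notifications (SNS) cannot post directly in the message format that a DingTalk robot webhook expects, so a Lambda function is used as a relay: it parses the SNS message and pushes it to DingTalk (or WeCom).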

Architecture

CloudWatch alarm -> SNS topic -> Lambda function -> DingTalk robot webhook

Deployment steps

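Create the DingTalk robot

Steps:

  1. Create a DingTalk group for notifications (a regular group is free and does not consume any enterprise resources).
  2. Go to "Group Settings" -> "Robot" -> "Add Robot".
  3. ‼️ In the security settings, add the keyword: Alarm (the notifications sent by the code below include this word by default).
  4. Copy the access_token from the Webhook URL (it is needed later in the Lambda code).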
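
The last step above can be done by hand, but a small helper makes it less error-prone. The following is a minimal sketch that pulls the `access_token` query parameter out of a copied webhook URL. The function name `extract_access_token` and the token value in the example are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def extract_access_token(webhook_url: str) -> str:
    """Return the access_token query parameter of a DingTalk robot webhook URL."""
    query = parse_qs(urlparse(webhook_url).query)
    tokens = query.get("access_token", [])
    if len(tokens) != 1 or not tokens[0]:
        raise ValueError("webhook URL does not contain exactly one access_token")
    return tokens[0]


# Placeholder URL: the token value is invented for this example.
example_url = "https://oapi.dingtalk.com/robot/send?access_token=0123abcd"
print(extract_access_token(example_url))
```

The returned value is what later goes into the Lambda environment variable `token`.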

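Deploy the DingTalk notification function in Lambda

The configuration in Lambda is roughly as follows:

  1. 📦 Package the code and its dependencies locally (Python is used here; the code file can even be empty at this point, because it can be edited directly in the Lambda console later):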
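```bash
# 1) Create the packaging directory
mkdir -p lambda_pkg && cd lambda_pkg

# 2) Install the dependencies into the current directory (requests is the only one here)
pip install --upgrade pip
pip install requests -t .

# 3) Add your code file
cp /path/to/lambda_function.py .

# 4) Create the archive (note: the zip must contain the files themselves at its root, not an enclosing folder)
zip -r ../lambda.zip .

# 5) Go back up one level (the resulting lambda.zip is ready to upload)
cd ..
```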
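  2. Create the function in the Lambda console: navigation bar "Functions" -> "Create function".
  3. Upload the lambda.zip packaged in step 1.
  4. Configure the environment variable token, setting it to the access_token obtained when creating the DingTalk robot.
  5. (Optional) Modify the code as needed.
  6. Deploy and test.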
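
A common packaging mistake is zipping the enclosing folder instead of its contents. The sketch below checks the archive layout before upload. It assumes the handler file is named `lambda_function.py` and that `requests` was installed into the package root, as in the packaging step; the helper name `check_lambda_zip` is hypothetical.

```python
import zipfile


def check_lambda_zip(zip_path, handler_file="lambda_function.py"):
    """Return a list of layout problems found in the archive (empty if none)."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()

    # Tolerate a leading "./" in stored names, in case the archiving tool kept it.
    normalized = [n[2:] if n.startswith("./") else n for n in names]

    problems = []
    if handler_file not in normalized:
        problems.append(f"{handler_file} is not at the root of the archive")
    if not any(n.startswith("requests/") for n in normalized):
        problems.append("the requests package is not at the root of the archive")
    return problems
```

Calling `check_lambda_zip("lambda.zip")` after the packaging step should return an empty list; any returned entry points at a layout problem to fix before uploading.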

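Create the SNS topic and subscription

(Optional) Create a topic

If a topic already exists, skip this and go straight to creating the subscription.

A "Standard" SNS topic is sufficient; leave the other settings at their defaults unless there is a specific requirement.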

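Create the subscription

  1. For the protocol, select "AWS Lambda".
  2. For the endpoint, select the Lambda function created earlier in the Lambda deployment section.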
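
The same subscription can also be created from a script instead of the console. This is a rough sketch using the boto3 `subscribe` and `add_permission` calls; the statement ID and the helper name are hypothetical, and the console typically grants the invoke permission for you, so treat the permission call as an assumption about a script-only setup.

```python
def subscribe_lambda_to_topic(sns_client, lambda_client, topic_arn, function_arn):
    """Allow an SNS topic to invoke a Lambda function and subscribe the function to it."""
    # Grant SNS permission to invoke the function, restricted to this topic.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowSnsInvoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )
    # Create the subscription with the "lambda" protocol.
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="lambda",
        Endpoint=function_arn,
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]


# Usage (requires boto3 and AWS credentials; the ARNs are placeholders):
#   import boto3
#   subscribe_lambda_to_topic(
#       boto3.client("sns"), boto3.client("lambda"), "<topic-arn>", "<function-arn>"
#   )
```

Passing the clients in as arguments keeps the function easy to exercise with stub objects before running it against a real account.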

Appendix

Code

```python
import json
import os

import requests

DINGTALK_URL = "https://oapi.dingtalk.com/robot/send"


def send_msg(title, msg):
    """Send a markdown message to the DingTalk robot and return the response body."""
    token = os.getenv('token')
    if not token:
        raise RuntimeError("The 'token' environment variable is not set")

    payload = {"msgtype": "markdown", "markdown": {"title": title, "text": msg}}
    print(payload)
    # json= serializes the payload (escaping quotes and newlines) and sets
    # the Content-Type: application/json header.
    response = requests.post(
        DINGTALK_URL, params={"access_token": token}, json=payload, timeout=10
    )
    return response.text


def lambda_handler(event, context):
    print(event)
    sns = event['Records'][0]['Sns']
    raw_message = sns['Message']
    timestamp = sns.get('Timestamp')

    # SNS delivers Message as a string; a hand-written test event may
    # contain an already-expanded object instead.
    try:
        message = json.loads(raw_message) if isinstance(raw_message, str) else raw_message
    except json.JSONDecodeError:
        message = None

    if not isinstance(message, dict):
        # The publisher is not a CloudWatch alarm: forward the raw message as-is
        print(raw_message)
        send_msg("Alarm notification", str(raw_message))
        return

    try:
        alarm_name = message['AlarmName']
        alarm_description = message['AlarmDescription'] or "(none)"
        new_state_value = message['NewStateValue']
        new_state_reason = message['NewStateReason']
        region = message['Region']

        msg = (
            f"**Alarm**: {alarm_name}\n\n**Region**: {region}\n\n"
            f"**Current state**: {new_state_value}\n\n**Reason**: {new_state_reason}\n\n"
            f"**Time**: {timestamp}\n\n**Description**: {alarm_description}"
        )
        print(msg)
        send_msg(alarm_name, msg)
    except KeyError as e:
        # A required field is missing: report it as a parsing failure
        msg = (
            f"**Failed to parse alarm message**\n\n**Raw message**: {raw_message}\n\n"
            f"**Time**: {timestamp}\n\n**Error**: missing required field {e}"
        )
        print(f"Missing field: {e}")
        print(msg)
        send_msg("Failed to parse alarm message", msg)
```
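
Alarm text often contains quotes and newlines, so the request body is safer to build with `json.dumps` than by string formatting. The sketch below shows one way to build the DingTalk markdown body and, as an assumption, the equivalent body for a WeCom group robot, which is understood to take the text in a `content` field without a title; check the WeCom robot documentation before relying on it. Both function names are hypothetical.

```python
import json


def dingtalk_markdown_body(title: str, text: str) -> bytes:
    """Build the UTF-8 request body for a DingTalk robot markdown message."""
    body = {"msgtype": "markdown", "markdown": {"title": title, "text": text}}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def wecom_markdown_body(text: str) -> bytes:
    """Build the UTF-8 request body for a WeCom group robot markdown message.

    Assumption: WeCom expects the markdown text under "content" and has no title.
    """
    body = {"msgtype": "markdown", "markdown": {"content": text}}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")
```

Either body can be posted with a `Content-Type: application/json` header; encoding to UTF-8 bytes up front avoids encoding errors when the text contains non-ASCII characters.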

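SNS events for testing

CloudWatch metric alarm

This event simulates what Lambda receives along the CloudWatch -> SNS -> Lambda path. Note that in a real SNS delivery the Message field is a JSON-encoded string; it is shown here as an expanded object for readability.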

```json
{
    "Records": [
        {
            "EventSource": "aws:sns",
            "EventVersion": "1.0",
            "EventSubscriptionArn": "arn:aws:sns:********",
            "Sns": {
                "Type": "Notification",
                "MessageId": "b46076bd-********-17a94cdbfbd5",
                "TopicArn": "arn:aws:sns:********",
                "Message": {
                    "AlarmName": "test_1 high memory utilization alarm",
                    "AlarmDescription": null,
                    "AWSAccountId": "514986213302",
                    "AlarmConfigurationUpdatedTimestamp": "2025-08-12T07:39:49.733+0000",
                    "NewStateValue": "ALARM",
                    "NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [42.067474122272536 (12/08/25 07:36:00)] was greater than or equal to the threshold (40.0) (minimum 1 datapoint for OK -> ALARM transition).",
                    "StateChangeTime": "2025-08-12T07:41:05.608+0000",
                    "Region": "US West (Oregon)",
                    "AlarmArn": "arn:aws:cloudwatch:********",
                    "OldStateValue": "OK",
                    "OKActions": [],
                    "AlarmActions": [
                        "arn:aws:sns:********"
                    ],
                    "InsufficientDataActions": [],
                    "Trigger": {
                        "MetricName": "mem_used_percent",
                        "Namespace": "CWAgent",
                        "StatisticType": "ExtendedStatistic",
                        "ExtendedStatistic": "p90",
                        "Unit": null,
                        "Dimensions": [
                            {
                                "value": "i-********",
                                "name": "InstanceId"
                            }
                        ],
                        "Period": 300,
                        "EvaluationPeriods": 1,
                        "DatapointsToAlarm": 1,
                        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
                        "Threshold": 40,
                        "TreatMissingData": "missing",
                        "EvaluateLowSampleCountPercentile": ""
                    }
                },
                "Timestamp": "2025-08-12T07:41:05.649Z",
                "SignatureVersion": "1",
                "Signature": "********",
                "SigningCertUrl": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-********",
                "Subject": "ALARM: \"test_\" in US West (Oregon)",
                "UnsubscribeUrl": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:********",
                "MessageAttributes": {}
            }
        }
    ]
}
```
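
Because SNS delivers `Message` as a string, a test event whose `Message` is an expanded object does not match what the function receives in production. The following sketch converts such an event into the delivered shape by serializing any non-string `Message` back to JSON text; the helper name `to_sns_wire_format` is hypothetical.

```python
import copy
import json


def to_sns_wire_format(event: dict) -> dict:
    """Return a copy of the event in which every Sns.Message is a string."""
    wire_event = copy.deepcopy(event)
    for record in wire_event.get("Records", []):
        message = record["Sns"]["Message"]
        if not isinstance(message, str):
            # Serialize the expanded object the way SNS would deliver it.
            record["Sns"]["Message"] = json.dumps(message, ensure_ascii=False)
    return wire_event
```

Dumping the result with `json.dumps(to_sns_wire_format(event), indent=2)` gives a test event that can be pasted into the Lambda console.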

CloudWatch log alarm

This event simulates a Lambda function that periodically runs a query against CloudWatch Logs and publishes the result to SNS.

```json
{
  "Records": [
    {
      "EventSource": "aws:sns",
      "EventVersion": "1.0",
      "EventSubscriptionArn": "arn:aws:sns:********",
      "Sns": {
        "Type": "Notification",
        "MessageId": "f664004c-xxxxxxxxx-615af8115e52",
        "TopicArn": "arn:aws:sns:********",
        "Message": "# 🔥 CloudWatch Logs Insights Alarm\n\n---\n\n### 1 alert in total\n\n---\n\n#### ERROR(hits = 1)\n> **msg**: test error\n- **line**: 330\n- **logger**: com.amzless.ads.dispatch.task.ad.AdSyncHandler\n- **method**: getPostData\n- **firstSeen**: 2025-08-13 04:51:36.463\n",
        "Timestamp": "2025-08-13T05:10:33.276Z",
        "SignatureVersion": "1",
        "Signature": "xxxxxxxxxx",
        "SigningCertUrl": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-xxxxxxxx.pem",
        "Subject": "CloudWatch Logs Insights Alarm",
        "UnsubscribeUrl": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:xxxxxxxx",
        "MessageAttributes": {}
      }
    }
  ]
}
```
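
The two test events differ in one respect: the metric alarm carries a JSON document with an `AlarmName` field, while the log alarm carries free-form markdown. A minimal sketch of telling them apart is shown below; the function name `classify_sns_message` and the `"alarm"` / `"raw"` labels are hypothetical.

```python
import json


def classify_sns_message(raw_message: str):
    """Return ("alarm", dict) for a CloudWatch alarm payload, else ("raw", str)."""
    try:
        parsed = json.loads(raw_message)
    except json.JSONDecodeError:
        # Not JSON at all, for example a pre-rendered markdown message.
        return "raw", raw_message
    if isinstance(parsed, dict) and "AlarmName" in parsed:
        return "alarm", parsed
    # Valid JSON that is not an alarm document (a number, a list, another object).
    return "raw", raw_message
```

Checking for a dictionary with `AlarmName` also covers messages that happen to be valid JSON but are not alarm documents, which a bare `json.loads` check would misclassify.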