Sending AWS Alarm Notifications to DingTalk (Also Applicable to WeCom)

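AWS alarm notifications cannot be delivered straight to a DingTalk robot webhook in the message format it expects, so a Lambda function acts as a relay: it parses the SNS message and then pushes it to DingTalk (or WeCom).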

Architecture

CloudWatch alarm -> SNS topic -> Lambda function -> DingTalk robot webhook

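Deployment steps

Create the DingTalk robot

Steps:

  1. Create a DingTalk notification group (a regular group is free and does not consume enterprise resources).
  2. Go to "Group Settings" -> "Robot" -> "Add Robot".
  3. ‼️ In the security settings, add the keyword: Alarm (the notification content produced by the code below contains this word by default).
  4. Copy the access_token from the Webhook URL (it is needed later in the Lambda code).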
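
As a rough illustration of steps 3 and 4, the sketch below assembles the request a relay would send to the robot. The endpoint is the one used by the code in the appendix. `build_robot_request` and its keyword guard are hypothetical helpers, and the idea that the keyword check is a plain case-sensitive substring match on the message text is an assumption, not something stated in this post.

```python
import json

# Endpoint taken from the appendix code; the token is the value copied in step 4.
DINGTALK_SEND_URL = "https://oapi.dingtalk.com/robot/send"
KEYWORD = "Alarm"  # keyword configured in the robot's security settings (step 3)


def build_robot_request(access_token, title, text, keyword=KEYWORD):
    """Return (url, body) for a DingTalk markdown message (hypothetical helper)."""
    if keyword not in text:
        # Assumption: the robot rejects messages that lack the keyword,
        # so prepend it when the text does not already contain it.
        text = f"{keyword}\n\n{text}"
    url = f"{DINGTALK_SEND_URL}?access_token={access_token}"
    payload = {"msgtype": "markdown", "markdown": {"title": title, "text": text}}
    # json.dumps escapes quotes and newlines, so the body is always valid JSON.
    return url, json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    url, body = build_robot_request("YOUR_ACCESS_TOKEN", "demo", "disk usage is high")
    print(url)
    print(body)
```

Building the body with `json.dumps` rather than string formatting keeps the payload valid even when the alarm text contains quotes or line breaks.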

Deploy the DingTalk notification function on Lambda

The configuration in Lambda roughly goes as follows:

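  1. 📦 Package the code and its dependencies locally (Python is used here; the code file can even be empty, since it can be edited directly in the Lambda console later):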
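bash
# 1) Create the packaging directory
mkdir -p lambda_pkg && cd lambda_pkg

# 2) Install the dependencies into the current directory (requests is the only dependency here)
pip install --upgrade pip
pip install requests -t .

# 3) Copy in your code file
cp /path/to/lambda_function.py .

# 4) Zip it up (note: the archive must contain the files themselves, not an enclosing folder)
zip -r ../lambda.zip .

# 5) Go back up one level (the resulting lambda.zip is ready to upload)
cd ..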
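  2. Create the function in the Lambda console: navigation pane "Functions" -> "Create function".
  3. Upload the lambda.zip packaged in step 1.
  4. Configure the environment variable token, setting it to the access_token obtained when the DingTalk robot was created.
  5. (As needed) Modify the code.
  6. Deploy and test.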
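
The note in step 1 about the archive layout can also be sketched in Python, which may help on machines without the `zip` command. This is a minimal sketch using only the standard library; `zip_dir_contents` is a hypothetical helper name, and it assumes the output archive is written outside the directory being packaged, as in the shell commands above.

```python
import zipfile
from pathlib import Path


def zip_dir_contents(src_dir, zip_path):
    """Zip the contents of src_dir so that files sit at the archive root."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir, e.g. "lambda_function.py",
                # not "lambda_pkg/lambda_function.py".
                zf.write(path, path.relative_to(src).as_posix())
    return zip_path


if __name__ == "__main__":
    if Path("lambda_pkg").is_dir():
        zip_dir_contents("lambda_pkg", "lambda.zip")
        print(zipfile.ZipFile("lambda.zip").namelist()[:5])
```

Lambda looks for the handler module at the top level of the archive, which is why the enclosing folder must not be part of the stored paths.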

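Create the SNS topic and subscription

(As needed) Create the topic

If a topic already exists, skip this part and create the subscription directly.

A "Standard" SNS topic is sufficient; leave the other settings at their defaults unless you have specific requirements.

Create the subscription

  1. For the protocol, choose "AWS Lambda".
  2. For the endpoint, choose the Lambda function created in the previous section.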
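
For readers who prefer scripting over the console, the same topic and subscription can be sketched with boto3. This is an illustration under assumptions rather than the setup used in this post: `subscribe_lambda_to_topic` and the statement ID are hypothetical, and the clients are passed in so the function itself does not import boto3. When going through the API, the function also needs a resource-based permission that lets SNS invoke it, which is what `add_permission` grants.

```python
def subscribe_lambda_to_topic(sns_client, lambda_client, topic_name, function_arn):
    """Create (or reuse) a standard topic and subscribe a Lambda function to it."""
    # create_topic returns the existing topic when the name is already in use.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]

    # Allow SNS to invoke the function for messages from this topic.
    # Note: this call fails if the same StatementId was already added.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="AllowSnsInvoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )

    # Protocol "lambda" corresponds to "AWS Lambda" in the console.
    subscription = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="lambda",
        Endpoint=function_arn,
        ReturnSubscriptionArn=True,
    )
    return topic_arn, subscription["SubscriptionArn"]


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   subscribe_lambda_to_topic(
#       boto3.client("sns"), boto3.client("lambda"),
#       "alarm-topic", "arn:aws:lambda:<region>:<account>:function:<name>",
#   )
```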

Appendix

Code

python
import json
import os

import requests

DINGTALK_URL = "https://oapi.dingtalk.com/robot/send"


def send_msg(title, msg):
    """Post a markdown message to the DingTalk robot and return the response body."""
    # access_token of the robot, read from the Lambda environment variable `token`
    token = os.getenv('token')
    if not token:
        raise RuntimeError("environment variable 'token' is not set")
    payload = {"msgtype": "markdown", "markdown": {"title": title, "text": msg}}
    print(payload)
    # json= lets requests serialize the payload, so quotes and newlines in msg are escaped correctly
    response = requests.post(DINGTALK_URL, params={"access_token": token}, json=payload, timeout=5)
    return response.text


def lambda_handler(event, context):
    print(event)
    sns = event['Records'][0]['Sns']
    raw_message = sns['Message']
    timestamp = sns['Timestamp']

    try:
        # SNS delivers Message as a JSON string; a hand-written test event may already hold an object
        message = json.loads(raw_message) if isinstance(raw_message, str) else raw_message
        alarm_name = message['AlarmName']
        alarm_description = message['AlarmDescription']
        new_state_value = message['NewStateValue']
        new_state_reason = message['NewStateReason']
        region = message['Region']

        title = alarm_name
        msg = f"**Alarm**: {alarm_name}\n\n**Region**: {region}\n\n**Current state**: {new_state_value}\n\n**Reason**: {new_state_reason}\n\n**Time**: {timestamp}\n\n**Description**: {alarm_description}"
    except json.JSONDecodeError:
        # The publisher is not a CloudWatch alarm: forward the raw message as-is
        title = "Alarm notification"
        msg = raw_message
    except KeyError as e:
        # A required field is missing: report it as a parse failure
        print(f"Missing field: {e}")
        title = "Alarm message parsing failed"
        msg = f"**Alarm message parsing failed**\n\n**Raw message**: {raw_message}\n\n**Time**: {timestamp}\n\n**Error**: missing required field {e}"

    print(msg)
    send_msg(title, msg)

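SNS events for testing

CloudWatch metric alarm

The event here simulates what travels along CloudWatch -> SNS -> Lambda. In an event actually delivered by SNS, "Message" is a JSON-encoded string, whereas here it appears as an expanded object.

json
{
    "Records": [
        {
            "EventSource": "aws:sns",
            "EventVersion": "1.0",
            "EventSubscriptionArn": "arn:aws:sns:********",
            "Sns": {
                "Type": "Notification",
                "MessageId": "b46076bd-********-17a94cdbfbd5",
                "TopicArn": "arn:aws:sns:********",
                "Message": {
                    "AlarmName": "test_1 memory utilization too high alarm",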
                    "AlarmDescription": null,
                    "AWSAccountId": "514986213302",
                    "AlarmConfigurationUpdatedTimestamp": "2025-08-12T07:39:49.733+0000",
                    "NewStateValue": "ALARM",
                    "NewStateReason": "Threshold Crossed: 1 out of the last 1 datapoints [42.067474122272536 (12/08/25 07:36:00)] was greater than or equal to the threshold (40.0) (minimum 1 datapoint for OK -> ALARM transition).",
                    "StateChangeTime": "2025-08-12T07:41:05.608+0000",
                    "Region": "US West (Oregon)",
                    "AlarmArn": "arn:aws:cloudwatch:********",
                    "OldStateValue": "OK",
                    "OKActions": [

                    ],
                    "AlarmActions": [
                        "arn:aws:sns:********"
                    ],
                    "InsufficientDataActions": [

                    ],
                    "Trigger": {
                        "MetricName": "mem_used_percent",
                        "Namespace": "CWAgent",
                        "StatisticType": "ExtendedStatistic",
                        "ExtendedStatistic": "p90",
                        "Unit": null,
                        "Dimensions": [
                            {
                                "value": "i-********",
                                "name": "InstanceId"
                            }
                        ],
                        "Period": 300,
                        "EvaluationPeriods": 1,
                        "DatapointsToAlarm": 1,
                        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
                        "Threshold": 40,
                        "TreatMissingData": "missing",
                        "EvaluateLowSampleCountPercentile": ""
                    }
                },
                "Timestamp": "2025-08-12T07:41:05.649Z",
                "SignatureVersion": "1",
                "Signature": "********",
                "SigningCertUrl": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-********",
                "Subject": "ALARM: \"test_\" in US West (Oregon)",
                "UnsubscribeUrl": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:********",
                "MessageAttributes": {

                }
            }
        }
    ]
}
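
To turn the expanded sample above into the shape a handler receives from SNS, the alarm body can be serialized back into the "Message" string. The sketch below does that with a hypothetical helper, `make_sns_event`; only the fields the handler in this post reads are filled in, and the default timestamp is copied from the sample.

```python
import json


def make_sns_event(message, timestamp="2025-08-12T07:41:05.649Z"):
    """Wrap a message in a minimal SNS -> Lambda event (hypothetical test helper)."""
    if not isinstance(message, str):
        # SNS carries the alarm body as a JSON-encoded string, not an object.
        message = json.dumps(message, ensure_ascii=False)
    return {
        "Records": [
            {
                "EventSource": "aws:sns",
                "EventVersion": "1.0",
                "Sns": {
                    "Type": "Notification",
                    "Message": message,
                    "Timestamp": timestamp,
                },
            }
        ]
    }


if __name__ == "__main__":
    alarm = {
        "AlarmName": "test_1",
        "AlarmDescription": None,
        "NewStateValue": "ALARM",
        "NewStateReason": "Threshold Crossed",
        "Region": "US West (Oregon)",
    }
    print(json.dumps(make_sns_event(alarm), indent=2))
```

The printed JSON can be pasted into the Lambda console as a test event, or the dictionary can be passed directly to `lambda_handler(event, None)` in a local run.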

CloudWatch log alarm

The event here simulates what a Lambda function, run on a schedule, retrieves from CloudWatch Logs with a query and publishes to SNS.

json
{
  "Records": [
    {
      "EventSource": "aws:sns",
      "EventVersion": "1.0",
      "EventSubscriptionArn": "arn:aws:sns:********",
      "Sns": {
        "Type": "Notification",
        "MessageId": "f664004c-xxxxxxxxx-615af8115e52",
        "TopicArn": "arn:aws:sns:********",
        "Message": "# 🔥 CloudWatch Logs Insights Alarm\n\n---\n\n### 1 alert in total\n\n---\n\n#### ERROR(hits = 1)\n> **msg**: test error\n- **line**: 330\n- **logger**: com.amzless.ads.dispatch.task.ad.AdSyncHandler\n- **method**: getPostData\n- **firstSeen**: 2025-08-13 04:51:36.463\n",
        "Timestamp": "2025-08-13T05:10:33.276Z",
        "SignatureVersion": "1",
        "Signature": "xxxxxxxxxx",
        "SigningCertUrl": "https://sns.us-west-2.amazonaws.com/SimpleNotificationService-xxxxxxxx.pem",
        "Subject": "CloudWatch Logs Insights Alarm",
        "UnsubscribeUrl": "https://sns.us-west-2.amazonaws.com/?Action=Unsubscribe&SubscriptionArn=arn:aws:sns:xxxxxxxx",
        "MessageAttributes": {}
      }
    }
  ]
}
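
The Lambda function that produces this log alarm is not shown in the post. As a loose illustration only, the sketch below builds a markdown "Message" of a similar shape from already-grouped query results. `format_log_alert`, the input structure, and the heading text are all hypothetical, and the query and the SNS publish call are left out.

```python
def format_log_alert(groups):
    """Render grouped log hits as a markdown alarm message (hypothetical format)."""
    lines = [
        "# 🔥 CloudWatch Logs Insights Alarm",
        "",
        "---",
        "",
        f"### {len(groups)} alert(s) in total",
        "",
        "---",
        "",
    ]
    for group in groups:
        lines.append(f"#### {group['level']}(hits = {group['hits']})")
        lines.append(f"> **msg**: {group['msg']}")
        # Optional extra fields such as line, logger, method, firstSeen.
        for key, value in group.get("fields", {}).items():
            lines.append(f"- **{key}**: {value}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        {
            "level": "ERROR",
            "hits": 1,
            "msg": "test error",
            "fields": {"line": 330, "method": "getPostData"},
        }
    ]
    print(format_log_alert(sample))
```

Because the result is plain markdown rather than JSON, a relay that first tries to parse "Message" as JSON would fall through to its raw-message path and forward the text unchanged.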