Contents
1 Alert Rules
1.1 Alert Rule Configuration
1.2 Email Alerts via Webhook
1.2.1 Add Dependencies
1.2.2 Add Configuration
1.2.3 Implement the Endpoint
1.2.4 Configure the Webhook
1.2.5 Restart the Service and SkyWalking
2 Connecting the Webhook to Feishu
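1 Alert Rules
1.1 Alert Rule Configuration
Sometimes a service misbehaves: an endpoint responds very slowly or times out, or the request success rate drops sharply. SkyWalking then needs to notify developers and operators so they can investigate promptly.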

The file alarm-settings.yml in the SkyWalking installation directory apache-skywalking-apm-bin\config defines the default alarm rules:
```yaml
# Sample alarm rules.
rules:
  # Rule unique name, must be ended with `_rule`.
  service_resp_time_rule:
    # A MQE expression, the result type must be `SINGLE_VALUE` and the root operation of the expression must be a Compare Operation
    # which provides `1`(true) or `0`(false) result. When the result is `1`(true), the alarm will be triggered.
    expression: sum(service_resp_time > 1000) >= 3
    period: 10
    silence-period: 5
    message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
#  service_resp_time_rule:
#    expression: avg(service_resp_time) > 1000
#    period: 10
#    silence-period: 5
#    message: Avg response time of service {name} is more than 1000ms in last 10 minutes.
  service_sla_rule:
    expression: sum(service_sla < 8000) >= 2
    # The length of time to evaluate the metrics
    period: 10
    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
    silence-period: 3
    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
  service_resp_time_percentile_rule:
    expression: sum(service_percentile{p='50,75,90,95,99'} > 1000) >= 3
    period: 10
    silence-period: 5
    message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
  service_instance_resp_time_rule:
    expression: sum(service_instance_resp_time > 1000) >= 2
    period: 10
    silence-period: 5
    message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
  database_access_resp_time_rule:
    expression: sum(database_access_resp_time > 1000) >= 2
    period: 10
    message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
  endpoint_relation_resp_time_rule:
    expression: sum(endpoint_relation_resp_time > 1000) >= 2
    period: 10
    message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes
#  Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm.
#  Because the number of endpoint is much more than service and instance.
#
#  endpoint_resp_time_rule:
#    expression: sum(endpoint_resp_time > 1000) >= 2
#    period: 10
#    silence-period: 5
#    message: Response time of endpoint {name} is more than 1000ms in 2 minutes of last 10 minutes

#hooks:
#  webhook:
#    default:
#      is-default: true
#      urls:
#        - http://127.0.0.1/notify/
#        - http://127.0.0.1/go-wechat/
```
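Each alarm rule name must end with `_rule`. Within a rule:
expression is the alarm expression (an MQE expression). The alarm fires when it evaluates to 1.
period is the evaluation window in minutes. The expression is evaluated over the metric values in this window.
silence-period is the silence period, counted in checks (roughly one per minute). After the alarm fires, it is not fired again until this period has passed.
message is the alarm text. `{name}` is replaced with the name of the affected entity.
For example, service_resp_time_rule fires when a service's response time exceeds 1000 ms in at least 3 of the last 10 minutes. Those 3 minutes do not have to be consecutive.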
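The following Java model makes the combination of `sum(service_resp_time > 1000) >= 3`, `period: 10` and `silence-period: 5` concrete. It is a simplified sketch, not SkyWalking's alarm engine. It assumes one metric value and one check per minute. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified model of a rule like sum(metric > threshold) >= minHits. */
public class RuleWindowSketch {
    private final int period;          // window size in minutes (one value per minute assumed)
    private final double threshold;    // e.g. 1000 ms
    private final int minHits;         // e.g. 3
    private final int silencePeriod;   // checks to stay silent after firing
    private final Deque<Double> window = new ArrayDeque<>();
    private int silenceLeft = 0;

    public RuleWindowSketch(int period, double threshold, int minHits, int silencePeriod) {
        this.period = period;
        this.threshold = threshold;
        this.minHits = minHits;
        this.silencePeriod = silencePeriod;
    }

    /** Feeds one per-minute value; returns true if an alarm is sent on this check. */
    public boolean check(double minuteValue) {
        window.addLast(minuteValue);
        if (window.size() > period) {
            window.removeFirst(); // keep only the last `period` minutes
        }
        // sum(metric > threshold): count minutes above the threshold (not necessarily consecutive)
        long hits = window.stream().filter(v -> v > threshold).count();
        if (silenceLeft > 0) {
            silenceLeft--;
            return false; // still inside the silence period
        }
        if (hits >= minHits) {
            silenceLeft = silencePeriod;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // service_resp_time_rule: period 10, threshold 1000 ms, >= 3 hits, silence-period 5
        RuleWindowSketch rule = new RuleWindowSketch(10, 1000, 3, 5);
        double[] respTimes = {200, 1500, 300, 1200, 250, 1800, 1900, 2000, 400, 300};
        for (int minute = 0; minute < respTimes.length; minute++) {
            boolean fired = rule.check(respTimes[minute]);
            System.out.printf("minute %d: %.0f ms -> %s%n", minute, respTimes[minute], fired ? "ALARM" : "-");
        }
    }
}
```

In this sample, the third slow minute arrives at minute 5, so the alarm fires there. The following five checks stay silent even though the window still contains three or more slow minutes.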
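1.2 Email Alerts via Webhook
A Webhook is a mechanism that lets an application push events or data to external systems in real time. It usually works as an HTTP callback and enables automated information delivery across systems. Its key characteristics are:
**Event-driven:** when a preset condition is met (for example, an alarm fires or data is updated), the source actively sends an HTTP request (usually a POST) to the target URL.
**Lightweight integration:** the receiver only needs to expose a reachable HTTP endpoint; it does not have to poll for data.
**Flexible:** it suits alarm notifications, workflow triggers, data synchronization and similar scenarios.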
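The push model can be shown with a minimal, self-contained Java sketch that uses only the JDK. A receiver exposes an HTTP endpoint with `com.sun.net.httpserver.HttpServer`, and a sender pushes one JSON event to it with `java.net.http.HttpClient`. The `/notify` path and the event fields are made up for illustration. In the setup that follows, SkyWalking plays the sender and the Spring Boot alarm service plays the receiver.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Minimal in-process webhook demo: a receiver exposes an endpoint, a sender pushes an event to it. */
public class WebhookDemo {
    /** Bodies received by the hypothetical /notify endpoint. */
    static final List<String> received = new CopyOnWriteArrayList<>();

    /** Receiver side: listens on an ephemeral local port and records each POSTed body. */
    static HttpServer startReceiver() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/notify", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                received.add(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
            byte[] reply = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        return server;
    }

    /** Sender side: pushes one event as an HTTP POST; no polling by the receiver is involved. */
    static int push(String url, String jsonEvent) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonEvent))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startReceiver();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/notify";
            int status = push(url, "{\"event\":\"alarm\",\"rule\":\"service_resp_time_rule\"}");
            System.out.println("status=" + status + ", received=" + received);
        } finally {
            server.stop(0);
        }
    }
}
```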
1.2.1 Add Dependencies
```xml
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-mail</artifactId>
    </dependency>
</dependencies>
```
spring-boot-starter-mail is the dependency that provides email sending.
1.2.2 Add Configuration
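```yaml
server:
  port: 8084
logging:
  pattern:
    dateformat: HH:mm:ss:SSS
spring:
  mail:
    # Mail server host
    host: smtp.qq.com
    # Login account (the sender's email address)
    username: "sender email account"
    # Authorization code (not the mailbox login password)
    password: "authorization code"
    # Port
    port: 465
    # Default encoding
    default-encoding: UTF-8
    # Protocol
    protocol: smtps
    # Additional properties
    properties:
      # Standard JavaMail properties
      "mail.smtp.connectiontimeout": 5000
      "mail.smtp.timeout": 3000
      "mail.smtp.writetimeout": 5000
      "mail.smtp.auth": true
      "mail.smtp.starttls.enable": true
      "mail.smtp.starttls.required": true
      # Custom properties, read by the Mail class when building the message
      "personal": "Alert System"
      "subject": "Order System Alert"
```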
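Enable the SMTP service in your QQ Mail settings. Then replace spring.mail.username and spring.mail.password in the configuration with your own email account and authorization code.
For how to obtain the authorization code, see QQ Mail's help documentation: https://service.mail.qq.com/detail/0/75
1.2.3 Implement the Endpoint
Entity class for receiving SkyWalking alarm messages: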
```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import lombok.Data;

/** One entry of the JSON array that SkyWalking POSTs to the webhook. */
@Data
public class AlarmMessage {
    private int scopeId;
    private String scope;
    private String name;
    private String id0;
    private String id1;
    private String ruleName;
    private String alarmMessage;
    private List<Tag> tags;
    private long startTime; // epoch milliseconds
    private transient int period;
    private Set<String> hooks = new HashSet<>();
    private String expression;

    @Data
    public static class Tag {
        private String key;
        private String value;
    }
}
```
Mail-sending class:
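```java
import java.util.Optional;

import jakarta.mail.internet.MimeMessage; // on Spring Boot 2.x: javax.mail.internet.MimeMessage

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.mail.MailProperties;
import org.springframework.mail.javamail.JavaMailSender;
import org.springframework.mail.javamail.MimeMessageHelper;
import org.springframework.stereotype.Component;

@Slf4j
@Component
public class Mail {
    // Bound from the spring.mail.* configuration (host, username, properties, ...)
    @Autowired
    private MailProperties mailProperties;

    @Autowired
    private JavaMailSender javaMailSender;

    public void send(String to, String content) {
        try {
            // Create a MIME message
            MimeMessage message = javaMailSender.createMimeMessage();
            // Helper for a non-multipart message
            MimeMessageHelper helper = new MimeMessageHelper(message, false);
            // Sender address and display name (falls back to the username if "personal" is not set)
            String personal = Optional.ofNullable(mailProperties.getProperties().get("personal"))
                    .orElse(mailProperties.getUsername());
            helper.setFrom(mailProperties.getUsername(), personal);
            // Recipient address
            helper.setTo(to);
            // Subject (falls back to a default if "subject" is not set)
            helper.setSubject(mailProperties.getProperties().getOrDefault("subject", "Alarm notification"));
            // Body; the second argument marks it as HTML
            helper.setText(content, true);
            // Send
            javaMailSender.send(message);
        } catch (Exception e) {
            log.error("Failed to send alarm email", e);
        }
    }
}
```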
Controller endpoint:
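```java
import java.util.List;

import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@Slf4j
@RestController
@RequestMapping("/alarm")
public class AlarmController {
    @Autowired
    private Mail mail;

    // SkyWalking POSTs a JSON array of alarm messages to this endpoint
    @PostMapping("/handler")
    public String handler(@RequestBody List<AlarmMessage> alarmMessages) {
        log.info("Received alarms, alarmMessages: {}", alarmMessages);
        // Replace with the recipient's email address
        mail.send("recipient@example.com", buildMessage(alarmMessages));
        return "Alarm received";
    }

    private String buildMessage(List<AlarmMessage> alarmMessages) {
        StringBuilder builder = new StringBuilder();
        builder.append("System alarm: <br/>");
        for (AlarmMessage alarmMessage : alarmMessages) {
            builder.append("scopeId: ").append(alarmMessage.getScopeId())
                    .append("<br/> scope: ").append(alarmMessage.getScope())
                    .append("<br/> Entity name in the target scope: ").append(alarmMessage.getName())
                    .append("<br/> Entity ID in the scope: ").append(alarmMessage.getId0())
                    .append("<br/> Alarm rule name: ").append(alarmMessage.getRuleName())
                    .append("<br/> Alarm message: ").append(alarmMessage.getAlarmMessage())
                    .append("<br/> Alarm time (epoch ms): ").append(alarmMessage.getStartTime())
                    .append("<br/><br/>---------------<br/>");
        }
        return builder.toString();
    }
}
```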
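The controller inserts `alarmMessage` text into an HTML email body verbatim and prints `startTime` as raw epoch milliseconds. The plain-Java sketch below shows a possible refinement. It escapes HTML special characters (the percentile rule's message contains `>`) and renders the start time as a readable timestamp. `Alarm` and `AlarmFormatter` are hypothetical stand-ins for `AlarmMessage` and `buildMessage`. They are written without Spring or Lombok so the logic runs on its own. The display time zone is an assumption.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class AlarmFormatter {
    /** Hypothetical stand-in for the AlarmMessage fields used in the email. */
    public record Alarm(String scope, String name, String ruleName, String alarmMessage, long startTime) {}

    // Assumed display zone; startTime is interpreted as epoch milliseconds.
    private static final DateTimeFormatter TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss").withZone(ZoneId.of("Asia/Shanghai"));

    /** Escapes the characters that are significant in HTML text; null becomes "". */
    static String escapeHtml(String s) {
        if (s == null) return "";
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                case '"' -> sb.append("&quot;");
                case '\'' -> sb.append("&#39;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Builds the HTML body with escaped fields and a formatted start time. */
    static String buildMessage(List<Alarm> alarms) {
        StringBuilder builder = new StringBuilder("System alarm:<br/>");
        for (Alarm a : alarms) {
            builder.append("scope: ").append(escapeHtml(a.scope()))
                   .append("<br/>entity name: ").append(escapeHtml(a.name()))
                   .append("<br/>rule name: ").append(escapeHtml(a.ruleName()))
                   .append("<br/>message: ").append(escapeHtml(a.alarmMessage()))
                   .append("<br/>start time: ").append(TIME.format(Instant.ofEpochMilli(a.startTime())))
                   .append("<br/><br/>---------------<br/>");
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        Alarm a = new Alarm("SERVICE", "order-service", "service_resp_time_percentile_rule",
                "p50 > 1000, p99 > 1000", System.currentTimeMillis());
        System.out.println(buildMessage(List.of(a)));
    }
}
```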
1.2.4 Configure the Webhook
This step edits alarm-settings.yml in apache-skywalking-apm-bin\config so that SkyWalking sends alarms to a URL. Here the URL points to the alarm-service service, which processes each alarm and sends it by email:
```yaml
hooks:
  webhook:
    default:
      is-default: true
      urls:
        - http://127.0.0.1:8084/alarm/handler
```
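The first real alarm can take ten minutes or more to arrive, so it can help to exercise the receiver directly. The sketch below POSTs one hand-written alarm to the configured URL. It uses the array shape that `AlarmController.handler` expects: a JSON list of objects whose field names mirror `AlarmMessage`. The class name, the sample values, and the `order-service` entity name are invented for testing. The default URL assumes the service runs locally on port 8084 as configured above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical smoke test: sends one fake alarm to the webhook receiver. */
public class AlarmHandlerSmokeTest {
    /** One-element JSON array whose field names mirror AlarmMessage; all values are made up. */
    static String samplePayload(String ruleName, long startTime) {
        return String.format(
                "[{\"scopeId\":1,\"scope\":\"SERVICE\",\"name\":\"order-service\",\"id0\":\"\",\"id1\":\"\","
                        + "\"ruleName\":\"%s\",\"alarmMessage\":\"Test alarm for %s\",\"startTime\":%d,\"tags\":[]}]",
                ruleName, ruleName, startTime);
    }

    /** POSTs the JSON body to the given URL and returns the response. */
    static HttpResponse<String> post(String url, String json) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "http://127.0.0.1:8084/alarm/handler";
        HttpResponse<String> response =
                post(url, samplePayload("service_resp_time_rule", System.currentTimeMillis()));
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

If the mail settings are correct, a test email should arrive without waiting for SkyWalking to evaluate any rule.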
1.2.5 Restart the Service and SkyWalking

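Distributed transactions are enabled, so creating an order is relatively slow, and alarm messages soon appear in the mailbox.
Note: the alarm may take a while to show up. The default rules evaluate a 10-minute window (`period: 10`) and need several slow minutes before firing (3 for `service_resp_time_rule`). Processing and delivery add further delay.
2 Connecting the Webhook to Feishu
Besides email, a Webhook can also deliver alarms to WeCom, Feishu, DingTalk and similar apps, so developers and operators receive them sooner. In any Feishu group, open the menu in the top-right corner and add a bot.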

Choose Custom Bot, fill in the bot's details, and click Add.

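On the page that opens, copy the Webhook URL and, if signature verification is enabled, the signing secret. Then edit SkyWalking's alarm-settings.yml: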
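```yaml
# If the webhook hook from 1.2.4 is also used, keep both under a single top-level hooks: key.
hooks:
  feishu:
    default:
      is-default: true
      text-template: |
        {
          "msg_type":"text",
          "content": {
            "text": "Apache SkyWalking Alarm: \n %s."
          }
        }
      webhooks:
        - url: "Webhook URL copied from Feishu"
          secret: "signing secret copied from Feishu"
```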
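When the Feishu bot has signature verification enabled, the `secret` is used to sign each request, presumably by SkyWalking's Feishu hook. Feishu's custom-bot documentation describes the signature as follows. Use `timestamp + "\n" + secret` as an HmacSHA256 key, sign an empty byte array, and Base64-encode the result. Then send `timestamp` (in seconds) and `sign` together with the message. The sketch below reproduces that scheme and fills `%s` in a template like the one above, which can help when debugging rejected requests. It is an illustration, not SkyWalking's code. The class and method names are hypothetical, and the scheme should be checked against the current Feishu docs.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class FeishuSignSketch {
    /** HmacSHA256 keyed by "timestamp\nsecret" over empty data, Base64-encoded (per Feishu's docs). */
    static String sign(long timestampSeconds, String secret) {
        String stringToSign = timestampSeconds + "\n" + secret;
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(stringToSign.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return Base64.getEncoder().encodeToString(mac.doFinal(new byte[0]));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Minimal JSON string escaping so an alarm message cannot break the JSON body. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.toString();
    }

    /** Builds a signed text-message body; the text mirrors the text-template above. */
    static String body(String alarmMessage, long timestampSeconds, String secret) {
        String text = String.format("Apache SkyWalking Alarm: \n %s.", alarmMessage);
        return String.format(
                "{\"timestamp\":\"%d\",\"sign\":\"%s\",\"msg_type\":\"text\",\"content\":{\"text\":\"%s\"}}",
                timestampSeconds, sign(timestampSeconds, secret), jsonEscape(text));
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000; // Feishu expects seconds, not milliseconds
        System.out.println(body("Response time of service order-service is more than 1000ms", now, "demo-secret"));
    }
}
```

The sketch escapes the message before inserting it into the JSON. A plain `%s` substitution would produce invalid JSON if an alarm message contained a double quote.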
Then restart SkyWalking and check that the Feishu bot pushes the alarm message.

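The alarm message is pushed to Feishu successfully. Other applications are integrated the same way; see each application's developer documentation for details.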