Uptime Kuma Integration with SkyWalking: Complete Guide

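Architecture Overview
- Based on SkyWalking 10.2. The compose file in this document pins 10.1.0 image tags, so change the tags to match the version you run.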
┌─────────────┐     ┌──────────────┐     ┌─────────────┐     ┌─────────────┐
│   Uptime    │     │     OTel     │     │ SkyWalking  │     │Elasticsearch│
│    Kuma     │────▶│  Collector   │────▶│     OAP     │────▶│             │
│    :3001    │     │  :4317/4318  │     │   :11800/   │     │    :9200    │
└─────────────┘     └──────────────┘     │    12800    │     └─────────────┘
                                         └─────────────┘
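Data flow:
- Uptime Kuma exposes the /metrics endpoint
- The OTel Collector scrapes the metrics through the Prometheus receiver
- The OTel Collector sends the metrics to OAP over OTLP
- OAP processes the metrics and stores them in Elasticsearch
- Problem solved: monitors that belong to the same Uptime Kuma task are displayed as one group, with the group as the service and each monitor as a service instance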
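
The grouping rule can be illustrated with a small shell sketch. It assumes the `<group>#<monitor>` naming convention that appears later in this document (for example `dev_xxx_00002#10-tcp`) and reuses the fallback name `General-Uptime-Service` from the rule file. The function name `split_monitor_name` is hypothetical and only serves the illustration.

```bash
# Sketch: derive the SkyWalking service (cluster) and instance from an
# Uptime Kuma monitor name that follows the "<group>#<monitor>" convention.
# split_monitor_name is a hypothetical helper, not part of any tool.
split_monitor_name() {
  name=$1
  case $name in
    *"#"*) cluster=${name%%#*} ;;               # text before the first '#'
    *)     cluster="General-Uptime-Service" ;;  # fallback used by the rules
  esac
  printf 'service=%s instance=%s\n' "$cluster" "$name"
}
```

For example, `split_monitor_name 'dev_xxx_00002#10-tcp'` prints `service=dev_xxx_00002 instance=dev_xxx_00002#10-tcp`, while a name without `#` lands in the fallback service.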

Environment Setup
1. Start the services
bash
docker-compose -f docker-compose-m1.yml up -d
Compose file
yml
version: '3.8'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.16.3
    container_name: elasticsearch
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms256m -Xmx256m"
      - TZ=Asia/Shanghai
      - LANG=C.UTF-8
    ulimits:
      memlock:
        soft: -1
        hard: -1
    ports:
      - "9200:9200"
      - "9300:9300"
    volumes:
      - esdata:/usr/share/elasticsearch/data
    networks:
      - skywalking-net
    restart: unless-stopped
    platform: linux/arm64
  oap:
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/apache/skywalking-oap-server:10.1.0-java17-linuxarm64
    container_name: skywalking-oap-server
    depends_on:
      - elasticsearch
    ports:
      - "11800:11800"
      - "12800:12800"
    volumes:
      - ./application.yml:/skywalking/config/application.yml:ro
      - ../otel-rules:/skywalking/config/otel-rules:ro
    environment:
      - SW_STORAGE=elasticsearch
      - SW_STORAGE_ES_CLUSTER_NODES=elasticsearch:9200
      - JAVA_OPTS=-Xms512m -Xmx768m
      - TZ=Asia/Shanghai
      - LANG=C.UTF-8
    networks:
      - skywalking-net
    restart: unless-stopped
    platform: linux/arm64
  ui:
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/apache/skywalking-ui:10.1.0-java17-linuxarm64
    container_name: skywalking-ui
    depends_on:
      - oap
    ports:
      - "8080:8080"
    environment:
      - SW_OAP_ADDRESS=http://oap:12800
      - SW_HEALTH_CHECKER=default
      - TZ=Asia/Shanghai
      - LANG=C.UTF-8
    networks:
      - skywalking-net
    restart: unless-stopped
    platform: linux/arm64
  # Elasticsearch Exporter: collects Elasticsearch metrics
  elasticsearch-exporter:
    image: prometheuscommunity/elasticsearch-exporter:v1.5.0
    container_name: elasticsearch-exporter
    depends_on:
      - elasticsearch
    command:
      - '--es.uri=http://elasticsearch:9200'
      - '--es.all'
    environment:
      - TZ=Asia/Shanghai
      - LANG=C.UTF-8
    ports:
      - "9114:9114"
    networks:
      - skywalking-net
    restart: unless-stopped
    platform: linux/arm64
  # OTel Collector: collects PostgreSQL, Elasticsearch, MySQL, Redis and Blackbox metrics and sends them to SkyWalking
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.94.0
    container_name: skywalking-otel-collector
    # postgresql, postgresql-exporter, mysql-exporter, redis-exporter and
    # blackbox-exporter are not defined in this excerpt; define them or
    # remove the entries, otherwise Compose rejects the file
    depends_on:
      - postgresql
      - postgresql-exporter
      - elasticsearch
      - elasticsearch-exporter
      - oap
      - mysql-exporter
      - redis-exporter
      - blackbox-exporter
    command: ["--config=/etc/otel-collector-config.yaml"]
    environment:
      - TZ=Asia/Shanghai
      - LANG=C.UTF-8
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml:ro
    extra_hosts:
      - "host.docker.internal:host-gateway"
    ports:
      - "13133:13133" # health_check
      - "1777:1777"   # pprof
      - "55679:55679" # zpages
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
    networks:
      - skywalking-net
    restart: unless-stopped
    platform: linux/arm64
  # Uptime Kuma: unified monitoring dashboard and alerting
  uptime-kuma:
    image: louislam/uptime-kuma:latest
    container_name: uptime-kuma
    ports:
      - "3001:3001"
    volumes:
      - uptime_kuma_data:/app/data
    environment:
      - TZ=Asia/Shanghai
      - LANG=C.UTF-8
    networks:
      - skywalking-net
    restart: unless-stopped
    platform: linux/arm64
networks:
  skywalking-net:
    driver: bridge
volumes:
  esdata:
    driver: local
  postgres_data:
    driver: local
  uptime_kuma_data:
    driver: local
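2. Verify service status
bash
# Check all containers
docker-compose -f docker-compose-m1.yml ps
# Check Uptime Kuma
curl http://localhost:3001
# Check the OTel Collector (port 13133 is served by the health_check extension, which must be enabled in the collector config)
curl http://localhost:13133/healthz
# Check OAP
curl http://localhost:12800/health
# Check the SkyWalking UI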
curl http://localhost:8080
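
Containers often need some time after `up -d` before they answer, so a single `curl` can fail on a healthy stack. The following sketch retries a check command until it succeeds. `wait_for` is a hypothetical helper name, and the attempt count and delay are arbitrary assumptions.

```bash
# Sketch: retry a check command until it succeeds or the attempts run out.
# wait_for is a hypothetical helper.
# Usage: wait_for <attempts> <delay_seconds> <command...>
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0          # the check passed
    fi
    i=$((i + 1))
    sleep "$delay"      # pause before the next attempt
  done
  return 1              # every attempt failed
}
```

A possible use is `wait_for 30 2 curl -fsS http://localhost:3001 && echo "Uptime Kuma is up"`, repeated for the other endpoints listed above.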
Configuring the OTel Collector
Configuration file location
Configuration file for SkyWalking 10.2: /otel-collector-config.yaml
yml
receivers:
  prometheus:
    config:
      scrape_configs:
        # - xxx other scrape jobs
        # Uptime Kuma metrics
        - job_name: 'uptime-monitoring'
          scrape_interval: 15s # scrape interval
          scrape_timeout: 10s # scrape timeout
          static_configs:
            # Uptime Kuma address. Inside the compose network use the service
            # name; 127.0.0.1 would point at the collector container itself
            - targets: ['uptime-kuma:3001']
              labels:
                job_name: 'uptime-monitoring'
          metrics_path: '/metrics'
          scheme: 'http'
          honor_labels: true
          metric_relabel_configs:
            # SkyWalking 10.2 does not support adding new labels in the otel rules,
            # so the cluster label is defined here
            - source_labels: [monitor_name]
              regex: '^([^#]+)#.*'
              target_label: cluster
              action: replace
              # "$$" escapes "$", which the collector would otherwise treat as
              # an environment variable reference
              replacement: '$$1'
          # Basic auth with the Uptime Kuma account
          basic_auth:
            username: 'admin'
            password: 'password@'
processors:
  batch:
  # Defined but not referenced by the pipeline below
  resource/cleanup:
    attributes:
      - key: service.name
        action: delete
exporters:
  otlp:
    endpoint: "oap:11800" # SkyWalking OAP gRPC address; replace with your own
    tls:
      insecure: true # no TLS
service:
  telemetry:
    metrics: null
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [otlp]
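
The relabel rule only sets `cluster` when `monitor_name` matches `^([^#]+)#.*`. The sketch below lists the monitor names that would be left without the label, which can help when auditing naming before a rollout. `names_without_cluster` is a hypothetical helper that reads one monitor name per line from stdin.

```bash
# Sketch: list monitor names that the relabel rule would leave without a
# cluster label, i.e. names that do not match ^([^#]+)#.*
# names_without_cluster is a hypothetical helper; input is one name per line.
names_without_cluster() {
  grep -Ev '^[^#]+#' || true   # grep exits 1 when it prints nothing
}
```

Feeding it `dev_xxx_00002#10-tcp`, `homepage` and `#orphan` prints the last two: the first has no `#`, the second has nothing in front of it.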
Configuring MAL Rules
Rule file location
- Full configuration example: /otel-rules/uptime-kuma.yaml
yml
metricPrefix: meter_uptime_kuma
filter: "{ tags -> tags.job_name == 'uptime-monitoring' }"
# [Key change] In 10.2 the tag split must run first, ahead of expSuffix.
# 10.2 does not create a default value for a missing cluster tag, whereas 10.1 did.
# Without the split the cluster tag is not visible and the Service entity name is blank.
expSuffix: instance(['cluster'], ['monitor_name'], Layer.AWS_S3)
metricsRules:
  # Uptime ratio, 1 day
  - name: uptime_ratio_1d
    exp: |
      monitor_uptime_ratio.tagEqual('window','1d').tag({ tags ->
        if (tags.monitor_name != null && tags.monitor_name.contains("#")) {
          tags.cluster = tags.monitor_name.split("#")[0]
        } else {
          tags.cluster = "General-Uptime-Service"
        }
      }).avg(['cluster', 'monitor_name']) * 100
  # Uptime ratio, 30 days
  - name: uptime_ratio_30d
    exp: |
      monitor_uptime_ratio.tagEqual('window','30d').tag({ tags ->
        if (tags.monitor_name != null && tags.monitor_name.contains("#")) {
          tags.cluster = tags.monitor_name.split("#")[0]
        } else {
          tags.cluster = "General-Uptime-Service"
        }
      }).avg(['cluster', 'monitor_name']) * 100
  # Uptime ratio, 365 days
  - name: uptime_ratio_365d
    exp: |
      monitor_uptime_ratio.tagEqual('window','365d').tag({ tags ->
        if (tags.monitor_name != null && tags.monitor_name.contains("#")) {
          tags.cluster = tags.monitor_name.split("#")[0]
        } else {
          tags.cluster = "General-Uptime-Service"
        }
      }).avg(['cluster', 'monitor_name']) * 100
  # Average response time
  - name: response_time
    exp: |
      monitor_response_time.tag({ tags ->
        if (tags.monitor_name != null && tags.monitor_name.contains("#")) {
          tags.cluster = tags.monitor_name.split("#")[0]
        } else {
          tags.cluster = "General-Uptime-Service"
        }
      }).avg(['cluster', 'monitor_name'])
  # Monitor status
  - name: monitor_status
    exp: |
      monitor_status.tag({ tags ->
        if (tags.monitor_name != null && tags.monitor_name.contains("#")) {
          tags.cluster = tags.monitor_name.split("#")[0]
        } else {
          tags.cluster = "General-Uptime-Service"
        }
      }).avg(['cluster', 'monitor_name'])
  # Days until the certificate expires
  - name: cert_days_remaining
    exp: |
      monitor_cert_days_remaining.tag({ tags ->
        if (tags.monitor_name != null && tags.monitor_name.contains("#")) {
          tags.cluster = tags.monitor_name.split("#")[0]
        } else {
          tags.cluster = "General-Uptime-Service"
        }
      }).avg(['cluster', 'monitor_name'])
  # Whether the certificate is valid
  - name: cert_is_valid
    exp: |
      monitor_cert_is_valid.tag({ tags ->
        if (tags.monitor_name != null && tags.monitor_name.contains("#")) {
          tags.cluster = tags.monitor_name.split("#")[0]
        } else {
          tags.cluster = "General-Uptime-Service"
        }
      }).avg(['cluster', 'monitor_name'])
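MAL syntax notes
| Expression | Description |
|---|---|
| filter: "{ tags -> tags.job_name == 'uptime-monitoring' }" | Process only the Uptime Kuma metrics |
| metricPrefix: meter_uptime_kuma | Metric name prefix |
| expSuffix: instance(['cluster'], ['monitor_name'], Layer.AWS_S3) | Use cluster as the service and monitor_name as the service instance |
| avg(['cluster', 'monitor_name']) | Average grouped by cluster and monitor_name |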
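
As a rough illustration of the arithmetic behind `avg([...]) * 100` in the uptime ratio rules, the sketch below groups sample ratios by cluster and monitor name and prints the average as a percentage. It is not how OAP evaluates MAL; the input layout (`<cluster> <monitor_name> <ratio>` per line) and the name `avg_percent` are assumptions made for the example.

```bash
# Sketch: emulate avg(['cluster', 'monitor_name']) * 100 on sample data.
# Input lines: "<cluster> <monitor_name> <ratio>"; output: one line per group.
# avg_percent is a hypothetical helper.
avg_percent() {
  awk '{ key = $1 " " $2; sum[key] += $3; n[key]++ }
       END { for (k in sum) printf "%s %.2f\n", k, sum[k] / n[k] * 100 }' | sort
}
```

Two samples of 0.5 and 1 for the same monitor average to 0.75, which the rule reports as 75.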
Service Management
Restarting OAP
Method 1: docker-compose commands (recommended)
bash
# Quick restart of OAP (use after changing the MAL rules)
docker-compose -f docker-compose-m1.yml restart oap
# Stop and recreate OAP
docker-compose -f docker-compose-m1.yml stop oap
docker-compose -f docker-compose-m1.yml up -d oap
Method 2: the restart script
bash
cd /Users/jinyu/WorkSpace/java-notes/docker-compose
./restart-oap.sh
Restarting the OTel Collector
bash
docker-compose -f docker-compose-m1.yml restart otel-collector
Viewing logs
bash
# OAP logs
docker-compose -f docker-compose-m1.yml logs -f oap
# OTel Collector logs
docker-compose -f docker-compose-m1.yml logs -f otel-collector
# Debug output (requires the debug exporter in the collector pipeline)
docker-compose -f docker-compose-m1.yml logs -f otel-collector | grep "Metrics"
Data Validation
1. Check the raw metrics
Query the Uptime Kuma metrics endpoint:
bash
curl http://localhost:3001/metrics
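
The raw output is long, so it can help to reduce it to the monitor names it contains and confirm they follow the expected naming. The sketch below does that for Prometheus exposition text read from stdin. `list_monitor_names` is a hypothetical helper, and it assumes label values contain no escaped double quotes.

```bash
# Sketch: print the distinct monitor_name label values found in Prometheus
# exposition text read from stdin.
# list_monitor_names is a hypothetical helper; it assumes label values
# contain no escaped double quotes.
list_monitor_names() {
  grep -o 'monitor_name="[^"]*"' | sed 's/^monitor_name="//; s/"$//' | sort -u
}
```

A possible use is `curl -s http://localhost:3001/metrics | list_monitor_names`, adding `-u user:password` to `curl` when the endpoint requires basic auth.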
2. Check the OTel Collector export
View the debug exporter output (the debug exporter must be enabled in the collector pipeline):
bash
docker-compose -f docker-compose-m1.yml logs otel-collector --tail 100
3. Check that OAP receives the data
Query the OAP metrics endpoint:
bash
curl -s http://localhost:12800/metrics | grep uptime
4. Check the SkyWalking UI
List the services (with step MINUTE the timestamps use the yyyy-MM-dd HHmm format):
bash
curl -s 'http://localhost:12800/graphql' \
-H 'Content-Type: application/json' \
-d '{"query":"{ getAllServices(duration: { start: \"2024-01-01 0000\", end: \"2026-12-31 2359\", step: MINUTE }) { name } }"}' \
| python3 -m json.tool
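
Hand-escaping the quotes in the GraphQL body is error-prone, so the request body can be generated instead. The sketch below builds the same `getAllServices` query from two timestamps. `services_query` is a hypothetical helper, and the `yyyy-MM-dd HHmm` format for step `MINUTE` reflects my understanding of the SkyWalking query protocol rather than anything stated in this document.

```bash
# Sketch: build the JSON body for the getAllServices query.
# Arguments: start and end timestamps in the "yyyy-MM-dd HHmm" form that
# goes with step MINUTE. services_query is a hypothetical helper.
services_query() {
  printf '{"query":"{ getAllServices(duration: { start: \\"%s\\", end: \\"%s\\", step: MINUTE }) { name } }"}\n' "$1" "$2"
}
```

It could be used as `curl -s http://localhost:12800/graphql -H 'Content-Type: application/json' -d "$(services_query '2024-01-01 0000' '2026-12-31 2359')"`.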
FAQ
Q1: OAP fails to start
Check the logs:
bash
docker-compose -f docker-compose-m1.yml logs oap
Common causes:
- Elasticsearch is not running
- Syntax errors in the MAL rules
- Port conflicts (11800, 12800)
Q2: Metrics are not displayed
Troubleshooting steps:
- Check that Uptime Kuma exposes metrics:
bash
curl http://localhost:3001/metrics
- Check that the OTel Collector is scraping:
bash
docker-compose -f docker-compose-m1.yml logs otel-collector | grep uptime
- Check that OAP is receiving data:
bash
docker-compose -f docker-compose-m1.yml logs oap | grep receive
- Check that the rules were loaded correctly:
bash
docker-compose -f docker-compose-m1.yml logs oap | grep OAL
Q3: Service names are wrong
Problem: monitor_name has the form dev_xxx_00002#10-tcp, which makes the service name too long.
Solution:
Use tag() in the MAL rule to derive a shorter name from the part before the '#':
yaml
expSuffix: tag({tags -> tags.service_name = tags.monitor_name.split("#")[0]}).service(['service_name'], Layer.GENERAL)
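Q4: How to add authentication
Edit the OTel Collector configuration:
yaml
- job_name: 'uptime-monitoring'
  ...
  basic_auth:
    username: 'admin'
    password: 'admin@'
Or use environment variables, which must be set in the otel-collector container's environment:
yaml
basic_auth:
  username: '${env:UPTIME_USERNAME}'
  password: '${env:UPTIME_PASSWORD}'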
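Q5: How to filter labels
The metricstransform processor has no remove_label operation; the attributes processor can delete labels instead. Add it to the processors list of the metrics pipeline:
yaml
processors:
  attributes/uptime_kuma:
    include:
      match_type: regexp
      metric_names:
        - "monitor_.*"
    actions:
      - key: monitor_hostname
        action: delete
      - key: monitor_port
        action: delete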
Q6: Splitting a label to group by a custom rule
An existing label can be split and reassigned in MAL. A label that does not exist yet can be created in metric_relabel_configs or relabel_configs, which can also add, drop and filter labels.
yaml
# Average response time
- name: response_time
  exp: |
    monitor_response_time.tag({ tags ->
      if (tags.monitor_name != null && tags.monitor_name.contains("#")) {
        tags.cluster = tags.monitor_name.split("#")[0]
      } else {
        tags.cluster = "General-Uptime-Service"
      }
    }).avg(['cluster', 'monitor_name'])
Configuration Files
| File | Path | Purpose |
|---|---|---|
| docker-compose | docker-compose-m1.yml | Container orchestration |
| OTel config | otel-collector-config.yaml | Metric collection settings |
| OAP config | application.yml | SkyWalking OAP settings |
| MAL rules | 集约化平台/otel-rules/uptime-kuma.yaml | Metric transformation rules |
| Restart script | restart-oap.sh | Restarts OAP |
Ports
| Service | Port | Purpose |
|---|---|---|
| Uptime Kuma | 3001 | Web UI + Metrics |
| OTel Collector | 4317/4318 | OTLP gRPC/HTTP |
| OTel Collector | 13133 | Health Check |
| OAP | 11800 | gRPC receiver |
| OAP | 12800 | HTTP API |
| SkyWalking UI | 8080 | Web UI |
| Elasticsearch | 9200/9300 | HTTP API / Transport |