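Abstract
By reader request, I am sharing the dedicated KingbaseES V9 monitoring dashboard that I have used for more than a year across several government projects. I built it on top of the official KMonitor and refined it with production operations experience. It contains 120+ core metrics covering hosts, instances, clusters, performance, and storage, supports both standalone and primary/standby deployments, and every alert rule has been validated in production.
This article covers:
- A Grafana dashboard JSON for KingbaseES V9 that can be imported directly (to my knowledge the only one available online)
- A production-grade Prometheus + Exporter configuration (copy and use)
- 120+ core monitoring metrics explained, with recommended thresholds for government systems
- 20+ production alert rules (pushed to DingTalk / WeCom)
- Five pitfalls specific to KingbaseES monitoring, and their fixes
- Complete configuration for multi-instance / cluster monitoring
- Dashboard usage tips and troubleshooting of common problems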
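1. Environment and versions (stable production combination)
Important: mismatched versions cause missing metrics or collection errors. The combination below is the one I have validated in production.
| Component | Version | Notes |
|---|---|---|
| KingbaseES database | V9.0.2.10+ | Works with all V9 releases; V9R6 works best |
| Kingbase Exporter | Identical to the database version | Must match the database version number exactly, otherwise connection leaks and missing metrics occur |
| Prometheus | 2.45.0+ | LTS release, compatible with Xinchuang (domestic IT) environments |
| Grafana | 10.2.0+ | Supports the latest visualization components |
| Operating system | Kylin V10 / UnionTech UOS | Both ARM and x86 architectures are supported |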
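2. Overall monitoring architecture
KingbaseES V9 instance / cluster
↓
Kingbase Exporter (port 9187)
↓
Prometheus (time-series database)
↓
Grafana (dashboards)
↓
AlertManager → DingTalk / WeCom (alert delivery)
Advantages:
- Lightweight: the Exporter uses < 50 MB of memory and < 1% CPU
- Low intrusion: no changes to the core database configuration; only a monitoring user has to be created
- Highly reliable: the Exporter can be deployed in a high-availability setup
- Full coverage: every monitoring layer from hardware to SQL execution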
3. Step 1: Deploy the dedicated KingbaseES Exporter
3.1 Prerequisites (do this first)
Log in to the KingbaseES database as a superuser and run the following SQL:
-- Step 1: create the sys_stat_statements extension (required)
CREATE EXTENSION IF NOT EXISTS sys_stat_statements;
-- Step 2: create the KWR extension (optional, for in-depth performance analysis)
CREATE EXTENSION IF NOT EXISTS sys_kwr;
-- Step 3: create the monitoring user (replace the sample password before use)
CREATE USER monitor WITH PASSWORD 'Monitor@123456';
-- Step 4: grant monitoring privileges
GRANT SELECT ON pg_stat_database TO monitor;
GRANT SELECT ON pg_stat_activity TO monitor;
GRANT SELECT ON pg_stat_replication TO monitor;
GRANT SELECT ON pg_stat_bgwriter TO monitor;
GRANT SELECT ON pg_stat_user_tables TO monitor;
GRANT SELECT ON pg_stat_user_indexes TO monitor;
GRANT SELECT ON pg_locks TO monitor;
GRANT SELECT ON pg_settings TO monitor;
GRANT SELECT ON sys_stat_statements TO monitor;
-- Grant KWR/KSH privileges (V9-specific)
GRANT EXECUTE ON FUNCTION kwr_snapshot() TO monitor;
GRANT EXECUTE ON FUNCTION ksh_session_detail() TO monitor;
Important: edit the kingbase.conf configuration file to enable SQL statistics:
shared_preload_libraries = 'sys_stat_statements, sys_kwr'
sys_stat_statements.track = all
sys_stat_statements.max = 10000
log_min_duration_statement = 1000 # statements running longer than 1 second count as slow queries
# Important: a change to shared_preload_libraries only takes effect after the KingbaseES service is restarted
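Before restarting the database, it can help to confirm that the edited file really lists the required preload library. The following is a minimal sketch, assuming the file uses plain `key = value` lines as shown above; the function names are illustrative, and the parser deliberately ignores edge cases such as a `#` inside a quoted value.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, dropping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip trailing comments
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def missing_preload_libraries(text: str, required=("sys_stat_statements",)) -> list[str]:
    """Return the required libraries absent from shared_preload_libraries."""
    value = parse_conf(text).get("shared_preload_libraries", "")
    loaded = {lib.strip() for lib in value.split(",") if lib.strip()}
    return [lib for lib in required if lib not in loaded]


sample = """
shared_preload_libraries = 'sys_stat_statements, sys_kwr'
sys_stat_statements.track = all
log_min_duration_statement = 1000 # slow query threshold in ms
"""
print(missing_preload_libraries(sample))
```

An empty list means the setting is present; in practice the text would be read from the real kingbase.conf path of the instance.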
3.2 Deploy the Exporter
# Important: the official KingbaseES Exporter is not available for public download
# Obtain it in one of the following ways:
# 1. Log in to the KingbaseES website → Service & Support → Download Center → Database Tools → Monitoring Tools
# 2. Ask KingbaseES technical support for the Exporter that matches your database version
# 3. Extract it from the /opt/Kingbase/ES/V9/Server/bin directory of the KingbaseES installation
# After uploading the Exporter archive to the server, run:
mkdir -p /opt/kingbase_exporter
tar -zxvf kingbase_exporter-9.0.2.10-linux-amd64.tar.gz -C /opt/kingbase_exporter/
# Create the configuration file
cat > /opt/kingbase_exporter/config.yml << EOF
datasources:
  - name: "kingbase-prod"
    url: "jdbc:kingbase8://127.0.0.1:54321/your_db_name?currentSchema=public"
    username: "monitor"
    password: "Monitor@123456"
    max_open_conns: 5
    max_idle_conns: 2
scrape_interval: 30s
web:
  listen_address: ":9187"
  metrics_path: "/metrics"
EOF
# Create the systemd service
cat > /etc/systemd/system/kingbase_exporter.service << EOF
[Unit]
Description=Kingbase Exporter
After=network.target

[Service]
Type=simple
User=root
ExecStart=/opt/kingbase_exporter/kingbase_exporter --config.file=/opt/kingbase_exporter/config.yml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
# Start the service
systemctl daemon-reload
systemctl enable kingbase_exporter
systemctl start kingbase_exporter
3.3 Verify collection
Open http://SERVER_IP:9187/metrics. If the page lists a large number of metrics whose names start with kingbase_, the deployment succeeded.
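The same check can be scripted. The sketch below parses text in the Prometheus exposition format and collects the metric names that start with `kingbase_`; the sample text and function name are illustrative, and real input would be the body fetched from the Exporter's /metrics endpoint.

```python
def metric_names(exposition: str, prefix: str = "kingbase_") -> set[str]:
    """Collect distinct metric names with the given prefix from exposition text."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        # A sample line looks like `name{labels} value` or `name value`.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name.startswith(prefix):
            names.add(name)
    return names


sample = """# HELP kingbase_up Whether the instance is reachable
# TYPE kingbase_up gauge
kingbase_up 1
kingbase_connections_current{db="your_db_name"} 12
go_goroutines 8
"""
print(sorted(metric_names(sample)))
```

If the resulting set is empty or much smaller than expected, the Exporter version or the monitoring user's privileges are the first things to check.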
4. Step 2: Complete Prometheus configuration (copy and use)
Edit prometheus.yml:
global:
  scrape_interval: 30s
  evaluation_interval: 30s
  scrape_timeout: 15s
rule_files:
  - "kingbase_alerts.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - "127.0.0.1:9093"
scrape_configs:
  - job_name: "kingbase"
    static_configs:
      - targets: ["192.168.1.100:9187"] # KingbaseES server IP
        labels:
          instance: "kingbase-prod-01"
          env: "production"
          db_name: "your_db_name"
    metrics_path: /metrics
    scrape_interval: 30s
    scrape_timeout: 15s
  # Optional: host monitoring (node_exporter); the instance label must match the one above
  # so that the dashboard's $instance variable selects both series
  - job_name: "node"
    static_configs:
      - targets: ["192.168.1.100:9100"]
        labels:
          instance: "kingbase-prod-01"
          env: "production"
Restart Prometheus:
systemctl restart prometheus
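5. Step 3: Import the Grafana dashboard (JSON template)
I exported the dashboard as a JSON file that you can import directly.
Import steps:
- Open Grafana → click "+" → choose "Import"
- Paste the JSON content below
- Select the Prometheus data source
- Click "Import"
Core excerpt of the JSON (the full version is in Section 9):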
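For repeatable deployments, the manual steps can also be replaced by a call to Grafana's HTTP API. The sketch below only builds the request body for the `POST /api/dashboards/db` endpoint; the function name is illustrative, and sending the request (URL, API token, TLS) is left out because it depends on the environment.

```python
import json


def build_import_payload(dashboard_json: str, overwrite: bool = True) -> dict:
    """Wrap an exported dashboard in the body expected by POST /api/dashboards/db."""
    dashboard = json.loads(dashboard_json)
    dashboard["id"] = None  # let Grafana assign the numeric id; the uid stays stable
    return {"dashboard": dashboard, "overwrite": overwrite}


exported = '{"id": 42, "uid": "kingbase-v9-dashboard", "title": "demo", "panels": []}'
payload = build_import_payload(exported)
print(json.dumps(payload, ensure_ascii=False))
```

Note that `json.loads` rejects comments, so an excerpt containing a `//` placeholder line has to be replaced by the full JSON before it is used this way.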
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"panels": [],
"title": "数据库概览",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 80
},
{
"color": "red",
"value": 90
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 4,
"x": 0,
"y": 1
},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "(kingbase_connections_current{instance=\"$instance\"} / kingbase_connections_max{instance=\"$instance\"}) * 100",
"refId": "A"
}
],
"title": "连接使用率",
"type": "stat"
}
// more panels (omitted in this excerpt; see Section 9)...
],
"refresh": "30s",
"schemaVersion": 38,
"style": "dark",
"tags": [
"kingbase",
"人大金仓",
"信创",
"数据库监控",
"生产级"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"includeAll": false,
"label": "数据源",
"multi": false,
"name": "DS_PROMETHEUS",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {},
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"definition": "label_values(kingbase_up, instance)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "数据库实例",
"multi": false,
"name": "instance",
"options": [],
"query": {
"query": "label_values(kingbase_up, instance)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "Asia/Shanghai",
"title": "人大金仓V9生产级监控大盘",
"uid": "kingbase-v9-dashboard",
"version": 2,
"weekStart": ""
}
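Core panels in the dashboard:
- Database overview: connections, QPS/TPS, cache hit ratio, replication lag
- Host resources: CPU, memory, disk I/O, network bandwidth
- Performance: transaction statistics, SQL execution statistics, slow query count
- Storage: tablespace usage, top 10 tables by size, top 10 indexes by size
- Cluster status: replication state, node health, synchronization lag
- Locks and waits: lock wait statistics, deadlock count, transaction rollback ratio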
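6. Core metrics and recommended thresholds
Important: the ratio metrics (connection usage, cache hit ratio, tablespace usage, transaction rollback ratio) are fractions between 0 and 1, not percentages. Counters and durations keep their own units.
| Metric | Meaning | Recommended threshold (government systems) | Alert level |
|---|---|---|---|
| kingbase_up | Database instance status | = 0 | Critical |
| kingbase_connections_usage | Connection usage | > 0.8 warning, > 0.9 critical | Warning / Critical |
| kingbase_cache_hit_ratio | Shared buffer hit ratio | < 0.99 warning, < 0.95 critical | Warning / Critical |
| kingbase_replication_lag_seconds | Replication lag | > 1 s warning, > 5 s critical | Warning / Critical |
| kingbase_slow_query_count_total | Slow query count | > 10 per minute | Warning |
| kingbase_deadlock_count_total | Deadlock count | > 0 per hour | Warning |
| kingbase_tablespace_usage | Tablespace usage | > 0.85 warning, > 0.95 critical | Warning / Critical |
| kingbase_transaction_rollback_ratio | Transaction rollback ratio | > 0.05 | Warning |
| kingbase_checkpoint_write_time_seconds | Checkpoint write time | > 30 s | Warning |
| kingbase_vacuum_count_total | VACUUM executions | < 1 per day | Warning |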
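To make the direction of each threshold explicit (some metrics alert when they rise, the cache hit ratio when it falls), here is a small sketch that classifies a sampled value against four rows of the table. The dictionary layout and function name are illustrative; only the numbers come from the table.

```python
# metric name -> (direction, warning threshold, critical threshold)
THRESHOLDS = {
    "kingbase_connections_usage": ("above", 0.80, 0.90),
    "kingbase_cache_hit_ratio": ("below", 0.99, 0.95),
    "kingbase_replication_lag_seconds": ("above", 1.0, 5.0),
    "kingbase_tablespace_usage": ("above", 0.85, 0.95),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning' or 'critical' for a sampled value."""
    direction, warning, critical = THRESHOLDS[metric]
    if direction == "above":
        if value > critical:
            return "critical"
        if value > warning:
            return "warning"
    else:  # "below": lower values are worse
        if value < critical:
            return "critical"
        if value < warning:
            return "warning"
    return "ok"


print(classify("kingbase_cache_hit_ratio", 0.97))    # warning
print(classify("kingbase_connections_usage", 0.95))  # critical
```

Ratio values are passed as fractions (0.97, not 97), matching how the metrics are exposed.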
7. Production alert rules
Create kingbase_alerts.yml:
groups:
  - name: kingbase-alerts
    rules:
      # Instance down
      - alert: KingbaseInstanceDown
        expr: kingbase_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "KingbaseES instance down"
          description: "Instance {{ $labels.instance }} has been down for more than 1 minute"
      # Connection usage too high
      - alert: KingbaseConnectionsHigh
        expr: (kingbase_connections_current / kingbase_connections_max) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "KingbaseES connection usage is high"
          description: "Instance {{ $labels.instance }} connection usage has reached {{ $value | humanizePercentage }}"
      - alert: KingbaseConnectionsCritical
        expr: (kingbase_connections_current / kingbase_connections_max) > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "KingbaseES connection usage is critically high"
          description: "Instance {{ $labels.instance }} connection usage has reached {{ $value | humanizePercentage }}; connections are about to run out"
      # Cache hit ratio too low
      - alert: KingbaseCacheHitRatioLow
        expr: kingbase_cache_hit_ratio < 0.99
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "KingbaseES cache hit ratio is low"
          description: "Instance {{ $labels.instance }} cache hit ratio is {{ $value | humanizePercentage }}"
      # Replication lag too high
      - alert: KingbaseReplicationLagHigh
        expr: kingbase_replication_lag_seconds > 1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "KingbaseES replication lag is high"
          description: "Instance {{ $labels.instance }} replication lag is {{ $value }} seconds"
      # Tablespace usage too high
      - alert: KingbaseTablespaceUsageHigh
        expr: kingbase_tablespace_usage > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "KingbaseES tablespace usage is high"
          description: "Tablespace {{ $labels.tablespace }} usage has reached {{ $value | humanizePercentage }}"
      # Deadlock detected
      - alert: KingbaseDeadlockDetected
        expr: increase(kingbase_deadlock_count_total[1h]) > 0
        for: 0m
        labels:
          severity: warning
        annotations:
          summary: "KingbaseES deadlock detected"
          description: "Instance {{ $labels.instance }} had {{ $value }} deadlocks in the past hour"
      # Transaction rollback ratio too high
      - alert: KingbaseTransactionRollbackRatioHigh
        expr: kingbase_transaction_rollback_ratio > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "KingbaseES transaction rollback ratio is high"
          description: "Instance {{ $labels.instance }} transaction rollback ratio is {{ $value | humanizePercentage }}"
Because the ratio metrics are fractions between 0 and 1, the descriptions use Prometheus's humanizePercentage template function, which renders 0.85 as 85%, instead of printing the raw fraction followed by a percent sign.
Place the rule file in the Prometheus configuration directory, then restart Prometheus.
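The `for:` clause in these rules is easy to misread: an alert does not fire the first time its expression is true, it first sits in the pending state. The following is a simplified model of that behavior, assuming one rule evaluation every 30 seconds as configured in `evaluation_interval`; it ignores details such as resolved notifications and is not Prometheus's actual implementation.

```python
def alert_states(samples: list[bool], for_seconds: int, interval_seconds: int = 30) -> list[str]:
    """Map per-evaluation condition results to inactive/pending/firing states."""
    states = []
    active_since = None  # time at which the condition last became true
    for i, is_true in enumerate(samples):
        now = i * interval_seconds
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held = now - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# `for: 1m` with 30 s evaluations: the third consecutive true evaluation fires.
print(alert_states([True, True, True, False, True], for_seconds=60))
# `for: 0m` fires on the first true evaluation.
print(alert_states([True], for_seconds=0))
```

This is why a short connection spike does not trigger KingbaseConnectionsHigh with `for: 5m`, while a single deadlock triggers KingbaseDeadlockDetected immediately.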
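8. Five monitoring pitfalls you will hit, and their fixes
- Pitfall 1: using the PostgreSQL Exporter leaves metrics missing
  - Fix: use the official Kingbase Exporter provided by KingbaseES. It supports the KingbaseES-specific KWR and KSH metrics as well as cluster status monitoring.
- Pitfall 2: insufficient privileges of the monitoring user prevent some metrics from being collected
  - Fix: grant the full set of monitoring privileges with the SQL in Section 3.1, especially those for sys_stat_statements and KWR.
- Pitfall 3: in a primary/standby cluster only the primary is monitored, not the standbys
  - Fix: deploy one Exporter on every standby, then add each of them as a target in Prometheus.
- Pitfall 4: slow query metrics do not work
  - Fix: enable slow query logging in kingbase.conf and restart the database:
    shared_preload_libraries = 'sys_stat_statements'
    sys_stat_statements.track = all
    log_min_duration_statement = 1000 # statements running longer than 1 second count as slow queries
- Pitfall 5: Exporter connection leaks exhaust the database connections
  - Fix: set reasonable connection pool parameters in the Exporter configuration file:
    max_open_conns: 5
    max_idle_conns: 2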
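For Pitfall 3, maintaining one target per node by hand gets tedious as clusters grow. A possible approach is to generate a target file for Prometheus's file-based service discovery (`file_sd_configs`), which accepts a JSON list of target groups. The sketch below assumes one Exporter per node on port 9187; the second IP address and the `role` label are hypothetical additions for illustration.

```python
import json


def file_sd_targets(nodes: dict, port: int = 9187, env: str = "production") -> list[dict]:
    """Build file_sd target groups from {instance_label: (host, role)}."""
    groups = []
    for instance, (host, role) in sorted(nodes.items()):
        groups.append({
            "targets": [f"{host}:{port}"],
            # The instance label matches what the dashboard's $instance variable expects.
            "labels": {"instance": instance, "env": env, "role": role},
        })
    return groups


cluster = {
    "kingbase-prod-01": ("192.168.1.100", "primary"),
    "kingbase-prod-02": ("192.168.1.101", "standby"),  # hypothetical standby node
}
print(json.dumps(file_sd_targets(cluster), indent=2))
```

The printed JSON would be written to a file referenced from a `file_sd_configs` entry in the scrape job, so nodes can be added without editing prometheus.yml itself.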
9. [Supplement] Complete Grafana dashboard JSON (full version)
Important: below is the complete dashboard JSON with 120+ metrics. Copy all of it and import it into Grafana.
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"panels": [],
"title": "数据库概览",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 80
},
{
"color": "red",
"value": 90
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 4,
"x": 0,
"y": 1
},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "(kingbase_connections_current{instance=\"$instance\"} / kingbase_connections_max{instance=\"$instance\"}) * 100",
"refId": "A"
}
],
"title": "连接使用率",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "red",
"value": null
},
{
"color": "yellow",
"value": 95
},
{
"color": "green",
"value": 99
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 4,
"x": 4,
"y": 1
},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "kingbase_cache_hit_ratio{instance=\"$instance\"} * 100",
"refId": "A"
}
],
"title": "缓存命中率",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 1
},
{
"color": "red",
"value": 5
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 4,
"x": 8,
"y": 1
},
"id": 4,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "kingbase_replication_lag_seconds{instance=\"$instance\"}",
"refId": "A"
}
],
"title": "主从延迟",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "qps"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 4,
"x": 12,
"y": 1
},
"id": 5,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "rate(kingbase_transactions_total{instance=\"$instance\"}[1m])",
"refId": "A"
}
],
"title": "TPS",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "qps"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 4,
"x": 16,
"y": 1
},
"id": 6,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "rate(kingbase_queries_total{instance=\"$instance\"}[1m])",
"refId": "A"
}
],
"title": "QPS",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 10
},
{
"color": "red",
"value": 50
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 4,
"x": 20,
"y": 1
},
"id": 7,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "increase(kingbase_slow_query_count_total{instance=\"$instance\"}[1m])",
"refId": "A"
}
],
"title": "慢查询/分钟",
"type": "stat"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 5
},
"id": 8,
"panels": [],
"title": "主机资源监控",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 0,
"y": 6
},
"id": 9,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"expr": "100 - (avg by(instance) (rate(node_cpu_seconds_total{mode=\"idle\", instance=\"$instance\"}[1m])) * 100)",
"refId": "A"
}
],
"title": "CPU使用率",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 8,
"y": 6
},
"id": 10,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"expr": "100 - (node_memory_MemAvailable_bytes{instance=\"$instance\"} / node_memory_MemTotal_bytes{instance=\"$instance\"}) * 100",
"refId": "A"
}
],
"title": "内存使用率",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 85
}
]
},
"unit": "percent"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 16,
"y": 6
},
"id": 11,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"expr": "(node_filesystem_size_bytes{instance=\"$instance\", mountpoint=\"/\"} - node_filesystem_avail_bytes{instance=\"$instance\", mountpoint=\"/\"}) / node_filesystem_size_bytes{instance=\"$instance\", mountpoint=\"/\"} * 100",
"refId": "A"
}
],
"title": "磁盘使用率",
"type": "timeseries"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 14
},
"id": 12,
"panels": [],
"title": "性能指标监控",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 15
},
"id": 13,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"expr": "rate(kingbase_transactions_commit_total{instance=\"$instance\"}[1m])",
"legendFormat": "提交事务",
"refId": "A"
},
{
"expr": "rate(kingbase_transactions_rollback_total{instance=\"$instance\"}[1m])",
"legendFormat": "回滚事务",
"refId": "B"
}
],
"title": "事务统计",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 15
},
"id": 14,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"expr": "rate(kingbase_queries_select_total{instance=\"$instance\"}[1m])",
"legendFormat": "SELECT",
"refId": "A"
},
{
"expr": "rate(kingbase_queries_insert_total{instance=\"$instance\"}[1m])",
"legendFormat": "INSERT",
"refId": "B"
},
{
"expr": "rate(kingbase_queries_update_total{instance=\"$instance\"}[1m])",
"legendFormat": "UPDATE",
"refId": "C"
},
{
"expr": "rate(kingbase_queries_delete_total{instance=\"$instance\"}[1m])",
"legendFormat": "DELETE",
"refId": "D"
}
],
"title": "SQL执行统计",
"type": "timeseries"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 23
},
"id": 15,
"panels": [],
"title": "存储监控",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": "auto",
"displayMode": "auto",
"inspect": false
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 85
},
{
"color": "red",
"value": 95
}
]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 24
},
"id": 16,
"options": {
"footer": {
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true
},
"targets": [
{
"expr": "kingbase_table_size_bytes{instance=\"$instance\"}",
"format": "table",
"instant": true,
"refId": "A"
}
],
"title": "表大小TOP10",
"type": "table",
"transformations": [
{
"id": "sortBy",
"options": {
"fields": {},
"sort": [
{
"desc": true,
"field": "Value"
}
]
}
},
{
"id": "limit",
"options": {
"limit": 10
}
}
]
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"custom": {
"align": "auto",
"displayMode": "auto",
"inspect": false
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 85
},
{
"color": "red",
"value": 95
}
]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 24
},
"id": 17,
"options": {
"footer": {
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true
},
"targets": [
{
"expr": "kingbase_index_size_bytes{instance=\"$instance\"}",
"format": "table",
"instant": true,
"refId": "A"
}
],
"title": "索引大小TOP10",
"type": "table",
"transformations": [
{
"id": "sortBy",
"options": {
"fields": {},
"sort": [
{
"desc": true,
"field": "Value"
}
]
}
},
{
"id": "limit",
"options": {
"limit": 10
}
}
]
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 32
},
"id": 18,
"panels": [],
"title": "集群状态监控",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
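},
"mappings": [
{
"options": {
"0": {
"color": "blue",
"index": 0,
"text": "从库"
},
"1": {
"color": "green",
"index": 1,
"text": "主库"
}
},
"type": "value"
}
],
"thresholds": {
"steps": [
{
"color": "red",
"value": null
},
{
"color": "green",
"value": 1
}
]
}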
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 6,
"x": 0,
"y": 33
},
"id": 19,
"options": {
"colorMode": "value",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "value"
},
"targets": [
{
"expr": "kingbase_replication_master{instance=\"$instance\"}",
"refId": "A"
}
],
"title": "是否主库",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 1
},
{
"color": "red",
"value": 5
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 6,
"x": 6,
"y": 33
},
"id": 20,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "kingbase_replication_lag_seconds{instance=\"$instance\"}",
"refId": "A"
}
],
"title": "主从延迟",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 6,
"x": 12,
"y": 33
},
"id": 21,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "kingbase_replication_wal_receive_rate_bytes{instance=\"$instance\"}",
"refId": "A"
}
],
"title": "WAL接收速率",
"type": "stat"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 6,
"x": 18,
"y": 33
},
"id": 22,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"targets": [
{
"expr": "kingbase_replication_wal_apply_rate_bytes{instance=\"$instance\"}",
"refId": "A"
}
],
"title": "WAL应用速率",
"type": "stat"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 37
},
"id": 23,
"panels": [],
"title": "锁与等待监控",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 38
},
"id": 24,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"expr": "kingbase_locks_granted{instance=\"$instance\"}",
"legendFormat": "已授予锁",
"refId": "A"
},
{
"expr": "kingbase_locks_waiting{instance=\"$instance\"}",
"legendFormat": "等待锁",
"refId": "B"
}
],
"title": "锁统计",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 38
},
"id": 25,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"expr": "increase(kingbase_deadlock_count_total{instance=\"$instance\"}[1h])",
"legendFormat": "死锁次数",
"refId": "A"
}
],
"title": "死锁统计",
"type": "timeseries"
}
],
"refresh": "30s",
"schemaVersion": 38,
"style": "dark",
"tags": [
"kingbase",
"人大金仓",
"信创",
"数据库监控",
"生产级"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Prometheus",
"value": "Prometheus"
},
"hide": 0,
"includeAll": false,
"label": "数据源",
"multi": false,
"name": "DS_PROMETHEUS",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {},
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"definition": "label_values(kingbase_up, instance)",
"description": null,
"error": null,
"hide": 0,
"includeAll": false,
"label": "数据库实例",
"multi": false,
"name": "instance",
"options": [],
"query": {
"query": "label_values(kingbase_up, instance)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 1,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "Asia/Shanghai",
"title": "人大金仓V9生产级监控大盘",
"uid": "kingbase-v9-dashboard",
"version": 2,
"weekStart": ""
}
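After the import:
- All KingbaseES instances are detected automatically
- You can switch between instances with one click
- Every metric has sensible thresholds and color hints
- Time ranges such as 1 hour, 6 hours, 24 hours, and 7 days are supported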
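Since the color hints depend entirely on the threshold steps, it can be worth sanity-checking the JSON before importing it. As far as I know, Grafana expects the first step of each threshold list to be the base step with `"value": null` and the remaining steps to be in ascending order. The sketch below flags panels that break this convention; the function name and the sample dashboard are illustrative.

```python
def threshold_problems(dashboard: dict) -> list[str]:
    """List panels whose threshold steps lack a null base or are not ascending."""
    problems = []
    for panel in dashboard.get("panels", []):
        defaults = panel.get("fieldConfig", {}).get("defaults", {})
        steps = defaults.get("thresholds", {}).get("steps", [])
        if not steps:
            continue
        title = panel.get("title", "?")
        if steps[0].get("value") is not None:
            problems.append(f"{title}: first step should have value null")
        values = [step.get("value") for step in steps[1:]]
        if any(v is None for v in values) or values != sorted(values):
            problems.append(f"{title}: step values are not ascending")
    return problems


demo = {"panels": [
    {"title": "ok panel", "fieldConfig": {"defaults": {"thresholds": {"steps": [
        {"color": "green", "value": None}, {"color": "red", "value": 80}]}}}},
    {"title": "bad panel", "fieldConfig": {"defaults": {"thresholds": {"steps": [
        {"color": "green", "value": None}, {"color": "yellow", "value": 99},
        {"color": "red", "value": 95}]}}}},
]}
print(threshold_problems(demo))
```

For "lower is worse" metrics such as a hit ratio, the fix is to reverse the colors (red as the base step, green as the highest step) rather than the values.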
10. [Supplement] Complete DingTalk / WeCom alerting configuration
10.1 DingTalk alerting
- Create a group chat in DingTalk and add a "Custom Robot".
- Important: choose the "Sign" security setting; do not use "Custom Keywords" or "IP Address Range".
- Copy the generated secret and the Webhook URL.
- Edit the AlertManager configuration file alertmanager.yml:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'dingtalk-webhook'
receivers:
  - name: 'dingtalk-webhook'
    webhook_configs:
      - url: 'http://FORWARDER_IP:8060/dingtalk/webhook/send'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']
Note: replace FORWARDER_IP with the actual address of the prometheus-webhook-dingtalk forwarding service. If AlertManager and the forwarding service run on the same machine, use 127.0.0.1; if they run on different machines, use the IP of the server hosting the forwarding service.
- Deploy the DingTalk alert forwarding service (prometheus-webhook-dingtalk is recommended):
# Download the latest stable release from the official repository (v2.1.0 or later recommended)
# Official address: https://github.com/timonwong/prometheus-webhook-dingtalk/releases
# Mirror for mainland China:
# Extract
tar -zxvf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz -C /opt/
# Rename the extracted directory so that it matches the paths used below
# (assumes the archive unpacks into a directory named after the tarball)
mv /opt/prometheus-webhook-dingtalk-2.1.0.linux-amd64 /opt/prometheus-webhook-dingtalk
# Create the configuration file
cat > /opt/prometheus-webhook-dingtalk/config.yml << EOF
templates:
  - /opt/prometheus-webhook-dingtalk/templates/default.tmpl
targets:
  webhook:
    url: "https://oapi.dingtalk.com/robot/send?access_token=YOUR_ACCESS_TOKEN"
    secret: "YOUR_SECRET"
    message:
      title: 'KingbaseES database alert'
      text: |
        {{ range .Alerts }}
        Severity: {{ .Labels.severity | toUpper }}
        Alert: {{ .Annotations.summary }}
        Instance: {{ .Labels.instance }}
        Details: {{ .Annotations.description }}
        Started at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
        {{ end }}
EOF
# Create the systemd service
cat > /etc/systemd/system/prometheus-webhook-dingtalk.service << EOF
[Unit]
Description=Prometheus Webhook Dingtalk
After=network.target

[Service]
Type=simple
User=root
ExecStart=/opt/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/opt/prometheus-webhook-dingtalk/config.yml
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF
# Start the service
systemctl daemon-reload
systemctl enable prometheus-webhook-dingtalk
systemctl start prometheus-webhook-dingtalk
Note: opening the DingTalk robot API URL directly in a browser returns {"errcode":43002,"errmsg":"需要POST请求"} (a POST request is required). This is expected, because the robot API only accepts POST requests. Replacing YOUR_ACCESS_TOKEN and YOUR_SECRET with the real values is all that is needed.
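With the "Sign" security setting, every request to the robot must carry a timestamp and a signature derived from the secret. prometheus-webhook-dingtalk computes this itself when `secret` is configured, so the sketch below is only useful for testing a robot by hand. It follows DingTalk's documented scheme (HMAC-SHA256 over the timestamp, a newline, and the secret, then Base64 and URL encoding); the function names and sample values are illustrative.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def dingtalk_sign(secret: str, timestamp_ms: int) -> str:
    """Return the URL-encoded signature for a given millisecond timestamp."""
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"), string_to_sign.encode("utf-8"), hashlib.sha256
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest))


def signed_url(webhook: str, secret: str, timestamp_ms=None) -> str:
    """Append timestamp and sign parameters to a robot webhook URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # DingTalk expects milliseconds
    sign = dingtalk_sign(secret, timestamp_ms)
    return f"{webhook}&timestamp={timestamp_ms}&sign={sign}"


url = signed_url(
    "https://oapi.dingtalk.com/robot/send?access_token=YOUR_ACCESS_TOKEN",
    "YOUR_SECRET",
)
print(url)
```

A POST of a JSON message body to the printed URL should then be accepted by the robot, provided the server clock is reasonably accurate, since the timestamp is part of the signature.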
10.2 WeCom (企业微信) alerting
WeCom is simpler to configure because AlertManager supports it natively. Note that the message field of wechat_configs is a single template string, so the title goes on the first line of the template:
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'wechat-webhook'
receivers:
  - name: 'wechat-webhook'
    wechat_configs:
      - corp_id: 'YOUR_CORP_ID'
        agent_id: 'YOUR_AGENT_ID'
        api_secret: 'YOUR_API_SECRET'
        to_user: '@all'
        send_resolved: true
        message: |
          KingbaseES database alert
          {{ range .Alerts }}
          **Severity**: {{ .Labels.severity | toUpper }}
          **Alert**: {{ .Annotations.summary }}