Spring AI 1.x Series [50] Observability: Integrating Prometheus + Grafana

Table of Contents

  • Preface
  • 1. Generating Prometheus Metrics
    • 1.1 Adding Dependencies
    • 1.2 Exposing the Endpoint
    • 1.3 Viewing the Metrics
  • 2. Integrating Prometheus + Grafana
    • 2.1 Installation
    • 2.2 Configuring Prometheus
    • 2.3 Configuring Grafana
      • 2.3.1 Adding a Data Source
      • 2.3.2 Creating a Dashboard
      • 2.3.3 Viewing the Dashboard

Preface

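Prometheus + Grafana is the de facto standard monitoring and visualization stack in the cloud-native world. Prometheus scrapes and stores metrics and evaluates alerting rules. Grafana provides charts, dashboards, and a unified view. The combination is widely used to monitor Java/Spring services, AI applications, containers, vector databases, and middleware.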

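Common metric types:

  • Counter: only ever increases (e.g. total requests, error count); it resets to zero only when the process restarts
  • Gauge: can go up and down (e.g. memory, CPU usage, open connections)
  • Histogram: counts observations in configurable buckets (e.g. request latency distribution), from which quantiles are computed at query time
  • Summary: quantiles (e.g. latency percentiles) computed on the client side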
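The following toy model in plain Java makes the Histogram type concrete. It is not the Micrometer or Prometheus client API. The class ToyHistogram and the bucket bounds are hypothetical. The model shows that histogram buckets are cumulative: each le ("less than or equal") bucket counts every observation up to its bound. The _sum and _count values are kept alongside the buckets.

```java
import java.util.Arrays;

// Hypothetical toy model of a Prometheus histogram; not a real client library.
class ToyHistogram {
    private final double[] upperBounds;    // finite bucket bounds, ascending (the "le" labels)
    private final long[] cumulativeCounts; // one slot per finite bound, plus one for +Inf
    private double sum;                    // exported as <name>_sum
    private long count;                    // exported as <name>_count

    ToyHistogram(double... upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
        this.cumulativeCounts = new long[this.upperBounds.length + 1];
    }

    // Records one observation, e.g. a call latency in seconds.
    void observe(double value) {
        for (int i = 0; i < upperBounds.length; i++) {
            // Cumulative: every bucket whose bound is >= value is incremented
            if (value <= upperBounds[i]) {
                cumulativeCounts[i]++;
            }
        }
        cumulativeCounts[upperBounds.length]++; // le="+Inf" always counts the observation
        sum += value;
        count++;
    }

    long bucket(int index) { return cumulativeCounts[index]; }
    long count() { return count; }
    double sum() { return sum; }

    // Prints the series in Prometheus text exposition style.
    void print(String name) {
        for (int i = 0; i < upperBounds.length; i++) {
            System.out.printf("%s_bucket{le=\"%s\"} %d%n", name, upperBounds[i], cumulativeCounts[i]);
        }
        System.out.printf("%s_bucket{le=\"+Inf\"} %d%n", name, cumulativeCounts[upperBounds.length]);
        System.out.printf("%s_sum %s%n", name, sum);
        System.out.printf("%s_count %d%n", name, count);
    }

    public static void main(String[] args) {
        ToyHistogram h = new ToyHistogram(0.5, 1.0, 5.0);
        h.observe(0.3);
        h.observe(0.8);
        h.observe(3.2);
        h.print("demo_latency_seconds");
    }
}
```

Spring AI uses a Micrometer timer for gen_ai.client.operation. Micrometer timers publish _bucket series only when percentile histograms are enabled for that meter. By default they export only _count, _sum and _max.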

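Typical use cases:

  • Spring Boot / Spring AI microservices, AI applications, RAG systems
  • Docker / container cluster monitoring
  • Monitoring vector databases, middleware, and API gateways
  • Stability, performance, and capacity checks and alerting for production services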

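Typical pipeline:

  1. The Spring AI application records metrics through Micrometer and exposes them at /actuator/prometheus.
  2. Prometheus scrapes and stores the metrics.
  3. Grafana queries Prometheus and renders the dashboards.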

1. Generating Prometheus Metrics

1.1 Adding Dependencies

Add the Actuator starter and the Micrometer Prometheus registry. Spring Boot's dependency management supplies both versions.

```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- https://mvnrepository.com/artifact/io.micrometer/micrometer-registry-prometheus -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
```

1.2 Exposing the Endpoint

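Expose the prometheus endpoint over HTTP.

The metrics.distribution section does a second job. It publishes histogram buckets for the Spring AI gen_ai.client.operation timer, and the histogram_quantile() latency panels in the dashboard need those buckets.

The map key is written in bracket notation on purpose. Without the brackets, Spring Boot's relaxed binding strips the underscore from the key.

```yml
management:
  endpoints:
    # Web endpoint settings
    web:
      exposure:
        # IDs of the endpoints to expose (e.g. ['health','info','beans','env']); "*" exposes all of them
        include: 'prometheus'
  metrics:
    distribution:
      percentiles-histogram:
        # Publish gen_ai_client_operation_seconds_bucket series (off by default)
        "[gen_ai.client.operation]": true
```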

1.3 Viewing the Metrics

Open the endpoint in a browser:

```
http://localhost:8080/actuator/prometheus
```
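You can also check the endpoint from code instead of a browser. The sketch below uses the JDK's built-in HttpClient (Java 11+). It fetches the endpoint and prints only the Spring AI series, whose names start with gen_ai_.

A few assumptions apply:

  • The URL assumes the application listens on localhost:8080, as above.
  • The class name GenAiMetricsDump is hypothetical.
  • The gen_ai_ series typically appear only after the application has made at least one model call, because the meters are created on first use.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.stream.Collectors;

// Sketch: fetch the actuator scrape endpoint and keep only Spring AI (gen_ai_*) samples.
class GenAiMetricsDump {

    // Keeps sample lines whose metric name starts with the prefix; skips "# HELP" / "# TYPE" comments.
    static List<String> filterByPrefix(String exposition, String prefix) {
        return exposition.lines()
                .filter(line -> !line.startsWith("#"))
                .filter(line -> line.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed default: the application runs locally on port 8080
        String url = args.length > 0 ? args[0] : "http://localhost:8080/actuator/prometheus";
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            System.err.println("Unexpected HTTP status: " + response.statusCode());
            return;
        }
        filterByPrefix(response.body(), "gen_ai_").forEach(System.out::println);
    }
}
```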

You should see metrics in the standard Prometheus text format, for example:

```text
gen_ai_client_token_usage_total{gen_ai_token_type="input",} 120.0
gen_ai_client_token_usage_total{gen_ai_token_type="output",} 80.0
gen_ai_client_operation_seconds_count{gen_ai_system="openai",} 10
```
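Each line follows the Prometheus text exposition format: a metric name, an optional {label="value",...} set, and then the sample value. Older Prometheus Java client versions emit a trailing comma inside the braces, and Prometheus accepts it.

The minimal Java parser below handles a single sample line, purely to illustrate that structure. ExpositionLine is a hypothetical name. The sketch ignores escaping rules and the rest of the format, so it is not a complete implementation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal parser for one sample line: metric_name{label="value",...} sample_value [timestamp]
// Simplification: escaped quotes or '}' inside label values are not supported.
class ExpositionLine {
    private static final Pattern SAMPLE =
            Pattern.compile("^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\\{([^}]*)\\})?\\s+(\\S+)");
    private static final Pattern LABEL =
            Pattern.compile("([a-zA-Z_][a-zA-Z0-9_]*)=\"([^\"]*)\"");

    final String name;
    final Map<String, String> labels;
    final double value;

    private ExpositionLine(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = Collections.unmodifiableMap(labels);
        this.value = value;
    }

    static ExpositionLine parse(String line) {
        Matcher m = SAMPLE.matcher(line.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Not a sample line: " + line);
        }
        Map<String, String> labels = new LinkedHashMap<>();
        if (m.group(2) != null) {
            // A trailing comma such as {a="b",} is simply skipped by the label regex
            Matcher lm = LABEL.matcher(m.group(2));
            while (lm.find()) {
                labels.put(lm.group(1), lm.group(2));
            }
        }
        return new ExpositionLine(m.group(1), labels, parseValue(m.group(3)));
    }

    // Prometheus writes infinities as +Inf / -Inf, which Double.parseDouble rejects.
    private static double parseValue(String raw) {
        switch (raw) {
            case "+Inf": return Double.POSITIVE_INFINITY;
            case "-Inf": return Double.NEGATIVE_INFINITY;
            default: return Double.parseDouble(raw); // also accepts "NaN"
        }
    }

    public static void main(String[] args) {
        ExpositionLine s = parse("gen_ai_client_token_usage_total{gen_ai_token_type=\"input\",} 120.0");
        System.out.println(s.name + " " + s.labels + " = " + s.value);
    }
}
```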

2. Integrating Prometheus + Grafana

2.1 Installation

Use Docker Compose to start both services quickly:

```yml
services:
  prometheus:
    image: prom/prometheus
    container_name: prometheus
    restart: always
    user: root
    ports:
      - "9090:9090"  # Prometheus web UI
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml  # scrape configuration (see 2.2)
      - prometheus-data:/prometheus                      # keep TSDB data when the container is re-created
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    networks:
      - monitoring

  grafana:
    image: grafana/grafana
    container_name: grafana
    restart: always
    user: root
    ports:
      - "3000:3000"  # Grafana web UI
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin  # Grafana admin password; change it outside local testing
    volumes:
      - grafana-data:/var/lib/grafana  # keep data sources and dashboards
    depends_on:
      - prometheus  # controls start order only; does not wait for Prometheus to be ready
    networks:
      - monitoring

networks:
  monitoring:
    driver: bridge

volumes:
  prometheus-data:
  grafana-data:
```

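2.2 Configuring Prometheus

Create prometheus.yml in the same directory as docker-compose.yml, before you start the stack. The file is bind-mounted into the container. If it does not exist at startup, Docker creates a directory with that name instead.

In this file, add the Spring Boot application as a scrape target:

```yml
global:
  scrape_interval: 15s  # how often Prometheus scrapes each target

scrape_configs:
  - job_name: 'spring-boot'
    metrics_path: '/actuator/prometheus'
    static_configs:
      # Address of the Spring Boot application as seen from inside the Prometheus container.
      # Use the host's LAN IP (replace with your own); localhost would point to the container itself.
      - targets: ['192.168.7.84:8080']
```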

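Then open the Prometheus UI at http://localhost:9090. On the Targets page (under Status), check that the spring-boot target is UP.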
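The same check can be scripted against Prometheus's HTTP API. The built-in up metric is 1 when the last scrape of a target succeeded and 0 when it failed. The sketch below queries up{job="spring-boot"} through /api/v1/query.

To stay dependency-free, it extracts sample values with a regular expression instead of a JSON library. That relies on Prometheus's compact JSON output. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: ask Prometheus whether the spring-boot scrape job is healthy via the "up" metric.
class ScrapeHealthCheck {
    // Matches the [timestamp,"value"] pair of each sample in an instant-vector response.
    // Regex instead of a JSON library to stay dependency-free; assumes compact JSON output.
    private static final Pattern SAMPLE_VALUE = Pattern.compile("\"value\":\\[[^,\\]]+,\"([^\"]*)\"\\]");

    static List<String> sampleValues(String json) {
        List<String> values = new ArrayList<>();
        Matcher m = SAMPLE_VALUE.matcher(json);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    // Healthy only if at least one target exists and every target reports up == 1.
    static boolean allUp(String json) {
        List<String> values = sampleValues(json);
        return !values.isEmpty() && values.stream().allMatch("1"::equals);
    }

    public static void main(String[] args) throws Exception {
        String base = args.length > 0 ? args[0] : "http://localhost:9090";
        String query = URLEncoder.encode("up{job=\"spring-boot\"}", StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(base + "/api/v1/query?query=" + query)).GET().build();
        String body = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(allUp(body) ? "spring-boot target is UP" : "spring-boot target is DOWN or missing: " + body);
    }
}
```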

2.3 Configuring Grafana

2.3.1 Adding a Data Source

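Open Grafana at http://localhost:3000 and log in as admin with the password set in docker-compose.yml. In the left-hand menu, select Data sources (under Connections in recent Grafana versions).

Click the button in the upper-right corner to add a new data source.

Choose Prometheus.

Enter the Prometheus server URL and click Save & test. Both containers are on the monitoring network defined in docker-compose.yml, so Grafana can reach Prometheus by its service name: http://prometheus:9090.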
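You can also create the same data source through Grafana's HTTP API (POST /api/datasources), which helps when the setup has to be repeatable.

The sketch below makes a few assumptions:

  • Grafana runs on localhost:3000 with the admin/admin credentials from docker-compose.yml.
  • The class and helper names are hypothetical.
  • The data source is named Prometheus, which matches the default value of the dashboard's DS_PROM variable.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: create the Prometheus data source via Grafana's HTTP API (POST /api/datasources).
class GrafanaDatasourceSetup {

    // Request body; "proxy" access means the Grafana server (not the browser) queries Prometheus.
    static String datasourceJson(String name, String prometheusUrl) {
        return String.format(
                "{\"name\":\"%s\",\"type\":\"prometheus\",\"url\":\"%s\",\"access\":\"proxy\",\"isDefault\":true}",
                name, prometheusUrl);
    }

    // Value of the HTTP Basic Authorization header for the Grafana admin user.
    static String basicAuth(String user, String password) {
        String raw = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Assumed values: Grafana on localhost:3000, admin/admin from docker-compose.yml,
        // and Prometheus reachable from Grafana under its compose service name.
        String body = datasourceJson("Prometheus", "http://prometheus:9090");
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:3000/api/datasources"))
                .header("Content-Type", "application/json")
                .header("Authorization", basicAuth("admin", "admin"))
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The JSON is assembled with String.format for brevity. It does not escape special characters, which is fine for these fixed values but not for arbitrary input.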

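2.3.2 Creating a Dashboard

Click Import (Dashboards → New → Import).

Grafana can import a dashboard in three ways: from an uploaded JSON file, from pasted JSON, or from a grafana.com dashboard URL or ID. I couldn't find an official Spring AI dashboard on grafana.com, so I had an AI generate one: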

```json
{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "gnetId": null,
  "graphTooltip": 0,
  "id": null,
  "iteration": 1770000000000,
  "links": [],
  "panels": [
    {
      "collapsed": false,
      "datasource": null,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 20,
      "panels": [],
      "title": "AI model calls overview (Zhipu)",
      "type": "row"
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_PROM}",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 6,
        "w": 8,
        "x": 0,
        "y": 1
      },
      "hiddenSeries": false,
      "id": 22,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "9.5.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum(rate(gen_ai_client_operation_seconds_count{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\"}[1m])) by(gen_ai_request_model)",
          "interval": "",
          "legendFormat": "{{gen_ai_request_model}} QPS",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Model call QPS (requests/s)",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": "requests/s",
          "logBase": 1,
          "max": null,
          "min": "0",
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_PROM}",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 6,
        "w": 8,
        "x": 8,
        "y": 1
      },
      "hiddenSeries": false,
      "id": 24,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "9.5.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "histogram_quantile(0.50, sum(rate(gen_ai_client_operation_seconds_bucket{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\"}[1m])) by(le,gen_ai_request_model))",
          "interval": "",
          "legendFormat": "{{gen_ai_request_model}} P50",
          "refId": "A"
        },
        {
          "expr": "histogram_quantile(0.95, sum(rate(gen_ai_client_operation_seconds_bucket{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\"}[1m])) by(le,gen_ai_request_model))",
          "interval": "",
          "legendFormat": "{{gen_ai_request_model}} P95",
          "refId": "B"
        },
        {
          "expr": "histogram_quantile(0.99, sum(rate(gen_ai_client_operation_seconds_bucket{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\"}[1m])) by(le,gen_ai_request_model))",
          "interval": "",
          "legendFormat": "{{gen_ai_request_model}} P99",
          "refId": "C"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Call latency percentiles (s)",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "s",
          "label": "Duration (s)",
          "logBase": 1,
          "max": null,
          "min": "0",
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_PROM}",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 6,
        "w": 8,
        "x": 16,
        "y": 1
      },
      "hiddenSeries": false,
      "id": 26,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "9.5.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "gen_ai_client_operation_seconds_max{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\"}",
          "interval": "",
          "legendFormat": "{{gen_ai_request_model}} max duration",
          "refId": "A"
        },
        {
          "expr": "gen_ai_client_operation_active_seconds_max{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\"}",
          "interval": "",
          "legendFormat": "{{gen_ai_request_model}} longest active request",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Max single-call duration (s)",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "s",
          "label": "Duration (s)",
          "logBase": 1,
          "max": null,
          "min": "0",
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "collapsed": false,
      "datasource": null,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 7
      },
      "id": 28,
      "panels": [],
      "title": "Token usage & concurrency & errors",
      "type": "row"
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_PROM}",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 6,
        "w": 12,
        "x": 0,
        "y": 8
      },
      "hiddenSeries": false,
      "id": 30,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "9.5.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": true,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum(rate(gen_ai_client_token_usage_total{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\",gen_ai_token_type=\"input\"}[1m])) by(gen_ai_token_type)",
          "interval": "",
          "legendFormat": "Input tokens",
          "refId": "A"
        },
        {
          "expr": "sum(rate(gen_ai_client_token_usage_total{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\",gen_ai_token_type=\"output\"}[1m])) by(gen_ai_token_type)",
          "interval": "",
          "legendFormat": "Output tokens",
          "refId": "B"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Token consumption rate (tokens/s, stacked)",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": "tokens/s",
          "logBase": 1,
          "max": null,
          "min": "0",
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_PROM}",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 6,
        "w": 6,
        "x": 12,
        "y": 8
      },
      "hiddenSeries": false,
      "id": 32,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "9.5.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum(gen_ai_client_operation_active_seconds_count{gen_ai_system=~\"$system\",gen_ai_request_model=~\"$model\"})",
          "interval": "",
          "legendFormat": "Active requests",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "In-flight AI requests",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "short",
          "label": "Concurrent requests",
          "logBase": 1,
          "max": null,
          "min": "0",
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    },
    {
      "aliasColors": {},
      "bars": false,
      "dashLength": 10,
      "dashes": false,
      "datasource": "${DS_PROM}",
      "fill": 1,
      "fillGradient": 0,
      "gridPos": {
        "h": 6,
        "w": 6,
        "x": 18,
        "y": 8
      },
      "hiddenSeries": false,
      "id": 34,
      "legend": {
        "avg": false,
        "current": false,
        "max": false,
        "min": false,
        "show": true,
        "total": false,
        "values": false
      },
      "lines": true,
      "linewidth": 1,
      "nullPointMode": "null",
      "options": {
        "alertThreshold": true
      },
      "percentage": false,
      "pluginVersion": "9.5.0",
      "pointradius": 2,
      "points": false,
      "renderer": "flot",
      "seriesOverrides": [],
      "spaceLength": 10,
      "stack": false,
      "steppedLine": false,
      "targets": [
        {
          "expr": "sum(rate(gen_ai_client_operation_seconds_count{error!=\"none\",gen_ai_system=~\"$system\"}[1m])) / sum(rate(gen_ai_client_operation_seconds_count{gen_ai_system=~\"$system\"}[1m]))",
          "interval": "",
          "legendFormat": "Overall error rate",
          "refId": "A"
        }
      ],
      "thresholds": [],
      "timeFrom": null,
      "timeRegions": [],
      "timeShift": null,
      "title": "Model call error rate",
      "tooltip": {
        "shared": true,
        "sort": 0,
        "value_type": "individual"
      },
      "type": "graph",
      "xaxis": {
        "buckets": null,
        "mode": "time",
        "name": null,
        "show": true,
        "values": []
      },
      "yaxes": [
        {
          "format": "percentunit",
          "label": "Error rate",
          "logBase": 1,
          "max": "1",
          "min": "0",
          "show": true
        },
        {
          "format": "short",
          "label": null,
          "logBase": 1,
          "max": null,
          "min": null,
          "show": true
        }
      ],
      "yaxis": {
        "align": false,
        "alignLevel": null
      }
    }
  ],
  "refresh": "10s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": ["SpringAI","ZhipuGLM","Prometheus"],
  "templating": {
    "list": [
      {
        "allValue": null,
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": "${DS_PROM}",
        "definition": "label_values(gen_ai_system)",
        "description": "AI provider",
        "hide": 0,
        "includeAll": true,
        "label": "Provider (system)",
        "multi": false,
        "name": "system",
        "options": [],
        "query": {
          "query": "label_values(gen_ai_system)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": null,
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": "${DS_PROM}",
        "definition": "label_values(gen_ai_request_model)",
        "description": "Model name",
        "hide": 0,
        "includeAll": true,
        "label": "Model",
        "multi": false,
        "name": "model",
        "options": [],
        "query": {
          "query": "label_values(gen_ai_request_model)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 1,
        "tagValuesQuery": "",
        "tags": [],
        "tagsQuery": "",
        "type": "query",
        "useTags": false
      },
      {
        "allValue": null,
        "current": {
          "selected": false,
          "text": "Prometheus",
          "value": "Prometheus"
        },
        "hide": 0,
        "includeAll": false,
        "label": "Data source",
        "multi": false,
        "name": "DS_PROM",
        "options": [],
        "query": {
          "query": "prometheus",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": ["5s","10s","30s","1m","5m","15m","30m","1h","2h","1d"]
  },
  "timezone": "",
  "title": "SpringAI-ZhipuGLM LLM monitoring dashboard",
  "uid": "springai-zhipu-monitor",
  "version": 1
}
```

Next, select the Prometheus data source and finish the import.
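For scripted setups, you can import the dashboard through Grafana's HTTP API (POST /api/dashboards/db) instead of the UI.

The sketch below assumes two things, neither taken from the article:

  • The JSON above is saved as springai-dashboard.json. This file name is hypothetical.
  • Grafana uses the admin/admin credentials from docker-compose.yml.

When creating a new dashboard, Grafana's API expects the dashboard's id field to be null or absent.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Sketch: import a dashboard JSON file through Grafana's HTTP API instead of the UI.
class GrafanaDashboardImport {

    // Wraps the raw dashboard model in the envelope expected by POST /api/dashboards/db.
    static String envelope(String dashboardJson, boolean overwrite) {
        return "{\"dashboard\":" + dashboardJson.strip() + ",\"overwrite\":" + overwrite + "}";
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical default file name; pass another path as the first argument
        Path file = Path.of(args.length > 0 ? args[0] : "springai-dashboard.json");
        String body = envelope(Files.readString(file, StandardCharsets.UTF_8), true);
        // Assumed credentials from docker-compose.yml
        String auth = Base64.getEncoder().encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:3000/api/dashboards/db"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Basic " + auth)
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```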

2.3.3 Viewing the Dashboard
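When reading the panels, keep two PromQL details in mind:

  • rate(x[1m]) returns a per-second value averaged over the last minute, not a per-minute count. For per-minute numbers, multiply by 60 or use increase(x[1m]).
  • histogram_quantile() estimates a quantile by linear interpolation inside the bucket that contains it, so its accuracy depends on the bucket boundaries.

The simplified Java re-implementation below illustrates both points. The names are hypothetical. It omits the extrapolation that real rate() performs and some edge cases Prometheus handles, and it assumes 0 < q < 1.

```java
// Simplified, illustrative re-implementations of two PromQL functions used by the dashboard.
// Real Prometheus also extrapolates rate() to the window edges and handles more edge cases.
class PromQlIntuition {

    // rate(): per-SECOND increase of a counter, here computed from just two samples.
    static double simpleRate(double earlier, double later, double secondsBetween) {
        double increase = later - earlier;
        if (increase < 0) {
            // Counter reset (e.g. application restart): the counter restarted from 0
            increase = later;
        }
        return increase / secondsBetween;
    }

    // histogram_quantile(): locate the bucket containing the target rank, then interpolate
    // linearly inside it. upperBounds are ascending and end with +Inf; cumulativeCounts are
    // the matching _bucket{le=...} values (or their rates).
    static double histogramQuantile(double q, double[] upperBounds, double[] cumulativeCounts) {
        int n = upperBounds.length;
        double total = cumulativeCounts[n - 1]; // the +Inf bucket holds every observation
        if (total == 0) {
            return Double.NaN;
        }
        double rank = q * total;
        int b = 0;
        while (b < n - 1 && cumulativeCounts[b] < rank) {
            b++;
        }
        if (b == n - 1) {
            // Rank falls into the +Inf bucket: Prometheus returns the highest finite bound
            return upperBounds[n - 2];
        }
        double lower = b == 0 ? 0.0 : upperBounds[b - 1];
        double countBelow = b == 0 ? 0.0 : cumulativeCounts[b - 1];
        double countInBucket = cumulativeCounts[b] - countBelow;
        return lower + (upperBounds[b] - lower) * (rank - countBelow) / countInBucket;
    }

    public static void main(String[] args) {
        // 60 calls counted over 60 seconds -> 1.0 call per second, not "60 per minute"
        System.out.println(simpleRate(100, 160, 60));
        double[] le = {0.5, 1.0, 5.0, Double.POSITIVE_INFINITY};
        double[] counts = {1, 2, 3, 3};
        // Median is interpolated inside the (0.5, 1.0] bucket
        System.out.println(histogramQuantile(0.5, le, counts));
    }
}
```

With only three observations, the P95 estimate lands somewhere inside the wide (1.0, 5.0] bucket. The latency panels therefore become more precise as traffic grows and when the bucket boundaries match typical model latencies.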
