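Microservice Monitoring: Prometheus and Grafana in Practice

Hi everyone, I'm 欧阳瑞 (Rich Own). Today I'd like to talk about microservice monitoring. As a full-stack developer, I see monitoring as key to keeping a system running reliably, so here is some hands-on experience with Prometheus and Grafana.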

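Why Monitor?

Scenario	Description
Troubleshooting	Locate problems quickly
Performance tuning	Find performance bottlenecks
Capacity planning	Forecast resource needs
Security auditing	Trace anomalous behavior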

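Introduction to Prometheus

Prometheus is an open-source monitoring system with the following features:

A multi-dimensional data model (metric names plus key/value labels)
A flexible query language (PromQL)
An efficient time-series database
Built-in alerting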

Installing Prometheus

# Install with Docker
docker run -d --name prometheus \
  -p 9090:9090 \
  -v /path/to/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus

Configuration File

# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'api-service'
    static_configs:
      - targets: ['api-service:3000']
    metrics_path: '/metrics'

Metric Types

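from prometheus_client import Counter, Gauge, Histogram, Summary

# Counter: a value that only increases (and resets to zero on restart)
http_requests_total = Counter('http_requests_total', 'Total HTTP requests')

# Gauge: a value that can go up and down
memory_usage = Gauge('memory_usage_bytes', 'Memory usage in bytes')

# Histogram: observations counted into configurable buckets
request_duration = Histogram('request_duration_seconds', 'Request duration')

# Summary: tracks the count and sum of observations
response_size = Summary('response_size_bytes', 'Response size')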
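
To make the Counter semantics concrete, here is a minimal sketch, using only the standard library, of how a per-second rate can be derived from raw counter samples. The `simple_rate` helper and the sample values are my own illustration, not part of Prometheus, and the sketch skips the boundary extrapolation that PromQL's `rate()` performs.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Approximate per-second increase of a counter over the given samples.

    A drop in value is treated as a counter reset (for example a process
    restart), so the value seen after the reset counts as a fresh increase.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("last timestamp must be later than the first")
    return increase / elapsed


# Illustrative samples taken every 15 s; the counter resets before t=45.
EXAMPLE_SAMPLES = [(0, 100), (15, 130), (30, 160), (45, 20), (60, 50)]

if __name__ == "__main__":
    print(simple_rate(EXAMPLE_SAMPLES))
```

This is why a Counter should only ever be incremented: the reset detection relies on the value never decreasing during normal operation.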

Hands-On: Monitoring an API Service

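from flask import Flask
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)

# Request counter, labelled by HTTP method and endpoint
REQUESTS = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
# Request latency histogram in seconds (default buckets)
DURATION = Histogram('request_duration_seconds', 'Request duration')

@app.route('/')
@DURATION.time()  # observe how long each call to index() takes
def index():
    REQUESTS.labels(method='GET', endpoint='/').inc()
    return 'Hello World'

@app.route('/metrics')
def metrics():
    # Serve all registered metrics in the Prometheus text exposition format
    return generate_latest(), 200, {'Content-Type': CONTENT_TYPE_LATEST}

if __name__ == '__main__':
    # Port 3000 matches the api-service target in prometheus.yml.
    # Binding to 0.0.0.0 is an assumption so that Prometheus can reach the
    # service from another container; Flask binds to 127.0.0.1 by default.
    app.run(host='0.0.0.0', port=3000)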
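
To get a feel for what Prometheus reads from `/metrics`, the sketch below parses a few lines of the text exposition format with the standard library. The sample payload is hand-written for illustration (it is not captured output), `parse_metrics` is a hypothetical helper, and it skips lines with timestamps and other details a real parser handles.

```python
import re
from typing import Dict, Tuple

LabelSet = Tuple[Tuple[str, str], ...]

# Matches lines such as: http_requests_total{method="GET",endpoint="/"} 3.0
_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hand-written sample payload, for illustration only.
SAMPLE = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",endpoint="/"} 3.0
request_duration_seconds_bucket{le="0.005"} 3.0
request_duration_seconds_bucket{le="+Inf"} 3.0
request_duration_seconds_count 3.0
request_duration_seconds_sum 0.0012
"""


def parse_metrics(text: str) -> Dict[Tuple[str, LabelSet], float]:
    """Map (metric name, sorted label pairs) to the sample value."""
    samples: Dict[Tuple[str, LabelSet], float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _LINE.match(line)
        if match is None:
            continue  # not a plain "name{labels} value" line
        name, raw_labels, value = match.groups()
        labels = tuple(sorted(_LABEL.findall(raw_labels or "")))
        samples[(name, labels)] = float(value)
    return samples


if __name__ == "__main__":
    for (name, labels), value in parse_metrics(SAMPLE).items():
        print(name, dict(labels), value)
```

Each distinct combination of metric name and label values is its own time series, which is what the multi-dimensional data model refers to.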

Grafana Setup

# Install Grafana with Docker
docker run -d --name grafana \
  -p 3000:3000 \
  -v /path/to/grafana-data:/var/lib/grafana \
  grafana/grafana

Configuring the Data Source

# Add Prometheus as a data source
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    access: proxy
    isDefault: true

Creating a Dashboard

{
  "dashboard": {
    "id": null,
    "title": "API Monitoring",
    "panels": [
      {
        "type": "timeseries",
        "title": "Request rate (req/s)",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{endpoint}}"
          }
        ]
      },
      {
        "type": "timeseries",
        "title": "Request latency (P95)",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, sum by (le) (rate(request_duration_seconds_bucket[5m])))",
            "legendFormat": "P95"
          }
        ]
      }
    ]
  }
}
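
The P95 panel depends on `histogram_quantile`, which estimates a quantile from cumulative bucket counts by linear interpolation. Here is a simplified standard-library sketch of that idea, assuming positive bucket bounds such as latencies. `estimate_quantile` and the bucket numbers are hypothetical, and this is not Prometheus's actual implementation, which handles more edge cases.

```python
import math
from typing import List, Tuple

Bucket = Tuple[float, float]  # (upper bound "le", cumulative count)


def estimate_quantile(q: float, buckets: List[Bucket]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(buckets)
    if not ordered or not math.isinf(ordered[-1][0]):
        raise ValueError("buckets must include the +Inf bucket")
    total = ordered[-1][1]
    if total == 0:
        return math.nan  # no observations at all
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0  # the first bucket starts at 0
    for bound, count in ordered:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Rank falls in the +Inf bucket (or an empty one): the best
                # available answer is the previous finite upper bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan  # only reachable if the counts are not cumulative


# Illustrative buckets: 100 requests, 98 of them faster than 1 s.
EXAMPLE_BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]

if __name__ == "__main__":
    print(estimate_quantile(0.95, EXAMPLE_BUCKETS))
```

Because the estimate is interpolated inside a bucket, its accuracy depends on the bucket boundaries, so it helps to place boundaries close to the latencies you care about.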

Alerting Configuration

# alerting_rules.yml (takes effect only once listed under rule_files in prometheus.yml)
groups:
  - name: api-alerts
    rules:
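      # http_errors_total is not exported by the example service above;
      # this rule assumes the service also exposes such a counter.
      - alert: HighErrorRate
        expr: rate(http_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors/s for API service"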

      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
          description: "P95 latency is {{ $value }}s"

Best Practices

1. Metric Naming Conventions

# <namespace>_<name>_<unit>, plus a _total suffix for counters
http_requests_total
memory_usage_bytes
request_duration_seconds

2. Label Management

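# Assumes REQUESTS was declared with the label names
# ['method', 'endpoint', 'status_code']; prometheus_client raises
# ValueError when the labels passed do not match the declaration.
REQUESTS.labels(
    method='GET',
    endpoint='/api/users',
    status_code='200'
).inc()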
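
One thing to watch with labels is cardinality: every distinct label value creates a new time series, so raw URLs with IDs in them make poor `endpoint` values. Here is a minimal sketch of one way to keep that bounded, assuming IDs are numeric or UUID-shaped. The `normalize_endpoint` helper and the `:id` placeholder are hypothetical choices of mine, not a Prometheus API.

```python
import re

# A path segment that is entirely digits or looks like a UUID.
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
    r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def normalize_endpoint(path: str) -> str:
    """Collapse ID-like path segments so the label has few distinct values."""
    path = path.split("?", 1)[0]  # drop the query string
    segments = [
        ":id" if _ID_SEGMENT.match(segment) else segment
        for segment in path.split("/")
    ]
    return "/".join(segments) or "/"


if __name__ == "__main__":
    for raw in ["/api/users/42?page=2", "/api/users", "/"]:
        print(raw, "->", normalize_endpoint(raw))
```

With a helper like this, the value passed as `endpoint` stays the same for every user, instead of growing by one series per user ID.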

3. Visualization Tips

{
  "panels": [
    {
      "type": "stat",
      "title": "Average latency",
      "targets": [
        {
          "expr": "sum(rate(request_duration_seconds_sum[5m])) / sum(rate(request_duration_seconds_count[5m]))"
        }
      ]
    },
    {
      "type": "gauge",
      "title": "Memory usage (%)",
      "targets": [
        {
          "expr": "memory_usage_bytes / memory_total_bytes * 100"
        }
      ]
    }
  ]
}

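Summary

Prometheus and Grafana are a go-to combination for monitoring. With well-designed metrics and sensible dashboards, you can keep a close watch on how your system is running.

My bearded dragon, Hash, has its own take on monitoring: it is always watching for changes in its surroundings. Perhaps that is nature's own "monitoring system"!

If you're interested in monitoring, feel free to leave a comment! I'm 欧阳瑞. The geek's road never ends!

Tech stack: Prometheus · Grafana · Monitoring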