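Goal: collect, report, and store Web Vitals data with a minimal metric set, and present it in Prometheus + Grafana as a what-you-see-is-what-you-get dashboard, without sprawling into redundant metrics.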
1. Overall architecture
text
Browser
└─ web-vitals SDK
   └─ navigator.sendBeacon / fetch
      └─ [POST /metrics/vitals]
         └─ Node.js / Go reporting service
            ├─ Aggregation → prom-client (Histogram / Counter / Gauge)
            └─ /metrics (Prometheus scrape endpoint)
               └─ Prometheus → Grafana Dashboard
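2. Frontend collection (web-vitals SDK)
2.1 Collect only the core Web Vitals
| Metric | Meaning | Suggested type |
|---|---|---|
| LCP | Largest Contentful Paint | Histogram (quantiles) |
| FID / INP | First Input Delay / Interaction to Next Paint | Histogram |
| CLS | Cumulative Layout Shift | Histogram |
| FCP | First Contentful Paint | Histogram |
| TTFB | Time to First Byte | Histogram |
LCP, INP, and CLS are the current Core Web Vitals (INP replaced FID in March 2024); FCP and TTFB are supplementary diagnostics.
Not collected: custom tracking events, per-resource loading details, long-task lists, and the like, to avoid a metric explosion.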
2.2 Example collection code
typescript
// vitals-reporter.ts
import { onLCP, onFID, onCLS, onFCP, onTTFB, onINP } from 'web-vitals';
import type { Metric } from 'web-vitals';

interface VitalPayload {
  name: string;   // 'LCP' | 'FID' | 'CLS' | 'FCP' | 'TTFB' | 'INP'
  value: number;  // raw value (ms, or a unitless score for CLS)
  rating: string; // 'good' | 'needs-improvement' | 'poor'
  page: string;   // location.pathname (no query string)
}

const ENDPOINT = '/metrics/vitals';

function report(payload: VitalPayload): void {
  const body = JSON.stringify(payload);
  // Prefer sendBeacon: the request survives page unload.
  // It returns false when the browser cannot queue it, so fall back to fetch.
  if (typeof navigator.sendBeacon === 'function' &&
      navigator.sendBeacon(ENDPOINT, new Blob([body], { type: 'application/json' }))) {
    return;
  }
  fetch(ENDPOINT, {
    method: 'POST',
    body,
    keepalive: true,
    headers: { 'Content-Type': 'application/json' },
  }).catch(() => { /* best effort: never break the page over telemetry */ });
}

function buildPayload(metric: Metric): VitalPayload {
  return {
    name: metric.name,
    value: metric.value,
    rating: metric.rating,
    page: location.pathname,
  };
}

onLCP(m => report(buildPayload(m)));
onFID(m => report(buildPayload(m))); // FID is deprecated in web-vitals v4 in favor of INP
onINP(m => report(buildPayload(m)));
onCLS(m => report(buildPayload(m)));
onFCP(m => report(buildPayload(m)));
onTTFB(m => report(buildPayload(m)));
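Key principles:
- Send only the pathname as page, never the full URL, so high-cardinality labels do not blow up Prometheus.
- Report only each metric's final value (the web-vitals default when reportAllChanges is not set), not intermediate values.
- Do not attach user IDs, session IDs, or other high-cardinality dimensions.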
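The "final value only" principle can be sketched as a small queue keyed by metric id: a later update for the same metric overwrites the earlier one, and everything is sent once at flush time. `VitalsQueue`, `QueuedVital`, and the injected `send` callback are hypothetical names used for illustration; in a browser, `send` would wrap the reporting function and `flush` would typically be called from a `visibilitychange` listener. This is a sketch under those assumptions, not the SDK's own mechanism.

```typescript
// Hypothetical sketch: keep only the latest value per metric id, then flush once.
interface QueuedVital {
  id: string;    // unique per metric instance (web-vitals exposes it as metric.id)
  name: string;
  value: number;
  page: string;
}

class VitalsQueue {
  private latest = new Map<string, QueuedVital>();
  private send: (vital: QueuedVital) => void;

  constructor(send: (vital: QueuedVital) => void) {
    this.send = send;
  }

  // A later report for the same id replaces the earlier one.
  add(vital: QueuedVital): void {
    this.latest.set(vital.id, vital);
  }

  // Send every queued metric exactly once, then clear the queue.
  flush(): void {
    for (const vital of Array.from(this.latest.values())) this.send(vital);
    this.latest.clear();
  }
}

// Usage with a fake sender that records what would be posted.
const sent: QueuedVital[] = [];
const queue = new VitalsQueue(v => { sent.push(v); });
queue.add({ id: 'cls-1', name: 'CLS', value: 0.02, page: '/home' });
queue.add({ id: 'cls-1', name: 'CLS', value: 0.08, page: '/home' }); // replaces 0.02
queue.add({ id: 'lcp-1', name: 'LCP', value: 1800, page: '/home' });
queue.flush();
console.log(sent.map(v => `${v.name}=${v.value}`)); // [ 'CLS=0.08', 'LCP=1800' ]
```

Keying by id rather than by name keeps separate metric instances apart while still collapsing repeated updates of the same one.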
3. Backend processing (Node.js + prom-client)
3.1 Metric definitions (only 2 Histograms + 1 Counter, covering all 6 Web Vitals)
typescript
// metrics.ts
import client from 'prom-client';

const register = new client.Registry();
client.collectDefaultMetrics({ register }); // optional: default CPU / memory metrics

// --- Histograms: used for the p50 / p75 / p90 / p95 / p99 quantiles ---
// Bucket design: include the Good / NI / Poor threshold boundaries
// (for example 800 and 1800 for TTFB, 2500 and 4000 for LCP).
const TIMING_BUCKETS = [100, 200, 300, 500, 800, 1000, 1500, 1800, 2000, 2500, 3000, 4000, 5000, 8000, 10000];
const CLS_BUCKETS = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 1.0];

export const vitalsHistogram = new client.Histogram({
  name: 'web_vitals_duration_ms',
  help: 'Web Vitals timing metrics in ms (LCP/FID/INP/FCP/TTFB)',
  labelNames: ['metric', 'page', 'rating'] as const,
  buckets: TIMING_BUCKETS,
  registers: [register],
});

// CLS is a unitless score, so it gets its own histogram and bucket layout.
export const clsHistogram = new client.Histogram({
  name: 'web_vitals_cls_score',
  help: 'Cumulative Layout Shift score',
  labelNames: ['page', 'rating'] as const,
  buckets: CLS_BUCKETS,
  registers: [register],
});

// Counter: number of reports per rating (Good / NI / Poor)
export const vitalsRatingCounter = new client.Counter({
  name: 'web_vitals_rating_total',
  help: 'Count of Web Vitals reports by metric and rating',
  labelNames: ['metric', 'page', 'rating'] as const,
  registers: [register],
});

export { register };
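The bucket design principle stated in the comments above (bucket edges should include the Good / NI / Poor thresholds) can be checked mechanically. The sketch below lists which published threshold values are missing from a candidate bucket array; `THRESHOLDS_MS`, `missingBoundaries`, and the candidate array are hypothetical names and data for illustration. The point of the check is that `histogram_quantile` interpolates inside a bucket, so a threshold that sits mid-bucket can only be judged approximately.

```typescript
// [metric, good upper bound, poor lower bound] in ms, per Google's published thresholds.
const THRESHOLDS_MS: Array<[string, number, number]> = [
  ['LCP', 2500, 4000],
  ['INP', 200, 500],
  ['FCP', 1800, 3000],
  ['TTFB', 800, 1800],
];

// Returns the threshold values that are not bucket edges, sorted ascending.
function missingBoundaries(buckets: number[]): number[] {
  const present = new Set(buckets);
  const missing = new Set<number>();
  for (const [, good, poor] of THRESHOLDS_MS) {
    if (!present.has(good)) missing.add(good);
    if (!present.has(poor)) missing.add(poor);
  }
  return Array.from(missing).sort((a, b) => a - b);
}

// Hypothetical candidate layout that skips two thresholds.
const candidate = [100, 200, 500, 800, 1000, 2000, 3000, 4000];
console.log(missingBoundaries(candidate));                     // [ 1800, 2500 ]
console.log(missingBoundaries(candidate.concat([1800, 2500]))); // []
```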
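Why Histogram rather than Summary? With a Histogram, quantiles are computed on the Prometheus server (histogram_quantile) from bucket counters, and those counters can be summed across instances. A Summary computes its quantiles in the client, and per-instance quantiles cannot be merged into a correct overall quantile.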
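To make the mergeability argument concrete, here is a simplified sketch of the linear interpolation that `histogram_quantile` performs over cumulative buckets, applied to two instances whose bucket counts are summed first. The `Bucket` type, both function names, and the sample counts are made up for illustration, and edge cases handled by Prometheus itself (counter resets, non-monotonic buckets) are left out.

```typescript
// One cumulative bucket: number of observations with value <= le.
interface Bucket { le: number; count: number }

// Merge two instances by summing counts; both must share the same `le` layout.
function mergeBuckets(a: Bucket[], b: Bucket[]): Bucket[] {
  return a.map((bucket, i) => ({ le: bucket.le, count: bucket.count + (b[i]?.count ?? 0) }));
}

// Simplified quantile estimate. Assumes buckets are sorted by `le`,
// the last bucket is +Infinity, and the first bucket starts at 0.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const last = buckets[buckets.length - 1];
  if (last === undefined || last.count === 0) return NaN;
  const rank = q * last.count;
  let prev: Bucket = { le: 0, count: 0 };
  for (const bucket of buckets) {
    if (bucket.count >= rank) {
      // Quantile lands in the +Inf bucket: report the highest finite bound.
      if (!Number.isFinite(bucket.le)) return prev.le;
      const inBucket = bucket.count - prev.count;
      return prev.le + (bucket.le - prev.le) * ((rank - prev.count) / inBucket);
    }
    prev = bucket;
  }
  return NaN;
}

// Made-up LCP bucket counts from two service instances.
const instanceA: Bucket[] = [
  { le: 1000, count: 1 }, { le: 2500, count: 4 }, { le: 4000, count: 10 }, { le: Infinity, count: 10 },
];
const instanceB: Bucket[] = [
  { le: 1000, count: 3 }, { le: 2500, count: 6 }, { le: 4000, count: 10 }, { le: Infinity, count: 10 },
];

const merged = mergeBuckets(instanceA, instanceB);
console.log(histogramQuantile(0.75, merged)); // 3250
```

Summing buckets before estimating is what `sum(rate(..._bucket[5m])) by (le)` does in the queries of this design; averaging each instance's own p75 would in general give a different number.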
3.2 Reporting endpoint
typescript
// server.ts
import express from 'express';
import { vitalsHistogram, clsHistogram, vitalsRatingCounter, register } from './metrics';

const app = express();
app.use(express.json({ limit: '10kb' }));

// Allowlist: "/" followed by up to 100 letters, digits, "-", "_" or "/".
// Guards against label injection.
const PAGE_ALLOWLIST = /^\/[a-zA-Z0-9\-_\/]{0,100}$/;

app.post('/metrics/vitals', (req, res) => {
  const { name, value, rating, page } = req.body ?? {};
  // Basic validation
  if (!['LCP', 'FID', 'INP', 'CLS', 'FCP', 'TTFB'].includes(name)) return res.sendStatus(400);
  if (!['good', 'needs-improvement', 'poor'].includes(rating)) return res.sendStatus(400);
  if (typeof value !== 'number' || !Number.isFinite(value) || value < 0 || value > 60000) return res.sendStatus(400);
  // Sanitize page: anything that is not a plain path (for example one carrying
  // a query string or fragment) is recorded under '/unknown'.
  const safePage = typeof page === 'string' && PAGE_ALLOWLIST.test(page) ? page : '/unknown';
  if (name === 'CLS') {
    clsHistogram.observe({ page: safePage, rating }, value);
  } else {
    vitalsHistogram.observe({ metric: name, page: safePage, rating }, value);
  }
  vitalsRatingCounter.inc({ metric: name, page: safePage, rating });
  res.sendStatus(204);
});

// Prometheus scrape endpoint
app.get('/metrics', async (_req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.listen(3000);
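The validation in the handler above can also be pulled out into a pure function, which makes it easy to unit-test without starting a server. The sketch below applies the same rules to an untyped body; `parseReport` and `VitalReport` are hypothetical names, and the 60000 upper bound is simply carried over from the handler.

```typescript
type VitalName = 'LCP' | 'FID' | 'INP' | 'CLS' | 'FCP' | 'TTFB';
type Rating = 'good' | 'needs-improvement' | 'poor';
interface VitalReport { name: VitalName; value: number; rating: Rating; page: string }

const NAMES: readonly string[] = ['LCP', 'FID', 'INP', 'CLS', 'FCP', 'TTFB'];
const RATINGS: readonly string[] = ['good', 'needs-improvement', 'poor'];
const PAGE_PATTERN = /^\/[a-zA-Z0-9\-_\/]{0,100}$/;

// Returns a cleaned report, or null when the body should be rejected with 400.
function parseReport(body: unknown): VitalReport | null {
  if (typeof body !== 'object' || body === null) return null;
  const fields = body as Record<string, unknown>;
  const metricName = fields['name'];
  const rating = fields['rating'];
  const value = fields['value'];
  const page = fields['page'];
  if (typeof metricName !== 'string' || !NAMES.includes(metricName)) return null;
  if (typeof rating !== 'string' || !RATINGS.includes(rating)) return null;
  if (typeof value !== 'number' || !Number.isFinite(value) || value < 0 || value > 60000) return null;
  // Paths that fail the allowlist are kept, but under a single fixed label.
  const safePage = typeof page === 'string' && PAGE_PATTERN.test(page) ? page : '/unknown';
  return { name: metricName as VitalName, value, rating: rating as Rating, page: safePage };
}

console.log(parseReport({ name: 'LCP', value: 2100, rating: 'good', page: '/cart?step=2' }));
// { name: 'LCP', value: 2100, rating: 'good', page: '/unknown' }
console.log(parseReport({ name: 'XYZ', value: 1, rating: 'good', page: '/' })); // null
```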
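3.3 High-cardinality safeguards
| Safeguard | Description |
|---|---|
| page allowlist regex | Prevents random paths from generating a flood of label values |
| page path normalization | /product/123 → /product/:id (optional, via a route mapping table) |
| No user/session exposure | Never used as labels |
| Rate limiting | Per-IP rate limiting, so that flooding the endpoint cannot pollute the metrics |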
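Two of the safeguards in the table, path normalization and per-IP rate limiting, can be sketched as follows. The route table, `normalizePage`, and `FixedWindowLimiter` are all hypothetical; a real service would derive the routes from its own router and would probably use an existing rate-limiting middleware instead. The limiter below never evicts idle IPs, which a production version would need to do.

```typescript
// Hypothetical route table: the first matching pattern wins.
const ROUTES: Array<{ pattern: RegExp; label: string }> = [
  { pattern: /^\/$/, label: '/' },
  { pattern: /^\/cart$/, label: '/cart' },
  { pattern: /^\/product\/\d+$/, label: '/product/:id' },
];

// Maps a concrete path to a route template; the label set is bounded by the table size + 1.
function normalizePage(pathname: string): string {
  for (const route of ROUTES) {
    if (route.pattern.test(pathname)) return route.label;
  }
  return '/unknown';
}

// Fixed-window limiter: at most `limit` reports per IP per window.
class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is passed in (ms) so the behavior is deterministic and testable.
  allow(ip: string, now: number): boolean {
    const current = this.windows.get(ip);
    if (current === undefined || now - current.start >= this.windowMs) {
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    current.count += 1;
    return current.count <= this.limit;
  }
}

console.log(normalizePage('/product/123')); // /product/:id
console.log(normalizePage('/a/b/c/random')); // /unknown

const limiter = new FixedWindowLimiter(2, 60000);
console.log(limiter.allow('203.0.113.7', 0));     // true
console.log(limiter.allow('203.0.113.7', 1000));  // true
console.log(limiter.allow('203.0.113.7', 2000));  // false (third report in the window)
console.log(limiter.allow('203.0.113.7', 60000)); // true (new window)
```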
4. Prometheus configuration
yaml
# prometheus.yml
scrape_configs:
  - job_name: 'web-vitals-backend'
    static_configs:
      - targets: ['your-backend:3000']
    scrape_interval: 15s
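5. Grafana display: what you see is what you get
5.1 Recommended panels (8 in total)
Panel 1: Core metric quantile overview (Stat or Gauge)
promql
# LCP p75 (the quantile Google recommends for assessment)
histogram_quantile(0.75,
  sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[5m])) by (le)
)
Create one Stat panel each for LCP / INP / FCP / TTFB, with threshold colors:
- Green (Good): LCP ≤ 2500 ms, INP ≤ 200 ms, FCP ≤ 1800 ms, TTFB ≤ 800 ms
- Yellow (NI): anything between the Good and Poor bounds
- Red (Poor): LCP > 4000 ms, INP > 500 ms, FCP > 3000 ms, TTFB > 1800 ms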
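The color mapping above is the same Good / NI / Poor classification that the SDK applies to individual samples. A minimal sketch of that classification, using Google's published thresholds, is shown below; the `THRESHOLDS` table and the `rate` function are illustrative names, not part of any library.

```typescript
type Rating = 'good' | 'needs-improvement' | 'poor';

// [good upper bound, poor lower bound]: values <= good are Good, values > poor are Poor.
// Timing metrics are in ms; CLS is a unitless score.
const THRESHOLDS: Record<string, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FCP: [1800, 3000],
  TTFB: [800, 1800],
};

// Classifies a value (for example a p75 from a Stat panel); undefined for unknown metrics.
function rate(metric: string, value: number): Rating | undefined {
  const bounds = THRESHOLDS[metric];
  if (bounds === undefined) return undefined;
  const [good, poor] = bounds;
  if (value <= good) return 'good';
  return value <= poor ? 'needs-improvement' : 'poor';
}

console.log(rate('LCP', 2500)); // good
console.log(rate('LCP', 3200)); // needs-improvement
console.log(rate('INP', 650));  // poor
```

In Grafana the same boundaries would be entered as two threshold steps per panel: the Good bound for yellow and the Poor bound for red.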
Panel 2: LCP quantile trend (Time Series)
promql
# p50 / p75 / p95
histogram_quantile(0.50, sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[5m])) by (le))
histogram_quantile(0.75, sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[5m])) by (le))
histogram_quantile(0.95, sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[5m])) by (le))
Panel 3: INP quantile trend (Time Series)
promql
histogram_quantile(0.75, sum(rate(web_vitals_duration_ms_bucket{metric="INP"}[5m])) by (le))
Panel 4: CLS p75 trend (Time Series)
promql
histogram_quantile(0.75, sum(rate(web_vitals_cls_score_bucket[5m])) by (le))
Panel 5: Good rate per metric (Bar Gauge or Pie)
promql
# LCP Good rate
sum(rate(web_vitals_rating_total{metric="LCP", rating="good"}[1h]))
/
sum(rate(web_vitals_rating_total{metric="LCP"}[1h]))
Create one series each for LCP / INP / CLS; together they show at a glance what share of users get an experience that meets the target.
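The Good-rate query divides two rates over the same window, so the window length cancels out and the result is simply "increase in good reports" over "increase in all reports". A small worked sketch, with made-up counter snapshots and hypothetical names (`RatingCounts`, `goodRate`), and ignoring the counter-reset handling that Prometheus's `rate()` performs:

```typescript
// Counter values for one metric at a point in time, split by rating.
interface RatingCounts { good: number; needsImprovement: number; poor: number }

// Good rate between two snapshots of monotonically increasing counters.
function goodRate(start: RatingCounts, end: RatingCounts): number {
  const good = end.good - start.good;
  const total = good
    + (end.needsImprovement - start.needsImprovement)
    + (end.poor - start.poor);
  return total === 0 ? NaN : good / total; // NaN mirrors a 0/0 result when nothing was reported
}

// Made-up LCP counters one hour apart.
const hourAgo: RatingCounts = { good: 1000, needsImprovement: 300, poor: 100 };
const now: RatingCounts = { good: 1600, needsImprovement: 500, poor: 300 };

console.log(goodRate(hourAgo, now)); // 0.6  (600 good out of 1000 new reports)
```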
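Panel 6: LCP p75 grouped by page (Bar Chart)
promql
histogram_quantile(0.75,
  sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[30m])) by (le, page)
)
Quickly pinpoints which page is the performance bottleneck.
Panel 7: Report volume (Time Series)
promql
# Reports per minute, by metric
sum(increase(web_vitals_rating_total[1m])) by (metric)
Monitors whether data collection itself is healthy. An error-rate series would need an additional counter for rejected reports, which this design does not define.
Panel 8: TTFB p75 trend (Time Series)
promql
histogram_quantile(0.75, sum(rate(web_vitals_duration_ms_bucket{metric="TTFB"}[5m])) by (le))
Reflects server response speed and correlates with backend performance.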
5.2 Example alerting rules
yaml
# alerts.yml
groups:
  - name: web-vitals
    rules:
      - alert: LCP_P75_Too_High
        expr: |
          histogram_quantile(0.75,
            sum(rate(web_vitals_duration_ms_bucket{metric="LCP"}[10m])) by (le)
          ) > 4000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "LCP p75 is above 4 s: poor user experience"
      - alert: INP_P75_Too_High
        expr: |
          histogram_quantile(0.75,
            sum(rate(web_vitals_duration_ms_bucket{metric="INP"}[10m])) by (le)
          ) > 500
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "INP p75 is above 500 ms: page interactions feel sluggish"
      - alert: Good_Rate_LCP_Drop
        expr: |
          sum(rate(web_vitals_rating_total{metric="LCP",rating="good"}[30m]))
          / sum(rate(web_vitals_rating_total{metric="LCP"}[30m])) < 0.5
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "LCP Good rate is below 50%: many users have a poor experience"
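The `for:` clause in these rules means the expression must stay true across consecutive evaluations for the whole duration before the alert fires; a single evaluation that comes back false resets the timer. The sketch below replays a series of evaluation results to show the inactive → pending → firing progression. The 60 s evaluation interval and the names `alertStates` and `AlertState` are assumptions for illustration, not Prometheus internals.

```typescript
type AlertState = 'inactive' | 'pending' | 'firing';

// Replays evaluation results (timestamps in ms) and returns the state after each one.
function alertStates(
  samples: Array<{ t: number; breached: boolean }>,
  forMs: number,
): AlertState[] {
  const states: AlertState[] = [];
  let activeSince: number | null = null;
  for (const sample of samples) {
    if (!sample.breached) {
      activeSince = null; // condition cleared: the timer starts over
      states.push('inactive');
      continue;
    }
    if (activeSince === null) activeSince = sample.t;
    states.push(sample.t - activeSince >= forMs ? 'firing' : 'pending');
  }
  return states;
}

// Evaluations every 60 s with `for: 5m`; the condition holds from t=0 to t=300 s, then clears.
const minute = 60000;
const evaluations = [0, 1, 2, 3, 4, 5].map(i => ({ t: i * minute, breached: true }));
evaluations.push({ t: 6 * minute, breached: false });

console.log(alertStates(evaluations, 5 * minute));
// [ 'pending', 'pending', 'pending', 'pending', 'pending', 'firing', 'inactive' ]
```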
6. Metric summary
| Metric name | Type | Labels | Purpose |
|---|---|---|---|
| web_vitals_duration_ms | Histogram | metric, page, rating | Quantiles for LCP/FID/INP/FCP/TTFB |
| web_vitals_cls_score | Histogram | page, rating | Quantiles for CLS |
| web_vitals_rating_total | Counter | metric, page, rating | Good/NI/Poor distribution rates |
3 metrics in total. Combined with their label dimensions they cover every panel above, with no redundancy.
7. Dependency version reference
| Component | Version |
|---|---|
| web-vitals | ^4.x |
| prom-client (Node.js) | ^15.x |
| Prometheus | ^2.45 |
| Grafana | ^10.x |
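8. Implementation steps
- Frontend: npm install web-vitals, then import vitals-reporter.ts at the application entry point
- Backend: deploy the reporting service, exposing /metrics/vitals (POST) and /metrics (GET)
- Prometheus: add the scrape job with a 15s scrape interval
- Grafana: import the 8 panels above and set the threshold color mappings
- Alerting: configure Alertmanager to receive the Web Vitals alerts and route them to DingTalk / Slack
This design follows Google's Web Vitals assessment criteria (2024) and uses p75 as the primary quantile for health assessment.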