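Introduction

I have recently been learning how to integrate Prometheus into Golang services and use its different metric types to make a service observable. After reading the official documentation and the source code of the relevant libraries, I got service observation working, and I am writing this article to organize what I learned along the way.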

Prometheus metric types

Prometheus offers several metric types, each suited to a different observation scenario. They are:

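Counter

A Counter /ˈkaʊntər/ is a cumulative metric that represents a single monotonically increasing counter whose value can only increase or be reset to zero on restart. For example, you can use a counter to represent the number of requests served, tasks completed, or errors. (The above is from the official documentation.)

Based on that description, my understanding is that a Counter is simply a tally that only goes up. It is typically used to track how much something has grown, for example how many 40x or 50x API errors have occurred over the lifetime of the program, although I personally find a Histogram more intuitive to read for that case.

Client library usage documentation for Counter: Go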
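To make the "only goes up" rule concrete, here is a minimal standard-library sketch. The type monotonicCounter and its methods are hypothetical names made up for illustration; this is not the client library's implementation. The sketch returns an error on a negative Add, whereas the real library's Counter.Add panics in that case.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// monotonicCounter is a hypothetical, simplified model of a counter metric:
// its value starts at zero and can only grow.
type monotonicCounter struct {
	mu    sync.Mutex
	value float64
}

// Inc adds 1 to the counter.
func (c *monotonicCounter) Inc() { _ = c.Add(1) }

// Add increases the counter by v and rejects negative values, because a
// counter must never decrease.
func (c *monotonicCounter) Add(v float64) error {
	if v < 0 {
		return errors.New("counter cannot decrease")
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.value += v
	return nil
}

// Value returns the current total.
func (c *monotonicCounter) Value() float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.value
}

func main() {
	var errorsTotal monotonicCounter
	errorsTotal.Inc() // e.g. one 404 response
	errorsTotal.Inc() // e.g. one 500 response
	fmt.Println("errors so far:", errorsTotal.Value())
	fmt.Println("negative add:", errorsTotal.Add(-1))
}
```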

Gauge

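A Gauge /ɡeɪdʒ/ is, as the name suggests, a measuring instrument: a metric that represents a single numerical value that can arbitrarily go up and down.

Gauges are typically used for measured values such as temperature or current memory usage, but also for "counts" that can go up and down, such as the number of concurrent requests. (The above is from the official documentation.)

Unlike a Counter, a Gauge can both increase and decrease.

For example:

Increment a Gauge when a request arrives and decrement it when the request has been handled. Prometheus can then plot the value as a line chart that shows when traffic starts to rise, when it peaks, and when it falls off again.

Client library usage documentation for Gauge: Go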
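The increment-on-arrival, decrement-on-completion pattern can be sketched with the standard library alone. In this sketch an atomic.Int64 (Go 1.19 or later) stands in for a real Gauge, and trackInFlight and demo are hypothetical helper names, not part of any library.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// trackInFlight wraps next and keeps gauge equal to the number of requests
// currently being handled: +1 on entry, -1 on exit.
func trackInFlight(gauge *atomic.Int64, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		gauge.Add(1)
		defer gauge.Add(-1)
		next.ServeHTTP(w, r)
	})
}

// demo sends one request through the wrapper and returns the gauge value
// seen while the request was being handled and after it finished.
func demo() (during, after int64) {
	var gauge atomic.Int64
	h := trackInFlight(&gauge, http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		during = gauge.Load()
	}))
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
	return during, gauge.Load()
}

func main() {
	during, after := demo()
	fmt.Println("during request:", during, "after request:", after)
}
```

With the real client library, the two gauge.Add calls would be replaced by the Inc() and Dec() methods of a Gauge.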

Histogram

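A Histogram /ˈhɪstəɡræm/ samples observations (usually things like request durations or response sizes) and counts them in configurable buckets. It also provides a sum of all observed values. A histogram with the base metric name <basename> exposes:

cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
the total sum of all observed values, exposed as <basename>_sum
the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"} above)
(The above is from the official documentation.)

The defining feature of a Histogram is its buckets: each observation is counted in every bucket whose upper bound it does not exceed. Client library usage documentation for Histogram: Go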
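The three exposed series are easier to picture with a small model. The following is a standard-library sketch under simplifying assumptions (no labels, no concurrency safety); bucketHistogram and its methods are hypothetical names and do not mirror the client library's internals.

```go
package main

import (
	"fmt"
	"sort"
)

// bucketHistogram is a hypothetical, simplified model of a classic histogram.
type bucketHistogram struct {
	bounds []float64 // upper inclusive bounds, sorted ascending
	counts []uint64  // per-bucket counts; the last slot is the +Inf overflow
	sum    float64
	count  uint64
}

func newBucketHistogram(bounds []float64) *bucketHistogram {
	sorted := append([]float64(nil), bounds...)
	sort.Float64s(sorted)
	return &bucketHistogram{bounds: sorted, counts: make([]uint64, len(sorted)+1)}
}

// Observe records one value in the first bucket whose bound is >= v.
func (h *bucketHistogram) Observe(v float64) {
	i := sort.SearchFloat64s(h.bounds, v) // smallest i with bounds[i] >= v
	h.counts[i]++
	h.sum += v
	h.count++
}

// Cumulative returns the running totals that correspond to the
// <basename>_bucket series, ending with the le="+Inf" bucket.
func (h *bucketHistogram) Cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

func main() {
	h := newBucketHistogram([]float64{0.1, 0.5, 1})
	for _, seconds := range []float64{0.05, 0.3, 0.5, 2} {
		h.Observe(seconds)
	}
	fmt.Println("cumulative buckets:", h.Cumulative())
	fmt.Println("sum:", h.sum, "count:", h.count)
}
```

Because the buckets are cumulative, the last entry always equals the total count, which is why <basename>_count and the le="+Inf" bucket carry the same value.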

Summary

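A Summary /ˈsʌməri/ is similar to a histogram: it samples observations (usually things like request durations and response sizes). While it also provides a total count of observations and a sum of all observed values, it calculates configurable quantiles over a sliding time window. (The above is from the official documentation.)

The Summary client has an Objectives option, which maps each target quantile to its allowed error and makes it possible to observe the 99th and 99.9th percentiles. Unless you have special requirements, the following values are a reasonable choice:

Objectives: map[float64]float64{
    0.5:   0.01,
    0.75:  0.01,
    0.9:   0.01,
    0.99:  0.001,
    0.999: 0.0001,
}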
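To illustrate what a quantile and its allowed error mean, here is a small standard-library sketch. It computes an exact quantile over a slice with the nearest-rank definition, which is an assumption chosen for simplicity; the client library instead estimates quantiles over a stream. The names nearestRank and rankWindow are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// nearestRank returns the q-quantile (0 < q <= 1) of values using the
// nearest-rank definition. It returns NaN for an empty input.
func nearestRank(values []float64, q float64) float64 {
	if len(values) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(q * float64(len(sorted)))) // 1-based rank
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// rankWindow returns the range of quantiles that may be reported for target
// q with allowed error e, i.e. [q-e, q+e] clamped to [0, 1].
func rankWindow(q, e float64) (lo, hi float64) {
	return math.Max(0, q-e), math.Min(1, q+e)
}

func main() {
	latencies := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println("median:", nearestRank(latencies, 0.5))
	lo, hi := rankWindow(0.99, 0.001)
	fmt.Println("an objective of 0.99 with error 0.001 allows quantiles in", lo, "to", hi)
}
```

Read this way, an entry such as 0.99: 0.001 asks for the 99th percentile and accepts any value whose true rank lies between roughly the 98.9th and 99.1st percentile.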

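Observing an HTTP service

With the metric types covered, the next step is to put them to work. In most programs the thing observed most often is how long an endpoint takes to respond. So how do we observe HTTP, or in other words, where do we instrument an HTTP service with Prometheus?

You have probably already thought of the middleware mechanism that HTTP frameworks provide. Indeed, we can write a middleware dedicated to recording response times and reporting them through a Prometheus metric. Since middleware is involved, Gin's ctx.Next() and ctx.Abort() deserve an introduction.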

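HTTP middleware

Whether we use Gin or another HTTP framework, we end up using its middleware feature. A middleware is really just a HandlerFunc. When a route is registered, the framework takes two arguments: the route path and a variadic list of HandlerFunc values. When a route matches, the framework iterates over that slice and calls each function in turn, so the program can run other logic before the handler that actually produces the response. The neat part is that request handling becomes a chain of functions.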

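Gin's ctx.Next() and ctx.Abort()

A function chain is not the only thing middleware offers. It also follows the onion model: Gin uses ctx.Next() and ctx.Abort() to turn the chain into a loop in which control passes inward through each layer and then back out:
ctx.Next(): runs the pending handlers in the chain inside the calling handler, then returns to the caller.

ctx.Abort(): prevents the pending handlers from being called.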
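The onion behaviour is easiest to see in a stripped-down model. The sketch below uses only the standard library; handlerFunc, chainContext, and run are hypothetical stand-ins for Gin's types, reduced to an index into a handler slice, and should be read as a simplified model rather than Gin's actual code.

```go
package main

import "fmt"

// handlerFunc and chainContext are hypothetical stand-ins for Gin's
// HandlerFunc and Context, reduced to what Next and Abort need.
type handlerFunc func(*chainContext)

type chainContext struct {
	handlers []handlerFunc
	index    int
	trace    []string
}

// Next runs the remaining handlers, then returns to the caller so that code
// placed after Next executes on the way back out.
func (c *chainContext) Next() {
	c.index++
	for c.index < len(c.handlers) {
		c.handlers[c.index](c)
		c.index++
	}
}

// Abort prevents any handler that has not started yet from running.
func (c *chainContext) Abort() { c.index = len(c.handlers) }

// run executes the chain from the first handler and returns the trace.
func run(handlers ...handlerFunc) []string {
	c := &chainContext{handlers: handlers, index: -1}
	c.Next()
	return c.trace
}

// layer builds a middleware that logs on the way in and on the way out.
func layer(name string) handlerFunc {
	return func(c *chainContext) {
		c.trace = append(c.trace, name+" in")
		c.Next()
		c.trace = append(c.trace, name+" out")
	}
}

func main() {
	final := func(c *chainContext) { c.trace = append(c.trace, "handler") }
	fmt.Println(run(layer("A"), layer("B"), final))

	deny := func(c *chainContext) {
		c.trace = append(c.trace, "deny")
		c.Abort() // the final handler never runs
	}
	fmt.Println(run(deny, final))
}
```

A timing middleware relies on exactly this shape: it records the start time on the way in, calls Next, and measures the elapsed time on the way out.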

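Observing the database

An API call in an application almost always reads from or writes to a database, so observing the database matters just as much. How do we observe database metrics with Prometheus?

With GORM, CRUD operations go through *gorm.DB, and GORM provides a plugin mechanism: we can call the Use() method on *gorm.DB to register GORM's Prometheus plugin. The official documentation covers the full usage; see the usage documentation for GORM's Prometheus plugin and the GORM plugin documentation.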

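GORM callbacks

Once the Prometheus plugin is registered on the DB, Prometheus can observe the database's metrics. But what if we want to observe how long the program's CRUD operations take? The plugin cannot report that directly, so we turn to another GORM feature: callback /ˈkɔːlbæk/ functions.

gorm.io/callbacks (documented on the same page as plugins)

After obtaining a DB (*gorm.DB) from gorm.Open(), we can call DB.Callback().Create(), Query(), or Delete(), chain After() or Before(), and then call Register() to register a callback.

For example, to run some logic before every query starts and after every query finishes, we can use the following code (the two callbacks are given distinct names so the registrations do not clash):

db, err := gorm.Open(mysql.Open(dsn), &gorm.Config{}) // dsn: your MySQL connection string
if err != nil {
    panic(err)
}
db.Callback().Query().Before("*").Register("do_something:before_query", DoSomething)
db.Callback().Query().After("*").Register("do_something:after_query", DoSomething)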
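The before/after idea can be modelled without a database. The following standard-library sketch is a simplified model and not GORM's implementation: call, hook, pipeline, markStart, and recordElapsed are hypothetical names, and the settings map plays the role of the per-statement storage that a callback can write to and read from.

```go
package main

import (
	"fmt"
	"time"
)

// call carries per-operation settings, playing the role that *gorm.DB plays
// inside a callback. All names in this sketch are hypothetical.
type call struct {
	settings map[string]any
	elapsed  time.Duration
}

type hook func(*call)

// pipeline runs every before hook, then the operation, then every after hook.
type pipeline struct {
	before []hook
	after  []hook
}

func (p *pipeline) run(op func()) *call {
	c := &call{settings: map[string]any{}}
	for _, h := range p.before {
		h(c)
	}
	op()
	for _, h := range p.after {
		h(c)
	}
	return c
}

// markStart stores the start time so that a later hook can read it.
func markStart(c *call) { c.settings["start_time"] = time.Now() }

// recordElapsed computes the duration if markStart ran; otherwise it does nothing.
func recordElapsed(c *call) {
	if start, ok := c.settings["start_time"].(time.Time); ok {
		c.elapsed = time.Since(start)
	}
}

func main() {
	p := &pipeline{before: []hook{markStart}, after: []hook{recordElapsed}}
	c := p.run(func() { time.Sleep(5 * time.Millisecond) }) // stand-in for a query
	fmt.Println("elapsed:", c.elapsed)
}
```

The type assertion in recordElapsed matters: if the before hook never ran, the after hook quietly skips the measurement instead of panicking.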

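Hands-on implementation

With the theory out of the way, you should have a first idea of where each Prometheus metric type fits and of when to instrument HTTP and the database. Now let's use Golang's Gin and GORM libraries to implement service observation with Prometheus.

Deploying Prometheus

Deploying Prometheus is not hard: run it with Docker or install the binary for your platform. Download link:

Prometheus download

After installation, configure prometheus.yml starting from this template:
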

# my global config
global:
  scrape_interval: 5s # Set the scrape interval to every 5 seconds. Default is every 1 minute.
  evaluation_interval: 5s # Evaluate rules every 5 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's the Go service that exposes /metrics on port 8081.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "node"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["127.0.0.1:8081"]

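The config defines one job whose target, 127.0.0.1:8081, is the port our service opens for Prometheus.

Using the Prometheus Golang library

Here we use the github.com/prometheus/... libraries to integrate Prometheus into the program.

Opening a port for Prometheus

The Prometheus YAML config contains a job that scrapes metrics from 127.0.0.1:8081, so the code needs to open that port for Prometheus to use.

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    go func() {
        // Serve /metrics on its own mux so it does not share routes
        // with the business server.
        mux := http.NewServeMux()
        mux.Handle("/metrics", promhttp.Handler())
        if err := http.ListenAndServe(":8081", mux); err != nil {
            log.Printf("metrics server stopped: %v", err)
        }
    }()
    ...
}

The code above is straightforward: it combines the Prometheus client library, the standard net/http package, and a goroutine to serve port 8081. With that in place, Prometheus can already perform basic monitoring of the Go program.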
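It can help to see what Prometheus actually reads from that port. The sketch below hand-writes a single counter in the Prometheus text exposition format using only the standard library. It is for illustration only, since promhttp.Handler() produces this output for every registered metric; the metric demo_visits_total and the helpers renderCounter and metricsHandler are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"sync/atomic"
)

// renderCounter formats one counter sample in the Prometheus text
// exposition format: a HELP line, a TYPE line, and the sample itself.
func renderCounter(name, help string, value uint64) string {
	return "# HELP " + name + " " + help + "\n" +
		"# TYPE " + name + " counter\n" +
		name + " " + strconv.FormatUint(value, 10) + "\n"
}

// visits is a hypothetical demo metric.
var visits atomic.Uint64

// metricsHandler serves the demo metric the way a /metrics endpoint would.
func metricsHandler(w http.ResponseWriter, _ *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
	fmt.Fprint(w, renderCounter("demo_visits_total", "Total visits served.", visits.Load()))
}

func main() {
	visits.Add(3)
	// Print what a scrape of metricsHandler would return.
	fmt.Print(renderCounter("demo_visits_total", "Total visits served.", visits.Load()))
}
```

Opening http://127.0.0.1:8081/metrics in a browser while the real service is running shows the same kind of output for the Go runtime metrics.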

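Reading the metric types in the Prometheus client library

Before monitoring HTTP and the database, it is worth learning how the Golang Prometheus client exposes each metric type.

Counter

In the Golang Prometheus client, Counter is an interface. Besides embedding the Collector and Metric interfaces, it defines the increment methods Inc() and Add(float64).

Who implements the Counter interface? Reading further down the source, we find a struct named counter that implements it. Because its name starts with a lowercase letter it is unexported, so we cannot construct it ourselves. Further down still is the NewCounter constructor, whose return type is the Counter interface; since counter implements that interface, the function builds a counter struct and returns it.

NewCounter

Let's look at the source of NewCounter first:

func NewCounter(opts CounterOpts) Counter {
    desc := NewDesc(
        BuildFQName(opts.Namespace, opts.Subsystem, opts.Name),
        opts.Help,
        nil,
        opts.ConstLabels,
    )
    if opts.now == nil {
        opts.now = time.Now
    }
    result := &counter{desc: desc, labelPairs: desc.constLabelPairs, now: opts.now}
    result.init(result) // Init self-collection.
    result.createdTs = timestamppb.New(opts.now())
    return result
}

NewCounter takes a configuration value of type CounterOpts. Reading on, we see that CounterOpts is defined in terms of Opts. So what is Opts? Searching the source turns up the Opts struct:
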

type Opts struct {  
// Namespace, Subsystem, and Name are components of the fully-qualified  
// name of the Metric (created by joining these components with  
// "_"). Only Name is mandatory, the others merely help structuring the  
// name. Note that the fully-qualified name of the metric must be a  
// valid Prometheus metric name.  
Namespace string  
Subsystem string  
Name string  
  
// Help provides information about this metric.  
//  
// Metrics with the same fully-qualified name must have the same Help  
// string.  
Help string  
  
// ConstLabels are used to attach fixed labels to this metric. Metrics  
// with the same fully-qualified name must have the same label names in  
// their ConstLabels.  
//  
// ConstLabels are only used rarely. In particular, do not use them to  
// attach the same labels to all your metrics. Those use cases are  
// better covered by target labels set by the scraping Prometheus  
// server, or by one specific metric (e.g. a build_info or a  
// machine_role metric). See also  
// https://prometheus.io/docs/instrumenting/writing_exporters/#target-labels-not-static-scraped-labels  
ConstLabels Labels  
  
// now is for testing purposes, by default it's time.Now.  
now func() time.Time  
}

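The Opts struct has several key fields: Namespace, Subsystem, Name, Help, and ConstLabels. These are not specific to Counter; the constructors of the other metric types need them too. So it is a fair guess that the options of the other metric types carry the same fields and add some of their own on top.

So what are Namespace, Subsystem, Name, Help, and ConstLabels for?

If we open the Prometheus dashboard and search the Go metrics currently being collected, we see many metric names that start with go. That go prefix plays the role of Namespace in the Opts struct, and from there it is easy to guess what Subsystem and Name do: a metric's full name follows the pattern Namespace_Subsystem_Name. Help works like a description and states what the metric measures.

And ConstLabels?
ConstLabels are constant labels attached to the metric. We can use them to record, for example, which environment the data comes from (production or test) or which version of the program produced it.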
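The naming rule can be sketched in a few lines of standard-library Go. The function joinMetricName is a hypothetical reimplementation written for illustration; in real code the library's BuildFQName, seen in the NewCounter source, does this job. The sketch follows the rule stated in the Opts comment: components are joined with "_" and only Name is mandatory.

```go
package main

import (
	"fmt"
	"strings"
)

// joinMetricName joins the non-empty components with "_". An empty name
// yields an empty result, since Name is the only mandatory component.
func joinMetricName(namespace, subsystem, name string) string {
	if name == "" {
		return ""
	}
	parts := make([]string, 0, 3)
	for _, p := range []string{namespace, subsystem, name} {
		if p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, "_")
}

func main() {
	fmt.Println(joinMetricName("vblog", "http", "response_time_ms"))
	fmt.Println(joinMetricName("go", "", "goroutines"))
	fmt.Println(joinMetricName("", "", "up"))
}
```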

NewCounterVec

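Each metric type in the Prometheus client has two kinds of constructor: the plain NewXxx form and a NewXxxVec form. Counter, for instance, also has a constructor named NewCounterVec.

What is NewCounterVec? What are the NewXxxVec constructors for, and why have NewCounterVec when NewCounter already exists? I was thoroughly confused when I first read the source, so let's keep going.

Here is the source of NewCounterVec:

func NewCounterVec(opts CounterOpts, labelNames []string) *CounterVec {
    return V2.NewCounterVec(CounterVecOpts{
        CounterOpts:    opts,
        VariableLabels: UnconstrainedLabels(labelNames),
    })
}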

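NewCounterVec takes two parameters. One is still the CounterOpts configuration; the other looks like a list of labels, which ends up as the value of VariableLabels. So what is VariableLabels?

Think of the Counter returned by NewCounter as a single instance, and the *CounterVec returned by NewCounterVec as a container holding many Counter instances. VariableLabels are what distinguish the instances inside the container.

For example, to count how many 404 and 500 errors occur over the program's lifetime and view them in one chart, we can use NewCounterVec to record both error codes, with a variable label telling the 404 count apart from the 500 count.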
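The "container of counters" idea can be modelled with a map keyed by label value. This is a standard-library sketch with a single label and hypothetical names (labeledCounters, Inc, Get); the real CounterVec supports several labels and hands out child counters through WithLabelValues.

```go
package main

import (
	"fmt"
	"sync"
)

// labeledCounters is a hypothetical, simplified model of a counter vector
// with a single variable label: one child counter per label value.
type labeledCounters struct {
	mu       sync.Mutex
	children map[string]float64
}

func newLabeledCounters() *labeledCounters {
	return &labeledCounters{children: make(map[string]float64)}
}

// Inc increments the child identified by labelValue, creating it on first use.
func (v *labeledCounters) Inc(labelValue string) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.children[labelValue]++
}

// Get returns the current count for one label value (zero if never used).
func (v *labeledCounters) Get(labelValue string) float64 {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.children[labelValue]
}

func main() {
	errorsByCode := newLabeledCounters() // the label would be named "code"
	for _, code := range []string{"404", "500", "404"} {
		errorsByCode.Inc(code)
	}
	fmt.Println("404:", errorsByCode.Get("404"), "500:", errorsByCode.Get("500"))
}
```

Each distinct label value becomes its own time series in Prometheus, so labels should take a small, bounded set of values.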

Gauge

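With Counter's data structures and methods sorted out, the remaining metric types are easier to follow. Gauge is also an interface in Golang, and we create one with NewGauge or NewGaugeVec.

The source is listed below without further commentary; you can work through it with the same approach used for Counter.

Gauge interface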

type Gauge interface {  
Metric  
Collector  
  
// Set sets the Gauge to an arbitrary value.  
Set(float64)  
// Inc increments the Gauge by 1. Use Add to increment it by arbitrary  
// values.  
Inc()  
// Dec decrements the Gauge by 1. Use Sub to decrement it by arbitrary  
// values.  
Dec()  
// Add adds the given value to the Gauge. (The value can be negative,  
// resulting in a decrease of the Gauge.)  
Add(float64)  
// Sub subtracts the given value from the Gauge. (The value can be  
// negative, resulting in an increase of the Gauge.)  
Sub(float64)  
  
// SetToCurrentTime sets the Gauge to the current Unix time in seconds.  
SetToCurrentTime()  
}

Gauge struct


type gauge struct {  
// valBits contains the bits of the represented float64 value. It has  
// to go first in the struct to guarantee alignment for atomic  
// operations. http://golang.org/pkg/sync/atomic/#pkg-note-BUG  
valBits uint64  
  
selfCollector  
  
desc *Desc  
labelPairs []*dto.LabelPair  
}

Gauge NewGauge

func NewGauge(opts GaugeOpts) Gauge {
    desc := NewDesc(
        BuildFQName(opts.Namespace, opts.Subsystem, opts.Name),
        opts.Help,
        nil,
        opts.ConstLabels,
    )
    result := &gauge{desc: desc, labelPairs: desc.constLabelPairs}
    result.init(result) // Init self-collection.
    return result
}

Gauge NewGaugeVec

func NewGaugeVec(opts GaugeOpts, labelNames []string) *GaugeVec {
    return V2.NewGaugeVec(GaugeVecOpts{
        GaugeOpts:      opts,
        VariableLabels: UnconstrainedLabels(labelNames),
    })
}

Histogram

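Besides the Namespace, Subsystem, Name, Help, and ConstLabels fields discussed earlier, the Histogram constructor options include a field named Buckets. It is specific to Histogram and defines the bucket boundaries into which observations are counted.

For example, to chart how the values observed for different kinds of requests are distributed, I can set Buckets to the upper bounds 100, 200, and 300, which gives the ranges 0-100, 100-200, and 200-300 (a +Inf bucket is added implicitly), and then pass label names to NewHistogramVec so that VariableLabels tell the different instances apart.

The source is listed below without further commentary; you can work through it with the same approach used for Counter.

Histogram interface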

type Histogram interface {  
    Metric  
    Collector  
  
// Observe adds a single observation to the histogram. Observations are  
// usually positive or zero. Negative observations are accepted but  
// prevent current versions of Prometheus from properly detecting  
// counter resets in the sum of observations. (The experimental Native  
// Histograms handle negative observations properly.) See  
// https://prometheus.io/docs/practices/histograms/#count-and-sum-of-observations  
// for details.  
    Observe(float64)  
}

HistogramOpts struct

type HistogramOpts struct {  
// Namespace, Subsystem, and Name are components of the fully-qualified  
// name of the Histogram (created by joining these components with  
// "_"). Only Name is mandatory, the others merely help structuring the  
// name. Note that the fully-qualified name of the Histogram must be a  
// valid Prometheus metric name.  
Namespace string  
Subsystem string  
Name string  
  
// Help provides information about this Histogram.  
//  
// Metrics with the same fully-qualified name must have the same Help  
// string.  
Help string  
  
// ConstLabels are used to attach fixed labels to this metric. Metrics  
// with the same fully-qualified name must have the same label names in  
// their ConstLabels.  
//  
// ConstLabels are only used rarely. In particular, do not use them to  
// attach the same labels to all your metrics. Those use cases are  
// better covered by target labels set by the scraping Prometheus  
// server, or by one specific metric (e.g. a build_info or a  
// machine_role metric). See also  
// https://prometheus.io/docs/instrumenting/writing_exporters/#target-labels-not-static-scraped-labels  
ConstLabels Labels  
  
// Buckets defines the buckets into which observations are counted. Each  
// element in the slice is the upper inclusive bound of a bucket. The  
// values must be sorted in strictly increasing order. There is no need  
// to add a highest bucket with +Inf bound, it will be added  
// implicitly. If Buckets is left as nil or set to a slice of length  
// zero, it is replaced by default buckets. The default buckets are  
// DefBuckets if no buckets for a native histogram (see below) are used,  
// otherwise the default is no buckets. (In other words, if you want to  
// use both regular buckets and buckets for a native histogram, you have  
// to define the regular buckets here explicitly.)  
Buckets []float64  
  
// If NativeHistogramBucketFactor is greater than one, so-called sparse  
// buckets are used (in addition to the regular buckets, if defined  
// above). A Histogram with sparse buckets will be ingested as a Native  
// Histogram by a Prometheus server with that feature enabled (requires  
// Prometheus v2.40+). Sparse buckets are exponential buckets covering  
// the whole float64 range (with the exception of the "zero" bucket, see  
// NativeHistogramZeroThreshold below). From any one bucket to the next,  
// the width of the bucket grows by a constant  
// factor. NativeHistogramBucketFactor provides an upper bound for this  
// factor (exception see below). The smaller  
// NativeHistogramBucketFactor, the more buckets will be used and thus  
// the more costly the histogram will become. A generally good trade-off  
// between cost and accuracy is a value of 1.1 (each bucket is at most  
// 10% wider than the previous one), which will result in each power of  
// two divided into 8 buckets (e.g. there will be 8 buckets between 1  
// and 2, same as between 2 and 4, and 4 and 8, etc.).  
//  
// Details about the actually used factor: The factor is calculated as  
// 2^(2^-n), where n is an integer number between (and including) -4 and  
// 8. n is chosen so that the resulting factor is the largest that is  
// still smaller or equal to NativeHistogramBucketFactor. Note that the  
// smallest possible factor is therefore approx. 1.00271 (i.e. 2^(2^-8)  
// ). If NativeHistogramBucketFactor is greater than 1 but smaller than  
// 2^(2^-8), then the actually used factor is still 2^(2^-8) even though  
// it is larger than the provided NativeHistogramBucketFactor.  
//  
// NOTE: Native Histograms are still an experimental feature. Their  
// behavior might still change without a major version  
// bump. Subsequently, all NativeHistogram... options here might still  
// change their behavior or name (or might completely disappear) without  
// a major version bump.  
NativeHistogramBucketFactor float64  
// All observations with an absolute value of less or equal  
// NativeHistogramZeroThreshold are accumulated into a "zero" bucket.  
// For best results, this should be close to a bucket boundary. This is  
// usually the case if picking a power of two. If  
// NativeHistogramZeroThreshold is left at zero,  
// DefNativeHistogramZeroThreshold is used as the threshold. To  
// configure a zero bucket with an actual threshold of zero (i.e. only  
// observations of precisely zero will go into the zero bucket), set  
// NativeHistogramZeroThreshold to the NativeHistogramZeroThresholdZero  
// constant (or any negative float value).  
NativeHistogramZeroThreshold float64  
  
// The remaining fields define a strategy to limit the number of  
// populated sparse buckets. If NativeHistogramMaxBucketNumber is left  
// at zero, the number of buckets is not limited. (Note that this might  
// lead to unbounded memory consumption if the values observed by the  
// Histogram are sufficiently wide-spread. In particular, this could be  
// used as a DoS attack vector. Where the observed values depend on  
// external inputs, it is highly recommended to set a  
// NativeHistogramMaxBucketNumber.) Once the set  
// NativeHistogramMaxBucketNumber is exceeded, the following strategy is  
// enacted:  
// - First, if the last reset (or the creation) of the histogram is at  
// least NativeHistogramMinResetDuration ago, then the whole  
// histogram is reset to its initial state (including regular  
// buckets).  
// - If less time has passed, or if NativeHistogramMinResetDuration is  
// zero, no reset is performed. Instead, the zero threshold is  
// increased sufficiently to reduce the number of buckets to or below  
// NativeHistogramMaxBucketNumber, but not to more than  
// NativeHistogramMaxZeroThreshold. Thus, if  
// NativeHistogramMaxZeroThreshold is already at or below the current  
// zero threshold, nothing happens at this step.  
// - After that, if the number of buckets still exceeds  
// NativeHistogramMaxBucketNumber, the resolution of the histogram is  
// reduced by doubling the width of the sparse buckets (up to a  
// growth factor between one bucket to the next of 2^(2^4) = 65536,  
// see above).  
// - Any increased zero threshold or reduced resolution is reset back  
// to their original values once NativeHistogramMinResetDuration has  
// passed (since the last reset or the creation of the histogram).  
NativeHistogramMaxBucketNumber uint32  
NativeHistogramMinResetDuration time.Duration  
NativeHistogramMaxZeroThreshold float64  
  
// now is for testing purposes, by default it's time.Now.  
now func() time.Time  
  
// afterFunc is for testing purposes, by default it's time.AfterFunc.  
afterFunc func(time.Duration, func()) *time.Timer  
}

Histogram NewHistogram

func NewHistogram(opts HistogramOpts) Histogram {
    return newHistogram(
        NewDesc(
            BuildFQName(opts.Namespace, opts.Subsystem, opts.Name),
            opts.Help,
            nil,
            opts.ConstLabels,
        ),
        opts,
    )
}

Histogram NewHistogramVec

// NewHistogramVec creates a new HistogramVec based on the provided HistogramOpts and
// partitioned by the given label names.
func NewHistogramVec(opts HistogramOpts, labelNames []string) *HistogramVec {
    return V2.NewHistogramVec(HistogramVecOpts{
        HistogramOpts:  opts,
        VariableLabels: UnconstrainedLabels(labelNames),
    })
}

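Summary

Besides the Namespace, Subsystem, Name, Help, and ConstLabels fields discussed earlier, the Summary constructor options include a field named Objectives. It is specific to Summary and defines the quantiles to observe together with the error allowed for each.

Unless there are special requirements, the following configuration works:

Objectives: map[float64]float64{
    0.5:   0.01,
    0.75:  0.01,
    0.9:   0.01,
    0.99:  0.001,
    0.999: 0.0001,
}

The source is listed below without further commentary; you can work through it with the same approach used for Counter.

Summary interface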

type Summary interface {  
Metric  
Collector  
  
// Observe adds a single observation to the summary. Observations are  
// usually positive or zero. Negative observations are accepted but  
// prevent current versions of Prometheus from properly detecting  
// counter resets in the sum of observations. See  
// https://prometheus.io/docs/practices/histograms/#count-and-sum-of-observations  
// for details.  
Observe(float64)  
}

SummaryOpts struct

type SummaryOpts struct {  
// Namespace, Subsystem, and Name are components of the fully-qualified  
// name of the Summary (created by joining these components with  
// "_"). Only Name is mandatory, the others merely help structuring the  
// name. Note that the fully-qualified name of the Summary must be a  
// valid Prometheus metric name.  
Namespace string  
Subsystem string  
Name string  
  
// Help provides information about this Summary.  
//  
// Metrics with the same fully-qualified name must have the same Help  
// string.  
Help string  
  
// ConstLabels are used to attach fixed labels to this metric. Metrics  
// with the same fully-qualified name must have the same label names in  
// their ConstLabels.  
//  
// Due to the way a Summary is represented in the Prometheus text format  
// and how it is handled by the Prometheus server internally, "quantile"  
// is an illegal label name. Construction of a Summary or SummaryVec  
// will panic if this label name is used in ConstLabels.  
//  
// ConstLabels are only used rarely. In particular, do not use them to  
// attach the same labels to all your metrics. Those use cases are  
// better covered by target labels set by the scraping Prometheus  
// server, or by one specific metric (e.g. a build_info or a  
// machine_role metric). See also  
// https://prometheus.io/docs/instrumenting/writing_exporters/#target-labels-not-static-scraped-labels  
ConstLabels Labels  
  
// Objectives defines the quantile rank estimates with their respective  
// absolute error. If Objectives[q] = e, then the value reported for q  
// will be the φ-quantile value for some φ between q-e and q+e. The  
// default value is an empty map, resulting in a summary without  
// quantiles.  
Objectives map[float64]float64  
  
// MaxAge defines the duration for which an observation stays relevant  
// for the summary. Only applies to pre-calculated quantiles, does not  
// apply to _sum and _count. Must be positive. The default value is  
// DefMaxAge.  
MaxAge time.Duration  
  
// AgeBuckets is the number of buckets used to exclude observations that  
// are older than MaxAge from the summary. A higher number has a  
// resource penalty, so only increase it if the higher resolution is  
// really required. For very high observation rates, you might want to  
// reduce the number of age buckets. With only one age bucket, you will  
// effectively see a complete reset of the summary each time MaxAge has  
// passed. The default value is DefAgeBuckets.  
AgeBuckets uint32  
  
// BufCap defines the default sample stream buffer size. The default  
// value of DefBufCap should suffice for most uses. If there is a need  
// to increase the value, a multiple of 500 is recommended (because that  
// is the internal buffer size of the underlying package  
// "github.com/bmizerany/perks/quantile").  
BufCap uint32  
  
// now is for testing purposes, by default it's time.Now.  
now func() time.Time  
}

Summary NewSummary

func NewSummary(opts SummaryOpts) Summary {
    return newSummary(
        NewDesc(
            BuildFQName(opts.Namespace, opts.Subsystem, opts.Name),
            opts.Help,
            nil,
            opts.ConstLabels,
        ),
        opts,
    )
}

Summary NewSummaryVec

func NewSummaryVec(opts SummaryOpts, labelNames []string) *SummaryVec {
    return V2.NewSummaryVec(SummaryVecOpts{
        SummaryOpts:    opts,
        VariableLabels: UnconstrainedLabels(labelNames),
    })
}

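Defining Prometheus helper types to observe the service

With the data types and methods of each metric in the Golang Prometheus client sorted out, it is time to actually use the client library to observe a service. How do we implement that?

I defined two helper types, one for observing the HTTP service and one for observing the database. The structs are:

Defining the helper types

// Prometheus observer for Gin
type prometheusGinObserve struct {
    namespace string
    subsystem string
    name      string
    help      string
}

// Prometheus observer for GORM
type prometheusGormObserve struct {
    summary *prometheus.SummaryVec
}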

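Next, using what we learned about Gin and GORM, we implement a middleware for the Gin observer and callbacks for the GORM observer.

Implementing the middleware for the Gin observer

func (g *prometheusGinObserve) ObserveResponseTime() func(ctx *gin.Context) {
    // The metric is built and registered once, when the middleware is created.
    // MustRegister panics on duplicate registration, so call this method only once.
    summaryVec := prometheus.NewSummaryVec(prometheus.SummaryOpts{
        Namespace:   g.namespace,
        Subsystem:   g.subsystem,
        Name:        g.name,
        Help:        g.help,
        ConstLabels: map[string]string{"app": "vblog", "version": "v2", "env": "test"},
        Objectives: map[float64]float64{
            0.5:   0.01,
            0.75:  0.01,
            0.9:   0.01,
            0.99:  0.001,
            0.999: 0.0001,
        },
    }, []string{"method", "pattern", "code"})
    prometheus.MustRegister(summaryVec)
    return func(ctx *gin.Context) {
        start := time.Now()
        defer func() {
            // Runs after the remaining handlers, so the final status code is known.
            milliseconds := time.Since(start).Milliseconds()
            summaryVec.WithLabelValues(ctx.Request.Method, ctx.FullPath(), strconv.Itoa(ctx.Writer.Status())).Observe(float64(milliseconds))
        }()
        ctx.Next()
    }
}

In the code above I define a SummaryVec that observes the response time of different requests and register it with prometheus.MustRegister(). Remember: once a metric is defined, it must be registered!

Then I return a Gin HandlerFunc. When a request arrives it records the current time, uses defer to schedule a function that records how long the business logic took, and finally calls ctx.Next() to run the subsequent HandlerFuncs.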
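The same record-time, defer, call-next pattern works outside Gin. Here is a sketch that uses only net/http; statusRecorder, observation, timed, and demo are hypothetical names, and the report callback stands in for the Observe call on a SummaryVec.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (r *statusRecorder) WriteHeader(code int) {
	r.status = code
	r.ResponseWriter.WriteHeader(code)
}

// observation is what the middleware reports once a request has finished.
type observation struct {
	method, path string
	status       int
	elapsed      time.Duration
}

// timed wraps next and passes one observation per request to report.
func timed(next http.Handler, report func(observation)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		defer func() {
			report(observation{
				method:  r.Method,
				path:    r.URL.Path,
				status:  rec.status,
				elapsed: time.Since(start),
			})
		}()
		next.ServeHTTP(rec, r)
	})
}

// demo sends one request through the middleware and returns what was reported.
func demo() observation {
	var got observation
	h := timed(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNotFound)
	}), func(o observation) { got = o })
	h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/posts/1", nil))
	return got
}

func main() {
	o := demo()
	fmt.Println(o.method, o.path, o.status, o.elapsed)
}
```

One design note: the Gin version labels by ctx.FullPath(), the route pattern, rather than the raw URL path used here. Labelling by pattern keeps the number of time series bounded when paths contain IDs.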

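Implementing the callbacks for the GORM observer

func NewPrometheusGormObserve(namespace, subsystem, name, help string) *prometheusGormObserve {
    summaryVec := prometheus.NewSummaryVec(prometheus.SummaryOpts{
        Namespace:   namespace,
        Subsystem:   subsystem,
        Name:        name,
        Help:        help,
        ConstLabels: map[string]string{"app": "vblog", "version": "v2", "env": "test"},
        Objectives: map[float64]float64{
            0.5:   0.01,
            0.75:  0.01,
            0.9:   0.01,
            0.99:  0.001,
            0.999: 0.0001,
        },
    }, []string{"table", "DML", "type"})
    prometheus.MustRegister(summaryVec)
    return &prometheusGormObserve{summary: summaryVec}
}

// Before returns a callback that stores the start time on the statement.
func (g *prometheusGormObserve) Before() func(db *gorm.DB) {
    return func(db *gorm.DB) {
        db.Set("start_time", time.Now())
    }
}

// After returns a callback that reports the elapsed time; typ names the kind of operation.
func (g *prometheusGormObserve) After(typ string) func(db *gorm.DB) {
    return func(db *gorm.DB) {
        v, ok := db.Get("start_time")
        start, isTime := v.(time.Time)
        if !ok || !isTime {
            return // the Before callback did not run for this statement
        }
        // Labels: table name, SQL text, operation type. The value is in milliseconds,
        // matching the HTTP middleware. Using SQL text as a label can create many series.
        g.summary.WithLabelValues(db.Statement.Table, db.Statement.SQL.String(), typ).Observe(float64(time.Since(start).Milliseconds()))
    }
}

The code above defines a constructor that builds the SummaryVec observer, plus two callbacks: Before records the time before a DML operation, and After computes the elapsed time once the operation completes and reports it to Prometheus. Finally, we register the callbacks on the DB using the approach described earlier.

With SummaryVec we have now instrumented both HTTP and the MySQL database, and we can search for the metrics by name in the Prometheus server UI.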

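Conclusion

In day-to-day development, Prometheus helps us observe a program's metrics and visualize them, which is a great help in our work. I hope this article helps readers use Prometheus in Golang. Thanks also to Gin and GORM; it is a privilege to program while standing on the shoulders of giants.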
