0%

【Go】监控指标的书写


我们知道,我们很多时候是需要知道我们的一个程序的运行状态的,那这个时候就是需要用到监控。这里,我们使用的监控是 Prometheus ,那我们的这个监控的指标怎么写呢,笔者找了点资料,写了几个简单的 Demo。这个 Demo 一定是存在不足的,大家可以评论告知。

如何利用Prometheus监控你的应用

引包

1
2
3
4
5
6
7
8
9
10
11
import (
. "flag"
"github.com/gin-gonic/gin"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
"log"
"math/rand"
"strconv"
"sync"
"time"
)

指标的定义

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
totalCounterVec = prometheus.NewCounterVec(
prometheus.CounterOpts{
Namespace: "worker",
Subsystem: "jobs",
Name: "processed_total",
Help: "Total number of jobs processed by the workers",
},
[]string{"worker_id", "type"},
)

http_request_total = prometheus.NewCounter(
prometheus.CounterOpts{
Name: "http_request_total" ,
Namespace: "prometheus_front_server",
Help: "The total number of processed http requests",
})

inflightCounterVec = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: "worker",
Subsystem: "jobs",
Name: "inflight",
Help: "Number of jobs inflight",
},
[]string{"type"},
)

processingTimeVec = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Namespace: "worker",
Subsystem: "jobs",
Name: "process_time_seconds",
Help: "Amount of time spent processing jobs",
},
[]string{"worker_id", "type"},
)

先来看这三个 vec 的指标:

1
2
3
4
5
// track the total number of jobs processed by the worker
totalCounterVec.WithLabelValues(strconv.FormatInt(int64(workerID), 10), job.Type).Inc()
// decrement the inflight tracker
inflightCounterVec.WithLabelValues(job.Type).Dec()
processingTimeVec.WithLabelValues(strconv.FormatInt(int64(workerID), 10), job.Type).Observe(time.Now().Sub(startTime).Seconds())

再来看这个普通的 http_request_total:

1
2
3
func test(ctx *gin.Context){
http_request_total.Inc()
}

现在,我们再来看这个主函数的写法:

1
2
3
4
engine := gin.New()
engine.GET("/test" , test)
engine.GET("/metrics" , prometheusHttp)
engine.Run("0.0.0.0:9010")

prometheusHttp 里面的代码如下:

1
2
3
4
5
// 这里我是不是很清楚的,因为我觉得这个 gin 框架应该是可以不这么做
func prometheusHttp(ctx *gin.Context){
handler := promhttp.Handler()
handler.ServeHTTP(ctx.Writer , ctx.Request)
}

整理

NewCounter

我们利用promauto包提供的 NewCounter 方法定义了一个 Counter 类型的监控指标,只需要填充名字以及帮助信息,该指标就创建完成了。需要注意的是,Counter 类型数据的名字要尽量以 _total 作为后缀。否则当 Prometheus 与其他系统集成时,可能会出现指标无法识别的问题。每当有请求访问根目录时,该指标就会调用 Inc() 方法加一,当然,我们也可以调用 Add()方法累加任意的非负数。

NewGauge

监控累积的请求处理显然还是不够的,通常我们还想知道当前正在处理的请求的数量。Prometheus中的Gauge类型数据,与Counter不同,它既能增大也能变小。将正在处理的请求数量定义为Gauge类型是合适的。e.g:

1
2
3
4
5
6
7
8
9
10
11
12
13
	http_request_in_flight = prometheus.NewGauge(
prometheus.GaugeOpts{
Namespace: "prometheus_front_server" ,
Name: "http_request_in_flight",
Help: "Current number of http requests in flight",
},
)

func test2(ctx *gin.Context){
http_request_in_flight.Inc()
defer http_request_in_flight.Desc()
http_request_total.Inc()
}

Gauge和Counter类型的数据操作起来的差别并不大,唯一的区别是Gauge支持Dec()或者Sub()方法减小指标的值。

NewHistogram

对于一个网络服务来说,能够知道它的平均时延是重要的,不过很多时候我们更想知道响应时间的分布状况。Prometheus 中的 Histogram 类型就对此类需求提供了很好的支持。e.g:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
http_request_duration_seconds = prometheus.NewHistogram(
prometheus.HistogramOpts{
Namespace: "prometheus_front_server",
Subsystem: "jobs",
Name: "http_request_duration_seconds",
Help: "Histogram of lantencies for HTTP requests",
ConstLabels: nil,
Buckets: nil,
})

// 测试
http_request_in_flight.Inc()
defer http_request_in_flight.Desc()
http_request_total.Inc()

time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond)
http_request_duration_seconds.Observe(time.Since(time.Now()).Seconds())

Summary

这个和 Histogram 是一样的用法

NewCounterVec

不过,有的时候,我们可能希望从更多的特征维度去衡量一个指标。例如,对于接收到的HTTP请求的数目,我们可能希望知道具体到每个路径接收到的请求数目。假设当前能够访问 / 和 /foo 目录,显然定义两个不同的 Counter,比如 http_request_root_total和 http_request_foo_total,并不是一个很好的方法。一方面扩展性比较差:如果定义更多的访问路径就需要创建更多新的监控指标,同时,我们定义的特征维度往往不止一个,可能我们想知道某个路径且返回码为XXX的请求数目是多少,这种方法就无能为力了;另一方面,PromQL也无法很好地对这些指标进行聚合分析。

Prometheus对于此类问题的方法是为指标的每个特征维度定义一个label,一个label本质上就是一组键值对。一个指标可以和多个label相关联,而一个指标和一组具体的label可以唯一确定一条时间序列。对于上述分别统计每条路径的请求数目的问题,标准的Prometheus的解决方法如下:e.g:

1
2
3
4
5
6
7
8
9
10
totalCounterVec = prometheus.NewCounterVec(
prometheus.CounterOpts{
Namespace: "worker",
Subsystem: "jobs",
Name: "processed_total",
Help: "Total number of jobs processed by the workers",
},
// []string{"path"}
[]string{"worker_id", "type"},
)

这里,我们是根据这个 work_id , 和 当前的 类型 type 来做的这个 维度:

1
totalCounterVec.WithLabelValues(workerId, "type").Inc()

后面的几个 Vec 也是和这个的用法一样的。

这里的介绍就这么多,给一个我测试用的 Demo 的源码:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
package main

import (
. "flag"
"github.com/gin-gonic/gin"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promhttp"
"log"
"math/rand"
"strconv"
"sync"
"time"
)

// Package: awesomeProject
// Version: 1.0
//
// Created by SunYang on 2020/5/15 11:14

var (
types = []string{"emai", "deactivation", "activation", "transaction", "customer_renew", "order_processed"}
workers = 0

totalCounterVec = prometheus.NewCounterVec(
prometheus.CounterOpts{
Namespace: "worker",
Subsystem: "jobs",
Name: "processed_total",
Help: "Total number of jobs processed by the workers",
},
[]string{"worker_id", "type"},
)

http_request_total = prometheus.NewCounter(
prometheus.CounterOpts{
Name: "http_request_total" ,
Namespace: "prometheus_front_server",
Help: "The total number of processed http requests",
})

http_request_in_flight = prometheus.NewGauge(
prometheus.GaugeOpts{
Namespace: "prometheus_front_server" ,
Name: "http_request_in_flight",
Help: "Current number of http requests in flight",
},
)

http_request_duration_seconds = prometheus.NewHistogram(
prometheus.HistogramOpts{
Namespace: "prometheus_front_server",
Subsystem: "jobs",
Name: "http_request_duration_seconds",
Help: "Histogram of lantencies for HTTP requests",
ConstLabels: nil,
Buckets: nil,
})

inflightCounterVec = prometheus.NewGaugeVec(
prometheus.GaugeOpts{
Namespace: "worker",
Subsystem: "jobs",
Name: "inflight",
Help: "Number of jobs inflight",
},
[]string{"type"},
)

processingTimeVec = prometheus.NewHistogramVec(
prometheus.HistogramOpts{
Namespace: "worker",
Subsystem: "jobs",
Name: "process_time_seconds",
Help: "Amount of time spent processing jobs",
},
[]string{"worker_id", "type"},
)
)

func init() {
IntVar(&workers, "workers", 10, "Number of workers to use")
}
func getType() string {
return types[rand.Int()%len(types)]
}

type Job struct {
Type string
Sleep time.Duration
}

func main() {
Parse()
// 开始注册
prometheus.MustRegister(
totalCounterVec,
http_request_total,
http_request_in_flight,
http_request_duration_seconds,
inflightCounterVec,
processingTimeVec)

// create a channel with a 10,000 Job buffer
jobsChannel := make(chan *Job, 10000)

go startJobProcess(jobsChannel)

engine := gin.New()
engine.GET("/test" , test)
engine.GET("/test2" , test2)
engine.GET("/metrics" , prometheusHttp)
engine.Run("0.0.0.0:9010")

}

func prometheusHttp(ctx *gin.Context){
handler := promhttp.Handler()
handler.ServeHTTP(ctx.Writer , ctx.Request)
}

func test(ctx *gin.Context){
http_request_total.Inc()
}

func test2(ctx *gin.Context){
http_request_in_flight.Inc()
defer http_request_in_flight.Desc()
http_request_total.Inc()

time.Sleep(time.Duration(rand.Intn(1000)) * time.Millisecond)
http_request_duration_seconds.Observe(time.Since(time.Now()).Seconds())
}

func startJobProcess(jobs <-chan *Job) {
log.Printf("[INFO] starting %d workers\n", workers)
wait := sync.WaitGroup{}
// notify the sync group we need to wait for 10 goroutines
wait.Add(workers)

// start 10 works
for i := 0; i < workers; i++ {
go func(workerID int) {
// start the worker
startWorker(workerID, jobs)
wait.Done()
}(i)
}
wait.Wait()
}

func startWorker(workerID int, jobs <-chan *Job) {
for {
select {
// read from the job channel
case job := <-jobs:
startTime := time.Now()
// fake processing the request
time.Sleep(job.Sleep)
log.Printf("[%d][%s] Processed job in %0.3f seconds", workerID, job.Type, time.Now().Sub(startTime).Seconds())
// track the total number of jobs processed by the worker
totalCounterVec.WithLabelValues(strconv.FormatInt(int64(workerID), 10), job.Type).Inc()
// decrement the inflight tracker
inflightCounterVec.WithLabelValues(job.Type).Dec()
processingTimeVec.WithLabelValues(strconv.FormatInt(int64(workerID), 10), job.Type).Observe(time.Now().Sub(startTime).Seconds())
}
}
}

func createJobs(jobs chan<- *Job) {
for {
// create a random job
job := makeJob()
// track the job in the inflight tracker
inflightCounterVec.WithLabelValues(job.Type).Inc()
// send the job down the channel
jobs <- job
// don't pile up too quickly
time.Sleep(5 * time.Millisecond)
}
}
func makeJob() *Job {
return &Job{
Type: getType(),
Sleep: time.Duration(rand.Int()%100+10) * time.Millisecond,
}
}
这是打赏的地方...

本文标题:【Go】监控指标的书写

文章作者:Mr.Sun

发布时间:2020年06月10日 - 16:49:21

最后更新:2020年06月17日 - 17:25:48

原始链接:http://www.blog.sun-iot.xyz/posts/c9e0438b

许可协议: 署名-非商业性使用-禁止演绎 4.0 国际 转载请保留原文链接及作者。

---------Thanks for your attention---------