[Prerequisite article]
[References]
[Environment]
- macOS
- minikube v1.25.2
1. Choosing an installation method, and the resources involved
1.1 Installing Prometheus Operator with a Helm chart
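There are two ways to install Prometheus and its related services on Kubernetes:
- Option 1: write the deployment.yaml manifests yourself to deploy Prometheus, Grafana, and Alertmanager, along with the ConfigMaps and Secrets they require.
- Option 2: install through a Kubernetes Operator, which is much more convenient. The Operator itself can be installed in two ways:
  - manually, or
  - with a Helm chart, which is the approach this article takes.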
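Note: we install Prometheus Operator rather than Prometheus itself because the Operator deploys, manages, and recovers Prometheus for us. Prometheus is a stateful application. Kubernetes can manage stateless applications (a typical Java service, for example) automatically, but stateful applications are harder to operate and need a dedicated Operator. See the prerequisite article listed at the top for details.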
1.2 Resources
The Prometheus community maintains many charts in one repository: github.com/prometheus-... The chart we install is kube-prometheus-stack: github.com/prometheus-...
The official description of kube-prometheus-stack is quoted below. The chart bundles Grafana dashboards and uses Prometheus Operator to install the Prometheus components:
Installs the kube-prometheus stack, a collection of Kubernetes manifests, Grafana dashboards, and Prometheus rules combined with documentation and scripts to provide easy to operate end-to-end Kubernetes cluster monitoring with Prometheus using the Prometheus Operator.
2. Installing Prometheus Operator with Helm
2.1 Create the monitoring namespace:
$ kubectl create namespace monitoring
namespace/monitoring created
2.2 Add the prometheus-community Helm repository:
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" has been added to your repositories
2.3 Update the Helm repositories:
$ helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
2.4 Install Prometheus Operator:
$ helm install prometheus-operator prometheus-community/kube-prometheus-stack -n monitoring
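Note: if an installation was cancelled partway through, retrying it fails with Error: INSTALLATION FAILED: cannot re-use a name that is still in use. In that case, use helm upgrade --install instead (see github.com/helm/helm/i...).
Alternatively, delete the release and install again. To delete it: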
$ helm -n monitoring delete prometheus-operator
release "prometheus-operator" uninstalled
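The retry-safe variant mentioned in the note can be sketched as a small shell function. It assumes the release name, chart, and namespace used in this article. The helper name install_stack is made up for illustration, and --create-namespace is added so the function also works when the namespace does not exist yet.

```shell
# Sketch: install the release if it is missing, upgrade it otherwise.
# install_stack is a hypothetical helper name, not part of Helm.
install_stack() {
  helm upgrade --install prometheus-operator \
    prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace
}
```

Calling install_stack repeatedly should be harmless, which makes it convenient after an interrupted attempt.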
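2.5 Install Prometheus Operator from a downloaded chart package
(Skip this section if step 2.4 succeeded.) Note: if the installation times out with Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition, you can download the chart package first and install from the local file.
Download: github.com/prometheus-...; look for the kube-prometheus-stack package.
Install: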
$ helm install prometheus-operator ./kube-prometheus-stack-58.0.0.tgz -n monitoring
NAME: prometheus-operator
LAST DEPLOYED: Tue Apr 9 22:32:20 2024
NAMESPACE: monitoring
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace monitoring get pods -l "release=prometheus-operator"
Visit github.com/prometheus-... for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
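If the timeout is caused by slow image pulls inside the cluster rather than by fetching the chart (an assumption; the error message alone does not say which), giving Helm more time may also help. The sketch below uses the --timeout flag of helm install, whose default is five minutes. The 15m default and the helper name install_with_timeout are arbitrary choices for illustration.

```shell
# Sketch: install with a longer wait limit for hooks and resources.
# install_with_timeout is a hypothetical helper; pass a duration such as 30m
# as the first argument, or omit it to use 15m.
install_with_timeout() {
  helm install prometheus-operator \
    prometheus-community/kube-prometheus-stack \
    --namespace monitoring --timeout "${1:-15m}"
}
```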
3. Verifying the Prometheus Operator installation
All pods in the monitoring namespace should be in the Running state:
$ kubectl get pods -n monitoring
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-operator-kube-p-alertmanager-0 2/2 Running 0 8m28s
prometheus-operator-grafana-66bd56d448-wlvq8 3/3 Running 0 10m
prometheus-operator-kube-p-operator-5b8b4b9c4f-6psmv 1/1 Running 0 10m
prometheus-operator-kube-state-metrics-67b7949c67-6kd87 1/1 Running 0 10m
prometheus-operator-prometheus-node-exporter-zl6mb 1/1 Running 0 10m
prometheus-prometheus-operator-kube-p-prometheus-0 2/2 Running 0 8m27s
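The same check can be scripted instead of read by eye. This is a minimal sketch that filters the STATUS column of the listing above; the helper name not_running_pods is made up for illustration, and the check only looks at the status text, not at the READY count.

```shell
# Print the names of pods in the monitoring namespace whose STATUS column
# is anything other than "Running". Empty output means all pods are running.
not_running_pods() {
  kubectl get pods -n monitoring --no-headers |
    awk '$3 != "Running" { print $1 }'
}
```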
4. Accessing the Prometheus and Grafana dashboards
4.1 Prometheus dashboard
First list the services:
$ kubectl get service -n monitoring
NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                          ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   9m56s
prometheus-operated                            ClusterIP   None             <none>        9090/TCP                     9m55s
prometheus-operator-grafana                    ClusterIP   10.102.94.233    <none>        80/TCP                       12m
prometheus-operator-kube-p-alertmanager        ClusterIP   10.97.70.105     <none>        9093/TCP,8080/TCP            12m
prometheus-operator-kube-p-operator            ClusterIP   10.111.224.63    <none>        443/TCP                      12m
prometheus-operator-kube-p-prometheus          ClusterIP   10.101.157.134   <none>        9090/TCP,8080/TCP            12m
prometheus-operator-kube-state-metrics         ClusterIP   10.100.94.162    <none>        8080/TCP                     12m
prometheus-operator-prometheus-node-exporter   ClusterIP   10.100.32.110    <none>        9100/TCP                     12m
Then use port-forward so that the service can be reached from outside the Kubernetes cluster:
$ kubectl port-forward service/prometheus-operator-kube-p-prometheus -n monitoring 9090:9090
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
Prometheus dashboard: http://localhost:9090
Status --> Configuration shows the current prometheus.yaml.
Status --> Targets shows the current targets, that is, where metrics are scraped from.
Status --> Rules shows the current rules.
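The forwarded port also serves the Prometheus HTTP API, so the data behind these pages can be queried from a terminal. The sketch below sends an instant PromQL query to the /api/v1/query endpoint. It assumes the port-forward from this section is still running; the helper name prom_query is made up for illustration.

```shell
# Sketch: run an instant PromQL query against the forwarded Prometheus port.
# prom_query is a hypothetical helper; the query expression is its only argument.
prom_query() {
  curl -sG 'http://localhost:9090/api/v1/query' \
    --data-urlencode "query=$1"
}
```

For example, prom_query up asks for the up metric, which Prometheus records for each scrape target, and prints the JSON response.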
4.2 Grafana dashboard
As with the Prometheus service, look up the service first, then forward it:
$ kubectl port-forward service/prometheus-operator-grafana -n monitoring 3000:80
Forwarding from 127.0.0.1:3000 -> 3000
Forwarding from [::1]:3000 -> 3000
Grafana dashboard: http://localhost:3000. Grafana requires a login, so we need its password. List the secrets:
$ kubectl get secrets -n monitoring
NAME TYPE DATA AGE
prometheus-operator-grafana Opaque 3 16m
Inspect its contents:
$ kubectl get secret prometheus-operator-grafana -n monitoring -o yaml
apiVersion: v1
data:
admin-password: cHJvbS1vcGVyYXRvcg==
admin-user: YWRtaW4=
ldap-toml: ""
kind: Secret
<rest omitted>
Copy the password and base64-decode it:
$ echo "cHJvbS1vcGVyYXRvcg==" | base64 -d; echo
prom-operator
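The lookup and the decoding can be combined into one step. This sketch reads only the admin-password key with a JSONPath expression and pipes it through base64 -d; the helper name grafana_password is made up for illustration, and the secret name is the one from this installation.

```shell
# Sketch: print the Grafana admin password in plain text.
# grafana_password is a hypothetical helper name.
grafana_password() {
  kubectl get secret prometheus-operator-grafana -n monitoring \
    -o jsonpath='{.data.admin-password}' | base64 -d
}
```

This avoids copying the encoded value by hand, at the cost of printing the password to the terminal.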
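With the plaintext password, log in to Grafana as admin/prom-operator.
Once logged in, the Connections menu shows that a Prometheus data source is already configured by default.
The Dashboards menu shows that Prometheus already scrapes metrics for Kubernetes components such as pods, and that node dashboards are included as well.
The IP shown there is the minikube IP: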
$ minikube ip
192.168.49.2
Pod metrics can be viewed as well.
5. Understanding what was installed
5.1 statefulset, deployment, daemonset
List every resource in the monitoring namespace, including pods, services, and deployments:
$ kubectl get all -n monitoring
<rest omitted>
NAME READY AGE
statefulset.apps/alertmanager-prometheus-operator-kube-p-alertmanager 1/1 17h
statefulset.apps/prometheus-prometheus-operator-kube-p-prometheus 1/1 17h
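There are two statefulsets:
- The one whose name starts with prometheus is the Prometheus server with its three parts (Retrieval, Storage, and HTTP Server). The operator in the middle of its name indicates that this Prometheus is managed by the Operator.
- The one whose name starts with alertmanager is, as the name suggests, Alertmanager. It is also managed by the Operator.
The same query output continues: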
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-operator-grafana 1/1 1 1 17h
deployment.apps/prometheus-operator-kube-p-operator 1/1 1 1 17h
deployment.apps/prometheus-operator-kube-state-metrics 1/1 1 1 17h
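There are three deployments:
- p-operator is the Prometheus Operator itself. It created the Prometheus and Alertmanager statefulsets (the two statefulsets above).
- grafana is the Grafana deployment.
- kube-state-metrics is bundled with this Helm chart. It collects metrics about the cluster's own Kubernetes objects, which are used to check whether deployments, statefulsets, and pods are healthy. These metrics can be viewed in Prometheus.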
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/prometheus-operator-prometheus-node-exporter 1 1 1 1 1 kubernetes.io/os=linux 17h
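A daemonset runs on every worker node of the Kubernetes cluster. This node-exporter daemonset turns node-level data (CPU usage, for example) into metrics in the Prometheus format. Note: the daemonset does this work through its pod, prometheus-operator-prometheus-node-exporter-zl6mb.
In summary, the installation so far covers the monitoring stack itself plus monitoring of the worker nodes and of the Kubernetes components.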
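5.2 configmap, secret
Besides the resources above, the chart also installed a number of configmaps. Some of them belong to the Operator and hold default configuration such as metrics connection settings.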
$ kubectl get configmap -n monitoring
The secrets hold sensitive data (usernames, passwords, and so on) for Grafana, Prometheus, and the Operator:
$ kubectl get secret -n monitoring
5.3 CRDs
The chart also created a number of CRDs. CRDs are cluster-scoped, so no namespace flag is needed:
$ kubectl get crd
NAME CREATED AT
alertmanagerconfigs.monitoring.coreos.com 2024-04-09T10:08:00Z
<rest omitted>
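On a cluster with other CRDs installed, the listing can be narrowed to those in the monitoring.coreos.com API group seen above. A minimal sketch follows; the helper name operator_crds is made up for illustration.

```shell
# Sketch: list only the CRDs in the monitoring.coreos.com API group.
# operator_crds is a hypothetical helper name.
operator_crds() {
  kubectl get crd -o name | grep 'monitoring\.coreos\.com$'
}
```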
6. Inspecting the configuration
Export the descriptions of the statefulsets above and of the operator deployment, starting with the Prometheus statefulset:
$ kubectl describe statefulset prometheus-prometheus-operator-kube-p-prometheus -n monitoring > prometheus.yaml
Start with the Prometheus statefulset. It runs Prometheus v2.51.1 on port 9090.
Containers:
prometheus:
Image: quay.io/prometheus/prometheus:v2.51.1
Port: 9090/TCP
There are also several mounted directories. For example, the rules directory holds rule files:
Mounts:
/etc/prometheus/certs from tls-assets (ro)
/etc/prometheus/config_out from config-out (ro)
/etc/prometheus/rules/prometheus-prometheus-operator-kube-p-prometheus-rulefiles-0 from prometheus-prometheus-operator-kube-p-prometheus-rulefiles-0 (rw)
When the Prometheus configuration changes, config-reloader is responsible for reloading it. Its arguments show that the configuration is read from the file /etc/prometheus/config/prometheus.yaml.gz inside the pod:
config-reloader:
Image: quay.io/prometheus-operator/prometheus-config-reloader:v0.73.0
Port: 8080/TCP
Host Port: 0/TCP
Command:
/bin/prometheus-config-reloader
Args:
--listen-address=:8080
--reload-url=http://127.0.0.1:9090/-/reload
--config-file=/etc/prometheus/config/prometheus.yaml.gz
To see how the configuration file ends up in /etc/prometheus/config inside the Prometheus pod, look at the Mounts of config-reloader:
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config is mounted from the volume named config. In the Volumes section, that volume is of type Secret and is backed by the secret prometheus-prometheus-operator-kube-p-prometheus:
Volumes:
config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-prometheus-operator-kube-p-prometheus
Export the secret prometheus-prometheus-operator-kube-p-prometheus as YAML to inspect its contents:
$ kubectl get secret prometheus-prometheus-operator-kube-p-prometheus -o yaml -n monitoring > secret.yaml
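The exported YAML only shows base64 text. To read the actual Prometheus configuration, the value has to be base64-decoded and then decompressed. The sketch below assumes the secret key is named prometheus.yaml.gz, which is inferred from the --config-file path shown earlier (files in a secret volume are named after the secret's keys). The helper name show_prometheus_config is made up for illustration.

```shell
# Sketch: print the generated Prometheus configuration in plain text.
# Assumes the secret key is prometheus.yaml.gz; dots in the key are escaped
# for kubectl's JSONPath syntax. show_prometheus_config is a hypothetical name.
show_prometheus_config() {
  kubectl get secret prometheus-prometheus-operator-kube-p-prometheus \
    -n monitoring -o jsonpath='{.data.prometheus\.yaml\.gz}' |
    base64 -d | gzip -dc
}
```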
The other two main resources can be exported and inspected the same way:
$ kubectl describe statefulset alertmanager-prometheus-operator-kube-p-alertmanager -n monitoring > alertmanager.yaml
$ kubectl describe deployment prometheus-operator-kube-p-operator -n monitoring > operator.yaml