# Monitoring Kubernetes Cluster Resources with Prometheus
- [1. Monitoring Pod Resources with cAdvisor](#1-monitoring-pod-resources-with-cadvisor)
  - [1.1 Authorize the external Prometheus to access the cluster APIs](#11-authorize-the-external-prometheus-to-access-the-cluster-apis)
  - [1.2 Retrieve and save the token](#12-retrieve-and-save-the-token)
  - [1.3 Configure prometheus.yml, reload, and check the targets](#13-configure-prometheusyml-reload-and-check-the-targets)
  - [1.4 Import the Grafana dashboard](#14-import-the-grafana-dashboard)
- [2. Monitoring Kubernetes Resource State with kube-state-metrics](#2-monitoring-kubernetes-resource-state-with-kube-state-metrics)
  - [2.1 Deploy kube-state-metrics](#21-deploy-kube-state-metrics)
  - [2.2 Configure prometheus.yml](#22-configure-prometheusyml)
  - [2.3 Import the Grafana dashboard](#23-import-the-grafana-dashboard)
  - [2.4 No data in Grafana: add a route](#24-no-data-in-grafana-add-a-route)
This guide monitors a Kubernetes cluster from a Prometheus server installed from the binary release, i.e. running outside the cluster.

| Monitoring target | Implemented by | Examples |
| --- | --- | --- |
| Pod resource usage | cAdvisor | Container CPU and memory utilization |
| K8s resource state | kube-state-metrics | Controllers, Nodes, Namespaces, Pods, ReplicaSets, Services, etc. |
## 1. Monitoring Pod Resources with cAdvisor
### 1.1 Authorize the external Prometheus to access the cluster APIs
Create a `prometheus` ServiceAccount and bind it to a ClusterRole that can read nodes, pods, services, endpoints, and the metrics endpoints (save as `rbac.yaml`):
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: kube-system
```
```bash
kubectl apply -f rbac.yaml
```
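As an optional sanity check (not part of the original steps), `kubectl auth can-i` can confirm the binding grants what Prometheus needs:

```bash
# Check the permissions of the prometheus ServiceAccount created above.
# These commands only query RBAC; they change nothing.
kubectl auth can-i list nodes --as=system:serviceaccount:kube-system:prometheus
kubectl auth can-i get nodes/proxy --as=system:serviceaccount:kube-system:prometheus
kubectl auth can-i watch pods --as=system:serviceaccount:kube-system:prometheus
```

Each command should print `yes`.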
### 1.2 Retrieve and save the token
```bash
kubectl get secrets -n kube-system | grep prometheus    # find the token secret name
# -> prometheus-token-vgxhc
# bearer_token_file expects only the raw token string, so extract just that field:
kubectl describe secret prometheus-token-vgxhc -n kube-system | grep '^token' | awk '{print $2}' > token.k8s
# alternatively: kubectl get secret prometheus-token-vgxhc -n kube-system -o jsonpath='{.data.token}' | base64 -d > token.k8s
scp token.k8s prometheus    # copy it into the Prometheus directory on the Prometheus server
```
In this setup the token is stored at /opt/monitor/prometheus/token.k8s.
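Before wiring the token into Prometheus, it is worth verifying it against the API server. A minimal sketch, run on the Prometheus server, using the API server address from the scrape config below (adjust to your environment):

```bash
# A successful call returns a JSON NodeList, proving that both the token
# and the RBAC rules from step 1.1 work.
TOKEN=$(cat /opt/monitor/prometheus/token.k8s)
curl -sk -H "Authorization: Bearer $TOKEN" \
  https://172.18.0.0:6443/api/v1/nodes | head -n 20
```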
### 1.3 Configure prometheus.yml, reload, and check the targets
Edit the config with `vim prometheus.yml` and add the following scrape job:
```yaml
- job_name: kubernetes-nodes-cadvisor
  metrics_path: /metrics
  scheme: https
  kubernetes_sd_configs:
  - role: node
    api_server: https://172.18.0.0:6443
    bearer_token_file: /opt/monitor/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
  bearer_token_file: /opt/monitor/prometheus/token.k8s
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  # Map each node label (.*) to a new label, keeping the original value
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.*)
  # Rewrite the target address from NodeIP:10250 to APIServerIP:6443
  - action: replace
    regex: (.*)
    source_labels: ["__address__"]
    target_label: __address__
    replacement: 172.18.0.0:6443
  # The real endpoint is https://NodeIP:10250/metrics/cadvisor, which is only reachable
  # from the API server, so rewrite the metrics path to go through the API server proxy
  - action: replace
    source_labels: [__meta_kubernetes_node_name]
    target_label: __metrics_path__
    regex: (.*)
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```
```bash
./promtool check config prometheus.yml
```
Restart Prometheus, or reload it with `kill -HUP <PrometheusPid>`.
Then check the new job on the Prometheus Targets page.
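The same check can be scripted. A sketch against the Prometheus HTTP API, assuming Prometheus listens on 127.0.0.1:9090 and was started with `--web.enable-lifecycle` (otherwise skip the reload call and use `kill -HUP` as above):

```bash
# Reload the configuration without a restart, then confirm the
# cadvisor targets report up == 1.
curl -s -X POST http://127.0.0.1:9090/-/reload
curl -s 'http://127.0.0.1:9090/api/v1/query' \
  --data-urlencode 'query=up{job="kubernetes-nodes-cadvisor"}'
```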
### 1.4 Import the Grafana dashboard
Import dashboard 3119 into Grafana.
This completes pod resource monitoring.
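If panels in the dashboard stay empty, querying one of the underlying cAdvisor metrics directly helps narrow the problem down. A sketch (the Prometheus address 127.0.0.1:9090 is an assumption; note the per-pod label is `pod_name` on older kubelets and `pod` on newer ones):

```bash
# Cluster-wide container CPU usage over the last 5 minutes, grouped by namespace.
curl -s 'http://127.0.0.1:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (namespace)'
```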
## 2. Monitoring Kubernetes Resource State with kube-state-metrics
### 2.1 Deploy kube-state-metrics
Save the following manifest as `kube-state-metrics.yaml`:
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]
- apiGroups: ["networking.k8s.io", "extensions"]
  resources:
  - ingresses
  verbs: ["list", "watch"]
- apiGroups: ["storage.k8s.io"]
  resources:
  - storageclasses
  verbs: ["list", "watch"]
- apiGroups: ["certificates.k8s.io"]
  resources:
  - certificatesigningrequests
  verbs: ["list", "watch"]
- apiGroups: ["policy"]
  resources:
  - poddisruptionbudgets
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: kube-state-metrics-resizer
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get"]
- apiGroups: ["extensions","apps"]
  resources:
  - deployments
  resourceNames: ["kube-state-metrics"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kube-state-metrics-resizer
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.3.0
spec:
  selector:
    matchLabels:
      k8s-app: kube-state-metrics
      version: v1.3.0
  replicas: 1
  template:
    metadata:
      labels:
        k8s-app: kube-state-metrics
        version: v1.3.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      priorityClassName: system-cluster-critical
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: harbor.cpit.com.cn/monitor/kube-state-metrics:v1.8.0
        ports:
        - name: http-metrics
          containerPort: 8080
        - name: telemetry
          containerPort: 8081
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 5
          timeoutSeconds: 5
      - name: addon-resizer
        image: harbor.cpit.com.cn/monitor/addon-resizer:1.8.6
        resources:
          limits:
            cpu: 1000m
            memory: 500Mi
          requests:
            cpu: 1000m
            memory: 500Mi
        env:
        - name: MY_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: MY_POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        command:
        - /pod_nanny
        - --config-dir=/etc/config
        - --container=kube-state-metrics
        - --cpu=100m
        - --extra-cpu=1m
        - --memory=100Mi
        - --extra-memory=2Mi
        - --threshold=5
        - --deployment=kube-state-metrics
      volumes:
      - name: config-volume
        configMap:
          name: kube-state-metrics-config
---
# Config map for resource configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-state-metrics-config
  namespace: kube-system
  labels:
    k8s-app: kube-state-metrics
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
data:
  NannyConfiguration: |-
    apiVersion: nannyconfig/v1alpha1
    kind: NannyConfiguration
---
apiVersion: v1
kind: Service
metadata:
  name: kube-state-metrics
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "kube-state-metrics"
  annotations:
    prometheus.io/scrape: 'true'
spec:
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    protocol: TCP
  - name: telemetry
    port: 8081
    targetPort: telemetry
    protocol: TCP
  selector:
    k8s-app: kube-state-metrics
```
Deploy it and check that the pod comes up:
```bash
kubectl apply -f kube-state-metrics.yaml
kubectl get pods -n kube-system
```
The kube-state-metrics pod should be Running.
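Optionally, the exporter can be checked directly before Prometheus scrapes it. A quick sketch using a temporary port-forward (port 8080 matches the Service defined above):

```bash
# Forward the kube-state-metrics Service locally and fetch a few sample metrics.
kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &
sleep 2
curl -s http://127.0.0.1:8080/metrics | head -n 20
kill $!   # stop the port-forward
```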
### 2.2 Configure prometheus.yml
Add a second scrape job for annotated Service endpoints:
```yaml
- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
  - role: endpoints
    api_server: https://192.168.0.0:6443
    bearer_token_file: /opt/monitor/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
  bearer_token_file: /opt/monitor/prometheus/token.k8s
  tls_config:
    insecure_skip_verify: true
  relabel_configs:
  # Keep only Services annotated with prometheus.io/scrape: "true"
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scrape
  # Rewrite the scrape scheme from the annotation
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scheme
    target_label: __scheme__
  # Rewrite the metrics URL path from the annotation
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_path
    target_label: __metrics_path__
  # Rewrite the scrape address (host:port) from the annotation
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
    target_label: __address__
  # Map each Kubernetes Service label (.+) to a new label, keeping the original value
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  # Add a namespace label
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  # Add a Service name label
  - action: replace
    source_labels:
    - __meta_kubernetes_service_name
    target_label: kubernetes_service_name
```
```bash
./promtool check config prometheus.yml
```
Restart Prometheus, or reload it with `kill -HUP <PrometheusPid>`.
Then check the new job on the Prometheus Targets page.
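As before, the new job can be verified from the command line. A sketch assuming Prometheus listens on 127.0.0.1:9090:

```bash
# The kube-state-metrics endpoint should appear under this job with up == 1.
curl -s 'http://127.0.0.1:9090/api/v1/query' \
  --data-urlencode 'query=up{job="kubernetes-service-endpoints"}'
# A sample kube-state-metrics series: pod counts per phase.
curl -s 'http://127.0.0.1:9090/api/v1/query' \
  --data-urlencode 'query=sum(kube_pod_status_phase) by (phase)'
```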
### 2.3 Import the Grafana dashboard
Import dashboard 6417 (Kubernetes cluster resource objects) into Grafana.
This completes monitoring of the Kubernetes resource objects.
### 2.4 No data in Grafana: add a route
The `endpoints` role discovers the cluster-internal pod IP of kube-state-metrics, and this Prometheus runs outside the cluster, so route the pod network through a cluster node:
```bash
ip route
ip route add 172.40.0.0/16 via 172.18.2.30 dev eth0
ip route
# 172.40.1.208: cluster-internal pod IP of kube-state-metrics
# 172.18.2.30: IP of the k8s master node
```
Then check the Grafana dashboard again.
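A quick way to confirm the route works before going back to Grafana (172.40.1.208 is the pod IP noted above; it will differ in your cluster):

```bash
# The route lookup should now go via the master node ...
ip route get 172.40.1.208
# ... and the kube-state-metrics endpoint should be reachable from the Prometheus server.
curl -s http://172.40.1.208:8080/metrics | head -n 5
```

Note that a route added with `ip route add` does not survive a reboot; make it persistent in your distribution's network configuration if needed.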