Installing prometheus-stack with Helm, using a local PV to persist data

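Contents

Background:

Environment preparation:

[1. Disk preparation](#1. Disk preparation)

[2. Partition and format the disk](#2. Partition and format the disk)

[Deploying local storage](#Deploying local storage)

[1. Label the node](#1. Label the node)

[2. Create the local PV StorageClass and prometheus-pv](#2. Create the local PV StorageClass and prometheus-pv)

Deploying prometheus-stack

[1. Download the Helm chart package](#1. Download the Helm chart package)

[2. values.yaml parameters explained](#2. values.yaml parameters explained)

[3. Deploy prometheus-stack](#3. Deploy prometheus-stack)

[4. Check the deployment](#4. Check the deployment)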


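Background:

When Prometheus monitoring data and business data in a k8s cluster share a single NFS (Network File System), the following problems can arise:

  • Impact on business: business data and monitoring data should be isolated. In principle we can tolerate losing monitoring data, but business data must never be lost.

  • Read/write performance: business services and the monitoring system mount the same NFS-shared files or directories, so heavy reads and writes from both at the same time interfere with each other.

  • Stability: NFS is demanding on the network. If the network is unstable, file sharing is prone to failures.

  • Storage space: Prometheus does reclaim monitoring data, but only based on the retention period. If a large volume of monitoring data arrives on a single day, it takes up a lot of NFS space, and in extreme cases it can fill the NFS completely.

  • NFS expansion: NFS scales poorly. Expanding capacity requires manual configuration, which is tedious.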

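Environment preparation:

A healthy, running cluster. The cluster version should preferably be >= 1.21; versions below 1.21 may have compatibility problems.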

| kube-prometheus stack | Kubernetes 1.21 | Kubernetes 1.22 | Kubernetes 1.23 | Kubernetes 1.24 | Kubernetes 1.25 | Kubernetes 1.26 | Kubernetes 1.27 |
|-----------------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| release-0.9 | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ | ✗ |
| release-0.10 | ✗ | ✔ | ✔ | ✗ | ✗ | ✗ | ✗ |
| release-0.11 | ✗ | ✗ | ✔ | ✔ | ✗ | ✗ | ✗ |
| release-0.12 | ✗ | ✗ | ✗ | ✔ | ✔ | ✗ | ✗ |
| main | ✗ | ✗ | ✗ | ✗ | ✗ | ✔ | ✔ |
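To make the table easier to use in a script, here is a minimal sketch of a lookup helper. The function name `compatible_releases` is my own invention for illustration; it only encodes the rows of the table above and does not query anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the release branches that the table marks
# as compatible with a given Kubernetes minor version (e.g. "1.24").
compatible_releases() {
  case "$1" in
    1.21)      echo "release-0.9" ;;
    1.22)      echo "release-0.9 release-0.10" ;;
    1.23)      echo "release-0.10 release-0.11" ;;
    1.24)      echo "release-0.11 release-0.12" ;;
    1.25)      echo "release-0.12" ;;
    1.26|1.27) echo "main" ;;
    *)         echo "unknown"; return 1 ;;   # version not covered by the table
  esac
}
```

For example, `compatible_releases 1.24` prints the two branches that carry a check mark in the Kubernetes 1.24 column. The node versions themselves can be read from the VERSION column of `kubectl get nodes`.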

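1. Disk preparation

Choose one node in the cluster and attach a dedicated disk to it. Ideally the disk is backed by a RAID array such as RAID 50, which improves its fault tolerance.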

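2. Partition and format the disk

bash
# Create a partition table (this assumes /dev/sdb is a new, empty disk), then give all of its space to one partition
parted -s /dev/sdb mklabel gpt
parted -s /dev/sdb mkpart primary 0% 100%

# Create the filesystem
mkfs -t ext4 /dev/sdb1

# Get the partition's UUID; it goes into fstab so the disk is mounted automatically at boot
blkid /dev/sdb1

# Create the mount point
mkdir -p /monitoring

# Add the entry below to /etc/fstab using the UUID reported by blkid, then check it
grep monitoring /etc/fstab
/dev/disk/by-uuid/93a76705-814a-4a5e-85f0-88fe03d7837c /monitoring ext4 defaults 0 1

# Mount
mount -a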
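
The fstab step is easy to get wrong by hand, so here is a rough sketch of a helper that appends the entry only when the mount point has no entry yet. The function name `add_fstab_entry` and its optional third argument (an alternative fstab path, handy for trying it on a scratch file) are assumptions of mine, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add_fstab_entry UUID MOUNTPOINT [FSTAB_FILE]
# Appends an ext4 entry in the same format as the line shown above,
# unless FSTAB_FILE already has an entry for MOUNTPOINT.
add_fstab_entry() {
  local uuid="$1" mountpoint="$2" fstab="${3:-/etc/fstab}"
  local entry="/dev/disk/by-uuid/${uuid} ${mountpoint} ext4 defaults 0 1"

  # The second field of an fstab line is the mount point.
  if awk -v mp="$mountpoint" '$2 == mp { found = 1 } END { exit !found }' "$fstab" 2>/dev/null; then
    echo "entry for ${mountpoint} already present in ${fstab}"
    return 0
  fi
  echo "$entry" >> "$fstab"
}
```

On the node it could be used as `add_fstab_entry "$(blkid -s UUID -o value /dev/sdb1)" /monitoring && mount -a`. Running it twice leaves a single entry, which is the point of the check.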

Deploying local storage

1. Label the node

bash
kubectl label node node156 prometheus=deploy
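
Before moving on it may be worth confirming that the label really landed on the node, since both the PV and the Prometheus pod depend on it. A small sketch, where the function name `node_has_label` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: node_has_label NODE KEY=VALUE
# Succeeds if NODE appears among the nodes selected by the label.
node_has_label() {
  local node="$1" label="$2"
  # "-o name" prints one "node/<name>" per line.
  kubectl get nodes -l "$label" -o name | grep -qx "node/${node}"
}
```

For the node used in this article: `node_has_label node156 prometheus=deploy && echo "label ok"`.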

2. Create the local PV StorageClass and prometheus-pv

bash
cd /home/sunwenbo/local-pv 
kubectl apply -f local-pv-storage.yaml 
kubectl apply -f local-pv.yaml

local-pv-storage.yaml

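yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
# reclaimPolicy: Retain           Note: not set here; it only applies to dynamically provisioned PVs, and the static local PV sets its own persistentVolumeReclaimPolicy
# volumeBindingMode: Immediate    Note: not used; local PVs do not support dynamic provisioning, so binding waits for the first consumer pod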

local-pv.yaml

yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 200Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  #persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /monitoring/prometheus
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: prometheus
          operator: In
          values:
          - "deploy"

To explain: recall the node-labeling step above. The nodeAffinity configured here matches that label, so the PV is tied to the designated node.
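
One thing the PV manifest relies on is that `/monitoring/prometheus` already exists on the labeled node; a local PV points at an existing path and does not create it. A minimal sketch to run on that node, where the function name `prepare_pv_dir` is hypothetical and the default path is the `local.path` from local-pv.yaml:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepare_pv_dir [DIR]
# Creates the directory backing the local PV and checks it is writable.
prepare_pv_dir() {
  local dir="${1:-/monitoring/prometheus}"
  mkdir -p "$dir" || return 1
  [ -d "$dir" ] && [ -w "$dir" ]
}
```

Calling `prepare_pv_dir` with no argument uses the path from the manifest. Whether the directory also needs specific ownership depends on the security context the Prometheus pod runs with, which this sketch does not try to guess.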

Check the StorageClass

bash
root@master01:/home/sunwenbo/local-pv# kubectl  get storageclasses.storage.k8s.io                                                                                                                                                          
NAME                   PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE                                                                                                                    
local-storage          kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  17h                                                                                                                    
nfs-016                nfs.csi.k8s.io                 Retain          Immediate              false                  59d                                                                                                                    
nfs-018                nfs.csi.k8s.io                 Retain          Immediate              false                  44d                                                                                                                    
nfs-retain (default)   nfs.csi.k8s.io                 Retain          Immediate              false                  62d   

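Check the PV

Note: before any PVC has been created, the normal status of the PV is Available. The output below was captured after my deployment, so prometheus-pv is already bound to the PVC prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0. Where this PVC comes from is covered further down.

bash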
root@master01:/home/sunwenbo/local-pv# kubectl  get pv | grep prometheus                                                                                                                                                                   
prometheus-pv                              200Gi      RWO            Retain           Bound    kube-prometheus/prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0           local-storage            23m
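
The Available-then-Bound transition described in the note can also be checked from a script. The sketch below polls the PV phase through `kubectl get pv ... -o jsonpath`; the function names and the `PV_POLL_SECONDS` variable are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for watching the phase of a PersistentVolume.

# Print the current phase (Available, Bound, Released, ...).
pv_phase() {
  kubectl get pv "$1" -o jsonpath='{.status.phase}'
}

# wait_for_pv_phase PV WANTED_PHASE [TRIES]
# Polls until the PV reaches WANTED_PHASE or the tries run out.
wait_for_pv_phase() {
  local pv="$1" want="$2" tries="${3:-30}" i
  for i in $(seq 1 "$tries"); do
    [ "$(pv_phase "$pv")" = "$want" ] && return 0
    sleep "${PV_POLL_SECONDS:-2}"
  done
  return 1
}
```

Right after applying local-pv.yaml, `wait_for_pv_phase prometheus-pv Available` should return quickly; after the chart is deployed, the same call with `Bound` covers the state shown in the output above.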

Deploying prometheus-stack

1. Download the Helm chart package

bash
wget https://github.com/prometheus-community/helm-charts/releases/download/kube-prometheus-stack-45.27.2/kube-prometheus-stack-45.27.2.tgz 
tar xf kube-prometheus-stack-45.27.2.tgz 
cd kube-prometheus-stack
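
As an alternative to wget, the same chart version can probably be fetched through the chart repository. This is a sketch rather than what I actually ran; the function name `fetch_chart` is hypothetical, while the repository URL is the one published by the prometheus-community project.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch_chart [VERSION]
# Pulls and unpacks kube-prometheus-stack into ./kube-prometheus-stack.
fetch_chart() {
  local version="${1:-45.27.2}"
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts &&
    helm repo update &&
    helm pull prometheus-community/kube-prometheus-stack --version "$version" --untar
}
```

After `fetch_chart 45.27.2`, a `cd kube-prometheus-stack` lands in the same directory layout as the tar-based steps.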

2. values.yaml parameters explained

The modified parts are as follows

yaml
# alertmanager persistence: uses NFS, 4Gi of storage
alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-retain
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 4Gi

# grafana persistence, environment variables, and added plugins
grafana:
  enabled: true
  namespaceOverride: ""
  forceDeployDatasources: false
  persistence:
    type: pvc
    enabled: true
    storageClassName: nfs-retain
    accessModes:
      - ReadWriteOnce
    size: 2Gi
    finalizers:
      - kubernetes.io/pvc-protection
  env:
    GF_AUTH_ANONYMOUS_ENABLED: "true"
    GF_AUTH_ANONYMOUS_ORG_NAME: "Main Org."
    GF_AUTH_ANONYMOUS_ORG_ROLE: Viewer
  plugins:
    - grafana-worldmap-panel
    - grafana-piechart-panel
  # grafana service exposure
  service:
    portName: http-web
    port: 30080
    externalIPs: ["10.1.2.15"]

prometheus:
  # expose prometheus on an external IP
  service:
    externalIPs: ["10.1.2.15"]

  # the keys below sit under prometheusSpec in the chart's values.yaml
  prometheusSpec:
    # keep monitoring data for 15 days
    retention: 15d

    # schedule prometheus onto the labeled node via node affinity
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: prometheus
              operator: In
              values:
              - deploy

    # memory and cpu requests and limits for prometheus
    resources:
      requests:
        memory: 10Gi
        cpu: 10
      limits:
        memory: 50Gi
        cpu: 10

    # persist prometheus data on local-storage
    storageSpec:
      ## Using PersistentVolumeClaim
      volumeClaimTemplate:
        spec:
          storageClassName: local-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 200Gi

    # add the gpu-metrics scrape job
    additionalScrapeConfigs:
      - job_name: gpu-metrics
        scrape_interval: 1s
        metrics_path: /metrics
        scheme: http
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names:
                - nvidia-device-plugin
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_node_name]
            action: replace
            target_label: kubernetes_node
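
Because indentation mistakes in values.yaml fail silently (a misplaced key is simply ignored), it can help to render the chart locally and confirm that the storage class actually shows up in the output. A rough sketch, assuming it is run from the unpacked chart directory; the function name `check_rendered_storage` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check_rendered_storage [CHART_DIR] [VALUES_FILE] [STORAGE_CLASS]
# Renders the chart with `helm template` and succeeds only if the
# rendered manifests mention the expected storageClassName.
check_rendered_storage() {
  local chart_dir="${1:-.}" values="${2:-values.yaml}" class="${3:-local-storage}"
  helm template kube-prometheus-stack "$chart_dir" -f "$values" -n kube-prometheus \
    | grep -q "storageClassName: ${class}"
}
```

`check_rendered_storage . values.yaml local-storage || echo "local-storage missing from rendered output"` gives a quick signal before anything is applied to the cluster. It only proves the string is rendered somewhere, so a look at the rendered Prometheus resource is still worthwhile.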

The complete values.yaml has been uploaded to CSDN and can be downloaded without spending any points:

https://download.csdn.net/download/weixin_43798031/88046678

3. Deploy prometheus-stack

bash
helm upgrade -i kube-prometheus-stack -f values.yaml . -n kube-prometheus
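
If the kube-prometheus namespace does not exist yet, the command above needs it created first. The sketch below wraps the install with `--create-namespace` and then waits for the Prometheus StatefulSet to roll out. The function name `deploy_stack` and the `DEPLOY_TRIES` / `DEPLOY_POLL_SECONDS` variables are my own assumptions; the StatefulSet name is the one shown in the deployment output in this article.

```shell
#!/usr/bin/env bash
# Hypothetical helper: deploy_stack [NAMESPACE] [VALUES_FILE]
# Run from inside the unpacked chart directory.
deploy_stack() {
  local ns="${1:-kube-prometheus}" values="${2:-values.yaml}"
  local sts="statefulset/prometheus-kube-prometheus-stack-prometheus" i

  helm upgrade -i kube-prometheus-stack . -f "$values" -n "$ns" --create-namespace || return 1

  # The operator creates the StatefulSet a little after the release is
  # applied, so poll for its existence before waiting on the rollout.
  for i in $(seq 1 "${DEPLOY_TRIES:-30}"); do
    kubectl get "$sts" -n "$ns" >/dev/null 2>&1 && break
    sleep "${DEPLOY_POLL_SECONDS:-5}"
  done

  kubectl rollout status "$sts" -n "$ns" --timeout=300s
}
```

With WaitForFirstConsumer on the StorageClass, the PVC is only bound once the Prometheus pod is scheduled, so a rollout that hangs here is often a hint that the node label or the PV does not match.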

4. Check the deployment

bash
root@master01:/home/sunwenbo/kube-prometheus-stack# kubectl  get deployments.apps  -n kube-prometheus                                                                                                                                      
NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE                                                                                                                                                            
kube-prometheus-stack-grafana              1/1     1            1           123m                                                                                                                                                           
kube-prometheus-stack-kube-state-metrics   1/1     1            1           123m                                                                                                                                                           
kube-prometheus-stack-operator             1/1     1            1           123m                                                                                                                                                           
root@master01:/home/sunwenbo/kube-prometheus-stack# kubectl  get daemonsets.apps -n kube-prometheus                                                                                                                                        
NAME                                             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE                                                                                                                  
kube-prometheus-stack-prometheus-node-exporter   148       148       148     148          148         <none>          123m
root@master01:/home/sunwenbo/kube-prometheus-stack# kubectl  get statefulsets.apps  -n kube-prometheus                                                                                                                                     
NAME                                              READY   AGE                                                                                                                                                                              
alertmanager-kube-prometheus-stack-alertmanager   1/1     123m                                                                                                                                                                             
prometheus-kube-prometheus-stack-prometheus       1/1     123m   

Services

bash
root@master01:/home/sunwenbo/kube-prometheus-stack# kubectl  get svc -n kube-prometheus                                                                                                                                                    
NAME                                             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                            ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   123m
kube-prometheus-stack-alertmanager               ClusterIP   10.111.20.147    <none>        9093/TCP                     123m
kube-prometheus-stack-grafana                    ClusterIP   10.104.171.223   10.1.2.15     30080/TCP                    123m
kube-prometheus-stack-kube-state-metrics         ClusterIP   10.107.110.116   <none>        8080/TCP                     123m
kube-prometheus-stack-operator                   ClusterIP   10.107.180.72    <none>        443/TCP                      123m
kube-prometheus-stack-prometheus                 ClusterIP   10.102.115.147   10.1.2.15     9090/TCP                     123m
kube-prometheus-stack-prometheus-export          ClusterIP   10.109.169.13    10.1.2.15     30081/TCP                    3d5h
kube-prometheus-stack-prometheus-node-exporter   ClusterIP   10.101.152.90    <none>        9100/TCP                     123m
prometheus-operated                              ClusterIP   None             <none>        9090/TCP                     123m

PV and PVC

bash
root@master01:/home/sunwenbo/kube-prometheus-stack# kubectl  get pv | grep prometh                                                                                                                                                         
prometheus-pv                              200Gi      RWO            Retain           Bound    kube-prometheus/prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0           local-storage            127m
pvc-43823533-9a35-4ace-b0a3-5853e3b4099e   4Gi        RWO            Retain           Bound    kube-prometheus/alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0   nfs-retain               60d
pvc-cef3dd98-7090-47ac-8cec-c52c78e9237f   2Gi        RWO            Retain           Bound    kube-prometheus/kube-prometheus-stack-grafana                                                                          nfs-retain               129m


root@master01:/home/sunwenbo/kube-prometheus-stack# kubectl  get pvc -n kube-prometheus                                                                                                                                                    
NAME                                                                                                   STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS    AGE                                   
alertmanager-kube-prometheus-stack-alertmanager-db-alertmanager-kube-prometheus-stack-alertmanager-0   Bound    pvc-43823533-9a35-4ace-b0a3-5853e3b4099e   4Gi        RWO            nfs-retain      60d                                   
kube-prometheus-stack-grafana                                                                          Bound    pvc-cef3dd98-7090-47ac-8cec-c52c78e9237f   2Gi        RWO            nfs-retain      127m                                  
prometheus-kube-prometheus-stack-prometheus-db-prometheus-kube-prometheus-stack-prometheus-0           Bound    prometheus-pv                              200Gi      RWO            local-storage   127m                                  

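A note on the output above: the volumeClaimTemplate causes a PVC to be created for Prometheus automatically. Because prometheus-pv had already been created with the same storage class (local-storage), the new PVC bound to that PV automatically, as the Bound status and the prometheus-pv volume name in the PVC listing show.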
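For reference, the claim template that produces this PVC might look roughly like the fragment below. This is only a sketch: the storage class name, size, and access mode are taken from the `kubectl get pv` and `kubectl get pvc` output above, while the key path is assumed to follow the kube-prometheus-stack chart's `prometheus.prometheusSpec.storageSpec` layout. The values.yaml actually used in this deployment may differ.

```yaml
# Hypothetical values.yaml fragment (assumption, reconstructed from the output above)
prometheus:
  prometheusSpec:
    storageSpec:
      volumeClaimTemplate:
        spec:
          # Must match the storage class of the pre-created prometheus-pv
          storageClassName: local-storage
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              # Must not exceed the PV capacity (200Gi), or the claim cannot bind to it
              storage: 200Gi
```

A claim binds to a pre-created PV only when the storage class and access mode match and the requested size fits within the PV's capacity. If the PVC stays in Pending, those three fields are the first things to compare.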
