Deploying a Prometheus + Grafana Monitoring Stack on Kubernetes (NFS Storage)

Contents

  • Kubernetes: Deploying Prometheus + Grafana Monitoring (NFS Storage)
    • I. NFS Server Configuration (192.168.104.50)
      • 1. Install the NFS service
      • 2. Create the shared directory
      • 3. Configure the NFS export
      • 4. Verify the NFS export
    • II. Configuration on All Kubernetes Nodes (masters and workers)
      • 1. Install the NFS client
      • 2. Verify NFS connectivity
    • III. Preparing Container Images (on a machine with Internet access)
      • 1. Pull the required images
      • 2. Save the images to files
      • 3. Copy the image files to all nodes
      • 4. Load the images on each node
    • IV. Deploying the NFS Provisioner
      • 1. Create the namespace
      • 2. Create the NFS provisioner manifest (nfs-provisioner.yaml)
      • 3. Create the StorageClass (nfs-storageclass.yaml)
      • 4. Apply the manifests
      • 5. Verify the deployment
    • V. Deploying the Prometheus Monitoring System
      • 1. Create the monitoring namespace
      • 2. Create the Prometheus ConfigMap (prometheus-configmap.yaml)
      • 3. Create the main Prometheus manifest (prometheus.yaml)
      • 4. Apply the Prometheus manifests
    • VI. Deploying Node Exporter
      • 1. Create the Node Exporter DaemonSet (node-exporter-daemonset.yaml)
      • 2. Create the Node Exporter Service (node-exporter-svc.yaml)
      • 3. Apply the Node Exporter manifests
    • VII. Deploying Grafana
      • 1. Create the Grafana manifest (grafana.yaml)
      • 2. Apply the Grafana manifest
    • VIII. Verifying the Deployment
      • 1. Check the status of all components
      • 2. Access the web UIs
      • 3. Configure Grafana
    • IX. Troubleshooting Common Problems

Kubernetes: Deploying Prometheus + Grafana Monitoring (NFS Storage)

I. NFS Server Configuration (192.168.104.50)

1. Install the NFS service

yum install -y nfs-utils rpcbind
systemctl enable --now rpcbind
systemctl enable --now nfs-server

2. Create the shared directory

mkdir -p /data/k8s_data
chmod 777 /data/k8s_data

3. Configure the NFS export

echo "/data/k8s_data *(rw,sync,no_root_squash,no_subtree_check)" > /etc/exports
exportfs -arv

4. Verify the NFS export

showmount -e localhost
# Expected output:
# Export list for localhost:
# /data/k8s_data *

II. Configuration on All Kubernetes Nodes (masters and workers)

1. Install the NFS client

yum install -y nfs-utils

2. Verify NFS connectivity

mkdir /mnt/test
mount -t nfs 192.168.104.50:/data/k8s_data /mnt/test
touch /mnt/test/testfile
ls /mnt/test
umount /mnt/test
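
If the mount fails, it can help to first confirm that the export is visible from the client (a quick optional check):

showmount -e 192.168.104.50
# Expected: /data/k8s_data *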

III. Preparing Container Images (on a machine with Internet access)

1. Pull the required images

docker pull prom/prometheus:latest
docker pull grafana/grafana:latest
docker pull prom/node-exporter:latest
docker pull registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
docker pull busybox:latest
docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2

2. Save the images to files

docker save -o prometheus.tar prom/prometheus:latest
docker save -o grafana.tar grafana/grafana:latest
docker save -o node-exporter.tar prom/node-exporter:latest
docker save -o nfs-provisioner.tar registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
docker save -o busybox.tar busybox:latest
docker save -o kube-state-metrics.tar registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2

3. Copy the image files to all nodes

scp *.tar root@192.168.104.51:/root
scp *.tar root@192.168.104.52:/root
scp *.tar root@192.168.104.53:/root
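
The three scp commands above can also be written as a loop; a small sketch (node IPs as assumed throughout this article):

for node in 192.168.104.51 192.168.104.52 192.168.104.53; do
  scp *.tar root@${node}:/root
done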

4. Load the images on each node

docker load -i prometheus.tar
docker load -i grafana.tar
docker load -i node-exporter.tar
docker load -i nfs-provisioner.tar
docker load -i busybox.tar
docker load -i kube-state-metrics.tar
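
The commands above assume the nodes run Docker. If your nodes use containerd as the container runtime (the default in recent Kubernetes releases), import the archives into containerd's k8s.io namespace instead; a sketch:

# containerd: import into the image namespace that the kubelet pulls from
for f in prometheus.tar grafana.tar node-exporter.tar nfs-provisioner.tar busybox.tar kube-state-metrics.tar; do
  ctr -n k8s.io images import "$f"
done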

IV. Deploying the NFS Provisioner

1. Create the namespace

kubectl create namespace nfs-storageclass

2. Create the NFS provisioner manifest (nfs-provisioner.yaml)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: nfs-storageclass
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      nodeSelector:
        kubernetes.io/hostname: node1  # pin the provisioner to the node1 node
      containers:
      - name: nfs-client-provisioner
        image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - name: nfs-client-root
          mountPath: /persistentvolumes
        env:
        - name: PROVISIONER_NAME
          value: k8s-sigs.io/nfs-subdir-external-provisioner
        - name: NFS_SERVER
          value: 192.168.104.50
        - name: NFS_PATH
          value: /data/k8s_data
      volumes:
      - name: nfs-client-root
        nfs:
          server: 192.168.104.50
          path: /data/k8s_data
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: nfs-storageclass
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  namespace: nfs-storageclass

3. Create the StorageClass (nfs-storageclass.yaml)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"

4. Apply the manifests

kubectl apply -f nfs-provisioner.yaml
kubectl apply -f nfs-storageclass.yaml

5. Verify the deployment

kubectl get pods -n nfs-storageclass
# the nfs-client-provisioner pod should be Running

kubectl get storageclass
# nfs-client should be marked as (default)
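
To confirm dynamic provisioning works end to end, you can optionally create a throwaway PVC and check that it binds (the name test-pvc is arbitrary):

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
  storageClassName: nfs-client
EOF

kubectl get pvc test-pvc    # STATUS should become Bound within a few seconds
kubectl delete pvc test-pvc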

V. Deploying the Prometheus Monitoring System

1. Create the monitoring namespace

kubectl create namespace monitor

2. Create the Prometheus ConfigMap (prometheus-configmap.yaml)

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [monitor]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: prometheus-svc
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: web
            action: keep
      - job_name: 'coredns'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-dns
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: metrics
            action: keep
      - job_name: 'kube-apiserver'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [default, kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kubernetes
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: https
            action: keep
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
      - job_name: 'cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          insecure_skip_verify: true
          ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
        relabel_configs:
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
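
Before applying, a quick client-side check can catch YAML mistakes (this validates the manifest structure, not the embedded Prometheus configuration itself):

kubectl apply --dry-run=client -f prometheus-configmap.yaml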

3. Create the main Prometheus manifest (prometheus.yaml)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods", "nodes/proxy"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps", "nodes/metrics"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
  annotations:
    prometheus_io_scrape: "true"
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      nodePort: 32224
      port: 9090
      targetPort: http
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
        - name: "change-permission-of-directory"
          image: busybox:latest
          command: ["/bin/sh", "-c"]
          args:
            - chown -R 65534:65534 /prometheus
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--web.enable-lifecycle"
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
          ports:
            - name: http
              containerPort: 9090
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 100m
              memory: 512Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-pvc
        - name: config-volume
          configMap:
            name: prometheus-config

4. Apply the Prometheus manifests

kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus.yaml
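
Because Prometheus is started with --web.enable-lifecycle (see the args above), later edits to the ConfigMap can be picked up without restarting the pod. A sketch, using the NodePort 32224 defined in the Service:

# Optionally validate the loaded config (promtool ships in the prom/prometheus image)
kubectl exec -n monitor deploy/prometheus -- promtool check config /etc/prometheus/prometheus.yml

# After re-applying the ConfigMap, wait for the kubelet to sync the mounted file (can take up to a minute), then hot-reload
curl -X POST http://<any-node-IP>:32224/-/reload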

VI. Deploying Node Exporter

1. Create the Node Exporter DaemonSet (node-exporter-daemonset.yaml)

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: node-exporter
          image: prom/node-exporter:latest
          imagePullPolicy: IfNotPresent
          args:
            - --web.listen-address=$(HOSTIP):9100
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+)($|/)
            - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
          env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          ports:
            - containerPort: 9100
          resources:
            requests:
              cpu: 150m
              memory: 180Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      tolerations:
        - operator: "Exists"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /

2. Create the Node Exporter Service (node-exporter-svc.yaml)

apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
  clusterIP: None

3. Apply the Node Exporter manifests

kubectl apply -f node-exporter-daemonset.yaml
kubectl apply -f node-exporter-svc.yaml
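
Because the DaemonSet uses hostNetwork and binds to the node IP, each exporter can be spot-checked from any machine that can reach the nodes; for example:

curl -s http://192.168.104.51:9100/metrics | head -n 5
# Should print raw Prometheus metrics (# HELP / # TYPE lines)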

VII. Deploying Grafana

1. Create the Grafana manifest (grafana.yaml)

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-server
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              protocol: TCP
          volumeMounts:
            - mountPath: /var/lib/grafana/
              name: grafana-data
          env:
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: grafana-svc
  namespace: monitor
spec:
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 31091
  selector:
    k8s-app: grafana
  type: NodePort

2. Apply the Grafana manifest

kubectl apply -f grafana.yaml
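
A quick liveness check via the NodePort (Grafana exposes a /api/health endpoint):

curl -s http://<any-node-IP>:31091/api/health
# The returned JSON should include "database": "ok"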

VIII. Verifying the Deployment

1. Check the status of all components

kubectl get pods -n monitor
# Expected output similar to:
# NAME                                  READY   STATUS    RESTARTS   AGE
# grafana-server-7868b7cc7c-k8lrd       1/1     Running   0          5m
# kube-state-metrics-74c47f9485-h8787   1/1     Running   0          6m
# node-exporter-6h79q                   1/1     Running   0          5m
# node-exporter-9qkbs                   1/1     Running   0          5m
# node-exporter-t64xb                   1/1     Running   0          5m
# prometheus-5696fb478b-4wf4j           1/1     Running   0          6m

kubectl get pvc -n monitor
# all PVCs should show STATUS Bound

kubectl get svc -n monitor
# the Prometheus and Grafana services should be listed

2. Access the web UIs

  • Prometheus: http://<any-node-IP>:32224

  • Grafana: http://<any-node-IP>:31091

3. Configure Grafana

  1. Open Grafana (with the anonymous-Admin environment variables set above, no login is needed; if you disable anonymous access, the default user/password is admin/admin)
  2. Add a data source (this can also be provisioned declaratively; see the sketch after this list):
    • Type: Prometheus
    • URL: http://prometheus-svc.monitor.svc.cluster.local:9090
  3. Import dashboards:
    • Node monitoring: ID 16098
    • Kubernetes cluster monitoring: ID 14249
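
As an alternative to adding the data source by hand, Grafana can provision it declaratively. A minimal sketch, assuming you also mount this ConfigMap into the grafana-server Deployment at /etc/grafana/provisioning/datasources:

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitor
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-svc.monitor.svc.cluster.local:9090
        isDefault: true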

IX. Troubleshooting Common Problems

Problem 1: PVC stuck in Pending

Cause: the NFS provisioner failed to create a PV automatically.
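
To confirm, look at the PVC events and the provisioner logs before creating PVs by hand:

kubectl describe pvc prometheus-pvc -n monitor    # the Events section shows the provisioning error
kubectl logs -n nfs-storageclass deploy/nfs-client-provisioner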

Solution: create the PVs manually

# Create the directories on the NFS server
ssh 192.168.104.50
mkdir -p /data/k8s_data/{prometheus,grafana}
chmod 777 /data/k8s_data/{prometheus,grafana}
exit

# Create the PVs manually
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-client
  nfs:
    path: /data/k8s_data/prometheus
    server: 192.168.104.50
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-client
  nfs:
    path: /data/k8s_data/grafana
    server: 192.168.104.50
EOF

# Delete the old pods to force re-creation
kubectl delete pod -n monitor --all

Problem 2: image pull failures

Solutions

  1. Load the images locally (section III)
  2. Set imagePullPolicy: IfNotPresent in the manifests
  3. Make sure the required images are loaded on every node (a quick check is shown below)
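
A quick way to check item 3 on a node (use the variant matching your container runtime):

# Docker runtime
docker images | grep -E 'prometheus|grafana|node-exporter|nfs-subdir'

# containerd runtime
crictl images | grep -E 'prometheus|grafana|node-exporter|nfs-subdir'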

Problem 3: NFS connectivity issues

Verification steps

# Test from a Kubernetes node
mkdir /mnt/test
mount -t nfs 192.168.104.50:/data/k8s_data /mnt/test
touch /mnt/test/testfile
ls /mnt/test
umount /mnt/test

Fixes

  1. Make sure the NFS server firewall allows NFS traffic (port 2049 plus the rpcbind/mountd ports; a firewalld sketch follows)
  2. Check the /etc/exports configuration on the NFS server
  3. Run exportfs -arv on the NFS server
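
On CentOS/RHEL with firewalld, opening the NFS-related services on the server looks roughly like this (a sketch; adapt to your firewall setup):

firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
firewall-cmd --reload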