Table of Contents

- Deploying Prometheus + Grafana Monitoring on Kubernetes (NFS Storage Solution)
  - 1. NFS Server Configuration (192.168.104.50)
    - 1. Install the NFS Service
    - 2. Create the Shared Directory
    - 3. Configure the NFS Export
    - 4. Verify the NFS Export
  - 2. Configure All Kubernetes Nodes (masters and workers)
    - 1. Install the NFS Client
    - 2. Verify NFS Connectivity
  - 3. Prepare Container Images (on a machine with internet access)
    - 1. Pull the Required Images
    - 2. Save the Images to Files
    - 3. Copy the Image Files to All Nodes
    - 4. Load the Images on Each Node
  - 4. Deploy the NFS Provisioner
    - 1. Create the Namespace
    - 2. Create the NFS Provisioner Manifest (nfs-provisioner.yaml)
    - 3. Create the StorageClass (nfs-storageclass.yaml)
    - 4. Apply the Configuration
    - 5. Verify the Deployment
  - 5. Deploy Prometheus
    - 1. Create the Monitoring Namespace
    - 2. Create the Prometheus ConfigMap (prometheus-configmap.yaml)
    - 3. Create the Main Prometheus Manifest (prometheus.yaml)
    - 4. Apply the Prometheus Configuration
  - 6. Deploy Node Exporter
    - 1. Create the Node Exporter DaemonSet (node-exporter-daemonset.yaml)
    - 2. Create the Node Exporter Service (node-exporter-svc.yaml)
    - 3. Apply the Node Exporter Configuration
  - 7. Deploy Grafana
    - 1. Create the Grafana Manifest (grafana.yaml)
    - 2. Apply the Grafana Configuration
  - 8. Verify the Deployment
    - 1. Check Component Status
    - 2. Access the Web UIs
    - 3. Configure Grafana
  - 9. Troubleshooting Guide
# Deploying Prometheus + Grafana Monitoring on Kubernetes (NFS Storage Solution)

## 1. NFS Server Configuration (192.168.104.50)

### 1. Install the NFS Service

```bash
yum install -y nfs-utils rpcbind
systemctl enable --now rpcbind
systemctl enable --now nfs-server
```

### 2. Create the Shared Directory

```bash
mkdir -p /data/k8s_data
chmod 777 /data/k8s_data
```

### 3. Configure the NFS Export

```bash
echo "/data/k8s_data *(rw,sync,no_root_squash,no_subtree_check)" > /etc/exports
exportfs -arv
```

### 4. Verify the NFS Export

```bash
showmount -e localhost
# Expected output:
# Export list for localhost:
# /data/k8s_data *
```
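If `showmount` reports nothing or errors out, the problem is often rpcbind rather than NFS itself; you can confirm the RPC services registered correctly:

```bash
# nfs, mountd and portmapper should all be listed
rpcinfo -p | grep -E 'nfs|mountd|portmapper'
```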
## 2. Configure All Kubernetes Nodes (masters and workers)

### 1. Install the NFS Client

```bash
yum install -y nfs-utils
```

### 2. Verify NFS Connectivity

```bash
mkdir /mnt/test
mount -t nfs 192.168.104.50:/data/k8s_data /mnt/test
touch /mnt/test/testfile
ls /mnt/test
umount /mnt/test
```
## 3. Prepare Container Images (on a machine with internet access)

### 1. Pull the Required Images

```bash
docker pull prom/prometheus:latest
docker pull grafana/grafana:latest
docker pull prom/node-exporter:latest
docker pull registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
docker pull busybox:latest
docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
```

### 2. Save the Images to Files

```bash
docker save -o prometheus.tar prom/prometheus:latest
docker save -o grafana.tar grafana/grafana:latest
docker save -o node-exporter.tar prom/node-exporter:latest
docker save -o nfs-provisioner.tar registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
docker save -o busybox.tar busybox:latest
docker save -o kube-state-metrics.tar registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.9.2
```

### 3. Copy the Image Files to All Nodes

```bash
scp *.tar root@192.168.104.51:/root
scp *.tar root@192.168.104.52:/root
scp *.tar root@192.168.104.53:/root
```

### 4. Load the Images on Each Node

```bash
docker load -i prometheus.tar
docker load -i grafana.tar
docker load -i node-exporter.tar
docker load -i nfs-provisioner.tar
docker load -i busybox.tar
docker load -i kube-state-metrics.tar
```
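Note that `docker load` only populates Docker's image store. If your nodes use containerd as the CRI runtime (the default since Kubernetes 1.24 removed dockershim), import the archives into containerd's `k8s.io` namespace instead so the kubelet can find them. A sketch, assuming `ctr` and `crictl` are installed on the nodes:

```bash
# Import every archive into the namespace the kubelet pulls from
for f in /root/*.tar; do
  ctr -n k8s.io images import "$f"
done

# Confirm the kubelet can see the images
crictl images | grep -E 'prometheus|grafana|node-exporter|nfs-subdir'
```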
## 4. Deploy the NFS Provisioner

### 1. Create the Namespace

```bash
kubectl create namespace nfs-storageclass
```

### 2. Create the NFS Provisioner Manifest (nfs-provisioner.yaml)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  namespace: nfs-storageclass
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      nodeSelector:
        kubernetes.io/hostname: node1  # Pin the provisioner to the node1 node
      containers:
        - name: nfs-client-provisioner
          image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 192.168.104.50
            - name: NFS_PATH
              value: /data/k8s_data
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.104.50
            path: /data/k8s_data
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    namespace: nfs-storageclass
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  namespace: nfs-storageclass
```
### 3. Create the StorageClass (nfs-storageclass.yaml)

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
parameters:
  archiveOnDelete: "false"
```

### 4. Apply the Configuration

```bash
kubectl apply -f nfs-provisioner.yaml
kubectl apply -f nfs-storageclass.yaml
```

### 5. Verify the Deployment

```bash
kubectl get pods -n nfs-storageclass
# The nfs-client-provisioner pod should be Running
kubectl get storageclass
# nfs-client should be marked as (default)
```
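To confirm that dynamic provisioning works end to end, you can create a throwaway PVC and check that it binds (the name `test-claim` is arbitrary):

```bash
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: nfs-client
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
EOF

kubectl get pvc test-claim   # STATUS should become Bound within a few seconds
kubectl delete pvc test-claim
```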
## 5. Deploy Prometheus

### 1. Create the Monitoring Namespace

```bash
kubectl create namespace monitor
```

### 2. Create the Prometheus ConfigMap (prometheus-configmap.yaml)
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    scrape_configs:
      - job_name: 'prometheus'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [monitor]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: prometheus-svc
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: web
            action: keep
      - job_name: 'coredns'
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kube-dns
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: metrics
            action: keep
      - job_name: 'kube-apiserver'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          insecure_skip_verify: false
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: endpoints
            namespaces:
              names: [default, kube-system]
        relabel_configs:
          - source_labels: [__meta_kubernetes_service_name]
            regex: kubernetes
            action: keep
          - source_labels: [__meta_kubernetes_endpoint_port_name]
            regex: https
            action: keep
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - source_labels: [__address__]
            regex: '(.*):10250'
            replacement: '${1}:9100'
            target_label: __address__
            action: replace
      - job_name: 'cadvisor'
        kubernetes_sd_configs:
          - role: node
        scheme: https
        tls_config:
          insecure_skip_verify: true
          ca_file: '/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
        bearer_token_file: '/var/run/secrets/kubernetes.io/serviceaccount/token'
        relabel_configs:
          - target_label: __metrics_path__
            replacement: /metrics/cadvisor
```
### 3. Create the Main Prometheus Manifest (prometheus.yaml)
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods", "nodes/proxy"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["extensions"]
    resources: ["ingresses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["configmaps", "nodes/metrics"]
    verbs: ["get"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitor
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitor
  labels:
    app: prometheus
  annotations:
    prometheus_io_scrape: "true"
spec:
  selector:
    app: prometheus
  type: NodePort
  ports:
    - name: web
      nodePort: 32224
      port: 9090
      targetPort: http
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    app: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      initContainers:
        # The fresh NFS volume is owned by root; chown it to nobody (65534),
        # the user the Prometheus process runs as
        - name: "change-permission-of-directory"
          image: busybox:latest
          command: ["/bin/sh", "-c"]
          args:
            - chown -R 65534:65534 /prometheus
          securityContext:
            privileged: true
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
      containers:
        - name: prometheus
          image: prom/prometheus:latest
          imagePullPolicy: IfNotPresent
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
            - "--web.enable-lifecycle"
            - "--web.console.libraries=/usr/share/prometheus/console_libraries"
            - "--web.console.templates=/usr/share/prometheus/consoles"
          ports:
            - name: http
              containerPort: 9090
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: config-volume
            - mountPath: "/prometheus"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 512Mi
            limits:
              cpu: 100m
              memory: 512Mi
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: prometheus-pvc
        - name: config-volume
          configMap:
            name: prometheus-config
```
### 4. Apply the Prometheus Configuration

```bash
kubectl apply -f prometheus-configmap.yaml
kubectl apply -f prometheus.yaml
```
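Because the Deployment passes `--web.enable-lifecycle`, you can validate and hot-reload the configuration without restarting the Pod. A sketch (`promtool` ships inside the stock prom/prometheus image; the port-forward step assumes `curl` on your workstation):

```bash
# Validate the config that is actually mounted in the pod
kubectl -n monitor exec deploy/prometheus -- promtool check config /etc/prometheus/prometheus.yml

# After editing prometheus-configmap.yaml: re-apply, wait for the ConfigMap
# volume to sync into the pod (can take up to a minute), then trigger a reload
kubectl apply -f prometheus-configmap.yaml
kubectl -n monitor port-forward svc/prometheus-svc 9090:9090 &
curl -X POST http://localhost:9090/-/reload
kill %1
```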
## 6. Deploy Node Exporter

### 1. Create the Node Exporter DaemonSet (node-exporter-daemonset.yaml)
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostPID: true
      hostIPC: true
      hostNetwork: true
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: node-exporter
          image: prom/node-exporter:latest
          imagePullPolicy: IfNotPresent
          args:
            - --web.listen-address=$(HOSTIP):9100
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
            - --path.rootfs=/host/root
            # node_exporter v1.5+ removed the old --collector.filesystem.ignored-*
            # flags, so the :latest image needs the *-exclude variants
            - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|var/lib/docker/.+)($|/)
            - --collector.filesystem.fs-types-exclude=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
          env:
            - name: HOSTIP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP
          ports:
            - containerPort: 9100
          resources:
            requests:
              cpu: 150m
              memory: 180Mi
            limits:
              cpu: 150m
              memory: 180Mi
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534
          volumeMounts:
            - name: proc
              mountPath: /host/proc
            - name: sys
              mountPath: /host/sys
            - name: root
              mountPath: /host/root
              mountPropagation: HostToContainer
              readOnly: true
      tolerations:
        - operator: "Exists"
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: dev
          hostPath:
            path: /dev
        - name: sys
          hostPath:
            path: /sys
        - name: root
          hostPath:
            path: /
```
### 2. Create the Node Exporter Service (node-exporter-svc.yaml)
```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: monitor
  labels:
    app: node-exporter
spec:
  selector:
    app: node-exporter
  ports:
    - name: metrics
      port: 9100
      targetPort: 9100
  clusterIP: None
```
### 3. Apply the Node Exporter Configuration

```bash
kubectl apply -f node-exporter-daemonset.yaml
kubectl apply -f node-exporter-svc.yaml
```
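Since the DaemonSet uses hostNetwork, every node exposes metrics directly on port 9100. A quick spot check (192.168.104.51 is one of the node IPs used earlier):

```bash
# One pod per schedulable node
kubectl -n monitor get pods -l app=node-exporter -o wide

# Metrics served straight from the node
curl -s http://192.168.104.51:9100/metrics | head
```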
## 7. Deploy Grafana

### 1. Create the Grafana Manifest (grafana.yaml)
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitor
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: nfs-client
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-server
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      task: monitoring
      k8s-app: grafana
  template:
    metadata:
      labels:
        task: monitoring
        k8s-app: grafana
    spec:
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              protocol: TCP
          volumeMounts:
            - mountPath: /var/lib/grafana/
              name: grafana-data
          env:
            - name: GF_SERVER_HTTP_PORT
              value: "3000"
            - name: GF_AUTH_BASIC_ENABLED
              value: "false"
            - name: GF_AUTH_ANONYMOUS_ENABLED
              value: "true"
            - name: GF_AUTH_ANONYMOUS_ORG_ROLE
              value: Admin
            - name: GF_SERVER_ROOT_URL
              value: /
      volumes:
        - name: grafana-data
          persistentVolumeClaim:
            claimName: grafana-pvc
---
apiVersion: v1
kind: Service
metadata:
  labels:
    kubernetes.io/cluster-service: 'true'
    kubernetes.io/name: monitoring-grafana
  name: grafana-svc
  namespace: monitor
spec:
  ports:
    - port: 80
      targetPort: 3000
      nodePort: 31091
  selector:
    k8s-app: grafana
  type: NodePort
```
### 2. Apply the Grafana Configuration

```bash
kubectl apply -f grafana.yaml
```
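Before moving on, you can wait for the rollout to finish:

```bash
# Blocks until the Deployment's pods are ready
kubectl -n monitor rollout status deployment/grafana-server
```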
## 8. Verify the Deployment

### 1. Check Component Status

```bash
kubectl get pods -n monitor
# Expected output, similar to:
# NAME                                  READY   STATUS    RESTARTS   AGE
# grafana-server-7868b7cc7c-k8lrd       1/1     Running   0          5m
# kube-state-metrics-74c47f9485-h8787   1/1     Running   0          6m
# node-exporter-6h79q                   1/1     Running   0          5m
# node-exporter-9qkbs                   1/1     Running   0          5m
# node-exporter-t64xb                   1/1     Running   0          5m
# prometheus-5696fb478b-4wf4j           1/1     Running   0          6m

kubectl get pvc -n monitor
# All PVCs should be Bound

kubectl get svc -n monitor
# The Prometheus and Grafana services should be listed
```
### 2. Access the Web UIs

- Prometheus: http://<any-node-IP>:32224
- Grafana: http://<any-node-IP>:31091
### 3. Configure Grafana

- Open Grafana. This deployment enables anonymous access with the Admin role, so no login is required; if you disable anonymous access, the default credentials are admin/admin.
- Add a data source:
  - Type: Prometheus
  - URL: http://prometheus-svc.monitor.svc.cluster.local:9090
- Import dashboards:
  - Node monitoring: ID 16098
  - Kubernetes cluster monitoring: ID 14249

You can also provision the data source declaratively instead of clicking through the UI; see the sketch below.
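A minimal sketch of Grafana's file-based provisioning (the ConfigMap name `grafana-datasources` is hypothetical; for it to take effect you would also need to mount it into the Grafana container at /etc/grafana/provisioning/datasources/, which the grafana.yaml above does not yet do):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: monitor
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
      # Points Grafana at the in-cluster Prometheus service
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-svc.monitor.svc.cluster.local:9090
        isDefault: true
```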
## 9. Troubleshooting Guide

### Problem 1: PVC Stuck in Pending

Cause: the NFS provisioner failed to create the PV automatically.

Workaround: create the PVs manually.

```bash
# Create the directories on the NFS server
ssh 192.168.104.50
mkdir -p /data/k8s_data/{prometheus,grafana}
chmod 777 /data/k8s_data/{prometheus,grafana}
exit

# Create the PVs by hand
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-client
  nfs:
    path: /data/k8s_data/prometheus
    server: 192.168.104.50
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs-client
  nfs:
    path: /data/k8s_data/grafana
    server: 192.168.104.50
EOF

# Delete the old pods to force re-creation
kubectl delete pod -n monitor --all
```
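Before falling back to manual PVs, it is worth checking why provisioning failed; the provisioner's logs and the PVC's events usually name the exact NFS error:

```bash
kubectl -n nfs-storageclass logs deploy/nfs-client-provisioner
kubectl describe pvc prometheus-pvc -n monitor   # Events show provisioning errors
```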
### Problem 2: Image Pull Failures

Solutions:

- Load the images locally instead of pulling from a registry
- Set `imagePullPolicy: IfNotPresent` in the manifests
- Confirm the required images are present on every node (see the check below)
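A quick way to confirm the images are present, depending on the node's container runtime:

```bash
# Docker runtime
docker images | grep -E 'prometheus|grafana|node-exporter|nfs-subdir'

# containerd runtime (what the kubelet actually sees)
crictl images | grep -E 'prometheus|grafana|node-exporter|nfs-subdir'
```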
### Problem 3: NFS Connectivity Issues

Verification steps:

```bash
# Test from a Kubernetes node
mkdir /mnt/test
mount -t nfs 192.168.104.50:/data/k8s_data /mnt/test
touch /mnt/test/testfile
ls /mnt/test
umount /mnt/test
```

Fixes:

- Make sure the NFS server's firewall allows port 2049 (plus 111 for rpcbind)
- Check the /etc/exports configuration on the NFS server
- Re-export the shares on the NFS server:

```bash
exportfs -arv
```
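On a firewalld-based server (typical for the yum-based setup above), the relevant services can be opened like this; adjust for whatever firewall you actually run:

```bash
# Open NFS, rpcbind and mountd, then reload the firewall
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-service=mountd
firewall-cmd --reload
```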