部署promethues采集kubelet数据报错:server returned HTTP status 403 Forbidden

背景

笔者尝试部署手动部署promethues去采集kubelet的node节点数据信息时报错

笔者的promethus的配置文件和promthues的clusterrole配置如下所示:

bash 复制代码
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  # - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: default


---
apiVersion: v1
data:
  prometheus.yml: |-
    global:
      scrape_interval:     15s 
      evaluation_interval: 15s
    scrape_configs:

    - job_name: 'kubernetes-nodes'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node

    - job_name: 'kubernetes-service'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: service

    - job_name: 'kubernetes-endpoints'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: endpoints

    - job_name: 'kubernetes-ingress'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: ingress

    - job_name: 'kubernetes-pods'
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: pod
    - job_name: 'kubernetes-kubelet'
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
        - role: node
      relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)

kind: ConfigMap
metadata:
  name: prometheus-config

---
apiVersion: v1
kind: "Service"
metadata:
  name: prometheus
  labels:
    name: prometheus
spec:
  ports:
  - name: prometheus
    protocol: TCP
    port: 9090
    targetPort: 9090
  selector:
    app: prometheus
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      serviceAccountName: prometheus
      serviceAccount: prometheus
      containers:
        - name: prometheus
          image: prom/prometheus:v2.2.1
          command:
            - "/bin/prometheus"
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
          ports:
            - containerPort: 9090
          volumeMounts:
            - mountPath: "/etc/prometheus"
              name: prometheus-config
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config

解决措施

笔者已经在promethues的配置文件中添加了insecure_skip_verify: true选项,这个选项跳过了tls的校验。这时候报错server returned HTTP status 403 Forbidden很显然是接口权限问题。

问题一:https://10.101.12.132:10250/metrics这个接口是做什么

https://10.101.12.132:10250/metrics 是一个特定的路径,通常用于获取 Kubernetes

集群中的节点(Node)的指标数据。也就是说,它提供了节点级别的监控指标

问题二:这个接口主要由什么资源进行权限控制

在 Kubernetes 中,https://10.101.12.132:10250/metrics 接口相关的权限通常由 ClusterRole 或 ClusterRoleBinding 来管理。这两个角色资源对于授予集群范围的权限非常有用。ClusterRole 定义了一组权限,它们可以在整个集群中使用。ClusterRoleBinding 则用于将角色绑定到用户、组或其他实体上,以授予这些实体访问相应权限的能力。要授予访问 https://10.101.12.132:10250/metrics 接口的权限,可能需要使用以下 ClusterRole 和 ClusterRoleBinding 示例作为参考:

bash 复制代码
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-metrics-reader
rules:
  - apiGroups: [""]
    resources: ["nodes/metrics"]
    verbs: ["get", "list"]

---

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: node-metrics-reader-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: node-metrics-reader
subjects:
- kind: User
  name: <username>  # 替换为具体的用户名或组名

我们尝试修改前文中的promethues的ClusterRole中的配置,删除前文中的注释,添加 - nodes/metrics资源的可操作权限,问题解决

参考文章: 部署了 prometheus, 在 target 中显示 cadvisor 与 nodes 的状态都是 down

相关推荐
可观测性用观测云3 天前
云原生网关 Ingress-Nginx 链路追踪实战:OpenTelemetry 采集与观测云集成方案
nginx·kubernetes
蝎子莱莱爱打怪5 天前
GitLab CI/CD + Docker Registry + K8s 部署完整实战指南
后端·docker·kubernetes
蝎子莱莱爱打怪8 天前
Centos7中一键安装K8s集群以及Rancher安装记录
运维·后端·kubernetes
阿里云云原生9 天前
Kubernetes 官方再出公告,强调立即迁移 Ingress NGINX
kubernetes
至此流年莫相忘9 天前
Kubernetes实战篇之配置与存储
云原生·容器·kubernetes
Cherry的跨界思维9 天前
【AI测试全栈:质量】47、Vue+Prometheus+Grafana实战:打造全方位AI监控面板开发指南
vue.js·人工智能·ci/cd·grafana·prometheus·ai测试·ai全栈
至此流年莫相忘9 天前
Kubernetes实战篇之服务发现
容器·kubernetes·服务发现
AC赳赳老秦9 天前
云原生AI故障排查新趋势:利用DeepSeek实现高效定位部署报错与性能瓶颈
ide·人工智能·python·云原生·prometheus·ai-native·deepseek
only_Klein9 天前
Kubernetes 版本升级
容器·kubernetes·upgrade