学习记录---Kubernetes的资源指标管道-metrics api的安装部署

一、简介

Metrics API,为我们的k8s集群提供了一组基本的指标(资源的cpu和内存),我们可以通过metrics api来对我们的pod开展HPA和VPA操作(主要通过在pod中对cpu和内存的限制实现动态扩展),也可以通过kubectl top的方式,获取k8s中node和pod的cpu及内存使用情况。

node资源分配情况和使用情况:

pod资源分配情况和使用情况:

二、使用和部署

从kubernetes的官网上来看,k8s初始化后默认是没有提供metrics api服务的,如果需要使用该服务,则需要部署metrics-server。

针对metrics-server的部署方式,我们可以直接从官网的超链接中获取,而该项目则是在github的kubernetes项目中:

复制代码
https://github.com/kubernetes-sigs/metrics-server/tree/master

在该项目中有比较明确的安装步骤,该项目安装比较简单,直接用其yaml进行apply即可(这里我们选择使用高可用模式,高可用模式其实就是多使用了几个replicas):

复制代码
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml

我们先把yaml下载下来看看:

复制代码
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: metrics-server
            namespaces:
            - kube-system
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        #新增,使其不验证k8s提供的ca证书
        - --kubelet-insecure-tls
        #修改,修改为我们可以下载的镜像,如果有私有仓库也可以用私有仓库的地址
        #image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        image: bitnami/metrics-server:0.6.4
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: metrics-server
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

这里,我们修改了两个地方:

复制代码
1) 在deployment中的container的arg参数中加入:- --kubelet-insecure-tls
2) 修改镜像资源

修改镜像资源主要是方便我们可以顺利pull镜像;

加入kubelet-insecure-tls的主要目的是metrics-server不验证k8s提供的ca证书,如果不加该参数,则可能会导致出现以下问题:

pod状态:

查看描述,会出现Readiness探针异常(Readiness probe failed:Http probe failed with statuscode: 500)

查看日志,出现以下错误:

"Failed to scrape node" err="Get "https://xxx.xxx.xxx.xxx:10250/metrics/resource": tls: failed to verify certificate: x509: cannot validate certificate for xxx.xxx.xxx.xxx because it doesn't contain any IP SANs" node="k8s-slave3"

正常状态:

相关推荐
lichenyang4533 天前
Docker 学习笔记(四):Dockerfile,把项目打成自己的镜像
docker·容器
lichenyang4533 天前
Docker 学习笔记(三):Docker 网络、bridge、子网和容器互通
docker·容器
lichenyang4533 天前
Docker 学习笔记(二):docker run 的参数到底在控制什么?
docker·容器
运维开发故事6 天前
基于 Arthas 的多集群在线诊断系统设计与实现
kubernetes
Patrick_Wilson8 天前
从「改个端口」到 502:Next.js on k8s 的容器端口、Service 映射与 env 覆盖
docker·kubernetes·next.js
探索云原生8 天前
K8s 1.36 这个 GA 特性,把 initContainer 拉模型的 hack 干掉了
ai·云原生·kubernetes
云恒要逆袭8 天前
运行你的第一个Docker容器
后端·docker·容器
Java之美9 天前
一次k8s升级引发的DevicePlugin注册失败
云原生·kubernetes
程序员老赵10 天前
10 分钟部署 OpenCode:Docker 一键安装,浏览器打开就能用 AI 写代码(附完整命令与排错)
docker·容器·ai编程