学习记录---Kubernetes的资源指标管道-metrics api的安装部署

一、简介

Metrics API,为我们的k8s集群提供了一组基本的指标(资源的cpu和内存),我们可以通过metrics api来对我们的pod开展HPA和VPA操作(主要通过在pod中对cpu和内存的限制实现动态扩展),也可以通过kubectl top的方式,获取k8s中node和pod的cpu及内存使用情况。

node资源分配情况和使用情况:

pod资源分配情况和使用情况:

二、使用和部署

从kubernetes的官网上来看,k8s初始化后默认是没有提供metrics api服务的,如果需要使用该服务,则需要部署metrics-server。

针对metrics-server的部署方式,我们可以直接从官网的超链接中获取,而该项目则是在github的kubernetes项目中:

复制代码
https://github.com/kubernetes-sigs/metrics-server/tree/master

在该项目中有比较明确的安装步骤,该项目安装比较简单,直接用其yaml进行apply即可(这里我们选择使用高可用模式,高可用模式其实就是多使用了几个replicas):

复制代码
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/high-availability-1.21+.yaml

我们先把yaml下载下来看看:

复制代码
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                k8s-app: metrics-server
            namespaces:
            - kube-system
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        #新增,使其不验证k8s提供的ca证书
        - --kubelet-insecure-tls
        #修改,修改为我们可以下载的镜像,如果有私有仓库也可以用私有仓库的地址
        #image: registry.k8s.io/metrics-server/metrics-server:v0.6.4
        image: bitnami/metrics-server:0.6.4
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: metrics-server
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: metrics-server
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

这里,我们修改了两个地方:

复制代码
1) 在deployment中的container的arg参数中加入:- --kubelet-insecure-tls
2) 修改镜像资源

修改镜像资源主要是方便我们可以顺利pull镜像;

加入kubelet-insecure-tls的主要目的是metrics-server不验证k8s提供的ca证书,如果不加该参数,则可能会导致出现以下问题:

pod状态:

查看描述,会出现Readiness探针异常(Readiness probe failed:Http probe failed with statuscode: 500)

查看日志,出现以下错误:

"Failed to scrape node" err="Get "https://xxx.xxx.xxx.xxx:10250/metrics/resource": tls: failed to verify certificate: x509: cannot validate certificate for xxx.xxx.xxx.xxx because it doesn't contain any IP SANs" node="k8s-slave3"

正常状态:

相关推荐
张忠琳7 小时前
【client-go v0.36.1】(store Part 3)Store 超深度分析 — 集成模式、完整数据流、不变量、与 DeltaFIFO 协作
云原生·kubernetes·informer·store·client-go
数智工坊7 小时前
机器人运动控制:采样、优化与学习三大流派深度对比与实战
android·学习·机器人
ZC跨境爬虫8 小时前
跟着 MDN 学JavaScript day_7:数学运算与逻辑判断实战测试
开发语言·前端·javascript·学习·ecmascript
赵渝强老师10 小时前
【赵渝强老师】Kubernetes(K8s)中的金丝雀升级
linux·docker·云原生·容器·kubernetes
鹤落晴春10 小时前
【K8s】配置存储卷
云原生·容器·kubernetes
MartinYeung510 小时前
[论文学习]隐私保护联邦特徵选择与差分隐私的的工程实践框架
学习
qeen8710 小时前
【C++】类与对象之类的默认成员函数(二)
android·c语言·开发语言·c++·笔记·学习
Flandern111111 小时前
Pull Requests(PR)
学习·github·pr
张忠琳11 小时前
【client-go v0.36.1】(DeltaFIFO Part 1)DeltaFIFO 超深度分析 — 模块定位、类结构、接口层次、构造与初始化
云原生·kubernetes·deltafifo·informer·client-go
nashane12 小时前
HarmonyOS 6学习:JsCrash“闪退”法医指南——从FaultLog堆栈还原崩溃现场的终极手册
学习·华为·harmonyos