Setting Up Chaos Mesh, a Cloud-Native Chaos Testing Platform

I. Environment Preparation

Make sure Helm is installed. To check whether it is, run the following command:

helm version
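
A small pre-flight check can confirm that both helm and kubectl are on the PATH before continuing. This is only a minimal sketch; the helper name require_cmd is hypothetical and not part of either tool.

```shell
#!/usr/bin/env bash
# Return 0 if the given command is on PATH; otherwise print a hint and return 1.
# require_cmd is a hypothetical helper name used only for this sketch.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

# Check the two tools this guide relies on, without aborting the shell.
for tool in helm kubectl; do
  require_cmd "$tool" || true
done
```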

II. Installing with Helm

1. Add the Chaos Mesh repository

Add the Chaos Mesh repository to Helm:

helm repo add chaos-mesh https://charts.chaos-mesh.org

2. List the installable Chaos Mesh versions

# Latest version
helm search repo chaos-mesh
# All versions, including older ones
helm search repo chaos-mesh -l

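3. Install Chaos Mesh

# Create the namespace
kubectl create ns chaos-mesh

During deployment, Kubernetes may fail to pull the Chaos Mesh images, or you may need to customize other settings. In both cases you can pass your own values.yaml file when installing.

Open the GitHub page "chaos-mesh/helm/chaos-mesh at release-2.7 · chaos-mesh/chaos-mesh" and switch to the branch that matches the version you are installing.

Find the values.yaml file in that directory, copy the parts you need to change, and adjust them in a new local file.

When copying a setting, copy its enclosing parent keys as well and keep the indentation intact.

Below is my own example, which mainly changes the image addresses and the time zone: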

rbac:
  create: true

# timezone is the timezone where controller-manager, chaos-daemon and dashboard uses.
# For example: "UTC" or "Asia/Shanghai"
# This value will be set on controller-manager and dashboard container's
# environment variable TZ.
# You may need to set the timezone to be consistent with your Grafana configuration,
# otherwise the query Grafana used to retrieve event maybe in wrong timezone.
timezone: "Asia/Shanghai"

images:
  # images.registry is the global container registry for the images, you could replace it with your self-hosted container registry.
  registry: "registry.cn-hangzhou.aliyuncs.com"
  # images.tag is the global image tag (for example, semVer with prefix v, or latest).
  tag: "v2.7.0"

controllerManager:
  # securityContext if needed
  securityContext: {}
  # running chaos-controller-manager on host network
  hostNetwork: false
  # Allow testing on `hostNetwork` pods. This is Dangerous. Please run only as temporary solution.
  allowHostNetworkTesting: false
  # The serviceAccount for chaos-controller-manager
  serviceAccount: chaos-controller-manager
  # ServiceAccount annotations for chaos-controller-manager
  serviceAccountAnnotations: {}
  # Create the serviceAccount for chaos-controller-manager
  serviceAccountCreate: true
  # Custom priorityClassName for using pod priorities
  priorityClassName: ""
  # Replicas for chaos-controller-manager
  replicaCount: 3
  # image would be constructed by <registry>/<repository>:<tag>
  image:
    # override global registry, empty value means using the global images.registry
    registry: ""
    # repository part for image of chaos-controller-manager
    repository: <your-aliyun-image-repo>/chaos-mesh
    # override global tag, empty value means using the global images.tag
    tag: ""
  # Image pull policy
  imagePullPolicy: IfNotPresent

  # The keys within the "env" map are mounted as environment variables on the pod.
  env:
    # WEBHOOK_PORT is configured the port for chaos-controller-manager provides webhooks.
    # In GKE private clusters, by default kubernetes apiservers are allowed to
    # talk to the cluster nodes only on 443 and 10250. so configuring
    # WEBHOOK_PORT: 10250, will work out of the box without needing to add firewall
    # rules or requiring NET_BIND_SERVICE capabilities to bind port numbers <1000
    WEBHOOK_PORT: 10250
    # METRICS_PORT is configured the port for chaos-controller-manager exposing prometheus metrics
    METRICS_PORT: 10080
  # If enabled, only pods in the namespace annotated with `"chaos-mesh.org/inject": "enabled"` could be injected
  enableFilterNamespace: false
  # targetNamespace only works with clusterScoped is false(namespace scoped mode).
  # It means namespace which will be injected chaos
  targetNamespace: chaos-mesh

  service:
    # Kubernetes Service type for service chaos-controller-manager
    type: ClusterIP

  resources:
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    limits: {}
    #  cpu: 500m
    #  memory: 1024Mi
    requests:
      cpu: 25m
      memory: 256Mi
  # Node labels for chaos-controller-manager pod assignment
  nodeSelector: {}
  # Toleration labels for chaos-controller-manager pod assignment
  tolerations: []
  # Map of chaos-controller-manager node/pod affinities
  affinity: {}
  # Pod annotations of chaos-controller-manager
  podAnnotations: {}
  # A list of controllers to enable. "*" enables all controllers by default.
  enabledControllers:
    - "*"
  # A list of webhooks to enable. "*" enables all webhooks by default.
  enabledWebhooks:
    - "*"
  podChaos:
    podFailure:
      # Custom Pause Container Image for Pod Failure Chaos
      pauseImage: registry.cn-hangzhou.aliyuncs.com/<your-aliyun-image-repo>/pause:latest
  leaderElection:
    # Enable leader election for controller manager.
    enabled: true
    # The duration that non-leader candidates will wait to force acquire leadership. This is measured against time of last observed ack.
    leaseDuration: 15s
    # The duration that the acting control-plane will retry refreshing leadership before giving up.
    renewDeadline: 10s
    # The duration the LeaderElector clients should wait between tries of actions.
    retryPeriod: 2s
  # chaosdSecurityMode is enabled for mTLS connection between chaos-controller-manager and chaosd
  chaosdSecurityMode: true
  # multi cluster install offline helm chart path
  localHelmChart:
    enabled: false
    volume:
      hostPath:
        path: /data/helm
        type: DirectoryOrCreate

chaosDaemon:
  # image would be constructed by <registry>/<repository>:<tag>
  image:
    # override global registry, empty value means using the global images.registry
    registry: ""
    # repository part for image of chaos-daemon
    repository: <your-aliyun-image-repo>/chaos-daemon
    # empty tag means using the global images.tag
    tag: ""
  # Image pull policy
  imagePullPolicy: IfNotPresent
  # The port which grpc server listens on.
  grpcPort: 31767
  # The port which http server listens on.
  httpPort: 31766
  # extra chaosDaemon envs
  env: {}
  # securityContext if needed
  securityContext: {}
  # running chaosDaemon on host network
  hostNetwork: false
  # configurations about mtls.
  # currently we do not support use specified ca and cert for mtls, it would generate the ca and certs when chaos mesh deploy by helm.
  mtls:
    # enable mtls on the grpc connection between chaos-controller-manager and chaos-daemon
    enabled: true

  runtime: containerd
  socketPath: /run/containerd/containerd.sock

dashboard:
  # Enable chaos-dashboard
  create: true
  # Optional, the secret name that has `DATABASE_DATASOURCE` defined.
  # It's recommended to use a secret to store the database credentials.
  databaseSecretName: ""
  # rootUrl specify the base url for openid/oauth2 (like GCP Auth Integration) callback URL.
  rootUrl: http://localhost:2333
  # securityContext if needed
  securityContext: {}
  # running chaos-dashboard on host network
  hostNetwork: false
  # replicas of chaos-dashboard
  replicaCount: 1
  # Custom priorityClassName for using pod priorities
  priorityClassName: ""
  # The serviceAccount for chaos-dashboard
  serviceAccount: chaos-dashboard
  image:
    # override global registry, empty value means using the global images.registry
    registry: ""
    # repository part for image of chaos-dashboard
    repository: <your-aliyun-image-repo>/chaos-dashboard
    # override global tag, empty value means using the global images.tag
    tag: ""
  # Image pull policy
  imagePullPolicy: IfNotPresent
  # securityMode requires user to provide credentials on Chaos Dashboard, instead of using chaos-dashboard service account
  securityMode: true

dnsServer:
  # Enable DNS Server which required by DNSChaos
  create: true
  # Name of serviceaccount for chaos-dns-server.
  serviceAccount: chaos-dns-server
  # image would be constructed by <registry>/<repository>:<tag>
  image:
    # override global registry, empty value means using the global images.registry
    registry: ""
    # repository part for image of chaos-dns-server
    repository: <your-aliyun-image-repo>/chaos-coredns
    # override global tag, empty value means using the global images.tag
    tag: "v0.2.6"
  # Image pull policy
  imagePullPolicy: IfNotPresent
  # Customized priorityClassName for chaos-dns-server
  priorityClassName: ""
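
Because Helm merges a file passed with -f over the chart's default values, the override file can usually be much shorter than the full example above. The sketch below writes a file containing only the changed keys, with their parent keys and indentation preserved. The file name chaos_mesh_values.yaml matches the install command used in this guide, and <your-aliyun-image-repo> is a placeholder for your own Alibaba Cloud image repository that must be filled in before installing.

```shell
#!/usr/bin/env bash
# Write a minimal Helm values override with only the keys that differ
# from the chart defaults. Parent keys and indentation are preserved.
# <your-aliyun-image-repo> is a placeholder; replace it before installing.
cat > chaos_mesh_values.yaml <<'EOF'
timezone: "Asia/Shanghai"
images:
  registry: "registry.cn-hangzhou.aliyuncs.com"
  tag: "v2.7.0"
controllerManager:
  image:
    repository: <your-aliyun-image-repo>/chaos-mesh
chaosDaemon:
  image:
    repository: <your-aliyun-image-repo>/chaos-daemon
  runtime: containerd
  socketPath: /run/containerd/containerd.sock
dashboard:
  image:
    repository: <your-aliyun-image-repo>/chaos-dashboard
dnsServer:
  image:
    repository: <your-aliyun-image-repo>/chaos-coredns
    tag: "v0.2.6"
EOF
echo "wrote chaos_mesh_values.yaml"
```

Adding --dry-run to the helm install command renders the manifests without applying them, which is a cheap way to check that the overrides are picked up.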

After adjusting the image addresses and any other settings, run the following command to deploy:

helm install chaos-mesh -f chaos_mesh_values.yaml chaos-mesh/chaos-mesh --namespace=chaos-mesh --create-namespace

The -f flag points to the values file you modified.

Check whether the deployment succeeded:

kubectl get po -n chaos-mesh
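
Instead of re-running kubectl get po by hand, kubectl wait can block until the Pods are ready. A minimal sketch, assuming the chaos-mesh namespace used in this guide; the helper name wait_for_chaos_mesh and the 300s default timeout are illustrative choices.

```shell
#!/usr/bin/env bash
# Block until every Pod in the namespace reports Ready, or time out.
# wait_for_chaos_mesh is a hypothetical helper name for this sketch.
wait_for_chaos_mesh() {
  local ns="${1:-chaos-mesh}"
  local timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pods --all -n "$ns" --timeout="$timeout"
}
```

Call it as wait_for_chaos_mesh chaos-mesh 300s; a non-zero exit status means at least one Pod did not become ready in time.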
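
4. Access Chaos Mesh

The dashboard is available at <cluster IP>:30768, where 30768 is the NodePort of the dashboard Service in my cluster.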
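
The port in the address above is a NodePort, so it can differ between clusters. The sketch below shows one way to look it up, plus a port-forward alternative that serves the dashboard on localhost:2333 without relying on a NodePort. It assumes the Service created by the chart is named chaos-dashboard; verify this with kubectl get svc -n chaos-mesh. Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Show the dashboard Service; the PORT(S) column lists <port>:<nodePort>.
# Assumes the Service is named "chaos-dashboard" (verify in your cluster).
show_dashboard_service() {
  local ns="${1:-chaos-mesh}"
  kubectl get svc chaos-dashboard -n "$ns"
}

# Alternative: forward local port 2333 to the dashboard Service.
forward_dashboard() {
  local ns="${1:-chaos-mesh}"
  kubectl port-forward -n "$ns" svc/chaos-dashboard 2333:2333
}
```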

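Chaos Mesh RBAC authorization:

Select the namespace and role as prompted, then create the resources from the automatically generated file.

Note the following: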

kubectl create token account-default-viewer-ixqbu

The token generated by this command expires. To get a long-lived token, create a service-account token Secret instead:

apiVersion: v1
kind: Secret
metadata:
  name: account-test-manager-sequd-token
  namespace: test
  annotations:
    kubernetes.io/service-account.name: account-default-viewer-ixqbu
type: kubernetes.io/service-account-token

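Note that the value of kubernetes.io/service-account.name must match the name of the service account created in the previous step, and the Secret must be in the same namespace as that service account.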

# Inspect the Secret to read the token
kubectl describe secrets -n test account-test-manager-sequd-token
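
Both ways of getting a token can be wrapped in small helpers, for example to feed the token into a script. This is a sketch under a few assumptions: the helper names are hypothetical, the --duration flag needs a kubectl version that has the create token subcommand (1.24 or newer), the API server may cap the requested lifetime, and base64 -d is the GNU/BusyBox spelling of the decode flag.

```shell
#!/usr/bin/env bash
# Request a time-limited token for a service account with an explicit lifetime.
short_lived_token() {
  local sa="$1" ns="$2" duration="${3:-24h}"
  kubectl create token "$sa" -n "$ns" --duration="$duration"
}

# Print the decoded token stored in a service-account token Secret.
# The Secret stores the token base64-encoded under .data.token.
secret_token() {
  local secret="$1" ns="$2"
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.token}' | base64 -d
}
```

For example, secret_token account-test-manager-sequd-token test prints only the token, without the other fields that kubectl describe shows.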

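Enter a name and the token in the dashboard; you can then create experiments.

III. Creating a Test Experiment

1. Choose the experiment type and set its conditions

Here, Workers is the number of stress processes; in this example, three processes apply 100M of memory pressure to the Pod.

You can configure label selectors and namespaces to determine which Pods take part in the experiment.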
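
The dashboard form corresponds to a StressChaos resource, so the same experiment can also be described as a manifest. The sketch below writes a roughly equivalent one. The experiment name, the target namespace default, the label app: demo, the mode one, and the five-minute duration are all illustrative assumptions, not values taken from the screenshots.

```shell
#!/usr/bin/env bash
# Write a StressChaos manifest roughly equivalent to the dashboard form:
# 3 workers applying memory stress to one Pod matched by the selector.
# Name, namespace, label, mode, and duration are illustrative assumptions.
cat > memory-stress.yaml <<'EOF'
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: memory-stress-demo
  namespace: default
spec:
  mode: one
  selector:
    namespaces:
      - default
    labelSelectors:
      app: demo
  stressors:
    memory:
      workers: 3
      size: "100MB"
  duration: "5m"
EOF
echo "wrote memory-stress.yaml; apply it with: kubectl apply -f memory-stress.yaml"
```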

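Note that if nothing happens when you submit at the final step, the error is only visible in the browser developer tools (F12), in the response of the failing API request. It does not appear in the Pod logs.

a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.'

This shows that the submission failed because the experiment name must be lowercase and may contain only alphanumeric characters, '-', and '.'.

After fixing the name, the experiment can be submitted normally.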
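
The name can be checked locally before submitting. The sketch below applies the lowercase RFC 1123 subdomain rule that Kubernetes uses for most object names: at most 253 characters, made of lowercase alphanumerics, '-' and '.', with each dot-separated label starting and ending with an alphanumeric character. The helper name is_valid_k8s_name and the sample names are hypothetical.

```shell
#!/usr/bin/env bash
# Return 0 if the name is a valid lowercase RFC 1123 subdomain, else 1.
# is_valid_k8s_name is a hypothetical helper name for this sketch.
is_valid_k8s_name() {
  local name="$1"
  [ "${#name}" -le 253 ] || return 1
  printf '%s\n' "$name" |
    grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

# Try one valid and one invalid candidate name.
for candidate in memory-stress-demo Memory_Stress; do
  if is_valid_k8s_name "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: invalid"
  fi
done
```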

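2. Check the experiment result

After submitting the experiment, you can see that it is running.

Run top inside the container and you should see additional processes putting memory pressure on the Pod. This confirms that Chaos Mesh is installed correctly and can run experiments as expected.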
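
The same check can be done from the command line. This is a sketch only: the helper names are hypothetical, the experiment and Pod names are whatever you chose, and top must exist in the container image for the second helper to work.

```shell
#!/usr/bin/env bash
# Show the status and recent events of a StressChaos experiment.
experiment_status() {
  local name="$1" ns="$2"
  kubectl describe stresschaos "$name" -n "$ns"
}

# Take one snapshot of the process list inside a Pod (requires top in the image).
pod_top_snapshot() {
  local pod="$1" ns="$2"
  kubectl exec "$pod" -n "$ns" -- top -b -n 1
}
```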
