17. K8s Observability: End-to-End Tracing

Table of Contents

  • 17. K8s Observability: End-to-End Tracing
    • 1. Getting to Know SkyWalking
      • 1.1 Why an End-to-End Tracing Platform Is Needed
      • 1.2 Core Components and Working Principles of End-to-End Tracing
        • 1.2.1 Core Concepts of End-to-End Tracing
        • 1.2.2 How End-to-End Tracing Works
      • 1.3 What Is SkyWalking?
      • 1.4 SkyWalking Architecture
      • 1.5 Core SkyWalking Terms
    • 2. SkyWalking Cluster Installation
      • 2.1 Cluster Planning
      • 2.2 Installing the SkyWalking Cluster
      • 2.3 Connecting a Java Service to SkyWalking
      • 2.4 Connecting a Go Service to SkyWalking
      • 2.5 Cleaning Up the Environment
    • 3. End-to-End Tracing Project Exercise
      • 3.1 Service Deployment
        • 3.1.1 Deploy the Database (reusing the configuration from the previous experiment)
        • 3.1.2 Start the order Service
        • 3.1.3 Deploy the handler Service (reusing the configuration from the previous experiment)
        • 3.1.4 Deploy the receive Service
        • 3.1.5 Deploy the Front-End Service
      • 3.2 Service Access and Monitoring
      • 3.3 Simulating a Failure
    • 4. SkyWalking Alarms
      • 4.1 SkyWalking Alarm Notifications
      • 4.2 SkyWalking Alarm Rules
      • 4.3 Configuring a DingTalk Alert Robot
      • 4.4 Connecting SkyWalking to DingTalk Alerts
      • 4.5 Custom Alarm Rules

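1. Getting to Know SkyWalking

1.1 Why an End-to-End Tracing Platform Is Needed

  • Quickly locate points of failure
  • Quickly locate performance bottlenecks
  • Understand service dependencies
  • Visualize global traffic

1.2 Core Components and Working Principles of End-to-End Tracing

1.2.1 Core Concepts of End-to-End Tracing
  • Trace: the complete processing of one request is called a Trace. It covers the whole path from the client sending the request until the backend has fully handled it. A Trace is made up of multiple Spans.
  • Span: a Span represents one piece of work within a Trace, such as a function call or an HTTP request. Each Span records an operation name, a start time, an end time, and metadata about the operation. Spans have parent-child relationships, and the Spans taken together describe one Trace.
  • Trace ID and Span ID: every Trace has a unique Trace ID and every Span has a unique Span ID. A Span also carries a reference to its parent Span.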
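
The relationship between a Trace and its Spans can be pictured with a small data model. This is only an illustrative sketch: the `Span` class, its field names, and `build_tree` are hypothetical and do not reflect SkyWalking's actual storage schema.

```python
from dataclasses import dataclass
from typing import Optional
import uuid


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical model for illustration)."""
    trace_id: str                  # shared by every span of the same request
    span_id: str                   # unique to this span
    parent_span_id: Optional[str]  # None marks the root span
    operation: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def build_tree(spans):
    """Group spans by parent ID so the call hierarchy can be walked from the root."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_span_id, []).append(span)
    return children


trace_id = uuid.uuid4().hex
root = Span(trace_id, "1", None, "GET /orders", 0, 120)
db = Span(trace_id, "2", "1", "SELECT orders", 20, 90)
tree = build_tree([root, db])
print(tree[None][0].operation, "->", tree["1"][0].operation)
```

Because every Span carries the same Trace ID plus a pointer to its parent, the full call tree can be rebuilt from Spans that arrive separately and out of order.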
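
1.2.2 How End-to-End Tracing Works

1. The client sends a request

2. Service A starts handling the request and creates the initial Trace and Span

3. Service A forwards the request to service B and passes along the Trace ID and Span ID

4. Service B uses the received information to create a new Span and marks the parent Span

5. After all services finish processing, the Spans each of them produced are sent to the tracing platform and aggregated

6. Users can view the details of the whole Trace in the UI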
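
The steps above can be sketched in a few lines of Python. This is a simulation under stated assumptions, not how a SkyWalking agent is implemented: the header names `x-trace-id` and `x-span-id` are made up for illustration (SkyWalking agents use their own propagation format), and the `collected` list stands in for the tracing backend.

```python
import uuid

collected = []  # stands in for the tracing backend (assumption)


def new_id():
    return uuid.uuid4().hex[:16]


def start_span(operation, headers):
    """Continue the trace found in the headers, or start a new one."""
    span = {
        "trace_id": headers.get("x-trace-id") or new_id(),
        "span_id": new_id(),
        "parent_span_id": headers.get("x-span-id"),  # None for the entry span
        "operation": operation,
    }
    collected.append(span)  # step 5: report the span to the backend
    return span


def service_b(headers):
    start_span("service-b: handle", headers)  # step 4: child span with parent reference


def service_a(headers):
    span = start_span("service-a: handle", headers)  # step 2: initial trace and span
    # Step 3: forward the request with the trace context attached.
    service_b({"x-trace-id": span["trace_id"], "x-span-id": span["span_id"]})


service_a({})  # step 1: a client request that carries no trace context
for s in collected:
    print(s["operation"], s["trace_id"], s["parent_span_id"])
```

Both printed spans share one trace ID, and the second one names the first as its parent, which is what lets the UI in step 6 draw the call chain.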

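1.3 What Is SkyWalking?

SkyWalking is an application performance monitoring (APM) and observability analysis platform (OAP) for distributed systems. It provides distributed tracing, metrics monitoring, diagnostic information, service mesh telemetry analysis, alarms on anomalies, and a visual interface, helping developers and operations teams understand and manage their applications and services.

Core features:

  • Distributed tracing: SkyWalking generates trace data for requests so that users can see the whole call chain and locate performance bottlenecks or root causes
  • Metrics analysis: measures service health through key performance indicators (KPIs) such as response time, throughput, and success rate
  • Alarms: supports custom alarm rules and sends notifications automatically when an anomaly is detected
  • Rich UI: an intuitive web UI for viewing trace data, metrics, and the service topology
  • Low intrusiveness: code-level monitoring is achieved through bytecode injection, so services can be connected without changing business logic
  • Multi-language support: besides Java, it supports .NET Core, Node.js, Python, Go, and other languages
  • Platform integration: integrates with service meshes and Kubernetes

1.4 SkyWalking Architecture

1.5 Core SkyWalking Terms

  • Service: one application, or a group of applications, that provides the same functionality or business logic. It can be a microservice, a web service, a database, or another kind of backend service
  • Instance: one concrete running instance of a service. In a distributed environment the same service may be deployed on several servers or containers, and the copy running on each of them is an Instance
  • Endpoint: a specific path or interface of a service that can be called from outside. Endpoints are the entry points through which a service exposes its functionality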

2. SkyWalking Cluster Installation

2.1 Cluster Planning

Hostname       IP address       OS          Resources       Role
k8s-master01   192.168.200.50   Rocky 9.4   4 cores, 8 GB   Master node
k8s-node01     192.168.200.51   Rocky 9.4   4 cores, 8 GB   Node01
k8s-node02     192.168.200.52   Rocky 9.4   4 cores, 8 GB   Node02

2.2 Installing the SkyWalking Cluster

# Add the SkyWalking Helm repository
[root@k8s-master01 ~]# export REPO=skywalking
[root@k8s-master01 ~]# helm repo add ${REPO} https://apache.jfrog.io/artifactory/skywalking-helm

# Download the SkyWalking chart
[root@k8s-master01 ~]# helm pull skywalking/skywalking

# Unpack the chart archive:
[root@k8s-master01 ~]# tar xf skywalking-4.3.0.tgz 

[root@k8s-master01 ~]# cd skywalking
[root@k8s-master01 skywalking]# vim values.yaml 
[root@k8s-master01 skywalking]# cat values.yaml 
# Modify the Elasticsearch configuration:
elasticsearch:
  antiAffinity: soft
  clusterHealthCheckParams: wait_for_status=green&timeout=10s
  clusterName: es-cluster
  config:
    host: elasticsearch
    password: admin
    port:
      http: 9200
    user: admin
  enabled: true
  esMajorVersion: "7"
  image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/elasticsearch
  imagePullPolicy: IfNotPresent
  imageTag: 7.5.1
  persistence:
    annotations: {}
    enabled: true
  replicas: 3
  resources:
    limits:
      cpu: 2000m
      memory: 3Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  volumeClaimTemplate:
    storageClassName: nfs-csi
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 30Gi
initContainer:
  image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/busybox
  tag: "1.30"


# Modify the OAP resource configuration:
oap:
  image:
    pullPolicy: IfNotPresent
    repository: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-oap-server
    tag: 10.2.0
  javaOpts: -Xmx2g -Xms2g
  replicas: 3
  resources: 
    limits:
      cpu: 2000m
      memory: 3Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  storageType: elasticsearch


# Modify the UI configuration:
ui:
  image:
    pullPolicy: IfNotPresent
    repository: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-ui
    tag: 10.2.0
  replicas: 3
  service:
    annotations: {}
    externalPort: 80
    internalPort: 8080
    type: NodePort

[root@k8s-master01 skywalking]# vim templates/oap-deployment.yaml 
[root@k8s-master01 skywalking]# sed -n "91,100p" templates/oap-deployment.yaml 
        livenessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 300
          periodSeconds: 20
        readinessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 300
          periodSeconds: 20

# Remove conflicting resources
[root@k8s-master01 skywalking]# rm -rf charts/elasticsearch/templates/pod*

# Install:
[root@k8s-master01 skywalking]# helm install skywalking -n skywalking . --create-namespace

# Check the installation status:
[root@k8s-master01 skywalking]# kubectl get po -n skywalking
NAME                              READY   STATUS    RESTARTS   AGE
es-cluster-master-0               1/1     Running   0          13m
es-cluster-master-1               1/1     Running   0          13m
es-cluster-master-2               1/1     Running   0          13m
skywalking-es-init-mkvw7          1/1     Running   0          13m
skywalking-oap-6d8f594b7c-7w785   1/1     Running   0          13m
skywalking-oap-6d8f594b7c-p4z64   1/1     Running   0          13m
skywalking-oap-6d8f594b7c-vnp8t   1/1     Running   0          13m
skywalking-ui-774674cc7-qcm79     1/1     Running   0          13m
skywalking-ui-774674cc7-qhgg8     1/1     Running   0          13m
skywalking-ui-774674cc7-qwkjm     1/1     Running   0          13m

# Check the Service
[root@k8s-master01 skywalking]# kubectl get svc skywalking-ui -n skywalking
NAME            TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
skywalking-ui   NodePort   10.108.110.98   <none>        80:31319/TCP   14m

Access skywalking-ui through the NodePort shown above on any node, for example http://192.168.200.50:31319:

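2.3 Connecting a Java Service to SkyWalking

Java Agent reference documentation:

Java:

  • JAVA_TOOL_OPTIONS: JVM startup options. The agent can be loaded through this variable, for example -javaagent:/skywalking/agent/skywalking-agent.jar
  • SW_AGENT_NAME: the service name. The suggested format is <group>::<logical name>; configuring it as namespace::service name is recommended
  • SW_AGENT_INSTANCE_NAME: the instance name, normally used to distinguish different instances of the same service. The default is UUID@hostname; using the Pod name as the instance name is recommended
  • SW_AGENT_COLLECTOR_BACKEND_SERVICES: the SkyWalking OAP address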
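
The manifests in this section set SW_AGENT_NAME to `$(NAMESPACE)::$(APP)`, relying on Kubernetes to expand `$(NAME)` references to environment variables defined earlier in the same container. The following sketch imitates that expansion to show the resulting service name. It is a simplification: the `expand` helper is hypothetical, it leaves unknown names untouched, and it does not handle the `$$(` escape.

```python
import re


def expand(value, env):
    """Expand $(NAME) references, roughly as Kubernetes does for container env values.

    Simplified: unknown names are left unchanged and "$$(" escapes are ignored.
    """
    return re.sub(
        r"\$\(([A-Za-z_][A-Za-z0-9_.-]*)\)",
        lambda m: env.get(m.group(1), m.group(0)),
        value,
    )


# Values the Downward API would inject for the demo-handler Pod in the demo namespace.
env = {"NAMESPACE": "demo", "APP": "demo-handler"}
print(expand("$(NAMESPACE)::$(APP)", env))  # demo::demo-handler
```

This is why NAMESPACE and APP need to appear before SW_AGENT_NAME in the env list: a reference to a variable that has not been defined yet is left as literal text.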
[root@k8s-master01 skywalking]# mkdir demo/
[root@k8s-master01 skywalking]# cd demo/
[root@k8s-master01 demo]# vim demo-handler-deploy-sw.yaml 
[root@k8s-master01 demo]# cat demo-handler-deploy-sw.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-handler
  name: demo-handler
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-handler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-handler
    spec:
      volumes:                      # Add the volume and the init container
      - name: skywalking-agent
        emptyDir: {}
      initContainers:
      - name: agent-container
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-java-agent:9.4.0-java8
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
        command: [ "/bin/sh" ]
        args: [ "-c", "cp -R /skywalking/agent /agent/ ; mkdir -p /agent/agent/logs/ ; chown -R 1001.1001 /agent" ]
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: k8supgrade
        - name: SERVER_PORT
          value: "8080"
        - name: JAVA_TOOL_OPTIONS           # Add the environment variables
          value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-handler:v1-upgrade
        imagePullPolicy: IfNotPresent
        volumeMounts:                       # Add the volume mount
        - name: skywalking-agent
          mountPath: /skywalking
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-handler
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

# Next, create the service and test it:
[root@k8s-master01 demo]# kubectl create namespace demo 
[root@k8s-master01 demo]# kubectl create -f demo-handler-deploy-sw.yaml -n demo

# Check the Pod status
[root@k8s-master01 demo]# kubectl get po -n demo -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP               NODE           NOMINATED NODE   READINESS GATES
demo-handler-5b6f9dd9c7-88pr6   1/1     Running   0          77s   172.16.58.233    k8s-node02     <none>           <none>

# Test access (try it several times)
[root@k8s-master01 demo]# curl 172.16.58.233:8080/api/generate
O4E,\1L!u-bzTE[7Fn#VCS+eK?fwcp|k

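View the SkyWalking charts:

Topology map

2.4 Connecting a Go Service to SkyWalking

Go Agent reference documentation:

Go:

  • SW_AGENT_REPORTER_GRPC_BACKEND_SERVICE: the SkyWalking OAP address
  • SW_AGENT_NAME: the service name. The suggested format is <group>::<logical name>; configuring it as namespace::service name is recommended
  • SW_AGENT_INSTANCE_NAME: the instance name, normally used to distinguish different instances of the same service. The default is UUID@hostname; using the Pod name as the instance name is recommended

# Download the test program: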
[root@habor ~]# git clone https://gitee.com/dukuan/demo-order.git

# Write the Dockerfile
[root@habor ~]# cd demo-order-master
[root@habor demo-order-master]# vim Dockerfile 
[root@habor demo-order-master]# cat Dockerfile 
FROM crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-go:0.5.0-go1.22 AS builder
COPY ./ /go/src/
WORKDIR /go/src/

RUN export GO111MODULE=on && \
    export GOPROXY=https://goproxy.cn,direct && \
    skywalking-go-agent -inject /go/src && \
    go build -o ./order -toolexec="skywalking-go-agent" -a /go/src

FROM crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/alpine:3.20
COPY --from=builder /go/src/order .
CMD [ "./order" ]


# Build the image
[root@habor demo-order-master]# docker build -t crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v1 .

# Push the image to the image registry
[root@habor demo-order-master]# docker push crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v1

[root@k8s-master01 demo]# vim mysql.yaml 
[root@k8s-master01 demo]# cat mysql.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mysql
  name: mysql
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mysql
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mysql
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mysql-data
      containers:
      - env:
        - name: MYSQL_ROOT_PASSWORD
          value: password
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/mysql:8.0.20
        imagePullPolicy: IfNotPresent
        name: mysql
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

[root@k8s-master01 demo]# vim mysql-svc.yaml 
[root@k8s-master01 demo]# cat mysql-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mysql
  name: mysql
  namespace: demo
spec:
  ports:
  - nodePort: 32541
    port: 3306
    protocol: TCP
    targetPort: 3306
  selector:
    app: mysql
  sessionAffinity: None
  type: NodePort

[root@k8s-master01 demo]# vim mysql-pvc.yaml 
[root@k8s-master01 demo]# cat mysql-pvc.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
  namespace: demo
spec:
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
  storageClassName: nfs-csi
  accessModes:
    - ReadWriteOnce


# Create the base component services:
[root@k8s-master01 demo]# kubectl create -f mysql.yaml -f mysql-svc.yaml -f mysql-pvc.yaml -n demo

# Check the Pod
[root@k8s-master01 demo]# kubectl get po -n demo 
NAME                            READY   STATUS    RESTARTS   AGE
....
mysql-6d698b4676-8hsn8          1/1     Running   0          3m22s

# Configure the database:
[root@k8s-master01 demo]# kubectl exec -it mysql-6d698b4676-8hsn8 -n demo -- bash
root@mysql-6d698b4676-8hsn8:/# mysql -uroot -ppassword
....
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database orders;
Query OK, 1 row affected (0.01 sec)

mysql> CREATE USER 'order'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT ALL ON orders.* TO 'order'@'%';
Query OK, 0 rows affected (0.02 sec)

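# Because the probe is injected into the Go code at compile time, no special configuration is needed at startup; only the relevant environment variables have to be kept: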
[root@k8s-master01 demo]# vim demo-order-deploy.yaml
[root@k8s-master01 demo]# cat demo-order-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-order
  name: demo-order
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-order
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-order
    spec:
      containers:
      - env:
        - name: MYSQL_HOST
          value: mysql
        - name: MYSQL_PORT
          value: "3306"
        - name: MYSQL_USER
          value: order
        - name: MYSQL_PASSWORD
          value: password
        - name: MYSQL_DB
          value: orders

        # Added variables
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
                #- name: SW_AGENT_NAME
                #  value: demo::demo-order
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_REPORTER_GRPC_BACKEND_SERVICE
          value: skywalking-oap.skywalking:11800

        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v2
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-order
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

# Next, create the service and test it:
[root@k8s-master01 demo]# kubectl create -f demo-order-deploy.yaml -n demo

# Check the Pod status
[root@k8s-master01 demo]# kubectl get po -n demo -owide
NAME                            READY   STATUS    RESTARTS   AGE    IP              NODE         NOMINATED NODE   READINESS GATES
demo-order-755cdc96-ltlzg       1/1     Running   0          65s    172.16.58.239   k8s-node02   <none>           <none>

# Test access (try it several times)
[root@k8s-master01 demo]# curl 172.16.58.239:8080/orders
[{"id":1,"name":"Order 1","price":10},{"id":2,"name":"Order 2","price":20}]

View the SkyWalking charts:

The database is detected automatically

2.5 Cleaning Up the Environment

[root@k8s-master01 demo]# kubectl delete deploy -n demo --all

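3. End-to-End Tracing Project Exercise

With the steps above, SkyWalking now receives trace data from both the Go and the Java service. Next, a complete project is used to consolidate what has been learned about SkyWalking.

Project architecture:

3.1 Service Deployment

3.1.1 Deploy the Database (reusing the configuration from the previous experiment)
# Deploy the database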
[root@k8s-master01 demo]# kubectl create -f mysql.yaml -f mysql-svc.yaml -f 
[root@k8s-master01 demo]# kubectl get po -n demo
NAME                     READY   STATUS    RESTARTS   AGE
mysql-6d698b4676-sk8hj   1/1     Running   0          17s

# Create the database and the account
[root@k8s-master01 demo]# kubectl exec -it mysql-6d698b4676-sk8hj -n demo -- bash
root@mysql-6d698b4676-sk8hj:/# mysql -uroot -ppassword
....
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database orders;
Query OK, 1 row affected (0.04 sec)

mysql> CREATE USER 'order'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.02 sec)

mysql> GRANT ALL ON orders.* TO 'order'@'%';
Query OK, 0 rows affected (0.01 sec)
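
3.1.2 Start the order Service
# Start the order service. It is a Go program, so monitoring data is pushed without any additional configuration changes.
# Reuse the configuration from the previous experiment and create a Service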
[root@k8s-master01 demo]# vim demo-order-svc.yaml 
[root@k8s-master01 demo]# cat demo-order-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: order
  name: order
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: demo-order
  sessionAffinity: None
  type: ClusterIP


# Configure an external domain name
[root@k8s-master01 demo]# vim demo-order-ingress.yaml 
[root@k8s-master01 demo]# cat demo-order-ingress.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-order
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: order
            port:
              number: 80
        path: /orders
        pathType: ImplementationSpecific

# Create the services
[root@k8s-master01 demo]# kubectl create -f demo-order-deploy.yaml -f demo-order-svc.yaml -f demo-order-ingress.yaml -n demo

# Check the service status:
[root@k8s-master01 demo]# kubectl get pod -n demo -owide
NAME                        READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
demo-order-755cdc96-8qlc9   1/1     Running   0          2m54s   172.16.58.245   k8s-node02   <none>           <none>
mysql-6d698b4676-sk8hj      1/1     Running   0          111m    172.16.58.241   k8s-node02   <none>           <none>

[root@k8s-master01 demo]# kubectl get svc,ingress -n demo
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/mysql   NodePort    10.111.54.12     <none>        3306:32541/TCP   111m
service/order   ClusterIP   10.101.166.166   <none>        80/TCP           3m1s

NAME                                   CLASS   HOSTS           ADDRESS          PORTS   AGE
ingress.networking.k8s.io/demo-order   nginx   demo.test.com   192.168.200.52   80      3m1s


# Test access:
[root@k8s-master01 demo]# echo "192.168.200.52 demo.test.com" >> /etc/hosts
[root@k8s-master01 demo]# curl demo.test.com/orders
[{"id":1,"name":"Order 1","price":10},{"id":2,"name":"Order 2","price":20},{"id":3,"name":"Order 1","price":10},{"id":4,"name":"Order 2","price":20}]

3.1.3 Deploy the handler Service (reusing the configuration from the previous experiment)
# Deploy the handler service
[root@k8s-master01 demo]# kubectl create -f demo-handler-deploy-sw.yaml -f demo-handler-svc.yaml -n demo

3.1.4 Deploy the receive Service
[root@k8s-master01 demo]# vim demo-receive-deploy.yaml 
[root@k8s-master01 demo]# cat demo-receive-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-receive
  name: demo-receive
  namespace: demo
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-receive
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-receive
    spec:
      volumes:
      - name: skywalking-agent
        emptyDir: {}
      initContainers:
      - name: agent-container
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-java-agent:9.4.0-java8
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
        command: [ "/bin/sh" ]
        args: [ "-c", "cp -R /skywalking/agent /agent/ ; mkdir -p /agent/agent/logs/ ; chown -R 1001.1001 /agent" ]
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: k8supgrade
        - name: SERVER_PORT
          value: "8080"
        - name: JAVA_TOOL_OPTIONS
          value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
        volumeMounts:
        - name: skywalking-agent
          mountPath: /skywalking
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-receive:v1-upgrade
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-receive
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      
      
[root@k8s-master01 demo]# vim demo-receive-svc.yaml     
[root@k8s-master01 demo]# cat demo-receive-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-receive
  name: demo-receive
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: demo-receive
  sessionAffinity: None
  type: ClusterIP
  

[root@k8s-master01 demo]# vim demo-receive-ingress.yaml
[root@k8s-master01 demo]# cat demo-receive-ingress.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  name: demo-receive
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: demo-receive
            port:
              number: 8080
        path: /receiveapi(/|$)(.*)
        pathType: ImplementationSpecific


# Deploy the receive service:
[root@k8s-master01 demo]# kubectl create -f demo-receive-deploy.yaml -f demo-receive-svc.yaml -f demo-receive-ingress.yaml -n demo
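
The receive Ingress combines the path `/receiveapi(/|$)(.*)` with the annotation `rewrite-target: /$2`, so the `/receiveapi` prefix is stripped before the request reaches the backend. A rough way to check what the backend will see is to apply the same regular expression by hand. This sketch is a simplification of what the NGINX Ingress controller does (for instance, it matches case-sensitively), and the request paths are made-up examples.

```python
import re

# Path pattern and rewrite target taken from demo-receive-ingress.yaml.
PATTERN = re.compile(r"^/receiveapi(/|$)(.*)")


def rewrite(path):
    """Return the path the backend would see, or None if the rule does not match."""
    match = PATTERN.match(path)
    if match is None:
        return None
    return "/" + match.group(2)  # rewrite-target: /$2


# Hypothetical request paths for illustration.
for path in ["/receiveapi/foo/bar", "/receiveapi", "/receiveapix", "/orders"]:
    print(path, "->", rewrite(path))
```

The `(/|$)` group is what keeps `/receiveapix` from matching: the prefix must be followed by a slash or by the end of the path.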

3.1.5 Deploy the Front-End Service
[root@k8s-master01 demo]# vim demo-ui-deploy.yaml 
[root@k8s-master01 demo]# cat demo-ui-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-ui
  name: demo-ui
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-ui
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-ui
    spec:
      containers:
      - image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-ui:sw
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 2
        name: demo-ui
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      

[root@k8s-master01 demo]# vim demo-ui-svc.yaml 
[root@k8s-master01 demo]# cat demo-ui-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-ui
  name: demo-ui
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: demo-ui
  sessionAffinity: None
  type: ClusterIP
  
  
[root@k8s-master01 demo]# vim demo-ui-ingress.yaml 
[root@k8s-master01 demo]# cat demo-ui-ingress.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ui
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: demo-ui
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
        
        
# Deploy the front-end service:
[root@k8s-master01 demo]# kubectl create -f demo-ui-deploy.yaml -f demo-ui-svc.yaml -f demo-ui-ingress.yaml -n demo

# After deployment, the final set of services is as follows:
[root@k8s-master01 demo]# kubectl get po,svc,ingress -n demo
NAME                                READY   STATUS    RESTARTS      AGE
pod/demo-handler-5b6f9dd9c7-g4k5s   1/1     Running   1 (25m ago)   26m
pod/demo-order-755cdc96-8qlc9       1/1     Running   0             47m
pod/demo-receive-5cf555cdfd-j5g76   1/1     Running   1 (14m ago)   16m
pod/demo-ui-66bb5f4d67-smbpb        1/1     Running   0             83s
pod/mysql-6d698b4676-sk8hj          1/1     Running   0             155m

NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/demo-receive   ClusterIP   10.103.251.213   <none>        8080/TCP         16m
service/demo-ui        ClusterIP   10.106.49.125    <none>        80/TCP           83s
service/handler        ClusterIP   10.102.43.148    <none>        80/TCP           26m
service/mysql          NodePort    10.111.54.12     <none>        3306:32541/TCP   155m
service/order          ClusterIP   10.101.166.166   <none>        80/TCP           47m

NAME                                     CLASS   HOSTS           ADDRESS          PORTS   AGE
ingress.networking.k8s.io/demo-order     nginx   demo.test.com   192.168.200.52   80      47m
ingress.networking.k8s.io/demo-receive   nginx   demo.test.com   192.168.200.52   80      16m
ingress.networking.k8s.io/demo-ui        nginx   demo.test.com   192.168.200.52   80      83s

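Next, access it through a browser:

3.2 Service Access and Monitoring

Open the page and test generating a password and creating an order:

After that, the architecture diagram of the whole project is visible:

Creating an order has a random delay, and the delay can also be seen in the trace information in SkyWalking:

3.3 Simulating a Failure

# Next, simulate a failure by scaling the handler service and MySQL down to zero replicas: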
[root@k8s-master01 demo]# kubectl scale deploy demo-handler mysql --replicas=0 -n demo

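Access the page again to collect trace information for the failed requests:

4. SkyWalking Alarms

4.1 SkyWalking Alarm Notifications

SkyWalking can monitor the metrics it collects and raise alarms, so that anomalies are reacted to promptly. With well-configured alarm rules and hooks, potential problems can be prevented and existing ones located quickly.

The core of SkyWalking alarming is a set of rules made up of three parts:

  • Metrics: the performance metrics that SkyWalking collects about services, instances, and endpoints
  • Rules: the conditions that trigger an alarm, defined by default in config/alarm-settings.yml; comparison operators, logical operators, and so on are supported
  • Hooks: the actions executed after an alarm is triggered, such as sending a notification

4.2 SkyWalking Alarm Rules

A SkyWalking alarm rule consists of the following elements:

  • Rule name: globally unique and must end with _rule
  • expression: defined in MQE (Metrics Query Expression). The result of the expression must be a SINGLE_VALUE, and the root operation must be a comparison or boolean operation whose result is 1 (true) or 0 (false). The alarm is triggered when the result is 1 (true)
  • include-names: a list of entity names to include; entities can be a Service, Instance, Endpoint, and so on
  • exclude-names: entity names to exclude
  • include-names-regex: regular expression for names to include
  • exclude-names-regex: regular expression for names to exclude
  • tags: additional alarm labels, for example level=warning
  • period: the size of the time window over which the alarm condition is checked, in minutes
  • silence-period: after an alarm is triggered, it is not triggered again during this period. If not specified, it is the same as period
  • hooks: the names of the hooks bound to the alarm. The name format is {hookType}.{hookName} (for example slack.custom1), and the hook must be defined in the hooks section of alarm-settings.yml. If no hook name is specified, the global hooks are used
  • message: the alarm message, used to describe the current alarm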
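
To make the interplay of expression and period concrete, the sketch below mimics how an expression of the form `sum(metric > threshold) >= N` is evaluated over one window: each minute contributes 1 or 0, and the rule fires when enough minutes exceed the threshold. This is a plain-Python illustration of the semantics, not the MQE engine; the `rule_fires` helper and the sample values are hypothetical.

```python
def rule_fires(values_per_minute, threshold, min_hits):
    """Mimic `sum(metric > threshold) >= min_hits` over one period window.

    Each comparison yields 1 or 0 per minute; the rule fires when the
    number of 1s reaches min_hits.
    """
    hits = sum(1 for v in values_per_minute if v > threshold)
    return hits >= min_hits


# Hypothetical response times in ms for the last 10 minutes (period: 10).
window = [800, 1200, 950, 1500, 700, 1100, 600, 640, 900, 720]
print(rule_fires(window, threshold=1000, min_hits=3))  # True: 3 minutes exceed 1000 ms
print(rule_fires(window, threshold=1000, min_hits=4))  # False
```

Requiring several hits inside the window, rather than a single one, keeps a one-off spike from raising an alarm.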

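4.3 Configuring a DingTalk Alert Robot

To use DingTalk alerts, first create a group chat and then add a robot to it:

Add a robot

Choose "Custom"

Fill in the robot name and copy the secret key

Add the robot and copy the Webhook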
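
The secret key copied above is used to sign each request sent to the Webhook. As far as DingTalk's signing scheme is documented, the signature is an HMAC-SHA256 over the timestamp and the secret, Base64-encoded and then URL-encoded. The sketch below shows that computation so the Webhook can be tested by hand; `sign_webhook` is a hypothetical helper, the token and secret are placeholders, and when `secret` is set in a SkyWalking hook the OAP is expected to do this signing itself.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def sign_webhook(webhook_url, secret, timestamp_ms=None):
    """Append the timestamp and sign parameters to a DingTalk robot Webhook URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    # The string to sign is "<timestamp>\n<secret>", keyed with the secret itself.
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"


# Placeholder values; substitute the Webhook and secret copied from the robot settings.
url = sign_webhook(
    "https://oapi.dingtalk.com/robot/send?access_token=<your-token>",
    "SEC<your-secret>",
    timestamp_ms=1700000000000,
)
print(url)
```

A signed URL is only valid for a short time around its timestamp, so it should be generated right before each request rather than stored.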

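4.4 Connecting SkyWalking to DingTalk Alerts

First, place the SkyWalking alarm configuration file in the SkyWalking chart directory:

# Create the directory for the alarm configuration
[root@k8s-master01 demo]# mkdir -p ../files/conf.d/oap
[root@k8s-master01 demo]# cd ../files/conf.d/oap

# Copy the alarm template file out of the OAP container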
[root@k8s-master01 oap]# kubectl cp skywalking-oap-6d8f594b7c-xrnbr:/skywalking/config/alarm-settings.yml ./alarm-settings.yml -n skywalking

# Add the DingTalk alert hook
[root@k8s-master01 oap]# vim alarm-settings.yml 
[root@k8s-master01 oap]# tail -14 alarm-settings.yml 
hooks:
  dingtalk:
    default:
      is-default: true
      text-template: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
            } 
        }
      webhooks:
      - url: https://oapi.dingtalk.com/robot/send?access_token=c7cd207fd31cd72f433d67effda0568b681b10f626f97c02cb55f03b73b651c5
        secret: SECedef18728aa48ea6ca4c2f595967f6c389e2fc4d13bfca2741087b8c8878e017

# Apply the configuration (go back to the SkyWalking chart root directory first)
[root@k8s-master01 oap]# cd ../../..
[root@k8s-master01 skywalking]# helm upgrade skywalking . -n skywalking

# Check the Pod update status:
[root@k8s-master01 skywalking]# kubectl get po -n skywalking | grep oap
skywalking-oap-5644bbbd46-hvvxx   1/1     Running     0          11m

# Check whether the configuration file was updated:
[root@k8s-master01 skywalking]# kubectl exec skywalking-oap-5644bbbd46-hvvxx -n skywalking -- tail -14 config/alarm-settings.yml
Defaulted container "oap" out of: oap, wait-for-elasticsearch (init)
hooks:
  dingtalk:
    default:
      is-default: true
      text-template: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
            } 
        }
      webhooks:
      - url: https://oapi.dingtalk.com/robot/send?access_token=c7cd207fd31cd72f433d67effda0568b681b10f626f97c02cb55f03b73b651c5
        secret: SECedef18728aa48ea6ca4c2f595967f6c389e2fc4d13bfca2741087b8c8878e017

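Send requests to the service to trigger an alarm:

After a short wait, the alarm message appears in DingTalk

4.5 Custom Alarm Rules

In addition to the default alarms, custom alarms can be added. For example, to monitor whether threads in a Java service's JVM are blocked, use the instance_jvm_thread_blocked_state_thread_count metric.

# For example, alert when the number of blocked JVM threads is greater than 5:
[root@k8s-master01 oap]# vim alarm-settings.yml 
[root@k8s-master01 oap]# cat alarm-settings.yml 
....
rules:
  thread_block_rule:
    expression: sum(instance_jvm_thread_blocked_state_thread_count > 5) >= 2
    period: 5       # Check the data from the past 5 minutes
    message: "Instance {name}: blocked JVM threads exceeded 5 in at least 2 of the past 5 minutes"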
....

# After changing the configuration file, apply the configuration:
[root@k8s-master01 skywalking]# helm upgrade skywalking -n skywalking .
[root@k8s-master01 skywalking]# kubectl rollout restart deploy skywalking-oap -n skywalking

This blog post is based on: https://edu.51cto.com/lecturer/11062970.html
