17. K8s Observability: End-to-End Tracing

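1. Introduction to SkyWalking

1.1 Why an end-to-end tracing platform is needed

  • Quickly locate the point of failure
  • Quickly locate performance bottlenecks
  • Understand service dependencies
  • Visualize traffic globally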

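1.2 Core components and working principle of end-to-end tracing

1.2.1 Core concepts of end-to-end tracing

  • Trace: the complete processing of one request is called a Trace. It covers the whole path from the client issuing the request to the backend finishing its work. A Trace is made up of multiple Spans.
  • Span: a Span represents one unit of work within a Trace, such as a function call or an HTTP request. Each Span records an operation name, a start time, an end time, and metadata about the operation. Spans have parent-child relationships, and together the Spans of a request make up its Trace.
  • Trace ID and Span ID: every Trace has a unique Trace ID, and every Span has a unique Span ID plus a reference to its parent Span.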
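
The concepts above can be sketched as a small data model. This is only an illustration, not SkyWalking's internal representation: the `Span` class, its field names, and the sample values are assumptions made for this example. It shows how the parent references are enough to rebuild the call tree of one Trace.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    # All field names here are illustrative assumptions.
    trace_id: str                  # shared by every Span of the same Trace
    span_id: str                   # unique within the Trace
    parent_span_id: Optional[str]  # None marks the root Span
    operation: str                 # e.g. an HTTP route or a function name
    start_ms: int
    end_ms: int
    tags: Dict[str, str] = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render(spans: List[Span]) -> List[str]:
    """Return one indented line per Span, parents before children."""
    by_parent: Dict[Optional[str], List[Span]] = {}
    for span in spans:
        by_parent.setdefault(span.parent_span_id, []).append(span)

    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in sorted(by_parent.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.operation} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Made-up sample data: one request that passes through three operations.
trace = [
    Span("t-1", "s-1", None, "GET /orders", 0, 120),
    Span("t-1", "s-2", "s-1", "OrderService.list", 5, 110),
    Span("t-1", "s-3", "s-2", "MySQL SELECT", 20, 90),
]
print("\n".join(render(trace)))
```

The indentation of the printed lines mirrors the parent-child relationships, which is essentially what a trace view in a tracing UI displays.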
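1.2.2 How end-to-end tracing works

1. The client sends a request.

2. Service A starts handling the request and creates the initial Trace and Span.

3. Service A forwards the request to service B and passes along the Trace ID and Span ID.

4. Service B uses the propagated IDs to create a new Span and marks the Span from service A as its parent.

5. When every service has finished, the Spans each one produced are sent to the tracing platform for aggregation.

6. Users can inspect the details of the whole Trace in the UI.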
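
The steps above can be simulated in a few lines. This is a minimal sketch under stated assumptions: the header name `x-demo-trace` and its `trace_id:span_id` format are invented for this example (real SkyWalking agents use their own propagation header and encoding), and the `collected` list stands in for the tracing backend.

```python
import uuid
from typing import Dict, List, Optional, Tuple

# Hypothetical header name and format, used only in this sketch.
TRACE_HEADER = "x-demo-trace"

# Stands in for the tracing platform that receives reported Spans (step 5).
collected: List[dict] = []


def new_id() -> str:
    return uuid.uuid4().hex[:16]


def extract(headers: Dict[str, str]) -> Tuple[str, Optional[str]]:
    """Read (trace_id, parent_span_id) from the headers, or start a new Trace."""
    value = headers.get(TRACE_HEADER)
    if value is None:
        return new_id(), None
    trace_id, parent_span_id = value.split(":", 1)
    return trace_id, parent_span_id


def inject(trace_id: str, span_id: str) -> Dict[str, str]:
    """Build the headers that carry the trace context to the next service."""
    return {TRACE_HEADER: f"{trace_id}:{span_id}"}


def start_span(service: str, headers: Dict[str, str]) -> dict:
    trace_id, parent_span_id = extract(headers)
    span = {
        "service": service,
        "trace_id": trace_id,
        "span_id": new_id(),
        "parent_span_id": parent_span_id,
    }
    collected.append(span)  # report the Span to the "backend"
    return span


def service_b(headers: Dict[str, str]) -> dict:
    # Step 4: continue the Trace and mark the caller's Span as parent.
    return start_span("service-b", headers)


def service_a(headers: Dict[str, str]) -> dict:
    # Step 2: no incoming context, so a new Trace and root Span are created.
    span = start_span("service-a", headers)
    # Step 3: forward the request together with the Trace ID and Span ID.
    service_b(inject(span["trace_id"], span["span_id"]))
    return span


root = service_a({})  # step 1: the client request arrives without trace headers
for span in collected:
    print(span)
```

Both printed Spans share one Trace ID, and the Span of service-b points at the Span of service-a as its parent, which is what lets the platform stitch them into a single Trace.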

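1.3 What is SkyWalking?

SkyWalking is an application performance monitoring (Application Performance Monitor, APM) and observability analysis platform (Observability Analysis Platform) for distributed systems. It provides distributed tracing, metrics monitoring, diagnostic information for troubleshooting, service mesh telemetry analysis, alerting on anomalies, and a visual UI, helping developers and operations teams better understand and manage their applications and services.

Core features:

  • Distributed tracing: SkyWalking generates trace data for requests, which shows what happened along the whole call chain and helps locate performance bottlenecks or root causes
  • Metrics analysis: service health can be measured through key performance indicators (KPIs) such as response time, throughput, and success rate
  • Alerting: custom alarm rules are supported, and notifications are sent automatically when an anomaly is detected
  • Rich UI: an intuitive web UI for viewing traces, metrics, and the service topology
  • Low intrusiveness: code-level monitoring is implemented through bytecode injection (as done by the Java agent), so services can be onboarded without changing business logic
  • Multi-language support: besides Java, .NET Core, Node.js, Python, Go, and other languages are supported, covering different development environments
  • Platform integration: integrates with service meshes and Kubernetes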

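1.4 SkyWalking architecture

1.5 SkyWalking core terminology

  • Service: one application, or a group of applications, that provides the same functionality or business logic. It can be a microservice, a web service, a database, or any other kind of backend service
  • Instance: one concrete running copy of a Service. In a distributed environment the same Service may be deployed on several servers or containers, and the copy running on each server or container is an Instance
  • Endpoint: a specific path or interface of a Service that can be called from outside. Endpoints are the entry points through which a Service exposes its functionality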

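2. SkyWalking cluster installation

2.1 Cluster planning

| Hostname | Physical IP | OS | Resources | Role |
| --- | --- | --- | --- | --- |
| k8s-master01 | 192.168.200.50 | Rocky 9.4 | 4 cores, 8 GB | Master node |
| k8s-node01 | 192.168.200.51 | Rocky 9.4 | 4 cores, 8 GB | Node01 node |
| k8s-node02 | 192.168.200.52 | Rocky 9.4 | 4 cores, 8 GB | Node02 node |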

2.2 SkyWalking cluster installation

# Add the SkyWalking Helm repository
[root@k8s-master01 ~]# export REPO=skywalking
[root@k8s-master01 ~]# helm repo add ${REPO} https://apache.jfrog.io/artifactory/skywalking-helm

# Download the skywalking chart
[root@k8s-master01 ~]# helm pull skywalking/skywalking

# Extract the chart archive:
[root@k8s-master01 ~]# tar xf skywalking-4.3.0.tgz 

[root@k8s-master01 ~]# cd skywalking
[root@k8s-master01 skywalking]# vim values.yaml 
[root@k8s-master01 skywalking]# cat values.yaml 
# Change the Elasticsearch configuration:
elasticsearch:
  antiAffinity: soft
  clusterHealthCheckParams: wait_for_status=green&timeout=10s
  clusterName: es-cluster
  config:
    host: elasticsearch
    password: admin
    port:
      http: 9200
    user: admin
  enabled: true
  esMajorVersion: "7"
  image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/elasticsearch
  imagePullPolicy: IfNotPresent
  imageTag: 7.5.1
  persistence:
    annotations: {}
    enabled: true
  replicas: 3
  resources:
    limits:
      cpu: 2000m
      memory: 3Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  volumeClaimTemplate:
    storageClassName: nfs-csi
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 30Gi
initContainer:
  image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/busybox
  tag: "1.30"


# Change the OAP resource configuration:
oap:
  image:
    pullPolicy: IfNotPresent
    repository: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-oap-server
    tag: 10.2.0
  javaOpts: -Xmx2g -Xms2g
  replicas: 3
  resources: 
    limits:
      cpu: 2000m
      memory: 3Gi
    requests:
      cpu: 1000m
      memory: 2Gi
  storageType: elasticsearch


# Change the UI configuration:
ui:
  image:
    pullPolicy: IfNotPresent
    repository: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-ui
    tag: 10.2.0
  replicas: 3
  service:
    annotations: {}
    externalPort: 80
    internalPort: 8080
    type: NodePort

[root@k8s-master01 skywalking]# vim templates/oap-deployment.yaml 
[root@k8s-master01 skywalking]# sed -n "91,100p" templates/oap-deployment.yaml 
        livenessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 300
          periodSeconds: 20
        readinessProbe:
          tcpSocket:
            port: 12800
          initialDelaySeconds: 300
          periodSeconds: 20

# Remove conflicting resources
[root@k8s-master01 skywalking]# rm -rf charts/elasticsearch/templates/pod*

# Install:
[root@k8s-master01 skywalking]# helm install skywalking -n skywalking . --create-namespace

# Check the installation status:
[root@k8s-master01 skywalking]# kubectl get po -n skywalking
NAME                              READY   STATUS    RESTARTS   AGE
es-cluster-master-0               1/1     Running   0          13m
es-cluster-master-1               1/1     Running   0          13m
es-cluster-master-2               1/1     Running   0          13m
skywalking-es-init-mkvw7          1/1     Running   0          13m
skywalking-oap-6d8f594b7c-7w785   1/1     Running   0          13m
skywalking-oap-6d8f594b7c-p4z64   1/1     Running   0          13m
skywalking-oap-6d8f594b7c-vnp8t   1/1     Running   0          13m
skywalking-ui-774674cc7-qcm79     1/1     Running   0          13m
skywalking-ui-774674cc7-qhgg8     1/1     Running   0          13m
skywalking-ui-774674cc7-qwkjm     1/1     Running   0          13m

# Check the Service
[root@k8s-master01 skywalking]# kubectl get svc skywalking-ui -n skywalking
NAME            TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
skywalking-ui   NodePort   10.108.110.98   <none>        80:31319/TCP   14m

Access skywalking-ui in a browser through any node IP and the NodePort shown above (31319 in this output)

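2.3 Connecting a Java service to SkyWalking

Java Agent reference documentation:

Java:

  • JAVA_TOOL_OPTIONS: JVM startup options. The agent can be loaded through this variable, for example -javaagent:/skywalking/agent/skywalking-agent.jar
  • SW_AGENT_NAME: the service name. The suggested format is <group name>::<logical name>; using namespace::service name is recommended
  • SW_AGENT_INSTANCE_NAME: the instance name, used to tell apart different instances of the same service. It defaults to UUID@hostname; using the Pod name is recommended
  • SW_AGENT_COLLECTOR_BACKEND_SERVICES: the SkyWalking OAP address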
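
As a rough illustration of how these variables fit together in a Pod, the sketch below mimics the way Kubernetes resolves `$(VAR)` references to environment variables defined earlier in the same container. The `expand` and `build_agent_env` helpers are hypothetical names for this example, and the expansion is deliberately simplified (it ignores the `$$(VAR)` escape, for instance).

```python
import re


def expand(value: str, defined: dict) -> str:
    """Simplified $(VAR) expansion; unknown references are left untouched."""
    def replace(match):
        return defined.get(match.group(1), match.group(0))

    return re.sub(r"\$\(([A-Za-z_][A-Za-z0-9_]*)\)", replace, value)


def build_agent_env(namespace: str, app: str, pod_name: str, oap_address: str) -> dict:
    """Assemble the agent-related variables in declaration order."""
    env = {}
    env["JAVA_TOOL_OPTIONS"] = "-javaagent:/skywalking/agent/skywalking-agent.jar"
    env["NAMESPACE"] = namespace      # from metadata.namespace
    env["APP"] = app                  # from metadata.labels['app']
    # NAMESPACE and APP are already defined, so the reference can be resolved.
    env["SW_AGENT_NAME"] = expand("$(NAMESPACE)::$(APP)", env)
    env["SW_AGENT_INSTANCE_NAME"] = pod_name  # from metadata.name
    env["SW_AGENT_COLLECTOR_BACKEND_SERVICES"] = oap_address
    return env


# Example values only; the Pod name is made up.
env = build_agent_env(
    namespace="demo",
    app="demo-handler",
    pod_name="demo-handler-example-pod",
    oap_address="skywalking-oap.skywalking:11800",
)
for key, value in env.items():
    print(f"{key}={value}")
```

The order matters: a variable can only reference variables that were declared before it, which is why NAMESPACE and APP come before SW_AGENT_NAME.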
[root@k8s-master01 skywalking]# mkdir demo/
[root@k8s-master01 skywalking]# cd demo/
[root@k8s-master01 demo]# vim demo-handler-deploy-sw.yaml 
[root@k8s-master01 demo]# cat demo-handler-deploy-sw.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-handler
  name: demo-handler
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-handler
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-handler
    spec:
      volumes:                      # Add the volume and the init container
      - name: skywalking-agent
        emptyDir: {}
      initContainers:
      - name: agent-container
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-java-agent:9.4.0-java8
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
        command: [ "/bin/sh" ]
        args: [ "-c", "cp -R /skywalking/agent /agent/ ; mkdir -p /agent/agent/logs/ ; chown -R 1001:1001 /agent" ]
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: k8supgrade
        - name: SERVER_PORT
          value: "8080"
        - name: JAVA_TOOL_OPTIONS           # Add the environment variables
          value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-handler:v1-upgrade
        imagePullPolicy: IfNotPresent
        volumeMounts:                       # Add the volume mount
        - name: skywalking-agent
          mountPath: /skywalking
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-handler
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

# Next, create the service and test it:
[root@k8s-master01 demo]# kubectl create namespace demo
[root@k8s-master01 demo]# kubectl create -f demo-handler-deploy-sw.yaml -n demo

# Check the Pod status
[root@k8s-master01 demo]# kubectl get po -n demo -owide
NAME                            READY   STATUS    RESTARTS   AGE   IP              NODE         NOMINATED NODE   READINESS GATES
demo-handler-5b6f9dd9c7-88pr6   1/1     Running   0          77s   172.16.58.233   k8s-node02   <none>           <none>

# Test access (repeat a few times to generate traffic)
[root@k8s-master01 demo]# curl 172.16.58.233:8080/api/generate
O4E,\1L!u-bzTE[7Fn#VCS+eK?fwcp|k

View the SkyWalking charts:

Topology map

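2.4 Connecting a Go service to SkyWalking

Go Agent reference documentation:

Go:

  • SW_AGENT_REPORTER_GRPC_BACKEND_SERVICE: the SkyWalking OAP address
  • SW_AGENT_NAME: the service name. The suggested format is <group name>::<logical name>; using namespace::service name is recommended
  • SW_AGENT_INSTANCE_NAME: the instance name, used to tell apart different instances of the same service. It defaults to UUID@hostname; using the Pod name is recommended

# Download the test program: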
[root@habor ~]# git clone https://gitee.com/dukuan/demo-order.git

# Write the Dockerfile
[root@habor ~]# cd demo-order-master
[root@habor demo-order-master]# vim Dockerfile 
[root@habor demo-order-master]# cat Dockerfile 
FROM crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-go:0.5.0-go1.22 AS builder
COPY ./ /go/src/
WORKDIR /go/src/

RUN export GO111MODULE=on && \
    export GOPROXY=https://goproxy.cn,direct && \
    skywalking-go-agent -inject /go/src && \
    go build -o ./order -toolexec="skywalking-go-agent" -a /go/src

FROM crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/alpine:3.20
COPY --from=builder /go/src/order .
CMD [ "./order" ]


# Build the image
[root@habor demo-order-master]# docker build -t crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v1 .

# Push the image to the registry
[root@habor demo-order-master]# docker push crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v1

[root@k8s-master01 demo]# vim mysql.yaml 
[root@k8s-master01 demo]# cat mysql.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: mysql
  name: mysql
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: mysql
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: mysql
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mysql-data
      containers:
      - env:
        - name: MYSQL_ROOT_PASSWORD
          value: password
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/mysql:8.0.20
        imagePullPolicy: IfNotPresent
        name: mysql
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

[root@k8s-master01 demo]# vim mysql-svc.yaml 
[root@k8s-master01 demo]# cat mysql-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: mysql
  name: mysql
  namespace: demo
spec:
  ports:
  - nodePort: 32541
    port: 3306
    protocol: TCP
    targetPort: 3306
  selector:
    app: mysql
  sessionAffinity: None
  type: NodePort

[root@k8s-master01 demo]# vim mysql-pvc.yaml 
[root@k8s-master01 demo]# cat mysql-pvc.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
  namespace: demo
spec:
  resources:
    requests:
      storage: 5Gi
  volumeMode: Filesystem
  storageClassName: nfs-csi
  accessModes:
    - ReadWriteOnce


# Create the base component resources:
[root@k8s-master01 demo]# kubectl create -f mysql.yaml -f mysql-svc.yaml -f mysql-pvc.yaml -n demo

# Check the Pod
[root@k8s-master01 demo]# kubectl get po -n demo 
NAME                            READY   STATUS    RESTARTS   AGE
....
mysql-6d698b4676-8hsn8          1/1     Running   0          3m22s

# Configure the database:
[root@k8s-master01 demo]# kubectl exec -it mysql-6d698b4676-8hsn8 -n demo -- bash
root@mysql-6d698b4676-8hsn8:/# mysql -uroot -ppassword
....
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database orders;
Query OK, 1 row affected (0.01 sec)

mysql> CREATE USER 'order'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.01 sec)

mysql> GRANT ALL ON orders.* TO 'order'@'%';
Query OK, 0 rows affected (0.02 sec)

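# The probe is already injected into the Go code at compile time, so no special startup configuration is needed; just keep the relevant environment variables: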
[root@k8s-master01 demo]# vim demo-order-deploy.yaml
[root@k8s-master01 demo]# cat demo-order-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-order
  name: demo-order
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-order
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-order
    spec:
      containers:
      - env:
        - name: MYSQL_HOST
          value: mysql
        - name: MYSQL_PORT
          value: "3306"
        - name: MYSQL_USER
          value: order
        - name: MYSQL_PASSWORD
          value: password
        - name: MYSQL_DB
          value: orders

        # Added variables
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
                #- name: SW_AGENT_NAME
                #  value: demo::demo-order
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_REPORTER_GRPC_BACKEND_SERVICE
          value: skywalking-oap.skywalking:11800

        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-order:v2
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-order
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

# Next, create the service and test it:
[root@k8s-master01 demo]# kubectl create -f demo-order-deploy.yaml -n demo

# Check the Pod status
[root@k8s-master01 demo]# kubectl get po -n demo -owide
NAME                            READY   STATUS    RESTARTS   AGE    IP              NODE         NOMINATED NODE   READINESS GATES
demo-order-755cdc96-ltlzg       1/1     Running   0          65s    172.16.58.239   k8s-node02   <none>           <none>

# Test access (repeat a few times to generate traffic)
[root@k8s-master01 demo]# curl 172.16.58.239:8080/orders
[{"id":1,"name":"Order 1","price":10},{"id":2,"name":"Order 2","price":20}]

View the SkyWalking charts:

The database is detected automatically

2.5 Clean up the environment

[root@k8s-master01 demo]# kubectl delete deploy -n demo --all

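3. End-to-end tracing project exercise

So far SkyWalking has received trace data from both a Go service and a Java service. The following complete project consolidates what has been covered.

Project architecture:

3.1 Service deployment

3.1.1 Deploy the database (reusing the configuration from the previous exercise)
# Deploy the database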
[root@k8s-master01 demo]# kubectl create -f mysql.yaml -f mysql-svc.yaml -f 
[root@k8s-master01 demo]# kubectl get po -n demo
NAME                     READY   STATUS    RESTARTS   AGE
mysql-6d698b4676-sk8hj   1/1     Running   0          17s

# Create the database and the account
[root@k8s-master01 demo]# kubectl exec -it mysql-6d698b4676-sk8hj -n demo -- bash
root@mysql-6d698b4676-sk8hj:/# mysql -uroot -ppassword
....
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database orders;
Query OK, 1 row affected (0.04 sec)

mysql> CREATE USER 'order'@'%' IDENTIFIED BY 'password';
Query OK, 0 rows affected (0.02 sec)

mysql> GRANT ALL ON orders.* TO 'order'@'%';
Query OK, 0 rows affected (0.01 sec)
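3.1.2 Start the order service
# Start the order service. It is a Go program, so monitoring data is pushed without any extra configuration changes:
# Reuse the configuration from the previous exercise and create a Service for it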
[root@k8s-master01 demo]# vim demo-order-svc.yaml 
[root@k8s-master01 demo]# cat demo-order-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: order
  name: order
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: demo-order
  sessionAffinity: None
  type: ClusterIP


# Configure an external domain name
[root@k8s-master01 demo]# vim demo-order-ingress.yaml 
[root@k8s-master01 demo]# cat demo-order-ingress.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-order
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: order
            port:
              number: 80
        path: /orders
        pathType: ImplementationSpecific

# Create the resources
[root@k8s-master01 demo]# kubectl create -f demo-order-deploy.yaml -f demo-order-svc.yaml -f demo-order-ingress.yaml -n demo

# Check the service status:
[root@k8s-master01 demo]# kubectl get pod -n demo -owide
NAME                        READY   STATUS    RESTARTS   AGE     IP              NODE         NOMINATED NODE   READINESS GATES
demo-order-755cdc96-8qlc9   1/1     Running   0          2m54s   172.16.58.245   k8s-node02   <none>           <none>
mysql-6d698b4676-sk8hj      1/1     Running   0          111m    172.16.58.241   k8s-node02   <none>           <none>

[root@k8s-master01 demo]# kubectl get svc,ingress -n demo
NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/mysql   NodePort    10.111.54.12     <none>        3306:32541/TCP   111m
service/order   ClusterIP   10.101.166.166   <none>        80/TCP           3m1s

NAME                                   CLASS   HOSTS           ADDRESS          PORTS   AGE
ingress.networking.k8s.io/demo-order   nginx   demo.test.com   192.168.200.52   80      3m1s


# Test access:
[root@k8s-master01 demo]# echo "192.168.200.52 demo.test.com" >> /etc/hosts
[root@k8s-master01 demo]# curl demo.test.com/orders
[{"id":1,"name":"Order 1","price":10},{"id":2,"name":"Order 2","price":20},{"id":3,"name":"Order 1","price":10},{"id":4,"name":"Order 2","price":20}]
3.1.3 Deploy the handler service (reusing the configuration from the previous exercise)
# Deploy the handler service
[root@k8s-master01 demo]# kubectl create -f demo-handler-deploy-sw.yaml -f demo-handler-svc.yaml -n demo
3.1.4 Deploy the receive service
[root@k8s-master01 demo]# vim demo-receive-deploy.yaml 
[root@k8s-master01 demo]# cat demo-receive-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-receive
  name: demo-receive
  namespace: demo
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-receive
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-receive
    spec:
      volumes:
      - name: skywalking-agent
        emptyDir: {}
      initContainers:
      - name: agent-container
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/skywalking-java-agent:9.4.0-java8
        volumeMounts:
        - name: skywalking-agent
          mountPath: /agent
        command: [ "/bin/sh" ]
        args: [ "-c", "cp -R /skywalking/agent /agent/ ; mkdir -p /agent/agent/logs/ ; chown -R 1001:1001 /agent" ]
      containers:
      - env:
        - name: SPRING_PROFILES_ACTIVE
          value: k8supgrade
        - name: SERVER_PORT
          value: "8080"
        - name: JAVA_TOOL_OPTIONS
          value: "-javaagent:/skywalking/agent/skywalking-agent.jar"
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: APP
          valueFrom:
            fieldRef:
              fieldPath: metadata.labels['app']
        - name: SW_AGENT_NAME
          value: "$(NAMESPACE)::$(APP)"
        - name: SW_AGENT_INSTANCE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: SW_AGENT_COLLECTOR_BACKEND_SERVICES
          value: skywalking-oap.skywalking:11800
        volumeMounts:
        - name: skywalking-agent
          mountPath: /skywalking
        image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-receive:v1-upgrade
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        name: demo-receive
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 8080
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      
      
[root@k8s-master01 demo]# vim demo-receive-svc.yaml     
[root@k8s-master01 demo]# cat demo-receive-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-receive
  name: demo-receive
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: demo-receive
  sessionAffinity: None
  type: ClusterIP
  

[root@k8s-master01 demo]# vim demo-receive-ingress.yaml
[root@k8s-master01 demo]# cat demo-receive-ingress.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
  name: demo-receive
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: demo-receive
            port:
              number: 8080
        path: /receiveapi(/|$)(.*)
        pathType: ImplementationSpecific


# Deploy the receive service:
[root@k8s-master01 demo]# kubectl create -f demo-receive-deploy.yaml -f demo-receive-svc.yaml -f demo-receive-ingress.yaml -n demo
3.1.5 Deploy the frontend service
[root@k8s-master01 demo]# vim demo-ui-deploy.yaml 
[root@k8s-master01 demo]# cat demo-ui-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: demo-ui
  name: demo-ui
  namespace: demo
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: demo-ui
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: demo-ui
    spec:
      containers:
      - image: crpi-q1nb2n896zwtcdts.cn-beijing.personal.cr.aliyuncs.com/ywb01/demo-ui:sw
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 2
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 2
        name: demo-ui
        readinessProbe:
          failureThreshold: 2
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 80
          timeoutSeconds: 2
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      

[root@k8s-master01 demo]# vim demo-ui-svc.yaml 
[root@k8s-master01 demo]# cat demo-ui-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app: demo-ui
  name: demo-ui
  namespace: demo
spec:
  ports:
  - name: http-web
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: demo-ui
  sessionAffinity: None
  type: ClusterIP
  
  
[root@k8s-master01 demo]# vim demo-ui-ingress.yaml 
[root@k8s-master01 demo]# cat demo-ui-ingress.yaml 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: demo-ui
  namespace: demo
spec:
  ingressClassName: nginx
  rules:
  - host: demo.test.com
    http:
      paths:
      - backend:
          service:
            name: demo-ui
            port:
              number: 80
        path: /
        pathType: ImplementationSpecific
        
        
# Deploy the frontend service:
[root@k8s-master01 demo]# kubectl create -f demo-ui-deploy.yaml -f demo-ui-svc.yaml -f demo-ui-ingress.yaml -n demo

# After deployment, the final set of resources looks like this:
[root@k8s-master01 demo]# kubectl get po,svc,ingress -n demo
NAME                                READY   STATUS    RESTARTS      AGE
pod/demo-handler-5b6f9dd9c7-g4k5s   1/1     Running   1 (25m ago)   26m
pod/demo-order-755cdc96-8qlc9       1/1     Running   0             47m
pod/demo-receive-5cf555cdfd-j5g76   1/1     Running   1 (14m ago)   16m
pod/demo-ui-66bb5f4d67-smbpb        1/1     Running   0             83s
pod/mysql-6d698b4676-sk8hj          1/1     Running   0             155m

NAME                   TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/demo-receive   ClusterIP   10.103.251.213   <none>        8080/TCP         16m
service/demo-ui        ClusterIP   10.106.49.125    <none>        80/TCP           83s
service/handler        ClusterIP   10.102.43.148    <none>        80/TCP           26m
service/mysql          NodePort    10.111.54.12     <none>        3306:32541/TCP   155m
service/order          ClusterIP   10.101.166.166   <none>        80/TCP           47m

NAME                                     CLASS   HOSTS           ADDRESS          PORTS   AGE
ingress.networking.k8s.io/demo-order     nginx   demo.test.com   192.168.200.52   80      47m
ingress.networking.k8s.io/demo-receive   nginx   demo.test.com   192.168.200.52   80      16m
ingress.networking.k8s.io/demo-ui        nginx   demo.test.com   192.168.200.52   80      83s

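Next, open the page in a browser:

3.2 Service access and monitoring

Next, visit the page and test password generation and order creation:

After that, the architecture diagram of the whole project is visible in SkyWalking:

Order creation includes a random delay, and that latency is also visible in the trace information in SkyWalking:

3.3 Simulating a failure

# Next, simulate a failure by scaling the handler service (and MySQL) down to zero replicas:
[root@k8s-master01 demo]# kubectl scale deploy demo-handler mysql --replicas=0 -n demo

Accessing the page again produces failed traces, which SkyWalking collects: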

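4. SkyWalking alerting

4.1 SkyWalking alarm notifications

SkyWalking can monitor the Metrics it collects and raise alarms, so that anomalies are acted on promptly. With well-chosen alarm rules and hooks, potential problems can be caught early and located quickly.

The alarm core of SkyWalking is driven by a set of rules and consists of three parts:

  • Metrics: the performance metrics SkyWalking collects about services, instances, and endpoints
  • Rules: the conditions that trigger an alarm, defined by default in config/alarm-settings.yml; comparison and logical operators are supported
  • Hooks: actions executed after an alarm is triggered, such as sending a notification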

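4.2 SkyWalking alarm rules

A SkyWalking alarm rule consists of the following elements:

  • Rule name: globally unique and must end with _rule
  • expression: defined in MQE (Metrics Query Expression). The result type must be SINGLE_VALUE, and the root operation must be a comparison or Boolean operation that yields 1 (true) or 0 (false). The alarm is triggered when the result is 1 (true)
  • include-names: the entity names the rule applies to (Service, Instance, Endpoint, and so on), given as a list
  • exclude-names: the entity names excluded from the rule
  • include-names-regex: a regular expression matching the entity names to include
  • exclude-names-regex: a regular expression matching the entity names to exclude
  • tags: extra labels attached to the alarm, for example level=warning
  • period: the size of the time window in which the alarm condition is checked, in minutes
  • silence-period: after an alarm fires, the same alarm is not triggered again during this period. If unset, it equals period
  • hooks: the names of the hooks bound to the rule when the alarm is triggered. The format is {hookType}.{hookName} (for example slack.custom1), and each name must be defined in the hooks section of alarm-settings.yml. If no hook name is given, the global hooks are used
  • message: the alarm message, used to describe the current alarm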
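
To make the interplay of expression, period, and silence-period more concrete, here is a simplified model of a rule of the form `sum(metric > threshold) >= min_hits`. It is a sketch of the semantics described above, not SkyWalking's implementation: the `ThresholdRule` class, its parameters, and the sample values are assumptions for illustration, and one value per minute is assumed.

```python
from collections import deque
from typing import Optional


class ThresholdRule:
    """Toy model of an alarm rule evaluated once per minute."""

    def __init__(self, threshold: float, min_hits: int, period: int,
                 silence_period: Optional[int] = None):
        self.threshold = threshold
        self.min_hits = min_hits
        # Keep only the last `period` minutes of data.
        self.window = deque(maxlen=period)
        # As in the rule description, silence-period defaults to period.
        self.silence_period = period if silence_period is None else silence_period
        self.silence_left = 0

    def expression(self) -> bool:
        # Counts the data points in the window that exceed the threshold.
        hits = sum(1 for value in self.window if value > self.threshold)
        return hits >= self.min_hits

    def check(self, value: float) -> bool:
        """Record this minute's value; return True if an alarm fires now."""
        self.window.append(value)
        if self.silence_left > 0:
            self.silence_left -= 1   # still silenced after a previous alarm
            return False
        if self.expression():
            self.silence_left = self.silence_period
            return True
        return False


rule = ThresholdRule(threshold=5, min_hits=2, period=5, silence_period=3)
samples = [1, 7, 2, 9, 8, 8, 8, 8, 8]   # made-up per-minute metric values
fired = [minute for minute, value in enumerate(samples) if rule.check(value)]
print(fired)  # minutes at which an alarm was raised
```

With these sample values the alarm first fires at minute 3, when the second value above 5 enters the window, stays silent for the next three checks, and fires again at minute 7 because the condition still holds.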

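4.3 Configuring a DingTalk alarm robot

To use DingTalk alarms, first create a group chat and then add a robot to it:

Add a robot

Choose Custom

Enter a robot name and copy the secret key

Finish adding the robot and copy the Webhook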
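
When a robot is created with a secret key, requests to its Webhook have to carry a timestamp and a signature. In the setup described in this article SkyWalking is given the secret in its hook configuration, so the sketch below is only meant for understanding the mechanism or testing the robot by hand. It follows the signing scheme DingTalk documents for custom robots (HMAC-SHA256 over `timestamp + "\n" + secret`, Base64-encoded and URL-encoded); the function name and the token and secret values are placeholders, not real credentials.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse
from typing import Optional


def signed_webhook_url(webhook: str, secret: str,
                       timestamp_ms: Optional[int] = None) -> str:
    """Append the timestamp and sign parameters to a robot Webhook URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    sign = urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"{webhook}&timestamp={timestamp_ms}&sign={sign}"


# Placeholder values for illustration only.
example_webhook = "https://oapi.dingtalk.com/robot/send?access_token=EXAMPLE_TOKEN"
example_secret = "SECexample"
url = signed_webhook_url(example_webhook, example_secret, timestamp_ms=1700000000000)
print(url)
```

A test message can then be sent by POSTing a JSON body such as `{"msgtype": "text", "text": {"content": "..."}}` to the signed URL with any HTTP client.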

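4.4 Connecting SkyWalking to DingTalk alarms

First place the SkyWalking alarm configuration file inside the SkyWalking installation (chart) directory:

# Create the directory for the alarm configuration
[root@k8s-master01 demo]# mkdir -p ../files/conf.d/oap
[root@k8s-master01 demo]# cd ../files/conf.d/oap

# Copy the alarm template file out of the OAP container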
[root@k8s-master01 oap]# kubectl cp skywalking-oap-6d8f594b7c-xrnbr:/skywalking/config/alarm-settings.yml ./alarm-settings.yml -n skywalking

# Add the DingTalk hook
[root@k8s-master01 oap]# vim alarm-settings.yml 
[root@k8s-master01 oap]# tail -14 alarm-settings.yml 
hooks:
  dingtalk:
    default:
      is-default: true
      text-template: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
            } 
        }
      webhooks:
      - url: https://oapi.dingtalk.com/robot/send?access_token=c7cd207fd31cd72f433d67effda0568b681b10f626f97c02cb55f03b73b651c5
        secret: SECedef18728aa48ea6ca4c2f595967f6c389e2fc4d13bfca2741087b8c8878e017

# Apply the configuration (go back to the skywalking chart root directory first)
[root@k8s-master01 oap]# cd ../../..
[root@k8s-master01 skywalking]# helm upgrade skywalking . -n skywalking

# Check the Pod update status:
[root@k8s-master01 skywalking]# kubectl get po -n skywalking | grep oap
skywalking-oap-5644bbbd46-hvvxx   1/1     Running     0          11m

# Check whether the configuration file has been updated:
[root@k8s-master01 skywalking]# kubectl exec skywalking-oap-5644bbbd46-hvvxx -n skywalking -- tail -14 config/alarm-settings.yml
Defaulted container "oap" out of: oap, wait-for-elasticsearch (init)
hooks:
  dingtalk:
    default:
      is-default: true
      text-template: |-
        {
          "msgtype": "text",
          "text": {
            "content": "Apache SkyWalking Alarm: \n %s."
            } 
        }
      webhooks:
      - url: https://oapi.dingtalk.com/robot/send?access_token=c7cd207fd31cd72f433d67effda0568b681b10f626f97c02cb55f03b73b651c5
        secret: SECedef18728aa48ea6ca4c2f595967f6c389e2fc4d13bfca2741087b8c8878e017

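Send requests to the service to trigger an alarm:

After a short wait, the alarm message shows up in DingTalk

4.5 Custom alarm rules

Besides the default alarms, custom alarms can be added. For example, to monitor whether the JVM threads of a Java service are blocked, use the instance_jvm_thread_blocked_state_thread_count metric.

# For example, alert when the number of blocked JVM threads is greater than 5: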
[root@k8s-master01 oap]# vim alarm-settings.yml 
[root@k8s-master01 oap]# cat alarm-settings.yml 
....
rules:
  thread_block_rule:
    expression: sum(instance_jvm_thread_blocked_state_thread_count > 5) >= 2
    period: 5       # Check the data of the past 5 minutes
    message: "The number of blocked JVM threads of {name} exceeded 5 at least twice in the past 5 minutes"
....

# After changing the configuration file, apply the update:
[root@k8s-master01 skywalking]# helm upgrade skywalking -n skywalking .
[root@k8s-master01 skywalking]# kubectl rollout restart deploy skywalking-oap -n skywalking

This blog post is sourced from: https://edu.51cto.com/lecturer/11062970.html
