Cloud Native in Practice: Building an Enterprise-Grade K8s Environment from Scratch

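This is the second article in the cloud native series, the hands-on installment. The previous article covered the core concepts and theory of cloud native; this one walks you through building a complete cloud native environment from scratch.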

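I. Environment Preparation and Planning

1. Hardware requirements

Minimum configuration (learning/test environment):

  • 3 servers (2 CPU cores, 4 GB RAM, 50 GB disk each)
  • Network connectivity between all nodes, with SSH access
  • Operating system: Ubuntu 20.04/22.04 or CentOS 8+

Recommended production configuration:

  • 3 Master nodes (4 CPU cores, 8 GB RAM each)
  • 3 Worker nodes (sized according to workload)
  • High-speed networking (10 GbE recommended)
  • SSD storage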

2. Software requirements

bash
# Check the OS version
cat /etc/os-release

# Check the kernel version (3.10+ required)
uname -r

# Check the firewall status (Ubuntu)
sudo ufw status

# Check the SELinux status (CentOS)
getenforce
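
The kernel check above can be automated instead of read by eye. The following is a minimal sketch, assuming `sort -V` (version sort) is available; the helper names `version_ge` and `kernel_ok` are hypothetical and not part of any tool used in this article.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort), available in GNU coreutils and BusyBox.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_ok [RELEASE]: checks a kernel release string (default: `uname -r`)
# against the 3.10 minimum quoted above. Everything after the first "-" is dropped.
kernel_ok() {
    release="${1:-$(uname -r)}"
    version_ge "${release%%-*}" "3.10"
}

if kernel_ok; then
    echo "kernel $(uname -r): OK"
else
    echo "kernel $(uname -r): too old, 3.10+ required"
fi
```

Passing a release string explicitly, for example `kernel_ok "3.10.0-1160.el7.x86_64"`, makes it possible to check a value collected from a remote node.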

II. Building the Kubernetes Cluster

1. Kubernetes cluster architecture

[Figure: Kubernetes cluster architecture. The control plane consists of the API Server, Scheduler, Controller Manager and etcd. The worker nodes (Node 1, Node 2, Node 3) each run a Kubelet and host Pods A to E: Pod A (App container + Sidecar container), Pod B (MySQL container), Pod C (Redis container), Pod D (App container), Pod E (Kafka container).]

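2. Building the cluster with kubeadm

Run the following steps on every node:

bash
#!/bin/bash
# setup-k8s.sh

set -e

echo "=== Starting Kubernetes installation ==="

# 1. Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# 2. Load the kernel modules and sysctl settings that kubeadm and the CNI plugin rely on
printf 'overlay\nbr_netfilter\n' | sudo tee /etc/modules-load.d/k8s.conf >/dev/null
sudo modprobe overlay
sudo modprobe br_netfilter
printf 'net.bridge.bridge-nf-call-iptables = 1\nnet.bridge.bridge-nf-call-ip6tables = 1\nnet.ipv4.ip_forward = 1\n' \
  | sudo tee /etc/sysctl.d/k8s.conf >/dev/null
sudo sysctl --system

# 3. Install Docker (apt-key is deprecated, so the key is stored under /etc/apt/keyrings)
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo "$VERSION_CODENAME") stable" \
  | sudo tee /etc/apt/sources.list.d/docker.list >/dev/null
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io

# Configure Docker to use systemd as the cgroup driver
# (the heredoc is piped through "sudo tee" because "sudo cat > file" redirects as the calling user)
sudo tee /etc/docker/daemon.json >/dev/null <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

sudo systemctl restart docker
sudo systemctl enable docker

# Kubernetes 1.24+ talks to containerd directly (dockershim was removed), and the
# containerd.io package ships with the CRI plugin disabled, so regenerate its config
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml >/dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd

# 4. Install kubeadm, kubelet, kubectl
# The legacy apt.kubernetes.io repository has been shut down; packages now live on pkgs.k8s.io.
# K8S_MINOR is an assumption: set it to the minor release you want to install.
K8S_MINOR="v1.30"
curl -fsSL "https://pkgs.k8s.io/core:/stable:/${K8S_MINOR}/deb/Release.key" \
  | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/${K8S_MINOR}/deb/ /" \
  | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# 5. Make the kubelet use systemd as the cgroup driver
# (kubeadm 1.22+ already defaults to systemd, so this mainly matters on older releases)
echo 'KUBELET_EXTRA_ARGS="--cgroup-driver=systemd"' | sudo tee /etc/default/kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet

echo "Base environment installed!"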

Run on the Master node:

bash
#!/bin/bash
# init-master.sh

set -e

echo "=== Initializing the Master node ==="

# 1. Initialize the cluster
sudo kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address="$(hostname -I | awk '{print $1}')" \
  --image-repository registry.aliyuncs.com/google_containers

# 2. Configure kubectl for the current user
mkdir -p "$HOME/.kube"
sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"

# 3. Install the network plugin (Flannel)
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

# 4. Wait for all Pods to become ready
# (kubectl wait returns once they are ready; a "-w" watch would never exit and block the script)
kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=300s

# 5. Print the join command for the Worker nodes
echo "=== Worker join command ==="
sudo kubeadm token create --print-join-command

echo "Master node initialization complete!"

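Run on each Worker node:

bash
#!/bin/bash
# join-worker.sh

set -e

echo "=== Joining the Kubernetes cluster ==="

# Paste the join command printed by the Master node below, for example:
# sudo kubeadm join 192.168.1.100:6443 --token xxxxxx --discovery-token-ca-cert-hash sha256:xxxxxx

# Check the node status from the Master node
# (kubectl is not configured on Worker nodes, so this would fail if run here):
# kubectl get nodes

echo "Worker node joined!"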
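
Once the workers have joined, the node check can be scripted on the Master. This is a minimal sketch, assuming the default column layout of `kubectl get nodes --no-headers` (name, status, roles, age, version); `count_ready_nodes` and `wait_for_nodes` are hypothetical helper names.

```shell
# count_ready_nodes: reads `kubectl get nodes --no-headers` output on stdin and
# prints how many nodes report exactly "Ready" in the STATUS column.
# Nodes shown as "Ready,SchedulingDisabled" are deliberately not counted.
count_ready_nodes() {
    awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# wait_for_nodes EXPECTED [TIMEOUT_SECONDS]: polls until EXPECTED nodes are Ready,
# returning non-zero if the timeout (default 300 seconds) expires first.
wait_for_nodes() {
    expected="$1"
    deadline=$(( $(date +%s) + ${2:-300} ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        ready=$(kubectl get nodes --no-headers | count_ready_nodes)
        if [ "$ready" -ge "$expected" ]; then
            return 0
        fi
        sleep 5
    done
    return 1
}
```

For the three-server minimum configuration, `wait_for_nodes 3` would block until the Master and both Workers are Ready.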

III. Containerizing a Java Application

1. Multi-stage Dockerfile

dockerfile
# Multi-stage Dockerfile
# Stage 1: build
FROM eclipse-temurin:17-jdk-jammy AS builder

# Set the working directory
WORKDIR /app

# Copy the build files
COPY gradlew .
COPY gradle gradle
COPY build.gradle .
COPY settings.gradle .
COPY src src

# Make the Gradle wrapper executable
RUN chmod +x gradlew

# Build the application (tests skipped)
RUN ./gradlew build -x test

# Stage 2: runtime
FROM eclipse-temurin:17-jre-jammy

# Install required tools (netcat-openbsd provides the nc command used by docker-entrypoint.sh)
RUN apt-get update && apt-get install -y \
    curl \
    jq \
    netcat-openbsd \
    && rm -rf /var/lib/apt/lists/*

# Create a non-root user
RUN groupadd --system --gid 1001 appuser && \
    useradd --system --uid 1001 --gid 1001 appuser

# Set the working directory
WORKDIR /app

# Copy the jar from the build stage
# (if the build emits more than one jar, e.g. a *-plain.jar, narrow this pattern)
COPY --from=builder /app/build/libs/*.jar app.jar

# Copy the startup script
COPY --chown=appuser:appuser docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh

# Set environment variables
ENV JAVA_OPTS="-XX:+UseContainerSupport \
               -XX:MaxRAMPercentage=75.0 \
               -XX:+UseG1GC \
               -XX:MaxGCPauseMillis=200 \
               -XX:ParallelGCThreads=2 \
               -XX:ConcGCThreads=2 \
               -XX:+ExitOnOutOfMemoryError"

ENV SPRING_PROFILES_ACTIVE="docker"

# Switch to the non-root user
USER appuser

# Expose the application port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8080/actuator/health || exit 1

# Start through the entrypoint script
ENTRYPOINT ["docker-entrypoint.sh"]

2. Docker entrypoint script

bash
#!/bin/bash
# docker-entrypoint.sh

set -e

# Wait until a dependency accepts TCP connections
wait_for_service() {
    local host=$1
    local port=$2
    local timeout=${3:-30}

    echo "Waiting for $host:$port..."
    for i in $(seq 1 "$timeout"); do
        if nc -z "$host" "$port" 2>/dev/null; then
            echo "$host:$port is available!"
            return 0
        fi
        echo "Attempt $i/$timeout: $host:$port not ready yet..."
        sleep 1
    done
    echo "Timeout waiting for $host:$port"
    return 1
}

# Check required environment variables
check_required_vars() {
    local required_vars=("DB_HOST" "DB_PORT")
    local missing_vars=()

    for var in "${required_vars[@]}"; do
        # ${!var} is bash indirect expansion: the value of the variable named by $var
        if [ -z "${!var}" ]; then
            missing_vars+=("$var")
        fi
    done

    if [ ${#missing_vars[@]} -ne 0 ]; then
        echo "Error: Missing required environment variables: ${missing_vars[*]}"
        exit 1
    fi
}

# Main function
main() {
    echo "Starting application..."

    # Check required environment variables
    check_required_vars

    # Wait for the database
    if [ -n "$DB_HOST" ] && [ -n "$DB_PORT" ]; then
        wait_for_service "$DB_HOST" "$DB_PORT"
    fi

    # Wait for Redis (optional)
    if [ -n "$REDIS_HOST" ] && [ -n "$REDIS_PORT" ]; then
        wait_for_service "$REDIS_HOST" "$REDIS_PORT"
    fi

    # Start the Java application; exec replaces the shell so the JVM receives signals directly
    # ($JAVA_OPTS is intentionally unquoted so it splits into separate options)
    echo "Starting Java application with options: $JAVA_OPTS"
    exec java $JAVA_OPTS -jar app.jar "$@"
}

# Run the main function
main "$@"

IV. Complete Kubernetes Deployment Configuration

1. Namespace

yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: springboot-app
  labels:
    name: springboot-app
    environment: production

2. Configuration management

yaml
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: springboot-app
data:
  application.yml: |
    spring:
      application:
        name: springboot-app
      datasource:
        url: jdbc:mysql://mysql-service:3306/appdb
        username: ${DB_USERNAME}
        password: ${DB_PASSWORD}
        hikari:
          maximum-pool-size: 10
          minimum-idle: 5
      # Spring Boot 2.x property names; Spring Boot 3.x reads these under spring.data.redis
      redis:
        host: redis-service
        port: 6379
        timeout: 2000ms
      kafka:
        bootstrap-servers: kafka-service:9092
    management:
      endpoints:
        web:
          exposure:
            include: health,info,metrics,prometheus
      health:
        db:
          enabled: true
        redis:
          enabled: true
        diskspace:
          enabled: true

3. Secret management

yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: springboot-app
type: Opaque
data:
  db-username: YXBwX3VzZXI=  # app_user
  db-password: UGFzc3dvcmQxMjM=  # Password123
  redis-password: cmVkaXNfcGFzc3dvcmQ=  # redis_password
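
The values under `data:` are base64-encoded, which is an encoding and not encryption. A minimal sketch of producing and checking them follows, assuming a `base64` command that accepts `--decode`; the helper names are hypothetical.

```shell
# encode_secret_value VALUE: prints the base64 form used under `data:` in a Secret.
# printf avoids the trailing newline that a plain `echo` would encode into the value.
encode_secret_value() {
    printf '%s' "$1" | base64
}

# decode_secret_value ENCODED: reverses the encoding, e.g. to verify a manifest.
decode_secret_value() {
    printf '%s' "$1" | base64 --decode
}

encode_secret_value 'app_user'
decode_secret_value 'UGFzc3dvcmQxMjM='
```

Because anyone with read access to the Secret can decode it this way, the example passwords in this article should be treated as placeholders only.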

4. Application deployment

yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot-app
  namespace: springboot-app
  labels:
    app: springboot-app
    version: v1.0.0
spec:
  replicas: 3
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: springboot-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: springboot-app
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - springboot-app
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: your-registry/springboot-app:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        env:
        - name: DB_USERNAME
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: db-username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: db-password
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: redis-password
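        # DB_HOST and DB_PORT are required by docker-entrypoint.sh, which exits when they are missing
        - name: DB_HOST
          value: "mysql-service"
        - name: DB_PORT
          value: "3306"
        - name: JAVA_OPTS
          value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"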
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        startupProbe:
          httpGet:
            # Spring Boot Actuator ships only the liveness and readiness health groups by default
            path: /actuator/health/liveness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 10
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 30
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        - name: logs-volume
          mountPath: /app/logs
      volumes:
      - name: config-volume
        configMap:
          name: app-config
      - name: logs-volume
        emptyDir: {}
      restartPolicy: Always
      terminationGracePeriodSeconds: 30

5. Service discovery

yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: springboot-service
  namespace: springboot-app
  labels:
    app: springboot-app
spec:
  selector:
    app: springboot-app
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  type: ClusterIP

6. Horizontal autoscaling

yaml
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: springboot-hpa
  namespace: springboot-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: springboot-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60

7. Ingress routing

yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: springboot-ingress
  namespace: springboot-app
  annotations:
    # No rewrite-target annotation: combined with path "/" it would rewrite every request URI to "/"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "30"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.yourdomain.com
    secretName: tls-secret
  rules:
  - host: app.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: springboot-service
            port:
              number: 80

V. Containerized Middleware Deployment

1. MySQL StatefulSet

yaml
# mysql-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
  namespace: springboot-app
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: mysql
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: springboot-app
spec:
  serviceName: mysql-service
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          value: "appdb"
        - name: MYSQL_USER
          value: "app_user"
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: user-password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
        - name: mysql-config
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          exec:
            command:
            - mysqladmin
            - ping
            - -h
            - localhost
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            command:
            - mysql
            - -h
            - localhost
            - -uapp_user
            - -p$(MYSQL_PASSWORD)
            - -e
            - SELECT 1
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 1
      volumes:
      - name: mysql-config
        configMap:
          name: mysql-config
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi

2. Redis Cluster

yaml
# redis-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: springboot-app
data:
  redis.conf: |
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 5000
    appendonly yes
    dir /data
    port 6379
    bind 0.0.0.0
    protected-mode no
    daemonize no
    pidfile /var/run/redis.pid
    loglevel notice
    logfile ""
    databases 16
    save 900 1
    save 300 10
    save 60 10000
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    dbfilename dump.rdb
    # kept below the 512Mi container memory limit set in the StatefulSet
    maxmemory 384mb
    maxmemory-policy allkeys-lru
---
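# redis-statefulset.yaml
# Headless Service that gives each Pod a stable DNS name (referenced by serviceName below)
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: springboot-app
  labels:
    app: redis
spec:
  clusterIP: None
  selector:
    app: redis
  ports:
  - port: 6379
    name: client
  - port: 16379
    name: gossip
---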
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: springboot-app
spec:
  serviceName: redis-service
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "200m"
        livenessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: redis-config
        configMap:
          name: redis-config
          items:
          - key: redis.conf
            path: redis.conf
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
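
The replica count of 6 follows from how `redis-cli --cluster create` lays out a cluster: with one replica per master, 6 nodes give 3 masters, the minimum it accepts, and the 16384 hash slots are divided among those masters. A small sketch of that arithmetic, with a hypothetical function name:

```shell
# cluster_layout NODES REPLICAS_PER_MASTER
# Prints "masters=<n> slots_per_master=<approx>", or fails when the node count
# cannot provide the 3 masters that Redis Cluster creation requires.
cluster_layout() {
    nodes="$1"; replicas="$2"
    masters=$(( nodes / (replicas + 1) ))
    if [ "$masters" -lt 3 ]; then
        echo "need at least $(( 3 * (replicas + 1) )) nodes for $replicas replica(s) per master" >&2
        return 1
    fi
    # 16384 is the fixed number of hash slots in Redis Cluster
    echo "masters=$masters slots_per_master=$(( 16384 / masters ))"
}

cluster_layout 6 1
```

The slot figure is rounded down; the actual ranges assigned by redis-cli differ by at most one slot per master.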

VI. One-Click Deployment Script

bash
#!/bin/bash
# deploy.sh - one-click deployment script

set -e

NAMESPACE="springboot-app"
# Example credentials only; replace them before any real deployment.
# The MySQL user password and the app's db-password must be identical,
# because both belong to the same app_user account.
DB_USER_PASSWORD='Password123'

echo "Starting cloud native application deployment..."

# Check for kubectl
if ! command -v kubectl &> /dev/null; then
    echo "Error: kubectl is not installed"
    exit 1
fi

# Create the namespace
echo "Creating namespace..."
kubectl apply -f namespace.yaml

# Create secrets (piping a client-side dry run into apply keeps the script re-runnable)
echo "Creating secrets..."
kubectl create secret generic mysql-secret \
  --from-literal=root-password='RootPassword123!' \
  --from-literal=user-password="$DB_USER_PASSWORD" \
  --namespace="$NAMESPACE" \
  --dry-run=client -o yaml | kubectl apply -f -

kubectl create secret generic app-secrets \
  --from-literal=db-username='app_user' \
  --from-literal=db-password="$DB_USER_PASSWORD" \
  --from-literal=redis-password='redis_password' \
  --namespace="$NAMESPACE" \
  --dry-run=client -o yaml | kubectl apply -f -

# Deploy middleware
# (mysql-config.yaml, the ConfigMap mounted at /etc/mysql/conf.d, is not listed in this article)
echo "Deploying MySQL..."
kubectl apply -f mysql-config.yaml
kubectl apply -f mysql-statefulset.yaml

echo "Deploying Redis..."
kubectl apply -f redis-config.yaml
kubectl apply -f redis-statefulset.yaml

# Wait for the middleware to become ready
# (rollout status waits for every replica, including Pods the StatefulSet has not created yet)
echo "Waiting for middleware to start..."
kubectl rollout status statefulset/mysql --timeout=300s -n "$NAMESPACE"
kubectl rollout status statefulset/redis-cluster --timeout=300s -n "$NAMESPACE"

# Initialize the Redis cluster
# Pod IPs are used because some redis-cli versions reject hostnames in --cluster create
echo "Initializing Redis cluster..."
REDIS_NODES=$(kubectl get pods -l app=redis -n "$NAMESPACE" \
  -o jsonpath='{range .items[*]}{.status.podIP}:6379 {end}')
read -ra NODES <<< "$REDIS_NODES"

if [ ${#NODES[@]} -eq 6 ]; then
    # --cluster-yes answers the confirmation prompt so the script can run unattended
    kubectl exec redis-cluster-0 -n "$NAMESPACE" -- redis-cli --cluster create \
        "${NODES[@]}" --cluster-replicas 1 --cluster-yes \
        || echo "Redis cluster creation failed (it may already be initialized)"
fi

# Deploy the application
echo "Deploying the Spring Boot application..."
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f hpa.yaml

# Deploy the Ingress (if present)
if [ -f ingress.yaml ]; then
    echo "Deploying Ingress..."
    kubectl apply -f ingress.yaml
fi

# Check the deployment status
echo "Checking deployment status..."
kubectl get all -n "$NAMESPACE"

echo "Deployment complete!"
echo "Application endpoints:"
echo "- Inside the cluster: http://springboot-service.springboot-app.svc.cluster.local"
echo "- External (if an Ingress is deployed): https://app.yourdomain.com"
echo ""
echo "To monitor the application:"
echo "kubectl get pods -n springboot-app"
echo "kubectl logs -f deployment/springboot-app -n springboot-app"
VII. Monitoring and Diagnostic Tools

1. Monitoring script

bash
#!/bin/bash
# monitor.sh - monitoring and diagnostic script

# "set -e" is deliberately not used: one failing check (for example kubectl top
# without metrics-server) should not hide the remaining diagnostics.

NAMESPACE=${1:-springboot-app}

echo "=== Cloud native application diagnostics ==="
echo "Namespace: $NAMESPACE"
echo "Time: $(date)"
echo ""

# 1. Namespace resources
echo "1. Namespace resource overview:"
kubectl get all -n "$NAMESPACE"
echo ""

# 2. Pod status
echo "2. Pod details:"
kubectl get pods -n "$NAMESPACE" -o wide
echo ""

# 3. Pod events
echo "3. Recent events:"
kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp' | tail -10
echo ""

# 4. Resource usage (requires metrics-server)
echo "4. Resource usage:"
kubectl top pods -n "$NAMESPACE" || echo "kubectl top failed; is metrics-server installed?"
echo ""

# 5. HPA status
echo "5. HPA status:"
kubectl get hpa -n "$NAMESPACE"
echo ""

# 6. Service endpoints
echo "6. Service endpoints:"
kubectl get endpoints -n "$NAMESPACE"
echo ""

# 7. Ingress status
echo "7. Ingress status:"
kubectl get ingress -n "$NAMESPACE"
echo ""

# 8. Configuration
echo "8. Configuration:"
kubectl get configmap,secret -n "$NAMESPACE"
echo ""

# 9. Network policies
echo "9. Network policies:"
kubectl get networkpolicy -n "$NAMESPACE"
echo ""

# 10. Storage
echo "10. Volumes:"
kubectl get pvc,pv -n "$NAMESPACE"
echo ""

# 11. Node status
echo "11. Node status:"
kubectl get nodes -o wide
echo ""

# 12. Cluster version (the --short flag has been removed from recent kubectl releases)
echo "12. Cluster version:"
kubectl version
echo ""

echo "=== Diagnostics complete ==="
echo "Common troubleshooting commands:"
echo "1. Pod CrashLoopBackOff: kubectl logs <pod-name> -n $NAMESPACE"
echo "2. Pod Pending: kubectl describe pod <pod-name> -n $NAMESPACE"
echo "3. Service unreachable: kubectl describe service <service-name> -n $NAMESPACE"
echo "4. Image pull failure: kubectl describe pod <pod-name> -n $NAMESPACE | grep -A5 Events"
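
The Pod listing from step 2 can also be filtered automatically so that only suspicious Pods are reported. A minimal sketch, assuming the default column order of `kubectl get pods --no-headers` (name, ready, status, restarts, age); `unhealthy_pods` is a hypothetical helper.

```shell
# unhealthy_pods [RESTART_THRESHOLD]: reads `kubectl get pods --no-headers` output
# on stdin and prints the names of Pods that are neither Running nor Completed,
# or whose restart count has reached the threshold (default 3).
unhealthy_pods() {
    threshold="${1:-3}"
    awk -v t="$threshold" \
        '($3 != "Running" && $3 != "Completed") || ($4 + 0 >= t) { print $1 }'
}
```

A typical call would be `kubectl get pods -n springboot-app --no-headers | unhealthy_pods 3`, whose output can be fed straight into `kubectl describe pod`.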

2. Cloud native monitoring stack

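[Figure: layered cloud native monitoring stack]

  • Data collection layer: application metrics, system metrics, business metrics, log data, distributed traces
  • Data processing layer: Prometheus, Fluentd, Jaeger, OpenTelemetry
  • Data storage layer: time-series database, Elasticsearch, object storage
  • Visualization layer: Grafana, Kibana, custom dashboards
  • Alerting layer: Alertmanager, WeCom (企业微信), DingTalk (钉钉), email/SMS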

VIII. CI/CD Pipeline Configuration

1. Complete GitLab CI/CD configuration

yaml
# .gitlab-ci.yml
variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  K8S_NAMESPACE: "springboot-app"
  # The kubectl image runs as a non-root user, so the kubeconfig goes to a writable path
  KUBECONFIG: "/tmp/kubeconfig"

stages:
  - build
  - test
  - security
  - package
  - deploy-dev
  - deploy-staging
  - deploy-prod

# Build stage
build:
  stage: build
  image: gradle:jdk17
  script:
    - ./gradlew clean build -x test
  artifacts:
    paths:
      - build/libs/*.jar
    expire_in: 1 week

# Unit tests
unit-test:
  stage: test
  image: gradle:jdk17
  script:
    - ./gradlew test
  coverage: '/Total.*?([0-9]{1,3})%/'
  artifacts:
    reports:
      junit: build/test-results/test/**/TEST-*.xml

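# Security scan (dependency vulnerability scan with Grype)
sast:
  stage: security
  image: 
    name: "gcr.io/cloud-marketplace-containers/google/debian11"
    entrypoint: [""]
  script:
    - apt-get update && apt-get install -y wget
    # install.sh installs into ./bin by default, so target a directory on PATH instead
    - wget -qO- https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin
    - grype dir:. --fail-on high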

# Build the Docker image
docker-build:
  stage: package
  image: docker:latest
  services:
    - docker:dind
  script:
    # --password-stdin keeps the password out of the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE

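# Deploy to the development environment
deploy-dev:
  stage: deploy-dev
  # The image's default entrypoint is kubectl itself; clear it so GitLab can run the script in a shell
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]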
  script:
    - echo "$KUBECONFIG_DEV" | base64 -d > $KUBECONFIG
    - kubectl config use-context dev-cluster
    - kubectl set image deployment/springboot-app app=$DOCKER_IMAGE -n $K8S_NAMESPACE
    - kubectl rollout status deployment/springboot-app -n $K8S_NAMESPACE --timeout=300s
  environment:
    name: development
    url: https://dev.app.yourdomain.com
  only:
    - dev

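# Deploy to production
deploy-prod:
  stage: deploy-prod
  # The image's default entrypoint is kubectl itself; clear it so GitLab can run the script in a shell
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  script:
    - echo "$KUBECONFIG_PROD" | base64 -d > $KUBECONFIG
    - kubectl config use-context prod-cluster
    # Canary-style release: start the rolling update, pause it after the first new Pod, then resume
    - kubectl set image deployment/springboot-app app=$DOCKER_IMAGE -n $K8S_NAMESPACE
    - kubectl rollout pause deployment/springboot-app -n $K8S_NAMESPACE
    - sleep 60  # observe the new version
    - kubectl rollout resume deployment/springboot-app -n $K8S_NAMESPACE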
    - kubectl rollout status deployment/springboot-app -n $K8S_NAMESPACE --timeout=600s
  environment:
    name: production
    url: https://app.yourdomain.com
  when: manual
  only:
    - main

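IX. Performance Testing and Security Scanning

1. Performance test script

bash
#!/bin/bash
# performance-test.sh

set -e

echo "=== Cloud native application performance test ==="
echo "Start time: $(date)"
echo ""

# Test configuration
# (the default BASE_URL is a cluster-internal DNS name, so run this from inside the cluster)
CONCURRENT_USERS=${1:-100}
DURATION=${2:-300}
BASE_URL=${3:-"http://springboot-service.springboot-app.svc.cluster.local"}

# Warm up the application
echo "1. Warming up..."
for i in {1..10}; do
    curl -s -o /dev/null -w "%{http_code}" "$BASE_URL/actuator/health"
    echo " - request $i: done"
    sleep 1
done
echo ""

# Load test
echo "2. Running load test..."
echo "Concurrent users: $CONCURRENT_USERS"
echo "Duration: ${DURATION}s"
echo ""

# Load test with wrk
if command -v wrk &> /dev/null; then
    # wrk threads should not exceed the CPU core count or the number of connections
    THREADS=$(nproc)
    if [ "$THREADS" -gt "$CONCURRENT_USERS" ]; then
        THREADS=$CONCURRENT_USERS
    fi
    wrk -t"$THREADS" -c"$CONCURRENT_USERS" -d"${DURATION}s" "$BASE_URL/actuator/health"
else
    echo "Warning: wrk is not installed, falling back to ab"
    # -t bounds the run by time; ab caps a timed run at 50000 requests by default
    ab -t "$DURATION" -c "$CONCURRENT_USERS" "$BASE_URL/actuator/health"
fi

echo "Performance test complete!"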
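
When choosing wrk's `-t` value, keep in mind that wrk refuses to start if the connection count is lower than the thread count, and threads beyond the number of CPU cores rarely add throughput. A small sketch of picking the value; `wrk_threads` is a hypothetical helper and `getconf _NPROCESSORS_ONLN` is assumed to be available for the core count.

```shell
# wrk_threads CONNECTIONS [CORES]: picks a thread count that never exceeds the
# number of connections or the number of CPU cores (detected when not given).
wrk_threads() {
    connections="$1"
    cores="${2:-$(getconf _NPROCESSORS_ONLN)}"
    if [ "$connections" -lt "$cores" ]; then
        echo "$connections"
    else
        echo "$cores"
    fi
}

# 100 concurrent connections on an 8-core load generator
wrk_threads 100 8
```

The result can be passed directly to wrk, for example `wrk -t"$(wrk_threads 100)" -c100 -d300s <url>`.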

2. Security scan script

bash
#!/bin/bash
# security-scan.sh

set -e

echo "=== Cloud native security scan ==="
echo "Start time: $(date)"
echo ""

NAMESPACE="springboot-app"

# Image vulnerability scan
echo "1. Scanning images for vulnerabilities..."
if command -v trivy &> /dev/null; then
    IMAGES=$(kubectl get pods -n "$NAMESPACE" -o json | jq -r '.items[].spec.containers[].image' | sort | uniq)
    for IMAGE in $IMAGES; do
        echo "Scanning image: $IMAGE"
        trivy image --severity HIGH,CRITICAL "$IMAGE" || true
        echo ""
    done
else
    echo "Warning: trivy is not installed, skipping image scan"
fi

# Check security configuration
echo "2. Checking security configuration..."
echo "Pod security contexts:"
kubectl get pods -n "$NAMESPACE" -o json | jq '.items[].spec.securityContext' || echo "No security context configured"
echo ""

echo "Network policies:"
kubectl get networkpolicy -n "$NAMESPACE"
echo ""

echo "Security scan complete!"

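X. Deployment Flow for the Containerized Java Application

[Sequence diagram. Participants: Developer, Git repository, CI/CD platform, Image registry, Kubernetes, Application instance.]

The developer commits code to the Git repository, which triggers the pipeline. The remaining steps are:

  1. Compile the code
  2. Run unit tests
  3. Run the security scan
  4. Build the Docker image
  5. Push the image
  6. Update the Deployment
  7. Create new Pods
  8. Start the new containers
  9. Run health checks
  10. Switch traffic
  11. Clean up old Pods
  12. Deployment complete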

XI. Troubleshooting Common Problems

1. Pod fails to start

Symptom:

bash
kubectl get pods -n springboot-app
NAME                              READY   STATUS             RESTARTS   AGE
springboot-app-5f8c6d8b5d-abcde   0/1     CrashLoopBackOff   5          2m

Troubleshooting steps:

bash
# 1. Inspect the Pod
kubectl describe pod springboot-app-5f8c6d8b5d-abcde -n springboot-app

# 2. View the Pod logs (add --previous to see the logs of the crashed container)
kubectl logs springboot-app-5f8c6d8b5d-abcde -n springboot-app

# 3. View events
kubectl get events -n springboot-app --sort-by='.lastTimestamp' | grep springboot-app

# 4. Open a shell in the Pod for debugging
kubectl exec -it springboot-app-5f8c6d8b5d-abcde -n springboot-app -- /bin/sh
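2. Service unreachable

Symptoms:

  • The service cannot be reached from outside the cluster
  • Calls between services inside the cluster fail

Troubleshooting steps:

bash
# 1. Check the Service
kubectl get svc -n springboot-app
kubectl describe svc springboot-service -n springboot-app

# 2. Check the Endpoints (an empty list means the selector matches no ready Pods)
kubectl get endpoints -n springboot-app

# 3. Check network policies
kubectl get networkpolicy -n springboot-app

# 4. Test access from inside the cluster
kubectl run test-curl --image=curlimages/curl -it --rm -- curl http://springboot-service.springboot-app.svc.cluster.local/actuator/health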

3. Insufficient resources

Symptom:

bash
kubectl get pods -n springboot-app
NAME                              READY   STATUS    RESTARTS   AGE
springboot-app-5f8c6d8b5d-abcde   0/1     Pending   0          5m

Troubleshooting steps:

bash
# 1. Inspect the Pod (the Events section states why scheduling failed)
kubectl describe pod springboot-app-5f8c6d8b5d-abcde -n springboot-app

# 2. Check node resources
kubectl describe nodes

# 3. Check resource quotas
kubectl describe quota -n springboot-app

# 4. Adjust resource requests
# Edit the resources section in deployment.yaml
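XII. Best Practices Summary

1. Containerization best practices

Image optimization:

  • Use multi-stage builds to reduce image size
  • Use a slim base image (for example Alpine or a JRE-only image)
  • Clean up build caches and temporary files
  • Run containers as a non-root user

Resource configuration:

  • Set sensible requests and limits
  • Enable health checks
  • Configure resource quotas
  • Use affinity and anti-affinity rules

2. Deployment best practices

Rolling updates:

  • Configure appropriate maxSurge and maxUnavailable values
  • Use readiness probes to keep the service available
  • Set an appropriate minReadySeconds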
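
To see what maxSurge and maxUnavailable mean in practice, the bounds they place on a rolling update can be computed directly. A minimal sketch for absolute values (percentages are not handled); `rollout_bounds` is a hypothetical name.

```shell
# rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
# Prints the fewest Pods guaranteed available and the most Pods that may exist
# at any moment during a rolling update.
rollout_bounds() {
    echo "min_available=$(( $1 - $3 )) max_total=$(( $1 + $2 ))"
}

rollout_bounds 3 1 0   # replicas: 3, maxSurge: 1, maxUnavailable: 0
```

With the values from deployment.yaml, capacity never drops below the desired 3 replicas, at the cost of needing room for one extra Pod while the update runs.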

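High availability:

  • Deploy multiple replicas
  • Spread Pods across nodes
  • Use a PodDisruptionBudget

3. Monitoring best practices

Metrics collection:

  • Application metrics (business metrics, performance metrics)
  • System metrics (CPU, memory, disk, network)
  • Middleware metrics (database, cache, message queue)

Alerting:

  • Set sensible alert thresholds
  • Avoid alert storms
  • Tier alerts by severity (warning, critical, emergency)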

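XIII. Suggested Next Steps

1. Advanced topics

Service mesh:

  • Istio: the most feature-rich service mesh
  • Linkerd: a lightweight service mesh
  • Consul: service discovery and configuration management, with service mesh features

Cloud native security:

  • Container security
  • Network security
  • Authentication and authorization
  • Compliance checks

Performance optimization:

  • Application performance tuning
  • Cluster performance tuning
  • Cost optimization

2. Suggested practice projects

Project 1: Microservice e-commerce system

  • Product, order, and user services
  • Built with Spring Cloud + Kubernetes
  • A complete CI/CD pipeline

Project 2: Real-time data processing platform

  • Kafka for stream ingestion
  • Flink for real-time computation
  • Redis for caching

Project 3: AI model serving platform

  • Model training and inference services
  • GPU resource scheduling
  • Model version management and A/B testing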

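XIV. Summary

After working through this guide, you should be able to handle:

  1. Environment setup: building a Kubernetes cluster from scratch
  2. Application containerization: packaging a Java application as a Docker image
  3. K8s deployment: deploying the application to the cluster with YAML manifests
  4. Middleware deployment: running MySQL, Redis and other middleware in containers
  5. Automated operations: monitoring, diagnostics, and CI/CD pipelines
  6. Troubleshooting: diagnosing and resolving common problems

Cloud native is a matter of continuous learning and practice. Some suggestions:

  1. Get hands-on: try the deployment in your own environment
  2. Join the community: take part in the Kubernetes and cloud native communities
  3. Keep learning: follow new developments in cloud native technology
  4. Share what you learn: pass your practical experience on to others

I hope this guide helps you go further on your cloud native journey!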


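Previous article: "云原生全解析:从概念到实践,Java技术栈如何拥抱云原生时代" (Cloud Native Explained: From Concept to Practice, How the Java Stack Embraces the Cloud Native Era)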

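About the author: an internet industry veteran focused on Java microservices and cloud native engineering practice.

Copyright notice: this is an original article released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.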
