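Cloud-Native in Practice: Building an Enterprise-Grade K8s Environment from Scratch
This is the second, hands-on installment of the cloud-native series. The first installment covered the core concepts and theory of cloud native; this one walks you through building a complete cloud-native environment from scratch.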
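I. Environment Preparation and Planning
1. Hardware Requirements
Minimum configuration (learning/testing):
- 3 servers (2 CPU cores, 4 GB RAM, 50 GB disk)
- Network connectivity between the servers, SSH access
- OS: Ubuntu 20.04/22.04 or CentOS 8+ (the installation scripts below use apt and target Ubuntu)
Recommended production configuration:
- 3 master nodes (4 CPU cores, 8 GB RAM)
- 3 worker nodes (sized to the workload)
- High-speed network (10 GbE recommended)
- SSD storage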
2. Software Requirements
bash
# Check the OS version
cat /etc/os-release
# Check the kernel version (3.10+ required)
uname -r
# Check the firewall status (Ubuntu)
sudo ufw status
# Check the SELinux status (CentOS)
getenforce
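The kernel check above can be automated in a preflight script. Below is a minimal sketch that assumes GNU `sort -V` is available. `version_ge` is a hypothetical helper and is not part of kubeadm; kubeadm runs its own preflight checks during `kubeadm init`/`join`.

```bash
#!/usr/bin/env bash
# version_ge is a hypothetical helper (not part of any Kubernetes tool).
# Returns 0 if dotted version $1 >= version $2, using version-aware sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Strip the distro suffix, e.g. "5.15.0-91-generic" -> "5.15.0"
kernel="$(uname -r | cut -d- -f1)"
if version_ge "$kernel" "3.10"; then
  echo "kernel $kernel OK"
else
  echo "kernel $kernel is older than 3.10" >&2
fi
```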
II. Building the Kubernetes Cluster
1. Kubernetes Cluster Architecture
Figure: Kubernetes cluster architecture
- Control Plane: API Server, Scheduler, Controller Manager, etcd
- Worker Nodes: Node 1, Node 2, Node 3, each running a Kubelet
- Pods on the worker nodes: Pod A (App Container + Sidecar Container), Pod B (MySQL Container), Pod C (Redis Container), Pod D (App Container), Pod E (Kafka Container)
2. Building the Cluster with kubeadm
Run the following steps on all nodes:
bash
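#!/bin/bash
# setup-k8s.sh: prepare every node (Ubuntu) for kubeadm
set -e
echo "=== Installing Kubernetes prerequisites ==="
# 1. Disable swap (by default the kubelet will not start while swap is on)
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
# 2. Load the kernel modules and sysctls required for Pod networking
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system
# 3. Install the container runtime (containerd from Docker's apt repository).
#    Kubernetes 1.24+ removed dockershim: the kubelet talks to containerd over CRI,
#    so Docker Engine itself is not needed on the nodes.
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor --yes -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get update
sudo apt-get install -y containerd.io
# The packaged config.toml disables the CRI plugin: regenerate it and switch runc to the
# systemd cgroup driver (this sed matches the containerd 1.6/1.7 default config)
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
sudo systemctl enable containerd
# 4. Install kubeadm, kubelet, and kubectl from pkgs.k8s.io
#    (the old apt.kubernetes.io / packages.cloud.google.com repositories are frozen)
K8S_MINOR=v1.30   # example: choose the minor version you need
curl -fsSL https://pkgs.k8s.io/core:/stable:/${K8S_MINOR}/deb/Release.key | sudo gpg --dearmor --yes -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/${K8S_MINOR}/deb/ /" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
# kubeadm (v1.22+) configures the kubelet with the systemd cgroup driver by default,
# so KUBELET_EXTRA_ARGS="--cgroup-driver=systemd" is no longer needed
sudo systemctl enable --now kubelet
echo "Base environment installed!"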
Run on the master node:
bash
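#!/bin/bash
# init-master.sh
set -e
echo "=== Initializing the master node ==="
# 1. Initialize the cluster (10.244.0.0/16 matches Flannel's default Pod network)
sudo kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=$(hostname -I | awk '{print $1}') \
  --image-repository registry.aliyuncs.com/google_containers
# 2. Configure kubectl for the current user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# 3. Install the network plugin (Flannel)
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
# 4. Wait until the node is Ready (it becomes Ready once the CNI is running).
#    "kubectl get pods -w" would watch forever and the script would never continue.
kubectl wait --for=condition=Ready node --all --timeout=300s
kubectl get pods --all-namespaces
# 5. Print the join command for the workers (reads the root-only admin.conf, hence sudo)
echo "=== Worker join command ==="
sudo kubeadm token create --print-join-command
echo "Master node initialized!"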
Run on each worker node:
bash
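#!/bin/bash
# join-worker.sh
set -e
echo "=== Joining the Kubernetes cluster ==="
# Run the join command printed on the master (as root), for example:
# sudo kubeadm join 192.168.1.100:6443 --token xxxxxx --discovery-token-ca-cert-hash sha256:xxxxxx
# Workers have no admin kubeconfig, so check the node status on the master:
# kubectl get nodes
echo "Worker node joined!"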
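If the join command is lost, you can recompute the `--discovery-token-ca-cert-hash` value on the master from `/etc/kubernetes/pki/ca.crt`, using the openssl pipeline given in the kubeadm documentation. Below is a minimal sketch that wraps that pipeline in a hypothetical helper, `ca_cert_hash`. It assumes an RSA CA key (the kubeadm default) and `openssl` on the PATH.

```bash
#!/usr/bin/env bash
# ca_cert_hash is a hypothetical helper wrapping the openssl pipeline from the kubeadm docs:
# SHA-256 of the CA certificate's DER-encoded public key, printed as hex.
ca_cert_hash() {
  local ca_file=${1:-/etc/kubernetes/pki/ca.crt}
  openssl x509 -pubkey -in "$ca_file" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

# Usage on the master (token from "sudo kubeadm token create"):
#   echo "--discovery-token-ca-cert-hash sha256:$(sudo bash -c "$(declare -f ca_cert_hash); ca_cert_hash")"
```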
III. Containerizing the Java Application
1. Multi-Stage Dockerfile
dockerfile
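# Multi-stage Dockerfile
# Stage 1: build
FROM eclipse-temurin:17-jdk-jammy AS builder
# Working directory
WORKDIR /app
# Copy the build files
COPY gradlew .
COPY gradle gradle
COPY build.gradle .
COPY settings.gradle .
COPY src src
# Make the Gradle wrapper executable
RUN chmod +x gradlew
# Build only the executable Spring Boot jar (bootJar does not run the tests).
# "./gradlew build" also produces a *-plain.jar (Spring Boot 2.5+), which would make the
# *.jar wildcard in the COPY below match two files and fail.
RUN ./gradlew bootJar --no-daemon
# Stage 2: runtime
FROM eclipse-temurin:17-jre-jammy
# Install the required tools
RUN apt-get update && apt-get install -y \
    curl \
    jq \
    && rm -rf /var/lib/apt/lists/*
# Create a non-root user
RUN groupadd --system --gid 1001 appuser && \
    useradd --system --uid 1001 --gid 1001 appuser
# Working directory (Spring Boot also loads ./config/application.yml from here)
WORKDIR /app
# Copy the jar from the build stage
COPY --from=builder /app/build/libs/*.jar app.jar
# Copy the entrypoint script
COPY --chown=appuser:appuser docker-entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
# Default JVM options (the Deployment's JAVA_OPTS env var overrides them)
ENV JAVA_OPTS="-XX:+UseContainerSupport \
    -XX:MaxRAMPercentage=75.0 \
    -XX:+UseG1GC \
    -XX:MaxGCPauseMillis=200 \
    -XX:ParallelGCThreads=2 \
    -XX:ConcGCThreads=2 \
    -XX:+ExitOnOutOfMemoryError"
ENV SPRING_PROFILES_ACTIVE="docker"
# Switch to the non-root user
USER appuser
# Expose the port
EXPOSE 8080
# Health check (used by Docker only; Kubernetes ignores HEALTHCHECK and relies on its probes)
HEALTHCHECK --interval=30s --timeout=3s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8080/actuator/health || exit 1
# Start through the entrypoint script
ENTRYPOINT ["docker-entrypoint.sh"]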
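With container support enabled, `-XX:MaxRAMPercentage=75.0` makes the JVM size its maximum heap as a percentage of the container memory limit. Below is a minimal sketch of that arithmetic, using the 1Gi limit from the Deployment later in this article. The helper `max_heap_mib` is hypothetical and accepts only whole-number percentages. The JVM rounds to its own heap alignment, so treat the result as an estimate.

```bash
#!/usr/bin/env bash
# max_heap_mib is a hypothetical helper: limit (MiB) * MaxRAMPercentage / 100.
# Integer percentages only; the JVM's real value may differ slightly due to alignment.
max_heap_mib() {
  local limit_mib=$1 percent=$2
  echo $(( limit_mib * percent / 100 ))
}

limit=1024   # resources.limits.memory: "1Gi"
heap=$(max_heap_mib "$limit" 75)
echo "max heap ~ ${heap} MiB, leaving ~ $(( limit - heap )) MiB for metaspace, threads and native memory"
```

To see the value the JVM actually picked, run `java -XX:+PrintFlagsFinal -version | grep MaxHeapSize` inside the running container.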
2. Docker Entrypoint Script
bash
#!/bin/bash
# docker-entrypoint.sh
set -e
# Wait for a TCP dependency. Uses bash's /dev/tcp because the runtime image
# installs only curl and jq, not nc (netcat).
wait_for_service() {
  local host=$1
  local port=$2
  local timeout=${3:-30}
  echo "Waiting for $host:$port..."
  for i in $(seq 1 "$timeout"); do
    if timeout 1 bash -c "</dev/tcp/$host/$port" 2>/dev/null; then
      echo "$host:$port is available!"
      return 0
    fi
    echo "Attempt $i/$timeout: $host:$port not ready yet..."
    sleep 1
  done
  echo "Timeout waiting for $host:$port"
  return 1
}
# Check the required environment variables
check_required_vars() {
  local required_vars=("DB_HOST" "DB_PORT")
  local missing_vars=()
  for var in "${required_vars[@]}"; do
    if [ -z "${!var}" ]; then
      missing_vars+=("$var")
    fi
  done
  if [ ${#missing_vars[@]} -ne 0 ]; then
    echo "Error: Missing required environment variables: ${missing_vars[*]}"
    exit 1
  fi
}
# Main function
main() {
  echo "Starting application..."
  # Check the required environment variables
  check_required_vars
  # Wait for the database
  if [ -n "$DB_HOST" ] && [ -n "$DB_PORT" ]; then
    wait_for_service "$DB_HOST" "$DB_PORT"
  fi
  # Wait for Redis
  if [ -n "$REDIS_HOST" ] && [ -n "$REDIS_PORT" ]; then
    wait_for_service "$REDIS_HOST" "$REDIS_PORT"
  fi
  # exec replaces the shell, so the JVM runs as PID 1 and receives SIGTERM directly
  echo "Starting Java application with options: $JAVA_OPTS"
  exec java $JAVA_OPTS -jar app.jar "$@"
}
# Run the main function
main "$@"
IV. Complete Kubernetes Deployment Configuration
1. Namespace
yaml
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: springboot-app
  labels:
    name: springboot-app
    environment: production
2. Configuration Management
yaml
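# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: springboot-app
data:
  application.yml: |
    spring:
      application:
        name: springboot-app
      datasource:
        url: jdbc:mysql://mysql-service:3306/appdb
        username: ${DB_USERNAME}
        password: ${DB_PASSWORD}
        hikari:
          maximum-pool-size: 10
          minimum-idle: 5
      # Spring Boot 2.x prefix; Spring Boot 3.x uses spring.data.redis instead.
      # Redis is deployed in cluster mode, so configure seed nodes rather than a single host
      # (a standalone client cannot follow MOVED redirects between cluster nodes).
      redis:
        cluster:
          nodes: redis-cluster-0.redis-service:6379,redis-cluster-1.redis-service:6379,redis-cluster-2.redis-service:6379
        timeout: 2000ms
      kafka:
        bootstrap-servers: kafka-service:9092
    management:
      endpoints:
        web:
          exposure:
            include: health,info,metrics,prometheus
      endpoint:
        health:
          probes:
            enabled: true   # exposes /actuator/health/liveness and /actuator/health/readiness
      health:
        db:
          enabled: true
        redis:
          enabled: true
        diskspace:
          enabled: true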
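You can override values in this ConfigMap per environment with container environment variables. Spring Boot's relaxed binding maps a property to an environment variable name by replacing `.` with `_`, removing `-`, and upper-casing the result. Below is a minimal bash sketch of that mapping. The helper `spring_env_name` is hypothetical.

```bash
#!/usr/bin/env bash
# spring_env_name is a hypothetical helper applying Spring Boot's canonical
# property -> environment variable mapping: '.' -> '_', drop '-', upper-case (bash 4+).
spring_env_name() {
  local name=${1//./_}
  name=${name//-/}
  echo "${name^^}"
}

spring_env_name spring.datasource.hikari.maximum-pool-size
```

For example, adding `SPRING_DATASOURCE_HIKARI_MAXIMUMPOOLSIZE=20` to the Deployment's `env` would override `maximum-pool-size: 10` from the mounted file, because OS environment variables take precedence over config files.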
3. Secret Management
yaml
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: springboot-app
type: Opaque
data:
  db-username: YXBwX3VzZXI= # app_user
  db-password: UGFzc3dvcmQxMjM= # Password123
  redis-password: cmVkaXNfcGFzc3dvcmQ= # redis_password
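The values under `data` are only base64-encoded, not encrypted, so anyone who can read the Secret can decode them. A common pitfall when encoding by hand is `echo`, which appends a newline that then becomes part of the secret value. Below is a minimal sketch. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for producing/inspecting Secret "data" values.
# printf '%s' avoids the trailing newline that `echo` would add.
encode_secret() { printf '%s' "$1" | base64 | tr -d '\n'; }
decode_secret() { printf '%s' "$1" | base64 -d; }

encode_secret 'app_user'; echo              # -> YXBwX3VzZXI=
echo 'app_user' | base64                    # -> YXBwX3VzZXIK (the newline is encoded too)
decode_secret 'UGFzc3dvcmQxMjM='; echo      # -> Password123
```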
4. Application Deployment
yaml
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: springboot-app
  namespace: springboot-app
  labels:
    app: springboot-app
    version: v1.0.0
spec:
  replicas: 3
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: springboot-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: springboot-app
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - springboot-app
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        # Prefer an immutable tag such as the commit SHA the CI pipeline pushes: with :latest
        # and IfNotPresent, a node keeps using whichever "latest" it pulled first
        image: your-registry/springboot-app:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        env:
        # DB_HOST and DB_PORT are required by check_required_vars in docker-entrypoint.sh
        - name: DB_HOST
          value: "mysql-service"
        - name: DB_PORT
          value: "3306"
        - name: DB_USERNAME
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: db-username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: db-password
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: redis-password
        - name: JAVA_OPTS
          value: "-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0 -XX:+UseG1GC"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /actuator/health/readiness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 30
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        # Spring Boot has no built-in "startup" health group (/actuator/health/startup returns 404),
        # so probe the liveness group; this allows up to 30 x 10s = 300s for startup
        startupProbe:
          httpGet:
            path: /actuator/health/liveness
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 10
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 30
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
        - name: logs-volume
          mountPath: /app/logs
      volumes:
      - name: config-volume
        configMap:
          name: app-config
      - name: logs-volume
        emptyDir: {}
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
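With `replicas: 3`, `maxSurge: 1`, and `maxUnavailable: 0`, a rollout never drops below 3 available Pods and never runs more than 4 in total. The cluster therefore needs headroom for one extra Pod's requests (250m CPU, 512Mi). Below is a sketch of the arithmetic. It follows the rounding rules in the Kubernetes documentation: absolute values are used as-is, `maxSurge` percentages round up, and `maxUnavailable` percentages round down. The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers reproducing the documented Deployment rounding rules.
# to_count: $1 absolute number or percentage, $2 replicas, $3 "up" or "down"
to_count() {
  local value=$1 replicas=$2 mode=$3
  if [[ $value == *% ]]; then
    local pct=${value%\%}
    if [ "$mode" = up ]; then
      echo $(( (replicas * pct + 99) / 100 ))   # maxSurge: round up
    else
      echo $(( replicas * pct / 100 ))          # maxUnavailable: round down
    fi
  else
    echo "$value"
  fi
}

# rollout_bounds: $1 replicas, $2 maxSurge, $3 maxUnavailable
rollout_bounds() {
  local replicas=$1 surge unavailable
  surge=$(to_count "$2" "$replicas" up)
  unavailable=$(to_count "$3" "$replicas" down)
  echo "min_available=$(( replicas - unavailable )) max_total=$(( replicas + surge ))"
}

rollout_bounds 3 1 0        # the Deployment above
rollout_bounds 10 25% 25%   # the Kubernetes defaults applied to 10 replicas
```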
5. Service Discovery
yaml
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: springboot-service
  namespace: springboot-app
  labels:
    app: springboot-app
spec:
  selector:
    app: springboot-app
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  type: ClusterIP
6. Horizontal Pod Autoscaling
yaml
# hpa.yaml
# Resource metrics require metrics-server in the cluster; utilization is measured
# as a percentage of the Pods' requests (250m CPU / 512Mi memory)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: springboot-hpa
  namespace: springboot-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: springboot-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60
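The HPA controller computes `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`. It skips scaling when the ratio is within the default 10% tolerance and clamps the result to `minReplicas`/`maxReplicas`. With several metrics, as above, it takes the largest proposal. Below is a minimal single-metric sketch (hypothetical helper). It does not model the `behavior` policies or the stabilization windows.

```bash
#!/usr/bin/env bash
# hpa_desired is a hypothetical helper implementing the HPA formula for ONE metric:
#   desired = ceil(current * currentUtilization / targetUtilization)
# with the default 10% tolerance and the min/max clamp.
# Stabilization windows and scaleUp/scaleDown policies are NOT modeled.
hpa_desired() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  local desired
  # Ratio within [0.9, 1.1]: no change (compared in integers)
  if (( util * 10 >= target * 9 && util * 10 <= target * 11 )); then
    desired=$current
  else
    desired=$(( (current * util + target - 1) / target ))   # integer ceil
  fi
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# 3 Pods averaging 140% of their CPU request against the 70% target
hpa_desired 3 140 70 2 10
```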
7. Ingress Routing
yaml
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: springboot-ingress
  namespace: springboot-app
  annotations:
    # No rewrite-target here: combined with path "/", rewrite-target: / would rewrite
    # every request URI (e.g. /api/users) to "/"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "30"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.yourdomain.com
    # Must already exist, e.g. created with "kubectl create secret tls" or by cert-manager
    secretName: tls-secret
  rules:
  - host: app.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: springboot-service
            port:
              number: 80
V. Containerized Middleware Deployment
1. MySQL StatefulSet Deployment
yaml
# mysql-statefulset.yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-service
  namespace: springboot-app
  labels:
    app: mysql
spec:
  ports:
  - port: 3306
    name: mysql
  clusterIP: None
  selector:
    app: mysql
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: springboot-app
spec:
  serviceName: mysql-service
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: root-password
        - name: MYSQL_DATABASE
          value: "appdb"
        - name: MYSQL_USER
          value: "app_user"
        - name: MYSQL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: user-password
        ports:
        - containerPort: 3306
          name: mysql
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
        - name: mysql-config
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          exec:
            command:
            - mysqladmin
            - ping
            - -h
            - localhost
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            # The kubelet substitutes $(VAR) in probe commands only for env vars with a literal
            # value, not for Secret-backed ones, so let the shell expand the password
            command:
            - sh
            - -c
            - mysql -h localhost -uapp_user -p"$MYSQL_PASSWORD" -e "SELECT 1"
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 2
      volumes:
      - name: mysql-config
        configMap:
          name: mysql-config
  # Needs a default StorageClass; a fresh kubeadm cluster has none, so install a
  # provisioner (e.g. local-path-provisioner or NFS) or the PVC stays Pending
  volumeClaimTemplates:
  - metadata:
      name: mysql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 20Gi
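The StatefulSet mounts a ConfigMap named `mysql-config` at `/etc/mysql/conf.d`, and `deploy.sh` applies a `mysql-config.yaml`, but the article does not show that file. The official `mysql:8.0` image includes any `*.cnf` files found in `/etc/mysql/conf.d`. Below is a hypothetical minimal version, written as a bash function that generates the manifest. The `[mysqld]` values are illustrative assumptions, not tuned settings. `max_connections` should at least cover the HPA's `maxReplicas` (10) × Hikari's `maximum-pool-size` (10) = 100 application connections.

```bash
#!/usr/bin/env bash
# write_mysql_config is a hypothetical helper; the [mysqld] values are examples only.
# Writes a ConfigMap manifest whose custom.cnf ends up in /etc/mysql/conf.d.
write_mysql_config() {
  local out=${1:-mysql-config.yaml}
  cat > "$out" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  namespace: springboot-app
data:
  custom.cnf: |
    [mysqld]
    character-set-server=utf8mb4
    collation-server=utf8mb4_unicode_ci
    max_connections=200
EOF
}
```

Run `write_mysql_config mysql-config.yaml` in the directory that holds `deploy.sh` before deploying.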
2. Redis Cluster Deployment
yaml
# redis-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: springboot-app
data:
  redis.conf: |
    cluster-enabled yes
    cluster-config-file /data/nodes.conf
    cluster-node-timeout 5000
    appendonly yes
    dir /data
    port 6379
    bind 0.0.0.0
    protected-mode no
    daemonize no
    pidfile /var/run/redis.pid
    loglevel notice
    logfile ""
    databases 16
    save 900 1
    save 300 10
    save 60 10000
    stop-writes-on-bgsave-error yes
    rdbcompression yes
    rdbchecksum yes
    dbfilename dump.rdb
    # Keep maxmemory below the 512Mi container limit so Redis evicts keys before the
    # kernel OOM-kills the Pod, leaving headroom for copy-on-write during saves
    maxmemory 384mb
    maxmemory-policy allkeys-lru
---
# redis-statefulset.yaml
# Headless Service required by serviceName: gives each Pod a stable DNS name,
# e.g. redis-cluster-0.redis-service.springboot-app.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: springboot-app
  labels:
    app: redis
spec:
  clusterIP: None
  selector:
    app: redis
  ports:
  - port: 6379
    name: client
  - port: 16379
    name: gossip
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
  namespace: springboot-app
spec:
  serviceName: redis-service
  replicas: 6
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        command: ["redis-server"]
        args: ["/etc/redis/redis.conf"]
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        volumeMounts:
        - name: redis-data
          mountPath: /data
        - name: redis-config
          mountPath: /etc/redis
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "200m"
        livenessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          exec:
            command:
            - redis-cli
            - ping
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: redis-config
        configMap:
          name: redis-config
          items:
          - key: redis.conf
            path: redis.conf
  volumeClaimTemplates:
  - metadata:
      name: redis-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi
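With 6 Pods and `--cluster-replicas 1`, `redis-cli --cluster create` makes 3 masters (6 / (1 + 1)) and splits the 16384 hash slots evenly among them. Below is a sketch of that split in bash. The helper is hypothetical and mirrors the ranges redis-cli prints for 3 masters: 0-5460, 5461-10922, and 10923-16383.

```bash
#!/usr/bin/env bash
# slot_ranges is a hypothetical helper: split 16384 hash slots across N masters,
# rounding each boundary to the nearest slot; the last master takes the remainder.
slot_ranges() {
  local masters=$1 first=0 last i
  local -a ranges=()
  for (( i = 0; i < masters; i++ )); do
    if (( i == masters - 1 )); then
      last=16383
    else
      # round((i + 1) * 16384 / masters - 1) using integer math
      last=$(( ((i + 1) * 16384 * 2 - masters) / (2 * masters) ))
    fi
    ranges+=("${first}-${last}")
    first=$(( last + 1 ))
  done
  echo "${ranges[*]}"
}

pods=6; replicas_per_master=1
slot_ranges $(( pods / (replicas_per_master + 1) ))
```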
VI. One-Click Deployment Script
bash
#!/bin/bash
# deploy.sh - one-click deployment script
set -eo pipefail
echo "Deploying the cloud-native application..."
# Check for kubectl
if ! command -v kubectl &> /dev/null; then
  echo "Error: kubectl is not installed"
  exit 1
fi
# Create the namespace
echo "Creating namespace..."
kubectl apply -f namespace.yaml
# Create the secrets. "create --dry-run=client -o yaml | apply" keeps the script re-runnable;
# a plain "kubectl create" fails on the second run because the secrets already exist.
echo "Creating secrets..."
# The application must log in with the same password MySQL sets for app_user
DB_USER_PASSWORD='UserPassword123!'
kubectl create secret generic mysql-secret \
  --from-literal=root-password='RootPassword123!' \
  --from-literal=user-password="$DB_USER_PASSWORD" \
  --namespace=springboot-app --dry-run=client -o yaml | kubectl apply -f -
kubectl create secret generic app-secrets \
  --from-literal=db-username='app_user' \
  --from-literal=db-password="$DB_USER_PASSWORD" \
  --from-literal=redis-password='redis_password' \
  --namespace=springboot-app --dry-run=client -o yaml | kubectl apply -f -
# Deploy the middleware
echo "Deploying MySQL..."
kubectl apply -f mysql-config.yaml
kubectl apply -f mysql-statefulset.yaml
echo "Deploying Redis..."
kubectl apply -f redis-config.yaml
kubectl apply -f redis-statefulset.yaml
# Wait for the middleware. "kubectl wait pod -l ..." fails immediately if the Pods
# have not been created yet, so wait for the StatefulSet rollouts instead
echo "Waiting for the middleware to start..."
kubectl rollout status statefulset/mysql -n springboot-app --timeout=300s
kubectl rollout status statefulset/redis-cluster -n springboot-app --timeout=300s
# Initialize the Redis cluster (skipped when it already reports cluster_state:ok)
echo "Initializing the Redis cluster..."
CLUSTER_INFO=$(kubectl exec redis-cluster-0 -n springboot-app -- redis-cli cluster info || true)
if [[ "$CLUSTER_INFO" != *cluster_state:ok* ]]; then
  # Pass the Pod IPs, the usual approach for redis-cli --cluster create
  REDIS_NODES=$(kubectl get pods -l app=redis -n springboot-app \
    -o jsonpath='{range .items[*]}{.status.podIP}:6379 {end}')
  read -ra NODES <<< "$REDIS_NODES"
  if [ ${#NODES[@]} -eq 6 ]; then
    # --cluster-yes answers the confirmation prompt; no -it, since the script is non-interactive
    kubectl exec redis-cluster-0 -n springboot-app -- \
      redis-cli --cluster create "${NODES[@]}" --cluster-replicas 1 --cluster-yes
  fi
fi
# Deploy the application
echo "Deploying the Spring Boot application..."
kubectl apply -f configmap.yaml
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
kubectl apply -f hpa.yaml
# Deploy the Ingress (if present)
if [ -f ingress.yaml ]; then
  echo "Deploying Ingress..."
  kubectl apply -f ingress.yaml
fi
# Check the deployment status
echo "Checking deployment status..."
kubectl get all -n springboot-app
echo "Deployment complete!"
echo "Application URLs:"
echo "- In-cluster: http://springboot-service.springboot-app.svc.cluster.local"
echo "- External (if an Ingress is configured): https://app.yourdomain.com"
echo ""
echo "Monitor the application:"
echo "kubectl get pods -n springboot-app"
echo "kubectl logs -f deployment/springboot-app -n springboot-app"
VII. Monitoring and Diagnostic Tools
1. Monitoring Script
bash
#!/bin/bash
# monitor.sh - monitoring and diagnostics script
set -e
NAMESPACE=${1:-springboot-app}
echo "=== Cloud-native application diagnostics ==="
echo "Namespace: $NAMESPACE"
echo "Time: $(date)"
echo ""
# 1. Resources in the namespace
echo "1. Namespace resource overview:"
kubectl get all -n "$NAMESPACE"
echo ""
# 2. Pod status
echo "2. Pod details:"
kubectl get pods -n "$NAMESPACE" -o wide
echo ""
# 3. Recent events
echo "3. Recent events:"
kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp' | tail -10
echo ""
# 4. Resource usage (needs metrics-server; continue instead of aborting under set -e)
echo "4. Resource usage:"
kubectl top pods -n "$NAMESPACE" || echo "kubectl top failed (is metrics-server installed?)"
echo ""
# 5. HPA status
echo "5. HPA status:"
kubectl get hpa -n "$NAMESPACE"
echo ""
# 6. Service endpoints
echo "6. Service endpoints:"
kubectl get endpoints -n "$NAMESPACE"
echo ""
# 7. Ingress status
echo "7. Ingress status:"
kubectl get ingress -n "$NAMESPACE"
echo ""
# 8. Configuration
echo "8. ConfigMaps and Secrets:"
kubectl get configmap,secret -n "$NAMESPACE"
echo ""
# 9. Network policies
echo "9. Network policies:"
kubectl get networkpolicy -n "$NAMESPACE"
echo ""
# 10. Storage (PVs are cluster-scoped, so all PVs are listed)
echo "10. Storage volumes:"
kubectl get pvc,pv -n "$NAMESPACE"
echo ""
# 11. Nodes
echo "11. Node status:"
kubectl get nodes -o wide
echo ""
# 12. Cluster version (the --short flag was deprecated and removed in newer kubectl releases)
echo "12. Cluster version:"
kubectl version
echo ""
echo "=== Diagnostics complete ==="
echo "Common troubleshooting commands:"
echo "1. Pod in CrashLoopBackOff: kubectl logs <pod-name> -n $NAMESPACE --previous"
echo "2. Pod Pending: kubectl describe pod <pod-name> -n $NAMESPACE"
echo "3. Service unreachable: kubectl describe service <service-name> -n $NAMESPACE"
echo "4. Image pull failure: kubectl describe pod <pod-name> -n $NAMESPACE | grep -A5 Events"
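The Pod listing in step 2 is easier to act on when it is filtered down to the Pods that need attention. Below is a minimal sketch: an awk filter for `kubectl get pods --no-headers` output that prints Pods whose STATUS is not Running/Completed, or which are Running but not fully ready. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# unhealthy_pods is a hypothetical filter for `kubectl get pods --no-headers` output
# (columns: NAME READY STATUS RESTARTS AGE). Prints NAME READY STATUS of problem Pods.
unhealthy_pods() {
  awk '{
    split($2, r, "/")
    if (($3 != "Running" && $3 != "Completed") || ($3 == "Running" && r[1] != r[2]))
      print $1, $2, $3
  }'
}

# Usage against a live cluster:
#   kubectl get pods -n springboot-app --no-headers | unhealthy_pods
printf '%s\n' \
  'springboot-app-5f8c6d8b5d-abcde   0/1   CrashLoopBackOff   5   2m' \
  'springboot-app-5f8c6d8b5d-fghij   1/1   Running            0   2m' \
  | unhealthy_pods
```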
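2. Cloud-Native Monitoring Stack
Figure: cloud-native monitoring stack, by layer
- Data collection layer: application metrics, system metrics, business metrics, log data, distributed traces
- Data processing layer: Prometheus, Fluentd, Jaeger, OpenTelemetry
- Data storage layer: time-series database, Elasticsearch, object storage
- Visualization layer: Grafana, Kibana, custom dashboards
- Alerting layer: Alertmanager, WeCom, DingTalk, email/SMS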
VIII. CI/CD Pipeline Configuration
1. Complete GitLab CI/CD Configuration
yaml
# .gitlab-ci.yml
variables:
  DOCKER_IMAGE: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA
  K8S_NAMESPACE: "springboot-app"
  # Keep the kubeconfig inside the project directory: the bitnami/kubectl image
  # runs as a non-root user and cannot create /kubeconfig
  KUBECONFIG: "$CI_PROJECT_DIR/kubeconfig"
stages:
  - build
  - test
  - security
  - package
  - deploy-dev
  - deploy-staging
  - deploy-prod
# Build
build:
  stage: build
  image: gradle:jdk17
  script:
    - ./gradlew clean build -x test
  artifacts:
    paths:
      - build/libs/*.jar
    expire_in: 1 week
# Unit tests
unit-test:
  stage: test
  image: gradle:jdk17
  script:
    - ./gradlew test
  coverage: '/Total.*?([0-9]{1,3})%/'
  artifacts:
    reports:
      junit: build/test-results/test/**/TEST-*.xml
# Security scan: Grype checks the project's dependencies against known vulnerabilities
sast:
  stage: security
  image:
    name: "gcr.io/cloud-marketplace-containers/google/debian11"
    entrypoint: [""]
  script:
    - apt-get update && apt-get install -y wget ca-certificates
    # "-b /usr/local/bin" puts grype on the PATH (the install script defaults to ./bin)
    - wget -qO- https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin
    - grype dir:. --fail-on high
# Build the Docker image
docker-build:
  stage: package
  image: docker:latest
  services:
    - docker:dind
  script:
    # --password-stdin keeps the password out of the process list
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - docker build -t $DOCKER_IMAGE .
    - docker push $DOCKER_IMAGE
# Deploy to the development environment
deploy-dev:
  stage: deploy-dev
  image:
    name: bitnami/kubectl:latest
    # The image's entrypoint is kubectl; clear it so GitLab can run the script in a shell
    entrypoint: [""]
  script:
    - echo "$KUBECONFIG_DEV" | base64 -d > $KUBECONFIG
    - kubectl config use-context dev-cluster
    - kubectl set image deployment/springboot-app app=$DOCKER_IMAGE -n $K8S_NAMESPACE
    - kubectl rollout status deployment/springboot-app -n $K8S_NAMESPACE --timeout=300s
  environment:
    name: development
    url: https://dev.app.yourdomain.com
  only:
    - dev
# Deploy to production
deploy-prod:
  stage: deploy-prod
  image:
    name: bitnami/kubectl:latest
    entrypoint: [""]
  script:
    - echo "$KUBECONFIG_PROD" | base64 -d > $KUBECONFIG
    - kubectl config use-context prod-cluster
    # Simple canary: pause the rollout right after it starts. With maxSurge: 1 this usually
    # leaves about one new Pod serving traffic, but the exact count at pause time is not guaranteed
    - kubectl set image deployment/springboot-app app=$DOCKER_IMAGE -n $K8S_NAMESPACE
    - kubectl rollout pause deployment/springboot-app -n $K8S_NAMESPACE
    - sleep 60 # observe the canary version
    - kubectl rollout resume deployment/springboot-app -n $K8S_NAMESPACE
    - kubectl rollout status deployment/springboot-app -n $K8S_NAMESPACE --timeout=600s
  environment:
    name: production
    url: https://app.yourdomain.com
  when: manual
  only:
    - main
IX. Performance Testing and Security Scanning
1. Performance Test Script
bash
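#!/bin/bash
# performance-test.sh
# The default BASE_URL is a cluster-internal DNS name: run this from a Pod inside the
# cluster, or pass an external URL as the third argument.
set -e
echo "=== Cloud-native application performance test ==="
echo "Start time: $(date)"
echo ""
# Test parameters
CONCURRENT_USERS=${1:-100}
DURATION=${2:-300}
BASE_URL=${3:-"http://springboot-service.springboot-app.svc.cluster.local"}
# Warm up the application (JIT compilation, connection pools)
echo "1. Warming up the application..."
for i in {1..10}; do
  curl -s -o /dev/null -w "%{http_code}" $BASE_URL/actuator/health
  echo " - request $i: done"
  sleep 1
done
echo ""
# Load test
echo "2. Running the load test..."
echo "Concurrent users: $CONCURRENT_USERS"
echo "Duration: ${DURATION}s"
echo ""
# /actuator/health is only a smoke-test target; use real business endpoints for meaningful numbers
if command -v wrk &> /dev/null; then
  # wrk threads should roughly match the CPU cores, not the connection count (-c must be >= -t)
  THREADS=$(nproc)
  if [ "$THREADS" -gt "$CONCURRENT_USERS" ]; then THREADS=$CONCURRENT_USERS; fi
  wrk -t$THREADS -c$CONCURRENT_USERS -d${DURATION}s $BASE_URL/actuator/health
else
  echo "Warning: wrk is not installed, falling back to ab"
  # ab sends a fixed number of requests (users x duration) rather than running for DURATION seconds
  ab -n $((CONCURRENT_USERS * DURATION)) -c $CONCURRENT_USERS $BASE_URL/actuator/health
fi
echo "Performance test complete!"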
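When sizing `CONCURRENT_USERS`, Little's law gives a quick estimate for a closed-loop test with no think time (which is how wrk and ab drive load): throughput ≈ concurrent connections ÷ average latency. Below is a minimal sketch with integer math. The helper is hypothetical.

```bash
#!/usr/bin/env bash
# expected_rps is a hypothetical helper applying Little's law to a closed-loop load test
# with no think time: throughput (req/s) = concurrency / latency (s).
expected_rps() {
  local concurrency=$1 latency_ms=$2
  echo $(( concurrency * 1000 / latency_ms ))
}

expected_rps 100 50    # 100 connections at 50 ms average latency -> about 2000 req/s
```

If the measured latency grows as concurrency rises while the request rate stays flat, the service (or its dependencies) has saturated.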
2. Security Scan Script
bash
#!/bin/bash
# security-scan.sh
set -e
echo "=== Cloud-native security scan ==="
echo "Start time: $(date)"
echo ""
NAMESPACE="springboot-app"
# Image vulnerability scan
echo "1. Scanning images for vulnerabilities..."
if command -v trivy &> /dev/null; then
  IMAGES=$(kubectl get pods -n $NAMESPACE -o json | jq -r '.items[].spec.containers[].image' | sort -u)
  for IMAGE in $IMAGES; do
    echo "Scanning image: $IMAGE"
    trivy image --severity HIGH,CRITICAL $IMAGE || true
    echo ""
  done
else
  echo "Warning: trivy is not installed, skipping the image scan"
fi
# Check the security settings
echo "2. Checking security settings..."
echo "Pod security contexts (an empty {} means none is set):"
kubectl get pods -n $NAMESPACE -o json | jq '.items[].spec.securityContext' || echo "Failed to read the Pod security contexts"
echo ""
echo "Network policies:"
kubectl get networkpolicy -n $NAMESPACE
echo ""
echo "Security scan complete!"
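X. Java Application Container Deployment Flow
Figure: deployment sequence diagram (participants: developer, Git repository, CI/CD platform, image registry, Kubernetes, application instance). The developer commits code, which triggers the pipeline; then:
1. Compile the code
2. Run the unit tests
3. Run the security scan
4. Build the Docker image
5. Push the image
6. Update the Deployment
7. Create new Pods
8. Start the new containers
9. Run health checks
10. Switch traffic
11. Clean up the old Pods
12. Deployment complete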
XI. Troubleshooting Common Problems
1. Pod Fails to Start
Symptom:
kubectl get pods -n springboot-app
NAME                              READY   STATUS             RESTARTS   AGE
springboot-app-5f8c6d8b5d-abcde   0/1     CrashLoopBackOff   5          2m
Troubleshooting steps:
bash
# 1. Describe the Pod
kubectl describe pod springboot-app-5f8c6d8b5d-abcde -n springboot-app
# 2. View the Pod logs (add --previous to see the output of the crashed container)
kubectl logs springboot-app-5f8c6d8b5d-abcde -n springboot-app
# 3. View the events
kubectl get events -n springboot-app --sort-by='.lastTimestamp' | grep springboot-app
# 4. Open a shell in the Pod for debugging (only works while the container is running)
kubectl exec -it springboot-app-5f8c6d8b5d-abcde -n springboot-app -- /bin/sh
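The restart count helps when reading CrashLoopBackOff. The kubelet restarts a failing container with an exponential back-off (10s, 20s, 40s, ...) capped at five minutes, and it resets the back-off once the container has run for 10 minutes without problems. Below is a sketch of the delay schedule. The helper is hypothetical, and the kubelet's exact timing may differ slightly.

```bash
#!/usr/bin/env bash
# backoff_delay is a hypothetical helper: approximate delay (seconds) before the kubelet
# restarts a container that has already been restarted N times (10s doubling, capped at 300s).
backoff_delay() {
  local restarts=$1 delay=10 i
  for (( i = 0; i < restarts; i++ )); do
    delay=$(( delay * 2 ))
    if (( delay >= 300 )); then
      delay=300
      break
    fi
  done
  echo "$delay"
}

for n in 0 1 2 3 4 5; do
  echo "restarts=$n -> next restart in ~$(backoff_delay "$n")s"
done
```

With RESTARTS at 5, as in the example above, each new attempt already waits about five minutes. Fixing the cause and redeploying is faster than waiting for another restart.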
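2. Service Unreachable
Symptoms:
- The service cannot be reached from outside the cluster
- Calls between services inside the cluster fail
Troubleshooting steps:
bash
# 1. Check the Service
kubectl get svc -n springboot-app
kubectl describe svc springboot-service -n springboot-app
# 2. Check the Endpoints (an empty list means the selector matches no ready Pods)
kubectl get endpoints -n springboot-app
# 3. Check the network policies
kubectl get networkpolicy -n springboot-app
# 4. Test access from inside the cluster
kubectl run test-curl --image=curlimages/curl -it --rm --restart=Never -- curl http://springboot-service.springboot-app.svc.cluster.local/actuator/health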
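3. Insufficient Resources
Symptom:
kubectl get pods -n springboot-app
NAME                              READY   STATUS    RESTARTS   AGE
springboot-app-5f8c6d8b5d-abcde   0/1     Pending   0          5m
Troubleshooting steps:
bash
# 1. Describe the Pod (the Events section shows the scheduler's reason, e.g. "Insufficient cpu")
kubectl describe pod springboot-app-5f8c6d8b5d-abcde -n springboot-app
# 2. Check node resources (compare Allocatable with Allocated resources)
kubectl describe nodes
# 3. Check resource quotas
kubectl describe quota -n springboot-app
# 4. Adjust the resource requests
# Edit the resources section in deployment.yaml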
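A Pod stays Pending when no node has enough unreserved allocatable capacity for its requests. Limits do not affect scheduling. To estimate the total the application needs, convert the quantities and multiply by the replica count. For example, at the HPA's `maxReplicas: 10`, the Deployment above requests 2500m CPU and 5120Mi memory. The sketch below handles only the units used in this article (whole cores, `m`, `Mi`, `Gi`). The helpers are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: convert Kubernetes quantities to millicores / MiB.
# Only whole cores, "m", "Mi" and "Gi" are handled.
cpu_to_millis() {
  case $1 in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}
mem_to_mib() {
  case $1 in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *)   echo "unsupported unit: $1" >&2; return 1 ;;
  esac
}

replicas=10   # HPA maxReplicas
echo "CPU requests:    $(( replicas * $(cpu_to_millis 250m) ))m"
echo "Memory requests: $(( replicas * $(mem_to_mib 512Mi) ))Mi"
```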
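XII. Best Practices Summary
1. Containerization Best Practices
Image optimization:
- Use multi-stage builds to reduce image size
- Use lightweight base images such as Alpine
- Clean up build caches and temporary files
- Run containers as a non-root user
Resource configuration:
- Set requests and limits appropriately
- Enable health checks (probes)
- Configure resource quotas
- Use affinity and anti-affinity rules
2. Deployment Best Practices
Rolling updates:
- Choose appropriate maxSurge and maxUnavailable values
- Use readiness probes to keep the service available
- Set an appropriate minReadySeconds
High availability:
- Run multiple replicas
- Spread Pods across nodes
- Use a PodDisruptionBudget
3. Monitoring Best Practices
Metrics collection:
- Application metrics (business metrics, performance metrics)
- System metrics (CPU, memory, disk, network)
- Middleware metrics (database, cache, message queue)
Alerting:
- Set sensible alert thresholds
- Avoid alert storms
- Use tiered alerts (warning, critical, emergency)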
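XIII. Suggestions for Further Learning
1. Advanced Topics
Service mesh:
- Istio: a feature-rich, widely used service mesh
- Linkerd: a lightweight service mesh
- Consul: service discovery and configuration management
Cloud-native security:
- Container security
- Network security
- Authentication and authorization
- Compliance checks
Performance optimization:
- Application performance optimization
- Cluster performance optimization
- Cost optimization
2. Suggested Practice Projects
Project 1: Microservice e-commerce system
- Product, order, and user services
- Spring Cloud + Kubernetes
- A complete CI/CD pipeline
Project 2: Real-time data processing platform
- Kafka for data streaming
- Flink for real-time computation
- Redis for caching
Project 3: AI model serving platform
- Model training and inference services
- GPU resource scheduling
- Model versioning and A/B testing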
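XIV. Summary
After working through this hands-on guide, you should now be familiar with:
- Environment setup: building a Kubernetes cluster from scratch
- Containerization: packaging a Java application as a Docker image
- K8s deployment: deploying the application to the cluster with YAML manifests
- Middleware deployment: running MySQL, Redis, and other middleware in containers
- Automated operations: monitoring, diagnostics, and CI/CD pipelines
- Troubleshooting: diagnosing and fixing common problems
Cloud native takes continuous learning and practice. I recommend that you:
- Get hands-on: try these deployments in your own environment
- Join the community: take part in the Kubernetes and cloud-native communities
- Keep learning: follow the latest developments in cloud-native technology
- Share your experience: pass what you learn on to others
I hope this guide helps you go further on your cloud-native journey!
Previous article: "Cloud Native Fully Explained: From Concepts to Practice, How the Java Stack Embraces the Cloud-Native Era" (《云原生全解析:从概念到实践,Java技术栈如何拥抱云原生时代》)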
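About the author: an Internet industry veteran focused on Java microservices and cloud-native practice.
Copyright notice: This is an original article licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.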