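LangChain.js Complete Developer Handbook (14): Production Deployment and DevOps Practices

Chapter 14: Production Deployment and DevOps Practices

Preface

Hi everyone, I'm 鲫小鱼, a front-end engineer who doesn't write front-end code. I enjoy sharing knowledge from outside the front-end world and helping fellow "UI slicers" break out of that niche. You're welcome to follow my WeChat official account, 《鲫小鱼不正经》, and to like, bookmark, and follow.

🎯 Learning Objectives for This Chapter

  • Master the complete production deployment workflow for AI applications, from development to launch
  • Understand how containerization (Docker) applies to AI applications and how to optimize it
  • Build a complete CI/CD pipeline: automated testing, build, deployment, and rollback
  • Set up comprehensive monitoring: performance monitoring, log management, alerting, and troubleshooting
  • Learn cloud-native deployment strategies: Kubernetes, service mesh, load balancing, and autoscaling
  • Implement a security baseline: API security, data encryption, access control, and compliance management
  • Complete an enterprise-grade AI application deployment end to end, from zero to production

📖 Fundamentals: Core Concepts of Production Deployment (approx. 30%)

14.1 Why Deploying AI Applications Is So Complex

Compared with traditional web applications, AI applications face distinct challenges in production:

🔍 Resource-intensive

  • Model files are often large (several GB to tens of GB) and need efficient storage and transfer strategies
  • Inference consumes substantial CPU/GPU resources and places high demands on hardware
  • Memory consumption is high and calls for careful memory management and optimization

⚡ Latency-sensitive

  • Users expect AI responses to feel immediate, with latency targets down to the millisecond range
  • Models take a long time to load, so warm-up and caching strategies are needed
  • Network latency has a noticeable impact on user experience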
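
The warm-up and caching idea above can be sketched as a small in-memory cache with a time-to-live. This is only an illustrative sketch under stated assumptions: `TtlCache` and `warmUp` are hypothetical names that do not appear in the original project, and a production system would more likely use Redis, as the later sections do.

```typescript
// Minimal in-memory TTL cache (illustrative sketch, not part of the original project).
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so that expiry can be tested without real waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Return the cached value, or compute and store it on a miss.
  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await load();
    this.set(key, value);
    return value;
  }
}

// Warm-up: pre-populate the cache for known hot keys before taking traffic.
async function warmUp<V>(
  cache: TtlCache<V>,
  keys: string[],
  load: (key: string) => Promise<V>,
): Promise<void> {
  await Promise.all(keys.map((key) => cache.getOrLoad(key, () => load(key))));
}
```

Calling `warmUp` during startup, before the readiness check reports healthy, keeps the first real users from paying the cold-start cost.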

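🔄 Complex version management

  • Model, code, and data versions must be managed in a coordinated way
  • A/B testing and gradual (gray) release strategies become more complex
  • Rollback mechanisms must account for model compatibility

📊 Special monitoring needs

  • Model quality metrics (accuracy, recall, and so on) must be monitored
  • Token usage and API call costs need real-time monitoring
  • Model drift and performance degradation must be detected promptly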
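
To make the cost-monitoring point concrete, here is a minimal sketch of turning token counts into an estimated cost. The model name and the per-1,000-token prices are placeholders invented for illustration, not real provider pricing; `estimateCostUsd` is a hypothetical helper.

```typescript
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

interface ModelPrice {
  promptPer1K: number;     // USD per 1,000 prompt tokens
  completionPer1K: number; // USD per 1,000 completion tokens
}

// Placeholder prices for illustration only; look up real values from your provider.
const PRICES: Record<string, ModelPrice> = {
  'example-model': { promptPer1K: 0.5, completionPer1K: 1.5 },
};

// Estimate the cost of a single request from its token usage.
function estimateCostUsd(model: string, usage: TokenUsage): number {
  const price = PRICES[model];
  if (!price) throw new Error(`No price configured for model: ${model}`);
  return (
    (usage.promptTokens / 1000) * price.promptPer1K +
    (usage.completionTokens / 1000) * price.completionPer1K
  );
}
```

Summing this value per user or per endpoint gives a running cost figure that can be exported as a metric alongside raw token counts.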

14.2 The Evolution of Modern Deployment Architecture

Traditional monolithic deployment → microservice architecture → cloud-native architecture

┌─────────────────────────────────────────────────────────────┐
│          Cloud-Native AI Application Architecture           │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ Frontend    │  │ API Gateway │  │Load Balancer│          │
│  │ (Next.js)   │  │ (Kong)      │  │ (Nginx)     │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ AI Service  │  │ Vector DB   │  │ Cache       │          │
│  │ (LangChain) │  │ (Pinecone)  │  │ (Redis)     │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ Monitoring  │  │ Logging     │  │ Config Mgmt │          │
│  │(Prometheus) │  │ (ELK)       │  │ (Consul)    │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘

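14.3 DevOps Practices Specific to AI Applications

🔄 MLOps vs DevOps

  • MLOps extends DevOps into the machine learning domain
  • It adds stages such as data pipelines, model training, and model deployment
  • It has to handle model versioning, experiment tracking, model monitoring, and similar needs

📈 Continuous Integration / Continuous Deployment (CI/CD)

  • Code quality checks: static analysis, unit tests, integration tests
  • Model validation: accuracy tests, performance benchmarks, A/B tests
  • Automated deployment: blue-green deployment, canary release, rolling update

🔍 Monitoring and observability

  • Application performance monitoring (APM): response time, throughput, error rate
  • Infrastructure monitoring: CPU, memory, disk, and network utilization
  • Business metrics: user satisfaction, conversion rate, cost effectiveness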

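14.4 Deployment Strategies and Patterns

🚀 Deployment strategies

  • Blue-green deployment: maintain two identical production environments and switch between them for zero-downtime releases
  • Canary release: send a small share of traffic to the new version first, then shift the rest gradually
  • Rolling update: replace instances a few at a time while keeping the service available
  • A/B testing: run multiple versions side by side and compare their results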
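
As one way to picture a canary release at the application level, the sketch below assigns each user to a stable bucket and routes a configurable percentage to the new version. It assumes an FNV-1a-style hash over the user ID; `hashToBucket` and `chooseVersion` are hypothetical helpers, and in practice the split is usually done by the load balancer or service mesh rather than in application code.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units, reduced to a bucket in [0, 99].
function hashToBucket(id: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < id.length; i++) {
    hash ^= id.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep unsigned 32-bit
  }
  return hash % 100;
}

// Route a user to the canary when their bucket falls below the rollout percentage.
// The same user always gets the same answer for a given percentage.
function chooseVersion(userId: string, canaryPercent: number): 'stable' | 'canary' {
  return hashToBucket(userId) < canaryPercent ? 'canary' : 'stable';
}
```

Because the bucket depends only on the user ID, raising `canaryPercent` from 5 to 20 keeps the original 5% on the canary and adds new users, which avoids flip-flopping anyone between versions.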

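🏗️ Architecture patterns

  • Microservices: split the AI application into multiple independent services
  • Service mesh: manage service-to-service communication, security, and monitoring in one place
  • Event-driven architecture: use message queues for asynchronous processing
  • CQRS: separate reads from writes to optimize query performance

🐳 Containerization: Docker for AI Applications (approx. 20%)

14.5 Docker Basics and Best Practices

📦 Multi-stage build optimization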

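# File: Dockerfile
# Stage 1: build environment
FROM node:18-alpine AS builder

WORKDIR /app

# Copy package manifests first so the dependency layer can be cached
COPY package*.json ./
COPY tsconfig.json ./

# Install all dependencies: the build step needs devDependencies (e.g. TypeScript).
# Build tools are only needed in this stage, for packages with native addons.
RUN apk add --no-cache python3 make g++ \
    && npm ci

# Copy the source code
COPY src/ ./src/

# Build the application, then remove devDependencies from node_modules
RUN npm run build && npm prune --omit=dev

# Stage 2: production runtime
FROM node:18-alpine AS production

WORKDIR /app

# Create a non-root user and group
RUN addgroup -g 1001 -S nodejs \
    && adduser -S nextjs -u 1001 -G nodejs

# Copy build artifacts
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder --chown=nextjs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nextjs:nodejs /app/package*.json ./

# Set environment variables
ENV NODE_ENV=production
ENV PORT=3000

# Switch to the non-root user
USER nextjs

# Expose the port
EXPOSE 3000

# Health check (BusyBox wget is used because curl is not in the Alpine base image)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -qO /dev/null http://localhost:3000/api/health || exit 1

# Start the application
CMD ["node", "dist/index.js"]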

🔧 Environment configuration management

# File: Dockerfile.dev
FROM node:18-alpine AS development

WORKDIR /app

# Install dependencies, including devDependencies
COPY package*.json ./
RUN npm install

# Copy the source code
COPY . .

# Set development environment variables
ENV NODE_ENV=development
ENV LANGCHAIN_TRACING_V2=true

# Expose the port
EXPOSE 3000

# Start the development server
CMD ["npm", "run", "dev"]

14.6 Multi-Service Orchestration with Docker Compose

# File: docker-compose.yml
version: '3.8'

services:
  # AI application service
  ai-app:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_URL=redis://redis:6379
      - DATABASE_URL=postgresql://postgres:password@postgres:5432/ai_app
    depends_on:
      - redis
      - postgres
    volumes:
      - ./logs:/app/logs
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO", "/dev/null", "http://localhost:3000/api/health"]  # BusyBox wget; curl is not in the Alpine base image
      interval: 30s
      timeout: 10s
      retries: 3

  # Redis cache service
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data
    restart: unless-stopped
    command: redis-server --appendonly yes

  # PostgreSQL database (demo credentials only; inject real secrets from the environment in production)
  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=ai_app
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=password
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

  # Nginx reverse proxy
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/nginx/ssl
    depends_on:
      - ai-app
    restart: unless-stopped

  # Prometheus monitoring
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    restart: unless-stopped

  # Grafana dashboards
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
    restart: unless-stopped

volumes:
  redis_data:
  postgres_data:
  prometheus_data:
  grafana_data:

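14.7 Image Optimization and Security Hardening

🔒 Security best practices

# File: Dockerfile.secure
FROM node:18-alpine AS base

# Apply security updates and install dumb-init
RUN apk update && apk upgrade && apk add --no-cache dumb-init

# Create the application user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001 -G nodejs

# Set the working directory
WORKDIR /app

# Copy package manifests and install all dependencies (the build step needs devDependencies)
COPY package*.json ./
RUN npm ci

# Copy the application code
COPY --chown=nextjs:nodejs . .

# Build, then drop devDependencies and clean the npm cache
RUN npm run build && npm prune --omit=dev && npm cache clean --force

# Switch to the non-root user
USER nextjs

# Runtime environment variables
ENV NODE_ENV=production
ENV NODE_OPTIONS="--max-old-space-size=512"

# Use dumb-init as PID 1
ENTRYPOINT ["dumb-init", "--"]

# Start the application
CMD ["node", "dist/index.js"]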

📊 Image size optimization

# Use .dockerignore to keep unnecessary files out of the build context
# File: .dockerignore
node_modules
npm-debug.log
.git
.gitignore
README.md
.env
.nyc_output
coverage
.vscode
.idea
*.log
dist
build
.next

🚀 CI/CD Pipeline: Automated Deployment in Practice (approx. 25%)

14.8 A Complete GitHub Actions Pipeline

# File: .github/workflows/deploy.yml
name: AI App CI/CD Pipeline

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  # Code quality checks
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run linting
        run: npm run lint

      - name: Run type checking
        run: npm run type-check

      - name: Run unit tests
        run: npm run test:unit
        env:
          NODE_ENV: test

      - name: Run integration tests
        run: npm run test:integration
        env:
          NODE_ENV: test
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY_TEST }}

      - name: Generate coverage report
        run: npm run test:coverage

      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/lcov.info

  # Security scan
  security-scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write  # required to upload SARIF results
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'

      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: 'trivy-results.sarif'

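  # Build and push the image
  build-and-push:
    needs: [quality-check, security-scan]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write  # required to push to ghcr.io with GITHUB_TOKEN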

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  # Deploy to production
  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-west-2

      - name: Update ECS service
        run: |
          aws ecs update-service \
            --cluster ai-app-cluster \
            --service ai-app-service \
            --force-new-deployment

      - name: Wait for deployment to complete
        run: |
          aws ecs wait services-stable \
            --cluster ai-app-cluster \
            --services ai-app-service

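      - name: Run smoke tests
        run: |
          # Give the service time to start
          sleep 30

          # Health check (same /api/health path used by the container and Kubernetes probes)
          curl -f https://api.yourapp.com/api/health || exit 1

          # Basic functional checks (install dependencies first so the npm script can run)
          npm ci
          npm run test:smoke
        env:
          API_BASE_URL: https://api.yourapp.com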

14.9 Environment Configuration Management

// File: src/config/environment.ts
import { z } from 'zod';

// Validation schema for environment variables
const envSchema = z.object({
  NODE_ENV: z.enum(['development', 'test', 'production']).default('development'),
  // Environment variables are strings: coerce to a number so the numeric default is valid
  PORT: z.coerce.number().default(3000),

  // API configuration
  OPENAI_API_KEY: z.string().min(1),
  OPENAI_BASE_URL: z.string().url().optional(),

  // Database configuration
  DATABASE_URL: z.string().url(),
  REDIS_URL: z.string().url(),

  // Monitoring configuration (apply the string default first, then transform to a boolean)
  LANGCHAIN_TRACING_V2: z.string().default('false').transform(val => val === 'true'),
  LANGCHAIN_API_KEY: z.string().optional(),

  // Security configuration
  JWT_SECRET: z.string().min(32),
  CORS_ORIGIN: z.string().default('*'),

  // Feature flags
  ENABLE_RATE_LIMITING: z.string().default('true').transform(val => val === 'true'),
  ENABLE_CACHING: z.string().default('true').transform(val => val === 'true'),
});

// Validate and export the configuration
export const config = envSchema.parse(process.env);

// Exported type
export type Config = z.infer<typeof envSchema>;

// Environment-specific configuration
export const getEnvironmentConfig = () => {
  switch (config.NODE_ENV) {
    case 'development':
      return {
        logLevel: 'debug',
        enableSwagger: true,
        enableMetrics: false,
      };
    case 'test':
      return {
        logLevel: 'error',
        enableSwagger: false,
        enableMetrics: false,
      };
    case 'production':
      return {
        logLevel: 'info',
        enableSwagger: false,
        enableMetrics: true,
      };
    default:
      throw new Error(`Unknown environment: ${config.NODE_ENV}`);
  }
};

14.10 Automated Testing Strategy

// File: src/tests/integration/ai-service.test.ts
import { describe, it, expect, beforeAll, afterAll } from '@jest/globals';
import { ChatOpenAI } from '@langchain/openai';
import { AIAssistant } from '../../services/ai-assistant';

describe('AI Service Integration Tests', () => {
  let aiAssistant: AIAssistant;

  beforeAll(async () => {
    // Use the test environment's API key
    const model = new ChatOpenAI({
      openAIApiKey: process.env.OPENAI_API_KEY_TEST,
      modelName: 'gpt-3.5-turbo',
      temperature: 0,
    });

    aiAssistant = new AIAssistant(model);
  });

  afterAll(async () => {
    // Clean up resources
  });

  describe('Basic Functionality', () => {
    it('should respond to simple questions', async () => {
      const response = await aiAssistant.ask('What is 2+2?');

      expect(response).toBeDefined();
      expect(response.content).toContain('4');
      expect(response.usage).toBeDefined();
    });

    it('should handle empty input gracefully', async () => {
      await expect(aiAssistant.ask('')).rejects.toThrow('Question cannot be empty');
    });

    it('should respect rate limits', async () => {
      const promises = Array(10).fill(null).map(() =>
        aiAssistant.ask('Test question')
      );

      const responses = await Promise.allSettled(promises);
      const rejected = responses.filter(r => r.status === 'rejected');

      // Some of the requests should have been rate limited
      expect(rejected.length).toBeGreaterThan(0);
    });
  });

  describe('Performance Tests', () => {
    it('should respond within acceptable time limits', async () => {
      const startTime = Date.now();
      await aiAssistant.ask('What is the capital of France?');
      const endTime = Date.now();

      const responseTime = endTime - startTime;
      expect(responseTime).toBeLessThan(5000); // respond within 5 seconds
    });
  });

  describe('Error Handling', () => {
    it('should handle API errors gracefully', async () => {
      // Use an invalid API key to exercise error handling
      const invalidModel = new ChatOpenAI({
        openAIApiKey: 'invalid-key',
        modelName: 'gpt-3.5-turbo',
      });

      const invalidAssistant = new AIAssistant(invalidModel);

      await expect(invalidAssistant.ask('Test')).rejects.toThrow();
    });
  });
});

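📊 Monitoring and Observability: A Comprehensive Monitoring Setup (approx. 15%)

14.11 Application Performance Monitoring (APM)

// File: src/monitoring/apm.ts
// prom-client has no `createPrometheusMetrics` export; a namespace import under that
// name keeps the Histogram, Counter, and register references below valid.
import * as createPrometheusMetrics from 'prom-client';
import { Request, Response, NextFunction } from 'express';

// Create the Prometheus metrics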
const httpRequestDuration = new createPrometheusMetrics.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10],
});

const httpRequestTotal = new createPrometheusMetrics.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
});

const aiRequestDuration = new createPrometheusMetrics.Histogram({
  name: 'ai_request_duration_seconds',
  help: 'Duration of AI API requests in seconds',
  labelNames: ['model', 'operation'],
  buckets: [0.5, 1, 2, 5, 10, 30, 60],
});

const aiTokenUsage = new createPrometheusMetrics.Counter({
  name: 'ai_tokens_used_total',
  help: 'Total number of tokens used',
  labelNames: ['model', 'type'], // type: prompt, completion, total
});

const aiRequestErrors = new createPrometheusMetrics.Counter({
  name: 'ai_request_errors_total',
  help: 'Total number of AI request errors',
  labelNames: ['model', 'error_type'],
});

// Middleware: HTTP request metrics
export const httpMetricsMiddleware = (req: Request, res: Response, next: NextFunction) => {
  const startTime = Date.now();

  res.on('finish', () => {
    const duration = (Date.now() - startTime) / 1000;
    const route = req.route?.path || req.path;

    httpRequestDuration
      .labels(req.method, route, res.statusCode.toString())
      .observe(duration);

    httpRequestTotal
      .labels(req.method, route, res.statusCode.toString())
      .inc();
  });

  next();
};

// Decorator for monitoring AI requests
export const monitorAIRequest = (model: string, operation: string) => {
  return function (target: any, propertyName: string, descriptor: PropertyDescriptor) {
    const method = descriptor.value;

    descriptor.value = async function (...args: any[]) {
      const startTime = Date.now();

      try {
        const result = await method.apply(this, args);

        const duration = (Date.now() - startTime) / 1000;
        aiRequestDuration
          .labels(model, operation)
          .observe(duration);

        // Record token usage
        if (result.usage) {
          aiTokenUsage
            .labels(model, 'prompt')
            .inc(result.usage.promptTokens || 0);

          aiTokenUsage
            .labels(model, 'completion')
            .inc(result.usage.completionTokens || 0);

          aiTokenUsage
            .labels(model, 'total')
            .inc(result.usage.totalTokens || 0);
        }

        return result;
      } catch (error: any) {
        aiRequestErrors
          .labels(model, error.name || 'Unknown')
          .inc();

        throw error;
      }
    };
  };
};

// Health check endpoint
export const healthCheck = (req: Request, res: Response) => {
  const health = {
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    memory: process.memoryUsage(),
    version: process.env.npm_package_version || 'unknown',
  };

  res.json(health);
};

// Metrics endpoint (register.metrics() returns a Promise in current prom-client versions)
export const metricsEndpoint = async (req: Request, res: Response) => {
  res.set('Content-Type', createPrometheusMetrics.register.contentType);
  res.end(await createPrometheusMetrics.register.metrics());
};
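
For projects that do not enable decorators, the same timing idea can be expressed as a plain higher-order function. The sketch below is an assumption-laden alternative rather than the article's method: `withTiming` and `Recorder` are hypothetical names, and the recorder callback stands in for whatever metrics client is in use. Unlike the decorator above, it also records the duration of failed calls.

```typescript
// Callback that receives the labels and the elapsed time in seconds.
type Recorder = (labels: { operation: string; ok: boolean }, seconds: number) => void;

// Wrap an async function so every call reports its duration and outcome.
function withTiming<A extends unknown[], R>(
  operation: string,
  record: Recorder,
  fn: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const start = Date.now();
    let ok = false;
    try {
      const result = await fn(...args);
      ok = true;
      return result;
    } finally {
      // Runs on success and on failure, so errors are timed as well.
      record({ operation, ok }, (Date.now() - start) / 1000);
    }
  };
}
```

A recorder could, for example, forward the duration to a histogram's `observe` call and increment an error counter when `ok` is false.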

14.12 Log Management

// File: src/logging/logger.ts
import winston from 'winston';
import { Request, Response, NextFunction } from 'express';

// Create the Winston logger
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: { service: 'ai-app' },
  transports: [
    // Console output
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple()
      ),
    }),

    // File output
    new winston.transports.File({
      filename: 'logs/error.log',
      level: 'error',
      maxsize: 5242880, // 5MB
      maxFiles: 5,
    }),

    new winston.transports.File({
      filename: 'logs/combined.log',
      maxsize: 5242880, // 5MB
      maxFiles: 5,
    }),
  ],
});

// Request logging middleware
export const requestLogger = (req: Request, res: Response, next: NextFunction) => {
  const startTime = Date.now();

  // Log the start of the request
  logger.info('Request started', {
    method: req.method,
    url: req.url,
    userAgent: req.get('User-Agent'),
    ip: req.ip,
    requestId: req.headers['x-request-id'],
  });

  res.on('finish', () => {
    const duration = Date.now() - startTime;

    logger.info('Request completed', {
      method: req.method,
      url: req.url,
      statusCode: res.statusCode,
      duration,
      requestId: req.headers['x-request-id'],
    });
  });

  next();
};

// AI request log
export const logAIRequest = (model: string, prompt: string, response: any, duration: number) => {
  logger.info('AI request completed', {
    model,
    promptLength: prompt.length,
    responseLength: response.content?.length || 0,
    duration,
    usage: response.usage,
    requestId: response.requestId,
  });
};

// Error log
export const logError = (error: Error, context?: any) => {
  logger.error('Application error', {
    message: error.message,
    stack: error.stack,
    context,
  });
};

export default logger;
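
Since `logError` accepts an arbitrary `context` object, credentials can end up in log files by accident. One possible safeguard, sketched here under the assumption that log payloads are plain JSON-like data, is to redact well-known sensitive keys before logging. `redact` and the key list are hypothetical and should be adapted to the fields your application actually uses.

```typescript
// Keys whose values should never be written to logs (compared case-insensitively).
const SENSITIVE_KEYS = new Set(['authorization', 'apikey', 'api_key', 'password', 'token', 'secret']);

// Return a deep copy of plain JSON-like data with sensitive values masked.
// Note: class instances (e.g. Date) and cyclic structures are not handled.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(redact);
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[REDACTED]' : redact(inner);
    }
    return out;
  }
  return value;
}
```

With this in place, a call such as `logError(err, redact(context))` keeps the structure of the context while masking the sensitive fields.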

14.13 Alerting Configuration

# File: prometheus/alerts.yml
groups:
  - name: ai-app-alerts
    rules:
      # High error rate alert
      - alert: HighErrorRate
        expr: rate(http_requests_total{status_code=~"5.."}[5m]) > 0.1
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"

      # Slow response time alert
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }} seconds"

      # AI request failure alert
      - alert: AIRequestFailures
        expr: rate(ai_request_errors_total[5m]) > 0.05
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "AI request failures detected"
          description: "AI request failure rate is {{ $value }} failures per second"

      # Abnormal token usage alert
      - alert: HighTokenUsage
        expr: rate(ai_tokens_used_total[1h]) > 10000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High token usage detected"
          description: "Token usage rate is {{ $value }} tokens per second"

      # Memory usage alert
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes > 0.9
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage detected"
          description: "Memory usage is {{ $value | humanizePercentage }}"

      # Disk space alert
      - alert: LowDiskSpace
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) < 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Disk space is {{ $value | humanizePercentage }} available"

🚀 Hands-On Project: Full Deployment of an Enterprise AI Application (approx. 30%)

14.14 Project Architecture

Let's build a complete enterprise AI knowledge Q&A system with the following components:

┌─────────────────────────────────────────────────────────────┐
│             Enterprise AI Knowledge Q&A System              │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ Frontend    │  │ API Gateway │  │Load Balancer│          │
│  │ (Next.js)   │  │ (Kong)      │  │ (Nginx)     │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ AI Service  │  │ Vector DB   │  │ Cache       │          │
│  │ (LangChain) │  │ (Pinecone)  │  │ (Redis)     │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │Doc Processor│  │ User Mgmt   │  │ Audit Log   │          │
│  │ (Worker)    │  │ (Auth)      │  │ (Logging)   │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
└─────────────────────────────────────────────────────────────┘

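14.15 Core Service Implementation

// File: src/services/ai-knowledge-service.ts
import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { PineconeStore } from '@langchain/pinecone';
import { Pinecone } from '@pinecone-database/pinecone';
import { PromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';
import { RunnableSequence } from '@langchain/core/runnables';
import { monitorAIRequest } from '../monitoring/apm';
import { logAIRequest } from '../logging/logger';

export class AIKnowledgeService {
  private model: ChatOpenAI;
  private vectorStore?: PineconeStore;
  private promptTemplate: PromptTemplate;

  constructor() {
    this.model = new ChatOpenAI({
      openAIApiKey: process.env.OPENAI_API_KEY,
      modelName: 'gpt-4',
      temperature: 0.1,
    });

    this.promptTemplate = PromptTemplate.fromTemplate(`
      You are a professional enterprise knowledge assistant. Answer the user's question based on the context below.

      Context:
      {context}

      User question: {question}

      Provide an accurate, detailed answer and cite your sources. If the context is insufficient to answer the question, say so explicitly.

      Answer:
    `);
  }

  // Lazily connect to an existing Pinecone index (the original never initialized vectorStore).
  // Assumptions: PINECONE_INDEX is a hypothetical variable name with a placeholder default,
  // and the Pinecone client reads PINECONE_API_KEY from the environment.
  private async getVectorStore(): Promise<PineconeStore> {
    if (!this.vectorStore) {
      const pineconeIndex = new Pinecone().index(process.env.PINECONE_INDEX ?? 'ai-app');
      this.vectorStore = await PineconeStore.fromExistingIndex(
        new OpenAIEmbeddings({ openAIApiKey: process.env.OPENAI_API_KEY }),
        { pineconeIndex },
      );
    }
    return this.vectorStore;
  }

  // Note: this decorator syntax requires "experimentalDecorators" in tsconfig.json
  @monitorAIRequest('gpt-4', 'knowledge-query')
  async queryKnowledge(question: string, userId: string): Promise<{
    answer: string;
    sources: string[];
    confidence: number;
  }> {
    const startTime = Date.now();

    try {
      // 1. Vector retrieval (each result is a [document, score] pair)
      const vectorStore = await this.getVectorStore();
      const scoredDocs = await vectorStore.similaritySearchWithScore(question, 5);

      if (scoredDocs.length === 0) {
        return {
          answer: 'Sorry, I could not find any relevant information to answer your question.',
          sources: [],
          confidence: 0,
        };
      }

      // 2. Build the context
      const context = scoredDocs
        .map(([doc], index) => `[${index + 1}] ${doc.pageContent}`)
        .join('\n\n');

      // 3. Generate the answer
      const chain = RunnableSequence.from([
        this.promptTemplate,
        this.model,
        new StringOutputParser(),
      ]);

      const answer = await chain.invoke({
        context,
        question,
      });

      // 4. Compute the confidence from the similarity scores
      const confidence = this.calculateConfidence(scoredDocs.map(([, score]) => score));

      // 5. Extract the sources
      const sources = scoredDocs.map(([doc]) => String(doc.metadata?.source ?? 'Unknown'));

      const duration = Date.now() - startTime;
      logAIRequest('gpt-4', question, { content: answer }, duration);

      return {
        answer,
        sources,
        confidence,
      };
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      throw new Error(`Knowledge query failed: ${message}`);
    }
  }

  private calculateConfidence(scores: number[]): number {
    // A simple heuristic; real applications can use a more sophisticated algorithm
    const avgScore = scores.reduce((sum, score) => sum + score, 0) / scores.length;
    return Math.max(0, Math.min(avgScore * 2, 1)); // clamp to the 0-1 range
  }
}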

14.16 API Route Implementation

// File: src/routes/api/knowledge.ts
import { Router } from 'express';
import { AIKnowledgeService } from '../../services/ai-knowledge-service';
import { authenticateToken } from '../../middleware/auth';
import { rateLimiter } from '../../middleware/rate-limiter';
import { validateRequest } from '../../middleware/validation';
import { z } from 'zod';

const router = Router();
const aiService = new AIKnowledgeService();

// Request validation schema
const querySchema = z.object({
  question: z.string().min(1).max(1000),
  context: z.string().optional(),
});

// Knowledge query endpoint
router.post('/query',
  authenticateToken,
  rateLimiter,
  validateRequest(querySchema),
  async (req, res) => {
    try {
      const { question, context } = req.body;
      const userId = req.user.id;

      const result = await aiService.queryKnowledge(question, userId);

      res.json({
        success: true,
        data: result,
        timestamp: new Date().toISOString(),
      });
    } catch (error) {
      res.status(500).json({
        success: false,
        error: error instanceof Error ? error.message : 'Internal server error',
        timestamp: new Date().toISOString(),
      });
    }
  }
);

// Batch query endpoint
router.post('/batch-query',
  authenticateToken,
  rateLimiter,
  async (req, res) => {
    try {
      const { questions } = req.body;
      const userId = req.user.id;

      if (!Array.isArray(questions) || questions.length > 10) {
        return res.status(400).json({
          success: false,
          error: 'Invalid questions array (max 10 items)',
        });
      }

      const results = await Promise.all(
        questions.map(question => aiService.queryKnowledge(question, userId))
      );

      res.json({
        success: true,
        data: results,
        timestamp: new Date().toISOString(),
      });
    } catch (error) {
      res.status(500).json({
        success: false,
        error: error instanceof Error ? error.message : 'Internal server error',
        timestamp: new Date().toISOString(),
      });
    }
  }
);

export default router;
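
The batch endpoint above starts up to ten model calls at once through `Promise.all`, which can trip provider rate limits. One option is to bound the number of calls in flight. The helper below is a minimal sketch of that idea; `mapWithConcurrency` is a hypothetical name and is not part of the original code.

```typescript
// Run `worker` over `items` with at most `limit` calls in flight, preserving input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to claim

  // Each runner repeatedly claims the next unprocessed index until none remain.
  const runners = Array.from({ length: Math.min(limit, items.length) }, async () => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  });

  await Promise.all(runners);
  return results;
}
```

In the batch handler this could replace the `Promise.all(questions.map(...))` call, for example with a limit of 3. As with `Promise.all`, the first rejected call rejects the whole batch.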

14.17 Kubernetes Deployment Configuration

# File: k8s/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: ai-app
  labels:
    name: ai-app
---
# File: k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ai-app-config
  namespace: ai-app
data:
  NODE_ENV: "production"
  LOG_LEVEL: "info"
  ENABLE_METRICS: "true"
  ENABLE_CACHING: "true"
---
# File: k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: ai-app-secrets
  namespace: ai-app
type: Opaque
data:
  OPENAI_API_KEY: <base64-encoded-key>
  DATABASE_URL: <base64-encoded-url>
  JWT_SECRET: <base64-encoded-secret>
---
# File: k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-app
  namespace: ai-app
  labels:
    app: ai-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-app
  template:
    metadata:
      labels:
        app: ai-app
    spec:
      containers:
      - name: ai-app
        image: ghcr.io/your-org/ai-app:latest
        ports:
        - containerPort: 3000
        env:
        - name: NODE_ENV
          valueFrom:
            configMapKeyRef:
              name: ai-app-config
              key: NODE_ENV
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: ai-app-secrets
              key: OPENAI_API_KEY
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /api/health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /api/health
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
---
# File: k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ai-app-service
  namespace: ai-app
spec:
  selector:
    app: ai-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
  type: ClusterIP
---
# File: k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ai-app-ingress
  namespace: ai-app
  annotations:
    # No rewrite-target here: combined with path "/", it would rewrite every request (e.g. /api/health) to "/"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - api.yourapp.com
    secretName: ai-app-tls
  rules:
  - host: api.yourapp.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ai-app-service
            port:
              number: 80
---
# File: k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ai-app-hpa
  namespace: ai-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ai-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
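
To see how the HPA above arrives at a replica count, the documented core rule is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), evaluated per metric, with the largest result winning and the final value clamped to the min/max bounds. The sketch below models only that rule plus the default 10% tolerance; it leaves out stabilization windows, pod readiness, and missing-metric handling, and `desiredReplicas` is a hypothetical helper written for illustration.

```typescript
interface MetricReading {
  current: number; // observed average utilization, in percent
  target: number;  // target average utilization, in percent
}

// Simplified model of the HPA scaling rule for resource utilization metrics.
function desiredReplicas(
  currentReplicas: number,
  metrics: MetricReading[],
  minReplicas: number,
  maxReplicas: number,
  tolerance = 0.1,
): number {
  const proposals = metrics.map(({ current, target }) => {
    const ratio = current / target;
    // Within the tolerance band the HPA leaves the replica count unchanged.
    if (Math.abs(ratio - 1) <= tolerance) return currentReplicas;
    return Math.ceil(currentReplicas * ratio);
  });
  // With several metrics, the largest proposal wins.
  const wanted = proposals.length > 0 ? Math.max(...proposals) : currentReplicas;
  return Math.min(maxReplicas, Math.max(minReplicas, wanted));
}
```

With the manifest's settings (CPU target 70%, memory target 80%, 2 to 10 replicas), three pods running at 140% CPU and 80% memory would produce a CPU proposal of 6 and a memory proposal of 3, so the model scales to 6.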

14.18 Service Mesh Configuration (Istio)

# File: k8s/istio-gateway.yaml
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: ai-app-gateway
  namespace: ai-app
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      credentialName: ai-app-tls
    hosts:
    - api.yourapp.com
---
# File: k8s/istio-virtualservice.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ai-app-vs
  namespace: ai-app
spec:
  hosts:
  - api.yourapp.com
  gateways:
  - ai-app-gateway
  http:
  - match:
    - uri:
        prefix: /api/
    route:
    - destination:
        host: ai-app-service
        port:
          number: 80
    timeout: 30s
    retries:
      attempts: 3
      perTryTimeout: 10s
---
# File: k8s/istio-destinationrule.yaml
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: ai-app-dr
  namespace: ai-app
spec:
  host: ai-app-service
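  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST  # LEAST_CONN is deprecated in current Istio releases
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        http1MaxPendingRequests: 50
        maxRequestsPerConnection: 10
    # Istio implements circuit breaking through outlierDetection;
    # DestinationRule has no circuitBreaker field.
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s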
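
The last three settings above (five consecutive errors, a 30s interval, a 30s base ejection time) are easier to reason about with a toy model. The sketch below is only an approximation of the idea, not Envoy's or Istio's implementation: the `OutlierTracker` class and its method names are made up for this example, it ignores the periodic `interval` sweep, and it assumes ejection time grows linearly with the number of times a host has been ejected.

```typescript
// Toy model of consecutive-error ejection (illustrative names, simplified rules).
class OutlierTracker {
  private consecutiveErrors = new Map<string, number>();
  private ejectionCount = new Map<string, number>();
  private ejectedUntil = new Map<string, number>();

  constructor(
    private readonly threshold = 5,          // consecutive 5xx responses before ejection
    private readonly baseEjectionMs = 30000, // base ejection time
  ) {}

  // Record one response from a host at time nowMs.
  record(host: string, status: number, nowMs: number): void {
    if (status < 500) {
      this.consecutiveErrors.set(host, 0); // any success resets the streak
      return;
    }
    const errors = (this.consecutiveErrors.get(host) ?? 0) + 1;
    if (errors < this.threshold) {
      this.consecutiveErrors.set(host, errors);
      return;
    }
    // Threshold reached: eject, and eject for longer on each repeat offence.
    const count = (this.ejectionCount.get(host) ?? 0) + 1;
    this.ejectionCount.set(host, count);
    this.ejectedUntil.set(host, nowMs + this.baseEjectionMs * count);
    this.consecutiveErrors.set(host, 0);
  }

  // A host receives traffic again once its ejection period has passed.
  isAvailable(host: string, nowMs: number): boolean {
    return nowMs >= (this.ejectedUntil.get(host) ?? 0);
  }
}
```

The point of the model: one bad pod stops receiving traffic after a short error streak, while a single success in between keeps a mostly healthy pod in rotation.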

🔒 Security: Enterprise-Grade Protection in Practice (about 10%)

14.19 API Security

typescript
// File: src/security/api-security.ts
import { Request, Response, NextFunction } from 'express';
import rateLimit from 'express-rate-limit';
import helmet from 'helmet';
import { body, validationResult } from 'express-validator';

// Rate limiter configuration
export const createRateLimiter = (windowMs: number, max: number) => {
  return rateLimit({
    windowMs,
    max,
    message: {
      error: 'Too many requests',
      retryAfter: Math.ceil(windowMs / 1000),
    },
    standardHeaders: true,
    legacyHeaders: false,
    handler: (req: Request, res: Response) => {
      res.status(429).json({
        error: 'Rate limit exceeded',
        retryAfter: Math.ceil(windowMs / 1000),
      });
    },
  });
};

// Security headers
export const securityHeaders = helmet({
  contentSecurityPolicy: {
    directives: {
      defaultSrc: ["'self'"],
      styleSrc: ["'self'", "'unsafe-inline'"],
      scriptSrc: ["'self'"],
      imgSrc: ["'self'", "data:", "https:"],
    },
  },
  hsts: {
    maxAge: 31536000,
    includeSubDomains: true,
    preload: true,
  },
});

// Input validation middleware
export const validateInput = (validations: any[]) => {
  return async (req: Request, res: Response, next: NextFunction) => {
    await Promise.all(validations.map(validation => validation.run(req)));

    const errors = validationResult(req);
    if (!errors.isEmpty()) {
      return res.status(400).json({
        error: 'Validation failed',
        details: errors.array(),
      });
    }

    next();
  };
};

// API key validation
export const validateApiKey = (req: Request, res: Response, next: NextFunction) => {
  const apiKey = req.headers['x-api-key'] as string;

  if (!apiKey) {
    return res.status(401).json({ error: 'API key required' });
  }

  // Check the API key's format and validity
  if (!isValidApiKey(apiKey)) {
    return res.status(403).json({ error: 'Invalid API key' });
  }

  next();
};

// Redact sensitive fields from response bodies
export const sanitizeResponse = (req: Request, res: Response, next: NextFunction) => {
  const originalSend = res.send;

  res.send = function(data) {
    if (typeof data === 'string') {
      // Replace the values of password/token string fields while keeping the body valid JSON
      data = data.replace(/"([\w-]*(?:password|token))"\s*:\s*"(?:[^"\\]|\\.)*"/gi, '"$1":"[REDACTED]"');
    }

    return originalSend.call(this, data);
  };

  next();
};

function isValidApiKey(apiKey: string): boolean {
  // Implement your API key validation here; this placeholder only checks the format
  return /^[a-zA-Z0-9]{32,}$/.test(apiKey);
}
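
To see what the `windowMs` and `max` arguments of `createRateLimiter` mean in practice, here is a minimal fixed-window counter. It is a sketch of the general technique, not express-rate-limit's actual code; `FixedWindowLimiter` and `check` are names invented for this example, and the clock is passed in so the behaviour is easy to follow.

```typescript
// Minimal fixed-window rate limiter (illustrative, in-memory, single process).
class FixedWindowLimiter {
  private hits = new Map<string, { count: number; resetAt: number }>();

  constructor(private readonly windowMs: number, private readonly max: number) {}

  // Counts one request for `key` and reports whether it is within the limit.
  check(key: string, nowMs: number = Date.now()): { allowed: boolean; retryAfterSec: number } {
    let entry = this.hits.get(key);
    if (!entry || nowMs >= entry.resetAt) {
      // First request, or the previous window has expired: open a new window.
      entry = { count: 0, resetAt: nowMs + this.windowMs };
      this.hits.set(key, entry);
    }
    entry.count += 1;
    return {
      allowed: entry.count <= this.max,
      retryAfterSec: Math.ceil((entry.resetAt - nowMs) / 1000),
    };
  }
}

// Two requests per minute per client IP.
const limiter = new FixedWindowLimiter(60000, 2);
console.log(limiter.check('203.0.113.7', 0));    // allowed
console.log(limiter.check('203.0.113.7', 1000)); // allowed
console.log(limiter.check('203.0.113.7', 2000)); // rejected, retry in 58 seconds
```

An in-memory counter like this only works for a single process. With several replicas behind a load balancer, the counters need a shared store such as Redis, otherwise each pod enforces its own separate limit.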

14.20 Data Encryption and Privacy Protection

typescript
// File: src/security/encryption.ts
import crypto from 'crypto';
import bcrypt from 'bcrypt';

export class EncryptionService {
  private readonly algorithm = 'aes-256-gcm';
  private readonly keyLength = 32;
  private readonly ivLength = 12; // 96-bit IV, the recommended size for GCM
  private readonly tagLength = 16;
  private readonly aad = Buffer.from('additional-data');
  private readonly key: Buffer;

  constructor(secretKey: string) {
    if (!secretKey || secretKey.length < 32) {
      throw new Error('Secret key must be at least 32 characters long');
    }
    // Derive a fixed-length 256-bit key; the static salt is a simplification for this example
    this.key = crypto.scryptSync(secretKey, 'ai-app-encryption', this.keyLength);
  }

  // Encrypt data
  encrypt(text: string): string {
    const iv = crypto.randomBytes(this.ivLength);
    // createCipher is deprecated and ignores the IV; GCM needs createCipheriv
    const cipher = crypto.createCipheriv(this.algorithm, this.key, iv, { authTagLength: this.tagLength });
    cipher.setAAD(this.aad);

    let encrypted = cipher.update(text, 'utf8', 'hex');
    encrypted += cipher.final('hex');

    const tag = cipher.getAuthTag();

    // Output format: iv:tag:ciphertext, all hex-encoded
    return iv.toString('hex') + ':' + tag.toString('hex') + ':' + encrypted;
  }

  // Decrypt data
  decrypt(encryptedData: string): string {
    const parts = encryptedData.split(':');
    if (parts.length !== 3) {
      throw new Error('Invalid encrypted data format');
    }

    const iv = Buffer.from(parts[0], 'hex');
    const tag = Buffer.from(parts[1], 'hex');
    const encrypted = parts[2];

    const decipher = crypto.createDecipheriv(this.algorithm, this.key, iv, { authTagLength: this.tagLength });
    decipher.setAAD(this.aad);
    decipher.setAuthTag(tag);

    let decrypted = decipher.update(encrypted, 'hex', 'utf8');
    // final() throws if the ciphertext, tag, or AAD has been tampered with
    decrypted += decipher.final('utf8');

    return decrypted;
  }

  // Hash a password
  async hashPassword(password: string): Promise<string> {
    const saltRounds = 12;
    return await bcrypt.hash(password, saltRounds);
  }

  // Verify a password
  async verifyPassword(password: string, hashedPassword: string): Promise<boolean> {
    return await bcrypt.compare(password, hashedPassword);
  }

  // Generate a secure random token (hex-encoded, so the string is 2 × length characters)
  generateSecureToken(length: number = 32): string {
    return crypto.randomBytes(length).toString('hex');
  }

  // Mask sensitive data
  maskSensitiveData(data: any, fields: string[]): any {
    const masked = { ...data };

    fields.forEach(field => {
      if (masked[field]) {
        const value = String(masked[field]);
        if (value.length > 4) {
          masked[field] = value.substring(0, 2) + '*'.repeat(value.length - 4) + value.substring(value.length - 2);
        } else {
          masked[field] = '*'.repeat(value.length);
        }
      }
    });

    return masked;
  }
}
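
If you are wondering what the authentication tag in the `iv:tag:ciphertext` format is for, the standalone round trip below may help. It is a minimal sketch using only Node's built-in `crypto` module; the `seal` and `unseal` function names, the demo passphrase, and the salt are assumptions made up for this example and should not be reused as-is.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from 'crypto';

// Demo-only key derivation; a real service would load the key from a secret store.
const demoKey = scryptSync('a-demo-passphrase-not-for-production', 'demo-salt', 32);

// Encrypts text and returns "iv:tag:ciphertext", each part hex-encoded.
function seal(plaintext: string): string {
  const iv = randomBytes(12); // fresh random IV for every message
  const cipher = createCipheriv('aes-256-gcm', demoKey, iv);
  const body = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return [iv, cipher.getAuthTag(), body].map((b) => b.toString('hex')).join(':');
}

// Decrypts the output of seal(); throws if any part was modified.
function unseal(sealed: string): string {
  const [iv, tag, body] = sealed.split(':').map((p) => Buffer.from(p, 'hex'));
  const decipher = createDecipheriv('aes-256-gcm', demoKey, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]).toString('utf8');
}

const sealed = seal('user@example.com');
console.log(unseal(sealed)); // round trip succeeds

// Change the last hex digit of the ciphertext and decryption is rejected.
const tampered = sealed.slice(0, -1) + (sealed.endsWith('0') ? '1' : '0');
try {
  unseal(tampered);
} catch {
  console.log('tampered ciphertext rejected');
}
```

This is the practical difference between GCM and an unauthenticated mode: a modified ciphertext produces an error rather than silently decrypting to garbage.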

14.21 Access Control and Permission Management

typescript
// File: src/security/rbac.ts
import { Request, Response, NextFunction } from 'express';

export enum Permission {
  READ_KNOWLEDGE = 'read:knowledge',
  WRITE_KNOWLEDGE = 'write:knowledge',
  DELETE_KNOWLEDGE = 'delete:knowledge',
  MANAGE_USERS = 'manage:users',
  VIEW_ANALYTICS = 'view:analytics',
  MANAGE_SYSTEM = 'manage:system',
}

export enum Role {
  ADMIN = 'admin',
  MANAGER = 'manager',
  USER = 'user',
  GUEST = 'guest',
}

export interface User {
  id: string;
  email: string;
  role: Role;
  permissions: Permission[];
}

// Role-to-permission mapping
const rolePermissions: Record<Role, Permission[]> = {
  [Role.ADMIN]: Object.values(Permission),
  [Role.MANAGER]: [
    Permission.READ_KNOWLEDGE,
    Permission.WRITE_KNOWLEDGE,
    Permission.VIEW_ANALYTICS,
  ],
  [Role.USER]: [
    Permission.READ_KNOWLEDGE,
  ],
  [Role.GUEST]: [],
};

// Permission check middleware
export const requirePermission = (permission: Permission) => {
  return (req: Request, res: Response, next: NextFunction) => {
    const user = req.user as User;

    if (!user) {
      return res.status(401).json({ error: 'Authentication required' });
    }

    if (!hasPermission(user, permission)) {
      return res.status(403).json({
        error: 'Insufficient permissions',
        required: permission,
        userRole: user.role,
      });
    }

    next();
  };
};

// Role check middleware
export const requireRole = (roles: Role[]) => {
  return (req: Request, res: Response, next: NextFunction) => {
    const user = req.user as User;

    if (!user) {
      return res.status(401).json({ error: 'Authentication required' });
    }

    if (!roles.includes(user.role)) {
      return res.status(403).json({
        error: 'Insufficient role',
        required: roles,
        userRole: user.role,
      });
    }

    next();
  };
};

// Resource ownership check
export const requireOwnership = (resourceUserIdField: string = 'userId') => {
  return (req: Request, res: Response, next: NextFunction) => {
    const user = req.user as User;
    // req.body is undefined when no body parser has run, so guard the lookup
    const resourceUserId = req.params[resourceUserIdField] || req.body?.[resourceUserIdField];

    if (!user) {
      return res.status(401).json({ error: 'Authentication required' });
    }

    // Admins can access every resource
    if (user.role === Role.ADMIN) {
      return next();
    }

    // Check resource ownership
    if (user.id !== resourceUserId) {
      return res.status(403).json({
        error: 'Access denied: resource ownership required',
      });
    }

    next();
  };
};

function hasPermission(user: User, permission: Permission): boolean {
  // Explicit per-user grants first, then the permissions implied by the role
  return (user.permissions?.includes(permission) ?? false) ||
         (rolePermissions[user.role]?.includes(permission) ?? false);
}

// Dynamic permission check
export const checkPermission = (user: User, permission: Permission): boolean => {
  return hasPermission(user, permission);
};

// Permission decorator
export const RequirePermission = (permission: Permission) => {
  return function (target: any, propertyName: string, descriptor: PropertyDescriptor) {
    const method = descriptor.value;

    descriptor.value = function (...args: any[]) {
      const req = args[0] as Request;
      const user = req.user as User | undefined;

      // Reject unauthenticated calls as well as users lacking the permission
      if (!user || !hasPermission(user, permission)) {
        throw new Error(`Permission denied: ${permission}`);
      }

      return method.apply(this, args);
    };
  };
};
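
The core rule in this file is that a user's effective permissions are the union of what the role implies and what has been granted to that user directly. Here is a trimmed-down, standalone sketch of that rule. It uses string unions instead of the enums above so that it fits in a few lines; `Account`, `byRole`, `effectivePermissions`, and `can` are illustrative names, not part of the module.

```typescript
// Standalone sketch of role + per-user permission resolution (illustrative names).
type Role = 'admin' | 'manager' | 'user' | 'guest';
type Permission = 'read:knowledge' | 'write:knowledge' | 'view:analytics' | 'manage:system';

interface Account {
  role: Role;
  grants?: Permission[]; // extra permissions given to this account directly
}

const byRole: Record<Role, Permission[]> = {
  admin: ['read:knowledge', 'write:knowledge', 'view:analytics', 'manage:system'],
  manager: ['read:knowledge', 'write:knowledge', 'view:analytics'],
  user: ['read:knowledge'],
  guest: [],
};

// Union of role permissions and direct grants.
function effectivePermissions(account: Account): Set<Permission> {
  return new Set<Permission>(byRole[account.role].concat(account.grants ?? []));
}

function can(account: Account, permission: Permission): boolean {
  return effectivePermissions(account).has(permission);
}

console.log(can({ role: 'manager' }, 'write:knowledge'));                         // true
console.log(can({ role: 'user' }, 'write:knowledge'));                            // false
console.log(can({ role: 'user', grants: ['write:knowledge'] }, 'write:knowledge')); // true
```

Direct grants are handy for one-off exceptions, but they are also easy to forget about, so it is worth auditing them regularly alongside the role table.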

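🚀 Hands-On Project: A Complete Deployment Walkthrough (about 20%)

14.22 Automated Deployment Script

bash
#!/bin/bash
# File: scripts/deploy.sh

set -euo pipefail

# Configuration
# Note: these names must match your manifests (the HPA above, for example,
# targets a Deployment named ai-app).
APP_NAME="ai-knowledge-app"
NAMESPACE="ai-app"
REGISTRY="ghcr.io/your-org"
VERSION="${1:-latest}"
ENVIRONMENT="${2:-production}"
# envsubst only substitutes exported variables
export APP_NAME NAMESPACE REGISTRY VERSION ENVIRONMENT

echo "🚀 Deploying $APP_NAME ($VERSION) to the $ENVIRONMENT environment"

# 1. Check the environment
echo "📋 Checking the deployment environment..."
kubectl version --client
docker version
helm version

# 2. Build and push the image
echo "🔨 Building the Docker image..."
docker build -t "$REGISTRY/$APP_NAME:$VERSION" .
docker push "$REGISTRY/$APP_NAME:$VERSION"

# 3. Update the Kubernetes configuration
echo "⚙️ Updating the Kubernetes configuration..."
envsubst < k8s/deployment.yaml | kubectl apply -f -
kubectl set image "deployment/$APP_NAME" "$APP_NAME=$REGISTRY/$APP_NAME:$VERSION" -n "$NAMESPACE"

# 4. Wait for the rollout to finish
echo "⏳ Waiting for the rollout to finish..."
kubectl rollout status "deployment/$APP_NAME" -n "$NAMESPACE" --timeout=300s

# 5. Run health checks
echo "🏥 Running health checks..."
kubectl get pods -n "$NAMESPACE" -l "app=$APP_NAME"

# Give the service time to become ready
sleep 30

# Check the service health endpoint
HEALTH_URL="https://api.yourapp.com/api/health"
if curl -fsS "$HEALTH_URL"; then
  echo "✅ Health check passed"
else
  echo "❌ Health check failed"
  exit 1
fi

# 6. Run smoke tests
echo "🧪 Running smoke tests..."
npm run test:smoke

# 7. Update the monitoring configuration
echo "📊 Updating the monitoring configuration..."
kubectl apply -f monitoring/

# 8. Clean up completed pods left over from old versions
echo "🧹 Cleaning up old pods..."
kubectl delete pods -n "$NAMESPACE" -l "app=$APP_NAME" --field-selector=status.phase=Succeeded

echo "🎉 Deployment complete!"
echo "📱 Application URL: https://api.yourapp.com"
echo "📊 Monitoring dashboard: https://grafana.yourapp.com"
echo "📋 View logs: kubectl logs -f deployment/$APP_NAME -n $NAMESPACE"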
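
The fixed `sleep 30` before the health check is simple, but it either waits too long or not long enough. A common alternative is to poll the health endpoint with a growing delay. The sketch below shows one way to do that in TypeScript; `backoffDelays` and `waitForHealthy` are hypothetical helper names, the default delays are arbitrary, and the probe is injected so you can plug in whatever check fits your service.

```typescript
// Delay schedule: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelays(attempts: number, baseMs = 1000, maxMs = 15000): number[] {
  return Array.from({ length: attempts }, (_, i) => Math.min(maxMs, baseMs * 2 ** i));
}

// Polls `probe` until it reports healthy or the delay schedule is exhausted.
async function waitForHealthy(
  probe: () => Promise<boolean>,
  delays: number[] = backoffDelays(6),
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<boolean> {
  for (const delay of delays) {
    // A probe that throws (connection refused, DNS not ready) counts as unhealthy.
    if (await probe().catch(() => false)) return true;
    await sleep(delay);
  }
  return probe().catch(() => false); // one final attempt after the last wait
}

console.log(backoffDelays(6)); // [1000, 2000, 4000, 8000, 15000, 15000]

// Possible usage on Node 18+, where fetch is built in:
// const ok = await waitForHealthy(async () => (await fetch('https://api.yourapp.com/api/health')).ok);
```

With the default schedule the script gives up after roughly 45 seconds of waiting, but returns as soon as the service answers, which is usually much sooner.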

14.23 Rollback Script

bash
#!/bin/bash
# File: scripts/rollback.sh

set -euo pipefail

APP_NAME="ai-knowledge-app"
NAMESPACE="ai-app"

echo "🔄 Rolling back $APP_NAME"

# 1. Show the rollout history
echo "📋 Rollout history..."
kubectl rollout history "deployment/$APP_NAME" -n "$NAMESPACE"

# 2. Roll back to the previous revision
echo "⬅️ Rolling back..."
kubectl rollout undo "deployment/$APP_NAME" -n "$NAMESPACE"

# 3. Wait for the rollback to finish
echo "⏳ Waiting for the rollback to finish..."
kubectl rollout status "deployment/$APP_NAME" -n "$NAMESPACE" --timeout=300s

# 4. Verify the result
echo "✅ Verifying the rollback..."
kubectl get pods -n "$NAMESPACE" -l "app=$APP_NAME"

# 5. Health check
echo "🏥 Health check..."
sleep 30
HEALTH_URL="https://api.yourapp.com/api/health"
if curl -fsS "$HEALTH_URL"; then
  echo "✅ Rollback succeeded, service is healthy"
else
  echo "❌ Service is unhealthy after rollback, manual intervention required"
  exit 1
fi

echo "🎉 Rollback complete!"

14.24 Monitoring Dashboard Configuration

json
{
  "dashboard": {
    "id": null,
    "title": "AI Knowledge App Dashboard",
    "tags": ["ai", "knowledge", "production"],
    "timezone": "browser",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{route}}"
          }
        ],
        "yAxes": [
          {
            "label": "requests/sec"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "50th percentile"
          }
        ],
        "yAxes": [
          {
            "label": "seconds"
          }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total{status_code=~\"5..\"}[5m])",
            "legendFormat": "5xx errors"
          }
        ],
        "yAxes": [
          {
            "label": "errors/sec"
          }
        ]
      },
      {
        "title": "AI Token Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "increase(ai_tokens_used_total[1h])",
            "legendFormat": "{{model}} {{type}}"
          }
        ],
        "yAxes": [
          {
            "label": "tokens/hour"
          }
        ]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "container_memory_usage_bytes{container=\"ai-app\"}",
            "legendFormat": "Memory Usage"
          }
        ],
        "yAxes": [
          {
            "label": "bytes"
          }
        ]
      },
      {
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(container_cpu_usage_seconds_total{container=\"ai-app\"}[5m])",
            "legendFormat": "CPU Usage"
          }
        ],
        "yAxes": [
          {
            "label": "cores"
          }
        ]
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "30s"
  }
}
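
The Response Time panel relies on `histogram_quantile`, which estimates a percentile from cumulative bucket counts rather than from raw samples. The sketch below mirrors the linear interpolation that the Prometheus documentation describes. It is a simplified illustration: the `Bucket` shape and the function name are assumptions, and edge cases such as malformed buckets are ignored.

```typescript
// Cumulative histogram bucket: `count` observations were <= `le`.
interface Bucket {
  le: number;    // upper bound; use Infinity for the +Inf bucket
  count: number; // cumulative count
}

// Estimates the q-quantile (0 < q <= 1) by linear interpolation inside a bucket.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  const sorted = buckets.slice().sort((a, b) => a.le - b.le);
  if (sorted.length === 0) return NaN;
  const total = sorted[sorted.length - 1].count;
  if (total === 0) return NaN;

  const rank = q * total;
  const idx = sorted.findIndex((b) => b.count >= rank);
  const bucket = sorted[idx];

  if (bucket.le === Infinity) {
    // The rank falls in the +Inf bucket: report the highest finite bound.
    return idx > 0 ? sorted[idx - 1].le : NaN;
  }
  const lowerBound = idx > 0 ? sorted[idx - 1].le : 0;
  const lowerCount = idx > 0 ? sorted[idx - 1].count : 0;
  // Assume observations are spread evenly inside the bucket.
  return lowerBound + (bucket.le - lowerBound) * ((rank - lowerCount) / (bucket.count - lowerCount));
}

const latency: Bucket[] = [
  { le: 0.1, count: 10 },
  { le: 0.5, count: 60 },
  { le: 1, count: 90 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.5, latency));  // about 0.42 seconds
console.log(histogramQuantile(0.95, latency)); // 1, the highest finite bucket bound
```

The second result shows why bucket boundaries matter for AI workloads: if the largest finite bucket is 1 second but LLM calls routinely take 5 to 20 seconds, the p95 will be stuck at 1 and tell you nothing. Choose buckets that cover the latencies you actually expect.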

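⚙️ Best Practices and Common Problems (about 10%)

14.25 Performance Optimization Best Practices

🚀 Application layer

  • Manage database connections with a connection pool
  • Use a layered caching strategy (Redis plus an in-memory cache)
  • Enable Gzip compression to reduce transfer size
  • Serve static assets through a CDN

📊 Monitoring

  • Set sensible alert thresholds to avoid alert fatigue
  • Monitor in layers: infrastructure → application → business
  • Review and remove unused metrics regularly
  • Define a retention policy for monitoring data

🔒 Security

  • Update dependencies regularly to patch security vulnerabilities
  • Configure access control on the principle of least privilege
  • Adopt a zero-trust network architecture
  • Run security audits and penetration tests regularly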

14.26 Common Problems and Solutions

❓ Problem 1: The service fails to start after deployment

bash
# Diagnostic steps
kubectl describe pod <pod-name> -n ai-app
kubectl logs <pod-name> -n ai-app
kubectl get events -n ai-app --sort-by='.lastTimestamp'

# Common causes and how to check them
# 1. Misconfigured environment variables
kubectl get configmap ai-app-config -n ai-app -o yaml
kubectl get secret ai-app-secrets -n ai-app -o yaml

# 2. Resource limits set too low
kubectl top pods -n ai-app
kubectl describe pod <pod-name> -n ai-app | grep -A 5 "Limits\|Requests"

# 3. Misconfigured health probes
kubectl get pod <pod-name> -n ai-app -o yaml | grep -A 10 "livenessProbe\|readinessProbe"

❓ Problem 2: AI requests respond slowly

typescript
// Diagnostics and optimization
// Note: the helper methods called below (getAPIMetrics, warmupModels, and so on)
// are placeholders; implement them against your own metrics and cache layers.
export class PerformanceDiagnostics {
  static async diagnoseSlowRequests() {
    // 1. Check API response times
    const apiMetrics = await this.getAPIMetrics();
    console.log('API Response Times:', apiMetrics);

    // 2. Check model loading times
    const modelMetrics = await this.getModelMetrics();
    console.log('Model Loading Times:', modelMetrics);

    // 3. Check the cache hit rate
    const cacheMetrics = await this.getCacheMetrics();
    console.log('Cache Hit Rate:', cacheMetrics);

    // 4. Check database query performance
    const dbMetrics = await this.getDatabaseMetrics();
    console.log('Database Performance:', dbMetrics);
  }

  static async optimizePerformance() {
    // 1. Warm up models
    await this.warmupModels();

    // 2. Tune the caching strategy
    await this.optimizeCache();

    // 3. Adjust concurrency limits
    await this.adjustConcurrency();

    // 4. Enable request batching
    await this.enableBatching();
  }
}
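
Of the four optimization steps, `adjustConcurrency()` is the one that most often needs real code, because an unbounded burst of LLM calls can exhaust memory and hit provider rate limits at the same time. Below is a minimal sketch of a concurrency limiter: at most `limit` tasks run at once and the rest wait in a queue. `ConcurrencyLimiter` and its members are illustrative names, not part of LangChain.js or of the class above.

```typescript
// Runs at most `limit` async tasks at a time; extra tasks wait in FIFO order.
class ConcurrencyLimiter {
  private active = 0;
  private readonly queue: Array<() => void> = [];

  constructor(private readonly limit: number) {}

  get activeCount(): number { return this.active; }
  get queuedCount(): number { return this.queue.length; }

  run<T>(task: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        this.active++;
        // Wrapping in a Promise turns a synchronous throw in `task` into a rejection.
        new Promise<T>((res) => res(task()))
          .then(resolve, reject)
          .finally(() => {
            this.active--;
            const next = this.queue.shift();
            if (next) next(); // hand the freed slot to the next waiting task
          });
      };
      if (this.active < this.limit) start();
      else this.queue.push(start);
    });
  }
}

// Usage sketch: allow only two model calls in flight at any time.
const llmLimiter = new ConcurrencyLimiter(2);
const fakeModelCall = (prompt: string) =>
  new Promise<string>((r) => setTimeout(() => r(`answer to ${prompt}`), 50));

Promise.all(['a', 'b', 'c', 'd'].map((p) => llmLimiter.run(() => fakeModelCall(p))))
  .then((answers) => console.log(answers.length, 'answers received'));
```

A reasonable starting point is to set the limit slightly below what your provider's rate limit allows per replica, then adjust it based on the latency and error panels in the dashboard.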

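❓ Problem 3: Memory leaks

typescript
// Memory monitoring and cleanup
import v8 from 'v8';

export class MemoryManager {
  private static memoryThreshold = 0.8; // warn at 80% of the heap limit

  static startMemoryMonitoring() {
    setInterval(() => {
      const usage = process.memoryUsage();
      // heapTotal grows on demand, so measure against V8's heap size limit instead
      const heapLimit = v8.getHeapStatistics().heap_size_limit;
      const usagePercent = usage.heapUsed / heapLimit;

      if (usagePercent > this.memoryThreshold) {
        console.warn('High memory usage detected:', usage);
        this.performGarbageCollection();
      }
    }, 30000); // check every 30 seconds
  }

  static performGarbageCollection() {
    // global.gc only exists when Node.js is started with --expose-gc.
    // Forcing GC relieves pressure but does not fix the leak itself.
    if (global.gc) {
      global.gc();
      console.log('Garbage collection performed');
    } else {
      console.warn('Garbage collection not available');
    }
  }

  static clearCaches() {
    // Clear application caches
    // Clear AI model caches
    // Drain the database connection pool
  }
}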
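
A single threshold check tells you that memory is high right now, but not whether it keeps climbing. One rough heuristic is to fit a line through recent heap samples and look at the slope. The sketch below does that with a plain least-squares fit; `heapGrowthPerSample`, `looksLikeLeak`, and the default thresholds are assumptions made up for this example, so tune them against your own traffic pattern before trusting the result.

```typescript
// Least-squares slope of the samples: average growth in bytes per sampling interval.
function heapGrowthPerSample(samples: number[]): number {
  const n = samples.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = samples.reduce((sum, y) => sum + y, 0) / n;
  let numerator = 0;
  let denominator = 0;
  samples.forEach((y, x) => {
    numerator += (x - meanX) * (y - meanY);
    denominator += (x - meanX) ** 2;
  });
  return numerator / denominator;
}

// Flags sustained growth; a noisy but flat heap should not trigger it.
function looksLikeLeak(samples: number[], minSamples = 5, minGrowthBytes = 1024 * 1024): boolean {
  return samples.length >= minSamples && heapGrowthPerSample(samples) >= minGrowthBytes;
}

const MB = 1024 * 1024;
console.log(looksLikeLeak([100, 110, 120, 130, 140].map((v) => v * MB))); // true: steady climb
console.log(looksLikeLeak([100, 90, 110, 95, 105].map((v) => v * MB)));   // false: noisy but close to flat
```

Feed it values of `process.memoryUsage().heapUsed` collected at a fixed interval. If it reports a leak, a heap snapshot comparison is the next step for finding the objects that are being retained.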

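14.27 Disaster Recovery and Backup Strategy

bash
#!/bin/bash
# File: scripts/backup.sh

set -euo pipefail

BACKUP_DIR="/backups/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

echo "💾 Starting backup..."

# 1. Back up the database
echo "📊 Backing up the database..."
kubectl exec -n ai-app deployment/postgres -- pg_dump -U postgres ai_app > "$BACKUP_DIR/database.sql"

# 2. Back up configuration
echo "⚙️ Backing up configuration..."
kubectl get configmap -n ai-app -o yaml > "$BACKUP_DIR/configmaps.yaml"
# Secrets are only base64-encoded in this dump; encrypt the archive or restrict access to it
kubectl get secret -n ai-app -o yaml > "$BACKUP_DIR/secrets.yaml"

# 3. Back up persistent data
echo "💿 Backing up persistent data..."
# BGSAVE returns immediately, so wait until LASTSAVE changes before copying the dump
REDIS="kubectl exec -n ai-app deployment/redis -- redis-cli"
last_save=$($REDIS LASTSAVE)
$REDIS BGSAVE
for _ in $(seq 1 60); do
  [ "$($REDIS LASTSAVE)" != "$last_save" ] && break
  sleep 1
done
# redis-pod is a placeholder; substitute the actual Redis pod name
kubectl cp ai-app/redis-pod:/data/dump.rdb "$BACKUP_DIR/redis-dump.rdb"

# 4. Back up application manifests and configuration
echo "📁 Backing up application configuration..."
cp -r k8s/ "$BACKUP_DIR/"
cp -r monitoring/ "$BACKUP_DIR/"
cp docker-compose.yml "$BACKUP_DIR/"

# 5. Compress the backup
echo "🗜️ Compressing the backup..."
tar -czf "$BACKUP_DIR.tar.gz" -C /backups "$(basename "$BACKUP_DIR")"
rm -rf "$BACKUP_DIR"

# 6. Upload to cloud storage
echo "☁️ Uploading to cloud storage..."
aws s3 cp "$BACKUP_DIR.tar.gz" s3://your-backup-bucket/backups/

echo "✅ Backup complete: $BACKUP_DIR.tar.gz"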

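📚 Chapter Summary

In this chapter we covered:

Theory

  • Why deploying AI applications is complex, and the challenges involved
  • How modern deployment architectures have evolved
  • The DevOps practices that are specific to AI applications

Techniques

  • Best practices for containerizing with Docker
  • Building a complete CI/CD pipeline
  • A full monitoring and observability stack
  • Enterprise-grade security measures

Hands-on experience

  • A complete Kubernetes deployment configuration
  • Service mesh and microservice architecture
  • Automated deployment and rollback scripts
  • Disaster recovery and backup strategy

Best practices

  • Performance optimization and problem diagnosis
  • Solutions to common problems
  • Sensible monitoring and alerting configuration
  • Security applied across the whole stack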

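🎯 Coming Up Next

In the next chapter, "AI Application Security and Ethics in Practice", we will take a closer look at:

  • Security threats to AI applications and how to defend against them
  • Data privacy protection and compliance requirements
  • AI ethics principles and responsible development
  • Security audits and risk assessment methods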

