File Upload and Cloud Storage Integration: Building a Scalable, Secure File Management System

Table of Contents

  • File Upload and Cloud Storage Integration: Building a Scalable, Secure File Management System
    • 1. Introduction
    • 2. System Architecture Design
      • 2.1 Overall Architecture
      • 2.2 Core Components
      • 2.3 Technology Stack
    • 3. Data Model Design
      • 3.1 File Metadata Model
      • 3.2 Upload Session Model
    • 4. Core Feature Implementation
      • 4.1 Storage Backend Abstraction
      • 4.2 Local Storage Implementation
      • 4.3 AWS S3 Storage Implementation
      • 4.4 File Upload Service
      • 4.5 RESTful API Service
    • 5. Usage Examples
    • 6. Security Considerations
      • 6.1 Security Measures
      • 6.2 Security Configuration Examples
    • 7. Performance Optimization
      • 7.1 Optimization Strategies
      • 7.2 Performance Metrics
    • 8. Code Review and Testing
      • 8.1 Code Quality Checklist
      • 8.2 Test Cases
    • 9. Summary
      • 9.1 System Highlights
      • 9.2 Practical Application Scenarios
      • 9.3 Future Improvements



1. Introduction

In modern web applications, file upload and storage is a common yet complex requirement. From user avatars and document sharing to multimedia content, a file management system has to handle many challenges: large-file uploads, resumable transfers, format validation, security scanning, and support for multiple storage backends. Traditional single-node storage can no longer meet the scalability and reliability demands of modern applications, and cloud storage services offer an ideal solution.

This article walks through the design and implementation of a complete file upload and cloud storage integration system, covering the full flow from client upload and server-side processing to cloud storage integration. We will implement support for multiple storage backends (local storage, AWS S3, Alibaba Cloud OSS, Tencent Cloud COS, and more), chunked uploads, file checksums, security hardening, and image processing, and expose the whole workflow through a RESTful API service.

2. System Architecture Design

2.1 Overall Architecture

The original architecture diagram organizes the system into four layers:

  • Client layer: a web frontend, mobile clients, and third-party services send upload requests and API calls through an API gateway.
  • File upload service: an upload controller, file validator, chunk processor, and storage router handle incoming requests.
  • Data storage: the storage router dispatches files to local storage, AWS S3, Alibaba Cloud OSS, or Tencent Cloud COS; file metadata goes to the metadata database, temporary data (such as upload sessions) to a Redis cache, and persistent files to object storage.
  • Supporting services: a file processing queue drives thumbnail generation, video transcoding, and virus scanning; a CDN with edge nodes handles distribution; monitoring and alerting track performance metrics.

2.2 Core Components

  1. Upload controller: receives upload requests and manages upload sessions
  2. File validator: validates file type, size, and content safety
  3. Chunk processor: handles chunked upload and merging of large files
  4. Storage router: selects an appropriate storage backend based on configuration
  5. Storage adapters: abstract the interfaces of different cloud storage services
  6. File processor: asynchronously processes files (thumbnail generation, transcoding, etc.)
  7. CDN integration: distributes files to a CDN network
  8. Metadata management: manages file metadata
  9. Access control: manages file access permissions
  10. Monitoring and alerting: tracks system performance and anomalies

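As a hedged illustration of the storage-router idea (component 4), the sketch below picks the highest-priority enabled backend whose size and type limits accept the file. The `Backend` dataclass and its field names are assumptions made for this example, not part of the system described above:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Backend:
    name: str
    enabled: bool
    priority: int          # lower value = preferred
    max_file_size: int     # bytes
    allowed_types: str     # "*/*" or a prefix pattern like "image/*"

def route(backends: List[Backend], size: int, content_type: str) -> Optional[Backend]:
    """Return the most-preferred backend that accepts this file, or None."""
    candidates = [
        b for b in backends
        if b.enabled
        and size <= b.max_file_size
        and (b.allowed_types == "*/*"
             or content_type.startswith(b.allowed_types.rstrip("*")))
    ]
    return min(candidates, key=lambda b: b.priority) if candidates else None
```

A router like this keeps backend selection declarative: adding a new backend is a configuration change, not a code change.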
2.3 Technology Stack

  • Web framework: FastAPI (good async support, high performance)
  • Cloud storage: boto3 (AWS S3), oss2 (Alibaba Cloud OSS), cos-python-sdk-v5 (Tencent Cloud COS)
  • Database: PostgreSQL (stores file metadata)
  • Cache: Redis (upload sessions, temporary data)
  • Message queue: Celery + Redis/RabbitMQ (asynchronous task processing)
  • File processing: Pillow (images), moviepy (video)
  • Security scanning: ClamAV (virus scanning)
  • CDN: integrated through the storage service or a standalone CDN provider
  • Monitoring: Prometheus + Grafana

3. Data Model Design

3.1 File Metadata Model

The original entity-relationship diagram (User owns File; File contains FileChunk, has FileVersion, and logs FileAccessLog; StorageBackend stores File) defines the following entities and fields:

File
  • id (string, PK), user_id (string, FK), original_filename (string), stored_filename (string), file_path (string), storage_backend (string, FK), content_type (string), file_size (bigint), md5_hash (string), sha256_hash (string), metadata (jsonb), status (string), uploaded_at (datetime), expires_at (datetime), is_public (boolean), access_url (string)

FileChunk
  • id (string, PK), file_id (string, FK), chunk_number (int), chunk_size (bigint), chunk_hash (string), storage_path (string), uploaded_at (datetime)

FileVersion
  • id (string, PK), file_id (string, FK), version_number (int), stored_filename (string), file_size (bigint), hash (string), created_at (datetime)

FileAccessLog
  • id (string, PK), file_id (string, FK), user_id (string, FK), action (string), ip_address (string), user_agent (string), accessed_at (datetime)

User
  • id (string, PK), username (string), email (string), created_at (datetime)

StorageBackend
  • id (string, PK), name (string), type (string), config (jsonb), enabled (boolean), priority (int), max_file_size (bigint), allowed_types (text), created_at (datetime)

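As a minimal sketch of how the `File` entity above might look in application code, here is a plain dataclass rather than an ORM model; the default values are assumptions for illustration, not prescribed by the schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional

@dataclass
class FileRecord:
    """In-memory representation of a row in the File table."""
    id: str
    user_id: str
    original_filename: str
    stored_filename: str
    file_path: str
    storage_backend: str
    content_type: str
    file_size: int
    md5_hash: str = ""
    sha256_hash: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)
    status: str = "pending"
    uploaded_at: datetime = field(default_factory=datetime.now)
    expires_at: Optional[datetime] = None
    is_public: bool = False
    access_url: Optional[str] = None
```

In a production system the same fields would typically be declared through SQLAlchemy or another ORM so they map directly onto the PostgreSQL table.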
3.2 Upload Session Model

For chunked uploads we need to manage upload sessions. The original class diagram defines UploadSession with:

  • Attributes: session_id, user_id, file_id, original_filename, content_type, status (string); total_size (int64); chunk_size, total_chunks, uploaded_chunks (int); chunks_status, metadata (dict); created_at, expires_at (datetime)
  • Methods: get_progress() : float, is_complete() : bool, get_missing_chunks() : List[int]

4. Core Feature Implementation

4.1 Storage Backend Abstraction

python
# storage/backends/base.py
import hashlib
import tempfile
from abc import ABC, abstractmethod
from datetime import datetime
from typing import Dict, List, Optional, Tuple, BinaryIO, Any
import logging

logger = logging.getLogger(__name__)


class StorageBackend(ABC):
    """Abstract base class for storage backends."""
    
    def __init__(self, backend_id: str, name: str, config: Dict[str, Any]):
        """
        Initialize the storage backend.
        
        Args:
            backend_id: backend ID
            name: backend name
            config: configuration dictionary
        """
        self.backend_id = backend_id
        self.name = name
        self.config = config
        self.enabled = True
        self.priority = 1
        self.max_file_size = config.get("max_file_size", 10 * 1024 * 1024 * 1024)  # default 10GB
        self.allowed_types = config.get("allowed_types", "*/*")
        self.created_at = datetime.now()
        
    @abstractmethod
    def upload_file(self, file_obj: BinaryIO, file_path: str, 
                   content_type: Optional[str] = None,
                   metadata: Optional[Dict] = None) -> Tuple[bool, Dict[str, Any]]:
        """
        Upload a file.
        
        Args:
            file_obj: file object
            file_path: storage path
            content_type: content type
            metadata: metadata
            
        Returns:
            (success, upload result)
        """
        pass
    
    @abstractmethod
    def download_file(self, file_path: str) -> Tuple[bool, Optional[BinaryIO]]:
        """
        Download a file.
        
        Args:
            file_path: file path
            
        Returns:
            (success, file object)
        """
        pass
    
    @abstractmethod
    def delete_file(self, file_path: str) -> bool:
        """
        Delete a file.
        
        Args:
            file_path: file path
            
        Returns:
            whether the deletion succeeded
        """
        pass
    
    @abstractmethod
    def file_exists(self, file_path: str) -> bool:
        """
        Check whether a file exists.
        
        Args:
            file_path: file path
            
        Returns:
            whether the file exists
        """
        pass
    
    @abstractmethod
    def get_file_size(self, file_path: str) -> Optional[int]:
        """
        Get the size of a file.
        
        Args:
            file_path: file path
            
        Returns:
            size in bytes, or None
        """
        pass
    
    @abstractmethod
    def get_file_url(self, file_path: str, expires_in: Optional[int] = None) -> Optional[str]:
        """
        Get a URL for accessing a file.
        
        Args:
            file_path: file path
            expires_in: expiry time in seconds
            
        Returns:
            file URL, or None
        """
        pass
    
    @abstractmethod
    def list_files(self, prefix: str = "", limit: int = 100) -> List[Dict[str, Any]]:
        """
        List files.
        
        Args:
            prefix: path prefix
            limit: maximum number of entries
            
        Returns:
            list of file entries
        """
        pass
    
    def validate_config(self) -> Tuple[bool, List[str]]:
        """
        Validate the configuration.
        
        Returns:
            (valid, list of errors)
        """
        errors = []
        return len(errors) == 0, errors
    
    def test_connection(self) -> bool:
        """
        Test connectivity by uploading, checking, and deleting a small test file.
        
        Returns:
            whether the connection works
        """
        try:
            test_content = b"test connection"
            test_path = f"test_{datetime.now().timestamp()}.txt"
            
            with tempfile.TemporaryFile() as tmp_file:
                tmp_file.write(test_content)
                tmp_file.seek(0)
                
                success, result = self.upload_file(tmp_file, test_path, "text/plain")
                if not success:
                    return False
                
                # Verify the file exists
                if not self.file_exists(test_path):
                    return False
                
                # Clean up the test file
                if not self.delete_file(test_path):
                    logger.warning(f"Failed to delete test file: {test_path}")
            
            return True
        except Exception as e:
            logger.error(f"Storage backend connection test failed: {str(e)}")
            return False
    
    def calculate_hash(self, file_obj: BinaryIO, hash_type: str = "md5") -> str:
        """
        Compute the hash of a file.
        
        Args:
            file_obj: file object
            hash_type: hash type ("md5" or "sha256")
            
        Returns:
            hex digest
        """
        file_obj.seek(0)
        
        if hash_type == "md5":
            hash_func = hashlib.md5()
        elif hash_type == "sha256":
            hash_func = hashlib.sha256()
        else:
            raise ValueError(f"Unsupported hash type: {hash_type}")
        
        # Read in chunks so large files never need to fit in memory
        chunk_size = 8192
        while True:
            chunk = file_obj.read(chunk_size)
            if not chunk:
                break
            hash_func.update(chunk)
        
        file_obj.seek(0)
        return hash_func.hexdigest()
    
    def to_dict(self) -> Dict[str, Any]:
        """
        Serialize to a dictionary.
        
        Returns:
            dictionary representation
        """
        return {
            "backend_id": self.backend_id,
            "name": self.name,
            "type": self.type(),
            "config": self.config,
            "enabled": self.enabled,
            "priority": self.priority,
            "max_file_size": self.max_file_size,
            "allowed_types": self.allowed_types,
            "created_at": self.created_at.isoformat()
        }
    
    @classmethod
    @abstractmethod
    def type(cls) -> str:
        """
        Return the storage type identifier.
        
        Returns:
            storage type string
        """
        pass

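`calculate_hash` reads the stream in 8 KiB chunks so that arbitrarily large files never need to fit in memory, and restores the file pointer afterwards so the same stream can be uploaded next. The same pattern works standalone; the helper below (illustrative, not part of the backend class) applies it to an in-memory stream:

```python
import hashlib
import io
from typing import BinaryIO

def stream_hash(file_obj: BinaryIO, hash_type: str = "md5", chunk_size: int = 8192) -> str:
    """Hash a binary stream incrementally, restoring the file pointer afterwards."""
    hash_func = hashlib.new(hash_type)  # accepts "md5", "sha256", ...
    file_obj.seek(0)
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        hash_func.update(chunk)
    file_obj.seek(0)
    return hash_func.hexdigest()

print(stream_hash(io.BytesIO(b"hello world")))            # MD5 digest
print(stream_hash(io.BytesIO(b"hello world"), "sha256"))  # SHA-256 digest
```

Using `hashlib.new` instead of an if/elif chain also picks up any other digest the platform's OpenSSL supports.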
4.2 Local Storage Implementation

python
# storage/backends/local.py
import logging
import shutil
from datetime import datetime
from pathlib import Path
from typing import Any, BinaryIO, Dict, List, Optional, Tuple
from urllib.parse import quote

from .base import StorageBackend

logger = logging.getLogger(__name__)


class LocalStorageBackend(StorageBackend):
    """Local filesystem storage backend."""
    
    def __init__(self, backend_id: str, name: str, config: Dict[str, Any]):
        super().__init__(backend_id, name, config)
        
        # Local storage configuration
        self.storage_root = Path(config.get("storage_root", "uploads"))
        self.base_url = config.get("base_url", "http://localhost:8000/files")
        
        # Make sure the storage directory exists
        self.storage_root.mkdir(parents=True, exist_ok=True)
    
    @classmethod
    def type(cls) -> str:
        return "local"
    
    def upload_file(self, file_obj: BinaryIO, file_path: str, 
                   content_type: Optional[str] = None,
                   metadata: Optional[Dict] = None) -> Tuple[bool, Dict[str, Any]]:
        """
        Upload a file to local storage.
        
        Args:
            file_obj: file object
            file_path: storage path
            content_type: content type
            metadata: metadata
            
        Returns:
            (success, upload result)
        """
        try:
            # Build the full storage path
            full_path = self.storage_root / file_path
            
            # Make sure the parent directory exists
            full_path.parent.mkdir(parents=True, exist_ok=True)
            
            # Write the file
            with open(full_path, 'wb') as f:
                shutil.copyfileobj(file_obj, f)
            
            # Determine the file size
            file_size = full_path.stat().st_size
            
            logger.info(f"File uploaded: {file_path}, size: {file_size} bytes")
            
            return True, {
                "file_path": file_path,
                "file_size": file_size,
                "stored_at": datetime.now().isoformat(),
                "backend": self.name
            }
        except Exception as e:
            logger.error(f"File upload failed: {str(e)}")
            return False, {
                "error": str(e),
                "file_path": file_path
            }
    
    def download_file(self, file_path: str) -> Tuple[bool, Optional[BinaryIO]]:
        """
        Download a file from local storage.
        
        Args:
            file_path: file path
            
        Returns:
            (success, file object)
        """
        try:
            full_path = self.storage_root / file_path
            
            if not full_path.exists():
                return False, None
            
            # Open the file in binary mode
            file_obj = open(full_path, 'rb')
            
            return True, file_obj
        except Exception as e:
            logger.error(f"File download failed: {str(e)}")
            return False, None
    
    def delete_file(self, file_path: str) -> bool:
        """
        Delete a file from local storage.
        
        Args:
            file_path: file path
            
        Returns:
            whether the deletion succeeded
        """
        try:
            full_path = self.storage_root / file_path
            
            if full_path.exists():
                full_path.unlink()
                logger.info(f"File deleted: {file_path}")
                return True
            else:
                logger.warning(f"File does not exist: {file_path}")
                return False
        except Exception as e:
            logger.error(f"File deletion failed: {str(e)}")
            return False
    
    def file_exists(self, file_path: str) -> bool:
        """
        Check whether a file exists.
        
        Args:
            file_path: file path
            
        Returns:
            whether the file exists
        """
        full_path = self.storage_root / file_path
        return full_path.exists()
    
    def get_file_size(self, file_path: str) -> Optional[int]:
        """
        Get the size of a file.
        
        Args:
            file_path: file path
            
        Returns:
            size in bytes, or None
        """
        full_path = self.storage_root / file_path
        
        if full_path.exists():
            return full_path.stat().st_size
        return None
    
    def get_file_url(self, file_path: str, expires_in: Optional[int] = None) -> Optional[str]:
        """
        Get a URL for accessing a file.
        
        Args:
            file_path: file path
            expires_in: expiry time in seconds (local storage does not support expiring URLs)
            
        Returns:
            file URL
        """
        # URL-encode the file path
        encoded_path = quote(file_path)
        return f"{self.base_url}/{encoded_path}"
    
    def list_files(self, prefix: str = "", limit: int = 100) -> List[Dict[str, Any]]:
        """
        List files.
        
        Args:
            prefix: path prefix
            limit: maximum number of entries
            
        Returns:
            list of file entries
        """
        files = []
        prefix_path = self.storage_root / prefix
        
        try:
            for item in prefix_path.rglob("*"):
                if item.is_file():
                    rel_path = str(item.relative_to(self.storage_root))
                    file_info = {
                        "file_path": rel_path,
                        "file_size": item.stat().st_size,
                        "modified_at": datetime.fromtimestamp(item.stat().st_mtime).isoformat(),
                        "backend": self.name
                    }
                    files.append(file_info)
                    
                    if len(files) >= limit:
                        break
        except Exception as e:
            logger.error(f"Failed to list files: {str(e)}")
        
        return files
    
    def validate_config(self) -> Tuple[bool, List[str]]:
        """
        Validate the configuration.
        
        Returns:
            (valid, list of errors)
        """
        errors = super().validate_config()[1]
        
        # Check the storage directory
        storage_root = self.config.get("storage_root")
        if not storage_root:
            errors.append("Missing storage_root configuration")
        else:
            # Try to create the directory
            try:
                Path(storage_root).mkdir(parents=True, exist_ok=True)
                
                # Check that the directory is writable
                test_file = Path(storage_root) / f"test_{datetime.now().timestamp()}.txt"
                try:
                    test_file.write_text("test")
                    test_file.unlink()
                except Exception as e:
                    errors.append(f"Storage directory is not writable: {str(e)}")
            except Exception as e:
                errors.append(f"Cannot create storage directory: {str(e)}")
        
        return len(errors) == 0, errors

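One thing the local backend does not guard against is path traversal: because it joins `storage_root / file_path` directly, a malicious `file_path` such as `../../etc/passwd` would escape the storage root. A hedged sketch of a guard (the `safe_join` name is illustrative; it resolves the joined path and refuses anything outside the root):

```python
from pathlib import Path

def safe_join(storage_root: Path, file_path: str) -> Path:
    """Join file_path under storage_root, refusing paths that escape the root."""
    root = storage_root.resolve()
    candidate = (root / file_path).resolve()
    # Path.relative_to raises ValueError when candidate lies outside root
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"unsafe path: {file_path!r}")
    return candidate
```

Calling a helper like this at the top of `upload_file`, `download_file`, and `delete_file` would close the traversal hole before any filesystem operation runs.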
4.3 AWS S3 Storage Implementation

python
# storage/backends/s3.py
import io
import logging
from datetime import datetime
from typing import Any, BinaryIO, Dict, List, Optional, Tuple

import boto3
from botocore.client import Config
from botocore.exceptions import ClientError

from .base import StorageBackend

logger = logging.getLogger(__name__)


class S3StorageBackend(StorageBackend):
    """AWS S3 storage backend."""
    
    def __init__(self, backend_id: str, name: str, config: Dict[str, Any]):
        super().__init__(backend_id, name, config)
        
        # S3 configuration
        self.bucket_name = config.get("bucket_name")
        self.region = config.get("region", "us-east-1")
        self.access_key_id = config.get("access_key_id")
        self.secret_access_key = config.get("secret_access_key")
        self.endpoint_url = config.get("endpoint_url")  # for S3-compatible services
        self.force_path_style = config.get("force_path_style", False)
        
        # S3 client configuration
        s3_config = Config(
            s3={"addressing_style": "path"} if self.force_path_style else None,
            signature_version='s3v4',
            retries={'max_attempts': 3}
        )
        
        # Create the S3 client
        self.s3_client = boto3.client(
            's3',
            aws_access_key_id=self.access_key_id,
            aws_secret_access_key=self.secret_access_key,
            region_name=self.region,
            endpoint_url=self.endpoint_url,
            config=s3_config
        )
        
        # S3 resource (for higher-level operations)
        self.s3_resource = boto3.resource(
            's3',
            aws_access_key_id=self.access_key_id,
            aws_secret_access_key=self.secret_access_key,
            region_name=self.region,
            endpoint_url=self.endpoint_url,
            config=s3_config
        )
        
        self.bucket = self.s3_resource.Bucket(self.bucket_name) if self.bucket_name else None
    
    @classmethod
    def type(cls) -> str:
        return "s3"
    
    def upload_file(self, file_obj: BinaryIO, file_path: str, 
                   content_type: Optional[str] = None,
                   metadata: Optional[Dict] = None) -> Tuple[bool, Dict[str, Any]]:
        """
        Upload a file to S3.
        
        Args:
            file_obj: file object
            file_path: storage path
            content_type: content type
            metadata: metadata
            
        Returns:
            (success, upload result)
        """
        try:
            # Prepare extra upload arguments
            extra_args = {}
            
            if content_type:
                extra_args['ContentType'] = content_type
            
            if metadata:
                # boto3 adds the "x-amz-meta-" prefix itself, so pass bare keys
                extra_args['Metadata'] = {k: str(v) for k, v in metadata.items()}
            
            # Rewind the file pointer
            file_obj.seek(0)
            
            # Upload the file
            self.s3_client.upload_fileobj(
                file_obj,
                self.bucket_name,
                file_path,
                ExtraArgs=extra_args
            )
            
            # Fetch the object's metadata
            head_response = self.s3_client.head_object(
                Bucket=self.bucket_name,
                Key=file_path
            )
            
            file_size = head_response.get('ContentLength', 0)
            
            logger.info(f"S3 upload succeeded: {file_path}, size: {file_size} bytes")
            
            return True, {
                "file_path": file_path,
                "file_size": file_size,
                "etag": head_response.get('ETag', '').strip('"'),
                "stored_at": datetime.now().isoformat(),
                "backend": self.name
            }
        except ClientError as e:
            logger.error(f"S3 upload failed: {e.response['Error']['Message']}")
            return False, {
                "error": e.response['Error']['Message'],
                "code": e.response['Error']['Code'],
                "file_path": file_path
            }
        except Exception as e:
            logger.error(f"S3 upload failed: {str(e)}")
            return False, {
                "error": str(e),
                "file_path": file_path
            }
    
    def download_file(self, file_path: str) -> Tuple[bool, Optional[BinaryIO]]:
        """
        Download a file from S3.
        
        Args:
            file_path: file path
            
        Returns:
            (success, file object)
        """
        try:
            # Create an in-memory file object
            file_obj = io.BytesIO()
            
            # Download the file
            self.s3_client.download_fileobj(
                self.bucket_name,
                file_path,
                file_obj
            )
            
            # Rewind the pointer
            file_obj.seek(0)
            
            return True, file_obj
        except ClientError as e:
            if e.response['Error']['Code'] == '404':
                logger.warning(f"S3 object does not exist: {file_path}")
            else:
                logger.error(f"S3 download failed: {e.response['Error']['Message']}")
            return False, None
        except Exception as e:
            logger.error(f"S3 download failed: {str(e)}")
            return False, None
    
    def delete_file(self, file_path: str) -> bool:
        """
        Delete a file from S3.
        
        Args:
            file_path: file path
            
        Returns:
            whether the deletion succeeded
        """
        try:
            self.s3_client.delete_object(
                Bucket=self.bucket_name,
                Key=file_path
            )
            
            logger.info(f"S3 object deleted: {file_path}")
            return True
        except ClientError as e:
            logger.error(f"S3 deletion failed: {e.response['Error']['Message']}")
            return False
        except Exception as e:
            logger.error(f"S3 deletion failed: {str(e)}")
            return False
    
    def file_exists(self, file_path: str) -> bool:
        """
        Check whether an object exists.
        
        Args:
            file_path: file path
            
        Returns:
            whether the object exists
        """
        try:
            self.s3_client.head_object(
                Bucket=self.bucket_name,
                Key=file_path
            )
            return True
        except ClientError as e:
            if e.response['Error']['Code'] == '404':
                return False
            else:
                raise
        except Exception:
            return False
    
    def get_file_size(self, file_path: str) -> Optional[int]:
        """
        Get the size of an object.
        
        Args:
            file_path: file path
            
        Returns:
            size in bytes, or None
        """
        try:
            response = self.s3_client.head_object(
                Bucket=self.bucket_name,
                Key=file_path
            )
            return response.get('ContentLength')
        except ClientError as e:
            if e.response['Error']['Code'] == '404':
                return None
            else:
                logger.error(f"Failed to get S3 object size: {e.response['Error']['Message']}")
                return None
        except Exception as e:
            logger.error(f"Failed to get S3 object size: {str(e)}")
            return None
    
    def get_file_url(self, file_path: str, expires_in: Optional[int] = None) -> Optional[str]:
        """
        Get a URL for accessing an object.
        
        Args:
            file_path: file path
            expires_in: expiry time in seconds
            
        Returns:
            file URL
        """
        try:
            if expires_in:
                # Generate a presigned URL
                url = self.s3_client.generate_presigned_url(
                    'get_object',
                    Params={
                        'Bucket': self.bucket_name,
                        'Key': file_path
                    },
                    ExpiresIn=expires_in
                )
                return url
            else:
                # Generate a public URL (assuming the bucket is public)
                if self.endpoint_url:
                    # Custom endpoint
                    return f"{self.endpoint_url}/{self.bucket_name}/{file_path}"
                else:
                    # Standard AWS endpoint
                    return f"https://{self.bucket_name}.s3.{self.region}.amazonaws.com/{file_path}"
        except Exception as e:
            logger.error(f"Failed to generate S3 URL: {str(e)}")
            return None
    
    def list_files(self, prefix: str = "", limit: int = 100) -> List[Dict[str, Any]]:
        """
        List objects in S3.
        
        Args:
            prefix: path prefix
            limit: maximum number of entries
            
        Returns:
            list of file entries
        """
        files = []
        
        try:
            paginator = self.s3_client.get_paginator('list_objects_v2')
            page_iterator = paginator.paginate(
                Bucket=self.bucket_name,
                Prefix=prefix,
                PaginationConfig={'MaxItems': limit}
            )
            
            for page in page_iterator:
                if 'Contents' in page:
                    for obj in page['Contents']:
                        file_info = {
                            "file_path": obj['Key'],
                            "file_size": obj['Size'],
                            "modified_at": obj['LastModified'].isoformat(),
                            "etag": obj['ETag'].strip('"'),
                            "backend": self.name
                        }
                        files.append(file_info)
                
                if len(files) >= limit:
                    break
        
        except ClientError as e:
            logger.error(f"Failed to list S3 objects: {e.response['Error']['Message']}")
        except Exception as e:
            logger.error(f"Failed to list S3 objects: {str(e)}")
        
        return files[:limit]
    
    def create_multipart_upload(self, file_path: str, content_type: Optional[str] = None) -> Optional[str]:
        """
        Start a multipart upload.
        
        Args:
            file_path: file path
            content_type: content type
            
        Returns:
            upload ID, or None
        """
        try:
            extra_args = {}
            if content_type:
                extra_args['ContentType'] = content_type
            
            response = self.s3_client.create_multipart_upload(
                Bucket=self.bucket_name,
                Key=file_path,
                **extra_args
            )
            
            return response['UploadId']
        except ClientError as e:
            logger.error(f"Failed to create S3 multipart upload: {e.response['Error']['Message']}")
            return None
    
    def upload_part(self, file_path: str, upload_id: str, part_number: int, 
                   file_obj: BinaryIO) -> Tuple[bool, Optional[str]]:
        """
        Upload one part.
        
        Args:
            file_path: file path
            upload_id: upload ID
            part_number: part number
            file_obj: part file object
            
        Returns:
            (success, ETag)
        """
        try:
            file_obj.seek(0)
            
            response = self.s3_client.upload_part(
                Bucket=self.bucket_name,
                Key=file_path,
                UploadId=upload_id,
                PartNumber=part_number,
                Body=file_obj
            )
            
            return True, response['ETag']
        except ClientError as e:
            logger.error(f"Failed to upload S3 part: {e.response['Error']['Message']}")
            return False, None
    
    def complete_multipart_upload(self, file_path: str, upload_id: str, 
                                 parts: List[Dict[str, Any]]) -> bool:
        """
        Complete a multipart upload.
        
        Args:
            file_path: file path
            upload_id: upload ID
            parts: list of uploaded parts
            
        Returns:
            whether the operation succeeded
        """
        try:
            # Build the Parts argument
            s3_parts = [{'ETag': part['etag'], 'PartNumber': part['part_number']} 
                       for part in parts]
            
            self.s3_client.complete_multipart_upload(
                Bucket=self.bucket_name,
                Key=file_path,
                UploadId=upload_id,
                MultipartUpload={'Parts': s3_parts}
            )
            
            return True
        except ClientError as e:
            logger.error(f"Failed to complete S3 multipart upload: {e.response['Error']['Message']}")
            return False
    
    def abort_multipart_upload(self, file_path: str, upload_id: str) -> bool:
        """
        Abort a multipart upload.
        
        Args:
            file_path: file path
            upload_id: upload ID
            
        Returns:
            whether the operation succeeded
        """
        try:
            self.s3_client.abort_multipart_upload(
                Bucket=self.bucket_name,
                Key=file_path,
                UploadId=upload_id
            )
            
            return True
        except ClientError as e:
            logger.error(f"Failed to abort S3 multipart upload: {e.response['Error']['Message']}")
            return False
    
    def validate_config(self) -> Tuple[bool, List[str]]:
        """
        Validate the S3 configuration.
        
        Returns:
            (valid, list of errors)
        """
        errors = super().validate_config()[1]
        
        # Check required fields
        required_fields = ["bucket_name", "access_key_id", "secret_access_key"]
        for field in required_fields:
            if not self.config.get(field):
                errors.append(f"Missing required field: {field}")
        
        if not errors:
            # Test the connection
            try:
                # Try listing the bucket (MaxKeys=0 to minimize data transfer)
                self.s3_client.list_objects_v2(
                    Bucket=self.bucket_name,
                    MaxKeys=0
                )
            except ClientError as e:
                error_code = e.response['Error']['Code']
                if error_code == 'NoSuchBucket':
                    errors.append(f"Bucket does not exist: {self.bucket_name}")
                elif error_code == 'InvalidAccessKeyId':
                    errors.append("Invalid Access Key ID")
                elif error_code == 'SignatureDoesNotMatch':
                    errors.append("Invalid Secret Access Key")
                else:
                    errors.append(f"S3 connection test failed: {e.response['Error']['Message']}")
            except Exception as e:
                errors.append(f"S3 connection test failed: {str(e)}")
        
        return len(errors) == 0, errors

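The multipart methods above expect the caller to slice the source stream into parts (S3 requires every part except the last to be at least 5 MiB). A hedged helper for producing `(part_number, data)` pairs from a stream; the generator below is illustrative glue around `upload_part`, not part of boto3:

```python
import io
from typing import BinaryIO, Iterator, Tuple

def iter_parts(file_obj: BinaryIO, part_size: int = 5 * 1024 * 1024) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, data) pairs; S3 part numbers start at 1."""
    part_number = 1
    while True:
        data = file_obj.read(part_size)
        if not data:
            break
        yield part_number, data
        part_number += 1

# A tiny part size just to demonstrate the slicing:
parts = list(iter_parts(io.BytesIO(b"abcdefghij"), part_size=4))
```

Each yielded chunk can be wrapped in `io.BytesIO` and passed to `upload_part`, collecting the returned ETags for `complete_multipart_upload`.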
4.4 File Upload Service

python 复制代码
# services/upload_service.py
import hashlib
import json
import os
import uuid
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional, Tuple, BinaryIO, Any
import logging

from storage.backends.base import StorageBackend
from storage.backends.local import LocalStorageBackend
from storage.backends.s3 import S3StorageBackend

logger = logging.getLogger(__name__)


class UploadSession:
    """上传会话管理"""
    
    def __init__(self, session_id: str, user_id: str, original_filename: str,
                 content_type: str, total_size: int, chunk_size: int = 5 * 1024 * 1024):
        """
        初始化上传会话
        
        Args:
            session_id: 会话ID
            user_id: 用户ID
            original_filename: 原始文件名
            content_type: 文件类型
            total_size: 总大小(字节)
            chunk_size: 分片大小(字节,默认5MB)
        """
        self.session_id = session_id
        self.user_id = user_id
        self.original_filename = original_filename
        self.content_type = content_type
        self.total_size = total_size
        self.chunk_size = chunk_size
        
        # 计算分片信息
        self.total_chunks = (total_size + chunk_size - 1) // chunk_size
        self.uploaded_chunks = 0
        self.chunks_status = {i: False for i in range(1, self.total_chunks + 1)}
        
        # 会话状态
        self.status = "pending"  # pending, uploading, completed, failed, cancelled
        self.created_at = datetime.now()
        self.expires_at = self.created_at + timedelta(hours=24)
        
        # 文件信息
        self.file_id = None
        self.stored_filename = None
        self.storage_backend = None
        self.file_hash = None
        
        # 元数据
        self.metadata = {}
    
    def update_chunk_status(self, chunk_number: int, status: bool = True):
        """
        更新分片状态
        
        Args:
            chunk_number: 分片编号
            status: 状态
        """
        if 1 <= chunk_number <= self.total_chunks:
            if self.chunks_status[chunk_number] != status:
                self.chunks_status[chunk_number] = status
                if status:
                    self.uploaded_chunks += 1
                else:
                    self.uploaded_chunks -= 1
    
    def get_progress(self) -> float:
        """
        获取上传进度
        
        Returns:
            进度百分比(0-100)
        """
        if self.total_chunks == 0:
            return 100.0
        return (self.uploaded_chunks / self.total_chunks) * 100
    
    def is_complete(self) -> bool:
        """
        检查是否完成
        
        Returns:
            是否完成
        """
        return all(self.chunks_status.values())
    
    def get_missing_chunks(self) -> List[int]:
        """
        获取缺失的分片
        
        Returns:
            缺失的分片编号列表
        """
        return [chunk_num for chunk_num, status in self.chunks_status.items() if not status]
    
    def to_dict(self) -> Dict[str, Any]:
        """
        转换为字典
        
        Returns:
            字典表示
        """
        return {
            "session_id": self.session_id,
            "user_id": self.user_id,
            "original_filename": self.original_filename,
            "content_type": self.content_type,
            "total_size": self.total_size,
            "chunk_size": self.chunk_size,
            "total_chunks": self.total_chunks,
            "uploaded_chunks": self.uploaded_chunks,
            "progress": self.get_progress(),
            "status": self.status,
            "created_at": self.created_at.isoformat(),
            "expires_at": self.expires_at.isoformat(),
            "file_id": self.file_id,
            "metadata": self.metadata,
            "missing_chunks": self.get_missing_chunks()
        }
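UploadSession 中有两条容易出错的分片规则:分片总数按向上取整计算,且最后一个分片允许小于 chunk_size。下面用一个独立的小函数验证这两条规则(示意性代码,`chunk_layout` 为本文为说明而虚构的名字):

```python
# 分片布局计算示意:与 UploadSession.total_chunks 的向上取整公式、
# 以及完成上传时校验最后一个分片大小的公式一一对应
def chunk_layout(total_size: int, chunk_size: int):
    """返回 (分片总数, 最后一个分片的字节数)。"""
    total_chunks = (total_size + chunk_size - 1) // chunk_size  # 向上取整
    last_chunk = total_size - (total_chunks - 1) * chunk_size if total_chunks else 0
    return total_chunks, last_chunk

# 12MB 文件按 5MB 分片:共3片,最后一片只有2MB
print(chunk_layout(12 * 1024 * 1024, 5 * 1024 * 1024))  # (3, 2097152)
```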


class FileValidator:
    """文件验证器"""
    
    # 常见MIME类型和扩展名映射
    MIME_TYPES = {
        'image/jpeg': ['.jpg', '.jpeg'],
        'image/png': ['.png'],
        'image/gif': ['.gif'],
        'image/webp': ['.webp'],
        'image/svg+xml': ['.svg'],
        'application/pdf': ['.pdf'],
        'application/msword': ['.doc'],
        'application/vnd.openxmlformats-officedocument.wordprocessingml.document': ['.docx'],
        'application/vnd.ms-excel': ['.xls'],
        'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet': ['.xlsx'],
        'application/vnd.ms-powerpoint': ['.ppt'],
        'application/vnd.openxmlformats-officedocument.presentationml.presentation': ['.pptx'],
        'text/plain': ['.txt'],
        'text/csv': ['.csv'],
        'application/zip': ['.zip'],
        'application/x-rar-compressed': ['.rar'],
        'video/mp4': ['.mp4'],
        'video/avi': ['.avi'],
        'video/x-msvideo': ['.avi'],
        'video/quicktime': ['.mov'],
        'audio/mpeg': ['.mp3'],
        'audio/wav': ['.wav'],
    }
    
    # 危险文件类型
    DANGEROUS_TYPES = [
        'application/x-msdownload',  # .exe
        'application/x-dosexec',     # .exe
        'application/x-msdos-program',  # .com
        'application/bat',           # .bat
        'application/x-bat',         # .bat
        'application/x-ms-shortcut', # .lnk
        'application/x-sh',          # .sh
        'application/x-shellscript', # .sh
    ]
    
    def __init__(self, max_file_size: int = 100 * 1024 * 1024,  # 100MB
                 allowed_types: List[str] = None,
                 block_dangerous: bool = True):
        """
        初始化文件验证器
        
        Args:
            max_file_size: 最大文件大小(字节)
            allowed_types: 允许的文件类型列表
            block_dangerous: 是否阻止危险文件类型
        """
        self.max_file_size = max_file_size
        self.allowed_types = allowed_types or list(self.MIME_TYPES.keys())
        self.block_dangerous = block_dangerous
    
    def validate_file(self, filename: str, content_type: str, 
                     file_size: int, file_obj: Optional[BinaryIO] = None) -> Tuple[bool, List[str]]:
        """
        验证文件
        
        Args:
            filename: 文件名
            content_type: 文件类型
            file_size: 文件大小
            file_obj: 文件对象(可选,用于内容验证)
            
        Returns:
            (是否有效, 错误列表)
        """
        errors = []
        
        # 1. 验证文件大小
        if file_size > self.max_file_size:
            errors.append(f"文件大小超过限制: {self._format_size(file_size)} > {self._format_size(self.max_file_size)}")
        
        # 2. 验证文件类型
        if content_type not in self.allowed_types:
            errors.append(f"不支持的文件类型: {content_type}")
        
        # 3. 验证扩展名与MIME类型是否匹配
        ext = Path(filename).suffix.lower()
        if content_type in self.MIME_TYPES:
            allowed_exts = self.MIME_TYPES[content_type]
            if ext not in allowed_exts:
                errors.append(f"文件扩展名与MIME类型不匹配: {ext} 不属于 {allowed_exts}")
        
        # 4. 检查危险文件类型
        if self.block_dangerous and content_type in self.DANGEROUS_TYPES:
            errors.append(f"危险文件类型: {content_type}")
        
        # 5. 如果提供了文件对象,验证文件内容(SVG是XML文本,PIL无法打开,跳过位图校验)
        if file_obj and content_type.startswith('image/') and content_type != 'image/svg+xml':
            is_valid, content_errors = self._validate_image(file_obj)
            if not is_valid:
                errors.extend(content_errors)
        
        return len(errors) == 0, errors
    
    def _validate_image(self, file_obj: BinaryIO) -> Tuple[bool, List[str]]:
        """
        验证图片文件
        
        Args:
            file_obj: 文件对象
            
        Returns:
            (是否有效, 错误列表)
        """
        errors = []
        
        try:
            from PIL import Image
            file_obj.seek(0)
            
            # 尝试打开图片
            try:
                img = Image.open(file_obj)
                img.verify()  # 验证图片完整性
                
                # 重置文件指针
                file_obj.seek(0)
                img = Image.open(file_obj)  # 重新打开以获取详细信息
                
                # 检查图片尺寸
                width, height = img.size
                max_dimension = 10000  # 最大尺寸
                
                if width > max_dimension or height > max_dimension:
                    errors.append(f"图片尺寸过大: {width}x{height}")
                
                # 检查文件格式
                if img.format not in ['JPEG', 'PNG', 'GIF', 'WEBP', 'BMP']:
                    errors.append(f"不支持的图片格式: {img.format}")
                
            except Exception as e:
                errors.append(f"图片文件损坏: {str(e)}")
        
        except ImportError:
            # PIL不可用,跳过图片验证
            pass
        
        finally:
            file_obj.seek(0)
        
        return len(errors) == 0, errors
    
    def _format_size(self, size_bytes: int) -> str:
        """
        格式化文件大小
        
        Args:
            size_bytes: 字节大小
            
        Returns:
            格式化后的字符串
        """
        for unit in ['B', 'KB', 'MB', 'GB']:
            if size_bytes < 1024.0:
                return f"{size_bytes:.2f} {unit}"
            size_bytes /= 1024.0
        return f"{size_bytes:.2f} TB"
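FileValidator 通过内置映射表校验扩展名与MIME类型是否匹配;作为补充思路,也可以用标准库 mimetypes 从文件名反推类型再与客户端声明值比对(独立示意,`extension_matches` 为本文虚构的辅助函数,并非权威实现):

```python
# 用标准库 mimetypes 交叉校验扩展名与客户端声明的 Content-Type
import mimetypes

def extension_matches(filename: str, declared_type: str) -> bool:
    """根据扩展名猜测 MIME 类型并与声明值比较;猜不出时放行,交给后续内容校验。"""
    guessed, _ = mimetypes.guess_type(filename)
    if guessed is None:
        return True
    return guessed == declared_type

print(extension_matches("avatar.png", "image/png"))        # True
print(extension_matches("avatar.png", "application/pdf"))  # False
```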


class StorageRouter:
    """存储路由器"""
    
    def __init__(self):
        self.backends: Dict[str, StorageBackend] = {}
        self.backend_types = {
            "local": LocalStorageBackend,
            "s3": S3StorageBackend,
        }
    
    def register_backend(self, backend: StorageBackend):
        """
        注册存储后端
        
        Args:
            backend: 存储后端实例
        """
        self.backends[backend.backend_id] = backend
        logger.info(f"注册存储后端: {backend.name} ({backend.type()})")
    
    def create_backend(self, backend_id: str, name: str, backend_type: str,
                      config: Dict[str, Any]) -> Tuple[bool, Dict[str, Any]]:
        """
        创建存储后端
        
        Args:
            backend_id: 后端ID
            name: 后端名称
            backend_type: 后端类型
            config: 配置
            
        Returns:
            (是否成功, 后端信息或错误信息)
        """
        try:
            if backend_type not in self.backend_types:
                return False, {"error": f"不支持的存储类型: {backend_type}"}
            
            backend_class = self.backend_types[backend_type]
            backend = backend_class(backend_id, name, config)
            
            # 验证配置
            is_valid, errors = backend.validate_config()
            if not is_valid:
                return False, {"error": "配置验证失败", "details": errors}
            
            # 测试连接
            if not backend.test_connection():
                return False, {"error": "存储后端连接测试失败"}
            
            # 注册后端
            self.register_backend(backend)
            
            return True, backend.to_dict()
        except Exception as e:
            logger.error(f"创建存储后端失败: {str(e)}")
            return False, {"error": f"创建存储后端失败: {str(e)}"}
    
    def get_backend(self, backend_id: str) -> Optional[StorageBackend]:
        """
        获取存储后端
        
        Args:
            backend_id: 后端ID
            
        Returns:
            存储后端实例或None
        """
        return self.backends.get(backend_id)
    
    def get_backend_by_name(self, name: str) -> Optional[StorageBackend]:
        """
        根据名称获取存储后端
        
        Args:
            name: 后端名称
            
        Returns:
            存储后端实例或None
        """
        for backend in self.backends.values():
            if backend.name == name:
                return backend
        return None
    
    def select_backend(self, file_size: int, content_type: str, 
                      user_id: Optional[str] = None) -> Optional[StorageBackend]:
        """
        选择存储后端
        
        Args:
            file_size: 文件大小
            content_type: 文件类型
            user_id: 用户ID(可选)
            
        Returns:
            选中的存储后端或None
        """
        # 过滤启用的后端
        available_backends = [
            backend for backend in self.backends.values() 
            if backend.enabled
        ]
        
        if not available_backends:
            return None
        
        # 根据文件大小和类型过滤
        suitable_backends = []
        for backend in available_backends:
            # 检查文件大小限制
            if file_size > backend.max_file_size:
                continue
            
            # 检查文件类型限制
            if backend.allowed_types != "*/*":
                # 简化的精确匹配;生产环境可扩展为支持 image/* 这类通配符
                allowed = [t.strip() for t in backend.allowed_types.split(',')]
                if content_type not in allowed:
                    continue
            
            suitable_backends.append(backend)
        
        if not suitable_backends:
            return None
        
        # 根据优先级排序(优先级数字小的优先)
        suitable_backends.sort(key=lambda x: x.priority)
        
        # 如果有用户ID,可以根据用户策略选择后端
        # 这里简化实现,返回优先级最高的
        return suitable_backends[0]
    
    def list_backends(self, backend_type: Optional[str] = None) -> List[Dict[str, Any]]:
        """
        列出存储后端
        
        Args:
            backend_type: 过滤后端类型
            
        Returns:
            后端列表
        """
        backends = list(self.backends.values())
        
        if backend_type:
            backends = [b for b in backends if b.type() == backend_type]
        
        # 按优先级排序
        backends.sort(key=lambda x: x.priority)
        
        return [backend.to_dict() for backend in backends]


class UploadService:
    """上传服务"""
    
    def __init__(self, storage_router: StorageRouter, 
                 validator: Optional[FileValidator] = None,
                 session_store=None):
        """
        初始化上传服务
        
        Args:
            storage_router: 存储路由器
            validator: 文件验证器
            session_store: 会话存储
        """
        self.storage_router = storage_router
        self.validator = validator or FileValidator()
        self.session_store = session_store or InMemorySessionStore()
        
        # 分片上传临时存储
        self.chunk_store = InMemoryChunkStore()
    
    def init_upload(self, user_id: str, filename: str, content_type: str,
                   file_size: int, metadata: Optional[Dict] = None) -> Tuple[bool, Dict[str, Any]]:
        """
        初始化上传
        
        Args:
            user_id: 用户ID
            filename: 文件名
            content_type: 文件类型
            file_size: 文件大小
            metadata: 元数据
            
        Returns:
            (是否成功, 上传信息或错误信息)
        """
        try:
            # 验证文件
            is_valid, errors = self.validator.validate_file(filename, content_type, file_size)
            if not is_valid:
                return False, {"error": "文件验证失败", "details": errors}
            
            # 选择存储后端
            backend = self.storage_router.select_backend(file_size, content_type, user_id)
            if not backend:
                return False, {"error": "没有合适的存储后端可用"}
            
            # 生成文件ID和存储路径
            file_id = self._generate_file_id()
            stored_filename = self._generate_stored_filename(filename, user_id)
            
            # 生成会话ID
            session_id = self._generate_session_id()
            
            # 创建上传会话
            session = UploadSession(
                session_id=session_id,
                user_id=user_id,
                original_filename=filename,
                content_type=content_type,
                total_size=file_size
            )
            
            # 设置文件信息
            session.file_id = file_id
            session.stored_filename = stored_filename
            session.storage_backend = backend.backend_id
            
            if metadata:
                session.metadata = metadata
            
            # 保存会话
            self.session_store.save_session(session)
            
            return True, {
                "session_id": session_id,
                "file_id": file_id,
                "chunk_size": session.chunk_size,
                "total_chunks": session.total_chunks,
                "backend": backend.name,
                "expires_at": session.expires_at.isoformat()
            }
        except Exception as e:
            logger.error(f"初始化上传失败: {str(e)}")
            return False, {"error": f"初始化上传失败: {str(e)}"}
    
    def upload_chunk(self, session_id: str, chunk_number: int, 
                    chunk_data: bytes, chunk_hash: Optional[str] = None) -> Tuple[bool, Dict[str, Any]]:
        """
        上传分片
        
        Args:
            session_id: 会话ID
            chunk_number: 分片编号
            chunk_data: 分片数据
            chunk_hash: 分片哈希(可选)
            
        Returns:
            (是否成功, 上传结果)
        """
        try:
            # 获取会话
            session = self.session_store.get_session(session_id)
            if not session:
                return False, {"error": "上传会话不存在或已过期"}
            
            # 检查分片编号有效性
            if not (1 <= chunk_number <= session.total_chunks):
                return False, {"error": f"无效的分片编号: {chunk_number}"}
            
            # 检查分片是否已上传
            if session.chunks_status[chunk_number]:
                return True, {"message": "分片已上传", "chunk_number": chunk_number}
            
            # 验证分片大小
            chunk_size = len(chunk_data)
            expected_size = session.chunk_size
            
            # 如果是最后一个分片,大小可能小于chunk_size
            if chunk_number == session.total_chunks:
                expected_size = session.total_size - (session.total_chunks - 1) * session.chunk_size
            
            if chunk_size != expected_size:
                return False, {
                    "error": f"分片大小不匹配: 期望 {expected_size}, 实际 {chunk_size}",
                    "chunk_number": chunk_number
                }
            
            # 验证分片哈希(如果提供)
            if chunk_hash:
                calculated_hash = hashlib.md5(chunk_data).hexdigest()
                if chunk_hash != calculated_hash:
                    return False, {
                        "error": f"分片哈希不匹配: {chunk_hash} != {calculated_hash}",
                        "chunk_number": chunk_number
                    }
            
            # 存储分片
            chunk_key = f"{session_id}_{chunk_number}"
            self.chunk_store.save_chunk(chunk_key, chunk_data)
            
            # 更新会话状态
            session.update_chunk_status(chunk_number, True)
            session.status = "uploading"
            self.session_store.save_session(session)
            
            return True, {
                "chunk_number": chunk_number,
                "chunk_size": chunk_size,
                "uploaded_chunks": session.uploaded_chunks,
                "progress": session.get_progress(),
                "session_id": session_id
            }
        except Exception as e:
            logger.error(f"上传分片失败: {str(e)}")
            return False, {"error": f"上传分片失败: {str(e)}"}
    
    def complete_upload(self, session_id: str) -> Tuple[bool, Dict[str, Any]]:
        """
        完成上传
        
        Args:
            session_id: 会话ID
            
        Returns:
            (是否成功, 文件信息或错误信息)
        """
        try:
            # 获取会话
            session = self.session_store.get_session(session_id)
            if not session:
                return False, {"error": "上传会话不存在或已过期"}
            
            # 检查是否所有分片都已上传
            if not session.is_complete():
                missing = session.get_missing_chunks()
                return False, {
                    "error": "还有分片未上传",
                    "missing_chunks": missing,
                    "uploaded_chunks": session.uploaded_chunks,
                    "total_chunks": session.total_chunks
                }
            
            # 获取存储后端
            backend = self.storage_router.get_backend(session.storage_backend)
            if not backend:
                return False, {"error": "存储后端不存在"}
            
            # 如果是分片上传,需要合并分片
            if session.total_chunks > 1:
                # 对于支持分片上传的后端(如S3),使用分片上传API
                if hasattr(backend, 'create_multipart_upload'):
                    return self._complete_multipart_upload(session, backend)
                else:
                    # 对于不支持分片上传的后端,需要本地合并
                    return self._merge_and_upload(session, backend)
            else:
                # 单分片上传
                return self._upload_single_file(session, backend)
        except Exception as e:
            logger.error(f"完成上传失败: {str(e)}")
            return False, {"error": f"完成上传失败: {str(e)}"}
    
    def _complete_multipart_upload(self, session: UploadSession, 
                                  backend: StorageBackend) -> Tuple[bool, Dict[str, Any]]:
        """
        完成分片上传(支持分片上传的后端)
        
        Args:
            session: 上传会话
            backend: 存储后端
            
        Returns:
            (是否成功, 文件信息)
        """
        try:
            # 创建分片上传
            upload_id = backend.create_multipart_upload(
                session.stored_filename,
                session.content_type
            )
            
            if not upload_id:
                return False, {"error": "创建分片上传失败"}
            
            # 上传所有分片
            import io
            
            parts = []
            for chunk_number in range(1, session.total_chunks + 1):
                chunk_key = f"{session.session_id}_{chunk_number}"
                chunk_data = self.chunk_store.get_chunk(chunk_key)
                
                if not chunk_data:
                    # 分片丢失,直接终止本次分片上传
                    backend.abort_multipart_upload(session.stored_filename, upload_id)
                    return False, {"error": f"分片丢失: {chunk_number}"}
                
                # 上传分片
                chunk_obj = io.BytesIO(chunk_data)
                
                success, etag = backend.upload_part(
                    session.stored_filename,
                    upload_id,
                    chunk_number,
                    chunk_obj
                )
                
                if not success:
                    backend.abort_multipart_upload(session.stored_filename, upload_id)
                    return False, {"error": f"上传分片失败: {chunk_number}"}
                
                parts.append({
                    "part_number": chunk_number,
                    "etag": etag,
                    "size": len(chunk_data)
                })
            
            # 完成分片上传
            success = backend.complete_multipart_upload(
                session.stored_filename,
                upload_id,
                parts
            )
            
            if not success:
                return False, {"error": "完成分片上传失败"}
            
            # 获取文件信息
            file_size = sum(part["size"] for part in parts)
            
            # 清理分片数据
            self._cleanup_session_data(session)
            
            # 更新会话状态
            session.status = "completed"
            self.session_store.save_session(session)
            
            # 获取文件URL
            file_url = backend.get_file_url(session.stored_filename)
            
            return True, {
                "file_id": session.file_id,
                "filename": session.original_filename,
                "stored_filename": session.stored_filename,
                "content_type": session.content_type,
                "file_size": file_size,
                "backend": backend.name,
                "url": file_url,
                "uploaded_at": datetime.now().isoformat()
            }
        except Exception as e:
            logger.error(f"分片上传失败: {str(e)}")
            # 尝试取消已创建的上传任务,避免在后端残留未完成的multipart分片
            if 'upload_id' in locals() and upload_id:
                try:
                    backend.abort_multipart_upload(session.stored_filename, upload_id)
                except Exception:
                    logger.warning("取消分片上传失败", exc_info=True)
            return False, {"error": f"分片上传失败: {str(e)}"}
    
    def _merge_and_upload(self, session: UploadSession, 
                         backend: StorageBackend) -> Tuple[bool, Dict[str, Any]]:
        """
        合并分片并上传(不支持分片上传的后端)
        
        Args:
            session: 上传会话
            backend: 存储后端
            
        Returns:
            (是否成功, 文件信息)
        """
        tmp_path = None
        try:
            # 合并所有分片到临时文件(delete=False,由finally统一控制删除时机)
            import tempfile
            
            with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
                tmp_path = tmp_file.name
                
                # 按顺序写入所有分片
                for chunk_number in range(1, session.total_chunks + 1):
                    chunk_key = f"{session.session_id}_{chunk_number}"
                    chunk_data = self.chunk_store.get_chunk(chunk_key)
                    
                    if not chunk_data:
                        return False, {"error": f"分片丢失: {chunk_number}"}
                    
                    tmp_file.write(chunk_data)
                
                tmp_file.flush()
                
                # 计算文件哈希
                tmp_file.seek(0)
                file_hash = hashlib.sha256(tmp_file.read()).hexdigest()
                tmp_file.seek(0)
                
                # 上传文件
                success, result = backend.upload_file(
                    tmp_file,
                    session.stored_filename,
                    session.content_type,
                    session.metadata
                )
                
                if not success:
                    return False, {"error": result.get("error", "文件上传失败")}
            
            # 清理分片数据
            self._cleanup_session_data(session)
            
            # 更新会话状态
            session.status = "completed"
            session.file_hash = file_hash
            self.session_store.save_session(session)
            
            # 获取文件URL
            file_url = backend.get_file_url(session.stored_filename)
            
            return True, {
                "file_id": session.file_id,
                "filename": session.original_filename,
                "stored_filename": session.stored_filename,
                "content_type": session.content_type,
                "file_size": result.get("file_size", 0),
                "file_hash": file_hash,
                "backend": backend.name,
                "url": file_url,
                "uploaded_at": datetime.now().isoformat()
            }
        except Exception as e:
            logger.error(f"合并上传失败: {str(e)}")
            return False, {"error": f"合并上传失败: {str(e)}"}
        finally:
            # 无论成功、失败还是异常,都删除临时文件,避免磁盘泄漏
            if tmp_path and os.path.exists(tmp_path):
                os.unlink(tmp_path)
    
    def _upload_single_file(self, session: UploadSession, 
                           backend: StorageBackend) -> Tuple[bool, Dict[str, Any]]:
        """
        上传单文件
        
        Args:
            session: 上传会话
            backend: 存储后端
            
        Returns:
            (是否成功, 文件信息)
        """
        try:
            # 获取分片数据(只有一个分片)
            chunk_key = f"{session.session_id}_1"
            chunk_data = self.chunk_store.get_chunk(chunk_key)
            
            if not chunk_data:
                return False, {"error": "文件数据丢失"}
            
            import io
            file_obj = io.BytesIO(chunk_data)
            
            # 计算文件哈希
            file_hash = hashlib.sha256(chunk_data).hexdigest()
            
            # 上传文件
            success, result = backend.upload_file(
                file_obj,
                session.stored_filename,
                session.content_type,
                session.metadata
            )
            
            if not success:
                return False, {"error": result.get("error", "文件上传失败")}
            
            # 清理分片数据
            self._cleanup_session_data(session)
            
            # 更新会话状态
            session.status = "completed"
            session.file_hash = file_hash
            self.session_store.save_session(session)
            
            # 获取文件URL
            file_url = backend.get_file_url(session.stored_filename)
            
            return True, {
                "file_id": session.file_id,
                "filename": session.original_filename,
                "stored_filename": session.stored_filename,
                "content_type": session.content_type,
                "file_size": result.get("file_size", len(chunk_data)),
                "file_hash": file_hash,
                "backend": backend.name,
                "url": file_url,
                "uploaded_at": datetime.now().isoformat()
            }
        except Exception as e:
            logger.error(f"单文件上传失败: {str(e)}")
            return False, {"error": f"单文件上传失败: {str(e)}"}
    
    def _cleanup_session_data(self, session: UploadSession):
        """清理会话数据"""
        # 清理分片数据
        for chunk_number in range(1, session.total_chunks + 1):
            chunk_key = f"{session.session_id}_{chunk_number}"
            self.chunk_store.delete_chunk(chunk_key)
    
    def get_upload_status(self, session_id: str) -> Optional[Dict[str, Any]]:
        """
        获取上传状态
        
        Args:
            session_id: 会话ID
            
        Returns:
            上传状态或None
        """
        session = self.session_store.get_session(session_id)
        if session:
            return session.to_dict()
        return None
    
    def cancel_upload(self, session_id: str) -> bool:
        """
        取消上传
        
        Args:
            session_id: 会话ID
            
        Returns:
            是否成功
        """
        session = self.session_store.get_session(session_id)
        if not session:
            return False
        
        # 清理分片数据
        self._cleanup_session_data(session)
        
        # 更新会话状态
        session.status = "cancelled"
        self.session_store.save_session(session)
        
        return True
    
    def _generate_file_id(self) -> str:
        """生成文件ID"""
        timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
        random_str = str(uuid.uuid4())[:8]
        return f"FILE{timestamp}{random_str}"
    
    def _generate_session_id(self) -> str:
        """生成会话ID"""
        timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
        random_str = str(uuid.uuid4())[:8]
        return f"SESS{timestamp}{random_str}"
    
    def _generate_stored_filename(self, original_filename: str, user_id: str) -> str:
        """生成存储文件名"""
        # 提取扩展名
        from pathlib import Path
        ext = Path(original_filename).suffix
        
        # 生成唯一文件名
        timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
        random_str = str(uuid.uuid4())[:8]
        
        # 构建存储路径: user_id/year/month/day/filename
        now = datetime.now()
        path = f"{user_id}/{now.year}/{now.month:02d}/{now.day:02d}"
        
        return f"{path}/{timestamp}_{random_str}{ext}"


class InMemorySessionStore:
    """内存会话存储(简化实现)"""
    
    def __init__(self):
        self.sessions = {}  # session_id -> UploadSession
    
    def save_session(self, session: UploadSession):
        """保存会话"""
        self.sessions[session.session_id] = session
    
    def get_session(self, session_id: str) -> Optional[UploadSession]:
        """获取会话(已过期的会话视为不存在,并顺手清理)"""
        session = self.sessions.get(session_id)
        if session and datetime.now() > session.expires_at:
            del self.sessions[session_id]
            return None
        return session
    
    def delete_session(self, session_id: str) -> bool:
        """删除会话"""
        if session_id in self.sessions:
            del self.sessions[session_id]
            return True
        return False


class InMemoryChunkStore:
    """内存分片存储(简化实现)"""
    
    def __init__(self):
        self.chunks = {}  # chunk_key -> bytes
    
    def save_chunk(self, chunk_key: str, chunk_data: bytes):
        """保存分片"""
        self.chunks[chunk_key] = chunk_data
    
    def get_chunk(self, chunk_key: str) -> Optional[bytes]:
        """获取分片"""
        return self.chunks.get(chunk_key)
    
    def delete_chunk(self, chunk_key: str):
        """删除分片"""
        if chunk_key in self.chunks:
            del self.chunks[chunk_key]
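从客户端视角看,调用方需要把文件按 chunk_size 切分、为每个分片计算 MD5,再逐片提交给 /upload/chunk 接口的 chunk_hash 参数。下面是一个不发起真实请求的切分示意(`iter_chunks` 为本文虚构的辅助函数):

```python
# 客户端分片切分示意:产出 (分片编号, 分片数据, MD5),编号从1开始,
# 与服务端 upload_chunk 的 chunk_number / chunk_hash 约定一致
import hashlib

def iter_chunks(data: bytes, chunk_size: int = 5 * 1024 * 1024):
    """按固定大小切分字节流,最后一片可以小于 chunk_size。"""
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        yield i // chunk_size + 1, chunk, hashlib.md5(chunk).hexdigest()

data = b"x" * (5 * 1024 * 1024 + 100)  # 5MB + 100B,应切成两片
chunks = list(iter_chunks(data))
print([(num, len(chunk)) for num, chunk, _ in chunks])  # [(1, 5242880), (2, 100)]
```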

4.5 RESTful API服务

# api/main.py
from fastapi import FastAPI, UploadFile, File, Form, HTTPException, Depends, Query
from fastapi.responses import JSONResponse, StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from typing import Optional, List
import json
import logging

from services.upload_service import UploadService, StorageRouter, FileValidator

# 配置日志
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# 创建FastAPI应用
app = FastAPI(
    title="文件上传与云存储服务",
    description="支持多存储后端、分片上传、文件验证的完整文件上传服务",
    version="1.0.0"
)

# 添加CORS中间件(开发示例配置;生产环境应把 allow_origins 收紧为具体域名,
# 浏览器不会接受 allow_credentials=True 与通配符来源的组合)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# 创建服务实例
storage_router = StorageRouter()
validator = FileValidator(
    max_file_size=100 * 1024 * 1024,  # 100MB
    allowed_types=[
        'image/jpeg',
        'image/png',
        'image/gif',
        'image/webp',
        'application/pdf',
        'text/plain',
        'application/zip',
        'video/mp4',
    ]
)
upload_service = UploadService(storage_router, validator)

# 初始化默认存储后端
@app.on_event("startup")  # 新版FastAPI推荐改用lifespan事件,这里为简洁沿用on_event
async def startup_event():
    """应用启动时初始化"""
    try:
        # 创建本地存储后端
        local_config = {
            "storage_root": "uploads",
            "base_url": "http://localhost:8000/files"
        }
        
        success, result = storage_router.create_backend(
            backend_id="local_01",
            name="本地存储",
            backend_type="local",
            config=local_config
        )
        
        if success:
            logger.info(f"本地存储后端初始化成功: {result['backend_id']}")
        else:
            logger.error(f"本地存储后端初始化失败: {result.get('error')}")
    
    except Exception as e:
        logger.error(f"启动初始化失败: {str(e)}")


# API端点
@app.get("/")
async def root():
    """根端点"""
    return {
        "service": "文件上传与云存储服务",
        "version": "1.0.0",
        "endpoints": {
            "上传文件": "/upload",
            "分片上传初始化": "/upload/init",
            "上传分片": "/upload/chunk",
            "完成上传": "/upload/complete",
            "获取上传状态": "/upload/status/{session_id}",
            "取消上传": "/upload/cancel/{session_id}",
            "下载文件": "/files/{file_path}",
            "列出存储后端": "/storage/backends"
        }
    }


@app.post("/upload/init")
async def init_upload(
    user_id: str = Form(...),
    filename: str = Form(...),
    content_type: str = Form(...),
    file_size: int = Form(...),
    metadata: Optional[str] = Form(None)
):
    """初始化上传"""
    try:
        # 解析元数据
        metadata_dict = {}
        if metadata:
            try:
                metadata_dict = json.loads(metadata)
            except json.JSONDecodeError:
                return JSONResponse(
                    status_code=400,
                    content={"error": "无效的元数据格式"}
                )
        
        # 初始化上传
        success, result = upload_service.init_upload(
            user_id=user_id,
            filename=filename,
            content_type=content_type,
            file_size=file_size,
            metadata=metadata_dict
        )
        
        if not success:
            return JSONResponse(
                status_code=400,
                content=result
            )
        
        return result
    except Exception as e:
        logger.error(f"初始化上传失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"初始化上传失败: {str(e)}")


@app.post("/upload/chunk")
async def upload_chunk(
    session_id: str = Form(...),
    chunk_number: int = Form(...),
    chunk_hash: Optional[str] = Form(None),
    file: UploadFile = File(...)
):
    """上传分片"""
    try:
        # 读取分片数据
        chunk_data = await file.read()
        
        # 上传分片
        success, result = upload_service.upload_chunk(
            session_id=session_id,
            chunk_number=chunk_number,
            chunk_data=chunk_data,
            chunk_hash=chunk_hash
        )
        
        if not success:
            return JSONResponse(
                status_code=400,
                content=result
            )
        
        return result
    except Exception as e:
        logger.error(f"上传分片失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"上传分片失败: {str(e)}")


@app.post("/upload/complete")
async def complete_upload(session_id: str = Form(...)):
    """完成上传"""
    try:
        success, result = upload_service.complete_upload(session_id)
        
        if not success:
            return JSONResponse(
                status_code=400,
                content=result
            )
        
        return result
    except Exception as e:
        logger.error(f"完成上传失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"完成上传失败: {str(e)}")


@app.get("/upload/status/{session_id}")
async def get_upload_status(session_id: str):
    """获取上传状态"""
    status = upload_service.get_upload_status(session_id)
    
    if not status:
        raise HTTPException(status_code=404, detail="上传会话不存在")
    
    return status


@app.post("/upload/cancel/{session_id}")
async def cancel_upload(session_id: str):
    """取消上传"""
    success = upload_service.cancel_upload(session_id)
    
    if not success:
        raise HTTPException(status_code=404, detail="上传会话不存在")
    
    return {"message": "上传已取消"}


@app.post("/upload/simple")
async def simple_upload(
    user_id: str = Form(...),
    file: UploadFile = File(...),
    metadata: Optional[str] = Form(None)
):
    """简单上传(小文件)"""
    try:
        # 读取文件数据
        file_data = await file.read()
        file_size = len(file_data)
        
        # 解析元数据
        metadata_dict = {}
        if metadata:
            try:
                metadata_dict = json.loads(metadata)
            except json.JSONDecodeError:
                return JSONResponse(
                    status_code=400,
                    content={"error": "无效的元数据格式"}
                )
        
        # 初始化上传(单分片)
        success, init_result = upload_service.init_upload(
            user_id=user_id,
            filename=file.filename,
            content_type=file.content_type,
            file_size=file_size,
            metadata=metadata_dict
        )
        
        if not success:
            return JSONResponse(
                status_code=400,
                content=init_result
            )
        
        session_id = init_result["session_id"]
        
        # 上传分片(只有一个分片)
        success, chunk_result = upload_service.upload_chunk(
            session_id=session_id,
            chunk_number=1,
            chunk_data=file_data
        )
        
        if not success:
            return JSONResponse(
                status_code=400,
                content=chunk_result
            )
        
        # 完成上传
        success, final_result = upload_service.complete_upload(session_id)
        
        if not success:
            return JSONResponse(
                status_code=400,
                content=final_result
            )
        
        return final_result
    except Exception as e:
        logger.error(f"简单上传失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"简单上传失败: {str(e)}")


@app.get("/files/{file_path:path}")
async def download_file(file_path: str, download: bool = Query(False)):
    """下载文件"""
    try:
        # 这里简化实现,实际应该从数据库查询文件信息
        # 并选择正确的存储后端
        
        # 使用本地存储后端
        backend = storage_router.get_backend_by_name("本地存储")
        if not backend:
            raise HTTPException(status_code=500, detail="存储后端不可用")
        
        # 检查文件是否存在
        if not backend.file_exists(file_path):
            raise HTTPException(status_code=404, detail="文件不存在")
        
        # 下载文件
        success, file_obj = backend.download_file(file_path)
        
        if not success or not file_obj:
            raise HTTPException(status_code=500, detail="文件下载失败")
        
        # 获取文件大小
        file_size = backend.get_file_size(file_path) or 0
        
        # 设置响应头
        headers = {}
        if download:
            # 如果是下载模式,添加Content-Disposition头
            from urllib.parse import quote
            filename = file_path.split('/')[-1]
            encoded_filename = quote(filename)
            headers["Content-Disposition"] = f"attachment; filename*=UTF-8''{encoded_filename}"  # RFC 6266:非 ASCII 文件名需用 filename* 编码形式
        
        return StreamingResponse(
            file_obj,
            media_type="application/octet-stream",
            headers=headers
        )
    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"文件下载失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"文件下载失败: {str(e)}")


@app.get("/storage/backends")
async def list_storage_backends(backend_type: Optional[str] = None):
    """列出存储后端"""
    backends = storage_router.list_backends(backend_type)
    return backends


@app.post("/storage/backends")
async def create_storage_backend(
    backend_id: str = Form(...),
    name: str = Form(...),
    backend_type: str = Form(...),
    config: str = Form(...)
):
    """创建存储后端"""
    try:
        # 解析配置
        try:
            config_dict = json.loads(config)
        except json.JSONDecodeError:
            return JSONResponse(
                status_code=400,
                content={"error": "无效的配置格式"}
            )
        
        # 创建后端
        success, result = storage_router.create_backend(
            backend_id=backend_id,
            name=name,
            backend_type=backend_type,
            config=config_dict
        )
        
        if not success:
            return JSONResponse(
                status_code=400,
                content=result
            )
        
        return result
    except Exception as e:
        logger.error(f"创建存储后端失败: {str(e)}")
        raise HTTPException(status_code=500, detail=f"创建存储后端失败: {str(e)}")


@app.get("/health")
async def health_check():
    """健康检查"""
    return {
        "status": "healthy",
        "timestamp": datetime.now().isoformat(),
        "storage_backends": len(storage_router.backends)
    }


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

5. 使用示例

python
# examples/demo.py
"""
文件上传服务使用示例
"""
import requests
import json
import os
from pathlib import Path

# 服务地址
BASE_URL = "http://localhost:8000"


def test_simple_upload():
    """测试简单上传"""
    print("=" * 60)
    print("测试简单上传")
    print("=" * 60)
    
    # 创建测试文件
    test_content = "这是一个测试文件内容".encode("utf-8") * 1000  # 约30KB(bytes 字面量不能包含非 ASCII 字符)
    test_file = Path("test_file.txt")
    test_file.write_bytes(test_content)
    
    try:
        # 准备请求数据
        files = {
            'file': ('test_file.txt', test_content, 'text/plain')
        }
        
        data = {
            'user_id': 'user_001',
            'metadata': json.dumps({
                'description': '测试文件',
                'category': 'document'
            })
        }
        
        # 发送请求
        response = requests.post(
            f"{BASE_URL}/upload/simple",
            files=files,
            data=data
        )
        
        if response.status_code == 200:
            result = response.json()
            print("✓ 简单上传成功!")
            print(f"  文件ID: {result['file_id']}")
            print(f"  文件名: {result['filename']}")
            print(f"  文件大小: {result['file_size']} 字节")
            print(f"  存储后端: {result['backend']}")
            print(f"  访问URL: {result['url']}")
            
            # 测试下载
            print(f"\n测试文件下载...")
            download_response = requests.get(result['url'])
            if download_response.status_code == 200:
                downloaded_content = download_response.content
                if downloaded_content == test_content:
                    print("✓ 文件下载验证成功!")
                else:
                    print("✗ 文件下载验证失败: 内容不匹配")
            else:
                print(f"✗ 文件下载失败: {download_response.status_code}")
        
        else:
            print(f"✗ 简单上传失败: {response.status_code}")
            print(f"  错误信息: {response.text}")
    
    finally:
        # 清理测试文件
        if test_file.exists():
            test_file.unlink()


def test_chunked_upload():
    """测试分片上传"""
    print("\n" + "=" * 60)
    print("测试分片上传")
    print("=" * 60)
    
    # 创建大文件(5MB)
    file_size = 5 * 1024 * 1024  # 5MB
    test_content = os.urandom(file_size)
    
    try:
        # 1. 初始化上传
        print("1. 初始化上传...")
        init_data = {
            'user_id': 'user_001',
            'filename': 'large_file.bin',
            'content_type': 'application/octet-stream',
            'file_size': file_size,
            'metadata': json.dumps({
                'description': '大文件测试',
                'chunked': True
            })
        }
        
        init_response = requests.post(
            f"{BASE_URL}/upload/init",
            data=init_data
        )
        
        if init_response.status_code != 200:
            print(f"✗ 初始化上传失败: {init_response.status_code}")
            print(f"  错误信息: {init_response.text}")
            return
        
        init_result = init_response.json()
        session_id = init_result['session_id']
        chunk_size = init_result['chunk_size']
        total_chunks = init_result['total_chunks']
        
        print(f"✓ 初始化成功!")
        print(f"  会话ID: {session_id}")
        print(f"  分片大小: {chunk_size} 字节")
        print(f"  总分片数: {total_chunks}")
        
        # 2. 上传分片
        print(f"\n2. 上传分片...")
        for chunk_num in range(1, total_chunks + 1):
            # 计算分片范围
            start = (chunk_num - 1) * chunk_size
            end = min(chunk_num * chunk_size, file_size)
            chunk_data = test_content[start:end]
            
            # 计算分片哈希
            import hashlib
            chunk_hash = hashlib.md5(chunk_data).hexdigest()
            
            # 准备请求
            files = {
                'file': (f'chunk_{chunk_num}', chunk_data, 'application/octet-stream')
            }
            
            data = {
                'session_id': session_id,
                'chunk_number': chunk_num,
                'chunk_hash': chunk_hash
            }
            
            # 上传分片
            chunk_response = requests.post(
                f"{BASE_URL}/upload/chunk",
                files=files,
                data=data
            )
            
            if chunk_response.status_code == 200:
                chunk_result = chunk_response.json()
                progress = chunk_result['progress']
                print(f"  分片 {chunk_num}/{total_chunks} 上传成功,进度: {progress:.1f}%")
            else:
                print(f"✗ 分片 {chunk_num} 上传失败: {chunk_response.status_code}")
                return
        
        # 3. 获取上传状态
        print(f"\n3. 获取上传状态...")
        status_response = requests.get(f"{BASE_URL}/upload/status/{session_id}")
        
        if status_response.status_code == 200:
            status = status_response.json()
            print(f"✓ 上传状态:")
            print(f"  进度: {status['progress']:.1f}%")
            print(f"  已上传分片: {status['uploaded_chunks']}/{status['total_chunks']}")
            print(f"  状态: {status['status']}")
        
        # 4. 完成上传
        print(f"\n4. 完成上传...")
        complete_data = {
            'session_id': session_id
        }
        
        complete_response = requests.post(
            f"{BASE_URL}/upload/complete",
            data=complete_data
        )
        
        if complete_response.status_code == 200:
            complete_result = complete_response.json()
            print("✓ 上传完成!")
            print(f"  文件ID: {complete_result['file_id']}")
            print(f"  文件大小: {complete_result['file_size']} 字节")
            print(f"  文件哈希: {complete_result.get('file_hash', 'N/A')}")
            print(f"  访问URL: {complete_result['url']}")
        else:
            print(f"✗ 完成上传失败: {complete_response.status_code}")
            print(f"  错误信息: {complete_response.text}")
    
    except Exception as e:
        print(f"✗ 分片上传测试失败: {str(e)}")


def test_storage_backends():
    """测试存储后端管理"""
    print("\n" + "=" * 60)
    print("测试存储后端管理")
    print("=" * 60)
    
    # 列出存储后端
    print("1. 列出存储后端...")
    response = requests.get(f"{BASE_URL}/storage/backends")
    
    if response.status_code == 200:
        backends = response.json()
        print(f"✓ 当前有 {len(backends)} 个存储后端:")
        for backend in backends:
            print(f"  - {backend['name']} ({backend['type']}): {backend['backend_id']}")
    else:
        print(f"✗ 获取存储后端失败: {response.status_code}")


def test_health_check():
    """测试健康检查"""
    print("\n" + "=" * 60)
    print("测试健康检查")
    print("=" * 60)
    
    response = requests.get(f"{BASE_URL}/health")
    
    if response.status_code == 200:
        health = response.json()
        print(f"✓ 服务状态: {health['status']}")
        print(f"  时间戳: {health['timestamp']}")
        print(f"  存储后端数量: {health['storage_backends']}")
    else:
        print(f"✗ 健康检查失败: {response.status_code}")


def main():
    """主函数"""
    print("文件上传与云存储服务测试")
    print("=" * 60)
    
    # 测试健康检查
    test_health_check()
    
    # 测试存储后端管理
    test_storage_backends()
    
    # 测试简单上传
    test_simple_upload()
    
    # 测试分片上传
    test_chunked_upload()
    
    print("\n" + "=" * 60)
    print("测试完成!")
    print("=" * 60)


if __name__ == "__main__":
    main()

6. 安全考虑

6.1 安全防护措施

  1. 文件类型验证

    • MIME类型检查
    • 文件扩展名验证
    • 文件内容签名验证
    • 危险文件类型拦截
  2. 文件大小限制

    • 单文件大小限制
    • 总存储空间限制
    • 分片大小限制
  3. 访问控制

    • 用户认证和授权
    • 文件访问权限控制
    • 签名URL过期机制
  4. 内容安全

    • 病毒扫描
    • 敏感内容检测
    • 图片/视频内容审查
  5. 数据安全

    • 传输加密 (HTTPS)
    • 存储加密
    • 数据备份和恢复

6.2 安全配置示例

yaml
security:
  file_validation:
    max_file_size: 104857600  # 100MB
    allowed_types:
      - image/jpeg
      - image/png
      - image/gif
      - application/pdf
      - text/plain
    block_dangerous: true
    virus_scan: true
    
  access_control:
    require_auth: true
    default_permission: private
    signed_url_expiry: 3600  # 1小时
    
  encryption:
    transport: tls_1.3
    storage: aes_256
    
  monitoring:
    failed_upload_threshold: 10
    suspicious_activity_alerts: true

7. 性能优化

7.1 性能优化策略

  1. 分片上传优化

    • 并行上传分片
    • 断点续传
    • 分片大小自适应调整
  2. 存储优化

    • CDN集成
    • 边缘存储
    • 缓存策略
  3. 处理优化

    • 异步文件处理
    • 批量操作
    • 延迟加载
  4. 网络优化

    • 连接池
    • 压缩传输
    • 多区域上传
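
"并行上传分片"可以用线程池实现:各分片相互独立,可同时发起多个 `/upload/chunk` 请求。下面是一个示意性骨架(单分片上传函数 `upload_fn` 以参数注入,实际可替换为第 5 节中基于 requests 的调用):

```python
# 假设性示例:线程池并行上传分片
from concurrent.futures import ThreadPoolExecutor, as_completed

def split_chunks(data: bytes, chunk_size: int) -> list[bytes]:
    """按固定大小切分数据,最后一片可能不足 chunk_size"""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def parallel_upload(upload_fn, session_id: str, data: bytes,
                    chunk_size: int, max_workers: int = 4) -> int:
    """并行上传所有分片,返回成功分片数;任一分片失败立即抛出异常,便于断点续传时重试"""
    chunks = split_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_fn, session_id, num, chunk)
            for num, chunk in enumerate(chunks, start=1)
        ]
        done = 0
        for fut in as_completed(futures):
            fut.result()  # 失败的分片在此抛出异常
            done += 1
    return done
```

`max_workers` 需结合带宽与服务端并发限制调优:过大会触发限流,过小则无法充分利用带宽。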

7.2 性能指标

  • 上传吞吐量:$\text{吞吐量} = \dfrac{\text{总数据量}}{\text{总时间}}$
  • 并发处理能力:$\text{并发数} = \dfrac{\text{活跃连接数}}{\text{平均处理时间}}$
  • 成功率:$\text{成功率} = \dfrac{\text{成功上传数}}{\text{总上传数}} \times 100\%$
  • 平均延迟:$\text{延迟} = \dfrac{\sum \text{请求处理时间}}{\text{请求总数}}$
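
上述指标可以直接从上传记录聚合得出。下面是一个示意性计算(`UploadRecord` 为演示用结构,字段名为假设):

```python
# 假设性示例:从上传记录计算吞吐量、成功率与平均延迟
from dataclasses import dataclass

@dataclass
class UploadRecord:
    size_bytes: int    # 上传数据量(字节)
    duration_s: float  # 请求处理时间(秒)
    success: bool

def compute_metrics(records: list[UploadRecord]) -> dict:
    """按 7.2 节公式聚合指标;空记录时各项返回 0"""
    total = len(records)
    ok = [r for r in records if r.success]
    total_time = sum(r.duration_s for r in records)
    return {
        # 吞吐量 = 成功上传的总数据量 / 总时间
        "throughput_bps": sum(r.size_bytes for r in ok) / total_time if total_time else 0.0,
        # 成功率 = 成功上传数 / 总上传数 × 100%
        "success_rate": len(ok) / total * 100 if total else 0.0,
        # 平均延迟 = Σ请求处理时间 / 请求总数
        "avg_latency_s": total_time / total if total else 0.0,
    }
```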

8. 代码自查与测试

8.1 代码质量检查清单

  1. 错误处理

    • 所有外部调用都有异常处理
    • 错误信息明确且安全
    • 资源正确释放(文件句柄、连接等)
  2. 安全验证

    • 输入验证和过滤
    • 文件类型和大小验证
    • 访问控制检查
  3. 性能考虑

    • 大文件处理优化
    • 内存使用控制
    • 并发处理安全
  4. 可维护性

    • 清晰的代码结构
    • 详细的注释
    • 配置化管理
  5. 测试覆盖

    • 单元测试
    • 集成测试
    • 性能测试

8.2 测试用例

python
# tests/test_upload_service.py
import pytest
import io
from unittest.mock import Mock, patch
from services.upload_service import UploadService, StorageRouter, FileValidator


class TestUploadService:
    """上传服务测试"""
    
    def setup_method(self):
        """测试设置"""
        self.storage_router = Mock(spec=StorageRouter)
        self.validator = Mock(spec=FileValidator)
        self.upload_service = UploadService(self.storage_router, self.validator)
    
    def test_init_upload_success(self):
        """测试初始化上传成功"""
        # 模拟验证通过
        self.validator.validate_file.return_value = (True, [])
        
        # 模拟存储后端选择
        mock_backend = Mock()
        mock_backend.backend_id = "test_backend"
        mock_backend.name = "Test Backend"
        self.storage_router.select_backend.return_value = mock_backend
        
        # 调用初始化上传
        success, result = self.upload_service.init_upload(
            user_id="user_001",
            filename="test.txt",
            content_type="text/plain",
            file_size=1024
        )
        
        # 验证结果
        assert success is True
        assert "session_id" in result
        assert "file_id" in result
        assert result["backend"] == "Test Backend"
        
        # 验证方法调用
        self.validator.validate_file.assert_called_once()
        self.storage_router.select_backend.assert_called_once()
    
    def test_init_upload_validation_failed(self):
        """测试初始化上传验证失败"""
        # 模拟验证失败
        self.validator.validate_file.return_value = (False, ["文件大小超限"])
        
        # 调用初始化上传
        success, result = self.upload_service.init_upload(
            user_id="user_001",
            filename="test.txt",
            content_type="text/plain",
            file_size=1024 * 1024 * 1024  # 1GB,应该超限
        )
        
        # 验证结果
        assert success is False
        assert "error" in result
        assert "文件验证失败" in result["error"]
        
        # 验证存储后端选择没有被调用
        self.storage_router.select_backend.assert_not_called()
    
    def test_upload_chunk_success(self):
        """测试上传分片成功"""
        # 创建模拟会话
        from services.upload_service import UploadSession
        session = UploadSession(
            session_id="test_session",
            user_id="user_001",
            original_filename="test.txt",
            content_type="text/plain",
            total_size=1024 * 1024,  # 1MB
            chunk_size=512 * 1024    # 512KB
        )
        
        # 模拟会话存储
        self.upload_service.session_store.get_session = Mock(return_value=session)
        
        # 测试数据
        chunk_data = b"test chunk data" * 1000  # 约15KB
        chunk_number = 1
        
        # 调用上传分片
        success, result = self.upload_service.upload_chunk(
            session_id="test_session",
            chunk_number=chunk_number,
            chunk_data=chunk_data
        )
        
        # 验证结果
        assert success is True
        assert result["chunk_number"] == chunk_number
        assert result["progress"] > 0
        
        # 验证分片被存储
        chunk_key = f"test_session_{chunk_number}"
        stored_chunk = self.upload_service.chunk_store.get_chunk(chunk_key)
        assert stored_chunk == chunk_data
    
    def test_complete_upload_single_chunk(self):
        """测试完成单分片上传"""
        # 创建模拟会话(单分片)
        from services.upload_service import UploadSession
        session = UploadSession(
            session_id="test_session",
            user_id="user_001",
            original_filename="test.txt",
            content_type="text/plain",
            total_size=1024,  # 1KB
            chunk_size=1024   # 1KB
        )
        session.file_id = "test_file"
        session.stored_filename = "test.txt"
        session.storage_backend = "test_backend"
        
        # 设置分片状态
        session.update_chunk_status(1, True)
        
        # 模拟会话存储
        self.upload_service.session_store.get_session = Mock(return_value=session)
        
        # 存储分片数据
        chunk_data = b"test file content"
        self.upload_service.chunk_store.save_chunk("test_session_1", chunk_data)
        
        # 模拟存储后端
        mock_backend = Mock()
        mock_backend.backend_id = "test_backend"
        mock_backend.name = "Test Backend"
        mock_backend.upload_file.return_value = (True, {"file_size": len(chunk_data)})
        mock_backend.get_file_url.return_value = "http://example.com/test.txt"
        
        self.storage_router.get_backend.return_value = mock_backend
        
        # 调用完成上传
        success, result = self.upload_service.complete_upload("test_session")
        
        # 验证结果
        assert success is True
        assert result["file_id"] == "test_file"
        assert result["filename"] == "test.txt"
        assert result["file_size"] == len(chunk_data)
        assert "url" in result
        
        # 验证方法调用
        mock_backend.upload_file.assert_called_once()
        mock_backend.get_file_url.assert_called_once()


if __name__ == "__main__":
    pytest.main([__file__, "-v"])

9. 总结

9.1 系统特点

  1. 多存储后端支持:可扩展的存储架构,支持本地存储、AWS S3、阿里云OSS等多种存储服务
  2. 分片上传:支持大文件分片上传,实现断点续传和并行上传
  3. 文件验证:完整的文件验证机制,包括类型、大小、内容安全等检查
  4. 安全可靠:多重安全防护,包括访问控制、内容扫描、数据加密等
  5. 高性能:异步处理、CDN集成、缓存优化等性能提升策略
  6. 易于扩展:模块化设计,便于添加新的存储后端和处理功能

9.2 实际应用场景

  1. 企业文档管理:员工文件上传、版本控制、权限管理
  2. 媒体内容平台:图片、视频上传和处理,CDN分发
  3. 电商平台:商品图片、用户评价图片上传
  4. 云存储服务:提供类似网盘的文件存储服务
  5. 数据备份:自动化数据备份到多种云存储

9.3 后续改进方向

  1. 更多存储后端:添加腾讯云COS、Google Cloud Storage、Azure Blob Storage等支持
  2. 智能路由:根据文件类型、大小、访问频率智能选择存储后端
  3. 高级处理:图片智能裁剪、视频自动转码、文档OCR识别
  4. 数据分析:文件使用分析、存储成本优化建议
  5. AI增强:智能内容识别、自动标签生成、敏感内容检测

通过本文的实现,我们构建了一个完整、可扩展、安全的文件上传与云存储集成系统。该系统不仅提供了基本的上传下载功能,还涵盖了企业级应用所需的安全防护、性能优化和可扩展性考虑。实际部署时,还需要根据具体业务需求进行配置和优化,但本文提供的架构和实现已经为构建生产级文件上传服务奠定了坚实的基础。
