Spring Boot: Large File Uploads in Practice - Chunked Upload + Resumable Upload + Instant Upload (MD5 Check)

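Pain Points of Traditional File Upload

In day-to-day development we keep running into the same file upload problems:

A user uploads a multi-gigabyte video, the network drops, and the upload has to start over from the beginning

Large uploads consume a large share of server bandwidth and slow down other users

The same file is uploaded repeatedly, wasting storage and bandwidth

Upload progress cannot be shown in real time, which makes for a poor user experience

Server memory fills up with upload requests, making the service unstable

The traditional single-request upload falls short when files get large. This article looks at how to build an efficient large-file upload system.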

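Core Ideas of the Solution

1. Chunked upload

Split a large file into smaller chunks and upload them separately, which reduces the load of each individual request.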
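
The arithmetic behind chunking can be sketched as follows. This is a minimal illustration, not the article's implementation: the names ChunkPlanner and ChunkRange are hypothetical, and each range is the half-open byte interval [start, end) of one chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Byte range [start, end) of a single chunk (hypothetical helper type). */
final class ChunkRange {
    final int index;
    final long start;
    final long end;

    ChunkRange(int index, long start, long end) {
        this.index = index;
        this.start = start;
        this.end = end;
    }

    long size() {
        return end - start;
    }
}

/** Computes how a file of a given size divides into fixed-size chunks. */
final class ChunkPlanner {

    /** Number of chunks, using integer ceiling division. */
    static int chunkCount(long fileSize, int chunkSize) {
        if (fileSize < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("fileSize must be >= 0 and chunkSize > 0");
        }
        return Math.toIntExact((fileSize + chunkSize - 1) / chunkSize);
    }

    /** Byte ranges of all chunks; only the last one may be shorter than chunkSize. */
    static List<ChunkRange> plan(long fileSize, int chunkSize) {
        int count = chunkCount(fileSize, chunkSize);
        List<ChunkRange> ranges = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            long start = (long) i * chunkSize;
            long end = Math.min(start + chunkSize, fileSize);
            ranges.add(new ChunkRange(i, start, end));
        }
        return ranges;
    }
}
```

For example, a 5 MB file with a 2 MB chunk size yields three chunks: two full ones and a final 1 MB chunk.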

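2. Resumable upload

Record upload progress so that after a network interruption the upload continues from where it stopped instead of starting over.

3. Instant upload via MD5 check

Use the file's MD5 digest to check whether the file already exists on the server; if it does, skip the transfer entirely (an "instant upload").

4. Concurrency control

Limit how many chunks upload concurrently to balance upload throughput against server load.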

Core Implementation

1. File chunking

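import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;

@Service
public class FileChunkService {

    // Splits an uploaded file into chunks of at most chunkSize bytes.
    // Every chunk is held in memory, so this only suits small files;
    // in the upload flow of this article the browser does the splitting.
    public List<FileChunk> splitFile(MultipartFile file, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<FileChunk> chunks = new ArrayList<>();
        long fileSize = file.getSize();
        // Integer ceiling division avoids floating-point rounding on large sizes
        int chunkCount = (int) ((fileSize + chunkSize - 1) / chunkSize);

        try (InputStream inputStream = file.getInputStream()) {
            byte[] buffer = new byte[chunkSize];

            for (int i = 0; i < chunkCount; i++) {
                // readNBytes (Java 9+) reads until the buffer is full or the stream
                // ends; a plain read() may return fewer bytes than requested
                int bytesRead = inputStream.readNBytes(buffer, 0, chunkSize);
                if (bytesRead == 0) break;

                // Setter names follow the FileChunk usage in the upload controller
                FileChunk chunk = new FileChunk();
                chunk.setChunkIndex(i);
                chunk.setData(Arrays.copyOf(buffer, bytesRead));
                chunk.setTotalChunks(chunkCount);
                chunk.setFileSize(bytesRead);

                chunks.add(chunk);
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to split file into chunks", e);
        }

        return chunks;
    }
}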

2. MD5 check and instant upload

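import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class FileMd5Service {

    // The repository types are not shown in the article; these type names are assumed.
    @Autowired
    private FileRepository fileRepository;

    @Autowired
    private FileChunkRepository fileChunkRepository;

    // Returns the MD5 digest of the data as a 32-character lowercase hex string
    public String calculateFileMd5(byte[] fileData) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hashBytes = md.digest(fileData);
            StringBuilder sb = new StringBuilder();
            for (byte b : hashBytes) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("MD5 algorithm is not available", e);
        }
    }

    public boolean isFileExists(String md5) {
        // Check whether a file with this MD5 is already recorded in the database
        return fileRepository.existsByMd5(md5);
    }

    public boolean isChunkExists(String md5, int chunkIndex) {
        // Check whether this chunk of the file has already been stored
        return fileChunkRepository.existsByMd5AndChunkIndex(md5, chunkIndex);
    }
}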
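
Because calculateFileMd5 takes a byte[], the whole file has to be in memory before it can be hashed. A streaming variant keeps memory use constant regardless of file size. The sketch below is one possible approach, not the article's code; the class name StreamingMd5 is hypothetical, and I/O errors are wrapped in an unchecked exception in the same spirit as the listings above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Computes an MD5 hex digest by feeding the data through a small buffer. */
final class StreamingMd5 {

    /** Hashes everything readable from the stream; the caller closes the stream. */
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // only the bytes actually read
            }
            return toHex(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 algorithm is not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read data for hashing", e);
        }
    }

    /** Hashes a file on disk without loading it into memory. */
    static String md5Hex(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            return md5Hex(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to open " + file, e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }
}
```

The output format (32 lowercase hex characters) matches what calculateFileMd5 produces, so the two can be compared directly.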

3. Upload progress management

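import java.time.LocalDateTime;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.stereotype.Service;

@Service
public class UploadProgressService {

    // In-memory progress keyed by upload ID; entries are lost when the service restarts
    private final Map<String, UploadProgress> progressMap = new ConcurrentHashMap<>();

    // currentChunk is the number of chunks received so far, not a chunk index
    public void updateProgress(String uploadId, int currentChunk, int totalChunks) {
        UploadProgress progress = progressMap.computeIfAbsent(uploadId, k -> new UploadProgress());
        progress.setUploadId(uploadId);
        progress.setCurrentChunk(currentChunk);
        progress.setTotalChunks(totalChunks);
        // Guard against division by zero when totalChunks is missing or invalid
        progress.setPercentage(totalChunks > 0 ? (currentChunk * 100) / totalChunks : 0);
        progress.setLastUpdateTime(LocalDateTime.now());
    }

    public UploadProgress getProgress(String uploadId) {
        return progressMap.get(uploadId);
    }

    public void removeProgress(String uploadId) {
        progressMap.remove(uploadId);
    }
}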
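
When chunks upload concurrently they can arrive out of order or be retried, so progress is more robust when derived from the set of distinct chunk indexes received rather than from the latest index. The following is a minimal sketch of that idea; ChunkProgressTracker is a hypothetical class, not part of the article's code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Tracks which chunk indexes have completed for each upload. */
final class ChunkProgressTracker {

    private final ConcurrentHashMap<String, Set<Integer>> completed = new ConcurrentHashMap<>();

    /** Records a finished chunk and returns the percentage (0 to 100) done so far. */
    int markCompleted(String uploadId, int chunkIndex, int totalChunks) {
        if (totalChunks <= 0 || chunkIndex < 0 || chunkIndex >= totalChunks) {
            throw new IllegalArgumentException("chunkIndex must be in [0, totalChunks)");
        }
        // A concurrent set makes duplicate or out-of-order reports harmless
        Set<Integer> done = completed.computeIfAbsent(uploadId, k -> ConcurrentHashMap.newKeySet());
        done.add(chunkIndex);
        return (int) (done.size() * 100L / totalChunks);
    }

    /** Number of distinct chunks recorded for the upload. */
    int completedCount(String uploadId) {
        Set<Integer> done = completed.get(uploadId);
        return done == null ? 0 : done.size();
    }

    /** Forgets an upload, for example after the merge has finished. */
    void remove(String uploadId) {
        completed.remove(uploadId);
    }
}
```

Reporting the same chunk twice leaves the percentage unchanged, which matters when a client retries a chunk whose response was lost.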

4. Chunk upload endpoints

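import java.io.IOException;
import java.time.LocalDateTime;
import java.util.Optional;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
@RequestMapping("/api/upload")
public class FileUploadController {

    @Autowired
    private FileChunkService fileChunkService;

    @Autowired
    private FileMd5Service fileMd5Service;

    @Autowired
    private UploadProgressService uploadProgressService;

    // The repository types are not shown in the article; these type names are assumed.
    @Autowired
    private FileChunkRepository fileChunkRepository;

    @Autowired
    private FileRepository fileRepository;

    @PostMapping("/chunk")
    public ResponseEntity<UploadResponse> uploadChunk(
            @RequestParam("file") MultipartFile file,
            @RequestParam("md5") String fileMd5,
            @RequestParam("chunkIndex") int chunkIndex,
            @RequestParam("totalChunks") int totalChunks) throws IOException {

        // 1. Skip the chunk if it has already been stored
        if (fileMd5Service.isChunkExists(fileMd5, chunkIndex)) {
            // Chunks can arrive out of order, so progress is based on the stored
            // chunk count rather than on chunkIndex
            uploadProgressService.updateProgress(
                    fileMd5, fileChunkRepository.countByMd5(fileMd5), totalChunks);
            return ResponseEntity.ok(new UploadResponse("SUCCESS", "Chunk already exists"));
        }

        // 2. Save the chunk
        FileChunk chunk = new FileChunk();
        chunk.setMd5(fileMd5);
        chunk.setChunkIndex(chunkIndex);
        chunk.setTotalChunks(totalChunks);
        chunk.setData(file.getBytes()); // getBytes() can throw IOException
        chunk.setFileSize(file.getSize());

        fileChunkRepository.save(chunk);

        // 3. Update the upload progress
        uploadProgressService.updateProgress(
                fileMd5, fileChunkRepository.countByMd5(fileMd5), totalChunks);

        return ResponseEntity.ok(new UploadResponse("SUCCESS", "Chunk uploaded"));
    }

    @PostMapping("/complete")
    public ResponseEntity<UploadResponse> completeUpload(
            @RequestParam("md5") String fileMd5,
            @RequestParam("fileName") String fileName,
            @RequestParam("fileSize") long fileSize) {

        // 1. Check that every chunk has been uploaded
        int uploadedChunks = fileChunkRepository.countByMd5(fileMd5);
        Optional<FileChunk> firstChunk = fileChunkRepository.findFirstByMd5(fileMd5);

        if (firstChunk.isPresent() && uploadedChunks == firstChunk.get().getTotalChunks()) {
            // 2. Merge the chunks. mergeChunks appears later in the article;
            //    generateFilePath and cleanupTempChunks are helpers that are not shown.
            mergeChunks(fileMd5, fileName);

            // 3. Record the file metadata
            FileInfo fileInfo = new FileInfo();
            fileInfo.setMd5(fileMd5);
            fileInfo.setFileName(fileName);
            fileInfo.setFileSize(fileSize);
            fileInfo.setFilePath(generateFilePath(fileMd5, fileName));
            fileInfo.setUploadTime(LocalDateTime.now());

            fileRepository.save(fileInfo);

            // 4. Remove the temporary chunks
            cleanupTempChunks(fileMd5);

            // 5. Remove the progress entry
            uploadProgressService.removeProgress(fileMd5);

            return ResponseEntity.ok(new UploadResponse("SUCCESS", "File merged"));
        } else {
            return ResponseEntity.badRequest()
                    .body(new UploadResponse("ERROR", "Not all chunks have been uploaded"));
        }
    }
}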

Frontend Implementation

1. Chunked file upload

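// Frontend: split the file into chunks and start the upload
function uploadFile(file) {
    const chunkSize = 2 * 1024 * 1024; // 2MB
    const chunks = [];
    let start = 0;

    // Compute the file's MD5. This reads the whole file into memory at once,
    // so very large files are better hashed incrementally, chunk by chunk.
    const fileReader = new FileReader();
    fileReader.onload = function(e) {
        const md5 = SparkMD5.ArrayBuffer.hash(e.target.result);

        // Ask the server whether the file already exists (instant upload)
        checkFileExists(md5).then(exists => {
            if (exists) {
                console.log('File already exists on the server, instant upload');
                return;
            }

            // Slice the file into chunks
            while (start < file.size) {
                const chunk = file.slice(start, start + chunkSize);
                chunks.push({
                    index: chunks.length,
                    data: chunk
                });
                start += chunkSize;
            }

            uploadChunks(chunks, md5);
        });
    };

    fileReader.readAsArrayBuffer(file);
}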

2. Upload progress display

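function uploadChunks(chunks, fileMd5) {
    let uploadedChunks = 0;
    let failed = false;

    // Upload chunks concurrently, with a cap on in-flight requests
    const concurrentLimit = 3;
    const uploadingQueue = [...chunks];

    const uploadNext = () => {
        // Stop taking work once the queue is empty or a chunk has failed
        if (failed || uploadingQueue.length === 0) {
            return;
        }

        const chunk = uploadingQueue.shift();
        const formData = new FormData();
        formData.append('file', chunk.data);
        formData.append('md5', fileMd5);
        formData.append('chunkIndex', chunk.index);
        formData.append('totalChunks', chunks.length);

        fetch('/api/upload/chunk', {
            method: 'POST',
            body: formData
        }).then(response => {
            if (!response.ok) {
                throw new Error(`Chunk ${chunk.index} failed with HTTP ${response.status}`);
            }
            uploadedChunks++;
            const progress = (uploadedChunks / chunks.length) * 100;
            updateProgressBar(progress);

            if (uploadedChunks === chunks.length) {
                // Every chunk is confirmed, so request the merge exactly once
                completeUpload(fileMd5);
            } else {
                uploadNext(); // continue with the next chunk
            }
        }).catch(error => {
            // Stop here; a retry or the resume check can pick up the missing chunks
            failed = true;
            console.error('Chunk upload failed', error);
        });
    };

    // Start the concurrent uploads
    for (let i = 0; i < concurrentLimit && i < chunks.length; i++) {
        uploadNext();
    }
}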

Advanced Features

1. Resumable upload

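@PostMapping("/resume-check")
public ResponseEntity<ResumeCheckResponse> checkResume(
        @RequestParam("md5") String fileMd5,
        @RequestParam("totalChunks") int totalChunks) {

    if (totalChunks <= 0) {
        // Reject invalid input before it causes a division by zero below
        return ResponseEntity.badRequest().build();
    }

    // Look up the indexes of the chunks already stored for this file
    List<Integer> uploadedChunks = fileChunkRepository.findUploadedChunkIndexes(fileMd5);

    ResumeCheckResponse response = new ResumeCheckResponse();
    // findMissingChunks returns the indexes in [0, totalChunks) that still need uploading
    response.setNeedUploadChunks(findMissingChunks(uploadedChunks, totalChunks));
    response.setUploadProgress(uploadedChunks.size() * 100 / totalChunks);

    return ResponseEntity.ok(response);
}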
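
The endpoint calls findMissingChunks, whose body the article does not show. One plausible implementation is sketched below, assuming the helper simply returns every index in [0, totalChunks) that is absent from the uploaded list; the enclosing class name ResumeHelper is hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.List;

/** Works out which chunk indexes a client still has to upload. */
final class ResumeHelper {

    /**
     * Returns, in ascending order, the indexes in [0, totalChunks) that do not
     * appear in uploaded. Duplicates and out-of-range values are ignored.
     */
    static List<Integer> findMissingChunks(Collection<Integer> uploaded, int totalChunks) {
        if (totalChunks < 0) {
            throw new IllegalArgumentException("totalChunks must not be negative");
        }
        BitSet present = new BitSet(totalChunks);
        for (Integer idx : uploaded) {
            if (idx != null && idx >= 0 && idx < totalChunks) {
                present.set(idx);
            }
        }
        List<Integer> missing = new ArrayList<>();
        // nextClearBit jumps straight to the next index that was never set
        for (int i = present.nextClearBit(0); i < totalChunks; i = present.nextClearBit(i + 1)) {
            missing.add(i);
        }
        return missing;
    }
}
```

With uploaded indexes 0 and 2 out of 5 chunks, the client would be told to send chunks 1, 3, and 4.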

2. Concurrency control

@Service
public class ChunkUploadThrottler {
    
    private final Semaphore semaphore = new Semaphore(10); // at most 10 concurrent chunk uploads
    
    public void acquire() throws InterruptedException {
        semaphore.acquire();
    }
    
    public void release() {
        semaphore.release();
    }
}
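
A semaphore only helps if every acquire is paired with a release, including when the guarded work throws. The sketch below shows one way to enforce that pairing; it also waits with a timeout so that an overloaded server rejects the request instead of blocking the request thread indefinitely. The class name UploadSlotGuard and the choice of IllegalStateException are assumptions made for illustration.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Runs tasks while holding one of a fixed number of permits. */
final class UploadSlotGuard {

    private final Semaphore permits;

    UploadSlotGuard(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs the task if a permit becomes free within the timeout, otherwise throws. */
    <T> T run(Supplier<T> task, long timeout, TimeUnit unit) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while waiting for an upload slot", e);
        }
        if (!acquired) {
            throw new IllegalStateException("Too many concurrent uploads");
        }
        try {
            return task.get();
        } finally {
            permits.release(); // released even if the task throws
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

In a controller, the chunk-saving logic would be passed as the task, and the "too many uploads" case could be mapped to an HTTP 429 or 503 response.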

3. Optimized file merge

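// Concatenates the stored chunks, in index order, into the final file
private void mergeChunks(String fileMd5, String fileName) {
    try {
        List<FileChunk> chunks = fileChunkRepository.findByMd5OrderByChunkIndex(fileMd5);

        String filePath = generateFilePath(fileMd5, fileName);
        Path outputPath = Paths.get(filePath);

        // TRUNCATE_EXISTING prevents stale bytes if a file already exists at this path
        try (FileChannel outputChannel = FileChannel.open(outputPath,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {

            for (FileChunk chunk : chunks) {
                ByteBuffer buffer = ByteBuffer.wrap(chunk.getData());
                // write() may consume only part of the buffer, so loop until it is drained
                while (buffer.hasRemaining()) {
                    outputChannel.write(buffer);
                }
            }
        }
    } catch (IOException e) {
        throw new RuntimeException("Failed to merge file chunks", e);
    }
}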
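
mergeChunks loads every chunk's bytes through the repository, so a multi-gigabyte file passes through the heap during the merge. If the chunks are instead kept as temporary files, the merge can stream them with FileChannel.transferTo. The sketch below assumes a layout the article does not specify: one directory per upload, with chunk files named by their index ("0", "1", ...). ChunkFileMerger is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Merges chunk files named "0", "1", ... from a directory into one target file. */
final class ChunkFileMerger {

    /** Returns the total number of bytes written to the target. */
    static long merge(Path chunkDir, int totalChunks, Path target) {
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long total = 0;
            for (int i = 0; i < totalChunks; i++) {
                Path chunk = chunkDir.resolve(String.valueOf(i));
                try (FileChannel in = FileChannel.open(chunk, StandardOpenOption.READ)) {
                    long size = in.size();
                    long position = 0;
                    // transferTo may move fewer bytes than requested, so loop
                    while (position < size) {
                        position += in.transferTo(position, size - position, out);
                    }
                    total += size;
                }
            }
            return total;
        } catch (IOException e) {
            // A missing chunk file surfaces here as well
            throw new UncheckedIOException("Failed to merge chunks from " + chunkDir, e);
        }
    }
}
```

Comparing the returned byte count with the fileSize the client reported, or re-hashing the merged file, gives a cheap integrity check before the chunks are deleted.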

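Performance Optimization Strategies

1. Memory

Process data as streams instead of loading whole files into memory

Choose the chunk size to balance memory use against network efficiency

2. Storage

Delete temporary chunks promptly once they have been merged

Store the final files in an object storage service

3. Network

Choose a sensible number of concurrent uploads

Compress chunks in transit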
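
As a rough sketch of chunk compression, the helper below gzips a chunk before sending and restores it on receipt, using only the JDK's java.util.zip classes. The class name ChunkCompression is hypothetical. Compression pays off for text-like data; formats that are already compressed, such as most video and archive files, usually gain little.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Gzip helpers for compressing a chunk before transfer and restoring it after. */
final class ChunkCompression {

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read bos only afterwards
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to compress chunk", e);
        }
        return bos.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                bos.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to decompress chunk", e);
        }
        return bos.toByteArray();
    }
}
```

If chunks are compressed, the MD5 used for deduplication should still be computed over the original, uncompressed file so that it stays comparable across clients.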

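Best Practices

Chunk size: 2-5MB usually works well; adjust it to the network conditions

Concurrency control: cap the number of concurrent uploads so the server is not overloaded

Temporary file cleanup: set an expiry time and automatically remove uploads that were never completed

Security: validate file type and size to guard against malicious uploads

Monitoring and alerting: track key metrics such as upload success and failure rates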
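
As one concrete take on the security point, request parameters can be checked before anything touches storage. The sketch below is illustrative only: the class name and the 5MB limit are assumptions. Checking that the md5 parameter really is 32 lowercase hex characters matters in particular when that value is later used to build a file path.

```java
import java.util.regex.Pattern;

/** Validates the parameters of a chunk upload request. */
final class UploadRequestValidator {

    // Lowercase hex, the format that the article's MD5 code produces
    private static final Pattern MD5_HEX = Pattern.compile("[0-9a-f]{32}");

    // Assumed upper bound for a single chunk; tune it to the configured chunk size
    static final long MAX_CHUNK_BYTES = 5L * 1024 * 1024;

    static boolean isValidMd5(String md5) {
        return md5 != null && MD5_HEX.matcher(md5).matches();
    }

    /** Throws IllegalArgumentException when any parameter is out of range. */
    static void validateChunk(String md5, int chunkIndex, int totalChunks, long chunkBytes) {
        if (!isValidMd5(md5)) {
            throw new IllegalArgumentException("md5 must be 32 lowercase hex characters");
        }
        if (totalChunks <= 0) {
            throw new IllegalArgumentException("totalChunks must be positive");
        }
        if (chunkIndex < 0 || chunkIndex >= totalChunks) {
            throw new IllegalArgumentException("chunkIndex must be in [0, totalChunks)");
        }
        if (chunkBytes <= 0 || chunkBytes > MAX_CHUNK_BYTES) {
            throw new IllegalArgumentException("chunk size must be between 1 byte and the chunk limit");
        }
    }
}
```

File type checks are a separate concern: the file name and Content-Type come from the client, so inspecting the actual content is more reliable than trusting either.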

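With this end-to-end approach to large-file uploads, the pain points of traditional uploads can be addressed effectively while keeping the experience smooth for users.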

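Supplementary Notes

Java I/O streams

Java file streams (File I/O) are the core Java API for reading and writing files. They live mainly in the java.io package; Java 7 added the java.nio.file package (NIO.2), which offers more powerful file operations.

1. What a stream is

In Java, a stream is an abstraction over a sequence of data. Input streams read data and output streams write data. A stream is either byte-based (a byte stream) or character-based (a character stream).

2. Byte streams vs. character streams

Byte streams: read and write data one byte at a time; typically used for binary files (images, video, and so on). The base classes are InputStream and OutputStream.

Character streams: read and write data one character at a time; typically used for text files. The base classes are Reader and Writer.

3. Commonly used stream classes

Byte streams

FileInputStream: reads bytes from a file.

FileOutputStream: writes bytes to a file.

BufferedInputStream: a buffered byte input stream that makes reads more efficient.

BufferedOutputStream: a buffered byte output stream that makes writes more efficient.

Character streams

FileReader: a convenience class for reading character files.

FileWriter: a convenience class for writing character files.

BufferedReader: a buffered character input stream that can read a line at a time.

BufferedWriter: a buffered character output stream.

4. Usage examples

Byte stream example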

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ByteStreamExample {
    public static void main(String[] args) {
        // Write the file
        try (FileOutputStream fos = new FileOutputStream("example.txt")) {
            String content = "Hello, World!";
            // Name the charset explicitly instead of relying on the platform default
            fos.write(content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Read the file one byte at a time
        try (FileInputStream fis = new FileInputStream("example.txt")) {
            int byteData;
            while ((byteData = fis.read()) != -1) {
                // Casting a byte to char is only correct for single-byte characters such as ASCII
                System.out.print((char) byteData);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Character stream example (buffered)

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class CharacterStreamExample {
    public static void main(String[] args) {
        // Write the file
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("example.txt"))) {
            writer.write("Hello, World!");
            writer.newLine(); // platform-specific line separator
            writer.write("This is a new line.");
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Read the file line by line
        try (BufferedReader reader = new BufferedReader(new FileReader("example.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

5. NIO.2 (the java.nio.file package)

Java 7 introduced the java.nio.file package, which provides a more powerful and flexible file API. Its main classes include Paths, Path, and Files.

NIO.2 example

import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.List;
import java.io.IOException;

public class NIOExample {
    public static void main(String[] args) {
        Path path = Paths.get("example.txt");

        // Write the file. With CREATE alone an existing, longer file would keep
        // its old tail, so TRUNCATE_EXISTING is added to clear it first.
        try {
            Files.write(path, "Hello, NIO!".getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Read the file (readAllLines decodes as UTF-8 by default)
        try {
            List<String> lines = Files.readAllLines(path);
            for (String line : lines) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

        // Copy the file
        Path dest = Paths.get("example_copy.txt");
        try {
            Files.copy(path, dest, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

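6. Things to watch out for

Exception handling: file operations can throw IOException, which must be handled properly. Prefer try-with-resources so that resources are closed automatically.

Buffered streams: for frequent reads and writes, buffered streams can improve performance significantly.

Character encoding: character streams involve an encoding; specify the charset explicitly (for example UTF-8) to avoid garbled text.

Path separators: path separators differ between operating systems; use File.separator or the Path class to handle them.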
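
The encoding and path-separator notes can be combined in one small sketch. It uses Files.newBufferedWriter and Files.newBufferedReader with an explicit UTF-8 charset, and builds paths with Path.resolve so that no separator is hard-coded. The class name Utf8TextFiles is hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Reads and writes text files with an explicit UTF-8 charset. */
final class Utf8TextFiles {

    /** Resolves a file inside a directory without hard-coding "/" or "\\". */
    static Path inDirectory(Path dir, String fileName) {
        return dir.resolve(fileName);
    }

    static void writeLines(Path file, List<String> lines) {
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to write " + file, e);
        }
    }

    static List<String> readLines(Path file) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to read " + file, e);
        }
        return lines;
    }
}
```

Because both sides name the same charset, non-ASCII text survives the round trip regardless of the platform default encoding.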

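7. Byte stream or character stream?

For text files, character streams (especially buffered ones) are more convenient.

For binary files (images, video, archives, and so on), use byte streams; a character stream would decode the bytes as text and corrupt the data.

8. Steps of a file stream operation

Create the stream object (bind it to a file)

Read or write

Close the stream (release the resource)

9. Automatic resource management (try-with-resources)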

try (InputStream is = new FileInputStream("file.txt")) {
    // Use the stream
} catch (IOException e) {
    e.printStackTrace();
}
// No explicit close is needed: try-with-resources calls close() automatically