Spring Boot: Intelligent Document Parsing and Vectorization with Tika, MinerU, OCR, and Real-Time SSE Progress Feedback
Use cases: AI knowledge bases, RAG systems, intelligent document management, enterprise search
Tech stack: Spring Boot 3 + Apache Tika + MinerU + PaddleOCR + JodConverter (implied) + SSE + vector database
I. Why Intelligent Document Parsing?
When building AI applications based on Retrieval-Augmented Generation (RAG), the first step is usually to turn unstructured documents (PDF, Word, PPT, images, etc.) into searchable text vectors. That process has two key steps:
- Parsing: extract the raw text, tables, and images;
- Embedding: feed the text into an embedding model and store the resulting vectors in a vector database.
Different document formats call for different parsing strategies:
- .docx / .xlsx → Apache Tika
- High-fidelity PDF (formulas, charts) → MinerU
- Images and scanned PDFs → OCR (e.g., PaddleOCR)
This article walks through a document parsing and vectorization service with a single entry point, multiple parsing engines, and real-time progress feedback.
II. Overall Architecture
```
                +--------------------------+
                | DocumentParseController  |
                +-------------+------------+
                              |
          +-------------------+-------------------+
          |                   |                   |
   [Tika parsing]     [MinerU parsing]     [OCR recognition]
          |                   |                   |
          v                   v                   v
+-----------------+  +-------------------+  +------------------+
| text extraction |  | structured output |  | text from images |
+-----------------+  +-------------------+  +------------------+
          |                   |                   |
          +-------------------+-------------------+
                              |
                              v   Embedding (vectorization)
                  +-------------------------+
                  | vector database storage |
                  +-------------------------+
```
Two kinds of endpoints are exposed:
- Synchronous endpoint: for small files, returns the result directly;
- SSE streaming endpoint: for large files, so the frontend can show live status such as "parsing → vectorizing → done" (an example of the event stream follows).
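For reference, the progress stream emitted by the SSE endpoint looks roughly like this on the wire. The event names and fields match the `progress`/`result` events sent by `EmbeddingFileServiceImpl` later in the article; the concrete values are illustrative:

```
event: progress
data: {"status":"PARSING","timestamp":1718000000000}

event: progress
data: {"status":"VECTORIZING","timestamp":1718000012345}

event: result
data: {"documentId":"1801234567890","fileUrl":"/oss/other/result.txt","status":"SUCCESS"}
```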
III. Core Code Walkthrough
1. Controller layer: a single entry point with smart routing
```java
@RestController
@RequestMapping("${spring.application.name}/document/parse")
@AllArgsConstructor
@Tag(name = "DocumentParseController", description = "Document parsing API")
public class DocumentParseController {

    private final DocumentParseService documentParseService;
    private final EmbeddingFileService embeddingFileService;

    @PostMapping(value = "/tikaParseEmbeddingFileSse", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    @Operation(summary = "Parse with Tika and vectorize (SSE)")
    public SseEmitter tikaParseEmbeddingFileSse(@RequestPart("file") MultipartFile file) {
        AssertUtil.notNull(file, "Please upload a file");
        String filename = file.getOriginalFilename();
        AssertUtil.notNull(filename, "Please upload a file");
        String fileSuffix = filename.substring(filename.lastIndexOf(".") + 1);
        // Route image files (jpg/png/jpeg) to OCR, everything else to Tika
        if (StrUtil.equalsAnyIgnoreCase(fileSuffix, "jpg", "png", "jpeg")) {
            return embeddingFileService.ocrImage(file);
        } else {
            return embeddingFileService.tikaParseEmbeddingFileSse(file);
        }
    }

    @PostMapping(value = "/tikaParseEmbeddingFile", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    @Operation(summary = "Parse with Tika and vectorize")
    public Result<ParseResultVo> tikaParseEmbeddingFile(@RequestPart("file") MultipartFile file) {
        AssertUtil.notNull(file, "Please upload a file");
        String filename = file.getOriginalFilename();
        AssertUtil.notNull(filename, "Please upload a file");
        String fileSuffix = filename.substring(filename.lastIndexOf(".") + 1);
        ParseResultVo result = new ParseResultVo();
        // Route image files (jpg/png/jpeg) to OCR, everything else to Tika
        if (StrUtil.equalsAnyIgnoreCase(fileSuffix, "jpg", "png", "jpeg")) {
            String ocrImageText = documentParseService.ocr(file);
            result.setText(ocrImageText);
        } else {
            result = embeddingFileService.tikaParseEmbeddingFile(file);
        }
        return R.successWithData(result);
    }

    @PostMapping(value = "/tika", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    @Operation(summary = "Parse a file with Tika")
    public Result<FileInfoVo> tika(@RequestPart("file") MultipartFile file) {
        ParseResultVo result = documentParseService.tikaParse(file);
        return R.successWithData(result.getInfo());
    }

    @PostMapping(value = "/minerU", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    @Operation(summary = "Parse a PDF with MinerU")
    @Parameters({@Parameter(name = "pageStartNo", description = "First page to parse"), @Parameter(name = "pageEndNo", description = "Last page to parse")})
    public Result<FileInfoVo> minerU(@RequestPart("file") MultipartFile file, @RequestParam(value = "pageStartNo", required = false) Integer pageStartNo, @RequestParam(value = "pageEndNo", required = false) Integer pageEndNo) {
        ParseResultVo result = documentParseService.minerUParse(file, pageStartNo, pageEndNo);
        return R.successWithData(result.getInfo());
    }

    @PostMapping(value = "/ocr", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
    @Operation(summary = "Extract text from an image (OCR)")
    public Result<String> ocr(@RequestPart("file") MultipartFile file) {
        return R.successWithData(documentParseService.ocr(file));
    }
}
```
✅ Highlight: the parsing engine is chosen automatically from the file extension, so the routing is transparent to the frontend.
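If the routing needs to grow beyond the hard-coded jpg/png/jpeg check, one option is to pull it into a small helper. This is a minimal sketch, not part of the project code above; the enum and names are made up for illustration:

```java
import java.util.Locale;
import java.util.Set;

/** Minimal sketch of extension-based engine routing (names are illustrative). */
public final class ParserRouter {

    public enum Engine { TIKA, MINER_U, OCR }

    private static final Set<String> IMAGE_EXTS = Set.of("jpg", "jpeg", "png", "bmp", "webp");
    private static final Set<String> PDF_EXTS = Set.of("pdf");

    /** Decide which engine to use from the original filename. */
    public static Engine route(String filename, boolean preferMinerUForPdf) {
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (IMAGE_EXTS.contains(ext)) {
            return Engine.OCR;                 // images and scans go to OCR
        }
        if (PDF_EXTS.contains(ext) && preferMinerUForPdf) {
            return Engine.MINER_U;             // complex PDFs go to MinerU
        }
        return Engine.TIKA;                    // everything else: Tika
    }

    private ParserRouter() {
    }
}
```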
2. DocumentParseService: multi-engine parsing
(1) Apache Tika: general-purpose document parsing
```java
@Override
public ParseResultVo tikaParse(MultipartFile file) {
    if (file == null || file.isEmpty()) {
        throw new ServiceException("File is empty");
    }
    // Temporary directory
    String relativeTempPath = ossProperties.getLocal().getPath() + File.separator + FileTypeEnum.TEMP.getFolder() + File.separator + IdGeneratorUtil.getSnowflakeNextIdStr();
    try {
        // Save the upload to a temporary file
        String tempPath = relativeTempPath + File.separator + file.getOriginalFilename();
        File tempFile = FileUtil.writeFromStream(file.getInputStream(), tempPath);
        return this.tikaParse(tempFile, file);
    } catch (IOException e) {
        log.error(e.getMessage(), e);
        throw new ServiceException(BaseExceptionEnum.SERVER_ERROR);
    } finally {
        FileUtil.del(relativeTempPath);
    }
}

@Override
public ParseResultVo tikaParse(File file, MultipartFile multipartFile) {
    // Validate the file
    if (file == null || file.isDirectory() || file.length() <= 0) {
        throw new ServiceException("Please provide a valid file!");
    }
    // Temporary file that will hold the parsed text
    String sourcePath = ossProperties.getLocal().getPath() + File.separator + FileTypeEnum.TEMP.getFolder() + File.separator + IdGeneratorUtil.getSnowflakeNextIdStr() + File.separator + multipartFile.getOriginalFilename();
    // Parse the file content with Tika
    try (InputStream inputStream = FileUtil.getInputStream(file)) {
        BodyContentHandler handler = new BodyContentHandler(10000000);
        AutoDetectParser parser = new AutoDetectParser();
        Metadata metadata = new Metadata();
        parser.parse(inputStream, handler, metadata);
        StringBuilder resultString = new StringBuilder(handler.toString());
        // Append Tika metadata to the parsed text
        for (String name : metadata.names()) {
            // Skip X-TIKA entries
            if (name.startsWith("X-TIKA")) {
                continue;
            }
            resultString.append(name).append("\t").append(metadata.get(name)).append("\n");
        }
        // Fail if the parse result is empty
        if (StrUtil.isBlank(resultString)) {
            log.info(metadata.toString());
            throw new ServiceException("The parse result is empty!");
        }
        // Build the file metadata attached to the parse result
        Map<String, Object> fileMetadata = new HashMap<>();
        fileMetadata.put("file_name", file.getName());
        fileMetadata.put("file_size", file.length());
        fileMetadata.put("file_type", FileUtil.getSuffix(file));
        fileMetadata.put("parse_time", LocalDateTime.now().toString());
        fileMetadata.put("document_id", IdGeneratorUtil.getSnowflakeNextIdStr());
        // Write the parse result to a temporary file, then upload it to the file server
        File result = FileUtil.writeUtf8String(resultString.toString(), sourcePath);
        FileInfoVo fileInfo = ossService.save(result, FileTypeEnum.OTHER);
        return new ParseResultVo(resultString.toString(), fileMetadata, fileInfo);
    } catch (ServiceException ex) {
        throw ex;
    } catch (Exception ex) {
        log.error(ex.getMessage(), ex);
        throw new ServiceException(BaseExceptionEnum.SERVER_ERROR);
    } finally {
        FileUtil.del(sourcePath);
    }
}
```
⚠️ Note: Tika struggles with complex PDFs (scans, formulas); for those, use MinerU instead.
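Before sending every PDF to MinerU, it can help to run a cheap check for whether the PDF even has a text layer. This is a rough heuristic sketch built from the same Tika classes used above; the 200-character threshold is an arbitrary assumption you should tune per corpus:

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

/** Sketch: guess whether a PDF has a usable text layer, to decide Tika vs. MinerU/OCR. */
public final class PdfTextLayerProbe {

    /**
     * Returns true when Tika can extract a reasonable amount of text,
     * i.e. the PDF is probably not a pure scan.
     */
    public static boolean hasTextLayer(Path pdf) {
        try (InputStream in = Files.newInputStream(pdf)) {
            BodyContentHandler handler = new BodyContentHandler(1_000_000);
            new AutoDetectParser().parse(in, handler, new Metadata());
            return handler.toString().trim().length() > 200;
        } catch (Exception e) {
            // If Tika itself fails, fall back to the heavier engines
            return false;
        }
    }

    private PdfTextLayerProbe() {
    }
}
```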
(2) MinerU: high-fidelity PDF parsing (formulas and tables)
MinerU is a deep-learning-based PDF parsing tool that outputs structured JSON, Markdown, and extracted images.
```java
@Override
public ParseResultVo minerUParse(MultipartFile file, Integer pageStartNo, Integer pageEndNo) {
    if (file == null || file.isEmpty()) {
        throw new ServiceException("File is empty");
    }
    // Temporary directory
    String relativeTempPath = ossProperties.getLocal().getPath() + File.separator + FileTypeEnum.TEMP.getFolder() + File.separator + IdGeneratorUtil.getSnowflakeNextIdStr();
    try {
        // Save the upload to a temporary file
        String tempPath = relativeTempPath + File.separator + file.getOriginalFilename();
        File tempFile = FileUtil.writeFromStream(file.getInputStream(), tempPath);
        return this.minerUParse(tempFile, pageStartNo, pageEndNo);
    } catch (IOException e) {
        log.error(e.getMessage(), e);
        throw new ServiceException(BaseExceptionEnum.SERVER_ERROR);
    } finally {
        // Delete the temporary directory
        FileUtil.del(relativeTempPath);
    }
}

@Override
public ParseResultVo minerUParse(File file, Integer pageStartNo, Integer pageEndNo) {
    // Validate the file
    if (file == null || file.isDirectory() || file.length() <= 0) {
        throw new ServiceException("Please provide a valid file!");
    }
    // Check the file type
    String extName = FileUtil.getSuffix(file);
    if (!StrUtil.equalsAnyIgnoreCase(extName, FileExtNameConstant.PDF)) {
        throw new ServiceException("Please provide a PDF file!");
    }
    return this.invokeMinerUParse(file, pageStartNo, pageEndNo);
}

@Override
public ParseResultVo minerUParse(FileInfoVo fileInfo, Integer pageStartNo, Integer pageEndNo) {
    // Check the file type
    String extName = FileUtil.getSuffix(fileInfo.getFileName());
    if (!StrUtil.equalsAnyIgnoreCase(extName, FileExtNameConstant.PDF)) {
        throw new ServiceException("Please provide a PDF file!");
    }
    // Download the file
    InputStream is = ossService.getInputStream(fileInfo.getFilePath());
    // Save it locally
    String relativeTempPath = ossProperties.getLocal().getPath() + File.separator + FileTypeEnum.TEMP.getFolder() + File.separator + IdGeneratorUtil.getSnowflakeNextIdStr();
    String tempPath = relativeTempPath + File.separator + fileInfo.getFileName();
    File file = FileUtil.writeFromStream(is, tempPath);
    try {
        return this.invokeMinerUParse(file, pageStartNo, pageEndNo);
    } finally {
        // Delete the temporary file
        FileUtil.del(relativeTempPath);
    }
}

/**
 * Invoke MinerU to parse the file.
 *
 * @param file        the PDF file
 * @param pageStartNo first page to parse
 * @param pageEndNo   last page to parse
 * @return file info of the parse result
 */
private ParseResultVo invokeMinerUParse(File file, Integer pageStartNo, Integer pageEndNo) {
    // Validate the file
    if (file == null || file.isDirectory() || file.length() <= 0) {
        throw new ServiceException("Please provide a valid file!");
    }
    // Build the MinerU command line
    StringJoiner command = new StringJoiner(StrUtil.SPACE);
    command.add(dictProperties.getDict().get("miner-u-command"));
    // If both page numbers are present and the end page is smaller than the start page, swap them
    if (pageStartNo != null && pageEndNo != null) {
        if (pageStartNo > pageEndNo) {
            int temp = pageStartNo;
            pageStartNo = pageEndNo;
            pageEndNo = temp;
        }
    }
    // Start page argument
    if (pageStartNo != null && pageStartNo >= 0) {
        command.add("-s").add(pageStartNo.toString());
    }
    // End page argument
    if (pageEndNo != null && pageEndNo >= 0) {
        command.add("-e").add(pageEndNo.toString());
    }
    // Input path argument
    command.add("-p").add(file.getAbsolutePath());
    // Output directory argument
    String relativeOutputPath = ossProperties.getLocal().getPath() + File.separator + FileTypeEnum.TEMP.getFolder() + File.separator + IdGeneratorUtil.getSnowflakeNextIdStr();
    command.add("-o").add(relativeOutputPath);
    log.info("Executing command: {}", command);
    Process process = null;
    try {
        // Run the command
        process = RuntimeUtil.exec(command.toString());
        // Read stdout; this blocks until the command finishes
        String consoleResult = IoUtil.read(process.getInputStream(), CharsetUtil.defaultCharset());
        log.info("Command finished: {}", consoleResult);
        // Handle the output
        FileInfoVo fileInfo = this.handlerExecComplete(file.getName(), relativeOutputPath);
        // Build the file metadata attached to the parse result
        Map<String, Object> fileMetadata = new HashMap<>();
        fileMetadata.put("file_name", file.getName());
        fileMetadata.put("file_size", file.length());
        fileMetadata.put("file_type", FileUtil.getSuffix(file));
        fileMetadata.put("parse_time", LocalDateTime.now().toString());
        fileMetadata.put("document_id", IdGeneratorUtil.getSnowflakeNextIdStr());
        return new ParseResultVo(null, fileMetadata, fileInfo);
    } finally {
        // Delete the temporary output directory
        FileUtil.del(relativeOutputPath);
        if (process != null) {
            process.destroy();
        }
    }
}

/**
 * Handle the finished MinerU run.
 *
 * @param fileName   file name
 * @param outputPath output directory
 * @return file info
 */
private FileInfoVo handlerExecComplete(String fileName, String outputPath) {
    // File name without extension
    String name = FileNameUtil.getPrefix(fileName);
    // The run succeeded only if the result file exists
    String resultFilePath = outputPath + File.separator + name + File.separator + "vlm" + File.separator + name + "_model.json";
    if (!FileUtil.isFile(resultFilePath)) {
        log.warn("MinerU parse failed, result file not found: {}", resultFilePath);
        throw new ServiceException("Parsing failed; see the logs for details!");
    }
    // Zip the result directory
    String outputResult = outputPath + File.separator + name + File.separator + "vlm";
    String zipPath = outputPath + File.separator + name + File.separator + IdGeneratorUtil.getSnowflakeNextIdStr() + StrUtil.DOT + FileExtNameConstant.ZIP;
    File zip = ZipUtil.zip(outputResult, zipPath, StandardCharsets.UTF_8, false);
    // Upload to the file server and return the file info
    return ossService.save(zip, FileTypeEnum.OTHER);
}
```
💡 Example output: `{ "pages": [ { "blocks": [...] } ] }`, which can be rendered directly on the frontend.
(3) OCR: extracting text from images (PaddleOCR)
The OCR step uses PaddleOCR through the mymonstercat/ocr-java wrapper:
```java
@Override
public String ocr(MultipartFile file) {
    if (file == null || file.isEmpty()) {
        throw new ServiceException("File is empty!");
    }
    // Relative temp path
    String relativePath = ossProperties.getLocal().getPath() + File.separator + FileTypeEnum.TEMP.getFolder() + File.separator + IdGeneratorUtil.getSnowflakeNextIdStr();
    try {
        // Optional image preprocessing to improve recognition
        // InputStream is = ImageUtil.preprocessImage(file.getInputStream());
        // Write to a temporary file
        String imgPath = relativePath + File.separator + file.getOriginalFilename();
        File imgFile = FileUtil.writeFromStream(file.getInputStream(), imgPath);
        // Verify that the upload really is an image (by content, not extension)
        String type = FileTypeUtil.getType(imgFile);
        if (type == null || !StrUtil.equalsAnyIgnoreCase(type, "jpg", "jpeg", "png", "bmp")) {
            throw new ServiceException("Please upload an image file!");
        }
        // Get the OCR engine instance
        InferenceEngine engine = InferenceEngine.getInstance(Model.ONNX_PPOCR_V4);
        // Enable text-angle detection
        ParamConfig config = ParamConfig.getDefaultConfig();
        config.setDoAngle(true);
        config.setMostAngle(true);
        // Run OCR
        OcrResult result = engine.runOcr(imgPath, config);
        // Strip whitespace from the result
        return result.getStrRes().replaceAll("\\s", StrUtil.EMPTY);
    } catch (Exception e) {
        log.error(e.getMessage(), e);
        throw new ServiceException(BaseExceptionEnum.SERVER_ERROR);
    } finally {
        FileUtil.del(relativePath);
    }
}
```
✅ Handles Chinese, English, digits, and tables with good accuracy.
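The commented-out ImageUtil.preprocessImage call in the service hints at an image preprocessing step. A minimal JDK-only sketch of such a step (grayscale plus upscaling) might look like the following; whether it actually improves accuracy depends on your scans:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Sketch of a simple preprocessing step (grayscale + upscale) before OCR. */
public final class ImagePreprocessor {

    /** Convert to grayscale and scale up small images; writes a PNG next to the input. */
    public static File preprocess(File input) throws IOException {
        BufferedImage src = ImageIO.read(input);
        if (src == null) {
            throw new IOException("Not a readable image: " + input);
        }
        // Upscale small scans; OCR models usually prefer larger glyphs
        double scale = src.getWidth() < 1000 ? 2.0 : 1.0;
        int w = (int) (src.getWidth() * scale);
        int h = (int) (src.getHeight() * scale);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = dst.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION, RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        File out = new File(input.getParentFile(), "pre_" + input.getName() + ".png");
        ImageIO.write(dst, "png", out);
        return out;
    }

    private ImagePreprocessor() {
    }
}
```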
3. EmbeddingFileService: vectorization with SSE progress feedback
This is the "brains" of the whole system:
```java
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import jakarta.annotation.Resource;
import lombok.extern.slf4j.Slf4j;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;
import org.springframework.web.servlet.mvc.method.annotation.SseEmitter;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

@Slf4j
@Service
public class EmbeddingFileServiceImpl implements EmbeddingFileService {

    public static final ExecutorService THREAD_POOL = Executors.newFixedThreadPool(10);
    private final Map<String, AtomicReference<String>> processingStatus = new ConcurrentHashMap<>();

    @Resource
    private OssService ossService;
    @Resource
    private LLMHandler llmHandler;
    @Resource
    private ModelService modelService;
    @Resource
    private DocumentParseService documentParseService;

    @Override
    public ParseResultVo tikaParseEmbeddingFile(MultipartFile file) {
        ParseResultVo parseResultVo = documentParseService.tikaParse(file);
        FileInfoVo info = parseResultVo.getInfo();
        Map<String, Object> metadata = parseResultVo.getMetadata();
        String documentId = (String) metadata.get("document_id");
        info.setDocumentId(documentId);
        // String text = parseResultVo.getText();
        return parseResultVo;
    }

    @Override
    public SseEmitter tikaParseEmbeddingFileSse(MultipartFile file) {
        String emitterId = IdGeneratorUtil.getSnowflakeNextIdStr();
        SseEmitter emitter = new SseEmitter(300_000L); // 5-minute timeout for large files
        // Register SSE lifecycle callbacks
        emitter.onCompletion(() -> processingStatus.remove(emitterId));
        emitter.onTimeout(() -> {
            processingStatus.get(emitterId).set("TIMEOUT");
            processingStatus.remove(emitterId);
        });
        // Initialize the progress state
        processingStatus.put(emitterId, new AtomicReference<>("STARTED"));
        // Run the whole pipeline asynchronously on the thread pool
        THREAD_POOL.execute(() -> processFileAsync(file, emitter, emitterId));
        return emitter;
    }

    @Override
    public SseEmitter ocrImage(MultipartFile file) {
        String emitterId = IdGeneratorUtil.getSnowflakeNextIdStr();
        SseEmitter emitter = new SseEmitter(300_000L);
        emitter.onCompletion(() -> processingStatus.remove(emitterId));
        emitter.onTimeout(() -> processingStatus.remove(emitterId));
        FileInfoVo upload = ossService.upload(file, FileTypeEnum.OTHER);
        String filePath = upload.getFilePath();
        processingStatus.put(emitterId, new AtomicReference<>("STARTED"));
        THREAD_POOL.execute(() -> {
            try {
                updateProgress(emitterId, "OCR_PROCESSING", emitter);
                // String text = documentParseService.ocr(file);
                updateProgress(emitterId, "VECTORIZING_IMAGE", emitter);
                updateProgress(emitterId, "COMPLETED", emitter);
                emitter.send(SseEmitter.event()
                        .name("result")
                        .data(Map.of("text", "", "filePath", filePath, "status", "SUCCESS")));
            } catch (Exception e) {
                log.error("Image processing failed", e);
                updateProgress(emitterId, "FAILED", emitter);
                try {
                    emitter.send(SseEmitter.event()
                            .name("error")
                            .data(Map.of("message", "OCR failed: " + e.getMessage())));
                } catch (IOException ignored) {}
            } finally {
                try { emitter.complete(); } catch (Exception ignored) {}
                processingStatus.remove(emitterId);
            }
        });
        return emitter;
    }

    private void processFileAsync(MultipartFile file, SseEmitter emitter, String emitterId) {
        try {
            // ========= Stage 1: parse the file =========
            updateProgress(emitterId, "PARSING", emitter);
            ParseResultVo parseResultVo = documentParseService.tikaParse(file);
            FileInfoVo info = parseResultVo.getInfo();
            Map<String, Object> metadata = parseResultVo.getMetadata();
            String documentId = (String) metadata.get("document_id");
            info.setDocumentId(documentId);
            String text = parseResultVo.getText();
            // ========= Stage 2: vectorize =========
            updateProgress(emitterId, "VECTORIZING", emitter);
            AIParams params = this.getEmbeddingParams();
            llmHandler.vectorizeAndStore(documentId, text, params);
            // ========= Stage 3: completion event =========
            updateProgress(emitterId, "COMPLETED", emitter);
            emitter.send(SseEmitter.event()
                    .name("result")
                    .data(Map.of("documentId", documentId, "fileUrl", info.getFilePath(), "status", "SUCCESS")));
        } catch (Exception e) {
            log.error("File processing failed: {}", file.getOriginalFilename(), e);
            updateProgress(emitterId, "FAILED", emitter);
            try {
                emitter.send(SseEmitter.event()
                        .name("error")
                        .data(Map.of("message", "Processing failed: " + e.getMessage())));
            } catch (IOException ioEx) {
                log.warn("SSE error send failed", ioEx);
            }
        } finally {
            try {
                emitter.complete(); // make sure the connection is closed
            } catch (Exception ex) {
                log.warn("SSE complete failed", ex);
            }
            processingStatus.remove(emitterId); // clean up the state
        }
    }

    // Send a progress update, tolerating transient send failures
    private void updateProgress(String emitterId, String status, SseEmitter emitter) {
        try {
            // 1. Update the state
            processingStatus.get(emitterId).set(status);
            // 2. Send the SSE event with a small retry loop
            int retry = 3;
            while (retry-- > 0) {
                try {
                    emitter.send(SseEmitter.event()
                            .name("progress")
                            .data(Map.of("status", status, "timestamp", System.currentTimeMillis())));
                    break;
                } catch (IOException | IllegalStateException e) {
                    if (retry == 0) throw e;
                    Thread.sleep(100); // brief pause before retrying
                }
            }
        } catch (Exception e) {
            log.error("Progress update failed", e);
            processingStatus.get(emitterId).set("ERROR");
        }
    }

    private AIParams getEmbeddingParams() {
        LambdaQueryWrapper<AiModel> query = new LambdaQueryWrapper<>();
        query.eq(AiModel::getStatus, StatusEnum.ENABLE);
        query.eq(AiModel::getModelType, ModelTypeEnum.VSM);
        query.last("limit 1");
        AiModel model = modelService.getOne(query);
        AssertUtil.notNull(model, "No enabled embedding model configuration found!");
        AIParams params = new AIParams();
        params.setApiKey(model.getApiKey());
        params.setBaseUrl(model.getBaseUrl());
        params.setModelName(model.getModelName());
        return params;
    }
}
```
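The implementation of `llmHandler.vectorizeAndStore(documentId, text, params)` is not shown in the article. As a rough, framework-agnostic sketch of what such a method typically does (chunk, embed, upsert), here is one possible shape; the `EmbeddingClient` and `VectorStore` interfaces are placeholders for whatever embedding API and vector database you actually use:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Framework-agnostic sketch of what vectorizeAndStore might do (interfaces are placeholders). */
public class SimpleVectorizer {

    /** Placeholder for your embedding API client (OpenAI-compatible, Ollama, ...). */
    public interface EmbeddingClient {
        float[] embed(String text);
    }

    /** Placeholder for your vector database client (Milvus, pgvector, Elasticsearch, ...). */
    public interface VectorStore {
        void upsert(String id, float[] vector, Map<String, Object> metadata);
    }

    private final EmbeddingClient embeddingClient;
    private final VectorStore vectorStore;

    public SimpleVectorizer(EmbeddingClient embeddingClient, VectorStore vectorStore) {
        this.embeddingClient = embeddingClient;
        this.vectorStore = vectorStore;
    }

    /** Split the parsed text into overlapping chunks, embed each one, and store it. */
    public void vectorizeAndStore(String documentId, String text, int chunkSize, int overlap) {
        List<String> chunks = split(text, chunkSize, overlap);
        for (int i = 0; i < chunks.size(); i++) {
            String chunk = chunks.get(i);
            float[] vector = embeddingClient.embed(chunk);
            vectorStore.upsert(documentId + "_" + i, vector,
                    Map.of("document_id", documentId, "chunk_index", i, "text", chunk));
        }
    }

    /** Naive character-based splitter; swap in a token-aware splitter for production. */
    private List<String> split(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<>();
        int step = Math.max(1, chunkSize - overlap);
        for (int start = 0; start < text.length(); start += step) {
            chunks.add(text.substring(start, Math.min(text.length(), start + chunkSize)));
        }
        return chunks;
    }
}
```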
How does the frontend listen to the SSE stream?

```javascript
// Note: the native EventSource API only supports GET requests, so it cannot upload
// the multipart file to this POST endpoint by itself. In practice you would use
// fetch() with a streaming reader or a helper such as @microsoft/fetch-event-source;
// the snippet below only illustrates the event names and payloads the backend emits.
const eventSource = new EventSource('/your-app/document/parse/tikaParseEmbeddingFileSse');
eventSource.addEventListener('progress', (e) => {
  const data = JSON.parse(e.data);
  console.log('Current status:', data.status); // e.g. "PARSING", "VECTORIZING"
});
eventSource.addEventListener('result', (e) => {
  console.log('Done:', e.data);
  eventSource.close();
});
eventSource.addEventListener('error', (e) => {
  console.error('Processing failed');
  eventSource.close();
});
```
✅ Much better UX: instead of a spinner, users see "parsing page 3... generating vectors..." in real time.
IV. Deployment and Optimization Tips
1. Installing the dependencies
- Tika: a plain Java library, nothing extra to install;
- MinerU: requires a Python environment and the mineru package;
- PaddleOCR: called from Java via ONNX models; the model files must be downloaded;
- LibreOffice (implied): must run in the background if you need .doc to PDF conversion.
2. Performance
- Thread pool isolation: give OCR, MinerU, and Tika separate pools so they do not block each other (see the sketch after this list);
- Temp file cleanup: every finally block deletes its temporary files;
- OSS storage: parse results go straight to object storage instead of staying on local disk;
- Rate limiting and circuit breaking: protect the system against abusive uploads.
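A minimal sketch of the thread-pool isolation mentioned above; the pool sizes and bean names are illustrative and should be tuned to your hardware (MinerU and OCR are much heavier than Tika):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/** Separate executors per engine so a slow MinerU job cannot starve Tika parsing. */
@Configuration
public class ParsingExecutorConfig {

    @Bean(name = "tikaExecutor")
    public ExecutorService tikaExecutor() {
        // Tika is comparatively light; a few threads are usually enough
        return Executors.newFixedThreadPool(8);
    }

    @Bean(name = "minerUExecutor")
    public ExecutorService minerUExecutor() {
        // MinerU spawns an external, GPU/CPU-heavy process: keep this pool small
        return Executors.newFixedThreadPool(2);
    }

    @Bean(name = "ocrExecutor")
    public ExecutorService ocrExecutor() {
        return Executors.newFixedThreadPool(4);
    }
}
```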
3. Security hardening
- Whitelist the allowed file types (reject .exe and the like), as sketched below;
- Limit the file size (e.g., ≤ 50 MB);
- Lock down MinIO/OSS permissions so result URLs cannot leak.
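A small sketch of the whitelist and size checks; the extension list and the 50 MB cap are only examples:

```java
import java.util.Locale;
import java.util.Set;
import org.springframework.web.multipart.MultipartFile;

/** Basic upload validation: extension whitelist plus size cap (limits are examples). */
public final class UploadValidator {

    private static final Set<String> ALLOWED_EXTENSIONS =
            Set.of("pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "txt", "md", "jpg", "jpeg", "png");
    private static final long MAX_SIZE_BYTES = 50L * 1024 * 1024; // 50 MB

    public static void validate(MultipartFile file) {
        if (file == null || file.isEmpty()) {
            throw new IllegalArgumentException("File is empty");
        }
        if (file.getSize() > MAX_SIZE_BYTES) {
            throw new IllegalArgumentException("File exceeds the 50 MB limit");
        }
        String name = file.getOriginalFilename();
        int dot = name == null ? -1 : name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!ALLOWED_EXTENSIONS.contains(ext)) {
            throw new IllegalArgumentException("File type not allowed: " + ext);
        }
    }

    private UploadValidator() {
    }
}
```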
V. Summary
This article builds a production-grade intelligent document processing pipeline with the following capabilities:

| Capability | Technology |
|---|---|
| General document parsing | Apache Tika |
| High-fidelity PDF parsing | MinerU |
| Image OCR | PaddleOCR (ONNX) |
| Vectorization and storage | LLMHandler + embedding model |
| Real-time progress feedback | SSE (Server-Sent Events) |
| File storage | OSS / MinIO |
🔮 Future extensions:
- Chunking and metadata injection;
- Integration with LangChain / LlamaIndex;
- Webhook callbacks on completion.
For the complete project source code, send me a private message!
If this article helped you, please like ❤️, bookmark ⭐, and share 🔄!
Questions and suggestions are welcome in the comments!