A Look at Spring AI Alibaba's ObsidianDocumentReader

This article looks at how ObsidianDocumentReader in Spring AI Alibaba works.

ObsidianDocumentReader

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/main/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianDocumentReader.java

public class ObsidianDocumentReader implements DocumentReader {

	private final Path vaultPath;

	private final MarkdownDocumentParser parser;

	/**
	 * Constructor for reading all files in vault
	 * @param vaultPath Path to Obsidian vault
	 */
	public ObsidianDocumentReader(Path vaultPath) {
		this.vaultPath = vaultPath;
		this.parser = new MarkdownDocumentParser();
	}

	@Override
	public List<Document> get() {
		List<Document> allDocuments = new ArrayList<>();

		// Find all markdown files in vault
		List<ObsidianResource> resources = ObsidianResource.findAllMarkdownFiles(vaultPath);

		// Parse each file
		for (ObsidianResource resource : resources) {
			try {
				List<Document> documents = parser.parse(resource.getInputStream());
				String source = resource.getSource();

				// Add metadata to each document
				for (Document doc : documents) {
					doc.getMetadata().put(ObsidianResource.SOURCE, source);
				}

				allDocuments.addAll(documents);
			}
			catch (IOException e) {
				throw new RuntimeException("Failed to read Obsidian file: " + resource.getFilePath(), e);
			}
		}

		return allDocuments;
	}

	public static Builder builder() {
		return new Builder();
	}

	public static class Builder {

		private Path vaultPath;

		public Builder vaultPath(Path vaultPath) {
			this.vaultPath = vaultPath;
			return this;
		}

		public ObsidianDocumentReader build() {
			return new ObsidianDocumentReader(vaultPath);
		}

	}

}

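ObsidianDocumentReader's get method calls ObsidianResource.findAllMarkdownFiles(vaultPath) to collect one ObsidianResource per Markdown file in the vault. It then iterates over those resources, parses each one with MarkdownDocumentParser, and stores the file's source in every resulting document's metadata under the ObsidianResource.SOURCE key. An IOException while reading a file is rethrown as a RuntimeException.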
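The collect, parse, and tag loop described above can be sketched with the JDK alone. This is a minimal illustration, not the library's implementation: VaultReadSketch, Doc, parse, and readAll are hypothetical names, the placeholder parser treats a whole file as a single document, and using the file path as the "source" value is an assumption, since the body of getSource() is not shown in the excerpt. One deliberate difference: the sketch opens each file only when it is read and closes it with try-with-resources, whereas the ObsidianResource constructor shown later opens a FileInputStream as soon as the resource is created.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; none of these types belong to Spring AI or spring-ai-alibaba.
final class VaultReadSketch {

	static final String SOURCE = "source";

	// Simplified stand-in for a document: text plus mutable metadata.
	static final class Doc {

		final String text;

		final Map<String, Object> metadata = new HashMap<>();

		Doc(String text) {
			this.text = text;
		}

	}

	private VaultReadSketch() {
	}

	// Placeholder parser (assumption): the whole stream becomes one document.
	static List<Doc> parse(InputStream in) throws IOException {
		List<Doc> docs = new ArrayList<>();
		docs.add(new Doc(new String(in.readAllBytes(), StandardCharsets.UTF_8)));
		return docs;
	}

	// Parse every file, tag each resulting document with its source, and aggregate.
	static List<Doc> readAll(List<Path> files) {
		List<Doc> all = new ArrayList<>();
		for (Path file : files) {
			// Open the stream lazily and close it as soon as the file is parsed.
			try (InputStream in = Files.newInputStream(file)) {
				List<Doc> docs = parse(in);
				for (Doc doc : docs) {
					doc.metadata.put(SOURCE, file.toString());
				}
				all.addAll(docs);
			}
			catch (IOException e) {
				throw new UncheckedIOException("Failed to read file: " + file, e);
			}
		}
		return all;
	}

}
```

Calling VaultReadSketch.readAll with a list of Markdown paths returns one Doc per file, each carrying its path under the "source" key, which mirrors the shape of the list that get() returns.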

ObsidianResource

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/main/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianResource.java

public class ObsidianResource implements Resource {

	public static final String SOURCE = "source";

	public static final String MARKDOWN_EXTENSION = ".md";

	private final Path vaultPath;

	private final Path filePath;

	private final InputStream inputStream;

	/**
	 * Constructor for single file
	 * @param vaultPath Path to Obsidian vault
	 * @param filePath Path to markdown file
	 */
	public ObsidianResource(Path vaultPath, Path filePath) {
		Assert.notNull(vaultPath, "VaultPath must not be null");
		Assert.notNull(filePath, "FilePath must not be null");
		Assert.isTrue(Files.exists(vaultPath), "Vault directory does not exist: " + vaultPath);
		Assert.isTrue(Files.exists(filePath), "File does not exist: " + filePath);
		Assert.isTrue(filePath.toString().endsWith(MARKDOWN_EXTENSION), "File must be a markdown file: " + filePath);

		this.vaultPath = vaultPath;
		this.filePath = filePath;
		try {
			this.inputStream = new FileInputStream(filePath.toFile());
		}
		catch (IOException e) {
			throw new RuntimeException("Failed to create input stream for file: " + filePath, e);
		}
	}

	/**
	 * Find all markdown files in the vault Recursively searches through all
	 * subdirectories Only includes .md files and ignores hidden files/directories
	 * @param vaultPath Root path of the Obsidian vault
	 * @return List of ObsidianResource for each markdown file
	 */
	public static List<ObsidianResource> findAllMarkdownFiles(Path vaultPath) {
		Assert.notNull(vaultPath, "VaultPath must not be null");
		Assert.isTrue(Files.exists(vaultPath), "Vault directory does not exist: " + vaultPath);
		Assert.isTrue(Files.isDirectory(vaultPath), "VaultPath must be a directory: " + vaultPath);

		List<ObsidianResource> resources = new ArrayList<>();
		try (Stream<Path> paths = Files.walk(vaultPath)) {
			paths
				// Only include .md files
				.filter(path -> path.toString().endsWith(MARKDOWN_EXTENSION))
				// Ignore hidden files and files in hidden directories
				.filter(path -> {
					Path relativePath = vaultPath.relativize(path);
					String[] pathParts = relativePath.toString().split("/");
					for (String part : pathParts) {
						if (part.startsWith(".")) {
							return false;
						}
					}
					return true;
				})
				// Only include regular files (not directories)
				.filter(Files::isRegularFile)
				.forEach(path -> resources.add(new ObsidianResource(vaultPath, path)));
		}
		catch (IOException e) {
			throw new RuntimeException("Failed to walk vault directory: " + vaultPath, e);
		}
		return resources;
	}

	//......
}

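The ObsidianResource constructor takes a vaultPath and a filePath, asserts that both exist and that the file name ends in .md, and opens a FileInputStream on the file. The static findAllMarkdownFiles method walks the vaultPath directory recursively and returns a resource for every regular .md file, skipping hidden files and anything under a hidden directory (any path segment starting with "."). Note that the hidden check splits the relative path on "/", so it relies on a Unix-style path separator.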
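As a hedged alternative to splitting on "/", the same filter can iterate over the name elements of the relative Path, which does not depend on the platform's separator character. The sketch below uses only the JDK; VaultScanner, isHidden, and findMarkdownFiles are hypothetical names and are not part of spring-ai-alibaba. It returns plain Path objects rather than ObsidianResource instances.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of spring-ai-alibaba.
final class VaultScanner {

	private static final String MARKDOWN_EXTENSION = ".md";

	private VaultScanner() {
	}

	// True if any name element of the path, relative to the vault, starts with ".".
	static boolean isHidden(Path vaultPath, Path path) {
		// Iterating a Path yields its name elements, whatever the separator is.
		for (Path part : vaultPath.relativize(path)) {
			if (part.toString().startsWith(".")) {
				return true;
			}
		}
		return false;
	}

	// Returns all visible regular .md files below vaultPath, sorted for stable output.
	static List<Path> findMarkdownFiles(Path vaultPath) {
		try (Stream<Path> paths = Files.walk(vaultPath)) {
			return paths
				// Only include regular files (not directories)
				.filter(Files::isRegularFile)
				// Only include .md files
				.filter(path -> path.getFileName().toString().endsWith(MARKDOWN_EXTENSION))
				// Ignore hidden files and files in hidden directories
				.filter(path -> !isHidden(vaultPath, path))
				.sorted()
				.collect(Collectors.toList());
		}
		catch (IOException e) {
			throw new UncheckedIOException("Failed to walk vault directory: " + vaultPath, e);
		}
	}

}
```

With a vault that contains notes/a.md, .obsidian/hidden.md, and .draft.md, findMarkdownFiles returns only the path to notes/a.md, because the other two each have a name element that starts with a dot.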

Example

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/test/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianDocumentReaderIT.java

@EnabledIfEnvironmentVariable(named = "OBSIDIAN_VAULT_PATH", matches = ".+")
class ObsidianDocumentReaderIT {

	private static final String VAULT_PATH = System.getenv("OBSIDIAN_VAULT_PATH");

	// Static initializer to log a message if environment variable is not set
	static {
		if (VAULT_PATH == null || VAULT_PATH.isEmpty()) {
			System.out.println("Skipping Obsidian tests because OBSIDIAN_VAULT_PATH environment variable is not set.");
		}
	}

	ObsidianDocumentReader reader;

	@BeforeEach
	void setUp() {
		// Only initialize if VAULT_PATH is set
		if (VAULT_PATH != null && !VAULT_PATH.isEmpty()) {
			reader = ObsidianDocumentReader.builder().vaultPath(Path.of(VAULT_PATH)).build();
		}
	}

	@Test
	void should_read_markdown_files() {
		// Skip test if reader is null
		Assumptions.assumeTrue(reader != null, "Skipping test because ObsidianDocumentReader could not be initialized");

		// when
		List<Document> documents = reader.get();

		// then
		assertThat(documents).isNotEmpty();

		// Verify document content and metadata
		for (Document doc : documents) {
			// Verify source metadata
			assertThat(doc.getMetadata()).containsKey(ObsidianResource.SOURCE);
			String source = doc.getMetadata().get(ObsidianResource.SOURCE).toString();
			assertThat(source).isNotEmpty().endsWith(ObsidianResource.MARKDOWN_EXTENSION);

			// Verify content
			assertThat(doc.getText()).isNotEmpty();

			// Print for debugging
			System.out.println("Document source: " + source);
			if (doc.getMetadata().containsKey("category")) {
				System.out.println("Document category: " + doc.getMetadata().get("category"));
			}
			System.out.println("Document content: " + doc.getText());
			System.out.println("---");
		}
	}

}

Summary

spring-ai-alibaba-starter-document-reader-obsidian provides ObsidianDocumentReader, which reads every Markdown file under a given vault (vaultPath) and uses MarkdownDocumentParser to parse them into a List<Document>.
