聊聊Spring AI Alibaba的ObsidianDocumentReader

本文主要研究一下Spring AI Alibaba的ObsidianDocumentReader

ObsidianDocumentReader

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/main/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianDocumentReader.java

复制代码
public class ObsidianDocumentReader implements DocumentReader {

	private final Path vaultPath;

	private final MarkdownDocumentParser parser;

	/**
	 * Constructor for reading all files in vault
	 * @param vaultPath Path to Obsidian vault
	 */
	public ObsidianDocumentReader(Path vaultPath) {
		this.vaultPath = vaultPath;
		this.parser = new MarkdownDocumentParser();
	}

	@Override
	public List<Document> get() {
		List<Document> allDocuments = new ArrayList<>();

		// Find all markdown files in vault
		List<ObsidianResource> resources = ObsidianResource.findAllMarkdownFiles(vaultPath);

		// Parse each file
		for (ObsidianResource resource : resources) {
			try {
				List<Document> documents = parser.parse(resource.getInputStream());
				String source = resource.getSource();

				// Add metadata to each document
				for (Document doc : documents) {
					doc.getMetadata().put(ObsidianResource.SOURCE, source);
				}

				allDocuments.addAll(documents);
			}
			catch (IOException e) {
				throw new RuntimeException("Failed to read Obsidian file: " + resource.getFilePath(), e);
			}
		}

		return allDocuments;
	}

	public static Builder builder() {
		return new Builder();
	}

	public static class Builder {

		private Path vaultPath;

		public Builder vaultPath(Path vaultPath) {
			this.vaultPath = vaultPath;
			return this;
		}

		public ObsidianDocumentReader build() {
			return new ObsidianDocumentReader(vaultPath);
		}

	}

}

ObsidianDocumentReader的get方法通过ObsidianResource.findAllMarkdownFiles(vaultPath)来读取ObsidianResource,之后遍历resources使用MarkdownDocumentParser进行解析

ObsidianResource

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/main/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianResource.java

复制代码
public class ObsidianResource implements Resource {

	public static final String SOURCE = "source";

	public static final String MARKDOWN_EXTENSION = ".md";

	private final Path vaultPath;

	private final Path filePath;

	private final InputStream inputStream;

	/**
	 * Constructor for single file
	 * @param vaultPath Path to Obsidian vault
	 * @param filePath Path to markdown file
	 */
	public ObsidianResource(Path vaultPath, Path filePath) {
		Assert.notNull(vaultPath, "VaultPath must not be null");
		Assert.notNull(filePath, "FilePath must not be null");
		Assert.isTrue(Files.exists(vaultPath), "Vault directory does not exist: " + vaultPath);
		Assert.isTrue(Files.exists(filePath), "File does not exist: " + filePath);
		Assert.isTrue(filePath.toString().endsWith(MARKDOWN_EXTENSION), "File must be a markdown file: " + filePath);

		this.vaultPath = vaultPath;
		this.filePath = filePath;
		try {
			this.inputStream = new FileInputStream(filePath.toFile());
		}
		catch (IOException e) {
			throw new RuntimeException("Failed to create input stream for file: " + filePath, e);
		}
	}

	/**
	 * Find all markdown files in the vault Recursively searches through all
	 * subdirectories Only includes .md files and ignores hidden files/directories
	 * @param vaultPath Root path of the Obsidian vault
	 * @return List of ObsidianResource for each markdown file
	 */
	public static List<ObsidianResource> findAllMarkdownFiles(Path vaultPath) {
		Assert.notNull(vaultPath, "VaultPath must not be null");
		Assert.isTrue(Files.exists(vaultPath), "Vault directory does not exist: " + vaultPath);
		Assert.isTrue(Files.isDirectory(vaultPath), "VaultPath must be a directory: " + vaultPath);

		List<ObsidianResource> resources = new ArrayList<>();
		try (Stream<Path> paths = Files.walk(vaultPath)) {
			paths
				// Only include .md files
				.filter(path -> path.toString().endsWith(MARKDOWN_EXTENSION))
				// Ignore hidden files and files in hidden directories
				.filter(path -> {
					Path relativePath = vaultPath.relativize(path);
					String[] pathParts = relativePath.toString().split("/");
					for (String part : pathParts) {
						if (part.startsWith(".")) {
							return false;
						}
					}
					return true;
				})
				// Only include regular files (not directories)
				.filter(Files::isRegularFile)
				.forEach(path -> resources.add(new ObsidianResource(vaultPath, path)));
		}
		catch (IOException e) {
			throw new RuntimeException("Failed to walk vault directory: " + vaultPath, e);
		}
		return resources;
	}

	//......
}	

ObsidianResource构造器要求输入vaultPath和filePath,其findAllMarkdownFiles方法会遍历vaultPath目录,找出.md结尾的文件

示例

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/test/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianDocumentReaderIT.java

复制代码
@EnabledIfEnvironmentVariable(named = "OBSIDIAN_VAULT_PATH", matches = ".+")
class ObsidianDocumentReaderIT {

	private static final String VAULT_PATH = System.getenv("OBSIDIAN_VAULT_PATH");

	// Static initializer to log a message if environment variable is not set
	static {
		if (VAULT_PATH == null || VAULT_PATH.isEmpty()) {
			System.out.println("Skipping Obsidian tests because OBSIDIAN_VAULT_PATH environment variable is not set.");
		}
	}

	ObsidianDocumentReader reader;

	@BeforeEach
	void setUp() {
		// Only initialize if VAULT_PATH is set
		if (VAULT_PATH != null && !VAULT_PATH.isEmpty()) {
			reader = ObsidianDocumentReader.builder().vaultPath(Path.of(VAULT_PATH)).build();
		}
	}

	@Test
	void should_read_markdown_files() {
		// Skip test if reader is null
		Assumptions.assumeTrue(reader != null, "Skipping test because ObsidianDocumentReader could not be initialized");

		// when
		List<Document> documents = reader.get();

		// then
		assertThat(documents).isNotEmpty();

		// Verify document content and metadata
		for (Document doc : documents) {
			// Verify source metadata
			assertThat(doc.getMetadata()).containsKey(ObsidianResource.SOURCE);
			String source = doc.getMetadata().get(ObsidianResource.SOURCE).toString();
			assertThat(source).isNotEmpty().endsWith(ObsidianResource.MARKDOWN_EXTENSION);

			// Verify content
			assertThat(doc.getText()).isNotEmpty();

			// Print for debugging
			System.out.println("Document source: " + source);
			if (doc.getMetadata().containsKey("category")) {
				System.out.println("Document category: " + doc.getMetadata().get("category"));
			}
			System.out.println("Document content: " + doc.getText());
			System.out.println("---");
		}
	}

}

小结

spring-ai-alibaba-starter-document-reader-obsidian提供了ObsidianDocumentReader用于读取指定仓库(vaultPath)下的所有markdown文件,之后使用MarkdownDocumentParser去解析为List<Document>

doc

相关推荐
左灯右行的爱情2 分钟前
JVM-卡表
java·jvm·算法
annus mirabilis6 分钟前
PyTorch 入门指南:从核心概念到基础实战
人工智能·pytorch·python
众乐乐_200811 分钟前
Maven中的(五种常用依赖范围)
java·maven
橘猫云计算机设计13 分钟前
springboot-基于Web企业短信息发送系统(源码+lw+部署文档+讲解),源码可白嫖!
java·前端·数据库·spring boot·后端·小程序·毕业设计
程序猿chen24 分钟前
JVM考古现场(二十五):逆熵者·时间晶体的永恒之战(进阶篇)
java·jvm·git·后端·程序人生·java-ee·改行学it
昊昊该干饭了30 分钟前
【金仓数据库征文】从 HTAP 到 AI 加速,KingbaseES 的未来之路
数据库·人工智能·金仓数据库 2025 征文·数据库平替用金仓
CopyLower32 分钟前
Spring Boot的优点:赋能现代Java开发的利器
java·linux·spring boot
细心的莽夫34 分钟前
Elasticsearch复习笔记
java·大数据·spring boot·笔记·后端·elasticsearch·docker
摸鱼仙人~41 分钟前
深度学习优化器和调度器的选择和推荐
人工智能·深度学习
程序员阿鹏44 分钟前
实现SpringBoot底层机制【Tomcat启动分析+Spring容器初始化+Tomcat 如何关联 Spring容器】
java·spring boot·后端·spring·docker·tomcat·intellij-idea