A Look at Spring AI Alibaba's ObsidianDocumentReader

This article takes a look at the ObsidianDocumentReader in Spring AI Alibaba.

ObsidianDocumentReader

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/main/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianDocumentReader.java

public class ObsidianDocumentReader implements DocumentReader {

	private final Path vaultPath;

	private final MarkdownDocumentParser parser;

	/**
	 * Constructor for reading all files in vault
	 * @param vaultPath Path to Obsidian vault
	 */
	public ObsidianDocumentReader(Path vaultPath) {
		this.vaultPath = vaultPath;
		this.parser = new MarkdownDocumentParser();
	}

	@Override
	public List<Document> get() {
		List<Document> allDocuments = new ArrayList<>();

		// Find all markdown files in vault
		List<ObsidianResource> resources = ObsidianResource.findAllMarkdownFiles(vaultPath);

		// Parse each file
		for (ObsidianResource resource : resources) {
			try {
				List<Document> documents = parser.parse(resource.getInputStream());
				String source = resource.getSource();

				// Add metadata to each document
				for (Document doc : documents) {
					doc.getMetadata().put(ObsidianResource.SOURCE, source);
				}

				allDocuments.addAll(documents);
			}
			catch (IOException e) {
				throw new RuntimeException("Failed to read Obsidian file: " + resource.getFilePath(), e);
			}
		}

		return allDocuments;
	}

	public static Builder builder() {
		return new Builder();
	}

	public static class Builder {

		private Path vaultPath;

		public Builder vaultPath(Path vaultPath) {
			this.vaultPath = vaultPath;
			return this;
		}

		public ObsidianDocumentReader build() {
			return new ObsidianDocumentReader(vaultPath);
		}

	}

}

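ObsidianDocumentReader's get method calls ObsidianResource.findAllMarkdownFiles(vaultPath) to collect an ObsidianResource for every markdown file in the vault. It then iterates over the resources, parses each one with MarkdownDocumentParser, and stores the resource's source under the source metadata key of every resulting Document. An IOException raised while reading a file is rethrown as a RuntimeException.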
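
The loop inside get can be mimicked with JDK classes alone. The sketch below is not the library's code: Doc and parse are hypothetical stand-ins for Document and MarkdownDocumentParser, the class name ReaderLoopSketch is made up, and storing the file path under "source" is an assumption about what getSource returns. It also opens each stream in a try-with-resources block, which is a choice made for this sketch rather than something the excerpt above does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReaderLoopSketch {

    // Hypothetical stand-in for Spring AI's Document: text plus mutable metadata.
    public static final class Doc {
        public final String text;
        public final Map<String, Object> metadata = new HashMap<>();

        public Doc(String text) {
            this.text = text;
        }
    }

    // Hypothetical stand-in for MarkdownDocumentParser: returns one Doc per stream.
    public static List<Doc> parse(InputStream in) throws IOException {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        List<Doc> docs = new ArrayList<>();
        docs.add(new Doc(text));
        return docs;
    }

    // Same shape as get(): parse each file, tag every Doc with its source, collect all.
    public static List<Doc> readAll(List<Path> files) {
        List<Doc> all = new ArrayList<>();
        for (Path file : files) {
            // The stream is opened only when the file is parsed and closed right after.
            try (InputStream in = Files.newInputStream(file)) {
                List<Doc> docs = parse(in);
                for (Doc doc : docs) {
                    // Assumption: the source value is the file path as a string.
                    doc.metadata.put("source", file.toString());
                }
                all.addAll(docs);
            }
            catch (IOException e) {
                throw new UncheckedIOException("Failed to read file: " + file, e);
            }
        }
        return all;
    }

    public static void main(String[] args) throws IOException {
        Path note = Files.createTempFile("note", ".md");
        Files.writeString(note, "# Hello");
        for (Doc doc : readAll(List.of(note))) {
            System.out.println(doc.metadata.get("source") + " -> " + doc.text);
        }
    }
}
```

Tagging every Document with its source is what later lets the integration test assert that each source value ends with .md.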

ObsidianResource

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/main/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianResource.java

public class ObsidianResource implements Resource {

	public static final String SOURCE = "source";

	public static final String MARKDOWN_EXTENSION = ".md";

	private final Path vaultPath;

	private final Path filePath;

	private final InputStream inputStream;

	/**
	 * Constructor for single file
	 * @param vaultPath Path to Obsidian vault
	 * @param filePath Path to markdown file
	 */
	public ObsidianResource(Path vaultPath, Path filePath) {
		Assert.notNull(vaultPath, "VaultPath must not be null");
		Assert.notNull(filePath, "FilePath must not be null");
		Assert.isTrue(Files.exists(vaultPath), "Vault directory does not exist: " + vaultPath);
		Assert.isTrue(Files.exists(filePath), "File does not exist: " + filePath);
		Assert.isTrue(filePath.toString().endsWith(MARKDOWN_EXTENSION), "File must be a markdown file: " + filePath);

		this.vaultPath = vaultPath;
		this.filePath = filePath;
		try {
			this.inputStream = new FileInputStream(filePath.toFile());
		}
		catch (IOException e) {
			throw new RuntimeException("Failed to create input stream for file: " + filePath, e);
		}
	}

	/**
	 * Find all markdown files in the vault Recursively searches through all
	 * subdirectories Only includes .md files and ignores hidden files/directories
	 * @param vaultPath Root path of the Obsidian vault
	 * @return List of ObsidianResource for each markdown file
	 */
	public static List<ObsidianResource> findAllMarkdownFiles(Path vaultPath) {
		Assert.notNull(vaultPath, "VaultPath must not be null");
		Assert.isTrue(Files.exists(vaultPath), "Vault directory does not exist: " + vaultPath);
		Assert.isTrue(Files.isDirectory(vaultPath), "VaultPath must be a directory: " + vaultPath);

		List<ObsidianResource> resources = new ArrayList<>();
		try (Stream<Path> paths = Files.walk(vaultPath)) {
			paths
				// Only include .md files
				.filter(path -> path.toString().endsWith(MARKDOWN_EXTENSION))
				// Ignore hidden files and files in hidden directories
				.filter(path -> {
					Path relativePath = vaultPath.relativize(path);
					String[] pathParts = relativePath.toString().split("/");
					for (String part : pathParts) {
						if (part.startsWith(".")) {
							return false;
						}
					}
					return true;
				})
				// Only include regular files (not directories)
				.filter(Files::isRegularFile)
				.forEach(path -> resources.add(new ObsidianResource(vaultPath, path)));
		}
		catch (IOException e) {
			throw new RuntimeException("Failed to walk vault directory: " + vaultPath, e);
		}
		return resources;
	}

	//......
}

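The ObsidianResource constructor takes a vaultPath and a filePath, asserts that both exist and that the file name ends with .md, and opens a FileInputStream on the file. Its static findAllMarkdownFiles method walks the vaultPath directory recursively, keeps only regular files ending in .md, and skips any file whose path relative to the vault contains a segment starting with a dot (hidden files and hidden directories such as .obsidian).

Example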
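
One detail of the hidden-path filter is that it splits the relative path on "/", so it depends on the platform's separator. A minimal JDK-only sketch of the same filtering idea, iterating over the Path's name elements instead of splitting a string, might look like the following. The class and method names (VaultScanSketch, isHidden, findMarkdownFiles) are hypothetical and not part of the library, and the result is a plain list of paths rather than ObsidianResource objects.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class VaultScanSketch {

    // True if any name element of the path relative to the vault root starts with ".".
    public static boolean isHidden(Path vaultPath, Path file) {
        Path relative = vaultPath.relativize(file);
        // Path is Iterable<Path>, yielding one name element per segment.
        for (Path part : relative) {
            if (part.toString().startsWith(".")) {
                return true;
            }
        }
        return false;
    }

    // Recursively collects regular .md files that are not hidden, in sorted order.
    public static List<Path> findMarkdownFiles(Path vaultPath) {
        try (Stream<Path> paths = Files.walk(vaultPath)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".md"))
                .filter(p -> !isHidden(vaultPath, p))
                .sorted()
                .collect(Collectors.toList());
        }
        catch (IOException e) {
            throw new UncheckedIOException("Failed to walk vault directory: " + vaultPath, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway vault with one visible note, one hidden file, one non-markdown file.
        Path vault = Files.createTempDirectory("vault");
        Files.createDirectories(vault.resolve("notes"));
        Files.createDirectories(vault.resolve(".obsidian"));
        Files.writeString(vault.resolve("notes").resolve("a.md"), "# A");
        Files.writeString(vault.resolve(".obsidian").resolve("config.md"), "hidden");
        Files.writeString(vault.resolve("notes").resolve("b.txt"), "not markdown");

        findMarkdownFiles(vault).forEach(p -> System.out.println(vault.relativize(p)));
    }
}
```

Because the sketch returns paths and opens no streams, a caller can decide when to read each file, whereas the constructor shown above opens a FileInputStream as soon as each resource is created.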

community/document-readers/spring-ai-alibaba-starter-document-reader-obsidian/src/test/java/com/alibaba/cloud/ai/reader/obsidian/ObsidianDocumentReaderIT.java

@EnabledIfEnvironmentVariable(named = "OBSIDIAN_VAULT_PATH", matches = ".+")
class ObsidianDocumentReaderIT {

	private static final String VAULT_PATH = System.getenv("OBSIDIAN_VAULT_PATH");

	// Static initializer to log a message if environment variable is not set
	static {
		if (VAULT_PATH == null || VAULT_PATH.isEmpty()) {
			System.out.println("Skipping Obsidian tests because OBSIDIAN_VAULT_PATH environment variable is not set.");
		}
	}

	ObsidianDocumentReader reader;

	@BeforeEach
	void setUp() {
		// Only initialize if VAULT_PATH is set
		if (VAULT_PATH != null && !VAULT_PATH.isEmpty()) {
			reader = ObsidianDocumentReader.builder().vaultPath(Path.of(VAULT_PATH)).build();
		}
	}

	@Test
	void should_read_markdown_files() {
		// Skip test if reader is null
		Assumptions.assumeTrue(reader != null, "Skipping test because ObsidianDocumentReader could not be initialized");

		// when
		List<Document> documents = reader.get();

		// then
		assertThat(documents).isNotEmpty();

		// Verify document content and metadata
		for (Document doc : documents) {
			// Verify source metadata
			assertThat(doc.getMetadata()).containsKey(ObsidianResource.SOURCE);
			String source = doc.getMetadata().get(ObsidianResource.SOURCE).toString();
			assertThat(source).isNotEmpty().endsWith(ObsidianResource.MARKDOWN_EXTENSION);

			// Verify content
			assertThat(doc.getText()).isNotEmpty();

			// Print for debugging
			System.out.println("Document source: " + source);
			if (doc.getMetadata().containsKey("category")) {
				System.out.println("Document category: " + doc.getMetadata().get("category"));
			}
			System.out.println("Document content: " + doc.getText());
			System.out.println("---");
		}
	}

}

Summary

spring-ai-alibaba-starter-document-reader-obsidian provides ObsidianDocumentReader, which reads every markdown file under a given vault (vaultPath) and parses the files into a List<Document> with MarkdownDocumentParser.
