Overview
A search-suggestion index builder based on Apache Lucene, designed to build inverted indexes that support both Chinese and pinyin search.
Core Features
1. Index build pipeline
Fetch config → download data → build index → upload files → clean up temporary files
2. Key characteristics
- Substring match at any position: searching for any substring finds the complete term
- Mixed Chinese/pinyin search: full pinyin, first-letter abbreviations, and pinyin prefixes
- High-performance queries: millisecond-level responses backed by a Lucene inverted index
- Automated builds: scheduled rebuilds via Crane task scheduling
Technical Architecture
1. Core class structure
```java
@Service
public class SugLuceneIndexBuilder {
    // Configuration management
    @MtConfig(key = "sug.fst.index.config")
    private static volatile Map<String, FSTBuildScene> fstIndexConfig;
    // Core services
    @Autowired private HiveService hiveService;
    @Autowired private AmazonS3Client s3Client;
    // Pinyin handling
    private static Sterotoner sterotoner = new Sterotoner();
}
```
2. Field definitions
```java
public static class CorrectLuceneFields {
    public static String Content = "content";              // stored original content
    public static String ContentPrefix = "content_prefix"; // searchable index field
}
```
Index Build in Detail
1. Data acquisition
```java
private boolean downloadDataFromHive(FSTBuildScene sceneConfig) {
    String sql = sceneConfig.getHiveSql();
    String localInputFileName = INPUT_PATH + sceneConfig.getIndexName() + ".tsv";
    HiveServiceResponse response = hiveService.downloadFileWithRetryR(sql, localInputFileName, 3);
    return response != null && response.isSuccess();
}
```
Characteristics:
- Executes a SQL query against Hive to fetch the data
- Retries on failure (up to 3 attempts)
- Data format: TSV file (note that the downstream getFirstColumn parser actually splits on Hive's default \u0001 column separator)
2. Core document mapping logic
A. Document construction
```java
public Iterable<? extends IndexableField> mapDoc(String line) {
    String content = line.trim();
    List<IndexableField> fields = new ArrayList<>();
    // 1. Store the original content (returned in query results)
    fields.add(new StoredField(CorrectLuceneFields.Content, content));
    // 2. Generate the searchable index fields
    addContentPrefixFields(fields, content, normalizeString(content));
    return fields;
}
```
B. Index field generation strategy
```java
private void addContentPrefixFields(List<IndexableField> fields, String content, String queryClean) {
    // 1. Enumerate all Chinese substrings - O(n²) substrings
    for (int i = 0; i < content.length(); i++) {
        for (int j = i + 1; j <= content.length(); j++) {
            String substring = content.substring(i, j);
            fields.add(new StringField(CorrectLuceneFields.ContentPrefix, substring, Field.Store.NO));
        }
    }
    // 2. Generate pinyin variants
    Set<String> allPinyins = getPyQuerys(queryClean, true);
    for (String pinyin : allPinyins) {
        fields.add(new StringField(CorrectLuceneFields.ContentPrefix, pinyin, Field.Store.NO));
        // Pinyin prefixes (length 2 up to, but not including, the full pinyin)
        for (int len = 2; len < pinyin.length(); len++) {
            String prefix = pinyin.substring(0, len);
            fields.add(new StringField(CorrectLuceneFields.ContentPrefix, prefix, Field.Store.NO));
        }
    }
}
```
3. Concrete example: indexing 麦当劳
Input data
麦当劳
Generated index terms
```text
# StoredField (returned in results)
content: "麦当劳"
# StringField (used for search)
content_prefix: "麦"
content_prefix: "麦当"
content_prefix: "麦当劳"
content_prefix: "当"
content_prefix: "当劳"
content_prefix: "劳"
content_prefix: "maidanglao"   # full pinyin
content_prefix: "mdl"          # first-letter abbreviation
content_prefix: "md"           # first-letter prefix
content_prefix: "ma"           # pinyin prefix
content_prefix: "mai"          # pinyin prefix
content_prefix: "maid"         # pinyin prefix
content_prefix: "maida"        # pinyin prefix
content_prefix: "maidan"       # pinyin prefix
content_prefix: "maidang"      # pinyin prefix
content_prefix: "maidangl"     # pinyin prefix
content_prefix: "maidangla"    # pinyin prefix
```
Underlying Data Structures
1. Lucene inverted index structure
```text
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Term Dictionary │ │ Posting Lists │ │ Document Store │
├─────────────────┤ ├──────────────────┤ ├─────────────────┤
│ "麦" → ptr1 │───→│ [doc1,doc5,doc10]│ │ doc1: "麦当劳" │
│ "麦当" → ptr2 │───→│ [doc1,doc8] │ │ doc2: "劳动节" │
│ "麦当劳" → ptr3 │───→│ [doc1] │ │ doc3: "当当网" │
│ "当" → ptr4 │───→│ [doc1,doc3,doc7] │ │ ... │
│ "当劳" → ptr5 │───→│ [doc1,doc2] │ │ │
│ "劳" → ptr6 │───→│ [doc1,doc4,doc9] │ │ │
│ "mai" → ptr7 │───→│ [doc1,doc6] │ │ │
│ "mdl" → ptr8 │───→│ [doc1,doc11] │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
```
Generated Lucene Document structure
```java
// Each input line produces one Lucene Document
Document {
    StoredField("content", "麦当劳"),             // returned in results
    StringField("content_prefix", "麦"),          // searchable
    StringField("content_prefix", "麦当"),        // searchable
    StringField("content_prefix", "麦当劳"),      // searchable
    StringField("content_prefix", "当"),          // searchable
    StringField("content_prefix", "当劳"),        // searchable
    StringField("content_prefix", "劳"),          // searchable
    StringField("content_prefix", "maidanglao"),  // full pinyin
    StringField("content_prefix", "mdl"),         // first-letter abbreviation
    StringField("content_prefix", "mai"),         // pinyin prefix
    StringField("content_prefix", "maid"),        // pinyin prefix
    StringField("content_prefix", "maida"),       // pinyin prefix
    StringField("content_prefix", "maidan"),      // pinyin prefix
    StringField("content_prefix", "maidang"),     // pinyin prefix
    StringField("content_prefix", "maidangl"),    // pinyin prefix
    StringField("content_prefix", "maidangla"),   // pinyin prefix
    // ... more pinyin prefixes
}
```
2. Search flow: querying "麦"
Low-level query implementation
```java
public List<String> queryFromIndex(String cleanQuery, String scene) {
    try {
        IndexSearcher indexSearcher = indexSearcherMap.get(scene);
        // Get the leaf readers (a single leaf after forceMerge(1))
        List<LeafReaderContext> leaves = indexSearcher.getIndexReader().leaves();
        LeafReader reader = leaves.get(0).reader();
        Set<String> results = new HashSet<>();
        // Build the query term
        Term term = new Term("content_prefix", cleanQuery);
        // Fetch the postings list
        PostingsEnum postings = reader.postings(term);
        if (postings != null) {
            int docId;
            while ((docId = postings.nextDoc()) != PostingsEnum.NO_MORE_DOCS) {
                Document doc = reader.document(docId);
                String content = doc.get("content");
                if (content != null) {
                    results.add(content);
                }
            }
        }
        return new ArrayList<>(results);
    } catch (Exception e) {
        LOGGER.error("Error querying index for scene: {}, query: {}", scene, cleanQuery, e);
        return new ArrayList<>();
    }
}
```
Query execution steps
```text
1. Term lookup: content_prefix:"麦"
2. Fetch the posting list: [doc1, doc5, doc10, ...]
3. Document retrieval:
   - doc1  → "麦当劳"
   - doc5  → "麦咖啡"
   - doc10 → "麦片粥"
4. Return results: ["麦当劳", "麦咖啡", "麦片粥"]
```
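Because every indexed variant is an untokenized StringField, the same lookup can also be expressed with a standard TermQuery instead of walking the postings manually. The following is a minimal sketch, not the service's actual code; the searcher comes from the same indexSearcherMap used above and the result limit of 10 is arbitrary.

```java
// Sketch only: equivalent exact-term lookup via TermQuery.
public List<String> queryWithTermQuery(String cleanQuery, String scene) throws IOException {
    IndexSearcher searcher = indexSearcherMap.get(scene);
    TopDocs topDocs = searcher.search(new TermQuery(new Term("content_prefix", cleanQuery)), 10);
    List<String> suggestions = new ArrayList<>();
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        // Read back the stored original content for each matching document.
        suggestions.add(searcher.doc(scoreDoc.doc).get("content"));
    }
    return suggestions;
}
```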
Pinyin Handling
1. Sterotoner pinyin conversion
```java
public Set<String> getPyQuerys(final String query, boolean addFirstAlpha) {
    Set<String> result = new HashSet<>();
    Set<String> fpy = new HashSet<>(); // first-letter pinyin
    Set<String> apy = new HashSet<>(); // full pinyin
    sterotoner.getPinyinAll(query, apy, fpy);
    if (addFirstAlpha) {
        result.addAll(fpy); // add first-letter abbreviations
    }
    result.addAll(apy);     // add full pinyin
    return result;
}
```
2. Pinyin variant example
```text
Input: "麦当劳"
Output:
- Full pinyin:   "maidanglao"
- First letters: "mdl"
- Prefixes: "ma", "mai", "maid", "maida", "maidan", "maidang", "maidangl", "maidangla"
```
File Paths and Storage
1. Local path configuration
```java
public static final String DEFAULT_LOCAL_PATH = "/opt/meituan/dict/sug_fallback_indexes/";
public static final String INPUT_PATH = DEFAULT_LOCAL_PATH + "input/";
public static final String OUTPUT_PATH = DEFAULT_LOCAL_PATH + "output/";
```
2. File layout
```text
/opt/meituan/dict/sug_fallback_indexes/
├── input/
│   └── {indexName}.tsv      # raw data downloaded from Hive
└── output/
    └── {indexName}/
        ├── segments_1       # Lucene segments (commit) metadata
        ├── _0.cfe           # compound file entries
        ├── _0.cfs           # compound file data
        ├── _0.si            # segment info
        └── write.lock       # write lock
```
3. S3 upload
```java
private void uploadS3(String indexName) {
    String bucketName = "index";
    String s3KeyPrefix = "RecallIndexBackup/FallbackIndex/" + indexName;
    // Upload every index file (except write.lock)
    // Generate and upload the file manifest file_manifest.txt
}
```
Performance Characteristics
1. Time complexity
- Index build: O(n³) per entry, where n is the string length (O(n²) substrings, each of length O(n))
- Query: O(log m + k), where m is the term dictionary size and k is the number of results
2. Space complexity
- Index size: roughly 10-20x the raw data (see the back-of-envelope estimate below)
- Memory: index construction is memory-intensive (IndexWriter buffering plus per-line substring and pinyin expansion)
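A rough, illustrative estimate of the per-entry blow-up, assuming a 6-character name with a 10-letter full pinyin (the numbers are examples, not measurements):

```java
int n = 6;                                  // characters in the original name
int p = 10;                                 // letters in the full pinyin
int substringTerms = n * (n + 1) / 2;       // all contiguous substrings -> 21
int pinyinTerms = 1 + (p - 2);              // full pinyin + prefixes of length 2..p-1 -> 9
System.out.println(substringTerms + pinyinTerms); // ~30 indexed terms for a single entry,
                                                  // before counting first-letter variants
```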
3. Query performance
- Exact match: millisecond-level response
- Prefix query: millisecond-level response
- Arbitrary substring: millisecond-level response
Configuration Management
1. MCC configuration
```java
@MtConfig(clientId = MccConfiguration.ID,
          key = "sug.fst.index.config",
          converter = FSTBuildSceneConverter.class)
private static volatile Map<String, FSTBuildScene> fstIndexConfig;
```
2. Example configuration
```json
{
  "restaurant_suggest": {
    "hiveSql": "SELECT name FROM restaurant_table WHERE status = 1",
    "indexName": "restaurant_suggest",
    "columnNum": 1
  }
}
```
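The FSTBuildScene fields implied by this configuration and by the accessors used in the builder (getHiveSql, getIndexName, getLocalPath/setLocalPath) might look roughly like the sketch below; this is an assumption, not the actual class definition.

```java
// Hypothetical sketch of FSTBuildScene, inferred from the config keys and builder code.
public class FSTBuildScene {
    private String hiveSql;    // Hive query that produces the suggestion corpus
    private String indexName;  // name used for the input TSV and the output index directory
    private int columnNum;     // number of columns in the Hive result
    private String localPath;  // set by downloadDataFromHive() to the local TSV path

    public String getHiveSql() { return hiveSql; }
    public String getIndexName() { return indexName; }
    public int getColumnNum() { return columnNum; }
    public String getLocalPath() { return localPath; }
    public void setLocalPath(String localPath) { this.localPath = localPath; }
}
```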
Task Scheduling
1. Crane task
```java
@Crane("sug.fallback.fst.build.universal.task")
public void buildFSTIndex(String sceneName) {
    // index build logic
}
```
2. Execution flow
```text
1. Fetch the scene configuration
2. Download data from Hive
3. Build the Lucene index
4. Upload to S3
5. Clean up temporary files
```
Optimization Strategies
1. Index optimization
```java
indexWriter.commit();
indexWriter.flush();
indexWriter.forceMerge(1); // force-merge into a single segment
```
Force-merging into a single segment keeps the index compact and is what allows the read path to inspect only the first leaf reader.
2. String normalization
```java
public static String normalizeString(String phrase) {
    // 1. Strip a leading/trailing "外卖" token
    // 2. Convert full-width characters to half-width
    // 3. Lowercase
    // 4. Remove redundant whitespace
    return removeSpaceEx(full2Half(formatQuery.toLowerCase()));
}
```
Applicable Scenarios
✅ Good fit
- Search suggestion / autocomplete
- Substring matching at any position
- Mixed Chinese/pinyin search
- High-frequency query workloads
❌ Poor fit
- Complex full-text search
- Relevance scoring and ranking
- Frequently updated indexes
- Extremely memory-constrained environments
Monitoring Metrics
1. Build metrics
- Index build time
- Index file size
- Number of generated index terms
2. Query metrics
- Query latency
- Query accuracy
- Memory usage
Caveats
1. Data quality
- Deduplicate the input data
- Filter out strings shorter than MINLEN
- Handle special characters and encoding issues
2. Resource management
- Memory usage is high during the build
- Sufficient disk space is needed to store the index
- S3 uploads consume network bandwidth
3. Maintenance
- The index is immutable; updates require a full rebuild
- Temporary files must be cleaned up regularly
- Build task status should be monitored
This implementation trades storage space for query performance, which makes it a good fit for search suggestion, where response-time requirements are extremely strict.
Write path (index build)
```java
// lyx/fst-index-builder
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.TypeReference;
import com.cip.crane.client.spring.annotation.Crane;
import com.google.common.base.Splitter;
import com.google.common.collect.Lists;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.google.common.collect.Iterables;
import com.sankuai.meituan.config.annotation.MtConfig;
import com.sankuai.meituan.config.configuration.MccConfiguration;
import com.sankuai.meituan.config.exception.MtConfigException;
import com.sankuai.meituan.config.function.MtConfigConverter;
import com.sankuai.meituan.waimai.d.search.offline.similarity.relevance.common.pinyin.Sterotoner;
import com.sankuai.meituan.waimai.traffic.offline.task.domain.HiveServiceResponse;
import com.sankuai.meituan.waimai.traffic.offline.task.service.thirdparty.HiveService;
import com.sankuai.meituan.waimai.traffic.offline.task.util.FSTBuildScene;
import org.apache.commons.collections.CollectionUtils;
import org.apache.commons.lang3.StringUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.*;
import org.apache.lucene.store.FSDirectory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.lang.reflect.Field;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import static com.sankuai.meituan.waimai.d.search.util.StringHelp.full2Half;
import static com.sankuai.meituan.waimai.traffic.offline.task.util.StringUtil.removeSpaceEx;
@Service
public class SugLuceneIndexBuilder {
private static final Logger LOGGER = LoggerFactory.getLogger(SugLuceneIndexBuilder.class);
// Local path for files downloaded from Hive
public static final String DEFAULT_LOCAL_PATH = "/opt/meituan/dict/sug_fallback_indexes/";
// public static final String DEFAULT_LOCAL_PATH = "/Users/longyuxin/Desktop/work/waimai_d_traffic_data_source/waimai-d-traffic-offline-task-service/src/test/resources/test_index";
public static final String INPUT_PATH = DEFAULT_LOCAL_PATH + "input/";
public static final String OUTPUT_PATH = DEFAULT_LOCAL_PATH + "output/";
public static final String REMOTE_INDEX_PATH = "RecallIndexBackup/FallbackIndex/";
@Autowired
private HiveService hiveService;
@Autowired
private AmazonS3Client s3Client;
private static Sterotoner sterotoner = new Sterotoner();
private static final int MINLEN = 2;
public static class CorrectLuceneFields {
public static String Content = "content"; // original content
public static String ContentPrefix = "content_prefix"; // substrings / pinyin variants
}
@MtConfig(clientId = MccConfiguration.ID, key = "sug.fst.index.config", converter = FSTBuildSceneConverter.class)
private static volatile Map<String, FSTBuildScene> fstIndexConfig = new ConcurrentHashMap<>();
public static class FSTBuildSceneConverter implements MtConfigConverter<Map<String, FSTBuildScene>> {
@Override
public Map<String, FSTBuildScene> convert(Field field, String key, String newValue) throws MtConfigException {
LOGGER.info("config:{} changed, newValue:{}", key, newValue);
Map<String, FSTBuildScene> result = new HashMap<>();
if (StringUtils.isNotBlank(newValue)) {
try {
result = JSON.parseObject(newValue, new TypeReference<Map<String, FSTBuildScene>>() {});
}catch (Exception e) {
LOGGER.warn("parse FSTBuildScene error! key:{}", key, e);
}
}
return result;
}
}
@Crane("sug.fallback.fst.build.universal.task")
public void buildFSTIndex(String sceneName) {
FSTBuildScene scene = null;
try {
LOGGER.info("Starting FST index build for scene: {}", sceneName);
// 1. Fetch the scene configuration
scene = fstIndexConfig.get(sceneName);
if (scene == null || StringUtils.isEmpty(scene.getHiveSql())) {
LOGGER.error("Invalid scene config " + sceneName);
return;
}
// 2. Download data from Hive and keep it in a local temp file
if (!downloadDataFromHive(scene)) {
LOGGER.error("fetch data failed" + sceneName);
return;
}
// 3. Process the data and build the index
buildIndex(scene);
} catch (Exception e) {
LOGGER.error("Failed to build FST index for scene: " + sceneName, e);
}finally {
// Clean up the temporary input file
if (scene != null && scene.getLocalPath() != null) {
cleanupTempFile(scene.getLocalPath());
}
}
}
private boolean downloadDataFromHive(FSTBuildScene sceneConfig) {
try {
String sql = sceneConfig.getHiveSql();
String localInputFileName = INPUT_PATH + sceneConfig.getIndexName() + ".tsv";
sceneConfig.setLocalPath(localInputFileName);
HiveServiceResponse hiveServiceResponse = hiveService.downloadFileWithRetryR(sql, localInputFileName, 3);
return hiveServiceResponse != null && hiveServiceResponse.isSuccess() && hiveServiceResponse.getQueryInfo() != null;
} catch (Exception e) {
LOGGER.error("Error downloading data from hive", e);
return false;
}
}
private void cleanupTempFile(String filePath) {
try {
Files.deleteIfExists(Paths.get(filePath));
LOGGER.info("Cleaned up temp file: {}", filePath);
} catch (IOException e) {
LOGGER.warn("Failed to cleanup temp file: {}", filePath, e);
}
}
public Iterable<? extends IndexableField> mapDoc(String line) {
String content = line.trim();
if (content.isEmpty() || content.length() < MINLEN) {
return Collections.emptyList();
}
List<IndexableField> fields = new ArrayList<>();
// 1. Store the original content
fields.add(new StoredField(CorrectLuceneFields.Content, content));
// 2. Core index field ContentPrefix (Chinese substrings + pinyin prefixes + first-letter prefixes)
addContentPrefixFields(fields, content, normalizeString(content));
return fields;
}
private void addContentPrefixFields(List<IndexableField> fields, String content, String queryClean) {
// 1. Add Chinese substrings
for (int i = 0; i < content.length(); i++) {
for (int j = i + 1; j <= content.length(); j++) {
String substring = content.substring(i, j);
fields.add(new StringField(CorrectLuceneFields.ContentPrefix, substring, org.apache.lucene.document.Field.Store.NO));
}
}
// 2. Add pinyin terms and their prefixes
Set<String> allPinyins = getPyQuerys(queryClean, true);
Set<String> addedPrefixes = new HashSet<>();
for (String pinyin : allPinyins) {
fields.add(new StringField(CorrectLuceneFields.ContentPrefix, pinyin, org.apache.lucene.document.Field.Store.NO));
for (int len = 2; len < pinyin.length(); len++) { // note: len < pinyin.length(), so the full pinyin is not re-added here
String prefix = pinyin.substring(0, len);
if (addedPrefixes.add(prefix)) {
fields.add(new StringField(CorrectLuceneFields.ContentPrefix, prefix, org.apache.lucene.document.Field.Store.NO));
}
}
}
}
public Set<String> getPyQuerys(final String query, boolean addFirstAlpha) {
Set<String> result = new HashSet<>();
Set<String> fpy = new HashSet<String>();
Set<String> apy = new HashSet<String>();
sterotoner.getPinyinAll(query, apy, fpy);
if (addFirstAlpha) {
for (String py : fpy) {
if (!py.trim().isEmpty()) {
result.add(py);
}
}
}
for (String py : apy) {
if (!py.trim().isEmpty()) {
result.add(py);
}
}
return result;
}
public boolean buildIndex(FSTBuildScene scene) throws IOException {
String localIndexPath = OUTPUT_PATH + scene.getIndexName();
// Fully clean the index output directory
cleanupIndexDirectory(localIndexPath);
Files.createDirectories(Paths.get(localIndexPath));
IndexWriterConfig writerConfig = new IndexWriterConfig();
writerConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
IndexWriter indexWriter = null;
try {
indexWriter = new IndexWriter(FSDirectory.open(Paths.get(localIndexPath)), writerConfig);
// Read and process the downloaded file
try (BufferedReader reader = Files.newBufferedReader(Paths.get(scene.getLocalPath()))) {
String line;
while ((line = reader.readLine()) != null) {
String firstColumn = getFirstColumn(line);
if (firstColumn != null) {
Iterable<? extends IndexableField> doc = mapDoc(firstColumn);
if (!Iterables.isEmpty(doc)) {
indexWriter.addDocument(doc);
}
}
}
}
indexWriter.commit();
indexWriter.flush();
indexWriter.forceMerge(1);
} finally {
if (indexWriter != null) {
try {
indexWriter.close();
} catch (IOException e) {
LOGGER.error("Failed to close IndexWriter", e);
}
}
}
uploadS3(scene.getIndexName());
return true;
}
private String getFirstColumn(String line) {
int firstSeparatorIndex = line.indexOf('\u0001');
if (firstSeparatorIndex == -1) {
// No separator: the whole line is the first column
return line.trim().isEmpty() ? null : line.trim();
}
// Separator found: take the content before the first separator
String firstColumn = line.substring(0, firstSeparatorIndex).trim();
return firstColumn.isEmpty() ? null : firstColumn;
}
private void cleanupIndexDirectory(String indexPath) {
try {
File indexDir = new File(indexPath);
if (indexDir.exists()) {
File[] files = indexDir.listFiles();
if (files != null) {
for (File file : files) {
if (file.isFile()) {
boolean deleted = file.delete();
LOGGER.info("Deleted index file: {}, success: {}", file.getName(), deleted);
}
}
}
}
} catch (Exception e) {
LOGGER.warn("Failed to cleanup index directory: {}", indexPath, e);
}
}
public static String normalizeString(String phrase) {
if (phrase == null || phrase.trim().length() == 0) {
return null;
}
String formatQuery = phrase;
if (formatQuery.endsWith(" 外卖")) {
formatQuery = formatQuery.substring(0, phrase.length() - 3);
} else if (formatQuery.startsWith("外卖 ")) {
formatQuery = formatQuery.substring(3, phrase.length());
}
if (org.apache.commons.lang3.StringUtils.isEmpty(formatQuery)) {
formatQuery = "外卖";
}
String halfstr = full2Half(formatQuery.toLowerCase());
return removeSpaceEx(halfstr);
}
private void uploadS3(String indexName) {
try {
String bucketName = "index";
String localIndexPath = OUTPUT_PATH + indexName;
String s3KeyPrefix = REMOTE_INDEX_PATH + indexName;
File indexDir = new File(localIndexPath);
if (!indexDir.exists() || !indexDir.isDirectory()) {
LOGGER.error("Index directory does not exist: {}", localIndexPath);
return;
}
File[] indexFiles = indexDir.listFiles((dir, name) ->
!name.equals("write.lock") && !name.startsWith(".")
);
if (indexFiles == null || indexFiles.length == 0) {
LOGGER.warn("No index files found to upload for: {}", indexName);
return;
}
List<String> uploadedFiles = new ArrayList<>();
// Upload the index files
for (File file : indexFiles) {
String s3Key = s3KeyPrefix + "/" + file.getName();
LOGGER.info("Uploading file: {} to S3 key: {}", file.getName(), s3Key);
s3Client.upload(bucketName, s3Key, file);
uploadedFiles.add(file.getName());
}
// Generate and upload the file manifest
uploadFileManifest(bucketName, s3KeyPrefix, uploadedFiles);
LOGGER.info("Successfully uploaded {} index files and manifest for: {}",
uploadedFiles.size(), indexName);
} catch (Exception e) {
LOGGER.error("upload to s3 error", e);
}
}
private void uploadFileManifest(String bucketName, String s3KeyPrefix, List<String> fileNames) {
try {
// Create a temporary manifest file
String manifestFileName = "file_manifest.txt";
String localManifestPath = OUTPUT_PATH + "temp_" + manifestFileName;
// Write the file names into the manifest
try (BufferedWriter writer = Files.newBufferedWriter(Paths.get(localManifestPath))) {
for (String fileName : fileNames) {
writer.write(fileName);
writer.newLine();
}
}
// Upload the manifest file
String manifestS3Key = s3KeyPrefix + "/" + manifestFileName;
s3Client.upload(bucketName, manifestS3Key, new File(localManifestPath));
// Delete the temporary manifest file
Files.deleteIfExists(Paths.get(localManifestPath));
LOGGER.info("Uploaded file manifest with {} files to: {}", fileNames.size(), manifestS3Key);
} catch (Exception e) {
LOGGER.error("Failed to upload file manifest", e);
}
}
}
```
Read path (query)
```java
// Query method
public List<String> queryFromIndex(String cleanQuery, String scene) {
try {
IndexSearcher indexSearcher = indexSearcherMap.get(scene);
if (indexSearcher == null || indexSearcher.getIndexReader() == null ||
indexSearcher.getIndexReader().leaves().isEmpty()) {
LOGGER.warn("No index searcher found for scene: {}, available scenes: {}",
scene, indexSearcherMap.keySet());
return new ArrayList<>();
}
List<LeafReaderContext> leaves = indexSearcher.getIndexReader().leaves();
if (leaves.isEmpty()) {
return new ArrayList<>();
}
LeafReader reader = leaves.get(0).reader();
if (reader == null) {
return new ArrayList<>();
}
Set<String> results = new HashSet<>();
Term term = new Term("content_prefix", cleanQuery);
PostingsEnum postings = reader.postings(term);
if (postings != null) {
int docId;
while ((docId = postings.nextDoc()) != PostingsEnum.NO_MORE_DOCS) {
Document doc = reader.document(docId);
String content = doc.get("content");
if (content != null) {
results.add(content);
}
}
}
return new ArrayList<>(results);
} catch (Exception e) {
LOGGER.error("Error querying index for scene: {}, query: {}", scene, cleanQuery, e);
return new ArrayList<>();
}
}
```
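The query code relies on an indexSearcherMap that is populated outside of this snippet. A minimal sketch of one way to build it from a local index directory, assuming the index files have already been fetched back to OUTPUT_PATH/{indexName}, could look like this (names and layout are assumptions, not the service's actual loader):

```java
// Hypothetical loader sketch; the real service may restore the index from S3 first.
private final Map<String, IndexSearcher> indexSearcherMap = new ConcurrentHashMap<>();

private void loadIndexSearcher(String indexName) throws IOException {
    // Open the force-merged index directory and cache one searcher per scene.
    FSDirectory directory = FSDirectory.open(Paths.get(OUTPUT_PATH + indexName));
    DirectoryReader reader = DirectoryReader.open(directory);
    indexSearcherMap.put(indexName, new IndexSearcher(reader));
}
```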