Table of Contents

- [1. Background: why RAG needs to judge whether context is sufficient](#1. Background: why RAG needs to judge whether context is sufficient)
- [2. First version: judging only by the number of retrieved_chunks](#2. First version: judging only by the number of retrieved_chunks)
- [3. Second version: adding the FAISS distance signal](#3. Second version: adding the FAISS distance signal)
  - [3.1 Keeping distance and retrieval_rank at the retrieve stage](#3.1 Keeping distance and retrieval_rank at the retrieve stage)
  - [3.2 A lightweight check using best_distance and avg_top_distance](#3.2 A lightweight check using best_distance and avg_top_distance)
  - [3.3 What the distance rule solves, and what it does not](#3.3 What the distance rule solves, and what it does not)
- [4. Why FAISS distance fails](#4. Why FAISS distance fails)
- [5. Third version: adding an LLM Relevance Gate](#5. Third version: adding an LLM Relevance Gate)
  - [5.1 How the LLM judge decides whether retrieved chunks can answer the question](#5.1 How the LLM judge decides whether retrieved chunks can answer the question)
  - [5.2 A two-layer gate: distance gate + relevance gate](#5.2 A two-layer gate: distance gate + relevance gate)
  - [5.3 How context_metrics records the decision process](#5.3 How context_metrics records the decision process)
## 1. Background: why RAG needs to judge whether context is sufficient

RAG can suppress LLM hallucination to a degree, but if the retrieved knowledge is insufficient, the model's answer can still hallucinate. Checking whether the context is sufficient is how we decide whether the model's answer can be trusted: only when the answer is backed by adequate context should we believe the model gave us a correct one. I recently went through several iterations on this "is the retrieval sufficient" check.
## 2. First version: judging only by the number of retrieved_chunks

To stop the RAG system from fabricating answers when no relevant knowledge is found, my first attempt simply checked the number of retrieved_chunks to decide whether the context was sufficient.
```python
# Minimal context sufficiency rule:
# for now, judge only by the number of retrieved_chunks; later this can be
# upgraded to similarity thresholds / rerank scores / a reflection-based check
context_sufficient = len(retrieved_chunks) >= 2
```
This approach is flawed. The rule above says "retrieving ≥2 chunks counts as sufficient", a standard that depends neither on the question, nor on similarity, nor on chunk content.

"Suppose the user asks about something the papers never mention. FAISS still returns the top-k nearest neighbors, so the count is still ≥2. How would this rule ever detect insufficiency?"

That question breaks the logic outright. The rule needs to be improved, either with a FAISS distance threshold or with an LLM-based judgment, before it can be considered sound.
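The failure mode is easy to reproduce without FAISS at all. The toy sketch below (made-up vectors, a brute-force L2 search standing in for `index.search`) shows that a top-k search always returns k results, so a count-based rule can never fire:

```python
# Toy illustration: a top-k nearest-neighbor search always returns k results,
# even for a completely off-topic query, so len(retrieved) >= 2 always passes.

def top_k_l2(corpus_vecs, query_vec, k=2):
    """Brute-force L2 nearest-neighbor search over a list of vectors."""
    order = sorted(
        range(len(corpus_vecs)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(corpus_vecs[i], query_vec)),
    )
    return order[:k]

# Corpus clustered around one topic; query about something entirely different.
corpus = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2]]
off_topic_query = [-5.0, 7.0]

retrieved = top_k_l2(corpus, off_topic_query, k=2)
# The count-based rule still declares the context sufficient.
context_sufficient = len(retrieved) >= 2
```

No matter how far the query is from every document, the search fills its quota of k neighbors, which is exactly why the count alone carries no relevance signal.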
## 3. Second version: adding the FAISS distance signal

To address this, I made two rounds of improvements. The first uses the FAISS distance as a coarse constraint: if the retrieved chunks are far from the query in vector space, we can immediately flag the context as insufficient. The changes are concentrated in app/rag_system.py. First, add some empirical thresholds at the top of the file:
```python
# Lightweight context sufficiency thresholds.
# FAISS IndexFlatL2 returns smaller distances for more similar vectors.
# These thresholds are empirical and should be tuned on a small evaluation set.
MIN_CONTEXT_CHUNKS = 2
CONTEXT_TOP_N_FOR_AVG = 3
CONTEXT_MAX_BEST_DISTANCE = 2.2
CONTEXT_MAX_AVG_TOP_DISTANCE = 2.4
```
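One way such thresholds could be tuned, as a rough sketch with made-up numbers: collect the best_distance for a handful of queries the corpus can answer and a handful it cannot, then place the cutoff between the two groups:

```python
# Hypothetical labeled distances from a small evaluation set (illustrative only).
answerable_best = [0.9, 1.1, 1.3, 1.6]      # best_distance when the corpus has the answer
unanswerable_best = [2.6, 2.9, 3.1, 3.4]    # best_distance for off-topic queries

# Midpoint between the worst answerable and the best unanswerable distance.
cutoff = (max(answerable_best) + min(unanswerable_best)) / 2

CONTEXT_MAX_BEST_DISTANCE = round(cutoff, 2)
```

With more labeled queries one would rather sweep the cutoff and pick the value maximizing accuracy, but even this midpoint heuristic beats guessing a constant.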
### 3.1 Keeping distance and retrieval_rank at the retrieve stage

The early version of retrieve simply discarded the distance and retrieval_rank that FAISS returns:
```python
def retrieve(self, query, k=5):
    query_vec = get_embedding(query).reshape(1, -1)
    distances, indices = self.index.search(query_vec, k)
    return [self.chunks[i] for i in indices[0]]
```
So this change keeps both values, so they can later be checked against the thresholds:
```python
def retrieve(self, query, k=5):
    if self.index is None:
        raise RuntimeError("FAISS index has not been built.")
    if not self.chunks:
        logger.warning("[retrieve] no chunks available in RAGSystem")
        return []

    k = min(k, len(self.chunks))
    query_vec = get_embedding(query).reshape(1, -1)
    distances, indices = self.index.search(query_vec, k)

    results = []
    for rank, (idx, distance) in enumerate(zip(indices[0], distances[0]), start=1):
        idx = int(idx)
        if idx < 0 or idx >= len(self.chunks):
            continue
        chunk = dict(self.chunks[idx])
        chunk["distance"] = float(distance)
        chunk["retrieval_rank"] = rank
        results.append(chunk)

    if results:
        best_distance = min(c["distance"] for c in results)
        logger.info(
            f"[retrieve] query='{query}', returned={len(results)}, "
            f"best_distance={best_distance:.4f}"
        )
    else:
        logger.warning(f"[retrieve] query='{query}', no valid chunks returned")
    return results
```
### 3.2 A lightweight check using best_distance and avg_top_distance

After the previous step, every retrieved chunk carries two extra fields:
```python
{
    "source": "...",
    "text": "...",
    "distance": 0.8321,
    "retrieval_rank": 1
}
```
With these fields we can estimate how relevant the retrieved chunks are. Two values are constrained: best_distance and avg_top_distance. The logic is to check whether each stays below its empirically chosen threshold; if either exceeds it, the chunks are judged too far from the question and the context is flagged as insufficient. This requires a judgment function:
```python
def assess_context_sufficiency(self, retrieved_chunks):
    """
    Lightweight context sufficiency check.

    Current rule:
    - enough chunks are retrieved
    - best FAISS distance is below threshold
    - average distance of top chunks is below threshold

    This is not a perfect factuality check. It is a lightweight guardrail
    to avoid treating FAISS top-k results as sufficient only because they exist.
    """
    num_chunks = len(retrieved_chunks)
    metrics = {
        "num_chunks": num_chunks,
        "min_required_chunks": MIN_CONTEXT_CHUNKS,
        "best_distance": None,
        "avg_top_distance": None,
        "max_best_distance": CONTEXT_MAX_BEST_DISTANCE,
        "max_avg_top_distance": CONTEXT_MAX_AVG_TOP_DISTANCE,
        "reason": "",
    }

    if num_chunks < MIN_CONTEXT_CHUNKS:
        metrics["reason"] = "Not enough retrieved chunks."
        logger.info(f"[context_sufficiency] {metrics}")
        return False, metrics

    distances = [
        c.get("distance")
        for c in retrieved_chunks
        if c.get("distance") is not None
    ]
    if not distances:
        metrics["reason"] = "No FAISS distance is available for retrieved chunks."
        logger.warning(f"[context_sufficiency] {metrics}")
        return False, metrics

    sorted_distances = sorted(float(d) for d in distances)
    top_distances = sorted_distances[:min(CONTEXT_TOP_N_FOR_AVG, len(sorted_distances))]
    best_distance = sorted_distances[0]
    avg_top_distance = sum(top_distances) / len(top_distances)
    metrics["best_distance"] = round(best_distance, 4)
    metrics["avg_top_distance"] = round(avg_top_distance, 4)

    context_sufficient = (
        best_distance <= CONTEXT_MAX_BEST_DISTANCE
        and avg_top_distance <= CONTEXT_MAX_AVG_TOP_DISTANCE
    )
    if context_sufficient:
        metrics["reason"] = "Retrieved chunks meet the lightweight distance-based sufficiency rule."
    else:
        metrics["reason"] = "Retrieved chunks exist, but FAISS distances are not strong enough."

    logger.info(f"[context_sufficiency] {metrics}")
    return context_sufficient, metrics
```
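The rule itself is pure arithmetic, so it can be sanity-checked in isolation. Below is a simplified standalone copy of the same logic (thresholds inlined, logging and the metrics dict stripped; the chunk data is synthetic):

```python
# Standalone, trimmed-down copy of the distance rule for quick sanity checks.
MIN_CONTEXT_CHUNKS = 2
CONTEXT_TOP_N_FOR_AVG = 3
CONTEXT_MAX_BEST_DISTANCE = 2.2
CONTEXT_MAX_AVG_TOP_DISTANCE = 2.4

def distance_rule(chunks):
    """Return True when the synthetic chunks pass the distance-based gate."""
    if len(chunks) < MIN_CONTEXT_CHUNKS:
        return False
    distances = sorted(
        float(c["distance"]) for c in chunks if c.get("distance") is not None
    )
    if not distances:
        return False
    top = distances[:CONTEXT_TOP_N_FOR_AVG]
    return (
        distances[0] <= CONTEXT_MAX_BEST_DISTANCE
        and sum(top) / len(top) <= CONTEXT_MAX_AVG_TOP_DISTANCE
    )

close = [{"distance": 0.8}, {"distance": 1.0}, {"distance": 1.2}]
far = [{"distance": 3.0}, {"distance": 3.2}, {"distance": 3.5}]
```

Running `distance_rule(close)` passes and `distance_rule(far)` fails, while a single-chunk list fails on the count check alone, which covers all three exit paths of the real method.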
With the judgment function in place, ask_with_trace() needs to be updated as well. The original:
```python
retrieved_chunks = []
for c in best_chunks:
    retrieved_chunks.append({
        "source": c["source"],
        "text": c["text"],
    })
# Minimal context sufficiency rule:
# for now, judge only by the number of retrieved_chunks; later this can be
# upgraded to similarity thresholds / rerank scores / a reflection-based check
context_sufficient = len(retrieved_chunks) >= 2
```
becomes:
```python
retrieved_chunks = []
for c in best_chunks:
    retrieved_chunks.append({
        "source": c["source"],
        "text": c["text"],
        "distance": c.get("distance"),
        "retrieval_rank": c.get("retrieval_rank"),
    })

context_sufficient, context_metrics = self.assess_context_sufficiency(retrieved_chunks)
```
Add a new field in app/graph/state.py:
```python
# Lightweight context sufficiency metrics,
# e.g. best_distance / avg_top_distance / threshold / reason
context_metrics: dict[str, Any]
```
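For context, this annotation lives inside a LangGraph-style state class. A hypothetical sketch of how it fits (the class name and surrounding fields here are illustrative, not the project's exact schema):

```python
from typing import Any, TypedDict

class AgentState(TypedDict, total=False):
    """Illustrative graph state; only context_metrics matches the real field."""
    question: str
    retrieved_chunks: list[dict[str, Any]]
    context_sufficient: bool
    # Lightweight context sufficiency metrics,
    # e.g. best_distance / avg_top_distance / threshold / reason
    context_metrics: dict[str, Any]

state: AgentState = {"question": "q", "context_metrics": {"best_distance": 1.61}}
```

Because TypedDict is a plain dict at runtime, nodes can read and update `state["context_metrics"]` without any serialization work.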
In app/graph/nodes.py, include context_metrics in the returned result:
```python
if isinstance(output, dict):
    return {
        "final_answer": output.get("answer", ""),
        "retrieved_chunks": output.get("retrieved_chunks", []),
        "context_sufficient": output.get("context_sufficient"),
        "context_metrics": output.get("context_metrics", {}),
        "workflow_path": workflow_path
    }
```
In app/main.py, add the corresponding fields to the returned agent_trace:
```python
agent_trace = {
    "route_decision": decision,
    "tool_used": tool_result.get("tool_name", ""),
    "tool_input": tool_result.get("tool_input", ""),
    "fallback_used": result.get("fallback_used", False),
    "context_sufficient": result.get("context_sufficient"),
    "context_metrics": result.get("context_metrics", {}),
    "error": result.get("error", ""),
    "retry_count": result.get("retry_count", 0),
    "workflow": result.get(
        "workflow_path",
        ["choose_tool", "execute_tool", "generate_answer"]
    )
}
```
### 3.3 What the distance rule solves, and what it does not

During testing, however, I asked a question about a topic that none of the papers actually covers:
```json
{
  "session_id": "context-distance-fake",
  "question": "What does paper1 say about reinforcement learning for robot navigation?"
}
```
The result was contradictory: the model itself judged the evidence insufficient, yet the FAISS-distance check returned True. This shows that context_sufficient currently answers "are the retrieval results relatively close in vector space", not "do these results actually address the specific topic the user asked about".
```json
{
  "session_id": "context-distance-fake",
  "question": "What does paper1 say about reinforcement learning for robot navigation?",
  "answer": "The evidence is insufficient. The provided text from Paper1 discusses speech steganalysis using a separable convolution network with dual-stream pyramid enhanced strategy, and does not mention reinforcement learning or robot navigation.",
  "chunks": [
    {
      "source": "Paper3.pdf",
      "text": "ging a Transformer encoder for V oIP feature extraction and enhancing feature learning through its attention mechanism. By innovatively integrating the Cut- Mix technique, we have facilitated an augmented exchange of information between different steganographic domains, thereby improving the model's generalization capability. Fur- thermore, we implemented a supervised contrastive learning strategy, using constructed V oIP sample triplets to segregate normal and steganographic features in the latent space, thus enhancing detection precision. Additionally, the inclusion of domain separation components in the training process has bol- stered the model's consideration of the relationships betwee",
      "distance": 1.6077574491500854,
      "retrieval_rank": 1
    },
    {
      "source": "Paper1.pdf",
      "text": "a novel steganalysis method based on separable convo- lution network (SepSteNet) with dual-stream pyramid enhanced strategy (DPES). Specifically, to better acquire discriminative representations, we design the pulse-aware separable block to capture the pulse correspondence along independent levels of pulse positions, where the pulse-aware excitation module is plugged to avoid noisy clue accumulation by adaptively emphasizing the salient part. Moreover, the global attending block is introduced to enhance correspondence features through calculating global responses at distinct subframes. In addition, to eliminate the negative impact of sample content, DPES is leveraged to incorporate cross-dom",
      "distance": 1.6735026836395264,
      "retrieval_rank": 2
    },
    {
      "source": "Paper3.pdf",
      "text": "main separation components in the training process has bol- stered the model's consideration of the relationships between steganographic domains, resulting in clearer class decision boundaries. Extensive experimental results demonstrate that our method significantly outperforms existing technologies in universal V oIP steganography detection. ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 979-8-3503-6874-1/25/$31.00 ©2025 IEEE | DOI: 10.1109/ICASSP49660.2025.10890478 Authorized licensed use limited to: National Huaqiao University. Downloaded on November 06,2025 at 12:26:23 UTC from IEEE Xplore. Restrictions apply.",
      "distance": 1.6858768463134766,
      "retrieval_rank": 3
    },
    {
      "source": "Paper1.pdf",
      "text": "esults demonstrate that the presented method significantly outperforms the existing ones. Furthermore, DPES is shown to be a general enhancement strategy that can effectively improve the performance of the existing deep neural network for speech steganalysis. The source code for this work is publicly available on https://github.com/BarryxxZ/SepSteNetwithDPES. Index Terms--- Steganalysis, separable convolution, pulse posi- tion, dual-stream network, calibration. Manuscript received 9 November 2022; revised 7 March 2023; accepted 17 April 2023. Date of publication 24 April 2023; date of current version 3 May 2023. This work was supported in part by the National Natural Science Foundation of Chin",
      "distance": 1.6936635971069336,
      "retrieval_rank": 4
    },
    {
      "source": "Paper1.pdf",
      "text": "; date of current version 3 May 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 61972168 and in part by the Major Science and Technology Project of Xiamen (Industry and Information Tech- nology Area) under Grant 3502Z20231007. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Pedro Comesana. (Corresponding author: Hui Tian.) Yiqin Qiu and Hui Tian are with the College of Computer Science and Technology, National Huaqiao University, Xiamen 361021, China, and also with the Xiamen Key Laboratory of Data Security and Blockchain Technology, Xiamen 361021, China (e-mail: barryxxz@stu.hqu.ed",
      "distance": 1.7195978164672852,
      "retrieval_rank": 5
    },
    {
      "source": "Paper3.pdf",
      "text": "for VoIP Steganalysis (DAEF-VS), which harnesses the CutMix technology to enhance the shared steganographic domain features and employs the Domain-Aware Learning Model to fine-tune these features, thereby significantly improving generalization capabilities. Extensive experimental results demonstrate that our approach vastly surpasses existing advanced methods in terms of universality across a variety of steganographic detection scenarios. Index Terms---VoIP Steganalysis, Contrastive Learning, Domain-Aware, Generality. I. I NTRODUCTION Steganography [ 1], [2] embeds secret information within public carriers to protect user privacy and is used in various social media forms, such as images [3]--[5",
      "distance": 1.7264275550842285,
      "retrieval_rank": 6
    },
    {
      "source": "Paper2.pdf",
      "text": "us ordinary carriers and transmitting it over public Received 8 November 2024; revised 12 April 2025; accepted 26 May 2025. Date of publication 10 June 2025; date of current version 20 June 2025. This work was supported in part by the National Natural Science Foundation of China under Grant 62302059 and Grant 62172053. The associate editor coordinating the review of this article and approving it for publication was Prof. Fernando Perez-Gonzalez. (Pengcheng Zhou and Zhengyang Fang contributed equally to this work.) (Corresponding authors: Zhongliang Yang; Linna Zhou.) Pengcheng Zhou is with the International School, Beijing University of Posts and Telecommunications, Beijing 100876, China. Zh",
      "distance": 1.7299573421478271,
      "retrieval_rank": 7
    },
    {
      "source": "Paper1.pdf",
      "text": "t subframes. In addition, to eliminate the negative impact of sample content, DPES is leveraged to incorporate cross-domain coherence features by the inverted connected dual-stream branches. With the original and calibration speech samples, two branches enable the cor- respondence of two detection feature domains to interact with each other to generate coherence features independent of sample content, thereby improving the detection performance. The per- formance of the presented method is comprehensively evaluated and compared with the state of the arts. The experimental results demonstrate that the presented method significantly outperforms the existing ones. Furthermore, DPES is shown to",
      "distance": 1.7343811988830566,
      "retrieval_rank": 8
    },
    {
      "source": "Paper1.pdf",
      "text": "IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY , VOL. 18, 2023 2737 Separable Convolution Network With Dual-Stream Pyramid Enhanced Strategy for Speech Steganalysis Yiqin Qiu , Hui Tian , Senior Member, IEEE, Haizhou Li , Fellow, IEEE, Chin-Chen Chang , Fellow, IEEE, and Athanasios V . Vasilakos , Senior Member, IEEE Abstract--- Steganography based on fixed codebook has become one of the most important branches of speech steganography due to its high imperceptibility and having the largest available carrier space. As its countermeasure technique, this paper presents a novel steganalysis method based on separable convo- lution network (SepSteNet) with dual-stream pyramid enhanced strat",
      "distance": 1.7463574409484863,
      "retrieval_rank": 9
    },
    {
      "source": "Paper3.pdf",
      "text": ", steganalysis techniques are predominantly spe- cialized, detecting based on the prior knowledge of specific steganography methods. With the advancement of AI, the precision and automation level of specialized V oIP steganalysis have significantly improved, employing techniques such as feature extraction, classification, deep neural networks like CNN [15], [16], RNN [17], Attention mechanisms [18]-- [20], and hybrid networks [21]--[24]. However, despite these advancements, specialized methods still exhibit performance † Equal Contribution. ∗ Corresponding Author. This work was supported in part by the National Key Research and Development Program of China under Grant 2023YFC3305401, Grant 202",
      "distance": 1.7495293617248535,
      "retrieval_rank": 10
    }
  ],
  "agent_trace": {
    "route_decision": {
      "tool": "rag",
      "input": "What does paper1 say about reinforcement learning for robot navigation?",
      "reason": "The question asks about the content of a specific local paper (paper1)."
    },
    "tool_used": "rag",
    "tool_input": "What does paper1 say about reinforcement learning for robot navigation?",
    "fallback_used": false,
    "context_sufficient": true,
    "context_metrics": {
      "num_chunks": 10,
      "min_required_chunks": 2,
      "best_distance": 1.6078,
      "avg_top_distance": 1.6557,
      "max_best_distance": 1.85,
      "max_avg_top_distance": 1.95,
      "reason": "Retrieved chunks meet the lightweight distance-based sufficiency rule."
    },
    "error": "",
    "retry_count": 0,
    "workflow": [
      "choose_tool",
      "execute_tool",
      "generate_answer"
    ]
  }
}
```
## 4. Why FAISS distance fails

FAISS distance is relative, not absolute. "Nearest neighbor" does not mean "relevant". Even if every paper in the knowledge base is unrelated to the question, FAISS still returns the top-k smallest-distance items. It can only tell you "these are the closest entries in the index", never "these actually answer your question". A distance threshold can catch the extreme case where nothing even remotely close is recalled, but it cannot catch "something was recalled, yet it is entirely irrelevant".

The best fix is an LLM relevance gate: after retrieval and before generation, make one extra LLM call that independently judges "can these chunks really answer this question?".
## 5. Third version: adding an LLM Relevance Gate

### 5.1 How the LLM judge decides whether retrieved chunks can answer the question

This step adds a function to app/rag_system.py that judges, semantically, whether the recalled chunks are enough to answer the user's question:
```python
def assess_context_relevance_with_llm(self, question, retrieved_chunks):
    """
    LLM-based relevance gate.

    FAISS distance can tell which chunks are nearest in vector space,
    but it cannot reliably tell whether the chunks directly answer the question.
    This gate checks whether the retrieved passages are actually relevant
    enough to support an answer.
    """
    if not retrieved_chunks:
        return False, {
            "llm_relevance_check": False,
            "llm_relevance_verdict": "NO",
            "llm_relevance_reason": "No chunks retrieved.",
            "llm_relevance_error": "",
        }

    preview = ""
    for i, c in enumerate(retrieved_chunks[:3], start=1):
        source = c.get("source", "unknown")
        distance = c.get("distance")
        text = c.get("text", "")
        snippet = text[:350]
        preview += (
            f"[Chunk {i}]\n"
            f"Source: {source}\n"
            f"Distance: {distance}\n"
            f"Text: {snippet}\n\n"
        )

    prompt = f"""
You are a strict relevance judge for a RAG system.
Your job is to decide whether the retrieved passages contain enough information to directly answer the user's question.

Question:
{question}

Retrieved passages:
{preview}

Decision rules:
- Reply YES if the passages contain information that directly answers the question.
- Reply NO if the passages are about a different topic.
- Reply NO if the passages only partially overlap with the question but do not answer the key point.
- Reply NO if the user asks about something not supported by the retrieved passages.
- Do not answer the user's question.
- Do not explain in multiple paragraphs.

Return exactly one line in this format:
YES - short reason
or
NO - short reason
"""

    try:
        response = client.chat.completions.create(
            model=CHAT_MODEL,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        verdict = response.choices[0].message.content.strip()
        verdict_upper = verdict.upper()
        if verdict_upper.startswith("YES"):
            is_relevant = True
        elif verdict_upper.startswith("NO"):
            is_relevant = False
        else:
            logger.warning(f"[relevance_gate] unexpected verdict format: {verdict}")
            # If the judge output is malformed, fall back to allowing generation,
            # but record the issue in trace for observability.
            return True, {
                "llm_relevance_check": True,
                "llm_relevance_verdict": verdict[:200],
                "llm_relevance_reason": "Unexpected judge output format. Falling back to distance-based result.",
                "llm_relevance_error": "unexpected_verdict_format",
            }

        logger.info(f"[relevance_gate] verdict={verdict}")
        return is_relevant, {
            "llm_relevance_check": is_relevant,
            "llm_relevance_verdict": verdict[:200],
            "llm_relevance_reason": verdict[:200],
            "llm_relevance_error": "",
        }
    except Exception as e:
        logger.warning(f"[relevance_gate] LLM relevance check failed: {e}")
        # Do not break the whole RAG answer when the judge fails.
        # Fall back to distance-based result, and expose the error in trace.
        return True, {
            "llm_relevance_check": True,
            "llm_relevance_verdict": "FALLBACK",
            "llm_relevance_reason": "LLM relevance check failed. Falling back to distance-based result.",
            "llm_relevance_error": str(e),
        }
```
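The fragile part of this function is parsing the judge's free-text verdict, and that piece can be factored out and unit-tested without any LLM call. A standalone helper mirroring the same prefix-matching logic (a sketch, not the project's exact code):

```python
def parse_verdict(verdict: str):
    """Map a 'YES - reason' / 'NO - reason' judge reply to True/False.

    Returns None for malformed output so the caller can decide how to fall back,
    mirroring the prefix checks in assess_context_relevance_with_llm.
    """
    v = verdict.strip().upper()
    if v.startswith("YES"):
        return True
    if v.startswith("NO"):
        return False
    return None
```

Uppercasing first makes the check case-insensitive, so `"no - off topic"` is still recognized. One caveat of prefix matching: a reply beginning with an unrelated word like "NOTE" would be misread as NO, which is an argument for keeping the "exactly one line" instruction in the prompt strict.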
### 5.2 A two-layer gate: distance gate + relevance gate

Next, modify ask_with_trace() to turn:
```python
context_sufficient, context_metrics = self.assess_context_sufficiency(retrieved_chunks)
```
into:
```python
distance_sufficient, context_metrics = self.assess_context_sufficiency(retrieved_chunks)
context_relevant, relevance_metrics = self.assess_context_relevance_with_llm(
    question=question,
    retrieved_chunks=retrieved_chunks
)
context_metrics.update({
    "distance_gate_passed": distance_sufficient,
    "llm_relevance_check": relevance_metrics.get("llm_relevance_check"),
    "llm_relevance_verdict": relevance_metrics.get("llm_relevance_verdict"),
    "llm_relevance_reason": relevance_metrics.get("llm_relevance_reason"),
    "llm_relevance_error": relevance_metrics.get("llm_relevance_error"),
})

context_sufficient = distance_sufficient and context_relevant
if context_sufficient:
    context_metrics["final_sufficiency_reason"] = (
        "Context passed both the distance gate and the LLM relevance gate."
    )
elif not distance_sufficient:
    context_metrics["final_sufficiency_reason"] = (
        "Context failed the distance-based retrieval gate."
    )
else:
    context_metrics["final_sufficiency_reason"] = (
        "Context passed the distance gate but failed the LLM relevance gate."
    )
```
With this change, context_sufficient is now jointly determined by distance_sufficient and context_relevant.
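Since the combined decision is just an AND of the two gates, a quick standalone truth table (a sketch, independent of the real code) makes all possible outcomes and their reasons explicit:

```python
def final_gate(distance_sufficient: bool, context_relevant: bool):
    """Combine the two gates and pick a human-readable reason."""
    context_sufficient = distance_sufficient and context_relevant
    if context_sufficient:
        reason = "passed both gates"
    elif not distance_sufficient:
        reason = "failed the distance gate"
    else:
        reason = "passed the distance gate but failed the LLM relevance gate"
    return context_sufficient, reason

# Enumerate all four gate combinations.
table = {
    (d, r): final_gate(d, r)
    for d in (True, False)
    for r in (True, False)
}
```

Only the (True, True) row yields a sufficient context; the ordering of the branches also shows that a distance-gate failure is reported first, so the relevance-gate reason is only reached when retrieval itself looked acceptable.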
### 5.3 How context_metrics records the decision process

Note this part:
```python
context_metrics.update({
    "distance_gate_passed": distance_sufficient,
    "llm_relevance_check": relevance_metrics.get("llm_relevance_check"),
    "llm_relevance_verdict": relevance_metrics.get("llm_relevance_verdict"),
    "llm_relevance_reason": relevance_metrics.get("llm_relevance_reason"),
    "llm_relevance_error": relevance_metrics.get("llm_relevance_error"),
})
```
It records the full decision trail. Finally, when the context is judged insufficient, the response needs to return these fields as well:
```python
if not context_sufficient:
    return {
        "answer": (
            "The retrieved paper passages are not sufficient to reliably "
            "support an answer to this question. The system will not force "
            "an answer from irrelevant or under-supported passages. Try a "
            "more specific question, or add papers that cover this topic."
        ),
        "retrieved_chunks": retrieved_chunks,
        "context_sufficient": False,
        "context_metrics": context_metrics,
    }
```
If this article helped you, consider giving it a like~

Full code: https://github.com/1186141415/Paper-RAG-Agent-with-LangGraph