Authors: Jeffrey Rengifo and Eduard Martin, from Elastic

Learn why RAG strategies are still relevant and how they deliver the most efficient, higher-quality results.
Elasticsearch has native integrations to industry-leading Gen AI tools and providers. Check out our webinars on going beyond RAG basics, or on building production-ready applications with the Elastic vector database.
To build the best search solution for your use case, start a free cloud trial or try Elastic on your local machine now.
Models with more than 1 million tokens of context are nothing new; it has been over a year since Google released Gemini 1.5 with its 1M-token context window. One million tokens is roughly 2,000 A5 pages, which in many cases is more than all the data we have stored.
So the question arises: "What if I just put everything into the prompt?"
In this article, we'll compare RAG against sending everything to a long-context model and letting the LLM analyze the context and answer our questions.
You can find the notebook with the full experiment here.
Initial thoughts
Before we start, here are some hypotheses to validate:
- Convenience: Not many models offer long-context versions, so our options are limited.
- Performance: An LLM processing 1M tokens should be considerably slower than retrieving from Elasticsearch and then having the LLM process a much smaller context.
- Price: The cost per question should be significantly higher when sending the full context.
- Accuracy: A RAG system effectively helps us filter out noise so the LLM can focus on the important information.
While sending everything as context has the advantage that nothing relevant gets left out, it puts a challenge on RAG: making sure you actually retrieve all the documents relevant to the query. Elasticsearch gives you the flexibility to combine different strategies to search for the right documents: filters, full-text search, semantic search, and hybrid search, as sketched below.
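As a rough, hypothetical sketch of what combining these strategies can look like (this is not code from the original article), the query below fuses full-text and semantic retrieval with reciprocal rank fusion (RRF). The `text` and `semantic_text` field names match the mappings defined later in this post, and `user_query` is a made-up example; the RRF retriever also requires a recent Elasticsearch version.

```python
# Hypothetical hybrid search sketch: full-text + semantic retrieval fused with RRF.
# Field names mirror the mappings used later in this article.
user_query = "error handling in serverless environments"  # made-up example query

hybrid_query = {
    "retriever": {
        "rrf": {
            "retrievers": [
                # Lexical match on the raw article text
                {"standard": {"query": {"match": {"text": user_query}}}},
                # Semantic match on the semantic_text field
                {
                    "standard": {
                        "query": {
                            "semantic": {"field": "semantic_text", "query": user_query}
                        }
                    }
                },
            ]
        }
    },
    "_source": ["title"],
    "size": 10,
}
response = es_client.search(index="technical-articles", body=hybrid_query)
```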
Test definition
Model / RAG specs
- LLM model: gemini-2.0-flash
- Model provider: Google
- Dataset: Elasticsearch Search Labs articles
For each test case, we will evaluate:
- LLM cost
- End-to-end latency
- Answer correctness
Test cases
Using the dataset of Elasticsearch articles, we'll test the two strategies (RAG and sending the full context to the LLM) on two different types of questions:
- Textual: the question's content appears verbatim in the documents.
- Non-textual: the question's content does not appear verbatim in the documents, so the LLM must reason and combine information from different chunks.
Running the tests
1) Indexing data
Download the dataset in NDJSON format to run the following steps:
The steps and screenshots below are from a cloud-hosted deployment. In your deployment, go to the "Overview" page, scroll down, and click "Upload a file". Then click "here", since we need to add custom mappings.


On the new page, drag in the ndjson file containing the dataset, and then click import.

Then, click advanced, enter the index name, and add the following mappings:
```json
{
  "properties": {
    "text": { "type": "text", "copy_to": "semantic_text" },
    "meta_description": { "type": "keyword", "copy_to": "semantic_text" },
    "title": { "type": "keyword", "copy_to": "semantic_text" },
    "imported_at": { "type": "date" },
    "url": { "type": "keyword" },
    "semantic_text": {
      "type": "semantic_text"
    }
  }
}
```
Click import to finish, and wait for the data to be indexed.
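Once the import completes, a quick sanity check (a hypothetical snippet, assuming the `es_client` and index name used throughout the notebook) is to confirm the document count matches the dataset:

```python
# Hypothetical sanity check: the article dataset indexed here contains 303 documents.
doc_count = es_client.count(index="technical-articles")["count"]
print(f"Indexed documents: {doc_count}")  # expected: 303
```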



2) Textual RAG
I extracted a fragment of the article "Elasticsearch in JavaScript the proper way, part II" to use as the query string.
```python
query_str = """
Let's now create a test.js file and install our mock client: Now, add a mock for semantic search: We can now create a test for our code, making sure that the Elasticsearch part will always return the same results: Let's run the tests.
"""
```
Running a match phrase query
This is the query we'll use to retrieve results from Elasticsearch with match phrase search, passing query_str as its input.
```python
textual_rag_summary = {}  # Variable to store results

start_time = time.time()

es_query = {
    "query": {"match_phrase": {"text": {"query": query_str}}},
    "_source": ["title"],
    "highlight": {
        "pre_tags": [""],
        "post_tags": [""],
        "fields": {"title": {}, "text": {}},
    },
    "size": 10,
}

response = es_client.search(index=index_name, body=es_query)
hits = response["hits"]["hits"]

textual_rag_summary["time"] = (
    time.time() - start_time
)  # save time taken to run the query
textual_rag_summary["es_results"] = hits  # save hits

print("ELASTICSEARCH RESULTS: \n", json.dumps(hits, indent=4))
```
The matches returned:
```
ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "tnWcO5cBTbKqUnB5yeVn",
        "_score": 36.27694,
        "_source": {
            "title": "Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs"
        },
        "highlight": {
            "text": [
                "Let\u2019s now create a test.js file and install our mock client: Now, add a mock for semantic search: We can now create a test for our code, making sure that the Elasticsearch part will always return the same results: Let\u2019s run the tests"
            ]
        }
    }
]
```
This prompt template gives the LLM its instructions and the context it needs to answer the question. At the end of the prompt, we ask for the article that contains the information we are looking for.
The same prompt template is used for all the tests.
```python
# LLM prompt template
template = """
Instructions:

- You are an assistant for question-answering tasks.
- Answer questions truthfully and factually using only the context presented.
- If you don't know the answer, just say that you don't know, don't make up an answer.
- Use markdown format for code examples.
- You are correct, factual, precise, and reliable.
- Answer

Context:
{context}

Question:
{question}.

What is the title article?
"""
```
Running the results through the LLM
The Elasticsearch results are passed to the LLM as context to get the answer we need. We extract the article titles and the highlights relevant to the user's query, then send the question, titles, and highlights to the LLM to find the answer.
```python
start_time = time.time()

prompt = ChatPromptTemplate.from_template(template)

context = ""

for hit in hits:
    # For semantic_text matches, we need to extract the text from the highlighted field
    if "highlight" in hit:
        highlighted_texts = []

        for values in hit["highlight"].values():
            highlighted_texts.extend(values)

        context += f"{hit['_source']['title']}\n"
        context += "\n --- \n".join(highlighted_texts)

# Use LangChain for the LLM part
chain = prompt | llm | StrOutputParser()

printable_prompt = prompt.format(context=context, question=query_str)
print("PROMPT WITH CONTEXT AND QUESTION:\n ", printable_prompt)  # Print prompt

with get_openai_callback() as cb:
    response = chain.invoke({"context": context, "question": query_str})

# Save results
textual_rag_summary["answer"] = response
textual_rag_summary["total_time"] = (time.time() - start_time) + textual_rag_summary[
    "time"
]  # Sum of time taken to run the semantic search and the LLM
textual_rag_summary["tokens_sent"] = cb.prompt_tokens
textual_rag_summary["cost"] = calculate_cost(
    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens
)

print("LLM Response:\n ", response)
```
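The `calculate_cost` helper used above is defined in the notebook. As a minimal sketch, assuming Gemini 2.0 Flash list pricing of $0.10 per 1M input tokens and $0.40 per 1M output tokens (an assumption, though it is consistent with the cost figures in the results tables below), it could look like this:

```python
# Sketch of the notebook's cost helper, assuming Gemini 2.0 Flash list pricing:
# $0.10 per 1M input tokens and $0.40 per 1M output tokens.
INPUT_COST_PER_MTOK = 0.10
OUTPUT_COST_PER_MTOK = 0.40

def calculate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one LLM call."""
    return (
        input_tokens * INPUT_COST_PER_MTOK + output_tokens * OUTPUT_COST_PER_MTOK
    ) / 1_000_000
```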
LLM response:
markdown
`1. What is the title article?
3. LLM Response:
4. Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs`AI写代码
The model found the correct article.
3) Textual LLM
Match all query
To give the LLM its context, we fetch the documents from the Elasticsearch index, sending all 303 indexed articles, about 1M tokens in total.
```python
textual_llm_summary = {}  # Variable to store results

start_time = time.time()

es_query = {"query": {"match_all": {}}, "sort": [{"title": "asc"}], "size": 1000}

es_results = es_client.search(index=index_name, body=es_query)
hits = es_results["hits"]["hits"]

# Save results
textual_llm_summary["es_results"] = hits
textual_llm_summary["time"] = time.time() - start_time

print("ELASTICSEARCH RESULTS: \n", json.dumps(hits, indent=4))
```
```
ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "J3WUI5cBTbKqUnB5J83I",
        "_score": null,
        "_source": {
            "meta_description": ".NET articles from Elasticsearch Labs",
            "imported_at": "2025-05-30T18:43:20.036600",
            "text": "Tutorials Examples Integrations Blogs Start free trial .NET Categories All Articles Agent AutoOps ... API Reference Elastic.co Change theme Change theme Sitemap RSS 2025. Elasticsearch B.V. All Rights Reserved.",
            "title": ".NET - Elasticsearch Labs",
            "url": "https://www.elastic.co/search-labs/blog/category/dot-net-programming"
        },
        "sort": [
            ".NET - Elasticsearch Labs"
        ]
    },
    ...
]
```
Running the results through the LLM
As in the previous step, we provide the context to the LLM and ask for the answer.
```python
start_time = time.time()

prompt = ChatPromptTemplate.from_template(template)
# Use LangChain for the LLM part
chain = prompt | llm | StrOutputParser()

# Format the prompt with the same hits that are sent to the LLM below
printable_prompt = prompt.format(context=hits, question=query_str)
print("PROMPT:\n ", printable_prompt)  # Print prompt

with get_openai_callback() as cb:
    response = chain.invoke({"context": hits, "question": query_str})

# Save results
textual_llm_summary["answer"] = response
textual_llm_summary["total_time"] = (time.time() - start_time) + textual_llm_summary[
    "time"
]  # Sum of time taken to run the match_all query and the LLM
textual_llm_summary["tokens_sent"] = cb.prompt_tokens
textual_llm_summary["cost"] = calculate_cost(
    input_tokens=cb.prompt_tokens, output_tokens=cb.completion_tokens
)

print("LLM Response:\n ", response)  # Print LLM response
```
LLM response:
```
...
What is the title article?

LLM Response:
  The title of the article is "Testing your Java code with mocks and real Elasticsearch".
```
4) Non-textual RAG
For the second test, we'll retrieve results from Elasticsearch using a semantic query. For that, we wrote a short summary of the article "Elasticsearch in JavaScript, the proper way, part II" to provide as the query_str input for RAG.
```python
query_str = "This article explains how to improve code reliability. It includes techniques for error handling, and running applications without managing servers."
```
From here on, the code mostly follows the same pattern as the textual-query test, so for these parts we'll refer to the code in the notebook.
Running semantic search
Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Executing semantic search.
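The exact query is in the notebook; as a rough sketch (an assumption on our part, not the notebook's verbatim code), a semantic search against the `semantic_text` field could look like the snippet below, with highlights also requested as in the phrase-match test:

```python
# Hypothetical sketch of the semantic retrieval step; see the notebook for the
# exact query. It searches the semantic_text field defined in the mappings.
es_query = {
    "query": {"semantic": {"field": "semantic_text", "query": query_str}},
    "_source": ["title"],
    "size": 10,
}
hits = es_client.search(index=index_name, body=es_query)["hits"]["hits"]
```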
The matches returned by the semantic search:
```
ELASTICSEARCH RESULTS: 
 [
    ...
    {
        "_index": "technical-articles",
        "_id": "KHV7MpcBTbKqUnB5TN-F",
        "_score": 0.07619048,
        "_source": {
            "title": "Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs"
        },
        "highlight": {
            "text": [
                "We will review: Production best practices Error handling Testing Serverless environments Running the",
                "how long you want to have access to it.",
                "Conclusion In this article, we learned how to handle errors, which is crucial in production environments",
                "DT By: Drew Tate Integrations How To May 21, 2025 Get set, build: Red Hat OpenShift AI applications powered",
                "KB By: Kofi Bartlett Jump to Production best practices Error handling Testing Serverless environments"
            ]
        }
    },
    ...
]
```
Running the results through the LLM
Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Run results through LLM
LLM response:
```
...
What is the title article?

LLM Response:
  Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs
```
5) Non-textual LLM
Match all query
Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Match all query
The match_all query returned:
```
ELASTICSEARCH RESULTS: 
 [
    {
        "_index": "technical-articles",
        "_id": "J3WUI5cBTbKqUnB5J83I",
        "_score": null,
        "_source": {
            "meta_description": ".NET articles from Elasticsearch Labs",
            "imported_at": "2025-05-30T18:43:20.036600",
            "text": "Tutorials Examples Integrations Blogs Start free trial .NET Categories All Articles ... to easily utilize Elasticsearch to build advanced search experiences including generative AI, embedding models, reranking capabilities and more. Let's connect Menu Tutorials Examples Integrations Blogs Search Additional Resources Elasticsearch API Reference Elastic.co Change theme Change theme Sitemap RSS 2025. Elasticsearch B.V. All Rights Reserved.",
            "title": ".NET - Elasticsearch Labs",
            "url": "https://www.elastic.co/search-labs/blog/category/dot-net-programming"
        },
        "sort": [
            ".NET - Elasticsearch Labs"
        ]
    },
    ...
]
```
Running the results through the LLM
Notebook reference: 2. Run Comparisons > Test 2: Semantic Query > Run results through LLM
LLM response:
```
...
What is the title article?

LLM Response:
  "Elasticsearch in JavaScript the proper way, part II" and "A tutorial on building local agent using LangGraph, LLaMA3 and Elasticsearch vector store from scratch - Elasticsearch Labs" and "Advanced integration tests with real Elasticsearch - Elasticsearch Labs" and "Automatically updating your Elasticsearch index using Node.js and an Azure Function App - Elasticsearch Labs"
```
Test results
Now let's take a look at the results.
Textual query
| Strategy | Answer | Tokens Sent | Time (s) | LLM Cost (USD) |
| --- | --- | --- | --- | --- |
| Textual RAG | Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs | 237 | 1.281432 | 0.000029 |
| Textual LLM | The title of the article is "Testing your Java code with mocks and real Elasticsearch" | 1,023,231 | 45.647408 | 0.102330 |
Semantic query
| Strategy | Answer | Tokens Sent | Time (s) | LLM Cost (USD) |
| --- | --- | --- | --- | --- |
| Semantic RAG | Elasticsearch in JavaScript the proper way, part II - Elasticsearch Labs | 1,328 | 0.878199 | 0.000138 |
| Semantic LLM | "Elasticsearch in JavaScript the proper way, part II" and "A tutorial on building local agent using LangGraph, LLaMA3 and Elasticsearch vector store from scratch - Elasticsearch Labs" and "Advanced integration tests with real Elasticsearch - Elasticsearch Labs" and "Automatically updating your Elasticsearch index using Node.js and an Azure Function App - Elasticsearch Labs" | 1,023,196 | 44.386912 | 0.102348 |


Conclusion
RAG still matters. Our tests show that sending data unfiltered through a large context window is worse than a RAG system in price, latency, and accuracy. When handling large amounts of context, models tend to lose focus on the important information.
Even with the capabilities of today's LLMs, filtering information before sending it remains critical, because sending too many tokens degrades answer quality. Long-context LLMs are still valuable, however, when filtering in advance is not possible or the answer depends on a large amount of data.
It's also important to make sure your RAG system uses the right queries to get complete and accurate answers. You can test different query parameters to retrieve different numbers of documents until you find what works best for your use case.
- Convenience: With RAG, the average number of tokens sent to the LLM was 783, well below the maximum context window of all the mainstream models.
- Performance: RAG queries were significantly faster, averaging about 1 second versus roughly 45 seconds for the LLM-only approach.
- Price: The average cost of a RAG query ($0.00008) was about 1,250x lower than the LLM-only approach ($0.10).
- Accuracy: The RAG system produced accurate answers in every test, while the full-context approach produced inaccurate ones.
Original article: Longer context ≠ Better: Why RAG still matters - Elasticsearch Labs