Inconsistent Query Results Based on Output Fields Selection in Milvus Dashboard

**题意:**在Milvus仪表盘中基于输出字段选择的不一致查询结果

问题背景:

I'm experiencing an issue with the Milvus dashboard where the search results change based on the selected output fields.

I'm working on a RAG project using text data converted into embeddings, stored in a Milvus collection with around 8000 elements. Last week, my retrieval results matched my expectations ("good" results), however, this week, the results have degraded ("bad" results).

I found that when I exclude the embeddings_vector field from the output fields in the Milvus dashboard, I get the "good" results; Including the embeddings_vector field in the output changes the results to "bad".

I've attached two screenshots showing the difference in the results based on the selected output fields.

Any ideas on what's causing this or how to fix it?

Environment:

Python 3.11 pymilvus 2.3.2 llama_index 0.8.64

Thanks in advance!

python 复制代码
from llama_index.vector_stores import MilvusVectorStore
from llama_index import ServiceContext, VectorStoreIndex

# Some other lines..

# Setup for MilvusVectorStore and query execution
vector_store = MilvusVectorStore(uri=MILVUS_URI,
                                 token=MILVUS_API_KEY,
                                 collection_name=collection_name,
                                 embedding_field='embeddings_vector',
                                 doc_id_field='chunk_id',
                                 similarity_metric='IP',
                                 text_key='chunk_text')

embed_model = get_embeddings()
service_context = ServiceContext.from_defaults(embed_model=embed_model, llm=llm)
index = VectorStoreIndex.from_vector_store(vector_store=vector_store, service_context=service_context)
query_engine = index.as_query_engine(similarity_top_k=5, streaming=True)

rag_result = query_engine.query(prompt)

Here is the "good" result: "good" result And here is the "bad" result: "bad" result

问题解决:

I would like to suggest you to follow below considerations.

  • Ensure that your Milvus collection is correctly indexed. Indexing plays a crucial role in how search results are retrieved and ordered. If the index configuration has changed or is not optimized, it might affect the retrieval quality.
  • In your screenshots, the consistency level is set to "Bounded". Try experimenting with different consistency levels (e.g., "Strong" or "Eventually") to see if it impacts the results. Consistency settings can influence the real-time availability of the indexed data.
  • Review the query parameters, especially the similarity_metric. Since you're using IP (Inner Product) as the similarity metric, ensure that your embedding vectors are normalized correctly. Inner Product search works best with normalized vectors.
  • Verify that the embedding vectors are of consistent quality and scale. If there were changes in the embedding model or preprocessing steps, it could lead to variations in the search results.
  • The inclusion of the embeddings_vector field in the output might affect the way Milvus scores and ranks the results. It's possible that returning the raw embeddings affects the internal ranking logic. Ensure that including this field does not inadvertently alter the search behavior.
  • Check the Milvus server logs and performance metrics to identify any anomalies or changes in the search behavior. This might provide insights into why the results differ when the embeddings_vector field is included.
  • Ensure that there are no version mismatches between the client (pymilvus) and the Milvus server. Sometimes, discrepancies between versions can cause unexpected behavior.
  • As a last resort, try modifying your code to exclude the embeddings_vector field programmatically during retrieval and compare the results. This can help isolate whether the issue is indeed caused by including the embeddings in the output.
  • Please try out this code if it helps.
相关推荐
AI绘画哇哒哒1 小时前
Agent三种思考模式深度解析:CoT/ReAct/Plan-and-Execute,小白程序员必看,助你轻松掌握大模型精髓(收藏版)
人工智能·学习·ai·程序员·大模型·产品经理·转行
小江的记录本1 小时前
【Java基础】核心关键字:final、static、volatile、synchronized、transient(附《思维导图》+《面试高频考点清单》)
java·前端·数据结构·后端·ai·面试·ai编程
me8321 小时前
【AI】踩坑LangChain4j集成千问模型:版本适配问题完整解决历程
java·spring·阿里云·ai
weixin_449290012 小时前
Dify 企业数字安全一键配置模板
ai
TheRouter2 小时前
OpenClaw 上下文瘦身:3 个实验
开发语言·python·ai
码农阿强3 小时前
MiniMax speech-2.8-hd 技术详解与API接入实战
人工智能·ai·aigc
一切皆是因缘际会3 小时前
依托记忆结构心智体系,AI 自主意识进化路径
大数据·人工智能·安全·搜索引擎·ai
OpenBayes3 小时前
外语、方言、少数民族语言全覆盖:Hy-MT1.5 支持 1056 个翻译方向;MIT 联合发布 MathNet:涵盖 2.7 万道奥数真题的多模态数学推理基准
人工智能·深度学习·ai·agent
哥布林学者3 小时前
深度学习进阶(二十四)Swin 的二维 RPE
机器学习·ai
嗷嗷哦润橘_3 小时前
whynotTV徐丹飞:离通用智能机器人还有多远
人工智能·ai·具身智能