Milvus keeps reporting a length error on insert, even though the data is clearly not too long?

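While inserting data into Milvus, I had set the field length to 512. I had deliberately left headroom and none of the inserted strings were anywhere near that long, yet I kept getting errors like: MilvusException: (code=0, message=the length (564) of 78th string exceeds max length (512))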

Checks such as max(len(x) for x in temp_list) never showed anything above 512, or even above 256, so I could not tell which row was the problem.

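After repeatedly truncating the text and re-testing, I found that a string whose len(x) is, say, 10 does not have length 10 once it is stored in Milvus.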

For example, set the field length to a small value, 16:

FieldSchema(name="question", dtype=DataType.VARCHAR, auto_id=False, max_length=16)

Then shrink the data to a single row. The insert succeeds:

line contents is : 你好呀你好 and length is 5

Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 00:01<00:00, 1.02s/it

index handle result: Status(code=0, message=)

insert result: (insert count: 1, delete count: 0, upsert count: 0, timestamp: 449735609509740549, success count: 1, err count: 0)

Add one more character and the insert fails:

line contents is : 你好呀你好呀 and length is 6

Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 00:00<00:00, 1.03it/s

index handle result: Status(code=0, message=)

2024-05-13 20:59:27,915 decorators.py:134 ERROR RPC error: batch_insert, <MilvusException: (code=0, message=the length (18) of 0th string exceeds max length (16))>, <Time:{'RPC start': '2024-05-13 20:59:27.912751', 'RPC error': '2024-05-13 20:59:27.915058'}>

Traceback (most recent call last):
  File "/root/temp_dir/run_task.py", line 55, in <module>
    XXX().create_insert_vector_db()
  File "/root/temp_dir/app/service/vector_db/xx_pre_handle.py", line 63, in create_insert_vector_db
    ).get_or_create_db(fields, description, "possible_question_embeddings", entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 23, in get_or_create_db
    return self.create_and_insert(fields, description, index_field_name, entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 28, in create_and_insert
    self.insert_db(entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 40, in insert_db
    insert_result = self.collection.insert(entities)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 497, in insert
    res = conn.batch_insert(
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 135, in handler
    raise e from e
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 131, in handler
    return func(*args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 170, in handler
    return func(self, *args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 110, in handler
    raise e from e
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 74, in handler
    return func(*args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 566, in batch_insert
    raise err from err
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 560, in batch_insert
    check_status(response.status)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/utils.py", line 54, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <MilvusException: (code=0, message=the length (18) of 0th string exceeds max length (16))>

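So the cause is most likely the encoding: max_length appears to be checked against the byte length of the stored string, not the character count that len(x) returns. In UTF-8, non-ASCII characters take several bytes each, and each of the Chinese characters used here takes 3. That matches the numbers exactly: 5 characters are 15 bytes, which fits within 16, while 6 characters are 18 bytes, the same (18) that appears in the error message.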
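A quick way to see the mismatch, using only the Python standard library and the two test strings from the experiment above. The helper name utf8_len is made up for this illustration; it is not part of pymilvus.

```python
# The two test strings from the experiment: one inserted fine, one failed.
samples = ["你好呀你好", "你好呀你好呀"]


def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


for s in samples:
    # len() counts characters (code points); utf8_len() counts encoded bytes.
    print(f"{s!r}: {len(s)} characters, {utf8_len(s)} UTF-8 bytes")
# '你好呀你好': 5 characters, 15 UTF-8 bytes
# '你好呀你好呀': 6 characters, 18 UTF-8 bytes
```

The byte counts, 15 and 18, line up with the insert that succeeded and the one that failed against max_length=16.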

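The fix is therefore to set max_length in FieldSchema, when creating the collection, to a multiple of the expected maximum character count. A factor of 4 is safe for any text, since UTF-8 uses at most 4 bytes per character. A factor of 3 covers common Chinese characters, but not emoji or rarer characters that need 4 bytes.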
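Besides enlarging max_length, it can help to check the data by byte length before inserting, so that the offending rows show up on the client side. Below is a minimal sketch, assuming the limit is counted in UTF-8 bytes as the experiment suggests. The helper names (utf8_len, find_oversized, truncate_utf8) are hypothetical and not part of pymilvus.

```python
from typing import List, Sequence, Tuple


def utf8_len(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def find_oversized(strings: Sequence[str], max_bytes: int) -> List[Tuple[int, int]]:
    """Return (index, byte_length) for every string longer than max_bytes."""
    return [
        (i, utf8_len(s))
        for i, s in enumerate(strings)
        if utf8_len(s) > max_bytes
    ]


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut `text` to at most max_bytes UTF-8 bytes without splitting a character.

    Slicing the encoded bytes may leave an incomplete trailing sequence;
    errors="ignore" drops those leftover bytes when decoding.
    """
    return text.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


rows = ["你好呀你好", "你好呀你好呀", "hello"]

# Size the field from the data: the largest byte length actually present.
print(max(utf8_len(s) for s in rows))          # 18

# Locate rows that would be rejected with max_length=16.
print(find_oversized(rows, 16))                # [(1, 18)]

# Or shorten them so they fit, if losing the tail is acceptable.
print(truncate_utf8("你好呀你好呀", 16))        # 你好呀你好
```

Replacing the earlier max(len(x) for x in temp_list) check with max(utf8_len(x) for x in temp_list) would have pointed at the problem directly. Truncation discards data, so whether to truncate or to enlarge the field depends on the use case.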
