Milvus insert keeps failing with a length error even though the data isn't too long?

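When inserting data into Milvus, I set the field's max length to 512. I had left headroom, and none of the inserted strings came anywhere near that length, yet I still got errors like: MilvusException: (code=0, message=the length (564) of 78th string exceeds max length (512))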

Checking with max(len(x) for x in temp_list) and similar showed that nothing exceeded 512, or even 256, so I couldn't tell which data was at fault.
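Assuming the limit is counted in UTF-8 bytes rather than characters, a len()-based check like the one above can never reveal the offending row. Below is a minimal sketch for locating such rows by byte length instead. `find_overlong` is a hypothetical helper and the sample list stands in for the real `temp_list`. The returned indices are 0-based; whether the "78th string" in the Milvus message counts from 0 or 1 is not established here.

```python
def find_overlong(strings, max_bytes):
    """Return (index, char_len, byte_len) for every string whose UTF-8
    encoding is longer than max_bytes. Indices are 0-based."""
    overlong = []
    for i, s in enumerate(strings):
        byte_len = len(s.encode("utf-8"))
        if byte_len > max_bytes:
            overlong.append((i, len(s), byte_len))
    return overlong


temp_list = ["hello", "你好" * 100, "abc"]  # illustrative data only
print(max(len(x) for x in temp_list))                  # 200: looks safe against 512
print(max(len(x.encode("utf-8")) for x in temp_list))  # 600: the byte length
print(find_overlong(temp_list, 512))                   # [(1, 200, 600)]
```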

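After repeatedly truncating the text and testing, I found that the length Milvus sees is not what len(x) reports: a string for which len(x) returns 10, for example, does not count as 10 once stored in Milvus.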
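Python's len() counts Unicode code points, while encoding to UTF-8 gives the number of bytes; the two agree only for ASCII. The quick comparison below uses sample strings chosen purely for illustration (ASCII, an accented Latin letter, Chinese, an emoji). It shows UTF-8 using 1, 2, 3 and 4 bytes per character respectively.

```python
def char_vs_byte_lengths(strings):
    """Pair each string with its code-point count (len) and its UTF-8 byte count."""
    return [(s, len(s), len(s.encode("utf-8"))) for s in strings]


samples = ["hello", "caf\u00e9", "你好呀你好", "\U0001F600"]  # ASCII, accent, CJK, emoji
for s, n_chars, n_bytes in char_vs_byte_lengths(samples):
    print(f"{s!r}: len={n_chars}, utf8_bytes={n_bytes}")
```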

For example, set the field length to a small value, 16:

FieldSchema(name="question", dtype=DataType.VARCHAR, auto_id=False, max_length=16)
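If max_length is a byte budget, how many characters fit in it depends on the script. Here is a quick arithmetic check for max_length=16. The helper name is illustrative, and it assumes the text is stored as UTF-8.

```python
def max_chars_that_fit(max_length_bytes, sample_char):
    """How many copies of sample_char fit in a UTF-8 byte budget."""
    width = len(sample_char.encode("utf-8"))
    return max_length_bytes // width


print(max_chars_that_fit(16, "a"))           # 16: ASCII is 1 byte per character
print(max_chars_that_fit(16, "你"))          # 5: 3 bytes each, 5 * 3 = 15 <= 16
print(max_chars_that_fit(16, "\U0001F600"))  # 4: emoji are 4 bytes each
```

This predicts the experiment that follows. Five Chinese characters (15 bytes) should go in, while six (18 bytes) should be rejected.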

Then cut the data down to a single line. The insert succeeds:

line contents is : 你好呀你好 and length is 5

Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.02s/it]

index handle result: Status(code=0, message=)

insert result: (insert count: 1, delete count: 0, upsert count: 0, timestamp: 449735609509740549, success count: 1, err count: 0)

Add one more character and it fails:

line contents is : 你好呀你好呀 and length is 6

Batches: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.03it/s]

index handle result: Status(code=0, message=)

[2024-05-13 20:59:27,915 decorators.py:134 ERROR] RPC error: [batch_insert], <…>, <…>
Traceback (most recent call last):
  File "/root/temp_dir/run_task.py", line 55, in <module>
    XXX().create_insert_vector_db()
  File "/root/temp_dir/app/service/vector_db/xx_pre_handle.py", line 63, in create_insert_vector_db
    ).get_or_create_db(fields, description, "possible_question_embeddings", entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 23, in get_or_create_db
    return self.create_and_insert(fields, description, index_field_name, entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 28, in create_and_insert
    self.insert_db(entities)
  File "/root/temp_dir/app/service/vector_db/milvus_db.py", line 40, in insert_db
    insert_result = self.collection.insert(entities)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/orm/collection.py", line 497, in insert
    res = conn.batch_insert(
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 135, in handler
    raise e from e
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 131, in handler
    return func(*args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 170, in handler
    return func(self, *args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 110, in handler
    raise e from e
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/decorators.py", line 74, in handler
    return func(*args, **kwargs)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 566, in batch_insert
    raise err from err
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/grpc_handler.py", line 560, in batch_insert
    check_status(response.status)
  File "/root/tmp/venv_dir/1_text_simi/lib/python3.10/site-packages/pymilvus/client/utils.py", line 54, in check_status
    raise MilvusException(status.code, status.reason, status.error_code)
pymilvus.exceptions.MilvusException: <…>

So the cause is encoding. Milvus checks a VARCHAR field's max_length against the string's length in bytes, not its character count, and non-ASCII characters are encoded as several bytes each (UTF-8). Each of these Chinese characters takes 3 bytes, so 你好呀你好 is 15 bytes and fits in 16, while 你好呀你好呀 is 18 bytes and does not. This also explains the original error: a string well under 512 characters can still exceed 512 bytes (564 bytes in that case).

The fix is to set max_length in the FieldSchema, when creating the collection, to 4 times (or another suitable multiple of) the expected maximum character count. UTF-8 never uses more than 4 bytes per character, so 4× covers any text; 3× is usually enough for Chinese text.
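Below is a minimal plain-Python sketch of the sizing rule above. It assumes the 4-byte UTF-8 worst case and the 65,535 upper bound that the Milvus docs give for VARCHAR max_length (verify this for your version). It also adds byte-safe truncation as an alternative for when the field size is fixed; that part is not from the original post. Both helper names are hypothetical. The value from `varchar_max_length` is what you would pass as `max_length=` to `FieldSchema`.

```python
# Upper bound the Milvus docs give for VARCHAR max_length (assumption: check your version).
MILVUS_VARCHAR_LIMIT = 65535
UTF8_MAX_BYTES_PER_CHAR = 4  # worst case for any Unicode code point


def varchar_max_length(max_chars):
    """Byte budget guaranteed to hold max_chars characters of any script."""
    needed = max_chars * UTF8_MAX_BYTES_PER_CHAR
    if needed > MILVUS_VARCHAR_LIMIT:
        raise ValueError(
            f"{max_chars} chars may need {needed} bytes, above {MILVUS_VARCHAR_LIMIT}"
        )
    return needed


def truncate_utf8(s, max_bytes):
    """Cut s to at most max_bytes of UTF-8 without splitting a character."""
    encoded = s.encode("utf-8")
    if len(encoded) <= max_bytes:
        return s
    # Slicing may cut a multi-byte character in half; errors="ignore" drops
    # that incomplete tail, so the result is valid text within the budget.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


print(varchar_max_length(512))           # 2048
print(truncate_utf8("你好呀你好呀", 16))  # 你好呀你好 (15 bytes)
```

Sizing up keeps every row intact at the cost of a larger field limit. Truncation keeps the limit but silently drops text, so it suits fields like short titles better than content you later need in full.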
