【Python Qdrant vector database: complete example code】

This post tries out the Python client for the Qdrant vector database. The complete code follows.

Install the libraries

!pip install "qdrant-client>=1.1.1"
!pip install -U sentence-transformers

Imports

from qdrant_client import models, QdrantClient
from sentence_transformers import SentenceTransformer

# device="cuda" requires a CUDA-capable GPU; use device="cpu" otherwise
encoder = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

Prepare the test dataset

documents = [
    {
        "name": "The Time Machine",
        "description": "A man travels through time and witnesses the evolution of humanity."
        * 8,
        "author": "H.G. Wells",
        "year": 1895,
    },
    {
        "name": "Ender's Game",
        "description": "A young boy is trained to become a military leader in a war against an alien race."
        * 4,
        "author": "Orson Scott Card",
        "year": 1985,
    },
    {
        "name": "Brave New World",
        "description": "A dystopian society where people are genetically engineered and conditioned to conform to a strict social hierarchy."
        * 6,
        "author": "Aldous Huxley",
        "year": 1932,
    },
] * 50000

print(len(documents))

Create the storage

qdrant = QdrantClient(":memory:")  # in memory
# qdrant = QdrantClient(path='./qdrant')  # persist to local disk

Create a collection in the database (similar to a storage bucket)

qdrant.recreate_collection(
    collection_name="my_books",
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(),  # Vector size is determined by the model in use
        distance=models.Distance.COSINE,
    ),
)
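
Cosine distance, selected above via `models.Distance.COSINE`, compares the direction of two vectors rather than their length. As a rough illustration only (plain Python, not Qdrant's implementation; the helper names `cosine_similarity` and `normalize` are made up for this sketch), the following shows that for unit-length vectors the cosine reduces to a plain dot product:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Toy 2-dimensional vectors (hypothetical data, not real embeddings)
a, b = [3.0, 4.0], [4.0, 3.0]
cos_ab = cosine_similarity(a, b)

# After normalization, the dot product alone gives the same value
ua, ub = normalize(a), normalize(b)
dot_unit = sum(x * y for x, y in zip(ua, ub))

print(cos_ab, dot_unit)
```

Both printed values should agree up to floating-point rounding.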

Vectorize the documents

import hashlib
from tqdm import tqdm

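def sha256(text):
    """Return the SHA-256 hex digest of a string (not used below; kept as an alternative id source)."""
    hash_object = hashlib.sha256()
    hash_object.update(text.encode("utf-8"))
    hash_value = hash_object.hexdigest()
    return hash_value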

records = []
bs = 256
for i in tqdm(range(0, len(documents), bs)):
    docs = documents[i : i + bs]
    vectors = encoder.encode(
        [doc["description"] for doc in docs], normalize_embeddings=True
    ).tolist()

    record = [
        # PointStruct is the point type that upload_points expects
        models.PointStruct(id=idx, vector=vec, payload=doc)  # sha256(doc['description'])
        for idx, vec, doc in zip(range(i, i + bs), vectors, docs)
    ]

    records.extend(record)
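
The commented-out `sha256(doc['description'])` hints at deriving point ids from content instead of a running index. Qdrant point ids are unsigned integers or UUIDs, so a 64-character hex digest would not be usable as-is. One possible workaround, sketched here with a hypothetical `content_id` helper that is not part of the original code, is to fold the first 128 bits of the digest into a UUID string:

```python
import hashlib
import uuid

def content_id(text: str) -> str:
    """Derive a deterministic UUID-formatted id from the text's SHA-256 digest.

    Hypothetical helper: only the first 32 hex characters (128 bits) are kept.
    """
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return str(uuid.UUID(hex=digest[:32]))

# The same text always maps to the same id; different text maps to a different one
id_a = content_id("A man travels through time.")
id_b = content_id("A young boy is trained to become a military leader.")
print(id_a)
print(id_b)
```

With the test data in this post, where the same three documents are repeated 50000 times, content-derived ids would most likely collapse everything into three points, since points sharing an id overwrite each other. The running index keeps all 150000.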

Upload to the specified collection in the vector database

qdrant.upload_points(
    collection_name="my_books", points=records, batch_size=128, parallel=12
)
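
To get a feel for what `batch_size=128` means for this dataset, here is a small sketch of splitting 150000 items into batches of 128. It only illustrates the chunking arithmetic; the `batched` helper is hypothetical and says nothing about how the client schedules or parallelizes the actual requests.

```python
from itertools import islice

def batched(items, batch_size):
    """Yield consecutive lists of at most batch_size items."""
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# 3 documents * 50000 repetitions = 150000 points, as in the dataset above
batches = list(batched(range(150_000), 128))
print(len(batches), len(batches[-1]))  # 1172 112
```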

Semantic search

query = "Aliens attack our planet"
hits = qdrant.search(
    collection_name="my_books",
    query_vector=encoder.encode(query).tolist(),
    limit=6,
)
for hit in hits:
    print(hit.payload, "score:", hit.score)
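
Conceptually, a search scores every stored vector against the query vector and returns the `limit` best matches. A minimal brute-force sketch of that idea follows, in plain Python with made-up two-dimensional unit vectors and a hypothetical `top_k` helper. A production engine would normally rely on an approximate index rather than scanning every point.

```python
import heapq

def top_k(query, points, k):
    """Return the k (score, id) pairs with the highest dot-product score."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scored = ((dot(query, vec), pid) for pid, vec in points.items())
    return heapq.nlargest(k, scored)

# Toy unit-length vectors keyed by point id (hypothetical data)
points = {
    1: [1.0, 0.0],
    2: [0.6, 0.8],
    3: [0.0, 1.0],
}

toy_hits = top_k([1.0, 0.0], points, k=2)
for score, pid in toy_hits:
    print(pid, "score:", score)
```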

Filtered search

Search only for books published in 1980 or later (the filter uses `gte=1980`).

hits = qdrant.search(
    collection_name="my_books",
    query_vector=encoder.encode("Tyrannical society").tolist(),
    query_filter=models.Filter(
        must=[models.FieldCondition(key="year", range=models.Range(gte=1980))]
    ),
    limit=3,
)
for hit in hits:
    print(hit.payload, "score:", hit.score)
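
The `must` clause with a `Range(gte=1980)` condition restricts candidates to payloads whose `year` is at least 1980, independent of vector similarity. A plain-Python sketch of that payload check, using a hypothetical `matches_range` helper and the three books from the test dataset:

```python
books = [
    {"name": "The Time Machine", "year": 1895},
    {"name": "Ender's Game", "year": 1985},
    {"name": "Brave New World", "year": 1932},
]

def matches_range(payload, key, gte=None, lte=None):
    """Return True if payload[key] exists and lies within the given bounds."""
    value = payload.get(key)
    if value is None:
        return False
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True

kept = [b["name"] for b in books if matches_range(b, "year", gte=1980)]
print(kept)  # ["Ender's Game"]
```

With this dataset only "Ender's Game" passes the filter, so every hit of the filtered search should carry that payload.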

Reference: the official GitHub repository

github

colab
