I. Server Setup
1. Pull the container image locally
    docker pull chromadb/chroma
2. Upload the local image to the remote server (skip this step if the remote server can pull the image directly)
    docker save -o chromadb.tar chromadb/chroma
    scp /Users/<local-directory>/chromadb.tar root@<remote-ip>:/<remote-path>
3. Load and run the container image on the remote server
    docker load -i chromadb.tar
    docker run --rm --name chromadb -p 8001:8000 -v ./chromadb:/chroma/chroma --env-file ./.chroma_env chromadb/chroma
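Notes:
-p 8001:8000 maps port 8001 on the server to port 8000 in the container
-v ./chromadb:/chroma/chroma maps the server storage path ./chromadb to the container path /chroma/chroma
--env-file ./.chroma_env supplies the runtime configuration for the chromadb service in the container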
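To make the role of each flag concrete, the command can be assembled from its parts. This is only a sketch: the helper name `build_chroma_run_command` and its parameters are hypothetical and exist purely for illustration; the flag values are the ones used on this page.

```python
import shlex


def build_chroma_run_command(host_port=8001, data_dir="./chromadb",
                             env_file="./.chroma_env", image="chromadb/chroma"):
    """Assemble the `docker run` argument list described in the notes (hypothetical helper)."""
    return [
        "docker", "run", "--rm",
        "--name", "chromadb",
        "-p", f"{host_port}:8000",           # server port -> container port 8000
        "-v", f"{data_dir}:/chroma/chroma",  # server storage path -> container path
        "--env-file", env_file,              # runtime configuration for the service
        image,
    ]


command = build_chroma_run_command()
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would start the container without going through a shell, which avoids quoting problems if the data directory contains spaces.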
Contents of .chroma_env:
    # Whether data in the container is persisted
    IS_PERSISTENT=TRUE
    # Storage path for vector data in the container
    PERSIST_DIRECTORY=/chroma/chroma
    # Server authentication token
    CHROMA_SERVER_AUTHN_CREDENTIALS=test-token
    # Server authentication method
    CHROMA_SERVER_AUTHN_PROVIDER=chromadb.auth.token_authn.TokenAuthenticationServerProvider
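A missing or empty authentication setting in the env file is easy to overlook until the client is rejected. The following is a minimal sketch of a check to run before starting the container; `parse_env_file`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, and the parser only handles the simple `KEY=value` and `#` comment lines used on this page.

```python
# Keys this page relies on for persistence and token authentication.
REQUIRED_KEYS = (
    "IS_PERSISTENT",
    "CHROMA_SERVER_AUTHN_CREDENTIALS",
    "CHROMA_SERVER_AUTHN_PROVIDER",
)


def parse_env_file(text):
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line; ignore it
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty, in sorted order."""
    return sorted(key for key in required if not values.get(key))


sample = """
# Server authentication token
CHROMA_SERVER_AUTHN_CREDENTIALS=test-token
IS_PERSISTENT=TRUE
"""
parsed = parse_env_file(sample)
print(missing_keys(parsed))
```

In real use the text would come from the file itself, for example `parse_env_file(open("./.chroma_env").read())`.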
II. Client Usage
Install the dependency first: pip install chromadb-client
    import chromadb
    import chromadb.utils.embedding_functions as embedding_functions
    import logging
    import sys
    from chromadb.config import Settings
    from dotenv import load_dotenv
    from datetime import datetime

    # Enable logging to inspect queries and events
    # log_format = '%(asctime)s - %(levelname)s - %(message)s'
    # logging.basicConfig(stream=sys.stdout, level=logging.INFO, format=log_format)

    # Get a vector database client
    chroma_client = chromadb.HttpClient(
        host='<server-ip>',
        port=8001,
        settings=Settings(
            chroma_client_auth_provider="chromadb.auth.token_authn.TokenAuthClientProvider",
            chroma_client_auth_credentials="test-token",
        ),
    )

    # Define the embedding function (using Microsoft Azure OpenAI)
    openai_ef = embedding_functions.OpenAIEmbeddingFunction(
        api_key="",
        api_base="",
        api_type="azure",
        api_version="",
        model_name="embedding-3-small",
    )

    # Create a collection
    # collection = chroma_client.create_collection(
    #     name="my_collection",
    #     embedding_function=openai_ef,
    #     metadata={
    #         "description": "my first Chroma collection",
    #         "created": str(datetime.now()),
    #     },
    # )

    # Delete a collection
    # chroma_client.delete_collection(name="my_collection")

    # Get an existing collection
    collection = chroma_client.get_collection(name="my_collection", embedding_function=openai_ef)

    # Add data to the collection
    # collection.add(
    #     documents=[
    #         "This is a document about pineapple",
    #         "This is a document about Hawaii",
    #     ],
    #     metadatas=[{"chapter": "3", "verse": "16"}, {"chapter": "3", "verse": "5"}],
    #     ids=["id1", "id2"],
    # )

    # Query by text
    # results = collection.query(
    #     query_texts=["This is a query document about Hawaii"],  # Chroma will embed this for you
    #     n_results=1,  # how many results to return
    # )

    # Query by text with filters
    results = collection.query(
        query_texts=["This is a query document about Hawaii"],
        n_results=2,
        # Filter on metadata
        # where={"verse": "16"},
        # Filter on document content
        where_document={"$contains": "pineapple"},
    )
    print(results)
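The dictionary printed by `collection.query` holds one inner list per query text under keys such as `ids`, `documents`, `metadatas`, and `distances`, which is awkward to read directly. Below is a minimal sketch of flattening the hits for one query into rows. The helper name `flatten_query_results` is hypothetical, and the sample dictionary is hand-written to mimic the result shape; its distance value is made up and not real output.

```python
def flatten_query_results(results, query_index=0):
    """Pair up ids, documents, metadatas and distances for one query text."""
    ids = results["ids"][query_index]

    def column(key):
        # A field that was not returned may be missing or None.
        values = results.get(key)
        return values[query_index] if values is not None else [None] * len(ids)

    return [
        {"id": id_, "document": doc, "metadata": meta, "distance": dist}
        for id_, doc, meta, dist in zip(
            ids, column("documents"), column("metadatas"), column("distances")
        )
    ]


# Hand-written stand-in for a query result (illustrative values only).
sample_results = {
    "ids": [["id1"]],
    "documents": [["This is a document about pineapple"]],
    "metadatas": [[{"chapter": "3", "verse": "16"}]],
    "distances": [[0.42]],
}

rows = flatten_query_results(sample_results)
for row in rows:
    print(row["id"], row["distance"], row["document"])
```

With the client code above, `flatten_query_results(results)` would give one row per returned document for the first query text.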