Contents
[1. What is the vector database pgvector?](#一、什么是向量数据库 pgvector ?)
[2. Deploying pgvector](#二、pgvector 部署)
[2.1 Install Docker](#2.1 安装 Docker)
[2.2 Pull the image](#2.2 拉取镜像)
[2.3 Add a security group rule](#2.3 添加规则)
[3. Running pgvector](#三、pgvector 运行)
[3.1 Run pgvector](#3.1 运行 pgvector)
[3.2 Connect to pgvector](#3.2 连接 pgvector)
[3.3 Common pgvector operations](#3.3 pgvector 常见操作)
[4. Summary](#四、总结)
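This article deploys the vector database pgvector on a Flexus X cloud server instance to store vectors and run similarity searches over them. The Flexus X instance provides a stable and secure environment for pgvector. It is aimed at small and medium-sized businesses and developers who run medium-load workloads and want to choose resources flexibly. Its advantages are customizable specifications, stable and strong performance, and flexible pay-as-you-go billing.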
1. What is the vector database pgvector?
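pgvector is an open-source PostgreSQL extension for vector similarity search that lets you store vectors alongside the rest of your data. Version v0.7.0, the version used in this article, adds many new features and performance improvements for vector similarity search workloads in PostgreSQL.
It supports:
(1) exact and approximate nearest neighbor search;
(2) single-precision, half-precision, binary, and sparse vectors;
(3) L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance;
(4) any language with a Postgres client.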
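To make the distance list concrete, here is a minimal plain-Python sketch of each metric. Each function is annotated with the pgvector operator it corresponds to: `<->` (L2), `<#>` (negative inner product), `<=>` (cosine), `<+>` (L1), `<~>` (Hamming), and `<%>` (Jaccard). It illustrates the math only and is not pgvector's implementation. The function names are hypothetical, and bit vectors are represented here as strings of '0'/'1' characters.

```python
import math


def _same_length(a, b):
    # pgvector rejects vectors of different dimensions; mirror that here.
    if len(a) != len(b):
        raise ValueError("different vector dimensions %d and %d" % (len(a), len(b)))


def l2_distance(a, b):
    """Euclidean distance, what <-> computes."""
    _same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a, b):
    """<#> returns the NEGATIVE inner product, so smaller means more similar."""
    _same_length(a, b)
    return -sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, what <=> computes."""
    _same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def l1_distance(a, b):
    """Sum of absolute differences, what <+> computes."""
    _same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of differing bits, what <~> computes on bit vectors."""
    _same_length(a, b)
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |a AND b| / |a OR b| on bit vectors, what <%> computes."""
    _same_length(a, b)
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Convention chosen for this sketch only: two all-zero bit strings are at distance 0.
    return 1.0 - both / either if either else 0.0
```

For example, l2_distance([1, 2, 3], [3, 1, 2]) is sqrt(6), about 2.449, and negative_inner_product([1, 2, 3], [3, 1, 2]) is -11.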
Next, deploy pgvector on the Flexus X instance.
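2. Deploying pgvector
2.1 Install Docker
Install Docker with the following command. The docker-ce package is provided by Docker's official apt repository, which must already be configured on the server.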
```bash
root@flexusx-7305:~# sudo apt install docker-ce
```
Check the Docker version:
```bash
root@flexusx-7305:~# docker --version
Docker version 27.2.1, build 9e34c9b
root@flexusx-7305:~#
```
Finally, install docker-compose with the following command:
```bash
root@flexusx-7305:~# sudo apt install docker-compose
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
redis-server redis-tools
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
python3-cached-property python3-docker python3-dockerpty python3-docopt python3-importlib-metadata python3-jsonschema python3-more-itertools python3-pyrsistent python3-texttable python3-websocket python3-zipp
Suggested packages:
python-jsonschema-doc
Recommended packages:
docker.io
The following NEW packages will be installed:
docker-compose python3-cached-property python3-docker python3-dockerpty python3-docopt python3-importlib-metadata python3-jsonschema python3-more-itertools python3-pyrsistent python3-texttable python3-websocket python3-zipp
0 upgraded, 12 newly installed, 0 to remove and 33 not upgraded.
Need to get 412 kB of archives.
After this operation, 2,414 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.huaweicloud.com/ubuntu focal/universe amd64 python3-cached-property all 1.5.1-4 [10.9 kB]
Get:2 http://repo.huaweicloud.com/ubuntu focal/universe amd64 python3-websocket all 0.53.0-2ubuntu1 [
```
Docker is now installed.
2.2 Pull the image
Pull the pgvector image with the following command:
```bash
root@flexusx-7305:~# docker pull registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
v0.7.0: Pulling from fastgpt/pgvector
Digest: sha256:27df42f0d0be8d5623ff1aea5fea7134e175af1cdef62d9df00b322a3c85edc9
Status: Image is up to date for registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
root@flexusx-7305:~#
```
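As the output shows ("Image is up to date"), the image already exists locally. On a server that does not have it yet, the same command downloads it.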
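2.3 Add a security group rule
pgvector listens on PostgreSQL's port 5432, so add port 5432 to the security group's inbound rules. The container started in section 3.1 publishes host port 54333 instead (-p 54333:5432), so also allow 54333 if you want to reach that container from outside the server.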
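First, on the instance's basic information page, find the security group and click it to open it.
Then click Configure Rules to add a rule for the port.
Set the priority, enter the port under Protocol & Port, and click OK.
The port now appears in the security group's rules.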
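After saving the rule, you can check from another machine whether the port is actually reachable. The sketch below uses only Python's standard socket module, and port_is_open is a hypothetical helper name. A False result only says that the TCP connection failed. It cannot tell a missing security group rule apart from a container that is not running.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    False means the connection was refused, timed out, or the host name could
    not be resolved (all of these raise OSError subclasses).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, from your own computer call port_is_open("<server-public-IP>", 54333), replacing the placeholder with the instance's public IP address.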
3. Running pgvector
3.1 Run pgvector
First, list the local images to locate the pgvector image:
```bash
root@flexusx-7305:~# docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox latest 0f26cf6654ad 2 weeks ago 315MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt v4.8.9 bc394a806301 6 weeks ago 356MB
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/gitea/gitea 1.22.1 b3de72970178 2 months ago 167MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api v0.6.6 40efbc4449c7 4 months ago 79.5MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector v0.7.0 6e0cb183450e 5 months ago 429MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql 8.0.36 f5f171121fa3 6 months ago 603MB
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/justsong/one-api v0.6.0 36bd98ce5a7c 7 months ago 48.4MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo 5.0.18 021e1bd71d92 16 months ago 662MB
daocloud.io/library/mysql 8 26d0ac143221 3 years ago 546MB
daocloud.io/library/mysql latest 8457e9155715 3 years ago 546MB
root@flexusx-7305:~#
```
As shown above, the pgvector image is registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0.
Then start a container from it with docker run:
```bash
root@flexusx-7305:~# docker run --name pgvectorface --restart=always -e POSTGRES_USER=pgvectorface -e POSTGRES_PASSWORD=pgvector -p 54333:5432 -d registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
982a9ed7352450eb192c04ca9f7dbf31bfd9d1ccf9af4a234c85dc85d4338e41
root@flexusx-7305:~# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
982a9ed73524 registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0 "docker-entrypoint.s..." 11 seconds ago Up 11 seconds 0.0.0.0:54333->5432/tcp, [::]:54333->5432/tcp pgvectorface
68a1f9a73e58 registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt:v4.8.9 "sh -c 'node --max-o..." 10 days ago Up 10 days 0.0.0.0:3000->3000/tcp, :::3000->3000/tcp fastgpt
b57af8cd1b6b registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api:v0.6.6 "/one-api" 10 days ago Up 10 days 0.0.0.0:3001->3000/tcp, [::]:3001->3000/tcp oneapi
2de37c379c6a registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql:8.0.36 "docker-entrypoint.s..." 10 days ago Up 10 days 0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp mysql
9d7906452f26 registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox:latest "docker-entrypoint.s..." 10 days ago Up 10 days sandbox
6f9c7f088d9d registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo:5.0.18 "bash -c 'openssl ra..." 10 days ago Up 10 days 0.0.0.0:27017->27017/tcp, :::27017->27017/tcp mongo
3867cf7f6df9 registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0 "docker-entrypoint.s..." 10 days ago Up 10 days 0.0.0.0:5432->5432/tcp, :::5432->5432/tcp pg
89bb9f7a3dd1 swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/justsong/one-api:v0.6.0 "/one-api" 12 days ago Up 11 days 0.0.0.0:3002->3000/tcp, [::]:3002->3000/tcp one-api
65fe1c102df6 daocloud.io/library/mysql:8 "docker-entrypoint.s..." 2 weeks ago Up 11 days 3306/tcp, 33060/tcp root_db_1
root@flexusx-7305:~#
```
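As shown above, pgvector is running: the pgvectorface container is Up, and host port 54333 is mapped to port 5432 inside the container. Host port 5432 is already taken by the existing pg container, which is why a different host port is used here. POSTGRES_USER and POSTGRES_PASSWORD set the database user (pgvectorface) and its password (pgvector). -d runs the container in the background, and --restart=always tells Docker to restart it automatically, including after the Docker daemon restarts.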
3.2 Connect to pgvector
Install the PostgreSQL command-line client psql by installing the postgresql-client-common and postgresql-client packages:
```bash
root@flexusx-7305:~# apt install postgresql-client-common
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
libcgi-fast-perl libcgi-pm-perl libencode-locale-perl libevent-core-2.1-7 libevent-pthreads-2.1-7 libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl
liblwp-mediatypes-perl libmecab2 libtimedate-perl liburi-perl mecab-ipadic mecab-ipadic-utf8 mecab-utils mysql-server-core-8.0 redis-server redis-tools
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
postgresql-client-common
0 upgraded, 1 newly installed, 0 to remove and 61 not upgraded.
Need to get 28.2 kB of archives.
After this operation, 182 kB of additional disk space will be used.
Get:1 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client-common all 214ubuntu0.1 [28.2 kB]
Fetched 28.2 kB in 0s (314 kB/s)
Selecting previously unselected package postgresql-client-common.
(Reading database ... 123209 files and directories currently installed.)
Preparing to unpack .../postgresql-client-common_214ubuntu0.1_all.deb ...
Unpacking postgresql-client-common (214ubuntu0.1) ...
Setting up postgresql-client-common (214ubuntu0.1) ...
Processing triggers for man-db (2.9.1-1) ...
root@flexusx-7305:~# apt-get install postgresql-client
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
libcgi-fast-perl libcgi-pm-perl libencode-locale-perl libevent-core-2.1-7 libevent-pthreads-2.1-7 libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl
liblwp-mediatypes-perl libmecab2 libtimedate-perl liburi-perl mecab-ipadic mecab-ipadic-utf8 mecab-utils mysql-server-core-8.0 redis-server redis-tools
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
libpq5 postgresql-client-12
Suggested packages:
postgresql-12 postgresql-doc-12
The following NEW packages will be installed:
libpq5 postgresql-client postgresql-client-12
0 upgraded, 3 newly installed, 0 to remove and 61 not upgraded.
Need to get 1,176 kB of archives.
After this operation, 4,303 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 libpq5 amd64 12.20-0ubuntu0.20.04.1 [117 kB]
Get:2 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client-12 amd64 12.20-0ubuntu0.20.04.1 [1,055 kB]
Get:3 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client all 12+214ubuntu0.1 [3,940 B]
Fetched 1,176 kB in 0s (6,884 kB/s)
Selecting previously unselected package libpq5:amd64.
(Reading database ... 123246 files and directories currently installed.)
Preparing to unpack .../libpq5_12.20-0ubuntu0.20.04.1_amd64.deb ...
Unpacking libpq5:amd64 (12.20-0ubuntu0.20.04.1) ...
Selecting previously unselected package postgresql-client-12.
Preparing to unpack .../postgresql-client-12_12.20-0ubuntu0.20.04.1_amd64.deb ...
Unpacking postgresql-client-12 (12.20-0ubuntu0.20.04.1) ...
Selecting previously unselected package postgresql-client.
Preparing to unpack .../postgresql-client_12+214ubuntu0.1_all.deb ...
Unpacking postgresql-client (12+214ubuntu0.1) ...
Setting up libpq5:amd64 (12.20-0ubuntu0.20.04.1) ...
Setting up postgresql-client-12 (12.20-0ubuntu0.20.04.1) ...
update-alternatives: using /usr/share/postgresql/12/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (12+214ubuntu0.1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.16) ...
root@flexusx-7305:~#
```
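As shown above, the packages are installed.
Then connect to pgvector with psql. -h 0.0.0.0 reaches the server from the same machine (127.0.0.1 works too), -p 54333 is the host port published by docker run, and the password is the POSTGRES_PASSWORD value (pgvector):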
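```bash
root@flexusx-7305:~# psql -h 0.0.0.0 -p 54333 -U pgvectorface
Password for user pgvectorface:
psql (12.20 (Ubuntu 12.20-0ubuntu0.20.04.1), server 15.6 (Debian 15.6-1.pgdg120+2))
WARNING: psql major version 12, server major version 15.
Some psql features might not work.
Type "help" for help.
pgvectorface=#
```
The warning appears because the client from the Ubuntu 20.04 repository is psql 12, while the server runs PostgreSQL 15. Ordinary SQL works, but some psql features may not. Installing a client whose major version matches the server avoids the warning.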
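Applications connect with the same parameters that psql used. As a minimal sketch, the function below assembles a libpq-style connection URI from the values passed to docker run (pg_dsn is a hypothetical helper name). It assumes the image behaves like the official postgres image, which creates a database named after POSTGRES_USER when POSTGRES_DB is not set. The pgvectorface=# prompt above is consistent with that.

```python
from urllib.parse import quote


def pg_dsn(user, password, host="127.0.0.1", port=54333, dbname=None):
    """Build a postgresql:// connection URI.

    Defaults follow the docker run command above: host port 54333 maps to the
    container's 5432, and the database name defaults to the user name.
    User, password, and database name are percent-encoded so characters such
    as '@' or '/' in a password do not break the URI. IPv6 host literals would
    need [brackets]; that case is not handled in this sketch.
    """
    dbname = dbname or user
    return "postgresql://%s:%s@%s:%d/%s" % (
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(dbname, safe=""),
    )


if __name__ == "__main__":
    # prints postgresql://pgvectorface:pgvector@127.0.0.1:54333/pgvectorface
    print(pg_dsn("pgvectorface", "pgvector"))
```

psql accepts such a URI directly in place of the -h/-p/-U options, for example psql "postgresql://pgvectorface:pgvector@127.0.0.1:54333/pgvectorface". Postgres drivers that understand connection URIs accept it too.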
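3.3 Common pgvector operations
The statements below are run inside psql. They assume the extension has already been enabled in the current database with CREATE EXTENSION vector; and that a table named items with a vector column named embedding exists, as in the pgvector README examples. Use EXPLAIN ANALYZE to debug query performance: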
```sql
EXPLAIN ANALYZE SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```
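Without a vector index, a query like the one above is an exact search: the distance from every row to the query vector is computed, and the closest rows are kept. The sketch below reproduces that logic in plain Python, which is handy for sanity-checking results on a small sample. The sample rows and helper names are hypothetical. to_vector_literal shows the '[3,1,2]' text format used in the SQL.

```python
import heapq
import math


def to_vector_literal(values):
    """Format numbers as pgvector's text input, e.g. [3, 1, 2] -> '[3,1,2]'."""
    return "[" + ",".join(str(v) for v in values) + "]"


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(rows, query, k=5):
    """Return the k rows closest to `query` by L2 distance.

    Mirrors ORDER BY embedding <-> query LIMIT k on a table without a vector
    index: every (id, embedding) row is examined, so the result is exact.
    """
    return heapq.nsmallest(k, rows, key=lambda row: l2_distance(row[1], query))


# Hypothetical sample data standing in for the items table.
items = [(1, [1, 2, 3]), (2, [4, 5, 6]), (3, [3, 1, 2]), (4, [0, 0, 0])]
print(to_vector_literal([3, 1, 2]))                          # prints [3,1,2]
print([row_id for row_id, _ in exact_knn(items, [3, 1, 2])])  # prints [3, 1, 4, 2]
```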
To speed up queries that have no index, increase max_parallel_workers_per_gather:
```sql
SET max_parallel_workers_per_gather = 4;
```
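Extra workers help an unindexed query because the scan can be split. Each worker computes distances for its share of the rows and keeps its own best candidates, and a gather step merges the results. The following is a conceptual sketch of that idea, not PostgreSQL's executor, and all names in it are hypothetical. It uses threads only to show the data flow. PostgreSQL's parallel workers are separate processes, and CPU-bound Python threads will not actually run faster.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_top_k(chunk, query, k):
    """One 'worker': scan its share of rows and keep only its k best (distance, id) pairs."""
    return heapq.nsmallest(k, ((l2_distance(emb, query), row_id) for row_id, emb in chunk))


def parallel_knn(rows, query, k=5, workers=4):
    """Split a list of (id, embedding) rows across workers, then merge (the 'gather')."""
    chunks = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda chunk: local_top_k(chunk, query, k), chunks))
    # The global top k is always contained in the union of the per-worker top k.
    merged = heapq.nsmallest(k, (cand for part in partials for cand in part))
    return [row_id for _, row_id in merged]
```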
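If vectors are normalized to length 1 (as OpenAI embeddings are), use inner product for the best performance. <#> returns the negative inner product, so ascending order still puts the most similar rows first: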
```sql
SELECT * FROM items ORDER BY embedding <#> '[3,1,2]' LIMIT 5;
```
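Inner product is enough for normalized vectors because, for unit-length vectors a and b, cosine similarity equals dot(a, b) and |a - b|^2 = 2 - 2*dot(a, b). Ordering by <#> (negative inner product), <=> (cosine distance), or <-> (L2 distance) therefore gives the same ranking, and the inner product is the cheapest of the three to compute. The sketch below checks this on a few hypothetical vectors.

```python
import math


def normalize(v):
    """Scale v to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rankings(raw_rows, raw_query):
    """Rank normalized rows against a normalized query with three distances."""
    rows = [(row_id, normalize(v)) for row_id, v in raw_rows]
    q = normalize(raw_query)

    def order(distance):
        return [row_id for row_id, _ in sorted(rows, key=lambda r: distance(r[1]))]

    return {
        # <#>: negative inner product
        "neg_inner_product": order(lambda v: -dot(v, q)),
        # <=>: full cosine distance, norms included
        "cosine": order(lambda v: 1 - dot(v, q) / (math.sqrt(dot(v, v)) * math.sqrt(dot(q, q)))),
        # <->: Euclidean distance
        "l2": order(lambda v: math.sqrt(sum((x - y) ** 2 for x, y in zip(v, q)))),
    }


result = rankings([(1, [1, 2, 3]), (2, [4, 5, 6]), (3, [3, 1, 2]), (4, [1, 0, 0])], [3, 1, 2])
print(result)  # all three orders are [3, 2, 4, 1]
```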
To speed up queries on an IVFFlat index, increase the number of inverted lists (at the expense of recall):
```sql
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);
```
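The recall trade-off comes from how an IVF index searches. Vectors are grouped into lists clusters, and a query scans only the lists whose centers are nearest to it; pgvector's ivfflat.probes setting (1 by default) controls how many lists are scanned. More lists means fewer rows per list and a faster scan, but also a higher chance that a true neighbor sits in a list that was not probed. The toy below illustrates this. It picks random data points as centers instead of running the k-means clustering pgvector uses, and every name in it is hypothetical.

```python
import math
import random


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_ivf(vectors, lists, seed=0):
    """Toy IVF build: choose `lists` centers, then put each vector in the inverted
    list of its nearest center. (pgvector runs k-means to find the centers.)"""
    rng = random.Random(seed)
    centers = rng.sample(vectors, lists)
    inverted = [[] for _ in range(lists)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(lists), key=lambda c: l2_distance(vec, centers[c]))
        inverted[nearest].append(idx)
    return centers, inverted


def ivf_search(vectors, centers, inverted, query, k=5, probes=1):
    """Scan only the `probes` lists whose centers are closest to the query."""
    probed = sorted(range(len(centers)), key=lambda c: l2_distance(query, centers[c]))[:probes]
    candidates = [idx for c in probed for idx in inverted[c]]
    return sorted(candidates, key=lambda idx: l2_distance(vectors[idx], query))[:k]


def recall(found, exact):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    return len(set(found) & set(exact)) / len(exact)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [[rng.random() for _ in range(3)] for _ in range(2000)]
    query = [0.5, 0.5, 0.5]
    exact = sorted(range(len(data)), key=lambda i: l2_distance(data[i], query))[:10]
    for lists in (10, 100):
        centers, inverted = build_ivf(data, lists)
        found = ivf_search(data, centers, inverted, query, k=10, probes=1)
        print(lists, "lists, 1 probe: recall", recall(found, exact))
```

Raising probes (SET ivfflat.probes = 10; in psql) recovers recall at the cost of scanning more rows. As starting points, the pgvector README suggests lists = rows / 1000 for up to 1M rows (sqrt(rows) beyond that) and probes = sqrt(lists).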
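Vacuuming a table with an HNSW index can take a while. Speed it up by reindexing first (index_name and table_name are placeholders for your own index and table):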
```sql
REINDEX INDEX CONCURRENTLY index_name;
VACUUM table_name;
```
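4. Summary
Installing the vector database pgvector on a Flexus X cloud server instance went smoothly and was quick, which shows the instance's security and stability, and the server was convenient to use. The Flexus X instance also supports custom system disk specifications and capacity, as well as multiple types of data disks. Give it a try!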