828华为云征文 | 云服务器Flexus X实例:向量数据库 pgvector 部署,实现向量检索

目录

[一、什么是向量数据库 pgvector ?](#一、什么是向量数据库 pgvector ?)

[二、pgvector 部署](#二、pgvector 部署)

[2.1 安装 Docker](#2.1 安装 Docker)

[2.2 拉取镜像](#2.2 拉取镜像)

[2.3 添加规则](#2.3 添加规则)

[三、pgvector 运行](#三、pgvector 运行)

[3.1 运行 pgvector](#3.1 运行 pgvector)

[3.2 连接 pgvector](#3.2 连接 pgvector)

[3.3 pgvector 常见操作](#3.3 pgvector 常见操作)

四、总结


本篇文章通过 云服务器Flexus X实例 部署向量数据库 pgvector,实现向量的相似性检索和存储。云服务器Flexus X实例 能够为 向量数据库 pgvector 提供稳定和安全的运行环境,并且,云服务器Flexus X实例 适用于中负载业务,且期望资源灵活选配的中小企业和开发者,具有灵活自定义规格、性能稳定强劲、按需灵活计费的优势。

一、什么是向量数据库 pgvector ?

​ Postgres 的开源向量相似性搜索,将向量与其余数据一起存储。pgvector是一个提供向量相似性搜索功能的开源 PostgreSQL 扩展,现已发布v0.7.0。此新版本包含许多新功能和性能特性,用于支持 PostgreSQL 中的向量相似性搜索工作负载。

支持如下功能:

(1)精确和近似最近邻搜索;

(2)单精度、半精度、二进制和稀疏向量;

(3)L2 距离、内积、余弦距离、L1 距离、汉明距离和杰卡德距离;

(4)任何具有 Postgres 客户端的语言。 ​

下面在 云服务器Flexus X实例 上部署 pgvector。

二、pgvector 部署

2.1 安装 Docker

然后,执行命令安装 docker,如下所示。

bash 复制代码
root@flexusx-7305:~# sudo apt install docker-ce

查看 docker 版本。

bash 复制代码
root@flexusx-7305:~# docker --version
Docker version 27.2.1, build 9e34c9b
root@flexusx-7305:~#

最后,安装 docker-compose,执行如下命令。

bash 复制代码
root@flexusx-7305:~# sudo apt install docker-compose
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  redis-server redis-tools
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  python3-cached-property python3-docker python3-dockerpty python3-docopt python3-importlib-metadata python3-jsonschema python3-more-itertools python3-pyrsistent python3-texttable python3-websocket python3-zipp
Suggested packages:
  python-jsonschema-doc
Recommended packages:
  docker.io
The following NEW packages will be installed:
  docker-compose python3-cached-property python3-docker python3-dockerpty python3-docopt python3-importlib-metadata python3-jsonschema python3-more-itertools python3-pyrsistent python3-texttable python3-websocket python3-zipp
0 upgraded, 12 newly installed, 0 to remove and 33 not upgraded.
Need to get 412 kB of archives.
After this operation, 2,414 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.huaweicloud.com/ubuntu focal/universe amd64 python3-cached-property all 1.5.1-4 [10.9 kB]
Get:2 http://repo.huaweicloud.com/ubuntu focal/universe amd64 python3-websocket all 0.53.0-2ubuntu1 [

到这里 Docker 安装完成。

2.2 拉取镜像

拉取 pgvector 镜像,执行如下命令。

bash 复制代码
root@flexusx-7305:~# docker pull registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
v0.7.0: Pulling from fastgpt/pgvector
Digest: sha256:27df42f0d0be8d5623ff1aea5fea7134e175af1cdef62d9df00b322a3c85edc9
Status: Image is up to date for registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
root@flexusx-7305:~#

如上所示,已经存在镜像,如果没有的话,可以通过如上方式拉取。

2.3 添加规则

pgvector 对应的端口是 5432,需要将 5432 端口加入到准入规则中。

首先,在基本信息中,找到安全组,点击进入安全组,如下所示。

然后,点击 配置规则 配置 5432 端口,如下所示。

设置优先级,然后在协议端口中添加端口,点击确定,如下所示。

可以看到 5432 端口已经被加入到安全规则中,如下所示。

三、pgvector 运行

3.1 运行 pgvector

首先,查看一下本地 pgvector 镜像,执行如下命令。

bash 复制代码
root@flexusx-7305:~# docker images
REPOSITORY                                                            TAG       IMAGE ID       CREATED         SIZE
registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox             latest    0f26cf6654ad   2 weeks ago     315MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt                     v4.8.9    bc394a806301   6 weeks ago     356MB
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/gitea/gitea        1.22.1    b3de72970178   2 months ago    167MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api                     v0.6.6    40efbc4449c7   4 months ago    79.5MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector                    v0.7.0    6e0cb183450e   5 months ago    429MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql                       8.0.36    f5f171121fa3   6 months ago    603MB
swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/justsong/one-api   v0.6.0    36bd98ce5a7c   7 months ago    48.4MB
registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo                       5.0.18    021e1bd71d92   16 months ago   662MB
daocloud.io/library/mysql                                             8         26d0ac143221   3 years ago     546MB
daocloud.io/library/mysql                                             latest    8457e9155715   3 years ago     546MB
root@flexusx-7305:~# 

如上所示,pgvector 对应的镜像是 registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0。

然后,执行 docker run 命令运行容器,执行如下命令。

bash 复制代码
root@flexusx-7305:~# docker run --name pgvectorface --restart=always -e POSTGRES_USER=pgvectorface -e POSTGRES_PASSWORD=pgvector -p 54333:5432 -d registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0
982a9ed7352450eb192c04ca9f7dbf31bfd9d1ccf9af4a234c85dc85d4338e41
root@flexusx-7305:~# docker ps
CONTAINER ID   IMAGE                                                                        COMMAND                  CREATED          STATUS          PORTS                                                  NAMES
982a9ed73524   registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0                    "docker-entrypoint.s..."   11 seconds ago   Up 11 seconds   0.0.0.0:54333->5432/tcp, [::]:54333->5432/tcp          pgvectorface
68a1f9a73e58   registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt:v4.8.9                     "sh -c 'node --max-o..."   10 days ago      Up 10 days      0.0.0.0:3000->3000/tcp, :::3000->3000/tcp              fastgpt
b57af8cd1b6b   registry.cn-hangzhou.aliyuncs.com/fastgpt/one-api:v0.6.6                     "/one-api"               10 days ago      Up 10 days      0.0.0.0:3001->3000/tcp, [::]:3001->3000/tcp            oneapi
2de37c379c6a   registry.cn-hangzhou.aliyuncs.com/fastgpt/mysql:8.0.36                       "docker-entrypoint.s..."   10 days ago      Up 10 days      0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp   mysql
9d7906452f26   registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-sandbox:latest             "docker-entrypoint.s..."   10 days ago      Up 10 days                                                             sandbox
6f9c7f088d9d   registry.cn-hangzhou.aliyuncs.com/fastgpt/mongo:5.0.18                       "bash -c 'openssl ra..."   10 days ago      Up 10 days      0.0.0.0:27017->27017/tcp, :::27017->27017/tcp          mongo
3867cf7f6df9   registry.cn-hangzhou.aliyuncs.com/fastgpt/pgvector:v0.7.0                    "docker-entrypoint.s..."   10 days ago      Up 10 days      0.0.0.0:5432->5432/tcp, :::5432->5432/tcp              pg
89bb9f7a3dd1   swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/justsong/one-api:v0.6.0   "/one-api"               12 days ago      Up 11 days      0.0.0.0:3002->3000/tcp, [::]:3002->3000/tcp            one-api
65fe1c102df6   daocloud.io/library/mysql:8                                                  "docker-entrypoint.s..."   2 weeks ago      Up 11 days      3306/tcp, 33060/tcp                                    root_db_1
root@flexusx-7305:~#

如上所示, pgvector 已经运行成功。

3.2 连接 pgvector

安装 pgvector 客户端,安装软件包 postgresql-client-common 和 postgresql-client,执行如下命令安装。

bash 复制代码
root@flexusx-7305:~# apt install postgresql-client-common
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libcgi-fast-perl libcgi-pm-perl libencode-locale-perl libevent-core-2.1-7 libevent-pthreads-2.1-7 libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl
  liblwp-mediatypes-perl libmecab2 libtimedate-perl liburi-perl mecab-ipadic mecab-ipadic-utf8 mecab-utils mysql-server-core-8.0 redis-server redis-tools
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
  postgresql-client-common
0 upgraded, 1 newly installed, 0 to remove and 61 not upgraded.
Need to get 28.2 kB of archives.
After this operation, 182 kB of additional disk space will be used.
Get:1 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client-common all 214ubuntu0.1 [28.2 kB]
Fetched 28.2 kB in 0s (314 kB/s)                    
Selecting previously unselected package postgresql-client-common.
(Reading database ... 123209 files and directories currently installed.)
Preparing to unpack .../postgresql-client-common_214ubuntu0.1_all.deb ...
Unpacking postgresql-client-common (214ubuntu0.1) ...
Setting up postgresql-client-common (214ubuntu0.1) ...
Processing triggers for man-db (2.9.1-1) ...
root@flexusx-7305:~# apt-get install postgresql-client
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libcgi-fast-perl libcgi-pm-perl libencode-locale-perl libevent-core-2.1-7 libevent-pthreads-2.1-7 libfcgi-perl libhtml-parser-perl libhtml-tagset-perl libhtml-template-perl libhttp-date-perl libhttp-message-perl libio-html-perl
  liblwp-mediatypes-perl libmecab2 libtimedate-perl liburi-perl mecab-ipadic mecab-ipadic-utf8 mecab-utils mysql-server-core-8.0 redis-server redis-tools
Use 'apt autoremove' to remove them.
The following additional packages will be installed:
  libpq5 postgresql-client-12
Suggested packages:
  postgresql-12 postgresql-doc-12
The following NEW packages will be installed:
  libpq5 postgresql-client postgresql-client-12
0 upgraded, 3 newly installed, 0 to remove and 61 not upgraded.
Need to get 1,176 kB of archives.
After this operation, 4,303 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 libpq5 amd64 12.20-0ubuntu0.20.04.1 [117 kB]
Get:2 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client-12 amd64 12.20-0ubuntu0.20.04.1 [1,055 kB]
Get:3 http://repo.huaweicloud.com/ubuntu focal-updates/main amd64 postgresql-client all 12+214ubuntu0.1 [3,940 B]
Fetched 1,176 kB in 0s (6,884 kB/s)           
Selecting previously unselected package libpq5:amd64.
(Reading database ... 123246 files and directories currently installed.)
Preparing to unpack .../libpq5_12.20-0ubuntu0.20.04.1_amd64.deb ...
Unpacking libpq5:amd64 (12.20-0ubuntu0.20.04.1) ...
Selecting previously unselected package postgresql-client-12.
Preparing to unpack .../postgresql-client-12_12.20-0ubuntu0.20.04.1_amd64.deb ...
Unpacking postgresql-client-12 (12.20-0ubuntu0.20.04.1) ...
Selecting previously unselected package postgresql-client.
Preparing to unpack .../postgresql-client_12+214ubuntu0.1_all.deb ...
Unpacking postgresql-client (12+214ubuntu0.1) ...
Setting up libpq5:amd64 (12.20-0ubuntu0.20.04.1) ...
Setting up postgresql-client-12 (12.20-0ubuntu0.20.04.1) ...
update-alternatives: using /usr/share/postgresql/12/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (12+214ubuntu0.1) ...
Processing triggers for libc-bin (2.31-0ubuntu9.16) ...
root@flexusx-7305:~# 

如上所示,软件包安装成功。

然后,通过 psql 客户端连接 pgvector,如下所示。

bash 复制代码
root@flexusx-7305:~# psql -h 0.0.0.0 -p 54333 -U pgvectorface
Password for user pgvectorface: 
psql (12.20 (Ubuntu 12.20-0ubuntu0.20.04.1), server 15.6 (Debian 15.6-1.pgdg120+2))
WARNING: psql major version 12, server major version 15.
         Some psql features might not work.
Type "help" for help.

pgvectorface=# 

3.3 pgvector 常见操作

用于 EXPLAIN ANALYZE 调试性能。

bash 复制代码
EXPLAIN ANALYZE SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;

为了加快没有索引的查询,请增加max_parallel_workers_per_gather。

bash 复制代码
SET max_parallel_workers_per_gather = 4;

如果向量标准化为长度 1(如OpenAI 嵌入),则使用内积可获得最佳性能。

bash 复制代码
SELECT * FROM items ORDER BY embedding <#> '[3,1,2]' LIMIT 5;

为了加快使用 IVFFlat 索引的查询速度,请增加倒排列表的数量(以牺牲召回率为代价)。

bash 复制代码
CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 1000);

清理 HNSW 索引可能需要一段时间。请先重新索引以加快速度。

bash 复制代码
REINDEX INDEX CONCURRENTLY index_name;
VACUUM table_name;

四、总结

通过在 云服务器Flexus X实例 上安装向量数据库 pgvector,展现了 云服务器Flexus X实例 的安全和稳定,在部署的过程中也非常顺利,能够快速实现部署,服务器使用很方便,并且 云服务器Flexus X实例支持自定义配置系统盘规格及容量,支持多个不同类型的数据盘,赶紧用起来吧!

相关推荐
Gauss松鼠会15 小时前
GaussDB 企业版轻量化部署探索(二)
数据库·人工智能·docker·华为云·gaussdb
HaoHao_01016 小时前
云消息队列 Kafka 版
分布式·阿里云·kafka·云计算·云服务器
HaoHao_0102 天前
云消息队列 RabbitMQ 版
阿里云·云计算·云服务器
HaoHao_0102 天前
应用实时监控服务ARMS
阿里云·云计算·云服务器
play_big_knife2 天前
鸿蒙项目云捐助第二十讲云捐助项目物联网IOT的使用
物联网·华为·华为云·harmonyos·鸿蒙·鸿蒙开发·iot开发
play_big_knife2 天前
鸿蒙项目云捐助第十五讲云数据库的初步使用
数据库·华为云·harmonyos·鸿蒙·云开发·云数据库·鸿蒙开发
枫叶丹42 天前
【HarmonyOS之旅】HarmonyOS开发基础知识(二)
华为od·华为·华为云·harmonyos
終不似少年遊*2 天前
华为云计算HCIE笔记01
运维·网络·笔记·华为云·云计算·hcie
play_big_knife2 天前
鸿蒙项目云捐助第十六讲云捐助使用云数据库实现登录注册
数据库·华为云·harmonyos·鸿蒙·云开发·云数据库·鸿蒙开发
枫叶丹44 天前
【HarmonyOS之旅】HarmonyOS开发基础知识(一)
华为od·华为·华为云·harmonyos