This topic describes how to run a grouped similarity search in a Collection through the HTTP API.
Prerequisites
Method and URL
HTTP
POST https://{Endpoint}/v1/collections/{CollectionName}/query_group_by
Usage examples
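Note
- Replace YOUR_API_KEY in the examples with your api-key and YOUR_CLUSTER_ENDPOINT with your Cluster Endpoint; the code will not run otherwise.
- The examples assume that a Collection named group_by_demo has already been created and populated with some data, as described in the grouped vector search topic.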
Grouped similarity search by vector
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Example output
{
    "code": 0,
    "request_id": "d6df634a-683d-445e-abe0-d547091d6b3a",
    "message": "Success",
    "output": [
        {
            "docs": [
                {
                    "id": "4",
                    "vector": [
                        0.621783971786499,
                        0.5220040082931519,
                        0.8403469920158386,
                        0.995602011680603
                    ],
                    "fields": {
                        "document_id": "paper-02",
                        "content": "xxxD",
                        "chunk_id": 2
                    },
                    "score": 0.028402328
                }
            ],
            "group_id": "paper-02"
        },
        {
            "docs": [
                {
                    "id": "1",
                    "vector": [
                        0.26870301365852356,
                        0.8718249797821045,
                        0.6066280007362366,
                        0.6342290043830872
                    ],
                    "fields": {
                        "document_id": "paper-01",
                        "content": "xxxA",
                        "chunk_id": 1
                    },
                    "score": 0.08141637
                }
            ],
            "group_id": "paper-01"
        },
        {
            "docs": [
                {
                    "id": "6",
                    "vector": [
                        0.661965012550354,
                        0.730430006980896,
                        0.6105219721794128,
                        0.22164000570774078
                    ],
                    "fields": {
                        "document_id": "paper-03",
                        "content": "xxxF",
                        "chunk_id": 1
                    },
                    "score": 0.2513085
                }
            ],
            "group_id": "paper-03"
        }
    ]
}
Grouped similarity search by primary key (using the vector stored for that key)
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Grouped similarity search with a filter
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "filter": "chunk_id > 1",
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Grouped vector search with a Sparse Vector
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "sparse_vector":{"1":0.4, "10000":0.6, "222222":0.8},
    "group_by_field": "document_id",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true
  }' https://YOUR_CLUSTER_ENDPOINT/v1/collections/group_by_demo/query_group_by
Grouped search using one vector of a multi-vector Collection
Shell
curl -XPOST \
  -H 'dashvector-auth-token: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "group_by_field": "author",
    "group_topk": 1,
    "group_count": 3,
    "include_vector": true,
    "vector_field": "title"
}' https://YOUR_CLUSTER_ENDPOINT/v1/collections/multi_vector_demo/query_group_by
# example output
#{
#    "code": 0,
#    "request_id": "b6f4997e-97e0-4d9b-9d3f-0659f4499305",
#    "message": "Success",
#    "output": [
#        {
#            "docs": [
#                {
#                    "id": "2",
#                    "vectors": {
#                        "title": [
#                            0.10000000149011612,
#                            0.20000000298023224,
#                            0.30000001192092896,
#                            0.4000000059604645
#                        ]
#                    },
#                    "fields": {
#                        "author": "zhangsan"
#                    },
#                    "score": 0.0
#                }
#            ],
#            "group_id": "zhangsan"
#        },
#        {
#            "docs": [
#                {
#                    "id": "1",
#                    "vectors": {
#                        "title": [
#                            0.30000001192092896,
#                            0.4000000059604645,
#                            0.5,
#                            0.6000000238418579
#                        ],
#                        "content": [
#                            0.30000001192092896,
#                            0.4000000059604645,
#                            0.5,
#                            0.6000000238418579,
#                            0.699999988079071,
#                            0.800000011920929
#                        ]
#                    },
#                    "fields": {
#                        "author": null
#                    },
#                    "score": 0.16000001
#                }
#            ]
#        }
#    ]
#}
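In the example output above, documents from a multi-vector Collection carry a `vectors` object keyed by vector name rather than a single `vector` array. The following is a minimal Python sketch for reading one named vector per document. `vectors_by_doc` is a hypothetical helper name, the sample data is abridged and rounded from the example output, and the fallback to `vector` for single-vector Collections is an assumption drawn from the two example outputs on this page.

```python
def vectors_by_doc(response, vector_field=None):
    """Map doc id -> vector for every doc in a query_group_by response."""
    result = {}
    for group in response.get("output", []):
        for doc in group.get("docs", []):
            if vector_field is not None:
                # Multi-vector shape: {"vectors": {"title": [...], ...}}
                vec = doc.get("vectors", {}).get(vector_field)
            else:
                # Single-vector shape: {"vector": [...]}
                vec = doc.get("vector")
            result[doc["id"]] = vec
    return result


# Abridged, rounded sample in the shape of the example output (illustrative).
sample = {
    "code": 0,
    "output": [
        {"group_id": "zhangsan",
         "docs": [{"id": "2", "vectors": {"title": [0.1, 0.2, 0.3, 0.4]}}]},
        {"docs": [{"id": "1", "vectors": {"title": [0.3, 0.4, 0.5, 0.6]}}]},
    ],
}
print(vectors_by_doc(sample, vector_field="title"))
```

A document that lacks the requested vector maps to `None`, so callers can tell missing vectors apart from empty ones.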
Request parameters
Note
Exactly one of vector and id must be provided, and the one provided must not be empty.
| Parameter             | Location | Type  | Required | Description |
|-----------------------|----------|-------|----------|-------------|
| {Endpoint}            | path     | str   | Yes      | Endpoint of the Cluster; shown on the Cluster details page in the console |
| {CollectionName}      | path     | str   | Yes      | Collection name |
| dashvector-auth-token | header   | str   | Yes      | api-key |
| group_by_field        | body     | str   | Yes      | Field whose values are used to group the search results; schema-free fields are not currently supported |
| group_count           | body     | int   | No       | Maximum number of groups to return. Best effort: the search generally returns group_count groups |
| group_topk            | body     | int   | No       | Number of similar results to return per group. Best effort, with lower priority than group_count |
| vector                | body     | array | No       | Vector data |
| sparse_vector         | body     | dict  | No       | Sparse vector |
| id                    | body     | str   | No       | Primary key; the search uses the vector stored for this key |
| filter                | body     | str   | No       | Filter condition; must follow SQL WHERE clause syntax |
| include_vector        | body     | bool  | No       | Whether to return vector data; defaults to false |
| output_fields         | body     | array | No       | List of field names to return; all fields are returned by default |
| vector_field          | body     | str   | No       | Name of the vector to search with when grouping in a multi-vector Collection |
| partition             | body     | str   | No       | Partition name |
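The constraints in the table can be checked on the client before a request is sent. The following is a minimal Python sketch, not an official SDK: `build_group_query` and `make_request` are hypothetical helper names, and only the standard library is used. It assumes that exactly one of `vector` and `id` should be supplied, as the note above states. It only builds the request object and does not send it.

```python
import json
import urllib.request


def build_group_query(group_by_field, vector=None, doc_id=None, **options):
    """Build the JSON body for query_group_by (hypothetical helper)."""
    # The parameter note treats vector and id as alternatives; this sketch
    # assumes exactly one of them should be provided.
    if (vector is None) == (doc_id is None):
        raise ValueError("provide exactly one of 'vector' or 'doc_id'")
    body = {"group_by_field": group_by_field}
    if vector is not None:
        body["vector"] = list(vector)
    else:
        body["id"] = doc_id
    # Optional parameters from the table, e.g. group_count, group_topk,
    # filter, include_vector, output_fields, vector_field, partition.
    body.update({k: v for k, v in options.items() if v is not None})
    return body


def make_request(endpoint, collection, api_key, body):
    """Wrap the body in a POST request object without sending it."""
    url = f"https://{endpoint}/v1/collections/{collection}/query_group_by"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "dashvector-auth-token": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = build_group_query("document_id", vector=[0.1, 0.2, 0.3, 0.4],
                         group_count=3, group_topk=1)
req = make_request("YOUR_CLUSTER_ENDPOINT", "group_by_demo", "YOUR_API_KEY", body)
print(req.get_method(), req.full_url)
```

With a real endpoint and api-key, the request could then be sent with `urllib.request.urlopen(req)` and the response body decoded with `json.load`.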
Response parameters
| Field      | Type  | Description                                            | Example                              |
|------------|-------|--------------------------------------------------------|--------------------------------------|
| code       | int   | Return code; see the status code reference             | 0                                    |
| message    | str   | Return message                                         | Success                              |
| request_id | str   | Unique request ID                                      | 19215409-ea66-4db9-8764-26ce2eb5bb99 |
| output     | array | Grouped similarity search results, as a list of Groups |                                      |
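To illustrate how the `output` array is structured, the sketch below walks a parsed response and lists the documents in each group. `summarize_groups` is a hypothetical helper name, and the sample payload is abridged from the example output earlier on this page. Treating any non-zero `code` as a failure is an assumption based on the success value `0` shown in the table.

```python
import json


def summarize_groups(response):
    """Return (group_id, doc_id, score) tuples from a query_group_by response."""
    # Assumption: code 0 means success, as in the example output.
    if response.get("code") != 0:
        raise RuntimeError(f"request failed: {response.get('message')}")
    rows = []
    for group in response.get("output", []):
        # group_id can be absent from a group, so .get() is used.
        group_id = group.get("group_id")
        for doc in group.get("docs", []):
            rows.append((group_id, doc["id"], doc["score"]))
    return rows


# Abridged sample in the shape of the example output on this page.
sample = json.loads("""
{
  "code": 0,
  "message": "Success",
  "output": [
    {"group_id": "paper-02", "docs": [{"id": "4", "score": 0.028402328}]},
    {"group_id": "paper-01", "docs": [{"id": "1", "score": 0.08141637}]}
  ]
}
""")

for group_id, doc_id, score in summarize_groups(sample):
    print(group_id, doc_id, score)
```

Flattening the groups into tuples keeps the group order returned by the service and makes the result easy to load into a table or log.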