Examples of getting the total document count in Elasticsearch

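There are two main ways to get the total number of documents in an index with the Elasticsearch Python client: the Count API and the Search API.

Drawing on material found online, this post gives a usage example for each.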

1 Count API

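1.1 Code example

The Count API returns the number of documents matching a query directly, which makes it the simplest and most efficient option.

The following example shows how to use it.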

from elasticsearch import Elasticsearch

# Create the client connection (adjust to your actual configuration)
es = Elasticsearch("http://localhost:9200")

# Get the total number of documents in the index "my_index"
response = es.count(index="my_index")
total_docs = response["count"]
print(f"Total documents: {total_docs}")
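
Besides count, a Count API response also carries a _shards summary. The sketch below reads both fields from a response dictionary, so a caller can tell whether every shard contributed to the number. The helper name summarize_count and the sample responses are made up for illustration; they are not part of elasticsearch-py.

```python
def summarize_count(response):
    """Return (count, complete) from a Count API response dict.

    `complete` is False when any shard reported a failure, in which case
    the count only covers the shards that answered.
    """
    shards = response.get("_shards", {})
    failed = shards.get("failed", 0)
    return response["count"], failed == 0


# Hand-written sample responses (illustrative values, not real output).
ok = {"count": 42, "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0}}
partial = {"count": 30, "_shards": {"total": 5, "successful": 4, "skipped": 0, "failed": 1}}

print(summarize_count(ok))       # (42, True)
print(summarize_count(partial))  # (30, False)
```

With a live client, the same helper could be applied to the value returned by es.count(index="my_index").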

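1.2 Counting with a query

To count only the documents that match a specific query, pass the query in body.

For example, {"query": {"match": {"status": "active"}}}

The call looks like this.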

response = es.count(index="my_index", body={"query": {"match": {"status": "active"}}})
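
As a small extension of the same call, the sketch below issues one count request per status value and collects the results in a dictionary. The function name count_by_status and the "inactive" value are assumptions made for illustration; the only client call used is count(index=..., body=...), the same one shown above.

```python
def count_by_status(client, index, statuses):
    """Return {status: count} by issuing one Count API call per status."""
    counts = {}
    for status in statuses:
        # Same query shape as the single-status example.
        body = {"query": {"match": {"status": status}}}
        response = client.count(index=index, body=body)
        counts[status] = response["count"]
    return counts


# Usage, assuming `es` is an Elasticsearch client connected to a cluster:
# count_by_status(es, "my_index", ["active", "inactive"])
```

This costs one round trip per status, which is fine for a handful of values.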

2 Search API

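2.1 Code example

If you are already familiar with search operations, you can also run a search with size=0 and read the total from hits.total in the response.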

response = es.search(
    index="my_index",
    body={
        "query": {"match_all": {}},   # match all documents
        "size": 0                     # do not return document contents
    },
    track_total_hits=True              # make sure the exact total is returned (ES 7.x+)
)

total_docs = response["hits"]["total"]["value"]
print(f"Total documents: {total_docs}")

2.2 track_total_hits

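When counting with the Search API, you usually need to set track_total_hits as well.

Before Elasticsearch 7.0, hits.total was a plain integer and was exact by default.

In 7.x and later, hits.total is an object with value and relation fields, and track_total_hits defaults to 10000. The total is exact only up to 10,000 hits; beyond that, value stays at 10000 and relation is "gte". Setting track_total_hits to True returns the exact total, but may affect performance.

The Count API has no such limit and always returns an exact count.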
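
To make the difference concrete, here is a minimal sketch that reads hits.total from a search response and reports whether the number is exact. It handles both the object form (value plus relation) and a plain integer. The helper name extract_total and the sample responses are illustrative assumptions, not real cluster output.

```python
def extract_total(response):
    """Return (total, exact) from a Search API response dict."""
    total = response["hits"]["total"]
    if isinstance(total, int):
        # Plain integer form: the number is the exact total.
        return total, True
    # Object form: "eq" means exact, "gte" means a lower bound.
    return total["value"], total.get("relation", "eq") == "eq"


# Hand-written sample responses for illustration.
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}, "hits": []}}
exact = {"hits": {"total": {"value": 123456, "relation": "eq"}, "hits": []}}
legacy = {"hits": {"total": 57, "hits": []}}

print(extract_total(capped))  # (10000, False)
print(extract_total(exact))   # (123456, True)
print(extract_total(legacy))  # (57, True)
```

When exact comes back False, the value is only a lower bound, and rerunning the search with track_total_hits=True or switching to the Count API gives the real number.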

3 Complete example

Below is a complete example with exception handling.

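from elasticsearch import Elasticsearch, exceptions

def get_document_count(es_client, index_name):
    """Return the document count of index_name, 0 if the index is missing, None on other errors."""
    try:
        resp = es_client.count(index=index_name)
        return resp["count"]
    except exceptions.NotFoundError:
        # The index does not exist, so report it and treat the count as 0
        print(f"Index '{index_name}' does not exist")
        return 0
    except Exception as e:
        # Any other failure (connection error, timeout, ...) yields None
        print(f"Query failed: {e}")
        return None

if __name__ == "__main__":
    es = Elasticsearch("http://localhost:9200")
    count = get_document_count(es, "my_index")
    if count is not None:
        print(f"Total documents: {count}")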

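Points to note:

1) Client version

The code above works with elasticsearch-py 7.x and 8.x.

In 8.x, the client expects a full URL including scheme and port, as in Elasticsearch("http://localhost:9200"), or the equivalent value passed through the hosts parameter.

2) Performance

The Count API is lighter than the Search API, so prefer it when possible.

3) Accuracy

The Count API always returns an exact count. The Search API depends on track_total_hits: if it is not set, the total is exact only up to 10,000. If all you need is a count, the Count API is the simplest and most reliable choice.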
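
On the client-version note, a quick pre-check of the connection string can catch a missing scheme or port before the client is constructed. The sketch below uses only the standard library. The helper name is_full_node_url is hypothetical and is not part of elasticsearch-py; it only approximates the "scheme, host, and port" shape described above.

```python
from urllib.parse import urlsplit


def is_full_node_url(url):
    """Return True if url has an http(s) scheme, a host, and an explicit port."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname) and port is not None


print(is_full_node_url("http://localhost:9200"))  # True
print(is_full_node_url("localhost:9200"))         # False
print(is_full_node_url("http://localhost"))       # False
```

This is only a convenience check on the string; the client performs its own validation when it is created.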

References

Count API

https://www.elastic.co/guide/en/elasticsearch/reference/8.18/search-count.html

The track_total_hits setting in Elasticsearch

https://cloud.tencent.cn/developer/article/2558206

count

https://pyes.readthedocs.io/en/stable/guide/reference/api/count.html

hits > total: limited to 10,000 records, raising the limit

https://cloud.tencent.cn/developer/ask/sof/108579241