ES queries
Full data set
```
GET wkl_test/_search
{
  "query": {
    "match_all": {}
  }
}
```
Result:
```json
{
  "took" : 123,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "aK0tFpABTkLj5j4c34pE",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "aa" : 1
        }
      },
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "aa0uFpABTkLj5j4cFYrJ",
        "_score" : 1.0,
        "_source" : {
          "name" : "lisi",
          "aa" : 2
        }
      },
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "aq0uFpABTkLj5j4cKYqF",
        "_score" : 1.0,
        "_source" : {
          "name" : "wangwu",
          "aa" : 2
        }
      },
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "a60uFpABTkLj5j4c2IoF",
        "_score" : 1.0,
        "_source" : {
          "name" : "maliu",
          "aa" : 2
        }
      },
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "bK1IFpABTkLj5j4cqYop",
        "_score" : 1.0,
        "_source" : {
          "name" : "gouqi",
          "aa" : 3
        }
      }
    ]
  }
}
```
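1: collapse (field collapsing): fetch the deduplicated list of documents (supported since ES 5.3)
- Why it is recommended: it is fast and uses little memory.
- Note: collapsing does not filter out documents that lack the collapse field.
- The collapse field must be a single-valued keyword or numeric (e.g. long) field with doc_values enabled.
- Field collapsing cannot be combined with scroll, rescore, or search_after.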
```
GET wkl_test/_search
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "aa"
  }
}
```
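Result: hits.total is still 5, but only the 3 deduplicated documents are returned, one per distinct value of aa. Each hit also carries its collapse key under fields.aa.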
```json
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "aK0tFpABTkLj5j4c34pE",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "aa" : 1
        },
        "fields" : {
          "aa" : [
            1
          ]
        }
      },
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "aa0uFpABTkLj5j4cFYrJ",
        "_score" : 1.0,
        "_source" : {
          "name" : "lisi",
          "aa" : 2
        },
        "fields" : {
          "aa" : [
            2
          ]
        }
      },
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "bK1IFpABTkLj5j4cqYop",
        "_score" : 1.0,
        "_source" : {
          "name" : "gouqi",
          "aa" : 3
        },
        "fields" : {
          "aa" : [
            3
          ]
        }
      }
    ]
  }
}
```
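To make the collapse result above easier to reason about, here is a minimal in-memory sketch of the same idea in plain Java. It is not Elasticsearch's implementation. It keeps the first hit seen for each distinct value of aa. Because every hit scores 1.0 under match_all, "first seen" stands in for "top hit of the group". The nested Doc record is a hypothetical model of the sample documents. The sketch assumes every document has the field, so ES's handling of documents without it is not modeled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CollapseSketch {
    // Hypothetical model of a hit's _source (name, aa); not an ES class.
    record Doc(String name, Integer aa) {}

    // Keep the first document for each distinct aa value, preserving hit order.
    // Assumption: every document has aa; missing-field handling is not modeled.
    static List<Doc> collapseByAa(List<Doc> hits) {
        Map<Integer, Doc> firstPerValue = new LinkedHashMap<>();
        for (Doc d : hits) {
            firstPerValue.putIfAbsent(d.aa(), d);
        }
        return new ArrayList<>(firstPerValue.values());
    }

    public static void main(String[] args) {
        // The five sample documents from wkl_test, in the order match_all returned them.
        List<Doc> hits = List.of(
                new Doc("zhangsan", 1), new Doc("lisi", 2), new Doc("wangwu", 2),
                new Doc("maliu", 2), new Doc("gouqi", 3));
        List<Doc> collapsed = collapseByAa(hits);
        // Mirrors the ES response: total counts all 5 hits, only 3 are returned.
        System.out.println("total = " + hits.size() + ", returned = " + collapsed.size());
        collapsed.forEach(d -> System.out.println(d.name() + " aa=" + d.aa()));
    }
}
```

Running main prints total = 5, returned = 3, followed by zhangsan, lisi and gouqi, matching the three hits in the collapse response.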
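2: cardinality: get the number of distinct values
- An aggregation with cardinality performs a distinct count, similar to COUNT(DISTINCT ...) in SQL: duplicates are removed first, then the remaining values are counted. The count is approximate (based on HyperLogLog++). Below precision_threshold (default 3000, maximum 40000) it is expected to be close to exact.
- Note: when counting distinct values this way, documents that lack the field are not counted.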
```
GET wkl_test/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "distinct_count": {
      "cardinality": {
        "field": "aa"
      }
    }
  }
}
```
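Result: distinct_count = 3, meaning there are 3 distinct values of aa. The aggregations section returns only the deduplicated count, not the documents themselves. Because size is 0, hits.hits is empty.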
```json
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "distinct_count" : {
      "value" : 3
    }
  }
}
```
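For comparison, the exact SQL-style COUNT(DISTINCT aa) can be written in plain Java with a HashSet. The sketch below is an illustration only. Unlike ES's cardinality aggregation, it is exact and holds every distinct value in memory, which is the cost the HyperLogLog++ approximation avoids. Missing values are modeled as null and skipped, matching the note that documents without the field are not counted. The sixth, field-less document in main is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DistinctCountSketch {
    // Exact distinct count: add each non-null value to a set, then return the set's size.
    // null stands for "document has no aa field" and is not counted.
    static long distinctCount(List<Integer> values) {
        Set<Integer> seen = new HashSet<>();
        for (Integer v : values) {
            if (v != null) {
                seen.add(v);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // aa values of the five sample documents, plus a hypothetical document without aa.
        List<Integer> aaValues = Arrays.asList(1, 2, 2, 2, 3, null);
        System.out.println("distinct_count = " + distinctCount(aaValues)); // prints "distinct_count = 3"
    }
}
```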
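3: Combining both
- With collapse, the deduplicated documents are returned, but hits.total still counts every matching document.
- With the cardinality aggregation, the aggregation result holds the correct distinct count, but hits.total still counts every matching document and the hits themselves are not deduplicated.
- So we combine the two in a single request: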
```
GET wkl_test/_search
{
  "query": {
    "match_all": {}
  },
  "collapse": {
    "field": "aa"
  },
  "aggs": {
    "distinct_count": {
      "cardinality": {
        "field": "aa"
      }
    }
  }
}
```
Result:
```json
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "aK0tFpABTkLj5j4c34pE",
        "_score" : 1.0,
        "_source" : {
          "name" : "zhangsan",
          "aa" : 1
        },
        "fields" : {
          "aa" : [
            1
          ]
        }
      },
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "aa0uFpABTkLj5j4cFYrJ",
        "_score" : 1.0,
        "_source" : {
          "name" : "lisi",
          "aa" : 2
        },
        "fields" : {
          "aa" : [
            2
          ]
        }
      },
      {
        "_index" : "wkl_test",
        "_type" : "_doc",
        "_id" : "bK1IFpABTkLj5j4cqYop",
        "_score" : 1.0,
        "_source" : {
          "name" : "gouqi",
          "aa" : 3
        },
        "fields" : {
          "aa" : [
            3
          ]
        }
      }
    ]
  },
  "aggregations" : {
    "distinct_count" : {
      "value" : 3
    }
  }
}
```
Note: use the cardinality aggregation's distinct_count as the total number of deduplicated results, and the collapsed hit list as the result set.
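How this maps onto pagination:
1. hits.total is the number of matching documents before deduplication, i.e. the raw count. Just be aware of it; pagination does not use it. In this example the hits array happens to contain exactly distinct_count entries, because all 3 groups fit within the default size of 10. Each entry is one deduplicated document.
2. aggregations.distinct_count.value is the actual number of results after deduplication, and it is the total to use for pagination. Keep in mind that cardinality is approximate for large numbers of distinct values.
3. from is the query offset, i.e. where to start.
4. size is the number of (collapsed) results to return per request.

From here you can treat it like an ordinary paginated query: just pass in from and size.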
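The pagination arithmetic described above can be sketched as follows. This is a minimal illustration, assuming 1-based page numbers. The class and method names are hypothetical, not part of any ES client. In a real query ES applies from/size to the collapsed hits on the server; the page helper only reproduces that slicing on a local list so the numbers can be checked.

```java
import java.util.List;

class CollapsePaging {
    // Offset for a 1-based page: skip the collapsed results of all earlier pages.
    static int fromForPage(int page, int size) {
        return (page - 1) * size;
    }

    // Total pages, derived from distinct_count rather than hits.total.
    static long totalPages(long distinctCount, int size) {
        return (distinctCount + size - 1) / size; // ceiling division
    }

    // Local stand-in for what ES does server-side: apply from/size to the collapsed hits.
    static <T> List<T> page(List<T> collapsed, int from, int size) {
        int start = Math.min(from, collapsed.size());
        int end = Math.min(start + size, collapsed.size());
        return collapsed.subList(start, end);
    }

    public static void main(String[] args) {
        List<String> collapsed = List.of("zhangsan", "lisi", "gouqi"); // collapsed hits from the example
        long distinctCount = 3; // aggregations.distinct_count.value
        int size = 2;
        long pages = totalPages(distinctCount, size);
        System.out.println("pages = " + pages);
        for (int p = 1; p <= pages; p++) {
            System.out.println("page " + p + ": " + page(collapsed, fromForPage(p, size), size));
        }
    }
}
```

With size = 2 this prints two pages, [zhangsan, lisi] and [gouqi]. Because distinct_count is approximate for large counts, a UI built on it should tolerate a last page that comes back slightly short.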
Java API usage
1: collapse: fetch the deduplicated result set
```java
// Use collapse to specify the field to deduplicate on, e.g. "your_distinct_field"
CollapseBuilder collapseBuilder = new CollapseBuilder("your_distinct_field");
searchSourceBuilder.collapse(collapseBuilder);
```
2: cardinality: get the number of distinct values
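```java
// Add a cardinality aggregation that counts the distinct values of the dedup field
CardinalityAggregationBuilder aggregation = AggregationBuilders
        .cardinality("distinct_count")      // name of the aggregation in the response
        .field("your_distinct_field")       // field whose distinct values are counted
        .precisionThreshold(40000);         // adjust as needed (default 3000, maximum 40000)
searchSourceBuilder.aggregation(aggregation);
```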
3: Putting it together
```java
package com.wenge.system.utils;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.metrics.CardinalityAggregationBuilder;
import org.elasticsearch.search.aggregations.metrics.ParsedCardinality;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.collapse.CollapseBuilder;

import java.io.IOException;
import java.util.Map;

/**
 * @author wangkanglu
 * @version 1.0
 * @description
 * @date 2024-06-17 16:48
 */
public class TestES {
    public static void main(String[] args) throws IOException {
        // Create the ES client; try-with-resources closes it when done
        try (RestHighLevelClient esClient = new RestHighLevelClient(
                RestClient.builder(new HttpHost("localhost", 9200, "http")))) {
            // Create a search request for the target index
            SearchRequest searchRequest = new SearchRequest("your_index");
            // Build the search source
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            // Query condition, e.g. match all documents; adapt it to your business logic
            searchSourceBuilder.query(QueryBuilders.matchAllQuery());
            // Pagination over the collapsed results: from = offset, size = page size (ES defaults shown)
            searchSourceBuilder.from(0);
            searchSourceBuilder.size(10);
            // Use collapse to specify the field to deduplicate on, e.g. "your_distinct_field"
            CollapseBuilder collapseBuilder = new CollapseBuilder("your_distinct_field");
            searchSourceBuilder.collapse(collapseBuilder);
            // Add a cardinality aggregation that counts the distinct values of the dedup field
            CardinalityAggregationBuilder aggregation = AggregationBuilders
                    .cardinality("distinct_count")   // name of the aggregation in the response
                    .field("your_distinct_field")    // field whose distinct values are counted
                    .precisionThreshold(40000);      // adjust as needed (default 3000, maximum 40000)
            searchSourceBuilder.aggregation(aggregation);
            // Attach the search source to the request
            searchRequest.source(searchSourceBuilder);
            // Execute the search
            SearchResponse searchResponse = esClient.search(searchRequest, RequestOptions.DEFAULT);
            // The collapsed hits: one document per distinct value
            SearchHit[] hits = searchResponse.getHits().getHits();
            for (SearchHit hit : hits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                System.out.println("Deduplicated hit: " + sourceAsMap);
            }
            // Read the distinct count from the aggregation; use it as the pagination total
            ParsedCardinality parsedCardinality = searchResponse.getAggregations().get("distinct_count");
            long distinctCount = parsedCardinality.getValue();
            System.out.println("Distinct count: " + distinctCount);
        }
    }
}
```