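1. The Concept of Aggregation
Aggregation differs from search. A search uses a set of conditions to retrieve documents from Elasticsearch (ES); an aggregation then processes the documents that the search returned.
Put simply, an aggregation summarizes your data as metrics, statistics, or other analytics. Aggregations help answer questions such as: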
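- What is the average load time of my website?
- Who are my most valuable customers, based on transaction volume?
- What would be considered a large file on my network?
- How many products are in each product category?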
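As these questions suggest, aggregation is a form of analytics, much like building a report. Report-style analysis relies on a few familiar operations: averages, maximums, and grouping by some field and then counting the documents in each group. ES supports three categories of aggregations to cover these needs.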
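- Metric aggregations: calculate metrics, such as a sum or an average, from field values.
- Bucket aggregations: group documents into buckets (also called bins) based on field values, ranges, or other criteria. You can think of this as roughly SELECT field1, COUNT(*) ... GROUP BY field1 in MySQL.
- Pipeline aggregations: take their input from other aggregations rather than from documents or fields. This category is a little harder than the two above. Metric and bucket aggregations analyze the documents matched by the query directly, whereas a pipeline aggregation works on the results those aggregations produce. It is an aggregation built on top of aggregations.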
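To make the three categories concrete before touching ES itself, here is a minimal pure-Python sketch that mimics each one on a small in-memory list. The documents, the helper names `metric_avg` and `bucket_by`, and the values are all made up for illustration; this is not how ES implements aggregations, only an analogy for what each category computes.

```python
from collections import defaultdict
from statistics import mean

# Illustrative documents (hypothetical values, not the article's dataset).
docs = [
    {"category": "T-shirt", "price": 20.0},
    {"category": "T-shirt", "price": 30.0},
    {"category": "Jeans", "price": 50.0},
    {"category": "Jeans", "price": 70.0},
    {"category": "Socks", "price": 5.0},
]


def metric_avg(documents, field):
    """Metric aggregation: reduce the values of one field to a single number."""
    return mean(d[field] for d in documents)


def bucket_by(documents, field):
    """Bucket aggregation: group documents by the value of one field."""
    buckets = defaultdict(list)
    for d in documents:
        buckets[d[field]].append(d)
    return dict(buckets)


# Metric: one number computed over all matching documents.
overall_avg = metric_avg(docs, "price")

# Bucket: one group per category, with a document count per group
# (the COUNT(*) ... GROUP BY analogy).
buckets = bucket_by(docs, "category")
doc_counts = {key: len(members) for key, members in buckets.items()}

# A metric nested inside each bucket (a sub-aggregation).
avg_per_bucket = {key: metric_avg(members, "price") for key, members in buckets.items()}

# Pipeline: consumes the output of other aggregations, not the documents.
# Here it picks the bucket with the highest average price, similar in
# spirit to the max_bucket pipeline aggregation.
top_category, top_avg = max(avg_per_bucket.items(), key=lambda kv: kv[1])

print(overall_avg, doc_counts, top_category, top_avg)
```

Note that the last step never looks at `docs`; it only reads `avg_per_bucket`. That is the property that distinguishes a pipeline aggregation from the other two.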
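Below we go through the three kinds of aggregation one by one, but first we need some data. We will build a clothing index with the fields category, name, price, brand, description (desc), and place of origin (place_of_origin), and load 20 documents into it. Generating the sample data can be handed straight to an LLM. For example: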
PUT /clothes
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "category": {
        "type": "keyword"
      },
      "name": {
        "type": "keyword",
        "fields": {
          "name_text": {
            "type": "keyword"
          }
        }
      },
      "price": {
        "type": "double"
      },
      "brand": {
        "type": "keyword"
      },
      "desc": {
        "type": "text"
      },
      "place_of_origin": {
        "type": "keyword"
      }
    }
  }
}
POST _bulk
{ "index" : { "_index" : "clothes", "_id" : "1" } }
{ "category" : "T-shirt", "name" : "纯棉T恤", "price" : 19.99, "brand" : "品牌A", "desc" : "基础款纯棉T恤,适合日常穿着。", "place_of_origin" : "中国" }
{ "index" : { "_index" : "clothes", "_id" : "2" } }
{ "category" : "Jeans", "name" : "修身牛仔裤", "price" : 49.99, "brand" : "品牌B", "desc" : "耐穿的牛仔裤,修身款式。", "place_of_origin" : "越南" }
{ "index" : { "_index" : "clothes", "_id" : "3" } }
{ "category" : "Dress", "name" : "晚礼服", "price" : 89.99, "brand" : "品牌C", "desc" : "适合特殊场合的优雅晚礼服。", "place_of_origin" : "意大利" }
{ "index" : { "_index" : "clothes", "_id" : "4" } }
{ "category" : "Jacket", "name" : "皮夹克", "price" : 129.99, "brand" : "品牌D", "desc" : "时尚的男士皮夹克。", "place_of_origin" : "美国" }
{ "index" : { "_index" : "clothes", "_id" : "5" } }
{ "category" : "Sweater", "name" : "羊毛衫", "price" : 39.99, "brand" : "品牌E", "desc" : "适合冬季的保暖羊毛衫。", "place_of_origin" : "澳大利亚" }
{ "index" : { "_index" : "clothes", "_id" : "6" } }
{ "category" : "Skirt", "name" : "铅笔裙", "price" : 29.99, "brand" : "品牌F", "desc" : "适合办公室穿着的经典铅笔裙。", "place_of_origin" : "英国" }
{ "index" : { "_index" : "clothes", "_id" : "7" } }
{ "category" : "Shorts", "name" : "休闲短裤", "price" : 14.99, "brand" : "品牌G", "desc" : "适合夏天的舒适休闲短裤。", "place_of_origin" : "中国" }
{ "index" : { "_index" : "clothes", "_id" : "8" } }
{ "category" : "Blouse", "name" : "丝绸衬衫", "price" : 59.99, "brand" : "品牌H", "desc" : "柔软的丝绸衬衫,适合女性。", "place_of_origin" : "法国" }
{ "index" : { "_index" : "clothes", "_id" : "9" } }
{ "category" : "Coat", "name" : "冬季大衣", "price" : 199.99, "brand" : "品牌I", "desc" : "适合寒冷天气的厚冬季大衣。", "place_of_origin" : "加拿大" }
{ "index" : { "_index" : "clothes", "_id" : "10" } }
{ "category" : "Socks", "name" : "棉袜", "price" : 4.99, "brand" : "品牌J", "desc" : "一包舒适的棉袜。", "place_of_origin" : "中国" }
{ "index" : { "_index" : "clothes", "_id" : "11" } }
{ "category" : "T-shirt", "name" : "印花T恤", "price" : 24.99, "brand" : "品牌K", "desc" : "带有酷炫图案的T恤。", "place_of_origin" : "日本" }
{ "index" : { "_index" : "clothes", "_id" : "12" } }
{ "category" : "Jeans", "name" : "破洞牛仔裤", "price" : 59.99, "brand" : "品牌L", "desc" : "带有时尚破洞的牛仔裤。", "place_of_origin" : "美国" }
{ "index" : { "_index" : "clothes", "_id" : "13" } }
{ "category" : "Dress", "name" : "休闲连衣裙", "price" : 79.99, "brand" : "品牌M", "desc" : "适合日常穿着的舒适连衣裙。", "place_of_origin" : "中国" }
{ "index" : { "_index" : "clothes", "_id" : "14" } }
{ "category" : "Jacket", "name" : "风衣", "price" : 69.99, "brand" : "品牌N", "desc" : "轻便的风衣夹克。", "place_of_origin" : "德国" }
{ "index" : { "_index" : "clothes", "_id" : "15" } }
{ "category" : "Sweater", "name" : "针织毛衣", "price" : 44.99, "brand" : "品牌O", "desc" : "手工编织的毛衣。", "place_of_origin" : "英国" }
{ "index" : { "_index" : "clothes", "_id" : "16" } }
{ "category" : "Skirt", "name" : "百褶裙", "price" : 34.99, "brand" : "品牌P", "desc" : "时尚的百褶裙。", "place_of_origin" : "中国" }
{ "index" : { "_index" : "clothes", "_id" : "17" } }
{ "category" : "Shorts", "name" : "牛仔短裤", "price" : 19.99, "brand" : "品牌Q", "desc" : "适合休闲的牛仔短裤。", "place_of_origin" : "美国" }
{ "index" : { "_index" : "clothes", "_id" : "18" } }
{ "category" : "Blouse", "name" : "亚麻衬衫", "price" : 54.99, "brand" : "品牌R", "desc" : "适合夏天的轻薄亚麻衬衫。", "place_of_origin" : "意大利" }
{ "index" : { "_index" : "clothes", "_id" : "19" } }
{ "category" : "Coat", "name" : "风衣", "price" : 149.99, "brand" : "品牌S", "desc" : "经典的风衣。", "place_of_origin" : "英国" }
{ "index" : { "_index" : "clothes", "_id" : "20" } }
{ "category" : "Socks", "name" : "羊毛袜", "price" : 7.99, "brand" : "品牌T", "desc" : "适合冬季的厚羊毛袜。", "place_of_origin" : "澳大利亚" }
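The `_bulk` body above is newline-delimited JSON: every document takes two lines, an action line followed by its source line, and the body must end with a newline. If you generate the data yourself rather than pasting LLM output, a small script along these lines can assemble that body. This is a sketch only; `build_bulk_body` is a hypothetical helper, the two sample documents are copied from the data above, and sending the body to a cluster is left out.

```python
import json


def build_bulk_body(index, documents):
    """Build an NDJSON body for the _bulk API: action line, then source line."""
    lines = []
    for doc_id, source in enumerate(documents, start=1):
        # Action line: index this document under a sequential string id.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        # Source line: the document itself; keep non-ASCII text readable.
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# First two documents from the sample data above.
sample = [
    {"category": "T-shirt", "name": "纯棉T恤", "price": 19.99, "brand": "品牌A",
     "desc": "基础款纯棉T恤,适合日常穿着。", "place_of_origin": "中国"},
    {"category": "Jeans", "name": "修身牛仔裤", "price": 49.99, "brand": "品牌B",
     "desc": "耐穿的牛仔裤,修身款式。", "place_of_origin": "越南"},
]

body = build_bulk_body("clothes", sample)
print(body)
```

Each dictionary must serialize to a single line, which `json.dumps` does by default; pretty-printed JSON would break the two-lines-per-document layout.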
Our data is now in place; we will modify it later as needed.
With everything ready, let's get started.