【MongoDB】MongoDB Aggregation (Aggregate, Map Reduce), Pipelines, and Indexes Explained (with Detailed Examples)


MongoDB Aggregation Operations (Aggregate)

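Put simply, an aggregation does essentially what a SQL query does; only the syntax is different. Look carefully at the following example.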

Code example (mongo shell):
> db.orders.insertMany( [
     { _id: 1, cust_id: "abc1", ord_date: ISODate("2012-11-02T17:04:11.102Z"), status: "A", amount: 50 },
     { _id: 2, cust_id: "xyz1", ord_date: ISODate("2013-10-01T17:04:11.102Z"), status: "A", amount: 100 },
     { _id: 3, cust_id: "xyz1", ord_date: ISODate("2013-10-12T17:04:11.102Z"), status: "D", amount: 25 },
     { _id: 4, cust_id: "xyz1", ord_date: ISODate("2013-10-11T17:04:11.102Z"), status: "D", amount: 125 },
     { _id: 5, cust_id: "abc1", ord_date: ISODate("2013-11-12T17:04:11.102Z"), status: "A", amount: 25 }
 ] );
{ "acknowledged" : true, "insertedIds" : [ 1, 2, 3, 4, 5 ] }
> db.orders.find({})
{ "_id" : 1, "cust_id" : "abc1", "ord_date" : ISODate("2012-11-02T17:04:11.102Z"), "status" : "A", "amount" : 50 }
{ "_id" : 2, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-01T17:04:11.102Z"), "status" : "A", "amount" : 100 }
{ "_id" : 3, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-12T17:04:11.102Z"), "status" : "D", "amount" : 25 }
{ "_id" : 4, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-11T17:04:11.102Z"), "status" : "D", "amount" : 125 }
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>
> db.orders.aggregate([
                      { $match: { status: "A" } },
                      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
                      { $sort: { total: -1 } }
                   ])
{ "_id" : "xyz1", "total" : 100 }
{ "_id" : "abc1", "total" : 75 }

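The pipeline maps directly onto SQL: it is equivalent to SELECT cust_id AS _id, SUM(amount) AS total FROM orders WHERE status = 'A' GROUP BY cust_id ORDER BY total DESC. Anyone with some SQL background should find it easy to follow.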
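For readers who think in code rather than SQL, here is a minimal plain-JavaScript sketch (Node.js, no database needed) that applies the same three steps to the five orders held in an in-memory array. It only mirrors what each stage does conceptually, not how the MongoDB server executes a pipeline; the helper name runOrdersPipeline is hypothetical.

```javascript
// The five orders from the example above (ord_date omitted; it is not used).
const orders = [
  { _id: 1, cust_id: "abc1", status: "A", amount: 50 },
  { _id: 2, cust_id: "xyz1", status: "A", amount: 100 },
  { _id: 3, cust_id: "xyz1", status: "D", amount: 25 },
  { _id: 4, cust_id: "xyz1", status: "D", amount: 125 },
  { _id: 5, cust_id: "abc1", status: "A", amount: 25 },
];

// Hypothetical helper mirroring $match -> $group -> $sort.
function runOrdersPipeline(docs) {
  // $match: { status: "A" }                                   ~ WHERE status = 'A'
  const matched = docs.filter((d) => d.status === "A");

  // $group: { _id: "$cust_id", total: { $sum: "$amount" } }   ~ GROUP BY cust_id
  const totals = new Map();
  for (const d of matched) {
    totals.set(d.cust_id, (totals.get(d.cust_id) ?? 0) + d.amount);
  }
  const grouped = [...totals].map(([id, total]) => ({ _id: id, total }));

  // $sort: { total: -1 }                                      ~ ORDER BY total DESC
  return grouped.sort((a, b) => b.total - a.total);
}

console.log(runOrdersPipeline(orders)); // xyz1 with total 100, then abc1 with total 75
```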

MongoDB Aggregation Pipeline (Pipeline Operations)

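The MongoDB aggregation pipeline passes documents through a series of stages: once one stage has processed the documents, its output is handed to the next stage. Stages can be repeated, i.e. the same stage type may appear more than once in a pipeline.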

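| Stage | Description | SQL analogue |
| --- | --- | --- |
| $match | Filters documents, passing only those that meet the condition on to the next stage | WHERE |
| $group | Groups documents and can compute aggregate values (sum, average, count, etc.) | GROUP BY |
| $project | Selects and renames fields, or creates computed fields | SELECT (column list) |
| $sort | Sorts documents | ORDER BY |
| $limit | Limits the number of documents passed to the next stage | LIMIT |
| $skip | Skips a given number of documents | OFFSET |
| $unwind | Splits each element of an array field into its own document | N/A |
| $bucket | Groups documents into buckets based on specified boundaries | N/A |
| $facet | Runs several sub-pipelines on the same set of input documents within a single stage | N/A |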
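The stage-by-stage flow can be pictured as a chain of functions, each one taking the previous stage's output array. The sketch below is plain JavaScript with made-up helper names (match, project, skip, limit, runPipeline); it models only the data flow between stages, not MongoDB's query execution or optimizer.

```javascript
// Each hypothetical "stage" is a function from an array of documents to a new array.
const match = (pred) => (docs) => docs.filter(pred);                 // ~ $match
const project = (fields) => (docs) =>                                // ~ $project (inclusion only;
  docs.map((d) => Object.fromEntries(fields.map((f) => [f, d[f]]))); //   unlike MongoDB, _id is not kept)
const skip = (n) => (docs) => docs.slice(n);                         // ~ $skip
const limit = (n) => (docs) => docs.slice(0, n);                     // ~ $limit

// Feed the output of each stage into the next one, in order.
const runPipeline = (docs, stages) => stages.reduce((acc, stage) => stage(acc), docs);

const orders = [
  { _id: 1, cust_id: "abc1", status: "A", amount: 50 },
  { _id: 2, cust_id: "xyz1", status: "A", amount: 100 },
  { _id: 3, cust_id: "xyz1", status: "D", amount: 25 },
  { _id: 4, cust_id: "xyz1", status: "D", amount: 125 },
  { _id: 5, cust_id: "abc1", status: "A", amount: 25 },
];

// Roughly [{ $match: { status: "A" } }, { $skip: 1 }, { $limit: 1 }, { $project: { _id: 0, cust_id: 1, amount: 1 } }]
const result = runPipeline(orders, [
  match((d) => d.status === "A"),
  skip(1),
  limit(1),
  project(["cust_id", "amount"]),
]);
console.log(result); // the second "A" order: cust_id "xyz1", amount 100
```

Swapping the skip(1) and limit(1) stages yields an empty array instead, which is why stage order matters in a real pipeline too.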

Code examples:

$project

> db.orders.aggregate([
     { $project : {
         _id : 0 ,      // _id is included by default; 0 excludes it
         cust_id : 1 ,
         status : 1
     } }
 ]);
{ "cust_id" : "abc1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "abc1", "status" : "A" }
>

$skip

> db.orders.aggregate([
   { $skip : 4 } ]);
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>

$unwind

> db.inventory2.insertOne({ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] })
{ "acknowledged" : true, "insertedId" : 1 }
> db.inventory2.aggregate( [ { $unwind : "$sizes" } ] )
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }
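What $unwind does can be modelled in a few lines of plain JavaScript. This is a sketch of the default behaviour only (no preserveNullAndEmptyArrays or includeArrayIndex options); the function name unwind is made up for illustration.

```javascript
// Hypothetical stand-in for { $unwind: "$<field>" } with default options:
// every array element becomes its own copy of the document.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const value = doc[field];
    // Documents whose field is missing or null produce no output.
    if (value === undefined || value === null) continue;
    // A non-array value is treated like a one-element array; an empty array yields nothing.
    const items = Array.isArray(value) ? value : [value];
    for (const item of items) out.push({ ...doc, [field]: item });
  }
  return out;
}

const inventory2 = [{ _id: 1, item: "ABC1", sizes: ["S", "M", "L"] }];
console.log(unwind(inventory2, "sizes"));
// three documents with _id 1 and item "ABC1", whose sizes are "S", "M" and "L"
```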

$bucket

> db.artwork.insertMany([
 { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
     "price" : NumberDecimal("199.99") },
 { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
     "price" : NumberDecimal("280.00") },
 { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
     "price" : NumberDecimal("76.04") },
 { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
     "price" : NumberDecimal("167.30") },
 { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
     "price" : NumberDecimal("483.00") },
 { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
     "price" : NumberDecimal("385.00") },
 { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 },
 { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
     "price" : NumberDecimal("118.42") }
 ])
{
        "acknowledged" : true,
        "insertedIds" : [
                1,
                2,
                3,
                4,
                5,
                6,
                7,
                8
        ]
}
> db.artwork.find({})
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926, "price" : NumberDecimal("199.99") }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902, "price" : NumberDecimal("280.00") }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "price" : NumberDecimal("76.04") }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai", "price" : NumberDecimal("167.30") }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931, "price" : NumberDecimal("483.00") }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913, "price" : NumberDecimal("385.00") }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 } // note: no price field, so it ends up in the "Other" bucket
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "price" : NumberDecimal("118.42") }
> db.artwork.aggregate( [
   {
     $bucket: {
       groupBy: "$price",
       boundaries: [ 0, 200, 400 ],
       default: "Other",
       output: {
         "count": { $sum: 1 },
         "titles" : { $push: "$title" }
       }
     }
   }
 ] )
{ "_id" : 0, "count" : 4, "titles" : [ "The Pillars of Society", "Dancer", "The Great Wave off Kanagawa", "Blue Flower" ] }
{ "_id" : 200, "count" : 2, "titles" : [ "Melancholy III", "Composition VII" ] }
{ "_id" : "Other", "count" : 2, "titles" : [ "The Persistence of Memory", "The Scream" ] }

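Many readers find this hard to grasp at first, but $bucket simply sorts documents into buckets by boundary values, where each bucket includes its lower boundary and excludes its upper one. In this example, documents with a price in [0, 200) go into one bucket (_id 0), those in [200, 400) go into another (_id 200), and everything else goes into the "Other" bucket named by default: both documents with no price (The Scream) and documents whose price lies outside the boundaries (The Persistence of Memory, 483.00).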
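The bucketing rule can be spelled out as a small plain-JavaScript sketch. It assumes the prices are ordinary JS numbers rather than NumberDecimal values and reproduces only the count and titles outputs of the example; the function name bucket is hypothetical and this is not MongoDB's implementation. (In MongoDB itself, omitting default makes the stage raise an error when a document falls outside all buckets.)

```javascript
// Hypothetical model of $bucket: bucket i holds values v with boundaries[i] <= v < boundaries[i + 1].
// Anything else (missing field or out-of-range value) goes to the defaultId bucket.
function bucket(docs, field, boundaries, defaultId) {
  const groups = new Map();
  for (const doc of docs) {
    const v = doc[field];
    let id = defaultId;
    for (let i = 0; i < boundaries.length - 1; i++) {
      if (typeof v === "number" && v >= boundaries[i] && v < boundaries[i + 1]) {
        id = boundaries[i]; // a bucket's _id is its inclusive lower boundary
        break;
      }
    }
    if (!groups.has(id)) groups.set(id, { _id: id, count: 0, titles: [] });
    const group = groups.get(id);
    group.count += 1;             // output: count: { $sum: 1 }
    group.titles.push(doc.title); // output: titles: { $push: "$title" }
  }
  // Emit only non-empty buckets, in boundary order, with the default bucket last.
  return [...boundaries.slice(0, -1), defaultId]
    .filter((id) => groups.has(id))
    .map((id) => groups.get(id));
}

const artwork = [
  { _id: 1, title: "The Pillars of Society", price: 199.99 },
  { _id: 2, title: "Melancholy III", price: 280.0 },
  { _id: 3, title: "Dancer", price: 76.04 },
  { _id: 4, title: "The Great Wave off Kanagawa", price: 167.3 },
  { _id: 5, title: "The Persistence of Memory", price: 483.0 }, // above 400 -> "Other"
  { _id: 6, title: "Composition VII", price: 385.0 },
  { _id: 7, title: "The Scream" },                               // no price -> "Other"
  { _id: 8, title: "Blue Flower", price: 118.42 },
];

console.log(bucket(artwork, "price", [0, 200, 400], "Other"));
```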

$bucket + $facet

db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucket: {
              groupBy: "$price",
              boundaries: [ 0, 200, 400 ],
              default: "Other",
              output: {
                "count": { $sum: 1 },
                "artwork" : { $push: { "title": "$title", "price": "$price" } }
              }
          }
        }
      ],
      "year": [
        {
          $bucket: {
            groupBy: "$year",
            boundaries: [ 1890, 1910, 1920, 1940 ],
            default: "Unknown",
            output: {
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

// Output
{
  "year" : [
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Melancholy III",
          "year" : 1902
        },
        {
          "title" : "The Scream",
          "year" : 1893
        }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Composition VII",
          "year" : 1913
        },
        {
          "title" : "Blue Flower",
          "year" : 1918
        }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        {
          "title" : "The Pillars of Society",
          "year" : 1926
        },
        {
          "title" : "Dancer",
          "year" : 1925
        },
        {
          "title" : "The Persistence of Memory",
          "year" : 1931
        }
      ]
    },
    {
      // Includes the document without a year, e.g., _id: 4
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        {
          "title" : "The Great Wave off Kanagawa"
        }
      ]
    }
  ],
  "price" : [
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        {
          "title" : "The Pillars of Society",
          "price" : NumberDecimal("199.99")
        },
        {
          "title" : "Dancer",
          "price" : NumberDecimal("76.04")
        },
        {
          "title" : "The Great Wave off Kanagawa",
          "price" : NumberDecimal("167.30")
        },
        {
          "title" : "Blue Flower",
          "price" : NumberDecimal("118.42")
        }
      ]
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Melancholy III",
          "price" : NumberDecimal("280.00")
        },
        {
          "title" : "Composition VII",
          "price" : NumberDecimal("385.00")
        }
      ]
    },
    {
      // Includes the document without a price, e.g., _id: 7
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        {
          "title" : "The Persistence of Memory",
          "price" : NumberDecimal("483.00")
        },
        {
          "title" : "The Scream"
        }
      ]
    }
  ]
}

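The output is long, so here is the short version. $bucket + $facet is a very common combination (for example, to build the filter counts of a search page). $facet runs several sub-pipelines over the same set of input documents within a single stage and returns one document that holds one array per sub-pipeline: here price holds the price buckets and year holds the year buckets. Loosely speaking, if you picture the result of one $bucket as a list of groups (List<List>), $facet wraps several such lists, each under its own name, into a single document.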
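A rough plain-JavaScript model of the same idea follows. It is a sketch, not MongoDB's implementation: facet and bucketCounts are hypothetical helpers, prices are plain numbers instead of NumberDecimal, and each sub-pipeline only counts documents per bucket (the $push output is left out to keep it short). The point it illustrates is that every sub-pipeline receives the same input and the stage produces a single document with one array per facet name.

```javascript
// Hypothetical model of $facet: every sub-pipeline receives the SAME input documents,
// and the stage emits ONE document with one array field per sub-pipeline.
function facet(docs, subPipelines) {
  const out = {};
  for (const [name, run] of Object.entries(subPipelines)) {
    out[name] = run(docs);
  }
  return [out]; // a single output document
}

// Simplified bucketing: counts documents per [lower, upper) range; anything else goes to defaultId.
function bucketCounts(field, boundaries, defaultId) {
  return (docs) => {
    const counts = new Map();
    for (const doc of docs) {
      const v = doc[field];
      let id = defaultId;
      for (let i = 0; i < boundaries.length - 1; i++) {
        if (typeof v === "number" && v >= boundaries[i] && v < boundaries[i + 1]) id = boundaries[i];
      }
      counts.set(id, (counts.get(id) ?? 0) + 1);
    }
    return [...boundaries.slice(0, -1), defaultId]
      .filter((id) => counts.has(id))
      .map((id) => ({ _id: id, count: counts.get(id) }));
  };
}

const artwork = [
  { title: "The Pillars of Society", year: 1926, price: 199.99 },
  { title: "Melancholy III", year: 1902, price: 280.0 },
  { title: "Dancer", year: 1925, price: 76.04 },
  { title: "The Great Wave off Kanagawa", price: 167.3 },
  { title: "The Persistence of Memory", year: 1931, price: 483.0 },
  { title: "Composition VII", year: 1913, price: 385.0 },
  { title: "The Scream", year: 1893 },
  { title: "Blue Flower", year: 1918, price: 118.42 },
];

const [result] = facet(artwork, {
  price: bucketCounts("price", [0, 200, 400], "Other"),
  year: bucketCounts("year", [1890, 1910, 1920, 1940], "Unknown"),
});
console.log(JSON.stringify(result));
```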

For more aggregation pipeline stages, see the official documentation: https://www.mongodb.com/zh-cn/docs/manual/reference/operator/aggregation-pipeline/

MongoDB Aggregation (Map Reduce)

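Code example. Assume the orders collection holds the following documents (a different data set from the earlier orders example; note the customerId field instead of cust_id):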
{ "_id": 1, "customerId": "A123", "amount": 100 }
{ "_id": 2, "customerId": "B456", "amount": 200 }
{ "_id": 3, "customerId": "A123", "amount": 150 }
{ "_id": 4, "customerId": "C789", "amount": 50 }
{ "_id": 5, "customerId": "B456", "amount": 300 }

Use MapReduce to calculate the total amount for each customerId:
// Map function
var mapFunction = function() {
    emit(this.customerId, this.amount);
};

// Reduce function
var reduceFunction = function(customerId, amounts) {
    return Array.sum(amounts);
};

// Execute MapReduce
db.orders.mapReduce(
    mapFunction,
    reduceFunction,
    { out: "order_totals" }
);

// View the results
db.order_totals.find().forEach(printjson);

{ "_id": "A123", "value": 250 }
{ "_id": "B456", "value": 500 }
{ "_id": "C789", "value": 50 }
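  • Map function: for each document, emit sends out customerId as the key and amount as the value.
  • Reduce function: for each unique customerId, reduceFunction receives the key and an array of all values emitted for that key, and returns their sum. MongoDB does not call reduce for a key with only one value (such as C789), and it may call reduce more than once for the same key on partial results, so the function must return the same answer when its own output is fed back in.
  • Output: the results are stored in the order_totals collection; each document holds a customerId (as _id) and that customer's total order amount (as value).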
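Map-reduce has been deprecated since MongoDB 5.0, and the same totals can be computed with an aggregation pipeline such as db.orders.aggregate([ { $group: { _id: "$customerId", value: { $sum: "$amount" } } } ]). To show what the phases do, here is a plain-JavaScript sketch that runs map, groups the emitted values by key, and then reduces each group. The helper mapReduce is hypothetical: it passes emit in as an argument, whereas MongoDB provides emit as a global inside the map function, and it sums with Array.prototype.reduce because Array.sum is a mongo shell helper rather than standard JavaScript.

```javascript
// Hypothetical in-memory model of map-reduce (not MongoDB's implementation).
function mapReduce(docs, mapFn, reduceFn) {
  const emitted = new Map(); // key -> array of values emitted for that key
  const emit = (key, value) => {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  };
  // Map phase: run mapFn once per document, with `this` bound to the document.
  for (const doc of docs) mapFn.call(doc, emit);
  // Reduce phase: like MongoDB, skip reduce when a key emitted only one value.
  return [...emitted].map(([key, values]) => ({
    _id: key,
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

const orders = [
  { _id: 1, customerId: "A123", amount: 100 },
  { _id: 2, customerId: "B456", amount: 200 },
  { _id: 3, customerId: "A123", amount: 150 },
  { _id: 4, customerId: "C789", amount: 50 },
  { _id: 5, customerId: "B456", amount: 300 },
];

// Same logic as mapFunction / reduceFunction above (a regular function is needed so `this` works).
const mapFunction = function (emit) {
  emit(this.customerId, this.amount);
};
const reduceFunction = (customerId, amounts) => amounts.reduce((sum, a) => sum + a, 0);

const orderTotals = mapReduce(orders, mapFunction, reduceFunction);
console.log(orderTotals); // A123 -> 250, B456 -> 500, C789 -> 50
```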

MongoDB Indexes

Types:

  • Single-field index
{ "_id": 1, "username": "alice", "age": 30 }
{ "_id": 2, "username": "bob", "age": 25 }
db.users.createIndex({ username: 1 });

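Here 1 means an ascending index; use -1 for a descending one. For a single-field index the direction rarely matters, because MongoDB can traverse the index in either direction.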

  • Compound index
db.users.createIndex({ username: 1, age: -1 });
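A compound index { username: 1, age: -1 } keeps its entries sorted by username ascending and, within the same username, by age descending. The plain-JavaScript comparator below models only that ordering of index keys (the real index is a B-tree), and the third sample user is hypothetical, added so that two entries share a username.

```javascript
// Compare two documents the way the keys of { username: 1, age: -1 } are ordered.
function compareUsernameAscAgeDesc(a, b) {
  if (a.username !== b.username) return a.username < b.username ? -1 : 1; // username: 1 (ascending)
  return b.age - a.age;                                                    // age: -1 (descending)
}

// Sample users; _id 3 is hypothetical (a second "alice").
const users = [
  { _id: 1, username: "alice", age: 30 },
  { _id: 2, username: "bob", age: 25 },
  { _id: 3, username: "alice", age: 41 },
];

const indexOrder = [...users].sort(compareUsernameAscAgeDesc).map((u) => u._id);
console.log(indexOrder); // [ 3, 1, 2 ]
```

Because entries are stored in this order, the index can support sort({ username: 1, age: -1 }) and its exact reverse sort({ username: -1, age: 1 }), as well as queries on the prefix username alone, but not sort({ username: 1, age: 1 }).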
  • Multikey index
{ "_id": 1, "title": "MongoDB Basics", "tags": ["database", "NoSQL"] }
{ "_id": 2, "title": "Advanced MongoDB", "tags": ["database", "performance"] }
db.posts.createIndex({ tags: 1 });
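Because tags holds arrays, MongoDB automatically makes this a multikey index, with a separate index key for each array element. The plain-JavaScript sketch below lists the (key, _id) pairs such an index would conceptually contain for the two posts and performs an equality lookup; the helper names are hypothetical, and a real index is a B-tree rather than a sorted array.

```javascript
// Hypothetical model of multikey index entries: one [key, _id] pair per distinct array element.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const keys = Array.isArray(value) ? new Set(value) : [value];
    for (const key of keys) entries.push([key, doc._id]);
  }
  // Keep entries ordered by key (then _id), as an ascending index would.
  return entries.sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : a[1] - b[1]));
}

// Equality lookup: every entry whose key matches points back to a document.
const lookup = (entries, key) => entries.filter(([k]) => k === key).map(([, id]) => id);

const posts = [
  { _id: 1, title: "MongoDB Basics", tags: ["database", "NoSQL"] },
  { _id: 2, title: "Advanced MongoDB", tags: ["database", "performance"] },
];

const entries = multikeyEntries(posts, "tags");
console.log(entries.length);              // 4 entries for 2 documents
console.log(lookup(entries, "database")); // [ 1, 2 ]
```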
  • Text index

A text index enables full-text search over the contents of string fields.

{ "_id": 1, "content": "MongoDB is a NoSQL database" }
{ "_id": 2, "content": "Text search in MongoDB" }

We can create a text index on the content field:

db.articles.createIndex({ content: "text" });

Then we can run a full-text search:

db.articles.find({ $text: { $search: "NoSQL" } });
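Conceptually, a text index is an inverted index from words to documents. The sketch below builds a much-simplified one in plain JavaScript for the two articles: it only lowercases and splits on non-word characters, whereas a real MongoDB text index also applies language-specific stemming and ignores stop words. As with $text, a search string with several words matches documents containing any of them. The function names are hypothetical.

```javascript
// Hypothetical, much-simplified inverted index: lowercase word -> set of _id values.
function buildTextIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    for (const word of doc[field].toLowerCase().split(/\W+/).filter(Boolean)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc._id);
    }
  }
  return index;
}

// Return the sorted _ids of documents containing ANY of the search terms.
function search(index, query) {
  const ids = new Set();
  for (const term of query.toLowerCase().split(/\W+/).filter(Boolean)) {
    for (const id of index.get(term) ?? []) ids.add(id);
  }
  return [...ids].sort((a, b) => a - b);
}

const articles = [
  { _id: 1, content: "MongoDB is a NoSQL database" },
  { _id: 2, content: "Text search in MongoDB" },
];

const textIndex = buildTextIndex(articles, "content");
console.log(search(textIndex, "NoSQL")); // [ 1 ]
```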
  • Geospatial index

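Used to speed up location-based queries. MongoDB supports 2d indexes (points on a flat plane) and 2dsphere indexes (geometry on an Earth-like sphere). Note that MongoDB expects coordinate pairs in [longitude, latitude] order:

{ "_id": 1, "name": "Central Park", "coordinates": [-73.968285, 40.785091] }
{ "_id": 2, "name": "Golden Gate Bridge", "coordinates": [-122.478255, 37.819929] }

We can create a 2dsphere index on the coordinates field:
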
db.locations.createIndex({ coordinates: "2dsphere" });
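A 2dsphere index treats coordinates as points on a sphere, so distances between them are great-circle distances rather than straight lines on a flat map. To give a feel for that geometry, here is a haversine sketch in plain JavaScript for Central Park and the Golden Gate Bridge, with coordinates written [longitude, latitude]. MongoDB uses its own spherical geometry code, not this function, and the 6371 km mean Earth radius is an assumption of the sketch.

```javascript
// Great-circle distance on a sphere (haversine formula).
// Points are [longitude, latitude] in degrees; 6371 km is an assumed mean Earth radius.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.sqrt(a));
}

const centralPark = [-73.968285, 40.785091];
const goldenGate = [-122.478255, 37.819929];
console.log(Math.round(haversineKm(centralPark, goldenGate))); // roughly 4,100 km
```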
  • Hashed index

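Indexes a hash of the field's value so that data is spread evenly; it is mainly used for hashed sharding. It supports equality queries but not range queries.
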
{ "_id": 1, "sku": "A123" }
{ "_id": 2, "sku": "B456" }

We can create a hashed index on the sku field:

db.products.createIndex({ sku: "hashed" });
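The idea behind a hashed index can be sketched in plain JavaScript. MongoDB computes its own 64-bit hash of the field value; the 32-bit FNV-1a function below is a stand-in chosen only for illustration, and the helper names are hypothetical. Equal values always hash to the same key, so an equality match can look up the hash directly, while neighbouring values such as "A123" and "A124" get unrelated hashes, which spreads them evenly but rules out range scans on this index.

```javascript
// 32-bit FNV-1a, used here as a stand-in for MongoDB's own hash function.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits unsigned
  }
  return hash >>> 0;
}

const skus = ["A123", "A124", "A125", "B456"];
// Toy "hashed index": hash value -> _id (hash collisions are ignored in this sketch).
const hashedIndex = new Map(skus.map((sku, i) => [fnv1a32(sku), i + 1]));

// Equality query: hash the searched value and look the hash up.
const findIdBySku = (sku) => hashedIndex.get(fnv1a32(sku));
console.log(findIdBySku("B456")); // 4

// A range such as "A123" <= sku <= "A125" cannot use this index:
// hash order is unrelated to sku order, so matching keys are not stored next to each other.
```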

Index operations:

List the indexes of a collection:

db.col.getIndexes()

Show the total size of all indexes of a collection (in bytes):

db.col.totalIndexSize()

Drop all indexes of a collection (the default index on _id cannot be dropped and is kept):

db.col.dropIndexes()

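Drop a specific index by name (names are listed by getIndexes(); for example, { username: 1 } gets the default name username_1):

db.col.dropIndex("index_name")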