【MongoDB】MongoDB的聚合(Aggregate、Map Reduce)与管道(Pipline) 及索引详解(附详细案例)

文章目录

更多相关内容可查看

MongoDB的聚合操作(Aggregate)

简单理解,其实本质跟sql一样,只不过写法不一样,仔细看以下示例

图例:

代码示例:

sql 复制代码
> db.orders.insertMany( [
     { _id: 1, cust_id: "abc1", ord_date: ISODate("2012-11-02T17:04:11.102Z"), status: "A", amount: 50 },
     { _id: 2, cust_id: "xyz1", ord_date: ISODate("2013-10-01T17:04:11.102Z"), status: "A", amount: 100 },
     { _id: 3, cust_id: "xyz1", ord_date: ISODate("2013-10-12T17:04:11.102Z"), status: "D", amount: 25 },
     { _id: 4, cust_id: "xyz1", ord_date: ISODate("2013-10-11T17:04:11.102Z"), status: "D", amount: 125 },
     { _id: 5, cust_id: "abc1", ord_date: ISODate("2013-11-12T17:04:11.102Z"), status: "A", amount: 25 }
 ] );
{ "acknowledged" : true, "insertedIds" : [ 1, 2, 3, 4, 5 ] }
> db.orders.find({})
{ "_id" : 1, "cust_id" : "abc1", "ord_date" : ISODate("2012-11-02T17:04:11.102Z"), "status" : "A", "amount" : 50 }
{ "_id" : 2, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-01T17:04:11.102Z"), "status" : "A", "amount" : 100 }
{ "_id" : 3, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-12T17:04:11.102Z"), "status" : "D", "amount" : 25 }
{ "_id" : 4, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-11T17:04:11.102Z"), "status" : "D", "amount" : 125 }
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>
> db.orders.aggregate([
                      { $match: { status: "A" } },
                      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
                      { $sort: { total: -1 } }
                   ])
{ "_id" : "xyz1", "total" : 100 }
{ "_id" : "abc1", "total" : 75 }

根据上述不难看出具体是怎么操作的,对sql有一定基础的应该可以很容易看懂

MongoDB的管道(Pipline操作)

MongoDB的聚合管道(Pipline)将MongoDB文档在一个阶段(Stage)处理完毕后将结果传递给下一个阶段(Stage)处理。阶段(Stage)操作是可以重复的。

阶段 描述 类似于 SQL 中的
$match 用于过滤文档,只传递满足条件的文档到下一个阶段 WHERE
$group 用于将文档分组,并可用于计算聚合值(如总和、平均值、计数等) GROUP BY
$project 用于选择和重命名字段,或者创建计算字段 SELECT
$sort 用于对文档进行排序 ORDER BY
$limit 用于限制传递到下一个阶段的文档数量 LIMIT
$skip 用于跳过指定数量的文档 OFFSET
$unwind 用于将数组字段中的每个元素拆分为独立的文档 N/A
$bucket 根据指定的边界将文档分组到不同的桶中 N/A
$facet 允许在单个聚合管道中并行执行多个不同的子管道 N/A

代码示例:

$project

sql 复制代码
> db.orders.aggregate(
     { $project : {
         _id : 0 , // 默认不显示_id
         cust_id : 1 ,
         status : 1
...     }});
{ "cust_id" : "abc1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "abc1", "status" : "A" }
>

$skip

sql 复制代码
> db.orders.aggregate(
   { $skip : 4 });
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>

$unwind

sql 复制代码
> db.inventory2.insertOne({ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] })
{ "acknowledged" : true, "insertedId" : 1 }
> db.inventory2.aggregate( [ { $unwind : "$sizes" } ] )
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }

$bucket

sql 复制代码
> db.artwork.insertMany([
 { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
     "price" : NumberDecimal("199.99") },
 { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
     "price" : NumberDecimal("280.00") },
 { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
     "price" : NumberDecimal("76.04") },
 { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
     "price" : NumberDecimal("167.30") },
 { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
     "price" : NumberDecimal("483.00") },
 { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
     "price" : NumberDecimal("385.00") },
 { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 },
 { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
     "price" : NumberDecimal("118.42") }
 ])
{
        "acknowledged" : true,
        "insertedIds" : [
                1,
                2,
                3,
                4,
                5,
                6,
                7,
                8
        ]
}
> db.artwork.find({})
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926, "price" : NumberDecimal("199.99") }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902, "price" : NumberDecimal("280.00") }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "price" : NumberDecimal("76.04") }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai", "price" : NumberDecimal("167.30") }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931, "price" : NumberDecimal("483.00") }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913, "price" : NumberDecimal("385.00") }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 } // 注意这里没有price,聚合结果中为Others
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "price" : NumberDecimal("118.42") }
> db.artwork.aggregate( [
   {
     $bucket: {
       groupBy: "$price",
       boundaries: [ 0, 200, 400 ],
       default: "Other",
       output: {
         "count": { $sum: 1 },
         "titles" : { $push: "$title" }
       }
     }
   }
 ] )
{ "_id" : 0, "count" : 4, "titles" : [ "The Pillars of Society", "Dancer", "The Great Wave off Kanagawa", "Blue Flower" ] }
{ "_id" : 200, "count" : 2, "titles" : [ "Melancholy III", "Composition VII" ] }
{ "_id" : "Other", "count" : 2, "titles" : [ "The Persistence of Memory", "The Scream" ] }

这里有很多朋友短时间内看不懂,其实bucket就是按照边界值进行分桶操作,以上案例就是价格字段在0-200放一个桶,200-400放一个桶,没有价格的数据放到other中

bucket + facet

sql 复制代码
db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucket: {
              groupBy: "$price",
              boundaries: [ 0, 200, 400 ],
              default: "Other",
              output: {
                "count": { $sum: 1 },
                "artwork" : { $push: { "title": "$title", "price": "$price" } }
              }
          }
        }
      ],
      "year": [
        {
          $bucket: {
            groupBy: "$year",
            boundaries: [ 1890, 1910, 1920, 1940 ],
            default: "Unknown",
            output: {
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

// 输出
{
  "year" : [
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Melancholy III",
          "year" : 1902
        },
        {
          "title" : "The Scream",
          "year" : 1893
        }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Composition VII",
          "year" : 1913
        },
        {
          "title" : "Blue Flower",
          "year" : 1918
        }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        {
          "title" : "The Pillars of Society",
          "year" : 1926
        },
        {
          "title" : "Dancer",
          "year" : 1925
        },
        {
          "title" : "The Persistence of Memory",
          "year" : 1931
        }
      ]
    },
    {
      // Includes the document without a year, e.g., _id: 4
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        {
          "title" : "The Great Wave off Kanagawa"
        }
      ]
    }
  ],
      "price" : [
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        {
          "title" : "The Pillars of Society",
          "price" : NumberDecimal("199.99")
        },
        {
          "title" : "Dancer",
          "price" : NumberDecimal("76.04")
        },
        {
          "title" : "The Great Wave off Kanagawa",
          "price" : NumberDecimal("167.30")
        },
        {
          "title" : "Blue Flower",
          "price" : NumberDecimal("118.42")
        }
      ]
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Melancholy III",
          "price" : NumberDecimal("280.00")
        },
        {
          "title" : "Composition VII",
          "price" : NumberDecimal("385.00")
        }
      ]
    },
    {
      // Includes the document without a price, e.g., _id: 7
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        {
          "title" : "The Persistence of Memory",
          "price" : NumberDecimal("483.00")
        },
        {
          "title" : "The Scream"
        }
      ]
    }
  ]
}

这里代码太长,可能有朋友没有足够的耐心看完,bucket + facet是非常常用的场景,这里解释一下,就是将两组bucket跟组合到了一起进行返回,可以按我自己的理解一个bucket就是多个List数组,List<List>,而一个facet就是在这个bucket在套一层List

更多的聚合关键字可以查看官方文档:https://www.mongodb.com/zh-cn/docs/manual/reference/operator/aggregation-pipeline/

MongoDB的聚合(Map Reduce)

图例:

代码示例:

json 复制代码
{ "_id": 1, "customerId": "A123", "amount": 100 }
{ "_id": 2, "customerId": "B456", "amount": 200 }
{ "_id": 3, "customerId": "A123", "amount": 150 }
{ "_id": 4, "customerId": "C789", "amount": 50 }
{ "_id": 5, "customerId": "B456", "amount": 300 }

使用 MapReduce 来计算每个 customerId 的总 amount

javascript 复制代码
// Map function
var mapFunction = function() {
    emit(this.customerId, this.amount);
};

// Reduce function
var reduceFunction = function(customerId, amounts) {
    return Array.sum(amounts);
};

// Execute MapReduce
db.orders.mapReduce(
    mapFunction,
    reduceFunction,
    { out: "order_totals" }
);

// 查看结果
db.order_totals.find().forEach(printjson);

{ "_id": "A123", "value": 250 }
{ "_id": "B456", "value": 500 }
{ "_id": "C789", "value": 50 }
  • Map Function : 对于每个文档,emit 函数将 customerId 作为键,amount 作为值发射出去。
  • Reduce Function : 对于每个唯一的 customerIdreduceFunction 接收一个键和与该键相关联的所有值的数组,并返回这些值的总和。
  • Output : 结果存储在 order_totals 集合中,每个文档包含一个 customerId 和该客户的总订单金额。

MongoDB的索引

图例:


类型:

  • 单一索引
sql 复制代码
{ "_id": 1, "username": "alice", "age": 30 }
{ "_id": 2, "username": "bob", "age": 25 }
sql 复制代码
db.users.createIndex({ username: 1 });

这里的 1 表示升序索引。对于降序索引,可以使用 -1

  • 复合索引
sql 复制代码
db.users.createIndex({ username: 1, age: -1 });
  • 多键索引
sql 复制代码
{ "_id": 1, "title": "MongoDB Basics", "tags": ["database", "NoSQL"] }
{ "_id": 2, "title": "Advanced MongoDB", "tags": ["database", "performance"] }
sql 复制代码
db.posts.createIndex({ tags: 1 });
  • 文字索引

支持文本搜索。它们允许对字符串字段进行全文搜索。

json 复制代码
{ "_id": 1, "content": "MongoDB is a NoSQL database" }
{ "_id": 2, "content": "Text search in MongoDB" }

我们可以在 content 字段上创建文字索引:

javascript 复制代码
db.articles.createIndex({ content: "text" });

然后,我们可以执行全文搜索:

javascript 复制代码
db.articles.find({ $text: { $search: "NoSQL" } });
  • 地理空间索引

引用于加速地理位置查询。MongoDB 支持 2D 和 2DSphere 索引

json 复制代码
{ "_id": 1, "name": "Central Park", "coordinates": [40.785091, -73.968285] }
{ "_id": 2, "name": "Golden Gate Bridge", "coordinates": [37.819929, -122.478255] }

我们可以在 coordinates 字段上创建 2DSphere 索引:

javascript 复制代码
db.locations.createIndex({ coordinates: "2dsphere" });
  • 哈希索引

用于均匀分布数据,适合需要高效等值查询的场景

json 复制代码
{ "_id": 1, "sku": "A123" }
{ "_id": 2, "sku": "B456" }

我们可以在 sku 字段上创建哈希索引:

javascript 复制代码
db.products.createIndex({ sku: "hashed" });

索引的操作:

查看集合索引

sql 复制代码
db.col.getIndexes()

查看集合索引大小

sql 复制代码
db.col.totalIndexSize()

删除集合所有索引

sql 复制代码
db.col.dropIndexes()

删除集合指定索引

sql 复制代码
db.col.dropIndex("索引名称")
相关推荐
ccddsdsdfsdf2 小时前
DBeaver怎么链接mongoDB
数据库·mongodb
丷丩3 小时前
Postgresql基础实践教程(十一)各种Join
数据库·postgresql·join
星夜夏空993 小时前
FreeRTOS学习(4)——内存映射
数据库·学习·mongodb
TheRouter4 小时前
AI Agent 记忆体系建设实战:短期、长期与工作记忆的工程实现
数据库·人工智能·oracle
Omics Pro4 小时前
首个!外源天然产物综合性代谢图谱
数据库·人工智能·算法·机器学习·r语言
JAVA面经实录9175 小时前
Hibernate面试题库
数据库·oracle·hibernate
迷枫7126 小时前
DM8 目录结构与常用排查入口梳理
服务器·数据库
Mr.Daozhi7 小时前
RAG 进阶实战:跑通 Demo 后我连续翻了 6 次车,逐一修复才真正可用(含 Gradio Web 版)
前端·数据库·langchain·大模型·gradio·rag·科研工具
小程故事多_807 小时前
Claude Code自定义workflow skills用法
数据库·人工智能·智能体
大鹏说大话7 小时前
SQL 排序与分组实战:解决“分组后取最新数据“
android·java·数据库