【MongoDB】MongoDB的聚合(Aggregate、Map Reduce)与管道(Pipline) 及索引详解(附详细案例)

文章目录

更多相关内容可查看

MongoDB的聚合操作(Aggregate)

简单理解,其实本质跟sql一样,只不过写法不一样,仔细看以下示例

图例:

代码示例:

sql 复制代码
> db.orders.insertMany( [
     { _id: 1, cust_id: "abc1", ord_date: ISODate("2012-11-02T17:04:11.102Z"), status: "A", amount: 50 },
     { _id: 2, cust_id: "xyz1", ord_date: ISODate("2013-10-01T17:04:11.102Z"), status: "A", amount: 100 },
     { _id: 3, cust_id: "xyz1", ord_date: ISODate("2013-10-12T17:04:11.102Z"), status: "D", amount: 25 },
     { _id: 4, cust_id: "xyz1", ord_date: ISODate("2013-10-11T17:04:11.102Z"), status: "D", amount: 125 },
     { _id: 5, cust_id: "abc1", ord_date: ISODate("2013-11-12T17:04:11.102Z"), status: "A", amount: 25 }
 ] );
{ "acknowledged" : true, "insertedIds" : [ 1, 2, 3, 4, 5 ] }
> db.orders.find({})
{ "_id" : 1, "cust_id" : "abc1", "ord_date" : ISODate("2012-11-02T17:04:11.102Z"), "status" : "A", "amount" : 50 }
{ "_id" : 2, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-01T17:04:11.102Z"), "status" : "A", "amount" : 100 }
{ "_id" : 3, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-12T17:04:11.102Z"), "status" : "D", "amount" : 25 }
{ "_id" : 4, "cust_id" : "xyz1", "ord_date" : ISODate("2013-10-11T17:04:11.102Z"), "status" : "D", "amount" : 125 }
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>
> db.orders.aggregate([
                      { $match: { status: "A" } },
                      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } },
                      { $sort: { total: -1 } }
                   ])
{ "_id" : "xyz1", "total" : 100 }
{ "_id" : "abc1", "total" : 75 }

根据上述不难看出具体是怎么操作的,对sql有一定基础的应该可以很容易看懂

MongoDB的管道(Pipline操作)

MongoDB的聚合管道(Pipline)将MongoDB文档在一个阶段(Stage)处理完毕后将结果传递给下一个阶段(Stage)处理。阶段(Stage)操作是可以重复的。

阶段 描述 类似于 SQL 中的
$match 用于过滤文档,只传递满足条件的文档到下一个阶段 WHERE
$group 用于将文档分组,并可用于计算聚合值(如总和、平均值、计数等) GROUP BY
$project 用于选择和重命名字段,或者创建计算字段 SELECT
$sort 用于对文档进行排序 ORDER BY
$limit 用于限制传递到下一个阶段的文档数量 LIMIT
$skip 用于跳过指定数量的文档 OFFSET
$unwind 用于将数组字段中的每个元素拆分为独立的文档 N/A
$bucket 根据指定的边界将文档分组到不同的桶中 N/A
$facet 允许在单个聚合管道中并行执行多个不同的子管道 N/A

代码示例:

$project

sql 复制代码
> db.orders.aggregate(
     { $project : {
         _id : 0 , // 默认不显示_id
         cust_id : 1 ,
         status : 1
...     }});
{ "cust_id" : "abc1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "A" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "xyz1", "status" : "D" }
{ "cust_id" : "abc1", "status" : "A" }
>

$skip

sql 复制代码
> db.orders.aggregate(
   { $skip : 4 });
{ "_id" : 5, "cust_id" : "abc1", "ord_date" : ISODate("2013-11-12T17:04:11.102Z"), "status" : "A", "amount" : 25 }
>

$unwind

sql 复制代码
> db.inventory2.insertOne({ "_id" : 1, "item" : "ABC1", sizes: [ "S", "M", "L"] })
{ "acknowledged" : true, "insertedId" : 1 }
> db.inventory2.aggregate( [ { $unwind : "$sizes" } ] )
{ "_id" : 1, "item" : "ABC1", "sizes" : "S" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "M" }
{ "_id" : 1, "item" : "ABC1", "sizes" : "L" }

$bucket

sql 复制代码
> db.artwork.insertMany([
 { "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926,
     "price" : NumberDecimal("199.99") },
 { "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902,
     "price" : NumberDecimal("280.00") },
 { "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925,
     "price" : NumberDecimal("76.04") },
 { "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai",
     "price" : NumberDecimal("167.30") },
 { "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931,
     "price" : NumberDecimal("483.00") },
 { "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913,
     "price" : NumberDecimal("385.00") },
 { "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 },
 { "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918,
     "price" : NumberDecimal("118.42") }
 ])
{
        "acknowledged" : true,
        "insertedIds" : [
                1,
                2,
                3,
                4,
                5,
                6,
                7,
                8
        ]
}
> db.artwork.find({})
{ "_id" : 1, "title" : "The Pillars of Society", "artist" : "Grosz", "year" : 1926, "price" : NumberDecimal("199.99") }
{ "_id" : 2, "title" : "Melancholy III", "artist" : "Munch", "year" : 1902, "price" : NumberDecimal("280.00") }
{ "_id" : 3, "title" : "Dancer", "artist" : "Miro", "year" : 1925, "price" : NumberDecimal("76.04") }
{ "_id" : 4, "title" : "The Great Wave off Kanagawa", "artist" : "Hokusai", "price" : NumberDecimal("167.30") }
{ "_id" : 5, "title" : "The Persistence of Memory", "artist" : "Dali", "year" : 1931, "price" : NumberDecimal("483.00") }
{ "_id" : 6, "title" : "Composition VII", "artist" : "Kandinsky", "year" : 1913, "price" : NumberDecimal("385.00") }
{ "_id" : 7, "title" : "The Scream", "artist" : "Munch", "year" : 1893 } // 注意这里没有price,聚合结果中为Others
{ "_id" : 8, "title" : "Blue Flower", "artist" : "O'Keefe", "year" : 1918, "price" : NumberDecimal("118.42") }
> db.artwork.aggregate( [
   {
     $bucket: {
       groupBy: "$price",
       boundaries: [ 0, 200, 400 ],
       default: "Other",
       output: {
         "count": { $sum: 1 },
         "titles" : { $push: "$title" }
       }
     }
   }
 ] )
{ "_id" : 0, "count" : 4, "titles" : [ "The Pillars of Society", "Dancer", "The Great Wave off Kanagawa", "Blue Flower" ] }
{ "_id" : 200, "count" : 2, "titles" : [ "Melancholy III", "Composition VII" ] }
{ "_id" : "Other", "count" : 2, "titles" : [ "The Persistence of Memory", "The Scream" ] }

这里有很多朋友短时间内看不懂,其实bucket就是按照边界值进行分桶操作,以上案例就是价格字段在0-200放一个桶,200-400放一个桶,没有价格的数据放到other中

bucket + facet

sql 复制代码
db.artwork.aggregate( [
  {
    $facet: {
      "price": [
        {
          $bucket: {
              groupBy: "$price",
              boundaries: [ 0, 200, 400 ],
              default: "Other",
              output: {
                "count": { $sum: 1 },
                "artwork" : { $push: { "title": "$title", "price": "$price" } }
              }
          }
        }
      ],
      "year": [
        {
          $bucket: {
            groupBy: "$year",
            boundaries: [ 1890, 1910, 1920, 1940 ],
            default: "Unknown",
            output: {
              "count": { $sum: 1 },
              "artwork": { $push: { "title": "$title", "year": "$year" } }
            }
          }
        }
      ]
    }
  }
] )

// 输出
{
  "year" : [
    {
      "_id" : 1890,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Melancholy III",
          "year" : 1902
        },
        {
          "title" : "The Scream",
          "year" : 1893
        }
      ]
    },
    {
      "_id" : 1910,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Composition VII",
          "year" : 1913
        },
        {
          "title" : "Blue Flower",
          "year" : 1918
        }
      ]
    },
    {
      "_id" : 1920,
      "count" : 3,
      "artwork" : [
        {
          "title" : "The Pillars of Society",
          "year" : 1926
        },
        {
          "title" : "Dancer",
          "year" : 1925
        },
        {
          "title" : "The Persistence of Memory",
          "year" : 1931
        }
      ]
    },
    {
      // Includes the document without a year, e.g., _id: 4
      "_id" : "Unknown",
      "count" : 1,
      "artwork" : [
        {
          "title" : "The Great Wave off Kanagawa"
        }
      ]
    }
  ],
      "price" : [
    {
      "_id" : 0,
      "count" : 4,
      "artwork" : [
        {
          "title" : "The Pillars of Society",
          "price" : NumberDecimal("199.99")
        },
        {
          "title" : "Dancer",
          "price" : NumberDecimal("76.04")
        },
        {
          "title" : "The Great Wave off Kanagawa",
          "price" : NumberDecimal("167.30")
        },
        {
          "title" : "Blue Flower",
          "price" : NumberDecimal("118.42")
        }
      ]
    },
    {
      "_id" : 200,
      "count" : 2,
      "artwork" : [
        {
          "title" : "Melancholy III",
          "price" : NumberDecimal("280.00")
        },
        {
          "title" : "Composition VII",
          "price" : NumberDecimal("385.00")
        }
      ]
    },
    {
      // Includes the document without a price, e.g., _id: 7
      "_id" : "Other",
      "count" : 2,
      "artwork" : [
        {
          "title" : "The Persistence of Memory",
          "price" : NumberDecimal("483.00")
        },
        {
          "title" : "The Scream"
        }
      ]
    }
  ]
}

这里代码太长,可能有朋友没有足够的耐心看完,bucket + facet是非常常用的场景,这里解释一下,就是将两组bucket跟组合到了一起进行返回,可以按我自己的理解一个bucket就是多个List数组,List<List>,而一个facet就是在这个bucket在套一层List

更多的聚合关键字可以查看官方文档:https://www.mongodb.com/zh-cn/docs/manual/reference/operator/aggregation-pipeline/

MongoDB的聚合(Map Reduce)

图例:

代码示例:

json 复制代码
{ "_id": 1, "customerId": "A123", "amount": 100 }
{ "_id": 2, "customerId": "B456", "amount": 200 }
{ "_id": 3, "customerId": "A123", "amount": 150 }
{ "_id": 4, "customerId": "C789", "amount": 50 }
{ "_id": 5, "customerId": "B456", "amount": 300 }

使用 MapReduce 来计算每个 customerId 的总 amount

javascript 复制代码
// Map function
var mapFunction = function() {
    emit(this.customerId, this.amount);
};

// Reduce function
var reduceFunction = function(customerId, amounts) {
    return Array.sum(amounts);
};

// Execute MapReduce
db.orders.mapReduce(
    mapFunction,
    reduceFunction,
    { out: "order_totals" }
);

// 查看结果
db.order_totals.find().forEach(printjson);

{ "_id": "A123", "value": 250 }
{ "_id": "B456", "value": 500 }
{ "_id": "C789", "value": 50 }
  • Map Function : 对于每个文档,emit 函数将 customerId 作为键,amount 作为值发射出去。
  • Reduce Function : 对于每个唯一的 customerIdreduceFunction 接收一个键和与该键相关联的所有值的数组,并返回这些值的总和。
  • Output : 结果存储在 order_totals 集合中,每个文档包含一个 customerId 和该客户的总订单金额。

MongoDB的索引

图例:


类型:

  • 单一索引
sql 复制代码
{ "_id": 1, "username": "alice", "age": 30 }
{ "_id": 2, "username": "bob", "age": 25 }
sql 复制代码
db.users.createIndex({ username: 1 });

这里的 1 表示升序索引。对于降序索引,可以使用 -1

  • 复合索引
sql 复制代码
db.users.createIndex({ username: 1, age: -1 });
  • 多键索引
sql 复制代码
{ "_id": 1, "title": "MongoDB Basics", "tags": ["database", "NoSQL"] }
{ "_id": 2, "title": "Advanced MongoDB", "tags": ["database", "performance"] }
sql 复制代码
db.posts.createIndex({ tags: 1 });
  • 文字索引

支持文本搜索。它们允许对字符串字段进行全文搜索。

json 复制代码
{ "_id": 1, "content": "MongoDB is a NoSQL database" }
{ "_id": 2, "content": "Text search in MongoDB" }

我们可以在 content 字段上创建文字索引:

javascript 复制代码
db.articles.createIndex({ content: "text" });

然后,我们可以执行全文搜索:

javascript 复制代码
db.articles.find({ $text: { $search: "NoSQL" } });
  • 地理空间索引

引用于加速地理位置查询。MongoDB 支持 2D 和 2DSphere 索引

json 复制代码
{ "_id": 1, "name": "Central Park", "coordinates": [40.785091, -73.968285] }
{ "_id": 2, "name": "Golden Gate Bridge", "coordinates": [37.819929, -122.478255] }

我们可以在 coordinates 字段上创建 2DSphere 索引:

javascript 复制代码
db.locations.createIndex({ coordinates: "2dsphere" });
  • 哈希索引

用于均匀分布数据,适合需要高效等值查询的场景

json 复制代码
{ "_id": 1, "sku": "A123" }
{ "_id": 2, "sku": "B456" }

我们可以在 sku 字段上创建哈希索引:

javascript 复制代码
db.products.createIndex({ sku: "hashed" });

索引的操作:

查看集合索引

sql 复制代码
db.col.getIndexes()

查看集合索引大小

sql 复制代码
db.col.totalIndexSize()

删除集合所有索引

sql 复制代码
db.col.dropIndexes()

删除集合指定索引

sql 复制代码
db.col.dropIndex("索引名称")
相关推荐
Elastic 中国社区官方博客4 小时前
在 Elasticsearch 中使用 Mistral Chat completions 进行上下文工程
大数据·数据库·人工智能·elasticsearch·搜索引擎·ai·全文检索
编程爱好者熊浪6 小时前
两次连接池泄露的BUG
java·数据库
TDengine (老段)8 小时前
TDengine 字符串函数 CHAR 用户手册
java·大数据·数据库·物联网·时序数据库·tdengine·涛思数据
qq7422349848 小时前
Python操作数据库之pyodbc
开发语言·数据库·python
姚远Oracle ACE8 小时前
Oracle 如何计算 AWR 报告中的 Sessions 数量
数据库·oracle
Dxy12393102169 小时前
MySQL的SUBSTRING函数详解与应用
数据库·mysql
码力引擎9 小时前
【零基础学MySQL】第十二章:DCL详解
数据库·mysql·1024程序员节
杨云龙UP9 小时前
【MySQL迁移】MySQL数据库迁移实战(利用mysqldump从Windows 5.7迁至Linux 8.0)
linux·运维·数据库·mysql·mssql
l1t9 小时前
利用DeepSeek辅助修改luadbi-duckdb读取DuckDB decimal数据类型
c语言·数据库·单元测试·lua·duckdb
安当加密9 小时前
Nacos配置安全治理:把数据库密码从YAML里请出去
数据库·安全