MongoDB 的高级聚合框架(Aggregation Framework)提供了一种强大且灵活的方法来处理和分析数据。通过使用聚合管道(Aggregation Pipeline),你可以执行复杂的数据处理操作,如过滤、排序、分组和计算聚合结果等。
一、聚合管道简介
聚合管道由一系列阶段(stage)组成,每个阶段将输入文档转换为输出文档,然后将输出文档传递给下一个阶段。以下是一些常见的聚合阶段:
- $match:过滤文档(相当于 SQL 中的 WHERE 子句)。
- $group:分组文档并计算聚合结果(相当于 SQL 中的 GROUP BY 子句)。
- $project:重新格式化文档,选择或计算新字段。
- $sort:排序文档。
- $limit:限制返回的文档数。
- $skip:跳过指定数量的文档。
- $lookup:联表操作(相当于 SQL 中的 JOIN)。
- $unwind:展开数组字段。
二、示例数据集
假设我们有以下示例数据集:
json
[
{ "_id": 1, "name": "Alice", "age": 25, "city": "New York", "hobbies": ["reading", "hiking"], "score": 85 },
{ "_id": 2, "name": "Bob", "age": 30, "city": "San Francisco", "hobbies": ["cooking", "gaming"], "score": 90 },
{ "_id": 3, "name": "Charlie", "age": 35, "city": "New York", "hobbies": ["hiking", "traveling"], "score": 75 },
{ "_id": 4, "name": "David", "age": 40, "city": "San Francisco", "hobbies": ["reading", "cooking"], "score": 88 },
{ "_id": 5, "name": "Eve", "age": 45, "city": "New York", "hobbies": ["gaming", "traveling"], "score": 92 }
]
三、常见聚合操作示例
1. 使用 $match 过滤文档
过滤出住在 New York 的用户:
javascript
db.collection.aggregate([
{ $match: { city: "New York" } }
])
结果:
json
[
{ "_id": 1, "name": "Alice", "age": 25, "city": "New York", "hobbies": ["reading", "hiking"], "score": 85 },
{ "_id": 3, "name": "Charlie", "age": 35, "city": "New York", "hobbies": ["hiking", "traveling"], "score": 75 },
{ "_id": 5, "name": "Eve", "age": 45, "city": "New York", "hobbies": ["gaming", "traveling"], "score": 92 }
]
2. 使用 $group 进行分组和聚合
按城市分组,并计算每个城市的用户平均年龄和总分:
javascript
db.collection.aggregate([
{ $group: {
_id: "$city",
avgAge: { $avg: "$age" },
totalScore: { $sum: "$score" }
}}
])
结果:
json
[
{ "_id": "New York", "avgAge": 35, "totalScore": 252 },
{ "_id": "San Francisco", "avgAge": 35, "totalScore": 178 }
]
3. 使用 $project 选择和重命名字段
选择用户的姓名和年龄,并多添加一个新字段 isAdult 表明用户是否成年(假设成年年龄为18岁):
javascript
db.collection.aggregate([
{ $project: {
_id: 0,
name: 1,
age: 1,
isAdult: { $gte: ["$age", 18] }
}}
])
结果:
json
[
{ "name": "Alice", "age": 25, "isAdult": true },
{ "name": "Bob", "age": 30, "isAdult": true },
{ "name": "Charlie", "age": 35, "isAdult": true },
{ "name": "David", "age": 40, "isAdult": true },
{ "name": "Eve", "age": 45, "isAdult": true }
]
4. 使用 $sort 排序文档
按分数降序排序:
javascript
db.collection.aggregate([
{ $sort: { score: -1 } }
])
结果:
json
[
{ "_id": 5, "name": "Eve", "age": 45, "city": "New York", "hobbies": ["gaming", "traveling"], "score": 92 },
{ "_id": 2, "name": "Bob", "age": 30, "city": "San Francisco", "hobbies": ["cooking", "gaming"], "score": 90 },
{ "_id": 4, "name": "David", "age": 40, "city": "San Francisco", "hobbies": ["reading", "cooking"], "score": 88 },
{ "_id": 1, "name": "Alice", "age": 25, "city": "New York", "hobbies": ["reading", "hiking"], "score": 85 },
{ "_id": 3, "name": "Charlie", "age": 35, "city": "New York", "hobbies": ["hiking", "traveling"], "score": 75 }
]
5. 使用 $limit 和 $skip 实现分页
获取分数最高的前 3 名用户:
javascript
db.collection.aggregate([
{ $sort: { score: -1 } },
{ $limit: 3 }
])
结果:
json
[
{ "_id": 5, "name": "Eve", "age": 45, "city": "New York", "hobbies": ["gaming", "traveling"], "score": 92 },
{ "_id": 2, "name": "Bob", "age": 30, "city": "San Francisco", "hobbies": ["cooking", "gaming"], "score": 90 },
{ "_id": 4, "name": "David", "age": 40, "city": "San Francisco", "hobbies": ["reading", "cooking"], "score": 88 }
]
跳过前 2 名用户,获取接下来的 2 名用户:
javascript
db.collection.aggregate([
{ $sort: { score: -1 } },
{ $skip: 2 },
{ $limit: 2 }
])
结果:
json
[
{ "_id": 4, "name": "David", "age": 40, "city": "San Francisco", "hobbies": ["reading", "cooking"], "score": 88 },
{ "_id": 1, "name": "Alice", "age": 25, "city": "New York", "hobbies": ["reading", "hiking"], "score": 85 }
]
6. 使用 $lookup 进行联表操作
假设有另一个名为 orders 的集合,包含用户的订单信息:
json
[
{ "_id": 1, "userId": 1, "total": 100 },
{ "_id": 2, "userId": 2, "total": 200 },
{ "_id": 3, "userId": 1, "total": 150 },
{ "_id": 4, "userId": 3, "total": 250 }
]
使用 $lookup 将用户和订单信息联接起来:
javascript
db.collection.aggregate([
{ $lookup: {
from: "orders",
localField: "_id",
foreignField: "userId",
as: "orders"
}}
])
结果:
json
[
{ "_id": 1, "name": "Alice", "age": 25, "city": "New York", "hobbies": ["reading", "hiking"], "score": 85, "orders": [{ "_id": 1, "userId": 1, "total": 100 }, { "_id": 3, "userId": 1, "total": 150 }] },
{ "_id": 2, "name": "Bob", "age": 30, "city": "San Francisco", "hobbies": ["cooking", "gaming"], "score": 90, "orders": [{ "_id": 2, "userId": 2, "total": 200 }] },
{ "_id": 3, "name": "Charlie", "age": 35, "city": "New York", "hobbies": ["hiking"]}