FastAPI + React Full-Stack Development 10: MongoDB Aggregation Queries

Chapter02 Setting Up the Document Store with MongoDB

10 Aggregation framework

In the following pages, we will give a brief introduction to the MongoDB aggregation framework: what it is, what benefits it offers, and why it is regarded as one of the strongest selling points of the MongoDB ecosystem.


Centered around the concept of a pipeline (something that you might be familiar with if you have done some analytics or if you have ever chained a few commands in Linux), the aggregation framework is, at its simplest, an alternative way to retrieve sets of documents from a collection. It is similar to the find method that we have already used extensively, with the additional benefit that the data can be processed in different stages or steps.


With the aggregation pipeline, we pull documents from a MongoDB collection and feed them sequentially through the stages of the pipeline: each stage's output becomes the next stage's input, until the final set of documents is returned. Each stage performs some data-processing operation on the documents it receives, which can include modifying them, so the output documents often have a completely different structure from the input ones.
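
To make the idea of chained stages concrete, here is a minimal pure-Python sketch that does not touch MongoDB at all. The sample documents, their prices, and the helper names (`only_fiat`, `keep_make_and_price`, `run_pipeline`) are invented for illustration; the only point is that each stage receives the previous stage's output.

```python
from functools import reduce

# Hypothetical sample documents; field names follow the car examples in this section.
docs = [
    {"brand": "Fiat", "make": "Panda", "price": 5000},
    {"brand": "Opel", "make": "Corsa", "price": 6000},
    {"brand": "Fiat", "make": "Punto", "price": 4000},
]


def only_fiat(documents):
    """Stage 1: keep a subset of the documents (roughly what $match does)."""
    return [d for d in documents if d["brand"] == "Fiat"]


def keep_make_and_price(documents):
    """Stage 2: reshape each document (roughly what $project does)."""
    return [{"make": d["make"], "price": d["price"]} for d in documents]


def run_pipeline(documents, stages):
    """Feed the output of each stage into the next one, in list order."""
    return reduce(lambda current, stage: stage(current), stages, documents)


result = run_pipeline(docs, [only_fiat, keep_make_and_price])
print(result)
```

Note that the documents leaving the second stage no longer have a brand field, which is the sense in which the output structure can differ from the input.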


1. $match: Match only specific documents, e.g. those of a particular brand.

2. $project: Select existing fields or derive new ones, e.g. brand and model.

3. $group: Group according to a categorical feature, like brand.

4. $sort: Sort in ascending or descending order using a field.

5. $limit: Limit the results to a predefined number.
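
Put together, the five stages listed above could form a single pipeline. The sketch below only builds the list of stages as plain Python data, so it runs without a database. The field names (`brand`, `make`, `price`) are taken from the examples in this section, while the particular combination of stages and the limit of 5 are assumptions chosen for illustration.

```python
# Each stage is a dict with a single key: the stage operator.
pipeline = [
    # 1. Keep only the documents of one brand
    {"$match": {"brand": "Fiat"}},
    # 2. Keep only the fields that the later stages need
    {"$project": {"_id": 0, "make": 1, "price": 1}},
    # 3. Group by model and compute the average price per group
    {"$group": {"_id": "$make", "avgPrice": {"$avg": "$price"}}},
    # 4. Sort by average price, highest first
    {"$sort": {"avgPrice": -1}},
    # 5. Keep at most five results
    {"$limit": 5},
]

# The order of the list is the order in which the stages are applied.
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

With a collection object such as the `cars` variable used in this section, the list would be passed as `cars.aggregate(pipeline)`.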






Stages can include operations such as matching (which keeps only a subset of the entire collection), sorting, grouping, and projecting. The MongoDB documentation site is the best place to start if you want to get acquainted with all the possibilities, but we want to start with a couple of simple examples.


The syntax for the aggregation is similar to other methods: we use the aggregate method, which takes a list of stages as a parameter.


Probably the best aggregation, to begin with, would be to mimic the find method. Let's try to get all the Fiat cars in our collection as follows.


In the MongoDB shell:

db.cars.aggregate([{$match: {brand: "Fiat"}}])

The same aggregation in Python:

import mongo6

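# Connect and select the cars collection of the carsDB database
client = mongo6.MongoClient('mongodb://zhangdapeng:zhangdapeng520@')
db = client["carsDB"]
cars = db["cars"]

# A one-stage pipeline: keep only the documents whose brand is Fiat
query = [{"$match": {"brand": "Fiat"}}]
r = cars.aggregate(query)

# aggregate returns a cursor; iterate over it to see the matching documents
for doc in r:
    print(doc)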

This is probably the simplest possible aggregation. It consists of just one stage, the $match stage, which tells MongoDB that we only want the Fiats, so the output of the first stage is exactly that.
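
For a simple equality filter like this one, the document given to $match has the same shape as the filter that find accepts, which is why this aggregation mimics find. The helper below is a small sketch of that relationship; `find_as_pipeline` is a hypothetical name introduced here, not part of any MongoDB driver.

```python
def find_as_pipeline(filter_doc):
    """Wrap a find-style filter document in a one-stage aggregation pipeline."""
    return [{"$match": filter_doc}]


fiat_filter = {"brand": "Fiat"}
pipeline = find_as_pipeline(fiat_filter)
print(pipeline)

# Assuming `cars` is the collection object from the example above, these two
# calls would be expected to return the same documents:
#   list(cars.find(fiat_filter))
#   list(cars.aggregate(pipeline))
```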


Let's say that in the second stage we want to group our Fiat cars by model and then check the average price for every model. The second stage is a bit more complicated, but bear with us, it is not that hard. Run the following lines of code.


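import mongo6

# Connect and select the cars collection of the carsDB database
client = mongo6.MongoClient('mongodb://zhangdapeng:zhangdapeng520@')
db = client["carsDB"]
cars = db["cars"]

query = [
    # Stage 1: keep only the Fiat cars
    {"$match": {"brand": "Fiat"}},
    # Stage 2: group by the make field and average the price of each group
    {"$group": {"_id": {"model": "$make"}, "avgPrice": {"$avg": "$price"}}},
]
r = cars.aggregate(query)

# aggregate returns a cursor; iterate over it to see one document per model
for doc in r:
    print(doc)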

The second stage uses the $group operator to group the documents by the make field, which becomes the key (_id) of the output documents. The part {model: "$make"} is a bit counterintuitive, but it just gives MongoDB the following two important pieces of information:

  • model: Written without a dollar sign (and, in the MongoDB shell, without quotes), this is the key that will be used for the grouping. In our case, it makes sense to call it model, but we can call it anything we want; it simply labels the field that we are grouping by.
  • $make: This must be one of the fields present in the documents. In our case, the field is called make, and the dollar sign tells MongoDB that it refers to a document field. Other possibilities would be the year, the gearbox, or really any document field that has a categorical or ordinal meaning. Grouping by the price wouldn't make much sense.

The second argument in the group stage is the actual aggregation, as follows:

  • avgPrice: This is the chosen name for the quantity that we wish to map. In our case, it makes sense to call it avgPrice, but we can choose this variable's name as we please.
  • $avg: This is one of the available aggregation functions such as average, count, sum, maximum, minimum, and so on. In this example, we could have used the minimum function instead of the average function in order to get the cheapest Fiat for every model.
  • $price: Like $make in the preceding part of the expression, this is a field belonging to the documents, and it should be numeric, since calculating the average or the minimum of a string doesn't make much sense.
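
The grouping and averaging described above can be mimicked in plain Python to see what comes out of the stage. This is only an in-memory sketch of the idea, not how MongoDB implements $group, and the sample documents and prices are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical Fiat documents, i.e. what the $match stage would pass on.
fiats = [
    {"make": "Panda", "price": 4000},
    {"make": "Panda", "price": 6000},
    {"make": "Punto", "price": 3000},
]

# "$make" selects the field whose value decides which group a document joins.
prices_by_model = defaultdict(list)
for doc in fiats:
    prices_by_model[doc["make"]].append(doc["price"])

# One output document per group; "model" and "avgPrice" are freely chosen names.
grouped = [
    {"_id": {"model": model}, "avgPrice": mean(prices)}
    for model, prices in prices_by_model.items()
]
print(grouped)

# Swapping the accumulator: min() plays the role that $min would play.
cheapest = {model: min(prices) for model, prices in prices_by_model.items()}
print(cheapest)
```

The input has three documents but the output has only two, one per distinct value of make, which is the essential effect of a grouping stage.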


Pipelines can also include data processing through the $project operator, a handy tool for creating entirely new fields, derived from existing document fields, that are then carried into the next stages.


We will provide just one more example to showcase the power of $project in a pipeline stage. Let's consider the following aggregation.


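import mongo6

# Connect and select the cars collection of the carsDB database
client = mongo6.MongoClient('mongodb://zhangdapeng:zhangdapeng520@')
db = client["carsDB"]
cars = db["cars"]

query = [
    # Stage 1: keep only the Opel cars
    {"$match": {"brand": "Opel"}},
    # Stage 2: keep price and year, and build fullName from make and brand
    {"$project": {"_id": 0, "price": 1, "year": 1, "fullName": {"$concat": ["$make", " ", "$brand"]}}},
    # Stage 3: group by fullName and average the price of each group
    {"$group": {"_id": {"make": "$fullName"}, "avgPrice": {"$avg": "$price"}}},
    # Stage 4: sort by average price, highest first
    {"$sort": {"avgPrice": -1}},
    # Stage 5: return at most 10 results
    {"$limit": 10},
]
r = cars.aggregate(query)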

This might look intimidating at first, but it is mostly composed of elements that we have already seen. The $match stage selects only the Opel cars, and at the end the results are sorted by average price in descending order and cut off at the 10 priciest groups. But the projection in the middle? It is just a way to craft new fields in a stage using existing ones: here, fullName is built by concatenating make and brand.
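
To see what the projection does to a single document, the $project stage from the pipeline above can be imitated in plain Python. The function name `project_opel` and the sample document (including its `km` field) are hypothetical and exist only for this illustration.

```python
def project_opel(doc):
    """Mimic the $project stage of the Opel pipeline for one document."""
    return {
        # "price": 1 and "year": 1 keep these fields as they are
        "price": doc["price"],
        "year": doc["year"],
        # $concat joins the strings in order: make, a space, brand
        "fullName": doc["make"] + " " + doc["brand"],
        # "_id": 0 drops _id; fields that are not listed are dropped as well
    }


# Hypothetical input document (values invented for illustration).
car = {"_id": 1, "brand": "Opel", "make": "Astra", "year": 2015, "price": 7000, "km": 90000}
projected = project_opel(car)
print(projected)
```

The later $group stage only ever sees the three projected fields, which is why it can group on fullName even though no stored document contains it.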


