MongoDB聚合运算符:$rank

MongoDB聚合运算符:$rank

文章目录

$rank聚合运算符返回$setWindowFields阶段分区内文档相对排名位置,文档排名先后顺序有$setWindowFields阶段的sortBy字段决定,sortBy只能取一个字段值。另外,如果多个文档有相同的排名,$rank会将文档与后续值放在排名的空隙中。

语法

js 复制代码
{ $rank: { } }

$rank不接受任何参数。

使用

$rank$denseRank的不同之处在于它们对sortBy字段重复值的处理不同,例如,sortField的值有7、9、9、10:

  • $denseRank排名的值为1、2、2、3,重复值9的排名为2,而10的排名为3,没有排名间隙。
  • $rank排名的值为1、2、2、4,重复值9的排名为为2,但10的排名为4,这里存在一个间隙3。

对于文档的sortBy字段值为空或缺失的情况,sortBy字段相关的排名按照BSON比较顺序执行。

举例

使用下面的脚本创建cakeSales集合,包含了在加利福尼亚(CA)和华盛顿(WA)的蛋糕销售记录:

js 复制代码
db.cakeSales.insertMany( [
   { _id: 0, type: "chocolate", orderDate: new Date("2020-05-18T14:10:30Z"),
     state: "CA", price: 13, quantity: 120 },
   { _id: 1, type: "chocolate", orderDate: new Date("2021-03-20T11:30:05Z"),
     state: "WA", price: 14, quantity: 140 },
   { _id: 2, type: "vanilla", orderDate: new Date("2021-01-11T06:31:15Z"),
     state: "CA", price: 12, quantity: 145 },
   { _id: 3, type: "vanilla", orderDate: new Date("2020-02-08T13:13:23Z"),
     state: "WA", price: 13, quantity: 104 },
   { _id: 4, type: "strawberry", orderDate: new Date("2019-05-18T16:09:01Z"),
     state: "CA", price: 41, quantity: 162 },
   { _id: 5, type: "strawberry", orderDate: new Date("2019-01-08T06:12:03Z"),
     state: "WA", price: 43, quantity: 134 }
] )

用整型字段进行分区排名

下面的聚合操作在$setWindowFields阶段使用$rank输出每个州的蛋糕销售量quantity排名:

js 复制代码
db.cakeSales.aggregate( [
   {
      $setWindowFields: {
         partitionBy: "$state",
         sortBy: { quantity: -1 },
         output: {
            rankQuantityForState: {
               $rank: {}
            }
         }
      }
   }
] )

在本例中:

  • partitionBy: "$state"根据state对集合中的文档进行分区,分为两个区CAWA
  • sortBy: {quantity: -1 }根据quantity对分区内的文档由大到小进行排序,销量最高的quantity排在最前面。
  • output使用$rankrankQuantityForState字段设置为quantity排名。

操作返回下面的结果:

json 复制代码
{ "_id" : 4, "type" : "strawberry", "orderDate" : ISODate("2019-05-18T16:09:01Z"),
  "state" : "CA", "price" : 41, "quantity" : 162, "rankQuantityForState" : 1 }
{ "_id" : 2, "type" : "vanilla", "orderDate" : ISODate("2021-01-11T06:31:15Z"),
  "state" : "CA", "price" : 12, "quantity" : 145, "rankQuantityForState" : 2 }
{ "_id" : 0, "type" : "chocolate", "orderDate" : ISODate("2020-05-18T14:10:30Z"),
  "state" : "CA", "price" : 13, "quantity" : 120, "rankQuantityForState" : 3 }
{ "_id" : 1, "type" : "chocolate", "orderDate" : ISODate("2021-03-20T11:30:05Z"),
  "state" : "WA", "price" : 14, "quantity" : 140, "rankQuantityForState" : 1 }
{ "_id" : 5, "type" : "strawberry", "orderDate" : ISODate("2019-01-08T06:12:03Z"),
  "state" : "WA", "price" : 43, "quantity" : 134, "rankQuantityForState" : 2 }
{ "_id" : 3, "type" : "vanilla", "orderDate" : ISODate("2020-02-08T13:13:23Z"),
  "state" : "WA", "price" : 13, "quantity" : 104, "rankQuantityForState" : 3 }

用日期字段进行分区排名

下面的例子演示了如何在$setWindowFields阶段使用$rank输出每个州蛋糕销售的orderDate排名:

js 复制代码
db.cakeSales.aggregate( [
   {
      $setWindowFields: {
         partitionBy: "$state",
         sortBy: { orderDate: 1 },
         output: {
            rankOrderDateForState: {
               $rank: {}
            }
         }
      }
   }
] )

在本例中:

  • partitionBy: "$state"根据state对集合中的文档进行分区,分为两个区CAWA
  • sortBy: { orderDate: 1 }根据orderDate对分区内的文档由小到大进行排序,最早的orderDate排在最前面。
  • output使用$rank对窗口中的文档按照销售日期进行排名。

操作返回下面的结果:

json 复制代码
{ "_id" : 4, "type" : "strawberry", "orderDate" : ISODate("2019-05-18T16:09:01Z"),
  "state" : "CA", "price" : 41, "quantity" : 162, "rankOrderDateForState" : 1 }
{ "_id" : 0, "type" : "chocolate", "orderDate" : ISODate("2020-05-18T14:10:30Z"),
  "state" : "CA", "price" : 13, "quantity" : 120, "rankOrderDateForState" : 2 }
{ "_id" : 2, "type" : "vanilla", "orderDate" : ISODate("2021-01-11T06:31:15Z"),
  "state" : "CA", "price" : 12, "quantity" : 145, "rankOrderDateForState" : 3 }
{ "_id" : 5, "type" : "strawberry", "orderDate" : ISODate("2019-01-08T06:12:03Z"),
  "state" : "WA", "price" : 43, "quantity" : 134, "rankOrderDateForState" : 1 }
{ "_id" : 3, "type" : "vanilla", "orderDate" : ISODate("2020-02-08T13:13:23Z"),
  "state" : "WA", "price" : 13, "quantity" : 104, "rankOrderDateForState" : 2 }
{ "_id" : 1, "type" : "chocolate", "orderDate" : ISODate("2021-03-20T11:30:05Z"),
  "state" : "WA", "price" : 14, "quantity" : 140, "rankOrderDateForState" : 3 }

重复值、空值和缺失数据的分区内排名

创建一个cakeSalesWithDuplicates集合:

js 复制代码
db.cakeSalesWithDuplicates.insertMany( [
   { _id: 0, type: "chocolate", orderDate: new Date("2020-05-18T14:10:30Z"),
     state: "CA", price: 13, quantity: 120 },
   { _id: 1, type: "chocolate", orderDate: new Date("2021-03-20T11:30:05Z"),
     state: "WA", price: 14, quantity: 140 },
   { _id: 2, type: "vanilla", orderDate: new Date("2021-01-11T06:31:15Z"),
     state: "CA", price: 12, quantity: 145 },
   { _id: 3, type: "vanilla", orderDate: new Date("2020-02-08T13:13:23Z"),
     state: "WA", price: 13, quantity: 104 },
   { _id: 4, type: "strawberry", orderDate: new Date("2019-05-18T16:09:01Z"),
     state: "CA", price: 41, quantity: 162 },
   { _id: 5, type: "strawberry", orderDate: new Date("2019-01-08T06:12:03Z"),
     state: "WA", price: 43, quantity: 134 },
   { _id: 6, type: "strawberry", orderDate: new Date("2020-01-08T06:12:03Z"),
     state: "WA", price: 41, quantity: 134 },
   { _id: 7, type: "strawberry", orderDate: new Date("2020-01-01T06:12:03Z"),
     state: "WA", price: 34, quantity: 134 },
   { _id: 8, type: "strawberry", orderDate: new Date("2020-01-02T06:12:03Z"),
     state: "WA", price: 40, quantity: 134 },
   { _id: 9, type: "strawberry", orderDate: new Date("2020-05-11T16:09:01Z"),
     state: "CA", price: 39, quantity: 162 },
   { _id: 10, type: "strawberry", orderDate: new Date("2020-05-11T16:09:01Z"),
     state: "CA", price: 39, quantity: null },
   { _id: 11, type: "strawberry", orderDate: new Date("2020-05-11T16:09:01Z"),
     state: "CA", price: 39 }
] )

cakeSalesWithDuplicates集合中:

  • 蛋糕销售的地点有加利福尼亚州(CA)和华盛顿州(WA)
  • 文档6到8与文档5的quantitystate相同
  • 文档9与文档4的quantitystate相同
  • 文档10的quantitynull
  • 文档11的quantity字段缺失

下面的例子在$setWindowFields阶段使用$rank依据quantitycakeSalesWithDuplicates集合文档进行排名:

js 复制代码
db.cakeSalesWithDuplicates.aggregate( [
   {
      $setWindowFields: {
         partitionBy: "$state",
         sortBy: { quantity: -1 },
         output: {
            rankQuantityForState: {
               $rank: {}
            }
         }
      }
   }
] )
  • partitionBy: "state"依据state字段对文档进行分区,有CAWA两个分区
  • sortBy:{quantity:-1}依据quantity对分区内的文档按照从大到小进行排序,quantity最大的排在最前面
  • output使用$rankquantity字段的排名赋予denseRankOrderDateForState字段,结果如下:
json 复制代码
{ "_id" : 4, "type" : "strawberry", "orderDate" : ISODate("2019-05-18T16:09:01Z"),
  "state" : "CA", "price" : 41, "quantity" : 162, "rankQuantityForState" : 1 }
{ "_id" : 9, "type" : "strawberry", "orderDate" : ISODate("2020-05-11T16:09:01Z"),
  "state" : "CA", "price" : 39, "quantity" : 162, "rankQuantityForState" : 1 }
{ "_id" : 2, "type" : "vanilla", "orderDate" : ISODate("2021-01-11T06:31:15Z"),
  "state" : "CA", "price" : 12, "quantity" : 145, "rankQuantityForState" : 3 }
{ "_id" : 0, "type" : "chocolate", "orderDate" : ISODate("2020-05-18T14:10:30Z"),
  "state" : "CA", "price" : 13, "quantity" : 120, "rankQuantityForState" : 4 }
{ "_id" : 10, "type" : "strawberry", "orderDate" : ISODate("2020-05-11T16:09:01Z"),
  "state" : "CA", "price" : 39, "quantity" : null, "rankQuantityForState" : 5 }
{ "_id" : 11, "type" : "strawberry", "orderDate" : ISODate("2020-05-11T16:09:01Z"),
  "state" : "CA", "price" : 39, "rankQuantityForState" : 6 }
{ "_id" : 1, "type" : "chocolate", "orderDate" : ISODate("2021-03-20T11:30:05Z"),
  "state" : "WA", "price" : 14, "quantity" : 140, "rankQuantityForState" : 1 }
{ "_id" : 5, "type" : "strawberry", "orderDate" : ISODate("2019-01-08T06:12:03Z"),
  "state" : "WA", "price" : 43, "quantity" : 134, "rankQuantityForState" : 2 }
{ "_id" : 6, "type" : "strawberry", "orderDate" : ISODate("2020-01-08T06:12:03Z"),
  "state" : "WA", "price" : 41, "quantity" : 134, "rankQuantityForState" : 2 }
{ "_id" : 7, "type" : "strawberry", "orderDate" : ISODate("2020-01-01T06:12:03Z"),
  "state" : "WA", "price" : 34, "quantity" : 134, "rankQuantityForState" : 2 }
{ "_id" : 8, "type" : "strawberry", "orderDate" : ISODate("2020-01-02T06:12:03Z"),
  "state" : "WA", "price" : 40, "quantity" : 134, "rankQuantityForState" : 2 }
{ "_id" : 3, "type" : "vanilla", "orderDate" : ISODate("2020-02-08T13:13:23Z"),
  "state" : "WA", "price" : 13, "quantity" : 104, "rankQuantityForState" : 6 }

从上面的结果可以看出:

  • 数量和状态相同的文件具有相同的排名,如果文档具有相同的排名,则该排名与下一个排名之间存在差距。
  • 在 CA 分区的输出中,数量为空的文档和数量为缺失的文档排序最低。这种排序是 BSON 比较顺序的结果,在本例中,将空值和缺失值排序在数字值之后。
相关推荐
Ai 编码助手4 小时前
MySQL中distinct与group by之间的性能进行比较
数据库·mysql
陈燚_重生之又为程序员4 小时前
基于梧桐数据库的实时数据分析解决方案
数据库·数据挖掘·数据分析
caridle4 小时前
教程:使用 InterBase Express 访问数据库(五):TIBTransaction
java·数据库·express
白云如幻4 小时前
MySQL排序查询
数据库·mysql
萧鼎4 小时前
Python并发编程库:Asyncio的异步编程实战
开发语言·数据库·python·异步
^velpro^4 小时前
数据库连接池的创建
java·开发语言·数据库
荒川之神4 小时前
ORACLE _11G_R2_ASM 常用命令
数据库·oracle
IT培训中心-竺老师4 小时前
Oracle 23AI创建示例库
数据库·oracle
小白学大数据5 小时前
JavaScript重定向对网络爬虫的影响及处理
开发语言·javascript·数据库·爬虫
time never ceases5 小时前
使用docker方式进行Oracle数据库的物理迁移(helowin/oracle_11g)
数据库·docker·oracle