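Using MongoDB Aggregation Queries for Data Statistics

Introduction

When working with large data sets, MongoDB's aggregation framework lets you filter, group, and transform documents on the server. This article walks through a real-world case: using the aggregation framework to count the number of records per month within a given date range.

Use case

In this example we work with a patent database. The task is to count patent status-change records per month within a given date range, supplied as year-month values (for example, "202301" to "202312"). The difficulty is that dates are stored in the database as "yyyy.MM.dd" strings, while the query parameters arrive in "yyyyMM" format.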
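Since the stored dates are strings rather than date values, any range filter on them is a string comparison. The sketch below illustrates why that is still safe for fixed-width, zero-padded "yyyy.MM.dd" values, and how it breaks if a value is not padded. The class and method names (`DateStringOrderDemo`, `inRange`) are hypothetical and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DateStringOrderDemo {

    // True when value lies in [from, to] under plain lexicographic String comparison.
    static boolean inRange(String value, String from, String to) {
        return value.compareTo(from) >= 0 && value.compareTo(to) <= 0;
    }

    public static void main(String[] args) {
        // Illustrative values only; not taken from the real patent collection.
        List<String> stored = Arrays.asList(
                "2022.12.31", "2023.01.01", "2023.06.15", "2023.12.31", "2024.01.01");

        // Zero-padded values: lexicographic order matches chronological order.
        List<String> hits = stored.stream()
                .filter(d -> inRange(d, "2023.01.01", "2023.12.31"))
                .collect(Collectors.toList());
        System.out.println(hits); // [2023.01.01, 2023.06.15, 2023.12.31]

        // An unpadded month breaks the ordering: June 2023 is wrongly excluded.
        System.out.println(inRange("2023.6.15", "2023.01.01", "2023.12.31")); // false
    }
}
```

The approach therefore relies on every stored value following the "yyyy.MM.dd" pattern exactly.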

Original code

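import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.stream.Collectors;
import org.springframework.data.domain.Sort;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import org.springframework.data.mongodb.core.aggregation.Fields;
import org.springframework.data.mongodb.core.query.Criteria;
// StringUtils: the source does not say which library is used; both Spring's
// org.springframework.util.StringUtils and Apache Commons Lang provide isEmpty.
// FlztStatisticVo, PatentStatisticRequest, FlztVo and fullTextFLZTMongoTemplate are project classes/beans.

public FlztStatisticVo flzdStatistics(PatentStatisticRequest request) {
        String startmonth = request.getStartmonth();
        String endmonth = request.getEndmonth();
        if (StringUtils.isEmpty(startmonth) || StringUtils.isEmpty(endmonth)) {
            throw new RuntimeException("Both the start month and the end month are required");
        }

        // Convert the yyyyMM range into first/last calendar days, to be formatted the way the database stores dates
        LocalDate[] dateRange = convertMonthRangeToDates(startmonth, endmonth);
        DateTimeFormatter dbFormatter = DateTimeFormatter.ofPattern("yyyy.MM.dd");

        // Build the aggregation pipeline
        Aggregation aggregation = Aggregation.newAggregation(
                // lawdate is a fixed-width yyyy.MM.dd string, so a string range comparison is chronological
                Aggregation.match(Criteria.where("lawdate")
                        .gte(dateRange[0].format(dbFormatter))
                        .lte(dateRange[1].format(dbFormatter))),
                Aggregation.project("lawdate")
                        .andExpression("substr(lawdate, 0, 4)").as("year")  // extract the year (characters 0-3)
                        .andExpression("substr(lawdate, 5, 2)").as("month"), // extract the month (characters 5-6)
                Aggregation.group(Fields.fields().and("year").and("month"))
                        .count().as("flzt"),
                Aggregation.project().andExpression("concat(_id.year, _id.month)").as("month").andInclude("flzt"),
                Aggregation.sort(Sort.Direction.ASC, "month") // sort by month
        );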

        // Run the aggregation
        AggregationResults<FlztVo> results = fullTextFLZTMongoTemplate.aggregate(aggregation, "patents", FlztVo.class);

        // Collect the aggregation results
        List<String> months = new ArrayList<>(); // months for the output, formatted as yyyyMM
        List<Integer> flzts = new ArrayList<>();
        for (FlztVo result : results.getMappedResults()) {
            months.add(result.getMonth());
            flzts.add(result.getFlzt());
        }

        // Map each month to its record count
        Map<String, Integer> monthToFlztMap = new HashMap<>();
        for (int i = 0; i < months.size(); i++) {
            monthToFlztMap.put(months.get(i), flzts.get(i));
        }

        // Add every month in the range that the query did not return, with a count of 0
        DateTimeFormatter monthFormatter = DateTimeFormatter.ofPattern("yyyyMM");
        for (LocalDate date = dateRange[0]; !date.isAfter(dateRange[1]); date = date.plusMonths(1)) {
            String monthStr = date.format(monthFormatter);
            if (!monthToFlztMap.containsKey(monthStr)) {
                months.add(monthStr);
                monthToFlztMap.put(monthStr, 0);
            }
        }

        // Sort the months (yyyyMM strings sort chronologically)
        Collections.sort(months);

        // Rebuild the count list so that it follows the sorted month order
        List<Integer> sortedFlzt = months.stream()
                .map(monthToFlztMap::get)
                .collect(Collectors.toList());

        // Return the sorted months and their counts
        FlztStatisticVo vo = new FlztStatisticVo();
        vo.setFlzt(sortedFlzt);
        vo.setMonth(months);
        return vo;
    }


    /**
     * Converts a yyyyMM month range into the first and last calendar day it covers.
     * The caller formats these dates as yyyy.MM.dd to match the database.
     *
     * @param startMonth start month, e.g. 202301
     * @param endMonth   end month, e.g. 202312
     * @return {first day of startMonth, last day of endMonth}, e.g. [2023-01-01, 2023-12-31]
     */
    private LocalDate[] convertMonthRangeToDates(String startMonth, String endMonth) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern("yyyyMM");

        YearMonth startYearMonth = YearMonth.parse(startMonth, formatter);
        YearMonth endYearMonth = YearMonth.parse(endMonth, formatter);

        LocalDate startDate = startYearMonth.atDay(1); // first day of the start month
        LocalDate endDate = endYearMonth.atEndOfMonth(); // last day of the end month

        return new LocalDate[]{startDate, endDate};
    }

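Code explanation

Converting the date format

First, we need a method, convertMonthRangeToDates, that converts the year-month values supplied by the user into LocalDate objects. These can then be formatted as "yyyy.MM.dd" strings and compared against the dates stored in the database.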

private LocalDate[] convertMonthRangeToDates(String startMonth, String endMonth) {
    // ... method body ...
}

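The method takes the start and end year-month strings, parses each into a YearMonth, and returns two LocalDate objects: the first day of the start month and the last day of the end month.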
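To make the conversion concrete, here is a minimal standalone sketch of the same idea that also applies the "yyyy.MM.dd" formatting, so the resulting bounds can be inspected directly. The class name `MonthRangeDemo` and the helper `toDbBounds` are hypothetical and are not part of the original service.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

class MonthRangeDemo {
    private static final DateTimeFormatter REQUEST_FORMAT = DateTimeFormatter.ofPattern("yyyyMM");
    private static final DateTimeFormatter DB_FORMAT = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    // Returns {first day of startMonth, last day of endMonth}, both as yyyy.MM.dd strings.
    static String[] toDbBounds(String startMonth, String endMonth) {
        LocalDate first = YearMonth.parse(startMonth, REQUEST_FORMAT).atDay(1);
        // atEndOfMonth accounts for month length, including leap-year Februaries.
        LocalDate last = YearMonth.parse(endMonth, REQUEST_FORMAT).atEndOfMonth();
        return new String[]{first.format(DB_FORMAT), last.format(DB_FORMAT)};
    }

    public static void main(String[] args) {
        String[] bounds = toDbBounds("202301", "202402");
        System.out.println(bounds[0] + " .. " + bounds[1]); // 2023.01.01 .. 2024.02.29
    }
}
```

Parsing through YearMonth rather than appending "01" or "31" by hand avoids producing invalid end dates such as "2023.02.31".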

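Building the aggregation query

With the date range in hand, we build an aggregation query that selects the records inside the range, then groups them by month and counts each group. Because lawdate is stored as a fixed-width, zero-padded "yyyy.MM.dd" string, a string range comparison against the formatted bounds selects the same records a chronological comparison would.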

Aggregation aggregation = Aggregation.newAggregation(
    // ... aggregation stages ...
);

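The key stages of the pipeline are: match the records within the date range, extract the year and month from lawdate, group by year and month while counting the records in each group, concatenate year and month back into a yyyyMM key, and sort the results by month.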
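As a way to reason about what these stages compute, the following sketch emulates them in memory with Java streams. It does not call MongoDB or Spring Data; the class name `MonthlyCountEmulation`, the method `countByMonth`, and the sample dates are all hypothetical and serve only to illustrate the logic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MonthlyCountEmulation {

    // Emulates match -> project(year, month) -> group/count -> concat -> sort on plain strings.
    static Map<String, Long> countByMonth(List<String> lawdates, String from, String to) {
        return lawdates.stream()
                // match: keep yyyy.MM.dd strings inside [from, to]
                .filter(d -> d.compareTo(from) >= 0 && d.compareTo(to) <= 0)
                .collect(Collectors.groupingBy(
                        // project + concat: substr(lawdate, 0, 4) and substr(lawdate, 5, 2) take
                        // (start, length); Java's substring takes (begin, end)
                        d -> d.substring(0, 4) + d.substring(5, 7),
                        // sort: a TreeMap keeps the yyyyMM keys in ascending order
                        TreeMap::new,
                        // group + count
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<String> lawdates = Arrays.asList(
                "2022.12.30", "2023.01.05", "2023.01.20", "2023.03.11", "2024.01.02");
        System.out.println(countByMonth(lawdates, "2023.01.01", "2023.12.31"));
        // prints {202301=2, 202303=1}
    }
}
```

Note that February 2023 is absent from the output: grouping only produces keys for months that have at least one record.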

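Processing the aggregation results

After the query runs, we post-process its results. Grouping only returns months that have at least one record, so we make sure every month in the requested range appears in the output and set the count to 0 for months with no records in the database.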

// ... run the aggregation and process the results ...

We build a map from each month to its record count, then add an entry with a count of 0 for every month in the range that the query did not return.
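The zero-filling step can also be sketched on its own. The version below is one possible variant, not the original implementation: it walks the requested range month by month and looks up each count, which yields an already ordered result without a separate sort. `ZeroFillDemo` and `fillMissingMonths` are hypothetical names, and the counts are made up.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class ZeroFillDemo {
    private static final DateTimeFormatter MONTH_FORMAT = DateTimeFormatter.ofPattern("yyyyMM");

    // Returns one entry per month in [startMonth, endMonth], in chronological order,
    // using 0 for months that are missing from the query results.
    static Map<String, Integer> fillMissingMonths(Map<String, Integer> counts,
                                                  String startMonth, String endMonth) {
        YearMonth start = YearMonth.parse(startMonth, MONTH_FORMAT);
        YearMonth end = YearMonth.parse(endMonth, MONTH_FORMAT);
        Map<String, Integer> filled = new LinkedHashMap<>(); // preserves insertion order
        for (YearMonth m = start; !m.isAfter(end); m = m.plusMonths(1)) {
            String key = m.format(MONTH_FORMAT);
            filled.put(key, counts.getOrDefault(key, 0));
        }
        return filled;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>(); // stand-in for the aggregation output
        counts.put("202301", 4);
        counts.put("202303", 7);
        System.out.println(fillMissingMonths(counts, "202212", "202304"));
        // prints {202212=0, 202301=4, 202302=0, 202303=7, 202304=0}
    }
}
```

The keys and values of the returned map correspond to the month list and count list that the service method returns.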

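Method summary

The `flzdStatistics` method

This method holds the main statistics logic. It takes a request object containing the start and end months and builds an aggregation query that counts the records for each month. It then post-processes the results so that every month in the range is present, and returns a data object containing the list of months and the record count for each.

The `convertMonthRangeToDates` method

This method converts the requested year-month range into LocalDate objects. By parsing the start and end year-month values, it produces two LocalDate objects that mark the first and last day of the query range.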

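Conclusion

This article showed how to use a MongoDB aggregation query for a monthly counting task. We converted the year-month values supplied by the user into the format used for the database query, and made sure the result covers every month in the requested range. The same approach fits other time-series statistics, particularly when some periods may have no data at all.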
