Async++ Source Code Analysis 8: partitioner.h

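I. Async++ Directory Layout

The Async++ project has a clear layout: configuration files in the root, plus a source directory, a header directory, and an examples directory. The structure is as follows: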

asyncplusplus/
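├── .gitignore               # Git ignore rules
├── Async++Config.cmake.in   # CMake package config template
├── CMakeLists.txt           # CMake build script
├── LICENSE                  # License file (MIT)
├── README.md                # Project readme
├── examples/                # Example code
│   └── gtk_scheduler.cpp    # GTK scheduler example
├── src/                     # Library sources
│   ├── fifo_queue.h         # FIFO queue implementation
│   ├── internal.h           # Internal header (type definitions, macros, etc.)
│   ├── scheduler.cpp        # Scheduler implementation
│   ├── singleton.h          # Singleton implementation
│   ├── task_wait_event.h    # Task wait event implementation
│   ├── threadpool_scheduler.cpp  # Thread pool scheduler implementation
│   └── work_steal_queue.h   # Work-stealing queue implementation
└── include/                 # Public headers
    ├── async++.h            # Main header (single public entry point)
    └── async++/             # Per-module headers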
        ├── aligned_alloc.h
        ├── cancel.h
        ├── continuation_vector.h
        ├── parallel_for.h
        ├── parallel_invoke.h
        ├── parallel_reduce.h
        ├── partitioner.h    # Partitioner definitions
        ├── range.h          # Range (iterator pair) definitions
        ├── ref_count.h
        ├── scheduler.h      # Scheduler interface
        ├── scheduler_fwd.h
        ├── task.h           # Task class definitions
        ├── task_base.h      # Task base class definitions
        ├── traits.h
        └── when_all_any.h

II. partitioner.h Source Code Analysis

2.1 Source code

// Copyright (c) 2015 Amanieu d'Antras
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

#ifndef ASYNCXX_H_
# error "Do not include this header directly, include <async++.h> instead."
#endif

namespace async {
namespace detail {

// Partitioners are essentially ranges with an extra split() function. The
// split() function returns a partitioner containing a range to be executed in a
// child task and modifies the parent partitioner's range to represent the rest
// of the original range. If the range cannot be split any more then split()
// should return an empty range.

// Detect whether a range is a partitioner
template<typename T, typename = decltype(std::declval<T>().split())>
two& is_partitioner_helper(int);
template<typename T>
one& is_partitioner_helper(...);
template<typename T>
struct is_partitioner: public std::integral_constant<bool, sizeof(is_partitioner_helper<T>(0)) - 1> {};

// Automatically determine a grain size for a sequence length
inline std::size_t auto_grain_size(std::size_t dist)
{
	// Determine the grain size automatically using a heuristic
	std::size_t grain = dist / (8 * hardware_concurrency());
	if (grain < 1)
		grain = 1;
	if (grain > 2048)
		grain = 2048;
	return grain;
}

template<typename Iter>
class static_partitioner_impl {
	Iter iter_begin, iter_end;
	std::size_t grain;

public:
	static_partitioner_impl(Iter begin_, Iter end_, std::size_t grain_)
		: iter_begin(begin_), iter_end(end_), grain(grain_) {}
	Iter begin() const
	{
		return iter_begin;
	}
	Iter end() const
	{
		return iter_end;
	}
	static_partitioner_impl split()
	{
		// Don't split if below grain size
		std::size_t length = std::distance(iter_begin, iter_end);
		static_partitioner_impl out(iter_end, iter_end, grain);
		if (length <= grain)
			return out;

		// Split our range in half
		iter_end = iter_begin;
		std::advance(iter_end, (length + 1) / 2);
		out.iter_begin = iter_end;
		return out;
	}
};

template<typename Iter>
class auto_partitioner_impl {
	Iter iter_begin, iter_end;
	std::size_t grain;
	std::size_t num_threads;
	std::thread::id last_thread;

public:
	// thread_id is initialized to "no thread" and will be set on first split
	auto_partitioner_impl(Iter begin_, Iter end_, std::size_t grain_)
		: iter_begin(begin_), iter_end(end_), grain(grain_) {}
	Iter begin() const
	{
		return iter_begin;
	}
	Iter end() const
	{
		return iter_end;
	}
	auto_partitioner_impl split()
	{
		// Don't split if below grain size
		std::size_t length = std::distance(iter_begin, iter_end);
		auto_partitioner_impl out(iter_end, iter_end, grain);
		if (length <= grain)
			return out;

		// Check if we are in a different thread than we were before
		std::thread::id current_thread = std::this_thread::get_id();
		if (current_thread != last_thread)
			num_threads = hardware_concurrency();

		// If we only have one thread, don't split
		if (num_threads <= 1)
			return out;

		// Split our range in half
		iter_end = iter_begin;
		std::advance(iter_end, (length + 1) / 2);
		out.iter_begin = iter_end;
		out.last_thread = current_thread;
		last_thread = current_thread;
		out.num_threads = num_threads / 2;
		num_threads -= out.num_threads;
		return out;
	}
};

} // namespace detail

// A simple partitioner which splits until a grain size is reached. If a grain
// size is not specified, one is chosen automatically.
template<typename Range>
detail::static_partitioner_impl<decltype(std::begin(std::declval<Range>()))> static_partitioner(Range&& range, std::size_t grain)
{
	return {std::begin(range), std::end(range), grain};
}
template<typename Range>
detail::static_partitioner_impl<decltype(std::begin(std::declval<Range>()))> static_partitioner(Range&& range)
{
	std::size_t grain = detail::auto_grain_size(std::distance(std::begin(range), std::end(range)));
	return {std::begin(range), std::end(range), grain};
}

// A more advanced partitioner which initially divides the range into one chunk
// for each available thread. The range is split further if a chunk gets stolen
// by a different thread.
template<typename Range>
detail::auto_partitioner_impl<decltype(std::begin(std::declval<Range>()))> auto_partitioner(Range&& range)
{
	std::size_t grain = detail::auto_grain_size(std::distance(std::begin(range), std::end(range)));
	return {std::begin(range), std::end(range), grain};
}

// Wrap a range in a partitioner. If the input is already a partitioner then it
// is returned unchanged. This allows parallel algorithms to accept both ranges
// and partitioners as parameters.
template<typename Partitioner>
typename std::enable_if<detail::is_partitioner<typename std::decay<Partitioner>::type>::value, Partitioner&&>::type to_partitioner(Partitioner&& partitioner)
{
	return std::forward<Partitioner>(partitioner);
}
template<typename Range>
typename std::enable_if<!detail::is_partitioner<typename std::decay<Range>::type>::value, detail::auto_partitioner_impl<decltype(std::begin(std::declval<Range>()))>>::type to_partitioner(Range&& range)
{
	return async::auto_partitioner(std::forward<Range>(range));
}

// Overloads with std::initializer_list
template<typename T>
detail::static_partitioner_impl<decltype(std::declval<std::initializer_list<T>>().begin())> static_partitioner(std::initializer_list<T> range)
{
	return async::static_partitioner(async::make_range(range.begin(), range.end()));
}
template<typename T>
detail::static_partitioner_impl<decltype(std::declval<std::initializer_list<T>>().begin())> static_partitioner(std::initializer_list<T> range, std::size_t grain)
{
	return async::static_partitioner(async::make_range(range.begin(), range.end()), grain);
}
template<typename T>
detail::auto_partitioner_impl<decltype(std::declval<std::initializer_list<T>>().begin())> auto_partitioner(std::initializer_list<T> range)
{
	return async::auto_partitioner(async::make_range(range.begin(), range.end()));
}
template<typename T>
detail::auto_partitioner_impl<decltype(std::declval<std::initializer_list<T>>().begin())> to_partitioner(std::initializer_list<T> range)
{
	return async::auto_partitioner(async::make_range(range.begin(), range.end()));
}

} // namespace async

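This header is the core of the partitioner mechanism in Async++. It lives in the async:: namespace and its nested async::detail:: namespace. Partitioners are the "task splitting engine" behind the parallel algorithms (such as parallel_for and parallel_map_reduce): they split an iterable range (for example an array or a list) into sub-ranges that several threads can process in parallel, and different splitting strategies trade parallelism against scheduling overhead. The analysis below goes layer by layer.

1. Core concept: what is a partitioner?

In the Async++ parallel model, a partitioner is the bridge between the original range and the parallel tasks. It has to provide two things:

Range access: begin()/end(), so the current sub-range can be iterated;
Splitting: split(), which returns a child range (to be handed to another task) and shrinks the parent to the remainder. When splitting stops is decided by the concrete strategy; a partitioner signals "cannot split further" by returning an empty range.

Because of this abstraction, a parallel algorithm such as parallel_for does not need to know how the range is split. It only calls split() to distribute work, which makes it easy to plug in new strategies.

2. Code structure and main components

The code has two layers: the internal implementation (type detection and splitting strategies) and the public interface (functions that create partitioners). The core pieces are two splitting strategies, the static partitioner and the auto partitioner, plus two helpers: partitioner type detection and automatic grain size calculation.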

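2.2 Internal helper logic

2.2.1 Partitioner type detection: `is_partitioner`

SFINAE (Substitution Failure Is Not An Error) is used to decide at compile time whether a type is a partitioner, that is, whether it has a split() method:

// Helper: selected when T has a split() method (returns two&, whose sizeof is 2)
template<typename T, typename = decltype(std::declval<T>().split())>
two& is_partitioner_helper(int);
// Helper: fallback when T has no split() method (returns one&, whose sizeof is 1)
template<typename T>
one& is_partitioner_helper(...);

// The trait: the size of the helper's return type tells whether T is a partitioner
template<typename T>
struct is_partitioner: public std::integral_constant<bool, sizeof(is_partitioner_helper<T>(0)) - 1> {};

Purpose: this trait lets to_partitioner (see below) pick a branch by type. A partitioner is returned as is, anything else is wrapped in the default partitioner.
How it works: one and two are tag types of size 1 and 2 defined elsewhere in the library (they are not shown in this header). sizeof applied to a reference yields the size of the referenced type, so sizeof(two&) - 1 = 1 (a partitioner) and sizeof(one&) - 1 = 0 (a plain range). When both overloads are viable, the int overload wins because an ellipsis parameter always ranks last in overload resolution.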
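
To make the detection trick concrete, here is a minimal standalone sketch of the same pattern. The namespace `sketch` and the two test types `with_split` and `without_split` are made up for illustration, and the `one`/`two` typedefs are my assumption of what the library's tag types look like. The comparison against `sizeof(two)` is an equivalent, slightly more explicit spelling of the `- 1` trick.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {

// Two array types of different size, used only as sizeof "tags"
// (assumed shape; the real definitions live elsewhere in Async++).
typedef char one[1];
typedef char two[2];

// Preferred overload: only viable when T().split() is a valid expression.
template<typename T, typename = decltype(std::declval<T>().split())>
two& is_partitioner_helper(int);
// Fallback overload: always viable, but an ellipsis parameter ranks last.
template<typename T>
one& is_partitioner_helper(...);

// The helpers are never called; only the size of their return type is inspected.
template<typename T>
struct is_partitioner
    : std::integral_constant<bool,
          sizeof(is_partitioner_helper<T>(0)) == sizeof(two)> {};

// Hypothetical test types, not part of Async++.
struct with_split { with_split split(); };
struct without_split { int* begin() const; int* end() const; };

} // namespace sketch
```

With these definitions, `sketch::is_partitioner<sketch::with_split>::value` is true and `sketch::is_partitioner<sketch::without_split>::value` is false, and both are decided entirely at compile time.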

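2.2.2 Automatic grain size: `auto_grain_size`

The grain size is the length at which a partitioner stops splitting: a range whose length is less than or equal to the grain is not split any further. auto_grain_size picks a reasonable value with a simple heuristic:

inline std::size_t auto_grain_size(std::size_t dist)
{
    // Core rule: total length / (8 * hardware threads), i.e. aim for about
    // 8 chunks per hardware thread so there is some slack for load balancing
    std::size_t grain = dist / (8 * hardware_concurrency());
    // Bounds: at least 1 (a grain of 0 would be meaningless), at most 2048
    // (so very large ranges still split into many chunks instead of a few huge ones)
    if (grain < 1)
        grain = 1;
    if (grain > 2048)
        grain = 2048;
    return grain;
}

Design intent: balance parallel granularity against scheduling overhead. If the grain is too small there are too many tasks and scheduling dominates; if it is too large the cores cannot all be kept busy.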
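
A few concrete numbers help here. The sketch below reproduces the same arithmetic, but takes the hardware thread count as an explicit parameter so the results are reproducible. The names `clamp_grain` and `grain_for` are mine, not the library's, and `hw_threads` is assumed to be non-zero.

```cpp
#include <cstddef>

// Clamp a raw grain value to the [1, 2048] interval used by the heuristic.
constexpr std::size_t clamp_grain(std::size_t g)
{
    return g < 1 ? 1 : (g > 2048 ? 2048 : g);
}

// Same rule as auto_grain_size, with the thread count passed in explicitly
// (an assumption made here; the library calls hardware_concurrency()).
constexpr std::size_t grain_for(std::size_t dist, std::size_t hw_threads)
{
    return clamp_grain(dist / (8 * hw_threads));
}

// Worked values for a 4-thread machine (8 * 4 = 32):
//   grain_for(10, 4)       -> 10 / 32 = 0, raised to 1
//   grain_for(1000, 4)     -> 1000 / 32 = 31
//   grain_for(1000000, 4)  -> 31250, capped at 2048
```

On such a machine the cap starts to apply at 2048 * 32 = 65536 elements. Beyond that, the number of leaf chunks keeps growing with the range instead of staying at about 32.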

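2.3 Core partitioner implementations

The framework provides two partitioners, static_partitioner_impl (static) and auto_partitioner_impl (auto), each with its own splitting strategy.

2.3.1 Static partitioner: `static_partitioner_impl`

Strategy: split by a fixed grain. Each split() cuts the current range in half, and splitting stops once a range is no longer than the grain. Thread load is not taken into account.

Member variables

template<typename Iter>
class static_partitioner_impl {
    Iter iter_begin, iter_end; // iterators of the current sub-range (begin/end)
    std::size_t grain;         // minimum split granularity
    // ...
};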

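Key methods

(1) `split()`: split the current range

static_partitioner_impl split()
{
    // 1. Compute the current length; at or below the grain, return an empty range (stop)
    std::size_t length = std::distance(iter_begin, iter_end);
    static_partitioner_impl out(iter_end, iter_end, grain); // empty to begin with
    if (length <= grain)
        return out;

    // 2. Bisect: cut the current range into two parts
    iter_end = iter_begin;                    // parent's end moves to the midpoint
    std::advance(iter_end, (length + 1) / 2); // midpoint rounded up: parent keeps the larger half
    out.iter_begin = iter_end;                // child becomes [midpoint, original end)

    return out; // the child range, to be processed by another task
}

Splitting example: original range [0, 10), grain 3:
1. First split: the parent becomes [0, 5) (length 5 > 3), the child is [5, 10);
2. The parent [0, 5) splits again: it becomes [0, 3) (length 3 = grain, stops), the child is [3, 5) (length 2, stops);
3. The child [5, 10) splits: it becomes [5, 8) (stops), the child is [8, 10) (stops);
4. Final sub-ranges: [0, 3), [3, 5), [5, 8), [8, 10).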
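
The walk-through above can be checked with a small simulation. This is a sketch under simplifying assumptions: `index_partitioner` uses plain integer indices instead of iterators, and `collect_chunks` is a sequential stand-in for what a recursive parallel driver might do (split, recurse on both halves, process the range once split() comes back empty). Both names are hypothetical, not Async++ APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// Index-based stand-in for static_partitioner_impl over [first, last).
struct index_partitioner {
    std::size_t first, last, grain;

    index_partitioner split()
    {
        assert(first <= last);
        std::size_t length = last - first;
        index_partitioner out{last, last, grain}; // empty unless we split
        if (length <= grain)
            return out;
        last = first + (length + 1) / 2; // parent keeps the larger half
        out.first = last;                // child takes [midpoint, old end)
        return out;
    }
};

typedef std::vector<std::pair<std::size_t, std::size_t>> chunk_list;

// Keep splitting until split() returns an empty range, then record the
// chunk that is left. A real driver would run the two halves in parallel.
inline void collect_chunks(index_partitioner part, chunk_list& out)
{
    index_partitioner child = part.split();
    if (child.first == child.last) {
        out.emplace_back(part.first, part.last);
        return;
    }
    collect_chunks(part, out);  // parent half first, so chunks come out in order
    collect_chunks(child, out);
}

inline chunk_list chunks(std::size_t first, std::size_t last, std::size_t grain)
{
    chunk_list out;
    collect_chunks(index_partitioner{first, last, grain}, out);
    return out;
}

} // namespace sketch
```

Calling `sketch::chunks(0, 10, 3)` yields the four pairs (0,3), (3,5), (5,8), (8,10), matching the list above. Note that the leaf sizes are not all equal to the grain: the grain is an upper bound on the leaf length, not an exact chunk size.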

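(2) `begin()`/`end()`: access the current sub-range

These return the begin and end iterators of the current sub-range so that a parallel algorithm can iterate over its elements (for example, parallel_for invokes the callback for each element of each sub-range).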

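2.3.2 Auto partitioner: `auto_partitioner_impl`

Strategy: adapt the splitting to the number of hardware threads and to work stealing. Initially the range is divided into large chunks, roughly one per hardware thread. If a chunk is stolen by another thread, it is split further to rebalance the load.

Member variables

template<typename Iter>
class auto_partitioner_impl {
    Iter iter_begin, iter_end;   // iterators of the current sub-range
    std::size_t grain;           // minimum split granularity
    std::size_t num_threads;     // thread budget for this range (adjusted on every split)
    std::thread::id last_thread; // thread that last split this range (used to detect stealing)
    // ...
};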

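Key method: `split()`

Compared with the static partitioner, the auto partitioner's split() adds steal detection and a thread budget:

auto_partitioner_impl split()
{
    // 1. Basic check: at or below the grain, stop splitting
    std::size_t length = std::distance(iter_begin, iter_end);
    auto_partitioner_impl out(iter_end, iter_end, grain);
    if (length <= grain)
        return out;

    // 2. Steal detection: is this the same thread as on the previous split?
    std::thread::id current_thread = std::this_thread::get_id();
    if (current_thread != last_thread) {
        // Different thread (the range was stolen), or the very first split
        // (last_thread still holds the default "no thread" id):
        // reset the budget to the number of hardware threads
        num_threads = hardware_concurrency();
    }

    // 3. With a budget of one thread, stop splitting (more tasks would only add overhead)
    if (num_threads <= 1)
        return out;

    // 4. Bisect: same logic as the static partitioner
    iter_end = iter_begin;
    std::advance(iter_end, (length + 1) / 2);
    out.iter_begin = iter_end;

    // 5. Share the thread budget between child and parent
    out.last_thread = current_thread;  // child remembers the thread that created it
    last_thread = current_thread;      // parent remembers the thread that split it
    out.num_threads = num_threads / 2; // child gets half of the budget (rounded down)
    num_threads -= out.num_threads;    // parent keeps the rest

    return out;
}

Main advantage: it fits work stealing, which is how the Async++ thread pool schedules tasks:
1. Initially, as long as the splits happen on the same thread, the range is divided into about one large chunk per hardware thread (for example 4 chunks on a 4-core machine), so each thread can take one;
2. If thread A finishes early and steals a chunk that was split off on thread B, the next split() on that chunk sees a different thread id, resets the budget, and splits further. Thread A then works on smaller pieces, and the remaining pieces can be stolen in turn, which balances the load.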
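
The budget bookkeeping is easier to follow in a deterministic simulation. The sketch below is not the library class: it swaps iterators for integer indices, `std::thread::id` for an integer worker id (0 meaning "no thread yet"), and the `hardware_concurrency()` call for an explicit `hw_threads` argument. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

// Deterministic stand-in for auto_partitioner_impl (see assumptions above).
struct auto_index_partitioner {
    std::size_t first, last, grain;
    std::size_t num_threads; // remaining thread budget for this range
    int last_worker;         // worker that last split this range, 0 = none

    auto_index_partitioner(std::size_t f, std::size_t l, std::size_t g)
        : first(f), last(l), grain(g), num_threads(0), last_worker(0) {}

    bool empty() const { return first == last; }

    auto_index_partitioner split(int current_worker, std::size_t hw_threads)
    {
        assert(current_worker != 0 && hw_threads > 0);
        std::size_t length = last - first;
        auto_index_partitioner out(last, last, grain);
        if (length <= grain)
            return out;
        // First split, or the range moved to another worker: reset the budget.
        if (current_worker != last_worker)
            num_threads = hw_threads;
        if (num_threads <= 1)
            return out;
        last = first + (length + 1) / 2;
        out.first = last;
        out.last_worker = current_worker;
        last_worker = current_worker;
        out.num_threads = num_threads / 2;
        num_threads -= out.num_threads;
        return out;
    }
};

// Count the leaf chunks produced when every split runs on the same worker.
inline std::size_t leaves_on_one_worker(auto_index_partitioner part,
                                        int worker, std::size_t hw_threads)
{
    auto_index_partitioner child = part.split(worker, hw_threads);
    if (child.empty())
        return 1;
    return leaves_on_one_worker(part, worker, hw_threads)
         + leaves_on_one_worker(child, worker, hw_threads);
}

} // namespace sketch
```

For a range [0, 1000) with grain 1 and 4 hardware threads, `leaves_on_one_worker` returns 4: the budget goes 4, then 2 + 2, then 1 + 1 + 1 + 1, and a budget of 1 refuses to split. If one of those quarter-size chunks is then split from a different worker id, the budget is reset to 4 and the chunk divides again, which is the steal case described above.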

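2.4 External interface: creating and adapting partitioners

The framework provides a set of public functions that make partitioners easy to create and use. They cover cases such as wrapping a plain range in a partitioner automatically and choosing a custom grain size.

2.4.1 Creating a static partitioner: `static_partitioner`

Two call forms are supported, a user-defined grain and an automatically computed grain:

// 1. Custom grain: the caller specifies the minimum split granularity
template<typename Range>
detail::static_partitioner_impl<...> static_partitioner(Range&& range, std::size_t grain)
{
    return {std::begin(range), std::end(range), grain};
}

// 2. Automatic grain: auto_grain_size computes the granularity
template<typename Range>
detail::static_partitioner_impl<...> static_partitioner(Range&& range)
{
    std::size_t grain = detail::auto_grain_size(std::distance(std::begin(range), std::end(range)));
    return {std::begin(range), std::end(range), grain};
}

Example:

std::vector<int> nums = {1,2,3,4,5,6,7,8};
// Static partitioner with a custom grain of 2
auto part = async::static_partitioner(nums, 2);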

2.4.2 Creating an auto partitioner: `auto_partitioner`

Only the automatically computed grain is supported, so the caller has nothing to configure:

template<typename Range>
detail::auto_partitioner_impl<...> auto_partitioner(Range&& range)
{
    std::size_t grain = detail::auto_grain_size(std::distance(std::begin(range), std::end(range)));
    return {std::begin(range), std::end(range), grain};
}

Example:

// Auto partitioner (adapts to work stealing)
auto part = async::auto_partitioner(nums);

2.4.3 Adapting ranges automatically: `to_partitioner`

Purpose: give plain ranges and partitioners a single entry point. A partitioner is returned unchanged; anything else is wrapped in an auto partitioner, which is the framework's default:

// 1. The input is already a partitioner: forward it
template<typename Partitioner>
typename std::enable_if<detail::is_partitioner<...>::value, Partitioner&&>::type to_partitioner(Partitioner&& partitioner)
{
    return std::forward<Partitioner>(partitioner);
}

// 2. The input is a plain range: wrap it in an auto partitioner
template<typename Range>
typename std::enable_if<!detail::is_partitioner<...>::value, detail::auto_partitioner_impl<...>>::type to_partitioner(Range&& range)
{
    return async::auto_partitioner(std::forward<Range>(range));
}

Effect: a parallel algorithm such as parallel_for does not have to care whether it was given a range or a partitioner. For example:

// Case 1: the input is a plain range, to_partitioner wraps it in an auto partitioner
async::parallel_for(nums, [](int x) { /* ... */ });

// Case 2: the input is a user-chosen partitioner, to_partitioner forwards it as is
auto part = async::static_partitioner(nums, 2);
async::parallel_for(part, [](int x) { /* ... */ });
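
The two-overload dispatch can be reproduced in isolation. The following is a minimal sketch of the pattern rather than the library code: `has_split` is a hypothetical detection trait playing the role of `is_partitioner`, and `default_partitioner` is a made-up wrapper standing in for `auto_partitioner_impl` (it never actually splits).

```cpp
#include <iterator>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {

// Detection trait: true when T has a callable split() member.
template<typename T>
struct has_split {
    template<typename U, typename = decltype(std::declval<U>().split())>
    static std::true_type test(int);
    template<typename U>
    static std::false_type test(...);
    static const bool value = decltype(test<T>(0))::value;
};

// Hypothetical default wrapper; split() always reports "cannot split".
template<typename Iter>
struct default_partitioner {
    Iter first, last;
    Iter begin() const { return first; }
    Iter end() const { return last; }
    default_partitioner split() { return default_partitioner{last, last}; }
};

// Already a partitioner: forward it untouched (lvalues stay lvalues).
template<typename P>
typename std::enable_if<has_split<typename std::decay<P>::type>::value, P&&>::type
to_partitioner(P&& p)
{
    return std::forward<P>(p);
}

// Plain range: wrap its iterators in the default partitioner.
template<typename R>
typename std::enable_if<!has_split<typename std::decay<R>::type>::value,
    default_partitioner<decltype(std::begin(std::declval<R>()))>>::type
to_partitioner(R&& r)
{
    return {std::begin(r), std::end(r)};
}

} // namespace sketch
```

Passing a `std::vector<int>&` selects the second overload and produces a `default_partitioner<std::vector<int>::iterator>`. Passing that result back in selects the first overload and returns a reference to the same object, so wrapping is idempotent. The `std::decay` step matters: without it, a reference type such as `P&` would be tested instead of `P`.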

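2.4.4 Support for `std::initializer_list`

Overloads are provided for initializer lists (such as {1,2,3}). They convert the list to a range and then create the partitioner. Two of them are shown here; the overloads that take a grain size and the to_partitioner overload follow the same pattern:

// static_partitioner overload for initializer_list
template<typename T>
detail::static_partitioner_impl<...> static_partitioner(std::initializer_list<T> range)
{
    return async::static_partitioner(async::make_range(range.begin(), range.end()));
}

// auto_partitioner overload for initializer_list
template<typename T>
detail::auto_partitioner_impl<...> auto_partitioner(std::initializer_list<T> range)
{
    return async::auto_partitioner(async::make_range(range.begin(), range.end()));
}

Example:

// Create a static partitioner from an initializer list and use it right away.
// The list's backing array is a temporary that only lives until the end of this
// full expression, so the partitioner should not be stored in a variable.
async::parallel_for(async::static_partitioner({1,2,3,4,5}, 2), [](int x) { /* ... */ });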

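III. When to Use Which Partitioner

Partitioner	Main advantage	Typical use
Static partitioner	Simple logic, no thread-detection overhead	Uniform work per element (each element takes about the same time), no dynamic load balancing needed
Auto partitioner	Adapts to work stealing, good load balancing	Uneven work per element (processing time varies a lot), splitting needs to adapt at run time
Framework default (auto)	Nothing to configure, fits most cases	The best strategy is unknown, or the goal is simpler code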

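IV. Summary

The partitioner mechanism is at the heart of the Async++ parallel algorithms. By abstracting the splitting logic, it decouples the algorithms from the splitting strategy:

Flexibility: two strategies, static and auto, which the user can choose between based on the workload;
Ease of use: to_partitioner accepts both ranges and partitioners, which keeps calls to the parallel algorithms simple;
Performance: automatic grain size calculation and steal detection balance parallel granularity against scheduling overhead.

In practice, if the work per element is uniform, a static partitioner avoids some overhead. If it is uneven or unknown, prefer the auto partitioner (the framework default) and let dynamic load balancing do its job.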
