Using std::async and std::future

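Core scenario 1: Running multiple independent tasks in parallel (fan-out plus result aggregation)

The most common scenario: you have N tasks that do not interfere with one another (reading 10 files, say, or making 3 different network requests). You run them in parallel and combine the results at the end.

Code example: parallel data processing

Suppose we want to compute over different parts of a large array in parallel.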

#include <chrono>
#include <functional> // std::cref
#include <future>
#include <iostream>
#include <numeric>    // std::accumulate
#include <thread>     // std::this_thread, std::thread::hardware_concurrency
#include <vector>

// Sum the elements in the half-open range [start, end)
int sum_part(const std::vector<int>& data, size_t start, size_t end) {
    // Simulate an expensive computation
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    return std::accumulate(data.begin() + start, data.begin() + end, 0);
}

int main() {
    // 1. Prepare the data (10,000 integers, kept small for the demo)
    std::vector<int> big_data(10000, 1);

    // 2. Choose a partitioning strategy (here: one block per hardware thread)
    unsigned int num_threads = std::thread::hardware_concurrency();
    if (num_threads == 0) num_threads = 4; // fallback when the value is unknown

    size_t block_size = big_data.size() / num_threads;

    std::vector<std::future<int>> futures; // holds every future

    std::cout << "Launching " << num_threads << " asynchronous tasks..." << std::endl;

    // 3. Launch the tasks
    for (unsigned int i = 0; i < num_threads; ++i) {
        size_t start = i * block_size;
        // The last block also takes any leftover elements
        size_t end = (i == num_threads - 1) ? big_data.size() : (i + 1) * block_size;

        // std::launch::async makes each task run asynchronously, as if on a new thread
        futures.push_back(std::async(std::launch::async, sum_part, std::cref(big_data), start, end));
    }

    // 4. The main thread is free to do other work here...
    std::cout << "Main thread keeps working while the tasks compute..." << std::endl;

    // 5. Aggregate the results (a typical map-reduce pattern)
    int total = 0;
    for (auto& fut : futures) {
        total += fut.get(); // blocks here if that task has not finished yet
    }

    std::cout << "Done, total: " << total << std::endl;
    return 0;
}

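Key points:

Managing futures in a container: a std::vector<std::future<int>> holds all of the future objects.
std::cref: std::async copies its arguments by default, so passing the large vector by const reference without copying it requires wrapping it in std::cref. The referenced vector must outlive every task that reads it.
std::launch::async: in multithreaded code it is usually best to specify this policy explicitly. With the default policy the implementation is allowed to defer the task, in which case it runs on the calling thread only when get() is called, rather than in parallel.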
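To make the difference between the two launch policies concrete, here is a minimal sketch. The helper name ran_on_caller_thread is an assumption made up for this illustration; it simply compares the thread that ran the task with the thread that called get().

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical helper: returns true if a task launched with `policy`
// was executed on the same thread that called get().
bool ran_on_caller_thread(std::launch policy) {
    const std::thread::id caller = std::this_thread::get_id();
    std::future<std::thread::id> fut =
        std::async(policy, [] { return std::this_thread::get_id(); });
    assert(fut.valid()); // std::async always hands back a future with shared state

    // With std::launch::deferred the lambda has not started yet; get() runs it here.
    // With std::launch::async the lambda runs on a separate thread.
    return fut.get() == caller;
}
```

Calling ran_on_caller_thread(std::launch::deferred) should report true, and ran_on_caller_thread(std::launch::async) should report false, which is why the deferred policy gives no parallelism.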

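Core scenario 2: Pipelines and dependency chains (chaining futures with get())

Sometimes task B depends on the result of task A, and task C depends on the result of task B. Because .get() blocks until the result is ready, it can be used to build such a dependency chain.

Code example: a pipeline with sequential dependencies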

#include <chrono>
#include <future>
#include <iostream>
#include <string>
#include <thread> // std::this_thread

// Step 1: load the data
std::string load_data() {
    std::cout << "[Thread A] Loading data..." << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return "RawData_12345";
}

// Step 2: parse the data (depends on the result of step 1)
std::string parse_data(std::string raw) {
    std::cout << "[Thread B] Parsing: " << raw << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return "Parsed{" + raw + "}";
}

// Step 3: save the data (depends on the result of step 2)
void save_data(std::string parsed) {
    std::cout << "[Thread C] Saving: " << parsed << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

int main() {
    // Start A
    std::future<std::string> f1 = std::async(std::launch::async, load_data);

    // Start B with f1.get() as its argument. Note: main blocks here until f1 is ready,
    // and only then is f2 launched.
    // For A and B to run fully decoupled and in parallel, there must be no direct
    // data dependency between them.
    std::future<std::string> f2 = std::async(std::launch::async, parse_data, f1.get());

    // Start C (main blocks again until f2 is ready)
    std::future<void> f3 = std::async(std::launch::async, save_data, f2.get());

    f3.get(); // wait for the last step to finish
    std::cout << "All pipeline stages finished!" << std::endl;

    return 0;
}
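In the example above the main thread is blocked at every get(). One possible variation, sketched here under the assumption that C++14 init-captures are available, is to move each future into the task of the next stage so that the waiting happens on a worker thread. The names load_stage, parse_stage, and run_pipeline are hypothetical stand-ins for this sketch, and the sleeps are left out.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-ins for load_data and parse_data, without the sleeps.
std::string load_stage() { return "RawData_12345"; }
std::string parse_stage(std::string raw) { return "Parsed{" + raw + "}"; }

// Builds the chain so that the caller blocks only at the final get().
std::string run_pipeline() {
    std::future<std::string> loaded = std::async(std::launch::async, load_stage);

    // Move the first future into the second task. The wait for stage 1
    // now happens on the worker thread, not on the calling thread.
    std::future<std::string> parsed = std::async(
        std::launch::async,
        [input = std::move(loaded)]() mutable { return parse_stage(input.get()); });
    assert(!loaded.valid()); // ownership was transferred into the lambda

    // The caller could do unrelated work here.
    return parsed.get(); // the only blocking call on this thread
}
```

The stages still run one after another, because the data dependency is real; what changes is that the calling thread stays free until it actually needs the final result.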

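Core scenario 3: Avoiding data races (futures instead of shared variables)

One of the biggest headaches in multithreaded code is the data race. A major advantage of std::future is that it delivers the result as a return value rather than by modifying shared memory.

Counterexample (do not do this):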

// Dangerous: a shared variable with no lock protecting it
int shared_result = 0;

void unsafe_increment() {
    for (int i = 0; i < 10000; ++i) shared_result++; // data race when run from several threads
}

Correct approach (using futures):

#include <iostream>
#include <future>

int safe_calculation() {
    int res = 0;
    for(int i=0; i<10000; ++i) res++;
    return res; // touches only a local variable, so it is thread-safe
}

int main() {
    auto f1 = std::async(std::launch::async, safe_calculation);
    auto f2 = std::async(std::launch::async, safe_calculation);
    
    // Combine the results on the main thread at the end; no mutex is needed
    int total = f1.get() + f2.get(); 
    std::cout << "Total: " << total << std::endl;
    return 0;
}

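Pitfalls and caveats in multithreaded code

1. Beware of hidden blocking (the future's destructor)

This is one of the easiest traps to fall into in standard C++. Consider the following code: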

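#include <chrono>
#include <future>
#include <iostream>
#include <thread>

void bad_code() {
    // An asynchronous task is launched, but the returned future is not stored anywhere!
    std::async(std::launch::async, [](){
        std::this_thread::sleep_for(std::chrono::seconds(5));
        std::cout << "Task done!" << std::endl;
    });

    // Note: the std::async call above returns a temporary future.
    // That temporary is destroyed as soon as the statement ends.
    // Because it came from std::async with the async policy, its destructor
    // blocks right there for the full 5 seconds, until the task has finished.
    std::cout << "This line is blocked!" << std::endl;
}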

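Fix: whenever you use std::launch::async, store the returned future in a variable, even if you do not need its value. This does not remove the wait, it only moves it: the destructor still blocks when that variable goes out of scope, so keep the future alive for as long as the task should run in the background.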
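The following sketch shows where the wait ends up once the future is stored. The helper launch_and_keep_future and its event log are assumptions made up for this illustration; a std::promise is used as a gate so that the order of events does not depend on timing.

```cpp
#include <cassert>
#include <future>
#include <string>
#include <vector>

// Hypothetical helper: records the order of events to show when blocking happens.
std::vector<std::string> launch_and_keep_future() {
    std::vector<std::string> log;
    std::promise<void> gate;                        // lets the caller release the task
    std::future<void> gate_opened = gate.get_future();
    {
        // Storing the future in a named variable keeps this statement from blocking.
        std::future<void> task = std::async(std::launch::async, [&log, &gate_opened] {
            gate_opened.wait();                     // held until the caller opens the gate
            log.push_back("task finished");
        });
        assert(task.valid());

        log.push_back("caller continued");          // reached while the task is still pending
        gate.set_value();                           // release the task
    }   // task's destructor runs here and blocks until the lambda has returned
    log.push_back("scope exited");
    return log;
}
```

If the std::async result were left as a temporary instead, its destructor would wait for a task that can never be released, and the function would hang. With the named future, the log comes out as "caller continued", "task finished", "scope exited".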

2. `std::future` cannot be copied, only moved

std::future is move-only. You cannot copy it into a container; an existing future has to be transferred with std::move.

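// func stands for any callable that returns int
std::future<int> f = std::async(func);
std::vector<std::future<int>> vec;
// vec.push_back(f); // compile error: std::future is not copyable
vec.push_back(std::move(f)); // OK: moves the future into the vector, leaving f invalid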

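3. Exceptions propagate across threads

If a function launched by std::async throws, the program does not crash right away. The exception is stored in the future and rethrown when .get() is called, on the thread that calls .get() (usually the main thread).

This is a good thing: the main thread can catch a task's exception with an ordinary try-catch around the corresponding .get() call.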
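A minimal sketch of this behavior follows. The names failing_task and run_and_report, and the error message, are made up for the illustration.

```cpp
#include <cassert>
#include <future>
#include <stdexcept>
#include <string>

// Hypothetical task that always fails.
int failing_task() {
    throw std::runtime_error("disk not found");
}

// Returns the message of the exception rethrown by get(), or "no error".
std::string run_and_report() {
    std::future<int> fut = std::async(std::launch::async, failing_task);
    assert(fut.valid());
    try {
        fut.get(); // rethrows, on this thread, the exception stored by the task
        return "no error";
    } catch (const std::runtime_error& e) {
        return e.what();
    }
}
```

Note that only get() rethrows. If several futures are read inside one try block, the first exception leaves the block and the remaining futures are never read, so wrap each get() separately when every result matters.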

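Summary: best-practice checklist

Prefer std::async over std::thread: unless you need low-level control over threads, async is safer and more convenient, since it takes care of joining the thread and of passing back return values and exceptions. The standard does not guarantee a thread pool; whether threads are reused depends on the implementation.
Specify std::launch::async explicitly: unless you really want deferred (lazy) evaluation, state the policy so that the task is guaranteed to run asynchronously.
Keep the future object: this avoids unexpected blocking when a temporary future is destroyed.
Communicate through return values: pass data through the future's result wherever possible, and reduce the reliance on std::mutex.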
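To show what std::async saves you compared with std::thread, here is a rough side-by-side sketch. The functions compute, run_with_thread, and run_with_async are hypothetical, and the manual version is only one of several ways to return a value from a thread.

```cpp
#include <cassert>
#include <exception>
#include <future>
#include <thread>
#include <utility>

int compute() { return 6 * 7; } // hypothetical workload

// Manual version: the promise, the exception handling, and the join are all explicit.
int run_with_thread() {
    std::promise<int> result;
    std::future<int> fut = result.get_future();
    std::thread worker([p = std::move(result)]() mutable {
        try {
            p.set_value(compute());
        } catch (...) {
            p.set_exception(std::current_exception()); // forward the error to the future
        }
    });
    assert(worker.joinable());
    worker.join();      // destroying a joinable std::thread calls std::terminate
    return fut.get();
}

// std::async version: the same result with the plumbing handled by the library.
int run_with_async() {
    std::future<int> fut = std::async(std::launch::async, compute);
    return fut.get();
}
```

Both functions return the same value; the difference is how much bookkeeping the caller has to get right.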
