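C/C++ Concurrent Programming in Depth: How to Write Good Concurrent Programs

Preface

Now that multi-core processors are everywhere, making full use of the hardware depends on writing efficient concurrent programs. As systems programming languages, C and C++ offer a rich set of concurrency tools and mechanisms. This article covers the core concepts, best practices, and common pitfalls of concurrent programming in C/C++ to help you write good concurrent programs.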

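1. Concurrency Basics

1.1 Processes vs. Threads

Process: an independent unit of execution with its own address space

Thread: a lightweight unit of execution inside a process; it shares the process's address space and resources but has its own stack and registers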

In C++, concurrency is mainly implemented with threads:


#include <iostream>
#include <thread>

void hello() {
    std::cout << "Hello from thread!" << std::endl;
}

int main() {
    std::thread t(hello);
    t.join();  // wait for the thread to finish
    return 0;
}

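1.2 Challenges of Concurrent Programming

Race condition: the result depends on the timing of threads that access a shared resource without synchronization, with at least one of them writing

Deadlock: threads wait on each other to release resources, so none can proceed

Livelock: threads keep reacting to each other but make no progress

Starvation: some threads never obtain the resources they need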

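2. Synchronization Mechanisms

2.1 Mutex

A mutex is the most basic synchronization mechanism. It protects a shared resource by letting only one thread hold the lock at a time.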

#include <iostream>
#include <thread>
#include <mutex>

std::mutex mtx;
int counter = 0;

void increment() {
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(mtx);  // locks here, unlocks at end of scope
        ++counter;
    }
}

int main() {
    std::thread t1(increment);
    std::thread t2(increment);
    
    t1.join();
    t2.join();
    
    std::cout << "Counter: " << counter << std::endl;
    return 0;
}

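Best practices:

Use std::lock_guard or std::unique_lock to manage the lock's lifetime

Keep the locked region as small as possible

Avoid time-consuming operations while holding a lock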
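
The advice on lock granularity can be sketched as follows. This is a minimal illustration under stated assumptions: the names `total`, `total_mtx`, `sum_range`, and `locked_parallel_sum` are hypothetical and not part of the example above. Each thread does its expensive work on a local variable and takes the lock only once, to publish its result.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state for this sketch.
std::mutex total_mtx;
long long total = 0;

// Sum the integers in [begin, end) locally, then publish under the lock.
void sum_range(int begin, int end) {
    long long local = 0;                          // thread-private, no lock needed
    for (int i = begin; i < end; ++i) {
        local += i;
    }
    std::lock_guard<std::mutex> lock(total_mtx);  // held for a single addition
    total += local;
}

// Split [0, n) across num_threads workers and return the combined sum.
long long locked_parallel_sum(int n, int num_threads) {
    {
        std::lock_guard<std::mutex> lock(total_mtx);
        total = 0;
    }
    std::vector<std::thread> workers;
    const int chunk = n / num_threads;
    for (int t = 0; t < num_threads; ++t) {
        int begin = t * chunk;
        int end = (t == num_threads - 1) ? n : begin + chunk;  // last worker takes the remainder
        workers.emplace_back(sum_range, begin, end);
    }
    for (auto& w : workers) {
        w.join();
    }
    return total;  // safe to read: join() synchronizes with each worker's completion
}
```

Calling `locked_parallel_sum(1000, 4)` acquires the mutex five times in total (once to reset and once per worker) instead of once per loop iteration.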

2.2 Condition Variable

A condition variable lets a thread sleep until another thread signals that some condition has changed.

#include <chrono>
#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>

std::mutex mtx;
std::condition_variable cv;
std::queue<int> data_queue;
bool finished = false;

void producer() {
    for (int i = 0; i < 10; ++i) {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        {
            std::lock_guard<std::mutex> lock(mtx);
            data_queue.push(i);
            std::cout << "Produced: " << i << std::endl;
        }
        cv.notify_one();  // wake one consumer
    }
    {
        std::lock_guard<std::mutex> lock(mtx);
        finished = true;
    }
    cv.notify_all();  // wake all consumers so they can exit
}

void consumer() {
    while (true) {
        std::unique_lock<std::mutex> lock(mtx);
        cv.wait(lock, []{ return !data_queue.empty() || finished; });
        
        if (!data_queue.empty()) {
            int data = data_queue.front();
            data_queue.pop();
            std::cout << "Consumed: " << data << std::endl;
            lock.unlock();
            // process the data outside the lock...
        } else if (finished) {
            break;
        }
    }
}

int main() {
    std::thread prod(producer);
    std::thread cons1(consumer);
    std::thread cons2(consumer);
    
    prod.join();
    cons1.join();
    cons2.join();
    
    return 0;
}

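Best practices:

Wait with a predicate so that spurious wakeups are handled correctly

Do as little as possible while holding the lock

Use notify_one() when any single waiter can handle the event, and notify_all() when every waiter must re-check the condition (for example, on shutdown)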
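
To show why the predicate matters, here is a small sketch that writes out the loop which the predicate overload of `wait` performs internally. The `Gate` type and the function names are hypothetical and exist only for this illustration.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot gate used only for this sketch.
struct Gate {
    std::mutex m;
    std::condition_variable cv;
    bool open = false;
    int payload = 0;
};

// Waits with an explicit loop; cv.wait(lock, pred) is equivalent to this.
int wait_for_payload(Gate& g) {
    std::unique_lock<std::mutex> lock(g.m);
    while (!g.open) {   // re-check after every wakeup, spurious or not
        g.cv.wait(lock);
    }
    return g.payload;
}

// Publishes a value and wakes the waiter.
void open_gate(Gate& g, int value) {
    {
        std::lock_guard<std::mutex> lock(g.m);
        g.payload = value;
        g.open = true;  // change the condition while holding the mutex
    }
    g.cv.notify_one();  // notify after unlocking so the waiter can proceed at once
}

// Runs one waiter and one notifier; returns what the waiter observed.
int run_gate_demo(int value) {
    Gate g;
    int seen = -1;
    std::thread waiter([&] { seen = wait_for_payload(g); });
    std::thread notifier([&] { open_gate(g, value); });
    waiter.join();
    notifier.join();
    return seen;
}
```

Because the waiter checks `open` before sleeping, the demo works whether the notifier runs before or after the waiter reaches `wait`.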

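2.3 Read-Write Lock

When reads far outnumber writes, a read-write lock can improve performance because multiple readers may hold the lock at the same time. std::shared_mutex requires C++17.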

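#include <mutex>         // std::unique_lock
#include <shared_mutex>  // std::shared_mutex, std::shared_lock (C++17)

std::shared_mutex rw_mutex;
int shared_data = 0;

void reader() {
    std::shared_lock<std::shared_mutex> lock(rw_mutex);  // shared: many readers at once
    // read shared_data
}

void writer() {
    std::unique_lock<std::shared_mutex> lock(rw_mutex);  // exclusive: one writer, no readers
    // modify shared_data
}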
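
A fuller sketch of the same idea, wrapping the data and its lock in one class. The `SharedValue` class and `run_shared_value_demo` are hypothetical names chosen for this illustration, and the code assumes a C++17 compiler.

```cpp
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical read-mostly value guarded by a reader-writer lock (C++17).
class SharedValue {
public:
    int get() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);  // readers do not block each other
        return value_;
    }
    void add(int delta) {
        std::unique_lock<std::shared_mutex> lock(mutex_);  // writers get exclusive access
        value_ += delta;
    }
private:
    mutable std::shared_mutex mutex_;  // mutable so that get() can stay const
    int value_ = 0;
};

// Starts several writers and four readers; returns the final value.
int run_shared_value_demo(int writers, int adds_per_writer) {
    SharedValue v;
    std::vector<std::thread> threads;
    for (int w = 0; w < writers; ++w) {
        threads.emplace_back([&v, adds_per_writer] {
            for (int i = 0; i < adds_per_writer; ++i) v.add(1);
        });
    }
    for (int r = 0; r < 4; ++r) {
        threads.emplace_back([&v] {
            for (int i = 0; i < 1000; ++i) (void)v.get();
        });
    }
    for (auto& t : threads) t.join();
    return v.get();
}
```

Keeping the mutex private to the class means callers cannot touch the value without going through a locked accessor.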

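3. Atomic Operations

Atomic operations synchronize without a mutex and suit simple cases such as counters. Whether a given std::atomic<T> is truly lock-free depends on the type and the platform; is_lock_free() reports it.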

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> atomic_counter(0);

void increment_atomic() {
    for (int i = 0; i < 100000; ++i) {
        ++atomic_counter;  // atomic read-modify-write
    }
}

int main() {
    std::thread t1(increment_atomic);
    std::thread t2(increment_atomic);
    
    t1.join();
    t2.join();
    
    std::cout << "Atomic counter: " << atomic_counter << std::endl;
    return 0;
}

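Memory orders:

memory_order_relaxed: atomicity only, no ordering constraints

memory_order_acquire: reads and writes after this operation cannot be reordered before it

memory_order_release: reads and writes before this operation cannot be reordered after it

memory_order_acq_rel: acquire and release combined, for read-modify-write operations

memory_order_seq_cst: sequential consistency (the default)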


#include <atomic>
#include <iostream>

std::atomic<bool> ready(false);
int data = 0;

void producer() {
    data = 42;  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish the data
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) {}  // 3. wait for the data
    std::cout << data << std::endl;  // 4. read the data; the acquire load guarantees it sees 42
}
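
The release/acquire pairing can be exercised end to end with two threads. This is a sketch; `publish_and_read` is a hypothetical helper name, and the `yield()` call in the spin loop is an addition that only makes the wait friendlier to the scheduler.

```cpp
#include <atomic>
#include <thread>

// Publishes `value` with release/acquire and returns what the reader saw.
int publish_and_read(int value) {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};
    int seen = -1;

    std::thread writer([&] {
        payload = value;                                  // 1. write the data
        ready.store(true, std::memory_order_release);     // 2. publish it
    });
    std::thread reader([&] {
        while (!ready.load(std::memory_order_acquire)) {  // 3. wait for the flag
            std::this_thread::yield();
        }
        seen = payload;                                   // 4. sees the write from step 1
    });

    writer.join();
    reader.join();
    return seen;
}
```

The write to `payload` happens before the release store, and the acquire load that observes `true` happens before the read of `payload`, so the non-atomic access is free of data races.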

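4. Advanced Concurrency Patterns

4.1 Thread Pool

A thread pool avoids the overhead of repeatedly creating and destroying threads by reusing a fixed set of worker threads.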

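#include <vector>
#include <thread>
#include <queue>
#include <mutex>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>       // std::make_shared
#include <stdexcept>    // std::runtime_error
#include <type_traits>  // result type traits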

class ThreadPool {
public:
    ThreadPool(size_t num_threads) : stop(false) {
        for (size_t i = 0; i < num_threads; ++i) {
            workers.emplace_back([this] {
                while (true) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lock(this->queue_mutex);
                        this->cv.wait(lock, [this]{ 
                            return this->stop || !this->tasks.empty(); 
                        });
                        if (this->stop && this->tasks.empty())
                            return;
                        task = std::move(this->tasks.front());
                        this->tasks.pop();
                    }
                    task();
                }
            });
        }
    }

    template<class F, class... Args>
    auto enqueue(F&& f, Args&&... args) 
        -> std::future<std::invoke_result_t<F, Args...>> {
        // std::result_of was deprecated in C++17 and removed in C++20
        using return_type = std::invoke_result_t<F, Args...>;
        
        auto task = std::make_shared<std::packaged_task<return_type()>>(
            std::bind(std::forward<F>(f), std::forward<Args>(args)...)
        );
        
        std::future<return_type> res = task->get_future();
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            if (stop)
                throw std::runtime_error("enqueue on stopped ThreadPool");
            tasks.emplace([task](){ (*task)(); });
        }
        cv.notify_one();
        return res;
    }

    ~ThreadPool() {
        {
            std::unique_lock<std::mutex> lock(queue_mutex);
            stop = true;
        }
        cv.notify_all();
        for (std::thread& worker : workers)
            worker.join();
    }

private:
    std::vector<std::thread> workers;
    std::queue<std::function<void()>> tasks;
    std::mutex queue_mutex;
    std::condition_variable cv;
    bool stop;
};

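4.2 Lock-Free Data Structures

Lock-free data structures synchronize through atomic operations instead of locks. They avoid blocking, but they are much harder to get right, especially when it comes to freeing memory safely.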

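#include <atomic>
#include <utility>

// Michael-Scott style queue, simplified for teaching.
// T must be default-constructible (for the dummy node) and copy-assignable.
// Dequeued nodes are NOT freed until the destructor runs: deleting a node
// inside dequeue() is unsafe because another thread may still hold a pointer
// to it. Production code needs hazard pointers or epoch-based reclamation.
template<typename T>
class LockFreeQueue {
private:
    struct Node {
        T data;
        std::atomic<Node*> next;
        explicit Node(T value) : data(std::move(value)), next(nullptr) {}
    };

    Node* const first;         // first dummy node; start of the chain for cleanup
    std::atomic<Node*> head;   // always points at the current dummy node
    std::atomic<Node*> tail;   // last node, or the one just before it

public:
    LockFreeQueue() : first(new Node(T())), head(first), tail(first) {}

    // Not thread-safe: call only after all producers and consumers are done.
    ~LockFreeQueue() {
        Node* node = first;
        while (node != nullptr) {
            Node* next = node->next.load();
            delete node;
            node = next;
        }
    }

    void enqueue(T data) {
        Node* new_node = new Node(std::move(data));
        while (true) {
            Node* old_tail = tail.load();
            Node* old_next = old_tail->next.load();
            if (old_next == nullptr) {
                // Tail really is the last node: try to link the new node.
                if (old_tail->next.compare_exchange_weak(old_next, new_node)) {
                    // Swing tail forward; failure means another thread helped.
                    tail.compare_exchange_strong(old_tail, new_node);
                    return;
                }
            } else {
                // Tail is lagging: help advance it, then retry.
                tail.compare_exchange_strong(old_tail, old_next);
            }
        }
    }

    bool dequeue(T& result) {
        while (true) {
            Node* old_head = head.load();
            Node* old_tail = tail.load();
            Node* next = old_head->next.load();
            if (next == nullptr) {
                return false;  // queue is empty
            }
            if (old_head == old_tail) {
                // Tail is lagging behind a linked node: help advance it.
                tail.compare_exchange_strong(old_tail, next);
            } else if (head.compare_exchange_weak(old_head, next)) {
                result = next->data;  // next becomes the new dummy node
                return true;
            }
        }
    }
};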
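
The building block behind such structures is the compare-exchange retry loop. The sketch below isolates that loop in the simplest setting I could think of, an atomic "store maximum"; `atomic_store_max` and `concurrent_max` are hypothetical names, not standard library functions.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Raises `target` to `candidate` if candidate is larger, without a mutex.
void atomic_store_max(std::atomic<int>& target, int candidate) {
    int current = target.load();
    // On failure, compare_exchange_weak writes the latest value into `current`,
    // so each retry re-evaluates the condition against fresh data.
    while (candidate > current &&
           !target.compare_exchange_weak(current, candidate)) {
    }
}

// One thread per value races to record the largest value.
int concurrent_max(const std::vector<int>& values) {
    std::atomic<int> best{values.empty() ? 0 : values.front()};
    std::vector<std::thread> threads;
    for (int v : values) {
        threads.emplace_back([&best, v] { atomic_store_max(best, v); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return best.load();
}
```

The loop ends either because the exchange succeeded or because another thread already stored a value at least as large, so no update is ever lost.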

5. Concurrency Best Practices

Minimize shared data: use thread-local storage wherever possible


thread_local int thread_specific_data = 0;
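
A short sketch of what `thread_local` buys you: every thread gets its own copy, so no locking is needed. The names `tls_counter`, `bump_local`, and `thread_local_copies_are_independent` are hypothetical.

```cpp
#include <thread>

thread_local int tls_counter = 0;  // one independent copy per thread

// Increments the calling thread's own copy n times and returns it.
int bump_local(int n) {
    for (int i = 0; i < n; ++i) {
        ++tls_counter;
    }
    return tls_counter;
}

// Runs two threads with different counts. No synchronization is needed
// because neither thread can see the other's tls_counter.
bool thread_local_copies_are_independent() {
    int a = 0;
    int b = 0;
    std::thread t1([&a] { a = bump_local(1000); });
    std::thread t2([&b] { b = bump_local(2000); });
    t1.join();
    t2.join();
    return a == 1000 && b == 2000;
}
```

The thread that calls `thread_local_copies_are_independent()` never touches its own copy, which therefore keeps its initial value.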



Prefer high-level abstractions such as std::async and std::future


auto future = std::async(std::launch::async, []{
    // asynchronous task
    return 42;
});
int result = future.get();
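
As a slightly larger illustration, std::async can split a computation in two without any explicit thread or lock. This is a sketch; `async_sum` is a hypothetical helper.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Sums a vector by handing the first half to std::async and
// summing the second half on the calling thread.
long long async_sum(const std::vector<int>& v) {
    auto mid = v.begin() + v.size() / 2;
    std::future<long long> first_half =
        std::async(std::launch::async, [&v, mid] {
            return std::accumulate(v.begin(), mid, 0LL);
        });
    long long second_half = std::accumulate(mid, v.end(), 0LL);
    // get() blocks until the task finishes and rethrows any exception it threw.
    return first_half.get() + second_half;
}
```

Both halves only read the vector, so no synchronization beyond the future itself is required.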



Avoid deadlocks:

Acquire locks in a fixed order

Use std::lock() or std::scoped_lock to acquire multiple locks at once


std::mutex mtx1, mtx2;

void safe_operation() {
    std::lock(mtx1, mtx2);  // lock both mutexes without risking deadlock
    std::lock_guard<std::mutex> lock1(mtx1, std::adopt_lock);  // adopt: already locked
    std::lock_guard<std::mutex> lock2(mtx2, std::adopt_lock);
    // operate on the shared data
}
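
With C++17, std::scoped_lock expresses the same thing in one line. The sketch below uses a hypothetical `Account` type to show the classic case where two threads lock the same pair of mutexes in opposite order.

```cpp
#include <mutex>
#include <thread>

// Hypothetical account type for this sketch.
struct Account {
    std::mutex m;
    int balance = 0;
};

// std::scoped_lock (C++17) locks both mutexes with a deadlock-avoidance
// algorithm, so transfer(a, b) and transfer(b, a) may run concurrently.
void transfer(Account& from, Account& to, int amount) {
    if (&from == &to) {
        return;  // locking the same std::mutex twice would be undefined behavior
    }
    std::scoped_lock lock(from.m, to.m);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; the total must be preserved.
int run_transfers(int rounds) {
    Account a;
    Account b;
    a.balance = 1000;
    b.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < rounds; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < rounds; ++i) transfer(b, a, 2); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

If each thread instead locked `from.m` and then `to.m` with two separate lock_guard objects, the opposite lock orders could deadlock.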



Performance tuning:

Reduce lock contention (use fine-grained locks or lock-free structures)

Avoid false sharing (use padding or alignment)


struct alignas(64) CacheLineAligned {
    int data;
    char padding[64 - sizeof(int)];
};
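
A sketch of how such alignment is typically applied to per-thread counters. The 64-byte line size is an assumption that holds on many x86-64 systems but is not universal, and `PaddedCounter`, `kCacheLine`, and `run_padded_counters` are hypothetical names.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>

// Assumed cache-line size; common on x86-64 but not guaranteed everywhere.
constexpr std::size_t kCacheLine = 64;

struct alignas(kCacheLine) PaddedCounter {
    std::atomic<long> value{0};
    // alignas rounds sizeof(PaddedCounter) up to a multiple of 64,
    // so an explicit padding array is not required.
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");
static_assert(alignof(PaddedCounter) == kCacheLine, "counter starts on a line boundary");

// Each thread increments its own counter; the two counters never share a line.
long run_padded_counters(int iterations) {
    PaddedCounter counters[2];
    auto work = [iterations](PaddedCounter& c) {
        for (int i = 0; i < iterations; ++i) {
            c.value.fetch_add(1, std::memory_order_relaxed);
        }
    };
    std::thread t0(work, std::ref(counters[0]));
    std::thread t1(work, std::ref(counters[1]));
    t0.join();
    t1.join();
    return counters[0].value.load() + counters[1].value.load();
}
```

Without the alignment, both counters could land in the same cache line and every increment by one thread would invalidate the other thread's cached copy.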

Debugging tools:

Valgrind's Helgrind tool

ThreadSanitizer in Clang and GCC (-fsanitize=thread)

GDB's thread debugging features

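6. Common Concurrency Problems and Solutions

6.1 Deadlock Detection and Prevention

Banker's algorithm: simulates each resource allocation and grants a request only if the resulting state is safe, which avoids deadlock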
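
The core of the banker's algorithm is its safety check, sketched below. The function name `is_safe_state` and the matrix layout are assumptions made for this illustration: `allocation[i][j]` is what process i currently holds of resource j, and `need[i][j]` is what it may still request.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Returns true if some execution order lets every process finish,
// given the currently free resources in `available`.
bool is_safe_state(std::vector<int> available,
                   const Matrix& allocation,
                   const Matrix& need) {
    const std::size_t n = allocation.size();
    std::vector<bool> finished(n, false);
    std::size_t done = 0;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t i = 0; i < n; ++i) {
            if (finished[i]) continue;
            bool can_run = true;
            for (std::size_t j = 0; j < available.size(); ++j) {
                if (need[i][j] > available[j]) { can_run = false; break; }
            }
            if (can_run) {
                // Pretend process i runs to completion and returns what it holds.
                for (std::size_t j = 0; j < available.size(); ++j) {
                    available[j] += allocation[i][j];
                }
                finished[i] = true;
                ++done;
                progressed = true;
            }
        }
    }
    return done == n;
}
```

To decide on a request, a resource manager would tentatively apply it to `available`, `allocation`, and `need`, then grant it only if `is_safe_state` still returns true.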

Detection approach:

Maintain a resource allocation graph

Periodically check whether the graph contains a cycle
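
The cycle check can be sketched with a depth-first search. For simplicity this example assumes a wait-for graph, a reduced form of the resource allocation graph in which `waits_for[i]` lists the threads that thread i is blocked on; the type and function names are hypothetical.

```cpp
#include <vector>

// waits_for[i] lists the threads that thread i is blocked on.
using WaitForGraph = std::vector<std::vector<int>>;

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search; reaching an InProgress node means a back edge, i.e. a cycle.
bool dfs_finds_cycle(const WaitForGraph& g, int node, std::vector<Mark>& marks) {
    marks[node] = Mark::InProgress;
    for (int next : g[node]) {
        if (marks[next] == Mark::InProgress) {
            return true;
        }
        if (marks[next] == Mark::Unvisited && dfs_finds_cycle(g, next, marks)) {
            return true;
        }
    }
    marks[node] = Mark::Done;
    return false;
}

// Returns true if any set of threads waits on each other in a cycle.
bool has_deadlock(const WaitForGraph& g) {
    std::vector<Mark> marks(g.size(), Mark::Unvisited);
    for (int i = 0; i < static_cast<int>(g.size()); ++i) {
        if (marks[i] == Mark::Unvisited && dfs_finds_cycle(g, i, marks)) {
            return true;
        }
    }
    return false;
}
```

A detector thread would take a consistent snapshot of the graph under a lock and run `has_deadlock` on the snapshot at a fixed interval.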

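6.2 Performance Bottleneck Analysis

Use profiling tools:

perf

VTune

gprof

Identify hot spots:

Regions with high lock contention

Excessive synchronization

Cache misses and invalidation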

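7. Modern C++ Concurrency Features

7.1 New in C++17

std::scoped_lock: acquires multiple locks at once

std::shared_mutex: a reader-writer mutex without the timed operations of C++14's std::shared_timed_mutex

std::invoke: a uniform way to call any callable, useful when forwarding tasks to threads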

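7.2 New in C++20

Coroutine support

std::jthread: a thread that joins automatically on destruction and supports cooperative cancellation

std::atomic_ref: atomic operations on a non-atomic object

std::counting_semaphore: a counting semaphore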


#include <semaphore>

std::counting_semaphore<10> sem(5); // maximum count at least 10, initial count 5

void worker() {
    sem.acquire();
    // access the resource
    sem.release();
}

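Conclusion

Writing good concurrent programs requires a solid understanding of the underlying mechanisms and synchronization primitives, together with fluency in the high-level abstractions of modern C++. The key principles are:

Minimize shared state

Prefer lock-free structures where they are proven correct and measurably faster

Choose the synchronization mechanism that fits the problem

Test concurrent scenarios thoroughly

Use modern C++ features to simplify code

With the techniques and best practices covered in this article, you should be able to write concurrent programs that are efficient, safe, and maintainable, and that make full use of the multi-core capabilities of modern hardware.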