C++17 Multithreading Series (10): Multithreaded Performance Optimization, from Measurement to Tuning

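Core goal: measure and optimize the performance of multithreaded applications systematically, understand what Amdahl's law and Gustafson's law mean in practice, learn to use flame graphs and the perf tool, and build a closed performance loop of measure → analyze → optimize → verify.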

Prerequisites: Parts 1-9 of this series; familiarity with the Linux perf tool is a plus.


10.1 Theoretical Foundations of Multithreaded Performance: Two Laws

10.1.1 Amdahl's Law: The Serial Portion Caps the Speedup

            1
S = ─────────────────
    (1 - P) + P / N

S: speedup
P: fraction of the work that can be parallelized
N: number of processors
cpp
#include <cstdio>
#include <initializer_list>
#include <iostream>

// Theoretical maximum speedup under Amdahl's law
double amdahl_speedup(double parallel_fraction, int num_cores) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / num_cores);
}

void print_amdahl_table() {
    std::cout << "Amdahl's law: speedup for different parallel fractions\n";
    std::cout << "Cores  | P=50% | P=75% | P=90% | P=95% | P=99%\n";
    std::cout << "-------|-------|-------|-------|-------|-------\n";

    for (int n : {1, 2, 4, 8, 16, 32, 64, 128}) {
        std::printf("%-6d | %-5.2f | %-5.2f | %-5.2f | %-5.2f | %-5.2f\n",
                    n,
                    amdahl_speedup(0.50, n),
                    amdahl_speedup(0.75, n),
                    amdahl_speedup(0.90, n),
                    amdahl_speedup(0.95, n),
                    amdahl_speedup(0.99, n));
    }
}

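Output (selected rows; the last row is the limit as the core count goes to infinity)

Cores | P=50% | P=75% | P=90% | P=95% | P=99%
------|-------|-------|-------|-------|-------
1     | 1.00  | 1.00  | 1.00  | 1.00  | 1.00
4     | 1.60  | 2.29  | 3.08  | 3.48  | 3.88
8     | 1.78  | 2.91  | 4.71  | 5.93  | 7.48
16    | 1.88  | 3.37  | 6.40  | 9.14  | 13.91
32    | 1.94  | 3.66  | 7.80  | 12.55 | 24.43
128   | 1.98  | 3.91  | 9.34  | 17.41 | 56.39
∞     | 2.00  | 4.00  | 10.00 | 20.00 | 100.00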

[Figure: Amdahl's law for a program that is 80% parallelizable. The parallel 80% is split across four cores (20% each), while the serial 20% stays the same no matter how many cores are added. Conclusion: even with unlimited cores, speedup ≤ 1 ÷ 20% = 5x.]

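The harsh truth of Amdahl's law: even with 128 cores, a program that is only 90% parallelizable can never exceed a theoretical speedup of 10x. The serial 10% is the bottleneck.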

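10.1.2 Gustafson's Law: Speedup When the Problem Size Grows

S = N - α·(N - 1)

S: speedup
N: number of processors
α: serial fraction

Key difference from Amdahl:

Amdahl:    fixed problem size → serial fraction stays constant → speedup is capped
Gustafson: problem size grows with the core count → serial fraction shrinks → near-linear speedup

Real-world scenario: image resolution goes from 1080p to 4K/8K
         The serial overhead (file I/O) barely changes, while the parallelizable pixel work grows 4x (4K) or 16x (8K)

Law       | Assumption                         | Typical workload                       | Key conclusion
----------|------------------------------------|----------------------------------------|----------------------------------
Amdahl    | Fixed problem size                 | Latency-sensitive (latency-bound)      | The serial portion sets the cap
Gustafson | Problem size grows with core count | Throughput-oriented (throughput-bound) | Near-linear speedup is achievable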

10.1.3 Measuring Scalability

cpp
#include <atomic>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <thread>
#include <vector>

// Splits total_ops units of simulated work across `threads` threads
// and returns the elapsed wall-clock time in milliseconds.
long long measure_parallel_work(int threads, long total_ops) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;

    long ops_per_thread = total_ops / threads;

    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&] {
            for (long j = 0; j < ops_per_thread; ++j) {
                // Simulated computation
                volatile double x = 0;
                for (int k = 0; k < 10; ++k) x = x + std::sqrt(k);
                // Note: this shared counter is itself a source of contention
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    auto t2 = std::chrono::steady_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
}

// Measure the speedup at different thread counts
void measure_scalability(int max_threads) {
    constexpr long N = 50'000'000;  // amount of work
    long long baseline = measure_parallel_work(1, N);  // single-threaded baseline

    std::printf("Threads | Time (ms) | Speedup | Efficiency\n");
    std::printf("--------|-----------|---------|-----------\n");

    for (int t = 1; t <= max_threads; t *= 2) {
        long long time = measure_parallel_work(t, N);
        double speedup = static_cast<double>(baseline) / time;
        double efficiency = speedup / t * 100;

        std::printf("%-7d | %-9lld | %-7.2f | %.1f%%\n",
                    t, time, speedup, efficiency);
        // Ideal: speedup ≈ t, efficiency ≈ 100%
        // Reality: efficiency drops as t grows   ← look for the knee
    }
}

// Typical output (8 cores; illustrative, results vary by machine):
// Threads | Time (ms) | Speedup | Efficiency
//     1  | 4850  |  1.00  | 100.0%
//     2  | 2500  |  1.94  |  97.0%
//     4  | 1310  |  3.70  |  92.5%
//     8  |  730  |  6.64  |  83.0%   ← knee of the curve
//    16  |  710  |  6.83  |  42.7%   ← oversubscribed!

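10.2 Lock Performance Analysis and Optimization

10.2.1 perf lock: Lock Contention Analysis on Linux

bash
# Note: perf lock relies on kernel lock tracepoints, so it may need a kernel
# built with lock statistics support, and usually root privileges.

# 1. Record lock events
perf lock record ./my_program

# 2. View the lock contention report
perf lock report

# Illustrative output (simplified):
# Name                        acquired  contended   avg wait (ns)   total wait (ns)
# pthread_mutex_lock              50000       3500          1200           4200000
# SharedMutex::lock                 200        180         15000           2700000
#
# How to read it:
# - pthread_mutex_lock: 7% contention rate (3500 / 50000), average wait 1200 ns
# - SharedMutex::lock: 90% contention rate (180 / 200), average wait 15000 ns  ← bottleneck!

# 3. Sort the report by total wait time
perf lock report -k wait_total

# 4. Watch in real time with perf top
perf top -e cycles:p
# A high share for __lll_lock_wait → severe lock contention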

10.2.2 Measuring the Lock Contention Rate

cpp
#include <mutex>
#include <chrono>
#include <atomic>
#include <cstdio>

class ContentionMeter {
    std::atomic<long> lock_acquisitions_{0};
    std::atomic<long> lock_contentions_{0};  // acquisitions that had to wait

public:
    void record_acquire(bool contended) {
        lock_acquisitions_.fetch_add(1, std::memory_order_relaxed);
        if (contended) {
            lock_contentions_.fetch_add(1, std::memory_order_relaxed);
        }
    }

    double contention_rate() const {
        long total = lock_acquisitions_.load();
        if (total == 0) return 0.0;
        return 100.0 * lock_contentions_.load() / total;
    }

    void report() const {
        printf("Lock stats: %ld acquires, %ld contentions (%.1f%%)\n",
               lock_acquisitions_.load(),
               lock_contentions_.load(),
               contention_rate());
    }
};

// A mutex wrapper that records contention statistics
class MeteredMutex {
    std::mutex mtx_;
    ContentionMeter& meter_;

public:
    explicit MeteredMutex(ContentionMeter& m) : meter_(m) {}

    void lock() {
        bool contended = !mtx_.try_lock();
        meter_.record_acquire(contended);
        if (contended) {
            mtx_.lock();  // actually block and wait
        }
    }

    void unlock() { mtx_.unlock(); }
};
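Because the wrapper exposes `lock()` and `unlock()`, it works with `std::lock_guard`. The sketch below restates the two classes in compact form so it compiles on its own; the accessors and the `run_metered` driver are hypothetical additions for illustration.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Compact restatement of the meter so this sketch is self-contained.
class ContentionMeter {
    std::atomic<long> acquisitions_{0};
    std::atomic<long> contentions_{0};
public:
    void record_acquire(bool contended) {
        acquisitions_.fetch_add(1, std::memory_order_relaxed);
        if (contended) contentions_.fetch_add(1, std::memory_order_relaxed);
    }
    long acquisitions() const { return acquisitions_.load(); }
    long contentions() const { return contentions_.load(); }
};

class MeteredMutex {
    std::mutex mtx_;
    ContentionMeter& meter_;
public:
    explicit MeteredMutex(ContentionMeter& m) : meter_(m) {}
    void lock() {
        bool contended = !mtx_.try_lock();  // failed try_lock counts as contention
        meter_.record_acquire(contended);
        if (contended) mtx_.lock();
    }
    void unlock() { mtx_.unlock(); }
};

// Hypothetical driver: `threads` workers each increment a shared total
// `iters` times under the metered lock. Returns the final total.
long run_metered(ContentionMeter& meter, int threads, int iters) {
    MeteredMutex mtx(meter);
    long total = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<MeteredMutex> guard(mtx);  // BasicLockable suffices
                ++total;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

After `run_metered(meter, 4, 10000)`, the ratio `meter.contentions() / meter.acquisitions()` gives the contention rate for that run. The exact value depends on scheduling, so it will differ between runs and machines.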

10.2.3 Microbenchmarking the Cost of Different Locks

cpp
#include <mutex>
#include <shared_mutex>
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>
#include <functional>

struct LockBenchmark {
    static constexpr int kIterations = 10'000'000;
    static constexpr int kNumThreads = 4;

    // Cost of a single lock/unlock pair for each lock type (uncontended)
    template <typename L, typename Unlock>
    static double measure(L&& lock_fn, Unlock&& unlock_fn) {
        auto t1 = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < kIterations; ++i) {
            lock_fn();
            unlock_fn();
        }
        auto t2 = std::chrono::high_resolution_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
        return (double)ns / kIterations;  // average nanoseconds per operation
    }

    // Throughput of each lock type (4 threads contending)
    template <typename Setup, typename Work>
    static double measure_throughput(Setup&& setup, Work&& work) {
        std::atomic<bool> start{false};
        std::atomic<long> counter{0};

        setup();

        std::vector<std::thread> threads;
        for (int t = 0; t < kNumThreads; ++t) {
            threads.emplace_back([&] {
                while (!start) std::this_thread::yield();
                for (int i = 0; i < kIterations / kNumThreads; ++i) {
                    work();
                    counter.fetch_add(1, std::memory_order_relaxed);
                }
            });
        }

        auto t1 = std::chrono::high_resolution_clock::now();
        start = true;
        for (auto& t : threads) t.join();
        auto t2 = std::chrono::high_resolution_clock::now();

        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
        return counter.load() / (ms / 1000.0);  // ops/s
    }
};

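Microbenchmark results (typical values; actual numbers depend on hardware, OS, and standard library)

Lock type                    | Uncontended latency (ns) | 4-thread throughput (M ops/s) | 8-thread throughput (M ops/s)
-----------------------------|--------------------------|-------------------------------|------------------------------
std::mutex                   | 25                       | 2.8                           | 2.2
std::shared_mutex (exclusive) | 35                      | 2.4                           | 1.9
std::shared_mutex (shared)   | 30                       | 12.5                          | 16.8
std::atomic_flag spinlock    | 5                        | 3.2                           | 2.6
tbb::spin_mutex              | 8                        | 3.5                           | 3.0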
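The table mentions a std::atomic_flag spinlock without showing one. The following is a minimal sketch, assuming a plain test-and-set loop that yields while waiting; the class name `SpinLock` and the helper `count_with_spinlock` are illustrative, not the implementation that produced the numbers above.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

// Minimal test-and-set spinlock; satisfies BasicLockable.
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock() {
        // test_and_set returns the previous value: true means someone holds the lock
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite while waiting
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Hypothetical check: `threads` workers each increment a plain counter
// `iters` times under the spinlock. Returns the final count.
long count_with_spinlock(int threads, int iters) {
    SpinLock lock;
    long total = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<SpinLock> guard(lock);
                ++total;
            }
        });
    }
    for (auto& w : workers) w.join();
    return total;
}
```

A spinlock avoids a system call on the uncontended path, which is consistent with the low uncontended latency in the table, but it burns CPU while waiting, so it only pays off for very short critical sections.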

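10.3 Cache Line Alignment: Getting Performance from Struct Layout

10.3.1 Separating Hot and Cold Data

cpp
#include <atomic>
#include <mutex>

// ❌ Hot and cold data mixed together
struct BadDataLayout {
    // Hot data (accessed frequently by multiple threads)
    std::atomic<long> ref_count{0};     // 8B
    std::mutex mtx;                     // 40B (glibc, x86-64)
    int active_workers{0};              // 4B

    // Cold data (accessed occasionally)
    char debug_name[256];               // 256B
    char error_message[512];            // 512B

    // Problem: ref_count, mtx, and the start of debug_name share one cache line
    // → every write to ref_count invalidates that line for threads that are
    //   reading mtx or debug_name (false sharing)
};
// Size ≈ 824B (glibc, x86-64)

// ✅ Separated by access frequency
struct GoodDataLayout {
    // Hot cache line 0: atomics
    struct HotData {
        alignas(64) std::atomic<long> ref_count{0};
        int active_workers{0};
    } hot;

    // Hot cache line 1: the lock
    alignas(64) std::mutex mtx;

    // Cold cache lines 2+: debug information
    // (alignas keeps debug_name from sharing cache line 1 with mtx)
    alignas(64) char debug_name[256];
    char error_message[512];
};
// Size = 896B with 64-byte cache lines: larger, but faster under contention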
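Layout assumptions like these are easy to get wrong, so it can help to let the compiler check them. The sketch below assumes 64-byte cache lines and uses a hypothetical `PaddedLayout` struct that aligns every group explicitly; `same_cache_line` is an illustrative helper.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>

constexpr std::size_t kCacheLine = 64;  // assumption: 64-byte cache lines

struct PaddedLayout {
    struct HotData {
        alignas(kCacheLine) std::atomic<long> ref_count{0};
        int active_workers{0};
    } hot;
    alignas(kCacheLine) std::mutex mtx;
    alignas(kCacheLine) char debug_name[256];
    char error_message[512];
};

// True when two byte offsets fall on the same cache line.
constexpr bool same_cache_line(std::size_t a, std::size_t b) {
    return a / kCacheLine == b / kCacheLine;
}

// Compile-time checks: each group starts on its own cache line.
static_assert(alignof(PaddedLayout) == kCacheLine, "struct must be line-aligned");
static_assert(sizeof(PaddedLayout::HotData) == kCacheLine, "hot data fills one line");
static_assert(offsetof(PaddedLayout, mtx) % kCacheLine == 0, "mtx starts a line");
static_assert(offsetof(PaddedLayout, debug_name) % kCacheLine == 0,
              "cold data starts a line");
static_assert(!same_cache_line(offsetof(PaddedLayout, hot),
                               offsetof(PaddedLayout, mtx)),
              "counters and lock must not share a line");
```

If a later change to the struct moves a member onto a shared line, the build fails instead of the regression surfacing in a benchmark.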

[Figure: cache-line layout before and after separation. ❌ Mixed: ref_count, mtx, active_workers, and the start of debug_name all sit in cache line 0, so thread 1 writing ref_count conflicts with thread 2 reading debug_name (false sharing). ✅ Separated: cache line 0 (hot) holds ref_count and active_workers, cache line 1 (hot) holds mtx, and cache line 2 (cold) holds debug_name, so the two threads no longer interfere.]

10.3.2 Reordering Struct Members

cpp
// ❌ Holes created by natural alignment
struct Unoptimized {
    char  flag;        // 1B → offset 0
    //      padding     // 7B → align the next member to 8B
    double value;      // 8B → offset 8
    int   count;       // 4B → offset 16
    char  type;        // 1B → offset 20
    //      padding     // 3B → round the size up to an 8B boundary
};
// sizeof = 24 bytes (7 + 3 = 10 bytes wasted!)

// ✅ After reordering
struct Optimized {
    double value;      // 8B → offset 0
    int   count;       // 4B → offset 8
    char  flag;        // 1B → offset 12
    char  type;        // 1B → offset 13
    //      padding     // 2B → round the size up to an 8B boundary
};
// sizeof = 16 bytes (33% smaller)

// Rule: order members by alignment requirement, largest first
// double(8) > int(4) > char(1)
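The padding arithmetic can be checked directly with `sizeof`. This sketch restates the two structs and computes the wasted bytes; `padding_bytes` is an illustrative helper, and the exact sizes assume a typical 64-bit ABI where `double` is 8-byte aligned.

```cpp
#include <cstddef>

struct Unoptimized {
    char   flag;
    double value;
    int    count;
    char   type;
};

struct Optimized {
    double value;
    int    count;
    char   flag;
    char   type;
};

// Both structs carry the same members: 2 chars, 1 double, 1 int.
constexpr std::size_t kPayloadBytes =
    2 * sizeof(char) + sizeof(double) + sizeof(int);

// Padding = total size minus the bytes actually used by members.
template <typename T>
constexpr std::size_t padding_bytes() {
    return sizeof(T) - kPayloadBytes;
}

// Portable check: reordering never makes the struct larger here.
static_assert(sizeof(Optimized) <= sizeof(Unoptimized),
              "largest-alignment-first ordering should not add padding");
```

On a typical x86-64 toolchain `padding_bytes<Unoptimized>()` is 10 and `padding_bytes<Optimized>()` is 2, matching the comments above.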

10.4 Lock Granularity: From Coarse to Fine

10.4.1 Evolution of Lock Granularity

cpp
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <vector>

// ── Level 1: global lock (coarsest granularity) ──
class GlobalLockCache {
    std::mutex mtx_;                               // one lock protects all data
    std::unordered_map<std::string, std::string> data_;
    std::vector<std::string> access_log_;

public:
    void put(const std::string& k, const std::string& v) {
        std::lock_guard lock(mtx_);                // lock held for the whole operation
        data_[k] = v;
        access_log_.push_back("PUT " + k);
    }

    std::string get(const std::string& k) {
        std::lock_guard lock(mtx_);
        access_log_.push_back("GET " + k);
        auto it = data_.find(k);
        return it != data_.end() ? it->second : "";
    }
    // Problem: get and put are fully mutually exclusive!
};

// ── Level 2: reader-writer lock ──
class RwLockCache {
    mutable std::shared_mutex mtx_;                // shared for reads, exclusive for writes
    std::unordered_map<std::string, std::string> data_;

public:
    void put(const std::string& k, const std::string& v) {
        std::unique_lock lock(mtx_);
        data_[k] = v;
    }

    std::string get(const std::string& k) const {
        std::shared_lock lock(mtx_);
        auto it = data_.find(k);
        return it != data_.end() ? it->second : "";
    }
    // Improvement: multiple gets can run concurrently, but a put blocks every get
};

// ── Level 3: lock striping (sharding) ──
class ShardedCache {
    static constexpr int kShards = 16;

    struct Shard {
        mutable std::shared_mutex mtx;
        std::unordered_map<std::string, std::string> data;
    };

    std::vector<Shard> shards_{kShards};
    std::hash<std::string> hasher_;

    Shard& get_shard(const std::string& key) {
        return shards_[hasher_(key) % kShards];
    }

    // const overload so that get() const compiles
    const Shard& get_shard(const std::string& key) const {
        return shards_[hasher_(key) % kShards];
    }

public:
    void put(const std::string& k, const std::string& v) {
        auto& shard = get_shard(k);
        std::unique_lock lock(shard.mtx);
        shard.data[k] = v;
    }

    std::string get(const std::string& k) const {
        auto& shard = get_shard(k);
        std::shared_lock lock(shard.mtx);
        auto it = shard.data.find(k);
        return it != shard.data.end() ? it->second : "";
    }
    // Improvement: operations on keys in different shards run fully in parallel
    // (up to 16 shards can be written at once)
};

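10.4.2 Quantifying the Gains

Scenario: 8 threads, 90% reads + 10% writes, 1 million keys

Global lock:         870K ops/s  (baseline)
Reader-writer lock:  3.2M ops/s  (3.7x)
Lock striping:      11.5M ops/s  (13.2x)

Lock striping = reader-writer lock + lower contention probability
With keys spread evenly, the chance that two operations hit the same lock drops from 1/1 to 1/16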

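10.5 Relaxing Memory Orders: Squeezing Out Performance Safely

10.5.1 The Relaxation Path

cpp
#include <atomic>

// ── Level 1: seq_cst (the default: safest, slowest) ──
std::atomic<int> counter_seq{0};
void increment_seq() {
    counter_seq.fetch_add(1);  // seq_cst by default
    // Sequential consistency: all threads observe one total order of seq_cst operations
    // Cost: a full barrier. On x86 every atomic read-modify-write is already a
    // lock-prefixed instruction, so the extra cost shows up mainly on seq_cst
    // stores (xchg, or mov + mfence) and on weakly ordered CPUs such as ARM
}

// ── Level 2: acquire/release (recommended) ──
std::atomic<int> counter_ar{0};
void producer() {
    // ... prepare data ...
    counter_ar.store(42, std::memory_order_release);  // earlier writes become visible
}

void consumer() {
    int val = counter_ar.load(std::memory_order_acquire);  // later reads see them
    // If val == 42, every write the producer made before the store is visible here
}
// A release-acquire pair establishes a happens-before relationship between threads
// Cost: a one-way barrier (nearly free on x86, whose plain loads and stores already behave this way)

// ── Level 3: relaxed (fastest, but use with care) ──
std::atomic<long> stat_counter{0};
void record_stat() {
    stat_counter.fetch_add(1, std::memory_order_relaxed);
    // Guarantees atomicity only, not ordering
    // Good for: statistics counters and reference-count increments
    // (a reference-count decrement that may free the object still needs acq_rel)
}
// Cost: minimal (on x86 still a lock-prefixed add, roughly 10-20 cycles when uncontended)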
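The producer/consumer fragment above leaves out the data being published. A complete sketch of the release-acquire pattern might look like the following; the function name, the flag name, and the payload string are illustrative assumptions.

```cpp
#include <atomic>
#include <string>
#include <thread>

// Publishes a non-atomic payload through a release store and reads it
// after an acquire load. Returns what the consumer observed.
std::string publish_and_consume() {
    std::string payload;              // plain, non-atomic data
    std::atomic<bool> ready{false};   // publication flag
    std::string seen;

    std::thread producer([&] {
        payload = "config-v1";                          // 1. write the data
        ready.store(true, std::memory_order_release);   // 2. publish it
    });

    std::thread consumer([&] {
        // 3. spin until the flag is published
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // 4. safe: step 1 happens-before this read
    });

    producer.join();
    consumer.join();
    return seen;
}
```

If both operations on `ready` were relaxed, the read of `payload` in step 4 would be a data race, because nothing would order it after the write in step 1.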

10.5.2 Checklist for Safe Relaxation

Decision flow when considering a weaker memory order:

- Does other data depend on this atomic? → keep acquire/release or seq_cst
- Do you need a single global order across multiple atomics? → keep seq_cst
- Is it just a simple counter or flag? → ✅ can be relaxed
- Is it a producer-consumer pattern? → ✅ use acquire/release
- None of the above, or unsure → keep seq_cst

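10.5.3 Performance Comparison (x86-64)

cpp
// 4 threads, 100 million fetch_add operations (reported figures)
// Memory order   | Time    | Speedup vs. seq_cst
// seq_cst        | 3200ms  | 1.00x
// acq_rel        | 2100ms  | 1.52x
// release        | 1800ms  | 1.78x
// relaxed        | 1450ms  | 2.21x
//
// Treat these numbers with caution: on x86-64 fetch_add compiles to the same
// lock-prefixed instruction for every memory order, so gaps this large may
// come from contention and run-to-run noise rather than from the ordering itself.

// On x86, acquire loads cost almost the same as relaxed loads (the hardware guarantees load-load ordering)
// On ARM/PowerPC the differences are more pronounced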

10.6 Tuning the Thread Count: Finding the Optimal Degree of Parallelism

10.6.1 Thread Count Formulas

cpp
#include <thread>
#include <algorithm>
#include <cstddef>

// CPU-bound (compute-bound)
size_t cpu_bound_threads() {
    unsigned int n = std::thread::hardware_concurrency();  // may return 0 if unknown
    return std::max(1u, n);  // exactly the number of hardware threads
    // Do not oversubscribe: beyond the core count, context-switch overhead outweighs the parallel gain
}

// I/O-bound
size_t io_bound_threads(double wait_to_compute_ratio = 10.0) {
    unsigned int n = std::thread::hardware_concurrency();
    // threads = cores × (1 + wait time / compute time)
    // e.g. 90% of the time waiting on I/O → ratio 9 → 10x the core count
    return std::max(1u, static_cast<unsigned int>(n * (1 + wait_to_compute_ratio)));
    // Typical range: core count × (1~20)
}

// Mixed
size_t mixed_threads(double io_fraction = 0.5) {
    unsigned int n = std::thread::hardware_concurrency();
    // io_fraction: share of task time spent on I/O; must be less than 1
    return std::max(1u, static_cast<unsigned int>(n / (1 - io_fraction)));
}
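The I/O-bound and mixed formulas are two views of the same rule: an I/O fraction f corresponds to a wait-to-compute ratio of f / (1 - f), so cores × (1 + ratio) equals cores / (1 - f). The sketch below takes the core count as a parameter so the arithmetic can be checked independently of the machine. The function names are hypothetical, and unlike the code above it rounds to the nearest integer instead of truncating.

```cpp
#include <algorithm>
#include <cmath>

// threads = cores * (1 + wait/compute)
unsigned threads_from_ratio(unsigned cores, double wait_to_compute_ratio) {
    double t = cores * (1.0 + wait_to_compute_ratio);
    return std::max(1u, static_cast<unsigned>(std::llround(t)));
}

// threads = cores / (1 - io_fraction), with io_fraction in [0, 1)
unsigned threads_from_io_fraction(unsigned cores, double io_fraction) {
    double t = cores / (1.0 - io_fraction);
    return std::max(1u, static_cast<unsigned>(std::llround(t)));
}
```

For 8 cores, a task that waits 90% of the time gives 80 threads with either form. These are starting points for measurement, not guarantees.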

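10.6.2 Detecting Oversubscription

bash
# Observe the context-switch rate
perf stat -e context-switches,cpu-migrations ./my_program

# If context-switches >> number of tasks → possible oversubscription
# Normal: context-switches roughly equals the number of tasks (about one switch per task)
# Oversubscribed: context-switches >> number of tasks (frequent switching)

# Observe CPU utilization
htop
# All cores at 100% but throughput no longer grows → oversubscription

cpp
// Detecting it in code (Linux only)
#include <fstream>
#include <string>

// Involuntary switches (preemptions) are the oversubscription signal;
// voluntary switches mostly mean the thread blocked on I/O or a lock.
// Note: /proc/self/status covers the main thread; per-thread counters
// are under /proc/self/task/<tid>/status.
long get_nonvoluntary_ctxt_switches() {
    const std::string prefix = "nonvoluntary_ctxt_switches:";
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        // C++17 has no std::string::starts_with, so compare the prefix explicitly
        if (line.compare(0, prefix.size(), prefix) == 0) {
            return std::stol(line.substr(prefix.size()));
        }
    }
    return 0;
}

// Compare before and after the work:
// auto before = get_nonvoluntary_ctxt_switches();
// do_work();
// auto after = get_nonvoluntary_ctxt_switches();
// If after - before is far larger than the thread count → oversubscription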

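10.6.3 Dynamically Resizing the Thread Pool

cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>

// Concept sketch: resize the pool according to the task queue depth
class AdaptiveThreadPool {
    size_t min_threads_;
    size_t max_threads_;
    size_t target_queue_depth_;

    // Assumed to be implemented by the rest of the pool
    size_t pending_tasks() const;
    size_t worker_count() const;
    void add_worker();
    void remove_worker();

public:
    // std::stop_token is C++20; in C++17 a shared atomic flag does the same job
    void monitor_and_adjust(const std::atomic<bool>& stop_requested) {
        while (!stop_requested.load()) {
            size_t queue_size = pending_tasks();
            size_t current = worker_count();

            if (queue_size > target_queue_depth_ * 2
                && current < max_threads_) {
                add_worker();   // queue is backing up → add a thread
            } else if (queue_size < target_queue_depth_ / 2
                       && current > min_threads_) {
                remove_worker();  // idle → remove a thread
            }

            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }
};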
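The resize rule itself can be separated from the pool so it is easy to test without threads. A minimal sketch follows, assuming the same thresholds as the class above (grow above twice the target depth, shrink below half of it); `PoolAction`, `PoolLimits`, and `decide` are hypothetical names.

```cpp
#include <cstddef>

enum class PoolAction { Grow, Shrink, Hold };

struct PoolLimits {
    std::size_t min_threads;
    std::size_t max_threads;
    std::size_t target_queue_depth;
};

// Pure policy function: given the current queue depth and worker count,
// decide whether the pool should add a worker, remove one, or stay put.
PoolAction decide(std::size_t queue_size, std::size_t current,
                  const PoolLimits& lim) {
    if (queue_size > lim.target_queue_depth * 2 && current < lim.max_threads) {
        return PoolAction::Grow;    // backlog is building up
    }
    if (queue_size < lim.target_queue_depth / 2 && current > lim.min_threads) {
        return PoolAction::Shrink;  // workers are mostly idle
    }
    return PoolAction::Hold;        // inside the dead band: avoid thrashing
}
```

The gap between the two thresholds acts as a dead band, so a queue depth hovering near the target does not cause workers to be added and removed on every tick.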

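10.7 perf Flame Graphs: System-Wide Performance Analysis

10.7.1 Collecting Data and Generating a Flame Graph

bash
# 1. Collect system-wide profile data (99 Hz sampling for 30 seconds)
sudo perf record -F 99 -g -a -- sleep 30

# -F 99:   sample at 99 Hz (avoids running in lockstep with 100 Hz timer activity)
# -g:      record call stacks (build with -fno-omit-frame-pointer for complete stacks)
# -a:      system-wide (all CPUs)
# -- sleep 30: sample for 30 seconds

# 2. Profile a single process only
sudo perf record -F 99 -g -p $(pgrep my_server) -- sleep 30

# 3. Generate the flame graph
# Download the FlameGraph tools:
git clone https://github.com/brendangregg/FlameGraph.git

# Generate:
sudo perf script > out.perf
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > flamegraph.svg

# Then open flamegraph.svg in a browser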

10.7.2 How to Read a Flame Graph

Flame graph sketch (width = share of CPU time)

main()                  ████████████████████████████ 100%
  worker_thread()       ████████████████████ 60%
    acquire_lock()      ██████████████ 45%
      __lll_lock_wait   ██████ 20% ← bottleneck!
    process_data()      ██████ 15%
      heavy_calc()      ████ 10%
  io_thread()           ████████████ 30%
  other()               ████ 10%
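📊 How to read it:

  1. Width = share of CPU time (more precisely, the share of samples in which the function was on the stack)

  2. Color = carries no meaning (assigned at random by default)

  3. Tall and narrow → a deep call chain

  4. Wide and flat → a hot function

  5. Plateau → typically a lock-wait or I/O-wait path. Note that perf record samples on-CPU time, so a blocked thread shows up only through the CPU it burns entering and leaving the wait (spinning, futex system calls), not through the time it spends asleep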

10.7.3 In practice: optimizing a multithreaded service

bash
# Scenario: multithreaded service at 70% CPU, throughput 5000 req/s

# Step 1: capture a flame graph
sudo perf record -F 99 -g -p $(pgrep server) -- sleep 30
sudo perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > before.svg

# Step 2: analyze the flame graph → __lll_lock_wait accounts for 35% of CPU
# → traced to a hot path guarded by a global mutex

# Step 3: optimize (lock sharding) → rebuild and redeploy
# Step 4: capture again → generate after.svg
# → __lll_lock_wait drops to 8%, CPU drops to 30%, throughput reaches 12000 req/s
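
Step 3 names lock sharding without showing it. The following is a minimal sketch of the idea, assuming the hot path is a key-value lookup that used to sit behind one global mutex. `ShardedMap`, the default of 16 shards, and the use of `std::hash` are illustrative assumptions, not the code of the service described here. Each key hashes to one shard and only that shard's mutex is taken, so threads working on different shards no longer serialize.

```cpp
#include <array>
#include <cassert>   // used by the usage example
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical sharded map: one mutex per shard instead of one global mutex.
template <typename Key, typename Value, std::size_t ShardCount = 16>
class ShardedMap {
    static_assert(ShardCount > 0, "need at least one shard");

public:
    // Insert or overwrite; locks only the shard that owns the key.
    void put(const Key& key, Value value) {
        Shard& shard = shard_for(key);
        std::lock_guard<std::mutex> guard(shard.mutex);
        shard.data.insert_or_assign(key, std::move(value));
    }

    // Look up a key; returns std::nullopt when it is absent.
    std::optional<Value> get(const Key& key) const {
        const Shard& shard = shard_for(key);
        std::lock_guard<std::mutex> guard(shard.mutex);
        auto it = shard.data.find(key);
        if (it == shard.data.end()) return std::nullopt;
        return it->second;
    }

    // Sums shard sizes one lock at a time, so the result is only a
    // snapshot if other threads are writing concurrently.
    std::size_t size() const {
        std::size_t total = 0;
        for (const Shard& shard : shards_) {
            std::lock_guard<std::mutex> guard(shard.mutex);
            total += shard.data.size();
        }
        return total;
    }

private:
    // alignas(64) assumes 64-byte cache lines; it keeps neighbouring
    // shard mutexes from sharing a line.
    struct alignas(64) Shard {
        mutable std::mutex mutex;
        std::unordered_map<Key, Value> data;
    };

    Shard& shard_for(const Key& key) {
        return shards_[std::hash<Key>{}(key) % ShardCount];
    }
    const Shard& shard_for(const Key& key) const {
        return shards_[std::hash<Key>{}(key) % ShardCount];
    }

    std::array<Shard, ShardCount> shards_;
};
```

Two keys that land in the same shard still contend, so the shard count is a tuning knob: after changing it, capture a new flame graph rather than assuming the contention is gone.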

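Before and after

| Metric | Before | After | Change |
|---|---|---|---|
| CPU utilization | 70% | 30% | -57% |
| Throughput | 5,000 req/s | 12,000 req/s | +140% |
| Lock wait share (flame graph) | 35% | 8% | -77% |
| P99 latency | 45 ms | 12 ms | -73% |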
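
The last column of the comparison is a relative change, (after - before) / before, rounded to a whole percent. A small sketch for reproducing it in your own before/after reports is shown here; the function name `relative_change_percent` is a made-up helper, not part of any tool mentioned in this article.

```cpp
#include <cassert>
#include <cmath>

// Relative change from `before` to `after`, in percent,
// rounded to the nearest integer. Negative means a decrease.
long relative_change_percent(double before, double after) {
    assert(before != 0.0);  // the ratio is undefined for a zero baseline
    return std::lround((after - before) / before * 100.0);
}
```

For example, `relative_change_percent(70, 30)` gives -57 for CPU utilization and `relative_change_percent(5000, 12000)` gives 140 for throughput.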

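10.7.4 Quick reference: common perf commands

bash
# Show a program's hot functions (ranked by CPU usage)
perf top -p <pid>

# Count events (context switches / cache misses)
perf stat -e cycles,instructions,cache-misses,branch-misses \
          -e context-switches,cpu-migrations,page-faults \
          ./my_program

# Sample call stacks
perf record -g -p <pid> -- sleep 10
perf report

# Profile a specific thread
perf record -t <tid> -g -- sleep 10

# Analyze lock waits
# (perf lock works on kernel lock events; it usually needs root and a
#  kernel built with lock tracing support)
perf lock record ./my_program
perf lock report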

10.8 Performance optimization checklist (summary)

Start optimizing
  ↓
1. Measure a baseline (perf stat / flame graph)
  ↓
Where is the bottleneck?
  • Lock wait → reduce lock granularity / reader-writer lock / lock sharding
  • False sharing → isolate variables with alignas(64)
  • Too many context switches → fewer threads / thread pool
  • Large serial fraction → restructure the algorithm / parallel STL
  ↓
2. Re-measure to verify
  ↓
Did performance improve? → ✅ Record the optimization
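
The false-sharing branch of the checklist can be sketched as follows, assuming 64-byte cache lines (typical on x86-64, but not universal). `PaddedCounter` and `PerThreadCounters` are hypothetical names used only for illustration. Each thread writes to its own slot, and `alignas(64)` pads every slot out to a full line so that one thread's writes do not invalidate the line holding its neighbour's counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. C++17 also defines
// std::hardware_destructive_interference_size in <new>,
// but standard library support for it varies.
constexpr std::size_t kCacheLine = 64;

// One counter per cache line.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<std::uint64_t> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine,
              "each counter should occupy exactly one cache line");

// Fixed set of counters, one slot per worker thread.
template <std::size_t N>
struct PerThreadCounters {
    PaddedCounter slots[N];

    // Called by worker `thread_index`; touches only that worker's line.
    void add(std::size_t thread_index, std::uint64_t delta) {
        assert(thread_index < N);
        slots[thread_index].value.fetch_add(delta, std::memory_order_relaxed);
    }

    // Sum over all slots; a snapshot if workers are still running.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const PaddedCounter& slot : slots) {
            sum += slot.value.load(std::memory_order_relaxed);
        }
        return sum;
    }
};
```

The padding trades memory for isolation: N counters take N × 64 bytes instead of N × 8. Whether that pays off is something to confirm with perf stat (cache-misses) before and after, as the checklist's re-measure step requires.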


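10.9 Summary

| Topic | Target level | Key points |
|---|---|---|
| Amdahl's law | Master | The serial fraction caps the speedup: 5% serial → at most 20x |
| Gustafson's law | Understand | When the problem size grows with the core count, near-linear speedup is achievable |
| Scalability measurement | Master | Efficiency = speedup ÷ thread count; find the point where efficiency starts to fall off |
| perf lock | Master | perf lock record + perf lock report to locate contention |
| Cache line alignment | Master | alignas(64) to isolate hot data, separate hot from cold fields, reorder members |
| Lock granularity | Master | Global lock → reader-writer lock → lock sharding; keys in different shards proceed in parallel |
| Memory order relaxation | Master | seq_cst → acq_rel → relaxed, only where correctness allows; the gain from each step depends on the architecture, so measure it |
| Thread count tuning | Master | CPU-bound = core count; I/O-bound = core count × (1 + wait/compute) |
| Flame graphs | Master | perf record -g + FlameGraph; width = CPU time; pinpoint hot spots |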
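
Two of the formulas in the summary, parallel efficiency and the thread count rule, are simple enough to write down directly. The sketch below does so; the function names are made up for illustration, and the wait and compute times are assumed to come from your own measurements of a typical task.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Efficiency = speedup ÷ thread count, where speedup = serial / parallel time.
// 1.0 means perfect scaling; values well below 1.0 mark the fall-off point.
double parallel_efficiency(double serial_seconds, double parallel_seconds,
                           unsigned threads) {
    assert(parallel_seconds > 0.0 && threads > 0);
    return (serial_seconds / parallel_seconds) / threads;
}

// Thread count rule from the summary:
//   CPU-bound (wait_time == 0): cores
//   I/O-bound:                  cores × (1 + wait / compute)
// wait_time and compute_time only need to share a unit.
unsigned suggested_thread_count(unsigned cores, double wait_time,
                                double compute_time) {
    assert(compute_time > 0.0 && wait_time >= 0.0);
    cores = std::max(1u, cores);  // guard against a reported core count of 0
    return static_cast<unsigned>(
        std::lround(cores * (1.0 + wait_time / compute_time)));
}
```

In a real program `cores` would usually come from `std::thread::hardware_concurrency()`, which may return 0 when the value cannot be determined, hence the guard. The result is a starting point for measurement rather than a final answer.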

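Series complete

🎉 With this, all 10 articles of the C++17 multithreading series are complete!

| Part | Content | Difficulty |
|---|---|---|
| Part 1 | Thread basics: std::thread | ★☆☆☆☆ |
| Part 2 | Shared data and synchronization: mutex/cv | ★★☆☆☆ |
| Part 3 | Atomic operations and the memory model | ★★★☆☆ |
| Part 4 | Asynchronous programming: future/promise/async | ★★★☆☆ |
| Part 5 | C++17 parallel algorithms | ★★☆☆☆ |
| Part 6 | Advanced synchronization: shared_mutex/scoped_lock | ★★★☆☆ |
| Part 7 | Thread pool: implemented from scratch | ★★★★☆ |
| Part 8 | Concurrency patterns: Producer-Consumer and others | ★★★★☆ |
| Part 9 | Debugging and troubleshooting: TSAN/Helgrind/GDB | ★★★★☆ |
| Part 10 | Performance optimization: from measurement to tuning | ★★★★★ |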

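Recommended tools

  • perf top -p <pid>: live view of hot functions
  • perf stat -e cycles,instructions,cache-misses: event counts
  • perf record -F 99 -g -a -- sleep 30: system-wide sampling for flame graphs
  • FlameGraph: flame graph generation scripts
  • pahole: inspect struct memory layout
  • Intel VTune / AMD uProf: more capable profilers with a GUI (both also ship command-line tools)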