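Core goal: systematically measure and optimize the performance of multithreaded applications, understand the practical implications of Amdahl's law and Gustafson's law, master flame graphs and the perf tool, and establish a closed performance loop of "measure → analyze → optimize → verify".
Prerequisites: everything from Parts 1-9 of this series; familiarity with the Linux perf tool is a plus.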
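10.1 Theoretical Foundations of Multithreaded Performance: Two Laws
10.1.1 Amdahl's Law: The Serial Portion Caps the Speedup
$$S = \frac{1}{(1 - P) + \frac{P}{N}}$$
S: speedup
P: fraction of the work that can be parallelized
N: number of processors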
```cpp
#include <cstdio>
#include <initializer_list>
#include <iostream>

// Theoretical maximum speedup predicted by Amdahl's law
double amdahl_speedup(double parallel_fraction, int num_cores) {
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / num_cores);
}

void print_amdahl_table() {
    std::cout << "Amdahl's law: speedup for different parallel fractions\n";
    std::cout << "Cores  | P=50% | P=75% | P=90% | P=95% | P=99%\n";
    std::cout << "-------|-------|-------|-------|-------|-------\n";
    for (int n : {1, 4, 8, 16, 32, 128}) {
        std::printf("%-6d | %-5.2f | %-5.2f | %-5.2f | %-5.2f | %-5.2f\n",
                    n,
                    amdahl_speedup(0.50, n),
                    amdahl_speedup(0.75, n),
                    amdahl_speedup(0.90, n),
                    amdahl_speedup(0.95, n),
                    amdahl_speedup(0.99, n));
    }
}

int main() {
    print_amdahl_table();
}
```
Output (the ∞ row is the limit 1/(1 - P), which the program does not print):
| Cores | P=50% | P=75% | P=90% | P=95% | P=99% |
|---|---|---|---|---|---|
| 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 4 | 1.60 | 2.29 | 3.08 | 3.48 | 3.88 |
| 8 | 1.78 | 2.91 | 4.71 | 5.93 | 7.48 |
| 16 | 1.88 | 3.37 | 6.40 | 9.14 | 13.91 |
| 32 | 1.94 | 3.66 | 7.80 | 12.55 | 24.43 |
| 128 | 1.98 | 3.91 | 9.34 | 17.41 | 56.39 |
| ∞ | 2.00 | 4.00 | 10.00 | 20.00 | 100.00 |
[Figure: program execution consists of a 20% serial part and an 80% parallelizable part. Parallelizing the 80% across Core1-Core4 leaves 20% per core, but the serial 20% stays 20% no matter how many cores are added.]
Amdahl's conclusion: even with infinitely many cores, speedup ≤ 1 ÷ 20% = 5x.
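The harsh truth of Amdahl's law: even with 128 cores, a program that is only 90% parallelizable has a theoretical speedup ceiling of 10x (about 9.34x at exactly 128 cores). The serial 10% is the bottleneck.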
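Letting N grow without bound makes the P/N term vanish, which is where the ∞ row of the table comes from:

$$\lim_{N \to \infty} S = \lim_{N \to \infty} \frac{1}{(1 - P) + \frac{P}{N}} = \frac{1}{1 - P}$$

For P = 0.8 this gives 1/0.2 = 5x, and for P = 0.9 it gives 1/0.1 = 10x, while at N = 128 the P = 0.9 case reaches only 1/(0.1 + 0.9/128) ≈ 9.34x.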
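10.1.2 Gustafson's Law: Speedup When the Problem Size Grows
$$S = N - \alpha\,(N - 1)$$
S: speedup
N: number of processors
α: serial fraction (share of the run time on the N-processor system spent in serial code)
Key difference from Amdahl:
Amdahl: fixed problem size → serial fraction stays constant → speedup has a ceiling
Gustafson: problem size grows with the core count → serial fraction shrinks → near-linear speedup
Real-world example: image resolution rising from 1080p to 4K/8K
The serial overhead (file IO) stays almost unchanged, while the parallelizable per-pixel work grows 4x (4K) or 16x (8K)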
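| Law | Assumption | Typical scenario | Key conclusion |
|---|---|---|---|
| Amdahl | Fixed problem size | Latency-bound | The serial part sets the ceiling |
| Gustafson | Problem size grows with core count | Throughput-bound | Near-linear speedup is achievable |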
10.1.3 Measuring Scalability
```cpp
#include <atomic>
#include <chrono>
#include <cmath>
#include <cstdio>
#include <iostream>
#include <thread>
#include <vector>

// Runs total_ops units of simulated work split across `threads` threads; returns elapsed ms.
double measure_parallel_work(int threads, long total_ops) {
    std::atomic<long> counter{0};  // completed operations
    std::vector<std::thread> workers;
    long ops_per_thread = total_ops / threads;
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&] {
            long local = 0;  // count locally: a shared atomic per op would itself become a contention point
            for (long j = 0; j < ops_per_thread; ++j) {
                // Simulated computation
                volatile double x = 0;
                for (int k = 0; k < 10; ++k) x = x + std::sqrt(k);
                ++local;
            }
            counter.fetch_add(local, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();
    auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t2 - t1).count();
}

// Measures speedup and efficiency for 1, 2, 4, ... threads
void measure_scalability(int max_threads) {
    constexpr long N = 50'000'000;                  // total amount of work
    double baseline = measure_parallel_work(1, N);  // single-thread baseline
    std::cout << "Threads | Time(ms) | Speedup | Efficiency\n";
    std::cout << "--------|----------|---------|-----------\n";
    for (int t = 1; t <= max_threads; t *= 2) {
        double time = measure_parallel_work(t, N);
        double speedup = baseline / time;
        double efficiency = speedup / t * 100;
        std::printf("%-7d | %-8.0f | %-7.2f | %.1f%%\n", t, time, speedup, efficiency);
        // Ideal: speedup ≈ t, efficiency ≈ 100%
        // In practice: efficiency drops as t grows ← find the knee
    }
}

int main() {
    unsigned cores = std::thread::hardware_concurrency();
    // Go past the core count on purpose to expose oversubscription
    measure_scalability(cores ? 2 * static_cast<int>(cores) : 16);
}

// Typical output (8 cores):
// Threads | Time(ms) | Speedup | Efficiency
// 1       | 4850     | 1.00    | 100.0%
// 2       | 2500     | 1.94    | 97.0%
// 4       | 1310     | 3.70    | 92.5%
// 8       | 730      | 6.64    | 83.0%  ← knee
// 16      | 710      | 6.83    | 42.7%  ← oversubscribed!
```
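10.2 Lock Performance Analysis and Optimization
10.2.1 perf lock: Analyzing Lock Contention on Linux
```bash
# 1. Record lock events
perf lock record ./my_program
# 2. Show the lock contention report
perf lock report
# Typical output:
# Name                 acquired  contended  avg wait (ns)  total wait (ns)
# pthread_mutex_lock      50000       3500           1200          4200000
# SharedMutex::lock         200        180          15000          2700000
#
# Interpretation:
# - pthread_mutex_lock: 3500/50000 = 7% contention rate, 1200 ns average wait
# - SharedMutex::lock: 180/200 = 90% contention rate, 15000 ns average wait ← bottleneck!
# Note: perf lock is built on the kernel's lock tracepoints (availability depends on
# the kernel configuration), so it primarily sees kernel locks; contention on
# user-space mutexes mostly shows up as futex waits, which step 4 exposes.
# 3. Sort the report by total wait time (-k/--key selects the sort key)
perf lock report -k wait_total
# 4. Watch live with perf top
perf top -e cycles:p
# A large share in __lll_lock_wait → heavy lock contention
```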
10.2.2 Measuring the Lock Contention Rate
```cpp
#include <atomic>
#include <cstdio>
#include <mutex>

// Counts how often a lock is acquired and how often the acquirer had to wait.
class ContentionMeter {
    std::atomic<long> lock_acquisitions_{0};
    std::atomic<long> lock_contentions_{0};  // acquisitions that had to wait
public:
    void record_acquire(bool contended) {
        lock_acquisitions_.fetch_add(1, std::memory_order_relaxed);
        if (contended) {
            lock_contentions_.fetch_add(1, std::memory_order_relaxed);
        }
    }
    double contention_rate() const {
        long total = lock_acquisitions_.load();
        if (total == 0) return 0.0;
        return 100.0 * lock_contentions_.load() / total;
    }
    void report() const {
        std::printf("Lock stats: %ld acquires, %ld contentions (%.1f%%)\n",
                    lock_acquisitions_.load(),
                    lock_contentions_.load(),
                    contention_rate());
    }
};

// std::mutex wrapper that records contention statistics.
// It provides lock()/unlock(), so it works with std::lock_guard and std::unique_lock.
class MeteredMutex {
    std::mutex mtx_;
    ContentionMeter& meter_;
public:
    explicit MeteredMutex(ContentionMeter& m) : meter_(m) {}
    void lock() {
        // Try first: failure means another thread currently holds the lock
        bool contended = !mtx_.try_lock();
        meter_.record_acquire(contended);
        if (contended) {
            mtx_.lock();  // actually wait
        }
    }
    void unlock() { mtx_.unlock(); }
};
```
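10.2.3 Micro-benchmarking the Overhead of Different Locks
```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

struct LockBenchmark {
    static constexpr int kIterations = 10'000'000;
    static constexpr int kNumThreads = 4;

    // Average time of one lock/unlock pair without contention (single thread), in ns
    template <typename Lock, typename Unlock>
    static double measure(Lock&& lock_fn, Unlock&& unlock_fn) {
        auto t1 = std::chrono::steady_clock::now();
        for (int i = 0; i < kIterations; ++i) {
            lock_fn();
            unlock_fn();
        }
        auto t2 = std::chrono::steady_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
        return static_cast<double>(ns) / kIterations;  // average ns per operation
    }

    // Throughput with kNumThreads threads contending, in ops/s
    template <typename Setup, typename Work>
    static double measure_throughput(Setup&& setup, Work&& work) {
        std::atomic<bool> start{false};
        setup();
        // The op count is known up front; a shared atomic counter would add contention of its own
        const long ops_per_thread = kIterations / kNumThreads;
        std::vector<std::thread> threads;
        for (int t = 0; t < kNumThreads; ++t) {
            threads.emplace_back([&] {
                while (!start.load(std::memory_order_acquire)) std::this_thread::yield();  // start together
                for (long i = 0; i < ops_per_thread; ++i) work();
            });
        }
        auto t1 = std::chrono::steady_clock::now();
        start.store(true, std::memory_order_release);
        for (auto& t : threads) t.join();
        auto t2 = std::chrono::steady_clock::now();
        // Seconds as double: no integer-ms truncation, no divide-by-zero on short runs
        double seconds = std::chrono::duration<double>(t2 - t1).count();
        return ops_per_thread * kNumThreads / seconds;
    }
};

int main() {
    std::mutex m;
    double ns = LockBenchmark::measure([&] { m.lock(); }, [&] { m.unlock(); });
    double ops = LockBenchmark::measure_throughput([] {}, [&] { std::lock_guard<std::mutex> g(m); });
    std::printf("std::mutex: %.1f ns uncontended, %.2f M ops/s with %d threads\n",
                ns, ops / 1e6, LockBenchmark::kNumThreads);
}
```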
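Micro-benchmark results (typical values; actual numbers depend on the CPU, OS, and library versions):
| Lock type | Uncontended latency (ns) | 4-thread throughput (M ops/s) | 8-thread throughput (M ops/s) |
|---|---|---|---|
| std::mutex | 25 | 2.8 | 2.2 |
| std::shared_mutex (exclusive) | 35 | 2.4 | 1.9 |
| std::shared_mutex (shared) | 30 | 12.5 | 16.8 |
| std::atomic_flag spinlock | 5 | 3.2 | 2.6 |
| tbb::spin_mutex | 8 | 3.5 | 3.0 |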
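10.3 Cache Line Alignment: Getting Performance out of Struct Layout
10.3.1 Separating Hot and Cold Data
```cpp
#include <atomic>
#include <mutex>

// ❌ Hot and cold data mixed together
struct BadDataLayout {
    // Hot data (accessed frequently by many threads)
    std::atomic<long> ref_count{0};  // 8B,  offset 0
    std::mutex mtx;                  // 40B, offset 8 (x86-64 Linux/glibc)
    int active_workers{0};           // 4B,  offset 48
    // Cold data (accessed occasionally)
    char debug_name[256];            // 256B, offset 52
    char error_message[512];         // 512B
    // Problem: ref_count, mtx, active_workers and the first bytes of debug_name
    // share cache line 0 → while one thread writes ref_count and another reads
    // debug_name, the line keeps bouncing between cores (false sharing)
};
// sizeof ≈ 824B on x86-64 Linux/glibc

// ✅ Separated by access frequency
// (64 assumes 64-byte cache lines; C++17's std::hardware_destructive_interference_size
//  in <new> is the portable constant where the standard library provides it)
struct GoodDataLayout {
    // Hot cache line 0: atomic counters
    struct HotData {
        alignas(64) std::atomic<long> ref_count{0};
        int active_workers{0};
    } hot;
    // Hot cache line 1: the lock
    alignas(64) std::mutex mtx;
    // Cold cache lines 2+: debug information.
    // alignas(64) is needed here too; otherwise debug_name would start
    // right after mtx, inside cache line 1
    alignas(64) char debug_name[256];
    char error_message[512];
};
// sizeof = 896B on x86-64 Linux/glibc: larger, but it performs better
```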
[Figure: ❌ Mixed: cache line 0 holds [ref_count|mtx|active|debug_name...]; Thread 1 writing ref_count and Thread 2 reading debug_name conflict on the same line (false sharing). ✅ Separated: cache line 0 (hot) holds ref_count|active_workers, cache line 1 (hot) holds mtx, cache line 2+ (cold) holds debug_name...; the same two threads no longer interfere.]
10.3.2 Reordering Struct Members
```cpp
// ❌ Holes created by natural alignment (offsets for x86-64 / AArch64)
struct Unoptimized {
    char flag;     // 1B → offset 0
                   // padding 7B → align value to 8B
    double value;  // 8B → offset 8
    int count;     // 4B → offset 16
    char type;     // 1B → offset 20
                   // padding 3B → round sizeof up to a multiple of 8
};
// sizeof = 24 bytes (7 + 3 = 10 bytes wasted!)

// ✅ After reordering
struct Optimized {
    double value;  // 8B → offset 0
    int count;     // 4B → offset 8
    char flag;     // 1B → offset 12
    char type;     // 1B → offset 13
                   // padding 2B → round sizeof up to a multiple of 8
};
// sizeof = 16 bytes (33% smaller)
// Rule of thumb: order members by alignment requirement, largest first
// double(8) > int(4) > char(1)
```
10.4 Lock Granularity Optimization: From Coarse to Fine
10.4.1 The Evolution of Lock Granularity
```cpp
#include <array>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <vector>

// ── Level 1: global lock (coarsest granularity) ──
class GlobalLockCache {
    std::mutex mtx_;  // one lock protects all data
    std::unordered_map<std::string, std::string> data_;
    std::vector<std::string> access_log_;
public:
    void put(const std::string& k, const std::string& v) {
        std::lock_guard lock(mtx_);  // lock held for the whole operation
        data_[k] = v;
        access_log_.push_back("PUT " + k);
    }
    std::string get(const std::string& k) {
        std::lock_guard lock(mtx_);
        access_log_.push_back("GET " + k);
        auto it = data_.find(k);
        return it != data_.end() ? it->second : "";
    }
    // Problem: get and put are fully mutually exclusive (even two gets)!
};

// ── Level 2: reader-writer lock ──
class RwLockCache {
    mutable std::shared_mutex mtx_;  // readers share, writers are exclusive
    std::unordered_map<std::string, std::string> data_;
public:
    void put(const std::string& k, const std::string& v) {
        std::unique_lock lock(mtx_);
        data_[k] = v;
    }
    std::string get(const std::string& k) const {
        std::shared_lock lock(mtx_);
        auto it = data_.find(k);
        return it != data_.end() ? it->second : "";
    }
    // Improvement: multiple gets run concurrently, but a put blocks all gets
};

// ── Level 3: lock striping (sharding) ──
class ShardedCache {
    static constexpr std::size_t kShards = 16;
    // alignas(64): keep each shard's lock on its own cache line (see 10.3)
    struct alignas(64) Shard {
        mutable std::shared_mutex mtx;
        std::unordered_map<std::string, std::string> data;
    };
    std::array<Shard, kShards> shards_;  // Shard is not movable, so a fixed array fits best
    std::hash<std::string> hasher_;

    Shard& get_shard(const std::string& key) {
        return shards_[hasher_(key) % kShards];
    }
    const Shard& get_shard(const std::string& key) const {  // needed by the const get()
        return shards_[hasher_(key) % kShards];
    }
public:
    void put(const std::string& k, const std::string& v) {
        auto& shard = get_shard(k);
        std::unique_lock lock(shard.mtx);
        shard.data[k] = v;
    }
    std::string get(const std::string& k) const {
        const auto& shard = get_shard(k);
        std::shared_lock lock(shard.mtx);
        auto it = shard.data.find(k);
        return it != shard.data.end() ? it->second : "";
    }
    // Improvement: keys in different shards never contend (16 independent locks)
};
```
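10.4.2 Quantifying the Gains
Scenario: 8 threads, 90% reads + 10% writes, 1 million keys
Global lock: 870K ops/s (baseline)
Reader-writer lock: 3.2M ops/s (3.7x)
Lock striping: 11.5M ops/s (13.2x)
Lock striping = reader-writer lock + a lower probability of contention
The probability that two concurrent operations hit the same lock drops from 1 to 1/16 (ideally, with uniformly distributed keys)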
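10.5 Relaxing Memory Order: Safely Squeezing Out Performance
10.5.1 The Relaxation Path
```cpp
#include <atomic>

// ── Level 1: seq_cst (the default; safest, potentially slowest) ──
std::atomic<int> counter_seq{0};
void increment_seq() {
    counter_seq.fetch_add(1);  // defaults to memory_order_seq_cst
    // Sequential consistency: all threads observe one total order of seq_cst operations
    // Cost: on x86 this RMW is a LOCK-prefixed instruction, which is already a full barrier;
    // seq_cst stores need XCHG (or MOV + MFENCE), and ARM/POWER need extra barriers
}

// ── Level 2: acquire/release (recommended) ──
std::atomic<int> counter_ar{0};
void producer() {
    // ... prepare data ...
    counter_ar.store(42, std::memory_order_release);  // writes before this store...
}
void consumer() {
    int val = counter_ar.load(std::memory_order_acquire);  // ...are visible after this load
    if (val == 42) {
        // Guaranteed: everything producer wrote before its release store is visible here
    }
}
// A release-acquire pair establishes a happens-before relation between threads
// Cost: one-way barriers (x86: nearly free, plain loads/stores already behave this way)

// ── Level 3: relaxed (fastest, but use with care) ──
std::atomic<long> stat_counter{0};
void record_stat() {
    stat_counter.fetch_add(1, std::memory_order_relaxed);
    // Guarantees atomicity only, no ordering relative to other memory operations
    // Suitable for: statistics counters, reference-count increments (nothing depends
    // on their order; a decrement that may free the object needs acq_rel)
}
// Cost: minimal. On x86 it is still a LOCK-prefixed RMW (e.g. LOCK ADD/XADD), the same
// instruction as seq_cst; the savings appear on ARM/POWER and in compiler reordering freedom
```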
10.5.2 Checklist for Safe Relaxation
Decision flow, starting from "consider relaxing the memory order":
1. Are there dependencies between the data? Yes → keep acquire/release or seq_cst; No → go to 2
2. Is a global order across multiple atomics required? Yes → keep seq_cst; No → go to 3
3. Is it just a simple counter/flag? Yes → ✅ can be relaxed to relaxed; No → go to 4
4. Is it a producer-consumer pattern? Yes → ✅ relax to acquire/release; No → keep seq_cst
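10.5.3 Performance Comparison (x86-64)
```cpp
// 4 threads, 100 million fetch_add calls
// Memory order | Time   | Relative to seq_cst
// seq_cst      | 3200ms | 1.00x
// acq_rel      | 2100ms | 1.52x
// release      | 1800ms | 1.78x
// relaxed      | 1450ms | 2.21x
// For plain loads, acquire and relaxed cost almost the same on x86
// (the hardware already preserves Load-Load order);
// on ARM/PowerPC the differences are more pronounced.
// Caveat: GCC and Clang typically emit the same LOCK-prefixed instruction for
// fetch_add under every memory order on x86-64, so a gap this large is worth
// double-checking against the generated assembly and the benchmark setup.
```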
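10.6 Thread Count Tuning: Finding the Optimal Degree of Parallelism
10.6.1 Thread Count Formulas
```cpp
#include <algorithm>
#include <cstddef>
#include <thread>

// CPU-bound work
std::size_t cpu_bound_threads() {
    // hardware_concurrency() counts logical CPUs (including SMT siblings) and may return 0
    unsigned int n = std::thread::hardware_concurrency();
    // Exactly the core count. Avoid oversubscription: beyond the core count,
    // context-switch overhead outweighs the parallel gain
    return std::max(1u, n);
}

// IO-bound work: threads = cores × (1 + wait time / compute time)
std::size_t io_bound_threads(double wait_to_compute_ratio = 10.0) {
    unsigned int n = std::max(1u, std::thread::hardware_concurrency());
    // If 90% of the time is spent waiting on IO, the ratio is 9 → 10x as many threads
    // Typical values: cores × (1~20)
    return std::max(1u, static_cast<unsigned int>(n * (1 + wait_to_compute_ratio)));
}

// Mixed workload
std::size_t mixed_threads(double io_fraction = 0.5) {
    unsigned int n = std::max(1u, std::thread::hardware_concurrency());
    // io_fraction: share of task time spent on IO. n / (1 - f) equals the IO-bound
    // formula with ratio f / (1 - f). The 0.95 cap is an arbitrary guard against
    // dividing by zero as f approaches 1
    io_fraction = std::clamp(io_fraction, 0.0, 0.95);
    return std::max(1u, static_cast<unsigned int>(n / (1 - io_fraction)));
}
```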
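10.6.2 Detecting Oversubscription
```bash
# Watch the context-switch rate
perf stat -e context-switches,cpu-migrations ./my_program
# If context-switches >> number of tasks → possibly oversubscribed
# Normal: context switches roughly match the task count (about one switch per task)
# Oversubscribed: context switches far exceed the task count (frequent switching)

# Watch CPU utilization
htop
# If every core is at 100% but throughput stops growing → likely oversubscribed
```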
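```cpp
// Checking from inside the program (Linux; starts_with needs C++20)
#include <fstream>
#include <string>

// Reads a counter such as "voluntary_ctxt_switches" or "nonvoluntary_ctxt_switches"
// from /proc/self/status. These fields describe the main thread; per-thread
// values are in /proc/self/task/<tid>/status.
long get_ctxt_switches(const std::string& field) {
    std::ifstream status("/proc/self/status");
    const std::string prefix = field + ":";
    std::string line;
    while (std::getline(status, line)) {
        if (line.starts_with(prefix)) {
            // The value follows a tab; std::stol skips leading whitespace
            return std::stol(line.substr(prefix.size()));
        }
    }
    return 0;
}
// Compare before and after an operation:
// auto before = get_ctxt_switches("nonvoluntary_ctxt_switches");
// do_work();
// auto after = get_ctxt_switches("nonvoluntary_ctxt_switches");
// If after - before is far larger than the thread count → likely oversubscribed.
// Oversubscription shows up as involuntary switches (preemption); voluntary
// switches mostly come from blocking on locks or IO
```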
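10.6.3 Dynamically Resizing the Thread Pool
```cpp
#include <chrono>
#include <cstddef>
#include <stop_token>
#include <thread>

// Concept: resize the pool according to the depth of the task queue.
// pending_tasks / worker_count / add_worker / remove_worker are hooks that a
// concrete pool must implement; they are left abstract here.
class AdaptiveThreadPool {
protected:
    std::size_t min_threads_;
    std::size_t max_threads_;
    std::size_t target_queue_depth_;

    virtual std::size_t pending_tasks() const = 0;
    virtual std::size_t worker_count() const = 0;
    virtual void add_worker() = 0;
    virtual void remove_worker() = 0;

public:
    AdaptiveThreadPool(std::size_t min_threads, std::size_t max_threads, std::size_t target_depth)
        : min_threads_(min_threads), max_threads_(max_threads), target_queue_depth_(target_depth) {}
    virtual ~AdaptiveThreadPool() = default;

    // Run on a dedicated monitor thread, e.g. in the derived pool:
    // std::jthread monitor([this](std::stop_token st) { monitor_and_adjust(st); });
    void monitor_and_adjust(std::stop_token token) {
        while (!token.stop_requested()) {
            std::size_t queue_size = pending_tasks();
            std::size_t current = worker_count();
            if (queue_size > target_queue_depth_ * 2 && current < max_threads_) {
                add_worker();     // backlog is building up → add a thread
            } else if (queue_size < target_queue_depth_ / 2 && current > min_threads_) {
                remove_worker();  // mostly idle → retire a thread
            }
            // Checking once per second also means a stop request may take up to 1 s to be noticed
            std::this_thread::sleep_for(std::chrono::seconds(1));
        }
    }
};
```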
10.7 perf Flame Graphs: System-Wide Performance Analysis
10.7.1 Collecting Data and Generating a Flame Graph
```bash
# 1. Sample the whole system (99 Hz, for 30 seconds)
sudo perf record -F 99 -g -a -- sleep 30
# -F 99: sample at 99 Hz (avoids sampling in lockstep with the 100 Hz system timer)
# -g: record call stacks
# -a: whole system (all CPUs)
# -- sleep 30: sample for 30 seconds
# 2. Sample only one process (assumes pgrep matches a single PID)
sudo perf record -F 99 -g -p $(pgrep my_server) -- sleep 30
# 3. Generate the flame graph
# Get the FlameGraph tools:
git clone https://github.com/brendangregg/FlameGraph.git
# Generate:
sudo perf script > out.perf
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > flamegraph.svg
# Then open flamegraph.svg in a browser
```
10.7.2 How to Read a Flame Graph
Flame graph illustration (width = share of CPU time):
- main() 100%
  - worker_thread() 60%
    - acquire_lock() 45%
      - __lll_lock_wait 20% ← bottleneck!
    - process_data() 15%
      - heavy_calc() 10%
  - io_thread() 30%
  - other() 10%
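📊 How to read it:
- Width = share of CPU time (number of samples). The x-axis is sorted alphabetically, not by time.
- Color = meaningless (randomly generated).
- Tall and narrow → deep call chain.
- Wide and flat → hot function.
- Wide plateau → often lock waits or IO waits.

Note that perf's default sampling only sees threads that are running on a CPU. A thread blocked inside a futex or an IO call is not sampled. A wide __lll_lock_wait frame therefore measures the CPU spent on contention (futex syscalls, wakeups), not the full time threads spend waiting. Measuring blocked time requires an off-CPU flame graph.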
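The sketch below is a minimal workload that should reproduce the lock-wait pattern in the illustration: several threads repeatedly take one global mutex. The function names mirror the illustration and are hypothetical. `[[gnu::noinline]]` is a GCC/Clang attribute that keeps each function as a separate frame. Build with something like `-O2 -g -fno-omit-frame-pointer -pthread` so that `perf record -g` can unwind the stacks. How wide the lock-wait frames get depends on the machine and the thread count.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical demo workload; names mirror the flame graph illustration.
namespace demo {

std::mutex g_mutex;          // single global lock guarding the hot path
std::uint64_t g_total = 0;   // shared state protected by g_mutex

// Small CPU-bound step; noinline keeps it as its own frame in the profile.
[[gnu::noinline]] std::uint64_t heavy_calc(std::uint64_t x) {
    for (int i = 0; i < 100; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;  // LCG step
    return x;
}

[[gnu::noinline]] std::uint64_t process_data(std::uint64_t x) {
    return heavy_calc(x) & 0xFF;
}

// Every call serializes on g_mutex. Under contention, threads end up in
// glibc's lock-wait path (e.g. __lll_lock_wait) and futex syscalls.
[[gnu::noinline]] void acquire_lock(std::uint64_t value) {
    std::lock_guard<std::mutex> lock(g_mutex);
    g_total += value;
}

void worker_thread(int id, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::uint64_t v = process_data(static_cast<std::uint64_t>(id) * 1000003u + i);
        acquire_lock(v);
    }
}

// Runs num_threads workers and returns the accumulated total.
std::uint64_t run(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    g_total = 0;  // no workers running yet, so no lock needed
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(worker_thread, t, iterations);
    for (auto& th : threads) th.join();  // join() makes g_total visible here
    return g_total;
}

}  // namespace demo
```

To try it, call `demo::run(8, 1000000)` from a small `main` and profile the process with `perf record -g`, as this section describes.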
10.7.3 Hands-on: Optimizing a Multithreaded Service
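```bash
# Scenario: multithreaded service at 70% CPU, 5,000 req/s throughput
# Step 1: capture a flame graph (stackcollapse-perf.pl and flamegraph.pl
#         come from Brendan Gregg's FlameGraph repository)
sudo perf record -F 99 -g -p "$(pgrep -d, -x server)" -o before.data -- sleep 30
sudo perf script -i before.data | ./stackcollapse-perf.pl | ./flamegraph.pl > before.svg
# Step 2: analyze the flame graph → __lll_lock_wait accounts for 35% of CPU samples
#         → traced to a hot path guarded by one global mutex
# Step 3: optimize (lock sharding) → rebuild and redeploy
# Step 4: capture again the same way → generate after.svg
sudo perf record -F 99 -g -p "$(pgrep -d, -x server)" -o after.data -- sleep 30
sudo perf script -i after.data | ./stackcollapse-perf.pl | ./flamegraph.pl > after.svg
#         → __lll_lock_wait drops to 8%, CPU to 30%, throughput rises to 12,000 req/s
```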
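Step 3 names lock sharding (also called lock striping) as the fix. The sketch below assumes the hot path was a global `std::mutex` around a single `std::unordered_map`. The map is split into a fixed number of shards, each with its own mutex, and a key's hash picks the shard. `ShardedCounterMap`, the shard count of 16, and the 64-byte shard alignment are hypothetical choices, not the service's actual code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical lock-striped counter map (sketch). Keys in different shards
// never contend; keys that hash to the same shard still share one mutex.
template <std::size_t NumShards = 16>
class ShardedCounterMap {
    static_assert(NumShards > 0, "need at least one shard");

public:
    void add(const std::string& key, long delta) {
        Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);  // locks only this shard
        s.data[key] += delta;
    }

    long get(const std::string& key) const {
        const Shard& s = shard_for(key);
        std::lock_guard<std::mutex> lock(s.mu);
        auto it = s.data.find(key);
        return it == s.data.end() ? 0 : it->second;
    }

private:
    // alignas(64) keeps neighbouring shards' mutexes off the same cache
    // line (assumes 64-byte lines), avoiding false sharing between shards.
    struct alignas(64) Shard {
        mutable std::mutex mu;
        std::unordered_map<std::string, long> data;
    };

    Shard& shard_for(const std::string& key) {
        return shards_[std::hash<std::string>{}(key) % NumShards];
    }
    const Shard& shard_for(const std::string& key) const {
        return shards_[std::hash<std::string>{}(key) % NumShards];
    }

    std::array<Shard, NumShards> shards_;
};
```

The trade-off is the usual one for lock striping. Any operation that spans all keys, such as a total size or a full iteration, has to lock every shard.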
Before vs. after:
| Metric | Before | After | Change |
|---|---|---|---|
| CPU utilization | 70% | 30% | -57% |
| Throughput | 5,000 req/s | 12,000 req/s | +140% |
| Lock-wait share (flame graph) | 35% | 8% | -77% |
| P99 latency | 45 ms | 12 ms | -73% |
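The Change column is the relative change, (after - before) / before. The small helper below, with illustrative names, reproduces that column from the table's own numbers. For the lock-wait row, the -77% is a relative drop in a percentage share, not a drop in percentage points (35% → 8% is 27 points).

```cpp
#include <cassert>
#include <cmath>

// Illustrative helper: relative change in percent, (after - before) / before * 100.
double percent_change(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

// Illustrative helper: rounds to a whole percent, as displayed in the table.
long rounded_percent_change(double before, double after) {
    return std::lround(percent_change(before, after));
}
```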
10.7.4 Common perf Commands: Cheat Sheet
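```bash
# Hot functions of a running process (live CPU ranking)
perf top -p <pid>
# Count events (cycles, cache misses, context switches, ...)
perf stat -e cycles,instructions,cache-misses,branch-misses \
          -e context-switches,cpu-migrations,page-faults \
          ./my_program
# Sample call stacks
perf record -g -p <pid> -- sleep 10
perf report
# Profile a single thread (<tid> is the kernel thread ID, e.g. from ps -L or top -H)
perf record -t <tid> -g -- sleep 10
# Lock analysis: perf lock traces *kernel* lock events (needs a kernel with
# lock tracepoints); user-space std::mutex contention appears as futex syscalls
perf lock record ./my_program
perf lock report
# Summarize syscalls per thread, including futex waits from user-space locks
perf trace -s ./my_program
```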
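`perf record -t` expects the kernel thread ID (TID), which is not the same as `std::thread::id`. On Linux, a thread can fetch its TID with `syscall(SYS_gettid)`. It can also name itself with `pthread_setname_np`, a GNU extension that limits names to 15 characters plus the terminator. The name then shows up in tools such as `top -H`. Below is a Linux-only sketch; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <pthread.h>
#include <sys/syscall.h>
#include <thread>
#include <unistd.h>

// Hypothetical helper (Linux-only): kernel TID of the calling thread,
// i.e. the value that perf record -t <tid> expects.
pid_t current_tid() {
    return static_cast<pid_t>(::syscall(SYS_gettid));
}

// Hypothetical helper: names the calling thread (max 15 chars + NUL,
// otherwise pthread_setname_np fails with ERANGE) and logs its TID.
bool name_and_report_current_thread(const char* name) {
    int rc = ::pthread_setname_np(::pthread_self(), name);
    std::fprintf(stderr, "thread '%s' tid=%d\n", name, static_cast<int>(current_tid()));
    return rc == 0;
}

// Hypothetical example: start a worker that names itself; return its TID.
pid_t run_named_worker(const char* name) {
    pid_t tid = 0;
    std::thread worker([&] {
        name_and_report_current_thread(name);
        tid = current_tid();
        // Real work would go here; profile it with perf record -t <tid>.
    });
    worker.join();  // join() makes tid visible to this thread
    return tid;
}
```

Logging the TID once at thread start is usually enough to pick the right thread for `perf record -t`.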
10.8 Performance Optimization Checklist (Summary)
Flowchart (summary):

Start → 1. Measure the baseline (perf stat / flame graph) → Where is the bottleneck?
- Lock waits → reduce lock granularity / reader-writer locks / lock sharding
- False sharing → isolate variables with alignas(64)
- Too many context switches → fewer threads / thread pool
- Large serial fraction → restructure the algorithm / parallel STL

Then → 2. Re-measure to verify → Did performance improve?
- Yes → ✅ Record the optimization
- No → go back to "Where is the bottleneck?"
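For the False Sharing branch, the sketch below shows "isolate variables with alignas(64)". Each thread increments only its own counter. Padding each counter to its own cache line keeps the threads from invalidating each other's line. The 64-byte line size is an assumption (typical on x86-64). C++17's `std::hardware_destructive_interference_size` is the portable spelling where the standard library provides it. The type and function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical: counters packed next to each other. Several share one
// 64-byte line, so writes from different threads bounce it between cores.
struct PackedCounter {
    std::atomic<long> value{0};
};

// Hypothetical: each counter starts on its own (assumed 64-byte) line.
struct alignas(64) PaddedCounter {
    std::atomic<long> value{0};
};

static_assert(alignof(PaddedCounter) == 64, "one counter per cache line");
static_assert(sizeof(PaddedCounter) == 64, "padded to a full line");

// Each thread increments only its own slot; returns the sum of all slots.
template <typename Counter>
long count_per_thread(int num_threads, long iterations) {
    assert(num_threads > 0);
    std::vector<Counter> slots(num_threads);  // over-aligned new is C++17
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&slots, t, iterations] {
            for (long i = 0; i < iterations; ++i)
                slots[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : threads) th.join();  // join() orders the final reads
    long sum = 0;
    for (const auto& c : slots) sum += c.value.load(std::memory_order_relaxed);
    return sum;
}
```

Both variants return the same sum. Timing them, or running each under `perf stat -e cache-misses`, should show the padded variant scaling better. How much better depends on the CPU.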
10.9 Summary
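| Topic | Target level | Key takeaway |
|---|---|---|
| Amdahl's law | Master | The serial fraction caps the speedup: 5% serial → at most 20x |
| Gustafson's law | Understand | When the problem size grows with the core count, near-linear (scaled) speedup is achievable |
| Scalability measurement | Master | Efficiency = speedup ÷ thread count; find the point where efficiency starts to drop |
| perf lock | Master | perf lock record + perf lock report locate kernel lock contention; user-space mutex contention shows up as futex time |
| Cache-line alignment | Master | alignas(64) isolates hot data; separate hot and cold fields; reorder members |
| Lock granularity | Master | Global lock → reader-writer lock → lock sharding; keys in different shards proceed in parallel |
| Relaxing memory order | Master | seq_cst → acq_rel → relaxed; the gain is architecture-dependent (on x86, atomic read-modify-write operations cost about the same under any order), so relax only where correctness allows |
| Thread-count tuning | Master | CPU-bound = core count; IO-bound = core count × (1 + wait/compute) as a starting point |
| Flame graphs | Master | perf record -g + FlameGraph; width = CPU time; pinpoint hot spots |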
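Three formulas from the table can be written as small helpers with illustrative names. Gustafson's law is shown in its scaled-speedup form, S = (1 - P) + P × N, where P is the parallel fraction measured on the N-core run. The thread-count rule is the common heuristic N_threads = N_cores × (1 + wait/compute). Treat it as a starting point to confirm by measurement, not a guarantee.

```cpp
#include <cassert>
#include <cmath>

// Illustrative helper. Gustafson's law (scaled speedup): S = (1 - P) + P * N,
// where P is the parallel fraction of the run on N processors.
double gustafson_speedup(double parallel_fraction, int num_cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0 && num_cores > 0);
    return (1.0 - parallel_fraction) + parallel_fraction * num_cores;
}

// Illustrative helper: parallel efficiency = speedup / thread count (1.0 = perfect).
double parallel_efficiency(double speedup, int num_threads) {
    assert(num_threads > 0);
    return speedup / num_threads;
}

// Illustrative helper: cores for CPU-bound work (wait_time = 0),
// cores * (1 + wait/compute) for IO-bound work.
int suggested_threads(int num_cores, double wait_time, double compute_time) {
    assert(num_cores > 0 && wait_time >= 0.0 && compute_time > 0.0);
    return static_cast<int>(std::lround(num_cores * (1.0 + wait_time / compute_time)));
}
```

With P = 0.95 and N = 8, Gustafson gives 7.65, while Amdahl's formula from 10.1.1 gives about 5.93. The two laws answer different questions. Amdahl assumes a fixed problem size; Gustafson lets the problem grow with N.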
Series Wrap-up
🎉 This completes all 10 articles of the C++17 multithreading series!
| Part | Topic | Difficulty |
|---|---|---|
| Part 1 | Thread basics: std::thread | ★☆☆☆☆ |
| Part 2 | Shared data and synchronization: mutex/cv | ★★☆☆☆ |
| Part 3 | Atomic operations and the memory model | ★★★☆☆ |
| Part 4 | Asynchronous programming: future/promise/async | ★★★☆☆ |
| Part 5 | C++17 parallel algorithms | ★★☆☆☆ |
| Part 6 | Advanced synchronization: shared_mutex/scoped_lock | ★★★☆☆ |
| Part 7 | Thread pool: implemented from scratch | ★★★★☆ |
| Part 8 | Concurrency patterns: Producer-Consumer and more | ★★★★☆ |
| Part 9 | Debugging and troubleshooting: TSAN/Helgrind/GDB | ★★★★☆ |
| Part 10 | Performance optimization: from measurement to tuning | ★★★★★ |
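Recommended Tools
- perf top -p <pid>: live view of hot functions
- perf stat -e cycles,instructions,cache-misses: event counts
- perf record -F 99 -g -a -- sleep 30: system-wide sampling for flame graphs
- FlameGraph: scripts that turn perf output into flame graphs
- pahole: inspect struct memory layout (holes and padding)
- Intel VTune / AMD uProf: more powerful profilers, typically used through a GUI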
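`pahole` reports the holes (padding) that the compiler inserts between struct members. The same effect can be seen directly with `sizeof` and `offsetof`. The sketch below assumes a typical 64-bit ABI (e.g. x86-64 or AArch64 Linux) where `double` is 8-byte aligned. The struct names are hypothetical. Running `pahole` on a binary built with `-g` should list the same holes for `Unordered`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct, members in declaration order. Assuming 8-byte
// alignment for double, 7 bytes of padding follow each char.
struct Unordered {
    char   flag;     // offset 0, then a 7-byte hole
    double balance;  // offset 8
    char   state;    // offset 16, then 7 bytes of tail padding
};

// Hypothetical struct with the same members, largest first: the two chars
// share the tail, so only 6 bytes of padding remain.
struct Reordered {
    double balance;  // offset 0
    char   flag;     // offset 8
    char   state;    // offset 9, then 6 bytes of tail padding
};

// Bytes actually occupied by the members (identical for both structs).
constexpr std::size_t member_bytes = sizeof(char) + sizeof(double) + sizeof(char);
```

Reordering members from largest to smallest alignment is the simplest way to remove such holes. Keeping fields that are accessed together next to each other also helps cache-line locality.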