Mandelbrot集合的多线程并行计算加速

最近研究并行计算，找了CS149的网课看看，顺便做做作业QAQ

代码下载：Mandelbrot集合的多线程并行计算加速代码文件

Mandelbrot集合的多线程并行计算加速
- [1、Mandelbrot 集合介绍](#1、Mandelbrot 集合介绍)
- - [1.1 Mandelbrot 图像的产生过程（计算视角）](#1.1 Mandelbrot 图像的产生过程（计算视角）)
  - - [1️⃣ 选定计算区域（复平面上的矩形）](#1️⃣ 选定计算区域（复平面上的矩形）)
    - [2️⃣ 建立像素网格（离散化）](#2️⃣ 建立像素网格（离散化）)
    - [3️⃣ 像素 → 复数点的映射](#3️⃣ 像素 → 复数点的映射)
    - [4️⃣ 对每个点做 Mandelbrot 迭代](#4️⃣ 对每个点做 Mandelbrot 迭代)
    - [5️⃣ 发散判定（阈值判断）](#5️⃣ 发散判定（阈值判断）)
    - [6️⃣ 最大迭代次数限制（防止无限循环）](#6️⃣ 最大迭代次数限制（防止无限循环）)
    - [7️⃣ 迭代次数 → 像素值](#7️⃣ 迭代次数 → 像素值)
- 2、Mandelbrot图并行计算
- - [2.1、连续块划分（block / chunk）](#2.1、连续块划分（block / chunk）)
  - - 特点
    - 代码
  - [2.2、交错划分（cyclic / interleaved）](#2.2、交错划分（cyclic / interleaved）)
  - - 示意图
    - 代码
- 2.3、性能对比

1、Mandelbrot 集合介绍

对复数z = x + yi：
z n + 1 = z n 2 + c , z 0 = 0 z_{n+1} = z_n^2 + c,\quad z_0 = 0 zn+1=zn2+c,z0=0

如果在不断迭代后 |z| 超过某个阈值（通常是 2），就认为这个点 不属于 Mandelbrot 集。

好，这里我按**"图是怎么一步一步算出来的"来讲，用流程化 + 偏实现视角**的描述，正好对应你代码里在做的事（而不是纯数学定义）。

1.1 Mandelbrot 图像的产生过程（计算视角）

1️⃣ 选定计算区域（复平面上的矩形）

首先，在复平面中选一个有限的矩形区域作为"观察窗口"：

复制代码

x ∈ [x0, x1]
y ∈ [y0, y1]

这个矩形决定了你最终看到的是 Mandelbrot 集的哪一块（缩放、平移本质上就是在改这个矩形）。

2️⃣ 建立像素网格（离散化）

接着，指定输出图像大小：

复制代码

width  ×  height

这一步的含义是：

把连续的复平面矩形 离散成一个规则网格，每个网格点对应一个像素。

由此得到步长：

复制代码

dx = (x1 - x0) / width
dy = (y1 - y0) / height

3️⃣ 像素 → 复数点的映射

对图像中的每一个像素 (i, j)：

复制代码

x = x0 + i * dx
y = y0 + j * dy
c = x + yi

也就是说：
每个像素都代表复平面中的一个复数 c。

4️⃣ 对每个点做 Mandelbrot 迭代

对当前复数点 c，执行如下迭代：

复制代码

z0 = 0
zn+1 = zn² + c

这是一个局部、互不依赖的计算过程，因此非常适合并行。

5️⃣ 发散判定（阈值判断）

在迭代过程中，每一步都会检查：

复制代码

|zn| > threshold ?

通常 threshold = 2
一旦超过阈值，就认为该点 已经发散
记录此时的迭代次数 n

6️⃣ 最大迭代次数限制（防止无限循环）

为了避免无限计算，引入最大迭代次数，例如：

复制代码

maxIterations = 255

规则是：

如果在 255 次迭代内发散
→ 返回实际发散的迭代次数
如果 迭代 255 次仍未发散
→ 停止迭代，认为该点"属于 Mandelbrot 集内部或边界附近"

7️⃣ 迭代次数 → 像素值

最终：

复制代码

output[j * width + i] = iteration_count

这个值通常被用于：

直接作为灰度值
或作为调色函数的输入（生成彩色分形图）

2、Mandelbrot图并行计算

2.1、连续块划分（block / chunk）

每个线程拿一整块连续的行

复制代码

行号:   0  1  2  3  4  5  6  7  8  9 10 11
        ───────────────────────────────
线程0:  █  █  █  █
线程1:              █  █  █  █
线程2:                          █  █  █  █

等价于：

thread 0 → 行 0,1,2,3
thread 1 → 行 4,5,6,7
thread 2 → 行 8,9,10,11

特点

✅ cache 友好

✅ 实现简单

❌ 如果不同区域算得"慢/快"不均，会负载不平衡

代码

cpp 复制代码

void workerThreadStartBlock(WorkerArgs * const args) {

    int tid = args->threadId;
    int T   = args->numThreads;
    int H   = (int)args->height;

    int rowsPerThread = H / T;
    int remainder     = H % T;

    int startRow = tid * rowsPerThread + (tid < remainder ? tid : remainder);
    int numRows  = rowsPerThread + (tid < remainder ? 1 : 0);

    if (numRows <= 0) return;

    mandelbrotSerial(
        args->x0, args->y0, args->x1, args->y1,
        (int)args->width, (int)args->height,
        startRow, numRows,
        args->maxIterations,
        args->output
    );
}

2.2、交错划分（cyclic / interleaved）

行号按线程号取模

规则：

复制代码

行 j 交给线程 (j % T)

示意图

复制代码

行号:   0  1  2  3  4  5  6  7  8  9 10 11
        ───────────────────────────────
线程0:  █        █        █        █
线程1:     █        █        █        █
线程2:        █        █        █        █

等价于：

thread 0 → 行 0, 3, 6, 9
thread 1 → 行 1, 4, 7, 10
thread 2 → 行 2, 5, 8, 11

这就是你看到的那句话：

线程 t 算 t, t+T, t+2T... 行

代码

cpp 复制代码

void workerThreadStartCyclic(WorkerArgs * const args) {

    int tid = args->threadId;
    int T   = args->numThreads;
    int H   = (int)args->height;

    // Interleaved partitioning: Thread tid is responsible for row tid, tid+T, tid+2T, ...
    for (int j = tid; j < H; j += T) {
        mandelbrotSerial(
            args->x0, args->y0, args->x1, args->y1,
            (int)args->width, (int)args->height,
            j,                 // startRow
            1,                 // numRows
            args->maxIterations,
            args->output
        );
    }
}

2.3、性能对比

运行代码结果如下：

bash 复制代码

[mandelbrot serial]:            [692.776] ms
Wrote image file mandelbrot-serial.ppm
[mandelbrot thread block]:              [92.090] ms
Wrote image file mandelbrot-thread.ppm
[mandelbrot thread cyclic]:             [49.905] ms
                                (7.52x speedup: serial vs thread block, 16 threads)
                                (13.88x speedup: serial vs thread cyclic, 16 threads)

实际上交错划分性能更好：

简单来说，Mandelbrot 图像中，不同位置的点发散速度不同，导致不同图像行的计算量不均匀 。

在 block（连续块）划分 中，复杂区域的多行可能集中分配给某一个线程，使该线程成为瓶颈，其他线程空闲等待，从而降低整体性能。

而 cyclic（交错）划分将不同行均匀分配给各个线程，把计算量较大的区域打散到多个线程中，显著改善负载均衡，因此整体执行时间更短、加速比更高。

Mandelbrot集合的多线程并行计算加速