General-Purpose RANSAC Algorithms: Post-Optimization Summary 2

Rectangle Modeling 2

```python
import numpy as np

# SOFT_TEMPERATURE, _rectangle_distances, _candidate_scores and
# cloud_polyline_2d_distances are defined elsewhere in the same module (not shown).


def estimate_rectangle_2d(
    sample_points: np.ndarray,
    eval_points: np.ndarray,
    distance_threshold: float,
    batch_size: int,
    soft_threshold: bool = True,
    temperature: float = SOFT_TEMPERATURE,
    chunk: int = 256,
    rng: np.random.Generator | None = None,
) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """RANSAC estimation of a 2D rectangle: build a complete rectangle per
    trial and score all four finite edges jointly.

    Unlike the greedy two-stage approach (fit one pair of parallel edges, then the
    perpendicular pair), each candidate here samples 5 points to build a complete
    rectangle, scored as a whole against its 4 finite edges (closed polyline).
    Because scoring treats the rectangle as a unit with finite edges, it cannot
    collapse into an "end-cap strip". For ring-like point sets (outer frame +
    inner hole) the outer frame wins on perimeter point count, leaving the hole
    for subsequent primitives.

    Each trial samples 5 points:
        p0, p1 -> main direction d = normalize(p1 - p0)
        p2     -> the other edge parallel to d (normal offset)
        p3, p4 -> the two edges in the perpendicular direction (along n)

    Parameters
    ----------
    sample_points : (N, 2) array     sampling pool (N >= 5)
    eval_points : (M, 2) array       scoring cloud
    distance_threshold : float       inlier distance t (>= 0)
    batch_size : int                 total number of candidate rectangles
    soft_threshold : bool            soft scoring if True, hard inlier count otherwise
    temperature : float              softness of the soft score
    chunk : int                      candidates scored per vectorised block (memory ~ M*chunk)
    rng : np.random.Generator | None random source; default_rng() when None

    Returns
    -------
    vertices : (4, 2) array          the rectangle's 4 vertices (closed-loop order)
    inlier_mask : (M,) bool
    outlier_mask : (M,) bool
    """
    n_sample = len(sample_points)
    if n_sample < 5:
        raise ValueError(
            f"estimate_rectangle_2d: need >= 5 points, got {n_sample}")
    if distance_threshold < 0.0:
        raise ValueError(f"distance_threshold must be >= 0, got {distance_threshold}")
    if batch_size < 1:
        raise ValueError(f"batch_size must be >= 1, got {batch_size}")
    if chunk < 1:
        raise ValueError(f"chunk must be >= 1, got {chunk}")

    if rng is None:
        rng = np.random.default_rng()
    P = np.asarray(sample_points, dtype=np.float64)
    X = np.asarray(eval_points, dtype=np.float64)
    t = distance_threshold

    best_score = -np.inf
    best_verts = None
    for s0 in range(0, batch_size, chunk):
        b = min(s0 + chunk, batch_size) - s0               # size of this batch
        idx = rng.integers(0, n_sample, size=(b, 5))       # (b, 5), drawn with replacement
        p0, p1, p2, p3, p4 = (P[idx[:, k]] for k in range(5))

        # Main direction d and its perpendicular n, one pair per candidate.
        d = p1 - p0
        d_len = np.linalg.norm(d, axis=1)                  # (b,)
        valid = d_len > 1e-9
        d = d / np.where(valid, d_len, 1.0)[:, None]
        n = np.stack([-d[:, 1], d[:, 0]], axis=1)

        # Edge positions in the local (d, n) frame.
        off_n0 = np.einsum('ij,ij->i', p0, n)
        off_n1 = np.einsum('ij,ij->i', p2, n)
        off_d0 = np.einsum('ij,ij->i', p3, d)
        off_d1 = np.einsum('ij,ij->i', p4, d)
        a = np.minimum(off_n0, off_n1)                     # lower bound along n
        bb = np.maximum(off_n0, off_n1)                    # upper bound along n
        c = np.minimum(off_d0, off_d1)                     # lower bound along d
        e = np.maximum(off_d0, off_d1)                     # upper bound along d

        # Reject rectangles with a side shorter than 2t.
        valid = valid & ((bb - a) >= 2.0 * t) & ((e - c) >= 2.0 * t)

        verts = np.stack([
            c[:, None] * d + a[:, None] * n,
            e[:, None] * d + a[:, None] * n,
            e[:, None] * d + bb[:, None] * n,
            c[:, None] * d + bb[:, None] * n,
        ], axis=1)                                         # (b, 4, 2)

        resid = _rectangle_distances(X, d, n, a, bb, c, e)             # (M, b)
        scores = _candidate_scores(resid, t, soft_threshold, temperature)  # (b,)
        scores = np.where(valid, scores, -np.inf)
        j = int(np.argmax(scores))
        if scores[j] > best_score:
            best_score = float(scores[j])
            best_verts = verts[j].copy()

    if best_verts is None or not np.isfinite(best_score):
        raise RuntimeError("estimate_rectangle_2d: no valid rectangle found")

    # Final inliers always use a hard threshold on the finite boundary distance.
    in_mask = cloud_polyline_2d_distances(X, best_verts, closed=True) <= t
    return best_verts, in_mask, ~in_mask
```

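The core of estimate_rectangle_2d(...) is RANSAC: generate many random candidate rectangles, score each one, and keep the rectangle that best matches the outline of the point cloud.

I. Overall flow

```text
sample_points
    ↓ draw 5 random points
build one candidate rectangle
    ↓
compute the distance from every evaluation point to that finite rectangle boundary
    ↓
soft or hard scoring; keep the best rectangle of the current batch
    ↓
compare across all batches to get the global best rectangle
    ↓
hard threshold decides the final inliers
    ↓
return the rectangle vertices, the inliers, and the remaining points
```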

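II. What each of the 5 points does

Each candidate rectangle draws five random points:

```text
p0, p1, p2, p3, p4
```

They are not the four corners of the rectangle. They define its orientation and the positions of its four edges:

```text
p0, p1: fix the main direction d (the first edge parallel to d passes through p0)
p2:     fixes the position of the other edge parallel to d
p3, p4: fix the positions of the two edges parallel to n (the left and right edges)
```

where:

```text
d: the rectangle's main (horizontal) direction
n: the direction perpendicular to d
```

In local coordinates the rectangle is written as:

```text
c <= u <= e
a <= w <= bb

u = x · d
w = x · n
```

That is:

```text
c, e:  left and right bounds along d
a, bb: lower and upper bounds along n
```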
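To make the role of the five points concrete, here is a minimal non-batched sketch of the same construction. The function name `rectangle_from_five_points` and the sample coordinates are made up for illustration; the arithmetic mirrors the batched code in estimate_rectangle_2d, and the sketch assumes p0 and p1 do not coincide.

```python
import numpy as np


def rectangle_from_five_points(p0, p1, p2, p3, p4):
    """Build one rectangle (4 vertices) from five 2D points (illustrative helper)."""
    p0, p1, p2, p3, p4 = (np.asarray(p, dtype=np.float64) for p in (p0, p1, p2, p3, p4))
    d = p1 - p0
    d = d / np.linalg.norm(d)            # main direction; assumes p0 != p1
    n = np.array([-d[1], d[0]])          # d rotated by +90 degrees
    a, bb = sorted((p0 @ n, p2 @ n))     # lower / upper bound along n
    c, e = sorted((p3 @ d, p4 @ d))      # lower / upper bound along d
    # Same vertex order as in estimate_rectangle_2d (closed loop).
    return np.array([c * d + a * n, e * d + a * n, e * d + bb * n, c * d + bb * n])


verts = rectangle_from_five_points([0, 0], [4, 0], [7, 6], [0, 3], [10, 2])
print(verts)   # corners (0, 0), (10, 0), (10, 6), (0, 6)
```

Note that none of the five input points has to be a corner: p2 only contributes its offset along n, and p3 and p4 only contribute their offsets along d.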

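III. The key data shapes

Suppose:

```text
sample_points.shape = (1000, 2)
eval_points.shape   = (800, 2)
chunk = 256
```

Then a batch usually has:

```text
b = 256
```

The last batch is smaller when batch_size is not a multiple of chunk.

| Variable | Shape | Meaning |
| --- | --- | --- |
| `sample_points` / `P` | `(N, 2)` | N 2D points used for random sampling |
| `eval_points` / `X` | `(M, 2)` | M 2D points used for scoring |
| `idx` | `(b, 5)` | b candidate rectangles, 5 point indices drawn per candidate |
| `p0 ~ p4` | `(b, 2)` | one 2D point per candidate rectangle |
| `d` | `(b, 2)` | unit main-direction vector of each candidate |
| `n` | `(b, 2)` | unit perpendicular vector of each candidate |
| `d_len` | `(b,)` | length of `p0 → p1` for each candidate |
| `valid` | `(b,)` | whether each candidate is valid |
| `a, bb, c, e` | `(b,)` | positions of the four edges of each candidate |
| `verts` | `(b, 4, 2)` | b rectangles, each with 4 2D vertices |
| `resid` | `(M, b)` | distances from M points to b candidate rectangles |
| `scores` | `(b,)` | one total score per candidate rectangle |
| `best_verts` | `(4, 2)` | the 4 vertices of the final best rectangle |
| `in_mask` | `(M,)` | whether each of the M evaluation points belongs to the final rectangle |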

IV. Intuitive examples of the shapes

1. `(N, 2)`: a set of 2D points

```python
P = [
    [10.0, 20.0],
    [15.0, 20.0],
    [10.0, 30.0],
    [15.0, 30.0],
]

# shape = (4, 2)
```

4 points;
each point has two coordinates [x, y].

2. `(b, 5)`: 5 points drawn per candidate

Suppose the current batch has only 3 candidate rectangles:

```python
idx = [
    [12, 88, 305, 7, 900],
    [41, 42, 111, 560, 8],
    [99, 300, 2, 701, 450],
]

# shape = (3, 5)
```

Meaning:

```text
candidate 0 uses P[12], P[88], P[305], P[7], P[900]
candidate 1 uses P[41], P[42], P[111], P[560], P[8]
candidate 2 uses P[99], P[300], P[2], P[701], P[450]
```

3. `(b, 2)`: one point or vector per candidate

```text
p0.shape = (256, 2)
d.shape  = (256, 2)
n.shape  = (256, 2)
```

For example:

```python
d = [
    [1.0, 0.0],
    [0.6, 0.8],
    [-0.2, 0.98],
]
```

Each row is the direction of one candidate rectangle.

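4. `(b,)`: one number per candidate

```python
scores = [
    120.5,
    98.3,
    201.7,
]

# scores.shape = (3,)
```

This means:

```text
candidate 0 score: 120.5
candidate 1 score: 98.3
candidate 2 score: 201.7
```

The candidate rectangle with the highest score wins (candidate 2 here).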

5. `(b, 4, 2)`: 4 2D vertices per candidate

```python
verts[0] = [
    [0, 0],
    [10, 0],
    [10, 6],
    [0, 6],
]

# verts.shape = (b, 4, 2)
```

b candidates;
4 corners per candidate;
each corner has two numbers, x and y.

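6. `(M, b)`: the distance table used for scoring

```text
resid.shape = (800, 256)
```

This means:

```text
800 evaluation points;
256 candidate rectangles;
every point gets its own distance to every candidate rectangle.
```

For example:

```python
resid[10, 3]
```

is the distance from evaluation point 10 to candidate rectangle 3 (both indices are zero-based).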

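V. Why the distance is a "finite rectangle boundary distance"

The function does not simply compute the distance from a point to four infinite lines. It computes the distance to the nearest point on the actual rectangle outline.

For example, take a point that lies close to the extension of the top edge but far to the right of the rectangle:

```text
distance to the infinite line:   may be very small
distance to the finite boundary: larger, because the nearest boundary point is the top-right corner
```

This keeps the algorithm from mistaking a short segment, or the extension of an edge, for a complete rectangle boundary.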
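The source does not show the body of `_rectangle_distances`, so the following is only a sketch of one way such a finite boundary distance could be computed in the local (d, n) frame. The name `rectangle_boundary_distances` and the test points are hypothetical; the inputs follow the shapes used in estimate_rectangle_2d.

```python
import numpy as np


def rectangle_boundary_distances(X, d, n, a, bb, c, e):
    """Distance from M points to the outline of b rectangles; returns (M, b).

    Hypothetical stand-in for _rectangle_distances, not the original code.
    X: (M, 2); d, n: (b, 2) unit vectors; a, bb, c, e: (b,) edge positions.
    """
    u = X @ d.T                                       # (M, b) coordinate along d
    w = X @ n.T                                       # (M, b) coordinate along n
    du = np.maximum(np.maximum(c - u, u - e), 0.0)    # overshoot outside [c, e]
    dw = np.maximum(np.maximum(a - w, w - bb), 0.0)   # overshoot outside [a, bb]
    outside = np.hypot(du, dw)                        # nearest edge or corner
    # For points inside the rectangle the nearest edge is the closest of the four.
    inside = np.minimum(np.minimum(u - c, e - u), np.minimum(w - a, bb - w))
    return np.where((du == 0.0) & (dw == 0.0), inside, outside)


# One axis-aligned rectangle spanning [0, 10] x [0, 6].
d = np.array([[1.0, 0.0]])
n = np.array([[0.0, 1.0]])
a, bb, c, e = (np.array([v]) for v in (0.0, 6.0, 0.0, 10.0))
X = np.array([
    [13.0, 6.1],   # near the top edge's extension, but 3 units right of the rectangle
    [5.0, 5.0],    # inside, 1 unit below the top edge
    [5.0, 6.1],    # just above the top edge
])
dist = rectangle_boundary_distances(X, d, n, a, bb, c, e)
print(dist[:, 0])
```

The first point is only 0.1 away from the infinite line through the top edge, yet about 3.0 away from the finite outline, which is the behaviour described above.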

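VI. Scoring versus final inliers

Candidate stage: soft scoring may be used

```python
scores = _candidate_scores(...)
```

Properties of soft scoring:

```text
the closer a point is, the more it contributes;
the farther away it is, the closer its contribution gets to 0.
```

This suits comparing, across many candidates, which one "fits better overall".

Final output stage: always a hard threshold

```python
in_mask = distance <= threshold
```

```text
distance <= t: the point belongs to this rectangle
distance >  t: the point does not belong to this rectangle
```

The caller then uses in_mask to remove the points assigned to the rectangle from remaining_2d, leaving the rest for other primitives such as circle and line.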
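The exact formula inside `_candidate_scores` is not shown in the source. As an illustration only, the sketch below assumes a logistic falloff around the threshold for the soft score and a plain inlier count for the hard score. The function name, the default `temperature=0.25`, and the scaling by `t` are all assumptions.

```python
import numpy as np


def candidate_scores_sketch(resid, t, soft=True, temperature=0.25):
    """One score per candidate (column) from an (M, b) distance table.

    Hypothetical stand-in for _candidate_scores; the real formula may differ.
    """
    if soft:
        # Logistic weight 1 / (1 + exp(z)), written with tanh to avoid overflow.
        z = (resid - t) / max(temperature * t, 1e-12)
        weights = 0.5 * (1.0 - np.tanh(0.5 * z))
    else:
        weights = (resid <= t).astype(np.float64)     # hard inlier count
    return weights.sum(axis=0)


# 3 points (rows) against 2 candidates (columns), threshold t = 1.
resid = np.array([
    [0.00, 0.9],
    [0.05, 1.1],
    [5.00, 5.0],
])
hard = candidate_scores_sketch(resid, t=1.0, soft=False)
soft = candidate_scores_sketch(resid, t=1.0, soft=True)
print(hard, soft)
```

Under hard scoring a point at distance 0.9 counts exactly as much as a point at distance 0.0. The soft score separates the two, which is why it is useful for ranking candidates while the final inlier mask still uses the hard rule.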

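VII. Summary in brief

```text
(N, 2): N 2D points

(b, 5): b candidate rectangles, 5 random points drawn per candidate

(b, 2): b candidates, one 2D point or 2D direction per candidate

(b,): b candidates, one scalar per candidate

(b, 4, 2): b candidate rectangles, 4 2D vertices per rectangle

(M, b): table of distances from M evaluation points to b candidate rectangles

(M,): whether each of the M evaluation points belongs to the winning rectangle
```

In essence the whole function does this:

```text
5 points generate a candidate rectangle
→ finite boundary distance evaluates the candidate
→ scoring selects the best rectangle
→ hard threshold fixes the final inliers
→ inliers are removed, the remaining points go on to other primitives
```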
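The last step, removing the inliers so that other primitives can use the rest, is plain boolean indexing. The mask below is a hand-written stand-in for the `in_mask` that estimate_rectangle_2d would return; the point coordinates are made up.

```python
import numpy as np

remaining_2d = np.array([
    [0.0, 0.0], [10.0, 0.0], [10.0, 6.0], [5.0, 3.0], [0.0, 6.0],
])
# Stand-in for the returned in_mask: True means "on the rectangle outline".
in_mask = np.array([True, True, True, False, True])

rect_points = remaining_2d[in_mask]      # points assigned to the rectangle
remaining_2d = remaining_2d[~in_mask]    # points left for circle, line, ...
print(remaining_2d)                      # [[5. 3.]]
```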

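Code execution order and key shapes at a glance

```python
P = np.asarray(sample_points, dtype=np.float64)   # (N, 2)
X = np.asarray(eval_points, dtype=np.float64)     # (M, 2)
```

P: the sampling pool, N 2D points
X: the scoring set, M 2D points
A 2D point is [x, y].

```python
for s0 in range(0, batch_size, chunk):
    b = min(s0 + chunk, batch_size) - s0
```

The candidate rectangles are not generated all at once; they are processed in batches.

b: the number of candidate rectangles in the current batch
Usually b = chunk, for example 256.

```python
idx = rng.integers(0, n_sample, size=(b, 5))
```

idx.shape = (b, 5)

b candidate rectangles;
each candidate draws 5 random point indices.

For example:

```python
idx = [
    [12, 88, 305, 7, 900],   # candidate 0
    [41, 42, 111, 560, 8],   # candidate 1
]
```

```python
p0, p1, p2, p3, p4 = (P[idx[:, k]] for k in range(5))
```

p0 ~ p4 all have shape (b, 2).

Each row is one 2D point belonging to one candidate rectangle.

For example:

```text
p0[0]: the first point of candidate 0
p1[0]: the second point of candidate 0
```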
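```python
d = p1 - p0
d_len = np.linalg.norm(d, axis=1)
valid = d_len > 1e-9                              # p0 and p1 must not coincide
d = d / np.where(valid, d_len, 1.0)[:, None]      # normalise without dividing by zero

n = np.stack([-d[:, 1], d[:, 0]], axis=1)
```

```text
d.shape = (b, 2): main direction of each candidate
n.shape = (b, 2): perpendicular direction of each candidate
d_len.shape = (b,): length of p1 - p0 for each candidate
```

Purpose:

```text
p0 and p1 decide which way the rectangle is oriented;
n is the direction perpendicular to d.
```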

```python
off_n0 = np.einsum('ij,ij->i', p0, n)
off_n1 = np.einsum('ij,ij->i', p2, n)

off_d0 = np.einsum('ij,ij->i', p3, d)
off_d1 = np.einsum('ij,ij->i', p4, d)
```

Each result has shape (b,): a row-wise dot product.

Purpose:

```text
p0 and p2 fix the two bounds along n (the edges parallel to d);
p3 and p4 fix the two bounds along d (the edges parallel to n).
```

```python
a = np.minimum(off_n0, off_n1)
bb = np.maximum(off_n0, off_n1)

c = np.minimum(off_d0, off_d1)
e = np.maximum(off_d0, off_d1)
```

a, bb, c, e all have shape (b,).

Every candidate rectangle satisfies:

```text
c <= x·d <= e
a <= x·n <= bb
```

In other words, its four edges are now fixed.

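```python
valid = valid & ((bb - a) >= 2.0 * t) & ((e - c) >= 2.0 * t)
```

valid.shape = (b,)

This filters out rectangles that are too small or degenerate (a side shorter than 2t), as well as candidates whose direction is unreliable (p0 and p1 nearly coincide).

```python
verts = np.stack([
    c[:, None] * d + a[:, None] * n,
    e[:, None] * d + a[:, None] * n,
    e[:, None] * d + bb[:, None] * n,
    c[:, None] * d + bb[:, None] * n,
], axis=1)
```

verts.shape = (b, 4, 2)

Meaning:

```text
b candidate rectangles;
4 corners per rectangle;
each corner is [x, y].
```

For example:

```python
verts[0] = [
    [0, 0],
    [10, 0],
    [10, 6],
    [0, 6],
]
```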

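```python
resid = _rectangle_distances(X, d, n, a, bb, c, e)
```

resid.shape = (M, b)

Meaning:

```text
M evaluation points;
for each one, the distance to the finite boundary of each of the b candidate rectangles.
```

resid[i, j] is the distance from point i to candidate rectangle j.

```python
scores = _candidate_scores(resid, t, soft_threshold, temperature)
scores = np.where(valid, scores, -np.inf)
```

scores.shape = (b,)

Each candidate rectangle ends up with one score.

```text
soft scoring: closer points contribute more.
hard scoring: the more points with distance <= t, the higher the score.
invalid rectangles are set to -inf and can never be selected.
```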

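```python
j = int(np.argmax(scores))
```

Finds the highest-scoring candidate in the current batch.

```python
best_verts = verts[j].copy()
```

best_verts.shape = (4, 2)

Stores the four vertices of the best rectangle so far. The update only happens when scores[j] beats the best score from earlier batches.

```python
in_mask = cloud_polyline_2d_distances(
    X, best_verts, closed=True
) <= t
```

in_mask.shape = (M,)

This is the final, true hard threshold:

```text
True:  the point is within t of the final rectangle boundary and belongs to the rectangle
False: the point does not belong to the rectangle and is left for later primitives
```

Finally:

```python
return best_verts, in_mask, ~in_mask
```

best_verts: (4, 2), the final rectangle
in_mask: (M,), the rectangle's inliers
~in_mask: (M,), the remaining points

Overall:

```text
P(N,2)
→ draw 5 points: idx(b,5)
→ build b rectangles: verts(b,4,2)
→ score b rectangles against all M points: resid(M,b)
→ pick the highest-scoring rectangle
→ hard threshold splits rectangle points from remaining points
```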

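Plane fitting:

We start with the 3D plane, estimate_plane_3d, because many later primitives build on it. The code handles two cases: a plane with a free normal and a plane with a fixed normal.

1. What it fits

A plane can be written as:

```text
n · x = d

n: the plane's unit normal vector, shape = (3,)
x: a 3D point in space, shape = (3,)
d: the plane's position along the normal, also called the offset, a scalar
```

The distance from point X[i] to the plane is:

```text
|n · X[i] - d|
```

This formula requires n to have unit length. A point whose distance is no greater than the threshold t is an inlier of the plane.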

2. Input data

```python
X = np.asarray(eval_points, dtype=np.float64)  # (M, 3)
```

sample_points: (N, 3), used for random sampling
eval_points / X: (M, 3), used to score candidate planes

For example:

```python
X = [
    [10, 20, 5],
    [12, 18, 5.2],
    [30, 40, 50],
]
```

Each row is one 3D point [x, y, z].
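Applying the distance formula |n · X[i] - d| to the three example points makes the inlier test concrete. The plane z = 5 and the threshold 0.5 are assumed values chosen for illustration.

```python
import numpy as np

X = np.array([
    [10, 20, 5],
    [12, 18, 5.2],
    [30, 40, 50],
], dtype=np.float64)             # (M, 3)

n = np.array([0.0, 0.0, 1.0])    # unit normal (assumed): horizontal plane
d = 5.0                          # offset (assumed): the plane z = 5
t = 0.5                          # distance threshold (assumed)

dist = np.abs(X @ n - d)         # (M,) point-to-plane distances
inlier_mask = dist <= t

print(dist)                      # approximately [0, 0.2, 45]
print(inlier_mask)               # [ True  True False]
```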

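3. Case A: fixed normal

For example, if the plane is known to be roughly horizontal, its normal is approximately:

```python
fixed_normal = [0, 0, 1]
```

Code:

```python
normal = np.asarray(fixed_normal, dtype=np.float64)
normal = normal / np.linalg.norm(normal)

s_eval = X @ normal                  # (M,)
idx = rng.integers(0, n_sample, size=batch_size)
cand_d = sample_points[idx] @ normal # (B,)
```

Meaning:

```text
normal: (3,), one normal shared by all candidates
s_eval: (M,), projection of each evaluation point onto normal
cand_d: (B,), offsets of the B candidate planes (B = batch_size)
```

There is no need to draw 3 random points to define the plane here, because the orientation is already fixed.

Each candidate only draws one random point, which decides how far along the normal the plane sits:

```text
fixed orientation + one offset
= one complete plane
```

For scoring, the distances from all points to the primitive are computed first and then passed to the scoring function, which returns one score per candidate plane. Here s:e is the slice of candidates in the current batch.

```python
db = cand_d[s:e]                              # (b,)
resid = np.abs(s_eval[:, None] - db[None, :]) # (M, b)
scores = _candidate_scores(resid, t, ...)
```

```text
resid[i, j]: distance from evaluation point i to candidate plane j

scores.shape = (b,): one score per candidate plane
```

Finally the offset with the highest score is selected.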
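Put together, the fixed-normal case reduces to a one-dimensional search over offsets. The sketch below runs it on synthetic data, with a simple hard inlier count standing in for `_candidate_scores`. The data, the threshold, and the number of candidates are illustrative assumptions, and all candidates are scored in a single batch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cloud: 200 points near the plane z = 5, plus 50 scattered points.
plane_pts = np.column_stack([
    rng.uniform(0, 10, 200),
    rng.uniform(0, 10, 200),
    5.0 + rng.normal(0.0, 0.01, 200),
])
noise_pts = rng.uniform(0, 10, (50, 3))
X = np.vstack([plane_pts, noise_pts])            # (M, 3)

normal = np.array([0.0, 0.0, 1.0])               # fixed unit normal
t = 0.05                                         # distance threshold (assumed)

s_eval = X @ normal                              # (M,) projections
idx = rng.integers(0, len(X), size=64)           # one sampled point per candidate
cand_d = X[idx] @ normal                         # (B,) candidate offsets

resid = np.abs(s_eval[:, None] - cand_d[None, :])    # (M, B)
scores = (resid <= t).sum(axis=0)                # hard inlier count per candidate
best_d = cand_d[int(np.argmax(scores))]

inlier_mask = np.abs(s_eval - best_d) <= t
print(best_d, int(inlier_mask.sum()))
```

Each candidate costs only one subtraction per evaluation point, so this case is much cheaper than the free-normal case that follows.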

4. Case B: free normal

This is ordinary RANSAC plane fitting.

```python
idx = rng.integers(0, n_sample, size=(batch_size, 3))
```

```text
idx.shape = (B, 3)

B candidate planes;
each candidate draws 3 random points.
```

The reason:

```text
three non-collinear 3D points
determine exactly one plane.
```

Next:

```python
p1 = sample_points[idx[:, 0]]  # (B, 3)

normals, offsets, valid = estimate_plane_batch(
    p1,
    sample_points[idx[:, 1]],
    sample_points[idx[:, 2]],
)
```

Output:

```text
normals.shape = (B, 3)
offsets.shape = (B,)
valid.shape   = (B,)
```

Meaning:

```text
normals[j]: normal vector of candidate plane j
offsets[j]: the offset d of candidate plane j
valid[j]:   whether these 3 points form a proper plane
```

If the three points coincide or are nearly collinear, the normal cannot be computed reliably and the candidate is marked invalid.
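The source does not show the body of `estimate_plane_batch`. A common way to get a plane from three points is the cross product of two edge vectors, sketched below. The function name `plane_from_triples` and the degeneracy tolerance `eps` are assumptions, not the original implementation.

```python
import numpy as np


def plane_from_triples(p1, p2, p3, eps=1e-12):
    """Batched plane through three points each; inputs are (B, 3) arrays.

    Hypothetical stand-in for estimate_plane_batch. Returns unit normals (B, 3),
    offsets (B,) with n · x = offset on the plane, and a validity mask (B,).
    """
    raw = np.cross(p2 - p1, p3 - p1)                 # (B, 3), zero if collinear
    length = np.linalg.norm(raw, axis=1)             # (B,)
    valid = length > eps                             # eps is an assumed tolerance
    normals = raw / np.where(valid, length, 1.0)[:, None]
    offsets = np.einsum('ij,ij->i', normals, p1)     # d = n · p1
    return normals, offsets, valid


p1 = np.array([[0.0, 0.0, 5.0], [0.0, 0.0, 0.0]])
p2 = np.array([[1.0, 0.0, 5.0], [1.0, 1.0, 1.0]])
p3 = np.array([[0.0, 1.0, 5.0], [2.0, 2.0, 2.0]])   # second triple is collinear

normals, offsets, valid = plane_from_triples(p1, p2, p3)
print(normals[0], offsets[0], valid)
```

The first triple gives the plane z = 5 with normal (0, 0, 1); the second triple is collinear, so its cross product vanishes and it is flagged invalid.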

5. Batched point-to-candidate-plane distances

```python
Nb, db, vb = normals[s:e], offsets[s:e], valid[s:e]

proj = X @ Nb.T
resid = np.abs(proj - db[None, :])
```

Shapes:

```text
X.shape  = (M, 3)
Nb.shape = (b, 3)

Nb.T.shape = (3, b)

proj.shape = (M, b)
resid.shape = (M, b)
```

Here proj[i, j] is the projection of evaluation point i onto the normal of candidate plane j.

Then resid[i, j] is the distance from point i to candidate plane j.
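A quick way to convince yourself that the matrix product produces the right table is to compare one entry with the scalar formula. The random points, normals, and offsets below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(800, 3))                        # M = 800 evaluation points
Nb = rng.normal(size=(256, 3))                       # b = 256 candidate normals
Nb /= np.linalg.norm(Nb, axis=1, keepdims=True)      # make them unit length
db = rng.normal(size=256)                            # candidate offsets

proj = X @ Nb.T                                      # (800, 256)
resid = np.abs(proj - db[None, :])                   # (800, 256)

# Entry [i, j] equals the scalar point-to-plane distance for point i and plane j.
i, j = 10, 3
single = abs(Nb[j] @ X[i] - db[j])
print(resid.shape, np.isclose(resid[i, j], single))  # (800, 256) True
```

One matrix product replaces 800 × 256 separate dot products, which is the reason for scoring candidates in batches.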

6. Scoring the candidate planes

```python
scores = _candidate_scores(resid, t, soft_threshold, temperature)
scores = np.where(vb, scores, -np.inf)
```

```text
scores.shape = (b,)
```

The logic is exactly the same as for the rectangle:

```text
soft score:
the closer a point is, the more it contributes.

hard score:
the more points with distance <= t, the higher the score.

invalid planes:
the score is set to -inf, so they are never selected.
```

Then:

```python
j = int(np.argmax(scores))
```

finds the best plane in the current batch.

7. Final hard threshold

```python
final_distances = np.abs(
    X @ best_normal - (best_normal @ best_point)
)

inlier_mask = final_distances <= distance_threshold
```

Shapes:

```text
final_distances.shape = (M,)
inlier_mask.shape     = (M,)

True:  the point belongs to the final plane
False: the point is left for later primitives
```

The function returns:

```python
return best_normal, best_point, inlier_mask, outlier_mask
```

best_normal: (3,), normal of the final plane
best_point: (3,), a point on the final plane
inlier_mask: (M,), the plane's inliers
outlier_mask: (M,), the remaining points

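8. Overall flow

```text
sample_points(N,3)
→ draw 3 points: idx(B,3)
→ normals(B,3) + offsets(B,)
→ distances from M points to B candidate planes: resid(M,B)
→ scores(B,)
→ pick the best normal(3,) and point(3,)
→ hard threshold gives inlier_mask(M,)
```

The biggest difference from the rectangle:

```text
a plane extends infinitely;
a rectangle has a finite boundary.

So a plane only asks "how far is the point from the plane",
while a rectangle must also check that the point is near a real finite edge, not the extension of an edge.
```

The next function to cover is estimate_parallel_planes_3d: instead of fitting a single plane, it jointly fits k parallel planes that share one normal vector.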
