计算机视觉入门到实战系列（十六）基于空间约束的k-means图像分割

基于空间约束的k-means图像分割

- 代码实现

上节中我们实现了基于k-means的图像分割，这节我们实现基于空间约束的k-means图像分割。

代码实现

添加空间信息

这里我们在之前的基础上添加相应的坐标信息。部分函数已经在上一节实现了，这里不再重复赘述。这里解释下为什么要在之前的基础上添加相应的坐标信息。

如果只用RGB颜色特征，聚类结果可能：

将颜色相似但位置很远的像素归为一类
导致分割结果不连续、碎片化

添加坐标信息的好处：

空间连续性：鼓励位置相近的像素聚类在一起
区域一致性：更容易形成连续的区域分割
控制分割粒度 ：通过weight参数可以调整空间信息的重要性
- weight=0：纯颜色聚类
- weight越大：越重视空间位置，分割区域越紧凑

python 复制代码

image = imread('segmentation.jpeg')[:,:,:3]
# 将RGB值统一到0-255内
if np.max(image)>1:
    image = image / 255
    

sp = image.shape
# 增加xy坐标的信息
# 设定一个权重，对坐标信息加权
weight = 2
y = weight * np.array([[i for i in range(sp[1])] 
                       for j in range(sp[0])]) / sp[0] / sp[1]
x = weight * np.array([[j for i in range(sp[1])] 
                       for j in range(sp[0])])/ sp[0] / sp[1]
image = np.append(image, x.reshape(sp[0], sp[1], 1), axis=2)
image = np.append(image, y.reshape(sp[0], sp[1], 1), axis=2)

X = image.reshape(-1, image.shape[2])
segmented_imgs = []

# 将 K 分别设置为6、5、4、3、2
n_colors = (6, 5, 4, 3, 2)
for n_cluster in n_colors:
    kmeans = KMeans(n_clusters=n_cluster, random_state=42).fit(X)
    segmented_img = kmeans.cluster_centers_[kmeans.labels_]
    segmented_imgs.append(decode_segmap(
        kmeans.labels_.reshape(image.shape[0],
                               image.shape[1])).astype(np.uint8))

这段代码是在给图像像素添加空间位置信息，然后用于K-means聚类分割。让我详细解释每个部分的作用：

获取图像形状

python 复制代码

sp = image.shape  # 得到 (高度, 宽度, 通道数)
# 对于RGB图像通常是 (h, w, 3)

创建y坐标特征矩阵

python 复制代码

y = weight * np.array([[i for i in range(sp[1])] 
                       for j in range(sp[0])]) / sp[0] / sp[1]

生成一个与图像同尺寸的矩阵，每个位置的值是该像素的列坐标（从0到宽度-1）
range(sp[1]) 创建列索引 [0, 1, 2, ..., 宽度-1]
外层的 for j in range(sp[0]) 重复这个行，形成一个矩阵
/ sp[0] / sp[1] 将坐标归一化到[0,1]范围（除以总像素数）
weight = 2 给坐标信息一个权重，控制它在聚类中的重要性

实际效果：创建一个矩阵，从左到右值从0线性增加到1

创建x坐标特征矩阵

python 复制代码

x = weight * np.array([[j for i in range(sp[1])] 
                       for j in range(sp[0])]) / sp[0] / sp[1]

生成一个与图像同尺寸的矩阵，每个位置的值是该像素的行坐标（从0到高度-1）
注意这里的循环变量交换了：j 在外层循环（行索引）

实际效果：创建一个矩阵，从上到下值从0线性增加到1

将坐标信息添加到图像特征中

python 复制代码

image = np.append(image, x.reshape(sp[0], sp[1], 1), axis=2)
image = np.append(image, y.reshape(sp[0], sp[1], 1), axis=2)

axis=2 表示在通道维度拼接
原来图像是 (h, w, 3) - RGB三通道
添加x坐标后变成 (h, w, 4)
再添加y坐标后变成 (h, w, 5)

重塑为特征矩阵

python 复制代码

X = image.reshape(-1, image.shape[2])

将 (h, w, 5) 重塑为 (h×w, 5) 的矩阵
每一行代表一个像素的5维特征：[R, G, B, x坐标, y坐标]

举例说明：

假设有一张图，左上角和右下角都有相似的红色：

只用RGB：这两个红色区域会被分到同一类
添加xy坐标：由于位置差异大，会被分到不同类，形成更合理的分割

循环嵌套

python 复制代码

y = weight * np.array([[i for i in range(sp[1])] 
                       for j in range(sp[0])]) / sp[0] / sp[1]

对于上面的循环嵌套，乍一看可能会很懵这里详细解释下。

首先理解sp[0]和sp[1]

python 复制代码

sp = image.shape  # (高度, 宽度, 通道数)
sp[0] = 图像的高度 (行数)
sp[1] = 图像的宽度 (列数)

简化这个表达式
原始的代码可以分解为两步：

第一步：创建y坐标矩阵

python 复制代码

# 创建一个列表推导式
matrix = []
for j in range(sp[0]):  # 遍历所有行
    row = []
    for i in range(sp[1]):  # 遍历所有列
        row.append(i)  # 当前列的位置
    matrix.append(row)

# 等价于
y_matrix = np.array([[i for i in range(sp[1])] for j in range(sp[0])])

举个例子：

如果图像尺寸是 3×4（高3像素，宽4像素）：

python 复制代码

# range(sp[1]) = range(4) = [0, 1, 2, 3]
# 这个值在每一行都重复

y_matrix = [
    [0, 1, 2, 3],  # 第一行
    [0, 1, 2, 3],  # 第二行  
    [0, 1, 2, 3]   # 第三行
]

关键点 ：这个矩阵的列值从左到右递增 ，但是每一行的列值都相同

然后是归一化和加权

python 复制代码

y = weight * y_matrix / sp[0] / sp[1]

实际上是：

python 复制代码

y = weight * y_matrix / (sp[0] * sp[1])

继续上面的例子：

python 复制代码

weight = 2
sp[0] = 3, sp[1] = 4
除数 = 3 × 4 = 12

原始值：[0, 1, 2, 3]
归一化后：[0/12, 1/12, 2/12, 3/12] = [0, 0.0833, 0.1667, 0.25]
加权后：×2 = [0, 0.1667, 0.3333, 0.5]

最终y矩阵：
[
    [0, 0.1667, 0.3333, 0.5],
    [0, 0.1667, 0.3333, 0.5],
    [0, 0.1667, 0.3333, 0.5]
]

展示结果

python 复制代码

# 展示结果
plt.figure(figsize=(12,8))
plt.subplot(231)
plt.imshow(image[:,:,:3])
plt.title('Original image')
plt.axis('off')

for idx,n_clusters in enumerate(n_colors):
    plt.subplot(232+idx)
    plt.imshow(segmented_imgs[idx])
    plt.title('{} center'.format(n_clusters))
    plt.axis('off')

输出