pytorch_grad_cam library study notes: AblationCAM and AblationLayer, the base classes of the Ablation-CAM algorithm

AblationCAM

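File location: ./pytorch-grad-cam/pytorch_grad_cam/ablation_cam.py

AblationCAM is a concrete implementation of BaseCAM that follows the Ablation-CAM algorithm: it systematically "removes" (ablates) each feature channel of the target layer, measures how much the model's prediction score drops, and uses that drop as the channel's importance when building the heatmap. Unlike gradient-based methods such as Grad-CAM, Ablation-CAM is gradient-free.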

1. __init__(self, model, target_layers, reshape_transform=None, ablation_layer=AblationLayer(), batch_size=32, ratio_channels_to_ablate=1.0)


class AblationCAM(BaseCAM):
    def __init__(self,
                 model: torch.nn.Module,
                 target_layers: List[torch.nn.Module],
                 reshape_transform: Callable = None,
                 ablation_layer: torch.nn.Module = AblationLayer(),
                 batch_size: int = 32,
                 ratio_channels_to_ablate: float = 1.0) -> None:

        super(AblationCAM, self).__init__(model,
                                          target_layers,
                                          reshape_transform,
                                          uses_gradients=False)
        self.batch_size = batch_size
        self.ablation_layer = ablation_layer
        self.ratio_channels_to_ablate = ratio_channels_to_ablate

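Function:

Initializes an AblationCAM instance.

Parameters:

model, target_layers, reshape_transform: same as in BaseCAM.

ablation_layer: a torch.nn.Module instance that performs the ablation. If it is not provided, the default AblationLayer() instance from the signature is used. During the ablation forward passes this layer temporarily takes the place of the target layer, so that the feature maps can be modified.

batch_size: how many ablated channels are evaluated in parallel in one forward pass. Batching speeds up the computation because it avoids running one forward pass per channel.

ratio_channels_to_ablate: a float in (0, 1] that controls what fraction of the channels is actually ablated. 1.0 means every channel is ablated. This is an experimental optimization: the activations_to_be_ablated method picks the channels "most worth" ablating, so that time-consuming ablation passes are not spent on unimportant channels.

Key operations:

Calls the parent BaseCAM __init__ with uses_gradients=False, because Ablation-CAM does not need gradients.

Stores batch_size and ablation_layer.

Stores ratio_channels_to_ablate.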
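One Python detail in the signature above may be worth keeping in mind: a default argument such as AblationLayer() is evaluated once, when the function is defined, so every object constructed without an explicit argument receives the same default instance. The sketch below reproduces that behaviour with two hypothetical stand-in classes (StatefulLayer and Explainer are illustrative names, not part of pytorch_grad_cam). Whether the sharing matters in practice depends on how the instances are used.

```python
class StatefulLayer:
    """Hypothetical stand-in for a layer that keeps per-run state."""
    def __init__(self):
        self.indices = None


class Explainer:
    """Hypothetical stand-in for a class whose default argument is an instance."""
    def __init__(self, layer=StatefulLayer()):  # evaluated once, at definition time
        self.layer = layer


a = Explainer()
b = Explainer()
a.layer.indices = [0, 1, 2]

shared = a.layer is b.layer            # both defaults are the same object
leaked = b.layer.indices               # state written through `a` is visible from `b`
c = Explainer(layer=StatefulLayer())   # passing a fresh instance avoids the sharing
isolated = c.layer is not a.layer
```

Passing a freshly constructed layer, as in the last two lines, is one simple way to keep two explainers independent.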

2. save_activation(self, module, input, output)


    def save_activation(self, module, input, output) -> None:
        """ Helper function to save the raw activations from the target layer """
        self.activations = output

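Function:

A simple forward-hook callback that caches the raw activations of the target layer during a forward pass.

Flow:

The function is called once the target layer has finished its forward computation.

It assigns the layer's output (the activations) directly to self.activations.

Purpose:

Before any ablation is run, one complete, unmodified forward pass is performed. Its outputs give the original prediction score (original_score), and the cached activations are reused by the later ablation passes.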

3. assemble_ablation_scores(self, new_scores, original_score, ablated_channels, number_of_channels)


    def assemble_ablation_scores(self,
                                 new_scores: list,
                                 original_score: float,
                                 ablated_channels: np.ndarray,
                                 number_of_channels: int) -> np.ndarray:
        """ Take the value from the channels that were ablated,
            and just set the original score for the channels that were skipped """

        index = 0
        result = []
        sorted_indices = np.argsort(ablated_channels)
        ablated_channels = ablated_channels[sorted_indices]
        new_scores = np.float32(new_scores)[sorted_indices]

        for i in range(number_of_channels):
            if index < len(ablated_channels) and ablated_channels[index] == i:
                weight = new_scores[index]
                index = index + 1
            else:
                weight = original_score
            result.append(weight)

        return result

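Function:

Post-processes the ablation results, in particular when ratio_channels_to_ablate < 1.0, i.e. when not every channel was ablated.

Inputs:

new_scores: a list with the prediction scores of the channels that were actually ablated.

original_score: the prediction score of the original, un-ablated forward pass.

ablated_channels: an array with the indices of the channels that were actually ablated.

number_of_channels: the total number of channels in the target layer.

Flow:

Sort ablated_channels in ascending order and reorder new_scores with the same permutation, so that every score stays paired with its channel.

Create an empty list result.

Iterate over all channels (from 0 to number_of_channels - 1):

If the current channel i is in ablated_channels (it was ablated), take the corresponding weight (the post-ablation score) from new_scores.

Otherwise (the channel was skipped), set weight to original_score.

Append weight to result.

Output:

A list result of length number_of_channels that holds one prediction score per channel (the signature is annotated with np.ndarray, but the code returns a Python list). For ablated channels the score is the measured value; for skipped channels it is filled in with the original score.

Purpose:

Pads an incomplete set of ablation results (only some channels were tested) into a full list with one score per channel, so that weights can be computed for all channels afterwards. Because a skipped channel keeps the original score, its later weight (original - score) / original comes out as 0.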
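The padding step can be tried on a tiny input. The sketch below is a vectorized variant written for illustration, not the library code: fill_scores is a hypothetical name, and it assumes the ablated channel indices are unique.

```python
import numpy as np


def fill_scores(new_scores, original_score, ablated_channels, number_of_channels):
    """Return one score per channel; skipped channels get the original score."""
    result = np.full(number_of_channels, original_score, dtype=np.float32)
    # Fancy indexing pairs each measured score with its channel index,
    # so no explicit sorting is needed in this variant.
    result[np.asarray(ablated_channels)] = np.float32(new_scores)
    return result


# Channels 4 and 1 were ablated (in that order); channels 0, 2 and 3 were skipped.
scores = fill_scores(new_scores=[0.25, 0.5],
                     original_score=0.75,
                     ablated_channels=[4, 1],
                     number_of_channels=5)
print(scores.tolist())  # [0.75, 0.5, 0.75, 0.75, 0.25]
```

The measured scores land at positions 4 and 1, and every other position carries the original score.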

4. get_cam_weights(self, input_tensor, target_layer, targets, activations, grads)


    def get_cam_weights(self,
                        input_tensor: torch.Tensor,
                        target_layer: torch.nn.Module,
                        targets: List[Callable],
                        activations: torch.Tensor,
                        grads: torch.Tensor) -> np.ndarray:

        # Do a forward pass, compute the target scores, and cache the
        # activations
        handle = target_layer.register_forward_hook(self.save_activation)
        with torch.no_grad():
            outputs = self.model(input_tensor)
            handle.remove()
            original_scores = np.float32(
                [target(output).cpu().item() for target, output in zip(targets, outputs)])

        # Replace the layer with the ablation layer.
        # When we finish, we will replace it back, so the 
        # original model is unchanged.
        ablation_layer = self.ablation_layer
        replace_layer_recursive(self.model, target_layer, ablation_layer)

        number_of_channels = activations.shape[1]
        weights = []
        # This is a "gradient free" method, so we don't need gradients here.
        with torch.no_grad():
            # Loop over each of the batch images and ablate activations for it.
            for batch_index, (target, tensor) in enumerate(
                    zip(targets, input_tensor)):
                new_scores = []
                batch_tensor = tensor.repeat(self.batch_size, 1, 1, 1)

                # Check which channels should be ablated. Normally this will be all channels,
                # But we can also try to speed this up by using a low
                # ratio_channels_to_ablate.
                channels_to_ablate = ablation_layer.activations_to_be_ablated(
                    activations[batch_index, :], self.ratio_channels_to_ablate)
                number_channels_to_ablate = len(channels_to_ablate)

                for i in tqdm.tqdm(
                    range(
                        0,
                        number_channels_to_ablate,
                        self.batch_size)):
                    if i + self.batch_size > number_channels_to_ablate:
                        batch_tensor = batch_tensor[:(
                            number_channels_to_ablate - i)]

                    # Change the state of the ablation layer so it ablates the next channels.
                    # TBD: Move this into the ablation layer forward pass.
                    ablation_layer.set_next_batch(
                        input_batch_index = batch_index,
                        activations = self.activations,
                        num_channels_to_ablate = batch_tensor.size(0))
                    score = [target(o).cpu().item()
                             for o in self.model(batch_tensor)]
                    new_scores.extend(score)
                    ablation_layer.indices = ablation_layer.indices[batch_tensor.size(
                        0):]

                new_scores = self.assemble_ablation_scores(
                    new_scores,
                    original_scores[batch_index],
                    channels_to_ablate,
                    number_of_channels)
                weights.extend(new_scores)

        weights = np.float32(weights)
        weights = weights.reshape(activations.shape[:2])
        original_scores = original_scores[:, None]
        weights = (original_scores - weights) / original_scores

        # Replace the model back to the original state
        replace_layer_recursive(self.model, ablation_layer, target_layer)
        # Returning the weights from new_scores
        return weights

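Function:

The core method of Ablation-CAM, overriding the abstract method of BaseCAM. It runs the ablation experiments and computes a weight for every channel.

Flow:

Get the original activations and scores:
Register the save_activation hook on target_layer.
Run one forward pass without gradients (with torch.no_grad()).
Remove the hook.
From the model outputs, compute the original prediction score of each sample according to targets (original_scores). After this pass self.activations holds the cached activations.
Replace the target layer:
Call replace_layer_recursive(self.model, target_layer, ablation_layer). This swaps target_layer in the model for self.ablation_layer, so that the following forward passes go through AblationLayer and can perform the ablation.
Run the ablation experiments for each input sample:
Read the total channel count number_of_channels from activations.shape[1].
Initialize the weights list.
Iterate over every sample in the batch (batch_index) together with its target. The tqdm progress bar wraps the inner loop over channel batches, not this outer loop.
Select the channels to ablate:
Call ablation_layer.activations_to_be_ablated(activations[batch_index, :], self.ratio_channels_to_ablate). It returns an array channels_to_ablate with the channel indices to ablate for this sample.
Run the ablation forward passes in batches:

Create batch_tensor by repeating the current sample's input tensor self.batch_size times.
Walk through channels_to_ablate in chunks of self.batch_size.
- Trim batch_tensor for the last chunk if fewer than self.batch_size channels remain.
- Key step: call ablation_layer.set_next_batch(...). This copies the current sample's activation map from self.activations (the cached original activations) num_channels_to_ablate times and stores the copies inside ablation_layer. The value passed as num_channels_to_ablate is batch_tensor.size(0), i.e. self.batch_size, or the number of remaining channels in the last chunk.
- Run the forward pass: self.model(batch_tensor). Because target_layer has been replaced by ablation_layer, this pass triggers ablation_layer.__call__. Using the self.activations set by set_next_batch and self.indices, __call__ ablates a different channel in each copy.
- Collect the prediction score of each copy in batch_tensor (one score per ablation) and append the scores to new_scores.
- Update ablation_layer.indices by dropping the indices that were just processed, ready for the next chunk.

Process the ablation results:
Call assemble_ablation_scores(new_scores, original_scores[batch_index], channels_to_ablate, number_of_channels) to pad the incomplete score list.
Append the padded score list new_scores to weights.
Compute the weights:

Convert weights to np.float32 and reshape it to (B, C).
Compute weights = (original_scores - weights) / original_scores.
- original_scores - weights: the score drop caused by ablating each channel. The larger the drop, the more important the channel.
- / original_scores: normalizes the drop into a relative importance measure.

Restore the model:
Call replace_layer_recursive(self.model, ablation_layer, target_layer) to put the original layer back, so that the caller's model is left unchanged.
Return:
The computed weights.

Output:

An np.ndarray of shape (B, C) holding the importance weight of every channel for every sample.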
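As a quick worked example of the weight formula, the sketch below applies it to made-up scores for 2 images with 3 channels each. All numbers are assumptions chosen so that the arithmetic is easy to follow.

```python
import numpy as np

# Assumed toy numbers: 2 images, 3 channels each.
original_scores = np.float32([1.0, 0.5])            # shape (B,)
ablated_scores = np.float32([[1.0, 0.5, 0.25],      # shape (B, C): score after
                             [0.5, 0.25, 0.75]])    # ablating each channel

# Relative score drop per channel, broadcast over the channel axis.
weights = (original_scores[:, None] - ablated_scores) / original_scores[:, None]
print(weights.tolist())  # [[0.0, 0.5, 0.75], [0.0, 0.5, -0.5]]
```

A weight of 0 means that ablating the channel left the score unchanged. The negative weight in the second row shows that a weight can fall below 0 when ablating a channel raises the score.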

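Summary

The AblationCAM workflow is clear and clever:

Cache the original state: run one normal forward pass to obtain the original activations and prediction scores.

Replace the layer: temporarily swap the target layer for AblationLayer in order to control the feature maps.

Select channels: (optional) use the experimental heuristic to pick the channels most worth ablating.

Ablate in batches: for each input sample, prepare the data with set_next_batch, then let AblationLayer.__call__ ablate several channels in parallel and collect the post-ablation prediction scores.

Pad the results: fill the ablation results out to a complete per-channel score list.

Compute the weights: derive each channel's importance weight from the score drop between the original and the ablated pass.

Restore the model: put the original layer back.

Generate the heatmap: the parent class BaseCAM multiplies the original activations by these weights and sums over channels to produce the final CAM heatmap.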
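The steps above can be sketched end to end on a toy problem. This is a minimal NumPy illustration under stated assumptions, not the library pipeline: toy_score is a hypothetical stand-in for "the rest of the network" (global average pooling followed by a linear head), channels are ablated by zeroing, and the final ReLU follows the usual CAM recipe.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 4, 5, 5
activations = rng.random((C, H, W)).astype(np.float32)  # non-negative, ReLU-like
head = np.float32([2.0, 0.0, 1.0, 0.5])                 # hypothetical head weights


def toy_score(acts):
    """Hypothetical 'rest of the network': global average pool + linear head."""
    return float((acts.mean(axis=(1, 2)) * head).sum())


original = toy_score(activations)

ablated_scores = []
for c in range(C):
    ablated = activations.copy()
    ablated[c] = 0.0                     # ablate channel c by zeroing it
    ablated_scores.append(toy_score(ablated))

# Relative score drop per channel, then weighted sum of the original activations.
weights = (original - np.float32(ablated_scores)) / original
cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0)
```

Channel 1 has a head weight of 0, so ablating it changes nothing and its weight is 0. Because the toy head is linear, the weights of all channels add up to 1.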

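The strengths of AblationCAM are that it is intuitive (it measures the effect directly) and that it does not depend on gradients. Its main drawback is the high computational cost, because every channel to be ablated has to go through a forward pass, either alone or as part of a batch. The ratio_channels_to_ablate and batch_size parameters exist to trade accuracy against efficiency.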
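To get a feeling for the cost, the sketch below counts the forward passes made inside get_cam_weights as shown earlier: one clean pass for the whole input batch, plus one pass per chunk of channels for every image. The function names and the example sizes (2048 channels, 70 channels) are assumptions for illustration.

```python
import math


def ablation_forward_passes(num_images, channels_to_ablate, batch_size):
    """One clean pass, then ceil(channels / batch_size) ablation passes per image."""
    per_image = math.ceil(channels_to_ablate / batch_size)
    return 1 + num_images * per_image


def chunk_sizes(channels_to_ablate, batch_size):
    """Sizes of the successive ablation batches; the last one may be smaller."""
    return [min(batch_size, channels_to_ablate - start)
            for start in range(0, channels_to_ablate, batch_size)]


passes = ablation_forward_passes(num_images=1, channels_to_ablate=2048, batch_size=32)
sizes = chunk_sizes(channels_to_ablate=70, batch_size=32)
print(passes)  # 65
print(sizes)   # [32, 32, 6]
```

A larger batch_size lowers the number of passes but not the total work, since each pass then carries more copies of the image.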

AblationLayer

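File location: ./pytorch-grad-cam/pytorch_grad_cam/ablation_layer.py

AblationLayer is a PyTorch module designed specifically to implement the Ablation-CAM algorithm. The core idea of Ablation-CAM is to systematically remove (ablate) individual channels of a convolutional feature map and measure how important each one is to the model's final prediction. Channels whose removal causes a large drop in the prediction score are treated as important, and the spatial positions where they are active show up in the heatmap. The key point of this class is that it computes no gradients. Instead it acts as a "proxy" layer that modifies the features during the forward pass to simulate the removal of a channel.

1. __init__(self)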


class AblationLayer(torch.nn.Module):
    def __init__(self):
        super(AblationLayer, self).__init__()

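Function:

Initializes an AblationLayer instance.

Operation:

Calls the constructor of the parent class torch.nn.Module. The class itself creates no parameters or state at initialization time.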

2. objectiveness_mask_from_svd(self, activations, threshold=0.01)


    def objectiveness_mask_from_svd(self, activations, threshold=0.01):
        """ Experimental method to get a binary mask to compare if the activation is worth ablating.
            The idea is to apply the EigenCAM method by doing PCA on the activations.
            Then we create a binary mask by comparing to a low threshold.
            Areas that are masked out, are probably not interesting anyway.
        """

        projection = get_2d_projection(activations[None, :])[0, :]
        projection = np.abs(projection)
        projection = projection - projection.min()
        projection = projection / projection.max()
        projection = projection > threshold
        return projection

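Function:

An experimental method that builds a binary mask indicating which spatial regions of the feature map are "worth" ablating. The idea is that if a region shows a weak or unremarkable activation pattern across all channels, ablating it probably does not affect the prediction, so it is not worth considering.

Input:

•activations: the activations of a single sample at a target layer. In the call path shown in these notes it is the (C, H, W) array that activations_to_be_ablated passes in.

Flow:

SVD projection: call get_2d_projection(activations[None, :]). activations[None, :] adds a batch dimension (giving (1, C, H, W)) to match the input expected by get_2d_projection. get_2d_projection runs an SVD internally and returns the projection of the data onto the first principal component, one value per spatial position. [0, :] takes the projection of the first (and only) sample; it is later multiplied element-wise with an (H, W) channel, so it has that spatial shape.
Absolute value: np.abs(projection). The sign of an SVD component is arbitrary, so only the magnitude of the projection is used as the "strength" of the activation.
Normalize to [0, 1]: subtract the minimum, then divide by the maximum.
Build the binary mask: projection > threshold. The normalized projection is compared with a very small threshold (0.01 by default), which gives a boolean array. Positions that are True mark regions whose activation pattern is relatively salient and "worth" further analysis.

Output:

A boolean (binary) array with the same spatial shape as the projection, marking the "salient regions".

Purpose:

Provides the activations_to_be_ablated method with a prior mask of "important regions".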
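The mask construction can be reproduced with plain NumPy. The sketch below is an illustration under assumptions: first_component_projection is a hypothetical re-implementation of the projection step (center the per-position channel vectors, run an SVD, project onto the first right singular vector), and objectness_mask is an illustrative name. It is not the library's get_2d_projection.

```python
import numpy as np


def first_component_projection(activations):
    """Project each position's channel vector onto the first principal direction.

    Assumed re-implementation for illustration; activations has shape (C, H, W).
    """
    c, h, w = activations.shape
    flat = activations.reshape(c, -1).T        # (H*W, C): one row per position
    flat = flat - flat.mean(axis=0)            # center each channel
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return (flat @ vt[0]).reshape(h, w)


def objectness_mask(activations, threshold=0.01):
    """Absolute value, min-max normalization, then thresholding."""
    projection = np.abs(first_component_projection(activations))
    projection = projection - projection.min()
    projection = projection / projection.max()
    return projection > threshold


acts = np.zeros((4, 6, 6), dtype=np.float32)
acts[:, 2:4, 2:4] = np.float32([1, 2, 3, 4])[:, None, None]  # one active 2x2 block
mask = objectness_mask(acts)
```

On this input only the active 2x2 block survives the threshold, so the mask is True at exactly those four positions.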

3. activations_to_be_ablated(self, activations, ratio_channels_to_ablate=1.0)


    def activations_to_be_ablated(
            self,
            activations,
            ratio_channels_to_ablate=1.0):
        """ Experimental method to get a binary mask to compare if the activation is worth ablating.
            Create a binary CAM mask with objectiveness_mask_from_svd.
            Score each Activation channel, by seeing how much of its values are inside the mask.
            Then keep the top channels.

        """
        if ratio_channels_to_ablate == 1.0:
            self.indices = np.int32(range(activations.shape[0]))
            return self.indices

        projection = self.objectiveness_mask_from_svd(activations)

        scores = []
        for channel in activations:
            normalized = np.abs(channel)
            normalized = normalized - normalized.min()
            normalized = normalized / np.max(normalized)
            score = (projection * normalized).sum() / normalized.sum()
            scores.append(score)
        scores = np.float32(scores)

        indices = list(np.argsort(scores))
        high_score_indices = indices[::-
                                     1][: int(len(indices) *
                                              ratio_channels_to_ablate)]
        low_score_indices = indices[: int(
            len(indices) * ratio_channels_to_ablate)]
        self.indices = np.int32(high_score_indices + low_score_indices)
        return self.indices

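Function:

Another experimental method. It decides which channels should be ablated in the ablation experiment. Instead of ablating every channel, it scores the channels and selects a subset, either to save time or to focus on the key channels.

Inputs:

•activations: a 3-D array of shape (C, H, W), the activation map of a single sample.

•ratio_channels_to_ablate: a float in (0, 1] that controls what fraction of the channels is selected for ablation.

Flow:

Special case: if ratio_channels_to_ablate == 1.0, all channels are selected. self.indices is set to the indices of all channels and returned immediately.
Get the salient-region mask: call self.objectiveness_mask_from_svd(activations) to obtain the projection mask.
Score each channel:
•Iterate over each channel in activations (shape (H, W)).
•Take the absolute value of the channel's activations and normalize them to [0, 1] (normalized).
•Compute score = (projection * normalized).sum() / normalized.sum().
•projection * normalized: multiplies the normalized channel map element-wise with the "salient region" mask, which keeps only the activations inside the salient regions.
•.sum(): sums the values that were kept.
•/ normalized.sum(): divides by the sum of all normalized activations of the channel. The result can be read as the share of the channel's total activation strength that lies inside the salient regions.
•Append each channel's score to the scores list.
Select the channel indices:
•indices = list(np.argsort(scores)): the channel indices sorted by ascending score.
•high_score_indices = indices[::-1][:int(len(indices) * ratio_channels_to_ablate)]: the k = int(len(indices) * ratio_channels_to_ablate) highest-scoring channels.
•low_score_indices = indices[:int(len(indices) * ratio_channels_to_ablate)]: the k lowest-scoring channels. (Note: this logic looks questionable. Usually one would keep only the high-scoring or only the low-scoring channels, and the reason for taking both is unclear. As written, up to 2k indices are returned, and the two lists overlap whenever 2k exceeds the number of channels.)
•self.indices = np.int32(high_score_indices + low_score_indices): concatenates the high-scoring and low-scoring indices, converts them to an integer array and stores it in self.indices.

Output:

Returns self.indices, the array of channel indices selected for ablation.

Purpose:

Provides a heuristic for picking the channels that are "most worth" or "least worth" ablating, in order to speed up Ablation-CAM.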
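The selection step is easy to check in isolation. The sketch below mirrors the top-k plus bottom-k slicing on a precomputed score vector; select_indices is a hypothetical helper name and the scores are made up.

```python
import numpy as np


def select_indices(scores, ratio):
    """Mirror the top-k plus bottom-k selection on a precomputed score vector."""
    order = list(np.argsort(scores))      # channel indices, ascending by score
    k = int(len(order) * ratio)
    high = order[::-1][:k]                # k highest-scoring channels
    low = order[:k]                       # k lowest-scoring channels
    return np.int32(high + low)


scores = np.float32([0.1, 0.9, 0.5, 0.3])
small = select_indices(scores, ratio=0.25)   # k = 1
large = select_indices(scores, ratio=0.75)   # k = 3
print(small.tolist())  # [1, 0]
print(large.tolist())  # [1, 2, 3, 0, 3, 2]
```

With ratio=0.25 one high-scoring and one low-scoring channel are returned, i.e. 2 of 4 channels rather than 1. With ratio=0.75 the two lists overlap, so channels 2 and 3 appear twice.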

4. set_next_batch(self, input_batch_index, activations, num_channels_to_ablate)


    def set_next_batch(
            self,
            input_batch_index,
            activations,
            num_channels_to_ablate):
        """ This creates the next batch of activations from the layer.
            Just take corresponding batch member from activations, and repeat it num_channels_to_ablate times.
        """
        self.activations = activations[input_batch_index, :, :, :].clone(
        ).unsqueeze(0).repeat(num_channels_to_ablate, 1, 1, 1)

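Function:

Prepares and stores the activation data that will be modified in the next forward pass. This is a key step of the Ablation-CAM implementation, because it determines the data that __call__ operates on.

Inputs:

•input_batch_index: the index of the current input sample within the batch.

•activations: the full activations of the target layer, of shape (B, C, H, W), where B is the batch size.

•num_channels_to_ablate: the number of channels to ablate in this forward pass.

Flow:

activations[input_batch_index, :, :, :]: extracts the activation map of the current sample (shape (C, H, W)).
.clone(): creates a copy so that the original data is not modified.
.unsqueeze(0): adds a batch dimension, giving (1, C, H, W).
.repeat(num_channels_to_ablate, 1, 1, 1): repeats the single sample's activation map num_channels_to_ablate times along the batch dimension, giving a tensor of shape (num_channels_to_ablate, C, H, W).
self.activations = ...: stores the repeated tensor in self.activations.

Purpose:

Prepares the data for the following __call__. num_channels_to_ablate is the number of channels ablated in the current forward pass (at most batch_size). The output of __call__ therefore also contains num_channels_to_ablate samples, each corresponding to one experiment that ablates exactly one specific channel. For example, if 10 channels are to be ablated, set_next_batch prepares 10 identical copies of the activation map. __call__ then ablates channel 1 in the first copy, channel 2 in the second, and so on up to channel 10 in the tenth.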
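The shape bookkeeping can be checked with a NumPy analog of the clone, unsqueeze and repeat chain. This is an illustrative sketch: repeat_sample is a hypothetical name, and NumPy arrays stand in for the torch tensors used by the real method.

```python
import numpy as np


def repeat_sample(activations, input_batch_index, num_channels_to_ablate):
    """NumPy analog: copy one sample's (C, H, W) map num_channels_to_ablate times."""
    sample = activations[input_batch_index].copy()         # (C, H, W), independent copy
    return np.repeat(sample[None], num_channels_to_ablate, axis=0)


# Assumed toy cache of shape (B, C, H, W) = (2, 3, 2, 2).
activations = np.arange(2 * 3 * 2 * 2, dtype=np.float32).reshape(2, 3, 2, 2)
batch = repeat_sample(activations, input_batch_index=1, num_channels_to_ablate=4)
batch[0, 0] = -1.0   # editing one copy leaves the cache and the other copies untouched
print(batch.shape)   # (4, 3, 2, 2)
```

Because the sample is copied first, later in-place edits to one copy change neither the cached activations nor the other copies.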

5. __call__(self, x)


    def __call__(self, x):
        output = self.activations
        for i in range(output.size(0)):
            # Commonly the minimum activation will be 0,
            # And then it makes sense to zero it out.
            # However depending on the architecture,
            # If the values can be negative, we use very negative values
            # to perform the ablation, deviating from the paper.
            if torch.min(output) == 0:
                output[i, self.indices[i], :] = 0
            else:
                ABLATION_VALUE = 1e7
                output[i, self.indices[i], :] = torch.min(
                    output) - ABLATION_VALUE

        return output

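Function:

The core of the module. It defines how the activations of the target layer are modified during the forward pass.

Input:

x: the activations passed into the layer. The method never reads x; it works entirely on self.activations, which set_next_batch has already prepared.

Flow:

output = self.activations: fetches the repeated activation tensor prepared by set_next_batch.
for i in range(output.size(0)): iterates over every sample in output, i.e. over every ablation experiment.
Choose the ablation strategy:
•if torch.min(output) == 0: checks the minimum activation value. If the minimum is 0 (common for feature maps after a ReLU), the channel is zeroed: output[i, self.indices[i], :] = 0. This sets every spatial position of the channel given by self.indices[i] in sample i to 0.
•else: if the activations can be negative (for example because a different activation function is used), the channel is shifted to an extreme value instead:
•ABLATION_VALUE = 1e7: defines a very large constant.
•output[i, self.indices[i], :] = torch.min(output) - ABLATION_VALUE: sets the selected channel to a value far below the current minimum, i.e. a very large negative number.
Return: the modified output.

Purpose:

Simulates a channel being "removed" or "destroyed". For ReLU feature maps, zeroing is the most direct form of removal. For feature maps that can be negative, the very low value is meant to ensure that the channel is clipped to 0 by a later ReLU, or that its influence is otherwise minimized.

•Cooperation with set_next_batch: set_next_batch prepares num_channels_to_ablate identical copies of the activations. __call__ then performs one independent ablation on each of these copies, ablating a different channel in each. The model finally runs the remaining computation on the num_channels_to_ablate modified feature maps and produces num_channels_to_ablate different prediction scores. Comparing these scores with the original score quantifies the importance of each ablated channel.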
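Both branches can be illustrated with a NumPy analog. This is a sketch under assumptions rather than the library code: ablate is a hypothetical name, it works on a copy instead of modifying the stored tensor in place, and it computes the minimum once before the loop for simplicity.

```python
import numpy as np

ABLATION_VALUE = 1e7


def ablate(batch, indices):
    """NumPy analog: in copy i, overwrite channel indices[i] with the ablation value."""
    out = batch.copy()
    floor = out.min()                       # computed once here, for simplicity
    fill = 0.0 if floor == 0 else floor - ABLATION_VALUE
    for i in range(out.shape[0]):
        out[i, indices[i]] = fill           # all spatial positions of that channel
    return out


relu_like = np.ones((3, 3, 2, 2), dtype=np.float32)
relu_like[:, :, 0, 0] = 0.0                              # minimum is 0: zeroing branch
zeroed = ablate(relu_like, indices=[0, 1, 2])

signed = np.full((2, 3, 2, 2), -1.0, dtype=np.float32)   # minimum is negative
shifted = ablate(signed, indices=[2, 0])
```

In zeroed, copy i has channel i set to 0 and the other channels unchanged. In shifted, the ablated channels hold -1 - 1e7 while the remaining channels keep the value -1.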

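Summary

The AblationLayer class is the execution engine of the Ablation-CAM algorithm:

Data preparation: set_next_batch prepares a batch of identical copies of the activation map, one copy per independent ablation experiment.
Channel selection: activations_to_be_ablated (optional) provides a heuristic strategy for choosing the channels to ablate.
Ablation: __call__ is the core. Based on self.activations and self.indices it performs the actual "removal" on the activation maps (zeroing, or setting to a very low value).
Experimental methods: objectiveness_mask_from_svd and activations_to_be_ablated are auxiliary, experimental methods for speeding up the ablation. They use an SVD and a scoring scheme to focus on the channels or regions most likely to affect the prediction.
With this design, Ablation-CAM can evaluate the importance of every channel systematically through repeated forward passes and generate an explanatory heatmap from the result.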