[Paper Reading] Adaptive Sparse Self-Attention — Ready to Use!

| Title | Adaptive Sparse Self-Attention for Efficient Image Super-resolution and Beyond |
|----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Authors | Pan, Jinshan; Sun, Long; Song, Lianhong; Dong, Jiangxin; Yang, Jian |
| Year | 2026 |
| Citation (GB/T 7714) | Pan J, Sun L, Song L, et al. Adaptive Sparse Self-Attention for Efficient Image Super-resolution and Beyond[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026. |
| DOI | https://doi.org/10.1109/TPAMI.2026.3670856 |

The code and the original paper can be downloaded from the resource repository.

Overview

This paper proposes an adaptive sparse self-attention mechanism that targets two shortcomings of conventional self-attention in image super-resolution: low computational efficiency and weak recovery of local detail. Existing Transformer-based methods improve reconstruction quality through global context modeling, but their computational cost is high and they compute similarities between all pairs of tokens, so irrelevant tokens interfere with feature aggregation; they also lack effective modeling of local high-frequency information.

To address these problems, the authors design a two-stage solution. First, a local spatial-variant feature estimation module uses a dynamic convolutional network to extract locally varying spatial features, replacing the conventional linear projection and strengthening local detail modeling. Second, a sparse self-attention mechanism uses a selection function such as ReLU to keep only the similarity values of the key tokens, reducing interference from irrelevant tokens and enabling efficient feature aggregation.

Experiments show that the method reaches state-of-the-art performance on benchmarks such as Urban100 with fewer parameters and less computation, and that it also performs well on image denoising and JPEG artifact removal. Theoretical analysis shows that the method models local and non-local features simultaneously and that the sparse selection mechanism improves the recovery of structural detail, offering a new direction for applying Transformers to image restoration.

Terminology

  1. Adaptive Sparse Self-Attention: an improved self-attention mechanism that uses a selection function to dynamically keep the similarity values of the key tokens, reducing interference from irrelevant tokens and enabling efficient feature aggregation. It preserves global context modeling while significantly lowering computational cost.
  2. Local Spatial-Variant Feature Estimation: a feature-extraction module based on a dynamic convolutional network. It generates a K×K dynamic kernel for every pixel to capture the locally varying details of natural images, replacing the conventional linear projection and strengthening local texture modeling.
  3. Sparse Self-Attention: sparsifies the self-attention matrix with a selection function such as ReLU, keeping only the contributions of high-similarity tokens and suppressing low-relevance ones. The mechanism reduces the complexity from O(N²) to O(N) in the number of spatial tokens (the attention matrix is C×C rather than N×N), while improving the recovery of structural detail.
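As a quick illustration of the sparsification in item 3, the sketch below (a minimal numpy example, not the paper's code) applies ReLU to a dense similarity matrix: negative similarities are zeroed, so only positively correlated tokens contribute to the aggregated output.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C = 6, 4                      # tokens, channels
Q = rng.standard_normal((N, C))
K = rng.standard_normal((N, C))
V = rng.standard_normal((N, C))

sim = Q @ K.T                    # dense similarity matrix, N x N
attn = np.maximum(sim, 0.0)      # ReLU as the selection function S: drop negative similarities
out = attn @ V                   # only positively correlated tokens contribute

print(f"zeroed entries: {(attn == 0).mean():.0%}")
```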

Adaptive Sparse Self-Attention Block Framework

Adaptive Sparse Self-Attention Block Code Implementation

ASSA Block Code

python
'''Adaptive Sparse Self-Attention (ASSA) module.
Core design: combines dynamic depth-wise convolution, an adaptive sparsity strategy, and separate
train/test modes. The pipeline is feature projection -> dynamic enhancement -> sparse attention -> feature restoration;
sparsity is controlled adaptively, global dependencies are explored during training, and at test time
local windows (TLC) speed up inference, balancing accuracy against deployment efficiency.
'''

import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange
from idynamic_dwconv import IDynamicDWConv

class MaskedSoftmax(nn.Module):
    """Softmax followed by masking: entries that were non-positive before the
    softmax are zeroed, so only positively correlated tokens keep their weight."""
    def __init__(self):
        super(MaskedSoftmax, self).__init__()
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        mask = x > 0                 # remember which similarities were positive
        x = self.softmax(x)
        x = torch.where(mask, x, torch.zeros_like(x))

        return x

class TopK(nn.Module):
    """Hard sparsity: keep only the top 25% similarity values in each row of
    the attention matrix and zero out the rest."""
    def __init__(self):
        super(TopK, self).__init__()

    def forward(self, x):
        b, h, C, _ = x.shape
        mask = torch.zeros(b, h, C, C, device=x.device, requires_grad=False)
        index = torch.topk(x, k=int(C / 4), dim=-1, largest=True)[1]
        mask.scatter_(-1, index, 1.)
        result = torch.where(mask > 0, x, torch.zeros_like(x))

        return result

# Sparse Self-Attention
class SparseSelfAttention(nn.Module):
    """
    Adaptive Sparse Self-Attention (ASSA) module.
    Purpose: dynamic feature enhancement plus adaptive sparse attention, balancing
    global-dependency modeling against computational cost.
    Core design:
        - Dynamic depth-wise convolution: IDynamicDWConv enhances the QV features and strengthens local-dependency modeling
        - Multiple sparsity strategies: ReLU/Softmax/MaskedSoftmax/TopK/GELU/Sigmoid activations give flexible control over sparsity
        - Separate train/test modes: global sparse attention during training, TLC local-window attention at test time, trading accuracy against inference speed
        - Learnable temperature: adaptively sharpens or flattens the attention distribution to refine the sparse selection
    Args:
        dim: number of input/output channels
        num_heads: number of attention heads
        bias: whether the output projection uses a bias (default False)
        tlc_flag: whether to enable the TLC test mode (default True)
        tlc_kernel: local window size at test time (default 48)
        activation: activation used as the sparsity strategy (default 'relu')
    """
    def __init__(self, dim, num_heads, bias, tlc_flag=True, tlc_kernel=48, activation='relu'):
        super(SparseSelfAttention, self).__init__()
        self.tlc_flag = tlc_flag    # TLC flag for validation and test
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))

        self.project_in = nn.Conv2d(dim, dim * 2, 1, bias=False)
        self.dynamic_conv = IDynamicDWConv(dim * 2, kernel_size=3, bias=False)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1, bias=bias)

        self.act = nn.Identity()
        if activation == 'relu':
            self.act = nn.ReLU()
        elif activation == 'softmax':
            self.act = nn.Softmax(dim=-1)
        elif activation == 'maskedsoftmax':
            self.act = MaskedSoftmax()
        elif activation == 'topk':
            self.act = TopK()
        elif activation == 'gelu':
            self.act = nn.GELU()
        elif activation == 'sigmoid':
            self.act = nn.Sigmoid()

        # TLC window size per SR scale: [x2, x3, x4] -> [96, 72, 48]
        self.kernel_size = [tlc_kernel, tlc_kernel]

    def _forward(self, qv):
        q, v = qv.chunk(2, dim=1)

        q = rearrange(q, 'b (head c) h w -> b head c (h w)', head=self.num_heads)
        v = rearrange(v, 'b (head c) h w -> b head c (h w)', head=self.num_heads)

        # kv-shared design: the keys are derived from the value features.
        # Normalizing along the token dimension makes q @ k^T a cosine-style
        # similarity between channels, yielding a C x C attention matrix.
        q = F.normalize(q, dim=-1)
        k = F.normalize(v, dim=-1)

        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = self.act(attn)   # selection function S sparsifies the attention map
        out = (attn @ v)

        return out

    def forward(self, x):
        b, c, h, w = x.shape

        qv = self.dynamic_conv(self.project_in(x))
        if self.training or not self.tlc_flag:
            out = self._forward(qv)
            out = rearrange(out, 'b head c (h w) -> b (head c) h w', head=self.num_heads, h=h, w=w)

            out = self.project_out(out)
            return out

        # Then we use the TLC methods in test mode
        qv = self.grids(qv)  # convert to local windows
        out = self._forward(qv)
        out = rearrange(out, 'b head c (h w) -> b (head c) h w', head=self.num_heads, h=qv.shape[-2], w=qv.shape[-1])
        out = self.grids_inverse(out)  # reverse

        out = self.project_out(out)
        return out

    # Code from [megvii-research/TLC] https://github.com/megvii-research/TLC
    def grids(self, x):
        b, c, h, w = x.shape
        self.original_size = (b, c // 2, h, w)
        assert b == 1
        k1, k2 = self.kernel_size
        k1 = min(h, k1)
        k2 = min(w, k2)
        num_row = (h - 1) // k1 + 1
        num_col = (w - 1) // k2 + 1
        self.nr = num_row
        self.nc = num_col

        import math
        step_j = k2 if num_col == 1 else math.ceil((w - k2) / (num_col - 1) - 1e-8)
        step_i = k1 if num_row == 1 else math.ceil((h - k1) / (num_row - 1) - 1e-8)

        parts = []
        idxes = []
        i = 0  # 0 ~ h-1
        last_i = False
        while i < h and not last_i:
            j = 0
            if i + k1 >= h:
                i = h - k1
                last_i = True
            last_j = False
            while j < w and not last_j:
                if j + k2 >= w:
                    j = w - k2
                    last_j = True
                parts.append(x[:, :, i:i + k1, j:j + k2])
                idxes.append({'i': i, 'j': j})
                j = j + step_j
            i = i + step_i

        parts = torch.cat(parts, dim=0)
        self.idxes = idxes
        return parts

    def grids_inverse(self, outs):
        preds = torch.zeros(self.original_size).to(outs.device)
        b, c, h, w = self.original_size

        count_mt = torch.zeros((b, 1, h, w)).to(outs.device)
        k1, k2 = self.kernel_size
        k1 = min(h, k1)
        k2 = min(w, k2)

        for cnt, each_idx in enumerate(self.idxes):
            i = each_idx['i']
            j = each_idx['j']
            preds[0, :, i:i + k1, j:j + k2] += outs[cnt, :, :, :]
            count_mt[0, 0, i:i + k1, j:j + k2] += 1.

        del outs
        torch.cuda.empty_cache()
        return preds / count_mt

if __name__ == "__main__":
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

    x = torch.randn(1, 64, 32, 32).to(device)
    model = SparseSelfAttention(64, num_heads=4, tlc_flag=True, tlc_kernel=48, activation='relu', bias=False).to(device)

    y = model(x)

    print("Input shape: ", x.shape)
    print("Output shape: ", y.shape)
    print(f"ASSA module parameters: {sum(p.numel() for p in model.parameters())/1e6:.2f}M")

IDynamicDWConv Code

python
"""
    code from github: https://github.com/Atten4Vis/DemystifyLocalViT
    Thanks to the bravo job from Han, Qi and Fan, Zejia and Dai, Qi and Sun, Lei and Cheng, Ming-Ming and Liu, Jiaying and Wang, Jingdong
    Paper: "On the Connection between Local Attention and Dynamic Depth-wise Convolution" ICLR 2022 Spotlight
"""

"""
@inproceedings{han2021connection,
  title={On the Connection between Local Attention and Dynamic Depth-wise Convolution},
  author={Han, Qi and Fan, Zejia and Dai, Qi and Sun, Lei and Cheng, Ming-Ming and Liu, Jiaying and Wang, Jingdong},
  booktitle={International Conference on Learning Representations},
  year={2022}
}
"""

from torch.autograd import Function
import torch
from torch.nn.modules.utils import _pair
import torch.nn as nn
from einops import rearrange

from collections import namedtuple
import cupy     # idynamic implement is based on cupy-cuda
from string import Template

Stream = namedtuple('Stream', ['ptr'])

def Dtype(t):
    if isinstance(t, torch.cuda.FloatTensor):
        return 'float'
    elif isinstance(t, torch.cuda.DoubleTensor):
        return 'double'


@cupy._util.memoize(for_each_device=True)
def load_kernel(kernel_name, code, **kwargs):
    code = Template(code).substitute(**kwargs)
    return cupy.RawKernel(code, kernel_name)

CUDA_NUM_THREADS = 1024
# on RTX 3090 and newer GPUs, 1024 threads per block gives the fastest execution

kernel_loop = '''
#define CUDA_KERNEL_LOOP(i, n)                        \
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; \
      i < (n);                                       \
      i += blockDim.x * gridDim.x)
'''

def GET_BLOCKS(N):
    return (N + CUDA_NUM_THREADS - 1) // CUDA_NUM_THREADS

_idynamic_kernel = kernel_loop + '''
extern "C"
__global__ void idynamic_forward_kernel(
const ${Dtype}* bottom_data, const ${Dtype}* weight_data, ${Dtype}* top_data) {
  CUDA_KERNEL_LOOP(index, ${nthreads}) {
    const int n = index / ${channels} / ${top_height} / ${top_width};
    const int c = (index / ${top_height} / ${top_width}) % ${channels};
    const int h = (index / ${top_width}) % ${top_height};
    const int w = index % ${top_width};
    const int g = c / (${channels} / ${groups});
    ${Dtype} value = 0;
    #pragma unroll
    for (int kh = 0; kh < ${kernel_h}; ++kh) {
      #pragma unroll
      for (int kw = 0; kw < ${kernel_w}; ++kw) {
        const int h_in = -${pad_h} + h * ${stride_h} + kh * ${dilation_h};
        const int w_in = -${pad_w} + w * ${stride_w} + kw * ${dilation_w};
        if ((h_in >= 0) && (h_in < ${bottom_height})
          && (w_in >= 0) && (w_in < ${bottom_width})) {
          const int offset = ((n * ${channels} + c) * ${bottom_height} + h_in)
            * ${bottom_width} + w_in;
          const int offset_weight = ((((n * ${groups} + g) * ${kernel_h} + kh) * ${kernel_w} + kw) * ${top_height} + h)
            * ${top_width} + w;
          value += weight_data[offset_weight] * bottom_data[offset];
        }
      }
    }
    top_data[index] = value;
  }
}
'''

_idynamic_kernel_backward_grad_input = kernel_loop + '''
extern "C"
__global__ void idynamic_backward_grad_input_kernel(
    const ${Dtype}* const top_diff, const ${Dtype}* const weight_data, ${Dtype}* const bottom_diff) {
  CUDA_KERNEL_LOOP(index, ${nthreads}) {
    const int n = index / ${channels} / ${bottom_height} / ${bottom_width};
    const int c = (index / ${bottom_height} / ${bottom_width}) % ${channels};
    const int h = (index / ${bottom_width}) % ${bottom_height};
    const int w = index % ${bottom_width};
    const int g = c / (${channels} / ${groups});
    ${Dtype} value = 0;
    #pragma unroll
    for (int kh = 0; kh < ${kernel_h}; ++kh) {
      #pragma unroll
      for (int kw = 0; kw < ${kernel_w}; ++kw) {
        const int h_out_s = h + ${pad_h} - kh * ${dilation_h};
        const int w_out_s = w + ${pad_w} - kw * ${dilation_w};
        if (((h_out_s % ${stride_h}) == 0) && ((w_out_s % ${stride_w}) == 0)) {
          const int h_out = h_out_s / ${stride_h};
          const int w_out = w_out_s / ${stride_w};
          if ((h_out >= 0) && (h_out < ${top_height})
                && (w_out >= 0) && (w_out < ${top_width})) {
            const int offset = ((n * ${channels} + c) * ${top_height} + h_out)
                  * ${top_width} + w_out;
            const int offset_weight = ((((n * ${groups} + g) * ${kernel_h} + kh) * ${kernel_w} + kw) * ${top_height} + h_out)
                  * ${top_width} + w_out;
            value += weight_data[offset_weight] * top_diff[offset];
          }
        }
      }
    }
    bottom_diff[index] = value;
  }
}
'''

_idynamic_kernel_backward_grad_weight = kernel_loop + '''
extern "C"
__global__ void idynamic_backward_grad_weight_kernel(
    const ${Dtype}* const top_diff, const ${Dtype}* const bottom_data, ${Dtype}* const buffer_data) {
  CUDA_KERNEL_LOOP(index, ${nthreads}) {
    const int h = (index / ${top_width}) % ${top_height};
    const int w = index % ${top_width};
    const int kh = (index / ${kernel_w} / ${top_height} / ${top_width})
          % ${kernel_h};
    const int kw = (index / ${top_height} / ${top_width}) % ${kernel_w};
    const int h_in = -${pad_h} + h * ${stride_h} + kh * ${dilation_h};
    const int w_in = -${pad_w} + w * ${stride_w} + kw * ${dilation_w};
    if ((h_in >= 0) && (h_in < ${bottom_height})
          && (w_in >= 0) && (w_in < ${bottom_width})) {
      const int g = (index / ${kernel_h} / ${kernel_w} / ${top_height} / ${top_width}) % ${groups};
      const int n = (index / ${groups} / ${kernel_h} / ${kernel_w} / ${top_height} / ${top_width}) % ${num};
      ${Dtype} value = 0;
      #pragma unroll
      for (int c = g * (${channels} / ${groups}); c < (g + 1) * (${channels} / ${groups}); ++c) {
        const int top_offset = ((n * ${channels} + c) * ${top_height} + h)
              * ${top_width} + w;
        const int bottom_offset = ((n * ${channels} + c) * ${bottom_height} + h_in)
              * ${bottom_width} + w_in;
        value += top_diff[top_offset] * bottom_data[bottom_offset];
      }
      buffer_data[index] = value;
    } else {
      buffer_data[index] = 0;
    }
  }
}
'''

class _idynamic(Function):
    @staticmethod
    def forward(ctx, input, weight, stride, padding, dilation):
        assert input.dim() == 4 and input.is_cuda
        assert weight.dim() == 6 and weight.is_cuda
        batch_size, channels, height, width = input.size()
        kernel_h, kernel_w = weight.size()[2:4]
        output_h = int((height + 2 * padding[0] - (dilation[0] * (kernel_h - 1) + 1)) / stride[0] + 1)
        output_w = int((width + 2 * padding[1] - (dilation[1] * (kernel_w - 1) + 1)) / stride[1] + 1)

        output = input.new(batch_size, channels, output_h, output_w)
        n = output.numel()

        with torch.cuda.device_of(input):
            f = load_kernel('idynamic_forward_kernel', _idynamic_kernel, Dtype=Dtype(input), nthreads=n,
                            num=batch_size, channels=channels, groups=weight.size()[1],
                            bottom_height=height, bottom_width=width,
                            top_height=output_h, top_width=output_w,
                            kernel_h=kernel_h, kernel_w=kernel_w,
                            stride_h=stride[0], stride_w=stride[1],
                            dilation_h=dilation[0], dilation_w=dilation[1],
                            pad_h=padding[0], pad_w=padding[1])
            f(block=(CUDA_NUM_THREADS, 1, 1),
              grid=(GET_BLOCKS(n), 1, 1),
              args=[input.data_ptr(), weight.data_ptr(), output.data_ptr()],
              stream=Stream(ptr=torch.cuda.current_stream().cuda_stream))

        ctx.save_for_backward(input, weight)
        ctx.stride, ctx.padding, ctx.dilation = stride, padding, dilation
        return output

    @staticmethod
    def backward(ctx, grad_output):
        assert grad_output.is_cuda
        if not grad_output.is_contiguous():
            grad_output = grad_output.contiguous()
        input, weight = ctx.saved_tensors
        stride, padding, dilation = ctx.stride, ctx.padding, ctx.dilation

        batch_size, channels, height, width = input.size()
        kernel_h, kernel_w = weight.size()[2:4]
        output_h, output_w = grad_output.size()[2:]

        grad_input, grad_weight = None, None

        opt = dict(Dtype=Dtype(grad_output),
                   num=batch_size, channels=channels, groups=weight.size()[1],
                   bottom_height=height, bottom_width=width,
                   top_height=output_h, top_width=output_w,
                   kernel_h=kernel_h, kernel_w=kernel_w,
                   stride_h=stride[0], stride_w=stride[1],
                   dilation_h=dilation[0], dilation_w=dilation[1],
                   pad_h=padding[0], pad_w=padding[1])

        with torch.cuda.device_of(input):
            if ctx.needs_input_grad[0]:
                grad_input = input.new(input.size())

                n = grad_input.numel()
                opt['nthreads'] = n

                f = load_kernel('idynamic_backward_grad_input_kernel',
                                _idynamic_kernel_backward_grad_input, **opt)
                f(block=(CUDA_NUM_THREADS, 1, 1),
                  grid=(GET_BLOCKS(n), 1, 1),
                  args=[grad_output.data_ptr(), weight.data_ptr(), grad_input.data_ptr()],
                  stream=Stream(ptr=torch.cuda.current_stream().cuda_stream))

            if ctx.needs_input_grad[1]:
                grad_weight = weight.new(weight.size())

                n = grad_weight.numel()
                opt['nthreads'] = n

                f = load_kernel('idynamic_backward_grad_weight_kernel',
                                _idynamic_kernel_backward_grad_weight, **opt)
                f(block=(CUDA_NUM_THREADS, 1, 1),
                  grid=(GET_BLOCKS(n), 1, 1),
                  args=[grad_output.data_ptr(), input.data_ptr(), grad_weight.data_ptr()],
                  stream=Stream(ptr=torch.cuda.current_stream().cuda_stream))

        return grad_input, grad_weight, None, None, None

def _idynamic_cuda(input, weight, bias=None, stride=1, padding=0, dilation=1):
    """ idynamic kernel
    """
    assert input.size(0) == weight.size(0)
    assert input.size(-2) // stride == weight.size(-2)
    assert input.size(-1) // stride == weight.size(-1)
    if input.is_cuda:
        out = _idynamic.apply(input, weight, _pair(stride), _pair(padding), _pair(dilation))
        if bias is not None:
            out += bias.view(1, -1, 1, 1)
    else:
        raise NotImplementedError
    return out

class IDynamicDWConv(nn.Module):
    """Predicts a spatially varying kernel_size x kernel_size depth-wise kernel
    for every pixel and applies it through the CUDA kernels above."""
    def __init__(self, dim, kernel_size, bias):
        super(IDynamicDWConv, self).__init__()
        self.groups = dim
        self.kernel_size = kernel_size
        # predicts kernel_size**2 weights per pixel for each channel
        self.weight = nn.Conv2d(dim, dim * kernel_size ** 2, kernel_size=3, stride=1, padding=1, groups=dim, bias=bias)

    def forward(self, x):
        b, c, h, w = x.shape
        weight = self.weight(x)
        weight = weight.view(b, self.groups, self.kernel_size, self.kernel_size, h, w)
        out = _idynamic_cuda(x, weight, stride=1, padding=(self.kernel_size - 1) // 2)
        return out

ASSA Block Core Idea

To reduce interference from irrelevant tokens, the attention matrix is computed with a transposed, scaled dot-product, and ReLU serves as the selection function S that keeps only the useful similarity values, retaining the information most relevant to the current token. In addition, K and V are estimated by a shared-parameter module, which lowers model complexity. The mechanism sparsifies the attention map and makes feature aggregation more efficient.

ASSA Block Step by Step

1. Preprocessing and reshaping

A kv-shared strategy is used: the keys K and the values V are estimated by the same shared-parameter module. The input features are reshaped into queries Q and values V, both of shape HW×C, where HW is the number of spatial positions and C the number of channels. [Corresponds to Eqs. (5) and (6) in the paper]

2. Computing the sparse attention matrix

The attention matrix is obtained with a transposed, scaled dot-product: the product Q^T K is passed through a selection function S. The paper uses ReLU as S; the nonlinearity keeps only the useful similarity values and yields a sparse attention matrix A of shape C×C, retaining the information most relevant to each token and reducing interference from irrelevant tokens. [Corresponds to Eq. (7) in the paper]

3. Feature aggregation and residual connection

The sparse attention matrix A aggregates the value features; a reshape function T restores the result to the H×W×C spatial layout, a 1×1 convolution adjusts the channels, and a residual connection with the input features produces the aggregated output. [Corresponds to Eq. (8) in the paper]
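Steps 1 to 3 can be condensed into a few lines of framework-agnostic numpy (a sketch of the math only, not the authors' implementation; the learnable temperature is omitted). Because the product is taken over the token dimension, the attention matrix A is C×C, so the cost grows linearly with HW rather than quadratically.

```python
import numpy as np

rng = np.random.default_rng(42)
HW, C = 64, 8                          # spatial positions, channels
X = rng.standard_normal((HW, C))       # input features, HW x C

Q = X                                  # kv-shared design: Q and V come from the same features
V = X
# normalize along the token dimension so Q^T K is a cosine-style similarity
Qn = Q / np.linalg.norm(Q, axis=0, keepdims=True)
Kn = V / np.linalg.norm(V, axis=0, keepdims=True)

A = np.maximum(Qn.T @ Kn, 0.0)         # sparse attention matrix, C x C, with ReLU as S
agg = V @ A.T                          # aggregate the value features: HW x C
out = agg + X                          # residual connection back to the input

print(A.shape, out.shape)
```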

4. Gated feed-forward network

Layer normalization (LN) and a 1×1 convolution expand the channels to 4C; a 3×3 depth-wise convolution follows, and the result is split along the channel dimension into two halves. One half is activated with GELU and multiplied element-wise with the other, and a final 1×1 convolution produces the enhanced output features, further improving the representation. [Corresponds to Eq. (9) in the paper]
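Step 4 can be sketched as a small PyTorch module (a hedged reconstruction from the description above, not the paper's released code; the name GatedFFN is my own, and LN is approximated with GroupNorm(1, C) so it works on NCHW tensors):

```python
import torch
import torch.nn as nn

class GatedFFN(nn.Module):
    """Gated feed-forward network as described in step 4:
    LN -> 1x1 conv (C -> 4C) -> 3x3 depth-wise conv -> channel split
    -> GELU gate -> element-wise product -> 1x1 conv (2C -> C)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.GroupNorm(1, dim)              # LayerNorm across channels for NCHW tensors
        self.expand = nn.Conv2d(dim, dim * 4, 1)
        self.dwconv = nn.Conv2d(dim * 4, dim * 4, 3, padding=1, groups=dim * 4)
        self.project = nn.Conv2d(dim * 2, dim, 1)
        self.act = nn.GELU()

    def forward(self, x):
        y = self.dwconv(self.expand(self.norm(x)))
        y1, y2 = y.chunk(2, dim=1)                    # two 2C-channel halves
        return self.project(self.act(y1) * y2)        # gated product, then channel restore

x = torch.randn(1, 16, 8, 8)
ffn = GatedFFN(16)
print(ffn(x).shape)  # torch.Size([1, 16, 8, 8])
```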

ASSA Block Innovations

  • Adaptive sparsity strategy: multiple sparsity modes are supported through six activation functions (hard Top-K sparsity, MaskedSoftmax masked sparsity, soft GELU sparsity, and so on; ReLU performs best as the selection function), so the strategy can be chosen per task, generalizing better than a single fixed mode. A learnable temperature dynamically adjusts the sharpness of the attention distribution, making the sparse selection more precise and suppressing spurious correlations.

  • Dynamic depth-wise convolution: IDynamicDWConv strengthens local feature dependencies after the QV projection, addressing conventional self-attention's weak capture of local detail and improving the quality of the attended features. The depth-wise design is lightweight, so local enhancement comes at little extra computational cost.

  • Minimal architecture and strong compatibility: the Q = V design removes one feature projection and lowers the parameter count; input and output dimensions match, and any channel count or spatial size is supported, so the block can be dropped directly into CNNs, Transformers, and other vision models at low integration cost.
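The sparsity behavior of the different selection functions can be compared directly. The snippet below is a self-contained sketch that re-implements the selection step on a toy attention matrix, mirroring the MaskedSoftmax and TopK modules in the listing above:

```python
import torch

torch.manual_seed(0)
attn = torch.randn(1, 2, 8, 8)          # a toy (batch, head, C, C) attention matrix

relu_attn = torch.relu(attn)            # soft sparsity: drop negative similarities

mask = attn > 0                          # masked softmax: softmax, then re-zero
msoft = torch.softmax(attn, dim=-1)      # the entries that were non-positive
msoft = torch.where(mask, msoft, torch.zeros_like(msoft))

C = attn.shape[-1]
idx = torch.topk(attn, k=C // 4, dim=-1).indices   # hard sparsity: keep top 25% per row
topk = torch.zeros_like(attn).scatter_(-1, idx, 1.0) * attn

for name, a in [('relu', relu_attn), ('maskedsoftmax', msoft), ('topk', topk)]:
    print(f"{name}: {(a == 0).float().mean():.0%} of entries zeroed")
```

ReLU and MaskedSoftmax zero exactly the entries whose similarity is non-positive, while Top-K enforces a fixed 75% sparsity regardless of the similarity values.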

References

  1. https://mp.weixin.qq.com/s/IBNrcc5aubJiXM52-UtQBQ

  2. https://mp.weixin.qq.com/s/yHYw8BnShWwZ1nO_JKsTBg
