3.29+3.30

The Transformer model has revolutionized natural language processing by introducing the self-attention mechanism. Unlike recurrent neural networks, Transformers process input sequences in parallel, which greatly improves computational efficiency. The self-attention mechanism enables the model to assign different weights to different words in a sentence, capturing long-range dependencies effectively. Multi-head attention further enhances the model's ability to learn diverse representations. In addition, positional encoding is used to incorporate sequence order information. Transformer-based models, such as BERT and GPT, have achieved state-of-the-art performance in tasks like machine translation, text generation, and question answering. Due to their scalability, Transformers have become the backbone of large language models and continue to drive advancements in artificial intelligence.
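The weighting described above can be sketched as scaled dot-product attention, the core operation inside each attention head. This is a minimal NumPy illustration with toy random matrices (the shapes and seed are chosen for the example, not taken from any particular model):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # pairwise token similarities
    scores -= scores.max(axis=-1, keepdims=True) # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ V                           # weighted sum of value vectors

# Toy example: a sequence of 3 tokens with d_k = 4
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one contextualized vector per token
```

Because the softmax weights in each row sum to one, every output row is a convex combination of the value vectors; multi-head attention simply runs several such maps in parallel on learned projections of Q, K, and V.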


Convolutional Neural Networks (CNNs) are widely used in both image classification and dense prediction tasks such as image segmentation. A typical CNN consists of convolutional layers, activation functions, and pooling layers, which progressively extract hierarchical features from input images. Downsampling operations, such as max pooling and strided convolution, are used to reduce spatial resolution and enlarge the receptive field, allowing the network to capture high-level semantic information.
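The downsampling step mentioned above can be illustrated with a plain NumPy 2×2 max pooling (stride 2), which halves the spatial resolution exactly as a pooling layer would; this is an illustrative sketch, not a full CNN layer:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2: keep the max of each 2x2 block."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]                # drop odd edge rows/cols
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)    # a toy 4x4 feature map
pooled = max_pool_2x2(x)
print(pooled)  # [[ 5.  7.]
               #  [13. 15.]]
```

Each output pixel now summarizes a 2×2 input region, which is exactly how pooling enlarges the effective receptive field of later layers.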

In tasks like semantic segmentation, preserving spatial details is crucial. Therefore, upsampling methods, including interpolation and transposed convolution, are employed to restore the resolution of feature maps. Encoder-decoder architectures, such as U-Net, effectively combine low-level and high-level features through skip connections, improving prediction accuracy.
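The decoder step can be sketched the same way: upsample the coarse feature map (here with nearest-neighbour interpolation, the simplest of the methods named above) and fuse it with the encoder's same-resolution map via a skip connection. The concrete arrays and the channel-stacking fusion are illustrative assumptions; U-Net itself concatenates along the channel axis and then convolves:

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling: repeat each pixel along both axes."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])       # low-res, high-level decoder features
skip = np.ones((4, 4))                # encoder features at the target resolution
up = upsample_nearest(coarse)         # (4, 4), resolution restored
fused = np.stack([up, skip], axis=0)  # (2, 4, 4): skip-connection "channels"
print(fused.shape)  # (2, 4, 4)
```

The skip connection lets the decoder recover the fine spatial detail that pooling discarded, while the upsampled path carries the semantic information.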

Despite their success, CNNs still face challenges such as high computational cost and limited ability to model long-range dependencies, which motivates the development of more advanced architectures.

