
Paper: 2412.07236
Code: github.com
The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar slips are hard to avoid; if you spot any, feel free to point them out in the comments! This post is closer to personal notes, so read with caution.
Table of Contents
[1. Takeaways](#1. Takeaways)
[2. Section-by-Section Reading of the Paper](#2. Section-by-Section Reading of the Paper)
[2.1. Abstract](#2.1. Abstract)
[2.2. Introduction](#2.2. Introduction)
[2.3. Method](#2.3. Method)
[2.4. Experiments](#2.4. Experiments)
[2.4.1. Pre-training](#2.4.1. Pre-training)
[2.4.2. Experiment Setup of Downstream BCI Tasks](#2.4.2. Experiment Setup of Downstream BCI Tasks)
[2.4.3. Results](#2.4.3. Results)
[2.5. Conclusion](#2.5. Conclusion)
1. Takeaways
(1)Silence is tonight's Cambridge. (quoting Xu Zhimo's "Farewell to Cambridge"; in other words, I am left speechless)
(2)Writing in English I cannot snark like a keyboard warrior, so I decided to write in Chinese
2. Section-by-Section Reading of the Paper
2.1. Abstract
①The spatial and temporal features of EEG signals are heterogeneous, so they need to be modelled independently
②They propose CBraMod to address these heterogeneous dependencies and the problem of differing EEG data formats across datasets
③Datasets: 12 public datasets covering 10 downstream BCI tasks
criss adj. pretty, stylish; n. (Criss) Criss (a name)
criss-cross adj. crisscrossing, intersecting
2.2. Introduction
①Existing EEG processing methods: (figure omitted)
②The authors state that the correlations across channels and across time points are different in nature, so full global attention is not well suited to EEG signals
③CBraMod is pretrained on Temple University Hospital EEG Corpus (TUEG)
2.3. Method
①Overall framework: (figure omitted)
(1)Patching & Masking
①Input EEG sample: $X \in \mathbb{R}^{C \times T}$, with $C$ channels and $T$ timestamps
②Patch segmentation: for window (patch) length $t$, they reshape $X$ into $X' \in \mathbb{R}^{C \times n \times t}$, with $n = T/t$ patches per channel
③A representation of a patch: $x_{i,j} \in \mathbb{R}^{t}$, the $j$-th patch of the $i$-th channel
④Total number of patches: $N = C \times n$
⑤Mask: sampled from a Bernoulli distribution with proportion $r$ (the mask ratio), where $m_{i,j} \in \{0,1\}$ is the mask indicator of patch $x_{i,j}$
⑥Masked EEG patches: $\tilde{x}_{i,j} = m_{i,j} \cdot x_{[M]} + (1 - m_{i,j}) \cdot x_{i,j}$, where $x_{[M]}$ denotes the mask token and $(1 - m_{i,j}) \cdot x_{i,j}$ denotes the remaining (unmasked) EEG patches (see the sketch after this list)
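To make the patching and masking step concrete, here is a minimal PyTorch sketch, assuming a $(C, T)$ input, a window length of 200 points (1 s at 200 Hz), and a learnable mask token; all function and variable names are mine, not from the released code.

```python
import torch

def patch_and_mask(x, patch_len=200, mask_ratio=0.5, mask_token=None):
    """Split a (C, T) EEG sample into (C, n, t) patches and mask them.

    x          : (C, T) tensor; T must be divisible by patch_len
    mask_token : (patch_len,) tensor that replaces masked patches
    """
    C, T = x.shape
    n = T // patch_len
    patches = x.reshape(C, n, patch_len)                  # (C, n, t)

    # Bernoulli mask indicator m_ij in {0, 1}, 1 = masked
    m = torch.bernoulli(torch.full((C, n), mask_ratio))   # (C, n)

    if mask_token is None:
        mask_token = torch.zeros(patch_len)               # placeholder token
    # x~_ij = m_ij * x_[M] + (1 - m_ij) * x_ij
    masked = m.unsqueeze(-1) * mask_token + (1 - m.unsqueeze(-1)) * patches
    return masked, patches, m.bool()
```

For a 30 s, 19-channel sample at 200 Hz (`x = torch.randn(19, 6000)`), this yields $19 \times 30 = 570$ patches.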
(2)Time-Frequency Patch Encoding
①Time-domain branch: a one-dimensional convolution layer, a group normalization layer, and a GELU activation process the input patch to obtain the time-domain embedding $e^{t}_{i,j} \in \mathbb{R}^{d}$ with dimension $d$
②Frequency-domain branch: a fast Fourier transform (FFT) and a fully-connected layer produce the frequency-domain embedding $e^{f}_{i,j} \in \mathbb{R}^{d}$
③Embedding fusion: $e_{i,j} = e^{t}_{i,j} + e^{f}_{i,j}$, where $e_{i,j}$ is the patch embedding and $E \in \mathbb{R}^{C \times n \times d}$ is the set of all patch embeddings (a sketch follows)
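The sketch below shows one plausible reading of the time-frequency patch encoder: a Conv1d + GroupNorm + GELU time branch and an rfft + fully-connected frequency branch, fused by addition. The kernel size, stride, group count, and pooling are my guesses; only the overall structure comes from the paper.

```python
import torch
import torch.nn as nn

class TimeFreqPatchEncoder(nn.Module):
    """Encode each length-t patch into a d-dim embedding.
    Time branch: Conv1d + GroupNorm + GELU; freq branch: FFT magnitude + FC.
    Conv hyperparameters are illustrative guesses, not the paper's."""
    def __init__(self, patch_len=200, d=200):
        super().__init__()
        self.time_branch = nn.Sequential(
            nn.Conv1d(1, d, kernel_size=49, stride=25, padding=24),
            nn.GroupNorm(4, d),
            nn.GELU(),
            nn.AdaptiveAvgPool1d(1),   # collapse the remaining time axis
        )
        # rfft of a length-t real signal has t//2 + 1 frequency bins
        self.freq_branch = nn.Linear(patch_len // 2 + 1, d)

    def forward(self, patches):                # patches: (C, n, t)
        C, n, t = patches.shape
        flat = patches.reshape(C * n, 1, t)
        e_time = self.time_branch(flat).squeeze(-1)           # (C*n, d)
        spec = torch.fft.rfft(flat.squeeze(1), dim=-1).abs()  # (C*n, t//2+1)
        e_freq = self.freq_branch(spec)                       # (C*n, d)
        return (e_time + e_freq).reshape(C, n, -1)            # (C, n, d)
```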
(3)Asymmetric Conditional Positional Encoding
①ACPE: a convolution layer with an asymmetric $k_1 \times k_2$ kernel and zero padding $(\lfloor k_1/2 \rfloor, \lfloor k_2/2 \rfloor)$ that preserves the grid shape (the authors feel that because the kernel is rectangular it counts as "asymmetric", and that it can attend to spatial and positional information at the same time = =|||. Their solution really is... uh, refreshingly easy to understand)
②A residual-like structure: feed $E$ into the ACPE to get the positional embedding $E_{pos} = \mathrm{ACPE}(E)$, where $E \in \mathbb{R}^{C \times n \times d}$ and $E_{pos} \in \mathbb{R}^{C \times n \times d}$; then add $E$ and $E_{pos}$: $E' = E + E_{pos}$, where $E' \in \mathbb{R}^{C \times n \times d}$ (see the sketch below)
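A sketch of the ACPE as a depth-wise 2D convolution over the $(C, n)$ grid of patch embeddings with a residual connection; the rectangular kernel shape $(3, 7)$ is an assumed example, since I have not checked the exact kernel used in the released code.

```python
import torch
import torch.nn as nn

class ACPE(nn.Module):
    """Asymmetric conditional positional encoding: a depth-wise 2D conv
    over the (channel x patch) grid, followed by a residual add.
    The rectangular kernel shape is an assumption, not the paper's value."""
    def __init__(self, d=200, kernel=(3, 7)):
        super().__init__()
        pad = (kernel[0] // 2, kernel[1] // 2)   # zero padding keeps the shape
        self.conv = nn.Conv2d(d, d, kernel, padding=pad, groups=d)

    def forward(self, E):                        # E: (C, n, d)
        x = E.permute(2, 0, 1).unsqueeze(0)      # (1, d, C, n)
        E_pos = self.conv(x).squeeze(0).permute(1, 2, 0)  # back to (C, n, d)
        return E + E_pos                         # E' = E + ACPE(E)
```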
(4)Criss-Cross Transformer
①Pipeline of the Criss-Cross Transformer Block: (figure omitted)
The input $E'$ above goes through LayerNorm to become $E_{ln}$. I will charitably guess that page limits are why the following formulas and explanations are not especially detailed.
②First, the authors split $E_{ln}$ into two groups along the front half and back half of its channels (by "channels" I mean the last dimension $d$, not the electrode channels $C$). In the upper path, attention is applied to every column of the front-half group: $A_{s,j} = \mathrm{Attn}\big(E^{(1)}_{ln}[:, j, :]\big)$ for $j = 1, \dots, n$. The columns that were split apart are merged back together after attention is applied to each: $A_{s} = [A_{s,1}; \dots; A_{s,n}] \in \mathbb{R}^{C \times n \times d/2}$. The lower path works the same way, except attention is applied to the rows of the back-half group, giving $A_{t}$. Finally, the column-attention and row-attention blocks are concatenated: $A = \mathrm{Concat}(A_{s}, A_{t})$, which ends up in $\mathbb{R}^{C \times n \times d}$. (At this moment I would like to know what exactly is criss-crossed here. The two halves are simply stacked together and feel like completely unrelated things; they are not even interleaved the way two halves of a deck cross during a riffle shuffle. Where is the meaning in this?)
③Why not mention the shape of this output? Why does it never appear in the surrounding context? (a sketch of the block follows)

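Here is how I understand the criss-cross block, as a runnable sketch built on nn.MultiheadAttention: the hidden dimension is split in half, the first half gets attention across electrode channels for each time index ("columns"), the second half gets attention across time for each channel ("rows"), and the two outputs are concatenated. The head count and the use of nn.MultiheadAttention are my choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrissCrossAttention(nn.Module):
    """Spatial attention on the front half of the hidden dim, temporal
    attention on the back half, then concatenation along the hidden dim."""
    def __init__(self, d=200, heads=4):
        super().__init__()
        self.s_attn = nn.MultiheadAttention(d // 2, heads, batch_first=True)
        self.t_attn = nn.MultiheadAttention(d // 2, heads, batch_first=True)

    def forward(self, E):                      # E: (C, n, d), already LayerNormed
        C, n, d = E.shape
        E1, E2 = E[..., : d // 2], E[..., d // 2 :]

        # spatial path: attention over the C axis, one batch entry per column
        s_in = E1.permute(1, 0, 2)             # (n, C, d/2): n batches of C tokens
        A_s, _ = self.s_attn(s_in, s_in, s_in)
        A_s = A_s.permute(1, 0, 2)             # back to (C, n, d/2)

        # temporal path: attention over the n axis, one batch entry per row
        A_t, _ = self.t_attn(E2, E2, E2)       # (C, n, d/2): C batches of n tokens
        return torch.cat([A_s, A_t], dim=-1)   # (C, n, d)
```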
(5)Masked EEG Reconstruction
①"A reconstruction head composed of fully-connected layers"? Did the authors major in literature? I really cannot hold my fire. From now on every paper of mine should call its model "a detector composed of fully-connected layers".
②A fully-connected layer turns the backbone output into the final prediction $\hat{x}_{i,j}$
③The authors really love writing those extremely long expressions: (equations omitted)
④MSE loss over the masked patches: $\mathcal{L} = \frac{1}{|\mathcal{M}|} \sum_{(i,j) \in \mathcal{M}} \left\| \hat{x}_{i,j} - x_{i,j} \right\|_{2}^{2}$ (a sketch follows)
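A sketch of the reconstruction head and loss, assuming (as is standard for masked modelling) that the MSE is computed only on masked patches; `recon_head` and `masked_mse` are my names, not the paper's.

```python
import torch
import torch.nn as nn

# d and patch_len follow the pre-training settings below; the head being a
# plain fully-connected layer is stated in the paper, the rest is glue code.
d, patch_len = 200, 200
recon_head = nn.Linear(d, patch_len)

def masked_mse(features, target_patches, mask):
    """features: (C, n, d) backbone output; target_patches: (C, n, t);
    mask: (C, n) bool, True where the patch was masked."""
    pred = recon_head(features)                       # (C, n, t)
    diff = (pred[mask] - target_patches[mask]) ** 2   # masked patches only
    return diff.mean()
```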
2.4. Experiments
2.4.1. Pre-training
(1)Pre-training Dataset
①Dataset: Temple University Hospital EEG corpus (TUEG)
②Data: 69,652 clinical EEG recordings from 14,987 subjects across 26,846 sessions, with a total duration of 27,062 hours
(2)Preprocessing
①Screening: remove recordings whose total duration is no more than 5 minutes or whose absolute amplitude exceeds 100 µV
②Cropping: drop the first and the last minute of each recording
③Electrode choosing: 19, including Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2
④Band-pass filter: 0.3-75 Hz
⑤Notch filter: 60 Hz
⑥Resampling: 200 Hz
⑦Segmentation: 30 s samples
⑧Normalization: divide amplitudes by 100 µV
⑨Remaining samples: 1,109,545 (a preprocessing sketch follows)
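The preprocessing recipe above, written out as a SciPy sketch; the paper does not state filter orders or the notch Q, so those values are placeholders, and an integer original sampling rate is assumed.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, resample_poly

def preprocess(eeg, fs):
    """Apply the filtering/resampling recipe to a (C, T) array.
    Filter order 4 and Q=30 are my placeholder choices."""
    b, a = butter(4, [0.3, 75.0], btype="bandpass", fs=fs)  # 0.3-75 Hz
    eeg = filtfilt(b, a, eeg, axis=-1)
    b, a = iirnotch(60.0, Q=30.0, fs=fs)                    # 60 Hz notch
    eeg = filtfilt(b, a, eeg, axis=-1)
    eeg = resample_poly(eeg, 200, int(fs), axis=-1)         # resample to 200 Hz
    eeg = eeg / 100.0                                       # normalize by 100 uV
    n = eeg.shape[-1] // (30 * 200)                         # 30 s segments
    return eeg[..., : n * 30 * 200].reshape(eeg.shape[0], n, 30 * 200)
```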
(3)Pre-training Settings
①Patch duration: 1 s (200 data points at 200 Hz)
②Criss-Cross Transformer blocks: 12 layers, with hidden dimension 200, inner (feed-forward) dimension 800, and 8 attention heads
③Batch size: 128
④Optimizer: AdamW
⑤Learning rate: 5e-4
⑥Weight decay: 5e-2
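The listed optimizer settings map directly onto torch.optim.AdamW; `model` here is a stand-in, not CBraMod itself.

```python
import torch

model = torch.nn.Linear(200, 200)  # stand-in for CBraMod
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=5e-2)
batch_size = 128  # per the settings above
```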
2.4.2. Experiment Setup of Downstream BCI Tasks
①Statistics of the downstream datasets: (table omitted)
2.4.3. Results
①Emotion recognition performance: (table omitted)
②Motor imagery classification performance: (table omitted)
③Attention block ablation: (table omitted)
④Positional encoding ablation: (table omitted)
⑤Pre-training ablation: (table omitted)
where 1) w/o pre-training: directly training CBraMod on the downstream datasets; 2) dirty pre-training: pre-training CBraMod on the TUEG corpus without dropping bad samples; 3) clean pre-training: pre-training CBraMod on the TUEG corpus with bad samples dropped.
2.5. Conclusion
~