Paper Reading: REALISE model

REALISE model:

1. Uses multiple encoders to capture the semantic, phonetic, and graphic information of Chinese characters, so that similar characters can be told apart and spelling errors corrected.

2. Then applies a selective modality fusion module to obtain context-aware multimodal representations.

3. Finally, the output layer predicts the probabilities of error corrections.

Encoders:

Semantic encoder:

BERT, which provides rich contextual word representations through unsupervised pretraining on large corpora.

from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')

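A tokenizer is a text-processing tool that splits text into individual units called tokens, such as words, punctuation marks, and numbers. In natural language processing, tokenizers are typically used to split sentences into words or subword units for text analysis and machine learning tasks. Common tokenizers are either rule-based or learned; a learned tokenizer identifies word and phrase boundaries automatically and splits the text into tokens accordingly. For Chinese input, BertTokenizer treats each character as its own token.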
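As a rough illustration of what tokenization produces for Chinese text, here is a minimal character-level sketch. The vocabulary `TOY_VOCAB` and the helpers `toy_tokenize` and `toy_encode` are hypothetical and unrelated to the real `bert-base-chinese` vocabulary; the only behavior borrowed from BERT is that Chinese text is split into single characters and wrapped in `[CLS]` and `[SEP]`.

```python
# Toy vocabulary (hypothetical); a real BERT vocabulary has thousands of entries.
TOY_VOCAB = {"[UNK]": 0, "[CLS]": 1, "[SEP]": 2, "我": 3, "爱": 4, "中": 5, "国": 6}


def toy_tokenize(text):
    """Split text into single characters and add BERT-style boundary tokens."""
    return ["[CLS]"] + [ch for ch in text if not ch.isspace()] + ["[SEP]"]


def toy_encode(text, vocab=TOY_VOCAB):
    """Map each token to its id, falling back to [UNK] for unseen characters."""
    return [vocab.get(tok, vocab["[UNK]"]) for tok in toy_tokenize(text)]


tokens = toy_tokenize("我爱中国")
ids = toy_encode("我爱中华")  # "华" is not in the toy vocabulary
print(tokens)
print(ids)
```

Characters missing from the vocabulary map to `[UNK]`, which is one reason a large pretrained vocabulary matters for spelling correction.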

Phonetic encoder

Pinyin: initial (21) + final (39) + tone (5)

Hierarchical phonetic encoder: a character-level encoder and a sentence-level encoder
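The decomposition above can be sketched as follows. This is a minimal illustration, assuming pinyin is written with a trailing tone digit (for example `zhong1`) and that a missing digit means the neutral tone, encoded here as 0; the helper `split_pinyin` is hypothetical and does not validate the final against the 39 legal finals.

```python
# The 21 pinyin initials; two-letter initials come first so that
# "zh" is matched before "z", "ch" before "c", and "sh" before "s".
INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
]


def split_pinyin(syllable):
    """Return (initial, final, tone) for a tone-numbered pinyin syllable."""
    tone = 0  # assumption: no trailing digit means the neutral tone
    if syllable and syllable[-1].isdigit():
        tone = int(syllable[-1])
        syllable = syllable[:-1]
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):], tone
    # Syllables such as "an" have no initial at all.
    return "", syllable, tone


print(split_pinyin("zhong1"))
print(split_pinyin("an4"))
```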

Character-level encoder

GRU:

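The GRU (Gated Recurrent Unit) is a type of recurrent neural network (RNN). Like the LSTM (Long Short-Term Memory), it was proposed to address problems such as long-term memory and gradients during backpropagation.

GRU and LSTM perform about the same in many cases, so why use the newer GRU (proposed in 2014) rather than the more battle-tested LSTM (proposed in 1997)?
We chose GRU in our experiments because its results are similar to those of LSTM, but it is cheaper to compute.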
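To make the character-level encoder more concrete, here is a minimal NumPy sketch of a GRU cell run over the symbols of one pinyin syllable, taking the last hidden state as the character's phonetic vector. The weights are random and untrained, the sizes are arbitrary, and the names `GRUCell` and `encode_pinyin` are hypothetical. The state update uses the convention h' = (1 - z) * h + z * h_tilde; some libraries swap the roles of z and 1 - z.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """One GRU step with randomly initialized (untrained) weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)

        def init(rows, cols):
            return rng.normal(0.0, 0.1, size=(rows, cols))

        # Input weights W, recurrent weights U, and biases b for the
        # update gate (z), reset gate (r), and candidate state (h).
        self.W = {k: init(hidden_size, input_size) for k in "zrh"}
        self.U = {k: init(hidden_size, hidden_size) for k in "zrh"}
        self.b = {k: np.zeros(hidden_size) for k in "zrh"}

    def step(self, x, h):
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])
        h_tilde = np.tanh(self.W["h"] @ x + self.U["h"] @ (r * h) + self.b["h"])
        # Interpolate between the old state and the candidate state.
        return (1.0 - z) * h + z * h_tilde


def encode_pinyin(symbols, embeddings, cell, hidden_size):
    """Run the GRU over the symbols and return the last hidden state."""
    h = np.zeros(hidden_size)
    for s in symbols:
        h = cell.step(embeddings[s], h)
    return h


rng = np.random.default_rng(1)
symbols = list("zhong1")  # letters plus the tone digit
embeddings = {s: rng.normal(size=4) for s in sorted(set(symbols))}
cell = GRUCell(input_size=4, hidden_size=6)
char_vector = encode_pinyin(symbols, embeddings, cell, hidden_size=6)
print(char_vector.shape)
```

The cell has three weight groups (z, r, h), whereas an LSTM has four, which is one reason the GRU is cheaper to compute.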

Sentence-level encoder: obtains the contextualized phonetic representation of each Chinese character

4-layer Transformer with the same hidden size as the semantic encoder

Because the independent phonetic vectors carry no order information, a positional embedding is added to each vector. The vectors are then packed together and passed through the Transformer layers to compute the contextualized representations in the acoustic modality.
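The steps above (add position information, pack the vectors, contextualize them) can be sketched in NumPy. Two assumptions are made for brevity: sinusoidal positional encodings stand in for whatever positional embedding the model actually uses, and a single self-attention step without learned projections stands in for the 4-layer Transformer. The function names are hypothetical.

```python
import numpy as np


def sinusoidal_positions(seq_len, dim):
    """Sinusoidal positional encodings (assumes an even dim)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(dim)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    enc = np.zeros((seq_len, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])
    enc[:, 1::2] = np.cos(angles[:, 1::2])
    return enc


def self_attention(x):
    """Single-head scaled dot-product attention with no learned weights."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


rng = np.random.default_rng(0)
# Stand-ins for the character-level phonetic vectors of a 5-character sentence.
char_vectors = [rng.normal(size=8) for _ in range(5)]
packed = np.stack(char_vectors) + sinusoidal_positions(5, 8)
contextual = self_attention(packed)
print(contextual.shape)
```

Without the positional term, permuting the characters would only permute the output rows, so the encoder could not tell word orders apart.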

Graphic Encoder

ResNet

Three fonts correspond to the three channels of the character images, whose size is set to 32×32 pixels.
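A small sketch of the input layout described above: three 32×32 renderings of the same character, one per font, stacked as the channels of a single image. Actual font rendering is omitted; the random arrays below are placeholders for the rendered glyphs, and `stack_font_images` is a hypothetical helper.

```python
import numpy as np

IMAGE_SIZE = 32
NUM_FONTS = 3


def stack_font_images(renders):
    """Stack one grayscale render per font into a (3, 32, 32) array."""
    if len(renders) != NUM_FONTS:
        raise ValueError("expected exactly one render per font")
    for img in renders:
        if img.shape != (IMAGE_SIZE, IMAGE_SIZE):
            raise ValueError("each render must be 32x32 pixels")
    return np.stack(renders, axis=0).astype(np.float32)


rng = np.random.default_rng(0)
# Placeholder glyph images; real inputs would be rendered from font files.
renders = [rng.random((IMAGE_SIZE, IMAGE_SIZE)) for _ in range(NUM_FONTS)]
char_image = stack_font_images(renders)
print(char_image.shape)
```

The channel-first layout matches what convolutional networks such as ResNet commonly take as input.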

Selective Modality Fusion Module

Ht, Ha, Hv: the textual, acoustic, and visual representations

The module fuses information from the different modalities.

Selective gate unit: selects how much information from each modality flows into the mixed multimodal representation.

Gate values: computed by a fully-connected layer followed by a sigmoid function.
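The gating idea can be sketched for a single character position as follows. This is an illustration under stated assumptions, not the paper's exact formulation: the input to each gate is taken to be the concatenation of the three modality vectors, and the fused vector is the gate-weighted sum. `selective_fusion` and the parameter dictionaries are hypothetical names.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def selective_fusion(h_t, h_a, h_v, weights, biases):
    """Fuse textual, acoustic, and visual vectors with per-modality gates."""
    # Assumption: every gate sees the concatenation of all three vectors.
    gate_input = np.concatenate([h_t, h_a, h_v])
    fused = np.zeros_like(h_t)
    for name, h in (("t", h_t), ("a", h_a), ("v", h_v)):
        # Fully-connected layer followed by a sigmoid gives values in (0, 1).
        gate = sigmoid(weights[name] @ gate_input + biases[name])
        fused += gate * h
    return fused


dim = 4
rng = np.random.default_rng(0)
h_t, h_a, h_v = (rng.normal(size=dim) for _ in range(3))
# Zero parameters make every gate exactly 0.5, which is easy to check by hand.
weights = {k: np.zeros((dim, 3 * dim)) for k in "tav"}
biases = {k: np.zeros(dim) for k in "tav"}
fused = selective_fusion(h_t, h_a, h_v, weights, biases)
print(fused)
```

With trained parameters the gates differ per dimension and per modality, so the model can lean on the phonetic vector for sound-alike errors and on the graphic vector for look-alike errors.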

Acoustic and Visual Pretraining

aims to learn the acoustic-textual and visual-textual relationships

Phonetic encoder: input method pretraining objective

Graphic encoder: OCR pretraining objective

Data and Metrics

Data: SIGHAN, converted to Simplified Chinese with the OpenCC tool

The model is evaluated at two levels: the detection level and the correction level.
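One way to compute the two levels of metrics is sketched below. It assumes a common sentence-level convention, which may differ in detail from the evaluation script used in the paper: a sentence counts as a correct detection when the predicted error positions match the gold positions exactly, and as a correct correction when the predicted sentence also equals the gold sentence. The sample strings are toy placeholders, not SIGHAN data.

```python
def error_positions(source, other):
    """Indices where `other` differs from `source` (equal lengths assumed)."""
    return {i for i, (s, o) in enumerate(zip(source, other)) if s != o}


def prf(tp, n_pred, n_gold):
    """Precision, recall, and F1 from raw counts."""
    p = tp / n_pred if n_pred else 0.0
    r = tp / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def sentence_level_scores(samples):
    """samples: iterable of (source, gold, prediction) strings."""
    n_pred = n_gold = det_tp = cor_tp = 0
    for source, gold, pred in samples:
        gold_err = error_positions(source, gold)
        pred_err = error_positions(source, pred)
        n_gold += bool(gold_err)  # sentence really contains an error
        n_pred += bool(pred_err)  # model changed something
        if gold_err and pred_err == gold_err:
            det_tp += 1
            if pred == gold:
                cor_tp += 1
    return prf(det_tp, n_pred, n_gold), prf(cor_tp, n_pred, n_gold)


samples = [
    ("abc", "abd", "abd"),  # detected and corrected
    ("abc", "abd", "abe"),  # detected, wrong correction
    ("abc", "abc", "abc"),  # clean sentence left untouched
    ("abc", "abc", "xbc"),  # false alarm on a clean sentence
]
detection, correction = sentence_level_scores(samples)
print(detection)
print(correction)
```

Correction scores can never exceed detection scores under this convention, since every correct correction is also a correct detection.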
