(10) NLP Notes: BERT-Based Text Classification Projects

  • Dataset overview
  • Experiment 1: BERT-based text classification
  • Experiment 2: BERT + TextCNN text classification
  • Experiment 3: BERT + BiLSTM text classification
  • Experiment 4: BERT + RCNN text classification
  • Experiment 5: BERT + DPCNN text classification

Dataset overview:

The text classification dataset used in this post contains ten categories, listed below:

| English | Chinese |
| --- | --- |
| finance | 财经 / 金融 |
| realty | 房地产 |
| stocks | 股票 |
| education | 教育 |
| science | 科学 |
| society | 社会 |
| politics | 政治 |
| sports | 体育 |
| game | 游戏 |
| entertainment | 娱乐 |
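
For reference, a hypothetical sketch of the data layout the Config classes below expect. The file names come from the code; the dataset root directory and the one-example-per-line, tab-separated `text<TAB>label_id` format are assumptions, not something stated in this post:

```python
# Hypothetical preview of the data files referenced by Config (layout assumed).
# class.txt: one class name per line; train/dev/test.txt: "text<TAB>label_id" per line.
with open('THUCNews/data/class.txt', encoding='utf-8') as f:
    classes = [line.strip() for line in f]          # e.g. ['finance', 'realty', ...]

with open('THUCNews/data/train.txt', encoding='utf-8') as f:
    text, label = f.readline().rstrip('\n').split('\t')
    print(classes[int(label)], text)
```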

Experiment 1: BERT-based text classification

Model structure: BERT + fully connected classification head

```python
#!/usr/bin/python
# -*- coding: UTF-8 -*-


import torch
import torch.nn as nn
from pytorch_pretrained import BertModel, BertTokenizer

class Config(object):
    """
    配置参数
    """
    def __init__(self, dataset):
        self.model_name = 'BruceBert'
        # 训练集
        self.train_path = dataset + '/data/train.txt'
        # 测试集
        self.test_path = dataset + '/data/test.txt'
        # 校验集
        self.dev_path = dataset + '/data/dev.txt'
        # dataset
        self.datasetpkl = dataset + '/data/dataset.pkl'
        # 类别
        self.class_list = [ x.strip() for x in open(dataset + '/data/class.txt').readlines()]
        #模型训练结果
        self.save_path = dataset + '/saved_dict/' + self.model_name + '.ckpt'
        # 设备配置
        self.device = torch.device( 'cuda' if torch.cuda.is_available() else 'cpu')

        # 若超过1000bacth效果还没有提升,提前结束训练
        self.require_improvement = 1000

        # 类别数
        self.num_classes = len(self.class_list)
        # epoch数
        self.num_epochs = 200   # 训练轮次
        # batch_size
        self.batch_size = 512    # 批次数目
        # 每句话处理的长度(短填,长切)
        self.pad_size = 32
        # 学习率
        self.learning_rate = 1e-5
        # bert预训练模型位置
        self.bert_path = 'bert_pretrain'
        # bert切词器
        self.tokenizer = BertTokenizer.from_pretrained(self.bert_path)
        # bert隐层层个数
        self.hidden_size = 768


class Model(nn.Module):

    def __init__(self, config):
        super(Model, self).__init__()
        self.bert = BertModel.from_pretrained(config.bert_path)
        for param in self.bert.parameters():
            param.requires_grad = True  # fine-tune BERT: True to fine-tune, False to freeze
        self.fc = nn.Linear(config.hidden_size, config.num_classes)

    def forward(self, x):
        # x = (token_ids, seq_len, mask)
        context = x[0]  # input token ids, shape [batch_size, pad_size]
        mask = x[2]     # attention mask over padding, shape [batch_size, pad_size]
        _, pooled = self.bert(context, attention_mask=mask, output_all_encoded_layers=False)  # pooled: [batch_size, hidden_size]
        out = self.fc(pooled)  # [batch_size, num_classes]
        return out
```

```text
Input IDs (batch_size, seq_len)
        │
        ├── Attention Mask (batch_size, seq_len)
        │
        ▼
┌─────────────────────────────┐
│        BERT Encoder          │
│  - Embedding + Transformer  │
│  - hidden states & CLS pool │
└─────────────────────────────┘
        │
        ▼
   pooled_output (CLS)
   (batch_size, 768)
        │
        ▼
┌─────────────────────────────┐
│    Fully connected layer    │
│   Linear(768 → num_classes) │
└─────────────────────────────┘
        │
        ▼
 logits (batch_size, num_classes)
```
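
A minimal usage sketch, with hedged assumptions: the dataset root (`THUCNews`), the vocabulary size used for the dummy token ids, and the dummy tensors themselves are all placeholders; in the real project the `(token_ids, seq_len, mask)` batches come from the data-loading utilities.

```python
# Hedged sketch: instantiate Config/Model and run one dummy forward pass.
config = Config('THUCNews')                    # dataset root is an assumption
model = Model(config).to(config.device)

batch = 4
ids = torch.randint(0, 21128, (batch, config.pad_size), device=config.device)  # dummy ids; 21128 = bert-base-chinese vocab (assumed)
seq_len = torch.full((batch,), config.pad_size, device=config.device)          # not used by forward, kept for the tuple shape
mask = torch.ones(batch, config.pad_size, dtype=torch.long, device=config.device)

logits = model((ids, seq_len, mask))
print(logits.shape)  # torch.Size([4, 10]) -> (batch_size, num_classes)
```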

Test results:

```text
Test Loss: 0.32, Test Acc:90.42%
Precision, Recall and F1-Score
               precision    recall  f1-score   support

      finance     0.8613    0.9070    0.8836      1000
       realty     0.9127    0.9310    0.9218      1000
       stocks     0.8348    0.8390    0.8369      1000
    education     0.9603    0.9180    0.9387      1000
      science     0.8918    0.8080    0.8478      1000
      society     0.8974    0.9010    0.8992      1000
     politics     0.8900    0.8820    0.8860      1000
       sports     0.9623    0.9710    0.9667      1000
         game     0.9180    0.9520    0.9347      1000
entertainment     0.9156    0.9330    0.9242      1000

     accuracy                         0.9042     10000
    macro avg     0.9044    0.9042    0.9040     10000
 weighted avg     0.9044    0.9042    0.9040     10000

Confusion Matrix
[[907  15  49   3   5   4  10   2   0   5]
 [ 16 931  16   1   3  11   5   6   3   8]
 [ 73  28 839   0  26   2  21   4   5   2]
 [  6   3   1 918   8  26  17   2   5  14]
 [ 16  10  52   3 808  20  18   1  52  20]
 [  7  15   6  13   8 901  25   3  10  12]
 [ 19   7  28  10  14  23 882   3   3  11]
 [  1   2   3   1   2   3   5 971   1  11]
 [  0   3   9   3  23   2   3   2 952   3]
 [  8   6   2   4   9  12   5  15   6 933]]
```
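
The report above follows scikit-learn's format; a hedged sketch of how such a report can be produced (placeholder arrays, not the project's actual evaluation code):

```python
# Sketch: produce a classification report / confusion matrix in the same format.
# y_true / y_pred are placeholders; in practice they are collected over the test set.
import numpy as np
from sklearn import metrics

class_names = ['finance', 'realty', 'stocks', 'education', 'science',
               'society', 'politics', 'sports', 'game', 'entertainment']
y_true = np.random.randint(0, 10, size=10000)
y_pred = np.random.randint(0, 10, size=10000)

print(metrics.classification_report(y_true, y_pred, target_names=class_names, digits=4))
print(metrics.confusion_matrix(y_true, y_pred))
```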

Experiment 2: BERT + TextCNN text classification

TextCNN structure in detail:





Model structure:

```text
Input IDs (batch, seq_len)
Attention Mask
        │
        ▼
┌─────────────────────────────┐
│            BERT             │
│  输出 token embeddings      │
│  (batch, seq_len, 768)      │
└─────────────────────────────┘
        │
        ▼
unsqueeze(1)
(batch, 1, seq_len, 768)
        │
        ▼
┌───────────────┬───────────────┬───────────────┐
│ Conv2d k=2    │ Conv2d k=3    │ Conv2d k=4    │
│ (1→256)       │ (1→256)       │ (1→256)       │
└───────────────┴───────────────┴───────────────┘
        │
        ▼
ReLU + MaxPool1d
        │
        ▼
(each branch output: batch, 256)
        │
        ▼
Concatenate
(batch, 256×3 = 768)
        │
        ▼
Dropout
        │
        ▼
Linear → classification
(batch, num_classes)
```

```python
#!/usr/bin/python
# -*- coding: UTF-8 -*-

import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_pretrained import BertModel, BertTokenizer

class Config(object):

    """配置参数"""
    def __init__(self, dataset):
        # 模型名称
        self.model_name="BruceBertCNN"
        # 训练集
        self.train_path = dataset + '/data/train.txt'
        # 校验集
        self.dev_path = dataset + '/data/dev.txt'
        # 测试集
        self.test_path = dataset + '/data/test.txt'
        #dataset
        self.datasetpkl = dataset + '/data/dataset.pkl'
        # 类别名单
        self.class_list = [ x.strip() for x in open(dataset + '/data/class.txt').readlines()]

        #模型保存路径
        self.save_path = dataset + '/saved_dict/' + self.model_name + '.ckpt'
        # 运行设备
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        # 若超过1000bacth效果还没有提升,提前结束训练
        self.require_improvement= 1000
        # 类别数量
        self.num_classes = len(self.class_list)
        # epoch数
        self.num_epochs = 200
        # batch_size
        self.batch_size = 512
        # 序列长度
        self.pad_size = 32
        # 学习率
        self.learning_rate = 1e-5
        # 预训练位置
        self.bert_path = './bert_pretrain'
        # bert的 tokenizer
        self.tokenizer = BertTokenizer.from_pretrained((self.bert_path))
        # Bert的隐藏层数量
        self.hidden_size = 768
        # 卷积核尺寸
        self.filter_sizes = (2,3,4)
        # 卷积核数量
        self.num_filters = 256
        # droptout
        self.dropout = 0.5

class Model(nn.Module):
    def __init__(self, config):
        super(Model, self).__init__()
        self.bert = BertModel.from_pretrained(config.bert_path)
        for param in self.bert.parameters():
            param.requires_grad = True   # fine-tune BERT: True to fine-tune, False to freeze

        self.convs = nn.ModuleList(
             [nn.Conv2d(in_channels=1, out_channels=config.num_filters, kernel_size=(k, config.hidden_size)) for k in config.filter_sizes]
        )

        self.dropout = nn.Dropout(config.dropout)

        self.fc = nn.Linear(config.num_filters * len(config.filter_sizes), config.num_classes)   # input is the concatenation of the three pooled feature maps

    def conv_and_pool(self, x, conv):
        x = conv(x)       # (batch, 1, 32, 768) -> (batch, 256, 32-k+1, 1)
        x = F.relu(x)
        x = x.squeeze(3)  # (batch, 256, 32-k+1, 1) -> (batch, 256, 32-k+1)
        size = x.size(2)  # e.g. 31 for k=2
        x = F.max_pool1d(x, size)  # (batch, 256, 32-k+1) -> (batch, 256, 1)
        x = x.squeeze(2)  # (batch, 256)
        return x


    def forward(self, x):
        # x = (token_ids, seq_len, mask)
        context = x[0]  # input token ids, shape [batch_size, pad_size]
        mask = x[2]     # attention mask over padding, shape [batch_size, pad_size]
        encoder_out, pooled = self.bert(context, attention_mask=mask, output_all_encoded_layers=False)  # encoder_out: [batch_size, pad_size, hidden_size]
        out = encoder_out.unsqueeze(1)  # add a channel dimension: [batch_size, 1, pad_size, hidden_size]
        out = torch.cat([self.conv_and_pool(out, conv) for conv in self.convs], 1)  # concatenate the three branches: [batch_size, 256 * 3]
        out = self.dropout(out)
        out = self.fc(out)
        return out  # [batch_size, num_classes]




"""

nn.Conv2d(
    in_channels=1,
    out_channels=config.num_filters,   # 例如 256
    kernel_size=(k, config.hidden_size) # (k, 768)
)


假设:
config.hidden_size = 768(BERT-base)
k 是某个 filter size,比如 k = 2, 3, 4
stride = 1(默认)
padding = 0(默认)
dilation = 1(默认)

| k | 输出高度 |
| - | ---- |
| 2 | 31   |
| 3 | 30   |
| 4 | 29   |

你注释里的 31,说明这一层用的是:
k=2

归纳一下,就是二维卷积输出计算公式
nn.Conv2d(in_channels=1, out_channels=config.num_filters, kernel_size=(k, config.hidden_size)) 
"""
```

```text
Test Loss:  0.3, Test Acc:90.34%
Precision, Recall and F1-Score
               precision    recall  f1-score   support

      finance     0.9180    0.8620    0.8891      1000
       realty     0.9083    0.9310    0.9195      1000
       stocks     0.8429    0.8260    0.8343      1000
    education     0.9464    0.9360    0.9412      1000
      science     0.8457    0.8440    0.8448      1000
      society     0.8701    0.9240    0.8962      1000
     politics     0.8866    0.8760    0.8813      1000
       sports     0.9836    0.9620    0.9727      1000
         game     0.9222    0.9360    0.9290      1000
entertainment     0.9133    0.9370    0.9250      1000

     accuracy                         0.9034     10000
    macro avg     0.9037    0.9034    0.9033     10000
 weighted avg     0.9037    0.9034    0.9033     10000

Confusion Matrix
[[862  21  68   5   9  12  14   1   3   5]
 [ 10 931  11   4   8  15   5   1   4  11]
 [ 47  29 826   1  51   7  29   3   4   3]
 [  1   1   1 936   7  25  12   1   3  13]
 [  2   9  40   6 844  23  15   1  40  20]
 [  1  11   1  13   8 924  19   0  10  13]
 [  9  12  27  13  17  28 876   1   5  12]
 [  1   2   2   3   5   7  10 962   0   8]
 [  2   4   3   4  37   5   3   2 936   4]
 [  4   5   1   4  12  16   5   6  10 937]]
Time usage: 0:00:05
Test Loss:  0.3, Test Acc:90.34%
```

Experiment 3: BERT + BiLSTM text classification

```text
Input IDs (batch, seq_len)
Attention Mask
        │
        ▼
┌──────────────────────────────┐
│             BERT             │
│ token embeddings             │
│ (batch, seq_len, 768)        │
└──────────────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│        BiLSTM (2 layers)     │
│ hidden_size = 256 × 2        │
│ (batch, seq_len, 512)        │
└──────────────────────────────┘
        │
        ▼
Take last time step
(batch, 512)
        │
        ▼
Dropout
        │
        ▼
Linear → classification
(batch, num_classes)
```

```python
#!/usr/bin/python
# -*- coding: UTF-8 -*-


import torch
import torch.nn as nn
from pytorch_pretrained import BertModel, BertTokenizer

class Config(object):
    """配置参数"""
    def __init__(self, dataset):
        # 模型名称
        self.model_name = "BruceBertRNN"
        # 训练集
        self.train_path = dataset + '/data/train.txt'
        # 校验集
        self.dev_path = dataset + '/data/dev.txt'
        # 测试集
        self.test_path = dataset + '/data/test.txt'
        # dataset
        self.datasetpkl = dataset + '/data/dataset.pkl'
        # 类别名单
        self.class_list = [x.strip() for x in open(dataset + '/data/class.txt').readlines()]

        # 模型保存路径
        self.save_path = dataset + '/saved_dict/' + self.model_name + '.ckpt'
        # 运行设备
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        # 若超过1000bacth效果还没有提升,提前结束训练
        self.require_improvement = 1000
        # 类别数量
        self.num_classes = len(self.class_list)
        # epoch数
        self.num_epochs = 200
        # batch_size
        self.batch_size = 128
        # 序列长度
        self.pad_size = 32
        # 学习率
        self.learning_rate = 1e-5
        # 预训练位置
        self.bert_path = './bert_pretrain'
        # bert的 tokenizer
        self.tokenizer = BertTokenizer.from_pretrained((self.bert_path))
        # Bert的隐藏层数量
        self.hidden_size = 768
        # RNN隐层层数量
        self.rnn_hidden = 256
        # rnn数量
        self.num_layers = 2
        # droptout
        self.dropout = 0.5

class Model(nn.Module):

    def __init__(self, config):
        super(Model, self).__init__()
        self.bert = BertModel.from_pretrained(config.bert_path)
        for param in self.bert.parameters():
            param.requires_grad = True

        self.lstm = nn.LSTM(config.hidden_size, config.rnn_hidden, config.num_layers, batch_first=True, dropout=config.dropout, bidirectional=True)  # nn.LSTM(input_size, hidden_size, num_layers)

        self.dropout = nn.Dropout(config.dropout)

        self.fc = nn.Linear(config.rnn_hidden * 2, config.num_classes)  # rnn_hidden * 2 because the LSTM is bidirectional

    def forward(self, x):
        # x = (token_ids, seq_len, mask)
        context = x[0]  # input token ids, shape [batch_size, pad_size]
        mask = x[2]     # attention mask over padding, shape [batch_size, pad_size]
        encoder_out, text_cls = self.bert(context, attention_mask=mask, output_all_encoded_layers=False)
        out, _ = self.lstm(encoder_out)  # [batch_size, pad_size, rnn_hidden * 2]
        out = self.dropout(out)
        out = out[:, -1, :]  # take the output of the last time step: [batch_size, rnn_hidden * 2]
        out = self.fc(out)
        return out
```
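
A standalone shape sketch of the BiLSTM head (a dummy tensor stands in for the BERT output; the small batch size is purely illustrative):

```python
import torch
import torch.nn as nn

# Dummy stand-in for encoder_out: (batch, seq_len, bert_hidden)
x = torch.randn(4, 32, 768)

lstm = nn.LSTM(768, 256, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
fc = nn.Linear(256 * 2, 10)

out, _ = lstm(x)            # (4, 32, 512): forward and backward states concatenated
logits = fc(out[:, -1, :])  # last time step -> (4, 10)
print(out.shape, logits.shape)
```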

Experiment 4: BERT + RCNN text classification

A text classification model built from BERT + BiLSTM + max pooling (the RCNN idea). It combines contextual encoding (BERT), sequence modeling (the RNN), and global pooling over time (the CNN/RCNN part).

```text
Input IDs + Attention Mask
        │
        ▼
┌──────────────────────────────┐
│             BERT             │
│ (batch, seq_len, 768)        │
└──────────────────────────────┘
        │
        ▼
┌──────────────────────────────┐
│        BiLSTM (2 layers)     │
│ (batch, seq_len, 512)        │
└──────────────────────────────┘
        │
        ▼
ReLU
(batch, seq_len, 512)
        │
        ▼
Permute → (batch, 512, seq_len)
        │
        ▼
MaxPool1d (kernel = seq_len)
(batch, 512, 1)
        │
        ▼
Squeeze
(batch, 512)
        │
        ▼
Linear → classification
(batch, num_classes)
```

```python
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# WeChat official account: AI壹号堂
# Author: 杨博

import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_pretrained import BertModel, BertTokenizer

class Config(object):
    """配置参数"""
    def __init__(self, dataset):
        # 模型名称
        self.model_name = "BruceBertRCNN"
        # 训练集
        self.train_path = dataset + '/data/train.txt'
        # 校验集
        self.dev_path = dataset + '/data/dev.txt'
        # 测试集
        self.test_path = dataset + '/data/test.txt'
        # dataset
        self.datasetpkl = dataset + '/data/dataset.pkl'
        # 类别名单
        self.class_list = [x.strip() for x in open(dataset + '/data/class.txt').readlines()]

        # 模型保存路径
        self.save_path = dataset + '/saved_dict/' + self.model_name + '.ckpt'
        # 运行设备
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        # 若超过1000bacth效果还没有提升,提前结束训练
        self.require_improvement = 1000
        # 类别数量
        self.num_classes = len(self.class_list)
        # epoch数
        self.num_epochs = 200
        # batch_size
        self.batch_size = 512
        # 序列长度
        self.pad_size = 32
        # 学习率
        self.learning_rate = 1e-5
        # 预训练位置
        self.bert_path = './bert_pretrain'
        # bert的 tokenizer
        self.tokenizer = BertTokenizer.from_pretrained((self.bert_path))
        # Bert的隐藏层数量
        self.hidden_size = 768
        # RNN隐层层数量
        self.rnn_hidden = 256
        # rnn数量
        self.num_layers = 2
        # droptout
        self.dropout = 0.5

class Model(nn.Module):
    def __init__(self, config):
        super(Model, self).__init__()
        self.bert = BertModel.from_pretrained(config.bert_path)
        for param in self.bert.parameters():
            param.requires_grad = True
        self.lstm = nn.LSTM(config.hidden_size, config.rnn_hidden, config.num_layers, bidirectional=True, batch_first=True, dropout=config.dropout)
        self.maxpool = nn.MaxPool1d(config.pad_size)
        self.fc = nn.Linear(config.rnn_hidden*2, config.num_classes)


    def forward(self, x):
        context = x[0]  # input token ids, shape [batch_size, pad_size]
        mask = x[2]     # attention mask over padding
        encoder_out, text_cls = self.bert(context, attention_mask=mask, output_all_encoded_layers=False)
        out, _ = self.lstm(encoder_out)  # [batch_size, pad_size, rnn_hidden * 2]
        out = F.relu(out)
        out = out.permute(0, 2, 1)       # [batch_size, rnn_hidden * 2, pad_size]
        out = self.maxpool(out)          # max over time -> [batch_size, rnn_hidden * 2, 1]
        out = out.squeeze(-1)            # [batch_size, rnn_hidden * 2] (squeeze only the last dim so batch_size=1 is safe)
        out = self.fc(out)
        return out
```
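
A standalone sketch of the max-over-time pooling step, with a dummy tensor standing in for the BiLSTM output:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pad_size, rnn_hidden = 32, 256
x = torch.randn(4, pad_size, rnn_hidden * 2)   # dummy BiLSTM output: (batch, seq_len, 512)

x = F.relu(x).permute(0, 2, 1)                 # (batch, 512, seq_len)
x = nn.MaxPool1d(pad_size)(x).squeeze(-1)      # max over the whole sequence -> (batch, 512)
print(x.shape)                                 # torch.Size([4, 512])
```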

Experiment 5: BERT + DPCNN text classification

```text
Input token ids [batch, seq_len]
         │
         ▼
     BERT encoder
  encoder_out: [batch, seq_len, hidden_size=768]
         │
         ▼
    unsqueeze -> [batch, 1, seq_len, hidden_size]
         │
         ▼
    conv_region (3 x hidden_size, out_channels=250)
    output: [batch, 250, seq_len-3+1, 1]
         │
         ▼
      padd1 + ReLU
         │
         ▼
        conv (3x1)
         │
         ▼
      padd1 + ReLU
         │
         ▼
        conv (3x1)
         │
         ▼
        while seq_len > 2:
            _block:
                ┌─────────────────────┐
                │ padding             │
                │ max_pool (3x1)      │
                │ conv → ReLU         │
                │ conv → ReLU         │
                │ + residual add (px) │
                └─────────────────────┘
         │
         ▼
      squeeze -> [batch, num_filters=250]
         │
         ▼
      fully connected layer fc
         │
         ▼
      output logits [batch, num_classes]
```

```python
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# WeChat official account: AI壹号堂
# Author: 杨博

import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_pretrained import BertModel, BertTokenizer

class Config(object):
    """配置参数"""
    def __init__(self, dataset):
        # 模型名称
        self.model_name = "BruceBertDPCNN"
        # 训练集
        self.train_path = dataset + '/data/train.txt'
        # 校验集
        self.dev_path = dataset + '/data/dev.txt'
        # 测试集
        self.test_path = dataset + '/data/test.txt'
        # dataset
        self.datasetpkl = dataset + '/data/dataset.pkl'
        # 类别名单
        self.class_list = [x.strip() for x in open(dataset + '/data/class.txt').readlines()]

        # 模型保存路径
        self.save_path = dataset + '/saved_dict/' + self.model_name + '.ckpt'
        # 运行设备
        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        # 若超过1000bacth效果还没有提升,提前结束训练
        self.require_improvement = 1000
        # 类别数量
        self.num_classes = len(self.class_list)
        # epoch数
        self.num_epochs = 3
        # batch_size
        self.batch_size = 128
        # 序列长度
        self.pad_size = 32
        # 学习率
        self.learning_rate = 1e-5
        # 预训练位置
        self.bert_path = './bert_pretrain'
        # bert的 tokenizer
        self.tokenizer = BertTokenizer.from_pretrained((self.bert_path))
        # Bert的隐藏层数量
        self.hidden_size = 768
        # RNN隐层层数量
        self.rnn_hidden = 256
        # 卷积核数量
        self.num_filters = 250
        # droptout
        self.dropout = 0.5


class Model(nn.Module):
    def __init__(self, config):
        super(Model, self).__init__()
        self.bert = BertModel.from_pretrained(config.bert_path)
        for param in self.bert.parameters():
            param.requires_grad = True

        self.conv_region = nn.Conv2d(1, config.num_filters, (3, config.hidden_size))

        self.conv = nn.Conv2d(config.num_filters, config.num_filters, (3, 1))

        self.max_pool = nn.MaxPool2d(kernel_size=(3,1), stride=2)

        self.padd1 = nn.ZeroPad2d((0,0,1,1))
        self.padd2 = nn.ZeroPad2d((0,0,0,1))
        self.relu = nn.ReLU()

        self.fc = nn.Linear(config.num_filters, config.num_classes)

    def forward(self, x):
        context = x[0]
        mask = x[2]
        encoder_out, text_cls = self.bert(context, attention_mask = mask, output_all_encoded_layers = False)
        out = encoder_out.unsqueeze(1) #[batch_size, 1, seq_len, embed]
        out = self.conv_region(out) #[batch_size, 250, seq_len-3+1, 1]

        out = self.padd1(out) #[batch_size, 250, seq_len,1]
        out = self.relu(out)
        out = self.conv(out) #[batch_size, 250, seq_len-3+1,1]
        out = self.padd1(out)  # [batch_size, 250, seq_len,1]
        out = self.relu(out)
        out = self.conv(out)  # [batch_size, 250, seq_len-3+1,1]
        while out.size()[2] > 2:
            out = self._block(out)  # repeatedly halve the sequence dimension
        out = out.squeeze()  # [batch_size, num_filters]: the spatial dims have shrunk to 1
        out = self.fc(out)
        return out

    def _block(self, x):
        x = self.padd2(x)      # pad the bottom so pooling keeps an integer size
        px = self.max_pool(x)  # downsample the sequence dimension by roughly half
        x = self.padd1(px)
        x = self.relu(x)
        x = self.conv(x)
        x = self.padd1(x)
        x = self.relu(x)
        x = self.conv(x)
        x = x + px             # residual connection
        return x
```
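
A quick sketch of how the sequence dimension shrinks with pad_size = 32, derived from the layer definitions above (illustrative arithmetic, not logged output):

```python
# conv_region: 32 -> 30; the two padded conv layers keep the height at 30.
h = 32 - 3 + 1
print(h)  # 30
while h > 2:
    # each _block: padd2 adds one row, then MaxPool2d(kernel=(3, 1), stride=2)
    h = (h + 1 - 3) // 2 + 1
    print(h)  # 15, 7, 3, 1 -> the loop stops once h <= 2
```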