Table of Contents
[1. Project Requirements](#1-project-requirements)
[2. Model Analysis](#2-model-analysis)
[2.1 Network Model](#21-network-model)
[2.2 Preparing the Data](#22-preparing-the-data)
[2.3 Bidirectional RNN](#23-bidirectional-rnn)
[3. Code Walkthrough](#3-code-walkthrough)
[4. Complete Code](#4-complete-code)
[5. Results](#5-results)
1. Project Requirements
Classify names by country of origin: the dataset contains several thousand names drawn from 18 countries.

2. Model Analysis
For natural language processing the input is a sequence, and each character is first encoded in one-hot form.
Because a one-hot vector is high-dimensional and sparse, it is usually passed through an embedding layer to obtain a low-dimensional dense vector, as the sketch below illustrates.
The embedded sequence then goes through an RNN, and the result is used for classification.
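A minimal sketch of this step (the sizes are illustrative): an embedding lookup returns the same dense vector as multiplying the one-hot encoding by the embedding weight matrix, but never materializes the sparse one-hot representation.

```python
import torch

vocab_size, embed_dim = 128, 8                      # e.g. the ASCII alphabet
embed = torch.nn.Embedding(vocab_size, embed_dim)

char_idx = torch.LongTensor([ord('J')])             # index form of 'J' (74)
one_hot = torch.nn.functional.one_hot(char_idx, vocab_size).float()

dense_a = embed(char_idx)                           # direct lookup
dense_b = one_hot @ embed.weight                    # explicit one-hot matmul
print(torch.allclose(dense_a, dense_b))             # True
```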

For this task, however, we do not care about the per-step outputs o1, o2, and so on: this is classification of an entire sequence, not a task that rearranges the words of the input. The network can therefore be simplified to use only the final hidden state.

The network structure is described in the subsections below.

2.1 Network Model
Although each input is a single name, names vary in length, so they must be preprocessed before batching.

2.2 Preparing the Data
Each name is converted into a character sequence via the characters' ASCII codes.

Because the sequences have different lengths, they must be padded.

Classification labels: the 18 countries are mapped to integer class indices.
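A minimal sketch of this preprocessing (the names are chosen only for illustration):

```python
import torch

names = ["John", "Li"]
seqs = [[ord(c) for c in name] for name in names]   # "John" -> [74, 111, 104, 110]
lengths = torch.LongTensor([len(s) for s in seqs])

# Zero-pad every sequence up to the longest name in the batch
padded = torch.zeros(len(seqs), lengths.max().item()).long()
for i, s in enumerate(seqs):
    padded[i, :len(s)] = torch.LongTensor(s)

print(padded)
# tensor([[ 74, 111, 104, 110],
#         [ 76, 105,   0,   0]])
```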

2.3 Bidirectional RNN
In an ordinary RNN, the hidden state at step n depends only on the current input and information from earlier steps.

A bidirectional RNN simply runs a second pass over the sequence in reverse, and concatenates the forward and backward hidden states h.
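A small shape sketch (toy sizes, for illustration only): a bidirectional GRU yields one final hidden state per layer and direction, and concatenating the last layer's two directions doubles the feature size.

```python
import torch

seq_len, batch, hidden_size = 5, 3, 4
gru = torch.nn.GRU(input_size=4, hidden_size=hidden_size,
                   num_layers=1, bidirectional=True)

x = torch.randn(seq_len, batch, 4)
output, hidden = gru(x)
print(output.shape)  # torch.Size([5, 3, 8])  -> 2 * hidden_size per time step
print(hidden.shape)  # torch.Size([2, 3, 4])  -> (layers * directions, batch, hidden)

h_cat = torch.cat([hidden[-2], hidden[-1]], dim=1)
print(h_cat.shape)   # torch.Size([3, 8])     -> the concatenated representation
```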

3. Code Walkthrough
This post implements an RNN model based on a bidirectional GRU that classifies names by country.
A detailed, module-by-module walkthrough follows:
1. Environment configuration and device detection
- Environment variable: KMP_DUPLICATE_LIB_OK is set to avoid a conflict when the OpenMP runtime is loaded twice.
- Device selection: the script automatically detects whether a GPU (CUDA) is available and prefers it for computation.
2. Model definition (RNNClassifier)
```python
class RNNClassifier(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1, bidirectional=True):
        super(RNNClassifier, self).__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.n_directions = 2 if bidirectional else 1
        self.embedding = torch.nn.Embedding(input_size, hidden_size)
        self.gru = torch.nn.GRU(hidden_size, hidden_size, n_layers,
                                bidirectional=bidirectional)
        self.fc = torch.nn.Linear(hidden_size * self.n_directions, output_size)

    def _init_hidden(self, batch_size):
        return torch.zeros(self.n_layers * self.n_directions,
                           batch_size,
                           self.hidden_size).to(device)

    def forward(self, input, seq_lengths):
        # Make sure lengths live on the CPU
        seq_lengths = seq_lengths.cpu()  # key fix
        input = input.t()
        batch_size = input.size(1)
        hidden = self._init_hidden(batch_size)
        embedding = self.embedding(input)
        # Pack the padded batch (lengths must be on the CPU)
        packed = pack_padded_sequence(embedding, seq_lengths)
        output, hidden = self.gru(packed, hidden)
        # Merge the bidirectional hidden states
        if self.n_directions == 2:
            hidden = torch.cat([hidden[-1], hidden[-2]], dim=1)
        else:
            hidden = hidden[-1]
        return self.fc(hidden)
```
- Structure:
  - Embedding layer: maps the input ASCII character codes into the hidden space (input_size=128 matches the number of ASCII characters).
  - GRU layer: supports a bidirectional setting (bidirectional=True); n_layers selects the number of stacked layers.
  - Fully connected layer: maps the GRU's final hidden state to the country-classification output dimension.
- Core logic:
  - Forward pass: after the embedding layer, pack_padded_sequence compresses the variable-length batch for efficient computation; the GRU's final hidden states are concatenated (if bidirectional) and the fully connected layer produces the class scores. A toy shape walkthrough follows this list.
  - Hidden-state initialization: the initial hidden state is an all-zero tensor of shape (n_layers * n_directions, batch_size, hidden_size).
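A toy shape walkthrough, assuming the RNNClassifier above has been defined and the global device has been set (as in the complete script below); the sizes are illustrative:

```python
import torch

model = RNNClassifier(input_size=128, hidden_size=6, output_size=18,
                      n_layers=2, bidirectional=True)

inputs = torch.LongTensor([[74, 111, 104, 110],   # "John"
                           [76, 105,   0,   0]])  # "Li", zero-padded
lengths = torch.LongTensor([4, 2])                # already sorted, descending

logits = model(inputs, lengths)
print(logits.shape)  # torch.Size([2, 18]) -> one score vector per name
# Inside forward: hidden is (n_layers * n_directions, 2, 6) = (4, 2, 6);
# concatenating hidden[-1] and hidden[-2] gives (2, 12) before the fc layer.
```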
3. Data loading and processing (NameDataset)
```python
class NameDataset(Dataset):
    def __init__(self, is_train_set=True):
        filename = './data/names_train.csv' if is_train_set else './data/names_test.csv'
        with open(filename, 'r') as f:
            lines = f.read().splitlines()
        self.names = [line.split(',')[0] for line in lines]
        self.countries = [line.split(',')[1] for line in lines]
        self.country_list = sorted(set(self.countries))
        self.country_dict = {c: i for i, c in enumerate(self.country_list)}
        self.n_countries = len(self.country_list)

    def __getitem__(self, index):
        return self.names[index], self.country_dict[self.countries[index]]

    def __len__(self):
        return len(self.names)

    def getCountriesNum(self):
        return self.n_countries

    def idx2country(self, index):
        return self.country_list[index]
```
- Data format: each line of the CSV files is a name,country pair (e.g. "John,USA").
- Functionality:
  - Country encoding: country names are converted to unique integer indices (via country_dict).
  - Dataset interface: inherits from Dataset and implements __getitem__ and __len__, so it works with PyTorch's DataLoader.
  - Helper methods: getCountriesNum returns the number of countries; idx2country maps an index back to a country name. A brief usage sketch follows this list.
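A brief usage sketch, assuming ./data/names_train.csv exists in the format above:

```python
from torch.utils.data import DataLoader

train_set = NameDataset(is_train_set=True)
loader = DataLoader(train_set, batch_size=4, shuffle=True)

names, country_idxs = next(iter(loader))
print(names)         # tuple of 4 raw name strings
print(country_idxs)  # LongTensor of 4 country indices
print(train_set.idx2country(country_idxs[0].item()))  # index -> country name
```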
4. Data preprocessing (name2list and make_tensors)
- Character encoding: name2list converts a name into a list of ASCII codes (e.g. "John" → [74, 111, 104, 110]) and records its length.
- Tensor construction: make_tensors pads the batch into an equal-length tensor and sorts it by sequence length in descending order (pack_padded_sequence expects sorted input by default):
  - Padding: shorter sequences are zero-padded up to the longest name in the batch (the tensor is sized to the batch maximum, so nothing is truncated).
  - Device placement: the data tensors (seq_tensor and countries) are moved to the target device (GPU/CPU), while lengths stays on the CPU, as pack_padded_sequence requires. A packing sketch follows this list.
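A minimal sketch of what packing does, reusing the toy batch from section 2.2: the padded positions are dropped and the batch is stored step by step with a per-step batch size.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded = torch.LongTensor([[74, 111, 104, 110],   # "John"
                           [76, 105,   0,   0]])  # "Li", zero-padded
lengths = torch.LongTensor([4, 2])                # must stay on the CPU

# pack_padded_sequence expects (seq_len, batch, features)
packed = pack_padded_sequence(padded.t().unsqueeze(-1).float(), lengths)
print(packed.batch_sizes)
# tensor([2, 2, 1, 1]) -> both names are active for the first two steps,
# only "John" for the last two
```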
5. Training and testing
```python
def train():
    total_loss = 0
    for i, (names, countries) in enumerate(train_loader, 1):
        inputs, lengths, labels = make_tensors(names, countries)
        outputs = model(inputs, lengths)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        if i % 10 == 0:
            print(f'[{time_since(start)}] Epoch {epoch} '
                  f'[{i * len(inputs)}/{len(train_set)}] '
                  f'loss={total_loss / (i * len(inputs)):.4f}')
    return total_loss / len(train_set)
```
- Training function (train):
  - Forward pass: each batch is fed through the model and the cross-entropy loss is computed.
  - Backpropagation: the optimizer (Adam) updates the parameters; every 10 batches the running average loss is printed.
  - Loss reporting: the accumulated loss is divided by the number of samples processed, keeping values comparable across batch sizes.
- Test function (test):
  - Inference mode: gradient computation is disabled while the test-set accuracy is computed.
  - Output: prints the number of correct samples and the overall accuracy.
```python
def test():
    correct = 0
    with torch.no_grad():
        for names, countries in test_loader:
            inputs, lengths, labels = make_tensors(names, countries)
            outputs = model(inputs, lengths)
            pred = outputs.argmax(dim=1)
            correct += (pred == labels).sum().item()
    acc = correct / len(test_set)
    print(f'Test Accuracy: {correct}/{len(test_set)} ({acc:.2%})\n')
    return acc
```
6. Main program and hyperparameters
- Hyperparameters:
  - HIDDEN_SIZE=128: GRU hidden dimension.
  - BATCH_SIZE=256: batch size.
  - N_LAYERS=2: number of GRU layers.
  - N_EPOCHS=50: number of training epochs.
  - N_CHARS=128: size of the input alphabet (ASCII codes 0-127).
- Training loop:
  - Initialize the model, the cross-entropy loss, and the Adam optimizer (learning rate 0.001).
  - After each epoch, evaluate accuracy on the test set and append it to acc_history.
- Result visualization: matplotlib plots the accuracy curve over the epochs.
7. Key implementation details
- Variable-length sequences: pack_padded_sequence compresses the padded batch so no computation is spent on padding positions.
- Bidirectional GRU output: the final representation is the concatenation of the last layer's forward and backward hidden states (hidden[-2] and hidden[-1]). A small sanity check follows this list.
- Device management: the data tensors (inputs, labels) and the model parameters must live on the same device (GPU/CPU), but lengths must remain on the CPU.
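A small sanity check (toy sizes) that hidden[-2] and hidden[-1] really are the last layer's forward and backward states: hidden is laid out as (layer0-forward, layer0-backward, layer1-forward, layer1-backward), and output exposes the top layer's states per time step.

```python
import torch

gru = torch.nn.GRU(3, 4, num_layers=2, bidirectional=True)
x = torch.randn(5, 1, 3)               # (seq_len, batch, input_size)
output, hidden = gru(x)                # hidden: (4, 1, 4)

# The forward direction ends at the last time step, the backward at the first
print(torch.allclose(output[-1, :, :4], hidden[-2]))  # True (forward)
print(torch.allclose(output[0, :, 4:], hidden[-1]))   # True (backward)
```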
8. Potential improvements
- Learning-rate scheduling: a scheduler such as ReduceLROnPlateau could improve convergence; a sketch combining this with early stopping follows this list.
- Early stopping: terminate training when validation accuracy stops improving, to prevent overfitting.
- Character embedding size: changing the Embedding output dimension (hidden_size here) affects the model's capacity.
- Data augmentation: perturbing names (e.g. inserting or deleting characters) may improve generalization.
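A hedged sketch of the first two ideas, reusing the train()/test() functions, optimizer, and N_EPOCHS from the script below; the patience and factor values are illustrative, not tuned:

```python
import torch

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='max', factor=0.5, patience=3)

best_acc, patience, bad_epochs = 0.0, 8, 0
for epoch in range(1, N_EPOCHS + 1):
    train()
    acc = test()
    scheduler.step(acc)                 # halve the LR when accuracy plateaus
    if acc > best_acc:
        best_acc, bad_epochs = acc, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # early stopping
            print(f'Stopping early at epoch {epoch} (best acc {best_acc:.2%})')
            break
```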
9. Execution flow
- Load the training and test sets.
- Initialize the model and move it to the GPU if available.
- Train for 50 epochs, measuring test accuracy after each epoch.
- Plot the accuracy curve to inspect model performance.
The code implements the complete pipeline from data loading through model definition to training and testing, and suits character-level short-text classification tasks such as name-nationality classification. Adjusting the hyperparameters and model structure adapts it to similar scenarios.
4. Complete Code
Full listing:
```python
import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'  # tolerate a duplicated OpenMP runtime
import torch
import time
import matplotlib.pyplot as plt
import math
from torch.utils.data import DataLoader, Dataset
from torch.nn.utils.rnn import pack_padded_sequence

# Device detection
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")


class RNNClassifier(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers=1, bidirectional=True):
        super(RNNClassifier, self).__init__()
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.n_directions = 2 if bidirectional else 1
        self.embedding = torch.nn.Embedding(input_size, hidden_size)
        self.gru = torch.nn.GRU(hidden_size, hidden_size, n_layers,
                                bidirectional=bidirectional)
        self.fc = torch.nn.Linear(hidden_size * self.n_directions, output_size)

    def _init_hidden(self, batch_size):
        return torch.zeros(self.n_layers * self.n_directions,
                           batch_size,
                           self.hidden_size).to(device)

    def forward(self, input, seq_lengths):
        # Make sure lengths live on the CPU
        seq_lengths = seq_lengths.cpu()  # key fix
        input = input.t()
        batch_size = input.size(1)
        hidden = self._init_hidden(batch_size)
        embedding = self.embedding(input)
        # Pack the padded batch (lengths must be on the CPU)
        packed = pack_padded_sequence(embedding, seq_lengths)
        output, hidden = self.gru(packed, hidden)
        # Merge the bidirectional hidden states
        if self.n_directions == 2:
            hidden = torch.cat([hidden[-1], hidden[-2]], dim=1)
        else:
            hidden = hidden[-1]
        return self.fc(hidden)


class NameDataset(Dataset):
    def __init__(self, is_train_set=True):
        filename = './data/names_train.csv' if is_train_set else './data/names_test.csv'
        with open(filename, 'r') as f:
            lines = f.read().splitlines()
        self.names = [line.split(',')[0] for line in lines]
        self.countries = [line.split(',')[1] for line in lines]
        self.country_list = sorted(set(self.countries))
        self.country_dict = {c: i for i, c in enumerate(self.country_list)}
        self.n_countries = len(self.country_list)

    def __getitem__(self, index):
        return self.names[index], self.country_dict[self.countries[index]]

    def __len__(self):
        return len(self.names)

    def getCountriesNum(self):
        return self.n_countries

    def idx2country(self, index):
        return self.country_list[index]


def name2list(name):
    return [ord(c) for c in name], len(name)


def make_tensors(names, countries):
    # Build the sequences and their lengths
    sequences_and_lengths = [name2list(name) for name in names]
    sequences = [s[0] for s in sequences_and_lengths]
    lengths = torch.LongTensor([s[1] for s in sequences_and_lengths])
    countries = torch.LongTensor(countries)
    # Create the padded tensor
    seq_tensor = torch.zeros(len(sequences), lengths.max()).long()
    for idx, (seq, length) in enumerate(zip(sequences, lengths)):
        seq_tensor[idx, :length] = torch.LongTensor(seq)
    # Sort by length, descending (keep lengths on the CPU)
    lengths, perm_idx = lengths.sort(dim=0, descending=True)
    seq_tensor = seq_tensor[perm_idx].to(device)  # data to the GPU
    countries = countries[perm_idx].to(device)    # data to the GPU
    return seq_tensor, lengths, countries         # lengths stays on the CPU


def time_since(since):
    s = time.time() - since
    m = math.floor(s / 60)
    s -= m * 60
    return f'{m}m {s:.0f}s'


def train():
    total_loss = 0
    for i, (names, countries) in enumerate(train_loader, 1):
        inputs, lengths, labels = make_tensors(names, countries)
        outputs = model(inputs, lengths)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
        if i % 10 == 0:
            print(f'[{time_since(start)}] Epoch {epoch} '
                  f'[{i * len(inputs)}/{len(train_set)}] '
                  f'loss={total_loss / (i * len(inputs)):.4f}')
    return total_loss / len(train_set)


def test():
    correct = 0
    with torch.no_grad():
        for names, countries in test_loader:
            inputs, lengths, labels = make_tensors(names, countries)
            outputs = model(inputs, lengths)
            pred = outputs.argmax(dim=1)
            correct += (pred == labels).sum().item()
    acc = correct / len(test_set)
    print(f'Test Accuracy: {correct}/{len(test_set)} ({acc:.2%})\n')
    return acc


if __name__ == '__main__':
    # Hyperparameters
    HIDDEN_SIZE = 128
    BATCH_SIZE = 256
    N_LAYERS = 2
    N_EPOCHS = 50
    N_CHARS = 128  # number of ASCII characters

    # Data loading
    train_set = NameDataset(is_train_set=True)
    train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
    test_set = NameDataset(is_train_set=False)
    test_loader = DataLoader(test_set, batch_size=BATCH_SIZE)
    N_COUNTRIES = train_set.getCountriesNum()

    # Model initialization
    model = RNNClassifier(N_CHARS, HIDDEN_SIZE, N_COUNTRIES, N_LAYERS).to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    # Training loop
    start = time.time()
    acc_history = []
    for epoch in range(1, N_EPOCHS + 1):
        print(f"=== Epoch {epoch}/{N_EPOCHS} ===")
        train_loss = train()
        val_acc = test()
        acc_history.append(val_acc)

    # Result visualization
    plt.plot(acc_history)
    plt.title('Training Accuracy')
    plt.xlabel('Epoch')
    plt.ylabel('Accuracy')
    plt.show()
```
5. Results
Download: "Time-series English name country classification implemented with an RNN" (CSDN resource)
(Figure: test-accuracy curve over the 50 epochs, produced by the matplotlib code above.)

Log:
```
Using device: cuda
=== Epoch 1/50 ===
[0m 0s] Epoch 1 [2560/13374] loss=0.0082
[0m 0s] Epoch 1 [5120/13374] loss=0.0073
[0m 1s] Epoch 1 [7680/13374] loss=0.0067
[0m 1s] Epoch 1 [10240/13374] loss=0.0063
[0m 1s] Epoch 1 [12800/13374] loss=0.0059
Test Accuracy: 4524/6700 (67.52%)
=== Epoch 2/50 ===
[0m 1s] Epoch 2 [2560/13374] loss=0.0043
[0m 1s] Epoch 2 [5120/13374] loss=0.0040
[0m 2s] Epoch 2 [7680/13374] loss=0.0039
[0m 2s] Epoch 2 [10240/13374] loss=0.0038
[0m 2s] Epoch 2 [12800/13374] loss=0.0037
Test Accuracy: 4995/6700 (74.55%)
=== Epoch 3/50 ===
[0m 2s] Epoch 3 [2560/13374] loss=0.0030
[0m 2s] Epoch 3 [5120/13374] loss=0.0029
[0m 2s] Epoch 3 [7680/13374] loss=0.0029
[0m 3s] Epoch 3 [10240/13374] loss=0.0029
[0m 3s] Epoch 3 [12800/13374] loss=0.0028
Test Accuracy: 5261/6700 (78.52%)
=== Epoch 4/50 ===
[0m 3s] Epoch 4 [2560/13374] loss=0.0025
[0m 3s] Epoch 4 [5120/13374] loss=0.0025
[0m 3s] Epoch 4 [7680/13374] loss=0.0025
[0m 4s] Epoch 4 [10240/13374] loss=0.0024
[0m 4s] Epoch 4 [12800/13374] loss=0.0024
Test Accuracy: 5435/6700 (81.12%)
=== Epoch 5/50 ===
[0m 4s] Epoch 5 [2560/13374] loss=0.0020
[0m 4s] Epoch 5 [5120/13374] loss=0.0021
[0m 4s] Epoch 5 [7680/13374] loss=0.0021
[0m 4s] Epoch 5 [10240/13374] loss=0.0021
[0m 5s] Epoch 5 [12800/13374] loss=0.0020
Test Accuracy: 5544/6700 (82.75%)
=== Epoch 6/50 ===
[0m 5s] Epoch 6 [2560/13374] loss=0.0018
[0m 5s] Epoch 6 [5120/13374] loss=0.0019
[0m 5s] Epoch 6 [7680/13374] loss=0.0018
[0m 5s] Epoch 6 [10240/13374] loss=0.0018
[0m 6s] Epoch 6 [12800/13374] loss=0.0018
Test Accuracy: 5578/6700 (83.25%)
=== Epoch 7/50 ===
[0m 6s] Epoch 7 [2560/13374] loss=0.0015
[0m 6s] Epoch 7 [5120/13374] loss=0.0016
[0m 6s] Epoch 7 [7680/13374] loss=0.0016
[0m 6s] Epoch 7 [10240/13374] loss=0.0016
[0m 7s] Epoch 7 [12800/13374] loss=0.0016
Test Accuracy: 5600/6700 (83.58%)
=== Epoch 8/50 ===
[0m 7s] Epoch 8 [2560/13374] loss=0.0013
[0m 7s] Epoch 8 [5120/13374] loss=0.0014
[0m 7s] Epoch 8 [7680/13374] loss=0.0014
[0m 8s] Epoch 8 [10240/13374] loss=0.0014
[0m 8s] Epoch 8 [12800/13374] loss=0.0014
Test Accuracy: 5565/6700 (83.06%)
=== Epoch 9/50 ===
[0m 8s] Epoch 9 [2560/13374] loss=0.0012
[0m 8s] Epoch 9 [5120/13374] loss=0.0012
[0m 8s] Epoch 9 [7680/13374] loss=0.0012
[0m 9s] Epoch 9 [10240/13374] loss=0.0013
[0m 9s] Epoch 9 [12800/13374] loss=0.0013
Test Accuracy: 5611/6700 (83.75%)
=== Epoch 10/50 ===
[0m 9s] Epoch 10 [2560/13374] loss=0.0011
[0m 9s] Epoch 10 [5120/13374] loss=0.0011
[0m 9s] Epoch 10 [7680/13374] loss=0.0011
[0m 10s] Epoch 10 [10240/13374] loss=0.0011
[0m 10s] Epoch 10 [12800/13374] loss=0.0011
Test Accuracy: 5650/6700 (84.33%)
=== Epoch 11/50 ===
[0m 10s] Epoch 11 [2560/13374] loss=0.0010
[0m 10s] Epoch 11 [5120/13374] loss=0.0010
[0m 11s] Epoch 11 [7680/13374] loss=0.0010
[0m 11s] Epoch 11 [10240/13374] loss=0.0010
[0m 11s] Epoch 11 [12800/13374] loss=0.0010
Test Accuracy: 5653/6700 (84.37%)
=== Epoch 12/50 ===
[0m 11s] Epoch 12 [2560/13374] loss=0.0008
[0m 12s] Epoch 12 [5120/13374] loss=0.0008
[0m 12s] Epoch 12 [7680/13374] loss=0.0008
[0m 12s] Epoch 12 [10240/13374] loss=0.0008
[0m 12s] Epoch 12 [12800/13374] loss=0.0008
Test Accuracy: 5657/6700 (84.43%)
=== Epoch 13/50 ===
[0m 12s] Epoch 13 [2560/13374] loss=0.0008
[0m 13s] Epoch 13 [5120/13374] loss=0.0007
[0m 13s] Epoch 13 [7680/13374] loss=0.0007
[0m 13s] Epoch 13 [10240/13374] loss=0.0007
[0m 13s] Epoch 13 [12800/13374] loss=0.0007
Test Accuracy: 5662/6700 (84.51%)
=== Epoch 14/50 ===
[0m 13s] Epoch 14 [2560/13374] loss=0.0006
[0m 14s] Epoch 14 [5120/13374] loss=0.0006
[0m 14s] Epoch 14 [7680/13374] loss=0.0006
[0m 14s] Epoch 14 [10240/13374] loss=0.0006
[0m 14s] Epoch 14 [12800/13374] loss=0.0006
Test Accuracy: 5659/6700 (84.46%)
=== Epoch 15/50 ===
[0m 14s] Epoch 15 [2560/13374] loss=0.0005
[0m 14s] Epoch 15 [5120/13374] loss=0.0005
[0m 15s] Epoch 15 [7680/13374] loss=0.0005
[0m 15s] Epoch 15 [10240/13374] loss=0.0005
[0m 15s] Epoch 15 [12800/13374] loss=0.0005
Test Accuracy: 5656/6700 (84.42%)
=== Epoch 16/50 ===
[0m 15s] Epoch 16 [2560/13374] loss=0.0004
[0m 15s] Epoch 16 [5120/13374] loss=0.0004
[0m 16s] Epoch 16 [7680/13374] loss=0.0004
[0m 16s] Epoch 16 [10240/13374] loss=0.0005
[0m 16s] Epoch 16 [12800/13374] loss=0.0005
Test Accuracy: 5638/6700 (84.15%)
=== Epoch 17/50 ===
[0m 16s] Epoch 17 [2560/13374] loss=0.0004
[0m 16s] Epoch 17 [5120/13374] loss=0.0004
[0m 16s] Epoch 17 [7680/13374] loss=0.0004
[0m 17s] Epoch 17 [10240/13374] loss=0.0004
[0m 17s] Epoch 17 [12800/13374] loss=0.0004
Test Accuracy: 5596/6700 (83.52%)
=== Epoch 18/50 ===
[0m 17s] Epoch 18 [2560/13374] loss=0.0004
[0m 17s] Epoch 18 [5120/13374] loss=0.0004
[0m 17s] Epoch 18 [7680/13374] loss=0.0004
[0m 18s] Epoch 18 [10240/13374] loss=0.0004
[0m 18s] Epoch 18 [12800/13374] loss=0.0004
Test Accuracy: 5627/6700 (83.99%)
=== Epoch 19/50 ===
[0m 18s] Epoch 19 [2560/13374] loss=0.0003
[0m 18s] Epoch 19 [5120/13374] loss=0.0003
[0m 18s] Epoch 19 [7680/13374] loss=0.0003
[0m 18s] Epoch 19 [10240/13374] loss=0.0003
[0m 19s] Epoch 19 [12800/13374] loss=0.0003
Test Accuracy: 5636/6700 (84.12%)
=== Epoch 20/50 ===
[0m 19s] Epoch 20 [2560/13374] loss=0.0003
[0m 19s] Epoch 20 [5120/13374] loss=0.0003
[0m 19s] Epoch 20 [7680/13374] loss=0.0003
[0m 19s] Epoch 20 [10240/13374] loss=0.0003
[0m 20s] Epoch 20 [12800/13374] loss=0.0003
Test Accuracy: 5622/6700 (83.91%)
=== Epoch 21/50 ===
[0m 20s] Epoch 21 [2560/13374] loss=0.0003
[0m 20s] Epoch 21 [5120/13374] loss=0.0003
[0m 20s] Epoch 21 [7680/13374] loss=0.0003
[0m 20s] Epoch 21 [10240/13374] loss=0.0003
[0m 21s] Epoch 21 [12800/13374] loss=0.0003
Test Accuracy: 5636/6700 (84.12%)
=== Epoch 22/50 ===
[0m 21s] Epoch 22 [2560/13374] loss=0.0003
[0m 21s] Epoch 22 [5120/13374] loss=0.0003
[0m 21s] Epoch 22 [7680/13374] loss=0.0003
[0m 21s] Epoch 22 [10240/13374] loss=0.0003
[0m 21s] Epoch 22 [12800/13374] loss=0.0003
Test Accuracy: 5603/6700 (83.63%)
=== Epoch 23/50 ===
[0m 22s] Epoch 23 [2560/13374] loss=0.0002
[0m 22s] Epoch 23 [5120/13374] loss=0.0002
[0m 22s] Epoch 23 [7680/13374] loss=0.0002
[0m 22s] Epoch 23 [10240/13374] loss=0.0003
[0m 22s] Epoch 23 [12800/13374] loss=0.0003
Test Accuracy: 5590/6700 (83.43%)
=== Epoch 24/50 ===
[0m 23s] Epoch 24 [2560/13374] loss=0.0002
[0m 23s] Epoch 24 [5120/13374] loss=0.0002
[0m 23s] Epoch 24 [7680/13374] loss=0.0002
[0m 23s] Epoch 24 [10240/13374] loss=0.0002
[0m 23s] Epoch 24 [12800/13374] loss=0.0003
Test Accuracy: 5600/6700 (83.58%)
=== Epoch 25/50 ===
[0m 24s] Epoch 25 [2560/13374] loss=0.0002
[0m 24s] Epoch 25 [5120/13374] loss=0.0002
[0m 24s] Epoch 25 [7680/13374] loss=0.0002
[0m 24s] Epoch 25 [10240/13374] loss=0.0002
[0m 24s] Epoch 25 [12800/13374] loss=0.0003
Test Accuracy: 5622/6700 (83.91%)
=== Epoch 26/50 ===
[0m 25s] Epoch 26 [2560/13374] loss=0.0002
[0m 25s] Epoch 26 [5120/13374] loss=0.0002
[0m 25s] Epoch 26 [7680/13374] loss=0.0002
[0m 25s] Epoch 26 [10240/13374] loss=0.0002
[0m 25s] Epoch 26 [12800/13374] loss=0.0002
Test Accuracy: 5605/6700 (83.66%)
=== Epoch 27/50 ===
[0m 26s] Epoch 27 [2560/13374] loss=0.0002
[0m 26s] Epoch 27 [5120/13374] loss=0.0002
[0m 26s] Epoch 27 [7680/13374] loss=0.0002
[0m 26s] Epoch 27 [10240/13374] loss=0.0002
[0m 26s] Epoch 27 [12800/13374] loss=0.0002
Test Accuracy: 5610/6700 (83.73%)
=== Epoch 28/50 ===
[0m 26s] Epoch 28 [2560/13374] loss=0.0002
[0m 27s] Epoch 28 [5120/13374] loss=0.0002
[0m 27s] Epoch 28 [7680/13374] loss=0.0002
[0m 27s] Epoch 28 [10240/13374] loss=0.0002
[0m 27s] Epoch 28 [12800/13374] loss=0.0002
Test Accuracy: 5606/6700 (83.67%)
=== Epoch 29/50 ===
[0m 27s] Epoch 29 [2560/13374] loss=0.0002
[0m 28s] Epoch 29 [5120/13374] loss=0.0002
[0m 28s] Epoch 29 [7680/13374] loss=0.0002
[0m 28s] Epoch 29 [10240/13374] loss=0.0002
[0m 28s] Epoch 29 [12800/13374] loss=0.0002
Test Accuracy: 5621/6700 (83.90%)
=== Epoch 30/50 ===
[0m 28s] Epoch 30 [2560/13374] loss=0.0002
[0m 28s] Epoch 30 [5120/13374] loss=0.0002
[0m 29s] Epoch 30 [7680/13374] loss=0.0002
[0m 29s] Epoch 30 [10240/13374] loss=0.0002
[0m 29s] Epoch 30 [12800/13374] loss=0.0002
Test Accuracy: 5609/6700 (83.72%)
=== Epoch 31/50 ===
[0m 29s] Epoch 31 [2560/13374] loss=0.0002
[0m 29s] Epoch 31 [5120/13374] loss=0.0002
[0m 30s] Epoch 31 [7680/13374] loss=0.0002
[0m 30s] Epoch 31 [10240/13374] loss=0.0002
[0m 30s] Epoch 31 [12800/13374] loss=0.0002
Test Accuracy: 5607/6700 (83.69%)
=== Epoch 32/50 ===
[0m 30s] Epoch 32 [2560/13374] loss=0.0001
[0m 30s] Epoch 32 [5120/13374] loss=0.0002
[0m 30s] Epoch 32 [7680/13374] loss=0.0002
[0m 31s] Epoch 32 [10240/13374] loss=0.0002
[0m 31s] Epoch 32 [12800/13374] loss=0.0002
Test Accuracy: 5614/6700 (83.79%)
=== Epoch 33/50 ===
[0m 31s] Epoch 33 [2560/13374] loss=0.0001
[0m 31s] Epoch 33 [5120/13374] loss=0.0002
[0m 31s] Epoch 33 [7680/13374] loss=0.0002
[0m 32s] Epoch 33 [10240/13374] loss=0.0002
[0m 32s] Epoch 33 [12800/13374] loss=0.0002
Test Accuracy: 5618/6700 (83.85%)
=== Epoch 34/50 ===
[0m 32s] Epoch 34 [2560/13374] loss=0.0002
[0m 32s] Epoch 34 [5120/13374] loss=0.0002
[0m 32s] Epoch 34 [7680/13374] loss=0.0002
[0m 33s] Epoch 34 [10240/13374] loss=0.0002
[0m 33s] Epoch 34 [12800/13374] loss=0.0002
Test Accuracy: 5605/6700 (83.66%)
=== Epoch 35/50 ===
[0m 33s] Epoch 35 [2560/13374] loss=0.0002
[0m 33s] Epoch 35 [5120/13374] loss=0.0002
[0m 33s] Epoch 35 [7680/13374] loss=0.0002
[0m 33s] Epoch 35 [10240/13374] loss=0.0002
[0m 34s] Epoch 35 [12800/13374] loss=0.0002
Test Accuracy: 5610/6700 (83.73%)
=== Epoch 36/50 ===
[0m 34s] Epoch 36 [2560/13374] loss=0.0002
[0m 34s] Epoch 36 [5120/13374] loss=0.0002
[0m 34s] Epoch 36 [7680/13374] loss=0.0002
[0m 34s] Epoch 36 [10240/13374] loss=0.0002
[0m 35s] Epoch 36 [12800/13374] loss=0.0002
Test Accuracy: 5625/6700 (83.96%)
=== Epoch 37/50 ===
[0m 35s] Epoch 37 [2560/13374] loss=0.0001
[0m 35s] Epoch 37 [5120/13374] loss=0.0002
[0m 35s] Epoch 37 [7680/13374] loss=0.0002
[0m 35s] Epoch 37 [10240/13374] loss=0.0002
[0m 35s] Epoch 37 [12800/13374] loss=0.0002
Test Accuracy: 5605/6700 (83.66%)
=== Epoch 38/50 ===
[0m 36s] Epoch 38 [2560/13374] loss=0.0001
[0m 36s] Epoch 38 [5120/13374] loss=0.0001
[0m 36s] Epoch 38 [7680/13374] loss=0.0002
[0m 36s] Epoch 38 [10240/13374] loss=0.0002
[0m 36s] Epoch 38 [12800/13374] loss=0.0002
Test Accuracy: 5610/6700 (83.73%)
=== Epoch 39/50 ===
[0m 37s] Epoch 39 [2560/13374] loss=0.0002
[0m 37s] Epoch 39 [5120/13374] loss=0.0002
[0m 37s] Epoch 39 [7680/13374] loss=0.0002
[0m 37s] Epoch 39 [10240/13374] loss=0.0002
[0m 37s] Epoch 39 [12800/13374] loss=0.0002
Test Accuracy: 5605/6700 (83.66%)
=== Epoch 40/50 ===
[0m 38s] Epoch 40 [2560/13374] loss=0.0002
[0m 38s] Epoch 40 [5120/13374] loss=0.0002
[0m 38s] Epoch 40 [7680/13374] loss=0.0002
[0m 38s] Epoch 40 [10240/13374] loss=0.0002
[0m 38s] Epoch 40 [12800/13374] loss=0.0002
Test Accuracy: 5609/6700 (83.72%)
=== Epoch 41/50 ===
[0m 39s] Epoch 41 [2560/13374] loss=0.0001
[0m 39s] Epoch 41 [5120/13374] loss=0.0002
[0m 39s] Epoch 41 [7680/13374] loss=0.0002
[0m 39s] Epoch 41 [10240/13374] loss=0.0002
[0m 39s] Epoch 41 [12800/13374] loss=0.0002
Test Accuracy: 5605/6700 (83.66%)
=== Epoch 42/50 ===
[0m 40s] Epoch 42 [2560/13374] loss=0.0001
[0m 40s] Epoch 42 [5120/13374] loss=0.0002
[0m 40s] Epoch 42 [7680/13374] loss=0.0002
[0m 40s] Epoch 42 [10240/13374] loss=0.0002
[0m 40s] Epoch 42 [12800/13374] loss=0.0002
Test Accuracy: 5609/6700 (83.72%)
=== Epoch 43/50 ===
[0m 41s] Epoch 43 [2560/13374] loss=0.0001
[0m 41s] Epoch 43 [5120/13374] loss=0.0002
[0m 41s] Epoch 43 [7680/13374] loss=0.0002
[0m 41s] Epoch 43 [10240/13374] loss=0.0002
[0m 41s] Epoch 43 [12800/13374] loss=0.0002
Test Accuracy: 5605/6700 (83.66%)
=== Epoch 44/50 ===
[0m 41s] Epoch 44 [2560/13374] loss=0.0002
[0m 42s] Epoch 44 [5120/13374] loss=0.0002
[0m 42s] Epoch 44 [7680/13374] loss=0.0002
[0m 42s] Epoch 44 [10240/13374] loss=0.0002
[0m 42s] Epoch 44 [12800/13374] loss=0.0002
Test Accuracy: 5594/6700 (83.49%)
=== Epoch 45/50 ===
[0m 42s] Epoch 45 [2560/13374] loss=0.0001
[0m 43s] Epoch 45 [5120/13374] loss=0.0002
[0m 43s] Epoch 45 [7680/13374] loss=0.0002
[0m 43s] Epoch 45 [10240/13374] loss=0.0002
[0m 43s] Epoch 45 [12800/13374] loss=0.0002
Test Accuracy: 5621/6700 (83.90%)
=== Epoch 46/50 ===
[0m 43s] Epoch 46 [2560/13374] loss=0.0002
[0m 44s] Epoch 46 [5120/13374] loss=0.0001
[0m 44s] Epoch 46 [7680/13374] loss=0.0002
[0m 44s] Epoch 46 [10240/13374] loss=0.0002
[0m 44s] Epoch 46 [12800/13374] loss=0.0002
Test Accuracy: 5594/6700 (83.49%)
=== Epoch 47/50 ===
[0m 44s] Epoch 47 [2560/13374] loss=0.0001
[0m 45s] Epoch 47 [5120/13374] loss=0.0002
[0m 45s] Epoch 47 [7680/13374] loss=0.0002
[0m 45s] Epoch 47 [10240/13374] loss=0.0002
[0m 45s] Epoch 47 [12800/13374] loss=0.0002
Test Accuracy: 5624/6700 (83.94%)
=== Epoch 48/50 ===
[0m 45s] Epoch 48 [2560/13374] loss=0.0002
[0m 46s] Epoch 48 [5120/13374] loss=0.0002
[0m 46s] Epoch 48 [7680/13374] loss=0.0002
[0m 46s] Epoch 48 [10240/13374] loss=0.0002
[0m 46s] Epoch 48 [12800/13374] loss=0.0002
Test Accuracy: 5612/6700 (83.76%)
=== Epoch 49/50 ===
[0m 47s] Epoch 49 [2560/13374] loss=0.0001
[0m 47s] Epoch 49 [5120/13374] loss=0.0001
[0m 47s] Epoch 49 [7680/13374] loss=0.0001
[0m 47s] Epoch 49 [10240/13374] loss=0.0002
[0m 47s] Epoch 49 [12800/13374] loss=0.0002
Test Accuracy: 5608/6700 (83.70%)
=== Epoch 50/50 ===
[0m 48s] Epoch 50 [2560/13374] loss=0.0001
[0m 48s] Epoch 50 [5120/13374] loss=0.0001
[0m 48s] Epoch 50 [7680/13374] loss=0.0002
[0m 48s] Epoch 50 [10240/13374] loss=0.0002
[0m 48s] Epoch 50 [12800/13374] loss=0.0002
Test Accuracy: 5603/6700 (83.63%)
```