PyTorch-day09-Model Fine-Tuning-checkpoint

Model Fine-Tuning (fine-tune) - Transfer Learning

  • Fine-tuning with torchvision
  • Fine-tuning with timm
  • Half-precision training

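Background:

  • 1. As deep learning has developed, models have grown ever larger in parameter count, and many open-source models are trained on large datasets such as ImageNet-1k and ImageNet-11k.
  • 2. If your dataset has only a few thousand images, training a large model with tens of millions of parameters on it will almost certainly overfit.
  • 3. Training a large model from scratch therefore means collecting more data. However, collecting and labeling data takes a great deal of time and money, and the cost is often prohibitive.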

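Solution:

  • Apply transfer learning: transfer the knowledge learned on a source dataset to the target dataset.
  • For example, most images in ImageNet have nothing to do with chairs, yet a model trained on that dataset extracts fairly general image features that help identify edges, textures, shapes, and object composition.
  • Model fine-tuning (finetune): find a model that someone else has already trained on a similar task, swap in your own data, and continue training to adjust its parameters.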

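Fine-tuning for different kinds of datasets:

  • Dataset 1 - little data, very high similarity to the source data - In this case we only modify the last few layers, or just the number of output classes of the final softmax layer.

  • Dataset 2 - little data, low similarity - In this case we can freeze the initial layers of the pretrained model (say, the first k layers) and retrain the remaining (n-k) layers. Because the new dataset is less similar to the source data, retraining the higher layers on it matters.

  • Dataset 3 - lots of data, low similarity - With a large dataset, training a neural network is effective. However, because our data differs substantially from the data used to train the pretrained model, the pretrained model's predictions will not be useful. It is therefore best to train the network from scratch on your own data (training from scratch).

  • Dataset 4 - lots of data, high similarity - This is the ideal case, and the pretrained model should be at its most effective. The best approach is to keep the model's architecture and use its pretrained weights as the starting point, then retrain the model on the new data.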
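
The four cases above can be condensed into a small lookup. The following is a minimal sketch, assuming a hypothetical helper named `choose_strategy` that takes two booleans; the function and the wording of its return values are illustrative only and do not come from any library.

```python
def choose_strategy(large_dataset: bool, similar_to_source: bool) -> str:
    """Map the two dataset properties discussed above to a fine-tuning strategy.

    Hypothetical helper for illustration; the returned strings paraphrase
    the four cases in the list above.
    """
    if not large_dataset and similar_to_source:
        return "replace the output layer only"
    if not large_dataset and not similar_to_source:
        return "freeze the first k layers, retrain the remaining n-k layers"
    if large_dataset and not similar_to_source:
        return "train from scratch"
    return "keep architecture and pretrained weights, retrain the model"


# Print the recommendation for every combination of the two properties.
for large in (False, True):
    for similar in (True, False):
        print(f"large={large!s:5} similar={similar!s:5} -> {choose_strategy(large, similar)}")
```

In practice the boundary between "little" and "lots of" data is a judgment call, so the booleans here stand in for a decision the practitioner still has to make.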

What gets fine-tuned?

  • The data source is swapped
  • The K layers are retrained
  • The weights and shapes of the K layers are adjusted

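1. General workflow for model fine-tuning (fine-tune):

  • 1. Pretrain a neural network on a source dataset (such as ImageNet); this is the source model.
  • 2. Create a new neural network, the target model, which copies the full design and parameters of the source model except for the output layer.
  • 3. Add to the target model an output layer whose output size equals the number of classes in the target dataset, and randomly initialize that layer's parameters.
  • 4. Train the target model on the target dataset. The output layer is trained from scratch, while the parameters of all other layers are fine-tuned starting from the source model's parameters.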
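
The four steps can be sketched with a toy network. This is a minimal sketch under stated assumptions: `make_model` is a hypothetical two-layer architecture standing in for a real backbone, and randomly initialized weights stand in for a genuinely pretrained source model.

```python
import copy

import torch
import torch.nn as nn


def make_model(num_classes: int) -> nn.Sequential:
    # Hypothetical toy architecture: one feature layer plus one output layer.
    return nn.Sequential(
        nn.Linear(8, 4),            # index 0: feature layer
        nn.ReLU(),                  # index 1
        nn.Linear(4, num_classes),  # index 2: output layer
    )


# Step 1: the "source model" (random weights stand in for pretraining here).
source_model = make_model(num_classes=1000)

# Step 2: the target model copies the design and parameters of the source model.
target_model = copy.deepcopy(source_model)

# Step 3: replace the output layer with a randomly initialized one
# sized for the target dataset (10 classes in this sketch).
target_model[2] = nn.Linear(4, 10)

# Step 4 would train target_model on the target dataset: the new output layer
# learns from scratch while the copied layers are fine-tuned.
same_features = torch.equal(source_model[0].weight, target_model[0].weight)
print(same_features)
print(target_model(torch.randn(2, 8)).shape)
```

Using `copy.deepcopy` keeps the source model untouched, which is convenient when the same pretrained weights are reused for several target tasks.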


2. Fine-tuning with torchvision

2.1 Instantiating the model

python
import torchvision.models as models
resnet34 = models.resnet34(pretrained=True)

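About the pretrained parameter:

  • 1. True or False decides whether pretrained weights are used. The default is pretrained = False, meaning no pretrained weights are loaded.
  • 2. pretrained = True means the model is loaded with weights pretrained on some dataset.
  • 3. Since torchvision 0.13 the pretrained parameter is deprecated in favor of weights; for resnet34, pretrained=True is currently equivalent to weights=ResNet34_Weights.IMAGENET1K_V1.

Note: if a download is forcibly interrupted, be sure to delete the partial weight file from the download path, otherwise an error will be raised.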

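2.2 Training specific layers

If we are using the model as a feature extractor and only want gradients computed for the newly initialized layers, leaving all other parameters unchanged, we need to freeze those layers by setting requires_grad = False.

python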
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False
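
The function above freezes every parameter. When only the first k layers should stay fixed, freezing can be done by position instead. This is a minimal sketch, assuming a hypothetical `freeze_first_k` helper and a toy `nn.Sequential` in place of a real backbone; `count_trainable` is likewise an illustrative helper.

```python
import torch.nn as nn


def freeze_first_k(model: nn.Module, k: int) -> None:
    # Hypothetical helper: freeze the parameters of the first k child modules.
    for index, child in enumerate(model.children()):
        if index < k:
            for param in child.parameters():
                param.requires_grad = False


def count_trainable(model: nn.Module) -> int:
    # Number of scalar parameters that will still receive gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


toy = nn.Sequential(
    nn.Linear(8, 16),   # 8*16 + 16 = 144 parameters
    nn.ReLU(),
    nn.Linear(16, 16),  # 16*16 + 16 = 272 parameters
    nn.ReLU(),
    nn.Linear(16, 4),   # 16*4 + 4 = 68 parameters
)

before = count_trainable(toy)
freeze_first_k(toy, k=2)  # freezes the first Linear; ReLU has no parameters
after = count_trainable(toy)
print(before, after)
```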

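2.3 Example

  • Using resnet34 as the example, change the 1000-class output to 10 classes while updating only the parameters of the last layer.
  • We first freeze the gradients of the model parameters, then replace the fully connected layer at the output of the model.
python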
import torch
import torch.nn.functional as F
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR
from torch.optim.lr_scheduler import StepLR
import torchvision
from torch.utils.data import Dataset, DataLoader
from torchvision.transforms import transforms
from torch.utils.tensorboard import SummaryWriter
import numpy as np
import torchvision.models as models
from torchinfo import summary
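python
# Hyperparameters
# Batch size
batch_size = 16  # 32, 64 or 128 are also options
# Learning rate for the optimizer
lr = 1e-4
# Number of epochs to run
max_epochs = 2
# Define the device once; call .to(device) on anything that should live on the GPU.
# cuda:0 is the first GPU; a higher index only exists on multi-GPU machines.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")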
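python
# Data loading
# Build the Dataset, using CIFAR-10 as the example
from torchvision import datasets

# data_transform applies transformations to each image, such as flipping, cropping or normalization; define your own as needed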
data_transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))
                   ])


train_cifar_dataset = datasets.CIFAR10('cifar10',train=True, download=False,transform=data_transform)
test_cifar_dataset = datasets.CIFAR10('cifar10',train=False, download=False,transform=data_transform)

# With the Datasets built, DataLoader reads the data in batches
train_loader = torch.utils.data.DataLoader(train_cifar_dataset, 
                                           batch_size=batch_size, num_workers=4, 
                                           shuffle=True, drop_last=True)

test_loader = torch.utils.data.DataLoader(test_cifar_dataset, 
                                         batch_size=batch_size, num_workers=4, 
                                         shuffle=False)
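
The effect of `batch_size` and `drop_last` can be checked without downloading CIFAR-10. This is a minimal sketch, assuming random tensors wrapped in a `TensorDataset` as a stand-in for the real dataset; the sample count of 50 is arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 50 random 3x32x32 "images" with labels in [0, 10), standing in for a real dataset.
images = torch.randn(50, 3, 32, 32)
labels = torch.randint(0, 10, (50,))
dataset = TensorDataset(images, labels)

train_like = DataLoader(dataset, batch_size=16, shuffle=True, drop_last=True)
test_like = DataLoader(dataset, batch_size=16, shuffle=False)

# drop_last=True discards the final incomplete batch (50 = 3 * 16 + 2),
# so the two loaders report a different number of batches.
print(len(train_like), len(test_like))

batch_images, batch_labels = next(iter(train_like))
print(batch_images.shape, batch_labels.shape)
```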
python
# Load the pretrained resnet34 model (the weights are downloaded on first use)
resnet34 = models.resnet34(pretrained=True)
print(resnet34)
D:\Users\xulele\Anaconda3\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
D:\Users\xulele\Anaconda3\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/resnet34-b627a593.pth" to C:\Users\xulele/.cache\torch\hub\checkpoints\resnet34-b627a593.pth
100%|██████████| 83.3M/83.3M [00:10<00:00, 8.57MB/s]

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (3): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (3): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (4): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (5): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
python
# Inspect the model structure
summary(resnet34, (1, 3, 224, 224)) 
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
ResNet                                   [1, 1000]                 --
├─Conv2d: 1-1                            [1, 64, 112, 112]         9,408
├─BatchNorm2d: 1-2                       [1, 64, 112, 112]         128
├─ReLU: 1-3                              [1, 64, 112, 112]         --
├─MaxPool2d: 1-4                         [1, 64, 56, 56]           --
├─Sequential: 1-5                        [1, 64, 56, 56]           --
│    └─BasicBlock: 2-1                   [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-1                  [1, 64, 56, 56]           36,864
│    │    └─BatchNorm2d: 3-2             [1, 64, 56, 56]           128
│    │    └─ReLU: 3-3                    [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-4                  [1, 64, 56, 56]           36,864
│    │    └─BatchNorm2d: 3-5             [1, 64, 56, 56]           128
│    │    └─ReLU: 3-6                    [1, 64, 56, 56]           --
│    └─BasicBlock: 2-2                   [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-7                  [1, 64, 56, 56]           36,864
│    │    └─BatchNorm2d: 3-8             [1, 64, 56, 56]           128
│    │    └─ReLU: 3-9                    [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-10                 [1, 64, 56, 56]           36,864
│    │    └─BatchNorm2d: 3-11            [1, 64, 56, 56]           128
│    │    └─ReLU: 3-12                   [1, 64, 56, 56]           --
│    └─BasicBlock: 2-3                   [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-13                 [1, 64, 56, 56]           36,864
│    │    └─BatchNorm2d: 3-14            [1, 64, 56, 56]           128
│    │    └─ReLU: 3-15                   [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-16                 [1, 64, 56, 56]           36,864
│    │    └─BatchNorm2d: 3-17            [1, 64, 56, 56]           128
│    │    └─ReLU: 3-18                   [1, 64, 56, 56]           --
├─Sequential: 1-6                        [1, 128, 28, 28]          --
│    └─BasicBlock: 2-4                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-19                 [1, 128, 28, 28]          73,728
│    │    └─BatchNorm2d: 3-20            [1, 128, 28, 28]          256
│    │    └─ReLU: 3-21                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-22                 [1, 128, 28, 28]          147,456
│    │    └─BatchNorm2d: 3-23            [1, 128, 28, 28]          256
│    │    └─Sequential: 3-24             [1, 128, 28, 28]          8,448
│    │    └─ReLU: 3-25                   [1, 128, 28, 28]          --
│    └─BasicBlock: 2-5                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-26                 [1, 128, 28, 28]          147,456
│    │    └─BatchNorm2d: 3-27            [1, 128, 28, 28]          256
│    │    └─ReLU: 3-28                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-29                 [1, 128, 28, 28]          147,456
│    │    └─BatchNorm2d: 3-30            [1, 128, 28, 28]          256
│    │    └─ReLU: 3-31                   [1, 128, 28, 28]          --
│    └─BasicBlock: 2-6                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-32                 [1, 128, 28, 28]          147,456
│    │    └─BatchNorm2d: 3-33            [1, 128, 28, 28]          256
│    │    └─ReLU: 3-34                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-35                 [1, 128, 28, 28]          147,456
│    │    └─BatchNorm2d: 3-36            [1, 128, 28, 28]          256
│    │    └─ReLU: 3-37                   [1, 128, 28, 28]          --
│    └─BasicBlock: 2-7                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-38                 [1, 128, 28, 28]          147,456
│    │    └─BatchNorm2d: 3-39            [1, 128, 28, 28]          256
│    │    └─ReLU: 3-40                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-41                 [1, 128, 28, 28]          147,456
│    │    └─BatchNorm2d: 3-42            [1, 128, 28, 28]          256
│    │    └─ReLU: 3-43                   [1, 128, 28, 28]          --
├─Sequential: 1-7                        [1, 256, 14, 14]          --
│    └─BasicBlock: 2-8                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-44                 [1, 256, 14, 14]          294,912
│    │    └─BatchNorm2d: 3-45            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-46                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-47                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-48            [1, 256, 14, 14]          512
│    │    └─Sequential: 3-49             [1, 256, 14, 14]          33,280
│    │    └─ReLU: 3-50                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-9                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-51                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-52            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-53                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-54                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-55            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-56                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-10                  [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-57                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-58            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-59                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-60                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-61            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-62                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-11                  [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-63                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-64            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-65                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-66                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-67            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-68                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-12                  [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-69                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-70            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-71                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-72                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-73            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-74                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-13                  [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-75                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-76            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-77                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-78                 [1, 256, 14, 14]          589,824
│    │    └─BatchNorm2d: 3-79            [1, 256, 14, 14]          512
│    │    └─ReLU: 3-80                   [1, 256, 14, 14]          --
├─Sequential: 1-8                        [1, 512, 7, 7]            --
│    └─BasicBlock: 2-14                  [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-81                 [1, 512, 7, 7]            1,179,648
│    │    └─BatchNorm2d: 3-82            [1, 512, 7, 7]            1,024
│    │    └─ReLU: 3-83                   [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-84                 [1, 512, 7, 7]            2,359,296
│    │    └─BatchNorm2d: 3-85            [1, 512, 7, 7]            1,024
│    │    └─Sequential: 3-86             [1, 512, 7, 7]            132,096
│    │    └─ReLU: 3-87                   [1, 512, 7, 7]            --
│    └─BasicBlock: 2-15                  [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-88                 [1, 512, 7, 7]            2,359,296
│    │    └─BatchNorm2d: 3-89            [1, 512, 7, 7]            1,024
│    │    └─ReLU: 3-90                   [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-91                 [1, 512, 7, 7]            2,359,296
│    │    └─BatchNorm2d: 3-92            [1, 512, 7, 7]            1,024
│    │    └─ReLU: 3-93                   [1, 512, 7, 7]            --
│    └─BasicBlock: 2-16                  [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-94                 [1, 512, 7, 7]            2,359,296
│    │    └─BatchNorm2d: 3-95            [1, 512, 7, 7]            1,024
│    │    └─ReLU: 3-96                   [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-97                 [1, 512, 7, 7]            2,359,296
│    │    └─BatchNorm2d: 3-98            [1, 512, 7, 7]            1,024
│    │    └─ReLU: 3-99                   [1, 512, 7, 7]            --
├─AdaptiveAvgPool2d: 1-9                 [1, 512, 1, 1]            --
├─Linear: 1-10                           [1, 1000]                 513,000
==========================================================================================
Total params: 21,797,672
Trainable params: 21,797,672
Non-trainable params: 0
Total mult-adds (G): 3.66
==========================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 59.82
Params size (MB): 87.19
Estimated Total Size (MB): 147.61
==========================================================================================
python
# Measure the model's accuracy on the test set
def cal_predict_correct(model):
    model = model.to(device)  # the model and the batches must be on the same device
    model.eval()              # use the running BatchNorm statistics during evaluation
    test_total_correct = 0
    with torch.no_grad():     # no gradients are needed for evaluation
        for images, labels in test_loader:
            images = images.to(device)
            labels = labels.to(device)

            outputs = model(images)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
    return test_total_correct
python
total_correct = cal_predict_correct(resnet34)
# The CIFAR-10 test set contains 10,000 images
print("test_total_correct: "+ str(total_correct / 10000))
test_total_correct: 0.1
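
The counting step inside `cal_predict_correct` can be traced on a tiny hand-made batch. This is a minimal sketch; the logits and labels below are made-up values chosen only to show how `argmax(1)` and the comparison with the labels produce the number of correct predictions.

```python
import torch

# Hand-made logits for 4 samples and 3 classes (illustrative values only).
outputs = torch.tensor([
    [2.0, 0.1, 0.3],  # largest logit at index 0
    [0.2, 1.5, 0.1],  # largest logit at index 1
    [0.3, 0.2, 0.9],  # largest logit at index 2
    [1.1, 0.4, 0.2],  # largest logit at index 0
])
labels = torch.tensor([0, 1, 1, 0])

predictions = outputs.argmax(1)                 # predicted class per row
correct = (predictions == labels).sum().item()  # number of matching predictions
accuracy = correct / labels.size(0)
print(correct, accuracy)
```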
python
def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False


# Freeze the gradients of all existing parameters
feature_extract = True
new_model = resnet34  # same object as resnet34, not a copy
set_parameter_requires_grad(new_model, feature_extract)

# Modify the model
# The new fc layer is trainable by default, so during training only its parameters are updated
num_ftrs = new_model.fc.in_features
new_model.fc = nn.Linear(in_features=num_ftrs, out_features=10, bias=True)
python
summary(new_model, (1, 3, 224, 224)) 
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
ResNet                                   [1, 10]                   --
├─Conv2d: 1-1                            [1, 64, 112, 112]         (9,408)
├─BatchNorm2d: 1-2                       [1, 64, 112, 112]         (128)
├─ReLU: 1-3                              [1, 64, 112, 112]         --
├─MaxPool2d: 1-4                         [1, 64, 56, 56]           --
├─Sequential: 1-5                        [1, 64, 56, 56]           --
│    └─BasicBlock: 2-1                   [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-1                  [1, 64, 56, 56]           (36,864)
│    │    └─BatchNorm2d: 3-2             [1, 64, 56, 56]           (128)
│    │    └─ReLU: 3-3                    [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-4                  [1, 64, 56, 56]           (36,864)
│    │    └─BatchNorm2d: 3-5             [1, 64, 56, 56]           (128)
│    │    └─ReLU: 3-6                    [1, 64, 56, 56]           --
│    └─BasicBlock: 2-2                   [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-7                  [1, 64, 56, 56]           (36,864)
│    │    └─BatchNorm2d: 3-8             [1, 64, 56, 56]           (128)
│    │    └─ReLU: 3-9                    [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-10                 [1, 64, 56, 56]           (36,864)
│    │    └─BatchNorm2d: 3-11            [1, 64, 56, 56]           (128)
│    │    └─ReLU: 3-12                   [1, 64, 56, 56]           --
│    └─BasicBlock: 2-3                   [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-13                 [1, 64, 56, 56]           (36,864)
│    │    └─BatchNorm2d: 3-14            [1, 64, 56, 56]           (128)
│    │    └─ReLU: 3-15                   [1, 64, 56, 56]           --
│    │    └─Conv2d: 3-16                 [1, 64, 56, 56]           (36,864)
│    │    └─BatchNorm2d: 3-17            [1, 64, 56, 56]           (128)
│    │    └─ReLU: 3-18                   [1, 64, 56, 56]           --
├─Sequential: 1-6                        [1, 128, 28, 28]          --
│    └─BasicBlock: 2-4                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-19                 [1, 128, 28, 28]          (73,728)
│    │    └─BatchNorm2d: 3-20            [1, 128, 28, 28]          (256)
│    │    └─ReLU: 3-21                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-22                 [1, 128, 28, 28]          (147,456)
│    │    └─BatchNorm2d: 3-23            [1, 128, 28, 28]          (256)
│    │    └─Sequential: 3-24             [1, 128, 28, 28]          (8,448)
│    │    └─ReLU: 3-25                   [1, 128, 28, 28]          --
│    └─BasicBlock: 2-5                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-26                 [1, 128, 28, 28]          (147,456)
│    │    └─BatchNorm2d: 3-27            [1, 128, 28, 28]          (256)
│    │    └─ReLU: 3-28                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-29                 [1, 128, 28, 28]          (147,456)
│    │    └─BatchNorm2d: 3-30            [1, 128, 28, 28]          (256)
│    │    └─ReLU: 3-31                   [1, 128, 28, 28]          --
│    └─BasicBlock: 2-6                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-32                 [1, 128, 28, 28]          (147,456)
│    │    └─BatchNorm2d: 3-33            [1, 128, 28, 28]          (256)
│    │    └─ReLU: 3-34                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-35                 [1, 128, 28, 28]          (147,456)
│    │    └─BatchNorm2d: 3-36            [1, 128, 28, 28]          (256)
│    │    └─ReLU: 3-37                   [1, 128, 28, 28]          --
│    └─BasicBlock: 2-7                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-38                 [1, 128, 28, 28]          (147,456)
│    │    └─BatchNorm2d: 3-39            [1, 128, 28, 28]          (256)
│    │    └─ReLU: 3-40                   [1, 128, 28, 28]          --
│    │    └─Conv2d: 3-41                 [1, 128, 28, 28]          (147,456)
│    │    └─BatchNorm2d: 3-42            [1, 128, 28, 28]          (256)
│    │    └─ReLU: 3-43                   [1, 128, 28, 28]          --
├─Sequential: 1-7                        [1, 256, 14, 14]          --
│    └─BasicBlock: 2-8                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-44                 [1, 256, 14, 14]          (294,912)
│    │    └─BatchNorm2d: 3-45            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-46                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-47                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-48            [1, 256, 14, 14]          (512)
│    │    └─Sequential: 3-49             [1, 256, 14, 14]          (33,280)
│    │    └─ReLU: 3-50                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-9                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-51                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-52            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-53                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-54                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-55            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-56                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-10                  [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-57                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-58            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-59                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-60                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-61            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-62                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-11                  [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-63                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-64            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-65                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-66                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-67            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-68                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-12                  [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-69                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-70            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-71                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-72                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-73            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-74                   [1, 256, 14, 14]          --
│    └─BasicBlock: 2-13                  [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-75                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-76            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-77                   [1, 256, 14, 14]          --
│    │    └─Conv2d: 3-78                 [1, 256, 14, 14]          (589,824)
│    │    └─BatchNorm2d: 3-79            [1, 256, 14, 14]          (512)
│    │    └─ReLU: 3-80                   [1, 256, 14, 14]          --
├─Sequential: 1-8                        [1, 512, 7, 7]            --
│    └─BasicBlock: 2-14                  [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-81                 [1, 512, 7, 7]            (1,179,648)
│    │    └─BatchNorm2d: 3-82            [1, 512, 7, 7]            (1,024)
│    │    └─ReLU: 3-83                   [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-84                 [1, 512, 7, 7]            (2,359,296)
│    │    └─BatchNorm2d: 3-85            [1, 512, 7, 7]            (1,024)
│    │    └─Sequential: 3-86             [1, 512, 7, 7]            (132,096)
│    │    └─ReLU: 3-87                   [1, 512, 7, 7]            --
│    └─BasicBlock: 2-15                  [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-88                 [1, 512, 7, 7]            (2,359,296)
│    │    └─BatchNorm2d: 3-89            [1, 512, 7, 7]            (1,024)
│    │    └─ReLU: 3-90                   [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-91                 [1, 512, 7, 7]            (2,359,296)
│    │    └─BatchNorm2d: 3-92            [1, 512, 7, 7]            (1,024)
│    │    └─ReLU: 3-93                   [1, 512, 7, 7]            --
│    └─BasicBlock: 2-16                  [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-94                 [1, 512, 7, 7]            (2,359,296)
│    │    └─BatchNorm2d: 3-95            [1, 512, 7, 7]            (1,024)
│    │    └─ReLU: 3-96                   [1, 512, 7, 7]            --
│    │    └─Conv2d: 3-97                 [1, 512, 7, 7]            (2,359,296)
│    │    └─BatchNorm2d: 3-98            [1, 512, 7, 7]            (1,024)
│    │    └─ReLU: 3-99                   [1, 512, 7, 7]            --
├─AdaptiveAvgPool2d: 1-9                 [1, 512, 1, 1]            --
├─Linear: 1-10                           [1, 10]                   5,130
==========================================================================================
Total params: 21,289,802
Trainable params: 5,130
Non-trainable params: 21,284,672
Total mult-adds (G): 3.66
==========================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 59.81
Params size (MB): 85.16
Estimated Total Size (MB): 145.57
==========================================================================================
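The totals in the summary can be checked by hand. The sketch below is only a sanity check on the figures printed above. It assumes float32 parameters (4 bytes each) and that the table's "MB" means 10^6 bytes, which is consistent with the numbers shown. The helper name `linear_param_count` is made up for this illustration.

```python
# Figures copied from the summary table above.
FC_IN_FEATURES = 512        # channels leaving AdaptiveAvgPool2d
NUM_CLASSES = 10            # outputs of the replaced Linear layer
NON_TRAINABLE = 21_284_672  # frozen backbone parameters
BYTES_PER_FLOAT32 = 4       # assumption: weights are stored as float32


def linear_param_count(in_features: int, out_features: int) -> int:
    """Weights (in * out) plus one bias per output unit."""
    return in_features * out_features + out_features


trainable = linear_param_count(FC_IN_FEATURES, NUM_CLASSES)
total = trainable + NON_TRAINABLE
params_mb = total * BYTES_PER_FLOAT32 / 1e6   # assumption: 1 MB = 10^6 bytes
trainable_share = trainable / total

print(trainable)             # 5130
print(total)                 # 21289802
print(round(params_mb, 2))   # 85.16
```

Only the new fully connected layer is trainable, which is well under 0.1% of all parameters. This is why freezing the backbone makes each optimizer step cheap.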
python
# Training & validation
import numpy as np
import torch
import torch.nn as nn

# Pick the device first, then move the model to it
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
Resnet34_new = new_model.to(device)

# Loss function: cross-entropy (averaged over the batch by default)
criterion = nn.CrossEntropyLoss()
# Optimizer: frozen parameters receive no gradients, so only the new fc layer is updated
optimizer = torch.optim.Adam(Resnet34_new.parameters(), lr=lr)
epoch = max_epochs

total_step = len(train_loader)
train_all_loss = []
test_all_loss = []

for i in range(epoch):
    # ---- training ----
    Resnet34_new.train()
    train_total_loss = 0
    train_total_num = 0
    train_total_correct = 0

    for step, (images, labels) in enumerate(train_loader):
        images = images.to(device)
        labels = labels.to(device)

        # forward pass
        outputs = Resnet34_new(images)
        loss = criterion(outputs, labels)
        train_total_correct += (outputs.argmax(1) == labels).sum().item()

        # backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        train_total_num += labels.shape[0]
        train_total_loss += loss.item()
        # Note: criterion already returns the batch mean; the extra division by the
        # batch size is kept so the printed value matches the log below.
        print("Epoch [{}/{}], Iter [{}/{}], train_loss:{:.6f}".format(
            i + 1, epoch, step + 1, total_step, loss.item() / labels.shape[0]))

    # ---- validation (no gradients needed) ----
    Resnet34_new.eval()
    test_total_loss = 0
    test_total_correct = 0
    test_total_num = 0
    with torch.no_grad():
        for step, (images, labels) in enumerate(test_loader):
            images = images.to(device)
            labels = labels.to(device)

            outputs = Resnet34_new(images)
            loss = criterion(outputs, labels)
            test_total_correct += (outputs.argmax(1) == labels).sum().item()
            test_total_loss += loss.item()
            test_total_num += labels.shape[0]

    print("Epoch [{}/{}], train_loss:{:.4f}, train_acc:{:.4f}%, test_loss:{:.4f}, test_acc:{:.4f}%".format(
        i + 1, epoch,
        train_total_loss / train_total_num, train_total_correct / train_total_num * 100,
        test_total_loss / test_total_num, test_total_correct / test_total_num * 100))
    train_all_loss.append(np.round(train_total_loss / train_total_num, 4))
    test_all_loss.append(np.round(test_total_loss / test_total_num, 4))
Epoch [1/2], Iter [1/3125], train_loss:0.150127
Epoch [1/2], Iter [2/3125], train_loss:0.174470
Epoch [1/2], Iter [3/3125], train_loss:0.165727
Epoch [1/2], Iter [4/3125], train_loss:0.174811
Epoch [1/2], Iter [5/3125], train_loss:0.158658
Epoch [1/2], Iter [6/3125], train_loss:0.153260
Epoch [1/2], Iter [7/3125], train_loss:0.164495
Epoch [1/2], Iter [8/3125], train_loss:0.164485
Epoch [1/2], Iter [9/3125], train_loss:0.157202
Epoch [1/2], Iter [10/3125], train_loss:0.149555
Epoch [1/2], Iter [11/3125], train_loss:0.172609
Epoch [1/2], Iter [12/3125], train_loss:0.180861
Epoch [1/2], Iter [13/3125], train_loss:0.156719
Epoch [1/2], Iter [14/3125], train_loss:0.172375
Epoch [1/2], Iter [15/3125], train_loss:0.169886
Epoch [1/2], Iter [16/3125], train_loss:0.148726
Epoch [1/2], Iter [17/3125], train_loss:0.160391
Epoch [1/2], Iter [18/3125], train_loss:0.160285
Epoch [1/2], Iter [19/3125], train_loss:0.167672
Epoch [1/2], Iter [20/3125], train_loss:0.151213
Epoch [1/2], Iter [21/3125], train_loss:0.154690
Epoch [1/2], Iter [22/3125], train_loss:0.155165
Epoch [1/2], Iter [23/3125], train_loss:0.162777
Epoch [1/2], Iter [24/3125], train_loss:0.169136
Epoch [1/2], Iter [25/3125], train_loss:0.151533
Epoch [1/2], Iter [26/3125], train_loss:0.168992
Epoch [1/2], Iter [27/3125], train_loss:0.176258
Epoch [1/2], Iter [28/3125], train_loss:0.162240
Epoch [1/2], Iter [29/3125], train_loss:0.161768
Epoch [1/2], Iter [30/3125], train_loss:0.165359
Epoch [1/2], Iter [31/3125], train_loss:0.166174
Epoch [1/2], Iter [32/3125], train_loss:0.173654
Epoch [1/2], Iter [33/3125], train_loss:0.162488
Epoch [1/2], Iter [34/3125], train_loss:0.164815
Epoch [1/2], Iter [35/3125], train_loss:0.154411
Epoch [1/2], Iter [36/3125], train_loss:0.159386
Epoch [1/2], Iter [37/3125], train_loss:0.176261
Epoch [1/2], Iter [38/3125], train_loss:0.163848
Epoch [1/2], Iter [39/3125], train_loss:0.174402
Epoch [1/2], Iter [40/3125], train_loss:0.178917
Epoch [1/2], Iter [41/3125], train_loss:0.149938
Epoch [1/2], Iter [42/3125], train_loss:0.156186
Epoch [1/2], Iter [43/3125], train_loss:0.162950
Epoch [1/2], Iter [44/3125], train_loss:0.169058
Epoch [1/2], Iter [45/3125], train_loss:0.168587
Epoch [1/2], Iter [46/3125], train_loss:0.173754
Epoch [1/2], Iter [47/3125], train_loss:0.158612
Epoch [1/2], Iter [48/3125], train_loss:0.163891
Epoch [1/2], Iter [49/3125], train_loss:0.149220
Epoch [1/2], Iter [50/3125], train_loss:0.175387
Epoch [1/2], Iter [51/3125], train_loss:0.163082
Epoch [1/2], Iter [52/3125], train_loss:0.156597
Epoch [1/2], Iter [53/3125], train_loss:0.179248
Epoch [1/2], Iter [54/3125], train_loss:0.170053
Epoch [1/2], Iter [55/3125], train_loss:0.140899
Epoch [1/2], Iter [56/3125], train_loss:0.168686
Epoch [1/2], Iter [57/3125], train_loss:0.189548
Epoch [1/2], Iter [58/3125], train_loss:0.169847
Epoch [1/2], Iter [59/3125], train_loss:0.171854
Epoch [1/2], Iter [60/3125], train_loss:0.175660
Epoch [1/2], Iter [61/3125], train_loss:0.163686
Epoch [1/2], Iter [62/3125], train_loss:0.174950
Epoch [1/2], Iter [63/3125], train_loss:0.173237
Epoch [1/2], Iter [64/3125], train_loss:0.146743
Epoch [1/2], Iter [65/3125], train_loss:0.159798
Epoch [1/2], Iter [66/3125], train_loss:0.169616
Epoch [1/2], Iter [67/3125], train_loss:0.167541
Epoch [1/2], Iter [68/3125], train_loss:0.136470
Epoch [1/2], Iter [69/3125], train_loss:0.185080
Epoch [1/2], Iter [70/3125], train_loss:0.166373
Epoch [1/2], Iter [71/3125], train_loss:0.160634
Epoch [1/2], Iter [72/3125], train_loss:0.163522
Epoch [1/2], Iter [73/3125], train_loss:0.157858
Epoch [1/2], Iter [74/3125], train_loss:0.157069
Epoch [1/2], Iter [75/3125], train_loss:0.183969
Epoch [1/2], Iter [76/3125], train_loss:0.166041
Epoch [1/2], Iter [77/3125], train_loss:0.151215
Epoch [1/2], Iter [78/3125], train_loss:0.164155
Epoch [1/2], Iter [79/3125], train_loss:0.158990
Epoch [1/2], Iter [80/3125], train_loss:0.178859
Epoch [1/2], Iter [81/3125], train_loss:0.139378
Epoch [1/2], Iter [82/3125], train_loss:0.150422
Epoch [1/2], Iter [83/3125], train_loss:0.155447
Epoch [1/2], Iter [84/3125], train_loss:0.146703
Epoch [1/2], Iter [85/3125], train_loss:0.165099
Epoch [1/2], Iter [86/3125], train_loss:0.175539
Epoch [1/2], Iter [87/3125], train_loss:0.178613
Epoch [1/2], Iter [88/3125], train_loss:0.169430
Epoch [1/2], Iter [89/3125], train_loss:0.160620
Epoch [1/2], Iter [90/3125], train_loss:0.172726
Epoch [1/2], Iter [91/3125], train_loss:0.139834
Epoch [1/2], Iter [92/3125], train_loss:0.162758
Epoch [1/2], Iter [93/3125], train_loss:0.160110
Epoch [1/2], Iter [94/3125], train_loss:0.176203
Epoch [1/2], Iter [95/3125], train_loss:0.170835
Epoch [1/2], Iter [96/3125], train_loss:0.166727
Epoch [1/2], Iter [97/3125], train_loss:0.175421
Epoch [1/2], Iter [98/3125], train_loss:0.173413
Epoch [1/2], Iter [99/3125], train_loss:0.154259
Epoch [1/2], Iter [100/3125], train_loss:0.146670
Epoch [1/2], Iter [101/3125], train_loss:0.161012
Epoch [1/2], Iter [102/3125], train_loss:0.151979
Epoch [1/2], Iter [103/3125], train_loss:0.163212
Epoch [1/2], Iter [104/3125], train_loss:0.174235
Epoch [1/2], Iter [105/3125], train_loss:0.152968
Epoch [1/2], Iter [106/3125], train_loss:0.156215
Epoch [1/2], Iter [107/3125], train_loss:0.164557
Epoch [1/2], Iter [108/3125], train_loss:0.144438
Epoch [1/2], Iter [109/3125], train_loss:0.168143
Epoch [1/2], Iter [110/3125], train_loss:0.144444
Epoch [1/2], Iter [111/3125], train_loss:0.153808
Epoch [1/2], Iter [112/3125], train_loss:0.172484
Epoch [1/2], Iter [113/3125], train_loss:0.168573
Epoch [1/2], Iter [114/3125], train_loss:0.157955
Epoch [1/2], Iter [115/3125], train_loss:0.170679
Epoch [1/2], Iter [116/3125], train_loss:0.150308
Epoch [1/2], Iter [117/3125], train_loss:0.152166
Epoch [1/2], Iter [118/3125], train_loss:0.175642
Epoch [1/2], Iter [119/3125], train_loss:0.167162
Epoch [1/2], Iter [120/3125], train_loss:0.159675
Epoch [1/2], Iter [121/3125], train_loss:0.176089
Epoch [1/2], Iter [122/3125], train_loss:0.154275
Epoch [1/2], Iter [123/3125], train_loss:0.139308
Epoch [1/2], Iter [124/3125], train_loss:0.156106
Epoch [1/2], Iter [125/3125], train_loss:0.140437
Epoch [1/2], Iter [126/3125], train_loss:0.154971
Epoch [1/2], Iter [127/3125], train_loss:0.148948
Epoch [1/2], Iter [128/3125], train_loss:0.173654
Epoch [1/2], Iter [129/3125], train_loss:0.175725
Epoch [1/2], Iter [130/3125], train_loss:0.160516
Epoch [1/2], Iter [131/3125], train_loss:0.170737
Epoch [1/2], Iter [132/3125], train_loss:0.161662
Epoch [1/2], Iter [133/3125], train_loss:0.179921
Epoch [1/2], Iter [134/3125], train_loss:0.160738
Epoch [1/2], Iter [135/3125], train_loss:0.134471
Epoch [1/2], Iter [136/3125], train_loss:0.170317
Epoch [1/2], Iter [137/3125], train_loss:0.153042
Epoch [1/2], Iter [138/3125], train_loss:0.163370
Epoch [1/2], Iter [139/3125], train_loss:0.169346
Epoch [1/2], Iter [140/3125], train_loss:0.156637
Epoch [1/2], Iter [141/3125], train_loss:0.164446
Epoch [1/2], Iter [142/3125], train_loss:0.166337
Epoch [1/2], Iter [143/3125], train_loss:0.150206
Epoch [1/2], Iter [144/3125], train_loss:0.156191
Epoch [1/2], Iter [145/3125], train_loss:0.169191
Epoch [1/2], Iter [146/3125], train_loss:0.165300
Epoch [1/2], Iter [147/3125], train_loss:0.177487
Epoch [1/2], Iter [148/3125], train_loss:0.179514
Epoch [1/2], Iter [149/3125], train_loss:0.153518
Epoch [1/2], Iter [150/3125], train_loss:0.155025
Epoch [1/2], Iter [151/3125], train_loss:0.169826
Epoch [1/2], Iter [152/3125], train_loss:0.150576
Epoch [1/2], Iter [153/3125], train_loss:0.149755
Epoch [1/2], Iter [154/3125], train_loss:0.156437
Epoch [1/2], Iter [155/3125], train_loss:0.163630
Epoch [1/2], Iter [156/3125], train_loss:0.163358
Epoch [1/2], Iter [157/3125], train_loss:0.168501
Epoch [1/2], Iter [158/3125], train_loss:0.152938
Epoch [1/2], Iter [159/3125], train_loss:0.162743
Epoch [1/2], Iter [160/3125], train_loss:0.164684
Epoch [1/2], Iter [161/3125], train_loss:0.134906
Epoch [1/2], Iter [162/3125], train_loss:0.171217
Epoch [1/2], Iter [163/3125], train_loss:0.166338
Epoch [1/2], Iter [164/3125], train_loss:0.173403
Epoch [1/2], Iter [165/3125], train_loss:0.166951
Epoch [1/2], Iter [166/3125], train_loss:0.161986
Epoch [1/2], Iter [167/3125], train_loss:0.167642
Epoch [1/2], Iter [168/3125], train_loss:0.163133
Epoch [1/2], Iter [169/3125], train_loss:0.176087
Epoch [1/2], Iter [170/3125], train_loss:0.181500
Epoch [1/2], Iter [171/3125], train_loss:0.182332
Epoch [1/2], Iter [172/3125], train_loss:0.159162
Epoch [1/2], Iter [173/3125], train_loss:0.173818
Epoch [1/2], Iter [174/3125], train_loss:0.151095
Epoch [1/2], Iter [175/3125], train_loss:0.169016
Epoch [1/2], Iter [176/3125], train_loss:0.168345
Epoch [1/2], Iter [177/3125], train_loss:0.171198
Epoch [1/2], Iter [178/3125], train_loss:0.158377
Epoch [1/2], Iter [179/3125], train_loss:0.150349
Epoch [1/2], Iter [180/3125], train_loss:0.154732
Epoch [1/2], Iter [181/3125], train_loss:0.159255
Epoch [1/2], Iter [182/3125], train_loss:0.180752
Epoch [1/2], Iter [183/3125], train_loss:0.130398
Epoch [1/2], Iter [184/3125], train_loss:0.149835
Epoch [1/2], Iter [185/3125], train_loss:0.163545
Epoch [1/2], Iter [186/3125], train_loss:0.165769
Epoch [1/2], Iter [187/3125], train_loss:0.165499
Epoch [1/2], Iter [188/3125], train_loss:0.191183
Epoch [1/2], Iter [189/3125], train_loss:0.165406
Epoch [1/2], Iter [190/3125], train_loss:0.158130
Epoch [1/2], Iter [191/3125], train_loss:0.167049
Epoch [1/2], Iter [192/3125], train_loss:0.158406
Epoch [1/2], Iter [193/3125], train_loss:0.155791
Epoch [1/2], Iter [194/3125], train_loss:0.154068
Epoch [1/2], Iter [195/3125], train_loss:0.173929
Epoch [1/2], Iter [196/3125], train_loss:0.166356
Epoch [1/2], Iter [197/3125], train_loss:0.153073
Epoch [1/2], Iter [198/3125], train_loss:0.159932
Epoch [1/2], Iter [199/3125], train_loss:0.158823
Epoch [1/2], Iter [200/3125], train_loss:0.187810
Epoch [1/2], Iter [201/3125], train_loss:0.178415
Epoch [1/2], Iter [202/3125], train_loss:0.156469
Epoch [1/2], Iter [203/3125], train_loss:0.160102
Epoch [1/2], Iter [204/3125], train_loss:0.147824
Epoch [1/2], Iter [205/3125], train_loss:0.159959
Epoch [1/2], Iter [206/3125], train_loss:0.168457
Epoch [1/2], Iter [207/3125], train_loss:0.152751
Epoch [1/2], Iter [208/3125], train_loss:0.153071
Epoch [1/2], Iter [209/3125], train_loss:0.162002
Epoch [1/2], Iter [210/3125], train_loss:0.177490
Epoch [1/2], Iter [211/3125], train_loss:0.153973
Epoch [1/2], Iter [212/3125], train_loss:0.178655
Epoch [1/2], Iter [213/3125], train_loss:0.172759
Epoch [1/2], Iter [214/3125], train_loss:0.161288
Epoch [1/2], Iter [215/3125], train_loss:0.145693
Epoch [1/2], Iter [216/3125], train_loss:0.149355
Epoch [1/2], Iter [217/3125], train_loss:0.177612
Epoch [1/2], Iter [218/3125], train_loss:0.156104
Epoch [1/2], Iter [219/3125], train_loss:0.146696
Epoch [1/2], Iter [220/3125], train_loss:0.168620
Epoch [1/2], Iter [221/3125], train_loss:0.134316
Epoch [1/2], Iter [222/3125], train_loss:0.164465
Epoch [1/2], Iter [223/3125], train_loss:0.161020
Epoch [1/2], Iter [224/3125], train_loss:0.144464
Epoch [1/2], Iter [225/3125], train_loss:0.145501
Epoch [1/2], Iter [226/3125], train_loss:0.156721
Epoch [1/2], Iter [227/3125], train_loss:0.160348
Epoch [1/2], Iter [228/3125], train_loss:0.157792
Epoch [1/2], Iter [229/3125], train_loss:0.143886
Epoch [1/2], Iter [230/3125], train_loss:0.146231
Epoch [1/2], Iter [231/3125], train_loss:0.161353
Epoch [1/2], Iter [232/3125], train_loss:0.172967
Epoch [1/2], Iter [233/3125], train_loss:0.173051
Epoch [1/2], Iter [234/3125], train_loss:0.173887
Epoch [1/2], Iter [235/3125], train_loss:0.155447
Epoch [1/2], Iter [236/3125], train_loss:0.162683
Epoch [1/2], Iter [237/3125], train_loss:0.147682
Epoch [1/2], Iter [238/3125], train_loss:0.170582
Epoch [1/2], Iter [239/3125], train_loss:0.159764
Epoch [1/2], Iter [240/3125], train_loss:0.157225
Epoch [1/2], Iter [241/3125], train_loss:0.153664
Epoch [1/2], Iter [242/3125], train_loss:0.166018
Epoch [1/2], Iter [243/3125], train_loss:0.175373
Epoch [1/2], Iter [244/3125], train_loss:0.146529
Epoch [1/2], Iter [245/3125], train_loss:0.166091
Epoch [1/2], Iter [246/3125], train_loss:0.161189
Epoch [1/2], Iter [247/3125], train_loss:0.144563
Epoch [1/2], Iter [248/3125], train_loss:0.150318
Epoch [1/2], Iter [249/3125], train_loss:0.140906
Epoch [1/2], Iter [250/3125], train_loss:0.169033
Epoch [1/2], Iter [251/3125], train_loss:0.155781
Epoch [1/2], Iter [252/3125], train_loss:0.163493
Epoch [1/2], Iter [253/3125], train_loss:0.153378
Epoch [1/2], Iter [254/3125], train_loss:0.183447
Epoch [1/2], Iter [255/3125], train_loss:0.178129
Epoch [1/2], Iter [256/3125], train_loss:0.177007
Epoch [1/2], Iter [257/3125], train_loss:0.179591
Epoch [1/2], Iter [258/3125], train_loss:0.169509
Epoch [1/2], Iter [259/3125], train_loss:0.146213
Epoch [1/2], Iter [260/3125], train_loss:0.171849
Epoch [1/2], Iter [261/3125], train_loss:0.163851
Epoch [1/2], Iter [262/3125], train_loss:0.178366
Epoch [1/2], Iter [263/3125], train_loss:0.194072
Epoch [1/2], Iter [264/3125], train_loss:0.172418
Epoch [1/2], Iter [265/3125], train_loss:0.143541
Epoch [1/2], Iter [266/3125], train_loss:0.158418
Epoch [1/2], Iter [267/3125], train_loss:0.163535
Epoch [1/2], Iter [268/3125], train_loss:0.171397
Epoch [1/2], Iter [269/3125], train_loss:0.183410
Epoch [1/2], Iter [270/3125], train_loss:0.191745
Epoch [1/2], Iter [271/3125], train_loss:0.195354
Epoch [1/2], Iter [272/3125], train_loss:0.166208
Epoch [1/2], Iter [273/3125], train_loss:0.148297
Epoch [1/2], Iter [274/3125], train_loss:0.164007
Epoch [1/2], Iter [275/3125], train_loss:0.168109
Epoch [1/2], Iter [276/3125], train_loss:0.187189
Epoch [1/2], Iter [277/3125], train_loss:0.171387
Epoch [1/2], Iter [278/3125], train_loss:0.143764
Epoch [1/2], Iter [279/3125], train_loss:0.175919
Epoch [1/2], Iter [280/3125], train_loss:0.172834
Epoch [1/2], Iter [281/3125], train_loss:0.173480
Epoch [1/2], Iter [282/3125], train_loss:0.141544
Epoch [1/2], Iter [283/3125], train_loss:0.187073
Epoch [1/2], Iter [284/3125], train_loss:0.147416
Epoch [1/2], Iter [285/3125], train_loss:0.163346
Epoch [1/2], Iter [286/3125], train_loss:0.155601
Epoch [1/2], Iter [287/3125], train_loss:0.160135
Epoch [1/2], Iter [288/3125], train_loss:0.153201
Epoch [1/2], Iter [289/3125], train_loss:0.157078
Epoch [1/2], Iter [290/3125], train_loss:0.143863
Epoch [1/2], Iter [291/3125], train_loss:0.170847
Epoch [1/2], Iter [292/3125], train_loss:0.160009
Epoch [1/2], Iter [293/3125], train_loss:0.160868
Epoch [1/2], Iter [294/3125], train_loss:0.159037
Epoch [1/2], Iter [295/3125], train_loss:0.148768
Epoch [1/2], Iter [296/3125], train_loss:0.172005
Epoch [1/2], Iter [297/3125], train_loss:0.170369
Epoch [1/2], Iter [298/3125], train_loss:0.150250
Epoch [1/2], Iter [299/3125], train_loss:0.203501
Epoch [1/2], Iter [300/3125], train_loss:0.172398
Epoch [1/2], Iter [301/3125], train_loss:0.184601
Epoch [1/2], Iter [302/3125], train_loss:0.156175
Epoch [1/2], Iter [303/3125], train_loss:0.161752
Epoch [1/2], Iter [304/3125], train_loss:0.154050
Epoch [1/2], Iter [305/3125], train_loss:0.151905
Epoch [1/2], Iter [306/3125], train_loss:0.154861
Epoch [1/2], Iter [307/3125], train_loss:0.157530
Epoch [1/2], Iter [308/3125], train_loss:0.162054
Epoch [1/2], Iter [309/3125], train_loss:0.172370
Epoch [1/2], Iter [310/3125], train_loss:0.149971
Epoch [1/2], Iter [311/3125], train_loss:0.155449
Epoch [1/2], Iter [312/3125], train_loss:0.168246
Epoch [1/2], Iter [313/3125], train_loss:0.161156
Epoch [1/2], Iter [314/3125], train_loss:0.182064
Epoch [1/2], Iter [315/3125], train_loss:0.168014
Epoch [1/2], Iter [316/3125], train_loss:0.155707
Epoch [1/2], Iter [317/3125], train_loss:0.155345
Epoch [1/2], Iter [318/3125], train_loss:0.157537
Epoch [1/2], Iter [319/3125], train_loss:0.158657
Epoch [1/2], Iter [320/3125], train_loss:0.162647
Epoch [1/2], Iter [321/3125], train_loss:0.165201
Epoch [1/2], Iter [322/3125], train_loss:0.187565
Epoch [1/2], Iter [323/3125], train_loss:0.153937
Epoch [1/2], Iter [324/3125], train_loss:0.147520
Epoch [1/2], Iter [325/3125], train_loss:0.139758
Epoch [1/2], Iter [326/3125], train_loss:0.177869
Epoch [1/2], Iter [327/3125], train_loss:0.178201
Epoch [1/2], Iter [328/3125], train_loss:0.154316
Epoch [1/2], Iter [329/3125], train_loss:0.178173
Epoch [1/2], Iter [330/3125], train_loss:0.159244
Epoch [1/2], Iter [331/3125], train_loss:0.177582
Epoch [1/2], Iter [332/3125], train_loss:0.153592
Epoch [1/2], Iter [333/3125], train_loss:0.154490
Epoch [1/2], Iter [334/3125], train_loss:0.150733
Epoch [1/2], Iter [335/3125], train_loss:0.169697
Epoch [1/2], Iter [336/3125], train_loss:0.155575
Epoch [1/2], Iter [337/3125], train_loss:0.158214
Epoch [1/2], Iter [338/3125], train_loss:0.174536
Epoch [1/2], Iter [339/3125], train_loss:0.139395
Epoch [1/2], Iter [340/3125], train_loss:0.163447
Epoch [1/2], Iter [341/3125], train_loss:0.146871
Epoch [1/2], Iter [342/3125], train_loss:0.160089
Epoch [1/2], Iter [343/3125], train_loss:0.161521
Epoch [1/2], Iter [344/3125], train_loss:0.148263
Epoch [1/2], Iter [345/3125], train_loss:0.156887
Epoch [1/2], Iter [346/3125], train_loss:0.163093
Epoch [1/2], Iter [347/3125], train_loss:0.130156
Epoch [1/2], Iter [348/3125], train_loss:0.153562
Epoch [1/2], Iter [349/3125], train_loss:0.183320
Epoch [1/2], Iter [350/3125], train_loss:0.151159
Epoch [1/2], Iter [351/3125], train_loss:0.144421
Epoch [1/2], Iter [352/3125], train_loss:0.145968
Epoch [1/2], Iter [353/3125], train_loss:0.150598
Epoch [1/2], Iter [354/3125], train_loss:0.163271
Epoch [1/2], Iter [355/3125], train_loss:0.191171
Epoch [1/2], Iter [356/3125], train_loss:0.166442
Epoch [1/2], Iter [357/3125], train_loss:0.153268
Epoch [1/2], Iter [358/3125], train_loss:0.160086
Epoch [1/2], Iter [359/3125], train_loss:0.172394
Epoch [1/2], Iter [360/3125], train_loss:0.160697
Epoch [1/2], Iter [361/3125], train_loss:0.158556
Epoch [1/2], Iter [362/3125], train_loss:0.148141
Epoch [1/2], Iter [363/3125], train_loss:0.161616
Epoch [1/2], Iter [364/3125], train_loss:0.164506
Epoch [1/2], Iter [365/3125], train_loss:0.153889
Epoch [1/2], Iter [366/3125], train_loss:0.149990
Epoch [1/2], Iter [367/3125], train_loss:0.172651
Epoch [1/2], Iter [368/3125], train_loss:0.167421
Epoch [1/2], Iter [369/3125], train_loss:0.157874
Epoch [1/2], Iter [370/3125], train_loss:0.175726
Epoch [1/2], Iter [371/3125], train_loss:0.168166
Epoch [1/2], Iter [372/3125], train_loss:0.160632
Epoch [1/2], Iter [373/3125], train_loss:0.169915
Epoch [1/2], Iter [374/3125], train_loss:0.141351
Epoch [1/2], Iter [375/3125], train_loss:0.157579
Epoch [1/2], Iter [376/3125], train_loss:0.159373
Epoch [1/2], Iter [377/3125], train_loss:0.173719
Epoch [1/2], Iter [378/3125], train_loss:0.156862
Epoch [1/2], Iter [379/3125], train_loss:0.164567
Epoch [1/2], Iter [380/3125], train_loss:0.151420
Epoch [1/2], Iter [381/3125], train_loss:0.155565
Epoch [1/2], Iter [382/3125], train_loss:0.156861
Epoch [1/2], Iter [383/3125], train_loss:0.162360
Epoch [1/2], Iter [384/3125], train_loss:0.155612
Epoch [1/2], Iter [385/3125], train_loss:0.187500
Epoch [1/2], Iter [386/3125], train_loss:0.167519
Epoch [1/2], Iter [387/3125], train_loss:0.150314
Epoch [1/2], Iter [388/3125], train_loss:0.171371
Epoch [1/2], Iter [389/3125], train_loss:0.170002
Epoch [1/2], Iter [390/3125], train_loss:0.171281
Epoch [1/2], Iter [391/3125], train_loss:0.154229
Epoch [1/2], Iter [392/3125], train_loss:0.152277
Epoch [1/2], Iter [393/3125], train_loss:0.160335
Epoch [1/2], Iter [394/3125], train_loss:0.160123
Epoch [1/2], Iter [395/3125], train_loss:0.157730
Epoch [1/2], Iter [396/3125], train_loss:0.148626
Epoch [1/2], Iter [397/3125], train_loss:0.164090
Epoch [1/2], Iter [398/3125], train_loss:0.181123
Epoch [1/2], Iter [399/3125], train_loss:0.144987
Epoch [1/2], Iter [400/3125], train_loss:0.147743
Epoch [1/2], Iter [401/3125], train_loss:0.156141
Epoch [1/2], Iter [402/3125], train_loss:0.182602
Epoch [1/2], Iter [403/3125], train_loss:0.186334
Epoch [1/2], Iter [404/3125], train_loss:0.158865
Epoch [1/2], Iter [405/3125], train_loss:0.157437
Epoch [1/2], Iter [406/3125], train_loss:0.151499
Epoch [1/2], Iter [407/3125], train_loss:0.155167
Epoch [1/2], Iter [408/3125], train_loss:0.158371
Epoch [1/2], Iter [409/3125], train_loss:0.170319
Epoch [1/2], Iter [410/3125], train_loss:0.172921
Epoch [1/2], Iter [411/3125], train_loss:0.175661
Epoch [1/2], Iter [412/3125], train_loss:0.170778
Epoch [1/2], Iter [413/3125], train_loss:0.173227
Epoch [1/2], Iter [414/3125], train_loss:0.162998
Epoch [1/2], Iter [415/3125], train_loss:0.144094
Epoch [1/2], Iter [416/3125], train_loss:0.154685
Epoch [1/2], Iter [417/3125], train_loss:0.177953
Epoch [1/2], Iter [418/3125], train_loss:0.151602
Epoch [1/2], Iter [419/3125], train_loss:0.165047
Epoch [1/2], Iter [420/3125], train_loss:0.146441
Epoch [1/2], Iter [421/3125], train_loss:0.155056
Epoch [1/2], Iter [422/3125], train_loss:0.144277
Epoch [1/2], Iter [423/3125], train_loss:0.156635
Epoch [1/2], Iter [424/3125], train_loss:0.154019
Epoch [1/2], Iter [425/3125], train_loss:0.161336
Epoch [1/2], Iter [426/3125], train_loss:0.203085
Epoch [1/2], Iter [427/3125], train_loss:0.146707
Epoch [1/2], Iter [428/3125], train_loss:0.158310
Epoch [1/2], Iter [429/3125], train_loss:0.171015
Epoch [1/2], Iter [430/3125], train_loss:0.142477
Epoch [1/2], Iter [431/3125], train_loss:0.189910
Epoch [1/2], Iter [432/3125], train_loss:0.165581
Epoch [1/2], Iter [433/3125], train_loss:0.163553
Epoch [1/2], Iter [434/3125], train_loss:0.162628
Epoch [1/2], Iter [435/3125], train_loss:0.166308
Epoch [1/2], Iter [436/3125], train_loss:0.174060
Epoch [1/2], Iter [437/3125], train_loss:0.170486
Epoch [1/2], Iter [438/3125], train_loss:0.170334
Epoch [1/2], Iter [439/3125], train_loss:0.170027
Epoch [1/2], Iter [440/3125], train_loss:0.176327
Epoch [1/2], Iter [441/3125], train_loss:0.185929
Epoch [1/2], Iter [442/3125], train_loss:0.164644
Epoch [1/2], Iter [443/3125], train_loss:0.155429
Epoch [1/2], Iter [444/3125], train_loss:0.156190
Epoch [1/2], Iter [445/3125], train_loss:0.183739
Epoch [1/2], Iter [446/3125], train_loss:0.168132
Epoch [1/2], Iter [447/3125], train_loss:0.156675
Epoch [1/2], Iter [448/3125], train_loss:0.174648
Epoch [1/2], Iter [449/3125], train_loss:0.176605
Epoch [1/2], Iter [450/3125], train_loss:0.165744
Epoch [1/2], Iter [451/3125], train_loss:0.159427
Epoch [1/2], Iter [452/3125], train_loss:0.137918
Epoch [1/2], Iter [453/3125], train_loss:0.154664
Epoch [1/2], Iter [454/3125], train_loss:0.188046
Epoch [1/2], Iter [455/3125], train_loss:0.157990
Epoch [1/2], Iter [456/3125], train_loss:0.161434
Epoch [1/2], Iter [457/3125], train_loss:0.164751
Epoch [1/2], Iter [458/3125], train_loss:0.147707
Epoch [1/2], Iter [459/3125], train_loss:0.156135
Epoch [1/2], Iter [460/3125], train_loss:0.170298
Epoch [1/2], Iter [461/3125], train_loss:0.157925
Epoch [1/2], Iter [462/3125], train_loss:0.161613
Epoch [1/2], Iter [463/3125], train_loss:0.156034
Epoch [1/2], Iter [464/3125], train_loss:0.154685
Epoch [1/2], Iter [465/3125], train_loss:0.159974
Epoch [1/2], Iter [466/3125], train_loss:0.137804
Epoch [1/2], Iter [467/3125], train_loss:0.173479
Epoch [1/2], Iter [468/3125], train_loss:0.160113
Epoch [1/2], Iter [469/3125], train_loss:0.181849
Epoch [1/2], Iter [470/3125], train_loss:0.154617
Epoch [1/2], Iter [471/3125], train_loss:0.145756
Epoch [1/2], Iter [472/3125], train_loss:0.173865
Epoch [1/2], Iter [473/3125], train_loss:0.179762
Epoch [1/2], Iter [474/3125], train_loss:0.148816
Epoch [1/2], Iter [475/3125], train_loss:0.143284
Epoch [1/2], Iter [476/3125], train_loss:0.171798
Epoch [1/2], Iter [477/3125], train_loss:0.180198
Epoch [1/2], Iter [478/3125], train_loss:0.160204
Epoch [1/2], Iter [479/3125], train_loss:0.166848
Epoch [1/2], Iter [480/3125], train_loss:0.168912
Epoch [1/2], Iter [481/3125], train_loss:0.151769
Epoch [1/2], Iter [482/3125], train_loss:0.164199
Epoch [1/2], Iter [483/3125], train_loss:0.159082
Epoch [1/2], Iter [484/3125], train_loss:0.157923
Epoch [1/2], Iter [485/3125], train_loss:0.175519
Epoch [1/2], Iter [486/3125], train_loss:0.161383
Epoch [1/2], Iter [487/3125], train_loss:0.162508
Epoch [1/2], Iter [488/3125], train_loss:0.165235
Epoch [1/2], Iter [489/3125], train_loss:0.179577
Epoch [1/2], Iter [490/3125], train_loss:0.151752
Epoch [1/2], Iter [491/3125], train_loss:0.171913
Epoch [1/2], Iter [492/3125], train_loss:0.163084
Epoch [1/2], Iter [493/3125], train_loss:0.156714
Epoch [1/2], Iter [494/3125], train_loss:0.156022
Epoch [1/2], Iter [495/3125], train_loss:0.157305
Epoch [1/2], Iter [496/3125], train_loss:0.156836
Epoch [1/2], Iter [497/3125], train_loss:0.154605
Epoch [1/2], Iter [498/3125], train_loss:0.174036
Epoch [1/2], Iter [499/3125], train_loss:0.164733
Epoch [1/2], Iter [500/3125], train_loss:0.162918
Epoch [1/2], Iter [501/3125], train_loss:0.149830
Epoch [1/2], Iter [502/3125], train_loss:0.186489
Epoch [1/2], Iter [503/3125], train_loss:0.145313
Epoch [1/2], Iter [504/3125], train_loss:0.152114
Epoch [1/2], Iter [505/3125], train_loss:0.150460
Epoch [1/2], Iter [506/3125], train_loss:0.172033
Epoch [1/2], Iter [507/3125], train_loss:0.156441
Epoch [1/2], Iter [508/3125], train_loss:0.151387
Epoch [1/2], Iter [509/3125], train_loss:0.174799
Epoch [1/2], Iter [510/3125], train_loss:0.156212
Epoch [1/2], Iter [511/3125], train_loss:0.157743
Epoch [1/2], Iter [512/3125], train_loss:0.171979
Epoch [1/2], Iter [513/3125], train_loss:0.183507
Epoch [1/2], Iter [514/3125], train_loss:0.174797
Epoch [1/2], Iter [515/3125], train_loss:0.151998
Epoch [1/2], Iter [516/3125], train_loss:0.164528
Epoch [1/2], Iter [517/3125], train_loss:0.164061
Epoch [1/2], Iter [518/3125], train_loss:0.184687
Epoch [1/2], Iter [519/3125], train_loss:0.153723
Epoch [1/2], Iter [520/3125], train_loss:0.140085
Epoch [1/2], Iter [521/3125], train_loss:0.161860
Epoch [1/2], Iter [522/3125], train_loss:0.142582
Epoch [1/2], Iter [523/3125], train_loss:0.158409
Epoch [1/2], Iter [524/3125], train_loss:0.197436
Epoch [1/2], Iter [525/3125], train_loss:0.170067
Epoch [1/2], Iter [526/3125], train_loss:0.150738
Epoch [1/2], Iter [527/3125], train_loss:0.164096
Epoch [1/2], Iter [528/3125], train_loss:0.159754
Epoch [1/2], Iter [529/3125], train_loss:0.152052
Epoch [1/2], Iter [530/3125], train_loss:0.161230
Epoch [1/2], Iter [531/3125], train_loss:0.181889
Epoch [1/2], Iter [532/3125], train_loss:0.149528
Epoch [1/2], Iter [533/3125], train_loss:0.156530
Epoch [1/2], Iter [534/3125], train_loss:0.143401
Epoch [1/2], Iter [535/3125], train_loss:0.164431
Epoch [1/2], Iter [536/3125], train_loss:0.155525
Epoch [1/2], Iter [537/3125], train_loss:0.170614
Epoch [1/2], Iter [538/3125], train_loss:0.172353
Epoch [1/2], Iter [539/3125], train_loss:0.167426
Epoch [1/2], Iter [540/3125], train_loss:0.141499
Epoch [1/2], Iter [541/3125], train_loss:0.165216
Epoch [1/2], Iter [542/3125], train_loss:0.164144
Epoch [1/2], Iter [543/3125], train_loss:0.149974
Epoch [1/2], Iter [544/3125], train_loss:0.157108
Epoch [1/2], Iter [545/3125], train_loss:0.169725
Epoch [1/2], Iter [546/3125], train_loss:0.181695
Epoch [1/2], Iter [547/3125], train_loss:0.161326
Epoch [1/2], Iter [548/3125], train_loss:0.187204
Epoch [1/2], Iter [549/3125], train_loss:0.152687
Epoch [1/2], Iter [550/3125], train_loss:0.144457
Epoch [1/2], Iter [551/3125], train_loss:0.160662
Epoch [1/2], Iter [552/3125], train_loss:0.154854
Epoch [1/2], Iter [553/3125], train_loss:0.159735
Epoch [1/2], Iter [554/3125], train_loss:0.147193
Epoch [1/2], Iter [555/3125], train_loss:0.157361
Epoch [1/2], Iter [556/3125], train_loss:0.186600
Epoch [1/2], Iter [557/3125], train_loss:0.152398
Epoch [1/2], Iter [558/3125], train_loss:0.175364
Epoch [1/2], Iter [559/3125], train_loss:0.167578
Epoch [1/2], Iter [560/3125], train_loss:0.158512
Epoch [1/2], Iter [561/3125], train_loss:0.173613
Epoch [1/2], Iter [562/3125], train_loss:0.160966
Epoch [1/2], Iter [563/3125], train_loss:0.172676
Epoch [1/2], Iter [564/3125], train_loss:0.158586
Epoch [1/2], Iter [565/3125], train_loss:0.180590
Epoch [1/2], Iter [566/3125], train_loss:0.192027
Epoch [1/2], Iter [567/3125], train_loss:0.157700
Epoch [1/2], Iter [568/3125], train_loss:0.162584
Epoch [1/2], Iter [569/3125], train_loss:0.183801
Epoch [1/2], Iter [570/3125], train_loss:0.167326
Epoch [1/2], Iter [571/3125], train_loss:0.164745
Epoch [1/2], Iter [572/3125], train_loss:0.173292
Epoch [1/2], Iter [573/3125], train_loss:0.153456
Epoch [1/2], Iter [574/3125], train_loss:0.160368
Epoch [1/2], Iter [575/3125], train_loss:0.151965
Epoch [1/2], Iter [576/3125], train_loss:0.154746
Epoch [1/2], Iter [577/3125], train_loss:0.170880
Epoch [1/2], Iter [578/3125], train_loss:0.161438
Epoch [1/2], Iter [579/3125], train_loss:0.180224
Epoch [1/2], Iter [580/3125], train_loss:0.178791
Epoch [1/2], Iter [581/3125], train_loss:0.145772
Epoch [1/2], Iter [582/3125], train_loss:0.160606
Epoch [1/2], Iter [583/3125], train_loss:0.176088
Epoch [1/2], Iter [584/3125], train_loss:0.164863
Epoch [1/2], Iter [585/3125], train_loss:0.181251
Epoch [1/2], Iter [586/3125], train_loss:0.151516
Epoch [1/2], Iter [587/3125], train_loss:0.176537
Epoch [1/2], Iter [588/3125], train_loss:0.159592
Epoch [1/2], Iter [589/3125], train_loss:0.156307
Epoch [1/2], Iter [590/3125], train_loss:0.149772
Epoch [1/2], Iter [591/3125], train_loss:0.168533
Epoch [1/2], Iter [592/3125], train_loss:0.156289
Epoch [1/2], Iter [593/3125], train_loss:0.171735
Epoch [1/2], Iter [594/3125], train_loss:0.159490
Epoch [1/2], Iter [595/3125], train_loss:0.156549
Epoch [1/2], Iter [596/3125], train_loss:0.172349
Epoch [1/2], Iter [597/3125], train_loss:0.163044
Epoch [1/2], Iter [598/3125], train_loss:0.153242
Epoch [1/2], Iter [599/3125], train_loss:0.178958
Epoch [1/2], Iter [600/3125], train_loss:0.154345
Epoch [1/2], Iter [601/3125], train_loss:0.169848
Epoch [1/2], Iter [602/3125], train_loss:0.148637
Epoch [1/2], Iter [603/3125], train_loss:0.168987
Epoch [1/2], Iter [604/3125], train_loss:0.181611
Epoch [1/2], Iter [605/3125], train_loss:0.155290
Epoch [1/2], Iter [606/3125], train_loss:0.159152
Epoch [1/2], Iter [607/3125], train_loss:0.158287
Epoch [1/2], Iter [608/3125], train_loss:0.165115
Epoch [1/2], Iter [609/3125], train_loss:0.170284
Epoch [1/2], Iter [610/3125], train_loss:0.180462
Epoch [1/2], Iter [611/3125], train_loss:0.170581
Epoch [1/2], Iter [612/3125], train_loss:0.156333
Epoch [1/2], Iter [613/3125], train_loss:0.158744
Epoch [1/2], Iter [614/3125], train_loss:0.151848
Epoch [1/2], Iter [615/3125], train_loss:0.146951
Epoch [1/2], Iter [616/3125], train_loss:0.164550
Epoch [1/2], Iter [617/3125], train_loss:0.163717
Epoch [1/2], Iter [618/3125], train_loss:0.143995
Epoch [1/2], Iter [619/3125], train_loss:0.174511
Epoch [1/2], Iter [620/3125], train_loss:0.161177
Epoch [1/2], Iter [621/3125], train_loss:0.178878
Epoch [1/2], Iter [622/3125], train_loss:0.170491
Epoch [1/2], Iter [623/3125], train_loss:0.172630
Epoch [1/2], Iter [624/3125], train_loss:0.185353
Epoch [1/2], Iter [625/3125], train_loss:0.162375
Epoch [1/2], Iter [626/3125], train_loss:0.166046
Epoch [1/2], Iter [627/3125], train_loss:0.187614
Epoch [1/2], Iter [628/3125], train_loss:0.171770
Epoch [1/2], Iter [629/3125], train_loss:0.137850
Epoch [1/2], Iter [630/3125], train_loss:0.160116
Epoch [1/2], Iter [631/3125], train_loss:0.147122
Epoch [1/2], Iter [632/3125], train_loss:0.182244
Epoch [1/2], Iter [633/3125], train_loss:0.143999
Epoch [1/2], Iter [634/3125], train_loss:0.182345
Epoch [1/2], Iter [635/3125], train_loss:0.156759
Epoch [1/2], Iter [636/3125], train_loss:0.148806
Epoch [1/2], Iter [637/3125], train_loss:0.154144
Epoch [1/2], Iter [638/3125], train_loss:0.149763
Epoch [1/2], Iter [639/3125], train_loss:0.154106
Epoch [1/2], Iter [640/3125], train_loss:0.168608
Epoch [1/2], Iter [641/3125], train_loss:0.152673
Epoch [1/2], Iter [642/3125], train_loss:0.173711
Epoch [1/2], Iter [643/3125], train_loss:0.168833
Epoch [1/2], Iter [644/3125], train_loss:0.169643
Epoch [1/2], Iter [645/3125], train_loss:0.155612
Epoch [1/2], Iter [646/3125], train_loss:0.161894
Epoch [1/2], Iter [647/3125], train_loss:0.173931
Epoch [1/2], Iter [648/3125], train_loss:0.195914
Epoch [1/2], Iter [649/3125], train_loss:0.156067
Epoch [1/2], Iter [650/3125], train_loss:0.182341
Epoch [1/2], Iter [651/3125], train_loss:0.162687
Epoch [1/2], Iter [652/3125], train_loss:0.146367
Epoch [1/2], Iter [653/3125], train_loss:0.188726
Epoch [1/2], Iter [654/3125], train_loss:0.141544
Epoch [1/2], Iter [655/3125], train_loss:0.136832
Epoch [1/2], Iter [656/3125], train_loss:0.165494
Epoch [1/2], Iter [657/3125], train_loss:0.173287
Epoch [1/2], Iter [658/3125], train_loss:0.178410
Epoch [1/2], Iter [659/3125], train_loss:0.153449
Epoch [1/2], Iter [660/3125], train_loss:0.141325
Epoch [1/2], Iter [661/3125], train_loss:0.160569
Epoch [1/2], Iter [662/3125], train_loss:0.168849
Epoch [1/2], Iter [663/3125], train_loss:0.170920
Epoch [1/2], Iter [664/3125], train_loss:0.168509
Epoch [1/2], Iter [665/3125], train_loss:0.174167
Epoch [1/2], Iter [666/3125], train_loss:0.165306
Epoch [1/2], Iter [667/3125], train_loss:0.173296
Epoch [1/2], Iter [668/3125], train_loss:0.161691
Epoch [1/2], Iter [669/3125], train_loss:0.164216
Epoch [1/2], Iter [670/3125], train_loss:0.153614
Epoch [1/2], Iter [671/3125], train_loss:0.176628
Epoch [1/2], Iter [672/3125], train_loss:0.170113
Epoch [1/2], Iter [673/3125], train_loss:0.164020
Epoch [1/2], Iter [674/3125], train_loss:0.170262
Epoch [1/2], Iter [675/3125], train_loss:0.160252
Epoch [1/2], Iter [676/3125], train_loss:0.159595
Epoch [1/2], Iter [677/3125], train_loss:0.174482
Epoch [1/2], Iter [678/3125], train_loss:0.174136
Epoch [1/2], Iter [679/3125], train_loss:0.169684
Epoch [1/2], Iter [680/3125], train_loss:0.156718
Epoch [1/2], Iter [681/3125], train_loss:0.172149
Epoch [1/2], Iter [682/3125], train_loss:0.153396
Epoch [1/2], Iter [683/3125], train_loss:0.152438
Epoch [1/2], Iter [684/3125], train_loss:0.175598
Epoch [1/2], Iter [685/3125], train_loss:0.144732
Epoch [1/2], Iter [686/3125], train_loss:0.146447
Epoch [1/2], Iter [687/3125], train_loss:0.149588
Epoch [1/2], Iter [688/3125], train_loss:0.158347
Epoch [1/2], Iter [689/3125], train_loss:0.173472
Epoch [1/2], Iter [690/3125], train_loss:0.164299
Epoch [1/2], Iter [691/3125], train_loss:0.147107
Epoch [1/2], Iter [692/3125], train_loss:0.138865
Epoch [1/2], Iter [693/3125], train_loss:0.147721
Epoch [1/2], Iter [694/3125], train_loss:0.186929
Epoch [1/2], Iter [695/3125], train_loss:0.149825
Epoch [1/2], Iter [696/3125], train_loss:0.159169
Epoch [1/2], Iter [697/3125], train_loss:0.168211
Epoch [1/2], Iter [698/3125], train_loss:0.155869
Epoch [1/2], Iter [699/3125], train_loss:0.175861
Epoch [1/2], Iter [700/3125], train_loss:0.147055
Epoch [1/2], Iter [701/3125], train_loss:0.152602
Epoch [1/2], Iter [702/3125], train_loss:0.165213
Epoch [1/2], Iter [703/3125], train_loss:0.155550
Epoch [1/2], Iter [704/3125], train_loss:0.165959
Epoch [1/2], Iter [705/3125], train_loss:0.184858
Epoch [1/2], Iter [706/3125], train_loss:0.156636
Epoch [1/2], Iter [707/3125], train_loss:0.141014
Epoch [1/2], Iter [708/3125], train_loss:0.172110
Epoch [1/2], Iter [709/3125], train_loss:0.166598
Epoch [1/2], Iter [710/3125], train_loss:0.181486
Epoch [1/2], Iter [711/3125], train_loss:0.149520
Epoch [1/2], Iter [712/3125], train_loss:0.141277
Epoch [1/2], Iter [713/3125], train_loss:0.150582
Epoch [1/2], Iter [714/3125], train_loss:0.170460
Epoch [1/2], Iter [715/3125], train_loss:0.166523
Epoch [1/2], Iter [716/3125], train_loss:0.140562
Epoch [1/2], Iter [717/3125], train_loss:0.157862
Epoch [1/2], Iter [718/3125], train_loss:0.158880
Epoch [1/2], Iter [719/3125], train_loss:0.151162
Epoch [1/2], Iter [720/3125], train_loss:0.150862
Epoch [1/2], Iter [721/3125], train_loss:0.172271
Epoch [1/2], Iter [722/3125], train_loss:0.167076
Epoch [1/2], Iter [723/3125], train_loss:0.160416
Epoch [1/2], Iter [724/3125], train_loss:0.164712
Epoch [1/2], Iter [725/3125], train_loss:0.155195
Epoch [1/2], Iter [726/3125], train_loss:0.173203
Epoch [1/2], Iter [727/3125], train_loss:0.203542
Epoch [1/2], Iter [728/3125], train_loss:0.132789
Epoch [1/2], Iter [729/3125], train_loss:0.170022
Epoch [1/2], Iter [730/3125], train_loss:0.150648
Epoch [1/2], Iter [731/3125], train_loss:0.152137
Epoch [1/2], Iter [732/3125], train_loss:0.165179
Epoch [1/2], Iter [733/3125], train_loss:0.181513
Epoch [1/2], Iter [734/3125], train_loss:0.144627
Epoch [1/2], Iter [735/3125], train_loss:0.156241
Epoch [1/2], Iter [736/3125], train_loss:0.156647
Epoch [1/2], Iter [737/3125], train_loss:0.156439
Epoch [1/2], Iter [738/3125], train_loss:0.184127
Epoch [1/2], Iter [739/3125], train_loss:0.149900
Epoch [1/2], Iter [740/3125], train_loss:0.178831
Epoch [1/2], Iter [741/3125], train_loss:0.154100
Epoch [1/2], Iter [742/3125], train_loss:0.173619
Epoch [1/2], Iter [743/3125], train_loss:0.174960
Epoch [1/2], Iter [744/3125], train_loss:0.158306
Epoch [1/2], Iter [745/3125], train_loss:0.157812
Epoch [1/2], Iter [746/3125], train_loss:0.170903
Epoch [1/2], Iter [747/3125], train_loss:0.158708
Epoch [1/2], Iter [748/3125], train_loss:0.177305
Epoch [1/2], Iter [749/3125], train_loss:0.157574
Epoch [1/2], Iter [750/3125], train_loss:0.163793
Epoch [1/2], Iter [751/3125], train_loss:0.175222
Epoch [1/2], Iter [752/3125], train_loss:0.167615
Epoch [1/2], Iter [753/3125], train_loss:0.175142
Epoch [1/2], Iter [754/3125], train_loss:0.164994
Epoch [1/2], Iter [755/3125], train_loss:0.173740
Epoch [1/2], Iter [756/3125], train_loss:0.184293
Epoch [1/2], Iter [757/3125], train_loss:0.174505
Epoch [1/2], Iter [758/3125], train_loss:0.151717
Epoch [1/2], Iter [759/3125], train_loss:0.149027
Epoch [1/2], Iter [760/3125], train_loss:0.181634
Epoch [1/2], Iter [761/3125], train_loss:0.157314
Epoch [1/2], Iter [762/3125], train_loss:0.137242
Epoch [1/2], Iter [763/3125], train_loss:0.168438
Epoch [1/2], Iter [764/3125], train_loss:0.141019
Epoch [1/2], Iter [765/3125], train_loss:0.154936
Epoch [1/2], Iter [766/3125], train_loss:0.155263
Epoch [1/2], Iter [767/3125], train_loss:0.156193
Epoch [1/2], Iter [768/3125], train_loss:0.154753
Epoch [1/2], Iter [769/3125], train_loss:0.152388
Epoch [1/2], Iter [770/3125], train_loss:0.154891
Epoch [1/2], Iter [771/3125], train_loss:0.150887
Epoch [1/2], Iter [772/3125], train_loss:0.170387
Epoch [1/2], Iter [773/3125], train_loss:0.142415
Epoch [1/2], Iter [774/3125], train_loss:0.157543
Epoch [1/2], Iter [775/3125], train_loss:0.161519
Epoch [1/2], Iter [776/3125], train_loss:0.153466
Epoch [1/2], Iter [777/3125], train_loss:0.164538
Epoch [1/2], Iter [778/3125], train_loss:0.167005
Epoch [1/2], Iter [779/3125], train_loss:0.164542
Epoch [1/2], Iter [780/3125], train_loss:0.136895
Epoch [1/2], Iter [781/3125], train_loss:0.143366
Epoch [1/2], Iter [782/3125], train_loss:0.159515
Epoch [1/2], Iter [783/3125], train_loss:0.159623
Epoch [1/2], Iter [784/3125], train_loss:0.175021
Epoch [1/2], Iter [785/3125], train_loss:0.188726
Epoch [1/2], Iter [786/3125], train_loss:0.167352
Epoch [1/2], Iter [787/3125], train_loss:0.159414
Epoch [1/2], Iter [788/3125], train_loss:0.143568
Epoch [1/2], Iter [789/3125], train_loss:0.157005
Epoch [1/2], Iter [790/3125], train_loss:0.150693
Epoch [1/2], Iter [791/3125], train_loss:0.142032
Epoch [1/2], Iter [792/3125], train_loss:0.158453
Epoch [1/2], Iter [793/3125], train_loss:0.171967
Epoch [1/2], Iter [794/3125], train_loss:0.154673
Epoch [1/2], Iter [795/3125], train_loss:0.161099
Epoch [1/2], Iter [796/3125], train_loss:0.149141
Epoch [1/2], Iter [797/3125], train_loss:0.172768
Epoch [1/2], Iter [798/3125], train_loss:0.136935
Epoch [1/2], Iter [799/3125], train_loss:0.150901
Epoch [1/2], Iter [800/3125], train_loss:0.177802
Epoch [1/2], Iter [801/3125], train_loss:0.151622
Epoch [1/2], Iter [802/3125], train_loss:0.175425
Epoch [1/2], Iter [803/3125], train_loss:0.158219
Epoch [1/2], Iter [804/3125], train_loss:0.160822
Epoch [1/2], Iter [805/3125], train_loss:0.171360
Epoch [1/2], Iter [806/3125], train_loss:0.173840
Epoch [1/2], Iter [807/3125], train_loss:0.170521
Epoch [1/2], Iter [808/3125], train_loss:0.155042
Epoch [1/2], Iter [809/3125], train_loss:0.201056
Epoch [1/2], Iter [810/3125], train_loss:0.167513
Epoch [1/2], Iter [811/3125], train_loss:0.159135
Epoch [1/2], Iter [812/3125], train_loss:0.161629
Epoch [1/2], Iter [813/3125], train_loss:0.172826
Epoch [1/2], Iter [814/3125], train_loss:0.148274
Epoch [1/2], Iter [815/3125], train_loss:0.183451
Epoch [1/2], Iter [816/3125], train_loss:0.164296
Epoch [1/2], Iter [817/3125], train_loss:0.177334
Epoch [1/2], Iter [818/3125], train_loss:0.154336
Epoch [1/2], Iter [819/3125], train_loss:0.170955
Epoch [1/2], Iter [820/3125], train_loss:0.168194
Epoch [1/2], Iter [821/3125], train_loss:0.165284
Epoch [1/2], Iter [822/3125], train_loss:0.153692
Epoch [1/2], Iter [823/3125], train_loss:0.164452
Epoch [1/2], Iter [824/3125], train_loss:0.160168
Epoch [1/2], Iter [825/3125], train_loss:0.143389
Epoch [1/2], Iter [826/3125], train_loss:0.125640
Epoch [1/2], Iter [827/3125], train_loss:0.154325
Epoch [1/2], Iter [828/3125], train_loss:0.170027
Epoch [1/2], Iter [829/3125], train_loss:0.163227
Epoch [1/2], Iter [830/3125], train_loss:0.180084
Epoch [1/2], Iter [831/3125], train_loss:0.153447
Epoch [1/2], Iter [832/3125], train_loss:0.174136
Epoch [1/2], Iter [833/3125], train_loss:0.166332
Epoch [1/2], Iter [834/3125], train_loss:0.157354
Epoch [1/2], Iter [835/3125], train_loss:0.120264
Epoch [1/2], Iter [836/3125], train_loss:0.148319
Epoch [1/2], Iter [837/3125], train_loss:0.156353
Epoch [1/2], Iter [838/3125], train_loss:0.153210
Epoch [1/2], Iter [839/3125], train_loss:0.169396
Epoch [1/2], Iter [840/3125], train_loss:0.163863
Epoch [1/2], Iter [841/3125], train_loss:0.156365
Epoch [1/2], Iter [842/3125], train_loss:0.166741
Epoch [1/2], Iter [843/3125], train_loss:0.153688
Epoch [1/2], Iter [844/3125], train_loss:0.173625
Epoch [1/2], Iter [845/3125], train_loss:0.167021
Epoch [1/2], Iter [846/3125], train_loss:0.149013
Epoch [1/2], Iter [847/3125], train_loss:0.165667
Epoch [1/2], Iter [848/3125], train_loss:0.153663
Epoch [1/2], Iter [849/3125], train_loss:0.179361
Epoch [1/2], Iter [850/3125], train_loss:0.175290
Epoch [1/2], Iter [851/3125], train_loss:0.168034
Epoch [1/2], Iter [852/3125], train_loss:0.161994
Epoch [1/2], Iter [853/3125], train_loss:0.171823
Epoch [1/2], Iter [854/3125], train_loss:0.147653
Epoch [1/2], Iter [855/3125], train_loss:0.159570
Epoch [1/2], Iter [856/3125], train_loss:0.157177
Epoch [1/2], Iter [857/3125], train_loss:0.155356
Epoch [1/2], Iter [858/3125], train_loss:0.159401
Epoch [1/2], Iter [859/3125], train_loss:0.179499
Epoch [1/2], Iter [860/3125], train_loss:0.153504
Epoch [1/2], Iter [861/3125], train_loss:0.174314
Epoch [1/2], Iter [862/3125], train_loss:0.145309
Epoch [1/2], Iter [863/3125], train_loss:0.170880
Epoch [1/2], Iter [864/3125], train_loss:0.171654
Epoch [1/2], Iter [865/3125], train_loss:0.125795
Epoch [1/2], Iter [866/3125], train_loss:0.161326
Epoch [1/2], Iter [867/3125], train_loss:0.170622
Epoch [1/2], Iter [868/3125], train_loss:0.164444
Epoch [1/2], Iter [869/3125], train_loss:0.138547
Epoch [1/2], Iter [870/3125], train_loss:0.158241
Epoch [1/2], Iter [871/3125], train_loss:0.158963
Epoch [1/2], Iter [872/3125], train_loss:0.198872
Epoch [1/2], Iter [873/3125], train_loss:0.171953
Epoch [1/2], Iter [874/3125], train_loss:0.135011
Epoch [1/2], Iter [875/3125], train_loss:0.187574
Epoch [1/2], Iter [876/3125], train_loss:0.183588
Epoch [1/2], Iter [877/3125], train_loss:0.172207
Epoch [1/2], Iter [878/3125], train_loss:0.163572
Epoch [1/2], Iter [879/3125], train_loss:0.177148
Epoch [1/2], Iter [880/3125], train_loss:0.156533
Epoch [1/2], Iter [881/3125], train_loss:0.164676
Epoch [1/2], Iter [882/3125], train_loss:0.155344
Epoch [1/2], Iter [883/3125], train_loss:0.179421
Epoch [1/2], Iter [884/3125], train_loss:0.152775
Epoch [1/2], Iter [885/3125], train_loss:0.183656
Epoch [1/2], Iter [886/3125], train_loss:0.178685
Epoch [1/2], Iter [887/3125], train_loss:0.174813
Epoch [1/2], Iter [888/3125], train_loss:0.164418
Epoch [1/2], Iter [889/3125], train_loss:0.151287
Epoch [1/2], Iter [890/3125], train_loss:0.159186
Epoch [1/2], Iter [891/3125], train_loss:0.176169
Epoch [1/2], Iter [892/3125], train_loss:0.153548
Epoch [1/2], Iter [893/3125], train_loss:0.163016
Epoch [1/2], Iter [894/3125], train_loss:0.152066
Epoch [1/2], Iter [895/3125], train_loss:0.161777
Epoch [1/2], Iter [896/3125], train_loss:0.147675
Epoch [1/2], Iter [897/3125], train_loss:0.176385
Epoch [1/2], Iter [898/3125], train_loss:0.163108
Epoch [1/2], Iter [899/3125], train_loss:0.157772
Epoch [1/2], Iter [900/3125], train_loss:0.176365
Epoch [1/2], Iter [901/3125], train_loss:0.163414
Epoch [1/2], Iter [902/3125], train_loss:0.152687
Epoch [1/2], Iter [903/3125], train_loss:0.149312
Epoch [1/2], Iter [904/3125], train_loss:0.145944
Epoch [1/2], Iter [905/3125], train_loss:0.183935
Epoch [1/2], Iter [906/3125], train_loss:0.141416
Epoch [1/2], Iter [907/3125], train_loss:0.148432
Epoch [1/2], Iter [908/3125], train_loss:0.161458
Epoch [1/2], Iter [909/3125], train_loss:0.159551
Epoch [1/2], Iter [910/3125], train_loss:0.147279
Epoch [1/2], Iter [911/3125], train_loss:0.149100
Epoch [1/2], Iter [912/3125], train_loss:0.147561
Epoch [1/2], Iter [913/3125], train_loss:0.153887
Epoch [1/2], Iter [914/3125], train_loss:0.155617
Epoch [1/2], Iter [915/3125], train_loss:0.138967
Epoch [1/2], Iter [916/3125], train_loss:0.187655
Epoch [1/2], Iter [917/3125], train_loss:0.189089
Epoch [1/2], Iter [918/3125], train_loss:0.185985
Epoch [1/2], Iter [919/3125], train_loss:0.159035
Epoch [1/2], Iter [920/3125], train_loss:0.158106
Epoch [1/2], Iter [921/3125], train_loss:0.160929
Epoch [1/2], Iter [922/3125], train_loss:0.165616
Epoch [1/2], Iter [923/3125], train_loss:0.159126
Epoch [1/2], Iter [924/3125], train_loss:0.166481
Epoch [1/2], Iter [925/3125], train_loss:0.166022
Epoch [1/2], Iter [926/3125], train_loss:0.143194
Epoch [1/2], Iter [927/3125], train_loss:0.166617
Epoch [1/2], Iter [928/3125], train_loss:0.165519
Epoch [1/2], Iter [929/3125], train_loss:0.149431
Epoch [1/2], Iter [930/3125], train_loss:0.158727
Epoch [1/2], Iter [931/3125], train_loss:0.143095
Epoch [1/2], Iter [932/3125], train_loss:0.153236
Epoch [1/2], Iter [933/3125], train_loss:0.148599
Epoch [1/2], Iter [934/3125], train_loss:0.159922
Epoch [1/2], Iter [935/3125], train_loss:0.168778
Epoch [1/2], Iter [936/3125], train_loss:0.149560
Epoch [1/2], Iter [937/3125], train_loss:0.160552
Epoch [1/2], Iter [938/3125], train_loss:0.151009
Epoch [1/2], Iter [939/3125], train_loss:0.171371
Epoch [1/2], Iter [940/3125], train_loss:0.156552
Epoch [1/2], Iter [941/3125], train_loss:0.153480
Epoch [1/2], Iter [942/3125], train_loss:0.144583
Epoch [1/2], Iter [943/3125], train_loss:0.156411
Epoch [1/2], Iter [944/3125], train_loss:0.153091
Epoch [1/2], Iter [945/3125], train_loss:0.158801
Epoch [1/2], Iter [946/3125], train_loss:0.139097
Epoch [1/2], Iter [947/3125], train_loss:0.171316
Epoch [1/2], Iter [948/3125], train_loss:0.179716
Epoch [1/2], Iter [949/3125], train_loss:0.150855
Epoch [1/2], Iter [950/3125], train_loss:0.159264
Epoch [1/2], Iter [951/3125], train_loss:0.178506
Epoch [1/2], Iter [952/3125], train_loss:0.165264
Epoch [1/2], Iter [953/3125], train_loss:0.163345
Epoch [1/2], Iter [954/3125], train_loss:0.163006
Epoch [1/2], Iter [955/3125], train_loss:0.180055
Epoch [1/2], Iter [956/3125], train_loss:0.152678
Epoch [1/2], Iter [957/3125], train_loss:0.149015
Epoch [1/2], Iter [958/3125], train_loss:0.167205
Epoch [1/2], Iter [959/3125], train_loss:0.159652
Epoch [1/2], Iter [960/3125], train_loss:0.172454
Epoch [1/2], Iter [961/3125], train_loss:0.152453
Epoch [1/2], Iter [962/3125], train_loss:0.159779
Epoch [1/2], Iter [963/3125], train_loss:0.157254
Epoch [1/2], Iter [964/3125], train_loss:0.171195
Epoch [1/2], Iter [965/3125], train_loss:0.129605
Epoch [1/2], Iter [966/3125], train_loss:0.171718
Epoch [1/2], Iter [967/3125], train_loss:0.135528
Epoch [1/2], Iter [968/3125], train_loss:0.171078
Epoch [1/2], Iter [969/3125], train_loss:0.177459
Epoch [1/2], Iter [970/3125], train_loss:0.155430
Epoch [1/2], Iter [971/3125], train_loss:0.162782
Epoch [1/2], Iter [972/3125], train_loss:0.179943
Epoch [1/2], Iter [973/3125], train_loss:0.159568
Epoch [1/2], Iter [974/3125], train_loss:0.145395
Epoch [1/2], Iter [975/3125], train_loss:0.162883
Epoch [1/2], Iter [976/3125], train_loss:0.152242
Epoch [1/2], Iter [977/3125], train_loss:0.178401
Epoch [1/2], Iter [978/3125], train_loss:0.149824
Epoch [1/2], Iter [979/3125], train_loss:0.164016
Epoch [1/2], Iter [980/3125], train_loss:0.173642
Epoch [1/2], Iter [981/3125], train_loss:0.184837
Epoch [1/2], Iter [982/3125], train_loss:0.157643
Epoch [1/2], Iter [983/3125], train_loss:0.170323
Epoch [1/2], Iter [984/3125], train_loss:0.145053
Epoch [1/2], Iter [985/3125], train_loss:0.177159
Epoch [1/2], Iter [986/3125], train_loss:0.170435
Epoch [1/2], Iter [987/3125], train_loss:0.140393
Epoch [1/2], Iter [988/3125], train_loss:0.170333
Epoch [1/2], Iter [989/3125], train_loss:0.154973
Epoch [1/2], Iter [990/3125], train_loss:0.168512
Epoch [1/2], Iter [991/3125], train_loss:0.171748
Epoch [1/2], Iter [992/3125], train_loss:0.191529
Epoch [1/2], Iter [993/3125], train_loss:0.164100
Epoch [1/2], Iter [994/3125], train_loss:0.169794
Epoch [1/2], Iter [995/3125], train_loss:0.161297
Epoch [1/2], Iter [996/3125], train_loss:0.154997
Epoch [1/2], Iter [997/3125], train_loss:0.174858
Epoch [1/2], Iter [998/3125], train_loss:0.131562
Epoch [1/2], Iter [999/3125], train_loss:0.170373
Epoch [1/2], Iter [1000/3125], train_loss:0.174953
Epoch [1/2], Iter [1001/3125], train_loss:0.171201
Epoch [1/2], Iter [1002/3125], train_loss:0.157745
Epoch [1/2], Iter [1003/3125], train_loss:0.165321
Epoch [1/2], Iter [1004/3125], train_loss:0.166191
Epoch [1/2], Iter [1005/3125], train_loss:0.161590
Epoch [1/2], Iter [1006/3125], train_loss:0.155218
Epoch [1/2], Iter [1007/3125], train_loss:0.161306
Epoch [1/2], Iter [1008/3125], train_loss:0.160885
Epoch [1/2], Iter [1009/3125], train_loss:0.150420
Epoch [1/2], Iter [1010/3125], train_loss:0.180716
Epoch [1/2], Iter [1011/3125], train_loss:0.170864
Epoch [1/2], Iter [1012/3125], train_loss:0.155604
Epoch [1/2], Iter [1013/3125], train_loss:0.138882
Epoch [1/2], Iter [1014/3125], train_loss:0.163906
Epoch [1/2], Iter [1015/3125], train_loss:0.157286
Epoch [1/2], Iter [1016/3125], train_loss:0.176689
Epoch [1/2], Iter [1017/3125], train_loss:0.164429
Epoch [1/2], Iter [1018/3125], train_loss:0.151421
Epoch [1/2], Iter [1019/3125], train_loss:0.173269
Epoch [1/2], Iter [1020/3125], train_loss:0.159520
Epoch [1/2], Iter [1021/3125], train_loss:0.136108
Epoch [1/2], Iter [1022/3125], train_loss:0.168635
Epoch [1/2], Iter [1023/3125], train_loss:0.172606
Epoch [1/2], Iter [1024/3125], train_loss:0.169962
Epoch [1/2], Iter [1025/3125], train_loss:0.171667
Epoch [1/2], Iter [1026/3125], train_loss:0.186360
Epoch [1/2], Iter [1027/3125], train_loss:0.154808
Epoch [1/2], Iter [1028/3125], train_loss:0.162741
Epoch [1/2], Iter [1029/3125], train_loss:0.168956
Epoch [1/2], Iter [1030/3125], train_loss:0.167106
Epoch [1/2], Iter [1031/3125], train_loss:0.150996
Epoch [1/2], Iter [1032/3125], train_loss:0.148269
Epoch [1/2], Iter [1033/3125], train_loss:0.159704
Epoch [1/2], Iter [1034/3125], train_loss:0.169828
Epoch [1/2], Iter [1035/3125], train_loss:0.170876
Epoch [1/2], Iter [1036/3125], train_loss:0.152638
Epoch [1/2], Iter [1037/3125], train_loss:0.156386
Epoch [1/2], Iter [1038/3125], train_loss:0.158583
Epoch [1/2], Iter [1039/3125], train_loss:0.131727
Epoch [1/2], Iter [1040/3125], train_loss:0.159804
Epoch [1/2], Iter [1041/3125], train_loss:0.150478
Epoch [1/2], Iter [1042/3125], train_loss:0.172487
Epoch [1/2], Iter [1043/3125], train_loss:0.172604
Epoch [1/2], Iter [1044/3125], train_loss:0.176825
Epoch [1/2], Iter [1045/3125], train_loss:0.155156
Epoch [1/2], Iter [1046/3125], train_loss:0.159919
Epoch [1/2], Iter [1047/3125], train_loss:0.158133
Epoch [1/2], Iter [1048/3125], train_loss:0.171692
Epoch [1/2], Iter [1049/3125], train_loss:0.148961
Epoch [1/2], Iter [1050/3125], train_loss:0.145803
Epoch [1/2], Iter [1051/3125], train_loss:0.166840
Epoch [1/2], Iter [1052/3125], train_loss:0.144305
Epoch [1/2], Iter [1053/3125], train_loss:0.148482
Epoch [1/2], Iter [1054/3125], train_loss:0.159671
Epoch [1/2], Iter [1055/3125], train_loss:0.160208
Epoch [1/2], Iter [1056/3125], train_loss:0.167555
Epoch [1/2], Iter [1057/3125], train_loss:0.161161
Epoch [1/2], Iter [1058/3125], train_loss:0.149388
Epoch [1/2], Iter [1059/3125], train_loss:0.181070
Epoch [1/2], Iter [1060/3125], train_loss:0.171973
Epoch [1/2], Iter [1061/3125], train_loss:0.180709
Epoch [1/2], Iter [1062/3125], train_loss:0.153507
Epoch [1/2], Iter [1063/3125], train_loss:0.145896
Epoch [1/2], Iter [1064/3125], train_loss:0.166199
Epoch [1/2], Iter [1065/3125], train_loss:0.166107
Epoch [1/2], Iter [1066/3125], train_loss:0.155540
Epoch [1/2], Iter [1067/3125], train_loss:0.154284
Epoch [1/2], Iter [1068/3125], train_loss:0.187118
Epoch [1/2], Iter [1069/3125], train_loss:0.159748
Epoch [1/2], Iter [1070/3125], train_loss:0.164387
Epoch [1/2], Iter [1071/3125], train_loss:0.142183
Epoch [1/2], Iter [1072/3125], train_loss:0.138724
Epoch [1/2], Iter [1073/3125], train_loss:0.146154
Epoch [1/2], Iter [1074/3125], train_loss:0.162630
Epoch [1/2], Iter [1075/3125], train_loss:0.185121
Epoch [1/2], Iter [1076/3125], train_loss:0.160997
Epoch [1/2], Iter [1077/3125], train_loss:0.179530
Epoch [1/2], Iter [1078/3125], train_loss:0.153985
Epoch [1/2], Iter [1079/3125], train_loss:0.147778
Epoch [1/2], Iter [1080/3125], train_loss:0.149064
Epoch [1/2], Iter [1081/3125], train_loss:0.151860
Epoch [1/2], Iter [1082/3125], train_loss:0.161012
Epoch [1/2], Iter [1083/3125], train_loss:0.197195
Epoch [1/2], Iter [1084/3125], train_loss:0.157916
Epoch [1/2], Iter [1085/3125], train_loss:0.162013
Epoch [1/2], Iter [1086/3125], train_loss:0.160678
Epoch [1/2], Iter [1087/3125], train_loss:0.168688
Epoch [1/2], Iter [1088/3125], train_loss:0.139971
Epoch [1/2], Iter [1089/3125], train_loss:0.184158
Epoch [1/2], Iter [1090/3125], train_loss:0.184676
Epoch [1/2], Iter [1091/3125], train_loss:0.150608
Epoch [1/2], Iter [1092/3125], train_loss:0.151490
Epoch [1/2], Iter [1093/3125], train_loss:0.176660
Epoch [1/2], Iter [1094/3125], train_loss:0.153394
Epoch [1/2], Iter [1095/3125], train_loss:0.163054
Epoch [1/2], Iter [1096/3125], train_loss:0.144125
Epoch [1/2], Iter [1097/3125], train_loss:0.167320
Epoch [1/2], Iter [1098/3125], train_loss:0.158095
Epoch [1/2], Iter [1099/3125], train_loss:0.150448
Epoch [1/2], Iter [1100/3125], train_loss:0.167379
Epoch [1/2], Iter [1101/3125], train_loss:0.151696
Epoch [1/2], Iter [1102/3125], train_loss:0.174957
Epoch [1/2], Iter [1103/3125], train_loss:0.159035
Epoch [1/2], Iter [1104/3125], train_loss:0.163059
Epoch [1/2], Iter [1105/3125], train_loss:0.170439
Epoch [1/2], Iter [1106/3125], train_loss:0.163471
Epoch [1/2], Iter [1107/3125], train_loss:0.180205
Epoch [1/2], Iter [1108/3125], train_loss:0.148346
Epoch [1/2], Iter [1109/3125], train_loss:0.170757
Epoch [1/2], Iter [1110/3125], train_loss:0.155306
Epoch [1/2], Iter [1111/3125], train_loss:0.173857
Epoch [1/2], Iter [1112/3125], train_loss:0.149888
Epoch [1/2], Iter [1113/3125], train_loss:0.164874
Epoch [1/2], Iter [1114/3125], train_loss:0.165291
Epoch [1/2], Iter [1115/3125], train_loss:0.143524
Epoch [1/2], Iter [1116/3125], train_loss:0.151315
Epoch [1/2], Iter [1117/3125], train_loss:0.169421
Epoch [1/2], Iter [1118/3125], train_loss:0.166722
Epoch [1/2], Iter [1119/3125], train_loss:0.167637
Epoch [1/2], Iter [1120/3125], train_loss:0.155744
Epoch [1/2], Iter [1121/3125], train_loss:0.162011
Epoch [1/2], Iter [1122/3125], train_loss:0.160295
Epoch [1/2], Iter [1123/3125], train_loss:0.154625
Epoch [1/2], Iter [1124/3125], train_loss:0.151765
Epoch [1/2], Iter [1125/3125], train_loss:0.170621
Epoch [1/2], Iter [1126/3125], train_loss:0.155552
Epoch [1/2], Iter [1127/3125], train_loss:0.173134
Epoch [1/2], Iter [1128/3125], train_loss:0.153150
Epoch [1/2], Iter [1129/3125], train_loss:0.145719
Epoch [1/2], Iter [1130/3125], train_loss:0.187136
Epoch [1/2], Iter [1131/3125], train_loss:0.169417
Epoch [1/2], Iter [1132/3125], train_loss:0.178974
Epoch [1/2], Iter [1133/3125], train_loss:0.149931
Epoch [1/2], Iter [1134/3125], train_loss:0.155474
Epoch [1/2], Iter [1135/3125], train_loss:0.161715
Epoch [1/2], Iter [1136/3125], train_loss:0.165408
Epoch [1/2], Iter [1137/3125], train_loss:0.170022
Epoch [1/2], Iter [1138/3125], train_loss:0.147393
Epoch [1/2], Iter [1139/3125], train_loss:0.175394
Epoch [1/2], Iter [1140/3125], train_loss:0.157841
Epoch [1/2], Iter [1141/3125], train_loss:0.164718
Epoch [1/2], Iter [1142/3125], train_loss:0.154701
Epoch [1/2], Iter [1143/3125], train_loss:0.168679
Epoch [1/2], Iter [1144/3125], train_loss:0.181446
Epoch [1/2], Iter [1145/3125], train_loss:0.148103
Epoch [1/2], Iter [1146/3125], train_loss:0.151220
Epoch [1/2], Iter [1147/3125], train_loss:0.186586
Epoch [1/2], Iter [1148/3125], train_loss:0.174347
Epoch [1/2], Iter [1149/3125], train_loss:0.170932
Epoch [1/2], Iter [1150/3125], train_loss:0.165207
Epoch [1/2], Iter [1151/3125], train_loss:0.158725
Epoch [1/2], Iter [1152/3125], train_loss:0.155164
Epoch [1/2], Iter [1153/3125], train_loss:0.171893
Epoch [1/2], Iter [1154/3125], train_loss:0.162601
Epoch [1/2], Iter [1155/3125], train_loss:0.160125
Epoch [1/2], Iter [1156/3125], train_loss:0.181936
Epoch [1/2], Iter [1157/3125], train_loss:0.172337
Epoch [1/2], Iter [1158/3125], train_loss:0.147319
Epoch [1/2], Iter [1159/3125], train_loss:0.183120
Epoch [1/2], Iter [1160/3125], train_loss:0.168000
Epoch [1/2], Iter [1161/3125], train_loss:0.163454
Epoch [1/2], Iter [1162/3125], train_loss:0.158614
Epoch [1/2], Iter [1163/3125], train_loss:0.170988
Epoch [1/2], Iter [1164/3125], train_loss:0.162558
Epoch [1/2], Iter [1165/3125], train_loss:0.164345
Epoch [1/2], Iter [1166/3125], train_loss:0.151922
Epoch [1/2], Iter [1167/3125], train_loss:0.182245
Epoch [1/2], Iter [1168/3125], train_loss:0.162371
Epoch [1/2], Iter [1169/3125], train_loss:0.155639
Epoch [1/2], Iter [1170/3125], train_loss:0.157078
Epoch [1/2], Iter [1171/3125], train_loss:0.168648
Epoch [1/2], Iter [1172/3125], train_loss:0.160894
Epoch [1/2], Iter [1173/3125], train_loss:0.172699
Epoch [1/2], Iter [1174/3125], train_loss:0.186120
Epoch [1/2], Iter [1175/3125], train_loss:0.163291
Epoch [1/2], Iter [1176/3125], train_loss:0.160210
Epoch [1/2], Iter [1177/3125], train_loss:0.157460
Epoch [1/2], Iter [1178/3125], train_loss:0.169464
Epoch [1/2], Iter [1179/3125], train_loss:0.155117
Epoch [1/2], Iter [1180/3125], train_loss:0.175044
Epoch [1/2], Iter [1181/3125], train_loss:0.171335
Epoch [1/2], Iter [1182/3125], train_loss:0.156485
Epoch [1/2], Iter [1183/3125], train_loss:0.166067
Epoch [1/2], Iter [1184/3125], train_loss:0.161618
Epoch [1/2], Iter [1185/3125], train_loss:0.158961
Epoch [1/2], Iter [1186/3125], train_loss:0.158767
Epoch [1/2], Iter [1187/3125], train_loss:0.158581
Epoch [1/2], Iter [1188/3125], train_loss:0.150107
Epoch [1/2], Iter [1189/3125], train_loss:0.151393
Epoch [1/2], Iter [1190/3125], train_loss:0.157746
Epoch [1/2], Iter [1191/3125], train_loss:0.162504
Epoch [1/2], Iter [1192/3125], train_loss:0.162217
Epoch [1/2], Iter [1193/3125], train_loss:0.184125
Epoch [1/2], Iter [1194/3125], train_loss:0.155755
Epoch [1/2], Iter [1195/3125], train_loss:0.163561
Epoch [1/2], Iter [1196/3125], train_loss:0.169904
Epoch [1/2], Iter [1197/3125], train_loss:0.163287
Epoch [1/2], Iter [1198/3125], train_loss:0.162994
Epoch [1/2], Iter [1199/3125], train_loss:0.179970
Epoch [1/2], Iter [1200/3125], train_loss:0.175508
Epoch [1/2], Iter [1201/3125], train_loss:0.171165
Epoch [1/2], Iter [1202/3125], train_loss:0.157112
Epoch [1/2], Iter [1203/3125], train_loss:0.162217
Epoch [1/2], Iter [1204/3125], train_loss:0.168821
Epoch [1/2], Iter [1205/3125], train_loss:0.197255
Epoch [1/2], Iter [1206/3125], train_loss:0.160194
Epoch [1/2], Iter [1207/3125], train_loss:0.151903
Epoch [1/2], Iter [1208/3125], train_loss:0.149178
Epoch [1/2], Iter [1209/3125], train_loss:0.146489
Epoch [1/2], Iter [1210/3125], train_loss:0.154193
Epoch [1/2], Iter [1211/3125], train_loss:0.157028
Epoch [1/2], Iter [1212/3125], train_loss:0.162961
Epoch [1/2], Iter [1213/3125], train_loss:0.176358
Epoch [1/2], Iter [1214/3125], train_loss:0.170513
Epoch [1/2], Iter [1215/3125], train_loss:0.166415
Epoch [1/2], Iter [1216/3125], train_loss:0.150504
Epoch [1/2], Iter [1217/3125], train_loss:0.169194
Epoch [1/2], Iter [1218/3125], train_loss:0.173286
Epoch [1/2], Iter [1219/3125], train_loss:0.170073
Epoch [1/2], Iter [1220/3125], train_loss:0.157464
Epoch [1/2], Iter [1221/3125], train_loss:0.153022
Epoch [1/2], Iter [1222/3125], train_loss:0.164855
Epoch [1/2], Iter [1223/3125], train_loss:0.155083
Epoch [1/2], Iter [1224/3125], train_loss:0.165551
Epoch [1/2], Iter [1225/3125], train_loss:0.185195
Epoch [1/2], Iter [1226/3125], train_loss:0.177821
Epoch [1/2], Iter [1227/3125], train_loss:0.154561
Epoch [1/2], Iter [1228/3125], train_loss:0.159085
Epoch [1/2], Iter [1229/3125], train_loss:0.171906
Epoch [1/2], Iter [1230/3125], train_loss:0.160470
Epoch [1/2], Iter [1231/3125], train_loss:0.151237
Epoch [1/2], Iter [1232/3125], train_loss:0.135055
Epoch [1/2], Iter [1233/3125], train_loss:0.140605
Epoch [1/2], Iter [1234/3125], train_loss:0.183646
Epoch [1/2], Iter [1235/3125], train_loss:0.158728
Epoch [1/2], Iter [1236/3125], train_loss:0.163355
Epoch [1/2], Iter [1237/3125], train_loss:0.148448
Epoch [1/2], Iter [1238/3125], train_loss:0.165396
Epoch [1/2], Iter [1239/3125], train_loss:0.181543
Epoch [1/2], Iter [1240/3125], train_loss:0.166355
Epoch [1/2], Iter [1241/3125], train_loss:0.158869
Epoch [1/2], Iter [1242/3125], train_loss:0.153979
Epoch [1/2], Iter [1243/3125], train_loss:0.155492
Epoch [1/2], Iter [1244/3125], train_loss:0.170940
Epoch [1/2], Iter [1245/3125], train_loss:0.166005
Epoch [1/2], Iter [1246/3125], train_loss:0.158416
Epoch [1/2], Iter [1247/3125], train_loss:0.154584
Epoch [1/2], Iter [1248/3125], train_loss:0.152003
Epoch [1/2], Iter [1249/3125], train_loss:0.168855
Epoch [1/2], Iter [1250/3125], train_loss:0.148871
Epoch [1/2], Iter [1251/3125], train_loss:0.175113
Epoch [1/2], Iter [1252/3125], train_loss:0.149920
Epoch [1/2], Iter [1253/3125], train_loss:0.151580
Epoch [1/2], Iter [1254/3125], train_loss:0.168768
Epoch [1/2], Iter [1255/3125], train_loss:0.166119
Epoch [1/2], Iter [1256/3125], train_loss:0.140963
Epoch [1/2], Iter [1257/3125], train_loss:0.168684
Epoch [1/2], Iter [1258/3125], train_loss:0.158394
Epoch [1/2], Iter [1259/3125], train_loss:0.161410
Epoch [1/2], Iter [1260/3125], train_loss:0.148364
Epoch [1/2], Iter [1261/3125], train_loss:0.165485
Epoch [1/2], Iter [1262/3125], train_loss:0.153689
Epoch [1/2], Iter [1263/3125], train_loss:0.171761
Epoch [1/2], Iter [1264/3125], train_loss:0.163797
Epoch [1/2], Iter [1265/3125], train_loss:0.146530
Epoch [1/2], Iter [1266/3125], train_loss:0.158110
Epoch [1/2], Iter [1267/3125], train_loss:0.160058
Epoch [1/2], Iter [1268/3125], train_loss:0.157368
Epoch [1/2], Iter [1269/3125], train_loss:0.151690
Epoch [1/2], Iter [1270/3125], train_loss:0.142817
Epoch [1/2], Iter [1271/3125], train_loss:0.153046
Epoch [1/2], Iter [1272/3125], train_loss:0.162205
Epoch [1/2], Iter [1273/3125], train_loss:0.179852
Epoch [1/2], Iter [1274/3125], train_loss:0.156627
Epoch [1/2], Iter [1275/3125], train_loss:0.158944
Epoch [1/2], Iter [1276/3125], train_loss:0.148821
Epoch [1/2], Iter [1277/3125], train_loss:0.157448
Epoch [1/2], Iter [1278/3125], train_loss:0.178943
Epoch [1/2], Iter [1279/3125], train_loss:0.170738
Epoch [1/2], Iter [1280/3125], train_loss:0.146238
Epoch [1/2], Iter [1281/3125], train_loss:0.166454
Epoch [1/2], Iter [1282/3125], train_loss:0.147360
Epoch [1/2], Iter [1283/3125], train_loss:0.166235
Epoch [1/2], Iter [1284/3125], train_loss:0.160503
Epoch [1/2], Iter [1285/3125], train_loss:0.155493
Epoch [1/2], Iter [1286/3125], train_loss:0.164259
Epoch [1/2], Iter [1287/3125], train_loss:0.159880
Epoch [1/2], Iter [1288/3125], train_loss:0.174088
Epoch [1/2], Iter [1289/3125], train_loss:0.158363
Epoch [1/2], Iter [1290/3125], train_loss:0.160815
Epoch [1/2], Iter [1291/3125], train_loss:0.168558
Epoch [1/2], Iter [1292/3125], train_loss:0.155379
Epoch [1/2], Iter [1293/3125], train_loss:0.158657
Epoch [1/2], Iter [1294/3125], train_loss:0.152092
Epoch [1/2], Iter [1295/3125], train_loss:0.151002
Epoch [1/2], Iter [1296/3125], train_loss:0.177545
Epoch [1/2], Iter [1297/3125], train_loss:0.155996
Epoch [1/2], Iter [1298/3125], train_loss:0.153431
Epoch [1/2], Iter [1299/3125], train_loss:0.160402
Epoch [1/2], Iter [1300/3125], train_loss:0.164605
Epoch [1/2], Iter [1301/3125], train_loss:0.181966
Epoch [1/2], Iter [1302/3125], train_loss:0.150270
Epoch [1/2], Iter [1303/3125], train_loss:0.153899
Epoch [1/2], Iter [1304/3125], train_loss:0.167255
Epoch [1/2], Iter [1305/3125], train_loss:0.164807
Epoch [1/2], Iter [1306/3125], train_loss:0.176301
Epoch [1/2], Iter [1307/3125], train_loss:0.155036
Epoch [1/2], Iter [1308/3125], train_loss:0.167926
Epoch [1/2], Iter [1309/3125], train_loss:0.176630
Epoch [1/2], Iter [1310/3125], train_loss:0.160102
Epoch [1/2], Iter [1311/3125], train_loss:0.173452
Epoch [1/2], Iter [1312/3125], train_loss:0.172366
Epoch [1/2], Iter [1313/3125], train_loss:0.156772
Epoch [1/2], Iter [1314/3125], train_loss:0.168792
Epoch [1/2], Iter [1315/3125], train_loss:0.178687
Epoch [1/2], Iter [1316/3125], train_loss:0.181647
Epoch [1/2], Iter [1317/3125], train_loss:0.154158
Epoch [1/2], Iter [1318/3125], train_loss:0.151710
Epoch [1/2], Iter [1319/3125], train_loss:0.183539
Epoch [1/2], Iter [1320/3125], train_loss:0.160138
Epoch [1/2], Iter [1321/3125], train_loss:0.177658
Epoch [1/2], Iter [1322/3125], train_loss:0.146497
Epoch [1/2], Iter [1323/3125], train_loss:0.196226
Epoch [1/2], Iter [1324/3125], train_loss:0.165244
Epoch [1/2], Iter [1325/3125], train_loss:0.197831
Epoch [1/2], Iter [1326/3125], train_loss:0.175092
Epoch [1/2], Iter [1327/3125], train_loss:0.184453
Epoch [1/2], Iter [1328/3125], train_loss:0.165453
Epoch [1/2], Iter [1329/3125], train_loss:0.145549
Epoch [1/2], Iter [1330/3125], train_loss:0.173061
Epoch [1/2], Iter [1331/3125], train_loss:0.166073
Epoch [1/2], Iter [1332/3125], train_loss:0.156471
Epoch [1/2], Iter [1333/3125], train_loss:0.152220
Epoch [1/2], Iter [1334/3125], train_loss:0.156158
Epoch [1/2], Iter [1335/3125], train_loss:0.165017
Epoch [1/2], Iter [1336/3125], train_loss:0.183256
Epoch [1/2], Iter [1337/3125], train_loss:0.167704
Epoch [1/2], Iter [1338/3125], train_loss:0.154254
Epoch [1/2], Iter [1339/3125], train_loss:0.162098
Epoch [1/2], Iter [1340/3125], train_loss:0.161697
Epoch [1/2], Iter [1341/3125], train_loss:0.164405
Epoch [1/2], Iter [1342/3125], train_loss:0.149967
Epoch [1/2], Iter [1343/3125], train_loss:0.171982
Epoch [1/2], Iter [1344/3125], train_loss:0.155723
Epoch [1/2], Iter [1345/3125], train_loss:0.147691
Epoch [1/2], Iter [1346/3125], train_loss:0.160214
Epoch [1/2], Iter [1347/3125], train_loss:0.154677
Epoch [1/2], Iter [1348/3125], train_loss:0.152759
Epoch [1/2], Iter [1349/3125], train_loss:0.166476
Epoch [1/2], Iter [1350/3125], train_loss:0.163566
Epoch [1/2], Iter [1351/3125], train_loss:0.150434
Epoch [1/2], Iter [1352/3125], train_loss:0.168793
Epoch [1/2], Iter [1353/3125], train_loss:0.162513
Epoch [1/2], Iter [1354/3125], train_loss:0.169711
Epoch [1/2], Iter [1355/3125], train_loss:0.158046
Epoch [1/2], Iter [1356/3125], train_loss:0.151754
Epoch [1/2], Iter [1357/3125], train_loss:0.170661
Epoch [1/2], Iter [1358/3125], train_loss:0.152679
Epoch [1/2], Iter [1359/3125], train_loss:0.167173
Epoch [1/2], Iter [1360/3125], train_loss:0.156606
Epoch [1/2], Iter [1361/3125], train_loss:0.183170
Epoch [1/2], Iter [1362/3125], train_loss:0.142545
Epoch [1/2], Iter [1363/3125], train_loss:0.159119
Epoch [1/2], Iter [1364/3125], train_loss:0.164405
Epoch [1/2], Iter [1365/3125], train_loss:0.159609
Epoch [1/2], Iter [1366/3125], train_loss:0.161490
Epoch [1/2], Iter [1367/3125], train_loss:0.167248
Epoch [1/2], Iter [1368/3125], train_loss:0.165266
Epoch [1/2], Iter [1369/3125], train_loss:0.164672
Epoch [1/2], Iter [1370/3125], train_loss:0.178968
Epoch [1/2], Iter [1371/3125], train_loss:0.139022
Epoch [1/2], Iter [1372/3125], train_loss:0.157129
Epoch [1/2], Iter [1373/3125], train_loss:0.170236
Epoch [1/2], Iter [1374/3125], train_loss:0.172654
Epoch [1/2], Iter [1375/3125], train_loss:0.154364
Epoch [1/2], Iter [1376/3125], train_loss:0.191031
Epoch [1/2], Iter [1377/3125], train_loss:0.154899
Epoch [1/2], Iter [1378/3125], train_loss:0.154030
Epoch [1/2], Iter [1379/3125], train_loss:0.164986
Epoch [1/2], Iter [1380/3125], train_loss:0.149888
Epoch [1/2], Iter [1381/3125], train_loss:0.161112
Epoch [1/2], Iter [1382/3125], train_loss:0.177446
Epoch [1/2], Iter [1383/3125], train_loss:0.181748
Epoch [1/2], Iter [1384/3125], train_loss:0.148632
Epoch [1/2], Iter [1385/3125], train_loss:0.171001
Epoch [1/2], Iter [1386/3125], train_loss:0.146871
Epoch [1/2], Iter [1387/3125], train_loss:0.152815
Epoch [1/2], Iter [1388/3125], train_loss:0.153880
Epoch [1/2], Iter [1389/3125], train_loss:0.167807
Epoch [1/2], Iter [1390/3125], train_loss:0.163647
Epoch [1/2], Iter [1391/3125], train_loss:0.159752
Epoch [1/2], Iter [1392/3125], train_loss:0.148706
Epoch [1/2], Iter [1393/3125], train_loss:0.145519
Epoch [1/2], Iter [1394/3125], train_loss:0.149635
Epoch [1/2], Iter [1395/3125], train_loss:0.156076
Epoch [1/2], Iter [1396/3125], train_loss:0.156446
Epoch [1/2], Iter [1397/3125], train_loss:0.165825
Epoch [1/2], Iter [1398/3125], train_loss:0.145675
Epoch [1/2], Iter [1399/3125], train_loss:0.142919
Epoch [1/2], Iter [1400/3125], train_loss:0.163214
Epoch [1/2], Iter [1401/3125], train_loss:0.155447
Epoch [1/2], Iter [1402/3125], train_loss:0.164264
Epoch [1/2], Iter [1403/3125], train_loss:0.168581
Epoch [1/2], Iter [1404/3125], train_loss:0.149629
Epoch [1/2], Iter [1405/3125], train_loss:0.164211
Epoch [1/2], Iter [1406/3125], train_loss:0.168869
Epoch [1/2], Iter [1407/3125], train_loss:0.153973
Epoch [1/2], Iter [1408/3125], train_loss:0.173186
Epoch [1/2], Iter [1409/3125], train_loss:0.174420
Epoch [1/2], Iter [1410/3125], train_loss:0.154398
Epoch [1/2], Iter [1411/3125], train_loss:0.147271
Epoch [1/2], Iter [1412/3125], train_loss:0.172150
Epoch [1/2], Iter [1413/3125], train_loss:0.144826
Epoch [1/2], Iter [1414/3125], train_loss:0.160246
Epoch [1/2], Iter [1415/3125], train_loss:0.166100
Epoch [1/2], Iter [1416/3125], train_loss:0.151365
Epoch [1/2], Iter [1417/3125], train_loss:0.144425
Epoch [1/2], Iter [1418/3125], train_loss:0.145632
Epoch [1/2], Iter [1419/3125], train_loss:0.167284
Epoch [1/2], Iter [1420/3125], train_loss:0.162896
Epoch [1/2], Iter [1421/3125], train_loss:0.168027
Epoch [1/2], Iter [1422/3125], train_loss:0.160721
Epoch [1/2], Iter [1423/3125], train_loss:0.164672
Epoch [1/2], Iter [1424/3125], train_loss:0.162177
Epoch [1/2], Iter [1425/3125], train_loss:0.166051
Epoch [1/2], Iter [1426/3125], train_loss:0.146067
Epoch [1/2], Iter [1427/3125], train_loss:0.159922
Epoch [1/2], Iter [1428/3125], train_loss:0.152642
Epoch [1/2], Iter [1429/3125], train_loss:0.148200
Epoch [1/2], Iter [1430/3125], train_loss:0.158262
Epoch [1/2], Iter [1431/3125], train_loss:0.149659
Epoch [1/2], Iter [1432/3125], train_loss:0.163230
Epoch [1/2], Iter [1433/3125], train_loss:0.145847
Epoch [1/2], Iter [1434/3125], train_loss:0.173391
Epoch [1/2], Iter [1435/3125], train_loss:0.125152
Epoch [1/2], Iter [1436/3125], train_loss:0.156048
Epoch [1/2], Iter [1437/3125], train_loss:0.157051
Epoch [1/2], Iter [1438/3125], train_loss:0.158350
Epoch [1/2], Iter [1439/3125], train_loss:0.192877
Epoch [1/2], Iter [1440/3125], train_loss:0.167545
Epoch [1/2], Iter [1441/3125], train_loss:0.197214
Epoch [1/2], Iter [1442/3125], train_loss:0.192135
Epoch [1/2], Iter [1443/3125], train_loss:0.172984
Epoch [1/2], Iter [1444/3125], train_loss:0.173254
Epoch [1/2], Iter [1445/3125], train_loss:0.144135
Epoch [1/2], Iter [1446/3125], train_loss:0.169613
Epoch [1/2], Iter [1447/3125], train_loss:0.167091
Epoch [1/2], Iter [1448/3125], train_loss:0.158223
Epoch [1/2], Iter [1449/3125], train_loss:0.172568
Epoch [1/2], Iter [1450/3125], train_loss:0.162713
Epoch [1/2], Iter [1451/3125], train_loss:0.168839
Epoch [1/2], Iter [1452/3125], train_loss:0.168881
Epoch [1/2], Iter [1453/3125], train_loss:0.166082
Epoch [1/2], Iter [1454/3125], train_loss:0.137113
Epoch [1/2], Iter [1455/3125], train_loss:0.156944
Epoch [1/2], Iter [1456/3125], train_loss:0.176010
Epoch [1/2], Iter [1457/3125], train_loss:0.165683
Epoch [1/2], Iter [1458/3125], train_loss:0.166721
Epoch [1/2], Iter [1459/3125], train_loss:0.177907
Epoch [1/2], Iter [1460/3125], train_loss:0.148519
Epoch [1/2], Iter [1461/3125], train_loss:0.178192
Epoch [1/2], Iter [1462/3125], train_loss:0.163624
Epoch [1/2], Iter [1463/3125], train_loss:0.160104
Epoch [1/2], Iter [1464/3125], train_loss:0.175325
Epoch [1/2], Iter [1465/3125], train_loss:0.173073
Epoch [1/2], Iter [1466/3125], train_loss:0.167916
Epoch [1/2], Iter [1467/3125], train_loss:0.161516
Epoch [1/2], Iter [1468/3125], train_loss:0.168107
Epoch [1/2], Iter [1469/3125], train_loss:0.169824
Epoch [1/2], Iter [1470/3125], train_loss:0.160803
Epoch [1/2], Iter [1471/3125], train_loss:0.170264
Epoch [1/2], Iter [1472/3125], train_loss:0.168911
Epoch [1/2], Iter [1473/3125], train_loss:0.143244
Epoch [1/2], Iter [1474/3125], train_loss:0.154688
Epoch [1/2], Iter [1475/3125], train_loss:0.152704
Epoch [1/2], Iter [1476/3125], train_loss:0.153546
Epoch [1/2], Iter [1477/3125], train_loss:0.180169
Epoch [1/2], Iter [1478/3125], train_loss:0.150831
Epoch [1/2], Iter [1479/3125], train_loss:0.171316
Epoch [1/2], Iter [1480/3125], train_loss:0.168213
Epoch [1/2], Iter [1481/3125], train_loss:0.172205
Epoch [1/2], Iter [1482/3125], train_loss:0.142973
Epoch [1/2], Iter [1483/3125], train_loss:0.157204
Epoch [1/2], Iter [1484/3125], train_loss:0.172524
Epoch [1/2], Iter [1485/3125], train_loss:0.157539
Epoch [1/2], Iter [1486/3125], train_loss:0.143420
Epoch [1/2], Iter [1487/3125], train_loss:0.162053
Epoch [1/2], Iter [1488/3125], train_loss:0.167251
Epoch [1/2], Iter [1489/3125], train_loss:0.172743
Epoch [1/2], Iter [1490/3125], train_loss:0.166958
Epoch [1/2], Iter [1491/3125], train_loss:0.168802
Epoch [1/2], Iter [1492/3125], train_loss:0.160231
Epoch [1/2], Iter [1493/3125], train_loss:0.171343
Epoch [1/2], Iter [1494/3125], train_loss:0.167754
Epoch [1/2], Iter [1495/3125], train_loss:0.166312
Epoch [1/2], Iter [1496/3125], train_loss:0.162917
Epoch [1/2], Iter [1497/3125], train_loss:0.162183
Epoch [1/2], Iter [1498/3125], train_loss:0.170274
Epoch [1/2], Iter [1499/3125], train_loss:0.177937
Epoch [1/2], Iter [1500/3125], train_loss:0.142511
Epoch [1/2], Iter [1501/3125], train_loss:0.146676
Epoch [1/2], Iter [1502/3125], train_loss:0.165919
Epoch [1/2], Iter [1503/3125], train_loss:0.153276
Epoch [1/2], Iter [1504/3125], train_loss:0.169737
Epoch [1/2], Iter [1505/3125], train_loss:0.155799
Epoch [1/2], Iter [1506/3125], train_loss:0.160062
Epoch [1/2], Iter [1507/3125], train_loss:0.156737
Epoch [1/2], Iter [1508/3125], train_loss:0.171055
Epoch [1/2], Iter [1509/3125], train_loss:0.155235
Epoch [1/2], Iter [1510/3125], train_loss:0.144856
Epoch [1/2], Iter [1511/3125], train_loss:0.154941
Epoch [1/2], Iter [1512/3125], train_loss:0.141613
Epoch [1/2], Iter [1513/3125], train_loss:0.169685
Epoch [1/2], Iter [1514/3125], train_loss:0.153574
Epoch [1/2], Iter [1515/3125], train_loss:0.165675
Epoch [1/2], Iter [1516/3125], train_loss:0.194039
Epoch [1/2], Iter [1517/3125], train_loss:0.136731
Epoch [1/2], Iter [1518/3125], train_loss:0.162655
Epoch [1/2], Iter [1519/3125], train_loss:0.157449
Epoch [1/2], Iter [1520/3125], train_loss:0.172672
Epoch [1/2], Iter [1521/3125], train_loss:0.185573
Epoch [1/2], Iter [1522/3125], train_loss:0.177209
Epoch [1/2], Iter [1523/3125], train_loss:0.144910
Epoch [1/2], Iter [1524/3125], train_loss:0.160207
Epoch [1/2], Iter [1525/3125], train_loss:0.163809
Epoch [1/2], Iter [1526/3125], train_loss:0.161429
Epoch [1/2], Iter [1527/3125], train_loss:0.149817
Epoch [1/2], Iter [1528/3125], train_loss:0.182072
Epoch [1/2], Iter [1529/3125], train_loss:0.175234
Epoch [1/2], Iter [1530/3125], train_loss:0.170426
Epoch [1/2], Iter [1531/3125], train_loss:0.155083
Epoch [1/2], Iter [1532/3125], train_loss:0.180693
Epoch [1/2], Iter [1533/3125], train_loss:0.168464
Epoch [1/2], Iter [1534/3125], train_loss:0.159503
Epoch [1/2], Iter [1535/3125], train_loss:0.161271
Epoch [1/2], Iter [1536/3125], train_loss:0.141511
Epoch [1/2], Iter [1537/3125], train_loss:0.166983
Epoch [1/2], Iter [1538/3125], train_loss:0.146959
Epoch [1/2], Iter [1539/3125], train_loss:0.161491
Epoch [1/2], Iter [1540/3125], train_loss:0.166418
Epoch [1/2], Iter [1541/3125], train_loss:0.158611
Epoch [1/2], Iter [1542/3125], train_loss:0.152136
Epoch [1/2], Iter [1543/3125], train_loss:0.172768
Epoch [1/2], Iter [1544/3125], train_loss:0.152095
Epoch [1/2], Iter [1545/3125], train_loss:0.142310
Epoch [1/2], Iter [1546/3125], train_loss:0.150826
Epoch [1/2], Iter [1547/3125], train_loss:0.147781
Epoch [1/2], Iter [1548/3125], train_loss:0.171356
Epoch [1/2], Iter [1549/3125], train_loss:0.149514
Epoch [1/2], Iter [1550/3125], train_loss:0.156635
Epoch [1/2], Iter [1551/3125], train_loss:0.172591
Epoch [1/2], Iter [1552/3125], train_loss:0.178937
Epoch [1/2], Iter [1553/3125], train_loss:0.165982
Epoch [1/2], Iter [1554/3125], train_loss:0.169758
Epoch [1/2], Iter [1555/3125], train_loss:0.170617
Epoch [1/2], Iter [1556/3125], train_loss:0.168998
Epoch [1/2], Iter [1557/3125], train_loss:0.165370
Epoch [1/2], Iter [1558/3125], train_loss:0.161071
Epoch [1/2], Iter [1559/3125], train_loss:0.158345
Epoch [1/2], Iter [1560/3125], train_loss:0.186070
Epoch [1/2], Iter [1561/3125], train_loss:0.160171
Epoch [1/2], Iter [1562/3125], train_loss:0.179954
Epoch [1/2], Iter [1563/3125], train_loss:0.172075
Epoch [1/2], Iter [1564/3125], train_loss:0.151861
Epoch [1/2], Iter [1565/3125], train_loss:0.173611
Epoch [1/2], Iter [1566/3125], train_loss:0.167420
Epoch [1/2], Iter [1567/3125], train_loss:0.149860
Epoch [1/2], Iter [1568/3125], train_loss:0.154862
Epoch [1/2], Iter [1569/3125], train_loss:0.168478
Epoch [1/2], Iter [1570/3125], train_loss:0.159789
Epoch [1/2], Iter [1571/3125], train_loss:0.145618
Epoch [1/2], Iter [1572/3125], train_loss:0.171799
Epoch [1/2], Iter [1573/3125], train_loss:0.146416
Epoch [1/2], Iter [1574/3125], train_loss:0.151151
Epoch [1/2], Iter [1575/3125], train_loss:0.179502
Epoch [1/2], Iter [1576/3125], train_loss:0.169216
Epoch [1/2], Iter [1577/3125], train_loss:0.162661
Epoch [1/2], Iter [1578/3125], train_loss:0.155157
Epoch [1/2], Iter [1579/3125], train_loss:0.155743
Epoch [1/2], Iter [1580/3125], train_loss:0.189926
Epoch [1/2], Iter [1581/3125], train_loss:0.164597
Epoch [1/2], Iter [1582/3125], train_loss:0.137085
Epoch [1/2], Iter [1583/3125], train_loss:0.160498
Epoch [1/2], Iter [1584/3125], train_loss:0.179899
Epoch [1/2], Iter [1585/3125], train_loss:0.145678
Epoch [1/2], Iter [1586/3125], train_loss:0.157172
Epoch [1/2], Iter [1587/3125], train_loss:0.168524
Epoch [1/2], Iter [1588/3125], train_loss:0.156430
Epoch [1/2], Iter [1589/3125], train_loss:0.133350
Epoch [1/2], Iter [1590/3125], train_loss:0.176202
Epoch [1/2], Iter [1591/3125], train_loss:0.169190
Epoch [1/2], Iter [1592/3125], train_loss:0.175895
Epoch [1/2], Iter [1593/3125], train_loss:0.174856
Epoch [1/2], Iter [1594/3125], train_loss:0.163053
Epoch [1/2], Iter [1595/3125], train_loss:0.181598
Epoch [1/2], Iter [1596/3125], train_loss:0.144885
Epoch [1/2], Iter [1597/3125], train_loss:0.170181
Epoch [1/2], Iter [1598/3125], train_loss:0.171142
Epoch [1/2], Iter [1599/3125], train_loss:0.141454
Epoch [1/2], Iter [1600/3125], train_loss:0.145220
Epoch [1/2], Iter [1601/3125], train_loss:0.141769
Epoch [1/2], Iter [1602/3125], train_loss:0.154418
Epoch [1/2], Iter [1603/3125], train_loss:0.160915
Epoch [1/2], Iter [1604/3125], train_loss:0.168919
Epoch [1/2], Iter [1605/3125], train_loss:0.180826
Epoch [1/2], Iter [1606/3125], train_loss:0.152858
Epoch [1/2], Iter [1607/3125], train_loss:0.164291
Epoch [1/2], Iter [1608/3125], train_loss:0.163692
Epoch [1/2], Iter [1609/3125], train_loss:0.147713
Epoch [1/2], Iter [1610/3125], train_loss:0.157055
Epoch [1/2], Iter [1611/3125], train_loss:0.194847
Epoch [1/2], Iter [1612/3125], train_loss:0.158709
Epoch [1/2], Iter [1613/3125], train_loss:0.166231
Epoch [1/2], Iter [1614/3125], train_loss:0.186130
Epoch [1/2], Iter [1615/3125], train_loss:0.192118
Epoch [1/2], Iter [1616/3125], train_loss:0.167463
Epoch [1/2], Iter [1617/3125], train_loss:0.137858
Epoch [1/2], Iter [1618/3125], train_loss:0.146418
Epoch [1/2], Iter [1619/3125], train_loss:0.173478
Epoch [1/2], Iter [1620/3125], train_loss:0.161725
Epoch [1/2], Iter [1621/3125], train_loss:0.152998
Epoch [1/2], Iter [1622/3125], train_loss:0.185276
Epoch [1/2], Iter [1623/3125], train_loss:0.152323
Epoch [1/2], Iter [1624/3125], train_loss:0.145322
Epoch [1/2], Iter [1625/3125], train_loss:0.161513
Epoch [1/2], Iter [1626/3125], train_loss:0.153024
Epoch [1/2], Iter [1627/3125], train_loss:0.164977
Epoch [1/2], Iter [1628/3125], train_loss:0.165822
Epoch [1/2], Iter [1629/3125], train_loss:0.151458
Epoch [1/2], Iter [1630/3125], train_loss:0.168540
Epoch [1/2], Iter [1631/3125], train_loss:0.181315
Epoch [1/2], Iter [1632/3125], train_loss:0.169901
Epoch [1/2], Iter [1633/3125], train_loss:0.169646
Epoch [1/2], Iter [1634/3125], train_loss:0.161900
Epoch [1/2], Iter [1635/3125], train_loss:0.141304
Epoch [1/2], Iter [1636/3125], train_loss:0.144623
Epoch [1/2], Iter [1637/3125], train_loss:0.156594
Epoch [1/2], Iter [1638/3125], train_loss:0.150709
Epoch [1/2], Iter [1639/3125], train_loss:0.137099
Epoch [1/2], Iter [1640/3125], train_loss:0.153333
Epoch [1/2], Iter [1641/3125], train_loss:0.157802
Epoch [1/2], Iter [1642/3125], train_loss:0.143059
Epoch [1/2], Iter [1643/3125], train_loss:0.189253
Epoch [1/2], Iter [1644/3125], train_loss:0.155171
Epoch [1/2], Iter [1645/3125], train_loss:0.152370
Epoch [1/2], Iter [1646/3125], train_loss:0.166632
Epoch [1/2], Iter [1647/3125], train_loss:0.179730
Epoch [1/2], Iter [1648/3125], train_loss:0.172416
Epoch [1/2], Iter [1649/3125], train_loss:0.178696
Epoch [1/2], Iter [1650/3125], train_loss:0.160341
Epoch [1/2], Iter [1651/3125], train_loss:0.144308
Epoch [1/2], Iter [1652/3125], train_loss:0.154410
Epoch [1/2], Iter [1653/3125], train_loss:0.173912
Epoch [1/2], Iter [1654/3125], train_loss:0.168114
Epoch [1/2], Iter [1655/3125], train_loss:0.161952
Epoch [1/2], Iter [1656/3125], train_loss:0.169061
Epoch [1/2], Iter [1657/3125], train_loss:0.159440
Epoch [1/2], Iter [1658/3125], train_loss:0.138091
Epoch [1/2], Iter [1659/3125], train_loss:0.155012
Epoch [1/2], Iter [1660/3125], train_loss:0.172804
Epoch [1/2], Iter [1661/3125], train_loss:0.143058
Epoch [1/2], Iter [1662/3125], train_loss:0.162421
Epoch [1/2], Iter [1663/3125], train_loss:0.143751
Epoch [1/2], Iter [1664/3125], train_loss:0.140241
Epoch [1/2], Iter [1665/3125], train_loss:0.172097
Epoch [1/2], Iter [1666/3125], train_loss:0.163913
Epoch [1/2], Iter [1667/3125], train_loss:0.165221
Epoch [1/2], Iter [1668/3125], train_loss:0.174985
Epoch [1/2], Iter [1669/3125], train_loss:0.157103
Epoch [1/2], Iter [1670/3125], train_loss:0.171044
Epoch [1/2], Iter [1671/3125], train_loss:0.179402
Epoch [1/2], Iter [1672/3125], train_loss:0.166310
Epoch [1/2], Iter [1673/3125], train_loss:0.161387
Epoch [1/2], Iter [1674/3125], train_loss:0.159246
Epoch [1/2], Iter [1675/3125], train_loss:0.159077
Epoch [1/2], Iter [1676/3125], train_loss:0.156243
Epoch [1/2], Iter [1677/3125], train_loss:0.177382
Epoch [1/2], Iter [1678/3125], train_loss:0.161576
Epoch [1/2], Iter [1679/3125], train_loss:0.155681
Epoch [1/2], Iter [1680/3125], train_loss:0.182101
Epoch [1/2], Iter [1681/3125], train_loss:0.168359
Epoch [1/2], Iter [1682/3125], train_loss:0.162136
Epoch [1/2], Iter [1683/3125], train_loss:0.175528
Epoch [1/2], Iter [1684/3125], train_loss:0.139192
Epoch [1/2], Iter [1685/3125], train_loss:0.149815
Epoch [1/2], Iter [1686/3125], train_loss:0.182981
Epoch [1/2], Iter [1687/3125], train_loss:0.160774
Epoch [1/2], Iter [1688/3125], train_loss:0.145968
Epoch [1/2], Iter [1689/3125], train_loss:0.158807
Epoch [1/2], Iter [1690/3125], train_loss:0.158910
Epoch [1/2], Iter [1691/3125], train_loss:0.174940
Epoch [1/2], Iter [1692/3125], train_loss:0.155379
Epoch [1/2], Iter [1693/3125], train_loss:0.170327
Epoch [1/2], Iter [1694/3125], train_loss:0.161909
Epoch [1/2], Iter [1695/3125], train_loss:0.150474
Epoch [1/2], Iter [1696/3125], train_loss:0.170937
Epoch [1/2], Iter [1697/3125], train_loss:0.152703
Epoch [1/2], Iter [1698/3125], train_loss:0.168881
Epoch [1/2], Iter [1699/3125], train_loss:0.172118
Epoch [1/2], Iter [1700/3125], train_loss:0.157837
Epoch [1/2], Iter [1701/3125], train_loss:0.160279
Epoch [1/2], Iter [1702/3125], train_loss:0.181616
Epoch [1/2], Iter [1703/3125], train_loss:0.147026
Epoch [1/2], Iter [1704/3125], train_loss:0.157656
Epoch [1/2], Iter [1705/3125], train_loss:0.179791
Epoch [1/2], Iter [1706/3125], train_loss:0.171684
Epoch [1/2], Iter [1707/3125], train_loss:0.138092
Epoch [1/2], Iter [1708/3125], train_loss:0.177978
Epoch [1/2], Iter [1709/3125], train_loss:0.175673
Epoch [1/2], Iter [1710/3125], train_loss:0.151395
Epoch [1/2], Iter [1711/3125], train_loss:0.159401
Epoch [1/2], Iter [1712/3125], train_loss:0.168381
Epoch [1/2], Iter [1713/3125], train_loss:0.166301
Epoch [1/2], Iter [1714/3125], train_loss:0.156766
Epoch [1/2], Iter [1715/3125], train_loss:0.168902
Epoch [1/2], Iter [1716/3125], train_loss:0.169495
Epoch [1/2], Iter [1717/3125], train_loss:0.159896
Epoch [1/2], Iter [1718/3125], train_loss:0.165244
Epoch [1/2], Iter [1719/3125], train_loss:0.145941
Epoch [1/2], Iter [1720/3125], train_loss:0.166384
Epoch [1/2], Iter [1721/3125], train_loss:0.174706
Epoch [1/2], Iter [1722/3125], train_loss:0.141559
Epoch [1/2], Iter [1723/3125], train_loss:0.174502
Epoch [1/2], Iter [1724/3125], train_loss:0.149242
Epoch [1/2], Iter [1725/3125], train_loss:0.143743
Epoch [1/2], Iter [1726/3125], train_loss:0.159536
Epoch [1/2], Iter [1727/3125], train_loss:0.173931
Epoch [1/2], Iter [1728/3125], train_loss:0.152694
Epoch [1/2], Iter [1729/3125], train_loss:0.167123
Epoch [1/2], Iter [1730/3125], train_loss:0.174618
Epoch [1/2], Iter [1731/3125], train_loss:0.187746
Epoch [1/2], Iter [1732/3125], train_loss:0.188053
Epoch [1/2], Iter [1733/3125], train_loss:0.161420
Epoch [1/2], Iter [1734/3125], train_loss:0.143247
Epoch [1/2], Iter [1735/3125], train_loss:0.164306
Epoch [1/2], Iter [1736/3125], train_loss:0.149670
Epoch [1/2], Iter [1737/3125], train_loss:0.180123
Epoch [1/2], Iter [1738/3125], train_loss:0.158347
Epoch [1/2], Iter [1739/3125], train_loss:0.175313
Epoch [1/2], Iter [1740/3125], train_loss:0.154087
Epoch [1/2], Iter [1741/3125], train_loss:0.180090
Epoch [1/2], Iter [1742/3125], train_loss:0.162183
Epoch [1/2], Iter [1743/3125], train_loss:0.164401
Epoch [1/2], Iter [1744/3125], train_loss:0.164047
Epoch [1/2], Iter [1745/3125], train_loss:0.164262
Epoch [1/2], Iter [1746/3125], train_loss:0.155122
Epoch [1/2], Iter [1747/3125], train_loss:0.164472
Epoch [1/2], Iter [1748/3125], train_loss:0.161191
Epoch [1/2], Iter [1749/3125], train_loss:0.160144
Epoch [1/2], Iter [1750/3125], train_loss:0.161302
Epoch [1/2], Iter [1751/3125], train_loss:0.160503
Epoch [1/2], Iter [1752/3125], train_loss:0.155728
Epoch [1/2], Iter [1753/3125], train_loss:0.140759
Epoch [1/2], Iter [1754/3125], train_loss:0.145626
Epoch [1/2], Iter [1755/3125], train_loss:0.164837
Epoch [1/2], Iter [1756/3125], train_loss:0.157151
Epoch [1/2], Iter [1757/3125], train_loss:0.179853
Epoch [1/2], Iter [1758/3125], train_loss:0.150290
Epoch [1/2], Iter [1759/3125], train_loss:0.144776
Epoch [1/2], Iter [1760/3125], train_loss:0.163736
Epoch [1/2], Iter [1761/3125], train_loss:0.161901
Epoch [1/2], Iter [1762/3125], train_loss:0.137393
Epoch [1/2], Iter [1763/3125], train_loss:0.157102
Epoch [1/2], Iter [1764/3125], train_loss:0.147714
Epoch [1/2], Iter [1765/3125], train_loss:0.168851
Epoch [1/2], Iter [1766/3125], train_loss:0.180556
Epoch [1/2], Iter [1767/3125], train_loss:0.145371
Epoch [1/2], Iter [1768/3125], train_loss:0.182627
Epoch [1/2], Iter [1769/3125], train_loss:0.155184
Epoch [1/2], Iter [1770/3125], train_loss:0.170251
Epoch [1/2], Iter [1771/3125], train_loss:0.156243
Epoch [1/2], Iter [1772/3125], train_loss:0.157432
Epoch [1/2], Iter [1773/3125], train_loss:0.165467
Epoch [1/2], Iter [1774/3125], train_loss:0.166601
Epoch [1/2], Iter [1775/3125], train_loss:0.177986
Epoch [1/2], Iter [1776/3125], train_loss:0.165314
Epoch [1/2], Iter [1777/3125], train_loss:0.171132
Epoch [1/2], Iter [1778/3125], train_loss:0.190048
Epoch [1/2], Iter [1779/3125], train_loss:0.165800
Epoch [1/2], Iter [1780/3125], train_loss:0.160303
Epoch [1/2], Iter [1781/3125], train_loss:0.155642
Epoch [1/2], Iter [1782/3125], train_loss:0.146157
Epoch [1/2], Iter [1783/3125], train_loss:0.160654
Epoch [1/2], Iter [1784/3125], train_loss:0.176773
Epoch [1/2], Iter [1785/3125], train_loss:0.169321
Epoch [1/2], Iter [1786/3125], train_loss:0.150362
Epoch [1/2], Iter [1787/3125], train_loss:0.167345
Epoch [1/2], Iter [1788/3125], train_loss:0.145898
Epoch [1/2], Iter [1789/3125], train_loss:0.150497
Epoch [1/2], Iter [1790/3125], train_loss:0.166425
Epoch [1/2], Iter [1791/3125], train_loss:0.171549
Epoch [1/2], Iter [1792/3125], train_loss:0.154176
Epoch [1/2], Iter [1793/3125], train_loss:0.166127
Epoch [1/2], Iter [1794/3125], train_loss:0.165764
Epoch [1/2], Iter [1795/3125], train_loss:0.162320
Epoch [1/2], Iter [1796/3125], train_loss:0.194941
Epoch [1/2], Iter [1797/3125], train_loss:0.157635
Epoch [1/2], Iter [1798/3125], train_loss:0.163641
Epoch [1/2], Iter [1799/3125], train_loss:0.167187
Epoch [1/2], Iter [1800/3125], train_loss:0.145284
Epoch [1/2], Iter [1801/3125], train_loss:0.161971
Epoch [1/2], Iter [1802/3125], train_loss:0.164058
Epoch [1/2], Iter [1803/3125], train_loss:0.142048
Epoch [1/2], Iter [1804/3125], train_loss:0.160427
Epoch [1/2], Iter [1805/3125], train_loss:0.161583
Epoch [1/2], Iter [1806/3125], train_loss:0.155377
Epoch [1/2], Iter [1807/3125], train_loss:0.191237
Epoch [1/2], Iter [1808/3125], train_loss:0.163925
Epoch [1/2], Iter [1809/3125], train_loss:0.190522
Epoch [1/2], Iter [1810/3125], train_loss:0.160186
Epoch [1/2], Iter [1811/3125], train_loss:0.176043
Epoch [1/2], Iter [1812/3125], train_loss:0.170882
Epoch [1/2], Iter [1813/3125], train_loss:0.168062
Epoch [1/2], Iter [1814/3125], train_loss:0.162766
Epoch [1/2], Iter [1815/3125], train_loss:0.168871
Epoch [1/2], Iter [1816/3125], train_loss:0.161266
Epoch [1/2], Iter [1817/3125], train_loss:0.155407
Epoch [1/2], Iter [1818/3125], train_loss:0.145036
Epoch [1/2], Iter [1819/3125], train_loss:0.166525
Epoch [1/2], Iter [1820/3125], train_loss:0.161894
Epoch [1/2], Iter [1821/3125], train_loss:0.179618
Epoch [1/2], Iter [1822/3125], train_loss:0.165360
Epoch [1/2], Iter [1823/3125], train_loss:0.179443
Epoch [1/2], Iter [1824/3125], train_loss:0.182535
Epoch [1/2], Iter [1825/3125], train_loss:0.183849
Epoch [1/2], Iter [1826/3125], train_loss:0.160797
Epoch [1/2], Iter [1827/3125], train_loss:0.155679
Epoch [1/2], Iter [1828/3125], train_loss:0.163237
Epoch [1/2], Iter [1829/3125], train_loss:0.155929
Epoch [1/2], Iter [1830/3125], train_loss:0.169370
Epoch [1/2], Iter [1831/3125], train_loss:0.181343
Epoch [1/2], Iter [1832/3125], train_loss:0.150245
Epoch [1/2], Iter [1833/3125], train_loss:0.173141
Epoch [1/2], Iter [1834/3125], train_loss:0.163475
Epoch [1/2], Iter [1835/3125], train_loss:0.163326
Epoch [1/2], Iter [1836/3125], train_loss:0.157031
Epoch [1/2], Iter [1837/3125], train_loss:0.166926
Epoch [1/2], Iter [1838/3125], train_loss:0.181456
Epoch [1/2], Iter [1839/3125], train_loss:0.166013
Epoch [1/2], Iter [1840/3125], train_loss:0.151755
Epoch [1/2], Iter [1841/3125], train_loss:0.172941
Epoch [1/2], Iter [1842/3125], train_loss:0.175330
Epoch [1/2], Iter [1843/3125], train_loss:0.147175
Epoch [1/2], Iter [1844/3125], train_loss:0.178653
Epoch [1/2], Iter [1845/3125], train_loss:0.152745
Epoch [1/2], Iter [1846/3125], train_loss:0.142007
Epoch [1/2], Iter [1847/3125], train_loss:0.148765
Epoch [1/2], Iter [1848/3125], train_loss:0.166682
Epoch [1/2], Iter [1849/3125], train_loss:0.156195
Epoch [1/2], Iter [1850/3125], train_loss:0.156262
Epoch [1/2], Iter [1851/3125], train_loss:0.160495
Epoch [1/2], Iter [1852/3125], train_loss:0.162996
Epoch [1/2], Iter [1853/3125], train_loss:0.162435
Epoch [1/2], Iter [1854/3125], train_loss:0.161346
Epoch [1/2], Iter [1855/3125], train_loss:0.171397
Epoch [1/2], Iter [1856/3125], train_loss:0.177385
Epoch [1/2], Iter [1857/3125], train_loss:0.130351
Epoch [1/2], Iter [1858/3125], train_loss:0.151499
Epoch [1/2], Iter [1859/3125], train_loss:0.149441
Epoch [1/2], Iter [1860/3125], train_loss:0.164405
Epoch [1/2], Iter [1861/3125], train_loss:0.163205
Epoch [1/2], Iter [1862/3125], train_loss:0.177325
Epoch [1/2], Iter [1863/3125], train_loss:0.155396
Epoch [1/2], Iter [1864/3125], train_loss:0.162093
Epoch [1/2], Iter [1865/3125], train_loss:0.155531
Epoch [1/2], Iter [1866/3125], train_loss:0.147332
Epoch [1/2], Iter [1867/3125], train_loss:0.150838
Epoch [1/2], Iter [1868/3125], train_loss:0.154241
Epoch [1/2], Iter [1869/3125], train_loss:0.142730
Epoch [1/2], Iter [1870/3125], train_loss:0.154984
Epoch [1/2], Iter [1871/3125], train_loss:0.141732
Epoch [1/2], Iter [1872/3125], train_loss:0.169305
Epoch [1/2], Iter [1873/3125], train_loss:0.158314
Epoch [1/2], Iter [1874/3125], train_loss:0.155676
Epoch [1/2], Iter [1875/3125], train_loss:0.162900
Epoch [1/2], Iter [1876/3125], train_loss:0.174867
Epoch [1/2], Iter [1877/3125], train_loss:0.165536
Epoch [1/2], Iter [1878/3125], train_loss:0.153315
Epoch [1/2], Iter [1879/3125], train_loss:0.150544
Epoch [1/2], Iter [1880/3125], train_loss:0.183719
Epoch [1/2], Iter [1881/3125], train_loss:0.160276
Epoch [1/2], Iter [1882/3125], train_loss:0.169111
Epoch [1/2], Iter [1883/3125], train_loss:0.162062
Epoch [1/2], Iter [1884/3125], train_loss:0.136829
Epoch [1/2], Iter [1885/3125], train_loss:0.152484
Epoch [1/2], Iter [1886/3125], train_loss:0.157395
Epoch [1/2], Iter [1887/3125], train_loss:0.153584
Epoch [1/2], Iter [1888/3125], train_loss:0.184233
Epoch [1/2], Iter [1889/3125], train_loss:0.158605
Epoch [1/2], Iter [1890/3125], train_loss:0.160698
Epoch [1/2], Iter [1891/3125], train_loss:0.160876
Epoch [1/2], Iter [1892/3125], train_loss:0.169683
Epoch [1/2], Iter [1893/3125], train_loss:0.157519
Epoch [1/2], Iter [1894/3125], train_loss:0.159938
Epoch [1/2], Iter [1895/3125], train_loss:0.170259
Epoch [1/2], Iter [1896/3125], train_loss:0.184046
Epoch [1/2], Iter [1897/3125], train_loss:0.153115
Epoch [1/2], Iter [1898/3125], train_loss:0.157097
Epoch [1/2], Iter [1899/3125], train_loss:0.162475
Epoch [1/2], Iter [1900/3125], train_loss:0.161365
Epoch [1/2], Iter [1901/3125], train_loss:0.173105
Epoch [1/2], Iter [1902/3125], train_loss:0.148295
Epoch [1/2], Iter [1903/3125], train_loss:0.165971
Epoch [1/2], Iter [1904/3125], train_loss:0.158941
Epoch [1/2], Iter [1905/3125], train_loss:0.167976
Epoch [1/2], Iter [1906/3125], train_loss:0.161314
Epoch [1/2], Iter [1907/3125], train_loss:0.142002
Epoch [1/2], Iter [1908/3125], train_loss:0.155992
Epoch [1/2], Iter [1909/3125], train_loss:0.159452
Epoch [1/2], Iter [1910/3125], train_loss:0.167375
Epoch [1/2], Iter [1911/3125], train_loss:0.160087
Epoch [1/2], Iter [1912/3125], train_loss:0.162730
Epoch [1/2], Iter [1913/3125], train_loss:0.166080
Epoch [1/2], Iter [1914/3125], train_loss:0.186217
Epoch [1/2], Iter [1915/3125], train_loss:0.151830
Epoch [1/2], Iter [1916/3125], train_loss:0.168950
Epoch [1/2], Iter [1917/3125], train_loss:0.153571
Epoch [1/2], Iter [1918/3125], train_loss:0.164015
Epoch [1/2], Iter [1919/3125], train_loss:0.159809
Epoch [1/2], Iter [1920/3125], train_loss:0.146458
Epoch [1/2], Iter [1921/3125], train_loss:0.160593
Epoch [1/2], Iter [1922/3125], train_loss:0.152458
Epoch [1/2], Iter [1923/3125], train_loss:0.170881
Epoch [1/2], Iter [1924/3125], train_loss:0.158566
Epoch [1/2], Iter [1925/3125], train_loss:0.155870
Epoch [1/2], Iter [1926/3125], train_loss:0.188001
Epoch [1/2], Iter [1927/3125], train_loss:0.169803
Epoch [1/2], Iter [1928/3125], train_loss:0.150111
Epoch [1/2], Iter [1929/3125], train_loss:0.163295
Epoch [1/2], Iter [1930/3125], train_loss:0.145743
Epoch [1/2], Iter [1931/3125], train_loss:0.154151
Epoch [1/2], Iter [1932/3125], train_loss:0.160207
Epoch [1/2], Iter [1933/3125], train_loss:0.158596
Epoch [1/2], Iter [1934/3125], train_loss:0.173918
Epoch [1/2], Iter [1935/3125], train_loss:0.184682
Epoch [1/2], Iter [1936/3125], train_loss:0.184060
Epoch [1/2], Iter [1937/3125], train_loss:0.165681
Epoch [1/2], Iter [1938/3125], train_loss:0.172499
Epoch [1/2], Iter [1939/3125], train_loss:0.154547
Epoch [1/2], Iter [1940/3125], train_loss:0.147814
Epoch [1/2], Iter [1941/3125], train_loss:0.161590
Epoch [1/2], Iter [1942/3125], train_loss:0.141166
Epoch [1/2], Iter [1943/3125], train_loss:0.151419
Epoch [1/2], Iter [1944/3125], train_loss:0.152621
Epoch [1/2], Iter [1945/3125], train_loss:0.164691
Epoch [1/2], Iter [1946/3125], train_loss:0.146373
Epoch [1/2], Iter [1947/3125], train_loss:0.148736
Epoch [1/2], Iter [1948/3125], train_loss:0.199945
Epoch [1/2], Iter [1949/3125], train_loss:0.154226
Epoch [1/2], Iter [1950/3125], train_loss:0.173260
Epoch [1/2], Iter [1951/3125], train_loss:0.161090
Epoch [1/2], Iter [1952/3125], train_loss:0.169187
Epoch [1/2], Iter [1953/3125], train_loss:0.164385
Epoch [1/2], Iter [1954/3125], train_loss:0.151662
Epoch [1/2], Iter [1955/3125], train_loss:0.165125
Epoch [1/2], Iter [1956/3125], train_loss:0.154994
Epoch [1/2], Iter [1957/3125], train_loss:0.173068
Epoch [1/2], Iter [1958/3125], train_loss:0.175447
Epoch [1/2], Iter [1959/3125], train_loss:0.170935
Epoch [1/2], Iter [1960/3125], train_loss:0.167173
Epoch [1/2], Iter [1961/3125], train_loss:0.181798
Epoch [1/2], Iter [1962/3125], train_loss:0.149530
Epoch [1/2], Iter [1963/3125], train_loss:0.162909
Epoch [1/2], Iter [1964/3125], train_loss:0.159980
Epoch [1/2], Iter [1965/3125], train_loss:0.152192
Epoch [1/2], Iter [1966/3125], train_loss:0.178360
Epoch [1/2], Iter [1967/3125], train_loss:0.146795
Epoch [1/2], Iter [1968/3125], train_loss:0.143748
Epoch [1/2], Iter [1969/3125], train_loss:0.181524
Epoch [1/2], Iter [1970/3125], train_loss:0.156961
Epoch [1/2], Iter [1971/3125], train_loss:0.157196
Epoch [1/2], Iter [1972/3125], train_loss:0.156264
Epoch [1/2], Iter [1973/3125], train_loss:0.155585
Epoch [1/2], Iter [1974/3125], train_loss:0.165857
Epoch [1/2], Iter [1975/3125], train_loss:0.179051
Epoch [1/2], Iter [1976/3125], train_loss:0.154581
Epoch [1/2], Iter [1977/3125], train_loss:0.169368
Epoch [1/2], Iter [1978/3125], train_loss:0.144383
Epoch [1/2], Iter [1979/3125], train_loss:0.152714
Epoch [1/2], Iter [1980/3125], train_loss:0.148939
Epoch [1/2], Iter [1981/3125], train_loss:0.175638
Epoch [1/2], Iter [1982/3125], train_loss:0.168751
Epoch [1/2], Iter [1983/3125], train_loss:0.162145
Epoch [1/2], Iter [1984/3125], train_loss:0.182724
Epoch [1/2], Iter [1985/3125], train_loss:0.155977
Epoch [1/2], Iter [1986/3125], train_loss:0.155561
Epoch [1/2], Iter [1987/3125], train_loss:0.191799
Epoch [1/2], Iter [1988/3125], train_loss:0.167943
Epoch [1/2], Iter [1989/3125], train_loss:0.163147
Epoch [1/2], Iter [1990/3125], train_loss:0.176683
Epoch [1/2], Iter [1991/3125], train_loss:0.158023
Epoch [1/2], Iter [1992/3125], train_loss:0.160804
Epoch [1/2], Iter [1993/3125], train_loss:0.158202
Epoch [1/2], Iter [1994/3125], train_loss:0.170246
Epoch [1/2], Iter [1995/3125], train_loss:0.165867
Epoch [1/2], Iter [1996/3125], train_loss:0.144000
Epoch [1/2], Iter [1997/3125], train_loss:0.162011
Epoch [1/2], Iter [1998/3125], train_loss:0.168511
Epoch [1/2], Iter [1999/3125], train_loss:0.156590
Epoch [1/2], Iter [2000/3125], train_loss:0.151475
Epoch [1/2], Iter [2001/3125], train_loss:0.172778
Epoch [1/2], Iter [2002/3125], train_loss:0.174170
Epoch [1/2], Iter [2003/3125], train_loss:0.172485
Epoch [1/2], Iter [2004/3125], train_loss:0.151658
Epoch [1/2], Iter [2005/3125], train_loss:0.165962
Epoch [1/2], Iter [2006/3125], train_loss:0.143169
Epoch [1/2], Iter [2007/3125], train_loss:0.170595
Epoch [1/2], Iter [2008/3125], train_loss:0.200333
Epoch [1/2], Iter [2009/3125], train_loss:0.163445
Epoch [1/2], Iter [2010/3125], train_loss:0.150004
Epoch [1/2], Iter [2011/3125], train_loss:0.157552
Epoch [1/2], Iter [2012/3125], train_loss:0.168187
Epoch [1/2], Iter [2013/3125], train_loss:0.153843
Epoch [1/2], Iter [2014/3125], train_loss:0.169956
Epoch [1/2], Iter [2015/3125], train_loss:0.171310
Epoch [1/2], Iter [2016/3125], train_loss:0.152466
Epoch [1/2], Iter [2017/3125], train_loss:0.173650
Epoch [1/2], Iter [2018/3125], train_loss:0.162068
Epoch [1/2], Iter [2019/3125], train_loss:0.130321
Epoch [1/2], Iter [2020/3125], train_loss:0.124815
Epoch [1/2], Iter [2021/3125], train_loss:0.145361
Epoch [1/2], Iter [2022/3125], train_loss:0.146677
Epoch [1/2], Iter [2023/3125], train_loss:0.165708
Epoch [1/2], Iter [2024/3125], train_loss:0.176067
Epoch [1/2], Iter [2025/3125], train_loss:0.168302
Epoch [1/2], Iter [2026/3125], train_loss:0.154875
Epoch [1/2], Iter [2027/3125], train_loss:0.164601
Epoch [1/2], Iter [2028/3125], train_loss:0.171662
Epoch [1/2], Iter [2029/3125], train_loss:0.151342
Epoch [1/2], Iter [2030/3125], train_loss:0.162905
Epoch [1/2], Iter [2031/3125], train_loss:0.155055
Epoch [1/2], Iter [2032/3125], train_loss:0.133530
Epoch [1/2], Iter [2033/3125], train_loss:0.145367
Epoch [1/2], Iter [2034/3125], train_loss:0.140172
Epoch [1/2], Iter [2035/3125], train_loss:0.166013
Epoch [1/2], Iter [2036/3125], train_loss:0.160138
Epoch [1/2], Iter [2037/3125], train_loss:0.149895
Epoch [1/2], Iter [2038/3125], train_loss:0.158618
Epoch [1/2], Iter [2039/3125], train_loss:0.182976
Epoch [1/2], Iter [2040/3125], train_loss:0.163650
Epoch [1/2], Iter [2041/3125], train_loss:0.156669
Epoch [1/2], Iter [2042/3125], train_loss:0.161290
Epoch [1/2], Iter [2043/3125], train_loss:0.162484
Epoch [1/2], Iter [2044/3125], train_loss:0.167114
Epoch [1/2], Iter [2045/3125], train_loss:0.170945
Epoch [1/2], Iter [2046/3125], train_loss:0.155789
Epoch [1/2], Iter [2047/3125], train_loss:0.164060
Epoch [1/2], Iter [2048/3125], train_loss:0.186333
Epoch [1/2], Iter [2049/3125], train_loss:0.160162
Epoch [1/2], Iter [2050/3125], train_loss:0.159823
Epoch [1/2], Iter [2051/3125], train_loss:0.158371
Epoch [1/2], Iter [2052/3125], train_loss:0.159072
Epoch [1/2], Iter [2053/3125], train_loss:0.173952
Epoch [1/2], Iter [2054/3125], train_loss:0.161498
Epoch [1/2], Iter [2055/3125], train_loss:0.147181
Epoch [1/2], Iter [2056/3125], train_loss:0.176381
Epoch [1/2], Iter [2057/3125], train_loss:0.133942
Epoch [1/2], Iter [2058/3125], train_loss:0.144585
Epoch [1/2], Iter [2059/3125], train_loss:0.162973
Epoch [1/2], Iter [2060/3125], train_loss:0.165241
Epoch [1/2], Iter [2061/3125], train_loss:0.174846
Epoch [1/2], Iter [2062/3125], train_loss:0.180418
Epoch [1/2], Iter [2063/3125], train_loss:0.182369
Epoch [1/2], Iter [2064/3125], train_loss:0.148855
Epoch [1/2], Iter [2065/3125], train_loss:0.180047
Epoch [1/2], Iter [2066/3125], train_loss:0.131797
Epoch [1/2], Iter [2067/3125], train_loss:0.161162
Epoch [1/2], Iter [2068/3125], train_loss:0.153623
Epoch [1/2], Iter [2069/3125], train_loss:0.177079
Epoch [1/2], Iter [2070/3125], train_loss:0.166252
Epoch [1/2], Iter [2071/3125], train_loss:0.167907
Epoch [1/2], Iter [2072/3125], train_loss:0.168837
Epoch [1/2], Iter [2073/3125], train_loss:0.178388
Epoch [1/2], Iter [2074/3125], train_loss:0.167471
Epoch [1/2], Iter [2075/3125], train_loss:0.182263
Epoch [1/2], Iter [2076/3125], train_loss:0.164933
Epoch [1/2], Iter [2077/3125], train_loss:0.159932
Epoch [1/2], Iter [2078/3125], train_loss:0.165879
Epoch [1/2], Iter [2079/3125], train_loss:0.168112
Epoch [1/2], Iter [2080/3125], train_loss:0.164080
Epoch [1/2], Iter [2081/3125], train_loss:0.177289
Epoch [1/2], Iter [2082/3125], train_loss:0.156052
Epoch [1/2], Iter [2083/3125], train_loss:0.150370
Epoch [1/2], Iter [2084/3125], train_loss:0.157389
Epoch [1/2], Iter [2085/3125], train_loss:0.164571
Epoch [1/2], Iter [2086/3125], train_loss:0.165030
Epoch [1/2], Iter [2087/3125], train_loss:0.165491
Epoch [1/2], Iter [2088/3125], train_loss:0.157076
Epoch [1/2], Iter [2089/3125], train_loss:0.157584
Epoch [1/2], Iter [2090/3125], train_loss:0.142475
Epoch [1/2], Iter [2091/3125], train_loss:0.161959
Epoch [1/2], Iter [2092/3125], train_loss:0.150067
Epoch [1/2], Iter [2093/3125], train_loss:0.169877
Epoch [1/2], Iter [2094/3125], train_loss:0.175256
Epoch [1/2], Iter [2095/3125], train_loss:0.150007
Epoch [1/2], Iter [2096/3125], train_loss:0.175035
Epoch [1/2], Iter [2097/3125], train_loss:0.143745
Epoch [1/2], Iter [2098/3125], train_loss:0.175930
Epoch [1/2], Iter [2099/3125], train_loss:0.148834
Epoch [1/2], Iter [2100/3125], train_loss:0.165045
Epoch [1/2], Iter [2101/3125], train_loss:0.142969
Epoch [1/2], Iter [2102/3125], train_loss:0.147515
Epoch [1/2], Iter [2103/3125], train_loss:0.144696
Epoch [1/2], Iter [2104/3125], train_loss:0.170307
Epoch [1/2], Iter [2105/3125], train_loss:0.153275
Epoch [1/2], Iter [2106/3125], train_loss:0.174566
Epoch [1/2], Iter [2107/3125], train_loss:0.168739
Epoch [1/2], Iter [2108/3125], train_loss:0.168275
Epoch [1/2], Iter [2109/3125], train_loss:0.156382
Epoch [1/2], Iter [2110/3125], train_loss:0.180446
Epoch [1/2], Iter [2111/3125], train_loss:0.168478
Epoch [1/2], Iter [2112/3125], train_loss:0.161389
Epoch [1/2], Iter [2113/3125], train_loss:0.166829
Epoch [1/2], Iter [2114/3125], train_loss:0.144316
Epoch [1/2], Iter [2115/3125], train_loss:0.180950
Epoch [1/2], Iter [2116/3125], train_loss:0.160766
Epoch [1/2], Iter [2117/3125], train_loss:0.134064
Epoch [1/2], Iter [2118/3125], train_loss:0.133301
Epoch [1/2], Iter [2119/3125], train_loss:0.156353
Epoch [1/2], Iter [2120/3125], train_loss:0.155335
Epoch [1/2], Iter [2121/3125], train_loss:0.156238
Epoch [1/2], Iter [2122/3125], train_loss:0.167666
Epoch [1/2], Iter [2123/3125], train_loss:0.139103
Epoch [1/2], Iter [2124/3125], train_loss:0.164919
Epoch [1/2], Iter [2125/3125], train_loss:0.172402
Epoch [1/2], Iter [2126/3125], train_loss:0.156982
Epoch [1/2], Iter [2127/3125], train_loss:0.174418
Epoch [1/2], Iter [2128/3125], train_loss:0.163564
Epoch [1/2], Iter [2129/3125], train_loss:0.155914
Epoch [1/2], Iter [2130/3125], train_loss:0.155929
Epoch [1/2], Iter [2131/3125], train_loss:0.164747
Epoch [1/2], Iter [2132/3125], train_loss:0.169162
Epoch [1/2], Iter [2133/3125], train_loss:0.176287
Epoch [1/2], Iter [2134/3125], train_loss:0.145140
Epoch [1/2], Iter [2135/3125], train_loss:0.172772
Epoch [1/2], Iter [2136/3125], train_loss:0.172442
Epoch [1/2], Iter [2137/3125], train_loss:0.166545
Epoch [1/2], Iter [2138/3125], train_loss:0.155658
Epoch [1/2], Iter [2139/3125], train_loss:0.144825
Epoch [1/2], Iter [2140/3125], train_loss:0.165197
Epoch [1/2], Iter [2141/3125], train_loss:0.179990
Epoch [1/2], Iter [2142/3125], train_loss:0.155233
Epoch [1/2], Iter [2143/3125], train_loss:0.162739
Epoch [1/2], Iter [2144/3125], train_loss:0.156480
Epoch [1/2], Iter [2145/3125], train_loss:0.155214
Epoch [1/2], Iter [2146/3125], train_loss:0.162011
Epoch [1/2], Iter [2147/3125], train_loss:0.163268
Epoch [1/2], Iter [2148/3125], train_loss:0.180236
Epoch [1/2], Iter [2149/3125], train_loss:0.173788
Epoch [1/2], Iter [2150/3125], train_loss:0.155130
Epoch [1/2], Iter [2151/3125], train_loss:0.165528
Epoch [1/2], Iter [2152/3125], train_loss:0.176281
Epoch [1/2], Iter [2153/3125], train_loss:0.151886
Epoch [1/2], Iter [2154/3125], train_loss:0.145217
Epoch [1/2], Iter [2155/3125], train_loss:0.162727
Epoch [1/2], Iter [2156/3125], train_loss:0.167274
Epoch [1/2], Iter [2157/3125], train_loss:0.192076
Epoch [1/2], Iter [2158/3125], train_loss:0.163333
Epoch [1/2], Iter [2159/3125], train_loss:0.162825
Epoch [1/2], Iter [2160/3125], train_loss:0.187321
Epoch [1/2], Iter [2161/3125], train_loss:0.158406
Epoch [1/2], Iter [2162/3125], train_loss:0.179870
Epoch [1/2], Iter [2163/3125], train_loss:0.138683
Epoch [1/2], Iter [2164/3125], train_loss:0.150967
Epoch [1/2], Iter [2165/3125], train_loss:0.131095
Epoch [1/2], Iter [2166/3125], train_loss:0.183638
Epoch [1/2], Iter [2167/3125], train_loss:0.159978
Epoch [1/2], Iter [2168/3125], train_loss:0.186739
Epoch [1/2], Iter [2169/3125], train_loss:0.181425
Epoch [1/2], Iter [2170/3125], train_loss:0.168422
Epoch [1/2], Iter [2171/3125], train_loss:0.167889
Epoch [1/2], Iter [2172/3125], train_loss:0.140880
Epoch [1/2], Iter [2173/3125], train_loss:0.181237
Epoch [1/2], Iter [2174/3125], train_loss:0.159820
Epoch [1/2], Iter [2175/3125], train_loss:0.157872
Epoch [1/2], Iter [2176/3125], train_loss:0.158261
Epoch [1/2], Iter [2177/3125], train_loss:0.142904
Epoch [1/2], Iter [2178/3125], train_loss:0.172463
Epoch [1/2], Iter [2179/3125], train_loss:0.149298
Epoch [1/2], Iter [2180/3125], train_loss:0.159796
Epoch [1/2], Iter [2181/3125], train_loss:0.169302
Epoch [1/2], Iter [2182/3125], train_loss:0.185946
Epoch [1/2], Iter [2183/3125], train_loss:0.158634
Epoch [1/2], Iter [2184/3125], train_loss:0.163283
Epoch [1/2], Iter [2185/3125], train_loss:0.147155
Epoch [1/2], Iter [2186/3125], train_loss:0.169214
Epoch [1/2], Iter [2187/3125], train_loss:0.170750
Epoch [1/2], Iter [2188/3125], train_loss:0.164639
Epoch [1/2], Iter [2189/3125], train_loss:0.167140
Epoch [1/2], Iter [2190/3125], train_loss:0.165556
Epoch [1/2], Iter [2191/3125], train_loss:0.173164
Epoch [1/2], Iter [2192/3125], train_loss:0.158743
Epoch [1/2], Iter [2193/3125], train_loss:0.165279
Epoch [1/2], Iter [2194/3125], train_loss:0.151960
Epoch [1/2], Iter [2195/3125], train_loss:0.149383
Epoch [1/2], Iter [2196/3125], train_loss:0.157849
Epoch [1/2], Iter [2197/3125], train_loss:0.168340
Epoch [1/2], Iter [2198/3125], train_loss:0.156435
Epoch [1/2], Iter [2199/3125], train_loss:0.139338
Epoch [1/2], Iter [2200/3125], train_loss:0.160937
Epoch [1/2], Iter [2201/3125], train_loss:0.155327
Epoch [1/2], Iter [2202/3125], train_loss:0.177210
Epoch [1/2], Iter [2203/3125], train_loss:0.178461
Epoch [1/2], Iter [2204/3125], train_loss:0.171101
Epoch [1/2], Iter [2205/3125], train_loss:0.184354
Epoch [1/2], Iter [2206/3125], train_loss:0.144931
Epoch [1/2], Iter [2207/3125], train_loss:0.163896
Epoch [1/2], Iter [2208/3125], train_loss:0.169595
Epoch [1/2], Iter [2209/3125], train_loss:0.157569
Epoch [1/2], Iter [2210/3125], train_loss:0.178770
Epoch [1/2], Iter [2211/3125], train_loss:0.143887
Epoch [1/2], Iter [2212/3125], train_loss:0.161863
Epoch [1/2], Iter [2213/3125], train_loss:0.141911
Epoch [1/2], Iter [2214/3125], train_loss:0.141126
Epoch [1/2], Iter [2215/3125], train_loss:0.181017
Epoch [1/2], Iter [2216/3125], train_loss:0.147786
Epoch [1/2], Iter [2217/3125], train_loss:0.141503
Epoch [1/2], Iter [2218/3125], train_loss:0.157903
Epoch [1/2], Iter [2219/3125], train_loss:0.154843
Epoch [1/2], Iter [2220/3125], train_loss:0.148844
Epoch [1/2], Iter [2221/3125], train_loss:0.161168
Epoch [1/2], Iter [2222/3125], train_loss:0.171695
Epoch [1/2], Iter [2223/3125], train_loss:0.163227
Epoch [1/2], Iter [2224/3125], train_loss:0.159965
Epoch [1/2], Iter [2225/3125], train_loss:0.162049
Epoch [1/2], Iter [2226/3125], train_loss:0.172635
Epoch [1/2], Iter [2227/3125], train_loss:0.152669
Epoch [1/2], Iter [2228/3125], train_loss:0.154830
Epoch [1/2], Iter [2229/3125], train_loss:0.163990
Epoch [1/2], Iter [2230/3125], train_loss:0.168742
Epoch [1/2], Iter [2231/3125], train_loss:0.183449
Epoch [1/2], Iter [2232/3125], train_loss:0.148132
Epoch [1/2], Iter [2233/3125], train_loss:0.175107
Epoch [1/2], Iter [2234/3125], train_loss:0.157387
Epoch [1/2], Iter [2235/3125], train_loss:0.150439
Epoch [1/2], Iter [2236/3125], train_loss:0.141313
Epoch [1/2], Iter [2237/3125], train_loss:0.150460
Epoch [1/2], Iter [2238/3125], train_loss:0.157583
Epoch [1/2], Iter [2239/3125], train_loss:0.160917
Epoch [1/2], Iter [2240/3125], train_loss:0.175020
Epoch [1/2], Iter [2241/3125], train_loss:0.172231
Epoch [1/2], Iter [2242/3125], train_loss:0.158164
Epoch [1/2], Iter [2243/3125], train_loss:0.149750
Epoch [1/2], Iter [2244/3125], train_loss:0.160434
Epoch [1/2], Iter [2245/3125], train_loss:0.163752
Epoch [1/2], Iter [2246/3125], train_loss:0.146884
Epoch [1/2], Iter [2247/3125], train_loss:0.158364
Epoch [1/2], Iter [2248/3125], train_loss:0.156011
Epoch [1/2], Iter [2249/3125], train_loss:0.173126
Epoch [1/2], Iter [2250/3125], train_loss:0.193422
Epoch [1/2], Iter [2251/3125], train_loss:0.158470
Epoch [1/2], Iter [2252/3125], train_loss:0.151245
Epoch [1/2], Iter [2253/3125], train_loss:0.158039
Epoch [1/2], Iter [2254/3125], train_loss:0.152211
Epoch [1/2], Iter [2255/3125], train_loss:0.167252
Epoch [1/2], Iter [2256/3125], train_loss:0.156284
Epoch [1/2], Iter [2257/3125], train_loss:0.157557
Epoch [1/2], Iter [2258/3125], train_loss:0.149407
Epoch [1/2], Iter [2259/3125], train_loss:0.166932
Epoch [1/2], Iter [2260/3125], train_loss:0.174253
Epoch [1/2], Iter [2261/3125], train_loss:0.171375
Epoch [1/2], Iter [2262/3125], train_loss:0.166366
Epoch [1/2], Iter [2263/3125], train_loss:0.149717
Epoch [1/2], Iter [2264/3125], train_loss:0.166810
Epoch [1/2], Iter [2265/3125], train_loss:0.162488
Epoch [1/2], Iter [2266/3125], train_loss:0.165728
Epoch [1/2], Iter [2267/3125], train_loss:0.168702
Epoch [1/2], Iter [2268/3125], train_loss:0.143795
Epoch [1/2], Iter [2269/3125], train_loss:0.125662
Epoch [1/2], Iter [2270/3125], train_loss:0.152566
Epoch [1/2], Iter [2271/3125], train_loss:0.166331
Epoch [1/2], Iter [2272/3125], train_loss:0.146904
Epoch [1/2], Iter [2273/3125], train_loss:0.176470
Epoch [1/2], Iter [2274/3125], train_loss:0.166159
Epoch [1/2], Iter [2275/3125], train_loss:0.164638
Epoch [1/2], Iter [2276/3125], train_loss:0.174697
Epoch [1/2], Iter [2277/3125], train_loss:0.172518
Epoch [1/2], Iter [2278/3125], train_loss:0.179059
Epoch [1/2], Iter [2279/3125], train_loss:0.153997
Epoch [1/2], Iter [2280/3125], train_loss:0.164288
Epoch [1/2], Iter [2281/3125], train_loss:0.156835
Epoch [1/2], Iter [2282/3125], train_loss:0.172427
Epoch [1/2], Iter [2283/3125], train_loss:0.140807
Epoch [1/2], Iter [2284/3125], train_loss:0.176298
Epoch [1/2], Iter [2285/3125], train_loss:0.167197
Epoch [1/2], Iter [2286/3125], train_loss:0.155124
Epoch [1/2], Iter [2287/3125], train_loss:0.168967
Epoch [1/2], Iter [2288/3125], train_loss:0.155021
Epoch [1/2], Iter [2289/3125], train_loss:0.200430
Epoch [1/2], Iter [2290/3125], train_loss:0.168794
Epoch [1/2], Iter [2291/3125], train_loss:0.180748
Epoch [1/2], Iter [2292/3125], train_loss:0.151308
Epoch [1/2], Iter [2293/3125], train_loss:0.168426
Epoch [1/2], Iter [2294/3125], train_loss:0.160170
Epoch [1/2], Iter [2295/3125], train_loss:0.170112
Epoch [1/2], Iter [2296/3125], train_loss:0.182302
Epoch [1/2], Iter [2297/3125], train_loss:0.165573
Epoch [1/2], Iter [2298/3125], train_loss:0.154088
Epoch [1/2], Iter [2299/3125], train_loss:0.157324
Epoch [1/2], Iter [2300/3125], train_loss:0.186557
Epoch [1/2], Iter [2301/3125], train_loss:0.159513
Epoch [1/2], Iter [2302/3125], train_loss:0.159842
Epoch [1/2], Iter [2303/3125], train_loss:0.196757
Epoch [1/2], Iter [2304/3125], train_loss:0.164728
Epoch [1/2], Iter [2305/3125], train_loss:0.159394
Epoch [1/2], Iter [2306/3125], train_loss:0.162070
Epoch [1/2], Iter [2307/3125], train_loss:0.150942
Epoch [1/2], Iter [2308/3125], train_loss:0.178885
Epoch [1/2], Iter [2309/3125], train_loss:0.167701
Epoch [1/2], Iter [2310/3125], train_loss:0.172832
Epoch [1/2], Iter [2311/3125], train_loss:0.151420
Epoch [1/2], Iter [2312/3125], train_loss:0.177722
Epoch [1/2], Iter [2313/3125], train_loss:0.152966
Epoch [1/2], Iter [2314/3125], train_loss:0.144942
Epoch [1/2], Iter [2315/3125], train_loss:0.166451
Epoch [1/2], Iter [2316/3125], train_loss:0.167570
Epoch [1/2], Iter [2317/3125], train_loss:0.173486
Epoch [1/2], Iter [2318/3125], train_loss:0.167726
Epoch [1/2], Iter [2319/3125], train_loss:0.150083
Epoch [1/2], Iter [2320/3125], train_loss:0.161335
Epoch [1/2], Iter [2321/3125], train_loss:0.163541
Epoch [1/2], Iter [2322/3125], train_loss:0.139134
Epoch [1/2], Iter [2323/3125], train_loss:0.172992
Epoch [1/2], Iter [2324/3125], train_loss:0.166975
Epoch [1/2], Iter [2325/3125], train_loss:0.161279
Epoch [1/2], Iter [2326/3125], train_loss:0.152028
Epoch [1/2], Iter [2327/3125], train_loss:0.157792
Epoch [1/2], Iter [2328/3125], train_loss:0.146232
Epoch [1/2], Iter [2329/3125], train_loss:0.169053
Epoch [1/2], Iter [2330/3125], train_loss:0.137772
Epoch [1/2], Iter [2331/3125], train_loss:0.153836
Epoch [1/2], Iter [2332/3125], train_loss:0.173346
Epoch [1/2], Iter [2333/3125], train_loss:0.170181
Epoch [1/2], Iter [2334/3125], train_loss:0.153624
Epoch [1/2], Iter [2335/3125], train_loss:0.164155
Epoch [1/2], Iter [2336/3125], train_loss:0.162456
Epoch [1/2], Iter [2337/3125], train_loss:0.165626
Epoch [1/2], Iter [2338/3125], train_loss:0.165643
Epoch [1/2], Iter [2339/3125], train_loss:0.152838
Epoch [1/2], Iter [2340/3125], train_loss:0.166339
Epoch [1/2], Iter [2341/3125], train_loss:0.162481
Epoch [1/2], Iter [2342/3125], train_loss:0.155828
Epoch [1/2], Iter [2343/3125], train_loss:0.180035
Epoch [1/2], Iter [2344/3125], train_loss:0.155793
Epoch [1/2], Iter [2345/3125], train_loss:0.141963
Epoch [1/2], Iter [2346/3125], train_loss:0.175218
Epoch [1/2], Iter [2347/3125], train_loss:0.172332
Epoch [1/2], Iter [2348/3125], train_loss:0.170206
Epoch [1/2], Iter [2349/3125], train_loss:0.158258
Epoch [1/2], Iter [2350/3125], train_loss:0.135423
Epoch [1/2], Iter [2351/3125], train_loss:0.158765
Epoch [1/2], Iter [2352/3125], train_loss:0.161856
Epoch [1/2], Iter [2353/3125], train_loss:0.165698
Epoch [1/2], Iter [2354/3125], train_loss:0.166844
Epoch [1/2], Iter [2355/3125], train_loss:0.167199
Epoch [1/2], Iter [2356/3125], train_loss:0.168643
Epoch [1/2], Iter [2357/3125], train_loss:0.145062
Epoch [1/2], Iter [2358/3125], train_loss:0.159673
Epoch [1/2], Iter [2359/3125], train_loss:0.172321
Epoch [1/2], Iter [2360/3125], train_loss:0.162261
Epoch [1/2], Iter [2361/3125], train_loss:0.160964
Epoch [1/2], Iter [2362/3125], train_loss:0.170365
Epoch [1/2], Iter [2363/3125], train_loss:0.158219
Epoch [1/2], Iter [2364/3125], train_loss:0.151682
Epoch [1/2], Iter [2365/3125], train_loss:0.173451
Epoch [1/2], Iter [2366/3125], train_loss:0.186411
Epoch [1/2], Iter [2367/3125], train_loss:0.151850
Epoch [1/2], Iter [2368/3125], train_loss:0.153410
Epoch [1/2], Iter [2369/3125], train_loss:0.150387
Epoch [1/2], Iter [2370/3125], train_loss:0.151061
Epoch [1/2], Iter [2371/3125], train_loss:0.155576
Epoch [1/2], Iter [2372/3125], train_loss:0.171615
Epoch [1/2], Iter [2373/3125], train_loss:0.152891
Epoch [1/2], Iter [2374/3125], train_loss:0.173333
Epoch [1/2], Iter [2375/3125], train_loss:0.178193
Epoch [1/2], Iter [2376/3125], train_loss:0.158169
Epoch [1/2], Iter [2377/3125], train_loss:0.171027
Epoch [1/2], Iter [2378/3125], train_loss:0.183264
Epoch [1/2], Iter [2379/3125], train_loss:0.153622
Epoch [1/2], Iter [2380/3125], train_loss:0.167066
Epoch [1/2], Iter [2381/3125], train_loss:0.151149
Epoch [1/2], Iter [2382/3125], train_loss:0.149457
Epoch [1/2], Iter [2383/3125], train_loss:0.151908
Epoch [1/2], Iter [2384/3125], train_loss:0.167173
Epoch [1/2], Iter [2385/3125], train_loss:0.146533
Epoch [1/2], Iter [2386/3125], train_loss:0.140869
Epoch [1/2], Iter [2387/3125], train_loss:0.161093
Epoch [1/2], Iter [2388/3125], train_loss:0.174047
Epoch [1/2], Iter [2389/3125], train_loss:0.169243
Epoch [1/2], Iter [2390/3125], train_loss:0.153055
Epoch [1/2], Iter [2391/3125], train_loss:0.166778
Epoch [1/2], Iter [2392/3125], train_loss:0.171477
Epoch [1/2], Iter [2393/3125], train_loss:0.148076
Epoch [1/2], Iter [2394/3125], train_loss:0.174037
Epoch [1/2], Iter [2395/3125], train_loss:0.152578
Epoch [1/2], Iter [2396/3125], train_loss:0.184196
Epoch [1/2], Iter [2397/3125], train_loss:0.167483
Epoch [1/2], Iter [2398/3125], train_loss:0.164854
Epoch [1/2], Iter [2399/3125], train_loss:0.173233
Epoch [1/2], Iter [2400/3125], train_loss:0.138813
Epoch [1/2], Iter [2401/3125], train_loss:0.152254
Epoch [1/2], Iter [2402/3125], train_loss:0.168025
Epoch [1/2], Iter [2403/3125], train_loss:0.157279
Epoch [1/2], Iter [2404/3125], train_loss:0.148963
Epoch [1/2], Iter [2405/3125], train_loss:0.144112
Epoch [1/2], Iter [2406/3125], train_loss:0.169978
Epoch [1/2], Iter [2407/3125], train_loss:0.153412
Epoch [1/2], Iter [2408/3125], train_loss:0.173826
Epoch [1/2], Iter [2409/3125], train_loss:0.169680
Epoch [1/2], Iter [2410/3125], train_loss:0.162930
Epoch [1/2], Iter [2411/3125], train_loss:0.139202
Epoch [1/2], Iter [2412/3125], train_loss:0.154762
Epoch [1/2], Iter [2413/3125], train_loss:0.154299
Epoch [1/2], Iter [2414/3125], train_loss:0.167414
Epoch [1/2], Iter [2415/3125], train_loss:0.178091
Epoch [1/2], Iter [2416/3125], train_loss:0.173954
Epoch [1/2], Iter [2417/3125], train_loss:0.166982
Epoch [1/2], Iter [2418/3125], train_loss:0.170560
Epoch [1/2], Iter [2419/3125], train_loss:0.170997
Epoch [1/2], Iter [2420/3125], train_loss:0.153861
Epoch [1/2], Iter [2421/3125], train_loss:0.177754
Epoch [1/2], Iter [2422/3125], train_loss:0.168872
Epoch [1/2], Iter [2423/3125], train_loss:0.144303
Epoch [1/2], Iter [2424/3125], train_loss:0.164720
Epoch [1/2], Iter [2425/3125], train_loss:0.186620
Epoch [1/2], Iter [2426/3125], train_loss:0.158638
Epoch [1/2], Iter [2427/3125], train_loss:0.172386
Epoch [1/2], Iter [2428/3125], train_loss:0.167100
Epoch [1/2], Iter [2429/3125], train_loss:0.167147
Epoch [1/2], Iter [2430/3125], train_loss:0.182128
Epoch [1/2], Iter [2431/3125], train_loss:0.165804
Epoch [1/2], Iter [2432/3125], train_loss:0.180088
Epoch [1/2], Iter [2433/3125], train_loss:0.165245
Epoch [1/2], Iter [2434/3125], train_loss:0.159391
Epoch [1/2], Iter [2435/3125], train_loss:0.152686
Epoch [1/2], Iter [2436/3125], train_loss:0.161874
Epoch [1/2], Iter [2437/3125], train_loss:0.165142
Epoch [1/2], Iter [2438/3125], train_loss:0.160963
Epoch [1/2], Iter [2439/3125], train_loss:0.166472
Epoch [1/2], Iter [2440/3125], train_loss:0.158173
Epoch [1/2], Iter [2441/3125], train_loss:0.173994
Epoch [1/2], Iter [2442/3125], train_loss:0.151297
Epoch [1/2], Iter [2443/3125], train_loss:0.152010
Epoch [1/2], Iter [2444/3125], train_loss:0.160982
Epoch [1/2], Iter [2445/3125], train_loss:0.182511
Epoch [1/2], Iter [2446/3125], train_loss:0.171740
Epoch [1/2], Iter [2447/3125], train_loss:0.169194
Epoch [1/2], Iter [2448/3125], train_loss:0.160217
Epoch [1/2], Iter [2449/3125], train_loss:0.170634
Epoch [1/2], Iter [2450/3125], train_loss:0.174725
Epoch [1/2], Iter [2451/3125], train_loss:0.162844
Epoch [1/2], Iter [2452/3125], train_loss:0.179684
Epoch [1/2], Iter [2453/3125], train_loss:0.165793
Epoch [1/2], Iter [2454/3125], train_loss:0.147170
Epoch [1/2], Iter [2455/3125], train_loss:0.167428
Epoch [1/2], Iter [2456/3125], train_loss:0.156832
Epoch [1/2], Iter [2457/3125], train_loss:0.163711
Epoch [1/2], Iter [2458/3125], train_loss:0.163635
Epoch [1/2], Iter [2459/3125], train_loss:0.169788
Epoch [1/2], Iter [2460/3125], train_loss:0.161291
Epoch [1/2], Iter [2461/3125], train_loss:0.176288
Epoch [1/2], Iter [2462/3125], train_loss:0.173527
Epoch [1/2], Iter [2463/3125], train_loss:0.198670
Epoch [1/2], Iter [2464/3125], train_loss:0.163765
Epoch [1/2], Iter [2465/3125], train_loss:0.155121
Epoch [1/2], Iter [2466/3125], train_loss:0.157210
Epoch [1/2], Iter [2467/3125], train_loss:0.158318
Epoch [1/2], Iter [2468/3125], train_loss:0.190069
Epoch [1/2], Iter [2469/3125], train_loss:0.157674
Epoch [1/2], Iter [2470/3125], train_loss:0.153022
Epoch [1/2], Iter [2471/3125], train_loss:0.178211
Epoch [1/2], Iter [2472/3125], train_loss:0.165668
Epoch [1/2], Iter [2473/3125], train_loss:0.170597
Epoch [1/2], Iter [2474/3125], train_loss:0.148514
Epoch [1/2], Iter [2475/3125], train_loss:0.161165
Epoch [1/2], Iter [2476/3125], train_loss:0.159940
Epoch [1/2], Iter [2477/3125], train_loss:0.163364
Epoch [1/2], Iter [2478/3125], train_loss:0.160939
Epoch [1/2], Iter [2479/3125], train_loss:0.188242
Epoch [1/2], Iter [2480/3125], train_loss:0.170161
Epoch [1/2], Iter [2481/3125], train_loss:0.166997
Epoch [1/2], Iter [2482/3125], train_loss:0.173182
Epoch [1/2], Iter [2483/3125], train_loss:0.156736
Epoch [1/2], Iter [2484/3125], train_loss:0.162785
Epoch [1/2], Iter [2485/3125], train_loss:0.159454
Epoch [1/2], Iter [2486/3125], train_loss:0.172418
Epoch [1/2], Iter [2487/3125], train_loss:0.166055
Epoch [1/2], Iter [2488/3125], train_loss:0.166522
Epoch [1/2], Iter [2489/3125], train_loss:0.157236
Epoch [1/2], Iter [2490/3125], train_loss:0.173360
Epoch [1/2], Iter [2491/3125], train_loss:0.147073
Epoch [1/2], Iter [2492/3125], train_loss:0.154806
Epoch [1/2], Iter [2493/3125], train_loss:0.159782
Epoch [1/2], Iter [2494/3125], train_loss:0.175359
Epoch [1/2], Iter [2495/3125], train_loss:0.152874
Epoch [1/2], Iter [2496/3125], train_loss:0.175603
Epoch [1/2], Iter [2497/3125], train_loss:0.151182
Epoch [1/2], Iter [2498/3125], train_loss:0.133273
Epoch [1/2], Iter [2499/3125], train_loss:0.162480
Epoch [1/2], Iter [2500/3125], train_loss:0.172038
Epoch [1/2], Iter [2501/3125], train_loss:0.163592
Epoch [1/2], Iter [2502/3125], train_loss:0.168842
Epoch [1/2], Iter [2503/3125], train_loss:0.167579
Epoch [1/2], Iter [2504/3125], train_loss:0.169892
Epoch [1/2], Iter [2505/3125], train_loss:0.184179
Epoch [1/2], Iter [2506/3125], train_loss:0.172049
Epoch [1/2], Iter [2507/3125], train_loss:0.181183
Epoch [1/2], Iter [2508/3125], train_loss:0.157703
Epoch [1/2], Iter [2509/3125], train_loss:0.156251
Epoch [1/2], Iter [2510/3125], train_loss:0.140083
Epoch [1/2], Iter [2511/3125], train_loss:0.155766
Epoch [1/2], Iter [2512/3125], train_loss:0.171320
Epoch [1/2], Iter [2513/3125], train_loss:0.165249
Epoch [1/2], Iter [2514/3125], train_loss:0.144336
Epoch [1/2], Iter [2515/3125], train_loss:0.169332
Epoch [1/2], Iter [2516/3125], train_loss:0.152470
Epoch [1/2], Iter [2517/3125], train_loss:0.161122
Epoch [1/2], Iter [2518/3125], train_loss:0.182971
Epoch [1/2], Iter [2519/3125], train_loss:0.164621
Epoch [1/2], Iter [2520/3125], train_loss:0.175796
Epoch [1/2], Iter [2521/3125], train_loss:0.176611
Epoch [1/2], Iter [2522/3125], train_loss:0.161589
Epoch [1/2], Iter [2523/3125], train_loss:0.153558
Epoch [1/2], Iter [2524/3125], train_loss:0.177934
Epoch [1/2], Iter [2525/3125], train_loss:0.140108
Epoch [1/2], Iter [2526/3125], train_loss:0.170537
Epoch [1/2], Iter [2527/3125], train_loss:0.190064
Epoch [1/2], Iter [2528/3125], train_loss:0.150987
Epoch [1/2], Iter [2529/3125], train_loss:0.153076
Epoch [1/2], Iter [2530/3125], train_loss:0.153231
Epoch [1/2], Iter [2531/3125], train_loss:0.151433
Epoch [1/2], Iter [2532/3125], train_loss:0.165380
Epoch [1/2], Iter [2533/3125], train_loss:0.154326
Epoch [1/2], Iter [2534/3125], train_loss:0.148860
Epoch [1/2], Iter [2535/3125], train_loss:0.182532
Epoch [1/2], Iter [2536/3125], train_loss:0.184858
Epoch [1/2], Iter [2537/3125], train_loss:0.144190
Epoch [1/2], Iter [2538/3125], train_loss:0.160582
Epoch [1/2], Iter [2539/3125], train_loss:0.150244
Epoch [1/2], Iter [2540/3125], train_loss:0.163084
Epoch [1/2], Iter [2541/3125], train_loss:0.173798
Epoch [1/2], Iter [2542/3125], train_loss:0.180224
Epoch [1/2], Iter [2543/3125], train_loss:0.171645
Epoch [1/2], Iter [2544/3125], train_loss:0.170542
Epoch [1/2], Iter [2545/3125], train_loss:0.150921
Epoch [1/2], Iter [2546/3125], train_loss:0.141499
Epoch [1/2], Iter [2547/3125], train_loss:0.154087
Epoch [1/2], Iter [2548/3125], train_loss:0.146057
Epoch [1/2], Iter [2549/3125], train_loss:0.179915
Epoch [1/2], Iter [2550/3125], train_loss:0.178421
Epoch [1/2], Iter [2551/3125], train_loss:0.162338
Epoch [1/2], Iter [2552/3125], train_loss:0.159943
Epoch [1/2], Iter [2553/3125], train_loss:0.166942
Epoch [1/2], Iter [2554/3125], train_loss:0.161777
Epoch [1/2], Iter [2555/3125], train_loss:0.173371
Epoch [1/2], Iter [2556/3125], train_loss:0.149645
Epoch [1/2], Iter [2557/3125], train_loss:0.150998
Epoch [1/2], Iter [2558/3125], train_loss:0.168478
Epoch [1/2], Iter [2559/3125], train_loss:0.161073
Epoch [1/2], Iter [2560/3125], train_loss:0.153746
Epoch [1/2], Iter [2561/3125], train_loss:0.156996
Epoch [1/2], Iter [2562/3125], train_loss:0.175018
Epoch [1/2], Iter [2563/3125], train_loss:0.161457
Epoch [1/2], Iter [2564/3125], train_loss:0.181512
Epoch [1/2], Iter [2565/3125], train_loss:0.159499
Epoch [1/2], Iter [2566/3125], train_loss:0.155685
Epoch [1/2], Iter [2567/3125], train_loss:0.160816
Epoch [1/2], Iter [2568/3125], train_loss:0.167257
Epoch [1/2], Iter [2569/3125], train_loss:0.168003
Epoch [1/2], Iter [2570/3125], train_loss:0.156276
Epoch [1/2], Iter [2571/3125], train_loss:0.166197
Epoch [1/2], Iter [2572/3125], train_loss:0.171228
Epoch [1/2], Iter [2573/3125], train_loss:0.169274
Epoch [1/2], Iter [2574/3125], train_loss:0.178607
Epoch [1/2], Iter [2575/3125], train_loss:0.180143
Epoch [1/2], Iter [2576/3125], train_loss:0.165496
Epoch [1/2], Iter [2577/3125], train_loss:0.164666
Epoch [1/2], Iter [2578/3125], train_loss:0.172761
Epoch [1/2], Iter [2579/3125], train_loss:0.142597
Epoch [1/2], Iter [2580/3125], train_loss:0.166856
Epoch [1/2], Iter [2581/3125], train_loss:0.180629
Epoch [1/2], Iter [2582/3125], train_loss:0.155988
Epoch [1/2], Iter [2583/3125], train_loss:0.190004
Epoch [1/2], Iter [2584/3125], train_loss:0.153131
Epoch [1/2], Iter [2585/3125], train_loss:0.149209
Epoch [1/2], Iter [2586/3125], train_loss:0.182763
Epoch [1/2], Iter [2587/3125], train_loss:0.163803
Epoch [1/2], Iter [2588/3125], train_loss:0.164377
Epoch [1/2], Iter [2589/3125], train_loss:0.165225
Epoch [1/2], Iter [2590/3125], train_loss:0.132286
Epoch [1/2], Iter [2591/3125], train_loss:0.157618
Epoch [1/2], Iter [2592/3125], train_loss:0.180062
Epoch [1/2], Iter [2593/3125], train_loss:0.149064
Epoch [1/2], Iter [2594/3125], train_loss:0.182419
Epoch [1/2], Iter [2595/3125], train_loss:0.152154
Epoch [1/2], Iter [2596/3125], train_loss:0.156817
Epoch [1/2], Iter [2597/3125], train_loss:0.158894
Epoch [1/2], Iter [2598/3125], train_loss:0.174006
Epoch [1/2], Iter [2599/3125], train_loss:0.170469
Epoch [1/2], Iter [2600/3125], train_loss:0.163272
Epoch [1/2], Iter [2601/3125], train_loss:0.165293
Epoch [1/2], Iter [2602/3125], train_loss:0.132606
Epoch [1/2], Iter [2603/3125], train_loss:0.181648
Epoch [1/2], Iter [2604/3125], train_loss:0.172091
Epoch [1/2], Iter [2605/3125], train_loss:0.145725
Epoch [1/2], Iter [2606/3125], train_loss:0.159542
Epoch [1/2], Iter [2607/3125], train_loss:0.166341
Epoch [1/2], Iter [2608/3125], train_loss:0.144378
Epoch [1/2], Iter [2609/3125], train_loss:0.174001
Epoch [1/2], Iter [2610/3125], train_loss:0.154200
Epoch [1/2], Iter [2611/3125], train_loss:0.168938
Epoch [1/2], Iter [2612/3125], train_loss:0.151330
Epoch [1/2], Iter [2613/3125], train_loss:0.158763
Epoch [1/2], Iter [2614/3125], train_loss:0.154259
Epoch [1/2], Iter [2615/3125], train_loss:0.155223
Epoch [1/2], Iter [2616/3125], train_loss:0.173738
Epoch [1/2], Iter [2617/3125], train_loss:0.164574
Epoch [1/2], Iter [2618/3125], train_loss:0.171280
Epoch [1/2], Iter [2619/3125], train_loss:0.167967
Epoch [1/2], Iter [2620/3125], train_loss:0.165825
Epoch [1/2], Iter [2621/3125], train_loss:0.163001
Epoch [1/2], Iter [2622/3125], train_loss:0.166808
Epoch [1/2], Iter [2623/3125], train_loss:0.158262
Epoch [1/2], Iter [2624/3125], train_loss:0.152927
Epoch [1/2], Iter [2625/3125], train_loss:0.151799
Epoch [1/2], Iter [2626/3125], train_loss:0.153348
Epoch [1/2], Iter [2627/3125], train_loss:0.145824
Epoch [1/2], Iter [2628/3125], train_loss:0.149315
Epoch [1/2], Iter [2629/3125], train_loss:0.183911
Epoch [1/2], Iter [2630/3125], train_loss:0.153068
Epoch [1/2], Iter [2631/3125], train_loss:0.163764
Epoch [1/2], Iter [2632/3125], train_loss:0.161556
Epoch [1/2], Iter [2633/3125], train_loss:0.177212
Epoch [1/2], Iter [2634/3125], train_loss:0.149619
Epoch [1/2], Iter [2635/3125], train_loss:0.160023
Epoch [1/2], Iter [2636/3125], train_loss:0.169547
Epoch [1/2], Iter [2637/3125], train_loss:0.147591
Epoch [1/2], Iter [2638/3125], train_loss:0.156738
Epoch [1/2], Iter [2639/3125], train_loss:0.148298
Epoch [1/2], Iter [2640/3125], train_loss:0.161786
Epoch [1/2], Iter [2641/3125], train_loss:0.162544
Epoch [1/2], Iter [2642/3125], train_loss:0.168581
Epoch [1/2], Iter [2643/3125], train_loss:0.167225
Epoch [1/2], Iter [2644/3125], train_loss:0.160467
Epoch [1/2], Iter [2645/3125], train_loss:0.166200
Epoch [1/2], Iter [2646/3125], train_loss:0.167931
Epoch [1/2], Iter [2647/3125], train_loss:0.157258
Epoch [1/2], Iter [2648/3125], train_loss:0.142979
Epoch [1/2], Iter [2649/3125], train_loss:0.169719
Epoch [1/2], Iter [2650/3125], train_loss:0.179859
Epoch [1/2], Iter [2651/3125], train_loss:0.154542
Epoch [1/2], Iter [2652/3125], train_loss:0.157200
Epoch [1/2], Iter [2653/3125], train_loss:0.178602
Epoch [1/2], Iter [2654/3125], train_loss:0.145348
Epoch [1/2], Iter [2655/3125], train_loss:0.156349
Epoch [1/2], Iter [2656/3125], train_loss:0.148944
Epoch [1/2], Iter [2657/3125], train_loss:0.157309
Epoch [1/2], Iter [2658/3125], train_loss:0.162670
Epoch [1/2], Iter [2659/3125], train_loss:0.150020
Epoch [1/2], Iter [2660/3125], train_loss:0.157252
Epoch [1/2], Iter [2661/3125], train_loss:0.166470
Epoch [1/2], Iter [2662/3125], train_loss:0.178597
Epoch [1/2], Iter [2663/3125], train_loss:0.145679
Epoch [1/2], Iter [2664/3125], train_loss:0.142497
Epoch [1/2], Iter [2665/3125], train_loss:0.153192
Epoch [1/2], Iter [2666/3125], train_loss:0.155716
Epoch [1/2], Iter [2667/3125], train_loss:0.174556
Epoch [1/2], Iter [2668/3125], train_loss:0.152721
Epoch [1/2], Iter [2669/3125], train_loss:0.169619
Epoch [1/2], Iter [2670/3125], train_loss:0.167028
Epoch [1/2], Iter [2671/3125], train_loss:0.154183
Epoch [1/2], Iter [2672/3125], train_loss:0.175002
Epoch [1/2], Iter [2673/3125], train_loss:0.139364
Epoch [1/2], Iter [2674/3125], train_loss:0.162451
Epoch [1/2], Iter [2675/3125], train_loss:0.157143
Epoch [1/2], Iter [2676/3125], train_loss:0.166282
Epoch [1/2], Iter [2677/3125], train_loss:0.150420
Epoch [1/2], Iter [2678/3125], train_loss:0.172134
Epoch [1/2], Iter [2679/3125], train_loss:0.170172
Epoch [1/2], Iter [2680/3125], train_loss:0.188591
Epoch [1/2], Iter [2681/3125], train_loss:0.133006
Epoch [1/2], Iter [2682/3125], train_loss:0.154428
Epoch [1/2], Iter [2683/3125], train_loss:0.146256
Epoch [1/2], Iter [2684/3125], train_loss:0.140180
Epoch [1/2], Iter [2685/3125], train_loss:0.150448
Epoch [1/2], Iter [2686/3125], train_loss:0.166966
Epoch [1/2], Iter [2687/3125], train_loss:0.163846
Epoch [1/2], Iter [2688/3125], train_loss:0.151998
Epoch [1/2], Iter [2689/3125], train_loss:0.177917
Epoch [1/2], Iter [2690/3125], train_loss:0.164405
Epoch [1/2], Iter [2691/3125], train_loss:0.149646
Epoch [1/2], Iter [2692/3125], train_loss:0.155895
Epoch [1/2], Iter [2693/3125], train_loss:0.133467
Epoch [1/2], Iter [2694/3125], train_loss:0.181978
Epoch [1/2], Iter [2695/3125], train_loss:0.178019
Epoch [1/2], Iter [2696/3125], train_loss:0.164970
Epoch [1/2], Iter [2697/3125], train_loss:0.153656
Epoch [1/2], Iter [2698/3125], train_loss:0.158283
Epoch [1/2], Iter [2699/3125], train_loss:0.166151
Epoch [1/2], Iter [2700/3125], train_loss:0.152899
Epoch [1/2], Iter [2701/3125], train_loss:0.150675
Epoch [1/2], Iter [2702/3125], train_loss:0.161370
Epoch [1/2], Iter [2703/3125], train_loss:0.162690
Epoch [1/2], Iter [2704/3125], train_loss:0.146854
Epoch [1/2], Iter [2705/3125], train_loss:0.168728
Epoch [1/2], Iter [2706/3125], train_loss:0.156361
Epoch [1/2], Iter [2707/3125], train_loss:0.162295
Epoch [1/2], Iter [2708/3125], train_loss:0.154698
Epoch [1/2], Iter [2709/3125], train_loss:0.162639
Epoch [1/2], Iter [2710/3125], train_loss:0.170419
Epoch [1/2], Iter [2711/3125], train_loss:0.182608
Epoch [1/2], Iter [2712/3125], train_loss:0.174881
Epoch [1/2], Iter [2713/3125], train_loss:0.163568
Epoch [1/2], Iter [2714/3125], train_loss:0.172464
Epoch [1/2], Iter [2715/3125], train_loss:0.152963
Epoch [1/2], Iter [2716/3125], train_loss:0.174935
Epoch [1/2], Iter [2717/3125], train_loss:0.163978
Epoch [1/2], Iter [2718/3125], train_loss:0.149811
Epoch [1/2], Iter [2719/3125], train_loss:0.168551
Epoch [1/2], Iter [2720/3125], train_loss:0.187687
Epoch [1/2], Iter [2721/3125], train_loss:0.170561
Epoch [1/2], Iter [2722/3125], train_loss:0.157643
Epoch [1/2], Iter [2723/3125], train_loss:0.183448
Epoch [1/2], Iter [2724/3125], train_loss:0.156940
Epoch [1/2], Iter [2725/3125], train_loss:0.176922
Epoch [1/2], Iter [2726/3125], train_loss:0.170941
Epoch [1/2], Iter [2727/3125], train_loss:0.161215
Epoch [1/2], Iter [2728/3125], train_loss:0.157638
Epoch [1/2], Iter [2729/3125], train_loss:0.146765
Epoch [1/2], Iter [2730/3125], train_loss:0.186415
Epoch [1/2], Iter [2731/3125], train_loss:0.179016
Epoch [1/2], Iter [2732/3125], train_loss:0.146862
Epoch [1/2], Iter [2733/3125], train_loss:0.160904
Epoch [1/2], Iter [2734/3125], train_loss:0.184066
Epoch [1/2], Iter [2735/3125], train_loss:0.170018
Epoch [1/2], Iter [2736/3125], train_loss:0.151466
Epoch [1/2], Iter [2737/3125], train_loss:0.155503
Epoch [1/2], Iter [2738/3125], train_loss:0.178504
Epoch [1/2], Iter [2739/3125], train_loss:0.182733
Epoch [1/2], Iter [2740/3125], train_loss:0.178885
Epoch [1/2], Iter [2741/3125], train_loss:0.158115
Epoch [1/2], Iter [2742/3125], train_loss:0.166074
Epoch [1/2], Iter [2743/3125], train_loss:0.175153
Epoch [1/2], Iter [2744/3125], train_loss:0.173695
Epoch [1/2], Iter [2745/3125], train_loss:0.140103
Epoch [1/2], Iter [2746/3125], train_loss:0.164165
Epoch [1/2], Iter [2747/3125], train_loss:0.195799
Epoch [1/2], Iter [2748/3125], train_loss:0.165051
Epoch [1/2], Iter [2749/3125], train_loss:0.168219
Epoch [1/2], Iter [2750/3125], train_loss:0.145761
Epoch [1/2], Iter [2751/3125], train_loss:0.184619
Epoch [1/2], Iter [2752/3125], train_loss:0.183593
Epoch [1/2], Iter [2753/3125], train_loss:0.161479
Epoch [1/2], Iter [2754/3125], train_loss:0.165525
Epoch [1/2], Iter [2755/3125], train_loss:0.152368
Epoch [1/2], Iter [2756/3125], train_loss:0.156252
Epoch [1/2], Iter [2757/3125], train_loss:0.160543
Epoch [1/2], Iter [2758/3125], train_loss:0.169057
Epoch [1/2], Iter [2759/3125], train_loss:0.185539
Epoch [1/2], Iter [2760/3125], train_loss:0.150664
Epoch [1/2], Iter [2761/3125], train_loss:0.168148
Epoch [1/2], Iter [2762/3125], train_loss:0.150886
Epoch [1/2], Iter [2763/3125], train_loss:0.153608
Epoch [1/2], Iter [2764/3125], train_loss:0.173608
Epoch [1/2], Iter [2765/3125], train_loss:0.156316
Epoch [1/2], Iter [2766/3125], train_loss:0.155580
Epoch [1/2], Iter [2767/3125], train_loss:0.170365
Epoch [1/2], Iter [2768/3125], train_loss:0.160952
Epoch [1/2], Iter [2769/3125], train_loss:0.178418
Epoch [1/2], Iter [2770/3125], train_loss:0.161754
Epoch [1/2], Iter [2771/3125], train_loss:0.175010
Epoch [1/2], Iter [2772/3125], train_loss:0.177170
Epoch [1/2], Iter [2773/3125], train_loss:0.156224
Epoch [1/2], Iter [2774/3125], train_loss:0.171853
Epoch [1/2], Iter [2775/3125], train_loss:0.175113
Epoch [1/2], Iter [2776/3125], train_loss:0.153226
Epoch [1/2], Iter [2777/3125], train_loss:0.167736
Epoch [1/2], Iter [2778/3125], train_loss:0.160811
Epoch [1/2], Iter [2779/3125], train_loss:0.174287
Epoch [1/2], Iter [2780/3125], train_loss:0.158126
Epoch [1/2], Iter [2781/3125], train_loss:0.170792
Epoch [1/2], Iter [2782/3125], train_loss:0.165518
Epoch [1/2], Iter [2783/3125], train_loss:0.162349
Epoch [1/2], Iter [2784/3125], train_loss:0.145470
Epoch [1/2], Iter [2785/3125], train_loss:0.159157
Epoch [1/2], Iter [2786/3125], train_loss:0.147954
Epoch [1/2], Iter [2787/3125], train_loss:0.170489
Epoch [1/2], Iter [2788/3125], train_loss:0.165043
Epoch [1/2], Iter [2789/3125], train_loss:0.163622
Epoch [1/2], Iter [2790/3125], train_loss:0.154899
Epoch [1/2], Iter [2791/3125], train_loss:0.160961
Epoch [1/2], Iter [2792/3125], train_loss:0.165133
Epoch [1/2], Iter [2793/3125], train_loss:0.183820
Epoch [1/2], Iter [2794/3125], train_loss:0.170000
Epoch [1/2], Iter [2795/3125], train_loss:0.164589
Epoch [1/2], Iter [2796/3125], train_loss:0.180219
Epoch [1/2], Iter [2797/3125], train_loss:0.144782
Epoch [1/2], Iter [2798/3125], train_loss:0.175786
Epoch [1/2], Iter [2799/3125], train_loss:0.128005
Epoch [1/2], Iter [2800/3125], train_loss:0.156003
Epoch [1/2], Iter [2801/3125], train_loss:0.151638
Epoch [1/2], Iter [2802/3125], train_loss:0.162846
Epoch [1/2], Iter [2803/3125], train_loss:0.162985
Epoch [1/2], Iter [2804/3125], train_loss:0.160361
Epoch [1/2], Iter [2805/3125], train_loss:0.151148
Epoch [1/2], Iter [2806/3125], train_loss:0.164542
Epoch [1/2], Iter [2807/3125], train_loss:0.142881
Epoch [1/2], Iter [2808/3125], train_loss:0.156098
Epoch [1/2], Iter [2809/3125], train_loss:0.133754
Epoch [1/2], Iter [2810/3125], train_loss:0.170719
Epoch [1/2], Iter [2811/3125], train_loss:0.149624
Epoch [1/2], Iter [2812/3125], train_loss:0.175666
Epoch [1/2], Iter [2813/3125], train_loss:0.178650
Epoch [1/2], Iter [2814/3125], train_loss:0.160231
Epoch [1/2], Iter [2815/3125], train_loss:0.181755
Epoch [1/2], Iter [2816/3125], train_loss:0.177022
Epoch [1/2], Iter [2817/3125], train_loss:0.143955
Epoch [1/2], Iter [2818/3125], train_loss:0.182202
Epoch [1/2], Iter [2819/3125], train_loss:0.156804
Epoch [1/2], Iter [2820/3125], train_loss:0.158852
Epoch [1/2], Iter [2821/3125], train_loss:0.159252
Epoch [1/2], Iter [2822/3125], train_loss:0.159138
Epoch [1/2], Iter [2823/3125], train_loss:0.158014
Epoch [1/2], Iter [2824/3125], train_loss:0.173861
Epoch [1/2], Iter [2825/3125], train_loss:0.163103
Epoch [1/2], Iter [2826/3125], train_loss:0.169961
Epoch [1/2], Iter [2827/3125], train_loss:0.160450
Epoch [1/2], Iter [2828/3125], train_loss:0.168754
Epoch [1/2], Iter [2829/3125], train_loss:0.145734
Epoch [1/2], Iter [2830/3125], train_loss:0.171105
Epoch [1/2], Iter [2831/3125], train_loss:0.149704
Epoch [1/2], Iter [2832/3125], train_loss:0.157235
Epoch [1/2], Iter [2833/3125], train_loss:0.168647
Epoch [1/2], Iter [2834/3125], train_loss:0.170278
Epoch [1/2], Iter [2835/3125], train_loss:0.164118
Epoch [1/2], Iter [2836/3125], train_loss:0.160487
Epoch [1/2], Iter [2837/3125], train_loss:0.170349
Epoch [1/2], Iter [2838/3125], train_loss:0.153062
Epoch [1/2], Iter [2839/3125], train_loss:0.179919
Epoch [1/2], Iter [2840/3125], train_loss:0.165033
Epoch [1/2], Iter [2841/3125], train_loss:0.159011
Epoch [1/2], Iter [2842/3125], train_loss:0.141699
Epoch [1/2], Iter [2843/3125], train_loss:0.155806
Epoch [1/2], Iter [2844/3125], train_loss:0.180037
Epoch [1/2], Iter [2845/3125], train_loss:0.172654
Epoch [1/2], Iter [2846/3125], train_loss:0.162126
Epoch [1/2], Iter [2847/3125], train_loss:0.174910
Epoch [1/2], Iter [2848/3125], train_loss:0.190180
Epoch [1/2], Iter [2849/3125], train_loss:0.167382
Epoch [1/2], Iter [2850/3125], train_loss:0.140893
Epoch [1/2], Iter [2851/3125], train_loss:0.169695
Epoch [1/2], Iter [2852/3125], train_loss:0.149698
Epoch [1/2], Iter [2853/3125], train_loss:0.150947
Epoch [1/2], Iter [2854/3125], train_loss:0.160250
Epoch [1/2], Iter [2855/3125], train_loss:0.167571
Epoch [1/2], Iter [2856/3125], train_loss:0.158384
Epoch [1/2], Iter [2857/3125], train_loss:0.137086
Epoch [1/2], Iter [2858/3125], train_loss:0.177784
Epoch [1/2], Iter [2859/3125], train_loss:0.172647
Epoch [1/2], Iter [2860/3125], train_loss:0.169255
Epoch [1/2], Iter [2861/3125], train_loss:0.169094
Epoch [1/2], Iter [2862/3125], train_loss:0.159690
Epoch [1/2], Iter [2863/3125], train_loss:0.162201
Epoch [1/2], Iter [2864/3125], train_loss:0.167594
Epoch [1/2], Iter [2865/3125], train_loss:0.167401
Epoch [1/2], Iter [2866/3125], train_loss:0.164989
Epoch [1/2], Iter [2867/3125], train_loss:0.138895
Epoch [1/2], Iter [2868/3125], train_loss:0.155665
Epoch [1/2], Iter [2869/3125], train_loss:0.178687
Epoch [1/2], Iter [2870/3125], train_loss:0.142473
Epoch [1/2], Iter [2871/3125], train_loss:0.167332
Epoch [1/2], Iter [2872/3125], train_loss:0.179365
Epoch [1/2], Iter [2873/3125], train_loss:0.167223
Epoch [1/2], Iter [2874/3125], train_loss:0.178953
Epoch [1/2], Iter [2875/3125], train_loss:0.157346
Epoch [1/2], Iter [2876/3125], train_loss:0.182048
Epoch [1/2], Iter [2877/3125], train_loss:0.172396
Epoch [1/2], Iter [2878/3125], train_loss:0.175423
Epoch [1/2], Iter [2879/3125], train_loss:0.161872
Epoch [1/2], Iter [2880/3125], train_loss:0.169045
Epoch [1/2], Iter [2881/3125], train_loss:0.169418
Epoch [1/2], Iter [2882/3125], train_loss:0.160182
Epoch [1/2], Iter [2883/3125], train_loss:0.186741
Epoch [1/2], Iter [2884/3125], train_loss:0.157193
Epoch [1/2], Iter [2885/3125], train_loss:0.138638
Epoch [1/2], Iter [2886/3125], train_loss:0.150510
Epoch [1/2], Iter [2887/3125], train_loss:0.176207
Epoch [1/2], Iter [2888/3125], train_loss:0.155249
Epoch [1/2], Iter [2889/3125], train_loss:0.159106
Epoch [1/2], Iter [2890/3125], train_loss:0.162412
Epoch [1/2], Iter [2891/3125], train_loss:0.152091
Epoch [1/2], Iter [2892/3125], train_loss:0.176883
Epoch [1/2], Iter [2893/3125], train_loss:0.146511
Epoch [1/2], Iter [2894/3125], train_loss:0.163757
Epoch [1/2], Iter [2895/3125], train_loss:0.160787
Epoch [1/2], Iter [2896/3125], train_loss:0.160858
Epoch [1/2], Iter [2897/3125], train_loss:0.155350
Epoch [1/2], Iter [2898/3125], train_loss:0.169348
Epoch [1/2], Iter [2899/3125], train_loss:0.144282
Epoch [1/2], Iter [2900/3125], train_loss:0.167706
Epoch [1/2], Iter [2901/3125], train_loss:0.182318
Epoch [1/2], Iter [2902/3125], train_loss:0.171248
Epoch [1/2], Iter [2903/3125], train_loss:0.165353
Epoch [1/2], Iter [2904/3125], train_loss:0.151637
Epoch [1/2], Iter [2905/3125], train_loss:0.161721
Epoch [1/2], Iter [2906/3125], train_loss:0.153006
Epoch [1/2], Iter [2907/3125], train_loss:0.161867
Epoch [1/2], Iter [2908/3125], train_loss:0.156607
Epoch [1/2], Iter [2909/3125], train_loss:0.178779
Epoch [1/2], Iter [2910/3125], train_loss:0.192463
Epoch [1/2], Iter [2911/3125], train_loss:0.148583
Epoch [1/2], Iter [2912/3125], train_loss:0.170696
Epoch [1/2], Iter [2913/3125], train_loss:0.168631
Epoch [1/2], Iter [2914/3125], train_loss:0.168608
Epoch [1/2], Iter [2915/3125], train_loss:0.166084
Epoch [1/2], Iter [2916/3125], train_loss:0.164468
Epoch [1/2], Iter [2917/3125], train_loss:0.154483
Epoch [1/2], Iter [2918/3125], train_loss:0.166607
Epoch [1/2], Iter [2919/3125], train_loss:0.175541
Epoch [1/2], Iter [2920/3125], train_loss:0.146106
Epoch [1/2], Iter [2921/3125], train_loss:0.186289
Epoch [1/2], Iter [2922/3125], train_loss:0.148206
Epoch [1/2], Iter [2923/3125], train_loss:0.180759
Epoch [1/2], Iter [2924/3125], train_loss:0.148458
Epoch [1/2], Iter [2925/3125], train_loss:0.153044
Epoch [1/2], Iter [2926/3125], train_loss:0.173843
Epoch [1/2], Iter [2927/3125], train_loss:0.173281
Epoch [1/2], Iter [2928/3125], train_loss:0.173701
Epoch [1/2], Iter [2929/3125], train_loss:0.165718
Epoch [1/2], Iter [2930/3125], train_loss:0.173092
Epoch [1/2], Iter [2931/3125], train_loss:0.171520
Epoch [1/2], Iter [2932/3125], train_loss:0.148433
Epoch [1/2], Iter [2933/3125], train_loss:0.149291
Epoch [1/2], Iter [2934/3125], train_loss:0.173039
Epoch [1/2], Iter [2935/3125], train_loss:0.167303
Epoch [1/2], Iter [2936/3125], train_loss:0.148045
Epoch [1/2], Iter [2937/3125], train_loss:0.160600
Epoch [1/2], Iter [2938/3125], train_loss:0.175791
Epoch [1/2], Iter [2939/3125], train_loss:0.170290
Epoch [1/2], Iter [2940/3125], train_loss:0.168750
Epoch [1/2], Iter [2941/3125], train_loss:0.174851
Epoch [1/2], Iter [2942/3125], train_loss:0.167067
Epoch [1/2], Iter [2943/3125], train_loss:0.147908
Epoch [1/2], Iter [2944/3125], train_loss:0.161702
Epoch [1/2], Iter [2945/3125], train_loss:0.166226
Epoch [1/2], Iter [2946/3125], train_loss:0.152965
Epoch [1/2], Iter [2947/3125], train_loss:0.151126
Epoch [1/2], Iter [2948/3125], train_loss:0.159228
Epoch [1/2], Iter [2949/3125], train_loss:0.147525
Epoch [1/2], Iter [2950/3125], train_loss:0.186010
Epoch [1/2], Iter [2951/3125], train_loss:0.144456
Epoch [1/2], Iter [2952/3125], train_loss:0.144571
Epoch [1/2], Iter [2953/3125], train_loss:0.149504
Epoch [1/2], Iter [2954/3125], train_loss:0.155754
Epoch [1/2], Iter [2955/3125], train_loss:0.157044
Epoch [1/2], Iter [2956/3125], train_loss:0.164638
Epoch [1/2], Iter [2957/3125], train_loss:0.161717
Epoch [1/2], Iter [2958/3125], train_loss:0.150048
Epoch [1/2], Iter [2959/3125], train_loss:0.161040
Epoch [1/2], Iter [2960/3125], train_loss:0.147002
Epoch [1/2], Iter [2961/3125], train_loss:0.168605
Epoch [1/2], Iter [2962/3125], train_loss:0.160989
Epoch [1/2], Iter [2963/3125], train_loss:0.179867
Epoch [1/2], Iter [2964/3125], train_loss:0.173219
Epoch [1/2], Iter [2965/3125], train_loss:0.166897
Epoch [1/2], Iter [2966/3125], train_loss:0.160661
Epoch [1/2], Iter [2967/3125], train_loss:0.161262
Epoch [1/2], Iter [2968/3125], train_loss:0.164723
Epoch [1/2], Iter [2969/3125], train_loss:0.142853
Epoch [1/2], Iter [2970/3125], train_loss:0.171715
Epoch [1/2], Iter [2971/3125], train_loss:0.158447
Epoch [1/2], Iter [2972/3125], train_loss:0.164181
Epoch [1/2], Iter [2973/3125], train_loss:0.177048
Epoch [1/2], Iter [2974/3125], train_loss:0.167190
Epoch [1/2], Iter [2975/3125], train_loss:0.158204
Epoch [1/2], Iter [2976/3125], train_loss:0.151028
Epoch [1/2], Iter [2977/3125], train_loss:0.162853
Epoch [1/2], Iter [2978/3125], train_loss:0.165735
Epoch [1/2], Iter [2979/3125], train_loss:0.173848
Epoch [1/2], Iter [2980/3125], train_loss:0.149452
Epoch [1/2], Iter [2981/3125], train_loss:0.152468
Epoch [1/2], Iter [2982/3125], train_loss:0.168138
Epoch [1/2], Iter [2983/3125], train_loss:0.163172
Epoch [1/2], Iter [2984/3125], train_loss:0.162576
Epoch [1/2], Iter [2985/3125], train_loss:0.188783
Epoch [1/2], Iter [2986/3125], train_loss:0.161452
Epoch [1/2], Iter [2987/3125], train_loss:0.136657
Epoch [1/2], Iter [2988/3125], train_loss:0.145196
Epoch [1/2], Iter [2989/3125], train_loss:0.183863
Epoch [1/2], Iter [2990/3125], train_loss:0.170865
Epoch [1/2], Iter [2991/3125], train_loss:0.155084
Epoch [1/2], Iter [2992/3125], train_loss:0.175260
Epoch [1/2], Iter [2993/3125], train_loss:0.177893
Epoch [1/2], Iter [2994/3125], train_loss:0.171074
Epoch [1/2], Iter [2995/3125], train_loss:0.166262
Epoch [1/2], Iter [2996/3125], train_loss:0.168631
Epoch [1/2], Iter [2997/3125], train_loss:0.142343
Epoch [1/2], Iter [2998/3125], train_loss:0.176656
Epoch [1/2], Iter [2999/3125], train_loss:0.181024
Epoch [1/2], Iter [3000/3125], train_loss:0.164563
Epoch [1/2], Iter [3001/3125], train_loss:0.181617
Epoch [1/2], Iter [3002/3125], train_loss:0.172865
Epoch [1/2], Iter [3003/3125], train_loss:0.179876
Epoch [1/2], Iter [3004/3125], train_loss:0.165719
Epoch [1/2], Iter [3005/3125], train_loss:0.177486
Epoch [1/2], Iter [3006/3125], train_loss:0.176950
Epoch [1/2], Iter [3007/3125], train_loss:0.178203
Epoch [1/2], Iter [3008/3125], train_loss:0.178196
Epoch [1/2], Iter [3009/3125], train_loss:0.171647
Epoch [1/2], Iter [3010/3125], train_loss:0.173414
Epoch [1/2], Iter [3011/3125], train_loss:0.164811
Epoch [1/2], Iter [3012/3125], train_loss:0.147020
Epoch [1/2], Iter [3013/3125], train_loss:0.166289
Epoch [1/2], Iter [3014/3125], train_loss:0.161090
Epoch [1/2], Iter [3015/3125], train_loss:0.162289
Epoch [1/2], Iter [3016/3125], train_loss:0.130393
Epoch [1/2], Iter [3017/3125], train_loss:0.132035
Epoch [1/2], Iter [3018/3125], train_loss:0.174404
Epoch [1/2], Iter [3019/3125], train_loss:0.157980
Epoch [1/2], Iter [3020/3125], train_loss:0.158861
Epoch [1/2], Iter [3021/3125], train_loss:0.182830
Epoch [1/2], Iter [3022/3125], train_loss:0.158150
Epoch [1/2], Iter [3023/3125], train_loss:0.156165
Epoch [1/2], Iter [3024/3125], train_loss:0.145425
Epoch [1/2], Iter [3025/3125], train_loss:0.176111
Epoch [1/2], Iter [3026/3125], train_loss:0.186718
Epoch [1/2], Iter [3027/3125], train_loss:0.150117
Epoch [1/2], Iter [3028/3125], train_loss:0.173456
Epoch [1/2], Iter [3029/3125], train_loss:0.156002
Epoch [1/2], Iter [3030/3125], train_loss:0.175069
Epoch [1/2], Iter [3031/3125], train_loss:0.150203
Epoch [1/2], Iter [3032/3125], train_loss:0.170119
Epoch [1/2], Iter [3033/3125], train_loss:0.161877
Epoch [1/2], Iter [3034/3125], train_loss:0.154505
Epoch [1/2], Iter [3035/3125], train_loss:0.170968
Epoch [1/2], Iter [3036/3125], train_loss:0.143941
Epoch [1/2], Iter [3037/3125], train_loss:0.171731
Epoch [1/2], Iter [3038/3125], train_loss:0.150052
Epoch [1/2], Iter [3039/3125], train_loss:0.155370
Epoch [1/2], Iter [3040/3125], train_loss:0.154070
Epoch [1/2], Iter [3041/3125], train_loss:0.169434
Epoch [1/2], Iter [3042/3125], train_loss:0.153931
Epoch [1/2], Iter [3043/3125], train_loss:0.167334
Epoch [1/2], Iter [3044/3125], train_loss:0.160416
Epoch [1/2], Iter [3045/3125], train_loss:0.161101
Epoch [1/2], Iter [3046/3125], train_loss:0.153652
Epoch [1/2], Iter [3047/3125], train_loss:0.166452
Epoch [1/2], Iter [3048/3125], train_loss:0.148719
Epoch [1/2], Iter [3049/3125], train_loss:0.153907
Epoch [1/2], Iter [3050/3125], train_loss:0.165748
Epoch [1/2], Iter [3051/3125], train_loss:0.177738
Epoch [1/2], Iter [3052/3125], train_loss:0.162658
Epoch [1/2], Iter [3053/3125], train_loss:0.157725
Epoch [1/2], Iter [3054/3125], train_loss:0.168763
Epoch [1/2], Iter [3055/3125], train_loss:0.169479
Epoch [1/2], Iter [3056/3125], train_loss:0.160464
Epoch [1/2], Iter [3057/3125], train_loss:0.165181
Epoch [1/2], Iter [3058/3125], train_loss:0.158833
Epoch [1/2], Iter [3059/3125], train_loss:0.174259
Epoch [1/2], Iter [3060/3125], train_loss:0.197122
Epoch [1/2], Iter [3061/3125], train_loss:0.157540
Epoch [1/2], Iter [3062/3125], train_loss:0.153574
Epoch [1/2], Iter [3063/3125], train_loss:0.158650
Epoch [1/2], Iter [3064/3125], train_loss:0.159368
Epoch [1/2], Iter [3065/3125], train_loss:0.126841
Epoch [1/2], Iter [3066/3125], train_loss:0.190723
Epoch [1/2], Iter [3067/3125], train_loss:0.161133
Epoch [1/2], Iter [3068/3125], train_loss:0.147794
Epoch [1/2], Iter [3069/3125], train_loss:0.154277
Epoch [1/2], Iter [3070/3125], train_loss:0.160044
Epoch [1/2], Iter [3071/3125], train_loss:0.157531
Epoch [1/2], Iter [3072/3125], train_loss:0.168389
Epoch [1/2], Iter [3073/3125], train_loss:0.172469
Epoch [1/2], Iter [3074/3125], train_loss:0.155994
Epoch [1/2], Iter [3075/3125], train_loss:0.147720
Epoch [1/2], Iter [3076/3125], train_loss:0.137509
Epoch [1/2], Iter [3077/3125], train_loss:0.181711
Epoch [1/2], Iter [3078/3125], train_loss:0.177348
Epoch [1/2], Iter [3079/3125], train_loss:0.148808
Epoch [1/2], Iter [3080/3125], train_loss:0.175595
Epoch [1/2], Iter [3081/3125], train_loss:0.165768
Epoch [1/2], Iter [3082/3125], train_loss:0.142488
Epoch [1/2], Iter [3083/3125], train_loss:0.147224
Epoch [1/2], Iter [3084/3125], train_loss:0.168570
Epoch [1/2], Iter [3085/3125], train_loss:0.155916
Epoch [1/2], Iter [3086/3125], train_loss:0.169448
Epoch [1/2], Iter [3087/3125], train_loss:0.148978
Epoch [1/2], Iter [3088/3125], train_loss:0.158718
Epoch [1/2], Iter [3089/3125], train_loss:0.139569
Epoch [1/2], Iter [3090/3125], train_loss:0.179602
Epoch [1/2], Iter [3091/3125], train_loss:0.172581
Epoch [1/2], Iter [3092/3125], train_loss:0.172989
Epoch [1/2], Iter [3093/3125], train_loss:0.174835
Epoch [1/2], Iter [3094/3125], train_loss:0.162024
Epoch [1/2], Iter [3095/3125], train_loss:0.149372
Epoch [1/2], Iter [3096/3125], train_loss:0.182143
Epoch [1/2], Iter [3097/3125], train_loss:0.173537
Epoch [1/2], Iter [3098/3125], train_loss:0.180467
Epoch [1/2], Iter [3099/3125], train_loss:0.138658
Epoch [1/2], Iter [3100/3125], train_loss:0.167943
Epoch [1/2], Iter [3101/3125], train_loss:0.179498
Epoch [1/2], Iter [3102/3125], train_loss:0.168319
Epoch [1/2], Iter [3103/3125], train_loss:0.159227
Epoch [1/2], Iter [3104/3125], train_loss:0.143851
Epoch [1/2], Iter [3105/3125], train_loss:0.162043
Epoch [1/2], Iter [3106/3125], train_loss:0.173713
Epoch [1/2], Iter [3107/3125], train_loss:0.160019
Epoch [1/2], Iter [3108/3125], train_loss:0.187196
Epoch [1/2], Iter [3109/3125], train_loss:0.178457
Epoch [1/2], Iter [3110/3125], train_loss:0.166758
Epoch [1/2], Iter [3111/3125], train_loss:0.162495
Epoch [1/2], Iter [3112/3125], train_loss:0.144868
Epoch [1/2], Iter [3113/3125], train_loss:0.170601
Epoch [1/2], Iter [3114/3125], train_loss:0.152794
Epoch [1/2], Iter [3115/3125], train_loss:0.166172
Epoch [1/2], Iter [3116/3125], train_loss:0.150413
Epoch [1/2], Iter [3117/3125], train_loss:0.146555
Epoch [1/2], Iter [3118/3125], train_loss:0.158817
Epoch [1/2], Iter [3119/3125], train_loss:0.179008
Epoch [1/2], Iter [3120/3125], train_loss:0.183372
Epoch [1/2], Iter [3121/3125], train_loss:0.165688
Epoch [1/2], Iter [3122/3125], train_loss:0.151766
Epoch [1/2], Iter [3123/3125], train_loss:0.147575
Epoch [1/2], Iter [3124/3125], train_loss:0.140461
Epoch [1/2], Iter [3125/3125], train_loss:0.166029
Epoch [1/2], train_loss:0.1625, train_acc:9.7080%, test_loss:0.1695, test_acc:10.6200%
Epoch [2/2], Iter [1/3125], train_loss:0.146202
Epoch [2/2], Iter [2/3125], train_loss:0.173672
Epoch [2/2], Iter [3/3125], train_loss:0.165151
Epoch [2/2], Iter [4/3125], train_loss:0.158770
Epoch [2/2], Iter [5/3125], train_loss:0.175999
Epoch [2/2], Iter [6/3125], train_loss:0.163998
Epoch [2/2], Iter [7/3125], train_loss:0.165410
Epoch [2/2], Iter [8/3125], train_loss:0.161637
Epoch [2/2], Iter [9/3125], train_loss:0.148239
Epoch [2/2], Iter [10/3125], train_loss:0.162426
Epoch [2/2], Iter [11/3125], train_loss:0.168900
Epoch [2/2], Iter [12/3125], train_loss:0.149848
Epoch [2/2], Iter [13/3125], train_loss:0.147608
Epoch [2/2], Iter [14/3125], train_loss:0.160673
Epoch [2/2], Iter [15/3125], train_loss:0.172021
Epoch [2/2], Iter [16/3125], train_loss:0.162101
Epoch [2/2], Iter [17/3125], train_loss:0.150480
Epoch [2/2], Iter [18/3125], train_loss:0.154776
Epoch [2/2], Iter [19/3125], train_loss:0.163754
Epoch [2/2], Iter [20/3125], train_loss:0.177525
Epoch [2/2], Iter [21/3125], train_loss:0.168097
Epoch [2/2], Iter [22/3125], train_loss:0.156192
Epoch [2/2], Iter [23/3125], train_loss:0.166126
Epoch [2/2], Iter [24/3125], train_loss:0.147863
Epoch [2/2], Iter [25/3125], train_loss:0.176202
Epoch [2/2], Iter [26/3125], train_loss:0.159570
Epoch [2/2], Iter [27/3125], train_loss:0.168702
Epoch [2/2], Iter [28/3125], train_loss:0.151392
Epoch [2/2], Iter [29/3125], train_loss:0.162362
Epoch [2/2], Iter [30/3125], train_loss:0.147167
Epoch [2/2], Iter [31/3125], train_loss:0.155992
Epoch [2/2], Iter [32/3125], train_loss:0.143932
Epoch [2/2], Iter [33/3125], train_loss:0.167568
Epoch [2/2], Iter [34/3125], train_loss:0.156876
Epoch [2/2], Iter [35/3125], train_loss:0.149783
Epoch [2/2], Iter [36/3125], train_loss:0.184439
Epoch [2/2], Iter [37/3125], train_loss:0.162946
Epoch [2/2], Iter [38/3125], train_loss:0.148541
Epoch [2/2], Iter [39/3125], train_loss:0.165627
Epoch [2/2], Iter [40/3125], train_loss:0.169342
Epoch [2/2], Iter [41/3125], train_loss:0.165507
Epoch [2/2], Iter [42/3125], train_loss:0.166825
Epoch [2/2], Iter [43/3125], train_loss:0.180178
Epoch [2/2], Iter [44/3125], train_loss:0.174066
Epoch [2/2], Iter [45/3125], train_loss:0.175319
Epoch [2/2], Iter [46/3125], train_loss:0.159672
Epoch [2/2], Iter [47/3125], train_loss:0.155855
Epoch [2/2], Iter [48/3125], train_loss:0.166862
Epoch [2/2], Iter [49/3125], train_loss:0.157197
Epoch [2/2], Iter [50/3125], train_loss:0.154708
Epoch [2/2], Iter [51/3125], train_loss:0.169141
Epoch [2/2], Iter [52/3125], train_loss:0.189146
Epoch [2/2], Iter [53/3125], train_loss:0.147940
Epoch [2/2], Iter [54/3125], train_loss:0.173229
Epoch [2/2], Iter [55/3125], train_loss:0.147851
Epoch [2/2], Iter [56/3125], train_loss:0.166568
Epoch [2/2], Iter [57/3125], train_loss:0.157517
Epoch [2/2], Iter [58/3125], train_loss:0.157088
Epoch [2/2], Iter [59/3125], train_loss:0.170904
Epoch [2/2], Iter [60/3125], train_loss:0.130077
Epoch [2/2], Iter [61/3125], train_loss:0.162462
Epoch [2/2], Iter [62/3125], train_loss:0.167202
Epoch [2/2], Iter [63/3125], train_loss:0.144449
Epoch [2/2], Iter [64/3125], train_loss:0.147543
Epoch [2/2], Iter [65/3125], train_loss:0.178345
Epoch [2/2], Iter [66/3125], train_loss:0.171756
Epoch [2/2], Iter [67/3125], train_loss:0.182125
Epoch [2/2], Iter [68/3125], train_loss:0.163568
Epoch [2/2], Iter [69/3125], train_loss:0.168720
Epoch [2/2], Iter [70/3125], train_loss:0.166233
Epoch [2/2], Iter [71/3125], train_loss:0.165497
Epoch [2/2], Iter [72/3125], train_loss:0.158568
Epoch [2/2], Iter [73/3125], train_loss:0.158017
Epoch [2/2], Iter [74/3125], train_loss:0.146704
Epoch [2/2], Iter [75/3125], train_loss:0.168960
Epoch [2/2], Iter [76/3125], train_loss:0.176339
Epoch [2/2], Iter [77/3125], train_loss:0.157601
Epoch [2/2], Iter [78/3125], train_loss:0.150234
Epoch [2/2], Iter [79/3125], train_loss:0.171131
Epoch [2/2], Iter [80/3125], train_loss:0.168470
Epoch [2/2], Iter [81/3125], train_loss:0.165504
Epoch [2/2], Iter [82/3125], train_loss:0.182929
Epoch [2/2], Iter [83/3125], train_loss:0.149121
Epoch [2/2], Iter [84/3125], train_loss:0.170251
Epoch [2/2], Iter [85/3125], train_loss:0.176452
Epoch [2/2], Iter [86/3125], train_loss:0.163143
Epoch [2/2], Iter [87/3125], train_loss:0.149888
Epoch [2/2], Iter [88/3125], train_loss:0.158223
Epoch [2/2], Iter [89/3125], train_loss:0.165219
Epoch [2/2], Iter [90/3125], train_loss:0.175566
Epoch [2/2], Iter [91/3125], train_loss:0.172680
Epoch [2/2], Iter [92/3125], train_loss:0.157610
Epoch [2/2], Iter [93/3125], train_loss:0.149683
Epoch [2/2], Iter [94/3125], train_loss:0.150491
Epoch [2/2], Iter [95/3125], train_loss:0.143823
Epoch [2/2], Iter [96/3125], train_loss:0.147380
Epoch [2/2], Iter [97/3125], train_loss:0.162991
Epoch [2/2], Iter [98/3125], train_loss:0.142088
Epoch [2/2], Iter [99/3125], train_loss:0.165098
Epoch [2/2], Iter [100/3125], train_loss:0.142414
Epoch [2/2], Iter [101/3125], train_loss:0.171030
Epoch [2/2], Iter [102/3125], train_loss:0.164070
Epoch [2/2], Iter [103/3125], train_loss:0.155812
Epoch [2/2], Iter [104/3125], train_loss:0.166394
Epoch [2/2], Iter [105/3125], train_loss:0.162388
Epoch [2/2], Iter [106/3125], train_loss:0.156700
Epoch [2/2], Iter [107/3125], train_loss:0.153787
Epoch [2/2], Iter [108/3125], train_loss:0.146724
Epoch [2/2], Iter [109/3125], train_loss:0.146993
Epoch [2/2], Iter [110/3125], train_loss:0.161078
Epoch [2/2], Iter [111/3125], train_loss:0.141862
Epoch [2/2], Iter [112/3125], train_loss:0.164413
Epoch [2/2], Iter [113/3125], train_loss:0.172509
Epoch [2/2], Iter [114/3125], train_loss:0.133704
Epoch [2/2], Iter [115/3125], train_loss:0.156570
Epoch [2/2], Iter [116/3125], train_loss:0.149274
Epoch [2/2], Iter [117/3125], train_loss:0.172428
Epoch [2/2], Iter [118/3125], train_loss:0.158011
Epoch [2/2], Iter [119/3125], train_loss:0.180269
Epoch [2/2], Iter [120/3125], train_loss:0.133947
Epoch [2/2], Iter [121/3125], train_loss:0.160919
Epoch [2/2], Iter [122/3125], train_loss:0.160910
Epoch [2/2], Iter [123/3125], train_loss:0.156073
Epoch [2/2], Iter [124/3125], train_loss:0.170647
Epoch [2/2], Iter [125/3125], train_loss:0.168909
Epoch [2/2], Iter [126/3125], train_loss:0.163942
Epoch [2/2], Iter [127/3125], train_loss:0.185147
Epoch [2/2], Iter [128/3125], train_loss:0.147694
Epoch [2/2], Iter [129/3125], train_loss:0.154867
Epoch [2/2], Iter [130/3125], train_loss:0.156400
Epoch [2/2], Iter [131/3125], train_loss:0.159859
Epoch [2/2], Iter [132/3125], train_loss:0.163676
Epoch [2/2], Iter [133/3125], train_loss:0.164885
Epoch [2/2], Iter [134/3125], train_loss:0.157290
Epoch [2/2], Iter [135/3125], train_loss:0.153076
Epoch [2/2], Iter [136/3125], train_loss:0.170953
Epoch [2/2], Iter [137/3125], train_loss:0.161285
Epoch [2/2], Iter [138/3125], train_loss:0.176708
Epoch [2/2], Iter [139/3125], train_loss:0.164216
Epoch [2/2], Iter [140/3125], train_loss:0.157998
Epoch [2/2], Iter [141/3125], train_loss:0.161874
Epoch [2/2], Iter [142/3125], train_loss:0.165788
Epoch [2/2], Iter [143/3125], train_loss:0.147918
Epoch [2/2], Iter [144/3125], train_loss:0.168310
Epoch [2/2], Iter [145/3125], train_loss:0.157749
Epoch [2/2], Iter [146/3125], train_loss:0.170075
Epoch [2/2], Iter [147/3125], train_loss:0.162752
Epoch [2/2], Iter [148/3125], train_loss:0.170934
Epoch [2/2], Iter [149/3125], train_loss:0.184253
Epoch [2/2], Iter [150/3125], train_loss:0.178670
Epoch [2/2], Iter [151/3125], train_loss:0.168679
Epoch [2/2], Iter [152/3125], train_loss:0.175516
Epoch [2/2], Iter [153/3125], train_loss:0.155538
Epoch [2/2], Iter [154/3125], train_loss:0.161324
Epoch [2/2], Iter [155/3125], train_loss:0.156795
Epoch [2/2], Iter [156/3125], train_loss:0.154852
Epoch [2/2], Iter [157/3125], train_loss:0.156921
Epoch [2/2], Iter [158/3125], train_loss:0.163482
Epoch [2/2], Iter [159/3125], train_loss:0.173362
Epoch [2/2], Iter [160/3125], train_loss:0.167319
Epoch [2/2], Iter [161/3125], train_loss:0.173615
Epoch [2/2], Iter [162/3125], train_loss:0.160354
Epoch [2/2], Iter [163/3125], train_loss:0.167696
Epoch [2/2], Iter [164/3125], train_loss:0.161250
Epoch [2/2], Iter [165/3125], train_loss:0.160384
Epoch [2/2], Iter [166/3125], train_loss:0.164563
Epoch [2/2], Iter [167/3125], train_loss:0.161137
Epoch [2/2], Iter [168/3125], train_loss:0.169574
Epoch [2/2], Iter [169/3125], train_loss:0.175531
Epoch [2/2], Iter [170/3125], train_loss:0.169590
Epoch [2/2], Iter [171/3125], train_loss:0.157394
Epoch [2/2], Iter [172/3125], train_loss:0.156446
Epoch [2/2], Iter [173/3125], train_loss:0.176099
Epoch [2/2], Iter [174/3125], train_loss:0.169188
Epoch [2/2], Iter [175/3125], train_loss:0.181089
Epoch [2/2], Iter [176/3125], train_loss:0.157710
Epoch [2/2], Iter [177/3125], train_loss:0.154907
Epoch [2/2], Iter [178/3125], train_loss:0.139118
Epoch [2/2], Iter [179/3125], train_loss:0.148639
Epoch [2/2], Iter [180/3125], train_loss:0.149552
Epoch [2/2], Iter [181/3125], train_loss:0.181338
Epoch [2/2], Iter [182/3125], train_loss:0.162902
Epoch [2/2], Iter [183/3125], train_loss:0.173415
Epoch [2/2], Iter [184/3125], train_loss:0.163751
Epoch [2/2], Iter [185/3125], train_loss:0.148597
Epoch [2/2], Iter [186/3125], train_loss:0.174917
Epoch [2/2], Iter [187/3125], train_loss:0.182508
Epoch [2/2], Iter [188/3125], train_loss:0.152830
Epoch [2/2], Iter [189/3125], train_loss:0.153870
Epoch [2/2], Iter [190/3125], train_loss:0.163149
Epoch [2/2], Iter [191/3125], train_loss:0.148616
Epoch [2/2], Iter [192/3125], train_loss:0.148913
Epoch [2/2], Iter [193/3125], train_loss:0.187292
Epoch [2/2], Iter [194/3125], train_loss:0.163163
Epoch [2/2], Iter [195/3125], train_loss:0.157831
Epoch [2/2], Iter [196/3125], train_loss:0.183797
Epoch [2/2], Iter [197/3125], train_loss:0.171313
Epoch [2/2], Iter [198/3125], train_loss:0.157854
Epoch [2/2], Iter [199/3125], train_loss:0.162880
Epoch [2/2], Iter [200/3125], train_loss:0.176139
Epoch [2/2], Iter [201/3125], train_loss:0.170941
Epoch [2/2], Iter [202/3125], train_loss:0.177162
Epoch [2/2], Iter [203/3125], train_loss:0.150648
Epoch [2/2], Iter [204/3125], train_loss:0.171486
Epoch [2/2], Iter [205/3125], train_loss:0.150289
Epoch [2/2], Iter [206/3125], train_loss:0.168230
Epoch [2/2], Iter [207/3125], train_loss:0.163843
Epoch [2/2], Iter [208/3125], train_loss:0.162255
Epoch [2/2], Iter [209/3125], train_loss:0.162224
Epoch [2/2], Iter [210/3125], train_loss:0.147608
Epoch [2/2], Iter [211/3125], train_loss:0.153870
Epoch [2/2], Iter [212/3125], train_loss:0.141862
Epoch [2/2], Iter [213/3125], train_loss:0.148429
Epoch [2/2], Iter [214/3125], train_loss:0.156956
Epoch [2/2], Iter [215/3125], train_loss:0.160064
Epoch [2/2], Iter [216/3125], train_loss:0.155396
Epoch [2/2], Iter [217/3125], train_loss:0.158974
Epoch [2/2], Iter [218/3125], train_loss:0.164166
Epoch [2/2], Iter [219/3125], train_loss:0.150157
Epoch [2/2], Iter [220/3125], train_loss:0.159278
Epoch [2/2], Iter [221/3125], train_loss:0.145524
Epoch [2/2], Iter [222/3125], train_loss:0.153799
Epoch [2/2], Iter [223/3125], train_loss:0.156198
Epoch [2/2], Iter [224/3125], train_loss:0.161148
Epoch [2/2], Iter [225/3125], train_loss:0.142585
Epoch [2/2], Iter [226/3125], train_loss:0.146489
Epoch [2/2], Iter [227/3125], train_loss:0.172975
Epoch [2/2], Iter [228/3125], train_loss:0.194386
Epoch [2/2], Iter [229/3125], train_loss:0.172534
Epoch [2/2], Iter [230/3125], train_loss:0.147119
Epoch [2/2], Iter [231/3125], train_loss:0.153974
Epoch [2/2], Iter [232/3125], train_loss:0.156483
Epoch [2/2], Iter [233/3125], train_loss:0.153530
Epoch [2/2], Iter [234/3125], train_loss:0.164038
Epoch [2/2], Iter [235/3125], train_loss:0.173976
Epoch [2/2], Iter [236/3125], train_loss:0.174818
Epoch [2/2], Iter [237/3125], train_loss:0.156790
Epoch [2/2], Iter [238/3125], train_loss:0.164833
Epoch [2/2], Iter [239/3125], train_loss:0.142041
Epoch [2/2], Iter [240/3125], train_loss:0.151814
Epoch [2/2], Iter [241/3125], train_loss:0.178047
Epoch [2/2], Iter [242/3125], train_loss:0.177161
Epoch [2/2], Iter [243/3125], train_loss:0.183264
Epoch [2/2], Iter [244/3125], train_loss:0.149528
Epoch [2/2], Iter [245/3125], train_loss:0.148756
Epoch [2/2], Iter [246/3125], train_loss:0.190471
Epoch [2/2], Iter [247/3125], train_loss:0.176104
Epoch [2/2], Iter [248/3125], train_loss:0.156350
Epoch [2/2], Iter [249/3125], train_loss:0.142632
Epoch [2/2], Iter [250/3125], train_loss:0.174584
Epoch [2/2], Iter [251/3125], train_loss:0.154501
Epoch [2/2], Iter [252/3125], train_loss:0.163151
Epoch [2/2], Iter [253/3125], train_loss:0.166830
Epoch [2/2], Iter [254/3125], train_loss:0.151940
Epoch [2/2], Iter [255/3125], train_loss:0.172570
Epoch [2/2], Iter [256/3125], train_loss:0.149426
Epoch [2/2], Iter [257/3125], train_loss:0.167744
Epoch [2/2], Iter [258/3125], train_loss:0.167243
Epoch [2/2], Iter [259/3125], train_loss:0.150426
Epoch [2/2], Iter [260/3125], train_loss:0.143742
Epoch [2/2], Iter [261/3125], train_loss:0.154619
Epoch [2/2], Iter [262/3125], train_loss:0.177493
Epoch [2/2], Iter [263/3125], train_loss:0.149127
Epoch [2/2], Iter [264/3125], train_loss:0.145748
Epoch [2/2], Iter [265/3125], train_loss:0.159908
Epoch [2/2], Iter [266/3125], train_loss:0.173237
Epoch [2/2], Iter [267/3125], train_loss:0.148302
Epoch [2/2], Iter [268/3125], train_loss:0.153039
Epoch [2/2], Iter [269/3125], train_loss:0.153943
Epoch [2/2], Iter [270/3125], train_loss:0.159962
Epoch [2/2], Iter [271/3125], train_loss:0.168486
Epoch [2/2], Iter [272/3125], train_loss:0.174194
Epoch [2/2], Iter [273/3125], train_loss:0.177417
Epoch [2/2], Iter [274/3125], train_loss:0.169610
Epoch [2/2], Iter [275/3125], train_loss:0.153916
Epoch [2/2], Iter [276/3125], train_loss:0.162009
Epoch [2/2], Iter [277/3125], train_loss:0.173930
Epoch [2/2], Iter [278/3125], train_loss:0.154844
Epoch [2/2], Iter [279/3125], train_loss:0.144510
Epoch [2/2], Iter [280/3125], train_loss:0.174670
Epoch [2/2], Iter [281/3125], train_loss:0.147663
Epoch [2/2], Iter [282/3125], train_loss:0.161231
Epoch [2/2], Iter [283/3125], train_loss:0.164567
Epoch [2/2], Iter [284/3125], train_loss:0.148298
Epoch [2/2], Iter [285/3125], train_loss:0.174240
Epoch [2/2], Iter [286/3125], train_loss:0.151915
Epoch [2/2], Iter [287/3125], train_loss:0.164254
Epoch [2/2], Iter [288/3125], train_loss:0.174495
Epoch [2/2], Iter [289/3125], train_loss:0.142919
Epoch [2/2], Iter [290/3125], train_loss:0.164818
Epoch [2/2], Iter [291/3125], train_loss:0.148046
Epoch [2/2], Iter [292/3125], train_loss:0.133363
Epoch [2/2], Iter [293/3125], train_loss:0.160022
Epoch [2/2], Iter [294/3125], train_loss:0.155773
Epoch [2/2], Iter [295/3125], train_loss:0.176180
Epoch [2/2], Iter [296/3125], train_loss:0.164451
Epoch [2/2], Iter [297/3125], train_loss:0.167795
Epoch [2/2], Iter [298/3125], train_loss:0.165779
Epoch [2/2], Iter [299/3125], train_loss:0.176171
Epoch [2/2], Iter [300/3125], train_loss:0.171345
Epoch [2/2], Iter [301/3125], train_loss:0.184329
Epoch [2/2], Iter [302/3125], train_loss:0.172903
Epoch [2/2], Iter [303/3125], train_loss:0.178375
Epoch [2/2], Iter [304/3125], train_loss:0.155158
Epoch [2/2], Iter [305/3125], train_loss:0.171172
Epoch [2/2], Iter [306/3125], train_loss:0.154146
Epoch [2/2], Iter [307/3125], train_loss:0.162431
Epoch [2/2], Iter [308/3125], train_loss:0.163887
Epoch [2/2], Iter [309/3125], train_loss:0.174687
Epoch [2/2], Iter [310/3125], train_loss:0.165460
Epoch [2/2], Iter [311/3125], train_loss:0.181555
Epoch [2/2], Iter [312/3125], train_loss:0.150162
Epoch [2/2], Iter [313/3125], train_loss:0.153412
Epoch [2/2], Iter [314/3125], train_loss:0.149629
Epoch [2/2], Iter [315/3125], train_loss:0.158892
Epoch [2/2], Iter [316/3125], train_loss:0.156130
Epoch [2/2], Iter [317/3125], train_loss:0.187546
Epoch [2/2], Iter [318/3125], train_loss:0.153912
Epoch [2/2], Iter [319/3125], train_loss:0.151770
Epoch [2/2], Iter [320/3125], train_loss:0.176303
Epoch [2/2], Iter [321/3125], train_loss:0.167846
Epoch [2/2], Iter [322/3125], train_loss:0.150853
Epoch [2/2], Iter [323/3125], train_loss:0.174334
Epoch [2/2], Iter [324/3125], train_loss:0.152363
Epoch [2/2], Iter [325/3125], train_loss:0.182887
Epoch [2/2], Iter [326/3125], train_loss:0.149897
Epoch [2/2], Iter [327/3125], train_loss:0.170501
Epoch [2/2], Iter [328/3125], train_loss:0.186834
Epoch [2/2], Iter [329/3125], train_loss:0.163417
Epoch [2/2], Iter [330/3125], train_loss:0.182607
Epoch [2/2], Iter [331/3125], train_loss:0.167527
Epoch [2/2], Iter [332/3125], train_loss:0.171005
Epoch [2/2], Iter [333/3125], train_loss:0.162520
Epoch [2/2], Iter [334/3125], train_loss:0.160567
Epoch [2/2], Iter [335/3125], train_loss:0.165600
Epoch [2/2], Iter [336/3125], train_loss:0.155164
Epoch [2/2], Iter [337/3125], train_loss:0.175315
Epoch [2/2], Iter [338/3125], train_loss:0.171219
Epoch [2/2], Iter [339/3125], train_loss:0.162644
Epoch [2/2], Iter [340/3125], train_loss:0.159048
Epoch [2/2], Iter [341/3125], train_loss:0.162782
Epoch [2/2], Iter [342/3125], train_loss:0.165438
Epoch [2/2], Iter [343/3125], train_loss:0.153910
Epoch [2/2], Iter [344/3125], train_loss:0.174372
Epoch [2/2], Iter [345/3125], train_loss:0.177340
Epoch [2/2], Iter [346/3125], train_loss:0.177186
Epoch [2/2], Iter [347/3125], train_loss:0.163347
Epoch [2/2], Iter [348/3125], train_loss:0.164975
Epoch [2/2], Iter [349/3125], train_loss:0.202241
Epoch [2/2], Iter [350/3125], train_loss:0.176461
Epoch [2/2], Iter [351/3125], train_loss:0.155909
Epoch [2/2], Iter [352/3125], train_loss:0.161746
Epoch [2/2], Iter [353/3125], train_loss:0.161433
Epoch [2/2], Iter [354/3125], train_loss:0.161199
Epoch [2/2], Iter [355/3125], train_loss:0.176037
Epoch [2/2], Iter [356/3125], train_loss:0.165718
Epoch [2/2], Iter [357/3125], train_loss:0.144140
Epoch [2/2], Iter [358/3125], train_loss:0.142182
Epoch [2/2], Iter [359/3125], train_loss:0.151589
Epoch [2/2], Iter [360/3125], train_loss:0.170065
Epoch [2/2], Iter [361/3125], train_loss:0.155288
Epoch [2/2], Iter [362/3125], train_loss:0.153488
Epoch [2/2], Iter [363/3125], train_loss:0.156576
Epoch [2/2], Iter [364/3125], train_loss:0.161076
Epoch [2/2], Iter [365/3125], train_loss:0.161203
Epoch [2/2], Iter [366/3125], train_loss:0.164802
Epoch [2/2], Iter [367/3125], train_loss:0.166324
Epoch [2/2], Iter [368/3125], train_loss:0.178081
Epoch [2/2], Iter [369/3125], train_loss:0.144357
Epoch [2/2], Iter [370/3125], train_loss:0.174453
Epoch [2/2], Iter [371/3125], train_loss:0.168766
Epoch [2/2], Iter [372/3125], train_loss:0.147773
Epoch [2/2], Iter [373/3125], train_loss:0.143407
Epoch [2/2], Iter [374/3125], train_loss:0.154440
Epoch [2/2], Iter [375/3125], train_loss:0.144308
Epoch [2/2], Iter [376/3125], train_loss:0.146517
Epoch [2/2], Iter [377/3125], train_loss:0.168994
Epoch [2/2], Iter [378/3125], train_loss:0.155020
Epoch [2/2], Iter [379/3125], train_loss:0.136322
Epoch [2/2], Iter [380/3125], train_loss:0.165164
Epoch [2/2], Iter [381/3125], train_loss:0.165966
Epoch [2/2], Iter [382/3125], train_loss:0.149831
Epoch [2/2], Iter [383/3125], train_loss:0.153939
Epoch [2/2], Iter [384/3125], train_loss:0.150713
Epoch [2/2], Iter [385/3125], train_loss:0.149525
Epoch [2/2], Iter [386/3125], train_loss:0.186537
Epoch [2/2], Iter [387/3125], train_loss:0.155550
Epoch [2/2], Iter [388/3125], train_loss:0.130376
Epoch [2/2], Iter [389/3125], train_loss:0.168143
Epoch [2/2], Iter [390/3125], train_loss:0.153200
Epoch [2/2], Iter [391/3125], train_loss:0.156268
Epoch [2/2], Iter [392/3125], train_loss:0.138514
Epoch [2/2], Iter [393/3125], train_loss:0.186347
Epoch [2/2], Iter [394/3125], train_loss:0.167708
Epoch [2/2], Iter [395/3125], train_loss:0.156236
Epoch [2/2], Iter [396/3125], train_loss:0.161214
Epoch [2/2], Iter [397/3125], train_loss:0.164827
Epoch [2/2], Iter [398/3125], train_loss:0.162833
Epoch [2/2], Iter [399/3125], train_loss:0.148330
Epoch [2/2], Iter [400/3125], train_loss:0.151244
Epoch [2/2], Iter [401/3125], train_loss:0.175030
Epoch [2/2], Iter [402/3125], train_loss:0.167966
Epoch [2/2], Iter [403/3125], train_loss:0.174009
Epoch [2/2], Iter [404/3125], train_loss:0.133342
Epoch [2/2], Iter [405/3125], train_loss:0.160444
Epoch [2/2], Iter [406/3125], train_loss:0.173592
Epoch [2/2], Iter [407/3125], train_loss:0.176786
Epoch [2/2], Iter [408/3125], train_loss:0.161026
Epoch [2/2], Iter [409/3125], train_loss:0.173543
Epoch [2/2], Iter [410/3125], train_loss:0.147663
Epoch [2/2], Iter [411/3125], train_loss:0.170319
Epoch [2/2], Iter [412/3125], train_loss:0.185939
Epoch [2/2], Iter [413/3125], train_loss:0.165246
Epoch [2/2], Iter [414/3125], train_loss:0.185627
Epoch [2/2], Iter [415/3125], train_loss:0.131443
Epoch [2/2], Iter [416/3125], train_loss:0.150096
Epoch [2/2], Iter [417/3125], train_loss:0.143120
Epoch [2/2], Iter [418/3125], train_loss:0.193534
Epoch [2/2], Iter [419/3125], train_loss:0.152602
Epoch [2/2], Iter [420/3125], train_loss:0.170830
Epoch [2/2], Iter [421/3125], train_loss:0.174716
Epoch [2/2], Iter [422/3125], train_loss:0.174463
Epoch [2/2], Iter [423/3125], train_loss:0.169858
Epoch [2/2], Iter [424/3125], train_loss:0.142388
Epoch [2/2], Iter [425/3125], train_loss:0.172615
Epoch [2/2], Iter [426/3125], train_loss:0.177286
Epoch [2/2], Iter [427/3125], train_loss:0.167227
Epoch [2/2], Iter [428/3125], train_loss:0.157363
Epoch [2/2], Iter [429/3125], train_loss:0.166516
Epoch [2/2], Iter [430/3125], train_loss:0.172313
Epoch [2/2], Iter [431/3125], train_loss:0.161539
Epoch [2/2], Iter [432/3125], train_loss:0.163159
Epoch [2/2], Iter [433/3125], train_loss:0.144575
Epoch [2/2], Iter [434/3125], train_loss:0.168186
Epoch [2/2], Iter [435/3125], train_loss:0.157193
Epoch [2/2], Iter [436/3125], train_loss:0.161615
Epoch [2/2], Iter [437/3125], train_loss:0.169740
Epoch [2/2], Iter [438/3125], train_loss:0.165369
Epoch [2/2], Iter [439/3125], train_loss:0.171216
Epoch [2/2], Iter [440/3125], train_loss:0.162590
Epoch [2/2], Iter [441/3125], train_loss:0.185242
Epoch [2/2], Iter [442/3125], train_loss:0.161350
Epoch [2/2], Iter [443/3125], train_loss:0.160137
Epoch [2/2], Iter [444/3125], train_loss:0.151255
Epoch [2/2], Iter [445/3125], train_loss:0.174243
Epoch [2/2], Iter [446/3125], train_loss:0.163636
Epoch [2/2], Iter [447/3125], train_loss:0.155706
Epoch [2/2], Iter [448/3125], train_loss:0.165992
Epoch [2/2], Iter [449/3125], train_loss:0.157281
Epoch [2/2], Iter [450/3125], train_loss:0.180386
Epoch [2/2], Iter [451/3125], train_loss:0.180637
Epoch [2/2], Iter [452/3125], train_loss:0.159181
Epoch [2/2], Iter [453/3125], train_loss:0.167303
Epoch [2/2], Iter [454/3125], train_loss:0.161755
Epoch [2/2], Iter [455/3125], train_loss:0.154677
Epoch [2/2], Iter [456/3125], train_loss:0.167636
Epoch [2/2], Iter [457/3125], train_loss:0.180807
Epoch [2/2], Iter [458/3125], train_loss:0.139945
Epoch [2/2], Iter [459/3125], train_loss:0.165975
Epoch [2/2], Iter [460/3125], train_loss:0.153326
Epoch [2/2], Iter [461/3125], train_loss:0.187807
Epoch [2/2], Iter [462/3125], train_loss:0.166080
Epoch [2/2], Iter [463/3125], train_loss:0.164084
Epoch [2/2], Iter [464/3125], train_loss:0.178732
Epoch [2/2], Iter [465/3125], train_loss:0.139112
Epoch [2/2], Iter [466/3125], train_loss:0.154262
Epoch [2/2], Iter [467/3125], train_loss:0.156984
Epoch [2/2], Iter [468/3125], train_loss:0.153696
Epoch [2/2], Iter [469/3125], train_loss:0.167890
Epoch [2/2], Iter [470/3125], train_loss:0.146530
Epoch [2/2], Iter [471/3125], train_loss:0.173568
Epoch [2/2], Iter [472/3125], train_loss:0.172920
Epoch [2/2], Iter [473/3125], train_loss:0.172191
Epoch [2/2], Iter [474/3125], train_loss:0.177066
Epoch [2/2], Iter [475/3125], train_loss:0.166096
Epoch [2/2], Iter [476/3125], train_loss:0.145177
Epoch [2/2], Iter [477/3125], train_loss:0.154965
Epoch [2/2], Iter [478/3125], train_loss:0.154901
Epoch [2/2], Iter [479/3125], train_loss:0.161373
Epoch [2/2], Iter [480/3125], train_loss:0.164218
Epoch [2/2], Iter [481/3125], train_loss:0.163394
Epoch [2/2], Iter [482/3125], train_loss:0.179960
Epoch [2/2], Iter [483/3125], train_loss:0.152761
Epoch [2/2], Iter [484/3125], train_loss:0.148085
Epoch [2/2], Iter [485/3125], train_loss:0.158129
Epoch [2/2], Iter [486/3125], train_loss:0.164468
Epoch [2/2], Iter [487/3125], train_loss:0.140875
Epoch [2/2], Iter [488/3125], train_loss:0.153898
Epoch [2/2], Iter [489/3125], train_loss:0.183179
Epoch [2/2], Iter [490/3125], train_loss:0.159658
Epoch [2/2], Iter [491/3125], train_loss:0.149543
Epoch [2/2], Iter [492/3125], train_loss:0.162857
Epoch [2/2], Iter [493/3125], train_loss:0.146819
Epoch [2/2], Iter [494/3125], train_loss:0.162568
Epoch [2/2], Iter [495/3125], train_loss:0.186698
Epoch [2/2], Iter [496/3125], train_loss:0.152870
Epoch [2/2], Iter [497/3125], train_loss:0.160796
Epoch [2/2], Iter [498/3125], train_loss:0.150789
Epoch [2/2], Iter [499/3125], train_loss:0.143901
Epoch [2/2], Iter [500/3125], train_loss:0.146307
Epoch [2/2], Iter [501/3125], train_loss:0.156505
Epoch [2/2], Iter [502/3125], train_loss:0.170537
Epoch [2/2], Iter [503/3125], train_loss:0.165219
Epoch [2/2], Iter [504/3125], train_loss:0.131376
Epoch [2/2], Iter [505/3125], train_loss:0.150592
Epoch [2/2], Iter [506/3125], train_loss:0.154510
Epoch [2/2], Iter [507/3125], train_loss:0.185317
Epoch [2/2], Iter [508/3125], train_loss:0.155880
Epoch [2/2], Iter [509/3125], train_loss:0.166343
Epoch [2/2], Iter [510/3125], train_loss:0.170775
Epoch [2/2], Iter [511/3125], train_loss:0.158124
Epoch [2/2], Iter [512/3125], train_loss:0.162436
Epoch [2/2], Iter [513/3125], train_loss:0.171975
Epoch [2/2], Iter [514/3125], train_loss:0.158008
Epoch [2/2], Iter [515/3125], train_loss:0.180108
Epoch [2/2], Iter [516/3125], train_loss:0.166079
Epoch [2/2], Iter [517/3125], train_loss:0.187777
Epoch [2/2], Iter [518/3125], train_loss:0.179959
Epoch [2/2], Iter [519/3125], train_loss:0.174720
Epoch [2/2], Iter [520/3125], train_loss:0.159333
Epoch [2/2], Iter [521/3125], train_loss:0.170574
Epoch [2/2], Iter [522/3125], train_loss:0.162373
Epoch [2/2], Iter [523/3125], train_loss:0.165549
Epoch [2/2], Iter [524/3125], train_loss:0.171584
Epoch [2/2], Iter [525/3125], train_loss:0.174756
Epoch [2/2], Iter [526/3125], train_loss:0.161434
Epoch [2/2], Iter [527/3125], train_loss:0.168083
Epoch [2/2], Iter [528/3125], train_loss:0.167138
Epoch [2/2], Iter [529/3125], train_loss:0.140973
Epoch [2/2], Iter [530/3125], train_loss:0.159618
Epoch [2/2], Iter [531/3125], train_loss:0.176200
Epoch [2/2], Iter [532/3125], train_loss:0.162572
Epoch [2/2], Iter [533/3125], train_loss:0.168972
Epoch [2/2], Iter [534/3125], train_loss:0.173325
Epoch [2/2], Iter [535/3125], train_loss:0.163866
Epoch [2/2], Iter [536/3125], train_loss:0.163720
Epoch [2/2], Iter [537/3125], train_loss:0.168137
Epoch [2/2], Iter [538/3125], train_loss:0.175345
Epoch [2/2], Iter [539/3125], train_loss:0.158390
Epoch [2/2], Iter [540/3125], train_loss:0.159162
Epoch [2/2], Iter [541/3125], train_loss:0.144704
Epoch [2/2], Iter [542/3125], train_loss:0.149428
Epoch [2/2], Iter [543/3125], train_loss:0.158572
Epoch [2/2], Iter [544/3125], train_loss:0.172126
Epoch [2/2], Iter [545/3125], train_loss:0.176276
Epoch [2/2], Iter [546/3125], train_loss:0.177032
Epoch [2/2], Iter [547/3125], train_loss:0.173978
Epoch [2/2], Iter [548/3125], train_loss:0.164149
Epoch [2/2], Iter [549/3125], train_loss:0.160977
Epoch [2/2], Iter [550/3125], train_loss:0.141250
Epoch [2/2], Iter [551/3125], train_loss:0.167351
Epoch [2/2], Iter [552/3125], train_loss:0.154863
Epoch [2/2], Iter [553/3125], train_loss:0.176878
Epoch [2/2], Iter [554/3125], train_loss:0.152597
Epoch [2/2], Iter [555/3125], train_loss:0.173390
Epoch [2/2], Iter [556/3125], train_loss:0.163720
Epoch [2/2], Iter [557/3125], train_loss:0.160260
Epoch [2/2], Iter [558/3125], train_loss:0.178257
Epoch [2/2], Iter [559/3125], train_loss:0.175589
Epoch [2/2], Iter [560/3125], train_loss:0.148475
Epoch [2/2], Iter [561/3125], train_loss:0.173594
Epoch [2/2], Iter [562/3125], train_loss:0.165406
Epoch [2/2], Iter [563/3125], train_loss:0.171584
Epoch [2/2], Iter [564/3125], train_loss:0.167694
Epoch [2/2], Iter [565/3125], train_loss:0.163094
Epoch [2/2], Iter [566/3125], train_loss:0.157451
Epoch [2/2], Iter [567/3125], train_loss:0.163195
Epoch [2/2], Iter [568/3125], train_loss:0.145743
Epoch [2/2], Iter [569/3125], train_loss:0.165041
Epoch [2/2], Iter [570/3125], train_loss:0.155912
Epoch [2/2], Iter [571/3125], train_loss:0.150290
Epoch [2/2], Iter [572/3125], train_loss:0.162542
Epoch [2/2], Iter [573/3125], train_loss:0.147671
Epoch [2/2], Iter [574/3125], train_loss:0.153121
Epoch [2/2], Iter [575/3125], train_loss:0.151718
Epoch [2/2], Iter [576/3125], train_loss:0.167825
Epoch [2/2], Iter [577/3125], train_loss:0.148835
Epoch [2/2], Iter [578/3125], train_loss:0.151512
Epoch [2/2], Iter [579/3125], train_loss:0.187779
Epoch [2/2], Iter [580/3125], train_loss:0.157333
Epoch [2/2], Iter [581/3125], train_loss:0.165742
Epoch [2/2], Iter [582/3125], train_loss:0.167597
Epoch [2/2], Iter [583/3125], train_loss:0.163270
Epoch [2/2], Iter [584/3125], train_loss:0.144670
Epoch [2/2], Iter [585/3125], train_loss:0.149435
Epoch [2/2], Iter [586/3125], train_loss:0.170580
Epoch [2/2], Iter [587/3125], train_loss:0.160914
Epoch [2/2], Iter [588/3125], train_loss:0.151355
Epoch [2/2], Iter [589/3125], train_loss:0.167059
Epoch [2/2], Iter [590/3125], train_loss:0.151443
Epoch [2/2], Iter [591/3125], train_loss:0.147637
Epoch [2/2], Iter [592/3125], train_loss:0.173933
Epoch [2/2], Iter [593/3125], train_loss:0.157407
Epoch [2/2], Iter [594/3125], train_loss:0.169269
Epoch [2/2], Iter [595/3125], train_loss:0.155772
Epoch [2/2], Iter [596/3125], train_loss:0.189058
Epoch [2/2], Iter [597/3125], train_loss:0.147937
Epoch [2/2], Iter [598/3125], train_loss:0.179247
Epoch [2/2], Iter [599/3125], train_loss:0.167485
Epoch [2/2], Iter [600/3125], train_loss:0.153575
Epoch [2/2], Iter [601/3125], train_loss:0.143053
Epoch [2/2], Iter [602/3125], train_loss:0.150471
Epoch [2/2], Iter [603/3125], train_loss:0.143764
Epoch [2/2], Iter [604/3125], train_loss:0.161357
Epoch [2/2], Iter [605/3125], train_loss:0.177912
Epoch [2/2], Iter [606/3125], train_loss:0.193015
Epoch [2/2], Iter [607/3125], train_loss:0.165355
Epoch [2/2], Iter [608/3125], train_loss:0.160645
Epoch [2/2], Iter [609/3125], train_loss:0.153148
Epoch [2/2], Iter [610/3125], train_loss:0.161745
Epoch [2/2], Iter [611/3125], train_loss:0.177804
Epoch [2/2], Iter [612/3125], train_loss:0.169567
Epoch [2/2], Iter [613/3125], train_loss:0.163330
Epoch [2/2], Iter [614/3125], train_loss:0.156796
Epoch [2/2], Iter [615/3125], train_loss:0.176123
Epoch [2/2], Iter [616/3125], train_loss:0.154425
Epoch [2/2], Iter [617/3125], train_loss:0.152680
Epoch [2/2], Iter [618/3125], train_loss:0.150936
Epoch [2/2], Iter [619/3125], train_loss:0.174734
Epoch [2/2], Iter [620/3125], train_loss:0.164248
Epoch [2/2], Iter [621/3125], train_loss:0.154376
Epoch [2/2], Iter [622/3125], train_loss:0.181289
Epoch [2/2], Iter [623/3125], train_loss:0.154710
Epoch [2/2], Iter [624/3125], train_loss:0.173619
Epoch [2/2], Iter [625/3125], train_loss:0.160207
Epoch [2/2], Iter [626/3125], train_loss:0.164651
Epoch [2/2], Iter [627/3125], train_loss:0.168672
Epoch [2/2], Iter [628/3125], train_loss:0.152033
Epoch [2/2], Iter [629/3125], train_loss:0.145318
Epoch [2/2], Iter [630/3125], train_loss:0.153201
Epoch [2/2], Iter [631/3125], train_loss:0.136641
Epoch [2/2], Iter [632/3125], train_loss:0.165298
Epoch [2/2], Iter [633/3125], train_loss:0.146980
Epoch [2/2], Iter [634/3125], train_loss:0.157089
Epoch [2/2], Iter [635/3125], train_loss:0.153481
Epoch [2/2], Iter [636/3125], train_loss:0.180023
Epoch [2/2], Iter [637/3125], train_loss:0.177965
Epoch [2/2], Iter [638/3125], train_loss:0.168382
Epoch [2/2], Iter [639/3125], train_loss:0.170590
Epoch [2/2], Iter [640/3125], train_loss:0.146684
Epoch [2/2], Iter [641/3125], train_loss:0.154656
Epoch [2/2], Iter [642/3125], train_loss:0.148962
Epoch [2/2], Iter [643/3125], train_loss:0.162826
Epoch [2/2], Iter [644/3125], train_loss:0.154299
Epoch [2/2], Iter [645/3125], train_loss:0.140432
Epoch [2/2], Iter [646/3125], train_loss:0.169591
Epoch [2/2], Iter [647/3125], train_loss:0.160964
Epoch [2/2], Iter [648/3125], train_loss:0.163820
Epoch [2/2], Iter [649/3125], train_loss:0.180686
Epoch [2/2], Iter [650/3125], train_loss:0.149200
Epoch [2/2], Iter [651/3125], train_loss:0.165878
Epoch [2/2], Iter [652/3125], train_loss:0.153168
Epoch [2/2], Iter [653/3125], train_loss:0.158429
Epoch [2/2], Iter [654/3125], train_loss:0.164462
Epoch [2/2], Iter [655/3125], train_loss:0.173659
Epoch [2/2], Iter [656/3125], train_loss:0.158212
Epoch [2/2], Iter [657/3125], train_loss:0.147685
Epoch [2/2], Iter [658/3125], train_loss:0.165053
Epoch [2/2], Iter [659/3125], train_loss:0.147815
Epoch [2/2], Iter [660/3125], train_loss:0.156994
Epoch [2/2], Iter [661/3125], train_loss:0.166037
Epoch [2/2], Iter [662/3125], train_loss:0.172137
Epoch [2/2], Iter [663/3125], train_loss:0.164935
Epoch [2/2], Iter [664/3125], train_loss:0.135215
Epoch [2/2], Iter [665/3125], train_loss:0.158562
Epoch [2/2], Iter [666/3125], train_loss:0.160104
Epoch [2/2], Iter [667/3125], train_loss:0.151053
Epoch [2/2], Iter [668/3125], train_loss:0.170116
Epoch [2/2], Iter [669/3125], train_loss:0.137139
Epoch [2/2], Iter [670/3125], train_loss:0.157071
Epoch [2/2], Iter [671/3125], train_loss:0.188446
Epoch [2/2], Iter [672/3125], train_loss:0.161760
Epoch [2/2], Iter [673/3125], train_loss:0.155279
Epoch [2/2], Iter [674/3125], train_loss:0.179824
Epoch [2/2], Iter [675/3125], train_loss:0.167790
Epoch [2/2], Iter [676/3125], train_loss:0.146095
Epoch [2/2], Iter [677/3125], train_loss:0.177003
Epoch [2/2], Iter [678/3125], train_loss:0.148537
Epoch [2/2], Iter [679/3125], train_loss:0.152893
Epoch [2/2], Iter [680/3125], train_loss:0.159080
Epoch [2/2], Iter [681/3125], train_loss:0.156266
Epoch [2/2], Iter [682/3125], train_loss:0.166901
Epoch [2/2], Iter [683/3125], train_loss:0.168217
Epoch [2/2], Iter [684/3125], train_loss:0.169070
Epoch [2/2], Iter [685/3125], train_loss:0.162491
Epoch [2/2], Iter [686/3125], train_loss:0.168951
Epoch [2/2], Iter [687/3125], train_loss:0.125869
Epoch [2/2], Iter [688/3125], train_loss:0.181195
Epoch [2/2], Iter [689/3125], train_loss:0.177369
Epoch [2/2], Iter [690/3125], train_loss:0.161117
Epoch [2/2], Iter [691/3125], train_loss:0.157555
Epoch [2/2], Iter [692/3125], train_loss:0.159016
Epoch [2/2], Iter [693/3125], train_loss:0.157256
Epoch [2/2], Iter [694/3125], train_loss:0.164547
Epoch [2/2], Iter [695/3125], train_loss:0.165163
Epoch [2/2], Iter [696/3125], train_loss:0.168598
Epoch [2/2], Iter [697/3125], train_loss:0.167152
Epoch [2/2], Iter [698/3125], train_loss:0.174982
Epoch [2/2], Iter [699/3125], train_loss:0.150731
Epoch [2/2], Iter [700/3125], train_loss:0.144726
Epoch [2/2], Iter [701/3125], train_loss:0.161515
Epoch [2/2], Iter [702/3125], train_loss:0.168019
Epoch [2/2], Iter [703/3125], train_loss:0.151221
Epoch [2/2], Iter [704/3125], train_loss:0.155330
Epoch [2/2], Iter [705/3125], train_loss:0.162497
Epoch [2/2], Iter [706/3125], train_loss:0.146891
Epoch [2/2], Iter [707/3125], train_loss:0.144152
Epoch [2/2], Iter [708/3125], train_loss:0.169863
Epoch [2/2], Iter [709/3125], train_loss:0.151497
Epoch [2/2], Iter [710/3125], train_loss:0.171949
Epoch [2/2], Iter [711/3125], train_loss:0.144536
Epoch [2/2], Iter [712/3125], train_loss:0.174258
Epoch [2/2], Iter [713/3125], train_loss:0.156956
Epoch [2/2], Iter [714/3125], train_loss:0.143885
Epoch [2/2], Iter [715/3125], train_loss:0.154764
Epoch [2/2], Iter [716/3125], train_loss:0.158947
Epoch [2/2], Iter [717/3125], train_loss:0.169612
Epoch [2/2], Iter [718/3125], train_loss:0.183921
Epoch [2/2], Iter [719/3125], train_loss:0.164853
Epoch [2/2], Iter [720/3125], train_loss:0.152667
Epoch [2/2], Iter [721/3125], train_loss:0.164879
Epoch [2/2], Iter [722/3125], train_loss:0.162339
Epoch [2/2], Iter [723/3125], train_loss:0.155902
Epoch [2/2], Iter [724/3125], train_loss:0.166309
Epoch [2/2], Iter [725/3125], train_loss:0.169535
Epoch [2/2], Iter [726/3125], train_loss:0.157821
Epoch [2/2], Iter [727/3125], train_loss:0.177206
Epoch [2/2], Iter [728/3125], train_loss:0.161878
Epoch [2/2], Iter [729/3125], train_loss:0.165634
Epoch [2/2], Iter [730/3125], train_loss:0.162080
Epoch [2/2], Iter [731/3125], train_loss:0.149615
Epoch [2/2], Iter [732/3125], train_loss:0.157824
Epoch [2/2], Iter [733/3125], train_loss:0.160058
Epoch [2/2], Iter [734/3125], train_loss:0.164464
Epoch [2/2], Iter [735/3125], train_loss:0.173593
Epoch [2/2], Iter [736/3125], train_loss:0.177152
Epoch [2/2], Iter [737/3125], train_loss:0.185746
Epoch [2/2], Iter [738/3125], train_loss:0.161387
Epoch [2/2], Iter [739/3125], train_loss:0.163264
Epoch [2/2], Iter [740/3125], train_loss:0.165813
Epoch [2/2], Iter [741/3125], train_loss:0.172456
Epoch [2/2], Iter [742/3125], train_loss:0.173366
Epoch [2/2], Iter [743/3125], train_loss:0.167722
Epoch [2/2], Iter [744/3125], train_loss:0.152204
Epoch [2/2], Iter [745/3125], train_loss:0.162796
Epoch [2/2], Iter [746/3125], train_loss:0.148085
Epoch [2/2], Iter [747/3125], train_loss:0.138988
Epoch [2/2], Iter [748/3125], train_loss:0.165154
Epoch [2/2], Iter [749/3125], train_loss:0.163704
Epoch [2/2], Iter [750/3125], train_loss:0.139482
Epoch [2/2], Iter [751/3125], train_loss:0.146638
Epoch [2/2], Iter [752/3125], train_loss:0.179230
Epoch [2/2], Iter [753/3125], train_loss:0.168096
Epoch [2/2], Iter [754/3125], train_loss:0.157946
Epoch [2/2], Iter [755/3125], train_loss:0.121326
Epoch [2/2], Iter [756/3125], train_loss:0.160800
Epoch [2/2], Iter [757/3125], train_loss:0.143741
Epoch [2/2], Iter [758/3125], train_loss:0.164546
Epoch [2/2], Iter [759/3125], train_loss:0.153188
Epoch [2/2], Iter [760/3125], train_loss:0.153755
Epoch [2/2], Iter [761/3125], train_loss:0.156617
Epoch [2/2], Iter [762/3125], train_loss:0.165343
Epoch [2/2], Iter [763/3125], train_loss:0.152439
Epoch [2/2], Iter [764/3125], train_loss:0.150895
Epoch [2/2], Iter [765/3125], train_loss:0.171088
Epoch [2/2], Iter [766/3125], train_loss:0.152008
Epoch [2/2], Iter [767/3125], train_loss:0.159565
Epoch [2/2], Iter [768/3125], train_loss:0.141178
Epoch [2/2], Iter [769/3125], train_loss:0.151271
Epoch [2/2], Iter [770/3125], train_loss:0.141239
Epoch [2/2], Iter [771/3125], train_loss:0.178049
Epoch [2/2], Iter [772/3125], train_loss:0.181188
Epoch [2/2], Iter [773/3125], train_loss:0.173826
Epoch [2/2], Iter [774/3125], train_loss:0.175326
Epoch [2/2], Iter [775/3125], train_loss:0.167236
Epoch [2/2], Iter [776/3125], train_loss:0.149285
Epoch [2/2], Iter [777/3125], train_loss:0.153321
Epoch [2/2], Iter [778/3125], 