Training a License Plate Recognition Model with PaddlePaddle
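LPRNet (License Plate Recognition via Deep Neural Networks) is a lightweight convolutional neural network designed for end-to-end license plate recognition. It was proposed in 2018 by Sergey Zherzdev and Alexey Gruzdev of the Intel IOTG Computer Vision Group. Its main contribution is to drop recurrent layers (such as BiLSTM) entirely and perform sequence recognition with a CNN and CTC loss alone, while remaining accurate and real-time: the paper reports up to 95% accuracy on Chinese license plates at 3 ms/plate on GPU and 1.3 ms/plate on CPU. Because it is lightweight and end-to-end, LPRNet has become an important technique in intelligent transportation and is widely paired with YOLO-series detectors (such as YOLOv5/v8) in detect-then-recognize pipelines that form a complete license plate recognition system.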
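I. Basic Concepts and History of LPRNet
LPRNet is a deep neural network built specifically for license plate recognition. Its core idea is to read the plate string directly from the plate image end to end, with no prior character segmentation and no RNN. Traditional systems usually locate the plate region with image processing techniques (edge detection, threshold segmentation), split it into characters, and then classify each character. This pipeline works, but a single segmentation error can make the whole recognition fail, and the hand-crafted segmentation rules adapt poorly to varied real-world scenes.
LPRNet marked an important step forward for license plate recognition. In 2018 the Intel team published the paper "LPRNet: License Plate Recognition via Deep Neural Networks", which proposed an end-to-end method that needs no preliminary character segmentation. A single model maps a cropped plate image directly to its character sequence, which avoids the segmentation errors of the traditional pipeline; locating the plate in the full frame is left to a separate detector. Since its release, LPRNet's small size (only 0.48M parameters) has made it a natural choice for embedded devices, and it has been widely combined with YOLO-series detectors (such as YOLOv5, YOLOv7, and YOLOv8) to form complete license plate recognition systems. After 2022, deployments on embedded hardware (such as the BM1684 chip) and cloud platforms (such as SOPHNET) became more common, which further pushed its real-time recognition into edge computing scenarios.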
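II. Network Architecture of LPRNet
LPRNet uses a lightweight CNN architecture whose structure is designed to perform sequence recognition without an RNN. The overall architecture consists of the following components:
1. Spatial Transformer Network (STN, optional module)
The STN module applies a geometric transformation to the input image to correct plate distortion and tilt. It contains a localization network whose convolutional and fully connected layers output the six parameters of an affine transformation, which warps the original image into a view better suited to recognition. The STN markedly improves robustness to distorted plates, but it is not a required component and can be included or omitted depending on the application.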
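To make the six affine parameters concrete, the following minimal sketch applies a 2×3 parameter matrix to a normalized output coordinate, which is how an STN sampling grid is usually built. The function name `affine_source_coords` and the example values are illustrative assumptions and are not taken from any LPRNet implementation.

```python
import math

def affine_source_coords(theta, x_out, y_out):
    """Map a normalized output coordinate to the input coordinate to sample from.

    theta holds the six affine parameters as [[a, b, tx], [c, d, ty]].
    """
    (a, b, tx), (c, d, ty) = theta
    return a * x_out + b * y_out + tx, c * x_out + d * y_out + ty

# Identity parameters leave every coordinate unchanged.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

# A small rotation, as might be predicted for a tilted plate (illustrative value).
angle = math.radians(10)
rotate = [[math.cos(angle), -math.sin(angle), 0.0],
          [math.sin(angle), math.cos(angle), 0.0]]

print(affine_source_coords(identity, 0.5, -0.25))
print(affine_source_coords(rotate, 1.0, 0.0))
```

In a real STN the localization network predicts `theta` per image, and the warped image is obtained by sampling the input at the mapped coordinates.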
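2. Lightweight Backbone Network
The backbone is LPRNet's core feature extractor. Its design is inspired by the Fire Module of SqueezeNet and the multi-branch structure of Inception. The backbone stacks several "Small Basic Blocks", each of which contains the following components:
- 1×1 convolution (channel compression)
- Batch normalization
- ReLU/PReLU activation
- 3×3 convolution (feature extraction)
- Batch normalization
- ReLU/PReLU activation
This design extracts features efficiently by reducing both the parameter count and the computational cost. Note that the PaddlePaddle listing later in this article implements the block slightly differently: it factorizes the 3×3 convolution into a 3×1 and a 1×3 convolution, adds a final 1×1 convolution to restore the channel count, and applies batch normalization after the block rather than inside it. The backbone outputs a sequence of character probabilities whose length depends on the pixel width of the input image; each position holds the predicted probability distribution for one character.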
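The parameter saving can be checked with simple arithmetic. The sketch below counts the weights and biases of the 1×1 → 3×1 → 1×3 → 1×1 block used in the listing later in this article and compares them with a single plain 3×3 convolution of the same input and output width. The helper names are hypothetical, and 64 → 128 channels is just the first block's configuration.

```python
def conv_params(c_in, c_out, kh, kw, bias=True):
    """Number of learnable parameters in a 2-D convolution."""
    return c_in * c_out * kh * kw + (c_out if bias else 0)

def small_basic_block_params(c_in, c_out):
    """Parameters of a 1x1 -> 3x1 -> 1x3 -> 1x1 block with a c_out // 4 bottleneck."""
    mid = c_out // 4
    return (conv_params(c_in, mid, 1, 1)
            + conv_params(mid, mid, 3, 1)
            + conv_params(mid, mid, 1, 3)
            + conv_params(mid, c_out, 1, 1))

block = small_basic_block_params(64, 128)
plain = conv_params(64, 128, 3, 3)
print(f"small basic block: {block}, plain 3x3 conv: {plain}, ratio: {plain / block:.1f}x")
```

For this configuration the bottleneck block needs roughly one sixth of the parameters of the plain convolution.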
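3. Wide Convolution Module (1×13 convolution)
LPRNet adds a wide convolution module (1×13 kernel) at the end of the backbone; this is the key design that replaces the traditional RNN. The wide convolution gathers context along the sequence (width) direction and captures the relationships between characters, so sequence recognition works without an RNN. With a 1×13 kernel the network sees each character together with its neighbors, which strengthens its predictions over the character sequence.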
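To see how a 94×24 input turns into a fixed number of sequence positions, the following sketch traces the spatial size through the layers of the backbone in the listing later in this article that change height or width (that listing uses kernel sizes (1, 4) and (13, 1) in its final stage). Only the standard output-size formula for unpadded convolution and pooling is used; the layer table is transcribed from that listing, and the helper name is hypothetical.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel_h, kernel_w, stride_h, stride_w) for layers that change H or W
layers = [
    ("conv 3x3", 3, 3, 1, 1),
    ("maxpool 3x3", 3, 3, 1, 1),
    ("maxpool 3x3, stride (1, 2)", 3, 3, 1, 2),
    ("maxpool 3x3, stride (1, 2)", 3, 3, 1, 2),
    ("conv 1x4", 1, 4, 1, 1),
    ("conv 13x1", 13, 1, 1, 1),
]

h, w = 24, 94
shapes = []
for name, kh, kw, sh, sw in layers:
    h, w = out_size(h, kh, sh), out_size(w, kw, sw)
    shapes.append((h, w))
    print(f"{name:28s} -> H={h}, W={w}")
```

The final width of 18 is the number of time steps handed to CTC, which is why that listing sets `LPRMAXLEN = 18`.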
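4. Global Context Fusion Module
To further strengthen the contextual links in the sequence prediction, LPRNet uses an additional fully connected layer outside the backbone to extract global context features. These global features are tiled to the required size and concatenated (concat) with the local features output by the backbone. The richer combined representation is then fed to the character classification head.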
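5. Character Classification Head and CTC Decoding
The classification head classifies the feature at each position and predicts the possible character (Chinese characters, letters, and digits). The network's output sequence is trained with the CTC (Connectionist Temporal Classification) loss, which handles the mismatch between input and output sequence lengths. At inference time, the final plate string is decoded from the output sequence with beam search or a greedy algorithm.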
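Greedy decoding can be sketched in a few lines: take the best class at every time step, merge consecutive repeats, and drop the blank symbol. The toy alphabet and scores below are illustrative assumptions; the listing later in this article applies the same rule with the last entry of its `CHARS` table as the blank.

```python
def greedy_ctc_decode(step_scores, blank):
    """Greedy CTC decoding: argmax per step, merge repeats, drop blanks.

    step_scores is a list of T score lists, one score per class.
    """
    best_path = [max(range(len(s)), key=s.__getitem__) for s in step_scores]
    decoded = []
    previous = None
    for idx in best_path:
        if idx != previous and idx != blank:
            decoded.append(idx)
        previous = idx
    return decoded

# Toy alphabet: 0='A', 1='B', 2=blank
scores = [
    [0.9, 0.05, 0.05],  # A
    [0.8, 0.1, 0.1],    # A again (repeat, merged)
    [0.1, 0.1, 0.8],    # blank
    [0.7, 0.2, 0.1],    # A (a new character, because a blank came in between)
    [0.1, 0.8, 0.1],    # B
]
print(greedy_ctc_decode(scores, blank=2))
```

The blank between the second and third "A" is what lets CTC output a doubled character, which matters for plates such as "88".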
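The table below compares the parameter count and computational cost of LPRNet with other mainstream license plate recognition models:
Model | Input size | Parameters | GFLOPs |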
---|---|---|---|
LPRNet | 94×24 | 0.48M | 0.147 |
CRNN | 160×32 | 8.35M | 1.06 |
PlateNet | 168×48 | 1.92M | 1.25 |
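As the table shows, LPRNet has far fewer parameters than CRNN and needs substantially less computation, which makes it particularly well suited to deployment on embedded devices.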
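III. Training Methods and Optimization Strategies
LPRNet is trained end to end: the raw plate image goes into the model and the corresponding character sequence comes out. Training data mainly comes from the Chinese City Parking Dataset (CCPD), which contains more than 250,000 images covering plates in a wide range of difficult conditions such as blur, tilt, rain, and snow.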
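1. Training Parameters
The training parameters differ between implementations, but the core settings are:
- Optimizer: the original paper uses Adam with an initial learning rate of 0.001; later variants (for example source [47]) switch to SGD with an initial learning rate of 0.0008, a batch size of 64, and 300 training epochs.
- Learning rate schedule: dynamic decay (for example cosine annealing) and warm-up (ramping up to 1e-3 over the first 3 epochs).
- Regularization: gradient noise injection (scale 0.008), Dropout, and similar techniques to prevent overfitting.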
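The warm-up plus cosine annealing schedule mentioned above can be written as a small function. This is a simplified, framework-free sketch: the 3 warm-up epochs and the 1e-3 peak come from the list above, while `total_epochs`, `min_lr`, and the function name are assumptions made for illustration. It is not identical to the `LinearWarmup`/`CosineAnnealingDecay` combination in the listing later in this article.

```python
import math

def learning_rate(epoch, base_lr=1e-3, warmup_epochs=3, total_epochs=50, min_lr=1e-5):
    """Linear warm-up followed by cosine annealing, evaluated once per epoch."""
    if epoch < warmup_epochs:
        # Ramp linearly so that base_lr is reached in the last warm-up epoch
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

for epoch in (0, 1, 2, 3, 26, 49):
    print(epoch, f"{learning_rate(epoch):.6f}")
```

Warm-up keeps the first updates small while the freshly initialized weights are still unstable; the cosine phase then lowers the rate smoothly towards `min_lr`.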
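2. Data Augmentation
Several data augmentation techniques are used during training to improve generalization and robustness:
- Image-level augmentation: rotation, Gaussian noise, motion blur, cropping, and so on.
- STN preprocessing: affine transformations that simulate plates seen from different viewpoints.
- Random darkening: improves recognition in night scenes; accuracy is reported to rise from 93.2% to 96.1%.
- GAN-based data generation: balances the character distribution and reduces overfitting caused by insufficient training data.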
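Random darkening, for instance, amounts to multiplying all pixels by one random factor below 1. The sketch below works on plain nested lists to stay dependency-free; with NumPy the same idea is a single multiplication. The factor range and the function name are assumptions, since the text above does not give the exact settings.

```python
import random

def random_darken(image, min_factor=0.3, max_factor=1.0, rng=random):
    """Scale every pixel by one random factor in [min_factor, max_factor].

    image is a nested list of rows of (r, g, b) tuples with values in 0..255.
    """
    factor = rng.uniform(min_factor, max_factor)
    return [[tuple(int(channel * factor) for channel in pixel) for pixel in row]
            for row in image]

# Tiny 2x4 stand-in for a plate image
plate = [[(200, 180, 160)] * 4 for _ in range(2)]
dark = random_darken(plate, rng=random.Random(0))
print(dark[0][0])
```

Passing a seeded `random.Random` instance makes the augmentation reproducible, which helps when debugging a data pipeline.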
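3. CTC Loss
LPRNet uses the CTC loss to handle the mismatch between input and output sequence lengths. The core idea of CTC is to add a "blank" class, so the model can insert blanks into its output to align the input with the target sequence. During training, every predicted sequence that is equivalent to the target (that is, it yields the target after repeated characters are merged and blanks are removed) counts as a correct prediction, so no per-character alignment labels are needed and the training pipeline is simpler.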
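The "all equivalent sequences count" idea can be made concrete by brute force: enumerate every possible path, keep those that collapse to the target, and add up their probabilities. Real CTC implementations use dynamic programming instead; this exponential-time sketch with made-up probabilities is only meant to illustrate the definition.

```python
from itertools import product

def collapse(path, blank):
    """Apply the CTC mapping: merge repeated symbols, then remove blanks."""
    out = []
    previous = None
    for symbol in path:
        if symbol != previous and symbol != blank:
            out.append(symbol)
        previous = symbol
    return tuple(out)

def ctc_probability(step_probs, target, blank):
    """Sum the probability of every alignment that collapses to target."""
    num_classes = len(step_probs[0])
    total = 0.0
    for path in product(range(num_classes), repeat=len(step_probs)):
        if collapse(path, blank) == tuple(target):
            p = 1.0
            for t, symbol in enumerate(path):
                p *= step_probs[t][symbol]
            total += p
    return total

# Three time steps, two classes: 0='A', 1=blank (toy values)
probs = [[0.6, 0.4], [0.5, 0.5], [0.7, 0.3]]
print(ctc_probability(probs, target=[0], blank=1))
```

The CTC loss for one sample is the negative logarithm of this sum. Here six of the eight paths collapse to "A"; only "A, blank, A" (which decodes to "AA") and the all-blank path do not.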
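CTC decoding usually uses beam search to find the most probable character sequence in the output. In LPRNet, decoding is additionally combined with a post-filtering step that matches candidates against a predefined set of templates derived from national license plate rules, which further improves accuracy.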
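A minimal sketch of such post-filtering follows. The template is an assumption made for illustration (one Chinese province character, one letter, then five or six letters or digits) and is looser than the real plate rules; the candidate list stands in for beam search output with made-up probabilities.

```python
import re

# Assumed template: province character, one letter, then five or six
# letters/digits. Real plate rules are stricter than this.
PLATE_TEMPLATE = re.compile(r"^[\u4e00-\u9fff][A-Z][A-Z0-9]{5,6}$")

def pick_by_template(candidates):
    """Return the most probable candidate that matches the template.

    candidates is a list of (text, probability) pairs, e.g. from beam search.
    Falls back to the overall best candidate when nothing matches.
    """
    ranked = sorted(candidates, key=lambda item: item[1], reverse=True)
    for text, _ in ranked:
        if PLATE_TEMPLATE.match(text):
            return text
    return ranked[0][0] if ranked else None

beams = [("京A8288", 0.41), ("京A82889", 0.38), ("京482889", 0.21)]
print(pick_by_template(beams))
```

In this example the top-ranked beam is one character short, so the filter selects the second candidate, which is the only one that fits the template.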
IV. Code Implementation with PaddlePaddle
python
# -*- coding: utf-8 -*-
# @Time : 2025/7/14 22:12
# @Author : pblh123@126.com
# @File : paddlepaddle2_6_2_lprnet_licence_plate_recognize.py
# @Describe : LPRNet license plate recognition
import os
import time
from statistics import mean
import paddle
from matplotlib import pyplot as plt
from paddle import nn
from paddle.io import Dataset, DataLoader
from PIL import Image
import numpy as np
import paddle.vision.transforms as T
import cv2

# Character table (the last entry '-' is the CTC blank) and configuration
CHARS = ['京', '沪', '津', '渝', '冀', '晋', '蒙', '辽', '吉', '黑',
         '苏', '浙', '皖', '闽', '赣', '鲁', '豫', '鄂', '湘', '粤',
         '桂', '琼', '川', '贵', '云', '藏', '陕', '甘', '青', '宁',
         '新',
         '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
         'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K',
         'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
         'W', 'X', 'Y', 'Z', 'I', 'O', '-'
         ]
CHARS_DICT = {char:i for i, char in enumerate(CHARS)}
EPOCH = 50
IMGSIZE = (94, 24)
IMGDIR = './datas/rec_filtered_images/data'
TRAINFILE = './datas/rec_filtered_images/train.txt'
VALIDFILE = './datas/rec_filtered_images/valid.txt'
SAVEFOLDER = './runs'
DROPOUT = 0.01
LEARNINGRATE = 0.001
LPRMAXLEN = 18
TRAINBATCHSIZE = 64
EVALBATCHSIZE = 64
NUMWORKERS = 4  # Larger is faster, depending on the CPU. If the DataLoader raises errors, lower this value or set it to 0
WEIGHTDECAY = 0.001


class LprnetDataloader(Dataset):
    """Dataset of plate images; the label is parsed from the file name."""

    def __init__(self, target_path, label_text, transforms=None):
        super().__init__()
        self.transforms = transforms
        self.target_path = target_path
        with open(label_text, 'r', encoding='utf-8', errors='ignore') as f:
            self.data = f.readlines()

    def __getitem__(self, index):
        img_name = self.data[index].strip()
        img_path = os.path.join(self.target_path, img_name)
        # Images are assumed to be pre-resized to IMGSIZE; force 3 channels
        data = Image.open(img_path).convert('RGB')
        # The file name without its extension is the plate string
        label = []
        img_label = img_name.split('.', 1)[0]
        for c in img_label:
            label.append(CHARS_DICT[c])
        if len(label) == 8:
            # 8-character labels are sanity-checked by check() below
            if not self.check(label):
                print(label)
                # print(imgname)
                # assert 0, "Error label ^~^!!!"
        if self.transforms is not None:
            data = self.transforms(data)
        data = np.array(data, dtype=np.float32)
        np_label = np.array(label, dtype=np.int64)
        return data, np_label, len(np_label)

    def __len__(self):
        return len(self.data)

    def check(self, label):
        # An 8-character label must have 'D' or 'F' as its third or last character
        if label[2] == CHARS_DICT['D'] or label[2] == CHARS_DICT['F'] \
                or label[-1] == CHARS_DICT['D'] or label[-1] == CHARS_DICT['F']:
            return True
        else:
            print("Error label, Please check!")
            return False


def collate_fn(batch):
    # The images already share one size, so only the labels need padding
    batch_size = len(batch)
    # Length of the longest label in the batch
    max_label_length = max(len(sample[1]) for sample in batch)
    # Zero-filled array sized for the longest label
    labels = np.zeros((batch_size, max_label_length), dtype='int64')
    label_lens = []
    img_list = []
    for x in range(batch_size):
        sample = batch[x]
        tensor = sample[0]
        target = sample[1]
        label_length = sample[2]
        img_list.append(tensor)
        # Copy the label into the zero array; the rest of the row stays as padding
        labels[x, :label_length] = target[:]
        label_lens.append(len(target))
    label_lens = paddle.to_tensor(label_lens, dtype='int64')  # required by CTCLoss
    imgs = paddle.to_tensor(np.stack(img_list), dtype='float32')
    labels = paddle.to_tensor(labels, dtype="int32")  # CTCLoss only accepts int32 labels
    return imgs, labels, label_lens


# LPRNet network
# Building blocks
class small_basic_block(nn.Layer):
    def __init__(self, ch_in, ch_out):
        super(small_basic_block, self).__init__()
        self.block = nn.Sequential(
            nn.Conv2D(ch_in, ch_out // 4, kernel_size=1),
            nn.ReLU(),
            nn.Conv2D(ch_out // 4, ch_out // 4, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(),
            nn.Conv2D(ch_out // 4, ch_out // 4, kernel_size=(1, 3), padding=(0, 1)),
            nn.ReLU(),
            nn.Conv2D(ch_out // 4, ch_out, kernel_size=1),
        )

    def forward(self, x):
        return self.block(x)


class maxpool_3d(nn.Layer):
    """Emulates a 3-D max pool (channel, height, width) with two 2-D pools."""

    def __init__(self, kernel_size, stride):
        super(maxpool_3d, self).__init__()
        assert (len(kernel_size) == 3 and len(stride) == 3)
        # Spatial pooling uses the last two entries
        kernel_size2d1 = kernel_size[-2:]
        stride2d1 = stride[-2:]
        # Channel pooling uses the first entry, applied after a transpose
        kernel_size2d2 = (1, kernel_size[0])
        stride2d2 = (1, stride[0])
        self.maxpool1 = nn.MaxPool2D(kernel_size=kernel_size2d1, stride=stride2d1)
        self.maxpool2 = nn.MaxPool2D(kernel_size=kernel_size2d2, stride=stride2d2)

    def forward(self, x):
        x = self.maxpool1(x)
        x = x.transpose((0, 3, 2, 1))  # move channels to the last axis
        x = self.maxpool2(x)
        x = x.transpose((0, 3, 2, 1))  # restore the [bs, C, H, W] layout
        return x


class LPRNet(nn.Layer):
    def __init__(self, lpr_max_len, class_num, dropout_rate):
        super(LPRNet, self).__init__()
        self.lpr_max_len = lpr_max_len
        self.class_num = class_num
        self.backbone = nn.Sequential(
            nn.Conv2D(in_channels=3, out_channels=64, kernel_size=3, stride=1),  # 0 [bs,3,24,94] -> [bs,64,22,92]
            nn.BatchNorm2D(num_features=64),  # 1 -> [bs,64,22,92]
            nn.ReLU(),  # 2 -> [bs,64,22,92]
            maxpool_3d(kernel_size=(1, 3, 3), stride=(1, 1, 1)),  # 3 -> [bs,64,20,90]
            small_basic_block(ch_in=64, ch_out=128),  # 4 -> [bs,128,20,90]
            nn.BatchNorm2D(num_features=128),  # 5 -> [bs,128,20,90]
            nn.ReLU(),  # 6 -> [bs,128,20,90]
            maxpool_3d(kernel_size=(1, 3, 3), stride=(2, 1, 2)),  # 7 -> [bs,64,18,44]
            small_basic_block(ch_in=64, ch_out=256),  # 8 -> [bs,256,18,44]
            nn.BatchNorm2D(num_features=256),  # 9 -> [bs,256,18,44]
            nn.ReLU(),  # 10 -> [bs,256,18,44]
            small_basic_block(ch_in=256, ch_out=256),  # 11 -> [bs,256,18,44]
            nn.BatchNorm2D(num_features=256),  # 12 -> [bs,256,18,44]
            nn.ReLU(),  # 13 -> [bs,256,18,44]
            maxpool_3d(kernel_size=(1, 3, 3), stride=(4, 1, 2)),  # 14 -> [bs,64,16,21]
            nn.Dropout(dropout_rate),  # 15 -> [bs,64,16,21]
            nn.Conv2D(in_channels=64, out_channels=256, kernel_size=(1, 4), stride=1),  # 16 -> [bs,256,16,18]
            nn.BatchNorm2D(num_features=256),  # 17 -> [bs,256,16,18]
            nn.ReLU(),  # 18 -> [bs,256,16,18]
            nn.Dropout(dropout_rate),  # 19 -> [bs,256,16,18]
            nn.Conv2D(in_channels=256, out_channels=class_num, kernel_size=(13, 1), stride=1),  # class_num=68 20 -> [bs,68,4,18]
            nn.BatchNorm2D(num_features=class_num),  # 21 -> [bs,68,4,18]
            nn.ReLU(),  # 22 -> [bs,68,4,18]
        )
        self.container = nn.Sequential(
            nn.Conv2D(in_channels=448 + self.class_num, out_channels=self.class_num, kernel_size=(1, 1), stride=(1, 1)),
        )

    def forward(self, x):
        # Collect intermediate feature maps for the global context
        keep_features = list()
        for i, layer in enumerate(self.backbone.children()):
            x = layer(x)
            if i in [2, 6, 13, 22]:
                keep_features.append(x)
        global_context = list()
        # keep_features: [bs,64,22,92] [bs,128,20,90] [bs,256,18,44] [bs,68,4,18]
        for i, f in enumerate(keep_features):
            if i in [0, 1]:
                # [bs,64,22,92] -> [bs,64,4,18]
                # [bs,128,20,90] -> [bs,128,4,18]
                f = nn.AvgPool2D(kernel_size=5, stride=5)(f)
            if i in [2]:
                # [bs,256,18,44] -> [bs,256,4,18]
                f = nn.AvgPool2D(kernel_size=(4, 10), stride=(4, 2))(f)
            f_pow = paddle.pow(f, 2)  # [bs,64,4,18] square every element
            # f_mean = paddle.mean(f_pow)  # a single mean over all elements
            f_mean = paddle.mean(f_pow, axis=[1, 2, 3], keepdim=True)  # one mean per sample
            f = paddle.divide(f, f_mean)  # [bs,64,4,18] divide every element by that mean
            global_context.append(f)
        x = paddle.concat(global_context, 1)  # [bs,516,4,18]
        x = self.container(x)  # -> [bs, 68, 4, 18] classification head
        logits = paddle.mean(x, axis=2)  # -> [bs, 68, 18]: 68 character classes, 18 sequence positions
        return logits


def init_weight(model):
    """Weight initialization.

    Meant to be used through model.apply; replaces the weight and bias of
    every Conv2D and BatchNorm2D sublayer with freshly initialized parameters.
    """
    for name, layer in model.named_sublayers():
        if isinstance(layer, nn.Conv2D):
            weight_attr = nn.initializer.KaimingNormal()
            bias_attr = nn.initializer.Constant(0.)
            new_bias = paddle.create_parameter(layer.bias.shape, attr=bias_attr, dtype='float32')
            new_weight = paddle.create_parameter(layer.weight.shape, attr=weight_attr, dtype='float32')
            layer.weight = new_weight
            layer.bias = new_bias
        elif isinstance(layer, nn.BatchNorm2D):
            weight_attr = nn.initializer.XavierUniform()
            bias_attr = nn.initializer.Constant(0.)
            new_bias = paddle.create_parameter(layer.bias.shape, attr=bias_attr, dtype='float32')
            new_weight = paddle.create_parameter(layer.weight.shape, attr=weight_attr, dtype='float32')
            layer.weight = new_weight
            layer.bias = new_bias


class ACC:
    """Running whole-plate accuracy based on greedy CTC decoding."""

    def __init__(self):
        self.Tp = 0    # plates predicted exactly right
        self.Tn_1 = 0  # wrong because the predicted length differs
        self.Tn_2 = 0  # right length, but at least one wrong character
        self.acc = 0

    def batch_update(self, batch_label, label_lengths, pred):
        for i, label in enumerate(batch_label):
            length = label_lengths[i]
            label = label[:length]
            pred_i = pred[i, :, :]
            # Best class at every time step
            preb_label = []
            for j in range(pred_i.shape[1]):  # T
                preb_label.append(np.argmax(pred_i[:, j], axis=0))
            # Merge repeats and drop blanks (the blank is the last class)
            no_repeat_blank_label = []
            pre_c = preb_label[0]
            if pre_c != len(CHARS) - 1:  # not blank
                no_repeat_blank_label.append(pre_c)
            for c in preb_label:
                if (pre_c == c) or (c == len(CHARS) - 1):
                    if c == len(CHARS) - 1:
                        pre_c = c
                    continue
                no_repeat_blank_label.append(c)
                pre_c = c
            if len(label) != len(no_repeat_blank_label):
                self.Tn_1 += 1
            elif (np.asarray(label) == np.asarray(no_repeat_blank_label)).all():
                self.Tp += 1
            else:
                self.Tn_2 += 1
        self.acc = self.Tp * 1.0 / (self.Tp + self.Tn_1 + self.Tn_2)

    def clear(self):
        self.Tp = 0
        self.Tn_1 = 0
        self.Tn_2 = 0
        self.acc = 0


def load_pretrained(model, path=None):
    """
    Load pretrained weights.
    :param model: the network to load the weights into
    :param path: checkpoint path without the ".pdparams" suffix
    :return:
    """
    print('params loading...')
    if not (os.path.isdir(path) or os.path.exists(path + '.pdparams')):
        raise ValueError("Model pretrain path {} does not "
                         "exists.".format(path))
    param_state_dict = paddle.load(path + ".pdparams")
    model.set_dict(param_state_dict)
    print(f'load {path + ".pdparams"} success...')
    return


def train_model():
    """
    Train the model.
    """
    # Image preprocessing
    train_transforms = T.Compose([
        T.ColorJitter(0.2, 0.2, 0.2),
        T.ToTensor(data_format='CHW'),
        T.Normalize(
            [0.5, 0.5, 0.5],  # ToTensor has already scaled the pixels to [0, 1]
            [0.5, 0.5, 0.5],
            data_format='CHW'
        ),
    ])
    eval_transforms = T.Compose([
        T.ToTensor(data_format='CHW'),
        T.Normalize(
            [0.5, 0.5, 0.5],
            [0.5, 0.5, 0.5],
            data_format='CHW'
        ),
    ])
    # Data loading
    train_data_set = LprnetDataloader(IMGDIR, TRAINFILE, train_transforms)
    eval_data_set = LprnetDataloader(IMGDIR, VALIDFILE, eval_transforms)
    train_loader = DataLoader(
        train_data_set,
        batch_size=TRAINBATCHSIZE,
        shuffle=True,
        num_workers=NUMWORKERS,
        drop_last=True,
        collate_fn=collate_fn
    )
    eval_loader = DataLoader(
        eval_data_set,
        batch_size=EVALBATCHSIZE,
        shuffle=False,
        num_workers=NUMWORKERS,
        drop_last=False,
        collate_fn=collate_fn
    )
    # Loss; the blank label is the last entry of CHARS
    loss_func = nn.CTCLoss(len(CHARS) - 1)
    # LPRNet, initialized from scratch (load pretrained weights here instead if available)
    model = LPRNet(LPRMAXLEN, len(CHARS), DROPOUT)
    model.apply(init_weight)  # initialize on the first training run

    # Optimizer
    def make_optimizer(base_lr, parameters=None):
        momentum = 0.9
        weight_decay = WEIGHTDECAY
        scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
            learning_rate=base_lr, eta_min=0.01 * base_lr, T_max=EPOCH, verbose=True)
        scheduler = paddle.optimizer.lr.LinearWarmup(  # weights are unstable at the start of training, so warm up first
            learning_rate=scheduler,
            warmup_steps=5,
            start_lr=base_lr / 5,
            end_lr=base_lr,
            verbose=True)
        optimizer = paddle.optimizer.Momentum(
            learning_rate=scheduler,
            weight_decay=paddle.regularizer.L2Decay(weight_decay),
            momentum=momentum,
            parameters=parameters)
        return optimizer, scheduler

    optim, scheduler = make_optimizer(LEARNINGRATE, parameters=model.parameters())
    # Accuracy trackers
    acc_train = ACC()
    acc_eval = ACC()
    # Early stopping
    patience = 5  # stop after 5 consecutive epochs without a better validation accuracy
    best_acc = 0.1  # initial best accuracy; nothing is saved below this value
    no_improve_epochs = 0  # epochs since the last improvement
    # Training loop
    for epoch in range(EPOCH):
        start_time = time.localtime(time.time())
        str_time = time.strftime("%Y-%m-%d %H:%M:%S", start_time)
        print(f'{str_time} || Epoch {epoch} start:')
        model.train()
        for batch_id, batch_data in enumerate(train_loader):
            img_data, label_data, label_lens = batch_data
            predict = model(img_data)
            logits = paddle.transpose(predict, (2, 0, 1))  # for ctc loss: T x N x C
            # Every sample has LPRMAXLEN time steps; size the tensor by the actual batch
            input_length = paddle.full([img_data.shape[0]], LPRMAXLEN, dtype='int64')
            loss = loss_func(logits, label_data, input_length, label_lens)
            acc_train.batch_update(label_data, label_lens, predict)
            if batch_id % 20 == 0:
                cur_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
                print(f'cur_time:{cur_time}, epoch:{epoch}, batch_id:{batch_id}, loss:{loss.item():.4f}, '
                      f'acc:{acc_train.acc:.4f} Tp/Tn_1/Tn_2: {acc_train.Tp}/{acc_train.Tn_1}/{acc_train.Tn_2}')
            loss.backward()
            optim.step()
            optim.clear_grad()
        acc_train.clear()
        # Periodic checkpoint
        if epoch and epoch % 20 == 0:
            paddle.save(model.state_dict(), os.path.join(SAVEFOLDER, f'lprnet_{epoch}_2.pdparams'))
            paddle.save(optim.state_dict(), os.path.join(SAVEFOLDER, f'lprnet_{epoch}_2.pdopt'))
            cur_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
            print(f'cur_time:{cur_time}, Saved checkpoint of epoch {epoch}')
        # Evaluation
        with paddle.no_grad():
            model.eval()
            loss_list = []
            for batch_id, batch_data in enumerate(eval_loader):
                img_data, label_data, label_lens = batch_data
                predict = model(img_data)
                logits = paddle.transpose(predict, (2, 0, 1))
                # The last evaluation batch may be smaller than EVALBATCHSIZE
                input_length = paddle.full([img_data.shape[0]], LPRMAXLEN, dtype='int64')
                loss = loss_func(logits, label_data, input_length, label_lens)
                acc_eval.batch_update(label_data, label_lens, predict)
                loss_list.append(loss.item())
            cur_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
            print(f'cur_time:{cur_time}, Eval of epoch {epoch} => acc:{acc_eval.acc:.4f}, loss:{mean(loss_list):.4f}')
            # Save the best model and update the early-stopping counter.
            # The comparison must happen before best_acc is updated and before
            # acc_eval is cleared, otherwise the counter would grow every epoch.
            if acc_eval.acc > best_acc:
                # The suffix used to be "2"; "mymodel" keeps these checkpoints distinguishable
                paddle.save(model.state_dict(), os.path.join(SAVEFOLDER, 'lprnet_best_chosen_mymodel.pdparams'))
                paddle.save(optim.state_dict(), os.path.join(SAVEFOLDER, 'lprnet_best_chosen_mymodel.pdopt'))
                best_acc = acc_eval.acc
                no_improve_epochs = 0  # reset the counter
                cur_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
                print(f'cur_time:{cur_time}, Saved best model of epoch{epoch}, acc {acc_eval.acc:.4f}, save path "{SAVEFOLDER}"')
            else:
                no_improve_epochs += 1
            acc_eval.clear()
        # Early-stopping check
        if no_improve_epochs >= patience:
            cur_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
            print(f'\ncur_time:{cur_time}, Early stopping triggered at epoch {epoch}!')
            print(f'cur_time:{cur_time}, No improvement in {patience} consecutive epochs.')
            break
        # Learning rate decay
        scheduler.step()


def eval():
    """
    Evaluate the model.
    """
    # Image preprocessing
    eval_transforms = T.Compose([
        T.ToTensor(data_format='CHW'),
        T.Normalize(
            [0.5, 0.5, 0.5],  # ToTensor has already scaled the pixels to [0, 1]
            [0.5, 0.5, 0.5],
            data_format='CHW'
        ),
    ])
    # Data loading
    eval_data_set = LprnetDataloader(IMGDIR, VALIDFILE, eval_transforms)
    eval_loader = DataLoader(
        eval_data_set,
        batch_size=EVALBATCHSIZE,
        shuffle=False,
        num_workers=NUMWORKERS,
        drop_last=False,
        collate_fn=collate_fn
    )
    # Loss; the blank label is the last entry of CHARS
    loss_func = nn.CTCLoss(len(CHARS) - 1)
    # LPRNet with trained weights
    model = LPRNet(LPRMAXLEN, len(CHARS), DROPOUT)
    # Replace with your own trained weights if needed
    load_pretrained(model, 'runs/lprnet_best_chosen_mymodel')
    # Accuracy tracker
    acc_eval = ACC()
    # Evaluation
    with paddle.no_grad():
        model.eval()
        loss_list = []
        for batch_id, batch_data in enumerate(eval_loader):
            img_data, label_data, label_lens = batch_data
            predict = model(img_data)
            logits = paddle.transpose(predict, (2, 0, 1))
            # Size input_length by the actual batch; the last batch may be smaller
            input_length = paddle.full([img_data.shape[0]], LPRMAXLEN, dtype='int64')
            loss = loss_func(logits, label_data, input_length, label_lens)
            acc_eval.batch_update(label_data, label_lens, predict)
            loss_list.append(loss.item())
        cur_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
        print(f'cur_time:{cur_time}, Eval from {VALIDFILE} => acc:{acc_eval.acc:.4f}, loss:{mean(loss_list):.4f}')
        acc_eval.clear()


def test():
    """
    Run the model on a single image.
    """
    # Replace with your own image path
    img_path = 'datas/rec_filtered_images/data/京A82889.jpg'
    print(img_path)
    img_data = cv2.imread(img_path)
    # Verify that the image was read
    if img_data is None:
        print(f"Error: cannot read image {img_path}; check the path and the file format")
        return
    img_data = cv2.cvtColor(img_data, cv2.COLOR_BGR2RGB)  # BGR to RGB
    try:
        img_data = cv2.resize(img_data, IMGSIZE)  # resize to (width, height) = (94, 24)
    except Exception as e:
        print(f"Image processing failed: {str(e)}")
        return
    # Same normalization as eval_transforms: scale to [0, 1], then (x - 0.5) / 0.5
    img_data = img_data / 255.0
    img_data = (img_data - 0.5) / 0.5
    img_data = np.transpose(img_data, (2, 0, 1))  # HWC to CHW
    img_data = np.expand_dims(img_data, 0)  # add the batch dimension
    img_tensor = paddle.to_tensor(img_data, dtype='float32')  # convert to a PaddlePaddle tensor
    # Load the model and predict
    model = LPRNet(LPRMAXLEN, len(CHARS), dropout_rate=0)
    # Replace with your own trained weights
    load_pretrained(model, 'runs/lprnet_best_chosen_mymodel')
    model.eval()  # use the BatchNorm running statistics instead of batch statistics
    with paddle.no_grad():
        out_data = model(img_tensor).numpy()

    # Post-processing for a single image: greedy CTC decoding
    def reprocess(pred):
        pred_data = pred[0]
        pred_label = np.argmax(pred_data, axis=0)
        print(pred_label)
        no_repeat_blank_label = []
        pre_c = pred_label[0]
        if pre_c != len(CHARS) - 1:  # not blank
            no_repeat_blank_label.append(pre_c)
        for c in pred_label:
            if (pre_c == c) or (c == len(CHARS) - 1):
                if c == len(CHARS) - 1:
                    pre_c = c
                continue
            no_repeat_blank_label.append(c)
            pre_c = c
        char_list = [CHARS[i] for i in no_repeat_blank_label]
        return ''.join(char_list)

    rep_result = reprocess(out_data)
    print(rep_result)


def export_model():
    """
    Export the model to ONNX.
    :return:
    """
    model = LPRNet(LPRMAXLEN, len(CHARS), dropout_rate=0)
    # Replace with your own trained weights
    load_pretrained(model, 'runs/lprnet_best_chosen_mymodel')
    model.eval()
    # Replace with your own output prefix; the ".onnx" suffix is appended on export
    save_path = 'save_onnx_chosen_mymodel/lprnet'
    # Create the parent directory if it does not exist
    os.makedirs(os.path.dirname(save_path), exist_ok=True)
    x_spec = paddle.static.InputSpec([1, 3, 24, 94], 'float32', 'image')
    paddle.onnx.export(model, save_path, input_spec=[x_spec], opset_version=11)


def main():
    # Train the model
    train_model()
    # Evaluate the model
    eval()
    # Test the model on a single image
    test()
    # Export the model
    export_model()


if __name__ == '__main__':
    main()
