GIS之深度学习10:运行Faster RCNN算法

(未完成,待补充)


获取Faster RCNN源码

(开源的很多,论文里也有,在这里不多赘述)

替换自己的数据集(图片+标签文件)

(需要使用labeling生成标签文件)

打开终端,进入gpupytorch环境

运行voc_annotation.py文件生成与训练文件

复制代码
E:\DeepLearningModel\Model01>activate gpupytorch

(gpupytorch) E:\DeepLearningModel\Model01>python voc_annotation.py
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
  warnings.warn("loaded more than 1 DLL from .libs:\n%s" %
Generate txt in ImageSets.
train and val size 777
train size 699
Generate txt in ImageSets done.
Generate 2007_train.txt and 2007_val.txt for train.

结果所示:

复制代码
(gpupytorch) E:\DeepLearningModel\Model01>python voc_annotation.py
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
  warnings.warn("loaded more than 1 DLL from .libs:\n%s" %
Generate txt in ImageSets.
train and val size 777
train size 699
Generate txt in ImageSets done.
Generate 2007_train.txt and 2007_val.txt for train.
Generate 2007_train.txt and 2007_val.txt for train done.
|  leopard | 174 |
|     boar | 491 |
| roe_deer | 352 |

(gpupytorch) E:\DeepLearningModel\Model01>

运行:train.py文件

python 复制代码
import colorsys
import os
import time

import numpy as np
import torch
import torch.nn as nn
from PIL import Image, ImageDraw, ImageFont

from nets.frcnn import FasterRCNN
from utils.utils import (cvtColor, get_classes, get_new_img_size, resize_image,
                         preprocess_input, show_config)
from utils.utils_bbox import DecodeBox



class FRCNN(object):
    _defaults = {

        "model_path"    : 'logs/loss_2024_03_05_22_26_24.pth',
        "classes_path"  : 'model_data/voc_classes.txt',
        "backbone"      : "resnet50",
        "confidence"    : 0.5,
        "nms_iou"       : 0.3,
        'anchors_size'  : [8, 16, 32],
        "cuda"          : True,
    }

    @classmethod
    def get_defaults(cls, n):
        if n in cls._defaults:
            return cls._defaults[n]
        else:
            return "Unrecognized attribute name '" + n + "'"
    def __init__(self, **kwargs):
        self.__dict__.update(self._defaults)
        for name, value in kwargs.items():
            setattr(self, name, value)
            self._defaults[name] = value 
        self.class_names, self.num_classes  = get_classes(self.classes_path)

        self.std    = torch.Tensor([0.1, 0.1, 0.2, 0.2]).repeat(self.num_classes + 1)[None]
        if self.cuda:
            self.std    = self.std.cuda()
        self.bbox_util  = DecodeBox(self.std, self.num_classes)
        #---------------------------------------------------#
        hsv_tuples = [(x / self.num_classes, 1., 1.) for x in range(self.num_classes)]
        self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))
        self.generate()

        show_config(**self._defaults)

    #---------------------------------------------------#
    #   载入模型
    #---------------------------------------------------#
    def generate(self):
        self.net    = FasterRCNN(self.num_classes, "predict", anchor_scales = self.anchors_size, backbone = self.backbone)
        device      = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.net.load_state_dict(torch.load(self.model_path, map_location=device))
        self.net    = self.net.eval()
        print('{} model, anchors, and classes loaded.'.format(self.model_path))
        
        if self.cuda:
            self.net = nn.DataParallel(self.net)
            self.net = self.net.cuda()
    
    #---------------------------------------------------#
    #   检测图片
    #---------------------------------------------------#
    def detect_image(self, image, crop = False, count = False):
        #---------------------------------------------------#
        #   计算输入图片的高和宽
        #---------------------------------------------------#
        image_shape = np.array(np.shape(image)[0:2])
        #---------------------------------------------------#
        #   计算resize后的图片的大小,resize后的图片短边为600
        #---------------------------------------------------#
        input_shape = get_new_img_size(image_shape[0], image_shape[1])
        #---------------------------------------------------------#
        #   在这里将图像转换成RGB图像,防止灰度图在预测时报错。
        #   代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB
        #---------------------------------------------------------#
        image       = cvtColor(image)
        #---------------------------------------------------------#
        #   给原图像进行resize,resize到短边为600的大小上
        #---------------------------------------------------------#
        image_data  = resize_image(image, [input_shape[1], input_shape[0]])
        #---------------------------------------------------------#
        #   添加上batch_size维度
        #---------------------------------------------------------#
        image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        with torch.no_grad():
            images = torch.from_numpy(image_data)
            if self.cuda:
                images = images.cuda()
            
            #-------------------------------------------------------------#
            #   roi_cls_locs  建议框的调整参数
            #   roi_scores    建议框的种类得分
            #   rois          建议框的坐标
            #-------------------------------------------------------------#
            roi_cls_locs, roi_scores, rois, _ = self.net(images)
            #-------------------------------------------------------------#
            #   利用classifier的预测结果对建议框进行解码,获得预测框
            #-------------------------------------------------------------#
            results = self.bbox_util.forward(roi_cls_locs, roi_scores, rois, image_shape, input_shape, 
                                                    nms_iou = self.nms_iou, confidence = self.confidence)
            #---------------------------------------------------------#
            #   如果没有检测出物体,返回原图
            #---------------------------------------------------------#           
            if len(results[0]) <= 0:
                return image
                
            top_label   = np.array(results[0][:, 5], dtype = 'int32')
            top_conf    = results[0][:, 4]
            top_boxes   = results[0][:, :4]
        
        #---------------------------------------------------------#
        #   设置字体与边框厚度
        #---------------------------------------------------------#
        font        = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        thickness   = int(max((image.size[0] + image.size[1]) // np.mean(input_shape), 1))
        #---------------------------------------------------------#
        #   计数
        #---------------------------------------------------------#
        if count:
            print("top_label:", top_label)
            classes_nums    = np.zeros([self.num_classes])
            for i in range(self.num_classes):
                num = np.sum(top_label == i)
                if num > 0:
                    print(self.class_names[i], " : ", num)
                classes_nums[i] = num
            print("classes_nums:", classes_nums)
        #---------------------------------------------------------#
        #   是否进行目标的裁剪
        #---------------------------------------------------------#
        if crop:
            for i, c in list(enumerate(top_label)):
                top, left, bottom, right = top_boxes[i]
                top     = max(0, np.floor(top).astype('int32'))
                left    = max(0, np.floor(left).astype('int32'))
                bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
                right   = min(image.size[0], np.floor(right).astype('int32'))
                
                dir_save_path = "img_crop"
                if not os.path.exists(dir_save_path):
                    os.makedirs(dir_save_path)
                crop_image = image.crop([left, top, right, bottom])
                crop_image.save(os.path.join(dir_save_path, "crop_" + str(i) + ".png"), quality=95, subsampling=0)
                print("save crop_" + str(i) + ".png to " + dir_save_path)
        #---------------------------------------------------------#
        #   图像绘制
        #---------------------------------------------------------#
        for i, c in list(enumerate(top_label)):
            predicted_class = self.class_names[int(c)]
            box             = top_boxes[i]
            score           = top_conf[i]

            top, left, bottom, right = box

            top     = max(0, np.floor(top).astype('int32'))
            left    = max(0, np.floor(left).astype('int32'))
            bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
            right   = min(image.size[0], np.floor(right).astype('int32'))

            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)
            label = label.encode('utf-8')
            # print(label, top, left, bottom, right)
            
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            for i in range(thickness):
                draw.rectangle([left + i, top + i, right - i, bottom - i], outline=self.colors[c])
            draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c])
            draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
            del draw

        return image

    def get_FPS(self, image, test_interval):
        #---------------------------------------------------#
        #   计算输入图片的高和宽
        #---------------------------------------------------#
        image_shape = np.array(np.shape(image)[0:2])
        input_shape = get_new_img_size(image_shape[0], image_shape[1])
        #---------------------------------------------------------#
        #   在这里将图像转换成RGB图像,防止灰度图在预测时报错。
        #   代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB
        #---------------------------------------------------------#
        image       = cvtColor(image)
        
        #---------------------------------------------------------#
        #   给原图像进行resize,resize到短边为600的大小上
        #---------------------------------------------------------#
        image_data  = resize_image(image, [input_shape[1], input_shape[0]])
        #---------------------------------------------------------#
        #   添加上batch_size维度
        #---------------------------------------------------------#
        image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        with torch.no_grad():
            images = torch.from_numpy(image_data)
            if self.cuda:
                images = images.cuda()

            roi_cls_locs, roi_scores, rois, _ = self.net(images)
            #-------------------------------------------------------------#
            #   利用classifier的预测结果对建议框进行解码,获得预测框
            #-------------------------------------------------------------#
            results = self.bbox_util.forward(roi_cls_locs, roi_scores, rois, image_shape, input_shape, 
                                                    nms_iou = self.nms_iou, confidence = self.confidence)
        t1 = time.time()
        for _ in range(test_interval):
            with torch.no_grad():
                roi_cls_locs, roi_scores, rois, _ = self.net(images)
                #-------------------------------------------------------------#
                #   利用classifier的预测结果对建议框进行解码,获得预测框
                #-------------------------------------------------------------#
                results = self.bbox_util.forward(roi_cls_locs, roi_scores, rois, image_shape, input_shape, 
                                                        nms_iou = self.nms_iou, confidence = self.confidence)
                
        t2 = time.time()
        tact_time = (t2 - t1) / test_interval
        return tact_time

    #---------------------------------------------------#
    #   检测图片
    #---------------------------------------------------#
    def get_map_txt(self, image_id, image, class_names, map_out_path):
        f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w")
        #---------------------------------------------------#
        #   计算输入图片的高和宽
        #---------------------------------------------------#
        image_shape = np.array(np.shape(image)[0:2])
        input_shape = get_new_img_size(image_shape[0], image_shape[1])
        #---------------------------------------------------------#
        #   在这里将图像转换成RGB图像,防止灰度图在预测时报错。
        #   代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB
        #---------------------------------------------------------#
        image       = cvtColor(image)
        
        #---------------------------------------------------------#
        #   给原图像进行resize,resize到短边为600的大小上
        #---------------------------------------------------------#
        image_data  = resize_image(image, [input_shape[1], input_shape[0]])
        #---------------------------------------------------------#
        #   添加上batch_size维度
        #---------------------------------------------------------#
        image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        with torch.no_grad():
            images = torch.from_numpy(image_data)
            if self.cuda:
                images = images.cuda()

            roi_cls_locs, roi_scores, rois, _ = self.net(images)
            #-------------------------------------------------------------#
            #   利用classifier的预测结果对建议框进行解码,获得预测框
            #-------------------------------------------------------------#
            results = self.bbox_util.forward(roi_cls_locs, roi_scores, rois, image_shape, input_shape, 
                                                    nms_iou = self.nms_iou, confidence = self.confidence)
            #--------------------------------------#
            #   如果没有检测到物体,则返回原图
            #--------------------------------------#
            if len(results[0]) <= 0:
                return 

            top_label   = np.array(results[0][:, 5], dtype = 'int32')
            top_conf    = results[0][:, 4]
            top_boxes   = results[0][:, :4]
        
        for i, c in list(enumerate(top_label)):
            predicted_class = self.class_names[int(c)]
            box             = top_boxes[i]
            score           = str(top_conf[i])

            top, left, bottom, right = box
            if predicted_class not in class_names:
                continue

            f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))

        f.close()
        return 

终端/编码器运行:

复制代码
E:\DeepLearningModel\Model01>activate gpupytorch

(gpupytorch) E:\DeepLearningModel\Model01>python train.py
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
  warnings.warn("loaded more than 1 DLL from .libs:\n%s" %
Number of devices: 1
initialize network with normal type
Load weights model_data/voc_weights_resnet.pth.

Successful Load Key: ['extractor.0.weight', 'extractor.1.weight', 'extractor.1.bias', 'extractor.1.running_mean', 'extractor.1.running_var', 'extractor.1.num_batches_tracked', 'extractor.4.0.conv1.weight', 'extractor.4.0.bn1.weight', 'extractor.4.0.bn1.bias', 'extractor.4.0.bn1.running_mean', 'extractor.4.0.bn1.running_var', 'extractor.4.0.bn1.num_batches_tracked', 'extractor.4.0.conv2.weight', 'extractor.4.0.bn2.weight', 'extractor.4.0.bn2.bias', 'extractor.4.0.bn2.running_mean', 'extractor.4.0.bn2.running_var', 'e ......
Successful Load Key Num: 324

Fail To Load Key: ['head.cls_loc.weight', 'head.cls_loc.bias', 'head.score.weight', 'head.score.bias'] ......
Fail To Load Key num: 4

温馨提示,head部分没有载入是正常现象,Backbone部分没有载入是错误的。
Configurations:
----------------------------------------------------------------------
|                     keys |                                   values|
----------------------------------------------------------------------
|             classes_path |               model_data/voc_classes.txt|
|               model_path |        model_data/voc_weights_resnet.pth|
|              input_shape |                               [600, 600]|
|               Init_Epoch |                                        0|
|             Freeze_Epoch |                                       50|
|           UnFreeze_Epoch |                                      100|
|        Freeze_batch_size |                                        4|
|      Unfreeze_batch_size |                                        2|
|             Freeze_Train |                                     True|
|                  Init_lr |                                   0.0001|
|                   Min_lr |                   1.0000000000000002e-06|
|           optimizer_type |                                     adam|
|                 momentum |                                      0.9|
|            lr_decay_type |                                      cos|
|              save_period |                                        5|
|                 save_dir |                                     logs|
|              num_workers |                                        4|
|                num_train |                                      699|
|                  num_val |                                       78|
----------------------------------------------------------------------
Start Train
Epoch 1/100:   0%|                                                               | 0/174 [00:00<?, ?it/s<class 'dict'>]D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll

查看结果:

复制代码
Calculate Map.
96.35% = boar AP        ||      score_threhold=0.5 : F1=0.81 ; Recall=97.92% ; Precision=69.12%
94.74% = leopard AP     ||      score_threhold=0.5 : F1=0.90 ; Recall=94.74% ; Precision=85.71%
94.97% = roe_deer AP    ||      score_threhold=0.5 : F1=0.86 ; Recall=96.88% ; Precision=77.50%
mAP = 95.35%
Get map done.
Epoch:100/100
Total Loss: 0.505 || Val Loss: 0.621
Save best model to best_epoch_weights.pth
相关推荐
智算菩萨5 分钟前
【Python机器学习】K-Means 聚类:数据分组与用户画像的完整技术指南
人工智能·python·机器学习
fantasy_arch12 分钟前
LSTM模型学习分析
人工智能·学习·lstm
Java后端的Ai之路15 分钟前
【神经网络基础】-前向传播说明指南
人工智能·深度学习·神经网络·前向传播
熊猫钓鱼>_>30 分钟前
GLM4.6多工具协同开发实践:AI构建智能任务管理系统的完整指南
人工智能·python·状态模式·ai编程·glm·分类系统·开发架构
川西胖墩墩32 分钟前
中文PC端跨职能流程图模板免费下载
大数据·论文阅读·人工智能·架构·流程图
Keep_Trying_Go43 分钟前
MaskGIT掩码生成图算法详解(MaskGIT: Masked Generative Image Transformer)
人工智能·深度学习·transformer
致Great1 小时前
大模型对齐核心技术:从第一性原理完整推导 PPO 算法!
人工智能·算法·大模型·agent·智能体
Darken031 小时前
基于STM32---编码器测速(利用GPIO模拟脉冲信号)
人工智能·stm32·串口助手·gpio模拟编码器
Mintopia1 小时前
🪄 生成式应用的 **前端 orchestration 层(编排层)指南**
人工智能·llm·aigc
雍凉明月夜1 小时前
深度学习之常用归一化(Normalization)
人工智能·深度学习·计算机视觉