GIS之深度学习10:运行Faster RCNN算法

(未完成,待补充)


获取Faster RCNN源码

(开源的很多,论文里也有,在这里不多赘述)

替换自己的数据集(图片+标签文件)

(需要使用labeling生成标签文件)

打开终端,进入gpupytorch环境

运行voc_annotation.py文件生成与训练文件

复制代码
E:\DeepLearningModel\Model01>activate gpupytorch

(gpupytorch) E:\DeepLearningModel\Model01>python voc_annotation.py
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
  warnings.warn("loaded more than 1 DLL from .libs:\n%s" %
Generate txt in ImageSets.
train and val size 777
train size 699
Generate txt in ImageSets done.
Generate 2007_train.txt and 2007_val.txt for train.

结果所示:

复制代码
(gpupytorch) E:\DeepLearningModel\Model01>python voc_annotation.py
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
  warnings.warn("loaded more than 1 DLL from .libs:\n%s" %
Generate txt in ImageSets.
train and val size 777
train size 699
Generate txt in ImageSets done.
Generate 2007_train.txt and 2007_val.txt for train.
Generate 2007_train.txt and 2007_val.txt for train done.
|  leopard | 174 |
|     boar | 491 |
| roe_deer | 352 |

(gpupytorch) E:\DeepLearningModel\Model01>

运行:train.py文件

python 复制代码
import colorsys
import os
import time

import numpy as np
import torch
import torch.nn as nn
from PIL import Image, ImageDraw, ImageFont

from nets.frcnn import FasterRCNN
from utils.utils import (cvtColor, get_classes, get_new_img_size, resize_image,
                         preprocess_input, show_config)
from utils.utils_bbox import DecodeBox



class FRCNN(object):
    _defaults = {

        "model_path"    : 'logs/loss_2024_03_05_22_26_24.pth',
        "classes_path"  : 'model_data/voc_classes.txt',
        "backbone"      : "resnet50",
        "confidence"    : 0.5,
        "nms_iou"       : 0.3,
        'anchors_size'  : [8, 16, 32],
        "cuda"          : True,
    }

    @classmethod
    def get_defaults(cls, n):
        if n in cls._defaults:
            return cls._defaults[n]
        else:
            return "Unrecognized attribute name '" + n + "'"
    def __init__(self, **kwargs):
        self.__dict__.update(self._defaults)
        for name, value in kwargs.items():
            setattr(self, name, value)
            self._defaults[name] = value 
        self.class_names, self.num_classes  = get_classes(self.classes_path)

        self.std    = torch.Tensor([0.1, 0.1, 0.2, 0.2]).repeat(self.num_classes + 1)[None]
        if self.cuda:
            self.std    = self.std.cuda()
        self.bbox_util  = DecodeBox(self.std, self.num_classes)
        #---------------------------------------------------#
        hsv_tuples = [(x / self.num_classes, 1., 1.) for x in range(self.num_classes)]
        self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        self.colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), self.colors))
        self.generate()

        show_config(**self._defaults)

    #---------------------------------------------------#
    #   载入模型
    #---------------------------------------------------#
    def generate(self):
        self.net    = FasterRCNN(self.num_classes, "predict", anchor_scales = self.anchors_size, backbone = self.backbone)
        device      = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.net.load_state_dict(torch.load(self.model_path, map_location=device))
        self.net    = self.net.eval()
        print('{} model, anchors, and classes loaded.'.format(self.model_path))
        
        if self.cuda:
            self.net = nn.DataParallel(self.net)
            self.net = self.net.cuda()
    
    #---------------------------------------------------#
    #   检测图片
    #---------------------------------------------------#
    def detect_image(self, image, crop = False, count = False):
        #---------------------------------------------------#
        #   计算输入图片的高和宽
        #---------------------------------------------------#
        image_shape = np.array(np.shape(image)[0:2])
        #---------------------------------------------------#
        #   计算resize后的图片的大小,resize后的图片短边为600
        #---------------------------------------------------#
        input_shape = get_new_img_size(image_shape[0], image_shape[1])
        #---------------------------------------------------------#
        #   在这里将图像转换成RGB图像,防止灰度图在预测时报错。
        #   代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB
        #---------------------------------------------------------#
        image       = cvtColor(image)
        #---------------------------------------------------------#
        #   给原图像进行resize,resize到短边为600的大小上
        #---------------------------------------------------------#
        image_data  = resize_image(image, [input_shape[1], input_shape[0]])
        #---------------------------------------------------------#
        #   添加上batch_size维度
        #---------------------------------------------------------#
        image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        with torch.no_grad():
            images = torch.from_numpy(image_data)
            if self.cuda:
                images = images.cuda()
            
            #-------------------------------------------------------------#
            #   roi_cls_locs  建议框的调整参数
            #   roi_scores    建议框的种类得分
            #   rois          建议框的坐标
            #-------------------------------------------------------------#
            roi_cls_locs, roi_scores, rois, _ = self.net(images)
            #-------------------------------------------------------------#
            #   利用classifier的预测结果对建议框进行解码,获得预测框
            #-------------------------------------------------------------#
            results = self.bbox_util.forward(roi_cls_locs, roi_scores, rois, image_shape, input_shape, 
                                                    nms_iou = self.nms_iou, confidence = self.confidence)
            #---------------------------------------------------------#
            #   如果没有检测出物体,返回原图
            #---------------------------------------------------------#           
            if len(results[0]) <= 0:
                return image
                
            top_label   = np.array(results[0][:, 5], dtype = 'int32')
            top_conf    = results[0][:, 4]
            top_boxes   = results[0][:, :4]
        
        #---------------------------------------------------------#
        #   设置字体与边框厚度
        #---------------------------------------------------------#
        font        = ImageFont.truetype(font='model_data/simhei.ttf', size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        thickness   = int(max((image.size[0] + image.size[1]) // np.mean(input_shape), 1))
        #---------------------------------------------------------#
        #   计数
        #---------------------------------------------------------#
        if count:
            print("top_label:", top_label)
            classes_nums    = np.zeros([self.num_classes])
            for i in range(self.num_classes):
                num = np.sum(top_label == i)
                if num > 0:
                    print(self.class_names[i], " : ", num)
                classes_nums[i] = num
            print("classes_nums:", classes_nums)
        #---------------------------------------------------------#
        #   是否进行目标的裁剪
        #---------------------------------------------------------#
        if crop:
            for i, c in list(enumerate(top_label)):
                top, left, bottom, right = top_boxes[i]
                top     = max(0, np.floor(top).astype('int32'))
                left    = max(0, np.floor(left).astype('int32'))
                bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
                right   = min(image.size[0], np.floor(right).astype('int32'))
                
                dir_save_path = "img_crop"
                if not os.path.exists(dir_save_path):
                    os.makedirs(dir_save_path)
                crop_image = image.crop([left, top, right, bottom])
                crop_image.save(os.path.join(dir_save_path, "crop_" + str(i) + ".png"), quality=95, subsampling=0)
                print("save crop_" + str(i) + ".png to " + dir_save_path)
        #---------------------------------------------------------#
        #   图像绘制
        #---------------------------------------------------------#
        for i, c in list(enumerate(top_label)):
            predicted_class = self.class_names[int(c)]
            box             = top_boxes[i]
            score           = top_conf[i]

            top, left, bottom, right = box

            top     = max(0, np.floor(top).astype('int32'))
            left    = max(0, np.floor(left).astype('int32'))
            bottom  = min(image.size[1], np.floor(bottom).astype('int32'))
            right   = min(image.size[0], np.floor(right).astype('int32'))

            label = '{} {:.2f}'.format(predicted_class, score)
            draw = ImageDraw.Draw(image)
            label_size = draw.textsize(label, font)
            label = label.encode('utf-8')
            # print(label, top, left, bottom, right)
            
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])

            for i in range(thickness):
                draw.rectangle([left + i, top + i, right - i, bottom - i], outline=self.colors[c])
            draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=self.colors[c])
            draw.text(text_origin, str(label,'UTF-8'), fill=(0, 0, 0), font=font)
            del draw

        return image

    def get_FPS(self, image, test_interval):
        #---------------------------------------------------#
        #   计算输入图片的高和宽
        #---------------------------------------------------#
        image_shape = np.array(np.shape(image)[0:2])
        input_shape = get_new_img_size(image_shape[0], image_shape[1])
        #---------------------------------------------------------#
        #   在这里将图像转换成RGB图像,防止灰度图在预测时报错。
        #   代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB
        #---------------------------------------------------------#
        image       = cvtColor(image)
        
        #---------------------------------------------------------#
        #   给原图像进行resize,resize到短边为600的大小上
        #---------------------------------------------------------#
        image_data  = resize_image(image, [input_shape[1], input_shape[0]])
        #---------------------------------------------------------#
        #   添加上batch_size维度
        #---------------------------------------------------------#
        image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        with torch.no_grad():
            images = torch.from_numpy(image_data)
            if self.cuda:
                images = images.cuda()

            roi_cls_locs, roi_scores, rois, _ = self.net(images)
            #-------------------------------------------------------------#
            #   利用classifier的预测结果对建议框进行解码,获得预测框
            #-------------------------------------------------------------#
            results = self.bbox_util.forward(roi_cls_locs, roi_scores, rois, image_shape, input_shape, 
                                                    nms_iou = self.nms_iou, confidence = self.confidence)
        t1 = time.time()
        for _ in range(test_interval):
            with torch.no_grad():
                roi_cls_locs, roi_scores, rois, _ = self.net(images)
                #-------------------------------------------------------------#
                #   利用classifier的预测结果对建议框进行解码,获得预测框
                #-------------------------------------------------------------#
                results = self.bbox_util.forward(roi_cls_locs, roi_scores, rois, image_shape, input_shape, 
                                                        nms_iou = self.nms_iou, confidence = self.confidence)
                
        t2 = time.time()
        tact_time = (t2 - t1) / test_interval
        return tact_time

    #---------------------------------------------------#
    #   检测图片
    #---------------------------------------------------#
    def get_map_txt(self, image_id, image, class_names, map_out_path):
        f = open(os.path.join(map_out_path, "detection-results/"+image_id+".txt"),"w")
        #---------------------------------------------------#
        #   计算输入图片的高和宽
        #---------------------------------------------------#
        image_shape = np.array(np.shape(image)[0:2])
        input_shape = get_new_img_size(image_shape[0], image_shape[1])
        #---------------------------------------------------------#
        #   在这里将图像转换成RGB图像,防止灰度图在预测时报错。
        #   代码仅仅支持RGB图像的预测,所有其它类型的图像都会转化成RGB
        #---------------------------------------------------------#
        image       = cvtColor(image)
        
        #---------------------------------------------------------#
        #   给原图像进行resize,resize到短边为600的大小上
        #---------------------------------------------------------#
        image_data  = resize_image(image, [input_shape[1], input_shape[0]])
        #---------------------------------------------------------#
        #   添加上batch_size维度
        #---------------------------------------------------------#
        image_data  = np.expand_dims(np.transpose(preprocess_input(np.array(image_data, dtype='float32')), (2, 0, 1)), 0)

        with torch.no_grad():
            images = torch.from_numpy(image_data)
            if self.cuda:
                images = images.cuda()

            roi_cls_locs, roi_scores, rois, _ = self.net(images)
            #-------------------------------------------------------------#
            #   利用classifier的预测结果对建议框进行解码,获得预测框
            #-------------------------------------------------------------#
            results = self.bbox_util.forward(roi_cls_locs, roi_scores, rois, image_shape, input_shape, 
                                                    nms_iou = self.nms_iou, confidence = self.confidence)
            #--------------------------------------#
            #   如果没有检测到物体,则返回原图
            #--------------------------------------#
            if len(results[0]) <= 0:
                return 

            top_label   = np.array(results[0][:, 5], dtype = 'int32')
            top_conf    = results[0][:, 4]
            top_boxes   = results[0][:, :4]
        
        for i, c in list(enumerate(top_label)):
            predicted_class = self.class_names[int(c)]
            box             = top_boxes[i]
            score           = str(top_conf[i])

            top, left, bottom, right = box
            if predicted_class not in class_names:
                continue

            f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))

        f.close()
        return 

终端/编码器运行:

复制代码
E:\DeepLearningModel\Model01>activate gpupytorch

(gpupytorch) E:\DeepLearningModel\Model01>python train.py
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas64__v0.3.21-gcc_10_3_0.dll
  warnings.warn("loaded more than 1 DLL from .libs:\n%s" %
Number of devices: 1
initialize network with normal type
Load weights model_data/voc_weights_resnet.pth.

Successful Load Key: ['extractor.0.weight', 'extractor.1.weight', 'extractor.1.bias', 'extractor.1.running_mean', 'extractor.1.running_var', 'extractor.1.num_batches_tracked', 'extractor.4.0.conv1.weight', 'extractor.4.0.bn1.weight', 'extractor.4.0.bn1.bias', 'extractor.4.0.bn1.running_mean', 'extractor.4.0.bn1.running_var', 'extractor.4.0.bn1.num_batches_tracked', 'extractor.4.0.conv2.weight', 'extractor.4.0.bn2.weight', 'extractor.4.0.bn2.bias', 'extractor.4.0.bn2.running_mean', 'extractor.4.0.bn2.running_var', 'e ......
Successful Load Key Num: 324

Fail To Load Key: ['head.cls_loc.weight', 'head.cls_loc.bias', 'head.score.weight', 'head.score.bias'] ......
Fail To Load Key num: 4

温馨提示,head部分没有载入是正常现象,Backbone部分没有载入是错误的。
Configurations:
----------------------------------------------------------------------
|                     keys |                                   values|
----------------------------------------------------------------------
|             classes_path |               model_data/voc_classes.txt|
|               model_path |        model_data/voc_weights_resnet.pth|
|              input_shape |                               [600, 600]|
|               Init_Epoch |                                        0|
|             Freeze_Epoch |                                       50|
|           UnFreeze_Epoch |                                      100|
|        Freeze_batch_size |                                        4|
|      Unfreeze_batch_size |                                        2|
|             Freeze_Train |                                     True|
|                  Init_lr |                                   0.0001|
|                   Min_lr |                   1.0000000000000002e-06|
|           optimizer_type |                                     adam|
|                 momentum |                                      0.9|
|            lr_decay_type |                                      cos|
|              save_period |                                        5|
|                 save_dir |                                     logs|
|              num_workers |                                        4|
|                num_train |                                      699|
|                  num_val |                                       78|
----------------------------------------------------------------------
Start Train
Epoch 1/100:   0%|                                                               | 0/174 [00:00<?, ?it/s<class 'dict'>]D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\_distributor_init.py:30: UserWarning: loaded more than 1 DLL from .libs:
D:\Anaconda\envs\gpupytorch\lib\site-packages\numpy\.libs\libopenblas.PYQHXLVVQ7VESDPUVUADXEVJOBGHJPAY.gfortran-win_amd64.dll

查看结果:

复制代码
Calculate Map.
96.35% = boar AP        ||      score_threhold=0.5 : F1=0.81 ; Recall=97.92% ; Precision=69.12%
94.74% = leopard AP     ||      score_threhold=0.5 : F1=0.90 ; Recall=94.74% ; Precision=85.71%
94.97% = roe_deer AP    ||      score_threhold=0.5 : F1=0.86 ; Recall=96.88% ; Precision=77.50%
mAP = 95.35%
Get map done.
Epoch:100/100
Total Loss: 0.505 || Val Loss: 0.621
Save best model to best_epoch_weights.pth
相关推荐
人工智能培训几秒前
AI模型部署进阶:Docker容器化部署AI项目
人工智能·深度学习·机器学习·docker·容器·transformer·知识图谱
佳xuan2 分钟前
部署大模型
人工智能
AI服务老曹3 分钟前
深度架构解析:如何构建兼容 X86/ARM 及异构计算(GPU/NPU)的级联级 AI 视频管理平台
arm开发·人工智能·架构
哦哦~9216 分钟前
揭示多功能合成界面,增强致密厚复合电极的机械和电化学性能
服务器·网络·人工智能
企业架构师老王6 分钟前
注册审批申报材料自动校验:如何利用实在Agent构建非侵入式架构并降低数据误报率?
大数据·人工智能·ai·架构
吃鱼的灰太狼7 分钟前
Hermes 全配置指南:从裸版到 AI Agent 天花板
人工智能
HoneyMoose8 分钟前
Discourse 提供 AI 总结功能
人工智能
三维频道8 分钟前
深度解析:3D-DIC技术如何精准表征复合材料的变形与损伤演化?
人工智能·计算机视觉·3d·全场应变测量·dic技术·新拓三维·复合材料力学表征
NebulaData9 分钟前
GPT-image 2 重磅上线,Nebula Lab 带您解锁 AI 创意新可能(附提示词版)
人工智能
GISer_Jing9 分钟前
Dify可视化编排:技术架构与实战指南
前端·人工智能·ai编程