完整指南:CNStream流处理多路并发框架适配到NVIDIA Jetson Orin (二) 源码架构流程梳理、代码编写

目录

[1 视频解码代码编写----利用jetson-ffmpeg](#1 视频解码代码编写----利用jetson-ffmpeg)

[1.1 nvstream中视频解码的代码流程框架](#1.1 nvstream中视频解码的代码流程框架)

[1.1.1 类的层次关系](#1.1.1 类的层次关系)

[1.1.2 各个类的初始化函数调用层次关系](#1.1.2 各个类的初始化函数调用层次关系)

[1.1.3 各个类的process函数调用层次关系](#1.1.3 各个类的process函数调用层次关系)

[1.2 编写视频解码代码](#1.2 编写视频解码代码)

[1.2.1 修改VideoInfo结构体定义](#1.2.1 修改VideoInfo结构体定义)

[1.2.2 修改解封装代码](#1.2.2 修改解封装代码)

[1.2.3 decode_impl_nv.hpp](#1.2.3 decode_impl_nv.hpp)

[1.2.4 decode_impl_nv.cpp](#1.2.4 decode_impl_nv.cpp)

[2 硬件相关的图像格式、内存申请接口、内存释放、内存释放等代码修改](#2 硬件相关的图像格式、内存申请接口、内存释放、内存释放等代码修改)

[2.1 infer_server/include/common/utils.hpp文件内容修改](#2.1 infer_server/include/common/utils.hpp文件内容修改)

[2.2 cuda的四种内存](#2.2 cuda的四种内存)

[2.3 infer_server/src/core/device.cpp修改](#2.3 infer_server/src/core/device.cpp修改)

[3 图像缩放、裁剪、色域转换等代码编写----利用CV-CUDA](#3 图像缩放、裁剪、色域转换等代码编写----利用CV-CUDA)

[3.1 nvstream中图像处理代码流程框架](#3.1 nvstream中图像处理代码流程框架)

[3.1.1 类的层次关系](#3.1.1 类的层次关系)

[3.1.2 各个类的初始化函数调用层次关系](#3.1.2 各个类的初始化函数调用层次关系)

[3.1.3 各个类的transform函数调用层次关系](#3.1.3 各个类的transform函数调用层次关系)

[3.2 编写图像缩放、裁剪、色域转换等代码](#3.2 编写图像缩放、裁剪、色域转换等代码)

[3.2.1 infer_server/src/nv/transform_impl_nv.hpp](#3.2.1 infer_server/src/nv/transform_impl_nv.hpp)

[3.2.2 infer_server/src/nv/transform_impl_nv.cpp](#3.2.2 infer_server/src/nv/transform_impl_nv.cpp)

[4 算法推理相关代码修改](#4 算法推理相关代码修改)

[4.1 ./infer_server/src/model/model.h](#4.1 ./infer_server/src/model/model.h)

[4.2 ./infer_server/src/model/model.cpp](#4.2 ./infer_server/src/model/model.cpp)

[5 其他代码修改](#5 其他代码修改)

参考文献:


记录下将CNStream流处理多路并发Pipeline框架适配到NVIDIA Jetson AGX Orin的过程,以及过程中遇到的问题,我的jetson盒子是用jetpack5.1.3重新刷机之后的,这是系列博客的第二篇。

1 视频解码代码编写----利用jetson-ffmpeg

用jetson-ffmpeg来做视频解码,本质上还是用ffmpeg解码,是不是jetson-ffmpeg底层调用了英伟达的nvmpi做硬件加速。

1.1 nvstream中视频解码的代码流程框架

视频解码的代码流程框架在博客:aclStream流处理多路并发Pipeline框架中 视频解码 代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客

上面博客里面的是详细代码阅读,从中提炼出来简单点的框架

1.1.1 类的层次关系

bash 复制代码
class FileHandler类里面有个FileHandlerImpl *impl_ = nullptr成员
        -->class FileHandlerImpl
                -->class FFParser 解封装的类
                -->class DeviceDecoder
                       -->class DecodeService
                               -->class IDecoder 只是个虚基类,被用来继承的               
                                       -->IDecoder *decoder_ = CreateDecoder();这里面就是new DecoderAcl()了。 DecoderAcl继承IDecoder 
                                               -->再往下就是各种的硬件相关的类了, 

上面是类的关系,这次英伟达平台上就是从IDecoder *decoder_ = CreateDecoder();开始写一个新的类然后用来做具体的解码,上层的那些类还是保持不变的。

1.1.2 各个类的初始化函数调用层次关系

cpp 复制代码
FileHandlerImpl::Open()
  std::thread(&FileHandlerImpl::Loop
    PrepareResources()
      parser_.Open
        impl_->Open(url, result, only_key_frame);
          result_->OnParserInfo(info);     
            decoder_->Create(info, &extra)这个decoder_就是DeviceDecoder类
              VdecCreate(void **vdec, VdecCreateParams *params) 这是个单纯的函数,不在任何类里面,函数内容在下一行
                 infer_server::DecodeService::Instance().Create(vdec, params);
                   DecodeService::Create里面有下面两行
                       IDecoder *decoder_ = CreateDecoder()先创建DecoderAcl                    
                       decoder_->Create(params)然后调用DecoderAcl 的create函数,   
                       再往下就是硬件相关的各种类的init和open函数了,这次英伟达的也是要创建个新的类,然后在这里新类的
                       open或者init函数被调用    

1.1.3 各个类的process函数调用层次关系

process分两个方向,一个是从上到下送解码数据的,另一个是解码完之后获取解码数据,然后从下往上回传的。先看从上到下送frame的流程

cpp 复制代码
bool FileHandlerImpl::Process() {
parser_.Parse();
   impl_->Parse();
      result_->OnParserFrame(&frame);
        DeviceDecoder::Process(VideoEsPacket *pkt) {    
          int VdecSendStream(void *vdec, const VdecStream *stream, int timeout_ms)
          {
             return infer_server::DecodeService::Instance().SendStream(vdec, stream, timeout_ms);
           }
              decoder_->SendStream(stream, timeout_ms);这就已经是DecoderAcl的sendstream了,已经到硬件了。
                 vdec_->Decode(data_ptr, data_size, frame_id, this);vdec_是AclLiteVideoProc
                 再往下就不看了,下面几层类都是硬件解码的。

然后看一下从下往上回传的,就是解码之后的数据一层层传到上层的类。

cpp 复制代码
VideoDecoder::DvppVdecCallbackV2(hi_video_frame_info *frame, void *userdata) {
  CallBackVdec(const std::shared_ptr<acllite::ImageData> decoded_image, uint32_t channel_id, uint32_t frame_id, v
     decoder->OnFrame(decoded_image, channel_id, frame_id);这个decoder就是DecoderAcl类,
        create_params_.OnFrame(surf, create_params_.userdata);
            OnFrame_(BufSurface *surf, void *userdata)class DeviceDecoder类
                result_->OnDecodeFrame(wrapper);
                    FileHandlerImpl::OnDecodeFrame

1.2 编写视频解码代码

从上面分析可以知道,需要写一个类,替换掉之前的底层硬件解码类,在jetson上,用jetson-ffmpeg做解码,jetson-ffmpeg会调用硬件加速。

1.2.1 修改VideoInfo结构体定义

首先修改一个结构体的定义,因为之前的cnstream中解码不是用ffmpeg解码的,他只是用ffmpeg解封装,所以这个结构体定义是这样的modules/source/src/video_parser.hpp,他由于不用ffmpeg解码所以不需要 AVCodecParameters* codecpar = nullptr成员。

cpp 复制代码
namespace cnstream {

struct VideoInfo {
  AVCodecID codec_id;
#ifdef HAVE_FFMPEG_AVDEVICE  // for usb camera
  int format;
  int width;
  int height;
#endif
  int progressive;
  std::vector<unsigned char> extra_data;
};

...其他代码...

然后在解码的demo那里,easydk/samples/simple_demo/common/video_parser.h,还有一个结构体的定义。

cpp 复制代码
struct VideoInfo {
  AVCodecID codec_id = AV_CODEC_ID_NONE;
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
  AVCodecParameters* codecpar = nullptr;
#endif
  AVCodecContext* codec_ctx = nullptr;
  std::vector<uint8_t> extra_data{};
  int width = 0;
  int height = 0;
  int progressive = 0;
};

所以我这里要修改一下这个结构体的定义,把第一个的结构体定义改成下面的格式。

cpp 复制代码
namespace cnstream {

struct VideoInfo {
  AVCodecID codec_id;
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
  AVCodecParameters* codecpar = nullptr;
#endif
  AVCodecContext* codec_ctx = nullptr;
  std::vector<unsigned char> extra_data;
  int width;
  int height;
  int progressive;
#ifdef HAVE_FFMPEG_AVDEVICE  // for usb camera
  int format;
#endif
};

1.2.2 修改解封装代码

然后解封装那里的代码,需要修改一下给AVCodecParameters* codecpar成员赋值。参考CNStream中easydk/samples/simple_demo/common/video_parser.cpp的代码去修改NVStream中modules/source/src/video_parser.cpp的代码,

cpp 复制代码
       int Open(const std::string& url, IParserResult* result, bool only_key_frame = false) {
            std::unique_lock<std::mutex> guard(mutex_);
            if (!result) return -1;
            result_ = result;
            // format context
            fmt_ctx_ = avformat_alloc_context();
            if (!fmt_ctx_) {
                return -1;
            }
            url_name_ = url;

            AVInputFormat* ifmt = NULL;
            // for usb camera
#ifdef HAVE_FFMPEG_AVDEVICE
            const char* usb_prefix = "/dev/video";
            if (0 == strncasecmp(url_name_.c_str(), usb_prefix, strlen(usb_prefix))) {
                // open v4l2 input
#if defined(__linux) || defined(__unix)
                ifmt = av_find_input_format("video4linux2");
                if (!ifmt) {
                    LOGE(SOURCE) << "[" << stream_id_ << "]: Could not find v4l2 format.";
                    return false;
                }
#elif defined(_WIN32) || defined(_WIN64)
                ifmt = av_find_input_format("dshow");
                if (!ifmt) {
                    LOGE(SOURCE) << "[" << stream_id_ << "]: Could not find dshow.";
                    return false;
                }
#else
                LOGE(SOURCE) << "[" << stream_id_ << "]: Unsupported Platform";
                return false;
#endif
            }
#endif
            int ret_code;
            const char* p_rtsp_start_str = "rtsp://";
            if (0 == strncasecmp(url_name_.c_str(), p_rtsp_start_str, strlen(p_rtsp_start_str))) {
                AVIOInterruptCB intrpt_callback = { InterruptCallBack, this };
                fmt_ctx_->interrupt_callback = intrpt_callback;
                last_receive_frame_time_ = GetTickCount();
                // options
                av_dict_set(&options_, "buffer_size", "1024000", 0);
                av_dict_set(&options_, "max_delay", "500000", 0);
                av_dict_set(&options_, "stimeout", "20000000", 0);
                av_dict_set(&options_, "rtsp_flags", "prefer_tcp", 0);
                rtsp_source_ = true;
            }
            else {
                // options
                av_dict_set(&options_, "buffer_size", "1024000", 0);
                av_dict_set(&options_, "max_delay", "500000", 0);
            }

            // open input
            ret_code = avformat_open_input(&fmt_ctx_, url_name_.c_str(), ifmt, &options_);
            if (0 != ret_code) {
                LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't open input stream -- " << url_name_;
                return -1;
            }
            // find video stream information
            ret_code = avformat_find_stream_info(fmt_ctx_, NULL);
            if (ret_code < 0) {
                LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't find stream information -- " << url_name_;
                return -1;
            }

            // fill info
            VideoInfo video_info;
            VideoInfo* info = &video_info;
            int video_index = -1;
            AVStream* st = nullptr;
            for (uint32_t loop_i = 0; loop_i < fmt_ctx_->nb_streams; loop_i++) {
                st = fmt_ctx_->streams[loop_i];
#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
                if (st->codecpar->codec_type == AVMEDIA_TYPE_VIDEO) {
#else
                if (st->codec->codec_type == AVMEDIA_TYPE_VIDEO) {
#endif
                    video_index = loop_i;
                    break;
                }
            }
            if (video_index == -1) {
                LOGI(SOURCE) << "[" << stream_id_ << "]: Couldn't find a video stream -- " << url_name_;
                return -1;
            }
            video_index_ = video_index;
            
            info->width = st->codecpar->width;
            info->height = st->codecpar->height;

#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
            info->codec_id = st->codecpar->codec_id;
            int field_order = st->codecpar->field_order;
#ifdef HAVE_FFMPEG_AVDEVICE  // for usb camera
            info->format = st->codecpar->format;
#endif
#else
            info->codec_id = st->codec->codec_id;
            int field_order = st->codec->field_order;
#ifdef HAVE_FFMPEG_AVDEVICE  // for usb camera
            info->format = st->codec->format;
#endif

#endif

#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
            info->codecpar = fmt_ctx_->streams[video_index_]->codecpar;
#endif
            info->codec_ctx = fmt_ctx_->streams[video_index_]->codec;

            /*At this moment, if the demuxer does not set this value (avctx->field_order == UNKNOWN),
             *   the input stream will be assumed as progressive one.
             */
            switch (field_order) {
            case AV_FIELD_TT:
            case AV_FIELD_BB:
            case AV_FIELD_TB:
            case AV_FIELD_BT:
                info->progressive = 0;
                break;
            case AV_FIELD_PROGRESSIVE:  // fall through
            default:
                info->progressive = 1;
                break;
            }

#if LIBAVFORMAT_VERSION_INT >= FFMPEG_VERSION_3_1
            unsigned char* extradata = st->codecpar->extradata;
            int extradata_size = st->codecpar->extradata_size;
#else
            unsigned char* extradata = st->codec->extradata;
            int extradata_size = st->codec->extradata_size;
#endif

            if (extradata && extradata_size) {
                info->extra_data.resize(extradata_size);
                memcpy(info->extra_data.data(), extradata, extradata_size);
            }
            // bitstream filter
            bsf_ctx_ = nullptr;
            const AVBitStreamFilter *pfilter{};
            if (strstr(fmt_ctx_->iformat->name, "mp4") || strstr(fmt_ctx_->iformat->name, "flv") ||
                strstr(fmt_ctx_->iformat->name, "matroska")) {
                if (AV_CODEC_ID_H264 == info->codec_id) {
                    pfilter = av_bsf_get_by_name("h264_mp4toannexb");
                }
                else if (AV_CODEC_ID_HEVC == info->codec_id) {
                    pfilter = av_bsf_get_by_name("hevc_mp4toannexb");
                }
                else {
                    pfilter = nullptr;
                }
            }
            if (pfilter == nullptr) {
                bsf_ctx_ = nullptr;
            }
            else {
                av_bsf_alloc(pfilter, &bsf_ctx_);
            }

            if (result_) {
                result_->OnParserInfo(info);
            }
            av_init_packet(&packet_);
            first_frame_ = true;
            eos_reached_ = false;
            open_success_ = true;
            only_key_frame_ = only_key_frame;
            return 0;
        }

1.2.3 decode_impl_nv.hpp

创建个新的类,用来做视频解码,初步代码如下,后面编译以及运行的时候有错误再修改。

cpp 复制代码
#ifndef _DECODE_IMPL_NV_HPP_
#define _DECODE_IMPL_NV_HPP_

#include <atomic>
#include <string>

#include <libavformat/avformat.h>
#include <libavcodec/avcodec.h>
#include "../decode_impl.hpp"

namespace infer_server {

    class DecodeFFmpeg  : public IDecoder {
    public:
        DecodeFFmpeg()  = default;
        ~DecodeFFmpeg() = default;
        int Create(VdecCreateParams *params) override;
        int Destroy() override;
        int SendStream(const VdecStream *stream, int timeout_ms) override;
        void OnFrame(AVFrame *av_frame_, uint32_t frame_id) override;
        void OnEos() override;
        void OnError(int errcode) override;

    private:
        void ResetFlags();

    private:
        std::atomic<bool> eos_sent_{false};  // flag for acl eos has been sent to decoder
        std::atomic<bool> created_{false};
        VdecCreateParams create_params_;
        AVCodec *av_codec_;
        AVCodecContext* codec_context_{nullptr};
        AVFrame *av_frame_ = nullptr;
        void *transformer_{};
};


}  // namespace 
#endif // DECODE_IMPL_NV_HPP

1.2.4 decode_impl_nv.cpp

cpp 复制代码
#include <iostream>

#include "glog/logging.h"
#include "decode_impl_nv.hpp"
#include "defer.hpp"


namespace infer_server {

    bool DecodeFFmpeg::Create(VdecCreateParams *params) {
        create_params_ = *params;
        
        switch (params->type) {
        case VDEC_TYPE_H264:
            av_codec_ = avcodec_find_decoder(AV_CODEC_ID_H264);
            break;
        case VDEC_TYPE_H265:
            av_codec_ = avcodec_find_decoder(AV_CODEC_ID_H265);
            break;
        case VDEC_TYPE_JPEG:
        default:
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Create(): Unsupported codec type: " << create_params_.type;
            return -1;
        }

        if (!av_codec_) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] avcodec_find_decoder failed";
            return false;
        }

        codec_context_ = avcodec_alloc_context3(av_codec_);
        if (!codec_context_) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Failed to do avcodec_alloc_context3";
            return false;
        }


        AVDictionary *decoder_opts = nullptr;
        defer([&decoder_opts] {
            if (decoder_opts) av_dict_free(&decoder_opts);
        });

        av_dict_set_int(&decoder_opts, "device_id", 0, 0);

        if (avcodec_open2(codec_context_, av_codec_,  &decoder_opts) < 0) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Failed to open codec";
            return false;
        }
        av_frame_ = av_frame_alloc();
        if (!av_frame_) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] Could not alloc frame";
            return false;
        } 

        ResetFlags();

        //待修改
        //这里还需要增加创建transform用来做图像缩放的代码。

        created_ = true;

        return true;
    }


    int DecodeFFmpeg::SendStream(const VdecStream *stream, int timeout_ms) {

        if (!created_) {
            LOG(ERROR) << "[InferServer] [DecodeFFmpeg] SendStream(): Decoder is not created";
            return -1;
        }

        if (nullptr == stream || nullptr == stream->bits) {
            if (eos_sent_) {
                LOG(WARNING) << "[InferServer] [DecodeFFmpeg] SendStream(): EOS packet has been send";
                return 0;
            }
             
            AVPacket framePacket = {};
            av_init_packet(&framePacket);
        
            framePacket.data = nullptr;
            framePacket.size = 0;
            
            avcodec_send_packet(decode_, &packet);
            // flush all frames ...
            int ret = 0;
            do {
                ret = avcodec_receive_frame(decode_, av_frame_);
                if(ret >= 0)
                {
                    OnFrame(av_frame_, stream->pts);
                }

            } while (ret >= 0);
            eos_sent_ = true;
            OnEos();
        }
        else{
            if (eos_sent_) {
                LOG(ERROR) << "[InferServer] [DecodeFFmpeg] SendStream(): EOS has been sent, process packet failed, pts:"
                    << stream->pts;
                return -1;
            }
            AVPacket framePacket = {};
            av_init_packet(&framePacket);
        
            framePacket.data = stream.bits;
            framePacket.size = stream.len;

            //开始解码
            int ret = avcodec_send_packet(codec_context_, framePacket);
            if (ret < 0) {
                LOG(ERROR) << "[InferServer] [DecodeFFmpeg] avcodec_send_packet failed, data ptr, size:"
                        << framePacket->data << ", " << framePacket->size;
                return false;
            }
            ret = avcodec_receive_frame(codec_context_, av_frame_);
            OnFrame(av_frame_, stream->pts);
        }
    }


    void DecodeFFmpeg::OnFrame(AVFrame *av_frame_, uint32_t frame_id) {

        BufSurface *surf = nullptr;
        if (create_params_.GetBufSurf(&surf, av_frame_->width, av_frame_->height, CastColorFmt(av_frame_->format),
            create_params_.surf_timeout_ms, create_params_.userdata) < 0) {
            LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): Get BufSurface failed";
            OnError(-1);
            return;
        }
        if (surf->mem_type != BUF_MEMORY_DVPP) {
            LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): BufSurface memory type must be BUF_MEMORY_DVPP";
            return;
        }

        switch (av_frame_->format) {
        case acllite::ImageFormat::YUV_SP_420:
        case acllite::ImageFormat::YVU_SP_420:
            if (surf->surface_list[0].width != av_frame_->width || surf->surface_list[0].height != av_frame_->height) {
                BufSurface transform_src;
                BufSurfaceParams src_param;
                memset(&transform_src, 0, sizeof(BufSurface));
                memset(&src_param, 0, sizeof(BufSurfaceParams));

                src_param.color_format = CastColorFmt(av_frame_->format);
                src_param.data_size = codec_image->size;//待修改。
                src_param.data_ptr = reinterpret_cast<void *>(codec_image->data.get());//待修改。

                VLOG(5) << "[InferServer] [DecoderAcl] OnFrame(): codec_frame: "
                    << " width = " << av_frame_->width
                    << ", height = " << av_frame_->height
                    << ", width stride = " << av_frame_->alignWidth
                    << ", height stride = " << av_frame_->alignHeight;
                VLOG(5) << "[InferServer] [DecoderAcl] OnFrame(): surf->surface_list[0]: "
                    << " width = " << surf->surface_list[0].width
                    << ", height = " << surf->surface_list[0].height
                    << ", width stride = " << surf->surface_list[0].width_stride
                    << ", height stride = " << surf->surface_list[0].height_stride;

                src_param.width = av_frame_->width;
                src_param.height = av_frame_->height;
                src_param.width_stride = codec_image->alignWidth;//待修改。
                src_param.height_stride = codec_image->alignHeight;//待修改。

                transform_src.batch_size = 1;
                transform_src.num_filled = 1;
                transform_src.device_id = create_params_.device_id;
                transform_src.mem_type = BUF_MEMORY_DVPP;
                transform_src.surface_list = &src_param;

                TransformParams trans_params;
                memset(&trans_params, 0, sizeof(trans_params));
                trans_params.transform_flag = TRANSFORM_RESIZE_SRC;

                if (Transform(transformer_, &transform_src, surf, &trans_params) < 0) {
                    LOG(ERROR) << "[InferServer] [DecoderAcl] OnFrame(): Transfrom failed";
                    break;
                }
            }
            else {
                std::chrono::high_resolution_clock::time_point tnow = std::chrono::high_resolution_clock::now();
                CALL_ACL_FUNC(acllite::CopyDataToHostEx(surf->surface_list[0].data_ptr, codec_image->size, codec_image->data.get(), codec_image->size, codec_image->deviceId)
                   , "[DecoderAcl] OnFrame(): copy codec buffer data to surf failed");
                             
                std::chrono::high_resolution_clock::time_point tpost = std::chrono::high_resolution_clock::now();
                //std::cout << "<<<<<<================================ CopyDataToHostEx time = " << std::chrono::duration_cast<std::chrono::duration<double>>(tpost - tnow).count() * 1000 << " ms" << std::endl;
   
            }
            break;
        default:
            break;
        }

        surf->pts = frame_id;
        
        //std::chrono::high_resolution_clock::time_point tnow = std::chrono::high_resolution_clock::now();
        create_params_.OnFrame(surf, create_params_.userdata);
        //std::chrono::high_resolution_clock::time_point tpost = std::chrono::high_resolution_clock::now();
        //std::cout << "<<<<<<================================ create_params_.OnFrame time = " << std::chrono::duration_cast<std::chrono::duration<double>>(tpost - tnow).count() * 1000 << " ms" << std::endl;
    }

    void DecodeFFmpeg::Destroy() {

        if (!created_) {
            LOG(WARNING) << "[InferServer] [DecoderAcl] Destroy(): Decoder is not created";
            return 0;
        }
        
        // if error happened, destroy directly, eos maybe not be transmitted from the decoder
        if (!eos_sent_) {
            SendStream(nullptr, 10000);
        }

        ResetFlags();

        if (av_frame_) {
            av_frame_free(&av_frame_);
            av_frame_ = nullptr;
        }
        if (codec_context_) {
            avcodec_close(codec_context_);
            avcodec_free_context(&codec_context_);
            codec_context_ = nullptr;
        }

        //待修改,还需要增加销毁transform的相关代码。
    }


    DecodeFFmpeg::~DecodeFFmpeg() {
        DecodeFFmpeg::Destroy();
    }

    void DecodeFFmpeg::ResetFlags() {
        eos_sent_ = false;
        created_ = false;
    }

    void DecodeFFmpeg::OnEos() {
        create_params_.OnEos(create_params_.userdata);
    }
   
    void DecodeFFmpeg::OnError(int errcode) {
        //convert the error code
        create_params_.OnError(static_cast<int>(errcode), create_params_.userdata);
    }

}  // namespace 

2 硬件相关的图像格式、内存申请接口、内存释放、内存释放等代码修改

infer_server/include/common/utils.hpp文件内容如下

cpp 复制代码
/*************************************************************************
 * Copyright (C) [2022] by Cambricon, Inc. All rights reserved
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *************************************************************************/

#ifndef COMMON_UTILS_HPP_
#define COMMON_UTILS_HPP_

#include <string>

#include <glog/logging.h>

#include "buf_surface.h"

#include "AclLite/AclLite.h"

#define _SAFECALL(func, expected, msg, ret_val)                                                     \
  do {                                                                                              \
    int _ret = (func);                                                                              \
    if ((expected) != _ret) {                                                                       \
      LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg;        \
      return (ret_val);                                                                             \
        }                                                                                               \
    } while (0)

#define ACL_SAFECALL(func, msg, ret_val) _SAFECALL(func, acllite::ACLLITE_OK, msg, ret_val)

#define _CALLFUNC(func, expected, msg)                                                     \
  do {                                                                                              \
    int _ret = (func);                                                                              \
    if ((expected) != _ret) {                                                                       \
      LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg;        \
                }                                                                                               \
        } while (0)

#define CALL_ACL_FUNC(func, msg) _CALLFUNC(func, acllite::ACLLITE_OK, msg)

inline BufSurfaceMemType CastMemoryType(acllite::MemoryType type) noexcept{
    switch (type) {
#define RETURN_MEMORY_TYPE(type) \
  case acllite::MemoryType::type:         \
    return BUF_##type;
        RETURN_MEMORY_TYPE(MEMORY_HOST)
            RETURN_MEMORY_TYPE(MEMORY_DEVICE)
            RETURN_MEMORY_TYPE(MEMORY_DVPP)
            RETURN_MEMORY_TYPE(MEMORY_NORMAL)
#undef RETURN_MEMORY_TYPE
  default:
      LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
      return BUF_MEMORY_HOST;
    }
}

inline acllite::MemoryType CastMemoryType(BufSurfaceMemType type) noexcept{
    switch (type) {
#define RETURN_MEMORY_TYPE(type)    \
  case BUF_##type: \
    return acllite::MemoryType::type;
        RETURN_MEMORY_TYPE(MEMORY_HOST)
            RETURN_MEMORY_TYPE(MEMORY_DEVICE)
            RETURN_MEMORY_TYPE(MEMORY_DVPP)
            RETURN_MEMORY_TYPE(MEMORY_NORMAL)
#undef RETURN_MEMORY_TYPE
  default:
      LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
      return acllite::MemoryType::MEMORY_HOST;
    }
}

inline BufSurfaceColorFormat CastColorFmt(acllite::ImageFormat format) {
    static std::map<acllite::ImageFormat, BufSurfaceColorFormat> color_map{
        { acllite::ImageFormat::YUV_SP_420, BUF_COLOR_FORMAT_NV12 },
        { acllite::ImageFormat::YVU_SP_420, BUF_COLOR_FORMAT_NV21 },
        { acllite::ImageFormat::RGB_888, BUF_COLOR_FORMAT_RGB },
        { acllite::ImageFormat::BGR_888, BUF_COLOR_FORMAT_BGR },
    };
    return color_map[format];
}

inline acllite::ImageFormat CastColorFmt(BufSurfaceColorFormat format) {
    static std::map<BufSurfaceColorFormat, acllite::ImageFormat> color_map{
        { BUF_COLOR_FORMAT_NV12, acllite::ImageFormat::YUV_SP_420 },
        { BUF_COLOR_FORMAT_NV21, acllite::ImageFormat::YVU_SP_420 },
        { BUF_COLOR_FORMAT_RGB, acllite::ImageFormat::RGB_888 },
        { BUF_COLOR_FORMAT_BGR, acllite::ImageFormat::BGR_888 },
    };
    return color_map[format];
}

#endif  // COMMON_UTILS_HPP_

上面这个文件内容目前是改成了华为acl相关的,现在把整个成功移植到英伟达的Jetson,那么这个文件的内容也要修改,另外,整个工程中所有用到了ACL_SAFECALL和CALL_ACL_FUNC的地方,不仅要把ACL_SAFECALL和CALL_ACL_FUNC这两个名字改掉,还要把用到这两个名字的硬件相关接口全都修改掉,比如

比如这张截图中,所有的这些acllite::AclLiteMalloc acllite::AclLiteFree acllite::AclLiteMemcpy这些接口都要相应的改成英伟达Jetson平台的接口。

2.1 infer_server/include/common/utils.hpp文件内容修改

cpp 复制代码
/*************************************************************************
 * Copyright (C) [2022] by Cambricon, Inc. All rights reserved
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *************************************************************************/

#ifndef COMMON_UTILS_HPP_
#define COMMON_UTILS_HPP_

#include <string>

#include <glog/logging.h>

#include "buf_surface.h"

#include "cuda_runtime.h"
#include "nvcv/ImageFormat.h"

#define _SAFECALL(func, expected, msg, ret_val)                                                     \
  do {                                                                                              \
    int _ret = (func);                                                                              \
    if ((expected) != _ret) {                                                                       \
      LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg;        \
      return (ret_val);                                                                             \
        }                                                                                               \
    } while (0)

#define CUDA_SAFECALL(func, msg, ret_val) _SAFECALL(func, cudaSuccess, msg, ret_val)

#define _CALLFUNC(func, expected, msg)                                                     \
  do {                                                                                              \
    int _ret = (func);                                                                              \
    if ((expected) != _ret) {                                                                       \
      LOG(ERROR) << "[InferServer] Call [" << #func << "] failed, ret = " << _ret << ". " << msg;        \
                }                                                                                               \
        } while (0)

#define CALL_CUDA_FUNC(func, msg) _CALLFUNC(func, cudaSuccess, msg)

inline BufSurfaceMemType CastMemoryType(cudaMemoryType type) noexcept{
    switch (type) {
    case cudaMemoryTypeUnregistered:
        return BUF_MEMORY_UNREGISTERED;
    case cudaMemoryTypeHost:
        return BUF_MEMORY_HOST;
    case cudaMemoryTypeDevice:
        return BUF_MEMORY_DEVICE;
    case cudaMemoryTypeManaged:
        return BUF_MEMORY_MANAGED;
  default:
      LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
      return BUF_MEMORY_HOST;
    }
}

inline cudaMemoryType CastMemoryType(BufSurfaceMemType type) noexcept{
    switch (type) {
    case BUF_MEMORY_UNREGISTERED:
        return cudaMemoryTypeUnregistered;
    case BUF_MEMORY_HOST:
        return cudaMemoryTypeHost;
    case BUF_MEMORY_DEVICE:
        return cudaMemoryTypeDevice;
    case BUF_MEMORY_MANAGED:
        return cudaMemoryTypeManaged;
  default:
      LOG(ERROR) << "[InferServer] CastMemoryType(): Unsupported memory type";
      return cudaMemoryTypeHost;
    }
}

inline BufSurfaceColorFormat CastColorFmt(NVCVImageFormat format) {
    static std::map<NVCVImageFormat, BufSurfaceColorFormat> color_map{
        { NVCV_IMAGE_FORMAT_NV12, BUF_COLOR_FORMAT_NV12 },
        { NVCV_IMAGE_FORMAT_NV21, BUF_COLOR_FORMAT_NV21 },
        { NVCV_IMAGE_FORMAT_RGB8, BUF_COLOR_FORMAT_RGB },
        { NVCV_IMAGE_FORMAT_BGR8, BUF_COLOR_FORMAT_BGR },
    };
    return color_map[format];
}

inline NVCVImageFormat CastColorFmt(BufSurfaceColorFormat format) {
    static std::map<BufSurfaceColorFormat, NVCVImageFormat> color_map{
        { BUF_COLOR_FORMAT_NV12, NVCV_IMAGE_FORMAT_NV12 },
        { BUF_COLOR_FORMAT_NV21, NVCV_IMAGE_FORMAT_NV21 },
        { BUF_COLOR_FORMAT_RGB, NVCV_IMAGE_FORMAT_RGB8 },
        { BUF_COLOR_FORMAT_BGR, NVCV_IMAGE_FORMAT_BGR8 },
    };
    return color_map[format];
}

#endif  // COMMON_UTILS_HPP_

2.2 cuda的四种内存

在CUDA编程中,内存类型是指定数据应该存储在哪种类型的内存中的关键概念。CUDA支持多种内存类型,每种类型都有其特定的用途和性能特点。以下是你提到的几种CUDA内存类型的解释和区别:

  1. cudaMemoryTypeUnregistered

    • 这个枚举值表示内存类型未注册。在CUDA中,通常不会直接使用这个值,因为它表示内存没有被明确指定为主机或设备内存。
  2. cudaMemoryTypeHost

    • 这表示内存是分配在主机(CPU)上的。主机内存可以被CPU直接访问,但GPU访问它通常较慢,因为需要通过PCIe总线进行数据传输。这种内存类型适用于需要CPU频繁访问的数据,或者在CPU和GPU之间需要频繁传输数据的场景。
  3. cudaMemoryTypeDevice

    • 这表示内存是分配在设备(GPU)上的。设备内存只能被GPU直接访问,CPU访问它需要通过CUDA的内存复制操作。这种内存类型适用于GPU计算密集型任务,因为数据已经在GPU上,可以减少数据传输的开销。
  4. cudaMemoryTypeManaged

    • 这表示内存是"统一内存",即由CUDA统一管理的内存。在这种内存类型下,数据可以同时被CPU和GPU访问,而不需要显式的数据复制操作。CUDA运行时会自动处理数据在主机和设备之间的迁移。这种内存类型简化了内存管理,但可能会引入额外的性能开销,因为运行时需要决定何时以及如何迁移数据。

区别总结:

  • cudaMemoryTypeHostcudaMemoryTypeDevice 提供了明确的内存位置,分别对应CPU和GPU,适用于对性能有明确要求的场景。
  • cudaMemoryTypeManaged 提供了更简单的编程模型,但可能会牺牲一些性能,因为数据迁移是自动和隐式的。
  • cudaMemoryTypeUnregistered 通常不用于实际编程,它更多是一个占位符,表示内存类型尚未确定。

2.3 infer_server/src/core/device.cpp修改

cpp 复制代码
/*************************************************************************
* Copyright (C) [2020] by Cambricon, Inc. All rights reserved
*
*  Licensed under the Apache License, Version 2.0 (the "License");
*  you may not use this file except in compliance with the License.
*  You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
* OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
* THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*************************************************************************/

#include <glog/logging.h>

#include "nvis/infer_server.h"

namespace infer_server {
    cudaMemcpyKind GetMemcpyKind(BufSurfaceMemType src_mem_type, BufSurfaceMemType dst_mem_type) {
        // 确保未注册内存不会被使用
        assert(src_mem_type != BUF_MEMORY_UNREGISTERED);
        assert(dst_mem_type != BUF_MEMORY_UNREGISTERED);

        // 根据源和目标内存类型确定 cudaMemcpyKind
        if (src_mem_type == BUF_MEMORY_HOST && dst_mem_type == BUF_MEMORY_HOST) {
            return cudaMemcpyHostToHost;
        }
        else if (src_mem_type == BUF_MEMORY_HOST && dst_mem_type == BUF_MEMORY_DEVICE) {
            return cudaMemcpyHostToDevice;
        }
        else if (src_mem_type == BUF_MEMORY_DEVICE && dst_mem_type == BUF_MEMORY_HOST) {
            return cudaMemcpyDeviceToHost;
        }
        else if (src_mem_type == BUF_MEMORY_DEVICE && dst_mem_type == BUF_MEMORY_DEVICE) {
            return cudaMemcpyDeviceToDevice;
        }
        else if (src_mem_type == BUF_MEMORY_MANAGED || dst_mem_type == BUF_MEMORY_MANAGED) {
            // 管理内存可以视为主机或设备内存
            return cudaMemcpyDefault;
        }

        // 默认情况下返回 cudaMemcpyDefault
        return cudaMemcpyDefault;
    }

    bool SetCurrentDevice(int device_id) noexcept{
        CUDA_SAFECALL(cudaSetDevice(device_id), "Set device failed", false);
        VLOG(3) << "[InferServer] SetCurrentDevice(): Set device [" << device_id << "] for this thread";
        return true;
    }

    uint32_t TotalDeviceCount() noexcept{
        uint32_t dev_cnt;
        CUDA_SAFECALL(cudaGetDeviceCount(dev_cnt), "Set device failed", 0);
        return dev_cnt;
    }

    bool CheckDevice(int device_id) noexcept{
        uint32_t dev_cnt;
        CUDA_SAFECALL(cudaGetDeviceCountdev_cnt), "Check device failed", false);
        return device_id < static_cast<int>(dev_cnt) && device_id >= 0;
    }

    void* MallocDeviceMem(size_t size) noexcept{
        void *device_ptr{};
        CUDA_SAFECALL(cudaMallocManaged(&device_ptr, size), "Malloc device memory failed", nullptr);
        return device_ptr;
    }

    int FreeDeviceMem(void *p) noexcept{
        CUDA_SAFECALL(cudaFree(p), "Free device memory failed", -1);
        return 0;
    }

    void* AllocHostMem(size_t size) noexcept{
        void *host_ptr{};
        CUDA_SAFECALL(cudaMallocHost(&host_ptr, size), "Malloc host memory failed", nullptr);
        return host_ptr;
    }

    int FreeHostMem(void *p) noexcept{
        CUDA_SAFECALL(cudaFreeHost(p), "Free host memory failed", -1);
    }

    int MemcpyHD(void* dst, BufSurfaceMemType dst_mem_type, void* src, BufSurfaceMemType src_mem_type, size_t size) noexcept{
        cudaMemcpyKind cpy_type = GetMemcpyKind(src_mem_type, dst_mem_type);
        CUDA_SAFECALL(cudaMemcpy(dst, src, size, cpy_type), "Memcpy HD failed", -1);
        return 0;
    }

    bool IsItegratedGPU(int device_id) {
        static int s_integrated = [device_id]() {
            cudaDeviceProp prop;
            CUDA_SAFECALL(cudaGetDeviceProperties(&prop, device_id));
        };
        return s_integrated == 1;
    }

    int GetCurrentDevice(int& device_id) noexcept{
        CUDA_SAFECALL(cudaGetDevice(&device_id));
        return 0;
    }

}  // namespace infer_server

3 图像缩放、裁剪、色域转换等代码编写----利用CV-CUDA

3.1 nvstream中图像处理代码流程框架

和前面一样,先大体看一下nvstream中图像处理的代码流程框架。

3.1.1 类的层次关系

cpp 复制代码
class TransformService
    class ITransformer 只是个基类,被用来继承的,
    class TransformerAcl : public ITransformer 然后就是硬件处理的了
        AclLiteImageProc

现在应该是直接把TransformerAcl 和 AclLiteImageProc合成一个英伟达的类。

3.1.2 各个类的初始化函数调用层次关系

cpp 复制代码
在视频解码类的create函数里面,或者在算法的Preproc都会有这样一行
if (TransformCreate(&transformer_, &config) != 0)这个transformer_ 是视频解码类或者算法预处理类的一个成员。
    这个create函数是这样的,
    int TransformCreate(void **transformer, TransformConfigParams *params) {
        return infer_server::TransformService::Instance().Create(transformer, params);
    }
    然后TransformService类的create函数里面有         
        ITransformer *transformer_ = CreateTransformer(); CreateTransformer里面就是new TransformerAcl也就是具体的硬件处理类了    
        transformer_->Create(params)
        *transformer = transformer_;回传给解码的那个类或者算法预处理类,

3.1.3 各个类的transform函数调用层次关系

cpp 复制代码
具体硬件解码处理类的OnFrame函数里面会有一个
Transform(transformer_, &transform_src, surf, &trans_params)
    应该是调用了这个
    int Transform(void *transformer, BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        return infer_server::TransformService::Instance().Transform(transformer, src, dst, transform_params);
    }
        然后到了这里
        int Transform(void *transformer, BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
            if (!dst || !src || !transform_params) {
                LOG(ERROR) << "[InferServer] [TransformService] Transform(): src, dst BufSurface or parameters pointer is invalid";
                return -1;
            }

            ITransformer *transformer_ = static_cast<ITransformer *>(transformer); 那个具体的硬件处理类就是继承的ITransformer 
            return transformer_->Transform(src, dst, transform_params);这就到了具体硬件transform类的函数了,
        }

3.2 编写图像缩放、裁剪、色域转换等代码

3.2.1 infer_server/src/nv/transform_impl_nv.hpp

cpp 复制代码
#ifndef TRANSFORM_IMPL_NV_HPP_
#define TRANSFORM_IMPL_NV_HPP_

#include <algorithm>
#include <cstring>  // for memset
#include <atomic>

#include "../transform_impl.hpp"
#include "transform.h"

namespace infer_server {

    class TransformerNV : public ITransformer {
    public:
        TransformerNV() {

        }
        ~TransformerNV() = default;

        int Create(TransformConfigParams *params) override;
        int Destroy() override;
        int Transform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) override;

    private:
        int DoNVTransform(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVCrop(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVCropResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVCropResizePaste(BufSurface *src, BufSurface *dst, TransformParams *transform_params);
        int NVConvertFormat(BufSurface *src, BufSurface *dst, TransformParams *transform_params);

    private:
        TransformConfigParams create_params_;
        cudaStream_t* cu_stream_{nullptr};
        std::shared_ptr<nvcv::ITensor> crop_tensor_;
        std::shared_ptr<nvcv::ITensor> resized_tensor_;
        std::shared_ptr<nvcv::ITensor> cvtcolor_tensor_;
        std::shared_ptr<nvcv::ITensor> copymakeborder_tensor_;

        std::shared_ptr<cvcuda::CustomCrop> crop_op_;    
        std::shared_ptr<cvcuda::Resize> resize_op_;
        std::shared_ptr<cvcuda::CvtColor> cvtcolor_op_;
        std::shared_ptr<cvcuda::CopyMakeBorder> copymakeborder_op_;
        

        std::atomic<bool> created_{};
    };

}  // namespace 

#endif  // TRANSFORM_IMPL_MLU370_HPP_

3.2.2 infer_server/src/nv/transform_impl_nv.cpp

cpp 复制代码
#include "transform_impl_nv.hpp"

#include <algorithm>
#include <atomic>
#include <cstring>  // for memset
#include <map>
#include <memory>
#include <string>
#include <vector>

#include "glog/logging.h"

namespace infer_server {
    int TransformerNV::Create(TransformConfigParams *params) {
        create_params_ = *params;
        
        if(nullptr == cu_stream_)
        {
            cuCreateStream(&cu_stream_, create_params_.device_id);
        }
        
        if (!crop_op_){
            crop_op_ = std::make_shared<cvcuda::CustomCrop>();
        }
        
        if (!cvtcolor_op_){
            cvtcolor_op_ = std::make_shared<cvcuda::CvtColor>();
        }

        if (!resize_op_){
            resize_op_ = std::make_shared<cvcuda::Resize>();
        }

        if (!copymakeborder_op_){
            copymakeborder_op_ = std::make_shared<cvcuda::CopyMakeBorder>();
        }

        created_ = true;
        return 0;
    }

    int TransformerNV::Destroy() {
        if (!created_) {
            LOG(WARNING) << "[InferServer] [TransformerNV] Destroy(): Transformer is not created";
            return 0;
        }

        if (cu_stream_ != nullptr) {
            cuDestroyStream(cu_stream_);
            cu_stream_ = nullptr;
        }

        cvtcolor_op_.reset();
        resize_op_.reset();
        crop_op_.reset();
        copymakeborder_op_.reset();

        return 0;
    }

    int TransformerNV::Transform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        if (!created_) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): transformer is not created";
            return -1;
        }

        if (src->num_filled > dst->batch_size) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The number of inputs exceeds batch size: "
                << src->num_filled << " v.s. " << dst->batch_size;
            return -1;
        }

        if (src->device_id != dst->device_id || src->device_id != create_params_.device_id) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The device id of src, dst and transformer is not the same: src device: " << src->device_id
                << " , dst device: " << dst->device_id
                << " , transformer device: " << create_params_.device_id;
            return -1;
        }

        if (src->mem_type != BUF_MEMORY_DVPP) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): The src and dst mem_type must be BUF_MEMORY_DVPP";
            return -1;
        }

        if (src->surface_list[0].data_size == 0) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): Input data size is 0";
            return -1;
        }

        return DoNVTransform(src, dst, transform_params);
    }

    int TransformerNV::DoNVTransform(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        switch (transform_params->transform_flag) {
        case TRANSFORM_RESIZE_SRC:
            return NVResize(src, dst, transform_params);
            break;
        case TRANSFORM_CROP_SRC:
            return NVCrop(src, dst, transform_params);
            break;
        case TRANSFORM_CROP_RESIZE_SRC:
            return NVCropResize(src, dst, transform_params);
            break;
        case TRANSFORM_CROP_RESIZE_PASTE_SRC:
            return NVCropResizePaste(src, dst, transform_params);
            break;
        default:
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): Transform flag not supported currently, flag = " << transform_params->transform_flag;
            return -1;
        }
    }

    int TransformerNV::NVResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        if (src->batch_size != 1 || dst->batch_size != 1) {
            LOG(ERROR) << "[InferServer] [TransformerNV] Transform(): NVResize now only support src/dst one image, src batch size = " << src->batch_size <<" , dst batch size = " << dst->batch_size;
            return -1;
        }

        auto& src_surf = src->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);
    
        if (!resized_tensor_){
            resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {dst.width_stride, src_surf.height}, nvcv::FMT_BGR8));
        }

        (*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *resized_tensor_, NVCV_INTERP_LINEAR);

        auto& dst_surf = dst->surface_list[0];
        auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        resized_tensor_.reset();
        
        return 0;
    }

    int TransformerNV::NVCrop(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        auto& src_surf = src->surface_list[0];
        auto& dst_surf = dst->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);
    
        if (!crop_tensor_){
            crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {dst_surf.width, dst_surf.height}, nvcv::FMT_BGR8));
        }
        
        TransformRect rect = transform_params->src_rect[0];
        rect.left = rect.left >= src_surf.width ? 0 : rect.left;
        rect.top = rect.top >= src_surf.height ? 0 : rect.top;
        rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
        rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
        NVCVRectI crpRect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };

        (*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crpRect);


        auto out_data = crop_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        crop_tensor_.reset();
        
        return 0;
    }

    int TransformerNV::NVCropResize(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {

        auto& src_surf = src->surface_list[0];
        auto& dst_surf = dst->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);

        
        TransformRect rect = transform_params->src_rect[0];
        rect.left = rect.left >= src_surf.width ? 0 : rect.left;
        rect.top = rect.top >= src_surf.height ? 0 : rect.top;
        rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
        rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
        NVCVRectI crpRect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };
            
        if (!crop_tensor_){
            crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {rect.width, rect.height}, nvcv::FMT_BGR8));
        }

        (*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crpRect);
        
        if (!resized_tensor_){
            resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, { dst_surf.width, dst_surf.height }, nvcv::FMT_BGR8));
        }

        (*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *crop_tensor_, *resized_tensor_, NVCV_INTERP_LINEAR);

        auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        crop_tensor_.reset();
        resized_tensor_.reset();

        return 0;
    }

    int TransformerNV::NVCropResizePaste(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {
        auto& src_surf = src->surface_list[0];
        auto& dst_surf = dst->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);

        
        TransformRect rect = transform_params->src_rect[0];
        rect.left = rect.left >= src_surf.width ? 0 : rect.left;
        rect.top = rect.top >= src_surf.height ? 0 : rect.top;
        rect.width = rect.width <= 0 ? (src_surf.width - rect.left) : rect.width;
        rect.height = rect.height <= 0 ? (src_surf.height - rect.top) : rect.height;
        NVCVRectI crop_src_rect = { rect.left, rect.top, rect.left + rect.width - 1, rect.top + rect.height - 1 };

        TransformRect dst_rect = transform_params->dst_rect[0];
        dst_rect.left = dst_rect.left >= src_surf.width ? 0 : dst_rect.left;
        dst_rect.top = dst_rect.top >= src_surf.height ? 0 : dst_rect.top;
        dst_rect.width = dst_rect.width <= 0 ? (src_surf.width - dst_rect.left) : dst_rect.width;
        dst_rect.height = dst_rect.height <= 0 ? (src_surf.height - dst_rect.top) : dst_rect.height;
        NVCVRectI paste_dst_rect(dst_rect.left, dst_rect.top, dst_rect.left + dst_rect.width - 1, dst_rect.top + dst_rect.height - 1);
            
        if (!crop_tensor_){
            crop_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {rect.width, rect.height}, nvcv::FMT_BGR8));
        }

        (*crop_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *crop_tensor_, crop_src_rect);
        
        if (!resized_tensor_){
            resized_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {  dst_rect.width,  dst_rect.height }, nvcv::FMT_BGR8));
        }

        (*resize_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *crop_tensor_, *resized_tensor_, NVCV_INTERP_LINEAR);


        if (!copymakeborder_tensor_){
            copymakeborder_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, { dst_surf.width, dst_surf.height }, nvcv::FMT_BGR8));
        }

       (*copymakeborder_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), *resized_tensor_, *copymakeborder_tensor_, dst_rect.top, dst_rect.left, NVCV_BORDER_CONSTANT, 0);

        auto out_data = copymakeborder_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        crop_tensor_.reset();
        resized_tensor_.reset();
        copymakeborder_tensor_.reset();

        return 0;
    }

    int TransformerNV::NVConvertFormat(BufSurface *src, BufSurface *dst, TransformParams *transform_params) {

        auto& src_surf = src->surface_list[0];

        nvcv::Tensor::Requirements in_reqs = nvcv::Tensor::CalcRequirements(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_NV12);
        nvcv::TensorDataStridedCuda::Buffer in_buf;
        std::copy(in_reqs.strides, in_reqs.strides + NVCV_TENSOR_MAX_RANK, in_buf.strides);
        in_buf.basePtr = reinterpret_cast<NVCVByte *>(src->data_ptr);

        nvcv::TensorDataStridedCuda in_data(nvcv::TensorShape{in_reqs.shape, in_reqs.rank, in_reqs.layout}, nvcv::DataType{in_reqs.dtype}, in_buf);
        nvcv::TensorWrapData in_tensor(in_data);
    
        if (!cvtcolor_tensor_){
            cvtcolor_tensor_ = std::shared_ptr<nvcv::Tensor>(new nvcv::Tensor(1, {src_surf.width_stride, src_surf.height}, nvcv::FMT_BGR8));
        }

        (*cvtcolor_op_)(reinterpret_cast<cudaStream_t>(cu_stream_), in_tensor, *cvtcolor_tensor_, NVCV_COLOR_YUV2BGR_NV12);

        auto& dst_surf = dst->surface_list[0];
        auto out_data = resized_tensor_->exportData<nvcv::TensorDataStridedCuda>();

        cudaMemcpyAsync(dst_surf.data_ptr, (const unsigned char *)out_data->basePtr(), dst_surf.data_size, cudaMemcpyDeviceToHost);
        cuStreamSynchronize(cu_stream_);

        cvtcolor_tensor_.reset();
        return 0;
    }


}  // namespace 

4 算法推理相关代码修改

4.1 ./infer_server/src/model/model.h

cpp 复制代码
/*************************************************************************
 * Copyright (C) [2020] by Cambricon, Inc. All rights reserved
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *************************************************************************/

#ifndef INFER_SERVER_MODEL_H_
#define INFER_SERVER_MODEL_H_

#include <glog/logging.h>
#include <algorithm>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

#include "nvis/infer_server.h"
#include "nvis/processor.h"
#include "nvis/shape.h"

namespace trt {
#include "trteng_exp/export_funtions.h"
}

namespace infer_server {
    using TEngine = std::unique_ptr<trt::trtnet_t>;

    class Model;
    class ModelRunner {
    public:
        explicit ModelRunner(int device_id) : device_id_(device_id) {}
        ModelRunner(const ModelRunner& other) = delete;
        ModelRunner& operator=(const ModelRunner& other) = delete;
        ModelRunner(ModelRunner&& other) = default;
        ModelRunner& operator=(ModelRunner&& other) = default;
        ~ModelRunner() = default;

        bool Init(TEngine engine) noexcept;
        Status Run(ModelIO* input, ModelIO* output) noexcept;  // NOLINT

        void SetInputNum(uint32_t  i_num) noexcept{ input_num_ = i_num; }
        void SetOutputNum(uint32_t  o_num) noexcept{ output_num_ = o_num; }

        void SetInputShapes(std::vector<Shape> const& i_shapes) noexcept{ i_shapes_ = i_shapes; }
        void SetOutputShapes(std::vector<Shape> const& o_shapes) noexcept{ o_shapes_ = o_shapes; }

        void SetInputLayouts(std::vector<DataLayout> const& i_layouts) noexcept{ i_layouts_ = i_layouts; }
        void SetOutputLayouts(std::vector<DataLayout> const& o_layouts) noexcept{ o_layouts_ = o_layouts; }

    private:
        TEngine engine_{};

        uint32_t input_num_{};
        uint32_t output_num_{};

        std::vector<Shape> i_shapes_;
        std::vector<Shape> o_shapes_;
        std::vector<DataLayout> i_layouts_;
        std::vector<DataLayout> o_layouts_;

        int device_id_{};
    };  // class RuntimeContext

    class Model : public ModelInfo {
    public:
        Model() = default;
        bool Init(const std::string& model_path) noexcept;
        ~Model();

        bool HasInit() const noexcept{ return has_init_; }

            const Shape& InputShape(int index) const noexcept override{
            CHECK(index < i_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
            return input_shapes_[index];
        }
            const Shape& OutputShape(int index) const noexcept override{
            CHECK(index < o_num_ || index >= 0) << "[InferServer] [Model] Output shape index overflow";
            return output_shapes_[index];
        }
            const DataLayout& InputLayout(int index) const noexcept override{
            CHECK(index < i_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
            return i_layouts_[index];
        }
            const DataLayout& OutputLayout(int index) const noexcept override{
            CHECK(index < o_num_ || index >= 0) << "[InferServer] [Model] Input shape index overflow";
            return o_layouts_[index];
        }
        uint32_t InputNum() const noexcept override{ return i_num_; }
        uint32_t OutputNum() const noexcept override{ return o_num_; }
        uint32_t BatchSize() const noexcept override{ return model_batch_size_; }

        bool FixedOutputShape() noexcept override{ return FixedShape(output_shapes_); }

        std::shared_ptr<ModelRunner> GetRunner(int device_id) noexcept{
            trt::ErrInfo ei{};
            trt::trtnet_t* net = trt::load_net_from_file(model_file_.data(), &ei);
            CHECK(!net) << "trt::load_net_from_file failed: model file: " << te2fullpath << ", error: " << ei.errmsg;
            TEngine engine = TEngine(net, [](trt::trtnet_t* n) { trt::release_net(n); });

            auto runner = std::make_shared<ModelRunner>(device_id);
            runner->SetInputNum(i_num_);
            runner->SetOutputNum(o_num_);
            runner->SetInputShapes(input_shapes_);
            runner->SetOutputShapes(output_shapes_);
            runner->SetInputLayouts(i_layouts_);
            runner->SetOutputLayouts(o_layouts_);     

            if (!runner->Init(std::move(engine))) return nullptr;
            return runner;
        }

        std::string GetKey() const noexcept override{ return model_file_; }

    private:
        bool GetModelInfo(trt::trtnet_t* net) noexcept;
        bool FixedShape(const std::vector<Shape>& shapes) noexcept{
            for (auto &shape : shapes) {
                auto vectorized_shape = shape.Vectorize();
                if (!std::all_of(vectorized_shape.begin(), vectorized_shape.end(), [](int64_t v) { return v > 0; })) {
                    return false;
                }
            }
            return !shapes.empty();
        }

        Model(const Model&) = delete;
        Model& operator=(const Model&) = delete;

    private:
        std::string model_file_;

        std::vector<DataLayout> i_layouts_, o_layouts_;
        std::vector<Shape> input_shapes_, output_shapes_;
        int i_num_{}, o_num_{};
        uint32_t model_batch_size_{ 1 };
        bool has_init_{ false };
    };  // class Model

    // use environment CNIS_MODEL_CACHE_LIMIT to control cache limit
    class ModelManager {
    public:
        static ModelManager* Instance() noexcept;

        void SetModelDir(const std::string& model_dir) noexcept{ model_dir_ = model_dir; }

        ModelPtr Load(const std::string& model_file) noexcept;
        ModelPtr Load(void* mem_ptr, size_t size) noexcept;
        bool Unload(ModelPtr model) noexcept;

        void ClearCache() noexcept;

        int CacheSize() noexcept;

        std::shared_ptr<Model> GetModel(const std::string& name) noexcept;

    private:
        std::string DownloadModel(const std::string& url) noexcept;
        void CheckAndCleanCache() noexcept;

        std::string model_dir_{ "." };

        static std::unordered_map<std::string, std::shared_ptr<Model>> model_cache_;
        static std::mutex model_cache_mutex_;
    };  // class ModelManager

}  // namespace infer_server

#endif  // INFER_SERVER_MODEL_H_

4.2 ./infer_server/src/model/model.cpp

cpp 复制代码
/*************************************************************************
 * Copyright (C) [2020] by Cambricon, Inc. All rights reserved
 *
 *  Licensed under the Apache License, Version 2.0 (the "License");
 *  you may not use this file except in compliance with the License.
 *  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
 * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
 * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 *************************************************************************/

#include "model.h"

#include <glog/logging.h>
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "core/data_type.h"
#include "common/utils.hpp"

using std::string;
using std::vector;

namespace infer_server {

    bool ModelRunner::Init(TEngine engine) noexcept{
        if (engine == nullptr) return false;
        engine_ = std::move(engine);
        return true;
    }

        Status ModelRunner::Run(ModelIO* in, ModelIO* out) noexcept{  // NOLINT
        auto& input = in->surfs;
        auto& output = out->surfs;
        CHECK_EQ(input_num_, input.size()) << "[InferServer] [ModelRunner] Input number is mismatched";

        VLOG(5) << "[InferServer] [ModelRunner] Process inference once, input num: " << input_num_ << " output num: "
            << output_num_;

        std::vector<trt::NetInoutLayerData> inputData(input_num_);
        std::vector<trt::NetInoutLayerData> outputData(output_num_);

        for (uint32_t i_idx = 0; i_idx < input_num_; ++i_idx) {
            CHECK_EQ(input_num_, input.size()) << "[InferServer] [ModelRunner] Input number is mismatched";

            inputData[i].data = input[i_idx]->GetData(0);
            inputData[i].size = input[i_idx]->GetSize(0);
            inputData[i].layer_idx = i_idx;
        }

        for (uint32_t o_idx = 0; o_idx < output_num_; ++o_idx) {
            CHECK_EQ(output_num_, output.size()) << "[InferServer] [ModelRunner] Output number is mismatched";

            outputData[i].data = input[o_idx]->GetData(0);
            outputData[i].size = input[o_idx]->GetSize(0);
            outputData[i].layer_idx = o_idx;
        }

        uint32_t batchsize = input[0]->GetNumFilled();
        trt::ErrInfo ei{};
        _SAFECALL(trt::net_do_inference(engine_.get(), batchsize, inputData.data(), inputData.size(), outputData.data(), outputData.size(), &ei)
                  , "[InferServer] [ModelRunner] Infer failed.", Status::ERROR_BACKEND);

        return Status::SUCCESS;
    }

        bool Model::Init(const string& model_file) noexcept{
        model_file_ = model_file;

        trt::ErrInfo ei{};
        trt::trtnet_t* net = trt::load_net_from_file(model_file_.data(), &ei);
        CHECK(!net) << "trt::load_net_from_file failed: model file: " << te2fullpath << ", error: " << ei.errmsg;

        has_init_ = GetModelInfo(model);

        VLOG(1) << "[InferServer] [Model] (success) Load model from file: " << model_file_;

        trt::release_net(net);

        return has_init_;
    }

        bool Model::GetModelInfo(trt::trtnet_t* net) noexcept{
        VLOG(1) << "[InferServer] [Model] (success) Load model from graph file: " << model_file_;

        int model_batch_size_ = trt::net_max_batch_size(net);

        // get IO messages
        // get io number and data size
        i_num_ = trt::net_num_inputs(net);
        o_num_ = trt::net_num_outputs(net);

        // get input info
        for (int i = 0; i < ninp; i++) {
            trt::LayerDims ldim{};
            trt::net_input_layer_dims(net, i, &ldim);
            input_shapes_.emplace_back(std::move(Shape({ std::max(ldim.n, model_batch_size_), ldim.c, ldim.h, ldim.w })));

            DataLayout layout;
            layout.dtype = DataType::FLOAT;
            layout.order = DimOrder::NCHW;
            i_layouts_.emplace_back(std::move(layout));
        }

        // get output info
        int noup = net_num_outputs(net);
        runner->SetOutputNum(noup);
        for (int i = 0; i < noup; i++) {
            trt::LayerDims ldim{};
            trt::net_output_layer_dims(net, i, &ldim);
            output_shapes_.emplace_back(std::move(Shape({ std::max(ldim.n, model_batch_size_), ldim.c, ldim.h, ldim.w })));

            DataLayout layout;
            layout.dtype = DataType::FLOAT;
            layout.order = DimOrder::NCHW;
            o_layouts_.emplace_back(std::move(layout));
        }

        VLOG(1) << "[InferServer] [Model] Model Info: input number = " << i_num_ << ";\toutput number = " << o_num_;
        VLOG(1) << "[InferServer] [Model]             batch size = " << model_batch_size_;
        for (int i = 0; i < i_num_; ++i) {
            VLOG(1) << "[InferServer] [Model] ----- input index [" << i;
            VLOG(1) << "[InferServer] [Model]       data type " << detail::DataTypeStr(i_layouts_[i].dtype);
            VLOG(1) << "[InferServer] [Model]       dim order " << detail::DimOrderStr(i_layouts_[i].order);
            VLOG(1) << "[InferServer] [Model]       shape " << input_shapes_[i];
        }
        for (int i = 0; i < o_num_; ++i) {
            VLOG(1) << "[InferServer] [Model] ----- output index [" << i;
            VLOG(1) << "[InferServer] [Model]       data type " << detail::DataTypeStr(o_layouts_[i].dtype);
            VLOG(1) << "[InferServer] [Model]       dim order " << detail::DimOrderStr(o_layouts_[i].order);
            VLOG(1) << "[InferServer] [Model]       shape " << output_shapes_[i];
        }
        return true;
    }

        Model::~Model() {
        VLOG(1) << "[InferServer] [Model] Unload model: " << model_file_;
    }
}  // namespace infer_server

5 其他代码修改

其他的还有很多零碎代码,比如一些命名空间的名字,还有一些其他名称,还有很多文件里面调用的一些函数的名字,太乱了,不写到博客里面了。

参考文献:

在NVIDIA Jetson AGX Orin中使用jetson-ffmpeg调用硬件编解码加速处理-CSDN博客

NVIDIA Jetson AGX Orin源码编译安装CV-CUDA-CSDN博客

GitHub - Cambricon/CNStream: CNStream is a streaming framework for building Cambricon machine learning pipelines http://forum.cambricon.com https://gitee.com/SolutionSDK/CNStream

easydk/samples/simple_demo/common/video_decoder.cpp at master · Cambricon/easydk · GitHub

aclStream流处理多路并发Pipeline框架中 视频解码 代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客

aclStream流处理多路并发Pipeline框架中VEncode Module代码调用流程整理、类的层次关系整理、回调函数赋值和调用流程整理-CSDN博客

FFmpeg/doc/examples at master · FFmpeg/FFmpeg · GitHub

GitHub - CVCUDA/CV-CUDA: CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

如何使用FFmpeg的解码器---FFmpeg API教程 · FFmpeg原理

C++ API --- CV-CUDA Beta documentation (cvcuda.github.io)

CV-CUDA/tests/cvcuda/system at main · CVCUDA/CV-CUDA · GitHub

Resize --- CV-CUDA Beta documentation

CUDA Runtime API :: CUDA Toolkit Documentation

CUDA Toolkit Documentation 12.6 Update 1

相关推荐
qq_316837755 小时前
使用ffmpeg 将图片合成为视频,填充模糊背景,并添加两段音乐
ffmpeg·音视频
林鸿群10 小时前
Mediamtx与FFmpeg远程与本地推拉流使用
ffmpeg
unix2linux2 天前
YOLO v5 Series - FFmpeg & (HTML5 + FLV.js ) & ONNX YOLOv5s Integrating
yolo·ffmpeg·html5
Antonio9152 天前
【音视频】FFmpeg解封装
ffmpeg·音视频
宽容人厚载物2 天前
Jetson Orin NX 部署YOLOv12笔记
嵌入式硬件·yolo·jetson·yolov12·英伟达开发板
Antonio9152 天前
【音视频】FFmpeg内存模型
ffmpeg·音视频
hjjdebug2 天前
全面介绍AVFilter 的添加和使用
ffmpeg·avfilter
邪恶的贝利亚2 天前
基于 FFmpeg 的音视频处理基础原理与实验探究
ffmpeg·音视频
Antonio9153 天前
【音视频】AAC-ADTS分析
ffmpeg·音视频·aac
这被禁忌的游戏3 天前
网页下载的m3u8格式文件使用FFmpeg转为MP4
ffmpeg