Introduction to Deep Learning (4): Object Detection

Object Detection

Output (for each object in the image):

  1. a category label from a fixed, known set of categories
  2. a bounding box (x, y, width, height)

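If only one object needs to be detected -> take a network pretrained on ImageNet and, next to the FC layer that outputs the class scores, add a second FC layer that outputs the 4 numbers of the bounding box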
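A common way to train these two heads together is a multitask loss: a softmax (cross-entropy) loss on the class scores plus an L2 regression loss on the 4 box numbers, summed. The sketch below assumes exactly that setup; the function names and the `box_weight` balancing factor are hypothetical, chosen only for illustration.

```python
import math


def softmax_cross_entropy(scores, label):
    """Cross-entropy of the softmax over `scores` against the true class index."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]


def l2_box_loss(pred_box, gt_box):
    """Squared L2 distance between predicted and ground-truth (x, y, w, h)."""
    return sum((p - g) ** 2 for p, g in zip(pred_box, gt_box))


def single_object_loss(scores, label, pred_box, gt_box, box_weight=1.0):
    """Multitask loss: classification loss + weighted box regression loss.

    `box_weight` is a hypothetical hyperparameter balancing the two terms.
    """
    return softmax_cross_entropy(scores, label) + box_weight * l2_box_loss(pred_box, gt_box)


# Two classes with equal scores and a box that is off by 1 pixel in x:
print(single_object_loss([0.0, 0.0], 0, (11, 20, 30, 40), (10, 20, 30, 40)))  # log(2) + 1 ≈ 1.693
```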

Sliding Window

Apply a CNN to many different crops of the image; the CNN classifies each crop as object or background

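But there are far too many possible windows (every position, size and aspect ratio), and the same object may be detected repeatedly by overlapping windows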
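To see how many windows there are, count every axis-aligned box in a W * H image: a box of width w fits in W - w + 1 horizontal positions, so summing over all widths gives W(W+1)/2, and likewise H(H+1)/2 for the heights. The sketch below (function names hypothetical) checks this closed form against a brute-force sum.

```python
def count_windows(W, H):
    """Closed form: (number of horizontal extents) * (number of vertical extents)."""
    return (W * (W + 1) // 2) * (H * (H + 1) // 2)


def count_windows_brute_force(W, H):
    """Sum over every box width w and height h of the positions where it fits."""
    return sum((W - w + 1) * (H - h + 1)
               for w in range(1, W + 1)
               for h in range(1, H + 1))


print(count_windows(800, 600))  # 57768120000, i.e. about 58 billion boxes
```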

Instead, we need region proposals: a small set of boxes that are likely to cover all the objects

"Selective Search" is a fast, classical (non-learned) method that generates about 2000 region proposals per image

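R-CNN: Region-Based CNN

  1. Region proposals: get about 2000 candidate regions (e.g. from Selective Search)
  2. Warp each region to a fixed size of 224*224
  3. Forward each warped region through a ConvNet independently
  4. Output a classification score for each region, and also a box transform of 4 numbers that refines the region proposal into the final bounding box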
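In R-CNN the 4 box numbers are predicted as a transform (t_x, t_y, t_w, t_h) relative to the proposal rather than as absolute coordinates. A minimal sketch, assuming the parametrization from the R-CNN paper with boxes written as (center x, center y, width, height): the center shifts by a fraction of the proposal size, and the width and height scale in log space. The function names are hypothetical.

```python
import math


def apply_box_transform(proposal, transform):
    """Turn a proposal (px, py, pw, ph) plus (tx, ty, tw, th) into the output box."""
    px, py, pw, ph = proposal  # center x, center y, width, height
    tx, ty, tw, th = transform
    return (px + pw * tx,       # shift center by a fraction of the width
            py + ph * ty,       # shift center by a fraction of the height
            pw * math.exp(tw),  # scale width in log space
            ph * math.exp(th))  # scale height in log space


def box_transform_target(proposal, gt_box):
    """Inverse: the transform mapping `proposal` exactly onto `gt_box` (training target)."""
    px, py, pw, ph = proposal
    bx, by, bw, bh = gt_box
    return ((bx - px) / pw, (by - py) / ph, math.log(bw / pw), math.log(bh / ph))


proposal = (10.0, 20.0, 4.0, 8.0)
print(apply_box_transform(proposal, (0.5, -0.25, math.log(2), 0.0)))  # ≈ (12.0, 18.0, 8.0, 8.0)
```

A transform of all zeros leaves the proposal unchanged, so the network only has to learn small corrections.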

Comparing Boxes: Intersection over Union (IoU)

$$IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}}$$

$IoU > 0.5$ is decent

$IoU > 0.7$ is pretty good

$IoU > 0.9$ is almost perfect
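A minimal IoU sketch, assuming boxes are given as corner coordinates (x0, y0, x1, y1) rather than the (x, y, width, height) form used earlier; converting is just x1 = x + width, y1 = y + height. The function name is hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two boxes in (x0, y0, x1, y1) corner format."""
    # Overlap rectangle: larger of the left/top edges, smaller of the right/bottom edges
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)  # 0 when the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter  # otherwise the overlap would be counted twice
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.143
```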

Overlapping Boxes: Non-Max Suppression (NMS)
  1. Select the highest-scoring remaining box
  2. Eliminate the lower-scoring boxes whose IoU with the box selected in step 1 is > 0.7
  3. If any boxes remain, go to step 1
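A sketch of the greedy NMS loop above, assuming corner-format boxes and 0.7 as the default threshold; in practice NMS is typically run separately for each category. `iou` is redefined so the block stands alone, and all names are hypothetical.

```python
def iou(a, b):
    """IoU of two (x0, y0, x1, y1) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy non-max suppression; returns the indices of the kept boxes."""
    # Candidate indices, highest score first
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while remaining:                       # step 3: loop while boxes remain
        best = remaining.pop(0)            # step 1: highest-scoring remaining box
        keep.append(best)
        remaining = [i for i in remaining  # step 2: drop boxes overlapping it too much
                     if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]: box 1 has IoU 0.81 with box 0
```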

Evaluating Object Detectors: mAP (Mean Average Precision)

  1. Run the detector on all test images, followed by NMS

  2. For each category, compute AP = area under the precision vs. recall (PR) curve

     1. For each detection, from highest to lowest score:
        1. If it matches some GT (ground-truth) box with IoU > 0.5, mark it as positive and eliminate that GT box
        2. Otherwise, mark it as negative
        3. Plot a point on the PR curve (precision and recall over all detections processed so far)
     2. AP = area under the PR curve
  3. mAP = average of AP for each category

  4. COCO mAP: compute mAP at each IoU threshold (0.5, 0.55, ..., 0.95) and take the average

How to get AP = 1.0 -> every GT box is hit by some detection with IoU > 0.5, and no false positive is ranked above any true positive
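A sketch of the AP procedure for one category, under a few stated assumptions: boxes are in corner format, a detection counts as positive only if it has IoU > 0.5 with a GT box that is still unmatched, and the area under the PR curve is taken as the plain step-wise sum of precision * (increase in recall). PASCAL VOC and COCO use interpolated precision instead, so this will not reproduce their official numbers exactly. All names are hypothetical; `coco_style_ap` simply averages this AP over the thresholds 0.5, 0.55, ..., 0.95.

```python
def iou(a, b):
    """IoU of two (x0, y0, x1, y1) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, gt_boxes, iou_threshold=0.5):
    """AP for one category. `detections` is a list of (score, box) pairs."""
    if not gt_boxes:
        return 0.0  # AP is undefined without GT boxes; return 0 for simplicity
    unmatched = list(gt_boxes)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):  # high -> low
        # Find the best still-unmatched GT box above the IoU threshold
        best_j, best_iou = None, iou_threshold
        for j, gt in enumerate(unmatched):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            unmatched.pop(best_j)  # positive: eliminate the matched GT box
            tp += 1
        else:
            fp += 1                # negative: no GT box left to match
        precision, recall = tp / (tp + fp), tp / len(gt_boxes)
        ap += precision * (recall - prev_recall)  # area of the new PR-curve step
        prev_recall = recall
    return ap


def coco_style_ap(detections, gt_boxes):
    """Average the AP above over IoU thresholds 0.5, 0.55, ..., 0.95."""
    thresholds = [0.5 + 0.05 * k for k in range(10)]
    return sum(average_precision(detections, gt_boxes, t) for t in thresholds) / len(thresholds)


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (50, 50, 60, 60)), (0.8, (0, 0, 10, 10)), (0.7, (20, 20, 30, 30))]
print(average_precision(dets, gts))  # 0.5 * 0.5 + (2/3) * 0.5 ≈ 0.583
```

Moving the false positive to the lowest score gives AP = 1.0, consistent with the condition stated above: a false positive ranked below every true positive adds no area.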

Fast R-CNN

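  1. Run the ConvNet (backbone network) once on the entire high-resolution image to get convolutional features
  2. Get Regions of Interest (RoIs) from a region proposal method
  3. Crop + resize the features of each RoI
  4. Run a per-region network on each RoI; it is lightweight, which makes the whole pipeline fast
  5. Output the category and box for each region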

Cropping Features: RoI Pool

  1. Project the proposal onto the features
  2. Snap it to grid cells
  3. Divide it into a 2*2 grid of (roughly) equal subregions
  4. Max-pool within each subregion
  5. Output the region features (always the same size, even when the input regions have different sizes)
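A single-channel RoI Pool sketch following the five steps, assuming: the feature map is a plain list of lists, the proposal is (x0, y0, x1, y1) in image pixels, `stride` is a hypothetical downsampling factor of the backbone used for the projection, and the projected box lies inside the feature map. Bin boundaries use floor for the start and ceil for the end, which is one common way to get "roughly equal" subregions.

```python
import math


def roi_pool(feat, box, stride, out_size=2):
    """Max-pool the features under `box` into an out_size x out_size grid."""
    # Steps 1-2: project the proposal onto the feature map, then snap to grid cells
    x0, y0, x1, y1 = (int(round(v / stride)) for v in box)
    h, w = max(y1 - y0, 1), max(x1 - x0, 1)  # keep at least one cell
    out = []
    for i in range(out_size):
        # Step 3: rows of bin i (floor start, ceil end, so bins may share a cell)
        r0 = y0 + (i * h) // out_size
        r1 = y0 + math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            c0 = x0 + (j * w) // out_size
            c1 = x0 + math.ceil((j + 1) * w / out_size)
            # Step 4: max-pool within the subregion
            row.append(max(feat[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        out.append(row)
    return out  # Step 5: always out_size x out_size, whatever the box size


feat = [[4 * r + c for c in range(4)] for r in range(4)]  # 4 x 4 map, values 0..15
print(roi_pool(feat, (0, 0, 64, 64), stride=16))  # [[5, 7], [13, 15]]
```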

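RoI Align

RoI Align -> no snapping: sample the features at regularly spaced points inside each subregion using bilinear interpolation, then pool the samples, so the region features stay aligned with the original box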
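A matching RoI Align sketch under similar assumptions (single channel, list-of-lists features, a hypothetical `stride`, hypothetical names). Unlike RoI Pool, the projected box is never rounded, and each subregion is summarized by bilinearly interpolated samples: here a 2 * 2 grid of sample points per bin, averaged, which is one common choice (taking the max is also used). The sketch treats `feat[r][c]` as the value at the point (x = c, y = r).

```python
import math


def bilinear(feat, x, y):
    """Bilinearly interpolate feat at real-valued (x, y), clamped to the map."""
    H, W = len(feat), len(feat[0])
    x = min(max(x, 0.0), W - 1.0)
    y = min(max(y, 0.0), H - 1.0)
    c0, r0 = int(math.floor(x)), int(math.floor(y))
    c1, r1 = min(c0 + 1, W - 1), min(r0 + 1, H - 1)
    dx, dy = x - c0, y - r0
    top = feat[r0][c0] * (1 - dx) + feat[r0][c1] * dx
    bottom = feat[r1][c0] * (1 - dx) + feat[r1][c1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feat, box, stride, out_size=2, samples=2):
    """Pool the features under `box` into out_size x out_size bins without snapping."""
    x0, y0, x1, y1 = (v / stride for v in box)  # project, but do NOT round
    bin_w, bin_h = (x1 - x0) / out_size, (y1 - y0) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = 0.0
            for sy in range(samples):  # regularly spaced sample points inside bin (i, j)
                for sx in range(samples):
                    y = y0 + (i + (sy + 0.5) / samples) * bin_h
                    x = x0 + (j + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feat, x, y)
            row.append(total / (samples * samples))  # average the samples
        out.append(row)
    return out


feat = [[c + 10 * r for c in range(4)] for r in range(4)]  # value = x + 10 * y
print(roi_align(feat, (0, 0, 2, 2), stride=1))  # ≈ [[5.5, 6.5], [15.5, 16.5]]
```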

Faster R-CNN

Insert a Region Proposal Network (RPN) that predicts proposals from the backbone features

backbone network -> RPN -> region proposals

Imagine an anchor box of fixed size at each point in the feature map

At each point predict whether the corresponding anchor contains an object

For positive anchors, also predict a box transform that regresses from the anchor box to the object box

Use k different anchor boxes (different sizes and aspect ratios) at each point
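A sketch of generating the k anchors at every feature-map point, assuming each point (i, j) maps back to the image at ((j + 0.5) * stride, (i + 0.5) * stride) and each anchor is defined by a scale (the square root of its area) and an aspect ratio h / w. The scales, ratios and stride below are hypothetical small values; the original Faster R-CNN paper used 3 scales * 3 aspect ratios, i.e. k = 9.

```python
import math


def make_anchors(feat_h, feat_w, stride, scales=(32, 64), ratios=(0.5, 1.0, 2.0)):
    """Return feat_h * feat_w * k anchors as (x0, y0, x1, y1), k = len(scales) * len(ratios)."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Center of feature-map point (i, j) in image coordinates
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area at s * s while setting h / w = r
                    w, h = s / math.sqrt(r), s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(2, 3, stride=16)
print(len(anchors))  # 2 * 3 * 6 = 36
print(anchors[1])    # square 32 x 32 anchor at the first point: (-8.0, -8.0, 24.0, 24.0)
```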

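Single-Stage Detection

Skip the second stage and make the final predictions directly from the anchors: classify each anchor into one of the categories or background, and predict a box transform for each anchor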

Semantic Segmentation: Fully Convolutional Network

Input -> Convolutions -> Scores C * H * W -> argmax H * W

Train the network with a cross-entropy loss on every pixel
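A small sketch of the two ends of this pipeline, assuming the scores are a nested C * H * W list: prediction takes the argmax over the C class scores at each pixel, and the training loss is the softmax cross-entropy averaged over all H * W pixels. The function names are hypothetical.

```python
import math


def predict_labels(scores):
    """C x H x W scores -> H x W map of the highest-scoring class per pixel."""
    C, H, W = len(scores), len(scores[0]), len(scores[0][0])
    return [[max(range(C), key=lambda c: scores[c][y][x]) for x in range(W)]
            for y in range(H)]


def pixel_cross_entropy(scores, labels):
    """Mean softmax cross-entropy over all pixels; labels is an H x W class map."""
    C, H, W = len(scores), len(scores[0]), len(scores[0][0])
    total = 0.0
    for y in range(H):
        for x in range(W):
            column = [scores[c][y][x] for c in range(C)]  # the C scores at this pixel
            m = max(column)
            log_sum_exp = m + math.log(sum(math.exp(s - m) for s in column))
            total += log_sum_exp - column[labels[y][x]]
    return total / (H * W)


scores = [[[2.0, 0.0]], [[0.0, 3.0]]]  # C = 2 classes, H = 1, W = 2
print(predict_labels(scores))  # [[0, 1]]
```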

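Trick: Downsampling and Upsampling (convolutions at full image resolution are expensive, so downsample inside the network and upsample back to H * W at the end)

Downsampling: pooling, strided convolution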

Upsampling

Unpooling

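Bed of Nails: put each input value in the top-left corner of its output block and fill the rest with 0

Nearest Neighbour: copy each input value into every position of its output block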
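Both fixed unpooling rules can be sketched for a 2 * 2 input and an upsampling factor of 2; the `factor` parameter and the function names are hypothetical.

```python
def bed_of_nails(x, factor=2):
    """Each value goes to the top-left corner of its factor x factor block; the rest is 0."""
    H, W = len(x), len(x[0])
    out = [[0] * (W * factor) for _ in range(H * factor)]
    for i in range(H):
        for j in range(W):
            out[i * factor][j * factor] = x[i][j]
    return out


def nearest_neighbour(x, factor=2):
    """Each value is copied into every position of its factor x factor block."""
    return [[x[i // factor][j // factor] for j in range(len(x[0]) * factor)]
            for i in range(len(x) * factor)]


x = [[1, 2], [3, 4]]
print(bed_of_nails(x))       # [[1, 0, 2, 0], [0, 0, 0, 0], [3, 0, 4, 0], [0, 0, 0, 0]]
print(nearest_neighbour(x))  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```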

Bilinear Interpolation

$$f_{x,y} = \sum_{i,j} f_{i,j} \max(0, 1-|x-i|) \max(0, 1-|y-j|)$$

where only the nearest neighbours (i, j) of (x, y) contribute, since every other grid point gets weight 0

Use the two closest neighbours in x and y to construct linear approximations
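The formula can be evaluated literally by summing over every grid point, because the max(0, 1 - |.|) weights vanish except for neighbours within distance 1. A minimal sketch, assuming `f[i][j]` is the value at grid point (i, j), with i paired with x and j with y as in the formula. `upsample_align_corners` (a hypothetical name) samples every 1/factor grid units so that the corners of input and output coincide, which is only one of several possible coordinate conventions.

```python
def bilinear(f, x, y):
    """f_{x,y} = sum_{i,j} f_{i,j} * max(0, 1 - |x - i|) * max(0, 1 - |y - j|)."""
    total = 0.0
    for i in range(len(f)):
        for j in range(len(f[0])):
            total += f[i][j] * max(0.0, 1 - abs(x - i)) * max(0.0, 1 - abs(y - j))
    return total


def upsample_align_corners(f, factor=2):
    """Sample every 1/factor grid units: n x m becomes ((n-1)*factor+1) x ((m-1)*factor+1)."""
    n, m = len(f), len(f[0])
    return [[bilinear(f, i / factor, j / factor) for j in range((m - 1) * factor + 1)]
            for i in range((n - 1) * factor + 1)]


f = [[0, 1], [2, 3]]
print(upsample_align_corners(f))  # [[0.0, 0.5, 1.0], [1.0, 1.5, 2.0], [2.0, 2.5, 3.0]]
```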

Bicubic Interpolation

Use the four closest neighbours in x and y (4 * 4 = 16 points) to construct cubic approximations
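A sketch of bicubic interpolation, assuming the common Keys cubic convolution kernel with a = -0.5 (some libraries use a = -0.75 instead) and edge values repeated at the borders. The kernel is separable, so each of the 4 * 4 neighbours is weighted by W(x - i) * W(y - j), mirroring the bilinear formula with a cubic weight in place of the triangular one. `f[i][j]` pairs i with x and j with y; the function names are hypothetical.

```python
import math


def cubic_kernel(t, a=-0.5):
    """Keys cubic convolution kernel: 1 at t = 0, 0 at other integers, 0 for |t| >= 2."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t ** 3 - (a + 3) * t ** 2 + 1
    if t < 2:
        return a * t ** 3 - 5 * a * t ** 2 + 8 * a * t - 4 * a
    return 0.0


def bicubic(f, x, y, a=-0.5):
    """Interpolate f at (x, y) from the 4 x 4 grid points around it."""
    n, m = len(f), len(f[0])
    i0, j0 = math.floor(x), math.floor(y)
    total = 0.0
    for i in range(i0 - 1, i0 + 3):      # 4 neighbours along x
        for j in range(j0 - 1, j0 + 3):  # 4 neighbours along y
            ci, cj = min(max(i, 0), n - 1), min(max(j, 0), m - 1)  # repeat edge values
            total += f[ci][cj] * cubic_kernel(x - i, a) * cubic_kernel(y - j, a)
    return total


f = [[i + 2 * j for j in range(4)] for i in range(4)]  # a linear ramp
print(bicubic(f, 1.5, 1.5))  # 4.5: linear data is reproduced exactly
```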

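Max Unpooling: each max pooling layer in the downsampling half remembers which position held the max; the paired unpooling layer puts each value back at that position and fills the rest with 0

Learnable Upsampling: transposed convolution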
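A sketch of both ideas in plain Python (all names hypothetical). Max unpooling assumes 2 * 2 pooling with stride 2 on an even-sized map; in a real network the unpooled values come from the decoder and only the remembered positions come from the matching pooling layer, so the example reuses the pooled values only for simplicity. The 1D transposed convolution shows the "learnable" part: each input value scales a copy of the learned filter, and copies placed `stride` apart are summed where they overlap.

```python
def max_pool_2x2_with_indices(x):
    """2x2 max pooling (stride 2) that also records where each max came from."""
    pooled, indices = [], []
    for r in range(0, len(x), 2):
        prow, irow = [], []
        for c in range(0, len(x[0]), 2):
            cells = [(x[r + dr][c + dc], (r + dr, c + dc)) for dr in (0, 1) for dc in (0, 1)]
            value, pos = max(cells, key=lambda cell: cell[0])
            prow.append(value)
            irow.append(pos)
        pooled.append(prow)
        indices.append(irow)
    return pooled, indices


def max_unpool_2x2(values, indices, height, width):
    """Put each value back at its remembered position; everything else is 0."""
    out = [[0] * width for _ in range(height)]
    for vrow, irow in zip(values, indices):
        for value, (r, c) in zip(vrow, irow):
            out[r][c] = value
    return out


def conv_transpose_1d(x, w, stride=2):
    """1D transposed convolution: sum of input-scaled copies of the filter w."""
    out = [0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk  # overlapping copies are summed
    return out


x = [[1, 5, 2, 0], [3, 4, 8, 1], [0, 1, 2, 3], [7, 6, 4, 9]]
pooled, idx = max_pool_2x2_with_indices(x)
print(max_unpool_2x2(pooled, idx, 4, 4))  # [[0, 5, 0, 0], [0, 0, 8, 0], [0, 0, 0, 0], [7, 0, 0, 9]]
print(conv_transpose_1d([1, 2], [1, 10, 100]))  # [1, 10, 102, 20, 200]
```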

Mask R-CNN

Just add conv layers to Faster R-CNN that predict a mask for each of the C classes on every region proposal
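At test time, the Mask R-CNN paper describes keeping only the mask of the class predicted by the classification branch and binarizing it at 0.5 after a per-pixel sigmoid (it also first resizes the m * m mask to the region's size, which this sketch skips). A minimal sketch, assuming the mask logits are a nested C * m * m list; the function name is hypothetical.

```python
import math


def select_mask(mask_logits, predicted_class, threshold=0.5):
    """Binary m x m mask for one region from per-class logits (C x m x m)."""
    logits = mask_logits[predicted_class]  # only the predicted class's mask is used
    # Per-pixel sigmoid, then threshold
    return [[1.0 / (1.0 + math.exp(-v)) > threshold for v in row] for row in logits]


mask_logits = [[[5.0, -5.0], [0.1, -0.1]],   # class 0
               [[-5.0, 5.0], [-1.0, 1.0]]]   # class 1
print(select_mask(mask_logits, 1))  # [[False, True], [False, True]]
```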

Panoptic Segmentation

Label every pixel with a category (as in semantic segmentation), and also separate different objects of the same category

Human Keypoints

Represent the pose of a human by locating a set of keypoints
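One common way to locate the keypoints, used for example by the keypoint variant of Mask R-CNN, is to predict one heatmap per keypoint type for each person region and read off the arg-max position. A minimal sketch, assuming the heatmaps are a nested K * H * W list and ignoring the mapping back to image coordinates; the function name is hypothetical.

```python
def decode_keypoints(heatmaps):
    """Return the (x, y) position of the highest score in each keypoint's heatmap."""
    points = []
    for hm in heatmaps:
        # Compare (score, x, y) triples; the largest score wins
        _, x, y = max((v, x, y) for y, row in enumerate(hm) for x, v in enumerate(row))
        points.append((x, y))
    return points


heatmaps = [[[0, 0, 0], [0, 0, 9]],   # keypoint 0
            [[7, 0, 0], [0, 1, 0]]]   # keypoint 1
print(decode_keypoints(heatmaps))  # [(2, 1), (0, 0)]
```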

Joint Instance Segmentation and Pose Estimation

-> General Idea: Add Per-Region "Heads" to Faster / Mask R-CNN

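Dense Captioning: a per-region head that generates a caption for each region, connecting detection with NLP and visual reasoning

3D Shape Prediction: a per-region head that predicts the 3D shape of each object ...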
