深度学习入门(4) -Object Detection 目标检测

Object Detection

Output:

  1. category label from fixed, known set of categories
  2. bounding box (x, y, width, height)

If only one object is needed to be detected -> add FC layer to the Net pretrianed on ImageNet

Sliding Window

apply a CNN to many different crops of the image, CNN classifies each crop as object / backgroud

but too many windows!! and may detect repeatedly

we need region proposals to find a small set of boxes that are likely to cover all the objects

"Selective Search" quick to generate 2000 regions

R-CNN : Region-Based CNN

  1. Region proposals
  2. warped the image to fixed size 224*224
  3. forward each region through ConvNet independently
  4. output a classification score and also a Bbox of 4 numbers, using the following algorithm
Measurement of boxes (IoU)

I o U = Area of Intersection Area of Union IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}} IoU=Area of UnionArea of Intersection

I o U > 0.5 IoU > 0.5 IoU>0.5 is decent

I o U > 0.7 IoU > 0.7 IoU>0.7 pretty good

I o U > 0.9 IoU > 0.9 IoU>0.9 perfect

Overlapping Boxes: Non-Max Suppression (NMS)
  1. select next highest-scoring box
  2. eliminate lower-scoring boxes with IoU>0.7 (with the box we selected in step1)
  3. If any boxes remain goto 1

Evaluating Object Detectors: mAP(Mean Average Precision)

  1. run detector on all test images + NMS

  2. for each category, computer AP = area under precision vs Recall Curve

    复制代码
     1.	for each detection (high -> low)
     	1.	If it matches some GT(Ground-Truth) box with IoU>0.5 mark it as positive and eliminate the GT
     	2.	otherwise mark is as nagative
     	3.	plot a point on PR curve
     2.	AP = area under PR Curve
  3. mAP = average of AP for each category

  4. COCO mAP: compute mAP for each IoU threshold and take average

How to get AP = 1.0 -> hit all GT boxes with IoU > 0.5, no false positive ranked above any true positive

Fast R-CNN

  1. ConvNet (Backbone network)-> convolutional features for entire high resolution image
  2. Regions of Interest (Rols)
  3. Crop + Resize features
  4. Per-Region Network (light-weight -> fast)
  5. output category and box

Cropping Features: Rol Pool

  1. project proposal onto features
  2. snap to gird cells
  3. divide into 2*2 gird of (roughly) equal subregions
  4. max-pool within each subregions
  5. output the region features (always the same size even if we have different sizes of input regions)

Rol Align

Rol Align -> better align to avoid snapping

Faster R-CNN

Insert Region Proposal Network (RPN) to predict proposals from features

after the backbone network -> RPN -> regional proposals

Imagine an anchor box of fixed size at each point in the feature map

At each point predict whether the corresponding anchor contains an object

for positive boxes, also predict a box transform to regress from anchor box to object box

Use k different anchor boxes at each point

Single stage Faster R-CNN

just use anchor to make classification and object boxes predictions

Semantic Segmentation: Fully Convolutional Network

Input -> Convolutions -> Scores C * H * W -> argmax H * W

use cross-entropy loss of every pixel to train the network

Trick: Downsampling and Upsampling

Downsampling : Pooling, strided convolution

Upsampling

Unpooling

Bed of nails : fill 0

Nearest Neighbour: same numbers in small blocks

Bilinear Interpolation

f x , y = ∑ i , j f i , j max ⁡ ( 0 , 1 − ∣ x − i ∣ ) max ⁡ ( 0 , 1 − ∣ y − j ∣ ) f_{x,y} = \sum_{i,j}{f_{i,j} \max(0, 1-|x-i|) \max(0,1-|y-j|)} fx,y=∑i,jfi,jmax(0,1−∣x−i∣)max(0,1−∣y−j∣)

i,j in Nearest neighbours

Use two closest neighbours in x and y to construct linear approximations

Bicubic Interpolation

three closest neighbours in x and y to construct cubic approximation

Max Unpooling
Learnable Upsampling

Mask R-CNN

Just add Conv layers to predict a mask for each of C classes on the region proposals

Panoptic Segmentation

speperate different objects in the same category

Human Keypoints

Represent the pose of a human by locating a set of keypoints

Joint Instance Segmentation and Pose Estimation

-> General Idea: Add Per-Region "Heads" to Faster / Mask R-CNN

Dense captioning -> nlp -> visual reasoning

3D shape prediction ...

相关推荐
人工智能训练13 小时前
Ubuntu系统中Docker的常用命令总结
linux·运维·人工智能·ubuntu·docker·ai
深兰科技14 小时前
廊坊市市长刘媛率队到访深兰科技,推动机器人制造基地与产业投资落地
人工智能·科技·机器人·scala·symfony·深兰科技·廊坊市市长刘媛
沫儿笙14 小时前
发那科机器人在氩弧焊中搭配节气装置的优势
人工智能·机器人
m0_6501082418 小时前
【论文精读】CMD:迈向高效视频生成的新范式
人工智能·论文精读·视频扩散模型·高效生成·内容 - 运动分解·latent 空间
电鱼智能的电小鱼18 小时前
基于电鱼 AI 工控机的智慧工地视频智能分析方案——边缘端AI检测,实现无人值守下的实时安全预警
网络·人工智能·嵌入式硬件·算法·安全·音视频
年年测试18 小时前
AI驱动的测试:用Dify工作流实现智能缺陷分析与分类
人工智能·分类·数据挖掘
唐兴通个人19 小时前
人工智能Deepseek医药AI培训师培训讲师唐兴通讲课课程纲要
大数据·人工智能
WGS.20 小时前
llama factory 扩充词表训练
深度学习
共绩算力20 小时前
Llama 4 Maverick Scout 多模态MoE新里程碑
人工智能·llama·共绩算力
DashVector21 小时前
向量检索服务 DashVector产品计费
数据库·数据仓库·人工智能·算法·向量检索