Introduction to Deep Learning (4) - Object Detection

Object Detection

Output:

  1. category label from fixed, known set of categories
  2. bounding box (x, y, width, height)

If only one object needs to be detected -> add an FC layer to a network pretrained on ImageNet

Sliding Window

apply a CNN to many different crops of the image; the CNN classifies each crop as object / background

but there are far too many windows, and the same object may be detected repeatedly
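To make "too many windows" concrete, here is a small sketch that counts every axis-aligned box with integer corners in an image. The function names are hypothetical and only for illustration; the closed form follows from summing the number of positions over all box sizes.

```python
def count_windows(height, width):
    """Count all boxes of every size h x w that fit inside a height x width image."""
    total = 0
    for h in range(1, height + 1):
        for w in range(1, width + 1):
            # A box of size h x w has this many possible top-left positions.
            total += (height - h + 1) * (width - w + 1)
    return total


def count_windows_closed_form(height, width):
    """Same count, using sum_{h=1..H} (H - h + 1) = H (H + 1) / 2 for each axis."""
    return (height * (height + 1) // 2) * (width * (width + 1) // 2)


if __name__ == "__main__":
    # The count grows roughly like H^2 * W^2, so it explodes for real images.
    print(count_windows_closed_form(600, 800))
```

Running a CNN on each of these crops is clearly infeasible, which motivates the region proposals that follow.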

we need region proposals to find a small set of boxes that are likely to cover all the objects

"Selective Search" quickly generates about 2000 region proposals per image

R-CNN : Region-Based CNN

  1. Region proposals
  2. warp each region to a fixed size of 224*224
  3. forward each region through the ConvNet independently
  4. output a classification score and also a Bbox of 4 numbers; the boxes are then measured and filtered as described below

Measurement of boxes (IoU)

$$IoU = \frac{\text{Area of Intersection}}{\text{Area of Union}}$$

$IoU > 0.5$ is decent

$IoU > 0.7$ is pretty good

$IoU > 0.9$ is almost perfect
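A minimal sketch of the IoU computation, assuming boxes use the (x, y, width, height) format from the Output section with (x, y) as the top-left corner. The function name `iou` is chosen here for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis; zero if the boxes do not overlap on that axis.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 2, 2)))  # overlap of 1 over a union of 7
```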

Overlapping Boxes: Non-Max Suppression (NMS)
  1. select the next highest-scoring box
  2. eliminate lower-scoring boxes with IoU>0.7 (with the box selected in step 1)
  3. if any boxes remain, go to 1
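The three steps above can be sketched as a greedy loop. This is a plain-Python illustration, not an optimized implementation; boxes are assumed to be (x, y, width, height) tuples, and the names `nms` and `iou_threshold` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7):
    """Return the indices of the boxes kept by greedy non-max suppression."""
    # Candidate indices, highest score first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # step 1: next highest-scoring box
        keep.append(best)
        # step 2: drop lower-scoring boxes that overlap the selected box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep  # step 3 is the loop condition


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores, iou_threshold=0.5))
```

In practice NMS is usually run separately for each category, so overlapping boxes of different classes do not suppress each other.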

Evaluating Object Detectors: mAP(Mean Average Precision)

  1. run detector on all test images + NMS

  2. for each category, compute AP = area under the Precision vs Recall curve

     1. for each detection (highest score -> lowest)
        1. if it matches some GT (Ground-Truth) box with IoU>0.5, mark it as positive and eliminate the GT
        2. otherwise mark it as negative
        3. plot a point on the PR curve
     2. AP = area under the PR curve
  3. mAP = average of AP over all categories

  4. COCO mAP: compute mAP at each IoU threshold (0.5, 0.55, ..., 0.95) and take the average

How to get AP = 1.0 -> hit all GT boxes with IoU > 0.5, no false positive ranked above any true positive
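A rough sketch of AP for a single category, following the procedure above. Assumptions made here for brevity: all detections and GT boxes come from one image, each detection is a `(score, box)` pair with boxes as (x, y, width, height), and the area under the PR curve is approximated by adding up the precision at every true positive times the recall step 1/(number of GT boxes). Real benchmarks use their own interpolation rules.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, width, height) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, gt_boxes, iou_threshold=0.5):
    """AP for one category; detections is a list of (score, box)."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    unmatched = list(gt_boxes)  # GT boxes not yet claimed by a detection
    tp, fp, ap = 0, 0, 0.0
    for _score, box in detections:
        # Find the best-overlapping GT box that is still unmatched.
        best_iou, best_idx = 0.0, -1
        for idx, gt in enumerate(unmatched):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou > iou_threshold:
            unmatched.pop(best_idx)  # positive: eliminate the matched GT
            tp += 1
            # Recall rises by 1/len(gt_boxes); weight it by the current precision.
            ap += (tp / (tp + fp)) / len(gt_boxes)
        else:
            fp += 1  # negative: precision drops, recall is unchanged
    return ap


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 10, 10)]
    dets = [(0.9, (0, 0, 10, 10)), (0.8, (20, 20, 10, 10)), (0.1, (100, 100, 5, 5))]
    print(average_precision(dets, gts))
```

The example has one false positive, but it is ranked below both true positives, so AP is still 1.0, as the note above states.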

Fast R-CNN

  1. ConvNet (Backbone network)-> convolutional features for entire high resolution image
  2. Regions of Interest (RoIs)
  3. Crop + Resize features
  4. Per-Region Network (light-weight -> fast)
  5. output category and box

Cropping Features: RoI Pool

  1. project proposal onto features
  2. snap to grid cells
  3. divide into a 2*2 grid of (roughly) equal subregions
  4. max-pool within each subregion
  5. output the region features (always the same size even if we have different sizes of input regions)
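A minimal single-channel sketch of steps 2 to 5. It assumes the proposal has already been projected onto the feature map (step 1) and is given as `(x0, y0, x1, y1)` in feature-map coordinates; the feature map is a plain list of lists. The name `roi_pool` and the rounding used for snapping are illustrative choices, not a specific library's behaviour.

```python
def roi_pool(features, roi, out_size=2):
    """Max-pool a region of a 2D feature map into an out_size x out_size grid."""
    height, width = len(features), len(features[0])
    # Step 2: snap the region to integer grid cells (and keep it non-empty).
    x0 = max(0, min(width - 1, round(roi[0])))
    y0 = max(0, min(height - 1, round(roi[1])))
    x1 = max(x0 + 1, min(width, round(roi[2])))
    y1 = max(y0 + 1, min(height, round(roi[3])))
    pooled = []
    for i in range(out_size):
        # Step 3: roughly equal subregions along y.
        ys = y0 + (y1 - y0) * i // out_size
        ye = max(ys + 1, y0 + (y1 - y0) * (i + 1) // out_size)
        row = []
        for j in range(out_size):
            xs = x0 + (x1 - x0) * j // out_size
            xe = max(xs + 1, x0 + (x1 - x0) * (j + 1) // out_size)
            # Step 4: max-pool within the subregion.
            row.append(max(features[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled  # Step 5: always out_size x out_size


if __name__ == "__main__":
    features = [[r * 6 + c for c in range(6)] for r in range(4)]  # 4 x 6 map
    print(roi_pool(features, (0.6, 0.2, 4.8, 3.7)))
    print(roi_pool(features, (0, 0, 6, 4)))
```

Both calls return a 2 x 2 grid even though the two regions have different sizes.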

RoI Align

RoI Align -> better alignment: no snapping; features are sampled with bilinear interpolation instead

Faster R-CNN

Insert Region Proposal Network (RPN) to predict proposals from features

after the backbone network -> RPN -> region proposals

Imagine an anchor box of fixed size at each point in the feature map

At each point predict whether the corresponding anchor contains an object

for positive boxes, also predict a box transform to regress from anchor box to object box

Use k different anchor boxes at each point
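A small sketch of the anchor idea. Assumptions not stated in the notes: anchors are stored as (center x, center y, width, height) in image pixels, `stride` is the hypothetical downsampling factor of the backbone, and the box transform uses the parametrization common in the R-CNN family (shift the center in units of the anchor size, scale width and height through an exponential).

```python
import math


def make_anchors(feat_h, feat_w, stride, sizes):
    """Place one anchor per (width, height) in `sizes` at every feature-map point."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for w, h in sizes:  # k = len(sizes) anchors per point
                anchors.append((cx, cy, w, h))
    return anchors


def apply_transform(anchor, transform):
    """Regress from an anchor box to an object box with (tx, ty, tw, th)."""
    cx, cy, w, h = anchor
    tx, ty, tw, th = transform
    return (cx + tx * w, cy + ty * h, w * math.exp(tw), h * math.exp(th))


if __name__ == "__main__":
    anchors = make_anchors(2, 3, stride=16, sizes=[(32, 32), (64, 32), (32, 64)])
    print(len(anchors))  # feat_h * feat_w * k
    print(apply_transform(anchors[0], (0.0, 0.0, 0.0, 0.0)))  # zero transform keeps the anchor
```

With this layout the RPN predicts, per feature-map point, k objectness scores and 4k transform values.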

Single-Stage Detection (Faster R-CNN without the second stage)

just use the anchors directly: classify each anchor as one of the categories (or background) and predict its object box

Semantic Segmentation: Fully Convolutional Network

Input -> Convolutions -> Scores C * H * W -> argmax H * W

train the network with a cross-entropy loss at every pixel
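A plain-Python sketch of the last two steps of the pipeline: the per-pixel argmax over the C * H * W score volume, and the cross-entropy loss averaged over all pixels. Scores are assumed to be nested lists indexed as `scores[c][y][x]`; the function names are hypothetical.

```python
import math


def predict_labels(scores):
    """Per-pixel argmax over classes: C x H x W scores -> H x W labels."""
    num_classes, height, width = len(scores), len(scores[0]), len(scores[0][0])
    return [
        [max(range(num_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def pixel_cross_entropy(scores, labels):
    """Softmax cross-entropy, averaged over every pixel."""
    num_classes, height, width = len(scores), len(scores[0]), len(scores[0][0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            logits = [scores[c][y][x] for c in range(num_classes)]
            # Numerically stable log-sum-exp.
            m = max(logits)
            log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
            total += log_sum - logits[labels[y][x]]
    return total / (height * width)


if __name__ == "__main__":
    scores = [[[2.0, 0.0]], [[0.0, 3.0]]]  # C=2, H=1, W=2
    print(predict_labels(scores))
    print(pixel_cross_entropy(scores, [[0, 1]]))
```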

Trick: Downsampling and Upsampling

Downsampling : Pooling, strided convolution

Upsampling

Unpooling

Bed of nails: put each value in the top-left corner of its block and fill the rest with 0

Nearest Neighbour: copy each value into every cell of its block
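Both unpooling variants fit in a few lines. This sketch works on a 2D list and upsamples by an integer `factor` (a name chosen here); it assumes that bed of nails places each value in the top-left cell of its block.

```python
def bed_of_nails(x, factor=2):
    """Place each input value in the top-left cell of its block; zeros elsewhere."""
    height, width = len(x), len(x[0])
    out = [[0] * (width * factor) for _ in range(height * factor)]
    for i in range(height):
        for j in range(width):
            out[i * factor][j * factor] = x[i][j]
    return out


def nearest_neighbour(x, factor=2):
    """Copy each input value into every cell of its factor x factor block."""
    height, width = len(x), len(x[0])
    return [
        [x[i // factor][j // factor] for j in range(width * factor)]
        for i in range(height * factor)
    ]


if __name__ == "__main__":
    small = [[1, 2], [3, 4]]
    for row in bed_of_nails(small):
        print(row)
    for row in nearest_neighbour(small):
        print(row)
```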

Bilinear Interpolation

$$f_{x,y} = \sum_{i,j} f_{i,j} \max(0, 1-|x-i|) \max(0, 1-|y-j|)$$

where the sum runs over the nearest neighbours $(i, j)$ of $(x, y)$

Use two closest neighbours in x and y to construct linear approximations
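The formula can be evaluated directly by summing over the (up to) four grid points around the query location. In this sketch `f[i][j]` is a 2D list indexed the same way as $f_{i,j}$, and neighbours that fall outside the grid are simply skipped; the name `bilinear_sample` is chosen here for illustration.

```python
import math


def bilinear_sample(f, x, y):
    """Evaluate f at a fractional location (x, y) by bilinear interpolation."""
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    # The two closest neighbours along each axis.
    for i in (x0, x0 + 1):
        for j in (y0, y0 + 1):
            if 0 <= i < len(f) and 0 <= j < len(f[0]):
                weight = max(0.0, 1 - abs(x - i)) * max(0.0, 1 - abs(y - j))
                value += f[i][j] * weight
    return value


if __name__ == "__main__":
    grid = [[1.0, 2.0], [3.0, 4.0]]
    print(bilinear_sample(grid, 0.5, 0.5))  # center of the four values
```

At integer locations the weights reduce to 1 for the grid point itself and 0 for the others, so the original values are reproduced exactly.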

Bicubic Interpolation

Use three closest neighbours in x and y to construct cubic approximations

Max Unpooling
Learnable Upsampling

Mask R-CNN

Just add Conv layers to predict a mask for each of C classes on the region proposals

Panoptic Segmentation

separate different objects in the same category

Human Keypoints

Represent the pose of a human by locating a set of keypoints

Joint Instance Segmentation and Pose Estimation

-> General Idea: Add Per-Region "Heads" to Faster / Mask R-CNN

Dense captioning -> nlp -> visual reasoning

3D shape prediction ...
