Introduction to Deep Learning (3) - CNN

CNN

Convolutional Layer

We slide a filter over the image spatially, computing a dot product at each location.

Convolutional layers are interspersed with activation functions as well.

What does it learn?

First-layer conv filters: local image templates (they often learn oriented edges and opposing colors).
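A minimal PyTorch sketch of one conv layer sliding its filters over an image (the layer sizes here are illustrative, not from any particular network):

```python
import torch
import torch.nn as nn

# 16 filters, each 3x3x3, slide over a 3-channel image,
# computing a dot product at every spatial location.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
relu = nn.ReLU()  # activation interspersed after the convolution

x = torch.randn(1, 3, 32, 32)   # dummy RGB image
y = relu(conv(x))               # each output channel is one filter's response map
print(y.shape)                  # torch.Size([1, 16, 32, 32]) -- padding=1 keeps 32x32
```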

Problems:
  1. For large images, we need many layers for information to propagate across the whole image.

     Solution: downsample inside the network.

  2. The feature map shrinks with each layer.

     Solution: padding - add zeros around the input (see the output-size sketch below).
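The output-size formula makes both problems concrete (a small sketch; W = input size, K = kernel size, P = padding, S = stride):

```python
def conv_output_size(W, K, P, S):
    """Spatial output size of a convolution: floor((W - K + 2P) / S) + 1."""
    return (W - K + 2 * P) // S + 1

print(conv_output_size(32, K=3, P=0, S=1))  # 30 -- without padding the map shrinks
print(conv_output_size(32, K=3, P=1, S=1))  # 32 -- padding 1 preserves the size
print(conv_output_size(32, K=3, P=1, S=2))  # 16 -- stride 2 downsamples inside the network
```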

Pooling layer

-> downsampling

There are no parameters that need to be learned.

ex:

max pooling

average pooling

...
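A quick sketch of both pooling types in PyTorch (sizes illustrative); neither has learnable parameters:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)   # feature map from a previous conv layer

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)   # keeps the max in each 2x2 window
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)   # keeps the mean in each 2x2 window

print(max_pool(x).shape)  # torch.Size([1, 16, 16, 16]) -- spatial size halved
print(avg_pool(x).shape)  # torch.Size([1, 16, 16, 16])
```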

FC layer (Fully Connected)

The last layer should always be an FC layer.

Batch normalization

We force the inputs to each layer to be nicely scaled so that optimization becomes easier.

Usually inserted after an FC or convolutional layer, before the non-linearity.

Pros:

make the network easier to train

robust to initialization

Cons:

behaves differently during training and testing
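A sketch of the usual placement (Conv -> BatchNorm -> ReLU); `model.train()` / `model.eval()` toggles BatchNorm between per-batch and running statistics, which is the train/test difference noted above:

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
    nn.BatchNorm2d(64),  # normalize each channel, then rescale with learned gamma/beta
    nn.ReLU(),
)

x = torch.randn(8, 3, 32, 32)
block.train()                        # uses the current batch's mean/variance
y_train = block(x)
block.eval()                         # uses running statistics accumulated during training
y_test = block(x)
print(y_train.shape, y_test.shape)   # both torch.Size([8, 64, 32, 32])
```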

Architectures (History of the ImageNet Challenge)

AlexNet

Input: 3 * 227 * 227

First conv layer: 64 filters, kernel 11, stride 4, pad 2

We need to pay attention to memory usage, parameter count, and FLOPs at each layer (see the sketch below).
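A rough bookkeeping sketch for that first layer (counting FLOPs as multiply-adds per output element; the numbers are easy to verify by hand):

```python
C_in, C_out, K, S, P = 3, 64, 11, 4, 2
H = W = 227

H_out = (H - K + 2 * P) // S + 1           # 56
params = C_out * (C_in * K * K + 1)        # weights + biases = 23,296
output_elems = C_out * H_out * H_out       # 200,704 floats in the output feature map
flops = output_elems * C_in * K * K        # ~72.9M multiply-adds

print(H_out, params, output_elems, flops)
```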

ZFNet

larger AlexNet

VGG

Rules:

  1. All conv 3*3 stride 1 pad 1
  2. max pool 2*2 stride 2
  3. after each pool, double the channels

Stages:

conv-conv-pool

conv-conv-pool

conv-conv-pool

conv-conv-[conv]-pool

conv-conv-[conv]-pool
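A sketch of one such stage under these rules (an illustrative fragment, not the full VGG-16):

```python
import torch
import torch.nn as nn

def vgg_stage(c_in, c_out, num_convs=2):
    """conv-conv-...-pool: all convs 3x3, stride 1, pad 1; pool 2x2, stride 2."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, stride=1, padding=1),
                   nn.ReLU()]
    layers.append(nn.MaxPool2d(2, stride=2))
    return nn.Sequential(*layers)

x = torch.randn(1, 3, 224, 224)
x = vgg_stage(3, 64)(x)     # -> [1, 64, 112, 112]
x = vgg_stage(64, 128)(x)   # -> [1, 128, 56, 56]: channels double after each pool
print(x.shape)
```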

GoogLeNet

Stem network: aggressively downsamples input

Inception module:

A repeated local unit with parallel branches of different kernel sizes.

Uses 1*1 "bottleneck" convolutions to reduce channel dimensions.

At the end, rather than flattening (which destroys the spatial information and requires a giant number of parameters),

GoogLeNet uses global average pooling: 7 * 7 * 1024 -> 1024

There is only one FC layer at the end.

Find the bottleneck points and reduce the number of learnable parameters / memory footprint as much as possible.
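A sketch of the two tricks that keep parameters and memory down, using the 7 * 7 * 1024 numbers above (1000 output classes as in ImageNet):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)

# 1x1 "bottleneck" convolution: shrinks the channel dimension before expensive branches
bottleneck = nn.Conv2d(1024, 128, kernel_size=1)
print(bottleneck(x).shape)      # torch.Size([1, 128, 7, 7])

# Global average pooling instead of flattening: 7 * 7 * 1024 -> 1024
gap = nn.AdaptiveAvgPool2d(1)
features = gap(x).flatten(1)    # torch.Size([1, 1024])
fc = nn.Linear(1024, 1000)      # the single FC layer at the end
print(fc(features).shape)       # torch.Size([1, 1000])
```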

Auxiliary Classifiers:

To help the deep network converge (batch normalization had not been invented yet), auxiliary classification outputs are attached to intermediate layers to inject additional gradient into the lower layers.

Residual Networks

We find that making the network deeper sometimes causes it to underfit.

A deeper network should, strictly speaking, be able to do whatever a shallower one can, but it is hard to learn the parameters that achieve this.

So we need the residual network!

The residual block makes it easy to learn the identity function: simply drive all the parameters to 0.

ResNet still imitates VGG's stage design: stacks of 3*3 convolutions, halving the resolution and doubling the channels between stages (a basic block is sketched below).
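A minimal sketch of a basic residual block (same channel count, stride 1); if the convolutional weights go to zero, the block reduces to the identity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """y = x + F(x), where F is two 3x3 conv-BN layers."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # identity shortcut: add the input back

block = BasicBlock(64)
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)            # torch.Size([1, 64, 56, 56])
```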

ResNeXt

Adding groups improves performance at the same computational complexity.
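A sketch of grouped convolution, which is what ResNeXt adds inside the bottleneck block (the `groups` argument splits the channels into parallel paths):

```python
import torch.nn as nn

# groups=32: 32 parallel 3x3 convolutions, each seeing 128/32 = 4 input channels
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32)
dense   = nn.Conv2d(128, 128, kernel_size=3, padding=1)

print(sum(p.numel() for p in grouped.parameters()))  # 4,736 parameters
print(sum(p.numel() for p in dense.parameters()))    # 147,584 parameters, same output shape
```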

MobileNets

Reduce the computational cost so that the network is affordable on mobile devices.
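MobileNets do this with depthwise separable convolutions; a sketch comparing parameter counts against a standard 3x3 convolution (shapes illustrative):

```python
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)

# Depthwise separable = per-channel 3x3 conv + 1x1 pointwise conv that mixes channels
separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise
)

print(count_params(standard))   # 73,856
print(count_params(separable))  # 8,960 -- roughly 8x fewer parameters
```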

Transfer learning

We can pretrain the model on a dataset.

When applying it to a new dataset, either fine-tune the network or train a linear classifier on top of the extracted features.

Freeze the main body of the network.

This is somewhat controversial: training from scratch for roughly 2-3x longer can reach similar results without pretraining.
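A minimal sketch of the freeze-and-finetune recipe with a torchvision pretrained ResNet-18 (the 10-class head is a placeholder for whatever the new dataset needs):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")   # pretrained on ImageNet

# Freeze the main body of the network
for param in model.parameters():
    param.requires_grad = False

# Replace the last FC layer with a fresh linear classifier for the new dataset
model.fc = nn.Linear(model.fc.in_features, 10)     # 10 classes is just a placeholder

# Only the new head will be updated by the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))           # 5,130 parameters for the new head
```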
