Introduction to Deep Learning (3) - CNN

CNN

Convolutional Layer

We slide a filter over the image spatially, computing dot products at each position

Conv layers are interspersed with activation functions as well

What does it learn?

First-layer conv filters act as local image templates (they often learn oriented edges and opposing colors)
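
A minimal sketch of a conv layer followed by an activation (assuming PyTorch; the shapes are only illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)        # a batch with one 3-channel 32x32 image

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)
relu = nn.ReLU()

out = relu(conv(x))                  # slide 16 filters over the image, then activate
print(out.shape)                     # torch.Size([1, 16, 28, 28]); 32 - 5 + 1 = 28
```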

Problems:

  1. For large images, we need many layers before information from the whole image can reach a single output

     Solution: downsample inside the network (e.g. with strided convolutions or pooling)

  2. The feature map shrinks with each layer

     Solution: padding, i.e. adding zeros around the border of the input (see the sketch below)
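
The output size follows floor((W - K + 2P) / S) + 1 for input width W, kernel K, padding P, and stride S. A small sketch (PyTorch assumed) of both solutions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)

# No padding: the feature map shrinks, (32 - 3)/1 + 1 = 30.
shrink = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=0)
print(shrink(x).shape)   # torch.Size([1, 8, 30, 30])

# Padding 1 with a 3x3 kernel keeps the spatial size: (32 - 3 + 2)/1 + 1 = 32.
same = nn.Conv2d(3, 8, kernel_size=3, stride=1, padding=1)
print(same(x).shape)     # torch.Size([1, 8, 32, 32])

# Stride 2 downsamples inside the network: floor((32 - 3 + 2)/2) + 1 = 16.
down = nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1)
print(down(x).shape)     # torch.Size([1, 8, 16, 16])
```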

Pooling layer

-> downsampling

It has no parameters that need to be learned.

ex:

max pooling

Average pooling

...
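
A sketch (PyTorch assumed) showing that pooling downsamples and has zero learnable parameters:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)

pool = nn.MaxPool2d(kernel_size=2, stride=2)      # or nn.AvgPool2d
print(pool(x).shape)                              # torch.Size([1, 16, 16, 16])
print(sum(p.numel() for p in pool.parameters()))  # 0 -- nothing to learn
```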

FC layer (Fully Connected)

The last layer should always be an FC layer.

Batch normalization

We need to force the inputs to each layer to be nicely scaled so that optimization becomes easier.

Usually inserted after an FC layer / convolutional layer, before the non-linearity

Pros:

makes the network easier to train

makes it more robust to initialization

Cons:

behaves differently during training and testing
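
A sketch (PyTorch assumed) of the usual placement and of the train/test difference:

```python
import torch
import torch.nn as nn

# Conv -> BN -> non-linearity, the usual ordering.
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
)

x = torch.randn(8, 3, 32, 32)

block.train()        # training: normalize with mini-batch statistics
y_train = block(x)

block.eval()         # testing: normalize with running averages instead,
y_eval = block(x)    # so the same input gives different outputs
```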

Architectures (History of the ImageNet Challenge)

AlexNet

Input 3 * 227 * 227

First conv layer: 64 filters, kernel 11, stride 4, pad 2

We need to pay attention to the memory usage, number of params, and FLOPs of each layer, as in the sketch below.
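
A back-of-the-envelope sketch for the first conv layer above (float32 activations assumed):

```python
C_in, H = 3, 227                     # input: 3 * 227 * 227
C_out, K, S, P = 64, 11, 4, 2        # 64 filters, kernel 11, stride 4, pad 2

H_out = (H - K + 2 * P) // S + 1     # (227 - 11 + 4) / 4 + 1 = 56
params = C_out * (C_in * K * K + 1)  # weights + biases = 23,296
memory_kb = C_out * H_out * H_out * 4 / 1024        # output activations: 784 KB
flops = (C_out * H_out * H_out) * (C_in * K * K)    # ~73M multiply-adds

print(H_out, params, memory_kb, flops)
```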

ZFNet

larger AlexNet

VGG

Rules:

  1. All conv 3*3, stride 1, pad 1
  2. Max pool 2*2, stride 2
  3. After each pool, double the channels (sketched after the stage list below)

Stages:

conv-conv-pool

conv-conv-pool

conv-conv-pool

conv-conv-[conv]-pool

conv-conv-[conv]-pool
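
A sketch of these rules as a reusable stage (PyTorch assumed; channel counts follow VGG-16):

```python
import torch.nn as nn

def vgg_stage(c_in, c_out, num_convs=2):
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out,
                             kernel_size=3, stride=1, padding=1),
                   nn.ReLU()]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halves H and W
    return nn.Sequential(*layers)

# Five stages: each pool halves the resolution, channels double afterwards.
backbone = nn.Sequential(
    vgg_stage(3, 64), vgg_stage(64, 128), vgg_stage(128, 256),
    vgg_stage(256, 512, num_convs=3), vgg_stage(512, 512, num_convs=3),
)
```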

GoogLeNet

Stem network: aggressively downsamples input

Inception module:

Runs several local units with different kernel sizes in parallel

Uses 1*1 bottleneck convolutions to reduce the channel dimension (see the sketch below)
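
A sketch of the bottleneck idea: a 1*1 conv is a per-position linear map across channels, so it can cheaply shrink the channel dimension before an expensive conv:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 28, 28)

bottleneck = nn.Conv2d(256, 64, kernel_size=1)   # 256 -> 64 channels
y = bottleneck(x)
print(y.shape)      # torch.Size([1, 64, 28, 28]); spatial size unchanged
```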

At the end, rather than flattening (which destroys the spatial structure and requires a giant FC layer),

GoogLeNet uses global average pooling: 7 * 7 * 1024 -> 1024

There is only one FC layer at the end.

Find the bottlenecks and reduce the number of learnable parameters / memory footprint as much as possible.
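
A sketch comparing global average pooling with flattening (PyTorch assumed; 1000-way output as in ImageNet):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1024, 7, 7)                    # final feature map

gap = nn.AdaptiveAvgPool2d(1)                     # 7 * 7 * 1024 -> 1024, no params
features = gap(x).flatten(1)                      # shape (1, 1024)

fc_after_gap = nn.Linear(1024, 1000)              # ~1M parameters
fc_after_flatten = nn.Linear(1024 * 7 * 7, 1000)  # ~50M parameters
```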

Auxiliary Classifiers:

To help the deep network converge (batch normalization had not been invented yet), auxiliary classification outputs inject additional gradient at the lower layers.

Residual Networks

We find that sometimes making the net deeper actually causes it to underfit.

A deeper network strictly has the capacity to do whatever a shallower one can (e.g. by setting the extra layers to the identity), but it is hard to learn such parameters.

So we need the residual network!

A residual block makes the identity easy to learn: setting all of its weights to 0 makes the block pass its input straight through.
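
A minimal sketch of a basic residual block (PyTorch assumed):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Computes relu(F(x) + x). If the residual branch F learns to
    output zero, the block simply passes x through."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)    # the shortcut connection

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)    # torch.Size([1, 64, 56, 56])
```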

ResNets still imitate VGG's stage design: each stage halves the resolution and doubles the channels.

ResNeXt

Adding groups improves performance at the same computational complexity, as sketched below.
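
A sketch of grouped convolution (PyTorch assumed): splitting a conv into groups cuts its parameters and FLOPs, which is the budget ResNeXt reinvests in more parallel pathways:

```python
import torch.nn as nn

dense = nn.Conv2d(64, 64, kernel_size=3, padding=1)              # one big conv
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=8)  # 8 parallel groups

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(dense))    # 36,928
print(count(grouped))  # 4,672 -- roughly 1/8 of the weights and FLOPs
```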

MobileNets

Reduces cost to make the network affordable on mobile devices (see the sketch below).
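
The main cost reduction in MobileNet is the depthwise separable convolution: a per-channel (depthwise) conv followed by a 1*1 pointwise conv. A sketch (PyTorch assumed):

```python
import torch.nn as nn

standard = nn.Conv2d(64, 128, kernel_size=3, padding=1)

separable = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=64),  # depthwise
    nn.Conv2d(64, 128, kernel_size=1),                       # pointwise 1*1
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard))   # 73,856
print(count(separable))  # 8,960 -- roughly an 8x reduction
```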

Transfer learning

We can pretrain the model on a large dataset.

When applying it to a new dataset, just fine-tune or train a linear classifier on the top layers.

Freeze the main body of the net.
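
A sketch of the freeze-and-finetune recipe (assuming torchvision's pretrained ResNet-18 and a hypothetical 10-class target task):

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # pretrained backbone

for p in model.parameters():
    p.requires_grad = False                       # freeze the main body

model.fc = nn.Linear(model.fc.in_features, 10)    # new head for 10 classes
# Only model.fc now receives gradient updates during training.
```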

This is somewhat controversial: training from scratch for about 2-3x as long can reach similar results without pretraining.
