CNNs for image processing and other applications


CNNs - convolutional neural networks

CNNs are architectures loosely modeled on the brain's visual cortex. They perform well both in computer vision (CV), for example object detection (classifying multiple objects in an image and placing bounding boxes around them) and semantic segmentation (classifying each pixel according to the class of the object it belongs to), and in natural language processing (NLP).

Why not just use dense layers? Dense layers work well on the MNIST dataset, but they are hard to scale up to relatively large images (e.g., 100 × 100) because the number of parameters explodes. CNNs solve this problem with partially connected layers and weight sharing.
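To make the parameter explosion concrete, here is a rough back-of-the-envelope sketch. Only the 100 × 100 image size comes from the text; the layer sizes (1,000 dense units, 32 filters of size 7 × 7, one input channel) and the helper names are assumptions chosen for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus biases of a 2D convolutional layer (one bias per filter)."""
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# A 100 x 100 grayscale image flattened into 10,000 inputs (image size from the text).
n_pixels = 100 * 100

# Hypothetical layer sizes, chosen only for illustration.
dense_count = dense_params(n_pixels, n_units=1000)
conv_count = conv2d_params(7, 7, in_channels=1, n_filters=32)

print(f"dense layer: {dense_count:,} parameters")
print(f"conv layer:  {conv_count:,} parameters")
```

The convolutional count does not depend on the image size at all, which is why the approach scales to larger images.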

Strictly speaking, CNNs use the closely related cross-correlation operation rather than true convolution: the kernel is slid over the input without being flipped first.
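The difference between the two operations can be shown with a small NumPy sketch. The helper names `cross_correlate2d` and `convolve2d` and the toy 4 × 4 input are made up for this example; they are not part of any library used in these notes.

```python
import numpy as np


def cross_correlate2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel over the image without flipping it (no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """True convolution: flip the kernel along both axes, then cross-correlate."""
    return cross_correlate2d(image, kernel[::-1, ::-1])


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 2.0],
                   [3.0, 4.0]])  # deliberately asymmetric

corr = cross_correlate2d(image, kernel)
conv = convolve2d(image, kernel)

print(corr)
print(conv)
```

Because the kernel is learned, the flip makes no practical difference: a layer based on cross-correlation can simply learn the flipped weights.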

Remark: each neuron's local receptive field is a rectangle, i.e., a small window of the previous layer.

Real-world images are hierarchical: small, low-level features combine into larger, higher-level ones. This may be why a stack of convolutional layers succeeds at image recognition.

Supplement 1: before feeding images to dense layers we have to flatten them, because dense layers follow the principle of one sample, one 1D vector.

Supplement 2: tf.keras.layers.Conv2D accepts input tensors of shape (batch_size, height, width, channels), so we can pass it a batch of 2D images directly, without flattening.
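The two input layouts can be compared with plain NumPy. This is only a sketch: the batch of two 70 × 120 RGB images is a hypothetical placeholder filled with zeros.

```python
import numpy as np

# A hypothetical batch of 2 RGB images, 70 x 120 pixels each.
batch = np.zeros((2, 70, 120, 3), dtype=np.float32)

# Dense layers expect one 1D vector per sample, so the images must be flattened.
flat = batch.reshape(len(batch), -1)

print(batch.shape)  # layout a Conv2D layer accepts: (batch_size, height, width, channels)
print(flat.shape)   # layout a dense layer expects: (batch_size, features)
```

Flattening discards the 2D neighborhood structure, whereas the 4D layout keeps it available to the convolutional layer.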

We can also connect a large input layer to a much smaller layer by spacing out the receptive fields; the step from one receptive field to the next is called the stride. This dramatically reduces the model's computational cost.

Each neuron multiplies the values in its receptive field by the corresponding connection weights, sums the products, and adds a bias term. A set of these weights (excluding the bias term) is called a (convolutional) kernel, or filter.

As usual, all these weights (that is, the filters) and the biases are learned during training. Given an input, the layer that applies a filter outputs a feature map.

In practice, a convolutional layer outputs one feature map per filter. Each pixel of a feature map corresponds to one neuron in a 2D layer (more precisely, a convolutional layer is a 3D layer obtained by stacking these 2D layers). Each feature map corresponds to exactly one (kernel, bias) pair, and therefore to exactly one 2D layer.

In short, a convolutional layer simultaneously applies multiple trainable filters to its inputs, making it capable of detecting multiple features anywhere in its inputs.

Sharing one kernel and one bias term across a whole "2D layer"/feature map has several advantages over dense layers, including:

  1. greatly reducing the number of parameters and the computational cost
  2. once a feature has been learned at one location, it can be detected anywhere in the input image
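The second point can be illustrated in one dimension with NumPy's `np.correlate`. In this sketch the signal length, the pattern and its two positions are made-up values; the point is that the same shared filter responds in the same way wherever the pattern appears.

```python
import numpy as np

pattern = np.array([1.0, 2.0, 1.0])  # the "feature" the filter is tuned to

# Two hypothetical signals containing the same pattern at different positions.
signal_a = np.zeros(10)
signal_a[2:5] = pattern
signal_b = np.zeros(10)
signal_b[5:8] = pattern

# One shared filter is slid over every position (cross-correlation, no padding).
response_a = np.correlate(signal_a, pattern, mode="valid")
response_b = np.correlate(signal_b, pattern, mode="valid")

# The strongest response follows the pattern's position.
print(response_a.argmax(), response_b.argmax())
```

Shifting the input by three positions shifts the response by three positions, so nothing has to be relearned for the new location.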

Computing the output of a neuron in a convolutional layer

$$
\text{output}(i, j, k) = \text{bias}(k) + \sum_{x} \sum_{y} \sum_{k'} \text{output}_{\text{prev}}(i', j', k') \times \text{weight}(k, k', x, y)
$$

where $\text{output}(i, j, k)$ is the output of the neuron located at row $i$, column $j$ of feature map $k$ in the convolutional layer, $\text{output}_{\text{prev}}(i', j', k')$ is the output of the neuron located at row $i'$, column $j'$ of feature map $k'$ in the previous layer, and $\text{weight}(k, k', x, y)$ is the connection weight between feature map $k$ and feature map $k'$ in the cell at row $x$, column $y$ of the receptive field, with

$$
i' = i \times stride_{height} + x, \quad x + 1 \in [1, field_{height}]
$$

$$
j' = j \times stride_{width} + y, \quad y + 1 \in [1, field_{width}]
$$
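The formula above can be sketched directly with explicit loops in NumPy. This is a minimal illustration, not how TensorFlow computes it: the function name `conv_layer_forward`, the toy input and the two filters are assumptions, no padding is applied, and the kernel array is indexed as (row, column, previous feature map, feature map).

```python
import numpy as np


def conv_layer_forward(prev, kernels, biases, stride_h=1, stride_w=1):
    """Compute a convolutional layer's outputs with explicit loops (no padding).

    prev:    (height, width, n_prev_maps) outputs of the previous layer
    kernels: (field_h, field_w, n_prev_maps, n_maps) connection weights
    biases:  (n_maps,) one bias term per feature map
    """
    field_h, field_w, _, n_maps = kernels.shape
    out_h = (prev.shape[0] - field_h) // stride_h + 1
    out_w = (prev.shape[1] - field_w) // stride_w + 1
    out = np.zeros((out_h, out_w, n_maps))
    for i in range(out_h):
        for j in range(out_w):
            # Receptive field: rows i*stride_h + x, columns j*stride_w + y.
            top, left = i * stride_h, j * stride_w
            field = prev[top:top + field_h, left:left + field_w, :]
            for k in range(n_maps):
                out[i, j, k] = biases[k] + np.sum(field * kernels[:, :, :, k])
    return out


# Hypothetical toy input: one 4 x 4 single-channel "image".
prev = np.arange(16, dtype=float).reshape(4, 4, 1)

# Two made-up 2 x 2 filters: one sums the field, one takes a horizontal difference.
kernels = np.zeros((2, 2, 1, 2))
kernels[:, :, 0, 0] = [[1, 1], [1, 1]]
kernels[:, :, 0, 1] = [[-1, 1], [-1, 1]]
biases = np.array([0.0, 0.5])

out = conv_layer_forward(prev, kernels, biases, stride_h=2, stride_w=2)
print(out.shape)      # one feature map per filter, stacked along the last axis
print(out[:, :, 0])
print(out[:, :, 1])
```

With a stride of 2 in both directions, the 4 × 4 input yields two 2 × 2 feature maps, one per filter.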

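```python
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.datasets import load_sample_images

# Load scikit-learn's two sample images (a list of NumPy arrays).
dataset = load_sample_images()['images']
im1, im2 = dataset

# Show the two images side by side.
_, ax = plt.subplots(1, 2)
ax[0].imshow(im1)
ax[1].imshow(im2)
plt.show()

# Inspect the pixel value range, dtype and shape of each image.
print(im1.max(), im1.min(), im2.max(), im2.min())
print(tuple(map(lambda x: x.dtype, dataset)))
print(tuple(map(lambda x: x.shape, dataset)))

# Stack the images into one batch, center-crop to 70 x 120, then rescale to [0, 1].
dataset = tf.keras.layers.CenterCrop(height=70, width=120)(np.stack(dataset))
dataset = tf.keras.layers.Rescaling(1 / 255)(dataset)

print(dataset.shape)  # (batch_size, height, width, channels)
```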

More about tf.keras.layers.Conv2D:

  1. tf.keras.layers.Convolution2D is an alias of tf.keras.layers.Conv2D
  2. under the hood, this layer relies on TensorFlow's tf.nn.conv2d() operation
  3. kernel_size defines the shape of the receptive field
  4. by default, strides is set to (1, 1) and padding="valid" (which actually means no zero padding at all)

Conv2D accepts tensors of shape (batch_size, spatial_dimension_1, spatial_dimension_2, channels).

For an input image, the channels can be thought of as color channels (e.g., red, green and blue for an RGB image).

```python
# kernel_size=7 is equivalent to kernel_size=(7, 7)
Conv2D = tf.keras.layers.Conv2D(filters=32, kernel_size=7)
feature_map = Conv2D(dataset)
print(feature_map.shape)
```

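With the default padding="valid", the feature maps are smaller than the input: the output height is 64 = 70 - 7 + 1 and the output width is 114 = 120 - 7 + 1.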

```python
# padding='same' pads the input with zeros so that, with strides of 1,
# the output keeps the input's height and width
Conv2D = tf.keras.layers.Conv2D(filters=32, kernel_size=7, padding='same')
feature_map = Conv2D(dataset)
print(feature_map.shape)
```

Under the hood: how is the zero padding computed?

padding='valid':

$$
\begin{aligned}
&\text{find the maximal } OutputGrids \text{ s.t.} \\
&1 + Strides \times (OutputGrids - 1) + (KernelSize - 1) \le InputGrids \\
&\Longleftrightarrow\ OutputGrids \le \frac{InputGrids - KernelSize + Strides}{Strides} \\
&\text{Therefore: } OutputGrids = \left\lfloor \frac{InputGrids - KernelSize + Strides}{Strides} \right\rfloor
\end{aligned}
$$

padding='same':

$$
\begin{aligned}
&\text{we confine the } OutputGrids \text{ s.t. } OutputGrids = \left\lceil \frac{InputGrids}{Strides} \right\rceil \\
&\text{Therefore we can compute } InputGridsAfterPadding = KernelSize + (OutputGrids - 1) \times Strides \\
&\text{So we shall pad } \left\lfloor \frac{InputGridsAfterPadding - InputGrids}{2} \right\rfloor \text{ and } \left\lceil \frac{InputGridsAfterPadding - InputGrids}{2} \right\rceil \text{ 0s on the two sides, respectively}
\end{aligned}
$$
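The two padding rules above can be checked with a few lines of plain Python. The function names `conv_output_size` and `same_padding` are hypothetical helpers written for this sketch, not TensorFlow APIs, and the clamp to zero is an added assumption for the case where the stride exceeds the kernel size.

```python
import math


def conv_output_size(input_size: int, kernel_size: int, stride: int, padding: str) -> int:
    """Number of output positions along one spatial dimension."""
    if padding == "valid":
        return (input_size - kernel_size + stride) // stride
    if padding == "same":
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding mode: {padding!r}")


def same_padding(input_size: int, kernel_size: int, stride: int) -> tuple[int, int]:
    """Zeros to add (before, after) one spatial dimension for padding='same'."""
    output_size = conv_output_size(input_size, kernel_size, stride, "same")
    padded_size = kernel_size + (output_size - 1) * stride
    # Assumption: clamp at zero so a stride larger than the kernel never gives negative padding.
    total = max(padded_size - input_size, 0)
    return total // 2, total - total // 2


# The 70 x 120 crops and the 7 x 7 kernel used in these notes.
print(conv_output_size(70, 7, 1, "valid"), conv_output_size(120, 7, 1, "valid"))
print(conv_output_size(70, 7, 1, "same"), same_padding(70, 7, 1))

# A stride of 2 is a made-up variation to show uneven padding.
print(conv_output_size(70, 7, 2, "same"), same_padding(70, 7, 2))
```

When the total padding is odd, the extra zero goes on the second side, matching the floor and ceiling in the formula.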

```python
# The weights attribute returns tensors (TensorFlow variables);
# the get_weights() method returns NumPy arrays instead.
kernels, biases = Conv2D.weights

print(kernels.shape)  # [kernel_height, kernel_width, input_channels, output_channels]
print(biases.shape)   # [output_channels]
```

We can feed images of any size to this layer, as long as they are at least as large as the kernels and have the right number of channels.

It is useful to specify an activation function (such as ReLU) when creating a Conv2D layer, along with the corresponding kernel initializer (such as He initialization). Without a nonlinear activation, consecutively stacked convolutional layers are equivalent to a single convolutional layer, because a composition of linear operations is itself linear.
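The collapse of stacked linear layers can be seen in a one-dimensional sketch with `np.convolve`. The input and the two kernels are made-up values and biases are left out for simplicity. `np.convolve` performs true convolution rather than cross-correlation, but the argument is the same in both cases.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0, -1.0, 2.0])  # made-up 1D input
k1 = np.array([1.0, -1.0])                 # first layer's kernel
k2 = np.array([1.0, 1.0])                  # second layer's kernel

# Two stacked linear layers (no activation, no bias).
stacked = np.convolve(np.convolve(x, k1, mode="valid"), k2, mode="valid")

# A single layer whose kernel is the convolution of the two kernels.
merged = np.convolve(x, np.convolve(k1, k2), mode="valid")

# With a ReLU between the layers, the stack no longer collapses.
hidden = np.maximum(np.convolve(x, k1, mode="valid"), 0.0)
stacked_relu = np.convolve(hidden, k2, mode="valid")

print(stacked)
print(merged)
print(stacked_relu)
```

The first two results are identical, so the second linear layer adds no expressive power; the ReLU version differs, which is what makes depth useful.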

Hyperparameters Summary: filters, kernel_size, padding, strides, activation, kernel_initializer, etc.
