CNN (Convolutional Neural Network)
Introducing a question: classification
Given a feature representation of images, how do we learn a model that distinguishes features from different classes?
The machine learning framework
1: A prediction function f that maps an input to the desired output:
f(🍎)=apple
f(🍅)=tomato
f(🐮)=cow
2: The framework
There are two phases here:
- Training: given a training set {(x1, y1), ..., (xn, yn)}, estimate the prediction function f
- Testing: given f, apply it to a new input x and output the value y = f(x)
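A minimal sketch of these two phases in Python, assuming NumPy and scikit-learn are available; LogisticRegression is only a stand-in for the generic prediction function f, and the random data is purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training set {(x1, y1), ..., (xn, yn)} with random values
X_train = np.random.rand(100, 784)
y_train = np.random.randint(0, 10, size=100)

# Training: estimate the prediction function f from the training set
f = LogisticRegression(max_iter=200)
f.fit(X_train, y_train)

# Testing: apply f to an unseen input x and output y = f(x)
x_test = np.random.rand(1, 784)
y_pred = f.predict(x_test)
print(y_pred)
```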
Neural Networks (Linear)
- Perceptron
- Linear classifier: a vector of weights w and a bias b
This is convolution!
An example: binary classification of an image
- Each pixel of the image is an input, so for a 28x28 image we vectorize (flatten) it: x = 1x784
Note: here "vectorize" simply means flattening the 2D pixel grid (28x28) into a single 1D row vector of length 784 (e.g. row by row), so that a linear classifier can operate on it. It is unrelated to vector-graphics formats, where an image is stored as mathematical shapes instead of pixels.
- w is a vector of weights for each pixel: 784x1
- b is a scalar bias per perceptron
- result = xw + b -> (1x784)(784x1) + b -> (1x1) + b
[Note: the product xw is a scalar (a dot product)]
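A quick NumPy check of the shapes above (random values, only to illustrate the dot-product view of a single perceptron):

```python
import numpy as np

img = np.random.rand(28, 28)      # a 28x28 input image
x = img.reshape(1, 784)           # vectorize (flatten): 1x784
w = np.random.randn(784, 1)       # one weight per pixel: 784x1
b = 0.5                           # scalar bias

result = x @ w + b                # (1x784)(784x1) + b -> 1x1
print(result.shape)               # (1, 1): the score is a single number
prediction = result[0, 0] > 0     # threshold the score for a binary decision
```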
Multiclass (add more perceptrons)
- x is the same as in the example above: x = 1x784
- W is a matrix of weights, one column per perceptron: W = 784x10 (assuming 10-class classification)
- b is a bias per perceptron (a vector of biases): b = 1x10
- result = xW + b -> (1x784)(784x10) + (1x10) -> (1x10) output vector
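The same shape bookkeeping for the multiclass case, again with random NumPy values purely for illustration:

```python
import numpy as np

x = np.random.rand(1, 784)        # flattened image: 1x784
W = np.random.randn(784, 10)      # one weight column per perceptron: 784x10
b = np.random.randn(1, 10)        # one bias per perceptron: 1x10

result = x @ W + b                # (1x784)(784x10) + (1x10) -> 1x10 output vector
predicted_class = int(result.argmax())   # pick the perceptron with the highest score
```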
Bias convenience
- Create a 'fake' feature with constant value 1 to represent the bias
- Add an extra weight for it that can vary; the bias is then absorbed into the weight matrix (see the sketch below)
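A small NumPy sketch of the bias trick; the names `x_aug` and `W_aug` are hypothetical, chosen just to show that folding b into W gives the same result with a single matrix multiply:

```python
import numpy as np

x = np.random.rand(1, 784)
W = np.random.randn(784, 10)
b = np.random.randn(1, 10)

x_aug = np.hstack([x, np.ones((1, 1))])   # 1x785: append the 'fake' feature with value 1
W_aug = np.vstack([W, b])                 # 785x10: the bias becomes the last row of W

# Same output, but now it is one matrix multiplication with no separate bias term
assert np.allclose(x @ W + b, x_aug @ W_aug)
```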
Then: composition:
Outputs from one perceptron are fed as inputs into another perceptron.
It's all just matrix multiplication!
Two problems
1: With only linear functions, the composition of layers is really just a single linear function, so stacking adds no expressive power (see the check after this list)
2: Linear classifiers with a hard binary output: a small change in the input can cause a large jump in the output, which is a problem when composing functions
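A quick numerical check of problem 1, with illustrative layer sizes: two stacked linear layers compute exactly the same function as a single layer whose weight matrix is the product of the two:

```python
import numpy as np

x = np.random.rand(1, 784)
W1 = np.random.randn(784, 100)    # first linear layer
W2 = np.random.randn(100, 10)     # second linear layer

two_layers = (x @ W1) @ W2        # composition of two linear functions
one_layer = x @ (W1 @ W2)         # a single equivalent linear function

assert np.allclose(two_layers, one_layer)   # no extra expressive power was gained
```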
What we want instead: an activation that is non-linear and changes smoothly with its input.
Neural Networks (Non-Linearities)
MLP (Multi-Layer Perceptron)
- With enough parameters, it can approximate any function (a minimal forward-pass sketch appears after this list)
- Images as input to neural networks: spatial correlation is local, so full connectivity wastes resources and we do not have enough training samples to fit all those parameters
So we introduce sparse interactions (local connectivity):
- The composition of layers expands local interactions toward global coverage
Note: this parameterization works well when the input images are registered (aligned)
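As referenced above, a minimal sketch of an MLP forward pass; ReLU is chosen here only as one common non-linearity, and the layer sizes are illustrative. The activation between the two matrix multiplications is what prevents the layers from collapsing into a single linear map:

```python
import numpy as np

def relu(z):
    # Elementwise non-linearity: negative values are clipped to zero
    return np.maximum(0, z)

x = np.random.rand(1, 784)
W1, b1 = np.random.randn(784, 100), np.zeros((1, 100))
W2, b2 = np.random.randn(100, 10), np.zeros((1, 10))

hidden = relu(x @ W1 + b1)        # hidden layer with non-linear activation
output = hidden @ W2 + b2         # 1x10 vector of class scores
```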
Convolution Layer
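The notes do not spell the operation out, so here is a minimal sketch of what a single-filter convolution layer computes, written as plain cross-correlation with no padding and stride 1 (NumPy assumed; the 3x3 filter values are random). The same small set of weights is shared across every image location, which is the sparse, local interaction mentioned above:

```python
import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and take a weighted sum at each location
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

image = np.random.rand(28, 28)
kernel = np.random.randn(3, 3)    # one 3x3 filter shared across all locations
feature_map = conv2d(image, kernel)
print(feature_map.shape)          # (26, 26)
```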
Pooling Layer: Receptive Field Size
Pooling is similar to downsampling.
- In a convolutional neural network, a pooling layer typically follows a convolution layer (max pooling is used more often than average pooling)
- There are several kinds of pooling layers (max, average); a minimal max-pooling sketch follows below
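A minimal sketch of 2x2 max pooling with stride 2 (NumPy assumed; the window size is just a common choice):

```python
import numpy as np

def max_pool(feature_map, size=2):
    # Split the map into non-overlapping size x size blocks and keep each block's maximum
    H, W = feature_map.shape
    H, W = H - H % size, W - W % size                 # drop any ragged border
    blocks = feature_map[:H, :W].reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3))

feature_map = np.random.rand(26, 26)
pooled = max_pool(feature_map)
print(pooled.shape)               # (13, 13): downsampled by a factor of 2
```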
Local Contrast Normalization
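The notes stop at the heading, but one common reading of local contrast normalization is: subtract the local mean and divide by the local standard deviation within a small neighborhood. A rough sketch, assuming SciPy's `uniform_filter`; the window size and epsilon are illustrative choices, not values from the notes:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_normalize(image, size=9, eps=1e-5):
    local_mean = uniform_filter(image, size=size)         # mean over a size x size window
    centered = image - local_mean                         # subtract the local mean
    local_var = uniform_filter(centered ** 2, size=size)  # local variance of the centered image
    return centered / np.sqrt(local_var + eps)            # divide by the local std (plus eps)

image = np.random.rand(28, 28)
normalized = local_contrast_normalize(image)
```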