Computer Vision-CNN

CNN (Convolutional Neural Network)

A motivating question: classification

Given a feature representation of images, how do we learn a model that distinguishes features from different classes?

The machine learning framework

1: A prediction function that produces the desired output:

f(🍎)=apple

f(🍅)=tomato

f(🐮)=cow

2: The framework

There are two phases:

  • Training: given a training set {(x1,y1), ..., (xn,yn)}, estimate the prediction function f
  • Testing: given f, apply it to an unseen example x and output the predicted value y=f(x)
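The two phases can be sketched in a few lines. This is only an illustration: the nearest-centroid rule, the `train` helper, and the 2-D feature values are assumptions made up for the example, not the method the notes go on to describe.

```python
import numpy as np

def train(X, y):
    """Training: estimate f from {(x1, y1), ..., (xn, yn)}.

    Assumed rule for illustration: store one mean feature vector per class.
    """
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])

    def f(x):
        # Predict the class whose centroid is closest to x.
        distances = np.linalg.norm(centroids - x, axis=1)
        return classes[np.argmin(distances)]

    return f

# Toy training set with made-up 2-D features.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [4.8, 5.2]])
y_train = np.array(["apple", "apple", "tomato", "tomato"])

f = train(X_train, y_train)            # training phase
prediction = f(np.array([4.5, 5.1]))   # testing phase: y = f(x)
print(prediction)  # tomato
```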

Neural Networks (Linear)

  • Perceptron
  • Linear classifier: a vector of weights w and a 'bias' b

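    This is convolution! (The weighted sum of w and x is the same multiply-and-add that a convolution kernel performs at a single position.)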

An example of binary classifying an image

  • Each pixel of the image is one input, so a 28x28 image is vectorized (flattened) into x=1x784

Here, vectorizing means flattening the 28x28 grid of pixels into a single row vector of 784 values so that it can be multiplied by a weight vector. This is not the same as vectorization in graphics, which converts an image into a vector format described by mathematical formulas rather than by pixels. That kind of representation scales without jagged edges, is easy to edit, and prints cleanly, but it is not what is meant in this context.

  • w is a vector of weights, one per pixel: 784x1
  • b is a scalar bias per perceptron
  • result=xw+b -> (1x784)(784x1)+b -> (1x1)+b

    Notice: the result of multiplying **xw** is a scalar (a dot product)
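The shapes above can be checked with a minimal NumPy sketch. The random image, the random weights, the bias value 0.5, and the threshold-at-zero decision rule are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

image = rng.random((28, 28))        # stand-in for a 28x28 grayscale image
x = image.reshape(1, 784)           # vectorize: 1x784 row vector
w = rng.standard_normal((784, 1))   # one weight per pixel: 784x1
b = 0.5                             # scalar bias (arbitrary value)

result = x @ w + b                  # (1x784)(784x1) + b -> 1x1
label = 1 if result.item() > 0 else 0   # assumed rule: threshold at zero

print(result.shape)  # (1, 1)
```

The product `x @ w` is the dot product of the flattened image with the weight vector, which is why the result collapses to a single number.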

Multiclass (add more perceptrons)

  • x is the same as in the example above -> x=1x784
  • W is a matrix of weights, one row per pixel and one column per perceptron
    W=784x10 (assuming 10-class classification)
  • b is a vector of biases, one per perceptron -> b=1x10
  • result=xW+b=(1x784)(784x10)+b=(1x10)+(1x10)=output vector
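As a sketch of the multiclass case, the same computation with 10 perceptrons side by side looks like this. The random values and the argmax decision rule are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 784))             # flattened image, 1x784
W = rng.standard_normal((784, 10))   # one column of weights per perceptron
b = rng.standard_normal((1, 10))     # one bias per perceptron

scores = x @ W + b                   # (1x784)(784x10) + (1x10) -> 1x10
predicted_class = int(np.argmax(scores))   # assumed rule: highest score wins

# Column k of W together with b[0, k] is exactly the k-th binary perceptron.
k = 3
single = x @ W[:, k] + b[0, k]
print(scores.shape)  # (1, 10)
```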

Bias convenience

  • Create a 'fake' feature with the constant value 1 to represent the bias
  • Add an extra weight for that feature; learning this weight is equivalent to learning the bias
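A small check of the bias trick, assuming the same 784-input, 10-class shapes used earlier (the random values are placeholders): appending a constant 1 to x and stacking b under W gives the same output as xW+b.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 784))
W = rng.standard_normal((784, 10))
b = rng.standard_normal((1, 10))

x_aug = np.hstack([x, np.ones((1, 1))])   # 1x785: the 'fake' feature is the trailing 1
W_aug = np.vstack([W, b])                 # 785x10: the extra row holds the biases

same = np.allclose(x_aug @ W_aug, x @ W + b)
print(same)  # True
```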

Then, composition:

Outputs from one perceptron are fed into the inputs of another perceptron.

It's all just matrix multiplication!

Two problems

1: With all-linear functions, the composition of functions is really just a single linear function (it cannot represent anything more complex)

2: Linear classifiers: a small change in the input can cause a large change in the binary output, which is a problem for composing functions.
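Both problems can be seen numerically. In this sketch the layer sizes (784 -> 32 -> 10), the random weights, and the hard `step` threshold are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random((1, 784))
W1, b1 = rng.standard_normal((784, 32)), rng.standard_normal((1, 32))
W2, b2 = rng.standard_normal((32, 10)), rng.standard_normal((1, 10))

# Problem 1: two stacked linear layers equal one linear layer.
two_layers = (x @ W1 + b1) @ W2 + b2
W = W1 @ W2                 # combined weights, 784x10
b = b1 @ W2 + b2            # combined bias, 1x10
collapsed = np.allclose(two_layers, x @ W + b)
print(collapsed)  # True

# Problem 2: a hard threshold turns a tiny input change into a full output flip.
def step(z):
    return (z > 0).astype(int)

flipped = step(np.array([-1e-6]))[0] != step(np.array([1e-6]))[0]
print(flipped)  # True
```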

The thing we want:

Neural Network (Non-Linearities)

MLP (Multi-layer perceptron)

  • With enough parameters, it can approximate any function
  • Images are a poor fit as input to a fully connected network: spatial correlation is local, so connecting every unit to every pixel wastes resources, and there are not enough training samples to fit that many parameters
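A minimal forward pass for a two-layer MLP is sketched below. The notes do not name a specific non-linearity, so ReLU is an assumption here, as are the hidden size of 32 and the random initial weights. The parameter count shows how quickly a fully connected layer grows even for a 28x28 image.

```python
import numpy as np

def relu(z):
    # Assumed non-linearity; the notes do not specify which one is used.
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)

x = rng.random((1, 784))
W1 = rng.standard_normal((784, 32)) * 0.01   # input -> hidden
b1 = np.zeros((1, 32))
W2 = rng.standard_normal((32, 10)) * 0.01    # hidden -> output
b2 = np.zeros((1, 10))

hidden = relu(x @ W1 + b1)    # the non-linearity prevents the layers from collapsing
scores = hidden @ W2 + b2

n_params = W1.size + b1.size + W2.size + b2.size
print(scores.shape, n_params)  # (1, 10) 25450
```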

So we introduce a technique: sparse interactions

  • Composing layers expands the receptive field from local to global

    Note: after this operation, the parameterization works well when the input images are registered (aligned)
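The local-to-global effect can be illustrated with a naive sketch. It assumes a 3x3 window, a 9x9 input, and a single kernel reused at every position (as a convolution layer does, which goes one step beyond sparse connectivity alone); `conv2d_valid` is a hypothetical helper written for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid' 2-D cross-correlation: each output sees only one local patch."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.zeros((9, 9))
image[4, 4] = 1.0              # a single active pixel in the center
kernel = np.ones((3, 3))       # 9 weights, versus 81 per unit if fully connected

layer1 = conv2d_valid(image, kernel)    # 7x7
layer2 = conv2d_valid(layer1, kernel)   # 5x5

# The pixel reaches a 3x3 block of units after one layer and a 5x5 block after two.
print(np.count_nonzero(layer1), np.count_nonzero(layer2))  # 9 25
```

Read the other way around, each unit in the second layer depends on a 5x5 region of the input even though every individual layer only looks at 3x3 windows.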

Convolution Layer


Pooling Layer: Receptive Field Size

Pooling is similar to downsampling

  • In a convolutional neural network, a pooling layer usually follows a convolution layer (max pooling is used more often than average pooling)
  • There are several kinds of pooling layer (max/average)
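A sketch of both pooling variants on a small feature map follows. The 2x2 non-overlapping window is an assumption (a common choice, not one stated in these notes), and `pool2d` is a hypothetical helper.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over size x size blocks."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size          # drop rows/columns that do not fill a block
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]], dtype=float)

max_pooled = pool2d(fm, mode="max")        # values: [[4, 2], [2, 8]]
avg_pooled = pool2d(fm, mode="average")    # values: [[2.5, 1.0], [1.25, 6.5]]
print(max_pooled.shape)  # (2, 2)
```

Either way the 4x4 map shrinks to 2x2, which is the downsampling effect described above.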

Local Contrast Normalization

