Computer Vision-CNN

CNN (Convolutional Neural Network)

An important question: classification

Given a feature representation for images, how do we learn a model that distinguishes features from different classes?

The machine learning framework

1: A prediction function maps the input to the desired output:

f(🍎)=apple

f(🍅)=tomato

f(🐮)=cow

2: The framework

Here there are two stages:

  • Training: given a training set {(x1,y1), ..., (xn,yn)}, estimate the prediction function f
  • Testing: given f, apply it to a test example x and output the predicted value y = f(x)
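As a minimal sketch of these two stages, assuming a nearest-mean classifier as the (hypothetical) choice of model and made-up 2-D features: training estimates one mean feature vector per class, and testing returns the label of the closest mean. The names `train` and `f` are illustrative only.

```python
# Sketch of training/testing, assuming a nearest-mean classifier as the model for f.

def train(training_set):
    """Training: estimate f from {(x1, y1), ..., (xn, yn)} by averaging the features of each class."""
    sums, counts = {}, {}
    for x, y in training_set:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    means = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def f(x):
        """Testing: return the label whose class mean is closest (squared distance) to x."""
        return min(means, key=lambda y: sum((xi - mi) ** 2 for xi, mi in zip(x, means[y])))

    return f


# Made-up 2-D feature vectors for two classes
training_set = [([1.0, 0.9], "apple"), ([0.9, 1.1], "apple"),
                ([5.0, 4.8], "tomato"), ([5.2, 5.1], "tomato")]
f = train(training_set)
print(f([1.2, 1.0]))  # apple
print(f([4.9, 5.0]))  # tomato
```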

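Neural Networks (Linear)

  • Perceptron
  • Linear classifier: a vector of weights w and a 'bias' b

    This is convolution! (At every position, a convolution computes exactly this: a dot product of weights with inputs, plus a bias.)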
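A small Python sketch of the linear classifier and its link to convolution. The threshold at zero and the helper names `linear_score`, `perceptron` and `conv1d` are assumptions for illustration: `conv1d` just slides the same weights over every window of a 1-D signal and computes the same score at each position.

```python
def linear_score(x, w, b):
    """w·x + b: dot product of the weights with the inputs, plus a bias."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def perceptron(x, w, b):
    """Linear classifier: output 1 if the score is positive, else 0 (assumed threshold at 0)."""
    return 1 if linear_score(x, w, b) > 0 else 0


def conv1d(signal, w, b):
    """Slide the same weights w over every window of the signal; each output is a linear score."""
    k = len(w)
    return [linear_score(signal[i:i + k], w, b) for i in range(len(signal) - k + 1)]


print(perceptron([1.0, 2.0, 3.0], [0.5, -1.0, 0.25], 0.2))  # 0 (score = -0.55)
print(conv1d([1, 2, 3, 4], [1, -1], 0))                     # [-1, -1, -1]
```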

An example of binary classifying an image

  • Each pixel of the image is one input, so for a 28x28 image we vectorize it: x = 1x784
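Flattening can be sketched in plain Python (no libraries assumed), reading the image row by row so that pixel (r, c) lands at index r * 28 + c. The dummy image below simply stores each pixel's own flat index.

```python
def vectorize(image):
    """Flatten an H x W image (list of rows) row by row into one 1 x (H*W) vector."""
    return [pixel for row in image for pixel in row]


# Dummy 28x28 image whose pixel (r, c) stores its own flat index r * 28 + c
image = [[r * 28 + c for c in range(28)] for r in range(28)]
x = vectorize(image)
print(len(x))                # 784
print(x[29] == image[1][1])  # True: pixel (1, 1) sits at index 1 * 28 + 1
```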

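Note: in this context, vectorizing simply means flattening the 28x28 grid of pixel values, row by row, into a single 1x784 vector. In computer graphics, "vectorization" has a different meaning: it is the process of converting images, graphics, or other types of data into a vector format. In a vector format, images and graphics are represented as mathematical formulas rather than as collections of pixels or other discrete data points. This representation has many advantages, including:

Scalability: vector graphics can be scaled up or down indefinitely without losing sharpness or producing jagged edges.

Editability: vector graphics can be easily edited and modified, for example by changing colors, shapes, or sizes, without affecting image quality.

Interactivity: vector graphics can interact with other applications. For example, using vector graphics on a website can make pages load faster, and graphic properties can easily be changed through CSS style sheets.

Print quality: vector graphics print at higher quality because they do not lose sharpness or produce jagged edges.

In short, vectorization (in the graphics sense) can improve the quality of images and graphics and make them easier to edit, scale, and use.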

  • w is a vector of weights, one per pixel: 784x1
  • b is a scalar bias per perceptron
  • result = xw + b -> (1x784)(784x1) + b -> (1x1) + b

    Notice: the product xw is a scalar (a dot product).
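A sketch of the binary case with made-up numbers: the dot product xw collapses 784 values into one scalar, and the bias is added to it. Reading the sign of the result as the class (positive -> class 1) is an assumed decision rule, and the weights are hypothetical.

```python
def binary_classify(x, w, b):
    """result = xw + b: (1x784)(784x1) gives a 1x1 scalar, then the scalar bias b is added."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    result = sum(xi * wi for xi, wi in zip(x, w)) + b
    # Assumed decision rule: positive result -> class 1, otherwise class 0
    return result, (1 if result > 0 else 0)


x = [1.0] * 392 + [0.0] * 392     # dummy image: top half bright, bottom half dark
w = [0.01] * 392 + [-0.01] * 392  # hypothetical weights that favour a bright top half
result, label = binary_classify(x, w, b=-1.0)
print(round(result, 6), label)    # 2.92 1
```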

Multiclass (add more perceptrons)

  • x is the same as in the example above -> x = 1x784
  • W is a matrix of weights, one per pixel per perceptron:
    W = 784x10 (assuming 10-class classification)
  • b is a bias per perceptron (a vector of biases) -> b = 1x10
  • result = xW + b = (1x784)(784x10) + b = (1x10) + (1x10) = output vector (one score per class)
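The same idea with 10 perceptrons, sketched in plain Python with W stored as 784 rows of 10 weights. Picking the class with the largest score (argmax) is a common decision rule but an assumption here, as are the toy weights.

```python
def multiclass_scores(x, W, b):
    """result = xW + b: (1xD)(DxK) + (1xK) -> (1xK), one score per perceptron/class.

    W is stored as D rows (one per pixel), each holding K weights (one per perceptron).
    """
    K = len(b)
    return [sum(x[d] * W[d][k] for d in range(len(x))) + b[k] for k in range(K)]


D, K = 784, 10
x = [1.0] * D                                          # dummy all-bright image
W = [[0.001 * k for k in range(K)] for _ in range(D)]  # hypothetical weights
b = [0.0] * K
scores = multiclass_scores(x, W, b)
predicted = max(range(K), key=lambda k: scores[k])     # assumed rule: argmax of the scores
print(len(scores), predicted)  # 10 9
```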

Bias convenience

  • Create a 'fake' feature with constant value 1 to represent the bias
  • Add an extra weight for it that is learned like any other weight, so the result becomes just xW
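A quick check that the bias trick gives the same result, using small made-up matrices: appending a constant 1 to x and the bias vector as an extra last row of W turns xW + b into a plain xW. The helper names are hypothetical.

```python
def affine(x, W, b):
    """xW + b with W stored as rows (one per input feature)."""
    return [sum(x[d] * W[d][k] for d in range(len(x))) + b[k] for k in range(len(b))]


def linear(x, W):
    """xW with no separate bias."""
    return [sum(x[d] * W[d][k] for d in range(len(x))) for k in range(len(W[0]))]


def add_bias_feature(x, W, b):
    """Bias trick: append a constant 'fake' feature 1 to x and the biases as an extra row of W."""
    return x + [1.0], W + [list(b)]


x = [0.5, -1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.25, -0.5]
x_aug, W_aug = add_bias_feature(x, W, b)
print(affine(x, W, b))       # [2.75, 0.5]
print(linear(x_aug, W_aug))  # [2.75, 0.5]
```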

Then: composition

Outputs from one layer of perceptrons are fed into the inputs of the next layer of perceptrons.

It's all just matrix multiplication!
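A sketch with made-up 2x3 and 3x1 weight matrices (biases left out for brevity): feeding the outputs of the first layer into the second gives exactly the same result as multiplying once by the single matrix W1W2, so two purely linear layers behave like one.

```python
def matmul(A, B):
    """(n x m)(m x p) -> (n x p) for matrices stored as lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


x = [[1.0, 2.0]]                          # 1 x 2 input
W1 = [[1.0, -1.0, 0.5], [2.0, 0.0, 1.0]]  # 2 x 3: first layer of 3 perceptrons
W2 = [[1.0], [3.0], [-2.0]]               # 3 x 1: second layer, one perceptron

two_layers = matmul(matmul(x, W1), W2)  # outputs of layer 1 fed into layer 2
one_layer = matmul(x, matmul(W1, W2))   # a single equivalent 2 x 1 weight matrix
print(two_layers, one_layer)  # [[-3.0]] [[-3.0]]
```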

Two problems

1: With only linear functions, the composition of functions is still just a single linear function, so stacking layers adds no expressive power.

2: Linear classifiers with a hard threshold: a small change in the input can cause a large change in the binary output (e.g. from 0 to 1), which is a problem for composing functions.
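The second problem can be seen numerically. This sketch compares the hard threshold (step) with the sigmoid, one common smooth alternative (the choice of sigmoid is illustrative): nudging the input score from -0.001 to +0.001 flips the step output from 0 to 1, while the sigmoid output barely moves.

```python
import math


def step(z):
    """Hard threshold used by the linear classifier."""
    return 1.0 if z > 0 else 0.0


def sigmoid(z):
    """A smooth non-linearity: small change in z -> small change in output."""
    return 1.0 / (1.0 + math.exp(-z))


z_before, z_after = -0.001, 0.001  # a tiny change in the input score
print(step(z_after) - step(z_before))                  # 1.0: output jumps from 0 to 1
print(round(sigmoid(z_after) - sigmoid(z_before), 6))  # 0.0005
```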

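What we want: a small change in the input should cause only a small change in the output, which calls for smooth non-linear functions between layers.

Neural Network (Non-Linearities)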

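MLP (Multi-Layer Perceptron)

  • With enough parameters (hidden units), it can approximate any continuous function on a bounded input range
  • Feeding images directly into a fully connected network is a poor fit: spatial correlation in images is local, connecting every pixel to every unit wastes resources, and we do not have enough training samples to fit so many parameters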
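A minimal MLP forward pass in plain Python, assuming ReLU as the non-linearity (sigmoid would also work) and toy weights. `dense_param_count` is a hypothetical helper that counts weights plus biases; the hidden size of 1000 in the usage line is an arbitrary assumption, chosen to show how quickly fully connected layers grow with input size.

```python
def dense(x, W, b):
    """Fully connected layer: x (1xD) times W (DxK, stored as rows) plus b (1xK)."""
    return [sum(x[d] * W[d][k] for d in range(len(x))) + b[k] for k in range(len(b))]


def relu(v):
    """Assumed non-linearity: max(0, z) applied element-wise."""
    return [max(0.0, z) for z in v]


def mlp(x, W1, b1, W2, b2):
    """Two layers with a non-linearity in between, so they no longer collapse into one linear map."""
    return dense(relu(dense(x, W1, b1)), W2, b2)


def dense_param_count(layer_sizes):
    """Hypothetical helper: weights plus biases of a fully connected net with these layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(mlp([1.0, -1.0], [[1.0, -1.0], [1.0, 1.0]], [0.5, 0.5], [[2.0], [3.0]], [0.0]))  # [1.0]
print(dense_param_count([784, 1000, 10]))  # 795010
```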

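So we introduce sparse interactions: each unit connects only to a small local neighborhood of the input

  • Composing layers expands local interactions to global ones: each additional layer enlarges the region of the input that a unit sees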
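How fast does "local" become "global"? This sketch uses the standard receptive-field recurrence for stacked layers: each layer with kernel size k adds (k - 1) times the current spacing between units (measured in input pixels), and a stride s multiplies that spacing. The (kernel, stride) pairs are arbitrary examples.

```python
def receptive_field(layers):
    """Receptive field (in input pixels, per axis) of one unit after a stack of layers.

    layers: list of (kernel_size, stride) pairs, first layer first.
    """
    r, jump = 1, 1  # jump = spacing between neighbouring units, measured in input pixels
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
    return r


print([receptive_field([(3, 1)] * depth) for depth in range(1, 5)])  # [3, 5, 7, 9]
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8: conv 3x3, pool 2x2/2, conv 3x3
```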


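    Note: after such an operation, the parameterization works well only when the input images are registered (aligned), because each set of local weights is tied to one fixed image position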

Convolution Layer
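A convolution layer slides one small filter over the whole image and reuses the same weights at every position (parameter sharing), so the same feature can be detected anywhere in the image. A minimal single-filter sketch, stride 1, no padding; as in most deep learning libraries it computes cross-correlation (the kernel is not flipped). The edge filter and image are made-up examples.

```python
def conv2d(image, kernel, bias=0.0):
    """'Valid' 2-D convolution with one filter and stride 1 (cross-correlation, no kernel flip).

    The same kernel weights are reused at every output position (parameter sharing).
    """
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            row.append(s + bias)
        out.append(row)
    return out


image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge = [[-1, 1],
        [-1, 1]]  # hypothetical vertical-edge detector
print(conv2d(image, edge))  # [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]
```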


Pooling Layer: Receptive Field Size

Pooling is similar to downsampling: it reduces the spatial size of each feature map

  • In a convolutional neural network, a pooling layer is commonly placed after a convolution layer (max pooling is used more often than average pooling)
  • There are several kinds of pooling layers (e.g., max and average)
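A 2x2, stride-2 pooling sketch in plain Python showing both kinds on a made-up 4x4 feature map: each output keeps either the largest value or the mean of its window, halving height and width. The function name and defaults are illustrative.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Downsample a feature map by taking the max (or average) of each size x size window."""
    if mode == "max":
        reduce = max
    elif mode == "average":
        reduce = lambda vals: sum(vals) / len(vals)
    else:
        raise ValueError("mode must be 'max' or 'average'")
    H, W = len(fmap), len(fmap[0])
    out = []
    for i in range(0, H - size + 1, stride):
        row = []
        for j in range(0, W - size + 1, stride):
            window = [fmap[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(reduce(window))
        out.append(row)
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 0, 5, 6],
        [1, 1, 7, 8]]
print(pool2d(fmap, mode="max"))      # [[4, 2], [1, 8]]
print(pool2d(fmap, mode="average"))  # [[2.5, 1.0], [0.5, 6.5]]
```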

Local Contrast Normalization
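Local contrast normalization rescales each pixel relative to its neighbourhood. The sketch below is a simplified version and an assumption here: it uses a uniform (2*radius+1)^2 window instead of the Gaussian weighting often used, subtracts the local mean, and divides by the local standard deviation floored at `eps`. A useful property to check is that multiplying the image by a constant leaves the output unchanged (as long as every local std exceeds `eps`).

```python
import math


def local_contrast_normalize(img, radius=1, eps=1e-2):
    """Simplified LCN with a uniform window (assumption: not Gaussian-weighted).

    For each pixel: subtract the mean of its neighbourhood and divide by that
    neighbourhood's standard deviation, floored at eps. Windows are clipped at the border.
    """
    H, W = len(img), len(img[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [img[a][b]
                    for a in range(max(0, i - radius), min(H, i + radius + 1))
                    for b in range(max(0, j - radius), min(W, j + radius + 1))]
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            out[i][j] = (img[i][j] - mean) / max(std, eps)
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
out = local_contrast_normalize(img)
out_scaled = local_contrast_normalize([[10 * v for v in row] for row in img])
print(out[1][1])  # 0.0: the centre pixel equals its neighbourhood mean
print(all(abs(p - q) < 1e-9 for r1, r2 in zip(out, out_scaled) for p, q in zip(r1, r2)))  # True
```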

