COMP9517: Computer Vision
Objectives: This lab revisits important concepts covered in the Week 1 and Week 2 lectures
and aims to make you familiar with implementing specific algorithms.
Preliminaries: As mentioned in the first lecture, we assume you are familiar with programming
in Python or are willing to learn it independently. You do not need to be an expert, as you will
further develop your skills during the course, but you should at least know the basics. If you
do not yet know Python, we assume you are familiar with at least one other programming
language such as C, in which case it should be relatively easy to learn Python.
To learn Python or brush up on your skills, see the free online resources listed at the end of this document. Especially if you already know C or a similar language, there is no need to go through all the linked resources in detail. Just quickly learn the syntax and the main features of the language; the rest will follow as you go.
For implementing and testing computer vision algorithms, we use OpenCV in this course.
OpenCV is a library of programming functions aimed mainly at computer vision. The library is cross-platform and licensed as free and open-source software under the Apache License 2.0. It also supports training and execution of machine/deep learning models. Originally written in C, with new algorithms developed in C++, it has wrappers for languages such as Python and Java. As stated above, in this course we will focus on programming in Python. See the links below for OpenCV tutorials and documentation.
Software: You are required to use OpenCV 3+ with Python 3+ and submit your code as a
Jupyter notebook (see coding and submission requirements below). In the first tutor
consultation session this week, your tutors will give a demo of the software to be used, and
you can ask any questions you may have about this.
Materials: The sample images to be used in this lab are available via WebCMS3.
Submission: All code and requested results are assessable after the lab. Submit your source
code as a Jupyter notebook (.ipynb) which includes all output and answers to all questions
(see coding requirements at the end of this document) by the above deadline. The submission
link will be announced in due time.
1. Contrast Stretching
Contrast is a measure of the range of intensity values in an image and is defined as the difference between the maximum and minimum pixel values. The maximum possible contrast of an 8-bit image is 255 (max) − 0 (min) = 255. Any value less than that means the image has lower contrast than possible. Contrast stretching attempts to improve the contrast of an image by stretching the range of its intensity values using the linear mapping

s = (r − c) × 255 / (d − c)    (1)

where r is an input pixel value, s is the corresponding output pixel value, c is the minimum pixel value in the input image, and d is the maximum pixel value in the input image.
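As a minimal sketch (assuming Equation (1) as written above and an 8-bit single-channel input), the mapping could be implemented as follows:

    import numpy as np

    def contrast_stretch(channel, c=None, d=None):
        # Linearly map [c, d] to [0, 255] as per Equation (1). If c and d
        # are not given, use the channel's own minimum and maximum values.
        channel = channel.astype(np.float64)
        c = channel.min() if c is None else c
        d = channel.max() if d is None else d
        if d == c:
            return np.zeros(channel.shape, dtype=np.uint8)  # constant image
        out = (channel - c) * 255.0 / (d - c)
        return np.clip(out, 0, 255).astype(np.uint8)

The clipping step only matters when c and d are supplied externally (as in the second and third approaches described below); when they are the channel's own extremes, the mapped values already lie in [0, 255].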
Task (0.75 mark): Write an algorithm that performs contrast stretching as per Equation (1) above. Read the given image Oakland.png and execute your algorithm to see whether it indeed improves the contrast. Notice that this is a colour image with three channels (R, G, B), so you need to decide how to apply your algorithm to these channels.
There are different possibilities here and we consider three of them:
• The most straightforward is to apply the algorithm to each of the image channels (R, G, B) individually. That is, the mapping function (1) is calculated for each channel separately and may be different for each channel depending on its c and d values.
• Alternatively, you could convert the colour image to a gray-level image, for example using
the formula Y = 0.299R + 0.587G + 0.114B, then calculate the mapping function (1) on the
gray-level image (Y), and apply it to the channels (R, G, B) of the original colour image. In
this case, the same mapping function is applied to all channels.
• Finally, you could convert the colour image to a different colour space, such as HSV, then
calculate the mapping function (1) on the value (V) channel, and apply it to the channels
(R, G, B) of the original colour image. Here again, a single mapping function is calculated
and applied to all image channels.
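As a minimal sketch of all three approaches, reusing the contrast_stretch function from the sketch above (note that OpenCV loads images in BGR channel order; Oakland.png is assumed to be in the working directory):

    import cv2

    img = cv2.imread('Oakland.png')  # OpenCV reads in BGR order
    b, g, r = cv2.split(img)

    # Approach 1: each channel stretched with its own c and d.
    out1 = cv2.merge([contrast_stretch(ch) for ch in (b, g, r)])

    # Approach 2: c and d taken from the gray-level image Y, and the
    # same mapping applied to all three channels.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    out2 = cv2.merge([contrast_stretch(ch, y.min(), y.max())
                      for ch in (b, g, r)])

    # Approach 3: c and d taken from the value (V) channel of the HSV
    # representation, and the same mapping applied to all channels.
    v = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)[:, :, 2]
    out3 = cv2.merge([contrast_stretch(ch, v.min(), v.max())
                      for ch in (b, g, r)])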
In your notebook, execute your algorithm and show the input image and output image next
to each other, for each of the three approaches. Also briefly discuss in a comment in your
notebook which approach yields the best contrast-stretched colour image and provide
reasons for why that approach works better than the other two.
2. Histogram Calculation
The histogram of an image shows the counts of the intensity values. It gives only statistical information about the pixels and removes the location information. For a digital image with L gray levels, from 0 to L − 1, the histogram is a discrete function h(i) = nᵢ, where i ∈ [0, L − 1] is the i-th gray level and nᵢ is the number of pixels having that gray level.
Task (0.5 mark): Write an algorithm that computes and plots the histogram of an image and
also reports the minimum pixel value and the maximum pixel value in the image. Then execute
your algorithm to compare the histograms and extreme values before and after contrast
stretching of image Oakland.png for each of the three approaches in the previous task.
More specifically, for the first contrast-stretching approach, show the histogram and extreme
values for each of the three channels (R, G, B) of both the input image and the output image.
For the second approach, show the histogram and extreme values of only the gray value
representation (Y) of both the input image and the output image after conversion. For the
third approach, show the histogram and extreme values of only the value channel (V) of both
the input image and the output image after conversion.
To facilitate visual comparison, present the histograms of the input image and corresponding
output image side by side in each case.
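A minimal sketch of such a routine, assuming 8-bit single-channel inputs and matplotlib for plotting (the function name and arguments are placeholders):

    import numpy as np
    import matplotlib.pyplot as plt

    def compare_histograms(before, after, title=''):
        # Plot the histograms of an input/output image pair side by side
        # and report the minimum and maximum pixel value of each.
        fig, axes = plt.subplots(1, 2, figsize=(10, 4))
        for ax, img, label in zip(axes, (before, after), ('input', 'output')):
            hist = np.bincount(img.ravel(), minlength=256)  # h(i) = n_i
            ax.bar(range(256), hist, width=1)
            ax.set_title(f'{title} {label}: min={img.min()}, max={img.max()}')
            ax.set_xlabel('gray level')
            ax.set_ylabel('pixel count')
        plt.tight_layout()
        plt.show()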
3. Image Thresholding
A crucial first step for quantitative analysis of objects (or regions) of interest in images is to identify which pixels belong to the objects (the relevant pixels) and which belong to the background (the irrelevant pixels). This task is called image segmentation.
The simplest technique to perform this task is thresholding. Here, a pixel is considered to
belong to an object if its value is above the threshold, and to the background if its value is
lower than or equal to the threshold.
While an optimal threshold for each image could be selected manually by the user, this is
undesirable in applications that require full automation. Fortunately, several automatic
thresholding techniques exist, as discussed in the lecture.
Task (0.75 mark): Write an algorithm that can threshold an image using the three different
thresholding methods discussed in the lecture: Otsu, IsoData, Triangle. Apply your algorithm
to the images Hardware.png (the objects are the dark nuts and bolts), Nuclei.png (the objects
are the bright cell nuclei), and Orca.png (the object of interest is the Orca).
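As a minimal sketch (assuming 8-bit grayscale input): OpenCV provides Otsu and Triangle as flags to cv2.threshold, while IsoData is implemented here as the usual iterative mean-of-means scheme. For dark objects, as in Hardware.png, you may want cv2.THRESH_BINARY_INV instead of cv2.THRESH_BINARY.

    import cv2
    import numpy as np

    def isodata_threshold(img, eps=0.5):
        # Iteratively move the threshold to the midpoint between the mean
        # of the pixels at or below it and the mean of those above it.
        # Assumes both classes stay non-empty during the iteration.
        t = img.mean()
        while True:
            t_new = 0.5 * (img[img <= t].mean() + img[img > t].mean())
            if abs(t_new - t) < eps:
                return t_new
            t = t_new

    img = cv2.imread('Nuclei.png', cv2.IMREAD_GRAYSCALE)

    # Otsu and Triangle are built in; the first return value is the
    # automatically selected threshold.
    t_otsu, bin_otsu = cv2.threshold(img, 0, 255,
                                     cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    t_tri, bin_tri = cv2.threshold(img, 0, 255,
                                   cv2.THRESH_BINARY + cv2.THRESH_TRIANGLE)
    t_iso = isodata_threshold(img)
    bin_iso = np.where(img > t_iso, 255, 0).astype(np.uint8)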
In your notebook, show the results in table form to facilitate visual comparison of all images.
For example, one table row per input image, successively showing the input image, its
histogram, and the thresholding results using the three methods.
Also briefly discuss the differences in the results in your notebook and provide explanations
(based on the histograms or otherwise) why for some images one thresholding method may
work better than others, while for other images it may be the other way around, or perhaps
in some cases none of the methods work well. Present some general rules of thumb for which
thresholding methods are best for what kind of images.
4. Edge Detection
Edges are an important source of semantic information in images. A gray-scale image can be
thought of as a 2D landscape with areas of different intensities corresponding to different
heights. The edges are the transitions from one such area to the next.
The Laplacian is a second-order derivative operator that can be used to find edges. It emphasizes pixels in areas of strong intensity changes and de-emphasizes pixels in areas with slowly varying intensities. The edges are the zero-crossings in the Laplacian image. A common 3×3 discretization of the Laplacian is the kernel

0   1   0
1  −4   1
0   1   0

Task (0.5 mark): Write an algorithm that computes the Laplacian image of an input image using the above kernel. Apply the algorithm to the image Laplace.png.
Notice that the calculations may produce negative output pixel values. Thus, make sure you
use the right data types for the calculations and for the output image, and the right intensity
mapping to display the output image.
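A minimal sketch, assuming the 3×3 kernel shown above: a signed floating-point output depth preserves the negative responses, and a min-max normalization maps the signed range to [0, 255] for display.

    import cv2
    import numpy as np

    # Standard 4-neighbour discretization of the Laplacian.
    kernel = np.array([[0,  1, 0],
                       [1, -4, 1],
                       [0,  1, 0]], dtype=np.float64)

    img = cv2.imread('Laplace.png', cv2.IMREAD_GRAYSCALE)

    # Use a signed float output depth so negative values are preserved.
    lap = cv2.filter2D(img, cv2.CV_64F, kernel)

    # Linearly map the signed range to [0, 255] for display.
    disp = cv2.normalize(lap, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)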
Coding Requirements
Make sure that in your Jupyter notebook, the input images are readable from the location
specified as an argument, and all output images and other requested results are displayed in
the notebook environment. All cells in your notebook should have been executed so that the
tutor/marker does not have to execute the notebook again to see the results.
