Applied Computer Vision - a Deep Learning Approach

Applied Computer
Vision
FOR UNDERGRADS
A Deep Learning Approach
J. Berengueres

Applied Computer Vision for Undergrads
version 2.2
Editor
Jose Berengueres
Edition
First Edition. August 24th, 2014.
Text Copyright
© Jose Berengueres 2014. All Rights Reserved.

Video, Audio & Artwork Copyright
Artwork appearing in this work is subject to their
corresponding original Copyright or Creative Commons
License. Except where otherwise noted a Creative
Commons Attribution 3.0 License applies.
Limit of Liability
The editor makes no representations or warranties
concerning the accuracy or exhaustivity of the contents
and theories hereby presented and particularly disclaims
any implied warranties regarding merchantability or
fitness for a particular use including but not limited to
educational, industrial and academic application.
Neither the editor nor the authors are liable for any loss
of profit or any commercial damages including but not
limited to incidental, consequential or other damages.
Support
This work was supported by:
UAE University

1 | Intro
http://www.howwedrive.com/2012/01/23/let-the-robot-drive/

How face recognition works
http://vimeo.com/12774628
Andrew Ng
http://www.youtube.com/watch?v=AY4ajbu_G3k
Google self-driving car
https://www.youtube.com/watch?v=YXylqtEQ0tk
1 | Intro - Topics
Introduction
videos

2 | Theoretical Stuff
OpenCV was developed by Intel Russia
research center at Nizhny Novgorod
Nizhny Novgorod street via: http://personal.cfw.com/~renders/nizhnymall_photos.html
How to program a PC so it learns how to see?
What does it mean to “see” something?
1. Projective geometry
2. The Eye Design
3. Saliency Sense in babies
Optical illusions are
proof of information
discarding at retina
level (see also deep
learning)
What does it mean to
See?
------

Projective Geometry
Many books on computer vision start with this topic, which is
quite irrelevant. Let me explain: Picasso did not benefit from
learning the chemistry of yellow paint manufacturing, and
neither will you. You don’t need 3D geometry knowledge; you just
need great 2D geometry knowledge. Additionally, his
paintings were among the first (since the Middle Ages) to
ignore the laws of projective geometry, and they were a big
success. His father was a frustrated art teacher, but he
made sure that young Picasso would not become like
himself. Picasso, like Tiger Woods, was a product of his
father.
2 | Theoretical Stuff - Projective Geometry
Caring about projective geometry in computer vision
is the same mistake as “model of the world” vs.
“ground model”: you should not care about the
world because you already have one model of it.
Your retina is your world and your model! Do not
complicate it by adding an additional layer, a
model of a model that you have to update
continuously. You don’t even know if the world is
real. See Gödel.
------
Reality?
Your Reality!

Pin-hole Design
The pinhole design is one of the few cheap ways to convert a
3D world into a 2D picture (simplification). That is why
man-made cameras use the same principle. For a
fascinating story on how the pinhole eye is used by
biological systems I recommend Climbing Mount
Improbable. There are other ways in which nature “sees”:
the bat’s ultrasound vision
dolphins’ ultrasound vision
the compound eye of the fly
polarized light vision
From a computer vision point of view it should not matter if
your vision device is pinhole based or not.
Compound eye. What is the pixel resolution of the
compound eye?
2 | Theoretical Stuff - The Design of the Eye


http://www.detectingdesign.com/humaneye.html

2 | Theoretical Stuff - Developmental Psychology
Hit a Wall
When Maja Rudinac hit the wall of impractical computer
vision, she turned to developmental psychologists for
strategies to cope with large amounts of image pixels. This is
what she found (so you don’t have to):
Visual Developmental Psychology
Basics
(Abridged from Maja Rudinac PhD thesis, TUDelft)
Babies at the 4th month can already tell if a character is
bad or good because we can see who they hug longer.
Infants look longer at new faces or new objects
Regardless of where they are born, all babies know
the boundaries of objects.
Can predict collisions
Basic additive and subtractive cognition
Can identify members of own group
versus non-own group
Spontaneous motor movement is not goal directed at the
onset. The baby explores the degrees of freedom
Goal directed arm-grasp appears at the 4th month
The ability to engage and disengage attention on targets
appears from day 1 in babies.
Smooth visual tracking is present at birth
How baby cognition “works”
The development of actions in babies is driven by three
motives. Actions are either,
1. To discover novelty
2. To discover regularity
3. To discover the potential of their own body
Development of Perception
Perception in babies is driven by two processes:
1. Detection of structure or patterns
2. Discarding of irrelevant info and keeping relevant info
Cognitive Robot Shopping List
So if we want to make a minimum viable product (MVP)
that can understand the world at least as well (or as
poorly) as a baby does, these are the functions that,
according to Mrs. Maja (pronounced Maya), we will need:
A webcam
Object tracking
Object discrimination
Attraction to people’s faces
Face recognition
Use of the hand to move objects to scan them from
various angles
Shades and 3D
It turns out that shades have a disproportionate influence in
helping us figure out 3D info from 2D retina pixels. When
researchers at Univ. of Texas used fake shades in a virtual
reality world, participants got headaches because the
faking of the shades was not precise enough to fool the
brain. The brain got confused by the imperceptible
mismatches (that’s why smart people get headaches in
3D cinemas).

3 | Number Recognition Workshop
2014. The purpose of the number recognition
workshop is to learn computer vision basics.
In this chapter we will learn the four basic
components of a typical computer vision
program:
1. Features
2. Clustering
3. Filtering/Morph ops
4. Validation
We will use the example to learn OpenCV. And
finally we will learn why manual feature making is
obsolete because of deep learning.
Let’s learn by means
of a simple example
What does it mean to
classify numbers?
------

3 | Number Recognition Workshop - Number Recognition
Intro
This workshop introduces OpenCV functions
as needs arise. Let’s identify written numbers from 0 to 9.
You can get some inspiration in this video:
http://www.youtube.com/watch?v=D_cZBdfw-hQ
In Feb 2011 (just before the tsunami), we programmed
HRP-IV to play a game. We used the histogram method to
separate Caucasian skin from a background and then we
counted the number of valleys and mountains between
the hull vertices. A more primitive approach is to average
the area, but it is not as robust.
Workshop Time
Teams of 4. You have 15 minutes to come up with some
algorithm, trick or rule of thumb to classify the numbers.
Note The students will try to find saliency features. Good
saliency features are robust to:
noise
partial occlusions and,
confusion
Here is a typical list:
# of horizontal segments
# of pixels belonging to horizontal segments
Length of segments
# of closed loops
# of start and end points
Relative orientation of end points
Extracting features from pixels is called feature
extraction. Salient features are the ones that are most
useful in classifying pixels (by Information Entropy).
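As an illustration of the first two features in the list above, here is a minimal Python sketch that counts horizontal segments in a binary image. The function name `horizontal_segments`, the `min_length` threshold and the 5x5 drawing of a 7 are assumptions made up for this example; they are not taken from the workshop material.

```python
def horizontal_segments(image, min_length=3):
    """Return (number of horizontal segments, pixels inside those segments).

    image is a list of rows; 1 means ink, 0 means background.
    A segment is a run of at least min_length ink pixels in one row.
    """
    count, pixels = 0, 0
    for row in image:
        run = 0
        # The extra trailing 0 closes a run that touches the right border.
        for value in list(row) + [0]:
            if value:
                run += 1
            else:
                if run >= min_length:
                    count += 1
                    pixels += run
                run = 0
    return count, pixels


# Hypothetical 5x5 drawing of the number 7.
SEVEN = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

n_segments, n_pixels = horizontal_segments(SEVEN)
print(n_segments, n_pixels)
```

Two numbers now stand in for 25 pixels. Whether they are salient enough to separate a 7 from, say, a 1 is exactly what the workshop is meant to reveal.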
Feature Reduction as a Necessity
Consider the letter ‘E’. Its representation as a vector is e =
{0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,1,1,1,1,1,1,1,0,0,0,0,1,0,0,0,0,0,0, ...}, that is, a vector
of dimension 11 x 13 = 143 (more than 100). There is no way for the
retinal nerve to have enough bandwidth to send all
that information for processing to the brain neurons. What
happens in reality is that, already at retina level, features
are being extracted and information is compressed
(edge filters).
Minimum Number of Features
If we find five features that allow us to distinguish between
all 10 different number shapes, then the problem becomes
a dimension 5 problem, much more manageable.
Minimum number of features formula (assuming binary features)
N >= log2(number of different labels)
Because the problem can be complex, students can start
by trying to distinguish just two numbers (the 1 and the 0),
manually extracting two features: vertical and horizontal
features (feature reduction). Then, we can map the
numbers in clusters. This helps them understand the need
for more features when clusters overlap.
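The formula above can be checked with a few lines of Python. This is a minimal sketch that assumes every feature is binary (true/false), so that N features can tell apart at most 2^N labels; the function name `min_binary_features` is made up for this illustration.

```python
def min_binary_features(num_labels):
    """Smallest N such that 2**N >= num_labels (binary features assumed)."""
    if num_labels < 1:
        raise ValueError("need at least one label")
    # (n - 1).bit_length() equals ceil(log2(n)) for positive integers
    # and avoids floating point rounding.
    return (num_labels - 1).bit_length()


digits_needed = min_binary_features(10)      # the ten digits 0 to 9
two_digits_needed = min_binary_features(2)   # just the 0 and the 1
print(digits_needed, two_digits_needed)
```

This is only a lower bound. Real features are noisy and their clusters overlap, so in practice a few more features than the minimum tend to be needed.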
Training
So if we start with webcam photos of figures at 600x400
pixels how would you design the program? At this stage
most students fail to recognize the need for training with
lots of examples. In fact the hard part of this workshop is
training (understanding what features work to differentiate
0 from 1 and a 1 from a 2...) (See also Andrew Ng video in
Chapter 1)

Reflection
Some teams came up with valley mountain features.
Others with number of lines, or crossings.
Let’s make a competition to see which is the winning
team! (this is also a great excuse to help them learn
openCV)
Cognition Leads to Play?
Once we have a working program with the ability to
recognize numbers, we can apply Toyota’s Kaizen (see
The Brown Book of Design Thinking Chapter 3). What kind
of games or apps can we make? What kind of applications
can you imagine? Make a useful app.
All computer vision
apps today
fundamentally work
the same as this
example
Now we just need to learn
how to:
1. Do tracking
2. Use clustering methods
3. Extract features with
OpenCV without reinventing
the wheel (next)
If
you can make an
app that recognizes my
handwriting better than my
app, I will give you an...
A+

3 | Number Recognition Workshop - Clustering of Features
How to Cluster Features (review of three methods)
By Closest Neighbor
By Centroids (center of mass), k-means
Let’s
have an honest
discussion about
clustering

By Probability
Given that x is at this position, which label minimizes the probability of error?
Comparison of methods
Centroid --> x belongs to Green
Neighbor --> Green
Statistic --> Red
Where does ‘x’ belong?
-----
What is the best “strategy”

http://en.wikipedia.org/wiki/Cluster_analysis
The previous are the
three main unsupervised
clustering methods
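The three strategies above can be compared on a toy example. The sketch below is an illustration under stated assumptions: the point coordinates are invented, the "by probability" rule is approximated with one axis-aligned Gaussian per cluster and equal priors, and none of the function names come from the slides. The data is chosen so that the outcome mirrors the comparison above: a tight green cloud next to a wide red one.

```python
import math


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def by_nearest_neighbor(x, clusters):
    """Label of the single closest training point."""
    return min(
        (math.dist(x, p), label) for label, pts in clusters.items() for p in pts
    )[1]


def by_centroid(x, clusters):
    """Label of the cluster whose center of mass is closest."""
    def centroid(pts):
        return (mean([p[0] for p in pts]), mean([p[1] for p in pts]))
    return min(clusters, key=lambda label: math.dist(x, centroid(clusters[label])))


def by_probability(x, clusters):
    """Label with the highest Gaussian log-likelihood (equal priors assumed)."""
    def log_likelihood(pts):
        total = 0.0
        for axis in (0, 1):
            vals = [p[axis] for p in pts]
            mu, var = mean(vals), variance(vals)
            total += -0.5 * math.log(2 * math.pi * var) - (x[axis] - mu) ** 2 / (2 * var)
        return total
    return max(clusters, key=lambda label: log_likelihood(clusters[label]))


# Invented data: green is tight, red is spread out.
clusters = {
    "green": [(-0.1, 0.0), (0.1, 0.0), (0.0, -0.1), (0.0, 0.1)],
    "red": [(3.0, 2.0), (3.0, -2.0), (5.0, 0.0), (2.0, 0.0)],
}
x = (0.8, 0.0)

for method in (by_centroid, by_nearest_neighbor, by_probability):
    print(method.__name__, method(x, clusters))
```

The point x is closer to green in plain distance, but it lies many standard deviations away from the tight green cloud and only about two from the wide red one, so the probabilistic rule disagrees with the two distance-based rules.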

The Feature - Cluster Conundrum
Now ask the students to make a
flowchart of a program that can
label handwritten numbers 0 to 9.
At this point most students come
up with this flowchart.
The purpose of the exercise is to
let the student realize the
difference between training and
labeling. Training is the hard part. Another hard part to get is
how to separate useful from useless features in the
dimension reduction phase. Previously we said to use entropy
as an indicator of how useful a feature can be. However, if I
plug in random noise as a feature, it will score high on entropy.
Because the quality of results depends on clustering, and
clusters depend on what features we choose, it is not a good
idea to decouple feature discarding from clustering itself.
This is called the feature-cluster conundrum.
In the next chapter, let’s find out how feature-finding was
traditionally approached in the 80’s and 90’s (this is now of
course obsolete knowledge but I will include it here as a
“nostalgic” note).
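The point about noise scoring high on entropy can be made concrete. The sketch below is an illustration, not part of the original workshop: it compares the entropy of a feature with the mutual information between that feature and the labels, which is one common way to tie feature scoring to the labeling task. The toy labels and feature values are invented.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """I(feature; labels) = H(feature) + H(labels) - H(feature, labels)."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


labels = [0, 0, 0, 0, 1, 1, 1, 1]
useful = [0, 0, 0, 0, 1, 1, 1, 1]   # follows the label exactly
noise = [0, 1, 2, 3, 0, 1, 2, 3]    # unrelated to the label

for name, feature in (("useful", useful), ("noise", noise)):
    print(name, entropy(feature), mutual_information(feature, labels))
```

The noise feature has more entropy than the useful one, yet it shares no information with the labels. Scoring a feature in isolation is therefore not enough; it has to be scored against what we are trying to cluster.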
So,
learnt
(knowledge) = List of
Feature + Cluster
boundaries?

3 | Number Recognition Workshop - Morphological Ops
Morphological Ops
Morphological operations (morph ops) differ from linear filters in that they are
nonlinear. Imagine you have the letter E but the corner pixel
has been erased because of some noise. Your brain
(Gestalt theory) can reconstruct it; in fact, it is
designed to reconstruct intersections of lines.
However, a computer is not you. It does not know anything
about Gestalt, so how can we reconstruct this
missing corner automatically? If we don’t, the
computer might think this is an underlined F.
You can reconstruct it by use of a so-called
‘morph op’ called closing.
F!
E!

Closing
1. Dilate - enlarge black regions by adding black pixels next to
pre-existing black pixels using some kind of rule
http://www.youtube.com/watch?v=xO3ED27rMHs
2. Erode - the reverse process
http://www.youtube.com/watch?v=fmyE7DiaIYQ
before after
Opening
The same two steps in reverse order: first erode (2), then dilate (1)
before after
Use cases
Closing is used to connect missing lines or parts.
Opening is used to remove noise that does not
belong to the largest object in the scene.
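The two operations and their two combinations can be sketched in plain Python, without OpenCV. This is a minimal illustration under assumptions: ink pixels are stored as 1, the structuring element is given as a list of (row, column) offsets and is assumed symmetric, and pixels outside the image are ignored by erosion. All names are made up for this example.

```python
def dilate(image, offsets):
    """A pixel becomes 1 if ANY pixel under the structuring element is 1."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and image[ny][nx]:
                    out[y][x] = 1
                    break
    return out


def erode(image, offsets):
    """A pixel stays 1 only if ALL in-bounds pixels under the element are 1."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                image[y + dy][x + dx]
                for dy, dx in offsets
                if 0 <= y + dy < h and 0 <= x + dx < w
            ))
    return out


def closing(image, offsets):
    """Dilate, then erode: fills small gaps."""
    return erode(dilate(image, offsets), offsets)


def opening(image, offsets):
    """Erode, then dilate: removes small specks."""
    return dilate(erode(image, offsets), offsets)


HORIZONTAL_3 = [(0, -1), (0, 0), (0, 1)]    # a 1x3 horizontal bar

broken_line = [[1, 1, 0, 1, 1]]             # one missing pixel
noisy_line = [[0, 1, 0, 0, 1, 1, 1, 0]]     # one stray pixel plus a real stroke

print(closing(broken_line, HORIZONTAL_3))
print(opening(noisy_line, HORIZONTAL_3))
```

Closing reconnects the broken stroke, and opening deletes the stray pixel while keeping the three-pixel stroke. Anything narrower than the structuring element is lost by opening.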

Structuring element
In this case the blue cross is the structuring element. It can
be any other shape.
For more details:
http://homepages.inf.ed.ac.uk/rbf/HIPR2/morops.htm
http://bigwww.epfl.ch/demo/demoteaching.html Demos
Practice
We can use Excel to practice manual closing and opening

L shape Excel exercise
In this exercise, I asked the students to come up with a
structuring element to reconstruct the E shape. Most of them
proposed the 3x3 L-shape. After two closings we can see
the following: we managed to reconstruct the missing corner,
but the structuring element obliterates details in the
picture smaller than itself. This is one drawback of
morphological operations.

3 | Number Recognition Workshop - OpenCV Code Review
Code review - What’s wrong with OpenCV thinking
https://github.com/orioli/MAID-ROBOT
https://github.com/orioli/MAID-ROBOT/blob/master/uEyeCameraHIRO/camShiftDemo/camShiftDemo.cpp
What the robot sees.
Results of the code review:
400 lines of code to count 5 fingers.
Not robust to skin color change
Not reusable code or a general purpose solution
It is too custom --> obsolescence assured
This project of 2007 is
an example of everything
that is wrong with OpenCV
thinking. It is not the way forward.
The brain does not work like this. It
does not scale. So what’s next?
------

3 | Number Recognition Workshop - Edge Detectors & convolution
Now students have all the knowledge to make the software to
identify numbers. Ask the students to draft a detailed
action plan to classify characters from 0 to 9 from photos.
Here is an example:
1. Get the training dataset
   1. How many do we need? 100 of each number?
      1. Organize folders /1 /2 /3 …
      2. Go to the cafeteria and ask people to give samples
         of numbers
      3. Digitize them. How? Take a picture with the
         iPhone
2. Training SW
   1. Find features for numbers
      1. How many do we need?
         1. Five?
         2. Let the SW choose the useful ones
      2. Edges
      3. Horizontal lines
      4. Closed spaces
      5. Shapes?
3. Cluster dataset
   1. How good is the clustering? → Testing
   2. Choose the method with highest accuracy across
      testing subsets
   3. The model will tell you which features are useful and
      which are not at labeling
   4. How do you test it?
   5. Bring other (new) numbers and check
   6. Split dataset into training set and testing set. 50% -
      50%
   7. Prevent overfitting - Divide test data into chunks of 10
      and see prediction accuracy for each individual chunk
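The last two steps of the plan (the 50% - 50% split and the per-chunk accuracy check) could look like the sketch below. It is only an illustration: the toy dataset, the `predict` function standing in for a trained classifier, and all helper names are hypothetical.

```python
import random


def split_half(samples, seed=0):
    """Shuffle a copy of the samples and split it 50% / 50%."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def chunk_accuracies(test_set, predict, chunk_size=10):
    """Accuracy of predict() on each consecutive chunk of the test set."""
    accuracies = []
    for start in range(0, len(test_set), chunk_size):
        chunk = test_set[start:start + chunk_size]
        correct = sum(1 for features, label in chunk if predict(features) == label)
        accuracies.append(correct / len(chunk))
    return accuracies


# Toy dataset: the "feature" is an integer and the label is its parity.
samples = [(n, n % 2) for n in range(40)]


def predict(features):
    """Placeholder for the trained classifier (hypothetical)."""
    return features % 2


train_set, test_set = split_half(samples)
print(chunk_accuracies(test_set, predict))
```

If the accuracy varies a lot from chunk to chunk, the model has probably memorized quirks of the training set rather than learned features that generalize.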
Now that students have drafted a plan, let them do
something concrete. Draw a matrix of 1s and 0s that
represents the binary image of the number 7 and
ask them how they would extract features.

Feature Extraction
In Excel we can use conditional highlighting to visualize the filtering process. We start by a 7. Ask the students
How would you extract a feature from the ones and zeroes?

Gaussian filtering by convolution

Vertical edge detector by convolution
See also Canny Filters and Laplacians
1. http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/canny_detector/canny_detector.html
2. http://matlabserver.cs.rug.nl/cgi-bin/matweb.exe
3. http://www.youtube.com/watch?v=pIFnFhDsYlk
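The Excel exercises above (Gaussian filtering and vertical edge detection) both come down to sliding a small kernel over the matrix of ones and zeroes. Below is a minimal pure-Python sketch. The two 3x3 kernels are common textbook choices (a binomial approximation of a Gaussian and a Prewitt-style vertical edge detector) and are assumptions; they are not necessarily the ones shown in the slides.

```python
def convolve2d(image, kernel):
    """'Valid' 2D convolution: the kernel is flipped, no padding is added."""
    kh, kw = len(kernel), len(kernel[0])
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for y in range(len(image) - kh + 1):
        out_row = []
        for x in range(len(image[0]) - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += flipped[i][j] * image[y + i][x + j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Assumed kernels (textbook examples).
GAUSSIAN_3X3 = [[1 / 16, 2 / 16, 1 / 16],
                [2 / 16, 4 / 16, 2 / 16],
                [1 / 16, 2 / 16, 1 / 16]]
VERTICAL_EDGE = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# A single vertical stroke, like the stem of a 1.
stroke = [[0, 0, 1, 0, 0] for _ in range(5)]

blurred = convolve2d(stroke, GAUSSIAN_3X3)
edges = convolve2d(stroke, VERTICAL_EDGE)
print(blurred[0])
print(edges[0])
```

The blur spreads the stroke over its neighbors, while the edge kernel responds with opposite signs on the two sides of the stroke and with zero at its center. A horizontal edge detector is the same kernel transposed.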
At this point we can realize that
numbers are traces, so what we
need is to classify the types of
traces, not the detection of edges?
What kind of edge classifier can
we use? Viola-Jones?
Convolution? We want to know:
type of edge, position

4 (0,0.5,4,13,33) it is a four!
http://practicalquant.blogspot.ae/2013/10/deep-learning-oral-traditions.html
Convert to +100
features found by hand
Example
feature: Has a cross in
the lower mid feature
Traditional
approach

it is a three!
Deep learning approach
Layer - abstraction
LEVEL 3: SPATIAL RELATIONSHIP DETECTOR
LEVEL 2: PRIMITIVE SHAPE DETECTORS
LEVEL 1: EDGE DETECTOR BANK
http://cs.brown.edu/courses/cs143/2011/results/proj2/thuhe/

3 | Number Recognition Workshop - Features and Sparse Coding
Feature Extraction
http://www.youtube.com/watch?v=n1ViNeWhC24
At this point, it is probably a good time to ask the students to
try to make a little system to classify handwritten numbers.
See what error rate they come up with. For the training set, they
can ask friends to write numbers. Whatever you do, do not
forget to set a deadline, otherwise they will seize the
opportunity. ( #rookiemistake )
How
to do automatic
feature extraction?
Manual
Feature Extraction is
not the way forward
Most
of Kaggle’s comp.
winners are decided by
how lucky they are at
finding useful features
----
D. Efimov

4 | Number Recognition Workshop Solution
Number Recognition
Workshop Solution

4 | Number Recognition Workshop Solution - Deep Learning Workshop
Recapitulation
In the previous chapter we saw how to manually make
features (aka feature extraction). We also saw that feature
extraction is about 50% of the work to win a Kaggle
competition. The other 50% is optimizing the mathematical
prediction model ( Efimov 2012 ). We also saw that some
geniuses, like Andrew Ng, postulate that manual feature
extraction is a waste of time, and that this is the kind of nitty-gritty
job that should be done by computers. We also saw in the
section on the feature-cluster conundrum one more reason why
feature engineering is part of the problem itself. In the section
about Sparse Coding we saw a mathematical foundation that
is a good reverse engineering of how the brain finds good
features, because both sparse coding and the brain end up
with similar edge detector filters.
DNN online, on-demand
The company Ersatz allows you to try Deep Neural Network
models online. They have a demo based on number
recognition that we will use now (which is what we have been
doing in the last chapter). The handwritten training sets (MNIST,
derived from NIST data) are available at http://yann.lecun.com/exdb/mnist/.
The author, Yann LeCun, compared different methods to do
the job. Convolutional neural networks do the job better than
metric approaches such as SVM.

The MNIST tutorial
This tutorial explains how to use the cloud infrastructure to
solve the number recognition problem via Deep Neural
Networks.
http://www.ersatzlabs.com/documentation/sharedMNIST/
More background info, current backdrops and videos on the
history of ANN:
History of neural nets (playlist of 5 videos)
https://www.youtube.com/watch?v=4B-XY8a4RGk
https://gigaom.com/2014/06/11/more-deep-learning-for-the-masses-courtesy-of-ersatz-labs/

Additionally, Ersatz published an interesting introduction to
neural nets at:
http://neuralnetworksanddeeplearning.com/chap1.html

5 | Advanced Topics
Advanced
Topics

Using Vision to Navigate
http://www.youtube.com/watch?v=8c2SFXQ5zHM Sir James explains
http://www.youtube.com/watch?v=oguKCHP7jNQ Navigation
http://www.youtube.com/watch?v=xlaqYDZwoWo#t=43 Suction
NT: Hoover tried to steal his idea. They lost in court with
punitive damages. Top right photo HEAD SPORTS.
5 | Advanced Topics - Vision for Navigation
One day my wife was
sick so I had to vacuum the
house. Then I realized that bag
based vacuum cleaners do not suck,
so I decided to make one that sucks
more. It took longer than expected
though.
------
Sir J. Dyson

What People Cured of Blindness See
Abridged from “What People Cured of Blindness See”
by Patrick House, The New Yorker
How quickly, if at all, does the brain adapt and vision return
after surgery? A simple answer, and a correct one, is that it
depends entirely on circumstance. Back in 1993, Oliver
Sacks wrote a story in the magazine about Virgil, a man with
limited to no vision as a child who had developed cataracts at
the age of six. After his cataracts were removed, fifty years
later, Virgil had trouble adjusting. (For example, he could not
always distinguish the letter “A” from the letter “H” and, when
given Molyneux’s test, could not tell a square he felt from a
square he saw.)*
Since the surgeries, Sinha has followed up with the Prakash
children and found that, while they continued to suffer from
poor acuity, many higher-order aspects of vision seemed to
be improving. Within a week to a few months after surgery,
the children could match felt objects to their visual
counterparts. They also improved on spatial-navigation tasks
requiring mental imagery, which tested their ability to follow a
series of up, down, left, and right directions on a visually
imagined game board. This finding was particularly important
because previous work by Kosslyn and others had found that
the congenitally blind have a capacity for mental imagery, but
it is limited in some ways and becomes increasingly poor as
the task becomes more complex. (In one example, a sighted
person will imagine a typewriter a few feet away as larger than
the same one imagined a hundred feet away. Among the
congenitally blind, however, the imagined typewriter—a
composite of experiences of touch and sound alone—is the
same size at all distances.)
5 | Advanced Topics - Blindness

Kosslyn believes that any improvements in mental imagery will
require a “catalogue of visual memories” that can then be
used to build expectations about the visual world. “When you
develop expectations, you can use the fruits of previous
experience to help you process what’s coming in now,”
Kosslyn said. “But you need to have had that experience.” An
example is depth perception: to the sighted, with a lifetime of
practice, rules about occlusion (if A occludes B, object A is
closer) and foreshortening (objects farther away appear
smaller) are continually used to combine incoming light into a
rich, three-dimensional world. The absence of these rules can
frustrate the newly sighted, whose visual world can be both
blurry and two-dimensional—paintings and people are often
described as “flat, with dark patches”; a far-away
house is “nearby, but requiring the taking of a lot of
steps”; streetlights seen through glass are “luminous stains
stuck to the window”; sunbeams through tree branches
collapse into a single “tree with all the lights in it.” (The writer
Jorge Luis Borges, who went blind at age fifty-five, described
going blind as a process by which “everything near becomes
distant.” In the newly sighted, without depth perception, the
opposite seems true: the distant—tiny houses on the horizon,
clouds in the impossibly high sky—suddenly looks nearby.)

Ways of tracking
http://www.youtube.com/watch?v=InqV34BcheM
1. By color histogram (HSV is less
dependent on illumination)
2. By blob (OpenCV library, not very
robust)
3. By face detection
4. By saliency (robust to occlusions)
Traditional methods
Typical Keypoint Extraction for recognition of objects
independent of view:
Harris-Affine
MSER
Hessian-Affine
Maja’s method Insight
Features extracted
Use of HSV histogram (robust to illumination changes)
Texture by Gray level co-occurrence matrix
Edge orientation histogram (6 bins)
Mean, skewness and sd for each color channel
Discard all but 25 top features.
Tested on Columbia Object Image Library. Beats previous
methods.
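Why an HSV histogram helps with illumination can be shown with the standard library alone. The sketch below is an illustration, not Maja Rudinac’s implementation: it builds a hue histogram for a handful of invented RGB pixels and for the same pixels at half brightness. The bin count and pixel values are assumptions.

```python
import colorsys


def hue_histogram(pixels, bins=6):
    """Histogram of hue values; pixels are (r, g, b) floats in [0, 1]."""
    hist = [0] * bins
    for r, g, b in pixels:
        hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
        # hue lies in [0, 1); clamp the index defensively to the last bin.
        hist[min(int(hue * bins), bins - 1)] += 1
    return hist


# Invented pixels: red, orange, green, blue and a sky blue.
pixels = [
    (1.0, 0.0, 0.0),
    (1.0, 0.5, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
    (0.0, 0.5, 1.0),
]
# The same scene under dimmer light: every channel is halved.
darker = [(r * 0.5, g * 0.5, b * 0.5) for r, g, b in pixels]

print(hue_histogram(pixels))
print(hue_histogram(darker))
```

Scaling all three channels changes only the V (value) component, so the hue histogram is the same for both versions. A histogram built directly on RGB values would shift. Real illumination changes are not a pure scaling, so in practice the invariance is only approximate.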
5 | Advanced Topics - Easy Tracking
We are looking for an
algo that is invariant to
partial occlusions
------
Maja R.


For more:

Deep Learning is a new area of Machine Learning research,
which has been introduced with the objective of moving
Machine Learning closer to one of its original goals: Artificial
Intelligence. See these course notes for a brief introduction to
Machine Learning for AI and an introduction to Deep Learning
algorithms. www.deeplearning.net/tutorial/
Deep Learning explained
(Abridged from the original from Pete Warden | @petewarden)
http://radar.oreilly.com/2014/07/what-is-deep-learning-and-why-should-you-care.html
Inside an ANN
The functions that are run inside an ANN are controlled by the
memory of the neural network, arrays of numbers known as
weights that define how the inputs are combined and
recombined to produce the results. Dealing with real-world
problems like cat-detection requires very complex functions,
which means these arrays are very large, containing around
60 million numbers (about 240 MBytes as 32-bit floats) in the case of one of
the recent computer vision networks. The biggest obstacle to
using neural networks has been figuring out how to set all
these massive arrays to values that will do a good job
transforming the input signals into output predictions.
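To make "weights that define how the inputs are combined and recombined" concrete, here is a tiny forward pass in plain Python. It is a sketch with invented numbers: two inputs, one hidden layer with ReLU and one output. The layer sizes, weights and helper names are all assumptions for illustration, not a real vision network.

```python
def relu(values):
    """ReLU: negative activations are set to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def count_parameters(layers):
    """Total number of weights and biases in a list of (weights, biases)."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Invented weights: 2 inputs -> 2 hidden units -> 1 output.
W1, B1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, B2 = [[2.0, 1.0]], [0.5]

inputs = [1.0, 2.0]
hidden = relu(dense(inputs, W1, B1))
output = dense(hidden, W2, B2)

print(hidden, output, count_parameters([(W1, B1), (W2, B2)]))
```

This toy network has 9 numbers to set. The vision networks mentioned above have tens of millions, which is why finding good values for them (training) was the main obstacle.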
Renaissance
It has always been difficult to train an ANN. But in 2012, a
breakthrough paper sparked a renaissance in ANN. Alex
Krizhevsky, Ilya Sutskever, and Geoff Hinton brought together a
whole bunch of different ways of accelerating the
learning process, including convolutional networks, clever
use of GPUs, and some novel mathematical tricks like ReLU
and dropout, and showed that in a few weeks they could
train a very complex network to a level that outperformed
conventional approaches to computer vision.
5 | Advanced Topics - Deep Learning
GPU photo by Pete Warden slides (Jetpack)
Listen to the Webcast at Strata 2013
http://www.oreilly.com/pub/e/3121
http://www.iro.umontreal.ca/~pift6266/H10/intro_diapos.pdf
Deep NN failed until 2006...

Automatic speech recognition
The results shown in the table below are for automatic speech
recognition on the popular TIMIT data set. This is a common
data set used for initial evaluations of deep learning
architectures. The entire set contains 630 speakers from eight
major dialects of American English, with each speaker reading
10 different sentences. Its small size allows many different
configurations to be tried effectively with it. The error rates
presented are phone error rates (PER).
http://en.wikipedia.org/wiki/Deep_learning#Fundamental_concepts

Andrew Ng on Deep Learning
where AI will learn from untagged data

https://www.youtube.com/watch?v=W15K9PegQt0#t=221 [39:56]

To learn more about Andrew Ng on Deep Learning and the
future of #AI
- http://new.livestream.com/gigaom/FutureofAI (~1:20:00)
- https://www.youtube.com/watch?v=W15K9PegQt0#t=221
- http://deeplearning.stanford.edu/
Neural Nets talk by Ilya Sutskever
https://vimeo.com/77050653 [25:00 - end]

6 | Sample Questions
Sample
Questions

1. OpenCV is a software app that is useful because
1. it is not a software app
2. false
3. true
4. it is useful because it integrates functionality
commonly used to perform computer vision
2. Projective geometry is useful because
1. every pinhole based visual system is subject to the
laws of projective geometry
2. compound eyes are subject to projective geometry
3. all are false
4. all are true
3. Pinhole eye design is popular in nature
1. true
2. false
4. A feature is
1. a characteristic of an image
2. the output of a filter
3. the output of non linear filter (morphological)
4. a feature must be a number
5. a feature cannot be binary
6. a feature can be true or false
5. A feature that is a constant can still be useful for
clustering purposes,
1. true
2. false
6. A good feature is a salient feature,
1. true
2. false
3. not always
6 | Sample Questions - A

7. The absolute minimum number of features to cluster
7 labels is six
1. true
2. false
3. it’s 8
4. 7
5. all are true
6. depends on the features
7. none
8. Using features instead of dealing with pixels
simplifies the problem of labeling an image
1. true
2. false
9. Clustering by closest neighbor is the most effective
method to classify two overlapping clouds of points
1. true
2. k-means is in general superior
3. all of the above
4. none
10. Likelihood is the best way to cluster labels
1. true
2. false
3. it is better than k-means
11. A drawback of k-means is that you have to set the
number of clusters
1. true
2. false
12. Learning to recognize a pattern is equivalent to,
1. Knowing the cluster boundaries and knowing which
features to use
2. is to make a database of possible images and then
when one of the images is presented to tell which one is same
in the database
3. all
4. none
13. In morphological operations, closing is...
1. used to clean salt-and-pepper noise which is bigger
than the objects of interest
2. used to clean salt-and-pepper noise which is smaller
than the objects of interest
3. opening and dilation, in this order
4. dilation and then erosion
14. Opening is
1. a tool to remove white noise from a black and white
picture

2. it only works on black and white picture
3. erosion followed by closing
4. erosion followed by dilation
5. none
15. Erosion is
1. a linear operation
2. is reversible
3. all are true
16. Dilation is
1. used to connect disconnected lines
2. to remove vertical lines
3. none
4. all
17. If I want to connect vertical lines that are
disconnected by an average pixel length of 5 pixels, what
structuring element is appropriate
1. [1,1,1,1,1,1,1,1] (as a vertical column)
2. [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
3. [1,1,1,3,4,3,1,1]
4. [0,0,0,1,0,0]
5. [1,1,1]
6. [1,1]
7. none
1. a 4x4 matrix of ones
5. none
3. a 5x5 matrix of zeroes
4. a 5x1 matrix of zeroes with a 1 in the middle
5. none
20. Ideally the features for Arabic numeral recognition
would
1. correspond to traces,
2. not correspond to traces but to numbers
3. both
4. none

21. The best features for pattern recognition are those
that appear infrequently in the dataset because
1. they have a lot of entropy
2. they discriminate numbers better because the
frequency
3. none
4. all
22. Sparse coding is related to how the brain detects
edges by means of basis functions
1. true
2. false
3. not always, depends on the variance of the images

Dr. Jose Berengueres joined UAE University
as Assistant Professor in 2011. He received
MEE from Polytechnic University of
Catalonia in 1999 and a PhD in bio-inspired
robotics from Tokyo Institute of Technology
in 2007.
He has authored books on:
The Toyota Production System
Design Thinking
Human Computer Interaction
UX women designers
Business Models Innovation
He has given talks and workshops on Design
Thinking & Business Models in Germany,
Mexico, Dubai, and California.
