Applied Computer 
Vision 
FOR UNDERGRADS 
A Deep Learning Approach 
J. Berengueres
Applied Computer Vision for Undergrads 
version 2.2 
Editor 
Jose Berengueres 
Edition 
First Edition. August 24th, 2014. 
Text Copyright 
© Jose Berengueres 2014. All Rights Reserved. 
Video, Audio & Artwork Copyright 
Artwork appearing in this work is subject to its
corresponding original Copyright or Creative Commons 
License. Except where otherwise noted a Creative 
Commons Attribution 3.0 License applies. 
Limit of Liability 
The editor makes no representations or warranties
concerning the accuracy or exhaustivity of the contents
and theories hereby presented and particularly disclaims
any implied warranties regarding merchantability or
fitness for a particular use including but not limited to
educational, industrial and academic application.
Neither the editor nor the authors are liable for any loss
of profit or any commercial damages including but not
limited to incidental, consequential or other damages.
Support 
This work was supported by: 
UAE University
1 | Intro 
http://www.howwedrive.com/2012/01/23/let-the-robot-drive/
How face recognition works
http://vimeo.com/12774628
Andrew Ng
http://www.youtube.com/watch?v=AY4ajbu_G3k
Google Self-Driving Car
https://www.youtube.com/watch?v=YXylqtEQ0tk 
1 | Intro - Topics 
Introduction 
videos
2 | Theoretical Stuff 
OpenCV was developed by Intel Russia 
research center at Nizhny Novgorod 
Nizhny Novgorod street via: http://personal.cfw.com/~renders/nizhnymall_photos.html 
How to program a PC so it learns how to see? 
What does it mean to “see” something? 
1. Projective geometry 
2. The Eye Design 
3. Saliency Sense in babies 
Optical illusions are
proof of information
discarding at retina
level (see also deep
learning)
What does it mean to 
See? 
------
Projective Geometry 
Many books on computer vision start with this topic, which is
quite irrelevant. Let me explain: Picasso did not benefit from
learning the chemistry of yellow paint manufacturing, and
neither will you. You don’t need 3D geometry knowledge, you just
need great 2D geometry knowledge. Additionally, his
paintings were among the first (since the Middle Ages) to
ignore the laws of projective geometry, and they were a big
success. His father was a frustrated art teacher, but he
made sure that young Picasso would not become like
himself. Picasso, like Tiger Woods, was a product of his
father.
2 | Theoretical Stuff - Projective Geometry 
Caring about projective geometry in computer vision
is the same mistake as “model of the world” vs.
“ground model”: you should not care about the
world because you already have one model of it.
Your retina is your world and your model! Do not
complicate it by adding an additional layer, a
model of a model that you have to update
continuously. You don’t even know if the world is
real. See Gödel.
------ 
Reality? 
Your Reality!
Pin-hole Design 
The pin-hole design is one of the few cheap ways to convert a
3D world into a 2D picture (a simplification). That is why
man-made cameras use the same principle. For a
fascinating story on how the pinhole eye is used by
biological systems I recommend Climbing Mount
Improbable. There are other ways in which nature “sees”:
the bat’s ultrasound vision
dolphins’ ultrasound vision
the compound eye of the fly
polarized light vision
From a computer vision point of view it should not matter whether
your vision device is pinhole based or not.
Compound Eye. What is the pixel resolution of the 
compound eye? 
2 | Theoretical Stuff - The Design of the Eye
http://www.detectingdesign.com/humaneye.html 
2 | Theoretical Stuff - Developmental Psychology 
Hit a Wall 
When Maja Rudinac hit the wall of impractical computer
vision, she turned to developmental psychologists for
strategies to cope with large amounts of image pixels. This is 
what she found (so you don’t have to): 
Visual Developmental Psychology 
Basics 
(Abridged from Maja Rudinac PhD thesis, TUDelft) 
By the 4th month, babies can already tell whether a character is
bad or good; we can see this from who they hug longer.
Infants look longer at new faces or new objects
Regardless of where they are born, all babies know the
boundaries of objects.
Can predict collisions 
Basic additive and subtractive cognition 
Can identify members of own group 
versus non-own group 
Spontaneous motor movement is not goal directed at the 
onset. The baby explores the degrees of freedom 
Goal directed arm-grasp appears at the 4th month 
The ability to engage and disengage attention on targets 
appears from day 1 in babies. 
Smooth visual tracking is present at birth 
How baby cognition “works” 
The development of babies’ actions is goal directed, driven by three
motives. Actions are either:
1. To discover novelty 
2. To discover regularity 
3. To discover the potential of their own body 
Development of Perception 
Perception in babies is driven by two processes: 
1. Detection of structure or patterns 
2. Discarding of irrelevant info and keeping relevant info 
Cognitive Robot Shopping List 
So if we want to make a minimum viable product (MVP)
that can understand the world at least as well (or as
poorly) as a baby does, these are the functions that,
according to Mrs. Maja (pronounced Maya), we will need:
A webcam
Object tracking
Object discrimination
Attraction to people’s faces
Face recognition
Use of the hand to move objects to scan them from
various angles
Shades and 3D 
It turns out that shades have a disproportionate influence in
helping us figure out 3D info from 2D retina pixels. When
researchers at Univ. of Texas used fake shades in a virtual
reality world, participants got headaches because the
faking of the shades was not precise enough to fool the
brain. The brain got confused by the imperceptible
mismatches... that’s why smart people get headaches in
3D cinemas.
3 | Number Recognition Workshop 
2014. The purpose of the number recognition workshop
is to learn computer vision basics.
[Figure: handwritten digit samples 1 6 2 7 3 8 4 9 5 0]
In this chapter we will learn the four basic 
components of a typical computer vision 
program: 
1. Features 
2. Clustering 
3. Filtering/Morph ops 
4. Validation 
We will use the example to learn OpenCV. Finally,
we will learn why manual feature making is
obsolete because of deep learning.
Let’s learn by means 
of a simple example 
What does it mean to 
classify numbers? 
------
3 | Number Recognition Workshop - Number Recognition 
Intro 
This workshop introduces OpenCV functions
as the need arises. Let’s identify written numbers from 0 to 9.
You can get some inspiration in this video: 
http://www.youtube.com/watch?v=D_cZBdfw-hQ 
In Feb 2011 (just before the tsunami), we programmed 
HRP-IV to play a game. We used the histogram method to 
separate Caucasian skin from a background and then we 
counted the number of valleys and mountains between 
the hull vertexes. A more primitive approach is to average 
the area, but then it is not as robust. 
Workshop Time 
Teams of 4. You have 15 minutes to come up with some 
algorithm, trick or rule of thumb to classify the numbers. 
Note: The students will try to find saliency features. Good
saliency features are robust to:
noise 
partial occlusions and, 
confusion 
Here is a typical list: 
# of horizontal segments
# of pixels belonging to horizontal segments
Length of segments
# of closed loops
# of start and end points
Relative orientation of end points
Extracting features from pixels is called feature 
extraction. Salient features are the ones that are most 
useful in classifying pixels (by Information Entropy). 
Feature Reduction as a Necessity 
Consider the letter ‘E’. Its representation as a vector is
e = {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,1,1,1,1,1,1,1,0,0,0,0,1,0,0,0,0,0,0, ...}, that is, a vector
of dimension 11 x 13 = 143 (more than 100). There is no way for the
optic nerve to have enough bandwidth to send all
that information to the brain neurons for processing. What
happens in reality is that, already at retina level, features
are being extracted and information is compressed
(edge filters).
Minimum Number of Features 
If we find five features that allow us to distinguish between
all 10 different number shapes, then the problem becomes
a dimension 5 problem, much more manageable.
Minimum Number of features formula
N ≥ log2(number of different labels), for binary (yes/no) features
Because the problem can be complex, students can start
by trying to distinguish just two numbers (the 1 and the 0),
manually extracting two features: vertical and horizontal
features (feature reduction). Then, we can map the
numbers in clusters. This helps them understand the need
for more features when clusters overlap.
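The minimum-features formula can be checked with a few lines of Python. This is only a sketch under the assumption that every feature is binary (yes/no); the function name `min_binary_features` is made up for this illustration. It gives a lower bound: real features are noisy and correlated, so in practice more are needed.

```python
def min_binary_features(num_labels: int) -> int:
    """Smallest number of yes/no features that can give every label a unique code."""
    if num_labels < 1:
        raise ValueError("num_labels must be at least 1")
    # (num_labels - 1).bit_length() equals ceil(log2(num_labels)),
    # computed with integers to avoid floating point rounding.
    return (num_labels - 1).bit_length()


for labels in (2, 7, 10):
    print(labels, "labels ->", min_binary_features(labels), "binary features")
```

For the ten digits the bound is four features, so the five features mentioned above leave a little slack.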
Training 
So if we start with webcam photos of figures at 600x400 
pixels how would you design the program? At this stage 
most students fail to recognize the need for training with 
lots of examples. In fact the hard part of this workshop is 
training (understanding what features work to differentiate 
0 from 1 and a 1 from a 2...) (See also Andrew Ng video in 
Chapter 1) 
Reflection 
Some teams came up with valley-mountain features,
others with number of lines or crossings.
Let’s make a competition to see which is the winning
team! (This is also a great excuse to help them learn
OpenCV.)
Cognition Leads to Play? 
Once we have a working program with the ability to
recognize numbers, we can apply Toyota’s Kaizen (see
The Brown Book of Design Thinking, Chapter 3). What kind
of games or apps can we make? What kind of applications
can you imagine? Make a useful app.
All computer vision 
apps today 
fundamentally work 
the same as this 
example 
Now we just need to learn
how to:
1. Do tracking
2. Use clustering methods
3. Extract features with
OpenCV without reinventing
the wheel (next)
If
you can make an
app that recognizes my
handwriting better than my
app, I will give you a....
A +
3 | Number Recognition Workshop - Clustering of Features 
How to Cluster Features (review of three methods) 
By Closest Neighbor
By Centroids (Center of mass)
k-means
Let’s 
have an honest 
discussion about 
clustering
By Probability 
Given that x is at position x, which label minimizes the probability of error?
Comparison of methods 
Centroid --> x belongs to Green 
Neighbor --> Green 
Statistic --> Red 
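The disagreement between the three rules can be reproduced with a toy example. The sketch below uses made-up 1D feature values (not the data behind the slide), assumes equal priors, and fits one Gaussian per cluster for the probability rule; all function names are illustrative.

```python
from statistics import NormalDist, mean, pstdev

# Hypothetical 1D feature values for two labelled clusters:
# green is tight, red is widely spread.
clusters = {
    "green": [-0.5, 0.0, 0.5],
    "red": [0.0, 4.0, 8.0],
}


def by_centroid(x, clusters):
    """Label whose mean (center of mass) is closest to x."""
    return min(clusters, key=lambda label: abs(x - mean(clusters[label])))


def by_nearest_neighbor(x, clusters):
    """Label of the single training point closest to x."""
    return min(clusters, key=lambda label: min(abs(x - p) for p in clusters[label]))


def by_likelihood(x, clusters):
    """Label whose fitted Gaussian gives x the highest density (equal priors assumed)."""
    def density(label):
        values = clusters[label]
        return NormalDist(mean(values), pstdev(values)).pdf(x)
    return max(clusters, key=density)


x = 1.8
print("centroid:", by_centroid(x, clusters))
print("neighbor:", by_nearest_neighbor(x, clusters))
print("statistic:", by_likelihood(x, clusters))
```

With these values the point sits nearer to the green center and the nearest green sample, yet the wide red cluster explains it far better statistically, so the probability rule answers red.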
Where does ‘x’ belong? 
----- 
What is the best “strategy”
http://en.wikipedia.org/wiki/Cluster_analysis 
The previous are the
main 3 unsupervised
clustering methods
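k-means is named above but not spelled out. A minimal 1D sketch follows, assuming the number of clusters and the starting centers are chosen by hand (the values below are invented); it alternates between assigning points to the nearest center and moving each center to the mean of its points.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1D values; `centers` holds the hand-picked starting centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


print(kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0, 5]))
```

Note that the caller has to decide how many centers to pass in, which is the usual practical drawback of the method.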
The Feature - Cluster Conundrum 
Now ask the students to make a
flowchart of a program that can
label handwritten numbers 0 to 9.
At this point most students come
up with this flowchart.
The purpose of the exercise is to
let the students realize the
difference between training and
labeling. Training is the hard part. Another hard part to get is
how to separate useful from useless features in the
dimension reduction phase. Previously we said to use entropy
as an indicator of how useful a feature can be. However, if I
plug in random noise as a feature, it will score high on entropy.
Because the quality of the results depends on clustering, and
clusters depend on what features we choose, it is not a good
idea to decouple feature discarding from clustering itself.
This is called the feature-cluster conundrum.
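The point about noise scoring high on entropy can be made concrete. The sketch below uses invented toy data and compares plain entropy with mutual information between a feature and the labels; mutual information is a technique swapped in here for illustration, not something the text prescribes.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(feature, labels):
    """I(F; L) = H(F) + H(L) - H(F, L), in bits."""
    return entropy(feature) + entropy(labels) - entropy(list(zip(feature, labels)))


# Toy data (hypothetical): eight samples of the digits 0 and 1.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
useful = [0, 0, 0, 0, 1, 1, 1, 1]      # tracks the label perfectly
noise_like = [0, 1, 0, 1, 0, 1, 0, 1]  # flips regardless of the label

for name, feature in (("useful", useful), ("noise-like", noise_like)):
    print(name, "entropy:", entropy(feature),
          "information about label:", mutual_information(feature, labels))
```

Both features have the same entropy, but only the first one says anything about the label, which is why entropy alone cannot be the selection criterion.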
In the next chapter, let’s find out how feature-finding was
traditionally approached in the 80s and 90s (this is now of
course obsolete knowledge, but I will include it here as a
“nostalgic” note).
So,
learnt
(knowledge) = list of
features + cluster
boundaries?
3 | Number Recognition Workshop - Morphological Ops 
Morphological Ops 
Morph ops are different from linear filters in that they are not
linear. Imagine you have the letter E but the corner pixel
has been erased because of some noise. Your brain
(Gestalt theory) can reconstruct it; in fact, it is
designed to reconstruct intersections of lines.
However, a computer is not you. It does not know anything
about Gestalt, so how can we reconstruct this
missing corner automatically? If we don’t, the
computer might think this is an underlined F.
You can reconstruct it by use of a so-called
‘morph op’: closing.
F! 
E!
Closing 
1. Dilate - enlarge black regions by adding black pixels next to
pre-existing black pixels using some kind of rule
http://www.youtube.com/watch?v=xO3ED27rMHs 
2. Erode - the reverse process 
http://www.youtube.com/watch?v=fmyE7DiaIYQ 
before after 
Opening
The same two steps as closing but in reverse order: first 2 (erode), then 1 (dilate)
before after
Use cases
Closing is used to connect missing lines or parts.
Opening is used to remove noise that does not
belong to the largest object in the scene.
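Closing and opening can be tried without any library. The sketch below is a minimal pure-Python version under a few assumptions: 1 means an ink (black) pixel, pixels outside the image count as 0, and the structuring element is a symmetric list of (row, column) offsets. The names `HLINE`, `dilate`, `erode`, `closing` and `opening` are chosen for this illustration.

```python
# Structuring element: a horizontal 1x3 line, as (row, column) offsets.
HLINE = [(0, -1), (0, 0), (0, 1)]


def _get(img, r, c):
    """Pixel value, treating everything outside the image as 0 (background)."""
    return img[r][c] if 0 <= r < len(img) and 0 <= c < len(img[0]) else 0


def dilate(img, se):
    """A pixel becomes 1 if ANY pixel under the structuring element is 1."""
    return [[int(any(_get(img, r + dr, c + dc) for dr, dc in se))
             for c in range(len(img[0]))] for r in range(len(img))]


def erode(img, se):
    """A pixel stays 1 only if ALL pixels under the structuring element are 1."""
    return [[int(all(_get(img, r + dr, c + dc) for dr, dc in se))
             for c in range(len(img[0]))] for r in range(len(img))]


def closing(img, se):
    return erode(dilate(img, se), se)


def opening(img, se):
    return dilate(erode(img, se), se)


broken_line = [[0, 1, 1, 0, 1, 1, 0]]  # a stroke with a one-pixel gap
noisy_line = [[0, 1, 0, 0, 1, 1, 1]]   # a stroke plus one isolated speck

print(closing(broken_line, HLINE))  # the gap is bridged
print(opening(noisy_line, HLINE))   # the speck is removed
```

Closing bridges the gap because dilation first thickens the stroke across it; opening removes the speck because erosion deletes anything narrower than the structuring element before dilation restores what survived.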
Structuring element 
In this case the blue cross is the structuring element. It can
be any other shape.
For more details: 
http://homepages.inf.ed.ac.uk/rbf/HIPR2/morops.htm 
http://bigwww.epfl.ch/demo/demoteaching.html Demos 
Practice 
We can use Excel to practice manual closing and opening 
L shape Excel exercise 
In this exercise, I asked the students to come up with a
structuring element to reconstruct the E shape. Most of them
proposed the 3x3 L-shape. After two closings we can see
the following: we managed to reconstruct the missing corner,
but the structuring element will obliterate details in the
picture that are smaller than itself. This is one drawback of
morphological operations.
3 | Number Recognition Workshop - OpenCV Code Review 
Code review - What’s wrong with OpenCV
thinking
https://github.com/orioli/MAID-ROBOT 
https://github.com/orioli/MAID-ROBOT/blob/master/uEyeCameraHIRO/camShiftDemo/camShiftDemo.cpp 
What the robot sees. 
Results of the code review: 
400 lines of code to count 5 fingers.
Not robust to skin color change
Not reusable code or a general purpose solution
It is too custom --> obsolescence assured
This project of 2007 is
an example of everything
that is wrong with OpenCV
thinking. It is not the way forward.
The brain does not work like this. It
does not scale. So what’s next?
------
3 | Number Recognition Workshop - Edge Detectors & convolution 
Now students have all the knowledge to make the software to
identify numbers. Ask the students to draft a detailed
action plan to classify characters from 0 to 9 from photos.
Here is an example:
1. Get the training dataset
   1. How many do we need? 100 of each number?
   2. Organize folders /1 /2 /3 …
   3. Go to the cafeteria and ask people to give samples
      of numbers
   4. Digitize it. How? Take a picture with the
      iPhone
2. Training SW
   1. Find features for numbers
      1. How many do we need? Five?
      2. Let the SW choose the useful ones
   2. Edges
   3. Horizontal lines
   4. Closed spaces
   5. Shapes?
3. Cluster dataset
   1. How good is the clustering? → Testing
   2. Choose the method with highest accuracy across
      testing subsets
   3. The model will tell you which features are useful and
      which are not at labeling
   4. How do you test it?
   5. Bring other (new) numbers and check
   6. Split dataset into training set and testing set. 50% -
      50%
   7. Prevent overfitting - Divide test data into chunks of 10
      and see prediction accuracy for each individual chunk
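The testing steps of the plan can be sketched in a few lines. This is a minimal illustration, assuming each sample is a `(features, label)` pair and reading "chunks of 10" as chunks of ten samples; `split_half`, `chunk_accuracies` and the toy `predict` function are made up for the example.

```python
import random


def split_half(samples, seed=0):
    """Shuffle reproducibly, then split 50% / 50% into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def chunk_accuracies(test_set, predict, chunk_size=10):
    """Accuracy of `predict` on each consecutive chunk of the test set."""
    accuracies = []
    for start in range(0, len(test_set), chunk_size):
        chunk = test_set[start:start + chunk_size]
        correct = sum(1 for features, label in chunk if predict(features) == label)
        accuracies.append(correct / len(chunk))
    return accuracies


# Toy dataset (hypothetical): the single "feature" is the digit itself,
# so the trivial classifier below is always right.
samples = [((digit,), digit) for digit in range(10) for _ in range(10)]
train_set, test_set = split_half(samples)
print(chunk_accuracies(test_set, predict=lambda features: features[0]))
```

If the accuracy swings widely from chunk to chunk, the classifier is probably fitting quirks of the training data rather than the digits themselves.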
Now that students have drafted a plan, let them do
something concrete. Draw a matrix of 1s and 0s that
represents the binary image of the number 7 and
ask them how they would extract features.
Feature Extraction 
In Excel we can use conditional highlighting to visualize the filtering process. We start with a 7. Ask the students:
How would you extract a feature from the ones and zeroes?
Gaussian filtering by convolution 
Vertical edge detector by convolution 
See also Canny Filters and Laplacians 
1. http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/canny_detector/canny_detector.html 
2. http://matlabserver.cs.rug.nl/cgi-bin/matweb.exe 
3. http://www.youtube.com/watch?v=pIFnFhDsYlk 
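The Excel exercise can be mirrored in Python. The sketch below slides a small kernel over a binary image, as most vision libraries do (strictly a cross-correlation, with no kernel flip and no padding). The two kernels are common textbook choices, not necessarily the ones shown on the slides, and `filter2d` is a name invented for this example.

```python
def filter2d(img, kernel):
    """Slide `kernel` over `img` and sum the element-wise products (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(img) - kh + 1):
        row = []
        for c in range(len(img[0]) - kw + 1):
            row.append(sum(kernel[i][j] * img[r + i][c + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out


# Responds where intensity changes from left to right (a vertical edge).
VERTICAL_EDGE = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

# 3x3 Gaussian-like smoothing kernel; the weights sum to 1.
GAUSSIAN = [[1 / 16, 2 / 16, 1 / 16],
            [2 / 16, 4 / 16, 2 / 16],
            [1 / 16, 2 / 16, 1 / 16]]

image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

print(filter2d(image, VERTICAL_EDGE))  # large values only around the 0 -> 1 step
print(filter2d(image, GAUSSIAN))       # the step is blurred into a ramp
```

The edge kernel stays at zero in flat regions and peaks where the columns switch from 0 to 1, which is exactly the kind of number a feature can be built from.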
At this point we can realize that
numbers are traces, so what we
need is to classify the types of
traces, not the detection of edges?
What kind of edge classifier can
we use? Viola-Jones?
Convolution? We want to know:
type of edge, position
4 (0,0.5,4,13,33) it is a four! 
http://practicalquant.blogspot.ae/2013/10/deep-learning-oral-traditions.html 
Convert to +100 
features found by hand 
Example 
feature: Has a cross in 
the lower mid feature 
Traditional
approach
It is a three!
Deep learning approach
Layer - abstraction
LEVEL 3: SPATIAL RELATIONSHIP DETECTOR
LEVEL 2: PRIMITIVE SHAPE DETECTORS
LEVEL 1: EDGE DETECTOR BANK
http://cs.brown.edu/courses/cs143/2011/results/proj2/thuhe/
3 | Number Recognition Workshop - Features and Sparse Coding 
Feature Extraction 
http://www.youtube.com/watch?v=n1ViNeWhC24 
At this point, it is probably a good time to ask the students to
try to make a little system to classify handwritten numbers.
See what error rate they come up with. For the training set, they
can ask friends to write numbers. Whatever you do, do not
forget to set a deadline, otherwise they will seize the
opportunity. ( #rookiemistake )
How 
to do automatic 
feature extraction? 
Manual 
Feature Extraction is 
not the way forward 
Most 
of Kaggle’s comp. 
winners are decided by 
how lucky they are at 
finding useful features 
---- 
D. Efimov
4 | Number Recognition Workshop Solution
Number Recognition
Workshop Solution
4 | Number Recognition Workshop Solution - Deep Learning Workshop
Recapitulation 
In the previous chapter we saw how to manually make
features (aka feature extraction). We also saw that feature
extraction is about 50% of the work to win a Kaggle
competition. The other 50% is optimizing the mathematical
prediction model (Efimov 2012). We also saw that some
geniuses, like Andrew Ng, postulate that manual feature
extraction is a waste of time, that this is the kind of nitty-gritty
job that should be done by computers. We also saw in the
section on the feature-cluster conundrum one more reason why
feature engineering is part of the problem itself. In the section
about Sparse Coding we saw a mathematical foundation that
is a good reverse engineering of how the brain finds good
features, because both sparse coding and the brain end up
with similar edge detector filters.
DNN online, on-demand 
The company Ersatz allows you to try Deep Neural Network
models online. They have a demo based on number
recognition that we will use now (which is what we have been
doing in the last chapter). The handwritten training sets
(MNIST, derived from NIST data) are
available at http://yann.lecun.com/exdb/mnist/.
The author, Yann LeCun, compared different methods to do
the job... Convolutional neural networks do the job better than
metric approaches such as SVM.
The MNIST tutorial 
This tutorial explains how to use the cloud infrastructure to 
solve the number recognition problem via Deep Neural 
Networks. 
http://www.ersatzlabs.com/documentation/sharedMNIST/ 
More background info, current backdrops and videos on
the history of ANNs (playlist of 5 videos):
https://www.youtube.com/watch?v=4B-XY8a4RGk
https://gigaom.com/2014/06/11/more-deep-learning-for-the-masses-courtesy-of-ersatz-labs/
Additionally, there is an interesting introduction to
neural nets (Michael Nielsen’s online book) at:
http://neuralnetworksanddeeplearning.com/chap1.html
5 | Advanced Topics 
Advanced 
Topics
Using Vision to Navigate 
http://www.youtube.com/watch?v=8c2SFXQ5zHM Sir James 
explains 
http://www.youtube.com/watch?v=oguKCHP7jNQ Navigation 
http://www.youtube.com/watch?v=xlaqYDZwoWo#t=43 
Suction 
NT: Hoover tried to steal his idea. They lost in court with 
punitive damages. Top right photo HEAD SPORTS. 
5 | Advanced Topics - Vision for Navigation 
One day my wife was 
sick so I had to vacuum the 
house. Then I realized that bag 
based vacuum cleaners do not suck.
So I decided to make one that sucks
more. It took longer than expected
though.
------ 
Sir J. Dyson
What People Cured of Blindness See 
Abridged from “What People Cured of Blindness See”
by Patrick House, The New Yorker
How quickly, if at all, does the brain adapt and vision return 
after surgery? A simple answer, and a correct one, is that it 
depends entirely on circumstance. Back in 1993, Oliver 
Sacks wrote a story in the magazine about Virgil, a man with 
limited to no vision as a child who had developed cataracts at 
the age of six. After his cataracts were removed, fifty years 
later, Virgil had trouble adjusting. (For example, he could not 
always distinguish the letter “A” from the letter “H” and, when 
given Molyneux’s test, could not tell a square he felt from a 
square he saw.)* 
Since the surgeries, Sinha has followed up with the Prakash 
children and found that, while they continued to suffer from 
poor acuity, many higher-order aspects of vision seemed to 
be improving. Within a week to a few months after surgery, 
the children could match felt objects to their visual 
counterparts. They also improved on spatial-navigation tasks 
requiring mental imagery, which tested their ability to follow a 
series of up, down, left, and right directions on a visually 
imagined game board. This finding was particularly important 
because previous work by Kosslyn and others had found that 
the congenitally blind have a capacity for mental imagery, but 
it is limited in some ways and becomes increasingly poor as 
the task becomes more complex. (In one example, a sighted 
person will imagine a typewriter a few feet away as larger than 
the same one imagined a hundred feet away. Among the 
congenitally blind, however, the imagined typewriter—a 
composite of experiences of touch and sound alone—is the 
same size at all distances.) 
5 | Advanced Topics - Blindness
Kosslyn believes that any improvements in mental imagery will 
require a “catalogue of visual memories” that can then be 
used to build expectations about the visual world. “When you 
develop expectations, you can use the fruits of previous 
experience to help you process what’s coming in now,” 
Kosslyn said. “But you need to have had that experience.” An 
example is depth perception: to the sighted, with a lifetime of 
practice, rules about occlusion (if A occludes B, object A is 
closer) and foreshortening (objects farther away appear 
smaller) are continually used to combine incoming light into a 
rich, three-dimensional world. The absence of these rules can 
frustrate the newly sighted, whose visual world can be both 
blurry and two-dimensional—paintings and people are often 
described as “flat, with dark patches”; a far-away 
house is “nearby, but requiring the taking of a lot of 
steps”; streetlights seen through glass are “luminous stains 
stuck to the window”; sunbeams through tree branches 
collapse into a single “tree with all the lights in it.” (The writer 
Jorge Luis Borges, who went blind at age fifty-five, described 
going blind as a process by which “everything near becomes 
distant.” In the newly sighted, without depth perception, the 
opposite seems true: the distant—tiny houses on the horizon, 
clouds in the impossibly high sky—suddenly looks nearby.) 
Ways of tracking 
http://www.youtube.com/watch?v=InqV34BcheM
1. By Color histogram (HSV is less
dependent on illumination)
2. By Blob (OpenCV Library, not very
robust)
3. By Face Detection
4. By Saliency (robust to occlusions)
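Why a hue histogram is less sensitive to lighting can be shown with the standard library alone. In the sketch below the pixel values, the bin count and the function names are all invented for illustration: the same patch is described under bright and dim light, and the two descriptions are compared with histogram intersection.

```python
import colorsys


def hue_histogram(pixels, bins=8):
    """Normalized histogram of the hue channel of a list of (R, G, B) pixels (0-255)."""
    hist = [0] * bins
    for r, g, b in pixels:
        hue, _saturation, _value = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        hist[min(int(hue * bins), bins - 1)] += 1
    return [count / len(pixels) for count in hist]


def intersection(h1, h2):
    """Histogram intersection: 1.0 means identical, 0.0 means no overlap."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical patches: mostly red with some green, under bright and dim light,
# plus an unrelated blue patch.
bright = [(200, 40, 40)] * 3 + [(40, 200, 40)]
dim = [(100, 20, 20)] * 3 + [(20, 100, 20)]
blue = [(40, 40, 200)] * 4

print(intersection(hue_histogram(bright), hue_histogram(dim)))   # same object
print(intersection(hue_histogram(bright), hue_histogram(blue)))  # different object
```

Halving the brightness changes the V channel but leaves the hue untouched, so the tracked object still matches its own histogram.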
Traditional methods 
Typical Keypoint Extraction for recognition of objects 
independent of view: 
Harris-Affine
MSER
Hessian-Affine
Maja’s method Insight
Features extracted
Use of HSV histogram (robust to illumination changes)
Texture by Gray level co-occurrence matrix 
Edge orientation histogram (6 bins) 
Mean, skewness and sd for each color channel 
Discard all but 25 top features. 
Tested on Columbia Object Image Library. Beats previous 
methods. 
5 | Advanced Topics - Easy Tracking 
We are looking for an
algorithm that is invariant to
partial occlusions
------ 
Maja R.
For more: 
Deep Learning is a new area of Machine Learning research, 
which has been introduced with the objective of moving 
Machine Learning closer to one of its original goals: Artificial 
Intelligence. See these course notes for a brief introduction to 
Machine Learning for AI and an introduction to Deep Learning
algorithms. www.deeplearning.net/tutorial/ 
Deep Learning explained 
(Abridged from the original from Pete Warden | @petewarden) 
http://radar.oreilly.com/2014/07/what-is-deep-learning-and-why-should-you-care.html 
Inside an ANN 
The functions that are run inside an ANN are controlled by the
memory of the neural network, arrays of numbers known as
weights that define how the inputs are combined and
recombined to produce the results. Dealing with real-world
problems like cat-detection requires very complex functions,
which means these arrays are very large, containing around
60 million numbers (about 240 MBytes as 32-bit floats) in the case of one of
the recent computer vision networks. The biggest obstacle to
using neural networks has been figuring out how to set all
these massive arrays to values that will do a good job
transforming the input signals into output predictions.
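What "arrays of weights that combine and recombine the inputs" means can be seen in a network small enough to read. The sketch below is a made-up two-layer network with hand-picked weights (nothing is trained here); it also uses ReLU, one of the common activation functions.

```python
def relu(values):
    """Rectified linear unit: negative activations are clipped to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 1 output.
W1 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
W2 = [[1.0, 1.0, 1.0]]
b2 = [-0.5]


def forward(x):
    """Combine the inputs (layer 1), then recombine the results (layer 2)."""
    hidden = relu(dense(x, W1, b1))
    return dense(hidden, W2, b2)


num_parameters = sum(len(row) for W in (W1, W2) for row in W) + len(b1) + len(b2)
print("output:", forward([2.0, 1.0]))
print("numbers to learn:", num_parameters)
```

This toy has 13 numbers to set; the networks discussed above have tens of millions, which is why finding good values for them was the hard part.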
Renaissance 
It has always been difficult to train an ANN. But in 2012, a
breakthrough paper sparked a renaissance in ANNs. Alex
Krizhevsky, Ilya Sutskever, and Geoff Hinton brought together a
whole bunch of different ways of accelerating the
learning process, including convolutional networks, clever
use of GPUs, and some novel mathematical tricks like ReLU
and dropout, and showed that in a few weeks they could
train a very complex network to a level that outperformed
conventional approaches to computer vision.
GPU photo by Pete Warden slides (Jetpack) 
Listen to the Webcast at Strata 2013 
http://www.oreilly.com/pub/e/3121 
http://www.iro.umontreal.ca/~pift6266/H10/intro_diapos.pdf 
Deep NN failed until 2006....
5 | Advanced Topics - Deep Learning
Automatic speech recognition 
The results shown in the table below are for automatic speech 
recognition on the popular TIMIT data set. This is a common 
data set used for initial evaluations of deep learning 
architectures. The entire set contains 630 speakers from eight 
major dialects of American English, with each speaker reading 
10 different sentences. Its small size allows many different
configurations to be tried effectively with it. The error rates 
presented are phone error rates (PER). 
http://en.wikipedia.org/wiki/Deep_learning#Fundamental_concepts 
Andrew Ng on Deep Learning 
where AI will learn from untagged data 
https://www.youtube.com/watch?v=W15K9PegQt0#t=221 [39:56] 
To learn more about Andrew Ng on Deep Learning and the 
future of #AI 
- http://new.livestream.com/gigaom/FutureofAI (~1:20:00) 
- https://www.youtube.com/watch?v=W15K9PegQt0#t=221 
- http://deeplearning.stanford.edu/
Neural Nets talk by Ilya Sutskever 
https://vimeo.com/77050653 [25:00 - end] 
6 | Sample Questions 
Sample 
Questions
1. OpenCV is a software app that is useful because 
1. it is not a software app
2. false 
3. true 
4. it is useful because it integrates functionality 
commonly used to perform computer vision 
2. Projective geometry is useful because 
1. every pinhole based visual system is subject to the 
laws of projective geometry 
2. compound eyes are subject to projective geometry 
3. all are false 
4. all are true 
3. Pinhole eye design is popular in nature 
1. true 
2. false 
4. A feature is 
1. a characteristic of an image
2. the output of a filter 
3. the output of non linear filter (morphological) 
4. a feature must be a number 
5. a feature cannot be binary 
6. a feature can be true or false 
5. A feature that is a constant can still be useful for 
clustering purposes, 
1. true 
2. false 
6. A good feature is a salient feature, 
1. true 
2. false 
3. not always 
6 | Sample Questions - A
7. The absolute minimum number of features to cluster 
7 labels is six 
1. true 
2. false 
3. it’s 8
4. 7 
5. all are true 
6. depends on the features 
7. none 
8. Use of features instead of dealing with pixels allows
us to simplify the problem of labeling an image
1. true 
2. false 
9. Clustering by closest neighbor is the most effective 
method to classify two overlapping clouds of points 
1. true 
2. k-means is in general superior 
3. all of the above 
4. none 
10. Likelihood is the best way to cluster labels 
1. true 
2. false 
3. it is better than k-means 
11. A drawback of k-means is that you have to set the 
number of clusters 
1. true 
2. false 
12. Learning to recognize a pattern is equivalent to, 
1. Knowing the cluster boundaries and knowing which 
features to use 
2. is to make a database of possible images and then 
when one of the images is presented to tell which one is same 
in the database 
3. all 
4. none 
13. In morphological operations, closing is... 
1. used to clean salt-and-pepper noise which is bigger
than the objects of interest
2. used to clean salt-and-pepper noise which is smaller
than the objects of interest
3. opening and dilation by this order 
4. dilation and then erosion 
14. Opening is 
1. a tool to remove white noise from a black and white
picture
2. it only works on black and white pictures
3. erosion followed by closing 
4. erosion followed by dilation 
5. none 
15. Erosion is 
1. a linear operation 
2. is reversible 
3. all are true 
16. Dilation is 
1. used to connect disconnected lines 
2. to remove vertical lines 
3. none 
4. all 
17. If I want to connect vertical lines that are 
disconnected by an average pixel length of 5 pixels, what 
structuring element is appropriate 
1. [1,1,1,1,1,1,1,1] (as a vertical column) 
2. [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] 
3. [1,1,1,3,4,3,1,1] 
4. [0,0,0,1,0,0] 
5. [1,1,1] 
6. [1,1] 
7. none 
18. If I want to connect vertical lines that are 
disconnected by an average pixel length of 5 pixels, what 
structuring element is appropriate 
1. a 4x4 matrix of ones 
2. a 7x1 matrix of ones 
3. a 7x7 matrix of ones 
4. a 1x7 matrix of ones 
5. none 
19. If I want to connect vertical lines that are 
broken by gaps averaging 5 pixels in length, which 
structuring element is appropriate? 
1. a 1x5 matrix of ones 
2. a 5x1 matrix of ones 
3. a 5x5 matrix of zeroes 
4. a 5x1 matrix of zeroes with a 1 in the middle 
5. none 
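The morphology questions above (13 to 19) can be checked by experiment. The following is a minimal sketch in plain Python, not taken from the book. The helper names (`dilate`, `erode`, `closing`, `opening`) and the test image are made up for illustration. An "h x w" structuring element is assumed to mean h rows by w columns of ones, with odd sizes, and pixels outside the image are simply ignored.

```python
def _scan(img, se_h, se_w, reduce_fn):
    """Slide an se_h x se_w window of ones over a binary image."""
    h, w = len(img), len(img[0])
    rh, rw = se_h // 2, se_w // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Only in-bounds pixels are considered (assumed border handling).
            window = [img[yy][xx]
                      for yy in range(max(0, y - rh), min(h, y + rh + 1))
                      for xx in range(max(0, x - rw), min(w, x + rw + 1))]
            row.append(int(reduce_fn(window)))
        out.append(row)
    return out


def dilate(img, se_h, se_w):
    # A pixel turns on if ANY pixel under the element is on.
    return _scan(img, se_h, se_w, any)


def erode(img, se_h, se_w):
    # A pixel stays on only if ALL pixels under the element are on.
    return _scan(img, se_h, se_w, all)


def closing(img, se_h, se_w):
    # Dilation followed by erosion.
    return erode(dilate(img, se_h, se_w), se_h, se_w)


def opening(img, se_h, se_w):
    # Erosion followed by dilation.
    return dilate(erode(img, se_h, se_w), se_h, se_w)


def middle(img):
    """Return the middle column of a 3-pixel-wide image as a list."""
    return [row[1] for row in img]


# A vertical line, 1 pixel wide, broken by a 5-pixel gap.
column = [1] * 5 + [0] * 5 + [1] * 5
img = [[0, v, 0] for v in column]

print(middle(closing(img, 3, 1)))  # gap is still there
print(middle(closing(img, 7, 1)))  # gap is bridged
```

In this sketch the 3 x 1 element leaves the gap untouched, while the 7 x 1 element, which is taller than the gap, bridges it without widening the line. Changing the size or orientation of the element is a quick way to test the other options.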
20. Ideally, the features used for Arabic numeral 
recognition would 
1. correspond to traces 
2. not correspond to traces but to numbers 
3. both 
4. none 
21. The best features for pattern recognition are those 
that appear infrequently in the dataset because 
1. they have a lot of entropy 
2. they discriminate numbers better because the 
frequency 
3. none 
4. all 
22. Sparse coding is related to how the brain detects 
edges by means of basis functions 
1. true 
2. false 
3. not always, depends on the variance of the images 
Dr. Jose Berengueres joined UAE University 
as Assistant Professor in 2011. He received 
an MEE from the Polytechnic University of 
Catalonia in 1999 and a PhD in bio-inspired 
robotics from Tokyo Institute of Technology 
in 2007. 
He has authored books on: 
The Toyota Production System 
Design Thinking 
Human Computer Interaction 
UX women designers 
Business Models Innovation 
He has given talks and workshops on Design 
Thinking & Business Models in Germany, 
Mexico, Dubai, and California.

Applied Computer Vision - a Deep Learning Approach

  • 1. Applied Computer Vision FOR UNDERGRADS A Deep Learning Approach J. Berengueres
  • 2. Applied Computer Vision for Undergrads version 2.2 Editor Jose Berengueres Edition First Edition. August 24th, 2014. Text Copyright © Jose Berengueres 2014. All Rights Reserved. i ©
  • 3. Video, Audio & Artwork Copyright Artwork appearing in this work is subject to their corresponding original Copyright or Creative Commons License. Except where otherwise noted a Creative Commons Attribution 3.0 License applies. Limit of Liability The editor makes no representations or warranties concerning the accuracy or exhaustivity of the contents and theories hereby presented and particularly disclaim any implied warranties regarding merchantability or fitness for a particular use including but not limited to educational, industrial and academic application. Neither the editor or the authors are liable for any loss or profit or any commercial damages including but not limited to incidental, consequential or other damages. Support This work was supported by: UAE University V ii
  • 4. 1 | Intro http://www.howwedrive.com/2012/01/23/let-the-robot-drive/
  • 5. 1 | Intro - Topics. Introduction videos:
      - How face recognition works: http://vimeo.com/12774628
      - Andrew Ng: http://www.youtube.com/watch?v=AY4ajbu_G3k
      - Google self-driving car: https://www.youtube.com/watch?v=YXylqtEQ0tk
  • 6. 2 | Theoretical Stuff. OpenCV was developed by the Intel Russia research center at Nizhny Novgorod. Nizhny Novgorod street via: http://personal.cfw.com/~renders/nizhnymall_photos.html How do you program a PC so it learns how to see? What does it mean to “see” something? 1. Projective geometry 2. The Eye Design 3. Saliency Sense in babies. Optical illusions are proof of information discarding at retina level (see also deep learning). What does it mean to See?
  • 8. 2 | Theoretical Stuff - Projective Geometry. Many books on computer vision start with this topic, which is quite irrelevant. Let me explain. Picasso did not benefit from learning the chemistry of yellow paint manufacturing. Neither will you. You don’t need 3D geometry knowledge; you just need great 2D geometry knowledge. Additionally, his paintings were among the first (since the Middle Ages) to ignore the laws of projective geometry, and they were a big success. His father was a frustrated art teacher, but he made sure that young Picasso would not become like himself. Picasso, like Tiger Woods, was a product of his father. Caring about projective geometry in computer vision is the same mistake as “model of the world” vs. “ground model”: you should not care about the world because you already have one model of it. Your retina is your world and your model! Do not complicate it by adding an additional layer, a model of a model that you have to update continuously. You don’t even know if the world is real. See Gödel. Reality? Your Reality!
  • 9. 2 | Theoretical Stuff - The Design of the Eye. Pin-hole Design: The pin-hole design is one of the few cheap ways to convert a 3D world into a 2D picture (simplification). That is why man-made cameras use the same principle. For a fascinating story on how the pinhole eye is used by biological systems, I recommend Climbing Mount Improbable. There are other ways in which nature “sees”: the bat’s ultrasound vision; dolphins’ ultrasound vision; the compound eye of the fly; polarized light vision. From a computer vision point of view it should not matter whether your vision device is pinhole based or not. Compound Eye: What is the pixel resolution of the compound eye?
  • 11. 2 | Theoretical Stuff - The Design of the Eye. http://www.detectingdesign.com/humaneye.html
  • 12. 2 | Theoretical Stuff - Developmental Psychology. Hit a Wall: When Maja Rudinac hit the wall of impractical computer vision, she turned to developmental psychologists for strategies to cope with large amounts of image pixels. This is what she found (so you don’t have to).
      Visual Developmental Psychology Basics (abridged from Maja Rudinac’s PhD thesis, TU Delft):
      - By the 4th month, babies can already tell whether a character is good or bad; we can see this from which one they hug longer.
      - Infants look longer at new faces or new objects.
      - Regardless of where they are born, all babies know the boundaries of objects.
      - They can predict collisions.
      - They show basic additive and subtractive cognition.
      - They can identify members of their own group versus other groups.
      - Spontaneous motor movement is not goal directed at the onset; the baby explores the degrees of freedom.
      - Goal-directed arm grasping appears at the 4th month.
      - The ability to engage and disengage attention on targets appears from day 1.
      - Smooth visual tracking is present at birth.
      How baby cognition “works”: The development of babies’ actions is goal directed by the following motives. Actions serve either 1. to discover novelty, 2. to discover regularity, or 3. to discover the potential of their own body.
      Development of Perception: Perception in babies is driven by two processes: 1. Detection of structure or patterns
  • 13. 2 | Theoretical Stuff - Developmental Psychology. 2. Discarding of irrelevant info and keeping relevant info.
      Cognitive Robot Shopping List: If we want to make a minimum viable product (MVP) that can understand the world at least as well (or as poorly) as a baby does, these are the functions that, according to Maja (pronounced Maya), we will need: a webcam; object tracking; object discrimination; attraction to people’s faces; face recognition; use of the hand to move objects and scan them from various angles.
      Shades and 3D: It turns out that shades have a disproportionate influence in helping us figure out 3D info from 2D retina pixels. When researchers at the Univ. of Texas used fake shades in a virtual reality world, participants got headaches, because the faking of the shades was not precise enough to fool the brain. The brain got confused by the imperceptible mismatches; that’s why smart people get headaches in 3D cinemas.
  • 14. 3 | Number Recognition Workshop. Workshop 2014. The purpose of the number recognition workshop is to learn computer vision basics. [Figure: the digits 0 to 9] Nizhny Novgorod street via: http://personal.cfw.com/~renders/nizhnymall_photos.html In this chapter we will learn the four basic components of a typical computer vision program: 1. Features 2. Clustering 3. Filtering/Morph ops 4. Validation. We will use the example to learn OpenCV. Finally, we will learn why manual feature making is obsolete because of deep learning. Let’s learn by means of a simple example. What does it mean to classify numbers?
  • 15. 3 | Number Recognition Workshop - Number Recognition. Intro: This workshop introduces OpenCV functions as the need arises. Let’s identify written numbers from 0 to 9. You can get some inspiration from this video: http://www.youtube.com/watch?v=D_cZBdfw-hQ In Feb 2011 (just before the tsunami), we programmed HRP-IV to play a game. We used the histogram method to separate Caucasian skin from the background, and then we counted the number of valleys and mountains between the hull vertices. A more primitive approach is to average the area, but it is not as robust.
      Workshop Time: Teams of 4. You have 15 minutes to come up with some algorithm, trick or rule of thumb to classify the numbers.
      Note: The students will try to find saliency features. Good saliency features are robust to noise, partial occlusions, and confusion. Here is a typical list:
      - # of horizontal segments
      - # of pixels belonging to horizontal segments
      - Length of segments
      - # of closed loops
      - # of start and end points
      - Relative orientation of end points
      Extracting features from pixels is called feature extraction. Salient features are the ones that are most useful in classifying pixels (by information entropy).
  • 16. 3 | Number Recognition Workshop - Number Recognition. [Figure: the digits 0 to 9] Feature Reduction as a Necessity: Consider the letter ‘E’. Its representation as a vector is e = {0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0, 0,0,1,1,1,1,1,1,1,0,0, 0,0,1,0,0,0,0,0,0, ...}, that is, a vector of dimension 11 x 13 = 143 (more than 100 values). There is no way for the retinal nerve to have enough bandwidth to send all that information to the brain neurons for processing. What happens in reality is that, already at retina level, features
  • 17. 3 | Number Recognition Workshop - Number Recognition. are already being extracted and information is compressed (edge filters).
      Minimum Number of Features: If we find five features that allow us to distinguish between all 10 different number shapes, then the problem becomes a dimension-5 problem, which is much more manageable. Minimum number of features formula: N > log2(number of different labels). Because the problem can be complex, students can start by trying to distinguish just two numbers (the 1 and the 0), manually extracting two features: vertical and horizontal features (feature reduction). Then we can map the numbers in clusters. This helps them understand the need for more features when clusters overlap.
      Training: If we start with webcam photos of figures at 600x400 pixels, how would you design the program? At this stage most students fail to recognize the need for training with lots of examples. In fact, the hard part of this workshop is training (understanding which features work to differentiate a 0 from a 1, a 1 from a 2, ...). (See also the Andrew Ng video in Chapter 1.)
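The minimum-features bound on this slide can be checked with a few lines of Python. This is only a sketch under one assumption that the slide does not state: every feature is a yes/no (binary) value, so N features can encode at most 2^N distinct labels. The function name `min_binary_features` is made up for this illustration.

```python
def min_binary_features(num_labels: int) -> int:
    """Smallest N such that N yes/no features can give each label its own code.

    Assumption (not from the slides): features are binary, so N features
    distinguish at most 2**N labels. Real features overlap and are noisy,
    so in practice more than this lower bound is needed.
    """
    if num_labels < 1:
        raise ValueError("num_labels must be positive")
    # (n - 1).bit_length() is the integer form of ceil(log2(n)).
    return (num_labels - 1).bit_length()


# Ten digit classes need at least 4 binary features (2**3 = 8 < 10 <= 16 = 2**4).
print(min_binary_features(10))
```

The five hand-made features suggested on the slide sit just above this lower bound, which leaves a little slack for features that do not separate the digits perfectly.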
  • 18. 3 | Number Recognition Workshop - Number Recognition. Reflection: Some teams came up with valley-mountain features, others with the number of lines or crossings. Let’s hold a competition to see which team wins! (This is also a great excuse to help them learn OpenCV.) Cognition Leads to Play? Once we have a working program that can recognize numbers, we can apply Toyota’s Kaizen (see The Brown Book of Design Thinking, Chapter 3). What kind of games or apps can we make? What kind of applications can you imagine? Make a useful app.
  • 20. 3 | Number Recognition Workshop - Number Recognition. All computer vision apps today fundamentally work the same as this example. Now we just need to learn: 1. How to do tracking 2. Clustering methods 3. How to extract features with OpenCV without reinventing the wheel (next). If you can make an app that recognizes my handwriting better than my app, I will give you an A+.
  • 21. 3 | Number Recognition Workshop - Clustering of Features. How to Cluster Features (review of three methods): by closest neighbor; by centroids (center of mass); k-means. Let’s have an honest discussion about clustering.
  • 22. 3 | Number Recognition Workshop - Clustering of Features. By Probability: given that x is at a certain position, which label minimizes the probability of error? Comparison of methods for the point x in the figure: Centroid --> x belongs to Green; Neighbor --> Green; Statistic --> Red. Where does ‘x’ belong? What is the best “strategy”?
  • 23. 3 | Number Recognition Workshop - Clustering of Features. http://en.wikipedia.org/wiki/Cluster_analysis Where does ‘x’ belong? What is the best “strategy”? The previous are the 3 main unsupervised clustering methods.
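To make the "where does x belong?" question concrete, here is a minimal sketch of two of the rules named above: closest neighbor and closest centroid. The sample points, labels and function names are invented for this illustration and do not reproduce the figure in the slides; they are chosen so that the two rules disagree.

```python
import math


def nearest_neighbor_label(x, samples):
    """Label of the single training point that is closest to x."""
    _point, label = min(samples, key=lambda s: math.dist(x, s[0]))
    return label


def nearest_centroid_label(x, samples):
    """Label of the class whose center of mass is closest to x."""
    groups = {}
    for point, label in samples:
        groups.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(coords) / len(points) for coords in zip(*points))
        for label, points in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Hypothetical 2D feature vectors (e.g. vertical-ness, horizontal-ness).
samples = [
    ((0.0, 0.0), "green"), ((1.0, 0.0), "green"), ((0.0, 1.0), "green"),
    ((4.0, 4.0), "red"), ((5.0, 4.0), "red"), ((2.0, 2.0), "red"),
]
x = (1.6, 1.6)

print(nearest_neighbor_label(x, samples))   # closest single point is the red outlier
print(nearest_centroid_label(x, samples))   # closest center of mass is green
```

A single outlier is enough to flip the closest-neighbor answer, while the centroid rule ignores the shape of each cloud. That trade-off is what the "honest discussion" is about.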
  • 24. 3 | Number Recognition Workshop - Clustering of Features. The Feature-Cluster Conundrum: Now ask the students to make a flowchart of a program that can label handwritten numbers 0 to 9. At this point most students come up with this flowchart. The purpose of the exercise is to let the students realize the difference between training and labeling. Training is the hard part. Another hard part is how to separate useful from useless features in the dimension reduction phase. Previously we said to use entropy as an indicator of how useful a feature can be. However, if I plug in random noise as a feature, it will score high on entropy. Because the quality of the results depends on clustering, and the clusters depend on which features we choose, it is not a good idea to decouple feature discarding from clustering itself. This is called the feature-cluster conundrum. In the next chapter, let’s find out how feature finding was traditionally approached in the 80s and 90s (this is now of course obsolete knowledge, but I include it here as a “nostalgic” note). So, learnt (knowledge) = list of features + cluster boundaries?
  • 25. 3 | Number Recognition Workshop - Morphological Ops. Morphological Ops: Morph ops differ from linear filters in that they are not linear. Imagine you have the letter E, but the corner pixel has been erased by some noise. Your brain (Gestalt theory) can reconstruct it; in fact, it is designed to reconstruct intersections of lines. However, a computer is not you. It does not know anything about Gestalt, so how can we reconstruct this missing corner automatically? If we don’t, the computer might think this is an underlined F. You can reconstruct the corner with a so-called ‘morph op’ called closing. F! E!
  • 26. 3 | Number Recognition Workshop - Morphological Ops. Closing: 1. Dilate - enlarge the black regions by adding black pixels next to pre-existing black pixels according to some rule http://www.youtube.com/watch?v=xO3ED27rMHs 2. Erode - the reverse process http://www.youtube.com/watch?v=fmyE7DiaIYQ (before / after) Opening: the same two steps in reverse order (2, then 1). (before / after) Use cases: Closing is used to connect missing lines or parts. Opening is used to remove noise that does not belong to the largest object in the scene.
  • 27. 3 | Number Recognition Workshop - Morphological Ops. Structuring element: In this case the blue cross is the structuring element. It can be any other shape. For more details: http://homepages.inf.ed.ac.uk/rbf/HIPR2/morops.htm Demos: http://bigwww.epfl.ch/demo/demoteaching.html Practice: We can use Excel to practice manual closing and opening.
  • 28. 3 | Number Recognition Workshop - Morphological Ops. L-shape Excel exercise: In this exercise, I asked the students to come up with a structuring element to reconstruct the E shape. Most of them proposed the 3x3 L-shape. After two closings we can see the following: we managed to reconstruct the missing corner, but the structuring element obliterates details in the picture that are smaller than itself. This is one drawback of morphological operations.
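The Excel practice can also be done in a few lines of plain Python. The sketch below is an illustration, not the workshop's code: it assumes binary images stored as lists of rows with 1 = ink, pixels outside the image treated as background, and a hypothetical 1x3 horizontal structuring element (`HORIZONTAL_3`) instead of the blue cross in the slide, because a horizontal element is what bridges a gap in a horizontal stroke.

```python
# Structuring element as (row offset, column offset) pairs: a 1x3 horizontal bar.
HORIZONTAL_3 = [(0, -1), (0, 0), (0, 1)]


def dilate(img, se=HORIZONTAL_3):
    """Set every pixel that the structuring element reaches from an ink pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr, dc in se:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[rr][cc] = 1
    return out


def erode(img, se=HORIZONTAL_3):
    """Keep a pixel only if the whole structuring element fits inside the ink."""
    h, w = len(img), len(img[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                     for dr, dc in se))
             for c in range(w)]
            for r in range(h)]


def closing(img, se=HORIZONTAL_3):
    return erode(dilate(img, se), se)   # dilate first, then erode


def opening(img, se=HORIZONTAL_3):
    return dilate(erode(img, se), se)   # erode first, then dilate


broken = [[0, 0, 0, 0, 0, 0, 0],
          [0, 1, 1, 0, 1, 1, 0],   # stroke with a one-pixel gap
          [0, 0, 0, 0, 0, 0, 0]]
noisy = [[0, 0, 0, 0, 0, 1, 0],    # isolated speck of noise
         [0, 0, 0, 0, 0, 0, 0],
         [0, 1, 1, 1, 1, 1, 0]]

print(closing(broken)[1])  # the gap is bridged
print(opening(noisy)[0])   # the speck is removed
```

In OpenCV the same operations are available through `cv2.morphologyEx` with `cv2.MORPH_CLOSE` and `cv2.MORPH_OPEN`; the pure-Python version is only meant to mirror what students do by hand in the spreadsheet.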
  • 29. 3 | Number Recognition Workshop - OpenCV Code Review. Code review - What’s wrong with OpenCV thinking. https://github.com/orioli/MAID-ROBOT https://github.com/orioli/MAID-ROBOT/blob/master/uEyeCameraHIRO/camShiftDemo/camShiftDemo.cpp What the robot sees. Results of the code review: 400 lines of code to count 5 fingers; not robust to skin color change; not reusable code or a general-purpose solution; it is too custom --> obsolescence assured. This project of 2007 is an example of everything that is wrong with OpenCV thinking. It is not the way forward. The brain does not work like this. It does not scale. So what’s next?
  • 30. 3 | Number Recognition Workshop - Edge Detectors & convolution. Now the students have all the knowledge to make the software to identify numbers. Ask the students to draft a detailed action plan to classify characters from 0 to 9 from photos. Here is an example:
      1. Get the training dataset
         1. How many do we need? 100 of each number?
            1. Organize folders /1 /2 /3 …
            2. Go to the cafeteria and ask people to give samples of numbers
            3. Digitize it. How? Take a picture with the iPhone
      2. Training SW
         1. Find features for numbers
            1. How many do we need?
               1. Five?
               2. Let the SW choose the useful ones
            2. Edges
            3. Horizontal lines
            4. Closed spaces
            5. Shapes?
      3. Cluster dataset
         1. How good is the clustering? --> Testing
         2. Choose the method with the highest accuracy across testing subsets
         3. The model will tell you which features are useful and which are not at labeling
         4. How do you test it?
         5. Bring other (new) numbers and check
         6. Split the dataset into training set and testing set. 50% - 50%
         7. Prevent overfitting - Divide the test data into chunks of 10 and see the prediction accuracy for each individual chunk
      Now that the students have drafted a plan, let them do something concrete. Draw a matrix of 1s and 0s that represents the binary image of the number 7 and ask them how they would extract features.
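The last two items of the plan above (the 50% - 50% split and the per-chunk accuracy check) can be sketched as follows. The function names, the fixed random seed and the toy "classifier" are assumptions made for this illustration; any labeling function that maps features to a label can be plugged in as `predict`.

```python
import random


def split_dataset(samples, train_fraction=0.5, seed=0):
    """Shuffle the samples and split them into a training set and a testing set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def chunk_accuracies(predict, test_set, chunk_size=10):
    """Accuracy of `predict` on each consecutive chunk of the test set."""
    accuracies = []
    for start in range(0, len(test_set), chunk_size):
        chunk = test_set[start:start + chunk_size]
        correct = sum(1 for features, label in chunk if predict(features) == label)
        accuracies.append(correct / len(chunk))
    return accuracies


# Toy data: the "feature" is a number and the label is its parity.
samples = [((i,), i % 2) for i in range(40)]
train_set, test_set = split_dataset(samples)
print(len(train_set), len(test_set))
print(chunk_accuracies(lambda features: features[0] % 2, test_set))
```

If the accuracy varies a lot from chunk to chunk, the model is probably fitting quirks of the training samples rather than the digits themselves.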
  • 31. 3 | Number Recognition Workshop - Edge Detectors & convolution. Feature Extraction: In Excel we can use conditional highlighting to visualize the filtering process. We start with a 7. Ask the students: how would you extract a feature from the ones and zeroes?
  • 32. 3 | Number Recognition Workshop - Edge Detectors & convolution. Gaussian filtering by convolution
  • 33. 3 | Number Recognition Workshop - Edge Detectors & convolution. Vertical edge detector by convolution. See also Canny filters and Laplacians: 1. http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/canny_detector/canny_detector.html 2. http://matlabserver.cs.rug.nl/cgi-bin/matweb.exe 3. http://www.youtube.com/watch?v=pIFnFhDsYlk
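The spreadsheet exercise of sliding a small kernel over the ones and zeroes can be reproduced in plain Python. This is a sketch with assumed names (`filter2d`, `VERTICAL_EDGE`): it applies the kernel without flipping it, which is how image libraries usually apply filters, and it only computes positions where the kernel fits entirely inside the image. The kernel is a Prewitt-style vertical edge detector, chosen here as an example; the slide image may use different values.

```python
# Prewitt-style kernel: responds to left-to-right changes in brightness.
VERTICAL_EDGE = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]


def filter2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


vertical_stroke = [[0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0]]
horizontal_bar = [[0, 0, 0, 0, 0],
                  [1, 1, 1, 1, 1],
                  [0, 0, 0, 0, 0]]

print(filter2d(vertical_stroke, VERTICAL_EDGE))  # strong response on both sides of the stroke
print(filter2d(horizontal_bar, VERTICAL_EDGE))   # no response: there is no vertical edge
```

The sign of the response tells on which side of the stroke the ink lies, and the position of the response tells where the edge is. These are the two things we want to know about each edge.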
  • 34. 3 | Number Recognition Workshop - Edge Detectors & convolution. At this point we can realize that numbers are traces, so what we need is to classify the types of traces, not the detection of edges? What kind of edge classifier can we use? Viola-Jones? Convolution? We want to know: type of edge, position. [Figure: the digits 0 to 9]
  • 35. 3 | Number Recognition Workshop - Edge Detectors & convolution. Traditional approach: 4 --> convert to 100+ features found by hand --> (0, 0.5, 4, 13, 33) --> it is a four! Example feature: has a cross in the lower middle. http://practicalquant.blogspot.ae/2013/10/deep-learning-oral-traditions.html
  • 36. 3 | Number Recognition Workshop - Edge Detectors & convolution. Deep learning approach (layers of abstraction): LEVEL 1 EDGE DETECTOR BANK --> LEVEL 2 PRIMITIVE SHAPE DETECTORS --> LEVEL 3 SPATIAL RELATIONSHIP DETECTOR --> it is a three! http://cs.brown.edu/courses/cs143/2011/results/proj2/thuhe/
  • 37. 3 | Number Recognition Workshop - Features and Sparse Coding. Feature Extraction http://www.youtube.com/watch?v=n1ViNeWhC24
  • 38. 3 | Number Recognition Workshop - Features and Sparse Coding. At this point, it is probably a good time to ask the students to try to make a little system to classify handwritten numbers. See what error rate they come up with. For the training set, they can ask friends to write numbers. Whatever you do, do not forget to set a deadline, otherwise they will seize the opportunity. ( #rookiemistake ) How to do automatic feature extraction? Manual feature extraction is not the way forward. “Most of Kaggle’s competition winners are decided by how lucky they are at finding useful features” ---- D. Efimov
  • 39. 4 | Number Recognition Workshop Solution
  • 40. 4 | Number Recognition Workshop Solution - Deep Learning Workshop. Recapitulation: In the previous chapter we saw how to manually make features (aka feature extraction). We also saw that feature extraction is about 50% of the work needed to win a Kaggle competition. The other 50% is optimizing the mathematical prediction model (Efimov 2012). We also saw that some geniuses, like Andrew Ng, postulate that manual feature extraction is a waste of time and that this is the kind of nitty-gritty job that should be done by computers. In the section on the feature-cluster conundrum we saw one more reason why feature engineering is part of the problem itself. In the section about sparse coding we saw a mathematical foundation that is a good reverse engineering of how the brain finds good features, because both sparse coding and the brain end up with similar edge detector filters.
      DNN online, on-demand: The company Ersatz allows you to try deep neural network models online. They have a demo based on number recognition, which is what we have been doing in the last chapter and which we will use now. The handwritten training sets are available from NIST at http://yann.lecun.com/exdb/mnist/. The author, Yann LeCun, compared different methods to do the job. Convolutional neural networks do the job better than metric approaches such as SVM.
  • 41. 4 | Number Recognition Workshop Solution - Deep Learning Workshop. The MNIST tutorial: This tutorial explains how to use the cloud infrastructure to solve the number recognition problem via deep neural networks. http://www.ersatzlabs.com/documentation/sharedMNIST/ More background info, current backdrops and videos on the history of ANN: History of neural nets (playlist of 5 videos) https://www.youtube.com/watch?v=4B-XY8a4RGk https://gigaom.com/2014/06/11/more-deep-learning-for-the-masses-courtesy-of-ersatz-labs/
  • 42. 4 | Number Recognition Workshop Solution - Deep Learning Workshop. Additionally, Ersatz published an interesting introduction to neural nets at: http://neuralnetworksanddeeplearning.com/chap1.html
  • 43. 5 | Advanced Topics
  • 44. 5 | Advanced Topics - Vision for Navigation. Using Vision to Navigate http://www.youtube.com/watch?v=8c2SFXQ5zHM Sir James explains http://www.youtube.com/watch?v=oguKCHP7jNQ Navigation http://www.youtube.com/watch?v=xlaqYDZwoWo#t=43 Suction. NT: Hoover tried to steal his idea. They lost in court with punitive damages. Top right photo: HEAD SPORTS. “One day my wife was sick so I had to vacuum the house. Then I realized that bag-based vacuum cleaners do not suck, so I decided to make one that sucks more. It took longer than expected though.” ------ Sir J. Dyson
  • 45. 5 | Advanced Topics - Blindness. What People Cured of Blindness See. Abridged from “What People Cured of Blindness See” by Patrick House, The New Yorker. How quickly, if at all, does the brain adapt and vision return after surgery? A simple answer, and a correct one, is that it depends entirely on circumstance. Back in 1993, Oliver Sacks wrote a story in the magazine about Virgil, a man with limited to no vision as a child who had developed cataracts at the age of six. After his cataracts were removed, fifty years later, Virgil had trouble adjusting. (For example, he could not always distinguish the letter “A” from the letter “H” and, when given Molyneux’s test, could not tell a square he felt from a square he saw.) Since the surgeries, Sinha has followed up with the Prakash children and found that, while they continued to suffer from poor acuity, many higher-order aspects of vision seemed to be improving. Within a week to a few months after surgery, the children could match felt objects to their visual counterparts. They also improved on spatial-navigation tasks requiring mental imagery, which tested their ability to follow a series of up, down, left, and right directions on a visually imagined game board. This finding was particularly important because previous work by Kosslyn and others had found that the congenitally blind have a capacity for mental imagery, but it is limited in some ways and becomes increasingly poor as the task becomes more complex. (In one example, a sighted person will imagine a typewriter a few feet away as larger than the same one imagined a hundred feet away. Among the congenitally blind, however, the imagined typewriter, a composite of experiences of touch and sound alone, is the same size at all distances.)
  • 46. Kosslyn believes that any improvements in mental imagery will require a “catalogue of visual memories” that can then be used to build expectations about the visual world. “When you develop expectations, you can use the fruits of previous experience to help you process what’s coming in now,” Kosslyn said. “But you need to have had that experience.” An example is depth perception: to the sighted, with a lifetime of practice, rules about occlusion (if A occludes B, object A is closer) and foreshortening (objects farther away appear smaller) are continually used to combine incoming light into a rich, three-dimensional world. The absence of these rules can frustrate the newly sighted, whose visual world can be both blurry and two-dimensional—paintings and people are often described as “flat, with dark patches”; a far-away house is “nearby, but requiring the taking of a lot of steps”; streetlights seen through glass are “luminous stains stuck to the window”; sunbeams through tree branches collapse into a single “tree with all the lights in it.” (The writer Jorge Luis Borges, who went blind at age fifty-five, described going blind as a process by which “everything near becomes distant.” In the newly sighted, without depth perception, the opposite seems true: the distant—tiny houses on the horizon, clouds in the impossibly high sky—suddenly looks nearby.) 45 5 | Advanced Topics - Blindness
  • 47. 5 | Advanced Topics - Easy Tracking. Ways of tracking http://www.youtube.com/watch?v=InqV34BcheM 1. By color histogram (HSV is less dependent on illumination) 2. By blob (OpenCV library, not very robust) 3. By face detection 4. By saliency (robust to occlusions).
      Traditional methods: typical keypoint extraction for recognition of objects independent of view: Harris-Affine, MSER, Hessian-Affine.
      Maja’s method (insight). Features extracted: HSV histogram (robust to illumination changes); texture by gray-level co-occurrence matrix; edge orientation histogram (6 bins); mean, skewness and SD for each color channel. Discard all but the top 25 features. Tested on the Columbia Object Image Library. Beats previous methods.
      “We are looking for an algo that is invariant to partial occlusions” ------ Maja R.
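The claim that an HSV histogram is less dependent on illumination can be illustrated with Python's standard `colorsys` module. This is a sketch only, not Maja's implementation: the function name, the 8 hue bins, the saturation cutoff and the sample pixels are all assumptions made for the example. It histograms only the hue channel and skips nearly gray pixels, whose hue is not meaningful.

```python
import colorsys


def hue_histogram(pixels, bins=8, min_saturation=0.1):
    """Normalized histogram of hue for a list of (R, G, B) pixels in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        if s < min_saturation:
            continue  # gray-ish pixel: hue carries no information
        counts[min(int(h * bins), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


# The same hypothetical object patch under bright light and at half the brightness.
bright = [(200, 100, 50)] * 3 + [(50, 100, 200)] * 2 + [(50, 200, 50)]
dark = [(r // 2, g // 2, b // 2) for r, g, b in bright]

print(hue_histogram(bright))
print(hue_histogram(dark))  # same hue distribution despite the darker image
```

An RGB histogram of the two patches would look very different, while the hue histogram stays the same. This is why tracking by color histogram is usually done in HSV.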
  • 50. 5 | Advanced Topics - Deep Learning. Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. See these course notes for a brief introduction to Machine Learning for AI and an introduction to Deep Learning algorithms. www.deeplearning.net/tutorial/
      Deep Learning explained (abridged from the original by Pete Warden | @petewarden) http://radar.oreilly.com/2014/07/what-is-deep-learning-and-why-should-you-care.html
      Inside an ANN: The functions that are run inside an ANN are controlled by the memory of the neural network: arrays of numbers known as weights that define how the inputs are combined and recombined to produce the results. Dealing with real-world problems like cat detection requires very complex functions, which means these arrays are very large, containing around 60 million numbers (60 MBytes) in the case of one of the recent computer vision networks. The biggest obstacle to using neural networks has been figuring out how to set all these massive arrays to values that will do a good job of transforming the input signals into output predictions.
      Renaissance: It has always been difficult to train an ANN. But in 2012, a breakthrough paper sparked a renaissance in ANN. Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton brought together a whole bunch of different ways of accelerating the learning process, including convolutional networks, clever use of GPUs, and some novel mathematical tricks like ReLU and dropout, and showed that in a few weeks they could
  • 51. 5 | Advanced Topics - Deep Learning. train a very complex network to a level that outperformed conventional approaches to computer vision. GPU photo by Pete Warden, slides (Jetpack). Listen to the webcast at Strata 2013 http://www.oreilly.com/pub/e/3121 http://www.iro.umontreal.ca/~pift6266/H10/intro_diapos.pdf Deep NN failed until 2006....
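To make "weights that define how the inputs are combined" and the two tricks named above (ReLU and dropout) less abstract, here is a toy sketch in plain Python. It is not the architecture of the 2012 paper: the layer size, the weight values and the function names are invented for the illustration, and the dropout shown is the common "inverted" variant that rescales the surviving values during training.

```python
import random


def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """ReLU keeps positive values and replaces negative ones with zero."""
    return [max(0.0, v) for v in values]


def dropout(values, rate, rng):
    """Training-time dropout: zero each value with probability `rate`,
    and scale the survivors so the expected sum stays the same."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


# Hypothetical tiny layer with 2 inputs and 2 outputs.
weights = [[1.0, 0.5],
           [-1.0, 1.0]]
biases = [0.5, 0.0]
inputs = [1.0, -2.0]

hidden = relu(dense(inputs, weights, biases))
print(hidden)                                    # [0.5, 0.0]
print(dropout(hidden, 0.5, random.Random(0)))    # some values randomly zeroed
```

A real vision network stacks many such layers and has tens of millions of weights. Training means adjusting those numbers until the outputs match the labels.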
  • 52. 5 | Advanced Topics - Deep Learning. Automatic speech recognition: The results shown in the table below are for automatic speech recognition on the popular TIMIT data set. This is a common data set used for initial evaluations of deep learning architectures. The entire set contains 630 speakers from eight major dialects of American English, with each speaker reading 10 different sentences. Its small size allows many different configurations to be tried effectively with it. The error rates presented are phone error rates (PER). http://en.wikipedia.org/wiki/Deep_learning#Fundamental_concepts
  • 53. 5 | Advanced Topics - Deep Learning. Andrew Ng on Deep Learning, where AI will learn from untagged data
  • 55. 5 | Advanced Topics - Deep Learning. To learn more about Andrew Ng on Deep Learning and the future of #AI:
      - http://new.livestream.com/gigaom/FutureofAI (~1:20:00)
      - https://www.youtube.com/watch?v=W15K9PegQt0#t=221
      - http://deeplearning.stanford.edu/
      Neural Nets talk by Ilya Sutskever: https://vimeo.com/77050653 [25:00 - end]
  • 56. 6 | Sample Questions
  • 57. 6 | Sample Questions - A
      1. OpenCV is a software app that is useful because: (1) it is not a software app (2) false (3) true (4) it is useful because it integrates functionality commonly used to perform computer vision
      2. Projective geometry is useful because: (1) every pinhole based visual system is subject to the laws of projective geometry (2) compound eyes are subject to projective geometry (3) all are false (4) all are true
      3. Pinhole eye design is popular in nature: (1) true (2) false
      4. A feature is: (1) a characteristic of an image (2) the output of a filter (3) the output of a non-linear filter (morphological) (4) a feature must be a number (5) a feature cannot be binary (6) a feature can be true or false
      5. A feature that is a constant can still be useful for clustering purposes: (1) true (2) false
      6. A good feature is a salient feature: (1) true (2) false (3) not always
  • 58. 6 | Sample Questions - A
      7. The absolute minimum number of features to cluster 7 labels is six: (1) true (2) false (3) it is 8 (4) 7 (5) all are true (6) depends on the features (7) none
      8. Using features instead of dealing with pixels simplifies the problem of labeling an image: (1) true (2) false
      9. Clustering by closest neighbor is the most effective method to classify two overlapping clouds of points: (1) true (2) k-means is in general superior (3) all of the above (4) none
      10. Likelihood is the best way to cluster labels: (1) true (2) false (3) it is better than k-means
      11. A drawback of k-means is that you have to set the number of clusters: (1) true (2) false
      12. Learning to recognize a pattern is equivalent to: (1) knowing the cluster boundaries and knowing which features to use (2) making a database of possible images and then, when one of the images is presented, telling which one in the database is the same (3) all (4) none
      13. In morphological operations, closing is: (1) used to clean salt and pepper noise which is bigger than the objects of interest (2) used to clean salt and pepper noise which is smaller than the objects of interest (3) opening and dilation, in this order (4) dilation and then erosion
      14. Opening is: (1) a tool to remove white noise from a black and white picture
  • 59. 6 | Sample Questions - A
      (14, continued) (2) it only works on black and white pictures (3) erosion followed by closing (4) erosion followed by dilation (5) none
      15. Erosion is: (1) a linear operation (2) reversible (3) all are true
      16. Dilation is: (1) used to connect disconnected lines (2) used to remove vertical lines (3) none (4) all
      17. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate? (1) [1,1,1,1,1,1,1,1] (as a vertical column) (2) [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1] (3) [1,1,1,3,4,3,1,1] (4) [0,0,0,1,0,0] (5) [1,1,1] (6) [1,1] (7) none
      18. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate? (1) a 4x4 matrix of ones (2) a 7x1 matrix of ones (3) a 7x7 matrix of ones (4) a 1x7 matrix of ones (5) none
      19. If I want to connect vertical lines that are disconnected by an average pixel length of 5 pixels, what structuring element is appropriate? (1) a 1x5 matrix of ones (2) a 5x1 matrix of ones (3) a 5x5 matrix of zeroes (4) a 5x1 matrix of zeroes with a 1 in the middle (5) none
      20. Ideally, the features for Arabic number recognition would: (1) correspond to traces (2) not correspond to traces but to numbers (3) both (4) none
  • 60. 6 | Sample Questions - A
      21. The best features for pattern recognition are those that appear infrequently in the dataset because: (1) they have a lot of entropy (2) they discriminate numbers better because of their frequency (3) none (4) all
      22. Sparse coding is related to how the brain detects edges by means of basis functions: (1) true (2) false (3) not always, depends on the variance of the images
  • 61. Dr. Jose Berengueres joined UAE University as Assistant Professor in 2011. He received an MEE from the Polytechnic University of Catalonia in 1999 and a PhD in bio-inspired robotics from Tokyo Institute of Technology in 2007. He has authored books on: The Toyota Production System; Design Thinking; Human Computer Interaction; UX women designers; Business Models Innovation. He has given talks and workshops on Design Thinking & Business Models in Germany, Mexico, Dubai, and California.