The document provides an introduction to computer vision. It discusses key topics including:
- What computer vision is and why it is useful. It uses mathematical and computational tools to extract information from images and improve human vision.
- Some basic concepts in computer vision including digital images, sampling, noise removal, segmentation, and feature extraction techniques.
- Where computer vision is used such as healthcare, autonomous vehicles, augmented/virtual reality, industry, social media, security, agriculture, and fashion.
- A brief history of computer vision including classical approaches and the revolution enabled by advances in artificial intelligence and deep learning.
25. A motivating application
Building a panorama
• We need to match/align/register images
Taken from PPT of M. Brown and D. Lowe, University of British
Columbia
26. Building a panorama
1) Detect feature points in both images
Taken from PPT of M. Brown and D. Lowe, University of British
Columbia
27. Building a panorama
1. Detect feature points in both images
2. Find corresponding pairs
Taken from PPT of M. Brown and D. Lowe, University of British
Columbia
28. Building a panorama
1. Detect feature points in both images
2. Find corresponding pairs
3. Find a parametric transformation (e.g. homography)
4. Warp (right image to left image)
Taken from PPT of M. Brown and D. Lowe, University of British Columbia
29. Scale-Space
, , , , ,L x y G x y I x y
2 2 2
2
2
1
, ,
2
x y
G x y e
31. Scale-Space Extrema Detection
• X is selected if it is larger or smaller than all 26
neighbors
Taken from David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
32. Keypoint Localization
Threshold on minimal contrast
Threshold on ratio of
principal curvatures
Taken from David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
33. Orientation assignment
• Create weighted histogram of
local gradient directions
computed at selected scale
• Assign canonical orientation at
peak of smoothed histogram
• For location of multiple peaks
multiply key point
Taken from David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004), pp. 91-110.
35. • If you’ve been to a concert recently, you’ve probably seen how many people take videos of the
event with mobile phone cameras
• Each user has only one video – taken from one angle and location and of only moderate quality
Mobile Crowdsourcing Video Scene
Reconstruction
36. Creation of the 3D Video Sequence
The scene is photographed by
several people using their cell
phone camera
The video data is
transmitted via the
cellular network to a
High Performance
Computing server.
Following time
synchronization, resolution
normalization and spatial
registration, the several videos
are merged into a 3-D video
cube.
TIME
40. • Precise : epipolar matching is both fast and accurate
• Dense multi-scale description of the images using binary descriptors
3D Model Reconstruction
41. • Precise : epipolar matching is both fast and accurate
• Empirical probability density check to discard false positives at occlusion
points
Correct match : max peak above other local max
Wrong match : max peak similar to other local max
3D Model Reconstruction
42. • Robust : works even with a minimal set of inputs
• two viewpoints already sufficient for dense reconstruction
• very few erroneous points
3Dreconstruction
3D Model Reconstruction
46. • Image classification revolutionized by DL
– ImageNet – from 27% to ~5% in three years
What happened in ImageNet 2012 ?
47. • In this case, our classifier would have a decision boundary
more complex than the simple straight line.
• All the training patterns would be separated perfectly.
Learning Methods - Supervised
48. • Simpler recognizer better performance on novel patterns.
• This is one of the central problems in statistical pattern
recognition.
Learning Methods - Supervised
49. • Feature extraction
Discriminative features
Invariant features with respect to translation, rotation and scale.
• Classification
Use a feature vector provided by a feature extractor to assign the
object to a category
Learning Methods - Supervised
50. • A multi layered Neural Network (NN)
– With non linearity
• The input to the DL NN is presented at the input layer
– Images, sound, laguage...
• Hidden layers extracts increasingly abstract features
• Output layer contains the result
Neural Networks
51. • More complex neural networks with multiple layers and multiple output neurons are
theoretically capable of separation using any continuous surface.
• The straight line depicts the separation achieved by a simple Perceptron and the curve the
separation by a multi-layered network (left), which is in theory able to learn any separating
function.
Learning methods
Supervised
52. What can you do with Machine & Deep Learning ?
• Train classifiers for specific recognition tasks
• Localization
– fully convolutional connected networks
– train networks, e.g. RCNN, YOLO
53. The task: What object category do we see in the image?
Benchmark: 1000 category “ImageNet” dataset.
(from Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep
Convolutional Neural Networks, NIPS, 2012.)
Image Classification
54. 2012 AlexNet results were what jump started the current neural network wave.
Current results (networks with tens/hundreds of layers) perform better than humans on this task!
Image Classification - continued
55. Task: Find bounding boxes and categories of objects in the image.
Benchmark: PASCAL VOC / Microsoft COCO
In right video: YOLO 2 (You Look Only Once)
Object Detection
https://www.youtube.com/watch?v=VOC3huqHrssg
56. Task: Assign each image pixel a category label (person, wall, road, dog, ..)
Example Dataset: PASCAL VOC2012 Challenge
SegModel, Deep-Labv2, and many more..
Image Semantic Segmentation
57. Task: Given an image and a question, answer the question.
Question Answering
58. Task: Given a dataset of images, generate new artificial image that look real!
State of the art: Generative Adversarial Networks (GANs)
Image Generation
59. Steering the wheels of self driving cars
Super resolution
Image completion
Saliency detection
Human pose detection
Facial keypoints (nose, eye, ear..)
Image captioning
Activity recognition
And many, many more tasks ….
60. SagivTech Traffic Lights Detection using DL
With dlib & a few
images from
Google street
view
https://www.youtube.com/watch?v=jg444J2AmOI
https://www.youtube.com/watch?v=YV4y1iqo_TQ
61. What can you do to get into this world ?
• Theory:
– Get to know basic computer vision and some “classical” algorithms, e.g.
Viola Jones, SIFT, etc.
– Get to know Deep Learning, e.g. CS 231 by Stanford
• Practice:
– Get to know and use OpenCV
– Hands on experience with Caffe, Tensor Flow etc.
62. Technology and Professional Services company
Established in 2009 and headquartered in Israel
SagivTech Snapshot
• What we do:
• Technological
Solutions
• Projects
• Research
• Core domains:
• Computer Vision
• Deep Learning
• Code Optimization
• GPU Computing
63.
64. Thanks for the following SagivTech team members and
collaborators:
Acknowledgements
• Prof. Peter Maass, University of Bremen
• Prof. Pierre Vandergheynst, EPFL
• Dov Eilot, SagivTech
• Jacob Gildenblat, SagivTech
• Amir Egozi, SceneNet project
65. Thank You
F o r m o r e i n f o r m a t i o n p l e a s e c o n t a c t
C h e n S a g i v
c h e n @ s a g i v t e c h . c o m
+ 9 7 2 5 4 7 7 0 6 0 8 9