ICVSS
MARINA DI RAGUSA JULY 2014
Prof. Rita Cucchiara
DIPARTIMENTO DI INGEGNERIA Enzo Ferrari
Università di Modena e Reggio Emilia, Italia
Egocentric vision
tracking and recognizing human signs
From fundamentals to applications
http://www.Imagelab.ing.unimore.it
Rita Cucchiara ICVSS 2014, Italy
AGENDA
Egocentric vision: from applications to fundamentals (and vice versa)
• Introduction
• Challenges in ego-vision problems
• Recognizing ego-gestures by motion
• The (unsolved) tracking problem, in ego-vision too
• Discussion  at Ragusa Ibla
1. INTRODUCTION
Can we know what you are looking at?
EGOCENTRIC VISION
Egocentric vision ("Ego-Vision"):
models and techniques for understanding what a person sees, from the first person's point of view and centered on human perceptual needs.
Often called first-person vision, because it relies on wearable cameras (e.g. mounted on glasses worn on the head) to acquire and process the same visual stimuli that humans acquire and process.
A broader meaning:
to understand what a person sees, wants to see, or would like to see (e.g. in case of vision impairments),
and to exploit learning, perception and reasoning paradigms similar to those of humans.
RESEARCH @IMAGELAB
from surveillance…
to new vision sensors
Floorimage
Drones
Smartphones Ego-vision
A SMALL INCOMPLETE STORY..
1961: Edward O. Thorp (with Shannon) built a computerized timing device for cheating at the game of roulette (from [Thorp ICSWC98])
1980 to now: Steve Mann (now at the Univ. of Toronto) defines many concepts of wearable computing for vision
"My current wearable prototype, equipped with head-mounted display, cameras, and wireless communications, enables computer-assisted forms of interaction in ordinary situations - for example, while walking, shopping, or meeting people - and it is hardly noticeable." [Mann Computer1997]
1998 to now: MIT wearable lab (Alex Pentland, Bernt Schiele, ...)
2004: Richard DeVaul, PhD thesis on the Memory Glasses, supervisor A. Pentland [Devaul MIT 04]
MIThril 2000 ... Google Glass
2009: 1st CVPR workshop on Egocentric Vision, by Philipose, Hebert, Ren
2009: T. Kanade, "First-person, inside-out vision," keynote
2012: 2nd CVPR workshop on Egocentric Vision, by Rehg, Ramanan, Ren, Fathi, Pirsiavash
2014: 3rd CVPR workshop on Egocentric Vision, by Kitani, Lee, Ryoo, Fathi
2014: three papers at CVPR 2014; a session at IWCV 2014; ...
Here we are.
Google temporarily banned facial recognition technology on Google Glass due to privacy concerns.
WHY NOW, IN 2014?
Technology (hw & sw availability)
Applications, systems, services
Market needs, ideas
APPLICATIONS (OR MARKET NEEDS?)
Life logging: recording your life digitally
Memex, 1945, Vannevar Bush
BigBrother, 1997
MyLifeBits, Gordon Bell, Microsoft, 2000
SenseCam, Microsoft
Sony Core
Google Glass
Life caching, lifeblogging...
Human augmentation applications: creating cognitive and physical improvements as an integral part of the human body. (Gartner IT Glossary 2012)
LIFE LOGGING: MARKET
A megabit for every second in a year: roughly 10 million seconds per year,
so 10 Tbit are necessary for everything I look at for a year.
Memoto: 2 days of memory at 2 shots per minute; 4 GB of data per day amounts to up to 1.5 terabytes per year, which you can put on its cloud.
Autographer is a camera capable of shooting up to 2,000 shots a day.
GoPro (see YouTube)
The lifelogger camera's driving design requirements:
1. compact and small enough to wear like a necklace
2. wide field of view to capture what I see
3. battery life to last a full day
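The storage figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: the decimal unit convention (1 GB = 10^9 bytes) and the variable names are assumptions, while the input numbers are the ones quoted above.

```python
# Back-of-the-envelope check of the life-logging storage figures.
# Unit convention (decimal prefixes) is an assumption; inputs come from the slide.
MBIT = 10**6   # bits in a megabit
TBIT = 10**12  # bits in a terabit
GB = 10**9     # bytes in a gigabyte
TB = 10**12    # bytes in a terabyte

seconds_per_year = 10_000_000  # "roughly 10 million seconds per year"
tbit_per_year = (1 * MBIT * seconds_per_year) / TBIT  # one megabit per second

memoto_tb_per_year = (4 * GB * 365) / TB  # 4 GB per day, every day

# Upper bound on shots per day at 2 shots per minute, recording around the clock.
shots_per_day = 2 * 60 * 24
```

With these inputs `tbit_per_year` is 10 and `memoto_tb_per_year` is 1.46, consistent with "10 Tbit" and "up to 1.5 terabytes per year".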
AUGMENTED VISION
Who is who?
Augmented vision for
Security surveillance
Smart manufacturing
Natural interfaces, HCI 2.0
Wellbeing
Education
Entertainment
……
AUGMENTED VISION
Applications and market needs: MANY
Applications for reading, for visually impaired and blind people
- OrCam (Shashua 2010, Hebrew Univ. spin-off)
Applications for smart manufacturing
- Viewranger (Japan)
Applications for security and surveillance
2. CV CHALLENGES FOR EGOVISION
Why is egocentric vision not so trivial?
CV CHALLENGES FOR EGOVISION (1/6)
a. Hardware
• Design new hardware [Badino, Kanade MVA 2011]
• Exploit real-time capabilities for egovision
At UNIMORE & ETHZ: Odroid XU ARM Exynos 5 Heterogeneous octa core.
Google Glass, Vuzix M100, Kopin Golden-i, Olympus MEG4.0
CV CHALLENGES FOR EGOVISION (2/6)
b. Recognizing FoA and PoI
• Eye-tracking & ego-vision [Tsukada, ETRA2012]
• Estimating FoA [Li ICCV2013], [Ogaki CVPRW2012], [Fathi ECCV2012]
A. Fathi, Y. Li, J. Rehg, Learning to Recognize Daily Actions Using Gaze, ECCV 2012
CV CHALLENGES FOR EGOVISION (3/6)
c. Recognizing head motion
• Head/body motion for outdoor summarization [ Poleg CVPR2014]
• Motion for indoor summarization [Grauman CVPR2013]
• Motion for supporting attention [Matsuo, CVPRW2014]
• Motion for SLAM as in robotics [Bahera, ACCV2012]
Cumulative displacement curves:
Poleg, Arora, Peleg Temporal Segmentation of Egocentric Videos CVPR2014
UNDERSTANDING MOTION
Dense OF vs Sparse OF
Classical OF approach Classical LK approach
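Image brightness constancy assumption:
\[
\frac{dE}{dt} = \frac{\partial E}{\partial x}\frac{dx}{dt} + \frac{\partial E}{\partial y}\frac{dy}{dt} + \frac{\partial E}{\partial t} = 0
\quad\Longleftrightarrow\quad
\nabla E^{T}\,\mathbf{v} + E_t = 0
\]
Lucas-Kanade: the constraint is written at each pixel p_i of a 5x5 window (25 equations, 2 unknowns):
\[
\nabla E(p_i)^{T} \begin{bmatrix} u \\ v \end{bmatrix} + E_t(p_i) = 0, \qquad i = 1, \dots, 25
\]
\[
\underbrace{\begin{bmatrix} E_x(p_1) & E_y(p_1) \\ E_x(p_2) & E_y(p_2) \\ \vdots & \vdots \\ E_x(p_{25}) & E_y(p_{25}) \end{bmatrix}}_{A\ (25 \times 2)}
\underbrace{\begin{bmatrix} u \\ v \end{bmatrix}}_{d\ (2 \times 1)}
= -\underbrace{\begin{bmatrix} E_t(p_1) \\ E_t(p_2) \\ \vdots \\ E_t(p_{25}) \end{bmatrix}}_{b\ (25 \times 1)}
\]
Least-squares solution through the normal equations:
\[
(A^{T}A)\,d = A^{T}b
\quad\Longleftrightarrow\quad
\begin{bmatrix} \sum E_x E_x & \sum E_x E_y \\ \sum E_x E_y & \sum E_y E_y \end{bmatrix}
\begin{bmatrix} u \\ v \end{bmatrix}
= -\begin{bmatrix} \sum E_x E_t \\ \sum E_y E_t \end{bmatrix}
\]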
Farneback OF [2003]: faster; assumes that the local neighborhood of each pixel of an image can be represented on a polynomial basis...
Iterative Lucas-Kanade algorithm [1981]: λ1/λ2 should not be large (λ1 = larger eigenvalue of the matrix A^T A)
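To make the Lucas-Kanade step concrete, here is a minimal single-iteration sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in the lecture: the synthetic brightness pattern, the function names `image` and `lucas_kanade`, the central-difference gradients and the 5x5 window are all choices made for this example.

```python
import math

def image(x, y, dx=0.0, dy=0.0):
    """Synthetic smooth brightness pattern E(x, y), translated by (dx, dy)."""
    return math.sin(0.2 * (x - dx)) + math.cos(0.15 * (y - dy))

def frame1(x, y):
    return image(x, y)

def frame2(x, y):
    # Same pattern moved by a small sub-pixel displacement (u, v) = (0.1, -0.05).
    return image(x, y, 0.1, -0.05)

def lucas_kanade(f1, f2, cx, cy, half=2):
    """Estimate the flow (u, v) at (cx, cy) from a (2*half+1)^2 window."""
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ex = (f1(x + 1, y) - f1(x - 1, y)) / 2.0  # spatial gradient E_x
            ey = (f1(x, y + 1) - f1(x, y - 1)) / 2.0  # spatial gradient E_y
            et = f2(x, y) - f1(x, y)                  # temporal derivative E_t
            sxx += ex * ex
            sxy += ex * ey
            syy += ey * ey
            sxt += ex * et
            syt += ey * et
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("A^T A is (nearly) singular: aperture problem")
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T with the 2x2 inverse.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

u_est, v_est = lucas_kanade(frame1, frame2, 10, 10)
```

The estimate should land close to the true displacement (0.1, -0.05). The determinant check is a crude stand-in for the eigenvalue condition: when one eigenvalue of A^T A is much larger than the other, the window does not constrain the flow in both directions.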
COMPUTING MOTION
optical flow…
Dense optical flow:
• Horn-Schunck [AI 1981]
• Farneback [SCIA 2003]
• Liu, Chellappa, Rosenfeld [ICPR 2002]: eigenvalues
• Medioni [PAMI 2008]: tensor voting
• …
Sparse optical flow:
• Lucas-Kanade [IJCAI 1981]
• Other keypoints (SIFT, SURF…)
It is not enough for understanding head motion
CV CHALLENGES FOR EGOVISION (4/6)
d. Recognizing objects
• Objects useful for humans [Fathi, CVPR2013]
• Objects in the hand [Fathi, Rehg CVPR2011]
• Target tagging in the scene [Pirsiavash, Ramanan CVPR 2012]
• Objects around you [Yvashita et al ICR2014]
CV CHALLENGES FOR EGOVISION (5/6)
e. Recognizing actions
Self-actions, gestures [Kitani, CVPR 2013; Baraldi, EVW2014]
Actions of people, social actions [Ryoo, CVPR 2013]; [Alletto EFPVW 2014],
[Narayan CVPRW2014]
Actions in the environment (sport..) [Kitani, IEEE PC Magazine 2012]
EGO-GESTURE RECOGNITION
• monocular hand gesture recognition
• deal with static (hand pose) and dynamic
gestures (motion).
• very few positive samples
• Many changes in luminance
HAND SEGMENTATION IN EGOVISION
Hand recognition:
It is an old problem, with many approaches in different contexts:
Skin classification: [Khan et al ICIP2010] random forest (better than BN, MP, NB, AdaB…)
Background subtraction after image registration [Fathi ICCV 2011] (assuming a static background, hands with objects, etc.)
Generic object recognition: [Li, Kitani CVPR 2013] sparse feature selection and a battery of RFs trained under different luminance conditions
AN EGOVISION SOLUTION
Pipeline: superpixel segmentation → superpixel descriptors → classification by a collection of RFs → temporal coherence → spatial coherence
Superpixel segmentation
- SLIC (Simple Linear Iterative Clustering [Achanta 2010])
- K-means in 5D (Lab + xy)
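The "K-means in 5D" step relies on a distance that mixes color and position. The sketch below follows the distance described in the SLIC paper by Achanta et al.; the function names, the default grid step and the compactness value are assumptions made for this illustration, and a full SLIC implementation additionally restricts the search to a window around each cluster center.

```python
import math

def slic_distance(pixel, center, grid_step, compactness):
    """Distance between two 5D points (L, a, b, x, y).

    The spatial term is normalized by the superpixel grid step and weighted by
    the compactness, so that color and position are comparable.
    """
    d_lab = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[:3], center[:3])))
    d_xy = math.sqrt(sum((p - c) ** 2 for p, c in zip(pixel[3:], center[3:])))
    return math.sqrt(d_lab ** 2 + (d_xy / grid_step) ** 2 * compactness ** 2)

def assign(pixel, centers, grid_step=10.0, compactness=10.0):
    """Index of the closest cluster center (the K-means assignment step)."""
    return min(range(len(centers)),
               key=lambda k: slic_distance(pixel, centers[k], grid_step, compactness))

# Toy example: the pixel sits exactly on center 1 spatially, but its color
# matches center 0, so it is assigned to center 0.
centers = [(50.0, 0.0, 0.0, 10.0, 10.0), (80.0, 0.0, 0.0, 12.0, 12.0)]
label = assign((50.0, 0.0, 0.0, 12.0, 12.0), centers)
```

Raising `compactness` gives more weight to position and yields more regular superpixels; lowering it lets superpixels follow color boundaries more closely.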
AN EGOVISION SOLUTION
Superpixel descriptors
- mean and covariance in RGB
- Lab and HSV histograms (LabH, HSVH)
- 27 Gabor filters (9 orientations, 3 scales: 7x7, 13x13, 19x19)
- HoG
AN EGOVISION SOLUTION
Classifier
- Collection of Random Forests
- Indexed by a 32-bin RGB histogram (RGBH)
- It encodes the appearance of the scene and the global luminance
- Hypothesis: background and hands undergo similar color changes
Inputs: the superpixel feature vector and the global scene luminance feature
AN EGOVISION SOLUTION
Temporal smoothing
in a window of k frames
Posterior probability of being (or not being) a hand pixel in the previous window
Estimated priors
AN EGOVISION SOLUTION
Spatial consistency
Eliminate spurious superpixels
Close holes
Apply GrabCut, using the posteriors as seeds
HAND SEGMENTATION
On standard datasets, there is a significant improvement in performance when all three consistency aspects are used together:
• illumination invariance (II)
• temporal smoothing (TS)
• spatial consistency (SC).
CAMERA MOTION REMOVAL
Pipeline: extract dense keypoints in each of the two frames → estimate homography → apply to the original frame sequence → output frame sequence without camera motion
• In ego-camera views, hand movement is usually not consistent with camera motion, resulting in wrong matches between the two frames.
• A segmentation mask discards the feature matches belonging to hands
Dense motion, object detection
EGO-GESTURE FEATURE
• Dense trajectories (TD), HOG, HOF, MBH [Wang CVPR2013] extracted around hand regions.
Pipeline: TD, HOG, HOF and MBH descriptors → one BoW per descriptor → power normalization and concatenation → SVM 1-vs-1
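The "power normalization and concatenation" stage can be sketched as follows. This is a hedged illustration: the exponent `alpha = 0.5` (signed square root) is a common choice assumed here, not a value stated on the slide, and the toy histograms and function names are hypothetical.

```python
import math

def power_normalize(hist, alpha=0.5):
    """Signed power normalization: sign(h) * |h|**alpha for each bin."""
    return [math.copysign(abs(h) ** alpha, h) for h in hist]

def gesture_feature(bow_histograms, alpha=0.5):
    """Power-normalize each BoW histogram, then concatenate them."""
    feature = []
    for hist in bow_histograms:
        feature.extend(power_normalize(hist, alpha))
    return feature

# Hypothetical toy BoW histograms for the four descriptor types.
td = [4.0, 0.0, 1.0]
hog = [9.0, 16.0]
hof = [1.0, 1.0, 1.0, 1.0]
mbh = [0.0, 25.0]
feature = gesture_feature([td, hog, hof, mbh])
```

The resulting vector (length 11 for these toy inputs) is what a 1-vs-1 SVM would receive; power normalization dampens bins with very large counts, which tends to help linear classifiers on histogram features.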
EXPERIMENTAL RESULTS
Datasets:
• The Cambridge-Gesture
database, with 900 sequences of
nine hand gesture types under
different illumination conditions;
• Our Interactive Museum
Dataset, an ego-centric gesture
recognition dataset with 700
sequences from seven gesture
classes performed by five subjects.
• The EDSH dataset, which
consists of three egocentric videos
with indoor and outdoor scenes
and large variations of
illumination.
See results in [Baraldi CVPRW2014]
CV CHALLENGES FOR EGOVISION (6/6)
f. Tracking: recognizing over time
• tracking target objects
• tracking face and people [Alletto, ICPR 2014]
• Multiple target tracking
6. TRACKING: THE BIG CHALLENGE
TRACKING
Taxonomy by target (single, multiple) and FoV (single, multiple):
Single object from a single camera: static camera, moving camera, smartphone, egovision...
Multiple objects from a single camera
Single object from multiple cameras: overlapped or disjoint FoVs, network of egovision systems, heterogeneous NoCs
Multiple objects from distributed cameras
See imagelab.ing.unimore.it
TRACKING IN EGOVISION
Straightforward relationships with robot vision… !
Similar to video-surveillance
• fast, real-time
• similar scenes ( typically people, social life, children..)
• many similar tasks( detection, action recognition , tracking)
• as in people tracking, motion is unpredictable
but
• Unconstrained
• Widely varying motion factors
• Frequent changes of field of view
• Interactive
OUR VISION…
How can we track objects?
HOW DO WE SEE?
The path: (E. Kendall, 2008)
1) The stimuli from the retinae travel through two parallel paths to the lateral geniculate nucleus in the thalamus, then to the cortex in the occipital lobe, and then to the temporal and frontal lobes.
2) Two parallel paths:
a) The WHAT pathway, in the temporal lobe, perceives the color and shape of the object, the face...
b) The WHERE pathway, in the parietal lobe, provides the localization of such objects
3) Hierarchically connected centers process the information, then come back to the WHAT area and work together, based on attention and purpose
where
what
SINGLE TARGET TRACKING
Analyzing the behavior of current tracking solutions in following a given target frame by frame.
Tracking is the task of generating an inference about the motion of an object given a sequence of images *.
* Arnold Smeulders, Dung M. Chu, Rita Cucchiara, Simone Calderara, Afshin Dehghan, and Mubarak Shah, Visual Tracking: an Experimental Survey, IEEE TPAMI, 2014.
SINGLE TARGET TRACKING
The (unsolved) questions in tracking a given target:
1) Region of interest
2) Representation:
2.1) how to observe invariant and variant features in the frame (the data, or the observation)
2.2) how to hold them in an internal representation (the state, i.e. the object model)
3) Inference method (the inference)
4) Model update
THE HARDNESS OF TRACKING
Which invariance can be perceived and maintained over time?
Tracking is hard as nothing is fixed:
• Problems of light: the target aspect, the illumination
• Problems of motion: the object/camera motion
• Problems of scene: the occlusion, the confusion...
• ... searching for the invariance in the video
Despite the variety in video, most papers use only 6 to 10 long videos. This covers the variety poorly.
WHY IS TRACKING SO HARD?
14 challenges in video tracking:
Light: 1. light
Object aspect: 2. object surface cover, 3. object specularity, 4. object transparency, 5. object shape
Object motion: 6. motion smoothness, 7. motion coherence
Scene context: 8. scene clutter, 9. scene confusion, 10. scene contrast, 11. scene occlusion
Camera motion: 12. camera moving, 13. camera zoom
Temporal coherence: 14. long videos
LIGHT AND OBJECT ASPECT
1. light
2. object surface cover
3. object specularity
4. object transparency
5. object shape
Light
Object aspect
MOTION
6. motion smoothness
7. motion coherence
Object Motion
12. camera moving
13. camera zoom
Camera motion
14. Long videos
Temporal
coherence
SCENE
10. scene contrast
11. scene occlusion
9. scene confusion
8. scene clutter
Scene context
14 TRACKING CHALLENGES IN 313 VIDEOS
01-LIGHT
02-SURFACECOVER
03-SPECULARITY
04-TRANSPARENCY
05-SHAPE
06-MOTIONSMOOTHNESS
07-MOTIONCOHERENCE
08-CLUTTER
09-CONFUSION
10-LOWCONTRAST
11-OCCLUSION
12-MOVINGCAMERA
13-ZOOMINGCAMERA
14-LONGDURATION
DATASET
ALOV++
http://imagelab.ing.unimore.it/dsm/ or
www.alov300.org
TRACKING IN EGO-VISION
In egovision?
• all the previous problems!!
• relative motion
• No motion of observer but moving target
• Motion of observer but fixed target
• Motion of both observer and target
• ( the dataset at Imagelab.unimore.it)
• EGO_GROUP
• EGO_TRACK
GENERAL PURPOSE TRACKING
Single target tracking, without any constraints
From KALMAN, Ext. Kalman, LKT, Particle Filter, MeanShift, ……
new generations of tracking solutions
• explore multiple features and cues (SIFT, HOG, SURF, etc.)
• explore multiple object representation (fragments, graphs)
• explore new optimization methods
• explore new solutions of machine learning
But what about the results?
THE STATE OF THE ART
19 tracking solutions for you..
Tracking by matching
Tracking with a
discriminative classification
Tracking by matching
• [NCC] Normalized Cross-Correlation
K. Briechle and U. Hanebeck, SPIE 2001
• [KLT] Lucas-Kanade Tracker
S. Baker and I. Matthews, IJCV2004
• [KAT] Kalman Appearance Tracker
H. Nguyen and A. Smeulders, TPAMI 2004
• [FRT] Fragments-based Robust Tracking
A. Adam, E. Rivlin, and I. Shimshoni, CVPR2006
• [MST] Mean Shift Tracking
D. Comaniciu, V. Ramesh, and P. Meer, CVPR2000
• [LOT] Locally Orderless Tracking
S. Oron, A. Bar-Hillel, D. Levi, S. Avidan, CVPR2012
• [IVT] Incremental Visual Tracking
D. Ross, J. Lim, and R.S.Lin, IJCV2008
• [TAG] Tracking on the Affine Group
J. Kwon and F.C. Park, CVPR2009
• [TST] Tracking by Sampling Trackers
J. Kwon, K.M. Lee, ICCV 2011
• [TMC] Tracking by Monte Carlo sampling
J. Kwon, K.M. Lee, CVPR 2009
• [ACT] Adaptive Coupled-layer Tracking
L. Cehovin, M. Kristan, A. Leonardis, ICCV2011
• [L1T] L1-minimization tracker
X. Mei, H. Ling, ICCV2009
• [L1O] L1 minimization with occlusion
X. Mei, H. Ling, Y. Wu, E. Blasch, L. Bai, CVPR2011
Tracking with a discriminative classification
• [FBT] Foreground-Background Tracking
Nguyen, Smeulders, IJCV 2006
• [HBT] Hough-Based Tracking
Godec, Roth, Bischof, ICCV 2011
• [SPT] Super Pixel Tracking
Wang, Lu, Yang, Yang, ICCV 2011
• [MIT] Multiple Instance learning Tracking
Babenko, Yang, Belongie, CVPR 2009
• [TLD] Tracking, Learning and Detection
Kalal, Matas, Mikolajczyk, CVPR 2010
• [STR] Structured output tracking
Hare, Saffari, Torr, ICCV 2011
ONE PROBLEM, MANY SOLUTIONS…
RoI
Model update
Visual features
Appearance
motion
predictions
inference
data
REGION OF INTEREST
1. From manual or automatic detectors
2. From moving object segmentation (bkg suppression, OF segmentation)
3. From local features identification;
APPEARANCE
b) Histograms
color histograms (MST, TMC, HBT, SPT)
intensity histograms (FRT, ACT)
Useful only in small patches; otherwise the spatial relationship information has to be captured elsewhere in the tracking algorithm
c) Feature vectors
Useful if the shape of the object is important (and constant, at least in some parts)
- Haar gradients (MIT)
- 2D bin patterns (TLD)
- SURF (FBT)
- Lab-color features and others (HBT)……..
- Be careful in selecting true and stable invariants
APPEARANCE
Some trackers keep the appearance information in the scene,
i.e. add some contextual information
• Background intensity representation in surveillance (methods based on background subtraction keep a reference background) ([Wren et al TPAMI97], [MOG CVPR'99], [Sakbot TPAMI '03])
• occlusion detection information [AD Hoc PRL '11, ALIEN '13]
• confusion information [Medioni CVPR '11]
• motion information of cameras [Qiogui, IASP11]
They cannot always be used in general, unknown visual contexts
MOTION MODEL
1. Uniform search (no motion model), e.g. STR, FBT
2. Probabilistic Gaussian motion model, e.g. IVT, L1T
3. Motion prediction, e.g. KALMAN, ACT (a linear motion model, sometimes guided by OF)
4. Implicit motion model (with optimization), e.g. KLT, MST
5. Multiple models: tracking and detection (TLD), particle filters (SPT), 3D affine or projective motion models (TAG)
MOTION
Some considerations:
- Uniform search is merely simple;
- it is better if guided at least by optical flow, in egovision too
- implicit with optimization: only if the motion is small and the appearance is fairly constant; not always possible
- motion prediction works very well in specific applications; for instance, in intelligent transportation systems a linear motion prediction is a safe assumption.
MODEL UPDATE
The model updating:
• No update: NCC, FRT, MST, KLT do not update but search for the best transform of the model. It could be considered trivial, but it is good for short sequences
• Last seen template, or partial updating (Porikli CVPR '06)
• Predicting the new appearance: KAT, for long-term occlusion; with abrupt changes it can produce errors
• Patch update: TMC and ACT (add or delete patches)
• Updating an extended model, e.g. incremental PCA (IVT, TAG, TST)
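The first two update policies can be seen as the two extremes of a single blending rule. The sketch below is an illustration under assumptions: the linear blend, the function name `update_template` and the parameter `rate` are hypothetical and are not taken from any of the cited trackers.

```python
def update_template(template, patch, rate):
    """Blend the stored template with the newly tracked patch.

    rate = 0.0 keeps the template fixed (no update),
    rate = 1.0 replaces it with the last seen patch,
    values in between give a partial update.
    """
    return [[(1.0 - rate) * t + rate * p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(template, patch)]

template = [[10.0, 20.0]]
patch = [[20.0, 40.0]]

no_update = update_template(template, patch, 0.0)  # [[10.0, 20.0]]
last_seen = update_template(template, patch, 1.0)  # [[20.0, 40.0]]
partial = update_template(template, patch, 0.25)   # [[12.5, 25.0]]
```

The trade-off is the usual one: a low rate resists drift but cannot follow appearance changes, while a high rate follows changes but also absorbs occluders and background after a tracking error.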
INFERENCE METHODS
The method: the computational paradigm to find the best location and/or the best state of the target in the new frame.
a) Matching
b) Matching with extended target
c) Matching with constraint
d) Discriminative
e) Discriminative with constraint
T: target model, i.e. the state
C: candidate objects, i.e. the data
INFERENCE METHODS
a) Matching:
-direct gradient ascent (NCC, KLT), or probabilistic matching (MST)
-Matching with particle filtering (IVT, TST, TAG)
Useful if the appearances of target and background are different, to avoid local maxima; useful in case of good appearance invariance (problems with intensity or surface luminance changes); good for occlusions and low contrast
b) Matching with extended appearance
- Subspace matching in extended models, with some examples (IVT), or with different track results (TAG, TST)
It is similar to having a long-term memory; useful for long-term tracking or with occlusions; more complex
c) Matching with constraints
-adding some rules for the context (TAG), for the positions of patches
(L1T, L1O), or their pose (ACT)
INFERENCE METHODS
d) Discriminative
- Discriminative supervised classifier :
- FBT - Linear Discriminant analysis;
- HBT -segmentation and random forests;
- MIT and SPT - clustering,
- TLD a pool of randomized classifiers
- Often very few examples are available (thus LDA could be better than multiple instance learning); in case of errors there are drifting problems
e) Discriminative with constraints
• Structured classifier: STR uses the displacement of the target as output, instead of a label per pixel
[NCC] NORMALIZED CROSS-CORRELATION
Direct target matching by normalized cross-correlation.
[Briechle et al SPIE 2001]
• Intensity values in the initial target box as template;
• Matching by sampling uniformly around the previous position;
• Take the highest score with NCC at pixel level;
• No updating of the target;
With g the frame and t the template:
\[
N(m, n) = \frac{\sum_{i,j} g(i+m,\, j+n)\; t(i, j)}{\sqrt{\sum_{i,j} g(i+m,\, j+n)^{2}\; \sum_{i,j} t(i, j)^{2}}}
\]
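The NCC tracker described above fits in a few lines of plain Python. This is a minimal sketch, not the reference implementation: the function names, the toy 6x6 frame and the search radius are assumptions made for this example, and the score is the non-mean-subtracted form of the normalized cross-correlation.

```python
import math

def ncc(frame, template, m, n):
    """NCC score of the template placed with its top-left corner at row m, column n."""
    num = sum_g2 = sum_t2 = 0.0
    for i, t_row in enumerate(template):
        for j, t in enumerate(t_row):
            g = frame[m + i][n + j]
            num += g * t
            sum_g2 += g * g
            sum_t2 += t * t
    den = math.sqrt(sum_g2 * sum_t2)
    return num / den if den > 0 else 0.0

def track_ncc(frame, template, prev, radius):
    """Uniform search around the previous position; the template is never updated."""
    th, tw = len(template), len(template[0])
    best_score, best_pos = -1.0, prev
    for m in range(max(0, prev[0] - radius), min(len(frame) - th, prev[0] + radius) + 1):
        for n in range(max(0, prev[1] - radius), min(len(frame[0]) - tw, prev[1] + radius) + 1):
            score = ncc(frame, template, m, n)
            if score > best_score:
                best_score, best_pos = score, (m, n)
    return best_pos, best_score

# Toy frame: flat background with the target pattern at row 3, column 2.
frame = [[1.0] * 6 for _ in range(6)]
frame[3][2], frame[3][3] = 9.0, 1.0
frame[4][2], frame[4][3] = 1.0, 9.0
template = [[9.0, 1.0], [1.0, 9.0]]

best_pos, best_score = track_ncc(frame, template, prev=(2, 2), radius=2)
```

Because the template is never updated, the tracker cannot drift toward the background, but it also cannot follow appearance changes of the target.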
[FRT] FRAGMENTS-BASED ROBUST TRACKING
Matching the ensemble of 10 x 2 patches. [Adam et al CVPR2006]
• ROI divided into patches;
• New window search around the previous one, including 10% scale change.
• Each patch intensity histogram compared by Earth Mover's Distance.
• The target is not updated.
Very simple, and robust to pose changes and occlusions
Suitable for shape modification
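For one-dimensional histograms, the Earth Mover's Distance used to compare patches has a simple closed form: the sum of absolute differences between the two cumulative distributions. The sketch below assumes histograms with the same number of bins, normalized to unit mass, and a ground distance of 1 between adjacent bins; the function name is hypothetical.

```python
def emd_1d(h1, h2):
    """Earth Mover's Distance between two 1D histograms with equal bin counts."""
    s1, s2 = float(sum(h1)), float(sum(h2))
    cdf_diff = 0.0  # running difference between the two cumulative distributions
    total = 0.0
    for a, b in zip(h1, h2):
        cdf_diff += a / s1 - b / s2
        total += abs(cdf_diff)
    return total

same = emd_1d([1, 2, 1], [1, 2, 1])      # 0.0: identical histograms
one_bin = emd_1d([1, 0, 0], [0, 1, 0])   # 1.0: all mass moves by one bin
two_bins = emd_1d([1, 0, 0], [0, 0, 1])  # 2.0: all mass moves by two bins
```

Unlike a bin-by-bin distance, EMD grows with how far the intensity mass has to move, so a small brightness shift costs less than a large one.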
[HBT] HOUGH-BASED TRACKING
Discriminative classifier on Lab-color, gradients and positions.
The Hough Forest provides a probability map of the target.
• Given the sample, represented by features;
• Transform the target with a Hough Forest;
• Back projection from the Hough Forest (as in GHT's R-table);
• Segment the target using GrabCut and hence generate new samples.
[Godec, Roth, Bischof, ICCV, 2011]
[TLD] TRACKING, LEARNING AND DETECTION
Top detections on LBPs and LKT optical flow are combined by NCC. Fast.
• Samples are selected in, around and away from the target to update (labeled and unlabeled).
• If neither of the two trackers produces an output, TLD declares a loss and recovers.
• Learns which is the best detector; good for short-term occlusion
[Kalal, Matas, Mikolajczyk, CVPR, 2010]
[STR] STRUCTURED OUTPUT TRACKING
Structured supervised classifier by {appearance, translation}.
• The window is described by Haar features at 2 scales.
• Sampling uniformly around the previous position.
• The S-SVM learner updates under the constraint of staying at the current location; the locations which violate the support points are used for the new SVM
[Hare, Saffari, Torr, ICCV, 2011]
EXAMPLE OF STRUCT
IS TRACKING GOOD ENOUGH?
Measuring results is hard
TRACKING MEASURES
Let's call:
GT_i the ground truth in frame i
T_i the detected target in frame i
Match degree at pixel level:
\[
MD = \frac{|T_i \cap GT_i|}{|T_i \cup GT_i|}
\]
Match at pixel level if
\[
\frac{|T_i \cap GT_i|}{|T_i \cup GT_i|} \ge Th, \qquad Th = 0.5 \quad \text{(PASCAL measure [4])}
\]
Without the threshold, the ratio
\[
\frac{|T_i \cap GT_i|}{|T_i \cup GT_i|}
\]
is called the DICE measure.
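For axis-aligned bounding boxes, the overlap ratio and the PASCAL criterion can be computed as in the sketch below. The `(x, y, width, height)` box convention and the function names are assumptions made for this example.

```python
def overlap_ratio(t, gt):
    """|T ∩ GT| / |T ∪ GT| for two boxes given as (x, y, width, height)."""
    tx, ty, tw, th = t
    gx, gy, gw, gh = gt
    iw = max(0.0, min(tx + tw, gx + gw) - max(tx, gx))  # intersection width
    ih = max(0.0, min(ty + th, gy + gh) - max(ty, gy))  # intersection height
    inter = iw * ih
    union = tw * th + gw * gh - inter
    return inter / union if union > 0 else 0.0

def pascal_match(t, gt, threshold=0.5):
    """PASCAL criterion: the frame counts as a match if the ratio reaches the threshold."""
    return overlap_ratio(t, gt) >= threshold

gt_box = (5.0, 0.0, 10.0, 10.0)
half_shifted = overlap_ratio((0.0, 0.0, 10.0, 10.0), gt_box)  # 50 / 150
loose = pascal_match((0.0, 0.0, 10.0, 10.0), gt_box)          # False
tight = pascal_match((3.0, 0.0, 10.0, 10.0), gt_box)          # True: 80 / 120
```

Note how strict the 0.5 threshold is: a box shifted by half its width overlaps the ground truth by only one third and is counted as a miss.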
TRACKING MEASURES
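At object level, in a sequence of frames (for i = 1 ... N_frame):
\[
n_{tp} = \sum_{i=1}^{N_{frame}} n^{i}_{tp}, \qquad
n_{fp} = \sum_{i=1}^{N_{frame}} n^{i}_{fp}, \qquad
n_{fn} = \sum_{i=1}^{N_{frame}} n^{i}_{fn}
\]
\[
Precision = \frac{n_{tp}}{n_{tp} + n_{fp}}, \qquad Recall = \frac{n_{tp}}{n_{tp} + n_{fn}}
\]
F-score (also called correct track ratio):
\[
F = 2\,\frac{Precision \cdot Recall}{Precision + Recall}
\]
At area/pixel level:
\[
p_i = \frac{|T_i \cap GT_i|}{|T_i|} \qquad \text{and} \qquad r_i = \frac{|T_i \cap GT_i|}{|GT_i|}
\]
F1-score:
\[
F_1 = \frac{1}{N_{frame}} \sum_{i=1}^{N_{frame}} 2\,\frac{p_i\, r_i}{p_i + r_i}
\]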
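Both versions of the F-score are easy to compute once the per-frame quantities are available. The sketch below uses hypothetical function names and takes areas as plain numbers; since the per-frame F1 is symmetric in precision and recall, swapping the two does not change the result.

```python
def f_score(n_tp, n_fp, n_fn):
    """Object-level F-score from true positive, false positive and false negative counts."""
    precision = n_tp / (n_tp + n_fp) if (n_tp + n_fp) > 0 else 0.0
    recall = n_tp / (n_tp + n_fn) if (n_tp + n_fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def area_f1(frames):
    """Area-level F1 averaged over frames.

    frames: list of (intersection_area, tracked_area, ground_truth_area).
    """
    total = 0.0
    for inter, t_area, gt_area in frames:
        p = inter / t_area if t_area > 0 else 0.0
        r = inter / gt_area if gt_area > 0 else 0.0
        total += 2 * p * r / (p + r) if (p + r) > 0 else 0.0
    return total / len(frames)

object_level = f_score(n_tp=8, n_fp=2, n_fn=2)            # precision = recall = 0.8
area_level = area_f1([(50, 100, 100), (0, 100, 100)])     # (0.5 + 0.0) / 2
```

The area-level score penalizes loose boxes even when every frame would count as a match at object level, which is why the two numbers can differ considerably on the same sequence.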
TRACKING MEASURES
For measuring the position deviation instead:
\[
Deviation = 1 - \frac{\sum_{i \in M} d(C_{T_i}, C_{GT_i})}{|M|}
\]
d(x, y): L2-norm distance between the centroids
PBM, Position Based Matching:
\[
PBM = \frac{1}{N_{frames}} \sum_{i} \left( 1 - \frac{d_1(T_i, GT_i)}{sp\_ave(i)} \right)
\]
d_1(x, y) is the L1 norm
sp_ave is the average semi-perimeter between GT and T:
\[
sp\_ave = \frac{H(T) + W(T) + H(GT) + W(GT)}{2}
\]
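A minimal sketch of the PBM computation follows. Two assumptions are made here and are not stated on the slide: boxes are given as `(x, y, width, height)`, and the L1 distance is taken between the box centers, with no special handling of non-overlapping boxes.

```python
def pbm(tracked, truth):
    """Position Based Matching averaged over frames.

    tracked, truth: lists of boxes (x, y, width, height), one per frame.
    """
    total = 0.0
    for (tx, ty, tw, th), (gx, gy, gw, gh) in zip(tracked, truth):
        sp_ave = (th + tw + gh + gw) / 2.0  # average semi-perimeter of T and GT
        # L1 distance between the two box centers (assumption, see lead-in).
        d1 = abs((tx + tw / 2.0) - (gx + gw / 2.0)) + abs((ty + th / 2.0) - (gy + gh / 2.0))
        total += 1.0 - d1 / sp_ave
    return total / len(tracked)

tracked = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0)]
truth = [(0.0, 0.0, 10.0, 10.0), (5.0, 5.0, 10.0, 10.0)]
score = pbm(tracked, truth)  # frame 1 contributes 1.0, frame 2 contributes 0.5
```

Dividing by the semi-perimeter makes the measure scale-aware: the same pixel offset costs less on a large target than on a small one.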
EXPERIMENTAL RESULTS ON ALOV++
[Figure: survival curves by Kaplan-Meier for [NCC], [STR], [L1O], [TST], [TLD], [FBT]; regions A to E]
The upper bound, taking the best of all trackers at each frame: 10%
The lower bound, what all trackers can do: 7%
Only about 30% is correctly tracked
COMPARISON ON MATCHING MODELS
For egovision: comparison on videos with motion issues
FRT, KAT, LKT, LOT, MST, NCC
FRT using fragments copes well with motion changes
NCC is insensitive to motion changes
COMPARISON ON DISCRIMINATIVE M.
CONFUSION CHALLENGE:
[FBT][NCC][STR] [TLD][TST] [L1O]
CONFUSION.. CROWD short term tracking
OCCLUSION
Conclusions:
1. STR, FBT, TST, TLD and L1T are best here (!).
2. Light occlusion is approximately solved.
3. Full occlusion is still hard for most.
LONG TERM CHALLENGE:
[FBT][NCC][STR] [TLD][TST] [L1O]
F-SCORES
See alov++
IS TRACKING GOOD IN EGOVISION?
Problems in egovision
• Moving head and Moving target
• Changes of FoV
• Changes of luminance
Dataset
V1. (semi-)still observer
V2. Moving head, not coherent motion (abrupt changes in motion patterns)
V3. Camera observer movement, with and without abrupt camera motions
Trackers
Matching based trackers
• NCC , NN ( with color histograms), FRT
Discriminative classifiers based
• HBT , TLD, STR
TRACKING IN EGOVISION: EVALUATION
V1: (semi-)still camera
Challenges:
• Changes in object shape (e.g. change in head pose)
• Occlusions between objects
Pros:
• No camera motion: low blur; target losses can be due only to occlusions
• Adaptive models can adapt to changes in object shape
Cons:
• Occlusions are likely to occur; loss detection is needed.
• Adaptive models must detect the loss or they adapt to the occluding object
EGOVISION FROM A STILL PERSON
TRACKING IN EGOVISION: EVALUATION
Tracking results in the first scenario
DICE measure: the overlap degree between the ground truth and the predicted bounding box.
V1.1: video without occlusions; the only challenge is the subject's pose changes.
V1.2: recurring occlusions; adaptive models (STR, TLD, HBT) fail to discriminate between the original target and the occluding one and adapt to it, resulting in tracking failure.
TRACKING IN EGOVISION: EVALUATION
V2: moving camera, still person
Challenges:
• Changes in object shape (e.g. change in head pose)
• Target exits the camera FoV
• Occlusions between objects
Pros:
• Person stands still; abrupt lighting changes are not likely
Cons:
• Occlusions are likely to occur; loss detection is needed
• Target can exit the camera FoV; loss detection and re-identification are needed
• Adaptive models without loss detection quickly adapt to the background after a loss
EGOVISION FROM A MOVING HEAD
TRACKING IN EGOVISION: EVALUATION
Tracking results in the second scenario
V2.1: video with people chatting. HBT performs poorly due to its lack of loss detection and recovery. STR cannot detect the loss either and adapts its support vectors to the background.
V2.2: tracking of an environmental point of interest. The target stays still but gets occluded and exits the camera FoV. Color-based trackers (HBT, NN) perform poorly due to the difficulty in discriminating the object based on color.
V2.3: tracking a face under fast-occurring occlusions. Responsive loss detection (TLD) is needed in order to stop adapting the model in time. Scale changes compromise model matching (FRT, NCC)
TRACKING IN EGOVISION: EVALUATION
V3: moving camera, moving person
Challenges:
• Changes in object shape (e.g. change in head pose)
• Target exits the camera FoV
• Occlusions between objects
• Abrupt changes in lighting
• Occasional low image resolution due to motion blur
The most challenging tracking scenario.
Considerations:
• Lack of loss detection results in tracking failure after very few frames
• Adaptive models often cannot cope with the challenges of this scenario and adapt to the background to some degree, resulting in the tracker quickly drifting
• Adaptability to scale changes is needed due to the person moving closer to objects of interest
EGOVISION FROM A MOVING PERSON
TRACKING IN EGOVISION: EVALUATION
Tracking results in the third scenario
V3.1: face tracking under person motion. Discriminative colors between object and background allow good performance for HBT. NCC performs well due to the lack of object changes.
V3.2: face tracking under both person and camera motion. Adaptive models end up adapting to the background. NCC does not adapt and hence does not drift.
V3.3: face tracking with indoor-outdoor transition. The abrupt lighting change during the transition compromises most trackers. Adaptive models (TLD, STR, HBT) try to adapt to the transition, resulting in the inability to adapt back to the object when the lighting stabilizes.
TRACKING IN EGOVISION: EVALUATION
Tracking results: the table shows the F-measure for each video and each tracker.

Scenario  Still camera,     Moving camera,             Moving camera,
          still person      still person               moving person
Video     V1.1     V1.2     V2.1     V2.2     V2.3     V3.1     V3.2     V3.3
NN        0.5204   0.2793   0.2314   0.0472   0.1211   0.2552   0.0867   0.1565
HBT       0.5187   0.1177   0.0206   0.1602   0.0333   0.5786   0.1457   0.0973
TLD       0.4838   0.1767   0.5091   0.6372   0.4342   0.2446   0.0237   0.1303
STR       0.6406   0.2397   0.0698   0.5745   0.0801   0.5532   0.0294   0.0879
NCC       0.4326   0.2251   0.4575   0.3769   0.0147   0.3607   0.1834   0.1118
FRT       0.2271   0.2138   0.1406   0.0294   0.0389   0.0984   0.1492   0.0756

A lot of work to do...
THE SUPPORT POINTS!!!
IN SIMPLE CASES..
Problem: tracking people to detect social groups [Alletto CVPRW2014]
- We use Viola-Jones (VJ) for initial detection
- HBT and TLD for tracking faces in real time + re-identification
- Classification for orientation
- Correlation clustering for detecting social groups
- * MIUR Cluster project in Smart Cities «Educating City»: recognizing children's
social activity, 2014-2017
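The correlation clustering step can be illustrated with a small sketch. The slides do not say which solver is used, so this swaps in a simple greedy pivot heuristic with a fixed pivot order; the signed affinity matrix (positive when two people are likely in the same group, for example because they are close and facing each other) and the function name are assumptions made for illustration.

```python
def pivot_correlation_clustering(affinity):
    """Greedy pivot clustering on a signed, symmetric affinity matrix.

    affinity[i][j] > 0 suggests persons i and j belong to the same group,
    affinity[i][j] <= 0 suggests they do not. No number of groups is given
    in advance; it emerges from the signs.
    """
    unassigned = list(range(len(affinity)))
    clusters = []
    while unassigned:
        pivot = unassigned[0]
        # The pivot takes every still-unassigned person it agrees with.
        cluster = [pivot] + [j for j in unassigned[1:] if affinity[pivot][j] > 0]
        clusters.append(cluster)
        unassigned = [j for j in unassigned if j not in cluster]
    return clusters
```

For four detected people where 0 and 1 attract each other, 2 and 3 attract each other, and every cross pair repels, the call returns two groups of two. The appeal of correlation clustering for this problem is that the number of social groups does not have to be fixed beforehand.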
SOCIAL FEATURES
Class     -75           -45           0            45          75
Interval  -90° to -60°  -60° to -30°  -30° to 30°  30° to 60°  60° to 90°
HEAD detection from egovision and POSE ESTIMATION
Determine the head yaw angle.
HOG descriptor computed using 8x8 cells, 16 bins per cell. Power normalization is then applied.
Multiclass linear SVM and HMM are used to discriminate the different pose classes.
DISTANCE ESTIMATION AND 3D RECONSTRUCTION
No camera calibration is required; random regression forests are used. A grid model is applied to
estimate the person's 3D location, accounting for projective deformation.
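Two small pieces of the head pose step can be sketched directly: quantizing a yaw angle into the five pose classes, and power normalization of a descriptor. The interval boundaries (-90, -60, -30, 30, 60, 90 degrees) are read from the slide figure, with half-open intervals chosen here as an assumption. The exponent 0.5 is also an assumption, since the slides do not give the value, and the function names are hypothetical.

```python
import math

# (low, high, class label) in degrees; labels are the interval centers.
YAW_CLASSES = [
    (-90, -60, -75),
    (-60, -30, -45),
    (-30, 30, 0),
    (30, 60, 45),
    (60, 90, 75),
]

def yaw_to_class(yaw_deg):
    """Map a head yaw angle in [-90, 90] degrees to its pose class label."""
    if not -90 <= yaw_deg <= 90:
        raise ValueError("yaw must lie in [-90, 90] degrees")
    for low, high, label in YAW_CLASSES:
        if low <= yaw_deg < high:
            return label
    return 75  # only yaw_deg == 90 reaches this line

def power_normalize(descriptor, alpha=0.5):
    """Signed power normalization: sign(x) * |x| ** alpha for each component."""
    return [math.copysign(abs(v) ** alpha, v) for v in descriptor]
```

Power normalization compresses large histogram bins, so a few dominant gradient orientations weigh less in the linear SVM. The class labels would then serve as the discrete states that the HMM smooths over time.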
METHOD OVERVIEW
[Figure: pipeline diagram. Labels recovered from the slide: Video Stream (frames t-1, t, t+1); Multiple face detection; Tracking and segmentation; HOG; Head Pose estimation (SVM + HMM); Face area estimation; Distance (Random Forest + 3D estimation); 3D bird view model; Correlation Clustering (SSVM); Groups composition estimation.]
EXPERIMENTAL RESULTS
LAST: EXECUTION TIME
An example of a comparison on the same video:
HBT, STRUCK, TLD and NN (nearest neighbour with histograms)
CONCLUSIONS AND OPEN PROBLEMS
• Single target Detection & Tracking
• two big problems in all video sources
• In egovision the problem is open.
And then
• Recognizing actions & behaviors…
• Understanding what people are seeing
• Maybe, it’s not utopia anymore.
HOMEWORKS
- Work with the ALOV++ dataset: www.alov.org
- Compare matching-based and discriminative-based tracking in different
scenarios
- Try to understand the reasons for failures
- Try to understand which are the weak points for egovision
- Answer the following questions:
1) Why are HOG, HOF and MBH (as well as trajectory shapes) used for ego-gesture
recognition with hand segmentation by superpixels?
2) What is a comprehensive definition of egocentric vision?
3) Single target tracking approaches can be divided into two broad categories.
Which ones?
4) Why are tracking algorithms based on a single or multiple memory of targets
more suitable in ego-vision?
ADDITIONAL REFERENCES
• T.-K. Kim and R. Cipolla. Canonical correlation analysis of video volume tensors for action categorization and detection. Trans. PAMI, 2009.
• Y. M. Lui, J. R. Beveridge, and M. Kirby. Action classification on product manifolds. In Proc. of CVPR, 2010.
• Y. M. Lui and J. R. Beveridge. Tangent bundle for human action recognition. In Proc. of Automatic Face & Gesture Recognition and Workshops, 2011.
• A. Sanin, C. Sanderson, M. T. Harandi, and B. C. Lovell. Spatio-temporal covariance descriptors for action and gesture recognition. In Proc. of Workshop on Applications of Computer Vision, 2013.
• L. Baraldi, F. Paci, G. Serra, L. Benini, and R. Cucchiara. Gesture recognition in ego-centric videos using dense trajectories and hand segmentation. EVW @ CVPR, 2014.
• R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC superpixels. Technical report, EPFL, 2010.
• H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In Proc. of CVPR, 2011, and IJCV, 2013.
• G. Gualdi, A. Prati, and R. Cucchiara. Multi-stage particle windows for fast and accurate object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, n. 8, pp. 1589-1604, 2012.
• A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. In Proc. of CVPR, 2006.
• M. Godec, P. M. Roth, and H. Bischof. Hough-based tracking of non-rigid objects. In Proc. of ICCV, 2011.
• Z. Kalal, J. Matas, and K. Mikolajczyk. Online learning of robust object detectors during unstable tracking. CVPR, 2009.
• S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured output tracking with kernels. In Proc. of ICCV, 2011.
THANKS TO
http://imagelab.ing.unimo.it
Imagelab PEOPLE
Rita Cucchiara Giuseppe Serra Marco Manfredi
Costantino Grana Paolo Santinelli Francesco Solera
Roberto Vezzani Martino Lombardi Simone Pistocchi
Simone Calderara Michele Fornaciari Fabio Battilani
Dalia Coppi Patrizia Varini Augusto Pieracci
Stefano Alletto

Lecture 09 rita cucchiara - egocentric vision: tracking and recognizing human signs

  • 1. ICVSS MARINA DI RAGUSA JULY 2014 Prof. Rita Cucchiara DIPARTIMENTO DI INGEGNERIA Enzo Ferrari Università di Modena e Reggio Emilia, Italia Egocentric vision tracking and recognizing human signs From fundamentals to applications http://www.Imagelab.ing.unimore.it
  • 2. AGENDA Egocentric vision: from applications to fundamentals (and vice versa) • Introduction • Challenges in ego-vision problems • Recognizing ego-gestures by motion • The (unsolved) tracking problem, in ego-vision too • Discussion at Ragusa Ibla
  • 3. 1. INTRODUCTION Can we know what you are looking at?
  • 4. EGOCENTRIC VISION Egocentric vision (“Ego-Vision”): models and techniques for understanding what a person sees, from the first-person point of view and centered on human perceptual needs. Often called first-person vision, recalling the need to use wearable cameras (e.g. on head-mounted glasses) to acquire and process the same visual stimuli that humans acquire and process. A broader meaning: to understand what a person sees, wants to see, or would like to see (e.g. in case of vision impairments), and to exploit learning, perception and reasoning paradigms similar to those of humans.
  • 5. RESEARCH @IMAGELAB from surveillance… to new vision sensors Floorimage Drones Smartphones Ego-vision
  • 6. A SMALL INCOMPLETE STORY.. 1961 Edward O. Thorp (with Shannon) built a computerized timing device for cheating at the game of roulette (from [Thorp ICSWC98]) 1980… now Steve Mann (now at Univ. of Toronto) defines many concepts of wearable computing for vision “My current wearable prototype, equipped with head-mounted display, cameras, and wireless communications, enables computer-assisted forms of interaction in ordinary situations-for example, while walking, shopping, or meeting people-and it is hardly noticeable.” [Mann Computer1997] 1998… now MIT wearable lab Alex Pentland Bernt Schiele…
  • 7. A SMALL INCOMPLETE STORY 2004 Richard DeVaul: PhD thesis on the Memory Glasses, supervised by A. Pentland [Devaul MIT 04] 2009 1st CVPR workshop on Egocentric Vision, by Philipose, Hebert, Ren MITrill2000 … Google Glass 2012 2nd CVPR workshop on Egocentric Vision, by Rehg, Ramanan, Ren, Fathi, Pirsiavash 2014 3rd CVPR workshop on Egocentric Vision, by Kitani, Lee, Ryoo, Fathi Three papers at CVPR2014 A session on IWCV 2014 … Here we are. T. Kanade, “First-person, inside-out vision,” 2009, keynote Google temporarily banned facial recognition technology on Google Glass due to privacy concerns.
  • 8. WHY NOW, IN 2014? Technology (hw & sw availability) Applications Systems Services Market needs ideas
  • 9. APPLICATIONS (OR MARKET NEEDS?) Recording your life digitally.. Memex 1945 Vannevar Bush BigBrother 1997 Mylifebits Gordon Bell Microsoft 2000 Sensecam Microsoft.. Sony Core Google Glass Life caching, Lifeblogging.. Human Augmentation Applications: creating cognitive and physical improvements as an integral part of the human body. (Gartner IT Glossary 2012) Life Logging
  • 10. LIFE LOGGING: MARKET A megabit for every second in a year… roughly 10 million seconds per year: 10 Tbit necessary for everything I look at for a year. MEMOTO: 2 days of memory at 2 shots per minute; 4 GB of data per day amounts to up to 1.5 terabytes per year, which you can store on its cloud. Autographer is a camera capable of shooting up to 2,000 shots a day. GoPro (see @youtube). The Lifelogger camera driving design requirements: 1. compact and small enough to wear like a necklace 2. wide field of view to capture what I see 3. battery life to last a full day
  • 11. AUGMENTED VISION Who is who? Augmented vision for Security surveillance Smart manufacturing Natural interfaces, HCI 2.0 Wellbeing Education Entertainment …
  • 12. AUGMENTED VISION Applications and market needs… MANY. Applications for reading for visually impaired and blind people: OrCam (Shashua 2010, Hebrew Univ spin-off). Applications for smart manufacturing: Viewranger (Japan). Applications for security and surveillance.
  • 13. 2. CV CHALLENGES FOR EGOVISION Why is egocentric vision not so trivial?
  • 14. CV CHALLENGES FOR EGOVISION (1/6) a. Hardware • Design new hardware [Badino, Kanade MVA 2011] • Exploit real-time capabilities for egovision At UNIMORE & ETHZ: Odroid XU ARM Exynos 5 heterogeneous octa core. Google Glass Vuzix M100 Kopin Golden-i Olympus Meg4.0
  • 15. CV CHALLENGES FOR EGOVISION (2/6) b. Recognizing FoA and PoI • Eye tracking & ego-vision [Tsukada, ETRA2012] • Estimating FoA [Li ICCV2013], [Ogaki CVPRW2012], [Fathi ECCV2012] A. Fathi, Y. Li, J. Rehg, Learning to Recognize Daily Actions using Gaze, ECCV 2012
  • 16. CV CHALLENGES FOR EGOVISION (3/6) c. Recognizing head motion • Head/body motion for outdoor summarization [Poleg CVPR2014] • Motion for indoor summarization [Grauman CVPR2013] • Motion for supporting attention [Matsuo, CVPRW2014] • Motion for SLAM as in robotics [Bahera, ACCV2012] Cumulative displacement curves: Poleg, Arora, Peleg, Temporal Segmentation of Egocentric Videos, CVPR2014
  • 17. UNDERSTANDING MOTION Dense OF vs sparse OF. Classical OF approach: image brightness constancy assumption, $\frac{dE}{dt} = \frac{\partial E}{\partial x}\frac{dx}{dt} + \frac{\partial E}{\partial y}\frac{dy}{dt} + \frac{\partial E}{\partial t} = 0$, i.e. $\nabla E^{T}\mathbf{v} + E_t = 0$. Classical LK approach: for each pixel $p_i$ of a 5x5 window, $\nabla E(p_i) \cdot [u\ v]^{T} + E_t(p_i) = 0$, which stacks into $A\,d = b$ with $A$ (25x2) holding the rows $[E_x(p_i)\ E_y(p_i)]$, $d = [u\ v]^{T}$ (2x1) and $b$ (25x1) holding $-E_t(p_i)$; the least-squares solution satisfies $(A^{T}A)\,d = A^{T}b$, i.e. $\begin{bmatrix}\sum E_xE_x & \sum E_xE_y\\ \sum E_xE_y & \sum E_yE_y\end{bmatrix}\begin{bmatrix}u\\ v\end{bmatrix} = -\begin{bmatrix}\sum E_xE_t\\ \sum E_yE_t\end{bmatrix}$. $\lambda_1/\lambda_2$ should not be large ($\lambda_1$ = larger eigenvalue). Iterative Lucas-Kanade algorithm [1981]. Farneback OF [2003]: faster; assumes that a local neighborhood of each pixel of an image can be represented on a polynomial basis…
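As a rough illustration of the Lucas-Kanade step named on this slide, the sketch below solves the 2x2 normal equations for a single window. It is only a sketch under assumptions: the function name `lucas_kanade_window`, the flat-list inputs and the determinant threshold `min_det` are hypothetical choices, not taken from the lecture, and a real tracker would iterate and use image pyramids.

```python
def lucas_kanade_window(Ex, Ey, Et, min_det=1e-9):
    """Estimate one flow vector (u, v) for a single window.

    Ex, Ey, Et are flat lists with the spatial and temporal derivatives at
    each pixel of the window (25 values for a 5x5 window). Returns None when
    the 2x2 system is (near) singular, e.g. on a straight edge (the aperture
    problem). min_det is an assumed threshold, not a value from the slides.
    """
    # Entries of A^T A and A^T b accumulated over the window.
    sxx = sum(ex * ex for ex in Ex)
    sxy = sum(ex * ey for ex, ey in zip(Ex, Ey))
    syy = sum(ey * ey for ey in Ey)
    sxt = sum(ex * et for ex, et in zip(Ex, Et))
    syt = sum(ey * et for ey, et in zip(Ey, Et))

    det = sxx * syy - sxy * sxy
    if abs(det) < min_det:
        return None

    # Cramer's rule on [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T.
    u = (-sxt * syy + sxy * syt) / det
    v = (-syt * sxx + sxy * sxt) / det
    return u, v
```

Feeding it derivatives that satisfy the brightness constancy equation exactly for a known displacement should return that displacement, which is a convenient sanity check before applying it to real image gradients.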
  • 18. COMPUTING MOTION Optical flow… Dense optical flow: • Horn-Schunck [AI 1981] • Farneback [SCIA 2003] • Liu, Chellappa, Rosenfeld [ICPR2002] eigenvalues • Medioni [PAMI 2008] tensor voting • … Sparse optical flow: • Lucas-Kanade [IJCAI 1981] • Other keypoints (SIFT, SURF…) It is not enough for understanding head motion
  • 19. CV CHALLENGES FOR EGOVISION (4/6) d. Recognizing objects • Objects useful for humans [Fathi, CVPR2013] • Objects in the hand [Fathi, Rehg CVPR2011] • Target tagging in the scene [Pirsiavash, Ramanan CVPR 2012] • Objects around you [Yvashita et al ICR2014]
  • 20. CV CHALLENGES FOR EGOVISION (5/6) e. Recognizing actions Self-actions, gestures [Kitani, CVPR 2013; Baraldi, EVW2014] Actions of people, social actions [Ryoo, CVPR 2013]; [Alletto EFPVW 2014], [Narayan CVPRW2014] Actions in the environment (sport..) [Kitani, IEEE PC Magazine 2012]
  • 21. EGO-GESTURE RECOGNITION • Monocular hand gesture recognition • Deals with static (hand pose) and dynamic (motion) gestures • Very few positive samples • Many changes in luminance
  • 22. HAND SEGMENTATION IN EGOVISION Hand recognition is an old problem, with many approaches in different contexts: Skin classification: [Khan et al ICIP2010] Random forest (better than BN, MP, NB, AdaB…) Background subtraction after image registration [Fathi ICCV 2011] (assuming static background, hands with objects etc.) Generic object recognition: [Li, Kitani CVPR 2013] sparse feature selection and a battery of RFs trained with different luminance conditions
  • 23. AN EGOVISION SOLUTION Superpixel segmentation Temporal coherence Classification by Collection of RFs Superpixel descriptors Spatial coherence - SLIC (Simple Linear Iterative Clustering [Achanta 2010]) - K-means in 5D (Lab+xy)
  • 24. AN EGOVISION SOLUTION Superpixel segmentation Temporal coherence Classification by Collection of RFs Superpixel descriptors Spatial coherence - Descriptors: - mean and covariance in RGB - LabH and HSVH - 27 Gabor filters (9 orientations, 3 scales: 7x7, 13x13, 19x19) - HoG
  • 25. AN EGOVISION SOLUTION Superpixel segmentation Temporal coherence Classification by Collection of RFs Superpixel descriptors Spatial coherence Classifier - Collection of Random Forests - Indexed by a 32-bin RGBH - It encodes the appearance of the scene and the global luminance - Hypothesis: background and hands undergo similar color changes Feature vector Scene luminance Global luminance feature
  • 26. AN EGOVISION SOLUTION Superpixel segmentation Temporal coherence Classification by Collection of RFs Superpixel descriptors Spatial coherence Temporal smoothing in a window of k frames Posterior probability of being or not being a hand pixel in a previous window Estimated priors
  • 27. AN EGOVISION SOLUTION Superpixel segmentation Temporal coherence Classification by Collection of RFs Superpixel descriptors Spatial coherence Spatial consistency Eliminate spurious superpixels Close holes Apply GrabCut using the posteriors as seed points
  • 28. HAND SEGMENTATION There is a significant improvement in performance when all three consistency aspects are used together: • illumination invariance (II) • temporal smoothing (TS) • spatial consistency (SC) • on standard datasets
  • 29. CAMERA MOTION REMOVAL Extract dense keypoints (in both frames) Estimate homography Apply to original frame sequence Output: frame sequence without camera motion • In ego-camera views, hand movement is usually not consistent with camera motion, resulting in wrong matches between the two frames. • A segmentation mask disregards feature matches belonging to hands. Dense motion Object detection
  • 30. EGO-GESTURE FEATURE • Dense trajectories, HOG, HOF, MBH [Wang CVPR2013] extracted around hand regions. TD descriptors, HOG descriptors, HOF descriptors and MBH descriptors each feed a BoW; power-normalization and concatenation; SVM 1-vs-1
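The power-normalization and concatenation step on this slide can be sketched as follows. This is a minimal sketch under assumptions: the exponent `alpha = 0.5` is a common choice but is not stated in the slides, and the function names are hypothetical.

```python
import math


def power_normalize(hist, alpha=0.5):
    """Signed power normalization: sign(x) * |x| ** alpha for each bin.

    alpha = 0.5 is an assumed value; the slides do not state the exponent.
    """
    return [math.copysign(abs(x) ** alpha, x) for x in hist]


def gesture_feature(bow_histograms, alpha=0.5):
    """Power-normalize each BoW histogram (e.g. TD, HOG, HOF, MBH) and
    concatenate them into one feature vector for the classifier."""
    return [v for hist in bow_histograms for v in power_normalize(hist, alpha)]
```

The resulting vector would then be passed to the 1-vs-1 SVM; power normalization dampens the few very frequent visual words that would otherwise dominate the histogram.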
  • 31. EXPERIMENTAL RESULTS Datasets: • The Cambridge-Gesture database, with 900 sequences of nine hand gesture types under different illumination conditions; • Our Interactive Museum Dataset, an ego-centric gesture recognition dataset with 700 sequences from seven gesture classes performed by five subjects. • The EDSH dataset, which consists of three egocentric videos with indoor and outdoor scenes and large variations of illumination. See results in [Baraldi CVPRW2014]
  • 32. CV CHALLENGES FOR EGOVISION (6/6) f. Tracking: recognizing over time • Tracking target objects • Tracking faces and people [Alletto, ICPR 2014] • Multiple target tracking
  • 33. 6. TRACKING: THE BIG CHALLENGE
  • 34. TRACKING Single object from a single camera: static camera, moving camera, smartphone, egovision… Multiple objects Single object from multiple cameras: overlapped or disjoint FoVs, network of egovision systems, heterogeneous NoCs Multiple objects from distributed cameras Target: single, multiple FoV: single, multiple See imagelab.ing.unimore.it
  • 35. TRACKING IN EGOVISION Straightforward relationships with robot vision…! Similar to video-surveillance: • fast, real-time • similar scenes (typically people, social life, children..) • many similar tasks (detection, action recognition, tracking) • as in people tracking, motion is unpredictable but • Unconstrained • Large different motion factors • Frequent changes of field of view • Interactive
  • 36. OUR VISION… How can we track objects?
  • 37. HOW DO WE SEE? The path (E. Kendall, 2008): 1) The stimuli from the retinae reach, through two parallel paths, the lateral geniculate nucleus in the thalamus, then the cortex in the occipital lobe and then the temporal and frontal lobes. 2) Two parallel paths: 1) the way of WHAT, in the temporal lobe, perceives the color and shape of the object, the face.. 2) the way of WHERE, in the parietal lobe, provides localization of such objects 3) Hierarchically connected centers process information and then come back to the WHAT area and work together, based on attention and purpose where what
  • 38. SINGLE TARGET TRACKING Analyzing the behavior of current tracking solutions in following a given target frame-by-frame. Tracking is the task of generating an inference about the motion of an object given a sequence of images*. * Arnold Smeulders, Dung M. Chu, Rita Cucchiara, Simone Calderara, Afshin Dehghan, and Mubarak Shah, Visual Tracking: an Experimental Survey, IEEE TPAMI, 2014.
  • 39. SINGLE TARGET TRACKING 1) Region of interest 2) Representation: 2.1) how to observe invariant and variant features in the frame and 2.2) how to hold them in an internal representation 3) Inference method 4) Model update The data or the observation The state (the object model) The inference The (unsolved) questions in tracking a given target
  • 40. THE HARDNESS OF TRACKING Which invariance can be perceived and maintained over time? Tracking is hard as nothing is fixed: • Problems of light: the target aspect, the illumination, • Problems of motion: the object/camera motion, • Problems of scene: the occlusion, the confusion... • … Searching for the invariance in the video. Despite the variety in video, most papers use only 6 to 10 long videos. This covers variety poorly.
  • 41. WHY IS TRACKING SO HARD? 14 challenges in video tracking. Light: 1. light. Object aspect: 2. object surface cover, 3. object specularity, 4. object transparency, 5. object shape. Object motion: 6. motion smoothness, 7. motion coherence. Scene context: 8. scene clutter, 9. scene confusion, 10. scene contrast, 11. scene occlusion. Camera motion: 12. camera moving, 13. camera zoom. Temporal coherence: 14. long videos.
  • 42. LIGHT AND OBJECT ASPECT Light: 1. light. Object aspect: 2. object surface cover, 3. object specularity, 4. object transparency, 5. object shape.
  • 43. MOTION Object motion: 6. motion smoothness, 7. motion coherence. Camera motion: 12. camera moving, 13. camera zoom. Temporal coherence: 14. long videos.
  • 44. SCENE Scene context: 8. scene clutter, 9. scene confusion, 10. scene contrast, 11. scene occlusion.
  • 45. 14 TRACKING CHALLENGES IN 313 VIDEOS 01-LIGHT 02-SURFACECOVER 03-SPECULARITY 04-TRANSPARENCY 05-SHAPE 06-MOTIONSMOOTHNESS 07-MOTIONCOHERENCE 08-CLUTTER 09-CONFUSION 10-LOWCONTRAST 11-OCCLUSION 12-MOVINGCAMERA 13-ZOOMINGCAMERA 14-LONGDURATION
  • 46. DATASET ALOV++ http://imagelab.ing.unimore.it/dsm/ or www.alov300.org
  • 47. TRACKING IN EGO-VISION In egovision? • All the previous problems!! • Relative motion: • No motion of observer but moving target • Motion of observer but fixed target • Motion of both observer and target • (the dataset at Imagelab.unimore.it) • EGO_GROUP • EGO_TRACK
  • 48. GENERAL PURPOSE TRACKING Single target tracking, without any constraints. From Kalman, Ext. Kalman, LKT, Particle Filter, MeanShift, … to new generations of tracking solutions that • explore multiple features and cues (SIFT, HOG, SURF etc.) • explore multiple object representations (fragments, graphs) • explore new optimization methods • explore new machine learning solutions But what about the results?
  • 49. THE STATE OF THE ART 19 tracking solutions for you.. Tracking by matching Tracking with a discriminative classification
  • 50. THE STATE OF THE ART 19 tracking solutions for you.. Tracking by matching • [NCC] Normalized Cross-Correlation, K. Briechle and U. Hanebeck, SPIE 2001 • [KLT] Lucas-Kanade Tracker, S. Baker and I. Matthews, IJCV2004 • [KAT] Kalman Appearance Tracker, H. Nguyen and A. Smeulders, TPAMI 2004 • [FRT] Fragments-based Robust Tracking, A. Adam, E. Rivlin, and I. Shimshoni, CVPR2006 • [MST] Mean Shift Tracking, D. Comaniciu, V. Ramesh, and P. Meer, CVPR2000 • [LOT] Locally Orderless Tracking, S. Oron, A. Bar-Hillel, D. Levi, S. Avidan, CVPR2012
  • 51. THE STATE OF THE ART 19 tracking solutions for you.. Tracking by matching • [IVT] Incremental Visual Tracking, D. Ross, J. Lim, and R.S. Lin, IJCV2008 • [TAG] Tracking on the Affine Group, J. Kwon and F.C. Park, CVPR2009 • [TST] Tracking by Sampling Trackers, J. Kwon, K.M. Lee, ICCV 2011 • [TMC] Tracking by Monte Carlo sampling, J. Kwon, K.M. Lee, CVPR 2009 • [ACT] Adaptive Coupled-layer Tracking, L. Cehovin, M. Kristan, A. Leonardis, ICCV2011 • [L1T] L1-minimization tracker, X. Mei, H. Ling, ICCV2009 • [L1O] L1 minimization with occlusion, X. Mei, H. Ling, Y. Wu, E. Blasch, L. Bai, CVPR2011
  • 52. THE STATE OF THE ART 19 tracking solutions for you.. Tracking with a discriminative classification • [FBT] Foreground-background tracking, Nguyen, Smeulders, IJCV 2006 • [HBT] Hough-based tracking, Godec, Roth, Bischof, ICCV2011 • [SPT] Super pixel tracking, Wang, Lu, Yang, Yang, ICCV 2011 • [MIT] Multiple instance learning, Babenko, Yang, Belongie, CVPR2009 • [TLD] Tracking learning detection, Kalal, Matas, Mikolajczyk, CVPR 2010 • [STRUC] Structured output tracking, Hare, Saffari, Torr, ICCV 2011
  • 53. ONE PROBLEM, MANY SOLUTIONS… RoI Model update Visual features Appearance Motion predictions Inference Data
  • 54. REGION OF INTEREST 1. From manual or automatic detectors 2. From moving object segmentation (background suppression, OF segmentation) 3. From local feature identification
  • 55. APPEARANCE b) Histograms: color histograms (MST, TMC, HBT, SPT), intensity histograms (FRT, ACT). Useful only in small patches, otherwise the spatial relationship information has to be captured elsewhere in the tracking algorithm. c) Feature vectors: useful if the shape of the object is important (and constant, at least in some parts) - Haar gradients (MIT) - 2D bin patterns (TLD) - SURF (FBT) - Lab-color features and others (HBT)… - Be careful in selecting true and stable invariants
  • 56. APPEARANCE Some trackers keep the appearance information of the scene, i.e. add some contextual information: • Background intensity representation (the methods based on background subtraction have a reference background) in surveillance ([Wren et al TPAMI97], [MOG CVPR’99], [Sakbot TPAMI ‘03]) • Occlusion detection information [AD Hoc ‘PRL ’11, ALIEN ‘13] • Confusion information [Medioni CVPR ‘11] • Motion information of cameras [Qiogui, IASP11] They cannot always be used in general unknown visual contexts
  • 57. MOTION MODEL 1. Uniform search (no motion model), e.g. STR, FBT 2. Probabilistic Gaussian motion model, e.g. IVT, L1T 3. Motion prediction, e.g. KALMAN, ACT (a linear motion model, sometimes guided by OF) 4. Implicit motion model (with optimization), e.g. KLT, MST 5. Multiple models: tracking and detection (TLD), particle filters, SPT, 3D affine or projective motion models (TAG)
  • 58. MOTION Some considerations: - Uniform search is simple, but better if guided at least by optical flow in egovision - Implicit with optimization: only if the motion is small and the appearance is fairly constant; not always possible - Motion prediction works very well in specific applications; for instance, in intelligent transportation systems a linear motion prediction is not questionable.
  • 59. MODEL UPDATE The model updating: • No update: NCC, FRT, MST, KLT do not update but search for the best transform of the model. It could be considered trivial, but it is good for short sequences • Last seen template, or partial updating (Porikli CVPR ‘06) • Predicting the new appearance: KAT, for long-term occlusion, but for abrupt changes it can produce errors • Patch update: TMC and ACT (add or delete patches) • Updating an extended model, e.g. incremental PCA (IVT, TAG, TST)
  • 60. INFERENCE METHODS The method: the computational paradigm to find the best location and/or the best state of the target in the new frame. a) Matching b) Matching extended target c) Matching with constraint d) Discriminative e) Discriminative with constraint. T: target model, i.e. status C: candidate objects, i.e. DATA
  • 61. INFERENCE METHODS a) Matching: - direct gradient ascent (NCC, KLT), or probabilistic matching (MST) - matching with particle filtering (IVT, TST, TAG). Useful if the appearance of target and background are different, to avoid local maxima; useful in case of good appearance invariance (problems with intensity or surface luminance changes); good for occlusions and low contrast. b) Matching with extended appearance: - subspace matching in extended models, with some examples (IVT), or with different track results (TAG, TST). It is similar to having a long-term memory; useful for long-term tracking or with occlusions, but more complex. c) Matching with constraints: - adding some rules for the context (TAG), for the positions of patches (L1T, L1O), or their pose (ACT)
  • 62. INFERENCE METHODS d) Discriminative - Discriminative supervised classifier: - FBT: Linear Discriminant Analysis; - HBT: segmentation and random forests; - MIT and SPT: clustering; - TLD: a pool of randomized classifiers - Often very few examples are available (thus LDA could be better than multiple instance learning); in case of errors there are problems of drifting away. e) Discriminative with constraints • Structured classifier: STR, a structured classifier that uses as output the displacement of the target instead of a label per pixel.
  • 63. [NCC] NORMALIZED CROSS-CORRELATION Direct target matching by normalized cross-correlation. [Briechle et al SPIE 2001] • Intensity values in the initial target box as template; • Matching by sampling uniformly around the previous position; • Take the highest score with NCC at pixel level; • No updating of the target. $N(m,n) = \frac{\sum_{i}\sum_{j} g(i+m,\, j+n)\, t(i,j)}{\sqrt{\sum_{i}\sum_{j} g(i+m,\, j+n)^{2}\ \sum_{i}\sum_{j} t(i,j)^{2}}}$
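To make the NCC matching step concrete, here is a small pure-Python sketch. It assumes the usual normalized cross-correlation score (correlation divided by the product of the two patch energies' square roots) and, for simplicity, searches every position in the image rather than sampling around the previous target position as the tracker does. The names `ncc_score` and `best_match` are hypothetical.

```python
import math


def ncc_score(image, template, m, n):
    """Normalized cross-correlation between the template t and the image
    patch g whose top-left corner is at row m, column n."""
    num = g2 = t2 = 0.0
    for i, row in enumerate(template):
        for j, t in enumerate(row):
            g = image[m + i][n + j]
            num += g * t
            g2 += g * g
            t2 += t * t
    denom = math.sqrt(g2 * t2)
    return num / denom if denom > 0 else 0.0


def best_match(image, template):
    """Return (score, m, n) for the highest-scoring placement.

    Simplification: exhaustive search over the whole image, whereas the
    tracker on the slide samples around the previous position.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    best = None
    for m in range(ih - th + 1):
        for n in range(iw - tw + 1):
            score = ncc_score(image, template, m, n)
            if best is None or score > best[0]:
                best = (score, m, n)
    return best
```

Because the template is never updated, a tracker built this way cannot drift, but it also cannot follow appearance changes of the target.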
  • 64. [FRT] FRAGMENTS-BASED ROBUST TRACKING Matching the ensemble of 10 x 2 patches. [Adam et al CVPR2006] • ROI divided into patches; • New window search around the previous one, including 10% scale change; • Each patch intensity histogram compared by Earth Mover's Distance; • The target is not updated. Robust to changes of pose and to occlusions, and very simple. Suitable for shape modification
  • 65. [HBT] HOUGH-BASED TRACKING Discriminative classifier on Lab-color, gradients and positions. The Hough Forest provides a probability map of the target. • Give the sample, represented by features; • Transform the target with a Hough Forest; • Back projection from a Hough Forest (as GHT’s R-table); • Segment the target using GrabCut and hence generate new samples. HT backprojection [Godec, Roth, Bischof, ICCV, 2011]
  • 66. [TLD] TRACKING, LEARNING AND DETECTION Top detections on LBPs and LKT optical flow are combined by NCC. Fast. • Samples are selected in, around and away from the target to update (labeled and unlabeled). • If neither of the two trackers gives an output, TLD declares loss and recovers. • Learns which is the best detector; good for short-term occlusion. [Kalal, Matas, Mikolajczyk, CVPR, 2010] OF LBP
  • 67. [STR] STRUCTURED OUTPUT TRACKING Structured supervised classifier by {appearance, translation}. • The window is described by Haar features with 2 scales. • Sampling uniformly around the previous position. • The S-SVM learner updates the constraint to stay at the current location; the locations that violate the support points are used for the new SVM. Transformation prediction patches [Hare, Saffari, Torr, ICCV, 2011]
  • 68. EXAMPLE OF STRUCT
  • 69. IS TRACKING GOOD ENOUGH? Measuring results is hard
  • 70. TRACKING MEASURES Let's call $GT_i$ the ground truth in frame $i$ and $T_i$ the detected target in frame $i$. Match degree at pixel level: $MD = \frac{|T_i \cap GT_i|}{|T_i \cup GT_i|}$. Match at pixel level if $\frac{|T_i \cap GT_i|}{|T_i \cup GT_i|} \ge Th$, with $Th = 0.5$: PASCAL measure [4]. $\frac{|T_i \cap GT_i|}{|T_i \cup GT_i|}$ without threshold is called the DICE measure.
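The overlap ratio and the thresholded PASCAL criterion can be sketched for axis-aligned bounding boxes as below. The `(x, y, width, height)` box convention and the function names are assumptions made for illustration; the slide defines the measure on pixel sets, of which boxes are a special case.

```python
def overlap_ratio(t, gt):
    """|T ∩ GT| / |T ∪ GT| for two boxes given as (x, y, width, height)."""
    tx, ty, tw, th = t
    gx, gy, gw, gh = gt
    # Width and height of the intersection rectangle (0 if disjoint).
    iw = max(0.0, min(tx + tw, gx + gw) - max(tx, gx))
    ih = max(0.0, min(ty + th, gy + gh) - max(ty, gy))
    inter = iw * ih
    union = tw * th + gw * gh - inter
    return inter / union if union > 0 else 0.0


def pascal_match(t, gt, threshold=0.5):
    """True when the overlap ratio reaches the threshold (Th = 0.5)."""
    return overlap_ratio(t, gt) >= threshold
```

For example, two 10x10 boxes shifted by half their width overlap by one third, so they do not count as a match at the 0.5 threshold.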
  • 71. TRACKING MEASURES At object level, in a sequence of frames (for $i = 1 \dots N_{frame}$): $n_{tp} = \sum_{i=1}^{N_{frame}} n^{i}_{tp}$, $n_{fp} = \sum_{i=1}^{N_{frame}} n^{i}_{fp}$, $n_{fn} = \sum_{i=1}^{N_{frame}} n^{i}_{fn}$. $Precision = \frac{n_{tp}}{n_{tp} + n_{fp}}$, $Recall = \frac{n_{tp}}{n_{tp} + n_{fn}}$. F-SCORE: $F = 2\,\frac{Precision \cdot Recall}{Precision + Recall}$ (also called Correct track ratio). At area/pixel level: $p_i = \frac{|T_i \cap GT_i|}{|T_i|}$ and $r_i = \frac{|T_i \cap GT_i|}{|GT_i|}$. F1-SCORE: $F1 = \frac{1}{N_{frame}} \sum_{i=1}^{N_{frame}} 2\,\frac{p_i \cdot r_i}{p_i + r_i}$
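The two F-scores on this slide differ in where the averaging happens: the object-level score sums the counts over the sequence first, while the area-level F1 averages a per-frame score. A minimal sketch, with hypothetical function names and input layouts:

```python
def f_score(n_tp, n_fp, n_fn):
    """Object-level F-score from summed true/false positives and negatives."""
    precision = n_tp / (n_tp + n_fp) if (n_tp + n_fp) else 0.0
    recall = n_tp / (n_tp + n_fn) if (n_tp + n_fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sequence_f_score(per_frame_counts):
    """per_frame_counts: list of (tp, fp, fn) tuples, one per frame.
    Counts are summed over the sequence before computing the F-score."""
    n_tp = sum(c[0] for c in per_frame_counts)
    n_fp = sum(c[1] for c in per_frame_counts)
    n_fn = sum(c[2] for c in per_frame_counts)
    return f_score(n_tp, n_fp, n_fn)


def area_f1(per_frame_areas):
    """per_frame_areas: list of (intersection, area_T, area_GT) per frame.
    Returns the mean over frames of 2 * p_i * r_i / (p_i + r_i)."""
    total = 0.0
    for inter, area_t, area_gt in per_frame_areas:
        p = inter / area_t if area_t else 0.0
        r = inter / area_gt if area_gt else 0.0
        total += 2 * p * r / (p + r) if (p + r) else 0.0
    return total / len(per_frame_areas)
```

In the object-level variant a frame would typically contribute a true positive when the PASCAL overlap criterion is met; that coupling is an assumption here and is left to the caller.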
  • 72. TRACKING MEASURES For measuring the position deviation instead: $Deviation = 1 - \frac{\sum_{i \in M} d(C_{T_i} - C_{GT_i})}{|M|}$, where $d(x,y)$ is the L2-norm distance of the centroids. PBM, Position Based Matching: $PBM = \frac{1}{N_{frames}} \sum_{i} \left(1 - \frac{d_1(T_i, GT_i)}{sp\_ave(i)}\right)$, where $d_1(x,y)$ is the L1 norm and $sp\_ave$ is the average semi-perimeter between GT and T: $sp\_ave = (H(T) + W(T) + H(GT) + W(GT))/2$
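A possible reading of the PBM formula is sketched below. It assumes that `d1` is the L1 distance between the two box centroids and that boxes are given as `(x, y, width, height)`; both are assumptions for illustration, as are the function names.

```python
def centroid(box):
    """Center of a box given as (x, y, width, height)."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def pbm(tracked, ground_truth):
    """Position Based Matching averaged over the frames.

    Assumption: d1 is the L1 distance between the centroids of T_i and GT_i.
    sp_ave follows the slide: (H(T) + W(T) + H(GT) + W(GT)) / 2.
    """
    total = 0.0
    for t, gt in zip(tracked, ground_truth):
        (cx, cy), (gx, gy) = centroid(t), centroid(gt)
        d1 = abs(cx - gx) + abs(cy - gy)
        sp_ave = (t[3] + t[2] + gt[3] + gt[2]) / 2.0
        total += 1.0 - d1 / sp_ave
    return total / len(tracked)
```

With this reading a perfectly centered track scores 1 per frame, and the score decreases linearly with the centroid offset relative to the box size.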
  • 73. EXPERIMENTAL RESULTS ON ALOV++ A B C D E [NCC] [STR] [L1O] [TST] [TLD] [FBT] The upper bound, taking the best of all trackers at each frame 10% The lower bound, what all trackers can do 7% About the 30%, correctly tracked only Survival curves by Kaplan-Meier
  • 74. COMPARISON ON MATCHING MODELS For Egovision Comparison on video with motion issues FRT KAT LKT LOT MST NCC FRT using fragments copes well with motion changes NCC is insensitive to motion changes
  • 75. COMPARISON ON DISCRIMINATIVE M.
  • 76. CONFUSION CHALLENGE: [FBT][NCC][STR] [TLD][TST] [L1O] CONFUSION.. CROWD short term tracking
  • 77. OCCLUSION Conclusions: 1. STR, FBT, TST, TLD and L1T are best here (!). 2. Light occlusion is approximately solved. 3. Full occlusion is still hard for most.
  • 78. LONG TERM CHALLENGE: [FBT][NCC][STR] [TLD][TST] [L1O]
  • 79. F-SCORES See alov++
  • 80. IS TRACKING GOOD IN EGOVISION? Problems in egovision: • Moving head and moving target • Changes of FoV • Changes of luminance Dataset: V1. (semi-)still observer V2. Moving head, not coherent motion (abrupt changes in motion patterns) V3. Camera observer movement, with and without abrupt camera motions Trackers: Matching-based trackers • NCC, NN (with color histograms), FRT Discriminative-classifier-based trackers • HBT, TLD, STR
  • 81. TRACKING IN EGOVISION: EVALUATION V1: (semi-)still camera Challenges: • Changes in object shape (e.g. change in head pose) • Occlusions between objects Pros: • No camera motion: low blur, target losses can be due only to occlusions • Adaptive models can adapt to changes in object shape Cons: • Occlusions are likely to occur, loss detection is needed • Adaptive models must detect the loss or they adapt to the occluding object
  • 82. EGOVISION FROM A STILL PERSON
  • 83. TRACKING IN EGOVISION: EVALUATION Results in the first scenario. DICE measure: the overlap degree between the ground truth and the predicted bounding box. V1.1: video without occlusions; the only challenge is the subject’s pose changes. V1.2: recurring occlusions; adaptive models (STR, TLD, HBT) fail to discriminate between the original target and the occluding one and adapt to it, resulting in tracking failure.
  • 84. TRACKING IN EGOVISION: EVALUATION V2: moving camera, still person Challenges: • Changes in object shape (e.g. change in head pose) • Target exits the camera FoV • Occlusions between objects Pros: • Person stands still, abrupt lighting changes are not likely Cons: • Occlusions are likely to occur, loss detection is needed • Target can exit the camera FoV, loss detection and re-identification are needed • Adaptive models without loss detection quickly adapt to the background after a loss
  • 85. EGOVISION FROM A MOVING HEAD
  • 86. TRACKING IN EGOVISION: EVALUATION Tracking results in the second scenario. V2.1: video with people chatting. HBT performs poorly due to its lack of loss detection and recovery. STR cannot detect the loss either and adapts its support vectors to the background. V2.2: tracking of an environmental point of interest. The target stays still but gets occluded and exits the camera FoV. Color-based trackers (HBT, NN) perform poorly due to the difficulty in discriminating the object based on color. V2.3: tracking a face under fast occurring occlusions. Responsive loss detection (TLD) is needed in order to stop adapting the model in time. Scale changes compromise model matching (FRT, NCC).
  • 87. TRACKING IN EGOVISION: EVALUATION V3: moving camera, moving person Challenges: • Changes in object shape (e.g. change in head pose) • Target exits the camera FoV • Occlusions between objects • Abrupt changes in lighting • Occasional low image resolution due to motion blur The most challenging tracking scenario. Considerations: • Lack of loss detection results in tracking failure after very few frames • Adaptive models often cannot cope with the challenges of this scenario and adapt to the background to some degree, resulting in the tracker quickly drifting • Adaptability to scale changes is needed due to the person moving closer to objects of interest
  • 88. EGOVISION FROM A MOVING PERSON
  • 89. TRACKING IN EGOVISION: EVALUATION Tracking results in the third scenario. V3.1: face tracking under person motion. Discriminative colors between object and background allow good performance for HBT. NCC performs well due to the lack of object changes. V3.2: face tracking under both person and camera motion. Adaptive models end up adapting to the background. NCC does not adapt and hence does not drift. V3.3: face tracking with indoor-outdoor transition. The abrupt lighting change during the transition compromises most trackers. Adaptive models (TLD, STR, HBT) try to adapt to the transition, resulting in the inability to adapt back to the object when the lighting stabilizes.
  • 90. TRACKING IN EGOVISION: EVALUATION Tracking results: the table shows the F-measure for each video and each tracker.
    Scenario: V1.x = still camera, still person; V2.x = moving camera, still person; V3.x = moving camera, moving person.
    Tracker  V1.1    V1.2    V2.1    V2.2    V2.3    V3.1    V3.2    V3.3
    NN       0.5204  0.2793  0.2314  0.0472  0.1211  0.2552  0.0867  0.1565
    HBT      0.5187  0.1177  0.0206  0.1602  0.0333  0.5786  0.1457  0.0973
    TLD      0.4838  0.1767  0.5091  0.6372  0.4342  0.2446  0.0237  0.1303
    STR      0.6406  0.2397  0.0698  0.5745  0.0801  0.5532  0.0294  0.0879
    NCC      0.4326  0.2251  0.4575  0.3769  0.0147  0.3607  0.1834  0.1118
    FRT      0.2271  0.2138  0.1406  0.0294  0.0389  0.0984  0.1492  0.0756
    A lot of work to do…
  • 91. THE SUPPORT POINTS!!!
  • 92. IN SIMPLE CASES.. Problem: tracking people for detecting social groups [Alletto CVPRW2014]
      - We use Viola-Jones (VJ) for initial detection
      - HBT and TLD for tracking faces in real time + re-identification
      - Classification for orientation
      - Correlation clustering for detecting social groups
      * MIUR Cluster project in Smart cities «educating city»: recognizing children's social activity, 2014-2017
  • 93. SOCIAL FEATURES
      [Figure: head yaw axis. Interval ticks: -90°, -60°, -30°, 0°, 30°, 60°, 90°. Classes: -75, -45, 0, 45, 75.]
      HEAD DETECTION from egovision and POSE ESTIMATION: determine the head yaw angle. The HOG descriptor is computed using 8x8 cells, 16 bins per cell. Power normalization is then applied. A multiclass linear SVM and an HMM are used to discriminate the different pose classes.
      DISTANCE ESTIMATION AND 3D RECONSTRUCTION: no camera calibration involved; random regression forests. A grid model is applied to estimate the person's 3D location, accounting for projective deformation.
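Two steps of the head pose feature above can be sketched in a few lines: quantizing a yaw angle into the pose classes, and power-normalizing a HOG histogram. Several things here are assumptions, not taken from the slides: the mapping of intervals to classes (in particular that class 0 covers -30° to 30°), the exponent `alpha=0.5`, the final L2 normalization, and the function names `yaw_class` and `power_normalize`.

```python
import math

# (lower bound, upper bound, class label) in degrees.
# Assumed reading of the figure: five classes over [-90, 90].
YAW_CLASSES = [
    (-90, -60, -75),
    (-60, -30, -45),
    (-30, 30, 0),
    (30, 60, 45),
    (60, 90, 75),
]

def yaw_class(yaw_deg):
    """Map a yaw angle in [-90, 90] degrees to its pose class label."""
    if not -90 <= yaw_deg <= 90:
        raise ValueError("yaw must be in [-90, 90] degrees")
    for lo, hi, label in YAW_CLASSES:
        if lo <= yaw_deg < hi:
            return label
    return 75  # yaw_deg == 90 falls in the last interval

def power_normalize(hist, alpha=0.5):
    """Signed power normalization followed by L2 normalization.

    alpha = 0.5 is an assumed value; the slides do not state it.
    """
    powered = [math.copysign(abs(v) ** alpha, v) for v in hist]
    norm = math.sqrt(sum(v * v for v in powered))
    return [v / norm for v in powered] if norm > 0 else powered
```

Power normalization compresses large histogram bins, so a few dominant gradient orientations do not swamp the linear SVM's input.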
  • 94. METHOD OVERVIEW
      [Diagram. Blocks: Video stream (frames t-1, t, t+1); Multiple face detection; Tracking and segmentation; Face area estimation; HOG; Head pose estimation (SVM + HMM); Distance (Random Forest + 3D estimation); 3D bird view model; Correlation clustering (SSVM); Groups composition estimation.]
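The final grouping step can be illustrated with a toy correlation clustering. This sketch swaps in a simple greedy pivot heuristic with hand-written pairwise affinities; it is not the SSVM-based formulation named in the overview, and the function name `pivot_correlation_clustering`, the affinity values and the deterministic pivot order are all assumptions.

```python
def pivot_correlation_clustering(nodes, affinity):
    """Greedy pivot heuristic for correlation clustering.

    affinity maps an unordered pair (a, b) to a score:
    > 0 means "same group", < 0 means "different groups".
    The pivot is always the first remaining node, so the result is
    deterministic (the usual variant picks the pivot at random).
    """
    def aff(a, b):
        return affinity.get((a, b), affinity.get((b, a), 0.0))

    remaining = list(nodes)
    clusters = []
    while remaining:
        pivot = remaining[0]
        # Group the pivot with every remaining node it is attracted to.
        cluster = [pivot] + [n for n in remaining[1:] if aff(pivot, n) > 0]
        clusters.append(cluster)
        remaining = [n for n in remaining if n not in cluster]
    return clusters

if __name__ == "__main__":
    people = ["A", "B", "C", "D"]
    # Hypothetical affinities, e.g. derived from distance and head orientation.
    scores = {("A", "B"): 0.8, ("A", "C"): -0.5, ("A", "D"): -0.7,
              ("B", "C"): -0.4, ("B", "D"): -0.6, ("C", "D"): 0.9}
    print(pivot_correlation_clustering(people, scores))
```

The number of groups is not fixed in advance: it follows from the signs of the pairwise scores, which is what makes correlation clustering a natural fit for social groups of unknown size.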
  • 95. EXPERIMENTAL RESULTS
  • 96. LAST: EXECUTION TIME. An example comparison on the same video: HBT, STRUCK, TLD and NN (nearest neighbour with histograms).
  • 97. CONCLUSIONS AND OPEN PROBLEMS
      • Single-target detection & tracking: two big problems in all video sources
      • In egovision the problem is open.
      And then:
      • Recognizing actions & behaviors…
      • Understanding what people are seeing
      • Maybe it's not utopia anymore.
  • 98. HOMEWORK
      - Work with the dataset ALOV++ www.alov.org
      - Compare matching-based and discriminative-based tracking in different scenarios
      - Try to understand the reasons for failures
      - Try to understand which are the weak points for egovision
      - Answer the questions:
        1) Why are HOG, HOF and MBH used (as well as trajectory shapes) for ego-gesture recognition with hand segmentation by superpixels?
        2) What is a comprehensive definition of egocentric vision?
        3) Single-target tracking approaches can be divided into two broad categories. Which ones?
        4) Why are tracking algorithms based on a single or multiple memory of the target more suitable in ego-vision?
  • 99. ADDITIONAL REFERENCES
      • T.-K. Kim and R. Cipolla. Canonical correlation analysis of video volume tensors for action categorization and detection. Trans. PAMI, 2009.
      • Y. M. Lui, J. R. Beveridge, and M. Kirby. Action classification on product manifolds. In Proc. of CVPR, 2010.
      • Y. M. Lui and J. R. Beveridge. Tangent bundle for human action recognition. In Proc. of Automatic Face & Gesture Recognition and Workshops, 2011.
      • A. Sanin, C. Sanderson, M. T. Harandi, and B. C. Lovell. Spatio-temporal covariance descriptors for action and gesture recognition. In Proc. of Workshop on Applications of Computer Vision, 2013.
      • L. Baraldi, F. Pace, G. Serra, L. Benini, and R. Cucchiara. Gesture recognition in ego-centric videos using dense trajectories and hand segmentation. EVW @CVPR2014.
      • R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. SLIC superpixels. Technical report, EPFL, 2010.
      • H. Wang, A. Klaser, C. Schmid, and C.-L. Liu. Action Recognition by Dense Trajectories. In Proc. of CVPR, 2011 and IJCV 2013.
      • G. Gualdi, A. Prati, and R. Cucchiara. Multi-Stage Particle Windows for Fast and Accurate Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, n. 8, pp. 1589-1604, 2012.
      • A. Adam, E. Rivlin, and I. Shimshoni. Robust fragments-based tracking using the integral histogram. In CVPR, 2006.
      • M. Godec, P. M. Roth, and H. Bischof. Hough-based tracking of non-rigid objects. In ICCV, 2011.
      • Z. Kalal, J. Matas, and K. Mikolajczyk. Online learning of robust object detectors during unstable tracking. CVPR 2009.
      • S. Hare, A. Saffari, and P. H. S. Torr. Struck: Structured output tracking with kernels. In ICCV, 2011.
  • 100. THANKS TO http://imagelab.ing.unimo.it
      Imagelab PEOPLE: Rita Cucchiara, Giuseppe Serra, Marco Manfredi, Costantino Grana, Paolo Santinelli, Francesco Solera, Roberto Vezzani, Martino Lombardi, Simone Pistocchi, Simone Calderara, Michele Fornaciari, Fabio Battilani, Dalia Coppi, Patrizia Varini, Augusto Pieracci, Stefano Alletto