Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification

Juan Carlos Niebles, Chih-Wei Chen, Li Fei-Fei
Computer Science Dept.
Stanford University
Recognizing Human Activities
• Motion analysis: judge sports automatically, biomechanics, video game interfaces
• Interactions with objects; temporal structure & causality: provide cooking assistance, …
• Detect unusual behavior: smart surveillance, psychology studies
Activity landscape (temporal scale in seconds)
  Snapshot          10^-1        e.g. Catch
  Atomic action     10^0         e.g. Run
  Activities        10^1         e.g. High Jump
  Events            10^3         e.g. Football
  Long-term event   10^7-10^8    e.g. Construction of a building
Activity landscape: related work by temporal scale
• Snapshot: Thurau & Hlavac, 2008; Gupta et al, 2009; Ikizler & Duygulu, 2009; Ikizler-Cinbis et al, 2009; Yao & Fei-Fei, 2010a,b; Yang, Wang and Mori, 2010
• Atomic action: Bobick & Davis, 2001; Efros et al, 2003; Schuldt et al, 2004; Alper & Shah, 2005; Dollar et al, 2005; Blank et al, 2005; Niebles et al, 2006; Laptev et al, 2008; Wang & Mori, 2008; Rodriguez et al, 2008; Wang & Mori, 2009; Gupta et al, 2009; Liu et al, 2009; Marszalek et al, 2009
• Activities: Ramanan & Forsyth, 2003; Laxton et al, 2007; Ikizler & Forsyth, 2008; Gupta et al, 2009; Choi & Savarese, 2009
• Events: Sridhar et al, 2010; Kuettel, 2010
Activity landscape: activities (temporal scale around 10^1 seconds)
• Composition of simple motions
• Non-periodic
• Longer duration than atomic actions
Activity landscape: related datasets
• Snapshot: Actions in still images [Ikizler 2009]; PPMI [Yao & Fei-Fei 2010]; UIUC Sports [Li & Fei-Fei 2007]
• Atomic action: KTH [Schuldt et al 2004]; Hollywood [Laptev et al 2008]; UCF Sports [Rodriguez et al 2008]; Ballet [Yang et al 2009]
• Activities: new Olympic Sports Dataset
Activity landscape: possible approaches for activities
• Pose-based recognition: computationally intensive (Ferrari et al 2008; Ramanan & Forsyth 2003; Nazli & Forsyth 2008; […])
• HMM, CRF, and bag of features: work for simple action recognition, but fail when actions are complex (Laptev et al 2008; Niebles et al 2006; Liu et al 2009; Sminchisescu 2006; Blank et al 2005; Efros et al 2003; […])
Our proposal: decompose activities into simpler motion segments
    1. Simple motions are easier to describe computationally
    2. Can leverage temporal context
    3. The human visual system seems to rely on decomposition for
       understanding [Zacks et al, Nature Neuro 2001; Tversky et al, JEP, 2006]
Outline
• Discriminative model for activities
  – Representation
  – Recognition
  – Learning
• Experiments
• Conclusions
A model for complex activities
Model properties:
• Use a standard time range: [0, 1]
• The model is formed by a few simple motions (motion segments)
• Each motion segment describes local motion appearance
• Encode temporal order: each motion segment has an anchor location on the time range
• Temporal flexibility: each segment has a temporal location uncertainty around its anchor
• Multiple temporal scales: segments can be shorter or longer
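The model properties above can be captured in a small data structure. This is a minimal sketch, not the authors' code: the class names `MotionSegment` and `ActivityModel`, the `length` field used to express the temporal scale, and the `(d, d^2)` temporal weights are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MotionSegment:
    """One motion segment of a hypothetical activity model (illustrative only)."""
    anchor: float  # anchor location on the normalized time range [0, 1]
    length: float  # temporal extent as a fraction of the video (its scale)
    # weights on a video-word histogram (local motion appearance)
    appearance_w: List[float] = field(default_factory=list)
    # weights on (d, d^2), d = distance to the anchor (temporal flexibility)
    temporal_w: List[float] = field(default_factory=lambda: [0.0, 1.0])

    def __post_init__(self) -> None:
        if not 0.0 <= self.anchor <= 1.0:
            raise ValueError("anchor must lie in [0, 1]")
        if not 0.0 < self.length <= 1.0:
            raise ValueError("length must lie in (0, 1]")


@dataclass
class ActivityModel:
    segments: List[MotionSegment]

    def scales(self) -> List[float]:
        """Distinct segment lengths, i.e. the temporal scales the model uses."""
        return sorted({s.length for s in self.segments})


# A toy model: two shorter segments and one longer one, in temporal order.
model = ActivityModel([
    MotionSegment(anchor=0.2, length=0.25),
    MotionSegment(anchor=0.7, length=0.25),
    MotionSegment(anchor=0.5, length=0.5),
])
print(model.scales())  # [0.25, 0.5]
```

Keeping the anchor and scale separate from the appearance weights mirrors the split between "where a segment is expected" and "what it looks like".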
Recognition
The query video and the activity model are both placed on the normalized time range [0, 1].
Match Motion Segment 1:
• Consider a candidate location h_1 in the query video
• Compute a matching score for this segment at h_1
Appearance of the candidate segment:
• Spatio-temporal interest points with HOG/HOF descriptors [Laptev et al, 2005]
• Descriptors are vector-quantized into a codebook of 1000 spatio-temporal words (video words)
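Vector quantization and the per-segment histogram can be sketched as follows. This is a toy sketch under stated assumptions: descriptors are 2-D instead of HOG/HOF vectors, the codebook has 3 words instead of 1000, each interest point is reduced to a normalized time plus a descriptor, and the functions `nearest_word` and `word_histogram` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def nearest_word(descriptor: Sequence[float],
                 codebook: Sequence[Sequence[float]]) -> int:
    """Index of the closest codeword under squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2
                                 for d, c in zip(descriptor, codebook[k])))


def word_histogram(points: Sequence[Tuple[float, Sequence[float]]],
                   codebook: Sequence[Sequence[float]],
                   start: float, end: float) -> List[float]:
    """L1-normalized histogram of video words for points with time in [start, end)."""
    hist = [0.0] * len(codebook)
    for t, desc in points:
        if start <= t < end:
            hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


# Toy data: (normalized time, descriptor) pairs and a 3-word codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
points = [(0.1, [0.1, 0.0]), (0.15, [0.9, 1.2]),
          (0.2, [1.1, 0.8]), (0.8, [5.0, 4.9])]
segment_hist = word_histogram(points, codebook, 0.0, 0.3)
print(segment_hist)  # one point maps to word 0, two to word 1
```

An empty window returns an all-zero histogram rather than dividing by zero.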
• Appearance feature: histogram of the video words in the candidate segment
• Appearance similarity score: chi-square kernel SVM
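A chi-square kernel SVM scores a histogram by comparing it to support vectors. The sketch below assumes the common exponential form k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i)); the exact kernel variant, `gamma`, the support vectors and the dual coefficients are all illustrative assumptions, not values from the slides.

```python
import math
from typing import Sequence


def chi2_distance(x: Sequence[float], y: Sequence[float],
                  eps: float = 1e-10) -> float:
    """Chi-square distance between two histograms (eps avoids 0/0)."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(x, y))


def chi2_kernel(x: Sequence[float], y: Sequence[float],
                gamma: float = 1.0) -> float:
    """Exponential chi-square kernel; equals 1.0 for identical histograms."""
    return math.exp(-gamma * chi2_distance(x, y))


def svm_score(x: Sequence[float],
              support_vectors: Sequence[Sequence[float]],
              dual_coefs: Sequence[float],
              bias: float, gamma: float = 1.0) -> float:
    """Kernel SVM decision value: sum_i c_i * k(s_i, x) + bias, with c_i = alpha_i * y_i."""
    return sum(c * chi2_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained SVM: one positive and one negative support vector.
support_vectors = [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]]
dual_coefs = [1.0, -1.0]
bias = 0.0

x = [1 / 3, 2 / 3, 0.0]  # a segment histogram close to the positive support vector
print(svm_score(x, support_vectors, dual_coefs, bias))  # positive
```

In practice the support vectors and coefficients come from training; here they are fixed by hand only to show how the appearance score is computed.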
Temporal location of the candidate segment:
• Temporal location feature: the distance between h_1 and the segment's anchor location
• Temporal location disagreement score: a 2nd-order polynomial of this distance
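A 2nd-order polynomial of the distance d = h - anchor can be written as w_lin * d + w_quad * d^2. This sketch assumes that form and treats the result as a penalty subtracted from the appearance score; the names `temporal_penalty`, `w_lin` and `w_quad` are hypothetical. A larger `w_quad` penalizes displacement more steeply, which corresponds to a smaller temporal location uncertainty.

```python
def temporal_penalty(h: float, anchor: float,
                     w_lin: float, w_quad: float) -> float:
    """Disagreement between a candidate location h and the segment anchor,
    modeled as a 2nd-order polynomial of d = h - anchor."""
    d = h - anchor
    return w_lin * d + w_quad * d * d


# No penalty at the anchor itself.
print(temporal_penalty(0.5, 0.5, 0.3, 4.0))  # 0.0
# With w_lin = 0 the penalty is symmetric around the anchor.
print(temporal_penalty(0.6, 0.5, 0.0, 4.0), temporal_penalty(0.4, 0.5, 0.0, 4.0))
```

The linear term lets a learned model prefer one side of the anchor, for example when a motion tends to occur slightly later than its average position.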
The same matching is repeated for every motion segment in the model.
• Matching score for all segments: combines the per-segment matching scores into a single score for the query video
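The full matching step can be sketched as a search over candidate locations for each segment. This assumes that each segment picks its best location independently, that its score is appearance score minus temporal penalty, and that the activity score is the sum over segments; the appearance functions below are toy stand-ins for the chi-square SVM, and all names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# (appearance score as a function of location, anchor, w_lin, w_quad)
Segment = Tuple[Callable[[float], float], float, float, float]


def match_segment(candidates: Sequence[float],
                  appearance_score: Callable[[float], float],
                  anchor: float, w_lin: float,
                  w_quad: float) -> Tuple[float, float]:
    """Best (score, location) for one segment over the candidate locations."""
    best = (float("-inf"), candidates[0])
    for h in candidates:
        d = h - anchor
        score = appearance_score(h) - (w_lin * d + w_quad * d * d)
        if score > best[0]:
            best = (score, h)
    return best


def match_activity(segments: Sequence[Segment],
                   candidates: Sequence[float]) -> Tuple[float, List[float]]:
    """Sum of per-segment best scores, plus the chosen location of each segment."""
    total, locations = 0.0, []
    for appearance_score, anchor, w_lin, w_quad in segments:
        score, h = match_segment(candidates, appearance_score,
                                 anchor, w_lin, w_quad)
        total += score
        locations.append(h)
    return total, locations


candidates = [i / 10 for i in range(11)]  # discrete locations on [0, 1]
segments = [
    (lambda h: 1.0 - abs(h - 0.3), 0.2, 0.0, 2.0),  # appearance peaks off-anchor
    (lambda h: 1.0 - abs(h - 0.8), 0.8, 0.0, 2.0),  # appearance peaks at anchor
]
total, locations = match_activity(segments, candidates)
print(locations)  # [0.3, 0.8]
```

The first segment is matched at 0.3 rather than at its anchor 0.2: strong appearance evidence outweighs a small temporal penalty, which is the temporal flexibility the model is designed for.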
Learning from weakly labeled data
Training uses positive and negative example videos:
• YouTube videos
• Class label per video collected on Amazon Mechanical Turk
• No annotation of temporal segments
Learning
Goal: learn
• Motion segment appearance
• Temporal arrangement
Approach: a max-margin framework that optimizes a discriminative loss, solved by coordinate descent [Felzenszwalb et al 2008]
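The slide does not spell out the loss. For orientation only, a latent-SVM objective of the kind introduced by Felzenszwalb et al 2008 has the form below; assuming the activity model uses exactly this objective is an assumption, and the symbols (weights w, joint feature map Φ, segment locations H, labels y_n ∈ {+1, -1}, constant C) are introduced here for illustration.

$$
f_w(x) = \max_{H}\; w^\top \Phi(x, H),
\qquad
\min_{w}\; \frac{1}{2}\lVert w \rVert^2 + C \sum_{n=1}^{N} \max\bigl(0,\; 1 - y_n f_w(x_n)\bigr)
$$

The maximization over H inside f_w is what makes the problem non-convex, and it is the reason for alternating between choosing locations and updating w.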
Learning by coordinate descent
• Initialize model parameters
Repeat until convergence (or a maximum number of iterations):
1. Find the best matching locations in the training examples (positive and negative)
2. Update the model parameters
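The two alternating steps can be sketched on toy data. This is a simplified variant, not the paper's procedure: each example is a list of candidate feature vectors (one per possible location), step 1 picks the highest-scoring candidate for every example (for negatives this acts as a hard-negative choice), and step 2 retrains a linear SVM without bias by subgradient descent on the hinge loss. All names and hyperparameters are hypothetical.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def best_location(w: Sequence[float], candidates: Sequence[Vector]) -> int:
    """Step 1: index of the best-matching candidate (first one on ties)."""
    return max(range(len(candidates)), key=lambda i: dot(w, candidates[i]))


def train_linear_svm(xs: Sequence[Vector], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     lr: float = 0.1) -> Vector:
    """Step 2: subgradient descent on lam/2 * ||w||^2 + hinge loss (no bias)."""
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            violated = y * dot(w, x) < 1
            for j in range(len(w)):
                grad = lam * w[j] - (y * x[j] if violated else 0.0)
                w[j] -= lr * grad
    return w


def coordinate_descent(examples: Sequence[Sequence[Vector]],
                       labels: Sequence[int], w_init: Vector,
                       max_iter: int = 10):
    """Alternate location selection and SVM retraining until choices stop changing."""
    w = list(w_init)
    choice: Optional[List[int]] = None
    for _ in range(max_iter):
        new_choice = [best_location(w, c) for c in examples]
        if new_choice == choice:  # converged: locations are stable
            break
        choice = new_choice
        xs = [c[i] for c, i in zip(examples, choice)]
        w = train_linear_svm(xs, labels)
    return w, choice


# Toy data: feature 0 marks the "right" motion, feature 1 a distractor.
pos = [[0.0, 0.0], [1.0, 0.0]]
neg = [[0.0, 0.0], [0.0, 1.0]]
examples = [pos, pos, neg, neg]
labels = [1, 1, -1, -1]
w, choice = coordinate_descent(examples, labels, w_init=[0.5, 0.5])
print(w, choice)
```

Because only the location choices change between rounds, stopping when they repeat is a natural convergence test; `max_iter` bounds the loop otherwise.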
Experiment I: Simple Actions
 • KTH dataset [Schuldt et al 2004]
         Action class         Our model
         walking              94.4%
         running              79.5%
         jogging              78.2%
         hand-waving          99.9%
         hand-clapping        96.5%
         boxing               99.2%
[Figure: example frames of walking, jogging, running, boxing, hand-waving and hand-clapping; accuracy bar chart comparing Ours, Wang et al 2009, Laptev et al 2008, Wong et al 2007 and Schuldt et al 2004]
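A single summary number can be obtained from the per-class table. This assumes the summary is the unweighted mean of the six per-class accuracies; the value plotted in the bar chart is not recoverable from the slide.

```python
# Per-class accuracies (%) of our model on KTH, from the table above.
kth_accuracy = {
    "walking": 94.4,
    "running": 79.5,
    "jogging": 78.2,
    "hand-waving": 99.9,
    "hand-clapping": 96.5,
    "boxing": 99.2,
}

# Unweighted mean over classes.
mean_acc = sum(kth_accuracy.values()) / len(kth_accuracy)
print(f"{mean_acc:.1f}%")  # 91.3%
```

The mean is pulled down mostly by running and jogging, the two classes whose motions look most alike.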
Experiment II: Proof of concept
 • Activities synthesized from Weizmann actions [Blank 2005] (wave, jump, jumping-jacks)
 • 6 classes
 • Accuracy: Ours 100%; Bag-of-features 17% (about chance level for 6 classes, 1/6)
[Figure: learned activity model on [0, 1]. Shorter segments: waving, jumping jacks. Longer segments: waving, transition from jump to jumping jacks, jumping jacks.]
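One way to see why an orderless representation can collapse to chance on composed activities: if two activities use the same motions in a different order, their bags of words are identical. The toy sequences below are hypothetical and only illustrate this property; an order-aware representation (here, each motion paired with its normalized position) tells them apart.

```python
from collections import Counter
from typing import List, Tuple


def bag_of_words(sequence: List[str]) -> Counter:
    """Orderless representation: counts of each motion."""
    return Counter(sequence)


def ordered_features(sequence: List[str]) -> List[Tuple[str, float]]:
    """Order-aware representation: each motion with its position on [0, 1]."""
    n = len(sequence)
    return [(m, i / (n - 1) if n > 1 else 0.0) for i, m in enumerate(sequence)]


seq_a = ["wave", "jump", "jumping-jacks"]
seq_b = ["jump", "wave", "jumping-jacks"]

print(bag_of_words(seq_a) == bag_of_words(seq_b))          # True
print(ordered_features(seq_a) == ordered_features(seq_b))  # False
```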
Experiment III: Olympic Sports Dataset
• YouTube videos with class labels per video from AMT
• 16 classes, ~100 videos each: http://vision.stanford.edu/Datasets/OlympicSports
• Classes: high-jump, long-jump, triple-jump, pole-vault, discus, hammer, javelin, shot put, basketball lay-up, bowling, tennis-serve, platform, springboard, snatch, clean-jerk, vault
Learned model: High Jump
[Figure: learned activity model on [0, 1]. Shorter segments: Start running, Run, Take off, Landing & stand up. Longer segment: Run.]
• A shorter segment has larger location uncertainty
• The long Run segment has small location uncertainty
Learned Model: Clean and Jerk
[Figure: learned activity model on [0, 1]. Segments: Hold weight while crouching, Lift weight to shoulders, Hold weight on shoulders; further segments: Hold weight while crouching, Transition to upright position.]
• One short segment has low location uncertainty: it had high location consistency in training
• Some segments encode similar appearance, and their possible locations overlap
Matched Sequences: Long Jump
[Figure: two long jump sequences matched to the Long Jump model on [0, 1], with segments Run, Take off, Stand up]
   Remarks:
   • Matching is tolerant to variations in the exact temporal location of motion segments.
   • Query videos can have different durations.
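Videos of different durations can share one model because every video is mapped onto the standard time range [0, 1]. This sketch assumes a frame-index normalization with the first frame at 0 and the last at 1; the function name `normalized_time` and the frame counts in the example are hypothetical.

```python
def normalized_time(frame: int, num_frames: int) -> float:
    """Map a frame index in [0, num_frames - 1] to the standard range [0, 1]."""
    if not 0 <= frame < num_frames:
        raise ValueError("frame index out of range")
    if num_frames == 1:
        return 0.0
    return frame / (num_frames - 1)


# A short and a long video whose take-off happens halfway through.
print(normalized_time(50, 101))   # 0.5
print(normalized_time(100, 201))  # 0.5
```

Both take-offs land at the same normalized location, so the same anchor applies; the temporal flexibility term then absorbs the remaining variation.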
Matched Sequences: Vault
[Figure: two vault sequences matched to the Vault model on [0, 1], with segments Run, Up in the air, Landing]
• One match has a low matching score: good temporal alignment, but bad appearance.
Classifying Olympic Sports
              Our method                    72.1%
              Laptev et al 2008 (CVPR 08)   62.0%
Conclusions
• Temporal context and structure are useful for activity recognition
• Olympic Sports Dataset (16 classes, ~100 videos/class)

 Future directions
 • Explore richer temporal structures
 • Introduce semantics for more meaningful decomposition
Thank you!

                           Juan Carlos Niebles
                               Graduate student
                              Princeton/Stanford


Bangpeng Yao, Barry Chai, Jia Deng, Hao Su, Olga
Russakovsky, and all Stanford Vision Lab members.

 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Dernier

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 

Dernier (20)

ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 

ECCV2010: Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification

  • 1. Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification. Juan Carlos Niebles, Chih-Wei Chen, Li Fei-Fei. Computer Science Dept., Stanford University.
  • 2. Recognizing Human Activities. Motion analysis: judge sports automatically, biomechanics, video game interfaces. Interactions with objects, temporal structure & causality: provide cooking assistance, … Detect unusual behavior: smart surveillance, psychology studies.
  • 3. Activity landscape, by temporal scale (seconds): snapshot (catch, 10^-1); atomic action (run, 10^0); activities (high jump, 10^1); events (football, 10^3); long-term event (construction of a building, 10^7 to 10^8).
  • 4. Activity landscape (snapshot, atomic action, activities, events, long-term event, on the same temporal scale as slide 3) with representative prior work: Thurau & Hlavac, 2008; Bobick & Davis, 2001; Ramanan & Forsyth, 2003; Sridhar et al, 2010; Gupta et al, 2009; Efros et al, 2003; Kuettel, 2010; Ikizler & Duygulu, 2009; Schuldt et al, 2004; Laxton et al, 2007; Ikizler-Cinbis et al, 2009; Alper & Shah, 2005; Ikizler & Forsyth, 2008; Yao & Fei-Fei, 2010a,b; Dollar et al, 2005; Yang, Wang and Mori, 2010; Blank et al, 2005; Choi & Savarese, 2009; Niebles et al, 2006; Laptev et al, 2008; Wang & Mori, 2008; Rodriguez et al, 2008; Wang & Mori, 2009; Liu et al, 2009; Marszalek et al, 2009.
  • 5. Activity landscape: activities (10^1 s on the temporal scale) are compositions of simple motions, non-periodic, and longer in duration than atomic actions.
  • 6. Activity landscape, related datasets: actions in still images [Ikizler 2009]; KTH [Schuldt et al 2004]; PPMI [Yao & Fei-Fei 2010]; Hollywood Dataset [Laptev et al 2008]; UIUC Sports [Li & Fei-Fei 2007]; UCF Sports [Rodriguez et al 2008]; Ballet [Yang et al 2009]; and the new Olympic Sports dataset.
  • 7. Activity landscape, possible approaches: pose-based recognition; HMM, CRF; bag of features. Drawbacks listed: computationally intensive; suited to simple action recognition, but fails when actions are complex. Cited: Ferrari et al 2008; Ramanan & Forsyth 2003; Laptev et al 2008; Sminchisescu 2006; Nazli & Forsyth 2008; Niebles et al 2006; Blank et al 2005; Liu et al 2009; Efros et al 2003; […].
  • 8. Our proposal: decompose activities into simpler motion segments. 1. Simple motions are easier to describe computationally. 2. Decomposition can leverage temporal context. 3. The human visual system seems to rely on decomposition for understanding [Zacks et al, Nature Neuro 2001; Tversky et al, JEP, 2006].
  • 9. Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
  • 10. Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
  • 11. A model for activities. (Figure: activity model.)
  • 12. A model for complex activities. Model properties: use a standard time range [0, 1]. (Figure: activity model on a time axis from 0 to 1.)
  • 13. A model for complex activities. Model properties: use a standard time range [0, 1]; the model is formed by a few simple motions.
  • 14. A model for complex activities. Model properties: use a standard time range [0, 1]; the model is formed by a few simple motions.
  • 15. A model for complex activities. Model properties: use a standard time range [0, 1]; the model is formed by a few simple motions; local motion appearance (figure: motion segment 1).
  • 16. A model for complex activities. Model properties: use a standard time range [0, 1]; the model is formed by a few simple motions; local motion appearance; encode temporal order (figure: each motion segment has an anchor location).
  • 17. A model for complex activities. Model properties: use a standard time range [0, 1]; the model is formed by a few simple motions; local motion appearance; encode temporal order; temporal flexibility (figure: temporal location uncertainty around each anchor).
  • 18. A model for complex activities. Model properties: use a standard time range [0, 1]; the model is formed by a few simple motions; local motion appearance; encode temporal order; temporal flexibility; multiple temporal scales (figure: shorter and longer segments).
  • 19. Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
  • 20. Recognition. (Figure: query video.)
  • 21. Recognition. The query video and the activity model both span the standard time range [0, 1].
  • 22. Recognition: match motion segment 1 of the activity model to the query video.
  • 23. Recognition: match motion segment 1. Consider a candidate location.
  • 24. Recognition: match motion segment 1. Consider a candidate location; compute the matching score for this segment.
  • 25. Recognition: match motion segment 1. Consider a candidate location; compute the matching score for this segment.
  • 26. Recognition: match motion segment 1. Features: spatio-temporal interest points with HOG/HOF descriptors [Laptev et al, 2005].
  • 27. Recognition: match motion segment 1. The descriptors are vector-quantized into a codebook of 1000 spatio-temporal words.
  • 28. Recognition: match motion segment 1. Appearance feature: histogram of video words.
  • 29. Recognition: match motion segment 1. Appearance similarity score: chi-square kernel SVM.
  • 30. Recognition: match motion segment 1. Consider a candidate location; compute the matching score for this segment.
  • 31. Recognition: match motion segment 1. Consider a candidate location; compute the matching score for this segment.
  • 32. Recognition: match motion segment 1. Temporal location feature: the distance between the candidate location h_1 and the anchor location.
  • 33. Recognition: match motion segment 1. Temporal location disagreement score: a 2nd-order polynomial of that distance.
  • 34. Recognition: match motion segment 1. Consider a candidate location; compute the matching score for this segment.
  • 35. Recognition: match motion segment 1. Consider a candidate location; compute the matching score for this segment.
  • 36. Recognition: match motion segment 1. Consider a candidate location; compute the matching score for this segment.
  • 37. Recognition: matching score for all segments.
  • 38. Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
  • 39. Learning from weakly labeled data (positive and negative examples): YouTube videos; one class label per video, collected on Amazon Mechanical Turk; no annotation of temporal segments.
  • 40. Learning from weakly labeled data: positive and negative examples; activity model on [0, 1].
  • 41. Learning. Goal: learn the motion segment appearance and the temporal arrangement, in a max-margin framework, by optimizing a discriminative loss with coordinate descent [Felzenszwalb et al 2008].
  • 42. Learning by coordinate descent: initialize the model parameters.
  • 43. Learning by coordinate descent: initialize the model parameters. 1. Find the best matching locations.
  • 44. Learning by coordinate descent: initialize the model parameters. 1. Find the best matching locations. 2. Update the model parameters.
  • 45. Learning by coordinate descent: initialize the model parameters. 1. Find the best matching locations. 2. Update the model parameters.
  • 46. Learning by coordinate descent: initialize the model parameters. 1. Find the best matching locations. 2. Update the model parameters. Repeat until convergence (or a maximum number of iterations).
  • 47. Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
  • 48. Experiment I: Simple actions, KTH dataset [Schuldt et al 2004]. Per-class results of our model: walking 94.4%, running 79.5%, jogging 78.2%, hand-waving 99.9%, hand-clapping 96.5%, boxing 99.2%. (Bar chart of accuracy, 50% to 100%, comparing Ours, Wang et al 2009, Laptev et al 2008, Wong et al 2007, and Schuldt et al 2004.)
  • 49. Experiment II: Proof of concept. Activities synthesized from Weizmann actions [Blank 2005] (wave, jump, jumping jacks); 6 classes. Results: ours 100%, bag-of-features 17%. (Figure: learned activity model on [0, 1], with shorter and longer segments labeled jumping jacks, waving, waving, transition from jump to jumping jacks, and jumping jacks.)
  • 50. Experiment III: Olympic Sports Dataset. YouTube videos with per-video class labels from AMT; 16 classes, ~100 videos each; http://vision.stanford.edu/Datasets/OlympicSports. Classes: high-jump, long-jump, triple-jump, pole-vault, discus, hammer, javelin, shot put, basketball lay-up, bowling, tennis-serve, platform, springboard, snatch, clean-jerk, vault.
  • 51. Learned model: High Jump. Motion segments on [0, 1], at shorter and longer scales: start running, run, take off, landing & stand up, run.
  • 52. Learned model: High Jump. A shorter segment has larger location uncertainty.
  • 53. Learned model: High Jump. A long segment has small location uncertainty.
  • 54. Learned model: Clean and Jerk. Motion segments on [0, 1]: hold weight while crouching, lift weight to shoulders, hold weight on shoulders, hold weight while crouching, transition to upright position.
  • 55. Learned model: Clean and Jerk. A short segment with low location uncertainty: its location was highly consistent in training.
  • 56. Learned model: Clean and Jerk. Segments that encode similar appearance have overlapping possible locations.
  • 57. Matched sequences: Long Jump sequences 1 and 2 matched to the Long Jump model on [0, 1] (run, take off, stand up). Remarks: matching is tolerant to variations in the exact temporal location of motion segments; query videos can have different lengths.
  • 58. Matched sequences: Vault sequences 1 and 2 matched to the Vault model on [0, 1] (run, up in the air, landing). One match has a low matching score: good temporal alignment, but bad appearance.
  • 59. Classifying Olympic Sports: our method 72.1%; Laptev et al 2008 (CVPR 08) 62.0%.
  • 60. Outline: Discriminative model for activities (Representation, Recognition, Learning); Experiments; Conclusions.
  • 61. Conclusions: temporal context and structures are useful for activity recognition. Olympic Sports Dataset: 16 classes, ~100 videos/class. Future directions: explore richer temporal structures; introduce semantics for more meaningful decomposition.
  • 62. Thank you! Juan Carlos Niebles, graduate student, Princeton/Stanford. Thanks to Bangpeng Yao, Barry Chai, Jia Deng, Hao Su, Olga Russakovsky, and all Stanford Vision Lab members.
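The model properties on slides 12 to 18 can be summarized as a small data structure. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes each motion segment stores an anchor location in the standard [0, 1] range, a temporal scale, appearance weights, and weights for a temporal penalty, and that query frames are mapped to [0, 1] by dividing by the video length. The names `normalize_time`, `MotionSegment`, `ActivityModel` and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def normalize_time(frame: int, num_frames: int) -> float:
    """Map a frame index to the standard [0, 1] time range.

    Rescaling every query video this way lets one model match videos
    of different lengths.
    """
    if num_frames < 2:
        return 0.0
    return frame / (num_frames - 1)


@dataclass
class MotionSegment:
    """One simple motion in the activity model (hypothetical fields)."""
    anchor: float  # expected temporal location in [0, 1]
    scale: float   # segment length as a fraction of [0, 1]: shorter or longer
    appearance_w: List[float] = field(default_factory=list)  # appearance weights
    temporal_w: Tuple[float, float] = (0.0, 0.0)  # weights on (d, d**2), d = h - anchor


@dataclass
class ActivityModel:
    """A few simple motion segments laid out on [0, 1]."""
    segments: List[MotionSegment]

    def temporal_order(self) -> List[int]:
        """Segment indices sorted by anchor location, i.e. the encoded temporal order."""
        return sorted(range(len(self.segments)), key=lambda k: self.segments[k].anchor)


model = ActivityModel([
    MotionSegment(anchor=0.7, scale=0.2),
    MotionSegment(anchor=0.2, scale=0.5),
    MotionSegment(anchor=0.5, scale=0.1),
])
print(normalize_time(50, 101))  # 0.5
print(model.temporal_order())   # [1, 2, 0]
```

Storing order implicitly through anchors, rather than as explicit links between segments, keeps each segment independent, which matters for the matching step.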
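Slides 26 to 28 describe the appearance feature: HOG/HOF descriptors at spatio-temporal interest points, vector-quantized into a 1000-word codebook, then histogrammed over a motion segment. A minimal NumPy sketch, assuming the codebook is already learned (the slides do not say how), nearest-codeword assignment by Euclidean distance, and a segment defined as the interest points whose normalized time falls in [start, end). Toy 2-D descriptors and a 3-word codebook stand in for the real ones; the function names are hypothetical.

```python
import numpy as np


def quantize(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Assign each descriptor (n x d) to its nearest codeword (K x d), Euclidean distance."""
    # Squared distances of shape (n, K), computed by broadcasting.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def segment_histogram(words: np.ndarray, times: np.ndarray,
                      start: float, end: float, vocab_size: int) -> np.ndarray:
    """L1-normalized histogram of the video words whose normalized time is in [start, end)."""
    mask = (times >= start) & (times < end)
    hist = np.bincount(words[mask], minlength=vocab_size).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Toy example: 3 codewords instead of 1000, 2-D descriptors instead of HOG/HOF.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.2], [4.0, 5.0], [0.0, 0.2]])
times = np.array([0.1, 0.3, 0.6, 0.2])  # interest-point times, already in [0, 1]

words = quantize(descriptors, codebook)  # [0 1 2 0]
hist = segment_histogram(words, times, 0.0, 0.5, vocab_size=3)
print(words, hist)
```

The broadcasted distance matrix is fine for a sketch; with a 1000-word codebook and many descriptors, a chunked or tree-based nearest-neighbor search would use far less memory.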
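Slide 29 names a chi-square kernel SVM as the appearance similarity score. The exact kernel variant and its parameters are not given, so this sketch assumes the exponential form K(x, y) = exp(-γ Σ_i (x_i − y_i)² / (x_i + y_i)), which is also the form scikit-learn's `sklearn.metrics.pairwise.chi2_kernel` computes, together with the standard kernel SVM decision value. The support vectors and coefficients below are made up.

```python
import math
from typing import Sequence


def chi2_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum over bins of (x_i - y_i)^2 / (x_i + y_i), skipping bins empty in both."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def chi2_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Exponential chi-square kernel exp(-gamma * chi2_distance)."""
    return math.exp(-gamma * chi2_distance(x, y))


def svm_score(x, support_vectors, dual_coefs, bias, gamma=1.0):
    """Kernel SVM decision value sum_j c_j K(sv_j, x) + b, where c_j = alpha_j * y_j."""
    return sum(c * chi2_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical support vectors and signed coefficients for one motion segment.
svs = [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]]
coefs = [1.5, -1.0]
print(svm_score([0.5, 0.5, 0.0], svs, coefs, bias=-0.2))
```

The chi-square distance weights differences in sparse bins more heavily than a Euclidean distance would, which is one common reason it is paired with histogram features like bags of video words.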
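Slides 32 and 33 score a segment's placement by a second-order polynomial of its temporal location feature, the distance between the candidate location h_1 and the anchor. A minimal sketch, assuming the feature is the signed offset d = h − anchor and the score is w1·d + w2·d² with learned weights w1 and w2 (names hypothetical). It shows how the quadratic weight can express the temporal location uncertainty drawn on slide 17: the same offset costs much more for a tightly placed segment.

```python
def temporal_disagreement(h: float, anchor: float, w1: float, w2: float) -> float:
    """Assumed 2nd-order polynomial penalty on the offset d = h - anchor.

    h and anchor are in normalized [0, 1] time. A larger w2 penalizes
    displacement more sharply, i.e. less temporal location uncertainty;
    a nonzero w1 makes the penalty asymmetric (early vs. late).
    """
    d = h - anchor
    return w1 * d + w2 * d * d


# Same offset of 0.1 for two hypothetical segments: one tightly placed, one flexible.
tight = temporal_disagreement(0.6, 0.5, w1=0.0, w2=50.0)  # about 0.5
loose = temporal_disagreement(0.6, 0.5, w1=0.0, w2=2.0)   # about 0.02
print(tight, loose)
```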
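Slides 22 to 37 build the recognition score segment by segment, but the formula itself is not in the transcript. This sketch assumes (i) a segment's score at location h is its appearance score minus its temporal disagreement, (ii) candidate locations lie on a uniform grid over [0, 1], and (iii) the video score adds each segment's best score. Under these assumptions no term links two segments, so each segment can be maximized independently. `appearance_fn` is a hypothetical stand-in for the chi-square SVM score; the other names are also illustrative.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical per-segment parameters: (anchor, w1, w2, appearance_fn), where
# appearance_fn(h) stands in for the appearance score of the segment placed at h.
Segment = Tuple[float, float, float, Callable[[float], float]]


def best_location(seg: Segment, candidates: Sequence[float]) -> Tuple[float, float]:
    """Best candidate location h and its score: appearance(h) - temporal penalty(h)."""
    anchor, w1, w2, appearance_fn = seg
    best_h, best_s = candidates[0], float("-inf")
    for h in candidates:
        d = h - anchor
        s = appearance_fn(h) - (w1 * d + w2 * d * d)
        if s > best_s:
            best_h, best_s = h, s
    return best_h, best_s


def video_score(segments: Sequence[Segment], num_candidates: int = 11) -> Tuple[float, List[float]]:
    """Sum of per-segment best scores over a grid of candidate locations in [0, 1]."""
    candidates = [i / (num_candidates - 1) for i in range(num_candidates)]
    total, locations = 0.0, []
    for seg in segments:
        h, s = best_location(seg, candidates)
        locations.append(h)
        total += s
    return total, locations


segments = [
    (0.5, 0.0, 10.0, lambda h: 1.0 if abs(h - 0.7) < 1e-9 else 0.0),  # strong motion at 0.7
    (0.2, 0.0, 10.0, lambda h: 0.0),                                  # no appearance evidence
]
total, locations = video_score(segments)
print(total, locations)
```

The first segment moves from its anchor at 0.5 to 0.7 because the appearance evidence there outweighs the temporal penalty, which mirrors slide 57's remark that matching tolerates variation in exact segment locations.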
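Slides 41 to 46 describe learning as a max-margin problem solved by coordinate descent, following Felzenszwalb et al 2008: with the model fixed, find the best matching locations in the positive videos; with those fixed, update the parameters; repeat. The sketch below shows that alternation on a deliberately tiny problem and is not the authors' solver. It assumes one motion segment, a precomputed feature vector per candidate location, a linear score w·φ, and plain subgradient steps on the regularized hinge loss in place of a proper SVM solver. Data and names are hypothetical.

```python
from typing import List, Sequence

Vec = List[float]
Video = Sequence[Vec]  # one feature vector per candidate location


def dot(w: Vec, x: Vec) -> float:
    return sum(a * b for a, b in zip(w, x))


def best(w: Vec, video: Video) -> Vec:
    """Feature vector of the highest-scoring candidate location under w."""
    return max(video, key=lambda phi: dot(w, phi))


def score(w: Vec, video: Video) -> float:
    return max(dot(w, phi) for phi in video)


def train(pos: Sequence[Video], neg: Sequence[Video], w_init: Vec,
          C: float = 1.0, lr: float = 0.1, outer: int = 10, inner: int = 50) -> Vec:
    w = list(w_init)
    for _ in range(outer):
        # Step 1: fix w, find the best matching location in each positive video.
        pos_feats = [best(w, v) for v in pos]
        # Step 2: fix those locations, update w by subgradient descent on
        # 0.5*||w||^2 + C * (sum of hinge losses). Negatives are re-maximized at
        # every step, so the hinge applies to each negative's best location.
        for _ in range(inner):
            grad = list(w)
            for phi in pos_feats:
                if dot(w, phi) < 1:
                    grad = [g - C * p for g, p in zip(grad, phi)]
            for v in neg:
                phi = best(w, v)
                if dot(w, phi) > -1:
                    grad = [g + C * p for g, p in zip(grad, phi)]
            w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Feature per location: [matches the motion, looks like background]; hypothetical.
pos = [[[0.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]]]
neg = [[[0.0, 1.0], [0.0, 0.5]], [[0.0, 0.5], [0.0, 1.0]]]
w = train(pos, neg, w_init=[0.1, 0.0])
print([round(score(w, v), 2) for v in pos + neg])
```

Fixing the positive locations makes each inner problem convex, which is why the method alternates; negatives can stay maximized inside the hinge because a maximum of linear functions is convex.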
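Slide 48 lists per-class KTH results but no overall figure. Averaging the six values is simple arithmetic on the slide's numbers; this is an unweighted mean over classes, not necessarily the summary statistic the authors report.

```python
# Per-class results of the model on KTH, copied from slide 48.
kth_per_class = {
    "walking": 94.4, "running": 79.5, "jogging": 78.2,
    "hand-waving": 99.9, "hand-clapping": 96.5, "boxing": 99.2,
}
mean_acc = sum(kth_per_class.values()) / len(kth_per_class)
print(f"mean per-class: {mean_acc:.1f}%")  # prints "mean per-class: 91.3%"
```

The spread is worth noting: running and jogging, the two most similar motions, are the weakest classes.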
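On slide 49, bag-of-features scores 17%, about chance for 6 classes (1/6 ≈ 16.7%). One plausible reading, consistent with the slide's figure, is that the synthesized activities are built from the same Weizmann actions in different orders, which an orderless histogram cannot tell apart. A toy check with made-up word IDs, using a split into two halves as the crudest possible temporal structure:

```python
from collections import Counter

# Hypothetical video-word IDs for three simple motions.
WAVE, JUMP, JACKS = [1, 1, 2], [3, 4, 4], [5, 6, 5]


def bag_of_words(seq):
    """Orderless histogram of video words."""
    return Counter(seq)


def split_halves(seq):
    """Histograms of the first and second halves: a minimal notion of temporal layout."""
    mid = len(seq) // 2
    return Counter(seq[:mid]), Counter(seq[mid:])


activity_a = WAVE + JUMP + JACKS
activity_b = JACKS + JUMP + WAVE
print(bag_of_words(activity_a) == bag_of_words(activity_b))  # True: order is lost
print(split_halves(activity_a) == split_halves(activity_b))  # False: coarse timing separates them
```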