SlideShare une entreprise Scribd logo
1  sur  67
Télécharger pour lire hors ligne
Frontiers of
Human Activity Analysis

                J. K. Aggarwal
               Michael S. Ryoo
                 Kris M. Kitani
Overview

        Human Activity Recognition



Single-layer                  Hierarchical

        Space-time                     Statistical
                                       Syntactic

        Sequential                     Descriptive


                                                     2
Motivation

How do we interpret a sequence of actions?




                                             3
Hierarchy
Hierarchy implies decomposition into sub-parts




                                                 4
Now we’ll cover…

        Human Activity Recognition



Single-layer                  Hierarchical

        Space-time                   Statistical
                                     Syntactic
        Sequential                     Descriptive


                                                     5
Syntactic
Approaches


             6
Syntactic Models


Activities as strings of symbols.




What is the underlying structure?


                                    7
Early applications to Vision
                                        Tsai and Fu 1980.
Attributed Grammar-A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition.




                                                                                                       8
Hierarchical syntactic approach
  Useful for activities with:
    Deep hierarchical structure
    Repetitive (cyclic) structure


  Not for
    Systems with a lot of errors and uncertainty
    Activities with shallow structure



                                                    9
Basics
                  Context-Free Grammar



      Generic Language            Natural Languages


       Start Symbol (S)               Sentences


  Set of Terminal Symbols (T)           Words


Set of Non-Terminal Symbols (N)     Parts of Speech


  Set of Production Rules (P)        Syntax Rules



                                                      10
Parsing with a grammar
S → NP VP         (0.8)           PP → PREP NP   (1.0)
S → VP            (0.2)           PREP → like    (1.0)
NP → NOUN         (0.4)           VERB → swat    (0.2)
NP → NOUN PP      (0.4)           VERB → flies   (0.4)
NP → NOUN NP      (0.2)           VERB → like    (0.4)
VP → VERB         (0.3)           NOUN → swat    (0.05)
VP → VERB NP      (0.3)           NOUN → flies   (0.45)
VP → VERB PP      (0.2)           NOUN → ants    (0.5)
VP → VERB NP PP   (0.2)




 swat                     flies          like      ants
                                                          11
Parsing with a grammar
S → NP VP            (0.8)               PP → PREP NP      (1.0)
S → VP               (0.2)               PREP → like       (1.0)
NP → NOUN            (0.4)               VERB → swat       (0.2)
NP → NOUN PP         (0.4)               VERB → flies      (0.4)
NP → NOUN NP         (0.2)               VERB → like       (0.4)
VP → VERB            (0.3)               NOUN → swat       (0.05)
VP → VERB NP         (0.3)               NOUN → flies      (0.45)
VP → VERB PP         (0.2)               NOUN → ants       (0.5)
VP → VERB NP PP      (0.2)

                                S
                  NP           (0.8)            VP

                  (0.2)                            (0.3)
NOUN                          NP                              NP
                                 (0.4)                              (0.4)

                             NOUN              VERB        NOUN
    (0.05)                      (0.45)            (0.4)       (0.5)

 swat                         flies             like         ants
                                                                            12
Video analysis with CFGs

     The “Inverse Hollywood problem”:
     From video to scripts and storyboards via causal analysis.
     Brand 1997




     Action Recognition using Probabilistic Parsing.
     Bobick and Ivanov 1998




     Recognizing Multitasked Activities from Video using
     Stochastic Context-Free Grammar.
     Moore and Essa 2001




                                                                  13
CFG for human activities




enter   detach   leave   enter   detach   attach    touch    touch     detach      attach       leave




                                                   M. Brand. The "Inverse Hollywood Problem":
                                                      From video to scripts and storyboards
                                                         via causal analysis. AAAI 1997.




                                                                                                        14
Parse tree
                                                              SCENE (Open up a PC)




               IN                     ACTION (Open PC)                               ACTION (unscrew)                      OUT


                                   OUT                   IN                                MOVE                          REMOVE




              ADD                                      ADD
                                                                                     MOTION     MOTION


      enter         detach       leave         enter          detach      attach      touch      touch       detach   attach     leave




                                                               •  Deterministic low-level primitive detection
                                                               •  Deterministic parsing



M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.
                                                                                                                                         15
Stochastic CFGs
Action Recognition using Probabilistic Parsing.
          Bobick and Ivanov 1998




                                                  16
Gesture analysis with CFGs
                                       Primitive recognition with HMMs




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                         17
left-right




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                                18
up-down




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                            19
right-left




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                                20
down-up




Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                            21
Parse Tree
                       S


                    RH



  TOP          UD            BOT          DU


   LR                         RL

left-right   up-down       right-left   down-up

                                               22
Errors
                   Likelihood value over time (not discrete symbols)


          HMM a

          HMM b



                                                  Errors are inevitable…
                      but the grammar acts as a top-down constraint


Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                           23
Dealing with uncertainty & errors
        Stolcke-Early (probabilistic) parser
        SKIP rules to deal with insertion errors


               HMM a


               HMM b


               HMM c



Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998
                                                                         24
SCFG for Blackjack
Recognizing Multitasked Activities from Video using
       Stochastic Context-Free Grammar.
             Moore and Essa 2001




  •  Deals with more complex activities
  •  Deals with more error types

                                                      25
extracting primitive actions




                               26
Game grammar




Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001
                                                                                                           27
Dealing with errors

  Ungrammatical strings cause parser to fail
  Account for errors with multiple hypothesis
     Insertion, deletion, substitution

  Issues
     How many errors should we tolerate?
     Potentially exponential hypothesis space
     Ungrammatical strings: vision problem or illegal
      activity?


                                                         28
Observations
  CFGs good for structured activities
  Can incorporate uncertainty in observations
  Natural contextual prior for recognizing errors

  Not clear how to deal with errors
  Assumes ‘good’ action classifiers
  Need to define grammar manually

     Can we learn the grammar from data?

                                                     29
Heuristic Grammatical Induction

                                 1.  Lexicon learning
                                     •  Learn HMMs
                                     •  Cluster HMMs

                                 2.  Convert video to string

                                 3.  Learn Grammar




  Unsupervised Analysis of Human Gestures. Wang et al 2001
                                                               30
COMPRESSIVE
  a b c d a b c d b c d a b a b
                                          length   occurrence   new rule      new symbol


                                             deletion of             insertion of
                                              substring                new rule
                              substring




On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences.
                                Nevill-Manning 2000.
                                                                                           31
example

S→a b c d a b c d b c d a b a b
                                              (DL=16)


A→b c d
S→a A a               A         A          a b a b
                                              (DL=14)


     Repeat until compression becomes 0.
                                                    32
Critical assumption
  No uncertainty
  No errors
    insertions
    deletions
    substitution


 Can we learn grammars despite errors?


                                         33
Learning with noise
 Can we learn the basic structure of a transaction?




Recovering the basic structure of human activities from
 noisy video-based symbol strings. Kitani et al 2008.
                                                          34
extracting primitives




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               35
Underlying structure?

  D → a x b y c a b x c y a b c x




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               36
Underlying structure?

  D → a x b y c a b x c y a b c x


          D→a                                    b                c a b                                c       a b c



Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                        37
Underlying structure?

  D → a x b y c a b x c y a b c x


          D→a                                    b                c a b                                c       a b c



Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                        38
Underlying structure?

  D → a x b y c a b x c y a b c x


          D→a                                    b                c a b                                c        a b c

                        A→a b c                                                 D → A A A
                                   Simple grammar                                           Efficient compression


Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                         39
Information Theory Problem (MDL)

              ˆ
              G = arg min {DL(G) + DL(D|G)}
                                                                Model complexity                         Data compression
                                          G




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                            40
Information Theory Problem (MDL)

              ˆ
              G = arg min {DL(G) + DL(D|G)}
                                                                Model complexity                         Data compression
                                          G
                            DL(G)                        =           − log p(G)
                          Model complexity
                                                         =           − log p(θS , GS )
                                                         =           − log p(θS |GS ) − log p(GS )
                                                         =           DL(θS |GS ) − DL(GS )
                                                                        Grammar parameters                     Grammar structure




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                                   41
Information Theory Problem (MDL)

              ˆ
              G = arg min {DL(G) + DL(D|G)}
                                                                Model complexity                         Data compression
                                          G
                            DL(G)                        =           − log p(G)
                          Model complexity
                                                         =           − log p(θS , GS )
                                                         =           − log p(θS |GS ) − log p(GS )
                                                         =           DL(θS |GS ) − DL(GS )
                                                                        Grammar parameters                     Grammar structure



                      DL(D|G) = − log p(D|G)
                         Data compression                                              Likelihood
                                                                                 (inside probabilities)

Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                                                   42
Minimum Description Length




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               43
Minimum Description Length




Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               44
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               45
Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008.
                                                                                                               46
Conclusions
  Possible to learn basic structure
  Robust to errors
   (insertion, deletion, substitution)

  Need a lot of training data
  Computational complexity




                                         47
Bayesian Approaches




Infinite Hierarchical Hidden Markov Models.   The Infinite PCFG using Hierarchical Dirichlet Processes.
              Heller et al 2009.                               Liang et al 2007.




                                                                                                        48
Take home message
              Hierarchical Syntactic Models


  Useful for activities with:
    Deep hierarchical structure
    Repetitive (cyclic) structure


  Not for
    Systems with a lot of errors and uncertainty
    Activities with weak structure



                                                    49
Statistical
Approaches


               50
Using a hierarchical statistical approach

  Use when
    Low-level action detectors are noisy
    Structure of activity is sequential
    Integrating dynamics


  Not for
    Activities with deep hierarchical structure
    Activities with complex temporal structure


                                                   51
Statistical (State-based) Model
Activities as a stochastic path.




                   What are the underlying dynamics?

                                                       52
Characteristics
  Strong Markov assumption
  Strong dynamics prior
  Robust to uncertainty

  Modifications to account for
    Hierarchical structure
    Concurrent structure



                                  53
Hierarchical activities
      Problem:
  How do we model
hierarchical activities?

                           combinatory state space!




      Solution:
 “stack” actions for
hierarchical activities

                                                      54
Hierarchical hidden Markov model




 Learning and Detecting Activities from Movement Trajectories Using the
        Hierarchical Hidden Markov Models. Nguyen et al 2005
                                                                          55
Context-free activity grammar




Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005
                                                                                                                              56
Context-free activity grammar




Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005
                                                                                                                              57
Observations
  Tree structures useful for hierarchies
  Tight integration of trajectories with
   abstract semantic states

  Activities are not always a single
   sequence
   (ie. they sometimes happen in parallel )



                                              58
Concurrent activities
     Problem:
 How do we model
concurrent activities?

                         combinatory state space!




     Solution:
“stand-up” model for
 concurrent activities

                                                    59
Propagation network




Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004
                                                                                              60
Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004
                                                                                              61
temporal inference




Inference by standing the state transition model on its side
                                                               62
Inferring structure (storylines)
              Understanding Videos, Constructing Plots –
  Learning a Visually Grounded Storyline Model from Annotated Videos
             Gupta, Srinivasan, Shi and Davis CVPR 2009




            Learn AND-OR graphs from weakly labeled data

                                                                       63
Scripts from structure




Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos.
Gupta, Srinivasan, Shi and Davis CVPR 2009
                                                                                                                 64
Take home message
              Hierarchical statistical model


  Use when
    Low-level action detectors are noisy
    Structure of activity is sequential
    Integrating dynamics


  Not for
    Activities with deep hierarchical structure
    Activities with complex temporal structure


                                                   65
Contrasting hierarchical approaches


              Actions as:     Activities as:    Model     Characteristic


              probabilistic                                 Robust to
 Statistic                        paths          DBN
                 states                                    uncertainty


                discrete                                    Describes
Syntactic                        strings         CFG
                symbols                                   deep hierarchy

                                                            Encodes
                  logical
Descriptive                       sets         CFG, MLN     complex
              relationships
                                                              logic



                                                                         66
References
                          (not included in ACM survey paper)


  W. Tsai and K.S. Fu. Attributed Grammar-A Tool for Combining
   Syntactic and Statistical Approaches to Pattern Recognition. SMC1980.
  M. Brand. The "Inverse Hollywood Problem":
   From video to scripts and storyboards via causal analysis. AAAI 1997.
  T. Wang, H. Shum, Y. Xu, N. Zheng. Unsupervised Analysis of Human
   Gestures. PRCM 2001.
  C.G. Nevill-Manning, I.H. Witten. On-Line and Off-Line Heuristics for
   Inferring Hierarchies of Repetitions in Sequences. IEEE 2000.
  K. Heller, Y.W. Teh and D. Gorur. Infinite Hierarchical Hidden Markov Model
   s. AISTATS 2009.
  P. Liang, S. Petrov, M. Jordan, D. Klein. The Infinite PCFG using
   Hierarchical Dirichlet Processes. EMNLP 2007.
  A. Gupta, N. Srinivasan, J.Shi and L.Davis. Understanding Videos,
   Constructing Plots - Learning a Visually Grounded Storyline Model from
   Annotated Videos. CVPR 2009.


                                                                                67

Contenu connexe

En vedette (7)

Human+Resource+Systems
Human+Resource+SystemsHuman+Resource+Systems
Human+Resource+Systems
 
Industrial psychology
Industrial   psychologyIndustrial   psychology
Industrial psychology
 
Industrial Psychology
Industrial PsychologyIndustrial Psychology
Industrial Psychology
 
The scope of psychology
The scope of psychology The scope of psychology
The scope of psychology
 
Industrial Organizational Psychology . ppt
Industrial Organizational Psychology . ppt Industrial Organizational Psychology . ppt
Industrial Organizational Psychology . ppt
 
HR Strategy - How to develop and deploy your hrm strategy - a manual for HR ...
HR Strategy - How to develop and deploy your hrm strategy  - a manual for HR ...HR Strategy - How to develop and deploy your hrm strategy  - a manual for HR ...
HR Strategy - How to develop and deploy your hrm strategy - a manual for HR ...
 
Introduction to Industrial Psychology and its Basic Concept
Introduction to Industrial Psychology and its Basic ConceptIntroduction to Industrial Psychology and its Basic Concept
Introduction to Industrial Psychology and its Basic Concept
 

Plus de zukun

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009zukun
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVzukun
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Informationzukun
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statisticszukun
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibrationzukun
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionzukun
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluationzukun
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-softwarezukun
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptorszukun
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectorszukun
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-introzukun
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video searchzukun
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video searchzukun
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video searchzukun
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learningzukun
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionzukun
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick startzukun
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysiszukun
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structureszukun
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities zukun
 

Plus de zukun (20)

My lyn tutorial 2009
My lyn tutorial 2009My lyn tutorial 2009
My lyn tutorial 2009
 
ETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCVETHZ CV2012: Tutorial openCV
ETHZ CV2012: Tutorial openCV
 
ETHZ CV2012: Information
ETHZ CV2012: InformationETHZ CV2012: Information
ETHZ CV2012: Information
 
Siwei lyu: natural image statistics
Siwei lyu: natural image statisticsSiwei lyu: natural image statistics
Siwei lyu: natural image statistics
 
Lecture9 camera calibration
Lecture9 camera calibrationLecture9 camera calibration
Lecture9 camera calibration
 
Brunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer visionBrunelli 2008: template matching techniques in computer vision
Brunelli 2008: template matching techniques in computer vision
 
Modern features-part-4-evaluation
Modern features-part-4-evaluationModern features-part-4-evaluation
Modern features-part-4-evaluation
 
Modern features-part-3-software
Modern features-part-3-softwareModern features-part-3-software
Modern features-part-3-software
 
Modern features-part-2-descriptors
Modern features-part-2-descriptorsModern features-part-2-descriptors
Modern features-part-2-descriptors
 
Modern features-part-1-detectors
Modern features-part-1-detectorsModern features-part-1-detectors
Modern features-part-1-detectors
 
Modern features-part-0-intro
Modern features-part-0-introModern features-part-0-intro
Modern features-part-0-intro
 
Lecture 02 internet video search
Lecture 02 internet video searchLecture 02 internet video search
Lecture 02 internet video search
 
Lecture 01 internet video search
Lecture 01 internet video searchLecture 01 internet video search
Lecture 01 internet video search
 
Lecture 03 internet video search
Lecture 03 internet video searchLecture 03 internet video search
Lecture 03 internet video search
 
Icml2012 tutorial representation_learning
Icml2012 tutorial representation_learningIcml2012 tutorial representation_learning
Icml2012 tutorial representation_learning
 
Advances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer visionAdvances in discrete energy minimisation for computer vision
Advances in discrete energy minimisation for computer vision
 
Gephi tutorial: quick start
Gephi tutorial: quick startGephi tutorial: quick start
Gephi tutorial: quick start
 
EM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysisEM algorithm and its application in probabilistic latent semantic analysis
EM algorithm and its application in probabilistic latent semantic analysis
 
Object recognition with pictorial structures
Object recognition with pictorial structuresObject recognition with pictorial structures
Object recognition with pictorial structures
 
Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities Iccv2011 learning spatiotemporal graphs of human activities
Iccv2011 learning spatiotemporal graphs of human activities
 

Dernier

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 

Dernier (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 

cvpr2011: human activity recognition - part 4: syntactic

  • 1. Frontiers of Human Activity Analysis J. K. Aggarwal Michael S. Ryoo Kris M. Kitani
  • 2. Overview Human Activity Recognition Single-layer Hierarchical Space-time Statistical Syntactic Sequential Descriptive 2
  • 3. Motivation How do we interpret a sequence of actions? 3
  • 5. Now we’ll cover… Human Activity Recognition Single-layer Hierarchical Space-time Statistical Syntactic Sequential Descriptive 5
  • 7. Syntactic Models Activities as strings of symbols. What is the underlying structure? 7
  • 8. Early applications to Vision Tsai and Fu 1980. Attributed Grammar-A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. 8
  • 9. Hierarchical syntactic approach   Useful for activities with:   Deep hierarchical structure   Repetitive (cyclic) structure   Not for   Systems with a lot of errors and uncertainty   Activities with shallow structure 9
  • 10. Basics Context-Free Grammar Generic Language Natural Languages Start Symbol (S) Sentences Set of Terminal Symbols (T) Words Set of Non-Terminal Symbols (N) Parts of Speech Set of Production Rules (P) Syntax Rules 10
  • 11. Parsing with a grammar S → NP VP (0.8) PP → PREP NP (1.0) S → VP (0.2) PREP → like (1.0) NP → NOUN (0.4) VERB → swat (0.2) NP → NOUN PP (0.4) VERB → flies (0.4) NP → NOUN NP (0.2) VERB → like (0.4) VP → VERB (0.3) NOUN → swat (0.05) VP → VERB NP (0.3) NOUN → flies (0.45) VP → VERB PP (0.2) NOUN → ants (0.5) VP → VERB NP PP (0.2) swat flies like ants 11
  • 12. Parsing with a grammar S → NP VP (0.8) PP → PREP NP (1.0) S → VP (0.2) PREP → like (1.0) NP → NOUN (0.4) VERB → swat (0.2) NP → NOUN PP (0.4) VERB → flies (0.4) NP → NOUN NP (0.2) VERB → like (0.4) VP → VERB (0.3) NOUN → swat (0.05) VP → VERB NP (0.3) NOUN → flies (0.45) VP → VERB PP (0.2) NOUN → ants (0.5) VP → VERB NP PP (0.2) S NP (0.8) VP (0.2) (0.3) NOUN NP NP (0.4) (0.4) NOUN VERB NOUN (0.05) (0.45) (0.4) (0.5) swat flies like ants 12
  • 13. Video analysis with CFGs The “Inverse Hollywood problem”: From video to scripts and storyboards via causal analysis. Brand 1997 Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 13
  • 14. CFG for human activities enter detach leave enter detach attach touch touch detach attach leave M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997. 14
  • 15. Parse tree SCENE (Open up a PC) IN ACTION (Open PC) ACTION (unscrew) OUT OUT IN MOVE REMOVE ADD ADD MOTION MOTION enter detach leave enter detach attach touch touch detach attach leave •  Deterministic low-level primitive detection •  Deterministic parsing M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997. 15
  • 16. Stochastic CFGs Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 16
  • 17. Gesture analysis with CFGs Primitive recognition with HMMs Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 17
  • 18. left-right Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 18
  • 19. up-down Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 19
  • 20. right-left Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 20
  • 21. down-up Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 21
  • 22. Parse Tree S RH TOP UD BOT DU LR RL left-right up-down right-left down-up 22
  • 23. Errors Likelihood value over time (not discrete symbols) HMM a HMM b Errors are inevitable… but the grammar acts as a top-down constraint Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 23
  • 24. Dealing with uncertainty & errors   Stolcke-Early (probabilistic) parser   SKIP rules to deal with insertion errors HMM a HMM b HMM c Action Recognition using Probabilistic Parsing. Bobick and Ivanov 1998 24
  • 25. SCFG for Blackjack Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 •  Deals with more complex activities •  Deals with more error types 25
  • 27. Game grammar Recognizing Multitasked Activities from Video using Stochastic Context-Free Grammar. Moore and Essa 2001 27
  • 28. Dealing with errors   Ungrammatical strings cause parser to fail   Account for errors with multiple hypothesis   Insertion, deletion, substitution   Issues   How many errors should we tolerate?   Potentially exponential hypothesis space   Ungrammatical strings: vision problem or illegal activity? 28
  • 29. Observations   CFGs good for structured activities   Can incorporate uncertainty in observations   Natural contextual prior for recognizing errors   Not clear how to deal with errors   Assumes ‘good’ action classifiers   Need to define grammar manually Can we learn the grammar from data? 29
  • 30. Heuristic Grammatical Induction 1.  Lexicon learning •  Learn HMMs •  Cluster HMMs 2.  Convert video to string 3.  Learn Grammar Unsupervised Analysis of Human Gestures. Wang et al 2001 30
  • 31. COMPRESSIVE a b c d a b c d b c d a b a b length occurrence new rule new symbol deletion of insertion of substring new rule substring On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. Nevill-Manning 2000. 31
  • 32. example S→a b c d a b c d b c d a b a b (DL=16) A→b c d S→a A a A A a b a b (DL=14) Repeat until compression becomes 0. 32
  • 33. Critical assumption   No uncertainty   No errors   insertions   deletions   substitution Can we learn grammars despite errors? 33
  • 34. Learning with noise Can we learn the basic structure of a transaction? Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 34
  • 35. extracting primitives Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 35
  • 36. Underlying structure? D → a x b y c a b x c y a b c x Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 36
  • 37. Underlying structure? D → a x b y c a b x c y a b c x D→a b c a b c a b c Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 37
  • 38. Underlying structure? D → a x b y c a b x c y a b c x D→a b c a b c a b c Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 38
  • 39. Underlying structure? D → a x b y c a b x c y a b c x D→a b c a b c a b c A→a b c D → A A A Simple grammar Efficient compression Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 39
  • 40. Information Theory Problem (MDL) ˆ G = arg min {DL(G) + DL(D|G)} Model complexity Data compression G Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 40
  • 41. Information Theory Problem (MDL) ˆ G = arg min {DL(G) + DL(D|G)} Model complexity Data compression G DL(G) = − log p(G) Model complexity = − log p(θS , GS ) = − log p(θS |GS ) − log p(GS ) = DL(θS |GS ) − DL(GS ) Grammar parameters Grammar structure Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 41
  • 42. Information Theory Problem (MDL) ˆ G = arg min {DL(G) + DL(D|G)} Model complexity Data compression G DL(G) = − log p(G) Model complexity = − log p(θS , GS ) = − log p(θS |GS ) − log p(GS ) = DL(θS |GS ) − DL(GS ) Grammar parameters Grammar structure DL(D|G) = − log p(D|G) Data compression Likelihood (inside probabilities) Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 42
  • 43. Minimum Description Length Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 43
  • 44. Minimum Description Length Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 44
  • 45. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 45
  • 46. Recovering the basic structure of human activities from noisy video-based symbol strings. Kitani et al 2008. 46
  • 47. Conclusions   Possible to learn basic structure   Robust to errors (insertion, deletion, substitution)   Need a lot of training data   Computational complexity 47
  • 48. Bayesian Approaches Infinite Hierarchical Hidden Markov Models. The Infinite PCFG using Hierarchical Dirichlet Processes. Heller et al 2009. Liang et al 2007. 48
  • 49. Take home message Hierarchical Syntactic Models   Useful for activities with:   Deep hierarchical structure   Repetitive (cyclic) structure   Not for   Systems with a lot of errors and uncertainty   Activities with weak structure 49
  • 51. Using a hierarchical statistical approach   Use when   Low-level action detectors are noisy   Structure of activity is sequential   Integrating dynamics   Not for   Activities with deep hierarchical structure   Activities with complex temporal structure 51
  • 52. Statistical (State-based) Model Activities as a stochastic path. What are the underlying dynamics? 52
  • 53. Characteristics   Strong Markov assumption   Strong dynamics prior   Robust to uncertainty   Modifications to account for   Hierarchical structure   Concurrent structure 53
  • 54. Hierarchical activities Problem: How do we model hierarchical activities? combinatory state space! Solution: “stack” actions for hierarchical activities 54
  • 55. Hierarchical hidden Markov model Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 55
  • 56. Context-free activity grammar Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 56
  • 57. Context-free activity grammar Learning and Detecting Activities from Movement Trajectories Using the Hierarchical Hidden Markov Models. Nguyen et al 2005 57
  • 58. Observations   Tree structures useful for hierarchies   Tight integration of trajectories with abstract semantic states   Activities are not always a single sequence (ie. they sometimes happen in parallel ) 58
  • 59. Concurrent activities Problem: How do we model concurrent activities? combinatory state space! Solution: “stand-up” model for concurrent activities 59
  • 60. Propagation network Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004 60
  • 61. Propagation Networks for Recognition of Partially Ordered Sequential Action. Shi et al 2004 61
  • 62. temporal inference Inference by standing the state transition model on its side 62
  • 63. Inferring structure (storylines) Understanding Videos, Constructing Plots – Learning a Visually Grounded Storyline Model from Annotated Videos Gupta, Srinivasan, Shi and Davis CVPR 2009 Learn AND-OR graphs from weakly labeled data 63
  • 64. Scripts from structure Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos. Gupta, Srinivasan, Shi and Davis CVPR 2009 64
  • 65. Take home message Hierarchical statistical model   Use when   Low-level action detectors are noisy   Structure of activity is sequential   Integrating dynamics   Not for   Activities with deep hierarchical structure   Activities with complex temporal structure 65
  • 66. Contrasting hierarchical approaches Actions as: Activities as: Model Characteristic probabilistic Robust to Statistic paths DBN states uncertainty discrete Describes Syntactic strings CFG symbols deep hierarchy Encodes logical Descriptive sets CFG, MLN complex relationships logic 66
  • 67. References (not included in ACM survey paper)   W. Tsai and K.S. Fu. Attributed Grammar-A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition. SMC1980.   M. Brand. The "Inverse Hollywood Problem": From video to scripts and storyboards via causal analysis. AAAI 1997.   T. Wang, H. Shum, Y. Xu, N. Zheng. Unsupervised Analysis of Human Gestures. PRCM 2001.   C.G. Nevill-Manning, I.H. Witten. On-Line and Off-Line Heuristics for Inferring Hierarchies of Repetitions in Sequences. IEEE 2000.   K. Heller, Y.W. Teh and D. Gorur. Infinite Hierarchical Hidden Markov Model s. AISTATS 2009.   P. Liang, S. Petrov, M. Jordan, D. Klein. The Infinite PCFG using Hierarchical Dirichlet Processes. EMNLP 2007.   A. Gupta, N. Srinivasan, J.Shi and L.Davis. Understanding Videos, Constructing Plots - Learning a Visually Grounded Storyline Model from Annotated Videos. CVPR 2009. 67