ISVC2015 paper
http://www.hirokatsukataoka.net/pdf/isvc15_kataoka_dt13feature.pdf
Activity recognition has been an active research topic in computer vision. Recently, the most successful approaches use dense trajectories, which extract a large number of trajectories and encode the features computed along them into codewords. In this paper, we evaluate various features in the framework of dense trajectories on several types of datasets. We implement 13 features in total, covering five different types of descriptors, namely motion-, shape-, texture-, trajectory- and co-occurrence-based feature descriptors. The experimental results show the relationship between feature descriptors and performance rate on each dataset. Different scenes of traffic, surgery, daily living and sports are used to analyze the feature characteristics. Moreover, on fine-grained datasets, we test how the performance rate of concatenated vectors differs when concatenating by feature type, by top-ranked features in the experiments, and across all 13 feature descriptors. Feature evaluation is beneficial not only for the activity recognition problem, but also for other domains in spatio-temporal recognition.
【ISVC2015】Evaluation of Vision-based Human Activity Recognition in Dense Trajectory Framework
1. Evaluation of Vision-based Human Activity Recognition in Dense Trajectory Framework
Hirokatsu Kataoka, Yoshimitsu Aoki†, Kenji Iwata, Yutaka Satoh
National Institute of Advanced Industrial Science and Technology (AIST)
† Keio University
http://www.hirokatsukataoka.net/
2. Background
Computer vision for human sensing
- Detection, Tracking, Trajectory Analysis
- Posture Estimation, Activity Recognition
- Action recognition is able to extend human sensing applications
[Figure labels: Mental state, Body situation, Attention, Activity analysis, Shaking hands, Look at people, Detection, Gaze estimation, Action recognition, Posture estimation, Face recognition, Trajectory extraction, Tracking]
3. Activity Recognition
“Activity” is a low-level primitive with semantic meaning
e.g. walking, running, sitting
Activity recognition
- Classification (location is given)
- e.g. "This image contains a man walking"
Activity detection
- Classification and localization
[Figure label: Walking]
4. Dense Trajectories (DT) [Wang+, IJCV2013]
• State-of-the-art space-time recognition approach
– State-of-the-art: DT + Deep Learning [THUMOS2015]
– Usable motion analyzer
– Simply, (i) flow tracker (ii) feature vectorization
Large amount of opt. flows
[THUMOS2015] http://www.thumos.info/results.html
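The two steps named on this slide, (i) flow tracking and (ii) feature vectorization, can be sketched roughly as follows. This is a minimal illustration, not the implementation of [Wang+, IJCV2013]: the per-frame flow field is a hypothetical callable `flow(x, y)`, the function names are invented for this example, and the 15-frame default and the normalization by total displacement are assumptions based on the usual description of DT.

```python
import math


def track_point(start, flows, length=15):
    """Follow one point through per-frame flow fields.

    flows: list of callables flow(x, y) -> (dx, dy), one per frame.
    This is a hypothetical stand-in for a dense optical flow field.
    """
    points = [start]
    for flow in flows[:length]:
        x, y = points[-1]
        dx, dy = flow(x, y)
        points.append((x + dx, y + dy))
    return points


def trajectory_descriptor(points):
    """Flatten the displacement vectors, normalized by total path length."""
    deltas = [(x1 - x0, y1 - y0)
              for (x0, y0), (x1, y1) in zip(points, points[1:])]
    total = sum(math.hypot(dx, dy) for dx, dy in deltas)
    if total == 0:
        # A static point carries no motion information.
        return [0.0] * (2 * len(deltas))
    return [c / total for d in deltas for c in d]


# Toy input: constant flow of 2 px to the right for 4 frames.
flows = [lambda x, y: (2.0, 0.0)] * 4
pts = track_point((10.0, 5.0), flows)
desc = trajectory_descriptor(pts)
```

In a real pipeline many such points are sampled on a dense grid and tracked in parallel, which is where the large amount of optical flow comes from.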
5. History of keypoint/traj.-based approach
• Space-time interest points (STIP) – DT
STIP: Space-time interest points
[Laptev et al., IJCV2005]
Dense Trajectories
[Wang et al., CVPR2
[Laptev et al., CVPR2008]
HOG + HOF on STIP
Feature Mining for Activity Recognition
[Gilbert et al., PAMI2011]
Cuboid
Features
[Dollar et al., PETS2005]
STR: Spatio-Temporal Relationship Match
[Ryoo et al., ICCV2009]
[Raptis et al., ECCV2010]
Tracklet Descriptors
6. STIP & DT: Sampling
• Space-time interest points (STIP) – DT
STIP: Space-time interest points
[Laptev et al., IJCV2005]
Dense Trajectories
[Wang et al., CVPR2011]
Action Bank
[Sadanand et al., CVPR2012]
[Laptev et al., CVPR2008]
HOG + HOF on STIP
Feature Mining for Activity Recognition
[Gilbert et al., PAMI2011]
Cuboid
Features
[Dollar et al., PETS2005]
STR: Spatio-Temporal Relationship Match
[Ryoo et al., ICCV2009]
[Raptis et al., ECCV2010]
Tracklet Descriptors
7. Co-occurrence features in DT
• Extended co-occurrence feature (ECoHOG)
– Feature
• CoHOG [Watanabe, PSIVT2009] (pair count), ECoHOG (edge-magnitude accumulation)
• PCA for codeword
• DT+Co-occurrence features (62.4%) > DT (59.2%) on MPII cooking
CoHOG
ECoHOG
H. Kataoka+, “Extended Co-occurrence HOG with Dense Trajectories for Fine-grained Activity Recognition”, in ACCV2014.
Need for more features!
Pose-based approach
Holistic approach
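The difference between pair counting (CoHOG) and edge-magnitude accumulation (ECoHOG) on slide 7 can be sketched for a single pixel offset as below. This is a simplified illustration under several assumptions: 8 orientation bins, central-difference gradients, only interior pixels, and an ECoHOG vote equal to the sum of the two gradient magnitudes. The function names are hypothetical, and block division, multiple offsets, and PCA are omitted.

```python
import math

N_BINS = 8  # number of orientation bins (assumed for this sketch)


def gradient_maps(img):
    """Return per-pixel orientation bins and gradient magnitudes."""
    h, w = len(img), len(img[0])
    bins = [[0] * w for _ in range(h)]
    mags = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mags[y][x] = math.hypot(gx, gy)
            angle = math.atan2(gy, gx) % (2 * math.pi)
            bins[y][x] = int(angle / (2 * math.pi) * N_BINS) % N_BINS
    return bins, mags


def cooccurrence(img, offset=(1, 0), weighted=False):
    """N_BINS x N_BINS co-occurrence matrix for one offset (dx, dy).

    weighted=False: each orientation pair votes 1 (CoHOG-style count).
    weighted=True: each pair votes the sum of both magnitudes
    (ECoHOG-style accumulation, as assumed here).
    """
    bins, mags = gradient_maps(img)
    h, w = len(img), len(img[0])
    dx, dy = offset
    hist = [[0.0] * N_BINS for _ in range(N_BINS)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            x2, y2 = x + dx, y + dy
            if 1 <= x2 < w - 1 and 1 <= y2 < h - 1:
                vote = mags[y][x] + mags[y2][x2] if weighted else 1.0
                hist[bins[y][x]][bins[y2][x2]] += vote
    return hist


# Toy input: 5x5 horizontal intensity ramp.
ramp = [[x for x in range(5)] for _ in range(5)]
cohog = cooccurrence(ramp, weighted=False)
ecohog = cooccurrence(ramp, weighted=True)
```

With magnitude voting, strong edges contribute more than weak ones, whereas pair counting treats every orientation pair equally.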
8. Proposal
• Feature evaluation for better performance
– Evaluation of 13 features under fair settings
– 5 categories
• Trajectory: traj. feature (originally in DT)
• Shape: HOG, SIFT
• Motion: HOF, MBHx, MBHy, MIP
• Texture: HLAC, LBP, iLBP, LTP
• Co-occurrence: CoHOG, ECoHOG
– 4 different datasets
• NTSEL (traffic)
• INRIA surgery (surgery)
• MSR daily activity 3D (daily living)
• UCF50 (sports)
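The evaluation grid on this slide can be written down as a small configuration, which also confirms that the five categories add up to 13 features. The variable names are hypothetical and only serve this illustration.

```python
# Features grouped by category, as listed on the slide.
FEATURES_BY_CATEGORY = {
    "trajectory": ["traj"],
    "shape": ["HOG", "SIFT"],
    "motion": ["HOF", "MBHx", "MBHy", "MIP"],
    "texture": ["HLAC", "LBP", "iLBP", "LTP"],
    "co-occurrence": ["CoHOG", "ECoHOG"],
}

# Datasets and the scene type each one covers.
DATASETS = {
    "NTSEL": "traffic",
    "INRIA surgery": "surgery",
    "MSR daily activity 3D": "daily living",
    "UCF50": "sports",
}

all_features = [f for feats in FEATURES_BY_CATEGORY.values() for f in feats]

# One evaluation run per (feature, dataset) pair.
runs = [(f, d) for f in all_features for d in DATASETS]
```

Here 1 + 2 + 4 + 4 + 2 = 13 features and 13 x 4 = 52 single-feature runs.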
15. Texture
• HLAC, LBP, iLBP, LTP
– HLAC: higher-order local auto-correlation, 0th-, 1st-, 2nd-order patterns [Otsu+, IAIP1988] [Kobayashi+, ICPR2004]
– LBP: texture binarization in a 3x3 patch [Ojala+, TPAMI2002]
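Texture binarization in a 3x3 patch, the basic LBP operation, can be sketched as follows. The clockwise bit order starting at the top-left neighbor and the `>=` threshold are conventions chosen for this example, and the function names are hypothetical. The variants iLBP and LTP are not covered.

```python
def lbp_code(patch):
    """8-bit LBP code of a 3x3 patch.

    Each neighbor is thresholded against the center pixel.
    """
    center = patch[1][1]
    # Clockwise from the top-left neighbor (bit order is a convention).
    neighbors = [
        patch[0][0], patch[0][1], patch[0][2], patch[1][2],
        patch[2][2], patch[2][1], patch[2][0], patch[1][0],
    ]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            patch = [row[x - 1:x + 2] for row in img[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist


flat = [[5] * 4 for _ in range(4)]  # constant 4x4 image
hist = lbp_histogram(flat)
```

A flat region maps every interior pixel to code 255, while an isolated bright center pixel maps to code 0.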
16. Co-occurrence
• Extended co-occurrence feature (ECoHOG)
– Feature
• CoHOG [Watanabe, PSIVT2009] (pair count), ECoHOG (edge-magnitude accumulation)
• PCA for codeword
• DT+Co-occurrence features (62.4%) > DT (59.2%) on MPII cooking
CoHOG ECoHOG
H. Kataoka+, “Extended Co-occurrence HOG with Dense Trajectories for Fine-grained Activity Recognition”, in ACCV2014.
17. Experiments
• Evaluation of 13 features in the dense trajectory framework
– 4 different datasets
• Traffic scene (NTSEL dataset): 4 classes
• Surgery (INRIA surgery): 4 classes
• Daily living (MSR daily activity 3D): 12 classes
• Sports (UCF50): 50 classes
18. Results on the 4 datasets
• High-performance features
– Top three features at each dataset
– 4 different scenes
19. Results on the 4 datasets
• High-performance features
– CoHOG, SIFT, MBH
– CoHOG achieves stable accuracy on all datasets
20. Detailed performance rate
• Performance depends on the recognition task!
– We need to experimentally concatenate several features
– Feature concatenation on the NTSEL and INRIA surgery
21. Rate of feature concatenation
• Baseline, 5 categories and concatenated vector
– Baseline: DT + BoW model
– Motion and co-occurrence feature
– No need to apply all features
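Concatenating per-feature BoW vectors, restricted to a top-N selection rather than all features, might look like the sketch below. The L1 normalization, the function names, the feature names `feat_a` to `feat_c`, and all numbers are placeholders for illustration only. They are not results or settings from the paper.

```python
def l1_normalize(hist):
    """Scale a histogram so its entries sum to 1 (no-op if all zero)."""
    total = sum(hist)
    return [v / total for v in hist] if total else list(hist)


def top_n(scores, n):
    """Names of the n features with the highest score."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


def concatenate(histograms, selected):
    """Concatenate the normalized histograms of the selected features."""
    vector = []
    for name in selected:
        vector.extend(l1_normalize(histograms[name]))
    return vector


# Placeholder BoW histograms and validation scores (made-up values).
histograms = {"feat_a": [2, 2], "feat_b": [1, 3], "feat_c": [4, 0]}
scores = {"feat_a": 0.1, "feat_b": 0.3, "feat_c": 0.2}

selected = top_n(scores, 2)
vector = concatenate(histograms, selected)
```

Normalizing each histogram before concatenation keeps a feature with many codewords from dominating the combined vector.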
22. Conclusion
• We evaluated 13 features in the framework of DT
– For more effective activity recognition
– 4 different scenes at each dataset
– Detailed evaluation and concatenated vectors
– Top-N ranked concatenation is needed for activity recognition
23. Feature extraction
Around trajectories
– Extraction of 13 features in ST-patch
– 2 (x dir.) x 2 (y dir.) x 3 (t dir.) region
– Calculating features with bag-of-words (BoW)
ST-patch and xyt block extraction
Extraction of 13 features
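The 2 x 2 x 3 division of the ST-patch and the BoW step can be sketched as follows. The patch size of 32 pixels and length of 15 frames in the usage lines are assumptions, the codebook is a toy example, and the function names are hypothetical. In practice the codebook would be learned by clustering training descriptors.

```python
def cell_index(x, y, t, size, length, nx=2, ny=2, nt=3):
    """Map a position inside an ST-patch to one of nx * ny * nt cells.

    x, y lie in [0, size) and t lies in [0, length).
    """
    cx = min(int(x * nx / size), nx - 1)
    cy = min(int(y * ny / size), ny - 1)
    ct = min(int(t * nt / length), nt - 1)
    return (ct * ny + cy) * nx + cx


def bow_histogram(descriptors, codebook):
    """Count how many descriptors fall nearest to each codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        hist[dists.index(min(dists))] += 1
    return hist


# Assumed patch geometry: 32 x 32 pixels over 15 frames.
first_cell = cell_index(0, 0, 0, size=32, length=15)
last_cell = cell_index(31, 31, 14, size=32, length=15)

# Toy codebook with two codewords and three descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0]]
hist = bow_histogram([[0.0, 0.0], [1.0, 1.0], [0.9, 1.2]], codebook)
```

Each of the 12 cells gets its own per-feature descriptor, and the concatenation over cells is what the codebook quantizes.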