GazeObjectDetection.pptx

  1. GAZE OBJECT DETECTION
     Amartya Bhattacharya, Intern, Institute of Datability Science, Osaka University. Supervisor: Prof. Hajime Nagahara
  2. ABOUT ME
     I am Amartya Bhattacharya, currently an intern at the Institute of Datability Science, Osaka University. I graduated with a Bachelor's degree in Computer Science from the University of Calcutta, India. I have previously worked on computer vision, natural language processing, and multi-modal models, and these remain my research interests. This talk presents my project on gaze object detection.
  3. PROBLEM INTRODUCTION
     - Detect the objects that people are gazing at in videos.
     - Also detect whether a person is looking at another person present in the video.
     - From the detected gaze objects, the interaction between people can be studied.
  4. PREVIOUS WORKS
     - Recasens et al. 2015 proposed a method for estimating gaze point coordinates from image data.
     - Chong et al. 2018 proposed an improved model for the same problem.
     - Chong et al. 2020 provided the first spatiotemporal model for gaze estimation in videos.
     - Wang et al. 2022 provided the first gaze object detection model for images: they proposed an improved gaze estimation model and added a YOLO v5 late-fusion branch to it.
  5. COMPARISON OF PREVIOUS WORKS
     | Model | Type of data | Type of input | Angular error (°) | Type of problem |
     | --- | --- | --- | --- | --- |
     | Recasens et al. 2015 | Image | Image + head location (x, y, w, h) | 33.00 | Gaze estimation |
     | Chong et al. 2018 | Image | Image + head location (x, y, w, h) | 21.80 | Gaze estimation |
     | Chong et al. 2020 | Video + image | Image + head location (x, y, w, h) | 15.10 | Gaze estimation |
     | Wang et al. 2022 | Image | Image + head location (x, y, w, h) | 14.90 | Gaze object detection |
     Notes:
     1. The models were trained on the Gaze on Object Dataset, which contains only images and no videos.
     2. Chong et al. 2020 has both a spatial and a temporal part; to obtain results on image data, the temporal part was removed.
     Figure: the angular error is the angle at the center of the head between the directions to the ground-truth (GT) and predicted (pred) gaze points.
     References:
     - Recasens, Adria, et al. "Where are they looking?" Advances in Neural Information Processing Systems 28 (2015).
     - Chong, Eunji, et al. "Detecting attended visual targets in video." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
     - Wang, Binglu, et al. "GaTector: A Unified Framework for Gaze Object Prediction." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022.
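The angular error used in the comparison can be sketched as follows. This is a minimal illustration assuming 2D image coordinates for the head center and both gaze points; the function name `angular_error_deg` is hypothetical and not taken from any of the cited papers.

```python
import math

def angular_error_deg(head_center, gt_point, pred_point):
    """Angle in degrees, measured at the head center, between the
    ground-truth and the predicted gaze directions (2D points)."""
    gx, gy = gt_point[0] - head_center[0], gt_point[1] - head_center[1]
    px, py = pred_point[0] - head_center[0], pred_point[1] - head_center[1]
    norm_g = math.hypot(gx, gy)
    norm_p = math.hypot(px, py)
    if norm_g == 0 or norm_p == 0:
        raise ValueError("a gaze point coincides with the head center")
    cos_angle = (gx * px + gy * py) / (norm_g * norm_p)
    # Clamp to guard against tiny floating-point overshoots outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))
```

A prediction pointing in exactly the ground-truth direction gives an error of 0°, regardless of how far along that direction the predicted point lies.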
  6. SCOPE OF IMPROVEMENT
     - No existing model detects gaze objects in video data.
     - No model captures the interaction between people and objects, or between people.
     - All existing models require the head position as input, so the head bounding box must be marked in every frame of the video, which is a tedious task.
     - A 10-minute video at 30 fps with n people requires n × 30 × 60 × 10 = n × 18,000 annotations.
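The annotation count above is simple arithmetic, shown here as a small helper for other video lengths and frame rates. The function name and parameters are hypothetical, and the sketch assumes one head bounding box per person per frame.

```python
def annotations_needed(num_people, minutes, fps=30):
    """Head bounding boxes to annotate by hand, assuming one box
    per person in every frame of the video."""
    frames = fps * 60 * minutes
    return num_people * frames
```

For the example on the slide, `annotations_needed(1, 10)` gives 18,000 boxes per person.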
  7. AUTOMATIC GAZE OBJECT DETECTION
     Figure: whole architecture of gaze object detection. Blocks: Input Video, Head Tracking Module, Object Detection Model (YOLO v7), Spatiotemporal Model, Gaze Object Class Assignment, Gaze Object.
  8. HEAD TRACKING MODULE PART 1 - DETECTION
     - A novel head tracking module had to be proposed because no such model was available.
     - It is based on the object tracking principle: (I) detection in the initial frame, (II) detection in the next frame, (III) relating the objects in the current frame to those in the previous frame.
     - Head detection is done using a YOLO v5 model trained on the CrowdHuman dataset [1].
     References:
     1. Shao, Shuai, et al. "Crowdhuman: A benchmark for detecting human in a crowd." arXiv preprint arXiv:1805.00123 (2018).
  9. HEAD TRACKING MODULE PART 2 - DETECTION AND ASSIGNMENT
     - SOTA object tracking methods perform assignment by comparing feature vectors computed from the detections.
     - Feature vectors are generally calculated using a ResNet-50 [1] based model.
     - Comparing objects across frames is a re-identification task.
     - Here, feature vectors are calculated using the Omni-Scale Network (SOTA for person re-identification) [2].
     - A person re-identification method was chosen because no head re-identification model exists.
     - The intuition was that the model would extract the features that matter for re-identifying objects across frames, and thus for tracking.
     - The idea was validated after completion of the work by the latest SOTA paper [3].
     - The model tracked heads successfully and also performed well under occlusion.
     References:
     1. K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
     2. K. Zhou, Y. Yang, A. Cavallaro and T. Xiang, "Learning Generalisable Omni-Scale Representations for Person Re-Identification," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5056-5069, 1 Sept. 2022, doi: 10.1109/TPAMI.2021.3069237.
     3. Du, Yunhao, et al. "Strongsort: Make deepsort great again." IEEE Transactions on Multimedia (2023).
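The assignment step can be illustrated with a small sketch that matches detections in the current frame to existing tracks by the cosine similarity of their feature vectors. Everything here is an assumption for illustration: the slides do not state the matching rule or threshold, the greedy matching stands in for whatever the author used (trackers in the DeepSORT family typically use Hungarian matching instead), and the feature vectors are plain lists of floats rather than real OSNet embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def associate(prev_tracks, detections, min_similarity=0.5):
    """Greedy re-identification between frames (hypothetical helper).

    prev_tracks: {integer track_id: feature vector} from the previous frame.
    detections:  list of feature vectors from the current frame.
    Returns {detection_index: track_id}; detections that match no
    existing track well enough receive fresh track IDs.
    """
    # Score every (track, detection) pair, most similar first.
    pairs = sorted(
        ((cosine_similarity(feat, det), tid, idx)
         for tid, feat in prev_tracks.items()
         for idx, det in enumerate(detections)),
        key=lambda p: p[0],
        reverse=True,
    )
    assignment, used_tracks = {}, set()
    for sim, tid, idx in pairs:
        if sim < min_similarity:
            break  # all remaining pairs are even less similar
        if tid in used_tracks or idx in assignment:
            continue  # each track and each detection is used at most once
        assignment[idx] = tid
        used_tracks.add(tid)
    # Unmatched detections start new tracks.
    next_id = max(prev_tracks, default=-1) + 1
    for idx in range(len(detections)):
        if idx not in assignment:
            assignment[idx] = next_id
            next_id += 1
    return assignment
```

With two existing tracks and three detections, two of which resemble the tracks, the two similar detections keep the old IDs and the third starts a new track.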
  10. HEAD TRACKING MODULE PART 3 - MODULE SUMMARY AND RESULTS
      Figures: head tracking module summary and results.
  11. OBJECT DETECTION
      Object detection is done using YOLO v7.
  12. SPATIOTEMPORAL MODEL FOR GAZE ESTIMATION
      - The model of Chong et al. 2020 [1] was implemented for gaze estimation.
      - It is the only model for gaze estimation in videos.
      - The model needs the image as well as the head bounding box coordinates as input.
      - It uses spatial as well as temporal features for gaze estimation.
      - The head bounding box obtained from the Head Tracking Module was passed into the model.
      - The model was trained on the Video Attention Dataset [1].
      Figure: Video Attention Network model for gaze estimation (figure label: (x, y)).
      References:
      1. Chong, Eunji, et al. "Detecting attended visual targets in video." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
  13. GAZE ESTIMATION RESULTS
      - The pre-trained spatiotemporal model was used to generate the gaze coordinates.
      - On image data, the spatial model's angular error (15.10°) is within 0.2° of that of GaTector (Wang et al. 2022, 14.90°), the only gaze object detection model.
      - GaTector, however, is not suitable for video data.
  14. GAZE OBJECT CLASS ASSIGNMENT
      - If a gaze point lies inside a bounding box, assign the gaze object the label associated with that bounding box.
      - If a gaze point lies inside multiple bounding boxes, assign the object class of the bounding box whose center is closest to the gaze point.
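The two assignment rules can be sketched in a few lines. This is an illustration under stated assumptions: boxes are given as corner coordinates `(x_min, y_min, x_max, y_max)`, the function name `assign_gaze_object` is hypothetical, and returning `None` when the gaze point falls outside every box is a choice made here, since the slides do not cover that case.

```python
def assign_gaze_object(gaze_point, detections):
    """Return the label of the detected object being gazed at.

    gaze_point: (x, y) predicted gaze coordinates.
    detections: list of (label, (x_min, y_min, x_max, y_max)).
    """
    gx, gy = gaze_point
    best_label, best_dist = None, float("inf")
    for label, (x_min, y_min, x_max, y_max) in detections:
        # Rule 1: only boxes that contain the gaze point are candidates.
        if not (x_min <= gx <= x_max and y_min <= gy <= y_max):
            continue
        # Rule 2: among candidates, prefer the box whose center is closest.
        cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
        dist = (gx - cx) ** 2 + (gy - cy) ** 2  # squared distance suffices
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

When a small object box sits inside a larger person box, a gaze point near the object's center resolves to the object rather than the person.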
  15. GAZE OBJECT DETECTION RESULTS
      Figure: gaze object detection on person/person interaction.
  16. GAZE OBJECT DETECTION RESULTS - UCL DATA
      Figure: gaze object detection in UCL data.
  17. GAZE OBJECT DETECTION RESULTS
      Figure: gaze object detection on public data.
  18. DISCUSSIONS
      - The model is the first of its kind for detecting gaze objects in a video.
      - It removes the need, observed in previous works, to manually annotate thousands of frames in order to obtain the gaze estimation.
      - It provides an opportunity to study the interaction between different people in a video.
  19. DRAWBACKS
      - The model was pre-trained on a predefined dataset; its generalization capability was decent.
      - Performance was observed to decrease as image quality decreases.
      - Noise in the images, such as masks or accessories, can affect the model.
  20. SCOPES OF IMPROVEMENT
      - The model works decently in most cases, but its performance is sensitive to noise such as masks and caps, and it is susceptible to false positives.
      - This effect was observed on the UCL data, where the model showed an accuracy of 67% (which metric is the right one to judge by is itself open to discussion).
      - Pre-trained models were used, so the generalization capability of the model is debatable; a novel model trained on a new dataset for object detection could improve performance.
  21. THANK YOU
      Amartya Bhattacharya, Intern, Institute of Datability Science, Osaka University
  22. SUPPLEMENTARY - OSNET
      - The model [1] learns features at different scales.
      - Stacking 3×3 convolutions helps it learn multi-scale features.
      - For re-identification, multi-scale features prove to be important.
      - Features are aggregated with different weights, which helps dynamic feature learning.
      Figures: bottleneck block for OSNet; whole architecture.
      References:
      1. K. Zhou, Y. Yang, A. Cavallaro and T. Xiang, "Learning Generalisable Omni-Scale Representations for Person Re-Identification," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5056-5069, 1 Sept. 2022, doi: 10.1109/TPAMI.2021.3069237.
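Why stacking 3×3 convolutions yields multiple scales can be seen from the receptive field: each additional stride-1 3×3 layer widens it by 2 pixels per side. The sketch below only computes that arithmetic; the helper name is hypothetical, and the choice of streams with one to four stacked layers is an illustrative assumption, not a detail taken from the slides.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field (pixels per side) of num_layers stacked
    stride-1 convolutions with the given square kernel size."""
    return 1 + num_layers * (kernel_size - 1)

# Assumed for illustration: parallel streams with 1 to 4 stacked 3x3 layers.
fields = [stacked_receptive_field(t) for t in range(1, 5)]
```

Under that assumption the streams see 3×3, 5×5, 7×7 and 9×9 neighborhoods, which is what lets a single block combine fine and coarse features.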