Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web

We contribute a scalable, open source implementation [1] of the Pooled Time Series (PoT) algorithm from CVPR 2015. The algorithm is evaluated on approximately 6800 human trafficking (HT) videos collected from the deep and dark web, and on an open dataset: the Human Motion Database (HMDB). We describe PoT, our motivation for using it on larger data, and the issues we encountered. Our new solution reimagines PoT as an Apache Hadoop-based algorithm. We demonstrate that our new Hadoop-based algorithm successfully identifies similar videos in the HT and HMDB datasets, and we evaluate the algorithm qualitatively and quantitatively.



  1. Information Retrieval and Data Science. Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web. ICMR 2017, Bucharest. Paul Ramirez (paul.m.ramirez@jpl.nasa.gov), Madhav Sharan (msharan@usc.edu), Dr. Chris Mattmann (mattmann@usc.edu). https://github.com/USCDataScience/hadoop-pot
  2. ABOUT
      • Information Retrieval and Data Science (IRDS) Group, University of Southern California, Los Angeles, CA (https://irds.usc.edu)
      • Computer Science for Data Intensive Applications Group, Jet Propulsion Laboratory, Pasadena, CA
      • Dr. Chris Mattmann: Director, IRDS; Chief Scientist, JPL
      • Madhav Sharan: Graduate Student, IRDS/JPL
      • Paul Ramirez: Group Supervisor, JPL
  3. OUTLINE
      1. Introduction
      2. Dataset
      3. Hadoop PoT
      4. Evaluation
      5. Video Space
      6. Thanks
  4. INTRODUCTION
      • Aim: create a scalable approach for calculating the similarity between all pairs in a set of videos
      • Builds on the Pooled Time Series (PoT) algorithm presented at CVPR 2015 by Dr. Michael Ryoo
      • We present our datasets and the video-similarity use case, then our journey of scaling the algorithm on Hadoop
  5. DATASET
  6. HUMAN TRAFFICKING DATASET
      Human trafficking (HT) videos were crawled from online escort ads on backpage.com.
      1. Total size: 26 GB
      2. Total videos: 6805
      3. Average video size: 3.8 MB
      4. Total recording length ≈ 2250 min
      5. Average recording length = 19.8 s
  7. HMDB DATASET
      HMDB: a large Human Motion DataBase released as open source by the Serre Lab. This open, labeled dataset is used to evaluate the similarity algorithm.
      1. Total size: 1.9 GB
      2. Total videos: 7,000
      3. Average video size ≈ 0.5 MB
      4. Total recording length ≈ 350 min
      5. Average recording length = 3.1 s
  8. PoT Similarity
  9. FEATURE EXTRACTION
  10. SIMILARITY ALGORITHM
      1. Enumerate all possible pairs of videos across the whole video set
      2. For each pair, calculate the mean distance
          a. Calculate HOF (histogram of optical flow) and HOG (histogram of oriented gradients) for both videos using OpenCV, or reuse cached values; cache any newly computed HOF and HOG
          b. Calculate the pooled time series feature for both videos
      3. For each pair, calculate the chi-squared similarity
          a. Use the cached HOF and HOG
          b. Calculate the pooled time series feature for both videos
          c. Use the mean distance and both series to calculate a similarity score for the pair
  11. PROBLEMS
      1. Out of memory (OoM) issues
      2. Time-consuming sequential code
      3. Instrumentation and checkpointing
      4. Could only process 500 videos in 2 days
  12. HADOOP PoT
  13. HADOOP JOBS
  14. CARTESIAN INPUT FORMAT
  15. EVALUATION
  16. OBSERVED RUNTIME
      Total time for all Hadoop jobs:
      • HT: 33.18 hours
      • HMDB: 26.84 hours
      Time difference as per video length. Similar time for different video lengths.
  17. QUALITATIVE EVALUATION
      1. Fetch the top 5 most similar videos as per PoT
      2. Record the number of videos with the same label (True)
      3. Recall = True/Total
      4. Every label had its highest recall for its own label
  18. VIDEOSPACE
  19. INTRODUCING VIDEOSPACE
  20. SEARCH RESULTS PAGE
  21. DETAILS POPUP
  22. FUTURE WORK
      1. Preprocessing videos
          1. Removing banners at the start of a video
          2. Dividing a video into a set of scenes
      2. Adding convolutional features to enable object recognition etc. HOF and HOG are too simple
  23. THANK YOU
      Questions/Comments?
      • Madhav Sharan: msharan@usc.edu, @goyal_madhav, @smadha
      • Dr. Chris Mattmann: mattmann@usc.edu, @chrismattmann
      • https://github.com/USCDataScience/hadoop-pot
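
The pairwise scoring on the SIMILARITY ALGORITHM slide and the top-5 label check on the QUALITATIVE EVALUATION slide can be sketched in a few lines of plain Python. This is a minimal single-machine illustration, not the Hadoop implementation from the repository. It assumes the pooled feature vector of each video is already available as a list of non-negative numbers, it assumes the similarity score has the form exp(-distance / mean distance) (the slides only say the mean distance is used), and it reads "Recall = True/Total" as the fraction of top-k neighbours that share the query video's label. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import exp


def chi_squared_distance(a, b):
    """Chi-squared distance between two non-negative feature vectors."""
    total = 0.0
    for x, y in zip(a, b):
        if x + y > 0:  # skip bins that are empty in both vectors
            total += (x - y) ** 2 / (x + y)
    return 0.5 * total


def pairwise_similarity(features):
    """Score every unordered pair of videos.

    Pass 1 computes the mean chi-squared distance over all pairs.
    Pass 2 converts each distance to a score with exp(-d / mean),
    an assumed kernel form that the slides do not spell out.
    """
    pairs = list(combinations(sorted(features), 2))
    if not pairs:
        return {}
    dist = {(a, b): chi_squared_distance(features[a], features[b]) for a, b in pairs}
    mean = sum(dist.values()) / len(dist)
    if mean == 0:  # all feature vectors identical
        return {pair: 1.0 for pair in pairs}
    return {pair: exp(-d / mean) for pair, d in dist.items()}


def top_k_recall(scores, labels, k=5):
    """Per label: share of top-k neighbours that carry the query's label."""
    neighbours = {video: [] for video in labels}
    for (a, b), score in scores.items():
        neighbours[a].append((score, b))
        neighbours[b].append((score, a))
    hits, totals = {}, {}
    for video, ranked in neighbours.items():
        top = sorted(ranked, reverse=True)[:k]  # highest score first
        label = labels[video]
        hits[label] = hits.get(label, 0) + sum(labels[o] == label for _, o in top)
        totals[label] = totals.get(label, 0) + len(top)
    return {label: hits[label] / totals[label] for label in totals if totals[label]}


# Toy data (hypothetical): two-bin feature vectors for four labeled clips.
features = {
    "wave_a": [4.0, 0.0],
    "wave_b": [3.0, 1.0],
    "kick_a": [0.0, 4.0],
    "kick_b": [1.0, 3.0],
}
labels = {"wave_a": "wave", "wave_b": "wave", "kick_a": "kick", "kick_b": "kick"}

scores = pairwise_similarity(features)
recall = top_k_recall(scores, labels, k=1)
print(recall)
```

Enumerating pairs with `combinations` scores each unordered pair once, which is the quadratic step that the sequential version struggled with and that the Hadoop jobs distribute.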
