Information Retrieval and Data Science
Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web
Paul Ramirez (paul.m.ramirez@jpl.nasa.gov)
Madhav Sharan (msharan@usc.edu)
Dr. Chris Mattmann (mattmann@usc.edu)
ICMR 2017, Bucharest
https://github.com/USCDataScience/hadoop-pot
ABOUT
Information Retrieval and Data Science (IRDS) Group
University of Southern California, Los Angeles, CA (https://irds.usc.edu)
Computer Science for Data Intensive Applications Group
Jet Propulsion Laboratory, Pasadena, CA
Dr. Chris Mattmann: Director, IRDS; Chief Scientist, JPL
Madhav Sharan: Graduate Student, IRDS/JPL
Paul Ramirez: Group Supervisor, JPL
OUTLINE
1. Introduction
2. Dataset
3. Hadoop PoT
4. Evaluation
5. Video Space
6. Thanks
INTRODUCTION
• AIM: create a scalable approach for calculating the similarity between all pairs of videos in a set
• Builds on the Pooled Time Series (PoT) algorithm presented by Dr. Michael Ryoo at CVPR 2015
• We present our dataset and video-similarity use case, then describe how we scaled the algorithm on Hadoop
DATASET
HUMAN TRAFFICKING DATASET
HT (Human Trafficking) videos were crawled from escort ads on backpage.com.
1. TOTAL SIZE - 26 GB
2. TOTAL VIDEOS - 6,805
3. AVERAGE VIDEO SIZE - 3.8 MB
4. TOTAL RECORDING LENGTH ≈ 2250 min (≈ 37 hr)
5. AVERAGE RECORDING LENGTH = 19.8 secs
HMDB DATASET
HMDB: a large Human Motion DataBase, open-sourced by the Serre Lab
1. TOTAL SIZE - 1.9 GB
2. TOTAL VIDEOS - 7,000
3. AVERAGE VIDEO SIZE ≈ 0.5 MB
4. TOTAL RECORDING LENGTH ≈ 350 min (≈ 6 hr)
5. AVERAGE RECORDING LENGTH = 3.1 secs
This open-source, labeled dataset is used to evaluate the similarity algorithm.
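As a quick sanity check on the dataset figures, multiplying the number of videos by the average recording length gives the total recording length. The sketch below uses only the counts and averages stated on the slides (the HMDB count of 7,000 is the slide's rounded figure); the helper name is illustrative.

```python
# Sanity check: total recording length = number of videos x average length.
DATASETS = {
    "HT": {"videos": 6805, "avg_seconds": 19.8},
    "HMDB": {"videos": 7000, "avg_seconds": 3.1},  # 7,000 is a rounded count
}

def total_length(videos: int, avg_seconds: float) -> tuple[float, float]:
    """Return the total recording length as (minutes, hours)."""
    total_seconds = videos * avg_seconds
    return total_seconds / 60, total_seconds / 3600

if __name__ == "__main__":
    for name, d in DATASETS.items():
        minutes, hours = total_length(d["videos"], d["avg_seconds"])
        print(f"{name}: {minutes:,.0f} min = {hours:.1f} hr")
```

Both totals come out in minutes rather than hundreds of hours (about 2,246 min for HT and about 362 min for HMDB), which is consistent with the ≈ 2250 and ≈ 350 totals being minutes.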
PoT Similarity
FEATURE EXTRACTION
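Feature extraction turns each video into a time series of per-frame descriptors (for example, one HOF or HOG histogram per frame), which PoT then pools over time. The sketch below illustrates that pooling step in the spirit of the operators described in the PoT paper: sum pooling, max pooling, and counts of positive and negative time-series gradients, applied over a temporal pyramid of 1, 2, 4, ... segments. The pyramid depth, the exact operator set, and the function names are assumptions for illustration, not the hadoop-pot implementation.

```python
from typing import List

Series = List[List[float]]  # frames x dims, e.g. one HOF or HOG histogram per frame

def pool_segment(segment: Series) -> List[float]:
    """Apply four pooling operators to every dimension of a non-empty segment."""
    pooled: List[float] = []
    for d in range(len(segment[0])):
        values = [frame[d] for frame in segment]
        grads = [b - a for a, b in zip(values, values[1:])]
        pooled.append(sum(values))                            # sum pooling
        pooled.append(max(values))                            # max pooling
        pooled.append(float(sum(1 for g in grads if g > 0)))  # positive-gradient count
        pooled.append(float(sum(1 for g in grads if g < 0)))  # negative-gradient count
    return pooled

def pooled_time_series(series: Series, levels: int = 3) -> List[float]:
    """Concatenate pooled features over a temporal pyramid (1, 2, 4, ... segments)."""
    if not series:
        raise ValueError("series must contain at least one frame")
    n = len(series)
    feature: List[float] = []
    for level in range(levels):
        parts = 2 ** level
        for p in range(parts):
            start = min(p * n // parts, n - 1)
            end = max((p + 1) * n // parts, start + 1)  # at least one frame per segment
            feature.extend(pool_segment(series[start:end]))
    return feature
```

With three pyramid levels, a video whose frames have `d` descriptor dimensions yields a vector of 4 × d × 7 values, regardless of how many frames it has; this fixed length is what makes videos of different durations directly comparable.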
SIMILARITY ALGORITHM
1. Form all possible pairs of videos across the whole video set (the Cartesian product of the set with itself)
2. For each pair, calculate the mean distance
a. Compute HOF (histogram of optical flow) and HOG (histogram of oriented gradients) features for both videos using OpenCV, or reuse cached values; cache any newly computed HOF and HOG
b. Compute the pooled time-series feature for both videos
3. For each pair, calculate the chi-squared similarity
a. Use the cached HOF and HOG
b. Compute the pooled time-series feature for both videos
c. Use the mean distance and both time series to calculate a similarity score for the pair
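The steps above can be sketched on a single machine as follows. This is a simplified illustration, not the hadoop-pot code: it forms every ordered pair of videos, caches each video's feature vector so it is computed only once, computes a chi-squared distance between feature vectors, and normalizes each distance by the mean distance over all pairs. The `extract` callback (standing in for OpenCV HOF/HOG extraction plus pooling) and the final `1 / (1 + d / mean)` mapping from distance to similarity are assumptions for illustration.

```python
from functools import lru_cache
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

def chi_squared(a: Sequence[float], b: Sequence[float], eps: float = 1e-10) -> float:
    """Chi-squared distance between two non-negative feature vectors."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))

def pairwise_similarity(
    videos: List[str],
    extract: Callable[[str], Tuple[float, ...]],
) -> Dict[Tuple[str, str], float]:
    # Cache features so each video is processed once, however many pairs it is in.
    cached = lru_cache(maxsize=None)(extract)
    # Step 1: all ordered pairs (Cartesian product of the set with itself).
    pairs = list(product(videos, repeat=2))
    # Step 2: distance for every pair, then the mean distance over all pairs.
    dist = {(a, b): chi_squared(cached(a), cached(b)) for a, b in pairs}
    mean = sum(dist.values()) / len(dist)
    # Step 3: normalize by the mean and map distance to a similarity in (0, 1].
    return {p: 1.0 / (1.0 + d / mean) if mean > 0 else 1.0 for p, d in dist.items()}

if __name__ == "__main__":
    features = {"a.mp4": (1.0, 0.0), "b.mp4": (0.0, 1.0), "c.mp4": (1.0, 0.1)}
    sims = pairwise_similarity(list(features), features.__getitem__)
    for pair, score in sorted(sims.items()):
        print(pair, round(score, 3))
```

Normalizing by the mean distance makes scores comparable across feature types whose raw distances live on different scales, which is one reason the algorithm computes the mean before the final similarity.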
PROBLEMS
1. Out-of-memory (OoM) errors
2. Time-consuming sequential code
3. Instrumentation and checkpointing
4. Could process only 500 videos in 2 days
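Part of the scaling problem is that all-pairs similarity grows quadratically with the number of videos. The arithmetic below counts ordered pairs (n², as produced by a Cartesian product) and unordered pairs without self-matches (n(n-1)/2), for the 500 videos processed sequentially and for the full 6,805-video HT set. The helper names are illustrative.

```python
def ordered_pairs(n: int) -> int:
    """Pairs from a Cartesian product of the set with itself, including (v, v)."""
    return n * n

def unordered_pairs(n: int) -> int:
    """Distinct pairs {u, v} with u != v."""
    return n * (n - 1) // 2

if __name__ == "__main__":
    for n in (500, 6805):
        print(f"{n} videos: {ordered_pairs(n):,} ordered, {unordered_pairs(n):,} unordered")
    # Going from 500 to 6,805 videos multiplies the pair count by (6805 / 500) ** 2.
    print(f"growth factor: {ordered_pairs(6805) / ordered_pairs(500):.0f}x")
```

The pair count grows by roughly 185×. If runtime scaled with the number of pairs, the 2-day sequential run on 500 videos would stretch to roughly a year for the full HT set.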
HADOOP PoT
HADOOP JOBS
CARTESIAN INPUT FORMAT
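One common way to build a Cartesian input format for a MapReduce job is to pair up input splits rather than individual records. Each map task receives one left split and one right split, and its record reader iterates over every combination of their records. The sketch below imitates that idea in plain Python, with lists standing in for Hadoop input splits. The split size and helper names are hypothetical, and this is not the hadoop-pot `InputFormat`.

```python
from itertools import product
from typing import Iterator, List, Tuple, TypeVar

T = TypeVar("T")

def make_splits(records: List[T], split_size: int) -> List[List[T]]:
    """Chunk the input into fixed-size splits (a stand-in for Hadoop input splits)."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]

def cartesian_splits(splits: List[List[T]]) -> List[Tuple[List[T], List[T]]]:
    """One composite split per (left, right) pair: k splits give k * k map tasks."""
    return list(product(splits, repeat=2))

def record_pairs(left: List[T], right: List[T]) -> Iterator[Tuple[T, T]]:
    """What one map task reads: each left record paired with each right record."""
    for a in left:
        for b in right:
            yield a, b

if __name__ == "__main__":
    videos = [f"v{i}.mp4" for i in range(5)]
    tasks = cartesian_splits(make_splits(videos, 2))  # 3 splits -> 9 map tasks
    pairs = [pair for left, right in tasks for pair in record_pairs(left, right)]
    print(len(tasks), len(pairs))  # prints "9 25": 9 map tasks, 5 * 5 video pairs
```

Pairing splits instead of records keeps the number of map tasks at k² for k splits, while each task streams its own share of the n² video pairs, so no single task has to hold the whole dataset in memory.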
EVALUATION
OBSERVED RUNTIME
Total time for all Hadoop jobs:
• HT - 33.18 hours
• HMDB - 26.84 hours
[Chart annotations: time differs with video length; similar time for different video lengths]
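To place these runtimes next to the sequential baseline (500 videos in 2 days), the arithmetic below converts each run into ordered video pairs per hour. It assumes every run compares all n² ordered pairs, that the sequential baseline also did all-pairs work, and uses the rounded HMDB count of 7,000. The resulting speedups are rough estimates, not measured figures from the slides.

```python
RUNS = {
    # name: (number of videos, wall-clock hours)
    "sequential (500 videos)": (500, 48.0),
    "Hadoop, HT": (6805, 33.18),
    "Hadoop, HMDB": (7000, 26.84),  # 7,000 is a rounded count
}

def pairs_per_hour(videos: int, hours: float) -> float:
    """Throughput in ordered video pairs (n * n) per wall-clock hour."""
    return videos * videos / hours

if __name__ == "__main__":
    base = pairs_per_hour(*RUNS["sequential (500 videos)"])
    for name, (n, h) in RUNS.items():
        rate = pairs_per_hour(n, h)
        print(f"{name}: {rate:,.0f} pairs/hour ({rate / base:.0f}x baseline)")
```

Under these assumptions, the Hadoop runs process pairs a few hundred times faster than the sequential code did.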
QUALITATIVE EVALUATION
1. Fetch the top 5 most similar videos according to PoT
2. Record the number of retrieved videos with the same label as the query (True)
3. Recall = True / Total
4. For every label, the highest recall was for its own label
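The evaluation steps above can be sketched as follows. For each query video, take its five most similar videos (excluding the query itself) and count the labels of the retrieved videos, grouped by the query's label. Dividing by the total number of videos retrieved for that query label gives the fraction reported as recall. Treating "Total" as the number of retrieved videos, excluding the query from its own results, and the dictionary-based data layout are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Tuple

def label_recall(
    labels: Dict[str, str],
    similarity: Dict[Tuple[str, str], float],
    k: int = 5,
) -> Dict[str, Dict[str, float]]:
    """For each query label, the fraction of top-k retrieved videos carrying each label."""
    hits: Dict[str, Counter] = defaultdict(Counter)
    for query, query_label in labels.items():
        # Rank every other video by its PoT similarity to the query, highest first.
        others = [v for v in labels if v != query]
        top = sorted(others, key=lambda v: similarity[(query, v)], reverse=True)[:k]
        hits[query_label].update(labels[v] for v in top)  # count retrieved labels
    return {
        q_label: {label: count / sum(counts.values()) for label, count in counts.items()}
        for q_label, counts in hits.items()
    }

def own_label_wins(recall: Dict[str, Dict[str, float]]) -> bool:
    """Step 4: is each label's highest recall achieved on that same label?"""
    return all(max(r, key=r.get) == q_label for q_label, r in recall.items())
```

Step 4 on the slide corresponds to `own_label_wins(label_recall(labels, similarity))` returning `True`.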
VIDEOSPACE
INTRODUCING VIDEOSPACE
SEARCH RESULTS PAGE
DETAILS POPUP
FUTURE WORK
1. Preprocessing videos
a. Removing banners at the start of a video
b. Dividing a video into a set of scenes
2. Adding convolutional features to enable object recognition and similar tasks, since HOF and HOG alone are too simple
THANK YOU
Questions/Comments?
Madhav Sharan: msharan@usc.edu, @goyal_madhav, @smadha
Dr. Chris Mattmann: mattmann@usc.edu, @chrismattmann
https://github.com/USCDataScience/hadoop-pot
