Scalable Hadoop-Based Pooled Time Series of Big Video Data from the Deep Web


We contribute a scalable, open source implementation [1] of the Pooled Time Series (PoT) algorithm from CVPR 2015. The algorithm is evaluated on approximately 6800 human trafficking (HT) videos collected from the deep and dark web, and on an open dataset: the Human Motion Database (HMDB). We describe PoT, our motivation for using it on larger data, and the issues we encountered. Our new solution reimagines PoT as an Apache Hadoop-based algorithm. We demonstrate that our new Hadoop-based algorithm successfully identifies similar videos in the HT and HMDB datasets, and we evaluate the algorithm qualitatively and quantitatively.

Information Retrieval
and Data Science
Paul Ramirez
paul.m.ramirez@jpl.nasa.gov
Madhav Sharan
msharan@usc.edu
ICMR 2017, Bucharest
Scalable Hadoop-Based Pooled
Time Series of Big Video Data
from the Deep Web
Dr. Chris Mattmann
mattmann@usc.edu
https://github.com/USCDataScience/hadoop-pot

ABOUT
Information Retrieval and Data Science (IRDS) Group
University of Southern California, Los Angeles, CA https://irds.usc.edu
Dr. Chris Mattmann
Director, IRDS
Chief Scientist, JPL
Madhav Sharan
Graduate Student, IRDS/JPL
Computer Science for Data Intensive Applications Group
Jet Propulsion Laboratory, Pasadena, CA
Paul Ramirez
Group Supervisor, JPL

OUTLINE
1. Introduction
2. Dataset
3. Hadoop PoT
4. Evaluation
5. Video Space
6. Thanks

INTRODUCTION
• Aim: create a scalable approach for calculating similarity between all pairs in a set of videos
• Builds on the Pooled Time Series (PoT) algorithm from CVPR 2015 by Dr. Michael Ryoo
• We present our datasets and the video-similarity use case, then our journey of scaling the algorithm on Hadoop

DATASET

HUMAN TRAFFICKING DATASET
Human Trafficking (HT) videos were crawled from internet escort ads on backpage.com
1. TOTAL SIZE - 26 GB
2. TOTAL VIDEOS - 6,805
3. AVERAGE VIDEO SIZE - 3.8 MB
4. TOTAL RECORDING LENGTH ≈ 2250 min
5. AVERAGE RECORDING LENGTH = 19.8 secs

HMDB DATASET
HMDB: a large Human Motion DataBase released as an open dataset by the Serre Lab
1. TOTAL SIZE - 1.9 GB
2. TOTAL VIDEOS - 7,000
3. AVERAGE VIDEO SIZE ≈ 0.5 MB
4. TOTAL RECORDING LENGTH ≈ 350 min
5. AVERAGE RECORDING LENGTH = 3.1 secs
This is an open, labeled dataset that we use to evaluate the similarity algorithm.

PoT Similarity

FEATURE EXTRACTION

SIMILARITY ALGORITHM
1. Enumerate all possible pairs of videos in the set
2. For each pair - Calculate mean distance
a. Compute HOF (histogram of optical flow) and HOG (histogram of oriented gradients) for both videos with OpenCV, or load them from the cache; cache any newly computed HOF and HOG
b. Calculate the Pooled Time Series feature for both videos
3. For each pair - Calculate chi-squared similarity
a. Use the cached HOF and HOG
b. Calculate the Pooled Time Series feature for both videos
c. Use the mean distance and both series to calculate a similarity score for the pair
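The two passes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the hadoop-pot code: it assumes each video's pooled feature is already available as a list of non-negative floats, and it assumes the final score is an exponential kernel over the chi-squared distance, scaled by the mean distance. The function names and the `eps` guard are hypothetical.

```python
import itertools
import math


def chi_squared_distance(x, y, eps=1e-10):
    """Chi-squared distance between two non-negative feature vectors."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(x, y))


def pairwise_similarity(features):
    """Map each unordered pair of video ids to a similarity score in (0, 1].

    features: dict of video id -> pooled feature vector (list of floats).
    """
    # Step 1: enumerate every unordered pair of videos.
    pairs = list(itertools.combinations(sorted(features), 2))

    # Step 2: first pass, the chi-squared distance per pair and its mean.
    distances = {
        (a, b): chi_squared_distance(features[a], features[b]) for a, b in pairs
    }
    mean_distance = sum(distances.values()) / len(distances)
    if mean_distance == 0:
        # All features identical: every pair is maximally similar.
        return {pair: 1.0 for pair in distances}

    # Step 3: second pass, turn each distance into a similarity score
    # (assumed kernel: exp(-distance / mean_distance)).
    return {pair: math.exp(-d / mean_distance) for pair, d in distances.items()}


if __name__ == "__main__":
    toy = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
    for pair, score in pairwise_similarity(toy).items():
        print(pair, round(score, 4))
```

The mean distance must be known before any score can be computed, which is why the work is split into two passes over the same set of pairs.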

PROBLEMS
1. Out of Memory (OoM) issues
2. Time-consuming sequential code
3. Instrumentation and checkpointing
4. Could only process 500 videos in 2 days
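A quick calculation shows why an all-pairs comparison gets expensive: the number of unordered pairs grows quadratically with the number of videos. The sketch below only counts pairs for the video totals quoted in this deck; it makes no claim about per-pair cost, and the labels in the list are ours.

```python
from math import comb


def pair_count(n_videos):
    """Number of unordered video pairs: n * (n - 1) / 2."""
    return comb(n_videos, 2)


if __name__ == "__main__":
    for name, n in [("500-video run", 500), ("HT", 6805), ("HMDB", 7000)]:
        print(f"{name}: {n} videos -> {pair_count(n):,} pairs")
```

Going from 500 videos to the 6,805 HT videos raises the pair count from 124,750 to 23,150,610, roughly 186 times as many comparisons.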

HADOOP PoT

HADOOP JOBS

CARTESIAN INPUT FORMAT
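As a rough illustration of the general idea behind a cartesian input format, the sketch below cuts the video list into splits and pairs every split with every split, so that each (left, right) block could be handed to a separate task. This is an assumption about the general technique, not a description of the hadoop-pot classes; the function names, the split size, and the `a < b` filter for emitting each unordered pair once are all hypothetical.

```python
from itertools import product


def make_splits(items, split_size):
    """Cut the input list into consecutive chunks of at most split_size items."""
    return [items[i:i + split_size] for i in range(0, len(items), split_size)]


def cartesian_tasks(items, split_size):
    """Pair every split with every split; each pair stands in for one task's input."""
    splits = make_splits(items, split_size)
    return list(product(splits, repeat=2))


def pairs_for_task(left_split, right_split):
    """Emit the video pairs of one block, keeping a < b so no pair is repeated."""
    for a in left_split:
        for b in right_split:
            if a < b:
                yield (a, b)


if __name__ == "__main__":
    videos = ["v1", "v2", "v3", "v4", "v5"]
    tasks = cartesian_tasks(videos, split_size=2)
    print(len(tasks), "tasks")
    for left, right in tasks:
        print(left, right, list(pairs_for_task(left, right)))
```

Each task only needs the records of its two splits, so no single worker has to hold the whole video set in memory.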

EVALUATION

OBSERVED RUNTIME
Total time for all Hadoop jobs:
HT - 33.18 hours
HMDB - 26.84 hours
[Chart annotations: "Time difference as per video length"; "Similar time for different video length"]

QUALITATIVE EVALUATION
1. Fetch the top 5 most similar videos for each video, as ranked by PoT
2. Record the number of those videos with the same label (True)
3. Recall = True/Total
4. Every label had its highest recall for its own label
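The procedure above can be sketched as follows. This is a minimal illustration, assuming similarity scores are stored per unordered pair and every video has exactly one label; the function name and the toy data are hypothetical, and "recall" is used in the sense defined on this slide (True/Total over the retrieved neighbours).

```python
from collections import defaultdict


def top_k_label_recall(similarity, labels, k=5):
    """Per-label share of top-k neighbours that carry the same label.

    similarity: dict of (video_a, video_b) -> score, one entry per unordered pair.
    labels: dict of video id -> label.
    """
    # Collect every video's neighbours together with their scores.
    neighbours = defaultdict(list)
    for (a, b), score in similarity.items():
        neighbours[a].append((score, b))
        neighbours[b].append((score, a))

    true_count = defaultdict(int)
    total = defaultdict(int)
    for video, scored in neighbours.items():
        # Keep the k highest-scoring neighbours of this video.
        for _, other in sorted(scored, reverse=True)[:k]:
            total[labels[video]] += 1
            if labels[other] == labels[video]:
                true_count[labels[video]] += 1

    return {label: true_count[label] / total[label] for label in total}


if __name__ == "__main__":
    labels = {"v1": "run", "v2": "run", "v3": "jump", "v4": "jump"}
    similarity = {
        ("v1", "v2"): 0.9, ("v1", "v3"): 0.1, ("v1", "v4"): 0.2,
        ("v2", "v3"): 0.3, ("v2", "v4"): 0.1, ("v3", "v4"): 0.8,
    }
    print(top_k_label_recall(similarity, labels, k=1))
```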

VIDEOSPACE

INTRODUCING VIDEOSPACE

SEARCH RESULTS PAGE

DETAILS POPUP

FUTURE WORK
1. Preprocessing videos
a. Removing banners at the start of a video
b. Dividing a video into a set of scenes
2. Adding convolutional features to enable object recognition, etc.; HOF and HOG are too simple

THANK YOU
Questions/Comments?
Madhav Sharan
msharan@usc.edu
@goyal_madhav
@smadha
Dr. Chris Mattmann
mattmann@usc.edu
@chrismattmann
https://github.com/USCDataScience/hadoop-pot

