SlideShare une entreprise Scribd logo
1  sur  17
HIPI: Computer Vision atLarge Scale Chris Sweeny Liu Liu
Intro to MapReduce SIMD at Scale Mapper / Reducer
MapReduce, Main Takeaway Data Centric, Data Centric, Data Centric!
Hadoop, a Java Impl An Implementation of MapReduce originated from Yahoo! The Cluster we worked at has 625.5 nodes, with map task capacity of 2502 and reduce task capacity of 834
Computer Vision at Scale The “computational vision” The sheer size of dataset: PCA of Natural Images (1992):  15 images, 4096 patches High-perf Face Detection (2007): 75,000 samples IM2GPS (2008):  6,472,304 images
HIPI Workflow
HIPI Image Bundle Setup Moral of the story: Many small files are killing the performance in distributed file system.
Redo PCA in Natural Images at Scale The first 15 principal components with 15 images (Hancock, 1992):
Redo PCA in Natural Images at Scale Comparison: Hancock, 1992 HIPI, 100 HIPI, 1,000 HIPI, 10,000 HIPI, 100,000
Optimize HIPI Performance Culling: because decompression is costly Decompress at need A boolean cull(ImageHeader header) method for conditional decompression
Culling, to inspect specific camera effects Canon Powershot S500, at 2592x1944
HIPI, Glance at Performance figures An empty job (only decompressing and looping over images), 5 run, using minimal figure, in seconds, lower is better:
HIPI, Glance at Performance figures Im2gray job (converting images to gray scale), 5 run, using minimal figure, in seconds, lower is better:
HIPI, Glance at Performance figures Covariance job (compute covariance matrix of patches, 100 patches per image), 1~3 run*, using minimal figure, in seconds, lower is better:
HIPI, Glance at Performance figures Culling job (decompressing all images V.S. decompressing images we care about), 1~3 run, using minimal figure, in seconds, lower is better:
Conclusion Everything at large scale gets better. HIPI provides an image-centric interface that performs on par or better than the leading alternative Cull method provides significant improvement and convenience HIPI offers noticeable improvements!
Future work Release HIPI as Opensource Project. Work on deep integration with Hadoop. Making HIPI work-load more configurable. Making work-load more balanced.

Contenu connexe

Similaire à Hipi: Computer Vision at Large Scale

Hadoop in the Cloud
Hadoop in the CloudHadoop in the Cloud
Hadoop in the CloudJim O'Neil
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsDataWorks Summit
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsDataWorks Summit
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 
NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018Masud Rahman
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionCloudera, Inc.
 
Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...SCAPE Project
 
Sydgraph presentation 2004
Sydgraph presentation 2004Sydgraph presentation 2004
Sydgraph presentation 2004Steve Smith
 
August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
August 2013 HUG: Compression Options in Hadoop - A Tale of TradeoffsAugust 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
August 2013 HUG: Compression Options in Hadoop - A Tale of TradeoffsYahoo Developer Network
 
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?J Langley
 
Implementing a parallel_open_cv_application_on_raspberry_pi3(1)
Implementing a parallel_open_cv_application_on_raspberry_pi3(1)Implementing a parallel_open_cv_application_on_raspberry_pi3(1)
Implementing a parallel_open_cv_application_on_raspberry_pi3(1)Rohith R
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-servicesSreenu Musham
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...Edge AI and Vision Alliance
 
AI+ Remote Sensing: Applying Deep Learning to Image Enhancement, Analytics, a...
AI+ Remote Sensing: Applying Deep Learning to Image Enhancement, Analytics, a...AI+ Remote Sensing: Applying Deep Learning to Image Enhancement, Analytics, a...
AI+ Remote Sensing: Applying Deep Learning to Image Enhancement, Analytics, a...Jui-Hsin (Larry) Lai
 

Similaire à Hipi: Computer Vision at Large Scale (20)

F07-Cloud-Hadoop-BAM
F07-Cloud-Hadoop-BAMF07-Cloud-Hadoop-BAM
F07-Cloud-Hadoop-BAM
 
Hadoop in the Cloud
Hadoop in the CloudHadoop in the Cloud
Hadoop in the Cloud
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 
Compression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of TradeoffsCompression Options in Hadoop - A Tale of Tradeoffs
Compression Options in Hadoop - A Tale of Tradeoffs
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 
RaspberryPiPresentation
RaspberryPiPresentationRaspberryPiPresentation
RaspberryPiPresentation
 
A hadoop map reduce
A hadoop map reduceA hadoop map reduce
A hadoop map reduce
 
NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018NLP2API: Replication package accepted by ICSME 2018
NLP2API: Replication package accepted by ICSME 2018
 
EclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An IntroductionEclipseCon Keynote: Apache Hadoop - An Introduction
EclipseCon Keynote: Apache Hadoop - An Introduction
 
Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...Matchbox tool. Quality control for digital collections – SCAPE Training event...
Matchbox tool. Quality control for digital collections – SCAPE Training event...
 
Sydgraph presentation 2004
Sydgraph presentation 2004Sydgraph presentation 2004
Sydgraph presentation 2004
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
August 2013 HUG: Compression Options in Hadoop - A Tale of TradeoffsAugust 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
August 2013 HUG: Compression Options in Hadoop - A Tale of Tradeoffs
 
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?The Big Data Puzzle, Where Does the Eclipse Piece Fit?
The Big Data Puzzle, Where Does the Eclipse Piece Fit?
 
Implementing a parallel_open_cv_application_on_raspberry_pi3(1)
Implementing a parallel_open_cv_application_on_raspberry_pi3(1)Implementing a parallel_open_cv_application_on_raspberry_pi3(1)
Implementing a parallel_open_cv_application_on_raspberry_pi3(1)
 
Presentation sreenu dwh-services
Presentation sreenu dwh-servicesPresentation sreenu dwh-services
Presentation sreenu dwh-services
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
“High-fidelity Conversion of Floating-point Networks for Low-precision Infere...
 
AI+ Remote Sensing: Applying Deep Learning to Image Enhancement, Analytics, a...
AI+ Remote Sensing: Applying Deep Learning to Image Enhancement, Analytics, a...AI+ Remote Sensing: Applying Deep Learning to Image Enhancement, Analytics, a...
AI+ Remote Sensing: Applying Deep Learning to Image Enhancement, Analytics, a...
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfOverkill Security
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 

Dernier (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 

Hipi: Computer Vision at Large Scale

  • 1. HIPI: Computer Vision atLarge Scale Chris Sweeny Liu Liu
  • 2. Intro to MapReduce SIMD at Scale Mapper / Reducer
  • 3. MapReduce, Main Takeaway Data Centric, Data Centric, Data Centric!
  • 4. Hadoop, a Java Impl An Implementation of MapReduce originated from Yahoo! The Cluster we worked at has 625.5 nodes, with map task capacity of 2502 and reduce task capacity of 834
  • 5. Computer Vision at Scale The “computational vision” The sheer size of dataset: PCA of Natural Images (1992): 15 images, 4096 patches High-perf Face Detection (2007): 75,000 samples IM2GPS (2008): 6,472,304 images
  • 7. HIPI Image Bundle Setup Moral of the story: Many small files are killing the performance in distributed file system.
  • 8. Redo PCA in Natural Images at Scale The first 15 principal components with 15 images (Hancock, 1992):
  • 9. Redo PCA in Natural Images at Scale Comparison: Hancock, 1992 HIPI, 100 HIPI, 1,000 HIPI, 10,000 HIPI, 100,000
  • 10. Optimize HIPI Performance Culling: because decompression is costly Decompress at need A boolean cull(ImageHeader header) method for conditional decompression
  • 11. Culling, to inspect specific camera effects Canon Powershot S500, at 2592x1944
  • 12. HIPI, Glance at Performance figures An empty job (only decompressing and looping over images), 5 run, using minimal figure, in seconds, lower is better:
  • 13. HIPI, Glance at Performance figures Im2gray job (converting images to gray scale), 5 run, using minimal figure, in seconds, lower is better:
  • 14. HIPI, Glance at Performance figures Covariance job (compute covariance matrix of patches, 100 patches per image), 1~3 run*, using minimal figure, in seconds, lower is better:
  • 15. HIPI, Glance at Performance figures Culling job (decompressing all images V.S. decompressing images we care about), 1~3 run, using minimal figure, in seconds, lower is better:
  • 16. Conclusion Everything at large scale gets better. HIPI provides an image-centric interface that performs on par or better than the leading alternative Cull method provides significant improvement and convenience HIPI offers noticeable improvements!
  • 17. Future work Release HIPI as Opensource Project. Work on deep integration with Hadoop. Making HIPI work-load more configurable. Making work-load more balanced.