High Performance Parallel Computing with Clouds and Cloud Technologies
CloudComp 09, Munich, Germany
Jaliya Ekanayake (1), Geoffrey Fox (1, 2)
{jekanaya,gcf}@indiana.edu
(1) School of Informatics and Computing, (2) Pervasive Technology Institute
Indiana University Bloomington

Acknowledgements
- Joe Rinkovsky and Jenett Tillotson at IU UITS
- SALSA Team, Pervasive Technology Institute, Indiana University: Scott Beason, Xiaohong Qiu, Thilina Gunarathne

Computing in Clouds
- Commercial clouds: Amazon EC2, GoGrid, 3Tera
- Private clouds: Eucalyptus (open source), Nimbus, Xen
- Some benefits:
  - On-demand allocation of resources (pay per use)
  - Customizable virtual machines (VMs): any software configuration, root/administrative privileges
  - Provisioning happens in minutes, compared to hours in traditional job queues
  - Better resource utilization: no need to allocate a whole 24-core machine to perform a single-threaded R analysis
- Access to computation power is no longer a barrier.

Cloud Technologies/Parallel Runtimes
- Cloud technologies
  - E.g. Apache Hadoop (MapReduce), Microsoft DryadLINQ, MapReduce++ (earlier known as CGL-MapReduce)
  - Moving computation to data
  - Distributed file systems (HDFS, GFS)
  - Better quality of service (QoS) support
  - Simple communication topologies
- Most HPC applications use MPI
  - Variety of communication topologies
  - Typically use fast (or dedicated) network settings

Applications & Different Interconnection Patterns
[Figure: interconnection patterns. Recoverable labels: Input, map, reduce, Output, iterations, Pij, "Domain of MapReduce and Iterative Extensions", MPI]

MapReduce++ (earlier known as CGL-MapReduce)
- In-memory MapReduce
- Streaming-based communication
  - Avoids file-based communication mechanisms
- Cacheable map/reduce tasks
  - Static data remains in memory
- Combine phase to combine reductions
- Extends the MapReduce programming model to iterative MapReduce applications

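The iterative MapReduce pattern listed for MapReduce++ (cached static data, a combine phase, repeated map/reduce rounds) can be illustrated with a minimal single-process sketch. This is not the MapReduce++ API: the function names `map_task`, `reduce_task`, `combine`, and `iterative_kmeans` are hypothetical, K-means is chosen only because the deck uses it as an iterative example, and plain Python lists stand in for partitions that a real runtime would keep in memory on worker nodes.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_task(points, centers):
    """Map: for one cached partition, emit per-center (coordinate sums, count)."""
    partial = {}
    for p in points:
        idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        sums, count = partial.get(idx, ([0.0] * len(p), 0))
        partial[idx] = ([s + x for s, x in zip(sums, p)], count + 1)
    return partial


def reduce_task(partials):
    """Reduce: merge the partial sums emitted by all map tasks."""
    merged = {}
    for partial in partials:
        for idx, (sums, count) in partial.items():
            msums, mcount = merged.get(idx, ([0.0] * len(sums), 0))
            merged[idx] = ([a + b for a, b in zip(msums, sums)], mcount + count)
    return merged


def combine(merged, old_centers):
    """Combine: turn merged sums into new centers; empty clusters keep the old center."""
    new_centers = []
    for i, old in enumerate(old_centers):
        if i in merged:
            sums, count = merged[i]
            new_centers.append(tuple(s / count for s in sums))
        else:
            new_centers.append(tuple(old))
    return new_centers


def iterative_kmeans(partitions, centers, max_iters=20, tol=1e-12):
    """Driver loop: partitions are the static data; only centers change per iteration."""
    for _ in range(max_iters):
        partials = [map_task(part, centers) for part in partitions]
        new_centers = combine(reduce_task(partials), centers)
        shift = max(sq_dist(a, b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers


if __name__ == "__main__":
    data = [[(0, 0, 0), (0, 2, 0)], [(10, 10, 0), (10, 12, 0)]]
    print(iterative_kmeans(data, [(1, 1, 0), (9, 9, 0)]))
```

Only the small list of centers moves between iterations in this sketch, while the point partitions are reused unchanged, which is the property the "static data remains in memory" bullet refers to.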
What I will present next
- Our experience in applying cloud technologies to:
  - EST (Expressed Sequence Tag) sequence assembly program: CAP3
  - HEP: processing large columns of physics data using ROOT
  - K-means clustering
  - Matrix multiplication
- Performance analysis of MPI applications using a private cloud environment

Cluster Configurations
[Table: cluster configurations for DryadLINQ and for Hadoop / MPI / Eucalyptus; cell contents not recoverable]

Pleasingly Parallel Applications
[Figures: Performance of CAP3; Performance of HEP (High Energy Physics)]

Iterative Computations
[Figures: Performance of K-means; Parallel Overhead, Matrix Multiplication]

Performance analysis of MPI applications using a private cloud environment
- Eucalyptus and Xen based private cloud infrastructure
  - Eucalyptus version 1.4 and Xen version 3.0.3
  - Deployed on 16 nodes, each with 2 quad-core Intel Xeon processors and 32 GB of memory
  - All nodes are connected via 1 Gigabit connections
- Bare-metal and VMs use exactly the same software configuration
  - Red Hat Enterprise Linux Server release 5.2 (Tikanga) operating system
  - OpenMPI version 1.3.2 with gcc version 4.1.2

Different Hardware/VM configurations
[Table: hardware/VM configurations; contents not recoverable]
- Invariant used in selecting the number of MPI processes: number of MPI processes = number of CPU cores used

MPI Applications
[Table: MPI applications; only fragments (n, C, d, 1) survive]

Matrix Multiplication
[Figures: Performance, 64 CPU cores; Speedup, fixed matrix size (5184x5184)]
- Implements Cannon's Algorithm [1]
- Exchanges large messages
- More susceptible to bandwidth than latency
- At least 14% reduction in speedup between bare-metal and 1-VM per node
[1] S. Johnsson, T. Harris, and K. Mathur, "Matrix multiplication on the connection machine," in Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Reno, Nevada, United States, November 12-17, 1989). Supercomputing '89. ACM, New York, NY, 326-332. DOI= http://doi.acm.org/10.1145/76263.76298

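Cannon's algorithm, which the slide names as the matrix multiplication method, can be sketched as a single-process simulation of a q x q process grid. This is an illustration under stated assumptions, not the authors' MPI code: the names `cannon`, `split_blocks`, `matmul`, and `matadd` are hypothetical, nested lists stand in for the blocks each MPI process would own, and list re-indexing stands in for the block exchanges between neighbouring processes.

```python
def matmul(a, b):
    """Plain dense product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matadd(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def split_blocks(m, q):
    """Split an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    bs = len(m) // q
    return [[[row[j * bs:(j + 1) * bs] for row in m[i * bs:(i + 1) * bs]]
             for j in range(q)] for i in range(q)]


def cannon(a, b, q):
    """Multiply square matrices a and b on a simulated q x q process grid."""
    n = len(a)
    if n % q != 0:
        raise ValueError("matrix size must be divisible by the grid size")
    bs = n // q
    A, B = split_blocks(a, q), split_blocks(b, q)
    # Initial skew: block row i of A shifts left by i, block column j of B shifts up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * bs for _ in range(bs)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Each simulated process multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                C[i][j] = matadd(C[i][j], matmul(A[i][j], B[i][j]))
        # Shift A blocks one step left and B blocks one step up (the large messages).
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Reassemble the block grid into a single n x n matrix.
    return [sum((C[i][j][r] for j in range(q)), [])
            for i in range(q) for r in range(bs)]


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    b = [[2, 0, 1, 0], [0, 1, 0, 3], [1, 0, 2, 0], [0, 4, 0, 1]]
    print(cannon(a, b, 2) == matmul(a, b))
```

If the 64 processes were arranged as an 8 x 8 grid (an assumption; the slide gives only the core count and the matrix size), each process would hold 648 x 648 blocks and pass whole blocks to its neighbours at every step, which fits the slide's remark that this application exchanges large messages.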
Kmeans Clustering
[Figures: Performance, 128 CPU cores; Overhead = (P * T(P) - T(1)) / T(1)]
- Up to 40 million 3D data points
- Amount of communication depends only on the number of cluster centers
- Amount of communication << computation and the amount of data processed
- At the highest granularity, VMs show at least ~33% of total overhead
- Extremely large overheads for smaller grain sizes

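The overhead formula on this slide, Overhead = (P * T(P) - T(1)) / T(1), can be evaluated with a short helper. The function names and the timing values in the example are hypothetical and are not measurements from the talk; T(1) is the single-process run time and T(P) the run time on P processes.

```python
def parallel_overhead(p, t_p, t_1):
    """Overhead = (P * T(P) - T(1)) / T(1), as defined on the slide."""
    if p < 1 or t_p <= 0 or t_1 <= 0:
        raise ValueError("p must be >= 1 and run times must be positive")
    return (p * t_p - t_1) / t_1


def speedup(t_p, t_1):
    """Conventional speedup T(1) / T(P), shown for comparison."""
    return t_1 / t_p


if __name__ == "__main__":
    # Hypothetical timings: 100 s on one process, 30 s on four processes.
    print(parallel_overhead(4, 30.0, 100.0))
    print(speedup(30.0, 100.0))
```

An overhead of 0 corresponds to ideal scaling, and the definition is equivalent to P / speedup - 1, so an overhead of 0.33 means the P processes together spend about a third more CPU time than the sequential run.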
Concurrent Wave Equation Solver
[Figures: Performance, 64 CPU cores; Overhead = (P * T(P) - T(1)) / T(1)]
- Clear difference in performance and overheads between VMs and bare-metal
- Very small messages (the message size in each MPI_Sendrecv() call is only 8 bytes)
- More susceptible to latency
- At 40560 data points, at least ~37% of total overhead in VMs

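Why 8-byte messages are latency-bound while large block exchanges are bandwidth-bound can be seen with a simple linear cost model, time = latency + size / bandwidth. This model is a common approximation and is not taken from the talk. The 1 Gbit/s bandwidth matches the testbed description, but the 50 microsecond latency and the 3 MB message size are hypothetical values chosen only for illustration.

```python
# 1 Gbit/s link from the testbed description, expressed in bytes per second.
GIGABIT_BYTES_PER_S = 1e9 / 8


def message_time(size_bytes, latency_s, bandwidth_bytes_per_s=GIGABIT_BYTES_PER_S):
    """Linear cost model: fixed per-message latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def latency_share(size_bytes, latency_s, bandwidth_bytes_per_s=GIGABIT_BYTES_PER_S):
    """Fraction of the modelled message time that is due to latency alone."""
    return latency_s / message_time(size_bytes, latency_s, bandwidth_bytes_per_s)


if __name__ == "__main__":
    assumed_latency = 50e-6  # hypothetical per-message latency in seconds
    print(latency_share(8, assumed_latency))          # 8-byte message, as in the wave solver
    print(latency_share(3_000_000, assumed_latency))  # hypothetical large block exchange
```

Under these assumed numbers nearly all of the time for an 8-byte message is latency, so any extra per-message delay added by virtualization shows up almost directly in the run time, whereas for multi-megabyte messages the same delay is negligible next to the transfer time.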
Higher latencies - 1
- Configurations compared:
  - 1-VM per node: 8 MPI processes inside the VM
  - 8-VMs per node: 1 MPI process inside each VM
- domUs (VMs that run on top of Xen para-virtualization) are not capable of performing I/O operations
- dom0 (privileged OS) schedules and executes I/O operations on behalf of domUs
- More VMs per node => more scheduling => higher latencies

Higher latencies - 2
[Figure: Kmeans Clustering]
- Lack of support for in-node communication => "sequentializing" parallel communication
- Better support for in-node communication in OpenMPI: sm BTL (shared memory byte transfer layer)
- Both OpenMPI and LAM-MPI perform equally well in the 8-VMs per node configuration

Conclusions and Future Works
- Cloud technologies work for most pleasingly parallel applications
- Runtimes such as MapReduce++ extend MapReduce to the iterative MapReduce domain
- MPI applications experience moderate to high performance degradation (10% ~ 40%) in a private cloud
  - Dr. Edward Walker noticed (40% ~ 1000%) performance degradations in commercial clouds [1]
- Applications sensitive to latencies experience higher overheads
- Bandwidth does not seem to be an issue in private clouds
- More VMs per node => higher overheads
- In-node communication support is crucial
- Applications such as MapReduce may perform well on VMs?
[1] Walker, E.: Benchmarking Amazon EC2 for high-performance scientific computing, http://www.usenix.org/publications/login/2008-10/openpdfs/walker.pdf

Questions?
Thank You!
