PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW
EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES
Hassan Nawaz, Gideon Juve, Rafael Ferreira da Silva, Ewa Deelman
USC Information Sciences Institute
18th Workshop on Advances in Parallel and Distributed Computational Models
30th IEEE International Parallel & Distributed Processing Symposium
May 23, 2016 – Chicago, USA
OUTLINE
Introduction: Scientific workflows, Motivation, Goals
Experiment Conditions: Scientific application, Execution environment
Storage Configurations: Cloud storage, Virtual machine storage, Submit host
Evaluation: Makespan, Cumulative execution time, Data transfer time
Performance Benchmarking: Network I/O, SSD I/O
Summary: Conclusions, Future Research Directions
H. Nawaz, G. Juve, R. Ferreira da Silva, E. Deelman
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services
INTRODUCTION
Scientific Workflows
Large-scale computations
Provenance
Data-intensive Workflows
Characterized by tasks that consume large volumes of data; the application makespan is dominated by data movement and I/O operations
[Diagram: a workflow as a directed acyclic graph (DAG). Jobs are command-line programs; dependencies are usually data dependencies; common structures: split, merge, pipeline]
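The DAG view of a workflow can be sketched in a few lines of Python. This is only an illustration: the job names and the `workflow` dictionary are hypothetical, and it uses the standard-library `graphlib` module rather than anything from Pegasus. It shows one valid execution order and which jobs could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: a split job fans out to two jobs, then a merge job.
# Each key maps a job to the set of jobs it depends on (its data dependencies).
workflow = {
    "split": set(),
    "process_a": {"split"},
    "process_b": {"split"},
    "merge": {"process_a", "process_b"},
}


def execution_order(dag):
    """Return one valid sequential order that respects all dependencies."""
    return list(TopologicalSorter(dag).static_order())


def parallel_levels(dag):
    """Group jobs into batches; jobs in the same batch have no mutual dependency."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    levels = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
        levels.append(ready)
        sorter.done(*ready)
    return levels
```

Calling `parallel_levels(workflow)` groups the two middle jobs into one batch, which is the split/merge pattern named on the slide.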
MOTIVATION
Scientific Workflows
Campus clusters
National cyberinfrastructures
Cloud Computing
Predictable performance
Quality of the service
Challenge: running data-intensive applications on clouds
Clouds were not designed for the execution of
complex simulations and I/O-intensive applications
Advantages
On-demand resource provisioning
Ability to store virtual machines (VM)
Resource monitoring
Full control of the execution environment
GOALS
OCTOBER 2014
Therefore…
There is an incentive for researchers to explore
avenues to reduce the cost of executing
workflows, while increasing their efficiency
In this work…
Practical evaluation of the performance of an
I/O-intensive scientific workflow (Montage)
Cloud environments: Google Compute Engine and Amazon Web Services
Cost efficiency
Application deadline
Energy-aware scheduling
On-demand resource provisioning
…
SCIENTIFIC APPLICATION
Montage
10,429 jobs
reads 4.2 GB of input data
produces 7.9 GB of output data
Montage is an astronomy application that creates astronomical
image mosaics using data collected from telescopes
An illustrative representation of the
Montage workflow
A workflow instance operates over about 23K intermediate files, most of which are a few MB in size
[Figure: histogram of file sizes. x-axis: File Size (Bytes), ticks at 0, 256, 4K, 64K, 512K, 4M; y-axis: # Files; bar labels: 6, 6190, 2, 199, 5931, 10598]
EXECUTION ENVIRONMENT
Pegasus Workflow Management System
http://pegasus.isi.edu
Automates complex, multi-stage processing pipelines
Enables parallel, distributed computations
Automatically executes data transfers
Handles failures to provide reliability
VM Instance Types
Amazon: m3.2xlarge
Google: n1-standard-8
8 cores per node, 30 GB of memory, and 70 GB SSD
stage-in job: transfers the workflow input data
stage-out job: transfers the workflow output data
registration job: registers the workflow output data
cleanup job: removes unused data
DATA MOVEMENT TOOLS
aws-cli: Amazon standard client
gsutil: Google standard client
pegasus-s3: Pegasus standard client built on top of the standard Amazon API
In order to measure the actual overhead involved in data transfers, we initially limit the transfer mechanisms to single-threaded mode
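A measurement of this kind can be approximated with a small timing harness. The sketch below is not the authors' tooling: `time_command` and `mean_time` are hypothetical helpers, and the bucket and object names in the comments are placeholders. It only shows how the wall-clock time of one transfer command at a time could be recorded.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of arguments) once and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def mean_time(cmd, repetitions=3):
    """Run cmd sequentially several times and return the mean duration."""
    samples = [time_command(cmd) for _ in range(repetitions)]
    return sum(samples) / len(samples)


# Placeholder invocations (bucket and object names are made up):
#   ["aws", "s3", "cp", "s3://example-bucket/empty.dat", "empty.dat"]
#   ["gsutil", "cp", "gs://example-bucket/empty.dat", "empty.dat"]

if __name__ == "__main__":
    # Harmless stand-in command so the sketch runs without cloud credentials.
    print(mean_time([sys.executable, "-c", "pass"], repetitions=2))
```

Running the commands one after another, never concurrently, keeps each sample free of contention from the others.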
STORAGE CONFIGURATION DEPLOYMENTS
Cloud Storage: intermediate files are stored in object storage
Expected to be the most commonly used configuration in cloud environments due to its simplicity
However, storing all intermediate files in a storage service may be costly
[Diagram: submit host (e.g., user's laptop) and object storage]
STORAGE CONFIGURATION DEPLOYMENTS
VM Storage: intermediate files are stored locally on disks attached to the VM
Helps quantify the overhead of other storage configurations
Although it may reduce the monetary cost, it may not be scalable for very large workflow executions
[Diagram: submit host (e.g., user's laptop); Virtual Machine A and Virtual Machine B, each with a local disk]
STORAGE CONFIGURATION DEPLOYMENTS
Submit Host: intermediate files are stored at the submit host
Useful in low-cost scenarios, or for local analyses of the intermediate results
Unlikely to be used in real production workflows due to the high latency in transferring data to the submit host
[Diagram: submit host (e.g., user's laptop); typical OSG (Open Science Grid) sites]
OVERALL MAKESPAN EVALUATION
Average workflow makespan: the turnaround time for a workflow to complete its execution
[Table: Average makespan values for 3 runs of the Montage workflow for different storage configurations]
The VM Storage configuration outperforms all other configurations because no intermediate files are transferred
The performance gain from storing data locally is 400% compared to the Cloud Storage configuration, and up to 580% compared to the Submit Host configuration
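The slides do not define "performance gain". One common reading, assumed here and not confirmed by the source, is the relative increase in makespan over the VM Storage baseline. Under that assumption the quoted percentages translate into makespan ratios as follows:

```latex
\[
G = \frac{T_{\text{other}} - T_{\text{VM}}}{T_{\text{VM}}} \times 100\%
\quad\Longrightarrow\quad
T_{\text{other}} = \left(1 + \frac{G}{100\%}\right) T_{\text{VM}}
\]
\[
G = 400\% \Rightarrow T_{\text{cloud}} = 5\,T_{\text{VM}}, \qquad
G = 580\% \Rightarrow T_{\text{submit}} = 6.8\,T_{\text{VM}}
\]
```

If the authors instead meant a plain ratio of makespans, the same figures would read as roughly 4x and 5.8x.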
EVALUATION
The execution time for the VM Storage configuration
is larger due to the execution of local data movement
operations (waiting for concurrent I/O operations)
Data transfer operations become a bottleneck in
the Cloud Storage and Submit Host configurations
[Figure: Makespan, Execution Time, and Transfer Time in seconds for Amazon and Google; panels: VM Storage, Submit Host, Cloud Storage]
BENCHMARKING
Network I/O
Evaluated in the Cloud Storage scenario
(files are stored in an object store)
[Figure: Time series of download times from an object store to a VM (May 12, 2015 to May 18, 2015). x-axis: hourly downloads; y-axis: log10(time in seconds); series: aws-cli, gsutil, pegasus-s3]
The goal in transferring an empty file is to measure
the overhead induced by the system
The merge job stages in over 6K small files with
average size of 0.3KB
The performance of these operations is of utmost importance
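Why the empty-file overhead matters for the merge job can be seen with a simple cost model. The sketch below is an assumption-laden illustration, not a measurement: the 0.5 s per-request overhead and the 100 MB/s bandwidth are made-up values, and 0.3 KB is taken as roughly 300 bytes. The model charges every file a fixed request overhead plus its payload time.

```python
def transfer_time(n_files, file_size_bytes, overhead_s, bandwidth_bytes_per_s):
    """Estimated time to move n_files one after another, one request per file."""
    per_file = overhead_s + file_size_bytes / bandwidth_bytes_per_s
    return n_files * per_file


def overhead_fraction(file_size_bytes, overhead_s, bandwidth_bytes_per_s):
    """Share of each request spent on fixed overhead rather than on payload."""
    payload_s = file_size_bytes / bandwidth_bytes_per_s
    return overhead_s / (overhead_s + payload_s)


if __name__ == "__main__":
    # Hypothetical values: 6000 files of ~300 bytes, 0.5 s overhead, 100 MB/s.
    print(transfer_time(6000, 300, 0.5, 100e6))
    print(overhead_fraction(300, 0.5, 100e6))
```

With files this small, almost all of the modelled time is request overhead, which is why the time to transfer an empty file is a useful proxy.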
NETWORK I/O
pegasus-s3 performs better for small file sizes in most cases
Possible causes of the poor performance of the gsutil tool include network latency and increased load
[Figure: upload and download times. x-axis: File Size (0B, 10KB, 100KB, 1MB, 10MB, 100MB, 1GB); y-axis: Time (s); series: aws-cli, gsutil, pegasus-s3; panels: Upload, Download]
NETWORK I/O
We used tcpdump and Wireshark to trace TCP packets
We ran the transfer tools in debug mode and evaluated all request operations
[Figure: Bytes per 0.01 s transferred per TCP connection. x-axis: Time (s); y-axis: bytes per tick (0.01 s); series: pegasus-s3, aws-cli (TCP 1), aws-cli (TCP 2); annotations: TLS overhead, GET request]
The Amazon client (aws-cli) uses two TCP connections to perform a copy command, while pegasus-s3 uses only one
aws-cli uses HTTPS for all operations, which adds Transport Layer Security (TLS) overhead
gsutil also establishes two TCP connections, but issues two GET requests
BENCHMARKING
SSD I/O
Evaluated in the VM Storage scenario (files are stored in an attached disk)
Amazon General Purpose SSD / Google Persistent SSD
Executed dd with a block size of 4 MB and one thousand blocks
Performed 100 sequential iterations
[Figure: I/O throughput. x-axis: Iteration (1 to 100); y-axis: Throughput (MB/s); series: Google, Amazon; annotation: Amazon's burst tolerance policy]
Amazon provides a consistent baseline throughput of 3 IOPS per GB and handles bursts up to 3000 IOPS per volume
Alternative solution: Provisioned IOPS SSD Volume
Drawback: may significantly increase the cost
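The benchmark procedure can be approximated in Python. This is a rough stand-in for the dd runs, not the authors' script: the function names are hypothetical, and throughput is reported with 1 MB taken as 10^6 bytes. The helper `baseline_iops` simply applies the 3 IOPS per GB figure quoted above, which gives 210 IOPS for the 70 GB volumes used here.

```python
import os
import tempfile
import time


def baseline_iops(volume_gb, iops_per_gb=3.0):
    """Baseline IOPS of a volume under the 3 IOPS per GB rule quoted on the slide."""
    return volume_gb * iops_per_gb


def sequential_write_throughput(path, block_size, blocks):
    """Write `blocks` blocks of `block_size` bytes and return MB/s (1 MB = 10**6 bytes)."""
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data out of the page cache to the device
    elapsed = time.perf_counter() - start
    return block_size * blocks / elapsed / 1e6


def run_benchmark(iterations, block_size, blocks):
    """Repeat the sequential write and collect one throughput sample per iteration."""
    samples = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.dat")
        for _ in range(iterations):
            samples.append(sequential_write_throughput(path, block_size, blocks))
    return samples
```

The settings on the slide would correspond to `run_benchmark(100, 4 * 1024 * 1024, 1000)`, which writes about 4 GB per iteration; much smaller values are advisable for a quick trial.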
MULTI-THREADED DATA TRANSFER
Single-threaded Mode
Facilitates the detection/evaluation of performance issues
Not often used in production environments
Multi-threaded Mode
Workflow runs using the Cloud Storage configuration
Gradually increased the number of threads used to execute the transfer operations. A reduction in the makespan was observed up to 5 threads
Makespan for Amazon is 21% lower, while for Google the improvement is 32%
[Figure: Average workflow makespan for 3 runs of the Montage workflow using single- and multi-threaded mode for data transfer. y-axis: Makespan (s); series: Amazon, Google]
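The effect of adding transfer threads can be illustrated with a thread pool. The sketch below assumes a hypothetical `fake_transfer` function that only sleeps for a fixed latency, so it shows the mechanism and not the behaviour of aws-cli, gsutil, or pegasus-s3.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_transfer(name):
    """Stand-in for one file transfer: a fixed, made-up latency of 10 ms."""
    time.sleep(0.01)
    return name


def transfer_all(files, transfer, threads=1):
    """Run transfer(f) for every file with `threads` workers; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() waits for all transfers and re-raises any exception.
        list(pool.map(transfer, files))
    return time.perf_counter() - start


if __name__ == "__main__":
    files = [f"file_{i}" for i in range(20)]
    for n in (1, 2, 5):
        print(n, "threads:", round(transfer_all(files, fake_transfer, n), 3), "s")
```

In this toy model the elapsed time keeps shrinking as threads are added; on real storage services the gain levels off, which matches the observation that no reduction was seen beyond 5 threads.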
SUMMARY
Future Research Directions
For workflows that operate over a large number of (small) files, the
performance may be poor. A possible solution to mitigate this overhead is
to use a bulk mechanism to concatenate and manage files within a single
transfer request (similar to multipart strategy)
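One way to picture such a bulk mechanism is to pack many small files into a single archive before transfer. The sketch below uses the standard-library `tarfile` module as a stand-in. It is an assumption for illustration, not the mechanism proposed by the authors, and `bundle` and `unbundle` are hypothetical helper names.

```python
import io
import tarfile


def bundle(files):
    """Pack a {name: bytes} mapping into one in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unbundle(archive):
    """Recover the {name: bytes} mapping from an archive produced by bundle()."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        for member in tar.getmembers():
            handle = tar.extractfile(member)
            if handle is not None:  # skip directories and other non-file entries
                files[member.name] = handle.read()
    return files
```

The archive would then travel in one request instead of thousands. The cost is some padding per file in the tar format and an extra unpacking step on the receiving side.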
The performance difference among data transfer clients is mostly due to
the number of connections established to perform a transfer operation. If
secure connections are not a requirement, not using them could
significantly increase performance
SUMMARY
Future Research Directions
Cloud computing is constantly evolving
[Figure: Time series of download times from an object store to a VM (Oct 28, 2015 to Dec 02, 2015)]
SUMMARY
Contributions
An evaluation of the impact of varying storage configurations on the
performance of an I/O-intensive workflow
A quantitative analysis of application performance on popular cloud
systems using provenance data
A comprehensive analysis of benchmarking file transfer times of different
sizes using different cloud tools
A discussion on indicators that would significantly improve the performance
of I/O-intensive workflows on cloud environments
PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW
EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES
Rafael Ferreira da Silva, Ph.D.
Computer Scientist, USC Information Sciences Institute
rafsilva@isi.edu – http://rafaelsilva.com
Thank You
Questions?
http://pegasus.isi.edu
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 

Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services

  • 1. PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES Hassan Nawaz, Gideon Juve, Rafael Ferreira da Silva, Ewa Deelman USC Information Sciences Institute 18th Workshop on Advances in Parallel and Distributed Computational Models 30th IEEE International Parallel & Distributed Processing Symposium May 23, 2016 – Chicago, USA
  • 2. OUTLINE. Introduction: scientific workflows, motivation, goals. Experiment Conditions: scientific application, execution environment. Storage Configurations: cloud storage, virtual machine storage, submit host. Evaluation: makespan, cumulative execution time, data transfer time. Performance Benchmarking: network I/O, SSD I/O. Summary: conclusions, future research directions.
  • 3. INTRODUCTION. Scientific workflows: large-scale computations; provenance. Workflow structure (figure labels): DAG (directed acyclic graph); job (command-line programs); dependency (usually data dependencies); split, merge, pipeline. Data-intensive workflows: characterized by tasks that consume large volumes of data; the application makespan is dominated by data movement and I/O operations.
  • 4. MOTIVATION. Scientific workflows run on campus clusters, national cyberinfrastructures, and cloud computing. Advantages: on-demand resource provisioning; ability to store virtual machines (VM); resource monitoring; full control of the execution environment. Predictable performance; quality of service. Challenge: running data-intensive applications on clouds. Clouds were not designed for the execution of complex simulations and I/O-intensive applications.
  • 5. GOALS OCTOBER 2014. Topics: cost efficiency, application deadline, energy-aware scheduling, on-demand resource provisioning, … Therefore, there is an incentive for researchers to explore avenues to reduce the cost of executing workflows while increasing their efficiency. In this work: a practical evaluation of the performance of an I/O-intensive scientific workflow (Montage). Cloud environments: Amazon Web Services and Google Compute Engine.
  • 6. SCIENTIFIC APPLICATION: Montage. Montage is an astronomy application that creates astronomical image mosaics using data collected from telescopes. The workflow instance has 10,429 jobs, reads 4.2 GB of input data, and produces 7.9 GB of output data. A workflow instance operates over about 23K intermediate files, most of which are a few MB in size. [Figure: an illustrative representation of the Montage workflow.] [Figure: histogram of intermediate file sizes; x-axis: File Size (Bytes), ticks 0, 256, 4K, 64K, 512K, 4M; y-axis: #Files; bar labels 6, 6190, 2, 199, 5931, 10598.]
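As a quick consistency check on the file-size histogram, the six bar labels can be summed and compared against the stated count of about 23K intermediate files. This is a minimal sketch: pairing each bar label with an axis tick in reading order is an assumption, since the slide text does not state the pairing.

```python
# Bar labels and axis ticks as they appear in the slide text. Pairing them
# in reading order is an assumption, not something the slide states.
size_ticks = ["0", "256", "4K", "64K", "512K", "4M"]
file_counts = [6, 6190, 2, 199, 5931, 10598]

total_files = sum(file_counts)
histogram = dict(zip(size_ticks, file_counts))

print(f"total intermediate files: {total_files}")
for tick, count in histogram.items():
    share = 100.0 * count / total_files
    print(f"{tick:>5}: {count:>6} files ({share:.1f}%)")
```

The total comes to 22,926, which agrees with the "about 23K" figure quoted on the slide.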
  • 7. EXECUTION ENVIRONMENT. Pegasus Workflow Management System (http://pegasus.isi.edu): automates complex, multi-stage processing pipelines; enables parallel, distributed computations; automatically executes data transfers; handles failures to provide reliability. Auxiliary jobs: stage-in job (transfers the workflow input data); stage-out job (transfers the workflow output data); registration job (registers the workflow output data); cleanup job (removes unused data). VM instance types: Amazon m3.2xlarge and Google n1-standard-8, with 8 cores per node, 30GB of memory, and 70GB SSD.
  • 8. DATA MOVEMENT TOOLS. aws-cli: Amazon standard client. gsutil: Google standard client. pegasus-s3: Pegasus standard client built on top of the standard Amazon API. In order to measure the actual overhead involved in data transfers, we initially limit the transfer mechanisms to single-threaded mode.
  • 9. STORAGE CONFIGURATION DEPLOYMENTS: object storage. Intermediate files are stored in object storage. This configuration is expected to be the most commonly used in cloud environments due to its simplicity. However, storing all intermediate files in a storage service may be costly. [Figure labels: object storage; submit host (e.g., user’s laptop).]
  • 10. STORAGE CONFIGURATION DEPLOYMENTS: VM local disk. Intermediate files are stored locally on disks attached to the VM. This configuration helps quantify the overhead of the other storage configurations. Although it may reduce the monetary cost, it may not be scalable for very large workflow executions. [Figure labels: submit host (e.g., user’s laptop); Virtual Machine A with local disk; Virtual Machine B with local disk.]
  • 11. STORAGE CONFIGURATION DEPLOYMENTS: submit host. Intermediate files are stored at the submit host. This configuration is useful in low-cost scenarios, or for local analyses of the intermediate results. It is unlikely to be used in real production workflows due to the high latency in transferring data to the submit host. [Figure labels: submit host (e.g., user’s laptop); typical OSG sites; Open Science Grid.]
  • 12. OVERALL MAKESPAN EVALUATION. Average workflow makespan: the turnaround time for a workflow to complete its execution. [Figure: average makespan values for 3 runs of the Montage workflow for different storage configurations.] The VM Storage configuration outperforms all other configurations because no intermediate files are transferred. The performance gain from storing data locally is 400% when compared to the Cloud Storage configuration, and up to 580% in relation to the Submit Host configuration.
  • 13. EVALUATION. [Figure: makespan, execution time, and transfer time in seconds for the VM Storage, Cloud Storage, and Submit Host configurations, Amazon and Google.] The execution time for the VM Storage configuration is larger due to the execution of local data movement operations (waiting for concurrent I/O operations). Data transfer operations become a bottleneck in the Cloud Storage and Submit Host configurations.
  • 14. BENCHMARKING: Network I/O. Evaluated in the Cloud Storage scenario (files are stored in an object store). [Figure: time series of hourly download times from an object store to a VM, May 12, 2015 to May 18, 2015; y-axis: Log10(time in seconds); series: aws-cli, gsutil, pegasus-s3.] The goal in transferring an empty file is to measure the overhead induced by the system. The merge job stages in over 6K small files with an average size of 0.3KB. The performance of these operations is of utmost importance.
  • 15. NETWORK I/O. pegasus-s3 performs better for small file sizes in most cases. Possible causes of the poor performance of the gsutil tool include network latency and increased load. [Figure: upload and download times in seconds versus file size (0B, 10KB, 100KB, 1MB, 10MB, 100MB, 1GB) for aws-cli, gsutil, and pegasus-s3.]
  • 16. NETWORK I/O. We used tcpdump and Wireshark to trace TCP packets. We ran the transfer tools in debug mode and evaluated all request operations. [Figure: bytes transferred per 0.01 s tick per TCP connection; x-axis: Time (s); series: pegasus-s3, aws-cli (TCP 1), aws-cli (TCP 2); annotations: TLS overhead, GET request.] The Amazon client (aws-cli) uses two TCP connections to perform a copy command, while pegasus-s3 uses only one. aws-cli uses HTTPS for all operations, which adds Transport Layer Security (TLS) overhead. gsutil also establishes two TCP connections, but issues two GET requests.
  • 17. BENCHMARKING: SSD I/O. Evaluated in the VM Storage scenario (files are stored on an attached disk). Volumes: Amazon General Purpose SSD and Google Persistent SSD. Method: executed dd with a block size of 4MB and one thousand blocks; performed 100 sequential iterations. [Figure: I/O throughput (MB/s) per iteration, 1 to 100, for Google and Amazon.] Amazon’s burst tolerance policy: Amazon provides a consistent baseline throughput of 3 IOPS per GB and handles bursts up to 3000 IOPS per volume. Alternative solution: Provisioned IOPS SSD volume. Drawback: may significantly increase the cost.
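The dd-based measurement can be approximated in Python for readers who want to reproduce the shape of the experiment. This is only a sketch under stated assumptions: the slide does not say whether the dd runs were reads or writes (writes are assumed here), the function names are hypothetical, and the demo sizes are scaled down from the 4MB by one thousand blocks and 100 iterations used in the talk.

```python
import os
import tempfile
import time


def sequential_write_throughput(path, block_size, block_count):
    """Write block_count blocks of block_size bytes and return MB/s."""
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(block_count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # include the time needed to reach the device
    elapsed = time.perf_counter() - start
    return (block_size * block_count) / (1024 * 1024) / elapsed


def run_benchmark(iterations, block_size, block_count, directory=None):
    """Repeat the write sequentially and collect one throughput per run."""
    results = []
    for _ in range(iterations):
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            results.append(
                sequential_write_throughput(path, block_size, block_count)
            )
        finally:
            os.remove(path)  # do not leave benchmark files behind
    return results


if __name__ == "__main__":
    # Scaled-down demo: 3 iterations of 8 blocks of 4 MB each.
    for i, mbps in enumerate(run_benchmark(3, 4 * 1024 * 1024, 8), start=1):
        print(f"iteration {i}: {mbps:.1f} MB/s")
```

Plotting the per-iteration values over many runs is what would expose a burst policy: throughput that starts high and then settles at a lower baseline.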
  • 18. MULTI-THREADED DATA TRANSFER. Single-threaded mode: facilitates the detection and evaluation of performance issues; not often used in production environments. Multi-threaded mode: workflow runs used the Cloud Storage configuration. We gradually increased the number of threads used to execute the transfer operations; a reduction in the makespan was observed up to 5 threads. The makespan for Amazon is 21% lower, while for Google the improvement is 32%. [Figure: average workflow makespan (s) for 3 runs of the Montage workflow using single- and multi-threaded mode for data transfer, Amazon and Google.]
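The multi-threaded mode described above can be illustrated with a fixed-size thread pool. This is a minimal sketch, not the mechanism used by Pegasus or by any of the transfer clients: `transfer_all`, `fake_transfer`, and the file names are hypothetical, and the default of 5 threads simply mirrors the point up to which the slide reports makespan reductions.

```python
from concurrent.futures import ThreadPoolExecutor


def transfer_all(files, transfer_one, num_threads=5):
    """Run transfer_one(file) for every file using a fixed-size thread pool.

    transfer_one is a hypothetical callable supplied by the caller, for
    example a wrapper around whichever transfer client is in use.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves input order and re-raises the first failure.
        return list(pool.map(transfer_one, files))


def fake_transfer(name):
    """Stand-in for a real transfer; returns a message naming the file."""
    return f"transferred {name}"


if __name__ == "__main__":
    names = [f"file_{i}.fits" for i in range(10)]
    for line in transfer_all(names, fake_transfer, num_threads=5):
        print(line)
```

Threads suit this workload because each transfer spends most of its time waiting on the network, so several can overlap even in a single Python process.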
  • 19. SUMMARY: Conclusion and Future Research Directions. For workflows that operate over a large number of (small) files, the performance may be poor. A possible solution to mitigate this overhead is to use a bulk mechanism to concatenate and manage files within a single transfer request (similar to a multipart strategy). The performance difference among data transfer clients is mostly due to the number of connections established to perform a transfer operation. If secured connections are not a requirement, not using them could significantly increase the performance.
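One way to picture the bulk mechanism suggested above is to pack many small files into a single archive, so that only one object needs to be transferred. The sketch below uses a plain tar archive as a stand-in; this is an assumption for illustration, not the mechanism proposed by the authors, and `bundle_files` and `unbundle_files` are hypothetical helper names.

```python
import os
import tarfile
import tempfile


def bundle_files(paths, archive_path):
    """Pack many small files into one uncompressed tar archive."""
    with tarfile.open(archive_path, "w") as archive:
        for path in paths:
            archive.add(path, arcname=os.path.basename(path))
    return archive_path


def unbundle_files(archive_path, dest_dir):
    """Unpack regular files into dest_dir and return their names."""
    names = []
    with tarfile.open(archive_path, "r") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            name = os.path.basename(member.name)
            source = archive.extractfile(member)
            with open(os.path.join(dest_dir, name), "wb") as out:
                out.write(source.read())
            names.append(name)
    return names


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, \
            tempfile.TemporaryDirectory() as dst:
        paths = []
        for i in range(5):
            path = os.path.join(src, f"small_{i}.txt")
            with open(path, "w") as f:
                f.write(f"payload {i}\n")
            paths.append(path)
        archive = bundle_files(paths, os.path.join(src, "bundle.tar"))
        # A single transfer of bundle.tar would replace five small ones.
        print(unbundle_files(archive, dst))
```

The trade-off is that the per-request overhead is paid once per bundle rather than once per file, at the cost of extra packing and unpacking work on both ends.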
  • 20. SUMMARY: Conclusion and Future Research Directions. Cloud computing is constantly evolving. [Figure: time series of download times from an object store to a VM, Oct 28, 2015 to Dec 02, 2015.]
  • 21. SUMMARY: Contributions. An evaluation of the impact of varying storage configurations on the performance of an I/O-intensive workflow. A quantitative analysis of application performance on popular cloud systems using provenance data. A comprehensive analysis of benchmarking file transfer times of different sizes using different cloud tools. A discussion on indicators that would significantly improve the performance of I/O-intensive workflows on cloud environments.
  • 22. PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES. Thank you. Questions? Rafael Ferreira da Silva, Ph.D., Computer Scientist, USC Information Sciences Institute; rafsilva@isi.edu – http://rafaelsilva.com; http://pegasus.isi.edu