Presentation held at the 18th Workshop on Advances in Parallel and Distributed Computational Models - 2016
Abstract - Scientific workflows have become a mainstream approach to conducting large-scale scientific research. Meanwhile, cloud computing has emerged as an alternative computing paradigm. In this paper, we analyze the performance of an I/O-intensive real scientific workflow on cloud environments using makespan (the turnaround time for a workflow to complete its execution) as the key performance metric. In particular, we assess the impact of varying storage configurations on workflow performance when executing on Google Cloud and Amazon Web Services, aiming to understand the performance bottlenecks of these popular cloud-based execution environments. Experimental results show significant differences in application performance across configurations, and reveal that Amazon Web Services outperforms Google Cloud under equivalent application and system configurations. We then investigate the root cause of these results using provenance data and by benchmarking disk and network I/O on both infrastructures. Lastly, we suggest modifications to the standard cloud storage APIs that would reduce the makespan for I/O-intensive workflows.
More information: www.rafaelsilva.com
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services
1. PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW
EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES
Hassan Nawaz, Gideon Juve, Rafael Ferreira da Silva, Ewa Deelman
USC Information Sciences Institute
18th Workshop on Advances in Parallel and Distributed Computational Models
30th IEEE International Parallel & Distributed Processing Symposium
May 23, 2016 – Chicago, USA
2. OUTLINE
Introduction: scientific workflows, motivation, goals
Experiment Conditions: scientific application, execution environment
Storage Configurations: cloud storage, virtual machine storage, submit host
Evaluation: makespan, cumulative execution time, data transfer time
Performance Benchmarking: network I/O, SSD I/O
Summary: conclusions, future research directions
H. Nawaz, G. Juve, R. Ferreira da Silva, E. Deelman
Performance Analysis of an I/O-Intensive Workflow executing on Google Cloud and Amazon Web Services
3. INTRODUCTION
Scientific Workflows
Large-scale computations
Provenance
Data-Intensive Workflows
Characterized by tasks that consume large volumes of data; the application makespan is dominated by data movement and I/O operations
[Workflow diagram: jobs (command-line programs) connected by dependencies (usually data dependencies), forming split, merge, and pipeline patterns in a directed acyclic graph (DAG); managed by Pegasus]
4. MOTIVATION
Scientific Workflows run on campus clusters and national cyberinfrastructures
Cloud Computing offers predictable performance and quality of service
Challenge: running data-intensive applications on clouds
Clouds were not designed for the execution of complex simulations and I/O-intensive applications
Advantages
On-demand resource provisioning
Ability to store virtual machine (VM) images
Resource monitoring
Full control of the execution environment
5. GOALS
Therefore… there is an incentive for researchers to explore avenues to reduce the cost of executing workflows while increasing their efficiency
In this work… a practical evaluation of the performance of an I/O-intensive scientific workflow (Montage)
Cloud environments:
Cost efficiency
Application deadline
Energy-aware scheduling
On-demand resource provisioning
…
Google Compute Engine and Amazon Web Services
6. SCIENTIFIC APPLICATION
Montage
Montage is an astronomy application that creates astronomical image mosaics using data collected from telescopes
10,429 jobs; reads 4.2 GB of input data; produces 7.9 GB of output data
[Figure: an illustrative representation of the Montage workflow]
A workflow instance operates over about 23K intermediate files, most of them of a few MBs
[Histogram of intermediate files by size (bytes), bins from 0 to 4M: bar counts 6, 6190, 2199, 5931, and 10598 files]
7. EXECUTION ENVIRONMENT
Pegasus Workflow Management System
http://pegasus.isi.edu
Automates complex, multi-stage processing pipelines
Enables parallel, distributed computations
Automatically executes data transfers
Handles failures to provide reliability
VM Instance Types
Amazon: m3.2xlarge; Google: n1-standard-8
8 cores per node, 30 GB of memory, and 70 GB SSD
cleanup job: removes unused data
stage-in job: transfers the workflow input data
stage-out job: transfers the workflow output data
registration job: registers the workflow output data
8. DATA MOVEMENT TOOLS
aws-cli: Amazon standard client
gsutil: Google standard client
pegasus-s3: Pegasus client built on top of the standard Amazon API
To measure the actual overhead involved in data transfers, we initially limit the transfer mechanisms to single-threaded mode
9. STORAGE CONFIGURATION DEPLOYMENTS
Cloud Storage: intermediate files are stored in an object storage service
Expected to be the most commonly used configuration in cloud environments due to its simplicity
However, storing all intermediate files in a storage service may be costly
[Diagram: object storage and the submit host (e.g., user's laptop)]
10. STORAGE CONFIGURATION DEPLOYMENTS
VM Storage: intermediate files are stored locally on disks attached to the VMs
Helps quantify the overhead of the other storage configurations
Although it may reduce the monetary cost, it may not be scalable for very large workflow executions
[Diagram: Virtual Machines A and B with local disks, and the submit host (e.g., user's laptop)]
11. STORAGE CONFIGURATION DEPLOYMENTS
Submit Host: intermediate files are stored at the submit host (e.g., user's laptop), as in typical Open Science Grid (OSG) sites
Useful in low-cost scenarios, or for local analyses of the intermediate results
Unlikely to be used in real production workflows due to the high latency in transferring data to the submit host
12. OVERALL MAKESPAN EVALUATION
The VM Storage configuration outperforms all other configurations because intermediate files do not need to be transferred
Average workflow makespan: the turnaround time for a workflow to complete its execution
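As a concrete illustration of the metric, a minimal Python sketch (the job records below are hypothetical, not the Pegasus provenance format) computing makespan as the span from the earliest job start to the latest job end:

```python
from datetime import datetime

def makespan(jobs):
    """Makespan: turnaround time from the start of the earliest job
    to the end of the latest job, in seconds."""
    start = min(datetime.fromisoformat(j["start"]) for j in jobs)
    end = max(datetime.fromisoformat(j["end"]) for j in jobs)
    return (end - start).total_seconds()

# Hypothetical provenance records for three jobs
jobs = [
    {"name": "stage-in",  "start": "2016-05-23T10:00:00", "end": "2016-05-23T10:05:00"},
    {"name": "mProject",  "start": "2016-05-23T10:05:00", "end": "2016-05-23T10:45:00"},
    {"name": "stage-out", "start": "2016-05-23T10:45:00", "end": "2016-05-23T10:52:30"},
]
print(makespan(jobs))  # 3150.0 seconds
```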
[Chart: average makespan values for 3 runs of the Montage workflow for different storage configurations]
The performance gain from storing data locally is 400% compared to the Cloud Storage configuration, and up to 580% relative to the Submit Host configuration
13. EVALUATION
The execution time for the VM Storage configuration is larger due to local data movement operations (waiting for concurrent I/O operations)
Data transfer operations become a bottleneck in the Cloud Storage and Submit Host configurations
[Bar charts: makespan, execution time, and transfer time (s) for Amazon and Google under the VM Storage, Cloud Storage, and Submit Host configurations]
14. BENCHMARKING
Network I/O
Evaluated in the Cloud Storage scenario (files are stored in an object store)
[Time series of hourly download times (log10 seconds) from an object store to a VM for aws-cli, gsutil, and pegasus-s3, May 12, 2015 to May 18, 2015]
The goal in transferring an empty file is to measure the overhead induced by the system
The merge job stages in over 6K small files with an average size of 0.3 KB
The performance of these operations is of utmost importance
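To see why these operations matter, a small sketch with assumed numbers (a hypothetical 0.5 s per-request overhead, which is what the empty-file transfer measures, and 10 MB/s of usable bandwidth; neither is a measured value from the experiments) showing that for 6K files of 0.3 KB the fixed per-request overhead, not the payload, dominates:

```python
def total_transfer_time(n_files, avg_size_bytes, per_request_overhead_s, bandwidth_bps):
    """Split total transfer time into fixed per-request overhead
    and the time spent moving actual bytes."""
    overhead = n_files * per_request_overhead_s
    payload = n_files * avg_size_bytes / bandwidth_bps
    return overhead, payload

# 6,000 files of 0.3 KB (the merge job's inputs), assumed 0.5 s
# per-request overhead and 10 MB/s bandwidth
overhead, payload = total_transfer_time(6000, 300, 0.5, 10e6)
print(overhead)  # 3000.0 s spent in per-request overhead
print(payload)   # 0.18 s spent moving actual bytes
```

Under these assumptions, virtually all of the transfer time is request overhead, which is why per-operation performance is of utmost importance here.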
15. NETWORK I/O
pegasus-s3 has better performance for small file sizes in most cases
The poor performance of the gsutil tool may be due to network latency and increased load
[Plots: upload and download times (s) vs. file size, from 0B to 1GB, for aws-cli, gsutil, and pegasus-s3]
16. NETWORK I/O
We used tcpdump and wireshark to trace TCP packets
We ran the transfer tools in debug mode and evaluated all request operations
aws-cli uses HTTPS for all operations, which adds the extra overhead of Transport Layer Security (TLS)
[Plot: bytes transferred per 0.01 s tick per TCP connection over time, for pegasus-s3, aws-cli (TCP 1), and aws-cli (TCP 2)]
The Amazon client (aws-cli) uses two TCP connections to perform a copy command, while pegasus-s3 uses only one
gsutil also establishes two TCP connections, but issues two GET requests
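A back-of-the-envelope model of how connection count and handshakes dominate small-file transfer time (illustrative only: the 50 ms RTT, the 1-RTT TCP setup, and the 2-RTT TLS 1.2 handshake are assumptions, not values measured in this work):

```python
def small_file_time(n_connections, use_tls, rtt_s, requests=1):
    """Rough handshake-dominated cost model for a tiny file:
    ~1 RTT of TCP setup per connection, ~2 extra RTTs per full
    TLS handshake, and ~1 RTT per HTTP request/response."""
    handshake = n_connections * (1 + (2 if use_tls else 0)) * rtt_s
    return handshake + requests * rtt_s

rtt = 0.05  # assumed 50 ms round-trip time
print(small_file_time(1, True, rtt))              # one TLS connection (pegasus-s3-like)
print(small_file_time(2, True, rtt))              # two TLS connections (aws-cli-like)
print(small_file_time(2, True, rtt, requests=2))  # two connections, two GETs (gsutil-like)
```

Even this crude model reproduces the observed ordering: the single-connection client pays the fewest handshake round trips per small file.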
17. BENCHMARKING
Amazon provides a consistent baseline throughput of 3 IOPS per GB and handles bursts of up to 3000 IOPS per volume
Executed dd with a block size of 4 MB and one thousand blocks; performed 100 sequential iterations
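A rough Python equivalent of this benchmark (the experiments used dd itself; this sketch writes far fewer, smaller blocks so it runs quickly, but measures throughput the same way):

```python
import os
import tempfile
import time

def sequential_write_throughput(block_size, n_blocks, path):
    """Write n_blocks blocks of block_size bytes sequentially and
    return throughput in MB/s, akin to
    `dd if=/dev/zero of=<path> bs=4M count=1000`."""
    block = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data to disk before stopping the clock
    elapsed = time.perf_counter() - start
    return (block_size * n_blocks) / (1024 * 1024) / elapsed

# Small sizes here (1 MB x 8) so the sketch is quick; the paper used
# 4 MB x 1000 blocks, repeated for 100 sequential iterations.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name
try:
    mbps = sequential_write_throughput(1024 * 1024, 8, path)
    print(f"{mbps:.1f} MB/s")
finally:
    os.remove(path)
```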
I/O Throughput
[Plot: write throughput (MB/s) over 100 iterations for Google and Amazon]
SSD I/O
Evaluated in the VM Storage scenario (files are stored in an attached disk)
Amazon General Purpose SSD / Google Persistent SSD
Amazon's burst tolerance policy
Alternative solution: Provisioned IOPS SSD volumes
Drawback: may significantly increase the cost
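Amazon's published gp2 burst model (3 IOPS per GB baseline, bursts up to 3000 IOPS, and an initial bucket of 5.4 million I/O credits per AWS documentation; the credit figure comes from that documentation, not from these slides) lets one estimate how long the 70 GB volumes used here can sustain a burst:

```python
def burst_duration_s(volume_gb, credits=5_400_000, burst_iops=3000, iops_per_gb=3):
    """Seconds a gp2-style volume can sustain its burst rate:
    credits drain at (burst - baseline) IOPS while the baseline
    rate simultaneously refills them."""
    baseline = min(volume_gb * iops_per_gb, burst_iops)
    drain = burst_iops - baseline
    if drain <= 0:
        return float("inf")  # baseline already meets the burst rate
    return credits / drain

# A 70 GB SSD has a 210 IOPS baseline; bursting at 3000 IOPS
# drains the credit bucket in roughly half an hour.
print(round(burst_duration_s(70)))  # ~1935 seconds
```

This is why sustained I/O-intensive phases can fall back to baseline throughput mid-run, and why the Provisioned IOPS alternative exists.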
18. MULTI-THREADED DATA TRANSFER
Single-threaded Mode
Facilitates the detection/evaluation of performance issues
Not often used in production environments
Multi-threaded Mode
[Chart: average workflow makespan (s) for 3 runs of the Montage workflow on Amazon and Google using single- and multi-threaded mode for data transfer]
Workflow runs use the Cloud Storage configuration. We gradually increased the number of threads used to execute the transfer operations; a reduction in the makespan was observed up to 5 threads
The makespan for Amazon is 21% lower, while for Google the improvement is 32%
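A minimal sketch of the multi-threaded mode, assuming a stubbed transfer function (a local file copy stands in for the real object-store clients) and the 5 worker threads beyond which no further makespan reduction was observed:

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

def transfer(src, dst):
    """Stand-in for a real object-store upload/download (local copy here)."""
    shutil.copyfile(src, dst)
    return dst

# Create a few small files to move, then transfer them with 5 worker threads.
src_dir, dst_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
files = []
for i in range(20):
    p = os.path.join(src_dir, f"part{i}.dat")
    with open(p, "wb") as f:
        f.write(os.urandom(1024))
    files.append(p)

with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(
        lambda s: transfer(s, os.path.join(dst_dir, os.path.basename(s))), files))

print(len(results))  # 20 files transferred
```

Because each transfer is I/O-bound, a handful of threads overlaps request latency; past that point the link or the request pipeline saturates, matching the plateau at 5 threads.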
19. SUMMARY: FUTURE RESEARCH DIRECTIONS
For workflows that operate over a large number of (small) files, performance may be poor. A possible solution to mitigate this overhead is a bulk mechanism that concatenates and manages files within a single transfer request (similar to the multipart strategy)
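A sketch of such a bulk mechanism, using a tar archive as the container (the concrete container format is an assumption for illustration, not something the slides specify):

```python
import os
import tarfile
import tempfile

def bundle(paths, archive_path):
    """Concatenate many small files into one archive so a single
    transfer request (one PUT/GET) can move them all."""
    with tarfile.open(archive_path, "w") as tar:
        for p in paths:
            tar.add(p, arcname=os.path.basename(p))
    return archive_path

def unbundle(archive_path, out_dir):
    """Recover the individual files on the receiving side."""
    with tarfile.open(archive_path) as tar:
        tar.extractall(out_dir)

# 100 tiny files (0.3 KB each, like the merge job's inputs) become one
# object: one request round-trip instead of 100.
src = tempfile.mkdtemp()
paths = []
for i in range(100):
    p = os.path.join(src, f"f{i}.dat")
    with open(p, "wb") as f:
        f.write(b"x" * 300)
    paths.append(p)

archive = bundle(paths, os.path.join(src, "bundle.tar"))
out = tempfile.mkdtemp()
unbundle(archive, out)
print(len(os.listdir(out)))  # 100
```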
The performance difference among data transfer clients is mostly due to the number of connections established to perform a transfer operation. If secure connections are not a requirement, avoiding them could significantly improve performance
20. SUMMARY: FUTURE RESEARCH DIRECTIONS
Cloud computing is constantly evolving
[Time series of download times from an object store to a VM (Oct 28, 2015 to Dec 02, 2015)]
21. SUMMARY: CONTRIBUTIONS
An evaluation of the impact of varying storage configurations on the performance of an I/O-intensive workflow
A quantitative analysis of application performance on popular cloud systems using provenance data
A comprehensive benchmarking analysis of transfer times for files of different sizes using different cloud tools
A discussion of indicators that would significantly improve the performance of I/O-intensive workflows on cloud environments
22. PERFORMANCE ANALYSIS OF AN I/O-INTENSIVE WORKFLOW
EXECUTING ON GOOGLE CLOUD AND AMAZON WEB SERVICES
Rafael Ferreira da Silva, Ph.D.
Computer Scientist, USC Information Sciences Institute
rafsilva@isi.edu – http://rafaelsilva.com
Thank You
Questions?
http://pegasus.isi.edu