e-Science is scientific research enabled by widely distributed computational resources in collaboration among several institutes. One of issues in making use of e-Science infrastructure is to define complex workflows (composition of many tasks and their dependencies). We propose to employ Rake as a workflow definition language. In contrast to Makefile, Rake is an internal DSL and takes advantage of Ruby's scripting power which requires to define complex scientific workflows. In order to execute Rake workflow on distributed computer resources, we develop Pwrake, an Parallel Workflow extension for Rake. Pwrake is designed to work on Gfarm, a wide-area distributed file system. Gfarm provides a unified filesystem and consistent file time stamps among distributed computers, and also high performance of parallel I/O. We show a powerfulness of Rake as a workflow language and the scalable performance of distributed computing using Pwrake and Gfarm.
2. Masahiro Tanaka
NArray author
majored in Astronomy
Research fellow in Computer Science
◦ at Center for Computational Sciences,
University of Tsukuba
◦ since 2009
2010-11-14 2RubyConf X
3. Research Fields
◦ Computer Science:
High Performance Computing
Computational Informatics
◦ Computational Science:
Particle Physics, Astrophysics,
Material Science, Life Science,
Biology, Environmental Science
2010-11-14 3RubyConf X
12. Computationally intensive science that
is carried out in highly distributed
network environments,
or
Science that uses immense data sets
that require grid computing
◦ (Wikipedia).
2010-11-14 12RubyConf X
13. The term was created by John Taylor,
◦ Director General of the United Kingdom's Office
of Science and Technology
◦ in 1999
2010-11-14RubyConf X 13
14. is a key issue for e-Science.
2010-11-14 14RubyConf X
15. Performance of single core does no
more increase.
2010-11-14 15RubyConf X
18. Parallelize your program
Scalability is an key issue
2010-11-14RubyConf X 18
19. P : parallelizable
1-P : sequential
N : # of processors
Speed-up formula :
2010-11-14RubyConf X 19
Number of Processors
Speedup
1
1 − P + P/N
P<1
1
1 − P
20. MapReduce
MPI
OpenMP
thread
Parallel programming languages
process
2010-11-14RubyConf X 20
21. Independent processes can be
parallelized
Without parallel programming
Workflow System is required
2010-11-14RubyConf X 21
input1
program
output1
...input2
program
output2
input3
program
output3
input4
program
output4
...
...
23. Description of
procedures
It is like building a
program
2010-11-14RubyConf X 23
cc –c –o a.o a.c
a.c b.c
b.oa.o
cc –o prog …
prog
cc –c –o b.o b.c
28. ◦ invoke task based on
dependency
◦ assign a task to an
available computer
◦ parallel execution for
independent tasks
2010-11-14RubyConf X 28
Process
Process
Process
Workflow
System
Workflow
definition
29. for Grid Computing
2010-11-14RubyConf X 29
• DAGMan
• Pegasus
• Triana
• ICENI
• Taverna
• GrADS
• GridFlow
• UNICORE
• Globus workflow
• Askalan
• Karajan
• Kepler
from “A Taxonomy of Scientific
Workflow Systems for Grid Computing”
Jia Yu and Rajkumar Buyya (2005)
30. Define DAG in XML
◦ Human cannot write
complex XML.
◦ Need to write a program
to generate XML
2010-11-14RubyConf X 30
<adag xmlns="http://www.griphyn.org/chimera/DAX"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.griphyn.org/chimera/DAX
http://www.griphyn.org/chimera/dax-1.8.xsd"
count="1" index="0" name="test">
<filename file="2mass-atlas-981204n-j0160056.fits"
link="input"/>
…
<job id="ID000001" name="mProject" version="3.0" level="11" dv-
name="mProject1" dv-version="1.0">
<argument>
<filename file="2mass-atlas-981204n-j0160056.fits"/>
<filename file="p2mass-atlas-981204n-j0160056.fits"/>
<filename file="templateTMP_AAAaaa01.hdr"/>
</argument>
<uses file="2mass-atlas-981204n-j0160056.fits" link="input"
dontRegister="false" dontTransfer="false"/>
<uses file="p2mass-atlas-981204n-j0160056.fits" link="output"
dontRegister="true" dontTransfer="true"
temporaryHint="tmp"/>
<uses file="p2mass-atlas-981204n-j0160056_area.fits"
link="output" dontRegister="true" dontTransfer="true"
temporaryHint="tmp"/>
<uses file="templateTMP_AAAaaa01.hdr" link="input"
dontRegister="false" dontTransfer="false"/>
</job>
…
<child ref="ID003006">
<parent ref="ID000001"/>
<parent ref="ID000006"/>
</child>
DAG XML
31. DSL to define task dependency
Rule
◦ define multiple tasks at once
◦ avoid redundancy
Skip finished tasks
◦ based on timestamp of file
2010-11-14RubyConf X 31
32. Grid Explorer : Grid and Cluster shell
◦ http://www.logos.ic.i.u-tokyo.ac.jp/gxp/
◦ written in Python.
GXP Make
◦ GNU Make-based workflow system
◦ Distributed & Parallel execution
2010-11-14RubyConf X 32
33. Makefile
◦ same input files
◦ same tasks
◦ executed repeatedly
Scientific Workflows have different
aspects.
2010-11-14 33RubyConf X
34. Same workflow for different files
◦ “rule” may solve, but is not enough.
2010-11-14RubyConf X 34
File set 1
file file file
File set 2
file file file
Workflow
result 1 result 2
35. Task dependencies rely on :
◦ Not only file name
◦ Parameters, e.g. Geometry
2010-11-14RubyConf X 35
A B
C
D
A
task
B C D
task task task
36. Entire workflow is unknown at first
Result of a task affect
◦ Output files
◦ Afterward tasks
2010-11-14RubyConf X 36
Task
file3file1 file2
Task Task
37. create Makefile during Make execution
tricky way
Makefile
Makefile.sub: prerequisite
awk –f hoge.awk $< > $@
target: Makefile.sub
make -f $<
Makefile.sub
target1: source1
target2: source2
…
create
invoke
2010-11-14 37RubyConf X
38. Scientific workflow requires powerful
and flexible definition language.
You probably know the solution.
◦ What is it?
2010-11-14 38RubyConf X
40. Build tool
Internal DSL
Programming power of Ruby
2010-11-14RubyConf X 40
41. file "file2" => "file1" do
sh "program file1 > file2"
end
2010-11-14RubyConf X 41
file1
program
file2
42. for x in LIST
file x[1] => x[0] do |t|
sh "your_program …"
end
end
2010-11-14 42RubyConf X
43. How do your write it with Rake?
2010-11-14RubyConf X 43
Task
file3file1 file2
Task Task
44. task :A do
task :B do
puts “B”
end
end
task :default => :A
No task depends
on Task B
45. task :A do
b = task :B do
puts “B”
end
b.invoke
end
task :default => :A
Rake::Task#invoke
Invoke Task B
immediately after
definition
46. multitask
◦ Rake built-in feature
◦ Parallelize prerequisite tasks of multitask
◦ Ruby thread
Problem
◦ No control for the number of thread.
◦ All the prerequisite tasks are invoked at
the same time.
2010-11-14RubyConf X 46
47. http://drake.rubyforge.org/
Specify the number of threads
All the independent task are
automatically parallelized.
◦ multitask is not necessary
2010-11-14RubyConf X 47
48. Find task
2010-11-14RubyConf X 48
D E F
C
A
B
Queue to
workers
worker
thread 1
worker
thread 2
Queue from
workers
B
Available task
...
49. Remote process execution
Dynamic task definition
◦ dRake does not allows “invoke” method.
Performance issue
2010-11-14 49RubyConf X
50. Need Powerful Scientific Workflow tool
Existing
◦ Rake : Powerful for writing workflow
◦ dRake : Parallel execution
Missing
◦ Remote Process Invocation
◦ Scalability
2010-11-14RubyConf X 50
53. Parallel + distributed
Workflow
extension for Rake
repository:
◦ http://github.com/masa16/pwrake
2010-11-14RubyConf X 53
54. Same syntax as Rake.
Parallelize task, file
◦ no multitask
Replace “sh” method
◦ invoke process through SSH
Scalability
2010-11-14 54RubyConf X
55. Why SSH
◦ Secure
◦ Probably SSH port is available
SSH class for Pwrake
◦ Original implementation
◦ Performance issues
2010-11-14RubyConf X 55
56. Worker thread in Ruby
Ruby thread uses single-core
◦ GVL
sh process uses multi-core.
for x in LIST
file x[1] => x[0] do |t|
sh "your_program …"
end
end
2010-11-14RubyConf X 56
here uses multi-core
58. Our approach:
use Distributed Filesystem
◦ file sharing
◦ consistent file timestamp
◦ I/O performance
2010-11-14RubyConf X 58
59. Storage
CPU CPU CPU
file1 file2 file3
Storage
CPU CPU CPU
Storage I/O
becomes
bottleneck StorageStorage
file1 file2 file3
Network File System Distributed File System
Parallel
Processing
59RubyConf X 2010-11-14
61. Wide-area distributed file system
Global namespace to federate storages
Main developer : Prof. Osamu Tatebe
Open source development
◦ http://datafarm.apgrid.org/
Local
Storage
Local
Storage
Local
Storage
Internet
Gfarm File System
/
/dir1
file1 file2
/dir2
file3 file4
Computer nodes
Local
Storage
Local
Storage
Local
Storage
61RubyConf X 2010-11-14
62. use Local I/O for performance
assign task based on File locality
implement as a function of Pwrake
62RubyConf X 2010-11-14
Local
Storage
Local
Storage
Local
Storage
File System
Nodes
file1 file2 file3
Local
Storagefile4
Job for
File 1
Job for
File 3
Job for
File 3
Slow
Fast
63. Locality-aware task assignment for
Gfarm
2010-11-14RubyConf X 63
Task1
Task2
Task3
AffinityQueue
…
worker
thread1
worker
thread2
worker
thread3
push
with hostname
Queue for
host1
pop
with hostname
Queue for
host2
Queue for
host3
…
…
64. 1 node
4 cores
2 nodes
8 cores
4 nodes
16 cores
8 nodes
32 cores
64RubyConf X 2010-11-14
NFS
does not
scale
68. Rake
◦ is so powerful to be used for Scientific
definition language.
Pwrake
◦ Parallel and Distributed Workflow
extension for Rake
Gfarm
◦ for scalable I/O performance
2010-11-14 68RubyConf X
69. Pwrake site
◦ https://github.com/masa16/pwrake
Questions?
2010-11-14RubyConf X 69