SlideShare une entreprise Scribd logo
1  sur  69
Télécharger pour lire hors ligne
Masahiro Tanaka
University of Tsukuba
2010-11-14 1RubyConf X
 Masahiro Tanaka
 NArray author
 majored in Astronomy
 Research fellow in Computer Science
◦ at Center for Computational Sciences,
University of Tsukuba
◦ since 2009
2010-11-14 2RubyConf X
 Research Fields
◦ Computer Science:
 High Performance Computing
 Computational Informatics
◦ Computational Science:
 Particle Physics, Astrophysics,
Material Science, Life Science,
Biology, Environmental Science
2010-11-14 3RubyConf X
 FIRST
◦ 512 cores+BladeGRAPE
◦ 36 TFLOPS
 PACS-CS
◦ 2,560 cores
◦ 14.4 TFOPS
 T2K Tsukuba
◦ 10,368 cores
◦ 95 TFLOPS
2010-11-14RubyConf X 4
 Conference on
SuperComputer
 > 10,000
participants
2010-11-14RubyConf X 5
We are here
SC10 venue
Ernest N. Morial
Convention Center
exhibit Nov 15-18
2010-11-14 6RubyConf X
 CCS booth at SC09
2010-11-14RubyConf X 7
 Pwrake : a Distributed Workflow
Engine for e-Science
2010-11-14 8RubyConf X
2010-11-14 9RubyConf X
LHC
Particle Accelerator
ALMA
Radio Observatory
2010-11-14 10RubyConf X
2010-11-14RubyConf X 11
http://www.sinet.ad.jp/case-examples/tsukuba
: Sharing QCD Simulation data
 Computationally intensive science that
is carried out in highly distributed
network environments,
or
 Science that uses immense data sets
that require grid computing
◦ (Wikipedia).
2010-11-14 12RubyConf X
 The term was created by John Taylor,
◦ Director General of the United Kingdom's Office
of Science and Technology
◦ in 1999
2010-11-14RubyConf X 13
 is a key issue for e-Science.
2010-11-14 14RubyConf X
 Performance of single core does no
more increase.
2010-11-14 15RubyConf X
2010-11-14 16RubyConf X
2010-11-14 17RubyConf X
 Parallelize your program
 Scalability is an key issue
2010-11-14RubyConf X 18
 P : parallelizable
 1-P : sequential
 N : # of processors
 Speed-up formula :
2010-11-14RubyConf X 19
Number of Processors
Speedup
1
1 − P + P/N
P<1
1
1 − P
 MapReduce
 MPI
 OpenMP
 thread
 Parallel programming languages
 process
2010-11-14RubyConf X 20
 Independent processes can be
parallelized
 Without parallel programming
 Workflow System is required
2010-11-14RubyConf X 21
input1
program
output1
...input2
program
output2
input3
program
output3
input4
program
output4
...
...
2010-11-14RubyConf X 22
 Description of
procedures
 It is like building a
program
2010-11-14RubyConf X 23
cc –c –o a.o a.c
a.c b.c
b.oa.o
cc –o prog …
prog
cc –c –o b.o b.c
 Task: Ellipse Node
 File: Rectangle Node
 Dependency: Edge
 DAG
◦ Directed Acyclic Graph
2010-11-14 24RubyConf X
Input File
Task
Output File
Dependency
 Montage
◦ software for producing a custom mosaic
image from multiple shots of images.
◦ http://montage.ipac.caltech.edu/
2010-11-14 25RubyConf X
 Tasks:
◦ Projection
◦ Brightness correction
◦ Coadding
 1 image : 1 process
26RubyConf X 2010-11-14
mProjectPP
mDiff
mBgModel
mBackground
mAddmFitplane
m1 = a'1x+b'1y+c'1
m2 = a'2x+b'2y+c'2
a1x+b1y+c1=0 a2x+b2y+c2=0
Final image
Input images
flow
Input
files
Output
file
2010-11-14 27RubyConf X
◦ invoke task based on
dependency
◦ assign a task to an
available computer
◦ parallel execution for
independent tasks
2010-11-14RubyConf X 28
Process
Process
Process
Workflow
System
Workflow
definition
 for Grid Computing
2010-11-14RubyConf X 29
• DAGMan
• Pegasus
• Triana
• ICENI
• Taverna
• GrADS
• GridFlow
• UNICORE
• Globus workflow
• Askalan
• Karajan
• Kepler
from “A Taxonomy of Scientific
Workflow Systems for Grid Computing”
Jia Yu and Rajkumar Buyya (2005)
 Define DAG in XML
◦ Human cannot write
complex XML.
◦ Need to write a program
to generate XML
2010-11-14RubyConf X 30
<adag xmlns="http://www.griphyn.org/chimera/DAX"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.griphyn.org/chimera/DAX
http://www.griphyn.org/chimera/dax-1.8.xsd"
count="1" index="0" name="test">
<filename file="2mass-atlas-981204n-j0160056.fits"
link="input"/>
…
<job id="ID000001" name="mProject" version="3.0" level="11" dv-
name="mProject1" dv-version="1.0">
<argument>
<filename file="2mass-atlas-981204n-j0160056.fits"/>
<filename file="p2mass-atlas-981204n-j0160056.fits"/>
<filename file="templateTMP_AAAaaa01.hdr"/>
</argument>
<uses file="2mass-atlas-981204n-j0160056.fits" link="input"
dontRegister="false" dontTransfer="false"/>
<uses file="p2mass-atlas-981204n-j0160056.fits" link="output"
dontRegister="true" dontTransfer="true"
temporaryHint="tmp"/>
<uses file="p2mass-atlas-981204n-j0160056_area.fits"
link="output" dontRegister="true" dontTransfer="true"
temporaryHint="tmp"/>
<uses file="templateTMP_AAAaaa01.hdr" link="input"
dontRegister="false" dontTransfer="false"/>
</job>
…
<child ref="ID003006">
<parent ref="ID000001"/>
<parent ref="ID000006"/>
</child>
DAG XML
 DSL to define task dependency
 Rule
◦ define multiple tasks at once
◦ avoid redundancy
 Skip finished tasks
◦ based on timestamp of file
2010-11-14RubyConf X 31
 Grid Explorer : Grid and Cluster shell
◦ http://www.logos.ic.i.u-tokyo.ac.jp/gxp/
◦ written in Python.
 GXP Make
◦ GNU Make-based workflow system
◦ Distributed & Parallel execution
2010-11-14RubyConf X 32
 Makefile
◦ same input files
◦ same tasks
◦ executed repeatedly
 Scientific Workflows have different
aspects.
2010-11-14 33RubyConf X
 Same workflow for different files
◦ “rule” may solve, but is not enough.
2010-11-14RubyConf X 34
File set 1
file file file
File set 2
file file file
Workflow
result 1 result 2
 Task dependencies rely on :
◦ Not only file name
◦ Parameters, e.g. Geometry
2010-11-14RubyConf X 35
A B
C
D
A
task
B C D
task task task
 Entire workflow is unknown at first
 Result of a task affect
◦ Output files
◦ Afterward tasks
2010-11-14RubyConf X 36
Task
file3file1 file2
Task Task
 create Makefile during Make execution
 tricky way
Makefile
Makefile.sub: prerequisite
awk –f hoge.awk $< > $@
target: Makefile.sub
make -f $<
Makefile.sub
target1: source1
target2: source2
…
create
invoke
2010-11-14 37RubyConf X
 Scientific workflow requires powerful
and flexible definition language.
 You probably know the solution.
◦ What is it?
2010-11-14 38RubyConf X
2010-11-14RubyConf X 39
 Build tool
 Internal DSL
 Programming power of Ruby
2010-11-14RubyConf X 40
file "file2" => "file1" do
sh "program file1 > file2"
end
2010-11-14RubyConf X 41
file1
program
file2
for x in LIST
file x[1] => x[0] do |t|
sh "your_program …"
end
end
2010-11-14 42RubyConf X
 How do your write it with Rake?
2010-11-14RubyConf X 43
Task
file3file1 file2
Task Task
task :A do
task :B do
puts “B”
end
end
task :default => :A
 No task depends
on Task B
task :A do
b = task :B do
puts “B”
end
b.invoke
end
task :default => :A
 Rake::Task#invoke
 Invoke Task B
immediately after
definition
 multitask
◦ Rake built-in feature
◦ Parallelize prerequisite tasks of multitask
◦ Ruby thread
 Problem
◦ No control for the number of thread.
◦ All the prerequisite tasks are invoked at
the same time.
2010-11-14RubyConf X 46
 http://drake.rubyforge.org/
 Specify the number of threads
 All the independent task are
automatically parallelized.
◦ multitask is not necessary
2010-11-14RubyConf X 47
Find task
2010-11-14RubyConf X 48
D E F
C
A
B
Queue to
workers
worker
thread 1
worker
thread 2
Queue from
workers
B
Available task
...
 Remote process execution
 Dynamic task definition
◦ dRake does not allows “invoke” method.
 Performance issue
2010-11-14 49RubyConf X
 Need Powerful Scientific Workflow tool
 Existing
◦ Rake : Powerful for writing workflow
◦ dRake : Parallel execution
 Missing
◦ Remote Process Invocation
◦ Scalability
2010-11-14RubyConf X 50
2010-11-14RubyConf X 51
Process
Process
Process
Rake
Rakefile
Gfarm filesystem
file file file
Pwrake
extension
2010-11-14RubyConf X 52
 Parallel + distributed
 Workflow
 extension for Rake
 repository:
◦ http://github.com/masa16/pwrake
2010-11-14RubyConf X 53
 Same syntax as Rake.
 Parallelize task, file
◦ no multitask
 Replace “sh” method
◦ invoke process through SSH
 Scalability
2010-11-14 54RubyConf X
 Why SSH
◦ Secure
◦ Probably SSH port is available
 SSH class for Pwrake
◦ Original implementation
◦ Performance issues
2010-11-14RubyConf X 55
 Worker thread in Ruby
 Ruby thread uses single-core
◦ GVL
 sh process uses multi-core.
for x in LIST
file x[1] => x[0] do |t|
sh "your_program …"
end
end
2010-11-14RubyConf X 56
here uses multi-core
2010-11-14RubyConf X 57
x10
faster
dRake
Pwrake
Our approach:
 use Distributed Filesystem
◦ file sharing
◦ consistent file timestamp
◦ I/O performance
2010-11-14RubyConf X 58
Storage
CPU CPU CPU
file1 file2 file3
Storage
CPU CPU CPU
Storage I/O
becomes
bottleneck StorageStorage
file1 file2 file3
Network File System Distributed File System
Parallel
Processing
59RubyConf X 2010-11-14
2010-11-14RubyConf X 60
 Wide-area distributed file system
 Global namespace to federate storages
 Main developer : Prof. Osamu Tatebe
 Open source development
◦ http://datafarm.apgrid.org/
Local
Storage
Local
Storage
Local
Storage
Internet
Gfarm File System
/
/dir1
file1 file2
/dir2
file3 file4
Computer nodes
Local
Storage
Local
Storage
Local
Storage
61RubyConf X 2010-11-14
 use Local I/O for performance
 assign task based on File locality
 implement as a function of Pwrake
62RubyConf X 2010-11-14
Local
Storage
Local
Storage
Local
Storage
File System
Nodes
file1 file2 file3
Local
Storagefile4
Job for
File 1
Job for
File 3
Job for
File 3
Slow
Fast
 Locality-aware task assignment for
Gfarm
2010-11-14RubyConf X 63
Task1
Task2
Task3
AffinityQueue
…
worker
thread1
worker
thread2
worker
thread3
push
with hostname
Queue for
host1
pop
with hostname
Queue for
host2
Queue for
host3
…
…
1 node
4 cores
2 nodes
8 cores
4 nodes
16 cores
8 nodes
32 cores
64RubyConf X 2010-11-14
NFS
does not
scale
1 node
4 cores
2 nodes
8 cores
4 nodes
16 cores
8 nodes
32 cores
65RubyConf X 2010-11-14
Gfarm
scales
20%
Speedup
using locality
 Montage workflow
2010-11-14RubyConf X 66
 Geographically distributed wokrflow
 Fault tolerance
2010-11-14RubyConf X 67
 Rake
◦ is so powerful to be used for Scientific
definition language.
 Pwrake
◦ Parallel and Distributed Workflow
extension for Rake
 Gfarm
◦ for scalable I/O performance
2010-11-14 68RubyConf X
 Pwrake site
◦ https://github.com/masa16/pwrake
 Questions?
2010-11-14RubyConf X 69

Contenu connexe

Tendances

A Kernel of Truth: Intrusion Detection and Attestation with eBPF
A Kernel of Truth: Intrusion Detection and Attestation with eBPFA Kernel of Truth: Intrusion Detection and Attestation with eBPF
A Kernel of Truth: Intrusion Detection and Attestation with eBPFoholiab
 
Debugging node in prod
Debugging node in prodDebugging node in prod
Debugging node in prodYunong Xiao
 
Library Operating System for Linux #netdev01
Library Operating System for Linux #netdev01Library Operating System for Linux #netdev01
Library Operating System for Linux #netdev01Hajime Tazaki
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughThomas Graf
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking WalkthroughThomas Graf
 
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Hajime Tazaki
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_mapslcplcp1
 
Networking and Go: An Epic Journey
Networking and Go: An Epic JourneyNetworking and Go: An Epic Journey
Networking and Go: An Epic JourneySneha Inguva
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1Hajime Tazaki
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabMichelle Holley
 
Introduction to OpenCL, 2010
Introduction to OpenCL, 2010Introduction to OpenCL, 2010
Introduction to OpenCL, 2010Tomasz Bednarz
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.Naoto MATSUMOTO
 
Kernel Recipes 2013 - Nftables, what motivations and what solutions
Kernel Recipes 2013 - Nftables, what motivations and what solutionsKernel Recipes 2013 - Nftables, what motivations and what solutions
Kernel Recipes 2013 - Nftables, what motivations and what solutionsAnne Nicolas
 
Debugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDBDebugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDBbmbouter
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCKernel TLV
 
Kernelvm 201312-dlmopen
Kernelvm 201312-dlmopenKernelvm 201312-dlmopen
Kernelvm 201312-dlmopenHajime Tazaki
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Anne Nicolas
 
Linux Kernel Library - Reusing Monolithic Kernel
Linux Kernel Library - Reusing Monolithic KernelLinux Kernel Library - Reusing Monolithic Kernel
Linux Kernel Library - Reusing Monolithic KernelHajime Tazaki
 

Tendances (20)

A Kernel of Truth: Intrusion Detection and Attestation with eBPF
A Kernel of Truth: Intrusion Detection and Attestation with eBPFA Kernel of Truth: Intrusion Detection and Attestation with eBPF
A Kernel of Truth: Intrusion Detection and Attestation with eBPF
 
Debugging node in prod
Debugging node in prodDebugging node in prod
Debugging node in prod
 
Library Operating System for Linux #netdev01
Library Operating System for Linux #netdev01Library Operating System for Linux #netdev01
Library Operating System for Linux #netdev01
 
LinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking WalkthroughLinuxCon 2015 Linux Kernel Networking Walkthrough
LinuxCon 2015 Linux Kernel Networking Walkthrough
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
 
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
Linux rumpkernel - ABC2018 (AsiaBSDCon 2018)
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
 
Xdp and ebpf_maps
Xdp and ebpf_mapsXdp and ebpf_maps
Xdp and ebpf_maps
 
Networking and Go: An Epic Journey
Networking and Go: An Epic JourneyNetworking and Go: An Epic Journey
Networking and Go: An Epic Journey
 
LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1LibOS as a regression test framework for Linux networking #netdev1.1
LibOS as a regression test framework for Linux networking #netdev1.1
 
DPDK in Containers Hands-on Lab
DPDK in Containers Hands-on LabDPDK in Containers Hands-on Lab
DPDK in Containers Hands-on Lab
 
Streams for the Web
Streams for the WebStreams for the Web
Streams for the Web
 
Introduction to OpenCL, 2010
Introduction to OpenCL, 2010Introduction to OpenCL, 2010
Introduction to OpenCL, 2010
 
How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.How to Speak Intel DPDK KNI for Web Services.
How to Speak Intel DPDK KNI for Web Services.
 
Kernel Recipes 2013 - Nftables, what motivations and what solutions
Kernel Recipes 2013 - Nftables, what motivations and what solutionsKernel Recipes 2013 - Nftables, what motivations and what solutions
Kernel Recipes 2013 - Nftables, what motivations and what solutions
 
Debugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDBDebugging Hung Python Processes With GDB
Debugging Hung Python Processes With GDB
 
Building Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCCBuilding Network Functions with eBPF & BCC
Building Network Functions with eBPF & BCC
 
Kernelvm 201312-dlmopen
Kernelvm 201312-dlmopenKernelvm 201312-dlmopen
Kernelvm 201312-dlmopen
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
 
Linux Kernel Library - Reusing Monolithic Kernel
Linux Kernel Library - Reusing Monolithic KernelLinux Kernel Library - Reusing Monolithic Kernel
Linux Kernel Library - Reusing Monolithic Kernel
 

En vedette

Boris Krstović - Choosing the Right Content Management System
Boris Krstović - Choosing the Right Content Management SystemBoris Krstović - Choosing the Right Content Management System
Boris Krstović - Choosing the Right Content Management Systemboccio
 
אקטיביזם תקשורתי וירטואלי2
אקטיביזם תקשורתי וירטואלי2אקטיביזם תקשורתי וירטואלי2
אקטיביזם תקשורתי וירטואלי2hagitmt
 
6th Latin America Clinical Trials Conference, March 2011, Philadelphia
6th Latin America Clinical Trials Conference, March 2011, Philadelphia6th Latin America Clinical Trials Conference, March 2011, Philadelphia
6th Latin America Clinical Trials Conference, March 2011, PhiladelphiaExL Pharma
 
Real-Time Coherence Monitoring in Integrated Environments
Real-Time Coherence Monitoring in Integrated EnvironmentsReal-Time Coherence Monitoring in Integrated Environments
Real-Time Coherence Monitoring in Integrated EnvironmentsSL Corporation
 
2nd Annual Proactive GCP Compliance, April 2011, Arlington, VA
2nd Annual Proactive GCP Compliance, April 2011, Arlington, VA2nd Annual Proactive GCP Compliance, April 2011, Arlington, VA
2nd Annual Proactive GCP Compliance, April 2011, Arlington, VAExL Pharma
 
2011 US Combustion Meeting - Kinetic Modeling of Methyl Formate Oxidation
2011 US Combustion Meeting - Kinetic Modeling of Methyl Formate Oxidation2011 US Combustion Meeting - Kinetic Modeling of Methyl Formate Oxidation
2011 US Combustion Meeting - Kinetic Modeling of Methyl Formate OxidationRichard West
 
Historia apuntes
Historia apuntesHistoria apuntes
Historia apuntesmaricel
 
Analisisbab4 pencapaian
Analisisbab4 pencapaianAnalisisbab4 pencapaian
Analisisbab4 pencapaiansiti mizarina
 
4th Lean Sigma & Kaizen for Pharma R&D Conference, March 2011, Miami
4th Lean Sigma & Kaizen for Pharma R&D Conference, March 2011, Miami4th Lean Sigma & Kaizen for Pharma R&D Conference, March 2011, Miami
4th Lean Sigma & Kaizen for Pharma R&D Conference, March 2011, MiamiExL Pharma
 
JARDÍN DE INFANTES "GENERAL SAN MARTÍN"
JARDÍN DE INFANTES "GENERAL SAN MARTÍN" JARDÍN DE INFANTES "GENERAL SAN MARTÍN"
JARDÍN DE INFANTES "GENERAL SAN MARTÍN" maricel
 
Supertek induction lamp catalog
Supertek induction lamp catalogSupertek induction lamp catalog
Supertek induction lamp catalogbingoeric
 
Building a supportive network
Building a supportive networkBuilding a supportive network
Building a supportive networkAl Cini
 

En vedette (20)

Asesores contable claros & cia
Asesores contable claros & cia Asesores contable claros & cia
Asesores contable claros & cia
 
Urinary system
Urinary systemUrinary system
Urinary system
 
Marble
MarbleMarble
Marble
 
Boris Krstović - Choosing the Right Content Management System
Boris Krstović - Choosing the Right Content Management SystemBoris Krstović - Choosing the Right Content Management System
Boris Krstović - Choosing the Right Content Management System
 
Hoja de trabajo nomina
Hoja de trabajo nominaHoja de trabajo nomina
Hoja de trabajo nomina
 
Membranes and permeability
Membranes and permeabilityMembranes and permeability
Membranes and permeability
 
אקטיביזם תקשורתי וירטואלי2
אקטיביזם תקשורתי וירטואלי2אקטיביזם תקשורתי וירטואלי2
אקטיביזם תקשורתי וירטואלי2
 
6th Latin America Clinical Trials Conference, March 2011, Philadelphia
6th Latin America Clinical Trials Conference, March 2011, Philadelphia6th Latin America Clinical Trials Conference, March 2011, Philadelphia
6th Latin America Clinical Trials Conference, March 2011, Philadelphia
 
Real-Time Coherence Monitoring in Integrated Environments
Real-Time Coherence Monitoring in Integrated EnvironmentsReal-Time Coherence Monitoring in Integrated Environments
Real-Time Coherence Monitoring in Integrated Environments
 
2nd Annual Proactive GCP Compliance, April 2011, Arlington, VA
2nd Annual Proactive GCP Compliance, April 2011, Arlington, VA2nd Annual Proactive GCP Compliance, April 2011, Arlington, VA
2nd Annual Proactive GCP Compliance, April 2011, Arlington, VA
 
2011 US Combustion Meeting - Kinetic Modeling of Methyl Formate Oxidation
2011 US Combustion Meeting - Kinetic Modeling of Methyl Formate Oxidation2011 US Combustion Meeting - Kinetic Modeling of Methyl Formate Oxidation
2011 US Combustion Meeting - Kinetic Modeling of Methyl Formate Oxidation
 
Pre zimskih snova 1
Pre zimskih snova  1Pre zimskih snova  1
Pre zimskih snova 1
 
Historia apuntes
Historia apuntesHistoria apuntes
Historia apuntes
 
Analisisbab4 pencapaian
Analisisbab4 pencapaianAnalisisbab4 pencapaian
Analisisbab4 pencapaian
 
Otherside - RHCP
Otherside - RHCPOtherside - RHCP
Otherside - RHCP
 
4th Lean Sigma & Kaizen for Pharma R&D Conference, March 2011, Miami
4th Lean Sigma & Kaizen for Pharma R&D Conference, March 2011, Miami4th Lean Sigma & Kaizen for Pharma R&D Conference, March 2011, Miami
4th Lean Sigma & Kaizen for Pharma R&D Conference, March 2011, Miami
 
JARDÍN DE INFANTES "GENERAL SAN MARTÍN"
JARDÍN DE INFANTES "GENERAL SAN MARTÍN" JARDÍN DE INFANTES "GENERAL SAN MARTÍN"
JARDÍN DE INFANTES "GENERAL SAN MARTÍN"
 
AS操作分享-- 中壢啟智
AS操作分享-- 中壢啟智AS操作分享-- 中壢啟智
AS操作分享-- 中壢啟智
 
Supertek induction lamp catalog
Supertek induction lamp catalogSupertek induction lamp catalog
Supertek induction lamp catalog
 
Building a supportive network
Building a supportive networkBuilding a supportive network
Building a supportive network
 

Similaire à Pwrake: Distributed Workflow Engine for e-Science - RubyConfX

stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Threestackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick ThreeNETWAYS
 
OSMC 2009 | NConf - Enterprise Nagios configurator by Angelo Gargiulo
OSMC 2009 | NConf - Enterprise Nagios configurator by Angelo GargiuloOSMC 2009 | NConf - Enterprise Nagios configurator by Angelo Gargiulo
OSMC 2009 | NConf - Enterprise Nagios configurator by Angelo GargiuloNETWAYS
 
Banog meetup August 30th, network device property as code
Banog meetup August 30th, network device property as codeBanog meetup August 30th, network device property as code
Banog meetup August 30th, network device property as codeDamien Garros
 
OpenZFS send and receive
OpenZFS send and receiveOpenZFS send and receive
OpenZFS send and receiveMatthew Ahrens
 
Flow Base Programming with Node-RED and Functional Reactive Programming with ...
Flow Base Programming with Node-RED and Functional Reactive Programming with ...Flow Base Programming with Node-RED and Functional Reactive Programming with ...
Flow Base Programming with Node-RED and Functional Reactive Programming with ...Sven Beauprez
 
Cloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep DiveCloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep DiveKazuto Kusama
 
Chicago Docker Meetup Presentation - Mediafly
Chicago Docker Meetup Presentation - MediaflyChicago Docker Meetup Presentation - Mediafly
Chicago Docker Meetup Presentation - MediaflyMediafly
 
FastR+Apache Flink
FastR+Apache FlinkFastR+Apache Flink
FastR+Apache FlinkJuan Fumero
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsCeph Community
 
New bare-metal provisioning setup built around Collins
New bare-metal provisioning setup built around CollinsNew bare-metal provisioning setup built around Collins
New bare-metal provisioning setup built around Collinsleboncoin engineering
 
Introduction to Docker at the Azure Meet-up in New York
Introduction to Docker at the Azure Meet-up in New YorkIntroduction to Docker at the Azure Meet-up in New York
Introduction to Docker at the Azure Meet-up in New YorkJérôme Petazzoni
 
[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep Dive[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep DiveAkihiro Suda
 
Super powered Drupal development with docker
Super powered Drupal development with dockerSuper powered Drupal development with docker
Super powered Drupal development with dockerMaciej Lukianski
 
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebula Project
 
Gears of Perforce: AAA Game Development Challenges
Gears of Perforce: AAA Game Development ChallengesGears of Perforce: AAA Game Development Challenges
Gears of Perforce: AAA Game Development ChallengesPerforce
 
Montreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpMontreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpStacy Véronneau
 
Nanog75, Network Device Property as Code
Nanog75, Network Device Property as CodeNanog75, Network Device Property as Code
Nanog75, Network Device Property as CodeDamien Garros
 
Hambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2OHambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2OSri Ambati
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaAbhinav Singh
 

Similaire à Pwrake: Distributed Workflow Engine for e-Science - RubyConfX (20)

Lua and its Ecosystem
Lua and its EcosystemLua and its Ecosystem
Lua and its Ecosystem
 
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Threestackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
stackconf 2022: Cluster Management: Heterogeneous, Lightweight, Safe. Pick Three
 
OSMC 2009 | NConf - Enterprise Nagios configurator by Angelo Gargiulo
OSMC 2009 | NConf - Enterprise Nagios configurator by Angelo GargiuloOSMC 2009 | NConf - Enterprise Nagios configurator by Angelo Gargiulo
OSMC 2009 | NConf - Enterprise Nagios configurator by Angelo Gargiulo
 
Banog meetup August 30th, network device property as code
Banog meetup August 30th, network device property as codeBanog meetup August 30th, network device property as code
Banog meetup August 30th, network device property as code
 
OpenZFS send and receive
OpenZFS send and receiveOpenZFS send and receive
OpenZFS send and receive
 
Flow Base Programming with Node-RED and Functional Reactive Programming with ...
Flow Base Programming with Node-RED and Functional Reactive Programming with ...Flow Base Programming with Node-RED and Functional Reactive Programming with ...
Flow Base Programming with Node-RED and Functional Reactive Programming with ...
 
Cloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep DiveCloud Foundry V2 | Intermediate Deep Dive
Cloud Foundry V2 | Intermediate Deep Dive
 
Chicago Docker Meetup Presentation - Mediafly
Chicago Docker Meetup Presentation - MediaflyChicago Docker Meetup Presentation - Mediafly
Chicago Docker Meetup Presentation - Mediafly
 
FastR+Apache Flink
FastR+Apache FlinkFastR+Apache Flink
FastR+Apache Flink
 
Experiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah WatkinsExperiences building a distributed shared log on RADOS - Noah Watkins
Experiences building a distributed shared log on RADOS - Noah Watkins
 
New bare-metal provisioning setup built around Collins
New bare-metal provisioning setup built around CollinsNew bare-metal provisioning setup built around Collins
New bare-metal provisioning setup built around Collins
 
Introduction to Docker at the Azure Meet-up in New York
Introduction to Docker at the Azure Meet-up in New YorkIntroduction to Docker at the Azure Meet-up in New York
Introduction to Docker at the Azure Meet-up in New York
 
[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep Dive[KubeCon EU 2020] containerd Deep Dive
[KubeCon EU 2020] containerd Deep Dive
 
Super powered Drupal development with docker
Super powered Drupal development with dockerSuper powered Drupal development with docker
Super powered Drupal development with docker
 
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
OpenNebulaConf 2016 - Measuring and tuning VM performance by Boyan Krosnov, S...
 
Gears of Perforce: AAA Game Development Challenges
Gears of Perforce: AAA Game Development ChallengesGears of Perforce: AAA Game Development Challenges
Gears of Perforce: AAA Game Development Challenges
 
Montreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUpMontreal OpenStack Q3-2017 MeetUp
Montreal OpenStack Q3-2017 MeetUp
 
Nanog75, Network Device Property as Code
Nanog75, Network Device Property as CodeNanog75, Network Device Property as Code
Nanog75, Network Device Property as Code
 
Hambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2OHambug R Meetup - Intro to H2O
Hambug R Meetup - Intro to H2O
 
Stream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache KafkaStream Processing using Apache Spark and Apache Kafka
Stream Processing using Apache Spark and Apache Kafka
 

Dernier

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Pwrake: Distributed Workflow Engine for e-Science - RubyConfX

  • 1. Masahiro Tanaka University of Tsukuba 2010-11-14 1RubyConf X
  • 2.  Masahiro Tanaka  NArray author  majored in Astronomy  Research fellow in Computer Science ◦ at Center for Computational Sciences, University of Tsukuba ◦ since 2009 2010-11-14 2RubyConf X
  • 3.  Research Fields ◦ Computer Science:  High Performance Computing  Computational Informatics ◦ Computational Science:  Particle Physics, Astrophysics, Material Science, Life Science, Biology, Environmental Science 2010-11-14 3RubyConf X
  • 4.  FIRST ◦ 512 cores+BladeGRAPE ◦ 36 TFLOPS  PACS-CS ◦ 2,560 cores ◦ 14.4 TFOPS  T2K Tsukuba ◦ 10,368 cores ◦ 95 TFLOPS 2010-11-14RubyConf X 4
  • 5.  Conference on SuperComputer  > 10,000 participants 2010-11-14RubyConf X 5
  • 6. We are here SC10 venue Ernest N. Morial Convention Center exhibit Nov 15-18 2010-11-14 6RubyConf X
  • 7.  CCS booth at SC09 2010-11-14RubyConf X 7
  • 8.  Pwrake : a Distributed Workflow Engine for e-Science 2010-11-14 8RubyConf X
  • 12.  Computationally intensive science that is carried out in highly distributed network environments, or  Science that uses immense data sets that require grid computing ◦ (Wikipedia). 2010-11-14 12RubyConf X
  • 13.  The term was created by John Taylor, ◦ Director General of the United Kingdom's Office of Science and Technology ◦ in 1999 2010-11-14RubyConf X 13
  • 14.  is a key issue for e-Science. 2010-11-14 14RubyConf X
  • 15.  Performance of single core does no more increase. 2010-11-14 15RubyConf X
  • 18.  Parallelize your program  Scalability is an key issue 2010-11-14RubyConf X 18
  • 19.  P : parallelizable  1-P : sequential  N : # of processors  Speed-up formula : 2010-11-14RubyConf X 19 Number of Processors Speedup 1 1 − P + P/N P<1 1 1 − P
  • 20.  MapReduce  MPI  OpenMP  thread  Parallel programming languages  process 2010-11-14RubyConf X 20
  • 21.  Independent processes can be parallelized  Without parallel programming  Workflow System is required 2010-11-14RubyConf X 21 input1 program output1 ...input2 program output2 input3 program output3 input4 program output4 ... ...
  • 23.  Description of procedures  It is like building a program 2010-11-14RubyConf X 23 cc –c –o a.o a.c a.c b.c b.oa.o cc –o prog … prog cc –c –o b.o b.c
  • 24.  Task: Ellipse Node  File: Rectangle Node  Dependency: Edge  DAG ◦ Directed Acyclic Graph 2010-11-14 24RubyConf X Input File Task Output File Dependency
  • 25.  Montage ◦ software for producing a custom mosaic image from multiple shots of images. ◦ http://montage.ipac.caltech.edu/ 2010-11-14 25RubyConf X
  • 26.  Tasks: ◦ Projection ◦ Brightness correction ◦ Coadding  1 image : 1 process 26RubyConf X 2010-11-14 mProjectPP mDiff mBgModel mBackground mAddmFitplane m1 = a'1x+b'1y+c'1 m2 = a'2x+b'2y+c'2 a1x+b1y+c1=0 a2x+b2y+c2=0 Final image Input images
  • 28. ◦ invoke task based on dependency ◦ assign a task to an available computer ◦ parallel execution for independent tasks 2010-11-14RubyConf X 28 Process Process Process Workflow System Workflow definition
  • 29.  for Grid Computing 2010-11-14RubyConf X 29 • DAGMan • Pegasus • Triana • ICENI • Taverna • GrADS • GridFlow • UNICORE • Globus workflow • Askalan • Karajan • Kepler from “A Taxonomy of Scientific Workflow Systems for Grid Computing” Jia Yu and Rajkumar Buyya (2005)
  • 30.  Define DAG in XML ◦ Human cannot write complex XML. ◦ Need to write a program to generate XML 2010-11-14RubyConf X 30 <adag xmlns="http://www.griphyn.org/chimera/DAX" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.8.xsd" count="1" index="0" name="test"> <filename file="2mass-atlas-981204n-j0160056.fits" link="input"/> … <job id="ID000001" name="mProject" version="3.0" level="11" dv- name="mProject1" dv-version="1.0"> <argument> <filename file="2mass-atlas-981204n-j0160056.fits"/> <filename file="p2mass-atlas-981204n-j0160056.fits"/> <filename file="templateTMP_AAAaaa01.hdr"/> </argument> <uses file="2mass-atlas-981204n-j0160056.fits" link="input" dontRegister="false" dontTransfer="false"/> <uses file="p2mass-atlas-981204n-j0160056.fits" link="output" dontRegister="true" dontTransfer="true" temporaryHint="tmp"/> <uses file="p2mass-atlas-981204n-j0160056_area.fits" link="output" dontRegister="true" dontTransfer="true" temporaryHint="tmp"/> <uses file="templateTMP_AAAaaa01.hdr" link="input" dontRegister="false" dontTransfer="false"/> </job> … <child ref="ID003006"> <parent ref="ID000001"/> <parent ref="ID000006"/> </child> DAG XML
  • 31.  DSL to define task dependency  Rule ◦ define multiple tasks at once ◦ avoid redundancy  Skip finished tasks ◦ based on timestamp of file 2010-11-14RubyConf X 31
  • 32.  Grid Explorer : Grid and Cluster shell ◦ http://www.logos.ic.i.u-tokyo.ac.jp/gxp/ ◦ written in Python.  GXP Make ◦ GNU Make-based workflow system ◦ Distributed & Parallel execution 2010-11-14RubyConf X 32
  • 33.  Makefile ◦ same input files ◦ same tasks ◦ executed repeatedly  Scientific Workflows have different aspects. 2010-11-14 33RubyConf X
  • 34.  Same workflow for different files ◦ “rule” may solve, but is not enough. 2010-11-14RubyConf X 34 File set 1 file file file File set 2 file file file Workflow result 1 result 2
  • 35.  Task dependencies rely on : ◦ Not only file name ◦ Parameters, e.g. Geometry 2010-11-14RubyConf X 35 A B C D A task B C D task task task
  • 36.  Entire workflow is unknown at first  Result of a task affect ◦ Output files ◦ Afterward tasks 2010-11-14RubyConf X 36 Task file3file1 file2 Task Task
  • 37.  create Makefile during Make execution  tricky way Makefile Makefile.sub: prerequisite awk –f hoge.awk $< > $@ target: Makefile.sub make -f $< Makefile.sub target1: source1 target2: source2 … create invoke 2010-11-14 37RubyConf X
  • 38.  Scientific workflow requires powerful and flexible definition language.  You probably know the solution. ◦ What is it? 2010-11-14 38RubyConf X
  • 40.  Build tool  Internal DSL  Programming power of Ruby 2010-11-14RubyConf X 40
  • 41. file "file2" => "file1" do sh "program file1 > file2" end 2010-11-14RubyConf X 41 file1 program file2
  • 42. for x in LIST file x[1] => x[0] do |t| sh "your_program …" end end 2010-11-14 42RubyConf X
  • 43.  How do your write it with Rake? 2010-11-14RubyConf X 43 Task file3file1 file2 Task Task
  • 44. task :A do task :B do puts “B” end end task :default => :A  No task depends on Task B
  • 45. task :A do b = task :B do puts “B” end b.invoke end task :default => :A  Rake::Task#invoke  Invoke Task B immediately after definition
  • 46.  multitask ◦ Rake built-in feature ◦ Parallelize prerequisite tasks of multitask ◦ Ruby thread  Problem ◦ No control for the number of thread. ◦ All the prerequisite tasks are invoked at the same time. 2010-11-14RubyConf X 46
  • 47.  http://drake.rubyforge.org/  Specify the number of threads  All the independent task are automatically parallelized. ◦ multitask is not necessary 2010-11-14RubyConf X 47
  • 48. Find task 2010-11-14RubyConf X 48 D E F C A B Queue to workers worker thread 1 worker thread 2 Queue from workers B Available task ...
  • 49.  Remote process execution  Dynamic task definition ◦ dRake does not allows “invoke” method.  Performance issue 2010-11-14 49RubyConf X
  • 50.  Need Powerful Scientific Workflow tool  Existing ◦ Rake : Powerful for writing workflow ◦ dRake : Parallel execution  Missing ◦ Remote Process Invocation ◦ Scalability 2010-11-14RubyConf X 50
  • 51. 2010-11-14RubyConf X 51 Process Process Process Rake Rakefile Gfarm filesystem file file file Pwrake extension
  • 53.  Parallel + distributed  Workflow  extension for Rake  repository: ◦ http://github.com/masa16/pwrake 2010-11-14RubyConf X 53
  • 54.  Same syntax as Rake.  Parallelize task, file ◦ no multitask  Replace “sh” method ◦ invoke process through SSH  Scalability 2010-11-14 54RubyConf X
  • 55.  Why SSH ◦ Secure ◦ Probably SSH port is available  SSH class for Pwrake ◦ Original implementation ◦ Performance issues 2010-11-14RubyConf X 55
  • 56.  Worker thread in Ruby  Ruby thread uses single-core ◦ GVL  sh process uses multi-core. for x in LIST file x[1] => x[0] do |t| sh "your_program …" end end 2010-11-14RubyConf X 56 here uses multi-core
  • 58. Our approach:  use Distributed Filesystem ◦ file sharing ◦ consistent file timestamp ◦ I/O performance 2010-11-14RubyConf X 58
  • 59. Storage CPU CPU CPU file1 file2 file3 Storage CPU CPU CPU Storage I/O becomes bottleneck StorageStorage file1 file2 file3 Network File System Distributed File System Parallel Processing 59RubyConf X 2010-11-14
  • 61.  Wide-area distributed file system  Global namespace to federate storages  Main developer : Prof. Osamu Tatebe  Open source development ◦ http://datafarm.apgrid.org/ Local Storage Local Storage Local Storage Internet Gfarm File System / /dir1 file1 file2 /dir2 file3 file4 Computer nodes Local Storage Local Storage Local Storage 61RubyConf X 2010-11-14
  • 62.  use Local I/O for performance  assign task based on File locality  implement as a function of Pwrake 62RubyConf X 2010-11-14 Local Storage Local Storage Local Storage File System Nodes file1 file2 file3 Local Storagefile4 Job for File 1 Job for File 3 Job for File 3 Slow Fast
  • 63.  Locality-aware task assignment for Gfarm 2010-11-14RubyConf X 63 Task1 Task2 Task3 AffinityQueue … worker thread1 worker thread2 worker thread3 push with hostname Queue for host1 pop with hostname Queue for host2 Queue for host3 … …
  • 64. 1 node 4 cores 2 nodes 8 cores 4 nodes 16 cores 8 nodes 32 cores 64RubyConf X 2010-11-14 NFS does not scale
  • 65. 1 node 4 cores 2 nodes 8 cores 4 nodes 16 cores 8 nodes 32 cores 65RubyConf X 2010-11-14 Gfarm scales 20% Speedup using locality
  • 67.  Geographically distributed wokrflow  Fault tolerance 2010-11-14RubyConf X 67
  • 68.  Rake ◦ is so powerful to be used for Scientific definition language.  Pwrake ◦ Parallel and Distributed Workflow extension for Rake  Gfarm ◦ for scalable I/O performance 2010-11-14 68RubyConf X
  • 69.  Pwrake site ◦ https://github.com/masa16/pwrake  Questions? 2010-11-14RubyConf X 69