SlideShare une entreprise Scribd logo
1  sur  14
Télécharger pour lire hors ligne
SciPipe
A light-weight workflow library
inspired by flow-based
programming
Samuel Lampa, @smllmp, bionics.it
Dept. Pharm. Biosci. UU, 2016-04-28
Top light-weight workflow tools
Snakemake
● Great for short one-off explorative stuff
● Tricky for complex graphs
Bpipe
● Easy to use for highly linear workflows
● Not so easy with branching workflows
Nextflow
● Dataflow means dynamic scheduling possible(!)
● Own way of organizing outputs
● No “re-usable components” support
SciLuigi and SciPipe
SciLuigi
● Great re-usable components story
● Highly customizable output file naming
● Easy to extend API
● No dynamic scheduling :(
● Performance problems with more than 64 workers
SciPipe
● (Same benefits as SciLuigi)
● Also: Allows dynamic scheduling
● Also: Much lower resource usage
(1000s of workers is OK)
● Also: Simpler, less code, less maintenance
● Also: High-performance for in-line components
SciPipe in brief
● Website: scipipe.org
● Simple, very little code => maintainable
● Write workflows in a subset of Go(lang)
● Execute readable .go-files:
go run myworkflow.go
● Optional compilation to static executable files:
go build; ./myworkflow
● No new language. Use existing Go tooling:
● Editors, Debuggers, Linters, Profilers ...
Flow-based programming
www.jpaulmorrison.com/fbp
Flow-based programming principles
● Separate network definition
(separate from process definitions)
● Named ports
● Channels with bounded buffers
● Information packets (IPs) with defined lifetimes
● More info:
en.wikipedia.org/wiki/Flow-based programming
www.jpaulmorrison.com/fbp
Define components
From shell commands:
… or in plain Go:
… and then mix & match! (See e.g. this example)
Connect components
Just assign out-ports to in-ports
(will make them use the same Go channel):
That’s about it.
SciPipe in action: ”Hello World” example
SciPipe:
A bit longer
example...
SciPipe:
A bit longer
example...
In this area, all processes will
run tasks in parallel, one for
each split produced by “split”
Architecture: Basic Components
● scipipe.SciProcess
● Long-running
● Typically one per operation
● Typically spawns one task per input
● scipipe.SciTask
● Short lived
● Executes just one shell command or custom Go
function
● Typically one per operation/set of in-data files
● scipipe.FileTarget
● Most common data type passed between processes
Architecture: Processes vs. Tasks
Thank you!
@smllmp
bionics.it

Contenu connexe

En vedette

Flow based programming an overview
Flow based programming   an overviewFlow based programming   an overview
Flow based programming an overview
Samuel Lampa
 

En vedette (20)

Taming Snakemake
Taming SnakemakeTaming Snakemake
Taming Snakemake
 
Principals, Practices, and Habits
Principals, Practices, and HabitsPrincipals, Practices, and Habits
Principals, Practices, and Habits
 
The RDFIO Extension - A Status update
The RDFIO Extension - A Status updateThe RDFIO Extension - A Status update
The RDFIO Extension - A Status update
 
Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...
 
Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15
 
Thesis presentation Samuel Lampa
Thesis presentation Samuel LampaThesis presentation Samuel Lampa
Thesis presentation Samuel Lampa
 
Reproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and AndurilReproducible bioinformatics pipelines with Docker and Anduril
Reproducible bioinformatics pipelines with Docker and Anduril
 
Agile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discoveryAgile large-scale machine-learning pipelines in drug discovery
Agile large-scale machine-learning pipelines in drug discovery
 
Batch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWikiBatch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWiki
 
Neuro4j Workflow Overview
Neuro4j Workflow OverviewNeuro4j Workflow Overview
Neuro4j Workflow Overview
 
Flow based programming an overview
Flow based programming   an overviewFlow based programming   an overview
Flow based programming an overview
 
Flow-based programming with Elixir
Flow-based programming with ElixirFlow-based programming with Elixir
Flow-based programming with Elixir
 
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
 
ZeroMQ Is The Answer
ZeroMQ Is The AnswerZeroMQ Is The Answer
ZeroMQ Is The Answer
 
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive SystemsGo Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
Go Reactive: Event-Driven, Scalable, Resilient & Responsive Systems
 
Reactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive WayReactive Streams: Handling Data-Flow the Reactive Way
Reactive Streams: Handling Data-Flow the Reactive Way
 
Reproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience SeminarReproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience Seminar
 
Introduction to the drug discovery process
Introduction to the drug discovery processIntroduction to the drug discovery process
Introduction to the drug discovery process
 
Strategic review (Sample)
Strategic review (Sample)Strategic review (Sample)
Strategic review (Sample)
 
Hadoop gets Groovy
Hadoop gets GroovyHadoop gets Groovy
Hadoop gets Groovy
 

Similaire à SciPipe - A light-weight workflow library inspired by flow-based programming

Iga workflow
Iga workflowIga workflow
Iga workflow
WEBdeBS
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
ScyllaDB
 

Similaire à SciPipe - A light-weight workflow library inspired by flow-based programming (20)

apacheairflow-160827123852.pdf
apacheairflow-160827123852.pdfapacheairflow-160827123852.pdf
apacheairflow-160827123852.pdf
 
Jupyter notebooks on steroids
Jupyter notebooks on steroidsJupyter notebooks on steroids
Jupyter notebooks on steroids
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Performance optimization techniques for Java code
Performance optimization techniques for Java codePerformance optimization techniques for Java code
Performance optimization techniques for Java code
 
HiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOSHiPEAC 2019 Tutorial - Maestro RTOS
HiPEAC 2019 Tutorial - Maestro RTOS
 
Iga workflow
Iga workflowIga workflow
Iga workflow
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overview
 
PyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web ApplicationsPyCon AU 2012 - Debugging Live Python Web Applications
PyCon AU 2012 - Debugging Live Python Web Applications
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
Wait, IPython can do that?! (30 minutes)
Wait, IPython can do that?! (30 minutes)Wait, IPython can do that?! (30 minutes)
Wait, IPython can do that?! (30 minutes)
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
 
Go at Skroutz
Go at SkroutzGo at Skroutz
Go at Skroutz
 
Building Your First Apache Apex Application
Building Your First Apache Apex ApplicationBuilding Your First Apache Apex Application
Building Your First Apache Apex Application
 
Building your first aplication using Apache Apex
Building your first aplication using Apache ApexBuilding your first aplication using Apache Apex
Building your first aplication using Apache Apex
 
Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.Great Tools Heavily Used In Japan, You Don't Know.
Great Tools Heavily Used In Japan, You Don't Know.
 
My "Perfect" Toolchain Setup for Grails Projects
My "Perfect" Toolchain Setup for Grails ProjectsMy "Perfect" Toolchain Setup for Grails Projects
My "Perfect" Toolchain Setup for Grails Projects
 
Api world apache nifi 101
Api world   apache nifi 101Api world   apache nifi 101
Api world apache nifi 101
 
Apache Spark vs Apache Flink
Apache Spark vs Apache FlinkApache Spark vs Apache Flink
Apache Spark vs Apache Flink
 

Plus de Samuel Lampa

Profiling go code a beginners tutorial
Profiling go code   a beginners tutorialProfiling go code   a beginners tutorial
Profiling go code a beginners tutorial
Samuel Lampa
 
My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013
Samuel Lampa
 

Plus de Samuel Lampa (7)

Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research data
 
How to document computational research projects
How to document computational research projectsHow to document computational research projects
How to document computational research projects
 
First encounter with Elixir - Some random things
First encounter with Elixir - Some random thingsFirst encounter with Elixir - Some random things
First encounter with Elixir - Some random things
 
Profiling go code a beginners tutorial
Profiling go code   a beginners tutorialProfiling go code   a beginners tutorial
Profiling go code a beginners tutorial
 
My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013
 
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

SciPipe - A light-weight workflow library inspired by flow-based programming

  • 1. SciPipe A light-weight workflow library inspired by flow-based programming Samuel Lampa, @smllmp, bionics.it Dept. Pharm. Biosci. UU, 2016-04-28
  • 2. Top light-weight workflow tools Snakemake ● Great for short one-off explorative stuff ● Tricky for complex graphs Bpipe ● Easy to use for highly linear workflows ● Not so easy with branching workflows Nextflow ● Dataflow means dynamic scheduling possible(!) ● Own way of organizing outputs ● No “re-usable components” support
  • 3. SciLuigi and SciPipe SciLuigi ● Great re-usable components story ● Highly customizable output file naming ● Easy to extend API ● No dynamic scheduling :( ● Performance problems with more than 64 workers SciPipe ● (Same benefits as SciLuigi) ● Also: Allows dynamic scheduling ● Also: Much lower resource usage (1000s of workers is OK) ● Also: Simpler, less code, less maintenance ● Also: High-performance for in-line components
  • 4. SciPipe in brief ● Website: scipipe.org ● Simple, very little code => maintainable ● Write workflows in a subset of Go(lang) ● Execute readable .go-files: go run myworkflow.go ● Optional compilation to static executable files: go build; ./myworkflow ● No new language. Use existing Go tooling: ● Editors, Debuggers, Linters, Profilers ...
  • 6. Flow-based programming principles ● Separate network definition (separate from process definitions) ● Named ports ● Channels with bounded buffers ● Information packets (IPs) with defined lifetimes ● More info: en.wikipedia.org/wiki/Flow-based programming www.jpaulmorrison.com/fbp
  • 7. Define components From shell commands: … or in plain Go: … and then mix & match! (See e.g. this example)
  • 8. Connect components Just assign out-ports to in-ports (will make them use the same Go channel): That’s about it.
  • 9. SciPipe in action: ”Hello World” example
  • 11. SciPipe: A bit longer example... In this area, all processes will run tasks in parallel, one for each split produced by “split”
  • 12. Architecture: Basic Components ● scipipe.SciProcess ● Long-running ● Typically one per operation ● Typically spawns one task per input ● scipipe.SciTask ● Short lived ● Executes just one shell command or custom Go function ● Typically one per operation/set of in-data files ● scipipe.FileTarget ● Most common data type passed between processes