SlideShare une entreprise Scribd logo
1  sur  21
Taming Snakemake
1/27/14
Why Make?


What are Make's advantages (over Perl and shell scripts)?


Make forces you to think about file transformation in terms of inputs and outputs, recipes and rules. In Perl you are forced to think at the level of
variables, conditionals, and loops. In Shell you are forced to think like a caveman.



Unfortunately, bioinformatics is still largely about files and their suffixes. Make has a very powerful syntax based almost entirely around file suffixes.



Make knows what's been made and what hasn't. Make can be interrupted and restarted safely, and without overwriting finished work.



Make knows what's changed and what hasn't. If an input is newer than an output, it will attempt to rebuild the output.



Make allows you to add new input files without worrying about overwriting old ones.



Make is well supported. There are 1333 Make questions on SO alone.



When people see a Makefile, they immediately know how to run it.



Make does not force you to wrap shell statements in quotes.



Make is a DSL. It will attempt to validate your syntax.



Make is ancient, ubiquitous, and reliable.



Make can parallelize with --jobs.



Make recipes encourage reuse.

https://share.chop.edu/pages/viewpage.action?pageId=138478819
Make review

http://github.research.chop.edu/BiG/err_chip_seq/blob/master/Makefile
Pipelines and Workflows
Other pipelines
Ruffus

Queue

GKNO
Why Snakemake?
 Addresses Makefile weaknesses without
throwing out the good stuff
 Difficult to implement control flow
 No cluster support
 Inflexible wildcards
 Too much reliance on sentinal files
 No reporting mechanism

Johannes Köster
Syntax
Make
Variables

Targets

Rules

Snakemake
Utilities
 Logs - wire them up manually

 Cluster support pretty decent
source /nas/is1/leipzig/martin/variome-env/bin/activate
snakemake --directory /nas/is1/leipzig/martin/snake-env --snakefile /nas/is1/leipzig/martin/snake-env/Snakefile -c qsub -j
16
source /mnt/isilon/cbmi/variome/leipzig/martin/respublica-env/bin/activate
snakemake --directory /mnt/isilon/cbmi/variome/leipzig/martin/snake-env --snakefile /mnt/isilon/cbmi/variome/leipzig/martin/snake-env/Snakefile -c qsub -j
16

 Cores/jobs/resources
Useful stuff
 dry-runs
 keep-going
 touch
 version changes
 workflow diagrams
Python legal
Client websites with Jekyll
 Jekyll is a templating engine for blogs that
accepts Markdown
 Layouts use the Liquid markup

http://mitomap.org/martin-rnaseq/
A workflow that reports itself
Avoiding Sweave-Hell
The bad way
Cache-ing chunks?
Avoiding Sweave-Hell
Avoiding Sweave-Hell
R/Snakemake integration
git submodule add git@github.research.chop.edu:BiG/rna-seq-common-functions.git common/rna-seq
Leave a paper trail
Reproducible Checklist
 repository github.research.chop.edu
 workflow of some kind from beginning to end
 website at mybic.chop.edu
Ties that bind

Contenu connexe

Tendances

Sciunits: Resuable Research Object
Sciunits: Resuable Research Object Sciunits: Resuable Research Object
Sciunits: Resuable Research Object Tanu Malik
 
Whole Genome Sequencing - Data Processing and QC at SciLifeLab NGI
Whole Genome Sequencing - Data Processing and QC at SciLifeLab NGIWhole Genome Sequencing - Data Processing and QC at SciLifeLab NGI
Whole Genome Sequencing - Data Processing and QC at SciLifeLab NGIPhil Ewels
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
Lichtenberg bosc2010 wordseeker
Lichtenberg bosc2010 wordseekerLichtenberg bosc2010 wordseeker
Lichtenberg bosc2010 wordseekerBOSC 2010
 
London bosc2010
London bosc2010London bosc2010
London bosc2010BOSC 2010
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolHong ChangBum
 
Performance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryPerformance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryZongYing Lyu
 
Next-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plotNext-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plotLi Shen
 
NGS data formats and analyses
NGS data formats and analysesNGS data formats and analyses
NGS data formats and analysesrjorton
 
Bic bellec 2016_small
Bic bellec 2016_smallBic bellec 2016_small
Bic bellec 2016_smallPierre Bellec
 
Closing the Gap in Time: From Raw Data to Real Science
Closing the Gap in Time: From Raw Data to Real ScienceClosing the Gap in Time: From Raw Data to Real Science
Closing the Gap in Time: From Raw Data to Real ScienceJustin Johnson
 
The Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesThe Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesReece Hart
 
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...Anne Nicolas
 
Variability, Bugs, and Cognition
Variability, Bugs, and CognitionVariability, Bugs, and Cognition
Variability, Bugs, and CognitionAndrzej Wasowski
 
Architecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemArchitecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemZongYing Lyu
 
An Empirical Study on Inconsistent Changes to Code Clones at Release Level
An Empirical Study on Inconsistent Changes to Code Clones at Release LevelAn Empirical Study on Inconsistent Changes to Code Clones at Release Level
An Empirical Study on Inconsistent Changes to Code Clones at Release LevelNicolas Bettenburg
 
Big data solution for ngs data analysis
Big data solution for ngs data analysisBig data solution for ngs data analysis
Big data solution for ngs data analysisYun Lung Li
 

Tendances (20)

Sciunits: Resuable Research Object
Sciunits: Resuable Research Object Sciunits: Resuable Research Object
Sciunits: Resuable Research Object
 
Whole Genome Sequencing - Data Processing and QC at SciLifeLab NGI
Whole Genome Sequencing - Data Processing and QC at SciLifeLab NGIWhole Genome Sequencing - Data Processing and QC at SciLifeLab NGI
Whole Genome Sequencing - Data Processing and QC at SciLifeLab NGI
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
Lichtenberg bosc2010 wordseeker
Lichtenberg bosc2010 wordseekerLichtenberg bosc2010 wordseeker
Lichtenberg bosc2010 wordseeker
 
London bosc2010
London bosc2010London bosc2010
London bosc2010
 
Galaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo ProtocolGalaxy RNA-Seq Analysis: Tuxedo Protocol
Galaxy RNA-Seq Analysis: Tuxedo Protocol
 
Software Dev
Software DevSoftware Dev
Software Dev
 
Performance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memoryPerformance improvement techniques for software distributed shared memory
Performance improvement techniques for software distributed shared memory
 
Next-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plotNext-generation sequencing format and visualization with ngs.plot
Next-generation sequencing format and visualization with ngs.plot
 
NGS data formats and analyses
NGS data formats and analysesNGS data formats and analyses
NGS data formats and analyses
 
ChipSeq Data Analysis
ChipSeq Data AnalysisChipSeq Data Analysis
ChipSeq Data Analysis
 
Bic bellec 2016_small
Bic bellec 2016_smallBic bellec 2016_small
Bic bellec 2016_small
 
Nephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele resultsNephele 2.0: How to get the most out of your Nephele results
Nephele 2.0: How to get the most out of your Nephele results
 
Closing the Gap in Time: From Raw Data to Real Science
Closing the Gap in Time: From Raw Data to Real ScienceClosing the Gap in Time: From Raw Data to Real Science
Closing the Gap in Time: From Raw Data to Real Science
 
The Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment DiscrepanciesThe Clinical Significance of Transcript Alignment Discrepancies
The Clinical Significance of Transcript Alignment Discrepancies
 
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
Kernel Recipes 2016 - Would an ABI changes visualization tool be useful to Li...
 
Variability, Bugs, and Cognition
Variability, Bugs, and CognitionVariability, Bugs, and Cognition
Variability, Bugs, and Cognition
 
Architecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory systemArchitecture of the oasis mobile shared virtual memory system
Architecture of the oasis mobile shared virtual memory system
 
An Empirical Study on Inconsistent Changes to Code Clones at Release Level
An Empirical Study on Inconsistent Changes to Code Clones at Release LevelAn Empirical Study on Inconsistent Changes to Code Clones at Release Level
An Empirical Study on Inconsistent Changes to Code Clones at Release Level
 
Big data solution for ngs data analysis
Big data solution for ngs data analysisBig data solution for ngs data analysis
Big data solution for ngs data analysis
 

Similaire à Taming Snakemake

🐲 Here be Stacktraces — Flink SQL for Non-Java Developers
🐲 Here be Stacktraces — Flink SQL for Non-Java Developers🐲 Here be Stacktraces — Flink SQL for Non-Java Developers
🐲 Here be Stacktraces — Flink SQL for Non-Java DevelopersHostedbyConfluent
 
Makefile for python projects
Makefile for python projectsMakefile for python projects
Makefile for python projectsMpho Mphego
 
Katello on TorqueBox
Katello on TorqueBoxKatello on TorqueBox
Katello on TorqueBoxlzap
 
Ruby on Rails (RoR) as a back-end processor for Apex
Ruby on Rails (RoR) as a back-end processor for Apex Ruby on Rails (RoR) as a back-end processor for Apex
Ruby on Rails (RoR) as a back-end processor for Apex Espen Brækken
 
Compile ahead of time. It's fine?
Compile ahead of time. It's fine?Compile ahead of time. It's fine?
Compile ahead of time. It's fine?Dmitry Chuyko
 
Iptablesrocks
IptablesrocksIptablesrocks
Iptablesrocksqwer_asdf
 
Functional programming is the most extreme programming
Functional programming is the most extreme programmingFunctional programming is the most extreme programming
Functional programming is the most extreme programmingsamthemonad
 
ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
 ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in... ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...Saurabh Nanda
 
Sparklife - Life In The Trenches With Spark
Sparklife - Life In The Trenches With SparkSparklife - Life In The Trenches With Spark
Sparklife - Life In The Trenches With SparkIan Pointer
 
Developing OpenResty Framework
Developing OpenResty FrameworkDeveloping OpenResty Framework
Developing OpenResty FrameworkOpenRestyCon
 
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...NETWAYS
 
Webinar: Learn Perl - The Jewel of Scripting Languages
Webinar: Learn Perl - The Jewel of Scripting LanguagesWebinar: Learn Perl - The Jewel of Scripting Languages
Webinar: Learn Perl - The Jewel of Scripting LanguagesEdureka!
 
Concurrent Programming with Ruby and Tuple Spaces
Concurrent Programming with Ruby and Tuple SpacesConcurrent Programming with Ruby and Tuple Spaces
Concurrent Programming with Ruby and Tuple Spacesluccastera
 
Le PERL est mort
Le PERL est mortLe PERL est mort
Le PERL est mortapeiron
 
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...Jennifer Shelton
 
The computer science behind a modern disributed data store
The computer science behind a modern disributed data storeThe computer science behind a modern disributed data store
The computer science behind a modern disributed data storeJ On The Beach
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala InternalsDavid Groozman
 
Introduction to Clojure
Introduction to ClojureIntroduction to Clojure
Introduction to ClojureRenzo Borgatti
 

Similaire à Taming Snakemake (20)

🐲 Here be Stacktraces — Flink SQL for Non-Java Developers
🐲 Here be Stacktraces — Flink SQL for Non-Java Developers🐲 Here be Stacktraces — Flink SQL for Non-Java Developers
🐲 Here be Stacktraces — Flink SQL for Non-Java Developers
 
Makefile for python projects
Makefile for python projectsMakefile for python projects
Makefile for python projects
 
Katello on TorqueBox
Katello on TorqueBoxKatello on TorqueBox
Katello on TorqueBox
 
Ruby on Rails (RoR) as a back-end processor for Apex
Ruby on Rails (RoR) as a back-end processor for Apex Ruby on Rails (RoR) as a back-end processor for Apex
Ruby on Rails (RoR) as a back-end processor for Apex
 
Compile ahead of time. It's fine?
Compile ahead of time. It's fine?Compile ahead of time. It's fine?
Compile ahead of time. It's fine?
 
Iptablesrocks
IptablesrocksIptablesrocks
Iptablesrocks
 
Functional programming is the most extreme programming
Functional programming is the most extreme programmingFunctional programming is the most extreme programming
Functional programming is the most extreme programming
 
ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
 ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in... ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
ABRIDGED VERSION - Joys & frustrations of putting 34,000 lines of Haskell in...
 
Sparklife - Life In The Trenches With Spark
Sparklife - Life In The Trenches With SparkSparklife - Life In The Trenches With Spark
Sparklife - Life In The Trenches With Spark
 
Basic Make
Basic MakeBasic Make
Basic Make
 
Developing OpenResty Framework
Developing OpenResty FrameworkDeveloping OpenResty Framework
Developing OpenResty Framework
 
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
OSDC 2018 | The Computer science behind a modern distributed data store by Ma...
 
Intro to Elixir talk
Intro to Elixir talkIntro to Elixir talk
Intro to Elixir talk
 
Webinar: Learn Perl - The Jewel of Scripting Languages
Webinar: Learn Perl - The Jewel of Scripting LanguagesWebinar: Learn Perl - The Jewel of Scripting Languages
Webinar: Learn Perl - The Jewel of Scripting Languages
 
Concurrent Programming with Ruby and Tuple Spaces
Concurrent Programming with Ruby and Tuple SpacesConcurrent Programming with Ruby and Tuple Spaces
Concurrent Programming with Ruby and Tuple Spaces
 
Le PERL est mort
Le PERL est mortLe PERL est mort
Le PERL est mort
 
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
Lecture1: NGS Analysis on Beocat and an introduction to Perl programming for ...
 
The computer science behind a modern disributed data store
The computer science behind a modern disributed data storeThe computer science behind a modern disributed data store
The computer science behind a modern disributed data store
 
Cloudera Impala Internals
Cloudera Impala InternalsCloudera Impala Internals
Cloudera Impala Internals
 
Introduction to Clojure
Introduction to ClojureIntroduction to Clojure
Introduction to Clojure
 

Dernier

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 

Dernier (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 

Taming Snakemake

  • 2. Why Make?  What are Make's advantages (over Perl and shell scripts)?  Make forces you to think about file transformation in terms of inputs and outputs, recipes and rules. In Perl you are forced to think at the level of variables, conditionals, and loops. In Shell you are forced to think like a caveman.  Unfortunately, bioinformatics is still largely about files and their suffixes. Make has a very powerful syntax based almost entirely around file suffixes.  Make knows what's been made and what hasn't. Make can be interrupted and restarted safely, and without overwriting finished work.  Make knows what's changed and what hasn't. If an input is newer than an output, it will attempt to rebuild the output.  Make allows you to add new input files without worrying about overwriting old ones.  Make is well supported. There are 1333 Make questions on SO alone.  When people see a Makefile, they immediately know how to run it.  Make does not force you to wrap shell statements in quotes.  Make is a DSL. It will attempt to validate your syntax.  Make is ancient, ubiquitous, and reliable.  Make can parallelize with --jobs.  Make recipes encourage reuse. https://share.chop.edu/pages/viewpage.action?pageId=138478819
  • 6. Why Snakemake?  Addresses Makefile weaknesses without throwing out the good stuff  Difficult to implement control flow  No cluster support  Inflexible wildcards  Too much reliance on sentinal files  No reporting mechanism Johannes Köster
  • 8. Utilities  Logs - wire them up manually  Cluster support pretty decent source /nas/is1/leipzig/martin/variome-env/bin/activate snakemake --directory /nas/is1/leipzig/martin/snake-env --snakefile /nas/is1/leipzig/martin/snake-env/Snakefile -c qsub -j 16 source /mnt/isilon/cbmi/variome/leipzig/martin/respublica-env/bin/activate snakemake --directory /mnt/isilon/cbmi/variome/leipzig/martin/snake-env --snakefile /mnt/isilon/cbmi/variome/leipzig/martin/snake-env/Snakefile -c qsub -j 16  Cores/jobs/resources
  • 9. Useful stuff  dry-runs  keep-going  touch  version changes  workflow diagrams
  • 11. Client websites with Jekyll  Jekyll is a templating engine for blogs that accepts Markdown  Layouts use the Liquid markup http://mitomap.org/martin-rnaseq/
  • 12. A workflow that reports itself
  • 18. R/Snakemake integration git submodule add git@github.research.chop.edu:BiG/rna-seq-common-functions.git common/rna-seq
  • 19. Leave a paper trail
  • 20. Reproducible Checklist  repository github.research.chop.edu  workflow of some kind from beginning to end  website at mybic.chop.edu