Over the years, the Bioinformatics Core Facility has implemented a number of procedures and pipelines to provide high-quality results to a growing number of users. Here we present our experience with migrating some of our most extensively used pipelines to the Nextflow framework and creating Docker/Singularity containers for reproducibility.
3. Core facility typical workflow
User requests fall into two branches (50% / 50%):
• Standardised analysis (semi-automatic pipelines):
• ChIP-seq, RNA-seq, SNP calling …
• Non-standard analysis (a bunch of tools, custom scripts, R magic, etc.):
• Building a database, reproducing an analysis …
7. Core facility typical workflow
Hours by type of project (2015 & 2016):
• Genomics: 39%
• Database: 15%
• Microbiome: 12%
• RNA-seq: 18%
• ChIP-seq: 13%
• Microarray & HTqPCR: 3%
9. Why Nextflow?
• Standard analysis:
• Automation, parallelization, portability, reproducibility (together with containers).
• Nextflow (NF) makes it painless to add new steps collaboratively, thanks to the isolation of processes.
[After 2 years] Can you redo the SAME analysis with new samples?
10. Why Nextflow?
• Non-standard analysis can benefit too:
• NF code is easy to reuse / modify. It is polyglot!
• Using containers prevents several problems: portability issues, OS upgrades, library / version mismatches, etc.
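The portability point can be pictured with a minimal sketch: the same tool command is wrapped once for Docker and once for Singularity, so the step no longer depends on what is installed on the host. The function name `wrap_in_container`, the image name and the paths are hypothetical and chosen only for illustration; this is not how Nextflow itself builds its container commands.

```python
from typing import List


def wrap_in_container(command: List[str], image: str, host_dir: str,
                      engine: str = "docker", workdir: str = "/data") -> List[str]:
    """Return the argv that runs `command` inside `image`.

    `host_dir` is mounted at `workdir` inside the container and used as
    the working directory, so input and output files stay on the host.
    """
    mount = f"{host_dir}:{workdir}"
    if engine == "docker":
        # --rm removes the container when the command finishes
        return ["docker", "run", "--rm", "-v", mount, "-w", workdir,
                image, *command]
    if engine == "singularity":
        # --bind mounts the host directory, --pwd sets the working directory
        return ["singularity", "exec", "--bind", mount, "--pwd", workdir,
                image, *command]
    raise ValueError(f"unsupported container engine: {engine}")


# Hypothetical tool call and image name, for illustration only
docker_cmd = wrap_in_container(
    ["fastqc", "sample.fastq.gz"],
    image="example/fastqc:0.11",
    host_dir="/home/user/run1",
)
print(" ".join(docker_cmd))
```

The returned list can be passed to `subprocess.run` unchanged; switching `engine` to `"singularity"` changes only the wrapper, not the tool command.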
21. Our experience
Pipelines we have developed or are developing:
• Standard analysis:
• ChIP-seq pipeline
• RNA-seq pipeline
• small RNA-seq pipeline
• SNP calling procedures (based on GATK)
• Non-standard analysis:
• Pipeline for the analysis of single-cell transcriptomes
• Detection of plant resistance genes …
23. Future ideas
• Semi-automatic reports
• A CMS able to mine the Nextflow log file and store both metadata and logs
• Maybe a simple graphical interface to compete with / complement Galaxy?
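As a rough illustration of the log-mining idea, the sketch below parses a run record and extracts per-task metadata that a CMS could store. It assumes the tab-separated trace report that Nextflow writes when run with `-with-trace`, rather than the free-text log file itself, and uses only a subset of its columns. The function names and the example rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def parse_trace(text: str) -> List[Dict[str, str]]:
    """Parse a tab-separated trace report into one dict per task."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def summarise(tasks: List[Dict[str, str]]) -> Dict[str, object]:
    """Count tasks per status and list the tasks that did not complete."""
    status_counts = Counter(task["status"] for task in tasks)
    not_completed = [t["name"] for t in tasks if t["status"] != "COMPLETED"]
    return {
        "n_tasks": len(tasks),
        "status_counts": dict(status_counts),
        "not_completed": not_completed,
    }


# Made-up trace content with a subset of the usual columns (assumption)
EXAMPLE_TRACE = (
    "task_id\thash\tname\tstatus\texit\n"
    "1\tab/12cd34\tfastqc (sample1)\tCOMPLETED\t0\n"
    "2\tcd/56ef78\tmapping (sample1)\tCOMPLETED\t0\n"
    "3\tef/90ab12\tmapping (sample2)\tFAILED\t1\n"
)

tasks = parse_trace(EXAMPLE_TRACE)
summary = summarise(tasks)
print(summary)
```

The resulting dictionaries map directly onto rows or JSON documents, so storing them next to the raw log would be one simple design for the metadata side of such a CMS.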