Getting Started With RNA-Seq Data Analysis
Genome Institute of Singapore
Andreas Wilm, PhD :: Team Lead, Bioinformatics Core
24th May 2018
2
Sequencing, Data Management & Genome Analytics Services
http://www.igap.io | igap@gis.a-star.edu.sg
3
Disclaimer
This is not an RNA-Seq tutorial
4
• Bulk vs. Single Cell
• Transcript discovery
• Differential gene expression
• Allele-specific expression
• Detection of RNA editing
• Viral detection
• Gene fusion detection
• Alternative splicing
• De novo transcript assembly
• …
“The analysis goals of RNA-Seq experiments are diverse. Each of
these analysis goals has distinct requirements and challenges”
Source: Griffith et al. (2015)
5
Outstanding Training Resource
Griffith et al. (PLoS Comp Biol, 2015)
Informatics for RNA Sequencing: A Web Resource for
Analysis on the Cloud
6
Companion Web Resource: www.rnaseq.wiki
7
Short Read Mapping
• Naïve mapping to the genome won’t work: reads that span exon-exon junctions do not align contiguously
• Use splice-aware mappers (e.g. STAR); these typically rely on gene annotation
• Recent (fast) alternative: pseudo-aligners (e.g. Kallisto), which map to the transcriptome
Source: Wikipedia
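To make the junction problem concrete, here is a minimal toy sketch. The sequences, the function names and the `min_anchor` parameter are all made up for illustration, and the split-and-search logic is only a stand-in for the idea of splice-aware mapping; it is not how STAR or any real aligner works.

```python
# Toy illustration with hypothetical sequences (not real genomic data).
EXON1 = "ATGGCGTACGTTAGC"
INTRON = "GTAAGTCCCTTTAAACCCGGGTTTCAG"
EXON2 = "GATTACAGGCTTAACG"

genome = EXON1 + INTRON + EXON2   # the genome still contains the intron
transcript = EXON1 + EXON2        # the intron is spliced out of the mature mRNA

# A 12-base read sampled across the exon-exon junction of the transcript.
read = transcript[len(EXON1) - 6 : len(EXON1) + 6]


def naive_map(read, reference):
    """Return the 0-based position of an exact contiguous match, or -1."""
    return reference.find(read)


def spliced_map(read, reference, min_anchor=4):
    """Try each split point of the read and look for both halves in order,
    separated by a gap. Returns (end_of_left_part, start_of_right_part)
    for the first split that works, or None."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_pos = reference.find(left)
        if left_pos == -1:
            continue
        left_end = left_pos + split
        right_start = reference.find(right, left_end + 1)
        if right_start != -1:
            return left_end, right_start
    return None


print(naive_map(read, genome))       # -1: no contiguous match in the genome
print(naive_map(read, transcript))   # 9: contiguous match in the transcriptome
print(spliced_map(read, genome))     # (15, 42): the gap is the intron
```

The same read maps directly to the transcript, which is the intuition behind mapping to a transcriptome instead of the genome.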
8
File Formats
• Plain reads: FastQ (usually >= 20M reads per sample)
• Aligned reads: BAM
Courtesy: Jonathan Göke
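A FastQ file stores each read as four lines: a header starting with `@`, the sequence, a separator starting with `+`, and a quality string of the same length as the sequence. The sketch below parses that layout with the standard library only. It assumes unwrapped four-line records and the Phred+33 quality encoding; the names `FastqRecord`, `read_fastq` and `mean_phred` and the two example reads are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FastQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:          # end of file
            return
        header = header.rstrip("\r\n")
        if not header:          # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FastQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean base quality, assuming Phred+33 encoding."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# Two made-up reads; 'I' encodes Phred 40 and '!' encodes Phred 0.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n"
records = list(read_fastq(io.StringIO(EXAMPLE)))
for record in records:
    print(record.name, record.sequence, mean_phred(record.quality))
```

For real data the handle would typically come from `gzip.open(path, "rt")`, since FastQ files are usually compressed.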
9
Common Analysis Flowchart
• Example: Tuxedo suite
• Large number of tools
• Number of representative tools listed in Griffith et al. (2015): >100
• Multiple versions available
Source: Griffith et al. (2015)
10
Tools Evolve
Source: Twitter
• Keep upgrading!
• Try different programs
11
Installation: Experts Only?
• Often requires Unix/Linux knowledge
• Install package dependencies first
• Not being root/admin can complicate things
• How to install different versions of the same program?
Source: XKCD
12
Package Managers
Homebrew-Bio: bioinformatics formulae for Linuxbrew and Homebrew
Bioconda: more than 3000 bioinformatics packages ready to use with conda
• Precompiled packages
• Installation through a single command
• Multiple versions available
• Support for isolated environments
13
RNACocktail Protocol Dependencies
…
Sahraeian et al. (2017)
14
Containers
Singularity
• Containers package software, dependencies etc.
• No installation needed at all
• Think: entire operating system in a file
• Extreme Portability
15
Example: New Tuxedo Protocol (Pertea et al., 2016)
Source: https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual
• Replaces TopHat, Cuffdiff etc.
• How to scale to many samples?
• What if a step fails?
• How to make it usable by others?
16
Workflow Managers
• Automation
• Scalability
• Robustness
• Reentrancy
• Reproducibility
• Sharing
Nextflow: Di Tommaso et al. (NBT, 2017)
Snakemake: Köster & Rahmann (Bioinformatics, 2012)
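Two of the properties listed on this slide, automation and reentrancy, can be illustrated with a deliberately tiny sketch: steps declare their inputs and output, and a rerun skips every step whose output already exists. This is far simpler than what Nextflow or Snakemake do and does not use either tool; `Step`, `run`, the two toy actions and the file names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    name: str
    inputs: List[Path]
    output: Path
    action: Callable[[List[Path], Path], None]


def run(steps: List[Step]) -> List[str]:
    """Run steps in order and return the names of the steps actually executed.
    A step is skipped if its output file already exists (reentrancy)."""
    executed = []
    for step in steps:
        if step.output.exists():
            continue
        missing = [str(path) for path in step.inputs if not path.exists()]
        if missing:
            raise FileNotFoundError(f"{step.name}: missing inputs {missing}")
        # Write to a temporary name first, then rename, so that a crash
        # never leaves a half-written file that looks like a finished output.
        tmp = step.output.with_name(step.output.name + ".tmp")
        step.action(step.inputs, tmp)
        tmp.replace(step.output)
        executed.append(step.name)
    return executed


def upper_case(inputs: List[Path], output: Path) -> None:
    output.write_text(inputs[0].read_text().upper())


def count_lines(inputs: List[Path], output: Path) -> None:
    total = sum(len(path.read_text().splitlines()) for path in inputs)
    output.write_text(f"{total}\n")


workdir = Path(tempfile.mkdtemp())
sample = workdir / "sample.txt"
sample.write_text("acgt\nttga\n")

steps = [
    Step("upper", [sample], workdir / "sample.upper.txt", upper_case),
    Step("count", [workdir / "sample.upper.txt"], workdir / "sample.count.txt", count_lines),
]

first_run = run(steps)    # both steps execute
second_run = run(steps)   # nothing to do: all outputs already exist
print(first_run, second_run)
```

If the second step had failed, rerunning would repeat only that step, because the output of the first is already in place.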
17
Putting it all Together
• Portable
• Validated
• Well documented
• Community maintained
Courtesy: Paolo Di Tommaso
Presented at Bio-IT World 2018
18
RNASeq Pipelines, Written in Nextflow and Using Containers
• NF-Core RNASeq: expression analysis and extensive QC
• Tuxedo: Transcript-level expression analysis and DE
• CalliNGS-NF: Variant Calling with GATK, incl. ASE
• …
19
Demo
• Example: GATK best practices for variant calling for RNA-Seq
• Plus SNV post-processing and quantification for allele-specific expression
• Note: multiple samples, no software installation, fully orchestrated etc.
Source: https://github.com/CRG-CNAG/CalliNGS-NF
20
Honorable Mentions
Cloud-based analytics portals:
And others…
Software for Single Cell Gene Expression:
Cell Ranger Pipelines and Loupe Cell Browser
21
We can do the Heavy Lifting for you:
From Off-the-shelf to “Artisan” Analysis
http://www.igap.io | igap@gis.a-star.edu.sg
Thank you
