Bioinformatics Course
Ferran Briansó and Alex Sánchez
ferran.brianso@vhir.org
Hospital Universitari Vall d’Hebron
Institut de Recerca - VHIR
Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII)
Introduction to RNA-seq and RNA-seq Data Analysis
1 OVERVIEW
2 RNA-SEQ ANALYSIS PIPELINE(S)
3 NORMALIZATION METHODS
4 DIFFERENTIAL EXPRESSION TESTING
5 Complements
Disclaimer
• This lecture is based on many presentations freely available on the web.
• We wish to acknowledge the authors for their efforts and for making their work available.
Transcriptomics by NGS
Evolution of transcriptomics technologies
• Northern Blot: single genes
• RT-PCR: multiple genes
• Microarrays: whole genomes
• (NGS) RNA-seq: populations of genomes
What is RNA-seq?
• RNA-seq is the high-throughput sequencing of cDNA using NGS technologies.
• RNA-seq works by sequencing the RNA molecules in a sample (as cDNA) and profiling the expression of a particular gene by counting the number of times its transcripts have been sequenced.
• The summarized RNA-seq data are widely known as count data.
A typical RNA-seq
experiment
Briefly, long RNAs are first converted
into a library of cDNA fragments
through either RNA fragmentation or
DNA fragmentation (see main text).
Sequencing adaptors (blue) are
subsequently added to each cDNA
fragment and a short sequence is
obtained from each cDNA using high-
throughput sequencing technology. The
resulting sequence reads are aligned
with the reference genome or
transcriptome, and classified as three
types: exonic reads, junction reads and
poly(A) end-reads. These three types
are used to generate a base-resolution
expression profile for each gene, as
illustrated at the bottom; a yeast ORF
with one intron is shown.
Nature Reviews Genetics 10, 57-63 (January 2009)
Why use RNA-seq
• Unique new possibilities
• Evaluate absolute transcript level of sequenced and
unsequenced organisms.
• Detect novel transcripts and isoforms
• Map exon/intron boundaries, splice junctions
• Analyze alternative splicing
• Reveal sequence variations (e.g. SNPs) and splice
variants
It has limitations too ...
• Non-uniform coverage of the genome due to experimental factors
• Transcript-length bias
• Read mapping uncertainty caused by sequencing error rates, repetitive elements, incomplete genome sequence, etc.
• Downstream bioinformatics algorithms and software still need to be improved
• Costs more than microarrays
Important concepts
• Sequencing depth (library size): total number of reads mapped to the genome.
• Gene length: number of bases in a gene.
• Gene counts: number of reads mapping to that gene (the expression measurement).
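A minimal sketch of how the three quantities above relate in practice, assuming a small made-up count table (the gene names, counts and lengths are hypothetical). Library size is approximated here by summing the gene counts of each sample, which ignores reads that map outside annotated genes.

```python
# Toy count table: gene -> {sample: number of reads mapped to that gene}.
# All values are hypothetical and only serve as an illustration.
counts = {
    "geneA": {"sample1": 120, "sample2": 310},
    "geneB": {"sample1": 30, "sample2": 45},
    "geneC": {"sample1": 850, "sample2": 1645},
}

# Gene lengths in bases (hypothetical).
gene_length = {"geneA": 2000, "geneB": 500, "geneC": 4000}


def library_sizes(count_table):
    """Sum the gene counts of each sample (a proxy for sequencing depth)."""
    sizes = {}
    for per_sample in count_table.values():
        for sample, n_reads in per_sample.items():
            sizes[sample] = sizes.get(sample, 0) + n_reads
    return sizes


sizes = library_sizes(counts)
for sample, size in sizes.items():
    print(sample, "library size:", size)
```

The two samples differ in depth, so raw gene counts cannot be compared between them without normalization.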
Microarrays vs NGS
NGS:
• Digital signal
• Harder to achieve & interpret
• Read counts: discrete values
• Weak or no background noise
Microarrays:
• Analog signal
• Easy to convey the signal’s information
• Continuous signal strength
• Signal loss and distortion
RNA-seq can be seen as the NGS counterpart of microarrays
Microarrays and NGS pipelines
RNA-seq and microarrays yield correlated results
Pros and cons of RNA-seq and microarrays
Microarrays
Pros:
• Cost
• Well-established methods, small data
Cons:
• Hybridization bias
• Sequence must be known
RNA-seq
Pros:
• High reproducibility
• Not limited to expression
Cons:
• Cost
• Complexity of analysis
So what?
• It is generally agreed/believed/expected that RNA-
seq will soon replace microarrays for many uses.
But not for all uses
• There are still situations where the “simplicity” of
microarrays yields the necessary information at an
optimal cost.
• Microarrays are now part of the standard molecular
biology toolbox whereas RNA-seq is still in development.
RNA-seq analysis
(workflows, pipelines, protocols…)
RNA-seq analysis workflow
• Reads are mapped to the reference genome or
transcriptome
• Mapped reads are assembled into expression summaries (tables of counts, showing how many reads are in each coding region, exon, gene or junction)
• Data are normalized
• Statistical testing of differential expression (DE) is performed, producing a list of genes with p-values and fold changes.
• Downstream analysis is similar to that for microarray results (functional annotation, gene enrichment analysis, integration with other data...)
Tools for base calling, sequence quality control,
alignment, mapping, summarizing...
FastQC, FastX, Bowtie/Picard, TopHat,
Cufflinks, Cuffmerge, ...
RNA Seq data analysis (1)-Mapping
Main issues:
• Number of allowed mismatches
• Number of multihits
• Expected distance between mates
• Whether to consider exon junctions
We end up with a list of the number of reads per transcript.
These counts will be our (discrete) response variable.
10 years or plus of high throughput
data analysis
RNA Seq data analysis (2)-Summarization
• Sequencing → genomic locations of many reads
• Next task: summarize and aggregate reads over some biologically meaningful unit, such as exons, transcripts or genes.
• Many methods are available:
– Count the number of reads overlapping the exons in a gene
– Include reads along the whole length of the gene and thereby incorporate reads from 'introns'
– Include only reads that map to coding sequence or…
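The first summarization method above (counting reads that overlap the exons of a gene) can be sketched as follows. This is only an illustration under simplifying assumptions: one chromosome, no strand information, made-up coordinates, and reads that hit more than one gene are discarded (a design choice for this sketch, not something stated in the slides).

```python
# Hypothetical annotation: gene -> list of exon intervals (start, end),
# 1-based inclusive coordinates on a single chromosome.
exons = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1200)],
}

# Hypothetical aligned reads as (start, end) intervals.
reads = [
    (90, 125),     # overlaps first exon of geneA
    (150, 185),    # inside first exon of geneA
    (190, 310),    # spans both exons of geneA (junction-like read)
    (250, 280),    # falls in the intron of geneA, not counted
    (1100, 1135),  # inside geneB
    (500, 535),    # intergenic, not counted
]


def overlaps(a, b):
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def count_reads_per_gene(read_intervals, annotation):
    """Count each read once for the single gene whose exons it overlaps."""
    counts = {gene: 0 for gene in annotation}
    for read in read_intervals:
        hits = [
            gene
            for gene, intervals in annotation.items()
            if any(overlaps(read, exon) for exon in intervals)
        ]
        if len(hits) == 1:  # ambiguous and unassigned reads are skipped
            counts[hits[0]] += 1
    return counts


gene_counts = count_reads_per_gene(reads, exons)
print(gene_counts)
```

Switching to the second method (whole gene length, including 'introns') would only require replacing the exon list of each gene with a single interval from the gene start to the gene end.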
RNA Seq data analysis (3)-Normalization
• Two main sources of bias
– Influence of length: counts are proportional to the transcript length times the mRNA expression level.
– Influence of sequencing depth: the higher the sequencing depth, the higher the counts.
• How to deal with this
– Normalize (correct) gene counts to minimize biases.
– Use statistical models that take into account length and sequencing depth.
RNA-seq normalization methods
• RPKM (Mortazavi et al., 2008): Counts are divided by the transcript length (kb) times the total number of millions of mapped reads.
• TMM (Robinson and Oshlack, 2010): Trimmed Mean of M values.
• EDAseq (Risso et al., 2011): Within-lane gene-level GC-content normalization (corrects for library size, gene length, GC-content)
• cqn (Hansen et al., 2011): Conditional quantile normalization (CQN) algorithm combining robust generalized regression (corrects for library size, gene length, GC-content)
• Others: Upper-quartile (Bullard et al., 2010); FPKM (Trapnell et al., 2010): instead of counts, the Cufflinks software generates FPKM values (Fragments Per Kilobase of exon per Million fragments mapped) to estimate gene expression, which are analogous to RPKM.
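The RPKM definition above can be written as RPKM = 10^9 × C / (N × L), where C is the number of reads mapped to the gene, N the total number of mapped reads and L the gene length in bases. A minimal sketch, with hypothetical input values:

```python
def rpkm(count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Equivalent to count / (length in kb * mapped reads in millions).
    """
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads, 2,000 bases long, library of 10 million reads.
value = rpkm(count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(value)

# The same gene sequenced twice as deeply yields twice the reads,
# but the same RPKM, which is the point of the correction.
deeper = rpkm(count=1000, gene_length_bp=2000, total_mapped_reads=20_000_000)
print(deeper)
```

Both calls return the same value because the doubling of the count is cancelled by the doubling of the library size.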
RNA Seq (4)- Differential expression analysis
• The goal of a DE analysis is to highlight
genes that have changed significantly in
abundance across experimental conditions.
• In general, this means
• taking a table of summarized count data for each
library and
• performing statistical testing between samples of
interest
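Alongside a p-value, a DE table usually reports a fold change per gene. The sketch below computes a log2 fold change between two conditions from depth-normalized counts (counts per million). The function names, the pseudocount and the input values are assumptions made for illustration; they do not come from any specific package.

```python
import math


def cpm(count, library_size):
    """Counts per million: corrects a raw count for sequencing depth."""
    return count * 1e6 / library_size


def log2_fold_change(counts_a, libs_a, counts_b, libs_b, pseudocount=0.5):
    """log2 of mean CPM in condition B over mean CPM in condition A.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    mean_a = sum(cpm(c, n) for c, n in zip(counts_a, libs_a)) / len(counts_a)
    mean_b = sum(cpm(c, n) for c, n in zip(counts_b, libs_b)) / len(counts_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# One hypothetical gene, two replicates per condition.
lfc = log2_fold_change(
    counts_a=[100, 120], libs_a=[1_000_000, 1_200_000],
    counts_b=[400, 800], libs_b=[1_000_000, 2_000_000],
    pseudocount=0.0,
)
print(lfc)
```

A fold change alone says nothing about significance; it has to be read together with the result of the statistical test.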
RNA Seq (4)- Methods for Differential Expression Analysis
• Transform count data to use existing approaches for
microarray data.
• Use Fisher's exact test or similar approaches.
• Use statistical models appropriate for count data
such as Generalized Linear Models using
– Poisson distribution.
– Negative binomial distribution.
RNA seq example (Normalized values)
RNA seq example – Analysis (Fisher test)
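A Fisher-test analysis of a single gene can be sketched as below. The assumption here is a 2x2 table per gene: reads of the gene versus all other reads, in condition A versus condition B. The implementation uses only the standard library and is written for illustration; the counts and library sizes are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def log_prob(x):
        return (log_comb(row1, x) + log_comb(row2, col1 - x)
                - log_comb(total, col1))

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    log_observed = log_prob(a)
    p_value = sum(
        exp(log_prob(x))
        for x in range(lowest, highest + 1)
        if log_prob(x) <= log_observed + 1e-7  # tolerance for float ties
    )
    return min(1.0, p_value)


# Hypothetical gene: 30 reads in condition A, 70 in condition B,
# with 10,000 mapped reads in each library.
gene_a, gene_b = 30, 70
lib_a, lib_b = 10_000, 10_000
p = fisher_exact_two_sided(gene_a, lib_a - gene_a, gene_b, lib_b - gene_b)
print(f"p-value = {p:.3g}")
```

This test treats each library as a single sample, so it does not account for biological variability between replicates.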
Statistical models for count data
• The number of reads mapped to a gene was first modelled using a Poisson distribution
– The Poisson distribution appears when things are counted
– It assumes that mean and variance are the same
• However, the biological variability of RNA-seq count data cannot be captured by the Poisson distribution because the data show overdispersion (i.e., the variance of the counts is larger than the mean)
• The Negative Binomial (NB) distribution takes overdispersion into account; hence, it has been used to model RNA-seq data
• The Poisson distribution has only one parameter, λ, while the NB is a two-parameter distribution, with parameters λ and φ.
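Overdispersion can be checked directly on replicate counts. The sketch below assumes the common mean-dispersion form of the NB, variance = λ + φ·λ² (the slides do not state a parameterization, so this is an assumption), and uses a simple method-of-moments estimate of φ. The replicate counts are hypothetical.

```python
from statistics import mean, variance


def nb_variance(mu, phi):
    """NB variance in the assumed mean-dispersion form: mu + phi * mu^2.

    With phi = 0 this reduces to the Poisson case (variance = mean).
    """
    return mu + phi * mu ** 2


def moment_dispersion(counts):
    """Method-of-moments estimate of phi: (s^2 - mean) / mean^2, floored at 0."""
    m = mean(counts)
    if m == 0:
        return 0.0
    s2 = variance(counts)  # sample variance (n - 1 in the denominator)
    return max(0.0, (s2 - m) / m ** 2)


# Hypothetical counts of one gene across five biological replicates.
replicates = [80, 100, 120, 140, 60]
m = mean(replicates)
s2 = variance(replicates)
phi = moment_dispersion(replicates)

print("mean:", m)
print("variance:", s2)
print("estimated phi:", phi)
print("NB variance at this mean:", nb_variance(m, phi))
```

Here the sample variance is ten times the mean, which a Poisson model cannot reproduce; the extra parameter φ absorbs the difference.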
Analysis methods based on assuming statistical models
• Basic analysis methods use the exact test approach:
– for each gene (t = 1, ...), and groups A and B, H0: λ_tA = λ_tB
• There are better options if data are assumed to follow a Negative Binomial distribution or some generalization of it:
– edgeR allows the option of estimating a different φ parameter for each gene
– baySeq uses Poisson-Gamma and NB models, estimating parameters by bootstrapping from the data
– DESeq assumes that the mean is a good predictor of the variance
RNA Seq (5)-Going beyond gene lists (1)
• DE analysis yields lists of differentially expressed genes [transcripts, …]
• Traditionally these lists are explored by some type of gene set analysis
• RNA-seq has biases (e.g. due to gene length) that require adapting methods developed for microarrays
• GOseq is such a method
RNA Seq (5)-Going beyond gene lists (2)
• Results of RNA-seq data can be integrated with
other sources of biological data e.g. to establish
a more complete picture of gene regulation
• RNA-seq, in conjunction with genotyping data, has been used to identify genetic loci responsible for variation in gene expression between individuals
• Integration of expression data & epigenomic information
(transcription factor binding, histone modification,
methylation) has the potential for greater understanding of
regulatory mechanisms.
Additional topics
• Transcriptome assembly
• Alignment methods and tools
• Alternative splicing and isoforms
• List of software methods and tools for
differential expression analysis of RNA-seq
http://genomebiology.com/2010/11/12/220/table/T1
Transcriptome assembly
De novo assembly
• Underlying assumptions relative to RNA expression:
– sequence coverage is similar in reads of the same transcript
– strand specific (sense and antisense transcripts)
• Assemblers:
– Velvet (genomic and transcriptomic)
– Trinity (transcriptomic)
– Cufflinks (transcriptomic; reassembles pre-aligned transcripts to find alternative splicing based on differential expression)
Alignment methods
• Two different approaches are possible:
– Align vs the transcriptome: faster, easier
– Align vs the whole genome: the complete information
Alignment tools
• Common NGS alignment programs:
– BWA
– Bowtie (Bowtie2)
– Novoalign
• Programs that take splice junctions into account:
– TopHat/Cufflinks

 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxpradhanghanshyam7136
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 

Dernier (20)

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Cultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptxCultivation of KODO MILLET . made by Ghanshyam pptx
Cultivation of KODO MILLET . made by Ghanshyam pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 

Introduction to RNA-seq and RNA-seq Data Analysis (UEB-UAT Bioinformatics Course - Session 4.1 - VHIR, Barcelona)

  • 1. Bioinformatics Course Ferran Briansó and Alex Sánchez ferran.brianso@vhir.org Hospital Universitari Vall d’Hebron Institut de Recerca - VHIR Institut d’Investigació Sanitària de l’Instituto de Salud Carlos III (ISCIII) Introduction to RNA-seq and RNA-seq Data Analysis
  • 2. OVERVIEW1 2 3 4 RNA-SEQ ANALYSIS PIPELINE(S) NORMALIZATION METHODS DIFFERENTIAL EXPRESSION TESTING 5 Complements
  • 3. 1 Disclaimer • This lecture is based on many presentations freely available in the web. • We wish to acknowledge the authors for their efforts and for making their work available
  • 5. 1 Evolution of transcriptomics technologies • Northern Blot • RT-PCR • Microarrays • (NGS) RNA-seq • Single Genes • Multiple genes • Whole Genomes • Populations of genomes
  • 6. 1 What is RNA-seq? • RNA-seq is the high-throughput sequencing of cDNA using NGS technologies • RNA-seq works by sequencing every RNA molecule and profiling the expression of a particular gene by counting the number of times its transcripts have been sequenced. • The summarized RNA-seq data is widely known as count data
  • 7. A typical RNA-seq experiment Briefly, long RNAs are first converted into a library of cDNA fragments through either RNA fragmentation or DNA fragmentation (see main text). Sequencing adaptors (blue) are subsequently added to each cDNA fragment and a short sequence is obtained from each cDNA using high-throughput sequencing technology. The resulting sequence reads are aligned with the reference genome or transcriptome, and classified as three types: exonic reads, junction reads and poly(A) end-reads. These three types are used to generate a base-resolution expression profile for each gene, as illustrated at the bottom; a yeast ORF with one intron is shown. Nature Reviews Genetics 10, 57-63 (January 2009)
  • 8. Why use RNA-seq • Unique new possibilities • Evaluate absolute transcript level of sequenced and unsequenced organisms. • Detect novel transcripts and isoforms • Map exon/intron boundaries, splice junctions • Analyze alternative splicing • Reveal sequence variations (e.g. SNPs) and splice variants
  • 9. It has limitations too ... • Non-uniform coverage of the genome due to experimental factors • Transcript-length bias • Read mapping uncertainty caused by sequencing error rates, repetitive elements, incomplete genome sequence, etc. • Downstream bioinformatics algorithms/software need to be improved • Costs more than microarrays
  • 10. • Sequencing depth (Library size): Total number of reads mapped to the genome. • Gene length: Number of bases that a gene has. • Gene counts: Number of reads mapping to that gene (expression measurement). 1 Important concepts
  • 11. Microarrays vs NGS • Digital Signal • Harder to achieve & interpret • Reads counts: discrete values • Weak background or no noise • Analog Signal • Easy to convey the signal’s information • Continuous strength • Signal loss and distortion RNA-seq can be seen as the NGS-counterpart of microarrays
  • 12. 1 Microarrays and NGS pipelines
  • 13. RNA seq and microarrays yield correlated results
  • 14. Pros and cons of RNA-seq and microarrays • Microarrays, pros: cost, well-established methods, small data • Microarrays, cons: hybridization bias, sequence must be known • RNA-seq, pros: high reproducibility, not limited to expression • RNA-seq, cons: cost, complexity of analysis
  • 15. So what? • It is generally agreed/believed/expected that RNA-seq will soon replace microarrays for many uses. But not for all uses • There are still situations where the “simplicity” of microarrays yields the necessary information at an optimal cost. • Microarrays are now part of the standard molecular biology toolbox whereas RNA-seq is still in development.
  • 19. • Reads are mapped to the reference genome or transcriptome • Mapped reads are assembled into expression summaries (tables of counts, showing how many reads fall in each coding region, exon, gene or junction) • Data is normalized • Statistical testing of differential expression (DE) is performed, producing a list of genes with p-values and fold changes. • Downstream analysis is similar to that for microarray results (Functional Annotations, Gene Enrichment Analysis; Integration with other data...) Tools for base calling, sequence quality control, alignment, mapping, summarizing... FastQC, FastX, Bowtie/Picard, TopHat, Cufflinks, Cuffmerge, ... RNA-seq analysis workflow
  • 20. Main Issues: • Number of allowed mismatches • Number of multihits • Expected distance between mates • Considering exon junctions End up with a list of # of reads per transcript These will be our (discrete) response variable RNA Seq data analysis (1)-Mapping
  • 21. 10 years or plus of high throughput data analysis • Sequencing yields the genomic locations of many reads • Next task: Summarize & aggregate reads over some biologically meaningful unit, such as exons, transcripts or genes. • Many methods available • Count the number of reads overlapping the exons in a gene, • Include reads along the whole length of the gene and thereby incorporate reads from 'introns'. • Include only reads that map to coding sequence or… RNA Seq data analysis (2)-Summarization
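The first summarization strategy listed above (counting reads that overlap the exons of a gene) can be sketched in a few lines of Python. This is only an illustrative sketch: the gene names, exon coordinates and read positions are hypothetical, reads are reduced to (start, end) intervals on a single chromosome, and reads that hit no gene or more than one gene are simply discarded, which is one possible policy among several.

```python
from collections import Counter

# Hypothetical annotation: gene -> list of exon intervals (1-based, inclusive).
EXONS = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1200)],
}

def overlaps(read_start, read_end, start, end):
    """True if the read interval shares at least one base with [start, end]."""
    return read_start <= end and start <= read_end

def count_reads(reads, exons):
    """Count reads per gene, keeping only reads that overlap exons of exactly one gene."""
    counts = Counter({gene: 0 for gene in exons})
    for read_start, read_end in reads:
        hits = [
            gene
            for gene, intervals in exons.items()
            if any(overlaps(read_start, read_end, s, e) for s, e in intervals)
        ]
        if len(hits) == 1:  # intronic/intergenic or ambiguous reads are dropped
            counts[hits[0]] += 1
    return dict(counts)

# Hypothetical aligned reads; the third one falls entirely inside the intron of geneA.
reads = [(150, 185), (190, 225), (250, 285), (390, 425), (1100, 1135)]
counts = count_reads(reads, EXONS)
print(counts)
```

Including intronic reads, as in the second strategy, would amount to replacing the exon list of each gene by a single interval spanning the whole gene.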
  • 22. RNA Seq data analysis (2)-Summarization
  • 23. • Two main sources of bias – Influence of length: Counts are proportional to the transcript length times the mRNA expression level. – Influence of sequencing depth: The higher the sequencing depth, the higher the counts. • How to deal with this – Normalize (correct) gene counts to minimize biases. – Use statistical models that take into account length and sequencing depth RNA Seq data analysis (3)-Normalization
  • 24. • RPKM (Mortazavi et al., 2008): Counts are divided by the transcript length (kb) times the total number of millions of mapped reads. • TMM (Robinson and Oshlack, 2010): Trimmed Mean of M values. • EDAseq (Risso et al., 2011): Within-lane gene-level GC-content normalization (corrects for library size, gene length, GC-content) • cqn (Hansen et al., 2011): Conditional quantile normalization (CQN) algorithm combining robust generalized regression (corrects for library size, gene length, GC-content) • Others: Upper-quartile (Bullard et al., 2010); FPKM (Trapnell et al., 2010): Instead of counts, Cufflinks software generates FPKM values (Fragments Per Kilobase of exon per Million fragments mapped) to estimate gene expression, which are analogous to RPKM. 3 RNA-seq normalization methods
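As a worked illustration of the RPKM definition above (counts divided by transcript length in kb times millions of mapped reads), the following Python sketch may help. The gene names, counts and lengths are hypothetical, and the total number of mapped reads is assumed to be the sum of the counts in the toy table unless given explicitly.

```python
def rpkm(counts, lengths_bp, total_mapped=None):
    """Reads Per Kilobase of transcript per Million mapped reads.

    counts: gene -> raw read count
    lengths_bp: gene -> transcript length in base pairs
    total_mapped: library size; assumed to be the sum of counts if omitted
    """
    if total_mapped is None:
        total_mapped = sum(counts.values())
    millions = total_mapped / 1e6
    return {
        gene: count / ((lengths_bp[gene] / 1e3) * millions)
        for gene, count in counts.items()
    }

# Hypothetical library of one million mapped reads.
counts = {"geneA": 500, "geneB": 500, "geneC": 999_000}
lengths_bp = {"geneA": 1_000, "geneB": 2_000, "geneC": 5_000}

values = rpkm(counts, lengths_bp)
for gene, value in values.items():
    print(gene, round(value, 1))
```

In this toy table geneA and geneB have the same raw count, but geneB is twice as long, so its RPKM is half that of geneA: this is the length correction described in the normalization slide.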
  • 25. RNA Seq (4)- Differential expression analysis • The goal of a DE analysis is to highlight genes that have changed significantly in abundance across experimental conditions. • In general, this means • taking a table of summarized count data for each library and • performing statistical testing between samples of interest
  • 26. RNA Seq (4)- Methods for Differential Expression Analysis • Transform count data to use existing approaches for microarray data. • Use Fisher's exact test or similar approaches. • Use statistical models appropriate for count data such as Generalized Linear Models using – Poisson distribution. – Negative binomial distribution.
  • 28. RNA seq example (Normalized values)
  • 29. RNA seq example – Analysis (Fisher test)
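The slide's own example data are not reproduced in this transcript, so the following is only a sketch of how a Fisher exact test can be applied to a single gene. It assumes a 2x2 table built from hypothetical numbers: reads mapped to the gene versus all other reads, in condition A versus condition B. The two-sided p-value is computed from the hypergeometric distribution using only the Python standard library.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are at most as likely as the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)  # small tolerance for floating-point ties
    )
    return min(1.0, total)

# Hypothetical gene: 12 of 1000 reads in condition A, 3 of 1000 reads in condition B.
p_value = fisher_exact_two_sided(12, 988, 3, 997)
print(f"p = {p_value:.4f}")
```

A test of this kind treats the reads of each library as exchangeable draws, so it accounts for sampling noise but not for biological variability between replicates.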
  • 30. • The number of reads that are mapped into a gene was first modelled using a Poisson distribution • Poisson distribution appears when things are counted • It assumes that mean and variance are the same • However, biological variability of RNA-seq count data cannot be captured using the Poisson distribution because data present overdispersion (i.e., variance of counts larger than mean) • Negative Binomial (NB) distribution takes into account overdispersion; hence, it has been used to model RNA-seq data • Poisson distribution has only one parameter λ, while NB is a two-parameter distribution with parameters λ and φ. Statistical models for count data
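The contrast between the two models can be made concrete with a small Python sketch. It assumes one common parametrization of the Negative Binomial, in which the variance is λ + φλ² (so φ = 0 recovers the Poisson case), and it uses hypothetical replicate counts for a single gene to obtain a crude method-of-moments estimate of φ. Packages such as edgeR or DESeq estimate dispersion in more elaborate ways.

```python
from statistics import mean, variance

def poisson_variance(lam):
    """Under a Poisson model the variance equals the mean."""
    return lam

def nb_variance(lam, phi):
    """NB variance under the assumed parametrization: lam + phi * lam**2."""
    return lam + phi * lam ** 2

def moment_dispersion(counts):
    """Crude method-of-moments estimate of phi, truncated at zero."""
    m, v = mean(counts), variance(counts)
    return max(0.0, (v - m) / m ** 2)

# Hypothetical counts for one gene across five biological replicates.
replicates = [90, 110, 150, 60, 95]

m = mean(replicates)
v = variance(replicates)
phi = moment_dispersion(replicates)

print("mean:", m)
print("sample variance:", v)
print("Poisson variance at this mean:", poisson_variance(m))
print("estimated phi:", round(phi, 4))
print("NB variance at this mean:", round(nb_variance(m, phi), 1))
```

Here the sample variance is roughly ten times the mean, which a Poisson model cannot accommodate, while the NB variance matches it once φ is estimated from the same data.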
  • 31. Analysis methods based on assuming statistical models • Basic analysis methods use the exact test approach: • for each gene (t = 1, ...), and groups A and B, H0 : λtA = λtB • There are better options if data are assumed to follow a Negative Binomial Distribution or some generalization of this. • edgeR allows the option of estimating a different φ parameter for each gene • baySeq uses Poisson-Gamma and NB models estimating parameters by bootstrapping from the data. • DESeq assumes that the mean is a good predictor of the variance.
  • 32. RNA Seq (5)-Going beyond gene lists (1) • DE analysis yields lists of differentially expressed genes [transcripts, …] • Traditionally these lists are explored by some type of gene set analysis • RNA-seq has biases (e.g. due to gene length) that require adapting methods developed for microarrays • GO-Seq is such a method
  • 33. RNA Seq (5)-Going beyond gene lists (2) • Results of RNA-seq data can be integrated with other sources of biological data, e.g. to establish a more complete picture of gene regulation • RNA-seq has, in conjunction with genotyping data, been used to identify genetic loci responsible for variation in gene expression between individuals • Integration of expression data & epigenomic information (transcription factor binding, histone modification, methylation) has the potential for greater understanding of regulatory mechanisms.
  • 35. Additional topics • Transcriptome assembly • Alignment methods and tools • Alternative splicing and isoforms • List of software methods and tools for differential expression analysis of RNA-seq http://genomebiology.com/2010/11/12/220/table/T1
  • 36. De novo assembly • Underlying assumptions relative to RNA expression • sequence coverage is similar in reads of the same transcript • strand specific (sense and antisense transcripts) • Assemblers: • Velvet (Genomic and transcriptomic) • Trinity (Transcriptomic) • Cufflinks (Transcriptomic, reassemble pre-aligned transcripts to find alternative splicing based on differential expression)
  • 38. Alignment methods • Two different approaches are possible: • Align vs the transcriptome • faster, easier • Align vs the whole genome • the complete information
  • 39. Alignment tools • Common NGS alignment programs: • BWA • Bowtie (Bowtie2) • Novoalign • Tools that take splice junctions into account: • Tophat/Cufflinks