This session follows up on transcript quantification of RNAseq data. It discusses statistical means of identifying differentially regulated transcripts and isoforms, and contrasts these with microarray analysis approaches.
Differential gene expression
1. [Pink Sherbet Photography] RNAseq analysis: Differential gene expression (2/2) Hopscotch and isoforms August 25, 2011
2. Quick recap: Mapping and transcript assembly. Reads -> alignment to reference genome -> transcript assembly. Resulting file types: BAM, GFF/BED. “What transcripts are in my samples?” [Diagram: Projects, Fastq, Mapping, Transcript assembly] Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
3. RNAseq analysis question: Is there a difference in the transcriptome between two conditions (Condition 1 vs. Condition 2)? Two steps: quantify expression, then quantify the difference.
4. RNAseq vs. expression array: RNAseq can capture a larger dynamic range. RNAseq can handle degraded samples. RNAseq yields additional information: new transcripts, (new) isoforms, variants. [Figure: array signal flattening out compared with RNA-seq] Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 PMID: 19015660
5. Challenges: Strand-specific methods are still biased. The number of reads does not necessarily correlate with transcript abundance: longer transcripts yield more reads (fragmentation), and technical variability between runs causes different numbers of total reads. Low abundance does not mean non-functional. How should the expression of isoforms be quantified? Ozsolak F, Milos PM. RNA sequencing: advances, challenges and opportunities. Nat Rev Genet. 2011 PMID: 21191423 Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
6. Production Informatics and Bioinformatics (per one-flowcell project). Basic Production Informatics: produce raw sequence reads. Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs). Bioinformatics Research: analyze the data; uncover the biological meaning.
7. Quantifying expression in RNAseq: Long genes get more reads. Normalize: fragments per kilobase of transcript per million mapped reads (FPKM). FPKM accounts for the dependency between paired-end reads by counting each read pair as a single fragment. Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353. Oshlack A, Wakefield MJ. Transcript length bias in RNA-seq data confounds systems biology. Biol Direct. 2009 PMID: 19371405
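The normalization on this slide can be sketched in a few lines. This is a minimal illustration, not the implementation of any particular tool; the function name `fpkm` and its parameter names are hypothetical, and the example numbers are made up for the arithmetic only.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    fragments: fragments (read pairs) assigned to the transcript
    transcript_length_bp: transcript length in base pairs
    total_mapped_fragments: all mapped fragments in the sample
    """
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # Dividing by length in kb (bp / 1e3) and library size in millions
    # (fragments / 1e6) gives the combined scaling factor of 1e9.
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Illustrative numbers only: a transcript twice as long with twice as many
# fragments gets the same FPKM, which removes the length bias.
short_gene = fpkm(500, 2000, 20_000_000)
long_gene = fpkm(1000, 4000, 20_000_000)
print(short_gene, long_gene)
```

Both calls give the same value because the fragment count scales with transcript length, which is the effect the normalization is meant to cancel.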
8. Quantifying expression of overlapping isoforms: We do not know which isoform the reads from overlapping isoforms actually belong to. ALEXA-Seq: counts only the reads that map uniquely to a single isoform. Isoform-expression methods (Cufflinks): likelihood function modeling the sequencing process (not very accurate for lowly expressed transcripts). 'Exon intersection method' (analogous to expression microarrays): counts reads mapped to a gene's constitutive exons (reduces power for differential expression analysis). 'Exon union method': counts all reads mapped to any exon in any of the gene's isoforms (underestimates expression for alternatively spliced genes). Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
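The difference between the exon intersection and exon union methods can be shown with a toy gene. This is a simplified sketch under strong assumptions: each read is assumed to be already assigned to exactly one exon, junction-spanning reads and length normalization are ignored, and the isoform names, exon labels, and helper functions are hypothetical.

```python
def exon_union(isoforms):
    """Exons present in any isoform of the gene."""
    return set().union(*isoforms.values())


def exon_intersection(isoforms):
    """Constitutive exons: present in every isoform of the gene."""
    return set.intersection(*(set(exons) for exons in isoforms.values()))


def count_reads(read_exons, exons):
    """Count reads whose assigned exon is in the given exon set."""
    return sum(1 for exon in read_exons if exon in exons)


# Toy gene (hypothetical): exon e2 is skipped in isoform iso_b.
isoforms = {"iso_a": {"e1", "e2", "e3"}, "iso_b": {"e1", "e3"}}
# One entry per read, naming the exon the read maps to.
read_exons = ["e1", "e1", "e2", "e3", "e2", "e3"]

union_count = count_reads(read_exons, exon_union(isoforms))
intersection_count = count_reads(read_exons, exon_intersection(isoforms))
print(union_count, intersection_count)
```

The intersection method discards the reads on the alternative exon e2, so it works with fewer counts. The union method keeps every read, but any later length normalization would use the full union length even when the shorter isoform dominates.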
9. Differential expression: What is a statistically significant difference between sets of measurements (expression of a gene) from two populations (conditions)? First, estimate variability. Either observe biological variability (needs large numbers of replicates to sample the population), or model biological variability: model the count variance across replicates as a nonlinear function of the mean counts using various parametric approaches (such as the normal and negative binomial distributions) (edgeR, DESeq, Cuffdiff). Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011 PMID: 21623353.
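To make "variance as a nonlinear function of the mean" concrete, the sketch below uses the common negative binomial parameterization, variance = mean + dispersion * mean^2. The method-of-moments dispersion estimate is an assumption chosen for illustration only; edgeR, DESeq, and Cuffdiff use more elaborate estimators, and the function names and replicate counts here are hypothetical.

```python
import statistics


def nb_variance(mean, dispersion):
    """Negative binomial variance: Poisson noise plus a quadratic term."""
    return mean + dispersion * mean ** 2


def moment_dispersion(counts):
    """Crude per-gene dispersion estimate from replicate counts.

    Solves variance = mean + dispersion * mean^2 for dispersion and
    clips at zero when the counts are no more variable than Poisson.
    """
    mean = statistics.mean(counts)
    if mean == 0:
        return 0.0
    variance = statistics.variance(counts)  # sample variance, needs >= 2 values
    return max(0.0, (variance - mean) / mean ** 2)


# Made-up replicate counts for one gene in one condition.
replicates = [10, 20, 30]
dispersion = moment_dispersion(replicates)
modeled_variance = nb_variance(statistics.mean(replicates), dispersion)
print(dispersion, modeled_variance)
```

With only a handful of replicates such a per-gene estimate is very noisy, which is one reason the slide lists few replicates as a compromise.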
10. Three things to remember: RNAseq captures a larger dynamic range (more sensitive). Additional information compared to arrays (e.g. isoforms). Need to make assumptions/compromises (quantification, few replicates). [cabbit]
11. Next week: NGS Discussion group, Jake’s topic. In two weeks: Abstract: This session will focus on identifying SNPs from whole-genome, exome-capture, or targeted resequencing data. The approaches of mapping, local realignment, recalibration, SNP calling, and SNP recalibration will be introduced and quality metrics discussed.