[by Joseph Robertson]
RNAseq analysis: Transcript detection (1/2)
What is a jar?
August 11, 2011
Quick recap: Production informatics
  • Sequencing -> Images -> Conversion (Demultiplexing)
  • Resulting file type: FASTQ, "having raw sequence reads and quality scores"
  • [Diagram labels: Sequencing, Image, Fastq, Quality Control, Projects]
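To make the FASTQ recap concrete, here is a minimal sketch of reading four-line FASTQ records and decoding the quality string into Phred scores. The function name `read_fastq`, the `FastqRecord` type and the default offset of 33 are assumptions for illustration; some Illumina pipelines of this period wrote qualities with an offset of 64, so check your data before trusting the default.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # one Phred score per base


def read_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream.

    phred_offset is an assumption (33 = Sanger encoding); it is not
    detected from the data.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the parse
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Hypothetical single-read example
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
print(records[0].name, records[0].sequence, records[0].qualities)
```

A generator keeps memory use flat, which matters because real FASTQ files from a flowcell are far too large to load at once.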
Objective & Challenges
  • Objective: study the active transcriptome of the cell
  • Problems:
      • The RNA content of a cell is dominated by tRNA, rRNA and housekeeping genes
      • The flowcell has only finite real estate, most of which would be occupied by these mainly invariable transcripts
  • How to focus the sequencing on the "interesting" part of the transcriptome: mRNA and ncRNA?
What RNAseq protocols are there?
  • RNA seq: total RNA -> tRNA/rRNA removed + polyA-tail filtered
      • Good for studying protein-coding genes, e.g. gene expression, isoforms, expression of variant alleles
      • RNA editing events: "RNA-DNA differences in the human transcriptome provide a yet-unexplored aspect of genome variation."
  • Small RNAseq: total RNA -> size selection for small RNA molecules
      • Good for small ncRNA, e.g. miRNAs, snoRNA
  • Duplex-specific thermostable nuclease (DSN) guided RNA seq normalization: total RNA -> highly abundant transcripts are digested
      • Good for studying all transcripts
References:
  • Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
  • Christodoulou DC, Gorham JM, Herman DS, Seidman JG. Construction of normalized RNA-seq libraries for next-generation sequencing using the crab duplex-specific nuclease. Curr Protoc Mol Biol. 2011. PMID: 21472699.
RNA-seq workflow
  1. Select polyA-tail + remove tRNA/rRNA
  2. Fragment RNA
  3. Make cDNA (caution: you may lose strand info)
  4. Sequence
  5. Map reads
  6. Identify transcripts
  7. Quantify transcripts
  8. Identify differences between conditions
Reference: Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008. PMID: 18516045.
Production Informatics and Bioinformatics (per one-flowcell project)
  • Basic Production Informatics: produce raw sequence reads
  • Advanced Production Informatics: map to genome and generate raw genomic features (e.g. SNPs)
  • Bioinformatics Research: analyze the data; uncover the biological meaning
Challenges for RNAseq read mapping
  • Losing reads because they do not match the reference genome:
      • Reads spanning exon junctions
      • RNA editing events
  • Approaches:
      • Align to a reference transcriptome library
      • Exon-first, e.g. TopHat
      • Seed-extend methods, e.g. GSNAP
  • [Figure labels: Sequencing reads, DNA, gRNA, mRNA, editing event]
References:
  • Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
  • Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011. PMID: 21596952.
Exon-first approach
  1. Align reads to the reference genome
  2. Chop up unaligned reads and try to identify matching regions
  3. Find splice junctions around the matches
Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
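The three exon-first steps can be sketched on a toy example. This is an illustration under strong assumptions, not how TopHat is implemented: matching is exact string search, a read is split into only two pieces, and the names `align_exon_first` and `min_anchor` are hypothetical.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # half-open [start, end) interval on the genome


def align_exon_first(read: str, genome: str, min_anchor: int = 3) -> Optional[List[Block]]:
    """Toy exon-first alignment using exact matching only."""
    # Step 1: try an unspliced alignment of the whole read.
    start = genome.find(read)
    if start != -1:
        return [(start, start + len(read))]
    # Step 2: chop the unaligned read into two pieces at every split point
    # that leaves at least min_anchor bases on each side.
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        left_start = genome.find(left)
        while left_start != -1:
            left_end = left_start + len(left)
            # Step 3: the right piece must match downstream of the left one;
            # the gap between the two blocks is the putative intron.
            right_start = genome.find(right, left_end + 1)
            if right_start != -1:
                return [(left_start, left_end), (right_start, right_start + len(right))]
            left_start = genome.find(left, left_start + 1)
    return None


# Hypothetical genome: flank + exon 1 + intron (GT...AG) + exon 2 + flank
genome = "CC" + "ACGTAC" + "GTTTTTAG" + "TTGCAA" + "GG"
unspliced = align_exon_first("ACGTAC", genome)
spliced = align_exon_first("GTACTTGC", genome)  # read spans the exon junction
print(unspliced, spliced)
```

The cheap unspliced search handles most reads, and only the leftovers pay for the split search, which is why exon-first methods tend to be less computationally intensive.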
Seed-extend approach
  1. Break reads into smaller k-mers and find matches
  2. Iteratively extend k-mers to identify the exact spliced alignment
Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
Which method?
  • Exon-first: less computationally intensive
  • The additional exon junctions found by seed-extend have not (yet) been demonstrated to be real.
Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
Challenges for transcript detection
  • Identifying isoforms is difficult:
      • Transcript abundance is volatile
      • Most reads are not helpful (reads from exons) or even misleading (incompletely spliced precursor RNA)
      • Genes can have many isoforms
  • Approaches:
      • Ignore isoforms
      • Genome-guided reconstruction, e.g. Cufflinks
      • Genome-independent reconstruction, e.g. Trinity
  • [Figure: QBI data]
Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
Genome-guided reconstruction
  1. Use reads spanning splice junctions to assemble the transcript path
  2. Work out the minimal possible set of paths so that all reads are visited (graph theory)
  3. If there is more than one such set, use read counts to pick the most probable
  • [Figure labels: Reads aligned to the genome, Isoforms]
Reference: Garber M, Grabherr MG, Guttman M, Trapnell C. Computational methods for transcriptome annotation and quantification using RNA-seq. Nat Methods. 2011. PMID: 21623353.
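The path idea can be illustrated with a small splice graph: exons are nodes, junction-spanning reads are edges, and every start-to-end path is a candidate isoform. This sketch only enumerates candidate paths and reports their weakest junction count; it does not solve the minimal path cover that tools such as Cufflinks compute. The exon names, counts and function names are hypothetical, and the graph is assumed to be acyclic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Junction = Tuple[str, str]  # (upstream exon, downstream exon)


def candidate_isoforms(exons: List[str], junction_counts: Dict[Junction, int]) -> List[List[str]]:
    """Enumerate every path from an exon with no incoming junction to a dead end."""
    successors: Dict[str, List[str]] = defaultdict(list)
    has_incoming = set()
    for (upstream, downstream), count in junction_counts.items():
        if count > 0:  # keep only junctions supported by at least one read
            successors[upstream].append(downstream)
            has_incoming.add(downstream)

    paths: List[List[str]] = []

    def walk(path: List[str]) -> None:
        next_exons = sorted(successors.get(path[-1], []))
        if not next_exons:
            paths.append(path)
            return
        for exon in next_exons:
            walk(path + [exon])

    for exon in exons:
        if exon not in has_incoming:
            walk([exon])
    return paths


def path_support(path: List[str], junction_counts: Dict[Junction, int]) -> int:
    """Read count of the weakest junction along a path (0 for a single exon)."""
    return min((junction_counts[pair] for pair in zip(path, path[1:])), default=0)


# Hypothetical gene with three exons; E2 is skipped by a few reads
exons = ["E1", "E2", "E3"]
junction_counts = {("E1", "E2"): 12, ("E2", "E3"): 10, ("E1", "E3"): 3}
isoforms = candidate_isoforms(exons, junction_counts)
support = [path_support(path, junction_counts) for path in isoforms]
print(isoforms, support)
```

In this toy case both paths are needed to explain every junction read, and the read counts suggest the full-length isoform is the more abundant one.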
Genome-independent reconstruction
  1. Break reads into k-mers and find their mutual overlaps to build a de Bruijn graph
  2. Find probable paths through the graph by using read counts
  3. Map the consensus assembly to the genome
Reference: Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011. PMID: 21747384.
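A toy version of the first two steps: build a de Bruijn graph whose nodes are (k-1)-mers and whose edges are k-mers weighted by how often they occur in the reads, then walk it greedily along the best-supported edge. The greedy walk is a simplification chosen for brevity and is not the path-finding used by Trinity; the function names and reads are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (prefix (k-1)-mer, suffix (k-1)-mer)


def build_de_bruijn(reads: List[str], k: int) -> Dict[Edge, int]:
    """Count each k-mer as an edge between its prefix and suffix (k-1)-mers."""
    edges: Dict[Edge, int] = Counter()
    for read in reads:
        for position in range(len(read) - k + 1):
            kmer = read[position:position + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def greedy_path(edges: Dict[Edge, int], start: str) -> str:
    """Follow the highest-count outgoing edge until a dead end or a revisit."""
    successors: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for (source, target), count in edges.items():
        successors[source].append((count, target))
    contig, node, visited = start, start, {start}
    while successors.get(node):
        _, best_target = max(successors[node])
        if best_target in visited:  # stop rather than loop forever
            break
        contig += best_target[-1]
        visited.add(best_target)
        node = best_target
    return contig


# Three overlapping reads plus one low-coverage read ending in an error (…TTA)
reads = ["ACGTTG", "CGTTGC", "GTTGCA", "ACGTTA"]
edges = build_de_bruijn(reads, k=4)
contig = greedy_path(edges, start="ACG")
print(contig)
```

At node `GTT` the graph branches; the branch seen in three reads wins over the one seen once, which is the sense in which read counts pick the probable path.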
Which method?
  • De novo methods are very computationally intensive
  • However, they are able to find alternative isoforms and promoters, and structural variation:
      • deletions (yellow)
      • chimeras (green)
Reference: Iyer MK, Chinnaiyan AM. RNA-Seq unleashed. Nat Biotechnol. 2011. PMID: 21747384.
What are real transcripts?
  • Even the most sophisticated computational method can't tell you what is a real transcript.
  • [Figure labels: Roberts et al., QBI data]
Reference: Roberts A, Pimentel H, Trapnell C, Pachter L. Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011. PMID: 21697122.
Solution: biological replicates
  • Significant findings (here: new isoforms) in small sample sets can be due to:
      • Technical errors
      • Biological variability
      • Population outliers
  • Sequencing experiments are subject to the same issues (even though they are more expensive than arrays)
  • Replicates are necessary to build confidence in your results!
Reference: Hansen KD, Wu Z, Irizarry RA, Leek JT. Sequencing technology does not eliminate biological variability. Nat Biotechnol. 2011. PMID: 21747377.
Three things to remember
  1. Methods for analyzing RNAseq data are not yet as mature as expression array analysis tools.
  2. Identifying transcript isoforms is especially difficult.
  3. Replicates are crucial to account for biological variability.
Next week
  • Abstract: This session will follow up with transcript quantification of RNAseq data and discuss statistical means of identifying differentially regulated transcripts and isoforms, contrasting these with microarray analysis approaches.

Editor's notes

  1. http://www.nature.com/nbt/journal/v29/n7/full/nbt.1915.html
  2. That is, double-stranded cDNA is denatured, then allowed to partially re-anneal, and the most abundant species, which re-anneal most rapidly, are digested with crab duplex-specific nuclease.