RNA-Sequencing for Full-length Transcript Discovery
Lab Meeting
2/10/14
Anne Deslattes Mays
Mentor: Anton Wellstein, MD, PhD
Special Recognition: Marcel Schmidt, PhD
4/18/2014
Wellstein/Riegel Laboratory, Lombardi
Cancer Center, Washington DC 20007
Discovery of homing gene fragments
using bone marrow-derived monocytes
Questions:
1. Which proteins drive organ homing of hematopoietic
cells?
2. Are there distinct homing proteins for diseased organs
(cancer, wound healing, ischemia, infection)?
Approaches:
1. use human bone marrow (BM) cDNA library
that displays large proteins from bone
marrow & precursor cells on the phage
surface
2. in vivo selection of homing proteins from
target organs or vessels in animal models
(normal or diseased)
3. this approach selects for gene fragments
coding for homing proteins
full length transcripts
from source material
Experimental Objective
We aim to identify the full-length transcripts using 2nd and 3rd generation
sequencing methods for genes whose fragments were discovered through the
phage display experiments nearly a decade ago.
MedStar Georgetown University Hospital Cell Processing Unit
Objective: Obtain healthy donor bone marrow bags
Objective: RNA Isolation from Total Bone Marrow
Step 1: Total Bone Marrow Isolation
Four Sequencing Experiments
Second Generation Sequencing
2nd Generation
Sequencing with
Illumina HiSeq 2000
Four Sequencing Experiments
Second Generation Sequencing
1. Total.bm.random – total bone marrow sequenced mate paired
non-strand specific randomly primed ~ 180 million reads
Experiment 1 Results
Reads were aligned to the genome (TopHat with Bowtie2, then Cufflinks) and
assembled de novo (Trinity, with GSNAP and BLAT).
Wellstein Genome – a sub-genome built from excised regions around each phage
fragment, created in the hope of discovering the underlying isoform and gene
structure.
The short reads were aligned to these regions with BLAT and BLAST, and still:
• Hits that included phage gave ambiguous information about isoforms and
gene structure
• Structure of transcript was not clear
• Strand of the aligned reads was not clear
Next Steps
• Design another experiment, same cell population, this time targeted
(including original phage primers used often in experiments in both
lineage negative and total bone marrow experiments) and strand specific
• Create a custom long transcript library primed to include full length phage
transcripts
Random RNA-Sequencing vs Strand-specific Targeted RNA-sequencing
Targeted RNA-Sequencing Workflow
Initial G12 Gene Model from the Total Bone Marrow
Design targeted primers and create custom long reaction cDNA
library
Results and pre-sequencing fragmentation
Experiment 2 Results
Reads were aligned to the genome (TopHat with Bowtie2, then Cufflinks) and
assembled de novo (Trinity, with GSNAP and BLAT).
Wellstein Genome – a sub-genome built from excised regions around each phage
fragment, created in the hope of discovering the underlying isoform and gene structure.
The short reads were aligned to these regions with BLAT and BLAST, and still:
• Hits that included phage gave ambiguous information about isoforms and gene
structure
• Strand information was now known, yet the structure of the transcript was
still not clear
• Was it the depth? Was it the cell population? Was it mistargeted regions?
Next Steps
• Design another experiment, now looking at only the lineage negative cell
population where it is known the phage are enriched
• Return to randomly primed reads
• Sequence at a depth similar to the original total bone marrow experiment
(100 million reads)
Four Sequencing Experiments
Second Generation Sequencing
1. Total.bm.random – total bone marrow non-strand specific
randomly primed ~ 180 million reads
2. Total.bm.ss.targeted – total bone marrow strand specific targeted
primed to a depth ~ 20 million reads
3. Lin.neg.ss.random – lineage-negative strand specific randomly
primed ~ 111 million reads
Negative Selection:
Human Progenitor Cell Enrichment Kit with Platelet Depletion
to Isolate the Lineage Negative sub population from total bone marrow
Loading and Negative Controls
class     gene           total.bm.ss  lin.neg.ss
loading   ACTB                 2,933      12,643
loading   B2M                  1,500       8,473
loading   GAPDH                  622      44,413
negative  CD11B                  231       1,193
negative  CD11C                  132         689
negative  CD14                    21          49
negative  CD16a                  418       1,312
negative  CD19                     8          36
negative  CD2                      7          16
negative  CD24                   142         177
negative  CD3EAP                  28         243
negative  CD56                   197       2,039
negative  CD61                    24         480
negative  CD66B                  207         208
negative  glycophorin.A           49          80
negative  mir155                   2          20
Phage and Positive Controls
class     gene    total.bm.ss  lin.neg.ss
phage     _b9             203       2,298
phage     a1                0           0
phage     A12               0           0
phage     A5              186         553
phage     a8               76         789
phage     b3              439       4,731
phage     b6               68         331
phage     B9              171       2,354
phage     C1                9         139
phage     C12              42      10,657
phage     C2              147       1,757
phage     c3              163         453
phage     C7              170       1,419
phage     d5              236         744
phage     E12.1            34         459
phage     E7              106         300
phage     E9              236       2,723
phage     F6              120       2,556
phage     G12             292         925
phage     H3               64       1,060
phage     h4              179         658
phage     h6                0           0
phage     h7              126       1,302
positive  BST1             32       1,616
positive  CD133             0           0
positive  CD34              9         398
positive  THY1              2           4
3 loading controls
13 negative controls
27 Positive controls and phage
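The two libraries in these tables differ roughly five-fold in depth, so raw counts are not directly comparable. A minimal sketch of a reads-per-million normalization follows. It assumes the library sizes quoted on the experiment list (about 20 million and 111 million reads) and that the table values are read counts; neither the normalization nor the helper names below come from the slides.

```python
# Approximate library depths, taken from the experiment list on the slides.
LIBRARY_READS = {"total.bm.ss": 20_000_000, "lin.neg.ss": 111_000_000}

# (total.bm.ss, lin.neg.ss) counts copied from the control tables.
COUNTS = {
    "ACTB": (2933, 12643),  # loading control
    "CD14": (21, 49),       # negative control
    "CD34": (9, 398),       # positive control
    "BST1": (32, 1616),     # positive control
}


def reads_per_million(count, library_reads):
    """Scale a raw count to reads per million sequenced reads."""
    return count * 1_000_000 / library_reads


fold_change = {}
for gene, (total_bm, lin_neg) in COUNTS.items():
    rpm_total = reads_per_million(total_bm, LIBRARY_READS["total.bm.ss"])
    rpm_lin = reads_per_million(lin_neg, LIBRARY_READS["lin.neg.ss"])
    # Ratio above 1 means relatively more reads in the lineage-negative library.
    fold_change[gene] = rpm_lin / rpm_total
    print(f"{gene}: {rpm_total:.2f} RPM vs {rpm_lin:.2f} RPM, ratio {fold_change[gene]:.2f}")
```

Under these assumptions the CD34 and BST1 ratios come out above 1 and the CD14 ratio below 1, which is the direction expected for a lineage-negative enrichment.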
Peak read count: 45,701
Peak read count: 52,626
Peak read count: 12,570
Peak read count: 200
ACTB
Negative Control: CD14 (should be highest in Total Bone Marrow)
Peak read count: 109
Peak read count: 6318
Peak read count: 48
Peak read count: 21
Positive Control: CD34 (should be highest in Lineage Negative)
Peak read count: 169
Peak read count: 43
Peak read count: 386
Peak read count: 10
What’s Wrong With Illumina Reads
Uniformity of Read Coverage*
• An aligned read can be represented as an integer point in R² as follows:
The ‘t-coordinate’ corresponding to the read is its left-end point while the
‘l-coordinate’ is the length of the fragment. In Evans et al. (2010), it is
shown that for any choice of fragment length distribution, the collection
of points {(t, l)} from a sequencing experiment forms a two-dimensional
Poisson process. This principle guides our further analysis of these points
{(t, l)}, as we test for uniformity in both the t and l coordinates. The output
of ReadSpy is a list of test statistics and P-values for each transcript. A
statistically significant (low) P-value means we reject the hypothesis that the
dataset is uniform on that transcript. Thus, a higher P-value corresponds
to a set of reads sampled uniformly, which is desired. In the next two
sections, we describe the statistical test applied to each transcript. The test
is formulated in terms of the genomic segment [a, b].
*Hower, Valerie, Richard Starfield, Adam Roberts, and Lior Pachter. "Quantifying uniformity of mapped reads." Bioinformatics
28, no. 20 (2012): 2680-2682.
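To make the idea of a uniformity test concrete, here is a minimal sketch of a Pearson chi-square test on read start positions (the t-coordinate only) over a segment [a, b). It is not ReadSpy's implementation: the binning into 20 equal bins, the function name, and the toy inputs are assumptions chosen for illustration, and the fragment-length (l) coordinate is ignored.

```python
def chi_square_uniformity(starts, a, b, n_bins=20):
    """Pearson chi-square statistic for uniform read starts on [a, b).

    Returns (statistic, degrees_of_freedom). A large statistic relative to
    the chi-square distribution means the starts are unlikely to be uniform.
    """
    width = (b - a) / n_bins
    observed = [0] * n_bins
    for t in starts:
        if a <= t < b:
            observed[min(int((t - a) / width), n_bins - 1)] += 1
    n = sum(observed)
    if n == 0:
        raise ValueError("no read starts fall inside the segment")
    expected = n / n_bins  # same expected count in every bin under uniformity
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, n_bins - 1


# Toy data: 1000 starts spread evenly, and 1000 starts piled into the first 10%.
even_starts = list(range(1000))
piled_starts = [i % 100 for i in range(1000)]

stat_even, df = chi_square_uniformity(even_starts, 0, 1000)
stat_piled, _ = chi_square_uniformity(piled_starts, 0, 1000)

# 30.14 is the 5% critical value of the chi-square distribution with 19 df.
CRITICAL_5PCT_DF19 = 30.14
print(stat_even, stat_even > CRITICAL_5PCT_DF19)
print(stat_piled, stat_piled > CRITICAL_5PCT_DF19)
```

The evenly spread toy reads give a statistic of zero, while the piled-up reads exceed the critical value by a wide margin, which corresponds to a very low P-value and a rejection of uniformity.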
Lior Pachter’s ReadSpy Results
Total BM Targeted Strand Specific (20 million reads)
target_id     length   df  pair_counts_0  test_stat_0  p_value_0
chr19       49129131   19            226      3948.34   0.00E+00
chr4       191038775   19            227      1760.40   0.00E+00
chr11      135006716   19            304      2811.79   0.00E+00
chr2       243199471   19            361      6859.00   0.00E+00
chr16       90354953   38            402      7638.00   0.00E+00
chr9       141354337   38            436      2754.92   0.00E+00
chr12      133851995   57            797     15143.00   0.00E+00
chr15      102531492   76            841     15979.00   0.00E+00
chr1       249250866  247           2739     20184.43   0.00E+00
chr7       159138908  285           3325     54980.68   0.00E+00
Lineage Negative Strand Specific Random (110 million reads)
target_id     length    df  pair_counts_0  test_stat_0  p_value_0
chrY        59373664    19            224      4256.00   0.00E+00
chr21       48130091    19            284      2951.63   0.00E+00
chr19       49129131    57            663     10583.74   0.00E+00
chr8       146364218    57            751      5478.61   0.00E+00
chr10      135534897    76            902      8655.73   0.00E+00
chr3       198022577    76            957     12936.24   0.00E+00
chr16       90354953   133           1439     27341.00   0.00E+00
chr11      135006716   190           2067     23431.41   0.00E+00
chr2       243199471   190           2260     42940.00   0.00E+00
chr4       191038775   285           3236     40639.91   0.00E+00
chr9       141354337   304           3423     23574.66   0.00E+00
chr15      102531492   380           5735    108965.00   0.00E+00
chr1       249250866   912          10322     97596.23   0.00E+00
chr7       159138908  2394          29726    504209.24   0.00E+00
chr12      133851995  5605          84272   1601168.00   0.00E+00
Our reads all have low p-values
indicating the non-uniform
nature of their read coverage
Experiment 3 Results
Reads were aligned to the genome (TopHat with Bowtie2, then Cufflinks) and assembled de novo
(Trinity, with GSNAP and BLAT).
Wellstein Genome – a sub-genome built from excised regions around each phage fragment, created in
the hope of discovering the underlying isoform and gene structure.
The short reads were aligned to these regions with BLAT and BLAST, and still:
• Hits that included phage gave ambiguous information about isoforms and gene structure
• Strand information is known, and enrichment in the lineage-negative population is evident
• Yet the unambiguous structure of the phage transcripts is still not clear
• Finding known genes can be done, and even de novo assembly of novel transcripts is done on a
regular basis
• But with these phage, only a fragment is known – how do we find the full-length structure of the
transcript behind it?
• What if the phage transcripts were present in the targeted full-length library but were lost in the
fragmentation? Is there a way to do sequencing without fragmentation?
Next Steps
• Use new 3rd generation technology to do full length transcript sequencing without fragmentation
Source: Iso-Seq webinar by Liz Tseng, Pacific Biosciences
https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data
Four Sequencing Experiments
Second Generation Sequencing
1. Total.bm.random – total bone marrow sequenced non-strand
specific randomly primed ~ 180 million reads
2. Total.bm.ss.targeted – total bone marrow sequenced strand
specific targeted primed to a depth ~ 20 million reads
3. Lin.neg.ss.random – lineage-negative sequenced strand specific randomly
primed ~ 111 million reads
Third Generation Sequencing
4. Lin.neg Pac Bio Long reads –
6 million CCS Filtered SubReads ~ 277,000 readsOfInserts
Source: http://www.pacificbiosciences.com/products/smrt-technology/
Source:
https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data#wiki-roiexplained
Source:
https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data#wiki-roiexplained
Source: Bobby Sebra – SMRT Portal analysis results
Peak read count: 45,701
Peak read count: 52,626
Peak read count: 12,570
Peak read count: 10
ACTB
Negative Control: CD14 (should be highest in Total Bone Marrow)
Peak read count: 109
Peak read count: 6318
Peak read count: 48
Peak read count: 21
Positive Control: CD34 (should be highest in Lineage Negative)
Peak read count: 169
Peak read count: 43
Peak read count: 386
Peak read count: 10
Phage: B9 – only the phage (953 bp)
Peak read count: 10
Peak read count: 10
Peak read count: 10
Peak read count: 10
Peak read count: 10
Peak read count: 16
Peak read count: 10
Peak read count: 10
Phage: B9 10x larger region (~9kb) centered on phage evidence
Reports for Job readsofinsert (SMRT Portal report, 2/6/2014)
Plots: Read Length Of Insert, Read Quality Of Insert, Number Of Passes
Reads Of Insert
Movie                                                           Reads Of Insert  Read Bases Of Insert  Mean Read Length Of Insert  Read Accuracy Of Insert  Mean Number Of Passes
m131214_160008_42177R_c100597152550000001823102305221422_s1_p0           47,762            61,257,390                       1,282                   97.96%                  11.01
m131212_234151_42177R_c100597412550000001823102305221473_s1_p0           23,360            33,092,110                       1,416                   98.39%                  11.65
m131214_092100_42177R_c100597152550000001823102305221420_s1_p0           36,623            59,671,472                       1,629                   98.41%                  10.78
m131214_124034_42177R_c100597152550000001823102305221421_s1_p0           49,710            63,809,739                       1,283                   98.04%                  11.26
m131213_232025_42177R_c100597412550000001823102305221475_s1_p0           30,720            37,357,905                       1,216                   97.49%                  10.75
m131213_030106_42177R_c100597412550000001823102305221474_s1_p0           24,284            34,943,462                       1,438                   98.49%                  11.85
m131214_060132_42177R_c100597412550000001823102305221477_s1_p0           32,492            39,813,943                       1,225                   97.49%                  10.54
m131214_023937_42177R_c100597412550000001823102305221476_s1_p0           32,210            39,536,384                       1,227                   97.57%                  10.74
Generated by SMRT® Portal. Thu Feb 06 13:30:44 UTC 2014
For Research Use Only. Not for use in diagnostic procedures.
Source: self-install smrt portal – reads of insert
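The per-movie numbers in the report can be cross-checked against the totals used on the later slides. The sketch below simply sums the Reads Of Insert and Read Bases Of Insert columns copied from the report; the variable names are illustrative only.

```python
# (movie, reads of insert, read bases of insert) copied from the report above.
MOVIES = [
    ("m131214_160008_42177R_c100597152550000001823102305221422_s1_p0", 47762, 61257390),
    ("m131212_234151_42177R_c100597412550000001823102305221473_s1_p0", 23360, 33092110),
    ("m131214_092100_42177R_c100597152550000001823102305221420_s1_p0", 36623, 59671472),
    ("m131214_124034_42177R_c100597152550000001823102305221421_s1_p0", 49710, 63809739),
    ("m131213_232025_42177R_c100597412550000001823102305221475_s1_p0", 30720, 37357905),
    ("m131213_030106_42177R_c100597412550000001823102305221474_s1_p0", 24284, 34943462),
    ("m131214_060132_42177R_c100597412550000001823102305221477_s1_p0", 32492, 39813943),
    ("m131214_023937_42177R_c100597412550000001823102305221476_s1_p0", 32210, 39536384),
]

total_reads = sum(reads for _, reads, _ in MOVIES)
total_bases = sum(bases for _, _, bases in MOVIES)
# Read-weighted mean insert length across all eight movies.
mean_length = total_bases / total_reads

print(f"reads of insert: {total_reads:,}")
print(f"bases of insert: {total_bases:,}")
print(f"mean read length of insert: {mean_length:.0f}")
```

The eight movies add up to 277,161 reads of insert, which is the denominator that appears in the primer summaries, with a pooled mean insert length of a little over 1.3 kb.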
Transcript Size Distribution (summary of reads)
1 to 2k: 87%
2 to 3k: 11%
over 3k: 2%
------ 5' primer seen summary ----
Per subread: 258835/277161 (93.4%)
Per ZMW: 258835/277161 (93.4%)
Per ZMW first-pass: 258835/277161 (93.4%)
------ 3' primer seen summary ----
Per subread: 1361/277161 (0.5%)
Per ZMW: 1361/277161 (0.5%)
Per ZMW first-pass: 1361/277161 (0.5%)
------ 5'&3' primer seen summary ----
Per subread: 1341/277161 (0.5%)
Per ZMW: 1341/277161 (0.5%)
Per ZMW first-pass: 1341/277161 (0.5%)
------ 5'&3'&polyA primer seen summary ----
Per subread: 18/277161 (0.0%)
Per ZMW: 18/277161 (0.0%)
Per ZMW first-pass: 18/277161 (0.0%)
------ Primer Match breakdown ----
F0/R0: 258855 (100.0%)
Source: output of summarize_results.py (Liz Tseng)
But this is not good: only 0.5% of the reads show the 3' primer. It turns out
that the primers were incorrectly chosen. The best way to find the primers
actually used is to search the reads directly, as follows:
>cat reads_of_insert.fasta | grep -A1 "AAAAAAAAAAAAAAAAA" | more
GGCTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
AACATTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTAACTCTGCGTTGATACCACTGCTT
--
TGTTTTATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
TTACAATTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
GAGCCCTTACCGAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
GTGGTGATTGTTTACTAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
GACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
TTTCCCGCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
CTTACTTACGTAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
--
GCCCCATCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
>cat reads_of_insert.fasta | grep -A1 "TTTTTTTTTTTT" | more
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTGGCTTGAT
--
AAGCAGTTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTTGATTTCCAT
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACTTGGGATCTTT
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTT
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACCCATCAGCG
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTGGTATTTGTTTGTTTCTG
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTT
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGACATAAACAC
--
AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACTAAGCATATT
T
Now my primers are:
>F0
AAGCAGTGGTATCAACGCAGAGTAC
>R0
GTAACTCTGCGTTGATACCACTGCTT
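Since the same adapter should appear at the 5' end and, reverse-complemented, after the poly(A) tail, one quick sanity check is to compare R0 with the reverse complement of F0. The sketch below does that with the two primers listed above; the helper name is illustrative and nothing here reproduces what the primer-matching scripts do internally.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


F0 = "AAGCAGTGGTATCAACGCAGAGTAC"
R0 = "GTAACTCTGCGTTGATACCACTGCTT"

rc_f0 = reverse_complement(F0)
print("revcomp(F0):", rc_f0, len(rc_f0))
print("R0         :", R0, len(R0))

# R0 is one base longer; dropping the second A of "GTAA" recovers revcomp(F0).
r0_without_extra_a = R0[:3] + R0[4:]
print("R0 minus one A equals revcomp(F0):", r0_without_extra_a == rc_f0)
```

In the grep output above, nine of the ten reads shown end in GTACTCTGCGTTGATACCACTGCTT, which is the exact reverse complement of F0, and one ends in GTAACTCTGCGTTGATACCACTGCTT, the form listed as R0. Whether the extra A matters depends on how much mismatch the primer-matching step tolerates, which the slides do not state.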
------ 5' primer seen summary ----
Per subread: 256672/277161 (92.6%)
Per ZMW: 256672/277161 (92.6%)
Per ZMW first-pass: 256672/277161 (92.6%)
------ 3' primer seen summary ----
Per subread: 208877/277161 (75.4%)
Per ZMW: 208877/277161 (75.4%)
Per ZMW first-pass: 208877/277161 (75.4%)
------ 5'&3' primer seen summary ----
Per subread: 207111/277161 (74.7%)
Per ZMW: 207111/277161 (74.7%)
Per ZMW first-pass: 207111/277161 (74.7%)
------ 5'&3'&polyA primer seen summary ----
Per subread: 100863/277161 (36.4%)
Per ZMW: 100863/277161 (36.4%)
Per ZMW first-pass: 100863/277161 (36.4%)
------ Primer Match breakdown ----
F0/R0: 258438 (100.0%)
Source: output of summarize_results.py (Liz Tseng)
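The effect of correcting the primers is easiest to see by putting the two summaries side by side. The sketch below recomputes the percentages from the counts printed in the two outputs; the run labels "original primers" and "corrected primers" are names chosen here for illustration.

```python
TOTAL_READS = 277161  # reads of insert, the denominator in both summaries

# Counts copied from the two summarize_results.py outputs on the slides.
RUNS = {
    "original primers": {
        "5' primer": 258835, "3' primer": 1361,
        "5'&3'": 1341, "5'&3'&polyA": 18,
    },
    "corrected primers": {
        "5' primer": 256672, "3' primer": 208877,
        "5'&3'": 207111, "5'&3'&polyA": 100863,
    },
}


def percent(count, total=TOTAL_READS):
    """Percentage of reads, rounded to one decimal as in the summaries."""
    return round(100 * count / total, 1)


table = {
    run: {category: percent(count) for category, count in counts.items()}
    for run, counts in RUNS.items()
}

for category in RUNS["original primers"]:
    before = table["original primers"][category]
    after = table["corrected primers"][category]
    print(f"{category:>12}: {before:5.1f}% -> {after:5.1f}%")
```

With the corrected primers, reads showing both primers and a poly(A) tail rise from essentially none to about a third of the reads of insert, while the 5' primer rate barely changes.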
Negative Control: CD14 (should be highest in Total Bone Marrow)
Peak read count: 109
Peak read count: 6318
Peak read count: 48
Peak read count: 21
Positive Control: CD34 (should be highest in Lineage Negative)
Peak read count: 169
Peak read count: 43
Peak read count: 386
Peak read count: 10
Phage: B9 – only the phage (953 bp)
Peak read count: 10
Peak read count: 10
Peak read count: 10
Peak read count: 10
Peak read count: 10
Peak read count: 16
Peak read count: 10
Peak read count: 10
Phage: B9 10x larger region (~9kb) centered on phage evidence
Figure: UCSC Genome Browser view (hg19, chr11:1,774,050-1,774,450, 200-base scale) of the BLAT
alignments of phage 14-10 and the CCS reads of insert, shown against the MOB2 and CTSD gene
tracks together with the mRNA, EST, ENCODE, conservation, SNP and RepeatMasker tracks.
Phage 14-10: 100% identity and alignment to 19 full-length reads of insert
Figure: UCSC Genome Browser view (hg19, chr11, about 1,775,000-1,785,000, 5 kb scale) of the BLAT
alignments of phage 14-10 and the CCS reads of insert, shown against the MOB2, IFITM10 and CTSD
gene tracks together with the mRNA, EST, ENCODE, conservation, SNP and RepeatMasker tracks.
Phage 14-10: 100% aligned to CTSD, 2 possibly 3 splice variants in lineage negative cell
population – structure fully resolved
Conclusions:
• Full-length transcript discovery is achieved with the Pacific Biosciences RS
sequencer, using size selection in library preparation prior to sequencing
and the Reads Of Insert algorithm
• Even before the release of the ReadsOfInsert approach, the subreads
produced by the sequencing could still reveal the structure of the
complete transcript.
• A 15% error rate seems daunting, but the random nature of the errors
and the length of the reads provided the complete structure in a way
that no short-read second generation sequence could.
• When one is searching for the complete structure, perfection in the parts
is of no consequence
• NO ASSEMBLY is REQUIRED
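Why a 15% raw error rate is less daunting when the errors are random can be illustrated with a simple majority-vote calculation. The sketch below assumes independent errors at a given base across passes and treats every erroneous pass as a wrong vote, which ignores insertions and deletions and is not how the Reads Of Insert consensus is actually computed; the function name and the choice of 11 passes (close to the mean number of passes in the report) are illustrative assumptions.

```python
from math import comb


def majority_error(p, n_passes):
    """Probability that more than half of n independent passes are wrong
    at a given base, when each pass is wrong with probability p."""
    k_min = n_passes // 2 + 1
    return sum(
        comb(n_passes, k) * p**k * (1 - p) ** (n_passes - k)
        for k in range(k_min, n_passes + 1)
    )


RAW_ERROR = 0.15
for passes in (1, 3, 5, 11):
    print(f"{passes:2d} passes: consensus error about {majority_error(RAW_ERROR, passes):.4f}")

error_one_pass = majority_error(RAW_ERROR, 1)
error_eleven_passes = majority_error(RAW_ERROR, 11)
```

Under this simplified model the chance that a majority of passes is wrong at a base drops well below 1% at eleven passes, which is the intuition behind random errors averaging out over repeated passes of the same insert.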
Next Steps:
1. Complete the reads of insert approach with 75% accuracy and minimum 1
pass
2. Identify additional full-length structure (if possible with the sample reads)
3. Write up the results
4. (next paper) If no additional phage found, sequence an enriched
population with confirmed phage evidence at full length with
another PacBio sequencing run
5. Use Illumina reads to correct for errors and recover more reads
6. Use greater PacBio sequencing depth
Acknowledgements
Dr. Anton Wellstein
Dr. Anna Riegel
Dr. Elena Tassi
Dr. Marcel Schmidt
The entire lab: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Khaled, Eric, Nitya, the entire
Wellstein/Riegel laboratory
My Committee
Dr. Yuri Gusev
Dr. Anatoly Dritschilo
Dr. Michael Johnson
Dr. Christopher Loffredo
Dr. Habtom Ressom
Dr. Terry Ryan (external committee member)
Robert Sebra, Mt. Sinai PacBio Sequencing
Liz Tseng, Pacific Biosciences
Eric Schadt, Mt. Sinai PacBio Sequencing
Brian Haas, Author Trinity Suite
CD11: New Evidence of an Exon From all Samples, confirmed by PacBio
Peak read count: 16
Peak read count: 1925
Peak read count: 639
Peak read count: 121
PASA assembly (Trinity Pipeline) Denovo + Genome Guided
Evidence of a new exon – not found in annotation
for CD11

A novel method for building custom ampli seq panels using optimized pcr primers
 
Gamida cell ppt_english_5-5-2010
Gamida cell ppt_english_5-5-2010Gamida cell ppt_english_5-5-2010
Gamida cell ppt_english_5-5-2010
 
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
Why Y Chromosome Markers are an Ever Expanding Essential Tool in Sexual Assau...
 
pDC waspJEM2013
pDC waspJEM2013pDC waspJEM2013
pDC waspJEM2013
 

RNA Sequencing for Full Length Transcript Discovery

  • 1. RNA-Sequencing for Full-length Transcript Discovery. Lab Meeting, 2/10/14. Anne Deslattes Mays. Mentor: Anton Wellstein, MD, PhD. Special Recognition: Marcel Schmidt, PhD. 4/18/2014, Wellstein/Riegel Laboratory, Lombardi Cancer Center, Washington DC 20007.
  • 2. Discovery of homing gene fragments using bone marrow-derived monocytes
      Questions:
      1. Which proteins drive organ homing of hematopoietic cells?
      2. Are there distinct homing proteins for diseased organs (cancer, wound healing, ischemia, infection)?
      Approaches:
      1. Use a human bone marrow (BM) cDNA library that displays large proteins from bone marrow & precursor cells on the phage surface.
      2. In vivo selection of homing proteins from target organs or vessels in animal models (normal or diseased).
      3. This approach selects for gene fragments coding for homing proteins.
      Full-length transcripts from source material
  • 3. Experimental Objective: We aim to use 2nd and 3rd generation sequencing methods to identify the full-length transcripts of genes whose fragments were discovered through the phage display experiments nearly a decade ago.
  • 4. MedStar Georgetown University Hospital Cell Processing Unit. Objective: Obtain healthy donor bone marrow bags
  • 5. Objective: RNA Isolation from Total Bone Marrow. Step 1: Total Bone Marrow Isolation
  • 6.
  • 7. Four Sequencing Experiments: Second Generation Sequencing
  • 8. 2nd Generation Sequencing with Illumina HiSeq 2000
  • 9. Four Sequencing Experiments: Second Generation Sequencing
      1. Total.bm.random – total bone marrow, sequenced mate paired, non-strand specific, randomly primed, ~180 million reads
  • 11. Experiment 1 Results
      Genome aligned (tophat (bowtie2)/cufflinks) and de novo assemblies (trinity (gsnap & blat)) using the read information.
      Wellstein Genome – created a sub genome with excised regions around the phage with the hopes of discovering the underlying isoform and gene structure.
      Blat/Blasted the short reads against this region and still:
      • Results were ambiguous information regarding isoforms and gene structure hits which included phage
      • Structure of transcript was not clear
      • Strand information regarding reads aligned not clear
      Next Steps
      • Design another experiment, same cell population, this time targeted (including original phage primers used often in experiments in both lineage negative and total bone marrow experiments) and strand specific
      • Create a custom long transcript library primed to include full length phage transcripts
  • 13. Random RNA-Sequencing vs Strand-specific Targeted RNA-Sequencing
  • 14. Targeted RNA-Sequencing Workflow
  • 15. Initial G12 Gene Model from the Total Bone Marrow
  • 16. Design targeted primers and create custom long reaction cDNA library
  • 17. Results and pre-sequencing fragmentation
  • 18. Experiment 2 Results
      Reads were genome-aligned (TopHat (Bowtie2)/Cufflinks) and assembled de novo (Trinity (GSNAP & BLAT)) using the read information.
      Wellstein Genome: created a sub-genome with excised regions around the phage, in the hope of discovering the underlying isoform and gene structure. BLAT/BLAST searches of the short reads against this region still gave:
      • Ambiguous information regarding isoforms and gene structure for hits that included phage
      • Strand information was now known, yet the structure of the transcript was still not clear
      • Was it the depth? Was it the cell population? Was it mistargeted regions?
      Next Steps
      • Design another experiment, now looking only at the lineage-negative cell population, where the phage are known to be enriched
      • Return to randomly primed reads
      • Sequence at a depth similar to the original total bone marrow experiment (100 million reads)
  • 19. Four Sequencing Experiments: Second Generation Sequencing
      1. Total.bm.random: total bone marrow, non-strand-specific, randomly primed, ~180 million reads
      2. Total.bm.ss.targeted: total bone marrow, strand-specific, targeted primed, to a depth of ~20 million reads
      3. Lin.neg.ss.random: lineage-negative, strand-specific, randomly primed, ~111 million reads
  • 21. Negative Selection: Human Progenitor Cell Enrichment Kit with Platelet Depletion to isolate the lineage-negative subpopulation from total bone marrow
  • 22. Loading and Negative Controls (3 loading controls, 13 negative controls)
      class      gene            total.bm.ss   lin.neg.ss
      loading    ACTB            2933          12,643
      loading    B2M             1500          8473
      loading    GAPDH           622           44,413
      negative   CD11B           231           1193
      negative   CD11C           132           689
      negative   CD14            21            49
      negative   CD16a           418           1312
      negative   CD19            8             36
      negative   CD2             7             16
      negative   CD24            142           177
      negative   CD3EAP          28            243
      negative   CD56            197           2039
      negative   CD61            24            480
      negative   CD66B           207           208
      negative   glycophorin.A   49            80
      negative   mir155          2             20
      Phage and Positive Controls (27 positive controls and phage)
      class      gene            total.bm.ss   lin.neg.ss
      phage      _b9             203           2298
      phage      a1              0             0
      phage      A12             0             0
      phage      A5              186           553
      phage      a8              76            789
      phage      b3              439           4731
      phage      b6              68            331
      phage      B9              171           2354
      phage      C1              9             139
      phage      C12             42            10,657
      phage      C2              147           1757
      phage      c3              163           453
      phage      C7              170           1419
      phage      d5              236           744
      phage      E12.1           34            459
      phage      E7              106           300
      phage      E9              236           2723
      phage      F6              120           2556
      phage      G12             292           925
      phage      H3              64            1060
      phage      h4              179           658
      phage      h6              0             0
      phage      h7              126           1302
      positive   BST1            32            1616
      positive   CD133           0             0
      positive   CD34            9             398
      positive   THY1            2             4
  • 23. ACTB. Peak read counts: 45,701; 52,626; 12,570; 200
  • 24. Negative Control: CD14 (should be highest in Total Bone Marrow). Peak read counts: 109; 6318; 48; 21
  • 25. Positive Control: CD34 (should be highest in Lineage Negative). Peak read counts: 169; 43; 386; 10
  • 26. What's Wrong With Illumina Reads: Uniformity of Read Coverage*
      • An aligned read can be represented as an integer point in ℝ² as follows: the 't-coordinate' corresponding to the read is its left-end point, while the 'l-coordinate' is the length of the fragment. In Evans et al. (2010), it is shown that for any choice of fragment length distribution, the collection of points {(t, l)} from a sequencing experiment forms a two-dimensional Poisson process. This principle guides our further analysis of these points {(t, l)}, as we test for uniformity in both the t and l coordinates. The output of ReadSpy is a list of test statistics and P-values for each transcript. A statistically significant (low) P-value means we reject the hypothesis that the dataset is uniform on that transcript. Thus, a higher P-value corresponds to a set of reads sampled uniformly, which is desired. In the next two sections, we describe the statistical test applied to each transcript. The test is formulated in terms of the genomic segment [a, b].
      *Hower, Valerie, Richard Starfield, Adam Roberts, and Lior Pachter. "Quantifying uniformity of mapped reads." Bioinformatics 28, no. 20 (2012): 2680-2682.
  • 27. Lior Pachter's ReadSpy Results
      Total BM Targeted Strand Specific (20 million reads)
      target_id   length      df     pair_counts_0   test_stat_0   p_value_0
      chr19       49129131    19     226             3948.34       0.00E+00
      chr4        191038775   19     227             1760.40       0.00E+00
      chr11       135006716   19     304             2811.79       0.00E+00
      chr2        243199471   19     361             6859.00       0.00E+00
      chr16       90354953    38     402             7638.00       0.00E+00
      chr9        141354337   38     436             2754.92       0.00E+00
      chr12       133851995   57     797             15143.00      0.00E+00
      chr15       102531492   76     841             15979.00      0.00E+00
      chr1        249250866   247    2739            20184.43      0.00E+00
      chr7        159138908   285    3325            54980.68      0.00E+00
      Lineage Negative Strand Specific Random (110 million reads)
      target_id   length      df     pair_counts_0   test_stat_0   p_value_0
      chrY        59373664    19     224             4256.00       0.00E+00
      chr21       48130091    19     284             2951.63       0.00E+00
      chr19       49129131    57     663             10583.74      0.00E+00
      chr8        146364218   57     751             5478.61       0.00E+00
      chr10       135534897   76     902             8655.73       0.00E+00
      chr3        198022577   76     957             12936.24      0.00E+00
      chr16       90354953    133    1439            27341.00      0.00E+00
      chr11       135006716   190    2067            23431.41      0.00E+00
      chr2        243199471   190    2260            42940.00      0.00E+00
      chr4        191038775   285    3236            40639.91      0.00E+00
      chr9        141354337   304    3423            23574.66      0.00E+00
      chr15       102531492   380    5735            108965.00     0.00E+00
      chr1        249250866   912    10322           97596.23      0.00E+00
      chr7        159138908   2394   29726           504209.24     0.00E+00
      chr12       133851995   5605   84272           1601168.00    0.00E+00
      Our reads all have low p-values, indicating the non-uniform nature of their read coverage.
  • 28. Experiment 3 Results
      Reads were genome-aligned (TopHat (Bowtie2)/Cufflinks) and assembled de novo (Trinity (GSNAP & BLAT)) using the read information.
      Wellstein Genome: created a sub-genome with excised regions around the phage, in the hope of discovering the underlying isoform and gene structure. BLAT/BLAST searches of the short reads against this region still gave:
      • Ambiguous information regarding isoforms and gene structure for hits that included phage
      • Strand information was known, and enrichment in the population is evident, yet the unambiguous structure of the phage transcripts is still not clear
      • Finding known genes can be done; even de novo assembly of novel transcripts is done on a regular basis
      • But with these phage, only a fragment is known: how do we find the full-length structure of the transcript behind it?
      • What if the phage transcripts were present in the targeted full-length library but were lost in the fragmentation? Is there a way to do sequencing without fragmentation?
      Next Steps
      • Use new 3rd generation technology to do full-length transcript sequencing without fragmentation
  • 29. Source: Iso-seq webinar by Liz Tseng, Pacific Biosciences https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data
  • 30. Four Sequencing Experiments
      Second Generation Sequencing
      1. Total.bm.random: total bone marrow, sequenced non-strand-specific, randomly primed, ~180 million reads
      2. Total.bm.ss.targeted: total bone marrow, sequenced strand-specific, targeted primed, to a depth of ~20 million reads
      3. Lin.neg.ss.random: lineage-negative (lin-), sequenced strand-specific, randomly primed, ~111 million reads
      Third Generation Sequencing
      4. Lin.neg PacBio long reads: 6 million CCS filtered subreads, ~277,000 readsOfInserts
  • 31. Source: http://www.pacificbiosciences.com/products/smrt-technology/
  • 32. Source: https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data#wiki-roiexplained
  • 33. Source: https://github.com/PacificBiosciences/cDNA_primer/wiki/Understanding-PacBio-transcriptome-data#wiki-roiexplained
  • 34. Source: Bobby Sebra, SMRT Portal analysis results
  • 35. ACTB. Peak read counts: 45,701; 52,626; 12,570; 10
  • 36. Negative Control: CD14 (should be highest in Total Bone Marrow). Peak read counts: 109; 6318; 48; 21
  • 37. Positive Control: CD34 (should be highest in Lineage Negative). Peak read counts: 169; 43; 386; 10
  • 38. Phage: B9, only the phage (953 bp). Peak read counts: 10; 10; 10; 10
  • 39. Phage: B9, 10x larger region (~9 kb) centered on phage evidence. Peak read counts: 10; 16; 10; 10
  • 40. Reports for Job readsofinsert: Reads Of Insert (plots: Read Length Of Insert, Read Quality Of Insert, Number Of Passes)
      Columns: Movie | Reads Of Insert | Read Bases Of Insert | Mean Read Length Of Insert | Read Accuracy Of Insert | Mean Number Of Passes
      m131214_160008_42177R_c100597152550000001823102305221422_s1_p0 | 47,762 | 61,257,390 | 1,282 | 97.96% | 11.01
      m131212_234151_42177R_c100597412550000001823102305221473_s1_p0 | 23,360 | 33,092,110 | 1,416 | 98.39% | 11.65
      m131214_092100_42177R_c100597152550000001823102305221420_s1_p0 | 36,623 | 59,671,472 | 1,629 | 98.41% | 10.78
      m131214_124034_42177R_c100597152550000001823102305221421_s1_p0 | 49,710 | 63,809,739 | 1,283 | 98.04% | 11.26
      m131213_232025_42177R_c100597412550000001823102305221475_s1_p0 | 30,720 | 37,357,905 | 1,216 | 97.49% | 10.75
      m131213_030106_42177R_c100597412550000001823102305221474_s1_p0 | 24,284 | 34,943,462 | 1,438 | 98.49% | 11.85
      m131214_060132_42177R_c100597412550000001823102305221477_s1_p0 | 32,492 | 39,813,943 | 1,225 | 97.49% | 10.54
      m131214_023937_42177R_c100597412550000001823102305221476_s1_p0 | 32,210 | 39,536,384 | 1,227 | 97.57% | 10.74
      Generated by SMRT® Portal, Thu Feb 06 13:30:44 UTC 2014.
      Source: self-install SMRT Portal, reads of insert
  • 41. Transcript Size Distribution: 1 to 2k, 87%; 2 to 3k, 11%; over 3k, 2%
  • 42. Summary of reads.
      ------ 5' primer seen summary ----
      Per subread: 258835/277161 (93.4%)
      Per ZMW: 258835/277161 (93.4%)
      Per ZMW first-pass: 258835/277161 (93.4%)
      ------ 3' primer seen summary ----
      Per subread: 1361/277161 (0.5%)
      Per ZMW: 1361/277161 (0.5%)
      Per ZMW first-pass: 1361/277161 (0.5%)
      ------ 5'&3' primer seen summary ----
      Per subread: 1341/277161 (0.5%)
      Per ZMW: 1341/277161 (0.5%)
      Per ZMW first-pass: 1341/277161 (0.5%)
      ------ 5'&3'&polyA primer seen summary ----
      Per subread: 18/277161 (0.0%)
      Per ZMW: 18/277161 (0.0%)
      Per ZMW first-pass: 18/277161 (0.0%)
      ------ Primer Match breakdown ----
      F0/R0: 258855 (100.0%)
      Source: output of summarize_results.py (Liz Tseng)
  • 43. But this is not good: it turns out that the primers were incorrectly chosen. The best way to find the primers actually used is as follows:
      >cat reads_of_insert.fasta | grep -A1 "AAAAAAAAAAAAAAAAA" | more
      GGCTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      --
      AACATTGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTAACTCTGCGTTGATACCACTGCTT
      --
      TGTTTTATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      --
      TTACAATTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      --
      GAGCCCTTACCGAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      --
      GTGGTGATTGTTTACTAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      --
      GACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      --
      TTTCCCGCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      --
      CTTACTTACGTAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      --
      GCCCCATCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTACTCTGCGTTGATACCACTGCTT
      >cat reads_of_insert.fasta | grep -A1 "TTTTTTTTTTTT" | more
      AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTGGCTTGAT
      --
      AAGCAGTTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGTTTTGATTTCCAT
      --
      AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACTTGGGATCTTT
      --
      AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTT
      --
      AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACCCATCAGCG
      --
      AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTGGTATTTGTTTGTTTCTG
      --
      AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTT
      --
      AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGACATAAACAC
      --
      AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTACTAAGCATATTT
      Now my primers are:
      >F0
      AAGCAGTGGTATCAACGCAGAGTAC
      >R0
      GTAACTCTGCGTTGATACCACTGCTT
  • 44. Summary of reads after correcting the primers.
      ------ 5' primer seen summary ----
      Per subread: 256672/277161 (92.6%)
      Per ZMW: 256672/277161 (92.6%)
      Per ZMW first-pass: 256672/277161 (92.6%)
      ------ 3' primer seen summary ----
      Per subread: 208877/277161 (75.4%)
      Per ZMW: 208877/277161 (75.4%)
      Per ZMW first-pass: 208877/277161 (75.4%)
      ------ 5'&3' primer seen summary ----
      Per subread: 207111/277161 (74.7%)
      Per ZMW: 207111/277161 (74.7%)
      Per ZMW first-pass: 207111/277161 (74.7%)
      ------ 5'&3'&polyA primer seen summary ----
      Per subread: 100863/277161 (36.4%)
      Per ZMW: 100863/277161 (36.4%)
      Per ZMW first-pass: 100863/277161 (36.4%)
      ------ Primer Match breakdown ----
      F0/R0: 258438 (100.0%)
      Source: output of summarize_results.py (Liz Tseng)
  • 45. Negative Control: CD14 (should be highest in Total Bone Marrow). Peak read counts: 109; 6318; 48; 21
  • 46. Positive Control: CD34 (should be highest in Lineage Negative). Peak read counts: 169; 43; 386; 10
  • 47. Phage: B9, only the phage (953 bp). Peak read counts: 10; 10; 10; 10
  • 48. Phage: B9, 10x larger region (~9 kb) centered on phage evidence. Peak read counts: 10; 16; 10; 10
  • 49. [Figure: UCSC Genome Browser view, hg19, chr11 from about 1,774,050 to 1,774,450 (200-base scale), showing the BLAT search results ("Your Sequence from Blat Search") for phage 14-10 and the CCS reads of insert alongside the UCSC Genes, RefSeq, mRNA, EST, ENCODE, conservation, SNP and RepeatMasker tracks; annotated genes MOB2 and CTSD.] Phage 14-10: 100% identity and alignment to 19 full-length reads of insert
  • 50. [Figure: UCSC Genome Browser view, hg19, chr11 from about 1,775,000 to 1,785,000 (5 kb scale), showing the same BLAT search results for phage 14-10 and the CCS reads of insert; annotated genes MOB2, IFITM10 and CTSD.] Phage 14-10: 100% aligned to CTSD, with 2 (possibly 3) splice variants in the lineage-negative cell population; structure fully resolved
  • 51. Conclusions:
      • Full-length transcript discovery is achieved with the Pacific Biosciences RS sequencer, using size selection in library preparation prior to sequencing and the Reads Of Insert algorithm
      • Even before the release of the ReadsOfInsert approach, the subreads produced by the sequencing could already reveal the structure of the complete transcript
      • Although the 15% error rate seems daunting, the random nature of the errors and the length of the reads provided the complete structure in a way that no short-read second-generation sequencing could
      • When one is searching for the complete structure, perfection in the parts is of no consequence
      • NO ASSEMBLY is REQUIRED
  • 52. Next Steps:
      1. Complete the reads of insert approach with 75% accuracy and a minimum of 1 pass
      2. Identify additional full-length structure (if possible with the sample reads)
      3. Write up the results
      4. (next paper) If no additional phage are found, sequence an enriched population with confirmed phage evidence at full length with another PacBio sequencing run
      5. Use Illumina reads to correct for errors and recover more reads
      6. Use greater PacBio sequencing depth
  • 53. Acknowledgements
      Dr. Anton Wellstein, Dr. Anna Riegel, Dr. Elena Tassi, Dr. Marcel Schmidt
      The entire Wellstein/Riegel laboratory: Elena, Virginie, Ghada, Ivana, Eveline, Khalid, Khaled, Eric, Nitya
      My Committee: Dr. Yuri Gusev, Dr. Anatoly Dritschilo, Dr. Michael Johnson, Dr. Christopher Loffredo, Dr. Habtom Ressom, Dr. Terry Ryan (external committee member)
      Robert Sebra, Mt. Sinai PacBio Sequencing
      Liz Tseng, Pacific Biosciences
      Eric Schadt, Mt. Sinai PacBio Sequencing
      Brian Haas, author of the Trinity suite
  • 54. CD11: New Evidence of an Exon From all Samples, confirmed by PacBio. Peak read counts: 16; 1925; 639; 121
  • 55. PASA assembly (Trinity Pipeline), de novo + genome guided: evidence of a new exon not found in the annotation for CD11
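
The primer hunt on slide 43 can be cross-checked with a small sketch. It assumes only the F0 sequence printed on that slide; the function names and the synthetic test read are hypothetical and are not part of the cDNA_primer scripts. The reverse complement of F0 is the sequence that follows the poly(A) tail in most of the reads shown on slide 43. The R0 printed on the slide carries one additional A (GTAACTCT...), as in the second poly(A) read; the slides do not say which form was intended.

```python
# Sketch only: F0 is copied from slide 43; helper names are hypothetical.
F0 = "AAGCAGTGGTATCAACGCAGAGTAC"

# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_polya_then_primer(read: str, primer_rc: str, min_a: int = 15) -> bool:
    """True if the first occurrence of primer_rc in the read is
    immediately preceded by at least min_a consecutive A's."""
    idx = read.find(primer_rc)
    if idx < min_a:  # also covers idx == -1 (primer not found)
        return False
    return read[idx - min_a:idx] == "A" * min_a


F0_RC = reverse_complement(F0)

# Synthetic read (an assumption, not data from the slides):
# a short prefix, a 30-base poly(A) tail, then the reverse-complemented primer.
synthetic_read = "GGCTT" + "A" * 30 + F0_RC

print(F0_RC)
print(has_polya_then_primer(synthetic_read, F0_RC))
```

An exact-match scan like this is deliberately strict; a read with a single inserted base in the primer would be missed, so a tolerant aligner is the more realistic choice for real data.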

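The figures on slides 40, 42 and 44 can be tied together with a little arithmetic. The sketch below copies the per-movie counts from the slide 40 report and two of the primer counts from slide 44; the helper names are hypothetical. Summing the eight movies reproduces the 277,161 denominator used by summarize_results.py, which is the "~277,000 readsOfInserts" quoted on slide 30.

```python
# Sketch only: counts are copied from slides 40 and 44; helper names are hypothetical.

# (reads_of_insert, read_bases_of_insert) for each of the eight movies on slide 40.
MOVIES = [
    (47762, 61257390),
    (23360, 33092110),
    (36623, 59671472),
    (49710, 63809739),
    (30720, 37357905),
    (24284, 34943462),
    (32492, 39813943),
    (32210, 39536384),
]


def totals(movies):
    """Return (total reads, total bases, mean read length) across all movies."""
    reads = sum(r for r, _ in movies)
    bases = sum(b for _, b in movies)
    return reads, bases, bases / reads


def percent(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place as in the summary output."""
    return round(100.0 * count / total, 1)


total_reads, total_bases, mean_length = totals(MOVIES)

print(total_reads, total_bases, round(mean_length))
# Slide 44: reads with the 3' primer, and reads with 5' primer, 3' primer and poly(A).
print(percent(208877, total_reads), percent(100863, total_reads))
```

Pooling bases over reads gives the run-wide mean read length, which differs from a simple average of the eight per-movie means because the movies contribute different numbers of reads.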
Editor's Notes

  1. Fragments are important and interesting. Sequencing is cheap and should reveal our fragments; as shown, they express at high levels relative to actin. For an annotation experiment, the recommendation is paired-end stranded sequencing.
  2. Figure 5: Random RNA-Seq vs Strand-Specific Targeted RNA-Sequencing. Panel A shows the typical RNA-seq experiment. It begins with cDNA library preparation constructed from the tissue of choice but randomly primed, and includes second-strand cDNA synthesis. Panel B shows the steps in a strand-specific targeted RNA-sequencing experiment. Primers are targeted and the second-strand cDNA is not synthesized.
  3. Figure 5: Random RNA-Seq vs Strand-Specific Targeted RNA-Sequencing. Panel A shows the typical RNA-seq experiment. It begins with cDNA library preparation constructed from the tissue of choice but randomly primed, and includes second-strand cDNA synthesis. Panel B shows the steps in a strand-specific targeted RNA-sequencing experiment. Primers are targeted and the second-strand cDNA is not synthesized.
  4. Figure 2: Step 1: Assemble known information. Both the novel transcript fragments discovered through phage display experiments and additional transcript data gathered from a random RNA-seq experiment were mapped to the genome. Step 2: Create gene model. Step 3: Primer design. Primers were designed to be unique to the genome, and specific and antisense to the gene. Step 4: Perform targeted RNA-seq; this step involves fragmentation (see Figure 5). Step 5: Reassemble the fragmented transcript data into full-length transcripts. Step 6: Confirm the full-length transcript.
  5. Figure 3: Map phage cDNA fragment information together with random RNA-seq reads. In steps 1 and 2 of our workflow we want to map all known information to the genome and create a putative gene model. Mapping of short reads is a crucial and not always unambiguous step. Read mapping with BLAT and read mapping with Bowtie2 do not give identical results. The gene model in step 3 was created using BLAT-mapped reads. Abundance and known transcript information were used to select novel and specific transcript data to create our initial putative G12 gene model.
  6. Figure 4: Primer design and custom cDNA library creation. Primers were designed specific to the gene model created. Panel A shows G12.1, G12.2, G12.3, G12.4, G12.6, G12.7, G12.transcript.1, G12.transcript.2 and G12.transcript.3. These are the primers that were designed to the G12 gene model. 119 primers were designed to 23 genes discovered in the initial surgical experiments. An average of 6 primers were designed to each of the genes, including the 3'-most putative exon. To create the custom targeted cDNA library, a pooling strategy was employed, separating chromosomes and primers for each of the genes in such a way that the reverse transcriptase reaction could occur as specifically as possible in 24 separate reactions (Panel B). The cDNA library was synthesized in a long reaction (> 12 hours) on sample freshly harvested from bone marrow with a RIN quality greater than 9.
  7. Figure 6: Results and pre-sequencing fragmentation. Panel A shows the results from our long reverse transcriptase reaction (12-16 hours) in our cDNA library creation. On average, the transcripts are 3671 base pairs in length. Panel B shows the results from the pre-sequencing step. The purpose of this latter step is to fragment the full-length transcripts to an average length of 300 base pairs, because of sequencing length limitations. The electropherogram reveals an average length of 333 base pairs for these fragments.
  8. Total count: 46. A: 46 (100%, 17+, 29-); C: 0; G: 0; T: 0; N: 0
  9. Total count: 46. A: 46 (100%, 17+, 29-); C: 0; G: 0; T: 0; N: 0
  10. Total count: 46. A: 46 (100%, 17+, 29-); C: 0; G: 0; T: 0; N: 0
  11. Background – Discovery of the fragments
  12. Background – Discovery of the fragments