5. Considerations for PacBio WGS
• High molecular weight genomic DNA
• DNA must be of sufficient quality to allow for >30 kb shearing to
produce PacBio Continuous Long Reads (CLR)
• Consistent shearing >30 kb
• Shearing genomic DNA >30 kb is challenging and requires a
consistent technology
• Preferred method: Diagenode Megaruptor
• Alternate method: Covaris g-Tube
• Sufficient DNA for PacBio sample prep
• A single PacBio sample prep reaction requires 5 μg sheared DNA
• One library is composed of 8-10 sample prep reactions
• At least 2-4 libraries are required for 60x coverage
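The input figures above multiply out to a total DNA requirement. The following is a minimal sketch of that arithmetic; only the three constants come from the slide, and the function name `dna_required_ug` is hypothetical.

```python
# Planning sketch: the three constants are taken from the slide,
# everything else (names, structure) is illustrative.
UG_PER_REACTION = 5.0            # ug sheared DNA per sample prep reaction
REACTIONS_PER_LIBRARY = (8, 10)  # reactions pooled into one library
LIBRARIES_FOR_60X = (2, 4)       # libraries needed for ~60x coverage


def dna_required_ug(reactions: int, libraries: int) -> float:
    """Total sheared DNA (ug) for the given reactions per library and library count."""
    return UG_PER_REACTION * reactions * libraries


low = dna_required_ug(min(REACTIONS_PER_LIBRARY), min(LIBRARIES_FOR_60X))
high = dna_required_ug(max(REACTIONS_PER_LIBRARY), max(LIBRARIES_FOR_60X))

print(f"Per library: {dna_required_ug(8, 1):.0f}-{dna_required_ug(10, 1):.0f} ug")
print(f"For 60x coverage: {low:.0f}-{high:.0f} ug")
```

Under these numbers, one library needs 40-50 µg of sheared DNA and a 60x data set needs roughly 80-200 µg, before any losses during shearing or size selection.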
7. PacBio Workflow
[Workflow diagram]
• Workflow steps: DNA Shear; DNA Repair; Ligation/Exonuclease; BluePippin >18 kb Sizing; DNA Repair; Seq. Primer Anneal; P6 Polymerase Bind; MagBead Bind; Sequencing
• Cleanups shown in the diagram: AMPure PB (several, including one 3x AMPure PB); Rinse wells
• Diagram annotations: 30 minutes or 4 hours; 20 minutes to 2 hours; Denature primer prior to use; 4 to 6 hour collection time
• Adding DNA Damage Repair after BluePippin sizing increased the average Reads of Insert length by ~1 kb.
• Extending the P6 Polymerase Binding time from 30 minutes to 4 hours improved library complex loading per
SMRT cell
8. Standard PacBio protocol (sample prep & complex)
[Figure: NA19240 Library 4 - Per SMRT Cell ROI length/yield. X axis: SMRT Cell (1 to 77). Y axes: Average ROI (bp), 8,000 to 15,000; ROI Yield (Mbases), 0 to 1,400.]
Figure annotations:
• Titration
• No Post-BluePippin DNA Damage Repair
• 30 min P6 polymerase bind
• 6 hour movies and 4 hour movies
• 125 pM “on plate” loading concentration
• G-Tube 4800 ✜
9. DNA Damage Repair & extended P6 bind
[Figure: NA19240 Library 5 - Per SMRT Cell ROI length/yield. X axis: SMRT Cell (1 to 48). Y axes: Average ROI (bp), 8,000 to 15,000; ROI Yield (Mbases), 0 to 1,400.]
Figure annotations:
• No Post-BluePippin DNA Damage Repair; 30 min P6 polymerase bind
• Post-BluePippin DNA Damage Repair; 4 hour P6 polymerase bind
• G-Tube 4500 ✪
13. PacBio Sequencing Observations
HG00514: 4 h vs. 6 h movie lengths
Instrument   Movie Time (min)   Avg. ROI (bp)   ROI Mb/Cell   # Cells
00116        240                13,502          803           119
42274        240                13,036          881           95
00116        360                14,324          998           56
42274        360                13,282          1,063         24
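To compare the two movie lengths across both instruments, the per-instrument yields in the table can be combined into a cell-weighted average. The sketch below uses only the table values; the function name `weighted_yield_by_movie` is hypothetical.

```python
from collections import defaultdict

# (instrument, movie_minutes, avg_roi_bp, roi_mb_per_cell, n_cells), copied from the table
RUNS = [
    ("00116", 240, 13502, 803, 119),
    ("42274", 240, 13036, 881, 95),
    ("00116", 360, 14324, 998, 56),
    ("42274", 360, 13282, 1063, 24),
]


def weighted_yield_by_movie(runs):
    """Cell-weighted mean ROI yield (Mb/cell) for each movie length."""
    total_mb = defaultdict(float)
    total_cells = defaultdict(int)
    for _instrument, minutes, _roi_bp, mb_per_cell, n_cells in runs:
        total_mb[minutes] += mb_per_cell * n_cells
        total_cells[minutes] += n_cells
    return {m: total_mb[m] / total_cells[m] for m in total_mb}


for minutes, mb in sorted(weighted_yield_by_movie(RUNS).items()):
    print(f"{minutes} min movies: {mb:.1f} Mb/cell")
```

Weighted this way, the 4 hour movies average about 838 Mb per cell and the 6 hour movies about 1,018 Mb per cell.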
• DNA Input and Sizing
• The library DNA >18 kb is fractionated using the Sage Science BluePippin
• DNA Damage Repair enzyme mix used post BluePippin (increased read length)
• Chemistry
• (+) DNA Damage Repair/4 hr bind: 970.2 Mbases/cell
• Instruments
• Longer average Reads
• Increased Loading Efficiency
• What about long term storage?
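The 970.2 Mbases/cell figure can be turned into a rough SMRT cell budget for a coverage target. This is a back-of-the-envelope sketch: the 3.1 Gb genome size is an assumption that does not appear in the slides, and the function name `cells_for_coverage` is hypothetical.

```python
import math

YIELD_MB_PER_CELL = 970.2  # from the slide: (+) DNA Damage Repair / 4 hr bind
GENOME_SIZE_MB = 3100.0    # ASSUMPTION: ~3.1 Gb human genome, not stated in the slides


def cells_for_coverage(coverage: float,
                       yield_mb_per_cell: float = YIELD_MB_PER_CELL,
                       genome_mb: float = GENOME_SIZE_MB) -> int:
    """SMRT cells needed to reach the requested fold coverage, rounded up."""
    return math.ceil(coverage * genome_mb / yield_mb_per_cell)


for target in (60, 100):
    print(f"{target}x coverage: ~{cells_for_coverage(target)} SMRT cells")
```

With the assumed genome size, this gives about 192 cells for 60x and about 320 cells for 100x. Real budgets would also need to allow for failed or low-yield cells.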
15. Reconfigured Oligo
- Uses inline index sequence
- No P5 index – HiSeq X single index compatible
10X Genomics Overview
16. 10X Chromium Workflow
WGX Beta Product Workflow
[Workflow diagram] Stages: gDNA Extraction; GEM Formation (Pre-GEMs, Post-GEMs); Library Prep; NGS; Long Ranger; Loupe
• gDNA Extraction: Qiagen MagAttract HMW Kit; Qubit quantifications; Dilute to 1 ng/µL; gDNA egram
• Pre-GEMs: Aliquot Master Mix; NaOH template denature; Load WGX Chip; Run instrument; Chip volume assessment; Instrument log
• Post-GEMs: Isothermal incubation; Emulsion breaking; Bioanalyzer
• Library Prep: End Repair/A-tailing; Adapter ligation; SI PCR; Bioanalyzer; KAPA qPCR
• NGS: HiSeq X or HiSeq 4000; 2x150 bp sequencing; 200 pmol loading
• Long Ranger: Demultiplexing; Alignment; De-duplication; SNP and indel calling; Large SV calling; Phasing
• Loupe: Visualization
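The gDNA extraction stage of this workflow includes a Qubit quantification followed by dilution to 1 ng/µL. A minimal sketch of that dilution arithmetic (C1·V1 = C2·V2) follows; the function name `dilution_volumes` and the example concentrations are hypothetical, and this is not the vendor protocol.

```python
TARGET_NG_PER_UL = 1.0  # target concentration from the workflow slide


def dilution_volumes(stock_ng_per_ul: float,
                     final_volume_ul: float,
                     target_ng_per_ul: float = TARGET_NG_PER_UL):
    """Return (stock_ul, diluent_ul) needed to reach the target concentration."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical example: a 50 ng/uL Qubit reading diluted into 100 uL total
stock_ul, diluent_ul = dilution_volumes(50.0, 100.0)
print(f"Mix {stock_ul:.1f} uL gDNA with {diluent_ul:.1f} uL diluent")
```

For high molecular weight DNA, very small pipetting volumes are hard to measure accurately, so a larger final volume or an intermediate dilution may be preferable.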
17. Chromium NA19240 Library Sequencing Statistics
• HiSeq 4000
• 2x150, 200 pmol loading
• 2 lanes
[Figure panels: Post Gem: Isothermal Amp size dist.; Library Size Distribution]
The spike at 0 in that graph is due to the N's in the reference assembly.
18. Chromium Molecule and Phasing Statistics
Metric                      NA19240 (MGI)       NA12878 (10X)
Molecule Length (bp)        26,768 (±33,673)    94,923 (±64,103)
DNA in Molecules >10 kb     50.85%              95.0%
DNA in Molecules >100 kb    1.38%               36.4%
SNPs Phased                 99.1%               97.8%
Longest Phase Block         9.6 Mbp             34.7 Mbp
N50 Phase Block             1.9 Mbp             9.5 Mbp
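The "N50 Phase Block" metric is the block length at which blocks of that size or larger contain at least half of all phased sequence. The sketch below shows the standard N50 calculation on made-up block lengths; it is not Long Ranger's code, and the example values are hypothetical.

```python
def n50(lengths):
    """Length L such that items of length >= L hold at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical phase block lengths in Mbp (not data from the slides)
blocks_mbp = [2, 3, 4, 5, 6]
print(f"N50 = {n50(blocks_mbp)} Mbp")
```

Because N50 weights by sequence content rather than by block count, a few very long blocks raise it much more than many short ones.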
22. Sequencing Plan
• PacBio Assembly Contig
• BioNano Genome Map Contigs
• Add 10X Linked Read information
• Add Dovetail Hi-C/chiCago Data
23. Summary
• Goal: Generate robust data sets for additional high-quality
reference genomes that represent the full range of genetic
diversity in humans.
• These long read (long range) sequencing/mapping
applications vary in approach and will provide synergistic
data sets to help accomplish our goal.
• Each system possesses unique challenges and requires
optimization of protocols and running conditions specific
to our needs.
• Experience and communication are key.
• Increasing applications and utility
• Polymerase read = read of insert
• BAC Pooling
• Low input SNV
• Multicolor labeling
24. Acknowledgements
The McDonnell Genome Institute at
Washington University in St. Louis
Rick Wilson
Sean McGrath
Amy Ly
Ryan Demeter
Dave Larson
Karyn Meltz Steinberg
Tina Graves
Bob Fulton
Derek Albracht
Milinn Kremitzki
Susan Rock
Debbie Scheer
Wes Warren
Chad Tomlinson
10X Genomics
Cassandra Jabara
Michael Schnall-Levin
Drew Kebbel
Rob Tarbox
Deanna Church
BioNano Genomics
Andrew Anfora
Palak Sheth
Alex Hastie
Pacific Biosciences
Paul Peluso
Nick Sisneros
Editor's Notes
For this project, we are generating ~60X-100X coverage of PacBio long-read data, based on Reads of Insert data.
The plan:
1. de novo assembly of PacBio data.
2. scaffold the assembly with BioNano data as well as Dovetail chiCago and 10X genomics linked read data sets.
With the combined data sets, we will begin to generate scaffolds and identify areas of potential misassembly.
3. We are also targeting difficult to assemble regions of the genome by sequencing BACs.
Once the BACs are incorporated, we plan to align all of this data to the Reference very stringently to produce chromosomal AGPs. The end product will be a very high quality whole genome assembly.
Our role in this project has mostly focused on larger-insert PacBio libraries, adding BioNano data sets, and early work with 10X Genomics.
Today, I will highlight our progress with these large molecule applications.
Highlight library: SMRTbell
Read types
Polymerase Reads
Subreads
Reads of Insert
Currently, we are enriching for SMRTbell libraries >18 kb.
The sloppy slide
For the NA19240 project, we generated eight libraries. Please note that when I say library, I mean that a 10-reaction kit was used to process 50 µg of DNA in 8-10 independent reactions, which were then pooled.
This graph illustrates the total number of SMRT cells for each of the libraries as well as the average ROI read length. Based on these values, we were able to consistently show inconsistent results, which required some tweaking of the process and many discussions with Nick and Paul at PacBio.
Based on these fruitful discussions, we were able to show a marked improvement, which will be highlighted in this section.
The genomic DNA TapeStation trace shows an overlay of the different shearing conditions used to generate the multiple NA19240 SMRTbell libraries.
The symbols represent the mode of each electropherogram.
The table on the right highlights the shearing method for 7 of the 8 libraries, a
Data accumulation to date for each of the four genomes we’ve.
The total number of cells is given by the number of 8-packs;
however, the table also shows the average ROI yield
in Mbp per cell.
Based on our modifications, we’ve transitioned the PacBio larger-insert library protocol into production,
and we are happy to report a positive trend with each new library: increased ROI data throughput per cell.
In addition to sequence-based methods, we are also using the BioNano Irys system to generate physical maps of the genome.
The advantage of the physical map is that it allows us to maintain order and orientation based on the nicking endonuclease recognition sites.
The slide illustrates the processing overview starting with a cell culture.
Another resource we have is a BioNano Genome Map of CHM1. BioNano is a nanochannel mapping technology in which very long DNA molecules are nicked, labeled, and run through a nanochannel. Here is an example of the CHM1 BioNano map aligned to a 1.5 Mb PacBio contig. On top, in green, is the PacBio contig. The lines indicate the in silico nick sites. The blue bars indicate the BioNano contigs. You can see how well they align. The BioNano data can be used as an independent source to assess the CHM1 assembly.
I want to acknowledge all of the collaborators on this project and all of the work that has gone into it thus far.