Telomere-to-telomere assembly of a
complete human chromosome
Karen Miga
UC Davis Genetics Seminar
Sept 30, 2019
@khmiga
New Era in Genetics and Genomics
We are finally reaching complete, high-quality
telomere-to-telomere chromosome assemblies
Human reference genome is incomplete.
• 368 unresolved issues, 102 gaps
• Segmental duplications, gene families, satellite
arrays, centromeres, rDNAs
• Uncharacterized sequence variation in the human
population
Our current understanding of
genome biology and function: 30 Mb
chr21
~20 Mb ?
Challenge:
Generating assemblies across repetitive regions that
span hundreds of kilobases.
Figure: Repeats (100 kb+), with two unique variants marked.
Can high-coverage ultra-long sequencing resolve
complete assemblies of the human genome?
MinION: 100 kb+
It’s time to finish the human genome
The Telomere-to-Telomere (T2T) consortium is an
open, community-based effort to generate the
first complete assembly of a human genome.
Our target: CHM13hTERT
Cell line from Urvashi Surti, Pitt; SKY karyotype from Jennifer Gerton and Tamara Potapova, Stowers
N=46; XX
Intramural Sequencing Center
CHM13 Sequencing
94 MinION/GridION flow cells
11.1M reads
155 Gb (1.6 Gb / flow cell) (50x)
99 Gb in reads >50 kb (32x)
78 Gb in reads >70 kb (25x)
Max mapped read length 1.04 Mb
From May 1, 2018 to Jan 8, 2019
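The fold-coverage figures in parentheses follow from dividing the sequenced bases by the genome size. A minimal sketch of that arithmetic, assuming a genome size of about 3.1 Gb (an assumption; the slide does not state the value used):

```python
# Coverage = total sequenced bases / genome size.
# GENOME_SIZE_GB is an assumed value (~3.1 Gb); the slides do not state it.
GENOME_SIZE_GB = 3.1

def coverage(total_gb, genome_gb=GENOME_SIZE_GB):
    """Return approximate fold coverage for a yield given in gigabases."""
    return total_gb / genome_gb

# Yields copied from the slide.
yields_gb = {"all reads": 155, "reads >50 kb": 99, "reads >70 kb": 78}
for label, gb in yields_gb.items():
    print(f"{label}: {gb} Gb is about {coverage(gb):.0f}x")

# Mean yield per flow cell across the 94 flow cells on the slide.
print(f"per flow cell: {155 / 94:.2f} Gb")
```

Under that assumed genome size the three yields round to the 50x, 32x, and 25x shown on the slide.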
• 50x Nanopore ultra-long: contig building
• 60x PacBio: polishing
• 50x 10x Genomics: polishing
• BioNano: structural validation
• 2.94 Gbp assembly NG50: 75 Mbp
• Exceeds the continuity of the reference
genome GRCh38 (56 Mbp NG50
contig size).
• Subset of chromosome assemblies
break only at centromere.
Roadmap for completing the genome
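For readers unfamiliar with the NG50 statistic quoted above, here is a minimal sketch of how it can be computed. The function name `ng50` and the toy contig lengths are illustrative assumptions, not data or code from the talk:

```python
def ng50(contig_lengths, genome_size):
    """Length of the contig at which the running total, taken over contigs
    sorted longest first, reaches half of the genome size."""
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= genome_size / 2:
            return length
    return 0  # the assembly covers less than half of the genome

# Toy contig lengths in Mbp (hypothetical, not CHM13 data).
toy_contigs = [40, 30, 20, 10]
print(ng50(toy_contigs, genome_size=120))
```

Unlike N50, which uses the assembly's own total length, NG50 is measured against a fixed genome size, so assemblies of different total size can be compared on the same footing.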
Canu
Orthogonal Validation
Jo and Valerie
2.2 - 3.7 Mb
mean of 3010 kb (S.D. = 429; n = 49)
Figure: structural variant (SV) positions.
Assemble contigs using overlapping SV patterns
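One way to picture joining contigs by overlapping SV patterns is a suffix-prefix match over ordered SV labels. This is a toy sketch only: the label encoding, the function names, and the `min_len` threshold are assumptions made for illustration, not the method used for the XCEN assembly.

```python
def suffix_prefix_overlap(left, right, min_len=2):
    """Longest k such that the last k labels of `left` equal the first k of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0

def merge(left, right, min_len=2):
    """Join two label sequences on their overlap, or return None if none is found."""
    k = suffix_prefix_overlap(left, right, min_len)
    return left + right[k:] if k else None

# Hypothetical contigs, each reduced to the ordered SV labels it spans.
contig_a = ["A", "B", "A", "C"]
contig_b = ["A", "C", "D", "B"]
print(merge(contig_a, contig_b))
```

Requiring a minimum overlap of several labels reduces the chance that a pattern shared by chance within a repeat array produces a false join.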
Scaffold Assembly of XCEN (figure arms labeled Xq and Xp)
Rel3 Assembly: ~3.1 Mb
The assembly is a hypothesis(!)
Beth Sullivan, Jennifer Gerton, Edmund Howe
@NanoporeConf | #NanoporeConf
Marker-assisted mapping
Adam Phillippy, Arang Rhie, Sergey Koren
Create a scaffold of unique, or
single-copy, k-mers genome-wide
Anchor high-confidence
long-read alignments to
repeat assemblies
Confident mapping of long reads
using a single-copy k-mer strategy
Identify and mark all sites of unique anchors across the chromosome
chrX
• 21-mers that appear ~c times in Illumina data
• Also found in PacBio/Nanopore reads
• Less frequent in the centromere, but still there
• (Validated with Duplex-Seq)
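The selection of k-mers by their count in short-read data can be sketched as follows. This is a simplified illustration: `c` is read here as the expected single-copy depth, the `tolerance` window is a made-up parameter, and reverse complements are ignored, none of which is specified on the slide.

```python
from collections import Counter

def count_kmers(reads, k=21):
    """Count every k-mer across a collection of read strings."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def single_copy_kmers(counts, c, tolerance=0.5):
    """Keep k-mers whose count lies between c*(1 - tolerance) and c*(1 + tolerance).
    `tolerance` is an illustrative parameter, not a value from the talk."""
    low, high = c * (1 - tolerance), c * (1 + tolerance)
    return {kmer for kmer, n in counts.items() if low <= n <= high}

# Toy data with k=4 for readability (the slide uses 21-mers).
reads = ["ACGTACGTTT", "ACGTACGTTT", "CGTTTGGA", "CGTTTGGA"]
counts = count_kmers(reads, k=4)
print(sorted(single_copy_kmers(counts, c=2)))
```

K-mers seen far more often than `c` are likely to come from repeats, and those seen far less often are likely to be sequencing errors, so both are excluded.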
Filter long-read alignments, retaining those anchored by unique k-mers
chrX
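The filtering step can be sketched as keeping only alignments in which the read and the assembly window it maps to share a single-copy k-mer. This is a simplified stand-in: real pipelines work on alignment records and marker positions, and the names `anchored`, `filter_alignments`, and `min_anchors` are hypothetical.

```python
def kmer_set(seq, k):
    """All k-mers of a sequence as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def anchored(read_seq, ref_window, unique_kmers, k=21, min_anchors=1):
    """True if the read and the reference window it aligns to share at least
    `min_anchors` single-copy k-mers."""
    shared = kmer_set(read_seq, k) & kmer_set(ref_window, k) & unique_kmers
    return len(shared) >= min_anchors

def filter_alignments(alignments, unique_kmers, k=21, min_anchors=1):
    """Keep alignments, given as (read_seq, ref_window) pairs, that are anchored."""
    return [a for a in alignments
            if anchored(a[0], a[1], unique_kmers, k, min_anchors)]

# Toy example with k=4: only the first alignment contains the unique marker.
unique = {"GGAT"}
alignments = [("ACGGATCC", "TTGGATAA"), ("ACACACAC", "ACACACAC")]
print(filter_alignments(alignments, unique, k=4))
```

The second pair aligns perfectly but carries no marker, which is the situation inside a repeat array where a perfect match does not prove correct placement.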
Spacing of single-copy k-mers can be irregular in
repeat-dense regions
chrX, X centromere array (CENX): 3.1 Mbp
Number of k-mers: 2,034
Spacing N50: 6,879 bp
Longest distance between k-mers: 53,798 bp
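Spacing statistics like these can be derived from a sorted list of marker positions. A minimal sketch, assuming "spacing N50" means the gap length at which gaps, taken longest first, account for half of the total spanned distance (the slide does not define it); the positions are toy values:

```python
def spacing_stats(positions):
    """Return (spacing N50, longest gap) for marker positions on one sequence."""
    ordered = sorted(positions)
    gaps = sorted((b - a for a, b in zip(ordered, ordered[1:])), reverse=True)
    if not gaps:
        return 0, 0
    half, running = sum(gaps) / 2, 0
    for gap in gaps:
        running += gap
        if running >= half:
            return gap, gaps[0]

# Hypothetical marker positions in bp (not the CENX markers).
n50, longest = spacing_stats([0, 10, 30, 60, 100])
print(f"spacing N50 = {n50} bp, longest gap = {longest} bp")
```

The longest gap matters most for mapping: a read has to be longer than the gap it falls in to reach a marker on either side.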
• Unique k-mer-based filtering of Nanopore reads: nanopolish (two rounds)
• Unique k-mer-based filtering of PacBio (CLR) reads: arrow (two rounds)
• 10XG polishing: longranger + freebayes (two rounds)
chrX
GAGE pre-polishing
ChrX GAGE array: 19 tandemly arrayed ~9.4 kb repeats
Figure: coverage (0 to 250) by base position, showing the most frequent base and the second most frequent base (error).
GAGE with marker-assisted polishing
ChrX GAGE array: 19 tandemly arrayed ~9.4 kb repeats
Figure: coverage (0 to 250) by base position, showing the most frequent base and the second most frequent base (error).
CCS/HiFi Evaluation
HiFi Alignments to Evaluate Polishing
chrX, CENTROMERE X (DXZ1: 3.1 Mb): BEFORE POLISHING
chrX, CENTROMERE X (DXZ1: 3.1 Mb): AFTER POLISHING
NOTE: Underlying satellite array structure remains the same.
Opens the whole genome to analysis
Ariel Gershman
Winston Timp’s
Laboratory
1. Structurally validated assembly from telomere to telomere, including a
3.1 Mb tandem repeat at the X centromere and providing a complete
assessment across tandemly repeated gene families.
2. Novel polishing strategy capable of improving the quality of large
repeat-rich regions, demonstrating dramatic improvements in quality over
the entirety of the X chromosome.
3. Statistics of CHM13 full-length BAC alignments to polished assembly
(Vollger M, Logsdon G, et al. bioRxiv doi.org/10.1101/635037):
275/341 (81%) QV 37.4 QV 27.9
153/341 (45%) QV 37.7 QV 27.4
Column labels: Mean, Median, BACs Aligned
Row labels: HiFi, UL-asm
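QV values are on the Phred scale, where QV = -10 * log10(error probability), so a difference of 10 corresponds to a tenfold change in error rate. A short sketch that converts the QV values quoted on the slide into approximate error rates (the helper names are illustrative):

```python
import math

def qv_to_error_rate(qv):
    """Phred scale: QV = -10 * log10(error probability)."""
    return 10 ** (-qv / 10)

def error_rate_to_qv(p):
    """Inverse conversion, from error probability back to QV."""
    return -10 * math.log10(p)

# QV values copied from the slide.
for qv in (27.4, 27.9, 37.4, 37.7):
    p = qv_to_error_rate(qv)
    print(f"QV {qv}: about 1 error per {1 / p:,.0f} bases")
```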
Finished T2T X Chromosome:
High Accuracy and High Continuity
It is time to finish the
human genome
• github.com/nanopore-wgs-consortium/chm13
• 120x Nanopore reads: NHGRI, UW, Nottingham, UC Davis (PromethION, Megan Dennis)
• 50x 10x Genomics linked reads (NHGRI)
• 70x PacBio CLR reads (WashU)
• 24x PacBio HiFi reads (UW)
• 40x Hi-C (Arima Genomics)
• BioNano optical map (WashU)
• Unpolished Canu assemblies
NEW! Rel3 open data release
Additional ultra-long ONT data from Glennis Logsdon (UW): 13.9X coverage
Read length    Coverage    Percent of data
>50 kbp        12X         86%
>100 kbp       9.1X        66%
>150 kbp       6.8X        49%
>200 kbp       4.9X        35%
>250 kbp       3.4X        24%
Read length (kbp): N50 = 147.1, N1 = 649.6, Max = 1538.3
Figure: histogram of number of reads (0 to 20,000) by read length (kbp, axis ticks from 0.1 to 10,000).
• github.com/nanopore-wgs-consortium/chm13
• Minimal change in continuity
  • 79.5 Mbp (rel2) vs. 71.8 Mbp (rel3) NG50
  • Don’t judge assemblies based on continuity
• Tricky regions are fixed
  • GAGE and more SegDups automatically resolved
• Improved BAC validation
  • 288 (rel2) vs. 310 (rel3) of 341 BACs resolved
• 1 chromosome down, 23 to go…
Triple the coverage, what changed?
Goal of a complete human genome in the next two
years.
Challenges in front of us:
• Acrocentric p-arms
• Large segmental duplications
• Classical human satellites 2 and 3
Establishing new benchmarking standards (chrX)
Pioneering new pipelines: polishing, repeat assembly, and array
structural validation.
Setting the bar higher for quality and completeness.
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 

Telomere-to-telomere assembly of a complete human chromosome

  • 1. Telomere-to-telomere assembly of a complete human chromosome. Karen Miga, UC Davis Genetics Seminar, Sept 30, 2019. @khmiga
  • 2. New Era in Genetics and Genomics: We are finally reaching complete, high-quality telomere-to-telomere chromosome assemblies.
  • 6. New Era in Genetics and Genomics: We are finally reaching complete, high-quality telomere-to-telomere chromosome assemblies.
      Human reference genome is incomplete.
      • 368 unresolved issues, 102 gaps
      • Segmental duplications, gene families, satellite arrays, centromeres, rDNAs
      • Uncharacterized sequence variation in the human population
      chr21: our current understanding of genome biology and function, 30 Mb; ~20 Mb ?
  • 7. Challenge: Generating assemblies across repetitive regions that span hundreds of kilobases. [Figure labels: repeats (100 kb+) flanked by unique variants.] Can high-coverage ultra-long sequencing resolve complete assemblies of the human genome?
  • 9. It’s time to finish the human genome. The Telomere-to-Telomere (T2T) consortium is an open, community-based effort to generate the first complete assembly of a human genome.
  • 10. Our target: CHM13hTERT. Cell line from Urvashi Surti, Pitt; SKY karyotype from Jennifer Gerton and Tamara Potapova, Stowers. N=46; XX
  • 13. CHM13 Sequencing (Intramural Sequencing Center), from May 1/18 – Jan 8/19
      • 94 MinION/GridION flow cells
      • 11.1M reads
      • 155 Gb (1.6 Gb / flow cell) (50x)
      • 99 Gb in reads >50 kb (32x)
      • 78 Gb in reads >70 kb (25x)
      • Max mapped read length 1.04 Mb
      Data types and their roles:
      • 50x Nanopore ultra-long: contig building
      • 60x PacBio: polishing
      • 50x 10x Genomics: polishing
      • BioNano: structural validation
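The fold-coverage figures on this slide can be reproduced with simple arithmetic. The sketch below is only illustrative: it assumes a genome size of about 3.1 Gb, a value the slide does not state, and the name `fold_coverage` is hypothetical.

```python
# Assumption (not on the slide): a human genome size of ~3.1 Gb.
GENOME_SIZE_GB = 3.1


def fold_coverage(total_gb: float, genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Return sequencing depth as total sequenced bases divided by genome size."""
    return total_gb / genome_size_gb


# Yields quoted on the slide, in gigabases.
yields_gb = {"all reads": 155, "reads >50 kb": 99, "reads >70 kb": 78}

# Round to whole-number coverage, as the slide reports it.
coverage = {label: round(fold_coverage(gb)) for label, gb in yields_gb.items()}

for label, depth in coverage.items():
    print(f"{label}: ~{depth}x")
```

Under that assumed genome size, the rounded values agree with the 50x, 32x, and 25x shown on the slide.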
  • 14. Roadmap for completing the genome (Canu)
      • 2.94 Gbp assembly; NG50: 75 Mbp
      • Exceeds the continuity of the reference genome GRCh38 (56 Mbp NG50 contig size).
      • Subset of chromosome assemblies break only at centromere.
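For readers unfamiliar with the NG50 statistic quoted above, here is a minimal sketch of the usual definition: the length of the contig at which the cumulative length of contigs, sorted longest first, reaches half of the genome size (rather than half of the assembly size, as in N50). The function name and toy values are illustrative, not taken from the talk.

```python
from typing import Iterable


def ng50(contig_lengths: Iterable[int], genome_size: int) -> int:
    """Return the NG50 of an assembly, or 0 if it spans less than half the genome."""
    half_genome = genome_size / 2
    running_total = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running_total += length
        if running_total >= half_genome:
            return length
    return 0  # the assembly never reaches half of the genome size


# Toy example: five contigs against a genome of size 200.
print(ng50([50, 40, 30, 20, 10], genome_size=200))
```

Because NG50 is normalised by genome size, it allows assemblies of different total length to be compared against the same reference point.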
  • 19. 2.2–3.7 Mb; mean of 3,010 kb (S.D. = 429; n = 49)
  • 21. Structural variants: assemble contigs using overlapping SV patterns. [Figure labels omitted.]
  • 23. Rel3 assembly (Xp to Xq): ~3.1 Mb. The assembly is a hypothesis(!)
  • 24. Rel3 assembly: ~3.1 Mb. Beth Sullivan, Jennifer Gerton, Edmund Howe. [Figure labels omitted.]
  • 25. Marker-assisted mapping. Adam Phillippy, Arang Rhie, Sergey Koren. @NanoporeConf | #NanoporeConf
  • 26. Marker-assisted mapping: create a scaffold of unique, or single-copy, k-mers genome-wide.
  • 27. Marker-assisted mapping: anchor high-confidence long-read alignments to repeat assemblies.
  • 28. Confident mapping of long reads using a single-copy k-mer strategy
      Identify and mark all sites of unique anchors across the chromosome (chrX):
      • 21-mers that appear ~c times in Illumina data
      • Also found in PacBio/Nanopore reads
      • Less frequent in the centromere, but still there
      • (Validated with Duplex-Seq)
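The marker-selection idea on this slide can be sketched as follows. This is a simplified illustration, not the consortium's pipeline: it assumes that c denotes the sequencing coverage (the slide does not define it), counts canonical k-mers in memory, and uses a hypothetical `tolerance` window around c. Real data sets would need a dedicated k-mer counter.

```python
from collections import Counter
from typing import Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads: Iterable[str], k: int = 21) -> Counter:
    """Count canonical k-mers across reads, skipping k-mers with non-ACGT bases."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def single_copy_kmers(counts: Counter, coverage: float, tolerance: float = 0.5) -> Set[str]:
    """Keep k-mers whose count lies within +/- tolerance of the expected coverage.

    The tolerance window is an assumption made for this sketch.
    """
    low = coverage * (1 - tolerance)
    high = coverage * (1 + tolerance)
    return {kmer for kmer, n in counts.items() if low <= n <= high}
```

A k-mer seen roughly c times is likely to occur once in the genome, whereas repeat-derived k-mers appear at multiples of c and are excluded by the upper bound.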
  • 29. Confident mapping of long reads using a single-copy k-mer strategy
      Filter long-read alignments, retaining those with unique k-mer anchoring (chrX).
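One way to picture the filtering step is shown below. This is a hedged sketch under several assumptions that are not from the talk: the `Alignment` record, the `min_markers` threshold, and the exact-substring test are all hypothetical, and each read sequence is assumed to be already oriented to the assembly strand.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alignment:
    read_name: str
    read_seq: str    # read sequence, oriented to the assembly strand (assumption)
    ref_start: int   # 0-based, inclusive
    ref_end: int     # 0-based, exclusive


def anchored_markers(aln: Alignment, positions: List[int],
                     markers_by_pos: Dict[int, str], k: int) -> int:
    """Count markers inside the aligned interval that also occur in the read."""
    # Only markers whose full k-mer fits within [ref_start, ref_end) are considered.
    lo = bisect_left(positions, aln.ref_start)
    hi = bisect_right(positions, aln.ref_end - k)
    return sum(1 for pos in positions[lo:hi] if markers_by_pos[pos] in aln.read_seq)


def filter_alignments(alignments: List[Alignment], markers_by_pos: Dict[int, str],
                      k: int, min_markers: int = 1) -> List[Alignment]:
    """Retain alignments supported by at least min_markers single-copy k-mers."""
    positions = sorted(markers_by_pos)
    return [aln for aln in alignments
            if anchored_markers(aln, positions, markers_by_pos, k) >= min_markers]


# Toy example with 3-mers as markers at assembly positions 2 and 10.
markers = {2: "ACG", 10: "TTA"}
reads = [
    Alignment("a", "GGACGTT", 0, 7),   # carries the marker at position 2
    Alignment("b", "GGAAGTT", 0, 7),   # overlaps the marker site but lacks the k-mer
    Alignment("c", "CCCC", 20, 24),    # overlaps no marker site
]
kept = filter_alignments(reads, markers, k=3)
print([aln.read_name for aln in kept])
```

Alignments that cover a marker site without carrying the marker k-mer are the ones most likely to be misplaced among near-identical repeat copies, which is why they are dropped here.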
  • 30. Spacing of single-copy k-mers can be irregular in repeat-dense regions (chrX centromere array)
      • CENX: 3.1 Mbp
      • Number of k-mers: 2,034
      • Spacing N50: 6,879
      • Longest distance between k-mers: 53,798 bp
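The spacing statistics on this slide could be computed along the following lines. The sketch assumes that "Spacing N50" is the N50 of the gaps between consecutive marker positions; the slide does not spell out its definition, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def spacing_stats(marker_positions: Iterable[int]) -> Tuple[int, int]:
    """Return (spacing N50, longest gap) for a set of marker positions.

    Spacing N50 is taken here to be the gap length at which gaps, sorted
    longest first, account for at least half of the total gap length.
    """
    ordered = sorted(marker_positions)
    gaps = sorted((b - a for a, b in zip(ordered, ordered[1:])), reverse=True)
    if not gaps:
        return 0, 0  # fewer than two markers: no spacing to report
    half_total = sum(gaps) / 2
    running_total = 0
    for gap in gaps:
        running_total += gap
        if running_total >= half_total:
            return gap, gaps[0]
    return gaps[-1], gaps[0]  # not reached; kept for completeness


# Toy example: gaps of 10, 5, 30, and 5 between five markers.
print(spacing_stats([0, 10, 15, 45, 50]))
```

A large longest gap relative to typical read length indicates a stretch where few reads can be anchored by a marker.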
  • 31. Polishing chrX
      • Unique k-mer-based filtering of Nanopore reads: nanopolish (two rounds)
      • Unique k-mer-based filtering of PacBio (CLR) reads: arrow (two rounds)
      • 10XG: longranger + freebayes (two rounds)
  • 32. GAGE pre-polishing. ChrX GAGE array: 19 tandemly arrayed ~9.4 kb repeats. [Plot: coverage (0–250) by base position, showing the most frequent base and the second most frequent base (error).]
  • 33. GAGE with marker-assisted polishing. ChrX GAGE array: 19 tandemly arrayed ~9.4 kb repeats. [Plot: coverage (0–250) by base position, showing the most frequent base and the second most frequent base (error).]
  • 34. CCS/HiFi evaluation (chrX): HiFi alignments to evaluate polishing. Centromere X (DXZ1: 3.1 Mb), before polishing.
  • 35. CCS/HiFi evaluation (chrX): HiFi alignments to evaluate polishing. Centromere X (DXZ1: 3.1 Mb), after polishing. NOTE: Underlying satellite array structure remains the same.
  • 36. Opens the whole genome to analysis. Ariel Gershman, Winston Timp’s Laboratory
  • 42. Finished T2T X Chromosome: High Accuracy and High Continuity
      1. Structurally validated assembly from telomere-to-telomere, including the 3.1 Mb tandem repeat at the X centromere and providing a complete assessment across tandemly repeated gene families.
      2. Novel polishing strategy capable of improving the quality of large repeat-rich regions, demonstrating dramatic improvements in quality over the entirety of the X chromosome.
      3. Statistics of CHM13 full-length BAC alignments to the polished assembly (Vollger M, Logsdon G, et al. bioRxiv doi.org/10.1101/635037):
          • UL-asm: 275/341 BACs aligned (81%); median QV 37.4; mean QV 27.9
          • HiFi: 153/341 BACs aligned (45%); median QV 37.7; mean QV 27.4
  • 43. It is time to finish the human genome
  • 44. NEW! Rel3 open data release
      • github.com/nanopore-wgs-consortium/chm13
      • 120x Nanopore reads
          • NHGRI, UW, Nottingham
          • UC Davis (PromethION, Megan Dennis)
      • 50x 10x Genomics linked reads (NHGRI)
      • 70x PacBio CLR reads (WashU)
      • 24x PacBio HiFi reads (UW)
      • 40x Hi-C (Arima Genomics)
      • BioNano optical map (WashU)
      • Unpolished Canu assemblies
  • 45. Additional ultra-long ONT data from Glennis Logsdon (UW): 13.9X coverage
      Read length   Coverage   Percent of data
      >50 kbp       12X        86%
      >100 kbp      9.1X       66%
      >150 kbp      6.8X       49%
      >200 kbp      4.9X       35%
      >250 kbp      3.4X       24%
      [Histogram: number of reads by read length (kbp). N50 = 147.1; N1 = 649.6; Max = 1538.3]
      • github.com/nanopore-wgs-consortium/chm13
  • 46. Triple the coverage, what changed?
      • Minimal change in continuity
          • 79.5 Mbp (rel2) vs. 71.8 Mbp (rel3) NG50
          • Don’t judge assemblies based on continuity
      • Tricky regions are fixed
          • GAGE and more SegDups automatically resolved
      • Improved BAC validation
          • 288 (rel2) vs. 310 (rel3) of 341 BACs resolved
      • 1 chromosome down, 23 to go…
  • 47. Goal of a complete human genome in the next two years.
      Challenges in front of us:
          • Acrocentric p-arms
          • Large segmental duplications
          • Classical human satellites 2, 3
      Establishing new benchmarking standards (XChr).
      Pioneering new pipelines: polishing, repeat assembly, and array structural validation.
      Setting the bar higher for quality and completeness.

Editor's notes

  1. KEY POINT HERE: spacing of unique variants… Some regions are easier than others….
  2. Number of k-mers: 2,034 Spacing N50: 6,879 Longest distance: 53,798 bp
  3. Median BAC QV 37.4 (mean QV 28.0) vs. median QV 37.6 (mean QV 27.4) for the best CHM13 HiFi assembly. And resolve 85% of BACs at >99.8% identity vs. 54% for prior PacBio assembly.
     Total BACs: 341
     Compressed: 166 1 Median: 99.9895 QV: 39.78811 Mean: 99.8706 QV: 28.88052
     Mitchell HiFi: 153 1 Median: 99.9827 QV: 37.61954 Mean: 99.81871 QV: 27.41627
     UL + 10x: 275 1 Median: 99.982 QV: 37.44727 Mean: 99.84145 QV: 27.99832
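The identity and QV pairs in the BAC statistics above are consistent with the standard Phred-scale relation QV = -10 · log10(1 − identity). The short sketch below applies that relation to the percent identities quoted in the notes; the function name is illustrative.

```python
import math


def identity_to_qv(identity_percent: float) -> float:
    """Convert percent identity to a Phred-scaled quality value (QV)."""
    error_rate = 1.0 - identity_percent / 100.0
    if error_rate <= 0.0:
        return math.inf  # a perfect match has no finite QV
    return -10.0 * math.log10(error_rate)


# Percent identities quoted in the notes.
for identity in (99.9895, 99.8706, 99.9827):
    print(f"{identity}% identity -> QV {identity_to_qv(identity):.2f}")
```

Because the scale is logarithmic, the mean QV computed from mean identity is pulled down by a small number of poorly aligned BACs, which may explain the gap between the median and mean values reported.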