ASHG - GRC Workshop 
Tina Lindsay 
ASHG Oct 18, 2014
The Human Reference is Not Complete 
• The reference is not optimal in some regions
• Structural variation makes it difficult to assemble a truly 
representative genome when using a diploid sample 
• Some regions were recalcitrant to closure with technology 
and resources available at the time 
• Additional sequences are needed to capture the full range 
of diversity in humans
UGT2B17 – Conflicting Alleles 
NCBI36 NC_000004.10 (chr4) Tiling Path: AC074378.4, AC079749.5, AC147055.2, AC134921.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7
Genes: TMPRSS11E, TMPRSS11E2
Xue Y et al, 2008
GRCh37 NC_000004.11 (chr4) Tiling Path: AC074378.4, AC079749.5, AC147055.2, AC134921.1, AC093720.2, AC021146.7
Gene: TMPRSS11E
GRCh37: NT_167250.1 (UGT2B17 alternate locus): AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7
Gene: TMPRSS11E2
[Figure label: GAP]
Allelic Diversity vs. Segmental Duplication 
Diploid Genome
[Figure bases: A A C T C G C C]
Repeat Copies (noted by color difference)
Allelic Copies
With a diploid genome, there is significant ambiguity in sorting allelic copies from repeat copies
Haploid Genome
[Figure bases: A C C C]
Repeat Copies only (noted by color difference)
With a haploid genome, allelic differences are eliminated, and base differences are likely
indicative of repeat copies
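The ambiguity described on this slide can be illustrated with a small sketch. It is not taken from the talk: the function name `groupings` and the toy base strings are hypothetical. Given the bases observed at one aligned position across all copies of a duplicated region, it enumerates every distinct way to assign them to repeat copies. With two alleles per copy (diploid) several assignments fit the same observation; with one allele per copy (haploid) only one does.

```python
from itertools import combinations


def groupings(bases, ploidy):
    """Return all distinct ways to split observed bases into repeat copies.

    Each repeat copy holds `ploidy` alleles. Copies and alleles are
    unordered, so groupings that differ only in order are merged.
    (Hypothetical helper for illustration only.)
    """
    bases = sorted(bases)
    if ploidy < 1 or len(bases) % ploidy:
        raise ValueError("number of bases must be a multiple of ploidy")

    def split(rest):
        if not rest:
            return {()}
        first, others = rest[0], rest[1:]
        results = set()
        # Pick the partners that share a repeat copy with `first`.
        for idx in combinations(range(len(others)), ploidy - 1):
            group = tuple(sorted([first] + [others[i] for i in idx]))
            remaining = [b for i, b in enumerate(others) if i not in idx]
            for tail in split(remaining):
                results.add(tuple(sorted((group,) + tail)))
        return results

    return split(bases)


# Toy observations (assumed, not read from the slide figure).
diploid = groupings("AACT", ploidy=2)   # two repeat copies, two alleles each
haploid = groupings("ACCC", ploidy=1)   # four repeat copies, one allele each

print(len(diploid), "possible diploid assignments:", sorted(diploid))
print(len(haploid), "possible haploid assignment:", sorted(haploid))
```

In the diploid toy case the same four bases fit either A/A plus C/T or A/C plus A/T, which is the sorting problem the slide describes; in the haploid case every base difference has to be a difference between repeat copies.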
Hydatidiform mole 
1. Fertilization of an oocyte without a nucleus 
2. Post-zygotic diploidization of triploid zygotes 
[Figure labels: Oocyte; 23X; 23X 23X; ?; Androgenetic HM]
Initial Use Of CHM1 Source 
• CHORI-17 BAC Library 
• CHORI-17 BAC end sequences (n=325,659) 
• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs) 
• CHORI-17 BACs 
• > 750 have been sequenced 
• 590 of them in Genbank as phase 3
SRGAP2 Homology between genes 
Shows nearly identical segments between SRGAP2A and SRGAP2 paralogs 
Shows homology between SRGAP2B and SRGAP2C 
SRGAP2A 
SRGAP2B 
SRGAP2C 
Dennis et al., 2012
1q21 patch alignment to chromosome 1
[Figure labels: 1q21; 1q32; 1q21; 1p21]
IGH Region Highlights Allelic Differences 
Watson et al., 2013
Williams-Beuren Syndrome region 
Slide courtesy of Megan Dennis
Current status of CHM1 resources 
• CHORI-17 BAC Library (created from CHM1 cell line) 
• CHORI-17 BAC end sequences (n=325,659) 
• CHORI-17 multiple enzyme fingerprint map (1560 fpc contigs) 
• CHORI-17 BACs (>750 have been sequenced, with 592 of them in 
Genbank as phase 3) 
• Active cell line 
• >100X coverage Illumina 100 bp reads
• 300 bp, 500 bp, and 3 kb inserts
• Reference-assisted assembly CHM1_1.1
• BioNano genome map 
• >50X coverage of PacBio long read data
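As a rough guide to what the coverage figures in this list imply, mean coverage is usually estimated as (number of reads × read length) / genome size. The sketch below applies that relation; the function names are hypothetical, and using the CHM1_1.1 total sequence length (3,037,866,619 bp, gaps included) as the genome size is an assumption made only for illustration.

```python
def mean_coverage(n_reads: int, read_bp: int, genome_bp: int) -> float:
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_bp / genome_bp


def reads_for_coverage(coverage: int, genome_bp: int, read_bp: int) -> int:
    """Smallest number of reads that reaches the requested mean coverage."""
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (coverage * genome_bp + read_bp - 1) // read_bp


GENOME_BP = 3_037_866_619  # assumed genome size (CHM1_1.1 total sequence length)

reads = reads_for_coverage(100, GENOME_BP, 100)
print(f"{reads:,} reads of 100 bp give about "
      f"{mean_coverage(reads, 100, GENOME_BP):.0f}X coverage")
```

Under these assumptions, 100X coverage with 100 bp reads needs as many reads as there are bases in the genome, on the order of three billion.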
CHM1_1.1 Assembly 
• Reference-guided assembly – SRPRISM v2.3, R. Agarwala 
• Alignment of Illumina reads to GRCh37 primary assembly 
• CHORI-17 BAC clone tilepaths were then incorporated 
• 428 total clones 
• 324 clones in 45 tilepaths 
• 104 clones as singletons 
• Comparison back to GRCh37 reference to provide appropriate gap sizes
• Assembly submitted to Genbank 
• http://www.ncbi.nlm.nih.gov/assembly/GCF_000306695.2 
• Paper to be published soon 
• Genome Research (in press) 
• bioRxiv doi: http://dx.doi.org/10.1101/006841
CHM1_1.1 Assembly 
Total Sequence Length 3,037,866,619 bp 
Total Assembly Gap Length 210,229,812 bp 
Number of Scaffolds 163 
Scaffold N50 50,362,920 bp 
Number of Contigs 40,828 
Contig N50 143,936 bp 
CHM1_1.1 
GRCh37
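For readers unfamiliar with the statistics on this slide: N50 is the length L such that sequences of length L or longer contain at least half of the total assembled bases. A minimal sketch of that definition follows; the function name `n50` and the toy lengths are hypothetical, and only the two totals are taken from the slide.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (assumed values, not CHM1_1.1 data).
toy_contigs = [2, 3, 4, 5, 6]
print("toy N50:", n50(toy_contigs))

# Totals reported on the slide.
TOTAL_SEQUENCE_BP = 3_037_866_619
TOTAL_GAP_BP = 210_229_812
ungapped_bp = TOTAL_SEQUENCE_BP - TOTAL_GAP_BP
print(f"ungapped length: {ungapped_bp:,} bp "
      f"({TOTAL_GAP_BP / TOTAL_SEQUENCE_BP:.1%} of the total is gap)")
```

Subtracting the reported gap length from the reported total gives 2,827,636,807 bp of actual sequence, so roughly 7% of the CHM1_1.1 total length is gap.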
Incorporation of CHM1_1.1 Assembly Data in GRCh38
PacBio CHM1 Assembly potentially fills GRCh38 Gaps 
GRCh38 
PacBio CHM1
PacBio CHM1 Assembly Shows Data Not in GRCh38
GRCh38 
PacBio CHM1 
Second Pass Alignment
CHM1 BioNano Genome Map Aligned to GRCh38 
GRCh38 
CHM1 BioNano Map 
~15kb additional data
BioNano SV Calls Identified Assembly Problems
Collapse 
Expansion 
in Assembly 
CHM1_1.1 Assembly Gap in Sequence 
CHM1 BioNano Map
Collapse in Sequence Data 
Thought to be missing ~100kb in sequenced clones 
GRCh38
Gap Sizing 
Chr8 – Stalled Gap 
Estimated at ~150kb 
GRCh38 
Sized using CHM1 Genome Map: >500 kb
Future of CHM1 Assembly 
• Plan to make as contiguous and accurate as possible 
• Incorporate PacBio assembly where possible 
• Additional CH17 clones being sequenced through 
segmentally duplicated and structurally variant regions to 
provide local assembly benefits (isolates the repeats)
CYP2D6 – Providing Alternate Alleles 
ABC7 
(NA18517) 
ABC8 
(NA18507) 
ABC9 
(NA18956) 
ABC11 
(NA18555)
Future Directions 
• Continued Improvement on CHM1 Genome 
• Integration of Pacific Biosciences whole genome assembly
• BioNano genome map data 
• Continue to add diversity to the reference by sequencing
new samples that provide diversity beyond what is
currently represented in GRCh38
• Continued sequencing of CH17 single haplotype BAC 
tilepaths to better represent segmentally duplicated 
regions 
• Additional collaborations with the community to develop 
tools to more fully utilize the full reference assembly 
(alternate haplotypes)
Acknowledgements 
The Genome Institute at Washington 
University in St. Louis 
Rick Wilson 
Bob Fulton 
Wes Warren 
Karyn Meltz Steinberg 
Vince Magrini 
Derek Albracht 
Milinn Kremitzki 
Susan Rock 
Debbie Scheer 
Aye Wollam 
The Finishing and Bioinformatics Teams 
at The Genome Institute 
University of Washington 
Evan Eichler 
Megan Dennis 
Xander Nuttle
NCBI 
Richa Agarwala
Valerie Schneider 
University of Pittsburgh 
School of Medicine (CHM1 cell line) 
Urvashi Surti 
Personalis 
Deanna Church 
BioNano Genomics 
Pacific Biosciences 
UCSF 
Pui-Yan Kwok 
Yvonne Lai 
Chin Lin 
CHORI
Catherine Chu
Pieter de Jong