Keep Calm and Carry on Sequencing
Deanna M. Church
Staff Scientist, NCBI
@deannachurch
http://genomereference.org
Valerie Schneider, NCBI
Photograph: Paul Popper/Popperfoto/Getty Images
Church sfaf13
GRCh38 is coming
(September, 2013)
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
[Chart: Contig N50 for GRCh37.p12, CHM1.0, HuRef, HsapALLPATHS1 and YH1, with a second panel for CHM1.0, HuRef, HsapALLPATHS1 and YH1 only]
[Chart: Number of contigs for GRCh37.p12, CHM1.0, HuRef, HsapALLPATHS1 and YH1, with a second panel for GRCh37.p12, CHM1.0, HuRef and HsapALLPATHS1]
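Contig N50 and contig count are the usual contiguity summaries for comparing assemblies such as these. A minimal Python sketch of the standard N50 definition (the length L such that contigs of length L or longer hold at least half of all assembled bases); the function name and the toy lengths are illustrative, not data from the slide.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Contig N50: the length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("N50 of an empty assembly is undefined")
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return lengths[-1]  # not reached for non-negative lengths


toy_contigs = [100, 80, 50, 30, 20, 10]  # hypothetical contig lengths in bp
print("N50:", n50(toy_contigs), "contigs:", len(toy_contigs))
```

For the toy list, the two longest contigs (100 + 80 = 180 bp) pass half of the 290 bp total, so the N50 is 80 bp.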
http://www.bioplanet.com/gcat
http://genomereference.org
Dennis et al., 2012
1q32 1q21 1p21
1p21 patch alignment to chromosome 1
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
CDC27
1KG Phase 1 Strict accessibility mask
SNP (all)
SNP (not 1KG)
Sudmant et al., 2010
Kidd et al., 2007
APOBEC cluster
Part of chr22 assembly
Alternate locus for chr22
White: Insertion
Black: Deletion
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320
129S6/SvEvTac tiling path
Alignment to C57BL/6J chr1
B6 Genes
129S6/SvEvTac Genes
+ 32Kb in 129S6/SvEvTac
Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320
NM_031192.3: transcript from C57BL/6J
NM_031193.2: transcript from FVB/N
129S6/SvEvTac Alt Locus Alignment (allelic)
FVB/N Transcript Alignment (paralog)
129S6/SvEvTac Ren1
FVB Ren2 Tx
Paralogous diff
SNP + Paralogous diff
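Speaker note 5 sorts the mismatches between the FVB/N Ren2 transcript alignment and the primary (C57BL/6J) assembly by comparing each site's dbSNP alternate base with the reference and transcript bases. A sketch of that bookkeeping, assuming each site is supplied as (reference base, transcript base, dbSNP alternate base or None when there is no rsID); the input format, function name, and category labels are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple


def classify_site(ref_base: str, tx_base: str, snp_alt: Optional[str]) -> str:
    """Classify one transcript-vs-reference mismatch using the categories
    from speaker note 5 (labels are illustrative)."""
    if snp_alt is None:
        return "no rsID"
    if snp_alt == ref_base:
        # dbSNP alt equals the reference base: likely a paralogous difference,
        # with no evidence for a real polymorphism.
        return "paralogous diff"
    if snp_alt == tx_base:
        # dbSNP alt matches the paralogous (Ren2) transcript base.
        return "SNP + paralogous diff?"
    return "unexplained"


# Hypothetical sites: (reference base, Ren2 transcript base, dbSNP alt base or None)
sites: List[Tuple[str, str, Optional[str]]] = [
    ("A", "G", "A"),
    ("C", "T", "T"),
    ("G", "A", "C"),
    ("T", "C", None),
]
print(Counter(classify_site(*site) for site in sites))
```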
Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320
NM_031192.3: transcript from C57BL/6J
NM_031193.2: transcript from FVB/N
An assembly is a MODEL of the genome
Assembly Model
BAC insert
BAC vector
Shotgun sequence
Assemble
GAPS
Finishing
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-21
NCBI36 (hg18)
GRCh37 (hg19)
NCBI35 (hg17)
GRCh37 (hg19)
AL139246.20
AL139246.21
Daly et al., 2013
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321
Fixing Rare/Incorrect Bases
GRCh37B sites for update: n = 1164
Sites with a unique successful contig: 1148 (98.6%)
Average length: 448 bp
Min/max success length: 51/791 bp
Average coverage: 80x
Read source (all contigs): high coverage 32%, low coverage 57%, exome 10%
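The speaker notes describe a conservative filter for choosing which rare or incorrect bases to update: MAF < 0.05 in 1000 Genomes Phase 1, inside the strict accessibility mask, clone sequence supporting the alternate base, and no failed variants within 150 bp. A sketch of such a filter, assuming each candidate carries these annotations as precomputed fields and that "MAF < 0.05" refers to the frequency of the reference base; the record layout and names are hypothetical, not the GRC's pipeline.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CandidateBase:
    chrom: str
    pos: int                  # 1-based reference position
    ref_allele_freq: float    # assumption: frequency of the reference base in 1KG Phase 1
    in_strict_mask: bool      # inside the 1KG Phase 1 strict accessibility mask
    clone_supports_alt: bool  # clone sequence supports the alternate base


def failed_variant_nearby(c: CandidateBase, failed: Dict[str, List[int]], window: int = 150) -> bool:
    """True if any failed variant lies within `window` bp of the candidate.
    `failed` maps chromosome -> sorted positions of failed variant calls."""
    positions = failed.get(c.chrom, [])
    i = bisect.bisect_left(positions, c.pos - window)
    return i < len(positions) and positions[i] <= c.pos + window


def is_high_confidence(c: CandidateBase, failed: Dict[str, List[int]], maf_cutoff: float = 0.05) -> bool:
    return (
        c.ref_allele_freq < maf_cutoff
        and c.in_strict_mask
        and c.clone_supports_alt
        and not failed_variant_nearby(c, failed)
    )


failed_calls = {"chr19": [1_000, 5_000]}  # hypothetical failed-variant positions
print(is_high_confidence(CandidateBase("chr19", 1_100, 0.01, True, True), failed_calls))  # failed call 100 bp away
print(is_high_confidence(CandidateBase("chr19", 1_200, 0.01, True, True), failed_calls))
```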
Fixing Rare/Incorrect Bases
Build sequence contigs based on the contigs defined in the TPF (Tiling Path File)
Check for orientation consistency
Select switch points
Instantiate sequence for further analysis
Switch point
Representative chromosome sequence
RP11-34P13 64E8 RP4-669L17 RP5-857K21 RP11-206L10 RP11-54O7
Gaps
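The contig-building steps on this slide can be illustrated with a short Python sketch in the spirit of an AGP-style tiling path: each step names a component, the 1-based inclusive range used between its switch points, and its orientation, and the contig is the concatenation of those ranges. The data layout, function names, and clone sequences are hypothetical; the GRC's actual tools also derive switch points from component alignments, which this sketch takes as given.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

# (component name, first base, last base, orientation). Coordinates are 1-based,
# inclusive, and on the component's own forward strand; the range is the
# stretch of that component used between its switch points.
TilingStep = Tuple[str, int, int, str]


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def build_contig(components: Dict[str, str], path: List[TilingStep]) -> str:
    pieces = []
    for name, first, last, strand in path:
        seq = components[name]
        # Basic consistency checks before instantiating sequence.
        if strand not in ("+", "-"):
            raise ValueError(f"{name}: unknown orientation {strand!r}")
        if not 1 <= first <= last <= len(seq):
            raise ValueError(f"{name}: range {first}-{last} outside component")
        piece = seq[first - 1:last]
        pieces.append(revcomp(piece) if strand == "-" else piece)
    return "".join(pieces)


# Hypothetical components overlapping by 4 bp ("GGGT"); switch from cloneA
# to cloneB inside the overlap.
components = {"cloneA": "AAACCCGGGT", "cloneB": "GGGTTTACGT"}
path = [("cloneA", 1, 8, "+"), ("cloneB", 3, 10, "+")]
print(build_contig(components, path))
```

Because cloneA bases 7-10 match cloneB bases 1-4, switching after cloneA base 8 means resuming at cloneB base 3, so no overlapping base is used twice.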
NCBI36
nsv832911 (nstd68) Submitted on NCBI35 (hg17)
NCBI35 (hg17) Tiling Path
GRCh37 (hg19) Tiling Path
Gap Inserted
Moved approximately 2 Mb distal on chr15
NC_000015.8 (chr15)
NC_000015.9 (chr15)
Removed from assembly
Added to assembly
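When the tiling path changes between releases, coordinates reported against the old assembly (such as the NCBI35 submission on this slide) cannot simply be reused on the new one. A minimal Python sketch of block-based coordinate remapping, similar in spirit to UCSC chain files and liftOver; the block table is hypothetical, positions in dropped sequence map to None, and this is not a description of NCBI's own remapping service.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Ungapped aligned blocks: (old_start, new_start, length), 0-based,
# sorted by old_start and non-overlapping on the old assembly.
Block = Tuple[int, int, int]


def remap(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position on the old assembly to the new one, or return None
    if the position falls in sequence with no counterpart (e.g. removed sequence)."""
    starts = [old_start for old_start, _, _ in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    old_start, new_start, length = blocks[i]
    if pos < old_start + length:
        return new_start + (pos - old_start)
    return None


# Hypothetical blocks: the second block now sits about 2 Mb further along the chromosome.
blocks = [(0, 0, 1_000), (1_000, 2_001_000, 500)]
print(remap(500, blocks), remap(1_200, blocks), remap(1_600, blocks))
```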
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-24
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
New Assembly model: represent both haplotypes
AC074378.4
AC079749.5
AC134921.2
AC147055.2
AC140484.1
AC019173.4
AC093720.2
AC021146.7
NCBI36: NC_000004.10 (chr4) Tiling Path
Xue Y et al., 2008
TMPRSS11E TMPRSS11E2
GRCh37: NC_000004.11 (chr4) Tiling Path
AC074378.4
AC079749.5
AC134921.1
AC147055.2
AC093720.2
AC021146.7
TMPRSS11E
GRCh37: NT_167250.1 (UGT2B17 alternate locus)
AC074378.4
AC140484.1
AC019173.4
AC226496.2
AC021146.7
TMPRSS11E2
nsv532126 (nstd37)
Adding Novel Sequence
1000G ph1 decoy sequence, viewed by:
• GenBank alignment
• Percent Repeat Masker
• Repeat Masker type
• Sequence Source (HTG, HuRef, ALLPATHS)
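Speaker note 12 describes assessing decoy capture by checking where 1kG reads that used to align to the decoy align in the updated assembly. A minimal sketch of that tally, assuming best-hit placements have already been extracted into dictionaries keyed by read ID; the decoy sequence name and all data are placeholders.

```python
from collections import Counter
from typing import Dict, Optional


def tally_decoy_capture(
    old_hits: Dict[str, str],
    new_hits: Dict[str, Optional[str]],
    decoy_name: str = "decoy",
) -> Counter:
    """For reads whose best hit was the decoy in the old analysis, count the
    sequence each one aligns to in the updated assembly."""
    counts: Counter = Counter()
    for read_id, old_target in old_hits.items():
        if old_target != decoy_name:
            continue
        new_target = new_hits.get(read_id)
        counts[new_target if new_target is not None else "unmapped"] += 1
    return counts


# Hypothetical read placements (read ID -> sequence name of best alignment).
old = {"r1": "decoy", "r2": "decoy", "r3": "chr1", "r4": "decoy"}
new = {"r1": "chr16", "r2": "chr16", "r3": "chr1", "r4": None}
print(tally_decoy_capture(old, new))
```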
Adding Novel Sequence
Genovese et al., 2013
Adding Novel Sequence
Karen Hayden and Jim Kent
Human Resolved for GRCh38
http://genomereference.org
Examples
Preview of GRCh38 (scheduled Fall 2013)
TEX28 TKTL1
LOC101060233
(opsin related)
LOC101060234
(TEX28 related)
GRCh37 (current reference assembly)
chrX
Hydin: chr16 (16q22.2)
Hydin2: chr1 (1q21.1)
Missing in NCBI35/NCBI36; unlocalized in GRCh37; finished in GRCh38
Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID
Alignment to Hydin1 CHM1_1.0, >99.9% ID
Doggett et al., 2006
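The identity figures on this slide show why a missing paralog matters for read mapping: at 99.4% identity, a 101 bp read is expected to differ from the other copy at well under one base on average, so reads from a copy absent from the assembly could be expected to align to the copy that is present. A small sketch of both calculations; percent identity here ignores gap columns, which is only one of several conventions, and the example alignment is made up.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over aligned columns of two gapped sequences ('-' = gap),
    skipping columns where either sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue
        aligned += 1
        matches += a == b
    if aligned == 0:
        raise ValueError("no aligned columns")
    return 100.0 * matches / aligned


def expected_mismatches(read_length: int, pct_identity: float) -> float:
    """Average number of mismatching bases in a read of this length between two copies."""
    return read_length * (1 - pct_identity / 100)


print(percent_identity("ACGT-A", "ACGTTT"))       # toy alignment
print(round(expected_mismatches(101, 99.4), 3))  # 101 bp read vs. a 99.4% identical paralog
```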
FAM23_MRC1 Region, chr10
Segmental Duplications
1KG accessibility Mask
Novel Patch 250 kb of artificial duplication
Adding Novel Sequence
Richa Agarwala
MHC Alternate locus
Alignment to chr6
Making the assembly accessible to existing tools: masking
Query set: 439,109,084 NA12878 HiSeq reads
Masking effectively blocks alignments in regions with high identity
Simulated reads from GRCh37.p9
• Unpaired reads
• 101 bp
• 1x coverage
• Default wgsim parameters
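For the simulation parameters above, 1x coverage with unpaired 101 bp reads means the read count equals the simulated sequence length divided by 101 (C = N·L/G). A tiny helper for that arithmetic, since wgsim is driven by a read count rather than a coverage target; the function name and the example length are hypothetical.

```python
import math


def reads_for_coverage(target_length_bp: int, read_length_bp: int = 101, coverage: float = 1.0) -> int:
    """Smallest read count N with N * read_length / target_length >= coverage
    (the Lander-Waterman relation C = N * L / G)."""
    if target_length_bp <= 0 or read_length_bp <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(coverage * target_length_bp / read_length_bp)


# Hypothetical 1,010,000 bp target simulated at 1x with 101 bp unpaired reads.
print(reads_for_coverage(1_010_000))
```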
Masking parameters
• Percent Id: 100%
• Step size: 5 bp
• Minimum length: 101 bp
• Center SNPs in unmasked regions
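A toy sketch of identity-based masking with the parameters listed above (100% identity, 101 bp minimum length, 5 bp step): tile the alternate locus with windows every `step` bases and replace with N any window that occurs verbatim in the primary sequence. Which sequence gets masked, the exact-match k-mer lookup, and forward-strand-only matching are assumptions; the "center SNPs in unmasked regions" rule is not modeled, and genome-scale masking would use an aligner rather than an in-memory k-mer set.

```python
from typing import Set


def kmers(seq: str, k: int) -> Set[str]:
    """All length-k substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mask_identical_windows(alt: str, primary: str, window: int = 101, step: int = 5) -> str:
    """Replace with N every window of `alt` (taken every `step` bases) that
    occurs with 100% identity in `primary`. Toy scale only: a k-mer set for a
    whole chromosome would be impractical."""
    primary_windows = kmers(primary, window)
    masked = [False] * len(alt)
    for start in range(0, len(alt) - window + 1, step):
        if alt[start:start + window] in primary_windows:
            for i in range(start, start + window):
                masked[i] = True
    return "".join("N" if m else base for base, m in zip(alt, masked))


# Hypothetical toy example with a 4 bp window so the effect is visible:
# the single differing base (G -> A) stays unmasked.
primary = "AAAACCCCGGGGTTTT"
alt = "AAAACCCCAGGGTTTT"
print(mask_identical_windows(alt, primary, window=4, step=1))
```

A window that contains a difference cannot match the primary exactly, so differing bases generally remain unmasked and reads carrying them can still reach the alternate sequence.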
Masking improves alignments in regions with alternate loci or patches
NA12878 reads whose best alignment was on an alt locus or patch in the masked assembly were evaluated for their alignment location when aligned to the primary assembly alone
Masking effectively reduces the increase in NA12878 reads with MAPQ=0 alignments that occurs when the full assembly is used as an alignment substrate
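One simple way to quantify the MAPQ=0 effect is to count primary, mapped alignments with MAPQ 0 in the SAM output from each run (primary assembly alone, full assembly, masked assembly) and compare the fractions. A minimal sketch using only the standard SAM columns (FLAG in column 2, MAPQ in column 5); the function name and example records are hypothetical, and real pipelines would more typically use samtools or pysam.

```python
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def mapq0_fraction(sam_lines: Iterable[str]) -> float:
    """Fraction of primary, mapped alignments whose MAPQ is 0."""
    total = zero = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        zero += mapq == 0
    return zero / total if total else 0.0


example = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r2\t0\tchr1\t200\t0\t10M\t*\t0\t0\tACGTACGTAC\t*",
    "r3\t256\tchr2\t50\t0\t10M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
]
print(mapq0_fraction(example))
```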
GRCh38 is coming
(September, 2013)

Speaker notes

  1. Picture of a really bored teenager?
  2. 1000 Genomes and ENCODE logos: what ties them together? Data analysis was absolutely dependent on the reference assembly.
  3. Contig N50 stats here
  4. Look up how much novel sequence was added. Across all patches: 35 Mb of sequence added.
  5. 44 SNVs between the Ren2 transcript alignment and the primary assembly; 29 of these have rsIDs. Of these 29: 19 have alt base = ref (likely a paralogous difference, with no evidence for polymorphism), 9 have alt base = transcript base (SNP and paralogous difference?), and 1 has alt base != ref and alt base != transcript base (craziness).
  6. Insert dot matrix alignment- pull from assembly-assembly alignments
  7. Daly paper on VNTR
  8. For the intermediate build GRCh37B, we are updating a subset of the high-confidence bases, about 1000, as our proof of principle. This panel shows reads from NA12878 aligned to chr. 19 that identify a base with MAF=0 in the LIN37 locus. This creates a non-consensus splice site. To create accessioned sequence for correcting the reference, we are using cortex_con (Iqbal and Caccamo) to generate mini-contigs (>= 50 bp) from collections of 1kG and RP11 WGS reads, the former selected from random 1kG populations.
  9. In Phase 1, 1000G identified just over 235K bases with an MAF < 0.05. These may represent wrong or rare bases, and the GRC has been urged to change all of them. The GRC decided to take a conservative approach: focus on a high-confidence subset of these bases (provided by the 1kG analysis group: Poplin, Clarke, Streeter): 54K of these; 5K “wrong”; 1.5K overlap a transcript: in the strict accessibility mask; have clone sequence supporting the alt base; no failed variants within 150 bp of the questionable base. We will fix the wrong bases in the set, since those cause the most problems for variant analyses, and will update only the rare bases in the set that have functional effects. We are not updating Phase 1 indels. Sanger is also doing some independent analyses of bases and indels. The summer will be spent defining the final collection of bases to be updated.
  10. Stats for the mini-contigs built for GRCh37B. This slide shows the correction of the LIN37 issue via insertion of two mini-contigs into the tiling path. In GRCh37B, 56 RefSeqs, corresponding to 26 distinct loci, have had their alignments improved by the addition of the mini-contigs. We have some development left for GRCh38: tweaking the process to build contigs that address clustered bases and/or indels, and defining the final set of bases.
  11. Alignments refer to pairs of sequences. Once you know how a pair of sequences go together, you can look at stringing the pairs along into a contig. The contig is essentially the consensus sequence that is produced from the components. To create a contig, we use the steps shown on this slide. What are switch points? As you create the consensus sequence of the contig, the switch points tell you where to stop using the sequence from one component and begin using the sequence from the next.
  12. Adding novel sequence for GRCh38. One source of this novel sequence is the 1kG Phase 1 decoy sequence. The decoy doesn’t provide chromosome context; thus, if we can place much of the decoy in chromosome context for GRCh38, that adds even more value to the assembly. This slide shows the breakdown of the decoy: by source (bottom), by alignment to GenBank, and by amount and type of repeat. The GRC intends to assess capture by looking at 1kG reads that used to align to the decoy and seeing where they align in the updated assembly.
  13. Other portions of the decoy likely represent sequence that belongs in reference assembly gaps. We are aligning all HuRef and ALLPATHS scaffolds to the reference assembly to identify sequences that extend into or span gaps. This slide shows how a combination of HuRef WGS sequence and a PCR product closes a gap on chr. 16 and provides complete representation for TMEM114. Analysis of GRCh37B shows: 46 of 73 HuRef scaffold insertions involve decoy; 77 ALLPATHS decoy contigs are being added at 46 gaps.
  14. Lastly, some portions of the decoy will represent sequence variants. In these cases, the primary assembly does not need to be changed, but the decoy can be added as a NOVEL patch/alt locus.This slide shows a NOVEL locus that was created to capture a decoy sequence containing 30kb of additional sequence, which represents a repeat expansion.As of GRCh37.p12, 87 of 781 decoy sequences have been captured in chromosome updates/fix patches or as novel patches/alt loci.
  15. There are several mechanisms we can use for capturing decoy. Much of the decoy represents centromeric repeat sequence. In collaboration with Karen Hayden in Jim Kent’s lab at UCSC, the GRC is planning to include modeled centromeric sequences in GRCh38.
  16. The reference is not just the chromosome sequences of the primary assembly unit; it also includes the alternate loci and patches, which are used to provide additional sequence representations at selected genomic regions. The GRC has been releasing patches to the human assembly on a quarterly cycle, and we’re now at GRCh37.p12. There are two varieties of patches: FIX patches correct existing assembly problems (the chromosome will be updated and the patches integrated in GRCh38); NOVEL patches add new sequence representations (they will become alternate loci). This ideogram shows the current distribution of patches and alternate loci, and you can see that many regions have changed since GRCh37. Note that approximately 3% of the current public human assembly GRCh37 is associated with a region that is represented by a patch or alternate locus.
  17. Adding NOVEL sequence for GRCh38 doesn’t just mean adding sequence that is completely unrepresented in GRCh37. While many of the NOVEL patches, like the one on the previous slide, represent indels, adding novel sequence also means adding sequence variants for regions too complex to be represented by a single path. There is substantial variation at the LRC/KIR region on chr. 19. As shown on this slide, not only has the GRC replaced the GRCh37 path, which was derived from components from different clone libraries, with a single haplotype path from the CHM1 assembly, but it also now has 8 different haplotypes represented as alternate loci. The addition of another 10+ haplotypes at this locus is also under consideration.
  18. The excess of red in the cSRA alignment track comes from secondary alignments. Somewhere in the SAM to cSRA conversion it seems that the secondary alignment CIGAR strings got messed up, resulting in what looks like really bad alignments. There’s no way to turn off the display for just the secondary alignments in Gbench. We will have to try to regenerate the cSRA to get rid of these…