SSAHA_pileup:
A Genome Variation Detection Pipeline for
     Various Sequencing Platforms




             Photo Credit: saynine on flickr.com


          Ben Blackburne
     Wellcome Trust Sanger Institute
Acknowledgments


●Zemin Ning
●Yong Gu
●Antony Cox
●Adam Spargo
●Hannes Ponstingl
Introduction
●New sequencing technologies
  – More data
  – Different kinds of data
     ●Solexa, 454
     ●capillary, too
  – Diploid genomes
  – SNPs, indels, VNTRs




                              Photo Credit: mknowles on flickr.com
SSAHA_pileup
●Sequence Search and Alignment by Hashing
 Algorithm
●SSAHA_SNP
  – Global positioning with SSAHA algorithm
  – Fast Smith-Waterman implementation (from
    Cross_Match)
  – Identification of best match
●SSAHA_pileup
  – Determines SNPs from set of best alignments
●Works on Solexa, 454, and capillary reads
The Toolchain
[Diagram: Reference Genome + Reads -> SSAHA_snp/SSAHA2 -> Alignments -> SSAHA_pileup -> variations; a "refinement" arrow is also shown]
SSAHA_SNP
●Reference genome is “hashed”
  – table made of all k-mer words
  – overlapping or not, at user's option
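A minimal sketch of the hashing idea on this slide, not the SSAHA code itself: every k-mer word of the reference is stored with its start positions, stepping by 1 for overlapping words or by k for non-overlapping ones. The function name, the `overlapping` flag, and k=4 are illustrative assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int, overlapping: bool = True) -> dict:
    """Map each k-mer word in the reference to the positions where it starts."""
    step = 1 if overlapping else k  # non-overlapping words advance a whole word
    index = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


# Reference fragment borrowed from the pileup slides later in the talk.
reference = "GGTCCCACAGAGCTGGAGAAAG"
overlapping_index = build_kmer_index(reference, k=4, overlapping=True)
tiled_index = build_kmer_index(reference, k=4, overlapping=False)
```

The non-overlapping table is roughly k times smaller, at the cost of fewer seed hits per read.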
SSAHA_SNP
●k-mer matches found for query in reference
  [Figure: k-mer matches marked on chr n and chr m]
SSAHA_SNP
●Global Mapping
  [Figure: chr n and chr m]
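One way to picture global mapping is seed voting: each k-mer of the read is looked up in the table, and every hit votes for the placement (chromosome, reference position minus read position) it implies. This is a simplified sketch under that assumption, not a description of how SSAHA_SNP ranks its hits; the two toy sequences and k=5 are made up for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(sequences: dict, k: int) -> dict:
    """Index every overlapping k-mer as (sequence name, start position)."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def candidate_placements(read: str, index: dict, k: int) -> list:
    """Count k-mer hits per (sequence, offset); most-supported placement first."""
    votes = Counter()
    for qpos in range(len(read) - k + 1):
        for name, rpos in index.get(read[qpos:qpos + k], ()):
            votes[(name, rpos - qpos)] += 1
    return votes.most_common()


# Hypothetical toy "chromosomes"; chr_m shares only a short stretch with the read.
sequences = {
    "chr_n": "TTGGTCCCACAGAGCTGGAGAAAGTT",
    "chr_m": "ACGTACGTCCCACAGTTTTTTTTTTT",
}
index = build_kmer_index(sequences, k=5)
placements = candidate_placements("TCCCACGGAGCTGGAG", index, k=5)
```

Both chromosomes receive votes, which is why a local alignment step is still needed to score each candidate.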
SSAHA_SNP
●Local Mapping (Smith-Waterman)
  – chr n: score 126
  – chr m: score 113
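For readers unfamiliar with Smith-Waterman, here is a textbook scoring sketch with a linear gap penalty. It is not the Cross_Match-derived implementation the slides mention, and the match, mismatch, and gap values are arbitrary assumptions, so its scores are not comparable with the 126 and 113 shown above.

```python
def smith_waterman_score(a: str, b: str, match: int = 1,
                         mismatch: int = -2, gap: int = -3) -> int:
    """Return the best local alignment score between a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


reference = "GGTCCCACAGAGCTGGAGAAAG"
read_with_snp = "TCCCACGGAGCTGGAG"  # differs from the reference at one base
score_snp = smith_waterman_score(read_with_snp, reference)
score_exact = smith_waterman_score("TCCCACAGAGCTGGAG", reference)
```

Keeping only two rows of the matrix is enough when just the score is needed, as it is for ranking candidate placements.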
SSAHA_SNP
●Select best match
  – chr n: score 126
  – chr m: score 113
SSAHA_SNP
●Read pair information
  – currently possible with
    extra step using SSAHA2
  – being integrated into
    SSAHA_SNP
  – Removes incorrectly
    mapped pairs




                              Photo Credit: Matthew Fang on flickr.com
SSAHA_pileup
[Diagram: the toolchain figure, repeated]
SSAHA_pileup
Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACGGAGCTGGAG
       CCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
Aligned reads
Homozygous SNP
SSAHA_pileup
Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACAGAGCTGGAG
       CCACAGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
Aligned reads
Heterozygous SNP
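The homozygous and heterozygous pileup slides can be reproduced with a small sketch: stack the placed reads into columns, count bases per column, and call a site from the alleles that pass a frequency cut-off. The reads and offsets come from the slides; the `min_depth` and `min_fraction` thresholds and all function names are assumptions, not SSAHA_pileup's actual rules.

```python
from collections import Counter


def pileup(reference: str, placed_reads: list) -> list:
    """Collect, for each reference position, the read bases aligned to it."""
    columns = [[] for _ in reference]
    for offset, read in placed_reads:
        for i, base in enumerate(read):
            if 0 <= offset + i < len(reference):  # ignore overhang past the window
                columns[offset + i].append(base)
    return columns


def call_site(ref_base: str, bases: list, min_depth: int = 3,
              min_fraction: float = 0.2) -> str:
    """Classify one column as reference, homozygous or heterozygous."""
    depth = len(bases)
    if depth < min_depth:
        return "no call"
    counts = Counter(bases)
    alleles = sorted(b for b, n in counts.items() if n / depth >= min_fraction)
    if alleles == [ref_base]:
        return "reference"
    if len(alleles) == 1:
        return f"homozygous {ref_base}>{alleles[0]}"
    return "heterozygous " + "/".join(alleles)


def call_variants(reference: str, placed_reads: list) -> dict:
    """Return {0-based position: call} for every non-reference site."""
    calls = {}
    for pos, bases in enumerate(pileup(reference, placed_reads)):
        call = call_site(reference[pos], bases)
        if call not in ("reference", "no call"):
            calls[pos] = call
    return calls


reference = "GGTCCCACAGAGCTGGAGAAAG"
long_read = "TCCCACGGAGCTGGAGAAAGCCT"
hom_reads = [(0, "GGTCCCACGGAGCTGGAG"), (4, "CCACGGAGCTGGAGAAAGCCT")] + [(2, long_read)] * 3
het_reads = [(0, "GGTCCCACAGAGCTGGAG"), (4, "CCACAGAGCTGGAGAAAGCCT")] + [(2, long_read)] * 3
hom_calls = call_variants(reference, hom_reads)
het_calls = call_variants(reference, het_reads)
```

With only five reads the heterozygous call rests on a 2:3 split, which is why the choice of thresholds matters so much at low coverage.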
SSAHA_pileup
Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCACAGAGCTGGAG
       CCACAGAGCTGGAGAAAGCCT
     TCCCACggagCTGGAGAAAGCCT
     TCCCACggagcTGGAGAAAGCCT
     TCCCacggagcTGGAGAAAGCCT
Aligned reads
Heterozygous SNP?? (Probably not)
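The slide does not say what the lower-case bases mean. Assuming they mark low-quality base calls (a common convention, but an assumption here), ignoring them before counting alleles removes the apparent heterozygous site. The helper name is hypothetical.

```python
def confident_bases(column: list) -> list:
    """Keep only upper-case base calls; lower case is treated as low quality."""
    return [base for base in column if base.isupper()]


# Bases covering the disputed site in the five reads on the slide above.
column = ["A", "A", "g", "g", "g"]
naive_alleles = sorted({base.upper() for base in column})
filtered_alleles = sorted(set(confident_bases(column)))
```

After filtering, only the reference allele A remains at this site.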
SSAHA_pileup
Reference
...GGTCCCACAGAGCTGGAGAAAG...
   GGTCCCAC-----TGGAG
       CCAC-----TGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
     TCCCACGGAGCTGGAGAAAGCCT
Aligned reads
Heterozygous indel
How well does it work?
Datasets
●Venter: ABI capillary reads
  – Celera: 19,397,599 reads, 55% in pairs
  – JCVI: 12,541,352 reads, 98% in pairs
  – Total: 31,938,951 reads, 72% in pairs (90% mapped)
●Watson: 454 GS FLX reads
  – Baylor & Roche: 74,198,831 reads (90.5% mapped)
  – single-end reads, 150 to 280 bp long
●Chromosome X Illumina reads
  – 278,557,156 reads (71.6% mapped)
  – paired, with insert size 200 bp
How conservative should we be?
Or....




How liberal should we be?
How do we even know if we are winning?
dbSNP
(but not ideal)
Filtering
●Processes that cause bogus SNPs
  – Incorrect global mapping
  – Incorrect local alignment
  – Poor quality reads
  – Sequence amplification errors
Global Mapping Problems
●Reads from unmapped regions of the genome
  – Lead to absurdly high apparent coverage
  [Figure: read marks scattered between chr n and chr m]
  [Figure: read marks piled up on chr n]
SNPs
Solution:
 Filter out SNPs called from
abnormally high read depths
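A rough sketch of the depth filter: candidate SNPs whose read depth is far above the typical depth are dropped, on the theory that the excess reads belong elsewhere in the genome. Using the median as "typical" and a cut-off of twice the median are assumptions made for illustration, and the SNP records are invented.

```python
from statistics import median


def filter_by_depth(snps: list, max_ratio: float = 2.0) -> list:
    """Keep SNPs whose depth is at most max_ratio times the median depth."""
    typical = median(s["depth"] for s in snps)
    return [s for s in snps if s["depth"] <= max_ratio * typical]


# Hypothetical candidate SNPs; position 512 sits in a collapsed repeat.
candidates = [
    {"pos": 101, "depth": 9},
    {"pos": 250, "depth": 11},
    {"pos": 400, "depth": 10},
    {"pos": 512, "depth": 97},
    {"pos": 780, "depth": 12},
]
kept = filter_by_depth(candidates)
kept_positions = [s["pos"] for s in kept]
```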
Global Mapping Problems
●Incorrectly aligned reads
  – chr n: score 132
  – chr m: score 136
Solution:
Filter out SNPs where 2nd best
       score is too close
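The second-best-score rule can be sketched as a simple gap test on a read's alignment scores. The scores below are the ones shown on the slides (136 vs 132, and 126 vs 113 earlier); the required gap of 5 is an arbitrary assumption, since the talk gives no threshold.

```python
def is_uniquely_placed(scores: list, min_gap: int = 5) -> bool:
    """True if the best alignment score beats the runner-up by at least min_gap."""
    ranked = sorted(scores, reverse=True)
    if not ranked:
        return False  # the read did not align anywhere
    if len(ranked) == 1:
        return True   # no competing placement
    return ranked[0] - ranked[1] >= min_gap


ambiguous = is_uniquely_placed([132, 136])  # the two placements on this slide
confident = is_uniquely_placed([126, 113])  # the earlier best-match example
```

SNPs supported only by reads that fail this test would then be discarded.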
Local Alignment Problems
●Misalignment
  – Uncaught incorrect global alignment
  – Variations in short repeats
Local Misalignment
Reference
...GGTCCCACAGAGCTGGAGAAAA...
   GGTCCCACT---CTAGTG
       CCACT---CTAGTGAAAA
     TCCCACT---CTAGTGAAAA
Aligned reads
Real SNPs?
Local Misalignment
Reference
..TAATAATAATAATAATAATAAGAAG..
   AATAATAAGAAGAAGAAGAAGAAG
   AATAATAAGAAGAAGAAGAAGAAG
   AATAATAAGAAGAAGAAGAAGAAG
Aligned reads
Real SNPs?
Solution:
Filter out short blocks of many
             SNPs
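A sketch of the cluster filter: any SNP with too many neighbours inside a short window is treated as a misalignment artefact and dropped along with those neighbours. The window of 10 bases, the limit of 2 SNPs per window, and the positions are all illustrative assumptions.

```python
def drop_snp_clusters(positions: list, window: int = 10,
                      max_in_window: int = 2) -> list:
    """Remove SNPs that have too many neighbours within `window` bases."""
    ordered = sorted(positions)
    kept = []
    for p in ordered:
        # Count SNPs (including p itself) closer than `window` bases to p.
        nearby = sum(1 for q in ordered if abs(q - p) < window)
        if nearby <= max_in_window:
            kept.append(p)
    return kept


# Hypothetical calls: four SNPs within 8 bases, as in a misaligned short repeat.
snp_positions = [120, 4001, 4003, 4006, 4008, 9500]
isolated = drop_snp_clusters(snp_positions)
```

The quadratic scan keeps the example short; a sliding window over the sorted positions would suit genome-scale input better.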
Venter SNP Calling (Capillary)

                    count       fraction in dbSNP
Homozygous SNPs     1 347 806   97.1%
Heterozygous SNPs   1 857 167   90.9%
Total SNPs          3 204 973   93.5%
Watson SNP Calling (454)

                    count       fraction in dbSNP
Homozygous SNPs     1 298 309   93.0%
Heterozygous SNPs   1 767 951   63.9%
Total SNPs          3 066 260   76.3%
X Chromosome SNPs (Solexa)

                    count       fraction in dbSNP
Homozygous SNPs     27 708      92.8%
Heterozygous SNPs   63 197      81.8%
Total SNPs          90 905      85.1%
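As a quick arithmetic check on the three tables above, the "Total SNPs" rows should equal the sum of the homozygous and heterozygous counts, and the total dbSNP fraction should be the count-weighted average of the two per-class fractions. The sketch below recomputes both from the slide figures; because the per-class percentages are rounded, agreement is only expected to about 0.1 percentage point.

```python
# (count, percent in dbSNP) for each row, copied from the three tables.
tables = {
    "Venter (capillary)": {"hom": (1_347_806, 97.1), "het": (1_857_167, 90.9),
                           "total": (3_204_973, 93.5)},
    "Watson (454)": {"hom": (1_298_309, 93.0), "het": (1_767_951, 63.9),
                     "total": (3_066_260, 76.3)},
    "X chromosome (Solexa)": {"hom": (27_708, 92.8), "het": (63_197, 81.8),
                              "total": (90_905, 85.1)},
}


def combined_percent(hom: tuple, het: tuple) -> float:
    """Count-weighted average of the two per-class dbSNP percentages."""
    (n_hom, p_hom), (n_het, p_het) = hom, het
    return (n_hom * p_hom + n_het * p_het) / (n_hom + n_het)


recomputed = {name: round(combined_percent(t["hom"], t["het"]), 2)
              for name, t in tables.items()}
```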
Venter-Watson Overlap
[Venn diagram]
●Venter only: 1 593 791
●Shared by Venter and Watson: 1 611 182
●Watson only: 1 455 078
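X Chromosome Overlap
[Venn diagram: Solexa X reads, Venter, Watson]
●Solexa only: 40 625
●Solexa and Venter only: 19 978
●Solexa and Watson only: 12 590
●All three: 17 712
●Venter only: 26 502
●Venter and Watson only: 6 588
●Watson only: 22 872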
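The overlap figures can be cross-checked against the SNP tables: each circle's regions should sum to that dataset's total. The sketch below does this for both diagrams. Which number belongs to which region is read off the diagram layout, so treat the region assignment as an assumption; it does reproduce the Venter, Watson, and Solexa totals from the tables.

```python
# Two-way diagram: Venter vs Watson, genome-wide.
venter_watson = {"Venter only": 1_593_791, "shared": 1_611_182,
                 "Watson only": 1_455_078}
venter_total = venter_watson["Venter only"] + venter_watson["shared"]
watson_total = venter_watson["Watson only"] + venter_watson["shared"]

# Three-way diagram on chromosome X, keyed by the datasets sharing each region.
x_regions = {
    frozenset({"Solexa"}): 40_625,
    frozenset({"Solexa", "Venter"}): 19_978,
    frozenset({"Solexa", "Watson"}): 12_590,
    frozenset({"Solexa", "Venter", "Watson"}): 17_712,
    frozenset({"Venter"}): 26_502,
    frozenset({"Venter", "Watson"}): 6_588,
    frozenset({"Watson"}): 22_872,
}


def total_for(dataset: str) -> int:
    """Sum every region of the diagram that lies inside one dataset's circle."""
    return sum(n for members, n in x_regions.items() if dataset in members)


solexa_total = total_for("Solexa")
# Solexa SNPs also called in at least one of the other two datasets.
solexa_confirmed = solexa_total - x_regions[frozenset({"Solexa"})]
```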
Conclusions
●SSAHA_pileup is effective across both new and
 old sequencing technologies
●Questions
  – When is a SNP not a SNP?
  – Homozygous/Heterozygous SNPs
●Length matters...?
  – But it's what you do with it that counts
Obtaining SSAHA_pileup
●SSAHA_pileup: ftp://ftp.sanger.ac.uk/pub/zn1/ssaha_pileup/
●SSAHA2: http://www.sanger.ac.uk/Software/analysis/SSAHA2/
●These Slides: http://slideshare.net/bpb/
