Bioinformatics in Genetics Research

             Genetics Noon Symposium Series
                          Daniel Gaston, PhD
Dr. Karen Bedard Lab, Department of Pathology

                         November 21st, 2012
IGNITE


   Orphan Diseases: Identifying Genes and Novel
    Therapeutics to Enhance Treatment
   Identify causative genetic variations in orphan
    diseases with an emphasis on Atlantic Canada
   Develop animal and cell culture models
   Identify and develop novel therapeutics
   igniteproject.ca
Outline


   Introduction
       Bioinformatics in Disease Genomics
       Next-Generation Sequencing
   Genomics in Research and the Clinic
   The Data Deluge and its Solutions
       Bioinformatic Methods for Analyzing Genomic Data
   Case Studies
   Conclusion
Bioinformatics in Disease Genomics
   Handling and long-term storage of raw data
    (sequencing, gene expression, etc)
   Maintenance and support of computational
    infrastructure
   Experimental design
   Data analysis
   Methods development
       Analysis pipelines
       Statistical analyses
       Algorithm design
‘Next-Generation’ Sequencing and Disease Genomics
Disease Genomics: Hunting Down Pathogenic Genetic Variation
[Figure: a reference gene (Exon 1, Intron 1, Exon 2) with its start codon, splice sites, and TAA stop codon, giving the mRNA coding for protein, shown above the patient's copy of the same gene, in which the stop codon TAA is changed to TAC (Tyr).]
Disease Genomics: Research vs Clinic
   Still predominantly research oriented
       Complex/Common disease
       Mendelian disorders
       Cancer genomics
   Clinical genomics starting to gain traction
       Cancer genomics
           Cancer subtype identification
           Personalized medicine and predicting outcomes
       Mendelian disorders
           Early diagnosis
           Cost effectiveness
Clinical Genomics




   Children’s Mercy Hospital NICU
       In the US >20% of infant deaths due to genetic disease
       Serial sequencing of candidate genes too slow
Children’s Mercy Hospital NICU
   50-hour differential diagnosis of monogenic disease
       Sample preparation and sequencing: 30.5 hours
       Automated bioinformatics analysis: 17.5 hours
       Previous high-throughput sequencing methods: 19 days
       Test on seven infants, two previously diagnosed using
        standard methods, five undiagnosed
   Caveats
       Bioinformatics portion not available outside of hospital
       Requires thorough clinical phenotyping using a controlled
        vocabulary
       Generates a large amount of data
The Data Deluge

   4 million genetic variants
   2 million associated with protein-coding genes
   10,000 possibly of disease-causing type
   1,500 with <1% frequency in population
Surviving the Data Deluge

Reducing the Search Space: Exome Sequencing
Exome Sequencing

   Exome: Portion of genome composed of protein-
    coding exons and functional RNA sequences

   1.5 - 2% of human genome (50 Mb)

   > 85% of monogenic diseases due to variants in
    exome

   Complete exome sequencing: ~ $1000/sample
Caveats


   Incomplete and non-uniform coverage of exome
       Systematic bias (GC content)
       Random sampling


   Not all genetic variants amenable to discovery
       Non-coding variants
       Structural variants
Surviving The Data Deluge

                 Bioinformatics
Typical Bioinformatics Workflow
   QC of Raw Data → Map to Reference → QC → Find Variants → QC → Annotate → Filter
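As an illustration of the first stage (QC of raw data), here is a minimal sketch of a read-level quality filter. It assumes quality strings use the common Phred+33 ASCII encoding. The function names, the mean-quality threshold of 20, and the toy reads are assumptions made for this example and are not taken from the talk.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 ASCII encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_qc(quality: str, min_mean: float = 20.0) -> bool:
    """Keep a read only if it is non-empty and its mean quality reaches min_mean."""
    return len(quality) > 0 and mean_phred(quality) >= min_mean


# Two toy reads as (sequence, quality string); illustrative values only.
reads = [
    ("ACGTACGT", "IIIIIIII"),  # 'I' encodes Phred 40 under Phred+33
    ("ACGTACGT", "########"),  # '#' encodes Phred 2 under Phred+33
]

# Reads that survive the mean-quality filter.
kept = [seq for seq, qual in reads if passes_qc(qual)]
```

Production pipelines typically also trim adapters and low-quality read ends; this sketch covers only the mean-quality filter.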
It sounds simple, but…
   For every stage there are multiple programs
    available and published in the literature
   For every program there is a wide variety of
    parameter values and options. Defaults are often “good
    enough”, but not always
   The best combinations of programs and options are not well
    understood
   Protocols change rapidly as new technologies and
    methods are developed
   Different centres and groups use slightly different
    workflows, with similar but not identical results
Annotating Variants
If a problem cannot be solved, enlarge it.
             --Dwight D. Eisenhower
Annotations Associated with Genomic Variants
   Is variant in a known protein-coding gene?
       What does the gene do?
       What molecular pathways?
       What protein-protein interactions?
       What tissues is it expressed in?
       When in development?
   Has this variant been seen before?
       What population(s)? With what frequency?
       Has it been seen in local sequencing projects?
       Is there any known clinical significance?
   What is the effect of the variation?
       Does it change the resulting protein? How?
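The last question, whether a variant changes the resulting protein, can be illustrated for a single-codon substitution. The sketch below builds the standard genetic code and labels a codon change as silent, stop loss, stop gain, or amino acid changing. The function name and labels are chosen for this example. Real annotation tools also handle splice sites, frameshifts, and multiple transcripts, which this sketch ignores.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label the protein-level effect of replacing ref_codon with alt_codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"
    if ref_aa == "*":          # a stop codon became an amino acid
        return "stop loss"
    if alt_aa == "*":          # an amino acid became a stop codon
        return "stop gain"
    return "amino acid changing"


# The TAA -> TAC (Tyr) change from the earlier reference/patient figure.
effect = classify_codon_change("TAA", "TAC")
```

With the TAA to TAC change from the figure, `effect` is `"stop loss"`: the stop codon is replaced by tyrosine, so translation would continue past the normal end of the protein.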
Gene Annotation Resources
Variant Annotation Resources
Potential Pitfalls with Annotation Sources



   Databases often overlap and agree, but there may
    be disagreements
   Source of information: Predicted versus
    experimental
   Incorrect and out-of-date information
   Large-scale un-validated versus manually curated
    datasets
Bioinformatics Analyses of Genomic
                           Variants

           Combining Data Sources and Filtering
IGNITE Data Pipeline and Integration

[Diagram: mapped region(s), known genes, gene definitions, gene annotations, and pathway and interaction data are combined with the annotated genomic variants, which are then filtered, sorted, and prioritized.]
Filtering the Data: Categorization
[Figure: decision tree categorizing the 4 million variants. Level 1: Intronic, Exonic, Intergenic. Level 2: Unknown, Splice Site, Amino Acid Changing, Silent Mutation, with some branches marked Potential Disease Causing. Level 3: Known Genetic Disease Variant, Stop Loss / Stop Gain, Amino Acid Change Likely Pathogenic, Amino Acid Change Likely Benign, Known Polymorphism in Population.]
Filtering the Data: Common or Rare?


   Variants in dbSNP: typically known polymorphisms,
    unlikely to be associated with rare disease
   Variants with relatively high frequency in control
    populations (1000 Genomes, HapMap, EVS, 2800
    Exomes)
   Number of times variant previously seen at
    sequencing centre/locally
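The common-versus-rare filter can be sketched as a simple predicate over per-database allele frequencies and a local observation count. The 1% cut-off follows the "<1% frequency in population" step of the earlier funnel. The record layout, the variant identifiers, and the local-count limit of 2 are hypothetical choices made for this example.

```python
from typing import Dict, Optional


def is_rare(
    freqs: Dict[str, Optional[float]],
    local_count: int,
    max_freq: float = 0.01,   # 1% cut-off, as in the funnel slide
    max_local: int = 2,       # hypothetical limit on local observations
) -> bool:
    """True if no control population reports the variant at or above max_freq
    and it has been seen locally at most max_local times.
    A frequency of None means the database has no record of the variant."""
    known = [f for f in freqs.values() if f is not None]
    return all(f < max_freq for f in known) and local_count <= max_local


# Toy records with hypothetical identifiers and frequencies.
variants = [
    {"id": "var_a", "freqs": {"1000 Genomes": 0.12, "EVS": 0.10}, "local_count": 40},
    {"id": "var_b", "freqs": {"1000 Genomes": 0.002, "EVS": None}, "local_count": 1},
    {"id": "var_c", "freqs": {"1000 Genomes": None, "EVS": None}, "local_count": 0},
]

rare_ids = [v["id"] for v in variants if is_rare(v["freqs"], v["local_count"])]
```

Here `var_a` is dropped as a common polymorphism, while `var_b` and `var_c` are kept. Treating a missing frequency as "not seen" is itself an assumption, since a database may simply not cover the relevant population.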
Notes on Filtering and Variant Annotation
   Very important to be aware of population when
    referencing frequency of a variant. Incorrect
    background leads to incorrect assumptions on
    prevalence
   Reasonably well-sampled local populations are
    better than any other reference
   Strike a balance between hard filtering for variants of
    largest potential effect and being inclusive to not
    miss variants
   Some genes acquire large effect variants (stop loss /
    stop gain, etc) frequently. Some genes can be lost
    without causing disease
Applications to Real Data

Charcot-Marie-Tooth Disease and Cutis Laxa
Charcot-Marie-Tooth: Genetic Mapping



   Chromosome 9: 120,962,282 - 133,033,431
Cutis Laxa: Genetic Mapping




   Chromosome 17: 79,596,811 - 81,041,077
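Restricting variants to a mapped interval is a simple coordinate test. The sketch below encodes the two intervals given above and checks whether a position falls inside one. It assumes both endpoints are inclusive and that variant positions use the same genome build as the mapping, which the slides do not state. The `Region` type and its method are names invented for this example.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A mapped chromosomal interval; endpoints assumed inclusive."""
    chrom: str
    start: int
    end: int

    def contains(self, chrom: str, pos: int) -> bool:
        """True if (chrom, pos) lies inside this interval."""
        return chrom == self.chrom and self.start <= pos <= self.end


# Intervals as given on the genetic mapping slides.
CMT_REGION = Region("9", 120_962_282, 133_033_431)
CUTIS_LAXA_REGION = Region("17", 79_596_811, 81_041_077)

# Hypothetical variant positions, used only to exercise the check.
in_cmt = CMT_REGION.contains("9", 125_000_000)
outside_cmt = CMT_REGION.contains("9", 120_962_281)
```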
Charcot-Marie-Tooth
   143 genes in region
   13 known genes in genome
       MPZ
       PMP22
       GDAP1
       KIF1B
       MFN2
       SOX
       EGR2
       DNM2
       RAB7
       LITAF (SIMPLE)
       GARS
       YARS
       LMNA
Cutis Laxa
   52 genes in region
   5 known genes in genome
       ATP6V0A2
       ELN
       FBLN5
       EFEMP2
       SCYL1BP1
       ALDH18A1
Pathway and Interaction Data
Charcot-Marie-Tooth
   37 pathways
       Clathrin-derived vesicle budding
       Lysosome vesicle biogenesis
       Endocytosis
       Golgi-associated vesicle biogenesis
       Membrane trafficking
       Trans-Golgi network vesicle budding
   Primarily LMNA or DNM2
Cutis Laxa
   10 pathways
       Phagosome
       Collecting duct acid secretion
       Lysosome
       Protein digestion and absorption
       Metabolic pathways
       Oxidative phosphorylation
       Arginine and proline metabolism
   Primarily ATP6V0A2
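One way to combine these data sources is to rank each gene in the mapped region by how many links it has to the known disease genes, through either direct interactions or shared pathways. The sketch below is an illustrative scoring scheme and not the IGNITE pipeline itself. The one-point-per-link score and the unlinked placeholder gene `GENE_X` are assumptions. The other entries are taken from the Charcot-Marie-Tooth slides.

```python
from typing import Dict, List, Set, Tuple


def prioritize(
    candidates: Dict[str, Dict[str, List[str]]],
    known_genes: Set[str],
    known_pathways: Set[str],
) -> List[Tuple[str, int]]:
    """Rank candidate genes by links to known disease genes.
    Each shared interaction partner or shared pathway adds one point;
    genes with no links are dropped."""
    ranked = []
    for gene, info in candidates.items():
        shared_partners = set(info["interactions"]) & known_genes
        shared_pathways = set(info["pathways"]) & known_pathways
        score = len(shared_partners) + len(shared_pathways)
        if score > 0:
            ranked.append((gene, score))
    # Highest score first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


known_genes = {"DNM2", "LMNA"}
known_pathways = {"Endocytosis"}
candidates = {
    "DNM1": {"interactions": ["DNM2"], "pathways": []},
    "FNBP1": {"interactions": ["DNM2"], "pathways": []},
    "SH3GLB2": {"interactions": [], "pathways": ["Endocytosis"]},
    "GENE_X": {"interactions": [], "pathways": []},  # hypothetical unlinked gene
}

ranking = prioritize(candidates, known_genes, known_pathways)
```

A real prioritization would likely weight evidence types differently and take the variant-level annotations into account as well.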
Results: Charcot-Marie-Tooth
   8 Genes Prioritized
Gene       Interactions   Pathway
LRSAM1     Multiple       Endocytosis
DNM1       DNM2           -
FNBP1      DNM2           -
TOR1A      LMNA           -
STXBP1     Multiple       Five
SH3GLB2    -              Endocytosis
PIP5KL1    -              Endocytosis
FAM125B    -              Endocytosis

   For more information
       Guernsey et al (2010) PLoS Genetics. 6(8): e1001081
Results: Cutis Laxa
   10 genes prioritized
Gene       Interactions   Pathway
HEXDC      Multiple       Phagosome
HG5        -              Phagosome
HG5        Multiple       Lysosome, Protein digestion
SIRT7      Multiple       Metabolic Pathways
FASN       -              Metabolic Pathways
DCXR       -              Metabolic Pathways
PYCR1      -              Metabolic Pathways, Arginine/Proline
PCYT2      -              Metabolic Pathways
ARHGDIA    -              Oxidative Phosphorylation

   For more information
       Guernsey et al (2009) Am J Hum Genet. 85(1): 120-9
Conclusions
   Bioinformatics is involved at every stage of genomic
    research from experimental design through to final
    analysis
   Standards and best practices do exist, but are
    rapidly evolving as new technologies and methods
    are developed
   Progress towards automatic generation of clinically
    interpretable genomics studies
   Annotation, filtering, and prioritization of genetic
    variants crucial
   Balance between false positive calls and false
    negatives
Where Are We Headed?

   Integration of more data sources
       Gene expression
       More annotation sources
           Controlled phenotype vocabularies
           Gene Ontology terms
       Predictive models
           Recessive versus Dominant inheritance and Penetrance
   “New” and Emerging Technologies
       RNA-Seq (Gene Expression)
       ChIP-Seq (Protein-DNA binding)
       Single-Molecule Sequencing
Acknowledgements
   Dalhousie University
       Dr. Karen Bedard
       Dr. Chris McMaster
       Dr. Andrew Orr
       Dr. Conrad Fernandez
       Dr. Marissa Leblanc
       Mat Nightingale
       Bedard Lab
       IGNITE
   McGill/Genome Quebec
       Dr. Jacek Majewski
       Jeremy Schwartzentruber
   Dr. Sarah Dyack
   Dr. Johane Robataille
   Genome Atlantic

Contenu connexe

Tendances

13 genetic engineering bw
13 genetic engineering bw13 genetic engineering bw
13 genetic engineering bw
honey444
 
The Ginés‐Mera Fellowship Fund for Postgraduates Studies in Biodiversity
The Ginés‐Mera Fellowship Fund for Postgraduates Studies in BiodiversityThe Ginés‐Mera Fellowship Fund for Postgraduates Studies in Biodiversity
The Ginés‐Mera Fellowship Fund for Postgraduates Studies in Biodiversity
CIAT
 
Bacterial panicle blight
Bacterial panicle blightBacterial panicle blight
Bacterial panicle blight
CIAT
 
Triparental Mating
Triparental MatingTriparental Mating
Triparental Mating
roxanne-b
 
Erasmus Critical Care Days 2011 Genetics
Erasmus Critical Care Days 2011 GeneticsErasmus Critical Care Days 2011 Genetics
Erasmus Critical Care Days 2011 Genetics
Hazelzet
 

Tendances (20)

New promoters and selection methods
New promoters and selection methods New promoters and selection methods
New promoters and selection methods
 
13 genetic engineering bw
13 genetic engineering bw13 genetic engineering bw
13 genetic engineering bw
 
Plasmodium CSP - Based vaccines Past - prsesent - future
Plasmodium CSP - Based vaccines Past - prsesent - futurePlasmodium CSP - Based vaccines Past - prsesent - future
Plasmodium CSP - Based vaccines Past - prsesent - future
 
Marker free transgenics: concept and approaches
Marker free transgenics: concept and approachesMarker free transgenics: concept and approaches
Marker free transgenics: concept and approaches
 
Genetic engineering
Genetic engineeringGenetic engineering
Genetic engineering
 
Tobacco ring e3 ligase nt rfp1 mediates romance
Tobacco ring e3 ligase nt rfp1 mediates romanceTobacco ring e3 ligase nt rfp1 mediates romance
Tobacco ring e3 ligase nt rfp1 mediates romance
 
Application of rDNA technology to produce Interferon, Hepatitis-B Vaccine & I...
Application of rDNA technology to produce Interferon, Hepatitis-B Vaccine & I...Application of rDNA technology to produce Interferon, Hepatitis-B Vaccine & I...
Application of rDNA technology to produce Interferon, Hepatitis-B Vaccine & I...
 
Plant expression vectors
Plant expression vectorsPlant expression vectors
Plant expression vectors
 
Gene transfer in plants 2- biological vector
Gene transfer in plants 2- biological vector Gene transfer in plants 2- biological vector
Gene transfer in plants 2- biological vector
 
The Ginés‐Mera Fellowship Fund for Postgraduates Studies in Biodiversity
The Ginés‐Mera Fellowship Fund for Postgraduates Studies in BiodiversityThe Ginés‐Mera Fellowship Fund for Postgraduates Studies in Biodiversity
The Ginés‐Mera Fellowship Fund for Postgraduates Studies in Biodiversity
 
Viral vector gene transfer - plant viruses as a vector for gene transfer
Viral vector gene transfer - plant viruses as a vector for gene transferViral vector gene transfer - plant viruses as a vector for gene transfer
Viral vector gene transfer - plant viruses as a vector for gene transfer
 
agrobacterim vector
 agrobacterim vector agrobacterim vector
agrobacterim vector
 
Bacterial panicle blight
Bacterial panicle blightBacterial panicle blight
Bacterial panicle blight
 
Triparental Mating
Triparental MatingTriparental Mating
Triparental Mating
 
Erasmus Critical Care Days 2011 Genetics
Erasmus Critical Care Days 2011 GeneticsErasmus Critical Care Days 2011 Genetics
Erasmus Critical Care Days 2011 Genetics
 
002 control options for rice bacterial panicle blight, don groth
002   control options for rice bacterial panicle blight, don groth002   control options for rice bacterial panicle blight, don groth
002 control options for rice bacterial panicle blight, don groth
 
Host cell and vectors
Host cell and vectorsHost cell and vectors
Host cell and vectors
 
Transplastomics
TransplastomicsTransplastomics
Transplastomics
 
Genetics of Microorganisms. Forms of variation in microbes : Non-heredity and...
Genetics of Microorganisms. Forms of variation in microbes : Non-heredity and...Genetics of Microorganisms. Forms of variation in microbes : Non-heredity and...
Genetics of Microorganisms. Forms of variation in microbes : Non-heredity and...
 
Gene transfer in bacteria
Gene transfer in bacteriaGene transfer in bacteria
Gene transfer in bacteria
 

En vedette

Bioinformatics kernels relations
Bioinformatics kernels relationsBioinformatics kernels relations
Bioinformatics kernels relations
Michiel Stock
 
Bioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthBioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 month
biinoida
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
biinoida
 

En vedette (14)

Cb08 gonzález jesús
Cb08 gonzález jesúsCb08 gonzález jesús
Cb08 gonzález jesús
 
Bioinformatics kernels relations
Bioinformatics kernels relationsBioinformatics kernels relations
Bioinformatics kernels relations
 
EasyGene oligo factory
EasyGene oligo factoryEasyGene oligo factory
EasyGene oligo factory
 
Sssc retreat.bioinfo resources.20110411
Sssc retreat.bioinfo resources.20110411Sssc retreat.bioinfo resources.20110411
Sssc retreat.bioinfo resources.20110411
 
Introduction to Cancer Genomics Databases
Introduction to Cancer Genomics DatabasesIntroduction to Cancer Genomics Databases
Introduction to Cancer Genomics Databases
 
Architecture and evolution of neochromosomes
Architecture and evolution of neochromosomesArchitecture and evolution of neochromosomes
Architecture and evolution of neochromosomes
 
Biometric encryption
Biometric encryptionBiometric encryption
Biometric encryption
 
Bioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthBioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 month
 
Appli bioinfo
Appli bioinfoAppli bioinfo
Appli bioinfo
 
Primer Designing
Primer DesigningPrimer Designing
Primer Designing
 
PCR Primer desining
PCR Primer desiningPCR Primer desining
PCR Primer desining
 
Computer for Biological Research
Computer for Biological ResearchComputer for Biological Research
Computer for Biological Research
 
Sequence Alignment In Bioinformatics
Sequence Alignment In BioinformaticsSequence Alignment In Bioinformatics
Sequence Alignment In Bioinformatics
 
Bioinformatics
BioinformaticsBioinformatics
Bioinformatics
 

Similaire à Bioinformatics in Gene Research

Current Trends in Molecular Biology and BioTechnology (ppt)
Current Trends in Molecular Biology and BioTechnology (ppt)Current Trends in Molecular Biology and BioTechnology (ppt)
Current Trends in Molecular Biology and BioTechnology (ppt)
Perez Eric
 
Personalized Medicine and the Omics Revolution by Professor Mike Snyder
Personalized Medicine and the Omics Revolution by Professor Mike SnyderPersonalized Medicine and the Omics Revolution by Professor Mike Snyder
Personalized Medicine and the Omics Revolution by Professor Mike Snyder
The Hive
 
Epstein-Barr virus genetic variants are associated with multiple sclerosis.
Epstein-Barr virus genetic variants are associated with multiple sclerosis.Epstein-Barr virus genetic variants are associated with multiple sclerosis.
Epstein-Barr virus genetic variants are associated with multiple sclerosis.
Mutiple Sclerosis
 
Dr. Jessica Mar to Speak
Dr. Jessica Mar to SpeakDr. Jessica Mar to Speak
Dr. Jessica Mar to Speak
FHCCommunity
 
Recombinant Dna technology, Restriction Endonucleas and Vector
Recombinant Dna technology, Restriction Endonucleas and Vector Recombinant Dna technology, Restriction Endonucleas and Vector
Recombinant Dna technology, Restriction Endonucleas and Vector
Dr. Priti D. Diwan
 
Microarrays;application
Microarrays;applicationMicroarrays;application
Microarrays;application
Fyzah Bashir
 

Similaire à Bioinformatics in Gene Research (20)

Current Trends in Molecular Biology and BioTechnology (ppt)
Current Trends in Molecular Biology and BioTechnology (ppt)Current Trends in Molecular Biology and BioTechnology (ppt)
Current Trends in Molecular Biology and BioTechnology (ppt)
 
Introns: structure and functions
Introns: structure and functionsIntrons: structure and functions
Introns: structure and functions
 
Stephen Friend ICR UK 2012-06-18
Stephen Friend ICR UK 2012-06-18Stephen Friend ICR UK 2012-06-18
Stephen Friend ICR UK 2012-06-18
 
microbial genetics
 microbial genetics microbial genetics
microbial genetics
 
Personalized Medicine and the Omics Revolution by Professor Mike Snyder
Personalized Medicine and the Omics Revolution by Professor Mike SnyderPersonalized Medicine and the Omics Revolution by Professor Mike Snyder
Personalized Medicine and the Omics Revolution by Professor Mike Snyder
 
Digging into thousands of variants to find disease genes in Mendelian and com...
Digging into thousands of variants to find disease genes in Mendelian and com...Digging into thousands of variants to find disease genes in Mendelian and com...
Digging into thousands of variants to find disease genes in Mendelian and com...
 
Epstein-Barr virus genetic variants are associated with multiple sclerosis.
Epstein-Barr virus genetic variants are associated with multiple sclerosis.Epstein-Barr virus genetic variants are associated with multiple sclerosis.
Epstein-Barr virus genetic variants are associated with multiple sclerosis.
 
2014 11 03_bioinformatics_case_studies
2014 11 03_bioinformatics_case_studies2014 11 03_bioinformatics_case_studies
2014 11 03_bioinformatics_case_studies
 
Msb201158
Msb201158Msb201158
Msb201158
 
Dr. Jessica Mar to Speak
Dr. Jessica Mar to SpeakDr. Jessica Mar to Speak
Dr. Jessica Mar to Speak
 
gene mapping, clonning of disease gene(1).pptx
gene mapping, clonning of disease gene(1).pptxgene mapping, clonning of disease gene(1).pptx
gene mapping, clonning of disease gene(1).pptx
 
Identification of disease genes
Identification of disease genesIdentification of disease genes
Identification of disease genes
 
Genetic engineering and biotechnology 2016
Genetic engineering and biotechnology 2016Genetic engineering and biotechnology 2016
Genetic engineering and biotechnology 2016
 
Recombinant Dna technology, Restriction Endonucleas and Vector
Recombinant Dna technology, Restriction Endonucleas and Vector Recombinant Dna technology, Restriction Endonucleas and Vector
Recombinant Dna technology, Restriction Endonucleas and Vector
 
Microarrays;application
Microarrays;applicationMicroarrays;application
Microarrays;application
 
Dra. Mary Reilly - 'Neuropatías periféricas hereditarias'
Dra. Mary Reilly - 'Neuropatías periféricas hereditarias' Dra. Mary Reilly - 'Neuropatías periféricas hereditarias'
Dra. Mary Reilly - 'Neuropatías periféricas hereditarias'
 
Epigenetics 2013
Epigenetics 2013Epigenetics 2013
Epigenetics 2013
 
basic concept of molecular pathology
basic concept of molecular pathologybasic concept of molecular pathology
basic concept of molecular pathology
 
Genetic disorders 1
Genetic disorders 1Genetic disorders 1
Genetic disorders 1
 
Una revisión de los conocimientos fundamentales de la biología de la célula. ...
Una revisión de los conocimientos fundamentales de la biología de la célula. ...Una revisión de los conocimientos fundamentales de la biología de la célula. ...
Una revisión de los conocimientos fundamentales de la biología de la célula. ...
 

Plus de Dan Gaston

Plus de Dan Gaston (11)

Population and evolutionary genetics 1
Population and evolutionary genetics 1Population and evolutionary genetics 1
Population and evolutionary genetics 1
 
2016 ngs health_lecture
2016 ngs health_lecture2016 ngs health_lecture
2016 ngs health_lecture
 
Human genetics evolutionary genetics
Human genetics   evolutionary geneticsHuman genetics   evolutionary genetics
Human genetics evolutionary genetics
 
Genomics, Bioinformatics, and Pathology
Genomics, Bioinformatics, and PathologyGenomics, Bioinformatics, and Pathology
Genomics, Bioinformatics, and Pathology
 
2015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and22015 Bioc4010 lecture1and2
2015 Bioc4010 lecture1and2
 
2016 Dal Human Genetics - Genomics in Medicine Lecture
2016 Dal Human Genetics - Genomics in Medicine Lecture2016 Dal Human Genetics - Genomics in Medicine Lecture
2016 Dal Human Genetics - Genomics in Medicine Lecture
 
Bioc4700 2014 Guest Lecture
Bioc4700   2014 Guest LectureBioc4700   2014 Guest Lecture
Bioc4700 2014 Guest Lecture
 
Protein Evolution: Structure, Function, and Human Health
Protein Evolution: Structure, Function, and Human HealthProtein Evolution: Structure, Function, and Human Health
Protein Evolution: Structure, Function, and Human Health
 
Bioc4010 sample questions
Bioc4010 sample questionsBioc4010 sample questions
Bioc4010 sample questions
 
Bioc4010 lectures 1 and 2
Bioc4010 lectures 1 and 2Bioc4010 lectures 1 and 2
Bioc4010 lectures 1 and 2
 
Dgaston dec-06-2012
Dgaston dec-06-2012Dgaston dec-06-2012
Dgaston dec-06-2012
 


Bioinformatics in Gene Research

  • 1. Bioinformatics in Genetics Research
      • Genetics Noon Symposium Series
      • Daniel Gaston, PhD
      • Dr. Karen Bedard Lab, Department of Pathology
      • November 21st, 2012
  • 2. IGNITE
      • Orphan Diseases: Identifying Genes and Novel Therapeutics to Enhance Treatment
      • Identify causative genetic variations in orphan diseases with an emphasis on Atlantic Canada
      • Develop animal and cell culture models
      • Identify and develop novel therapeutics
      • igniteproject.ca
  • 4. Outline
      • Introduction
          • Bioinformatics in Disease Genomics
          • Next-Generation Sequencing
      • Genomics in Research and the Clinic
      • The Data Deluge and its Solutions
          • Bioinformatic Methods for Analyzing Genomic Data
      • Case Studies
      • Conclusion
  • 5. Bioinformatics in Disease Genomics
      • Handling and long-term storage of raw data (sequencing, gene expression, etc.)
      • Maintenance and support of computational infrastructure
      • Experimental design
      • Data analysis
      • Methods development
          • Analysis pipelines
          • Statistical analyses
          • Algorithm design
  • 14. Disease Genomics: Hunting Down Pathogenic Genetic Variation
      • [Figure: reference gene (Exon 1, Intron 1, Exon 2) with Start, splice sites, and a TAA stop codon, giving the mRNA coding for protein; patient gene (Exon 1, Intron 1, Exon 2) annotated TAC (Tyr)]
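
The figure labels on this slide (TAA, Stop, TAC, Tyr) suggest a stop codon in the reference being read as tyrosine in the patient. The sketch below shows how such a single-codon change might be classified. It assumes the figure depicts a TAA to TAC substitution; the function name `classify_codon_change` and the deliberately tiny codon table are illustrative and not part of the talk.

```python
# Partial codon table: only the codons needed for this illustration.
# A real analysis would use the full standard genetic code.
CODON_TABLE = {
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
    "TAC": "Tyr", "TAT": "Tyr",
}

def classify_codon_change(ref_codon, alt_codon):
    """Classify the effect of replacing ref_codon with alt_codon."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if ref_aa == "Stop":
        return "stop loss"
    if alt_aa == "Stop":
        return "stop gain"
    return "amino acid change"

# Assumed reading of the slide: reference TAA (Stop), patient TAC (Tyr).
effect = classify_codon_change("TAA", "TAC")
print(effect)
```

A stop loss lets translation continue past the normal end of the protein, which is why this class of variant is usually treated as high impact.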
  • 16. Disease Genomics: Research vs Clinic
      • Still predominantly research oriented
          • Complex/Common disease
          • Mendelian disorders
          • Cancer genomics
      • Clinical genomics starting to gain traction
          • Cancer genomics
              • Cancer subtype identification
              • Personalized medicine and predicting outcomes
          • Mendelian disorders
              • Early diagnosis
              • Cost effectiveness
  • 17. Clinical Genomics
      • Children’s Mercy Hospital NICU
      • In the US >20% of infant deaths due to genetic disease
      • Serial sequencing of candidate genes too slow
  • 19. Children’s Mercy Hospital NICU
      • 50-hour differential diagnosis of monogenic disease
          • Sample preparation and sequencing: 30.5 hours
          • Automated bioinformatics analysis: 17.5 hours
          • Previous high-throughput sequencing methods: 19 days
      • Tested on seven infants: two previously diagnosed using standard methods, five undiagnosed
      • Caveats
          • Bioinformatics portion not available outside of hospital
          • Requires thorough clinical phenotyping using a controlled vocabulary
          • Generates a large amount of data
  • 20. The Data Deluge
      • 4 million genetic variants
      • 2 million associated with protein-coding genes
      • 10,000 possibly of disease-causing type
      • 1,500 with <1% frequency in population
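
To make the scale of this reduction concrete, the short sketch below expresses each level of the funnel as a fraction of the starting 4 million variants. The counts come from the slide; the names `FUNNEL` and `funnel_fractions` are illustrative.

```python
# Counts taken from the slide; labels paraphrase the slide text.
FUNNEL = [
    ("all genetic variants", 4_000_000),
    ("associated with protein-coding genes", 2_000_000),
    ("possibly of disease-causing type", 10_000),
    ("<1% frequency in population", 1_500),
]

def funnel_fractions(steps):
    """Return (label, count, fraction of the first step) for each step."""
    total = steps[0][1]
    return [(label, count, count / total) for label, count in steps]

fractions = funnel_fractions(FUNNEL)
for label, count, fraction in fractions:
    print(f"{count:>9,}  {fraction:9.4%}  {label}")
```

Even the last level leaves on the order of a thousand candidates per individual, so frequency filtering alone does not identify a causal variant.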
  • 21. Surviving the Data Deluge: Reducing the Search Space: Exome Sequencing
  • 22. Exome Sequencing
      • Exome: Portion of genome composed of protein-coding exons and functional RNA sequences
      • 1.5 - 2% of human genome (50 Mb)
      • >85% of monogenic diseases due to variants in exome
      • Complete exome sequencing: ~$1000/sample
  • 23. Caveats
      • Incomplete and non-uniform coverage of exome
          • Systematic bias (GC content)
          • Random sampling
      • Not all genetic variants amenable to discovery
          • Non-coding variants
          • Structural variants
  • 24. Surviving the Data Deluge: Bioinformatics
  • 25. Typical Bioinformatics Workflow
      • QC of Raw Data → Map to Reference → QC → Find Variants → QC → Annotate → Filter
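
As a toy illustration of the first stages of this workflow (QC of raw data, mapping, QC, variant finding), the sketch below runs three short reads against a made-up reference string. Everything in it is an assumption for illustration: the reference sequence, the quality and mismatch thresholds, and the brute-force mapping are stand-ins for real tools, and the annotate and filter stages are left out.

```python
REFERENCE = "ATGGCCTTAGGCTAAGCT"  # made-up reference sequence

def qc_raw(reads, min_quality=20):
    """QC of raw data: drop reads whose mean quality is below a threshold."""
    return [r for r in reads if r["quality"] >= min_quality]

def map_to_reference(reads, reference=REFERENCE):
    """Map each read to the ungapped position with the fewest mismatches."""
    mapped = []
    for read in reads:
        seq = read["seq"]
        best_pos, best_mismatches = None, len(seq) + 1
        for pos in range(len(reference) - len(seq) + 1):
            window = reference[pos:pos + len(seq)]
            mismatches = sum(a != b for a, b in zip(seq, window))
            if mismatches < best_mismatches:
                best_pos, best_mismatches = pos, mismatches
        mapped.append({**read, "pos": best_pos, "mismatches": best_mismatches})
    return mapped

def qc_mapping(mapped, max_mismatches=1):
    """QC after mapping: drop unmapped or poorly matching reads."""
    return [m for m in mapped
            if m["pos"] is not None and m["mismatches"] <= max_mismatches]

def find_variants(mapped, reference=REFERENCE):
    """Report (position, reference base, observed base) for each mismatch."""
    variants = set()
    for m in mapped:
        for offset, base in enumerate(m["seq"]):
            ref_base = reference[m["pos"] + offset]
            if base != ref_base:
                variants.add((m["pos"] + offset, ref_base, base))
    return sorted(variants)

reads = [
    {"seq": "GGCTAC", "quality": 30},  # one mismatch against the reference
    {"seq": "TTTTTT", "quality": 35},  # good quality, but does not map well
    {"seq": "ATGGCC", "quality": 10},  # fails the raw-data QC
]
variants = find_variants(qc_mapping(map_to_reference(qc_raw(reads))))
print(variants)
```

Each QC step here discards data before the next stage sees it, which is the role the QC boxes play between stages in the workflow.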
  • 30. It sounds simple but…
      • For every stage there are multiple programs available and published in the literature
      • For every program there is a wide variety of parameter values and options. Defaults are often “good enough”, but not always
      • The best combinations of programs and options are not well understood
      • Protocols change rapidly as new technologies and methods are developed
      • Different centres and groups use slightly different workflows, with similar but not identical results
  • 33. If a problem cannot be solved, enlarge it. --Dwight D. Eisenhower
  • 34. Annotations Associated with Genomic Variants
      • Is variant in a known protein-coding gene?
          • What does the gene do?
          • What molecular pathways?
          • What protein-protein interactions?
          • What tissues is it expressed in?
          • When in development?
      • Has this variant been seen before?
          • What population(s)? With what frequency?
          • Has it been seen in local sequencing projects?
          • Is there any known clinical significance?
      • What is the effect of the variation?
          • Does it change the resulting protein? How?
      • [Figure: variant funnel: 4 million genetic variants; 2 million associated with protein-coding genes; 10,000 possibly of disease-causing type; 1,500 with <1% frequency in population]
  • 37. Potential Pitfalls with Annotation Sources
      • Databases often overlap and agree, but there may be disagreements
      • Source of information: Predicted versus experimental
      • Incorrect and out-of-date information
      • Large-scale un-validated versus manually curated datasets
  • 38. Bioinformatics Analyses of Genomic Variants: Combining Data Sources and Filtering
  • 39. IGNITE Data Pipeline and Integration
      • [Figure: pipeline diagram. Labels: Gene Annotations; Annotated Genomic Variants; Mapped Region(s); Gene Definitions; Known Genes; Pathway and Interactions; stages: Filter, Sort, Prioritize]
  • 40. Filtering the Data: Categorization
      • [Figure: decision tree categorizing 4 million variants. Node labels: Intronic; Exonic; Intergenic; Unknown; Splice Site; Silent Mutation; Amino Acid Changing; Potential Disease Causing; Known Genetic Disease Variant; Stop Loss / Stop Gain; Amino Acid Change Likely Pathogenic; Amino Acid Change Likely Benign; Known Polymorphism in Population]
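
The categories on this slide can be read as a decision procedure over annotated variants. The sketch below is one possible arrangement, not the tree from the slide, whose branch structure is not recoverable from the transcript. The field names (`region`, `splice_site`, `effect`, and so on) and the order of the checks are assumptions made for illustration.

```python
def categorize(variant):
    """Return a coarse category label for one annotated variant (a dict)."""
    if variant.get("known_disease_variant"):
        return "known genetic disease variant"
    if variant.get("known_polymorphism"):
        return "known polymorphism in population"
    region = variant["region"]  # "intergenic", "intronic" or "exonic"
    if region == "intergenic":
        return "unknown"
    if variant.get("splice_site"):
        return "splice site"
    if region == "intronic":
        return "unknown"
    effect = variant.get("effect")  # only meaningful for exonic variants
    if effect == "silent":
        return "silent mutation"
    if effect in ("stop_loss", "stop_gain"):
        return "stop loss / stop gain"
    if effect == "missense":
        if variant.get("predicted_pathogenic"):
            return "amino acid change likely pathogenic"
        return "amino acid change likely benign"
    return "unknown"

# Categories treated here as potentially disease causing (an assumption).
POTENTIALLY_DISEASE_CAUSING = {
    "known genetic disease variant",
    "splice site",
    "stop loss / stop gain",
    "amino acid change likely pathogenic",
}

example = {"region": "exonic", "effect": "stop_loss"}
print(categorize(example), categorize(example) in POTENTIALLY_DISEASE_CAUSING)
```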
  • 41. Filtering the Data: Common or Rare?
      • Variants in dbSNP: typically known polymorphisms, unlikely to be associated with rare disease
      • Variants with relatively high frequency in control populations (1000 Genomes, HapMap, EVS, 2800 Exomes)
      • Number of times variant previously seen at sequencing centre/locally
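
The three criteria on this slide can be combined into a single rarity check, sketched below. The field names, the 1% frequency cutoff, and the local-count cutoff are assumptions for illustration; the talk does not state the thresholds the IGNITE pipeline uses. Excluding every dbSNP variant outright is a hard filter, chosen here only to keep the example short.

```python
def is_rare(variant, max_frequency=0.01, max_local_count=2):
    """Return True if a variant passes all three 'common or rare' checks."""
    # 1. Present in dbSNP: treated as a known polymorphism.
    if variant.get("in_dbsnp", False):
        return False
    # 2. Too frequent in any control population.
    frequencies = variant.get("control_frequencies", {})
    if any(freq > max_frequency for freq in frequencies.values()):
        return False
    # 3. Seen too often in locally sequenced samples.
    return variant.get("local_count", 0) <= max_local_count

# Toy variants with made-up identifiers and frequencies.
candidates = [
    {"id": "var1", "in_dbsnp": True},
    {"id": "var2", "control_frequencies": {"1000 Genomes": 0.12, "EVS": 0.09}},
    {"id": "var3", "control_frequencies": {"1000 Genomes": 0.001}, "local_count": 9},
    {"id": "var4", "control_frequencies": {"1000 Genomes": 0.0005}, "local_count": 1},
]
rare_ids = [v["id"] for v in candidates if is_rare(v)]
print(rare_ids)
```

Keeping the thresholds as parameters makes it easy to loosen the filter when a strict pass returns no plausible candidates.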
  • 45. Notes on Filtering and Variant Annotation
      • It is very important to be aware of the population when referencing the frequency of a variant. An incorrect background population leads to incorrect assumptions about prevalence
      • Reasonably well-sampled local populations are better than any other reference
      • Strike a balance between hard filtering for variants of largest potential effect and being inclusive enough not to miss variants
      • Some genes frequently acquire large-effect variants (stop loss / stop gain, etc.). Some genes can be lost without causing disease
  • 46. Applications to Real Data: Charcot-Marie-Tooth Disease and Cutis Laxa
  • 48. Charcot-Marie-Tooth: Genetic Mapping
      • Chromosome 9: 120,962,282 - 133,033,431
  • 49. Cutis Laxa: Genetic Mapping
      • Chromosome 17: 79,596,811 - 81,041,077
  • 50. Known genes
      • Charcot-Marie-Tooth
          • 143 genes in region
          • 13 known genes in genome: MPZ, PMP22, GDAP1, KIF1B, MFN2, SOX, EGR2, DNM2, RAB7, LITAF (SIMPLE), GARS, YARS, LMNA
      • Cutis Laxa
          • 52 genes in region
          • 5 known genes in genome: ATP6V0A2, ELN, FBLN5, EFEMP2, SCYL1BP1, ALDH18A1
  • 51. Pathway and Interaction Data
      • Charcot-Marie-Tooth: 37 pathways
          • Clathrin-derived vesicle budding
          • Lysosome vesicle biogenesis
          • Endocytosis
          • Golgi-associated vesicle biogenesis
          • Membrane trafficking
          • Trans-Golgi network vesicle budding
          • Primarily LMNA or DNM2
      • Cutis Laxa: 10 pathways
          • Phagosome
          • Collecting duct acid secretion
          • Lysosome
          • Protein digestion and absorption
          • Metabolic pathways
          • Oxidative phosphorylation
          • Arginine and proline metabolism
          • Primarily ATP6V0A2
  • 52. Results: Charcot-Marie-Tooth
      • 8 genes prioritized

        Gene       Interactions   Pathway
        LRSAM1     Multiple       Endocytosis
        DNM1       DNM2           -
        FNBP1      DNM2           -
        TOR1A      LMNA           -
        STXBP1     Multiple       Five
        SH3GLB2    -              Endocytosis
        PIP5KL1    -              Endocytosis
        FAM125B    -              Endocytosis

      • For more information: Guernsey et al (2010) PLoS Genetics. 6(8): e1001081
  • 53. Results: Cutis Laxa
      • 10 genes prioritized

        Gene       Interactions   Pathway
        HEXDC      Multiple       Phagosome
        HG5        -              Phagosome
        HG5        Multiple       Lysosome, Protein digestion
        SIRT7      Multiple       Metabolic Pathways
        FASN       -              Metabolic Pathways
        DCXR       -              Metabolic Pathways
        PYCR1      -              Metabolic Pathways, Arginine/Proline
        PCYT2      -              Metabolic Pathways
        ARHGDIA    -              Oxidative Phosphorylation

      • For more information: Guernsey et al (2009) Am J Hum Genet. 85(1): 120-9
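
The case studies above combine three kinds of evidence: position inside the mapped region, interactions with known disease genes, and shared pathways. The sketch below shows one way such a prioritization could be scored. Only the chromosome 9 interval and the known-gene and pathway names come from the slides. The candidate genes `GENE_A` to `GENE_D`, their coordinates, interactions, and pathways are invented placeholders, and the additive score is an assumption rather than the method used in the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    interactors: set = field(default_factory=set)
    pathways: set = field(default_factory=set)

def in_region(gene, chrom, start, end):
    """True if the gene overlaps the mapped interval on the same chromosome."""
    return gene.chrom == chrom and gene.start <= end and gene.end >= start

def prioritize(genes, region, known_genes, known_pathways):
    """Rank in-region genes by shared interactions and pathways."""
    chrom, start, end = region
    ranked = []
    for gene in genes:
        if not in_region(gene, chrom, start, end):
            continue
        shared_interactors = sorted(gene.interactors & known_genes)
        shared_pathways = sorted(gene.pathways & known_pathways)
        score = len(shared_interactors) + len(shared_pathways)
        if score > 0:
            ranked.append((score, gene.name, shared_interactors, shared_pathways))
    ranked.sort(key=lambda row: (-row[0], row[1]))
    return ranked

# Mapped interval for Charcot-Marie-Tooth, as given on the mapping slide.
CMT_REGION = ("chr9", 120_962_282, 133_033_431)

# Placeholder candidates with invented coordinates and annotations.
genes = [
    Gene("GENE_A", "chr9", 121_000_000, 121_050_000, {"DNM2"}, {"Endocytosis"}),
    Gene("GENE_B", "chr9", 125_000_000, 125_020_000, set(), {"Endocytosis"}),
    Gene("GENE_C", "chr9", 100_000_000, 100_010_000, {"DNM2", "LMNA"}, set()),
    Gene("GENE_D", "chr9", 130_000_000, 130_005_000),
]
ranked = prioritize(genes, CMT_REGION, {"DNM2", "LMNA"}, {"Endocytosis"})
for score, name, interactors, pathways in ranked:
    print(score, name, interactors, pathways)
```

`GENE_C` has the strongest interaction evidence but is dropped because it lies outside the mapped interval, which mirrors how genetic mapping narrows the candidate list before pathway data is consulted.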
  • 55. Conclusions
      • Bioinformatics is involved at every stage of genomic research from experimental design through to final analysis
      • Standards and best practices do exist, but are rapidly evolving as new technologies and methods are developed
      • Progress towards automatic generation of clinically interpretable genomics studies
      • Annotation, filtering, and prioritization of genetic variants crucial
      • Balance between false positive calls and false negatives
  • 56. Where Are We Headed?
      • Integration of more data sources
          • Gene expression
      • More annotation sources
          • Controlled phenotype vocabularies
          • Gene Ontology terms
      • Predictive models
          • Recessive versus Dominant inheritance and Penetrance
      • “New” and Emerging Technologies
          • RNA-Seq (Gene Expression)
          • ChIP-Seq (Protein-DNA binding)
          • Single-Molecule Sequencing
  • 57. Acknowledgements
      • Dalhousie University
          • Dr. Karen Bedard
          • Dr. Chris McMaster
          • Dr. Andrew Orr
          • Dr. Conrad Fernandez
          • Dr. Marissa Leblanc
          • Dr. Sarah Dyack
          • Mat Nightingale
          • Dr. Johane Robataille
      • McGill/Genome Quebec
          • Dr. Jacek Majewski
          • Jeremy Schwartzentruber
      • Bedard Lab
      • Genome Atlantic
      • IGNITE