Expanding the Clinical Phenotype Space with Semantics and Model Systems
Melissa Haendel
March 14th, 2014
Updates in Clinical Genetics 2014
Outline
• Issues in candidate prioritization
• Computational techniques for comparing phenotypes
• Undiagnosed Disease Program semantic phenotyping
• Minimum phenotype requirements
• Tools leveraging phenotypes
The Challenge: Interpretation of Disease Candidates
• What’s in the box?
• How are candidates identified?
• How do they compare?
[Diagram: genotype info (G1, G2, G3, G4, …) and phenotypes (P1, P2, P3, …) go into a box marked "?"; out come prioritized candidates and models (M1, M2, M3, M4, ...) for functional validation.]
Pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics, …
Candidate gene prioritization
[Diagram: a grid with columns for phenotypic information, genetic information, and gene/gene product info, and rows for biomedical knowledge and the individual's information.]
Individual's information:
• Phenotypes collected for individual patients
• Sequences from an individual, family, or related group
Biomedical knowledge:
• Community phenotype data (e.g. literature, MODs, KOMP2, OMIM, EHRs, GWAS, ClinVar, disease-specific repositories, etc.)
• Human reference sequences (e.g. reference sequence, 1K genome data, genomic location)
• Pathway
• Functional (GO)
• Gene expression, OMICS data
• Protein-protein interactions
• Orthologs
Analysis methods:
• Phenotypic comparison methods
• Variant calling (e.g. GATK)
• Pathogenicity/impact calling (e.g. VAAST, SIFT)
• Enrichment analysis (e.g. GATACA, Galaxy)
• Network module analysis
• Combined variant + phenotype candidate reporting (e.g. Exomiser)
• Candidate interpretation
Models recapitulate various phenotypic aspects of disease
GENOTYPE (human): ALMS1 (NM_015120.4) [c.10775delC] + [-]
PHENOTYPE: obesity, diabetes mellitus, insulin resistance
GENOTYPE (mouse): B6.Cg-Alms1foz/fox/J
PHENOTYPE: increased weight, adipose tissue volume, glucose homeostasis altered
GENOTYPE (zebrafish): kcnj11c14/c14; insrt143/+(AB)
PHENOTYPE: increased food intake, hyperglycemia, insulin resistance
OMIM Query # Records
“large bone” 785
“enlarged bone” 156
“big bone” 16
“huge bones” 4
“massive bones” 28
“hyperplastic bones” 12
“hyperplastic bone” 40
“bone hyperplasia” 134
“increased bone growth” 612
Searching for phenotypes using
text alone is insufficient
Problem: Clinical and model
phenotypes are described differently
“Expanding” the phenotypic coverage
of the human genome
[Figure: bar chart; y-axis: % human coding genes (0% to 100%); bars: OMIM, OMIM+GWAS; legend: Ortholog only, Human+Ortholog, Human only]
Five model organisms (mouse, zebrafish, fly, yeast, rat)
provide almost 80% phenotypic coverage of the human
genome
How can we take advantage of this model organism phenotype data?
Using ontologies to compare phenotypes
across species
Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009).
Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS
Biol, 7(11). doi:10.1371/journal.pbio.1000247
What is an ontology?
A set of logically defined, inter-related terms
used to annotate data
Use of common or logically related terms across
databases enables integration
Relationships between terms allow annotations to
be grouped in scientifically meaningful ways
Reasoning software enables computation of inferred
knowledge
Groups of annotations can be compared using
semantic similarity algorithms
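As a rough illustration of how relationships between terms let annotations be grouped, the sketch below rolls annotations up a tiny is_a hierarchy. The term labels echo the Mammalian Phenotype examples in this deck, but the edges and the genotype identifiers are assumptions made for illustration, not the real ontology structure or real annotation data.

```python
from collections import defaultdict

# Toy is_a edges (child -> parents). Labels echo terms shown in this deck;
# the edges themselves are assumed for illustration.
IS_A = {
    "abnormal pancreatic beta cell mass": ["abnormal pancreatic beta cell morphology"],
    "abnormal pancreatic beta cell morphology": ["abnormal pancreatic islet morphology"],
    "abnormal pancreatic alpha cell number": ["abnormal pancreatic alpha cell morphology"],
    "abnormal pancreatic alpha cell morphology": ["abnormal pancreatic islet morphology"],
    "abnormal pancreatic islet morphology": ["abnormal endocrine pancreas morphology"],
    "abnormal endocrine pancreas morphology": [],
}


def ancestors(term):
    """Return the term itself plus every term reachable through is_a."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A.get(current, []))
    return seen


def group_by_term(annotations):
    """Map each term to the items annotated to it or to any of its subclasses."""
    grouped = defaultdict(set)
    for item, term in annotations:
        for ancestor in ancestors(term):
            grouped[ancestor].add(item)
    return grouped


# Hypothetical genotype identifiers, not real data.
ANNOTATIONS = [
    ("genotype_A", "abnormal pancreatic beta cell mass"),
    ("genotype_B", "abnormal pancreatic alpha cell number"),
]

grouped = group_by_term(ANNOTATIONS)
# Both genotypes are found under the shared, more general term.
print(sorted(grouped["abnormal pancreatic islet morphology"]))
```

A query for the general term returns both genotypes even though neither was annotated with it directly, which is the grouping behavior the slide describes.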
An ontology provides the logical
basis of classification
Any sense organ that functions in the
detection of smell is an olfactory sense
organ
[Diagram: classifying "nose". Asserted: nose is_a sense organ; nose capable_of some detection of smell. Defined: olfactory sense organ = sense organ that is capable_of some detection of smell. Inferred: nose is_a olfactory sense organ.]
=> These are necessary and sufficient conditions
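The classification step on this slide can be sketched in a few lines. This is a deliberately simplified stand-in for an OWL reasoner, assuming each class carries a flat set of asserted superclasses and capabilities; the "eye" entry is an invented contrast case, and the function name is hypothetical.

```python
# Asserted facts: direct superclasses and capabilities of each class.
# "nose" follows the slide; "eye" is an invented contrast case.
ASSERTED = {
    "nose": {"is_a": {"sense organ"}, "capable_of": {"detection of smell"}},
    "eye": {"is_a": {"sense organ"}, "capable_of": {"detection of light"}},
}

# Definitions given as necessary and sufficient conditions: anything that
# meets every condition is classified under the defined class.
DEFINED = {
    "olfactory sense organ": {
        "is_a": {"sense organ"},
        "capable_of": {"detection of smell"},
    },
}


def infer_superclasses(name):
    """Return the defined classes whose conditions the named class satisfies."""
    facts = ASSERTED[name]
    return {
        defined
        for defined, conditions in DEFINED.items()
        if conditions["is_a"] <= facts["is_a"]
        and conditions["capable_of"] <= facts["capable_of"]
    }


print(infer_superclasses("nose"))
print(infer_superclasses("eye"))
```

Nobody asserted that a nose is an olfactory sense organ; the classification follows from the definition. A real reasoner also handles transitive is_a chains, nested expressions, and consistency checking, none of which this sketch attempts.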
Representing phenotypes
Human Phenotype Ontology
Used to annotate:
• Patients
• Disorders
• Genotypes
• Genes
• Sequence variants
In human
[Diagram: HPO hierarchy around pancreatic islet phenotypes. Terms shown: Abnormality of endocrine pancreas physiology; Abnormality of exocrine pancreas physiology; Abnormality of pancreatic islet cells; Reduced pancreatic beta cells; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas]
Köhler et al. The Human Phenotype Ontology project: linking molecular biology and
disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
Mammalian Phenotype Ontology
Smith et al. (2005). The Mammalian Phenotype Ontology as a
tool for annotating, analyzing and comparing phenotypic
information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7
Used to annotate and
query:
• Genotypes
• Alleles
• Genes
In mice
[Diagram: MP hierarchy around pancreatic islet phenotypes. Terms shown: abnormal endocrine pancreas morphology; abnormal pancreatic islet morphology; abnormal pancreatic beta cell morphology; abnormal pancreatic beta cell mass; abnormal pancreatic beta cell differentiation; abnormal pancreatic alpha cell morphology; abnormal pancreatic alpha cell differentiation; abnormal pancreatic alpha cell number]
Post-composed models of
phenotype annotation
Entity | Quality
Anatomy: head | Small size
Anatomy: heart | Edematous
Anatomy: ventral mandibular arch | Thick
Gene Ontology: swim bladder inflation | Arrested
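One possible way to hold post-composed annotations in code is as entity plus quality records, sketched below. The class and function names are hypothetical, and pairing each entity with the quality in the same position on the slide is an assumption about the slide layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EQAnnotation:
    """A post-composed phenotype: an entity term combined with a quality term."""
    entity_ontology: str
    entity: str
    quality: str


# Entity/quality pairs taken in slide order (the pairing is assumed).
ANNOTATIONS = [
    EQAnnotation("Anatomy", "head", "small size"),
    EQAnnotation("Anatomy", "heart", "edematous"),
    EQAnnotation("Anatomy", "ventral mandibular arch", "thick"),
    EQAnnotation("Gene Ontology", "swim bladder inflation", "arrested"),
]


def select(annotations, entity_ontology=None, quality=None):
    """Filter annotations by the ontology of the entity and/or by the quality."""
    return [
        a for a in annotations
        if (entity_ontology is None or a.entity_ontology == entity_ontology)
        and (quality is None or a.quality == quality)
    ]


print([a.entity for a in select(ANNOTATIONS, entity_ontology="Anatomy")])
print([a.entity for a in select(ANNOTATIONS, quality="arrested")])
```

Because the entity and the quality stay separate, a query can slice on either part without a pre-built term for every combination.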
A human phenotype example
[Diagram: Vitreous hemorrhage in the HPO hierarchy. Other terms shown: Hemorrhage of the eye; Internal hemorrhage; Abnormality of the globe; Abnormal eye morphology; Abnormal eye physiology; Abnormality of the eye; Abnormality of blood circulation; Abnormality of the cardiovascular system]
Problem: Data silos
[Diagram: the lung as represented in four separate resources (Mammalian Phenotype, Mouse Anatomy, FMA, Human development), each with its own terms and relations.]
Terms shown: lung; lobular organ; parenchymatous organ; solid organ; pleural sac; thoracic cavity organ; thoracic cavity; abnormal lung morphology; abnormal respiratory system morphology; abnormal pulmonary acinus morphology; abnormal pulmonary alveolus morphology; lung alveolus; organ system; respiratory system; lower respiratory tract; alveolar sac; pulmonary acinus; lung bud; respiratory primordium; pharyngeal region
Relations: develops_from, part_of, is_a (SubClassOf), surrounded_by
Solution: bridging semantics
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative
multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
[Diagram: bridging classes linking the species-specific ontologies.]
Terms shown: anatomical structure; organ; organ part; respiration organ; lung; lung bud; lung primordium; respiratory primordium; foregut; endoderm of foregut; endoderm; alveolus; alveolus of lung; alveolar sac; pulmonary acinus; swim bladder
External terms: FMA:lung; MA:lung; MA:lung alveolus; FMA:pulmonary alveolus; EHDAA:lung bud; GO: respiratory gaseous exchange; NCBITaxon: Mammalia; NCBITaxon: Actinopterygii
Relations: is_a (taxon equivalent), develops_from, part_of, is_a (SubClassOf), capable_of, only_in_taxon
Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with
gene annotations for biomedical research F1000Research 2014, 2:30
Phenotype representation requires
more than “phenotype ontologies”
Gene Ontology: glucose metabolism (GO:0006006); gene/protein function data
Chemical: glucose (CHEBI:17234), pyruvate (CHEBI:15361); metabolomics, toxicogenomics data
Disease: type II diabetes mellitus (DOID:9352); disease & phenotype data
Cell: pancreatic beta cell (CL:0000169); transcriptomic data
OWLsim: Phenotype similarity
across patients or organisms
Patient phenotype | Model phenotype | Shared, more general phenotype
Unstable posture | poor rotarod performance | abnormal coordination
Constipation | decreased gut peristalsis | abnormal digestive physiology
Neuronal loss in Substantia Nigra | axon degeneration | CNS neuron degeneration
Shuffling gait | decreased stride length | abnormal locomotion
Resting tremors | stereotypic behavior | abnormal motor function
REM disorder | abnormal EEG | sleep disturbance
Hyposmia | failure to find food | abnormal olfaction
https://code.google.com/p/owltools/wiki/OwlSim
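To make the idea of cross-species phenotype similarity concrete, here is a minimal sketch that scores two phenotype profiles by the information content (IC) of their most informative common ancestor, averaged over the patient's terms. It is not OWLSim's implementation: the metric choice, the toy hierarchy, and the term frequencies are all assumptions made for illustration, with term labels borrowed from the Parkinson's example on this slide.

```python
import math

# Toy is_a edges (child -> parents). Patient and model terms are placed under
# the more general term they appear alongside on the slide (assumed pairing).
PARENTS = {
    "shuffling gait": ["abnormal locomotion"],
    "decreased stride length": ["abnormal locomotion"],
    "constipation": ["abnormal digestive physiology"],
    "decreased gut peristalsis": ["abnormal digestive physiology"],
    "hyposmia": ["abnormal olfaction"],
    "failure to find food": ["abnormal olfaction"],
    "abnormal locomotion": ["phenotype"],
    "abnormal digestive physiology": ["phenotype"],
    "abnormal olfaction": ["phenotype"],
    "phenotype": [],
}

# Made-up fraction of annotated entities carrying each term or a subclass.
FREQUENCY = {
    "phenotype": 1.0,
    "abnormal locomotion": 0.08,
    "abnormal digestive physiology": 0.04,
    "abnormal olfaction": 0.01,
    "shuffling gait": 0.005,
    "decreased stride length": 0.005,
    "constipation": 0.005,
    "decreased gut peristalsis": 0.005,
    "hyposmia": 0.005,
    "failure to find food": 0.005,
}


def ancestors(term):
    """Return the term itself plus all of its is_a ancestors."""
    found = {term}
    for parent in PARENTS[term]:
        found |= ancestors(parent)
    return found


def information_content(term):
    """Rarer terms are more informative: IC = -log2(frequency)."""
    return -math.log2(FREQUENCY[term])


def mica_ic(term_a, term_b):
    """IC of the most informative common ancestor of two terms."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(information_content(t) for t in common)


def profile_similarity(patient, model):
    """Average, over patient terms, of the best match among the model terms."""
    return sum(max(mica_ic(p, m) for m in model) for p in patient) / len(patient)


patient = ["shuffling gait", "constipation", "hyposmia"]
matching_model = ["decreased stride length", "decreased gut peristalsis", "failure to find food"]
partial_model = ["decreased stride length"]

print(profile_similarity(patient, matching_model))
print(profile_similarity(patient, partial_model))
```

The model that shares a specific ancestor with every patient phenotype scores higher than the one that matches only the gait phenotype, even though no patient term and model term are textually identical.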
General exome analysis
Single Exome
Remove off-target and common variants, filter on predicted deleteriousness, candidate gene strategies
Prioritize based on known genes, allele frequency, and pathogenicity
Homozygous recessive, X-linked, De novo (if trio)
Undiagnosed Disease Program exome analysis
Family exome data
Prioritize based on alignment quality, allele frequency, predicted deleteriousness, and PubMed
Filter using SNP chip data, Mendelian models of inheritance, and population frequency
exome analysis
Remove off-target, common variants, and variants not in known disease-causing genes
Recessive, De novo filters
Zemojtel et al., manuscript submitted. http://compbio.charite.de/PhenIX/
Exomiser exome analysis
Remove off-target and common variants
Recessive, De novo filters
Robinson et al. http://genome.cshlp.org/content/early/2013/10/2
https://www.sanger.ac.uk/resources/databases/exomiser/
Current UDP analysis with semantic phenotyping
[Diagram: Family Exome Data → Filter using SNP chip data, Mendelian models of inheritance, and population frequency; filtered variants + Phenotype Data → Combined Score]
Benchmarking
1092 unaffected exomes + 28,516 disease-associated variants => 100,000 simulated exomes
• Annotate variants
• Remove off-target, synonymous, and common (>1% MAF) variants (plus optional inheritance model filtering)
• Prioritize based on combined score
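The filter-then-rank steps above can be sketched as follows. This is a minimal illustration, not the Exomiser's code: the `Variant` fields, the gene names, the scores, and especially the combined score (taken here as the plain mean of a variant score and a phenotype score) are assumptions, since the slide does not give the formula.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool       # falls inside the exome target region
    synonymous: bool
    maf: float            # minor allele frequency as a fraction (0.01 = 1%)
    variant_score: float  # 0..1, assumed to reflect pathogenicity and rarity


def passes_filters(variant, max_maf=0.01):
    """Drop off-target, synonymous, and common (>1% MAF) variants."""
    return variant.on_target and not variant.synonymous and variant.maf <= max_maf


def rank_genes(variants, phenotype_scores):
    """Rank genes by the best combined score among their surviving variants."""
    best = {}
    for v in variants:
        if not passes_filters(v):
            continue
        # Assumed combination rule: unweighted mean of the two scores.
        combined = (v.variant_score + phenotype_scores.get(v.gene, 0.0)) / 2
        best[v.gene] = max(best.get(v.gene, 0.0), combined)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Hypothetical input data.
variants = [
    Variant("GENE_A", True, False, 0.000, 0.90),
    Variant("GENE_B", True, False, 0.050, 0.99),  # too common, removed
    Variant("GENE_C", True, True, 0.000, 0.10),   # synonymous, removed
    Variant("GENE_D", True, False, 0.001, 0.95),
]
phenotype_scores = {"GENE_A": 0.8, "GENE_D": 0.1}

ranking = rank_genes(variants, phenotype_scores)
print([gene for gene, _ in ranking])
```

GENE_D has the more damaging variant, but GENE_A ranks first once phenotypic relevance is included, which is the effect the benchmarking measures.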
[Figure: bar chart; y-axis: % exomes with disease gene as top hit (0 to 100); x-axis: All diseases, Autosomal Dominant, Autosomal Recessive (hom), Autosomal Recessive (compound het); series: Variant, Phenotypic relevance, PHIVE]
Phenotype and variant data synergistically
improve exome interpretation
Results
• Correct gene as top scoring hit in 68.3% of exomes out of an average of 272 post-filtering candidate genes
• Improvement of between 1.8- and 5.1-fold in the percentage of candidate genes correctly ranked in first place compared to just using pathogenicity and frequency data
• Shows utility of structured phenotype data for computational analysis
UDP Experiment
[Diagram: UDP Diploid Aligned Cohort VCF file (18 families) → Mendelian Filters → Mendelian filtered files (per family); these plus Phenotype profiles → Exomiser, PhenIX, Phenotype only → VCF files with phenotype and variant scores (per family)]
Top de novo candidates for
patient 2543
Patient Exomiser Phenotype only PhenIX
UDP2543
STIM1, CYP2D6,
MUC5B
ITGA7, PLEC,
STIM1, PTGS1,
TTN
STIM1, RB1,
DLEC1, CHRNB4,
MUC5B, REPIN1,
NBPF8, GPRIN3,
TMEFF1, FLT3LG,
OSM, FZD10,
MUC12
Gene | Variant | MAF (ESP or 1000g) | Consequence | Predicted pathogenicity: SIFT, PolyPhen, MutTaster (0-1)
STIM1 | chr11:g.4045175A>T [0/1] | 0% | p.I115T | 1
UDP2543: phenotypic similarity
Columns: Patient; Stim1 het mouse; OMIM:612783 (IMMUNODEFICIENCY 10) - hom STIM1 mutations; OMIM:160565 (MYOPATHY, TUBULAR AGGREGATE) - het STIM1 mutations
Impaired platelet
aggregation
abnormal platelet
activation
Thrombocytopenia
Thrombocytopenia decreased platelet
cell number
Thrombocytopenia
Myopathy Myopathy Myopathy
Generalized
hypotonia
Muscular hypotonia Proximal muscle weakness
Petechiae increased
bleeding time
Autoimmune hemolytic
anemia
Delayed gross motor
development
Epistaxis
increased
bleeding time
Gower sign
STIM1
Suspected genetic
disease
DRG sequencing
Deep phenotyping
Top ranked
candidates
Clinical rounds
Exclude candidate
gene
Sanger validation
Cosegregation
studies
Diagnosis
Fails
Passes
Inconsistent
Consistent
Reconsider short list
Choose best
candidate
Variant Analysis
Annotation sources: HGMD, MAF (dbSNP, ESP), ClinVar
Prediction criteria: Predicted pathogenicity, Variant class, Location in DRG target region
Computational Phenotype Analysis
Ontology: HPO, MP
Annotation sources: OMIM, Orphanet, MGI...
Prediction criteria: Semantic similarity, Mode of inheritance
Proposed workflow for undiagnosed
diseases
What constitutes an
adequate phenotype
annotation for an
undiagnosed patient?
Defining minimum phenotype
standard:
1. Is the annotation specificity similar to or better than the
corpus of available phenotype data?
2. Is the number of annotations/patient similar or better?
3. How do the ontology and annotation set differ across
anatomical systems in terms of granularity? Does this
change specificity requirements for phenotypic profiles?
4. How does use of NOT annotations help further specify
the uniqueness of an undiagnosed patient?
5. How do onset, temporal ordering, and severity affect
specificity?
UDP phenotype annotation
metrics
Compared with the corpus, UDP annotations have a similar information content (IC) and a
larger average number of annotations per disease/patient
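The two metrics named here, information content and annotations per profile, can be computed along the following lines. This is a sketch under stated assumptions: IC is taken as -log2 of the fraction of corpus profiles annotated with a term or any of its subclasses, and the hierarchy, disease profiles, and patient profile are toy data rather than the real HPO or UDP annotations.

```python
import math

# Toy is_a edges (child -> parents); assumed, not the full HPO.
PARENTS = {
    "Generalized hypotonia": ["Muscular hypotonia"],
    "Muscular hypotonia": ["Phenotypic abnormality"],
    "Thrombocytopenia": ["Phenotypic abnormality"],
    "Epistaxis": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}

# Hypothetical corpus of disease phenotype profiles.
CORPUS = {
    "disease_1": ["Muscular hypotonia", "Thrombocytopenia"],
    "disease_2": ["Generalized hypotonia"],
    "disease_3": ["Thrombocytopenia"],
    "disease_4": ["Epistaxis"],
}


def ancestors(term):
    """Return the term itself plus all of its is_a ancestors."""
    found = {term}
    for parent in PARENTS[term]:
        found |= ancestors(parent)
    return found


def information_content(corpus):
    """IC per term: -log2(fraction of profiles carrying it or a subclass)."""
    counts = {}
    for terms in corpus.values():
        implied = set()
        for term in terms:
            implied |= ancestors(term)
        for term in implied:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log2(n / len(corpus)) for term, n in counts.items()}


def profile_metrics(terms, ic):
    """Return (number of annotations, mean IC of those annotations)."""
    return len(terms), sum(ic[t] for t in terms) / len(terms)


ic = information_content(CORPUS)
patient = ["Generalized hypotonia", "Thrombocytopenia", "Epistaxis"]
patient_metrics = profile_metrics(patient, ic)
corpus_metrics = [profile_metrics(terms, ic) for terms in CORPUS.values()]

print(patient_metrics)
print(corpus_metrics)
```

Comparing the patient tuple with the corpus tuples gives a quick check on the first two questions in the minimum standard: whether the profile is at least as specific, and at least as large, as what it will be compared against.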
Anatomical annotation distribution in
the corpus
Nervous system, skeletal system, and immune system are highest =>
these categories require greater specificity and numbers of annotations
Annotation specificity meter
What about common traits, like blue eyes or acne?
Making the patient phenotype profiles
as good as can be
Requests | Number | Examples
Total requests from UDP | 614 |
Number of requests assigned to HPO terms | 423 | Chronic limb pain -> limb pain
Number of terms that need consideration by UDP | 145 | Expressive language -> delay? Increase? Abnormal?
Number of requests that belong in other parts of the patient record | 68 | Abnormal aCGH 12q21.1-12q.2 (662 kb duplication) paternal origin -> move to genotype information portion of the record
It is a community effort to contribute requests to the ontologies and
quality profiling helps make our tools work better for everyone
Limitations and ongoing work
• Adding negation to the algorithm
• Temporal ordering of phenotypes
• Leveraging severity, expressivity, and penetrance data
Additional tools leveraging
structured phenotype data
The Monarch system
http://monarchinitiative.org
Monarch phenotype data
Species | Source | Unique genotypes/variants | Disease/phenotype associations
Mouse | MGI | 53,573 | 406,618
Zebrafish | ZFIN | 14,703 | 75,698
C. elegans | WormBase | 116,106 | 411,154
Fruit fly | FlyBase | 98,596 | 265,329
Human | OMIM | 26,372 | 27,798
Human | Orphanet | 2,872 | 5,095
Human | ClinVar | 62,437 | 178,424
Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD;
Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions;
Drugbank; AutDB; Allen Brain …157 sources to date
Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse
ModelCompare: How do the models
recapitulate the disease?
[Diagram: Late-onset Parkinson’s phenotypes (subset: Bradykinesia, Depression, Dysphagia, Lewy bodies) compared with mouse phenotypes for the genes Slc6a3, Dbh, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cox8a, and Th; gene groups are labeled Tyrosine metabolism and Cx IV.]
Network phenotype distribution
Phenotypes in common: Abnormal gait; ataxia; paralysis; Bradykinesia; Abnormal locomotion; Abnormality of central motor function
Finding collaborators for
functional validation
Patient
Phenotype profile
Phenotyping
experts
Exome Walker: Network based exploration
of phenotypically similar diseases
http://compbio.charite.de/ExomeWalker/
Walking the interactome for prioritization of candidate disease genes.
Am J Hum Genet. 2008 Apr;82(4):949-58. doi: 10.1016/j.ajhg.2008.02.013.
Bare Lymphocyte Syndrome Type 1 Protein-Interaction Network
• Exploits vicinity in the protein interaction network between phenotypically related diseases and uses this to rank exome candidates
• Large boost in rankings of candidate genes using 250 disease gene-families
• Prototype version online, manuscript in preparation
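One common way to score "vicinity" in an interaction network is a random walk with restart from known disease genes. The slide does not say which algorithm Exome Walker uses, so the choice of random walk with restart, the restart probability, and the toy network below are all assumptions made for illustration.

```python
def random_walk_with_restart(graph, seeds, restart=0.7, tol=1e-9, max_iter=1000):
    """Steady-state visiting probabilities for a walker that restarts at seeds.

    graph maps each node to a list of neighbors and is assumed to be
    undirected, with every node having at least one neighbor.
    """
    nodes = list(graph)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    current = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed gene.
        updated = {n: restart * start[n] for n in nodes}
        # Otherwise it moves to a uniformly chosen neighbor.
        for n in nodes:
            share = (1.0 - restart) * current[n] / len(graph[n])
            for neighbor in graph[n]:
                updated[neighbor] += share
        change = sum(abs(updated[n] - current[n]) for n in nodes)
        current = updated
        if change < tol:
            break
    return current


# Hypothetical network: SEED_1 and SEED_2 stand for genes of phenotypically
# related diseases; CAND_NEAR and CAND_FAR are exome candidates.
network = {
    "SEED_1": ["SEED_2", "CAND_NEAR"],
    "SEED_2": ["SEED_1", "CAND_NEAR"],
    "CAND_NEAR": ["SEED_1", "SEED_2", "LINKER"],
    "LINKER": ["CAND_NEAR", "CAND_FAR"],
    "CAND_FAR": ["LINKER"],
}

scores = random_walk_with_restart(network, seeds={"SEED_1", "SEED_2"})
candidates = sorted(["CAND_NEAR", "CAND_FAR"], key=scores.get, reverse=True)
print(candidates)
```

The candidate that interacts directly with both seed genes receives the higher score, so it would be ranked ahead of the candidate two steps further out.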
PhenoViz: Integrate all human, mouse, and
fish data to understand CNVs
Desktop application for differential diagnostics in CNVs
• Explain manifestations of CNV diseases based on genes contained in CNV
E.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin
• Double the number of explanations using model data
Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
Conclusions
• Cross-species phenotype data can be compared using semantic similarity
• Structured phenotype data for rare and undiagnosed disease patients can aid candidate evaluation
• We are experimenting with these methods for UDP patient phenotypes to aid candidate prioritization, identify models, explore mechanisms, and find collaborators
Acknowledgments
NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Neal Boerkoel, Cyndi Tifft, Bill Gahl
OHSU: Nicole Vasilevsky, Matt Brush
Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall
UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone
U of Pitt: Chuck Borromeo, Jeremy Espino, Harry Hochheiser
Sanger: Anika Oellrich, Jules Jacobsen, Damian Smedley
Toronto: Marta Girdea, Sergiu Dumitriu, Mike Brudno
JAX: Cynthia Smith
Charité: Sebastian Köhler, Sandra Doelken, Sebastian Bauer, Peter Robinson
Funding:
NIH Office of Director: 1R24OD011883
NIH-UDP: HHSN268201300036C

Contenu connexe

Tendances

Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...nist-spin
 
Sophie F. summer Poster Final
Sophie F. summer Poster FinalSophie F. summer Poster Final
Sophie F. summer Poster FinalSophie Friedheim
 
Bioinformatics as a tool for understanding carcinogenesis
Bioinformatics as a tool for understanding carcinogenesisBioinformatics as a tool for understanding carcinogenesis
Bioinformatics as a tool for understanding carcinogenesisDespoina Kalfakakou
 
Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An...
Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An...Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An...
Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An...John Blue
 
The Chills and Thrills of Whole Genome Sequencing
The Chills and Thrills of Whole Genome SequencingThe Chills and Thrills of Whole Genome Sequencing
The Chills and Thrills of Whole Genome SequencingEmiliano De Cristofaro
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101Ino de Bruijn
 
Clinical Validation of an NGS-based (CE-IVD) Kit for Targeted Detection of Ge...
Clinical Validation of an NGS-based (CE-IVD) Kit for Targeted Detection of Ge...Clinical Validation of an NGS-based (CE-IVD) Kit for Targeted Detection of Ge...
Clinical Validation of an NGS-based (CE-IVD) Kit for Targeted Detection of Ge...Thermo Fisher Scientific
 
Metagenomics sequencing
Metagenomics sequencingMetagenomics sequencing
Metagenomics sequencingcdgenomics525
 
Aug2015 deanna church analytical validation
Aug2015 deanna church analytical validationAug2015 deanna church analytical validation
Aug2015 deanna church analytical validationGenomeInABottle
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisdrelamuruganvet
 
What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...mhb120
 
Next generation sequencing in preimplantation genetic screening (NGS in PGS)
Next generation sequencing in preimplantation genetic screening (NGS in PGS)Next generation sequencing in preimplantation genetic screening (NGS in PGS)
Next generation sequencing in preimplantation genetic screening (NGS in PGS)Mahidol University, Thailand
 
Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...Sri Ambati
 
Clinical molecular diagnostics for drug guidance
Clinical molecular diagnostics for drug guidanceClinical molecular diagnostics for drug guidance
Clinical molecular diagnostics for drug guidanceNikesh Shah
 
Errors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingErrors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingNixon Mendez
 

Tendances (20)

Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat...
 
Ngs presentation
Ngs presentationNgs presentation
Ngs presentation
 
EU PathoNGenTraceConsortium:cgMLST Evolvement and Challenges for Harmonization
EU PathoNGenTraceConsortium:cgMLST Evolvement and Challenges for HarmonizationEU PathoNGenTraceConsortium:cgMLST Evolvement and Challenges for Harmonization
EU PathoNGenTraceConsortium:cgMLST Evolvement and Challenges for Harmonization
 
NGS and the molecular basis of disease: a practical view
NGS and the molecular basis of disease: a practical viewNGS and the molecular basis of disease: a practical view
NGS and the molecular basis of disease: a practical view
 
Sophie F. summer Poster Final
Sophie F. summer Poster FinalSophie F. summer Poster Final
Sophie F. summer Poster Final
 
Bioinformatics as a tool for understanding carcinogenesis
Bioinformatics as a tool for understanding carcinogenesisBioinformatics as a tool for understanding carcinogenesis
Bioinformatics as a tool for understanding carcinogenesis
 
Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An...
Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An...Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An...
Dr. Douglas Marthaler - Use of Next Generation Sequencing for Whole Genome An...
 
The Chills and Thrills of Whole Genome Sequencing
The Chills and Thrills of Whole Genome SequencingThe Chills and Thrills of Whole Genome Sequencing
The Chills and Thrills of Whole Genome Sequencing
 
20170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_10120170209 ngs for_cancer_genomics_101
20170209 ngs for_cancer_genomics_101
 
Clinical Validation of an NGS-based (CE-IVD) Kit for Targeted Detection of Ge...
Clinical Validation of an NGS-based (CE-IVD) Kit for Targeted Detection of Ge...Clinical Validation of an NGS-based (CE-IVD) Kit for Targeted Detection of Ge...
Clinical Validation of an NGS-based (CE-IVD) Kit for Targeted Detection of Ge...
 
Metagenomics sequencing
Metagenomics sequencingMetagenomics sequencing
Metagenomics sequencing
 
Testing for Food Authenticity
Testing for Food AuthenticityTesting for Food Authenticity
Testing for Food Authenticity
 
Aug2015 deanna church analytical validation
Aug2015 deanna church analytical validationAug2015 deanna church analytical validation
Aug2015 deanna church analytical validation
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysis
 
What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...What's In a Genotype?: An Ontological Characterization for the Integration of...
What's In a Genotype?: An Ontological Characterization for the Integration of...
 
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
Introduction to Metagenomics Data Analysis - UEB-VHIR - 2013
 
Next generation sequencing in preimplantation genetic screening (NGS in PGS)
Next generation sequencing in preimplantation genetic screening (NGS in PGS)Next generation sequencing in preimplantation genetic screening (NGS in PGS)
Next generation sequencing in preimplantation genetic screening (NGS in PGS)
 
Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...Next Generation Sequencing and its Applications in Medical Research - Frances...
Next Generation Sequencing and its Applications in Medical Research - Frances...
 
Clinical molecular diagnostics for drug guidance
Clinical molecular diagnostics for drug guidanceClinical molecular diagnostics for drug guidance
Clinical molecular diagnostics for drug guidance
 
Errors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation SequencingErrors and Limitaions of Next Generation Sequencing
Errors and Limitaions of Next Generation Sequencing
 

En vedette (16)

Presentation1
Presentation1Presentation1
Presentation1
 
Molecular basis of heterosis in crop plants
Molecular basis of heterosis in crop plantsMolecular basis of heterosis in crop plants
Molecular basis of heterosis in crop plants
 
Heterosis
Heterosis  Heterosis
Heterosis
 
Breeding methods for vegetables
Breeding  methods for  vegetablesBreeding  methods for  vegetables
Breeding methods for vegetables
 
Haploid
HaploidHaploid
Haploid
 
Heterosis concepts
Heterosis conceptsHeterosis concepts
Heterosis concepts
 
Genetical and physiological basis of heterosis and inbreeding
Genetical and physiological basis of heterosis and inbreedingGenetical and physiological basis of heterosis and inbreeding
Genetical and physiological basis of heterosis and inbreeding
 
Hybrid seed technology
Hybrid seed technology Hybrid seed technology
Hybrid seed technology
 
Presentation on Epistasis
Presentation on EpistasisPresentation on Epistasis
Presentation on Epistasis
 
Interaction of genes for slide share
Interaction of genes for slide shareInteraction of genes for slide share
Interaction of genes for slide share
 
Epistasis
EpistasisEpistasis
Epistasis
 
Epistasis
EpistasisEpistasis
Epistasis
 
Breeding systems
Breeding systemsBreeding systems
Breeding systems
 
Epistasis
EpistasisEpistasis
Epistasis
 
Breeding methods in cross pollinated crops
Breeding methods in cross pollinated cropsBreeding methods in cross pollinated crops
Breeding methods in cross pollinated crops
 
Plant Breeding Methods
Plant Breeding MethodsPlant Breeding Methods
Plant Breeding Methods
 

Similaire à Haendel clingenetics.3.14.14

Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Computing on Phenotypes AMP 2015
Computing on Phenotypes AMP 2015Chris Mungall
 
The Application of the Human Phenotype Ontology
The Application of the Human Phenotype Ontology The Application of the Human Phenotype Ontology
The Application of the Human Phenotype Ontology mhaendel
 
GA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updateGA4GH Phenotype Ontologies Task team update
GA4GH Phenotype Ontologies Task team updatemhaendel
 
Mapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesMapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesChris Mungall
 
The Monarch Initiative: From Model Organism to Precision Medicine
The Monarch Initiative: From Model Organism to Precision MedicineThe Monarch Initiative: From Model Organism to Precision Medicine
The Monarch Initiative: From Model Organism to Precision Medicinemhaendel
 
Use of semantic phenotyping to aid disease diagnosis
Use of semantic phenotyping to aid disease diagnosisUse of semantic phenotyping to aid disease diagnosis
Use of semantic phenotyping to aid disease diagnosismhaendel
 
Ontologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological DataOntologies for Semantic Normalization of Immunological Data
Ontologies for Semantic Normalization of Immunological DataYannick Pouliot
 
GIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataGIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataChris Mungall
 
A systematic approach to Genotype-Phenotype correlations
A systematic approach to Genotype-Phenotype correlationsA systematic approach to Genotype-Phenotype correlations
A systematic approach to Genotype-Phenotype correlationsfisherp
 
Monarch Initiative Poster - Rare Disease Symposium 2015
Monarch Initiative Poster - Rare Disease Symposium 2015Monarch Initiative Poster - Rare Disease Symposium 2015
Monarch Initiative Poster - Rare Disease Symposium 2015Nicole Vasilevsky
 
How to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical informationHow to transform genomic big data into valuable clinical information
How to transform genomic big data into valuable clinical informationJoaquin Dopazo
 
Deep phenotyping for everyone
Deep phenotyping for everyoneDeep phenotyping for everyone
Deep phenotyping for everyonemhaendel
 
Envisioning a world where everyone helps solve disease
Envisioning a world where everyone helps solve diseaseEnvisioning a world where everyone helps solve disease
Envisioning a world where everyone helps solve diseasemhaendel
 
A New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicA New Generation Of Mechanism-Based Biomarkers For The Clinic
A New Generation Of Mechanism-Based Biomarkers For The ClinicJoaquin Dopazo
 
Introducción a la bioinformatica
Introducción a la bioinformaticaIntroducción a la bioinformatica
Introducción a la bioinformaticaMartín Arrieta
 
Evotec - How can Knowledge Graphs support Druh Discovery
Evotec - How can Knowledge Graphs support Druh DiscoveryEvotec - How can Knowledge Graphs support Druh Discovery
Evotec - How can Knowledge Graphs support Druh DiscoveryNeo4j
 
Enhancing Rare Disease Literature for Researchers and Patients
Enhancing Rare Disease Literature for Researchers and PatientsEnhancing Rare Disease Literature for Researchers and Patients
Enhancing Rare Disease Literature for Researchers and PatientsErin D. Foster
 

Haendel clingenetics.3.14.14

  • 1. Expanding the Clinical Phenotype Space with Semantics and Model Systems Melissa Haendel March 14th, 2014 Updates in Clinical Genetics 2014
  • 2. Outline: Issues in candidate prioritization; Computational techniques for comparing phenotypes; Undiagnosed Disease Program semantic phenotyping; Minimum phenotype requirements; Tools leveraging phenotypes
  • 3. The Challenge: Interpretation of Disease Candidates. What’s in the box? How are candidates identified? How do they compare? [Diagram: phenotypes (P1, P2, P3, …) and genotype info (G1, G2, G3, G4, …) feed a box marked “?”, which yields prioritized candidates, models, and functional validation (M1, M2, M3, M4, ...).] Also shown: pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics…
  • 4. Candidate gene prioritization. [Diagram relating phenotypic information, genetic information, and gene/gene product info to candidate interpretation.] Individual's information: phenotypes collected for individual patients; sequences from an individual, family, or related group. Biomedical knowledge: human reference sequences (e.g. reference sequence, 1K genome data, genomic location); community phenotype data (e.g. literature, MODs, KOMP2, OMIM, EHRs, GWAS, ClinVar, disease-specific repositories, etc.); pathway; functional (GO); gene expression, OMICS data; protein-protein interactions; orthologs. Methods: variant calling (e.g. GATK); pathogenicity/impact calling (e.g. VAAST, SIFT); phenotypic comparison methods; enrichment analysis (e.g. GATACA, Galaxy); network module analysis; combined variant + phenotype candidate reporting (e.g. Exomiser).
  • 5. Models recapitulate various phenotypic aspects of disease. Genotype and phenotype pairs shown: B6.Cg-Alms1foz/fox/J: increased weight, adipose tissue volume, glucose homeostasis altered. ALMS1 (NM_015120.4) [c.10775delC] + [-]: obesity, diabetes mellitus, insulin resistance. kcnj11c14/c14; insrt143/+(AB): increased food intake, hyperglycemia, insulin resistance.
  • 6. OMIM query and number of records: “large bone” 785; “enlarged bone” 156; “big bone” 16; “huge bones” 4; “massive bones” 28; “hyperplastic bones” 12; “hyperplastic bone” 40; “bone hyperplasia” 134; “increased bone growth” 612. Searching for phenotypes using text alone is insufficient.
  • 7. Problem: Clinical and model phenotypes are described differently
  • 8. “Expanding” the phenotypic coverage of the human genome. [Bar chart: % of human coding genes (0% to 100%); categories: OMIM, OMIM+GWAS, ortholog only, human+ortholog, human only.] Five model organisms (mouse, zebrafish, fly, yeast, rat) provide almost 80% phenotypic coverage of the human genome.
  • 9. How can we take advantage of this model organism phenotype data?
  • 10. Outline: Issues in candidate prioritization; Computational techniques for comparing phenotypes; Undiagnosed Disease Program semantic phenotyping; Minimum phenotype requirements; Tools leveraging phenotypes
  • 11. Using ontologies to compare phenotypes across species Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247
  • 12. What is an ontology? A set of logically defined, inter-related terms used to annotate data. Use of common or logically related terms across databases enables integration. Relationships between terms allow annotations to be grouped in scientifically meaningful ways. Reasoning software enables computation of inferred knowledge. Groups of annotations can be compared using semantic similarity algorithms.
  • 13. An ontology provides the logical basis of classification. Any sense organ that functions in the detection of smell is an olfactory sense organ: olfactory sense organ = sense organ that is capable_of some detection of smell.
  • 14. Classifying. Given: nose is a sense organ; nose capable_of some detection of smell. Definition: olfactory sense organ = sense organ that is capable_of some detection of smell (these are necessary and sufficient conditions). => nose is an olfactory sense organ.
  • 16. Human Phenotype Ontology. Used to annotate, in human: patients, disorders, genotypes, genes, sequence variants. Example terms: Reduced pancreatic beta cells; Abnormality of pancreatic islet cells; Abnormality of endocrine pancreas physiology; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas; Abnormality of exocrine pancreas physiology. Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
  • 17. Mammalian Phenotype Ontology. Used to annotate and query, in mice: genotypes, alleles, genes. Example terms: abnormal pancreatic beta cell mass; abnormal pancreatic beta cell morphology; abnormal pancreatic islet morphology; abnormal endocrine pancreas morphology; abnormal pancreatic beta cell differentiation; abnormal pancreatic alpha cell morphology; abnormal pancreatic alpha cell differentiation; abnormal pancreatic alpha cell number. Smith et al. (2005). The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7
  • 18. Post-composed models of phenotype annotation. Entity + Quality: Anatomy: head + small size; Anatomy: heart + edematous; Anatomy: ventral mandibular arch + thick; Gene Ontology: swim bladder inflation + arrested.
  • 19. A human phenotype example. Terms shown: Vitreous hemorrhage; Hemorrhage of the eye; Internal hemorrhage; Abnormality of the globe; Abnormal eye morphology; Abnormal eye physiology; Abnormality of the eye; Abnormality of blood circulation; Abnormality of the cardiovascular system.
  • 20. Problem: Data silos. [Figure: lung-related terms held in separate ontologies. Mammalian Phenotype: abnormal respiratory system morphology, abnormal lung morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology. Mouse Anatomy: organ system, respiratory system, lung, lung alveolus. FMA: organ system, respiratory system, lower respiratory tract, solid organ, parenchymatous organ, lobular organ, lung, pulmonary acinus, alveolar sac, pleural sac, thoracic cavity, thoracic cavity organ. Human development: pharyngeal region, respiratory primordium, lung bud, lung. Relations shown: develops_from, part_of, is_a (SubClassOf), surrounded_by.]
  • 21. Solution: bridging semantics. [Figure: multi-species anatomy classes (anatomical structure, organ, organ part, endoderm, foregut, endoderm of foregut, respiratory primordium, lung primordium, lung bud, lung, respiration organ, alveolus, alveolus of lung, alveolar sac, pulmonary acinus, swim bladder) linked to FMA:lung, MA:lung, MA:lung alveolus, FMA:pulmonary alveolus, EHDAA:lung bud, GO:respiratory gaseous exchange, NCBITaxon:Mammalia, and NCBITaxon:Actinopterygii. Relations shown: is_a (taxon equivalent), develops_from, part_of, is_a (SubClassOf), capable_of, only_in_taxon.] Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5. Köhler et al. (2014). Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000Research 2014, 2:30.
  • 22. Phenotype representation requires more than “phenotype ontologies”. Gene Ontology: glucose metabolism (GO:0006006), gene/protein function data. Chemical: glucose (CHEBI:17234), pyruvate (CHEBI:15361), metabolomics and toxicogenomics data. Disease: type II diabetes mellitus (DOID:9352), disease & phenotype data. Cell: pancreatic beta cell (CL:0000169), transcriptomic data.
  • 23. OWLsim: Phenotype similarity across patients or organisms. Human phenotype / model phenotype / phenotype in common: Unstable posture / poor rotarod performance / abnormal coordination; Constipation / decreased gut peristalsis / abnormal digestive physiology; Neuronal loss in Substantia Nigra / axon degeneration / CNS neuron degeneration; Shuffling gait / decreased stride length / abnormal locomotion; Resting tremors / stereotypic behavior / abnormal motor function; REM disorder / abnormal EEG / sleep disturbance; Hyposmia / failure to find food / abnormal olfaction. https://code.google.com/p/owltools/wiki/OwlSim
  • 24. Outline: Issues in candidate prioritization; Computational techniques for comparing phenotypes; Undiagnosed Disease Program semantic phenotyping; Minimum phenotype requirements; Tools leveraging phenotypes
  • 25. General exome analysis. Single exome: remove off-target and common variants, filter on predicted deleteriousness, candidate gene strategies. Inheritance filters: homozygous recessive, X-linked, de novo (if trio). Prioritize based on known genes, allele frequency, and pathogenicity.
  • 26. Undiagnosed Disease Program exome analysis. Family exome data: filter using SNP chip data, Mendelian models of inheritance, and population frequency. Prioritize based on alignment quality, allele frequency, predicted deleteriousness, and PubMed.
  • 27. PhenIX exome analysis. Remove off-target variants, common variants, and variants not in known disease-causing genes. Recessive, de novo filters. Zemojtel et al., manuscript submitted. http://compbio.charite.de/PhenIX/
  • 28. Exomiser exome analysis. Remove off-target and common variants. Recessive, de novo filters. https://www.sanger.ac.uk/resources/databases/exomiser/ Robinson et al. http://genome.cshlp.org/content/early/2013/10/2
  • 29. Current UDP analysis with semantic phenotyping. Family exome data: filter using SNP chip data, Mendelian models of inheritance, and population frequency. Phenotype data is then added to produce a combined score.
  • 30. Benchmarking. 1092 unaffected exomes and 28,516 disease-associated variants give 100,000 simulated exomes. Steps: annotate variants; remove off-target, synonymous, and common (>1% MAF) variants (plus optional inheritance model filtering); prioritize based on combined score.
  • 31. Phenotype and variant data synergistically improve exome interpretation. [Bar chart: % of exomes with the disease gene as top hit (0 to 100) for all diseases, autosomal dominant, autosomal recessive (hom), and autosomal recessive (compound het); series: Variant, Phenotypic relevance, PHIVE.]
  • 32. Results. Correct gene as top scoring hit in 68.3% of exomes, out of an average of 272 post-filtering candidate genes. Improvement of between 1.8- and 5.1-fold in the percentage of candidate genes correctly ranked in first place compared to just using pathogenicity and frequency data. Shows utility of structured phenotype data for computational analysis.
  • 33. UDP Experiment. [Flow diagram: UDP diploid aligned cohort VCF file (18 families) -> Mendelian filters -> Mendelian filtered files (per family); these, together with phenotype profiles, go through Exomiser, PhenIX, and phenotype only -> VCF files with phenotype and variant scores (per family).]
  • 34. Top de novo candidates for patient UDP2543. Exomiser: STIM1, CYP2D6, MUC5B. Phenotype only: ITGA7, PLEC, STIM1, PTGS1, TTN. PhenIX: STIM1, RB1, DLEC1, CHRNB4, MUC5B, REPIN1, NBPF8, GPRIN3, TMEFF1, FLT3LG, OSM, FZD10, MUC12. Top variant: gene STIM1; variant chr11:g.4045175A>T [0/1]; MAF (ESP or 1000g) 0%; consequence p.I115T; predicted pathogenicity (SIFT, PolyPhen, MutTaster; 0-1): 1.
  • 35. UDP2543: phenotypic similarity. Table columns: Patient; Stim1 het mouse; OMIM:612783 (IMMUNODEFICIENCY 10), hom STIM1 mutations; OMIM:160565 (MYOPATHY, TUBULAR AGGREGATE), het STIM1 mutations. Cell contents in reading order: Impaired platelet aggregation; abnormal platelet activation; Thrombocytopenia; Thrombocytopenia; decreased platelet cell number; Thrombocytopenia; Myopathy; Myopathy; Myopathy; Generalized hypotonia; Muscular hypotonia; Proximal muscle weakness; Petechiae; increased bleeding time; Autoimmune hemolytic anemia; Delayed gross motor development; Epistaxis; increased bleeding time; Gower sign.
  • 36. STIM1
  • 37. Proposed workflow for undiagnosed diseases. [Flow diagram boxes: suspected genetic disease; DRG sequencing; deep phenotyping; top ranked candidates; clinical rounds; choose best candidate; Sanger validation; cosegregation studies; exclude candidate gene; reconsider short list; diagnosis. Decision labels: fails / passes; inconsistent / consistent.] Variant analysis: annotation sources: HGMD, MAF (dbSNP, ESP), ClinVar; prediction criteria: predicted pathogenicity, variant class, location in DRG target region. Computational phenotype analysis: ontology: HPO, MP; annotation sources: OMIM, Orphanet, MGI...; prediction criteria: semantic similarity, mode of inheritance.
  • 38. What constitutes an adequate phenotype annotation for an undiagnosed patient?
  • 39. Defining minimum phenotype standard: 1. Is the annotation specificity similar to or better than the corpus of available phenotype data? 2. Is the number of annotations/patient similar or better? 3. How does the ontology and annotation set differ across anatomical systems in terms of granularity? Does this change specificity requirements for phenotypic profiles? 4. How does use of NOT annotations help further specify the uniqueness of an undiagnosed patient? 5. How do onset, temporal ordering, and severity affect specificity?
  • 40. UDP phenotype annotation metrics UDP annotations have a similar Information content (IC) and a larger number of average annotations per disease/patient
  • 41. Anatomical annotation distribution in the corpus. Nervous system, skeletal system, and immune system are highest => these categories require greater specificity and numbers of annotations.
  • 42. Annotation specificity meter. What about common traits, like blue eyes or acne?
  • 43. Making the patient phenotype profiles as good as can be. Total requests from UDP: 614. Number of requests assigned to HPO terms: 423 (example: Chronic limb pain -> limb pain). Number of terms that need consideration by UDP: 145 (example: Expressive language -> delay? Increase? Abnormal?). Number of requests that belong in other parts of the patient record: 68 (example: Abnormal aCGH 12q21.1- 12q.2 (662 kb duplication) paternal origin -> move to genotype information portion of the record). It is a community effort to contribute requests to the ontologies, and quality profiling helps make our tools work better for everyone.
  • 44. Limitations and ongoing work: Adding negation to the algorithm; Temporal ordering of phenotypes; Leveraging severity, expressivity, and penetrance data
  • 47. Monarch phenotype data. Species (source): unique genotypes/variants; disease/phenotype associations. Mouse (MGI): 53,573; 406,618. Zebrafish (ZFIN): 14,703; 75,698. C. elegans (Wormbase): 116,106; 411,154. Fruit fly (Flybase): 98,596; 265,329. Human (OMIM): 26,372; 27,798. Human (Orphanet): 2,872; 5,095. Human (ClinVar): 62,437; 178,424. Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain … 157 sources to date. Coming soon: animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse.
  • 48. ModelCompare: How do the models recapitulate the disease? Late-onset Parkinson’s Phenotypes Mouse Phenotypes
  • 51. Finding collaborators for functional validation Patient Phenotype profile Phenotyping experts
  • 52. Exome Walker: Network based exploration of phenotypically similar diseases. http://compbio.charite.de/ExomeWalker/ Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008 Apr;82(4):949-58. doi: 10.1016/j.ajhg.2008.02.013. [Figure: Bare Lymphocyte Syndrome Type 1 protein-interaction network.] Exploits vicinity in the protein interaction network between phenotypically related diseases and uses this to rank exome candidates. Large boost in rankings of candidate genes using 250 disease gene-families. Prototype version online, manuscript in preparation.
  • 53. PhenoViz: Integrate all human, mouse, and fish data to understand CNVs. Desktop application for differential diagnostics in CNVs. Explain manifestations of CNV diseases based on genes contained in the CNV, e.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin. Double the number of explanations using model data. Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
  • 54. Conclusions. Cross-species phenotype data can be used to perform semantic similarity. Structured phenotype data for rare and undiagnosed disease patients can aid candidate evaluation. We are experimenting with these methods for UDP patient phenotypes to aid candidate prioritization, identify models, explore mechanisms, and find collaborators.
  • 55. Acknowledgments. NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Neal Boerkoel, Cyndi Tifft, Bill Gahl. OHSU: Nicole Vasilesky, Matt Brush. Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall. UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone. U of Pitt: Chuck Boromeo, Jeremy Espino, Harry Hochheiser. Sanger: Anika Oehlrich, Jules Jacobson, Damian Smedley. Toronto: Marta Girdea, Sergiu Dumitriu, Mike Brudno. JAX: Cynthia Smith. Charité: Sebastian Kohler, Sandra Doelken, Sebastian Bauer, Peter Robinson. Funding: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C
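
The cross-species comparison on the OWLsim slide can be illustrated with a small sketch. This is not OWLsim's code or its scoring method: it is a toy Jaccard overlap of ancestor sets, chosen only because it is easy to follow. The leaf and grouping labels come from the slide; the root term "phenotypic abnormality" and the is_a edges are assumptions made for the example.

```python
from typing import Dict, List, Set

# Toy is_a edges (child -> parents). Labels are from the OWLsim slide;
# the root term and the exact edges are assumptions for illustration.
IS_A: Dict[str, Set[str]] = {
    "constipation": {"abnormal digestive physiology"},
    "decreased gut peristalsis": {"abnormal digestive physiology"},
    "shuffling gait": {"abnormal locomotion"},
    "decreased stride length": {"abnormal locomotion"},
    "abnormal digestive physiology": {"phenotypic abnormality"},
    "abnormal locomotion": {"phenotypic abnormality"},
    "phenotypic abnormality": set(),
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus every term reachable through is_a."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(IS_A[current])
    return seen


def term_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two ancestor sets (1.0 means identical terms)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def profile_similarity(patient: List[str], model: List[str]) -> float:
    """Average, over patient terms, of the best-matching model term."""
    best = [max(term_similarity(p, m) for m in model) for p in patient]
    return sum(best) / len(best)


patient_profile = ["constipation", "shuffling gait"]
mouse_profile = ["decreased gut peristalsis", "decreased stride length"]
print(profile_similarity(patient_profile, mouse_profile))
```

String matching would score "constipation" against "decreased gut peristalsis" as unrelated; here the shared parent "abnormal digestive physiology" gives them a non-zero similarity, which is the point the slide makes.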

Editor's notes

  1. Note: these searches don’t seem to work in OMIM anymore; they may have gotten rid of the ability to search for quoted strings.
  2. Different terminology is used to describe clinical manifestations than is used to describe model system biological features.
  3. The distributions of human annotations from the GWAS catalog and OMIM Morbidmap are largely disjoint and touch only 38% of protein-coding genes. Combining human and ortholog data, nearly 80% of human protein-coding genes have phenotype annotations in at least one organism, with more than half present only in animal models. Note that human "phenotypes" are those things linked via the GWAS catalog and OMIM. This means that some of the inferences might be artificially low, because we aren't yet mapping CNVs to their constituent genes. Note also that this does not include the ClinVar data stats that we recently ingested, and covers only these model organisms: mouse, zebrafish, fly, yeast, rat. We have a lot more phenotype data now coming from other databases and organisms. These statistics will be available soon.
  4. Also point out the functional classification axis
  5. Things like finding models of sirenomelia due to disruption of the lateral plate mesoderm. This helps to find models and gene candidates based on developmental relationships.
  6. Without additional knowledge and linking, computers can’t make the connections. These links take us from the molecular to the protein, to the cellular and anatomical, to the disease level of phenotypes
  7. OWLsim computes semantic similarity between sets of phenotypes within and across species using the bridging semantics. Phenotypes in common from the bridging ontologies relate human clinical phenotypes with model organism phenotypes.Examples include motor systems, olfaction, and digestion. In this case, data encoded using the human phenotype ontology has been made interoperable with mouse, zebrafish and other model system ontologies. This also enables the use of more complex algorithms to detect similarity – not bases solely on mapping or string matching; e.g. constipation and decreased gut peristalsis are both subtypes of abnormal digestive system physiology.
  8. The norm in exome analysis is to run either single exomes or to do trio analysis. These methods generally use some combination of quality filter, frequency filter, a form of predicted deleteriousness and often a candidate gene method. This is followed by a some basic Mendalian filtration, and then the remain variants are ranked by allele frequency, correlation to phenotype according to an annotation like HGMD, apparent pathogenicity.single exomes or to do trio analysisCANDIDATE GENE LIST
  9. The procedure used at the Undiagnosed Disease Program puts more emphasis on the Mendelian inheritance models. Normally we use SNP chip data coupled with Mendelian filters for the exome data. A script (in this case) or the program VarSifter is used to filter out all variants that do not fit a homozygous recessive, compound heterozygous, de novo dominant, or X-linked model. Then, after using the BAM files to check the quality of the variants, a final, very labor-intensive step is done in which these variants are curated and annotated by hand based on allele frequency, predicted deleteriousness, and PubMed articles. It is not uncommon that, in the end, there is no way to determine which of the last few variants is causing disease.
  10. PhenIX uses human data and predicted deleteriousness: HGMD, ClinVar, OMIM, Orphanet.
  11. Exomiser uses mouse phenotype data and predicted deleteriousness.
  12. In the analysis that we have been experimenting with, the UDP standard operating procedure script (homozygous recessive, compound heterozygous, de novo, and X-linked filters, plus frequency) is run on a family's exome data. The output of those filters is then funneled into phenotypic and variant analysis for ranking, using either mouse phenotype data via Exomiser or human data via PhenIX.
  13. Run through the pipeline: Exomiser LOT is a version of Exomiser that is less restrictive as to which transcripts it recognizes (we are not as worried about off-target reads because of the Mendelian filters and the ability to look at the BAM files). Runs shown: Exomiser, Exomiser LOT, and Pheno only.
  14. The goal of the following computational analysis is to specifically understand the minimum human phenotype annotation that will enable useful identification of candidate genes and additional related phenotypes for UDP patients based on the current corpus available in Monarch (covering a large set of annotated human diseases from OMIM, Decipher, and Orphanet, as well as phenotype data from mouse, zebrafish and many other species).
  15. Shown is a survey of the human annotations currently in the Monarch system. IC is information content, a graph-based measure of specificity; higher numbers indicate greater specificity. sumIC is a combined indicator of depth of annotation. For the UDP set, each patient ID is considered a distinct disease.
  16. Each anatomical system is indicated by a color; the legend lists the colors in the reverse of their order in the graph (e.g. Skeletal System is at the top of the graph). Data are combined from Orphanet, OMIM, and Decipher. The graph shows that the systems with the largest proportion of annotations are the skeletal system and the nervous system. Note that the data are not disjoint: some annotations may fall into multiple categories according to the structure of the ontology.
  17. First implementation. User guidelines written by Monarch will be implemented in PhenoTips in the next few days as a help menu.
  18. Monarch is curating and assisting clinicians to create quality annotation profiles and the clinicians are helping to improve the ontology and therefore the corpus against which the similarity algorithms run.
  19. Large-scale data integration of genotypes, phenotypes, and many other data. Based on NIF, it contains a large number of integrated databases (157 to date, with more added every week). Building innovative visualization tools to explore model system phenotype data in the context of other biomedical data. Widgets and services are publicly available. Why an initiative? Because it is a partnership to promote standardization and integration across model systems and clinical applications, and all are welcome.
  20. Using the phenotypes associated with the patient, one can query all model systems to find the ones that have the most closely related sets of phenotypes. Choosing the right model for a co-clinical trial, or for further analysis, must involve an understanding of how well the model recapitulates the full spectrum of phenotypes (or not). One would want to choose the model with the phenotype that one is most interested in understanding, assaying, or treating. This also has the benefit of providing collaborator suggestions, since the person who phenotyped each model is linked to the model. They could be asked to help perform the co-clinical trial or further phenotyping. These people are best at phenotyping the model, can inform human phenotyping, and conversely can be trained to perform additional clinical assays in the models. This visualization is under active development and will be available in PhenoTips in the next few weeks. Also, one can drill down on the right side to see more specific annotations as to how, for example, the cerebrum is abnormal.
  21. Lewy bodies, a hallmark of this disease, seem to manifest phenotypes from only a few of the genes, resulting in cerebral abnormalities and other CNS morphological changes in the mice. Lewy bodies maps to these LCS (genes): Abnormality of the cerebrum (Snca, Slc6a3, Mfn2, Cox8a); Morphological abnormality of the central nervous system (Uchl1, Uchl3).
  22. Here we take a closer look at bradykinesia and the double-mutant mouse in Uchl1 and Uchl3 (Uchl1<gad>/Uchl1<gad>; Uchl3<tm1Tilg>/Uchl3<tm1Tilg>), examining the mouse phenotypes in our model that are related to bradykinesia. There are three recorded phenotypes for this mouse that show some similarity.
  23. Each model organism has a different suite of phenotypes that are examined, because different models are used to explore different types of biological function and malfunction. By using a diversity of model systems, we have the potential to identify candidates based on partial overlaps with the patient phenotype profile by looking at different models with mutations in potential candidates or related via interactions, co-expression, genomic regulatory region, etc.
  24. Bare Lymphocyte Syndrome Type 1 Protein-Interaction Network. The protein-interaction network associated with bare lymphocyte syndrome type 1, which comprises the genes TAP1, TAP2, and TAPBP. Each of these genes is shown in red. The DI and SP methods additionally identified the unrelated genes PSMB8 and PSMB9 (shown in yellow) as potential disease genes because they each have an interaction with one of the true disease genes. The RWR method ranks the true disease genes higher because each true disease gene has interactions with two other family members and because there is a dense net of proteins that connect the disease genes via paths with two interactions.
  25. Phenoviz is a new graphviz plugin that can be used as a standalone app for Windows, Mac, or Linux. The user uploads a list of CNVs detected by array CGH (SNP chips or even genome sequence data would also work as a starting point, but the program expects a simple list). The user also enters a list of the HPO terms observed in the patient. The application then tries to find "matches" based on the single-gene disorders (human HPO annotations), the mouse models (mainly knockouts, MP annotations from MGI), or fish models (ZFIN E/Q annotations). This is being used in the Charité array CGH diagnostics service to help with interpretation of CNVs. Subjectively, the tool helps you to quickly find good candidates in order to write reports. The program also picks out the best-matching CNV in case the user enters several (a typical array CGH finding in our lab has up to 50 CNVs, of which 2-5 are not found in databases of common variants like DGV).
  26. There are a lot of people who have contributed to this work over many years.
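The idea behind notes 7 and 15 (information content as a measure of term specificity, and similarity found through a shared ancestor rather than through string matching) can be illustrated with a minimal Python sketch. This is not the OWLsim or Monarch implementation. The toy ontology, the four-entity corpus, and the function names below are all hypothetical, and IC is assumed here to be the log of the inverse annotation frequency, which is one common definition.

```python
import math

# Hypothetical toy ontology (child -> parents); the digestive terms come from note 7.
PARENTS = {
    "phenotypic abnormality": [],
    "abnormal digestive system physiology": ["phenotypic abnormality"],
    "constipation": ["abnormal digestive system physiology"],
    "decreased gut peristalsis": ["abnormal digestive system physiology"],
    "abnormal olfaction": ["phenotypic abnormality"],
}

# Hypothetical annotation corpus: entity -> directly annotated terms.
CORPUS = {
    "human_disease_A": {"constipation"},
    "mouse_model_B": {"decreased gut peristalsis"},
    "zebrafish_model_C": {"abnormal olfaction"},
    "human_disease_D": {"abnormal olfaction", "constipation"},
}


def ancestors(term):
    """Return the term itself plus all of its ancestors in the toy ontology."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = log2(N / n), where n counts entities annotated to the term or a descendant."""
    n = sum(
        1
        for terms in CORPUS.values()
        if any(term in ancestors(t) for t in terms)
    )
    if n == 0:
        return math.inf  # a term never used in the corpus is maximally specific
    return math.log2(len(CORPUS) / n)


def mica(term_a, term_b):
    """Most informative common ancestor: the shared ancestor with the highest IC."""
    common = ancestors(term_a) & ancestors(term_b)
    return max(common, key=information_content)


def sum_ic(profile):
    """Sum of IC over a profile's terms, as a rough indicator of annotation depth."""
    return sum(information_content(t) for t in profile)


if __name__ == "__main__":
    shared = mica("constipation", "decreased gut peristalsis")
    print(shared, round(information_content(shared), 3))
    print(sum_ic(CORPUS["human_disease_D"]))
```

In this sketch the two digestive terms share no words, yet they are related through their common parent. A pair of unrelated terms would meet only at the root, whose IC is zero.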