Haendel clingenetics.3.14.14

Expanding the Clinical Phenotype Space with Semantics and Model Systems
Melissa Haendel
March 14th, 2014
Updates in Clinical Genetics 2014

Outline
• Issues in candidate prioritization
• Computational techniques for comparing phenotypes
• Undiagnosed Diseases Program semantic phenotyping
• Minimum phenotype requirements
• Tools leveraging phenotypes

The Challenge: Interpretation of Disease Candidates
• What’s in the box?
• How are candidates identified?
• How do they compare?
[Figure: phenotypes (P1, P2, P3, …) and genotype information (G1, G2, G3, G4, …), supported by pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics…, lead to prioritized candidates, models (M1, M2, M3, M4, …), and functional validation]

Candidate gene prioritization
Individual's information:
• Phenotypic information: phenotypes collected for individual patients
• Genetic information: sequences from an individual, family, or related group
Biomedical knowledge:
• Human sequence reference (e.g., reference sequence, 1K genome data, genomic location)
• Community phenotype data (e.g., literature, MODs, KOMP2, OMIM, EHRs, GWAS, ClinVar, disease-specific repositories, etc.)
• Gene/gene product info: pathway, function (GO), gene expression, OMICS data, protein-protein interactions, orthologs
Analysis methods:
• Variant calling (e.g., GATK)
• Pathogenicity/impact calling (e.g., VAAST, SIFT)
• Phenotypic comparison methods
• Enrichment analysis (e.g., GATACA, Galaxy)
• Network module analysis
• Combined variant + phenotype candidate reporting (e.g., Exomiser)
→ Candidate interpretation

Models recapitulate various phenotypic aspects of disease
Genotypes:
• Mouse: B6.Cg-Alms1foz/foz/J
• Human: ALMS1 (NM_015120.4) [c.10775delC] + [-]
• Zebrafish: kcnj11c14/c14; insrt143/+ (AB)
Phenotypes:
• increased weight, adipose tissue volume, glucose homeostasis altered
• obesity, diabetes mellitus, insulin resistance
• increased food intake, hyperglycemia, insulin resistance

OMIM query | # records
“large bone” | 785
“enlarged bone” | 156
“big bone” | 16
“huge bones” | 4
“massive bones” | 28
“hyperplastic bones” | 12
“hyperplastic bone” | 40
“bone hyperplasia” | 134
“increased bone growth” | 612
Searching for phenotypes using text alone is insufficient.

Problem: Clinical and model phenotypes are described differently

“Expanding” the phenotypic coverage of the human genome
[Figure: % of human coding genes with phenotype data from OMIM, OMIM+GWAS, ortholog only, human+ortholog, and human only]
Five model organisms (mouse, zebrafish, fly, yeast, rat) provide almost 80% phenotypic coverage of the human genome.

How can we take advantage of this model organism phenotype data?

Using ontologies to compare phenotypes across species
Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247

What is an ontology?
• A set of logically defined, inter-related terms used to annotate data
• Use of common or logically related terms across databases enables integration
• Relationships between terms allow annotations to be grouped in scientifically meaningful ways
• Reasoning software enables computation of inferred knowledge
• Groups of annotations can be compared using semantic similarity algorithms
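To make the last point concrete, here is a minimal sketch, assuming a toy is_a hierarchy whose term names echo the pancreas examples in this deck but whose edges (and the root "Abnormality of the pancreas") are illustrative, not taken from HPO. Each annotation set is expanded with all inferred ancestors, and the two expanded sets are compared with the Jaccard index, one simple semantic similarity measure. All function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Toy is_a hierarchy (child -> parents). Edges are illustrative, not real HPO.
IS_A: Dict[str, List[str]] = {
    "Insulinoma": ["Pancreatic islet cell adenoma"],
    "Pancreatic islet cell adenoma": ["Abnormality of pancreatic islet cells"],
    "Reduced pancreatic beta cells": ["Abnormality of pancreatic islet cells"],
    "Abnormality of pancreatic islet cells": ["Abnormality of endocrine pancreas physiology"],
    "Abnormality of endocrine pancreas physiology": ["Abnormality of the pancreas"],
    "Abnormality of exocrine pancreas physiology": ["Abnormality of the pancreas"],
    "Abnormality of the pancreas": [],
}


def ancestors(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """Return the term together with all of its transitive is_a ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, []))
    return seen


def closure(terms: Iterable[str], is_a: Dict[str, List[str]]) -> Set[str]:
    """Expand a set of annotations with every inferred ancestor."""
    expanded: Set[str] = set()
    for term in terms:
        expanded |= ancestors(term, is_a)
    return expanded


def jaccard_similarity(a: Iterable[str], b: Iterable[str],
                       is_a: Dict[str, List[str]]) -> float:
    """Jaccard index of the two ancestor-expanded annotation sets."""
    ca, cb = closure(a, is_a), closure(b, is_a)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


if __name__ == "__main__":
    patient = ["Insulinoma"]
    model = ["Reduced pancreatic beta cells"]
    # No exact term is shared, but three inferred ancestors are: 3 / 6.
    print(jaccard_similarity(patient, model, IS_A))  # 0.5
```

An exact-term comparison of these two profiles would score zero; grouping via the hierarchy is what lets them match partially.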

An ontology provides the logical basis of classification
Any sense organ that functions in the detection of smell is an olfactory sense organ:
olfactory sense organ ≡ sense organ and (capable_of some detection of smell)

Classifying
nose: is_a sense organ; capable_of some detection of smell
A reasoner therefore infers: nose is_a olfactory sense organ
=> These are necessary and sufficient conditions
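The classification step can be sketched in a few lines, assuming a toy ontology in which "organ" and "eye" are hypothetical additions alongside the slide's "sense organ" and "nose". The check mirrors the definition above: a class is inferred to be an olfactory sense organ if it is (transitively) a sense organ and has a capable_of some detection of smell restriction, inherited from superclasses as in OWL. This is a deliberately narrow sketch, not a general OWL reasoner; for instance, it does not treat subclasses of "detection of smell" as satisfying the restriction.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class OntClass:
    is_a: Set[str] = field(default_factory=set)        # asserted superclasses
    capable_of: Set[str] = field(default_factory=set)  # fillers of "capable_of some X"


# Toy ontology; "organ" and "eye" are hypothetical additions.
ONTOLOGY: Dict[str, OntClass] = {
    "organ": OntClass(),
    "sense organ": OntClass(is_a={"organ"}),
    "nose": OntClass(is_a={"sense organ"}, capable_of={"detection of smell"}),
    "eye": OntClass(is_a={"sense organ"}, capable_of={"detection of light"}),
}


def superclasses(name: str, ont: Dict[str, OntClass]) -> Set[str]:
    """Reflexive, transitive closure over asserted is_a links."""
    seen: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ont[current].is_a)
    return seen


def is_olfactory_sense_organ(name: str, ont: Dict[str, OntClass]) -> bool:
    """Necessary and sufficient test: sense organ AND capable_of some detection of smell."""
    supers = superclasses(name, ont)
    # capable_of restrictions on superclasses also hold for the subclass.
    capabilities = set().union(*(ont[c].capable_of for c in supers))
    return "sense organ" in supers and "detection of smell" in capabilities


if __name__ == "__main__":
    inferred = [n for n in ONTOLOGY if is_olfactory_sense_organ(n, ONTOLOGY)]
    print(inferred)  # ['nose']
```

Nobody asserted that the nose is an olfactory sense organ; the classification falls out of the definition, which is what "necessary and sufficient" buys.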

Human Phenotype Ontology
Used to annotate:
• Patients
• Disorders
• Genotypes
• Genes
• Sequence variants
In human
[Figure: excerpt of the HPO hierarchy, including Abnormality of endocrine pancreas physiology, Abnormality of exocrine pancreas physiology, Abnormality of pancreatic islet cells, Reduced pancreatic beta cells, Pancreatic islet cell adenoma, Insulinoma, and Multiple pancreatic beta-cell adenomas]
Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.

Mammalian Phenotype Ontology
Smith et al. (2005). The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7
Used to annotate and query:
• Genotypes
• Alleles
• Genes
In mice
[Figure: excerpt of the MP hierarchy, including abnormal endocrine pancreas morphology, abnormal pancreatic islet morphology, abnormal pancreatic beta cell morphology, abnormal pancreatic beta cell mass, abnormal pancreatic beta cell differentiation, abnormal pancreatic alpha cell morphology, abnormal pancreatic alpha cell differentiation, and abnormal pancreatic alpha cell number]

Post-composed models of phenotype annotation
Entity | Quality
Anatomy: head | Small size
Anatomy: heart | Edematous
Anatomy: ventral mandibular arch | Thick
Gene Ontology: swim bladder inflation | Arrested
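A minimal sketch of the entity-quality (EQ) pattern, assuming the entities and qualities in the table pair up row by row. The class and function names are hypothetical. Because each phenotype keeps its entity and quality as separate parts, annotations can be queried by either component (for example, everything affecting the heart) without a pre-composed term for every combination.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EQPhenotype:
    """A post-composed phenotype: an entity term plus a quality term."""
    entity: str   # e.g., an anatomy term or a Gene Ontology process
    quality: str  # e.g., a quality such as "Small size"

    def label(self) -> str:
        return f"{self.quality} ({self.entity})"


ANNOTATIONS: List[EQPhenotype] = [
    EQPhenotype("Anatomy: head", "Small size"),
    EQPhenotype("Anatomy: heart", "Edematous"),
    EQPhenotype("Anatomy: ventral mandibular arch", "Thick"),
    EQPhenotype("Gene Ontology: swim bladder inflation", "Arrested"),
]


def affecting(entity: str, annotations: List[EQPhenotype]) -> List[EQPhenotype]:
    """Query annotations by their entity component alone."""
    return [a for a in annotations if a.entity == entity]


def with_quality(quality: str, annotations: List[EQPhenotype]) -> List[EQPhenotype]:
    """Query annotations by their quality component alone."""
    return [a for a in annotations if a.quality == quality]


if __name__ == "__main__":
    for phenotype in affecting("Anatomy: heart", ANNOTATIONS):
        print(phenotype.label())  # Edematous (Anatomy: heart)
```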

A human phenotype example
[Figure: Vitreous hemorrhage and its HPO ancestors, which span both the eye and the cardiovascular branches: Hemorrhage of the eye, Abnormality of the globe, Abnormal eye morphology, Abnormal eye physiology, Abnormality of the eye, Internal hemorrhage, Abnormality of blood circulation, and Abnormality of the cardiovascular system]

Problem: Data silos
[Figure: the lung and related structures are represented separately in the Mammalian Phenotype ontology (e.g., abnormal lung morphology, abnormal respiratory system morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology), Mouse Anatomy, FMA, and a human developmental anatomy ontology. Anatomy terms shown include lung, lung alveolus, alveolar sac, pulmonary acinus, lobular organ, parenchymatous organ, solid organ, thoracic cavity organ, thoracic cavity, pleural sac, organ system, respiratory system, lower respiratory tract, lung bud, respiratory primordium, and pharyngeal region. Relations shown: develops_from, part_of, is_a (SubClassOf), surrounded_by]

Solution: bridging semantics
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
[Figure: Uberon classes (anatomical structure, organ, organ part, respiration organ, lung, alveolus, alveolus of lung, lung bud, lung primordium, respiratory primordium, foregut, endoderm of foregut, endoderm, pulmonary acinus, alveolar sac, swim bladder) linked to species-specific classes such as FMA:lung, MA:lung, FMA:pulmonary alveolus, MA:lung alveolus, and EHDAA:lung bud. Relations shown: is_a (taxon equivalent), develops_from, part_of, is_a (SubClassOf), capable_of (GO: respiratory gaseous exchange), and only_in_taxon (NCBITaxon: Mammalia; NCBITaxon: Actinopterygii)]
Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research F1000Research 2014, 2:30
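The effect of a bridging ontology on phenotype comparison can be sketched as a normalization step, assuming a hypothetical mapping table from species-specific anatomy classes to a shared, Uberon-style class. The labels stand in for real identifiers, and the `uberon:` prefixes are placeholders, not actual Uberon IDs. Before normalization, the human and mouse phenotypes live in different anatomy silos and cannot match; afterwards they do.

```python
from typing import Dict, Tuple

# Hypothetical mapping from species-specific anatomy classes to a shared,
# Uberon-style class. Labels stand in for real identifiers.
TO_SHARED: Dict[str, str] = {
    "FMA:lung": "uberon:lung",
    "MA:lung": "uberon:lung",
    "FMA:pulmonary alveolus": "uberon:alveolus of lung",
    "MA:lung alveolus": "uberon:alveolus of lung",
}

Phenotype = Tuple[str, str]  # (anatomy entity, quality)


def bridge(phenotype: Phenotype, mapping: Dict[str, str] = TO_SHARED) -> Phenotype:
    """Replace a species-specific entity with its shared class, if one is known."""
    entity, quality = phenotype
    return mapping.get(entity, entity), quality


if __name__ == "__main__":
    human = ("FMA:pulmonary alveolus", "abnormal morphology")
    mouse = ("MA:lung alveolus", "abnormal morphology")
    print(human == mouse)                  # False: different anatomy silos
    print(bridge(human) == bridge(mouse))  # True: same shared class
```

Unmapped entities pass through unchanged, so a missing cross-reference degrades to the siloed behavior rather than raising an error.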

Phenotype representation requires more than “phenotype ontologies”
• Disease (disease & phenotype data): type II diabetes mellitus (DOID:9352)
• Gene Ontology (gene/protein function data): glucose metabolism (GO:0006006)
• Chemical (metabolomics, toxicogenomics data): glucose (CHEBI:17234), pyruvate (CHEBI:15361)
• Cell (transcriptomic data): pancreatic beta cell (CL:0000169)

OWLsim: Phenotype similarity across patients or organisms
[Figure: patient phenotypes (unstable posture, constipation, neuronal loss in substantia nigra, shuffling gait, resting tremors, REM disorder, hyposmia) are compared with model phenotypes (poor rotarod performance, decreased gut peristalsis, axon degeneration, decreased stride length, stereotypic behavior, abnormal EEG, failure to find food) through shared ancestor terms (abnormal coordination, abnormal digestive physiology, CNS neuron degeneration, abnormal locomotion, abnormal motor function, sleep disturbance, abnormal olfaction)]
https://code.google.com/p/owltools/wiki/OwlSim
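One common way to score profiles like these is sketched below: Resnik similarity (the information content, IC, of the most informative common ancestor of two terms) combined by a best-match average in both directions. This is a simplified illustration of the general approach, not OWLsim's implementation, and OWLsim offers several measures. The is_a edges, the root "phenotypic abnormality", and all IC values are hypothetical; the term labels come from the slide.

```python
from statistics import mean
from typing import Dict, List, Set

# Toy is_a hierarchy: slide terms attached to the shared subsumers the slide
# lists. Edges and IC values are hypothetical.
IS_A: Dict[str, List[str]] = {
    "Shuffling gait": ["abnormal locomotion"],
    "decreased stride length": ["abnormal locomotion"],
    "Hyposmia": ["abnormal olfaction"],
    "failure to find food": ["abnormal olfaction"],
    "Constipation": ["abnormal digestive physiology"],
    "abnormal locomotion": ["phenotypic abnormality"],
    "abnormal olfaction": ["phenotypic abnormality"],
    "abnormal digestive physiology": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}
IC: Dict[str, float] = {
    "phenotypic abnormality": 0.0,
    "abnormal locomotion": 2.0,
    "abnormal digestive physiology": 2.5,
    "abnormal olfaction": 3.0,
    "Shuffling gait": 5.0, "decreased stride length": 5.0, "Hyposmia": 5.0,
    "failure to find food": 5.0, "Constipation": 5.0,
}


def ancestors(term: str) -> Set[str]:
    """The term plus all transitive is_a ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(IS_A.get(t, []))
    return seen


def resnik(a: str, b: str) -> float:
    """IC of the most informative common ancestor of two terms."""
    return max(IC[t] for t in ancestors(a) & ancestors(b))


def best_match_average(profile_a: List[str], profile_b: List[str]) -> float:
    """Average, in both directions, of each term's best Resnik match."""
    a_to_b = mean(max(resnik(a, b) for b in profile_b) for a in profile_a)
    b_to_a = mean(max(resnik(b, a) for a in profile_a) for b in profile_b)
    return (a_to_b + b_to_a) / 2


if __name__ == "__main__":
    patient = ["Shuffling gait", "Hyposmia", "Constipation"]
    model = ["decreased stride length", "failure to find food"]
    print(round(best_match_average(patient, model), 3))  # 2.083
```

Matches through a specific subsumer (abnormal olfaction, IC 3.0) count for more than matches only through the root (IC 0), which is why IC-weighted measures outperform plain term overlap.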

General exome analysis
Single exome
• Remove off-target and common variants, filter on predicted deleteriousness, candidate gene strategies
• Prioritize based on known genes, allele frequency, and pathogenicity
• Homozygous recessive, X-linked, de novo (if trio)
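The single-exome steps can be sketched as a filter-then-rank function. Everything here is an illustrative assumption: the `Variant` fields, the gene names, the 0-1 deleteriousness score with a 0.5 cut-off, and the 1% frequency threshold (the 1% figure matches the benchmarking slide later in this deck). The recessive model keeps genes with a homozygous variant or at least two heterozygous variants; phase is not checked, so the latter are only possible compound heterozygotes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    gene: str
    on_target: bool
    maf: float              # population minor allele frequency, 0-1
    deleteriousness: float  # hypothetical predicted-pathogenicity score, 0-1
    zygosity: str           # "hom" or "het"


def candidate_genes(variants: List[Variant], max_maf: float = 0.01,
                    min_deleteriousness: float = 0.5,
                    recessive: bool = True) -> List[str]:
    # 1. Remove off-target, common, and predicted-benign variants.
    kept = [v for v in variants
            if v.on_target and v.maf <= max_maf
            and v.deleteriousness >= min_deleteriousness]
    by_gene: Dict[str, List[Variant]] = defaultdict(list)
    for v in kept:
        by_gene[v.gene].append(v)
    # 2. Optional recessive model: homozygous, or >= 2 heterozygous hits
    #    (possible compound heterozygote; phase is not checked here).
    if recessive:
        by_gene = {g: vs for g, vs in by_gene.items()
                   if any(v.zygosity == "hom" for v in vs)
                   or sum(v.zygosity == "het" for v in vs) >= 2}
    # 3. Rank genes by their most deleterious remaining variant.
    return sorted(by_gene,
                  key=lambda g: max(v.deleteriousness for v in by_gene[g]),
                  reverse=True)


if __name__ == "__main__":
    calls = [
        Variant("GENE_A", True, 0.000, 0.90, "hom"),
        Variant("GENE_B", True, 0.001, 0.80, "het"),
        Variant("GENE_B", True, 0.002, 0.70, "het"),
        Variant("GENE_C", True, 0.200, 0.99, "hom"),   # common: removed
        Variant("GENE_D", False, 0.000, 0.99, "hom"),  # off-target: removed
        Variant("GENE_E", True, 0.000, 0.90, "het"),   # single het: not recessive
    ]
    print(candidate_genes(calls))  # ['GENE_A', 'GENE_B']
```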

Undiagnosed Diseases Program exome analysis
Family exome data
• Prioritize based on alignment quality, allele frequency, predicted deleteriousness, and PubMed
• Filter using SNP chip data, Mendelian models of inheritance, and population frequency
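The Mendelian-inheritance part of the family filter can be sketched for a parent-child trio, assuming an affected child, unaffected parents, biallelic sites, and VCF-style genotype strings such as "0/1" or "1|1". Function names are hypothetical. This covers only the de novo and homozygous recessive models; genotype-quality checks, X-linked and compound-heterozygous models, and the use of SNP chip data are not sketched.

```python
def alt_allele_count(genotype: str) -> int:
    """Count non-reference alleles in a VCF-style GT string such as '0/1' or '1|1'."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        raise ValueError(f"missing genotype call: {genotype!r}")
    return sum(allele != "0" for allele in alleles)


def is_de_novo(child: str, mother: str, father: str) -> bool:
    """Child carries an alternate allele that neither parent carries."""
    return (alt_allele_count(child) >= 1
            and alt_allele_count(mother) == 0
            and alt_allele_count(father) == 0)


def is_homozygous_recessive(child: str, mother: str, father: str) -> bool:
    """Affected child is homozygous alternate; both parents are heterozygous carriers."""
    return (alt_allele_count(child) == 2
            and alt_allele_count(mother) == 1
            and alt_allele_count(father) == 1)


if __name__ == "__main__":
    print(is_de_novo("0/1", "0/0", "0/0"))               # True
    print(is_homozygous_recessive("1|1", "0/1", "1/0"))  # True
    print(is_de_novo("0/1", "0/1", "0/0"))               # False: inherited
```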

PhenIX exome analysis
• Remove off-target and common variants, and variants not in known disease-causing genes
• Recessive, de novo filters
Zemojtel et al., manuscript submitted. http://compbio.charite.de/PhenIX/

Exomiser exome analysis
• Remove off-target and common variants
• Recessive, de novo filters
Robinson et al. http://genome.cshlp.org/content/early/2013/10/2
https://www.sanger.ac.uk/resources/databases/exomiser/

Current UDP analysis with semantic phenotyping
• Family exome data: filter using SNP chip data, Mendelian models of inheritance, and population frequency
• Phenotype data
• Combined score
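The deck does not give the formula behind the combined score, so the sketch below simply uses a weighted mean of a variant score and a phenotype score, both on a 0-1 scale. The weighting, the function name, and the gene names are all assumptions for illustration, not the Exomiser or PhenIX method. The toy data show the intended effect: the gene that is good on both axes wins, even though it tops neither single-score ranking.

```python
from typing import Dict, List, Tuple


def rank_by_combined_score(scores: Dict[str, Tuple[float, float]],
                           variant_weight: float = 0.5) -> List[str]:
    """Rank genes by a weighted mean of (variant score, phenotype score), both 0-1.

    The weighted mean is an illustrative assumption, not the Exomiser/PhenIX formula.
    """
    def combined(gene: str) -> float:
        variant, phenotype = scores[gene]
        return variant_weight * variant + (1 - variant_weight) * phenotype

    return sorted(scores, key=combined, reverse=True)


if __name__ == "__main__":
    genes = {
        "GENE_X": (0.95, 0.10),  # damaging variant, poor phenotype match
        "GENE_Y": (0.80, 0.90),  # good on both axes
        "GENE_Z": (0.40, 0.95),  # strong phenotype match, weak variant evidence
    }
    print(rank_by_combined_score(genes, variant_weight=1.0))  # ['GENE_X', 'GENE_Y', 'GENE_Z']
    print(rank_by_combined_score(genes, variant_weight=0.0))  # ['GENE_Z', 'GENE_Y', 'GENE_X']
    print(rank_by_combined_score(genes))                      # ['GENE_Y', 'GENE_Z', 'GENE_X']
```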

Benchmarking
• 1092 unaffected exomes + 28,516 disease-associated variants → 100,000 simulated exomes
• Annotate variants
• Remove off-target, synonymous, and common (>1% MAF) variants (plus optional inheritance-model filtering)
• Prioritize based on combined score
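The benchmark's evaluation logic can be sketched as "spike in a known disease gene, rank, and check whether it comes first." This is a simplification under stated assumptions: exomes are reduced to lists of gene symbols rather than VCF variants, and the ranking function is supplied by the caller. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Exome = List[str]  # simplified exome: gene symbols that carry retained variants


def simulate_exomes(backgrounds: Sequence[Exome], disease_genes: Sequence[str],
                    seed: int = 0) -> List[Tuple[Exome, str]]:
    """Spike one known disease gene into each unaffected background exome."""
    rng = random.Random(seed)
    simulated = []
    for background in backgrounds:
        causal = rng.choice(disease_genes)
        simulated.append((list(background) + [causal], causal))
    return simulated


def top_hit_rate(simulated: Sequence[Tuple[Exome, str]],
                 rank: Callable[[Exome], List[str]]) -> float:
    """Percentage of simulated exomes whose spiked-in gene is ranked first."""
    hits = 0
    for exome, causal in simulated:
        ranking = rank(exome)
        if ranking and ranking[0] == causal:
            hits += 1
    return 100.0 * hits / len(simulated)


if __name__ == "__main__":
    sims = simulate_exomes([["GENE_A", "GENE_B"], ["GENE_C"]], ["GENE_D"])
    # A ranker that knows the answer scores 100%; input order scores 0%.
    print(top_hit_rate(sims, lambda ex: sorted(ex, key=lambda g: g != "GENE_D")))  # 100.0
    print(top_hit_rate(sims, lambda ex: ex))  # 0.0
```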

[Figure: % of exomes with the disease gene as top hit (0-100%) for all diseases, autosomal dominant, autosomal recessive (hom), and autosomal recessive (compound het), using variant scores, phenotypic relevance, and PHIVE]
Phenotype and variant data synergistically improve exome interpretation

Results
• Correct gene as top-scoring hit in 68.3% of exomes, out of an average of 272 post-filtering candidate genes
• 1.8- to 5.1-fold improvement in the percentage of candidate genes correctly ranked in first place, compared with using only pathogenicity and frequency data
• Shows the utility of structured phenotype data for computational analysis

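UDP Experiment
[Figure: workflow. The UDP diploid aligned cohort VCF file (18 families) passes through Mendelian filters to give Mendelian-filtered files (per family). These, together with phenotype profiles, are analyzed with Exomiser, PhenIX, and phenotype-only scoring, producing VCF files with phenotype and variant scores (per family)]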

Top de novo candidates for patient 2543
UDP2543
• Exomiser: STIM1, CYP2D6, MUC5B
• Phenotype only: ITGA7, PLEC, STIM1, PTGS1, TTN
• PhenIX: STIM1, RB1, DLEC1, CHRNB4, MUC5B, REPIN1, NBPF8, GPRIN3, TMEFF1, FLT3LG, OSM, FZD10, MUC12
Gene | Variant | MAF (ESP or 1000g) | Consequence | Predicted pathogenicity: SIFT, PolyPhen, MutTaster (0-1)
STIM1 | chr11:g.4045175A>T [0/1] | 0% | p.I115T | 1

UDP2543: phenotypic similarity
[Table: patient phenotypes compared with the Stim1 het mouse, OMIM:612783 (IMMUNODEFICIENCY 10; homozygous STIM1 mutations), and OMIM:160565 (MYOPATHY, TUBULAR AGGREGATE; heterozygous STIM1 mutations). Phenotypes shown include impaired platelet aggregation / abnormal platelet activation, thrombocytopenia / decreased platelet cell number, myopathy, generalized hypotonia / muscular hypotonia, proximal muscle weakness, petechiae, epistaxis, increased bleeding time, autoimmune hemolytic anemia, delayed gross motor development, and Gower sign]

Proposed workflow for undiagnosed diseases
[Figure: suspected genetic disease → DRG sequencing and deep phenotyping → variant analysis and computational phenotype analysis → top-ranked candidates → clinical rounds → choose best candidate → Sanger validation and cosegregation studies → diagnosis. Candidates that are inconsistent at clinical rounds or fail validation lead to excluding the candidate gene and reconsidering the short list]
Variant analysis
• Annotation sources: HGMD, MAF (dbSNP, ESP), ClinVar
• Prediction criteria: predicted pathogenicity, variant class, location in DRG target region
Computational phenotype analysis
• Ontology: HPO, MP
• Annotation sources: OMIM, Orphanet, MGI...
• Prediction criteria: semantic similarity, mode of inheritance

What constitutes an adequate phenotype annotation for an undiagnosed patient?

Defining a minimum phenotype standard:
1. Is the annotation specificity similar to or better than that of the corpus of available phenotype data?
2. Is the number of annotations per patient similar or better?
3. How do the ontology and annotation set differ across anatomical systems in terms of granularity? Does this change specificity requirements for phenotypic profiles?
4. How does the use of NOT annotations help further specify the uniqueness of an undiagnosed patient?
5. How do onset, temporal ordering, and severity affect specificity?

UDP phenotype annotation metrics
UDP annotations have an information content (IC) similar to that of the existing corpus, and a larger average number of annotations per disease/patient
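The slide does not state how IC is computed. A common definition, assumed here, is IC(t) = log2(N / n_t), where N is the number of annotated items (diseases or patients) in the corpus and n_t is the number annotated to t or any of its descendants. The sketch computes IC from a toy corpus, with a hypothetical hierarchy and disease IDs, and then the two profile metrics the slide compares: mean IC and number of annotations.

```python
import math
from statistics import mean
from typing import Dict, List, Set, Tuple

# Toy hierarchy and corpus; all terms and disease IDs are hypothetical.
IS_A: Dict[str, List[str]] = {
    "Seizures": ["Abnormality of the nervous system"],
    "Ataxia": ["Abnormality of the nervous system"],
    "Scoliosis": ["Abnormality of the skeletal system"],
    "Abnormality of the nervous system": ["Phenotypic abnormality"],
    "Abnormality of the skeletal system": ["Phenotypic abnormality"],
    "Phenotypic abnormality": [],
}
CORPUS: Dict[str, List[str]] = {
    "D1": ["Seizures"],
    "D2": ["Ataxia"],
    "D3": ["Seizures", "Scoliosis"],
    "D4": ["Scoliosis"],
}


def ancestors(term: str) -> Set[str]:
    """The term plus all transitive is_a ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(IS_A.get(t, []))
    return seen


def information_content(corpus: Dict[str, List[str]]) -> Dict[str, float]:
    """IC(t) = log2(N / n_t), counting each item once per (inferred) term."""
    counts: Dict[str, int] = {}
    for terms in corpus.values():
        expanded: Set[str] = set()
        for t in terms:
            expanded |= ancestors(t)
        for t in expanded:
            counts[t] = counts.get(t, 0) + 1
    n = len(corpus)
    return {t: math.log2(n / c) for t, c in counts.items()}


def profile_metrics(profile: List[str], ic: Dict[str, float]) -> Tuple[float, int]:
    """Mean IC of a profile's annotations, and the number of annotations."""
    return mean(ic[t] for t in profile), len(profile)


if __name__ == "__main__":
    ic = information_content(CORPUS)
    print(ic["Ataxia"], ic["Phenotypic abnormality"])    # 2.0 0.0
    print(profile_metrics(["Ataxia", "Scoliosis"], ic))  # (1.5, 2)
```

Rare, specific terms (Ataxia, used once) carry more IC than broad ones; a profile built from such terms is more discriminating.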

Anatomical annotation distribution in the corpus
Annotations are most numerous for the nervous, skeletal, and immune systems => these categories require greater specificity and more annotations

Annotation specificity meter
What about common traits, like blue eyes or acne?

Making the patient phenotype profiles as good as they can be
Category | Count | Example
Total requests from UDP | 614 |
Requests assigned to HPO terms | 423 | Chronic limb pain -> limb pain
Terms that need consideration by UDP | 145 | Expressive language -> delay? increase? abnormal?
Requests that belong in other parts of the patient record | 68 | Abnormal aCGH 12q21.1-12q.2 (662 kb duplication), paternal origin -> move to the genotype portion of the record
Contributing requests to the ontologies is a community effort, and quality profiling helps make our tools work better for everyone.

Limitations and ongoing work
• Adding negation to the algorithm
• Temporal ordering of phenotypes
• Leveraging severity, expressivity, and penetrance data

Additional tools leveraging structured phenotype data

The Monarch system
http://monarchinitiative.org

Monarch phenotype data
Species | Source | Unique genotypes/variants | Disease/phenotype associations
Mouse | MGI | 53,573 | 406,618
Zebrafish | ZFIN | 14,703 | 75,698
C. elegans | Wormbase | 116,106 | 411,154
Fruit fly | Flybase | 98,596 | 265,329
Human | OMIM | 26,372 | 27,798
Human | Orphanet | 2,872 | 5,095
Human | ClinVar | 62,437 | 178,424
Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources to date
Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse

ModelCompare: How do the models recapitulate the disease?
[Figure: late-onset Parkinson’s phenotypes compared with mouse phenotypes]

Network phenotype distribution
[Figure: gene network (Slc6a3, Dbh, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cx IV, Cox8a, Th; tyrosine metabolism) shown against a subset of late-onset Parkinson’s phenotypes (bradykinesia, depression, dysphagia, Lewy bodies)]

Phenotypes in common
[Figure: the same gene network and late-onset Parkinson’s phenotype subset, highlighting phenotypes in common: abnormal gait, ataxia, paralysis, bradykinesia, abnormal locomotion, and abnormality of central motor function]

Finding collaborators for functional validation
[Figure: a patient phenotype profile is used to find phenotyping experts]

Exome Walker: Network-based exploration of phenotypically similar diseases
http://compbio.charite.de/ExomeWalker/
Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008 Apr;82(4):949-58. doi: 10.1016/j.ajhg.2008.02.013.
[Figure: Bare lymphocyte syndrome type 1 protein-interaction network]
• Exploits the proximity, in the protein interaction network, of genes for phenotypically related diseases, and uses it to rank exome candidates
• Large boost in rankings of candidate genes using 250 disease gene families
• Prototype version online, manuscript in preparation
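The cited paper ranks candidates by a random walk with restart on the protein interaction network, seeded with genes of phenotypically related diseases. A minimal power-iteration sketch follows. It assumes an undirected, connected toy network with hypothetical gene names and a restart probability r = 0.7 chosen for illustration. Each step applies p(t+1) = (1 - r) W p(t) + r p0, where W is the column-normalized adjacency matrix and p0 is uniform over the seeds.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def random_walk_with_restart(edges: Iterable[Tuple[str, str]],
                             seeds: Iterable[str],
                             restart: float = 0.7,
                             tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Steady-state visiting probabilities of a walk that returns to the seeds."""
    nbrs: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:  # undirected network
        nbrs[a].add(b)
        nbrs[b].add(a)
    nodes = sorted(nbrs)
    seed_set = {s for s in seeds if s in nbrs}
    if not seed_set:
        raise ValueError("no seed genes are present in the network")
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Each node passes its probability equally to its neighbors
        # (column-normalized adjacency), then mixes in the restart vector.
        new = {n: (1 - restart) * sum(p[m] / len(nbrs[m]) for m in nbrs[n])
                  + restart * p0[n]
               for n in nodes}
        delta = sum(abs(new[n] - p[n]) for n in nodes)
        p = new
        if delta < tol:
            break
    return p


if __name__ == "__main__":
    network = [("SEED", "G1"), ("G1", "G2"), ("G2", "G3")]
    scores = random_walk_with_restart(network, ["SEED"])
    # Candidates closer to the seed score higher: G1 > G2 > G3.
    print(sorted(scores, key=scores.get, reverse=True))
```

Exome candidates would then be ordered by their steady-state score, so a variant gene sitting near known genes for similar diseases rises in the list.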

PhenoViz: Integrate all human, mouse, and fish data to understand CNVs
Desktop application for differential diagnostics in CNVs
• Explains manifestations of CNV diseases based on the genes contained in the CNV; e.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin
• Doubles the number of explanations by using model organism data
Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
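The core idea, explaining a CNV patient's phenotypes by the genes the CNV contains, can be sketched as an interval-overlap step followed by an annotation lookup. The coordinates and the genes other than ELN are placeholders, not real genomic positions or annotations, and the function names are hypothetical. In the tool, gene annotations would come from human, mouse, and fish phenotype data. Intervals are treated as closed, and a gene counts as contained if it overlaps the CNV at all.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), closed coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def explain_cnv(cnv: Interval, genes: Dict[str, Interval],
                annotations: Dict[str, Set[str]],
                patient_phenotypes: List[str]) -> Dict[str, List[str]]:
    """For each patient phenotype, list genes in the CNV annotated with it."""
    contained = [g for g, loc in genes.items() if overlaps(cnv, loc)]
    return {p: sorted(g for g in contained if p in annotations.get(g, set()))
            for p in patient_phenotypes}


if __name__ == "__main__":
    cnv = ("chr7", 1000, 5000)  # placeholder coordinates
    genes = {
        "ELN": ("chr7", 2000, 2500),
        "GENE_B": ("chr7", 4800, 6000),
        "GENE_C": ("chr7", 7000, 8000),  # outside the CNV
    }
    annotations = {
        "ELN": {"Supravalvular aortic stenosis"},
        "GENE_C": {"Intellectual disability"},
    }
    print(explain_cnv(cnv, genes, annotations,
                      ["Supravalvular aortic stenosis", "Intellectual disability"]))
    # {'Supravalvular aortic stenosis': ['ELN'], 'Intellectual disability': []}
```

Phenotypes left unexplained (here, intellectual disability) are where adding model organism annotations can supply additional candidate explanations.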

Conclusions
• Cross-species phenotype data can be used to perform semantic similarity comparisons
• Structured phenotype data for rare and undiagnosed disease patients can aid candidate evaluation
• We are experimenting with these methods for UDP patient phenotypes to aid candidate prioritization, identify models, explore mechanisms, and find collaborators

Acknowledgments
NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Neal Boerkoel, Cyndi Tifft, Bill Gahl
OHSU: Nicole Vasilevsky, Matt Brush
Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall
UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone
U of Pitt: Chuck Borromeo, Jeremy Espino, Harry Hochheiser
Sanger: Anika Oellrich, Jules Jacobsen, Damian Smedley
Toronto: Marta Girdea, Sergiu Dumitriu, Mike Brudno
JAX: Cynthia Smith
Charité: Sebastian Köhler, Sandra Doelken, Sebastian Bauer, Peter Robinson
Funding:
NIH Office of Director: 1R24OD011883
NIH-UDP: HHSN268201300036C
