Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, Organized by the National Centre for Rare Diseases
Istituto Superiore di Sanità (ISS), Rome, Italy.
Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.
The Application of the Human Phenotype Ontology
1. THE APPLICATION OF THE HUMAN PHENOTYPE ONTOLOGY
Melissa Haendel
Sept 19th, 2014
II International Summer School: RARE DISEASE AND ORPHAN DRUG REGISTRIES
2. OUTLINE
Why phenotyping is hard
About Ontologies
Diagnosing known diseases
Getting the phenotype data
How much phenotyping is enough?
Model organism data for undiagnosed
diseases
5. THE CONSTELLATION OF PHENOTYPES SIGNIFIES THE DISEASE –
A ‘PROFILE’
http://www.learn.ppdictionary.com/prenatal_development_2.htm
http://www.pyroenergen.com/articles07/downs-syndrome.htm
http://www.theguardian.com/commentisfree/2009/oct/27/downs-syndrome-increase-terminations
http://anthro.palomar.edu/abnormal/abnormal_4.htm
7. SEARCHING FOR PHENOTYPES USING
TEXT ALONE IS INSUFFICIENT
OMIM Query # Records
“large bone” 785
“enlarged bone” 156
“big bone” 16
“huge bones” 4
“massive bones” 28
“hyperplastic bones” 12
“hyperplastic bone” 40
“bone hyperplasia” 134
“increased bone growth” 612
8. TERMS SHOULD BE WELL DEFINED SO
THEY GET USED PROPERLY
We need to capture synonyms and use unique
labels
9. SO WHAT IS THE PROBLEM?
Obviously similar phenotype descriptions mean the same
thing to you, but not to a computer:
generalized amyotrophy
generalized muscle atrophy
muscular atrophy, generalized
Many publications have little information about the actual
phenotypic features seen in patients with particular mutations
Databases cannot talk to one another about phenotypes
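One way to make such variant descriptions computable is to map every synonym onto a single term identifier. The following is a minimal sketch under stated assumptions: the identifier EX:0000001 is a placeholder rather than a real HPO ID, and the normalization rule (lowercase, drop punctuation, sort the words) is only an illustration of the idea, not how any particular database does it.

```python
import re

# Hypothetical synonym table: normalized key -> (placeholder ID, preferred label).
SYNONYMS = {}


def normalize(text):
    """Lowercase, strip punctuation, and sort words so word order is ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(sorted(words))


def register(term_id, label, synonyms):
    """Store the preferred label and all synonyms under one unique ID."""
    for name in [label, *synonyms]:
        SYNONYMS[normalize(name)] = (term_id, label)


def lookup(text):
    """Return (ID, preferred label) for a free-text description, or None."""
    return SYNONYMS.get(normalize(text))


# EX:0000001 is a made-up identifier used only for this example.
register(
    "EX:0000001",
    "Generalized muscle atrophy",
    ["generalized amyotrophy", "muscular atrophy, generalized"],
)

print(lookup("muscular atrophy, generalized"))
print(lookup("generalized muscle, atrophy"))
```

All three spellings from the slide resolve to the same identifier, while an unregistered phrase such as "big bone" returns None.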
11. ONTOLOGIES CAN HELP.
A controlled vocabulary of logically defined,
inter-related terms used to annotate data
Use of common or logically related terms across
databases enables integration
Relationships between terms allow annotations
to be grouped in scientifically meaningful ways
Reasoning software enables computation of
inferred knowledge
Some well known ontologies are SNOMED-CT,
Foundational Model of Anatomy, Gene Ontology,
Linnean Taxonomy of species
13. HUMAN PHENOTYPE ONTOLOGY
Used to annotate:
• Patients
• Disorders
• Genotypes
• Genes
• Sequence variants
[Figure: excerpt of the HPO hierarchy. Terms shown: Abnormality of pancreatic islet cells; Reduced pancreatic beta cells; Abnormality of endocrine pancreas physiology; Abnormality of exocrine pancreas physiology; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas]
Mappings to SNOMED-CT,
UMLS, MeSH, ICD, etc.
Köhler et al. The Human Phenotype Ontology project: linking molecular biology and
disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
14. USING A CONTROLLED VOCABULARY TO LINK
PHENOTYPES TO DISEASES
Failure to
Thrive
Chromosome
21 Trisomy
Flat Head
Abnormal
Ears
Umbilical
Hernia
Broad Hands
15. SURVEY OF ANNOTATIONS IN DISEASE CORPUS
7000+ diseases OMIM +
Orphanet + Decipher
(ClinVar coming soon)
111,000+ annotations
Phenotype annotations are unevenly distributed across
different anatomical systems
16. HOW DOES HPO RELATE TO OTHER
CLINICAL VOCABULARIES?
Winnenburg and Bodenreider, ISMB PhenoDay, 2014
17. LOGICAL TERM DEFINITION
Definitions are of the following Genus-Differentia form:
X = a Y which has one or more differentiating characteristics,
where Y is the is_a parent of X.
Definition of a cylinder:
Surface formed by the set of lines
perpendicular to a plane, which pass
through a given circle in that plane.
is_a is_a
Definition: Blue cylinder = Cylinder that has color blue.
Definition: Red cylinder = Cylinder that has color red.
18. ABOUT REASONERS
A piece of software able to infer logical consequences
from a set of asserted facts or axioms.
They are used to check the logical consistency of the
ontologies and to extend the ontologies with "inferred"
facts or axioms
For example, a reasoner would infer:
Major premise: All mortals die.
Minor premise: Some men are mortals.
Conclusion: Some men die.
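The simplest kind of inference a reasoner performs is following is_a links transitively. The sketch below is a toy forward-chaining loop over the cylinder example, not a real OWL reasoner; the "Surface" parent is taken from the wording of the cylinder definition, and the function name is an assumption made for illustration.

```python
def infer_ancestors(asserted):
    """Return, for every class, all superclasses implied by transitivity of is_a."""
    inferred = {cls: set(parents) for cls, parents in asserted.items()}
    changed = True
    while changed:  # repeat until no new fact can be added
        changed = False
        for parents in inferred.values():
            for parent in list(parents):
                extra = inferred.get(parent, set()) - parents
                if extra:
                    parents |= extra
                    changed = True
    return inferred


# Asserted is_a facts only; the inferred ones are computed.
asserted = {
    "Blue cylinder": {"Cylinder"},
    "Red cylinder": {"Cylinder"},
    "Cylinder": {"Surface"},
}

inferred = infer_ancestors(asserted)
print(sorted(inferred["Blue cylinder"]))
```

Nobody asserted that a blue cylinder is a surface; that fact appears only after the inference loop runs.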
19. PHENOTYPES CAN BE CLASSIFIED IN
MULTIPLE WAYS
Abnormality
of the eye
Abnormal eye
morphology
Vitreous
hemorrhage
Abnormality of the
cardiovascular
system
Abnormal eye
physiology
Hemorrhage
of the eye
Internal
hemorrhage
Abnormality
of the globe
Abnormality of
blood circulation
20. PHENOTYPE MATCHING
Patient Phenotype
Resting tremors
REM disorder
Unstable posture
Myopia
Neuronal loss in
Substantia Nigra
Phenotype of known
variant
Resting tremors
REM disorder
Unstable posture
Nystagmus
Neuronal loss in
Substantia Nigra
21. [Figure: three panels over an excerpt of the HPO eye hierarchy (abn. of the eye; abn. of the ocular region; abn. of the eyelid; abn. of globe localization or size; abn. of the palpebral fissures). Legend: syndrome term, query term, overlap between query and disease.]
a) Noonan Syndrome: hypertelorism, downward slanting palpebral fissures
b) Opitz Syndrome: hypertelorism, telecanthus
c) Query (Q): hypertelorism, downward slanting palpebral fissures
Best matches against Noonan Syndrome: downward slanting palpebral fissures (3.78), hypertelorism (3.05)
Best matches against Opitz Syndrome: hypertelorism (3.05), abn. of the eyelid (2.45, the IC of abn. of the eyelid)
sim(Q, Noonan) = (3.78 + 3.05) / 2 = 3.42
sim(Q, Opitz) = (2.45 + 3.05) / 2 = 2.75
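The arithmetic on this slide can be reproduced with a short sketch: for each query term, take the information content (IC) of the most informative ancestor it shares with any disease term, then average over the query terms. The three IC values come from the slide; the is_a edges, the assignment of 3.78 to downward slanting palpebral fissures, and the treatment of terms without a listed IC as 0.0 are assumptions made to keep the example small.

```python
# Reflexive ancestor sets. The edges are assumptions read off the slide's
# figure; higher-level terms whose IC is not given are omitted.
ANCESTORS = {
    "hypertelorism": {"hypertelorism", "abn. of globe localization or size"},
    "downward slanting palpebral fissures": {
        "downward slanting palpebral fissures",
        "abn. of the palpebral fissures",
        "abn. of the eyelid",
    },
    "telecanthus": {"telecanthus", "abn. of the eyelid"},
}

# IC values shown on the slide; any other term is treated as IC 0.0 here.
IC = {
    "downward slanting palpebral fissures": 3.78,
    "hypertelorism": 3.05,
    "abn. of the eyelid": 2.45,
}


def term_similarity(a, b):
    """IC of the most informative common ancestor of two terms."""
    common = ANCESTORS[a] & ANCESTORS[b]
    return max((IC.get(t, 0.0) for t in common), default=0.0)


def profile_similarity(query, disease):
    """Average, over query terms, of the best match in the disease profile."""
    return sum(max(term_similarity(q, d) for d in disease) for q in query) / len(query)


query = ["hypertelorism", "downward slanting palpebral fissures"]
noonan = ["hypertelorism", "downward slanting palpebral fissures"]
opitz = ["hypertelorism", "telecanthus"]

sim_noonan = profile_similarity(query, noonan)
sim_opitz = profile_similarity(query, opitz)
print(round(sim_noonan, 2), round(sim_opitz, 2))
```

The query ranks Noonan Syndrome above Opitz Syndrome because the second query term matches Opitz only through the more general, less informative term abn. of the eyelid.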
24. THE YET-TO-BE DIAGNOSED PATIENT
Known disorders not recognized during
prior evaluations?
Atypical presentation of known
disorders?
Combinations of several disorders?
Novel, unreported disorder?
25. EXOME ANALYSIS
Remove off-target, common variants,
and variants not in known disease
causing genes
Recessive, de novo filters
http://compbio.charite.de/PhenIX/
Target panel of 2741 known
Mendelian disease genes
Compare phenotype profiles using data from:
HGMD, ClinVar, OMIM, Orphanet
Zemojtel et al. Sci Transl Med 3 September 2014:
Vol. 6, Issue 252, p.252ra123
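The filtering and ranking steps on this slide can be sketched as follows. This is an illustration of the general idea only, not the PhenIX implementation: the Variant fields, the 1% allele-frequency cutoff, the gene GENE_X, and the phenotype scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    allele_frequency: float  # population frequency, 0.0 to 1.0
    on_target: bool          # inside the sequenced target region


def filter_variants(variants, panel_genes, max_af=0.01):
    """Drop off-target variants, common variants, and genes outside the panel."""
    return [
        v for v in variants
        if v.on_target and v.allele_frequency <= max_af and v.gene in panel_genes
    ]


def rank_genes(variants, phenotype_score):
    """Order the surviving genes by a phenotype-similarity score, best first."""
    genes = {v.gene for v in variants}
    return sorted(genes, key=lambda g: phenotype_score.get(g, 0.0), reverse=True)


variants = [
    Variant("PTPN11", 0.0, True),
    Variant("NF1", 0.0001, True),
    Variant("NF1", 0.2, True),      # too common
    Variant("GENE_X", 0.0, True),   # hypothetical gene, not on the panel
    Variant("PTPN11", 0.0, False),  # off-target
]
panel = {"PTPN11", "NF1"}
scores = {"PTPN11": 3.42, "NF1": 2.75}  # made-up phenotype scores

kept = filter_variants(variants, panel)
ranking = rank_genes(kept, scores)
print(ranking)
```

Inheritance filters (recessive, de novo) would be a further step and need pedigree information that this sketch leaves out.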
26. PHENIX PERFORMANCE TESTING
Figure removed due to restrictions. Please see the paper:
http://stm.sciencemag.org/content/6/252/252ra123.full
Simulated datasets were created by spiking HGMD mutations into VCF files generated from DAG panels, from which the causative mutation had been removed
27. CONTROL PATIENTS WITH KNOWN
MUTATIONS
Inheritance: Genes (average rank)
AD: ACVR1, ATL1, BRCA1, BRCA2, CHD7 (4), CLCN7, COL1A1, COL2A1, EXT1, FGFR2 (2), FGFR3, GDF5, KCNQ1, MLH1 (2), MLL2/KMT2D, MSH2, MSH6, MYBPC3, NF1 (6), P63, PTCH1, PTH1R (2), PTPN11 (2), SCN1A, SOS1, TRPS1, TSC1, WNT10A (average rank 1.7)
AR: ATM, ATP6V0A2, CLCN1 (2), LRP5, PYCR1, SLC39A4 (average rank 5)
X: EFNB1, MECP2 (2), DMD, PHF6 (average rank 1.8)
28. WORKFLOW FOR CLINICAL EXOME
Flowchart steps: Suspected genetic disease; DRG sequencing; Deep phenotyping; Analysis; Top ranked candidates; Clinical rounds; Choose best candidate; Sanger validation (Passes / Fails); Cosegregation studies (Consistent / Inconsistent); Exclude candidate gene; Reconsider short list; Diagnosis
Variant Analysis
Annotation sources: HGMD, MAF (dbSNP, ESP), ClinVar
Prediction criteria: Predicted pathogenicity, Variant class, Location in DRG target region
Computational Phenotype Analysis
Ontology: HPO, MP
Annotation sources: OMIM, Orphanet, MGI...
Prediction criteria: Semantic similarity, Mode of inheritance
29. PHENIX HELPED DIAGNOSE 11/40 PATIENTS
global developmental delay (HP:0001263)
delayed speech and language development (HP:0000750)
motor delay (HP:0001270)
proportionate short stature (HP:0003508)
microcephaly (HP:0000252)
feeding difficulties (HP:0011968)
congenital megaloureter (HP:0008676)
cone-shaped epiphysis of the phalanges of the hand (HP:0010230)
sacral dimple (HP:0000960)
hyperpigmented/hypopigmented macules (HP:0007441)
hypertelorism (HP:0000316)
abnormality of the midface (HP:0000309)
flat nose (HP:0000457)
thick lower lip vermilion (HP:0000179)
thick upper lip vermilion (HP:0000215)
full cheeks (HP:0000293)
short neck (HP:0000470)
31. SKELETOME PATIENT ARCHIVE
Integration with the HPO, Orphanet, and Monarch Initiative
Automated phenotyping from clinical summaries
Collaborative diagnosis
University of Queensland, University of Sydney
32. THE FACES OF RARE DISEASES
Screening and diagnosis
Treatment monitoring
Surgical planning and audit
Genotype-phenotype correlation
Cross-species comparisons
Face to text conversion for text mining
Non-invasive, non-irradiating, deeply precise 3D facial analysis
University of Western Australia
35. USING ONTOLOGIES IN THE CLINIC
Ontologies are large (HPO has > 10,000 terms) and
difficult to navigate
Mapping data to an ontology post-visit is time
consuming and prone to error
Best time to phenotype using ontologies is during the
patient visit
Goals of PhenoTips
Make deep phenotyping simple
Make it “faster than paper”
36. PhenomeCentral is a Matchmaker
Lets you know about other similar patients
Lets you easily connect with other users
Each Patient Record can be:
Public – Anyone can see the record
Private – Only specified users/consortia can see
the record
Matchable – The record cannot be seen, but can
be “discovered” by users who submit similar
patients
37. STEP 1: ADD PATIENT
Can use the interface
built into
PhenomeCentral
Can export data
directly from a local
PhenoTips instance
Add a vcf file (or list
of genes)
Set each record as
Private, Public or
Matchable
41. HOW MUCH PHENOTYPING IS ENOUGH?
How many annotations…?
How many different categories?
How many within each?
42. Not everything that counts can be counted and not everything that can be
counted counts -Albert Einstein
43. METHOD: DERIVE BY CATEGORY
REMOVAL
Remove annotations that are
subclasses of a single high-level node
Repeat for each 1° subclass
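The category-removal step can be sketched as below. This is a minimal illustration on a toy hierarchy; the three-level class tree is an assumption for the example and does not reproduce the real HPO structure, and the function names are hypothetical.

```python
def descendants(root, children):
    """All classes below root (excluding root itself) in a parent->children map."""
    found, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def remove_category(profile, category, children):
    """Derived profile with every annotation under one high-level node removed."""
    drop = descendants(category, children) | {category}
    return {term for term in profile if term not in drop}


def derive_by_category_removal(profile, top, children):
    """Repeat the removal for each direct subclass of the top node."""
    return {cat: remove_category(profile, cat, children) for cat in children.get(top, ())}


# Toy hierarchy, assumed for illustration only.
children = {
    "Phenotypic abnormality": ["Abnormality of the skeletal system", "Abnormality of the eye"],
    "Abnormality of the skeletal system": ["Kyphoscoliosis", "Bowing of the long bones"],
    "Abnormality of the eye": ["Myopia"],
}
profile = {"Kyphoscoliosis", "Bowing of the long bones", "Myopia"}

derived = derive_by_category_removal(profile, "Phenotypic abnormality", children)
print(derived["Abnormality of the skeletal system"])
```

Each derived profile is then compared back to the original disease profile to see how much the missing category matters.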
47. SEMANTIC SIMILARITY ALGORITHMS ARE
ROBUST IN THE FACE OF MISSING
INFORMATION
[Figure axes: Similarity of Derived Disease to Original; Derived Disease Profile Rank (avg)]
92% of derived diseases are most similar to the original disease
Severity of impact follows proportion of phenotype
48. METHOD: DERIVE BY LIFTING
Iteratively map each class to its direct
superclass(es)
Keep only leaf nodes
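One lifting step can be sketched as follows: replace every term by its direct superclasses, then keep only the most specific terms that remain. The toy hierarchy and the choice to leave a root term in place when it has no parent are assumptions for this example.

```python
def ancestors(term, parents):
    """All superclasses of a term (excluding the term) in a child->parents map."""
    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def lift(profile, parents):
    """Map each term to its direct superclass(es), then keep only leaf nodes."""
    lifted = set()
    for term in profile:
        direct = set(parents.get(term, ()))
        lifted |= direct if direct else {term}  # a root term stays where it is
    # A term is not a leaf if it is an ancestor of another term in the set.
    return {
        t for t in lifted
        if not any(t in ancestors(other, parents) for other in lifted if other != t)
    }


# Toy hierarchy, assumed for illustration only.
parents = {
    "Myopia": ["Abnormality of refraction"],
    "Abnormality of refraction": ["Abnormality of the eye"],
    "Abnormality of the eyelid": ["Abnormality of the eye"],
    "Abnormality of the eye": [],
}

step1 = lift({"Myopia", "Abnormality of the eyelid"}, parents)
step2 = lift(step1, parents)
print(step1, step2)
```

Applying lift repeatedly yields progressively more general derived profiles, which is what makes the specificity experiment possible.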
49. SEMANTIC SIMILARITY ALGORITHMS ARE
SENSITIVE TO SPECIFICITY OF INFORMATION
[Figure axes: Similarity of Derived Disease to Original; Derived Disease Profile Rank]
Severity of impact increases with more-general phenotypes
52. WHAT TO DO WHEN WE CAN’T DIAGNOSE
WITH A KNOWN DISEASE?
54. HOW MUCH PHENOTYPE DATA?
Human genes have poor phenotype coverage
GWAS + ClinVar + OMIM
55. HOW MUCH PHENOTYPE DATA?
Human genes have poor phenotype coverage
What else can we leverage?
GWAS + ClinVar + OMIM
56. HOW MUCH PHENOTYPE DATA?
Human genes have poor phenotype coverage
What else can we leverage? …animal models
Orthology via PANTHER v9
57. COMBINED, HUMAN AND MODEL PHENOTYPES
CAN BE LINKED TO >75% HUMAN GENES
Orthology via PANTHER v9
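The combined-coverage figure is a set union: a human gene counts as covered if it has a phenotype directly or if an ortholog has one in any model. A minimal sketch with toy gene identifiers follows; the sets below are invented to show the calculation and are not real counts.

```python
def coverage(all_genes, *annotated_sets):
    """Fraction of genes that appear in at least one of the annotated sets."""
    covered = set().union(*annotated_sets) & all_genes
    return len(covered) / len(all_genes)


# Toy data: ten human genes, G1 to G10.
all_genes = {f"G{i}" for i in range(1, 11)}
human_direct = {"G1", "G2", "G3", "G4"}         # phenotype known in human
via_mouse = {"G3", "G4", "G5", "G6", "G7"}      # phenotype on a mouse ortholog
via_fish = {"G7", "G8"}                         # phenotype on a fish ortholog

direct_only = coverage(all_genes, human_direct)
combined = coverage(all_genes, human_direct, via_mouse, via_fish)
print(direct_only, combined)
```

Because the sets overlap, the combined coverage is less than the sum of the individual coverages but still well above any single source.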
58. EACH MODEL CONTRIBUTES DIFFERENT
PHENOTYPES
Data from MGI, ZFIN, & HPO, reasoned over with cross-species phenotype
ontology
https://code.google.com/p/phenotype-ontologies/
60. SOLUTION: BRIDGING SEMANTICS
[Figure: Uberon bridging species-specific anatomy ontologies. Relation types shown: is_a (SubClassOf), is_a (taxon equivalent), part_of, develops_from, capable_of, only_in_taxon. Classes shown: anatomical structure, organ, organ part, respiration organ, lung, lung bud, lung primordium, respiratory primordium, foregut, endoderm of foregut, endoderm, alveolus, alveolus of lung, alveolar sac, pulmonary acinus, swim bladder. Species-specific classes: FMA:lung, MA:lung, MA:lung alveolus, FMA:pulmonary alveolus, EHDAA:lung bud. Other linked terms: GO: respiratory gaseous exchange, NCBITaxon: Mammalia, NCBITaxon: Actinopterygii.]
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative
multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
61. PHENOTYPE REPRESENTATION REQUIRES
MORE THAN “PHENOTYPE ONTOLOGIES”
Gene Ontology: glucose metabolism (GO:0006006). Gene/protein function data
Chemical: glucose (CHEBI:17234), pyruvate (CHEBI:15361). Metabolomics, toxicogenomics data
Disease: type II diabetes mellitus (DOID:9352). Disease & phenotype data
Cell: pancreatic beta cell (CL:0000169). Transcriptomic data
62. MODELS BASED ON PHENOTYPIC
SIMILARITY
Washington, Haendel, et al. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype
Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247
63. OWLSIM: PHENOTYPE SIMILARITY
ACROSS PATIENTS OR ORGANISMS
Patient phenotypes: Resting tremors; REM disorder; Shuffling gait; Unstable posture; Neuronal loss in Substantia Nigra; Constipation; Hyposmia
Model phenotypes: stereotypic behavior; abnormal EEG; decreased stride length; poor rotarod performance; axon degeneration; decreased gut peristalsis; failure to find food
Shared, more general phenotypes: abnormal motor function; sleep disturbance; abnormal locomotion; abnormal coordination; CNS neuron degeneration; abnormal digestive physiology; abnormal olfaction
https://code.google.com/p/owltools/wiki/OwlSim
64. MONARCH PHENOTYPE DATA
Columns: Species, Data source, Genes, Genotypes, Variants, Phenotype annotations, Diseases
mouse MGI 13,433 59,087 34,895 271,621
fish ZFIN 7,612 25,588 17,244 81,406
fly Flybase 27,951 91,096 108,348 267,900
worm Wormbase 23,379 15,796 10,944 543,874
human HPOA 112,602 7,401
human OMIM 2,970 4,437 3,651
human ClinVar 3,215 100,523 445,241 4,056
human KEGG 2,509 3,927 1,159
human ORPHANET 3,113 5,690 3,064
human CTD 7,414 23,320 4,912
Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources to date
Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse
65. EXOMISER
METHOD
Cohort VCF file
Exomiser filters: Mendelian (Homo rec, De novo dom, Compound het, X-linked), Frequency
HP
Candidates
https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2
66. EXOMISER RESULTS ON NIH UNDIAGNOSED
DISEASE PROGRAM PATIENTS
9 previously diagnosed families
Identified causative variants with a
rank of at least 7/408 potential
variants
21 families without identified
disorders
We have now prioritized variants in
STIM1, ATP13A2, PANK2, and CSF1R
in 5 different families (2 STIM1
families)
Bone et al., submitted
67. UDP_2731
Patient phenotypes: Gait apraxia; Spasticity; Thyroid stimulating hormone excess; Behavioural/Psychiatric Abnormality
Sh3kbp1 tm1Ivdi -/- phenotypes: hyperactivity; hyperactivity; increased dopamine level; increased exploration in new environment
Shared, more general phenotypes: abnormal locomotor behavior; Abnormal voluntary movement; Abnormality of the endocrine system; Behavioral abnormality
70. PHENOVIZ: INTEGRATE ALL HUMAN,
MOUSE, AND FISH DATA TO UNDERSTAND
CNVS
Desktop application for differential diagnostics in CNVs
Explain manifestations of CNV diseases based on genes contained in CNV
E.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin
Double the number of explanations using model data
Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
72. WHO USES THE HPO?
Bayés, Àlex, et al. Nature neuroscience 2011
Castellano, Sergi, et al. PNAS 2014
Corpas, Manuel, et al. Current Protocols in Human Genetics 2012
Sifrim, Alejandro, et al. Nature methods 2013
Lappalainen, Ilkka, et al. Nucleic acids research 2013
Firth, Helen V., and Caroline F. Wright. Developmental Medicine & Child Neurology 2011
Many more…
73. ADVANTAGES OF HPO
Widely used, flexible, freely available, and community
supported resource
Prioritization of candidate variants through tools such as
PhenIX, Exomiser, and others
Extensive links to model organism ontologies, allowing
selection of optimal models for wet-lab validation and
research, and collaborators
Intuitive clinical interfaces built into tools such as
PhenoTips, Cartagenia, and others
Ability to easily share data with key international projects
(Decipher/DDD, RD-Connect, PhenomeCentral,
Matchmaker Exchange, etc.)
74. LIMITATIONS
Quantitative vs. qualitative – Much of clinical data is
quantitative lab data with reference standards. It is possible
to convert based on ±3 SD, but no way to record the
reference measure/population yet.
Temporal presentation – ontologies can support temporal
ordering, but data capture tools don’t yet capture this and
the comparison algorithms don’t yet take it into account
Severity – semantic encoding is available, but simple in
comparison to phenotype-specific measures
Emerging ontology – some areas have poor coverage, such
as nervous system, behavior, and imaging results. Need to
represent the assays in these contexts.
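The ±3 SD conversion mentioned under the first limitation can be sketched as a simple z-score rule. The function name, the labels, and the reference mean and SD used below are assumptions for illustration; as the slide notes, the reference population itself is not yet recorded anywhere, so it is passed in explicitly here.

```python
def qualitative_call(value, ref_mean, ref_sd, k=3.0):
    """Convert a quantitative measurement into a qualitative phenotype label.

    A value more than k standard deviations from the reference mean is
    called "increased" or "decreased"; anything else is "normal".
    """
    if ref_sd <= 0:
        raise ValueError("reference SD must be positive")
    z = (value - ref_mean) / ref_sd
    if z > k:
        return "increased"
    if z < -k:
        return "decreased"
    return "normal"


# Hypothetical reference population: mean 100, SD 10 (arbitrary units).
print(qualitative_call(135, 100, 10))
print(qualitative_call(120, 100, 10))
print(qualitative_call(65, 100, 10))
```

The label alone discards both the measured value and the reference used, which is exactly the information loss the slide describes.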
75. ACKNOWLEDGMENTS
NIH-UDP
William Bone
Murat Sincan
David Adams
Amanda Links
David Draper
Joie Davis
Neal Boerkoel
Cyndi Tifft
Bill Gahl
OHSU
Nicole Vasilevsky
Matt Brush
Bryan Laraway
Shahim Essaid
Lawrence Berkeley
Nicole Washington
Suzanna Lewis
Chris Mungall
UCSD
Amarnath Gupta
Jeff Grethe
Anita Bandrowski
Maryann Martone
U of Pitt
Chuck Boromeo
Jeremy Espino
Becky Boes
Harry Hochheiser
Sanger
Anika Oehlrich
Jules Jacobson
Damian Smedley
Toronto
Marta Girdea
Sergiu Dumitriu
Heather Trang
Mike Brudno
JAX
Cynthia Smith
Charité
Sebastian Kohler
Sandra Doelken
Sebastian Bauer
Peter Robinson
Funding:
NIH Office of Director: 1R24OD011883
NIH-UDP: HHSN268201300036C, HHSN268201400093P
76. WHERE TO GET HPO, AND HOW TO
REQUEST NEW CONTENT
We need you!
Browse in the following places:
http://www.human-phenotype-ontology.org/
http://purl.bioontology.org/ontology/HP
Get the file:
http://purl.obolibrary.org/obo/hp.owl
Request content:
https://sourceforge.net/p/obo/human-phenotype-requests/new/
More Documentation:
https://code.google.com/p/phenotype-ontologies/
Editor's notes
A good example of this can be seen here. The average person has had enough experience with Down’s Syndrome that they are likely able to notice that all three of these patients have it. However, if you asked them to describe this phenotype in short phrases that can be agreed upon by the majority of people, can be used to identify the disorder, and ideally can be easily used by a computer, they are going to have a difficult time. It is a collection of phenotypes “together” that defines a disease, and it is difficult for someone who does not have the proper training to parse this information out.
Down’s Syndrome good example
Average person can identify Down’s, but can’t list out the phenotypes for a computer
Unless clinically TRAINED, tough to outline a phenotype
But, through hard work, and being very specific with our description of phenotype we can begin to approach making this information manageable and use it to identify disease.
Can make a COMPUTABLE list of phenotype with hard work
How can we characterize the diseases? Well, a distribution of the phenotypes across all diseases reveals that most have phenotypes affecting the nervous system, while the least have connective tissue phenotypes.
7401 diseases
99,045 annotations
There are 20 categories in all. Note that they are not additive, as some phenotypes might belong to two categories. For example, some eye and ear phenotypes also belong to the head and neck.
MESSAGE: Phenotype annotations unevenly distributed in different anatomical systems.
As you might expect, we use this phenotype ontology to search for known variants that have a similar phenotype to our patient. Using just the HPO and the diseases annotated with HPO terms, we can compare to known human diseases, but as we are all painfully aware, we do not know the consequences of mutations in every gene in the human genome.
We use the patient phenotype and search for known variants with a similar phenotype
Can do this with humans, but we don’t know all human diseases
PhenIX uses human data and predicted deleteriousness
HGMD, ClinVar, OMIM, Orphanet
HGMD mutations were inserted into variant files from DAG panels from which the causative mutations had been removed and phenotypic annotations of the corresponding diseases were extracted from the HPO database. The genes were ranked with PhenIX. Results were simulated either on the entire disease set (All) or by filtering for known autosomal dominant (AD) or autosomal recessive (AR) diseases (fig. S2). A total of 8504 (All), 3471 (AD), and 5006 (AR) simulations were performed.
See final figure in paper: http://stm.sciencemag.org/content/6/252/252ra123.full
A community-driven knowledge curation platform for skeletal dysplasias
Editorial process for curating phenotype and genotype knowledge on skeletal dysplasias
Integration with HPO
1: Diagnostic Facial Signature via a vector diagram: shows directional differences versus a normative data set. It is a 2D representation of a 3D image. The ends of the coloured lines represent the patient’s (or averaged groups of patients’ ) image(s). The grey scale represents the normative data (normal equivalent). In this case, if the tips of the arrows are seen sticking out from the grey surface, then the patient (or group of patient’s) is characterised by a protrusion of that region. E.g. here the patient has a prominent nasal tip and prominent lips. If you were to rotate the 3D image you would see the lines in the cheek region pointing inwards i.e. the patient has a flat cheek region compared to the reference range.
2: Monitoring treatment in a multisystem disorder. Progressive reduction in facial dysmorphology (anomaly), top to bottom, over treatment for a rare metabolic condition (Mucopolysaccharidosis type 1)
3: Text mining: converting face to text, specifically, elements of morphology terms.
When we have a patient with an undiagnosed/rare disease, we want to be able to search the whole landscape of knowledge.
How can we effectively utilize phenotype information collected about the patients?
It all boils down to the question, how much is enough to be useful?
What does that information need to be like in order to be useful?
We have partnered with the UDP to try to help figure out how much information might be necessary to collect about the patients with rare diseases.
To illustrate the method, let’s take Schwartz-Jampel Syndrome. It has ~100 phenotype annotations, distributed across 17 phenotypic categories, the majority of which are in the skeletal system.
The underlying problem is in Hspg2, a proteoglycan that binds to and cross-links many extracellular matrix components and cell-surface molecules.
For this example, I’ve taken a subset of the phenotypes, and colored them by category.
We can test the role of category by creating a derived disease that removes all the phenotypes for that category as our “case”…
And then as a control, remove an equal amount of “information” from other categories.
In the case of Schwartz-Jampel Syndrome, removing only the skeletal phenotypes (which comprise 40% of the phenotype profile) significantly reduces its similarity, dropping it to only 86% similar, whereas removing the same amount of information from the controls gives an average of 91% similarity. In this case, there were 73 controls to compare to. We have performed these types of experiments for every disease profile in our corpus (approx. 8K).
The experiment therefore is:
Create a variety of “derived” less-specific diseases
Assess the change in similarity:
Is the derived disease still considered similar to the original disease?
…or more similar to a different disease?
Is it distinguishable beyond random?
Using the results, we can create a metric to define when a phenotype profile is unique enough to be useful in comparisons to other diseases and model systems.
This has been implemented at UDP so that clinicians can provide quality human data to be able to compare to animal models.
It has also been implemented on Monarch genotype pages, where one can see how well annotated any given model is.
This functionality is also available as a service call. We can also generate different types of reports based on these analyses to determine which diseases have the least coverage, which models are the least well annotated, etc.
Our abilities to link genotype to phenotype are constrained by our knowledgebase. To date, <40% of human genes have been directly linked to diseases or traits by ClinVar, OMIM, and GWAS. How are we going to discover the disease-gene links if the phenotype coverage is so poor?
(Of course, there may be many more links for GWAS, once we figure out what to do with all the variants that lie outside of gene boundaries.)
So, <40% of human genes have phenotypes. And when we look at the orthologs for each of the standard multi-cellular model organisms, there doesn’t appear to be any more than about 50% coverage for any given model.
But when put together, they bring the phenotypic coverage of human genes (either directly or inferred via orthology) up to nearly 80%. That is A LOT of coverage. How can we better tap that?
Since any computational methods rely heavily on the data, what does the available data look like?
The distribution of phenotype information per model genotype is different compared to human disease annotations.
For mouse, there’s a much higher representation of metabolic, cardiovascular, blood, and endocrine phenotypes available to compare;
For fish, there’s increased nervous, skeletal, head and neck, and cardiovascular, and connective tissue.
(Note that these do not include “normal” phenotypes for either diseases or genotypes.)
Each model brings something different to the phenotype landscape.
Different terminology is used to describe clinical manifestations than is used to describe model system biological features.
Examples include finding models of sirenomelia due to disruption of the lateral plate mesoderm, and helping to find models and gene candidates based on relationships in development.
Without additional knowledge and linking, computers can’t make the connections. These links take us from the molecular to the protein, to the cellular and anatomical, to the disease level of phenotypes
OWLsim computes semantic similarity between sets of phenotypes within and across species using the bridging semantics. Phenotypes in common from the bridging ontologies relate human clinical phenotypes with model organism phenotypes.
Examples include motor systems, olfaction, and digestion.
In this case, data encoded using the Human Phenotype Ontology has been made interoperable with mouse, zebrafish, and other model system ontologies. This also enables the use of more complex algorithms to detect similarity that are not based solely on mapping or string matching; e.g., constipation and decreased gut peristalsis are both subtypes of abnormal digestive system physiology.
Run through pipeline: Exomiser LOT is a version of Exomiser that is less restrictive as to which transcripts it recognizes (not as worried about off-target reads because of the Mendelian filters and the ability to look at the BAMs)
Exomiser
Exomiser LOT
Pheno only
Exomiser is an exome analysis tool that leverages the uberpheno cross-species phenotype comparisons and standard exome filtering (pathogenicity, frequency, off-targets, etc.), in combination with Mendelian filtering and interactome walking. The original method paper is published: http://genome.cshlp.org/content/early/2013/10/25/gr.160325.113.abstract
This work is unpublished but has been (just) submitted. Happy to share the manuscript if of interest.
Each model organism has a different suite of phenotypes that are examined, because different models are used to explore different types of biological function and malfunction. By using a diversity of model systems, we have the potential to identify candidates based on partial overlaps with the patient phenotype profile by looking at different models with mutations in potential candidates or related via interactions, co-expression, genomic regulatory region, etc.
Phenoviz is a new graphviz plugin that can be used as a standalone app for Windows, Mac, or Linux. The user uploads a list of CNVs detected by array CGH (SNP chips, or even genome sequence data, would also work as a starting point, but the program expects a simple list). You also enter a list of the HPO terms observed in the patient. The application then tries to find “matches” based on the single gene disorders (human, HPO annotations), the mouse models (mainly knockouts, MP annotations from MGI), or fish models (ZFIN E/Q annotations). This is being used in the Charité array CGH diagnostics service to help with interpretation of CNVs. Subjectively, the tool helps you to quickly find good candidates in order to write reports. The program also picks out the best matching CNV in case the user enters several (a typical array CGH finding in our lab has up to 50 CNVs, of which 2-5 are not found in databases of common variants like DGV).
There are a lot of people who have contributed to this work over many years.