1. Using Ontology to Classify Members of a Protein Family
Robert Stevens
BioHealth Informatics Group
School of Computer Science
University of Manchester
Robert.stevens@manchester.ac.uk
2. Introduction
• Developing an automated system for extracting and classifying proteins from newly sequenced genomes
• Building an OWL ontology that defines class membership
• Describing protein instances in OWL
• Classifying the instances against the ontology
• Describing the protein family complement of a genome
• As good as human classification, with added value
• Only possible through interdisciplinary research
3. Acknowledgements
(it takes all sorts)
Katy Wolstencroft (Bioinformatics)
Daniele Turi (Instance Store)
Phil Lord (myGrid)
Lydia Tabernero (Protein Scientist)
Matt Horridge, Nick Drummond et al. (Protégé OWL)
Andy Brass and Robert Stevens (Bioinformatics)
4. Protein Classification
• Proteins are divided into broad functional classes, or “protein families”
• Families are subdivided to give finer family classifications
• Class membership can be determined by “protein features”, such as domains
• Resources exist for detecting features from the primary sequence, but not for determining class membership
• Current limitation of automated tools: recognising class membership still needs human knowledge
5. Finding Domains on a Sequence
A search of the linear sequence of protein tyrosine phosphatase kappa identified 9 functional domains:
>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa).
MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV
SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP
GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI
AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV…
……..
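Domain search results are positional, while the class definitions in the following slides only count domain types. A minimal sketch, assuming each hit arrives as a (name, start, end) record; the `DomainHit` type, the domain names and the coordinates below are hypothetical and are neither InterPro's output format nor the real annotation of Q15262:

```python
from collections import Counter
from typing import NamedTuple


class DomainHit(NamedTuple):
    """One detected feature on a sequence (hypothetical record type)."""
    name: str
    start: int  # residue position where the hit begins
    end: int    # residue position where the hit ends


def architecture(hits: list[DomainHit]) -> tuple[str, ...]:
    """Domain names in N- to C-terminal order."""
    return tuple(h.name for h in sorted(hits, key=lambda h: h.start))


def composition(hits: list[DomainHit]) -> Counter:
    """Order-free count of each domain type."""
    return Counter(h.name for h in hits)


# Invented hits for illustration only; not real annotations of Q15262.
hits = [
    DomainHit("ProteinTyrosinePhosphataseDomain", 900, 1150),
    DomainHit("MAMDomain", 30, 190),
    DomainHit("FibronectinDomain", 290, 380),
    DomainHit("TransmembraneDomain", 750, 772),
    DomainHit("FibronectinDomain", 390, 480),
]

print(architecture(hits))
print(composition(hits))
```

The ordered architecture is kept alongside the composition in case domain order is wanted later; the count-based definitions only need the composition.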
6. Why Classify?
• Classifying and curating a genome is the first step in understanding the processes and functions happening in an organism
• Classification enables comparative genomic studies: what is already known in other organisms?
• The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology
• In silico characterisation is the current bottleneck
7. The Protein Phosphatases
• A large superfamily of proteins involved in removing phosphate groups from molecules
• Important in almost all cellular processes
• Involved in diseases such as diabetes and cancer
• Human phosphatases are well characterised
8. Phosphatase Classification
• Diagnostic phosphatase domains/motifs are sufficient for membership of the protein phosphatase superfamily
• Any protein having a phosphatase domain is a member of the phosphatase superfamily
• Other motifs determine a protein’s place within the family
• Usually a human is needed to recognise that the detected features imply class membership
• Can this knowledge be captured in an ontology?
9. Ontologies
• Describing and defining the classes of objects represented in information
• Defining the characteristics of those objects
• Stating the characteristics by which an object can be recognised as belonging to a class
• In a form understandable by a computer
• … and, of course, by humans
10. Web Ontology Language (OWL)
• W3C recommendation for ontologies on the Semantic Web
• OWL-DL maps to a decidable fragment of first-order logic
• Classes, properties and instances
• Boolean operators, plus existential and universal quantification
• Rich class expressions can be used in restrictions on properties, e.g. hasDomain some (ImmunoglobulinDomain or FibronectinDomain)
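To make the meaning of `some`, `only`, `and` and `or` concrete, here is a small sketch that evaluates such expressions over a fixed set of known facts. It is deliberately closed-world: it judges an individual only on the facts it is given, whereas an OWL reasoner works under the open-world assumption, so this illustrates what the constructors mean, not how a DL reasoner works. The property and class names follow the slide; the `Individual` structure and the `holds` function are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Named:
    """An atomic class, e.g. FibronectinDomain."""
    name: str


@dataclass(frozen=True)
class Or:
    left: Expr
    right: Expr


@dataclass(frozen=True)
class And:
    left: Expr
    right: Expr


@dataclass(frozen=True)
class Some:
    """Existential restriction: prop some filler."""
    prop: str
    filler: Expr


@dataclass(frozen=True)
class Only:
    """Universal restriction: prop only filler."""
    prop: str
    filler: Expr


Expr = Union[Named, Or, And, Some, Only]


@dataclass
class Individual:
    """Known types and property values of one individual (closed-world view)."""
    types: set[str]
    values: dict[str, list[Individual]] = field(default_factory=dict)


def holds(expr: Expr, ind: Individual) -> bool:
    """Does `ind` satisfy `expr`, judged only on the facts recorded for it?"""
    if isinstance(expr, Named):
        return expr.name in ind.types
    if isinstance(expr, Or):
        return holds(expr.left, ind) or holds(expr.right, ind)
    if isinstance(expr, And):
        return holds(expr.left, ind) and holds(expr.right, ind)
    fillers = ind.values.get(expr.prop, [])
    if isinstance(expr, Some):
        return any(holds(expr.filler, f) for f in fillers)
    if isinstance(expr, Only):
        # Vacuously true when there are no values, matching the semantics of OWL's universal restriction.
        return all(holds(expr.filler, f) for f in fillers)
    raise TypeError(f"unknown expression: {expr!r}")


def domain(cls_name: str) -> Individual:
    """A domain occurrence typed by a single domain class."""
    return Individual({cls_name})


protein = Individual(
    {"Protein"},
    {"hasDomain": [domain("FibronectinDomain"), domain("TransmembraneDomain")]},
)
ig_or_fn = Or(Named("ImmunoglobulinDomain"), Named("FibronectinDomain"))
print(holds(Some("hasDomain", ig_or_fn), protein))  # True: it has a fibronectin domain
print(holds(Only("hasDomain", ig_or_fn), protein))  # False: the transmembrane domain is neither
```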
12. Necessity and Sufficiency
• An R2A phosphatase must have a fibronectin domain
• But having a fibronectin domain does not a phosphatase make
• Necessity: what must every instance of a class have?
• Any protein that has a phosphatase catalytic domain is a phosphatase enzyme (sufficient)
• All phosphatase enzymes have a catalytic domain (necessary)
• Sufficiency: how is an instance recognised to be a member of a class?
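The difference between the two kinds of condition shows up in which direction a conclusion can be drawn. A minimal sketch, assuming a class is summarised by a set of required domains and a flag saying whether those requirements are also sufficient; `ClassDef`, `PhosphataseEnzyme`, `PhosphataseCatalyticDomain` and the one-domain encoding of R2A are hypothetical simplifications of the slide's examples:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassDef:
    name: str
    required: frozenset[str]  # domains every member must have (necessary conditions)
    defined: bool             # True if `required` is also sufficient for membership


# Hypothetical encodings of the two examples on this slide.
PHOSPHATASE = ClassDef(
    "PhosphataseEnzyme", frozenset({"PhosphataseCatalyticDomain"}), defined=True
)
R2A = ClassDef("R2A", frozenset({"FibronectinDomain"}), defined=False)


def recognise(domains: set[str], definition: ClassDef) -> bool:
    """Sufficiency: can membership be concluded from the observed domains?"""
    return definition.defined and definition.required <= domains


def implied_domains(definition: ClassDef) -> frozenset[str]:
    """Necessity: what any member is known to have, observed or not."""
    return definition.required


print(recognise({"FibronectinDomain"}, R2A))                   # False: necessary only
print(recognise({"PhosphataseCatalyticDomain"}, PHOSPHATASE))  # True: sufficient
print(sorted(implied_domains(R2A)))                            # ['FibronectinDomain']
```

Only a class with necessary and sufficient conditions lets a program move from observed features to class membership; necessary conditions alone only tell us what a known member must have.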
13. Definition of Tyrosine Phosphatase
Class: TyrosineReceptorProteinPhosphatase
EquivalentTo: Protein That
- contains at least 1 ProteinTyrosinePhosphataseDomain and
- contains exactly 1 TransmembraneDomain
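Read as counts, these qualified cardinality restrictions are lower and upper bounds on how many domains of a given type a protein has. A minimal sketch, assuming a protein is summarised as a `Counter` of detected domain types; `Cardinality` and `matches` are illustrative names, not part of any OWL library. Counting observed hits is a closed-world shortcut: an OWL reasoner under the open-world assumption can only conclude "exactly 1" when the instance description itself states the count.

```python
from collections import Counter
from collections.abc import Iterable
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """contains [at_least..at_most] filler: a qualified cardinality restriction."""
    filler: str
    at_least: int = 0
    at_most: Optional[int] = None  # None means no upper bound

    def satisfied_by(self, composition: Counter) -> bool:
        n = composition[self.filler]  # Counter returns 0 for absent domain types
        return n >= self.at_least and (self.at_most is None or n <= self.at_most)


TYROSINE_RECEPTOR_PTP = (
    Cardinality("ProteinTyrosinePhosphataseDomain", at_least=1),
    Cardinality("TransmembraneDomain", at_least=1, at_most=1),  # "exactly 1"
)


def matches(composition: Counter, restrictions: Iterable[Cardinality]) -> bool:
    """True if the composition satisfies every restriction (their conjunction)."""
    return all(r.satisfied_by(composition) for r in restrictions)


two_ptp_one_tm = Counter({"ProteinTyrosinePhosphataseDomain": 2, "TransmembraneDomain": 1})
print(matches(two_ptp_one_tm, TYROSINE_RECEPTOR_PTP))
```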
14. …there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.
15. Definition of Tyrosine Phosphatase: What We Know We Know
Class: TyrosineReceptorProteinPhosphatase
EquivalentTo: Protein That
- contains at least 1 ProteinTyrosinePhosphataseDomain and
- contains exactly 1 TransmembraneDomain
16. Definition for R2A Phosphatase
Class: R2A
EquivalentTo: Protein That
- contains exactly 2 ProteinTyrosinePhosphataseDomain and
- contains exactly 1 TransmembraneDomain and
- contains exactly 4 FibronectinDomain and
- contains exactly 1 ImmunoglobulinDomain and
- contains exactly 1 MAMDomain and
- contains exactly 1 Cadherin-LikeDomain and
- contains only (ProteinTyrosinePhosphataseDomain or TransmembraneDomain or FibronectinDomain or ImmunoglobulinDomain or Cadherin-LikeDomain or MAMDomain)
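The last line is a closure axiom, and it does different work from the counts: exact counts on the listed domain types say nothing about domain types that are not listed. A minimal sketch of the R2A test over a `Counter` of domain types, assuming closed-world counting; `R2A_COUNTS`, `is_r2a` and the extra `ZincFingerDomain` used in the example are hypothetical illustrations.

```python
from collections import Counter

# Exact domain counts from the R2A definition on this slide.
R2A_COUNTS = {
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
}


def is_r2a(composition: Counter) -> bool:
    # "contains exactly n X" for each listed domain type.
    counts_ok = all(composition[d] == n for d, n in R2A_COUNTS.items())
    # Closure ("contains only (...)"): no domain types beyond the listed ones.
    present = {d for d, n in composition.items() if n > 0}
    closed_ok = present <= set(R2A_COUNTS)
    return counts_ok and closed_ok


exact = Counter(R2A_COUNTS)
with_extra = exact + Counter({"ZincFingerDomain": 1})
print(is_r2a(exact))       # True
print(is_r2a(with_extra))  # False: the closure rules out the extra domain type
```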
17. Automatic Reasoning
• An OWL-DL ontology is mapped to its DL form as a collection of axioms
• An automatic reasoner checks satisfiability, reporting inconsistent (unsatisfiable) classes, and infers subsumption
• Defined classes (those with necessary and sufficient restrictions) enable a reasoner to infer subclass axioms
• A reasoner can also infer to which class an individual belongs, based on the facts we know about it
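Real DL reasoners use general algorithms such as tableaux. For the narrow shape of definition used in these slides, a conjunction of count bounds per domain type plus an optional closure, and assuming domain types are disjoint (the value partition of protein domains) so that each type is counted independently, subsumption reduces to interval containment. The sketch below covers only that special case and ignores `some` restrictions over unions; `Definition`, `satisfiable` and `subsumes` are hypothetical names.

```python
from dataclasses import dataclass
from math import inf

Interval = tuple[float, float]  # (at_least, at_most) for one domain type; at_most may be inf


@dataclass
class Definition:
    """Protein That: a conjunction of count bounds per domain type."""
    name: str
    bounds: dict[str, Interval]
    closed: bool = False  # True if a closure axiom allows only the bounded domain types

    def interval(self, domain: str) -> Interval:
        if domain in self.bounds:
            return self.bounds[domain]
        return (0, 0) if self.closed else (0, inf)


def satisfiable(c: Definition) -> bool:
    return all(lo <= hi for lo, hi in c.bounds.values())


def subsumes(general: Definition, specific: Definition) -> bool:
    """True if every domain composition allowed by `specific` is allowed by `general`."""
    if not satisfiable(specific):
        return True  # an unsatisfiable class is trivially subsumed by everything
    if general.closed and not specific.closed:
        return False  # `specific` permits domain types that `general` forbids
    for d in set(general.bounds) | set(specific.bounds):
        s_lo, s_hi = specific.interval(d)
        g_lo, g_hi = general.interval(d)
        if s_lo < g_lo or s_hi > g_hi:
            return False
    return True


TRPTP = Definition(
    "TyrosineReceptorProteinPhosphatase",
    {"ProteinTyrosinePhosphataseDomain": (1, inf), "TransmembraneDomain": (1, 1)},
)
R2A = Definition(
    "R2A",
    {
        "ProteinTyrosinePhosphataseDomain": (2, 2),
        "TransmembraneDomain": (1, 1),
        "FibronectinDomain": (4, 4),
        "ImmunoglobulinDomain": (1, 1),
        "MAMDomain": (1, 1),
        "Cadherin-LikeDomain": (1, 1),
    },
    closed=True,
)
print(subsumes(TRPTP, R2A))  # True: R2A is inferred to be a subclass
print(subsumes(R2A, TRPTP))  # False
```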
18. Incremental Addition of Protein
Functional Domains
• Phosphatase catalytic
• Cadherin-like
• Immunoglobulin
• MAM domain
• Cellular retinaldehyde
• Adhesion recognition
• Transmembrane
• Fibronectin III
• Glycosylation
19. Building the Ontology
• Classifications have already been made by biologists, based on protein functionality
• Protein domain composition and other details are in the literature
• Some 50 classes of phosphatase, 30 protein domains and one relationship
• A “value partition” of protein domains (covering and disjoint) defines the range of the contains property
• The literature contains knowledge of how to recognise members of each class of phosphatase
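A value partition has a close programming analogue in an enumeration: each value is exactly one member (disjoint), and the listed members are the only ones (covering). A minimal sketch, assuming the partition is used as the allowed range of `contains`; only the six domain types named in these slides are listed, whereas the real partition would hold all of the roughly 30 domains. `ProteinDomain` and `as_domain` are hypothetical names.

```python
from enum import Enum


class ProteinDomain(Enum):
    """Value partition for the range of `contains` (illustrative subset).

    Disjoint: each value maps to exactly one member.
    Covering: these members are the only allowed values.
    """
    PROTEIN_TYROSINE_PHOSPHATASE = "ProteinTyrosinePhosphataseDomain"
    TRANSMEMBRANE = "TransmembraneDomain"
    FIBRONECTIN = "FibronectinDomain"
    IMMUNOGLOBULIN = "ImmunoglobulinDomain"
    MAM = "MAMDomain"
    CADHERIN_LIKE = "Cadherin-LikeDomain"


def as_domain(label: str) -> ProteinDomain:
    """Range check for `contains`: a filler must be one of the partition's values."""
    try:
        return ProteinDomain(label)
    except ValueError:
        raise ValueError(f"{label!r} is not in the domain value partition") from None


print(as_domain("MAMDomain"))
```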
21. What is the Ontology Telling Us?
• Each class of phosphatase is defined in terms of its domain composition
• We know the characteristics by which an individual protein can be recognised as a member of a particular class of phosphatase
• We have this knowledge in a computational form
• If we had protein instances described in terms of the ontology, we could classify those individual proteins
• The result: a catalogue of phosphatases
22. Description of an Instance of a Protein
• Instance: P21592
TypeOf: Protein That
Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and
Fact: hasDomain 1 TransmembraneDomain and
Fact: hasDomain 4 FibronectinDomain and
Fact: hasDomain 1 ImmunoglobulinDomain and
Fact: hasDomain 1 MAMDomain and
Fact: hasDomain 1 Cadherin-LikeDomain
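Given the instance facts, classification means finding every defined class the instance satisfies. A minimal sketch, assuming the two definitions from the earlier slides are encoded as count bounds plus a closure flag and that the facts above are the complete domain list; `DEFINITIONS`, `satisfies` and `realise` are hypothetical names, and this is not the Instance Store's API.

```python
from collections import Counter
from math import inf

# Hypothetical encoding of two definitions from the slides:
# domain type -> (at_least, at_most), plus whether a closure axiom applies.
DEFINITIONS = {
    "TyrosineReceptorProteinPhosphatase": (
        {"ProteinTyrosinePhosphataseDomain": (1, inf), "TransmembraneDomain": (1, 1)},
        False,
    ),
    "R2A": (
        {
            "ProteinTyrosinePhosphataseDomain": (2, 2),
            "TransmembraneDomain": (1, 1),
            "FibronectinDomain": (4, 4),
            "ImmunoglobulinDomain": (1, 1),
            "MAMDomain": (1, 1),
            "Cadherin-LikeDomain": (1, 1),
        },
        True,
    ),
}


def satisfies(composition: Counter, bounds: dict, closed: bool) -> bool:
    """Check every count bound, then the closure if the definition has one."""
    if not all(lo <= composition[d] <= hi for d, (lo, hi) in bounds.items()):
        return False
    present = {d for d, n in composition.items() if n > 0}
    return not closed or present <= set(bounds)


def realise(composition: Counter) -> list[str]:
    """Every defined class the instance falls into, given its described domains."""
    return sorted(
        name for name, (bounds, closed) in DEFINITIONS.items()
        if satisfies(composition, bounds, closed)
    )


# The facts asserted for P21592 on the slide above.
P21592 = Counter({
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
})
print(realise(P21592))  # ['R2A', 'TyrosineReceptorProteinPhosphatase']
```

Because every composition allowed by this R2A encoding also satisfies the tyrosine receptor encoding, R2A is the more specific of the two matches, which is where a catalogue would most usefully list the instance.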
23. Instance: P21592
TypeOf: Protein That
Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and
Fact: hasDomain 1 TransmembraneDomain and
Fact: hasDomain 4 FibronectinDomain and
Fact: hasDomain 1 ImmunoglobulinDomain and
Fact: hasDomain 1 MAMDomain and
Fact: hasDomain 1 Cadherin-LikeDomain
Tyrosine Phosphatase:
(containsDomain some TransmembraneDomain) and
(containsDomain at least 1 ProteinTyrosinePhosphataseDomain)
…tase:
(containsDomain some MAMDomain) and
(containsDomain some ProteinTyrosineCatalyticDomain or ImmunoglobulinDomain) and
(containsDomain some FibronectinDomain or FibronectinTypeIIIFoldDomain) and
(containsDomain exactly 2 ProteinTyrosinePhosphataseDomain)
25. So Far…
• Human phosphatases have been classified using the system
• The ontology classification performed as well as expert classification
• The ontology system refined the classification:
- DUSC contains a zinc finger domain that is characterised and conserved, but not used in the classification
- DUSA contains a previously uncharacterised, evolutionarily conserved disintegrin domain
• A new kind of phosphatase?
26. Aspergillus fumigatus
• Phosphatase complement very different from human: more than 100 phosphatases in human, fewer than 50 in A. fumigatus
• Whole subfamilies ‘missing’
- Different fungus-specific phosphorylation pathways?
- No requirement for tissue-specific variations?
• Novel serine/threonine phosphatase with a homeobox
- Conserved in Aspergillus and closely related species, but not in any other
- Again, a new phosphatase?
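Once each genome has a catalogue (counts of proteins per phosphatase class), spotting ‘missing’ subfamilies and genome-specific classes is a set comparison. A minimal sketch; `compare_complements` is a hypothetical helper, and the class labels and counts are invented placeholders, not the human or A. fumigatus results.

```python
from collections import Counter


def compare_complements(a: Counter, b: Counter) -> dict[str, list[str]]:
    """Split phosphatase classes into those found in only one genome or in both."""
    in_a = {c for c, n in a.items() if n > 0}
    in_b = {c for c, n in b.items() if n > 0}
    return {
        "only_a": sorted(in_a - in_b),
        "only_b": sorted(in_b - in_a),
        "shared": sorted(in_a & in_b),
    }


# Invented class labels and counts, for illustration only.
genome_a = Counter({"SubfamilyA": 12, "SubfamilyB": 5, "SubfamilyC": 3})
genome_b = Counter({"SubfamilyA": 4, "SubfamilyD": 1})
result = compare_complements(genome_a, genome_b)
print(result)
print(sum(genome_a.values()), sum(genome_b.values()))  # total complement sizes
```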
27. Scaling
• Over 700 protein families
• Some 14,000 described sequence features
• Hundreds of thousands of protein types
• Mass classification, then what?
28. Generic Technique
• Feature detection
• Categories defined in terms of those features
• Produce a catalogue of what you currently know
• Highlight cases that don’t match current knowledge
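The four steps above can be sketched as one small function. This assumes features have already been detected and that a category is defined by a set of required features, a presence-only simplification of the count-based definitions; `build_catalogue` and all category, feature and protein names are hypothetical. The second example protein mimics the kind of case reported for DUSC: a classified protein carrying a domain its class does not account for.

```python
from collections import defaultdict


def build_catalogue(proteins: dict[str, set[str]], categories: dict[str, set[str]]):
    """Catalogue proteins by category and flag cases that do not fit current knowledge.

    A protein belongs to a category when it has all of that category's required features.
    """
    catalogue: dict[str, list[str]] = defaultdict(list)
    unclassified: list[str] = []
    unexplained: dict[str, list[str]] = {}
    for acc, features in sorted(proteins.items()):
        matched = [name for name, required in categories.items() if required <= features]
        if not matched:
            unclassified.append(acc)
            continue
        for name in matched:
            catalogue[name].append(acc)
        # Features that no matched category accounts for are worth a closer look.
        explained = set().union(*(categories[name] for name in matched))
        extra = features - explained
        if extra:
            unexplained[acc] = sorted(extra)
    return dict(catalogue), unclassified, unexplained


# Hypothetical categories and proteins, for illustration only.
categories = {"DualSpecificityPhosphatase": {"DSPCatalyticDomain"}}
proteins = {
    "protein_1": {"DSPCatalyticDomain"},
    "protein_2": {"DSPCatalyticDomain", "ZincFingerDomain"},
    "protein_3": {"KinaseDomain"},
}
catalogue, unclassified, unexplained = build_catalogue(proteins, categories)
print(catalogue)
print(unclassified)
print(unexplained)
```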
29. Conclusions
• Using an ontology allows automated classification to reach the standard of human expert annotation
• Reasoning capabilities allow interpretation of domain organisation
• Capturing human knowledge in computational form
• A systematic survey raises interesting biological questions
• Discovering the unexpected
• Allows fast, efficient comparative genomics studies
• A combination of CS and bioinformatics to do biology
Speaker notes
All of which helps build better ontologies. But can we apply this computational amenability more directly to biological knowledge? In this example, which is work by Katy Wolstencroft, we have codified community knowledge about protein domains in phosphatases in OWL. We then take unknown protein sequences, pass them through InterPro and put them into the Instance Store, which is basically a database and a reasoner tied together.
Qualified cardinality!!!