Using Ontology to Classify
Members of a Protein Family
Robert Stevens
BioHealth Informatics Group
School of Computer Science
University of Manchester
Robert.stevens@manchester.ac.uk
Introduction
• Developing an automated system for extracting
and classifying proteins from newly sequenced
genomes
• Building an OWL ontology that defines class
membership
• Describing protein instances in OWL
• Classifying against the ontology
• Describing the protein family complement of a
genome
• As good as human classification, but with added value
• Only possible through inter-disciplinary research
Acknowledgements
(it takes all sorts)
Katy Wolstencroft (Bioinformatics)
Daniele Turi (Instance Store)
Phil Lord (myGrid)
Lydia Tabernero (Protein Scientist)
Matt Horridge, Nick Drummond et al
(Protégé OWL)
Andy Brass and Robert Stevens
(Bioinformatics)
Protein Classification
• Proteins divided into broad functional classes
“Protein Families”
• Families sub-divided to give family
classifications
• Class membership can be determined by
“protein features”, such as domains
• Resources exist for feature detection via
primary sequence, but not for class
membership
• Current Limitation of Automated Tools
• Needs human knowledge to recognise class
membership
Finding Domains on a Sequence
A search of the linear sequence of protein
tyrosine phosphatase type K – identified 9
functional domains
>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa
precursor (EC 3.1.3.48) (R-PTP-kappa).
MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV
SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP
GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI
AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV…
……..
Why Classify?
• Classification and curation of a genome is
the first step in understanding the processes
and functions happening in an organism
• Classification enables comparative genomic
studies - what is already known in other
organisms
• The similarities and differences between
processes and functions in related
organisms often provide the greatest insight
into the biology
• In silico characterisation is the current
bottleneck
The Protein Phosphatases
• Large superfamily of proteins – involved in
the removal of phosphate groups from
molecules
• Important proteins in almost all cellular
processes
• Involved in diseases such as diabetes and cancer
• Human phosphatases are well characterised
Phosphatase Classification
• Diagnostic phosphatase domains/motifs –
sufficient for membership of the protein
phosphatase superfamily
• Any protein having a phosphatase domain is a
member of the phosphatase super-family
• Other motifs determine a protein’s place within
the family
• Usually needs a human to recognise that the
detected features imply class membership
• Can these be captured in an ontology?
Ontologies
• Describing and defining the classes of
objects represented in information
• Defining the characteristics of objects
• The characteristics by which an object can be
recognised as belonging to a class
• In a form understandable by a computer
• … and, of course, humans.
Web Ontology Language (OWL)
• W3C recommendation for ontologies for the
Semantic Web
• OWL-DL maps to a decidable fragment of
first-order logic
• Classes, properties and instances
• Boolean operators, plus existential and
universal quantification
• Rich class expressions used in restrictions on
properties – hasDomain some
(ImmunoglobulinDomain or
FibronectinDomain)
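The difference between existential ("some") and universal ("only") quantification can be sketched in a few lines of Python. This is only a toy check over a closed list of domain names, not how an OWL reasoner works (OWL uses an open-world assumption), and the example protein is hypothetical.

```python
def some(domains, fillers):
    """Existential restriction: at least one domain has an allowed type."""
    return any(d in fillers for d in domains)


def only(domains, fillers):
    """Universal restriction: every domain has an allowed type.

    Vacuously true for a protein with no domains at all.
    """
    return all(d in fillers for d in domains)


fillers = {"ImmunoglobulinDomain", "FibronectinDomain"}
protein = ["ImmunoglobulinDomain", "TransmembraneDomain"]  # hypothetical

print(some(protein, fillers))  # True
print(only(protein, fillers))  # False: TransmembraneDomain is not a filler
print(only([], fillers))       # True: nothing violates the restriction
```

The last line shows why a universal restriction on its own does not imply that any domain is present.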
OWL represents
classes of instances
[Figure: diagram of classes labelled A, B and C]
Necessity and Sufficiency
• An R2A phosphatase must have a fibronectin
domain
• Having a fibronectin domain does not a
phosphatase make
• Necessity – what must a class instance have?
• Any protein that has a phosphatase catalytic
domain is a phosphatase enzyme
• All phosphatase enzymes have a catalytic domain
• Sufficiency – how is an instance recognised to be a
member of a class?
Definition of Tyrosine
Phosphatase
Class: TyrosineReceptorProteinPhosphatase
EquivalentTo: Protein that
- contains at least 1
ProteinTyrosinePhosphataseDomain and
- contains exactly 1
TransmembraneDomain
…there are known knowns; there are things
we know we know. We also know there are
known unknowns; that is to say we know
there are some things we do not know. But
there are also unknown unknowns -- the ones
we don't know we don't know.
Definition of Tyrosine Phosphatase:
What we Know we Know
Class: TyrosineReceptorProteinPhosphatase
EquivalentTo: Protein that
- contains at least 1
ProteinTyrosinePhosphataseDomain and
- contains exactly 1
TransmembraneDomain
Definition for R2A Phosphatase
Class: R2A
EquivalentTo: Protein that
- (contains 2 ProteinTyrosinePhosphataseDomain) and
- (contains 1 TransmembraneDomain) and
- (contains 4 FibronectinDomain) and
- (contains 1 ImmunoglobulinDomain) and
- (contains 1 MAMDomain) and
- (contains 1 Cadherin-LikeDomain) and
- contains only (ProteinTyrosinePhosphataseDomain or
TransmembraneDomain or FibronectinDomain or
ImmunoglobulinDomain or Cadherin-LikeDomain or
MAMDomain)
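A minimal Python sketch of what such a definition asks of a protein, under two assumptions that go beyond the slide: the bare numbers are read as exact counts, and the "contains only" closure is taken to cover the same six domain types named in the cardinality lines. The test proteins and the `ZincFingerDomain` name are illustrative.

```python
from collections import Counter

# Assumed reading of the R2A definition: exact counts per domain type.
R2A_COUNTS = {
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
}


def matches(required_counts, domains):
    """Check the cardinality restrictions and the 'contains only' closure."""
    counts = Counter(domains)
    # Counter returns 0 for a missing domain type, so absences fail here.
    cardinalities_ok = all(counts[d] == n for d, n in required_counts.items())
    # Closure: no domain type outside the listed ones may be present.
    closure_ok = set(counts) <= set(required_counts)
    return cardinalities_ok and closure_ok


r2a_like = (
    ["ProteinTyrosinePhosphataseDomain"] * 2
    + ["TransmembraneDomain"]
    + ["FibronectinDomain"] * 4
    + ["ImmunoglobulinDomain", "MAMDomain", "Cadherin-LikeDomain"]
)

print(matches(R2A_COUNTS, r2a_like))                         # True
print(matches(R2A_COUNTS, r2a_like + ["ZincFingerDomain"]))  # False: breaks closure
print(matches(R2A_COUNTS, r2a_like[:-1]))                    # False: a required domain is missing
```

The closure line is what lets a protein with an unexpected extra domain fall outside the class instead of being silently accepted.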
Automatic Reasoning
• An OWL-DL ontology maps to its DL form
as a collection of axioms
• An automatic reasoner checks for satisfiability
– throws out the inconsistent and infers
subsumption
• Defined classes (where there are necessary
and sufficient restrictions) enable a reasoner
to infer subclass axioms
• Also infer to which class an object belongs
• Based on the facts we know about it
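How a subclass relation can follow from two definitions can be sketched with a toy model. Here each defined class is reduced to (min, max) counts per domain type, the R2A entry is abbreviated to three constraints, and every definition is assumed to be satisfiable. A real description logic reasoner handles far richer expressions than this.

```python
import math

# Each defined class maps a domain type to an allowed (min, max) count.
# Domain types that are not listed are unconstrained: (0, inf).
TYROSINE_RECEPTOR = {
    "ProteinTyrosinePhosphataseDomain": (1, math.inf),  # at least 1
    "TransmembraneDomain": (1, 1),                      # exactly 1
}
R2A = {  # abbreviated for illustration
    "ProteinTyrosinePhosphataseDomain": (2, 2),
    "TransmembraneDomain": (1, 1),
    "FibronectinDomain": (4, 4),
}


def subsumes(general, specific):
    """True if every protein satisfying `specific` also satisfies `general`."""
    for domain, (lo, hi) in general.items():
        s_lo, s_hi = specific.get(domain, (0, math.inf))
        # The specific class must keep its counts inside the general range.
        if s_lo < lo or s_hi > hi:
            return False
    return True


print(subsumes(TYROSINE_RECEPTOR, R2A))  # True: R2A is inferred to be a subclass
print(subsumes(R2A, TYROSINE_RECEPTOR))  # False
```

Nobody asserted that R2A is a kind of tyrosine receptor phosphatase; the relation follows from comparing the two definitions.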
Incremental Addition of Protein
Functional Domains
[Figure: protein domain diagram. Labels: Phosphatase catalytic, Cadherin-like, Immunoglobulin, MAM domain, Cellular retinaldehyde, Adhesion recognition, Transmembrane, Fibronectin III, Glycosylation]
Building the Ontology
• Classifications already made by biologists – based
on protein functionality;
• Protein domain composition and other details in
the literature;
• Some 50 classes of phosphatase, 30 protein
domains and one relationship;
• “Value partition” of protein domains (covering and
disjoint);
• Defines the range of the contains property;
• Literature contains knowledge of how to recognise
members of each class of phosphatase.
Classification of the Classical
Tyrosine Phosphatases
What is the Ontology Telling Us?
• Each class of phosphatase defined in terms of
domain composition
• We know the characteristics by which an
individual protein can be recognised to be a
member of a particular class of phosphatase
• We have this knowledge in a computational
form
• If we had protein instances described in terms
of the ontology, we could classify those
individual proteins
• A catalogue of phosphatases
Description of an Instance of a
Protein
• Instance: P21592
TypeOf: Protein that
Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and
Fact: hasDomain 1 TransmembraneDomain and
Fact: hasDomain 4 FibronectinDomain and
Fact: hasDomain 1 ImmunoglobulinDomain and
Fact: hasDomain 1 MAMDomain and
Fact: hasDomain 1 Cadherin-LikeDomain
Tyrosine Phosphatase
(containsDomain some TransmembraneDomain) and
(containsDomain at least 1 ProteinTyrosinePhosphataseDomain)
…tase
(containsDomain some MAMDomain) and
(containsDomain some ProteinTyrosineCatalyticDomain or ImmunoglobulinDomain) and
(containsDomain some FibronectinDomain or FibronectinTypeIIIFoldDomain) and
(containsDomain exactly 2 ProteinTyrosinePhosphataseDomain)
Classifying Proteins
>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine
phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa).
MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV
SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP
GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI
AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV………..
[Figure: classification pipeline. Labels: InterPro, Instance Store, Reasoner, Translate, Codify]
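The step from detected domains to an instance description might look roughly like the sketch below. It assumes the domain search has already been reduced to a flat list of domain type names. The function name, the input list and the output layout are illustrative and are not taken from the actual system.

```python
from collections import Counter


def describe_instance(accession, domain_hits, prop="hasDomain"):
    """Turn a list of detected domain types into 'Fact:' lines with counts."""
    counts = Counter(domain_hits)
    facts = [f"Fact: {prop} {n} {domain}" for domain, n in sorted(counts.items())]
    header = [f"Instance: {accession}", "TypeOf: Protein that"]
    return "\n".join(header + [" and\n".join(facts)])


# Hypothetical search output for one protein.
hits = (
    ["ProteinTyrosinePhosphataseDomain"] * 2
    + ["TransmembraneDomain"]
    + ["FibronectinDomain"] * 4
    + ["ImmunoglobulinDomain", "MAMDomain", "Cadherin-LikeDomain"]
)

print(describe_instance("P21592", hits))
```

Counting the hits before writing the description is what lets cardinality restrictions such as "exactly 2" be tested later.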
So Far…..
• Human phosphatases have been classified using
the system
• The ontology classification performed as well
as expert classification
• The ontology system refined classification
- DUSC contains zinc finger domain
Characterised and conserved – but not in
classification
- DUSA contains a disintegrin domain
previously uncharacterised – evolutionarily
conserved
• A new kind of phosphatase?
Aspergillus fumigatus
• Phosphatase complement very different from
human
>100 in human, <50 in A. fumigatus
• Whole subfamilies ‘missing’
Different fungi-specific phosphorylation pathways?
No requirement for tissue-specific variations?
• Novel serine/threonine phosphatase with
homeobox
Conserved in Aspergillus and closely related
species, but not in any other
Again, a new phosphatase?
Scaling
• Over 700 protein families
• Some 14,000 described sequence
features
• Hundreds of thousands of types of protein
• Mass classification, then what?
Generic Technique
• Feature detection
• Categories defined in terms of those
features
• Produce catalogue of what you
currently know
• Highlight cases that don’t match current
knowledge
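The four steps above can be sketched generically. Here categories are plain Python predicates over a set of detected features, and "cases that don't match" is simplified to items that satisfy no category. All item and category names are hypothetical.

```python
from collections import defaultdict


def build_catalogue(items, categories):
    """Group items under every category whose predicate they satisfy.

    Items that satisfy no predicate are returned separately for review.
    """
    catalogue = defaultdict(list)
    unmatched = []
    for name, features in items.items():
        hits = [cat for cat, test in categories.items() if test(features)]
        for cat in hits:
            catalogue[cat].append(name)
        if not hits:
            unmatched.append(name)
    return dict(catalogue), unmatched


categories = {
    "Phosphatase": lambda f: "PhosphataseCatalyticDomain" in f,
    "ReceptorPhosphatase": lambda f: {"PhosphataseCatalyticDomain",
                                      "TransmembraneDomain"} <= f,
}
items = {
    "protA": {"PhosphataseCatalyticDomain"},
    "protB": {"PhosphataseCatalyticDomain", "TransmembraneDomain"},
    "protC": {"HomeoboxDomain"},
}

catalogue, unmatched = build_catalogue(items, categories)
print(catalogue)  # {'Phosphatase': ['protA', 'protB'], 'ReceptorPhosphatase': ['protB']}
print(unmatched)  # ['protC']
```

The unmatched list is the part a human expert would look at first, since it holds what current definitions fail to explain.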
Conclusions
• Using ontology allows automated classification to
reach the standard of human expert annotation
• Reasoning capabilities allow interpretation of
domain organisation
• Capturing human knowledge in computational form
• Systematic survey produces interesting biological
questions
• Discovering the unexpected
• Allows fast, efficient comparative genomics studies
• A combination of CS and bioinformatics to do
biology

Using Ontology to Classify Members of a Protein Family

  • 1. Using Ontology to Classify Members of a Protein Family Robert Stevens BioHealth Informatics Group School of Computer Science University of Manchester Robert.stevens@manchester.ac.uk
  • 2. Introduction • Developing an automated system for extracting and classifying proteins from newly sequenced genomes • Building an OWL ontology that defines class membership • Describing protein instances in OWL • Classifying against the ontology • Describing the protein family complement of a genome • As good as human classification, but added value • Only possible through inter-disciplinary research
  • 3. Acknowledgements (it takes all sorts) Katy Wolstencroft (Bioinformatics) Daniele Turi (Instance Store) Phil Lord (myGrid) Lydia Tabernero (Protein Scientist) Matt Horridge, Nick Drummond et al (Protégé OWL) Andy Brass and Robert Stevens (Bioinformatics)
  • 4. Protein Classification • Proteins divided into broad functional classes “Protein Families” • Families sub-divided to give family classifications • Class membership can be determined by “protein features”, such as domains, etc. • Resources exist for feature detection via primary sequence, but not class membership • Current Limitation of Automated Tools • Needs human knowledge to recognise class membership
  • 5. Finding Domains on a Sequence A search of the linear sequence of protein tyrosine phosphatase type K – identified 9 functional domains >uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa). MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV… ……..
  • 6. Why Classify? • Classification and curation of a genome is the first step in understanding the processes and functions happening in an organism • Classification enables comparative genomic studies - what is already known in other organisms • The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology • In silico characterisation is the current bottleneck
  • 7. The Protein Phosphatases • large superfamily of proteins – involved in the removal of phosphate groups from molecules • Important proteins in almost all cellular processes • Involved in diseases – diabetes and cancer • human phosphatases well characterised
  • 8. Phosphatase Classification • Diagnostic phosphatase domains/motifs – sufficient for membership of the protein phosphatase superfamily • Any protein having a phosphatase domain is a member of the phosphatase super-family • Other motifs determine a protein’s place within the family • Usually needs human to recognise that features detected imply class membership • Can these be captured in an ontology?
  • 9. Ontologies • Describing and defining the classes of objects represented in information • Defining the characteristics of objects • The characteristics by which it can be recognised to which class an object belongs • In a form understandable by a computer • … and, of course, humans.
  • 10. Web Ontology Language (OWL) • W3C recommendation for ontologies for the Semantic Web • OWL-DL mapped to a decidable fragment of first order logic • Classes, properties and instances • Boolean operators, plus existential and universal quantification • Rich class expressions used in restrictions on properties – hasDomain some (ImmunoglobulinDomain or FibronectinDomain)
  • 11. OWL represents classes of instances [Figure: classes A, B and C]
  • 12. Necessity and Sufficiency • An R2A phosphatase must have a fibronectin domain • Having a fibronectin domain does not a phosphatase make • Necessity -- what must a class instance have? • Any protein that has a phosphatase catalytic domain is a phosphatase enzyme • All phosphatase enzymes have a catalytic domain • Sufficiency – how is an instance recognised to be a member of a class?
  • 13. Definition of Tyrosine Phosphatase Class TyrosineReceptorProteinPhosphatase EquivalentTo: Protein That - contains atLeast-1 ProteinTyrosinePhosphataseDomain and - contains EXACTLY 1 TransmembraneDomain
  • 14. …there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.
  • 15. Definition of Tyrosine Phosphatase: What we Know we Know Class TyrosineReceptorProteinPhosphatase EquivalentTo: Protein That - contains atLeast-1 ProteinTyrosinePhosphataseDomain and - contains EXACTLY 1 TransmembraneDomain
  • 16. Definition for R2A Phosphatase Class: R2A EquivalentTo: Protein That - contains 2 ProteinTyrosinePhosphataseDomain and - contains 1 TransmembraneDomain and - contains 4 FibronectinDomain and - contains 1 ImmunoglobulinDomain and - contains 1 MAMDomain and - contains 1 Cadherin-LikeDomain and - contains only ProteinTyrosinePhosphataseDomain or TransmembraneDomain or FibronectinDomain or ImmunoglobulinDomain or Cadherin-LikeDomain or MAMDomain
  • 17. Automatic Reasoning • An OWL-DL ontology mapped to its DL form as a collection of axioms • An automatic reasoner checks for satisfiability – throws out the inconsistent and infers subsumption • Defined classes (where there are necessary and sufficient restrictions) enable a reasoner to infer subclass axioms • Also infer to which class an object belongs • Based on the facts we know about it
  • 18. Incremental Addition of Protein Functional Domains: Phosphatase catalytic, Cadherin-like, Immunoglobulin, MAM domain, Cellular retinaldehyde, Adhesion recognition, Transmembrane, Fibronectin III, Glycosylation
  • 19. Building the Ontology • Classifications already made by biologists – based on protein functionality; • Protein domain composition and other details in the literature; • Some 50 classes of phosphatase, 30 protein domains and one relationship; • “Value partition” of protein domains (covering and disjoint); • Defines range of contains property; • Literature contains knowledge of how to recognise members of each class of phosphatase.
  • 20. Classification of the Classical Tyrosine Phosphatases
  • 21. What is the Ontology Telling Us? • Each class of phosphatase defined in terms of domain composition • We know the characteristics by which an individual protein can be recognised to be a member of a particular class of phosphatase • We have this knowledge in a computational form • If we had protein instances described in terms of the ontology, we could classify those individual proteins • A catalogue of phosphatases
  • 22. Description of an Instance of a Protein • Instance: P21592 TypeOf: Protein That Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and Fact: hasDomain 1 TransmembraneDomain and Fact: hasDomain 4 FibronectinDomain and Fact: hasDomain 1 ImmunoglobulinDomain and Fact: hasDomain 1 MAMDomain and Fact: hasDomain 1 Cadherin-LikeDomain
  • 23. Instance: P21592 TypeOf: Protein That Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and Fact: hasDomain 1 TransmembraneDomain and Fact: hasDomain 4 FibronectinDomain and Fact: hasDomain 1 ImmunoglobulinDomain and Fact: hasDomain 1 MAMDomain and Fact: hasDomain 1 Cadherin-LikeDomain. Tyrosine Phosphatase: (containsDomain some TransmembraneDomain) and (containsDomain at least 1 ProteinTyrosinePhosphataseDomain). […]tase (text clipped in the slide): […]n some MAMDomain) and […]n some ProteinTyrosineCatalyticDomain or ImmunoglobulinDomain) and […]n some FibronectinDomain or FibronectinTypeIIIFoldDomain) and […]n exactly 2 ProteinTyrosinePhosphataseDomain)
  • 24. Classifying Proteins >uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa). MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHV SAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNP GTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYI AIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV……….. [Diagram labels: InterPro, Instance Store, Reasoner, Translate, Codify]
  • 25. So Far… • Human phosphatases have been classified using the system • The ontology classification performed as well as expert classification • The ontology system refined classification - DUSC contains zinc finger domain: characterised and conserved, but not in classification - DUSA contains a disintegrin domain: previously uncharacterised, evolutionarily conserved • A new kind of phosphatase?
  • 26. Aspergillus fumigatus • Phosphatase complement very different from human: >100 in human, <50 in A. fumigatus • Whole subfamilies ‘missing’: different fungi-specific phosphorylation pathways? No requirement for tissue-specific variations? • Novel serine/threonine phosphatase with homeobox: conserved in Aspergillus and closely related species, but not in any other. Again, a new phosphatase?
  • 27. Scaling • Over 700 protein families • Some 14,000 described sequence features • Hundreds of thousands of types of protein • Mass classification, then what?
  • 28. Generic Technique • Feature detection • Categories defined in terms of those features • Produce catalogue of what you currently know • Highlight cases that don’t match current knowledge
  • 29. Conclusions • Using ontology allows automated classification to reach the standard of human expert annotation • Reasoning capabilities allow interpretation of domain organisation • Capturing human knowledge in computational form • Systematic survey produces interesting biological questions • Discovering the unexpected • Allows fast, efficient comparative genomics studies • A combination of CS and bioinformatics to do biology
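
The definitions in slides 13 to 16 and the instance in slide 22 can be sketched in plain Python. This is only an illustration, not the OWL reasoner or Instance Store used in the talk: it counts domains under a closed-world reading, treats a bare number such as "contains 2" as "exactly 2", and the `DEFINITIONS` layout and the `satisfies` and `classify` helpers are names invented here.

```python
from collections import Counter

# Assumed encoding (not from the slides): each class maps a domain name to
# (min, max) counts; max=None means "no upper bound". closed=True mimics
# the "contains only ..." closure restriction of the R2A definition.
DEFINITIONS = {
    "TyrosineReceptorProteinPhosphatase": {
        "restrictions": {
            "ProteinTyrosinePhosphataseDomain": (1, None),  # at least 1
            "TransmembraneDomain": (1, 1),                  # exactly 1
        },
        "closed": False,
    },
    "R2A": {
        "restrictions": {
            "ProteinTyrosinePhosphataseDomain": (2, 2),
            "TransmembraneDomain": (1, 1),
            "FibronectinDomain": (4, 4),
            "ImmunoglobulinDomain": (1, 1),
            "MAMDomain": (1, 1),
            "Cadherin-LikeDomain": (1, 1),
        },
        "closed": True,
    },
}


def satisfies(domains, definition):
    """Return True if the list of domain names meets every restriction."""
    counts = Counter(domains)
    for name, (lo, hi) in definition["restrictions"].items():
        n = counts[name]  # Counter gives 0 for absent domains
        if n < lo or (hi is not None and n > hi):
            return False
    if definition["closed"]:
        # Closure: no domain outside the listed ones may be present.
        return set(counts) <= set(definition["restrictions"])
    return True


def classify(domains):
    """Return the sorted names of all classes whose definition is satisfied."""
    return sorted(name for name, d in DEFINITIONS.items() if satisfies(domains, d))


# Domain composition of instance P21592 as listed in slide 22.
p21592 = (
    ["ProteinTyrosinePhosphataseDomain"] * 2
    + ["TransmembraneDomain"]
    + ["FibronectinDomain"] * 4
    + ["ImmunoglobulinDomain", "MAMDomain", "Cadherin-LikeDomain"]
)
print(classify(p21592))
```

In this sketch P21592 satisfies both definitions, which mirrors the subsumption a reasoner would infer: every R2A is also a receptor tyrosine phosphatase. A protein with one extra, unlisted domain would fail the closed R2A definition and fall back to the broader class, which is the kind of case the talk flags for a closer look.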

Editor's notes

  1. All of which helps build better ontologies. But can we apply this computational amenability more directly to biological knowledge? In this example, which is work by Katy Wolstencroft, we have codified community knowledge about protein domains in phosphatases in OWL. We then take unknown protein sequences, pass them through InterPro and put them into the Instance Store, which is basically a database and a reasoner tied together. Qualified cardinality!
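
The step this note describes, going from detected domains to an instance description with qualified cardinalities, can be pictured as follows. This is a minimal sketch, assuming the domain hits are already available as a plain list of names (the real InterPro output format is not shown in the slides) and using a hypothetical helper, `describe_instance`, that emits text in the style of slide 22.

```python
from collections import Counter


def describe_instance(accession, domain_hits):
    """Build a slide-22-style description: one counted fact per domain type."""
    counts = Counter(domain_hits)
    # Qualified cardinality: the number is tied to a specific domain class.
    facts = [f"Fact: hasDomain {n} {name}" for name, n in sorted(counts.items())]
    return f"Instance: {accession} TypeOf: Protein That " + " and ".join(facts)


# Hypothetical hits for an illustrative accession, not real data.
print(describe_instance("X1", ["TransmembraneDomain", "MAMDomain", "TransmembraneDomain"]))
```

Counting per domain type, rather than recording only that a domain is present, is what lets a definition such as "exactly 2 ProteinTyrosinePhosphataseDomain" be checked against the instance.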