Designing a community resource – the
Complex Portal as an example
Sandra Orchard
Hands-on exercise
Design a manually curated data resource that will enable
the description of species-agnostic protein complexes,
to act as a reference resource in the same way that
UniProt does for proteins. Use as examples:
1. Human Haemoglobin
2. Arabidopsis Light harvesting complex
Designing a new resource - what else is out there?
• Before starting to design a resource, assess what else is
out there – re-inventing the wheel causes community
fragmentation and confusion as well as being a waste of
limited funds
• Is it needed – what gap in the market is it designed to fill?
• Investigate possibilities for collaboration, rather than
competition
• If another resource exists, does it meet your/consumer
demands? Can you contribute to and improve it?
Designing a new resource
• How will researchers use it, what information do they
want? Conduct extensive user requirement studies
before starting the design process.
• How will users search it? This will impact on data
entry/annotation.
• Data visualisation – again, what do users want? Usability
studies are critical
• Long term plans – will it survive the first grant renewal?
Complex Portal - what else was out there?
• Information on protein complexes scattered between
multiple resources but no unifying resource
• MIPS catalogued yeast complexes in 2000
• CORUM – human complexes, project terminated in 2009
• Decision – use as starting point or start again?
Information content and presentation
• User consultation – design what they need, not what you
want to give them
• Don’t get too attached to your first paper prototype – be
prepared to sacrifice your concept to community need
• Develop a beta site, then observe researchers using it.
• Keep testing, react to new demands, novel use cases
Use of community standards
• Use of community standards enables:
– Data merger across multiple resources – contribute to a greater community effort
– Data re-use and longevity
– Immediate access to existing tool suites
Use of Community standards – Complex Portal
• Established standard formats for molecular interactions (PSI-MI XML/MITAB)
• PSI-MI XML2.5 was designed for experimental data, so curated complex data were not a perfect fit – worked with the PSI-MI workgroup to produce a new version
• MITAB was designed for binary pairs, not complexes – ComplexTAB will be presented to the MI workgroup for adoption
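Why a binary-pair format is an awkward fit for complexes can be sketched in a few lines. The example below is only an illustration, not how the Complex Portal or ComplexTAB work: it forces a complex into unordered pairs (a so-called matrix expansion) and shows that the copy number of each subunit has nowhere to go. The function name `matrix_expansion` and the dictionary layout are assumptions made for this sketch.

```python
from itertools import combinations


def matrix_expansion(members):
    """Return every unordered pair of distinct complex members."""
    return sorted(combinations(sorted(set(members)), 2))


# Human haemoglobin as a heterotetramer: two alpha and two beta chains.
# Keys are UniProtKB accessions, values are copies per complex.
haemoglobin = {"P69905": 2, "P68871": 2}

pairs = matrix_expansion(haemoglobin)
print(pairs)  # [('P68871', 'P69905')]

# Four subunits collapse into a single pair, so the 2:2 stoichiometry
# would have to be stored somewhere outside the pair list.
```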
Use of Community standards – Complex Portal
• Used existing identifiers for components (UniProtKB, ChEBI, RNAcentral)
– enables import of additional information using resource APIs, for example the website can be searched using gene synonyms
– organism non-specific, enables us to describe complexes in a range of species, including non-model organisms
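A minimal sketch of what a species-agnostic complex record built on existing identifiers might look like is given below. The class names, field names and the ChEBI placeholder are assumptions for illustration and do not reflect the Complex Portal's actual data model; only the two UniProtKB accessions (haemoglobin alpha and beta) and the human taxonomy identifier are real.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    database: str          # e.g. "uniprotkb", "chebi", "rnacentral"
    accession: str
    stoichiometry: int = 1


@dataclass
class ComplexRecord:
    name: str
    taxon_id: int          # NCBI taxonomy identifier, so any species fits
    components: list = field(default_factory=list)

    def accessions(self, database):
        """List the accessions that point into one external resource."""
        return [c.accession for c in self.components if c.database == database]


haemoglobin = ComplexRecord(
    name="Haemoglobin (illustrative record)",
    taxon_id=9606,  # Homo sapiens
    components=[
        Component("uniprotkb", "P69905", 2),
        Component("uniprotkb", "P68871", 2),
        # Placeholder: the real ChEBI accession for the haem cofactor
        # would be looked up by the curator.
        Component("chebi", "CHEBI:PLACEHOLDER", 4),
    ],
)

print(haemoglobin.accessions("uniprotkb"))  # ['P69905', 'P68871']
```

Because each component carries only a database name and an accession, extra information such as gene synonyms can be fetched from the source resource instead of being curated again.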
Use of Community standards enables use of existing tools
• Community standards have encouraged tool development by users, software often open-source and freely available – often can be incorporated directly into websites with little/no additional development
• Complex Portal viewer originally written to visualise cross-linking data
Use of Community standards enables use of existing tools
• Look for initiatives which make open-source tools, apps/plug-ins, visualizers and widgets freely available, e.g. BioJS, BioPerl, Cytoscape…
Free text vs Ontologies
Free text
Pros – versatile, fully descriptive, flexible
Cons – can be difficult to interpret, long winded,
error-prone, difficult to search
CVs
Pros – structured, consistent, concise
Cons – may not deal well with ‘odd’ cases, lack of
information
Consider using both!
Use of controlled vocabularies
• Again, re-use rather than re-invent
• Use of CVs enables searches across resources, and also makes intelligent searches within resources easy to implement
For example, users can search for
• all transcription factors
• all complexes involved in respiration
• all mitochondrial complexes
Use of controlled vocabularies
In the Complex Portal you can search for
1. All enzymes - GO:0003824 (catalytic activity)
2. All transferases - GO:0016740 (transferase activity)
3. All protein kinases - GO:0004672 (protein kinase activity)
4. All cyclin-dependent protein kinases - GO:0097472 (cyclin-dependent protein kinase activity)
Similarly, the ChEBI ontology can be used, e.g. search on porphyrin
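The search behaviour in the list above follows from the ontology hierarchy: a query for a broad term also returns everything annotated with a more specific descendant. The sketch below assumes a simplified chain of parent links between the four GO terms named on the slide (intermediate GO terms are left out) and uses made-up complex names; it is not the Complex Portal's search implementation.

```python
# Simplified parent links between the four GO terms listed above.
# Intermediate terms in the real ontology are deliberately omitted.
PARENT = {
    "GO:0097472": "GO:0004672",  # cyclin-dependent protein kinase -> protein kinase
    "GO:0004672": "GO:0016740",  # protein kinase -> transferase
    "GO:0016740": "GO:0003824",  # transferase -> catalytic activity
}

# Hypothetical complexes, each annotated with its most specific term.
ANNOTATIONS = {
    "complex_A": "GO:0097472",
    "complex_B": "GO:0004672",
    "complex_C": "GO:0016740",
    "complex_D": "GO:0005215",  # transporter activity, not an enzyme
}


def ancestors(term):
    """Return the term itself followed by all of its ancestors."""
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def search(query_term):
    """Find complexes annotated with the query term or any descendant."""
    return sorted(
        name for name, term in ANNOTATIONS.items()
        if query_term in ancestors(term)
    )


print(search("GO:0003824"))  # ['complex_A', 'complex_B', 'complex_C']
print(search("GO:0004672"))  # ['complex_A', 'complex_B']
```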
Linking to external resources
• Extensive cross-referencing is time consuming but enables subsequent pulling in of data from other resources
Make this the ‘go to’ resource for your community
• Must fit community need, be easy to search and deliver
the results the user wants
Outreach – publications, conferences, talks….
Collaborate on a high impact analysis paper, with your
resource playing a key role.
Protocols, tutorials, videos, hands-on training courses.
Use social media
Using Social Media
InterPro and Annotation transfer to
Non-Model Organism Proteomes
What is InterPro
• InterPro provides functional analysis of proteins by
classifying them into families and predicting domains
and important sites.
• Combines protein signatures from a number of member databases into a single searchable resource.
• Has resulted in an integrated database and diagnostic tool (InterProScan).
Protein signatures
Model the pattern of conserved amino acids at specific positions within
a multiple sequence alignment
• Patterns
• Profiles
• Profile HMMs
Use these models (signatures) to infer relationships with the
characterised sequences from which the alignment was constructed
Approach used by a variety of databases: Pfam, TIGRFAMs,
PANTHER, Prosite, etc
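The simplest of the three model types, a pattern, can be sketched directly from an alignment. The example below uses the four aligned sequences shown in the deck's "How are protein signatures made?" figure, keeps every fully conserved column and turns the rest into wildcards. This is a deliberately crude illustration and not the method used by PROSITE or any other member database; the function name and the query sequences are made up.

```python
import re

# Aligned sequences taken from the alignment figure in this deck.
ALIGNMENT = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
    "SVPRSPVCGSDGVTYGTECDLK",
    "HPPPGPVCGTDGLTYDNRCELR",
]


def conserved_pattern(alignment):
    """Keep columns where all sequences agree; use '.' everywhere else."""
    return "".join(
        column[0] if len(set(column)) == 1 else "."
        for column in zip(*alignment)
    )


pattern = conserved_pattern(ALIGNMENT)
print(pattern)  # .....PVCG.D..TY...C.L.

# Made-up query sequences: one embeds a family member, one does not.
hit = re.search(pattern, "MKAAVPRSPVCGSDDVTYANECELKGG")
miss = re.search(pattern, "MKAAAAAAAAAAAAAAAAAAAAAAAAA")
print(hit is not None, miss is not None)  # True False
```

Profiles and profile HMMs refine the same idea by scoring each column instead of demanding an exact residue, which is why they tolerate more sequence divergence.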
Introduction to InterPro
How are protein signatures made?
[Figure: workflow with the steps Protein family/domain, Multiple sequence alignment, Build model, Search, Significant matches, Refine, Protein signature]
Example alignment:
ITWKGPVCGLDGKTYRNECALL
AVPRSPVCGSDDVTYANECELK
SVPRSPVCGSDGVTYGTECDLK
HPPPGPVCGTDGLTYDNRCELR
Example E-values of significant matches: 1e-49, 3e-42, 5e-39, 6e-10
[Figure: member database signatures grouped by purpose (structural domains; functional annotation of families/domains; protein features (sites)) and by method (hidden Markov models, fingerprints, profiles, patterns), with HAMAP shown separately]
Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | EBI | Sequence alignment | Family & Domain based on conserved sequence | http://pfam.xfam.org/
Gene3D | HMM | UCL | Structure alignment | Structural Domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial Functional Family Classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & Profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/
The aim of InterPro
InterPro: multiple sequence analysis
• Outputs TSV, XML, GFF3, HTML & SVG formats
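Of the listed formats, TSV is the easiest to post-process. The sketch below assumes the output comes from InterProScan and that the first thirteen tab-separated columns are laid out as named in `COLUMNS`; that layout should be checked against the InterProScan documentation for the version in use. The two sample rows, including all accessions and scores, are invented for illustration.

```python
import csv
import io

# Assumed column order for the first 13 fields of a TSV result line.
COLUMNS = [
    "protein", "md5", "length", "analysis", "signature_acc",
    "signature_desc", "start", "stop", "score", "status", "date",
    "interpro_acc", "interpro_desc",
]

# Invented sample rows; every value here is a placeholder.
SAMPLE_TSV = (
    "PROT1\tmd5a\t300\tPfam\tSIG0001\tExample domain\t10\t120\t1e-49\tT\t"
    "01-01-2016\tIPR_EXAMPLE\tExample entry\n"
    "PROT1\tmd5a\t300\tPRINTS\tSIG0002\tExample fingerprint\t15\t60\t3e-42\tT\t"
    "01-01-2016\tIPR_EXAMPLE\tExample entry\n"
)


def parse_matches(text):
    """Turn TSV text into a list of dicts with integer coordinates."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        row = dict(zip(COLUMNS, fields))
        row["start"] = int(row["start"])
        row["stop"] = int(row["stop"])
        rows.append(row)
    return rows


rows = parse_matches(SAMPLE_TSV)
for row in rows:
    print(row["protein"], row["analysis"], row["signature_acc"],
          row["start"], row["stop"], row["interpro_acc"])
```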
InterPro as a tool for Automatic
annotation
Why automatic annotation is needed
• data growth in UniProtKB is fast:
• manual curation is time-consuming
• experimental data are unavailable for many
sequences/organisms
• organisms’ genomes are sequenced but often no
biochemical characterization is conducted
Release | Section of database | No. of entries | Growth
2015_10 | reviewed (Swiss-Prot) | ~0.5 million | slow
2015_10 | unreviewed (TrEMBL) | >50 million | rapid
The Concepts in GO
1. Molecular Function: an elemental activity or task or job
• protein kinase activity
• insulin receptor activity
2. Biological Process: a commonly recognised series of events
• cell division
3. Cellular Component: where a gene product is located
• mitochondrion
• mitochondrial matrix
• mitochondrial inner membrane
The relationship between InterPro and GO
(InterPro2GO)
• Curators manually add relevant GO terms to InterPro entries
• InterPro entry specificity determines the GO terms assigned
GO:0007186 G-protein coupled receptor signaling
GO:0016021 integral to membrane
GO:0007601 visual perception
GO:0007186 G-protein coupled receptor signaling
GO:0016021 integral to membrane
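How entry specificity shapes the transferred GO terms can be sketched as a simple lookup. The GO identifiers below are the ones listed above; the InterPro entry names `IPR_GENERAL` and `IPR_SPECIFIC`, the protein names and the function are hypothetical, and which real entries carry which terms is not stated in the source.

```python
# Hypothetical entries: the more specific entry carries one extra term.
INTERPRO2GO = {
    "IPR_GENERAL": {"GO:0007186", "GO:0016021"},
    "IPR_SPECIFIC": {"GO:0007186", "GO:0016021", "GO:0007601"},
}

# Hypothetical proteins and the entries their sequences match.
PROTEIN_MATCHES = {
    "protein_1": ["IPR_GENERAL"],
    "protein_2": ["IPR_GENERAL", "IPR_SPECIFIC"],
}


def transfer_go_terms(matches, mapping):
    """Give each protein the union of GO terms of its matched entries."""
    return {
        protein: sorted(set().union(*(mapping.get(e, set()) for e in entries)))
        for protein, entries in matches.items()
    }


go_terms = transfer_go_terms(PROTEIN_MATCHES, INTERPRO2GO)
print(go_terms["protein_1"])  # ['GO:0007186', 'GO:0016021']
print(go_terms["protein_2"])  # ['GO:0007186', 'GO:0007601', 'GO:0016021']
```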
Using InterPro for annotation
• InterPro is the world’s major source of GO terms:
~ 90 million GO terms for ~ 30 million distinct UniProtKB sequences
• Also underlies the system adding annotation to UniProtKB/TrEMBL
• Provides matches to ~40 million proteins (approx. 80% of UniProtKB)
Annotation consistency:
• Using InterPro and GO for annotation allows direct comparison of proteins in UniProtKB
Automatic Annotation in UniProtKB
System | Rule creation | Trigger | Annotations | Scope
SAAS | automatic | taxonomy, InterPro | protein names, EC numbers, comments, KW, GO terms | all taxa
UniRule | manual | taxonomy, InterPro*, proteome property, sequence length | protein names, EC numbers, gene names, comments, features**, KW, GO terms | all taxa
*flexibility to create custom signatures and submit them to InterPro as required
**predictors for signal, transmembrane, coiled-coil features, alignment for positional ones
Components of a rule: conditions
Restrict application of rules to those unreviewed UniProtKB entries
fulfilling the conditions
Types of conditions:
• InterPro signatures: functional classification of proteins using predictive models (signatures)
• taxonomy
• sequence features, e.g. length
• proteome features, e.g. outer membrane: yes (bacterial sequences)
Components of a rule: annotations
If an unreviewed UniProtKB entry fulfils conditions of a rule, annotations
in a rule are propagated to this entry.
Types of annotations:
• protein names, including enzyme classification (EC) numbers
• functional annotation, e.g. catalytic activities
• gene ontology terms
• keywords
• sequence features, e.g. active sites, transmembrane domains
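The two halves of a rule, conditions and annotations, can be sketched as a small data structure. Everything below is a hypothetical illustration of the logic described on these slides, not the UniRule implementation: the class names, field names, rule identifier, signature name and annotation values are all made up.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    accession: str
    reviewed: bool
    signatures: set        # InterPro signatures matched by the sequence
    lineage: set           # taxon names from root down to the species
    length: int
    annotations: dict = field(default_factory=dict)


@dataclass
class Rule:
    rule_id: str
    required_signatures: set
    required_taxon: str
    min_length: int
    max_length: int
    annotations: dict

    def conditions_met(self, entry):
        """Only unreviewed entries fulfilling every condition qualify."""
        return (
            not entry.reviewed
            and self.required_signatures <= entry.signatures
            and self.required_taxon in entry.lineage
            and self.min_length <= entry.length <= self.max_length
        )

    def apply(self, entry):
        """Propagate annotations, tagging each with the rule as evidence."""
        if not self.conditions_met(entry):
            return False
        for key, value in self.annotations.items():
            entry.annotations[key] = {"value": value, "evidence": self.rule_id}
        return True


rule = Rule(
    rule_id="RULE_EXAMPLE",
    required_signatures={"IPR_EXAMPLE"},
    required_taxon="Bacteria",
    min_length=200,
    max_length=400,
    annotations={"protein_name": "Example kinase", "keyword": "Kinase"},
)

candidate = Entry("ENTRY1", False, {"IPR_EXAMPLE"}, {"Bacteria"}, 310)
too_short = Entry("ENTRY2", False, {"IPR_EXAMPLE"}, {"Bacteria"}, 90)

print(rule.apply(candidate), candidate.annotations["protein_name"])
print(rule.apply(too_short), too_short.annotations)
```

Storing the rule identifier next to each propagated value is what lets a user see where an annotation came from.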
How to access automatic annotation data?
Example of a UniRule
UR000172789 applied
evidence tags clearly state where annotation comes from
Example of a UniRule
highlight a rule’s logic
Attributing evidence
It needs to be made clear to the user when information is
1. experimentally based
2. predicted
3. transferred from a related species
Use of evidence codes gives this information
Evidence Code Ontology
http://www.ebi.ac.uk/ols/ontologies/eco
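A minimal sketch of carrying evidence alongside every annotation is shown below. It models only the three categories named on the slide; a real resource would store Evidence Code Ontology identifiers instead of this hypothetical `Evidence` enum, and the annotation values and source labels are invented.

```python
from dataclasses import dataclass
from enum import Enum


class Evidence(Enum):
    EXPERIMENTAL = "experimentally based"
    PREDICTED = "predicted"
    TRANSFERRED = "transferred from a related species"


@dataclass(frozen=True)
class Annotation:
    subject: str
    statement: str
    evidence: Evidence
    source: str


def describe(annotation):
    """Render an annotation so that its evidence is always visible."""
    return (f"{annotation.subject}: {annotation.statement} "
            f"[{annotation.evidence.value}; source: {annotation.source}]")


def only(annotations, evidence):
    """Filter annotations down to a single evidence category."""
    return [a for a in annotations if a.evidence is evidence]


annotations = [
    Annotation("ENTRY1", "oxygen binding", Evidence.EXPERIMENTAL, "publication"),
    Annotation("ENTRY2", "oxygen binding", Evidence.TRANSFERRED, "ENTRY1"),
    Annotation("ENTRY3", "kinase activity", Evidence.PREDICTED, "RULE_EXAMPLE"),
]

for annotation in annotations:
    print(describe(annotation))
```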
Thank you!
www.ebi.ac.uk
Twitter: @emblebi
Facebook: EMBLEBI

IB Biology New syllabus B3.2 Transport.pptxIB Biology New syllabus B3.2 Transport.pptx
IB Biology New syllabus B3.2 Transport.pptxUalikhanKalkhojayev1
 
Genomics and Bioinformatics basics from genome to phenome
Genomics and Bioinformatics basics from genome to phenomeGenomics and Bioinformatics basics from genome to phenome
Genomics and Bioinformatics basics from genome to phenomeAjay Kumar Mahato
 
Pests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPirithiRaju
 
Intensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptxIntensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptxHarshiniAlapati
 
PSP3 employability assessment form .docx
PSP3 employability assessment form .docxPSP3 employability assessment form .docx
PSP3 employability assessment form .docxmarwaahmad357
 
Pests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPRPests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPRPirithiRaju
 
Thermonuclear explosions on neutron stars reveal the speed of their jets
Thermonuclear explosions on neutron stars reveal the speed of their jetsThermonuclear explosions on neutron stars reveal the speed of their jets
Thermonuclear explosions on neutron stars reveal the speed of their jetsSérgio Sacani
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxmarwaahmad357
 
M.Pharm - Question Bank - Drug Delivery Systems
M.Pharm - Question Bank - Drug Delivery SystemsM.Pharm - Question Bank - Drug Delivery Systems
M.Pharm - Question Bank - Drug Delivery SystemsSumathi Arumugam
 
Excavation Methods in Archaeological Research & Studies
Excavation Methods in Archaeological Research &  StudiesExcavation Methods in Archaeological Research &  Studies
Excavation Methods in Archaeological Research & StudiesPrachya Adhyayan
 
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WayShiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WaySérgio Sacani
 
CW marking grid Analytical BS - M Ahmad.docx
CW  marking grid Analytical BS - M Ahmad.docxCW  marking grid Analytical BS - M Ahmad.docx
CW marking grid Analytical BS - M Ahmad.docxmarwaahmad357
 

Dernier (20)

Human brain.. It's parts and function.
Human brain.. It's parts and function. Human brain.. It's parts and function.
Human brain.. It's parts and function.
 
Krishi Vigyan Kendras - कृषि विज्ञान केंद्र
Krishi Vigyan Kendras - कृषि विज्ञान केंद्रKrishi Vigyan Kendras - कृषि विज्ञान केंद्र
Krishi Vigyan Kendras - कृषि विज्ञान केंद्र
 
Unit 3, Herbal Drug Technology, B Pharmacy 6th Sem
Unit 3, Herbal Drug Technology, B Pharmacy 6th SemUnit 3, Herbal Drug Technology, B Pharmacy 6th Sem
Unit 3, Herbal Drug Technology, B Pharmacy 6th Sem
 
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
 
Legacy Analysis of Dark Matter Annihilation from the Milky Way Dwarf Spheroid...
Legacy Analysis of Dark Matter Annihilation from the Milky Way Dwarf Spheroid...Legacy Analysis of Dark Matter Annihilation from the Milky Way Dwarf Spheroid...
Legacy Analysis of Dark Matter Annihilation from the Milky Way Dwarf Spheroid...
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
 
Alternative system of medicine herbal drug technology syllabus
Alternative system of medicine herbal drug technology syllabusAlternative system of medicine herbal drug technology syllabus
Alternative system of medicine herbal drug technology syllabus
 
Gender board diversity spillovers and the public eye
Gender board diversity spillovers and the public eyeGender board diversity spillovers and the public eye
Gender board diversity spillovers and the public eye
 
IB Biology New syllabus B3.2 Transport.pptx
IB Biology New syllabus B3.2 Transport.pptxIB Biology New syllabus B3.2 Transport.pptx
IB Biology New syllabus B3.2 Transport.pptx
 
Genomics and Bioinformatics basics from genome to phenome
Genomics and Bioinformatics basics from genome to phenomeGenomics and Bioinformatics basics from genome to phenome
Genomics and Bioinformatics basics from genome to phenome
 
Pests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPR
 
Intensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptxIntensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptx
 
PSP3 employability assessment form .docx
PSP3 employability assessment form .docxPSP3 employability assessment form .docx
PSP3 employability assessment form .docx
 
Pests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPRPests of Redgram_Identification, Binomics_Dr.UPR
Pests of Redgram_Identification, Binomics_Dr.UPR
 
Thermonuclear explosions on neutron stars reveal the speed of their jets
Thermonuclear explosions on neutron stars reveal the speed of their jetsThermonuclear explosions on neutron stars reveal the speed of their jets
Thermonuclear explosions on neutron stars reveal the speed of their jets
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docx
 
M.Pharm - Question Bank - Drug Delivery Systems
M.Pharm - Question Bank - Drug Delivery SystemsM.Pharm - Question Bank - Drug Delivery Systems
M.Pharm - Question Bank - Drug Delivery Systems
 
Excavation Methods in Archaeological Research & Studies
Excavation Methods in Archaeological Research &  StudiesExcavation Methods in Archaeological Research &  Studies
Excavation Methods in Archaeological Research & Studies
 
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WayShiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
 
CW marking grid Analytical BS - M Ahmad.docx
CW  marking grid Analytical BS - M Ahmad.docxCW  marking grid Analytical BS - M Ahmad.docx
CW marking grid Analytical BS - M Ahmad.docx
 

Designing a community resource - Sandra Orchard

  • 1. Designing a community resource – the Complex Portal as an example Sandra Orchard
  • 2. Hands-on exercise Design a manually curated data resource that will enable the description of species agnostic protein complexes, to act as reference resource in the same way that UniProt does for proteins – use as examples 1. Human Haemoglobin 2. Arabidopsis Light harvesting complex
  • 3. Designing a new resource - what else is out there? • Before starting to design a resource, assess what else is out there – re-inventing the wheel causes community fragmentation and confusion as well as being a waste of limited funds • Is it needed – what gap in the market is it designed to fill? • Investigate possibilities for collaboration, rather than competition • If another resource exists, does it meet your/consumer demands – can you contribute and improve
  • 4. Designing a new resource • How will researchers use it, what information do they want? Conduct extensive user requirement studies before starting the design process. • How will users search it? This will impact on data entry/annotation. • Data visualisation – again, what do users want? Usability studies are critical • Long term plans – will it survive the first grant renewal?
  • 5. Complex Portal - what else was out there? • Information on protein complexes scattered between multiple resources but no unifying resource • MIPS catalogued yeast complexes in 2000 • Corum – human complexes, project terminated in 2009 • Decision – use as starting point or start again?
  • 6. Information content and presentation • User consultation – design what they need, not what you want to give them • Don’t get too attached to your first paper prototype – be prepared to sacrifice your concept to community need • Develop a beta site, then observe researchers using it. • Keep testing, react to new demands, novel use cases
  • 7. Use of community standards • Use of community standards enables: • Data merger across multiple resources – contribute to a greater community effort • Data re-use and longevity • Immediate access to existing tool suites
  • 8. Use of Community standards – Complex Portal • Established standard formats for molecular interactions (PSI-MI XML/MITAB) • PSI-XML2.5 was designed for experimental data, so curated complex data were not a perfect fit – worked with the PSI-MI workgroup to produce a new version • MITAB was designed for binary pairs, not complexes – ComplexTAB will be presented to the MI workgroup for adoption
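To illustrate why a binary-pair format fits complexes poorly, the sketch below flattens a four-member complex into pairs in two commonly used ways (often called matrix and spoke expansion). The member names and function names are placeholders chosen for illustration; this is not how MITAB or ComplexTAB are implemented.

```python
from itertools import combinations

def matrix_expansion(members):
    """Pair every member with every other member."""
    return sorted(combinations(sorted(members), 2))

def spoke_expansion(bait, members):
    """Pair one chosen member (the 'bait') with each of the others."""
    return sorted((bait, m) for m in members if m != bait)

# Placeholder component names for a four-member complex.
complex_members = ["A", "B", "C", "D"]

matrix_pairs = matrix_expansion(complex_members)
spoke_pairs = spoke_expansion("A", complex_members)

print(len(matrix_pairs), "pairs from matrix expansion")
print(len(spoke_pairs), "pairs from spoke expansion")
```

The two expansions give different pair lists for the same complex, and neither records that the members form one assembly or how many copies of each are present, which is the information a complex-level format needs to carry.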
  • 9. Use of Community standards – Complex Portal • Used existing identifiers for components (UniProtKB, ChEBI, RNAcentral) – enables import of additional information using resource APIs; for example, users can search the website using gene synonyms • Organism non-specific – enables us to describe complexes in a range of species, including non-model organisms
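A minimal sketch of what re-using existing identifiers can look like in a data model, using the haemoglobin exercise as the example. The class names, the `GENE_NAMES` lookup table (a hand-written stand-in for information that would be imported through a resource API) and the `CHEBI:PLACEHOLDER` identifier are all assumptions made for illustration; accessions and stoichiometries should be checked against the source databases before being relied on.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Component:
    database: str          # e.g. "uniprotkb", "chebi", "rnacentral"
    identifier: str        # accession in that database
    stoichiometry: int = 1

@dataclass
class ComplexEntry:
    name: str
    taxon_id: int          # NCBI taxonomy identifier of the species
    components: list = field(default_factory=list)

# Hand-written stand-in for gene names pulled in via a resource API.
GENE_NAMES = {
    "P69905": {"HBA1", "HBA2"},
    "P68871": {"HBB"},
}

haemoglobin = ComplexEntry(
    name="Haemoglobin (example entry)",
    taxon_id=9606,
    components=[
        Component("uniprotkb", "P69905", stoichiometry=2),
        Component("uniprotkb", "P68871", stoichiometry=2),
        # Placeholder only: the real ChEBI identifier is not filled in here.
        Component("chebi", "CHEBI:PLACEHOLDER", stoichiometry=4),
    ],
)

def find_by_gene_name(entries, gene_name, gene_names):
    """Return entries with a component whose gene names include gene_name."""
    wanted = gene_name.upper()
    return [
        entry for entry in entries
        if any(wanted in gene_names.get(c.identifier, set())
               for c in entry.components)
    ]

hits = find_by_gene_name([haemoglobin], "hba2", GENE_NAMES)
print([entry.name for entry in hits])
```

Because the entry stores only database accessions, the synonym search works without the resource curating gene names itself, and the same structure serves any species.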
  • 10. Use of Community standards enables use of existing tools • Community standards have encouraged tool development by users, software often open-source and freely available – often can be incorporated directly into websites with little/no additional development • Complex Portal viewer originally written to visualise cross-linking data
  • 11. Use of Community standards enables use of existing tools • Look for initiatives which make open-source tools, apps/plug-ins, visualizers and widgets freely available e.g. BioJS, BioPerl, Cytoscape……
  • 12. Free text vs Ontologies • Free text: Pros – versatile, fully descriptive, flexible; Cons – can be difficult to interpret, long-winded, error-prone, difficult to search • Controlled vocabularies (CVs): Pros – structured, consistent, concise; Cons – may not deal well with ‘odd’ cases, lack of information • Consider using both!
  • 13. Use of controlled vocabularies • Again, re-use rather than re-invent • Use of CVs enables searches across resources, but also can make intelligent searches within resources easy to implement For example can search for • all transcription factors • all complexes involved in respiration • all mitochondrial complexes
  • 14. Use of controlled vocabularies In the Complex Portal you can search for 1. All enzymes - GO:0003824 (catalytic activity) 2. All transferases - GO:0016740 (transferase activity) 3. All protein kinases - GO:0004672 (protein kinase activity) 4. All cyclin-dependent protein kinases - GO:0097472 (cyclin-dependent protein kinase activity) Similarly, you can use the ChEBI ontology – for example, search on porphyrin
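The searches listed above work because an ontology term implies all of its parent terms. The sketch below shows the idea with the four GO identifiers from the slide. The parent links are a simplified assumption (intermediate GO terms are omitted), and the complex names and annotations are invented for illustration.

```python
# Simplified child -> parents links; real GO has intermediate terms.
PARENTS = {
    "GO:0097472": ["GO:0004672"],  # cyclin-dependent protein kinase activity
    "GO:0004672": ["GO:0016740"],  # protein kinase activity
    "GO:0016740": ["GO:0003824"],  # transferase activity
    "GO:0003824": [],              # catalytic activity
}

def ancestors(term, parents):
    """Collect every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def search(annotations, query, parents):
    """Return names annotated with the query term or any of its descendants."""
    return sorted(
        name for name, terms in annotations.items()
        if any(t == query or query in ancestors(t, parents) for t in terms)
    )

# Invented annotations: each complex carries only its most specific term.
ANNOTATIONS = {
    "complex_1": ["GO:0097472"],
    "complex_2": ["GO:0016740"],
    "complex_3": ["GO:0003824"],
}

print(search(ANNOTATIONS, "GO:0016740", PARENTS))
```

Curators annotate with the most specific term they can justify, and a broad query still finds the entry because the hierarchy supplies the rest.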
  • 15. Linking to external resources • Extensive cross- referencing is time consuming but enables subsequent pulling in of data from other resources
  • 16. Make this the ‘go to’ resource for your community • Must fit community need, be easy to search and deliver the results the user wants Outreach – publications, conferences, talks…. Collaborate on a high impact analysis paper, with your resource playing a key role. Protocols, tutorials, videos, hands-on training courses. Use social media
  • 18. InterPro and Annotation transfer to Non-Model Organism Proteomes
  • 19. What is InterPro • InterPro provides functional analysis of proteins by classifying them into families and predicting domains and important sites. • Combines protein signatures from a number of member databases into a single searchable resource. • Has resulted in an integrated database and diagnostic tool (InterProScan).
  • 20. Protein signatures Model the pattern of conserved amino acids at specific positions within a multiple sequence alignment • Patterns • Profiles • Profile HMMs Use these models (signatures) to infer relationships with the characterised sequences from which the alignment was constructed Approach used by a variety of databases: Pfam, TIGRFAMs, PANTHER, Prosite, etc
  • 21. Protein signatures Alternatively, model the pattern of conserved amino acids at specific positions within a multiple sequence alignment • Patterns • Profiles • Profile HMMs Use these models (signatures) to infer relationships with the characterised sequences from which the alignment was constructed Approach used by a variety of databases: Pfam, TIGRFAMs, PANTHER, Prosite, etc
  • 22. Introduction to InterPro How are protein signatures made? [Figure labels: Multiple sequence alignment; Protein family/domain; Build model; Search; Significant matches; Protein signature; Refine. Example aligned sequences: ITWKGPVCGLDGKTYRNECALL, AVPRSPVCGSDDVTYANECELK, SVPRSPVCGSDGVTYGTECDLK, HPPPGPVCGTDGLTYDNRCELR. Example match E-values: 1e-49, 3e-42, 5e-39, 6e-10.]
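As a rough illustration of the simplest signature type (a pattern), the sketch below derives a regular expression from the four aligned sequences shown on the slide: conserved columns become a single residue and variable columns become a set of allowed residues. It assumes an ungapped alignment of equal-length rows, and `build_pattern` is a name invented here. Profiles and profile HMMs go further by scoring each position, which is what produces E-values; this sketch does not attempt that.

```python
import re

# The four aligned sequences from the slide (ungapped, equal length).
ALIGNMENT = [
    "ITWKGPVCGLDGKTYRNECALL",
    "AVPRSPVCGSDDVTYANECELK",
    "SVPRSPVCGSDGVTYGTECDLK",
    "HPPPGPVCGTDGLTYDNRCELR",
]

def build_pattern(alignment):
    """Turn each alignment column into a residue or a set of residues."""
    parts = []
    for column in zip(*alignment):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])                    # conserved position
        else:
            parts.append("[" + "".join(residues) + "]")  # variable position
    return "".join(parts)

pattern = build_pattern(ALIGNMENT)
signature = re.compile(pattern)

print(pattern)
print(all(signature.fullmatch(seq) for seq in ALIGNMENT))

# A sequence that breaks a conserved cysteine no longer matches.
mutated = "ITWKGPVAGLDGKTYRNECALL"
print(signature.fullmatch(mutated) is None)
```

A pattern built this way only accepts residues already seen in the alignment, which is why the slide's workflow includes a search-and-refine loop.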
  • 24. Columns: Database | Basis | Institution | Built from | Focus | URL
      Pfam | HMM | EBI | Sequence alignment | Family & Domain based on conserved sequence | http://pfam.xfam.org/
      Gene3D | HMM | UCL | Structure alignment | Structural Domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
      Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
      SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
      TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial Functional Family Classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
      Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
      PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
      PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
      PROSITE | Patterns & Profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
      HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/
      Conserved
  • 25. The aim of InterPro InterPro
  • 26. InterPro: multiple sequence analysis • Outputs TSV, XML, GFF3, HTML & SVG formats
  • 27. InterPro as a tool for Automatic annotation
  • 28. Why automatic annotation is needed • data growth in UniProtKB is fast • manual curation is time-consuming • experimental data are unavailable for many sequences/organisms • organisms’ genomes are sequenced but often no biochemical characterization is conducted • Release 2015_10: reviewed (Swiss-Prot) ~0.5 million entries, slow growth; unreviewed (TrEMBL) >50 million entries, rapid growth
  • 29. The Concepts in GO 1. Molecular Function: an elemental activity or task or job (e.g. protein kinase activity, insulin receptor activity) 2. Biological Process: a commonly recognised series of events (e.g. cell division) 3. Cellular Component: where a gene product is located (e.g. mitochondrion, mitochondrial matrix, mitochondrial inner membrane)
  • 30. The relationship between InterPro and GO (InterPro2GO) • Curators manually add relevant GO terms to InterPro entries • InterPro entry specificity determines the GO terms assigned GO:0007186 G-protein coupled receptor signaling GO:0016021 integral to membrane GO:0007601 visual perception GO:0007186 G-protein coupled receptor signaling GO:0016021 integral to membrane
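The way entry specificity drives the GO terms a protein receives can be sketched as a simple lookup. The GO identifiers are the ones on the slide; the entry keys `IPR_GENERAL_FAMILY` and `IPR_SPECIFIC_SUBFAMILY` are placeholders rather than real InterPro accessions, and which slide list belongs to which entry is an assumption.

```python
# Placeholder entry keys; GO identifiers are taken from the slide.
INTERPRO2GO = {
    "IPR_GENERAL_FAMILY": {"GO:0007186", "GO:0016021"},
    "IPR_SPECIFIC_SUBFAMILY": {"GO:0007186", "GO:0016021", "GO:0007601"},
}

def transfer_go_terms(matched_entries, mapping):
    """Union of the GO terms attached to every entry the protein matches."""
    terms = set()
    for entry in matched_entries:
        terms |= mapping.get(entry, set())
    return sorted(terms)

# A protein matching only the general entry gets the general terms.
general_only = transfer_go_terms(["IPR_GENERAL_FAMILY"], INTERPRO2GO)

# A protein that also matches the specific entry gains the extra term.
both = transfer_go_terms(
    ["IPR_GENERAL_FAMILY", "IPR_SPECIFIC_SUBFAMILY"], INTERPRO2GO
)

print(general_only)
print(both)
```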
  • 32. Using InterPro for annotation • InterPro is the world’s major source of GO terms: ~90 million GO terms for ~30 million distinct UniProtKB sequences • Also underlies the system adding annotation to UniProtKB/TrEMBL • Provides matches to ~40 million proteins (approx. 80% of UniProtKB) Annotation consistency: • Using InterPro and GO for annotation allows direct comparison of proteins in UniProtKB
  • 33. Automatic Annotation in UniProtKB. Columns: System | Rule creation | Trigger | Annotations | Scope
      SAAS | automatic | taxonomy, InterPro | protein names, EC numbers, comments, KW, GO terms | all taxa
      UniRule | manual | taxonomy, InterPro*, proteome property, sequence length | protein names, EC numbers, gene names, comments, features**, KW, GO terms | all taxa
      *flexibility to create custom signatures, submitted to InterPro as required
      **predictors for signal, transmembrane, coiled-coil features; alignment for positional ones
  • 34. Components of a rule: conditions Restrict application of rules to those unreviewed UniProtKB entries fulfilling the conditions Types of conditions: • InterPro signatures • Functional classification of proteins using predictive models (signatures) • taxonomy • sequence features, e.g. length • proteome features, e.g. outer membrane:yes; (bacterial sequences)
  • 35. Components of a rule: annotations If an unreviewed UniProtKB entry fulfils conditions of a rule, annotations in a rule are propagated to this entry. Types of annotations: • protein names, including enzyme classification (EC) numbers • functional annotation, e.g. catalytic activities • gene ontology terms • keywords • sequence features, e.g. active sites, transmembrane domains
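The two halves of a rule described above (conditions, then annotations) can be sketched as a small data structure. Every class, field and value below is hypothetical and chosen only to mirror the slide wording; this is not UniProt's actual rule format or implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    accession: str
    reviewed: bool
    signatures: set        # signature identifiers matched by the sequence
    lineage: set           # taxonomic groups the organism belongs to
    length: int
    annotations: list = field(default_factory=list)

@dataclass
class Rule:
    rule_id: str
    required_signatures: set
    required_taxon: str
    min_length: int
    max_length: int
    annotations: list

    def applies_to(self, entry):
        """All conditions must hold, and only unreviewed entries qualify."""
        return (
            not entry.reviewed
            and self.required_signatures <= entry.signatures
            and self.required_taxon in entry.lineage
            and self.min_length <= entry.length <= self.max_length
        )

def apply_rule(rule, entry):
    """Propagate annotations, each tagged with the rule that supplied it."""
    if not rule.applies_to(entry):
        return False
    for annotation in rule.annotations:
        entry.annotations.append((annotation, rule.rule_id))
    return True

# Hypothetical rule and entries.
rule = Rule("RULE_EXAMPLE", {"SIG_A"}, "Bacteria", 200, 500,
            ["protein name: example kinase", "keyword: Kinase"])
unreviewed = Entry("EXAMPLE_1", False, {"SIG_A", "SIG_B"}, {"Bacteria"}, 320)
reviewed = Entry("EXAMPLE_2", True, {"SIG_A"}, {"Bacteria"}, 320)

applied = apply_rule(rule, unreviewed)
skipped = apply_rule(rule, reviewed)
print(applied, skipped, unreviewed.annotations)
```

Storing the rule identifier alongside each propagated annotation is what lets a user trace a predicted annotation back to its source.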
  • 36. How to access automatic annotation data?
  • 37. How to access automatic annotation data?
  • 38. Example of a UniRule
  • 39. UR000172789 applied evidence tags clearly state where annotation comes from
  • 40. Example of a UniRule highlight a rule’s logic
  • 41. Example of a UniRule highlight a rule’s logic
  • 42. Attributing evidence It needs to be made clear to the user when information is 1. experimentally based 2. predicted 3. transferred from a related species Use of evidence codes gives this information: Evidence Code Ontology http://www.ebi.ac.uk/ols/ontologies/eco
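One way to keep the three kinds of evidence visible is to make the evidence category a required part of every annotation. The sketch below uses the three categories from the slide as plain labels; a real resource would more likely store Evidence Code Ontology term identifiers, which are deliberately not filled in here. All names and example values are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Evidence(Enum):
    EXPERIMENTAL = "experimentally based"
    PREDICTED = "predicted"
    TRANSFERRED = "transferred from a related species"

@dataclass(frozen=True)
class Annotation:
    statement: str
    evidence: Evidence     # required, so no annotation is left unattributed
    source: str            # e.g. a publication or a rule identifier

def describe(annotation):
    """Render an annotation with its evidence shown to the user."""
    return (f"{annotation.statement} "
            f"[{annotation.evidence.value}; source: {annotation.source}]")

example = Annotation("protein kinase activity", Evidence.PREDICTED, "RULE_EXAMPLE")
print(describe(example))
```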