PomBase Community Curation:
A Fast Track to Capture Expert Knowledge

Antonia Lock
The S. pombe Community
•  Medium-sized research community
  •  >200 labs, 1300 subscribe to mailing list
  •  Close-knit

•  GeneDB S. pombe model organism database set up in 2004
  •  Maintained by one person (V. Wood)
  •  Mainly GO annotation
  •  Problem:
    •  Needed to support additional types of data
    •  Too many publications to curate considering the available manpower
The Community Curation Initiative
•  Pilot study in 2009
  •  Highly successful
    •  29/44 responded (no follow-up for non-responders)
    •  ~360 new annotations
    •  Annotations were generally of high quality; errors were easy to spot
    •  Enabled a dialogue between authors and curators
  •  Process must be simplified
    •  Need for a simple tool in which to do the curation, instead of a complicated Word document

•  2010: Wellcome Trust grant
  •  To develop and implement a community curation tool
  •  Also to develop a new fission yeast database, ‘PomBase’, which will support a range of additional data types not previously captured in GeneDB
Data captured in GeneDB vs. PomBase
Data type                                               Ontology                                 GeneDB    PomBase
Function/Process/Component                              GO                                       ✔         ✔
Protein modifications                                   Protein Modification Ontology            -         ✔
Phenotypes                                              FYPO (Fission Yeast Phenotype Ontology)  Some      ✔
Interactions                                            BioGRID                                  BioGRID   ✔
Gene expression                                         In-house vocabulary                      -         ✔
Misc features (disease associations, complementation…)  In-house vocabulary                      ✔         ✔

The increased breadth makes community curation even more important
Phenotype Ontology
•  User survey 2007: phenotypes were identified as the single most desirable information type not supported by GeneDB S. pombe

•  Need for a pre-composed Fission Yeast Phenotype Ontology
  •  Ease for community curation
  •  Needed greater specificity of terms than that offered by existing phenotype ontologies

•  Term is accompanied by two types of information:
  •  Allele description: deletion, overexpression or mutation
  •  Experimental conditions where appropriate

•  Combination of different ontologies used to create formal definitions
  •  E.g. PATO, ChEBI, GO
      PATO                  FYPO                                ChEBI
      resistance to         resistance to thiabendazole         thiabendazole
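
The thiabendazole example above can be read as a PATO quality combined with a ChEBI chemical to give one pre-composed FYPO term. The Python sketch below only illustrates that idea: the class and function names are hypothetical, ontology identifiers are left out because the slide does not give them, and this is not a description of how FYPO itself is built or stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyTerm:
    """A term label plus the ontology it comes from (IDs omitted on purpose)."""
    ontology: str
    label: str


@dataclass(frozen=True)
class PrecomposedPhenotype:
    """A phenotype term whose definition combines a quality and an entity."""
    quality: OntologyTerm  # e.g. a PATO quality
    entity: OntologyTerm   # e.g. a ChEBI chemical

    @property
    def label(self) -> str:
        # In this toy model the pre-composed label is quality + entity.
        return f"{self.quality.label} {self.entity.label}"


def annotate(phenotype, allele_description, conditions=()):
    """Bundle a phenotype term with the two extra pieces of information
    listed on the slide: an allele description and optional conditions."""
    return {
        "term": phenotype.label,
        "allele": allele_description,
        "conditions": list(conditions),
    }


tbz_resistance = PrecomposedPhenotype(
    quality=OntologyTerm("PATO", "resistance to"),
    entity=OntologyTerm("ChEBI", "thiabendazole"),
)
record = annotate(tbz_resistance, "deletion")
print(record["term"])
```

Because the term is pre-composed, a community curator would only ever pick the finished label; the quality and entity parts matter to the ontology editors, not to the person annotating.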
GO Term Extensions

GO ID        Term                                        Evidence   With/From   Source
GO:0004674   Protein serine/threonine kinase activity
               has_substrate pom1                        IDA                    Yoon HJ et al. (2006)
               has_substrate rum1                        IDA                    Noguchi E et al. (2002)
               has_substrate rbp80                       IDA                    Holig K et al. (2009)
               has_substrate sin1                        IDA                    Jang YJ et al. (1997)
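
One way to picture the table above is as four separate annotations to the same GO term (GO:0004674, protein serine/threonine kinase activity), each carrying its own has_substrate extension and its own source. The sketch below is a hypothetical in-memory model in Python; the class names and the relation(target) text form are assumptions made for illustration, and the gene names are copied from the slide rather than resolved to database identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Extension:
    relation: str  # relation name, e.g. "has_substrate"
    target: str    # gene name exactly as printed on the slide

    def render(self) -> str:
        # relation(target) is an illustrative text form assumed here.
        return f"{self.relation}({self.target})"


@dataclass(frozen=True)
class GoAnnotation:
    go_id: str
    term: str
    evidence: str
    source: str
    extension: Optional[Extension] = None


# (substrate, source) pairs taken from the slide's table.
SLIDE_ROWS = [
    ("pom1", "Yoon HJ et al. (2006)"),
    ("rum1", "Noguchi E et al. (2002)"),
    ("rbp80", "Holig K et al. (2009)"),
    ("sin1", "Jang YJ et al. (1997)"),
]

annotations = [
    GoAnnotation(
        go_id="GO:0004674",
        term="protein serine/threonine kinase activity",
        evidence="IDA",
        source=source,
        extension=Extension("has_substrate", gene),
    )
    for gene, source in SLIDE_ROWS
]


def targets(annotations, relation):
    """Collect the extension targets for one relation, sorted by name."""
    return sorted(
        a.extension.target
        for a in annotations
        if a.extension is not None and a.extension.relation == relation
    )


print(targets(annotations, "has_substrate"))
```

Keeping the extension as structured data, rather than free text appended to the term, is what allows a question such as "which substrates are recorded for this kinase activity?" to be answered by a simple filter.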
  
Why Not a Wiki?
•  Traditionally, biologists would study one gene/protein
  •  Individual text-based gene pages were an ideal format

•  Many techniques used today generate gene lists
  •  Enrichment analysis identifies patterns in the data set, e.g. are certain processes common to the group of genes?
  •  Annotations to controlled vocabularies are needed to make efficient, computerized comparisons
    •  A wiki, essentially free text, does not provide this

•  All annotations are supported by evidence
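
The point about gene lists can be made concrete with a small enrichment calculation. The sketch below uses a one-sided hypergeometric test, a common choice for this kind of analysis but not one the slides name; the gene names, term labels and function names are all invented for illustration. It only works because each gene is annotated to terms from a fixed vocabulary, which is exactly what free text would not provide.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the term."""
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


def enrichment(gene_list, annotations, background):
    """Return {term: p-value} for every term annotated to the gene list."""
    genes = set(gene_list) & set(background)
    terms = {t for g in genes for t in annotations.get(g, ())}
    results = {}
    for term in terms:
        annotated = sum(1 for g in background if term in annotations.get(g, ()))
        hits = sum(1 for g in genes if term in annotations.get(g, ()))
        results[term] = hypergeom_pvalue(len(background), annotated, len(genes), hits)
    return results


# Toy data: ten made-up genes and two made-up process terms.
background = [f"g{i}" for i in range(1, 11)]
toy_annotations = {
    "g1": {"process A", "process B"},
    "g2": {"process A"},
    "g3": {"process A"},
    "g4": {"process B"},
    "g5": {"process B"},
    "g6": {"process B"},
    "g7": {"process B"},
    "g8": {"process B"},
}
results = enrichment(["g1", "g2", "g3"], toy_annotations, background)
for term in sorted(results):
    print(term, round(results[term], 4))
```

In this toy data all three listed genes carry "process A", which only three of the ten background genes do, so that term gets a small p-value, while "process B" is common in the background and does not stand out.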
What Will the Community Curate?
•  Data that can be captured by the formal vocabularies used in PomBase
  •  GO (including extensions)
  •  Protein modifications (including residue information)
  •  Phenotypes (including alleles and conditions)
  •  Interactions

•  Mostly pre-composed terms
  •  Extensions will be captured by prompting where relevant
    •  I.e. the community will not be expected to know when to use them
The Community Annotation Tool - CANTO
•  Final stages of development
  •  Developed by Kim Rutherford
  •  Already in use by the PomBase curators
  •  We are involving the community at this stage through review of curated (recent) publications

•  Provides a web-based interface
  •  Can be used as a stand-alone application (provides annotations in GAFs)
  •  Pipelines are in place for direct loading into Chado
    •  Chado (GMOD project) is a database schema for handling biological data
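
Since the stand-alone mode provides annotations in GAFs, downstream users could read the output with a few lines of code. The sketch below assumes the GAF 2.0 layout of 17 tab-separated columns with "!" comment lines; the slides do not say which GAF version CANTO emits. The column keys, the function name and the sample row (identifiers, reference and date) are placeholders invented for illustration, not real PomBase data.

```python
# Column order assumed from the GAF 2.0 layout; the key names are this sketch's own.
GAF2_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def parse_gaf(text):
    """Parse GAF text into a list of dicts keyed by column name."""
    records = []
    for line in text.splitlines():
        # Skip blank lines and "!" header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.split("\t")
        # Pad short rows so trailing optional columns become empty strings.
        fields += [""] * (len(GAF2_COLUMNS) - len(fields))
        records.append(dict(zip(GAF2_COLUMNS, fields)))
    return records


# A made-up row: every identifier below is a placeholder.
example = "\n".join([
    "!gaf-version: 2.0",
    "\t".join([
        "PomBase", "EXAMPLE.01", "abc1", "", "GO:0004674",
        "PMID:0000000", "IDA", "", "F", "", "", "protein",
        "taxon:4896", "20120101", "PomBase",
        "has_substrate(EXAMPLE.02)", "",
    ]),
])

records = parse_gaf(example)
print(records[0]["go_id"], records[0]["annotation_extension"])
```

A plain tab-separated export like this is what makes the tool usable outside PomBase: any group can load the rows into its own database without depending on Chado.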
5 Easy Steps to Broad Curation of Data: A Walk-through
Step 1: Add your genes
The main page: choose a gene to get started…
Step 2: Choose the type of annotation
Step 3: Find the correct term
Child terms are suggested…
Step 4: Add the evidence
Step 5: Review, extend and transfer
Quality Control and Consistency Checking
•  Professional curators are needed not just for curation support, but also for quality control and consistency checking.
Help?!
•  There is always a visible help button
Benefits of Community Curation

•  Researchers can curate ‘from home’ immediately following publication
  •  First-pass annotations are obtained quickly, so data appear in the database sooner
  •  Expert knowledge, coupled with quality control by curators, makes for powerful, accurate annotations
  •  Controlled annotations can be loaded from the tool directly into our database

•  The bottleneck is how quickly professional curators can check annotations, not how fast we can obtain them

•  Frees up time for us to clear the backlog of papers
Benefits to the Researcher
•  Greater visibility of publication
  •  Annotations propagated to GO, BioGRID, Ensembl, NCBI, UniProt…
  •  Increased citation index?

•  A greater understanding of ontologies
  •  Will be able to use them better to support their research
Future Directions
•  ~3 months until official launch of CANTO
  •  Multi-gene phenotypes
  •  Extensions (restricted usage for specific terms and relations)
  •  More help features and descriptive boxes

•  Longer term
  •  Making the tool easily configurable for other organisms
  •  Making the tool available to other communities
Acknowledgements
•  The PomBase team:
  •  Val Wood
  •  Midori Harris
  •  Kim Rutherford
  •  Mark McDowall
  •  Antonia Lock

•  PIs:
  •  Jurg Bahler (UCL)
  •  Steve Oliver (Cambridge)
  •  Paul Kersey (EBI Hinxton)

•  Funded by the Wellcome Trust
