The document discusses BioDBCore, a collaborative project aimed at gathering and standardizing metadata about biological databases. It provides an overview of BioDBCore's goals of improving data integration, encouraging standards, and maximizing resources. BioDBCore is led by Pascale Gaudet and Philippe Rocca-Serra and implemented on the BioSharing website. The document outlines the BioDBCore descriptors for databases and provides an example entry for the dictyBase database. It discusses maintaining and expanding BioDBCore records with the help of database providers and journals.
BioDBCore: Current Status and Next Developments
1. Pascale Gaudet
Chair, International Society for Biocuration
Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics
BioDBCore: Current status
and future developments
2. International Society for Biocuration:
Mission statement
• Define and promote the work of biocurators
• Foster connections with user communities to
ensure that databases and accompanying
tools meet specific user needs
• Promote communication and exchanges
between curators: meetings, workshops
• Encourage best practices by providing
documentation on standards and annotation
procedures
3. The need
• Databases: improve data integration from
published papers
• Journals: link to database objects
• Researchers: identify resources
• Grant submitters: enforce data sharing plans
4. Goals
1) Gather information required to provide a
general overview of the database
landscape and compare the various
resources
2) Encourage consistency and interoperability
3) Promote the use of standards
4) Provide guidance for users
5) Maximize the collective impact of the
resources
5. BioDBCore group organization
• Led by Pascale Gaudet (ISB/SIB) and
Philippe Rocca-Serra (BioSharing)
• Guidelines proposed in 2011 paper
• Implemented in 2012 NAR database issue
6. Use cases
• Show all resources of type database which use
MIMARKS guidelines
• Show all resources where John Smith is involved
• Show all resources for mouse phenotypes
• Where can I submit my data?
and also:
• Guidance for grants’ data sharing policies
• Improving integration of data from papers into
databases
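The first three use cases amount to filtering records by type, standard, person, or keyword. A minimal sketch in Python, assuming records are held as plain dictionaries: the field names, the `find_resources` helper, and the two "Example..." records are hypothetical and are not part of the BioDBCore specification. Only the dictyBase values come from this deck.

```python
# Hypothetical in-memory records. Field names are illustrative only.
RECORDS = [
    {"name": "dictyBase", "type": "database",
     "standards": ["Gene Ontology", "GFF3"],
     "people": [], "keywords": ["dictyostelium", "genome"]},
    {"name": "ExampleMouseDB", "type": "database",   # made-up record
     "standards": [], "people": ["John Smith"],
     "keywords": ["mouse", "phenotype"]},
    {"name": "ExampleTool", "type": "tool",          # made-up record
     "standards": ["Gene Ontology"], "people": [], "keywords": []},
]


def find_resources(records, resource_type=None, standard=None,
                   person=None, keywords=()):
    """Return names of records matching every filter that was supplied."""
    wanted = [k.lower() for k in keywords]
    names = []
    for rec in records:
        if resource_type and rec.get("type") != resource_type:
            continue
        if standard and standard not in rec.get("standards", []):
            continue
        if person and person not in rec.get("people", []):
            continue
        have = [k.lower() for k in rec.get("keywords", [])]
        if not all(k in have for k in wanted):
            continue
        names.append(rec["name"])
    return names


# "Show all resources of type database which use <standard>"
by_standard = find_resources(RECORDS, resource_type="database",
                             standard="Gene Ontology")
# "Show all resources where John Smith is involved"
by_person = find_resources(RECORDS, person="John Smith")
# "Show all resources for mouse phenotypes"
by_keyword = find_resources(RECORDS, keywords=["mouse", "phenotype"])
```

A production registry would more likely answer these questions with queries over the RDF data; the sketch only shows what each use case asks of the metadata.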
7. Collaborative philosophy
• Many groups/resources have been providing
registries and lists of databases
• Often not funded, not maintained
• BioDBCore seeks to collaborate with all interested
parties to work together to provide a more
permanent solution to database descriptions
8. BioDBCore: Participating groups
• BioDB100
• BioSharing
• BioCatalogue
• Bioinformatics Links Directory
• Biositemaps
• CASIMIR
• MIBBI
• MIRIAM
• Model Organism Databases
• NIF registry
• … and your group!
9. BioDBCore descriptors
1. Database name
2. Main resource URL
3. Contact information (e-mail; postal mail)
4. Date resource established (year)
5. Conditions of use (Free, or type of license)
6. Scope: data types captured, curation policy,
standards used
7. Standards: MIs, Data formats, Terminologies
8. Taxonomic coverage
9. Data accessibility/output options
10. Data release frequency
11. Versioning policy and access to historical files
12. Documentation available
13. User support options
14. Data submission policy
15. Relevant publications
16. Resource’s Wikipedia URL
17. Tools available
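The 17 descriptors map naturally onto a flat record type. The sketch below is one possible representation, not an official schema: the class name, attribute names, and the choice of which fields are mandatory are assumptions made for illustration. The sample values are taken from the dictyBase entry in this deck.

```python
from dataclasses import dataclass, field, fields, asdict
from typing import List, Optional


@dataclass
class BioDBCoreRecord:
    # Descriptors 1-5: treated as mandatory here (an assumption).
    name: str
    url: str
    contact: str
    year_established: int
    conditions_of_use: str
    # Descriptors 6-17: optional in this sketch.
    scope: List[str] = field(default_factory=list)
    standards: List[str] = field(default_factory=list)
    taxonomic_coverage: List[str] = field(default_factory=list)
    data_accessibility: List[str] = field(default_factory=list)
    release_frequency: Optional[str] = None
    versioning_policy: Optional[str] = None
    documentation: Optional[str] = None
    user_support: List[str] = field(default_factory=list)
    submission_policy: Optional[str] = None
    publications: List[str] = field(default_factory=list)
    wikipedia_url: Optional[str] = None
    tools: List[str] = field(default_factory=list)


dictybase = BioDBCoreRecord(
    name="dictyBase",
    url="http://dictybase.org",
    contact="dictybase@northwestern.edu",
    year_established=2003,
    conditions_of_use="Free",
    publications=["PMID: 18974179", "PMID: 14681427"],
)

descriptor_count = len(fields(BioDBCoreRecord))  # one attribute per descriptor
as_dict = asdict(dictybase)                      # plain dict, ready for export
```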
10. Database name: dictyBase
Main resource URL: http://dictybase.org
Contact information: dictybase@northwestern.edu
Date resource established (year): 2003
Conditions of use: Free
Scope, data types captured: Genome sequence; gene models including CDS and predicted proteins; phenotypes; Gene Ontology annotations; functional annotation (gene product names); gene nomenclature; strains; plasmids; free-text descriptions; domains (via InterPro); orthologs (via OrthoMCL and inParanoid); protein subcellular location (via Swiss-Prot); protein existence (via Swiss-Prot); citations; researchers database
11. Curation policy: manual curation
Standards (MIs, data formats, terminologies): Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature
Data formats: FASTA, OBO, GAF, GFF3 (standard)
Taxonomic coverage (use NCBI Taxid): D. discoideum (44689) including all strains [PRIMARY]; also some genome/EST/gene model info for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)
Data accessibility/output options: HTML, text, database reports
Data release frequency: curators work on the 'live' database; weekly data dumps (sequences) or monthly (other data)
Versioning policy / access to historical files: no versioning, but access to historical files is possible
12. Documentation available: http://dictybase.org/FAQ/HelpFilesIndex.html
User support options: documents, email, webform
Data submission policy: Data from published literature. Some HTP data corresponding to published analyses are incorporated
Relevant publications: PMID: 18974179, PMID: 14681427
Resource’s Wikipedia URL: http://en.wikipedia.org/wiki/DictyBase
Tools available: BLAST, BioMart, Generic Genome Browser, TextPresso, MetaCyc (dictyCyc)
15. Implementation plan
• Goal: BioDBCore data public and linked
• Community-aware approach: reuse existing
resources
• Current data model: RDF based on
categories from Biositemaps, MIRIAM, NIF,
Dublin Core, Darwin Core
• Defined extension mechanisms
19. Creating, editing, maintaining entries
• So far, records have been created manually from data
provided by NAR at publication of the Database issue
and by the Life Sciences Registry (Michel Dumontier and
Nick Juty)
- These mostly come as xls files whose contents must be
entered manually
- Close to 200 records have been entered
out of over 2,000 obtained
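Part of the manual entry step could be automated once a spreadsheet is exported to CSV. The following is a rough sketch under stated assumptions: the column headers in `COLUMN_MAP` and the sample row layout are hypothetical, since the actual layout of the NAR and Life Sciences Registry files is not described here.

```python
import csv
import io

# Hypothetical spreadsheet headers mapped to record keys.
COLUMN_MAP = {
    "Database name": "name",
    "URL": "url",
    "Contact": "contact",
    "Year": "year_established",
}


def rows_to_records(csv_text):
    """Convert CSV text (one database per row) into a list of dicts."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = {key: (row.get(col) or "").strip()
               for col, key in COLUMN_MAP.items()}
        # Keep the year only if it is a clean integer.
        year = rec["year_established"]
        rec["year_established"] = int(year) if year.isdigit() else None
        records.append(rec)
    return records


sample = (
    "Database name,URL,Contact,Year\n"
    "dictyBase,http://dictybase.org,dictybase@northwestern.edu,2003\n"
)
imported = rows_to_records(sample)
```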
20. Beyond maintenance at BioSharing
Ideally, database providers would keep their BioDBCore
record up to date
• Claim ownership
- A database provider can now (in theory) maintain their
own BioDBCore record
Encouraging best practices
• DATABASE and Nucleic Acids Research journals:
Editors in chief request BioDBCore information from
submitters
• ISB seal of approval
• BioDB100 – launched at InCoB 2011 – examples of 100
well-annotated databases
21. What’s next?
• Continue to extend participating groups and journals
• Refine scope
• Integrate semantic support
• Develop querying system
• Implement validation tests
• Set up mechanisms for exchange of data among
collaborating groups (in BioDBCore RDF format, or
other)
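As an illustration of what validation tests might check, here is a minimal sketch. The set of required fields, the key names, and the plausibility bounds on the year are all assumptions, not rules defined by BioDBCore.

```python
import re
from datetime import date

# Assumed to be mandatory for this sketch.
REQUIRED = ("name", "url", "contact", "year_established", "conditions_of_use")


def validate_record(record):
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for key in REQUIRED:
        if not record.get(key):
            errors.append(f"missing required field: {key}")

    url = record.get("url")
    if url and not re.match(r"^https?://\S+$", url):
        errors.append(f"url is not an http(s) URL: {url}")

    year = record.get("year_established")
    if year:
        # The lower bound is an arbitrary plausibility check.
        if not (isinstance(year, int) and 1950 <= year <= date.today().year):
            errors.append(f"implausible year: {year}")
    return errors


good = {"name": "dictyBase", "url": "http://dictybase.org",
        "contact": "dictybase@northwestern.edu",
        "year_established": 2003, "conditions_of_use": "Free"}
bad = {"name": "X", "url": "dictybase.org", "contact": "",
       "year_established": 1800, "conditions_of_use": "Free"}

good_errors = validate_record(good)
bad_errors = validate_record(bad)
```

Checks of this kind could run whenever a database provider edits a record, so that incomplete entries are flagged before publication.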
22. Identifying or developing
semantic support
• Policies and guidelines: BioSharing
• Publications and taxon info: identifiers.org
• Authors: ORCID (will also implement
organizations)
• Keywords/database scope: NIF when possible
Identifying resources is preferable to developing them!
23. For BioHackathon 2013
• Evaluate need for BioDBCore in today’s landscape
of metadatabase resources
• Evaluate further collaboration opportunities
• Set up a better system for creating and maintaining
BioDBCore records
• Identify/develop ontologies pertinent to BioDBCore