Pascale Gaudet
Chair, International Society for Biocuration
Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics
BioDBCore: Current status
and future developments
International Society for Biocuration:
Mission statement
•  Define and promote the work of biocurators
•  Foster connections with user communities to
ensure that databases and accompanying
tools meet specific user needs
•  Promote communication and exchanges
between curators: meetings, workshops
•  Encourage best practices by providing
documentation on standards and annotation
procedures
ISB
The need
• Databases: improve data integration from
published papers
• Journals: link to database objects
• Researchers: identify resources
• Grant submitters: enforce data sharing plans
Goals
1)  Gather information required to provide a
general overview of the database
landscape and compare the various
resources
2)  Encourage consistency and interoperability
3)  Promote the use of standards
4)  Provide guidance for users
5)  Maximize the collective impact of the
resources
BioDBcore group organization
•  Led by Pascale Gaudet (ISB/SIB) and
Philippe Rocca-Serra (BioSharing)
•  Guidelines proposed in 2011 paper
•  Implemented in 2012 NAR database issue
Use cases
•  Show all resources of type database which use
MIMARKS guidelines
•  Show all resources where John Smith is involved
•  Show all resources for mouse phenotypes
•  Where can I submit my data?
and also: 
•  Guidance for grants’ data sharing policies
•  Improving integration of data from papers into
databases
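The "show all resources" use cases amount to simple filters over descriptor records. A minimal sketch, assuming records are held as plain Python dictionaries; the field names (`resource_type`, `standards`, `contacts`, `keywords`) and both sample records are hypothetical and are not part of the BioDBCore specification:

```python
# Hypothetical records; field names and values are illustrative only.
records = [
    {
        "name": "ExampleDB-A",
        "resource_type": "database",
        "standards": ["MIMARKS"],
        "contacts": ["John Smith"],
        "keywords": ["mouse", "phenotype"],
    },
    {
        "name": "ExampleDB-B",
        "resource_type": "database",
        "standards": ["Gene Ontology"],
        "contacts": ["Jane Doe"],
        "keywords": ["yeast", "genome"],
    },
]


def find(records, **criteria):
    """Return names of records whose fields match every criterion.

    A criterion matches if the field equals the value or, for list
    fields, if the value is one of the list items.
    """
    hits = []
    for rec in records:
        ok = True
        for field, wanted in criteria.items():
            value = rec.get(field)
            if isinstance(value, list):
                ok = wanted in value
            else:
                ok = value == wanted
            if not ok:
                break
        if ok:
            hits.append(rec["name"])
    return hits


print(find(records, resource_type="database", standards="MIMARKS"))
print(find(records, contacts="John Smith"))
print(find(records, keywords="mouse"))
```

A real registry would more likely answer such questions through a query service; the sketch only shows that each use case maps onto one or two descriptor fields.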
Collaborative philosophy
•  Many groups/resources have been providing
registries and lists of databases
•  Often not funded, not maintained
•  BioDBCore seeks to collaborate with all interested
parties to work together to provide a more
permanent solution to database descriptions
BioDBcore: Participating groups
•  BioDB100
•  BioSharing
•  BioCatalogue
•  Bioinformatics Links Directory
•  Biositemaps
•  CASIMIR
•  MIBBI
•  MIRIAM
•  Model Organism Databases
•  NIF registry
•  … and your group!
BioDBCore descriptors

1.  Database name
2.  Main resource URL
3.  Contact information (e-mail; postal mail)
4.  Date resource established (year)
5.  Conditions of use (Free, or type of license)
6.  Scope: data types captured, curation policy,
standards used
7.  Standards: MIs, Data formats, Terminologies
8.  Taxonomic coverage
9.  Data accessibility/output options
10.  Data release frequency
11.  Versioning policy and access to historical files
12.  Documentation available
13.  User support options
14.  Data submission policy
15.  Relevant publications
16.  Resource’s Wikipedia URL
17.  Tools available
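One way to picture a record built from these descriptors is as a small typed structure. The sketch below is an assumption-laden illustration: the class name, the attribute names, and the choice to model only a subset of the descriptors are all hypothetical, and the values are taken from the dictyBase example in this deck.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class BioDBCoreRecord:
    # Attribute names are hypothetical; they mirror a subset of the descriptors.
    name: str
    url: str
    contact: str
    year_established: int
    conditions_of_use: str
    data_types: List[str] = field(default_factory=list)
    standards: List[str] = field(default_factory=list)
    taxa: List[int] = field(default_factory=list)          # NCBI taxonomy IDs
    publications: List[str] = field(default_factory=list)  # PubMed IDs
    wikipedia_url: Optional[str] = None
    tools: List[str] = field(default_factory=list)


dictybase = BioDBCoreRecord(
    name="dictyBase",
    url="http://dictybase.org",
    contact="dictybase@northwestern.edu",
    year_established=2003,
    conditions_of_use="Free",
    standards=["Gene Ontology", "Dicty Anatomy Ontology"],
    taxa=[44689],
    publications=["18974179", "14681427"],
    wikipedia_url="http://en.wikipedia.org/wiki/DictyBase",
    tools=["BLAST", "BioMart"],
)

# asdict() gives a plain dictionary that is easy to export or compare.
print(asdict(dictybase)["name"])
```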
Database name: dictyBase
Main resource URL: http://dictybase.org
Contact information: dictybase@northwestern.edu
Date resource established (year): 2003
Conditions of use: Free
Scope, data types captured: Genome sequence; gene models including CDS and predicted proteins; phenotypes; Gene Ontology annotations; functional annotation (gene product names); gene nomenclature; strains; plasmids; free-text descriptions; domains (via InterPro); orthologs (via OrthoMCL and InParanoid); protein subcellular location (via Swiss-Prot); protein existence (via Swiss-Prot); citations; researchers database
Curation policy: manual curation
Standards (MIs, data formats, terminologies): Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature
Data formats: FASTA, OBO, GAF, GFF3 (standard)
Taxonomic coverage (use NCBI Taxid): D. discoideum (44689), including all strains [PRIMARY]; also some genome/EST/gene model info for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)
Data accessibility/output options: HTML, text, database reports
Data release frequency: curators work on the 'live' database; weekly data dumps (sequences) or monthly (other data)
Versioning policy/access to historical files: no versioning, but access to historical files is possible
Documentation available: http://dictybase.org/FAQ/HelpFilesIndex.html
User support options: documents, email, web form
Data submission policy: Data from published literature. Some HTP data corresponding to published analyses are incorporated
Relevant publications: PMID: 18974179, PMID: 14681427
Resource’s Wikipedia URL: http://en.wikipedia.org/wiki/DictyBase
Tools available: BLAST, BioMart, Generic Genome Browser, TextPresso, MetaCyc (dictyCyc)
Implementation of BioDBCore at
BioSharing (Many thanks to Philippe RS !)
BioDBcore announcement
Published in Nucleic Acids Research database issue 2011
and in the DATABASE journal
Implementation plan
•  Goal: BioDBCore data public and linked
•  Community-aware approach: reuse existing
resources
•  Current data model: RDF based on
categories from Biositemaps, MIRIAM, NIF,
Dublin Core, Darwin Core
•  Defined extension mechanisms
www.biodbcore.org
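To make "RDF based on existing categories" concrete, here is a small sketch that writes three descriptors as N-Triples. The slides do not show the actual BioDBCore RDF model, so the choice of predicates (Dublin Core Terms `title` and `created`, FOAF `homepage`) and the subject URI are assumptions made purely for illustration:

```python
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"


def literal(text):
    """Quote a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(subject, name, homepage, year):
    """Serialize three descriptors as N-Triples lines."""
    s = f"<{subject}>"
    return [
        f"{s} <{DCTERMS}title> {literal(name)} .",
        f"{s} <{FOAF}homepage> <{homepage}> .",
        f"{s} <{DCTERMS}created> {literal(str(year))} .",
    ]


# The subject URI is a placeholder, not a real BioDBCore identifier.
lines = to_ntriples(
    "http://example.org/biodbcore/dictybase",
    "dictyBase",
    "http://dictybase.org",
    2003,
)
print("\n".join(lines))
```

Reusing established vocabularies in this way is what lets other registries read the triples without a BioDBCore-specific parser.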
Example BioDBCore entry (1/2)
Example BioDBCore entry (2/2)
Creating, editing, maintaining entries
•  Until now: records are manually created from data
provided by NAR at publication of Database issue
and the Life Sciences Registry (Michel Dumontier and
Nick Juty)
- Those mostly come as xls files that need to be
manually entered
- Close to 200 records have been entered
out of over 2,000 obtained
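Much of the manual entry work could in principle be scripted once a spreadsheet is exported to CSV. The following is a rough sketch under that assumption; the column headers, the target field names, and the inline sample row are hypothetical stand-ins and do not describe the real files:

```python
import csv
import io

# Inline stand-in for a spreadsheet exported as CSV; headers are hypothetical.
sample = io.StringIO(
    "Database name,Main resource URL,Year established\n"
    "dictyBase,http://dictybase.org,2003\n"
)

# Map spreadsheet headers to record field names (both illustrative).
COLUMN_MAP = {
    "Database name": "name",
    "Main resource URL": "url",
    "Year established": "year_established",
}


def rows_to_records(handle):
    """Convert CSV rows to dictionaries keyed by record field names."""
    records = []
    for row in csv.DictReader(handle):
        rec = {
            COLUMN_MAP[col]: value.strip()
            for col, value in row.items()
            if col in COLUMN_MAP
        }
        rec["year_established"] = int(rec["year_established"])
        records.append(rec)
    return records


imported = rows_to_records(sample)
print(imported)
```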
Beyond maintenance at BioSharing
Ideally, database providers would keep their BioDBCore
record up to date
•  Claim ownership
- A database provider can now (in theory) maintain their
own BioDBCore record
Encouraging best practices
•  DATABASE and Nucleic Acids Research journals:
Editors in chief request BioDBCore information from
submitters
•  ISB seal of approval
•  BioDB100 - launched at InCoB 2011 – examples of 100 well-annotated
databases
What’s next?
•  Continue to extend participating groups and journals
•  Refine scope
•  Integrate semantic support
•  Develop querying system
•  Implement validation tests
•  Set up mechanisms for exchange of data among
collaborating groups (in BioDBCore RDF format, or
other)
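As an illustration of what validation tests might check, the sketch below flags missing descriptors, malformed URLs, and implausible years. Which fields are mandatory, the field names themselves, and the accepted year range are assumptions made for this example, not rules stated by BioDBCore:

```python
from urllib.parse import urlparse

# Which descriptors are mandatory is an assumption made for this sketch.
REQUIRED = ("name", "url", "contact", "year_established")


def validate(record):
    """Return a list of human-readable problems; empty means the record passed."""
    problems = [f"missing field: {key}" for key in REQUIRED if not record.get(key)]
    url = record.get("url")
    if url:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not an http(s) URL: {url}")
    year = record.get("year_established")
    if year and not (1950 <= int(year) <= 2100):
        problems.append(f"implausible year: {year}")
    return problems


good = {
    "name": "dictyBase",
    "url": "http://dictybase.org",
    "contact": "dictybase@northwestern.edu",
    "year_established": 2003,
}
# Deliberately broken record: no contact, URL without scheme, odd year.
bad = {"name": "ExampleDB", "url": "dictybase.org", "year_established": 1800}

print(validate(good))
print(validate(bad))
```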
Identifying or developing
semantic support
•  Policies and guidelines: BioSharing
•  Publications and taxon info: identifiers.org
•  Authors: ORCID (will also implement
organizations)
•  Keywords/database scope: NIF when possible
Identifying resources is preferable to developing them !
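For publications and taxon information, relying on identifiers.org means a record can store bare accessions and derive resolvable URIs on demand. A minimal sketch, assuming the `prefix:accession` compact-identifier form and the `pubmed` and `taxonomy` prefixes; the helper name is hypothetical, and the accessions come from the dictyBase example:

```python
def identifiers_org_uri(prefix, accession):
    """Build an identifiers.org compact-identifier URI (prefix:accession)."""
    return f"https://identifiers.org/{prefix}:{accession}"


publications = ["18974179", "14681427"]  # PMIDs from the dictyBase record
taxa = [44689]                           # NCBI Taxid for D. discoideum

pub_uris = [identifiers_org_uri("pubmed", pmid) for pmid in publications]
taxon_uris = [identifiers_org_uri("taxonomy", taxid) for taxid in taxa]

print(pub_uris)
print(taxon_uris)
```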
For BioHackathon 2013
•  Evaluate need for BioDBCore in today’s landscape
of metadatabase resources
•  Evaluate further collaboration opportunities
•  Set up a better system for creating and maintaining
BioDBCore records
•  Identify/develop ontologies pertinent to BioDBCore
Acknowledgements
Philippe Rocca-Serra
Susanna-Assunta Sansone
Eamonn Maguire
Alejandra Gonzalez Beltran
International Society for Biocuration
Michael Galperin
David Landsman
Francis Ouellette
Oxford University Press
collaborators

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

BioDBCore: Current Status and Next Developments

  • 1. Pascale Gaudet Chair, International Society for Biocuration Scientific Manager, neXtProt, SIB Swiss Institute of Bioinformatics BioDBCore: Current status and future developments
  • 2. International Society for Biocuration: Mission statement •  Define and promote the work of biocurators •  Foster connections with user communities to ensure that databases and accompanying tools meet specific user needs •  Promote communication and exchanges between curators: meetings, workshops, •  Encourage best practices by providing documentation on standards and annotation procedures ISB
  • 3. The need • Databases: improve data integration from published papers • Journals: link to databases objects • Researchers: identify resources • Grant submitters: enforce data sharing plans
  • 4. Goals 1)  Gather information required to provide a general overview of the database landscape and compare the various resources 2)  Encourage consistency and interoperability 3)  Promote the use of standards 4)  Provide guidance for users 5)  Maximize the collective impact of the resources
  • 5. BioDBcore group organization •  Led by Pascale Gaudet (ISB/SIB) and Philippe Rocca-Serra (BioSharing) •  Guidelines proposed in 2011 paper •  Implemented in 2012 NAR database issue
  • 6. Use cases •  Show all resources of type database which use MIMARKS guidelines •  Show all resources where John Smith is involved •  Show all resources for mouse phenotypes •  Where can I submit my data? and also: •  Guidance for grants’ data sharing policies •  Improving integration of data from papers into databases
  • 7. Collaborative philosophy •  Many groups/resources have been providing registries and lists of databases •  Often not funded, not maintained •  BioDBCore seeks to collaborate with all interested parties to work together to provide a more permanent solution to database descriptions
  • 8. BioDBcore: Participating groups •  BioDB100 •  BioSharing •  BioCatalogue •  Bioinformatics Links Directory •  Biositemaps •  CASIMIR •  MIBBI •  MIRIAM •  Model Organism Databases •  NIF registry •  … and your group !
  • 9. BioDBCore descriptors 1.  Database name 2.  Main resource URL 3.  Contact information (e-mail; postal mail) 4.  Date resource established (year) 5.  Conditions of use (Free, or type of license) 6.  Scope: data types captured, curation policy, standards used 7.  Standards: MIs, Data formats, Terminologies 8.  Taxonomic coverage 9.  Data accessibility/output options 10.  Data release frequency 11.  Versioning policy and access to historical files 12.  Documentation available 13.  User support options 14.  Data submission policy 15.  Relevant publications 16.  Resource’s Wikipedia URL 17.  Tools available
  • 10. Database name: dictyBase
      Main resource URL: http://dictybase.org
      Contact information: dictybase@northwestern.edu
      Date resource established (year): 2003
      Conditions of use: Free
      Scope (data types captured): Genome sequence; gene models including CDS and predicted proteins; Phenotypes, Gene Ontology annotations, Functional annotation (gene product names), Gene nomenclature; Strains; Plasmids; Free text descriptions, Domains (via InterPro), Orthologs (via OrthoMCL and inParanoid), Protein subcellular location (via Swiss-Prot); Protein existence (via Swiss-Prot), Citations, Researchers database
  • 11. Curation policy: manual curation
      Standards (MIs, Data formats, Terminologies): Gene Ontology, Dicty Anatomy Ontology, Dicty Gene Nomenclature
      Data formats: FASTA, OBO, GAF, GFF3 (standard)
      Taxonomic coverage (use NCBI Taxid): D. discoideum (44689) including all strains [PRIMARY], also some genome/EST/gene model info for D. purpureum (5786), and gene model sequences for P. pallidum (13642) and D. fasciculatum (261658)
      Data accessibility/output options: HTML, text, database reports
      Data release frequency: curators work on the 'live' database; weekly data dumps (sequences) or monthly (other data)
      Versioning policy/access to historical files: no versioning, but access to historical files is possible
  • 12. Documentation available: http://dictybase.org/FAQ/HelpFilesIndex.html
      User support options: documents, email, webform
      Data submission policy: Data from published literature. Some HTP data corresponding to published analyses is incorporated
      Relevant publications: PMID: 18974179, PMID: 14681427
      Resource’s Wikipedia URL: http://en.wikipedia.org/wiki/DictyBase
      Tools available: BLAST, BioMart, Generic Genome Browser, TextPresso, MetaCyc (dictyCyc)
  • 13. Implementation of BioDBCore at BioSharing (Many thanks to Philippe RS !)
  • 14. BioDBcore announcement Published in Nucleic Acids Research database issue 2011 and in the DATABASE journal
  • 15. Implementation plan •  Goal: BioDBCore data public and linked •  Community aware approach: reuse existing stuff •  Current Data model: RDF based on categories from BioSiteMap, MIRIAM, NIF, Dublin core, Darwin Core •  Defined extension mechanisms
  • 19. Creating, editing, maintaining entries •  Until now: records are manually created from data provided by NAR at publication of Database issue and the Life Sciences Registry (Michel Dumontier and Nick Juty) - Those mostly come as xls files that need to be manually entered - Close to 200 records have been entered out of over 2,000 obtained
  • 20. Beyond maintenance at BioSharing Ideally database providers would keep their BioDBCore record up to date •  Claim ownership - A database provider can now (in theory) maintain their own BioDBCore record Encouraging best practices •  DATABASE and Nucleic Acids Research journals: Editors in chief request BioDBCore information from submitters •  ISB seal of approval •  BioDB100 - launched at InCoB 2011 – examples of 100 well annotated databases
  • 21. What’s next ? •  Continue to extend participating groups and journals •  Refine scope •  Integrate semantic support •  Develop querying system •  Implement validation tests •  Set up mechanisms for exchange of data among collaborating groups (in BioDBCore RDF format, or other)
  • 22. Identifying or developing semantic support •  Policies and guidelines: BioSharing •  Publications and taxon info: identifiers.org •  Authors: ORCID (will also implement organizations) •  Keywords/database scope: NIF when possible Identifying resources is preferable to developing them !
  • 23. For BioHackathon 2013 •  Evaluate need for BioDBCore in today’s landscape of metadatabase resources •  Evaluate further collaboration opportunities •  Set up a better system for creating and maintaining BioDBCore records •  Identify/develop ontologies pertinent to BioDBCore
  • 24. Acknowledgements Philippe Rocca-Serra, Susanna-Assunta Sansone, Eamonn Maguire, Alejandra Gonzalez Beltran, International Society for Biocuration, Michael Galperin, David Landsman, Francis Ouellette, OXFORD UNIVERSITY PRESS collaborators
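
As a rough illustration of how the descriptors on slide 9 and the use cases on slide 6 fit together, the sketch below holds a subset of the dictyBase example (slides 10 to 12) in a small Python structure and runs two simple filters over it. The class name BioDBCoreRecord, its field names, and the helper functions using_standard and covering_taxon are hypothetical and chosen only for this example; the slides describe the actual BioDBCore data model as RDF-based, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BioDBCoreRecord:
    """Hypothetical container for a subset of the BioDBCore descriptors."""
    name: str
    url: str
    established: int
    conditions_of_use: str
    standards: List[str] = field(default_factory=list)
    data_formats: List[str] = field(default_factory=list)
    taxon_ids: List[int] = field(default_factory=list)  # NCBI Taxids
    tools: List[str] = field(default_factory=list)


def using_standard(records, standard):
    """Use case: show all resources that use a given standard."""
    return [r for r in records if standard in r.standards]


def covering_taxon(records, taxid):
    """Use case: show all resources that cover a given NCBI Taxid."""
    return [r for r in records if taxid in r.taxon_ids]


# Values copied from the dictyBase example record in the slides.
dictybase = BioDBCoreRecord(
    name="dictyBase",
    url="http://dictybase.org",
    established=2003,
    conditions_of_use="Free",
    standards=["Gene Ontology", "Dicty Anatomy Ontology", "Dicty Gene Nomenclature"],
    data_formats=["FASTA", "OBO", "GAF", "GFF3"],
    taxon_ids=[44689, 5786, 13642, 261658],
    tools=["BLAST", "BioMart", "Generic Genome Browser", "TextPresso", "MetaCyc (dictyCyc)"],
)

registry = [dictybase]  # a one-entry stand-in for a real registry

print([r.name for r in using_standard(registry, "Gene Ontology")])
print([r.name for r in covering_taxon(registry, 44689)])
```

With only one record in the registry both queries return dictyBase; the point is that once every resource fills in the same descriptors, questions such as "which resources use a given standard" reduce to simple filters over uniform fields.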