Bio2RDF


Providing named entity based search with a
common biological database naming scheme

BioSearch08

Peter Ansell

a university for the real world®                       CRICOS No. 00213J
Introduction
• Bio2RDF is a set of query services and RDF versions
  of biological databases. Queries are resolved by URI,
  and the URIs follow a common format, so a reference
  to a given database can always be recognised from
  the URI itself




Entity based link detection

• Reverse links
    – http://bio2rdf.org/links/namespace:identifier
    – Example: http://bio2rdf.org/links/geneid:12345
    – Finds all of the items that link back to the
      Entrez Geneid entry for “capping protein (actin filament)
      muscle Z-line, beta”
• Namespace specific reverse links
    – http://bio2rdf.org/linksns/targetNamespace/namespace:identifier
    – Example: http://bio2rdf.org/linksns/uniprot/geneid:12345
    – Only finds items linked from the UniProt database


Complete full text search
• Overall RDF database search
    – http://bio2rdf.org/search/searchTerm
• Provides efficient full text search across
  multiple databases




Namespace specific search
• Namespace specific RDF database search
    – http://bio2rdf.org/searchns/namespace:searchTerm
• Live search, with results converted to RDF using
  Bio2RDF URIs
    – This method is preferred to RDF database search
      for a small number of very large databases, such
      as Swoogle and PubMed, which have their own
      search engines
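
As a rough sketch, the search URLs from this slide and the previous one could be built as follows. The helper names, the `pubmed` namespace label, and the percent-encoding of spaces in search terms are all assumptions made for illustration.

```python
from urllib.parse import quote

BASE = "http://bio2rdf.org"  # base URL as shown on the slides


def search_url(term: str) -> str:
    """Full text search across all of the RDF databases."""
    # Encoding spaces as %20 is an assumption; the slides do not say how
    # multi-word terms should be written.
    return f"{BASE}/search/{quote(term, safe='')}"


def searchns_url(namespace: str, term: str) -> str:
    """Search restricted to a single namespace."""
    return f"{BASE}/searchns/{quote(namespace, safe='')}:{quote(term, safe='')}"


print(search_url("actin"))
print(searchns_url("pubmed", "capping protein"))  # "pubmed" is an assumed namespace name
```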



Integration with text mining
• The live search option could provide an interchange
  point between text mining tools and the biological
  databases that Bio2RDF provides
• Results from text mining recognition tools can be
  provided in RDF form, or can be rdfised in some way
  to contain Bio2RDF URIs that link to the rest of the
  Bio2RDF databases
• Alternatively, some basic text mining can be
  performed using full text search
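
One possible way to rdfise text mining output is to emit one N-Triples statement per recognised entity. This sketch assumes entity URIs of the form http://bio2rdf.org/namespace:identifier; the `mentions` predicate and the document URI are hypothetical placeholders, not part of Bio2RDF.

```python
# Hypothetical predicate for "document mentions entity"; not a Bio2RDF term.
MENTIONS = "http://example.org/vocab#mentions"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Assumed Bio2RDF entity URI pattern: http://bio2rdf.org/namespace:identifier."""
    return f"http://bio2rdf.org/{namespace}:{identifier}"


def to_ntriples(document_uri: str, entities: list[tuple[str, str]]) -> str:
    """Serialise recognised (namespace, identifier) pairs as N-Triples lines."""
    lines = [
        f"<{document_uri}> <{MENTIONS}> <{bio2rdf_uri(ns, ident)}> ."
        for ns, ident in entities
    ]
    return "\n".join(lines)


# Placeholder document URI and a single recognised entity.
print(to_ntriples("http://example.org/doc/1", [("geneid", "12345")]))
```

Triples in this form could then be loaded next to the Bio2RDF data, so the recognised entities resolve to the same URIs as the database records.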




Cross-database queries
• Cross-database queries with SPARQL
  currently require both databases to
  exist within the same SPARQL endpoint
• While this is not available on the public
  endpoints, a user can set up their own
  database relatively quickly, load in their
  desired databases, and set up a new query
  type to execute on that endpoint only



Example cross database query
• An example might be resolving the
  PubMed articles related to a GO term,
  using an endpoint at http://localhost:8890/sparql
  loaded with PubMed, Entrez Geneid, and GO
• If abstracts were loaded into the endpoint
  they could also be used
• SPARQL = CONSTRUCT ... WHERE ...
  ?geneid geneid:xGo ?myGoTerm .
  ?geneid geneid:xPubMed ?pubmed .
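
A hedged sketch of how the query outlined above might be assembled and turned into a request URL for the local endpoint. The slide elides the CONSTRUCT template and the `geneid:` prefix URI, so the template, the `relatedArticle` predicate, and the prefix and GO term URIs used here are illustrative placeholders only; the two WHERE patterns are taken from the slide.

```python
from urllib.parse import urlencode

ENDPOINT = "http://localhost:8890/sparql"  # local endpoint named on the slide


def build_query(go_term_uri: str, geneid_prefix: str) -> str:
    """Build a CONSTRUCT query linking a GO term to PubMed articles via genes."""
    # The CONSTRUCT template and its predicate are assumptions; only the two
    # WHERE patterns (geneid:xGo, geneid:xPubMed) come from the slide.
    return f"""PREFIX geneid: <{geneid_prefix}>
CONSTRUCT {{ <{go_term_uri}> <http://example.org/vocab#relatedArticle> ?pubmed . }}
WHERE {{
  ?geneid geneid:xGo <{go_term_uri}> .
  ?geneid geneid:xPubMed ?pubmed .
}}"""


def request_url(query: str) -> str:
    """GET URL using the standard SPARQL protocol 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": query})


# Placeholder URIs; substitute the real prefix and GO term URI for your data.
query = build_query("http://example.org/go/0000000", "http://example.org/geneid#")
print(query)
print(request_url(query))
```

Nothing is sent over the network here; the resulting URL can be fetched with any HTTP client once an endpoint with the three databases loaded is running.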

