1. Bio2RDF
Providing named entity based search with a common biological database naming scheme
BioSearch08
Peter Ansell
a university for the real world, CRICOS No. 00213J
2. Introduction
• Bio2RDF is a set of query services and RDF versions of biological databases. Queries are resolved based on URIs, and the URIs follow a common format, so a reference to a given database can always be recognised from the URI alone
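Because every record gets a URI in one shared format, a client can build and take apart references without knowing anything about the source database. The sketch below assumes the normalised form http://bio2rdf.org/namespace:identifier (as in the geneid:12345 examples later in this deck); the helper names are hypothetical and this is not Bio2RDF's own code.

```python
from typing import NamedTuple, Optional

# Assumption: Bio2RDF entity URIs use the normalised form
# http://bio2rdf.org/<namespace>:<identifier>, e.g. .../geneid:12345
BIO2RDF_BASE = "http://bio2rdf.org/"


class Bio2RDFRef(NamedTuple):
    namespace: str
    identifier: str


def make_uri(namespace: str, identifier: str) -> str:
    """Build the normalised URI for a database record."""
    return f"{BIO2RDF_BASE}{namespace}:{identifier}"


def parse_uri(uri: str) -> Optional[Bio2RDFRef]:
    """Recover (namespace, identifier) from a normalised URI, or None."""
    if not uri.startswith(BIO2RDF_BASE):
        return None
    local = uri[len(BIO2RDF_BASE):]
    # Simplification: service paths such as links/... or search/...
    # contain a '/', so they are rejected as entity URIs here
    if "/" in local:
        return None
    namespace, sep, identifier = local.partition(":")
    if not (sep and namespace and identifier):
        return None
    return Bio2RDFRef(namespace, identifier)


if __name__ == "__main__":
    ref = parse_uri("http://bio2rdf.org/geneid:12345")
    print(ref.namespace, ref.identifier)
```

The round trip make_uri → parse_uri is what makes a reference "recognisable from the URI": the namespace tells you which database it belongs to without a lookup.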
4. Entity-based link detection
• Reverse links
o http://bio2rdf.org/links/namespace:identifier
o Example: http://bio2rdf.org/links/geneid:12345
o Finds all of the items that link back to Entrez Gene ID 12345, “capping protein (actin filament) muscle Z-line, beta”
• Namespace-specific reverse links
– http://bio2rdf.org/linksns/targetNamespace/namespace:identifier
o http://bio2rdf.org/linksns/uniprot/geneid:12345
o Only finds items linked from the UniProt database
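The two reverse-link patterns differ only in an optional target namespace, so one small builder covers both. This is a minimal sketch that mirrors the URL patterns on this slide; the function name is hypothetical.

```python
from typing import Optional

BIO2RDF_BASE = "http://bio2rdf.org/"


def reverse_links_url(namespace: str, identifier: str,
                      target_namespace: Optional[str] = None) -> str:
    """Build a reverse-links URL for namespace:identifier.

    Without target_namespace this is the /links/ form (every item that
    links to the record); with it, the /linksns/ form restricts results
    to items from target_namespace.
    """
    ref = f"{namespace}:{identifier}"
    if target_namespace is None:
        return f"{BIO2RDF_BASE}links/{ref}"
    return f"{BIO2RDF_BASE}linksns/{target_namespace}/{ref}"


if __name__ == "__main__":
    print(reverse_links_url("geneid", "12345"))
    print(reverse_links_url("geneid", "12345", target_namespace="uniprot"))
```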
5. Complete full-text search
• Overall RDF database search
– http://bio2rdf.org/search/searchTerm
• Provides efficient full-text search across multiple databases at once
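A search term is free text, so it has to be escaped before it goes into the URL path. The sketch below assumes the term is percent-encoded as a single path segment (spaces become %20, '/' becomes %2F); that encoding choice is an assumption, not documented Bio2RDF behaviour.

```python
from urllib.parse import quote

BIO2RDF_BASE = "http://bio2rdf.org/"


def search_url(term: str) -> str:
    """Build a full-text search URL over the whole RDF database.

    Assumption: the term is percent-encoded as one path segment, so
    characters such as ' ' and '/' cannot split or break the path.
    """
    return f"{BIO2RDF_BASE}search/{quote(term, safe='')}"


if __name__ == "__main__":
    print(search_url("capping protein"))
```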
6. Namespace-specific search
• Namespace-specific RDF database search
– http://bio2rdf.org/searchns/namespace:searchTerm
• Live search, with results converted to RDF using Bio2RDF URIs
– This method is preferred over RDF database search for a small number of very large databases, such as Swoogle and PubMed, that implement their own search engines
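The slide describes two back ends behind the same namespace-specific URL: the RDF database index for most namespaces, and a live search for very large sources with their own engines. The routing below is a hypothetical sketch of that decision; the LIVE_SEARCH_NAMESPACES set, the backend labels, and the percent-encoding of the term are all assumptions for illustration.

```python
from urllib.parse import quote, unquote

BIO2RDF_BASE = "http://bio2rdf.org/"

# Hypothetical: namespaces answered by a live search against the source's
# own search engine instead of the RDF database full-text index
LIVE_SEARCH_NAMESPACES = {"pubmed", "swoogle"}


def searchns_url(namespace: str, term: str) -> str:
    """Build a namespace-specific search URL (term percent-encoded)."""
    return f"{BIO2RDF_BASE}searchns/{namespace}:{quote(term, safe='')}"


def route_searchns(path: str) -> tuple:
    """Split a 'searchns/namespace:term' path and pick a back end."""
    prefix = "searchns/"
    if not path.startswith(prefix):
        raise ValueError(f"not a searchns path: {path!r}")
    namespace, sep, term = path[len(prefix):].partition(":")
    if not sep:
        raise ValueError(f"missing 'namespace:' in {path!r}")
    live = namespace.lower() in LIVE_SEARCH_NAMESPACES
    return namespace, unquote(term), "live" if live else "rdf-database"


if __name__ == "__main__":
    print(route_searchns("searchns/pubmed:capping%20protein"))
```

Keeping both back ends behind one URL pattern means clients never need to know which sources are searched live.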
7. Integration with text mining
• The live search option could provide one interchange point between text mining tools and the biological databases provided by Bio2RDF
• Results from text mining recognition tools can be provided in RDF form, or converted to RDF in some way, so that they contain Bio2RDF URIs that link to the rest of the Bio2RDF databases
• Alternatively, some basic text mining can be performed using full-text search
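One way to "RDFise" recognition output is to emit one triple per recognised entity, pointing at its Bio2RDF URI, so the result joins the rest of the Bio2RDF graph. The sketch assumes the tool reports (namespace, identifier) pairs and that identifiers are IRI-safe; the MENTIONS predicate and the document URI are hypothetical placeholders, not Bio2RDF vocabulary.

```python
from typing import Iterable, Tuple

BIO2RDF_BASE = "http://bio2rdf.org/"

# Hypothetical predicate for illustration only; not a Bio2RDF term
MENTIONS = "http://example.org/textmining#mentions"


def mentions_to_ntriples(document_uri: str,
                         mentions: Iterable[Tuple[str, str]]) -> str:
    """Turn (namespace, identifier) pairs recognised by a text mining tool
    into N-Triples whose objects are Bio2RDF URIs."""
    # dict.fromkeys drops duplicate mentions while keeping first-seen order
    unique = dict.fromkeys(mentions)
    return "\n".join(
        f"<{document_uri}> <{MENTIONS}> <{BIO2RDF_BASE}{ns}:{ident}> ."
        for ns, ident in unique
    )


if __name__ == "__main__":
    print(mentions_to_ntriples(
        "http://example.org/doc/1",
        [("geneid", "12345"), ("geneid", "12345"), ("uniprot", "P12345")],
    ))
```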
8. Cross-database queries
• Cross-database queries with SPARQL currently require both databases to be loaded into the same SPARQL endpoint
• This is not available on the public endpoints, but users can set up their own endpoint relatively quickly, load the databases they need, and set up a new query type that executes only on that endpoint
9. Example cross-database query
• For example, resolving the PubMed articles related to a GO term, using an endpoint at http://localhost:8890/sparql loaded with PubMed, Entrez Gene (geneid) and GO
• If abstracts were loaded into the endpoint, they could also be used
• SPARQL query:
    CONSTRUCT { ... }
    WHERE {
      ...
      ?geneid geneid:xGo ?myGoTerm .
      ?geneid geneid:xPubMed ?pubmed .
    }
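A fuller version of this query can be sent to the local endpoint with a plain SPARQL protocol GET request. In this sketch the CONSTRUCT template simply returns the matched triples and the GO term is substituted directly for ?myGoTerm. Two things are assumptions: the namespace URI behind the geneid: prefix (GENEID_NS) and the example GO term URI. Check both against the data you actually loaded.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Local endpoint from the slide
ENDPOINT = "http://localhost:8890/sparql"

# Assumption: namespace URI behind the geneid: prefix; verify it against
# the predicates present in your loaded data
GENEID_NS = "http://bio2rdf.org/ns/geneid#"


def go_to_pubmed_query(go_uri: str) -> str:
    """CONSTRUCT the gene-to-GO and gene-to-PubMed triples for one GO term."""
    return f"""PREFIX geneid: <{GENEID_NS}>
CONSTRUCT {{
  ?geneid geneid:xGo <{go_uri}> .
  ?geneid geneid:xPubMed ?pubmed .
}}
WHERE {{
  ?geneid geneid:xGo <{go_uri}> .
  ?geneid geneid:xPubMed ?pubmed .
}}"""


def request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_query(query: str, endpoint: str = ENDPOINT) -> str:
    """Send the query and return the response body (needs a live endpoint)."""
    req = Request(request_url(query, endpoint),
                  headers={"Accept": "text/turtle"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical GO term URI, for illustration only
    print(go_to_pubmed_query("http://bio2rdf.org/go:0005515"))
```

Both source databases must sit in the same endpoint for the two triple patterns to join on ?geneid, which is exactly the constraint described on the previous slide.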