Contains a short background on the Semantic Web and shows how Prolog is intended to be used from within the Bioclipse research software for handling RDF data.
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
1. 2nd status report of degree project
Integrating Blipkit/BioProlog
for semantic reasoning in Bioclipse
Samuel Lampa, 2010-01-25
Project blog: http://saml.rilspace.com
4. What is the Semantic Web?
“Enabling more powerful use of information”
Main goals:
● Data availability (on the web)
● Machine-readability of data
● Knowledge integration
● Automatic “conclusion drawing”
● “Reasoning”, using reasoners (see “Semantic Reasoners” below)
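As a rough illustration of “automatic conclusion drawing”, the plain JavaScript sketch below (runnable in Node.js; it is not the API of Pellet, Jena or Blipkit) stores facts as subject/predicate/object triples, as in RDF, and repeatedly applies two RDFS-style rules until nothing new can be derived: rdfs:subClassOf is transitive, and a member of a class is also a member of its superclasses. The ex: names are hypothetical example data.

```javascript
// Hypothetical facts as [subject, predicate, object] triples.
const facts = [
  ["ex:Aspirin", "rdf:type", "ex:NSAID"],
  ["ex:NSAID", "rdfs:subClassOf", "ex:AntiInflammatory"],
  ["ex:AntiInflammatory", "rdfs:subClassOf", "ex:Drug"],
];

// Naive forward chaining: apply two RDFS-style rules until a fixpoint.
function infer(triples) {
  const known = new Set(triples.map((t) => t.join(" ")));
  const all = triples.slice();
  let added = true;
  while (added) {
    added = false;
    for (const [s1, p1, o1] of all.slice()) {
      for (const [s2, p2, o2] of all.slice()) {
        if (o1 !== s2 || p2 !== "rdfs:subClassOf") continue;
        let derived = null;
        // Rule 1: A subClassOf B, B subClassOf C  =>  A subClassOf C
        if (p1 === "rdfs:subClassOf") derived = [s1, "rdfs:subClassOf", o2];
        // Rule 2: x type A, A subClassOf B  =>  x type B
        if (p1 === "rdf:type") derived = [s1, "rdf:type", o2];
        if (derived && !known.has(derived.join(" "))) {
          known.add(derived.join(" "));
          all.push(derived);
          added = true;
        }
      }
    }
  }
  return all;
}

// All classes ex:Aspirin belongs to, stated or inferred.
const typesOfAspirin = infer(facts)
  .filter(([s, p]) => s === "ex:Aspirin" && p === "rdf:type")
  .map(([, , o]) => o);
// ["ex:NSAID", "ex:AntiInflammatory", "ex:Drug"]
```

Only the first of these types was stated; the other two are conclusions. Real reasoners support far richer semantics (e.g. OWL) and use much smarter algorithms than this fixpoint loop.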
7. Research question
How do biochemical questions
formulated as Prolog queries
compare to other solutions
available in Bioclipse in terms of
speed and expressiveness?
8. Semantic Reasoners
● Pellet/Jena
● Uses W3C languages
– OWL (Class definitions)
– RDF (Facts)
– SPARQL (Querying)
● Blipkit/BioProlog
● Uses Prolog, with W3C languages “on top”
– Class definitions, facts and queries either in
W3C languages (“on top” of Prolog) or in pure
Prolog!
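To make the split between RDF facts and SPARQL querying concrete: the core of a SPARQL query is a set of triple patterns containing variables, and each answer is a binding of those variables under which every pattern matches a stored triple. The plain JavaScript sketch below implements that matching, assuming variables are written as strings beginning with “?”. It is not the API of Jena or of SWI-Prolog, and the ex: data is hypothetical.

```javascript
// Hypothetical RDF-style data: [subject, predicate, object].
const triples = [
  ["ex:mol1", "ex:hasPeak", "ex:peak1"],
  ["ex:mol1", "ex:hasPeak", "ex:peak2"],
  ["ex:peak1", "ex:hasShift", 17.6],
  ["ex:peak2", "ex:hasShift", 130.2],
];

const isVar = (term) => typeof term === "string" && term.startsWith("?");

// Match one pattern against one triple, extending the current bindings.
// Returns null if a constant differs or a bound variable disagrees.
function matchPattern(pattern, triple, bindings) {
  const result = { ...bindings };
  for (let i = 0; i < 3; i++) {
    const term = pattern[i];
    if (isVar(term)) {
      if (term in result && result[term] !== triple[i]) return null;
      result[term] = triple[i];
    } else if (term !== triple[i]) {
      return null;
    }
  }
  return result;
}

// Solve the patterns left to right, depth-first; each solution
// is a set of bindings satisfying every pattern.
function query(patterns, bindings = {}) {
  if (patterns.length === 0) return [bindings];
  const [first, ...rest] = patterns;
  const solutions = [];
  for (const triple of triples) {
    const extended = matchPattern(first, triple, bindings);
    if (extended) solutions.push(...query(rest, extended));
  }
  return solutions;
}

// Roughly: SELECT ?peak ?shift
//          WHERE { ex:mol1 ex:hasPeak ?peak . ?peak ex:hasShift ?shift }
const answers = query([
  ["ex:mol1", "ex:hasPeak", "?peak"],
  ["?peak", "ex:hasShift", "?shift"],
]);
```

When a pattern cannot be matched, the solver simply moves on to the next candidate triple. This depth-first strategy is essentially how Prolog evaluates a conjunction of goals.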
10. What is Prolog?
● State facts and rules
● Execute by running queries over these
facts and rules
● Distinctive features:
● Backtracking
● “Closed-world assumption”
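Both features can be mimicked in plain JavaScript, which may help readers coming from Bioclipse scripting. In the sketch below, facts are stored in arrays, a rule is a generator function that yields one solution at a time (so the caller can ask for the next solution, much like typing “;” at the Prolog prompt), and negation is evaluated under the closed-world assumption: anything that cannot be derived from the stored facts is treated as false. All predicate and substance names are hypothetical.

```javascript
// Hypothetical facts: metabolizesTo(Substrate, Product) and toxic(Substance).
const metabolizesTo = [
  ["substanceA", "metaboliteB"],
  ["metaboliteB", "metaboliteC"],
];
const toxic = ["metaboliteB"];

// Rule, in Prolog:
//   derivedFrom(X, Y) :- metabolizesTo(Y, X).
//   derivedFrom(X, Y) :- metabolizesTo(Z, X), derivedFrom(Z, Y).
// Each `yield` is one solution; resuming the generator backtracks into
// the remaining alternatives, clause by clause, in Prolog's order.
function* derivedFrom(x) {
  // Clause 1
  for (const [substrate, product] of metabolizesTo) {
    if (product === x) yield substrate;
  }
  // Clause 2
  for (const [z, product] of metabolizesTo) {
    if (product === x) yield* derivedFrom(z);
  }
}

// Negation as failure (\+ in Prolog): "not toxic" holds whenever toxic/1
// cannot be proven from the stored facts, i.e. the closed-world assumption.
const notToxic = (s) => !toxic.includes(s);

// ?- derivedFrom(metaboliteC, Y), \+ toxic(Y).
function* nonToxicSources(x) {
  for (const y of derivedFrom(x)) {
    if (notToxic(y)) yield y; // a failed test just moves on: backtracking
  }
}

const solutions = derivedFrom("metaboliteC");
const first = solutions.next().value;  // "metaboliteB"
const second = solutions.next().value; // "substanceA", like typing ";" in Prolog
const safe = [...nonToxicSources("metaboliteC")]; // ["substanceA"]
```

Note that notToxic("unknownSubstance") is true: nothing says it is toxic, so under the closed-world assumption it is taken not to be. An open-world system such as an OWL reasoner would instead treat it as unknown.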
12. Prolog code example
% === SOME FACTS ===
hasHBondDonorsCount( substanceX, 3 ).  % “substance X has 3 H-bond donors”
% etc …
% === A RULE ("RULE OF FIVE" À LA PROLOG) ===
% Note: Prolog writes “less than or equal to” as =< (not <=)
isDrugLike( Substance ) :-
    hasHBondDonorsCount( Substance, HBDonors ),
    HBDonors =< 5,
    hasHBondAcceptorsCount( Substance, HBAcceptors ),
    HBAcceptors =< 10,
    hasMolecularWeight( Substance, MW ),
    MW < 500.
% === QUERYING THE RULE ===
?- isDrugLike(substanceX).
true.
?- isDrugLike(X).
X = substanceX ;
X = substanceY.
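For comparison, here is roughly the same check in plain JavaScript, of the kind one might otherwise write in a Bioclipse script. The property values are invented for illustration, because the slide elides them with “etc …”; only the 3 H-bond donors of substanceX come from the slide, and substanceZ is a hypothetical non-drug-like example.

```javascript
// Hypothetical property tables mirroring the Prolog facts
// hasHBondDonorsCount/2, hasHBondAcceptorsCount/2 and hasMolecularWeight/2.
const hBondDonors = { substanceX: 3, substanceY: 1, substanceZ: 7 };
const hBondAcceptors = { substanceX: 6, substanceY: 4, substanceZ: 12 };
const molecularWeight = { substanceX: 320.4, substanceY: 180.2, substanceZ: 610.7 };

// Same conditions as the Prolog rule. A missing property makes the
// check fail, just as a missing fact makes the Prolog goal fail.
function isDrugLike(substance) {
  const donors = hBondDonors[substance];
  const acceptors = hBondAcceptors[substance];
  const mw = molecularWeight[substance];
  if (donors === undefined || acceptors === undefined || mw === undefined) return false;
  return donors <= 5 && acceptors <= 10 && mw < 500;
}

// ?- isDrugLike(substanceX).
const xIsDrugLike = isDrugLike("substanceX"); // true

// ?- isDrugLike(X).
// Candidates are drawn from the H-bond donor table, as Prolog draws them
// from the first goal of the rule; here an explicit filter replaces backtracking.
const drugLike = Object.keys(hBondDonors).filter(isDrugLike); // ["substanceX", "substanceY"]
```

The JavaScript version needs two different pieces of code for the two queries, while the single Prolog rule answers both the “is this substance drug-like?” and the “which substances are drug-like?” question.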
13. Prolog code example (annotated)
● Head: isDrugLike( Substance ), to the left of ":-"
● Implication: ":-" reads “if [body] then [head]”
● Body: the goals to the right of ":-", ending with a period
● Comma means conjunction (“and”)
● Capitalized terms are always variables (e.g. Substance, HBDonors, MW)
14. Prolog code example (annotated)
● ?- isDrugLike(substanceX).
– Tests a specific atom (“substanceX”)
● ?- isDrugLike(X).
– By submitting a variable (“X”), it will be bound, in turn, to every instance that satisfies the “isDrugLike” rule; typing “;” asks for the next solution
18. What is done so far?
● Integration of Blipkit in Bioclipse
● Done: General-purpose methods
● Done: Found a strategy for combining Bioclipse
JS scripting with Prolog
● Comparing Prolog and Pellet
● Done: Simple performance testing
● Now: Stuck on NMR spectrum similarity search
– (No backtracking on arithmetic operators in
SPARQL)
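The NMR task needs arithmetic comparisons between chemical shift values. One simple way to phrase a similarity search, assumed here purely for illustration (the slides do not specify the criterion, and the tolerance and data below are hypothetical), is: a spectrum matches if every query peak has at least one peak within ±tolerance ppm. The plain JavaScript sketch below does exactly that. In Prolog, the inner “at least one peak” search would be expressed by backtracking over peak facts with an arithmetic test on each candidate.

```javascript
// Hypothetical 13C shift lists (ppm) per molecule.
const spectra = {
  mol1: [17.6, 25.3, 130.2],
  mol2: [18.1, 77.0, 128.9],
  mol3: [40.5, 60.2, 170.8],
};

// Assumed criterion: each query shift must have at least one
// peak within +/- tolerance ppm.
function matches(peaks, queryShifts, tolerance) {
  return queryShifts.every((q) => peaks.some((p) => Math.abs(p - q) <= tolerance));
}

// Return the ids of all molecules whose spectrum matches the query.
function searchSimilar(queryShifts, tolerance) {
  return Object.entries(spectra)
    .filter(([, peaks]) => matches(peaks, queryShifts, tolerance))
    .map(([id]) => id);
}

const hits = searchSimilar([17.0, 129.5], 1.0); // ["mol1"]
```

Widening the tolerance to 1.5 ppm also admits mol2, whose peaks at 18.1 and 128.9 ppm then lie within range.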
20. What remains to be done?
● Integration of Prolog / Blipkit
● Refinements?
● Comparing Prolog and Pellet
● NMR spectrum similarity search
– Investigate use of OWL in querying
– Other options? SWRL?
● ChEMBL data
● Toxicity data (opentox.org)
22. Example Bioclipse/Prolog script
blipkit.init();
blipkit.loadRDFToProlog("nmrshiftdata.100.rdf.xml");
// Define a “convenience Prolog method”. JavaScript string literals
// cannot span several lines, so the Prolog code is concatenated.
blipkit.loadPrologCode(
    "hasPeak( Subject, Peak ) :- " +
    "    rdf_db:rdf( Subject, " +
    "        'http://www.nmrshiftdb.org/onto#hasPeak', " +
    "        Peak ).");
// Call the convenience method (which in turn executes its
// “body”) and return the matching results, at most 10, as an array
var resultList =
    blipkit.queryProlog(["hasPeak", "10", "Subject", "Peak"]);
23. Example Bioclipse/Prolog script (annotated)
● The string passed to blipkit.loadPrologCode() is a Prolog rule to load into the Prolog engine
● Arguments to blipkit.queryProlog():
– "hasPeak": the Prolog method to call
– "10": limits the number of results
– The remaining strings: Prolog variables
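For readers without Bioclipse at hand, the plain JavaScript sketch below imitates what the script asks for. An in-memory array stands in for the RDF loaded by blipkit.loadRDFToProlog, a generator plays the role of SWI-Prolog's rdf/3 (with null meaning an unbound variable), the hasPeak convenience rule is built on top of it, and a helper caps the number of results, analogous to the "10" passed to blipkit.queryProlog. It does not call the real blipkit manager, and the molecule and peak identifiers are hypothetical.

```javascript
const HAS_PEAK = "http://www.nmrshiftdb.org/onto#hasPeak";

// In-memory stand-in for the loaded RDF data (hypothetical identifiers).
const store = [
  ["ex:mol1", HAS_PEAK, "ex:mol1_peak0"],
  ["ex:mol1", HAS_PEAK, "ex:mol1_peak1"],
  ["ex:mol2", HAS_PEAK, "ex:mol2_peak0"],
  ["ex:mol2", "ex:hasName", "Example molecule"],
];

// Like rdf(Subject, Predicate, Object): null means "unbound variable".
function* rdf(s, p, o) {
  for (const t of store) {
    if ((s === null || s === t[0]) && (p === null || p === t[1]) && (o === null || o === t[2])) {
      yield t;
    }
  }
}

// Convenience rule: hasPeak(Subject, Peak) :- rdf(Subject, hasPeak, Peak).
function* hasPeak(subject = null, peak = null) {
  for (const [s, , o] of rdf(subject, HAS_PEAK, peak)) yield { Subject: s, Peak: o };
}

// Collect at most `limit` solutions without computing any extra ones.
function take(iterable, limit) {
  const out = [];
  if (limit <= 0) return out;
  for (const item of iterable) {
    out.push(item);
    if (out.length === limit) break;
  }
  return out;
}

const resultList = take(hasPeak(), 10); // all three peaks
const firstTwo = take(hasPeak(), 2);    // stops after two solutions
```

Because the solutions are produced lazily, the limit stops the search early instead of computing everything and discarding the rest.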
25. Current status of research question
● Performance
● Prolog has been faster so far. Any exceptions?
● Usability
● Prolog is very convenient for iteratively
wrapping complex logic in new rules.
Can RDF/OWL/SPARQL replicate this?
● Where do RDF/OWL/SPARQL excel?