2. An RDF Data Model for the
Semantic Web
5th Oracle Life Sciences User Group meeting
May 16-17, 2005
3. Agenda
Introduction – 5 min
– Susie Stephens
Semantic Web for Life Sciences – 25 min
– Susie Stephens
Oracle support of RDF in RDBMS – 25 min
– Souripriya Das
Demo of Siderean’s Seamark Navigation Server – 25 min
– Mike DiLascio, David LaVigna & Joanne Luciano
Discussion – 10 min
– Susie Stephens
5. What is the Semantic Web?
A machine-readable format that is Web
compatible
The Semantic Web adds definition tags to
information in Web pages
– Enables computers to discover data more
effectively
– Allows new associations to form between pieces
of information
6. Resource Description Framework
W3C standard for the common data format
Based on triples (subject–predicate–object)
Everything has a URI
Ontologies used to label the RDF tagged elements
Image Source: W3C
16. Inferencing
If Gene G is implicated in Disease D, and its Protein
Product P is a functional component of only Pathway
P2, then Disease D directly perturbs Pathway P2
<rdf:Description>
<log:is rdf:parseType='Quote'>
<rdf:Description rdf:about='variable#Gene_G'>
<hasProduct rdf:resource='variable#Protein_P'/>
<isImplicatedIn rdf:resource='variable#Disease_D'/>
</rdf:Description>
<rdf:Description rdf:about='variable#Protein_P'>
<inPathway rdf:resource='variable#Pathway_P2'/>
</rdf:Description>
</log:is>
<log:implies rdf:parseType='Quote'>
<rdf:Description rdf:about='variable#Disease_D'>
<D_perturbs rdf:resource='variable#Pathway_P2'/>
</rdf:Description>
</log:implies>
</rdf:Description>
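The rule on this slide can be sketched in a few lines of Python. This is only an illustration of the reasoning step, not the N3/log: machinery used on the slide: triples are plain tuples, the function name `infer_perturbs` is hypothetical, and "only Pathway P2" is read here as "the protein has exactly one known pathway".

```python
# Triples as (subject, predicate, object) tuples; names follow the slide.
triples = {
    ("Gene_G", "hasProduct", "Protein_P"),
    ("Gene_G", "isImplicatedIn", "Disease_D"),
    ("Protein_P", "inPathway", "Pathway_P2"),
}

def infer_perturbs(graph):
    """Return the (disease, 'D_perturbs', pathway) triples entailed by the rule."""
    inferred = set()
    for gene, pred, disease in graph:
        if pred != "isImplicatedIn":
            continue
        products = {o for s, p, o in graph if s == gene and p == "hasProduct"}
        for protein in products:
            pathways = {o for s, p, o in graph if s == protein and p == "inPathway"}
            # Assumption: "only Pathway P2" means exactly one known pathway.
            if len(pathways) == 1:
                inferred.add((disease, "D_perturbs", next(iter(pathways))))
    return inferred

print(infer_perturbs(triples))  # {('Disease_D', 'D_perturbs', 'Pathway_P2')}
```

If a second `inPathway` triple is added for Protein_P, the rule no longer fires, which is the point of the "only" condition.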
17. Why Semantic Web for Life Sciences?
Heterogeneous data integration using explicit
semantics
Expressing well-defined and rich models of
biological systems
Annotating findings and interpretations formally and
sharing with other scientists
Embedding models and semantics within papers
Applying logic to infer additional insights and to
propose and/or capture new hypotheses
19. RDF Support in Oracle RDBMS
Souripriya Das, Ph.D.
Consultant Member of Technical Staff
Oracle New England Development Center
20. Overview
Three types of database objects
Model: RDF graph consisting of a set of triples
Rulebase: set of (user-defined) rules
Rule Index: entailed RDF graph
We discuss the following aspects for each type of object
DDL
DML
Views
Security
RDF Query (with Inference)
22. Model: Overview
Each RDF Model (graph) consists of a set of
triples
A triple (statement) consists of three
components
– Subject URI or blank node
– Predicate URI
– Object URI or literal or blank node
A statement itself can be a resource (allowing
nested graphs)
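The constraints on the three components can be written down as a small type sketch. The class names below (`URI`, `BlankNode`, `Literal`, `Triple`) are illustrative only and are not Oracle types; the example URIs are the ones used later in the deck.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class URI:
    value: str

@dataclass(frozen=True)
class BlankNode:
    label: str

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # e.g. "xsd:integer"; None for plain literals

@dataclass(frozen=True)
class Triple:
    subject: object    # URI or blank node
    predicate: object  # URI
    obj: object        # URI, literal, or blank node

    def __post_init__(self):
        # Enforce the component rules listed on the slide.
        if not isinstance(self.subject, (URI, BlankNode)):
            raise TypeError("subject must be a URI or blank node")
        if not isinstance(self.predicate, URI):
            raise TypeError("predicate must be a URI")
        if not isinstance(self.obj, (URI, BlankNode, Literal)):
            raise TypeError("object must be a URI, literal, or blank node")

# A model (graph) is simply a set of triples.
model = {
    Triple(URI("http://example.org/family/John"),
           URI("http://example.org/family/brotherOf"),
           URI("http://example.org/family/Mary")),
}
```

A literal in the subject position, for example `Triple(Literal("x"), ...)`, raises `TypeError` in this sketch.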
25. SDO_RDF_MATCH Table Function
Arguments
– Graph pattern
A sequence of triple patterns
Triple patterns typically use variables
– RDF Data set: a set of models
– Filter
– Aliases
…
FROM TABLE(SDO_RDF_MATCH(
'(?x :brotherOf ?y) (?y :parentOf ?z)',
SDO_RDF_Models('family'),
…
)) t
…
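To make the idea of a graph pattern concrete, here is a minimal in-memory matcher in Python. It is a sketch of the general technique (binding variables across a sequence of triple patterns), not a description of how SDO_RDF_MATCH is implemented; the function name `match` and the sample data are assumptions.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern in order."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)

# Hypothetical sample data for the 'family' model.
family = {
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Ann"),
    (":Mary", ":parentOf", ":Tom"),
}
pattern = [("?x", ":brotherOf", "?y"), ("?y", ":parentOf", "?z")]
rows = sorted(b["?z"] for b in match(pattern, family))
print(rows)  # [':Ann', ':Tom']
```

Each yielded binding corresponds to one returned row, with one column per variable.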
26. SDO_RDF_MATCH: return
Columns (of type VARCHAR2) in each returned row:
For each variable ?x in Graph Pattern
– x
– x$rdfVTYP: URI, Literal, or Blank node
– x$rdfLTYP: specific literal type (e.g., xsd:integer)
– x$rdfCLOB: contains actual value, if ?x matches a CLOB value
– x$rdfLANG: language tag, if any (e.g., “en-us”)
If no variable in Graph Pattern
– A dummy column
27. SDO_RDF_MATCH: matching
Matching multiple representations
The same point in value space may have
multiple representations
– "10"^^xsd:integer
– "10"^^xsd:positiveInteger
– "010"^^xsd:integer
– "000010"^^xsd:integer
SDO_RDF_MATCH automatically resolves
these
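One way to picture this resolution is to map each typed literal to a canonical key for its point in value space. The sketch below covers only the two datatypes on this slide; the `canonical` helper is hypothetical and says nothing about how Oracle does it internally.

```python
# Datatypes treated as integers in this sketch (from the slide only).
INTEGER_TYPES = {"xsd:integer", "xsd:positiveInteger"}

def canonical(lexical, datatype):
    """Map a typed literal to a key identifying its point in value space."""
    if datatype in INTEGER_TYPES:
        return ("integer", int(lexical))
    # Other datatypes are compared by lexical form in this sketch.
    return (datatype, lexical)

forms = [
    ("10", "xsd:integer"),
    ("10", "xsd:positiveInteger"),
    ("010", "xsd:integer"),
    ("000010", "xsd:integer"),
]
print({canonical(lex, dt) for lex, dt in forms})  # {('integer', 10)}
```

All four representations collapse to a single key, so a match on any one of them finds the others.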
28. RDF Query: Example
Find salary and hiredate of all the uncles
SELECT emp.name, emp.salary, emp.hiredate
FROM emp,
TABLE(SDO_RDF_MATCH(
'(?x :brotherOf ?y)
(?y :parentOf ?z)
(?x :name ?name)',
SDO_RDF_Models('family'),
…)) t
WHERE emp.name=t.name;
Use of SDO_RDF_MATCH allows embedding a
graph query in a SQL query
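The same "graph query joined with a relational table" idea can be tried without Oracle by storing triples in an ordinary three-column table. The sketch below uses Python's built-in sqlite3 module; the `triples` table, its contents, and the `emp` rows are invented for illustration, and the self-joins stand in for what SDO_RDF_MATCH does with the graph pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triples (s TEXT, p TEXT, o TEXT);
    CREATE TABLE emp (name TEXT, salary INTEGER, hiredate TEXT);
    INSERT INTO triples VALUES
        ('John', 'brotherOf', 'Mary'),
        ('Mary', 'parentOf', 'Ann'),
        ('John', 'name', 'John Smith'),
        ('Mary', 'name', 'Mary Jones');
    INSERT INTO emp VALUES
        ('John Smith', 50000, '2001-03-15'),
        ('Mary Jones', 60000, '1999-07-01');
""")

# One self-join per triple pattern: (?x :name ?name), (?x :brotherOf ?y),
# (?y :parentOf ?z), then a join with the relational emp table on ?name.
rows = conn.execute("""
    SELECT DISTINCT emp.name, emp.salary, emp.hiredate
    FROM emp
    JOIN triples n ON n.p = 'name' AND n.o = emp.name
    JOIN triples b ON b.p = 'brotherOf' AND b.s = n.s
    JOIN triples c ON c.p = 'parentOf' AND c.s = b.o
""").fetchall()
print(rows)  # [('John Smith', 50000, '2001-03-15')]
```

Writing the joins by hand grows quickly with the number of triple patterns, which is the convenience a table function taking a graph pattern is meant to provide.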
29. RDF Query: Example 2
Find pairs of persons residing at the same
address where the first person rents a truck and
the second person buys a fertilizer
SELECT t3.x name1, t3.y name2
FROM AddrTable t1, AddrTable t2,
TABLE(SDO_RDF_MATCH(
'(?x :rents ?a) (?a rdf:type :Truck)
(?y :buys ?b) (?b rdf:type :Fertilizer)',
SDO_RDF_Models('Activities'),
…)) t3
WHERE t1.name=t3.x and t2.name=t3.y and
t1.addr=t2.addr;
31. Rulebase: Overview
Each RDF rulebase consists of a set of rules
Each rule consists of
– antecedent: graph-pattern
– filter condition (optional)
– consequent: graph-pattern
One or more rulebases may be used with
relevant RDF models (graphs) to obtain
entailed graphs
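A rule of this shape can be applied with a small forward-chaining loop. The sketch below assumes a naive run-until-nothing-new strategy, which is my choice for illustration; the slides do not say how Oracle evaluates rulebases. The rule itself mirrors the uncle rule shown later in the deck, and `match`, `substitute`, and `entail` are hypothetical helper names.

```python
def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(patterns[0], triple):
            if term.startswith("?"):
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(patterns[1:], triples, candidate)

def substitute(pattern, binding):
    """Replace variables in a consequent pattern with their bound values."""
    return tuple(binding.get(term, term) for term in pattern)

def entail(triples, rules):
    """Apply rules until no new triples appear; return only inferred triples."""
    graph = set(triples)
    while True:
        new = set()
        for antecedent, filter_fn, consequent in rules:
            for b in match(antecedent, graph):
                if filter_fn is None or filter_fn(b):
                    new.update(substitute(c, b) for c in consequent)
        new -= graph
        if not new:
            return graph - set(triples)
        graph |= new

# (antecedent, optional filter, consequent); the filter is None here.
uncle_rule = (
    [("?x", ":brotherOf", "?y"), ("?y", ":parentOf", "?z")],
    None,
    [("?x", ":uncleOf", "?z")],
)
family = {
    (":John", ":brotherOf", ":Mary"),
    (":Mary", ":parentOf", ":Ann"),
    (":Mary", ":parentOf", ":Tom"),
}
inferred = entail(family, [uncle_rule])
print(sorted(inferred))
# [(':John', ':uncleOf', ':Ann'), (':John', ':uncleOf', ':Tom')]
```

The returned set plays the role of the entailed graph: triples that follow from the model plus the rulebase but are not stored in the model itself.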
34. Rule Index: Overview
A rule index represents an entailed graph
A rule index is created on an RDF dataset
(consisting of a set of RDF models and a set
of RDF rulebases)
35. Rule Index: Example
A rule index may be created on a dataset
consisting of
– family RDF data, and
– family_rb rulebase (shown earlier)
The rule index will contain inferred triples
showing uncleOf and ageGroup information
37. SDO_RDF_MATCH with
Rulebases
Arguments
– Graph pattern
A sequence of triples (with variables)
– RDF Data set
a set of models
a set of rulebases
– Filter
– Aliases
…
FROM TABLE(SDO_RDF_MATCH(
'(?x :uncleOf ?y)',
SDO_RDF_Models('family'),
SDO_RDF_Rulebases('rdfs', 'family_rb'),
…
)) t
…
38. RDF Query w/ Inference:
Example
Find salary and hiredate of all the
uncles
SELECT emp.name, emp.salary, emp.hiredate
FROM emp,
TABLE(SDO_RDF_MATCH(
'(?x :uncleOf ?y) (?x :name ?name)',
SDO_RDF_Models('family'),
SDO_RDF_Rulebases('rdfs', 'family_rb'),
…)) t
WHERE emp.name=t.name;
39. RDF Query w/ Inference:
Example 2
Find pairs of persons residing at the same
address where the first person rents a truck and
the second person buys a fertilizer
SELECT t3.x name1, t3.y name2
FROM AddrTable t1, AddrTable t2,
TABLE(SDO_RDF_MATCH(
'(?x :rents ?a) (?a rdf:type :Truck)
(?y :buys ?b) (?b rdf:type :Fertilizer)',
SDO_RDF_Models('Activities'),
SDO_RDF_Rulebases('rdfs'),
…)) t3
WHERE t1.name=t3.x and t2.name=t3.y and
t1.addr=t2.addr;
41. Model: DDL
Procedures provided as part of the API may be used
to
– Create a model
– Drop a model
When a user creates a model, a database view gets
created automatically
– rdfm_family
A model corresponds to a column of type
SDO_RDF_TRIPLE_S in a base table
Each model has exactly one base table associated
with it
42. Model: DDL Creating a Model
Create an Application Table
CREATE TABLE family_table (
id NUMBER, family_triple SDO_RDF_TRIPLE_S);
Create a Model
EXEC SDO_RDF.CREATE_RDF_MODEL(
'family', 'family_table', 'family_triple');
Automatically creates the following database
view
rdfm_family (…)
43. Loading RDF Data into Oracle
Java API provided to load N-Triples data into NDM
Sample XSLs provided
– To convert RDF to N-Triples
– To convert RDF to INSERT statements
44. Model: DML
SQL DML commands issued against a base table
effect the same DML (i.e., triple insert, delete, and
update) on the corresponding model
Insert Triples
INSERT INTO family_table VALUES (1,
SDO_RDF_TRIPLE_S('family',
'<http://example.org/family/John>',
'<http://example.org/family/brotherOf>',
'<http://example.org/family/Mary>'));
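Generating INSERT statements of this form from N-Triples input could look roughly like the Python sketch below. It is an assumption-laden stand-in for the sample XSLs mentioned earlier, not a copy of them: it handles only lines whose three terms are all URIs, and the `to_inserts` helper and its row numbering are hypothetical.

```python
import re

# Matches simple N-Triples lines whose three terms are all URIs.
LINE = re.compile(r"^\s*(<[^>]*>)\s+(<[^>]*>)\s+(<[^>]*>)\s*\.\s*$")

def to_inserts(ntriples, model, table):
    """Turn URI-only N-Triples lines into INSERT statements (one per line)."""
    statements = []
    for row_id, line in enumerate(ntriples.strip().splitlines(), start=1):
        m = LINE.match(line)
        if m is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        # Double any single quotes so the SQL string literals stay valid.
        s, p, o = (term.replace("'", "''") for term in m.groups())
        statements.append(
            f"INSERT INTO {table} VALUES ({row_id}, "
            f"SDO_RDF_TRIPLE_S('{model}', '{s}', '{p}', '{o}'));"
        )
    return statements

sample = (
    "<http://example.org/family/John> "
    "<http://example.org/family/brotherOf> "
    "<http://example.org/family/Mary> ."
)
for statement in to_inserts(sample, "family", "family_table"):
    print(statement)
```

Literals and blank nodes would need their own handling, which this sketch deliberately leaves out.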
45. Model: Security
The creator of the base table corresponding to a
model can grant privileges to other users
To perform DML on a model, a user must have DML
privileges for the corresponding base table
The creator of a model can grant QUERY privileges
on the corresponding database view to other users
A user can query only those models for which s/he
has QUERY privileges on the corresponding database views
Only the creator of a model can drop the model
48. Rulebase: DDL
Procedures provided as part of the API may
be used to
– Create a rulebase
create_rulebase('family_rb');
– Drop a rulebase
drop_rulebase('family_rb');
When a user creates a rulebase, a database
view gets created automatically
– rdfr_family_rb (rule_name,
antecedent, filter, consequent, aliases)
49. Rulebase: DML
SQL DML commands may be used on the
database view corresponding to a target
rulebase to insert, delete, and update rules
insert into mdsys.rdfr_family_rb values(
'uncle_rule',
'(?x :brotherOf ?y) (?y :parentOf ?z)',
NULL,
'(?x :uncleOf ?z)',
SDO_RDF_Aliases(…));
50. Rulebase: Security
Creator of a rulebase can grant privileges to
the corresponding database view to other
users
Performing DML operations requires invoker
to have appropriate privileges on the
database view
Only the creator of a rulebase can drop the
rulebase
51. Rulebase: Views
RDF_RULEBASE_INFO
– Contains the list of rulebases
– For each rulebase, contains additional
information (such as creator, view name, etc.)
Content of each rulebase is available from the
corresponding database view
53. Rule Index: DDL
Procedures provided as part of the API may be used
to
– Create a rule index
create_rules_index ('family_rb_rix_family',
SDO_RDF_Models('family'),
SDO_RDF_Rulebases('rdfs','family_rb'));
– Drop a rule index
drop_rules_index ('family_rb_rix_family');
When a user creates a rule index, a database view
gets created automatically
– rdfi_family_rb_rix_family (…)
54. Rule Index: Security
To create a rule index on an RDF dataset
(models and rulebases), user needs to have
QUERY privileges on those models and
rulebases
Creator of a rule index holds QUERY privilege
on the rule index and may grant this privilege
to other users
Only the creator of a rule index can drop it
55. Rule Index: Views
RDF_RULEINDEX_INFO
– Contains the list of rule indexes
– For each rule index, contains additional
information (such as creator, status, etc.)
RDF_RULEINDEX_DATASETS
– For every rule index, stores the names of its
models and rulebases
56. Rule Index: Dependencies
Content of a rule index depends upon the
content of each element of its dataset
– Any modification to the models or rulebases in its
dataset invalidates the rule index
– Dropping a model or rulebase will drop
dependent rule indexes automatically.
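The dependency between a rule index and its dataset can be pictured as a cache that is marked stale whenever its source changes. The Python sketch below is purely illustrative: the `Model` and `RuleIndex` classes, the `"VALID"`/`"INVALID"` status strings, and the hard-coded uncle derivation are all assumptions, not Oracle's actual objects or status values.

```python
def derive_uncles(triples):
    """Hard-coded stand-in for a rulebase: brotherOf + parentOf => uncleOf."""
    return {(x, ":uncleOf", z)
            for x, p1, y in triples if p1 == ":brotherOf"
            for y2, p2, z in triples if p2 == ":parentOf" and y2 == y}

class Model:
    def __init__(self):
        self.triples = set()
        self.indexes = []  # rule indexes that depend on this model

    def insert(self, triple):
        self.triples.add(triple)
        # Any modification invalidates every dependent rule index.
        for index in self.indexes:
            index.status = "INVALID"

class RuleIndex:
    """Caches entailed triples and tracks whether they are still current."""
    def __init__(self, model, derive):
        self.model, self.derive = model, derive
        self.entailed = set()
        self.status = "INVALID"
        model.indexes.append(self)

    def rebuild(self):
        self.entailed = self.derive(self.model.triples)
        self.status = "VALID"

family = Model()
family.insert((":John", ":brotherOf", ":Mary"))
family.insert((":Mary", ":parentOf", ":Ann"))
index = RuleIndex(family, derive_uncles)
index.rebuild()
print(index.status, index.entailed)  # VALID {(':John', ':uncleOf', ':Ann')}

family.insert((":Mary", ":parentOf", ":Tom"))
print(index.status)  # INVALID
```

After the second insert the cached entailments are missing John's second niece or nephew, which is why the index has to be treated as invalid until it is rebuilt.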
57. Summary
RDF Data Model
– Models (Graphs)
– RDF Query using SDO_RDF_MATCH Table Function
RDF Data Model with (user-defined) Rules
– Models (Graphs)
– Rulebases
– Rule Indexes
– RDF Query on entailed RDF graphs
Management (DDL, DML, Security, …)
– Models, Rulebases, and Rule Indexes
68. Demo of Siderean’s Seamark
Navigation Server
Mike DiLascio & Joanne Luciano
69. Agenda
About Siderean Software & Predictive
Medicine, Inc.
Introducing Seamark Navigation Server v.3.6
Seamark & Oracle 10g RDF Data Model
Demonstration of Seamark / Oracle 10g
integration
Lessons Learned / Q&A
70. About Siderean Software
Aggregate, organize and navigate information the way
users think, to improve analysis and decision making.
Founded in 2001 and based in El Segundo, CA
Venture-backed in 2004
Delivering RDF-centric navigation and analysis capabilities
for end users (a.k.a. “the last mile”)
Active W3C member leveraging Semantic Web standards
Demonstrating integrated Seamark navigation layer over
Oracle 10g RDF Data Model in collaboration with
Predictive Medicine, Inc.
71. Current solutions
“50,000 results!!! Now what?”
“I give up! Hello? Get me an apple!”
“Why do I get oranges when I’m looking for apples?”
IT: “As soon as I fix his, hers stops working.”
CONTENT PRODUCER: “I just produced three apples last week!”
Enterprise search: a brute force approach
Knowledge management: breathtakingly expensive
72. Introducing Seamark Navigation Server
“I can see the big picture!”
“No more staring at a blank text box.”
“I can drill down quickly to what I want.”
IT: “I can take my coffee break now.”
CONTENT PRODUCER: “I knew we had an apple in here somewhere.”
Seamark – layering organization to deliver pinpoint navigation
73. How it works: process
Metadata about data and content is aggregated…
Organized into a unified information architecture…
Analyzed to generate on-demand views…
Providing pinpoint navigation across the data and content
[Figure labels: Term, Person, Text, Place, Event, View]
74. How it works: architecture
[Architecture diagram; labeled components:]
Unstructured Content and Data Feeds
Structured Content Sources
Search Engines
Metadata Aggregator
Navigation Metadata
Navigation Web Services
User Navigation and User Tagging
Web Browsers & Portals
User Alerts
Feed Aggregators
75. Seamark/Oracle integration
architecture: Phase 1
[Architecture diagram; labeled components:]
Oracle 10g RDF Data Model for scalable persistence of metadata
Batch RDFMatch Query issued from Seamark at index time
Cached Navigation Metadata
Navigation Web Services
User Navigation and User Tagging
Web Browsers & Portals
User Alerts
Feed Aggregators
76. Seamark/Oracle integration
architecture: Phase 2
[Architecture diagram; labeled components:]
Oracle 10g RDF Data Model for scalable persistence of metadata
Federated RDFMatch Queries issued from Seamark at query time
Dynamic Navigation Metadata
Navigation Web Services
User Navigation and User Tagging
Web Browsers & Portals
User Alerts
Feed Aggregators
77. Seamark Demo: Background & Concepts
Life Sciences demonstration premise
RDF offers high value during early stage research
Leveraging strengths of Oracle 10g & Seamark v3.6
Oracle – large datasets / scalability
Seamark – useful subsets / flexible navigation & insights
Project elapsed time - about one week
Locating and identifying data sources represented the
greatest time element
Data sources in RDF required minimal integration time
Non-RDF data sources required transformation and linking
values (non-trivial but straightforward)
78. Seamark Demonstration: Identification of new drug candidates
1. Differentiate different forms of disease
2. Identify patient subgroups
3. Identify top biomarkers
4. Identify function
5. Identify biological and chemical properties and disease associations of biomarker
6. Identify documents
7. Identify role in metabolic pathways
8. Identify compounds that interact
9. Identify and compare function in other organisms
10. Identify any prior art
[Diagram: data sources] GO2Keyword.rdf, Keywords.rdf, ProbeSet.rdf, GO2UniProt.rdf, GO2OMIM.rdf, OMIM.rdf, IntAct.rdf, GO.rdf, GO2Enzyme.rdf, UniProt.rdf, Taxonomy.rdf, PubMed.xml, Enzymes.rdf, KEGG.rdf
[Diagram: linked entities] Keyword, Probe, Protein, Gene, MIM Id, Enzyme, Organism, Citation, Compound, Pathway
84. Cellular Location Via Gene Ontology
Cytoplasm: 1st of 9 Matches
Plasma Membrane, …: 2nd of 9 Matches
Page Scroll for more results, etc.
85. Start Page: Optionally search across entire collection based upon
keywords from the integrated data sources
86. Seamark Lessons Learned
RDF offers multiple unconstrained views of
data/relationships
– Provides maximum flexibility during early stage research
– Later stages can leverage OWL to constrain known
relationships
Data providers – Timing is right to publish in RDF format
– Cut your customer’s integration costs
– Speed discovery time
Even with one week of effort…
– Proof of Concept demonstrates value of broad & deep
integration
– Additional value in extending POC in customer pilot initiatives
87. Siderean Seamark Conclusion
Getting the precise
information we need from
today’s data glut is
profoundly difficult
Solving this problem
requires a solution that
works the way you think
Siderean is the world’s first
turnkey navigation server
for the enterprise and
people at large
88. Thank You!
To arrange a demonstration of Seamark or
for more information please contact:
Mike DiLascio
Office: +1 781 652 0339
Mobile: +1 781 354 7663
mdilascio@siderean.com
Siderean Software, Inc.
390 North Sepulveda Blvd., Suite 2070
El Segundo, CA 90245-4475 USA
http://www.siderean.com