1. Mini-course on Linked Data. Oscar Corcho, Asunción Gómez Pérez ({ocorcho, asun}@fi.upm.es), Universidad Politécnica de Madrid. Florianópolis, September 1st 2010 (3rd OntoBras 2010). Credits: Raúl García Castro, Oscar Muñoz, Jose Angel Ramos Gargantilla, María del Carmen Suárez de Figueroa, Boris Villazón, Alex de León, Víctor Saquicela, Luis Vilches, Miguel Angel García, Manuel Salvadores, Juan Sequeda, Carlos Ruiz Moreno and many others. Work distributed under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 license.
2. Contents: Introduction to Linked Data; Linked Data Foundations: RDF, RDF Schema, SPARQL and OWL; Coffee break; Linked Data publication (methodological guidelines for Linked Data publication, RDB2RDF tools, technical aspects of Linked Data publication); Linked Data consumption
3. What is the Web of Linked Data? An extension of the current Web… where information and services are given well-defined and explicitly represented meaning, so that it can be shared and used by humans and machines, better enabling them to work in cooperation. How? By promoting information exchange through tagging web content with machine-processable descriptions of its meaning, together with the technologies and infrastructure to do this, and clear principles on how to publish data.
4. What is Linked Data? Linked Data is a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF. In short: it is part of the Semantic Web; it is about exposing, sharing and connecting data; its technologies are URIs and RDF (although others are also important).
5. The four principles (Tim Berners-Lee, 2006): (1) Use URIs as names for things. (2) Use HTTP URIs so that people can look up those names. (3) When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). (4) Include links to other URIs, so that they can discover more things. http://www.w3.org/DesignIssues/LinkedData.html http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
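Principles 2 and 3 are commonly put into practice with HTTP content negotiation: a client asks for an RDF representation in the `Accept` header when it looks up a URI. The following is a minimal sketch using only the JDK's `java.net.http` client (Java 11+). The class name, the preferred media types and their q-values are assumptions chosen for illustration, not something prescribed by the slide.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: look up a Linked Data URI, asking for RDF (principles 2 and 3).
final class LinkedDataLookup {
    // Preferred serialisations, Turtle first; the q-values are an assumption.
    static final String RDF_ACCEPT =
        "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.1";

    // Builds (but does not send) a GET request that negotiates an RDF representation.
    static HttpRequest buildRequest(String uri) {
        return HttpRequest.newBuilder(URI.create(uri))
            .header("Accept", RDF_ACCEPT)
            .GET()
            .build();
    }

    // Sends the request, following redirects (e.g. a 303 See Other from the URI of a
    // "thing" to the document that describes it), and returns the response body.
    static String lookUp(String uri) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpResponse<String> response =
            client.send(buildRequest(uri), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

For example, `LinkedDataLookup.lookUp("http://dbpedia.org/resource/Madrid")` would ask a Linked Data server for Turtle first. Which formats a given server actually returns depends on that server.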
24. How should we publish data? Formats in which data is published nowadays: XML, HTML, DBs, APIs, CSV, XLS, … However, from a Web of Data point of view their main limitations are that the data is difficult to integrate, and that pieces of data are not linked to each other, as Web documents are.
25. Which format do we use then? RDF (Resource Description Framework). Data model based on triples: subject, predicate, object. <Oscar> <lives in> <Madrid>; <Madrid> <is the capital of> <Spain>; <Spain> <is champion of> <Football World Cup>; … Serialised in different formats: RDF/XML, RDFa, N3, Turtle, JSON…
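The triple data model is small enough to sketch directly. The following is a plain-Java illustration (no RDF library) that represents the slide's three example statements and serialises them in N-Triples, one of the line-based RDF formats. The `http://example.org/` namespace and the English property names are hypothetical stand-ins for the informal names on the slide.

```java
import java.util.List;
import java.util.stream.Collectors;

// One RDF statement; in this sketch all three terms are URIs (no literals or blank nodes).
final class SimpleTriple {
    // Hypothetical namespace standing in for the slide's informal names.
    static final String EX = "http://example.org/";

    final String subject;
    final String predicate;
    final String object;

    SimpleTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // N-Triples syntax: each URI in angle brackets, statement terminated by " ."
    String toNTriples() {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    // The three example statements from the slide.
    static List<SimpleTriple> example() {
        return List.of(
            new SimpleTriple(EX + "Oscar", EX + "livesIn", EX + "Madrid"),
            new SimpleTriple(EX + "Madrid", EX + "isCapitalOf", EX + "Spain"),
            new SimpleTriple(EX + "Spain", EX + "isChampionOf", EX + "FootballWorldCup"));
    }

    // Serialises a graph as one N-Triples line per statement.
    static String serialise(List<SimpleTriple> graph) {
        return graph.stream().map(SimpleTriple::toNTriples).collect(Collectors.joining("\n"));
    }
}
```

Real data mixes URIs with literals and blank nodes. Libraries such as Jena model those as distinct node types rather than plain strings.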
26. URIs (Uniform Resource Identifiers). Two types of identifiers can be used to identify Linked Data resources. URIrefs (URI references): a URI and an optional fragment identifier separated from the URI by the hash symbol '#', e.g., http://www.ontology.org/people#Person, abbreviated as people:Person. Plain URIs can also be used, as in FOAF: http://xmlns.com/foaf/0.1/Person
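The two identifier styles, and the prefix:localName abbreviation used throughout these slides, can be illustrated with the JDK's `java.net.URI`. The following is a sketch. The helper names are hypothetical, and the prefix map is supplied by the caller, as it would be by PREFIX or @prefix declarations.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical helpers for URIrefs: fragment extraction and prefix abbreviation.
final class UriRefs {
    // Returns the fragment identifier (the part after '#'), or null for a plain URI.
    static String fragment(String uriRef) {
        return URI.create(uriRef).getFragment();
    }

    // Abbreviates a URI to prefix:localName using the first namespace that matches.
    // The local part is not checked for validity as a prefixed name.
    static String abbreviate(String uri, Map<String, String> prefixes) {
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String namespace = e.getValue();
            if (uri.startsWith(namespace)) {
                return e.getKey() + ":" + uri.substring(namespace.length());
            }
        }
        return "<" + uri + ">"; // no known namespace: keep the full URI
    }
}
```

With `people` bound to `http://www.ontology.org/people#` and `foaf` bound to `http://xmlns.com/foaf/0.1/`, both slide examples abbreviate to `people:Person` and `foaf:Person`. Only the first one has a fragment identifier.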
27. How do we publish Linked Data? Exposing relational databases or other similar formats as Linked Data: D2R, Triplify, R2O, NOR2O, Virtuoso, Ultrawrap, … Using native RDF triplestores: Sesame, Jena, OWLIM, Talis Platform, … Incorporating it in the form of RDFa in CMSs like Drupal.
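The basic idea behind exposing a relational table as Linked Data is that each row becomes a resource with its own URI, and each column becomes a property of that resource. The following is a deliberately naive sketch of that idea. It is not the mapping approach of D2R, Triplify or any other tool listed above, which use configurable mapping languages. The base URI, the URI patterns and the sample row are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one table row -> N-Triples, row URI from the key column,
// one predicate per remaining column, cell values as plain literals.
final class RowToRdf {
    // Escapes backslashes and double quotes for an N-Triples string literal.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    static List<String> rowToNTriples(String baseUri, String table, String keyColumn,
                                      Map<String, String> row) {
        // Assumes key values are already safe to use inside a URI.
        String subject = "<" + baseUri + table + "/" + row.get(keyColumn) + ">";
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> cell : row.entrySet()) {
            if (cell.getKey().equals(keyColumn)) continue; // the key is already in the URI
            String predicate = "<" + baseUri + table + "#" + cell.getKey() + ">";
            lines.add(subject + " " + predicate + " \"" + escape(cell.getValue()) + "\" .");
        }
        return lines;
    }
}
```

Passing a `LinkedHashMap` keeps the generated triples in column order.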
28. How do we consume Linked Data? Linked Data browsers, to explore things and datasets and to navigate between them: Tabulator Browser (MIT, USA), Marbles (FU Berlin, DE), OpenLink RDF Browser (OpenLink, UK), Zitgist RDF Browser (Zitgist, USA), Disco Hyperdata Browser (FU Berlin, DE), Fenfire (DERI, Ireland). Linked Data mashups, sites that mash up (that is, combine) Linked Data: Revyu.com (KMI, UK), DBtune Slashfacet (Queen Mary, UK), DBpedia Mobile (FU Berlin, DE), Semantic Web Pipes (DERI, Ireland). Search engines, to search for Linked Data: Falcons (IWS, China), Sindice (DERI, Ireland), MicroSearch (Yahoo, Spain), Watson (Open University, UK), SWSE (DERI, Ireland), Swoogle (UMBC, USA). Listing on this slide by T. Heath, M. Hausenblas, C. Bizer, R. Cyganiak, O. Hartig.
41. Linked Data Mashup (data.gov) Clean Air Status and Trends (CASTNET) http://data-gov.tw.rpi.edu/demo/exhibit/demo-8-castnet.php
42. Linked Data in the UK. Education: http://education.data.gov.uk/id/school/106661. Parliament: http://parliament.psi.enakting.org/id/member/1227. Maps, e.g., London: http://data.ordnancesurvey.co.uk/id/7000000000041428, http://map.psi.enakting.org. Transport: http://www.dft.gov.uk/naptan/. SameAs service: http://www.sameas.org. Challenges: http://gov.tso.co.uk/openup/sparql/gov-transport
43. Linked Data Mashup (data.gov.uk) Research Funding Explorer http://bis.clients.talis.com/
48. Linked Data Mashup (Water quality). Water quality in Asturias' beaches: http://datos.fundacionctic.org/sandbox/asturias/playas/
51. RDF: Resource Description Framework. W3C Recommendation. RDF is a graph-based formalism (+ XML syntax + semantics) for representing metadata and for describing the semantics of information in a machine-accessible way. Resources are described in terms of properties and property values using RDF statements. Statements are represented as triples, consisting of a subject, a predicate and an object: [S, P, O]. [Figure: example RDF graph: oeg:Oscar person:hasName "Oscar Corcho García"; oeg:Oscar person:hasColleague oeg:Asun; oeg:Raul person:hasColleague oeg:Asun; oeg:Asun person:hasHomePage "http://www.fi.upm.es/"]
52. RDF and URIs. RDF uses URIrefs (URI references) to identify resources. A URIref consists of a URI and an optional fragment identifier separated from the URI by the hash symbol '#'. Example: http://www.co-ode.org/people#hasColleague, abbreviated as coode:hasColleague. A set of URIrefs is known as a vocabulary, e.g.: the RDF vocabulary, the set of URIrefs used in describing the RDF concepts: rdf:Property, rdf:type, rdf:Statement, etc.; the RDFS vocabulary, the set of URIrefs used in describing the RDF Schema language: rdfs:Class, rdfs:Resource, rdfs:domain, etc.; the 'Pizza Ontology' vocabulary: pz:hasTopping, pz:Pizza, pz:VegetarianPizza, etc.
53. RDF Serialisations. Normative: RDF/XML (www.w3.org/TR/rdf-syntax-grammar/). Alternatives (for human consumption): N3 (http://www.w3.org/DesignIssues/Notation3.html), Turtle (http://www.dajobe.org/2004/01/turtle/), TriX (http://www.w3.org/2004/03/trix/), … Important: each RDF serialisation allows different syntactic variants of the same graph; e.g., the order of RDF statements has no meaning.
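Statement order carries no meaning because an RDF graph is a set of triples. The following plain-Java sketch makes that concrete: it compares two N-Triples documents as sets of statements. It assumes one statement per line in a canonical whitespace form. Real parsers handle much more, and graphs containing blank nodes need an isomorphism check rather than plain set equality.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: two serialisations listing the same statements in a different order
// describe the same graph, because a graph is a set of triples.
final class GraphEquality {
    // Assumes one statement per line with canonical whitespace (no real parsing).
    static Set<String> asSet(String nTriples) {
        Set<String> statements = new HashSet<>();
        for (String line : nTriples.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) statements.add(trimmed);
        }
        return statements;
    }

    static boolean sameGraph(String a, String b) {
        return asSet(a).equals(asSet(b));
    }
}
```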
61. Exercise 1.a. Create a graph from a file: open the file StickyNote_PureRDF.rdf, create the corresponding graph from it, and compare your graph with those of your colleagues.
63. Exercise 1.b. Create files from a graph: transform the following graph into N3 syntax. [Figure: graph with the nodes Class01, Sensor029, Measurement8401, Computer101 and User10A, the literals 29, 2010-06-12T12:00:12 and Pedro, and the edges includes, hasMeasurement, hasTemperature, atTime, hasOwner and hasName]
64. Blank nodes: structured property values. Most real-world data involves structures that are more complicated than simple triples with literal values. In RDF/XML, a blank node is an <rdf:Description> node with no rdf:about. In N3, it is a resource identifier that starts with '_:', e.g., "_:nodeX". This intermediate node does not need to have a name (URI). [Figure: oeg:Oscar person:hasName "Oscar Corcho García"; oeg:Oscar person:hasPostalAddress a blank node, which has address:hasStreetName "Campus de Montegancedo s/n" and address:city city:BoadillaDelMonte]
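The postal-address example can be written out as N-Triples, where the blank node appears as a `_:` label that is only meaningful within one document. The following plain-Java sketch generates those lines. The namespace URIs bound to `oeg:`, `person:`, `address:` and `city:` are hypothetical, since the slide does not give them.

```java
import java.util.List;

// Sketch of the slide's postal-address example in N-Triples; the address is a blank
// node labelled "_:address1". Namespace URIs are hypothetical placeholders.
final class BlankNodeExample {
    static final String OEG = "http://example.org/oeg#";
    static final String PERSON = "http://example.org/person#";
    static final String ADDRESS = "http://example.org/address#";
    static final String CITY = "http://example.org/city#";

    static List<String> triples() {
        String oscar = "<" + OEG + "Oscar>";
        String addr = "_:address1"; // local blank node label, not a global URI
        return List.of(
            oscar + " <" + PERSON + "hasName> \"Oscar Corcho Garc\u00eda\" .",
            oscar + " <" + PERSON + "hasPostalAddress> " + addr + " .",
            addr + " <" + ADDRESS + "hasStreetName> \"Campus de Montegancedo s/n\" .",
            addr + " <" + ADDRESS + "city> <" + CITY + "BoadillaDelMonte> .");
    }
}
```

In Turtle the same blank node could also be written inline with square brackets, `[ address:hasStreetName "..." ; address:city city:BoadillaDelMonte ]`, without any label.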
65. Typed literals. So far, all values have been presented as strings. XML Schema datatypes can be used to specify values (objects in some RDF triple statements). In RDF/XML, this is expressed as:
    <rdf:Description rdf:about="#Oscar">
      <person:hasBirthDate rdf:datatype="http://www.w3.org/2001/XMLSchema#date">1976-02-02</person:hasBirthDate>
    </rdf:Description>
In N3, this is expressed as:
    oeg:Oscar person:hasBirthDate "1976-02-02"^^xsd:date .
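A typed literal is a lexical form paired with a datatype URI, and it is the datatype that tells an application how to read the value. The following is a small plain-Java sketch that maps an `xsd:date` literal to `java.time.LocalDate`. The class is hypothetical, and only `xsd:date` without a time zone is handled.

```java
import java.time.LocalDate;

// Sketch: a typed literal = lexical form + datatype URI. Only xsd:date is interpreted.
final class TypedLiteral {
    static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    final String lexicalForm;
    final String datatype;

    TypedLiteral(String lexicalForm, String datatype) {
        this.lexicalForm = lexicalForm;
        this.datatype = datatype;
    }

    // N-Triples form: "lexical"^^<datatype URI>
    String toNTriples() {
        return "\"" + lexicalForm + "\"^^<" + datatype + ">";
    }

    // Value of an xsd:date literal; LocalDate.parse accepts plain yyyy-MM-dd,
    // not the optional time zone that xsd:date also allows.
    LocalDate asDate() {
        if (!datatype.equals(XSD + "date")) {
            throw new IllegalArgumentException("not an xsd:date literal: " + datatype);
        }
        return LocalDate.parse(lexicalForm);
    }
}
```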
66. RDF Containers. There is often the need to describe groups of things: a book was created by several authors, a lesson is taught by several persons, etc. RDF provides a container vocabulary: rdf:Bag, a group of resources or literals, possibly including duplicate members, where the order of members is not significant; rdf:Seq, a group of resources or literals, possibly including duplicate members, where the order of members is significant; rdf:Alt, a group of resources or literals that are alternatives (typically for a single value of a property). [Figure: oeg:Oscar person:hasEmailAddress a node of rdf:type rdf:Seq, whose members rdf:_1 and rdf:_2 are the literals "ocorcho@fi.upm.es" and "oscar.corcho@upm.es"]
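Container membership is expressed with the ordinal properties `rdf:_1`, `rdf:_2`, …, which is how an `rdf:Seq` records order. The following plain-Java sketch generates the triples for a sequence of literal members. The blank node label is hypothetical, and so is the order of the two e-mail addresses in the usage example below.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an rdf:Seq is a node of rdf:type rdf:Seq whose members are attached
// through the ordinal properties rdf:_1, rdf:_2, ... in the intended order.
final class RdfSeq {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns N-Triples for a Seq node (given as a blank node label) with literal members.
    static List<String> seq(String nodeLabel, List<String> members) {
        List<String> lines = new ArrayList<>();
        lines.add(nodeLabel + " <" + RDF + "type> <" + RDF + "Seq> .");
        for (int i = 0; i < members.size(); i++) {
            // Ordinal properties start at rdf:_1, not rdf:_0.
            lines.add(nodeLabel + " <" + RDF + "_" + (i + 1) + "> \"" + members.get(i) + "\" .");
        }
        return lines;
    }
}
```

For example, `RdfSeq.seq("_:emails", List.of("ocorcho@fi.upm.es", "oscar.corcho@upm.es"))` yields the type triple plus one `rdf:_n` triple per address. A separate triple would then link oeg:Oscar to `_:emails` through person:hasEmailAddress.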
67. RDF Reification. RDF statements about other RDF statements: "Raúl believes that Oscar's birthdate is on Feb 2nd, 1976 and that his e-mail address is ocorcho@fi.upm.es". RDF reification allows expressing beliefs (and other modalities), trust models, digital signatures, etc., and metadata about metadata. [Figure: oeg:Raúl modal:believes the statements oeg:Oscar person:hasBirthDate 02/02/1976 and oeg:Oscar person:hasEmailAddress "ocorcho@fi.upm.es"]
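With the standard reification vocabulary, a statement becomes a resource of type `rdf:Statement`, described by `rdf:subject`, `rdf:predicate` and `rdf:object`. Another statement (here, Raúl's belief) can then point to that resource. The following plain-Java sketch produces the four reification triples. The namespaces used in the example call for `oeg:`, `person:` and `modal:` are hypothetical.

```java
import java.util.List;

// Sketch of RDF reification: describe the statement (s p "o") as a resource of type
// rdf:Statement, so that other statements can refer to it.
final class Reification {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // stmt: node for the statement (e.g. a blank node label); s and p in N-Triples form;
    // oLiteral: the literal object, unquoted.
    static List<String> reify(String stmt, String s, String p, String oLiteral) {
        return List.of(
            stmt + " <" + RDF + "type> <" + RDF + "Statement> .",
            stmt + " <" + RDF + "subject> " + s + " .",
            stmt + " <" + RDF + "predicate> " + p + " .",
            stmt + " <" + RDF + "object> \"" + oLiteral + "\" .");
    }
}
```

A further triple such as `<http://example.org/oeg#Raul> <http://example.org/modal#believes> _:st1 .` then expresses the belief. Reifying a statement does not assert it: the original triple holds in the graph only if it is also stated directly.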
68. Main value of a structured value. Sometimes one of the values of a structured value is the main one: "The weight of an item is 2.4 kilograms". The most important value is 2.4, which is expressed with rdf:value. rdf:value is scarcely used. [Figure: product:Item1 product:hasWeight a node that has rdf:value 2.4 and units:hasWeightUnit units:kilogram]
70. RDF inference. Graph matching techniques. RDF inference is based on graph matching techniques. Basically, the RDF inference process consists of the following steps: (1) transform an RDF query into a template graph that has to be matched against the RDF graph; the template contains constant and variable nodes, and constant and variable edges between nodes; (2) match it against the RDF graph, taking into account constant nodes and edges; (3) provide a solution for the variable nodes and edges.
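The three steps above can be sketched in plain Java. A template graph is a list of triple patterns in which terms starting with `?` are variables (nodes or edges). It is matched against the data graph by backtracking, and each consistent set of variable bindings is one solution. Terms are plain strings in prefixed form. This is an illustrative sketch, not how any particular RDF engine is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of query answering by graph matching: match each triple pattern of the
// template against the data graph, extending variable bindings by backtracking.
final class GraphMatcher {
    static List<Map<String, String>> match(List<String[]> graph, List<String[]> template) {
        List<Map<String, String>> solutions = new ArrayList<>();
        extend(graph, template, 0, new HashMap<>(), solutions);
        return solutions;
    }

    // Tries every data triple for template pattern i; recurses on success.
    private static void extend(List<String[]> graph, List<String[]> template, int i,
                               Map<String, String> bindings, List<Map<String, String>> out) {
        if (i == template.size()) {
            out.add(new HashMap<>(bindings)); // all patterns matched: one solution
            return;
        }
        for (String[] triple : graph) {
            Map<String, String> candidate = new HashMap<>(bindings);
            if (unify(template.get(i), triple, candidate)) {
                extend(graph, template, i + 1, candidate, out);
            }
        }
    }

    // Constant nodes/edges must be equal; variables are bound or must agree with a binding.
    private static boolean unify(String[] pattern, String[] triple, Map<String, String> b) {
        for (int k = 0; k < 3; k++) {
            String term = pattern[k];
            if (term.startsWith("?")) {
                String bound = b.putIfAbsent(term, triple[k]);
                if (bound != null && !bound.equals(triple[k])) return false;
            } else if (!term.equals(triple[k])) {
                return false;
            }
        }
        return true;
    }
}
```

On the sample graph of the RDF slides, the template `?x person:hasColleague oeg:Asun` yields two solutions (Oscar and Raúl). The two-pattern template `oeg:Oscar person:hasColleague ?c . ?c person:hasHomePage ?h` yields Asun's homepage.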
71. RDF inference. Examples (I). Sample RDF graph: the one from the RDF introduction (oeg:Oscar person:hasName "Oscar Corcho García"; oeg:Oscar and oeg:Raúl both person:hasColleague oeg:Asun; oeg:Asun person:hasHomePage "http://www.fi.upm.es/"). Query: "Tell me who the persons are who have Asun as a colleague", i.e., the template graph ? person:hasColleague oeg:Asun. Result: oeg:Oscar and oeg:Raúl.
72. RDF inference. Examples (II). Query: "Tell me which are the relationships between Oscar and Asun" (template graph: oeg:Oscar ? oeg:Asun). Result: person:hasColleague. Query: "Tell me the homepage of Oscar's colleagues" (template graph: oeg:Oscar person:hasColleague ?, followed by person:hasHomePage ?). Result: "http://www.fi.upm.es/"
75. RDFS: RDF Schema. W3C Recommendation. RDF Schema extends RDF to enable talking about classes of resources, and the properties to be used with them. Class definition: rdfs:Class, rdfs:subClassOf. Property definition: rdfs:subPropertyOf, rdfs:range, rdfs:domain. Other primitives: rdfs:comment, rdfs:label, rdfs:seeAlso, rdfs:isDefinedBy. The RDFS vocabulary adds constraints on models, e.g.: $\forall x, y, z:\ \mathit{type}(x, y) \wedge \mathit{subClassOf}(y, z) \rightarrow \mathit{type}(x, z)$. [Figure: ex:Oscar rdf:type ex:Person; ex:Person rdfs:subClassOf ex:Animal; hence ex:Oscar rdf:type ex:Animal]
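The rule above can be applied mechanically by forward chaining: keep adding entailed triples until nothing new appears. The following plain-Java sketch implements just that rule, plus the transitivity of `rdfs:subClassOf`, which RDFS also entails. It is not a complete RDFS reasoner. Triples are `"s p o"` strings in prefixed form, and terms are assumed to contain no spaces.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of two RDFS entailments applied to a fixpoint:
//   (x rdf:type y), (y rdfs:subClassOf z)        => (x rdf:type z)
//   (x rdfs:subClassOf y), (y rdfs:subClassOf z) => (x rdfs:subClassOf z)
final class RdfsClosure {
    static Set<String> closure(Set<String> triples) {
        Set<String> result = new HashSet<>(triples);
        boolean changed = true;
        while (changed) { // repeat until no rule adds a new triple
            changed = false;
            for (String t1 : List.copyOf(result)) {
                String[] a = t1.split(" ");
                for (String t2 : List.copyOf(result)) {
                    String[] b = t2.split(" ");
                    if (!b[1].equals("rdfs:subClassOf") || !a[2].equals(b[0])) continue;
                    if (a[1].equals("rdf:type") || a[1].equals("rdfs:subClassOf")) {
                        changed |= result.add(a[0] + " " + a[1] + " " + b[2]);
                    }
                }
            }
        }
        return result;
    }
}
```

Starting from `ex:Oscar rdf:type ex:Person` and `ex:Person rdfs:subClassOf ex:Animal`, the closure adds `ex:Oscar rdf:type ex:Animal`, as in the figure.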
85. Exercise 2.a. Create a graph from a file: open the files StickyNote.rdf and StickyNote.rdfs, create the corresponding graph from them, and compare your graph with those of your colleagues.
88. Exercise 2.b. Create files from a graph: transform the following graph into N3 syntax. [Figure: graph with the classes Room, Person, Measurement and Object, the nodes Class01, Sensor029, Computer101 and User10A, the literals 29, 2010-06-12T12:00:12 and Pedro, and the edges includes, hasMeasurement, hasTemperature, atTime, hasOwner and hasName]
92. RDF(S) limitations. RDFS is too weak to describe resources in sufficient detail. No localised range and domain constraints: we can't say that the range of hasChild is person when applied to persons and elephant when applied to elephants. No existence/cardinality constraints: we can't say that all instances of person have a mother that is also a person, or that persons have exactly 2 parents. No boolean operators: we can't say or, not, etc. No transitive, inverse or symmetrical properties: we can't say that isPartOf is a transitive property, that hasPart is the inverse of isPartOf or that touches is symmetrical. Reasoning support is difficult to provide: there are no "native" reasoners for non-standard semantics, although it may be possible to reason via an FOL axiomatisation.
96. Given a scenario description, build a simple ontology in RDF Schema.
97. Exercise 3. Domain description. A place can be a place of interest. Places of interest can be tourist sites or establishments, but not both at the same time. Tourist sites can be palaces, churches, hermitages and cathedrals. Establishments can be hotels, guesthouses (hostales) or hostels (albergues). A place is located in a locality, which in turn can be a town (villa), a village or a city. A place of interest has a postal address that includes its street and its number. Localities have a number of inhabitants. Localities are located in provinces. Covarrubias is a village with 634 inhabitants in the province of Burgos. The restaurant "El Galo" is located in Covarrubias, at calle Mayor, number 5. One of the churches of Covarrubias is in calle de Santo Tomás.
100. Sample RDF APIs. There are RDF libraries for different languages: Java, Python, C, C++, C#, .Net, Javascript, Tcl/Tk, PHP, Lisp, Obj-C, Prolog, Perl, Ruby, Haskell (list in http://esw.w3.org/topic/SemanticWebTools). They are usually related to an RDF repository. Multilanguage: Redland RDF Application Framework (C, Perl, PHP, Python and Ruby): http://www.redland.opensource.ac.uk/ ; Java: Jena: http://jena.sourceforge.net/ ; Sesame: http://www.openrdf.org/ ; PHP: RAP (RDF API for PHP): http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/ ; Python: RDFLib: http://rdflib.net/ ; Pyrple: http://infomesh.net/pyrple/
101. Jena. Java framework for building Semantic Web applications. Open source software from HP Labs. The Jena framework includes: an RDF API; an OWL API; reading and writing RDF in RDF/XML, N3 and N-Triples; in-memory and persistent storage; a rule-based inference engine; a SPARQL query engine.
102. Sesame. A framework for storage, querying and inferencing of RDF and RDF Schema: a Java library for handling RDF; a database server for (remote) access to repositories of RDF data; highly expressive query and transformation languages (SeRQL, SPARQL); various backends (native store; RDBMS: MySQL, Oracle 10, DB2, PostgreSQL; main memory); reasoning support (RDF Schema reasoner; OWL DLP (OWLIM); domain reasoning with a custom rule engine).
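103. Jena example. Graph creation. [Figure: the resource http://.../JohnSmith with vcard:FN "John Smith" and vcard:N pointing to a node with vcard:Given "John" and vcard:Family "Smith"]
    // Jena 3.x+ package names; 2010-era releases used com.hp.hpl.jena.*
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.VCARD;

    public class JenaGraphCreation {
        public static void main(String[] args) {
            // some definitions
            String personURI = "http://somewhere/JohnSmith";
            String givenName = "John";
            String familyName = "Smith";
            String fullName = givenName + " " + familyName;
            // create an empty Model
            Model model = ModelFactory.createDefaultModel();
            // create the resource and add the properties cascading style;
            // vcard:N points to a blank node holding the structured name
            Resource johnSmith = model.createResource(personURI)
                .addProperty(VCARD.FN, fullName)
                .addProperty(VCARD.N, model.createResource()
                    .addProperty(VCARD.Given, givenName)
                    .addProperty(VCARD.Family, familyName));
        }
    }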
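104. Jena example. Read and write
    // Written against Jena 3.x (org.apache.jena.*); 2010-era releases used com.hp.hpl.jena.*
    import java.io.InputStream;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.util.FileManager;

    public class JenaReadWrite {
        public static void main(String[] args) {
            String inputFileName = args[0]; // path of the RDF/XML file to read
            // create an empty model
            Model model = ModelFactory.createDefaultModel();
            // use the FileManager to find the input file
            InputStream in = FileManager.get().open(inputFileName);
            if (in == null) {
                throw new IllegalArgumentException("File not found: " + inputFileName);
            }
            // read the RDF/XML file (empty base URI)
            model.read(in, "");
            // write it to standard out (RDF/XML by default)
            model.write(System.out);
        }
    }
Output:
    <rdf:RDF
      xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
      xmlns:vcard='http://www.w3.org/2001/vcard-rdf/3.0#' >
      <rdf:Description rdf:nodeID="A0">
        <vcard:Family>Smith</vcard:Family>
        <vcard:Given>John</vcard:Given>
      </rdf:Description>
      <rdf:Description rdf:about='http://somewhere/JohnSmith/'>
        <vcard:FN>John Smith</vcard:FN>
        <vcard:N rdf:nodeID="A0"/>
      </rdf:Description>
      ...
    </rdf:RDF>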
108. RDF(S) query languages. Languages developed to allow accessing datasets expressed in RDF(S) (and in some cases OWL). They are supported by the most important language APIs: Jena (HP Labs), Sesame (Aduna), Boca (IBM), ... There are some differences with respect to languages like SQL, such as the combination of different sources, trust management, and the Open World Assumption. [Figure: applications send SQL queries to relational DBs, and SPARQL, RQL, etc. queries to RDF(S)/OWL data]
109. Query types.
Selection and extraction: "Select all the essays, together with their authors and their authors' names"; "Select everything that is related to the book 'Bellum Civile'".
Reduction (we specify what should not be returned): "Select everything except for the ontological information and the book translators".
Restructuring (the original structure is changed in the final result): "Invert the relationship 'author' into 'is author of'".
Aggregation: "Return all the essays together with the mean number of authors per essay".
Combination and inferences: "Combine the information of a book called 'La guerra civil' and whose author is Julius Caesar with the book whose identifier is 'Bellum Civile'"; "Select all the essays, together with their authors and author names", including also the instances of the subclasses of Essay; "Obtain the relationship 'coauthor' among persons who have written the same book".
110. RDF(S) query language families. [Figure: query languages classified by query semantics, query structure and query syntax, and by data model (description graphs, triple database, XML repository)]
SPARQL: W3C Recommendation, 15 January 2008.
SquishQL family: SquishQL, rdfDB Query Language, RDQL, BRQL, TriQL.
XPath, XSLT, XQuery family: XQuery for RDF, XsRQL, TreeHugger and RDFTwig, RDFT, Nexus Query Language, RDFPath, RPath and RXPath, Versa.
RQL family: RQL, SeRQL, eRQL.
Controlled natural language: Metalog.
Other: Algae, iTQL, N3QL, PerlRDF Query Language, R-DEVICE Deductive Language, RDFQBE, RDFQL, TRIPLE, WQL.
111. SPARQL. SPARQL Protocol and RDF Query Language. Supported by Jena, Sesame, IBM Boca, etc. Features: it supports most of the aforementioned queries; it supports datatype reasoning (datatypes can be requested instead of actual values); the domain vocabulary and the knowledge representation vocabulary are treated differently by the query interpreters; it allows making queries over properties with multiple values, over multiple properties of a resource and over reifications; queries can contain optional statements; some implementations support aggregation queries. Limitations: neither set operations nor existential or universal quantifiers can be included in the queries; it does not support recursive queries.
112. SPARQL is also a protocol. SPARQL is a query language… Find names and websites of contributors to PlanetRDF:
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?website
    FROM <http://planetrdf.com/bloggers.rdf>
    WHERE {
      ?person foaf:weblog ?website .
      ?person foaf:name ?name .
      ?website a foaf:Document
    }
… and a protocol:
    http://.../qps?query-lang=http://www.w3.org/TR/rdf-sparql-query/&graph-id=http://planetrdf.com/bloggers.rdf&query=PREFIX foaf: <http://xmlns.com/foaf/0.1/...
Services running SPARQL queries over a set of graphs; a transport protocol for invoking the service, based on ideas from earlier protocol work such as Joseki; the service is described with Web Service technologies.
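In the W3C SPARQL Protocol, a query sent with HTTP GET is URL-encoded into the `query` parameter. An optional `default-graph-uri` parameter names the dataset. The slide's `query-lang` and `graph-id` parameters come from earlier protocol work. The following is a minimal JDK-only sketch that builds such a request. The endpoint URL is hypothetical, and asking for `application/sparql-results+json` (the SPARQL JSON results format) is one reasonable choice among several.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Sketch of a SPARQL Protocol query via HTTP GET (request is built, not sent).
final class SparqlGet {
    // endpoint: e.g. a hypothetical http://example.org/sparql; defaultGraph may be null.
    static URI queryUri(String endpoint, String query, String defaultGraph) {
        String url = endpoint + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
        if (defaultGraph != null) {
            url += "&default-graph-uri=" + URLEncoder.encode(defaultGraph, StandardCharsets.UTF_8);
        }
        return URI.create(url);
    }

    // Asks for results in the SPARQL JSON results format.
    static HttpRequest request(String endpoint, String query, String defaultGraph) {
        return HttpRequest.newBuilder(queryUri(endpoint, query, defaultGraph))
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
    }
}
```

`URLEncoder` uses form encoding, so spaces become `+`, which SPARQL endpoints accept in query strings. Long queries are usually sent with POST instead, to avoid URL length limits.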
113. SPARQL Endpoints. SPARQL protocol services that enable users (human or other) to query a knowledge base using SPARQL. Results are typically returned in one or more machine-processable formats. List of SPARQL endpoints: http://esw.w3.org/topic/SparqlEndpoints. Programmatic access using libraries: ARC, RAP, Jena, Sesame, Javascript SPARQL, PySPARQL, etc. Examples:
116. Each way a pattern can be matched yields a solution
117. The sequence of solutions is filtered by: projection, distinct, order, limit/offset
118. One of the result forms is applied: SELECT, CONSTRUCT, DESCRIBE, ASK
119. Graph patterns: Basic Graph Patterns, where a set of triple patterns must match; Group Graph Patterns, where a set of graph patterns must all match; Optional Graph Patterns, where additional patterns may extend the solution; Alternative Graph Patterns, where two or more possible patterns are tried; Patterns on Named Graphs, where patterns are matched against named graphs.
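An optional graph pattern never removes solutions. Every solution of the required part is kept, and where compatible solutions of the optional part exist, they extend it. This amounts to a left outer join on shared variables. The following is a plain-Java sketch over solutions represented as variable-to-value maps. It ignores FILTERs inside OPTIONAL, which SPARQL also defines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of OPTIONAL as a left outer join of solution sequences.
final class OptionalJoin {
    // Two solutions are compatible if they agree on every variable they share.
    static boolean compatible(Map<String, String> a, Map<String, String> b) {
        for (Map.Entry<String, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) return false;
        }
        return true;
    }

    static List<Map<String, String>> leftJoin(List<Map<String, String>> required,
                                              List<Map<String, String>> optional) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> r : required) {
            boolean extended = false;
            for (Map<String, String> o : optional) {
                if (compatible(r, o)) {
                    Map<String, String> merged = new HashMap<>(r);
                    merged.putAll(o);
                    out.add(merged);
                    extended = true;
                }
            }
            if (!extended) out.add(r); // no match: keep it, optional variables stay unbound
        }
        return out;
    }
}
```

For example, if only Alice has a mailbox, `?x` bound to Alice and to Bob both survive, and only Alice's solution gains a `?mbox` binding.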
128. Patterns on named graphs II
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?src ?bobNick
    FROM NAMED <http://example.org/foaf/aliceFoaf>
    FROM NAMED <http://example.org/foaf/bobFoaf>
    WHERE {
      GRAPH ?src {
        ?x foaf:mbox <mailto:bob@work.example> .
        ?x foaf:nick ?bobNick
      }
    }

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX data: <http://example.org/foaf/>
    SELECT ?nick
    FROM NAMED <http://example.org/foaf/aliceFoaf>
    FROM NAMED <http://example.org/foaf/bobFoaf>
    WHERE {
      GRAPH data:bobFoaf {
        ?x foaf:mbox <mailto:bob@work.example> .
        ?x foaf:nick ?nick
      }
    }
130. Value tests. Based on the XQuery 1.0 and XPath 2.0 Functions and Operators. XSD types: boolean, string, integer, decimal, float, double, dateTime. Notation: <, >, =, <=, >= and != for value comparison, which apply to any type. BOUND, isURI, isBLANK, isLITERAL. REGEX, LANG, DATATYPE, STR (lexical form). Function calls for casting and extension functions.
131. Solution sequences and modifiers.
    SELECT ?name WHERE { ?x foaf:name ?name ; :empId ?emp } ORDER BY ?name DESC(?emp)
Order modifier: put the solutions in order. Projection modifier: choose certain variables. Distinct modifier: ensure solutions in the sequence are unique. Reduced modifier: permit elimination of some non-unique solutions. Limit modifier: restrict the number of solutions. Offset modifier: control where the solutions start from in the overall sequence of solutions.
    SELECT ?name WHERE { ?x foaf:name ?name }
    SELECT DISTINCT ?name WHERE { ?x foaf:name ?name }
    SELECT REDUCED ?name WHERE { ?x foaf:name ?name }
    SELECT ?name WHERE { ?x foaf:name ?name } LIMIT 20
    SELECT ?name WHERE { ?x foaf:name ?name } ORDER BY ?name LIMIT 5 OFFSET 10
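SPARQL applies the modifiers in a fixed order: ORDER BY, then projection, then DISTINCT, then OFFSET and LIMIT. That order explains why, for example, OFFSET counts distinct rows. The following plain-Java sketch applies that pipeline to solutions represented as variable-to-value maps. As simplifying assumptions, values are compared as plain strings rather than by SPARQL's ordering rules, only one ascending sort key is supported, and unbound sort keys are not handled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of SPARQL solution modifiers applied in order:
// ORDER BY -> projection -> DISTINCT -> OFFSET/LIMIT.
final class SolutionModifiers {
    static List<List<String>> apply(List<Map<String, String>> solutions, String orderBy,
                                    List<String> projection, boolean distinct,
                                    int offset, int limit) {
        List<Map<String, String>> ordered = new ArrayList<>(solutions);
        if (orderBy != null) { // stable ascending sort on one variable
            ordered.sort(Comparator.comparing((Map<String, String> s) -> s.get(orderBy)));
        }
        // Projection: keep only the selected variables, in SELECT order.
        List<List<String>> rows = ordered.stream()
            .map(s -> projection.stream().map(s::get).collect(Collectors.toList()))
            .collect(Collectors.toList());
        if (distinct) {
            rows = new ArrayList<>(new LinkedHashSet<>(rows)); // keeps first occurrences, in order
        }
        return rows.stream().skip(offset).limit(limit).collect(Collectors.toList());
    }
}
```

With the names Carol, Alice, Bob and Alice, `ORDER BY ?name` with DISTINCT, OFFSET 1 and LIMIT 5 leaves Bob and Carol.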
132. SPARQL query forms. SELECT: returns all, or a subset of, the variables bound in a query pattern match. CONSTRUCT: returns an RDF graph constructed by substituting variables in a set of triple templates. ASK: returns a boolean indicating whether a query pattern matches or not. DESCRIBE: returns an RDF graph that describes the resources found.
137. Main References. Prud'hommeaux E, Seaborne A (2008) SPARQL Query Language for RDF. W3C Recommendation. http://www.w3.org/TR/rdf-sparql-query/ . SPARQL validator: http://www.sparql.org/validator.html . SPARQL implementations: http://esw.w3.org/topic/SparqlImplementations . SPARQL endpoints: http://esw.w3.org/topic/SparqlEndpoints . SPARQL endpoint of DBpedia: http://dbpedia.org/sparql
139. Description Logics. A family of logic-based Knowledge Representation formalisms, descendants of semantic networks and KL-ONE. They describe a domain in terms of concepts (classes), roles (relationships) and individuals. Specific languages are characterised by the constructors and axioms used to assert knowledge about classes, roles and individuals. Example: ALC (the least expressive language in DL that is propositionally closed), with constructors: boolean (and, or, not) and role restrictions. DLs are distinguished by: model-theoretic semantics; being decidable fragments of FOL, closely related to propositional modal and dynamic logics; the provision of inference services, with sound and complete decision procedures for key problems and implemented (highly optimised) systems.
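140. Structure of DL Ontologies. A DL ontology can be divided into two parts. Tbox (terminological KB): a set of axioms that describe the structure of a domain: $\mathit{Doctor} \sqsubseteq \mathit{Person}$; $\mathit{Person} \equiv \mathit{Man} \sqcup \mathit{Woman}$; $\mathit{HappyFather} \equiv \mathit{Man} \sqcap \forall \mathit{hasDescendant}.(\mathit{Doctor} \sqcup \exists \mathit{hasDescendant}.\mathit{Doctor})$. Abox (assertional KB): a set of axioms that describe a specific situation: $\mathit{John} : \mathit{HappyFather}$; $\mathit{hasDescendant}(\mathit{John}, \mathit{Mary})$.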
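141. Most common constructors in class definitions.
Intersection: $C_1 \sqcap \dots \sqcap C_n$, e.g. $\mathit{Human} \sqcap \mathit{Male}$
Union: $C_1 \sqcup \dots \sqcup C_n$, e.g. $\mathit{Doctor} \sqcup \mathit{Lawyer}$
Negation: $\neg C$, e.g. $\neg \mathit{Male}$
Nominals: $\{x_1\} \sqcup \dots \sqcup \{x_n\}$, e.g. $\{\mathit{john}\} \sqcup \dots \sqcup \{\mathit{mary}\}$
Universal restriction: $\forall P.C$, e.g. $\forall \mathit{hasChild}.\mathit{Doctor}$
Existential restriction: $\exists P.C$, e.g. $\exists \mathit{hasChild}.\mathit{Lawyer}$
Maximum cardinality: ${\leq} n\, P.C$, e.g. ${\leq} 3\, \mathit{hasChild}.\mathit{Doctor}$
Minimum cardinality: ${\geq} n\, P.C$, e.g. ${\geq} 1\, \mathit{hasChild}.\mathit{Male}$
Specific value: $\exists P.\{x\}$, e.g. $\exists \mathit{hasColleague}.\{\mathit{Matthew}\}$
Nesting of constructors can be arbitrarily complex: $\mathit{Person} \sqcap \forall \mathit{hasChild}.(\mathit{Doctor} \sqcup \exists \mathit{hasChild}.\mathit{Doctor})$
Lots of redundancy: $A \sqcup B$ is equivalent to $\neg(\neg A \sqcap \neg B)$; $\exists P.C$ is equivalent to $\neg \forall P.\neg C$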
152. OWL 2 (II). Three new profiles. OWL 2 EL: suited to ontologies that define very large numbers of classes and/or properties; ontology consistency, class expression subsumption and instance checking can be decided in polynomial time. OWL 2 QL: sound and complete query answering is in LOGSPACE (more precisely, in AC0) with respect to the size of the data (assertions); it provides many of the main features necessary to express conceptual models (UML class diagrams and ER diagrams); it contains the intersection of RDFS and OWL 2 DL. OWL 2 RL: inspired by Description Logic Programs and pD*; a syntactic subset of OWL 2 that is amenable to implementation using rule-based technologies, together with a partial axiomatization of the OWL 2 RDF-Based Semantics in the form of first-order implications that can be used as the basis for such an implementation; scalable reasoning without sacrificing too much expressive power; designed for OWL applications that trade the full expressivity of the language for efficiency, and for RDF(S) applications that need some added expressivity from OWL 2.
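153. OWL: Most common constructors (constructor: DL syntax; OWL primitive; example).
Intersection: $C_1 \sqcap \dots \sqcap C_n$; intersectionOf; $\mathit{Human} \sqcap \mathit{Male}$
Union: $C_1 \sqcup \dots \sqcup C_n$; unionOf; $\mathit{Doctor} \sqcup \mathit{Lawyer}$
Negation: $\neg C$; complementOf; $\neg \mathit{Male}$
Nominals: $\{x_1\} \sqcup \dots \sqcup \{x_n\}$; oneOf; $\{\mathit{john}\} \sqcup \dots \sqcup \{\mathit{mary}\}$
Universal restriction: $\forall P.C$; allValuesFrom; $\forall \mathit{hasChild}.\mathit{Doctor}$
Existential restriction: $\exists P.C$; someValuesFrom; $\exists \mathit{hasChild}.\mathit{Lawyer}$
Maximum cardinality: ${\leq} n\, P[.C]$; maxCardinality (qualified or not); ${\leq} 3\, \mathit{hasChild}[.\mathit{Doctor}]$
Minimum cardinality: ${\geq} n\, P[.C]$; minCardinality (qualified or not); ${\geq} 1\, \mathit{hasChild}[.\mathit{Male}]$
Exact cardinality: ${=} n\, P[.C]$; exactCardinality (qualified or not); ${=} 1\, \mathit{hasMother}[.\mathit{Female}]$
Specific value: $\exists P.\{x\}$; hasValue; $\exists \mathit{hasColleague}.\{\mathit{Matthew}\}$
Local reflexivity: (no DL column); hasSelf; $\mathit{Narcissist} \equiv \mathit{Person} \sqcap \mathit{hasSelf}(\mathit{loves})$
Keys: (no DL column); hasKey; hasKey(Person, passportNumber, country)
Subclass: $C_1 \sqsubseteq C_2$; subClassOf; $\mathit{Human} \sqsubseteq \mathit{Animal} \sqcap \mathit{Biped}$
Equivalence: $C_1 \equiv C_2$; equivalentClass; $\mathit{Man} \equiv \mathit{Human} \sqcap \mathit{Male}$
Disjointness: $C_1 \sqsubseteq \neg C_2$; disjointWith, AllDisjointClasses; $\mathit{Male} \sqsubseteq \neg \mathit{Female}$
Disjoint union: $C \equiv C_1 \sqcup \dots \sqcup C_n$ and $C_i \sqcap C_j \sqsubseteq \bot$ for all $i \neq j$; disjointUnionOf; Person DisjointUnionOf (Man, Woman)
Metaclasses and annotations on axioms are also valid in OWL 2, and declarations of classes have to be provided. The full list is available in the reference specifications and in the Quick Reference Guide: http://www.w3.org/2007/OWL/refcard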
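154. OWL: Most common constructors (constructor: DL syntax; OWL primitive; example).
Subproperty: $P_1 \sqsubseteq P_2$; subPropertyOf; $\mathit{hasDaughter} \sqsubseteq \mathit{hasChild}$
Equivalence: $P_1 \equiv P_2$; equivalentProperty; $\mathit{cost} \equiv \mathit{price}$
Disjoint properties: $P_1, \dots, P_n$ pairwise disjoint; disjointObjectProperties; hasDaughter, hasSon
Inverse: $P_1 \equiv P_2^-$; inverseOf; $\mathit{hasChild} \equiv \mathit{hasParent}^-$
Transitive: $P^+ \sqsubseteq P$; TransitiveProperty; $\mathit{ancestor}^+ \sqsubseteq \mathit{ancestor}$
Functional: $\top \sqsubseteq {\leq} 1\, P$; FunctionalProperty; $\top \sqsubseteq {\leq} 1\, \mathit{hasMother}$
Inverse functional: $\top \sqsubseteq {\leq} 1\, P^-$; InverseFunctionalProperty; $\top \sqsubseteq {\leq} 1\, \mathit{hasPassportID}^-$
Reflexive: ReflexiveProperty. Irreflexive: IrreflexiveProperty. Asymmetric: AsymmetricProperty.
Property chains: $P_1 \circ \dots \circ P_n \sqsubseteq P$; propertyChainAxiom; $\mathit{hasFather} \circ \mathit{hasBrother} \sqsubseteq \mathit{hasUncle}$
Individual equality: $\{x_1\} \equiv \{x_2\}$; sameAs (SameIndividual in OWL 2); {oeg:OscarCorcho} ≡ {img:Oscar}
Individual inequality: $\{x_1\} \sqsubseteq \neg \{x_2\}$; differentFrom, AllDifferent; $\{\mathit{john}\} \sqsubseteq \neg \{\mathit{peter}\}$
Negative property assertions: NegativeDataPropertyAssertion, e.g. {hasAge john 35}; NegativeObjectPropertyAssertion, e.g. {hasChild john peter}
Besides, top and bottom object and datatype properties exist.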
155. Basic Inference Tasks.
Subsumption: check that knowledge is correct (captures intuitions). Is C subsumed by D w.r.t. ontology O? (in every model $\mathcal{I}$ of O, $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$)
Equivalence: check that knowledge is minimally redundant (no unintended synonyms). Is C equivalent to D w.r.t. O? (in every model $\mathcal{I}$ of O, $C^{\mathcal{I}} = D^{\mathcal{I}}$)
Consistency: check that knowledge is meaningful (classes can have instances). Is C satisfiable w.r.t. O? (there exists some model $\mathcal{I}$ of O such that $C^{\mathcal{I}} \neq \emptyset$)
Instantiation and querying: Is x an instance of C w.r.t. O? (in every model $\mathcal{I}$ of O, $x^{\mathcal{I}} \in C^{\mathcal{I}}$). Is (x, y) an instance of R w.r.t. O? (in every model $\mathcal{I}$ of O, $(x^{\mathcal{I}}, y^{\mathcal{I}}) \in R^{\mathcal{I}}$)
All of these tasks are reducible to KB satisfiability or concept satisfiability w.r.t. a KB, and can be decided using highly optimised tableaux reasoners.
156. Main References.
W3C OWL Working Group (2009) OWL 2 Web Ontology Language Document Overview. W3C Recommendation. http://www.w3.org/TR/2009/REC-owl2-overview-20091027/
Dean M, Schreiber G (2004) OWL Web Ontology Language Reference. W3C Recommendation. http://www.w3.org/TR/owl-ref/
Gómez-Pérez A, Fernández-López M, Corcho O (2003) Ontological Engineering. Springer Verlag. Chapter 4: Ontology languages.
Baader F, McGuinness D, Nardi D, Patel-Schneider P (2003) The Description Logic Handbook: Theory, implementation and applications. Cambridge University Press, Cambridge, United Kingdom.
Jena web site: http://jena.sourceforge.net/ ; Jena API: http://jena.sourceforge.net/tutorial/RDF_API/ ; Jena tutorials: http://www.ibm.com/developerworks/xml/library/j-jena/index.html , http://www.xml.com/pub/a/2001/05/23/jena.html
Reasoners: Pellet: http://clarkparsia.com/pellet ; RACER: http://www.racer-systems.com/ ; FaCT++: http://owl.man.ac.uk/factplusplus/ ; HermiT: http://hermit-reasoner.com/
158. Methodological guidelines for Linked Data publication: Motivation; Related work; GeoLinkedData; Identification of the data sources; Vocabulary development; Generation of the RDF data; Publication of the RDF data; Data cleansing; Linking the RDF data; Enable effective discovery; Future work.
159. GeoLinkedData. It is an open initiative whose aim is to enrich the Web of Data with Spanish geospatial data. This initiative has started by publishing diverse information sources, such as those of the National Geographic Institute of Spain (IGN-E) and the National Statistics Institute (INE). http://geo.linkeddata.es
160. Motivation. 99.171 % English, 0.019 % Spanish: the Web of Data is mainly for English speakers, with a poor presence of Spanish. Source: Billion Triples dataset at http://km.aifb.kit.edu/projects/btc-2010/. Thanks to Aidan and Richard.
162. Impact of geo.linkeddata.es. Number of triples in Spanish (July): 1,412,248. Number of triples in Spanish (end of August): 21,463,088. Asunción Gómez Pérez
163. Process for Publishing Linked Data on the Web: identification of the data sources; vocabulary development; generation of the RDF data; publication of the RDF data; data cleansing; linking the RDF data; enable effective discovery.
164. 1. Identification and selection of the data sources: Instituto Geográfico Nacional; Instituto Nacional de Estadística.
170. 1. Identification and selection of the data sources. [Figure: sample source data: Industry Production Index by year and province]
171. 2. Vocabulary development: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#whichvocabs. This is not enough.
172. 2. Vocabulary development. Features: lightweight (taxonomies and a few properties); consensual vocabularies, to avoid mapping problems; multilingual, since Linked Data is multilingual. The NeOn methodology can help to: re-engineer non-ontological resources into ontologies (pros: it uses domain terminology already agreed upon by domain experts); remove from heavyweight ontologies those features that you don't need; reuse existing vocabularies.
173. [Figure: scenarios 1 to 9 of the NeOn methodology for building ontology networks, combining: ontology specification, scheduling, conceptualization, formalization and implementation; reuse and reengineering of non-ontological resources (glossaries, dictionaries, lexicons, classification schemas, thesauri, taxonomies); reuse, reengineering, aligning and merging of ontological resources (ontology repositories and registries; F-Logic, RDF(S) and OWL ontologies; alignments); ontology design pattern reuse; ontology restructuring (pruning, extension, specialization, modularization); ontology localization; support activities in all scenarios: knowledge acquisition (elicitation), documentation, configuration management, evaluation (V&V) and assessment]
174. Vocabulary development: specification. Content requirements: identify the set of questions that the ontology should answer, e.g. Which are the provinces in Spain? Where are the beaches? Where are the reservoirs? What is the production index in Madrid? Which city has the highest production index? Give me the latitude and altitude of Madrid. …. Non-content requirements: the ontology must be available in the four official languages of Spain.
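A competency question is only useful if it can be turned into a query over the ontology. As a minimal sketch, the first question above can be written as a basic graph pattern (the SPARQL equivalent is in the comment) and evaluated with a tiny pure-Python matcher standing in for a SPARQL engine. The `EX` namespace, the `partOf` property and all triples are hypothetical, not the GeoLinkedData vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/geo#"  # hypothetical namespace and data

graph = {
    (EX + "Madrid", RDF_TYPE, EX + "Provincia"),
    (EX + "Granada", RDF_TYPE, EX + "Provincia"),
    (EX + "Madrid", EX + "partOf", EX + "Spain"),
    (EX + "Granada", EX + "partOf", EX + "Spain"),
    (EX + "Tajo", RDF_TYPE, EX + "Rio"),
}


def match(pattern, graph):
    """Bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # a variable repeated in the pattern must bind to the same value
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


def evaluate(patterns, graph):
    """Join a list of triple patterns, like a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # substitute variables already bound by earlier patterns
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(bound, graph):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


# "Which are the provinces in Spain?"
# SPARQL: SELECT ?p WHERE { ?p a ex:Provincia ; ex:partOf ex:Spain }
cq_provinces = [("?p", RDF_TYPE, EX + "Provincia"), ("?p", EX + "partOf", EX + "Spain")]
answer = sorted(s["?p"][len(EX):] for s in evaluate(cq_provinces, graph))
print(answer)  # ['Granada', 'Madrid']
```

Keeping each competency question next to its query makes it easy to re-check the ontology whenever the vocabulary changes.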
175. 2. Lightweight ontology development. Reused vocabularies: hydrOntology (hydrographical phenomena: rivers, lakes, etc.); SCOVO (scv:Dataset, scv:Item, scv:Dimension); FAO Geopolitical ontology (names and international code systems for territories and groups); WGS84 Geo Positioning (an RDF vocabulary); GML (ontology for the OGC Geography Markup Language); and Time (vocabulary for instants, intervals, durations, etc.). Reuse follows the INSPIRE (INfrastructure for SPatial InfoRmation in Europe) recommendation.
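To show how two of these vocabularies fit the Industry Production Index table (year, province), here is a minimal sketch that writes one table cell as a SCOVO item and attaches WGS84 coordinates to a resource, emitted as N-Triples. It assumes the SCOVO core terms `scv:Item`, `scv:dataset` and `scv:dimension` with `rdf:value` for the measured value; the `BASE` URI scheme and the example value are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCV = "http://purl.org/NET/scovo#"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/ipi/"  # hypothetical base URI


def triple(s, p, o):
    """Format one N-Triples statement; o must already be serialized."""
    return f"<{s}> <{p}> {o} ."


def typed(value, datatype):
    """Serialize a typed literal, e.g. "1.5"^^xsd:decimal."""
    return f'"{value}"^^<{XSD}{datatype}>'


def ipi_item(province, year, value):
    """One cell of the Industry Production Index table as a SCOVO item."""
    item = f"{BASE}item/{province}/{year}"
    return [
        triple(item, RDF + "type", f"<{SCV}Item>"),
        triple(item, SCV + "dataset", f"<{BASE}dataset/IPI>"),
        triple(item, SCV + "dimension", f"<{BASE}province/{province}>"),
        triple(item, SCV + "dimension", f"<{BASE}year/{year}>"),
        triple(item, RDF + "value", typed(value, "decimal")),
    ]


def wgs84_point(resource, lat, long):
    """Attach WGS84 latitude and longitude to a resource."""
    return [
        triple(resource, GEO + "lat", typed(lat, "decimal")),
        triple(resource, GEO + "long", typed(long, "decimal")),
    ]


# 100.0 is a placeholder value, not real INE data
for line in ipi_item("Madrid", 2009, 100.0):
    print(line)
```

Each dimension (province, year) becomes a resource of its own, so the same province URI can later be linked to the geographic data.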
176. Context: the INSPIRE Directive. Objectives: INSPIRE aims to obtain harmonized sources of geographic information to support the formulation, implementation and evaluation of Community policies (environment, etc.). Sources of geographic information: databases of the EU Member States at local, regional, national and international level. (Luis Manuel Vilches Blázquez)
178. hydrOntology. Geographic information suffers from a great diversity of problems (multiple sources, heterogeneity of content and structure, ambiguity of natural language, etc.). A shared model is needed to solve the problems of harmonizing and structuring hydrographic information. hydrOntology is a global domain ontology developed following a top-down approach. Goals: cover most of the cartographically representable phenomena of the hydrographic domain; serve as a harmonization framework among the different producers of geospatial information at national and international level; take the first steps towards a better organization and management of geographic (hydrographic) information.
179. Sources (thesauri and bibliography; feature catalogues; dictionaries and monographs): Getty FTT, ADL, BCN25, GEMET, WFD, CC.AA., EGM & ERM, BCN200, Nomenclátor Geográfico Nacional, Nomenclátor Conciso.
180. Structuring criteria. Water Framework Directive: proposed by the European Parliament and the Council of the EU; list of definitions of hydrographic phenomena. SDIGER project: INSPIRE pilot project covering two river basins, two countries and two languages. Semantic criteria: geographic dictionaries; dictionary of the Real Academia de la Lengua; WordNet; Wikipedia; bibliography from several knowledge areas. Legacy: current structure of the catalogues. Advice from toponymy experts at the IGN.
184. 3. Generation of RDF from the data sources: geographic information (databases), statistical information (.xls) and geospatial information. Different technologies for RDF generation: re-engineering patterns, R2O and ODEMapster, annotation tools, and geometry generation.
185. 3. Generation of the RDF data: NOR2O for the INE data; ODEMapster for the IGN data; Geometry2RDF for the IGN geospatial column.
186. 3. Generation of the RDF data / instances. NOR2O is a software library that implements the transformations proposed by the Patterns for Re-engineering Non-Ontological Resources (PR-NOR); currently there are 16 PR-NORs (http://ontologydesignpatterns.org/). A PR-NOR defines a procedure that transforms the components of a Non-Ontological Resource (NOR) into ontology elements. NOR2O handles classification schemes, thesauri and lexicons. Example: the FAO Water classification, a classification scheme that follows a path enumeration data model and is implemented in a database.
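In a path enumeration data model each record stores its full path from the root (e.g. `1.1.1`), so the parent of a category is its path minus the last step. The sketch below turns such rows into an OWL class hierarchy in N-Triples. It is one possible target for a classification scheme, not NOR2O's actual output; the `BASE` namespace, the `C…` naming scheme and the rows are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
BASE = "http://example.org/water#"  # hypothetical namespace


def class_uri(path, sep="."):
    """Build a class URI from a path: "1.2.1" -> <http://example.org/water#C1_2_1>."""
    return f"<{BASE}C{path.replace(sep, '_')}>"


def path_enumeration_to_ntriples(rows, sep="."):
    """rows: (path, label) pairs, each path listing all ancestors, e.g. "1.2.1".
    Labels are assumed to contain no double quotes or backslashes."""
    lines = []
    for path, label in rows:
        cls = class_uri(path, sep)
        lines.append(f"{cls} <{RDF_TYPE}> <{OWL_CLASS}> .")
        lines.append(f'{cls} <{RDFS}label> "{label}"@en .')
        if sep in path:  # not a root: the parent is the path minus its last step
            parent = path.rsplit(sep, 1)[0]
            lines.append(f"{cls} <{RDFS}subClassOf> {class_uri(parent, sep)} .")
    return lines


# Hypothetical rows, loosely shaped like a water classification table
rows = [
    ("1", "Water"),
    ("1.1", "Surface water"),
    ("1.1.1", "River"),
    ("1.2", "Groundwater"),
]
triples = path_enumeration_to_ntriples(rows)
for t in triples:
    print(t)
```

Whether categories become classes or instances is exactly the kind of decision a re-engineering pattern fixes; this sketch assumes classes.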
191. 3. Generation of the RDF data: R2O and ODEMapster. Creation of the R2O mappings.
192. 3. Generation of the RDF data: Geometry2RDF. Geometries are exported as GML 3.1.1 with the Oracle SDO_UTIL package:
SELECT TO_CHAR(SDO_UTIL.TO_GML311GEOMETRY(geometry)) AS Gml311Geometry FROM "BCN200"."BCN200_0301L_RIO" c WHERE c.Etiqueta='Arroyo'
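The query above returns each geometry as a GML 3.1.1 string, which then has to be parsed before RDF can be generated from it. A minimal sketch with the standard library XML parser, assuming the geometry comes back as a `gml:LineString` holding a `gml:posList`; the sample string and its coordinates are hypothetical, not actual BCN200 output.

```python
import xml.etree.ElementTree as ET

GML = {"gml": "http://www.opengis.net/gml"}

# Hypothetical output of SDO_UTIL.TO_GML311GEOMETRY for one river segment
sample = (
    '<gml:LineString xmlns:gml="http://www.opengis.net/gml">'
    '<gml:posList srsDimension="2">-3.70 40.41 -3.71 40.42 -3.73 40.44</gml:posList>'
    "</gml:LineString>"
)


def linestring_coordinates(gml_text):
    """Return the vertices of a GML 3.1.1 LineString as tuples of floats."""
    root = ET.fromstring(gml_text)
    pos_list = root.find("gml:posList", GML)
    if pos_list is None:
        raise ValueError("expected a gml:LineString with a gml:posList child")
    # srsDimension tells how many numbers make up one vertex (2 if absent)
    dim = int(pos_list.get("srsDimension", "2"))
    values = [float(v) for v in pos_list.text.split()]
    return [tuple(values[i:i + dim]) for i in range(0, len(values), dim)]


coords = linestring_coordinates(sample)
print(coords)
```

The axis order of each tuple depends on the spatial reference system of the source geometry, so it has to be checked before the values are written as, for example, WGS84 latitude and longitude.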
195. 3. Generation of the RDF data: RDF graphs. IGN and INE data so far: 7 RDF named graphs, 1,412,248 triples (BTN25, BCN200, IPI, …), e.g. http://geo.linkeddata.es/dataset/IGN/BTN25, http://geo.linkeddata.es/dataset/IGN/BCN200, http://geo.linkeddata.es/dataset/INE/IPI.
196. 4. Publication of the RDF data. The RDF data is stored in Virtuoso 6.1.0 and published through a SPARQL endpoint, and as Linked Data and HTML through Pubby 0.3, including provenance support.
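Once the named graphs are loaded, the SPARQL endpoint can be queried over plain HTTP following the SPARQL protocol (query in the `query` parameter, result format chosen with the `Accept` header). The sketch below only builds the request that counts triples per named graph; the endpoint URL is an assumption (a local Virtuoso on its default port), and the query assumes the endpoint supports SPARQL 1.1 aggregates.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: a local Virtuoso instance with its SPARQL endpoint at /sparql
ENDPOINT = "http://localhost:8890/sparql"

# Number of triples in each named graph
GRAPH_SIZES = """
SELECT ?g (COUNT(*) AS ?triples)
WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
"""


def sparql_get(endpoint, query, accept="application/sparql-results+json"):
    """Build a SPARQL protocol GET request; the query travels URL-encoded."""
    return Request(endpoint + "?" + urlencode({"query": query}), headers={"Accept": accept})


request = sparql_get(ENDPOINT, GRAPH_SIZES)
# Sending it requires a running endpoint, e.g.:
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response)["results"]["bindings"])
```

The result should list one row per named graph, which is a quick way to check a load against figures such as the number of graphs and triples reported on slide 195.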
198. 4. Publication of the RDF data: license. License for GeoLinkedData: Creative Commons Attribution-ShareAlike 3.0; GNU Free Documentation License. Each dataset (IGN, INE, etc.) will have its own specific license.
199. 5. Data cleansing. Problems found: lack of documentation of the IGN datasets; broken links (Spain, IGN resources); lack of documentation of the ontology; missing English and Spanish labels. Options for building a Spanish ontology that imports some concepts from another ontology (in English): (1) import the English ontology and add annotations, such as a Spanish label, to its terms; (2) import the English ontology, create new concepts and properties with Spanish names, and map them to their English equivalents; (3) re-declare the terms of the English ontology that we need (using the same URIs as in the English ontology) and add a Spanish label; (4) create your own classes and properties that model the same things as the English ontology.
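Options (1) and (3) both keep the English URIs and only add `rdfs:label` annotations tagged `@es`. A minimal sketch that generates those annotations as N-Triples; the `http://example.org/hydro#` terms are hypothetical stand-ins for the reused English ontology.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal
    (other control characters such as newlines are not handled here)."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def spanish_labels(terms):
    """terms maps URIs of reused English terms to their Spanish labels.
    The URIs stay unchanged; only rdfs:label annotations tagged @es are added."""
    return [
        f'<{uri}> <{RDFS_LABEL}> "{escape_literal(label)}"@es .'
        for uri, label in sorted(terms.items())
    ]


# Hypothetical terms of the English ontology
labels = spanish_labels({
    "http://example.org/hydro#River": "Río",
    "http://example.org/hydro#Reservoir": "Embalse",
})
for line in labels:
    print(line)
```

Because the URIs are unchanged, existing links to the English terms keep working; the cost is that the Spanish labels live in a document the original ontology owner does not control.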
200. 5. Data cleansing: URIs in Spanish, e.g. http://geo.linkeddata.es/ontology/Río. RDF allows non-ASCII (UTF-8) characters in URIs, but Linked Data URIs have to be URLs as well, so non-US-ASCII characters have to be percent-encoded: http://geo.linkeddata.es/ontology/R%C3%ADo
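The conversion on this slide can be done by encoding the IRI as UTF-8 and percent-encoding every byte outside US-ASCII, while leaving the reserved URI delimiters alone. A minimal sketch with Python's standard `urllib.parse.quote`; the choice of which characters to keep as-is is an assumption suited to full URIs.

```python
from urllib.parse import quote

# Reserved URI delimiters (RFC 3986) plus "%" are kept, so existing
# escapes are not encoded twice; quote() encodes everything else as UTF-8.
RESERVED = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri):
    """Percent-encode the UTF-8 bytes of non-ASCII characters in an IRI."""
    return quote(iri, safe=RESERVED)


uri = iri_to_uri("http://geo.linkeddata.es/ontology/Río")
print(uri)  # http://geo.linkeddata.es/ontology/R%C3%ADo
```

"í" (U+00ED) is two bytes in UTF-8 (C3 AD), which is why it becomes `%C3%AD`.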
201. 6. Linking of the RDF data. Tool: Silk, a Link Discovery Framework for the Web of Data. First set of links: provinces of Spain, linking GeoLinkedData with Geonames and DBpedia (86% accuracy).
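Link discovery tools such as Silk compare resources from two datasets with configurable similarity metrics and emit links above a threshold. The sketch below illustrates that idea for province labels using Python's `difflib.SequenceMatcher` ratio as a stand-in metric (not one of Silk's own metrics) after lower-casing and stripping accents. All URIs, labels and the 0.9 threshold are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case and strip accents, so "Málaga" and "Malaga" compare equal."""
    decomposed = unicodedata.normalize("NFKD", label.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def similarity(a, b):
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def discover_links(source, target, threshold=0.9):
    """Propose owl:sameAs candidates: for each source resource, keep the
    best-matching target label if its similarity reaches the threshold."""
    links = []
    for s_uri, s_label in source.items():
        t_uri, t_label = max(target.items(), key=lambda t: similarity(s_label, t[1]))
        score = similarity(s_label, t_label)
        if score >= threshold:
            links.append((s_uri, t_uri, round(score, 2)))
    return links


# Hypothetical resources; a real run would read labels from both datasets
provinces = {
    "http://example.org/geo/Provincia/Malaga": "Málaga",
    "http://example.org/geo/Provincia/Granada": "Granada",
    "http://example.org/geo/Provincia/Araba": "Araba/Álava",
}
candidates = {
    "http://example.org/other/Malaga": "Malaga",
    "http://example.org/other/Granada": "Granada",
    "http://example.org/other/Alava": "Álava",
}
for link in discover_links(provinces, candidates):
    print(link)
```

Bilingual names such as "Araba/Álava" fall below the threshold here, which is one reason why label-only matching never reaches full accuracy and the generated links need checking.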
202. 6. Linking of the RDF data. Example: http://geo.linkeddata.es/page/Provincia/Granada
203. 7. Enable effective discovery.
209. Future work: generate more datasets from other domains, e.g. universities in Spain; identify more links to DBpedia and Geonames; cover complex geometrical information, i.e. not only Point and LineString data but also polygons.
210. Contents: Introduction to Linked Data; Linked Data foundations: RDF, RDF Schema, SPARQL and OWL; coffee break; Linked Data publication (methodological guidelines for Linked Data publication; RDB2RDF tools; technical aspects of Linked Data publication); Linked Data consumption.
211. Ontology-based access to DBs. Four approaches: (1) build a new ontology from one DB schema and one DB; (2) align the ontology built with approach 1 with a legacy ontology; (3) align an existing DB with a legacy ontology, either (a) by a massive dump (semantic data warehouse) or (b) query-driven; (4) align an ontology network with n DB schemas and other data sources, either (a) by a massive dump (semantic data warehouse) or (b) query-driven. [Figure legend: new ontology; existing ontology.]
212. Ontology-based access to databases. Ontology: Universidad (university), Profesor (professor), Doctorando (PhD student). Relational model (RDB): tables Organización and Personal. Question: names of the professors of the university UPM. Mappings: a professor is a person whose position (puesto) is "docente" (teaching staff); a university is an organization of type "3". A processor handles the query according to the formal description of the correspondence. Resulting query: values of column nombre in the records of table Personal whose column puesto is "docente" and that are related to at least one record of table Organización with value "3" in column tipo and "UPM" in column nombre.
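The rewriting described on this slide can be sketched by storing each concept mapping as a table plus an SQL condition and composing them into one query, run here on an in-memory SQLite database. The `organizacion_id` foreign key, the `id` columns and all rows are assumptions (the slide does not say how Personal and Organización are related), and the accent is dropped from the table name for simplicity. The `EXISTS` subquery mirrors "related to at least one record".

```python
import sqlite3

# Hypothetical correspondences between ontology concepts and the relational model:
# a Profesor is a Personal row whose puesto is 'docente', and a Universidad is an
# Organizacion row whose tipo is '3'.
MAPPINGS = {
    "Profesor": ("Personal", "Personal.puesto = 'docente'"),
    "Universidad": ("Organizacion", "Organizacion.tipo = '3'"),
}


def professor_names_query(mappings):
    """Rewrite 'names of the professors of university ?' into SQL."""
    prof_table, prof_cond = mappings["Profesor"]
    uni_table, uni_cond = mappings["Universidad"]
    return (
        f"SELECT {prof_table}.nombre FROM {prof_table} "
        f"WHERE {prof_cond} AND EXISTS ("
        f"SELECT 1 FROM {uni_table} "
        f"WHERE {uni_table}.id = {prof_table}.organizacion_id "
        f"AND {uni_cond} AND {uni_table}.nombre = ?)"
    )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Organizacion (id INTEGER PRIMARY KEY, nombre TEXT, tipo TEXT);
CREATE TABLE Personal (id INTEGER PRIMARY KEY, nombre TEXT, puesto TEXT,
                       organizacion_id INTEGER REFERENCES Organizacion(id));
INSERT INTO Organizacion VALUES (1, 'UPM', '3'), (2, 'ACME', '1');
INSERT INTO Personal VALUES (1, 'Ana', 'docente', 1), (2, 'Luis', 'administrativo', 1),
                            (3, 'Eva', 'docente', 2);
""")
names = [row[0] for row in conn.execute(professor_names_query(MAPPINGS), ("UPM",))]
print(names)  # ['Ana']
```

The query-driven approach translates each ontology query like this at request time; the massive-dump approach would instead materialize all Profesor and Universidad instances once.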
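213. Align data sources with legacy ontologies. The same relational model M1 (table Aeropuertos) is aligned with two ontologies, O1 and O2, through the correspondences RC(O1,M1) and RC(O2,M1). Concepts shown: Centro Comunicaciones (communications centre), Estación (station), Aeropuerto (airport), PuntoGPS (GPS point), PuntoEuropeo (European point), PuntoAsiatico (Asian point), PuntoEspañol (Spanish point). Example mapping: f(Aeropuertos) = PuntoEuropeo.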
214. R2O is a declarative language to specify mappings between relational data sources and ontologies. [Figure: an R2O mapping (XML) linking a relational model (tables Organization, Persons) in an RDB with an ontology (University, Professor, Student).]
215. Example: types of mappings needed: attribute mapping with transformation (regular expression); attribute direct mapping; relation mapping with transformation (regular expression); relation mapping with transformation (keyword search).
216. Population example (II). The Operation element defines a transformation, based on a regular expression, that is applied to the database column to extract property values.
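R2O expresses this transformation declaratively in XML; the Python sketch below only mimics the idea, it is not R2O syntax. Each property is mapped to a column and a regular expression, and the first capture group becomes the property value. The `direccion` column, the property names and the sample record are hypothetical.

```python
import re

# Hypothetical attribute mappings: property -> (column, regular expression).
# The first capture group of each expression becomes the property value.
ATTRIBUTE_MAPPINGS = {
    "postalCode": ("direccion", r"\b(\d{5})\b"),
    "city": ("direccion", r"\d{5}\s+(.+)$"),
}


def extract_properties(record, mappings):
    """Apply each regex to its column; columns that do not match yield no value."""
    values = {}
    for prop, (column, pattern) in mappings.items():
        match = re.search(pattern, str(record.get(column, "")))
        if match:
            values[prop] = match.group(1).strip()
    return values


row = {"id": 7, "direccion": "Calle Ejemplo 1, 28001 Madrid"}
print(extract_properties(row, ATTRIBUTE_MAPPINGS))  # {'postalCode': '28001', 'city': 'Madrid'}
```

Several properties can be extracted from the same column this way, which is the typical situation when a database field is not in first normal form.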
217. R2O (Relational-to-Ontology) language. Mapping cases for concepts: one or more concepts can be extracted from a single data field (not in 1NF); a view maps exactly one concept in the ontology; a subset of the columns in a view maps a concept in the ontology; a subset (selection) of the records of a database view maps a concept in the ontology; a subset of the records of a database view maps a concept in the ontology, but the selection cannot be made using SQL. Mapping cases for attributes: a column in a database view maps directly to an attribute or a relation; a column in a database view maps to an attribute or a relation after some transformation; a set of columns in a database view maps to an attribute or a relation.
227. Using an RDF repository. It allows storing and accessing RDF data, e.g. Sesame (http://www.openrdf.org/). Download it from http://www.openrdf.org/download.jsp (openrdf-sesame-2.3.0-sdk.zip). Deploy the .war in Tomcat (JDK and Tomcat needed). Create a repository at http://localhost:8080/openrdf-sesame. Check: http://localhost:8080/openrdf-sesame/repositories/XXXX and http://localhost:8080/openrdf-sesame/repositories/XXX/statements
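The `/statements` URL checked above is also where data is loaded over HTTP. A minimal sketch that builds (but does not send) the requests to export all statements with GET and to add RDF/XML data with POST, assuming the Sesame 2 HTTP protocol behaves this way for `/repositories/{id}/statements`. The repository ID `geo` is hypothetical and stands for the XXXX placeholder on the slide.

```python
from urllib.request import Request

SERVER = "http://localhost:8080/openrdf-sesame"


def statements_url(repository_id):
    return f"{SERVER}/repositories/{repository_id}/statements"


def get_statements(repository_id, accept="application/rdf+xml"):
    """GET request that exports every statement in the repository."""
    return Request(statements_url(repository_id), headers={"Accept": accept})


def add_statements(repository_id, rdf_xml):
    """POST request that adds the RDF/XML payload to the repository."""
    return Request(
        statements_url(repository_id),
        data=rdf_xml.encode("utf-8"),
        headers={"Content-Type": "application/rdf+xml;charset=UTF-8"},
        method="POST",
    )


# "geo" is a hypothetical repository ID (the XXXX placeholder on the slide)
request = add_statements("geo", '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>')
# urllib.request.urlopen(request) would send it to a running Sesame server
```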
228. Linked Data frontend. Exposes data as Linked Data, including content negotiation, etc., e.g. Pubby (http://www4.wiwiss.fu-berlin.de/pubby/). Installation: use pubby-0.3.zip; deploy the webapp folder in Tomcat (and rename it); modify config.n3; restart Tomcat. Check: http://localhost:8080/XXX/
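Content negotiation here means that a request for a resource URI is redirected (303 See Other) either to an HTML page or to an RDF document, as in the `/page/` URL on slide 202. The function below is a simplified sketch of that decision, not Pubby's code; the `/resource/` and `/data/` prefixes are assumed to follow Pubby's default URL layout, and q-values in the `Accept` header are ignored.

```python
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}


def negotiate(path, accept_header):
    """Return (status, location) for a request on a /resource/ URI.

    Simplification: if any RDF media type is listed in the Accept header,
    the client gets the RDF document; q-values are not taken into account.
    """
    prefix = "/resource/"
    if not path.startswith(prefix):
        return 404, None
    name = path[len(prefix):]
    requested = {part.split(";")[0].strip() for part in accept_header.split(",")}
    target = "/data/" if requested & RDF_MEDIA_TYPES else "/page/"
    return 303, target + name


print(negotiate("/resource/Provincia/Granada", "text/html"))  # (303, '/page/Provincia/Granada')
```

The 303 status keeps the URI of the thing (the province) distinct from the URIs of the documents that describe it.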
233. Contents: Introduction to Linked Data; Linked Data foundations: RDF, RDF Schema, SPARQL and OWL; coffee break; Linked Data publication (methodological guidelines for Linked Data publication; RDB2RDF tools; technical aspects of Linked Data publication); Linked Data consumption.
234. RelFinder: finding relations in Linked Data, e.g. relations between the films "Pulp Fiction", "Kill Bill" and "Reservoir Dogs".
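Finding a relation between two resources amounts to finding a path between them in the RDF graph. The sketch below uses breadth-first search over a small local list of triples, ignoring edge direction; it is a stand-in for what a tool like RelFinder does against remote SPARQL endpoints, not its implementation. The triples are a hypothetical toy graph with simplified names.

```python
from collections import deque


def find_path(triples, start, goal):
    """Shortest chain of hops (from, property, to) linking start and goal,
    ignoring the direction of the triples; None if they are not connected."""
    neighbours = {}
    for s, p, o in triples:
        neighbours.setdefault(s, []).append((p, o))
        neighbours.setdefault(o, []).append((p, s))
    queue = deque([start])
    previous = {start: None}  # node -> (predecessor, property)
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while previous[node] is not None:
                pred, prop = previous[node]
                path.append((pred, prop, node))
                node = pred
            return list(reversed(path))
        for prop, nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, prop)
                queue.append(nxt)
    return None


# Hypothetical toy graph
triples = [
    ("Pulp_Fiction", "director", "Quentin_Tarantino"),
    ("Kill_Bill", "director", "Quentin_Tarantino"),
    ("Reservoir_Dogs", "director", "Quentin_Tarantino"),
    ("Pulp_Fiction", "starring", "Uma_Thurman"),
    ("Kill_Bill", "starring", "Uma_Thurman"),
]
print(find_path(triples, "Pulp_Fiction", "Reservoir_Dogs"))
```

Breadth-first search returns a shortest connection first; a relation finder would keep searching to collect all paths up to a maximum length.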
236. Designing URI Sets for the UK Public Sector: http://www.cabinetoffice.gov.uk/media/301253/puiblic_sector_uri.pdf