10. Linked Data Browsers Not actually separate browsers: they run inside ordinary HTML browsers. They display the data returned from looking up a URI in tabular form, and the user can navigate between data sources by following RDF links. (IMO) Poor usability so far.
12. Linked Data Browsers http://browse.semanticweb.org/ Tabulator, OpenLink Data Explorer, Zitgist, Marbles, Explorator, Disco, LinkSailor
14. Linked Data (Semantic Web) Search Engines Like conventional search engines (Google, Bing, Yahoo), they crawl documents and follow links, but the documents are RDF and the links are RDF links. Conventional search engines don't crawl data unless it is RDFa. Human-focused search: Falcons – keyword; SWSE – keyword; VisiNav – complex queries. Machine-focused search: Sindice – data instances; Swoogle – ontologies; Watson – ontologies; Uberblic – curated, integrated data instances
15. (Semantic) SEO ++ Mark up your HTML with RDFa. Use standard vocabularies (ontologies): Google's vocabulary, GoodRelations, Dublin Core. Google and Yahoo crawl this data and use it for richer result rendering.
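As a minimal sketch of what such markup looks like, here is an HTML fragment annotated with Dublin Core terms in RDFa (the URI and values are invented for illustration):

```html
<!-- RDFa: 'about' names the resource being described; each 'property'
     attaches a Dublin Core statement to it. -->
<div xmlns:dc="http://purl.org/dc/terms/" about="http://example.org/report">
  <span property="dc:title">Annual Report</span>
  by <span property="dc:creator">Example Org</span>
</div>
```

A crawler that understands RDFa extracts two triples from this fragment: the resource's dc:title and its dc:creator.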
20. Domain Specific Applications Government: Data.gov, Data.gov.uk, http://data-gov.tw.rpi.edu/wiki/Demos; Music: Seevl.net; DBpedia Mobile; Life Science: LinkedLifeData; Sports: BBC World Cup
25. Find all the locations of all the original paintings of Modigliani
26. Select all proteins that are linked to a curated interaction from the literature and to inflammatory response http://linkedlifedata.com/
27. SPARQL Endpoints Linked Data sources usually provide a SPARQL endpoint for their dataset(s) SPARQL endpoint: SPARQL query processing service that supports the SPARQL protocol* Send your SPARQL query, receive the result * http://www.w3.org/TR/rdf-sparql-protocol/
28. Where can I find SPARQL Endpoints? DBpedia: http://dbpedia.org/sparql MusicBrainz: http://dbtune.org/musicbrainz/sparql U.S. Census: http://www.rdfabout.com/sparql More at: http://esw.w3.org/topic/SparqlEndpoints
29. Accessing a SPARQL Endpoint SPARQL endpoints are RESTful Web services. Issuing a SPARQL query to a remote endpoint is basically an HTTP GET request with the parameter query holding the URL-encoded SPARQL query: GET /sparql?query=PREFIX+rd... HTTP/1.1 Host: dbpedia.org User-agent: my-sparql-client/0.1
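A minimal sketch of building such a request URL with plain Java (the endpoint URL and query are just examples; only the standard library is used):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SparqlGetRequest {
    // The SPARQL query travels URL-encoded in the 'query' parameter
    // of an HTTP GET request to the endpoint.
    static String buildRequestUrl(String endpoint, String sparql) {
        return endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = buildRequestUrl(
            "http://dbpedia.org/sparql",
            "SELECT ?s WHERE { ?s a ?type } LIMIT 10");
        System.out.println(url);
        // prints: http://dbpedia.org/sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+a+%3Ftype+%7D+LIMIT+10
    }
}
```

The resulting URL can then be fetched with any HTTP client; the endpoint executes the decoded query and returns the result in the response body.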
30. Query Result Formats SPARQL endpoints usually support different result formats: XML, JSON, plain text (for ASK and SELECT queries) RDF/XML, N-Triples, Turtle, N3 (for DESCRIBE and CONSTRUCT queries)
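For illustration, a SELECT result in the standard SPARQL JSON results format looks roughly like this (variable name and value invented):

```json
{
  "head": { "vars": [ "c" ] },
  "results": {
    "bindings": [
      { "c": { "type": "literal", "xml:lang": "en", "value": "a comment" } }
    ]
  }
}
```

An ASK query instead returns a document of the shape `{ "head": {}, "boolean": true }`.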
34. Query Result Formats Use the Accept header to request the preferred result format: GET /sparql?query=PREFIX+rd... HTTP/1.1 Host: dbpedia.org User-agent: my-sparql-client/0.1 Accept: application/sparql-results+json
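A sketch of this content negotiation with the Java 11 HTTP client (the request is only built here, not sent; the endpoint URL is an example):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SparqlAccept {
    // Build a GET request that asks the endpoint, via the Accept header,
    // to return SELECT results in the SPARQL JSON results format.
    static HttpRequest buildRequest(String endpointUrlWithQuery) {
        return HttpRequest.newBuilder()
                .uri(URI.create(endpointUrlWithQuery))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest("http://dbpedia.org/sparql?query=...");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Sending the request with `HttpClient.send(...)` would then yield the JSON result body if the endpoint honours the header.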
35. Query Result Formats As an alternative, some SPARQL endpoint implementations (e.g. Joseki) provide an additional parameter out: GET /sparql?out=json&query=... HTTP/1.1 Host: dbpedia.org User-agent: my-sparql-client/0.1
36. Accessing a SPARQL Endpoint More convenient: use a library SPARQL JavaScript Library http://www.thefigtrees.net/lee/blog/2006/04/sparql_calendar_demo_a_sparql.html ARC for PHP http://arc.semsol.org/ RAP – RDF API for PHP http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/index.html
41. Linked Data Architectures Follow-up queries Querying Local Cache Crawling Federated Query Processing On-the-fly Dereferencing
42. Follow-up Queries Idea: issue follow-up queries over other datasets based on results from previous queries, substituting placeholders in query templates
43.
// Find a list of companies filtered by some criteria and return their DBpedia URIs
String s1 = "http://cb.semsol.org/sparql";
String s2 = "http://dbpedia.org/sparql";
String qTmpl = "SELECT ?c WHERE { <%s> rdfs:comment ?c }";
String q1 = "SELECT ?s WHERE { ...";
QueryExecution e1 = QueryExecutionFactory.sparqlService( s1, q1 );
ResultSet results1 = e1.execSelect();
while ( results1.hasNext() ) {
    QuerySolution sol = results1.nextSolution();
    String q2 = String.format( qTmpl, sol.getResource("s").getURI() );
    QueryExecution e2 = QueryExecutionFactory.sparqlService( s2, q2 );
    ResultSet results2 = e2.execSelect();
    while ( results2.hasNext() ) {
        // ...
    }
    e2.close();
}
e1.close();
44. Follow-up Queries Advantage Queried data is up-to-date Drawbacks Requires the existence of a SPARQL endpoint for each dataset Requires program logic Very inefficient
45. Querying Local Cache Idea: Use an existing SPARQL endpoint that provides access to a set of copies of relevant datasets Use RDF dumps of each dataset SPARQL endpoint over a majority of datasets from the LOD cloud at: http://uberblic.org http://lod.openlinksw.com/sparql
46. Querying a Collection of Datasets Advantage: No need for specific program logic Includes the datasets that you want Complex queries and high performance Even reasoning Drawbacks: Depends on existence of RDF dump Requires effort to set up and to operate the store How to keep the copies in sync with the originals? Queried data might be out of date
47. Crawling Crawl RDF in advance by following RDF links; integrate, clean, and store the data in your own triplestore, the same way HTML is crawled today. Tool: LDSpider
48. Crawling Advantages: No need for specific program logic Independent of the existence, availability, and efficiency of SPARQL endpoints Complex queries with high performance Can even reason about the data Drawbacks: Requires effort to set up and to operate the store How to keep the copies in sync with the originals? Queried data might be out of date
49. Federated Query Processing Idea: query a mediator that distributes sub-queries to the relevant sources and integrates the results
50. Federated Query Processing Instance-based federation: each thing is described by only one data source (untypical for the Web of Data). Triple-based federation: no restrictions, but requires more distributed joins. Both cases require statistics about the datasets.
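A toy sketch of the distributed-join idea, with the two data sources simulated as in-memory maps (all names and data invented): the mediator sends one triple pattern to each source and joins the partial results on the shared variable.

```java
import java.util.Map;
import java.util.TreeMap;

public class FederationSketch {
    // Source A answers "?person worksFor ?company";
    // Source B answers "?person homepage ?url".
    static Map<String, String> sourceA = Map.of(
        "alice", "AcmeCorp",
        "bob",   "Initech");
    static Map<String, String> sourceB = Map.of(
        "alice", "http://example.org/~alice");

    // Mediator: join the partial results on the shared variable ?person.
    static Map<String, String[]> federatedJoin() {
        Map<String, String[]> joined = new TreeMap<>();
        for (Map.Entry<String, String> e : sourceA.entrySet()) {
            String homepage = sourceB.get(e.getKey());
            if (homepage != null) {   // keep only persons known to both sources
                joined.put(e.getKey(), new String[] { e.getValue(), homepage });
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        System.out.println(federatedJoin().keySet()); // only "alice" is in both sources
    }
}
```

The statistics mentioned above are what lets a real mediator decide which sources can answer which pattern, instead of broadcasting every pattern to every source as this sketch implicitly does.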
51. Federated Query Processing DARQ (Distributed ARQ) http://darq.sourceforge.net/ Query engine for federated SPARQL queries Extension of ARQ (query engine for Jena) Last update: June 2006 Semantic Web Integrator and Query Engine (SemWIQ) http://semwiq.sourceforge.net/ Last update: March 2010 Commercial …
52. Federated Query Processing Advantages: No need for specific program logic Queried data is up to date Drawbacks: Requires the existence of a SPARQL endpoint for each dataset Requires effort to set up and configure the mediator
54. In any case: You have to know the relevant data sources When developing the app using follow-up queries When selecting an existing SPARQL endpoint over a collection of dataset copies When setting up your own store with a collection of dataset copies When configuring your query federation system You restrict yourself to the selected sources There is an alternative: Remember, URIs link to data
55. On-the-fly Dereferencing Idea: discover further data by looking up relevant URIs in your application on the fly. Can be combined with the previous approaches. This is what Linked Data browsers do.
56. Link Traversal Based Query Execution Applies the idea of automated link traversal to the execution of SPARQL queries. Idea: intertwine query evaluation with the traversal of RDF links, discovering data that might contribute to query results during query execution. Alternate between evaluating parts of the query and looking up URIs that appear in intermediate solutions.
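A toy illustration of this intertwining, with dereferencing simulated by an in-memory map from URI to triples (all URIs and data invented):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LinkTraversalSketch {
    // Simulated Web: "dereferencing" a URI returns the {s, p, o} triples
    // found in the document behind it.
    static Map<String, List<String[]>> web = Map.of(
        "http://ex.org/alice", List.of(
            new String[] { "http://ex.org/alice", "knows", "http://ex.org/bob" }),
        "http://ex.org/bob", List.of(
            new String[] { "http://ex.org/bob", "name", "Bob" }));

    // Query: ?x knows ?y . ?y name ?n
    static List<String> execute(String seedUri) {
        List<String[]> data = new ArrayList<>(web.get(seedUri)); // seed lookup
        // Step 1: match "?x knows ?y" against the data retrieved so far;
        // step 2: dereference each URI bound to ?y to discover more data.
        for (String[] t : List.copyOf(data)) {
            if (t[1].equals("knows")) {
                data.addAll(web.getOrDefault(t[2], List.of()));
            }
        }
        // Step 3: match "?y name ?n" against the now-extended data.
        List<String> results = new ArrayList<>();
        for (String[] t : data) {
            if (t[1].equals("name")) results.add(t[2]);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(execute("http://ex.org/alice")); // prints [Bob]
    }
}
```

The second pattern is only answerable because the lookup in step 2 pulled in data that was not available when execution started; that is the essence of link traversal based query execution.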
67. Link Traversal Based Query Execution Advantages: No need to know all data sources in advance No need for specific programming logic Queried data is up to date Does not depend on the existence of SPARQL endpoints provided by the data sources Drawbacks: Not as fast as a centralized collection of copies Unsuitable for some queries Results might be incomplete (do we care?)
68. Implementations Semantic Web Client library (SWClLib) for Java http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/ SWIC for Prolog http://moustaki.org/swic/
69. Implementations SQUIN http://squin.org Provides SWClLib functionality as a Web service, accessible like a SPARQL endpoint. Install package: unzip and start – less than 5 mins! Convenient access with SQUIN PHP tools:
$s = 'http:// ...'; // address of the SQUIN service
$q = new SparqlQuerySock( $s, '... SELECT ...' );
$res = $q->getJsonResult(); // or getXmlResult()