Get on the Linked Data Web!
1. Get on the Linked Data Web! Tudor Groza (The University of Queensland) and Armin Haller (CSIRO ICT Canberra). Meta 2011, Canberra, 26 May 2011.
2. Linked Data: why now? Haller, Groza - Get on the Linked Data Web! (Meta 2011)
3. 2007 Gartner predictions: During the next 10 years, Web-based technologies will improve the ability to embed semantic structures [… it] will occur in multiple evolutionary steps … By 2017, we expect the vision of the Semantic Web […] to coalesce […] and the majority of the Web pages are decorated with some form of semantic hypertext … By 2012, 80% of the public Web sites will use some level of semantic hypertext to create SW documents […] 15% of public Web sites will use more extensive Semantic Web-based ontologies to create semantic databases … Slide annotation: RDFa, microformats (hCard, hAtom, etc. …) [Adapted from Ivan Herman (W3C), RDFa, 2011]
4. Metadata in Use [Peter Mika (Yahoo!), RDFa, 2011]
5. Australian context: draft principles for Open Public Sector Information
6. Technical context [diagram]. Group labels: Web Technologies, Semantic Technologies, Semantic Web Technologies. Terms shown: png, svg, KR, AJAX, Speech Recognition, HTML, linkback, WCAG, Coreference, CSS, WebApps, RDF, SPARQL, Entity Extraction, URIs, SSL, Linked Data, DL, OWL, HTTP, JavaScript, SKOS, Theorem Proving, XPath, RIF, XML, GRDDL, Logic Programming, XSLT. [Sandro Hawke (W3C), Introduction to Linked Data, 2010]
7. Why Linking Data? Applications can find data … without centralization: no central bottlenecks, no central point of failure, no central policies, no need for permission. Use existing … social structures and Web structures.
8. Data flow [diagram]. Labels: Web Data Apps; Inference (Knowledge discovery); Linked Data; Semantic data; Data in SQL DBs; Data in Spreadsheets; Crowd-sourced data; XML Data; Raw Sensor Data; etc. … [Adapted from Sandro Hawke (W3C), Introduction to Linked Data, 2010]
9. Linked Data cloud growth: as of May 2007
10. Linked Data cloud growth: as of February 2008
11. Linked Data cloud growth: as of July 2009
12. Linked Data cloud growth: as of September 2010 [Richard Cyganiak, Anja Jentzsch, Linking Open Data cloud diagram, 2011]
14. Linked Data principles: (1) Enable linking: good Website design. (2) Publish semantic (meta)data: export data in RDF. (3) Use real (live) URIs: the heart of Linked Data.
15. 1. Enable linking. Put your information on your website. Invest in content management. Consider offering public APIs. Use good URLs: readable, unambiguous … even in 10 years. Compare http://example.com/index.php?123456 with http://example.com/about/staff/john-doe. Have a URL survival plan. Support … caching and content negotiation. [Adapted from Sandro Hawke (W3C), Introduction to Linked Data, 2010]
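Content negotiation lets one URL serve HTML to browsers and RDF to data clients. The following is a minimal sketch of the selection step only, assuming a server that offers two representations; the `AVAILABLE` table, the file names in it, and the fallback to HTML are illustrative assumptions, not something the slides specify.

```python
def parse_accept(header: str) -> list[tuple[str, float]]:
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1.0 when the client does not state it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)


# Hypothetical representations of one resource (file names are made up).
AVAILABLE = {"text/html": "john-doe.html", "text/turtle": "john-doe.ttl"}


def negotiate(accept_header: str, default: str = "text/html") -> str:
    """Return the media type to serve for the given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in AVAILABLE:
            return media_type
        if media_type == "*/*":
            return default
    # Assumption: fall back to HTML instead of answering 406 Not Acceptable.
    return default
```

A data client sending `Accept: text/turtle` would get the Turtle file, while a typical browser header would still resolve to HTML.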
16. 2. Publish semantic data. Semantic data means RDF. Provide comprehensive information for all items: for each question about the item … provide an answer (Premier? Nickname? Capital?).
17. 2. Publish semantic data: RDF. RDF (Resource Description Framework) is the knowledge representation language for the Semantic Web; it adds structured information to Web resources. RDF as a data model: statements about Web resources; triples of Subject – Predicate – Object; a graph on URIs.
18. 2. Publish semantic data: URIs. A URI (Uniform Resource Identifier) is a compact sequence of characters … that identifies an abstract or physical resource [RFC3986]. URI ≈ URL, e.g., http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001. In the Linked Data world: URI references.
19. 2. Publish semantic data: RDF (cont.). RDF triples: the Subject is a URI reference or BNode; the Predicate is a URI reference; the Object is a URI reference, BNode, or Literal. RDF “data types”: URI references; BNodes (blank nodes), i.e., nodes without a name; Literals, i.e., actual values of simple (usual) data types, e.g., int, long, string, etc.
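The position rules for triples can be sketched as a small data model. This is an illustration only, not an RDF library; the class names `URIRef`, `BNode`, `Literal`, and `Triple` are chosen here for the example, and the default datatype is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URIRef:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str  # local label only; a blank node has no global name


@dataclass(frozen=True)
class Literal:
    value: str
    # Assumed default for this sketch: plain strings
    datatype: str = "http://www.w3.org/2001/XMLSchema#string"


Subject = Union[URIRef, BNode]
Object = Union[URIRef, BNode, Literal]


@dataclass(frozen=True)
class Triple:
    subject: Subject
    predicate: URIRef
    obj: Object

    def __post_init__(self):
        # Enforce which kinds of term may appear in each position.
        if not isinstance(self.subject, (URIRef, BNode)):
            raise TypeError("subject must be a URI reference or a blank node")
        if not isinstance(self.predicate, URIRef):
            raise TypeError("predicate must be a URI reference")
        if not isinstance(self.obj, (URIRef, BNode, Literal)):
            raise TypeError("object must be a URI reference, blank node, or literal")


book = URIRef("http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001")
creator = Triple(book, URIRef("http://purl.org/dc/terms/creator"), Literal("Tom Heath"))
```

Putting a `Literal` in the subject position raises `TypeError`, which mirrors the rule that only the object may be a literal.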
20. 2. Publish semantic data: RDF Example. [diagram] http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001 with Publisher, Title, Author, Book series, Editor.
21. 2. Publish semantic data: RDF Example (cont.). [diagram] http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001 has a dc:title which is [value shown on slide].
22. 2. Publish semantic data: RDF Example (cont.). [diagram] http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001 has a dc:creator whose name is [value shown on slide]; has another dc:creator whose name is [value shown on slide].
23. 2. Publish semantic data: RDF Example (cont.). [diagram] http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001 has a dc:publisher which is [value shown on slide].
24. 2. Publish semantic data: RDF Example (cont.). [diagram] http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001 is part of a bibo:bkseries which is [value shown on slide].
25. 2. Publish semantic data: RDF Example (cont.). [diagram] http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001 has a dc:publisher which is [value shown on slide]; is part of a bibo:bkseries which is [value shown on slide].
26. 2. Publish semantic data: RDF Example (cont.)
    @prefix dc: <http://purl.org/dc/terms/> .
    @prefix bibo: <http://purl.org/bibo/> .
    @prefix : <http://www.morganclaypool.com/book/> .
    :S00334ED1V01Y201102WBE001 dc:title "Linked Data …" .
    :S00334ED1V01Y201102WBE001 dc:creator "Tom Heath" .
    :S00334ED1V01Y201102WBE001 dc:creator "Christian Bizer" .
    :S00334ED1V01Y201102WBE001 dc:publisher "Morgan & Claypool" .
    :S00334ED1V01Y201102WBE001 bibo:bkseries "Synthesis Lectures …" .
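The prefixes in the example above are only shorthand: each prefixed name stands for a full URI. A minimal sketch of that expansion, using the prefix table from the slide (the helper names `expand` and `to_ntriples` are made up for this illustration, and only plain string literals are handled):

```python
# Prefix table copied from the slide; "" is the default (":") prefix.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/bibo/",
    "": "http://www.morganclaypool.com/book/",
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'dc:title' into a full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local


def to_ntriples(subject: str, predicate: str, literal: str) -> str:
    """Write one statement with a plain string object in N-Triples form."""
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{expand(subject)}> <{expand(predicate)}> "{escaped}" .'


line = to_ntriples(":S00334ED1V01Y201102WBE001", "dc:creator", "Tom Heath")
```

Here `line` holds the fully expanded statement, with both the subject and the predicate written as absolute URIs in angle brackets.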
31. 2. Publish semantic data: Ontologies (cont.). Ontologies: a shared conceptualization of a domain; a classification of domain concepts; meaningful relations between domain concepts. In essence … a “typed” schema.
32. 2. Publish semantic data: Ontologies (cont.). [diagram] http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001 has a dc:creator, who is a Person, whose name is [value shown on slide]; has another dc:creator, who is a Person, whose name is [value shown on slide].
33. 2. Publish semantic data: Ontologies (cont.). An ontology (e.g., the FOAF ontology) defines the Person class. Class URI: e.g., http://xmlns.com/foaf/0.1/Person. It gives a perspective of the concept via attributes / properties: name, age, gender, … Example: Tom Heath is-a Person; Christian Bizer is-a Person. Tom Heath and Christian Bizer are characterized by the attributes defined in the Person class.
35. 3. Use real (live) URIs (cont.). This looks great: a Web page has a URL, and the content and the metadata are served via the browser. Unfortunately … it didn’t really work. Because …
36. 3. Use real (live) URIs: The problem. So what is the URI of Canberra? Canberra is a city, not a Web page. Canberra was established on 12-03-1913; www.canberra.com.au was created on 10-11-2008. Requirements: don’t mix Canberra with www.canberra.com.au; get data via existing protocols … a challenge. [Adapted from Sandro Hawke (W3C), Introduction to Linked Data, 2010]
37. 3. Use real (live) URIs (cont.). Identifiers are required for things: people, places, companies, products, courses, schools, buildings, plants, concerts, talks, species of plants, songs, musicians, etc. … all the things we have questions about … and all the questions … and all the answers, which become things on their own. [Adapted from Sandro Hawke (W3C), Introduction to Linked Data, 2010]
38. 3. Use real (live) URIs: Solutions. Solution 1 - Hash URIs. In HTML, http://www.w3.org/People/Berners-Lee/#Speaking identifies a section in a webpage. However, fragment semantics are up to the content type … In RDF, the fragment in http://www.w3.org/People/Berners-Lee/card#i can be ANYTHING in the world: it identifies Tim Berners-Lee, not a section in the HTML page.
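Hash URIs work because an HTTP client drops the fragment before it sends the request, so the document and the thing it describes get distinct identifiers that share one retrievable address. A small sketch using the standard library (the helper name `split_hash_uri` is made up for this illustration):

```python
from urllib.parse import urldefrag


def split_hash_uri(uri: str) -> tuple[str, str]:
    """Split a hash URI into the document to fetch and the fragment that names the thing."""
    document, fragment = urldefrag(uri)
    return document, fragment


# The URI from the slide: the document is what HTTP retrieves,
# the fragment is interpreted by the client after retrieval.
document, fragment = split_hash_uri("http://www.w3.org/People/Berners-Lee/card#i")
```

After this call, `document` is the address of the RDF file and `fragment` is the local name (`i`) that, inside that file, identifies the person.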
39. 3. Use real (live) URIs: Solutions (cont.). Solution 2 - Slash URIs. In the browser: GET http://dbpedia.org/resource/Canberra. The server responds with a redirect: 303 See Other, Location: http://dbpedia.org/page/Canberra. Two URIs: a URI for the thing itself (in RDF for machines) and a URI for a Web page (in HTML for humans).
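The 303 pattern can be modelled as a pure function from request path to status and headers. This is a toy model of the exchange on the slide, not DBpedia's actual server logic; the function name `respond` and the 404 branch are assumptions made for the sketch.

```python
def respond(path: str, host: str = "http://dbpedia.org") -> tuple[int, dict]:
    """Model a server that separates 'thing' URIs from 'page' URIs."""
    if path.startswith("/resource/"):
        # The thing itself cannot be sent over HTTP, so point at a document about it.
        name = path[len("/resource/"):]
        return 303, {"Location": f"{host}/page/{name}"}
    if path.startswith("/page/"):
        # An ordinary Web page: answer directly.
        return 200, {"Content-Type": "text/html"}
    return 404, {}


status, headers = respond("/resource/Canberra")
```

A client that follows the `Location` header then requests `/page/Canberra` and receives a normal 200 response.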
40. One last thing … integration. Following the principles … create semantic data … publish semantic data … how about integration / linking? Solutions: re-using established URIs; explicit entity consolidation (via URIs); SPARQL queries.
41. Explicit entity consolidation (via URIs). [diagram] http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001 has a dc:creator whose name is [value shown on slide]; has another dc:creator whose name is [value shown on slide]. http://tomheath.com/id/me ? http://data.semanticweb.org/person/tom-heath
42. Explicit entity consolidation (via URIs). Linking similar concepts published in different datasets. In principle, two types of relations: owl:sameAs – strong assertion! rdfs:seeAlso – weak (loose) relation.
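Because owl:sameAs is a strong assertion, a consumer may merge every URI connected by it into one entity. One common way to compute those groups is union-find; the sketch below is an illustration of that idea and not a method prescribed by the slides. The function name `consolidate` is made up.

```python
def consolidate(same_as_pairs):
    """Group URIs into equivalence classes from (a, b) owl:sameAs statements."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# The two URIs for Tom Heath from the consolidation slide.
tom_heath = consolidate([
    ("http://tomheath.com/id/me", "http://data.semanticweb.org/person/tom-heath"),
])
```

rdfs:seeAlso links would not be fed into this function, since a loose relation does not justify merging the two resources.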
43. SPARQL queries. On-the-fly integration. SPARQL: a query language for semantic data (RDF), across diverse datasets … similar to SQL from the DB world.
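At its core, a SPARQL query matches triple patterns with variables against the data and joins the results. The toy matcher below illustrates that idea in plain Python; it is not a SPARQL engine, and the foaf:name triple is hypothetical data added so the join has something to find. In SPARQL the same question would look roughly like `SELECT ?name WHERE { ?book dc:creator ?person . ?person foaf:name ?name }`.

```python
DC_CREATOR = "http://purl.org/dc/terms/creator"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
BOOK = "http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001"
TOM = "http://tomheath.com/id/me"

# Triples as plain tuples; the second one is hypothetical example data.
TRIPLES = [
    (BOOK, DC_CREATOR, TOM),
    (TOM, FOAF_NAME, "Tom Heath"),
]


def match(triples, pattern):
    """Yield variable bindings for one pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.get(term, value) != value:
                    break  # same variable bound to two different values
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding


def query(triples, patterns):
    """Join several patterns, carrying bindings from one pattern to the next."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for solution in solutions:
            # Substitute already-bound variables before matching.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(triples, bound):
                next_solutions.append({**solution, **binding})
        solutions = next_solutions
    return solutions


results = query(TRIPLES, [("?book", DC_CREATOR, "?person"), ("?person", FOAF_NAME, "?name")])
```

The shared variable `?person` is what joins the two patterns, much as a foreign key joins two tables in SQL.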
45. Linked Data principles - Recap: (1) Enable linking. (2) Publish semantic data. (3) Use real (live) URIs.
46. Open Public Sector Information - Draft principles. Open access to information: free; based on open standards; easily discoverable; understandable; machine-readable; freely reusable and transformable. Robust information asset management frameworks: adequately describing information assets using appropriate metadata; documenting known limitations on data quality and caveats on data use; preserving the agency's information assets for appropriate periods of time. Slide annotations: RDF – Semantic Data; Live URIs + Linking enabled; Quality; Availability; Preservation.
47. Open Public Sector Information - Draft principles. Findable information: ensuring that published information has high quality metadata …; publishing the agency's information asset register to enable […] to identify the available information resources from a single source. Open and accessible formats online: open; machine-readable; searchable and indexable by commonly used Web search applications. Slide annotations: Live URIs + Linking enabled; Linked Data Web!
48. In practice …
49. RDF … a “tool” for agents. RDF annotations often express metadata … usually stored in a separate .rdf file; useful for agents, of limited use for humans.
50. Data for the Web of Data. Apart from relational DBs, most of the Web content is in … (X)HTML. New content is generated on a daily basis. Question: how do we get structured data from this content? [Adapted from Ivan Herman (W3C), RDFa, 2011]
51. The “traditional” Web. Authors create HTML content … and do not generate individual and separate RDF/XML files. Reasons? RDF/XML is complex. It requires separate storage and generation. In any situation … it represents an overhead. [Adapted from Ivan Herman (W3C), RDFa, 2011]
52. Solution … RDFa. Add extra structured content to (X)HTML pages. Allow processors to extract this information and turn it into RDF. RDFa … a “tool” for agents and humans. [Adapted from Ivan Herman (W3C), RDFa, 2011]
54. RDFa. RDFa = RDF in attributes. It marks up data in Web pages, encodes RDF triples in (X)HTML, and represents a complete serialization of RDF. Mechanism: adds extra attributes in (X)HTML; uses namespaces and URIs. Integration is just as easy as in RDF.
55. RDFa example – HTML
    <html>
      <head> … </head>
      <body>
        <p> Morgan &amp; Claypool Publishers </p>
        <p>
          Linked Data<br/>
          Evolving the Web into a Global Data Space <br/>
          Tom Heath <br/>
          Christian Bizer
        </p>
        <p>
          Synthesis lectures on The Semantic Web: Theory and Technology <br/>
          James Hendler, Series Editor
        </p>
      </body>
    </html>
56. RDFa example – RDFa
    <html prefix="dc: http://purl.org/dc/terms/ bibo: http://purl.org/bibo/" base="http://www.morganclaypool.com/book/S00334…WBE001">
      <head> … </head>
      <body>
        <p about="">
          <span property="dc:publisher">Morgan &amp; Claypool Publishers</span>
        </p>
        <p about="">
          <span property="dc:title">Linked Data</span><br/>
          <span property="dc:title">Evolving the Web into a Global Data Space</span><br/>
          <span property="dc:creator">Tom Heath</span> <br/>
          <span property="dc:creator">Christian Bizer</span>
        </p>
        <p about="">
          <span property="bibo:bkseries">Synthesis lectures on The Semantic Web: Theory and Technology</span><br/>
          James Hendler, Series Editor
        </p>
      </body>
    </html>
57. In essence … The same (X)HTML file containing RDFa is used … unchanged … by browsers: browsers ignore attributes they don’t recognize. It can be used by specialized processors to extract RDF.
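What such a specialized processor does can be hinted at with a heavily simplified extractor. This sketch only understands `about` and `property` with text content, leaves prefixed names unexpanded, and ignores the rest of the RDFa processing rules, so it is an illustration and not a conforming RDFa parser; the class and function names are made up.

```python
from html.parser import HTMLParser


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, text) triples from about/property attributes."""

    def __init__(self, base: str):
        super().__init__()
        self.base = base
        self.subject = base
        self.triples = []
        self._capture = None  # (tag, property) while inside a property element
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            # An empty about="" refers to the document itself, i.e. the base.
            self.subject = attrs["about"] or self.base
        if "property" in attrs and self._capture is None:
            self._capture = (tag, attrs["property"])
            self._text = []

    def handle_data(self, data):
        if self._capture is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._capture is not None and tag == self._capture[0]:
            value = " ".join("".join(self._text).split())  # normalize whitespace
            self.triples.append((self.subject, self._capture[1], value))
            self._capture = None


def extract(html: str, base: str):
    parser = TinyRDFaExtractor(base)
    parser.feed(html)
    parser.close()
    return parser.triples


BASE = "http://www.morganclaypool.com/book/S00334ED1V01Y201102WBE001"
SNIPPET = (
    '<p about=""><span property="dc:title">Linked Data</span><br/>'
    '<span property="dc:creator">Tom Heath</span></p>'
)
triples = extract(SNIPPET, BASE)
```

A browser rendering `SNIPPET` shows only the text, while the extractor recovers one triple per `property` attribute with the page's base as subject.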
59. Raw datasets. Great … but not really there yet. APIs enable access … but do not necessarily enable linking … it is still a step ahead though …
60. Raw datasets (cont.): Australian Government
61. Raw datasets (cont.): US Government
62. Linked Data datasets: DBpedia. A community effort that extracts structured information from Wikipedia infoboxes. Core of the Linked Data Web.
64. Linked Data datasets: BBC (Nature, Programs, Music)
65. Linked Data datasets (cont.): New York Times. As of January 2010 … 10,000 subject headings: ~3,000 organizations, ~5,000 people, ~2,000 locations.
66. Linked Data datasets (cont.): UK Government – data.gov.uk. 6,900 datasets. Dept. for Business, Innovation and Skills; Dept. for Transport; Dept. of Health.
67. Linked Data datasets (cont.)
68. Storage support. Virtuoso: RDF triple store, SPARQL access (http://virtuoso.openlinksw.com/). Talis: SaaS, cloud-based RDF storage, SPARQL access, REST APIs (http://www.talis.com/platform). Sesame: RDF triple store, SPARQL access (http://www.openrdf.org/).
69. Indexing, APIs & Frameworks. Sindice.com & Sig.ma: the Semantic Web index; large-scale data acquisition; data discovery & mash-up; synchronization. RDF APIs: Jena, ARC2, Any23. Frameworks: SILK, PAGET.
70. RDFa adopters: Facebook. RDFa in the Open Graph Protocol (April 2010). Any web page is linked in the graph as a node. Simple markup.
71. RDFa adopters [Adapted from Knud Moeller (DERI Galway), RDFa everywhere, 2010]
72. RDFa adopters
73. RDFa adopters
74. RDFa adopters
75. RDFa adopters
    <div typeof="austlit:Work" id="CMCg" about="#CMCg" property="austlit:topicID" content="CMCg">
      <span rel="austlit:hasTitle">
        <span typeof="austlit:Title" id="SPJi" about="#SPJi" property="austlit:title"> The Drover’s Wife </span>
      </span>
      <span rel="austlit:form">
        <span typeof="austlit:Form" about="http://austlit.../ns#i$" property="austlit:topicName"> short story </span>
      </span>
76. Drupal 7!
77. Drupal 7! Widely adopted open source CMS: ING, Tesla Motors, Garmin, eBay, US House of Representatives, Yahoo! Style Guide, FOI@DEEWR, … RDFa support out of the box! Common content types are mapped to concepts from Dublin Core, FOAF, or SIOC. Without any technical knowledge … you’re publishing Linked Data!
78. Get on the Linked Data Web! Armin Haller, CSIRO ICT Canberra, W3C Office Australia. Contact: armin@w3.org. Tudor Groza, The University of Queensland. Contact: tudor.groza@uq.edu.au. Meta 2011, Canberra, 26 May 2011.