
Graph databases & data integration - the case of RDF


  1. Graph databases & data integration: The case of RDF
     By Dimitris Kontokostas (AKSW/KILT - Leipzig, DBpedia Association)
     Thessaloniki Java Meetup / 09.05.2016
  2. About me
     ● I live in Veria
     ● I am an ex-ICT teacher
     ● Since 2003 I have been working mainly on R&D projects
        ○ + some web development
     ● Since 2012 I have been doing a PhD & working in the AKSW group in Leipzig
        ○ Focusing on semantic web technologies (RDF, SPARQL, and many other scary terms)
        ○ aka Knowledge Engineer
     ● I am an open source enthusiast (DBpedia, RDFUnit)
     ● Recently became a W3C specification editor for SHACL
     ● Walked across many languages but ended up in Scala, Java & Bash
        ○ With Bash / CLI as a first choice ;)
  3. Before we start… who knows?
     ● LOD Cloud
     ● Linked Data
  4. Agenda (*)
     ● Graphs
     ● RDF Graphs
     ● Data integration
     ● Who uses RDF
     ● Quick overview of:
        ○ DBpedia
        ○ SPARQL
        ○ RelFinder
        ○ Schema.org & actions
        ○ JSON-LD
        ○ Entity disambiguation
        ○ Data Quality
     (*) Focusing mostly on getting familiar with basic terms and concepts
     (**) Apologies in advance for mixing Greek with English
  6. The four V’s heatmap for Graph Databases
     A 2013 study found:
     ● Many organizations find the “variety” dimension a greater challenge than volume or velocity.
     Graph DBs to the rescue:
     ● Combine multiple sources with different structures
     ● Retain the flexibility to add new sources without adapting schemas
     ● Query combined data, or multiple sources at once
     ● Detect patterns in the data
  7. © Image by Max De Margi
  8. Graphs
     ● A graph is a way of specifying relationships among a collection of items
     ● Items
        ○ Nodes - Alice, Bob, …
        ○ Edges
           ■ undirected - knows, …
           ■ directed - follows, …
        ○ Values - weights, distances, scores, 0-5 scale, …
        ○ Attributes - name, time, ...
  9. Graph Data Models
     Property graphs
     ● Industry standards
        ○ Neo4j, Titan, Apache TinkerPop, ...
        ○ Application-specific ways of querying, exporting, importing, etc.
        ○ Optimized for specific operations and in many cases faster
     RDF Graphs
     ● W3C standards
        ○ Like XML / HTML: define once, run everywhere ™
        ○ Standardised way of querying, exporting, importing
  10. Property Graphs
      ● Each node has a
         ○ unique identifier.
         ○ set of outgoing edges.
         ○ set of incoming edges.
         ○ collection of key-value properties.
      ● Each edge
         ○ is directed.
         ○ has a unique identifier.
         ○ has a label that denotes the type of relationship between its source and target nodes.
         ○ has a collection of key-value properties.
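The property-graph model on this slide can be sketched with two small record types. This is only an illustration in Python, not the API of Neo4j, Titan or TinkerPop; the names `Node`, `Edge` and `connect` and the sample data are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A node: unique id, key-value properties and its incident edges."""
    id: str
    properties: dict = field(default_factory=dict)
    outgoing: list = field(default_factory=list)
    incoming: list = field(default_factory=list)


@dataclass(eq=False)
class Edge:
    """A directed, labelled edge with its own id and properties."""
    id: str
    label: str
    source: Node
    target: Node
    properties: dict = field(default_factory=dict)


def connect(edge_id, label, source, target, **properties):
    """Create an edge and register it on both of its endpoints."""
    edge = Edge(edge_id, label, source, target, dict(properties))
    source.outgoing.append(edge)
    target.incoming.append(edge)
    return edge


# Hypothetical sample data.
alice = Node("n1", {"name": "Alice"})
bob = Node("n2", {"name": "Bob"})
follows = connect("e1", "follows", alice, bob, since=2016)
```

Note that both nodes and edges carry properties here, which is the main structural difference from the RDF triple model introduced on the next slide.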
  11. RDF - Resource Description Framework
      ● An RDF Graph is a set of RDF Triples
      ● An RDF triple consists of (only) three components:
         ○ the subject (is an IRI)
         ○ the predicate (is an IRI)
         ○ the object (can be an IRI or Literal)
         ○ (subjects and objects can also be blank nodes but let’s leave it for now)
      [Figure: example graph. Nodes: http://dbpedia.org/resource/Java, http://dbpedia.org/resource/C++, http://dbpedia.org/resource/C# and the literal “1.8.0_60”. Edges: dbo:latestReleaseVersion, dbo:influencedBy (twice). Legend: Subject, Predicate, Object]
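A minimal way to picture "a set of triples" is a Python set of 3-tuples. The sketch below is an illustration only (real RDF libraries such as Apache Jena offer far richer models); the `Literal` wrapper and the `objects` helper are hypothetical names, and the direction of the `dbo:influencedBy` edge is assumed for the example.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Wraps a literal value so it cannot be mistaken for an IRI string."""
    value: str


DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

# Each element is one (subject, predicate, object) triple.
graph = {
    (DBR + "Java", DBO + "latestReleaseVersion", Literal("1.8.0_60")),
    # Edge direction assumed for illustration.
    (DBR + "Java", DBO + "influencedBy", DBR + "C++"),
}

# A set ignores duplicates, so stating the same triple twice changes nothing.
graph.add((DBR + "Java", DBO + "influencedBy", DBR + "C++"))


def objects(graph, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


versions = objects(graph, DBR + "Java", DBO + "latestReleaseVersion")
```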
  12. RDF is an abstract data model
      Turtle
         @prefix dbo: <http://dbpedia.org/ontology/> .
         @prefix ex: <http://example.com/> .
         ex:Dimitris a dbo:Person .
      N-Triples (no prefixes and no "a" shorthand, so rdf:type is written out in full)
         <http://example.com/Dimitris> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
      JSON-LD
         {
           "@id": "http://example.com/Dimitris",
           "@type": "http://dbpedia.org/ontology/Person"
         }
      RDF/XML
         <rdf:Description rdf:about="http://example.com/Dimitris">
           <rdf:type rdf:resource="http://dbpedia.org/ontology/Person"/>
         </rdf:Description>
      RDFa (embedded in HTML)
         <div xmlns="http://www.w3.org/1999/xhtml"
              prefix="rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
                      dbo: http://dbpedia.org/ontology/
                      rdfs: http://www.w3.org/2000/01/rdf-schema#">
           <div typeof="dbo:Person" about="http://example.com/Dimitris"> </div>
         </div>
  13. RDF & Graphs (separate)
      File1.ttl
         @prefix foaf: <http://xmlns.com/foaf/0.1/> .
         @prefix ex: <http://example.com/> .
         ex:Dimitris foaf:knows ex:Petros .
      File2.ttl
         @prefix foaf: <http://xmlns.com/foaf/0.1/> .
         @prefix ex: <http://example.com/> .
         ex:Dimitris a foaf:Person .
         ex:Petros a foaf:Person .
      File3.ttl
         @prefix foaf: <http://xmlns.com/foaf/0.1/> .
         @prefix dbpedia: <http://dbpedia.org/resource/> .
         @prefix ex: <http://example.com/> .
         ex:Dimitris foaf:interest dbpedia:RDF .
         ex:Petros foaf:interest dbpedia:Cassandra .
  14. RDF & Graphs (merge)
      File_all.ttl
         @prefix foaf: <http://xmlns.com/foaf/0.1/> .
         @prefix ex: <http://example.com/> .
         ex:Dimitris foaf:knows ex:Petros .
         ex:Dimitris a foaf:Person .
         ex:Petros a foaf:Person .
         @prefix dbpedia: <http://dbpedia.org/resource/> .
         ex:Dimitris foaf:interest dbpedia:RDF .
         ex:Petros foaf:interest dbpedia:Cassandra .
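Because an RDF graph is a set of triples, merging the three files from the previous slide amounts to a set union. The sketch below shows this with plain Python sets; prefixed names are kept as short strings for readability, `rdf:type` stands for the Turtle keyword `a`, and the `describe` helper is a hypothetical name. It ignores blank nodes, which need extra care when graphs are merged.

```python
# One set of (subject, predicate, object) tuples per source file.
relations = {("ex:Dimitris", "foaf:knows", "ex:Petros")}
types = {
    ("ex:Dimitris", "rdf:type", "foaf:Person"),
    ("ex:Petros", "rdf:type", "foaf:Person"),
}
interests = {
    ("ex:Dimitris", "foaf:interest", "dbpedia:RDF"),
    ("ex:Petros", "foaf:interest", "dbpedia:Cassandra"),
}

# Merging graphs is a set union: no schema changes, no join keys to design.
merged = relations | types | interests


def describe(graph, subject):
    """List every (predicate, object) pair stated about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


about_petros = describe(merged, "ex:Petros")
```

After the merge, a single lookup on `ex:Petros` returns facts that originally lived in two different files.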
  15. RDF & Graphs (dataset / multi-graph)
      TriG (.trig) file, with the prefixes declared once outside the graph blocks
         @prefix foaf: <http://xmlns.com/foaf/0.1/> .
         @prefix dbpedia: <http://dbpedia.org/resource/> .
         @prefix ex: <http://example.com/> .
         <http://example.com/relations-graph> {
           ex:Dimitris foaf:knows ex:Petros .
         }
         <http://example.com/types-graph> {
           ex:Dimitris a foaf:Person .
           ex:Petros a foaf:Person .
         }
         <http://example.com/interests-graph> {
           ex:Dimitris foaf:interest dbpedia:RDF .
           ex:Petros foaf:interest dbpedia:Cassandra .
         }
  16. RDF & Linked Data
      ● Using HTTP(S)-based IRIs we get the Web of Data
         ○ See the TED talk by Tim Berners-Lee (creator of the WWW)
      ● Every RDF resource becomes like a REST GET API that returns all the RDF triples it is associated with
         ○ Content negotiation for RDF (machine) or HTML (human)
         ○ Follow-your-nose pattern
      [Figure: the example graph extended across datasets. Nodes: http://dbpedia.org/resource/Java, http://dbpedia.org/resource/C++, http://dbpedia.org/resource/C#, http://aksw.org/DimitrisKontokostas, http://www.geonames.org/733905/ and the literals “1.8.0_60”, 40.52437, 22.20242. Edges: dbo:latestReleaseVersion, dbo:influencedBy (twice), ex:learns, dbo:birthPlace, geo:lat, geo:long]
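Content negotiation can be sketched with the Python standard library: the same IRI is requested, and the `Accept` header tells the server whether RDF or HTML is wanted. This is a hedged illustration; the function names are hypothetical, and it is an assumption that a given server offers `text/turtle` for the IRI used here.

```python
from urllib.request import Request, urlopen


def rdf_request(iri, media_type="text/turtle"):
    """Build a GET request that asks the server for an RDF serialization."""
    return Request(iri, headers={"Accept": media_type})


def fetch_rdf(iri, media_type="text/turtle", timeout=10):
    """Dereference the IRI; urlopen follows redirects to the data document."""
    with urlopen(rdf_request(iri, media_type), timeout=timeout) as response:
        content_type = response.headers.get("Content-Type")
        return content_type, response.read().decode("utf-8")


# Building the request does not touch the network; fetch_rdf() does.
request = rdf_request("http://dbpedia.org/resource/Java")
```

Calling `fetch_rdf` on an IRI found in the returned triples, and repeating the step, is the follow-your-nose pattern mentioned on the slide.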
  17. LOD Cloud
      ● >1K datasets
      ● >50B triples
      ● >100M links
  18. Vocabularies & Semantics
      ● Vocabularies/Ontologies define classes and predicates (properties) in RDF
         ○ ex:Dimitris a dbo:Person
         ○ ex:Dimitris dbo:birthDate “1981-06-06”^^xsd:date
      ● Existing vocabularies capture many use cases
         ○ DBpedia ontology (general purpose)
         ○ Schema.org (general purpose / new, backed by Google, Yahoo, Bing & Yandex)
         ○ FOAF (Friend of a Friend)
         ○ Geo (geographical)
         ○ PROV-O (data provenance)
         ○ SKOS (classifications)
         ○ Org (organization structure)
         ○ … http://lov.okfn.org lists more than 400 vocabularies
  19. Vocabularies & Semantics
      ● Classes and predicates (properties) have definitions (semantics)
      ● ex:Dimitris a dbo:Person
         ○ dbo:Person belongs to a class hierarchy
      ● ex:Dimitris dbo:birthDate “1981-06-06”^^xsd:date
         ○ dbo:birthDate expects a dbo:Person as subject
         ○ dbo:birthDate expects an xsd:date as object
      ● Reusing existing vocabularies (classes & properties) with defined semantics is a good practice
         ○ Get part of the data modeling for free
         ○ Using common terms makes it easier to integrate data
         ○ Validation (or inference) for free
            ■ ex:Thessaloniki dbo:birthDate “1981-06-06”^^xsd:date (is Thessaloniki a Person?)
            ■ ex:Dimitris dbo:birthDate ex:Thessaloniki (ex:Thessaloniki is not an xsd:date)
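The two invalid statements on this slide can be caught mechanically once the expected subject class and object datatype of a property are known. The sketch below is a simplified, closed-world check written for illustration; it is not how RDFUnit or SHACL are implemented. The `SCHEMA` and `TYPES` tables are hypothetical excerpts, the type `dbo:City` for `ex:Thessaloniki` is an assumption, and typed literals are modelled as `(lexical form, datatype)` tuples.

```python
# Hypothetical excerpt of vocabulary semantics:
# property -> (expected subject class, expected object datatype)
SCHEMA = {"dbo:birthDate": ("dbo:Person", "xsd:date")}

# Known rdf:type statements (ex:Thessaloniki's type is an assumption).
TYPES = {"ex:Dimitris": {"dbo:Person"}, "ex:Thessaloniki": {"dbo:City"}}


def violations(triple):
    """Return human-readable problems for one (subject, predicate, object)."""
    subject, predicate, obj = triple
    if predicate not in SCHEMA:
        return []  # nothing is known about this property
    expected_class, expected_datatype = SCHEMA[predicate]
    problems = []
    if expected_class not in TYPES.get(subject, set()):
        problems.append(f"{subject} is not a {expected_class}")
    # Literals are (lexical form, datatype) tuples; IRIs are plain strings.
    datatype = obj[1] if isinstance(obj, tuple) else None
    if datatype != expected_datatype:
        problems.append(f"object of {predicate} is not an {expected_datatype}")
    return problems


ok = violations(("ex:Dimitris", "dbo:birthDate", ("1981-06-06", "xsd:date")))
bad_subject = violations(("ex:Thessaloniki", "dbo:birthDate", ("1981-06-06", "xsd:date")))
bad_object = violations(("ex:Dimitris", "dbo:birthDate", "ex:Thessaloniki"))
```

An inference engine would read the same definitions differently: instead of reporting the first bad statement, it would conclude that `ex:Thessaloniki` is a `dbo:Person`. That is why the slide says "validation (or inference)".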
  20. Data integration with RDF
      ● Very simple graph data model
      ● Convert your data to RDF and model against common vocabularies
         ○ Design applications against vocabularies
         ○ Integrate multiple different sources
      ● Local identifiers are a common integration problem
      ● Link to data authorities
         ○ ex:Dimitris dbo:birthPlace geonames:733905 (replacing the local identifier ex:Veria)
         ○ (or) ex:Veria owl:sameAs geonames:733905
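One way to use `owl:sameAs` links during integration is to rewrite local identifiers to the identifier of the data authority before merging. The sketch below does this with a small union-find structure; it is an illustration under simplifying assumptions (pairs are given as `(local, authority)`, and the function names and sample triples are hypothetical), not the behaviour of any particular triple store.

```python
def canonical_ids(same_as_pairs):
    """Map each identifier to the representative of its owl:sameAs group."""
    parent = {}

    def find(identifier):
        parent.setdefault(identifier, identifier)
        while parent[identifier] != identifier:
            # Path halving keeps later lookups short.
            parent[identifier] = parent[parent[identifier]]
            identifier = parent[identifier]
        return identifier

    for local, authority in same_as_pairs:
        root_local, root_authority = find(local), find(authority)
        if root_local != root_authority:
            parent[root_local] = root_authority  # prefer the authority's id
    return {identifier: find(identifier) for identifier in list(parent)}


def rewrite(graph, canon):
    """Replace subjects and objects by their canonical identifier."""
    return {(canon.get(s, s), p, canon.get(o, o)) for s, p, o in graph}


links = [("ex:Veria", "geonames:733905")]
local_data = {("ex:Dimitris", "dbo:birthPlace", "ex:Veria")}
authority_data = {("geonames:733905", "geo:lat", "40.52437")}  # illustrative

canon = canonical_ids(links)
linked = rewrite(local_data | authority_data, canon)
```

After the rewrite, the birthplace of `ex:Dimitris` and the coordinates from the authority hang off the same node, so a single graph traversal reaches both.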
  21. Pay as you go Data Integration
      ● RDF views on top of an RDBMS (e.g. MySQL) with R2RML (W3C spec)
         ○ Mapping files define how SQL queries / tables translate to RDF
         ○ Queryable through a virtual SPARQL endpoint that translates SPARQL to SQL
      ● Convert XML/JSON/CSV/… to RDF with RML.io using mapping files
      ● Find links to external databases with Limes & Silk
         ○ e.g.: ex:Veria owl:sameAs geonames:733905
      ● You can get some benefit with low effort
      ● The more time you invest, the better the results
      ● (Common practice) work on secondary RDF views of your data
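The idea behind a relational-to-RDF mapping can be sketched in a few lines: a query selects rows, a template turns each row into a subject IRI, and each mapped column becomes a predicate. The code below only mimics that idea with a Python dictionary and an in-memory SQLite table; real R2RML mappings are themselves written in RDF, and the table, the `MAPPING` structure and the `ex:city` predicate are hypothetical.

```python
import sqlite3

# Hypothetical mapping: a query, a subject template and column -> predicate.
MAPPING = {
    "query": "SELECT id, name, city FROM person ORDER BY id",
    "subject": "http://example.com/person/{id}",
    "predicates": {"name": "foaf:name", "city": "ex:city"},
}


def rows_to_triples(connection, mapping):
    """Yield one triple per mapped, non-NULL column of every result row."""
    cursor = connection.execute(mapping["query"])
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = mapping["subject"].format(**record)
        for column, predicate in mapping["predicates"].items():
            if record[column] is not None:  # a NULL value yields no triple
                yield (subject, predicate, record[column])


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE person (id INTEGER, name TEXT, city TEXT)")
connection.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Dimitris", "Veria"), (2, "Petros", None)],
)
triples = list(rows_to_triples(connection, MAPPING))
```

The source table stays untouched, which matches the "secondary RDF view" practice from the slide: the mapping can start small and grow column by column.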
  22. Who uses RDF (in public)
      https://github.com/json-ld/json-ld.org/wiki/Users-of-JSON-LD
  23. Some More Statistics
      ● Based on the Common Crawl of Nov 2015
      ● 30% of HTML pages (541M / 1.77B pages) contained structured data.
      ● This 30% originates from 2.72M different pay-level domains out of the 14.41M pay-level domains covered by the crawl (19%).
         ○ 521K websites use RDFa
         ○ 1.1M use Microdata
         ○ 586K have embedded JSON-LD (mostly for search actions)
      ● Altogether, the extracted data sets consist of 24.38 billion RDF quads.
      http://webdatacommons.org/structureddata/2015-11/stats/stats.html#results-2015-1
  24. DBpedia
      Let’s look at John Cleese (Monty Python)
  25. SPARQL
      “Which films starred John Cleese without any other members of Monty Python?”
      SPARQL examples by Markus Ackermann & Markus Freudenberg
  27. Basic Graph Pattern
  29. Graph Group Pattern
  31. Filtering Unwanted Results
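The query slides are images and their SPARQL text is not part of this transcript, so the following is not the presenters' query. It is a small Python sketch of what a SPARQL engine does conceptually for the John Cleese question: match a basic graph pattern (triple patterns with `?variables`) against the data, then filter out unwanted results. The toy graph, the film identifiers and the `PYTHONS` set are hypothetical and are not DBpedia results.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solve(patterns, graph, binding=None):
    """Yield every binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from solve(rest, graph, extended)


# Hypothetical toy data.
GRAPH = {
    ("ex:FilmA", "dbo:starring", "ex:John_Cleese"),
    ("ex:FilmB", "dbo:starring", "ex:John_Cleese"),
    ("ex:FilmB", "dbo:starring", "ex:Eric_Idle"),
}
PYTHONS = {"ex:John_Cleese", "ex:Eric_Idle"}


def has_other_python(film):
    """True if the film stars a group member other than John Cleese."""
    cast = solve([(film, "dbo:starring", "?actor")], GRAPH)
    return any(b["?actor"] in PYTHONS - {"ex:John_Cleese"} for b in cast)


films = {b["?film"] for b in solve([("?film", "dbo:starring", "ex:John_Cleese")], GRAPH)}
answer = sorted(film for film in films if not has_other_python(film))
```

In SPARQL the filtering step would typically be written with `FILTER NOT EXISTS` or `MINUS` rather than a separate function.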
  33. RelFinder demo (Flash)
  34. Schema.org
      ● Vocabulary backed by the major search engines
      ● RDF data model
         ○ Normative format is JSON-LD
         ○ RDF is not actively mentioned (to not scare people away)
         ○ Allows use as general structured data (e.g. microdata)
      ● Enriches a lot of (at least) Google’s applications
         ○ Search (try e.g. recipes)
         ○ Gmail (travel, events, actions, ...)
         ○ Google Now
         ○ Google Knowledge Graph
         ○ ...
  35. Schema.org actions
  36. JSON-LD
      ● Like normal JSON but better ;)
  37. JSON-LD
      ● Like normal JSON but better ;)
      ● @context makes the difference
      ● Append your own context
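What `@context` adds to plain JSON can be shown with a deliberately simplified expansion step: short keys are replaced by the IRIs the context assigns to them. The sketch below handles only flat documents with string term definitions and is not the full JSON-LD expansion algorithm (which also produces arrays and `@value` objects, and can treat values as IRIs via `"@type": "@id"`); the `expand` function and the sample document are assumptions made for the example.

```python
import json


def expand(document):
    """Replace short keys with the IRIs given in @context (simplified)."""
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        # Keywords such as @id are not in the context and keep their name.
        expanded[context.get(key, key)] = value
    return expanded


# Ordinary-looking JSON plus a context that gives each key a global meaning.
doc = json.loads("""
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "knows": "http://xmlns.com/foaf/0.1/knows"
  },
  "@id": "http://example.com/Dimitris",
  "name": "Dimitris",
  "knows": "http://example.com/Petros"
}
""")
expanded = expand(doc)
```

An existing JSON API can keep its payload unchanged and attach such a context, which is the "append your own context" point on the slide.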
  38-40. JSON-LD (example slides)
  41. JSON-LD links
      ● Previous examples
      ● JSON-LD specification & playground
      ● Hypermedia self-described APIs with Hydra
  42. Entity disambiguation
      aka NERD (Named Entity Resolution & Disambiguation)
      ● George Bush is sitting in front of the White House
         ○ George: some George?
         ○ Bush: a small plant
         ○ George Bush: former president of the USA
         ○ White: colour
         ○ House: a house
         ○ White House:
      ● http://dbpedia-spotlight.github.io/demo/
  43. Data Quality
      ● As mentioned earlier, we can (re)use the vocabulary semantics for automatic data validation
      ● RDFUnit - https://github.com/AKSW/RDFUnit
         ○ Automatically generates data unit tests based on the vocabularies your data uses
         ○ Custom JUnit runner
      ● SHACL - http://w3c.github.io/data-shapes/shacl/
         ○ Language to define advanced data constraints on RDF Graphs
         ○ (In progress) W3C recommendation
  44. ALIGNED project
      ● Aligning software & data engineering
      ● Tools & techniques for agility when code / data change
      ● http://aligned-project.eu
      ● Offers free consultancy on ALIGNED tools
         ○ See website for more info
  45. Wrapping up / Key points
      ● Data variety is a common problem
      ● Integrating data can be a pain :)
      ● Graph databases can help; RDF can sometimes be more appropriate
      ● Pay as you go data integration
         ○ Map your data to RDF
         ○ Keep RDF as a copy of your source data
      ● RDF helps you develop reusable applications against schemas
      ● Schema.org
         ○ For website markup
         ○ For defining actions
      ● JSON-LD (embedded mappings)
      ● RDF for text annotations
      ● There is very good tool support for RDF in Java
  46. Links
      ● http://json-ld.org/
      ● http://wiki.dbpedia.org
      ● http://dbpedia-spotlight.github.io/demo/
      ● http://schema.org
      ● http://aksw.org - Many interesting tools
      ● http://wikidata.org
      ● Apache Jena - RDF Java library
      ● Virtuoso - Open Source RDF & RDBMS DB
  47. Thank you! Questions?
      Slides available at slideshare.net/jimkont
