Graphs, Stores and API

BOSA.be
20 March 2019 – Brussels
Bart Hanssens
BOSA DG Digital Transformation

Agenda
▪ Introduction
▪ Layers
▪ (Labeled) Property Graphs
▪ The case for identifiers
▪ Semantic Graphs (triples/quads in RDF)
▪ Similarities and differences
▪ Questions ?

Introduction
Source: https://xkcd.com/927/
Creative Commons Attribution-NonCommercial 2.5 License.

“No Highlander” rule of IT
▪ There will not be “only one”
▪ P (N > 1) > P (1)
▪ See also: Not Invented Here Syndrome
▪ Also applies to
▪ “unique” identifiers
▪ Query languages
▪ File formats
▪ “generic” APIs

Layers
[Diagram: stack of layers, some shown more than once: File format, API, Query Language, Model, Back-end]

File formats
▪ No one format fits all
▪ CSV is fine for tabular data, but what about hierarchy ?
▪ XML and JSON are not the fastest to produce / consume
▪ Did someone just say YAML ?
▪ … or binary / protocol buffers / thrift ?
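The CSV-versus-hierarchy point can be sketched in a few lines of Python. The sample tree is made up for illustration ("Team A" is hypothetical), and the `flatten` helper is only one of several ways to squeeze a tree into rows.

```python
import csv
import io
import json

# Hypothetical sample data: an organisation with nested units.
org = {
    "name": "BOSA",
    "units": [
        {"name": "DG Digital Transformation", "units": [{"name": "Team A", "units": []}]},
    ],
}

def flatten(node, parent=None):
    """Yield one row per node; the hierarchy is reduced to a 'parent' column."""
    yield {"name": node["name"], "parent": parent or ""}
    for child in node["units"]:
        yield from flatten(child, node["name"])

rows = list(flatten(org))

# CSV: the tree survives only as a column the consumer has to re-join.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "parent"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# JSON: nesting is expressed directly in the format.
json_text = json.dumps(org, indent=2)
```

The CSV variant stays easy to load into a spreadsheet, at the cost of pushing the tree reconstruction to every consumer.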

API / Query Language
▪ Typically one wants to hide/shield the QL
▪ Security (SQL-injections, anyone ?)
▪ Flexibility (system may change, API should be stable)
▪ The distinction can be very thin
▪ SPARQL over HTTP, is it an API ?
▪ Query language often tied to model
▪ Translations may be possible, but may not be efficient
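A minimal sketch of why the query language is usually shielded behind an API, using Python's built-in sqlite3 module. The table, the product names and both function names are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, maker TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("iPad", "Apple"), ("Secret prototype", "Apple"), ("Widget", "Acme")],
)

def unsafe_find(maker):
    # Anti-pattern: caller input is pasted straight into the query text.
    sql = f"SELECT name FROM product WHERE maker = '{maker}' AND name <> 'Secret prototype'"
    return [row[0] for row in conn.execute(sql)]

def find_products(maker):
    # API function: the query is fixed, the input is passed as a bound parameter.
    sql = "SELECT name FROM product WHERE maker = ? AND name <> 'Secret prototype'"
    return [row[0] for row in conn.execute(sql, (maker,))]

# The trailing "--" comments out the rest of the unsafe query.
attack = "Apple' --"
```

With `attack` as input, `unsafe_find` leaks the hidden row while `find_products` simply finds no maker with that name. The stable function signature is also what lets the back-end change later without breaking callers.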

APIs with benefits
▪ Swagger / OpenAPI
▪ Hydra
▪ HyperMedia API, “next”, “previous”, “search”
▪ GraphQL
▪ “query language for API”
▪ Also provides pagination support

Query languages
▪ RDBMS: SQL
▪ Triple stores: SPARQL + some others
▪ Graph stores: (open)Cypher + others
▪ I-want-to-query-them-all-as-a-graph:
▪ Gremlin (Apache Tinkerpop)

GraphQL
▪ Created by Facebook
▪ https://graphql.org/learn/
▪ Often JSON
▪ API and query language
▪ E.g. use it to wrap REST-responses in one call
▪ Mappings/translations to e.g. SQL exist
▪ “Shaping” results, data types, pagination etc.
▪ Not “linked” out-of-the box
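To make "shaping" and pagination a bit more concrete, here is a sketch of a GraphQL request body built in Python. The schema (`company`, `products`, the `first` argument) is hypothetical; the `query` / `variables` JSON envelope is the common convention for GraphQL over HTTP, but a given server may differ.

```python
import json

# Hypothetical schema: a "company" field with a nested, paginated "products" list.
query = """
query CompanyProducts($ticker: String!, $first: Int) {
  company(ticker: $ticker) {
    name
    products(first: $first) {
      name
    }
  }
}
"""

def build_request_body(ticker, first=10):
    """Serialise the query and its variables as a JSON body for a GraphQL endpoint."""
    return json.dumps({"query": query, "variables": {"ticker": ticker, "first": first}})

body = build_request_body("AAPL", first=5)

# Illustrative only: the response is expected to mirror the shape of the query.
example_response = {"data": {"company": {"name": "Apple", "products": [{"name": "iPad"}]}}}
```

The client asks for exactly the fields it needs, which is how one call can replace several REST round trips.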

Hydra + SHACL for linked data
▪ SHACL for validation (store) and shaping (results)
▪ Hydra Hypermedia API
▪ Often JSON-LD
▪ See also
▪ http://www.hydra-cg.com/
▪ https://github.com/Informatievlaanderen/generieke-hypermedia-api
▪ Linked out of the box

Models / stores / backends
▪ RDBMS / graph / key-value (hello NoSQL, Big Data…)
▪ RDBMS: Oracle, Postgres
▪ Triple stores: RDF4J, Ontotext GraphDB, Stardog
▪ Graph stores: Neo4j, ArangoDB
▪ Storage underneath it can again be something else
▪ Maybe RDBMS uses key-value
▪ Graph / triple store : perhaps RDBMS, HashMaps, …

Hybrid stores
▪ Boundaries are blurring anyway
▪ IBM DB2 / Oracle + Spatial & Graph
▪ Virtuoso
▪ PostgreSQL (table inheritance, binary JSON …)
▪ BitNine AgensGraph (based on PostgreSQL)

Ha ! Your RDBMS can’t do XML, JSON, graphs …
▪ Yes it can !
▪ Although it might not be the most efficient way
▪ Recursive queries: SQL 1999
▪ Oracle: CONNECT BY… PRIOR
▪ WITH… RECURSIVE
▪ XML, XQuery: SQL 2003/2006
▪ JSON: SQL 2016
▪ And geo data too ! (e.g. PostGIS)
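As an illustration of the recursive-query bullet, the sketch below walks a small graph stored in a plain table using `WITH RECURSIVE`. It uses SQLite through Python's sqlite3 module; the `edge` table and its contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")],
)

# Recursive common table expression: start at one node, then keep following edges.
# UNION (rather than UNION ALL) drops duplicates, so cycles do not loop forever.
REACHABLE = """
WITH RECURSIVE reachable(node) AS (
    SELECT ?
    UNION
    SELECT edge.dst FROM edge JOIN reachable ON edge.src = reachable.node
)
SELECT node FROM reachable ORDER BY node
"""

def reachable_from(start):
    """Return the start node plus every node reachable from it."""
    return [row[0] for row in conn.execute(REACHABLE, (start,))]
```

It works, though a graph store would typically express the same traversal more compactly.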

(Labeled) property graphs
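[Diagram: node “Apple” (property: AAPL) linked to node “iPad” (property: shiny) by a “Produces” edge that carries its own property (2010 -)]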

Property Graphs
▪ Nodes (vertices) connected by edges (relations)
▪ Can have additional labels / properties on “relations”
▪ E.g. a comment, date range…
▪ Not “linked” in the semantic web sense
▪ Similar to RDBMS export, it’s all about identifiers
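A rough in-memory sketch of a labeled property graph in Python, based on the Apple / iPad example. The class layout, the labels (`Company`, `Product`), the property keys and the local ids (`n1`, `n2`) are assumptions made for illustration, not the data model of any particular product.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str                                   # identifier, local to this graph
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str                               # id of the start node
    target: str                               # id of the end node
    label: str
    properties: dict = field(default_factory=dict)   # metadata on the relation itself

apple = Node("n1", {"Company"}, {"name": "Apple", "ticker": "AAPL"})
ipad = Node("n2", {"Product"}, {"name": "iPad", "screen": "shiny"})
produces = Edge("n1", "n2", "Produces", {"since": 2010})

nodes = {n.id: n for n in (apple, ipad)}
edges = [produces]

def neighbours(node_id, label):
    """Follow outgoing edges with the given label and return the target nodes."""
    return [nodes[e.target] for e in edges if e.source == node_id and e.label == label]
```

The ids only mean something inside this one graph, which is why combining such data with other sources comes down to agreeing on identifiers.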

Data wants to be combined
▪ Data (files) from different sources will be combined
▪ Hence there is a need for “keys” or “identifiers”
▪ Preferably
▪ globally unique
▪ (Semi-)decentralized
▪ API to get more info about the “thing”, if needed

Many options
▪ Specific schemes like EAN / UPC / ISBN / DOI
▪ More generic solutions: UUID, OID
▪ You could even use IPv6
▪ Reverse domain name dev style: be.bosa.dto.pkg
▪ Or… URL as identifier
▪ People use URL all the time anyway to surf the web
▪ … so can machines
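Two of these options can be tried with the Python standard library alone. The URL below is a hypothetical identifier under example.org, and the `authority` helper is invented for this sketch.

```python
import uuid
from urllib.parse import urlparse

# Hypothetical URL used as an identifier; it does not need to resolve for this sketch.
thing_url = "https://example.org/id/product/ipad"

# Random UUID: globally unique in practice, but it says nothing about the thing.
random_id = uuid.uuid4()

# Name-based UUID: the same URL always yields the same UUID.
derived_id = uuid.uuid5(uuid.NAMESPACE_URL, thing_url)

def authority(identifier):
    """Return the host that minted a URL identifier (the semi-decentralized part)."""
    return urlparse(identifier).netloc
```

A URL has one property the other schemes lack: a machine can try to dereference it to get more information about the thing.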

Semantic Graphs
[Diagram: the same data as triples: Apple Produces iPad; Apple HasTicker AAPL; iPad Screenis shiny]

Resource Description Framework
▪ Triples
▪ <subject> <predicate> <object>
▪ <A> <something about> <B>
▪ The WWW can be half the API
▪ GET a URI for additional info
▪ Content / file format negotiation via the HTTP Accept header
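A toy version of both ideas in Python: triples as plain tuples with a wildcard match, and a GET request that asks for an RDF serialisation through the Accept header. The example.org URIs and both function names are hypothetical, and the request is only built, never sent.

```python
from urllib.request import Request

# Hypothetical namespace; triples are plain (subject, predicate, object) tuples.
EX = "https://example.org/"
triples = {
    (EX + "Apple", EX + "produces", EX + "iPad"),
    (EX + "Apple", EX + "hasTicker", "AAPL"),
    (EX + "iPad", EX + "screenIs", "shiny"),
}

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )

def describe_request(uri, media_type="text/turtle"):
    """Build (but do not send) a GET request asking for an RDF serialisation of a URI."""
    return Request(uri, headers={"Accept": media_type}, method="GET")

req = describe_request(EX + "iPad")
```

Swapping `text/turtle` for `application/ld+json` would ask the same URI for JSON-LD instead, provided the server supports it.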

Cheating a little bit in RDF
▪ <object> can be a literal with a type OR a language
▪ “hello”@en
▪ “15”^^xsd:integer
▪ Not both, sorry, and not on <subject> or <predicate>
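The "type OR language" rule can be written down as a small Python class. The `Literal` class and its `n3` method are invented for this sketch and are not the API of any RDF library.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Literal:
    value: str
    datatype: Optional[str] = None   # e.g. XSD + "integer"
    language: Optional[str] = None   # e.g. "en"

    def __post_init__(self):
        # A literal carries a datatype or a language tag, not both.
        if self.datatype is not None and self.language is not None:
            raise ValueError("literal cannot have both a datatype and a language")

    def n3(self):
        """Render the literal in a Turtle-like notation with a full datatype URI."""
        if self.language:
            return f'"{self.value}"@{self.language}'
        if self.datatype:
            return f'"{self.value}"^^<{self.datatype}>'
        return f'"{self.value}"'

hello = Literal("hello", language="en")
fifteen = Literal("15", datatype=XSD + "integer")
```

Trying to construct `Literal("x", datatype=XSD + "string", language="en")` raises a `ValueError`, which is the "not both, sorry" rule in code.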

Cheating even more in RDF
▪ Quads: concept of “context” or “graph”
▪ Basically: one or more triples in named collection / graph
▪ The graph is itself identified by a URI, so it can be a <subject>
▪ <subject> <predicate> <object> <graph>
▪ But still no direct labels / properties on relations
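Quads can be sketched as 4-tuples in Python. The namespace, the graph names and the `source` predicate are all hypothetical; the point is only that the graph name can reappear in subject position.

```python
from collections import defaultdict

EX = "https://example.org/"          # hypothetical namespace
G1 = EX + "graph/products"           # hypothetical graph name
META = EX + "graph/meta"             # hypothetical graph holding metadata

quads = [
    (EX + "Apple", EX + "produces", EX + "iPad", G1),
    (EX + "iPad", EX + "screenIs", "shiny", G1),
    # The graph URI is used as a subject: a statement about the whole collection.
    (G1, EX + "source", "product catalogue", META),
]

def by_graph(quads):
    """Group triples by the graph (context) they belong to."""
    graphs = defaultdict(list)
    for s, p, o, g in quads:
        graphs[g].append((s, p, o))
    return dict(graphs)

graphs = by_graph(quads)
```

The metadata applies to every triple in the named graph at once, not to one individual relation.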

Main differences
▪ Metadata on “relations”
▪ Property graphs can have metadata on relations
▪ Triples cannot do this directly (indirectly, very verbose)
▪ Semantics
▪ Not part of property graphs, layered on top of RDF
▪ Then again, few systems actually use reasoning
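The "indirectly, very verbose" remark can be illustrated with standard RDF reification, which is one of several workarounds. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms are part of the RDF vocabulary; the example.org URIs, the `since` predicate and the `reify` helper are assumptions for this Python sketch.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "https://example.org/"   # hypothetical namespace

def reify(statement_id, s, p, o, metadata):
    """Describe one triple as a resource so that metadata can be attached to it."""
    triples = [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", s),
        (statement_id, RDF + "predicate", p),
        (statement_id, RDF + "object", o),
    ]
    triples += [(statement_id, key, value) for key, value in metadata.items()]
    return triples

# Property graph: one edge carrying one property.
pg_edge = ("Apple", "Produces", "iPad", {"since": 2010})

# RDF: the fact itself, plus a reified description of it that holds the metadata.
rdf_triples = [(EX + "Apple", EX + "produces", EX + "iPad")] + reify(
    EX + "stmt/1", EX + "Apple", EX + "produces", EX + "iPad", {EX + "since": "2010"}
)
```

One edge with one property becomes six triples here.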

Main differences (2)
▪ Nice(r) visualization in PG products
▪ E.g. “local” data analysis
▪ Standardization in RDF !
▪ SKOS mapping, RDF(S), SHACL, Core Vocabularies…
▪ (meta)data exchange and linking/combining data

Similarities
▪ RDF and PG stores typically enforce less on the data than an RDBMS
▪ RDBMS have triggers, constraints, type checks etc
▪ Often a specific “shape” of data is wanted by the user
▪ Work in progress
▪ Either GraphQL at the API level
▪ Or Hydra API + SHACL

Will they merge ?
▪ Industry would like to combine PG and RDF
▪ See also W3C event organized by Neo4j and Ontotext
▪ https://www.w3.org/Data/events/data-ws-2019/
▪ Suggestion for RDF* / SPARQL*
▪ http://olafhartig.de/slides/RDFStarInvitedTalkWSP2018.pdf

Do I need a triple / graph store for linked data ?
▪ You don’t have an HTML-database for websites either
▪ It depends on queries, business case and developers
▪ If you want relations, constraints and data in the same table-like structure, and your toolbox is all relational … use an RDBMS

Who’s using RDF anyway ?
▪ Solid / Inrupt social, user-centric linked data apps
▪ https://solid.inrupt.com/about
▪ https://github.com/solid/solid
▪ https://ruben.verborgh.org/articles/redecentralizing-the-web/
▪ Metadata exchange using EU CoreVocabularies
▪ SEO: schema.org embedded in web pages
▪ https://developers.google.com/search/docs/guides/intro-structured-data

Can GraphQL query RDBMS, graphs, RDF ?
▪ Yes but… translations may be needed
▪ Performance ?
▪ It will most likely not be the only API
▪ Remember SOAP, Swagger…
▪ But not “semantic” / “linked” out of the box
▪ See also https://comunica.github.io/Article-ISWC2018-Demo-GraphQlLD/

Can SPARQL query RDBMS, map data to RDF ?
▪ Well, if you really want to …
▪ http://d2rq.org/
▪ http://rml.io/RML_Input.html
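The general idea behind such mappings can be sketched without either tool: derive a subject URI from the primary key and one predicate per column. This is a hand-rolled Python illustration, not the D2RQ or RML API; the base URI, the table and the URI patterns are assumptions.

```python
import sqlite3

BASE = "https://example.org/"   # hypothetical base URI

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows become addressable by column name
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT, ticker TEXT)")
conn.execute("INSERT INTO company VALUES (1, 'Apple', 'AAPL')")

def rows_to_triples(table, key):
    """Turn each row into triples: subject from the key, one predicate per other column."""
    # The table name is trusted here (it is not user input), so it may be interpolated.
    for row in conn.execute(f"SELECT * FROM {table}"):
        subject = f"{BASE}{table}/{row[key]}"
        for column in row.keys():
            if column != key and row[column] is not None:
                yield (subject, f"{BASE}{table}#{column}", row[column])

triples = list(rows_to_triples("company", "id"))
```

Real mapping tools add what this sketch leaves out: reusing existing vocabularies for predicates, typing literals, and turning foreign keys into links between resources.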

Interesting linked data resources
▪ Modular Linked Data JavaScript framework
▪ http://comunica.linkeddatafragments.org
▪ Linked Data Cubes for statistical data
▪ http://www.proxml.be/losd/cubes.html
▪ Metreeca linked data components
▪ https://www.metreeca.com/software/

BOSA.be
@BartHanssens
Thank you !
