Data integration with a façade
The case of knowledge graph construction
Enrico Daga
Research Fellow, Knowledge Media Institute, STEM Faculty, The Open University
www.enridaga.net @enridaga
KMi Seminar Series, 13/7/2022
This project has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement GA101004746.
The communication reflects only the author’s view and the Research Executive Agency is not
responsible for any use that may be made of the information it contains.
Definition: a KG …
(i) mainly describes real world entities and their
interrelations, organized in a graph,
(ii) defines possible classes and relations of entities in
a schema
(iii) allows for potentially interrelating arbitrary
entities with each other
(iv) covers various topical domains. [Paulheim, 2017]
Knowledge Graphs
# RDF triples
dbr:Wolfgang_Amadeus_Mozart rdfs:label "Wolfgang Amadeus Mozart"@de .
dbr:Wolfgang_Amadeus_Mozart dbo:deathPlace dbr:Vienna .
dbr:Vienna dbo:populationTotal "1852997"^^xsd:nonNegativeInteger .
# SPARQL allows us to query the graph via pattern matching
?person rdfs:label ?label .
?person dbo:deathPlace ?place .
?place dbo:populationTotal ?population .
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Example of a SELECT query, retrieving 2 variables
SELECT ?person ?label
WHERE {
#This is our graph pattern
?person rdfs:label ?label ;
dbo:deathPlace dbr:Vienna
}
PREFIX schema: <http://schema.org/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
# rdf: is declared because the template below uses rdf:type
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Example of a CONSTRUCT query, which generates a new KG
CONSTRUCT {
  ?person rdf:type schema:Person ;
    rdfs:label ?label ;
    schema:deathPlace ?deathPlace
} WHERE {
  ?person rdfs:label ?label ;
    dbo:deathPlace ?deathPlace
}
Preliminaries
Data Integration is the dominant use case for Knowledge Graphs - [Atkin, 2021, in
Lassila et al, 2021]
Definition: “Data integration involves combining data residing in different sources and
providing users with a unified view of them” [Lenzerini, 2002]
A data integration system is defined as a triple (G, S, M):
- G is the target common (global) schema
- S is the heterogeneous set of source schemas
- M is the mapping that relates queries over the source schemas to queries over the global schema
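To make the (G, S, M) triple concrete, here is a minimal Python sketch. The two sources, their field names, and the helper functions are hypothetical. They only illustrate how a mapping M can present heterogeneous source schemas S under one global schema G; real systems express M as query-to-query assertions rather than simple attribute renamings.

```python
# Illustrative only: a toy data integration system (G, S, M).
# All schema names, field names and helpers are invented for this sketch.

# G: the global (target) schema, here just a set of attribute names.
GLOBAL_SCHEMA = {"name", "death_place"}

# S: two sources that describe the same kind of entity with different schemas.
SOURCES = {
    "museum_csv": [{"full_name": "Wolfgang Amadeus Mozart", "died_in": "Vienna"}],
    "archive_json": [{"label": "Joseph Haydn", "deathPlace": "Vienna"}],
}

# M: per-source mappings from source attributes to global attributes.
MAPPINGS = {
    "museum_csv": {"full_name": "name", "died_in": "death_place"},
    "archive_json": {"label": "name", "deathPlace": "death_place"},
}


def to_global(source_name, record):
    """Rewrite one source record into the global schema using M."""
    mapping = MAPPINGS[source_name]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def unified_view():
    """Return all records from all sources, expressed in the global schema."""
    view = []
    for source_name, records in SOURCES.items():
        for record in records:
            view.append(to_global(source_name, record))
    return view


def query(death_place):
    """Answer a query posed against G, regardless of where the data lives."""
    return sorted(r["name"] for r in unified_view() if r["death_place"] == death_place)
```

The user only ever sees `name` and `death_place`; which source used `died_in` or `deathPlace` is hidden behind the mapping.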
Knowledge Graph Construction (KGC)
[Figure: formats and technologies surrounding KGC: CSV, XML, JSON, HTML, Excel, Web APIs (HTTP), XPath, JsonPath, JsonSchema, XSLT, XSD, DOM, IRI]
Playing the soundtrack of our history
Preserving musical heritage
through knowledge graphs
Managing musical heritage collections
through knowledge graphs
Studying musical heritage through
(interlinked) knowledge graphs
https://spice-h2020.eu/ https://polifonia-project.eu/
“Source” can be … anything!
Data integration theory assumes that data sources are format-compatible,
i.e. it does not account for syntactic heterogeneity (it abstracts from the
actual formats or assumes the same format, e.g. a relational database)
Examples: HTML, CSV, JSON, BibTeX, ABC notation, Markdown, XML/MEI
Existing approaches
• Targeting specific types/formats: Direct mapping, Tarql, Any23, JSON2RDF, CSV2RDF, COW, SPARQL Micro-services [Michel, 2019]. These lack generality.
• Specialised mapping languages (R2RML, RML, ShExML): high learning demands, cold start problem, limited to a few popular formats, difficult to extend. [Dimou, 2014] [García-González, 2020]
• Extending SPARQL with custom features (SPARQL Generate): high learning demands, difficult to extend to other formats. [Lefrançois, 2017]
• Cognitive complexity: these solutions transfer the complexity of the data source to the user (e.g. the need to know XPath for XML, JsonPath for JSON, …) or ask the user to learn a new language!
Instead, our aim is to:
• Target an open-ended set of formats (extensibility without changes to the user-facing tool)
• Reduce the learning curve for Knowledge Graph practitioners (including non-developers from healthcare, cultural heritage, …)
  • Many SPARQL users fall into the category of end-user developers [Lieberman, 2006]
  • In a recent survey [Warren et al, 2018], 42% of KG/SPARQL users are from non-IT areas, including social sciences and the humanities, business and economics, and biomedical, engineering or physical sciences.
KGC
Iterative process:
• Observe: the resource (e.g. a CSV file)
• Design mappings to a target ontology
• Transform: execute the mappings
• Observe: compare / evaluate
Trial and error approach, many iterations
KGC, revisited
KG construction is a twofold job:
• perform a syntax/meta-model conversion (e.g. CSV to RDF)
• project semantics onto the data (applying a domain
ontology)
A better model of the user process:
• Observe: the resource (e.g. a CSV file)
• Reengineering Design: what syntax/meta-model do we want?
• Remodelling Design: what semantics do we want?
• Transform: execute the mappings
• Observe: compare / evaluate
Trial and error approach, many iterations
KGC, opinionated
• Reengineering: what syntax/meta-model do we want?
• We cannot know what structure our user wants but we
know the meta-model: RDF
• Remodelling: what semantics do we project?
• SPARQL is great for projecting semantics (change
namespaces, create entities from literals, adding types,
sophisticated relationships, composite structures, …)
• Data integration theory applies here!
How to develop a unified approach to re-engineer an open-ended set of formats into RDF?
Concept
Façade Design Pattern
From Object Oriented Programming
A single abstraction on different, alternative interfaces
https://en.wikipedia.org/wiki/Facade_pattern
An RDF façade?
• A common RDF structure over diverse
formats
• Focusing on the meta-model (data structure)
• Leaving domain semantics as it is!
• Apply the least possible “ontological commitment”
• Problem space: CSV, JSON, HTML, XML, binary (JPEG, PNG, …), text
• Solution space: RDFS
A simple intuition:
Data formats rely on a common set of primitives …
(e.g. BibTeX, ABC notation, Markdown, XML/MEI)
Facade X
A simplified RDF meta-model, resembling a list-of-lists
Components: Containers (typed), slots (string / int), values
Intuitive, abstract notions: key-value, sequence, type
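[Figure: Facade-X meta-model. Containers are typed via rdf:type (fx:root | xyz:*); slots are xyz:* keys or rdf:_N positions; values are literals ("String", 1, true, …) or other containers (xyz:row1, …, xyz:row_n in the diagram).]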
PREFIX fx: <http://sparql.xyz/facade-x/ns/>
PREFIX xyz: <http://sparql.xyz/facade-x/data/>
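The list-of-lists intuition can be sketched in a few lines of Python. This is a rough illustration of the meta-model applied to JSON, not the SPARQL Anything implementation: the function name `facade_x`, the blank-node labels, and the `tags` field in the sample document are assumptions made for the example. Object keys become `xyz:` slots, array positions become `rdf:_N` slots, and every object or array becomes a container.

```python
# A rough sketch of the Facade-X idea for JSON, not the SPARQL Anything code.
import itertools
import json

FX_ROOT = "fx:root"


def facade_x(document):
    """Turn parsed JSON into (subject, predicate, object) triples.

    Containers (objects and arrays) become blank nodes; object keys become
    xyz:<key> slots and array positions become rdf:_N slots (1-based).
    """
    counter = itertools.count(1)
    triples = []

    def visit(value, is_root=False):
        if isinstance(value, dict):
            slots = [(f"xyz:{key}", item) for key, item in value.items()]
        elif isinstance(value, list):
            slots = [(f"rdf:_{i}", item) for i, item in enumerate(value, start=1)]
        else:
            return value  # a literal: string, number, boolean
        node = f"_:b{next(counter)}"
        if is_root:
            triples.append((node, "rdf:type", FX_ROOT))
        for slot, item in slots:
            triples.append((node, slot, visit(item)))
        return node

    visit(document, is_root=True)
    return triples


# "acno" and "acquisitionYear" follow the Tate example; "tags" is hypothetical.
source = json.loads(
    '{"acno": "T02319", "acquisitionYear": 1978, "tags": ["painting", "abstract"]}'
)
triples = facade_x(source)
```

Nothing in the function knows about artworks or museums: only the data structure is mapped, and domain semantics are left for a later remodelling step.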
JSON
https://github.com/tategallery/collection/artworks/t/023/t02319-9205.json
[ a fx:root ;
  xyz:acno "T02319" ;
  xyz:acquisitionYear "1978"^^xsd:int ;
  xyz:all_artists "Kazimir Malevich" ;
  xyz:catalogueGroup […] ;
  xyz:classification "painting" ;
  xyz:contributorCount "1"^^xsd:int
  …
{
"acno": "T02319",
"acquisitionYear": 1978,
"all_artists": "Kazimir Malevich",
"catalogueGroup": {},
"classification": "painting",
"contributorCount": 1,
"contributors": [
{
DOM (HTML, XML, …)
[ a fx:root , xhtml:div ;
xhtml:id "az-group" ;
rdf:_1 [ a xhtml:div ;
rdf:_1 [ a xhtml:h4 ;
rdf:_1 "A" ;
<https://html.spec.whatwg.org/#innerHTML>
"A" ;
<https://html.spec.whatwg.org/#innerText>
"A"
] ;
html.selector=#az-group
@prefix xhtml: <http://www.w3.org/1999/xhtml#> .
CSV
id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url
10093,"Abakanowicz, Magdalena",Female,born 1930,1930,,Polska,,http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093
…
https://github.com/tategallery/collection/blob/master/artist_data.csv
[ a fx:root ;
rdf:_1 [ xyz:dates "born 1930" ;
xyz:gender "Female" ;
xyz:id "10093" ;
xyz:name "Abakanowicz, Magdalena" ;
xyz:placeOfBirth "Polska" ;
xyz:placeOfDeath "" ;
xyz:url "http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093" ;
xyz:yearOfBirth "1930" ;
xyz:yearOfDeath ""
] ;
csv.headers=true|false
[ a fx:root ;
rdf:_1 [ rdf:_1 "id" ;
rdf:_2 "name" ;
rdf:_3 "gender" ;
rdf:_4 "dates" ;
rdf:_5 "yearOfBirth" ;
rdf:_6 "yearOfDeath" ;
rdf:_7 "placeOfBirth" ;
rdf:_8 "placeOfDeath" ;
rdf:_9 "url"
] ;
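The effect of the `csv.headers` option on the two views above can be imitated with Python's standard `csv` module. This is only a sketch under assumptions: `csv_facade` is a hypothetical helper, rows are returned as a Python list (standing in for the `rdf:_N` slots of the root container), and the sample keeps three of the original columns.

```python
# Sketch of the two CSV views shown above; csv_facade is a hypothetical helper.
import csv
import io


def csv_facade(text, headers=True):
    """Return the root container as a list of row containers (dicts of slots)."""
    rows = list(csv.reader(io.StringIO(text)))
    if headers and rows:
        names, rows = rows[0], rows[1:]
        # With headers: each cell is stored under an xyz:<column name> slot.
        return [{f"xyz:{n}": cell for n, cell in zip(names, row)} for row in rows]
    # Without headers: every line, including the first, is a plain sequence.
    return [{f"rdf:_{i}": cell for i, cell in enumerate(row, start=1)} for row in rows]


sample = 'id,name,yearOfDeath\n10093,"Abakanowicz, Magdalena",\n'
with_headers = csv_facade(sample, headers=True)
without_headers = csv_facade(sample, headers=False)
```

With headers the first line is consumed as slot names; without them it is just another row, which is why the second view above starts with the column titles as values.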
BibTex
ABC notation
[ a fx:root ;
  xyz:X "1" ;
  xyz:T "Ah! vous …" ;
  xyz:C "anon." ; xyz:O "France" ;
  xyz:Z "Transcription based …" ;
  rdf:_1 [
    rdf:_1 "|: " ;
    rdf:_2 [
      rdf:_1 "C" ;
      rdf:_2 "C" ;
      rdf:_3 "G" ;
      rdf:_4 "G" ] ;
    rdf:_3 " | " ;
    rdf:_4 [
      rdf:_1 "C" ;
      rdf:_2 "C" ;
      rdf:_3 "G2" ] ;
  ]
(1) Source schema
(2) Transform …
(3) Map on target schema
Mappings in SPARQL 1.1!
Features v0.7.0
sparql-anything.cc
https://sparql-anything.readthedocs.io/
https://github.com/SPARQL-Anything/showcase-tate
Tate Gallery Open Data
* CSV listing artworks
* JSON with details
Task: build a SKOS taxonomy of
https://github.com/SPARQL-Anything/showcase-imma
https://imma.ie/collection/freeing-the-memory/
Polifonia Ontology Network
* Scenarios collected on GitHub as
Markdown files
Task: extract a list of competency
questions from any scenario
https://github.com/SPARQL-Anything/showcase-mei
XML->CSV: using SPARQL to extract the note sequence from an XML/MEI file
System
SPARQL Anything is implemented on top of a standard SPARQL 1.1 query engine (Apache Jena ARQ)
Queries are always executed after the view has been materialised
In-memory and on-disk options are available
Performance can be improved with optimisations, e.g. by not materialising triples that the query does not need (triple-filtering)
The method can scale further by slicing the data instead of materialising a complete view
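The triple-filtering idea can be illustrated with a small Python sketch. It is not how Apache Jena or SPARQL Anything implement the optimisation; the function names, the pattern representation (`None` standing for a variable), and the second sample row are all assumptions made for the example.

```python
# Illustration of triple-filtering; all names are hypothetical, not Jena or
# SPARQL Anything APIs.


def rows_to_triples(rows):
    """Lazily yield Facade-X style triples for a sequence of row dictionaries."""
    for index, row in enumerate(rows, start=1):
        row_node = f"_:row{index}"
        yield ("_:root", f"rdf:_{index}", row_node)
        for key, value in row.items():
            yield (row_node, f"xyz:{key}", value)


def matches(triple, pattern):
    """A pattern term of None acts as a variable and matches anything."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def materialise(rows, patterns=None):
    """Materialise the full view, or only triples matching some query pattern."""
    triples = rows_to_triples(rows)
    if patterns is None:
        return list(triples)
    return [t for t in triples if any(matches(t, p) for p in patterns)]


rows = [
    {"id": "10093", "name": "Abakanowicz, Magdalena", "gender": "Female"},
    {"id": "0", "name": "Example, Artist", "gender": "Male"},  # invented row
]
complete = materialise(rows)
filtered = materialise(rows, patterns=[(None, "xyz:name", None)])
```

A query that only asks for names needs two of the eight triples in the complete view. Because `rows_to_triples` is a generator, the rows are also processed one at a time, which is the intuition behind slicing.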
Evaluation
How to develop a unified approach to re-engineer an open-ended set of
formats into RDF?
• Empirical: all formats scrutinised so far are representable with FX
• Theoretical 1/2: any format expressible with a BNF grammar can be also
represented as FX
• Theoretical 2/2: FX can also represent the Entity-Relationship model
• Approach scales linearly with input size
Lower (cognitive) complexity
• One measure of complexity is the number of (distinct) items or
variables (Halford et al. 2004; Warren et al. 2015).
• Vs SPARQL Generate: 8 queries from a museum collection use
case
• What are the titles of the artworks attributed to “ANONIMO”?
• What are the titles of the artworks created in 1935?
• …
• vs RML / SPARQL Generate / ShexML
• 4 mappings
• Avg distinct tokens:
• SPARQL Anything: ~18
• SPARQL Generate: ~25 (∼39.72% more)
• RML: ~45 (∼150% more)
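A count of distinct tokens is straightforward to approximate. The sketch below is an assumption about how such a count could be made, not the procedure used in the evaluation: the tokenisation rules in the `TOKEN` regular expression and the function name `distinct_tokens` are invented here, and different rules would give different absolute numbers.

```python
import re

# Assumed tokenisation: IRIs in angle brackets, quoted strings, variables,
# prefixed names, plain words or numbers, and single punctuation characters.
TOKEN = re.compile(r'<[^>\s]*>|"[^"]*"|\?\w+|\w*:\w*|\w+|[^\s\w]')


def distinct_tokens(query):
    """Return the set of distinct tokens in a query or mapping, ignoring comment lines."""
    lines = [line for line in query.splitlines() if not line.lstrip().startswith("#")]
    return set(TOKEN.findall("\n".join(lines)))


query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person ?label
WHERE {
  # This is our graph pattern
  ?person rdfs:label ?label ;
          dbo:deathPlace dbr:Vienna
}
"""
tokens = distinct_tokens(query)
```

Repeated items such as `?person` count once, so the measure rewards mappings that reuse a small vocabulary rather than merely short ones.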
Comparisons
Practicable and sustainable
• Quantitative analysis of performance, to assess practicability
• In-Memory implementation (Naive)
• Execution time of q1-q12 (AVG on 10 executions)
• Comparable to RML Mapper and SPARQL Generate on files
up to 1M JSON objects (~5M triples)
• In-Memory implementation scales linearly
• The approach is sustainable
• Lines of Java code to maintain: SPARQL Generate 12280 (core module); RML Mapper 7951; SPARQL Anything 3842 (all transformers) as of v0.2.0 (v0.7.0 has ~12k)
Comparisons
SEMANTiCS conference 2021
Submitted to ACM Transactions on Internet Technology
Benefits
• Transform / Query resources having heterogeneous formats
• Low learning demands for KG practitioners: plain SPARQL 1.1
• A single, consistent abstraction for potentially any data format
• “Free lunch” data exploration and querying (no cold start problem)
• Open-ended extensibility: no changes to user-facing code required
• FX can express ANY format representable as BNF (as well as relational data)
• Generate RDF (and RDF-Star) but also tabular data
• Sometimes the KG is an intermediate object for applications that require other formats: e.g. tabular data is the usual format for data science, and a KG can be a feature engineering medium
Challenges / Directions
Execute FX queries
• FX view materialisation is the only method supported so far.
• Study other execution strategies, e.g. query-rewriting, learn from Ontology Based Data Access (OBDA)
Support users
• Only SPARQL raw code so far: develop user interfaces for FX-based data integration / exploration?
• Explore mapping design patterns
Support developers: how to streamline developing adapters to new formats?
Features:
• More inputs: relational DB, MIDI, Image Annotations, … the sky is the limit!
• More connectors: Apache Parquet, Thrift, Hadoop/HDFS, RDBMS/JDBC, MongoDB, …
• Anything in … anything out! Equip SPARQL Anything with a full-fledged template engine
Luigi Asprino, University of Bologna
Enrico Daga, The Open University
Aldo Gangemi, ISTC-CNR
Justin Dowdy, Software Engineer
Paul Warren, The Open University
Paul Mulholland, The Open University
Credits
Marco Ratta, The Open University
Jason Carvalho, The Open University
Thank you for listening
www.enridaga.net
References
• Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-X: an opinionated approach to SPARQL Anything. In: SEMANTiCS 2021: 17th International Conference on Semantic Systems (2021)
• Atkin, M., Deely, T., Scharffe, F.: Knowledge Graph Benchmarking Report 2021 (version 2.0). Zenodo, http://doi.org/10.5281/zenodo.4950097 (June 2021)
• Lassila, O., Schmidt, M., Bebee, B., Bechberger, D., Broekema, W., Khandelwal, A., Lawrence, K., Sharda, R., Thompson, B.: Graph? Yes! Which one? Help! In: 1st Squaring the circle on knowledge graphs workshop, Semantics (2021)
• Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021)
• Warren, P., Mulholland, P.: Using SPARQL – the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer (2018)
• Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020)
• Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of web APIs at web scale using linked data standards. In: Companion Proceedings of The 2019 World Wide Web Conference. pp. 883–892 (2019)
• Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: 7th Workshop on Linked Data on the Web (2014)
• García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: ShExML: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science 6, e318 (2020)
• Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3), 489–508 (2017)
• Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state of the art in end-user software engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011)
• Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL extension for generating RDF from heterogeneous formats. In: European Semantic Web Conference. pp. 35–50. Springer (2017)
• Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: An emerging paradigm. In: End user development, pp. 1–8. Springer (2006)
• Cyganiak, R.: Tarql (SPARQL for tables): Turn CSV into RDF using SPARQL syntax. Technical Report, http://tarql.github.io (2015)
• Lenzerini, M.: Data integration: A theoretical perspective. In: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. pp. 233–246 (2002)

Contenu connexe

Similaire à Data integration with a façade. The case of knowledge graph construction.

Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Ralf Stockmann
 
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked dataESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
eswcsummerschool
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information Architecture
Scott Abel
 

Similaire à Data integration with a façade. The case of knowledge graph construction. (20)

Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001Harmony project - JISC Synthesis meeting 2001
Harmony project - JISC Synthesis meeting 2001
 
Digital Libraries of the Future
Digital Libraries of the Future
Digital Libraries of the Future
Digital Libraries of the Future
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
Controlled Vocabularies and Text Mining - Use Cases at the Goettingen State a...
 
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked dataESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
ESWC SS 2013 - Thursday Keynote Vassilis Christophides: Preserving linked data
 
Ontology mapping for the semantic web
Ontology mapping for the semantic webOntology mapping for the semantic web
Ontology mapping for the semantic web
 
OA - Shared Canvas - TEI - Biblissima project
OA - Shared Canvas - TEI - Biblissima projectOA - Shared Canvas - TEI - Biblissima project
OA - Shared Canvas - TEI - Biblissima project
 
Understanding Information Architecture
Understanding Information ArchitectureUnderstanding Information Architecture
Understanding Information Architecture
 
Presentation at MTSR 2012
Presentation at MTSR 2012Presentation at MTSR 2012
Presentation at MTSR 2012
 
Towards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsTowards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIs
 
Enterprise knowledge graphs
Enterprise knowledge graphsEnterprise knowledge graphs
Enterprise knowledge graphs
 
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
Multilingual Access to Cultural Heritage Content on the Semantic Web - Acl2013
 
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software ComponentsFIWARE Global Summit - IDS Implementation with FIWARE Software Components
FIWARE Global Summit - IDS Implementation with FIWARE Software Components
 
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant FormatELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
ELSE IF 2019: Porting the xEBR Taxonomy to a Linked Open Data compliant Format
 
RO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research ObjectsRO-Crate: A framework for packaging research products into FAIR Research Objects
RO-Crate: A framework for packaging research products into FAIR Research Objects
 
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
Semantic Web and Linked Data for cultural heritage materials - Approaches in ...
 
Data standardization process for social sciences and humanities
Data standardization process for social sciences and humanitiesData standardization process for social sciences and humanities
Data standardization process for social sciences and humanities
 
RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)RDF Data and Image Annotations in ResearchSpace (slides)
RDF Data and Image Annotations in ResearchSpace (slides)
 
FAIR data: LOUD for all audiences
FAIR data: LOUD for all audiencesFAIR data: LOUD for all audiences
FAIR data: LOUD for all audiences
 
Linked Open Data Utrecht University Library
Linked Open Data Utrecht University LibraryLinked Open Data Utrecht University Library
Linked Open Data Utrecht University Library
 

Plus de Enrico Daga

Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Enrico Daga
 

Plus de Enrico Daga (17)

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data Journey
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
 
Capturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchCapturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities research
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEI
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
 
Linked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchLinked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities research
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...
 
Ld4 dh tutorial
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorial
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data Cluster
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tables
 
Propagating Data Policies - A User Study
Propagating Data Policies - A User StudyPropagating Data Policies - A User Study
Propagating Data Policies - A User Study
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selection
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
Early Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data CubesEarly Analysis and Debuggin of Linked Open Data Cubes
Early Analysis and Debuggin of Linked Open Data Cubes
 

Dernier

NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
Amil baba
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
cyebo
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Valters Lauzums
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
cyebo
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
pyhepag
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
RafigAliyev2
 

Dernier (20)

Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 

Data integration with a façade. The case of knowledge graph construction.

  • 1. Data integration with a façade The case of knowledge graph construction Enrico Daga Research Fellow, — Knowledge Media Institute, STEM Faculty, The Open University www.enridaga.net @enridaga KMi Seminar Series, 13/7/2022 This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement GA101004746. The communication re fl ects only the author’s view and the Research Executive Agency is not responsible for any use that may be made of the information it contains.
  • 2. Definition: a KG … (i) mainly describes real world entities and their interrelations, organized in a graph, (ii) defines possible classes and relations of entities in a schema (iii) allows for potentially interrelating arbitrary entities with each other (iv) covers various topical domains. [Paulheim, 2017] Knowledge Graphs
  • 3.
  • 4. # RDF triples dbr:Wolfang_Amadeus_Mozart rdfs:label “Wolfgang Amadeus Mozart”@de . dbr:Wolfang_Amadeus_Mozart dbo:deathPlace dbr:Vienna . dbr:Vienna dbo:populationTotal “1852997”^^xsd:nonNegativeInteger . # SPARQL allows us to query the graph via pattern matching ?person rdfs:label ?label . ?person dbo:deathPlace ?place . ?place dbo:populationTotal ?population .
  • 5. PREFIX dbo: <http://dbpedia.org/ontology/>
      PREFIX dbr: <http://dbpedia.org/resource/>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      # Example of a SELECT query, retrieving 2 variables
      SELECT ?person ?label
      WHERE {
        # This is our graph pattern
        ?person rdfs:label ?label ;
                dbo:deathPlace dbr:Vienna
      }
  • 6. PREFIX schema: <http://schema.org/>
      PREFIX dbo: <http://dbpedia.org/ontology/>
      PREFIX dbr: <http://dbpedia.org/resource/>
      PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      # Example of a CONSTRUCT query, which generates a new KG
      CONSTRUCT {
        ?person rdf:type schema:Person ;
                rdfs:label ?label ;
                schema:deathPlace ?deathPlace
      } WHERE {
        ?person rdfs:label ?label ;
                dbo:deathPlace ?deathPlace
      }
  • 7. Preliminaries. Data integration is the dominant use case for Knowledge Graphs [Atkin, 2021, in Lassila et al., 2021]. Definition: “Data integration involves combining data residing in different sources and providing users with a unified view of them” [Lenzerini, 2002]. A data integration system is defined as a triple (G, S, M), where G is the target common (global) schema, S is the heterogeneous set of source schemas, and M is the set of mappings that relate queries over the source schemas to queries over the global schema.
  • 9. Playing the soundtrack of our history. Preserving musical heritage through knowledge graphs; managing musical heritage collections through knowledge graphs; studying musical heritage through (interlinked) knowledge graphs. https://spice-h2020.eu/ https://polifonia-project.eu/
  • 10. “Source” can be … anything! Data integration theory assumes that data sources are format-compatible, i.e. it does not account for syntactic heterogeneity: it either abstracts from the actual formats or assumes a single format (e.g. a relational database). Examples: HTML, CSV, JSON.
  • 11. “Source” can be … anything! Examples: BibTeX, ABC, Markdown, XML/MEI.
  • 12. Existing approaches • Targeting specific types/formats: Direct Mapping, Tarql, Any23, JSON2RDF, CSV2RDF, COW, SPARQL Micro-services [Michel, 2019]. These lack generality. • Specialised mapping languages of several kinds (R2RML, RML, ShExML): high learning demands, cold start problem, limited to a few popular formats, difficult to extend [Dimou, 2014] [García-González, 2020]. • Extending SPARQL with custom features (SPARQL Generate): high learning demands, difficult to extend to other formats [Lefrançois, 2017]. • Cognitive complexity: these solutions transfer the complexity of the data source to the user (e.g. the need to know XPath for XML, JsonPath for JSON, …) or ask the user to learn a new language. Examples shown: SPARQL Generate, RML.
  • 13. Instead, our aim is to • Target an open-ended set of formats (extensibility without changes to the user-facing tool) • Reduce the learning curve for Knowledge Graph practitioners (including non-developers from healthcare, cultural heritage, …) • Many SPARQL users fall into the category of end-user developers [Lieberman, 2006] • In a recent survey [Warren et al., 2018], 42% of KG/SPARQL users came from non-IT areas, including social sciences and the humanities, business and economics, and biomedical, engineering or physical sciences.
  • 14. KGC is an iterative process: • Observe: the resource (e.g. a CSV file) • Design: mappings to a target ontology • Transform: execute the mappings • Observe: compare / evaluate. A trial-and-error approach with many iterations.
  • 15. KGC, revisited. KG construction is a twofold job: • perform a syntax/meta-model conversion (e.g. CSV to RDF) • project semantics onto the data (applying a domain ontology). A better model of the user process: • Observe: the resource (e.g. a CSV file) • Reengineering design: what syntax/meta-model do we want? • Remodelling design: what semantics do we want? • Transform: execute the mappings • Observe: compare / evaluate. A trial-and-error approach with many iterations.
  • 16. KGC, opinionated • Reengineering: what syntax/meta-model do we want? • We cannot know what structure our user wants, but we know the meta-model: RDF • Remodelling: what semantics do we project? • SPARQL is great for projecting semantics (changing namespaces, creating entities from literals, adding types, sophisticated relationships, composite structures, …) • Data integration theory applies here! How to develop a unified approach to re-engineer an open-ended set of formats into RDF?
  • 17. Concept. The Façade design pattern, from object-oriented programming: a single abstraction over different, alternative interfaces. https://en.wikipedia.org/wiki/Facade_pattern An RDF façade? • A common RDF structure over diverse formats • Focusing on the meta-model (data structure) • Leaving domain semantics as is • Applying the least possible “ontological commitment” • Problem space: CSV, JSON, HTML, XML, binary (JPEG, PNG, …), text • Solution space: RDFS
  • 18. A simple intuition: data formats rely on a common set of primitives … (figure: BibTeX, ABC, Markdown, XML/MEI)
  • 19. Facade-X. A simplified RDF meta-model, resembling a list of lists. Components: containers (typed), slots (string / int), values. Intuitive, abstract notions: key-value, sequence, type. Diagram labels: container types fx:root | xyz:* (via rdf:type); slots xyz:* or rdf:_N; values "String", 1, true, … or containers (xyz:row1, … xyz:row_n). PREFIX fx: <http://sparql.xyz/facade-x/ns/> PREFIX xyz: <http://sparql.xyz/facade-x/data/>
  • 20. JSON
      Source: https://github.com/tategallery/collection/artworks/t/023/t02319-9205.json
      {
        "acno": "T02319",
        "acquisitionYear": 1978,
        "all_artists": "Kazimir Malevich",
        "catalogueGroup": {},
        "classification": "painting",
        "contributorCount": 1,
        "contributors": [ {
        …
      Facade-X view:
      [ a fx:root ;
        xyz:acno "T02319" ;
        xyz:acquisitionYear "1978"^^xsd:int ;
        xyz:all_artists "Kazimir Malevich" ;
        xyz:catalogueGroup […] ;
        xyz:classification "painting" ;
        xyz:contributorCount "1"^^xsd:int
        …
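The JSON-to-Facade-X correspondence on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SPARQL Anything implementation: the function name `to_facade_x`, the blank-node labels `_:b0`, `_:b1`, … and the prefixed-name strings are hypothetical, keys are not escaped, literals are left as plain Python values (no `xsd` datatypes), and the top-level value is assumed to be an object or an array.

```python
import json
from itertools import count


def to_facade_x(data):
    """Flatten parsed JSON into Facade-X style (subject, slot, value) triples.

    Objects become containers with key slots (xyz:<key>); arrays become
    containers with positional slots (rdf:_1, rdf:_2, ...). The outermost
    container is typed fx:root.
    """
    triples = []
    ids = count()

    def container(value):
        node = f"_:b{next(ids)}"  # hypothetical blank-node label
        if isinstance(value, dict):
            slots = [(f"xyz:{key}", item) for key, item in value.items()]
        else:  # assumed to be a list
            slots = [(f"rdf:_{i}", item) for i, item in enumerate(value, start=1)]
        for slot, item in slots:
            if isinstance(item, (dict, list)):
                # Nested structures become nested containers.
                triples.append((node, slot, container(item)))
            else:
                triples.append((node, slot, item))
        return node

    root = container(data)
    triples.insert(0, (root, "rdf:type", "fx:root"))
    return triples


# The "contributors" entry is a hypothetical completion of the truncated example.
sample = json.loads(
    '{"acno": "T02319", "acquisitionYear": 1978,'
    ' "contributors": [{"name": "Kazimir Malevich"}]}'
)
for triple in to_facade_x(sample):
    print(triple)
```

The point of the sketch is that the mapping needs no knowledge of the domain: only the structure (keys, positions, nesting) is carried over, which is the "least ontological commitment" idea of the façade.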
  • 21. DOM (HTML, XML, …)
      Option: html.selector=#az-group
      @prefix xhtml: <http://www.w3.org/1999/xhtml#> .
      [ a fx:root , xhtml:div ;
        xhtml:id "az-group" ;
        rdf:_1 [ a xhtml:div ;
          rdf:_1 [ a xhtml:h4 ;
            rdf:_1 "A" ;
            <https://html.spec.whatwg.org/#innerHTML> "A" ;
            <https://html.spec.whatwg.org/#innerText> "A" ] ;
        …
  • 22. CSV
      Source: https://github.com/tategallery/collection/blob/master/artist_data.csv
      id,name,gender,dates,yearOfBirth,yearOfDeath,placeOfBirth,placeOfDeath,url
      10093,"Abakanowicz, Magdalena",Female,born 1930,1930,,Polska,,http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093
      …
      Option: csv.headers=true|false
      With csv.headers=true:
      [ a fx:root ;
        rdf:_1 [ xyz:dates "born 1930" ;
          xyz:gender "Female" ;
          xyz:id "10093" ;
          xyz:name "Abakanowicz, Magdalena" ;
          xyz:placeOfBirth "Polska" ;
          xyz:placeOfDeath "" ;
          xyz:url "http://www.tate.org.uk/art/artists/magdalena-abakanowicz-10093" ;
          xyz:yearOfBirth "1930" ;
          xyz:yearOfDeath "" ] ;
        …
      With csv.headers=false:
      [ a fx:root ;
        rdf:_1 [ rdf:_1 "id" ;
          rdf:_2 "name" ;
          rdf:_3 "gender" ;
          rdf:_4 "dates" ;
          rdf:_5 "yearOfBirth" ;
          rdf:_6 "yearOfDeath" ;
          rdf:_7 "placeOfBirth" ;
          rdf:_8 "placeOfDeath" ;
          rdf:_9 "url" ] ;
        …
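The effect of the `csv.headers` option can be mimicked with the Python standard library. The sketch below is an assumption-laden illustration rather than the tool's code: `csv_to_facade_x`, the node labels `_:root` and `_:rowN`, and the string-based triples are all hypothetical, and the input is assumed to be rectangular (no row longer than the header).

```python
import csv
import io


def csv_to_facade_x(text, headers=False):
    """Return Facade-X style triples for CSV text.

    Each row is a container in a positional slot of the root. With
    headers=True the first row supplies key slots (xyz:<column>);
    otherwise every cell sits in a positional slot (rdf:_N) and the
    header line is just the first row.
    """
    rows = list(csv.reader(io.StringIO(text)))
    triples = [("_:root", "rdf:type", "fx:root")]
    names = rows[0] if headers and rows else None
    body = rows[1:] if headers else rows
    for r, row in enumerate(body, start=1):
        node = f"_:row{r}"  # hypothetical blank-node label
        triples.append(("_:root", f"rdf:_{r}", node))
        for c, cell in enumerate(row, start=1):
            slot = f"xyz:{names[c - 1]}" if names else f"rdf:_{c}"
            triples.append((node, slot, cell))
    return triples


sample = 'id,name,gender\n10093,"Abakanowicz, Magdalena",Female\n'
for triple in csv_to_facade_x(sample, headers=True):
    print(triple)
```

Either way the result has the same list-of-lists shape; the option only decides whether column names or positions label the slots.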
  • 24. ABC notation
      [ a fx:root ;
        xyz:X "1" ;
        xyz:T "Ah! vous …" ;
        xyz:C "anon." ;
        xyz:O "France" ;
        xyz:Z "Transcription based …" ;
        rdf:_1 [ rdf:_1 "|: " ;
          rdf:_2 [ rdf:_1 "C" ; rdf:_2 "C" ; rdf:_3 "G" ; rdf:_4 "G" ] ;
          rdf:_3 " | " ;
          rdf:_4 [ rdf:_1 "C" ; rdf:_2 "C" ; rdf:_3 "G2" ] ;
        ]
        …
  • 25. Mappings in SPARQL 1.1: (1) source schema, (2) transform, (3) map onto the target schema.
  • 27. https://github.com/SPARQL-Anything/showcase-tate Tate Gallery Open Data: * CSV listing artworks * JSON with details. Task: build a SKOS taxonomy of …
  • 29. Polifonia Ontology Network * Scenarios collected on GitHub as Markdown files Task: extract a list of competency questions from any scenario
  • 30. https://github.com/SPARQL-Anything/showcase-mei XML to CSV: using SPARQL to extract the note sequence from an XML/MEI file.
  • 31. System. SPARQL Anything is implemented on top of a standard SPARQL 1.1 query engine (Apache Jena ARQ). Query execution always follows view materialisation. In-memory and on-disk options are available. Performance can be improved with optimisations, e.g. by not materialising triples that the query does not need (triple-filtering). The method can scale further by slicing the data instead of materialising a complete view.
  • 32. Evaluation. How to develop a unified approach to re-engineer an open-ended set of formats into RDF? • Empirical: all formats scrutinised so far are representable with FX • Theoretical 1/2: any format expressible with a BNF grammar can also be represented as FX • Theoretical 2/2: FX can also represent the Entity-Relationship model • The approach scales linearly with input size
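The triple-filtering optimisation mentioned on this slide can be pictured as follows. This is a minimal sketch under assumptions, not the engine's actual code: triples and query patterns are modelled as plain 3-tuples, `None` stands for a variable, and the names `matches` and `materialise_filtered` are hypothetical.

```python
def matches(triple, pattern):
    """True if the triple fits the pattern; None in the pattern is a wildcard."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def materialise_filtered(triples, patterns):
    """Keep only the triples that at least one query pattern could match.

    `triples` may be any iterable (e.g. a generator produced while parsing
    the source), so unneeded triples are never stored.
    """
    return [t for t in triples if any(matches(t, p) for p in patterns)]


view = [
    ("_:b0", "rdf:type", "fx:root"),
    ("_:b0", "xyz:acno", "T02319"),
    ("_:b0", "xyz:classification", "painting"),
]
# A query that only asks for xyz:acno needs one triple of the view.
print(materialise_filtered(view, [(None, "xyz:acno", None)]))
```

Because the filter works on a stream of triples, it reduces memory use without changing the query itself, which is consistent with keeping the user-facing interface plain SPARQL.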
  • 33. Comparisons: lower (cognitive) complexity • One measure of complexity is the number of (distinct) items or variables (Halford et al. 2004; Warren et al. 2015). • vs SPARQL Generate: 8 queries from a museum collection use case, e.g. “What are the titles of the artworks attributed to ‘ANONIMO’?”, “What are the titles of the artworks created in 1935?”, … • vs RML / SPARQL Generate / ShExML: 4 mappings • Average number of distinct tokens: SPARQL Anything ~18; SPARQL Generate ~25 (~39.72% more); RML ~45 (~150% more).
  • 34. Comparisons: practicable and sustainable • Quantitative analysis of performance, to assess practicability • In-memory implementation (naive) • Execution time of q1-q12 (average over 10 executions) • Comparable to RML Mapper and SPARQL Generate on files of up to 1M JSON objects (~5M triples) • The in-memory implementation scales linearly • The approach is sustainable • Lines of Java code to maintain: SPARQL Generate 12280 (core module); RML Mapper 7951; SPARQL Anything 3842 (all transformers) in v0.2.0 (v0.7.0 has ~12k).
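The distinct-token measure can be approximated with a short script. The slide does not say how tokens were delimited, so the whitespace split below is an assumption, and `distinct_tokens` and `percent_more` are hypothetical helper names. Applied to the rounded averages shown on the slide, the relative increase comes out slightly different from the reported ~39.72%, presumably because the slide's figure was computed from unrounded averages.

```python
def distinct_tokens(query):
    """Count distinct whitespace-separated tokens (assumed tokenisation)."""
    return len(set(query.split()))


def percent_more(baseline, other):
    """Relative increase of `other` over `baseline`, as a percentage."""
    return (other - baseline) / baseline * 100


print(distinct_tokens("SELECT ?s WHERE { ?s ?p ?o }"))
print(round(percent_more(18, 25), 2))  # SPARQL Generate vs SPARQL Anything
print(round(percent_more(18, 45), 2))  # RML vs SPARQL Anything
```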
  • 35. SEMANTiCS conference 2021. Submitted to ACM Transactions on Internet Technology.
  • 36. Benefits • Transform / query resources in heterogeneous formats • Low learning demands for KG practitioners: plain SPARQL 1.1 • A single, consistent abstraction for potentially any data format • “Free lunch” data exploration and querying (no cold start problem) • Open-ended extensibility: no changes to user-facing code required • FX can express ANY format representable as BNF (as well as relational data) • Generate RDF (and RDF-Star) but also tabular data • Sometimes the KG is an intermediate object: some applications require other formats, e.g. tabular data is the usual format for data science, and the KG can serve as a feature engineering medium
  • 37. Challenges / Directions. Execute FX queries: • FX view materialisation is the only method supported so far • Study other execution strategies, e.g. query rewriting, learning from Ontology-Based Data Access (OBDA). Support users: • Only raw SPARQL code so far: develop user interfaces for FX-based data integration / exploration? • Explore mapping design patterns. Support developers: how to streamline the development of adapters for new formats? Features: • More inputs: relational DB, MIDI, image annotations, … the sky is the limit! • More connectors: Apache Parquet, Thrift, Hadoop/HDFS, RDBMS/JDBC, MongoDB, … • Anything in … anything out! Equip SPARQL Anything with a full-fledged template engine
  • 38. Credits. Luigi Asprino (University of Bologna), Enrico Daga (The Open University), Aldo Gangemi (ISTC-CNR), Justin Dowdy (Software Engineer), Paul Warren (The Open University), Paul Mulholland (The Open University), Marco Ratta (The Open University), Jason Carvalho (The Open University). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement GA101004746. The communication reflects only the author’s view and the Research Executive Agency is not responsible for any use that may be made of the information it contains.
  • 39. Thank you for listening www.enridaga.net
  • 40. References
      • Daga, E., Asprino, L., Mulholland, P., Gangemi, A.: Facade-X: an opinionated approach to SPARQL Anything. In: SEMANTiCS 2021: 17th International Conference on Semantic Systems (2021)
      • Atkin, M., Deely, T., Scharffe, F.: Knowledge Graph Benchmarking Report 2021 (version 2.0). Zenodo, http://doi.org/10.5281/zenodo.4950097 (June 2021)
      • Lassila, O., Schmidt, M., Bebee, B., Bechberger, D., Broekema, W., Khandelwal, A., Lawrence, K., Sharda, R., Thompson, B.: Graph? Yes! Which one? Help! In: 1st Squaring the Circle on Knowledge Graphs Workshop, SEMANTiCS (2021)
      • Daga, E., Meroño-Peñuela, A., Motta, E.: Sequential linked data: the state of affairs. Semantic Web (2021)
      • Warren, P., Mulholland, P.: Using SPARQL: the practitioners’ viewpoint. In: European Knowledge Acquisition Workshop. pp. 485–500. Springer (2018)
      • Corcho, O., Priyatna, F., Chaves-Fraga, D.: Towards a new generation of ontology based data access. Semantic Web 11(1), 153–160 (2020)
      • Michel, F., Faron-Zucker, C., Corby, O., Gandon, F.: Enabling automatic discovery and querying of web APIs at web scale using linked data standards. In: Companion Proceedings of The 2019 World Wide Web Conference. pp. 883–892 (2019)
      • Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E., Van de Walle, R.: RML: a generic language for integrated RDF mappings of heterogeneous data. In: 7th Workshop on Linked Data on the Web (2014)
      • García-González, H., Boneva, I., Staworko, S., Labra-Gayo, J.E., Lovelle, J.M.C.: ShExML: improving the usability of heterogeneous data mapping languages for first-time users. PeerJ Computer Science 6, e318 (2020)
      • Paulheim, H.: Knowledge graph refinement: a survey of approaches and evaluation methods. Semantic Web 8(3), 489–508 (2017)
      • Ko, A.J., Abraham, R., Beckwith, L., Blackwell, A., Burnett, M., Erwig, M., Scaffidi, C., Lawrance, J., Lieberman, H., Myers, B., et al.: The state of the art in end-user software engineering. ACM Computing Surveys (CSUR) 43(3), 1–44 (2011)
      • Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL extension for generating RDF from heterogeneous formats. In: European Semantic Web Conference. pp. 35–50. Springer (2017)
      • Lieberman, H., Paternò, F., Klann, M., Wulf, V.: End-user development: an emerging paradigm. In: End User Development, pp. 1–8. Springer (2006)
      • Cyganiak, R.: Tarql (SPARQL for tables): turn CSV into RDF using SPARQL syntax. Technical report, http://tarql.github.io (2015)
      • Lenzerini, M.: Data integration: a theoretical perspective. In: Proceedings of the Twenty-First ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. pp. 233–246 (2002)