SlideShare une entreprise Scribd logo
1  sur  21
Télécharger pour lire hors ligne
Early analysis and debugging of linked open data cubes 
Enrico Daga1 Mathieu d’Aquin1 Aldo Gangemi2 Enrico Motta1 
1KMi - The Open University 
{enrico.daga,mathieu.daquin,enrico.motta}@open.ac.uk 
2ISTC-CNR - Italian National Research Council aldo.grangemi@open.ac.uk 
Second International Workshop on Semantic Statistics (SemStats) 
ISWC 2014, Riva del Garda - Trentino, Italy 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 1 / 21
Summary 
Linked Data Cubes can be complex for both publishers and consumers 
We published the Linked Data version of Unistats (LinkedUp!) 
Vital: a tool to support the early analysis and debugging of linked 
data cubes 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 2 / 21
Unistats Linked Data 
The Unistats Dataset 
https://www.hesa.ac.uk/unistats-dataset 
Published by UK the Higher Education Statistical Agency (HESA) 
Statistics about universities: courses, subjects, employment outcomes, 
accommodation information, fees ... 
Collected from UK universities 
Accessible using a Web API 
Downloadable as a single XML or CSV 
Documentated on the HESA web site (HTML or PDF) 
We are a university, and we might reuse this data as well ... 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 3 / 21
Unistats Linked Data 
Example: the Employment dataset 
... it shouldn’t be hard, right? 
<INSTITUTION> 
[ . . . ] 
<KISCOURSE> 
[ . . . ] 
<EMPLOYMENT> 
<EMPSBJ>090</EMPSBJ> 
<WORKSTUDY>95</WORKSTUDY> 
<STUDY>25</STUDY> 
<ASSUNEMP>5</ASSUNEMP> 
<BOTH>5</BOTH> 
<NOAVAIL>5</NOAVAIL> 
<WORK>60</WORK> 
<EMPPOP>25</EMPPOP> 
<EMPAGG>24</EMPAGG> 
</EMPLOYMENT> 
<EMPLOYMENT> 
[ . . . ] 
Dimensions: 
INSTITUTION 
KISCOURSE 
EMPSBJ 
Measures: 
WORKSTUDY 
STUDY 
ASSUNEMP 
BOTH 
NOAVAIL 
WORK 
Attributes: 
EMPPOP 
EMPAGG 
The analyst needs to build familiarity with the syntactic format of the 
data but especially with its semantics, jumping to the documentation on 
the web site. 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 4 / 21
Unistats Linked Data 
Benefits of linked data 
help to make sense of the information (everything is a link...); 
schema and documentation are embedded in the data; 
data can be queried (SPARQL) for task-oriented tailored views; 
we can make links to other data, also from third parties; 
new dimensions exploiting links and paths 
eg: postcodes derived from institutions 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 5 / 21
Unistats Linked Data 
The RDF Data Cube Vocabulary 
A vocabulary to publish multi-dimensional data in RDF! 
[ ] a qb : Obser v a t ion ; 
qb : dataSet ex : stats OU 
ex : o r g a n i z a t i o n ou : t h e o p e n uni v e r s i t y ; 
ex : inUK f a l s e ; 
ex : amount ”13000”ˆˆ xsd : i n t ; 
ex : rounded ex : down ; 
ex : year ”2013”ˆˆ xsd : gYear . 
ex : o r g a n i z a t i o n a qb : DimensionProperty . 
ex : inUK a qb : DimensionProperty . 
ex : year a qb : Dimensi onProperty . 
ex : amount a qb : MeasureProperty . 
ex : rounded a qb : Att r i but eProper t y . 
Observations are grouped in datasets. An observation has: 
dimensions identify the observation (eg: year, subject, institution, job type) 
measures contain the observed values (eg: amount, working students) 
attributes qualify the values (eg: provisional, round o↵, unit) 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 6 / 21
Unistats Linked Data 
Schema is specified 
A data structure definition specifies components - dimensions, measures 
or attributes. 
dse t : employment a qb : DataSet ; 
r d f s : l a b e l ”Employment”@en ; 
skos : pr e fLabe l ”Employment” ; 
r d f s : comment ” Contai n s data r e l a t i n g to the employment outcomes of s tudent s ”@en ; 
qb : s t r u c t u r e dset : employmentStructure ; 
r d f s : seeAl so <http ://www. hesa . ac . uk/ i n c l u d e s /C13061 r e sources / 
Uni s t a t s c h e c k d o c d e f i n i t i o n s . pdf ?v=1.12> ; 
r d f s : i sDef inedBy k i s : 
. 
dse t : employmentStructure a qb : DataSt r uctur eDe f ini t ion ; 
r d f s : l a b e l ”Employment ( s t ruc t u r e ) ”@en ; 
skos : pr e fLabe l ”Employment ( s t ruc tur e ) ” ; 
r d f s : comment ” I n formation r e l a t i n g to the employment outcomes of s tudent s ”@en ; 
qb : component [ 
a qb : ComponentSpeci f icat ion ; 
qb : dimension dcterm : s u b j e c t ; 
skos : pr e fLabe l ” S p e c i f i c a t i o n dimension : s u b j e c t ” ] ; 
[ . . . ] 
r d f s : seeAl so <http ://www. hesa . ac . uk/ i n c l u d e s /C13061 r e sources / 
Uni s t a t s c h e c k d o c d e f i n i t i o n s . pdf ?v=1.12> ; 
r d f s : i sDef inedBy k i s : 
. 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 7 / 21
Unistats Linked Data 
RDF can be self-descriptive 
The data can be extended to also include well-formed documentation: 
resources to have at least one rdfs:label with language information 
specified; 
resources to have an rdfs:comment; 
resources to have a single skos:prefLabel; 
schema elements to have rdfs:isDefinedBy links to the vocabulary 
specification; 
resources to include links to external documentation using 
rdfs:seeAlso. 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 8 / 21
Unistats Linked Data 
Datasets 
The LinkedUp Project published Unistats as Linked Open Data! 
See: http://www.linkedup-project.eu. 
SPARQL: http://data.linkedu.eu/kis/query 
Name Observ. 
Course Stages 82642 
Continuation 30115 
Entry Qualifications 30013 
National Student Survey Results 29381 
Employment 29300 
Tari↵s 27732 
Common Jobs 26849 
Job Types 26308 
Degree Classes 22548 
Salaries 22430 
National Student Survey NHS Results 389 
7.144.510 triples in total with 327.707 qb:Observations in 11 
qb:DataSets 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 9 / 21
Challenges 
For the publisher - debugging: 
Consistency between observations and the data structure definition 
SPARQL queries are defined in the QB specification 
Coherence between RDF model and original model 
How to detect remodelling issues? 
Documentation aligned with the origin and exposed within the data 
How to assess it’s quality? 
For the consumer - early analysis: 
Data fitness for use needs to be evaluated by users, but building the 
right query requires a-priori knowledge of the data 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 10 / 21
Question 
How feasible is it to use self-descriptive RDF to give an overview of the 
data in a way that would facilitate debugging and exploration of statistical 
linked open data? 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 11 / 21
Data Cubes with Vital 
Objective 
Support for data publisher and consumer: 
A methodology and a tool to debug linked data cubes and assess 
possible data quality issues. 
A suitable procedure in order to obtain representative samples of the 
data, for early analysis and evaluation 
by relying only on the SPARQL endpoint 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 12 / 21
Data Cubes with Vital 
Methodology 1/3 
Bar charts are simple and intuitive! 
However a basic bar chart is only made up of two axis! 
It can show one dimension (the categories observed) and one measure 
(the values compared). 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 13 / 21
Data Cubes with Vital 
Methodology 2/3 
Following the data structure definition we can generate views 
automatically, by picking a single dimension and a single measure 
We aggregate the measure values by applying a SPARQL 
aggregation function: 
I if the values datatype is numeric, we compute the average (AVG); 
I otherwise, we apply the COUNT function to the set of DISTINCT values, 
thus obtaining the number of di↵erent values for all observations under 
the dimension - a diversity score. 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 14 / 21
Data Cubes with Vital 
Methodology 3/3 
The Employment Dataset 
3 dimensions: Institution, Course and Subject 
6 measures: Study, Work, Work and Study, Work or Study Assumed 
Unemployed, Not available. 
= 18 diagrams are generated. 
The view Employment: Institution + Work: 
SELECT (? dimens ion as ? cat ) (AVG ( ?value ) as ?nb ) ? l a b e l 
WHERE { [ ] a qb : Obs e r v a t ion ; 
qb : dataSe t <http : // data . l i n k edu . eu/ k i s / datase t /employment> ; 
<http : // data . l i n k edu . eu/ k i s / ont o logy / i n s t i t u t i o n> ? dimension ; 
<http : // data . l i n k edu . eu/ k i s / ont o logy /work> ? value . 
o p t i o n a l { ? dimension skos : prefLabel ? l a b e l } . 
}GROUP BY ? dimension ? l a b e l 
ORDER BY DESC(? nb ) ? l a b e l 
LIMIT 20 
Following this approach our system generates 230 diagrams. 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 15 / 21
http://data.open.ac.uk/demo/vital 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 16 / 21
Data Cubes with Vital 
Debugging 1/2 
The debugging methodology is a continous iteration of the following steps: 
1 browse the diagrams and identify a suspicious graphs (eg: an empty 
diagram) 
2 inspect the SPARQL query and the result sets 
3 identify the error (eg: a wrong data type leading to a void result set) 
4 perform the fix on the data remodelling tool 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 17 / 21
Data Cubes with Vital 
Debugging 2/2 
The issues found can be grouped in the following categories: 
1 insufficient/wrong documentation 
2 wrong or unspecified data types 
3 syntax errors in URIs 
4 inconsistency between the data structure specifications and the 
observations. 
With this tool we have been able to discover them very easily, and to fix 
the data remodelling procedure accordingly. 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 18 / 21
Conclusions 
The Vital approach takes into account with a simple tool the basic needs 
of data publishers and early adopters: 
allows the exploration of data sets, dimensions, measures; 
supports the exploration of the documentation attached to the data; 
supports the publisher on detecting remodelling issues; 
it can contribute to the design of more relevant SPARQL queries from 
initial drafts; 
applying the tool to other data sources is straight forward, requiring 
minor configuration (basically, pointing to the SPARQL endpoint). 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 19 / 21
Open issues 
Consideration of other Data Cube components to debug and evaluate, 
such as Code Lists. 
We only use two simple aggregation functions: AVG for numeric 
values and COUNT for string values. 
The semantics of the measure, that may suggest specific aggregations. 
We do not consider attributes. Attributes can then a↵ect the way 
measures need to be processed for aggregations. 
However our objective was to support the analyist in gaining an early understainding of 
the core capabilities of a statistical linked open dataset. 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 20 / 21
Thank you! 
Enrico Daga 
enrico.daga@open.ac.uk 
Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 21 / 21

Contenu connexe

Similaire à Early Analysis and Debuggin of Linked Open Data Cubes

Adopting a situated learning framework for (big) data projects
Adopting a situated learning framework for (big) data projectsAdopting a situated learning framework for (big) data projects
Adopting a situated learning framework for (big) data projectsCranfield University
 
Semantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data MiningSemantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data MiningEditor IJCATR
 
Big Data Analytics IEEE 2015 Projects
Big Data Analytics IEEE 2015 ProjectsBig Data Analytics IEEE 2015 Projects
Big Data Analytics IEEE 2015 ProjectsVijay Karan
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstracttsysglobalsolutions
 
Camp 4-data workshop presentation
Camp 4-data workshop presentationCamp 4-data workshop presentation
Camp 4-data workshop presentationPaolo Missier
 
Horton+Pruim+Kaplan_MOSAIC-StudentGuide.pdf Nicholas J. .docx
Horton+Pruim+Kaplan_MOSAIC-StudentGuide.pdf Nicholas J. .docxHorton+Pruim+Kaplan_MOSAIC-StudentGuide.pdf Nicholas J. .docx
Horton+Pruim+Kaplan_MOSAIC-StudentGuide.pdf Nicholas J. .docxwellesleyterresa
 
Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Gurdal Ertek
 
[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of CubesUniversity of Bologna
 
Introduction to Data Mining KDD Process OLAP
Introduction to Data Mining KDD Process OLAPIntroduction to Data Mining KDD Process OLAP
Introduction to Data Mining KDD Process OLAPShivarkarSandip
 
Introduction to data mining which covers the basics
Introduction to data mining which covers the basicsIntroduction to data mining which covers the basics
Introduction to data mining which covers the basicsShivarkarSandip
 
IEEE Big data 2016 Title and Abstract
IEEE Big data  2016 Title and AbstractIEEE Big data  2016 Title and Abstract
IEEE Big data 2016 Title and Abstracttsysglobalsolutions
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyRichard Zijdeman
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentMaribel Acosta Deibe
 
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET Journal
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 
Wehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansWehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansBram van den Hout
 
Data mining techniques application for prediction in OLAP cube
Data mining techniques application for prediction in OLAP cubeData mining techniques application for prediction in OLAP cube
Data mining techniques application for prediction in OLAP cubeIJECEIAES
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
 

Similaire à Early Analysis and Debuggin of Linked Open Data Cubes (20)

Adopting a situated learning framework for (big) data projects
Adopting a situated learning framework for (big) data projectsAdopting a situated learning framework for (big) data projects
Adopting a situated learning framework for (big) data projects
 
Semantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data MiningSemantically Enriched Knowledge Extraction With Data Mining
Semantically Enriched Knowledge Extraction With Data Mining
 
Big Data Analytics IEEE 2015 Projects
Big Data Analytics IEEE 2015 ProjectsBig Data Analytics IEEE 2015 Projects
Big Data Analytics IEEE 2015 Projects
 
presentationIDC - 14MAY2015
presentationIDC - 14MAY2015presentationIDC - 14MAY2015
presentationIDC - 14MAY2015
 
IEEE Datamining 2016 Title and Abstract
IEEE  Datamining 2016 Title and AbstractIEEE  Datamining 2016 Title and Abstract
IEEE Datamining 2016 Title and Abstract
 
Camp 4-data workshop presentation
Camp 4-data workshop presentationCamp 4-data workshop presentation
Camp 4-data workshop presentation
 
Horton+Pruim+Kaplan_MOSAIC-StudentGuide.pdf Nicholas J. .docx
Horton+Pruim+Kaplan_MOSAIC-StudentGuide.pdf Nicholas J. .docxHorton+Pruim+Kaplan_MOSAIC-StudentGuide.pdf Nicholas J. .docx
Horton+Pruim+Kaplan_MOSAIC-StudentGuide.pdf Nicholas J. .docx
 
Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...
 
[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes[DOLAP2023] The Whys and Wherefores of Cubes
[DOLAP2023] The Whys and Wherefores of Cubes
 
Introduction to Data Mining KDD Process OLAP
Introduction to Data Mining KDD Process OLAPIntroduction to Data Mining KDD Process OLAP
Introduction to Data Mining KDD Process OLAP
 
Introduction to data mining which covers the basics
Introduction to data mining which covers the basicsIntroduction to data mining which covers the basics
Introduction to data mining which covers the basics
 
IEEE Big data 2016 Title and Abstract
IEEE Big data  2016 Title and AbstractIEEE Big data  2016 Title and Abstract
IEEE Big data 2016 Title and Abstract
 
Data legend dh_benelux_2017.key
Data legend dh_benelux_2017.keyData legend dh_benelux_2017.key
Data legend dh_benelux_2017.key
 
Crowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality AssessmentCrowdsourcing Linked Data Quality Assessment
Crowdsourcing Linked Data Quality Assessment
 
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...IRJET -  	  Conversion of Unsupervised Data to Supervised Data using Topic Mo...
IRJET - Conversion of Unsupervised Data to Supervised Data using Topic Mo...
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
Wehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historiansWehc - Linked Data for Economic-Social historians
Wehc - Linked Data for Economic-Social historians
 
Data mining techniques application for prediction in OLAP cube
Data mining techniques application for prediction in OLAP cubeData mining techniques application for prediction in OLAP cube
Data mining techniques application for prediction in OLAP cube
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 

Plus de Enrico Daga

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyEnrico Daga
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...Enrico Daga
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Enrico Daga
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectEnrico Daga
 
Capturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchCapturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchEnrico Daga
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEIEnrico Daga
 
The SPARQL Anything project
The SPARQL Anything projectThe SPARQL Anything project
The SPARQL Anything projectEnrico Daga
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Enrico Daga
 
Linked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchLinked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchEnrico Daga
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachEnrico Daga
 
Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Enrico Daga
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterEnrico Daga
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesEnrico Daga
 
Propagating Data Policies - A User Study
Propagating Data Policies - A User StudyPropagating Data Policies - A User Study
Propagating Data Policies - A User StudyEnrico Daga
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsEnrico Daga
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionEnrico Daga
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsEnrico Daga
 

Plus de Enrico Daga (19)

Citizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data JourneyCitizen Experiences in Cultural Heritage Archives: a Data Journey
Citizen Experiences in Cultural Heritage Archives: a Data Journey
 
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...Streamlining Knowledge Graph Construction with a façade:  the SPARQL Anything...
Streamlining Knowledge Graph Construction with a façade: the SPARQL Anything...
 
Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.Data integration with a façade. The case of knowledge graph construction.
Data integration with a façade. The case of knowledge graph construction.
 
Knowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything ProjectKnowledge graph construction with a façade - The SPARQL Anything Project
Knowledge graph construction with a façade - The SPARQL Anything Project
 
Capturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities researchCapturing the semantics of documentary evidence for humanities research
Capturing the semantics of documentary evidence for humanities research
 
Trying SPARQL Anything with MEI
Trying SPARQL Anything with MEITrying SPARQL Anything with MEI
Trying SPARQL Anything with MEI
 
The SPARQL Anything project
The SPARQL Anything projectThe SPARQL Anything project
The SPARQL Anything project
 
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
Towards a Smart (City) Data Science. A case-based retrospective on policies, ...
 
Linked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities researchLinked data for knowledge curation in humanities research
Linked data for knowledge curation in humanities research
 
Capturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid ApproachCapturing Themed Evidence, a Hybrid Approach
Capturing Themed Evidence, a Hybrid Approach
 
Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...Challenging knowledge extraction to support
the curation of documentary evide...
Challenging knowledge extraction to support
the curation of documentary evide...
 
Ld4 dh tutorial
Ld4 dh tutorialLd4 dh tutorial
Ld4 dh tutorial
 
OU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data ClusterOU RSE Tutorial Big Data Cluster
OU RSE Tutorial Big Data Cluster
 
CityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tablesCityLABS Workshop: Working with large tables
CityLABS Workshop: Working with large tables
 
Propagating Data Policies - A User Study
Propagating Data Policies - A User StudyPropagating Data Policies - A User Study
Propagating Data Policies - A User Study
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
Propagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data FlowsPropagation of Policies in Rich Data Flows
Propagation of Policies in Rich Data Flows
 
A bottom up approach for licences classification and selection
A bottom up approach for licences classification and selectionA bottom up approach for licences classification and selection
A bottom up approach for licences classification and selection
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 

Dernier

Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...SOFTTECHHUB
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...kumargunjan9515
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...HyderabadDolls
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...vershagrag
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...Elaine Werffeli
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...HyderabadDolls
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteedamy56318795
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridihmeghakumariji156
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxAniqa Zai
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?RemarkSemacio
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numberssuginr1
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...HyderabadDolls
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 

Dernier (20)

Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
Lake Town / Independent Kolkata Call Girls Phone No 8005736733 Elite Escort S...
 
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
💞 Safe And Secure Call Girls Agra Call Girls Service Just Call 🍑👄6378878445 🍑...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
 
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime GiridihGiridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
Giridih Escorts Service Girl ^ 9332606886, WhatsApp Anytime Giridih
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
Diamond Harbour \ Russian Call Girls Kolkata | Book 8005736733 Extreme Naught...
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 

Early Analysis and Debuggin of Linked Open Data Cubes

  • 1. Early analysis and debugging of linked open data cubes Enrico Daga1 Mathieu d’Aquin1 Aldo Gangemi2 Enrico Motta1 1KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@open.ac.uk 2ISTC-CNR - Italian National Research Council aldo.grangemi@open.ac.uk Second International Workshop on Semantic Statistics (SemStats) ISWC 2014, Riva del Garda - Trentino, Italy Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 1 / 21
  • 2. Summary Linked Data Cubes can be complex for both publishers and consumers We published the Linked Data version of Unistats (LinkedUp!) Vital: a tool to support the early analysis and debugging of linked data cubes Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 2 / 21
  • 3. Unistats Linked Data The Unistats Dataset https://www.hesa.ac.uk/unistats-dataset Published by UK the Higher Education Statistical Agency (HESA) Statistics about universities: courses, subjects, employment outcomes, accommodation information, fees ... Collected from UK universities Accessible using a Web API Downloadable as a single XML or CSV Documentated on the HESA web site (HTML or PDF) We are a university, and we might reuse this data as well ... Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 3 / 21
  • 4. Unistats Linked Data Example: the Employment dataset ... it shouldn’t be hard, right? <INSTITUTION> [ . . . ] <KISCOURSE> [ . . . ] <EMPLOYMENT> <EMPSBJ>090</EMPSBJ> <WORKSTUDY>95</WORKSTUDY> <STUDY>25</STUDY> <ASSUNEMP>5</ASSUNEMP> <BOTH>5</BOTH> <NOAVAIL>5</NOAVAIL> <WORK>60</WORK> <EMPPOP>25</EMPPOP> <EMPAGG>24</EMPAGG> </EMPLOYMENT> <EMPLOYMENT> [ . . . ] Dimensions: INSTITUTION KISCOURSE EMPSBJ Measures: WORKSTUDY STUDY ASSUNEMP BOTH NOAVAIL WORK Attributes: EMPPOP EMPAGG The analyst needs to build familiarity with the syntactic format of the data but especially with its semantics, jumping to the documentation on the web site. Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 4 / 21
  • 5. Unistats Linked Data Benefits of linked data help to make sense of the information (everything is a link...); schema and documentation are embedded in the data; data can be queried (SPARQL) for task-oriented tailored views; we can make links to other data, also from third parties; new dimensions exploiting links and paths eg: postcodes derived from institutions Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 5 / 21
  • 6. Unistats Linked Data The RDF Data Cube Vocabulary A vocabulary to publish multi-dimensional data in RDF! [ ] a qb : Obser v a t ion ; qb : dataSet ex : stats OU ex : o r g a n i z a t i o n ou : t h e o p e n uni v e r s i t y ; ex : inUK f a l s e ; ex : amount ”13000”ˆˆ xsd : i n t ; ex : rounded ex : down ; ex : year ”2013”ˆˆ xsd : gYear . ex : o r g a n i z a t i o n a qb : DimensionProperty . ex : inUK a qb : DimensionProperty . ex : year a qb : Dimensi onProperty . ex : amount a qb : MeasureProperty . ex : rounded a qb : Att r i but eProper t y . Observations are grouped in datasets. An observation has: dimensions identify the observation (eg: year, subject, institution, job type) measures contain the observed values (eg: amount, working students) attributes qualify the values (eg: provisional, round o↵, unit) Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 6 / 21
  • 7. Unistats Linked Data Schema is specified A data structure definition specifies components - dimensions, measures or attributes. dse t : employment a qb : DataSet ; r d f s : l a b e l ”Employment”@en ; skos : pr e fLabe l ”Employment” ; r d f s : comment ” Contai n s data r e l a t i n g to the employment outcomes of s tudent s ”@en ; qb : s t r u c t u r e dset : employmentStructure ; r d f s : seeAl so <http ://www. hesa . ac . uk/ i n c l u d e s /C13061 r e sources / Uni s t a t s c h e c k d o c d e f i n i t i o n s . pdf ?v=1.12> ; r d f s : i sDef inedBy k i s : . dse t : employmentStructure a qb : DataSt r uctur eDe f ini t ion ; r d f s : l a b e l ”Employment ( s t ruc t u r e ) ”@en ; skos : pr e fLabe l ”Employment ( s t ruc tur e ) ” ; r d f s : comment ” I n formation r e l a t i n g to the employment outcomes of s tudent s ”@en ; qb : component [ a qb : ComponentSpeci f icat ion ; qb : dimension dcterm : s u b j e c t ; skos : pr e fLabe l ” S p e c i f i c a t i o n dimension : s u b j e c t ” ] ; [ . . . ] r d f s : seeAl so <http ://www. hesa . ac . uk/ i n c l u d e s /C13061 r e sources / Uni s t a t s c h e c k d o c d e f i n i t i o n s . pdf ?v=1.12> ; r d f s : i sDef inedBy k i s : . Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 7 / 21
  • 8. Unistats Linked Data RDF can be self-descriptive The data can be extended to also include well-formed documentation: resources to have at least one rdfs:label with language information specified; resources to have an rdfs:comment; resources to have a single skos:prefLabel; schema elements to have rdfs:isDefinedBy links to the vocabulary specification; resources to include links to external documentation using rdfs:seeAlso. Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 8 / 21
  • 9. Unistats Linked Data Datasets The LinkedUp Project published Unistats as Linked Open Data! See: http://www.linkedup-project.eu. SPARQL: http://data.linkedu.eu/kis/query Name Observ. Course Stages 82642 Continuation 30115 Entry Qualifications 30013 National Student Survey Results 29381 Employment 29300 Tari↵s 27732 Common Jobs 26849 Job Types 26308 Degree Classes 22548 Salaries 22430 National Student Survey NHS Results 389 7.144.510 triples in total with 327.707 qb:Observations in 11 qb:DataSets Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 9 / 21
  • 10. Challenges For the publisher - debugging: Consistency between observations and the data structure definition SPARQL queries are defined in the QB specification Coherence between RDF model and original model How to detect remodelling issues? Documentation aligned with the origin and exposed within the data How to assess it’s quality? For the consumer - early analysis: Data fitness for use needs to be evaluated by users, but building the right query requires a-priori knowledge of the data Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 10 / 21
  • 11. Question How feasible is it to use self-descriptive RDF to give an overview of the data in a way that would facilitate debugging and exploration of statistical linked open data? Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 11 / 21
  • 12. Data Cubes with Vital Objective Support for data publisher and consumer: A methodology and a tool to debug linked data cubes and assess possible data quality issues. A suitable procedure in order to obtain representative samples of the data, for early analysis and evaluation by relying only on the SPARQL endpoint Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 12 / 21
  • 13. Data Cubes with Vital Methodology 1/3 Bar charts are simple and intuitive! However a basic bar chart is only made up of two axis! It can show one dimension (the categories observed) and one measure (the values compared). Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 13 / 21
  • 14. Data Cubes with Vital Methodology 2/3 Following the data structure definition we can generate views automatically, by picking a single dimension and a single measure We aggregate the measure values by applying a SPARQL aggregation function: I if the values datatype is numeric, we compute the average (AVG); I otherwise, we apply the COUNT function to the set of DISTINCT values, thus obtaining the number of di↵erent values for all observations under the dimension - a diversity score. Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 14 / 21
  • 15. Data Cubes with Vital Methodology 3/3 The Employment Dataset 3 dimensions: Institution, Course and Subject 6 measures: Study, Work, Work and Study, Work or Study Assumed Unemployed, Not available. = 18 diagrams are generated. The view Employment: Institution + Work: SELECT (? dimens ion as ? cat ) (AVG ( ?value ) as ?nb ) ? l a b e l WHERE { [ ] a qb : Obs e r v a t ion ; qb : dataSe t <http : // data . l i n k edu . eu/ k i s / datase t /employment> ; <http : // data . l i n k edu . eu/ k i s / ont o logy / i n s t i t u t i o n> ? dimension ; <http : // data . l i n k edu . eu/ k i s / ont o logy /work> ? value . o p t i o n a l { ? dimension skos : prefLabel ? l a b e l } . }GROUP BY ? dimension ? l a b e l ORDER BY DESC(? nb ) ? l a b e l LIMIT 20 Following this approach our system generates 230 diagrams. Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 15 / 21
  • 16. http://data.open.ac.uk/demo/vital Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 16 / 21
  • 17. Data Cubes with Vital Debugging 1/2 The debugging methodology is a continous iteration of the following steps: 1 browse the diagrams and identify a suspicious graphs (eg: an empty diagram) 2 inspect the SPARQL query and the result sets 3 identify the error (eg: a wrong data type leading to a void result set) 4 perform the fix on the data remodelling tool Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 17 / 21
  • 18. Data Cubes with Vital Debugging 2/2 The issues found can be grouped in the following categories: 1 insufficient/wrong documentation 2 wrong or unspecified data types 3 syntax errors in URIs 4 inconsistency between the data structure specifications and the observations. With this tool we have been able to discover them very easily, and to fix the data remodelling procedure accordingly. Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 18 / 21
  • 19. Conclusions The Vital approach takes into account with a simple tool the basic needs of data publishers and early adopters: allows the exploration of data sets, dimensions, measures; supports the exploration of the documentation attached to the data; supports the publisher on detecting remodelling issues; it can contribute to the design of more relevant SPARQL queries from initial drafts; applying the tool to other data sources is straight forward, requiring minor configuration (basically, pointing to the SPARQL endpoint). Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 19 / 21
  • 20. Open issues Consideration of other Data Cube components to debug and evaluate, such as Code Lists. We only use two simple aggregation functions: AVG for numeric values and COUNT for string values. The semantics of the measure, that may suggest specific aggregations. We do not consider attributes. Attributes can then a↵ect the way measures need to be processed for aggregations. However our objective was to support the analyist in gaining an early understainding of the core capabilities of a statistical linked open dataset. Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 20 / 21
  • 21. Thank you! Enrico Daga enrico.daga@open.ac.uk Enrico Daga, Mathieu d’Aquin, Aldo Gangemi, Enrico Motta (KMi - The Open University {enrico.daga,mathieu.daquin,enrico.motta}@Early analysis and debugging of linked open data cubes SemStats 2014 21 / 21