Dandelion: from raw data to dataGEMs for developers
ISWC 2014
Stefano Parmesan 
Tatiana Tarasova 
Ugo Scaiella 
Michele Barbera
A bit of context 
• SpazioDati s.r.l.
  • Italian startup: Pisa & Trento
  • Members of the DBpedia Association
  • Manage the Italian DBpedia
Goal 
• Close the gap between getting the data and using it
• Build a Knowledge Graph as-a-service:
  • Make it queryable
  • Make it stable, make it scale
  • Support different access levels
How? 
• Phase #1: PUT the data in
  • Data normalization
  • Entity deduplication
• Phase #2: GET the data out
  • Slices
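A minimal sketch of what Phase #1 could look like on toy records. This is not the Dandelion pipeline (the architecture slide names the Silk Framework for deduplication); the normalization rule, the field names and the "first source wins" merge policy are all assumptions made for illustration.

```python
import unicodedata
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Lowercase, strip accents and collapse whitespace (illustrative rule only)."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def deduplicate(records):
    """Merge records whose normalized labels match into one entity each."""
    merged = defaultdict(dict)
    for record in records:
        key = normalize_label(record["label"])
        for field, value in record.items():
            # Hypothetical merge policy: the first source to set a field wins.
            merged[key].setdefault(field, value)
    return dict(merged)


# Two hypothetical source records describing the same municipality.
records = [
    {"label": "Agliè", "source": "source-1", "cadastral_code": "A074"},
    {"label": "  AGLIE ", "source": "source-n", "elevation": 315},
]
entities = deduplicate(records)
```

Matching on a normalized label alone is far too weak for real data; it only shows where normalization ends and deduplication begins.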
How? 
(Pipeline diagram. Stages: Data Normalisation → Entity Deduplication → Data Storage → Data Access.
Labels shown: Sample, Raw Data (Source 1 … Source N), Reconciliation Services, Azkaban, Silk Framework, Titan Graph, dandelion.eu, Linked Data, Slices, dataGEM.)
Why… 
• … slices?
  • SQL-like APIs
  • Common knowledge, linked data
• … a graph at all?
  • Traversals
  • Data is centralized
  • Different sources, different access levels
Why… 
• … Titan/Gremlin?
  • Scalable
  • Richer (multi-property, undefined-depth queries)
  • Open source
  • Elasticsearch-powered
And now what? 
• Still a prototype:
  • Private beta access to slices (demo)
  • English and Italian DBpedia
  • Corporate private data
Future? 
• Phase #1b: PUT the data in
  • Scalable entity deduplication
• Phase #2b: GET the data out
  • API for graph traversal
  • Text analysis tools (dataTXT)
  • Customizations
RDF mappings 
<http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> a code:ISTATAdministrativeDivision ;
    sd:childOf <http://data.spaziodati.eu/resource/7b7d45857f1372e1205bcfc87c19b2b2db2e0f59> ;
    sd:code "001001" ;
    sd:acheneID "ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb" ;
    code:cadastralCode "A074" ;
    sd:label "Agliè" ;
    code:elevation "315"^^xsd:int ;
    code:isCoastal "false"^^xsd:boolean ;
    code:isMountainous "false"^^xsd:boolean ;
    sd:level "60"^^xsd:int .

_:node194hhq904x1 rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ;
    rdf:predicate code:population ;
    rdf:object "2574"^^xsd:int ;
    sd:acheneID "31e4104e62168ffc4c3d6d278ecc775effff6ebc" ;
    metaprop:validSince "2001-10-21"^^xsd:date .

_:node194hhq904x2 rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ;
    rdf:predicate code:population ;
    rdf:object "2644"^^xsd:int ;
    sd:acheneID "f38e87252cc5614faeec4abbeedd6315f5d00e9f" ;
    metaprop:validSince "2011-10-09"^^xsd:date .
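The two reified `code:population` statements above carry a `metaprop:validSince` date each. A small sketch of how such statements could be read back, assuming `validSince` means "this value holds from that date until a later statement supersedes it" (the slides do not spell this out). The `Statement` type and `value_as_of` helper are hypothetical names, not part of Dandelion.

```python
from datetime import date
from typing import NamedTuple, Optional

AGLIE = "http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb"


class Statement(NamedTuple):
    """One reified statement: subject, predicate, object plus its validSince date."""
    subject: str
    predicate: str
    obj: int
    valid_since: date


# The two population statements from the Turtle sample.
statements = [
    Statement(AGLIE, "code:population", 2574, date(2001, 10, 21)),
    Statement(AGLIE, "code:population", 2644, date(2011, 10, 9)),
]


def value_as_of(stmts, subject, predicate, when) -> Optional[int]:
    """Return the value of the most recent statement valid on `when`, if any."""
    candidates = [
        s for s in stmts
        if s.subject == subject and s.predicate == predicate and s.valid_since <= when
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.valid_since).obj


population_2005 = value_as_of(statements, AGLIE, "code:population", date(2005, 1, 1))
population_2012 = value_as_of(statements, AGLIE, "code:population", date(2012, 1, 1))
```

Keeping each value as its own statement node is what allows per-value metadata such as the validity date and a separate `sd:acheneID`.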
Graph structure 
(Diagram; node kinds shown: provenance nodes, type nodes, bristle node, achene node.)
Traversing 
• v.as('x').out('sd:childOf')
  .loop('x'){ cur ->
    // cur is the loop bundle; cur.object is the current vertex (Gremlin 2.x)
    cur.object.outE('sd:childOf').hasNext()
  }.path()
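The traversal keeps following `sd:childOf` edges for as long as the current vertex still has one, then returns the path walked. The same idea on a plain dictionary, as a sketch only: it assumes each node has at most one parent, and the node names are hypothetical. Unlike the Gremlin query, it returns a one-element path when the start node has no parent at all.

```python
def path_to_root(child_of, start):
    """Follow child -> parent links from `start` until a node has no parent."""
    path = [start]
    seen = {start}
    current = start
    while current in child_of:  # same test as outE('sd:childOf').hasNext()
        current = child_of[current]
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        path.append(current)
    return path


# Hypothetical administrative hierarchy: child -> parent.
child_of = {
    "municipality": "province",
    "province": "region",
    "region": "country",
}
path = path_to_root(child_of, "municipality")
```

The depth of the hierarchy is not known in advance, which is the "undefined-depth query" case that a fixed number of joins cannot express.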
Stefano Parmesan 
parmesan@spaziodati.eu

08448380779 Call Girls In Friends Colony Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

ISWC 2014 - Dandelion: from raw data to dataGEMs for developers

  • 1. Dandelion: from raw data to dataGEMs for developers Stefano Parmesan Tatiana Tarasova Ugo Scaiella Michele Barbera
  • 2. A bit of context • SpazioDati s.r.l. • Italian startup: Pisa & Trento • Members of the DBpedia Association • Manage the Italian DBpedia
  • 3. Goal • Close the gap between getting the data and using it • Build a Knowledge Graph as-a-service: • Make it queryable • Make it stable, make it scale • Support different access levels
  • 4. How? • Phase #1: PUT the data in • Data normalization • Entity deduplication • Phase #2: GET the data out • Slices
  • 5. How? Data Normalisation Entity Deduplication Data Storage Data Access Sample Raw Data Reconciliation Services Source 1 Source N Azkaban Silk Framework Titan Graph dandelion.eu Linked Data Slices dataGEM
  • 6. Why… • … slices? • SQL-like APIs • Common knowledge, linked data • … a graph at all? • Traversals • Data is centralized • Different sources, different access levels
  • 7. Why… • … Titan/Gremlin? • Scalable • Richer (multi-prop, undef-depth queries) • Open source • Elasticsearch powered
  • 8. And now what? • Still a prototype: • Private beta access to slices (demo) • English and Italian DBpedia • Corporate private data
  • 9. Future? • Phase #1b: PUT the data in • Scalable entity deduplication • Phase #2b: GET the data out • API for graph traversal • Text analysis tools (dataTXT) • Customizations
  • 10. RDF mappings <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> a code:ISTATAdministrativeDivision ; sd:childOf <http://data.spaziodati.eu/resource/7b7d45857f1372e1205bcfc87c19b2b2db2e0f59> ; sd:code "001001" ; sd:acheneID "ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb" ; code:cadastralCode "A074" ; sd:label "Agliè" ; code:elevation "315"^^xsd:int ; code:isCoastal "false"^^xsd:boolean ; code:isMountainous "false"^^xsd:boolean ; sd:level "60"^^xsd:int . _:node194hhq904x1 rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ; rdf:predicate code:population ; rdf:object "2574"^^xsd:int ; sd:acheneID "31e4104e62168ffc4c3d6d278ecc775effff6ebc" ; metaprop:validSince "2001-10-21"^^xsd:date . _:node194hhq904x2 rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ; rdf:predicate code:population ; rdf:object "2644"^^xsd:int ; sd:acheneID "f38e87252cc5614faeec4abbeedd6315f5d00e9f" ; metaprop:validSince "2011-10-09"^^xsd:date .
  • 11. Graph structure Provenance nodes Type nodes Bristle node Achene node
  • 12. Traversing • v.as('x').out('sd:childOf').loop('x'){ cur -> cur.outE('sd:childOf').hasNext() }.path()
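
The traversal on slide 12 follows `sd:childOf` edges upward for as long as the current node still has an outgoing `sd:childOf` edge, then returns the whole path. As a rough illustration of that idea only, here is a minimal Python sketch over a plain dictionary. It does not use Titan or Gremlin; the node names, the `CHILD_OF` map, and the `path_to_root` helper are all hypothetical, and the cycle check is an addition that is not part of the original query.

```python
# Toy stand-in for the graph: each node has at most one outgoing
# 'sd:childOf' edge. Node names are invented for illustration and
# do not come from the Dandelion graph.
CHILD_OF = {
    "municipality": "province",
    "province": "region",
    "region": "country",
}


def path_to_root(start, child_of):
    """Follow child_of edges from start until a node has no parent.

    Returns the list of visited nodes, starting node first.
    """
    path = [start]
    seen = {start}
    current = start
    # Keep stepping while the current node still has an outgoing edge,
    # mirroring the loop condition in the slide's traversal.
    while current in child_of:
        current = child_of[current]
        if current in seen:
            # Guard against malformed data; not part of the original query.
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        path.append(current)
    return path


print(path_to_root("municipality", CHILD_OF))
```

Starting from a node with no parent, the helper returns a one-element path containing only that node.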