From Big Linked Data to Linked Big Data - DBpedia as a framework for data integration
1. From Big Linked Data to Linked Big Data: DBpedia as a framework for data integration
Giuseppe Futia (1), Antonio Vetrò (1), Giuseppe Rizzo (2)
(1) Nexa Center for Internet and Society, DAUIN, Politecnico di Torino
(2) Istituto Superiore Mario Boella (ISMB)
7th DBpedia Community Meeting in Leipzig
15 September 2016
2. PhD candidate on semantics at
Nexa Center for Internet & Society,
DAUIN, Politecnico di Torino
3. Experiences with LOD and DBpedia
• TellMeFirst, a tool for classifying and enriching
textual documents built on DBpedia Spotlight
(http://tellmefirst.polito.it)
• Contratti Pubblici, a tool for processing, exploring,
and visualizing Italian Public Procurements
(http://public-contracts.nexacenter.org/)
7. Linked Data repository of Public Contracts, linked to DBpedia and SPC
Contratti Pubblici (Synapta + Nexa)
8. DBpedia in our projects
• TellMeFirst:
–Training set used for the semantic classification task
–Several interlinks used for the enrichment task
• Contratti Pubblici:
–Data enrichment to enable advanced SPARQL queries
–Data quality improvement (e.g., consistent labels)
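As an illustration of the "consistent labels" point, the sketch below picks one label per resource when several (possibly messy) labels in different languages are available, for example after pulling `rdfs:label` values from DBpedia. It is a minimal sketch under assumptions, not the Contratti Pubblici implementation: the `(subject, label, lang)` input shape, the `consistent_labels` helper and the Italian-then-English preference are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical language preference: Italian first, then English.
LANG_PREFERENCE = ("it", "en")


def normalize(label: str) -> str:
    """Collapse internal whitespace and trim the ends of a label."""
    return re.sub(r"\s+", " ", label).strip()


def consistent_labels(triples, preference=LANG_PREFERENCE):
    """Pick one label per subject from (subject, label, lang) tuples.

    Labels are ranked by the position of their language tag in `preference`
    (unknown tags rank last); ties are broken alphabetically so the choice
    is deterministic.
    """
    candidates = defaultdict(list)
    for subject, label, lang in triples:
        rank = preference.index(lang) if lang in preference else len(preference)
        candidates[subject].append((rank, normalize(label)))
    # min() on (rank, label) tuples prefers the best-ranked language first.
    return {subject: min(ranked)[1] for subject, ranked in candidates.items()}
```

Ranking by language and then alphabetically keeps the output stable across runs, which matters when the chosen label is reused as a display name in several places.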
9. From Big Linked Data to Linked Big Data
• Big Linked Data
–Already a reality, as shown by the exponential growth
of Linked Data over the last few years
• Linked Big Data
–RDF data model to handle Big Data Variety
–Metadata to enable powerful analytics
–Simpler Big Data access, integration, and interlinking
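To make the "RDF data model for Variety" idea concrete, the sketch below maps two differently shaped sources (a CSV table and a JSON feed that name the same notion differently) onto one triple representation, then groups local entities that point to the same DBpedia resource. It is a toy sketch under assumptions: the `http://example.org/` namespace, the field names and the helper functions are hypothetical, and it assumes the city strings already match DBpedia resource names, which in practice needs an entity-linking step.

```python
import csv
import io
import json
from collections import defaultdict

DBO = "http://dbpedia.org/ontology/"
DBR = "http://dbpedia.org/resource/"
EX = "http://example.org/"  # hypothetical namespace for local entities


def from_csv(text):
    """Source A: tabular data with columns `id` and `city`."""
    for row in csv.DictReader(io.StringIO(text)):
        yield (EX + row["id"], DBO + "location", DBR + row["city"])


def from_json(text):
    """Source B: JSON records that call the same notion `place`."""
    for record in json.loads(text):
        yield (EX + record["code"], DBO + "location", DBR + record["place"])


def linked_by(triples, predicate):
    """Group subjects that share the same object for `predicate`."""
    groups = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            groups[obj].add(subject)
    return dict(groups)
```

For example, a CSV row `supplier1,Turin` and a JSON record `{"code": "tender7", "place": "Turin"}` both end up attached to `http://dbpedia.org/resource/Turin`, so the shared DBpedia URI acts as the join key between otherwise unrelated sources.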
10. Big Data notion of Variety
• Variety of data and representation formats
• Variety of conceptualizations and data models
• Variety related to temporal and spatial dependencies
• Variety as a “generalization of the semantic
heterogeneity as studied in the field of Linked Data”
(Pascal Hitzler & Krzysztof Janowicz)
11. PhD research questions (i)
• RQ1: How can the technological foundations of Linked
Data and Big Data be further improved and
combined to create an open software architecture for a
multi-thematic, multi-perspective, and multimedia
knowledge graph built from heterogeneous sources?
12. PhD research questions (ii)
• RQ2: What features should a research method have to
meet and evaluate the security, scalability, performance,
openness, and interoperability of the software architecture
mentioned above? And how can we measure the quality
of the knowledge graph produced with this software
architecture?
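One possible starting point for the second half of RQ2 is a simple completeness indicator: the share of expected properties that are actually filled in for a set of entities. The sketch below is only an illustration of that single quality dimension, not a proposed evaluation method; the `property_completeness` function and its inputs are hypothetical.

```python
from collections import defaultdict


def property_completeness(triples, subjects, required):
    """Share of (subject, property) pairs in `required` that have a value.

    1.0 means every subject has every required property; 0.0 means none do.
    An empty expectation (no subjects or no properties) counts as complete.
    """
    present = defaultdict(set)
    for subject, predicate, _ in triples:
        present[subject].add(predicate)
    expected = len(subjects) * len(required)
    if expected == 0:
        return 1.0
    found = sum(1 for s in subjects for p in required if p in present[s])
    return found / expected
```

A real assessment would combine several such indicators (accuracy, consistency, provenance, and so on); completeness is shown here only because it is the easiest to compute from raw triples.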
13. Key ideas for my PhD
• Get concepts and ontologies from the DBpedia
knowledge base to support semantic alignment during
the integration stage
• Use frameworks for data integration of structured
information with Big Data technologies:
RDF Mapping Language (RML) + Hadoop or Spark
• Exploit Machine Learning techniques (e.g., Deep Learning)
to enrich datasets with unstructured data
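The RML idea of declaring how records become triples, instead of hand-coding each conversion, can be sketched in plain Python. This is only an RML-inspired toy under assumptions: real RML mappings are RDF documents executed by an RML processor, and the `MAPPING` dictionary, the `id`/`title`/`city` fields and the helper functions are hypothetical. Unlike real processors, it does not escape values for use in IRIs and does not distinguish literals from IRIs.

```python
import re

# Hypothetical, RML-inspired mapping expressed as plain Python data.
MAPPING = {
    "subject_template": "http://example.org/contract/{id}",
    "predicate_object": [
        ("http://www.w3.org/2000/01/rdf-schema#label", "{title}"),
        ("http://dbpedia.org/ontology/location",
         "http://dbpedia.org/resource/{city}"),
    ],
}

_FIELD = re.compile(r"\{(\w+)\}")


def expand(template, record):
    """Replace every {field} in the template with the record's value."""
    return _FIELD.sub(lambda m: str(record[m.group(1)]), template)


def apply_mapping(records, mapping=MAPPING):
    """Yield (subject, predicate, object) tuples for each input record."""
    for record in records:
        subject = expand(mapping["subject_template"], record)
        for predicate, object_template in mapping["predicate_object"]:
            yield (subject, predicate, expand(object_template, record))
```

Because the mapping only looks at one record at a time, the same logic could in principle be distributed over a cluster, for instance by wrapping it in a PySpark `RDD.flatMap` call; that pairing is an assumption about how RML and Spark might be combined, not a description of an existing tool.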
14. DBpedia as knowledge base for:
• Entity linking and annotations in documents
• Assertion of additional categories for data
• Improvement of multilingual information
• Estimation of data quality of integrated information
according to different features (e.g., provenance)
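For the entity-linking use case, tools like TellMeFirst rely on DBpedia Spotlight. The sketch below builds a request for the Spotlight `annotate` service and extracts the linked DBpedia URIs from its JSON answer. It assumes the public endpoint URL shown and a JSON response with a `Resources` list whose items carry `@surfaceForm`, `@offset` and `@URI` fields; check these against the Spotlight instance you actually use. The helper names are hypothetical, and `annotate` performs a network call.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed public endpoint; a self-hosted Spotlight instance can be used instead.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def annotate_url(text, confidence=0.5):
    """Build a GET URL for the Spotlight annotate service."""
    return SPOTLIGHT_URL + "?" + urlencode({"text": text, "confidence": confidence})


def entities(response_json):
    """Extract (surface form, offset, DBpedia URI) tuples from a JSON response."""
    data = json.loads(response_json)
    # The Resources key may be absent when no entity is found.
    return [
        (r["@surfaceForm"], int(r["@offset"]), r["@URI"])
        for r in data.get("Resources", [])
    ]


def annotate(text, confidence=0.5):
    """Call the service (network access required) and return linked entities."""
    request = Request(annotate_url(text, confidence),
                      headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return entities(response.read().decode("utf-8"))
```

Keeping the parsing in a separate `entities` function makes it testable offline against a saved response, independent of the service's availability.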
15. Challenges
• Greater accuracy (integrating different datasets)
• Immediacy (near-real time data, from new data sources)
• Flexibility (not constrained by database structure)
• Better analytics (the ability to change the rules)
• Data quality (reliability and effectiveness of data)