Motivation
Data on the Web
18/06/13 LILE 2013 – Rio de Janeiro
[Figure: opener illustrating the growth and diversity of web data]
Towards Integration of Web Data into a coherent Educational Data Graph
LILE 2013: 3rd International Workshop on Learning and Education with the Web of Data
14 May 2013, Rio de Janeiro, Brazil
Davide Taibi – Besnik Fetahu – Stefan Dietze
(CNR – ITD, IT) (L3S Research Center, DE)
Outline
• Linked Open Data serving data-intensive applications
• Heterogeneity of datasets and schemas
• How easy is it to use Linked Open Data, and what do the datasets actually contain?
– Datasets are interlinked only at a superficial level
– Different schemas are used for similar resource classes across datasets
– Resource descriptions are largely unstructured
– Topic definitions are very abstract, even in the best case
– It is difficult to query for the subset of resources and datasets relevant to a specific topic
• Our approach
– Schema level integration
– Enhanced dataset & resource descriptions
– Instance level integration
– Scalable annotation extraction
– Clustering and correlation of datasets
Introduction
• Large amounts of publicly available Linked Open Data of educational relevance
• Difficulties in achieving large-scale integration
• Dataset and resource description annotation
• Clustering and dataset interlinking
Educational Data
Steps towards a Linked Education Data Graph
Schema Level Integration
http://data.linkededucation.org/ns/linked-education.rdf
Schema Level Integration
http://data.linkededucation.org/ns/linked-education.rdf
LinkedUniversities Dataset
Schema Level Integration
• VoID-based schema:
– http://data.linkededucation.org/ns/linked-education.rdf
– Dataset cataloging and classification
– Mappings (types, properties)
• Datasets:
– LinkedUniversities Dataset
– mEducator
– Europeana
• Imported resources for clustering experiments:
– 6 million distinct resources
– 97 million RDF triples
– 21.6 GB of data
• SPARQL endpoint:
– http://okkam.l3s.uni-hannover.de:8880/openrdf-workbench/repositories/linked-learning-rdf
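The VoID-based cataloguing above can be illustrated with a small sketch that emits a VoID description in Turtle for one imported dataset. Only the `void:` and `dcterms:` terms come from the standard VoID and Dublin Core vocabularies; the dataset URI, the title, and the reuse of the slide's rounded counts (97 million triples, 6 million resources) are illustrative assumptions and do not reproduce the actual content of linked-education.rdf.

```python
from dataclasses import dataclass

# Namespace prefixes of the W3C VoID and Dublin Core Terms vocabularies.
PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
)


@dataclass
class DatasetStats:
    uri: str        # hypothetical dataset URI
    title: str      # plain title without quotes (no escaping is done)
    triples: int    # rendered as void:triples
    entities: int   # rendered as void:entities


def to_void_turtle(ds: DatasetStats) -> str:
    """Render a minimal VoID dataset description as Turtle text."""
    lines = [
        f"<{ds.uri}> a void:Dataset ;",
        f'    dcterms:title "{ds.title}" ;',
        f"    void:triples {ds.triples} ;",
        f"    void:entities {ds.entities} .",
    ]
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Rounded figures from the slide; the URI is a placeholder.
    stats = DatasetStats(
        uri="http://example.org/dataset/linked-education",
        title="Imported educational resources",
        triples=97_000_000,
        entities=6_000_000,
    )
    print(to_void_turtle(stats))
```

A real catalogue entry would also carry the type and property mappings mentioned on the slide; those are omitted here because their vocabulary is not shown.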
• DBLP-L3S
• BBC programmes
• ACM publications
Instance-level integration
<http://dbpedia.org/page/Gravitation>
<http://dbpedia.org/page/Strong>
<http://dbpedia.org/page/Dense>
• DBpedia Spotlight used as the NER & NED (named entity recognition and disambiguation) tool
• Annotation of unstructured content
• Selective & scalable annotation
• Annotation of tokens of different sizes
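To make the annotation step concrete, here is a hedged sketch of how a DBpedia Spotlight call could be prepared and how its JSON answer could be turned into (surface form, DBpedia URI, offset) triples. It assumes the public `/annotate` REST endpoint with `text` and `confidence` parameters, and a JSON answer whose `Resources` list holds entries with `@URI`, `@surfaceForm` and `@offset`. The endpoint URL, the 0.5 default confidence, and the helper names are assumptions; no request is sent here, and the sample response is hand-written.

```python
from urllib.parse import urlencode

# Assumed public endpoint; the deployment used in the original work may differ.
SPOTLIGHT_ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"


def build_annotate_url(text: str, confidence: float = 0.5) -> str:
    """Build a GET URL for the Spotlight annotate service (hypothetical helper)."""
    query = urlencode({"text": text, "confidence": confidence})
    return f"{SPOTLIGHT_ENDPOINT}?{query}"


def extract_entities(response: dict) -> list[tuple[str, str, int]]:
    """Turn a Spotlight JSON response into (surface form, URI, offset) triples.

    A response without recognised entities has no "Resources" key,
    so we fall back to an empty list.
    """
    entities = []
    for res in response.get("Resources", []):
        entities.append((res["@surfaceForm"], res["@URI"], int(res["@offset"])))
    return entities


if __name__ == "__main__":
    print(build_annotate_url("Gravitation keeps the Earth in orbit"))
    # Hand-written response in the assumed JSON shape, for illustration only.
    sample = {
        "@text": "Gravitation keeps the Earth in orbit",
        "Resources": [
            {"@URI": "http://dbpedia.org/resource/Gravitation",
             "@surfaceForm": "Gravitation", "@offset": "0"},
            {"@URI": "http://dbpedia.org/resource/Earth",
             "@surfaceForm": "Earth", "@offset": "22"},
        ],
    }
    print(extract_entities(sample))
```

Keeping URL construction and response parsing separate from the HTTP call makes it easy to put a cache or a rate limiter in between.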
Instance-level integration
Characteristics of enrichments
• Disambiguation
• Acronym detection (e.g. “dns”, “gmt”)
• Synonym detection (e.g. “globe”, “earth”)
• Context detection (e.g. “apple” as a fruit vs. “apple” as a computer brand)
<http://dbpedia.org/page/Gravitation>
Correlation and Clustering
[Figure: network of resources linked through shared annotations such as Gravitation, Equations, Earth]
• Annotations are used to construct a network of resources, with edges between resources that share annotations.
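A minimal sketch of this network construction, assuming each resource has already been reduced to a set of DBpedia annotation URIs: every pair of resources that shares at least one annotation gets an edge, labelled with the annotations they have in common. The resource identifiers and annotations in the example are made up.

```python
from collections import defaultdict
from itertools import combinations


def build_resource_network(
    annotations: dict[str, set[str]],
) -> dict[tuple[str, str], set[str]]:
    """Connect resources that share annotations.

    Returns a mapping from an (r1, r2) pair, with r1 < r2, to the set of
    annotations the two resources have in common.
    """
    # Inverted index: annotation -> resources annotated with it.
    by_annotation: dict[str, set[str]] = defaultdict(set)
    for resource, anns in annotations.items():
        for ann in anns:
            by_annotation[ann].add(resource)

    edges: dict[tuple[str, str], set[str]] = defaultdict(set)
    for ann, resources in by_annotation.items():
        # Every pair of resources carrying this annotation gets (or extends) an edge.
        for r1, r2 in combinations(sorted(resources), 2):
            edges[(r1, r2)].add(ann)
    return dict(edges)


if __name__ == "__main__":
    dbr = "http://dbpedia.org/resource/"
    sample = {
        "res:1": {dbr + "Gravitation", dbr + "Earth"},
        "res:2": {dbr + "Gravitation", dbr + "Equation"},
        "res:3": {dbr + "Apple_Inc."},
    }
    for pair, shared in build_resource_network(sample).items():
        print(pair, sorted(shared))
```

The inverted index avoids comparing every pair of resources directly, although an annotation shared by many resources still produces a quadratic number of edges.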
Correlation and Clustering
• Methods used for clustering
• Based on the shared enrichments
• Naïve
• Based on the ef-irf (Enrichment Frequency-Inverse Resource Frequency) index
• Jaccard
• Cosine
Different thresholds were used to generate clusters.
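The ef-irf weighting can be sketched by analogy with tf-idf. In this sketch, ef(e, r) is assumed to be the count of enrichment e in resource r divided by the total number of enrichments of r, and irf(e) = log(N / n_e), where N is the number of resources and n_e the number annotated with e. These exact formulas, the use of connected components as the clustering step, and the 0.3 threshold are assumptions for illustration; the slides only name the index, the Jaccard and cosine measures, and the use of thresholds.

```python
import math
from collections import Counter
from collections.abc import Callable
from itertools import combinations


def ef_irf(resources: dict[str, Counter]) -> dict[str, dict[str, float]]:
    """Weight each enrichment of each resource (assumed tf-idf-style formula)."""
    n = len(resources)
    df: Counter = Counter()  # number of resources carrying each enrichment
    for counts in resources.values():
        df.update(counts.keys())
    weights = {}
    for r, counts in resources.items():
        total = sum(counts.values())
        weights[r] = {e: (c / total) * math.log(n / df[e]) for e, c in counts.items()}
    return weights


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two enrichment sets (unweighted)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse ef-irf vectors."""
    dot = sum(w * v.get(e, 0.0) for e, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def threshold_clusters(items: list[str], sim: Callable[[str, str], float],
                       threshold: float) -> list[set[str]]:
    """Link every pair with sim >= threshold; clusters are connected components."""
    parent = {i: i for i in items}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if sim(a, b) >= threshold:
            parent[find(a)] = find(b)
    groups: dict[str, set[str]] = {}
    for i in items:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


if __name__ == "__main__":
    resources = {
        "r1": Counter({"Gravitation": 2, "Earth": 1}),
        "r2": Counter({"Gravitation": 1, "Equation": 1}),
        "r3": Counter({"Apple_Inc.": 1, "Computer": 1}),
    }
    w = ef_irf(resources)
    ids = list(resources)
    print(threshold_clusters(ids, lambda a, b: cosine(w[a], w[b]), 0.3))
    print(threshold_clusters(ids, lambda a, b: jaccard(set(resources[a]), set(resources[b])), 0.3))
```

With these toy inputs, unweighted Jaccard groups r1 and r2 through the shared Gravitation enrichment (similarity 1/3), while the ef-irf cosine keeps them apart (about 0.21), because an enrichment carried by most resources receives a low irf weight.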
Evaluation
Three evaluation stages:
• Quantitative & qualitative
– Assess annotation accuracy for the exhaustive and scalable approaches
– Measure standard precision/recall metrics
– 250 resources per dataset used for the assessment
• Performance
– Gains in terms of scalability
Quantitative Evaluation
Dataset               #Resources   #Annotations   #Entity types
ACM                          249            200             239
mEducator                    250            495             355
BBC                          250           1364             769
LinkedUniversities           243            166             283
DBLP                         250            295             161
Europeana                    249            938             672
Total                       1491           3458             937
• The number of extracted entities depends on the length of a resource's textual description
• Long texts yield up to 87 distinct entities and more than 200 entity-type associations
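As a quick cross-check of the quantitative table, the per-dataset counts sum to the reported totals for resources and annotations, and dividing annotations by resources gives a rough annotation density per dataset. The #Entity types total (937) is not a column sum, presumably because entity types are shared across datasets, so it is left out of this check.

```python
# Figures copied from the quantitative evaluation table: (#resources, #annotations).
TABLE = {
    "ACM": (249, 200),
    "mEducator": (250, 495),
    "BBC": (250, 1364),
    "LinkedUniversities": (243, 166),
    "DBLP": (250, 295),
    "Europeana": (249, 938),
}


def totals(table: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Column sums for #resources and #annotations."""
    return (sum(r for r, _ in table.values()), sum(a for _, a in table.values()))


def density(table: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average number of annotations per assessed resource, per dataset."""
    return {name: a / r for name, (r, a) in table.items()}


if __name__ == "__main__":
    print(totals(TABLE))
    for name, d in sorted(density(TABLE).items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {d:.2f}")
```

Sorting by density puts BBC and Europeana first, which fits the remark that longer textual descriptions yield more entities, though the table alone does not show description lengths.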
Qualitative Evaluation
• Human evaluators assessed annotation accuracy
• 2000 annotations were assessed for both approaches (exhaustive and scalable)
• The first (exhaustive) approach was assessed by 32 evaluators with an average of 63 tasks per evaluator; the second (scalable) approach by 23 evaluators with an average of 87 completed tasks
              Precision   Recall
Exhaustive    0.82        0.429
Scalable      0.77        0.687
∆[S-E]        -0.05       +0.26
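For reference, the standard metrics are P = TP / (TP + FP) and R = TP / (TP + FN). The snippet restates them and recomputes the difference row from the reported scores, showing that the signed values in the last row equal Scalable minus Exhaustive. The TP/FP/FN counts behind the reported scores are not given in the slides, so `precision_recall` is only illustrative.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Standard precision and recall from raw counts (zero-safe)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


# Reported (precision, recall) from the qualitative evaluation table.
REPORTED = {"exhaustive": (0.82, 0.429), "scalable": (0.77, 0.687)}


def delta(scores: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Scalable minus exhaustive, rounded to two decimals."""
    (pe, r_e), (ps, r_s) = scores["exhaustive"], scores["scalable"]
    return round(ps - pe, 2), round(r_s - r_e, 2)


if __name__ == "__main__":
    print(delta(REPORTED))
```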
Performance Evaluation
Token size k   No filtering   Filtered: resource level   Filtered: dataset level
1              53089          24850                      7464
2              51346          17919                      13281
3              49603          11800                      9607
4              47871          7793                       6432
5              46153          5184                       4289
6              44480          3529                       2922
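The relative reduction achieved by each filtering level can be read off the performance table as 1 − filtered / unfiltered. The sketch computes this percentage for each size k from the reported counts. It is plain arithmetic on the published figures and does not reproduce the filtering itself.

```python
# Counts from the performance table: k -> (no filtering, resource level, dataset level).
COUNTS = {
    1: (53089, 24850, 7464),
    2: (51346, 17919, 13281),
    3: (49603, 11800, 9607),
    4: (47871, 7793, 6432),
    5: (46153, 5184, 4289),
    6: (44480, 3529, 2922),
}


def reduction(unfiltered: int, filtered: int) -> float:
    """Percentage of the unfiltered count removed by filtering."""
    return 100.0 * (1 - filtered / unfiltered)


if __name__ == "__main__":
    print(" k  resource-level  dataset-level")
    for k, (raw, res, ds) in COUNTS.items():
        print(f"{k:2d}  {reduction(raw, res):13.1f}%  {reduction(raw, ds):12.1f}%")
```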
• Reduction of the textual content to be analyzed in the annotation phase:
– Keeping only terms tagged {NN, NNP, NNPS} reduces the amount of text by almost 40%
– Across token sizes, the reduction goes up to 86%
• Reduced complexity of the NER task performed by DBpedia Spotlight:
– Fewer HTTP requests
– Similar chunks of text are not annotated repeatedly
• Significant gains in execution time: 3.5 hrs vs. 20 mins
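The filtering steps above can be sketched as follows, assuming the text is already POS-tagged with Penn Treebank tags (the tagger is out of scope). Only NN, NNP and NNPS tokens are kept. Word chunks of size k are built inside runs of consecutive nouns, and duplicates are dropped at resource level (within one description) or dataset level (across all descriptions), so each distinct chunk goes to the annotator once. Building chunks only within noun runs is an assumption. The `annotate` callable is a hypothetical stand-in for a DBpedia Spotlight request, and counting its calls shows how deduplication reduces HTTP requests.

```python
from collections.abc import Callable, Iterable

# Penn Treebank noun tags kept by the filter (from the slide).
NOUN_TAGS = {"NN", "NNP", "NNPS"}


def noun_runs(tagged: Iterable[tuple[str, str]]) -> list[list[str]]:
    """Split a POS-tagged text into runs of consecutive noun tokens."""
    runs: list[list[str]] = []
    current: list[str] = []
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            current.append(word)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def chunks(tagged: Iterable[tuple[str, str]], k: int) -> list[str]:
    """All k-word chunks lying entirely inside one noun run."""
    out: list[str] = []
    for run in noun_runs(tagged):
        out.extend(" ".join(run[i:i + k]) for i in range(len(run) - k + 1))
    return out


def annotate_dataset(
    descriptions: dict[str, list[tuple[str, str]]],
    k: int,
    annotate: Callable[[str], list[str]],
    dataset_level: bool = True,
) -> dict[str, dict[str, list[str]]]:
    """Annotate each distinct chunk once and reuse the result for repeats."""
    cache: dict[str, list[str]] = {}
    result: dict[str, dict[str, list[str]]] = {}
    for rid, tagged in descriptions.items():
        if not dataset_level:
            cache = {}  # resource level: chunks are only shared within one resource
        per_resource: dict[str, list[str]] = {}
        for chunk in dict.fromkeys(chunks(tagged, k)):  # ordered de-duplication
            if chunk not in cache:
                cache[chunk] = annotate(chunk)  # one (hypothetical) HTTP request
            per_resource[chunk] = cache[chunk]
        result[rid] = per_resource
    return result


if __name__ == "__main__":
    calls: list[str] = []

    def fake_annotate(text: str) -> list[str]:
        # Stand-in for a DBpedia Spotlight call; records each request.
        calls.append(text)
        return ["http://dbpedia.org/resource/" + text.replace(" ", "_")]

    docs = {
        "r1": [("The", "DT"), ("Earth", "NNP"), ("orbits", "VBZ"), ("the", "DT"), ("Sun", "NNP")],
        "r2": [("Earth", "NNP"), ("gravity", "NN"), ("is", "VBZ"), ("strong", "JJ")],
    }
    annotate_dataset(docs, 1, fake_annotate, dataset_level=True)
    print(len(calls), calls)
```

In this toy example, dataset-level deduplication sends three requests instead of four, because "Earth" appears in both descriptions.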
Conclusion
• Large-scale educational data graph
• Well-interlinked datasets at schema and instance level
• Enhanced dataset and resource descriptions
• Scalable annotation procedure
• EF-IRF clustering approach
• Clusters and correlated datasets
Thank you!
Questions?
Story boards and shot lists for my a level piece
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Towards Integration of Web Data into a coherent Educational Data Graph

  • 1. Motivation
    Data on the Web
    18/06/13 Lile 2013 – Rio de Janeiro
    [Placeholder: an eye-catching opener illustrating the growth and/or diversity of web data]
    Towards Integration of Web Data into a coherent Educational Data Graph
    LILE 2013: 3rd International Workshop on Learning and Education with the Web of Data
    14 May 2013, Rio de Janeiro, Brazil
    Davide Taibi (CNR – ITD, IT), Besnik Fetahu and Stefan Dietze (L3S Research Center, DE)
  • 2. Outline
    • Linked Open Data serving data-intensive applications
    • Heterogeneity of datasets and schemas
    • Is Linked Open Data really that easy to use, and what is it all about?
      – Datasets are interlinked only at a superficial level
      – Different schemas for similar resource classes across datasets
      – Unstructured resource descriptions
      – Best-case scenario: very abstract topic definitions
      – Difficult to query for the subset of resources and datasets on a specific topic
    • Our approach
      – Schema-level integration
      – Enhanced dataset and resource descriptions
      – Instance-level integration
      – Scalable annotation extraction
      – Clustering and correlation of datasets
  • 3. Introduction
    • Large amounts of publicly available Linked Open Data of educational relevance
    • Difficulties in providing large-scale integration
    • Dataset and resource description annotation
    • Clustering and dataset interlinking
    [Figure: Educational Data]
  • 4. Steps towards a Linked Education Data Graph
  • 5. Schema Level Integration
    http://data.linkededucation.org/ns/linked-education.rdf
  • 6. Schema Level Integration
    http://data.linkededucation.org/ns/linked-education.rdf
    [Figure: LinkedUniversities Dataset]
  • 7. Schema Level Integration
    • VoID-based schema:
      – http://data.linkededucation.org/ns/linked-education.rdf
      – Dataset cataloging and classification
      – Mappings (types, properties)
    • Datasets:
      – LinkedUniversities Dataset
      – mEducator
      – Europeana
      – DBLP-L3S
      – BBC programmes
      – ACM publications
    • Imported resources for clustering experiments:
      – 6 million distinct resources
      – 97 million RDF triples
      – 21.6 GB of data
    • SPARQL endpoint:
      – http://okkam.l3s.uni-hannover.de:8880/openrdf-workbench/repositories/linked-learning-rdf
  • 8. Instance-level integration
    • DBpedia Spotlight as NER and NED (named entity recognition and disambiguation) tool
    • Annotation of unstructured content
    • Selective and scalable annotation
    • Annotation of tokens of different sizes
    [Figure: example annotations <http://dbpedia.org/page/Gravitation>, <http://dbpedia.org/page/Strong>, <http://dbpedia.org/page/Dense>]
  • 9. Instance-level integration
    Characteristics of enrichments:
    • Disambiguation
    • Acronym detection (e.g. “dns”, “gmt”)
    • Synonym detection (e.g. “globe”, “earth”)
    • Context detection (e.g. “apple” the fruit vs. “apple” the computer company)
    [Figure: example annotation <http://dbpedia.org/page/Gravitation>]
  • 10. Correlation and Clustering
    • Annotations are used to construct a network of resources, in which an edge links two resources that share annotations
    [Figure: resource network with shared annotations such as Gravitation, Equations, Earth]
  • 11. Correlation and Clustering
    • Methods used for clustering:
      – Based on shared enrichments: naïve
      – Based on the ef-irf (Enrichment Frequency-Inverse Resource Frequency) index: Jaccard, cosine
    • Different thresholds were used to generate clusters
  • 12. Evaluation
    Three evaluation stages:
    • Quantitative and qualitative
      – Assess annotation accuracy for the exhaustive and scalable approaches
      – Measure standard precision/recall metrics
      – 250 resources per dataset used for assessment
    • Performance
      – Gains in terms of scalability
  • 13. Quantitative Evaluation
    Dataset             #Resources  #Annotations  #Entity types
    ACM                        249           200            239
    mEducator                  250           495            355
    BBC                        250          1364            769
    LinkedUniversities         243           166            283
    DBLP                       250           295            161
    Europeana                  249           938            672
    Total                     1491          3458            937
    • The number of extracted entities depends on the length of a resource's textual description
    • For long texts, up to 87 distinct entities and more than 200 entity-type associations
  • 14. Qualitative Evaluation
    • Human evaluators measured annotation accuracy
    • 2000 annotations were assessed for each approach (exhaustive and scalable)
    • The exhaustive approach was assessed by 32 evaluators (63 tasks per user on average); the scalable approach by 23 evaluators (87 completed tasks per user on average)
                               Precision  Recall
    Exhaustive                      0.82   0.429
    Scalable                        0.77   0.687
    ∆ (Scalable - Exhaustive)      -0.05   +0.26
  • 15. Performance Evaluation
    Size-k  No filtering  Filtered: resource level  Filtered: dataset level
         1         53089                     24850                     7464
         2         51346                     17919                    13281
         3         49603                     11800                     9607
         4         47871                      7793                     6432
         5         46153                      5184                     4289
         6         44480                      3529                     2922
    • Reducing the textual content to be analyzed in the annotation phase:
      – Keeping only terms tagged {NN, NNP, NNPS} (nouns and proper nouns) reduces the amount of text by almost 40%
      – Across token sizes, the reduction reaches up to 86%
    • Reducing the complexity of the NER task performed by DBpedia Spotlight:
      – Fewer HTTP requests
      – Similar chunks of text are not annotated repeatedly
    • Significant gains in execution time: 3.5 hrs vs. 20 min
  • 16. Conclusion
    • Large-scale educational data graph
    • Well-interlinked datasets at schema and instance level
    • Enhanced dataset and resource descriptions
    • Scalable annotation procedure
    • EF-IRF clustering approach
    • Clusters and correlated datasets
  • 17. Thank you! Questions?
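Slide 7 lists type and property mappings as part of the VoID-based schema. A minimal sketch of how such mappings could be applied when importing data, assuming triples are plain (subject, predicate, object) string tuples. The mapping tables `PROPERTY_MAP` and `TYPE_MAP` and the `example.org` URIs are hypothetical; they are not taken from linked-education.rdf.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical property mapping: dataset-specific property -> shared property.
PROPERTY_MAP: Dict[str, str] = {
    "http://purl.org/dc/elements/1.1/title": "http://purl.org/dc/terms/title",
}

# Hypothetical type mapping: dataset-specific class -> shared class.
TYPE_MAP: Dict[str, str] = {
    "http://example.org/source#Course": "http://example.org/shared#LearningResource",
}


def align(triples: Iterable[Triple]) -> List[Triple]:
    """Rewrite mapped predicates and mapped rdf:type objects; pass everything else through."""
    aligned: List[Triple] = []
    for s, p, o in triples:
        new_p = PROPERTY_MAP.get(p, p)
        # Only objects of rdf:type statements are classes, so only those are remapped.
        new_o = TYPE_MAP.get(o, o) if p == RDF_TYPE else o
        aligned.append((s, new_p, new_o))
    return aligned
```

Rewriting at import time is only one option; the same alignments could instead be kept as separate mapping statements and applied at query time.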
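Slide 8 uses DBpedia Spotlight to annotate unstructured text. A minimal sketch of building an annotation request with the standard library, assuming the current public endpoint `https://api.dbpedia-spotlight.org/en/annotate` (not necessarily the instance used in 2013) and its `text` and `confidence` query parameters; the 0.5 default confidence is an arbitrary illustrative value.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Public DBpedia Spotlight endpoint (assumption: may differ from the one the authors used).
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def spotlight_request(text: str, confidence: float = 0.5) -> Request:
    """Build (but do not send) a GET request asking Spotlight to annotate `text`."""
    query = urlencode({"text": text, "confidence": confidence})
    # Ask for JSON rather than the default HTML/XML rendering.
    return Request(f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"})
```

Sending it with `urllib.request.urlopen(spotlight_request(text))` would return the JSON annotations; the request is only built here so the sketch runs offline.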
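The enrichment characteristics on slide 9 (disambiguation, synonyms, context) surface in Spotlight's output as surface forms mapped to DBpedia URIs. A sketch of grouping them, assuming the JSON layout of the public Spotlight service (a `Resources` list whose entries carry `@URI` and `@surfaceForm`); the sample response used below is hand-written, not real service output.

```python
import json
from collections import defaultdict
from typing import Dict, Set


def uris_by_surface_form(response_json: str) -> Dict[str, Set[str]]:
    """Group disambiguated DBpedia URIs by the (lower-cased) surface form they were found under."""
    data = json.loads(response_json)
    grouped: Dict[str, Set[str]] = defaultdict(set)
    # A response without any entity may omit the "Resources" key entirely.
    for res in data.get("Resources", []):
        grouped[res["@surfaceForm"].lower()].add(res["@URI"])
    return dict(grouped)


def forms_by_uri(response_json: str) -> Dict[str, Set[str]]:
    """Invert the grouping: several surface forms (synonyms) can share one URI."""
    inverted: Dict[str, Set[str]] = defaultdict(set)
    for form, uris in uris_by_surface_form(response_json).items():
        for uri in uris:
            inverted[uri].add(form)
    return dict(inverted)
```

With a response in which “globe” and “Earth” both resolve to `http://dbpedia.org/resource/Earth`, `forms_by_uri` groups both forms under that single URI, which is the synonym case from the slide.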
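Slide 10 builds a network of resources whose edges come from shared annotations. A minimal sketch, assuming each resource is represented by the set of DBpedia entities it was annotated with; using the number of shared annotations as the edge weight is an assumption, not something the slide specifies.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple


def annotation_network(annotations: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Return {(resource_a, resource_b): number of shared annotations} for linked pairs."""
    # Inverted index: annotation -> resources carrying it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for resource, anns in annotations.items():
        for ann in anns:
            index[ann].add(resource)
    # Every pair of resources under the same annotation gains one shared annotation.
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for resources in index.values():
        for a, b in combinations(sorted(resources), 2):
            edges[(a, b)] += 1
    return dict(edges)
```

Going through the inverted index avoids comparing every pair of resources, although very common annotations still generate many pairs.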
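Slide 11 names an ef-irf (Enrichment Frequency-Inverse Resource Frequency) index together with Jaccard and cosine similarity, with clusters generated at different thresholds. The slide gives no formula, so the sketch below assumes a direct tf-idf analogue: ef(e, r) is the share of resource r's annotations equal to e, and irf(e) = log(N / n_e) for N resources of which n_e carry e. Treating clusters as connected components of the graph whose edges meet the threshold is also an assumption.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable, List, Set

Vector = Dict[str, float]


def ef_irf(annotations: Dict[str, List[str]]) -> Dict[str, Vector]:
    """Weight enrichment e in resource r as ef(e, r) * irf(e) (assumed tf-idf analogue)."""
    n = len(annotations)
    # n_e: number of resources annotated with each enrichment.
    n_e = Counter(e for anns in annotations.values() for e in set(anns))
    vectors: Dict[str, Vector] = {}
    for r, anns in annotations.items():
        counts = Counter(anns)
        total = sum(counts.values())
        vectors[r] = {e: (c / total) * math.log(n / n_e[e]) for e, c in counts.items()}
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(e, 0.0) for e, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_clusters(items: Iterable[str], sim: Callable[[str, str], float],
                       threshold: float) -> List[Set[str]]:
    """Connected components of the graph linking pairs with sim >= threshold (union-find)."""
    keys = sorted(items)
    parent = {k: k for k in keys}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if sim(a, b) >= threshold:
                parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for k in keys:
        groups.setdefault(find(k), set()).add(k)
    return sorted(groups.values(), key=sorted)
```

For example, `vec = ef_irf(anns)` followed by `threshold_clusters(anns, lambda a, b: cosine(vec[a], vec[b]), 0.5)` clusters on ef-irf cosine; passing `jaccard` over the plain annotation sets, or a 0/1 "shares any enrichment" function with threshold 1.0 for the naïve variant, reuses the same clustering step. The all-pairs loop is quadratic, so at millions of resources candidate pairs would have to come from an inverted index instead.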
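The table on slide 13 can be checked with a few lines of arithmetic. The #Resources and #Annotations columns sum exactly to the Total row, whereas the per-dataset #Entity types values sum to 2479 rather than 937, which suggests (an inference, not stated on the slide) that the Total counts distinct types across datasets. Annotations per resource also illustrate the point about description length.

```python
# (#Resources, #Annotations, #Entity types) per dataset, copied from the slide 13 table.
ROWS = {
    "ACM": (249, 200, 239),
    "mEducator": (250, 495, 355),
    "BBC": (250, 1364, 769),
    "LinkedUniversities": (243, 166, 283),
    "DBLP": (250, 295, 161),
    "Europeana": (249, 938, 672),
}

total_resources = sum(r for r, _, _ in ROWS.values())
total_annotations = sum(a for _, a, _ in ROWS.values())
summed_types = sum(t for _, _, t in ROWS.values())

# Average number of annotations per assessed resource in each dataset.
per_resource = {name: a / r for name, (r, a, _) in ROWS.items()}

for name, avg in sorted(per_resource.items(), key=lambda kv: -kv[1]):
    print(f"{name:<20}{avg:5.2f}")
```

BBC (about 5.5 annotations per resource) and Europeana (about 3.8) lead, while ACM and LinkedUniversities stay below one.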
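Slide 14 reports precision and recall for both approaches. The slide does not say how the evaluators' judgments were turned into true positive, false positive and false negative counts, so `precision_recall` below is only the generic definition; the remaining lines recompute the differences (scalable minus exhaustive) and the evaluator workload from the reported figures.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Standard precision = tp / (tp + fp) and recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Figures reported on slide 14.
exhaustive = {"precision": 0.82, "recall": 0.429}
scalable = {"precision": 0.77, "recall": 0.687}

# Difference scalable - exhaustive, rounded to the slide's two decimals.
delta = {m: round(scalable[m] - exhaustive[m], 2) for m in exhaustive}

# Evaluators x average tasks per evaluator: both come to roughly 2000 judgments.
tasks_exhaustive = 32 * 63
tasks_scalable = 23 * 87
```

The workload products (2016 and 2001) are consistent with about 2000 annotations having been assessed for each approach.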
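Slide 15 gives the kept POS tags and the resulting chunk counts but not the exact chunking rule. The sketch below assumes (hypothetically) that a size-k chunk is k consecutive noun-tagged tokens, that "dataset level" filtering means each distinct chunk is sent to the annotator only once across the dataset, and that POS tagging happens upstream, so the input is (word, tag) pairs.

```python
from typing import Iterable, List, Sequence, Set, Tuple

# Tags kept on slide 15 (Penn Treebank: NN noun, singular or mass;
# NNP proper noun, singular; NNPS proper noun, plural).
NOUN_TAGS = {"NN", "NNP", "NNPS"}

Tagged = Sequence[Tuple[str, str]]  # (word, POS tag) pairs from an upstream tagger


def noun_runs(tagged: Tagged) -> List[List[str]]:
    """Split a tagged text into maximal runs of consecutive noun-tagged tokens."""
    runs: List[List[str]] = []
    current: List[str] = []
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            current.append(word)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def noun_chunks(tagged: Tagged, k: int) -> List[str]:
    """All chunks of k consecutive tokens lying entirely inside a noun run."""
    return [" ".join(run[i:i + k]) for run in noun_runs(tagged) for i in range(len(run) - k + 1)]


def chunks_to_annotate(resources: Iterable[Tagged], k: int) -> Set[str]:
    """Distinct chunks across a dataset: each is annotated (one HTTP request) only once."""
    return {chunk for tagged in resources for chunk in noun_chunks(tagged, k)}


def reduction(before: int, after: int) -> float:
    """Fraction of annotation work removed by filtering."""
    return 1 - after / before
```

With the slide's size-1 figures, `reduction(53089, 7464)` is about 0.86, matching the 86% quoted on the slide.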

Editor's Notes

  1. http://okkam.l3s.uni-hannover.de:8880/openrdf-workbench/repositories/linked-learning-rdf/summary
     Previous work: LinkedEducation 0.5+
     - VoID-based schema: URL, etc. (dataset description and classification, alignments of types and properties)
     - Datasets: list (a subset of the current Linked Education datasets)
     - But also imported resources for clustering experiments
     - Size: 6 million resources (97 million triples), etc.
     - SPARQL endpoint, initial clustering results