Entity Enrichment and Clustering
in ARCOMEM
Elena Demidova1
including slides by: Stefan Dietze1, Diana Maynard2, Thomas Risse1, Wim Peters2,
Katerina Doka3, Yannis Stavrakas3

1 L3S Research Center, Hannover, Germany
2 University of Sheffield, UK
3 IMIS, RC ATHENA, Athens, Greece

The ARCOMEM approach
• Make use of the Social Web
– Huge source of user-generated content
– Wide range of articulation methods, from simple "I like it" buttons to complete articles
– Represents the diversity of opinions of the public

• User activities are often triggered by
– Events and related entities (e.g. sport events, celebrations, crises, news articles, persons, locations)
– Topics (e.g. global warming, financial crisis, swine flu)

A semantic-aware and socially-driven
preservation model is a natural way to go

Slide 2
ARCOMEM architecture
The ARCOMEM system architecture foresees four processing
levels: the crawler level, the online processing level, the offline
processing level, and cross-crawl analysis.

Slide 3
ETOE offline processing chain
The processing chain depicted here describes all components involved in
the offline processing of Web objects.

Slide 4
The extraction components for text
Aim
• Extraction of Entities, Topics, Events and Opinions (ETOEs) from
– Web pages
– Social Web (Twitter, YouTube, Facebook, …)
Challenges
• Entity recognition from degraded input sources (tweets etc.)
– Advancing state-of-the-art NLP and text mining
– Dynamics detection: evolution of terms/entities
• Semantic representation of Web objects and entities
– Appropriate RDF schemas for ETOE and Web objects
– Exploiting (Linked Open) Web data to enrich extracted ETOE
– Entity classification (into events, locations, topics etc.) & consolidation

Slide 5
ETOE extraction with GATE: an example

candidate multi-word term

Slide 6
Data consolidation & integration problem
Data extracted by different components or during
different processing cycles is not aligned
=> consolidation, disambiguation & correlation required.

<Location>Greece</Location>
<Person>Venizelos</Person>

<Location>Griechenland</Location>
<Organisation>Greek Parliament</Organisation>

?
Slide 7
Data enrichment & clustering
Enrichment of entities with related references to Linked
Data, particularly reference datasets (DBpedia, Freebase, …)
=> use enrichments for clustering/correlation/consolidation
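
The idea of using enrichments for consolidation can be sketched as follows: entities that resolve to the same reference URI are grouped together. This is a minimal illustration, not the ARCOMEM implementation; the entity tuples and the enrichment URIs assigned to them are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical extracted entities: (label, type, enrichment URI or None).
# The URIs are illustrative assumptions, not actual ARCOMEM output.
entities = [
    ("Greece", "Location", "http://dbpedia.org/resource/Greece"),
    ("Griechenland", "Location", "http://dbpedia.org/resource/Greece"),
    ("Venizelos", "Person", "http://dbpedia.org/resource/Evangelos_Venizelos"),
    ("Greek Parliament", "Organisation", "http://dbpedia.org/resource/Hellenic_Parliament"),
]


def cluster_by_enrichment(items):
    """Group entity labels that share the same enrichment URI."""
    clusters = defaultdict(list)
    for label, _etype, uri in items:
        if uri is not None:  # entities without an enrichment stay unclustered
            clusters[uri].append(label)
    return dict(clusters)


clusters = cluster_by_enrichment(entities)
```

In this toy input, "Greece" and "Griechenland" end up in one group because both point to the same reference resource, which is the consolidation effect described on the previous slide.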

Slide 8
Enrichment with DBpedia & Freebase
• DBpedia and Freebase are particularly well suited because of
their vast size, the availability of disambiguation techniques
that can exploit the variety of multilingual labels available
in both datasets for individual data items, and the level of
interconnectedness of both datasets, which allows a wealth of
related information to be retrieved for particular items.
• For DBpedia, we use the DBpedia Spotlight service, which
supports approximate string matching with an adjustable
confidence level in the interval [0,1]. Based on experiments,
we set the confidence to 0.6.
• For Freebase, we use structured queries that take into
account the entity types extracted by GATE.
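
A call to DBpedia Spotlight with the confidence value named above might look like the sketch below. It assumes the public endpoint at api.dbpedia-spotlight.org and its JSON reply format (a "Resources" list with "@URI" and "@surfaceForm" fields); the deployment actually used in ARCOMEM is not stated in the slides, and the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia Spotlight endpoint; a project may run its own instance.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.6):
    """Build a GET request for Spotlight's annotate service."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        SPOTLIGHT_URL + "?" + query,
        headers={"Accept": "application/json"},
    )


def parse_resources(payload):
    """Return (surface form, DBpedia URI) pairs from a Spotlight JSON reply."""
    return [(r["@surfaceForm"], r["@URI"]) for r in payload.get("Resources", [])]


def annotate(text, confidence=0.6):
    """Call the service (needs network access) and return the matched pairs."""
    with urllib.request.urlopen(build_request(text, confidence)) as resp:
        return parse_resources(json.load(resp))
```

Raising the confidence value yields fewer but more reliable matches; lowering it yields more candidates at the cost of more incorrect enrichments.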

Slide 9
Enrichment for clustering & correlation: example

<Person>Jean Claude Trichet</Person>

<Organisation>ECB</Organisation>

<Event>Trichet warns of systemic debt crisis</Event>

Slide 10
Enrichment for clustering & correlation: example

<Person>Jean Claude Trichet</Person>

<Organisation>ECB</Organisation>

<Event>Trichet warns of systemic debt crisis</Event>

<Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>

<Enrichment>http://dbpedia.org/resource/ECB</Enrichment>

Slide 11
Enrichment for clustering & correlation: example

<Person>Jean Claude Trichet</Person>

<Organisation>ECB</Organisation>

<Event>Trichet warns of systemic debt crisis</Event>

<Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>

<Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
=> dbpprop:office
   dbpedia:President_of_the_European_Central_Bank
   dbpedia:Governor_of_the_Banque_de_France
=> dcterms:subject
   category:Living_people
   category:Karlspreis_recipients
   category:Alumni_of_the_École_Nationale_d'Administration
   category:People_from_Lyon
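
Related values such as those listed for dbpprop:office and dcterms:subject could be retrieved from a SPARQL endpoint. The sketch below only builds the query text; the helper name is hypothetical, and whether ARCOMEM retrieves these values via SPARQL is an assumption.

```python
# Namespace URIs for the two prefixes used on the slide.
PREFIXES = {
    "dbpprop": "http://dbpedia.org/property/",
    "dcterms": "http://purl.org/dc/terms/",
}


def related_values_query(enrichment_uri, properties=("dbpprop:office", "dcterms:subject")):
    """Build a SPARQL query listing the values of the given properties."""
    header = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in PREFIXES.items())
    values = " ".join(properties)
    return (
        f"{header}\n"
        "SELECT ?property ?value WHERE {\n"
        f"  VALUES ?property {{ {values} }}\n"
        f"  <{enrichment_uri}> ?property ?value .\n"
        "}"
    )


query = related_values_query("http://dbpedia.org/resource/Jean-Claude_Trichet")
```

The resulting string can be sent to any SPARQL 1.1 endpoint that hosts the reference dataset.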

Slide 12
ARCOMEM entities, enrichments & clusters
Nodes: entities/events (blue), enrichments DBpedia (green), Freebase (orange)
1013 clusters of correlated entities/events

Cluster built around
enrichment db:Market

Slide 13
Cluster expansion with related enrichments
Clusters can be further expanded by considering related enrichments in the reference knowledge
base. This is an experimental feature that is currently not included in the SARA application.

Cluster expansion
Cluster built around
enrichment db:Market

Slide 14
Clustering of entities via enrichment relatedness
Discovery of “related” entities by discovering related enrichments
(a) Retrieving possible paths between two enrichments (e.g. via RelFinder,
http://www.visualdataweb.org/relfinder.php)
(b) Computing a relatedness measure (considering variables such as the shortest path,
the number of paths, relationship types, and the number of directly connected edges of
both enrichments, …)
(c) Clustering enrichments (entities) whose relatedness is above a certain threshold
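
Steps (a) to (c) can be sketched on a toy graph as follows. The graph edges, the scoring formula (the inverse of the shortest path length) and the single-link merging are assumptions chosen for illustration; the slides name further variables (number of paths, relationship types) that this sketch leaves out.

```python
from collections import deque

# Toy undirected graph over enrichments; the edges are illustrative assumptions.
GRAPH = {
    "db:Jean-Claude_Trichet": {"db:ECB", "db:Banque_de_France"},
    "db:ECB": {"db:Jean-Claude_Trichet", "db:Euro"},
    "db:Banque_de_France": {"db:Jean-Claude_Trichet", "db:Euro"},
    "db:Euro": {"db:ECB", "db:Banque_de_France"},
    "db:Lyon": set(),
}


def shortest_path_length(graph, source, target):
    """(a) Breadth-first search; returns the hop count or None if unreachable."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def relatedness(graph, a, b):
    """(b) Hypothetical score: 1 / shortest path length, 0 when no path exists."""
    dist = shortest_path_length(graph, a, b)
    if dist is None:
        return 0.0
    return 1.0 if dist == 0 else 1.0 / dist


def cluster(graph, threshold):
    """(c) Single-link clustering of enrichments whose score reaches the threshold."""
    nodes = sorted(graph)
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            n = parent[n]
        return n

    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if relatedness(graph, a, b) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


clusters = cluster(GRAPH, threshold=0.5)
```

With a threshold of 0.5, enrichments at most two hops apart fall into the same cluster, while the isolated node stays on its own.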

Slide 15
RDF schema for the Knowledge Base
Relationships between ARCOMEM entities (ETOE etc) and enrichments
RDF schema: http://www.gate.ac.uk/ns/ontologies/arcomem-datamodel.rdf

Slide 16
Enrichment evaluation results
Manual evaluation of 240 enrichment-entity pairs
Available scores: 1 (correct), 0 (incorrect), 0.5 (vague or
ambiguous relationship)
Entity Type          Average score DBpedia   Average score Freebase   Average score Total
arco:Event           0.71                    -                        0.71
arco:Location        0.81                    0.94                     0.88
arco:Money           0.67                    -                        0.67
arco:Organization    0.93                    1                        0.97
arco:Person          0.9                     0.89                     0.89
arco:Time            0.74                    -                        0.74
Total                0.79                    0.94                     0.87
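
Average scores of this kind can be computed from the manual judgements as in the sketch below. The judgement list is made-up sample data, not the 240 evaluated pairs, and computing the total as the mean over all pairs is an assumption about how the totals were derived.

```python
from collections import defaultdict

# Hypothetical judgements: (entity type, score), with score in {1, 0.5, 0}.
judgements = [
    ("arco:Person", 1), ("arco:Person", 0.5), ("arco:Person", 1),
    ("arco:Location", 1), ("arco:Location", 0),
]


def average_scores(pairs):
    """Mean score per entity type plus an overall mean under the key 'Total'."""
    by_type = defaultdict(list)
    for etype, score in pairs:
        by_type[etype].append(score)
    result = {t: sum(s) / len(s) for t, s in by_type.items()}
    result["Total"] = sum(score for _t, score in pairs) / len(pairs)
    return result


scores = average_scores(judgements)
```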

Slide 17
Further reading
• Entity Extraction and Consolidation for Social Web Content Preservation. S.
Dietze, D. Maynard, E. Demidova, T. Risse, W. Peters, K. Doka and Y.
Stavrakas. SDA, volume 912 of CEUR Workshop Proceedings, pages 18-29.
CEUR-WS.org, 2012.

• Can entities be friends? B. P. Nunes, R. Kawase, S. Dietze, D. Taibi, M. A.
Casanova, W. Nejdl. Web of Linked Entities (WOLE2012), Workshop at the 11th
International Semantic Web Conference (ISWC2012), Boston, US, 2012.

• Combining a co-occurrence-based and a semantic measure for entity linking. B.
P. Nunes, S. Dietze, M. A. Casanova, R. Kawase, B. Fetahu, W. Nejdl.
ESWC 2013 - 10th Extended Semantic Web Conference, 2013.

• Linked Data - The Story So Far. Bizer, C., Heath, T. and Berners-Lee, T.
Special Issue on Linked Data, International Journal on Semantic Web and
Information Systems (IJSWIS), 2009.

Slide 18
THANK YOU
CONTACT DETAILS
Dr. Elena Demidova
L3S Research Center
+49 511 762 17732
demidova@L3S.de
www.arcomem.eu

SAP Build Work Zone - Overview L2-L3.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 

Arcomem training – Enrichment Advanced (update)

  • 1. Entity Enrichment and Clustering in ARCOMEM. Elena Demidova (1), including slides by Stefan Dietze (1), Diana Maynard (2), Thomas Risse (1), Wim Peters (2), Katerina Doka (3), Yannis Stavrakas (3). Affiliations: (1) L3S Research Center, Hannover, Germany; (2) University of Sheffield, UK; (3) IMIS, RC ATHENA, Athens, Greece
  • 2. The ARCOMEM approach
      • Make use of the Social Web
          – Huge source of user-generated content
          – Wide range of articulation methods, from simple “I like it” buttons to complete articles
          – Represents the diversity of opinions of the public
      • User activities are often triggered by
          – Events and related entities (e.g. sport events, celebrations, crises, news articles, persons, locations)
          – Topics (e.g. global warming, financial crisis, swine flu)
      A semantic-aware and socially driven preservation model is a natural way to go.
  • 3. ARCOMEM architecture. The ARCOMEM system architecture foresees four processing levels: the crawler level, the online processing level, the offline processing level, and cross-crawl analysis.
  • 4. ETOE offline processing chain. The processing chain depicted on this slide comprises all components involved in the offline processing of Web objects. [Figure not reproduced]
  • 5. The extraction components for text
      Aim: extraction of Entities, Topics, Events and Opinions (ETOEs) from
          – Web pages
          – the Social Web (Twitter, YouTube, Facebook, …)
      Challenges:
          – Entity recognition from degraded input sources (tweets etc.)
          – Advancing state-of-the-art NLP and text mining
          – Dynamics detection: evolution of terms/entities
          – Semantic representation of Web objects and entities
          – Appropriate RDF schemas for ETOEs and Web objects
          – Exploiting (Linked Open) Web data to enrich extracted ETOEs
          – Entity classification (into events, locations, topics etc.) & consolidation
  • 6. ETOE extraction with GATE: an example [Figure not reproduced; its callout reads: candidate multi-word term]
  • 7. Data consolidation & integration problem
      Data extracted by different components, or during different processing cycles, is not aligned => consolidation, disambiguation & correlation are required.
      Example annotations whose relationship is unknown (?):
          <Location>Greece</Location>
          <Person>Venizelos</Person>
          <Location>Griechenland</Location>
          <Organisation>Greek Parliament</Organisation>
  • 8. Data enrichment & clustering
      Entities are enriched with references to related Linked Data resources, particularly in reference datasets (DBpedia, Freebase, …) => the enrichments are then used for clustering, correlation and consolidation.
  • 9. Enrichment with DBpedia & Freebase
      • DBpedia and Freebase are particularly well suited for three reasons: their vast size; the availability of disambiguation techniques that can exploit the multilingual labels both datasets provide for individual data items; and the degree of interconnection of both datasets, which allows a wealth of related information to be retrieved for a particular item.
      • For DBpedia, we use the DBpedia Spotlight service, which performs approximate string matching with an adjustable confidence level in the interval [0,1]. We set the confidence to 0.6, a value chosen experimentally.
      • For Freebase, we use structured queries that take into account the entity types extracted by GATE.
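The Spotlight lookup described on slide 9 could be sketched roughly as follows. This is a minimal illustration, not the ARCOMEM implementation: the endpoint URL is the public DBpedia Spotlight demo service (an assumption, since the slides do not name the deployment used), and the helper names `build_spotlight_request`, `parse_enrichments` and `annotate` are hypothetical. Only the confidence value 0.6 comes from the slide.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public demo endpoint; the deployment used in ARCOMEM is not
# stated in the slides.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_spotlight_request(text, confidence=0.6):
    """Build a GET request asking Spotlight to annotate `text`."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in the interval [0, 1]")
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def parse_enrichments(payload):
    """Return (surface form, DBpedia URI) pairs from a Spotlight JSON reply."""
    # Spotlight omits the "Resources" key when nothing was matched.
    resources = payload.get("Resources", [])
    return [(r["@surfaceForm"], r["@URI"]) for r in resources]


def annotate(text, confidence=0.6):
    """Call the service (requires network access) and return enrichments."""
    request = build_spotlight_request(text, confidence)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_enrichments(json.load(response))
```

Raising the confidence makes the service return fewer but more reliable matches; lowering it trades precision for coverage.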
  • 10. Enrichment for clustering & correlation: example
      <Person>Jean Claude Trichet</Person>
      <Organisation>ECB</Organisation>
      <Event>Trichet warns of systemic debt crisis</Event>
  • 11. Enrichment for clustering & correlation: example
      <Person>Jean Claude Trichet</Person>
      <Organisation>ECB</Organisation>
      <Event>Trichet warns of systemic debt crisis</Event>
      <Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
      <Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
  • 12. Enrichment for clustering & correlation: example
      <Person>Jean Claude Trichet</Person>
      <Organisation>ECB</Organisation>
      <Event>Trichet warns of systemic debt crisis</Event>
      <Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
      <Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
      => dbpprop:office: dbpedia:President_of_the_European_Central_Bank, dbpedia:Governor_of_the_Banque_de_France
      => dcterms:subject: category:Living_people, category:Karlspreis_recipients, category:Alumni_of_the_École_Nationale_d'Administration, category:People_from_Lyon
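Property values such as those on slide 12 can be retrieved with a SPARQL query against the reference dataset. The sketch below assumes the public DBpedia endpoint at https://dbpedia.org/sparql and uses hypothetical helper names; `dbp:` is the namespace http://dbpedia.org/property/, which the slides abbreviate as `dbpprop:`. It is one plausible way to do the lookup, not necessarily how ARCOMEM does it.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia SPARQL endpoint.
DBPEDIA_SPARQL = "https://dbpedia.org/sparql"


def build_query(resource_uri):
    """SPARQL query for the office and subject values of one enrichment."""
    return (
        "PREFIX dbp: <http://dbpedia.org/property/>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?p ?o WHERE {\n"
        f"  <{resource_uri}> ?p ?o .\n"
        "  FILTER (?p IN (dbp:office, dct:subject))\n"
        "}"
    )


def parse_bindings(results):
    """Group object values by predicate from a SPARQL JSON result set."""
    grouped = {}
    for row in results["results"]["bindings"]:
        grouped.setdefault(row["p"]["value"], []).append(row["o"]["value"])
    return grouped


def fetch_properties(resource_uri):
    """Run the query (requires network access) and return grouped values."""
    params = urllib.parse.urlencode({
        "query": build_query(resource_uri),
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(f"{DBPEDIA_SPARQL}?{params}", timeout=30) as resp:
        return parse_bindings(json.load(resp))
```

Two entities whose enrichments share a `dct:subject` category or an office value are candidates for correlation even when their labels have nothing in common.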
  • 13. ARCOMEM entities, enrichments & clusters
      Graph nodes: entities/events (blue), DBpedia enrichments (green), Freebase enrichments (orange).
      1013 clusters of correlated entities/events.
      [Figure not reproduced; its callout reads: cluster built around enrichment db:Market]
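A cluster "built around" an enrichment can be read as the set of entities that share that enrichment. The sketch below shows that grouping under this reading; the function name and the toy input (including the enrichment identifiers other than db:Market) are hypothetical and serve only as illustration.

```python
from collections import defaultdict


def cluster_by_enrichment(entity_enrichments):
    """Return {enrichment: sorted entity labels} for shared enrichments."""
    clusters = defaultdict(set)
    for entity, enrichments in entity_enrichments.items():
        for uri in enrichments:
            clusters[uri].add(entity)
    # An enrichment referenced by a single entity does not correlate anything.
    return {
        uri: sorted(members)
        for uri, members in clusters.items()
        if len(members) > 1
    }


# Hypothetical toy input: extracted entity label -> enrichment identifiers.
example = {
    "stock market": {"db:Market"},
    "markets": {"db:Market"},
    "Greece": {"db:Greece"},
    "Griechenland": {"db:Greece"},
    "Venizelos": {"db:Evangelos_Venizelos"},
}

if __name__ == "__main__":
    for uri, members in cluster_by_enrichment(example).items():
        print(uri, members)
```

In this toy input the English and German labels for Greece end up in one cluster because both point to the same enrichment, which is the consolidation effect described on slide 7.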
  • 14. Cluster expansion with related enrichments
      Clusters can be expanded further by considering related enrichments in the reference knowledge base. This is an experimental feature that is currently not included in the SARA application.
      [Figure not reproduced; its labels read: cluster built around enrichment db:Market, cluster expansion]
  • 15. Clustering of entities via enrichment relatedness
      “Related” entities are discovered by discovering related enrichments:
      (a) Retrieve the possible paths between two enrichments (e.g. via RelFinder, http://www.visualdataweb.org/relfinder.php)
      (b) Compute a relatedness measure (considering variables such as the shortest path, the number of paths, the relationship types, the number of directly connected edges of both enrichments, …)
      (c) Cluster the enrichments (and hence the entities) whose relatedness is above a certain threshold
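Steps (a) to (c) can be sketched on a small in-memory graph. The relatedness measure below (the inverse of the shortest path length) is a deliberately simplified stand-in, not the measure used in ARCOMEM, which according to the slide also considers the number of paths, relationship types and edge counts. The clustering is plain single-link grouping via union-find; the graph, its links and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_path_length(graph, start, goal):
    """Breadth-first search on an adjacency dict; None if unreachable."""
    if start == goal:
        return 0
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def relatedness(graph, a, b):
    """Hypothetical measure: 1 / shortest path length, 0 if unconnected."""
    length = shortest_path_length(graph, a, b)
    if length is None:
        return 0.0
    return 1.0 if length == 0 else 1.0 / length


def cluster(graph, enrichments, threshold):
    """Single-link clusters of enrichments with relatedness >= threshold."""
    parent = {e: e for e in enrichments}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(enrichments, 2):
        if relatedness(graph, a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for e in enrichments:
        groups.setdefault(find(e), set()).add(e)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical undirected link graph between enrichments.
graph = {
    "db:Jean-Claude_Trichet": ["db:European_Central_Bank"],
    "db:European_Central_Bank": ["db:Jean-Claude_Trichet", "db:Euro"],
    "db:Euro": ["db:European_Central_Bank", "db:Greece"],
    "db:Greece": ["db:Euro"],
    "db:Swine_flu": [],
}

if __name__ == "__main__":
    print(cluster(graph, list(graph), threshold=0.5))
```

Because the grouping is single-link, two enrichments can land in the same cluster through a chain of intermediate pairs even if their own pairwise relatedness is below the threshold; a stricter scheme would require every pair in a cluster to pass.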
  • 16. RDF schema for the Knowledge Base
      Relationships between ARCOMEM entities (ETOE etc.) and enrichments. [Figure not reproduced]
      RDF schema: http://www.gate.ac.uk/ns/ontologies/arcomem-datamodel.rdf
  • 17. Enrichment evaluation results
      Manual evaluation of 240 enrichment-entity pairs.
      Available scores: 1 (correct), 0 (incorrect), 0.5 (vague or ambiguous relationship).

      Entity type          Avg. score DBpedia   Avg. score Freebase   Avg. score total
      arco:Event           0.71                 –                     0.71
      arco:Location        0.81                 0.94                  0.87
      arco:Money           0.67                 –                     0.67
      arco:Organization    0.93                 1                     0.97
      arco:Person          0.9                  0.89                  0.89
      arco:Time            0.74                 –                     0.74
      Total                0.79                 0.94                  0.88
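Averages of this kind can be computed from the individual judgements as sketched below. The sketch assumes that each cell is the plain mean of the scores of the pairs that fall into it, which the slide does not state explicitly; the function name is hypothetical and the sample judgements are made up for illustration, not taken from the evaluation.

```python
from collections import defaultdict

VALID_SCORES = {0.0, 0.5, 1.0}  # incorrect, vague/ambiguous, correct


def average_scores(judgements):
    """Mean score per (entity type, dataset) cell, with 'Total' margins.

    `judgements` is an iterable of (entity_type, dataset, score) triples.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for entity_type, dataset, score in judgements:
        if score not in VALID_SCORES:
            raise ValueError(f"unexpected score: {score!r}")
        cells = (
            (entity_type, dataset),
            (entity_type, "Total"),
            ("Total", dataset),
            ("Total", "Total"),
        )
        for cell in cells:
            sums[cell][0] += score
            sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}


# Made-up judgements, for illustration only.
sample = [
    ("arco:Person", "DBpedia", 1),
    ("arco:Person", "DBpedia", 0.5),
    ("arco:Person", "Freebase", 1),
    ("arco:Location", "DBpedia", 0),
]

if __name__ == "__main__":
    for cell, mean in sorted(average_scores(sample).items()):
        print(cell, round(mean, 2))
```

Under this assumption the "Total" column is weighted by the number of pairs per dataset, so it need not equal the midpoint of the DBpedia and Freebase averages.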
  • 18. Further reading
      • S. Dietze, D. Maynard, E. Demidova, T. Risse, W. Peters, K. Doka and Y. Stavrakas. Entity Extraction and Consolidation for Social Web Content Preservation. SDA, volume 912 of CEUR Workshop Proceedings, pages 18-29. CEUR-WS.org, 2012.
      • B. P. Nunes, R. Kawase, S. Dietze, D. Taibi, M. A. Casanova and W. Nejdl. Can entities be friends? Web of Linked Entities (WOLE2012), workshop at the 11th International Semantic Web Conference (ISWC2012), Boston, US, 2012.
      • B. P. Nunes, S. Dietze, M. A. Casanova, R. Kawase, B. Fetahu and W. Nejdl. Combining a co-occurrence-based and a semantic measure for entity linking. ESWC 2013, 10th Extended Semantic Web Conference, 2013.
      • C. Bizer, T. Heath and T. Berners-Lee. Linked Data - The Story So Far. Special Issue on Linked Data, International Journal on Semantic Web and Information Systems (IJSWIS), 2009.
  • 19. Thank you. Contact details: Dr. Elena Demidova, L3S Research Center, +49 511 762 17732, demidova@L3S.de, www.arcomem.eu