KIT – The Research University in the Helmholtz Association
INSTITUTE OF APPLIED INFORMATICS AND FORMAL DESCRIPTION METHODS (AIFB)
www.kit.edu
Linked Data Entity Summarization
Dipl.-Inf. Univ. Andreas Thalhammer 08.12.2016
Outline
1. Motivation
2. Research Questions
3. Contributions
a) LinkSUM (Contribution 1)
b) SUMMA API (Contribution 3)
4. Related Work
5. Summary and Outlook
Andreas Thalhammer – Linked Data Entity Summarization, 03.10.2018
1. MOTIVATION
Information need versus availability
Information need (in the US*)
More than 40% of all search queries focus on one specific entity.
579 million searches per day come from home and work devices in the US.
~ 232 million searches for entities (per day; in the US; desktop)
Information availability (Wikidata**)
Wikidata covers 24.5 million entities (growth of 55% in the last year).
3.2 million entities have > 10 statements (growth of 78% in the last year).
* https://www.comscore.com/Insights/Rankings/comScore-Releases-February-2016-US-Desktop-Search-Engine-Rankings
** https://www.wikidata.org/wiki/Wikidata:Statistics
Growing amount of structured data on the Web
Wikidata entry for Pulp Fiction: ~ 614 facts
Naïve solution: Entity presentation based on class summaries
(Source: yahoo.com)
Problems of class summaries
1. The patterns are very static and do not reflect the individual
particularities of entities.
2. A pattern needs to be created for each type and class hierarchies
need to be considered.
3. Some entities are of multiple (distinct) types with unclear main type.
4. Some of the properties can have many values for which no ranking or
cut-off is defined.
(Figure: Arnold Schwarzenegger with the types Person, Athlete, and Body builder; Angkor Wat.)
Entity Summarization
Propositions:
Every entity is individual.
For different entities, different properties are of importance.
Entities of the same type do not always have the same attributes.
For each entity, a single property-value pair can be of different
relevance.
Solution:
Focus on individual particularities of each entity:
Entity Summarization
2. RESEARCH QUESTIONS
Challenge #1
RQ1: How can we effectively summarize entities with limited
background information?
RQ1.1: How can we use link analysis effectively in order to derive
summaries of entities?
RQ1.2: How can we use usage data analysis effectively in order to derive
summaries of entities?
RDF data typically does not reflect importance levels in its relations.
Proprietary entity summarization systems have access to a lot of data
(e.g., search queries) and infrastructure (e.g., a full Web index).
Other knowledge panel providers (such as publishers) are lacking that
information and infrastructure.
(Source: google.com)
Challenge #2
RQ2: Is there a minimum set of re-occurring/common features of entity
summarization systems that allow us to provide a generic API?
Providers of knowledge panels are hiding the original graph structure in
strongly abstracted interfaces.
Standardized programmatic access is desirable (but not available).
(Source: google.com)
(Source: developers.google.com/knowledge-graph)
Challenge #3
RQ3: How can we align duplicate/similar facts about Linked Data
entities on the Web?
Different Web sources provide structured information about a single entity.
The different sources often cover similar information but do not provide
corresponding links or vocabulary mappings.
Alignments are particularly difficult as the sources typically provide data at
different levels of modeling granularity.
(Source: imdb.com)
(Source: wikidata.org)
3. CONTRIBUTIONS
Overview: Research Questions and Contributions
(Figure: overview diagram with Knowledge Base(s), Input, Output, UI, and the four contributions: 1 LinkSUM (Link Structure), 2 UBES (Usage Data), 3 SUMMA API, 4 Entity Data Fusion.)
RQ1: How can we effectively summarize entities with limited
background information?
RQ1.1: How can we use link analysis effectively in order to
derive summaries of entities? (Contribution 1)
RQ1.2: How can we use usage data analysis effectively in
order to derive summaries of entities? (Contribution 2)
RQ2: Is there a minimum set of re-occurring/common features of
entity summarization systems that allow us to provide a generic
API? (Contribution 3)
RQ3: How can we align duplicate/similar facts about Linked Data
entities on the Web? (Contribution 4)
Linked Data Entity Summarization
Contribution 1
LinkSUM
Idea: Use link analysis for selecting facts.
Step 1: Select top-k important related resources.
Step 2: Select the most relevant connecting predicate.
Approach: Resource Selection
(Figure: Quentin Tarantino and Pulp Fiction, connected by "director".)
Compute PageRank [5] scores of entities with (un-typed) links that occur in textual descriptions of entities (pr).
Use “Backlinks” [7] (also called “mutual links”) for finding strong connections (bl):
Combine scores:
dbpedia:Category:English-language_films 220.961
dbpedia:Quentin_Tarantino 13.7403
dbpedia:John_Travolta 10.5771
dbpedia:Miramax_Films 9.9398
... ...
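The resource selection step can be sketched in a few lines of Python. The formulas for bl and for the score combination are not preserved in this transcript, so the sketch makes assumptions: bl is taken as 1 when two resources link to each other and 0 otherwise, and the two scores are combined linearly with a hypothetical weight `alpha` after normalizing pr by the maximum among the candidates. The function names and the toy graph `EXAMPLE_LINKS` are illustrative only.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # page -> pages it links to


def pagerank(graph: Graph, d: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Iterate pr(r) = (1 - d) + d * sum(pr(n) / c(n)) over all n linking to r."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    pr = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        nxt = {n: 1.0 - d for n in nodes}
        for src, targets in graph.items():
            if targets:  # pages without outgoing links pass nothing on
                share = d * pr[src] / len(targets)
                for t in targets:
                    nxt[t] += share
        pr = nxt
    return pr


def backlink(graph: Graph, e: str, r: str) -> float:
    """Assumed bl: 1.0 if e and r link to each other, else 0.0."""
    return 1.0 if r in graph.get(e, set()) and e in graph.get(r, set()) else 0.0


def top_k_resources(graph: Graph, e: str, k: int, alpha: float = 0.5) -> List[str]:
    """Rank resources linked from e by alpha * normalized pr + (1 - alpha) * bl.

    The linear combination and the max-normalization are assumptions.
    """
    candidates = graph.get(e, set())
    if not candidates:
        return []
    pr = pagerank(graph)
    max_pr = max(pr[r] for r in candidates)
    score = {
        r: alpha * pr[r] / max_pr + (1 - alpha) * backlink(graph, e, r)
        for r in candidates
    }
    # Sort by descending score; break ties by name for a stable result.
    return sorted(candidates, key=lambda r: (-score[r], r))[:k]


# Toy link graph (illustrative, not real DBpedia data).
EXAMPLE_LINKS: Graph = {
    "Pulp_Fiction": {"Quentin_Tarantino", "John_Travolta", "English-language_films"},
    "Quentin_Tarantino": {"Pulp_Fiction"},
    "John_Travolta": {"English-language_films"},
    "Other_Film": {"English-language_films"},
    "English-language_films": set(),
}
```

In the toy graph, the category page collects the highest PageRank, but the mutual link between the film and its director lifts the director above it once bl is mixed in. This is the kind of effect the combination is meant to achieve.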
Approach: Relation Selection
Problem: multiple relations can connect the same two resources (e.g., Quentin Tarantino and Pulp Fiction: director, writer of).
Approaches:
Frequency (FRQ): #times the predicate is used
Exclusivity (EXC): 1 / (N + M)
Description (DSC): #domain + #range + #label
and combinations of those, e.g., FRQ * EXC.
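A minimal sketch of FRQ, EXC, and their product follows. The slide does not define N and M; the sketch assumes N is the number of triples that share subject and predicate with the candidate triple, and M the number that share predicate and object. The function names and the toy triples are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def frequency(triples: List[Triple]) -> Counter:
    """FRQ: how many triples in the data set use each predicate."""
    return Counter(p for _, p, _ in triples)


def exclusivity(triples: List[Triple], s: str, p: str, o: str) -> float:
    """EXC = 1 / (N + M), with N and M as assumed in the lead-in."""
    n = sum(1 for s2, p2, _ in triples if s2 == s and p2 == p)
    m = sum(1 for _, p2, o2 in triples if p2 == p and o2 == o)
    if n + m == 0:
        raise ValueError("predicate does not occur with this subject or object")
    return 1.0 / (n + m)


def best_predicate(triples: List[Triple], s: str, o: str) -> str:
    """Pick the predicate connecting s and o with the highest FRQ * EXC."""
    candidates = sorted({p for s2, p, o2 in triples if s2 == s and o2 == o})
    if not candidates:
        raise ValueError("no predicate connects the two resources")
    frq = frequency(triples)
    return max(candidates, key=lambda p: frq[p] * exclusivity(triples, s, p, o))


# Toy data (illustrative only).
EXAMPLE_TRIPLES: List[Triple] = [
    ("Pulp_Fiction", "director", "Quentin_Tarantino"),
    ("Pulp_Fiction", "writer", "Quentin_Tarantino"),
    ("Pulp_Fiction", "starring", "John_Travolta"),
    ("Kill_Bill", "director", "Quentin_Tarantino"),
]
```

Frequency favors predicates that are common in the data set, while exclusivity favors predicates that are rare around the two resources in question, so the product balances the two.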
Quantitative Evaluation: Dataset and Measures
Used reference dataset:
Introduced in Gunaratna et al. [3].
Contains human-created summaries of 50 entities (DBpedia 3.9, outgoing relations).
Includes seven top-5 and seven top-10 summaries for each entity.
The dataset was created by 15 experts from the Semantic Web field.
Used similarity measure:
Reference system:
FACES (introduced in [3]).
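The formula for the similarity measure is not preserved in this transcript. One plausible reading, stated here as an assumption, is the average overlap between a generated summary and each of the human-created reference summaries. The sketch below implements that reading; `quality` is a hypothetical name.

```python
from typing import Hashable, Iterable, Set


def quality(summary: Set[Hashable], references: Iterable[Set[Hashable]]) -> float:
    """Average number of items shared with each reference summary (assumed measure)."""
    refs = list(references)
    if not refs:
        raise ValueError("at least one reference summary is required")
    return sum(len(summary & ref) for ref in refs) / len(refs)
```

The items can be full triples or only subject-object pairs, depending on what is being compared. With top-5 summaries the value lies between 0 and 5.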
Quantitative Evaluation: Results
(Table: results for the LinkSUM configurations config-1 and config-2.)
SO: Subject-Object pairs (predicates not considered).
SPO: Full triple.
SD: Standard deviation.
Significance is marked with respect to both LinkSUM configurations (p < 0.05) and with respect to the best LinkSUM configuration (p < 0.05).
Qualitative Evaluation: Setup
Scenario: Search Engine Result Page (SERP).
20 users, 10 entities (from the FACES dataset).
Qualitative Evaluation: Results
In some cases the task is subjective.
Reasons for selection:
- The presented related resources are relevant for the entity.
Reasons for rejection:
- Redundancy.
- Related resources do not characterize the entity.
Focus: PageRank (1)
PageRank is not perfect, for example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX v: <http://purl.org/voc/vrank#>
# Top five scientists by PageRank score
SELECT ?e ?r
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
  ?e rdf:type dbo:Scientist ;
     v:hasRank/v:rankValue ?r .
}
ORDER BY DESC(?r) LIMIT 5
Result:
dbpedia:Carl_Linnaeus 551.791
dbpedia:Charles_Darwin 215.028
dbpedia:Albert_Einstein 186.549
dbpedia:Isaac_Newton 167.811
dbpedia:Sigmund_Freud 140.245
Focus: PageRank (2)
Important parameters (for resources r):
l(r) – returns all pages that link to r.
c(r) – the number of outgoing links of r.
d – the damping factor.
Traditional PageRank [5]:
Variant: Weighted Links Rank (WLRank) [6]:
Link weights (lw): relative position of a link in the article
(Image: [8])
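The two formulas are not preserved in this transcript. With the parameters l(r), c(r), and d, the traditional PageRank of [5] can be written as in the first equation below. The second equation is a sketch of a weighted variant and rests on an assumption: each link weight lw is normalized over all outgoing links of the linking page, where out(r_n) is a hypothetical symbol for the set of pages that r_n links to.

```latex
% Traditional PageRank [5]
\[
  pr(r) = (1 - d) + d \sum_{r_n \in l(r)} \frac{pr(r_n)}{c(r_n)}
\]

% Weighted variant (assumed normalization of the link weights)
\[
  pr(r) = (1 - d) + d \sum_{r_n \in l(r)} pr(r_n) \,
          \frac{lw(r_n, r)}{\sum_{r_m \in out(r_n)} lw(r_n, r_m)}
\]
```

If every link weight equals 1, the denominator becomes c(r_n) and the weighted form reduces to the traditional one.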
Focus: PageRank (3)
Newly constructed rankings:
ALL – all links from the article text and from the templates.
ATL – article text links.
TEL – template links.
ATL-RP – article text links with WLRank and relative position.
Size of input dataset:
          ALL          ATL          TEL         ATL-RP
# links   159,398,815  142,305,605  26,460,273  143,056,545
Reference rankings (page-view-based):
TOWR-PV – “The Open Wikipedia Ranking”
SUB – SubjectiveEye3D by Paul Houle
26
Focus: PageRank (4)
Andreas Thalhammer – Linked Data Entity Summarization03.10.2018
(Link Structure)
LinkSUM
Measure: Spearman rank correlation (range: [-1, 1])
Results:
Conclusions:
Bad correlation of TEL with TOWR-PV/SUB is the result of a small input
data set.
Weighting by relative position improves correlation to SUB. These findings
are supported by [4].
Institute of Applied Informatics and Formal
Description Methods (AIFB)
27
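For reference, the Spearman rank correlation between two rankings can be computed as in the sketch below. It uses the textbook closed form for rankings without ties; whether the evaluation handled ties, and how, is not stated in the slides. The function name `spearman` is illustrative.

```python
from typing import Dict


def spearman(rank_a: Dict[str, int], rank_b: Dict[str, int]) -> float:
    """Spearman's rho for two tie-free rankings over the same items.

    Uses rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the
    difference between the two rank positions of item i.
    """
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same items")
    n = len(rank_a)
    if n < 2:
        raise ValueError("need at least two items")
    d2 = sum((rank_a[x] - rank_b[x]) ** 2 for x in rank_a)
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

A value of 1 means the two rankings order all items identically, and -1 means one is the exact reverse of the other.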
Conclusions and Impact
Conclusions:
LinkSUM significantly outperforms the state of the art.
Entity summarization:
Focus should be on selecting relevant resources.
Redundancies at the object level should be avoided.
LinkSUM is lightweight and can be applied in other scenarios, e.g.
Web sites with semantic annotations.
Semantic MediaWikis.
Impact:
Published and presented as full research paper at ICWE 2016.
The PageRank scores are published online and have found many adopters
(e.g., the official DBpedia SPARQL endpoint includes the scores).
In use at the WDAqua project (http://wdaqua.eu/).
Linked Data Entity Summarization
Contribution 3
SUMMA API
Idea: A common API for entity summaries
Quantitative evaluation.
Qualitative evaluation.
A/B testing.
Combination of summary services.
Approach: SUMMA API
Parameters:
URI (of the entity e) – the entity needs to be identified
k (number) – an upper limit of facts related to e
Multi-language support
Statement groups (e.g., biographical data)
Restriction to specific properties
Multi-hop search space
SUMMA Vocabulary:
Classes: summa:Summary, summa:SummaryGroup
Properties and the types they point to:
summa:entity → rdfs:Resource
summa:topK → xsd:positiveInteger
summa:language → xsd:string
summa:fixedProperty → rdf:Property
summa:statement → rdf:Statement
summa:maxHops → xsd:positiveInteger
summa:group → summa:SummaryGroup
summa:path
(Figure: multi-hop example with the nodes PF, JT, VV and a blank node _:, connected by the predicates starring, actor, and role.)
Approach: SUMMA API
SUMMA RESTful Interaction:
Client → Server:
POST [ a :Summary;
:entity dbpedia:Barack_Obama; :topK 10 ] .
Server → Client:
201 CREATED
Location: http://example.com/summary?entity=dbpedia:Barack_Obama&topK=10
@prefix summa: <http://purl.org/voc/summa/> .
...
Client → Server:
GET http://example.com/summary?entity=dbpedia:Barack_Obama&topK=10
Server → Client:
200 OK
@prefix summa: <http://purl.org/voc/summa/> .
...
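The two messages of this interaction can be sketched with the Python standard library. The sketch only builds strings and sends nothing. The helper names are hypothetical, and the query-string layout of the summary URL simply mirrors the example on the slide; a real server is free to mint its Location URL differently.

```python
from urllib.parse import urlencode

SUMMA_NS = "http://purl.org/voc/summa/"  # namespace as given on the slide


def summary_request_body(entity_iri: str, top_k: int) -> str:
    """Turtle body for the POST: a blank node typed summa:Summary."""
    if top_k < 1:
        raise ValueError("topK must be a positive integer")
    return (
        f"@prefix summa: <{SUMMA_NS}> .\n"
        f"[ a summa:Summary ;\n"
        f"  summa:entity <{entity_iri}> ;\n"
        f"  summa:topK {top_k} ] .\n"
    )


def summary_location(base_url: str, entity: str, top_k: int) -> str:
    """Summary URL in the style of the slide's Location header (assumed layout)."""
    # safe=":" keeps the prefix separator in "dbpedia:Barack_Obama" readable.
    query = urlencode({"entity": entity, "topK": top_k}, safe=":")
    return f"{base_url}?{query}"
```

For example, `summary_location("http://example.com/summary", "dbpedia:Barack_Obama", 10)` yields the URL used in the GET request above. Because the summary is an addressable resource, a client can fetch it again later without repeating the POST.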
Analysis: Setup
Search Engines:
Google Knowledge Graph
Microsoft Bing Satori/Snapshots
Yahoo Knowledge
News Portals (Alexa Top 25 News sites):
Forbes
BBC News
Can the user interfaces be generated with data from the
SUMMA API without changing their layout?
Analysis: Criteria
Features:
1. Property Restriction
2. Statement Groups
3. Multi-hop Search Space
4. Languages
Five entities:
Spain (country)
Dirk Nowitzki (person/athlete)
Ramones (band)
SAP (company/organization)
Inglourious Basterds (movie)
(Source: http://google.com)
Analysis: Results
Which features were required by the respective system?
Conclusions and Impact
Conclusions:
Decouple user interface from actual entity summarization
system by defining a common API.
Light-weight and extensible vocabulary and interaction mechanism.
Reference implementations and their source code are publicly
available.
Empirical analysis demonstrates applicability in real-world scenarios.
Impact:
Published and presented as full research paper at ICWE 2015.
Best Paper Candidate at ICWE 2015.
Best Demo Award at ICWE 2016.
4. RELATED WORK
Related Work
Who else is working on this?
Google [1], Microsoft, Yahoo, etc. (RDF + lots of background data)
Other researchers in the field of the Semantic Web (only RDF data), e.g.,
Cheng et al. [2]
Gunaratna et al. [3]
What distinguishes the presented work from theirs?
LinkSUM is a lightweight and effective approach.
UBES is the first approach that uses usage data for entity summarization.
SUMMA API: first and currently only API definition that enables the exchange of entity summaries.
Entity Data Fusion: first approach that focuses on general alignment of structured entity data on the Web.
5. SUMMARY AND OUTLOOK
Summary
We provided contributions for Linked Data Entity Summarization.
Impact was created on the levels of research and dataset/system adoption.
Combination with entity linking is possible.
The addressed problem is highly relevant for search and question answering engines.
Outlook
Full integration of the entity data fusion approach.
Addressing literal values.
Personalized/contextualized summaries of entities.
Abstract entity summarization.
Questions?
Publications
Contribution 1
Andreas Thalhammer, Nelia Lasierra, Achim Rettinger: LinkSUM: Using Link Analysis to Summarize Entity Data, In Web Engineering: 16th
International Conference, ICWE 2016. Proceedings, vol. 9671 of Lecture Notes in Computer Science, pages 244–261. Springer, 2016
Andreas Thalhammer and Achim Rettinger: Browsing DBpedia Entities with Summaries. The Semantic Web: ESWC 2014 Satellite Events,
Lecture Notes in Computer Science 2014, pages 511-515, Springer 2014
Andreas Thalhammer and Achim Rettinger: PageRank on Wikipedia: Towards General Importance Scores for Entities. In The Semantic
Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers, pages 227–240. Springer,
2016.
Contribution 2
Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, Dieter Fensel: Leveraging Usage Data for Linked Data Movie Entity
Summarization. In Proceedings of the 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD’12), 2012.
Andreas Thalhammer, Magnus Knuth, Harald Sack: Evaluating Entity Summarization Using a Game-Based Ground Truth. In International
Semantic Web Conference (2), vol. 7650, pages 350–361. Springer, 2012.
Contribution 3
Antonio Roa-Valverde, Andreas Thalhammer, Ioan Toma, and Miguel-Angel Sicilia: Towards a formal model for sharing and reusing
ranking computations. In Proceedings of the 6th International Workshop on Ranking in Databases, in conjunction with VLDB 2012.
Andreas Thalhammer and Steffen Stadtmüller. SUMMA: A Common API for Linked Data Entity Summaries. In P. Cimiano, F. Frasincar,
G.-J. Houben, and D. Schwabe, editors, Engineering the Web in the Big Data Era, vol. 9114, pages 430-446. Springer, 2015.
Andreas Thalhammer, Achim Rettinger: ELES: Combining Entity Linking and Entity Summarization. In Web Engineering: 16th International
Conference, ICWE 2016. Proceedings, vol. 9671 of Lecture Notes in Computer Science, pages 547–550. Springer, 2016
Contribution 4
Andreas Thalhammer, Steffen Thoma, Andreas Harth: Entity-Centric Claim Reconciliation in Web Data, Submitted to WWW 2017.
References
[1] A. Singhal. Introducing the knowledge graph: things, not strings.
http://goo.gl/kH1NKq, 2012.
[2] G. Cheng, T. Tran, and Y. Qu. RELIN: relatedness and informativeness-based centrality
for entity summarization. In Proc. of the 10th int. conf. on The Semantic Web - Vol. Part I,
ISWC’11. Springer, 2011.
[3] K. Gunaratna, K. Thirunarayan, and A. P. Sheth. FACES: diversity-aware entity
summarization using incremental hierarchical conceptual clustering. In Proc. of the 29th
AAAI Conf. on Artificial Intelligence, Austin, Texas, USA, 2015.
[4] D. Dimitrov, P. Singer, F. Lemmerich, M. Strohmaier. What Makes a Link Successful on
Wikipedia? https://arxiv.org/abs/1611.02508
[5] S. Brin and L. Page. The Anatomy of a Large-scale Hypertextual Web Search Engine. In
Proceedings of the Seventh International Conference on World Wide Web 7, WWW7,
pages 107–117. Elsevier Science Publishers B. V., Amsterdam, The Netherlands, 1998.
[6] R. Baeza-Yates and E. Davis. Web Page Ranking Using Link Attributes. In Proceedings
of the 13th International World Wide Web Conference on Alternate Track Papers &
Posters, WWW Alt. ’04, pages 328–329, New York, NY, USA, 2004. ACM.
[7] J. Waitelonis and H. Sack. Towards exploratory video search using linked data.
Multimedia Tools and Applications, 59:645–672, 2012. 10.1007/s11042-011-0733-1.
[8] An art draw drawn by Felipe Micaroni Lalli (micaroni@gmail.com).
IRJET- Determining Document Relevance using Keyword Extraction
 
Iare ds lecture_notes_2
Iare ds lecture_notes_2Iare ds lecture_notes_2
Iare ds lecture_notes_2
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
Evaluating and Enhancing Efficiency of Recommendation System using Big Data A...
 
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...
WEB-BASED DATA MINING TOOLS : PERFORMING FEEDBACK ANALYSIS AND ASSOCIATION RU...
 
Cal Essay
Cal EssayCal Essay
Cal Essay
 
Modern Association Rule Mining Methods
Modern Association Rule Mining MethodsModern Association Rule Mining Methods
Modern Association Rule Mining Methods
 
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
Web Intelligence 2013 - Characterizing concepts of interest leveraging Linked...
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
Rule-based Information Extraction for Airplane Crashes Reports
Rule-based Information Extraction for Airplane Crashes ReportsRule-based Information Extraction for Airplane Crashes Reports
Rule-based Information Extraction for Airplane Crashes Reports
 
Rule-based Information Extraction for Airplane Crashes Reports
Rule-based Information Extraction for Airplane Crashes ReportsRule-based Information Extraction for Airplane Crashes Reports
Rule-based Information Extraction for Airplane Crashes Reports
 
Techmetrics Of Dat Project Code And Designs
Techmetrics Of Dat Project Code And DesignsTechmetrics Of Dat Project Code And Designs
Techmetrics Of Dat Project Code And Designs
 
The linked data value chain atif
The linked data value chain atifThe linked data value chain atif
The linked data value chain atif
 
IRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET- Improving the Performance of Smart Heterogeneous Big DataIRJET- Improving the Performance of Smart Heterogeneous Big Data
IRJET- Improving the Performance of Smart Heterogeneous Big Data
 
Social Media and Text Analytics
Social Media and Text AnalyticsSocial Media and Text Analytics
Social Media and Text Analytics
 

Dernier

Basic Concepts in Pharmacology in molecular .pptx
Basic Concepts in Pharmacology in molecular  .pptxBasic Concepts in Pharmacology in molecular  .pptx
Basic Concepts in Pharmacology in molecular .pptxVijayaKumarR28
 
Identification of Superclusters and Their Properties in the Sloan Digital Sky...
Identification of Superclusters and Their Properties in the Sloan Digital Sky...Identification of Superclusters and Their Properties in the Sloan Digital Sky...
Identification of Superclusters and Their Properties in the Sloan Digital Sky...Sérgio Sacani
 
biosynthesis of the cell wall and antibiotics
biosynthesis of the cell wall and antibioticsbiosynthesis of the cell wall and antibiotics
biosynthesis of the cell wall and antibioticsSafaFallah
 
Pests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPirithiRaju
 
THE HISTOLOGY OF THE CARDIOVASCULAR SYSTEM 2024.pptx
THE HISTOLOGY OF THE CARDIOVASCULAR SYSTEM 2024.pptxTHE HISTOLOGY OF THE CARDIOVASCULAR SYSTEM 2024.pptx
THE HISTOLOGY OF THE CARDIOVASCULAR SYSTEM 2024.pptxAkinrotimiOluwadunsi
 
Pests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPirithiRaju
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Sérgio Sacani
 
Thermonuclear explosions on neutron stars reveal the speed of their jets
Thermonuclear explosions on neutron stars reveal the speed of their jetsThermonuclear explosions on neutron stars reveal the speed of their jets
Thermonuclear explosions on neutron stars reveal the speed of their jetsSérgio Sacani
 
Intensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptxIntensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptxHarshiniAlapati
 
Controlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform EnvironmentControlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform EnvironmentRahulVishwakarma71547
 
Gender board diversity spillovers and the public eye
Gender board diversity spillovers and the public eyeGender board diversity spillovers and the public eye
Gender board diversity spillovers and the public eyeGRAPE
 
Exploration Method’s in Archaeological Studies & Research
Exploration Method’s in Archaeological Studies & ResearchExploration Method’s in Archaeological Studies & Research
Exploration Method’s in Archaeological Studies & ResearchPrachya Adhyayan
 
Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxRahulVishwakarma71547
 
soft skills question paper set for bba ca
soft skills question paper set for bba casoft skills question paper set for bba ca
soft skills question paper set for bba caohsadfeeling
 
geometric quantization on coadjoint orbits
geometric quantization on coadjoint orbitsgeometric quantization on coadjoint orbits
geometric quantization on coadjoint orbitsHassan Jolany
 
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...PirithiRaju
 
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdf
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdfPests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdf
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdfPirithiRaju
 
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WayShiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WaySérgio Sacani
 
PSP3 employability assessment form .docx
PSP3 employability assessment form .docxPSP3 employability assessment form .docx
PSP3 employability assessment form .docxmarwaahmad357
 
TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)chatterjeesoumili50
 

Dernier (20)

Basic Concepts in Pharmacology in molecular .pptx
Basic Concepts in Pharmacology in molecular  .pptxBasic Concepts in Pharmacology in molecular  .pptx
Basic Concepts in Pharmacology in molecular .pptx
 
Identification of Superclusters and Their Properties in the Sloan Digital Sky...
Identification of Superclusters and Their Properties in the Sloan Digital Sky...Identification of Superclusters and Their Properties in the Sloan Digital Sky...
Identification of Superclusters and Their Properties in the Sloan Digital Sky...
 
biosynthesis of the cell wall and antibiotics
biosynthesis of the cell wall and antibioticsbiosynthesis of the cell wall and antibiotics
biosynthesis of the cell wall and antibiotics
 
Pests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPR
 
THE HISTOLOGY OF THE CARDIOVASCULAR SYSTEM 2024.pptx
THE HISTOLOGY OF THE CARDIOVASCULAR SYSTEM 2024.pptxTHE HISTOLOGY OF THE CARDIOVASCULAR SYSTEM 2024.pptx
THE HISTOLOGY OF THE CARDIOVASCULAR SYSTEM 2024.pptx
 
Pests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPRPests of tenai_Identification,Binomics_Dr.UPR
Pests of tenai_Identification,Binomics_Dr.UPR
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
 
Thermonuclear explosions on neutron stars reveal the speed of their jets
Thermonuclear explosions on neutron stars reveal the speed of their jetsThermonuclear explosions on neutron stars reveal the speed of their jets
Thermonuclear explosions on neutron stars reveal the speed of their jets
 
Intensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptxIntensive Housing systems for Poultry.pptx
Intensive Housing systems for Poultry.pptx
 
Controlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform EnvironmentControlling Parameters of Carbonate platform Environment
Controlling Parameters of Carbonate platform Environment
 
Gender board diversity spillovers and the public eye
Gender board diversity spillovers and the public eyeGender board diversity spillovers and the public eye
Gender board diversity spillovers and the public eye
 
Exploration Method’s in Archaeological Studies & Research
Exploration Method’s in Archaeological Studies & ResearchExploration Method’s in Archaeological Studies & Research
Exploration Method’s in Archaeological Studies & Research
 
Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptx
 
soft skills question paper set for bba ca
soft skills question paper set for bba casoft skills question paper set for bba ca
soft skills question paper set for bba ca
 
geometric quantization on coadjoint orbits
geometric quantization on coadjoint orbitsgeometric quantization on coadjoint orbits
geometric quantization on coadjoint orbits
 
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...
3.2 Pests of Sorghum_Identification, Symptoms and nature of damage, Binomics,...
 
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdf
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdfPests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdf
Pests of cumbu_Identification, Binomics, Integrated ManagementDr.UPR.pdf
 
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky WayShiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
Shiva and Shakti: Presumed Proto-Galactic Fragments in the Inner Milky Way
 
PSP3 employability assessment form .docx
PSP3 employability assessment form .docxPSP3 employability assessment form .docx
PSP3 employability assessment form .docx
 
TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)TORSION IN GASTROPODS- Anatomical event (Zoology)
TORSION IN GASTROPODS- Anatomical event (Zoology)
 

Linked Data Entity Summarization (PhD defense)

  • 1. Linked Data Entity Summarization
       Dipl.-Inf. Univ. Andreas Thalhammer, 08.12.2016
       KIT – The Research University in the Helmholtz Association, Institute of Applied Informatics and Formal Description Methods (AIFB), www.kit.edu
  • 2. Outline (Institute of Applied Informatics and Formal Description Methods (AIFB); Andreas Thalhammer – Linked Data Entity Summarization, 03.10.2018)
       1. Motivation
       2. Research Questions
       3. Contributions
          a) LinkSUM (Contribution 1)
          b) SUMMA API (Contribution 3)
       4. Related Work
       5. Summary and Outlook
  • 3. 1. MOTIVATION
  • 4. Information need versus availability
       Information need (in the US*):
       - More than 40% of all search queries focus on one specific entity.
       - 579 million searches per day come from home and work devices in the US.
       - This amounts to roughly 232 million searches for entities (every day; in the US; desktop).
       Information availability (Wikidata**):
       - Wikidata covers 24.5 million entities (growth of 55% in the last year).
       - 3.2 million entities have more than 10 statements (growth of 78% in the last year).
       * https://www.comscore.com/Insights/Rankings/comScore-Releases-February-2016-US-Desktop-Search-Engine-Rankings
       ** https://www.wikidata.org/wiki/Wikidata:Statistics
  • 5. Growing amount of structured data on the Web
       Example: the Wikidata entry for Pulp Fiction contains ~614 facts.
  • 6. Naïve solution: Entity presentation based on class summaries (Source: yahoo.com)
  • 7. Problems of class summaries
       1. The patterns are static and do not reflect the individual particularities of entities.
       2. A pattern needs to be created for each type, and class hierarchies need to be considered.
       3. Some entities have multiple (distinct) types with no clear main type.
       4. Some properties can have many values, for which no ranking or cut-off is defined.
       Figure labels: Person, Athlete, Body builder; Arnold Schwarzenegger; Angkor Wat.
  • 8. Entity Summarization
       Propositions:
       - Every entity is individual.
       - For different entities, different properties are of importance.
       - Entities of the same type do not always have the same attributes.
       - For each entity, a single property-value pair can be of different relevance.
       Solution: Focus on the individual particularities of each entity: Entity Summarization.
  • 9. 2. RESEARCH QUESTIONS
  • 10. Challenge #1
       - RDF data typically does not reflect importance levels in its relations.
       - Proprietary entity summarization systems have access to a lot of data (e.g., search queries) and infrastructure (e.g., a full Web index).
       - Other knowledge panel providers (such as publishers) lack that information and infrastructure.
       RQ1: How can we effectively summarize entities with limited background information?
       RQ1.1: How can we use link analysis effectively in order to derive summaries of entities?
       RQ1.2: How can we use usage data analysis effectively in order to derive summaries of entities?
       (Source: google.com)
  • 11. Challenge #2
       - Providers of knowledge panels hide the original graph structure behind strongly abstracted interfaces.
       - Standardized programmatic access is desirable (but not available).
       RQ2: Is there a minimum set of re-occurring/common features of entity summarization systems that allow us to provide a generic API?
       (Source: google.com) (Source: developers.google.com/knowledge-graph)
  • 12. Challenge #3
       - Different Web sources provide structured information about a single entity.
       - The different sources often cover similar information but do not provide corresponding links or vocabulary mappings.
       - Alignments are particularly difficult because the sources typically provide data at different levels of modeling granularity.
       RQ3: How can we align duplicate/similar facts about Linked Data entities on the Web?
       (Source: imdb.com) (Source: wikidata.org)
  • 13. 3. CONTRIBUTIONS
  • 14. Overview: Research Questions and Contributions
       Diagram components: Knowledge Base(s) (input); (1) LinkSUM (link structure); (2) UBES (usage data); (3) SUMMA API; UI (output); (4) Entity Data Fusion.
       RQ1: How can we effectively summarize entities with limited background information?
       RQ1.1: How can we use link analysis effectively in order to derive summaries of entities? (Contribution 1)
       RQ1.2: How can we use usage data analysis effectively in order to derive summaries of entities? (Contribution 2)
       RQ2: Is there a minimum set of re-occurring/common features of entity summarization systems that allow us to provide a generic API? (Contribution 3)
       RQ3: How can we align duplicate/similar facts about Linked Data entities on the Web? (Contribution 4)
  • 15. Linked Data Entity Summarization: Contribution 1 (LinkSUM, link structure)
  • 16. LinkSUM
       Idea: Use link analysis for selecting facts.
       Step 1: Select the top-k important related resources.
       Step 2: Select the most relevant connecting predicate.
  • 17. Approach: Resource Selection (LinkSUM)
       Example: Quentin Tarantino and Pulp Fiction, connected via "director".
       - Compute PageRank [5] scores (pr) of entities using the (un-typed) links that occur in the textual descriptions of entities.
       - Use "Backlinks" [7] (also called "mutual links") to find strong connections (bl). (Formula not preserved in this transcript.)
       - Combine both scores. (Formula not preserved in this transcript.)
       Example scores listed on the slide:
         dbpedia:Category:English-language_films   220.961
         dbpedia:Quentin_Tarantino                  13.7403
         dbpedia:John_Travolta                      10.5771
         dbpedia:Miramax_Films                       9.9398
         ...
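The resource selection step can be sketched as follows. This is only an illustration under stated assumptions, not the LinkSUM implementation: the slide's combination formula is not preserved, so the linear weighting with `alpha`, the normalization by the maximum score, and the name `rank_resources` are assumptions made here. The numeric scores are the ones listed on the slide and are treated as PageRank values; the backlink set is hypothetical toy data.

```python
from typing import Dict, List, Set, Tuple


def rank_resources(
    pagerank: Dict[str, float],
    backlinks: Set[str],
    alpha: float = 0.5,
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate resources by a weighted mix of PageRank and backlinks.

    Assumed scoring (not taken from the slides):
        score = alpha * pr / max(pr) + (1 - alpha) * bl
    where bl is 1.0 if the candidate links back to the entity, else 0.0.
    """
    if not pagerank:
        return []
    max_pr = max(pagerank.values())
    scored = []
    for resource, pr in pagerank.items():
        pr_norm = pr / max_pr if max_pr > 0 else 0.0
        bl = 1.0 if resource in backlinks else 0.0
        scored.append((resource, alpha * pr_norm + (1.0 - alpha) * bl))
    # Highest combined score first; keep only the top-k candidates.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Scores as listed on the slide (assumed here to be PageRank values).
PULP_FICTION_PR = {
    "dbpedia:Category:English-language_films": 220.961,
    "dbpedia:Quentin_Tarantino": 13.7403,
    "dbpedia:John_Travolta": 10.5771,
    "dbpedia:Miramax_Films": 9.9398,
}
# Hypothetical backlink information, for illustration only.
PULP_FICTION_BL = {"dbpedia:Quentin_Tarantino", "dbpedia:John_Travolta"}

TOP = rank_resources(PULP_FICTION_PR, PULP_FICTION_BL, alpha=0.5, k=3)
```

With these toy inputs, the backlink term lifts the two person resources above the high-PageRank category, which is the kind of effect a mutual-link signal is meant to have.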
  • 18. Approach: Relation Selection (LinkSUM)
       Problem: two resources can be connected by multiple relations (example: Quentin Tarantino and Pulp Fiction, connected via both "director" and "writer").
       Approaches:
       - Frequency (FRQ): the number of times the predicate is used.
       - Exclusivity (EXC): 1 / (N + M)
       - Description (DSC): #domain + #range + #label
       - Combinations of those, e.g. FRQ * EXC.
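The frequency and exclusivity measures can be sketched over a small triple list. This is an illustration under assumptions: the slide does not define N and M, so the code assumes N is the number of triples matching (s, p, ?) and M the number matching (?, p, o). The function names and the toy triples are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def frequency(triples: Sequence[Triple], predicate: str) -> int:
    """FRQ: how often the predicate is used in the whole dataset."""
    return sum(1 for _, p, _ in triples if p == predicate)


def exclusivity(triples: Sequence[Triple], s: str, p: str, o: str) -> float:
    """EXC = 1 / (N + M).

    Assumption: N counts triples (s, p, ?) and M counts triples (?, p, o).
    """
    n = sum(1 for s2, p2, _ in triples if s2 == s and p2 == p)
    m = sum(1 for _, p2, o2 in triples if p2 == p and o2 == o)
    return 1.0 / (n + m) if (n + m) > 0 else 0.0


def select_predicate(triples: Sequence[Triple], s: str, o: str) -> Optional[str]:
    """Pick the predicate connecting s and o with the highest FRQ * EXC."""
    candidates = sorted({p for s2, p, o2 in triples if s2 == s and o2 == o})
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda p: frequency(triples, p) * exclusivity(triples, s, p, o),
    )


# Hypothetical toy data, not taken from DBpedia.
TOY_TRIPLES = [
    ("Pulp_Fiction", "director", "Quentin_Tarantino"),
    ("Pulp_Fiction", "writer", "Quentin_Tarantino"),
    ("Pulp_Fiction", "starring", "John_Travolta"),
    ("Pulp_Fiction", "starring", "Uma_Thurman"),
    ("Kill_Bill", "director", "Quentin_Tarantino"),
    ("Kill_Bill", "writer", "Quentin_Tarantino"),
    ("Jackie_Brown", "director", "Quentin_Tarantino"),
]

BEST = select_predicate(TOY_TRIPLES, "Pulp_Fiction", "Quentin_Tarantino")
```

In this toy dataset "director" scores 3 * 1/4 and "writer" scores 2 * 1/3, so the combined measure picks "director".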
  • 19. Quantitative Evaluation: Dataset and Measures (LinkSUM)
       Reference dataset:
       - Introduced in Gunaratna et al. [3].
       - Contains human-created summaries of 50 entities (DBpedia 3.9, outgoing relations).
       - Includes seven top-5 and seven top-10 summaries for each entity.
       - Created by 15 experts from the Semantic Web field.
       Similarity measure: (formula not preserved in this transcript)
       Reference system: FACES (introduced in [3]).
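Since the similarity formula is missing from the transcript, the following sketch assumes one plausible reading: the quality of a system summary is its mean overlap with the human-created reference summaries. Both that definition and the name `summary_quality` are assumptions, and the toy summaries are hypothetical.

```python
from typing import Hashable, Sequence, Set


def summary_quality(
    system_summary: Set[Hashable],
    reference_summaries: Sequence[Set[Hashable]],
) -> float:
    """Mean number of items shared with each reference summary.

    Items can be (subject, object) pairs or full (s, p, o) triples,
    depending on whether predicates should count for the comparison.
    """
    if not reference_summaries:
        raise ValueError("at least one reference summary is required")
    overlaps = [len(system_summary & ref) for ref in reference_summaries]
    return sum(overlaps) / len(reference_summaries)


# Hypothetical top-5 summaries, items abbreviated to single letters.
SYSTEM = {"a", "b", "c", "d", "e"}
REFERENCES = [{"a", "b", "x", "y", "z"}, {"a", "c", "d", "e", "q"}]

QUALITY = summary_quality(SYSTEM, REFERENCES)
```

Here the overlaps are 2 and 4, giving a mean of 3.0 out of a possible 5.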
  • 20. Quantitative Evaluation: Results (LinkSUM)
       Results table not preserved in this transcript. Legend:
       - SO: subject-object pairs (predicates not considered).
       - SPO: full triple.
       - config-1, config-2: LinkSUM configurations (parameter settings not preserved in this transcript).
       - Markers: significance with respect to both LinkSUM configurations (p < 0.05); significance with respect to the best LinkSUM configuration (p < 0.05).
       - SD: standard deviation.
  • 21. Qualitative Evaluation: Setup (LinkSUM)
       Scenario: Search Engine Result Page (SERP).
       20 users, 10 entities (from the FACES dataset).
  • 22. Qualitative Evaluation: Results (LinkSUM)
       In some cases the task is subjective.
       Reasons for selection:
       - The presented related resources are relevant for the entity.
       Reasons for rejection:
       - Redundancy.
       - Related resources do not characterize the entity.
  • 23. Focus: PageRank (1) (LinkSUM)
       PageRank is not perfect. For example, the following query returns the five scientists with the highest PageRank scores:
         PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
         PREFIX dbo: <http://dbpedia.org/ontology/>
         PREFIX v: <http://purl.org/voc/vrank#>
         SELECT ?e ?r
         FROM <http://dbpedia.org>
         FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
         WHERE {
           ?e rdf:type dbo:Scientist ;
              v:hasRank/v:rankValue ?r .
         }
         ORDER BY DESC(?r) LIMIT 5
       Result:
         dbpedia:Carl_Linnaeus     551.791
         dbpedia:Charles_Darwin    215.028
         dbpedia:Albert_Einstein   186.549
         dbpedia:Isaac_Newton      167.811
         dbpedia:Sigmund_Freud     140.245
  • 24. Focus: PageRank (2) (LinkSUM)
       Important parameters (for resources r):
       - l(r): returns all pages that link to r.
       - c(r): the number of outgoing links of r.
       - d: the damping factor.
       Traditional PageRank [5]: (formula not preserved in this transcript)
       Variant: Weighted Links Rank (WLRank) [6]: (formula not preserved in this transcript)
       Link weights (lw): relative position of a link in the article [8].
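The two PageRank variants can be sketched with one iterative function. The slide formulas are not preserved, so this sketch assumes the common non-normalized form pr(r) = (1 - d) + d * sum over r' in l(r) of pr(r') * w(r', r), with w(r', r) = 1 / c(r') for traditional PageRank and w(r', r) = lw(r', r) divided by the total link weight leaving r' for the weighted variant. The function name, the fixed iteration count, and the toy graphs are assumptions.

```python
from typing import Dict, Tuple

# (source, target) -> link weight lw; use 1.0 everywhere for plain PageRank.
Links = Dict[Tuple[str, str], float]


def weighted_pagerank(links: Links, d: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Iteratively compute (weighted) PageRank scores.

    Each page passes on its score in proportion to the weight of each
    outgoing link. Pages without outgoing links pass nothing on in this
    simplified sketch.
    """
    nodes = sorted({node for edge in links for node in edge})
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), lw in links.items():
        out_weight[src] += lw

    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new_scores = {node: 1.0 - d for node in nodes}
        for (src, dst), lw in links.items():
            if out_weight[src] > 0:
                new_scores[dst] += d * scores[src] * lw / out_weight[src]
        scores = new_scores
    return scores


# Toy graph 1: a cycle with uniform weights (traditional PageRank).
CYCLE = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "a"): 1.0}
CYCLE_SCORES = weighted_pagerank(CYCLE)

# Toy graph 2: the link a -> b carries more weight than a -> c, e.g.
# because it is assumed to appear earlier in the article text.
WEIGHTED = {("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "a"): 1.0, ("c", "a"): 1.0}
WEIGHTED_SCORES = weighted_pagerank(WEIGHTED)
```

In the cycle every page keeps a score of 1.0, while in the weighted graph page b ends up above page c because it receives the larger share of a's score.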
  • 25. Focus: PageRank (3) (LinkSUM)
       Newly constructed rankings:
       - ALL: all links from the article text and from the templates.
       - ATL: article text links.
       - TEL: template links.
       - ATL-RP: article text links with WLRank and relative position.
       Size of the input dataset (# links):
         ALL      159,398,815
         ATL      142,305,605
         TEL       26,460,273
         ATL-RP   143,056,545
       Reference rankings (page-view-based):
       - TOWR-PV: "The Open Wikipedia Ranking"
       - SUB: SubjectiveEye3D by Paul Houle
  • 26. Focus: PageRank (4) (LinkSUM)
       Measure: Spearman rank correlation (range: [-1, 1]).
       Results: (table not preserved in this transcript)
       Conclusions:
       - The poor correlation of TEL with TOWR-PV/SUB is the result of a small input dataset.
       - Weighting by relative position improves the correlation with SUB.
       - These findings are supported by [4].
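As a reminder of how the comparison measure behaves, here is a minimal sketch of the Spearman rank correlation between two rankings. It uses the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is only valid when neither ranking contains ties. The function name and the toy scores are hypothetical, and this is not the evaluation code behind the slides.

```python
from typing import Dict


def spearman(scores_a: Dict[str, float], scores_b: Dict[str, float]) -> float:
    """Spearman rank correlation over the entities present in both rankings.

    Assumes there are no tied scores within either ranking.
    """
    common = sorted(set(scores_a) & set(scores_b))
    n = len(common)
    if n < 2:
        raise ValueError("need at least two common entities")

    def ranks(scores: Dict[str, float]) -> Dict[str, int]:
        # Rank 1 goes to the entity with the highest score.
        ordered = sorted(common, key=lambda entity: scores[entity], reverse=True)
        return {entity: position + 1 for position, entity in enumerate(ordered)}

    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    d_squared = sum((rank_a[e] - rank_b[e]) ** 2 for e in common)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical scores for four entities under two rankings.
LINK_BASED = {"w": 4.0, "x": 3.0, "y": 2.0, "z": 1.0}
VIEW_BASED = {"w": 40.0, "x": 20.0, "y": 30.0, "z": 10.0}
REVERSED = {"w": 1.0, "x": 2.0, "y": 3.0, "z": 4.0}

RHO = spearman(LINK_BASED, VIEW_BASED)
```

Swapping two neighbouring entities out of four lowers the correlation from 1.0 to 0.8, and a fully reversed ranking gives -1.0.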
  • 27. Conclusions and Impact (LinkSUM)
       Conclusions:
       - LinkSUM significantly outperforms the state of the art.
       - Entity summarization: the focus should be on selecting relevant resources, and redundancies at the object level should be avoided.
       - LinkSUM is lightweight and can be applied in other scenarios, e.g. Web sites with semantic annotations or Semantic MediaWikis.
       Impact:
       - Published and presented as a full research paper at ICWE 2016.
       - The PageRank scores are published online and have found many adopters (e.g., the official DBpedia SPARQL endpoint includes the scores).
       - In use at the WDAqua project (http://wdaqua.eu/).
  • 28. Linked Data Entity Summarization: Contribution 3 (SUMMA API)
  • 29. SUMMA API
       Idea: A common API for entity summaries.
       Use cases:
       - Quantitative evaluation.
       - Qualitative evaluation.
       - A/B testing.
       - Combination of summary services.
  • 30. Approach: SUMMA API (parameters and vocabulary)
       Parameters:
       - URI (of the entity e): the entity needs to be identified.
       - k (number): an upper limit of facts related to e.
       - Multi-language support.
       - Statement groups (e.g., biographical data).
       - Restriction to specific properties.
       - Multi-hop search space.
       SUMMA Vocabulary (terms shown in the diagram):
       - summa:Summary
       - summa:entity (rdfs:Resource)
       - summa:topK (xsd:positiveInteger)
       - summa:language (xsd:String)
       - summa:fixedProperty (rdf:Property)
       - summa:maxHops (xsd:positiveInteger)
       - summa:statement (rdf:Statement)
       - summa:group (summa:SummaryGroup)
       - summa:path
       Multi-hop example labels in the diagram: PF, JT, VV, a blank node (_:), starring, actor, role.
  • 31. Approach: SUMMA API (RESTful interaction)
       1. Client to server: POST
            [ a :Summary; :entity dbpedia:Barack_Obama; :topK 10 ] .
       2. Server to client: 201 CREATED
            Location: http://example.com/summary?entity=dbpedia:Barack_Obama&topK=10
            @prefix summa: <http://purl.org/voc/summa/> . ...
       3. Client to server: GET http://example.com/summary?entity=dbpedia:Barack_Obama&topK=10
       4. Server to client: 200 OK
            @prefix summa: <http://purl.org/voc/summa/> . ...
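The client side of this interaction can be sketched without any network access by building the request body and the expected summary URL. This is an illustration under assumptions: the helper names are hypothetical, the way a server derives the Location URL from the parameters is inferred from the example on the slide, and the `dbpedia:` prefix is bound to the usual DBpedia resource namespace.

```python
from urllib.parse import urlencode

SUMMA_NS = "http://purl.org/voc/summa/"
DBPEDIA_NS = "http://dbpedia.org/resource/"


def build_request_body(entity: str, top_k: int) -> str:
    """Serialize a summary request as Turtle, mirroring the POST body above.

    `entity` is expected as a prefixed name such as "dbpedia:Barack_Obama".
    """
    return (
        f"@prefix : <{SUMMA_NS}> .\n"
        f"@prefix dbpedia: <{DBPEDIA_NS}> .\n"
        f"[ a :Summary ;\n"
        f"  :entity {entity} ;\n"
        f"  :topK {top_k} ] .\n"
    )


def summary_location(base_url: str, entity: str, top_k: int) -> str:
    """Build the URL under which the created summary is assumed to be served."""
    # safe=":" keeps the prefixed name readable, as in the slide's example.
    query = urlencode({"entity": entity, "topK": top_k}, safe=":")
    return f"{base_url}?{query}"


BODY = build_request_body("dbpedia:Barack_Obama", 10)
LOCATION = summary_location("http://example.com/summary", "dbpedia:Barack_Obama", 10)
```

A client would POST `BODY` and then issue a GET on the URL returned in the Location header, which for the slide's example matches `LOCATION`.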
  • 32. Analysis: Setup (SUMMA API)
       Question: Can the user interfaces be generated with data from the SUMMA API without changing their layout?
       Search engines:
       - Google Knowledge Graph
       - Microsoft Bing Satori/Snapshots
       - Yahoo Knowledge
       News portals (Alexa Top 25 news sites):
       - Forbes
       - BBC News
  • 33. Analysis: Criteria (SUMMA API)
       Features:
       1. Property Restriction
       2. Statement Groups
       3. Multi-hop Search Space
       4. Languages
       Five entities:
       - Spain (country)
       - Dirk Nowitzki (person/athlete)
       - Ramones (band)
       - SAP (company/organization)
       - Inglourious Basterds (movie)
       (Source: http://google.com)
  • 34. Analysis: Results (SUMMA API)
       Which features were required by the respective system? (Results table not preserved in this transcript.)
  • 35. Conclusions and Impact (SUMMA API)
       Conclusions:
       - Decouple the user interface from the actual entity summarization system by defining a common API.
       - Lightweight and extensible vocabulary and interaction mechanism.
       - Reference implementations and their source code are publicly available.
       - Empirical analysis demonstrates applicability in real-world scenarios.
       Impact:
       - Published and presented as a full research paper at ICWE 2015.
       - Best Paper Candidate at ICWE 2015.
       - Best Demo Award at ICWE 2016.
  • 36. Institute of Applied Informatics and Formal Description Methods (AIFB) 36 Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 4. RELATED WORK
  • 37. Institute of Applied Informatics and Formal Description Methods (AIFB) 37 Related Work Who else is working on this? Google [1], Microsoft, Yahoo, etc. Other researchers in the field of the Semantic Web e.g. Cheng et al. [2] Gunaratna et al. [3] What distinguishes the presented work from theirs? LinkSUM is a lightweight and effective approach. UBES is the first approach that uses usage data for entity summarization. SUMMA API: first and currently only API definition that enables the exchange of entity summaries. Entity Data Fusion: First approach that focuses on general alignment of structured entity data on the Web. Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 RDF + lots of background data (Only) RDF data
  • 38. Institute of Applied Informatics and Formal Description Methods (AIFB) 38 Andreas Thalhammer – Linked Data Entity Summarization03.10.2018 5. SUMMARY AND OUTLOOK
  • 39. Institute of Applied Informatics and Formal Description Methods (AIFB) 39 We provided contributions for Linked Data Entity Summarization. Impact was created on the levels of research and dataset/system adoption. Combination with entity linking is possible. The addressed problem is highly relevant for search and question answering engines. Summary Andreas Thalhammer – Linked Data Entity Summarization03.10.2018
  • 40. Outlook. Full integration of the entity data fusion approach. Addressing literal values. Personalized/contextualized summaries of entities. Abstract entity summarization.
  • 41. Questions?
  • 42. Publications. Contribution 1: Andreas Thalhammer, Nelia Lasierra, Achim Rettinger: LinkSUM: Using Link Analysis to Summarize Entity Data. In Web Engineering: 16th International Conference, ICWE 2016, Proceedings, vol. 9671 of Lecture Notes in Computer Science, pages 244–261. Springer, 2016. Andreas Thalhammer, Achim Rettinger: Browsing DBpedia Entities with Summaries. In The Semantic Web: ESWC 2014 Satellite Events, Lecture Notes in Computer Science, pages 511–515. Springer, 2014. Andreas Thalhammer, Achim Rettinger: PageRank on Wikipedia: Towards General Importance Scores for Entities. In The Semantic Web: ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 – June 2, 2016, Revised Selected Papers, pages 227–240. Springer, 2016. Contribution 2: Andreas Thalhammer, Ioan Toma, Antonio J. Roa-Valverde, Dieter Fensel: Leveraging Usage Data for Linked Data Movie Entity Summarization. In Proceedings of the 2nd International Workshop on Usage Analysis and the Web of Data (USEWOD’12), 2012. Andreas Thalhammer, Magnus Knuth, Harald Sack: Evaluating Entity Summarization Using a Game-Based Ground Truth. In International Semantic Web Conference (2), vol. 7650, pages 350–361. Springer, 2012. Contribution 3: Antonio Roa-Valverde, Andreas Thalhammer, Ioan Toma, Miguel-Angel Sicilia: Towards a formal model for sharing and reusing ranking computations. In Proceedings of the 6th International Workshop on Ranking in Databases, in conjunction with VLDB 2012. Andreas Thalhammer, Steffen Stadtmüller: SUMMA: A Common API for Linked Data Entity Summaries. In P. Cimiano, F. Frasincar, G.-J. Houben, and D. Schwabe, editors, Engineering the Web in the Big Data Era, vol. 9114, pages 430–446. Springer, 2015. Andreas Thalhammer, Achim Rettinger: ELES: Combining Entity Linking and Entity Summarization. In Web Engineering: 16th International Conference, ICWE 2016, Proceedings, vol. 9671 of Lecture Notes in Computer Science, pages 547–550. Springer, 2016. Contribution 4: Andreas Thalhammer, Steffen Thoma, Andreas Harth: Entity-Centric Claim Reconciliation in Web Data. Submitted to WWW 2017.
  • 43. References. [1] A. Singhal. Introducing the knowledge graph: things, not strings. http://goo.gl/kH1NKq, 2012. [2] G. Cheng, T. Tran, and Y. Qu. RELIN: relatedness and informativeness-based centrality for entity summarization. In Proc. of the 10th Int. Conf. on The Semantic Web, Vol. Part I, ISWC’11. Springer, 2011. [3] K. Gunaratna, K. Thirunarayan, and A. P. Sheth. FACES: diversity-aware entity summarization using incremental hierarchical conceptual clustering. In Proc. of the 29th AAAI Conf. on Artificial Intelligence, Austin, Texas, USA, 2015. [4] D. Dimitrov, P. Singer, F. Lemmerich, and M. Strohmaier. What Makes a Link Successful on Wikipedia? https://arxiv.org/abs/1611.02508 [5] S. Brin and L. Page. The Anatomy of a Large-scale Hypertextual Web Search Engine. In Proceedings of the Seventh International Conference on World Wide Web 7, WWW7, pages 107–117. Elsevier Science Publishers B. V., Amsterdam, The Netherlands, 1998. [6] R. Baeza-Yates and E. Davis. Web Page Ranking Using Link Attributes. In Proceedings of the 13th International World Wide Web Conference on Alternate Track Papers & Posters, WWW Alt. ’04, pages 328–329, New York, NY, USA, 2004. ACM. [7] J. Waitelonis and H. Sack. Towards exploratory video search using linked data. Multimedia Tools and Applications, 59:645–672, 2012. doi:10.1007/s11042-011-0733-1. [8] A drawing by Felipe Micaroni Lalli (micaroni@gmail.com).

Editor's notes

  1. Good afternoon. I would like to welcome the committee and the audience to my PhD defense. My name is Andreas Thalhammer, and the title of my PhD thesis is “Linked Data Entity Summarization”.
  2. Wikidata is a Wikimedia project ...
  3. Roughly 600 facts. Now you could say: that’s too much, just show me the top part.
  4. Show facts in a common order: release date, rating, ... This seems reasonable. But the second one has an important part missing: “it was the first animated feature film by Walt Disney, it is based on a fairy tale”.
  5. Arnold Schwarzenegger – body builder, actor, politician Angkor Wat – tourist attraction, human-built structure, Hindu and Buddhist temple
  6. Example: for Snow White the production company is of particular importance; for Pulp Fiction not so much. Ocean: Sri Lanka (Indian Ocean); Austria doesn’t. If two movies have John Travolta as an actor, it might be more important for the one and not so important for the other.
  7. So why is it desirable? To exchange, combine, and remix summaries, and to evaluate summaries in different ways.
  8. Baeza-Yates
  9. Filling the gap between approaches that have large amounts of background data and those that only use RDF.