Towards Encoding Time in Text-Based Entity Embeddings
Federico Bianchi, Matteo Palmonari and Debora Nozza
University of Milano-Bicocca
INSID&S Lab (Interaction and Semantics for Innovation with Data & Services)
MIND Lab (Models in Decision making and data analysis)
International Semantic Web Conference, Monterey, California. 2018
Knowledge Graphs
Large knowledge bases
Entities classified using types
Types organized in sub-type graphs
Binary relationships between entities
Semantics and inference via rules/axioms
Semantic similarity with lexical, topological and other feature-based approaches
[Figure: example knowledge graph with the entities A.S. Roma, Kostas Manolas, Garry Kasparov and Real Madrid, the relationship "team", and the types Thing, Person, Athlete, Soccer Player, Chess Player, Organis., Sports Club and Soccer Club]
Knowledge Graph Embeddings
Generate vector representations of entities and relationships
[Figure: given a KG as input (e.g., Kostas Manolas, team, A.S. Roma), an embedding algorithm generates vector representations of the entities and of the relationship]
Why should we embed?
● Latent components (e.g., → link prediction)
● Features generation (e.g., → entity linking)
● Fast and intuitive way to compute similarity
From Word Embeddings to Text-based Entity Embeddings
- Word embeddings (e.g., [Mikolov+, 2013])
- Text-based Entity Embeddings
- Text as main source vs. graph as main source [Bordes+,2013][Trouillon+,2016]
- Typed Entity Embeddings (TEE): use word embedding algorithms on documents where entities and types replace words
- Pros: good for similarity evaluation
- Cons: no embedding of relations, only of entities
[Figure: word embeddings (matrices labelled C and W) learned from a corpus such as "The big black cat eats its food. My little black cat sleeps all day. Sometimes my cat eats too much!"; the words cat, black, eats and dog are shown as vectors. Similar words correspond to similar vectors]
TEE: Typed Entity Embeddings from Text
[Bianchi+,2017b] [Bianchi+, 2018a]
Input: Wikipedia’s abstracts, e.g., “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.”
1. Link to DBpedia entities via named entity linking tools: “dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …”
2. Replace entities with their most specific types: “dbo:City dbo:Country dbo:City dbo:Administrative_Region …”
3. Generate entity vectors from the entity text and type vectors from the type text
4. Concatenate the two vectors, e.g., v(Rome) followed by v(City): 1 3 6 3 19 5 6
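The entity-to-type replacement and the final concatenation can be sketched in a few lines of Python. This is only a toy illustration: the type lookup table, the helper names and the way the slide's example numbers are split between v(Rome) and v(City) are assumptions, and the vectors stand in for what a word embedding algorithm would learn from the entity and type documents.

```python
# Toy illustration of the TEE steps; mappings and vectors are hypothetical.

# Assumed output of an entity linker on the Rome abstract.
entity_doc = ["dbr:Rome", "dbr:Italy", "dbr:Rome", "dbr:Lazio"]

# Hypothetical lookup from an entity to its most specific DBpedia type.
most_specific_type = {
    "dbr:Rome": "dbo:City",
    "dbr:Italy": "dbo:Country",
    "dbr:Lazio": "dbo:Administrative_Region",
}


def to_type_doc(entities, type_of):
    """Replace every entity mention with its most specific type."""
    return [type_of[e] for e in entities]


def typed_entity_vector(entity, entity_vecs, type_vecs, type_of):
    """Concatenate the entity vector with the vector of its type."""
    return entity_vecs[entity] + type_vecs[type_of[entity]]


type_doc = to_type_doc(entity_doc, most_specific_type)

# Stand-ins for learned vectors; the 4 + 3 split is arbitrary.
entity_vecs = {"dbr:Rome": [1.0, 3.0, 6.0, 3.0]}
type_vecs = {"dbo:City": [19.0, 5.0, 6.0]}

tee_rome = typed_entity_vector(
    "dbr:Rome", entity_vecs, type_vecs, most_specific_type
)
print(type_doc)
print(tee_rome)
```

Concatenation keeps the entity and type signals in separate dimensions, so a cosine similarity over the concatenated vector is influenced by both.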
Why Time?
● To the best of our knowledge, this is the first approach to explicitly encode time periods into entity embeddings
● We expect time to be important when we evaluate similarity between entities:
○ Entities are similar when they co-occur frequently, and entities that share a time period co-occur. E.g., the most similar entities to “Winston Churchill” are his contemporary politicians, such as Harold Macmillan
● In this paper we provide an approach to explicitly encode time, so that the resulting representations can be used to control similarity with respect to time
Textual Descriptions of Time Periods via Events
“The succession of events is an inherent property of our time
perception. Memory is necessary, and the order of these
events is fundamental”
Snaider et al., 2012, Cognitive Systems Research
Embedding Years from Event Descriptions
A year is represented by the set of entities taking part in the year’s events
The year vector is the average of the entities’ vectors found inside the description
Example (illustrative values):
Adolf Hitler    4 3 6 2 3
Nazi Germany    5 1 2 9 2
World War II    1 2 8 4 1
AVG → 1941      9 2 3 5 5
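A minimal sketch of the averaging step, assuming the year vector is the plain element-wise mean of the entity vectors. The three entity vectors are the toy values from the slide and the helper name is hypothetical. The exact mean computed here differs from the 1941 vector drawn on the slide, which appears to be illustrative only.

```python
def average_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


# Toy entity vectors copied from the slide.
entity_vecs = {
    "Adolf Hitler": [4, 3, 6, 2, 3],
    "Nazi Germany": [5, 1, 2, 9, 2],
    "World War II": [1, 2, 8, 4, 1],
}

# Entities assumed to appear in the event description of 1941.
entities_1941 = ["Adolf Hitler", "Nazi Germany", "World War II"]

year_1941 = average_vectors([entity_vecs[e] for e in entities_1941])
print([round(x, 2) for x in year_1941])
```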
Towards Time-Aware Similarity
Time flattened similarity: reduces the impact of time on the similarity.
E.g., make US presidents similar regardless of their temporal context.
Time boosted similarity: boosts the impact of time on the similarity.
E.g., make politicians that share temporal contexts more similar.
Time Flattened Similarity
What’s the time flattened similarity between Barack Obama and Bill Clinton?
1. Extract the embeddings for the two entities.
2. Find the closest year vectors to the two entity embeddings (e.g., the entity vector of Barack Obama is close to the vector of the year 2003; the two closest years in this example are 1999 and 2003).
3. Combine the cosine similarity η of the two entity vectors with the normalized cosine similarity ηn of the two year vectors:
$$\psi(e_1, e_2) = \alpha\,\eta(e_1, e_2) - (1 - \alpha)\,\eta_n(y_1, y_2)$$
Here $e_1, e_2$ are the entity vectors, $y_1, y_2$ are their closest year vectors (1999 and 2003), and $\alpha$ controls the weight of the time factor.
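The measure can be sketched as follows. The slides do not say how the cosine similarity is normalized, so rescaling it from [-1, 1] to [0, 1] is an assumption here, as are the function names and the toy three-dimensional vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalized_cosine(u, v):
    """Assumption: cosine rescaled from [-1, 1] to [0, 1]."""
    return (cosine(u, v) + 1.0) / 2.0


def closest_year(entity_vec, year_vecs):
    """Year whose vector has the highest cosine with the entity."""
    return max(year_vecs, key=lambda y: cosine(entity_vec, year_vecs[y]))


def time_flattened_similarity(e1, e2, year_vecs, alpha):
    """alpha * eta(e1, e2) - (1 - alpha) * eta_n(y1, y2)."""
    y1 = year_vecs[closest_year(e1, year_vecs)]
    y2 = year_vecs[closest_year(e2, year_vecs)]
    return alpha * cosine(e1, e2) - (1.0 - alpha) * normalized_cosine(y1, y2)


# Hypothetical toy vectors, chosen so each entity maps to a different year.
year_vecs = {1999: [1.0, 0.2, 0.0], 2003: [0.8, 0.6, 0.0]}
obama = [0.7, 0.6, 0.3]
clinton = [0.9, 0.1, 0.3]

print(closest_year(obama, year_vecs), closest_year(clinton, year_vecs))
print(time_flattened_similarity(obama, clinton, year_vecs, alpha=0.7))
```

With alpha = 1 the measure reduces to plain cosine similarity. Lower values of alpha penalize pairs whose closest years are similar, which is what pushes entities that are far apart in time up the ranking.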
Experiments: Research Questions
1. Quality: properties of the year embeddings
2. Similarity and Time:
a. Time Bias in TEE and EE: Effect of time in entity embeddings from text
i. Adherence to Natural Time Order
ii. Clustering WWI and WWII Battles
iii. Relative Ordering of Entities
b. Controlling Time Bias: handling the effect of time
Embedded Representations vs. Natural Time Flow
PCA in 1D vs. natural order of years: Kendall τ = 0.80 and Spearman rank correlation coefficient = 0.94
Good resemblance of natural time flow!
[Figure: 2D and 1D PCA projections of the year vectors, with the 191X years and the 201X years labelled]
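The two rank correlations can be computed with plain Python once the years are projected to one dimension. The sketch below assumes there are no ties. The projection values are hypothetical and are not the ones behind the reported 0.80 and 0.94.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau for two sequences without ties."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) / 2
    return (concordant - discordant) / n_pairs


def ranks(values):
    """1-based ranks of the values (no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation for sequences without ties."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


years = [1990, 1991, 1992, 1993, 1994]
# Hypothetical 1D PCA coordinates of the five year vectors.
projection = [-2.1, -0.4, -0.9, 0.8, 1.7]

print(round(kendall_tau(years, projection), 2))
print(round(spearman_rho(years, projection), 2))
```

The sign of a principal component is arbitrary, so a projection that runs backwards in time would give the same correlations with a negative sign.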
Time Bias: Adherence to Natural Time Order
Task: count the number of entities shared by sequences of 2-3 contiguous years vs. the number of entities shared by non-contiguous years (randomly sampled), e.g., 1991-1992 vs. 1934-1992.
Dataset: sequences of two and three contiguous years and of non-contiguous years (1931-1991).
Results: contiguous years share a higher number of entities than non-contiguous years.
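The counting task amounts to intersecting the entity sets of the years in a sequence. A small sketch, where the entity identifiers and their assignment to years are placeholders rather than data from the experiment:

```python
def shared_entities(year_entities, years):
    """Entities that appear in the descriptions of all given years."""
    return set.intersection(*(year_entities[y] for y in years))


# Placeholder entity sets per year (hypothetical).
year_entities = {
    1934: {"entity_a", "entity_b"},
    1991: {"entity_c", "entity_d", "entity_e"},
    1992: {"entity_d", "entity_e", "entity_f"},
}

contiguous = shared_entities(year_entities, [1991, 1992])
non_contiguous = shared_entities(year_entities, [1934, 1992])
print(len(contiguous), len(non_contiguous))
```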
Time Bias: Clustering Battles with EE
Task: classify battles as belonging to WWI or WWII.
Dataset: 152 resource identifiers of WWI (63) and WWII (89) battles from Wikipedia.
Method: K-means clustering (K=2) on the vector representation in the entity
embedding space.
Results: 95% accuracy. Centroids of the two groups are close to WWI years and
WWII years respectively.
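A minimal sketch of the clustering evaluation, assuming plain Lloyd-style K-means with K=2 and an accuracy computed under the better of the two possible cluster-to-war assignments. The two-dimensional points and the initial centroids are hypothetical stand-ins for battle embeddings.

```python
import math


def kmeans(points, init_centroids, iterations=10):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def clustering_accuracy(labels, gold):
    """Accuracy under the better of the two label mappings (K=2 only)."""
    agree = sum(1 for lab, g in zip(labels, gold) if lab == g)
    return max(agree, len(labels) - agree) / len(labels)


# Hypothetical 2D embeddings: first three "WWI", last three "WWII".
points = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0],
          [0.9, 1.0], [1.0, 0.8], [0.8, 0.9]]
gold = [0, 0, 0, 1, 1, 1]

labels, centroids = kmeans(points, [points[0], points[-1]])
print(labels, clustering_accuracy(labels, gold))
```

Cluster indices are arbitrary, which is why the accuracy is taken over both possible mappings of clusters to wars.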
Controlling Time Bias: Flattened Similarity
Task: find entities that are similar to a given input entity but far from it in time. E.g., find past presidents given one.
Example for the input Barack Obama: Ford, Coolidge, Hoover, T. Kennedy and Truman are retrieved (four correct, one wrong).
Controlling Time Bias: Flattened Similarity
Dataset: US president entities and British prime minister entities (19 and 19)
Method: start with the 6 most recent presidents or prime ministers of each group. For each of them, count the older ones that appear in the ranked list produced by each similarity measure.
Time-flattened similarity reorders the top-100 results from cosine similarity.
Algorithms:
● Time-Aware Similarity TEE (TATEE), with time-flattened similarity;
● Similarity TEE (STEE), standard neighborhood with cosine;
● Time-Aware Similarity EE (TAEE), with time-flattened similarity;
● Similarity EE (SEE), standard neighborhood with cosine;
● Time-flattened similarity on Wiki2Vec (baseline).
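The reordering and counting steps of this evaluation can be sketched as below. The candidate names, the scores and both helper functions are hypothetical. Each candidate carries its cosine similarity to the query and the normalized similarity between its closest year and the query's closest year.

```python
def rerank_time_flattened(candidates, alpha):
    """Reorder candidates by alpha * cosine - (1 - alpha) * year similarity.

    candidates: list of (name, cosine_to_query, normalized_year_similarity).
    """
    scored = [
        (alpha * cos - (1.0 - alpha) * year_sim, name)
        for name, cos, year_sim in candidates
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored]


def count_older(ranked, older, k):
    """Number of older office holders among the first k results."""
    return sum(1 for name in ranked[:k] if name in older)


# Hypothetical top results for one query entity.
candidates = [
    ("recent_a", 0.90, 0.95),
    ("recent_b", 0.85, 0.80),
    ("older_a", 0.80, 0.40),
    ("older_b", 0.75, 0.30),
]
older = {"older_a", "older_b"}

by_cosine = rerank_time_flattened(candidates, alpha=1.0)
flattened = rerank_time_flattened(candidates, alpha=0.5)
print(count_older(by_cosine, older, k=2), count_older(flattened, older, k=2))
```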
Controlling Time Bias: Flattened Similarity
Results: time-flattened similarity on TATEE obtains the best results in this experiment. One reason is that TATEE considers type representations, so it can easily retrieve entities that share types.
Controlling Time Bias: Qualitative Analysis
The most similar entities to Barack Obama in TEE:
● Cosine similarity: Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden
● Time flattened similarity to reorder the top-100 most similar, alpha = 0.7: Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, Ford, Coolidge, T. Kennedy, Hoover (the last four are new)
● Time flattened similarity to reorder the top-100 most similar, alpha = 0.1: Ford, Coolidge, Hoover, Truman, Roosevelt, Wilson, E. Roosevelt, Harding, Cleveland, Eisenhower (all ten are new)
Conclusions and Future Work
Conclusions
● Time can be represented in the vector space using event descriptions
● Time sneaks into entity similarity (time bias)
● Time bias can be controlled by considering explicit representations of time periods
Future Work
● Study compositionality of time period representations
● Comparison with Doc2Vec
● Improve time-aware similarity measure
● Comparison with other KG embeddings models
References
Snaider, J., McCall, R., & Franklin, S. (2012). Time production and representation in a conceptual and computational cognitive
model. Cognitive Systems Research, 13(1), 59-71.
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling
multi-relational data. In Advances in neural information processing systems (pp. 2787-2795).
Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016, June). Complex embeddings for simple link prediction. In
International Conference on Machine Learning (pp. 2071-2080).
Tran, N. K., Tran, T., & Niederée, C. (2017, May). Beyond time: Dynamic context-aware entity recommendation. In European
Semantic Web Conference (pp. 353-368). Springer, Cham.
Bianchi, F., Soto, M., Palmonari, M., & Cutrona, V. (2018). Type vector representations from text: An empirical analysis. In Deep
Learning for Knowledge Graphs and Semantic Technologies Workshop, co-located with the Extended Semantic Web
Conference, Crete.
Bianchi, F., Palmonari, M., & Nozza, D. (2018). Towards encoding time in text-based entity embeddings. In International Semantic Web Conference (to appear), Monterey, California.
Bianchi, F., Palmonari, M., Cremaschi, M., & Fersini, E. (2017, May). Actively learning to rank semantic associations for personalized contextual exploration of knowledge graphs. In European Semantic Web Conference (pp. 120-135). Springer, Cham.
Bianchi, F., & Palmonari, M. (2017). Joint learning of entity and type embeddings for analogical reasoning with entities. In Proceedings of the NL4AI Workshop, co-located with the International Conference of the Italian Association for Artificial Intelligence (AI*IA).
Thank you!
Qualitative Evaluation of Time Flattened Similarity
Input: Winston Churchill
● Method: Cosine similarity. Most similar: Harold Macmillan. Tony Blair and Gordon Brown are 49th and 41st in the list of most similar entities.
● Method: Time-flattened similarity. Most similar: Margaret Thatcher. Tony Blair and Gordon Brown are 16th and 14th in the list of most similar entities.

More Related Content

Similar to Towards Encoding Time in Text-Based Entity Embeddings

TOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORATOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORAcsandit
 
Persuasive Essay On Capital Punishment. Essay on Capital Punishment Internat...
Persuasive Essay On Capital Punishment. Essay on Capital Punishment  Internat...Persuasive Essay On Capital Punishment. Essay on Capital Punishment  Internat...
Persuasive Essay On Capital Punishment. Essay on Capital Punishment Internat...Monica Clark
 
Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Richard Ingram
 
Surfacing Real-World Event Content on Twitter
Surfacing Real-World Event Content on TwitterSurfacing Real-World Event Content on Twitter
Surfacing Real-World Event Content on TwitterHila Becker
 
Tutorial semantic wikis and applications
Tutorial   semantic wikis and applicationsTutorial   semantic wikis and applications
Tutorial semantic wikis and applicationsMark Greaves
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph FuturesPaul Groth
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Leon Derczynski
 
CeB - f - s01
CeB - f - s01CeB - f - s01
CeB - f - s01gauvins
 
Relational database
Relational databaseRelational database
Relational databaseSanthiNivas
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisJonathan Stray
 
Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chiBarbara Starr
 
Temporal Case Management 1998
Temporal Case Management  1998Temporal Case Management  1998
Temporal Case Management 1998David Tryon
 
2015 07-tuto2-clus type
2015 07-tuto2-clus type2015 07-tuto2-clus type
2015 07-tuto2-clus typejins0618
 
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...Daniel Katz
 
Session 18, Oegema
Session 18, OegemaSession 18, Oegema
Session 18, Oegemacsrcomm
 
From text to entities: Information Extraction in the Era of Knowledge Graphs
From text to entities: Information Extraction in the Era of Knowledge GraphsFrom text to entities: Information Extraction in the Era of Knowledge Graphs
From text to entities: Information Extraction in the Era of Knowledge GraphsGraphRM
 

Similar to Towards Encoding Time in Text-Based Entity Embeddings (20)

Data journalism: Data rules, while data rule
Data journalism: Data rules, while data ruleData journalism: Data rules, while data rule
Data journalism: Data rules, while data rule
 
Data Science: Case "Political Communication 2/2"
Data Science: Case "Political Communication 2/2"Data Science: Case "Political Communication 2/2"
Data Science: Case "Political Communication 2/2"
 
TOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORATOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORA
 
Persuasive Essay On Capital Punishment. Essay on Capital Punishment Internat...
Persuasive Essay On Capital Punishment. Essay on Capital Punishment  Internat...Persuasive Essay On Capital Punishment. Essay on Capital Punishment  Internat...
Persuasive Essay On Capital Punishment. Essay on Capital Punishment Internat...
 
Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012Visualising data: Seeing is Believing - CS Forum 2012
Visualising data: Seeing is Believing - CS Forum 2012
 
Surfacing Real-World Event Content on Twitter
Surfacing Real-World Event Content on TwitterSurfacing Real-World Event Content on Twitter
Surfacing Real-World Event Content on Twitter
 
Tutorial semantic wikis and applications
Tutorial   semantic wikis and applicationsTutorial   semantic wikis and applications
Tutorial semantic wikis and applications
 
Knowledge Graph Futures
Knowledge Graph FuturesKnowledge Graph Futures
Knowledge Graph Futures
 
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
Mining Social Media with Linked Open Data, Entity Recognition, and Event Extr...
 
CeB - f - s01
CeB - f - s01CeB - f - s01
CeB - f - s01
 
Relational database
Relational databaseRelational database
Relational database
 
Frontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text AnalysisFrontiers of Computational Journalism week 2 - Text Analysis
Frontiers of Computational Journalism week 2 - Text Analysis
 
Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics
Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics
Noah A Smith - 2017 - Invited Keynote: Squashing Computational Linguistics
 
Kdd 2014 tutorial bringing structure to text - chi
Kdd 2014 tutorial   bringing structure to text - chiKdd 2014 tutorial   bringing structure to text - chi
Kdd 2014 tutorial bringing structure to text - chi
 
Temporal Case Management 1998
Temporal Case Management  1998Temporal Case Management  1998
Temporal Case Management 1998
 
2015 07-tuto2-clus type
2015 07-tuto2-clus type2015 07-tuto2-clus type
2015 07-tuto2-clus type
 
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
Measuring the Complexity of the Law: The United States Code ( Slides by Danie...
 
Session 18, Oegema
Session 18, OegemaSession 18, Oegema
Session 18, Oegema
 
M21 and RDA
M21 and RDAM21 and RDA
M21 and RDA
 
From text to entities: Information Extraction in the Era of Knowledge Graphs
From text to entities: Information Extraction in the Era of Knowledge GraphsFrom text to entities: Information Extraction in the Era of Knowledge Graphs
From text to entities: Information Extraction in the Era of Knowledge Graphs
 

Recently uploaded

Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat ViagraToko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagraadet6151
 
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7gragkhusi
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理pyhepag
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyRafigAliyev2
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunksgmuir1066
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Valters Lauzums
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Jon Hansen
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxStephen266013
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group MeetingAlison Pitt
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancingmohamed Elzalabany
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一fztigerwe
 
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一0uyfyq0q4
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理pyhepag
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...ssuserf63bd7
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfRobertoOcampo24
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理cyebo
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdfvyankatesh1
 
如何办理新加坡国立大学毕业证(NUS毕业证)学位证成绩单原版一比一
如何办理新加坡国立大学毕业证(NUS毕业证)学位证成绩单原版一比一如何办理新加坡国立大学毕业证(NUS毕业证)学位证成绩单原版一比一
如何办理新加坡国立大学毕业证(NUS毕业证)学位证成绩单原版一比一hwhqz6r1y
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...Amil baba
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp onlinebalibahu1313
 

Recently uploaded (20)

Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat ViagraToko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
Toko Jual Viagra Asli Di Salatiga 081229400522 Obat Kuat Viagra
 
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp  Number 24/7
ℂall Girls Balbir Nagar ℂall Now Chhaya ☎ 9899900591 WhatsApp Number 24/7
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
如何办理哥伦比亚大学毕业证(Columbia毕业证)成绩单原版一比一
 
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
如何办理滑铁卢大学毕业证(Waterloo毕业证)成绩单本科学位证原版一比一
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
如何办理新加坡国立大学毕业证(NUS毕业证)学位证成绩单原版一比一
如何办理新加坡国立大学毕业证(NUS毕业证)学位证成绩单原版一比一如何办理新加坡国立大学毕业证(NUS毕业证)学位证成绩单原版一比一
如何办理新加坡国立大学毕业证(NUS毕业证)学位证成绩单原版一比一
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 

Towards Encoding Time in Text-Based Entity Embeddings

  • 1. Towards Encoding Time in Text-Based Entity Embeddings Federico Bianchi, Matteo Palmonari and Debora Nozza University of Milano-Bicocca INSID&S Lab Interaction and Semantics for Innovation with Data & Services International Semantic Web Conference, Monterey, California. 2018 MIND Lab Models in Decision making and data analysis
  • 2. Knowledge Graphs Large knowledge bases Entities classified using types Types organized in sub-types graphs Binary relationships between entities Semantics and inference via rules/axioms Semantic similarity with lexical, topological and other feature-based approaches A.S. Roma Kostas Manolas team Soccer Player Soccer Club Athlete Thing Person Sports Club Garry Kasparov Chess Player Real Madrid Organis.
  • 3. Knowledge Graphs Embeddings Generate vector representations of entities and relationships A.S. Roma Kostas Manolas team 2 5 6 2 6 4 2 12 5 2 Kostas Manolas A.S. Roma 4 2 12 5 2 team Given in input a KG Generate vector representations Embedding Algorithm Why should we embed? ● Latent components (e.g., → link prediction) ● Features generation (e.g., → entity linking) ● Fast and intuitive way to compute similarity
  • 4. From Word Embeddings to Text-based Entity Embeddings - Word embeddings (e.g., [Mikolov+, 2013]) - Text-based Entity Embeddings - Text as main source vs. Graph as main source [Bordes+,2013][Trouillon+,2016] - Typed Entity Embeddings (TEE): use word embeddings algorithms on documents where entities and types replace words (next slide :) ) - Pros: good for similarity evaluation - Cons: no embedding of relations, just entity corpus cat black eats dog similar words corresponds to similar vectors C W The big black cat eats its food. My little black cat sleeps all day. Sometimes my cat eats too much!
  • 5-10. TEE: Typed Entity Embeddings from Text [Bianchi+,2017b] [Bianchi+, 2018a]. (1) Start from Wikipedia’s abstracts, e.g., “Rome is the capital of Italy and a special comune (named Comune di Roma Capitale). Rome also serves as the capital of the Lazio region.” (2) Link to DBpedia entities via named entity linking tools: “dbr:Rome dbr:Italy dbr:Rome dbr:Lazio …” (3) Replace entities with their most specific types: “dbo:City dbo:Country dbo:City dbo:Administrative_Region …” (4) Generate entity vectors from the entity text and type vectors from the type text. (5) Concatenate the entity vector and the type vector, e.g., v(Rome) with v(City).
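The type-replacement and concatenation steps of the TEE slides can be sketched as follows. This is a minimal illustration, not the authors' implementation: the lookup table `MOST_SPECIFIC_TYPE`, the helper names and the toy vectors are assumptions, and the training of the entity and type vectors with a word embedding algorithm is left out.

```python
# Hypothetical lookup from a DBpedia entity to its most specific type,
# using only the pairs that appear on the slide.
MOST_SPECIFIC_TYPE = {
    "dbr:Rome": "dbo:City",
    "dbr:Italy": "dbo:Country",
    "dbr:Lazio": "dbo:Administrative_Region",
}


def to_type_sequence(entity_sequence, type_of):
    """Replace every entity in the sequence with its most specific type."""
    return [type_of[entity] for entity in entity_sequence]


def typed_entity_vector(entity, entity_vectors, type_vectors, type_of):
    """Concatenate the entity vector with the vector of the entity's type."""
    return entity_vectors[entity] + type_vectors[type_of[entity]]


# Entity sequence obtained from the abstract after entity linking.
entity_sequence = ["dbr:Rome", "dbr:Italy", "dbr:Rome", "dbr:Lazio"]
type_sequence = to_type_sequence(entity_sequence, MOST_SPECIFIC_TYPE)

# Toy vectors (made up); in practice they would come from an embedding model
# trained on the entity text and on the type text respectively.
entity_vectors = {"dbr:Rome": [0.1, 0.3, 0.6]}
type_vectors = {"dbo:City": [0.5, 0.2]}
rome_tee = typed_entity_vector(
    "dbr:Rome", entity_vectors, type_vectors, MOST_SPECIFIC_TYPE
)
```

Here `rome_tee` has the dimensionality of the entity vector plus that of the type vector, which is what lets similarity take shared types into account.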
  • 11. Why Time? ● To the best of our knowledge, this is the first approach to explicitly encode time periods into entity embeddings ● We expect time to be important when we evaluate similarity between entities: ○ Entities are similar when they co-occur frequently, and entities that share a time period co-occur ○ The most similar entities to “Winston Churchill” are his contemporary politicians (e.g., Harold Macmillan) ● In this paper we try to provide an approach to explicitly encode time in such a way that we can use those representations to control the similarity with respect to time
  • 12-13. Textual Descriptions of Time Periods via Events: “The succession of events is an inherent property of our time perception. Memory is necessary, and the order of these events is fundamental” (Snaider et al., 2012, Cognitive Systems Research)
  • 14-17. Embedding Years from Event Descriptions: a year is represented by the set of entities taking part in the year’s events. The year vector is the average of the vectors of the entities found inside the description. [Figure: the vector of the year 1941 is obtained as the average (AVG) of the vectors of Adolf Hitler, Nazi Germany and World War II; the numbers in the figure are illustrative]
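The averaging step can be sketched as follows. The entity names come from the slide, but the three-dimensional vectors are made-up toy values (they are not the numbers drawn in the figure), and `year_vector` is a hypothetical helper name.

```python
def year_vector(entity_names, entity_vectors):
    """Average, dimension by dimension, the vectors of a year's entities."""
    vectors = [entity_vectors[name] for name in entity_names]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy entity vectors (assumed values, for illustration only).
entity_vectors = {
    "Adolf Hitler": [4.0, 3.0, 6.0],
    "Nazi Germany": [5.0, 1.0, 2.0],
    "World War II": [0.0, 2.0, 1.0],
}

# Entities found in the event description of the year 1941.
vector_1941 = year_vector(
    ["Adolf Hitler", "Nazi Germany", "World War II"], entity_vectors
)
```

Because the year vector lives in the same space as the entity vectors, it can be compared with any entity using the same similarity measure.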
  • 18. Towards Time-Aware Similarity. Time-flattened similarity: reduces the impact of time on the similarity, e.g., makes US presidents similar independently of their temporal context. Time-boosted similarity: boosts the impact of time on the similarity, e.g., makes politicians that share temporal contexts more similar.
  • 19-25. Time Flattened Similarity. What’s the time flattened similarity between Barack Obama and Bill Clinton? (1) Extract the embeddings for the two entities. (2) Find the closest year vectors to the two entity embeddings (e.g., the entity vector of Barack Obama is close to the vector of the year 2003; the example uses the years 1999 and 2003). (3) Take the cosine similarity η of the two entity vectors, subtract the normalized cosine similarity η_n of the two year vectors, and use α to control the weight of the time factor: $\psi(e_1, e_2) = \alpha\,\eta(e_1, e_2) - (1 - \alpha)\,\eta_n(y_1, y_2)$, where $e_1, e_2$ are the entity vectors and $y_1, y_2$ are their closest year vectors.
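A minimal sketch of the measure, under stated assumptions: the slides do not say how the cosine similarity between year vectors is normalized, so `normalized_cosine` below simply rescales it from [-1, 1] to [0, 1]; the two-dimensional vectors are toy values, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of the same length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def normalized_cosine(u, v):
    """Assumption: rescale cosine from [-1, 1] to [0, 1]."""
    return (cosine(u, v) + 1.0) / 2.0


def closest_year(entity_vector, year_vectors):
    """Return the year whose vector is most similar to the entity vector."""
    return max(year_vectors, key=lambda y: cosine(entity_vector, year_vectors[y]))


def time_flattened_similarity(e1, e2, year_vectors, alpha):
    """alpha * cosine(entities) - (1 - alpha) * normalized cosine(closest years)."""
    y1 = year_vectors[closest_year(e1, year_vectors)]
    y2 = year_vectors[closest_year(e2, year_vectors)]
    return alpha * cosine(e1, e2) - (1.0 - alpha) * normalized_cosine(y1, y2)


# Toy data (made up): two year vectors and two entity vectors.
year_vectors = {1999: [1.0, 0.0], 2003: [0.0, 1.0]}
entity_a = [0.2, 1.0]  # closest to the 2003 vector
entity_b = [1.0, 0.3]  # closest to the 1999 vector

psi = time_flattened_similarity(entity_a, entity_b, year_vectors, alpha=0.5)
```

With `alpha=1.0` the measure reduces to plain cosine similarity; lowering `alpha` penalizes pairs of entities whose closest years are similar.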
  • 26. Experiments: Research Questions 1. Quality: properties of the year embeddings 2. Similarity and Time: a. Time Bias in TEE and EE: Effect of time in entity embeddings from text i. Adherence to Natural Time Order ii. Clustering WWI and WWII Battles iii. Relative Ordering of Entities b. Controlling Time Bias: handling the effect of time
  • 27. Embedded Representations vs. Natural Time Flow. PCA in 1D vs. natural order of years: Kendall τ = 0.80 and Spearman rank correlation coefficient = 0.94. Good resemblance of natural time flow! [Figures: 2D projection (PCA) and 1D projection (PCA) of the year vectors, with the 191X years and the 201X years labeled]
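The comparison between a 1D projection and the natural order of years can be sketched with two rank correlations. This sketch assumes the 1D PCA coordinates are already computed and that there are no ties; the four coordinates below are made-up toy values, not the paper's data.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau for two equally long sequences without ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


def ranks(values):
    """Rank positions (1 = smallest), assuming all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation for sequences without ties."""
    n = len(xs)
    squared_diffs = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * squared_diffs / (n * (n * n - 1))


# Toy example: years and hypothetical 1D PCA coordinates of their vectors.
years = [1911, 1912, 1913, 1914]
projection = [-1.5, 0.2, -0.4, 1.3]

tau = kendall_tau(years, projection)
rho = spearman_rho(years, projection)
```

Both coefficients equal 1 when the projection preserves the order of the years exactly and decrease as more pairs of years are swapped.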
  • 28. Time Bias: Adherence to Natural Time Order Task: count the number of entities shared by sequences of 2-3 contiguous years vs. the number of entities shared by non-contiguous years (randomly sampled), e.g., 1991-1992 vs. 1934-1992. Dataset: two and three contiguous years and non-contiguous years (1931-1991). Results: contiguous years share a higher number of entities than non-contiguous years.
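The counting task can be sketched with set intersections. The mapping `entities_by_year` and the placeholder entity identifiers are hypothetical toy data; only the year pairs (1991-1992 vs. 1934-1992) come from the slide.

```python
def shared_entities(entities_by_year, years):
    """Entities that take part in the events of every year in `years`."""
    return set.intersection(*(entities_by_year[year] for year in years))


# Toy data: placeholder identifiers stand in for the entities of each year.
entities_by_year = {
    1934: {"e1", "e2", "e3"},
    1991: {"e4", "e5", "e6"},
    1992: {"e5", "e6", "e7"},
}

contiguous = shared_entities(entities_by_year, [1991, 1992])
non_contiguous = shared_entities(entities_by_year, [1934, 1992])
```

Comparing `len(contiguous)` with `len(non_contiguous)` over many sampled year sequences gives the kind of count the task describes.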
  • 29. Time Bias: Clustering Battles with EE Task: classify battles as belonging to WWI or WWII. Dataset: 152 resource identifiers of WWI (63) and WWII (89) battles from Wikipedia. Method: K-means clustering (K=2) on the vector representations in the entity embedding space. Results: 95% accuracy. The centroids of the two groups are close to the WWI years and the WWII years respectively.
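A minimal sketch of the method, assuming a plain K-means with a deterministic seeding (first point and the point farthest from it) and an accuracy computed under the better of the two cluster-to-class assignments. The slides do not specify either choice, and the two-dimensional points below are toy values, not battle embeddings.

```python
import math


def kmeans_two(points, iterations=20):
    """Plain K-means with K=2 and a simple deterministic seeding (assumption)."""
    centroids = [points[0], max(points, key=lambda p: math.dist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min((0, 1), key=lambda k: math.dist(p, centroids[k])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


def clustering_accuracy(labels, truth):
    """Accuracy under the better of the two cluster-to-class assignments."""
    agree = sum(label == t for label, t in zip(labels, truth))
    return max(agree, len(truth) - agree) / len(truth)


# Toy 2D "battle" vectors: class 0 and class 1 stand in for WWI and WWII.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
truth = [0, 0, 0, 1, 1, 1]

labels, centroids = kmeans_two(points)
accuracy = clustering_accuracy(labels, truth)
```

The returned centroids can then be compared with the year vectors, for example with cosine similarity, to check which years each cluster is closest to.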
  • 30-32. Controlling Time Bias: Flattened Similarity Task: find entities that are similar to a given input entity but far from it in time, e.g., find past presidents given one. Example for Barack Obama: Ford, Coolidge, Hoover, T. Kennedy and Truman are retrieved (four marked correct, one marked wrong).
  • 33. Controlling Time Bias: Flattened Similarity Dataset: US president entities and British prime minister entities (19 and 19). Method: start with the 6 most recent presidents for each group. For each entity, compute the number of older presidents that are in the ranked list created by the similarity measures. Time-flattened similarity reorders the top-100 results from cosine similarity. Algorithms: ● Time-aware Similarity TEE (TATEE), with time-flattened similarity; ● Similarity TEE (STEE) (standard neighborhood with cosine); ● Time-Aware Similarity EE (TAEE), with time-flattened similarity; ● Similarity EE (SEE) (standard neighborhood with cosine); ● Time-flattened similarity Wiki2Vec (Baseline).
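The re-ranking and counting steps can be sketched as follows. This assumes that, for each candidate, the cosine similarity to the query and the normalized similarity between the closest years have already been computed; the candidate names, scores and the cut-off `k` are hypothetical toy values.

```python
def rerank_time_flattened(candidates, alpha):
    """Reorder candidates by alpha * cosine - (1 - alpha) * year similarity.

    `candidates` is a list of (name, cosine_to_query, year_similarity) tuples,
    e.g. the top-100 results of a plain cosine search.
    """
    scored = [
        (alpha * cos - (1.0 - alpha) * year_sim, name)
        for name, cos, year_sim in candidates
    ]
    return [name for _, name in sorted(scored, reverse=True)]


def count_older(ranked, older_entities, k):
    """Number of older entities among the first k ranked results."""
    return sum(name in older_entities for name in ranked[:k])


# Toy candidates: A and B are contemporaries of the query, C and D are older.
candidates = [
    ("A", 0.9, 0.95),
    ("B", 0.8, 0.90),
    ("C", 0.7, 0.20),
    ("D", 0.6, 0.10),
]
older = {"C", "D"}

cosine_ranking = rerank_time_flattened(candidates, alpha=1.0)
flattened_ranking = rerank_time_flattened(candidates, alpha=0.7)
```

In this toy example the older entities C and D move to the top once the time factor is weighted in, so `count_older` rises for the flattened ranking.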
  • 34. Controlling Time Bias: Flattened Similarity Results: time-flattened similarity on TEE (TATEE) appears to obtain the best results. This is partly because TATEE considers type representations and can therefore easily retrieve entities that share types.
  • 35-37. Controlling Time Bias: Qualitative Analysis. The most similar entities to Barack Obama using cosine similarity in TEE: Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, J. Kerry, D. Cheney, McCain, Biden. Time-flattened similarity used to reorder the top-100 most similar, alpha = 0.7: Clinton, Reagan, G. Bush, Carter, Al Gore, Nixon, Ford, Coolidge, T. Kennedy, Hoover (the last four are new). Time-flattened similarity used to reorder the top-100 most similar, alpha = 0.1: Ford, Coolidge, Hoover, Truman, Roosevelt, Wilson, E. Roosevelt, Harding, Cleveland, Eisenhower (all new).
  • 38. Conclusions and Future Work. Conclusions ● Time can be represented in the vector space using event descriptions ● Time sneaks into entity similarity (time bias) ● Time bias can be controlled by considering explicit representations of time periods. Future Work ● Study compositionality of time period representations ● Comparison with Doc2Vec ● Improve time-aware similarity measure ● Comparison with other KG embedding models
  • 39. References Snaider, J., McCall, R., & Franklin, S. (2012). Time production and representation in a conceptual and computational cognitive model. Cognitive Systems Research, 13(1), 59-71. Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., & Yakhnenko, O. (2013). Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems (pp. 2787-2795). Trouillon, T., Welbl, J., Riedel, S., Gaussier, É., & Bouchard, G. (2016, June). Complex embeddings for simple link prediction. In International Conference on Machine Learning (pp. 2071-2080). Tran, N. K., Tran, T., & Niederée, C. (2017, May). Beyond time: Dynamic context-aware entity recommendation. In European Semantic Web Conference (pp. 353-368). Springer, Cham. Bianchi, F., Soto, M., Palmonari, M., & Cutrona, V. (2018). Type vector representations from text: An empirical analysis. In Deep Learning for Knowledge Graphs and Semantic Technologies Workshop, co-located with the Extended Semantic Web Conference, Crete. Bianchi, F., Palmonari, M., & Nozza, D. (2018). Towards encoding time in text-based entity embeddings. In International Semantic Web Conference (to appear), Monterey, California.
  • 40. References Bianchi, F., Palmonari, M., Cremaschi, M., & Fersini, E. (2017, May). Actively learning to rank semantic associations for personalized contextual exploration of knowledge graphs. In European Semantic Web Conference (pp. 120-135). Springer, Cham. Bianchi, F., & Palmonari, M. (2017). Joint learning of entity and type embeddings for analogical reasoning with entities. In Proceedings of the NL4AI Workshop, co-located with the International Conference of the Italian Association for Artificial Intelligence (AI*IA).
  • 42. Qualitative Evaluation of Time Flattened Similarity. Input: Winston Churchill. Method: cosine similarity. Entities shown: Harold Macmillan, Tony Blair, Gordon Brown, with ranks (in slide order) most similar, 49th and 41st in the list of most similar entities.
  • 43. Qualitative Evaluation of Time Flattened Similarity. Input: Winston Churchill. Method: time-flattened similarity. Entities shown: Margaret Thatcher, Tony Blair, Gordon Brown, with ranks (in slide order) most similar, 16th and 14th in the list of most similar entities.