Information on the temporal interval of validity for facts described by RDF triples plays an important role in a large number of applications. Yet, most of the knowledge bases available on the Web of Data do not provide such information in an explicit manner. In this paper, we present a generic approach which addresses this drawback by inserting temporal information into knowledge bases. Our approach combines two types of information to associate RDF triples with time intervals. First, it relies on temporal information gathered from the document Web by an extension of the fact validation framework DeFacto. Second, it harnesses the time information contained in knowledge bases. This knowledge is combined within a three-step approach which comprises the steps matching,
selection and merging. We evaluate our approach against a corpus of facts gathered from Yago2 by using DBpedia and Freebase as input and different parameter settings for the underlying algorithms. Our results suggest that we can detect temporal information for facts from DBpedia
with an F-measure of up to 70%.
Hybrid acquisition of temporal scopes for rdf data
1. Hybrid Acquisition of Temporal Scopes for RDF Data
Anisa Rula1, Matteo Palmonari1, Axel-Cyrille Ngonga Ngomo2,
Daniel Gerber2, Jens Lehmann2, and Lorenz Bühmann2
1. University of Milano-Bicocca, SITI Lab
2. Universität Leipzig, Institut für Informatik, AKSW
3. Temporal Scoping of RDF triples
Some facts are always valid while other facts are valid for a certain time interval (volatile facts)
Volatile facts are represented by triples whose validity is defined by a time interval, i.e. the temporal scope
[Figure: temporally annotated RDF triples. <Alexandre Pato, team, A.C. Milan> carries the temporal scope 2007-2013 and <Alexandre Pato, team, S.C. Corinthians> the temporal scope 2013-2014. Temporal scopes are represented by time intervals.]
4. Motivation
The world changes: relations represented in RDF triples may be valid only
for a specific time interval [Gutierrez et al.,2005]
o E.g. <Alexandre_Pato, team, A.C._Milan> [2007,2013]
Many applications have to use temporally annotated RDF triples
o E.g. Temporal Query Answering, Question Answering over KBs, Temporal
Reasoning, Timelines
Challenges
Low availability and quality of temporal information in RDF data
NLP challenges for web-scale temporal information extraction
(scalability, availability of corpus, conflicting information) [Derczynski et al., 2013]
Motivation & Challenges
Temporally annotated RDF triples are largely
unavailable or incomplete in the LOD
(Rula et al., 2012)
5. Approach Overview: Use the Web as Source of Evidence
Use evidence from the Web for temporal scoping of RDF triples
[Figure: the World Wide Web (1.8 Billion) serves as source of evidence for the Web of Data - RDF (61.9 Billion). The triples <Alexandre Pato, team, A.C. Milan> and <Alexandre Pato, team, S.C. Corinthians> become temporally annotated RDF triples with the temporal scopes 2007-2013 (A.C. Milan) and 2013-2014 (S.C. Corinthians).]
6. Approach Overview: Hybrid Acquisition of Time Scopes
[Figure: pipeline for a fact <s,p,o>. Temporal Information Extraction gathers time points with their numbers of occurrences from the Web of Documents (t1 occ1, t2 occ2, t3 occ3, t4 occ4) and relevant time points from the Web of Data (t1, t2, t3, …, tn). Mapping facts to time intervals then runs Matching, Selection and Reasoning. The output is a set of temporally annotated RDF triples <s,p,o>[x1,y1],…,[xn,yn], i.e. a set of disconnected time intervals.]
7. Temporal Information Extraction - Web Documents
DeFacto [Lehmann et al., 2012]
Retrieves a set of webpages that
confirm the given RDF triple
The RDF triple issued to the search
engine is verbalized by using natural
language patterns
Temporal Extension for DeFacto (TempDeFacto)
Apply a named entity tagger to extract the entities of type Date
Look for occurrences of the labels of the subject and object that are fewer than 20 tokens apart
Analyze the context window of n characters before and after the subject-object occurrences in order to retrieve the time points
Return a distribution vector of dates and their numbers of occurrences
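The extraction steps listed above can be illustrated with a minimal Python sketch. It is not the TempDeFacto implementation: a four-digit year regular expression stands in for the named entity tagger, and the function names, the `max_tokens` and `window` parameters and their defaults are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a plain year pattern stands in for the Date named entity tagger.
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")

def occurrences(text, label):
    """Return the (start, end) character span of every occurrence of label."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(label), text)]

def date_vector(text, subject, obj, max_tokens=20, window=40):
    """Count the years found around close subject-object co-occurrences."""
    hits = set()  # (position, year) pairs, so overlapping windows count once
    for s_start, s_end in occurrences(text, subject):
        for o_start, o_end in occurrences(text, obj):
            # Text between the two labels; skip pairs that are too far apart.
            gap = text[min(s_end, o_end):max(s_start, o_start)]
            if len(gap.split()) >= max_tokens:
                continue
            # Context window of `window` characters before and after the pair.
            lo = max(0, min(s_start, o_start) - window)
            hi = max(s_end, o_end) + window
            for m in YEAR.finditer(text, lo, hi):
                hits.add((m.start(), m.group()))
    return Counter(year for _, year in hits)

text = ("Pato played for A.C. Milan from 2007 to 2013. "
        "A.C. Milan's top striker Pato left in 2013.")
vector = date_vector(text, "Pato", "Milan")
```

The returned counter plays the role of the distribution vector: one entry per date, holding its number of occurrences near the subject and object labels.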
8. Temporal Information Extraction - Web Documents
<Alexandre_Pato,team, A.C._Milan>
“Alexandre Pato” “played for” “A.C. Milan”
“Pato” “’s striker” “Milan”
“CR7” “Mi”
Pato played for A.C. Milan from 2007 to 2013.
A.C. Milan’s top striker Pato left in 2013.
In 2013 Pato visited Milan for a short holiday.
[Figure: occurrences of the labels of the subject and object, then the context window of n characters before and after subject-object occurrences, then the NamedEntityTagger, produce the DeFacto Vector (dfv).]
DeFacto Vector (dfv)
Date   Occurrences
2013   17
2007   11
2006   1
….     ….
2010   4
2009   4
1989   2
9. Temporal Information Extraction - Web of Data
[Figure: the RDF document d describing <Alexandre_Pato> is retrieved via content negotiation, and regular expressions extract the relevant time points from it.]
Relevant Time Points
T_Alexandre_Pato = {1989, 2000, 2006, 2007, 2008, 2013}
Relevant Interval Matrix (RIM)
The set of time intervals for a given triple with starting and ending time points defined with the set of relevant time points
       1989   2000   2006   2007   2008   2013
1989   null   null   null   null   null   null
2000   0      null   null   null   null   null
2006   0      0      null   null   null   null
2007   0      0      0      null   null   null
2008   0      0      0      0      null   null
2013   0      0      0      0      0      null
$$\forall\, rim_{t_i t_j} \in RIM_e \text{ with } i, j > 0: \quad rim_{t_i t_j} = \begin{cases} null & \text{for } i \le j \\ 0 & \text{for } i > j \end{cases}$$
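As a small illustration of the RIM definition (null for i ≤ j, 0 for i > j), the matrix can be built from the relevant time points as in the Python sketch below. The function names are hypothetical and `None` is used to stand for null.

```python
def build_rim(time_points):
    """Return the sorted time points and a matrix with None for i <= j, 0 otherwise."""
    points = sorted(set(time_points))
    n = len(points)
    rim = [[None if i <= j else 0 for j in range(n)] for i in range(n)]
    return points, rim

def candidate_intervals(points, rim):
    """List the intervals [t_i, t_j] whose cell is still open (None)."""
    n = len(points)
    return [(points[i], points[j])
            for i in range(n) for j in range(n) if rim[i][j] is None]

points, rim = build_rim([1989, 2000, 2006, 2007, 2008, 2013])
intervals = candidate_intervals(points, rim)
```

For the six relevant time points of Alexandre_Pato this leaves 21 open cells, one per interval whose start does not come after its end.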
11. Mapping Facts to Time Intervals - Selection
Significance Matrix (SM)
       1989   2000   2006   2007   2008   2013
1989   0.004  0.166  0.166  0.736  0.8    2.48
2000   0      0      0.142  1.5    1.555  4.2
2006   0      0      0.002  6      4.666  7.5
2007   0      0      0      0.026  6.5    8.428
2008   0      0      0      0      0.004  8
2013   0      0      0      0      0      0.040
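One reading of how the SM cells above the diagonal are filled, following the speaker notes at the end of the deck (the average number of time point occurrences per year in the interval), is sketched below in Python. The function name and the dictionary form of the DeFacto vector are assumptions. The diagonal is left at 0 because the notes mention a separate penalizing formula for one-year intervals without giving it.

```python
def significance_matrix(points, dfv):
    """Fill cells with i < j with the average occurrences per year in [t_i, t_j]."""
    points = sorted(points)
    n = len(points)
    sm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            start, end = points[i], points[j]
            total = sum(count for year, count in dfv.items()
                        if start <= year <= end)
            sm[i][j] = total / (end - start + 1)  # years in a closed interval
    return sm

# Only counts that are legible on the DeFacto vector slide are used here.
dfv = {1989: 2, 2006: 1, 2007: 11}
sm = significance_matrix([1989, 2000, 2006, 2007], dfv)
```

With these three counts the sketch gives 2/12 for [1989, 2000], 1/7 for [2000, 2006], 12/8 for [2000, 2007] and 12/2 for [2006, 2007], which agrees with the values 0.166, 0.142, 1.5 and 6 shown in the SM.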
2. Mapping Selection:
top-k function: selects the k intervals that have the highest scores in the SM
neighbor-x: selects a set of intervals whose significance score is close to the maximum significance score in the SM matrix, up to a certain threshold x
neighbor-k-x: selects the top-k intervals in the neighborhood of the interval with the highest significance score
Examples:
top-k, k = 3: [2006, 2013] [2007, 2013] [2008, 2013]
neighbor-x, x = 23: [2007, 2008] [2006, 2013] [2007, 2013] [2008, 2013]
neighbor-k-x, k = 2, x = 23: [2007, 2013] [2008, 2013]
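The three selection functions can be sketched in Python as follows. The sketch assumes that the threshold x is a percentage of the maximum significance score, which is the reading consistent with the example results on this slide. The function names and the dictionary representation of the scores are illustrative.

```python
def top_k(scores, k):
    """Return the k intervals with the highest significance scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def neighbor_x(scores, x):
    """Return the intervals scoring within x percent of the maximum score."""
    threshold = max(scores.values()) * (1 - x / 100)  # assumption: x is a percentage
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [interval for interval in ranked if scores[interval] >= threshold]

def neighbor_k_x(scores, k, x):
    """Return the top-k intervals inside the x percent neighborhood."""
    return neighbor_x(scores, x)[:k]

# Highest off-diagonal scores read from the SM on this slide.
scores = {
    (2007, 2013): 8.428, (2008, 2013): 8.0, (2006, 2013): 7.5,
    (2007, 2008): 6.5, (2006, 2007): 6.0, (2006, 2008): 4.666,
}
```

On these scores the three functions reproduce the intervals listed for top-k with k = 3, neighbor-x with x = 23, and neighbor-k-x with k = 2 and x = 23.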
12. Mapping Facts to Time Intervals - Reasoning
3. Interval merging via reasoning based on the relations of Allen's algebra
[Figure: for <Alexandre_Pato, playsFor, A.C._Milan>, the selected intervals [2007, 2013] and [2008, 2013] are merged into [2007, 2013].]
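A minimal Python sketch of the merging step is given below. Which of Allen's relations trigger a merge is not stated on the slide, so the sketch assumes that any two closed intervals sharing at least one year are merged and that disjoint intervals are kept apart. The function name is hypothetical.

```python
def merge_intervals(intervals):
    """Merge closed year intervals that share at least one year."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # The interval intersects the previous one: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

scope = merge_intervals([(2007, 2013), (2008, 2013)])
```

For the two intervals selected for Alexandre Pato and A.C. Milan this yields the single interval [2007, 2013], while intervals that do not intersect stay separate, in line with a temporal scope being a set of disconnected time intervals.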
13. Experimental Setup - Dataset
Dataset   # facts   Domain        Property   Equivalent Property (Freebase)   Equivalent Property (Yago2)
DBpedia   1000      Sport         team       team                             playsFor
DBpedia   1000      Politicians   office     government_positions_held        holdsPoliticalPosition
DBpedia   500       Celebrities   spouse     spouse                           ismarriedTo
Dataset: 2500 DBpedia triples with semantically equivalent triples in Freebase
and Yago2
Gold standard: triples annotated with temporal scopes in Yago2
manually curated to correct missing or wrong values
14. Experimental Setup - Evaluation Measures
The evaluation measures capture the degree of overlap between the
retrieved intervals and the intervals in the gold standard
Precision (for a triple): fraction of the time points in the temporal scope that fall into the time interval in the gold standard
Recall (for a triple): fraction of the time points in the gold standard that are covered by the temporal scope
F1 measure (for a triple): the harmonic mean of precision and recall
Macro-averaged F1 (avgF1): F1 averaged over a set of triples
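Examples (Ref = interval in the gold standard, R = retrieved interval):
Ref [2007, 2011], R [2007, 2011]: F1 = 1
Ref [2007, 2011], R [2006, 2012]: F1 = 0.83
Ref [2007, 2011], R [2008, 2010]: F1 = 0.75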
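The measures can be made concrete with a short Python sketch that treats every interval as the set of years it covers, since dates are considered at year level. The function names are illustrative, and a temporal scope is passed as a list of closed intervals.

```python
def covered_years(intervals):
    """Return the set of years covered by a list of closed year intervals."""
    years = set()
    for start, end in intervals:
        years.update(range(start, end + 1))
    return years

def precision_recall_f1(retrieved, gold):
    """Compute per-triple precision, recall and F1 over covered years."""
    r, g = covered_years(retrieved), covered_years(gold)
    overlap = len(r & g)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(r)
    recall = overlap / len(g)
    return precision, recall, 2 * precision * recall / (precision + recall)

def macro_f1(pairs):
    """Average the per-triple F1 over (retrieved, gold) pairs."""
    return sum(precision_recall_f1(r, g)[2] for r, g in pairs) / len(pairs)

gold = [(2007, 2011)]
f1_narrow = precision_recall_f1([(2008, 2010)], gold)[2]
f1_wide = precision_recall_f1([(2006, 2012)], gold)[2]
```

For the reference interval [2007, 2011], the retrieved interval [2008, 2010] has precision 3/3 and recall 3/5, and [2006, 2012] has precision 5/7 and recall 5/5, which gives the F1 values 0.75 and 0.83 of the example.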
15. Experimental Results - Accuracy of Best Configurations for all Properties
Different sources for the creation of the RIM
Set up different configurations in the selection and reasoning steps:
o E.g. config top-3 refers to selection function top-3 and reasoning = yes
Temp prop                | DBpedia                     | Freebase                     | TemporalDeFacto
                         | Config      #facts   avgF1  | Config       #facts   avgF1  | Config   #facts   avgF1
playsFor                 | top-1 loc   264      0.505  | top-1 loc    213      0.477  | top-3    311      0.511
holdsPoliticalPosition   | neigh-10    702      0.699  | neigh-10-2   242      0.549  | top-3    709      0.586
ismarriedTo              | neigh-10    702      0.600  | neigh-10     524      0.547  | top-3    709      0.545
Good quality of the approach with an avgF1 of up to 70%
Using evidence from RDF documents, the performance can be significantly improved (significantly better results for two properties and negligibly worse results for one property)
16. Experimental Results - Accuracy with vs. without Reasoning for all Properties
The best configurations for the three properties
Temp prop                | Source        | Configuration   | With reasoning    | Without reasoning
                         |               |                 | #fact    avgF1    | #fact    avgF1
playsFor                 | TempDeFacto   | top-3           | 311      0.511    | 505      0.467
holdsPoliticalPosition   | DBpedia       | neigh-10        | 702      0.699    | 822      0.667
ismarriedTo              | DBpedia       | neigh-10        | 705      0.600    | 977      0.563
The best results are obtained when reasoning is enabled
17. Conclusions & Future Work
Summary
Temporal extension of the DeFacto framework
Modeling a space of relevant time intervals given an RDF triple
Mapping volatile facts to time intervals based on a three-phase algorithm
Unsupervised method
Future work
Determine when to add or not to add the temporal scope based on the
confidence of the acquisition process
Collect additional relevant time points to improve the overall results
Show the effectiveness of acquired temporal scopes in temporal query
answering
18. Thank you for your attention
Questions?
#eswc2014Rula
19. References
[Rula&2012] Anisa Rula, Matteo Palmonari, Andreas Harth, Steffen Stadtmüller,
Andrea Maurino: On the Diversity and Availability of Temporal Information in
Linked Open Data. International Semantic Web Conference (1) 2012: 492-507
[Gutierrez&2005] C. Gutierrez, C. A. Hurtado, and A. A. Vaisman. Temporal RDF.
In The 2nd ESWC, pages 93-107, 2005
[Lehmann&2012] Jens Lehmann, Daniel Gerber, Mohamed Morsey, Axel-Cyrille
Ngonga Ngomo: DeFacto - Deep Fact Validation. International Semantic Web
Conference (1) 2012: 312-327
[Ling&2010] X. Ling and D. S. Weld. Temporal information extraction. In 25th
AAAI, 2010.
[Derczynsk&2013] L. Derczynski and R. Gaizauskas. Information retrieval for
temporal bounding. In 4th ICTIR, pages 29:129–29:130. ACM, 2013.
Editor's notes
A temporal
1.8 Billion from http://www.worldwidewebsize.com/
Note: we also consider more temporal annotation per triple!
Temporally annotated RDF triples are useful for many reasons...
-facts are usually considered as time invariant while in reality they dynamically change
Large problem space (even at high temporal granularity levels, e.g., all possible time intervals at year granularity)
Can be used as a dimension along which facts can be organized, ranked or explored
Relevancy ranking purposes
Finally we return a distribution of all dates and their number of occurrences in a given context. Hence, the output of temporal DeFacto for a fact f <s, p, o> can be regarded as a vector DFV over all possible time points ti whose ith entry is the number of co-occurrences of s or o with ti
The links between the facts and the date are lost
We assume that temporal triples contain relevant
Dates are considered at year level
Each cell in the SM represents the significance of the interval identified by the cell for the given fact based on the distribution of time points acquired from the web
- inject a time distribution vector into the entity-level RIM by producing a significance matrix SM
Each cell of the matrix where i<j is calculated as the number of time point occurrences included in the interval [i,j] divided by the number of years in the interval (the average number of occurrences per year contained in the interval)
For the diagonal we alternatively provide another formula that penalizes intervals of 1 year by giving a weight to the number of time points in the diagonal
-macro precision as the average of all facts
Difficulty of the task
Sufficient relevant time points
Macro average
Difficult task since it depends on the number of available time points
For many facts we are very precise
For others less so
Future: understand why the imprecise ones are imprecise, and provide a confidence