Exploring Term Selection for Geographic Blind Feedback
1. Exploring Term Selection for Geographic Blind Feedback
Johannes Leveling
Intelligent Information and Communication Systems (IICS)
University of Hagen (FernUniversität in Hagen)
58084 Hagen, Germany
firstname.lastname@fernuni-hagen.de
GIR 2007 Workshop, Lisbon, Portugal
2. Outline
1 Introduction
2 Creating a Geographical Knowledge Base
  GeoNames Data
  PND Data
3 Experiments on Geographic Blind Feedback
  Experimental Settings
  Results
  Discussion
4 Outlook
3. Blind Feedback
General idea:
Improve IR performance by expanding a query
1 The original query Qo is processed and an initial ranked result set Ro of documents is obtained
2 D documents from Ro are selected and presumed to be relevant
3 T terms from these documents are extracted for relevance feedback
4 Qo is modified into the final query Qf, merging the extracted terms into the query and possibly re-weighting all terms
5 The final result set Rf is retrieved with the query Qf
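The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system used in the experiments: `toy_search`, `DOCS`, and the frequency-based term ranking are hypothetical stand-ins for a real retrieval engine and term-scoring scheme, and step 4 merges terms without re-weighting.

```python
from collections import Counter

# Hypothetical toy collection: each document is a list of tokens.
DOCS = [
    ["flood", "rhine", "cologne"],
    ["flood", "rhine", "bonn", "cologne"],
    ["election", "berlin"],
]

def toy_search(terms):
    """Rank documents by how often the query terms occur in them (toy scoring)."""
    scored = [(sum(doc.count(t) for t in terms), i) for i, doc in enumerate(DOCS)]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [DOCS[i] for score, i in ranked if score > 0]

def blind_feedback(query_terms, search, d=5, t=30):
    """Expand a query with up to T frequent new terms from the top D documents."""
    initial_results = search(query_terms)        # step 1: Qo -> Ro
    feedback_docs = initial_results[:d]          # step 2: top D, presumed relevant
    counts = Counter(
        token
        for doc in feedback_docs
        for token in doc
        if token not in query_terms
    )
    feedback_terms = [term for term, _ in counts.most_common(t)]  # step 3
    final_query = list(query_terms) + feedback_terms              # step 4: Qf
    return final_query, search(final_query)                       # step 5: Rf

final_query, final_results = blind_feedback(["flood"], toy_search, d=2, t=2)
```

With D = 2 and T = 2 the toy query "flood" is expanded by the two most frequent new terms found in the two top-ranked documents.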
4. Application of Blind Feedback to GIR (1/2)
• Larson and Gey (2): an improvement on the order of 53% to 72% MAP (mean average precision) was achieved for some monolingual German GIR topics on the GeoCLEF 2006 data (using T = 30, D = 5); no significant improvement for English
• Gey and Petras (1): “the most improved queries seem to add mostly proper names and word variations and very few irrelevant words that won’t distort the search towards another direction” and “blind feedback improves precision, but it seems to do so for only a particular kind of query”
5. Application of Blind Feedback to GIR (2/2)
• Blind feedback (BF) is a method originating in (and intended for) ad-hoc retrieval
→ BF does not yet reflect the geographic orientation of GIR
→ novel methods for document and term selection are required, preferably based on geographic knowledge
→ BF does not generally increase performance significantly, even in standard IR
→ application to GIR without adaptations seems questionable
6. The Geographical Knowledge Base (GKB)
• Avoid ambiguities for location names; sacrifice coverage (i.e. focus on important places)
→ Create a small geographic knowledge base (GKB) with meronymy relations (part-whole relations)
• GKB based on two resources:
  • Linking between Wikipedia articles and authority records for persons (PND), and
  • GeoNames data for the largest cities world-wide
7. GeoNames Data
• GeoNames provides data for populated places world-wide with more than 1,000, 5,000, or 15,000 inhabitants
• Entries contain geographic codes for the continent, country, and administrative divisions
• Data for cities with more than 5,000 inhabitants
→ meronymy relations for 41,228 entries
• Names are translated by utilizing the Wikipedia linking between articles in English and German
• Example: Nuenen is a populated place in North Brabant, in The Netherlands in Europe
→ meronym(Nuenen, North Brabant),
→ meronym(North Brabant, The Netherlands),
→ meronym(The Netherlands, Europe)
→ A place is important if it is highly populated
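The Nuenen example suggests how an entry's containment chain can be turned into meronym pairs. The sketch below assumes a hypothetical record layout (a `hierarchy` list ordered from part to whole, plus a `population` field); the real GeoNames files use geographic codes that would first have to be resolved to names, and the population figures here are illustrative values, not GeoNames data.

```python
def meronym_pairs(hierarchy):
    """Turn a part-to-whole chain of names into adjacent (part, whole) pairs."""
    return list(zip(hierarchy, hierarchy[1:]))

def build_relations(entries, min_population=5000):
    """Collect meronym pairs for all entries above the population threshold."""
    relations = set()
    for entry in entries:
        if entry["population"] > min_population:
            relations.update(meronym_pairs(entry["hierarchy"]))
    return relations

# Illustrative entries; "Tinyplace" and both population values are made up.
entries = [
    {"population": 20000,
     "hierarchy": ["Nuenen", "North Brabant", "The Netherlands", "Europe"]},
    {"population": 800,
     "hierarchy": ["Tinyplace", "North Brabant", "The Netherlands", "Europe"]},
]
relations = build_relations(entries)
```

Only the first entry passes the threshold, so the resulting set holds the three relations listed in the example and nothing for the small place.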
8. PND Data
• Wikipedia articles are linked with authority records for persons from the PND (Personennamendatei)
• PND contains information such as a person’s name, his or her place and date of birth, place and date of death, and profession
• Specification of a place often encodes meronymy information
• 152,650 PND entries → 27,734 unique meronymy relations
• Example: Edsger Wybe Dijkstra was born in Rotterdam, Niederlande/the Netherlands in 1930; died in Nuenen, Niederlande/the Netherlands in 2002
→ meronym(Rotterdam, The Netherlands),
→ meronym(Nuenen, The Netherlands)
→ A place is important if some well-known person was born or died there
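One way to read the Dijkstra example is that a place specification of the form "city, country" yields one meronym pair, and that many person records collapse into far fewer unique pairs. The sketch below assumes a hypothetical record layout with `birth_place` and `death_place` strings and a small German-to-English name table; the actual PND format and the translation step via Wikipedia links are not shown in the slides.

```python
def pnd_relations(records, translate):
    """Extract unique (part, whole) pairs from 'part, whole' place strings."""
    relations = set()
    for record in records:
        for field in ("birth_place", "death_place"):
            place = record.get(field)
            if not place or "," not in place:
                continue  # no part-whole information encoded in this field
            part, whole = (s.strip() for s in place.split(",", 1))
            relations.add((translate.get(part, part), translate.get(whole, whole)))
    return relations

# Hypothetical record layout, filled with the example from the slide.
records = [
    {"name": "Edsger Wybe Dijkstra",
     "birth_place": "Rotterdam, Niederlande",
     "death_place": "Nuenen, Niederlande"},
]
translate = {"Niederlande": "The Netherlands"}
relations = pnd_relations(records, translate)
```

Because the result is a set, repeated place specifications across records contribute a relation only once.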
9. Towards Less Ambiguity in Geographic Resources
characteristic              GeoNames cities (pop. > X)
                            X=1,000    X=5,000    X=15,000
unique location names       124,315    83,680     57,172
ambiguous location names    22,616     13,133     7,551
senses per location name    1.587      1.455      1.345
10. The Meronymy Predicate
Transitive meronymy predicate mero? for two location names:
\[
\text{mero?}(L_1, L_2) :=
\begin{cases}
\text{true} & \text{if } L_1 \text{ is a meronym of } L_2 \\
\text{false} & \text{otherwise}
\end{cases}
\]
• Example:
mero?(Berlin, Germany) returns true
mero?(Hong Kong, France) returns false
→ Allows term selection in BF based on meronymy information in GKB
→ Geographic Blind Feedback
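A transitive predicate of this kind can be sketched as a graph search over the direct part-whole pairs stored in the GKB. The function name `is_meronym` and the pair-set representation are assumptions made for illustration; the slides do not describe how mero? is implemented.

```python
def is_meronym(part, whole, relations):
    """Return True if `whole` is reachable from `part` via meronym links."""
    parents = {}
    for p, w in relations:
        parents.setdefault(p, set()).add(w)
    seen, stack = set(), [part]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent == whole:
                return True
            if parent not in seen:   # guard against cycles in the data
                seen.add(parent)
                stack.append(parent)
    return False

# Toy knowledge base of direct relations (illustrative only).
GKB = {
    ("Berlin", "Germany"),
    ("Germany", "Europe"),
    ("Nuenen", "North Brabant"),
    ("North Brabant", "The Netherlands"),
    ("The Netherlands", "Europe"),
}
```

With this toy knowledge base, `is_meronym("Berlin", "Germany", GKB)` holds directly, `is_meronym("Nuenen", "Europe", GKB)` holds through transitivity, and `is_meronym("Hong Kong", "France", GKB)` does not hold.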
11. Experimental Setup
• GeoCLEF documents: 275,000 German newspaper articles from Frankfurter Rundschau, Schweizerische Depeschenagentur, and Der Spiegel from the years 1994 and 1995
• GeoCLEF topics: 25 topics from 2006 with a title, a short description, and a narrative part
• GIRSA system: setup similar to previous GIR experiments on GeoCLEF data (4; 3)
12. Experimental Settings for Retrieval Experiments (D=5)
L: only location names are selected from the top ranked documents as blind feedback terms
M: location names are filtered utilizing the mero? predicate, keeping meronyms of a search term in the original query as BF terms
H: a location name is filtered from the BF terms if there is an inverse meronymy relation to a search term in the original query (holonym)
B1: (Baseline) no blind feedback; query terms are associated with static weights
B2: (Baseline) no blind feedback; bag-of-words query; query terms are not weighted
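The three feedback strategies differ only in how candidate location names are filtered, which the following sketch tries to make concrete. Several points are assumptions: `select_terms` and the toy `mero` lookup are hypothetical names, the candidates are taken to be location names already extracted from the top D documents, and strategy H is read here as removing holonyms of a query term (the wording "filtered from" would also admit the opposite reading).

```python
def select_terms(candidates, query_locations, mero, strategy):
    """Filter candidate location names according to strategy 'L', 'M', or 'H'."""
    if strategy == "L":
        # L: every location name is kept as a feedback term.
        return list(candidates)
    if strategy == "M":
        # M: keep only meronyms of some location in the original query.
        return [c for c in candidates
                if any(mero(c, q) for q in query_locations)]
    if strategy == "H":
        # H (assumed reading): drop holonyms of a location in the original query.
        return [c for c in candidates
                if not any(mero(q, c) for q in query_locations)]
    raise ValueError(f"unknown strategy: {strategy!r}")

# Toy predicate over direct relations only (illustrative, not transitive).
DIRECT = {("Berlin", "Germany"), ("Hamburg", "Germany"), ("Germany", "Europe")}

def mero(part, whole):
    return (part, whole) in DIRECT

candidates = ["Berlin", "Europe", "Paris"]
query_locations = ["Germany"]
```

For a query mentioning Germany, strategy M keeps only Berlin, while strategy H under the assumed reading removes Europe and keeps Berlin and Paris.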
13. Results for Retrieval Experiments (1/2)
[Figure: performance plot of MAP (y-axis, 0.22 to 0.25) against the number of terms T (x-axis, 5 to 40) for the runs B1, L, H, and M]
14. Results for Retrieval Experiments (2/2)
Topic      Experiment
           B1     L      H      M      B2
GC028      0.38   0.24   0.22   0.41   0.28
GC030 ∗    0.81   0.65   0.66   0.63   0.71
GC032      0.60   0.62   0.62   0.70   0.49
GC039      0.00   0.03   0.03   0.01   0.00
GC044      0.33   0.33   0.33   0.33   0.33
GC048      0.87   0.89   0.89   0.66   0.85
MAP        0.24   0.23   0.23   0.24   0.19
P@5        0.31   0.32   0.31   0.34   0.24
P@10       0.27   0.24   0.24   0.29   0.21
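For readers less familiar with the measures in the table, the following sketch shows the standard textbook definitions of precision at k, average precision, and MAP. It is not the evaluation tool used for the experiments, which the slides do not name, and the document identifiers in the example are made up.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks of the relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """Average of the per-topic average precision; runs = [(ranked, relevant), ...]."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

# Made-up ranking for a single topic.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
ap = average_precision(ranked, relevant)   # (1/1 + 2/3) / 2
p2 = precision_at_k(ranked, relevant, 2)   # 1 relevant document in the top 2
```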
15. Discussion of Results
• MAP did not change considerably when using BF compared to the upper baseline B1 (0.24)
• The BF strategy M (selecting meronyms) clearly outperforms the second baseline B2 (0.24 vs. 0.19)
• Precision at five documents was increased (from 0.31/0.24 in the baseline experiments to 0.34 in the M-run)
• Per-topic comparison of MAP between B1 and M: MAP was increased for nine, decreased for three topics in M-run
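A per-topic comparison of this kind amounts to counting, topic by topic, where one run beats the other. The sketch below applies that counting to the six topics whose B1 and M values are listed in the results table; since only six topics are shown, it cannot reproduce the counts of nine and three reported for the full topic set. The function name `compare_runs` is a hypothetical helper.

```python
def compare_runs(baseline, run):
    """Count topics where `run` scores higher, lower, or equal to `baseline`."""
    up = sum(1 for topic in baseline if run[topic] > baseline[topic])
    down = sum(1 for topic in baseline if run[topic] < baseline[topic])
    same = len(baseline) - up - down
    return up, down, same

# Per-topic values for B1 and M as listed in the results table (six topics only).
B1 = {"GC028": 0.38, "GC030": 0.81, "GC032": 0.60,
      "GC039": 0.00, "GC044": 0.33, "GC048": 0.87}
M = {"GC028": 0.41, "GC030": 0.63, "GC032": 0.70,
     "GC039": 0.01, "GC044": 0.33, "GC048": 0.66}

up, down, same = compare_runs(B1, M)
```

On these six topics, M is higher for three, lower for two, and unchanged for one.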
16. Discussion
• The geographic semantic relation "in" is not used in all topics. Seven topics use "near", "in a distance of", "alongside", or "around"; five of these have a MAP of less than 0.03
• GKB mostly covers cities and does not include information on rivers, seas, lakes, etc.
• The initial result set may be difficult to improve. Highest MAP for official monolingual German experiments in GeoCLEF 2006: 0.22 (see (3)); baseline experiment B1: 0.24 MAP
17. Outlook
• Focus on finding even more geographically oriented term and document selection criteria
• Investigate setting the parameters T and D in a flexible way
• Consider more geographic semantic relations (other than meronymy) in term selection for blind feedback
18. Selected References
[1] Fredric C. Gey and Vivien Petras. Berkeley2 at GeoCLEF: Cross-language geographic information retrieval of English and German documents. In Carol Peters, editor, Results of the CLEF 2005 Cross-Language System Evaluation Campaign, Vienna, Austria, 2005.
[2] Ray Larson and Fredric C. Gey. GeoCLEF text retrieval and manual expansion approaches. In Alessandro Nardi, Carol Peters, and José Luis Vicedo, editors, Results of the CLEF 2006 Cross-Language System Evaluation Campaign, Alicante, Spain, 2006.
[3] Johannes Leveling and Dirk Veiel. Experiments on the exclusion of metonymic location names from GIR. In Carol Peters, et al., editors, Evaluation of Multilingual and Multi-modal Information Retrieval: 7th Workshop of the Cross-Language Evaluation Forum, CLEF 2006, volume 4730 of LNCS, pages 901–904. Springer, Berlin, 2007.
[4] Johannes Leveling, Sven Hartrumpf, and Dirk Veiel. Using semantic networks for geographic information retrieval. In Carol Peters, et al., editors, Accessing Multilingual Information Repositories: 6th Workshop of the Cross-Language Evaluation Forum, CLEF 2005, volume 4022 of LNCS, pages 977–986. Springer, Berlin, 2006.