Ulmon's recipe for a travel guide is to fuse multiple open data sources that you might otherwise consult individually when planning a vacation, and to present them as one coherent package. The aim is to combine the data so that the whole is more valuable than the sum of its parts.
Our main sources of map data and of knowledge about places are OpenStreetMap and Wikipedia, respectively. This talk is about the challenges of connecting the two and our strategies for coping with them.
6. Wikipedia tag in OpenStreetMap
08/05/2014 Linuxwochen Wien
http://taginfo.openstreetmap.org
7. Wikipedia tag statistics
Tag name        Number of values
wikipedia                339,148
wikipedia:ru              30,457
wikipedia:en              16,432
wikipedia:de              13,923
wikipedia:es               4,706
Total                    404,666
Total Wikipedia entries with location:
– 1,621,704 in 15 languages
– 798,965 English
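The tag keys counted above come in two shapes in OpenStreetMap: a plain `wikipedia` key whose value carries a language prefix (for example `wikipedia=en:Vienna`), and language-suffixed keys such as `wikipedia:de=Wien`. A minimal Python sketch of normalising both into (language, title) pairs might look like the following. The function name `parse_wikipedia_tags` and the `default_lang` fallback for values without a prefix are assumptions made for illustration, not part of the talk.

```python
import re

# Language-suffixed keys such as "wikipedia:de".
_KEY_RE = re.compile(r"^wikipedia:([a-z][a-z\-]*)$")
# Language prefix inside the value, e.g. "en:Vienna".
_VALUE_RE = re.compile(r"^([a-z][a-z\-]*):(.+)$")


def parse_wikipedia_tags(tags, default_lang="en"):
    """Return a sorted list of (language, title) pairs found in an OSM tag dict.

    Assumption: values without a language prefix belong to `default_lang`.
    Full URLs are skipped; handling them is left out of this sketch.
    """
    links = set()
    for key, value in tags.items():
        value = value.strip()
        if not value or value.startswith(("http://", "https://")):
            continue
        title = value.replace("_", " ")
        if key == "wikipedia":
            m = _VALUE_RE.match(value)
            if m:
                links.add((m.group(1), m.group(2).replace("_", " ")))
            else:
                links.add((default_lang, title))
        else:
            k = _KEY_RE.match(key)
            if k:
                links.add((k.group(1), title))
    return sorted(links)


example = {"name": "Wiener Staatsoper",
           "wikipedia": "de:Wiener_Staatsoper",
           "wikipedia:en": "Vienna State Opera"}
links = parse_wikipedia_tags(example)
```

Collecting the pairs in a set means an object that carries the same link under both key shapes is counted once.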
15. Type score
• Compare OSM tags with DBpedia types
– Manual rules
– Word similarity
– Future: synonym analysis based on WordNet
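One way to picture the type score is as a two-stage comparison: manual rules first, then a word-similarity fallback. The Python sketch below is illustrative only. The rule table, the list of type-bearing keys, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions; the slides do not say which rules or which measure Ulmon actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical manual rules: OSM (key, value) -> DBpedia types treated as a full match.
MANUAL_RULES = {
    ("tourism", "museum"): {"Museum"},
    ("amenity", "place_of_worship"): {"Church", "Mosque", "Temple", "ReligiousBuilding"},
    ("railway", "station"): {"Station", "RailwayStation"},
}

# Assumption: only these keys describe what kind of thing an OSM object is.
TYPE_KEYS = ("amenity", "tourism", "railway", "historic", "natural", "leisure")


def word_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for the real word-similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def type_score(osm_tags, dbpedia_types):
    """Score in [0, 1] for how well an OSM object's tags agree with DBpedia types."""
    best = 0.0
    for key in TYPE_KEYS:
        value = osm_tags.get(key)
        if value is None:
            continue
        # Stage 1: a manual rule that fires gives the maximum score.
        if dbpedia_types & MANUAL_RULES.get((key, value), set()):
            return 1.0
        # Stage 2: fall back to the best word similarity over all types.
        for dbpedia_type in dbpedia_types:
            best = max(best, word_similarity(value.replace("_", " "), dbpedia_type))
    return best
```

For example, `type_score({"tourism": "museum"}, {"Museum", "Building"})` hits a manual rule, while `type_score({"amenity": "theatre"}, {"Theatre"})` is decided by the similarity fallback.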
16. Decision tree
• Generated using the J48 algorithm of the Weka toolkit
• How to get learning data?
– Manual creation
– Parsing wikipedia tags from OSM
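As a rough illustration of the second option, existing wikipedia tags can act as ground truth: for an OSM object that already carries a tag, the tagged article is a positive example and every other candidate article is a negative one. The Python sketch below labels candidate pairs this way and writes them in ARFF, the text format Weka reads. The function names, the feature names (`name_similarity`, `type_score`), and the labelling scheme are assumptions for illustration, not Ulmon's actual pipeline.

```python
def build_training_rows(tagged, candidates):
    """Label candidate (OSM object, article) pairs using existing wikipedia tags.

    tagged:     {osm_id: article title taken from the object's wikipedia tag}
    candidates: {osm_id: [(article title, {feature name: value}), ...]}
    """
    rows = []
    for osm_id, pairs in candidates.items():
        truth = tagged.get(osm_id)
        if truth is None:
            continue  # no existing tag, so no label is available
        for title, features in pairs:
            label = "match" if title == truth else "nomatch"
            rows.append((features, label))
    return rows


def to_arff(rows, feature_names, relation="osm_wikipedia_pairs"):
    """Serialise labelled rows as ARFF text with numeric features and a nominal class."""
    lines = ["@relation " + relation, ""]
    for name in feature_names:
        lines.append("@attribute %s numeric" % name)
    lines.append("@attribute class {match,nomatch}")
    lines += ["", "@data"]
    for features, label in rows:
        values = [str(features[name]) for name in feature_names]
        lines.append(",".join(values + [label]))
    return "\n".join(lines) + "\n"


tagged = {1: "Vienna State Opera"}
candidates = {
    1: [("Vienna State Opera", {"name_similarity": 1.0, "type_score": 1.0}),
        ("Albertina", {"name_similarity": 0.2, "type_score": 0.5})],
    2: [("Prater", {"name_similarity": 0.9, "type_score": 0.8})],  # untagged, skipped
}
rows = build_training_rows(tagged, candidates)
arff = to_arff(rows, ["name_similarity", "type_score"])
```

The resulting text can be saved to a `.arff` file and loaded into Weka for training.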
17. Ulmon’s matching performance
• Current
– Total wiki entries: 810K (674K English)
– Matched entries: 429K
• Future
– Total wiki entries: 1.6M
– Matched entries (extrapolation): 850K
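The extrapolated figure is consistent with carrying the current match rate over to the larger corpus. The short Python check below assumes exactly that, a constant match rate, which the slide does not state explicitly; the function name is hypothetical.

```python
def extrapolate_matches(matched_now, total_now, total_future):
    """Project future matches, assuming the current match rate stays constant."""
    rate = matched_now / total_now
    return rate, rate * total_future


# Figures from the slide: 429K of 810K matched today, 1.6M entries in future.
rate, projected = extrapolate_matches(429_000, 810_000, 1_600_000)
print(f"match rate: {rate:.1%}, projected matches: {projected:,.0f}")
```

The current rate is roughly 53%, which gives a projection close to the 850K quoted on the slide.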