4. Introduction – Motivation
Dramatic increase in the number of websites on the internet: 7.14 billion pages
Difficulty in finding and exploring new websites
Social bookmarking
Recommendation systems
5. Introduction – Turkish Effect
Recommendation systems search within user inputs
Users tend to use their own language on the internet
Turkey ranks 32nd in English proficiency (Education First, 2012)
Turkish and English are very different languages!
6. Introduction – What is proposed?
A tag-based recommendation system
For the Turkish language
Based on similarity, tag weight and tag popularity
Taking the semantic properties of tags into account
7. Related Work
Collaborative filtering
Widely accepted
No context!
Topic and pattern extraction
Usage of WordNet
A lexical database for the English language
Two papers were found on a Turkish WordNet, but no source is available
8. Related Work
Similarity calculation methods
Durao & Dolog (2009) Reference paper
Tag popularity, tag representativeness and tag-user affinity
60% acceptance level achieved without any semantic analysis
11. Problem Definition
Aim -> User satisfaction
Recommend websites that the user
wants to use in the future, or
is already using and finds interesting
12. Problem Definition
Challenge -> Different tagging purposes and expectations
Website        | Tag                                 | Potential Purpose
zaytung.com    | zaytung                             | Archiving
eksisozluk.com | alışkanlık (ENG: habit)             | Internet usage habit
evekitap.com   | ücretsiz kargo (ENG: free shipping) | Categorizing
9gag.com       | eğlenceli (ENG: funny)              | Definition
Data are taken from the experiment
13. Algorithm
Steps of the algorithm
1. Spell-check
2. Stemming
3. Semantics Analysis
4. Similarity Calculation
14. Algorithm – Spell-Checking
Spell check on the tags
Candidate corrections are generated by:
Adding a single letter,
Deleting a single letter,
Replacing one letter and
Transposing two letters
Each candidate is then checked for occurrence in the Turkish National Corpus.
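The four edit operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the `corpus_counts` dictionary is a hypothetical stand-in for a Turkish National Corpus lookup, and choosing the most frequent known candidate is an assumption, since the slide only states that candidates are checked for occurrence in the corpus.

```python
# Lowercase Turkish alphabet (29 letters).
TURKISH_ALPHABET = "abcçdefgğhıijklmnoöprsştuüvyz"


def edit_candidates(tag):
    """Return every string that is one edit operation away from `tag`."""
    splits = [(tag[:i], tag[i:]) for i in range(len(tag) + 1)]
    inserts = {left + c + right for left, right in splits for c in TURKISH_ALPHABET}
    deletes = {left + right[1:] for left, right in splits if right}
    replaces = {left + c + right[1:]
                for left, right in splits if right
                for c in TURKISH_ALPHABET}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    return inserts | deletes | replaces | transposes


def correct_tag(tag, corpus_counts):
    """Return `tag` if the corpus knows it, else the best known candidate.

    `corpus_counts` maps word -> frequency (hypothetical corpus stand-in).
    Picking the most frequent candidate is an assumption of this sketch.
    """
    if tag in corpus_counts:
        return tag
    known = [c for c in edit_candidates(tag) if c in corpus_counts]
    if not known:
        return tag  # nothing plausible found, keep the tag unchanged
    # Ties are broken alphabetically so the result is deterministic.
    return max(known, key=lambda c: (corpus_counts[c], c))
```

A tag that is already in the corpus is returned untouched, so correct tags are never rewritten.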
15. Algorithm – Spell-Checking
Correction on URLs
Original URL                                      | Corrected URL
https://www.deviantart.com/                       | deviantart.com
http://www.sahadan.com/Default.aspx               | sahadan.com
http://www.yemeksepeti.com/AnonymouseDefault.aspx | yemeksepeti.com
Data are taken from the experiment
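The URL correction in the examples above amounts to keeping only the host name and dropping a leading "www.". A minimal sketch using Python's standard library follows; the function name `normalize_url` is hypothetical, and the slides do not say how the original system performed this step.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to its bare host name, without scheme, path or 'www.'."""
    # urlsplit only recognises the host when a scheme or '//' prefix is present.
    if "://" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

Normalising this way lets bookmarks of different pages on the same site be counted as one website.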
17. Algorithm – Stemming
Stems of the tags are extracted by removing suffixes.
Website        | Original Tag                   | Corrected Tag
facebook.com   | arkadaşlık (ENG: friendship)   | arkadaş (ENG: friend)
metu.edu.tr    | mühendislik (ENG: engineering) | mühendis (ENG: engineer)
deviantart.com | eğlenceli (ENG: funny)         | eğlence (ENG: fun)
Data are taken from the experiment
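As a rough illustration of suffix removal, the sketch below strips a single derivational suffix from a tag. It is a toy under stated assumptions: the suffix list covers only the -lık and -lı families seen in the examples, the minimum stem length is an arbitrary guard, and the slides do not name the stemmer actually used. A real system would more likely rely on a Turkish morphological analyser.

```python
# Hypothetical suffix list: only the vowel-harmony variants of the two
# suffix families seen in the examples, not a full Turkish morphology.
SUFFIXES = ("lık", "lik", "luk", "lük", "lı", "li", "lu", "lü")
MIN_STEM_LENGTH = 3  # assumed guard against stripping very short words


def strip_suffix(tag):
    """Remove one known suffix from `tag`, trying longer suffixes first."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        stem = tag[: -len(suffix)]
        if tag.endswith(suffix) and len(stem) >= MIN_STEM_LENGTH:
            return stem
    return tag  # no known suffix, keep the tag unchanged
```

Longer suffixes are tried first so that a three-letter suffix is not mistaken for a two-letter one followed by a stem consonant.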
19. Algorithm – Semantics Analysis
An open-source «Turkish Thesaurus» project (Özbek, 2012)
125,022 <Word, Synonym> pairs
20. Algorithm – Semantics Analysis
Algorithm applied:
for each “tag” in ALL-DATA do:
    for each “synonym” of “tag” in SYNONYM-LIST do:
        if “synonym” occurs in ALL-DATA then:
            add <user, site, “synonym”> to ALL-DATA
21. Algorithm – Semantics Analysis
Original data (ALL-DATA)
User  | Website         | Tag
User1 | milliyet.com.tr | haber (ENG: news)
User2 | sabah.com.tr    | gazete (ENG: newspaper)
Synonym List (SYNONYM-LIST)
Word              | Synonym
haber (ENG: news) | gazete (ENG: newspaper)
Added data to ALL-DATA
User  | Website         | Tag
User1 | milliyet.com.tr | gazete (ENG: newspaper)
User2 | sabah.com.tr    | haber (ENG: news)
Data are taken from the experiment
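The synonym expansion and the example data above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the original implementation: the function names are hypothetical, and treating each <Word, Synonym> pair as symmetric is an assumption inferred from the example, where the single pair haber/gazete produces additions in both directions.

```python
from collections import defaultdict


def build_synonym_map(pairs):
    """Map each word to its synonyms, treating every pair as symmetric."""
    synonyms = defaultdict(set)
    for word, synonym in pairs:
        synonyms[word].add(synonym)
        synonyms[synonym].add(word)  # assumption: the relation works both ways
    return synonyms


def expand_with_synonyms(all_data, pairs):
    """Return the new (user, site, tag) triples implied by synonyms.

    A synonym is only added when it already occurs as a tag somewhere
    in `all_data`, as in the pseudocode.
    """
    synonyms = build_synonym_map(pairs)
    known_tags = {tag for _, _, tag in all_data}
    existing = set(all_data)
    added = []
    for user, site, tag in all_data:
        for synonym in sorted(synonyms.get(tag, ())):
            triple = (user, site, synonym)
            if synonym in known_tags and triple not in existing:
                existing.add(triple)
                added.append(triple)
    return added


all_data = [
    ("User1", "milliyet.com.tr", "haber"),
    ("User2", "sabah.com.tr", "gazete"),
]
added = expand_with_synonyms(all_data, [("haber", "gazete")])
```

Returning the additions separately, rather than mutating `all_data` while iterating over it, keeps the loop well defined and makes the added rows easy to inspect.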
22. Algorithm – Semantics Analysis
Result: an environment in which every user's tags are stored together
with their potential meanings, i.e. synonyms that other users may
have already used.
36. Future Work
Pre-processing stage
English inputs
Site: yandex.com
Tags: harita, e-mail, arama
Turkish inputs with English letters
Site: eksiduyuru.com
Tags: duyuru, alinik, satilik
Translation of, or control over, such inputs
45. References
Adrian, B., Sauermann, L., & Roth-Berghofer, T. (2007). ConTag: A
Semantic Tag Recommendation System. Proceedings of I-SEMANTICS '07.
Aksan, Y. et al. (2012). Construction of the Turkish National Corpus (TNC).
In Proceedings of the Eighth International Conference on Language
Resources and Evaluation (LREC 2012). Istanbul, Turkey.
http://www.lrec-conf.org/proceedings/lrec2012/papers.html
Brill, E., & Moore, R. C. (2000). An Improved Error Model for Noisy
Channel Spelling Correction. Microsoft Research.
Cattuto, C., Benz, D., Hotho, A., & Stumme, G. (2008). Semantic
Grounding of Tag Relatedness in Social Bookmarking Systems. In The
Semantic Web - ISWC 2008. Springer.
Durao, F., & Dolog, P. (2009). A Personalized Tag-based Recommendation
in Social Web Systems. International Workshop on Adaptation and
Personalization for Web 2.0.
46. References
Education First. (2012). EF EPI Country Rankings.
Frankfurt International School. (2001). The Differences Between English
and Turkish.
ISPA (Investment Support and Promotion Agency) of Turkey. (2010). Turkish
Information and Communication Technologies Industry. Deloitte.
Nakamoto, R., Nakajima, S., Miyazaki, J., & Uemura, S. (2007). Tag-based
Contextual Collaborative Filtering. IAENG International Journal of
Computer Science.
Özbek, A. (2012). Türkçe Eşanlamlı Kelimeler Sözlüğü Projesi (Turkish
Thesaurus Project). Retrieved from http://github.com/maidis/mythes-tr