
Thesis_presentation_arda_tasci

This thesis proposes a core model that represents user profiles in a graph-based
environment and can serve as the basis for different recommender system approaches
as well as other cutting-edge applications in the TV domain. The graph-based core
model is explained in detail, including its node types, properties, edge weight
metrics and capabilities. In addition, a hybrid recommender system built on this
core model is presented through its design, development and evaluation phases. The
hybrid recommendation algorithm, which combines the advantages of collaborative
filtering, context-awareness and content-based recommendation, is explained in
detail. The core model and the hybrid recommender system are evaluated against a
baseline recommender, and the results are presented.

Thesis_presentation_arda_tasci

  1. Middle East Technical University, Computer Engineering. A GRAPH-BASED CORE MODEL AND A HYBRID RECOMMENDER SYSTEM FOR TV USERS. Arda Taşcı, 05.02.2015. Supervisor: Prof. Dr. Nihan Kesim Çiçekli
  2. Outline
     • Introduction
     • Background and Related Work
     • Proposed Graph-Based Model
     • Proposed Hybrid Recommender System
     • Experiments and Evaluation
     • Conclusion and Future Work
  3. INTRODUCTION
     • Motivation
     • Our Study
  4. Motivation
     • TV is the most used conventional media tool [1]
     • 311 channels in Turkey, with new channels emerging [2]
     • Users struggle to find relevant TV programs
     • TVs are now connected to the internet
     • Recommender systems can help users
     • No specific applications or research for Turkish TV content
  5. Our Study
     • … proposes a graph-based model
     • … proposes a hybrid recommender system for TV users over this model
     • … presents the evaluation results of the proposed system w.r.t. a baseline method
  6. BACKGROUND AND RELATED WORK
     • Background
     • Related Work
  7. Background
     • Content-based systems
     • Collaborative filtering systems
     • Knowledge-based systems
     • Context-aware systems
     • Hybrid systems
  8. Related Work
     • (Huang et al., 2002): a method for keyword search and recommendation for digital libraries using a two-layered graph architecture
  9. Related Work
     • Bogers’ ContextWalk
     • Phuong similarity functions (Bogers et al., 2010)
  10. GRAPH-BASED MODEL
      • Node Types
      • Edge Weight Metrics
  11. Graph Based Model
  12. [Diagram: node types of the graph-based model]
      • Entity nodes: USER, PROGRAM
      • Context nodes: Time of Day, Genre
      • Attribute nodes: ACTOR, Director
      • Descriptor nodes: Term, Named Entity
      • Co-occurrence relations
  13. [Diagram: edge weight metrics]
      • USER to PROGRAM: rating
      • PROGRAM to TERM: TF-IDF
      • PROGRAM to NAMED ENTITY: TF-IDF
      • TERM to TERM, NAMED ENTITY to NAMED ENTITY, ACTOR to ACTOR: co-occurrence
  14. Graph Based Model Capabilities
      • Content-based systems
      • Collaborative filtering systems
      • Context-aware systems
      • Knowledge-based systems
      • Group recommendations
      • Personalization for TV users
      • Recommending other types of items
      • Targeted advertisements
  15. HYBRID RECOMMENDATION SYSTEM OVER GRAPH-BASED MODEL
      • Constructing Graph Based Model
        – User Log Collection
        – TV Program Content Information
        – Data Aggregation
      • Recommendation using Spreading Activation Algorithm
  16. Constructing Graph Based Model: User Log Collection
      • User logs provided by Arçelik A.Ş.
        – between 1.12.2013 and 1.01.2014
        – ~10 million user logs
        – 2938 distinct users
  17. Constructing Graph Based Model: User Log Collection
      User log attributes in the database:
      • id: unique id set by the database agent
      • user_id: unique id of the user
      • channel_name: name of the channel
      • start_time: start time of the watch event
      • end_time: end time of the watch event
  18. Constructing Graph Based Model: TV Program Data Collection
      • EPG data in Turkey is not mature enough
      • Content providers were too expensive
      • Solution: web crawling and scraping
      • Digiturk and Radikal were analyzed and Radikal was chosen
  19. Constructing Graph Based Model: TV Program Data Collection
  20. Constructing Graph Based Model: TV Program Data Collection
      • TV program content information was collected from the web in the same time interval (1.12.2013 to 1.01.2014)
        – 3769 distinct TV programs
        – 36 distinct genres
        – 1653 distinct actors
        – 469 distinct directors
        – 676 distinct named entities
        – 3159 distinct terms
  21. Constructing Graph Based Model: TV Program Data Enhancement
      Day parting to extract time-of-day information (label: time period*):
      • NIGHT: 00:00-04:00
      • EARLY MORNING: 04:00-07:00
      • BREAKFAST: 07:00-09:00
      • LATE MORNING: 09:00-13:00
      • DAYTIME: 13:00-18:00
      • EVENING: 18:00-20:30
      • PRIME TIME: 20:30-24:00
      * “Day Parting for TV - Wikipedia, the free encyclopedia.” [Online]. Available: http://en.wikipedia.org/wiki/Dayparting. [Accessed: 10-Jan-2015].
  22. Constructing Graph Based Model: TV Program Data Enhancement
      • Term extraction operations using ZEMBEREK
  23. Constructing Graph Based Model: TV Program Data Enhancement
      • Named-entity extraction using DBpedia APIs, for example:
        <annotation text="Mehmet Yaşin lezzet rotasını bu kez çok uzaklara, Avrupa'nın çatısı Norveç'e çeviriyor. 3 bölüm sürecek olan uzun Norveç gezisinin ilk durağı, dünyanın en kuzeyinde, kuzey kutup noktasından önce üzerinde insan yaşamı olan son ada Svalbard.">
          <surfaceForm name="Mehmet Yaşin" offset="0"/>
          <surfaceForm name="Norveç" offset="114"/>
          <surfaceForm name="Svalbard" offset="230"/>
        </annotation>
  24. Constructing Graph Based Model: Data Aggregation
      [Diagram: user logs and TV program content, which both carry channel name, start time and end time, are merged into the graph-based model]
  25. Constructing Graph Based Model: Data Aggregation
      • Channel name mapper: maps between user-log channel names and TV-program channel names written in different forms, such as ATVHD, AtvHD, Atv HD and ATV HD
  26. Recommendation using Spreading Activation Algorithm
      • Spreading activation: an algorithm designed for searching over associative networks, neural networks or semantic networks
  27. [Diagram: activation spreading from a user node (u) over program (p), actor (a), director (d), term (t) and named-entity (ne) nodes]
  28. Recommendation using Spreading Activation Algorithm
      • decay_factor, the loss applied when activation passes through a node, is set heuristically to 0.6 for actors, directors, named entities and terms
      • When the activation value of a node drops to 0.2, the algorithm stops propagating
      • Collected program nodes are recommended to the users, ranked by their activation value
  29. EXPERIMENTS AND EVALUATION
      • Evaluation Strategy and Metrics
      • Experiments
      • Results and Discussion
  30. Evaluation Strategy
      • K-fold cross validation strategy
      • 3-fold cross validation is applied
  31. Evaluation Metrics
  32. Baseline Method
      • As the baseline, content-based user profiles are built from the same dataset and kept in user x item vectors.
      • This method is commonly used as a baseline for recommender systems in the information retrieval domain [14].
      • The baseline method is also evaluated by 3-fold cross validation on the same dataset.
  33. Baseline Method
      • User to matched terms:
        – User1: “belgesel”, “aslan”, “göl”, …
        – User2: “spor”, “ispanya”, “gol”, …
      • Item to user ratings:
        – “belgesel”: User1 => 23, User2 => 35, User3 => 13, User4 => 4, …
      • The Apriori algorithm is used to create the inverted index for terms
  34. Evaluation Results
  35. Evaluation Results: Precision
      [Bar chart: precision of the proposed method and of the baseline per fold]
      • fold-1: 0.7534 (proposed), 0.232 (baseline)
      • fold-2: 0.6875 (proposed), 0.33 (baseline)
      • fold-3: 0.7298 (proposed), 0.335 (baseline)
      • Avg.: 0.7235667 (proposed), 0.299 (baseline)
  36. Evaluation Results: Recall
      [Bar chart: recall of the proposed method and of the baseline per fold]
      • fold-1: 0.7011 (proposed), 0.4354 (baseline)
      • fold-2: 0.6398 (proposed), 0.4908 (baseline)
      • fold-3: 0.6690488 (proposed), 0.4623 (baseline)
      • Avg.: 0.6699829 (proposed), 0.462833333 (baseline)
  37. Evaluation Results: f-measure
      [Bar chart: f-measure of the proposed method and of the baseline per fold]
      • fold-1: 0.7263097 (proposed), 0.3230432 (baseline)
      • fold-2: 0.6627929 (proposed), 0.3173665 (baseline)
      • fold-3: 0.6930991 (proposed), 0.3201792 (baseline)
      • Avg.: 0.6940672 (proposed), 0.3201963 (baseline)
  38. Evaluation Results: Effect of Context
      [Bar chart: f-measure for fold-1, fold-2, fold-3 and the average, without context, with only genre context, and with genre and time-of-day context]
  39. Evaluation Results: Effect of Co-occurrence
      [Bar chart: f-measure without and with co-occurrence per fold]
      • fold-1: 0.5230432 (without co-occurrence), 0.7263097 (with co-occurrence)
      • fold-2: 0.59173665 (without co-occurrence), 0.6627929 (with co-occurrence)
      • fold-3: 0.58201792 (without co-occurrence), 0.6930991 (with co-occurrence)
      • Avg.: 0.565599257 (without co-occurrence), 0.694067233 (with co-occurrence)
  40. Evaluation Results: Overall Improvement Based on Context
      [Bar chart: f-measure of the baseline, our method without context, our method with genre context, and our method with time and genre context]
  41. Evaluation Results: Overall Improvement Based on Co-occurrence
      [Bar chart: f-measure of the baseline, our method with context and without co-occurrence, and our method with context and with co-occurrence]
  42. CONCLUSION AND FUTURE WORK
  43. In this thesis …
      • A graph-based core model for representing users and TV programs with their attributes is presented.
      • User logs are collected from connected TVs.
      • TV program content information is collected from the web.
      • The presented graph-based model is constructed by aggregating these two sources.
      • A hybrid recommendation system is created over this graph-based model.
  44. In this thesis …
      • The evaluation of the proposed system is presented.
      • A baseline method is employed over the exact same dataset.
      • The evaluation results of the proposed system are compared with those of the baseline method.
      • The effect of context in the TV domain and the effect of co-occurrence relations are presented.
  45. Future Work
      • Performance improvements by importing social media profiles of users
      • Performance improvements by importing demographic information of users
      • The maturity and quality of TV program content can be improved to achieve better evaluation results.
      • When the system is used online, explicit feedback can be collected from the users for more accurate similarity measurements.
  46. Future Work
      • Based on this graph-based model:
        – Creating personalized TV user interfaces
        – Creating targeted advertisements based on the TV program preferences of the users
        – Recommending cinemas, theaters or shows based on the TV program preferences of the users
        – Rating and popularity estimations for TV programs, actors, directors, genres, etc.
  47. References
      [1] “Uydu Yayın Lisansı Olan Kuruluşlar Listesi (RD ve TV olarak).” [Online]. Available: http://yayinci.rtuk.org.tr/web/web_giris.php. [Accessed: 19-Aug-2014].
      [2] F. S. da Silva, L. G. P. Alves, and G. Bressan, “PersonalTVware: An infrastructure to support the context-aware recommendation for personalized digital TV,” Int. J. Comput. Theory Eng., vol. 4, no. 2, pp. 131–135, 2012.
      [3] T. Bogers, “Movie recommendation using random walks over the contextual graph,” in Proc. of the 2nd Intl. Workshop on Context-Aware Recommender Systems, 2010.
      [4] P. Resnick and H. R. Varian, “Recommender systems (Special Section: Recommender Systems) (Cover Story),” vol. 56, no. March, pp. 1–3, 1997.
      [5] M. Balabanović and Y. Shoham, “Fab: content-based, collaborative recommendation,” Commun. ACM, vol. 40, no. 3, 1997.
      [6] G. Adomavicius, B. Mobasher, F. Ricci, and A. Tuzhilin, “Context-Aware Recommender Systems,” in Recommender Systems Handbook, 2011, pp. 217–253.
  48. References
      [7] Z. Huang, W. Chung, T.-H. Ong, and H. Chen, “A graph-based recommender system for digital library,” in Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries, 2002, pp. 65–73.
      [8] R. Bambini, P. Cremonesi, and R. Turrin, “A recommender system for an iptv service provider: a real large-scale production environment,” in Recommender systems handbook, 2011, pp. 299–331.
      [9] M.-W. Kim, E.-J. Kim, W.-M. Song, S.-Y. Song, and A. R. Khil, “Efficient recommendation for smart TV contents,” in Big Data Analytics, 2012, pp. 158–167.
      [10] B. Martinez, A. Belen, E. Costa-Montenegro, J. C. Burguillo, M. Rey-López, M.-F. F. A, and A. Peleteiro, “A hybrid content-based and item-based collaborative filtering approach to recommend TV programs enhanced with singular value decomposition,” Inf. Sci. (Ny)., vol. 180, no. 22, pp. 4290–4311, 2010.
      [11] Z. Yu, X. Zhou, Y. Hao, and J. Gu, “TV program recommendation for multiple viewers based on user profile merging,” User Model. User-adapt. Interact., vol. 16, no. 1, pp. 63–82, 2006.
      [12] L. Aroyo, L. Nixon, and L. Miller, “NoTube: the television experience enhanced by online social and semantic data,” in Consumer Electronics-Berlin (ICCE-Berlin), 2011 IEEE International Conference on, 2011, pp. 269–273.
      [13] “TV Rehberi - Televizyon Programı ve Yayın Akışı Radikal’de.” [Online]. Available: http://www.radikal.com.tr/tvrehberi/. [Accessed: 10-Aug-2014].
      [14] J. Beel, S. Langer, M. Genzmehr, B. Gipp, C. Breitinger, and A. Nürnberger, “Research Paper Recommender System Evaluation: A Quantitative Literature Survey,” RepSys, 2013.
  49. Thank you …
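
The slides name precision, recall and f-measure as evaluation metrics but do not spell out the formulas. A minimal sketch, assuming the f-measure is the standard F1 score (the harmonic mean of precision and recall); under that assumption the f-measure values reported for folds 1 and 2 of the proposed method follow from the precision and recall on slides 35 and 36. The function name is illustrative only.

```python
def f_measure(precision, recall):
    """Standard F1 score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the proposed method, copied from slides 35 and 36.
fold_1 = f_measure(0.7534, 0.7011)
fold_2 = f_measure(0.6875, 0.6398)

print(round(fold_1, 7), round(fold_2, 7))
```
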

Editor's notes

  • First of all, thank you all for being here today. I will present my thesis, “A graph-based core model and a hybrid recommender system for TV users”. My supervisor is Prof. Çiçekli. This research is part of a SANTEZ project conducted in collaboration between METU, Arçelik and the Ministry of Science, Industry and Technology. If you have any questions or comments during the presentation, please do not hesitate to stop me.
  • Here you can see my outline. I will give a brief introduction to our work. Then I will present previous work in this domain with brief background information. After that I will move on to our graph-based core model and the hybrid recommender system created over this model. I will share the experiments we conducted and the evaluation results. Finally, I will finish my presentation with concluding remarks and a discussion of future work.
  • For many years, TV has been the most used conventional media tool, giving users access to mass information about their interests. In Turkey, while the media landscape continues to evolve with digital alternatives such as video-on-demand services and social media, traditional media remains the most widely used entertainment service according to RTUK.

    Meanwhile, broadcasting technology improves day by day, which allows new channels to be launched. National satellites are deployed in order to serve more channels in better quality. Currently there are 311 national channels according to RTUK reports. All these channels broadcast different content at the same time. Within this massive amount of content, users struggle to find the content that fits their interests.

    TVs are now connected to the internet, and smart TVs, connected TVs and IPTVs are being sold. Since internet connectivity in the TV market is improving, recommendation systems can be the key solution for finding relevant TV programs. Tracking user behavior over the TV's internet connection enables the construction of user models that include the metadata of TV programs.

    Moreover, although there is a lot of research and many applications for other languages, there is not much work on Turkish TV program content.

  • In this study, we propose a graph-based core model to represent users and TV programs with their attributes and content. Different edge-weight metrics (similarity metrics) are proposed.
    Different approaches that can be applied over this model are presented.

    A hybrid recommendation system that combines content-based filtering, collaborative filtering and context-awareness is proposed.

    The proposed hybrid recommender system is evaluated and compared with a baseline method.

  • Recommender systems are software tools and techniques for finding relevant items for users who face a huge number of items to choose from. They obtain relevant items automatically from people's previous decisions.

    Content-based recommender systems suggest items to users by analyzing item descriptions in order to identify which items are of interest to a particular user. The recommended items are similar in content to items the user was previously interested in. Thus, item representation and user profiling are the main concerns of content-based recommender systems.

    Collaborative filtering methods need no content information about items; they recommend items to users based on the interests or habits of similar users. These systems cluster users by their item preferences and suggest items to a user based on the item preferences of the other users in the same cluster.

    A knowledge-based recommender system relies on utilities predefined by the user or by the system itself. The system filters the items to be suggested according to these utilities, which addresses the basic problem of users facing a huge number of items.

    Contextual information is recognized by researchers and practitioners in many disciplines as improving the quality of recommender systems. Although most recommender systems focus on the relevance of the items, attributes such as time and place should be taken into consideration in order to produce successful recommendations, by pre-filtering or post-filtering the items according to the specified context.

    Hybrid recommender systems combine the approaches presented above. Hybridization aims to keep the advantages of these systems while avoiding their disadvantages.

    The advantages and disadvantages of these systems are as follows.
  • Huang et al. presented a method for keyword search and recommendation for digital libraries using a two-layered graph architecture. The first layer of the graph contains nodes of customer type and the other layer contains nodes of book type. The relationships within the graph layers show the similarity between customers and between books, respectively.

  • Another video recommender system, presented by Bogers et al., uses contextual information to build the graph-based data model. In the contextual graph, the node types are users, movies, tags, actors and genres. A random walk algorithm, namely ContextWalk, is applied to calculate the similarity between node types. The main advantage of this approach is that the similarity between nodes of different types and of the same type can be examined using the graph-based data model.
  • In this thesis, we present a graph-based core model to represent users and items, including their inter-item relevancy, in the TV domain. This graph-based model comprises different node types and weighted or unweighted edges, representing the items and the relatedness of these items respectively. The constructed core model can be the base data model for different types of applications.
  • The contents of TV programs and the interests of users in these TV programs are modeled as nodes connected by both weighted and unweighted edges to form a graph. An edge weight between two nodes represents the degree of relatedness between them. Each edge weight is calculated using a metric that depends on the types of the nodes it connects.
    A user profile is a sub-graph that includes a user node and all the other nodes and edges obtained by a traversal starting from this user node. Similarly, a program profile is a sub-graph that includes a program node and all the other nodes and edges obtained by a traversal starting from this program node. Thus, a user profile and a program profile may contain all types of nodes.
  • Let's look at the node types and edges in detail. There are user nodes, which represent real-world users. There are TV program nodes for the programs these users have watched. There are genre and time-of-day nodes, which are gathered from TV programs. There are also actor and director nodes, which are attributes of TV programs, and term and named-entity nodes, which are gathered from the descriptions of TV programs.

    These nodes are categorized so that a new node type added to the model fits one of the categories and uses the same edge weight metric. User and program nodes are categorized as entity nodes, since they correspond to real-world entities whose profiles can be extracted. Time-of-day and genre nodes are categorized as context nodes, since they can improve the quality of the system as context variables. Actors and directors are connected to program nodes as attributes, and descriptor nodes comprise named-entity and term nodes.

  • The edge weight metric (relatedness metric) between user nodes and program nodes is calculated using rating values.

    The relatedness between programs and term and named-entity nodes is measured by TF-IDF.

    Moreover, nodes of the same type are connected by co-occurrence relations according to their occurrence in the same TV program. These relations are directed, and the co-occurrence weight between the same two nodes differs by direction, since the formula takes the other nodes into account. (Show on the board.)
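
The notes say that program-to-term and program-to-named-entity edges are weighted by TF-IDF but do not give the exact variant. A minimal sketch, assuming the textbook form (term frequency normalized by description length, multiplied by the log of the inverse document frequency); the function name, the `program:` identifiers and the sample term lists are illustrative assumptions, not data from the thesis.

```python
import math
from collections import Counter


def tf_idf(program_terms):
    """Return {program: {term: weight}} for a {program: [terms]} mapping."""
    n_programs = len(program_terms)
    # Document frequency: in how many program descriptions each term occurs.
    doc_freq = Counter(
        term for terms in program_terms.values() for term in set(terms)
    )
    weights = {}
    for program, terms in program_terms.items():
        counts = Counter(terms)
        weights[program] = {
            term: (count / len(terms)) * math.log(n_programs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical stemmed terms for two programs.
weights = tf_idf({
    "program:p1": ["belgesel", "aslan", "göl"],
    "program:p2": ["belgesel", "spor", "gol"],
})
print(weights["program:p1"])
```

With this variant a term that appears in every description gets weight 0, so only distinguishing terms contribute to the edge weights.
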
  • The graph-based model we propose can be used in different systems in the TV domain.
    Since the content of TV programs is included in the graph model, content-based recommendation systems can be created over this model.
    Since it includes the relations between users through program nodes, it can be employed for collaborative filtering approaches.
    Since there are context nodes, context-aware systems can also be built on it.

    Moreover, if information about which users watch TV programs together is available, these user nodes can be merged into a group node, and TV programs can be recommended to these groups.

    If the channel lists of users are available, these lists can be arranged according to the user profiles gathered from this graph-based model.

    Other types of items, for example theaters and cinemas, can also be recommended to users according to their previous TV watching interests.

    Targeted advertisements based on users' TV program preferences are also possible over this model.

  • The user logs were collected from Arçelik, Beko and Grundig smart TVs between these dates. Approximately 10 million logs were collected from 2938 distinct users after cleaning operations.
  • The logs coming from TV devices are shown on the left, and the attributes we used in this research are shown on the right.
  • There are several ways to collect TV program content information. It can be gathered from the EPG, but in Turkey channels do not share mature content information over the EPG.
    There are content provider companies that supply TV program information, but they are too expensive to employ in research.
    Web crawling and scraping were used to collect program information. Digiturk and Radikal were analyzed, and Radikal was chosen since its content is much richer than Digiturk's.
  • Here you can see a sample of the TV program content provided by Radikal.

  • TV program data was collected from Radikal in the same time interval as the user logs. In that interval, 3769 distinct TV programs, 36 distinct genres, 1653 distinct actors, 469 distinct directors, 676 distinct named entities and 3159 distinct terms were collected.
  • When deciding on the intervals of the time-of-day slots, we used the dayparting article on Wikipedia[60] and merged some of the day parts that were too short for our purposes.
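
The day-part table on slide 21 can be turned into a small lookup. A minimal sketch, assuming each slot includes its start time and excludes its end time (the slides do not say how boundaries are handled); `DAY_PARTS` and `day_part` are illustrative names.

```python
from datetime import time

# (slot start, label) pairs from slide 21, in ascending order of start time.
DAY_PARTS = [
    (time(0, 0), "NIGHT"),
    (time(4, 0), "EARLY MORNING"),
    (time(7, 0), "BREAKFAST"),
    (time(9, 0), "LATE MORNING"),
    (time(13, 0), "DAYTIME"),
    (time(18, 0), "EVENING"),
    (time(20, 30), "PRIME TIME"),
]


def day_part(start_time):
    """Return the label of the last slot that starts at or before start_time."""
    label = DAY_PARTS[0][1]
    for slot_start, name in DAY_PARTS:
        if start_time >= slot_start:
            label = name
    return label


print(day_part(time(20, 45)))
```

A program's start time then yields the time-of-day context node it is attached to.
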
  • Another enhancement process was applied to the description field of the TV program content.
    ZEMBEREK is used for analyzing the terms in the description. First, the description is tokenized into words.
    After stop words are removed from this word list, the words are stemmed and added to the program content information.
  • In order to extract the named entities from the description field of TV programs, DBpedia spotting APIs were used.
    For example, in this description the words were annotated as surface forms, as seen in the picture.
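
The annotation shown on slide 23 is plain XML, so the surface forms can be read with a standard parser. A minimal sketch using Python's `xml.etree.ElementTree` on the response copied from the slide; how the response is requested from the API is not covered here, and `surface_forms` is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Annotation response as shown on slide 23.
ANNOTATION = """<annotation text="Mehmet Yaşin lezzet rotasını bu kez çok uzaklara, Avrupa'nın çatısı Norveç'e çeviriyor. 3 bölüm sürecek olan uzun Norveç gezisinin ilk durağı, dünyanın en kuzeyinde, kuzey kutup noktasından önce üzerinde insan yaşamı olan son ada Svalbard.">
  <surfaceForm name="Mehmet Yaşin" offset="0"/>
  <surfaceForm name="Norveç" offset="114"/>
  <surfaceForm name="Svalbard" offset="230"/>
</annotation>"""


def surface_forms(xml_text):
    """Return (name, offset) pairs for every surfaceForm element."""
    root = ET.fromstring(xml_text)
    return [
        (element.get("name"), int(element.get("offset")))
        for element in root.iter("surfaceForm")
    ]


print(surface_forms(ANNOTATION))
```

Each extracted name would become a named-entity node linked to the program it was found in.
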
  • In order to create the graph-based model, the user logs and the TV program content information needed to be aggregated.
    Since the common attributes were channel name, start time and end time, the aggregation is done by matching these fields.
    In order to match the channel names, a channel name mapper was created.
  • For each channel, the logs were analyzed and matched with the Radikal TV program content information.
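
The aggregation step can be sketched as a channel-name normalizer plus a time-window match. This is a minimal sketch under stated assumptions: the normalizer simply removes whitespace and case-folds (enough for the ATVHD, AtvHD, Atv HD and ATV HD variants on slide 25, though the thesis mapper may work differently), and a log is matched to the first program on the same channel whose airing overlaps the watch event. The function names and the sample records are hypothetical; the field names follow slide 17.

```python
import re
from datetime import datetime


def normalize_channel(name):
    """Collapse spelling variants of a channel name into one key."""
    return re.sub(r"\s+", "", name).casefold()


def find_program(log, programs):
    """Return the first program on the same channel that overlaps the watch event."""
    key = normalize_channel(log["channel_name"])
    for program in programs:
        same_channel = normalize_channel(program["channel_name"]) == key
        overlaps = (program["start_time"] < log["end_time"]
                    and log["start_time"] < program["end_time"])
        if same_channel and overlaps:
            return program
    return None


# Hypothetical sample records.
programs = [{
    "title": "Example Show",
    "channel_name": "Atv HD",
    "start_time": datetime(2013, 12, 1, 20, 30),
    "end_time": datetime(2013, 12, 1, 22, 0),
}]
log = {
    "user_id": 1,
    "channel_name": "ATVHD",
    "start_time": datetime(2013, 12, 1, 21, 0),
    "end_time": datetime(2013, 12, 1, 21, 40),
}
print(find_program(log, programs)["title"])
```
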
  • Spreading activation is an algorithm designed for searching over associative networks, neural networks or semantic networks.
    The standard method labels nodes with a weight called “activation” and propagates it over the connected nodes. While propagating, the nodes are labeled with an activation value that decays with each propagation step. Activation may reach a node through different paths.
    In a graph-based system, the associations are weighted according to the relatedness of the items. An item is labeled with an activation value, and the associated items are also activated, with a weight that decreases according to the association weights. At the end of the propagation, the related items are gathered and ranked by their activation values.
    By activating a user node and traversing the linked nodes to spread the activation value, the system collects the TV programs closest to that user.
  • Spreading activation starts with the node of the user to whom TV programs will be recommended.
    Propagation then moves to the program nodes; each connected program node is labeled with an activation value derived from the rating edge weights.
    Program nodes with the same genre and the same time of day are marked for propagation (context pre-filtering).
    For each program node, propagation moves over the actor, director, term and named-entity nodes and sets the activation value according to the given formulas.
    The decay factor is the heuristic we used to differentiate the weights of different node types in the similarity.
    The unwatched TV programs are collected with their activation values, which we use to rank these programs.
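
The propagation described above can be sketched in a few lines. This is a simplified sketch, not the thesis formulas: it assumes the value passed along an edge is the sender's activation times the edge weight times the decay factor, applies the same decay of 0.6 on every hop (the slides apply it to actor, director, named-entity and term nodes), processes the graph level by level, and stops propagating at nodes whose activation is 0.2 or lower. Context pre-filtering is omitted. All node names, edge weights and function names are hypothetical.

```python
from collections import defaultdict

DECAY_FACTOR = 0.6  # value from slide 28; applied on every hop in this sketch
THRESHOLD = 0.2     # value from slide 28; nodes at or below it do not propagate


def add_edge(graph, a, b, weight):
    """Add an undirected weighted edge."""
    graph[a][b] = weight
    graph[b][a] = weight


def spread_activation(graph, start, decay=DECAY_FACTOR, threshold=THRESHOLD):
    """Spread activation level by level from start; return {node: activation}."""
    activation = {start: 1.0}
    frontier = {start: 1.0}
    while frontier:
        incoming = defaultdict(float)
        for node, value in frontier.items():
            for neighbor, weight in graph.get(node, {}).items():
                if neighbor not in activation:  # never re-activate earlier levels
                    incoming[neighbor] += value * weight * decay
        # Only nodes above the threshold are kept and propagate further.
        frontier = {n: v for n, v in incoming.items() if v > threshold}
        activation.update(frontier)
    return activation


def recommend(graph, user, watched):
    """Rank unwatched program nodes by their activation value."""
    scores = spread_activation(graph, user)
    candidates = [
        (node, score) for node, score in scores.items()
        if node.startswith("program:") and node not in watched
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


graph = defaultdict(dict)
add_edge(graph, "user:u1", "program:p1", 1.0)   # rating edge
add_edge(graph, "program:p1", "actor:a1", 1.0)
add_edge(graph, "actor:a1", "program:p2", 1.0)
add_edge(graph, "program:p1", "term:t1", 0.5)   # weak term link
add_edge(graph, "term:t1", "program:p3", 1.0)

recommendations = recommend(graph, "user:u1", watched={"program:p1"})
print(recommendations)
```

In this toy graph the path through the actor keeps enough activation to reach `program:p2`, while the weakly weighted term falls below the threshold, so `program:p3` is never reached.
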
  • So, since the spreading activation algorithm propagates over the content of the TV programs, our system is a content-based system.
    Since pre-filtering is applied according to genre and time of day, our system is a context-aware system.
    Since the propagation can move through other user nodes, our system is a collaborative system.
  • When a new program enters the system, the terms appearing in that program are looked up in the inverted index. Based on the ratings (interests) of users on these terms, the set of users with a notable interest in these terms is collected, and the program is recommended to this user set.
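
The baseline lookup described above can be sketched with a plain dictionary as the inverted index. This is a minimal sketch under assumptions: the index maps each term to per-user ratings (the “belgesel” row is taken from slide 33), a user's score for a new program is the sum of their ratings over the program's terms, and `min_score` is a hypothetical cut-off standing in for "notable interest". It does not reproduce the Apriori-based index construction mentioned on the slide.

```python
from collections import defaultdict

# Inverted index: term -> {user: rating}. The "belgesel" row is from slide 33.
inverted_index = {
    "belgesel": {"User1": 23, "User2": 35, "User3": 13, "User4": 4},
}


def interested_users(index, program_terms, min_score):
    """Return users whose summed rating over the program's terms reaches min_score."""
    scores = defaultdict(int)
    for term in program_terms:
        for user, rating in index.get(term, {}).items():
            scores[user] += rating
    selected = [user for user, score in scores.items() if score >= min_score]
    return sorted(selected, key=lambda user: scores[user], reverse=True)


# A new program whose description contains one indexed and one unknown term.
targets = interested_users(inverted_index, ["belgesel", "aslan"], min_score=20)
print(targets)
```
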
