IBM China Research Laboratory




                        Social Map Based Recommendation for
                        Content-Centric Social Websites




  IBM Research - China
  Presenter: Shiwan Zhao (zhaosw@cn.ibm.com)

  Pharos Team:

                                Shiwan Zhao (赵石顽)   Quan Yuan (袁泉)   Xiatian Zhang (张夏天)   Wentao Zheng (郑文涛)

       Advisors: Michelle Zhou, Rongyao Fu, Changyan Chi


About me

   1993~1998
     – B.S. Computer Science, Tsinghua University
   1998~2000
     – M.S. Computer Science, Tsinghua University
   2000~now
     – IBM Research - China


   2007~now
     – Focus on recommendation technologies



Agenda

   Part 1:
   – Problem & challenges
   – Pharos solution overview
   – Demo
   Part 2:
   – Some technology details






Problem

   Content-centric social websites (e.g., forums,
   wikis, and blogs) have flourished with the
   exponential growth of user-generated information
   – Overwhelming amount
   – Evolving over time
   – Not well organized
   It is hard for users, especially new users, to grasp
   what is out there and then find the information that
   interests them



Example


      A blog website contains a huge amount of dynamically evolving content (blog
      entries) but does not provide effective ways to navigate it
       – Search
           • Useful when users have well-defined goals
       – Recent entries
       – Top entries by
           • most comments
           • most ratings
           • most visits
       – Featured blog entries
       – Tag cloud
       – …
      Without guidance, finding something is like looking for a needle in a haystack:
      novice users cannot find anything interesting, leave BlogCentral quickly,
      and do not come back (low stickiness)





Existing solutions & challenges

   Researchers have developed recommender
   systems to solve this information overload
   problem
   – E.g., blog, news, and web page recommenders


   However, current recommenders face two challenges:
   – It is difficult to make effective recommendations for new users
     (the cold start problem) because little user information
     is available
   – It is difficult to explain the rationale behind a recommendation
     to end users, which would make the recommendation more trustworthy

Pharos Solution
   Dynamically create a social map that helps users find out who is talking
                      about what on an online site.

 Social map creation
 – Model and summarize the time-sensitive user behavior of
   content-centric online sites as a set of “latent communities”
 Social map based recommendations
 – Provide social landmarks that give new users a jump start
 – Provide a personalized social map that helps experienced users
   navigate the community effectively

Demo screenshot

[Screenshot of the demo; visible user labels: John, Steve, Michael, Alice, Tom]


Agenda

   Part 1:
   – Problem & challenges
   – Pharos solution overview
   – Demo
   Part 2:
   – Some technology details





Pharos Overview

[Architecture diagram; components as labeled on the slide]

   Triggers: explicit, implicit
   Visual recommendation explanations
   Recommendation algorithms
   * Multi-faceted recommendation
     – Info item (page, fragment)
     – People (reference to Bluepages, URL)
     – Community (latent, dynamic community)
   Social map (target user shown on the map, along a time axis)
     – Time-sensitive social map as recommendation context
   Content modeling
   Behavior mining
   User behavior on content

Pharos Technical Focus

[Architecture diagram: user behavior on content; content modeling and behavior mining; social map (target user, time axis); recommendation algorithms; visual recommendation explanations]

 Focus areas marked on the diagram:
  1. Latent community extraction
  2. Community/item/people ranking
  3. Community summary


Latent community extraction

    Three approaches
    – Model user-content relationships directly with co-clustering methods
    – Group people first, then find the associated content
    – Group content first, then find the associated people





Approach 1: time-elastic co-clustering

     How large a time window should we use to mine the communities?

     [Figure: community map along a time axis, illustrating time-elastic ad hoc
     community detection; callouts: "How long is right?", "April 2009"]

       GraphScope: Parameter-free Mining of Large Time-evolving
       Graphs, Jimeng Sun, et al. KDD’07
Input Data – Graph Stream
 User actions form a stream

 [Figure: user actions plotted as points along a time axis]

       Split the click stream into many small atomic time frames

 [Figure: the same stream divided into frames along the time axis]

       The click stream data of one frame can be represented as a
       user-item matrix (graph).
        – In the matrix, 1 means one interaction between user and item.
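
The framing step above can be sketched in Python as follows. This is a minimal illustration, not the Pharos implementation: it assumes click events arrive as (timestamp, user, item) tuples and that a frame is a fixed-width time interval. The function name `frames_to_matrices`, the tuple layout, and the frame width are all hypothetical. Repeated clicks within a frame are collapsed to a single 1, which is also an assumption.

```python
from collections import defaultdict


def frames_to_matrices(events, frame_seconds):
    """Split a click stream into fixed-width time frames and build one
    binary user-item matrix per frame (hypothetical helper)."""
    users = sorted({u for _, u, _ in events})
    items = sorted({i for _, _, i in events})
    row = {u: r for r, u in enumerate(users)}
    col = {i: c for c, i in enumerate(items)}

    # Group (user, item) pairs by the frame their timestamp falls into.
    by_frame = defaultdict(set)
    for ts, user, item in events:
        by_frame[int(ts // frame_seconds)].add((user, item))

    matrices = {}
    for frame, pairs in by_frame.items():
        m = [[0] * len(items) for _ in users]
        for user, item in pairs:
            m[row[user]][col[item]] = 1  # 1 = user interacted with item
        matrices[frame] = m
    return users, items, matrices


events = [
    (0, "alice", "post1"),
    (30, "bob", "post1"),
    (70, "alice", "post2"),
    (95, "alice", "post2"),  # repeated click, still a single 1
]
users, items, matrices = frames_to_matrices(events, frame_seconds=60)
```

Every frame's matrix shares the same rows and columns, so the frames can be compared or merged directly.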
Approach
   Two steps
    – Co-cluster the graphs
    – Decide whether a newly arrived graph should be merged into the
      current segment or should start a new segment
   Based on the MDL (Minimum Description Length) of the
   graphs
    – The MDL is the limit to which the graphs can be compressed
    – It decides whether to merge into a segment or split off a new one
        • If compressing the graphs together saves more encoding cost
          than compressing them separately, merge the new graph
          into the current segment.
        • Otherwise, start a new segment with the new graph.
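
The merge-or-split rule can be illustrated with a deliberately simplified stand-in for the encoding cost. The sketch below is not GraphScope's actual cost function. It assumes the row and column groupings are already given, charges the partition once per segment, and charges each block of each graph its entropy in bits. It also reuses the current segment's partition for the merged case instead of searching for a new one. All function names are hypothetical.

```python
import math


def entropy_bits(ones, total):
    """Bits needed to encode `total` binary cells of which `ones` are 1."""
    if total == 0 or ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -total * (p * math.log2(p) + (1 - p) * math.log2(1 - p))


def segment_cost(graphs, row_groups, col_groups):
    """Simplified encoding cost of a segment: the shared partition is
    charged once, then every block of every graph is entropy-coded."""
    n_rows = sum(len(g) for g in row_groups)
    n_cols = sum(len(g) for g in col_groups)
    cost = n_rows * math.log2(len(row_groups)) + n_cols * math.log2(len(col_groups))
    for graph in graphs:
        for rows in row_groups:
            for cols in col_groups:
                ones = sum(graph[r][c] for r in rows for c in cols)
                cost += entropy_bits(ones, len(rows) * len(cols))
    return cost


def should_merge(segment, seg_partition, new_graph, new_partition):
    """Merge when encoding everything together is cheaper than encoding
    the current segment and the new graph separately."""
    merged = segment_cost(segment + [new_graph], *seg_partition)
    separate = (segment_cost(segment, *seg_partition)
                + segment_cost([new_graph], *new_partition))
    return merged < separate


g1 = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
g3 = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
p1 = ([[0, 1], [2, 3]], [[0, 1], [2, 3]])  # partition that fits g1
p2 = ([[0, 2], [1, 3]], [[0, 2], [1, 3]])  # partition that fits g3

merge_same = should_merge([g1], p1, g1, p1)  # same structure arrives again
merge_diff = should_merge([g1], p1, g3, p2)  # community structure changed
```

A graph with the same block structure is absorbed into the segment, while a graph whose structure has changed is cheaper to encode on its own and starts a new segment.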



Pros and cons
    Pros
     – Clusters users and items at the same time
     – Parameter free
           • No need to specify the number of clusters
     – Automatically decides the size of the time window
    Cons
     – Fixed graph size
           • All graphs must have the same size (rows and columns)
           • Cannot handle new users and items
     – Cannot handle large-scale graphs
     – Cannot guarantee the optimal result
     – Results on very sparse graphs are poor
           • The communities do not make sense.
           • Our data is extremely sparse (density < 0.1%)

Approach 2: evolutionary spectral clustering for user clustering

     Discover communities within a time window
      – Get high quality clustering in each time window
     Model community evolution for a sequence of time windows
      – Make the evolution between time windows smooth

     [Figure: community map with one snapshot of communities per time window
     along a time axis (Jan 2009, Feb 2009, Mar 2009, Apr 2009), in the
     BlogCentral domain]
Evolutionary framework

      Basic idea
       – Cost function: Cost = α·CS + β·CT
            • Snapshot cost (CS) measures the snapshot quality of the current
              clustering result with respect to the current data features
            • Temporal cost (CT) measures the temporal smoothness in terms of the
              goodness-of-fit of the current clustering result with respect to either
              historic data features or historic clustering results
      Two evolutionary frameworks
       – PCQ (preserving cluster quality): the current partition is applied to
         historic data, and the resulting cluster quality determines the temporal
         cost.
       – PCM (preserving cluster membership): the current partition is compared
         directly with the historic partition, and the resulting difference
         determines the temporal cost.
       – PCQ is the framework we currently implement
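
A small sketch of how the PCQ cost could be evaluated for a candidate partition. It assumes the negated average association as the cluster-quality measure, which is one possible choice and not confirmed by the slides. The function names, the toy similarity matrices, and the weights α = 0.8 and β = 0.2 are illustrative assumptions.

```python
def negated_average_association(w, labels):
    """Cost of a partition on similarity matrix w: trace(w) minus the sum,
    over clusters, of within-cluster similarity divided by cluster size.
    Lower is better."""
    trace = sum(w[i][i] for i in range(len(w)))
    clusters = {}
    for node, label in enumerate(labels):
        clusters.setdefault(label, []).append(node)
    assoc = sum(
        sum(w[i][j] for i in members for j in members) / len(members)
        for members in clusters.values()
    )
    return trace - assoc


def pcq_cost(w_now, w_prev, labels, alpha=0.8, beta=0.2):
    """PCQ: score the current partition on the current similarity matrix
    (snapshot cost CS) and on the historic one (temporal cost CT)."""
    cs = negated_average_association(w_now, labels)
    ct = negated_average_association(w_prev, labels)
    return alpha * cs + beta * ct


# Last window: users {0, 1} and {2, 3} formed two clear communities.
w_prev = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
# Current window: two partitions fit the snapshot equally well.
w_now = [[1, 0.5, 0.5, 0], [0.5, 1, 0, 0.5], [0.5, 0, 1, 0.5], [0, 0.5, 0.5, 1]]

cost_stable = pcq_cost(w_now, w_prev, [0, 0, 1, 1])   # agrees with history
cost_shuffled = pcq_cost(w_now, w_prev, [0, 1, 0, 1])  # contradicts history
```

Both partitions have the same snapshot cost here, so the temporal term breaks the tie in favor of the partition that stays consistent with the previous window.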



       Evolutionary Spectral Clustering by Incorporating Temporal
       Smoothness, Yun Chi, et al. KDD’07

Approach 3: LDA for content clustering

     Latent Dirichlet Allocation (LDA), a probabilistic latent
     semantic model for topic analysis
        $$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta$$
                                                                 [Blei et al. 03]

     LDA is a generative probabilistic model of a corpus. The basic
     idea is that the documents are represented as random mixtures
     over latent topics, where a topic is characterized by a
     distribution over words.
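
The generative story behind LDA can be sketched with the Python standard library alone. This is an illustration of the sampling process only, not of inference or of the Pharos pipeline. The toy values of `alpha` and `beta`, the vocabulary size, and the function names are assumptions made for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, n_words, rng):
    """Generate one document under LDA's generative process.

    alpha: Dirichlet parameter over the k topics.
    beta:  k rows, each a word distribution over the vocabulary.
    """
    theta = sample_dirichlet(alpha, rng)  # topic mixture of this document
    topics = range(len(alpha))
    vocab = range(len(beta[0]))
    words = []
    for _ in range(n_words):
        z = rng.choices(topics, weights=theta)[0]   # pick a topic z_n
        w = rng.choices(vocab, weights=beta[z])[0]  # pick a word w_n given z_n
        words.append(w)
    return theta, words


rng = random.Random(0)
alpha = [0.5, 0.5]
beta = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0 uses words 0 and 1 only
    [0.0, 0.0, 0.5, 0.5],  # topic 1 uses words 2 and 3 only
]
theta, words = generate_document(alpha, beta, n_words=20, rng=rng)
```

Each word is drawn by first choosing a topic from the document's mixture `theta` and then choosing a word from that topic's distribution, which mirrors the product over words and the sum over topics in the formula.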





Graphical Model of LDA

[Figure: LDA graphical model]


Latent community extraction - comparison

    Co-clustering
    – Does not work well on extremely sparse data (<0.1%)
    Spectral clustering for users
    – Most behavior comes from anonymous users, who are difficult to
      tell apart
    – Topics are not concentrated within each community
    * LDA for content clustering
    – Users are more likely to be interested in content





Pharos Technical Focus

[Architecture diagram: user behavior on content; content modeling and behavior mining; social map (target user, time axis); recommendation algorithms; visual recommendation explanations]

 Focus areas marked on the diagram:
  1. Latent community extraction
  2. Item/people ranking
  3. Community summary
Item/People Ranking
 Authority-based ranking by context-sensitive PageRank, considering
 – Time factor
 – Context information, e.g., item attributes, report chain of people

 $$PR(p_i) = (1 - d)\, cv_i + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$

 where cv_i is the context vector (e.g., item attributes)

 [Figure: graph linking people (A, B, C, D) to blog entries (1, 2, 3, 4)]
 Edge types in the figure:
 – Authority from author to entry
 – Authority from entry to author
 – Authority from commenter/rater to entry
 – Authority from visitor to entry

 Influential people: active authors with high-quality entries
 Influential entry: written by influential authors; highly visited or commented
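
The ranking formula can be computed by simple power iteration, as in the sketch below. It reads d as the usual PageRank damping factor, M(p_i) as the set of nodes linking to p_i, and L(p_j) as the number of outgoing links of p_j, following the standard PageRank notation. The time factor is not modeled. The toy graph, the uniform context vector, the dangling-node handling, and the function name `context_pagerank` are all assumptions made for illustration.

```python
def context_pagerank(out_links, context, d=0.85, iterations=50):
    """Power iteration for PR(p_i) = (1 - d) * cv_i + d * sum_j PR(p_j) / L(p_j).

    out_links: dict mapping each node to the list of nodes it points to.
    context:   dict of non-negative context weights cv_i that sum to 1.
    """
    nodes = list(out_links)
    pr = dict(context)
    for _ in range(iterations):
        nxt = {n: (1 - d) * context[n] for n in nodes}
        for src in nodes:
            targets = out_links[src]
            if not targets:
                # Dangling node: spread its score by context (assumption).
                for n in nodes:
                    nxt[n] += d * pr[src] * context[n]
                continue
            share = d * pr[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        pr = nxt
    return pr


# Authors A and B; A wrote e1 and e2, B wrote e3 and commented on e1.
out_links = {
    "A": ["e1", "e2"],   # authority from author to entry
    "B": ["e3", "e1"],   # own entry plus a comment on e1
    "e1": ["A"],         # authority from entry to author
    "e2": ["A"],
    "e3": ["B"],
}
context = {n: 1 / len(out_links) for n in out_links}  # uniform context vector
scores = context_pagerank(out_links, context)
```

Replacing the uniform `context` with weights derived from item attributes or the reporting chain would bias the ranking toward that context, which is the role cv_i plays in the formula.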

Pharos Technical Focus

[Architecture diagram: user behavior on content; content modeling and behavior mining; social map (target user, time axis); recommendation algorithms; visual recommendation explanations]

 Focus areas marked on the diagram:
  1. Latent community extraction
  2. Item/people ranking
  3. Community summary


Community Summary & visualization

  Community representative keywords extraction
   – Modified TF/IDF
   – Content topic modeling by LDA (Latent Dirichlet Allocation)
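
As a rough illustration of keyword extraction per community, the sketch below uses plain TF-IDF and treats all text belonging to one community as a single document. The slide does not say how TF/IDF was modified, so this is only the unmodified baseline. The function name `community_keywords`, the whitespace tokenization, and the sample data are hypothetical.

```python
import math
from collections import Counter


def community_keywords(community_docs, top_n=3):
    """Rank words per community with plain TF-IDF, treating all text of a
    community as one document (baseline only, not the modified variant)."""
    counts = {
        c: Counter(" ".join(docs).lower().split())
        for c, docs in community_docs.items()
    }
    n = len(counts)
    df = Counter()  # number of communities in which each word appears
    for words in counts.values():
        df.update(words.keys())

    keywords = {}
    for c, words in counts.items():
        total = sum(words.values())
        scores = {w: (tf / total) * math.log(n / df[w]) for w, tf in words.items()}
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[c] = ranked[:top_n]
    return keywords


community_docs = {
    "cloud": ["cloud storage pricing", "cloud migration tips"],
    "agile": ["agile sprint planning", "agile retrospective tips"],
}
keywords = community_keywords(community_docs, top_n=1)
```

Words shared by every community, such as "tips" here, receive a score of zero and drop out of the labels.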




  Visualization
   – A bubble chart layout (used by ManyEyes) packs the top-N
     communities tightly on the social map
       • Each bubble’s size is determined by the community’s ‘hotness’
   – Inside each community, a Wordle layout packs the labels
     tightly


Summary
 Model, detect, and use a social map that summarizes user behavior on
 online sites to make accurate and trustworthy recommendations


   Increase recommendation accuracy
   – Eases the “cold start” problem by providing new users with “social landmarks” of
     a social site to jump-start their engagement
   – Provides users with overall social awareness to compensate for
     recommendation inaccuracy
   Enhance recommendation trustworthiness
   – Explains recommendation results in the context of a social map
   Interactive recommendation
   – Users can navigate the social map to find what they need






                      Thanks!



Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 

Dernier (20)

DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 

Pharos Social Map Based Recommendation For Content Centric Social Websites

  • 1. IBM China Research Laboratory. Social Map Based Recommendation for Content-Centric Social Websites. IBM Research - China. Presenter: Shiwan Zhao (zhaosw@cn.ibm.com). Pharos Team: 赵石顽, 袁泉, 张夏天, 郑文涛. Advisors: Michelle Zhou, Rongyao Fu, Changyan Chi.
  • 2. About me. 1993–1998: B.S. in Computer Science, Tsinghua University. 1998–2000: M.S. in Computer Science, Tsinghua University. 2000–present: IBM Research - China. 2007–present: focus on recommendation technologies.
  • 3. Agenda. Part 1: problem and challenges; Pharos solution overview; demo. Part 2: some technology details.
  • 4. Problem. Content-centric social websites (e.g., forums, wikis, and blogs) have flourished with the exponential growth of user-generated information, which is overwhelming in amount, evolves over time, and is not well organized. It is hard for users, especially new users, to grasp what is out there and then find the information that interests them.
  • 5. Example. A blog website contains a huge amount of dynamically evolving content (blog entries) but does not provide effective ways to navigate it. The available options are: search (useful only when users have well-defined goals); recent entries; top entries by most comments, most ratings, or most visits; featured blog entries; tag cloud; … Without guidance, it is like looking for a needle in a haystack: novice users cannot find anything interesting, leave BlogCentral quickly, and do not come back (low stickiness).
  • 6. Existing solutions & challenges. Researchers have developed recommender systems to solve this information overload problem, e.g., blog, news, and webpage recommenders. However, current recommenders must address two challenges: (1) it is difficult to make effective recommendations for new users (the cold start problem) because little is known about them; (2) it is difficult to explain the rationale behind a recommendation to end users, which is needed to make the recommendation more trustworthy.
  • 7. Pharos Solution. Dynamically create a social map that helps users find out who is talking about what on an online site. Social map creation: model and summarize the time-sensitive user behaviors of content-centric online sites as a set of “latent communities”. Social map based recommendations: provide social landmarks that give new users a jump start; provide a personalized social map that lets experienced users navigate the community effectively.
  • 8. Demo screenshot. [Figure: social map screenshot; people labeled on it: John, Steve, Michael, Alice, Tom.]
  • 9. Agenda. Part 1: problem and challenges; Pharos solution overview; demo. Part 2: some technology details.
  • 10. Pharos Overview. [Figure: architecture diagram. Components, in the order they appear: multi-faceted recommendation (*) of info items (page, fragment), people (reference to Bluepages, URL), and communities (latent, dynamic community); triggers (explicit, implicit); visual recommendation explanations; recommendation algorithms; a time-sensitive social map used as the recommendation context for the target user, laid out along a time axis; content modeling and behavior mining; user behavior on content.]
  • 11. Pharos Technical Focus. [Figure: the architecture diagram (visual recommendation explanations; recommendation algorithms; social map for the target user over time; content modeling and behavior mining; user behavior on content) annotated with three focus areas: 1. latent community extraction; 2. community/item/people ranking; 3. community summary.]
  • 12. Latent community extraction. Three approaches: (1) directly model user-content relationships with co-clustering methods; (2) group people first, then find the associated content; (3) group content first, then find the associated people.
  • 13. Approach 1: time-elastic co-clustering. How large a time window should we use to mine the communities? What length is right? [Figure: time-elastic ad hoc community detection along a time axis, producing a community map; label: April 2009.] Reference: GraphScope: Parameter-free Mining of Large Time-evolving Graphs, Jimeng Sun, et al. KDD’07.
  • 14. Input Data – Graph Stream. User actions form a stream over time. The click stream is split into many small atomic time frames. The click stream data of one frame can be represented by a user-item matrix (a graph); in the matrix, 1 means an interaction between a user and an item. [Figure: click stream along a time axis, before and after splitting into frames.]
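The framing step on slide 14 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `(timestamp, user, item)` event format, the function name `frames_to_matrices`, and the fixed user and item lists (fixed so that every frame yields a matrix of the same size).

```python
def frames_to_matrices(events, frame_len, users, items):
    """Split a click stream into fixed-length time frames and build one
    binary user-item matrix per non-empty frame (hypothetical helper).

    events: iterable of (timestamp, user, item) tuples (assumed format).
    """
    user_index = {u: i for i, u in enumerate(users)}
    item_index = {it: j for j, it in enumerate(items)}
    frames = {}
    for timestamp, user, item in events:
        frame_id = int(timestamp // frame_len)
        if frame_id not in frames:
            frames[frame_id] = [[0] * len(items) for _ in users]
        # 1 marks an interaction between this user and this item.
        frames[frame_id][user_index[user]][item_index[item]] = 1
    return [frames[k] for k in sorted(frames)]


clicks = [(0, "a", "x"), (5, "b", "y"), (12, "a", "y")]
matrices = frames_to_matrices(clicks, frame_len=10, users=["a", "b"], items=["x", "y"])
```

With a frame length of 10, the first two clicks fall into one frame and the third into the next, so `matrices` holds two 2×2 matrices.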
  • 15. Approach. Two steps: (1) co-cluster the graphs; (2) decide whether a newly arrived graph should be merged with the current segment or start a new segment. The decision is based on the MDL (Minimum Description Length) of the graphs, i.e., the limit to which the graphs can be compressed. If compressing the graphs together saves more encoding cost than compressing them separately, the new graph is merged with the current segment; otherwise, the new graph starts a new segment.
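The merge-or-split rule on slide 15 can be sketched as follows. This is a simplified illustration under stated assumptions, not the GraphScope encoding itself: the co-clustering is taken as given (row and column labels are inputs), the partition cost is approximated as log2(number of clusters) bits per node, and each block is charged its binary entropy. All function names are hypothetical.

```python
import math


def entropy_bits(ones, total):
    """Bits needed to encode a block of `total` cells containing `ones` 1s."""
    if total == 0 or ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -total * (p * math.log2(p) + (1 - p) * math.log2(1 - p))


def partition_bits(row_labels, col_labels):
    """Assumed model cost: log2(#clusters) bits per row and per column."""
    k, l = len(set(row_labels)), len(set(col_labels))
    return len(row_labels) * math.log2(k) + len(col_labels) * math.log2(l)


def segment_bits(graphs, row_labels, col_labels):
    """Total cost of a segment: one shared partition plus pooled block entropy."""
    ones, total = {}, {}
    for graph in graphs:
        for r, row in enumerate(graph):
            for c, value in enumerate(row):
                block = (row_labels[r], col_labels[c])
                ones[block] = ones.get(block, 0) + value
                total[block] = total.get(block, 0) + 1
    data = sum(entropy_bits(ones[b], total[b]) for b in total)
    return partition_bits(row_labels, col_labels) + data


def should_merge(segment, seg_rows, seg_cols, new_graph, new_rows, new_cols):
    """Merge when encoding everything together is no more costly than apart."""
    together = segment_bits(segment + [new_graph], seg_rows, seg_cols)
    apart = (segment_bits(segment, seg_rows, seg_cols)
             + segment_bits([new_graph], new_rows, new_cols))
    return together <= apart


g1 = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
g2 = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
labels = [0, 0, 1, 1]
same_structure = should_merge([g1], labels, labels, g1, labels, labels)
changed_structure = should_merge([g1], labels, labels, g2, labels, labels)
```

A graph with the same block structure reuses the segment's partition and is merged; a graph whose blocks are inverted makes the pooled blocks mixed and expensive to encode, so it starts a new segment.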
  • 16. Pros and cons. Pros: clusters users and items at the same time; parameter free (no need to specify the number of clusters); automatically decides the size of the time window. Cons: fixed graph size (all graphs must have the same number of rows and columns, so new users and items cannot be handled); cannot handle large-scale graphs; cannot guarantee an optimal result; results on very sparse graphs are poor (the communities do not make sense, and our data is extremely sparse, < 0.1%).
  • 17. Approach 2: evolutionary spectral clustering for user clustering. Discover communities within a time window, obtaining a high-quality clustering in each window. Model community evolution over a sequence of time windows, keeping the evolution between windows smooth. [Figure: community maps along a time axis for Jan 2009, Feb 2009, Mar 2009, and Apr 2009, in the BlogCentral domain.]
  • 18. Evolutionary framework. Basic idea: a cost function, Cost = α·CS + β·CT. The snapshot cost (CS) measures the snapshot quality of the current clustering result with respect to the current data features. The temporal cost (CT) measures temporal smoothness in terms of the goodness of fit of the current clustering result with respect to either historic data features or historic clustering results. Two evolutionary frameworks: PCQ (preserving cluster quality), in which the current partition is applied to historic data and the resulting cluster quality determines the temporal cost; PCM (preserving cluster membership), in which the current partition is compared directly with the historic partition and the resulting difference determines the temporal cost. PCQ is the framework we currently implement. Reference: Evolutionary Spectral Clustering by Incorporating Temporal Smoothness, Yun Chi, et al. KDD’07.
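A minimal sketch of how the PCQ cost on slide 18 could be evaluated for a candidate partition. The quality measure used here (negated average association within clusters) and the function names are assumptions chosen for illustration; the slide does not state which quality measure Pharos uses.

```python
def average_association(similarity, labels):
    """Sum over clusters of the within-cluster similarity divided by cluster size."""
    clusters = {}
    for node, label in enumerate(labels):
        clusters.setdefault(label, []).append(node)
    return sum(
        sum(similarity[i][j] for i in members for j in members) / len(members)
        for members in clusters.values()
    )


def pcq_cost(sim_now, sim_prev, labels, alpha, beta):
    """Cost = alpha * CS + beta * CT under the PCQ framework.

    CS: cost of the partition on the current similarity matrix.
    CT: cost of the same partition applied to the historic matrix.
    """
    snapshot_cost = -average_association(sim_now, labels)
    temporal_cost = -average_association(sim_prev, labels)
    return alpha * snapshot_cost + beta * temporal_cost


sim = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
good = pcq_cost(sim, sim, [0, 0, 1, 1], alpha=0.8, beta=0.2)
bad = pcq_cost(sim, sim, [0, 1, 0, 1], alpha=0.8, beta=0.2)
```

The partition that matches the block structure in both the current and the historic window receives the lower cost, so a minimizer prefers clusterings that are good now and remain good on past data.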
  • 19. Approach 3: LDA for content clustering. Latent Dirichlet Allocation (LDA) is a probabilistic latent semantic model for topic analysis [Blei et al. 03]: $p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n=1}^{k} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta$. LDA is a generative probabilistic model of a corpus. The basic idea is that documents are represented as random mixtures over latent topics, where each topic is characterized by a distribution over words.
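The generative story behind LDA on slide 19 (draw a topic mixture, then draw a topic and a word for each position) can be sketched with the standard library alone. The function names, the toy parameter values, and the use of word indices instead of word strings are assumptions for illustration; this sketch only generates data and does not perform inference.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, n_words, rng):
    """Generate one document under LDA.

    alpha: Dirichlet parameters, one per topic.
    beta:  beta[z][w] is the probability of word w under topic z.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic mixture
    topic_ids = list(range(len(alpha)))
    words = []
    for _ in range(n_words):
        z = rng.choices(topic_ids, weights=theta)[0]  # topic for this position
        vocab_ids = list(range(len(beta[z])))
        words.append(rng.choices(vocab_ids, weights=beta[z])[0])  # word given topic
    return theta, words


rng = random.Random(0)
toy_beta = [[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
theta, words = generate_document([0.5, 0.5], toy_beta, n_words=20, rng=rng)
```

In the toy setup, topic 0 only emits words 0 and 1 and topic 1 only emits words 2 and 3, so the mix of word indices in `words` reflects the sampled mixture `theta`.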
  • 20. Graphical Model of LDA. [Figure: graphical model of LDA.]
  • 21. Latent community extraction: comparison. Co-clustering: does not work well for extremely sparse data (< 0.1%). Spectral clustering for users: most behaviors come from anonymous users, which makes users hard to distinguish; topics are not concentrated within each community. LDA for content clustering (*): users are more likely to be interested in content.
  • 22. Pharos Technical Focus. [Figure: the architecture diagram (visual recommendation explanations; recommendation algorithms; social map for the target user over time; content modeling and behavior mining; user behavior on content) annotated with three focus areas: 1. latent community extraction; 2. item/people ranking; 3. community summary.]
  • 23. Item/People Ranking. Authority-based ranking by context-sensitive PageRank: $PR(p_i) = (1 - d)\, cv_i + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$, where $cv$ is the context vector (e.g., item attributes). The ranking considers: the time factor; context information, e.g., item attributes and the report chain of people. Influential people: active authors with high-quality entries. Influential entries: written by influential authors, highly visited or commented. [Figure: graph linking people (A, B, C, D) and blog entries (1, 2, 3, 4); edges carry authority from author to entry, from entry to author, from commenter/rater to entry, and from visitor to entry.]
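The ranking formula on slide 23 can be sketched as a power iteration in which the usual uniform teleport term is replaced by the context vector. The damping factor 0.85, the iteration count, the function name, and the toy graph are assumptions for illustration; the sketch also assumes the context vector sums to 1 and that every node has at least one outgoing link.

```python
def context_pagerank(out_links, context, d=0.85, iterations=50):
    """Iterate PR(p_i) = (1 - d) * cv_i + d * sum over in-neighbors p_j of PR(p_j) / L(p_j).

    out_links: node -> list of nodes it passes authority to.
    context:   node -> context weight cv_i (assumed to sum to 1).
    """
    rank = dict(context)  # start from the context vector
    for _ in range(iterations):
        updated = {node: (1 - d) * context[node] for node in context}
        for source, targets in out_links.items():
            if not targets:
                continue  # dangling nodes are not redistributed in this sketch
            share = d * rank[source] / len(targets)
            for target in targets:
                updated[target] += share
        rank = updated
    return rank


# Toy graph: author A wrote entry 1, entry 1 passes authority back to A,
# and commenter B passes authority to entry 1.
links = {"A": ["entry1"], "entry1": ["A"], "B": ["entry1"]}
uniform = {"A": 1 / 3, "entry1": 1 / 3, "B": 1 / 3}
ranks = context_pagerank(links, uniform)
```

Commenter B receives no authority from anyone, so its score stays at the teleport share `(1 - d) / 3`, while the entry and its author accumulate authority from each other. Skewing `context` toward particular items or people shifts the ranking toward that context.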
  • 24. Pharos Technical Focus. [Figure: the architecture diagram (visual recommendation explanations; recommendation algorithms; social map for the target user over time; content modeling and behavior mining; user behavior on content) annotated with three focus areas: 1. latent community extraction; 2. item/people ranking; 3. community summary.]
  • 25. Community Summary & visualization. Extraction of representative community keywords: modified TF/IDF; content topic modeling by LDA (Latent Dirichlet Allocation). Visualization: a bubble chart layout (as used by ManyEyes) packs the top-N communities tightly on the social map, with each bubble's size determined by the community's “hotness”; inside each community, a Wordle layout packs the labels tightly.
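As an illustration of keyword extraction for community labels, here is a sketch of plain TF-IDF in which each community is treated as one document. The slide mentions a modified TF/IDF without describing the modification, so this is the textbook variant, not the Pharos one; the function name, whitespace tokenization, and sample texts are assumptions.

```python
import math
from collections import Counter


def top_keywords(community_docs, top_n=3):
    """Return the top_n TF-IDF keywords per community.

    community_docs: community name -> list of text snippets (assumed format).
    """
    term_counts = {
        name: Counter(word for doc in docs for word in doc.lower().split())
        for name, docs in community_docs.items()
    }
    n_communities = len(term_counts)
    # Document frequency: in how many communities each word appears.
    doc_freq = Counter(word for counts in term_counts.values() for word in counts)
    keywords = {}
    for name, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_communities / doc_freq[word])
            for word, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        keywords[name] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return keywords


sample = {
    "java": ["java jvm tuning", "java gc the"],
    "web": ["css html the", "html layout"],
}
labels = top_keywords(sample)
```

Words that occur in every community (here "the") get an inverse document frequency of zero and drop to the bottom, so each community is labeled by the terms that distinguish it.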
  • 26. Summary. Model, detect, and use a social map that summarizes user behavior on online sites to make accurate and trustworthy recommendations. Increase recommendation accuracy: ease the “cold start” problem by providing new users with “social landmarks” of a social site to jump start their engagement; provide users with overall social awareness to compensate for recommendation inaccuracy. Enhance recommendation trustworthiness: explain recommendation results in the context of a social map. Interactive recommendation: users can navigate through the social map to find what they need.
  • 27. Thanks!