DATA MINING AND MACHINE LEARNING
                                                                   IN A NUTSHELL


LEARNING TO RECOGNIZE RELIABLE USERS AND CONTENT IN SOCIAL
       MEDIA WITH COUPLED MUTUAL REINFORCEMENT



                                                      Mohammad-Ali Abbasi
                                                            http://www.public.asu.edu/~mabbasi2/

                                       SCHOOL OF COMPUTING, INFORMATICS, AND DECISION SYSTEMS ENGINEERING
                                                           ARIZONA STATE UNIVERSITY

                 Arizona State University
   Data Mining and Machine Learning Lab
                 http://dmml.asu.edu/
   Data Mining and Machine Learning in a Nutshell: Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement
About the paper

  • Learning to Recognize Reliable Users and Content in
    Social Media with Coupled Mutual Reinforcement
     – Jiang Bian, Georgia Institute of Technology
     – Yandong Liu, Emory University
     – Ding Zhou, Facebook Inc.
     – Eugene Agichtein, Emory University
     – Hongyuan Zha, Georgia Institute of Technology


  • WWW 2009, April 20–24, 2009, Madrid, Spain.


Community Question Answering (CQA)

  • A popular type of forum where users pose questions
    for other users to answer
  • Users can ask questions in natural language
  • Comparable to regular web search




Sample: Yahoo! Answers

  • Introduction




What is the problem?

  • Retrieve answers from a social media archive
    that contains a large amount of information
         – The quality, accuracy, and comprehensiveness of
           the submitted questions and answers vary
           widely
         – A large fraction of the content is not useful for
           answering queries
         – Current approaches require large amounts of
           manually labeled data



CQA environment

  • Users
  • Questions
  • Answers




The goal

  • Identify
         – High quality Answers
         – High quality Questions
         – High reputation Users
  • Simultaneously
  • With minimal manual labeling




The contribution of this paper

  • Develops a semi-supervised coupled mutual
    reinforcement framework that simultaneously
    calculates content quality and user
    reputation and requires relatively few labeled
    examples to initialize the training process
  • Is more effective at finding high-quality
    answers, questions, and users
  • Improves the accuracy of search over CQA
    archives

Current approaches



  • Rely on user reputation,
  • OR require a large amount of supervision,
  • OR focus on the network properties of the
    CQA
  • without considering the actual content of the
    information exchanged


How to rank?

  • Current approaches:
         – Content Quality
         OR
         – User reputation
  • This paper:
         – Content Quality
         AND
         – User reputation


Definitions

  • Question Quality
         – A question's effectiveness at attracting high quality
           answers
  • Answer Quality
         – the responsiveness, accuracy, and comprehensiveness of
           the answer to a question.
  • Question Reputation
         – the expected quality of the questions posted by
           a user
  • Answer Reputation
         – the expected quality of the answers posted by a user.

Model the problem

  • Solution




Mutual reinforcement Principle

  • Solution




Feature Space: X(Q), X(A), X(U)

  • Solution




Learning quality and reputation (Coupled Mutual Reinforcement)

  • P(x): probability of being “good”
  • Model of P(x)




  • B is the coefficient of the linear model and can be
    found by maximizing:
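
The model described on this slide can be illustrated with a minimal sketch, assuming a standard logistic form in which the log-odds of being "good" are linear in the features and the coefficients are fit by maximizing the conditional log-likelihood. The function names, the gradient-ascent optimizer, the learning rate, and the toy data are all assumptions for illustration, not the paper's implementation.

```python
import math

def p_good(x, beta):
    """P(x): probability that an item with feature vector x is "good"."""
    z = sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(X, y, beta):
    """Conditional log-likelihood of the binary labels y given the features X."""
    total = 0.0
    for x, label in zip(X, y):
        p = p_good(x, beta)
        total += label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total

def fit(X, y, lr=0.1, steps=500):
    """Maximize the log-likelihood by plain gradient ascent (assumed optimizer)."""
    beta = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, label in zip(X, y):
            err = label - p_good(x, beta)
            for j, xj in enumerate(x):
                grad[j] += err * xj
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: the first feature is a constant bias term.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
beta = fit(X, y)
```

After fitting, `p_good(x, beta)` gives the estimated probability that an item is "good"; in the coupled setting, one such model would be fit per entity type (questions, answers, users).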



Non-independent equations

  • Conditional log-likelihood




  • Objective function




CQA-MR Algorithm

  • Solution




Experimental Setup: Data Collection

  • Data collected from Yahoo! Answers through its API
  • Use the TREC QA benchmark archive to crawl QA
    archives (http://trec.nist.gov/data.html)
  • Get all available answers for each question
         – 107,293 users
         – 27,354 questions
         – 224,617 answers



Evaluation Metrics

  • Mean Reciprocal Rank (MRR)
         – for each query, the reciprocal of the rank at which the first
           relevant answer was returned, or 0 if none of the top N results
           contained a relevant answer; averaged over all queries

  • Precision at K
         – for a given query, P(K) reports the fraction of answers
           ranked in the top K results that are labeled as relevant

  • Mean Average Precision (MAP)
         – for each query, the mean of the precision at K values calculated
           after each relevant answer was retrieved; averaged over all queries
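
The three metrics above can be sketched as follows. This is a minimal illustration, assuming each ranking is a list of 0/1 relevance labels in ranked order; the function names and the toy rankings are hypothetical, and average precision is normalized by the number of relevant answers actually retrieved, as in the definition on this slide.

```python
def reciprocal_rank(relevance, n):
    """1 / rank of the first relevant result in the top n, or 0 if there is none."""
    for rank, rel in enumerate(relevance[:n], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def precision_at_k(relevance, k):
    """Fraction of the top k results that are labeled relevant."""
    return sum(relevance[:k]) / k

def average_precision(relevance):
    """Mean of the precision values taken at each relevant result."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean(values):
    return sum(values) / len(values)

# Hypothetical relevance labels for three queries (1 = relevant).
rankings = [[0, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0]]

mrr = mean([reciprocal_rank(r, 4) for r in rankings])
map_score = mean([average_precision(r) for r in rankings])
p_at_2 = precision_at_k(rankings[0], 2)
```

For these toy rankings, the reciprocal ranks are 1/2, 1, and 0, so the MRR is 0.5.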


User reputation methods

  • Baseline
         – users are ranked by "indegree" (number of answers
           posted)
  • HITS
         – users are ranked by their authority scores
  • CQA-Supervised
         – a classifier trained over the features separates users with
           "high" and "low" reputation
  • CQA-MR
         – predicts user reputation with the mutual reinforcement
           algorithm
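
The first two methods above can be sketched on a small user graph. This is a minimal illustration, assuming one directed edge from the asker to the answerer for every answer posted; the edge direction, the user names, and the fixed iteration count are assumptions, not details taken from the slides.

```python
from collections import Counter

def normalize(scores):
    """Scale scores to unit Euclidean length (no-op for an all-zero vector)."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {n: v / norm for n, v in scores.items()}

def hits(edges, iterations=50):
    """Hub and authority scores by power iteration over (source, target) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the users pointing at this user.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub: sum of the authority scores of the users this user points at.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth

# Hypothetical (asker, answerer) pairs, one per answer posted.
edges = [("alice", "carol"), ("bob", "carol"), ("alice", "dave")]

indegree = Counter(dst for _, dst in edges)   # baseline ranking signal
hub, auth = hits(edges)                       # HITS ranking signal
by_authority = sorted(auth, key=auth.get, reverse=True)
```

Here `carol` answers the most questions, so she ranks first under both the indegree baseline and the authority score.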

CQA Retrieval methods

  • Baseline
         – score computed as the difference between up votes and down
           votes
  • GBrank
         – does not include answer quality, question quality, or user
           reputation
  • GBrank-HITS
         – GBrank extended with user reputation calculated by the
           HITS algorithm
  • GBrank-Supervised
         – GBrank extended with the quality obtained by
           supervised learning
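
The retrieval baseline above is simple enough to sketch directly. The record layout and field names below are hypothetical; only the scoring rule (up votes minus down votes) comes from the slide.

```python
def vote_score(answer):
    """Baseline score: up votes minus down votes."""
    return answer["up_votes"] - answer["down_votes"]

# Hypothetical candidate answers for one query.
answers = [
    {"id": "a1", "up_votes": 5, "down_votes": 1},
    {"id": "a2", "up_votes": 9, "down_votes": 8},
    {"id": "a3", "up_votes": 3, "down_votes": 0},
]

# Rank answers from highest to lowest baseline score.
ranked = sorted(answers, key=vote_score, reverse=True)
ranked_ids = [a["id"] for a in ranked]
```

Note that `a2` has the most up votes but ranks last, because the baseline looks only at the net vote count.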
Precision at K for the top contributors

  • Experiments




Precision at K

  • Experiments




Accuracy

  • Experiments




Training Labels

  • Experiments




Training Labels

  • Experiments




Mohammad-Ali Abbasi (Ali)
                                         Ali is a Ph.D. student at the Data Mining
                                         and Machine Learning Lab, Arizona
                                         State University.
                                         His research interests include data
                                         mining, machine learning, social
                                         computing, and social media behavior
                                         analysis.

                                         http://www.public.asu.edu/~mabbasi2/


BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 

Learning To Recognize Reliable Users And Content In Social Media With Coupled Mutual Reinforcement

  • 1. DATA MINING AND MACHINE LEARNING IN A NUTSHELL: LEARNING TO RECOGNIZE RELIABLE USERS AND CONTENT IN SOCIAL MEDIA WITH COUPLED MUTUAL REINFORCEMENT • Mohammad-Ali Abbasi, http://www.public.asu.edu/~mabbasi2/ • School of Computing, Informatics, and Decision Systems Engineering, Arizona State University • Data Mining and Machine Learning Lab, http://dmml.asu.edu/
  • 2. About the paper • Learning to Recognize Reliable Users and Content in Social Media with Coupled Mutual Reinforcement – Jiang Bian, Georgia Institute of Technology – Yandong Liu, Emory University – Ding Zhou, Facebook Inc. – Eugene Agichtein, Emory University – Hongyuan Zha, Georgia Institute of Technology • WWW 2009, April 20–24, 2009, Madrid, Spain.
  • 3. Community Question Answering (CQA) • A popular kind of forum where users pose questions for other users to answer • Users can ask questions in natural language • Comparable to regular web search
  • 4. Sample: Yahoo! Answers • Introduction
  • 5. What is the problem? • Retrieving answers from a social media archive that holds a large amount of information – The quality, accuracy, and comprehensiveness of the submitted questions and answers vary widely – A large fraction of the content is not useful for answering queries – Current approaches require large amounts of manually labeled data
  • 6. CQA environment • Users • Questions • Answers
  • 7. The goal • Identify – High-quality answers – High-quality questions – High-reputation users • Simultaneously • With minimal manual labeling
  • 8. The contribution of this paper • A semi-supervised coupled mutual reinforcement framework that calculates content quality and user reputation simultaneously and requires relatively few labeled examples to initialize the training process • More effective at finding high-quality answers, questions, and users • Improves the accuracy of search over CQA archives
  • 9. Current approaches • Rely on user reputation, • OR require a large amount of supervision, • OR focus on the network properties of the CQA • without considering the actual content of the information exchanged
  • 10. How to rank? • Current approaches: – Content quality OR – User reputation • This paper: – Content quality AND – User reputation
  • 11. Definitions • Question quality – a question's effectiveness at attracting high-quality answers • Answer quality – the responsiveness, accuracy, and comprehensiveness of the answer to a question • Question reputation – the expected quality of the questions posted by a user • Answer reputation – the expected quality of the answers posted by a user
  • 12. Model the problem • Solution
  • 13. Mutual reinforcement principle • Solution
  • 14. Feature space: X(Q), X(A), X(U) • Solution
  • 15. Learning quality and reputation (Coupled Mutual Reinforcement) • P(x): probability of being “good” • Model of P(x) • B holds the coefficients of the linear model and can be found by maximizing:
  • 16. Non-independent equations • Conditional log-likelihood • Objective function
  • 17. CQA-MR algorithm • Solution
  • 18. Experimental setup: data collection • Collected from Yahoo! Answers through its API • Questions from the TREC QA benchmark (http://trec.nist.gov/data.html) are used to crawl the QA archive • All available answers are retrieved for each question – 107,293 users – 27,354 questions – 224,617 answers
  • 19. Evaluation metrics • Mean Reciprocal Rank (MRR) – the reciprocal of the rank at which the first relevant answer was returned, or 0 if none of the top N results contained a relevant answer, averaged over all queries • Precision at K – for a given query, P(K) reports the fraction of answers ranked in the top K results that are labeled as relevant • Mean Average Precision (MAP) – the mean of the precision at K values calculated after each relevant answer was retrieved, averaged over all queries
  • 20. User reputation methods • Baseline – users are ranked by “indegree” (number of answers posted) • HITS – users are ranked by their authority scores • CQA-Supervised – a classifier trained over the features labels users as having “high” or “low” reputation • CQA-MR – user reputation is predicted by the mutual reinforcement algorithm
  • 21. CQA retrieval methods • Baseline – score computed as the difference between up votes and down votes • GBrank – does not include answer quality, question quality, or user reputation • GBrank-HITS – GBrank extended with user reputation calculated by the HITS algorithm • GBrank-Supervised – GBrank extended with the quality obtained by supervised learning
  • 22. Precision at K for the top contributors • Experiments
  • 23. Precision at K • Experiments
  • 24. Accuracy • Experiments
  • 25. Training Labels • Experiments
  • 26. Training Labels • Experiments
  • 27. Mohammad-Ali Abbasi (Ali) is a Ph.D. student at the Data Mining and Machine Learning Lab, Arizona State University. His research interests include data mining, machine learning, social computing, and social media behavior analysis. http://www.public.asu.edu/~mabbasi2/
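
The evaluation metrics named on slide 19 can be sketched in a few lines of Python. This is an illustrative reading of the slide's definitions, not the evaluation code used in the paper; the function names and the toy relevance judgments are assumptions made for the example. Each ranked result list is represented as a sequence of booleans, where True marks a result labeled relevant.

```python
from typing import Sequence


def reciprocal_rank(relevant: Sequence[bool], top_n: int) -> float:
    """1/rank of the first relevant result within the top N, or 0 if none."""
    for rank, is_rel in enumerate(relevant[:top_n], start=1):
        if is_rel:
            return 1.0 / rank
    return 0.0


def precision_at_k(relevant: Sequence[bool], k: int) -> float:
    """Fraction of the top K results labeled relevant (divides by K itself)."""
    return sum(relevant[:k]) / k


def average_precision(relevant: Sequence[bool]) -> float:
    """Mean of P(K) taken at every rank K where a relevant result appears."""
    hits = 0
    precisions = []
    for rank, is_rel in enumerate(relevant, start=1):
        if is_rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_over_queries(values: Sequence[float]) -> float:
    """Average a per-query metric over all queries (the 'Mean' in MRR and MAP)."""
    return sum(values) / len(values)


# Hypothetical relevance judgments for two queries, in ranked order.
runs = [
    [False, True, False, True],
    [True, False, False, False],
]
mrr = mean_over_queries([reciprocal_rank(r, top_n=4) for r in runs])
map_score = mean_over_queries([average_precision(r) for r in runs])
p_at_2 = precision_at_k(runs[0], k=2)
```

Note that `average_precision` here averages only over the relevant answers that were actually retrieved, which follows the wording on the slide; some evaluations divide by the total number of relevant answers instead.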

Editor's notes

  1. An answer is likely to be of high quality if the content is responsive and well-formed, the question has high quality, and the answerer is of high answer-reputation. At the same time, a user will have high answer-reputation if she posts high-quality answers, and high question-reputation if she tends to post high-quality questions. Finally, a question is likely to be of high quality if it is well stated, is posted by a user with high question-reputation, and attracts high-quality answers.
  2. Circular definition from user to content. In previous work, question and answer quality were defined in terms of content, form, and style, as manually labeled by paid editors [2]. In contrast, our definitions focus on question effectiveness and answer accuracy, both quantities that can be measured automatically and do not necessarily require human judgments.
  3. Proportional: user question-reputation and user answer-reputation; question quality; answer quality. Y_q(a) denotes the quality of answer a's question.
  4. 3000 factoid questions are used as the initial set of queries; from these, the 1250 factoid questions that have at least one similar question in the Yahoo! Answers archive are selected.
  5. The obtained quality and reputation are added as extra features for learning the ranking function.
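
The mutual reinforcement principle described in note 1 can be illustrated with a toy fixed-point iteration. This is only a sketch of the circular dependency between quality and reputation, not the CQA-MR algorithm from the paper: the weights are fixed by hand rather than learned from labeled examples, the per-item `content` scores are hypothetical stand-ins for the feature-based part of the model, and every name in the code is an assumption made for the example.

```python
import math
from collections import defaultdict


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def mean(values, default=0.5):
    values = list(values)
    return sum(values) / len(values) if values else default


def mutual_reinforcement(questions, answers, rounds=20):
    """questions: {qid: (asker, content)}; answers: {aid: (answerer, qid, content)}.

    `content` is a hypothetical score centered on 0 (assumption, not from the paper).
    Returns (question quality, answer quality, question reputation, answer reputation).
    """
    q_quality = {q: 0.5 for q in questions}
    a_quality = {a: 0.5 for a in answers}
    q_rep = defaultdict(lambda: 0.5)
    a_rep = defaultdict(lambda: 0.5)

    for _ in range(rounds):
        # Answer quality: own content, quality of its question, answerer reputation.
        for a, (user, q, content) in answers.items():
            a_quality[a] = sigmoid(content + (q_quality[q] - 0.5) + (a_rep[user] - 0.5))
        # Question quality: own content, quality of attracted answers, asker reputation.
        for q, (user, content) in questions.items():
            attracted = mean(a_quality[a] for a, (_, aq, _) in answers.items() if aq == q)
            q_quality[q] = sigmoid(content + (attracted - 0.5) + (q_rep[user] - 0.5))
        # Reputation: expected quality of what each user has posted.
        for user in {u for u, _, _ in answers.values()}:
            a_rep[user] = mean(a_quality[a] for a, (u, _, _) in answers.items() if u == user)
        for user in {u for u, _ in questions.values()}:
            q_rep[user] = mean(q_quality[q] for q, (u, _) in questions.items() if u == user)

    return q_quality, a_quality, dict(q_rep), dict(a_rep)


# Hypothetical toy data: two questions, three answers, four users.
questions = {"q1": ("alice", 1.0), "q2": ("bob", -1.0)}
answers = {
    "a1": ("carol", "q1", 2.0),
    "a2": ("dave", "q1", -2.0),
    "a3": ("carol", "q2", 1.0),
}
q_quality, a_quality, q_rep, a_rep = mutual_reinforcement(questions, answers)
```

In this toy run, carol ends up with a higher answer-reputation than dave because her answers score higher, and that reputation in turn raises the score of her answers on the next round. The paper's approach replaces the hand-picked weights with coefficients fitted from a small set of labeled examples.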