SlideShare a Scribd company logo
1 of 67
Download to read offline
Probabilistic Graphical Models for
Credibility Analysis in Evolving
Online Communities
Subhabrata Mukherjee
Advisor: Prof. Gerhard Weikum
Max Planck Institute for Informatics, Germany
Email: subhabrata.mukherjee.ju@gmail.com
Web: https://people.mpi-inf.mpg.de/~smukherjee/
Why Credibility Analysis?
▶ “Rapid spread of misinformation online" – one of the top 10 challenges
as per The World Economic Forum
1
http://www.washingtonstarnews.com/proof-obamacare-requires-all-americans-to-be-chipped/
2
http://theracketreport.com/several-injured-in-zombie-like-attack-at-tennessee-walmart-as-man-tries-to-eat-his-victims/
3
https://www.forbes.com/sites/tonybradley/2017/07/31/facebook-ai-creates-its-own-language-in-creepy-preview-of-our-potential-future/
Concerns
3
Misinformation for health can
have hazardous consequences
Rumors, spams, fake reviews have significant
business impact for retailers
Why focus on Online
Communities?
Massive repositories of knowledge accessed
by regular users and professionals
↘ 59% of adult U.S. population and half
of U.S. physicians rely on online
resources [IMS Health Report, 2014]
↘ 40% of online consumers consult online
reviews before buying products
[Nielson Corporation, 2016]
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Applications: Product
Reviews
● Conclusions
5
Truth Discovery
Structured data (e.g., SPO
triples, tables, networks)
Objective facts (e.g.,
Obama_BornIn_Hawaii vs.
Obama_BornIn_Kenya)
No contextual data (text)
Limited metadata, external KB
6
Linguistic Analysis
Unstructured text
Subjective information (e.g.,
opinion spam, bias, viewpoint )
External KB (e.g., WordNet,
KG)
No network / interactions,
metadata
1. How can we jointly leverage network,
context and users for credibility
analysis in online communities?
2. How can we model temporal evolution?
3. How can we deal with limited data?
4. How can we generate interpretable
explanations for credibility verdict?
7
Research Questions
Contributions
● Credibility Analysis Framework for Online Communities
↘ Classification: Health Communities [SIGKDD 2014]
↘ Regression: News Communities [CIKM 2015]
● Temporal Evolution of Online Communities
[ICDM 2015, SIGKDD 2016]
● Credibility Analysis of Product Reviews
[ECML-PKDD 2016, SDM 2017]
8
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Applications: Product
Reviews
● Conclusions
9
Problem Statement: Credibility Classification
Given a set of statements from different users find which ones are
True or False
10
Use-case: Finding Drug Side-effects from Healthforums
Problem: Given a set of posts from different users, extract credible statements
(subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users
11
Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil:
People on Drugs: Credibility of User Statements in Health Communities. In KDD, 2014
12
Problem: Given a set of posts from different users, extract credible statements
(subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users
Use-case: Finding Drug Side-effects from Healthforums
Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil:
People on Drugs: Credibility of User Statements in Health Communities. In KDD, 2014
Modeling Interactions in Online Communities
● Each user, post, and statement is a
random variable with edges depicting
interactions
User Post
Post Statement
User Statement in post
● Variables have observable features
(e.g, authority, emotionality,
subjectivity)
Modeling Interactions in Online Communities
● Each user, post, and statement is a
random variable with edges depicting
interactions
User post
Post statement
User statement in post
Credible statements appear in objective posts
Trustworthy users corroborate on credible statements in objective posts
KEY IDEA
15
Conditional Random Field to Exploit Joint Interactions
(Users + Network + Context)
Partial Supervision: Expert stated (top 20%) side-effects of drugs as training labels
Model predicts labels of unobserved statements.
How to complement
expert medical knowledge
with large scale
non-expert data?
Semi-Supervised Conditional Random Field
1. Estimate user trustworthiness:
2. Estimate label of unknown statements Su
by Gibbs Sampling:
3. Maximize log-likelihood to estimate feature weights:
4. Apply E-Step and M-Step till convergence
16
Healthforum Dataset
● Healthboards.com community (www.healthboards.com) with 850,000
registered users and 4.5 million posts
17
● Expert labels about drugs from MayoClinic (www.mayoclinic.org)
↘ 6 widely used drugs for experimentation
18
19
What constitutes credible language?
Affective Emotions
confidence
sympathy
self-esteem
eagerness
coolness
compunction
anxiety
embarrassment
misery
distress
20
What constitutes credible language?
determiner (this, that,..)
negation (not, never, ..)
second person (you, ..)
conjunction (therefore,
consequently, ..)
contrast (despite, though, ..)
question (what, why, ..)
conditional (if)
adverb (maybe, probably, ..)
modality (might, could, ..)
Discourse and Modalities
Drawbacks
1. Does not model Topics of discussion:
- Topic-specific expertise of users and sources
2. Conditional Random Fields cannot be used for numeric output /
regression
21
Where can we find numeric ratings?
22
In many online communities users rate items on their quality
- E.g. Amazon, Yelp, TripAdvisor
In news communities like Digg, Reddit, NewsTrust users provide
feedback on articles (items) from different sources
- User feedback often biased and subjective due to diverse
viewpoints
Credibility Regression Framework: News Communities
Topics
Climate Change
Sources
trunews.com
Articles
“Global warming is a
hoax”
Sources / Users
Scientificamerican.com
snopes.com
user-donald
Reviews & Ratings
scientific analysis, 1.5/ 5,
conspiratory theory
23
Problem Statement: Given articles from various sources and feedback from different
users: predict credibility rating of articles
Reviews / Ratings
scientific analysis, 1.5/ 5,
conspiratory theory
Topics
Climate Change
Articles
“Global warming is a
hoax”
Sources
trunews.com
Sources / Users
Scientificamerican.com
snopes.com
user-donald
Idea: Trustworthy sources publish objective articles corroborated by expert
users with credible reviews/ratings 24
We use CRF to capture these mutual interactions in
news communities (e.g., newstrust.net, digg, reddit)
to jointly rank all of the underlying factors.
Credibility Regression Framework: News Communities
Online Communities: Factors
Related to Ensemble Learning, Learning to Rank
How to incorporate continuous ratings instead of discrete labels in CRF ?
26
Probability Mass Function for discrete labels:
Probability Density Function for continuous ratings:
Subhabrata Mukherjee and Gerhard Weikum:
Leveraging Joint Interactions for Credibility Analysis in News Comm. In CIKM, 2015
How to incorporate continuous ratings instead of discrete labels in CRF ?
27
● We show that a certain energy function for clique potential --- geared for
reducing mean-squared-error --- results in multivariate gaussian p.d.f. !!!
● Constrained Gradient Ascent for inference
Subhabrata Mukherjee and Gerhard Weikum:
Leveraging Joint Interactions for Credibility Analysis in News Comm. In CIKM, 2015
Energy Function to Combine All
Predicting Article Credibility Ratings in Newstrust.net
29
Progressive decrease in mean squared error with
more network interactions, and context
Summary
● Semi-supervised and Continuous CRF to jointly identify
trustworthy users, credible statements, and reliable postings in
online communities
● A framework to incorporate richer aspects like user expertise,
topics / facets, temporal evolution etc. (discussed next)
30
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Applications: Product
Reviews
● Conclusions
31
● Online communities are dynamic, as users join and leave;
acquire new vocabulary; evolve and mature over time
● Trustworthiness and expertise of users evolve over time
Temporal Evolution
32
How to capture evolving user expertise?
“My first DSLR. Excellent camera, takes great pictures with high
definition, without a doubt it makes honor to its name.”
[Aug, 1997]
“The EF 75-300 mm lens is only good to be used outside. The 2.2X
HD lens can only be used for specific items; filters are useless if
ISO, AP,... . The short 18-55mm lens is cheap and should have a
hood to keep light off lens.”
[Oct, 2012]
Illustrative Example for Review Communities
33
● Consider following camera reviews by the same user John:
Subhabrata Mukherjee, Hemank Lamba and Gerhard Weikum, Experience-aware Item
Recommendation in Evolving Review Communities, ICDM 2015:
34
“The EF 75-300 mm lens is only good to be used outside. The 2.2X
HD lens can only be used for specific items; filters are useless if
ISO, AP,... . The short 18-55mm lens is cheap and should have a
hood to keep light off lens.”
[Oct, 2012]
Illustrative Example for Review Communities
● Consider following camera reviews by John:
“My first DSLR. Excellent camera, takes great pictures with high
definition, without a doubt it makes honor to its name.”
[Aug, 1997]
How can we quantify this change in
users’ maturity / experience ?
How can we model this evolution /
progression in users’ maturity?
Mukherjee et al.: ICDM 2015, SIGKDD 2016
Prior Work: Discrete Experience Evolution
35
Assumption: At each timepoint a user
remains at the same level of experience, or
moves to the next level
1. Users at similar levels of experience
have similar facet preferences, and
rating style (McAuley and Leskovec:
WWW 2013)
2. Additionally, our work exploits
similar writing style (Mukherjee,
Lamba and Weikum: ICDM 2015)
Language Model (KL) Divergence Increases with Experience
36Experienced users have a distinctive writing style different than that of amateurs
Abrupt change in:
- Expertise level
- Language model
Drawback
37
Assumption: At each timepoint a user
remains at the same level of experience, or
moves to the next level
Abrupt Transition
Continuous Experience Evolution
38
Subhabrata Mukherjee, Stephan Guennemann and Gerhard Weikum:
Continuous Experience-aware Language Model. In KDD, 2016
Continuous Experience Evolution: Assumptions
★ Continuous-time process, always positive
★ Markovian assumption: Experience at time t depends on that at t-1
★ Drift: Overall trend to increase over time
★ Volatility: Progression may not be smooth with occasional volatility
E.g.: series of expert reviews followed by a sloppy one
39
Geometric Brownian Motion
40
We show these properties to be satisfied by the
continuous-time stochastic process: Geometric Brownian Motion
Language Model (LM) Evolution
● Users' LM also evolve with experience evolution
● Smoothly evolve over time preserving Markov property of
experience evolution
● Variance of LM should change with experience change
● Brownian Motion to model this desiderata:
41
Inference
42
Topic Model (Blei et
al., JMLR '03)
Users ( Author-topic
model,Rosen-Zvi et
al., UAI '04)
Continuous Time
(Dynamic topic model,
Wang et al., UAI '08)
Continuous Experience
(this work)
Topic + time + users
+
+
+
Sampling based Inference
Kalman Filter for
LM evolution
Metropolis Hastings
for Exp. evolution
43
Gibbs Sampling for Facets
Kalman Filter for LM
evolution
Metropolis Hastings
for Exp. evolution
44
Gibbs Sampling for Facets
Sampling based Inference
Dataset Statistics
45
Can we recommend items better, if we consider
users’ experience to consume them?
46
User-experience aware model (blue) incurs the least error in item rating prediction
Log-likelihood, Smoothness, and Convergence
47
Interpretability: Top Words* by Experienced Users
48
Most Experience Least Experience
BeerAdvocate chestnut_hued near_viscous cherry_wood
sweet_burning faint_vanilla woody_herbal
citrus_hops mouthfeel
originally flavor color poured
pleasant bad bitter sweet
Amazon aficionados minimalist underwritten
theatrically unbridled seamless retrospect
overdramatic
viewer entertainment battle actress
tells emotional supporting
Yelp smoked marinated savory signature
contemporary selections delicate texture
mexican chicken salad love better
eat atmosphere sandwich
NewsTrust health actions cuts medicare oil climate
spending unemployment
bad god religion iraq responsibility
questions clear powerful
*Learned by our generative model without supervision
Interpretability: Top Words* by Experienced Users
49
Most Experience Least Experience
BeerAdvocate chestnut_hued near_viscous cherry_wood
sweet_burning faint_vanilla woody_herbal
citrus_hops mouthfeel
originally flavor color poured pleasant
bad bitter sweet
Amazon aficionados minimalist underwritten
theatrically unbridled seamless retrospect
overdramatic
viewer entertainment battle actress
tells emotional supporting
Yelp smoked marinated savory signature
contemporary selections delicate texture
mexican chicken salad love better eat
atmosphere sandwich
NewsTrust health actions cuts medicare oil climate
spending unemployment
bad god religion iraq responsibility
questions clear powerful
Experienced users in beer
community use more “fruity” words
to describe taste and smell of beers
Experienced users in news community
discuss about policies and regulations in
contrast to amateurs interested on
polarizing topics
*Learned by our generative model without supervision
Take-away
● Insights from Geometric Brownian Motion trajectory of users:
○ Experienced users mature faster than amateurs
○ Progression depends more on time spent in community than on activity
● Users' experience evolve continuously, along with language usage
● Recommendation models can be improved by considering users’ maturity
● Learns from only the information of users reviewing products at explicit timepoints ---
no meta-data, community-specific / platform dependent features --- easy to
generalize across different communities 50
Outline
● Motivation
● Prior Work and its Limitations
● Credibility Analysis
↘ Framework for Online
Communities
↘ Temporal Evolution of Online
Communities
↘ Applications: Product
Reviews
● Conclusions
51
Can we use this
framework to find
helpful product
reviews?
● Reviews (e.g., camera) with similar
facet-sentiment distribution (e.g.,
bashing “zoom” and “resolution”) are
likely to be equally helpful.
Distributional Hypothesis
Subhabrata Mukherjee, Kashyap Popat and Gerhard Weikum,
Exploring Latent Semantic Factors to Find Useful Product Reviews. In SDM, 2017
52
● Users with similar facet preferences and
expertise are likely to be equally helpful.
We analyze consistency of
embeddings from previous
models to detect fake /
anomalous reviews with
discrepancies like:
Subhabrata Mukherjee, Sourav Dutta and Gerhard Weikum: Credible Review Detection
with Limited Information using Consistency Features, ECML-PKDD 2016
1. Rating and review description
(promotion/demotion)
Excellent product... technical support is almost non-existent ...
this is unacceptable. [4]
2. Rating and Facet description (irrelevant)
DO NOT BUY THIS. I can’t file because Turbo Tax doesn’t have
software updates from the IRS “because of Hurricane Katrina”. [1]
3. Temporal bursts (group spamming)
Dan’s apartment was beautiful, a great location. (3/14/2012)[5]
I highly recommend working with Dan and... (3/14/2012) [5]
Dan is super friendly, confident... (3/14/2012) [4]
Consistency Analysis
of Product Reviews
53
Future Work
54
★ Going beyond topics and bag-of-words features / lexicons
Modeling context and semantics via embeddings
★ Applications to tasks like Anomaly Detection, Community
Question-Answering, Knowledge-base Curation etc.
★ Incorporating richer facets like multi-modal interactions,
stance, trend, influence etc.
EMNLP 2018
WWW 2017,
WWW 2018
In progress!
Statement
“The use of solar panels drains the sun of energy.” [Fake]
“Between 1988 and 2006, a man lived at a Paris airport.” [True]
Problem Statement: Given a statement as a sequence of tokens,
and news articles from Web, classify it as True or False
Fact-Checker for Natural Language Statements
DeClarE: Evidence-aware Deep Learning
Key Idea is to model the following jointly:
- Learn separate embeddings for
- (i) Statement Text, (ii) Statement Source, (iii) News
(Article) Text, (iv) News (Article) Source
- Bidirectional LSTM to model contextual information
- Attention to focus on relevant parts of the news article w.r.t
statement context
- Aggregate and predict
Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum:
DeClarE: Debunking Fake News and False Claims with Evidence-aware Deep Learning.
In EMNLP, 2018
Collect
Evidence
Evaluate
Evidence
Joint Modeling
Predict
Correctness
DeClarE: Framework
Statement Text
News Text
Statement Source
News Source
Credibility Classification on Snopes and PolitiFact
Credibility Regression in NewsTrust
- Declare incurs the least mean squared error (MSE) for rating prediction
Clear Separation between Fake and Authentic News
Sources
Fake Sources
Authentic
Sources
Clusters politicians (speakers of statements) of similar
ideologies close to each other in semantic space
Republicans
Democrats
Using Attention to Generate Evidence
[Source] nytimes.com [Evidence] “glimmer of truth, sketchy evidence, hasn’t said, barely true claims”
1. How can we develop models that jointly
leverage users, network, and context for
credibility analysis in online communities?
2. How can we model users’ evolution or
progression in maturity?
3. How can we deal with the limited
information scenario?
4. How can we generate interpretable
explanations for credibility verdict?
63
Interactional Framework for
Credibility Analysis
Conclusions
1. How can we jointly leverage network,
context and users for credibility
analysis in online communities?
2. How can we model evolution?
3. How can we deal with limited data?
4. How can we generate interpretable
explanations for credibility verdict?
64
Collaborators and Co-authors
Gerhard
Weikum
(Advisor)
Kashyap
Popat
Jannik
Strötgen
Cristian
Danescu-Nicul
escu-Mizil
Stephan
Günnemann
Hemank
Lamba
Sourav
Dutta
Acknowledgments (1/3)
65
Dissertation Committee
Gerhard
Weikum
Jiawei
Han
Stephan
Günnemann
Acknowledgments (2/3)
Dietrich
Klakow
66
Databases and Information Systems Department at Max Planck Institute
Acknowledgments (3/3)
1. How can we develop models that jointly
leverage users, network, and context for
credibility analysis in online communities?
2. How can we model users’ evolution or
progression in maturity?
3. How can we deal with the limited
information scenario?
4. How can we generate interpretable
explanations for credibility verdict?
67
Interactional Framework for
Credibility Analysis
1. How can we jointly leverage network,
context and users for credibility
analysis in online communities?
2. How can we model temporal evolution?
3. How can we deal with limited data?
4. How can we generate interpretable
explanations for credibility verdict?
THANKS!

More Related Content

Similar to Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

Extracting, Mining and Predicting Users’ Interests from Social Media
Extracting, Mining and Predicting Users’ Interests from Social MediaExtracting, Mining and Predicting Users’ Interests from Social Media
Extracting, Mining and Predicting Users’ Interests from Social MediaFattane Zarrinkalam
 
CAMUG - Sept 3, 2020 - User Story Quality Matters
CAMUG - Sept 3, 2020 - User Story Quality MattersCAMUG - Sept 3, 2020 - User Story Quality Matters
CAMUG - Sept 3, 2020 - User Story Quality MattersChris Edwards, P.Eng.
 
Exploration & Promotion: Implementation Strategies of Corporate Social Software
Exploration & Promotion: Implementation Strategies of Corporate Social SoftwareExploration & Promotion: Implementation Strategies of Corporate Social Software
Exploration & Promotion: Implementation Strategies of Corporate Social SoftwareAlexander Stocker
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Book recommendation system using opinion mining technique
Book recommendation system using opinion mining techniqueBook recommendation system using opinion mining technique
Book recommendation system using opinion mining techniqueeSAT Journals
 
Fundamentals of Deep Recommender Systems
 Fundamentals of Deep Recommender Systems Fundamentals of Deep Recommender Systems
Fundamentals of Deep Recommender SystemsWQ Fan
 
IRJET - YouTube Spam Comments Detection
IRJET - YouTube Spam Comments DetectionIRJET - YouTube Spam Comments Detection
IRJET - YouTube Spam Comments DetectionIRJET Journal
 
Growth and engagement 101
Growth and engagement 101Growth and engagement 101
Growth and engagement 101Manu Rekhi
 
User stories — broken vision broke the knees
User stories — broken vision broke the kneesUser stories — broken vision broke the knees
User stories — broken vision broke the kneesVladimir Tarasov
 
Lifelong Personal Health Data And Application Software Via...
Lifelong Personal Health Data And Application Software Via...Lifelong Personal Health Data And Application Software Via...
Lifelong Personal Health Data And Application Software Via...Lindsey Campbell
 
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SDiscovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SIRJET Journal
 
Different Yardsticks
Different YardsticksDifferent Yardsticks
Different YardsticksJen Cloud
 
Influence and impact of social networks on corporate reputation - Balestrieri...
Influence and impact of social networks on corporate reputation - Balestrieri...Influence and impact of social networks on corporate reputation - Balestrieri...
Influence and impact of social networks on corporate reputation - Balestrieri...Pierre Balestrieri
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET Journal
 
3282016 Additional Book Resourceshttpscourserooma.cap.docx
3282016 Additional Book Resourceshttpscourserooma.cap.docx3282016 Additional Book Resourceshttpscourserooma.cap.docx
3282016 Additional Book Resourceshttpscourserooma.cap.docxtamicawaysmith
 
An Efficient Trust Evaluation using Fact-Finder Technique
An Efficient Trust Evaluation using Fact-Finder TechniqueAn Efficient Trust Evaluation using Fact-Finder Technique
An Efficient Trust Evaluation using Fact-Finder TechniqueIJCSIS Research Publications
 

Similar to Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities (20)

Extracting, Mining and Predicting Users’ Interests from Social Media
Extracting, Mining and Predicting Users’ Interests from Social MediaExtracting, Mining and Predicting Users’ Interests from Social Media
Extracting, Mining and Predicting Users’ Interests from Social Media
 
CAMUG - Sept 3, 2020 - User Story Quality Matters
CAMUG - Sept 3, 2020 - User Story Quality MattersCAMUG - Sept 3, 2020 - User Story Quality Matters
CAMUG - Sept 3, 2020 - User Story Quality Matters
 
Exploration & Promotion: Implementation Strategies of Corporate Social Software
Exploration & Promotion: Implementation Strategies of Corporate Social SoftwareExploration & Promotion: Implementation Strategies of Corporate Social Software
Exploration & Promotion: Implementation Strategies of Corporate Social Software
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Book recommendation system using opinion mining technique
Book recommendation system using opinion mining techniqueBook recommendation system using opinion mining technique
Book recommendation system using opinion mining technique
 
Fundamentals of Deep Recommender Systems
 Fundamentals of Deep Recommender Systems Fundamentals of Deep Recommender Systems
Fundamentals of Deep Recommender Systems
 
IRJET - YouTube Spam Comments Detection
IRJET - YouTube Spam Comments DetectionIRJET - YouTube Spam Comments Detection
IRJET - YouTube Spam Comments Detection
 
Growth and engagement 101
Growth and engagement 101Growth and engagement 101
Growth and engagement 101
 
CX Strategy & Design presentation
CX Strategy & Design presentationCX Strategy & Design presentation
CX Strategy & Design presentation
 
User stories — broken vision broke the knees
User stories — broken vision broke the kneesUser stories — broken vision broke the knees
User stories — broken vision broke the knees
 
Intranet Usability Testing
Intranet Usability TestingIntranet Usability Testing
Intranet Usability Testing
 
Lifelong Personal Health Data And Application Software Via...
Lifelong Personal Health Data And Application Software Via...Lifelong Personal Health Data And Application Software Via...
Lifelong Personal Health Data And Application Software Via...
 
Fair Recommender Systems
Fair Recommender Systems Fair Recommender Systems
Fair Recommender Systems
 
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’SDiscovering Influential User by Coupling Multiplex Heterogeneous OSN’S
Discovering Influential User by Coupling Multiplex Heterogeneous OSN’S
 
Different Yardsticks
Different YardsticksDifferent Yardsticks
Different Yardsticks
 
Influence and impact of social networks on corporate reputation - Balestrieri...
Influence and impact of social networks on corporate reputation - Balestrieri...Influence and impact of social networks on corporate reputation - Balestrieri...
Influence and impact of social networks on corporate reputation - Balestrieri...
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
 
3282016 Additional Book Resourceshttpscourserooma.cap.docx
3282016 Additional Book Resourceshttpscourserooma.cap.docx3282016 Additional Book Resourceshttpscourserooma.cap.docx
3282016 Additional Book Resourceshttpscourserooma.cap.docx
 
An Efficient Trust Evaluation using Fact-Finder Technique
An Efficient Trust Evaluation using Fact-Finder TechniqueAn Efficient Trust Evaluation using Fact-Finder Technique
An Efficient Trust Evaluation using Fact-Finder Technique
 

More from Subhabrata Mukherjee

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsSubhabrata Mukherjee
 
OpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesOpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesSubhabrata Mukherjee
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language ModelSubhabrata Mukherjee
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesSubhabrata Mukherjee
 
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Subhabrata Mukherjee
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesSubhabrata Mukherjee
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsSubhabrata Mukherjee
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Subhabrata Mukherjee
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelSubhabrata Mukherjee
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterSubhabrata Mukherjee
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Subhabrata Mukherjee
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilaritySubhabrata Mukherjee
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...Subhabrata Mukherjee
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviewsSubhabrata Mukherjee
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...Subhabrata Mukherjee
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSubhabrata Mukherjee
 

More from Subhabrata Mukherjee (17)

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual ModelsXtremeDistil: Multi-stage Distillation for Massive Multilingual Models
XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
 
Fact Checking from Text
Fact Checking from TextFact Checking from Text
Fact Checking from Text
 
OpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product ProfilesOpenTag: Open Attribute Value Extraction From Product Profiles
OpenTag: Open Attribute Value Extraction From Product Profiles
 
Continuous Experience-aware Language Model
Continuous Experience-aware Language ModelContinuous Experience-aware Language Model
Continuous Experience-aware Language Model
 
Experience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review CommunitiesExperience aware Item Recommendation in Evolving Review Communities
Experience aware Item Recommendation in Evolving Review Communities
 
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
Domain Cartridge: Unsupervised Framework for Shallow Domain Ontology Construc...
 
Leveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News CommunitiesLeveraging Joint Interactions for Credibility Analysis in News Communities
Leveraging Joint Interactions for Credibility Analysis in News Communities
 
People on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health ForumsPeople on Drugs: Credibility of User Statements in Health Forums
People on Drugs: Credibility of User Statements in Health Forums
 
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
Author-Specific Hierarchical Sentiment Aggregation for Rating Prediction of R...
 
Joint Author Sentiment Topic Model
Joint Author Sentiment Topic ModelJoint Author Sentiment Topic Model
Joint Author Sentiment Topic Model
 
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in TwitterTwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
TwiSent: A Multi-Stage System for Analyzing Sentiment in Twitter
 
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
Adaptation of Sentiment Analysis to New Linguistic Features, Informal Languag...
 
Leveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word SimilarityLeveraging Sentiment to Compute Word Similarity
Leveraging Sentiment to Compute Word Similarity
 
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
WikiSent : Weakly Supervised Sentiment Analysis Through Extractive Summarizat...
 
Feature specific analysis of reviews
Feature specific analysis of reviewsFeature specific analysis of reviews
Feature specific analysis of reviews
 
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
YouCat : Weakly Supervised Youtube Video Categorization System from Meta Data...
 
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse AnalysisSentiment Analysis in Twitter with Lightweight Discourse Analysis
Sentiment Analysis in Twitter with Lightweight Discourse Analysis
 

Recently uploaded

001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...marwaahmad357
 
Gene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfGene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfNetHelix
 
World Water Day 22 March 2024 - kiyorndlab
World Water Day 22 March 2024 - kiyorndlabWorld Water Day 22 March 2024 - kiyorndlab
World Water Day 22 March 2024 - kiyorndlabkiyorndlab
 
Bureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptxBureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptxkastureyashashree
 
KeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data scienceKeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data scienceLayne Sadler
 
SUKDANAN DIAGNOSTIC TEST IN PHYSICAL SCIENCE ANSWER KEYY.pdf
SUKDANAN DIAGNOSTIC TEST IN PHYSICAL SCIENCE ANSWER KEYY.pdfSUKDANAN DIAGNOSTIC TEST IN PHYSICAL SCIENCE ANSWER KEYY.pdf
SUKDANAN DIAGNOSTIC TEST IN PHYSICAL SCIENCE ANSWER KEYY.pdfsantiagojoderickdoma
 
Basic Concepts in Pharmacology in molecular .pptx
Basic Concepts in Pharmacology in molecular  .pptxBasic Concepts in Pharmacology in molecular  .pptx
Basic Concepts in Pharmacology in molecular .pptxVijayaKumarR28
 
Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxRahulVishwakarma71547
 
Contracts with Interdependent Preferences (2)
Contracts with Interdependent Preferences (2)Contracts with Interdependent Preferences (2)
Contracts with Interdependent Preferences (2)GRAPE
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxmarwaahmad357
 
Pests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPirithiRaju
 
Principles & Formulation of Hair Care Products
Principles & Formulation of Hair Care  ProductsPrinciples & Formulation of Hair Care  Products
Principles & Formulation of Hair Care Productspurwaborkar@gmail.com
 
RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024suelcarter1
 
Identification of Superclusters and Their Properties in the Sloan Digital Sky...
Identification of Superclusters and Their Properties in the Sloan Digital Sky...Identification of Superclusters and Their Properties in the Sloan Digital Sky...
Identification of Superclusters and Their Properties in the Sloan Digital Sky...Sérgio Sacani
 
CW marking grid Analytical BS - M Ahmad.docx
CW  marking grid Analytical BS - M Ahmad.docxCW  marking grid Analytical BS - M Ahmad.docx
CW marking grid Analytical BS - M Ahmad.docxmarwaahmad357
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Sérgio Sacani
 
Role of herbs in hair care Amla and heena.pptx
Role of herbs in hair care  Amla and  heena.pptxRole of herbs in hair care  Amla and  heena.pptx
Role of herbs in hair care Amla and heena.pptxVaishnaviAware
 

Recently uploaded (20)

001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
001 Case Study - Submission Point_c1051231_attempt_2023-11-23-14-08-42_ABS CW...
 
Gene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdfGene transfer in plants agrobacterium.pdf
Gene transfer in plants agrobacterium.pdf
 
World Water Day 22 March 2024 - kiyorndlab
World Water Day 22 March 2024 - kiyorndlabWorld Water Day 22 March 2024 - kiyorndlab
World Water Day 22 March 2024 - kiyorndlab
 
Bureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptxBureau of Indian Standards Specification of Shampoo.pptx
Bureau of Indian Standards Specification of Shampoo.pptx
 
Cheminformatics tools supporting dissemination of data associated with US EPA...
Cheminformatics tools supporting dissemination of data associated with US EPA...Cheminformatics tools supporting dissemination of data associated with US EPA...
Cheminformatics tools supporting dissemination of data associated with US EPA...
 
KeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data scienceKeyBio pipeline for bioinformatics and data science
KeyBio pipeline for bioinformatics and data science
 
SUKDANAN DIAGNOSTIC TEST IN PHYSICAL SCIENCE ANSWER KEYY.pdf
SUKDANAN DIAGNOSTIC TEST IN PHYSICAL SCIENCE ANSWER KEYY.pdfSUKDANAN DIAGNOSTIC TEST IN PHYSICAL SCIENCE ANSWER KEYY.pdf
SUKDANAN DIAGNOSTIC TEST IN PHYSICAL SCIENCE ANSWER KEYY.pdf
 
Basic Concepts in Pharmacology in molecular .pptx
Basic Concepts in Pharmacology in molecular  .pptxBasic Concepts in Pharmacology in molecular  .pptx
Basic Concepts in Pharmacology in molecular .pptx
 
Application of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptxApplication of Foraminiferal Ecology- Rahul.pptx
Application of Foraminiferal Ecology- Rahul.pptx
 
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
Applying Cheminformatics to Develop a Structure Searchable Database of Analyt...
 
Contracts with Interdependent Preferences (2)
Contracts with Interdependent Preferences (2)Contracts with Interdependent Preferences (2)
Contracts with Interdependent Preferences (2)
 
Applied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docxApplied Biochemistry feedback_M Ahwad 2023.docx
Applied Biochemistry feedback_M Ahwad 2023.docx
 
Pests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPRPests of ragi_Identification, Binomics_Dr.UPR
Pests of ragi_Identification, Binomics_Dr.UPR
 
Principles & Formulation of Hair Care Products
Principles & Formulation of Hair Care  ProductsPrinciples & Formulation of Hair Care  Products
Principles & Formulation of Hair Care Products
 
RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024RCPE terms and cycles scenarios as of March 2024
RCPE terms and cycles scenarios as of March 2024
 
Identification of Superclusters and Their Properties in the Sloan Digital Sky...
Identification of Superclusters and Their Properties in the Sloan Digital Sky...Identification of Superclusters and Their Properties in the Sloan Digital Sky...
Identification of Superclusters and Their Properties in the Sloan Digital Sky...
 
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
Cheminformatics tools and chemistry data underpinning mass spectrometry analy...
 
CW marking grid Analytical BS - M Ahmad.docx
CW  marking grid Analytical BS - M Ahmad.docxCW  marking grid Analytical BS - M Ahmad.docx
CW marking grid Analytical BS - M Ahmad.docx
 
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
Digitized Continuous Magnetic Recordings for the August/September 1859 Storms...
 
Role of herbs in hair care Amla and heena.pptx
Role of herbs in hair care  Amla and  heena.pptxRole of herbs in hair care  Amla and  heena.pptx
Role of herbs in hair care Amla and heena.pptx
 

Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities

  • 1. Probabilistic Graphical Models for Credibility Analysis in Evolving Online Communities Subhabrata Mukherjee Advisor: Prof. Gerhard Weikum Max Planck Institute for Informatics, Germany Email: subhabrata.mukherjee.ju@gmail.com Web: https://people.mpi-inf.mpg.de/~smukherjee/
  • 2. Why Credibility Analysis? ▶ “Rapid spread of misinformation online" – one of the top 10 challenges as per The World Economic Forum 1 http://www.washingtonstarnews.com/proof-obamacare-requires-all-americans-to-be-chipped/ 2 http://theracketreport.com/several-injured-in-zombie-like-attack-at-tennessee-walmart-as-man-tries-to-eat-his-victims/ 3 https://www.forbes.com/sites/tonybradley/2017/07/31/facebook-ai-creates-its-own-language-in-creepy-preview-of-our-potential-future/
  • 3. Concerns 3 Misinformation for health can have hazardous consequences Rumors, spams, fake reviews have significant business impact for retailers
  • 4. Why focus on Online Communities? Massive repositories of knowledge accessed by regular users and professionals ↘ 59% of adult U.S. population and half of U.S. physicians rely on online resources [IMS Health Report, 2014] ↘ 40% of online consumers consult online reviews before buying products [Nielson Corporation, 2016]
  • 5. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Applications: Product Reviews ● Conclusions 5
  • 6. Truth Discovery Structured data (e.g., SPO triples, tables, networks) Objective facts (e.g., Obama_BornIn_Hawaii vs. Obama_BornIn_Kenya) No contextual data (text) Limited metadata, external KB 6 Linguistic Analysis Unstructured text Subjective information (e.g., opinion spam, bias, viewpoint ) External KB (e.g., WordNet, KG) No network / interactions, metadata
  • 7. 1. How can we jointly leverage network, context and users for credibility analysis in online communities? 2. How can we model temporal evolution? 3. How can we deal with limited data? 4. How can we generate interpretable explanations for credibility verdict? 7 Research Questions
  • 8. Contributions ● Credibility Analysis Framework for Online Communities ↘ Classification: Health Communities [SIGKDD 2014] ↘ Regression: News Communities [CIKM 2015] ● Temporal Evolution of Online Communities [ICDM 2015, SIGKDD 2016] ● Credibility Analysis of Product Reviews [ECML-PKDD 2016, SDM 2017] 8
  • 9. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Applications: Product Reviews ● Conclusions 9
  • 10. Problem Statement: Credibility Classification Given a set of statements from different users find which ones are True or False 10
  • 11. Use-case: Finding Drug Side-effects from Healthforums Problem: Given a set of posts from different users, extract credible statements (subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users 11 Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil: People on Drugs: Credibility of User Statements in Health Communities. In KDD, 2014
  • 12. 12 Problem: Given a set of posts from different users, extract credible statements (subject-predicate-object triples like DrugX_HasSideEffect_Y) from trustworthy users Use-case: Finding Drug Side-effects from Healthforums Subhabrata Mukherjee, Gerhard Weikum, Cristian Danescu-Niculescu-Mizil: People on Drugs: Credibility of User Statements in Health Communities. In KDD, 2014
  • 13. Modeling Interactions in Online Communities ● Each user, post, and statement is a random variable with edges depicting interactions User Post Post Statement User Statement in post ● Variables have observable features (e.g, authority, emotionality, subjectivity)
  • 14. Modeling Interactions in Online Communities ● Each user, post, and statement is a random variable with edges depicting interactions User post Post statement User statement in post Credible statements appear in objective posts Trustworthy users corroborate on credible statements in objective posts KEY IDEA
  • 15. 15 Conditional Random Field to Exploit Joint Interactions (Users + Network + Context) Partial Supervision: Expert stated (top 20%) side-effects of drugs as training labels Model predicts labels of unobserved statements. How to complement expert medical knowledge with large scale non-expert data?
  • 16. Semi-Supervised Conditional Random Field 1. Estimate user trustworthiness: 2. Estimate label of unknown statements Su by Gibbs Sampling: 3. Maximize log-likelihood to estimate feature weights: 4. Apply E-Step and M-Step till convergence 16
  • 17. Healthforum Dataset ● Healthboards.com community (www.healthboards.com) with 850,000 registered users and 4.5 million posts 17 ● Expert labels about drugs from MayoClinic (www.mayoclinic.org) ↘ 6 widely used drugs for experimentation
  • 18. 18
  • 19. 19 What constitutes credible language? Affective Emotions confidence sympathy self-esteem eagerness coolness compunction anxiety embarrassment misery distress
  • 20. 20 What constitutes credible language? determiner (this, that,..) negation (not, never, ..) second person (you, ..) conjunction (therefore, consequently, ..) contrast (despite, though, ..) question (what, why, ..) conditional (if) adverb (maybe, probably, ..) modality (might, could, ..) Discourse and Modalities
  • 21. Drawbacks 1. Does not model Topics of discussion: - Topic-specific expertise of users and sources 2. Conditional Random Fields cannot be used for numeric output / regression 21
  • 22. Where can we find numeric ratings? 22 In many online communities users rate items on their quality - E.g. Amazon, Yelp, TripAdvisor In news communities like Digg, Reddit, NewsTrust users provide feedback on articles (items) from different sources - User feedback often biased and subjective due to diverse viewpoints
  • 23. Credibility Regression Framework: News Communities Topics Climate Change Sources trunews.com Articles “Global warming is a hoax” Sources / Users Scientificamerican.com snopes.com user-donald Reviews & Ratings scientific analysis, 1.5/ 5, conspiratory theory 23 Problem Statement: Given articles from various sources and feedback from different users: predict credibility rating of articles
  • 24. Reviews / Ratings scientific analysis, 1.5/ 5, conspiratory theory Topics Climate Change Articles “Global warming is a hoax” Sources trunews.com Sources / Users Scientificamerican.com snopes.com user-donald Idea: Trustworthy sources publish objective articles corroborated by expert users with credible reviews/ratings 24 We use CRF to capture these mutual interactions in news communities (e.g., newstrust.net, digg, reddit) to jointly rank all of the underlying factors. Credibility Regression Framework: News Communities
  • 25. Online Communities: Factors Related to Ensemble Learning, Learning to Rank
  • 26. How to incorporate continuous ratings instead of discrete labels in CRF ? 26 Probability Mass Function for discrete labels: Probability Density Function for continuous ratings: Subhabrata Mukherjee and Gerhard Weikum: Leveraging Joint Interactions for Credibility Analysis in News Comm. In CIKM, 2015
  • 27. How to incorporate continuous ratings instead of discrete labels in CRF ? 27 ● We show that a certain energy function for clique potential --- geared for reducing mean-squared-error --- results in multivariate gaussian p.d.f. !!! ● Constrained Gradient Ascent for inference Subhabrata Mukherjee and Gerhard Weikum: Leveraging Joint Interactions for Credibility Analysis in News Comm. In CIKM, 2015
  • 28. Energy Function to Combine All
  • 29. Predicting Article Credibility Ratings in Newstrust.net 29 Progressive decrease in mean squared error with more network interactions, and context
  • 30. Summary ● Semi-supervised and Continuous CRF to jointly identify trustworthy users, credible statements, and reliable postings in online communities ● A framework to incorporate richer aspects like user expertise, topics / facets, temporal evolution etc. (discussed next) 30
  • 31. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Applications: Product Reviews ● Conclusions 31
  • 32. ● Online communities are dynamic, as users join and leave; acquire new vocabulary; evolve and mature over time ● Trustworthiness and expertise of users evolve over time Temporal Evolution 32 How to capture evolving user expertise?
  • 33. “My first DSLR. Excellent camera, takes great pictures with high definition, without a doubt it makes honor to its name.” [Aug, 1997] “The EF 75-300 mm lens is only good to be used outside. The 2.2X HD lens can only be used for specific items; filters are useless if ISO, AP,... . The short 18-55mm lens is cheap and should have a hood to keep light off lens.” [Oct, 2012] Illustrative Example for Review Communities 33 ● Consider following camera reviews by the same user John: Subhabrata Mukherjee, Hemank Lamba and Gerhard Weikum, Experience-aware Item Recommendation in Evolving Review Communities, ICDM 2015:
  • 34. 34 “The EF 75-300 mm lens is only good to be used outside. The 2.2X HD lens can only be used for specific items; filters are useless if ISO, AP,... . The short 18-55mm lens is cheap and should have a hood to keep light off lens.” [Oct, 2012] Illustrative Example for Review Communities ● Consider following camera reviews by John: “My first DSLR. Excellent camera, takes great pictures with high definition, without a doubt it makes honor to its name.” [Aug, 1997] How can we quantify this change in users’ maturity / experience ? How can we model this evolution / progression in users’ maturity? Mukherjee et al.: ICDM 2015, SIGKDD 2016
  • 35. Prior Work: Discrete Experience Evolution 35 Assumption: At each timepoint a user remains at the same level of experience, or moves to the next level 1. Users at similar levels of experience have similar facet preferences, and rating style (McAuley and Leskovec: WWW 2013) 2. Additionally, our work exploits similar writing style (Mukherjee, Lamba and Weikum: ICDM 2015)
  • 36. Language Model (KL) Divergence Increases with Experience 36Experienced users have a distinctive writing style different than that of amateurs
  • 37. Abrupt change in: - Expertise level - Language model Drawback 37 Assumption: At each timepoint a user remains at the same level of experience, or moves to the next level Abrupt Transition
  • 38. Continuous Experience Evolution 38 Subhabrata Mukherjee, Stephan Guennemann and Gerhard Weikum: Continuous Experience-aware Language Model. In KDD, 2016
  • 39. Continuous Experience Evolution: Assumptions ★ Continuous-time process, always positive ★ Markovian assumption: Experience at time t depends on that at t-1 ★ Drift: Overall trend to increase over time ★ Volatility: Progression may not be smooth with occasional volatility E.g.: series of expert reviews followed by a sloppy one 39
  • 40. Geometric Brownian Motion 40 We show these properties to be satisfied by the continuous-time stochastic process: Geometric Brownian Motion
  • 41. Language Model (LM) Evolution ● Users' LM also evolve with experience evolution ● Smoothly evolve over time preserving Markov property of experience evolution ● Variance of LM should change with experience change ● Brownian Motion to model this desiderata: 41
  • 42. Inference 42 Topic Model (Blei et al., JMLR '03) Users ( Author-topic model,Rosen-Zvi et al., UAI '04) Continuous Time (Dynamic topic model, Wang et al., UAI '08) Continuous Experience (this work) Topic + time + users + + +
  • 43. Sampling based Inference Kalman Filter for LM evolution Metropolis Hastings for Exp. evolution 43 Gibbs Sampling for Facets
  • 44. Kalman Filter for LM evolution Metropolis Hastings for Exp. evolution 44 Gibbs Sampling for Facets Sampling based Inference
  • 46. Can we recommend items better, if we consider users’ experience to consume them? 46 User-experience aware model (blue) incurs the least error in item rating prediction
  • 48. Interpretability: Top Words* by Experienced Users 48 Most Experience Least Experience BeerAdvocate chestnut_hued near_viscous cherry_wood sweet_burning faint_vanilla woody_herbal citrus_hops mouthfeel originally flavor color poured pleasant bad bitter sweet Amazon aficionados minimalist underwritten theatrically unbridled seamless retrospect overdramatic viewer entertainment battle actress tells emotional supporting Yelp smoked marinated savory signature contemporary selections delicate texture mexican chicken salad love better eat atmosphere sandwich NewsTrust health actions cuts medicare oil climate spending unemployment bad god religion iraq responsibility questions clear powerful *Learned by our generative model without supervision
  • 49. Interpretability: Top Words* by Experienced Users 49 Most Experience Least Experience BeerAdvocate chestnut_hued near_viscous cherry_wood sweet_burning faint_vanilla woody_herbal citrus_hops mouthfeel originally flavor color poured pleasant bad bitter sweet Amazon aficionados minimalist underwritten theatrically unbridled seamless retrospect overdramatic viewer entertainment battle actress tells emotional supporting Yelp smoked marinated savory signature contemporary selections delicate texture mexican chicken salad love better eat atmosphere sandwich NewsTrust health actions cuts medicare oil climate spending unemployment bad god religion iraq responsibility questions clear powerful Experienced users in beer community use more “fruity” words to describe taste and smell of beers Experienced users in news community discuss about policies and regulations in contrast to amateurs interested on polarizing topics *Learned by our generative model without supervision
  • 50. Take-away ● Insights from Geometric Brownian Motion trajectory of users: ○ Experienced users mature faster than amateurs ○ Progression depends more on time spent in community than on activity ● Users' experience evolve continuously, along with language usage ● Recommendation models can be improved by considering users’ maturity ● Learns from only the information of users reviewing products at explicit timepoints --- no meta-data, community-specific / platform dependent features --- easy to generalize across different communities 50
  • 51. Outline ● Motivation ● Prior Work and its Limitations ● Credibility Analysis ↘ Framework for Online Communities ↘ Temporal Evolution of Online Communities ↘ Applications: Product Reviews ● Conclusions 51
  • 52. Can we use this framework to find helpful product reviews? ● Reviews (e.g., camera) with similar facet-sentiment distribution (e.g., bashing “zoom” and “resolution”) are likely to be equally helpful. Distributional Hypothesis Subhabrata Mukherjee, Kashyap Popat and Gerhard Weikum, Exploring Latent Semantic Factors to Find Useful Product Reviews. In SDM, 2017 52 ● Users with similar facet preferences and expertise are likely to be equally helpful.
  • 53. We analyze consistency of embeddings from previous models to detect fake / anomalous reviews with discrepancies like: Subhabrata Mukherjee, Sourav Dutta and Gerhard Weikum: Credible Review Detection with Limited Information using Consistency Features, ECML-PKDD 2016 1. Rating and review description (promotion/demotion) Excellent product... technical support is almost non-existent ... this is unacceptable. [4] 2. Rating and Facet description (irrelevant) DO NOT BUY THIS. I can’t file because Turbo Tax doesn’t have software updates from the IRS “because of Hurricane Katrina”. [1] 3. Temporal bursts (group spamming) Dan’s apartment was beautiful, a great location. (3/14/2012)[5] I highly recommend working with Dan and... (3/14/2012) [5] Dan is super friendly, confident... (3/14/2012) [4] Consistency Analysis of Product Reviews 53
  • 54. Future Work 54 ★ Going beyond topics and bag-of-words features / lexicons Modeling context and semantics via embeddings ★ Applications to tasks like Anomaly Detection, Community Question-Answering, Knowledge-base Curation etc. ★ Incorporating richer facets like multi-modal interactions, stance, trend, influence etc. EMNLP 2018 WWW 2017, WWW 2018 In progress!
  • 55. Statement “The use of solar panels drains the sun of energy.” [Fake] “Between 1988 and 2006, a man lived at a Paris airport.” [True] Problem Statement: Given a statement as a sequence of tokens, and news articles from Web, classify it as True or False Fact-Checker for Natural Language Statements
  • 56. DeClarE: Evidence-aware Deep Learning Key Idea is to model the following jointly: - Learn separate embeddings for - (i) Statement Text, (ii) Statement Source, (iii) News (Article) Text, (iv) News (Article) Source - Bidirectional LSTM to model contextual information - Attention to focus on relevant parts of the news article w.r.t statement context - Aggregate and predict Kashyap Popat, Subhabrata Mukherjee, Andrew Yates, Gerhard Weikum: DeClarE: Debunking Fake News and False Claims with Evidence-aware Deep Learning. In EMNLP, 2018 Collect Evidence Evaluate Evidence Joint Modeling Predict Correctness
  • 57. DeClarE: Framework Statement Text News Text Statement Source News Source
  • 58. Credibility Classification on Snopes and PolitiFact
  • 59. Credibility Regression in NewsTrust - Declare incurs the least mean squared error (MSE) for rating prediction
  • 60. Clear Separation between Fake and Authentic News Sources Fake Sources Authentic Sources
  • 61. Clusters politicians (speakers of statements) of similar ideologies close to each other in semantic space Republicans Democrats
  • 62. Using Attention to Generate Evidence [Source] nytimes.com [Evidence] “glimmer of truth, sketchy evidence, hasn’t said, barely true claims”
  • 63. 1. How can we develop models that jointly leverage users, network, and context for credibility analysis in online communities? 2. How can we model users’ evolution or progression in maturity? 3. How can we deal with the limited information scenario? 4. How can we generate interpretable explanations for credibility verdict? 63 Interactional Framework for Credibility Analysis Conclusions 1. How can we jointly leverage network, context and users for credibility analysis in online communities? 2. How can we model evolution? 3. How can we deal with limited data? 4. How can we generate interpretable explanations for credibility verdict?
  • 66. 66 Databases and Information Systems Department at Max Planck Institute Acknowledgments (3/3)
  • 67. 1. How can we develop models that jointly leverage users, network, and context for credibility analysis in online communities? 2. How can we model users’ evolution or progression in maturity? 3. How can we deal with the limited information scenario? 4. How can we generate interpretable explanations for credibility verdict? 67 Interactional Framework for Credibility Analysis 1. How can we jointly leverage network, context and users for credibility analysis in online communities? 2. How can we model temporal evolution? 3. How can we deal with limited data? 4. How can we generate interpretable explanations for credibility verdict? THANKS!