SlideShare une entreprise Scribd logo
1  sur  18
08/30/18 Stefan Helmstetter, Heiko Paulheim 1
Weakly Supervised Learning for Fake News
Detection on Twitter
Stefan Helmstetter, Heiko Paulheim
08/30/18 Stefan Helmstetter, Heiko Paulheim 2
Motivation
• Social media...
– ...are an increasingly important source of information
– ...can be manipulated easily
08/30/18 Stefan Helmstetter, Heiko Paulheim 3
Motivation
• Fake news detection: a straight forward machine learning problem
– Simplest case: two classes
– Researched for several decades
– Used, e.g., for spam filtering
08/30/18 Stefan Helmstetter, Heiko Paulheim 4
Motivation
• Challenge
– The more training data, the better
– Mass labeling data is difficult (e.g., requires investigations)
●
cf. spam filtering: labeling can be done “on the fly” by laymen
08/30/18 Stefan Helmstetter, Heiko Paulheim 5
Approach
• We cannot easily tell a fake news tweet from a real one
• But we have information on fake and trustworthy sources
08/30/18 Stefan Helmstetter, Heiko Paulheim 6
Approach
• Naive mass labeling:
– every tweet from a fake source is a fake tweet
– every tweet from a trustworthy source is a true tweet
• Our collection:
– 65 fake news sources
– 47 trustworthy news sources
– 401k tweets
●
111k fake news
●
291k real news
08/30/18 Stefan Helmstetter, Heiko Paulheim 7
Approach
• Skew towards 2017
– time of crawling, limitations of Twitter API
– more real than fake news (intentionally!)
08/30/18 Stefan Helmstetter, Heiko Paulheim 8
Approach
• Naive mass labeling:
– every tweet from a fake source is a fake tweet
– every tweet from a trustworthy source is a true tweet
08/30/18 Stefan Helmstetter, Heiko Paulheim 9
Approach
• Mind the classification task
– if we train a classifier, we learn to identify
tweets from untrustworthy sources
– not necessarily the same as fake news tweets
• Assumption
– the training dataset is large
– non-fake news are also covered by trustworthy sources
– trustworthy copies outnumber fake news ones
●
incidental skew in the dataset
08/30/18 Stefan Helmstetter, Heiko Paulheim 10
Approach
• Leaving that caveat aside, we use
– 53 user-level features
e.g., no. of followers, tweet frequency
– 69 tweet-level features
e.g., length, no. of hashtags, no. of URLs
– text features
as BoW (60k features) or doc2vec model (300 features)
– topic features
10-200 topics created using LDA
– eight features using sentiment and polarity analysis
• Classifiers
– Naive Bayes, Decision Trees, SVM, Neural Net (1 hidden layer),
Random Forest, xgboost
– Voting and weighted voting of the above
08/30/18 Stefan Helmstetter, Heiko Paulheim 11
Approach
• Optimal selection of features per classifier
08/30/18 Stefan Helmstetter, Heiko Paulheim 12
Evaluation
• Setting 1
– Cross validation on the training set
– Remember: actual target is trustworthiness of source
• Setting 2
– Validation against a gold standard
– Target here: trustworthiness of tweet
• Two variants each
– with and without user level features
– idea: judging tweets from known and unknown sources
08/30/18 Stefan Helmstetter, Heiko Paulheim 13
Evaluation
• Setting 1
– Cross validation on the training set
– Remember: actual target is
trustworthiness of source
• Results
– up to .78 without user level tweets
– up to .94 with user level tweets
– xgboost and voting work best
08/30/18 Stefan Helmstetter, Heiko Paulheim 14
Evaluation
• Setting 2
– Validation against a gold standard
– Target here: trustworthiness of tweet
• Results
– up to .77 without user level tweets
– up to .89 with user level tweets
– neural net works best
• Observation:
– results are not much worse than for setting 1
– i.e.: source labels seem to be a suitable proxy for tweet labels
08/30/18 Stefan Helmstetter, Heiko Paulheim 15
Evaluation
• Feature weighting by xgboost:
– most important features are user level features
08/30/18 Stefan Helmstetter, Heiko Paulheim 16
Evaluation
• Without user level features
– surface level features are strong
– content/topics are not too important
08/30/18 Stefan Helmstetter, Heiko Paulheim 17
Conclusion
• Fake news detection is a straight forward classification task
– but training data is scarce
• Inexact mass-labeling can be done
– by using source instead of tweet labels
– collection of large-scale training is easy
– automatic re-collection is possible
(e.g., for new topics, changed twitter behavior)
• Results for tweet labeling
– not much worse than for source labeling
08/30/18 Stefan Helmstetter, Heiko Paulheim 18
Weakly Supervised Learning for Fake News
Detection on Twitter
Stefan Helmstetter, Heiko Paulheim

Contenu connexe

Tendances

Fake News detection.pptx
Fake News detection.pptxFake News detection.pptx
Fake News detection.pptxSanad Bhowmik
 
DTS Solution - Hacking ATM Machines - The Italian Job Way
DTS Solution - Hacking ATM Machines - The Italian Job WayDTS Solution - Hacking ATM Machines - The Italian Job Way
DTS Solution - Hacking ATM Machines - The Italian Job WayShah Sheikh
 
Recommendation system
Recommendation system Recommendation system
Recommendation system Vikrant Arya
 
Business Email Compromise Scam
Business Email Compromise ScamBusiness Email Compromise Scam
Business Email Compromise ScamGuardian Analytics
 
Cyber security presentation
Cyber security presentationCyber security presentation
Cyber security presentationParab Mishra
 
Mobile security issues & frauds in India
Mobile security issues & frauds in IndiaMobile security issues & frauds in India
Mobile security issues & frauds in IndiaYogesh Lolge
 
IRJET- Android Malware Detection using Machine Learning
IRJET-  	  Android Malware Detection using Machine LearningIRJET-  	  Android Malware Detection using Machine Learning
IRJET- Android Malware Detection using Machine LearningIRJET Journal
 
Cyber Security - Unit - 5 - Introduction to Cyber Crime Investigation
Cyber Security - Unit - 5 - Introduction to Cyber Crime InvestigationCyber Security - Unit - 5 - Introduction to Cyber Crime Investigation
Cyber Security - Unit - 5 - Introduction to Cyber Crime InvestigationGyanmanjari Institute Of Technology
 
DDoS - Distributed Denial of Service
DDoS - Distributed Denial of ServiceDDoS - Distributed Denial of Service
DDoS - Distributed Denial of ServiceEr. Shiva K. Shrestha
 
CYBER SECURITY ON SOCIAL MEDIA
CYBER SECURITY ON SOCIAL MEDIACYBER SECURITY ON SOCIAL MEDIA
CYBER SECURITY ON SOCIAL MEDIAcharitha garimella
 
Botnet Detection Techniques
Botnet Detection TechniquesBotnet Detection Techniques
Botnet Detection TechniquesTeam Firefly
 
Andrew NG machine learning
Andrew NG machine learningAndrew NG machine learning
Andrew NG machine learningShareDocView.com
 
Cyber Crime and Social Media Security
Cyber Crime and Social Media SecurityCyber Crime and Social Media Security
Cyber Crime and Social Media SecurityHem Pokhrel
 
Steganography ProjectReport
Steganography ProjectReportSteganography ProjectReport
Steganography ProjectReportekta sharma
 

Tendances (20)

FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT FAKE NEWS DETECTION PPT
FAKE NEWS DETECTION PPT
 
Fake News detection.pptx
Fake News detection.pptxFake News detection.pptx
Fake News detection.pptx
 
DTS Solution - Hacking ATM Machines - The Italian Job Way
DTS Solution - Hacking ATM Machines - The Italian Job WayDTS Solution - Hacking ATM Machines - The Italian Job Way
DTS Solution - Hacking ATM Machines - The Italian Job Way
 
Recommendation system
Recommendation system Recommendation system
Recommendation system
 
Video Steganography
Video SteganographyVideo Steganography
Video Steganography
 
Business Email Compromise Scam
Business Email Compromise ScamBusiness Email Compromise Scam
Business Email Compromise Scam
 
Cyber security presentation
Cyber security presentationCyber security presentation
Cyber security presentation
 
Wpa3
Wpa3Wpa3
Wpa3
 
Les jeunes et Internet
Les jeunes et InternetLes jeunes et Internet
Les jeunes et Internet
 
Mobile security issues & frauds in India
Mobile security issues & frauds in IndiaMobile security issues & frauds in India
Mobile security issues & frauds in India
 
IRJET- Android Malware Detection using Machine Learning
IRJET-  	  Android Malware Detection using Machine LearningIRJET-  	  Android Malware Detection using Machine Learning
IRJET- Android Malware Detection using Machine Learning
 
Cyber Security - Unit - 5 - Introduction to Cyber Crime Investigation
Cyber Security - Unit - 5 - Introduction to Cyber Crime InvestigationCyber Security - Unit - 5 - Introduction to Cyber Crime Investigation
Cyber Security - Unit - 5 - Introduction to Cyber Crime Investigation
 
DDoS - Distributed Denial of Service
DDoS - Distributed Denial of ServiceDDoS - Distributed Denial of Service
DDoS - Distributed Denial of Service
 
CYBER SECURITY ON SOCIAL MEDIA
CYBER SECURITY ON SOCIAL MEDIACYBER SECURITY ON SOCIAL MEDIA
CYBER SECURITY ON SOCIAL MEDIA
 
Botnet Detection Techniques
Botnet Detection TechniquesBotnet Detection Techniques
Botnet Detection Techniques
 
ML PPT-1.pptx
ML PPT-1.pptxML PPT-1.pptx
ML PPT-1.pptx
 
Andrew NG machine learning
Andrew NG machine learningAndrew NG machine learning
Andrew NG machine learning
 
Wardriving
WardrivingWardriving
Wardriving
 
Cyber Crime and Social Media Security
Cyber Crime and Social Media SecurityCyber Crime and Social Media Security
Cyber Crime and Social Media Security
 
Steganography ProjectReport
Steganography ProjectReportSteganography ProjectReport
Steganography ProjectReport
 

Similaire à Weakly Supervised Learning for Fake News Detection on Twitter

Exploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningExploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningHeiko Paulheim
 
Open Analytics: Building Effective Frameworks for Social Media Analysis
Open Analytics: Building Effective Frameworks for Social Media AnalysisOpen Analytics: Building Effective Frameworks for Social Media Analysis
Open Analytics: Building Effective Frameworks for Social Media Analysisikanow
 
Open analytics social media framework
Open analytics   social media frameworkOpen analytics   social media framework
Open analytics social media frameworkOpen Analytics
 
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...Jason Hong
 
Using Twitter as a data source: An overview of ethical challenges
Using Twitter as a data source: An overview of ethical challengesUsing Twitter as a data source: An overview of ethical challenges
Using Twitter as a data source: An overview of ethical challengesDr Wasim Ahmed
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media Analysisikanow
 
Analysis of Cyberbullying Tweets in Trending World Events
Analysis of Cyberbullying Tweets in Trending World EventsAnalysis of Cyberbullying Tweets in Trending World Events
Analysis of Cyberbullying Tweets in Trending World Eventskcortis
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine LearningHeiko Paulheim
 
Group Chapter Presentation
Group Chapter PresentationGroup Chapter Presentation
Group Chapter Presentationfeinal
 
Hallam healthresearch tweetchats
Hallam healthresearch tweetchatsHallam healthresearch tweetchats
Hallam healthresearch tweetchatsHeidi Probst
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisOpen Analytics
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Timo Wandhoefer
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDivyaPatel729457
 
Social Network Analysis Basics for Social Media Profs - Handout
Social Network Analysis Basics for Social Media Profs - HandoutSocial Network Analysis Basics for Social Media Profs - Handout
Social Network Analysis Basics for Social Media Profs - HandoutMatthew J. Kushin, Ph.D.
 
Social Media Crawling & Mining Seminar
Social Media Crawling & Mining Seminar Social Media Crawling & Mining Seminar
Social Media Crawling & Mining Seminar Symeon Papadopoulos
 

Similaire à Weakly Supervised Learning for Fake News Detection on Twitter (16)

Exploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningExploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data Mining
 
Open Analytics: Building Effective Frameworks for Social Media Analysis
Open Analytics: Building Effective Frameworks for Social Media AnalysisOpen Analytics: Building Effective Frameworks for Social Media Analysis
Open Analytics: Building Effective Frameworks for Social Media Analysis
 
Open analytics social media framework
Open analytics   social media frameworkOpen analytics   social media framework
Open analytics social media framework
 
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
 
Using Twitter as a data source: An overview of ethical challenges
Using Twitter as a data source: An overview of ethical challengesUsing Twitter as a data source: An overview of ethical challenges
Using Twitter as a data source: An overview of ethical challenges
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media Analysis
 
Analysis of Cyberbullying Tweets in Trending World Events
Analysis of Cyberbullying Tweets in Trending World EventsAnalysis of Cyberbullying Tweets in Trending World Events
Analysis of Cyberbullying Tweets in Trending World Events
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
Group Chapter Presentation
Group Chapter PresentationGroup Chapter Presentation
Group Chapter Presentation
 
Hallam healthresearch tweetchats
Hallam healthresearch tweetchatsHallam healthresearch tweetchats
Hallam healthresearch tweetchats
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media Analysis
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptx
 
Social Network Analysis Basics for Social Media Profs - Handout
Social Network Analysis Basics for Social Media Profs - HandoutSocial Network Analysis Basics for Social Media Profs - Handout
Social Network Analysis Basics for Social Media Profs - Handout
 
Social Media Crawling & Mining Seminar
Social Media Crawling & Mining Seminar Social Media Crawling & Mining Seminar
Social Media Crawling & Mining Seminar
 

Plus de Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...Heiko Paulheim
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsHeiko Paulheim
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsHeiko Paulheim
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Heiko Paulheim
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph BlockHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Heiko Paulheim
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge GraphsHeiko Paulheim
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Heiko Paulheim
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Heiko Paulheim
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsHeiko Paulheim
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingHeiko Paulheim
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the WebHeiko Paulheim
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyHeiko Paulheim
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopHeiko Paulheim
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionHeiko Paulheim
 

Plus de Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 

Dernier

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 

Dernier (20)

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 

Weakly Supervised Learning for Fake News Detection on Twitter

  • 1. 08/30/18 Stefan Helmstetter, Heiko Paulheim 1 Weakly Supervised Learning for Fake News Detection on Twitter Stefan Helmstetter, Heiko Paulheim
  • 2. 08/30/18 Stefan Helmstetter, Heiko Paulheim 2 Motivation • Social media... – ...are an increasingly important source of information – ...can be manipulated easily
  • 3. 08/30/18 Stefan Helmstetter, Heiko Paulheim 3 Motivation • Fake news detection: a straight forward machine learning problem – Simplest case: two classes – Researched for several decades – Used, e.g., for spam filtering
  • 4. 08/30/18 Stefan Helmstetter, Heiko Paulheim 4 Motivation • Challenge – The more training data, the better – Mass labeling data is difficult (e.g., requires investigations) ● cf. spam filtering: labeling can be done “on the fly” by laymen
  • 5. 08/30/18 Stefan Helmstetter, Heiko Paulheim 5 Approach • We cannot easily tell a fake news tweet from a real one • But we have information on fake and trustworthy sources
  • 6. 08/30/18 Stefan Helmstetter, Heiko Paulheim 6 Approach • Naive mass labeling: – every tweet from a fake source is a fake tweet – every tweet from a trustworthy source is a true tweet • Our collection: – 65 fake news sources – 47 trustworthy news sources – 401k tweets ● 111k fake news ● 291k real news
  • 7. 08/30/18 Stefan Helmstetter, Heiko Paulheim 7 Approach • Skew towards 2017 – time of crawling, limitations of Twitter API – more real than fake news (intentionally!)
  • 8. 08/30/18 Stefan Helmstetter, Heiko Paulheim 8 Approach • Naive mass labeling: – every tweet from a fake source is a fake tweet – every tweet from a trustworthy source is a true tweet
  • 9. 08/30/18 Stefan Helmstetter, Heiko Paulheim 9 Approach • Mind the classification task – if we train a classifier, we learn to identify tweets from untrustworthy sources – not necessarily the same as fake news tweets • Assumption – the training dataset is large – non-fake news are also covered by trustworthy sources – trustworthy copies outnumber fake news ones ● incidental skew in the dataset
  • 10. 08/30/18 Stefan Helmstetter, Heiko Paulheim 10 Approach • Leaving that caveat aside, we use – 53 user-level features e.g., no. of followers, tweet frequency – 69 tweet-level features e.g., length, no. of hashtags, no. of URLs – text features as BoW (60k features) or doc2vec model (300 features) – topic features 10-200 topics created using LDA – eight features using sentiment and polarity analysis • Classifiers – Naive Bayes, Decision Trees, SVM, Neural Net (1 hidden layer), Random Forest, xgboost – Voting and weighted voting of the above
  • 11. 08/30/18 Stefan Helmstetter, Heiko Paulheim 11 Approach • Optimal selection of features per classifier
  • 12. 08/30/18 Stefan Helmstetter, Heiko Paulheim 12 Evaluation • Setting 1 – Cross validation on the training set – Remember: actual target is trustworthiness of source • Setting 2 – Validation against a gold standard – Target here: trustworthiness of tweet • Two variants each – with and without user level features – idea: judging tweets from known and unknown sources
  • 13. 08/30/18 Stefan Helmstetter, Heiko Paulheim 13 Evaluation • Setting 1 – Cross validation on the training set – Remember: actual target is trustworthiness of source • Results – up to .78 without user level tweets – up to .94 with user level tweets – xgboost and voting work best
  • 14. 08/30/18 Stefan Helmstetter, Heiko Paulheim 14 Evaluation • Setting 2 – Validation against a gold standard – Target here: trustworthiness of tweet • Results – up to .77 without user level tweets – up to .89 with user level tweets – neural net works best • Observation: – results are not much worse than for setting 1 – i.e.: source labels seem to be a suitable proxy for tweet labels
  • 15. 08/30/18 Stefan Helmstetter, Heiko Paulheim 15 Evaluation • Feature weighting by xgboost: – most important features are user level features
  • 16. 08/30/18 Stefan Helmstetter, Heiko Paulheim 16 Evaluation • Without user level features – surface level features are strong – content/topics are not too important
  • 17. 08/30/18 Stefan Helmstetter, Heiko Paulheim 17 Conclusion • Fake news detection is a straight forward classification task – but training data is scarce • Inexact mass-labeling can be done – by using source instead of tweet labels – collection of large-scale training is easy – automatic re-collection is possible (e.g., for new topics, changed twitter behavior) • Results for tweet labeling – not much worse than for source labeling
  • 18. 08/30/18 Stefan Helmstetter, Heiko Paulheim 18 Weakly Supervised Learning for Fake News Detection on Twitter Stefan Helmstetter, Heiko Paulheim