Filtering out improper user accounts from Twitter user accounts for discovering individuals interested in certain topic
Chao CAI, Shun SHIRAMATSU
Dept. of Computer Science, Graduate School of Engineering, Nagoya Institute of Technology
Background
• Continuously growing demand for participant scouting in online opinion collection (Web-based debate systems, online surveys, etc.)
• Twitter, an SNS with over 45 million monthly active users in Japan, who are potential participants
• Improper user accounts (e.g. official accounts, bots) appear among the user accounts collected by certain keywords
Collagree
• Web-based debate system
• Also used by the local government of Nagoya for opinion collection
• We aim to develop a participant invitation agent
Procedure of invitation agent
Keyword list extraction (or preparation in advance)
→ Gathering and filtering the initial user account set
→ More specific classification of user groups
→ Participant invitation
Definition of improper user accounts
Official user: specific terms in the user's onscreen name or description
• (e.g. kousiki akkaunto, i.e. 公式アカウント "official account", or a company name)
Inactive user: retweets only campaign content, usually has no description, and has an onscreen name consisting of a random combination of characters
Robot user: specific terms in the user's onscreen name or description; the description and tweet content contain ads or promotions
• (e.g. bot)
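The slides apply these definitions when labeling accounts by hand. Purely as an illustration, the rules could be approximated by a keyword heuristic like the sketch below. Everything in it is hypothetical and is not the authors' method: the term lists (beyond the "公式" and "bot" examples the slides give), the regular expression used as a proxy for a "random characters" name, and the function `flag_account`.

```python
import re
from typing import Optional

# Hypothetical term lists; only 公式 ("official") and "bot" come from the slides.
OFFICIAL_TERMS = ("公式", "official")
ROBOT_TERMS = ("bot",)
PROMO_TERMS = ("キャンペーン", "プレゼント", "広告")  # campaign / giveaway / ad

# Hypothetical proxy for "random characters": 8+ alphanumerics with 3+ digits.
RANDOM_NAME = re.compile(r"^(?=(?:.*\d){3})[A-Za-z0-9_]{8,}$")


def flag_account(name: str, description: str, tweet: str,
                 retweets_only: bool) -> Optional[str]:
    """Return a coarse improper-account category, or None if no rule fires."""
    profile = f"{name} {description}".lower()
    if any(term in profile for term in OFFICIAL_TERMS):
        return "official"
    promo_text = (description + tweet).lower()
    if any(term in profile for term in ROBOT_TERMS) or any(
            term in promo_text for term in PROMO_TERMS):
        return "robot"
    # Inactive: retweets only, (usually) no description, random-looking name.
    if retweets_only and not description.strip() and RANDOM_NAME.match(name):
        return "inactive"
    return None
```

Keyword rules like these are brittle (an individual may mention a giveaway), which is one reason a learned classifier over the text is attractive.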
Approach
• Collecting data with the Twitter Search API and Streaming API based on keywords or hashtags
  • To keep the data balanced (ratio of improper to individual accounts)
• MeCab for tokenization and TF-IDF for vectorization before constructing feature vectors
• Two ways to generate feature vectors: the Mixed process and the Separated process
  • Mixed: processing the tweet contents and the user information (name and description) as one document
  • Separated: processing the two parts as two documents in two different corpora
• Using an RBF-kernel SVM as the learning model
  • Performs well in binary classification tasks
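One way to state the difference between the two processes precisely, assuming plain tf × log(N/df) weighting (the slides do not say which TF-IDF variant or library was used). Here u_i is the tokenized user information (onscreen name and description) of sample i, s_i its tokenized tweet content, ⊕ is document concatenation, ∥ is vector concatenation, D_all = {u_j ⊕ s_j}, D_info = {u_j}, D_text = {s_j}, and each V is the vocabulary of the corresponding corpus.

```latex
\begin{aligned}
\mathrm{tfidf}(t, d, D) &= \mathrm{tf}(t, d)\,\log\frac{|D|}{\mathrm{df}(t, D)} \\
\mathbf{x}^{\mathrm{mixed}}_i &= \bigl[\mathrm{tfidf}(t,\, u_i \oplus s_i,\, D_{\mathrm{all}})\bigr]_{t \in V_{\mathrm{all}}} \\
\mathbf{x}^{\mathrm{sep}}_i &= \bigl[\mathrm{tfidf}(t,\, u_i,\, D_{\mathrm{info}})\bigr]_{t \in V_{\mathrm{info}}}
  \,\Vert\, \bigl[\mathrm{tfidf}(t,\, s_i,\, D_{\mathrm{text}})\bigr]_{t \in V_{\mathrm{text}}}
\end{aligned}
```

Because the Separated process computes document frequencies per corpus, a term that is common in tweets but rare in profiles keeps a non-zero weight on the profile side.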
Related work
A Machine Learning Approach to Twitter User Classification (Marco Pennacchiotti 2011)
• Proposed a general model for user profiling and ran a deep analysis of the linguistic content of tweets
• Designed feature vectors from (1) user information, (2) tweet contents, (3) tweet behavior and (4) user relationships
  • We dealt with (1) and (2) in this research.
• Did not consider the description to be good-quality information
  • 48% of English users have no bio in their description
  • Over 50% of avatars were irrelevant to their classification task
• Targeted only English Twitter users
  • Usage habits differ between English and Japanese users
[Figure: feature vector construction. Each tweet's data is split into user information (onscreen name & description) and tweet contents, which are tokenized and vectorized so that each element is the tf-idf of one term. Separated: an information vector and a text vector are combined into one feature vector. Mixed: both parts are vectorized together into one feature vector.]
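A minimal pure-Python sketch of the two processes, assuming tf × log(N/df) weighting and token lists already produced by a tokenizer such as MeCab. The function names and the toy tokens are hypothetical; a real system would more likely use a library vectorizer with sparse output.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Doc = List[str]  # one tokenized document (e.g. MeCab surface forms)


def fit_idf(docs: Sequence[Doc]) -> Dict[str, float]:
    """IDF over one corpus: log(N / df(t)), df = number of docs containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(doc: Doc, idf: Dict[str, float]) -> List[float]:
    """Dense tf-idf vector over the corpus vocabulary (sorted for a stable order)."""
    tf = Counter(doc)
    return [tf[term] * idf[term] for term in sorted(idf)]


def mixed_vectors(infos: Sequence[Doc], texts: Sequence[Doc]) -> List[List[float]]:
    """Mixed: user information + tweet content form ONE document in ONE corpus."""
    docs = [u + s for u, s in zip(infos, texts)]
    idf = fit_idf(docs)
    return [tfidf(d, idf) for d in docs]


def separated_vectors(infos: Sequence[Doc], texts: Sequence[Doc]) -> List[List[float]]:
    """Separated: two corpora with their own IDF; the two vectors are concatenated."""
    idf_info, idf_text = fit_idf(infos), fit_idf(texts)
    return [tfidf(u, idf_info) + tfidf(s, idf_text) for u, s in zip(infos, texts)]


# Hypothetical toy data: [name + description tokens] and [tweet tokens] per account
infos = [["子育て", "bot"], ["ママ", "二児"]]
texts = [["キャンペーン", "子育て"], ["子育て", "大変"]]
mixed = mixed_vectors(infos, texts)          # 6 dimensions per account
separated = separated_vectors(infos, texts)  # 4 + 3 = 7 dimensions per account
```

In the toy data, 子育て occurs in both mixed documents, so its weight is 0 in the Mixed vectors, whereas in the Separated vectors it still carries weight in the first account's profile part. This is one concrete way the Separated process can expose more features of the data.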
Training data
• We assumed a particular topic: "child care"
• First, tweets were collected with the Streaming API based on a keyword list (子育て "child rearing", 育児 "childcare", 待機児童 "children on nursery waiting lists", 育休 "childcare leave", ホームスタート "Home-Start", マタニティ "maternity", 出産 "childbirth", 子どもの貧困 "child poverty", シングルマザー "single mother", 産後 "postpartum", 保育 "nursery care")
  • 269 tweets collected: 210 improper accounts, 59 individual accounts
• Second, tweets were collected with the Twitter Search API based on a hashtag list (#あたしおかあさんだけど "I'm a mom, but", #あたしおかあさんだから "because I'm a mom", #ぼくおとうさんだから "because I'm a dad", #おまえおとうさんなのに "even though you're a dad", #おまえおとうさんだろ "you're a dad, aren't you") obtained from Twitter trends
  • 400 tweets collected: 37 improper accounts, 363 individual accounts
  • Fortunately, we found hashtags suitable for collecting tweets by individual users
• The data consisted of 669 tweet texts with user information
  • 422 accounts are individuals and 247 are improper accounts
[Figure: class ratios. Keyword-based data: Improper 78%, Individual 22%. Hashtag-based data: Improper 9%, Individual 91%.]
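A quick arithmetic check of the reported counts: the per-batch shares reproduce the 78%/22% and 9%/91% splits, and the class totals come to 247 improper and 422 individual accounts, which sum to 669. The dictionary keys and function names are illustrative only.

```python
from typing import Dict

# Per-batch counts exactly as reported on the slide.
BATCHES: Dict[str, Dict[str, int]] = {
    "keywords (Streaming API)": {"improper": 210, "individual": 59},
    "hashtags (Search API)": {"improper": 37, "individual": 363},
}


def shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Percentage of each class within one batch, rounded to whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


def class_totals(batches: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the per-class counts over all batches."""
    totals: Dict[str, int] = {}
    for counts in batches.values():
        for label, n in counts.items():
            totals[label] = totals.get(label, 0) + n
    return totals


for name, counts in BATCHES.items():
    print(name, sum(counts.values()), shares(counts))
totals = class_totals(BATCHES)
print("all", sum(totals.values()), totals)
```

The keyword batch prints 269 tweets with 78%/22%, the hashtag batch 400 tweets with 9%/91%, and the combined line 669 tweets with 247 improper and 422 individual.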
Main idea: binary classification based on the contents of user information and tweets
[Figure: example of user account groups: improper users and individual users]
SVM settings
The experiment ran on 5 different hyperparameter settings using an RBF-kernel SVM
C: the cost parameter
• Trades off misclassification of training samples against the complexity of the decision surface, together with gamma
Gamma: the RBF kernel coefficient
         Default                         Setting1   Setting2   Setting3   Setting4
C        1                               2×10^-5    2×10^15    2×10^-5    2×10^15
Gamma    1/n (n: number of dimensions)   2×10^-15   2×10^-15   2×10^3     2×10^3
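To see what the gamma values in the table do, recall that the RBF kernel is K(x, y) = exp(−γ‖x − y‖²). The sketch below evaluates it for the tiny gamma of Setting1/2, the 1/n default, and the large gamma of Setting3/4. The two vectors are hypothetical, and the slides do not say which SVM library was used.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def default_gamma(n_dims: int) -> float:
    """The 'Default' column of the table: gamma = 1/n for n-dimensional features."""
    return 1.0 / n_dims


# Two hypothetical tf-idf vectors whose squared distance is 0.5.
u = [0.5, 0.5, 0.0]
v = [0.5, 0.0, 0.5]
for gamma in (2e-15, default_gamma(len(u)), 2e3):
    print(f"gamma={gamma:g}: K(u, v)={rbf_kernel(u, v, gamma):.6g}")
```

With γ = 2×10^-15 almost any two vectors look identical to the kernel (K ≈ 1), which yields a very smooth decision surface. With γ = 2×10^3 any two distinct vectors look unrelated (K ≈ 0), so the model can do little more than memorize training points.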
Results of experiments
[Figure: results of each setting for the Mixed and Separated processes. Annotations: Separated, 4 pt higher in Setting2; Mixed, 2 pt higher in Setting2; Separated, 1 pt higher in Setting2.]
Evaluation
o All settings perform well on recall
  o Attributed to the imbalance of the data
o Setting2 gave the best balance between precision and recall
  o The large C and small gamma provide more support vectors to deal with the similarity of the data
o Manual labeling influenced the results
  o Some labels were mistaken
o Both the Mixed and Separated processes perform well
  o The Separated process captures more features of the data
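The slides do not state which class was treated as positive when computing recall. As a hedged illustration of how imbalanced data can flatter recall, the sketch below scores a degenerate classifier that always predicts the majority class on hypothetical 9:1 labels, roughly the individual/improper split of the hashtag batch.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str) -> Tuple[float, float]:
    """Precision and recall of one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical 9:1 labels and a model that always predicts the majority class.
y_true = ["individual"] * 9 + ["improper"]
y_pred = ["individual"] * 10
print(precision_recall(y_true, y_pred, "individual"))  # (0.9, 1.0)
print(precision_recall(y_true, y_pred, "improper"))    # (0.0, 0.0)
```

Perfect recall on the majority class says nothing about the minority class, so reporting per-class precision and recall (or a balanced measure) makes comparisons between settings more informative.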
Conclusions
The contents of user information and tweets can be essential factors in the filtering task
• Still not sufficient when dealing with much more data
Some keywords or hashtags appearing in Twitter trends may help collect individual accounts
• Improper accounts take time to respond to a trend
The model is expected to be less reliable when dealing with very large data
• The feature vector for each user is simple, considering only one tweet per user
Future work
Propose a method to find hashtags or keywords that yield mostly individual accounts
• Helps collect training data
• Helps infer some features of improper accounts
Include tweet behavior and user relationships [Marco Pennacchiotti 2011] in the feature vector design
Consider deep learning if much more training data becomes available
Link with an existing platform (e.g. Collagree)
• Test the system in practice

The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 

Filtering out improper user accounts from Twitter user accounts for discovering individuals interested in a certain topic

  • 1. Filtering out improper user accounts from Twitter user accounts for discovering individuals interested in a certain topic. Chao CAI, Shun SHIRAMATSU, Dept. of Computer Science, Graduate School of Engineering, Nagoya Institute of Technology
  • 2. Background • Continuously growing demand for participant scouting in online opinion collection (web-based debate systems, online surveys, etc.) • Twitter, an SNS with over 45 million monthly active users in Japan, whose users are potential participants • Improper user accounts (e.g., official accounts, bots) appear among the accounts collected with certain keywords
  • 3. Collagree • A web-based debate system • Also used by the local government of Nagoya for opinion collection • We aim to develop a participant invitation agent for it
  • 4. Procedure of the invitation agent (in order) • Keyword list extraction (or preparation in advance) • Gathering and filtering the initial user account set • More specific classification of the user group • Participant invitation
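The four-step procedure can be pictured as a small pipeline. The sketch below is a hypothetical skeleton, not the authors' agent: `Account`, `gather` and `invitation_agent` are illustrative names, the keyword search stands in for the Twitter API, and the two classifiers are passed in as plain predicates.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Account:
    screen_name: str
    description: str
    tweet: str


def gather(accounts: Iterable[Account], keywords: List[str]) -> List[Account]:
    """Keep accounts whose tweet contains any keyword (stand-in for an API search)."""
    return [a for a in accounts if any(k in a.tweet for k in keywords)]


def invitation_agent(
    accounts: Iterable[Account],
    keywords: List[str],
    is_improper: Callable[[Account], bool],
    can_attend: Callable[[Account], bool],
) -> List[str]:
    """Run the steps in order and return the screen names to invite."""
    candidates = gather(accounts, keywords)                      # gather the initial set
    individuals = [a for a in candidates if not is_improper(a)]  # filter improper accounts
    invitees = [a for a in individuals if can_attend(a)]         # finer classification
    return [a.screen_name for a in invitees]                     # accounts to invite
```

This research corresponds to the `is_improper` step; the keyword list (step 1) is simply an input here.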
  • 5. Definition of improper user accounts • Official user: specific terms in the screen name or description (e.g., "kōshiki akaunto" ("official account") or a company name) • Inactive user: only retweets campaign content, usually has no description, and has a screen name made of a random combination of characters • Robot user: specific terms in the screen name or description (e.g., "bot"), with a description and tweets containing ads or promotions
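To make the three definitions concrete, here is a rule-based sketch that flags each type. It is only an illustration of the definitions, not the method used in this work (which trains an SVM); the term lists, the "RT @" retweet check and the random-name threshold are all hypothetical assumptions.

```python
import re
from typing import List, Optional

# Hypothetical term lists; the slide only names 公式アカウント ("official account"),
# company names and "bot" as examples.
OFFICIAL_TERMS = ("公式", "official")
BOT_TERMS = ("bot", "ボット")
AD_TERMS = ("広告", "宣伝", "プロモーション")      # ad / advertising / promotion
CAMPAIGN_TERMS = ("キャンペーン", "プレゼント")    # campaign / giveaway


def looks_random(screen_name: str) -> bool:
    """Crude guess at a random-character name: >= 10 ASCII word chars with >= 4 digits."""
    return (
        re.fullmatch(r"[A-Za-z0-9_]{10,}", screen_name) is not None
        and sum(ch.isdigit() for ch in screen_name) >= 4
    )


def improper_type(screen_name: str, description: str, tweets: List[str]) -> Optional[str]:
    """Return "official", "robot", "inactive", or None for a likely individual."""
    profile = f"{screen_name} {description}".lower()
    if any(term in profile for term in OFFICIAL_TERMS):
        return "official"
    if any(term in profile for term in BOT_TERMS) or any(ad in description for ad in AD_TERMS):
        return "robot"
    only_campaign_retweets = bool(tweets) and all(
        t.startswith("RT @") and any(c in t for c in CAMPAIGN_TERMS) for t in tweets
    )
    if not description.strip() and looks_random(screen_name) and only_campaign_retweets:
        return "inactive"
    return None
```

Hand-written rules like these are brittle, which is one motivation for learning the distinction from labeled data instead.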
  • 6. Approach • Collect data with the Twitter search API and streaming API based on keywords or hashtags, keeping the data balanced (ratio of improper to individual accounts) • MeCab for tokenization and TF-IDF for vectorization before constructing feature vectors • Two ways to generate the feature vector: the mixed process and the separated process • Mixed: tweet contents and user information (name and description) are processed as one document • Separated: the two parts are processed as two documents in two different corpora • An rbf-kernel SVM as the learning model, since SVMs perform well in binary classification tasks
  • 7. Related work: A Machine Learning Approach to Twitter User Classification (Marco Pennacchiotti, 2011) • Proposed a general model for user profiling and ran a deep analysis of the linguistic content of tweets • Designed feature vectors from (1) user information, (2) tweet contents, (3) tweet behavior and (4) user relationships • We dealt with (1) and (2) in this research • They did not consider the description to be good-quality information: • 48% of English users have no bio in their description • Over 50% of avatars were irrelevant to their classification task • They targeted only English Twitter users • Usage habits differ between English and Japanese users
  • 8. [Figure: feature vector design. Each tweet record is split into user information (screen name & description) and tweet contents, then tokenized and vectorized so that each dimension holds the tf-idf of one term. Mixed: both parts as one document give one feature vector. Separated: an information vector and a text vector are combined into one feature vector.]
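The difference between the two processes can be sketched in plain Python. This assumes the texts are already tokenized (with mecab-python3, something like `MeCab.Tagger("-Owakati").parse(text).split()` would typically produce such token lists) and uses one common tf-idf variant, tf = count / length and idf = log(N / df); the exact weighting used in this work is not stated, so treat it as an assumption.

```python
import math
from collections import Counter
from typing import List, Tuple

Tokens = List[str]


def tfidf(docs: List[Tokens]) -> Tuple[List[str], List[List[float]]]:
    """Vocabulary plus one tf-idf vector (one dimension per term) per document."""
    vocab = sorted({t for d in docs for t in d})
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append(
            [counts[t] / len(d) * math.log(n_docs / df[t]) if d else 0.0 for t in vocab]
        )
    return vocab, vectors


def mixed_features(infos: List[Tokens], tweets: List[Tokens]) -> List[List[float]]:
    """Mixed: user info and tweet form one document in a single corpus."""
    _, vectors = tfidf([info + tweet for info, tweet in zip(infos, tweets)])
    return vectors


def separated_features(infos: List[Tokens], tweets: List[Tokens]) -> List[List[float]]:
    """Separated: two corpora give two vectors, concatenated per user."""
    _, info_vecs = tfidf(infos)
    _, text_vecs = tfidf(tweets)
    return [iv + tv for iv, tv in zip(info_vecs, text_vecs)]


infos = [["子育て", "公式"], ["ママ", "子育て"]]
tweets = [["子育て", "イベント"], ["保育園", "見学"]]
print(len(mixed_features(infos, tweets)[0]), len(separated_features(infos, tweets)[0]))
```

With this toy data the mixed vector has 6 dimensions and the separated one has 7, because 子育て in the profile and 子育て in the tweet become two distinct features, one way the separated process can "provide more features of the data".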
  • 9. Training data • We assumed a particular topic: "child care" • First collection: streaming API with a keyword list (子育て child rearing, 育児 childcare, 待機児童 children waiting for daycare places, 育休 childcare leave, ホームスタート Home-Start, マタニティ maternity, 出産 childbirth, 子どもの貧困 child poverty, シングルマザー single mother, 産後 postpartum, 保育 nursery care) • 269 tweets collected: 210 improper accounts, 59 individual accounts (improper 78%, individual 22%) • Second collection: Twitter search API with a hashtag list obtained from Twitter trends, variations on "because I'm a mother/father" (#あたしおかあさんだけど, #あたしおかあさんだから, #ぼくおとうさんだから, #おまえおとうさんなのに, #おまえおとうさんだろ) • 400 tweets collected: 37 improper accounts, 363 individual accounts (improper 9%, individual 91%) • We were fortunate to find hashtags suited to collecting tweets by individual users • The data consisted of 669 tweet texts with user information • 422 accounts are individuals and 247 are improper accounts
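The class counts and percentages above can be cross-checked with a few lines of arithmetic. The batch labels in the sketch are descriptive names chosen here, not identifiers from the original work.

```python
from typing import Dict

# Counts reported on the training-data slide.
BATCHES: Dict[str, Dict[str, int]] = {
    "streaming API (keywords)": {"improper": 210, "individual": 59},
    "search API (hashtags)": {"improper": 37, "individual": 363},
}


def percentages(counts: Dict[str, int]) -> Dict[str, int]:
    """Share of each class, rounded to a whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


def combine(batches: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Sum the class counts over all collection batches."""
    totals: Dict[str, int] = {}
    for counts in batches.values():
        for label, n in counts.items():
            totals[label] = totals.get(label, 0) + n
    return totals


for name, counts in BATCHES.items():
    print(name, sum(counts.values()), percentages(counts))
totals = combine(BATCHES)
print("combined", sum(totals.values()), totals, percentages(totals))
```

The individual accounts sum to 59 + 363 = 422, and 422 + 247 = 669 matches the total number of tweets. Combining the two batches also shows how the hashtag batch offsets the improper-heavy keyword batch.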
  • 10. Main idea: binary classification based on the contents of user information and tweets [Figure: example accounts from the improper user and individual user groups]
  • 11. SVM settings • The experiment ran an rbf-kernel SVM under 5 different hyperparameter settings • C, the cost parameter, trades off misclassification of training samples against the complexity of the prediction surface, together with gamma
            Default                          Setting1    Setting2    Setting3    Setting4
    C       1                                2×10^-5     2×10^15     2×10^-5     2×10^15
    Gamma   1/n (n: number of dimensions)    2×10^-15    2×10^-15    2×10^3      2×10^3
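A short sketch of the RBF kernel, k(x, z) = exp(-gamma·‖x − z‖²), shows what the gamma values in the table do. The (C, gamma) pairs are copied from the slide as written; the two-dimensional vectors are made up for illustration. If scikit-learn were used, these would map to the `C` and `gamma` arguments of `sklearn.svm.SVC(kernel="rbf")`, where `gamma="auto"` means 1/n_features.

```python
import math
from typing import Dict, Sequence, Tuple

# (C, gamma) pairs as listed on the slide; the default setting uses gamma = 1/n.
SETTINGS: Dict[str, Tuple[float, float]] = {
    "Setting1": (2e-5, 2e-15),
    "Setting2": (2e15, 2e-15),
    "Setting3": (2e-5, 2e3),
    "Setting4": (2e15, 2e3),
}


def default_gamma(n_dims: int) -> float:
    """Gamma of the default setting: 1 / number of feature dimensions."""
    return 1.0 / n_dims


def rbf(x: Sequence[float], z: Sequence[float], gamma: float) -> float:
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


x, z = [1.0, 0.0], [0.0, 1.0]  # two toy feature vectors
for name, (c, gamma) in SETTINGS.items():
    print(f"{name}: C={c:g}, gamma={gamma:g}, k(x, z)={rbf(x, z, gamma):.6f}")
```

With gamma = 2×10^-15 the kernel value is essentially 1 for any pair, so all samples look alike to the kernel; with gamma = 2×10^3 it underflows to 0, so every sample looks unrelated to every other. C then controls how strongly misclassified training samples are penalized.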
  • 12. Results of experiments [chart not recoverable] • Recall: Separated 4 pt higher in Setting2 • Precision: Mixed 2 pt higher in Setting2 • F-measure: Separated 1 pt higher in Setting2
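Precision, recall and F-measure follow directly from confusion counts. The sketch below also illustrates one hedged reading of why recall can look good on unbalanced data: assuming, for illustration only, that "individual" is the positive class, a degenerate classifier that labels all 669 accounts as individuals reaches perfect recall while its precision is capped at the class share. Which class was treated as positive in the experiments is not stated.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from confusion counts for one positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical degenerate classifier: every account labeled "individual"
# (422 truly individual, 247 improper, none missed).
p, r, f = precision_recall_f1(tp=422, fp=247, fn=0)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Here recall is exactly 1.0 while precision equals 422/669, which is why a setting is only convincing when precision holds up as well.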
  • 13. Evaluation • All settings performed well on recall, which we attribute to the unbalanced data • Setting2 gave the best balance of precision and recall, which we attribute to the large C and small gamma providing more support vectors to deal with the similarity of the data • Manual labeling influenced the result: some labels were mistaken • The mixed and separated processes both performed well • The separated process provides more features of the data
  • 14. Conclusions • The contents of user information and tweets can be essential factors in the filtering task • They are still not enough when dealing with much more data • Some keywords or hashtags appearing in Twitter trends may help collect individual accounts • Improper accounts may need time to respond to a trend • The model is expected to lack reliability when dealing with enormous amounts of data • The feature vector for each user is simple, considering only one tweet per user
  • 15. Future work • Propose a method to find hashtags or keywords that yield mostly individual accounts, to help collect training data and to infer some features of improper accounts • Include tweet behavior and user relationships [Marco Pennacchiotti 2011] in the feature vector design • Consider deep learning if much more training data becomes available • Link with an existing platform (e.g., Collagree) • Experiment with the system in practice

Speaker notes

  1. Thank you all for coming. First, let me introduce myself: my name is xxxxx, from the department of xxxxxx. Today I want to talk about our research, titled xxxxxx --------------
  2. Let's begin with the background of our research. Since we live in an IT society, there is a growing need for xxxxx. To find latent participants, we considered social network services such as Twitter. Twitter, as an SNS, has xxxx, whom we want to invite to these events. But when we collected user data based on a keyword list, many improper users appeared, such as xxxx, whom we want to remove.
  3. As mentioned, there are many web-based debate systems; in this research we concentrated on Collagree. Collagree aims at consensus building and is also used by the Nagoya city government to collect residents' opinions. To make better use of this platform, we think it would be great to invite more people from different locations and diverse backgrounds to offer new ideas. So we plan to develop a participant invitation agent for this system.
  4. Here is the procedure of the whole agent. First, the agent receives a keyword list, which can be prepared by a human or extracted from the introduction of the debate topic. Then the agent collects the user set based on the list and filters out improper users. Before actually sending invitations, the agent performs a more specific classification of the user group to find out which users can really attend the debate. Then the agent sends invitations to those users. This research focuses on the second part, filtering out improper users. So which kinds of accounts are improper?
  5. Here is how we defined improper accounts. There are three kinds. Official accounts are used by companies or public facilities; they usually have specific terms in their screen name or description. The second is the inactive user, who only retweets campaign content to win a gift; such users usually have no description and a random combination of characters in their screen name. The last is the robot user, who is also likely to have specific terms such as "bot" in the description or screen name, and whose tweets often contain ads or promotional information. Our goal is to filter out these kinds of accounts.
  6. Here is the approach of this research. To begin with, we collect data with the Twitter search API and streaming API based on keywords or hashtags, keeping positive and negative samples balanced. Since the tweets are mainly written in Japanese, we use MeCab for tokenization and TF-IDF for vectorization. We propose two ways to generate the feature vector, the mixed process and the separated process, which I will demonstrate later. For the mixed process, we xxxxx. For the separated process, we xxxx. We chose the rbf-kernel SVM as the learning model since SVMs perform well in binary classification.
  7. There is some related work. One study was done by Marco Pennacchiotti in 2011. Xxxxxx They proposed a general model for user profiling and ran a deep analysis of the linguistic content of tweets. They designed their feature vector with four parts, xxxxxx, and we dealt with xxxxx in our research. However, they did not consider xxxxx, since there are xxxxxxx and over 50% xxxx. Also, their research focused on English Twitter users, but there are many differences between English and Japanese users, such as usage habits and language. As I mentioned, we used user information and tweet contents for the feature vector design. Here I'd like to walk you through the design.
  8. First, we obtained the initial data, which consists of -----------. For the mixed process, we process these two parts as one document to generate one vector, and each dimension is filled with the TF-IDF value of one term; this is the feature vector of the mixed process. For the separated process, we process the two parts separately to generate two TF-IDF vectors, where again each dimension holds the TF-IDF value of one term, and then we combine the two vectors into one. This is the feature vector of the separated process. Then we tried this approach in practice.
  9. Here is the training data for the experiment. We first assumed a debate topic in Collagree: child care. Then we collected data twice. The first time, we used the streaming API based on the keyword list you can see here; among the 269 users, 78% were improper. The second time, we used the Twitter search API based on hashtags we happened to find in Twitter trends. The hashtags relate to a song about child care. In this search, 91% of all 400 tweets were posted by individual users. So the whole dataset consisted of 669 tweet contents with user information.
  10. Here are samples from each group in the data.
  11. To do that, we ran the SVM under 5 different hyperparameter settings. Xxxxx with different combinations of C and gamma. C, by the way, is the cost parameter, which trades off misclassification against the complexity of the prediction surface, together with gamma. --------
  12. Let's look at the results of the experiments. You can see that Setting2 gave the best performance on F-measure, and under Setting2 the separated process is one point higher than the mixed process. On recall, the separated process is 4 points higher, but on precision the mixed process is 2 points higher. Although all settings performed well on recall, only Setting2 performed well on precision.
  13. We consider the imbalance and lack of data to be the reason all settings performed well on recall. As for Setting2 giving the best-balanced performance, we think that is because the large C and small gamma provide more support vectors to deal with the similarity of the data. Since we labeled the data ourselves, the result could be affected by human labeling mistakes. And although the mixed and separated processes both performed well, we think the separated process can capture more features of the data.
  14. Here are the conclusions. We consider that the contents of user information and tweets can be important factors in the filtering task, but they are still not enough for much larger data. Some keyword lists or hashtags in Twitter trends can probably help collect individual accounts, since we think improper accounts may need time to react to those trends. However, the model is expected to be unreliable when dealing with a large amount of data, since the feature vectors are similar to each other and we consider only one tweet per user.
  15. For future work, we would like to find a way to detect hashtags or keywords that help us find more individual accounts, to help collect training data; this may also reveal some features of improper users. We plan to include tweet behavior and user relationships in the feature vector design, and we are considering introducing deep learning if we can get enough data. Finally, we want to connect our filter system to an existing platform, in this case Collagree, to evaluate our system in practice.