Data-driven Studies on Social Networks:
Privacy and Simulation
Sameera Horawalavithana
Ph.D. Candidate,
Department of Computer Science and Eng.,
University of South Florida
sameera1@usf.edu
Outline
● Privacy in Social Networks
● Social Simulations
● The Design of the Multi-platform Cascades (MCAS) Social Simulator
○ Scenario #1: Endogenous Signals
■ Dataset
■ Evaluation
○ Scenario #2: Exogenous Signals
■ Dataset
■ Evaluation
● Lessons Learnt
● Future Work
Privacy in Social Networks
● Data breaches happen regularly where adversaries use sophisticated techniques
(i.e., de-anonymization) to defeat data protection (i.e., anonymization)
mechanisms.
● A main research challenge is to develop a principled understanding of how to
measure the effectiveness of an anonymization scheme and thus, conversely, the
likely success of a de-anonymization attack.
● We introduce and experiment with a framework that identifies the relationships
between graph vulnerability and graph properties (Horawalavithana et al. 2019).
○ We show that protecting graph privacy is harder than previously considered
○ For example, our results show that preserving other network properties independent of the degree
distribution can reveal node identity.
● We quantitatively study the impact of binary node attributes on node privacy using
this framework (Horawalavithana et al. 2018).
○ Our experiments show that the population’s diversity on the binary attribute consistently degrades
anonymity
Social Simulations
● Why do we need to develop accurate simulation techniques for online media
information?
○ Helpful for intervention techniques, disaster response, fraud detection, censorship removal, picking
up signals/trends as they relate to current events, etc.
[Figures: Organic Discussions on Reddit; Venezuelan Political Crisis]
Social Simulations
● A reliable simulator can realistically respond to
internal and external stimuli and adapt to different
platforms, datasets, scenarios, each with different
characteristics.
● Our objective is to forecast finer-granular social
media activity without relying on the ground truth
in the testing period.
● Simulation results should match real-world
data. Accuracy is measured by a set of
meaningful metrics that capture both macro-level
and micro-level simulation information.
Social Simulations
● Our Approach: We combine social theories with machine learning
methodologies for predicting information dissemination within and across
social online environments.
● Datasets: The majority of the datasets used in this work were collected by
Leidos, the official data provider in the DARPA SocialSim program
● Metrics: We used the evaluation code that was developed by Pacific
Northwest National Laboratory.
Multi-platform Cascades Social Simulator (MCAS)
● Design: Given a history of per-topic social media events and relevant exogenous
events, predict the number of information cascades and the size and growth of
cascades in the future.
● Three main design components:
○ Topic Module annotates messages with topics. This module was implemented by one of our
collaborators. They manually annotated an initial subset of messages with a predefined list of topics,
and trained a multilingual BERT model to classify each message with one or multiple such sub-topics.
○ Seed Module includes ML models that specialize predictions to particular macro-level sub-problems
(e.g., daily # cascades)
○ Cascade Module includes a probabilistic generative model to predict the micro-level events
information (e.g., who did what to whom) in the form of cascades.
Multi-platform Cascades Social Simulator (MCAS)
● We present two scenarios that motivate the design of the social simulators.
○ We use the endogenous features as extracted from in-platform discussions to predict the
growth of conversations on Reddit (Scenario #1).
○ We use both endogenous (e.g., in-platform discussions related to topics) and exogenous
(e.g., news articles) features to predict Twitter activity (Scenario #2).
Scenario #1: Endogenous Signals
● Given a set of "seeds" (e.g., original posts on a social platform, such as posts on
Reddit) in a continuous interval of time on a platform, can one predict the
information cascade trees (who responds to whom when) rooted in these seeds?
○ Can discussion threads be predicted using only post features (e.g., author who posts the
initial message, timing, textual content of the post)?
Scenario #1: Endogenous Signals
Conversation Pool Generation Algorithm
1. Generate N pools of conversations
probabilistically
a. Conversation Structure: We use the branching process to
generate the conversation structure
b. User: Users are assigned to conversation nodes following
the preferential attachment principle.
c. Timing: We use a distribution of message propagation
delays to estimate the timing
2. Test the goodness of generated conversation pools
using two trained classification models
3. Reconstruct the pool of conversations with the
feedback from the classification models
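Step 1 of the algorithm can be sketched as follows. This is a minimal illustration, not the simulator's implementation: the geometric offspring rule (`reply_prob`), the exponential delay distribution (`mean_delay`), the node cap (`max_nodes`), and all function names are assumptions made for the example; the slides only state that a branching process, preferential attachment, and a delay distribution are used.

```python
import random


def generate_conversation(seed_user, users, reply_prob=0.45,
                          mean_delay=600.0, max_nodes=200, rng=None):
    """Generate one conversation tree as a list of node dicts (index == id)."""
    rng = rng or random.Random()
    nodes = [{"id": 0, "parent": None, "user": seed_user, "time": 0.0}]
    # Preferential-attachment weights: every user starts at 1 and gains
    # weight each time they post, so active users attract more replies.
    weights = {u: 1 for u in users}
    weights[seed_user] = weights.get(seed_user, 1) + 1
    frontier = [0]
    while frontier and len(nodes) < max_nodes:
        parent = nodes[frontier.pop(0)]
        # Branching process (assumed geometric): keep adding replies
        # to this node while a biased coin comes up heads.
        while rng.random() < reply_prob and len(nodes) < max_nodes:
            candidates = list(weights)
            user = rng.choices(
                candidates, weights=[weights[u] for u in candidates])[0]
            weights[user] += 1
            # Timing: reply time = parent time + sampled propagation delay.
            delay = rng.expovariate(1.0 / mean_delay)
            child = {"id": len(nodes), "parent": parent["id"],
                     "user": user, "time": parent["time"] + delay}
            nodes.append(child)
            frontier.append(child["id"])
    return nodes


def generate_pools(seeds, users, n_pools, rng=None):
    """Generate N pools; each pool holds one conversation per seed post."""
    rng = rng or random.Random()
    return [[generate_conversation(s, users, rng=rng) for s in seeds]
            for _ in range(n_pools)]
```

In practice the offspring and delay distributions would be fitted to the training data rather than fixed by hand.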
[Diagram: Generate N Conversation Pools → Goodness Test → Reconstruct the Best Conversation Pool]
Scenario #1: Endogenous Signals
● Test the goodness of generated
conversation pools using two trained
classification models
○ We use the classification models to
assess how realistic a generated
conversation is, given its attached user and
timing information.
○ We use two individual-level
properties—branching factor and
propagation delay—of conversation
nodes as the target units for the
prediction tasks.
○ We represent conversation information
in a data structure (as shown in Fig.
5.2) where each conversation node is
described by structural, user and
content features (Table 5.4).
Scenario #1: Endogenous Signals
● Goodness score of a conversation
○ We use the Area Under Curve (AUC) of
two branch vectors and two delay
vectors to calculate the goodness score
of a conversation.
○ Each conversation receives a goodness
score as the mean of two AUC scores
from the two models.
● This goodness score is used to
identify the best conversation
during the simulation.
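A small sketch of the goodness score, under a loudly stated assumption: each of the two classifiers (branching factor and propagation delay) is treated as a binary task, so that every conversation yields a pair of vectors per model (binary labels and classifier scores). The slides do not spell out how the vectors are built; the function names and the binarization are hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def conversation_goodness(branch_labels, branch_scores,
                          delay_labels, delay_scores):
    """Goodness = mean of the AUC scores from the two models."""
    return 0.5 * (auc(branch_labels, branch_scores)
                  + auc(delay_labels, delay_scores))
```

A score near 1 means both classifiers find the generated node-level properties consistent with what they learned from real conversations; a score near 0.5 means they cannot tell.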
Scenario #1: Endogenous Signals
● Reconstruct the pool of conversations with the feedback from the
classification models
○ The objective is to create a pool of conversations that outperforms any
existing pool of conversations.
○ We treat the pool reconstruction problem as an optimization problem
that we solve using a genetic algorithm.
■ A gene is a conversation represented by the message tree with
assigned user and timing information to nodes.
■ An individual is a pool of conversations.
■ The population is the set of conversation pools.
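The genetic-algorithm framing above can be illustrated with a short sketch. It assumes that all pools are aligned, i.e., position i in every pool holds a conversation for the same seed post, and that the top half of the ranked pools survives each generation; both choices, and the names `evolve` and `uniform_crossover`, are assumptions for the example. The `goodness` argument is any function that scores a single conversation.

```python
import random


def pool_goodness(pool, goodness):
    """Goodness of an individual (pool) = sum of its genes' goodness scores."""
    return sum(goodness(conv) for conv in pool)


def uniform_crossover(pool_a, pool_b, rng):
    """Take each gene (conversation) from either parent with probability 0.5."""
    return [a if rng.random() < 0.5 else b for a, b in zip(pool_a, pool_b)]


def evolve(pools, goodness, generations=20, rng=None):
    """Rank pools, keep the best half, and refill with crossover children."""
    rng = rng or random.Random()
    population = [list(p) for p in pools]
    n_keep = max(2, len(population) // 2)
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda p: pool_goodness(p, goodness),
                        reverse=True)
        parents = ranked[:n_keep]
        children = []
        while len(parents) + len(children) < len(population):
            mother, father = rng.sample(parents, 2)
            children.append(uniform_crossover(mother, father, rng))
        population = parents + children
    return max(population, key=lambda p: pool_goodness(p, goodness))
```

Because the parents are carried over unchanged, the best pool's goodness never decreases from one generation to the next.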
Scenario #1: Endogenous Signals
[Diagram: Rank Pools → New Pool Construction (Uniform Crossover) → Reconstructed Pools]
The goodness of a pool of conversations is the sum of the goodness scores
of the conversations in the pool.
Scenario #1: Dataset
● We used a Reddit dataset covering the discussions in nine crypto currency
and 38 cyber security related subreddits between January 2015 and August
2017 to train and test the simulator.
Measurement          Crypto   Cyber
Number of Posts      0.2M     1.76M
Number of Comments   3.5M     35.3M
Number of Users      0.14M    1.6M
Scenario #1: Overlapping Conversations
● Users respond with comments to
the original post or other users’
comments, repeatedly getting
involved in the same conversation.
● The same user can participate in
multiple related conversation
threads
[Figure: Bitcoin scaling debate discussions in August 2017. There are
57 conversations with 4,418 messages posted by 1,458 users.
218 and 83 users appeared in more than one and more than two
conversations, respectively.]
Scenario #1: Evaluation
● We predict the growth of Reddit conversations over one month (August 1 -
August 31, 2017).
○ We use the posts made between August 1 and August 3, 2017 as input seed posts.
○ There were 3,740 and 3,463 posts in the crypto-currency and cyber-security
domains, respectively.
● We use three baseline models.
○ Recent Replay baseline repeats the most recent n conversations from the training data.
○ Random baseline draws n conversations from the training data at random. We repeat this
process 10 times to minimize the bias of random selection.
○ Lumbreras Model uses the branching process in the generation of conversation
structures (Aragon et al. 2017).
Scenario #1: Evaluation
● Predicting the structure of cascades
○ We report the distribution of the size and structural virality of generated conversations
■ Structural virality is measured by the Wiener index of conversation trees (Goel et al. 2015)
○ We calculate the JS divergence between the distributions of the structural metrics for the
generative models and for the ground truth
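The two structural measurements can be sketched as follows. The Wiener index of a tree is the sum of shortest-path distances over all node pairs; structural virality in the sense of Goel et al. is its average over pairs. The parent-array input format, the base-2 logarithm, and the assumption that both distributions are already binned over the same support are choices made for this example.

```python
import math
from collections import deque


def wiener_index(parents):
    """Sum of shortest-path distances over all unordered node pairs.

    parents[i] is the index of node i's parent (None for the root).
    """
    n = len(parents)
    adj = [[] for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent is not None:
            adj[child].append(parent)
            adj[parent].append(child)
    total = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2  # every pair was counted from both endpoints


def structural_virality(parents):
    """Average pairwise distance between nodes of the conversation tree."""
    n = len(parents)
    if n < 2:
        return 0.0
    return wiener_index(parents) / (n * (n - 1) / 2)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two discrete distributions."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

A star-shaped thread (everyone replies to the post) scores lower on structural virality than a chain of the same size, which is the distinction this metric is meant to capture.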
Scenario #1: Evaluation
● Predicting the temporal growth of conversations
○ We report the growth of the Reddit discussions by the daily number of comments over 1 month.
○ We compare the predicted time series and ground truth time series using Dynamic Time
Warping (DTW) and Root Mean Square Error (RMSE) metrics.
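The two time-series metrics behave differently on mistimed activity, which the sketch below makes concrete. It uses the textbook dynamic-programming form of DTW with absolute difference as the local cost and no warping-window constraint; the evaluation code used in this work may apply different settings.

```python
import math


def rmse(pred, truth):
    """Root mean square error between two equal-length series."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def dtw_distance(a, b):
    """Dynamic Time Warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]
```

For example, a spike predicted one day late (`[0, 5, 0, 0]` vs. `[0, 0, 5, 0]`) has a DTW distance of 0 because warping aligns the spikes, while RMSE penalizes both days.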
[Figure panels: Discussions on crypto-currency subreddits; Discussions on cyber-security subreddits]
Scenario #1: Evaluation
● Predicting the user engagement
○ We compare the number of users engaged in
multiple conversations between simulation and
ground truth (Fig. 5.9)
● Predicting the collective behavior
○ We record user participation in conversations in a
vector $[c_1, c_2, \ldots, c_n]$, where $c_i$ is a binary
value indicating the user's involvement in the $i$-th
conversation.
○ We use the Pearson correlation coefficient to
compare all pairs of binary vectors.
○ We calculate the JS-divergence and RMSE between
the coefficient distributions of the simulation and the
ground truth data (Table 5.9).
○ Lower JS-divergence values reflect collective
behavior closer to that measured from the ground
truth.
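The collective-behavior measurement can be sketched as below: build one binary participation vector per user, then compute the Pearson coefficient for every pair of users. Representing each conversation as a set of user names and skipping pairs with a constant vector (where the coefficient is undefined) are assumptions of this example.

```python
import math
from itertools import combinations


def participation_vectors(conversations):
    """Map each user to a binary vector over conversations (sets of users)."""
    users = sorted(set().union(*conversations))
    return {u: [1 if u in conv else 0 for conv in conversations]
            for u in users}


def pearson(x, y):
    """Pearson correlation; returns None if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


def pairwise_correlations(vectors):
    """Coefficients for all user pairs with a defined correlation."""
    coeffs = []
    for u, v in combinations(sorted(vectors), 2):
        r = pearson(vectors[u], vectors[v])
        if r is not None:
            coeffs.append(r)
    return coeffs
```

The resulting list of coefficients is what gets binned into a distribution and compared between simulation and ground truth.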
Scenario #2: Exogenous Signals
• Can one accurately generate the social media activity on a platform (for
example, Twitter) using the recorded signals from other platforms?
• Is that doable in the context of unexpected events, when social media users both react to
unexpected news in unpredictable ways and also generate news for many news outlets?
Scenario #2: Exogenous Signals
● Seed Module
○ We train multiple neural network models to predict the number of daily tweets per topic.
○ The module variations depend on the exogenous sources and recency of features.
■ Exogenous features are the number of news articles, and the number of Reddit posts
per topic. They are extracted on the “day before” and “day of” predictions.
○ We assign users to the predicted tweets randomly with probability proportional to the user
spread score.
■ The spread score for user u is the product of (i) the fraction of tweets
posted by u that get retweeted and (ii) the total number of retweets that u receives for their
tweets (Alp et al. 2018).
■ Intuitively, the spread score captures the level of influence of a user: the higher the
spread score, the more influential the user is.
● Cascade Module is similar to the solution presented in Scenario #1.
○ This module takes the tweets predicted by the seed module as input.
○ We assign new users to the cascades.
■ We select leaves of the cascades predicted for each topic and assign those users a
completely new and unique identifier.
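The spread-score-based user assignment in the Seed Module can be sketched as follows. The per-user statistics tuple and the uniform fallback when every score is zero are assumptions of this example, not details given in the slides.

```python
import random


def spread_score(n_tweets, n_tweets_retweeted, n_retweets_received):
    """(fraction of the user's tweets that were retweeted) * (total retweets)."""
    if n_tweets == 0:
        return 0.0
    return (n_tweets_retweeted / n_tweets) * n_retweets_received


def assign_users(n_predicted_tweets, user_stats, rng=None):
    """Sample an author for each predicted tweet, weighted by spread score.

    user_stats maps user -> (n_tweets, n_tweets_retweeted, n_retweets_received).
    """
    rng = rng or random.Random()
    users = list(user_stats)
    weights = [spread_score(*user_stats[u]) for u in users]
    if sum(weights) == 0:
        weights = [1.0] * len(users)  # assumed fallback: uniform sampling
    return rng.choices(users, weights=weights, k=n_predicted_tweets)
```

A user with 10 tweets, 5 of which were retweeted 40 times in total, gets a spread score of 0.5 × 40 = 20.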
Scenario #2: Dataset
• Twitter Dataset
• We used a Twitter dataset covering the
Venezuelan Presidential Crisis between
January and February 2019.
• This dataset covers a period of high
political tension which resulted in
nationwide protests, militarized responses,
and incidents of mass violence and arrests.
Number of Tweets ~1M
Number of Retweets ~11.6M
Number of Users ~1.15M
Scenario #2: Dataset
• Exogenous Data Sources
• We collected Reddit discussions from
one of the largest Venezuela-related
subreddits, /r/vzla.
• The news article data was collected
via a publicly available geopolitical
event database, GDELT
Number of Reddit Messages 56K
Number of News Articles 138K
Scenario #2: Evaluation
● We predict Twitter activity over two weeks (February 15 - February 28, 2019).
● We use two baselines:
○ Replay baseline repeats the messages from the last two weeks of training data.
○ Sampling baseline draws full Twitter cascades at random to match the average daily
volume of activity per topic observed in the last two weeks of training data.
● We use three metrics:
○ Time series comparison
■ NRMSE (Normalized Root Mean Squared Error) to capture the temporal pattern
■ SMAPE (Symmetric Mean Absolute Percentage Error) to capture the volume and
temporal pattern
○ Distribution level comparison
■ EMD (Earth Mover's Distance) to compare the PageRank distributions.
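The two time-series metrics can be sketched as below. Several normalization conventions exist for both; this example normalizes RMSE by the range of the ground truth and uses the mean-of-absolute-values denominator for SMAPE, which is an assumption and may differ from the evaluation code used in this work.

```python
import math


def nrmse(pred, truth):
    """RMSE divided by the range of the ground truth (one common convention)."""
    rmse = math.sqrt(
        sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))
    spread = max(truth) - min(truth)
    return rmse / spread if spread else float("nan")


def smape(pred, truth):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    terms = []
    for p, t in zip(pred, truth):
        denom = (abs(p) + abs(t)) / 2
        # Treat a day where both values are zero as a perfect prediction.
        terms.append(abs(p - t) / denom if denom else 0.0)
    return 100.0 * sum(terms) / len(terms)
```

SMAPE is scale-free per day, so it reflects volume errors on quiet days as much as on busy ones, while NRMSE is dominated by the days with the largest absolute errors.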
Scenario #2: Evaluation
● Predicting the daily number of tweets per topic.
○ We predict the big spikes in the number of tweets for most of the popular topics.
○ However, the spikes are mistimed in the models that use features from the day before the predictions
(see dashed lines).
Scenario #2: Evaluation
● Predicting the daily number of
tweets per topic.
○ Multiple variants of our solution capture
the trend of the number of tweets closer
to the ground truth than any baselines for
most of the topics.
○ The models that use the news articles from
the last 24 hours before 8 a.m. perform
better at predicting the trend of tweets
than the models that use the news
articles from the day before the predictions
(see the two light green bars in Fig. a).
○ Using current-day exogenous data leads
to more accurate predictions than using
previous-day exogenous data.
Scenario #2: Evaluation
● Predicting the daily number of tweets and retweets per topic.
○ Retweets are predicted by the cascade module. The temporal pattern of retweets is driven
mostly by the temporal pattern of tweets predicted by the seed module.
Scenario #2: Evaluation
● Predicting the daily number of tweets and
retweets per topic.
○ Similar to the performance of the seed module, the
cascade module also captures the trend of number
of shares closer to the ground truth than any
baselines for most of the topics.
○ Results suggest that the most representative
exogenous source depends on the topic of interest.
■ News articles are more helpful for predicting
topics related to international humanitarian aid
events and violent clashes between the military
and protesters.
■ Reddit discussions are more helpful for predicting
topics related to Maduro's dictatorship.
[Figure: Performance view. #S: number of shares over time; #NU: number of new
user engagements over time; PR: PageRank measurements. Green cells indicate that
the models beat the baselines. Panels: Predicting Twitter topic activity using
Reddit discussions; Predicting Twitter topic activity using News Articles.]
Scenario #2: Evaluation
● Predicting the daily number of new user
engagements per topic.
○ Our models outperform the respective baselines
across all 12 topics with respect to NRMSE and
SMAPE
○ Models using only Reddit features show better
performance than those using only news in arrests
and maduro/narco topics
● Predicting the user interaction network
○ We create a directed retweet network for each topic in
which an edge points from the user who retweeted to
the user who posted the tweet.
○ The PageRank distribution of the predicted user interaction
network is closer to the ground truth than that of the
Sampling baseline for a majority of topics.
○ The network structures predicted by the Replay
baseline model are hard to beat in this network
measurement.
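The network measurement can be illustrated with a small power-iteration PageRank over the retweet graph. The damping factor of 0.85, the uniform redistribution of rank from users with no outgoing edges, and the function name are conventional choices assumed for this sketch; they are not taken from the slides.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """PageRank by power iteration over directed (source, target) edges.

    Each edge points from the user who retweeted to the user who posted.
    """
    nodes = sorted({n for edge in edges for n in edge})
    if not nodes:
        return {}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    size = len(nodes)
    rank = {n: 1.0 / size for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping) / size + damping * dangling / size
               for n in nodes}
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
        converged = sum(abs(new[n] - rank[n]) for n in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

With this edge direction, heavily retweeted authors accumulate rank. The per-topic rank values from the simulated and real networks can then be compared as distributions.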
[Figure: Performance view. #S: number of shares over time; #NU: number of new
user engagements over time; DEG: degree measurements; PR: PageRank measurements.
Green cells indicate that the models beat the baselines. Panels: Predicting
Twitter topic activity using Reddit discussions; Predicting Twitter topic
activity using News Articles.]
Lessons Learnt
• Recency matters
• To predict the social media activity (i.e., the volume of messages and the user interaction
network) in the immediate future, the immediate past is more useful than the delayed past.
• This would also make the baselines very competitive as they re-generate the recent past.
• Recency and Locality matter
• To predict activity within a particular topic, the recent activity within the same topic matters.
• This observation may be biased by the design of the topic assignment model (e.g., manual
annotation process, the distribution of topics, topic co-occurrence, etc.)
• Recency introduces small-ish data, but ML models need big-ish data?
• The number of data points available for training depends on the time granularity of the
predictions. For example, one can generate more data points at hourly (or finer)
granularity than at daily or weekly granularity.
• We increase the number of data points available for training by splitting the data based on
the topic. For example, given N topics and M days, we can create N × M
data points. This also increases the variation in the training data, which helps
ML models learn the activity of multiple topics.
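The topic-splitting idea can be sketched as follows: every (topic, day) pair becomes one training example. The feature choice (previous-day tweet count and previous-day news count) and the dictionary layout are hypothetical; note that using lagged features yields N × (M − lag) examples rather than the full N × M.

```python
def build_training_points(daily_counts, exogenous, lag=1):
    """Turn per-topic daily series into (topic, day, features, target) rows.

    daily_counts[topic][day] -> number of tweets on that day
    exogenous[topic][day]    -> number of news articles on that day
    """
    points = []
    for topic in sorted(daily_counts):
        series = daily_counts[topic]
        for day in range(lag, len(series)):
            # Features come from `lag` days earlier; target is today's volume.
            features = [series[day - lag], exogenous[topic][day - lag]]
            points.append((topic, day, features, series[day]))
    return points
```

With 2 topics and 4 days of data, this produces 6 examples instead of the 3 that a single aggregated series would give.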
Lessons Learnt
• Exogenous features matter
• There are many potential exogenous data sources that capture real-world events, but
selecting the most representative exogenous features to predict topic activity matters.
• “Big” spikes are hard to predict
• We tested our simulators on special cases (e.g., political crisis, influence campaigns) which
include big spikes due to external events.
• Exogenous features on the “day of” and “day before” predictions had a big impact on
predicting spikes more accurately.
• Long vs. short time horizon predictions
• The overall volume of activity can be predicted in the long time horizon with the help of
exogenous features, but predicting the temporal pattern is hard due to compounding errors in
the simulation.
• Hard to predict the structure of the user interaction network
• We found the baselines are hard to beat in the network structural measurements.
• As they regenerate the past, they capture the patterns of user interactions more accurately.
Future Work
• Reducing the error accumulated over different modules in the pipeline design
• Any error in predicting the volume of discussions cannot be corrected later in the current
pipeline design. Accurately identifying which module degrades the overall prediction is important
for making improvements.
• Testing the generalizability of modules across various other simulation
scenarios, and datasets.
• E.g., influence operations, disinformation campaigns, private group discussions, etc.
• Explaining the performance of simulators
• What characteristics of the data determine the models’ performance?
• During our performance analysis, we have seen the simulator performing differently on
different topics. This could be partly due to the influence of external events on the activity of
particular topics, or partly due to the regular patterns observed in the data.
Main Publications
● Horawalavithana, S., Ng, K., Iamnitchi, A., Predicting Twitter Topic Activity during
Political Crisis using Exogenous Data (Under Review)
● Horawalavithana, S., Choudhury, N., Iamnitchi, A., Online Discussion Threads as
Cascade Pools: Predicting the Growth of Discussion Threads on Reddit (Under
Review)
● Horawalavithana, S., Ng, K., Iamnitchi, A., Drivers of Polarized Discussions on Twitter
during Venezuela Political Crisis, The 13th International ACM Conference on Web
Science (WebSci), 2021.
● Horawalavithana, S., Silva, R., Nabeel, M., Elvitigala, C., Wijesekara, P., and Iamnitchi,
A., Malicious and Low Credibility URLs on Twitter during the AstraZeneca
COVID-19 Vaccine Development, International Conference on Social Computing,
Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and
Simulation (SBP-BRiMS), DC, USA, 2021
Main Publications (Contd.)
● Horawalavithana, S., Ng, K., Iamnitchi, A., Twitter is the Megaphone of
Cross-Platform Messaging on the White Helmets, International Conference on Social
Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in
Modeling and Simulation, DC, USA, 2020
● Horawalavithana, S., Bhattacharjee, A., Liu, R., Choudhury, N., Hall, L. O., & Iamnitchi,
A., Mentions of Security Vulnerabilities in Reddit, Twitter and GitHub,
IEEE/WIC/ACM International Conference on Web Intelligence, Greece, October, 2019
● Horawalavithana, S., Flores, J. G. A., Skvoretz, J., & Iamnitchi, A., Behind the Mask:
Understanding the Structural Forces that Make Social Graphs Vulnerable to
De-anonymization. IEEE Transactions on Computational Social Systems (TCSS), 2019
● Horawalavithana, S., Flores, J. A., Skvoretz, J., & Iamnitchi, A., The Risk of Node
Re-identification in Labeled Social Graphs, Applied Network Science (2019)
Other Publications
● Ng, K., Horawalavithana, S., & Iamnitchi, A., Multi-platform Information Operations:
Twitter, Facebook and YouTube against the White Helmets, The Workshop Proceedings
of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2021.
● Liu, R., Mubang, F., Hall, L. O., Horawalavithana, S., Iamnitchi, A., & Skvoretz, J. (2019,
October). Predicting longitudinal user activity at fine time granularity in online
collaborative platforms. In 2019 IEEE International Conference on Systems, Man and
Cybernetics (SMC) (pp. 2535-2542). IEEE.
● Alhazmi, E., Horawalavithana, S., Skvoretz, J., Blackburn, J., & Iamnitchi, A. (2017, July). An
empirical study on team formation in online games. In Proceedings of the 2017
IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
(ASONAM) 2017 (pp. 431-438).
● Alhazmi, E., Choudhury, N., Horawalavithana, S., & Iamnitchi, A. (2019). Temporal mobility
networks in online gaming. Frontiers in Big Data, 2, 21.
References
● Aragón, P., Gómez, V., García, D., and Kaltenbrunner, A. Generative models
of online discussion threads: state of the art and research challenges. Journal
of Internet Services and Applications, 8(1):15, 2017.
● Alp, Z., and Öğüdücü, S. Identifying topical influencers on Twitter based on
user behavior and network topology. Knowledge-Based Systems,
141:211–221, 2018.
● Goel, S., Anderson, A., Hofman, J., and Watts, D. The structural virality of
online diffusion. Management Science, 62(1):180–196, 2015.
Acknowledgments
● Funded by DARPA SocialSim Program
● Data provided by Leidos. (Thanks Kin for Reddit data)
● Evaluation code was developed by Pacific Northwest National Laboratory

Contenu connexe

Tendances

Link prediction with the linkpred tool
Link prediction with the linkpred toolLink prediction with the linkpred tool
Link prediction with the linkpred toolRaf Guns
 
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 TutorialPrivacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 TutorialKun Liu
 
News construction from microblogging post using open data
News construction from microblogging post using open dataNews construction from microblogging post using open data
News construction from microblogging post using open dataFrancisco Berrizbeitia
 
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...tksakaki
 
Link Prediction in (Partially) Aligned Heterogeneous Social Networks
Link Prediction in (Partially) Aligned Heterogeneous Social NetworksLink Prediction in (Partially) Aligned Heterogeneous Social Networks
Link Prediction in (Partially) Aligned Heterogeneous Social NetworksSina Sajadmanesh
 
A Priori Relevance Based On Quality and Diversity Of Social Signals
A Priori Relevance Based On Quality and Diversity Of Social SignalsA Priori Relevance Based On Quality and Diversity Of Social Signals
A Priori Relevance Based On Quality and Diversity Of Social SignalsIsmail BADACHE
 
Tweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity RecognitionTweet Segmentation and Its Application to Named Entity Recognition
Tweet Segmentation and Its Application to Named Entity Recognition1crore projects
 
Final Poster for Engineering Showcase
Final Poster for Engineering ShowcaseFinal Poster for Engineering Showcase
Final Poster for Engineering ShowcaseTucker Truesdale
 
IRJET- Fake News Detection and Rumour Source Identification
IRJET- Fake News Detection and Rumour Source IdentificationIRJET- Fake News Detection and Rumour Source Identification
IRJET- Fake News Detection and Rumour Source IdentificationIRJET Journal
 
Emotional Social Signals for Search Ranking
Emotional Social Signals for Search RankingEmotional Social Signals for Search Ranking
Emotional Social Signals for Search RankingIsmail BADACHE
 
Earthquake shakes twitter users real-time event detection by social sensors
Earthquake shakes twitter users  real-time event detection by social sensorsEarthquake shakes twitter users  real-time event detection by social sensors
Earthquake shakes twitter users real-time event detection by social sensorsMike Mayer
 
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatramanOdsc 2018 detection_classification_of_fake_news_using_cnn_venkatraman
Odsc 2018 detection_classification_of_fake_news_using_cnn_venkatramanvenkatramanJ4
 
Earthquake shakes twitter users
Earthquake shakes twitter usersEarthquake shakes twitter users
Earthquake shakes twitter usersEshan Mudwel
 
Detecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSSDetecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSSIRJET Journal
 

Tendances (18)

Link prediction with the linkpred tool
Link prediction with the linkpred toolLink prediction with the linkpred tool
Link prediction with the linkpred tool
 
Link prediction
Link predictionLink prediction
Link prediction
 
Yuntech present
Yuntech presentYuntech present
Yuntech present
 
NDU Present
NDU PresentNDU Present
NDU Present
 
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 TutorialPrivacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
Privacy-Aware Data Management in Information Networks - SIGMOD 2011 Tutorial
 
News construction from microblogging post using open data
News construction from microblogging post using open dataNews construction from microblogging post using open data
News construction from microblogging post using open data
 
WWW2010_Earthquake Shakes Twitter User: Analyzing Tweets for Real-Time Event...
Data-driven Studies on Social Networks: Privacy and Simulation

  • 1. Data-driven Studies on Social Networks: Privacy and Simulation. Sameera Horawalavithana, Ph.D. Candidate, Department of Computer Science and Eng., University of South Florida, sameera1@usf.edu
  • 2. Outline ● Privacy in Social Networks ● Social Simulations ● The Design of the Multi-platform Cascades (MCAS) Social Simulator ○ Scenario #1: Endogenous Signals ■ Dataset ■ Evaluation ○ Scenario #2: Exogenous Signals ■ Dataset ■ Evaluation ● Lessons Learnt ● Future Work 2
  • 3. Privacy in Social Networks ● Data breaches happen regularly: adversaries use sophisticated techniques (i.e., de-anonymization) to defeat data protection mechanisms (i.e., anonymization). ● A main research challenge is to develop a principled understanding of how to measure the effectiveness of an anonymization scheme and thus, conversely, the likely success of a de-anonymization attack. ● We introduce and experiment with a framework that identifies the relationships between graph vulnerability and graph properties (Horawalavithana et al. 2019). ○ We show that protecting graph privacy is harder than previously considered. ○ For example, our results show that preserving other network properties independent of the degree distribution can reveal node identity. ● We quantitatively study the impact of binary node attributes on node privacy using this framework (Horawalavithana et al. 2018). ○ Our experiments show that the population’s diversity on the binary attribute consistently degrades anonymity.
  • 5. Social Simulations ● Why do we need to develop accurate simulation techniques for online media information? ○ They are helpful for intervention techniques, disaster response, fraud detection, censorship removal, picking up signals/trends as they relate to current events, etc. [Figure: Organic discussions on Reddit; Venezuelan political crisis]
  • 6. Social Simulations ● A reliable simulator responds realistically to internal and external stimuli and adapts to different platforms, datasets, and scenarios, each with different characteristics. ● Our objective is to forecast fine-grained social media activity without relying on the ground truth in the testing period. ● Simulation results should match real-world data. Accuracy is measured by a set of meaningful metrics that capture both macro-level and micro-level simulation information.
  • 7. Social Simulations ● Our Approach: We combine social theories with machine learning methodologies to predict information dissemination within and across online social environments. ● Datasets: The majority of the datasets used in this work were collected by Leidos, the official data provider in the DARPA SocialSim program. ● Metrics: We used the evaluation code developed by Pacific Northwest National Laboratory.
  • 9. Multi-platform Cascades Social Simulator (MCAS) ● Design: Given a history of per-topic social media events and relevant exogenous events, predict the number of information cascades and the size and growth of cascades in the future. ● Three main design components: ○ Topic Module annotates messages with topics. This module was implemented by one of our collaborators. They manually annotated an initial subset of messages with a predefined list of topics, and trained a multilingual BERT model to classify each message with one or multiple such sub-topics. ○ Seed Module includes ML models that specialize predictions to particular macro-level sub-problems (e.g., the daily number of cascades). ○ Cascade Module includes a probabilistic generative model to predict the micro-level event information (e.g., who did what to whom) in the form of cascades.
  • 10. Multi-platform Cascades Social Simulator (MCAS) ● We present two scenarios that motivate the design of the social simulators. ○ We use the endogenous features as extracted from in-platform discussions to predict the growth of conversations on Reddit (Scenario #1). ○ We use both endogenous (e.g., in-platform discussions related to topics) and exogenous (e.g., news articles) features to predict Twitter activity (Scenario #2). 10
  • 12. Scenario #1: Endogenous Signals ● Given a set of "seeds" (e.g., original posts on a social platform, such as posts on Reddit) in a continuous interval of time on a platform, can one predict the information cascade trees (who responds to whom when) rooted in these seeds? ○ Can discussion threads be predicted using only post features (e.g., author who posts the initial message, timing, textual content of the post)? 12
  • 13. Scenario #1: Endogenous Signals
    Conversation Pool Generation Algorithm
    1. Generate N pools of conversations probabilistically.
       a. Conversation Structure: We use the branching process to generate the conversation structure.
       b. User: Users are assigned to conversation nodes following the preferential attachment principle.
       c. Timing: We use a distribution of message propagation delays to estimate the timing.
    2. Test the goodness of generated conversation pools using two trained classification models.
    3. Reconstruct the pool of conversations with the feedback from the classification models.
    [Figure: three-stage pipeline: generate N conversation pools, goodness test, reconstruct the best conversation pool]
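Step 1 of the algorithm above can be sketched roughly as follows. This is a minimal illustration, not the MCAS implementation: the function names, the offspring distribution `offspring_weights`, the exponential delay distribution, and the activity-based form of preferential attachment are all assumptions made for the example; the slides do not specify them.

```python
import random


def generate_conversation(n_users, offspring_weights=(0.5, 0.3, 0.15, 0.05),
                          mean_delay=600.0, max_nodes=50, seed=0):
    """Generate one conversation tree as a list of node dicts (hypothetical layout)."""
    rng = random.Random(seed)
    # Every user starts with weight 1 so that inactive users can still be picked.
    activity = [1] * n_users
    root_user = rng.randrange(n_users)
    activity[root_user] += 1
    nodes = [{"id": 0, "parent": None, "user": root_user, "time": 0.0}]
    frontier = [0]
    while frontier and len(nodes) < max_nodes:
        parent = frontier.pop(0)
        # Branching process: draw the number of replies to this message.
        n_children = rng.choices(range(len(offspring_weights)),
                                 weights=offspring_weights)[0]
        for _ in range(n_children):
            if len(nodes) >= max_nodes:
                break
            # Preferential attachment (assumed form): more active users
            # are more likely to author the next message.
            user = rng.choices(range(n_users), weights=activity)[0]
            activity[user] += 1
            # Timing: reply time is the parent time plus a sampled delay
            # (exponential here, purely as a stand-in distribution).
            delay = rng.expovariate(1.0 / mean_delay)
            node = {"id": len(nodes), "parent": parent, "user": user,
                    "time": nodes[parent]["time"] + delay}
            nodes.append(node)
            frontier.append(node["id"])
    return nodes


def generate_pools(n_pools, pool_size, n_users):
    """Generate N pools, each a list of independently sampled conversations."""
    return [[generate_conversation(n_users, seed=p * pool_size + c)
             for c in range(pool_size)]
            for p in range(n_pools)]
```

In a data-driven setting, the offspring and delay distributions would presumably be estimated from the training conversations rather than fixed by hand as they are here.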
  • 14. Scenario #1: Endogenous Signals ● Test the goodness of generated conversation pools using two trained classification models ○ We use the classification models to assess how realistic the generated conversation is, given the attached user and timing information. ○ We use two individual-level properties of conversation nodes (branching factor and propagation delay) as the target units for the prediction tasks. ○ We represent conversation information in a data structure (as shown in Fig. 5.2) where each conversation node is described by structural, user and content features (Table 5.4).
  • 15. Scenario #1: Endogenous Signals ● Goodness score of a conversation ○ We use the Area Under Curve (AUC) of two branch vectors and two delay vectors to calculate the goodness score of a conversation. ○ Each conversation receives a goodness score equal to the mean of the two AUC scores from the two models. ● This goodness score is used to identify the best conversation during the simulation.
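The goodness score can be illustrated with a small sketch. It assumes that each pair of vectors consists of binary labels taken from the generated conversation and the corresponding scores predicted by a classifier; this pairing, and the function names, are assumptions for illustration, since the slide does not spell out how the vectors are built. The AUC is computed with the pairwise-ranking definition to keep the example dependency-free.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A positive ranked above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def conversation_goodness(branch_labels, branch_scores,
                          delay_labels, delay_scores):
    """Mean of the branching-model AUC and the delay-model AUC."""
    return 0.5 * (auc(branch_labels, branch_scores)
                  + auc(delay_labels, delay_scores))


# Example with made-up vectors for one conversation.
score = conversation_goodness([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8],
                              [0, 1], [0.2, 0.9])
```

A score near 1 would mean both models find the generated branching and timing consistent with their predictions; a score near 0.5 would be no better than chance.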
  • 16. Scenario #1: Endogenous Signals ● Reconstruct the pool of conversations with the feedback from the classification models ○ The objective is to create a pool of conversations that outperforms every existing pool of conversations. ○ We treat pool reconstruction as an optimization problem that we solve using a genetic algorithm. ■ A gene is a conversation, represented by the message tree with user and timing information assigned to its nodes. ■ An individual is a pool of conversations. ■ The population is the set of conversation pools.
  • 17. Scenario #1: Endogenous Signals ● The goodness of a pool of conversations is the sum of the goodness scores of the conversations in the pool. [Figure: genetic algorithm loop over pools of conversations: Rank Pools, Uniform Crossover, New Pool Construction, Reconstructed Pools]
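The rank, crossover, and reconstruct loop can be sketched as follows. It is an illustration under stated assumptions: all pools have the same length, position i in every pool holds a conversation for the same seed post, the top half of the ranked pools survives each generation, and `score` is any per-conversation goodness function. None of these details are given in the slides.

```python
import random

def pool_goodness(pool, score):
    """Goodness of a pool: sum of the goodness scores of its conversations."""
    return sum(score(conv) for conv in pool)

def uniform_crossover(pool_a, pool_b, rng):
    """Build a child pool by taking each gene from either parent with equal probability."""
    return [a if rng.random() < 0.5 else b for a, b in zip(pool_a, pool_b)]

def reconstruct(pools, score, generations, rng):
    """Evolve the population of pools and return the best pool found."""
    population = list(pools)
    for _ in range(generations):
        # Rank pools from best to worst.
        population.sort(key=lambda p: pool_goodness(p, score), reverse=True)
        # Keep the top half as parents (assumed selection rule).
        parents = population[: max(2, len(population) // 2)]
        n_children = len(population) - len(parents)
        children = [
            uniform_crossover(*rng.sample(parents, 2), rng) for _ in range(n_children)
        ]
        population = parents + children
    return max(population, key=lambda p: pool_goodness(p, score))
```

Because the parents are carried over unchanged, the best pool's goodness never decreases from one generation to the next.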
  • 19. Scenario #1: Dataset ● We used a Reddit dataset covering the discussions in nine crypto-currency and 38 cyber-security related subreddits between January 2015 and August 2017 to train and test the simulator. ● Dataset statistics (Crypto / Cyber): Number of Posts: 0.2M / 1.76M; Number of Comments: 3.5M / 35.3M; Number of Users: 0.14M / 1.6M.
  • 20. Scenario #1: Overlapping Conversations ● Users respond with comments to the original post or to other users’ comments, repeatedly getting involved in the same conversation. ● The same user can participate in multiple related conversation threads. [Figure: Bitcoin scaling debate discussions in August 2017. There are 57 conversations with 4,418 messages posted by 1,458 users; 218 users appeared in more than one conversation and 83 users in more than two.]
  • 22. Scenario #1: Evaluation ● We predict the growth of Reddit conversations over one month (August 1 - August 31, 2017). ○ We use the posts made between August 1 and August 3, 2017 as input seed posts. ○ There were 3,740 and 3,463 seed posts in the crypto-currency and cyber-security domains, respectively. ● We use three baseline models. ○ The Recent Replay baseline repeats the most recent n conversations from the training data. ○ The Random baseline draws n conversations from the training data at random. We repeat this process 10 times to reduce the bias of random selection. ○ The Lumbreras model uses a branching process to generate conversation structures (Aragón et al. 2017).
  • 23. Scenario #1: Evaluation ● Predicting the structure of cascades ○ We report the distributions of the size and structural virality of the generated conversations. ■ Structural virality is measured by the Wiener index of the conversation trees (Goel et al. 2015). ○ We calculate the Jensen-Shannon (JS) divergence between the distributions of the structural metrics produced by the generative models and those of the ground truth.
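Both structural measurements can be sketched in a few lines. The example assumes a conversation tree is given as a parent-index list and that the two distributions are already binned into aligned probability vectors; structural virality is computed as the average shortest-path distance over all node pairs (the normalized Wiener index), and the JS divergence uses base-2 logarithms so it lies in [0, 1]. The function names are hypothetical.

```python
from collections import deque
from math import log2

def structural_virality(parents):
    """Average pairwise distance in a tree; parents[i] is the parent of node i (None for the root)."""
    n = len(parents)
    if n < 2:
        return 0.0
    adj = [[] for _ in range(n)]
    for child, parent in enumerate(parents):
        if parent is not None:
            adj[child].append(parent)
            adj[parent].append(child)
    total = 0
    for src in range(n):
        # Breadth-first search gives shortest-path distances from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    # total counts every ordered pair once, so divide by n * (n - 1).
    return total / (n * (n - 1))

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two aligned probability vectors."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2
```

A star-shaped thread (all replies to the original post) scores lower than a chain of the same size, which is the intuition behind using this metric to separate broadcast-like and viral conversations.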
  • 24. Scenario #1: Evaluation ● Predicting the temporal growth of conversations ○ We report the growth of the Reddit discussions as the daily number of comments over one month. ○ We compare the predicted and ground-truth time series using Dynamic Time Warping (DTW) and Root Mean Square Error (RMSE). [Figure: two panels: Discussions on crypto-currency subreddits; Discussions on cyber-security subreddits]
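The two time-series metrics behave differently when a spike is predicted on the wrong day, which a small sketch makes concrete. This is the textbook dynamic-programming form of DTW with absolute difference as the local cost and no window constraint; the evaluation code used in the study may differ.

```python
from math import inf, sqrt

def rmse(pred, truth):
    """Root mean square error between two equally long series."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))

def dtw(a, b):
    """Dynamic Time Warping distance with absolute difference as the local cost."""
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

For a predicted series [0, 0, 5, 0] against a ground truth [0, 5, 0, 0], DTW is 0 because the spike can be aligned by warping, while RMSE penalizes the one-day shift. Reporting both separates errors in shape from errors in timing.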
  • 25. Scenario #1: Evaluation ● Predicting the user engagement ○ We compare the number of users engaged in multiple conversations between the simulation and the ground truth (Fig. 5.9). ● Predicting the collective behavior ○ We record user participation in conversations in a vector [c1, c2, ..., cn], where ci is a binary value indicating the user's involvement in the i-th conversation. ○ We use the Pearson correlation coefficient to compare all pairs of binary vectors. ○ We calculate the JS divergence and RMSE between the coefficient distributions of the simulation and of the ground truth data (Table 5.9). ○ Lower JS divergence values reflect collective behavior closer to that measured from the ground truth.
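The collective-behavior measurement can be sketched as below. The sketch assumes participation is stored as a dictionary from user to binary vector and skips pairs where one vector is constant, since Pearson correlation is undefined there; how the original evaluation handles that case is not stated. Function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long vectors; None if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / sqrt(vx * vy)

def pairwise_coefficients(participation):
    """Correlation coefficients for all user pairs with a defined correlation."""
    coeffs = []
    for u, v in combinations(sorted(participation), 2):
        r = pearson(participation[u], participation[v])
        if r is not None:
            coeffs.append(r)
    return coeffs
```

The resulting list of coefficients is what gets histogrammed for the simulation and for the ground truth before the two distributions are compared.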
  • 27. Scenario #2: Exogenous Signals • Can one accurately generate the social media activity on a platform (for example, Twitter) using the recorded signals from other platforms? • Is this feasible in the context of unexpected events, when social media users both react to unexpected news in unpredictable ways and also generate news for many news outlets?
  • 28. Scenario #2: Exogenous Signals ● Seed Module ○ We train multiple neural network models to predict the number of daily tweets per topic. ○ The module variations depend on the exogenous sources and the recency of the features. ■ Exogenous features are the number of news articles and the number of Reddit posts per topic. They are extracted on the “day before” and the “day of” the predictions. ○ We assign users to the predicted tweets randomly, with probability proportional to the user spread score. ■ The spread score for user u is the product of the fraction of tweets posted by u that get retweeted and the total number of retweets that u receives for their tweets (Alp et al. 2018). ■ Intuitively, the spread score captures the level of influence of a user: the higher the spread score, the more influential the user is. ● The Cascade Module is similar to the solution presented in Scenario #1. ○ This module takes the tweets predicted by the seed module as input. ○ We assign new users to the cascades. ■ We select the leaves of the cascades predicted for each topic and assign the users at those leaves completely new and unique identifiers.
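The spread score and the proportional user assignment in the seed module can be sketched as follows. The function and parameter names are hypothetical, and treating users with no tweets as having a score of zero is an assumption.

```python
import random

def spread_score(tweets_posted, tweets_retweeted, total_retweets):
    """(fraction of the user's tweets that were retweeted) * (total retweets received)."""
    if tweets_posted == 0:
        return 0.0  # assumption: users with no tweets have no measurable spread
    return (tweets_retweeted / tweets_posted) * total_retweets

def assign_users(n_tweets, scores, rng):
    """Draw an author for each predicted tweet with probability proportional to spread score.

    Note: random.Random.choices raises ValueError if all weights are zero.
    """
    users = sorted(scores)
    weights = [scores[u] for u in users]
    return rng.choices(users, weights=weights, k=n_tweets)
```

For example, a user who posted 10 tweets, 5 of which were retweeted for a total of 40 retweets, gets a spread score of 0.5 * 40 = 20.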
  • 30. Scenario #2: Dataset • Twitter Dataset • We used a Twitter dataset covering the Venezuelan Presidential Crisis between January and February 2019. • This dataset covers a period of high political tension which resulted in nationwide protests, militarized responses, and incidents of mass violence and arrests. • Dataset statistics: Number of Tweets: ~1M; Number of Retweets: ~11.6M; Number of Users: ~1.15M.
  • 31. Scenario #2: Dataset • Exogenous Data Sources • We collected Reddit discussions from one of the largest Venezuela-related subreddits, /r/vzla. • The news article data was collected via a publicly available geopolitical event database, GDELT. • Dataset statistics: Number of Reddit Messages: 56K; Number of News Articles: 138K.
  • 33. Scenario #2: Evaluation ● We predict Twitter activity over two weeks (February 15 - February 28, 2019). ● We use two baselines. ○ The Replay baseline repeats the messages from the last two weeks of training data. ○ The Sampling baseline draws full Twitter cascades at random to match the average daily volume of activity per topic observed in the last two weeks of training data. ● We use three metrics. ○ Time series comparison ■ NRMSE (Normalized Root Mean Squared Error) to capture the temporal pattern ■ SMAPE (Symmetric Mean Absolute Percentage Error) to capture the volume and the temporal pattern ○ Distribution-level comparison ■ EM (Earth Mover's Distance) to compare the PageRank distributions.
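The two time-series metrics have several definitions in the literature, and the slides do not say which one the evaluation code uses. The sketch below assumes NRMSE is normalized by the range of the ground truth and SMAPE is expressed as a percentage between 0 and 200, with a term of 0 when both values are zero.

```python
from math import sqrt

def nrmse(pred, truth):
    """RMSE divided by the range of the ground truth (assumed normalization)."""
    rmse = sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))
    spread = max(truth) - min(truth)
    if spread == 0:
        raise ValueError("ground truth is constant; range normalization is undefined")
    return rmse / spread

def smape(pred, truth):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    terms = []
    for p, t in zip(pred, truth):
        denom = (abs(p) + abs(t)) / 2
        terms.append(0.0 if denom == 0 else abs(p - t) / denom)
    return 100 * sum(terms) / len(terms)
```

Because SMAPE is relative to the size of each daily value, it reacts to volume errors on quiet days as strongly as on busy days, whereas NRMSE is dominated by the days with the largest absolute errors.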
  • 34. Scenario #2: Evaluation ● Predicting the daily number of tweets per topic. ○ Our models predict the big spikes in the number of tweets for most of the popular topics. ○ However, the spikes are mistimed in the models that use features from the day before the predictions (see dashed lines).
  • 35. Scenario #2: Evaluation ● Predicting the daily number of tweets per topic. ○ Multiple variants of our solution capture the trend in the number of tweets more closely than any baseline for most of the topics. ○ The models that use the news articles from the last 24 hours before 8 a.m. predict the trend of tweets better than the models that use the news articles from the day before the predictions (see the two light green bars in Fig. a). ○ Using current-day exogenous data leads to more accurate predictions than using previous-day exogenous data.
  • 36. Scenario #2: Evaluation ● Predicting the daily number of tweets and retweets per topic. ○ Retweets are predicted by the cascade module. The temporal pattern of retweets is driven mostly by the temporal pattern of tweets predicted by the seed module.
  • 37. Scenario #2: Evaluation ● Predicting the daily number of tweets and retweets per topic. ○ Similar to the seed module, the cascade module captures the trend in the number of shares more closely than any baseline for most of the topics. ○ Results suggest that the most representative exogenous source depends on the topic of interest. ■ News articles are more helpful for predicting the topics related to the international humanitarian aid event and to violent clashes between the military and protesters. ■ Reddit discussions are more helpful for predicting topics related to Maduro’s dictatorship. [Table: performance view. #S: number of shares over time; #NU: number of new user engagements over time; PR: PageRank measurements. Green cells indicate that our models beat the baselines. Panels: Predicting Twitter topic activity using Reddit discussions; Predicting Twitter topic activity using News Articles]
  • 38. Scenario #2: Evaluation ● Predicting the daily number of new user engagements per topic. ○ Our models outperform the respective baselines across all 12 topics with respect to NRMSE and SMAPE. ○ Models using only Reddit features perform better than those using only news features on the arrests and maduro/narco topics. ● Predicting the user interaction network ○ We create a directed retweet network for each topic in which an edge points from the user who retweeted to the user who posted the tweet. ○ The PageRank distribution of the predicted user interaction network is closer to the ground truth than that of the Sampling baseline for a majority of topics. ○ The network structures predicted by the Replay baseline are hard to beat in this network measurement. [Table: performance view. #S: number of shares over time; #NU: number of new user engagements over time; DEG: degree measurements; PR: PageRank measurements. Green cells indicate that our models beat the baselines. Panels: Predicting Twitter topic activity using Reddit discussions; Predicting Twitter topic activity using News Articles]
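The network measurement can be illustrated with a plain power-iteration PageRank over the directed retweet edges described above (retweeter to original author). This is a self-contained sketch and not the evaluation code used in the study; the damping factor of 0.85, the fixed iteration count, and the uniform redistribution of rank from users with no outgoing edges are conventional assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; edges are (retweeter, author) pairs."""
    nodes = sorted({u for edge in edges for u in edge})
    out = {u: [] for u in nodes}
    for src, dst in edges:
        out[src].append(dst)  # repeated retweets count as repeated edges
    n = len(nodes)
    rank = {u: 1 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by users with no outgoing edges is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        rank = new
    return rank
```

With edges pointing toward the original author, users whose tweets are widely retweeted accumulate the highest rank, so the PageRank distribution summarizes how concentrated influence is within a topic.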
  • 40. Lessons Learnt • Recency matters • To predict the social media activity (i.e., the volume of messages and the user interaction network) in the immediate future, the immediate past is more useful than the more distant past. • This also makes the baselines very competitive, as they regenerate the recent past. • Recency and locality matter • To predict activity within a particular topic, the recent activity within the same topic matters. • This observation may be biased by the design of the topic assignment model (e.g., the manual annotation process, the distribution of topics, topic co-occurrence, etc.). • Recency introduces small-ish data, but ML models need big-ish data? • The number of data points available for training depends on the time granularity of the predictions. For example, one can generate more data points at hourly (or finer) granularity than at daily or weekly granularity. • We increase the number of data points available for training by splitting the data by topic. For example, given N topics and M days, we can create N x M data points. This also increases the variation in the training data, which helps ML models learn the activity of multiple topics.
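The N x M splitting idea can be made concrete with a small sketch. The input layout (a dictionary from topic to a list of daily tweet counts) and the row fields are assumptions made for illustration.

```python
def build_training_rows(daily_counts):
    """Turn {topic: [count_day0, count_day1, ...]} into one training row per (topic, day)."""
    rows = []
    for topic in sorted(daily_counts):
        for day, count in enumerate(daily_counts[topic]):
            rows.append({"topic": topic, "day": day, "tweets": count})
    return rows
```

With N topics observed over M days this yields N x M rows instead of the M rows available when all topics are aggregated into a single daily series.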
  • 41. Lessons Learnt • Exogenous features matter • There are many potential exogenous data sources that capture real-world events, but selecting the most representative exogenous features to predict topic activity matters. • “Big” spikes are hard to predict • We tested our simulators on special cases (e.g., political crises, influence campaigns) which include big spikes due to external events. • Exogenous features from the “day of” and the “day before” the predictions had a big impact on predicting spikes more accurately. • Long vs. short time horizon predictions • The overall volume of activity can be predicted over a long time horizon with the help of exogenous features, but predicting the temporal pattern is hard due to compounding errors in the simulation. • It is hard to predict the structure of the user interaction network • We found that the baselines are hard to beat in the network structural measurements. • Because they regenerate the past, they capture the patterns of user interactions more accurately.
  • 43. Future Work • Reducing the error accumulated over the different modules in the pipeline design • An error in predicting the volume of discussions cannot be corrected later in the current pipeline design. Accurately identifying which module degrades the overall prediction is important for making improvements. • Testing the generalizability of the modules across other simulation scenarios and datasets • E.g., influence operations, disinformation campaigns, private group discussions, etc. • Explaining the performance of simulators • What characteristics of the data determine the models’ performance? • During our performance analysis, we saw the simulator perform differently on different topics. This could be partly due to the influence of external events on the activity of particular topics, and partly due to the regular patterns observed in the data.
  • 44. Main Publications ● Horawalavithana, S., Ng, K., Iamnitchi, A., Predicting Twitter Topic Activity during Political Crisis using Exogenous Data (Under Review) ● Horawalavithana, S., Choudhury, N., Iamnitchi, A., Online Discussion Threads as Cascade Pools: Predicting the Growth of Discussion Threads on Reddit (Under Review) ● Horawalavithana, S., Ng, K., Iamnitchi, A., Drivers of Polarized Discussions on Twitter during Venezuela Political Crisis, The 13th International ACM Conference on Web Science (WebSci), 2021. ● Horawalavithana, S., Silva, R., Nabeel, M., Elvitigala, C., Wijesekara, P., and Iamnitchi, A., Malicious and Low Credibility URLs on Twitter during the AstraZeneca COVID-19 Vaccine Development, International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and Simulation (SBP-BRiMS), DC, USA, 2021.
  • 45. Main Publications (Contd.) ● Horawalavithana, S., Ng, K., Iamnitchi, A., Twitter is the Megaphone of Cross-Platform Messaging on the White Helmets, International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and Simulation, DC, USA, 2020. ● Horawalavithana, S., Bhattacharjee, A., Liu, R., Choudhury, N., Hall, L. O., & Iamnitchi, A., Mentions of Security Vulnerabilities in Reddit, Twitter and GitHub, IEEE/WIC/ACM International Conference on Web Intelligence, Greece, October 2019. ● Horawalavithana, S., Flores, J. G. A., Skvoretz, J., & Iamnitchi, A., Behind the Mask: Understanding the Structural Forces that Make Social Graphs Vulnerable to De-anonymization, IEEE Transactions on Computational Social Systems (TCSS), 2019. ● Horawalavithana, S., Flores, J. A., Skvoretz, J., & Iamnitchi, A., The Risk of Node Re-identification in Labeled Social Graphs, Applied Network Science, 2019.
  • 46. Other Publications ● Ng, K., Horawalavithana, S., & Iamnitchi, A., Multi-platform Information Operations: Twitter, Facebook and YouTube against the White Helmets, The Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media (ICWSM), 2021. ● Liu, R., Mubang, F., Hall, L. O., Horawalavithana, S., Iamnitchi, A., & Skvoretz, J. (2019, October). Predicting longitudinal user activity at fine time granularity in online collaborative platforms. In 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC) (pp. 2535-2542). IEEE. ● Alhazmi, E., Horawalavithana, S., Skvoretz, J., Blackburn, J., & Iamnitchi, A. (2017, July). An empirical study on team formation in online games. In Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) 2017 (pp. 431-438). ● Alhazmi, E., Choudhury, N., Horawalavithana, S., & Iamnitchi, A. (2019). Temporal mobility networks in online gaming. Frontiers in Big Data, 2, 21.
  • 47. References ● Aragón, P., Gómez, V., García, D., and Kaltenbrunner, A. Generative models of online discussion threads: state of the art and research challenges. Journal of Internet Services and Applications, 8(1):15, 2017. ● Alp, Z., and Öğüdücü, S. Identifying topical influencers on Twitter based on user behavior and network topology. Knowledge-Based Systems, 141:211–221, 2018. ● Goel, S., Anderson, A., Hofman, J., and Watts, D. The structural virality of online diffusion. Management Science, 62(1):180–196, 2015.
  • 48. Acknowledgments ● Funded by the DARPA SocialSim Program ● Data provided by Leidos (thanks to Kin for the Reddit data) ● Evaluation code was developed by Pacific Northwest National Laboratory
  • 49. Data-driven Studies on Social Networks: Privacy and Simulation ● sameera1@usf.edu