User Identity Linkage: Data Collection, Dataset
Biases, Method, Control and Application
Rishabh Kaushal
PhD15008
Committee Members:
Prof. Sanjay Jha
Dr. Alessandra Sala
Prof. Anwitaman Datta
Prof. Ponnurangam Kumaraguru (PK), Advisor
PhD Defense Presentation
Who Am I?
Sponsored PhD student, Precog Research Group, IIIT Delhi.
Serving as Assistant Professor, IT Dept, IGDTUW.
MS by Research from IIIT Hyderabad.
Research interest: Social Computing.
Outline of Talk
Identity in Physical World
(Figure: one person's identities in the physical world, e.g., student, teacher, software engineer, father.)
Identity in Online World
An online identity has three dimensions: profile, content, and network.
A user joins multiple social networks.
World of Social Networks
(Figure panels: Professional, Personal, News.)
Problem: User Identity Linkage (UIL)
UIL refers to the problem of determining whether two input user identities, taken from two different social networks A and B, belong to the same person or not.
$(I_a, I_b)$: linked user identity pair
Motivation
Thesis Statement
“Computational approaches can be proposed for the analysis of data collection methods, investigation of biases in identity linkage datasets, linkage of user identities across social networks, controllability of user identity linkage, and application of a user identity linkage solution to solve extraneous problems.”
Outline of Talk
Accepted at 12th IEEE International Conference on Social Computing (SocialCom 2019). Xiamen, China.
Data Collection Methods
Social Aggregation (SA)
We refer to sites on which users create an account and provide details of their accounts on multiple social networks as social aggregation platforms.
Examples: Google profiles (Perito et al.) and About.me profiles (Liu et al.).
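To make the SA collection step concrete, here is a minimal Python sketch, assuming each scraped aggregator profile has already been parsed into a mapping from network name to username (the function name and input format are hypothetical). Every two accounts listed on the same profile are treated as a linked identity pair.

```python
from itertools import combinations


def linked_pairs_from_aggregator(profile_accounts):
    """Turn one social aggregation profile into linked identity pairs.

    profile_accounts: hypothetical dict {network_name: username} parsed from a
    single aggregator profile (e.g. an About.me page). All accounts listed on
    one profile are assumed to belong to the same person.
    """
    # Drop empty usernames and sort so the output order is deterministic.
    items = sorted((net, user) for net, user in profile_accounts.items() if user)
    # Each cross-network combination of listed accounts is one linked pair.
    return [(a, b) for a, b in combinations(items, 2)]


if __name__ == "__main__":
    profile = {"twitter": "alice_t", "instagram": "alice.ig", "github": ""}
    for pair in linked_pairs_from_aggregator(profile):
        print(pair)
```

A profile listing n accounts yields n(n-1)/2 pairs, which is why aggregators can cover many network combinations at once.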
Cross Platform Sharing
Cross-platform sharing refers to a user behavior in which a user posts the same content across multiple social networks (Correa et al.).
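A minimal sketch of how cross-platform sharing could yield linked pairs, assuming posts from the two networks are available as (username, text) tuples and that "same content" means identical text after lowercasing, URL removal, and whitespace collapsing. The normalization rule and all names are our assumptions, not the procedure of Correa et al.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, strip URLs, and collapse whitespace (a simplifying assumption)."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return " ".join(text.split())


def cross_platform_pairs(posts_a, posts_b):
    """posts_a / posts_b: lists of (username, text) from networks A and B.

    Returns the set of (user_a, user_b) pairs that posted identical normalized
    content on both networks, i.e. candidate linked identities.
    """
    authors_by_text = defaultdict(set)
    for user, text in posts_a:
        key = normalize(text)
        if key:  # ignore posts that are empty after normalization
            authors_by_text[key].add(user)
    pairs = set()
    for user_b, text in posts_b:
        for user_a in authors_by_text.get(normalize(text), ()):
            pairs.add((user_a, user_b))
    return pairs
```

Exact matching on short, generic posts (e.g. "good morning") would produce spurious pairs, so a practical version would likely require longer posts or several shared posts per user pair.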
Self Disclosure
On their own profile page, users disclose their identity on another social network platform (Chen et al.).
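A minimal sketch of self-disclosure extraction, assuming the disclosed identity is an Instagram handle written in a Twitter bio either as a profile URL or as "IG: @handle". Both regular expressions are illustrative assumptions and would miss many real-world formats.

```python
import re

# Hypothetical patterns for an Instagram handle disclosed in a bio,
# e.g. "IG: @alice.ig" or "instagram.com/alice.ig".
_PATTERNS = [
    re.compile(r"instagram\.com/([A-Za-z0-9._]{1,30})", re.IGNORECASE),
    re.compile(r"\b(?:ig|insta|instagram)\s*[:\-]?\s*@([A-Za-z0-9._]{1,30})", re.IGNORECASE),
]


def disclosed_instagram_handles(bio):
    """Return the Instagram handles a user self-discloses in a profile bio."""
    handles = []
    for pattern in _PATTERNS:
        for match in pattern.finditer(bio):
            handle = match.group(1).rstrip(".")  # drop a sentence-final dot
            # Keep each handle once, ignoring case.
            if handle and handle.lower() not in (h.lower() for h in handles):
                handles.append(handle)
    return handles
```

Each extracted handle, paired with the bio owner's Twitter account, would give one linked identity pair.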
Social Network Coverage
Distribution of the Number of Identities per User
Linked Identity Pairs
Only the top six social networks, those where we obtained the best coverage, are plotted.
Data Collection - Conclusion
Computational approaches to collect linked user identity pairs can be implemented.
Each data collection method depends on a particular user behavior, which is leveraged to collect that user's linked identities.
Outline of Talk
Accepted at 35th ACM/SIGAPP Symposium on Applied Computing (SAC 2020). Brno, Czech Republic.
Why study dataset biases?
Every data collection approach depends on the typical behaviors of users who maintain identities across multiple social networks.
As a consequence, the behavioral biases exhibited by these users are manifested in the resulting user identity linkage datasets.
Scope of our work
We focus on two identity linkage datasets, SD and CPS, derived from two user behaviors on Twitter and Instagram: self-disclosure and cross-platform posting, respectively.
(1) Detection & Impact: Do dataset biases exist? What is the impact of dataset biases on ML models?
(2) Quantification: How can the amount of dataset bias be measured?
UIL as Supervised Learning Problem
Negative Class Generation: unlinked user identity pairs, i.e., user identities that do not belong to the same person, are created in two ways: random pairing and similar pairing.
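The two negative-generation strategies can be sketched as below. This is a hypothetical illustration: `difflib.SequenceMatcher.ratio()` stands in for whatever similarity drives similar pairing (the talk does not specify it), and each linked pair contributes exactly one negative pair.

```python
import random
from difflib import SequenceMatcher


def _similarity(u, v):
    # Assumed username similarity; not specified in the talk.
    return SequenceMatcher(None, u, v).ratio()


def negative_pairs(positive_pairs, identities_b, mode="random", seed=0):
    """Create one unlinked pair for every linked pair (a, b).

    mode="random":  pair a with a random identity from network B other than b.
    mode="similar": pair a with the most similar identity from B other than b,
                    which yields harder negatives.
    """
    rng = random.Random(seed)
    negatives = []
    for a, b in positive_pairs:
        candidates = [c for c in identities_b if c != b]
        if not candidates:
            continue
        if mode == "random":
            negatives.append((a, rng.choice(candidates)))
        elif mode == "similar":
            negatives.append((a, max(candidates, key=lambda c: _similarity(a, c))))
        else:
            raise ValueError(f"unknown mode: {mode}")
    return negatives
```

Similar pairing produces negatives that look plausible on the surface, so a model cannot separate the classes by name similarity alone.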
1. Jaccard Similarity on the ‘username’ of the user identity pair.
2. Edit Distance on the ‘display name’ of the user identity pair.
Positive pair: (rishabhk_, rk.iiit)
Negative pairs: (rishab, rk.iiit), (rahul, rk.iiit)
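A minimal sketch of the two pairwise measures. The talk does not say what units the Jaccard similarity is computed over, so character bigrams are an assumption here; the edit distance is the standard Levenshtein distance with unit costs.

```python
def char_ngrams(s, n=2):
    """Set of character n-grams; bigrams are an assumption, not from the talk."""
    s = s.lower()
    if len(s) < n:
        return {s} if s else set()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard_similarity(u, v, n=2):
    """|A ∩ B| / |A ∪ B| over the n-gram sets of two usernames."""
    a, b = char_ngrams(u, n), char_ngrams(v, n)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def edit_distance(s, t):
    """Levenshtein distance between two display names (unit costs)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute or match
        prev = curr
    return prev[-1]
```

An ED of 0.0 corresponds to identical display names, and a JS of 1.0 to usernames with identical n-gram sets.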
Dataset Details
User Behavioral Features
Jaccard Similarity (JS) on usernames
(Figure: proportion of users vs. JS.)
50% of user identity pairs from SD have a JS value of 0.9, as opposed to only 23% from CPS.
User Behavioral Features
Edit Distance (ED) on display names
(Figure: proportion of users vs. ED.)
For 58% of user identity pairs obtained through SD, the display names have an ED of 0.0, compared with 35% of pairs from CPS.
Impact of biases on model
Experiments were run in two ways: (1) the same dataset for training and testing, and (2) different datasets for training and testing.
Across all learning algorithms adopted, the precision of models trained and tested on the same dataset is better than that of models trained and tested on different datasets.
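The two-way protocol can be illustrated with a toy single-feature threshold classifier standing in for the real learning algorithms (an assumption made to keep the sketch dependency-free). It fits a threshold on one dataset and reports precision on every dataset, giving a same-dataset vs. cross-dataset grid.

```python
def fit_threshold(train):
    """Pick the feature threshold that maximizes training accuracy.

    train: list of (score, label), label 1 = linked, 0 = not linked.
    """
    candidates = sorted({s for s, _ in train})

    def accuracy(th):
        return sum((s >= th) == bool(y) for s, y in train) / len(train)

    return max(candidates, key=accuracy)


def precision(test, th):
    """Precision of 'linked' predictions (score >= th) on a test set."""
    predicted = [y for s, y in test if s >= th]
    return sum(predicted) / len(predicted) if predicted else 0.0


def cross_dataset_grid(datasets):
    """Precision for every (train, test) combination of named datasets."""
    return {(tr, te): precision(datasets[te], fit_threshold(datasets[tr]))
            for tr in datasets for te in datasets}
```

Returning 0.0 when no pair is predicted as linked is a convention chosen for this sketch.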
Quantification of Bias
We have detected behavioral biases in user identities, characterized them, and measured their impact on identity linkage models.
We propose a design that quantifies biases by leveraging a well-established discrimination measurement approach, namely ‘situational testing’.
Situational Testing (ST)
(Figure panels: Background; Quantification Metric.)
Applying ST to quantify biases (discrimination setting → identity linkage setting):
Data record: person → user identity pair
Protected attribute: gender (male or female) → data collection method (SD or CPS)
Class label: (selected / not selected) → (linked / not linked)
Results
RQ: Are both decision classes (linked and unlinked) equally affected by biases?
A t-value of 0 means no bias. However, the probability distributions of t-values spread over both the positive (t > 0) and negative (t < 0) sides, which indicates that behavioral biases affect many data records.
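A sketch of the t-value under our reading of situational testing: for a record r, take its k nearest neighbours (Euclidean distance over the pair features) among SD pairs and among CPS pairs, then subtract the share of CPS neighbours carrying r's class label from the share of SD neighbours carrying it. The distance, the value of k, the record format, and the sign convention are all assumptions; the talk only states that t = 0 means no bias.

```python
import math


def _knn(record, pool, k):
    """k nearest neighbours of `record` in `pool` (Euclidean), excluding itself."""
    others = [p for p in pool if p is not record]
    others.sort(key=lambda p: math.dist(p["x"], record["x"]))
    return others[:k]


def situational_t(record, data, k=3, group_a="SD", group_b="CPS"):
    """t = share of r's label among its k NN from group_a minus that share from group_b.

    data: list of dicts {"x": feature tuple, "group": "SD" or "CPS", "y": 0 or 1}.
    t = 0: r's label looks the same whichever collection method its
    neighbours come from; |t| > 0 suggests the collection method matters.
    """
    def share(group):
        nn = _knn(record, [d for d in data if d["group"] == group], k)
        return sum(d["y"] == record["y"] for d in nn) / len(nn) if nn else 0.0

    return share(group_a) - share(group_b)
```

Computing t separately for linked and unlinked records is one way to address the RQ of whether both decision classes are equally affected.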
Dataset Biases - Conclusion
Behavioral biases exist in identity linkage datasets, and they can be detected and quantified.
We recommend collecting linked user identities using more than one data collection method.
Mitigating biases in identity linkage datasets remains an open problem.
Outline of Talk
Accepted at International School & Conference on Network Science (NetSciX, 2020), Tokyo, Japan.
Proposed: NeXLink Framework
Can we obtain effective node representations such that the node embeddings of users belonging to Cross-Network Linkages (CNLs) are closer in the embedding space than those of other nodes?
(Figure panels: Input, Output.)
More formally, the goal of the embedding function is to transform each user identity $u_i^X$ and $u_j^Y$ into low-dimensional vectors $z_i^X$ and $z_j^Y$ of size $d$ such that, if $u_i^X$ and $u_j^Y$ belong to the same person, their embedding vectors $z_i^X$ and $z_j^Y$ are close in the embedding space, and far apart otherwise.
NeXLink Framework
Structural similarities of nodes within their respective networks are preserved.
Similarities of nodes across the two networks are preserved based on common friendship relations.
Local Node Embeddings*
The joint probability of $u_i^X$ and $u_k^X$ is modeled from their embedding vectors $z_i^X$ and $z_k^X$.
The empirical probability between $u_i^X$ and $u_k^X$ within the same network is defined by their normalized edge weights.
Optimization: minimize the KL divergence between these two distributions.
* LINE algorithm: Tang et al.
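The slide's equations did not survive extraction. Assuming NeXLink uses LINE's first-order proximity unchanged (the joint-probability wording suggests it), the two distributions from Tang et al. would read as follows, with $w_{ik}$ the weight of edge $(u_i^X, u_k^X)$ and $E^X$ the edge set of network X:

$$
p_1(u_i^X, u_k^X) = \frac{1}{1 + \exp\left(-(z_i^X)^{\top} z_k^X\right)}, \qquad
\hat{p}_1(u_i^X, u_k^X) = \frac{w_{ik}}{W}, \quad W = \sum_{(i,k) \in E^X} w_{ik}
$$

In LINE, minimizing $\mathrm{KL}(\hat{p}_1 \,\|\, p_1)$ and dropping constants gives the objective

$$
O_1 = -\sum_{(i,k) \in E^X} w_{ik} \log p_1(u_i^X, u_k^X)
$$

and the same construction would apply within network Y.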
Global Node Embeddings
To learn global node embeddings, we construct a global graph $G$ as follows:
$G(V) = V^X \cup V^Y$
$G(E) = \mathrm{CNL} \cup \mathrm{NCNL}$
Positive Edge Generation (CNL): linked identity pairs belonging to the same person across the social networks.
Negative Edge Generation (NCNL): For every node pair $(u_i^X, u_j^Y)$, we perform a random walk of length $t$ starting at node $u_i^X$ and add $(u_i^X, u_k^Y)$ to NCNL (Non-Cross-Network Links) if $u_k^Y$ appears in the random walk.
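The NCNL step can be sketched as follows, under several assumptions the slide leaves open: the node pairs are taken to be the known linked pairs, the walk runs on a combined adjacency list in which those links let it cross from X into Y, and the true match $u_j^Y$ itself is never added as a negative. The function name and input format are hypothetical.

```python
import random


def ncnl_edges(adj, y_nodes, cnl_pairs, t, seed=0):
    """Sample Non-Cross-Network Links (negative edges) by random walks.

    adj: adjacency dict over the combined graph (X edges, Y edges, and the
         known cross-network links, so a walk can cross from X into Y).
    y_nodes: set of nodes belonging to network Y.
    cnl_pairs: linked pairs (u_i, u_j); one walk of length t starts at each u_i.
    """
    rng = random.Random(seed)
    ncnl = set()
    for u_i, u_j in cnl_pairs:
        node = u_i
        for _ in range(t):
            neighbours = adj.get(node, [])
            if not neighbours:
                break  # dead end: stop this walk early
            node = rng.choice(neighbours)
            # Any visited Y node other than the true match becomes a negative.
            if node in y_nodes and node != u_j:
                ncnl.add((u_i, node))
    return ncnl
```

Y nodes reached in a few hops from $u_i^X$ are close in the graph but unlinked, which makes them informative negatives.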
Global Node Embeddings
To learn node embeddings, we perform biased random walks (node2vec*) whose transition probabilities are guided by the common friends (CF) metric.
* node2vec algorithm: Grover et al.
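A sketch of a CF-guided walk, assuming the next hop from node v to neighbour n is chosen with probability proportional to 1 + CF(v, n), where CF counts shared neighbours. The +1 smoothing is our assumption, and node2vec's own return and in-out parameters (p, q) are omitted entirely.

```python
import random


def common_friends(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(set(adj.get(u, ())) & set(adj.get(v, ())))


def cf_biased_walk(adj, start, length, seed=0):
    """Random walk whose next hop is drawn with weight 1 + CF(current, neighbour)."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(length - 1):
        current = walk[-1]
        neighbours = list(adj.get(current, ()))
        if not neighbours:
            break  # dead end: return the shorter walk
        weights = [1 + common_friends(adj, current, n) for n in neighbours]
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk
```

As in node2vec, walks like these would then be fed to a skip-gram model to produce the embeddings.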
Datasets
We evaluated the NeXLink framework on two datasets.
Augmented Dataset: two sub-graphs sampled from a large Facebook friendship network comprising 63,713 nodes and 817,090 edges (Man et al.).
Real-world Dataset: Twitter (5,120 users and 130,575 edges) and Instagram (5,313 users and 54,233 edges), with 1,288 common users (Kong et al.).
Evaluation Metric
For a given node $u_i^X$, our goal is to find the node $u_j^Y$ that belongs to the same person. We therefore count a hit if $z_j^Y$ is among the top-$k$ node embeddings, ranked by cosine similarity to $z_i^X$.
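The hit@k count can be sketched in a few lines of Python, assuming embeddings are stored as dictionaries from node id to vector and reporting the fraction of linked pairs that are hits (function names are hypothetical).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hit_at_k(z_x, z_y, ground_truth, k):
    """Fraction of linked pairs (i, j) whose z_y[j] ranks in the top-k of all
    network-Y embeddings by cosine similarity to z_x[i].

    z_x, z_y: dicts node -> embedding vector; ground_truth: list of (i, j).
    """
    hits = 0
    for i, j in ground_truth:
        ranked = sorted(z_y, key=lambda n: cosine(z_x[i], z_y[n]), reverse=True)
        hits += int(j in ranked[:k])
    return hits / len(ground_truth) if ground_truth else 0.0
```

Sorting all of Y per query is fine for a sketch; for networks of a few thousand nodes a matrix product followed by a partial sort would be the usual choice.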
Evaluation - Comparison with others
We compare our proposed NeXLink (LINE-node2vec) framework with two other approaches:
IONE: Input-Output Network Embedding, for the task of network alignment.
REGAL: Representation Learning-based Graph Alignment.
NeXLink Framework - Conclusion
A node representation learning-based approach can be used to effectively learn embedding vectors for extracting linked user identities.
Outline of Talk
Accepted at 9th International Conference on Social Informatics (SocInfo 2017), University of Oxford, Oxford, UK.
Linkability Nudge
Can we help users control the linkability of their identities across social networks?
We design and implement a linkability nudge: gentle interventions that help users make informed decisions.
The user sets a range for the linkability threshold (score) of each identity pair (dynamic web portal).
Whenever the user's behavior goes beyond the preconfigured range, the user is nudged (web browser extension).
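A minimal sketch of the nudge trigger, assuming the linkability score is a number in [0, 1] (for example, a UIL model's predicted probability that the pair is linked) and that the extension nudges whenever a new score leaves the user's configured range. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinkabilityRange:
    """User-configured acceptable linkability range for one identity pair."""
    low: float
    high: float


def should_nudge(score, allowed):
    """Return a nudge message if the new score leaves the range, else None."""
    if score > allowed.high:
        return f"Linkability {score:.2f} is above your limit {allowed.high:.2f}"
    if score < allowed.low:
        return f"Linkability {score:.2f} is below your limit {allowed.low:.2f}"
    return None
```

In this design the extension would recompute the score for a draft post or profile edit and call `should_nudge` before the change is submitted.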
Linkability Nudge Architecture
Linkability Score - Displayed to User
Content Driven Color Nudge
Attribute Driven Notify Nudge
Nudge Evaluation
Controlled lab experiment with a control period and a treatment period.
Participants were recruited and asked to perform tasks involving making a post and changing a profile attribute.
We observed the impact of the linkability nudge on participants.
Nudge Evaluation
(Figure: participants vs. minutes since the start of the experiment.)
Outline of Talk
Accepted at 7th International Conference on Mining Intelligence & Knowledge Exploration (MIKE 2019), NIT, Goa.
Clone Detection
Clone: a user identity that looks similar to the victim's identity within the same social network.
Why detect clone identities?
Contributions Summary
Performed comparative analysis of data collection methods.
Investigated biases in identity linkage datasets.
Proposed node embedding framework for user identity linkage.
Helped users control linkability of their identities across OSNs.
Applied UIL solution to detect clones and flag their behaviors.
Limitations & Future Directions
Data collection is a challenge; other social media platforms, such as Goodreads and Strava, need to be explored.
We employed situational testing to detect dataset biases; other methods from fairness algorithm studies need to be explored.
Our NeXLink node embedding framework uses only network information; leveraging content and profile features could be helpful.
We performed a controlled lab study; deploying the linkability nudge in field trials remains future work.
Acknowledgements
PhD Advisor: Prof PK
Monitoring Committee: Prof Arun Balaji Buduru, Prof Rajiv Ratn Shah
Co-authors and Peers
Members of Precog
My family
Thanks

Contenu connexe

Tendances

$$ Using statistics to search and annotate pictures an evaluation of semantic...
$$ Using statistics to search and annotate pictures an evaluation of semantic...$$ Using statistics to search and annotate pictures an evaluation of semantic...
$$ Using statistics to search and annotate pictures an evaluation of semantic...
mhmt82
 
Learning Social Networks From Web Documents Using Support
Learning Social Networks From Web Documents Using SupportLearning Social Networks From Web Documents Using Support
Learning Social Networks From Web Documents Using Support
ceya
 
Overview Of Network Analysis Platforms
Overview Of Network Analysis PlatformsOverview Of Network Analysis Platforms
Overview Of Network Analysis Platforms
Noah Flower
 

Tendances (19)

Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
 
06 Community Detection
06 Community Detection06 Community Detection
06 Community Detection
 
Q046049397
Q046049397Q046049397
Q046049397
 
17 Statistical Models for Networks
17 Statistical Models for Networks17 Statistical Models for Networks
17 Statistical Models for Networks
 
06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)06 Regression with Networks – EGO Networks and Randomization (2017)
06 Regression with Networks – EGO Networks and Randomization (2017)
 
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...
Artigo - Aplicações Interativas para TV Digital: Uma Proposta de Ontologia de...
 
Optimizing community detection in social networks using antlion and K-median
Optimizing community detection in social networks using antlion and K-medianOptimizing community detection in social networks using antlion and K-median
Optimizing community detection in social networks using antlion and K-median
 
$$ Using statistics to search and annotate pictures an evaluation of semantic...
$$ Using statistics to search and annotate pictures an evaluation of semantic...$$ Using statistics to search and annotate pictures an evaluation of semantic...
$$ Using statistics to search and annotate pictures an evaluation of semantic...
 
Node similarity
Node similarityNode similarity
Node similarity
 
Social media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / LecturerSocial media community using optimized algorithm by M. Gomathi / Lecturer
Social media community using optimized algorithm by M. Gomathi / Lecturer
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
IRJET - Visual Question Answering – Implementation using Keras
IRJET -  	  Visual Question Answering – Implementation using KerasIRJET -  	  Visual Question Answering – Implementation using Keras
IRJET - Visual Question Answering – Implementation using Keras
 
Link Prediction Survey
Link Prediction SurveyLink Prediction Survey
Link Prediction Survey
 
Big Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachBig Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network Approach
 
Learning Social Networks From Web Documents Using Support
Learning Social Networks From Web Documents Using SupportLearning Social Networks From Web Documents Using Support
Learning Social Networks From Web Documents Using Support
 
Making the invisible visible through SNA
Making the invisible visible through SNAMaking the invisible visible through SNA
Making the invisible visible through SNA
 
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
AN GROUP BEHAVIOR MOBILITY MODEL FOR OPPORTUNISTIC NETWORKS
 
Overview Of Network Analysis Platforms
Overview Of Network Analysis PlatformsOverview Of Network Analysis Platforms
Overview Of Network Analysis Platforms
 
Community detection in complex social networks
Community detection in complex social networksCommunity detection in complex social networks
Community detection in complex social networks
 

Similaire à User Identity Linkage: Data Collection, DataSet Biases, Method, Control and Application

Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities in
moresmile
 
Studying user footprints in different online social networks
Studying user footprints in different online social networksStudying user footprints in different online social networks
Studying user footprints in different online social networks
IIIT Hyderabad
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of Life
Harish Vaidyanathan
 

Similaire à User Identity Linkage: Data Collection, DataSet Biases, Method, Control and Application (20)

IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction Techniques
 
Using content and interactions for discovering communities in
Using content and interactions for discovering communities inUsing content and interactions for discovering communities in
Using content and interactions for discovering communities in
 
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
Subscriber Churn Prediction Model using Social Network Analysis In Telecommun...
 
IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...IRJET- Predicting Social Network Communities Structure Changes and Detection ...
IRJET- Predicting Social Network Communities Structure Changes and Detection ...
 
security enhanced content sharing in social io t a directed hypergraph based ...
security enhanced content sharing in social io t a directed hypergraph based ...security enhanced content sharing in social io t a directed hypergraph based ...
security enhanced content sharing in social io t a directed hypergraph based ...
 
Studying user footprints in different online social networks
Studying user footprints in different online social networksStudying user footprints in different online social networks
Studying user footprints in different online social networks
 
Current trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networksCurrent trends of opinion mining and sentiment analysis in social networks
Current trends of opinion mining and sentiment analysis in social networks
 
Control of Photo Sharing on Online Social Network.
Control of Photo Sharing on Online Social Network.Control of Photo Sharing on Online Social Network.
Control of Photo Sharing on Online Social Network.
 
Ppt
PptPpt
Ppt
 
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKSSCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
SCALABLE LOCAL COMMUNITY DETECTION WITH MAPREDUCE FOR LARGE NETWORKS
 
Scalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large NetworksScalable Local Community Detection with Mapreduce for Large Networks
Scalable Local Community Detection with Mapreduce for Large Networks
 
Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks Clustering in Aggregated User Profiles across Multiple Social Networks
Clustering in Aggregated User Profiles across Multiple Social Networks
 
01 Network Data Collection (2017)
01 Network Data Collection (2017)01 Network Data Collection (2017)
01 Network Data Collection (2017)
 
01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)01 Introduction to Networks Methods and Measures (2016)
01 Introduction to Networks Methods and Measures (2016)
 
01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures01 Introduction to Networks Methods and Measures
01 Introduction to Networks Methods and Measures
 
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...
Delta-Screening: A Fast and Efficient Technique to Update Communities in Dyna...
 
20142014_20142015_20142115
20142014_20142015_2014211520142014_20142015_20142115
20142014_20142015_20142115
 
Visually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of LifeVisually Exploring Social Participation in Encyclopedia of Life
Visually Exploring Social Participation in Encyclopedia of Life
 
IRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social NetworksIRJET- Link Prediction in Social Networks
IRJET- Link Prediction in Social Networks
 
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
LCF: A Temporal Approach to Link Prediction in Dynamic Social Networks
 

Plus de IIIT Hyderabad

Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake News
IIIT Hyderabad
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
IIIT Hyderabad
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...
IIIT Hyderabad
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...
IIIT Hyderabad
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian Languages
IIIT Hyderabad
 

Plus de IIIT Hyderabad (20)

Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayResponsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
 
International Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesInternational Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success stories
 
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasResponsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
 
Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake News
 
#ChatGPT #ResponsibleAI
#ChatGPT #ResponsibleAI#ChatGPT #ResponsibleAI
#ChatGPT #ResponsibleAI
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
 
How to Write a (Good) Research Paper
How to Write a (Good) Research Paper How to Write a (Good) Research Paper
How to Write a (Good) Research Paper
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBias
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...
 
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayPrivacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Leveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceLeveraging Social Media for Financial Advice
Leveraging Social Media for Financial Advice
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian Languages
 

Dernier

AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Christo Ananth
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Dernier (20)

Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 

User Identity Linkage: Data Collection, Dataset Biases, Method, Control and Application

  • 1. User Identity Linkage: Data Collection, Dataset Biases, Method, Control and Application Rishabh Kaushal PhD15008 Committee Members: Prof. Sanjay Jha Dr. Alessandra Sala Prof. Anwitaman Datta Prof. Ponnurangam Kumaraguru (PK), Advisor PhD Defense Presentation
  • 2. Who Am I? Sponsored PhD student, Precog Research Group, IIIT Delhi. Serving as Assistant Professor, IT Dept., IGDTUW. MS by Research from IIIT Hyderabad. Research interest: Social Computing.
  • 4. Identity in Physical World. [Figure: Identity in the Physical World; roles: Student, Teacher, Software Engineer, Father]
  • 5. Identity in Online World. Identity has three dimensions: profile, content, and network. A user joins multiple social networks. [Figure: World of Social Networks; categories: Professional, Personal, News]
  • 6. Problem: User Identity Linkage (UIL). UIL refers to the problem of determining whether two input user identities, taken from two different social networks A and B, belong to the same person or not. $(I_a, I_b)$: linked user identity pair.
  • 9. Thesis Statement: “Computational approaches can be proposed for the analysis of data collection methods, investigation of biases in identity linkage datasets, linkage of user identities across social networks, control-ability of user identity linkage, and application of user identity linkage solution to solve extraneous problems.”
  • 10. Outline of Talk. Accepted at the 12th IEEE International Conference on Social Computing (SocialCom 2019), Xiamen, China.
  • 12. Social Aggregation (SA). Social aggregation platforms are sites where users create an account and provide details of their accounts on multiple social networks. Prior work: Perito et al. → Google profiles; Liu et al. → About.me profiles.
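An aggregator profile lists several accounts belonging to one person, so any two of those accounts form a linked identity pair. The sketch below is minimal and rests on two assumptions: the profile has already been parsed into a mapping from network name to handle, and the function name and example handles are hypothetical.

```python
from itertools import combinations

def pairs_from_aggregator(profile: dict[str, str]) -> list[tuple[tuple[str, str], tuple[str, str]]]:
    """All linked identity pairs implied by one aggregator profile.

    `profile` maps a network name to the handle listed for it. Every two
    listed accounts are assumed to belong to the same person.
    """
    accounts = sorted(profile.items())  # sort for a deterministic output order
    return list(combinations(accounts, 2))
```

A profile that lists k accounts yields k(k-1)/2 pairs. One aggregator profile can therefore contribute linked pairs for several network combinations at once.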
  • 13. Cross Platform Sharing. Cross-platform sharing refers to a user behavior in which a user posts the same content across multiple social networks (Correa et al.).
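A common form of cross-platform sharing is a tweet that links to the user's own Instagram post. The sketch below collects the post shortcodes that a Twitter account has shared. It assumes tweets are available as plain text and that shared posts appear as `instagram.com/p/<shortcode>` URLs; that link format is an assumption, not something the slides state. Looking up the owner of each post (not shown) would then produce a linked pair.

```python
import re

# Assumed URL shape of a shared Instagram post; shortcodes use letters,
# digits, '_' and '-'.
INSTAGRAM_POST = re.compile(r"https?://(?:www\.)?instagram\.com/p/([A-Za-z0-9_-]+)")

def shared_instagram_posts(tweets: list[str]) -> set[str]:
    """Return the Instagram post shortcodes linked from a user's tweets."""
    codes: set[str] = set()
    for text in tweets:
        # findall returns the captured shortcode for every match in the tweet.
        codes.update(INSTAGRAM_POST.findall(text))
    return codes
```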
  • 14. Self Disclosure. On their profile page, users themselves disclose their identity on another social network platform (Chen et al.).
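The sketch below is a minimal example of self-disclosure extraction from a Twitter bio. It assumes the disclosure takes one of two forms: an Instagram profile URL, or a label such as "IG: @handle" or "Instagram: @handle". These patterns are illustrative assumptions, and real bios contain many more variants.

```python
import re
from typing import Optional

# Illustrative patterns (assumptions): an Instagram profile URL, or a label
# such as "IG:", "insta -" or "Instagram:" followed by an @handle.
PATTERNS = [
    re.compile(r"instagram\.com/([A-Za-z0-9_.]+)", re.IGNORECASE),
    re.compile(r"\b(?:ig|insta|instagram)\s*[:\-]?\s*@([A-Za-z0-9_.]+)", re.IGNORECASE),
]

def disclosed_instagram_handle(bio: str) -> Optional[str]:
    """Return the first Instagram handle disclosed in a profile bio, or None."""
    for pattern in PATTERNS:
        match = pattern.search(bio)
        if match:
            # A trailing '.' is usually sentence punctuation, not part of the handle.
            return match.group(1).rstrip(".").lower()
    return None
```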
  • 17. Linked Identity Pairs. Only the top six social networks with the best coverage are plotted.
  • 18. Data Collection: Conclusion. Computational approaches to collect linked user identity pairs can be implemented. Each data collection method depends on a particular user behavior, which is leveraged to collect the linked identities of that user.
  • 19. Outline of Talk. Accepted at the 35th ACM/SIGAPP Symposium on Applied Computing (SAC 2020), Brno, Czech Republic.
  • 20. Why study dataset biases? Every data collection approach depends on the typical behaviors of users who maintain identities across multiple social networks. As a consequence, the behavioral biases exhibited by these users are manifested in the resulting user identity linkage datasets.
  • 21. Scope of our work. We focus on two identity linkage datasets, SD and CPS, derived by leveraging two user behaviors, namely self-disclosure and cross-platform posting, respectively, on Twitter and Instagram. (1) Detection and impact: Does dataset bias exist? What is the impact of dataset biases on ML models? (2) Quantification: How can the amount of dataset bias be measured?
  • 22. UIL as a Supervised Learning Problem. Negative class generation: unlinked user identity pairs, i.e., user identities that do not belong to the same person, are created in two ways: random pairing and similar pairing. (1) Jaccard similarity on the ‘username’ of a user identity pair. (2) Edit distance on the ‘display name’ of a user identity pair. Example: positive pair (rishabhk_, rk.iiit); negative pairs (rishab, rk.iiit), (rahul, rk.iiit).
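The two negative-class strategies can be sketched as follows. The sketch assumes linked pairs are (network-A username, network-B username) tuples, that usernames are distinct, and that there are at least two pairs. The slide does not say how similarity was measured for similar pairing. This sketch substitutes `difflib.SequenceMatcher` on usernames, so "similar" negatives come out as harder, look-alike pairs.

```python
import random
from difflib import SequenceMatcher

Pair = tuple[str, str]  # (identity on network A, identity on network B)

def random_negatives(linked: list[Pair], seed: int = 0) -> list[Pair]:
    """Random pairing: match each A identity with the B identity of another linked pair."""
    rng = random.Random(seed)
    negatives = []
    for a, b in linked:
        candidates = [other_b for _, other_b in linked if other_b != b]
        negatives.append((a, rng.choice(candidates)))
    return negatives

def similar_negatives(linked: list[Pair]) -> list[Pair]:
    """Similar pairing: match each A identity with the most similar wrong B identity."""
    negatives = []
    for a, b in linked:
        candidates = [other_b for _, other_b in linked if other_b != b]
        # SequenceMatcher.ratio() is a stand-in similarity measure (assumption).
        closest = max(candidates, key=lambda c: SequenceMatcher(None, a, c).ratio())
        negatives.append((a, closest))
    return negatives
```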
  • 24. User Behavioral Features: Jaccard Similarity (JS) on usernames. 50% of user identity pairs from SD have a JS value of 0.9, as opposed to only 23% from CPS. [Figure y-axis: proportion of users]
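The slides do not say how usernames are tokenized before computing Jaccard similarity. The sketch below assumes lowercased character bigrams, which is one common choice. JS is the size of the intersection of the two bigram sets divided by the size of their union. Identical usernames therefore score 1.0, and usernames that share no bigram score 0.0.

```python
def char_ngrams(text: str, n: int = 2) -> set[str]:
    """Set of character n-grams of a lowercased string (assumed tokenization)."""
    text = text.lower()
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: str, b: str, n: int = 2) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over character n-gram sets."""
    set_a, set_b = char_ngrams(a, n), char_ngrams(b, n)
    if not set_a and not set_b:
        return 1.0  # two empty usernames are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)
```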
  • 25. User Behavioral Features: Edit Distance (ED) on display names. 58% of user identity pairs obtained through SD have display names with an ED of 0.0, compared to 35% from CPS. [Figure y-axis: proportion of users]
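The slides report ED values such as 0.0 but do not give the formula. The sketch below assumes Levenshtein distance on lowercased display names, normalized by the length of the longer name. Under that normalization, identical names give 0.0 and completely different names approach 1.0.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]

def normalized_edit_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer length (assumed normalization)."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0
```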
  • 26. Impact of Biases on Models. Experiments were run in two ways: (1) the same dataset for training and testing; (2) different datasets for training and testing. Across all learning algorithms adopted, models trained and tested on the same dataset achieve better precision than models trained and tested on different datasets.
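The two experimental settings form a train/test grid over the SD and CPS datasets. The sketch below is only a schematic of that protocol. It replaces the study's learning algorithms with a hypothetical one-feature threshold learner. Each dataset is assumed to come as a (train split, test split) pair of (feature value, label) examples.

```python
from itertools import product

# (feature value, label) with label 1 = linked, 0 = not linked.
Example = tuple[float, int]

def fit_threshold(train: list[Example]) -> float:
    """Toy learner: the feature threshold with the best training accuracy."""
    def accuracy(t: float) -> float:
        return sum((x >= t) == (y == 1) for x, y in train) / len(train)
    return max(sorted({x for x, _ in train}), key=accuracy)

def precision(threshold: float, test: list[Example]) -> float:
    """Precision of the rule 'predict linked when feature >= threshold'."""
    predicted_linked = [y for x, y in test if x >= threshold]
    return sum(predicted_linked) / len(predicted_linked) if predicted_linked else 0.0

def evaluation_grid(datasets: dict[str, tuple[list[Example], list[Example]]]) -> dict[tuple[str, str], float]:
    """Precision for every (train dataset, test dataset) combination.

    Diagonal entries are the same-dataset setting; off-diagonal entries
    are the cross-dataset setting.
    """
    results = {}
    for train_name, test_name in product(datasets, repeat=2):
        threshold = fit_threshold(datasets[train_name][0])
        results[(train_name, test_name)] = precision(threshold, datasets[test_name][1])
    return results
```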
  • 27. Quantification of Bias. We have detected behavioral biases in user identities, characterized them, and measured their impact on identity linkage models. We propose a design that quantifies biases by leveraging a well-established discrimination measurement approach, namely ‘situational testing’.
  • 28. Situational Testing (ST): background and quantification metric.
  • 29. Applying ST to Quantify Biases. Mapping from the classic setting to UIL: data record: person → user identity pair; protected attribute: gender (male or female) → data collection method (SD or CPS); class label: (selected / not selected) → (linked / not linked).
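Under this mapping, a k-nearest-neighbour form of situational testing can be sketched as follows. For each record, the sketch compares how often the record's own label (linked or not linked) occurs among its k nearest neighbours from the same data collection method versus the other method. The resulting t is 0 when both groups look alike. The sketch makes three assumptions: Euclidean distance on features such as JS and ED, a symmetric treatment of SD and CPS, and the difference-of-proportions form of t. It is an adaptation, not necessarily the exact metric used in the study.

```python
import math

# A record is (feature vector, group, label): group is the data collection
# method ("SD" or "CPS"); label is 1 for linked and 0 for not linked.
Record = tuple[tuple[float, ...], str, int]

def k_nearest(record: Record, pool: list[Record], k: int) -> list[Record]:
    """The k records in `pool` closest to `record` (Euclidean distance on features)."""
    candidates = [r for r in pool if r is not record]
    return sorted(candidates, key=lambda r: math.dist(record[0], r[0]))[:k]

def label_rate(record: Record, pool: list[Record], k: int) -> float:
    """Share of the record's k nearest neighbours in `pool` that carry its label."""
    neighbours = k_nearest(record, pool, k)
    if not neighbours:
        return 0.0
    return sum(r[2] == record[2] for r in neighbours) / len(neighbours)

def t_value(record: Record, data: list[Record], k: int = 3) -> float:
    """Own-group label rate minus other-group label rate.

    t = 0: neighbours behave the same in both groups (no bias).
    t != 0: the outcome depends on the data collection method (bias).
    """
    same_group = [r for r in data if r[1] == record[1]]
    other_group = [r for r in data if r[1] != record[1]]
    return label_rate(record, same_group, k) - label_rate(record, other_group, k)
```

Computing `t_value` for every record and plotting the distribution separately for each class gives the kind of analysis reported on the results slide.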
  • 30. Results. RQ: Are both decision classes (linked and unlinked) equally affected by biases? A t-value of 0 means no bias. However, the probability distributions of t-values spread over both the positive (t > 0) and negative (t < 0) sides, which indicates that behavioral biases affect many data records.
  • 31. Dataset Biases: Conclusion. Behavioral biases exist in identity linkage datasets, and they can be detected and quantified. We recommend collecting linked user identities using more than one data collection method. Mitigating biases in identity linkage datasets remains an open problem.
  • 32. Outline of Talk. Accepted at the International School & Conference on Network Science (NetSciX 2020), Tokyo, Japan.
  • 33. Proposed: NeXLink Framework. Can we obtain effective node representations such that the node embeddings of users belonging to Cross-Network Linkages (CNLs) are closer in the embedding space than those of other nodes? [Figure: input and output of NeXLink]
  • 34. More Formally. The goal of the embedding function is to transform each user identity $u_i^X$ and $u_j^Y$ into low-dimensional vectors $z_i^X$ and $z_j^Y$ of size $d$ such that, if $u_i^X$ and $u_j^Y$ belong to the same person, their embedding vectors $z_i^X$ and $z_j^Y$ are close in the embedding space; otherwise, they are far apart.
  • 35. NeXLink Framework. (1) Structural similarities of nodes within their respective networks are preserved. (2) Similarities of nodes across the two networks are preserved based on the common friendship relation.
  • 36. Local Node Embeddings*. The joint probability of $u_i^X$ and $u_k^X$ is expressed in terms of their embedding vectors $z_i^X$ and $z_k^X$. The empirical probability between $u_i^X$ and $u_k^X$ within the same network is defined by their normalized edge weights. Optimization: minimize the KL divergence between these two distributions. (*LINE algorithm: Tang et al.)
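The slide's equations did not survive extraction. For reference, LINE's first-order proximity (Tang et al.) defines the joint probability, the empirical probability, and the KL-based objective as shown below, rewritten in this deck's notation. Here $w_{ik}$ is the weight of edge $(u_i^X, u_k^X)$ in the edge set $E^X$ of network X. It is an assumption that NeXLink uses exactly this first-order form rather than, for example, LINE's second-order proximity.

```latex
p_1(u_i^X, u_k^X) = \frac{1}{1 + \exp\!\left(-{z_i^X}^{\top} z_k^X\right)},
\qquad
\hat{p}_1(u_i^X, u_k^X) = \frac{w_{ik}}{W}, \quad W = \sum_{(i,k) \in E^X} w_{ik}

% Minimizing KL(\hat{p}_1 \| p_1) and dropping constants gives
O_1 = -\sum_{(i,k) \in E^X} w_{ik} \log p_1(u_i^X, u_k^X)
```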
  • 37. Global Node Embeddings. To construct global node embeddings, we build a global graph $G$ as follows: $V(G) = V^X \cup V^Y$ and $E(G) = \mathrm{CNL} \cup \mathrm{NCNL}$. Positive edge generation (CNL): linked identity pairs belonging to the same person across social networks. Negative edge generation (NCNL, Non-Cross-Network Links): for every node pair $(u_i^X, u_j^Y)$, we perform a random walk of length $t$ starting at node $u_i^X$ and add $(u_i^X, u_k^Y)$ to NCNL if $u_k^Y$ appears in the random walk.
  • 38. Global Node Embeddings. To learn node embeddings, we perform biased random walks (node2vec*) guided by the common friends (CF) metric, which determines the transition probability between nodes. (*node2vec algorithm: Grover et al.)
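The exact transition-probability formula is missing from the extracted slide. The sketch below assumes a simple form. From node u, the walk moves to neighbour v with probability proportional to CF(u, v) + 1, where CF(u, v) counts common friends and the +1 keeps neighbours without common friends reachable. It omits node2vec's return and in-out parameters (p, q). It is therefore a CF-weighted random walk in the spirit of node2vec, not node2vec itself.

```python
import random
from typing import Optional

def common_friends(graph: dict[str, set[str]], u: str, v: str) -> int:
    """Number of neighbours shared by u and v."""
    return len(graph[u] & graph[v])

def cf_biased_walk(graph: dict[str, set[str]], start: str, length: int,
                   rng: Optional[random.Random] = None) -> list[str]:
    """Random walk whose next step favours neighbours with many common friends.

    Assumed transition rule: P(u -> v) is proportional to CF(u, v) + 1.
    """
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        u = walk[-1]
        neighbours = sorted(graph[u])  # sorted so a seeded walk is reproducible
        if not neighbours:
            break  # dead end: stop the walk early
        weights = [common_friends(graph, u, v) + 1 for v in neighbours]
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk
```

As in node2vec, walks like these would then be fed to a skip-gram model to learn the node embeddings.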
  • 39. Datasets. We evaluated the NeXLink framework on two datasets. Augmented dataset: two sub-graphs sampled from a large Facebook friendship network comprising 63,713 nodes and 817,090 edges (Man et al.). Real-world dataset: Twitter (5,120 users and 130,575 edges) and Instagram (5,313 users and 54,233 edges) with 1,288 common users (Kong et al.).
  • 40. Evaluation Metric. For a given node $u_i^X$, our goal is to find the node $u_j^Y$ that belongs to the same person. We therefore count a hit if $z_j^Y$ is among the top-k node embeddings ranked by cosine similarity to $z_i^X$.
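The hit-based metric can be written as Hit@k over the ground-truth linked pairs. The sketch below makes two assumptions. Embeddings are stored as plain dictionaries from node id to vector. Every node of network Y is ranked by cosine similarity to the query node of network X. The data layout and function names are hypothetical.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hit_at_k(z_x: dict[str, list[float]], z_y: dict[str, list[float]],
             ground_truth: dict[str, str], k: int) -> float:
    """Fraction of linked pairs whose true match ranks in the top k.

    ground_truth maps a node of X to its linked node of Y.
    """
    hits = 0
    for i, j in ground_truth.items():
        # Rank every node of Y by similarity to the query node i.
        ranked = sorted(z_y, key=lambda y: cosine(z_x[i], z_y[y]), reverse=True)
        if j in ranked[:k]:
            hits += 1
    return hits / len(ground_truth)
```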
  • 41. Evaluation: Comparison with Other Approaches. We compare our proposed NeXLink (LINE-node2vec) framework with two other approaches. IONE: Input-Output Network Embedding, for the task of network alignment. REGAL: Representation Learning-based Graph Alignment.
  • 42. NeXLink Framework: Conclusion. A node representation learning-based approach can be proposed to effectively learn embedding vectors for extracting linked user identities.
  • 43. Outline of Talk. Accepted at the 9th International Conference on Social Informatics (SocInfo 2017), University of Oxford, Oxford, UK.
  • 44. Linkability Nudge. Can we help users control the linkability of their identities across social networks? We design and implement a linkability nudge, a gentle intervention that helps users make an informed decision. The user sets a range for the linkability threshold (score) of each identity pair (dynamic web portal). Whenever user behavior goes beyond the pre-configured range, the user is nudged (web browser extension).
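The trigger logic described above reduces to a range check. The sketch below is minimal and assumes the linkability score of an identity pair has already been computed and lies in [0, 1]; this slide gives neither the scoring model nor the scale. `LinkabilityRange` and `should_nudge` are hypothetical names, not the extension's actual code.

```python
from dataclasses import dataclass

@dataclass
class LinkabilityRange:
    """User-chosen acceptable range of linkability scores for one identity pair."""
    low: float
    high: float

    def __post_init__(self) -> None:
        # Assumed score scale of [0, 1].
        if not 0.0 <= self.low <= self.high <= 1.0:
            raise ValueError("expected 0 <= low <= high <= 1")

def should_nudge(score: float, allowed: LinkabilityRange) -> bool:
    """Nudge when the score after a user action leaves the configured range."""
    return not (allowed.low <= score <= allowed.high)
```

In the browser-extension setting, the score would be recomputed for a pending post or profile edit, and a warning shown only when `should_nudge` returns True.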
  • 46. Linkability Score Displayed to User.
  • 49. Nudge Evaluation. Controlled lab experiment with a control period and a treatment period. Participants were recruited and asked to perform tasks related to making a post and changing their profile attributes. We observed the impact of the linkability nudge on participants.
  • 50. Nudge Evaluation. [Figure axes: minutes since the start of the experiment; participants]
  • 51. Outline of Talk. Accepted at the 7th International Conference on Mining Intelligence & Knowledge Exploration (MIKE 2019), NIT Goa.
  • 52. Clone Detection. Clone: a user identity that looks similar to the victim identity within the same social network.
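Applying a UIL-style similarity within one network gives a simple clone screen: flag every other account whose profile closely resembles the victim's. The sketch below makes three assumptions. Profiles are dictionaries with `username` and `display_name` keys. Similarity is the mean `difflib.SequenceMatcher` ratio of the two fields. The 0.8 threshold is an arbitrary illustrative choice. This is not the study's clone detector.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def clone_candidates(victim: dict[str, str], accounts: list[dict[str, str]],
                     threshold: float = 0.8) -> list[str]:
    """Usernames of same-network accounts whose profile closely resembles the victim's."""
    flagged = []
    for account in accounts:
        if account["username"] == victim["username"]:
            continue  # skip the victim's own account
        score = (similarity(victim["username"], account["username"])
                 + similarity(victim["display_name"], account["display_name"])) / 2
        if score >= threshold:
            flagged.append(account["username"])
    return flagged
```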
  • 53. Why detect clone identities?
  • 54. Contributions Summary. Performed a comparative analysis of data collection methods. Investigated biases in identity linkage datasets. Proposed a node embedding framework for user identity linkage. Helped users control the linkability of their identities across OSNs. Applied the UIL solution to detect clones and flag their behaviors.
  • 55. Limitations & Future Directions. Data collection is a challenge; other social media platforms, such as Goodreads and Strava, need to be explored. We employed situational testing to detect dataset biases; other methods from fairness algorithm studies need to be explored. Our NeXLink node embedding framework uses only network information; leveraging content and profile features could help. We performed a controlled lab study; the linkability nudge should next be deployed in field trials.
  • 56. Acknowledgements. PhD Advisor: Prof. PK. Monitoring Committee: Prof. Arun Balaji Buduru, Prof. Rajiv Ratn Shah. Co-authors and peers. Members of Precog. My family.