Motivation
Information loss measures
Experimental framework
Correlating GIL and SIL measures
Conclusions
Anonymizing graphs: measuring quality for clustering
Jordi Casas-Roma (1)
Jordi Herrera-Joancomartí (2)
Vicenç Torra (3)
(1) Universitat Oberta de Catalunya (UOC)
jcasasr@uoc.edu
(2) Universitat Autònoma de Barcelona (UAB)
jherrera@deic.uab.cat
(3) Artificial Intelligence Research Institute (IIIA)
Spanish National Research Council (CSIC)
vtorra@iiia.csic.es
UOC Research Showcase 2015. February 11, 2015
Overview
1 Motivation
2 Information loss measures
3 Experimental framework
4 Correlating GIL and SIL measures
5 Conclusions
Scenario
Release data to third parties
Preserve the privacy of users
Motivation
We observe...
There are several graph-mining tasks, and several methods to compute each task.
How can we evaluate the real utility of the anonymized data?
Question
Can we use generic graph metrics to predict the results of real graph-mining tasks?
Generic information loss (GIL)
Specific information loss (SIL)
Generic information loss measures (GIL)
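[Figure: Framework for evaluating generic information loss measures. The anonymization process p turns the original graph G into the anonymized graph $\widetilde{G}$; the same metric m is computed on both graphs, and the two results are compared as $m(G, \widetilde{G})$.]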
Generic information loss measures (GIL)
Network metrics
average distance (dist)
diameter (d)
harmonic mean of the shortest distance (h)
sub-graph centrality (SC)
transitivity (T)
edge intersection (EI)
clustering coefficient (C)
modularity (Q)
$$ m(G, \widetilde{G}) = \left| m(G) - m(\widetilde{G}) \right| \qquad (1) $$
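Eq. (1) reduces each network metric to one scalar per graph and takes the absolute difference. The sketch below is a minimal, dependency-free illustration under our own assumptions: graphs are undirected and stored as a dict mapping each vertex to the set of its neighbours, and average distance and diameter are taken over reachable pairs only. All function names are hypothetical; a graph library such as NetworkX offers tested equivalents.

```python
from collections import deque


def bfs_distances(g, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in g[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def average_distance(g):
    """Mean shortest-path length over all ordered pairs of reachable vertices."""
    total, pairs = 0, 0
    for s in g:
        for t, d in bfs_distances(g, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def diameter(g):
    """Largest shortest-path length between two reachable vertices."""
    return max((max(bfs_distances(g, s).values()) for s in g), default=0)


def transitivity(g):
    """Global transitivity: closed connected triples / all connected triples."""
    closed, triples = 0, 0
    for v, nbrs in g.items():
        nbr_list = list(nbrs)
        k = len(nbr_list)
        triples += k * (k - 1) // 2
        for i in range(k):
            for j in range(i + 1, k):
                if nbr_list[j] in g[nbr_list[i]]:
                    closed += 1
    return closed / triples if triples else 0.0


def gil(metric, g, g_anon):
    """Eq. (1): absolute difference of a graph-level metric on G and its anonymized copy."""
    return abs(metric(g) - metric(g_anon))


# A triangle with a pendant vertex, and a perturbed copy in which edge (1, 2)
# was replaced by edge (1, 3), turning the graph into a 4-cycle.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
g_anon = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
print(gil(average_distance, g, g_anon))  # 0.0: average distance is unchanged
print(gil(transitivity, g, g_anon))      # 0.6: every triangle was destroyed
```

The toy example hints at why several GIL measures are studied: a perturbation that leaves the average distance untouched can still erase all triangles.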
Generic information loss measures (GIL)
Spectral metrics
the largest eigenvalue of the adjacency matrix A (λ1)
the second smallest eigenvalue of the Laplacian matrix L (µ2)
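The spectral metrics are graph-level scalars. We assume they are compared with the absolute difference of Eq. (1), so the only new step is the eigenvalue itself. A hedged sketch for λ1 uses power iteration. Iterating on A + I instead of A is our own choice: it makes λ1 + 1 the eigenvalue of largest magnitude, which avoids the oscillation plain power iteration shows on bipartite graphs. µ2, the second-smallest eigenvalue of L = D − A, is easier to get from a dense symmetric eigensolver such as `numpy.linalg.eigvalsh` than from power iteration. The function name is hypothetical.

```python
import math


def largest_adjacency_eigenvalue(g, iters=1000, tol=1e-12):
    """Estimate lambda_1 of the adjacency matrix A by power iteration on A + I."""
    nodes = list(g)
    if not nodes:
        return 0.0
    x = {v: 1.0 for v in nodes}  # the all-ones start overlaps the Perron vector
    for _ in range(iters):
        # y = (A + I) x, then normalise to unit length
        y = {v: x[v] + sum(x[u] for u in g[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        y = {v: val / norm for v, val in y.items()}
        converged = max(abs(y[v] - x[v]) for v in nodes) < tol
        x = y
        if converged:
            break
    # Rayleigh quotient x^T A x / x^T x recovers the eigenvalue of A itself
    num = sum(x[v] * sum(x[u] for u in g[v]) for v in nodes)
    den = sum(val * val for val in x.values())
    return num / den


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(largest_adjacency_eigenvalue(star))  # close to sqrt(3) ≈ 1.7320508
```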
Vertex metrics
betweenness centrality (CB)
closeness centrality (CC)
degree centrality (CD)
$$ m(G, \widetilde{G}) = \frac{1}{n} \sum_{i=1}^{n} \left( m(v_i) - m(\widetilde{v}_i) \right)^2 \qquad (2) $$
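For vertex metrics, Eq. (2) averages the squared per-vertex differences. The sketch below assumes, as Eq. (2) implies, that both graphs share the same vertex set, so that $v_i$ and its counterpart $\widetilde{v}_i$ have the same identifier. It uses plain degree centrality and one common convention for closeness (r − 1 divided by the summed distance to the r − 1 reachable vertices). Graphs are dicts of neighbour sets, and all function names are hypothetical.

```python
from collections import deque


def bfs_distances(g, source):
    """Hop distances from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in g[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def degree_centrality(g):
    """Degree of each vertex divided by n - 1."""
    n = len(g)
    if n < 2:
        return {v: 0.0 for v in g}
    return {v: len(nbrs) / (n - 1) for v, nbrs in g.items()}


def closeness_centrality(g):
    """(r - 1) / sum of distances to the r - 1 vertices reachable from v."""
    result = {}
    for v in g:
        dist = bfs_distances(g, v)
        total = sum(dist.values())
        result[v] = (len(dist) - 1) / total if total else 0.0
    return result


def vertex_gil(centrality, g, g_anon):
    """Eq. (2): mean squared difference of a vertex metric over the n vertices."""
    m, m_anon = centrality(g), centrality(g_anon)
    return sum((m[v] - m_anon[v]) ** 2 for v in m) / len(m)


g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}       # triangle plus pendant
g_anon = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}  # perturbed into a 4-cycle
print(vertex_gil(degree_centrality, g, g_anon))     # 1/18 ≈ 0.0556
print(vertex_gil(closeness_centrality, g, g_anon))  # ≈ 0.02125
```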
Clustering-specific information loss measures (SIL)
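[Figure: Clustering-specific information loss. The anonymization process p turns G into $\widetilde{G}$; the same clustering method c is applied to both graphs, and the original clusters c(G) are compared with the perturbed clusters c($\widetilde{G}$) using the precision index.]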
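$$ \text{precision index}(G, \widetilde{G}) = \frac{1}{n} \sum_{v=1}^{n} \mathbf{1}\left[ l_{tc}(v) = l_{pc}(v) \right] \qquad (3) $$
where $l_{tc}(v)$ and $l_{pc}(v)$ are the labels of vertex v in the original (true) clusters c(G) and in the perturbed clusters c($\widetilde{G}$), and $\mathbf{1}[\cdot]$ equals 1 when the two labels agree and 0 otherwise.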
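Eq. (3) compares labels vertex by vertex, but the cluster identifiers returned by two clustering runs are arbitrary, so they have to be matched first. The slides do not say how. The sketch below assumes a greedy one-to-one matching that pairs perturbed and original clusters by largest vertex overlap. Label lists hold one label per vertex, in the same vertex order for both clusterings. The function names are hypothetical.

```python
from collections import Counter


def align_labels(true_labels, pert_labels):
    """Greedy one-to-one map from perturbed cluster ids to original cluster ids.

    Assumption: the pairs with the largest vertex overlap are matched first.
    """
    overlap = Counter(zip(pert_labels, true_labels))
    mapping, used = {}, set()
    for (p, t), _count in sorted(overlap.items(), key=lambda kv: -kv[1]):
        if p not in mapping and t not in used:
            mapping[p] = t
            used.add(t)
    return mapping


def precision_index(true_labels, pert_labels):
    """Eq. (3): fraction of vertices whose aligned perturbed label equals the original."""
    if len(true_labels) != len(pert_labels) or not true_labels:
        raise ValueError("need one label per vertex in both clusterings")
    mapping = align_labels(true_labels, pert_labels)
    hits = sum(1 for t, p in zip(true_labels, pert_labels) if mapping.get(p) == t)
    return hits / len(true_labels)


true_labels = [0, 0, 0, 1, 1, 1]  # c(G): one label per vertex
pert_labels = [1, 1, 0, 0, 0, 0]  # c(G~): ids swapped and vertex 2 moved
print(precision_index(true_labels, pert_labels))  # 5/6 ≈ 0.833
```

Perturbed clusters left without a partner (when the anonymized graph yields more clusters than the original) count as misses, which is one reasonable reading but not necessarily the authors' choice.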
Clustering-specific information loss measures (SIL)
Clustering algorithms
Markov Cluster Algorithm (MCL)
Algorithm of Girvan and Newman (Girvan-Newman or GN)
Fast greedy modularity optimization (Fastgreedy or FG)
Walktrap (WT)
Infomap (IM)
Multilevel (ML)
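The slides only name the clustering methods. As an illustration of one of them, here is a compact, unoptimised sketch of the Markov Cluster Algorithm (MCL). It alternates expansion (a matrix power) and inflation (an element-wise power followed by column renormalisation) on a column-stochastic random-walk matrix until the matrix stops changing. The pruning step of production MCL implementations is omitted. The defaults expansion = 2 and inflation = 2.0 are common choices, not values taken from the experiments, and the function names are hypothetical.

```python
def _normalize_columns(m):
    """Scale every column of the square matrix m to sum to 1 (in place)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def _matmul(a, b):
    """Plain O(n^3) product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(g, expansion=2, inflation=2.0, max_iter=100, tol=1e-9, eps=1e-6):
    """Markov Cluster Algorithm on an undirected graph given as adjacency sets."""
    nodes = sorted(g)
    n = len(nodes)
    if n == 0:
        return []
    idx = {v: i for i, v in enumerate(nodes)}
    # Column-stochastic random-walk matrix, with a self-loop on every vertex.
    m = [[0.0] * n for _ in range(n)]
    for v in nodes:
        m[idx[v]][idx[v]] = 1.0
        for u in g[v]:
            m[idx[u]][idx[v]] = 1.0
    _normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: spread flow along longer random walks.
        e = m
        for _step in range(expansion - 1):
            e = _matmul(e, m)
        # Inflation: boost strong flows, weaken weak ones, then renormalise.
        new = _normalize_columns([[x ** inflation for x in row] for row in e])
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Rows that still carry flow are attractors; their non-zero columns form a cluster.
    clusters = {frozenset(nodes[j] for j in range(n) if m[i][j] > eps)
                for i in range(n) if any(x > eps for x in m[i])}
    return sorted(sorted(c) for c in clusters)


two_triangles = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
print(mcl(two_triangles))
```

Converting the returned cluster lists into one label per vertex gives the input that Eq. (3) compares. On less clear-cut graphs, the result depends noticeably on the inflation parameter.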
Experimental framework
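[Figure: Experimental framework for testing the correlation between GIL and SIL. The original graph is anonymized by a perturbation process at levels from 1% to 25%. Graph assessment of each anonymized graph gives its GIL, clustering assessment gives its SIL, and the two are compared ("Are they equal?").]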
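The framework amounts to a loop over perturbation levels that records one GIL value and one SIL value per level. The slides do not specify the perturbation process, so the sketch below uses a hypothetical random edge switch as a stand-in: it deletes round(level × |E|) edges and adds the same number of vertex pairs that were not edges. The 1% step between levels is also an assumption. `perturb`, `run_experiment` and the toy measure are illustrative names, not the authors' code.

```python
import random


def perturb(g, fraction, rng):
    """Hypothetical random edge switch: delete round(fraction * |E|) edges and
    add the same number of vertex pairs that were not edges in g."""
    nodes = sorted(g)
    edges = sorted({tuple(sorted((u, v))) for u in g for v in g[u]})
    non_edges = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
                 if v not in g[u]]
    k = min(round(fraction * len(edges)), len(non_edges))
    anon = {v: set(nbrs) for v, nbrs in g.items()}
    for u, v in rng.sample(edges, k):
        anon[u].discard(v)
        anon[v].discard(u)
    for u, v in rng.sample(non_edges, k):
        anon[u].add(v)
        anon[v].add(u)
    return anon


def run_experiment(g, gil_measure, sil_measure, levels, seed=0):
    """Anonymise g at each perturbation level and record one GIL and one SIL value."""
    rng = random.Random(seed)
    gil_series, sil_series = [], []
    for level in levels:
        g_anon = perturb(g, level, rng)
        gil_series.append(gil_measure(g, g_anon))
        sil_series.append(sil_measure(g, g_anon))
    return gil_series, sil_series


LEVELS = [p / 100 for p in range(1, 26)]  # assumed 1% steps from 1% to 25%


def edges_lost(a, b):
    """Toy GIL: number of edges of a that are missing from b."""
    ea = {tuple(sorted((u, v))) for u in a for v in a[u]}
    eb = {tuple(sorted((u, v))) for u in b for v in b[u]}
    return len(ea - eb)


ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}
gil_series, _ = run_experiment(ring, edges_lost, lambda a, b: 0.0, LEVELS)
print(gil_series[-1])  # 5: 25% of the ring's 20 edges were switched
```

In a real run, `gil_measure` would be one of the generic measures and `sil_measure` the precision index of a clustering method; the two resulting series are what the Pearson correlations on the following slides summarise.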
GIL Self-correlation
SIL Self-correlation
GIL vs. SIL
Comparing datasets
GIL Self-correlation
Do the generic information loss measures behave similarly regardless of the dataset?
Pearson  dist  d      CB    CC    CD    EI    C     T     λ1    µ2
r        0.85  0.15   0.96  0.90  0.99  0.99  0.97  0.94  0.24  0.09
p-value  0     0.007  0     0     0     0     0     0     0     0.006
Pearson self-correlation coefficient (r) and associated p-value for each GIL measure.
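The r values in these tables are Pearson correlation coefficients. One way to read "self-correlation" (our assumption, the slides do not spell it out) is to correlate the values of a single measure obtained on two datasets at the same perturbation levels. A minimal sketch of r and of the t statistic behind its p-value follows. The function names are hypothetical. In practice, `scipy.stats.pearsonr` returns both r and the p-value, and `statistics.correlation` (Python 3.10+) returns r.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length numeric series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Statistic for H0: r = 0; compare with Student's t on n - 2 degrees of freedom."""
    if abs(r) >= 1:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical GIL values of one measure on two datasets at the same levels.
gil_dataset_a = [1, 2, 3, 4]
gil_dataset_b = [1, 3, 2, 4]
r = pearson(gil_dataset_a, gil_dataset_b)
print(r)  # 0.8
```

A p-value below 0.05 means the observed r would be unlikely if the true correlation were zero. It says nothing about how strong the correlation is, which is why the d and µ2 columns can be significant yet weak.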
SIL self-correlation
Do the clustering-specific information loss measures behave similarly regardless of the dataset?
Pearson  MCL    IM     ML     GN     FG     WT
r        0.287  0.626  0.777  0.828  0.782  0.656
p-value  0      0      0      0      0      0
Pearson self-correlation coefficient (r) and associated p-value of the precision index for each clustering algorithm.
GIL vs. SIL
Are GIL and SIL measures correlated?
Pearson MCL IM ML GN FG WT µ
dist 0.580 0.716 0.807 0.785 0.747 0.755 0.732
d 0.201 0.101 * 0.098 * 0.134 0.218 0.014 * 0.128
CB 0.559 0.687 0.854 0.865 0.831 0.724 0.753
CC 0.667 0.833 0.903 0.909 0.874 0.899 0.848
CD 0.296 0.380 0.416 0.504 0.481 0.457 0.422
EI 0.581 0.820 0.861 0.887 0.814 0.748 0.785
C 0.614 0.833 0.889 0.909 0.836 0.802 0.814
T 0.557 0.763 0.840 0.840 0.770 0.690 0.743
λ1 0.191 0.482 0.509 0.546 0.529 0.397 0.442
µ2 0.086 * 0.152 0.131 0.154 0.135 0.040 * 0.116
µ 0.433 0.577 0.631 0.653 0.624 0.553 NA
Pearson correlation values (r) and their average values (µ). An asterisk marks p-values ≥ 0.05, i.e., results that are not statistically significant.
Aggregated GIL vs. SIL
Can we use more than one GIL measure to improve correlation?
Num.  GIL measures             R²     σ
1     CC                       0.725  0.146
2     CB + CC                  0.742  0.150
3     CB + CC + EI             0.765  0.155
4     d + CB + CC + EI         0.777  0.127
5     dist + d + CB + CC + EI  0.787  0.117
Multivariate regression analysis: R² (r-square) indicates the aggregate correlation, and σ is the standard deviation.
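The slides do not give the regression details. The sketch below assumes an ordinary least-squares fit of a SIL value on several GIL values plus an intercept, and reports the usual coefficient of determination R². To stay dependency-free, it solves the normal equations by Gauss-Jordan elimination. `numpy.linalg.lstsq` would be the more robust choice for real data. The function name is hypothetical.

```python
def ols_r_squared(features, target):
    """Fit target ~ b0 + b1*x1 + ... + bk*xk by least squares and return R^2.

    features: one row of predictor (e.g. GIL) values per observation.
    """
    X = [[1.0] + [float(v) for v in row] for row in features]  # intercept column
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y as an augmented matrix.
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, target)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        if abs(piv) < 1e-12:
            raise ValueError("singular design matrix (collinear predictors?)")
        aug[col] = [v / piv for v in aug[col]]
        for row in range(p):
            if row != col:
                f = aug[row][col]
                aug[row] = [a - f * b for a, b in zip(aug[row], aug[col])]
    beta = [aug[i][p] for i in range(p)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    if ss_tot == 0:
        raise ValueError("target is constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


single = ols_r_squared([[1], [2], [3], [4]], [1, 3, 2, 4])
print(round(single, 6))  # 0.64, i.e. r ** 2 for a single predictor
```

If the table reports ordinary in-sample R², note that for nested least-squares models with an intercept, R² cannot decrease when a predictor is added. Part of the small gain from 0.725 to 0.787 may therefore come from the extra predictors alone. Adjusted R² corrects for this.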
GIL vs. SIL
Are the results independent of the dataset to which they are applied?
Pearson Karate Football Jazz Flickr URV Email
µ 0.716 0.796 0.717 0.780 0.729
σ 0.247 0.119 0.170 0.184 0.163
Average Pearson correlation (µ) and standard deviation (σ) for each dataset.
Conclusions
Some measures behave similarly regardless of the dataset to which they are applied.
Some GIL measures are strongly correlated with SIL (ranked by average correlation µ):
1 closeness centrality
2 clustering coefficient
3 edge intersection
4 betweenness centrality
5 transitivity
6 average distance
Combining more than one metric yields slightly higher correlation values, at the cost of additional computation.
The End
Thanks for your attention
Jordi Casas-Roma UOC jcasasr@uoc.edu
Jordi Herrera-Joancomartí UAB jherrera@deic.uab.cat
Vicenç Torra IIIA-CSIC vtorra@iiia.csic.es
17 / 17

Contenu connexe

Similaire à Anonymizing Graphs: Measuring Quality for Clustering

Statistics_Regression_Project
Statistics_Regression_ProjectStatistics_Regression_Project
Statistics_Regression_ProjectAlekhya Bhupati
 
SEM on MIDUS Dataset
SEM on MIDUS DatasetSEM on MIDUS Dataset
SEM on MIDUS DatasetKan Yuenyong
 
Monte Carlo Simulation in Social Services
Monte Carlo Simulation in Social ServicesMonte Carlo Simulation in Social Services
Monte Carlo Simulation in Social ServicesArt Serna, MSMOT
 
Add slides
Add slidesAdd slides
Add slidesRupa D
 
Use of Definitive Screening Designs to Optimize an Analytical Method
Use of Definitive Screening Designs to Optimize an Analytical MethodUse of Definitive Screening Designs to Optimize an Analytical Method
Use of Definitive Screening Designs to Optimize an Analytical MethodPhilip Ramsey
 
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataBoosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataJay (Jianqiang) Wang
 
665 Sessions13-14-stats data vis-s13
665 Sessions13-14-stats data vis-s13665 Sessions13-14-stats data vis-s13
665 Sessions13-14-stats data vis-s13Diane Nahl
 

Similaire à Anonymizing Graphs: Measuring Quality for Clustering (8)

Statistics_Regression_Project
Statistics_Regression_ProjectStatistics_Regression_Project
Statistics_Regression_Project
 
SEM on MIDUS Dataset
SEM on MIDUS DatasetSEM on MIDUS Dataset
SEM on MIDUS Dataset
 
Monte Carlo Simulation in Social Services
Monte Carlo Simulation in Social ServicesMonte Carlo Simulation in Social Services
Monte Carlo Simulation in Social Services
 
Add slides
Add slidesAdd slides
Add slides
 
Use of Definitive Screening Designs to Optimize an Analytical Method
Use of Definitive Screening Designs to Optimize an Analytical MethodUse of Definitive Screening Designs to Optimize an Analytical Method
Use of Definitive Screening Designs to Optimize an Analytical Method
 
Presentation eng
Presentation engPresentation eng
Presentation eng
 
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market DataBoosted Tree-based Multinomial Logit Model for Aggregated Market Data
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data
 
665 Sessions13-14-stats data vis-s13
665 Sessions13-14-stats data vis-s13665 Sessions13-14-stats data vis-s13
665 Sessions13-14-stats data vis-s13
 

Plus de UOC Universitat Oberta de Catalunya

Mobile ensembles: The uses of mobile phones for social protest by Spain’s ind...
Mobile ensembles: The uses of mobile phones for social protest by Spain’s ind...Mobile ensembles: The uses of mobile phones for social protest by Spain’s ind...
Mobile ensembles: The uses of mobile phones for social protest by Spain’s ind...UOC Universitat Oberta de Catalunya
 
El principio de integridad en la contratación pública. Mecanismos para la pre...
El principio de integridad en la contratación pública. Mecanismos para la pre...El principio de integridad en la contratación pública. Mecanismos para la pre...
El principio de integridad en la contratación pública. Mecanismos para la pre...UOC Universitat Oberta de Catalunya
 
Smart contradictions: The politics of making Barcelona a Self-sufficient city
Smart contradictions: The politics of making Barcelona a Self-sufficient citySmart contradictions: The politics of making Barcelona a Self-sufficient city
Smart contradictions: The politics of making Barcelona a Self-sufficient cityUOC Universitat Oberta de Catalunya
 
Gender Stereotypes and Attitudes Towards Information and Communication Techno...
Gender Stereotypes and Attitudes Towards Information and Communication Techno...Gender Stereotypes and Attitudes Towards Information and Communication Techno...
Gender Stereotypes and Attitudes Towards Information and Communication Techno...UOC Universitat Oberta de Catalunya
 
The end of scarcity? Water desalination as the new cornucopia for Mediterrane...
The end of scarcity? Water desalination as the new cornucopia for Mediterrane...The end of scarcity? Water desalination as the new cornucopia for Mediterrane...
The end of scarcity? Water desalination as the new cornucopia for Mediterrane...UOC Universitat Oberta de Catalunya
 
Urban Ecology Under Fire: Water Supply in Madrid During the Spanish Civil War...
Urban Ecology Under Fire: Water Supply in Madrid During the Spanish Civil War...Urban Ecology Under Fire: Water Supply in Madrid During the Spanish Civil War...
Urban Ecology Under Fire: Water Supply in Madrid During the Spanish Civil War...UOC Universitat Oberta de Catalunya
 
Little arrangements that matter. Rethinking autonomy-enabling innovations for...
Little arrangements that matter. Rethinking autonomy-enabling innovations for...Little arrangements that matter. Rethinking autonomy-enabling innovations for...
Little arrangements that matter. Rethinking autonomy-enabling innovations for...UOC Universitat Oberta de Catalunya
 
La construcción colaborativa de proyectos como metodología para adquirir comp...
La construcción colaborativa de proyectos como metodología para adquirir comp...La construcción colaborativa de proyectos como metodología para adquirir comp...
La construcción colaborativa de proyectos como metodología para adquirir comp...UOC Universitat Oberta de Catalunya
 
What leads people to keep on e-learning? An empirical analysis of users’ expe...
What leads people to keep on e-learning? An empirical analysis of users’ expe...What leads people to keep on e-learning? An empirical analysis of users’ expe...
What leads people to keep on e-learning? An empirical analysis of users’ expe...UOC Universitat Oberta de Catalunya
 
Rethinking dropout in online higher education: The case of the Universitat Ob...
Rethinking dropout in online higher education: The case of the Universitat Ob...Rethinking dropout in online higher education: The case of the Universitat Ob...
Rethinking dropout in online higher education: The case of the Universitat Ob...UOC Universitat Oberta de Catalunya
 
Framework for preserving security and privacy in peer-to-peer content distrib...
Framework for preserving security and privacy in peer-to-peer content distrib...Framework for preserving security and privacy in peer-to-peer content distrib...
Framework for preserving security and privacy in peer-to-peer content distrib...UOC Universitat Oberta de Catalunya
 
Routing Fleets with Multiple Driving Ranges: is it possible to use greener fl...
Routing Fleets with Multiple Driving Ranges: is it possible to use greener fl...Routing Fleets with Multiple Driving Ranges: is it possible to use greener fl...
Routing Fleets with Multiple Driving Ranges: is it possible to use greener fl...UOC Universitat Oberta de Catalunya
 
On the verification of UML/OCL class diagrams using constraint programming
On the verification of UML/OCL class diagrams using constraint programmingOn the verification of UML/OCL class diagrams using constraint programming
On the verification of UML/OCL class diagrams using constraint programmingUOC Universitat Oberta de Catalunya
 
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...UOC Universitat Oberta de Catalunya
 

Plus de UOC Universitat Oberta de Catalunya (20)

Irrupción de la FP en línea
Irrupción de la FP en líneaIrrupción de la FP en línea
Irrupción de la FP en línea
 
Irrupció de la FP en línia
Irrupció de la FP en líniaIrrupció de la FP en línia
Irrupció de la FP en línia
 
Mobile ensembles: The uses of mobile phones for social protest by Spain’s ind...
Mobile ensembles: The uses of mobile phones for social protest by Spain’s ind...Mobile ensembles: The uses of mobile phones for social protest by Spain’s ind...
Mobile ensembles: The uses of mobile phones for social protest by Spain’s ind...
 
El principio de integridad en la contratación pública. Mecanismos para la pre...
El principio de integridad en la contratación pública. Mecanismos para la pre...El principio de integridad en la contratación pública. Mecanismos para la pre...
El principio de integridad en la contratación pública. Mecanismos para la pre...
 
Smart contradictions: The politics of making Barcelona a Self-sufficient city
Smart contradictions: The politics of making Barcelona a Self-sufficient citySmart contradictions: The politics of making Barcelona a Self-sufficient city
Smart contradictions: The politics of making Barcelona a Self-sufficient city
 
Gender Stereotypes and Attitudes Towards Information and Communication Techno...
Gender Stereotypes and Attitudes Towards Information and Communication Techno...Gender Stereotypes and Attitudes Towards Information and Communication Techno...
Gender Stereotypes and Attitudes Towards Information and Communication Techno...
 
The end of scarcity? Water desalination as the new cornucopia for Mediterrane...
The end of scarcity? Water desalination as the new cornucopia for Mediterrane...The end of scarcity? Water desalination as the new cornucopia for Mediterrane...
The end of scarcity? Water desalination as the new cornucopia for Mediterrane...
 
Urban Ecology Under Fire: Water Supply in Madrid During the Spanish Civil War...
Urban Ecology Under Fire: Water Supply in Madrid During the Spanish Civil War...Urban Ecology Under Fire: Water Supply in Madrid During the Spanish Civil War...
Urban Ecology Under Fire: Water Supply in Madrid During the Spanish Civil War...
 
Little arrangements that matter. Rethinking autonomy-enabling innovations for...
Little arrangements that matter. Rethinking autonomy-enabling innovations for...Little arrangements that matter. Rethinking autonomy-enabling innovations for...
Little arrangements that matter. Rethinking autonomy-enabling innovations for...
 
La construcción colaborativa de proyectos como metodología para adquirir comp...
La construcción colaborativa de proyectos como metodología para adquirir comp...La construcción colaborativa de proyectos como metodología para adquirir comp...
La construcción colaborativa de proyectos como metodología para adquirir comp...
 
What leads people to keep on e-learning? An empirical analysis of users’ expe...
What leads people to keep on e-learning? An empirical analysis of users’ expe...What leads people to keep on e-learning? An empirical analysis of users’ expe...
What leads people to keep on e-learning? An empirical analysis of users’ expe...
 
Rethinking dropout in online higher education: The case of the Universitat Ob...
Rethinking dropout in online higher education: The case of the Universitat Ob...Rethinking dropout in online higher education: The case of the Universitat Ob...
Rethinking dropout in online higher education: The case of the Universitat Ob...
 
Framework for preserving security and privacy in peer-to-peer content distrib...
Framework for preserving security and privacy in peer-to-peer content distrib...Framework for preserving security and privacy in peer-to-peer content distrib...
Framework for preserving security and privacy in peer-to-peer content distrib...
 
Automated Prediction of Preferences Using Facial Expressions
Automated Prediction of Preferences Using Facial ExpressionsAutomated Prediction of Preferences Using Facial Expressions
Automated Prediction of Preferences Using Facial Expressions
 
Routing Fleets with Multiple Driving Ranges: is it possible to use greener fl...
Routing Fleets with Multiple Driving Ranges: is it possible to use greener fl...Routing Fleets with Multiple Driving Ranges: is it possible to use greener fl...
Routing Fleets with Multiple Driving Ranges: is it possible to use greener fl...
 
On the verification of UML/OCL class diagrams using constraint programming
On the verification of UML/OCL class diagrams using constraint programmingOn the verification of UML/OCL class diagrams using constraint programming
On the verification of UML/OCL class diagrams using constraint programming
 
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
Experiences in Digital Circuit Design Courses: A Self-Study Platform for Lear...
 
Memòria de la UOC. Curs 2013 - 2014 0
Memòria de la UOC. Curs 2013 - 2014 0Memòria de la UOC. Curs 2013 - 2014 0
Memòria de la UOC. Curs 2013 - 2014 0
 
Consells per a comprar tecnologia
Consells per a comprar tecnologiaConsells per a comprar tecnologia
Consells per a comprar tecnologia
 
Eines 2.0 per comunicar l'activitat d'R+D+I, per Xavier Lasauca
Eines 2.0 per comunicar l'activitat d'R+D+I, per Xavier LasaucaEines 2.0 per comunicar l'activitat d'R+D+I, per Xavier Lasauca
Eines 2.0 per comunicar l'activitat d'R+D+I, per Xavier Lasauca
 

Dernier

VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
tonesoftg
tonesoftgtonesoftg
tonesoftglanshi9
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 

Dernier (20)

VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 

Anonymizing Graphs: Measuring Quality for Clustering

  • 1. Motivation Information loss measures Experimental framework Correlating GIL and SIL measures Conclusions Anonymizing graphs: measuring quality for clustering Jordi Casas-Roma 1 Jordi Herrera-Joancomart´ı 2 Vicen¸c Torra 3 1 Universitat Oberta de Catalunya (UOC) jcasasr@uoc.edu 2 Universitat Aut`onoma de Barcelona (UAB) jherrera@deic.uab.cat 3 Artificial Intelligence Research Institute (IIIA) Spanish National Research Council (CSIC) vtorra@iiia.csic.es UOC Research Showcase 2015. February 11, 2015 1 / 17
  • 2. Overview: 1. Motivation; 2. Information loss measures; 3. Experimental framework; 4. Correlating GIL and SIL measures; 5. Conclusions.
  • 3. Scenario: release data to third parties while preserving the privacy of users.
  • 4. Motivation. We observe that there are several graph-mining tasks, and several methods to compute each task. How can we evaluate the real utility of the anonymized data? Question: can we use generic graph metrics to predict the results of real graph-mining tasks?
  • 5. Generic information loss (GIL) | Specific information loss (SIL). Generic information loss measures (GIL). [Figure: Framework for evaluating generic information loss measures. The anonymization process p turns the original graph G into the anonymized graph G̃; metric m is computed on both graphs and the results are compared as m(G, G̃).]
  • 6. Generic information loss measures (GIL). Network metrics: average distance (dist), diameter (d), harmonic mean of the shortest distance (h), sub-graph centrality (SC), transitivity (T), edge intersection (EI), clustering coefficient (C) and modularity (Q). For these metrics the information loss is
    $m(G, \widetilde{G}) = \left| m(G) - m(\widetilde{G}) \right|$   (1)
  • 7. Generic information loss measures (GIL). Spectral metrics: the largest eigenvalue of the adjacency matrix A (λ_1) and the second smallest eigenvalue of the Laplacian matrix L (µ_2). Vertex metrics: betweenness centrality (C_B), closeness centrality (C_C) and degree centrality (C_D). For vertex metrics the information loss is
    $m(G, \widetilde{G}) = \frac{1}{n} \sum_{i=1}^{n} \left( m(v_i) - m(\widetilde{v}_i) \right)^2$   (2)
    where n is the number of vertices and $\widetilde{v}_i$ is vertex $v_i$ in the anonymized graph $\widetilde{G}$.
  • 8. Clustering-specific information loss measures (SIL). [Figure: The clustering method c is applied both to the original graph G and to the anonymized graph G̃ produced by the anonymization process p; the original clusters c(G) and the perturbed clusters c(G̃) are compared with the precision index.]
    $\text{precision index}(G, \widetilde{G}) = \frac{1}{n} \sum_{v=1}^{n} \mathbb{1}\left[ l_{tc}(v) = l_{pc}(v) \right]$   (3)
    where $l_{tc}(v)$ and $l_{pc}(v)$ are the cluster labels of vertex v in c(G) and c(G̃), respectively, so the index is the fraction of vertices whose cluster is preserved.
  • 9. Clustering-specific information loss measures (SIL). Clustering algorithms: Markov Cluster Algorithm (MCL), algorithm of Girvan and Newman (Girvan-Newman or GN), fast greedy modularity optimization (Fastgreedy or FG), Walktrap (WT), Infomap (IM) and Multilevel (ML).
  • 10. Experimental framework. [Figure: Experimental framework for testing the correlation between GIL and SIL. A perturbation process produces anonymized versions of the original graph from 1% to 25% perturbation; each version undergoes graph assessment (GIL) and clustering assessment (SIL), and the two assessments are compared to see whether they behave the same way.]
  • 11. GIL self-correlation | SIL self-correlation | GIL vs. SIL | Comparing datasets. GIL self-correlation: do the generic information loss measures behave in a similar way independently of the dataset?
    | Pearson | dist | d | C_B | C_C | C_D | EI | C | T | λ_1 | µ_2 |
    |---|---|---|---|---|---|---|---|---|---|---|
    | r | 0.85 | 0.15 | 0.96 | 0.90 | 0.99 | 0.99 | 0.97 | 0.94 | 0.24 | 0.09 |
    | p-value | 0 | 0.007 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.006 |
    Pearson self-correlation value (r) and its associated p-value for each GIL measure.
  • 12. SIL self-correlation: do the clustering-specific information loss measures behave in a similar way independently of the dataset?
    | Pearson | MCL | IM | ML | GN | FG | WT |
    |---|---|---|---|---|---|---|
    | r | 0.287 | 0.626 | 0.777 | 0.828 | 0.782 | 0.656 |
    | p-value | 0 | 0 | 0 | 0 | 0 | 0 |
    Pearson self-correlation value (r) and its associated p-value for the precision index of each clustering algorithm.
  • 13. GIL vs. SIL: are GIL and SIL measures correlated?
    | Pearson | MCL | IM | ML | GN | FG | WT | µ |
    |---|---|---|---|---|---|---|---|
    | dist | 0.580 | 0.716 | 0.807 | 0.785 | 0.747 | 0.755 | 0.732 |
    | d | 0.201 | 0.101* | 0.098* | 0.134 | 0.218 | 0.014* | 0.128 |
    | C_B | 0.559 | 0.687 | 0.854 | 0.865 | 0.831 | 0.724 | 0.753 |
    | C_C | 0.667 | 0.833 | 0.903 | 0.909 | 0.874 | 0.899 | 0.848 |
    | C_D | 0.296 | 0.380 | 0.416 | 0.504 | 0.481 | 0.457 | 0.422 |
    | EI | 0.581 | 0.820 | 0.861 | 0.887 | 0.814 | 0.748 | 0.785 |
    | C | 0.614 | 0.833 | 0.889 | 0.909 | 0.836 | 0.802 | 0.814 |
    | T | 0.557 | 0.763 | 0.840 | 0.840 | 0.770 | 0.690 | 0.743 |
    | λ_1 | 0.191 | 0.482 | 0.509 | 0.546 | 0.529 | 0.397 | 0.442 |
    | µ_2 | 0.086* | 0.152 | 0.131 | 0.154 | 0.135 | 0.040* | 0.116 |
    | µ | 0.433 | 0.577 | 0.631 | 0.653 | 0.624 | 0.553 | NA |
    Pearson correlation values (r) and their average values (µ). An asterisk marks p-values ≥ 0.05, i.e., results that are not statistically significant.
  • 14. Aggregated GIL vs. SIL: can we use more than one GIL measure to improve the correlation?
    | Num. | GIL measures | R² | σ |
    |---|---|---|---|
    | 1 | C_C | 0.725 | 0.146 |
    | 2 | C_B + C_C | 0.742 | 0.150 |
    | 3 | C_B + C_C + EI | 0.765 | 0.155 |
    | 4 | d + C_B + C_C + EI | 0.777 | 0.127 |
    | 5 | dist + d + C_B + C_C + EI | 0.787 | 0.117 |
    Multivariate regression analysis: R² indicates the aggregate correlation and σ is the standard deviation.
  • 15. GIL vs. SIL: are the results independent of the data to which they are applied?
    | Pearson | Karate | Football | Jazz | Flickr | URV Email |
    |---|---|---|---|---|---|
    | µ | 0.716 | 0.796 | 0.717 | 0.780 | 0.729 |
    | σ | 0.247 | 0.119 | 0.170 | 0.184 | 0.163 |
    Averaged Pearson correlation values (µ) and standard deviation (σ) for each dataset.
  • 16. Conclusions. Some measures behave in a similar way independently of the data to which they are applied. There is a strong correlation between some GIL and SIL measures, listed from highest to lowest average correlation: 1. closeness centrality, 2. clustering coefficient, 3. edge intersection, 4. betweenness centrality, 5. transitivity, 6. average distance. Considering more than one metric yields slightly higher correlation values, but at an additional computational cost.
  • 17. The End. Thanks for your attention. Jordi Casas-Roma, UOC, jcasasr@uoc.edu. Jordi Herrera-Joancomartí, UAB, jherrera@deic.uab.cat. Vicenç Torra, IIIA-CSIC, vtorra@iiia.csic.es.
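Equation (1) on slide 6 measures generic information loss for a graph-level metric as the absolute difference between its value on the original and on the anonymized graph. The following is a minimal Python sketch, assuming an undirected, connected graph stored as an adjacency dictionary and using average distance (dist) as the metric m. The helper names (`build_graph`, `average_distance`, `gil_scalar`) are hypothetical and not taken from the authors' code.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency dict {vertex: set(neighbours)} built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_distance(adj):
    """Mean shortest-path length over vertex pairs (assumption: connected graph)."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives hop distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    # Each unordered pair is counted twice in both sums, so the ratio is unchanged.
    return total / pairs


def gil_scalar(metric, g, g_anon):
    """Equation (1): |m(G) - m(G~)| for a graph-level metric m."""
    return abs(metric(g) - metric(g_anon))


path = build_graph([(0, 1), (1, 2), (2, 3)])
cycle = build_graph([(0, 1), (1, 2), (2, 3), (3, 0)])
print(gil_scalar(average_distance, path, cycle))
```

Closing the path 0-1-2-3 into a cycle lowers the average distance from 5/3 to 4/3, so the printed GIL value is (up to floating-point rounding) 1/3. Any other graph-level metric from slide 6 can be passed as `metric` in the same way.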
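Equation (2) on slide 7 averages the squared per-vertex difference of a vertex metric. The sketch below uses degree centrality (C_D), taken here as degree divided by n - 1, and assumes the anonymized graph keeps the same vertex identifiers so each v_i can be paired with its counterpart. The function names are hypothetical, chosen only for illustration.

```python
def build_graph(edges):
    """Undirected adjacency dict {vertex: set(neighbours)} built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Degree of each vertex divided by n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def gil_vertex(metric, g, g_anon):
    """Equation (2): mean squared per-vertex difference of a vertex metric.

    Assumption: g and g_anon share the same vertex identifiers, so vertex v
    in the original graph is compared with vertex v in the anonymized one.
    """
    m_orig, m_anon = metric(g), metric(g_anon)
    n = len(m_orig)
    return sum((m_orig[v] - m_anon[v]) ** 2 for v in m_orig) / n


path = build_graph([(0, 1), (1, 2), (2, 3)])
cycle = build_graph([(0, 1), (1, 2), (2, 3), (3, 0)])
print(gil_vertex(degree_centrality, path, cycle))
```

In this toy case only the two end vertices of the path change degree (from 1 to 2), so the result is (2 · (1/3)²) / 4 = 1/18. Betweenness or closeness centrality would plug in the same way as any function returning a per-vertex dictionary.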
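Equation (3) on slide 8 counts the fraction of vertices that keep their cluster after anonymization. Clustering algorithms assign arbitrary cluster identifiers, so the labels of c(G) and c(G̃) have to be matched before they can be compared. The slides do not say how this is done; the `match_labels` helper below is a hypothetical majority-overlap heuristic, not necessarily the authors' procedure.

```python
from collections import Counter, defaultdict


def match_labels(labels_orig, labels_anon):
    """Hypothetical helper: relabel each perturbed cluster with the original
    cluster id it shares the most vertices with (simple majority heuristic)."""
    overlap = defaultdict(Counter)
    for v, pc in labels_anon.items():
        overlap[pc][labels_orig[v]] += 1
    mapping = {pc: counts.most_common(1)[0][0] for pc, counts in overlap.items()}
    return {v: mapping[pc] for v, pc in labels_anon.items()}


def precision_index(labels_orig, labels_anon):
    """Equation (3): fraction of vertices whose (matched) cluster label is unchanged."""
    n = len(labels_orig)
    hits = sum(1 for v in labels_orig if labels_orig[v] == labels_anon[v])
    return hits / n


original = {"a": 0, "b": 0, "c": 1, "d": 1}
perturbed = {"a": "x", "b": "x", "c": "x", "d": "y"}
print(precision_index(original, match_labels(original, perturbed)))
```

Here cluster "x" is matched to original cluster 0 and "y" to cluster 1, so only vertex c changes cluster and the precision index is 3/4. A majority mapping can send two perturbed clusters to the same original label, so a one-to-one assignment may be preferable in practice.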
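Slide 10 describes a perturbation process that produces anonymized graphs from 1% to 25% perturbation, but does not name the anonymization method. As an assumed stand-in only, the sketch below uses random edge perturbation: it removes k randomly chosen edges and adds k random non-edges, with k a given fraction of the edge count. The function name and the seeding scheme are hypothetical.

```python
import random


def random_perturbation(edges, nodes, fraction, seed=None):
    """Assumed stand-in for the anonymization process p: delete `fraction`
    of the edges at random and add the same number of new random edges."""
    rng = random.Random(seed)
    # Normalise undirected edges to sorted tuples so (u, v) == (v, u).
    edge_set = {tuple(sorted(e)) for e in edges}
    k = round(fraction * len(edge_set))
    removed = rng.sample(sorted(edge_set), k)
    kept = edge_set - set(removed)
    nodes = list(nodes)
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    added = set()
    # Add k edges that were absent from the original graph, as long as
    # unused vertex pairs remain.
    while len(added) < k and len(edge_set) + len(added) < max_edges:
        u, v = rng.sample(nodes, 2)
        e = tuple(sorted((u, v)))
        if e not in edge_set and e not in added:
            added.add(e)
    return kept | added


path_edges = [(i, i + 1) for i in range(9)]
levels = {p: random_perturbation(path_edges, range(10), p / 100, seed=p) for p in (1, 5, 10, 25)}
print({p: len(e) for p, e in levels.items()})
```

The edge count is preserved whenever enough non-edges exist, which keeps the perturbed graphs comparable in size to the original. Each perturbed graph would then be fed to both the GIL and the SIL assessments.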
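Slides 11 to 15 report Pearson correlation coefficients (r), for example between GIL values and SIL values measured over the same series of perturbed graphs. For reference, a minimal sketch of Pearson's r in plain Python follows. The associated p-value requires Student's t distribution and is not computed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations share the 1/n factor, so it cancels.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

For these two short series the coefficient is 4 / 5 = 0.8. In the setting of slide 13, `xs` would hold one GIL measure and `ys` the corresponding SIL values, one pair per perturbed graph.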
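Slide 14 aggregates several GIL measures with a multivariate regression and reports R². The slides do not give the exact regression setup; the sketch below assumes an ordinary least-squares fit with an intercept, solved through the normal equations with a small Gaussian-elimination routine, and returns R² = 1 - SS_res / SS_tot. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(X, y):
    """R^2 of an OLS fit of y on the columns of X plus an intercept (assumed setup)."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # Normal equations: (A^T A) beta = A^T y.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(ata, aty)
    preds = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


print(r_squared([[1], [2], [3], [4]], [1, 3, 2, 4]))
```

With a single predictor, R² equals the square of Pearson's r (here 0.8² = 0.64), which is why the one-measure row of slide 14 is comparable to the correlations of slide 13. Each extra GIL measure becomes one more column of X.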