Scalable and Parallelizable Processing
of Influence Maximization
for Large-Scale Social Networks
Apr 9, 2013
Jinha Kim, Seung-Keol Kim, Hwanjo Yu
Pohang University of Science and Technology (POSTECH)
2
Goal
• Boosting Influence Maximization processing
by efficient influence evaluation
3
4
Viral Marketing
Influence Maximization Problem
Graph
Diffusion Model
Processing Algorithm
5
Word of Mouth Effect
...
...
...
7
A Marketer’s Perspective
...
...
... PERSUADE ONE!
Making Money!!!!
9
How to find in an
algorithmic way?
10
Viral Marketing
Influence Maximization Problem
Graph
Diffusion Model
Processing Algorithm
11
Quantifying Influence
The expected number of users influenced by S
12
Influence Maximization
Problem (KKT 03)
13
Viral Marketing
Influence Maximization Problem
Graph
Diffusion Model
Processing Algorithm
14
Abstracting Social
Networks
15
Abstracting Social
Network
u v
e
16
Viral Marketing
Influence Maximization Problem
Graph
Diffusion Model
Processing Algorithm
17
Quantifying Influence
The expected number of entities influenced by S
DEPENDS ON
how influence is propagated through a graph
19
SEEDS
Independent Cascade
(IC) model
active
inactive
t = 0
20
Independent
Cascade(IC) model
active at t = i
inactive
t = i + 1
active at t < i
21
Independent
Cascade(IC) model
inactive
active at t < j
t = j + 1
Propagation ends!!!
22
Viral Marketing
Influence Maximization Problem
Graph
Diffusion Model
Processing Algorithm
24
Processing Algorithm
Macro Level Processing
Micro Level Processing
25
Processing Algorithm
Macro Level Processing
Micro Level Processing
26
Macro Level (KKT 03)
• Finding the maximum from all possible seed-set combinations
• NP-hard: the set-covering problem reduces to it
27
Greedy Algorithm
(KKT 03)
• Repeatedly selects the node which gives
the largest marginal gain from the current seed set S
• σ({v}) and σ(S ∪ {v}) − σ(S) are two major
evaluation components
28
Processing Algorithm
Macro Level Processing
Micro Level Processing
29
Micro Level (CWW 10)
• Cannot count influence propagation routes
between two nodes
30
Evaluating σ(S)
• Monte-Carlo Simulation (KKT 03)
• Simultaneous simulation (CWY 09)
• Breaking down a graph into communities
(WCS 10)
• Shortest path between two nodes (KS 06)
• Local arborescence based on the most
probable path (CWW 10)
31
Processing Algorithm
Macro Level Processing
IPA
33
Intuition
• How about extremely localizing influence??
• Influence path between two nodes as
influence evaluation unit !!
• Considering all paths is not tractable
(#P-hard)
• Only considering meaningful influence
paths
36
Meaningful Influence
Path in IC model
v1 v2 v3 v4 v5
0.1 0.1 0.1 0.1
37
Traversing Graph
Graph
A traversal tree from a
38
Extracting Paths
A traversal tree from a
A path collection from a
39
Organizing Paths
A path collection from a
40
Approximating σ({v})
Influence of a node v
infl. of v to itself
Influence of a node v to u
41
Parallel evaluation
• To approximate σ({v}),
P_{v→V} is required
• For v ≠ u, P_{v→V} and P_{u→V} do
not have common paths
• Independent evaluation
of σ({v}) is guaranteed
[Figure: v1 with paths to u11, ..., u1n; v2 with paths to u21, ..., u2n]
42
Re-organizing
• Changing perspective from starting nodes
to ending nodes
43
Approximating σ(S ∪ {v}) − σ(S)
is not trivial
• σ({v}) ≠ σ(S ∪ {v}) − σ(S)
• Influence blocking!!!!
• v blocks a path from u ∈ S
• We should detect blocked (invalid) paths
[Figure: a path through u and v, before and after adding v as a seed]
44
Detecting influence
blocking
• Current seed set : S
• New seed node : v
• Valid Paths
[Figure: paths involving u and v]
45
Adding a seed node
46
Detect invalid paths
47
Approximating
σ(S ∪ {v}) − σ(S)
Marginal infl. of a node v
infl. of v to itself
Infl. of seeds S to a node v
Only consider valid paths
51
Empirical Evaluation
52
Dataset
53
Algorithms
• Monte-Carlo[Greedy] (LKG 07)
• PMIA (CWW 10)
• SD (single discount)
• Random (baseline)
• IPA
54
Finding Threshold
55
Processing Time
57
Influence
58
Influence
59
Influence
60
Parallelization Effect
61
Q & A
62
References
63
• KKT 03 : Kempe, D., Kleinberg, J., and Tardos, E. Maximizing
the spread of influence through a social network.
(KDD ’03)
• KS 06 : Kimura, M., and Saito, K. Tractable models for
information diffusion in social networks.
(PKDD ’06)
• LKG 07 : Leskovec, J., Krause, A., Guestrin, C., Faloutsos, C.,
VanBriesen, J., and Glance, N. Cost-effective outbreak
detection in networks.
(KDD ’07)
• CWY 09 : Chen, W., Wang, Y., and Yang, S. Efficient influence
maximization in social networks.
(KDD ’09)
64
• CWW 10 : Chen, W., Wang, C., and Wang, Y. Scalable influence
maximization for prevalent viral marketing in large-scale social
networks.
(KDD ’10)
• WCS 10 : Wang, Y., Cong, G., Song, G., and Xie, K. Community-based
greedy algorithm for mining top-k influential nodes in mobile social
networks.
(KDD ’10)
• JSC 11 : Jiang, Q., Song, G., and Cong, G. Simulated Annealing Based
Influence Maximization in Social Networks.
(AAAI ’11)
• LYK 12 : Lee, W., Kim, J., and Yu, H. CT-IC: Continuously activated
and Time-restricted Independent Cascade Model for Viral Marketing.
(ICDM ’12)


Editor's notes

  1. Hello, my name is Jinha Kim, and I will present our research on scalable and parallelizable processing of influence maximization for large-scale social networks. This is joint work with Seung-Keol Kim and my advisor, Hwanjo Yu.
  2. The goal of this work is to devise a method that efficiently evaluates influence, which is the most time-consuming part of the influence maximization problem.
  3. This diagram outlines this talk. First, how viral marketing exploits the social network is shown briefly. Then, to find the most effective users in viral marketing, the influence maximization problem is formulated. To concretize the problem, how social networks are abstracted as graphs and how influence is propagated throughout graphs are described briefly. Finally, how the influence maximization problem is solved using our method is described in detail.
  4. Let me show how viral marketing works in social networks.
  5. In social networks, a user’s opinion spreads throughout the network. For example, a Twitter user writes an impressive post, and his or her followers may retweet it as a sign of agreement. Then the followers of those followers may retweet it again, and this kind of chain reaction affects the whole network. This is called the ‘word of mouth’ effect.
  6. To exploit the word-of-mouth effect, marketers persuade some influential users and hope that their positive opinion spreads through the social network. This is how viral marketing works in social networks. Therefore, for the marketing to succeed, finding the most influential people is the most crucial task.
  7. Then an important question arises: how can we find such users in an algorithmic way?
  8. The question is formulated as the influence maximization problem.
  9. First, influence should be quantified. Given a subset of users S, the function σ(S) returns the expected number of users influenced by S. This is the quantified influence in a network.
  10. Then, the influence maximization problem is formulated as a combinatorial optimization expression.
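The optimization expression mentioned in note 10 is not preserved in the slide text. Following the standard formulation in KKT 03, it can be written as below, assuming a graph with node set V and a seed budget k (the symbols k and S* are notation chosen here, not taken from the slide):

```latex
S^{*} = \operatorname*{arg\,max}_{S \subseteq V,\; |S| = k} \sigma(S)
```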
  11. To define σ(S) concretely, a graph and an influence diffusion process should be modeled.
  12. A social network can be abstracted as a weighted directed graph.
  13. For example, suppose that on Facebook a user ‘v’ likes a post by his or her friend ‘u’. In the corresponding graph, users ‘u’ and ‘v’ become nodes, their friendship becomes an edge, and how much ‘v’ likes ‘u’’s posts becomes the weight of the edge.
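A minimal sketch of the abstraction in notes 12 and 13, assuming a nested-dictionary adjacency representation. The edge direction u → v (u influences v), the weight 0.3, and the helper name `edge_weight` are illustrative assumptions, not values from the talk.

```python
# Hypothetical toy network: graph[u][v] is the weight of the directed edge u -> v,
# read here as "how strongly u's posts influence v" (an assumed convention).
graph = {
    "u": {"v": 0.3},
    "v": {},
}


def edge_weight(graph, u, v):
    """Return the weight of edge u -> v, or 0.0 if the edge is absent."""
    return graph.get(u, {}).get(v, 0.0)
```

For instance, `edge_weight(graph, "u", "v")` looks up the single edge in this toy graph, while the reverse direction has no edge and yields 0.0.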
  14. A diffusion model should also be defined.
  15. Given a graph, the quantified influence σ(S) depends on how influence is propagated. In this research, our method is based on the independent cascade model, which is simple but well established. In the next few slides, I will explain how the independent cascade model works in an inductive way.
  16. At time zero, several seed nodes are activated by the marketers, and all the other nodes are inactive.
  17. At time i + 1, as shown in the figure, nodes that were activated at time i have one chance to activate each of their inactive out-neighbors. In contrast, nodes that were activated before time i have no such chance.
  18. Influence propagation continues until no new nodes are activated. If no nodes are activated at time j, then at time j + 1 no inactive node has a chance to be activated, and propagation ends.
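The three steps in notes 16 to 18 can be sketched as a single simulation run. This is an illustrative sketch, assuming the graph is a nested dictionary `graph[u][v] = activation probability`; the function name `simulate_ic` and the toy graph in the usage note are hypothetical.

```python
import random


def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade diffusion and return the set of active nodes."""
    active = set(seeds)              # t = 0: only the seeds are active
    frontier = list(active)          # nodes activated at the previous time step
    while frontier:                  # propagation ends when nobody was newly activated
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, {}).items():
                # u gets exactly one chance to activate each inactive out-neighbor
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active      # older active nodes never try again
    return active
```

With probabilities of 1.0 and 0.0 the run is deterministic: `simulate_ic({"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}}, ["a"], random.Random(0))` activates a, b, and d but never c.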
  19. After defining the graph and the diffusion model, the influence maximization problem can be solved.
  20. Before describing our method, let me show two challenges that the influence maximization processing confronts.
  21. At the macro level,
  22. The optimization problem itself is NP-hard. Intuitively, finding the optimal solution requires finding the best among all possible combinations of seed nodes. The set-covering problem can be reduced to it, which proves that it is NP-hard.
  23. To sidestep the NP-hardness, a greedy algorithm was proposed in the seminal paper on the influence maximization problem. The greedy algorithm repeatedly chooses the node that gives the largest marginal influence increase over the current seed set. In the greedy algorithm, the influence of each node and the marginal influence increase are the two major evaluation components. However, evaluating the exact influence is also hard.
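The greedy selection in note 23 can be sketched as follows. This is a minimal illustration that takes the influence function as a parameter `sigma`; the coverage-style `sigma` in the usage note is a hypothetical stand-in, not one of the influence evaluation methods discussed in the talk, and the sketch assumes k does not exceed the number of nodes.

```python
def greedy_seeds(nodes, k, sigma):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    seeds = []
    for _ in range(k):
        base = sigma(seeds)          # influence of the current seed set
        best = max(
            (v for v in nodes if v not in seeds),
            key=lambda v: sigma(seeds + [v]) - base,   # marginal influence increase
        )
        seeds.append(best)
    return seeds
```

As a usage example with a made-up reach table, let `reach = {"a": {"a", "b", "c"}, "b": {"b", "c"}, "d": {"d", "e"}}` and `sigma = lambda S: len(set().union(*(reach[s] for s in S)))`. Then `greedy_seeds(["a", "b", "d"], 2, sigma)` first picks "a" (gain 3) and then "d" (gain 2), because "b" adds nothing new once "a" is selected.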
  24. We call it the micro level challenge of the influence maximization.
  25. Influence evaluation itself is #P-hard. Informally, #P-hard problems are counting problems for which no polynomial-time algorithm is known. For influence evaluation, this relates to the fact that we cannot count all influence propagation paths, even between two nodes, in polynomial time.
  26. To overcome the micro level challenge, several methods have been proposed. In the seminal paper on influence maximization, influence is evaluated using Monte-Carlo simulation, in which the actual diffusion process is repeated over ten thousand times and the average number of activated nodes is taken as the influence. However, Monte-Carlo simulation takes too much time. To speed up the evaluation, local structures such as the shortest path between two nodes or local arborescences are used.
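The Monte-Carlo evaluation described in note 26 can be sketched as an average over repeated independent-cascade runs. This is an illustrative sketch, assuming a nested-dictionary graph `graph[u][v] = activation probability`; the name `estimate_sigma` and the default of 10,000 runs are choices made here.

```python
import random


def estimate_sigma(graph, seeds, runs=10_000, rng=None):
    """Estimate sigma(S) as the average number of active nodes over repeated IC runs."""
    if rng is None:
        rng = random.Random()
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(active)
        while frontier:                      # one diffusion run
            newly_active = []
            for u in frontier:
                for v, p in graph.get(u, {}).items():
                    if v not in active and rng.random() < p:
                        active.add(v)
                        newly_active.append(v)
            frontier = newly_active
        total += len(active)                 # spread achieved in this run
    return total / runs
```

On a single edge a → b with probability 0.5, the exact influence of {a} is 1.5, so `estimate_sigma({"a": {"b": 0.5}}, ["a"])` should land close to that value. The cost grows with the number of runs times the size of each cascade, which is why the talk treats this evaluation as the bottleneck.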
  27. Along with these methods, we propose a more efficient influence approximation heuristic, IPA.
  28. To evaluate influence efficiently, existing methods confine influence diffusion to a local region. Our intuition is to localize influence to the extreme, which leads us to use all meaningful paths between two nodes as the unit of influence evaluation. The word ‘meaningful’ in this context is formally defined in the next slide.
  29. An influence path is a sequence of nodes, and its influence propagation probability ipp(·) is defined as the product of the edge weights along the path. We only consider influence paths whose ipp is no less than a pre-defined threshold θ. For example, if all edge weights are 0.1 and the threshold is 0.001, we only consider influence paths of up to three edges (0.1³ = 0.001); longer paths are ignored.
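The definition in note 29 can be sketched directly. This is a minimal illustration, assuming a nested-dictionary graph `graph[u][v] = edge weight`; the helper name `is_meaningful` is hypothetical, and the chain v1 to v5 with weights 0.1 mirrors the slide's example.

```python
import math


def ipp(graph, path):
    """Influence propagation probability: product of the edge weights along the path."""
    return math.prod(graph[u][v] for u, v in zip(path, path[1:]))


def is_meaningful(graph, path, theta):
    """A path is kept only if its ipp is no less than the threshold theta."""
    return ipp(graph, path) >= theta


# The slide's chain v1 -> v2 -> v3 -> v4 -> v5, every edge weighted 0.1.
chain = {f"v{i}": {f"v{i + 1}": 0.1} for i in range(1, 5)}
```

With θ = 0.001, the four-edge path from v1 to v5 has an ipp of about 0.0001 and is dropped. The three-edge path sits exactly on the threshold, so in floating-point arithmetic a comparison at that boundary can be sensitive to rounding; an implementation may want a small tolerance there.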
  30. With the definition of meaningful influence paths, let us see how they are collected and organized to evaluate single-node influence. Suppose that a graph is given as in the left figure. IPA first traverses the graph from each node in a breadth-first manner. The right figure is the result of the traversal from node a. The traversal stops when a cycle is detected or when ipp() becomes less than the threshold.
  31. After the traversal, IPA extracts the influence paths from the tree. The influence paths are all the paths from the root to each non-root node in the traversal tree. From the traversal tree in the left figure, ten paths that start from node ‘a’ are extracted. We denote this path set P_{a→V}.
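The traversal and path extraction in notes 30 and 31 can be sketched as one breadth-first enumeration. This is an illustrative sketch, not the authors' implementation; it assumes a nested-dictionary graph `graph[u][v] = edge weight`, and the name `collect_paths` and the toy graph in the usage note are hypothetical.

```python
from collections import deque


def collect_paths(graph, source, theta):
    """Return (path, ipp) pairs for every path from source whose ipp is at least theta."""
    paths = []
    queue = deque([((source,), 1.0)])        # (path so far, its ipp)
    while queue:
        path, prob = queue.popleft()
        for v, w in graph.get(path[-1], {}).items():
            if v in path:                    # cycle detected: do not extend
                continue
            new_prob = prob * w
            if new_prob < theta:             # below the threshold: prune this branch
                continue
            new_path = path + (v,)
            paths.append((new_path, new_prob))   # root-to-node path in the traversal tree
            queue.append((new_path, new_prob))
    return paths
```

For the toy graph `{"a": {"b": 0.5, "c": 0.1}, "b": {"c": 0.5}, "c": {"a": 0.5}}` with θ = 0.2, the traversal from a keeps (a, b) and (a, b, c), prunes the direct edge a → c because 0.1 is below the threshold, and stops at c because its only out-edge returns to a.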
  32. For each node, the graph traversal is conducted and all influence paths are collected. The paths are grouped by their starting nodes.
  33. Now IPA can approximate the influence of a single node. The hat symbol indicates an approximation. The influence of a single node ‘v’ is one, which is its influence on itself, plus the sum of the influence from v to each node in v’s reaching area O_v. The reaching area of ‘a’ in the example is b, c, d, and e. The influence between two nodes is defined as the complement of the probability that none of the paths between them influences the sink node.
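The description in note 33 can be sketched as follows. This is a minimal sketch under the assumption that the paths to a target are treated as independent, so that the influence of v on u is 1 minus the product of (1 − ipp(p)) over the paths p from v to u; the function name and the example path set are hypothetical.

```python
import math
from collections import defaultdict


def single_node_influence(paths_from_v):
    """Approximate sigma({v}) from (path, ipp) pairs that all start at v."""
    by_target = defaultdict(list)
    for path, prob in paths_from_v:
        by_target[path[-1]].append(prob)     # group paths by the node they reach
    influence = 1.0                          # v always influences itself
    for probs in by_target.values():
        # chance that at least one path reaches the target, treating paths as independent
        influence += 1.0 - math.prod(1.0 - p for p in probs)
    return influence
```

For example, with paths (a, b) at 0.5, (a, c) at 0.5, and (a, b, c) at 0.25, node b contributes 0.5 and node c contributes 1 − 0.5 × 0.75 = 0.625, for a total of 2.125.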
  34. The parallel evaluation of single-node influence is simple. To approximate the influence of a node v, only the path set P_{v→V} is required. For two different nodes u and v, P_{v→V} and P_{u→V} have no paths in common. Therefore, the single-node influences can be evaluated independently and in parallel.
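Because each node's evaluation reads only its own path set, the work can be handed out without any locking. The sketch below illustrates that structure with Python's `concurrent.futures.ThreadPoolExecutor`; the function name `evaluate_all`, the `evaluate` parameter, and the worker count are assumptions, and this is not the authors' implementation. In CPython, threads may not speed up CPU-bound work, so the sketch shows the independence of the tasks rather than a measured speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_all(path_sets, evaluate, workers=4):
    """Apply evaluate to each node's own path set; the tasks share no paths."""
    nodes = list(path_sets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # each task touches exactly one entry of path_sets
        results = pool.map(evaluate, (path_sets[v] for v in nodes))
        return dict(zip(nodes, results))
```

Any per-node evaluation function can be passed as `evaluate`; for instance `evaluate_all(path_sets, len)` simply counts the paths collected for each starting node.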
  35. Up to now, IPA has evaluated single-node influence. To evaluate the marginal influence increase, IPA re-organizes the paths. In the single-node influence evaluation, paths are grouped by their starting nodes. Now the paths are re-grouped by their ending nodes. With this re-organization, IPA can efficiently evaluate the marginal influence increase in parallel.
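The re-grouping in note 35 amounts to re-keying the path collection. A minimal sketch, assuming paths are stored as (path, ipp) pairs in a dictionary keyed by starting node; the name `regroup_by_end` is hypothetical.

```python
from collections import defaultdict


def regroup_by_end(paths_by_start):
    """Re-key a {start: [(path, ipp), ...]} collection by each path's ending node."""
    paths_by_end = defaultdict(list)
    for paths in paths_by_start.values():
        for path, prob in paths:
            paths_by_end[path[-1]].append((path, prob))
    return dict(paths_by_end)
```

After this step, all paths that can influence a given node sit in one bucket, so each ending node can be processed without looking at any other bucket.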
  36. Now we reach the marginal influence increase evaluation phase. It is complicated because the marginal increase σ(S ∪ {v}) − σ(S) is not simply the single-node influence σ({v}). For example, before v is added as a seed, a path that runs from u through v and onward is valid. After v is added, that path becomes meaningless because, in the independent cascade model, u cannot attempt to activate v, which is already active. We call this influence blocking, and we should detect such invalid paths.
  37. For the current seed set S, among the paths that start from seed nodes, P_{S→V}, all paths that contain v are invalidated. For the new seed node v, among the paths that start from v, P_{v→V}, all paths that contain any current seed node are invalidated. In sum, a valid path contains exactly one seed node, and that node is its starting node.
  38. Let us see how invalid paths are detected. Suppose that the current seed set consists of only ‘a’, and ‘d’ is added as a new seed candidate. The left figure shows that, before adding ‘d’, only the paths from a to e are valid. After adding ‘d’ to the seed set, five paths become candidate paths.
  39. However, adding a new seed makes some paths invalid. For example, d blocks the influence of a in the path (a,d,e), and a blocks the influence of d in the path (d,a,c,e). In the end, only three paths are valid and are used to evaluate the marginal influence increase.
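The validity rule summarized in note 37 (exactly one seed, at the start of the path) can be sketched as a simple predicate. The function name `is_valid` is hypothetical. In the usage note, the paths (a,d,e) and (d,a,c,e) come from note 39, while (a,c,e) and (d,e) are made-up examples, since the slide's full list of five candidate paths is not preserved in the text.

```python
def is_valid(path, seeds):
    """A path is valid if it starts at a seed and contains no other seed."""
    return path[0] in seeds and not any(node in seeds for node in path[1:])
```

With seeds {a, d}, `is_valid(("a", "d", "e"), {"a", "d"})` and `is_valid(("d", "a", "c", "e"), {"a", "d"})` are both false because a second seed blocks the path, whereas the hypothetical paths ("a", "c", "e") and ("d", "e") pass the check.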
  40. Using the valid paths, the marginal influence increase is evaluated. It is the sum of one, which is the influence of the new seed v on itself, and the marginal influence increase from the seed nodes to each node in v’s reaching area. The marginal influence increase of the seed nodes to a node u in v’s reaching area is the complement of the probability that none of the valid paths from the seed nodes influences u. As in the single-node influence evaluation, the part in the green box is also parallelizable. This is how IPA evaluates influence.
  41. Now let me show the empirical evaluation results of our method.
  42. Five publicly available real datasets are used. The number of nodes ranges from 75 thousand to 5 million, and the number of edges ranges from 500 thousand to 70 million.
  43. Along with IPA, four other influence evaluation methods are used. Monte-Carlo is the Monte-Carlo simulation method used in the seminal paper on the problem; the number of repetitions is 20,000. PMIA is the state-of-the-art influence evaluation method, which exploits the local arborescence structure. SD is an influence evaluation method that relies only on the graph structure, not on the influence diffusion model. Random selects seeds at random. All five influence evaluation methods are plugged into the greedy algorithm.
  44. First, we should find the threshold for IPA and PMIA on each dataset. As shown in the figure, although short processing time and high influence are both desirable, there is a trade-off between them. Thus, we choose the elbow point, at which neither is sacrificed for the other.
  45. This plot shows the log-scaled processing time of the five methods. Greedy is slow: on Patent and LiveJournal, it did not finish within one hundred thousand seconds. Single discount and Random are trivially fast because they do not consider influence diffusion, but the influence of their solutions is poor. IPA shows an order of magnitude shorter processing time than PMIA, the state of the art. PMIA did not finish on LiveJournal due to a memory problem.
  46. Along with the processing time, we also evaluate how quickly each subsequent seed node is produced after the first seed node is found. As shown in the plots, IPA
  47. These plots show the influence of the solutions of the five methods on the five datasets. In terms of the influence of the seed nodes, greedy is trivially the best because it repeats the influence diffusion simulation until a stable influence value is obtained. On Epinion, both IPA and PMIA show influence close to that of greedy. Single discount and Random show low influence. On Stanford, IPA loses only 8% of the influence compared to greedy, whereas PMIA loses over 20%.
  48. On DBLP, IPA shows slightly lower influence than PMIA, but the difference is small compared to that on the Stanford dataset.
  49. On Patent, IPA shows more influence as the number of seed nodes increases. On LiveJournal, only IPA produces meaningful influence.
  50. Finally, we report the parallelization effect. It is measured by the speed-up, which is the ratio of the processing time of single-threaded IPA to that of multi-threaded IPA. As shown in the figure, IPA parallelizes better when the dataset is larger.
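The speed-up measure in note 50 can be written compactly, using T_1 for the single-threaded processing time and T_p for the processing time with p threads (both symbols are notation chosen here):

```latex
\text{speed-up}(p) = \frac{T_{1}}{T_{p}}
```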
  51. That’s it. This is the end of this talk. Any questions??