Counting Fast
                               (Part II)

                                   Sergei Vassilvitskii
                                 Columbia University
                         Computational Social Science
                                       March 8, 2013



Thursday, March 14, 13
Last time

              Counting fast:
                – Quadratic time doesn’t scale
                – Sorting is slightly more than linear
                – Hashing allows you to do membership queries in constant time




Today

              Counting on Networks:
                – Large Graphs: Internet, Facebook, Twitter
                – Recommendation Graphs: Netflix, Amazon, etc.




Friends & Followers

              Given a network:
                – When do people become friends?
                – What factors influence this?


              Products:
                – People You May Know (PYMK). Reconnect people, help new users
                – Twitter’s Who to Follow


              Recommendations:
                – Netflix, Amazon, etc. (Future lectures)




Triadic Closure

              Likely to become friends with:
                – People in similar groups
                – Friends of friends




Defining Tight Knit Circles

              Looking for tight-knit circles:
                – People whose friends are friends themselves


              Why?
                – Network Cohesion: Tightly knit communities foster more trust, social
                  norms. [Coleman ’88, Portes ’88]
                – Structural Holes: Individuals benefit from bridging [Burt ’04, ’07]




Clustering Coefficient



       [Figure: two example nodes, one with cc = 0.5 vs. one with cc = 0.1]




             Given an undirected graph
             - For each node v, it’s the fraction of pairs of v’s neighbors who are
             neighbors themselves
             - Equivalently, the number of triangles containing v divided by the
             number of pairs of v’s neighbors
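
The definition above can be written as a small function. This is a minimal sketch in Python, assuming the graph is stored as a dict mapping each node to the set of its neighbors. The names `clustering_coefficient` and `adj`, the toy graph, and the convention that a node with fewer than two neighbors gets 0 are illustrative assumptions, not part of the slides.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no pairs of neighbors means 0
    # Each connected pair of neighbors closes one triangle through v.
    triangles = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return triangles / (k * (k - 1) / 2)

# Toy undirected graph (hypothetical data): a-b, a-c, a-d, b-c
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering_coefficient(adj, "a"))
```

Node a has three pairs of neighbors and only the pair (b, c) is connected, so one triangle out of three possible pairs.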


How to Count Triangles

              Sequential Version:
                    foreach v in V
                        Triangles[v] = 0
                        foreach pair of distinct neighbors {u,w} in Adjacency(v)
                           if (u,w) in E
                              Triangles[v]++


              Running time:
                – For each vertex, look at all pairs of neighbors
                – Number of pairs ~ quadratic in the degree of the vertex


                – What happens if the degree is very large?
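
The sequential version and its cost can be sketched in Python as follows. The adjacency-dict representation, the `pairs_checked` counter, and the `hub_graph` helper are assumptions added for illustration. The helper builds one high-degree node so the quadratic blow-up is visible.

```python
from itertools import combinations

def count_triangles_naive(adj):
    """Per-node triangle counts; each triangle is found once from each of its 3 nodes."""
    triangles = {v: 0 for v in adj}
    pairs_checked = 0
    for v, neighbors in adj.items():
        # Look at every unordered pair of v's neighbors.
        for u, w in combinations(neighbors, 2):
            pairs_checked += 1
            if w in adj[u]:
                triangles[v] += 1
    return triangles, pairs_checked

def hub_graph(n_leaves):
    """Hypothetical test graph: node 0 linked to n_leaves others, plus one edge 1-2."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    adj[1].add(2)
    adj[2].add(1)
    return adj

triangles, pairs_checked = count_triangles_naive(hub_graph(1000))
print(triangles[0], pairs_checked)
```

With 1,000 neighbors the hub alone accounts for 1000 * 999 / 2 = 499,500 of the pairs checked, even though the graph contains a single triangle.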




Parallel Version

              But use 1,000 machines!
                – Quadratic algorithms still don’t scale
                – Simple parallelization: process each vertex separately




              Naive parallelization does not help with data skew
                – Some nodes will have very high degree
                – Example: a node with 3.2 million followers has about (3.2 × 10^6)^2 ≈
                  10^13 (10 trillion) potential edges to check.
                – Even generating 100M edges per second, this takes 100K seconds ≈ 27
                  hours for one vertex!
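
The arithmetic behind this example can be checked directly. The sketch below counts ordered pairs, which is the reading that reproduces the slide's 10^13; unordered pairs would be about half of that. The variable names are illustrative.

```python
followers = 3_200_000
candidate_pairs = followers ** 2   # ordered pairs of followers, roughly 1e13
rate = 100_000_000                 # candidate edges generated per second (from the slide)
seconds = candidate_pairs / rate
hours = seconds / 3600
print(f"{candidate_pairs:.2e} pairs, {seconds:,.0f} s, {hours:.1f} h")
```

The exact figures come out slightly above the rounded ones on the slide (about 102K seconds rather than 100K), which does not change the conclusion: a single vertex takes more than a day.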




“Just 5 more minutes”

              On the LiveJournal Graph (5M nodes, 70M edges)
                – 80% of vertices are done after 5 min
                – 99% done after 35 min




Adapting the Algorithm

              Approach 1: Dealing with skew directly
                – Currently every triangle is counted 3 times (once per vertex)
                – Running time is quadratic in the degree of the vertex
                – Idea: count each triangle once, from the perspective of its lowest-degree vertex
                – Does this heuristic work?




How to Count Triangles Better

              Idea [Schank ’07]
                – Only pivot on nodes that have smaller degree than both neighbors.
                – Neighbors of high degree nodes tend to have small degrees




How to Count Triangles Better

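              foreach v in V
                 foreach u in Adjacency(v) with deg(u) > deg(v):
                     foreach w in Adjacency(v) with deg(w) > deg(v) and w after u:
                       if (u,w) is an edge:
                          Triangles[v]++
                          Triangles[w]++
                          Triangles[u]++
              (Ties in degree are broken consistently, e.g. by node id; "w after u"
               visits each pair of neighbors once, so each triangle is counted once.)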
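
A runnable sketch of this degree-ordered pivoting, under stated assumptions: the graph is a dict of neighbor sets, node ids are mutually comparable, and ties in degree are broken by node id. The function name and toy graph are hypothetical.

```python
from itertools import combinations

def count_triangles_by_degree(adj):
    """Count each triangle once, pivoting on its lowest-ranked node."""
    # Rank nodes by (degree, id) so ties in degree are broken consistently.
    rank = {v: (len(adj[v]), v) for v in adj}
    triangles = {v: 0 for v in adj}
    for v in adj:
        # Only neighbors ranked above v take part in pairs at v.
        higher = [u for u in adj[v] if rank[u] > rank[v]]
        for u, w in combinations(higher, 2):
            if w in adj[u]:
                for node in (v, u, w):
                    triangles[node] += 1
    return triangles

# Toy graph (hypothetical data) with triangles a-b-c and a-c-d
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}
print(count_triangles_by_degree(adj))
```

Here the two degree-3 nodes a and c generate no pairs at all; both triangles are found from the low-degree nodes b and d.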




Does it make a difference?




Why does it help?

              Look at two different kinds of nodes:
                – Few friends:
                         • OK to be quadratic on small instances
                – Lots of friends
                         • Only care about number of friends with even more friends!
                         • Cannot have too many (can make this formal)
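
One way to make the last point formal, offered as a standard counting argument rather than the lecture's own derivation: let m be the number of edges and let d^+(v) be the number of neighbors of v whose degree is at least deg(v). Since the degrees sum to 2m, few nodes can have high degree.

```latex
\sum_{u \in V} \deg(u) = 2m
\;\Longrightarrow\;
\bigl|\{u : \deg(u) \ge d\}\bigr| \le \frac{2m}{d},
\qquad\text{so}\qquad
d^{+}(v) \le \min\!\Bigl(\deg(v),\, \frac{2m}{\deg(v)}\Bigr) \le \sqrt{2m}.
```

With consistent tie-breaking each edge is charged to exactly one endpoint, so the d^+(v) sum to m, and the total number of pairs examined is at most \sqrt{2m} \cdot m = O(m^{3/2}).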




Break




Working in Parallel

              MapReduce (review):


              Map:
                – Decide how to group the data for computation

              Reduce:
                – Given the grouping, perform the computation
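
The division of labor between the two phases can be imitated on one machine. The sketch below is not a real MapReduce framework: `run_mapreduce`, the mapper and reducer names, and the sample edges are all assumptions made for illustration. It computes node degrees from an edge list.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Single-machine stand-in for the map, shuffle, and reduce phases."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):    # Map: choose a key for each piece of data
            groups[key].append(value)        # Shuffle: collect values by key
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))  # Reduce: compute on each group
    return output

def degree_mapper(edge):
    u, v = edge
    yield u, 1
    yield v, 1

def degree_reducer(node, ones):
    yield node, sum(ones)

edges = [("Joe", "Mary"), ("Alice", "Bob"), ("Joe", "Alice")]  # hypothetical data
degrees = dict(run_mapreduce(edges, degree_mapper, degree_reducer))
print(degrees)
```

The mapper only decides which key each value belongs to; all the arithmetic happens in the reducer once the values for a key have been brought together.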




Building People You May Know

              Friendships are undirected:
                – If Alice knows Bob, Bob knows Alice
                – Data stored as a list of all edges
                – Find all friends of friends
                – Score the possible pairs




Data

              Suppose you have edges and degrees of each vertex:


              Node    Degree    Node        Degree
              Joe         56    Mary            78
              Alice      398    Bob            198
              Dan        983    Justin  11,985,234
              ...


              An alternate view may be data stored as adjacency list:
              Node      Degree   Neighbor  Degree   Neighbor  Degree   Neighbor  Degree
              Joe           56   Mary          78   Don           99   Bill           1
              Alice        398   Kate          55   Bob          198   Mary          78
              ...


Previous Algorithm

              Adjacency list input.
                – Map:
                         • For each node and its neighbors, output all paths through the node
                – Reduce:
                         • none




                – [Figure: two example Map inputs of the form [node | neighbors];
                  the first produces output, the second produces none]
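
A minimal sketch of this map-only step, assuming each input record is a node together with the list of its neighbors. The function name and the sample records are hypothetical.

```python
from itertools import combinations

def map_all_paths(node, neighbors):
    """Map-only step: emit every 2-path (u, node, w) through `node`."""
    for u, w in combinations(sorted(neighbors), 2):
        yield (u, node, w)

print(list(map_all_paths("Joe", ["Mary", "Don", "Bill"])))
print(list(map_all_paths("Bill", ["Joe"])))
```

A node with d neighbors emits d(d-1)/2 paths, so a node with a single neighbor emits nothing while a very popular node emits a quadratic amount of output.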
Want to compute all open triads

              Data Needed:
                – Central node
                – Neighbors that have higher degree




                – Orient each edge to point to a node of higher degree, breaking ties
                  arbitrarily but consistently




Want to compute all open triads

              Map:
                – Orient each edge to point to a node of higher degree, breaking ties
                  arbitrarily but consistently
                – Given:  Joe 56 Mary 78
                – Output: <Key = Joe, Value = Mary>
                – Given:  Alice 398 Bob 198
                – Output: <Key = Bob, Value = Alice>

                     map(key, value):
                        # value is one edge record: "name1 degree1 name2 degree2"
                        name1, deg1, name2, deg2 = value.split()
                        deg1, deg2 = int(deg1), int(deg2)  # compare as numbers, not strings
                        if (deg2 > deg1 or
                          (deg2 == deg1 and name1 < name2)):
                             emit(name1, name2)
                        if (deg2 < deg1 or
                          (deg2 == deg1 and name1 > name2)):
                             emit(name2, name1)
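
The orientation rule can be tried out as a plain Python function. This is a sketch under assumptions not stated on the slide: records are whitespace-separated, degrees may contain thousands separators as in the sample data, the graph has no self-loops, and the function returns the pair instead of calling a framework's `emit`. The name `orient_edge` is illustrative.

```python
def orient_edge(record):
    """Map step: return (key, value) with the edge pointing at the higher-degree node."""
    name1, deg1, name2, deg2 = record.split()
    # Parse degrees as integers so that 56 < 198 (as strings, "56" > "198").
    deg1 = int(deg1.replace(",", ""))
    deg2 = int(deg2.replace(",", ""))
    # Compare (degree, name) tuples so ties in degree are broken by name.
    if (deg1, name1) < (deg2, name2):
        return name1, name2
    return name2, name1

print(orient_edge("Joe 56 Mary 78"))
print(orient_edge("Alice 398 Bob 198"))
print(orient_edge("Dan 983 Justin 11,985,234"))
```

The key is always the lower-degree endpoint, so after the shuffle each node receives only its higher-degree neighbors.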


Want to compute all open triads

              Aggregate (Shuffle):
                – Collect all values with same key (nodes with higher degree)

              Computation:
                – Generate all 2-paths (friend of a friend relationships)
                – Given: key= Joe, value={Mary, Justin, Alice}
                – Output:
                         • (key = Joe, Value = (Mary, Justin))
                         • (key = Joe, Value = (Mary, Alice))
                         • (key = Joe, Value = (Justin, Alice))

                    reduce(key, values):
                       values = list(values)
                       for i in range(len(values)):
                          for j in range(i + 1, len(values)):  # each unordered pair once
                             emit(key, (values[i], values[j]))
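
Putting the map, shuffle, and reduce steps together on one machine gives the following sketch. It assumes edges arrive as (name1, degree1, name2, degree2) tuples with integer degrees; `open_triads` and the sample edges are hypothetical, and a dict stands in for the shuffle.

```python
from collections import defaultdict
from itertools import combinations

def open_triads(edges_with_degrees):
    """Yield (center, (friend1, friend2)) for each 2-path through a lowest-degree center."""
    groups = defaultdict(list)
    for name1, deg1, name2, deg2 in edges_with_degrees:
        # Map: key is the lower-ranked endpoint, value the higher-ranked one.
        if (deg1, name1) < (deg2, name2):
            groups[name1].append(name2)
        else:
            groups[name2].append(name1)
    # Reduce: every unordered pair of values under one key is a 2-path.
    for center, higher in groups.items():
        for friend1, friend2 in combinations(higher, 2):
            yield center, (friend1, friend2)

edges = [  # hypothetical sample, reusing names and degrees from the slides
    ("Joe", 56, "Mary", 78),
    ("Joe", 56, "Justin", 11985234),
    ("Joe", 56, "Alice", 398),
    ("Alice", 398, "Mary", 78),
]
for key, value in open_triads(edges):
    print(key, value)
```

Joe receives Mary, Justin, and Alice and emits three pairs; Mary receives only Alice and emits nothing; Justin, despite his enormous degree, receives no values at all.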


Comparing Algorithms

              Adjacency-list MapOnly Algorithm:
                – MapOnly
                – Output from some nodes is quadratic


              Edge-at-a-time Algorithm:
                – Map & Reduce
                – More balanced output from each node




Scoring

              Some suggestions are better than others:
                – Some people are already friends!
                – Or they used to be friends...
                – Connected through a friend with 1000s of friends
                – Connected through multiple friends
                – ...
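
Some of these considerations can be folded into a simple score. The sketch below is one possible scheme, not the lecture's: it skips people who are already friends, adds up evidence across multiple mutual friends, and discounts each mutual friend by 1/log(1 + degree), an Adamic/Adar-style weight chosen here as an assumption. It does not handle former friends. The function name and toy graph are hypothetical.

```python
import math
from collections import defaultdict

def score_suggestions(adj, user):
    """Score non-friends of `user` by shared friends, discounting very popular ones."""
    scores = defaultdict(float)
    for friend in adj[user]:
        # A mutual friend with many friends is weaker evidence (assumed weighting).
        weight = 1.0 / math.log(1 + len(adj[friend]))
        for candidate in adj[friend]:
            if candidate != user and candidate not in adj[user]:
                scores[candidate] += weight
    # Highest score first; ties broken by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

adj = {  # hypothetical undirected graph
    "me": {"a", "b"},
    "a": {"me", "x", "b"},
    "b": {"me", "x", "a", "y"},
    "x": {"a", "b"},
    "y": {"b"},
}
result = score_suggestions(adj, "me")
print(result)
```

Candidate x is reached through both a and b and so outranks y, which is reached only through b.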




Spring Break!




Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 

Computational Social Science, Lecture 08: Counting Fast, Part II

  • 1. Counting Fast (Part II) Sergei Vassilvitskii Columbia University Computational Social Science March 8, 2013 Thursday, March 14, 13
  • 2. Last time Counting fast: – Quadratic time doesn’t scale – Sorting is slightly more than linear – Hashing allows you to do membership queries in constant time
  • 3. Today Counting on Networks: – Large Graphs: Internet, Facebook, Twitter – Recommendation Graphs: Netflix, Amazon, etc.
  • 4. Friends & Followers Given a network: – When do people become friends? – What factors influence this?
  • 5. Friends & Followers Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users
  • 6. Friends & Followers Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users – Twitter’s who to follow?
  • 7. Friends & Followers Given a network: – When do people become friends? – What factors influence this? Products: – People You May Know (PYMK). Reconnect people, help new users – Twitter’s who to follow? Recommendations: – Netflix, Amazon, etc. (Future lectures)
  • 8. Triadic Closure Likely to become friends with: – People in similar groups – Friends of friends
  • 9. Defining Tight Knit Circles Looking for tight-knit circles: – People whose friends are friends themselves Why? – Network Cohesion: Tightly knit communities foster more trust, social norms. [Coleman ’88, Portes ’88] – Structural Holes: Individuals benefit from bridging [Burt ’04, ’07]
  • 10. Clustering Coefficient (figure: two example networks compared, "vs.")
  • 11. Clustering Coefficient (figure: cc( ) = 0.5 vs. cc( ) = 0.1) Given an undirected graph: – For each node v, it is the fraction of pairs of v’s neighbors who are neighbors themselves – Equivalent to counting the triangles containing the node, normalized by the number of neighbor pairs
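The definition on slide 11 can be made concrete with a minimal sketch. It assumes the graph is held as a dictionary mapping each node to the set of its neighbors; the function name, the toy graph, and the convention of returning 0 for nodes with fewer than two neighbors are illustrative assumptions, not part of the lecture.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: no neighbor pairs means coefficient 0
    # Each connected pair of neighbors closes one triangle through v.
    triangles = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return triangles / (k * (k - 1) / 2)


# Hypothetical toy graph: a, b, c form a triangle; d hangs off a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering_coefficient(adj, "a"))  # 1 of a's 3 neighbor pairs is connected
print(clustering_coefficient(adj, "b"))  # b's single neighbor pair is connected
```

The numerator is exactly the number of triangles through the node, which is why the rest of the lecture concentrates on counting triangles.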
  • 12. How to Count Triangles Sequential Version:
        foreach v in V
          foreach u,w in Adjacency(v)
            if (u,w) in E
              Triangles[v]++
       (figure: node v with Triangles[v]=0)
  • 13. How to Count Triangles Sequential Version (same pseudocode as slide 12; figure: nodes v, w, u with Triangles[v]=1)
  • 15. How to Count Triangles Sequential Version:
        foreach v in V
          foreach u,w in Adjacency(v)
            if (u,w) in E
              Triangles[v]++
       Running time: – For each vertex, look at all pairs of neighbors – Number of pairs ~ quadratic in the degree of the vertex – What happens if the degree is very large?
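A runnable rendering of the sequential version, offered as a sketch rather than the lecture's own code. It assumes an adjacency structure stored as a dictionary of neighbor sets, so that the edge test is a constant-time hash lookup; `count_triangles` and the toy graph are hypothetical names and data.

```python
from itertools import combinations


def count_triangles(adj):
    """Per-node triangle counts, pivoting on every node (no skew handling)."""
    triangles = {v: 0 for v in adj}
    for v, neighbors in adj.items():
        # deg(v) * (deg(v) - 1) / 2 pairs: quadratic in the degree of v.
        for u, w in combinations(neighbors, 2):
            if w in adj[u]:  # constant-time membership test via hashing
                triangles[v] += 1
    return triangles


# Hypothetical toy graph: a, b, c form a triangle; d hangs off a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(count_triangles(adj))
```

Every triangle is discovered three times here, once from each of its corners, and the inner loop is where a single very high degree node dominates the running time.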
  • 16. Parallel Version But use 1,000 machines! – Quadratic algorithms still don’t scale – Simple parallelization: process each vertex separately. Naive parallelization does not help with data skew – Some nodes will have very high degree – Example: a vertex with 3.2 million followers must generate about 10 trillion (10^13) potential edges to check – Even when generating 100M edges per second, this takes 100K seconds ~ 27 hours for one vertex!
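The back-of-the-envelope estimate on slide 16 can be reproduced in a few lines. This sketch assumes the 10 trillion figure counts ordered pairs of followers (3.2 million squared); counting unordered pairs instead would roughly halve it. The variable names are illustrative.

```python
followers = 3_200_000
candidate_edges = followers ** 2    # ordered pairs of followers, about 10^13
checks_per_second = 100_000_000     # 100M edges per second, as on the slide

seconds = candidate_edges / checks_per_second
hours = seconds / 3600

print(f"{candidate_edges:.2e} candidate edges")
print(f"{seconds:,.0f} seconds, about {hours:.1f} hours")
```

Without rounding 3.2 million squared down to 10^13, the estimate comes out slightly above the slide's rough 27 hours, which does not change the conclusion: one vertex alone takes more than a day.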
  • 17. “Just 5 more minutes” On the LiveJournal Graph (5M nodes, 70M edges) – 80% of vertices are done after 5 min – 99% done after 35 min
  • 18. Adapting the Algorithm Approach 1: Dealing with skew directly – Currently every triangle is counted 3 times (once per vertex) – Running time is quadratic in the degree of the vertex – Idea: Count each triangle once, from the perspective of its lowest-degree vertex – Does this heuristic work?
  • 19. How to Count Triangles Better Idea [Schank ’07] – Only pivot on nodes that have smaller degrees than both neighbors. – Neighbors of high degree nodes tend to have small degrees
  • 20. How to Count Triangles Better
        foreach v in V
          foreach u in Adjacency(v) with deg(u) > deg(v):
            foreach w in Adjacency(v) with deg(w) > deg(v) and u < w:
              if (u,w) is an edge:
                Triangles[v]++
                Triangles[w]++
                Triangles[u]++
       (degree ties are broken arbitrarily but consistently; the condition u < w visits each pair of neighbors once, so each triangle is counted once)
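One way to make the low-degree pivot idea runnable is sketched below. It is an illustration under stated assumptions, not Schank's or the lecturer's implementation: nodes are ranked by the pair (degree, name) so that degree ties are broken consistently, node names are assumed to be comparable (for example strings), and the graph is a dictionary of neighbor sets.

```python
from itertools import combinations


def count_triangles_low_degree_pivot(adj):
    """Count each triangle once, from its lowest-ranked corner."""
    # Rank nodes by (degree, name); the name breaks degree ties consistently.
    rank = {v: (len(adj[v]), v) for v in adj}
    triangles = {v: 0 for v in adj}
    for v in adj:
        # Only neighbors that rank above v can form a pair pivoted at v.
        higher = [u for u in adj[v] if rank[u] > rank[v]]
        for u, w in combinations(higher, 2):
            if w in adj[u]:
                # v is the lowest-ranked corner, so credit all three nodes.
                for node in (v, u, w):
                    triangles[node] += 1
    return triangles


# Hypothetical toy graph: a, b, c form a triangle; d hangs off a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(count_triangles_low_degree_pivot(adj))
```

A very high degree node contributes few pairs under this scheme, because only its neighbors of even higher rank are paired up.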
  • 21. Does it make a difference?
  • 22. Why does it help? Look at two different kinds of nodes: – Few friends: • OK to be quadratic on small instances – Lots of friends • Only care about number of friends with even more friends! • Cannot have too many (can make this formal)
  • 23. Break
  • 24. Working in Parallel MapReduce (review): Map: – Decide how to group the data for computation Reduce: – Given the grouping, perform the computation
  • 25. Building People You May Know Friendships are undirected: – If Alice knows Bob, Bob knows Alice – Data stored as a list of all edges – Find all friends of friends – Score the possible pairs
  • 26. Data Suppose you have edges and degrees of each vertex:
        Joe    56    Mary    78
        Alice  398   Bob     198
        Dan    983   Justin  11,985,234
        ...
       An alternate view may be data stored as adjacency list:
        Joe    56    Mary 78   Don 99    Bill 1
        Alice  398   Kate 55   Bob 198   Mary 78
        ...
  • 27. Previous Algorithm Adjacency list input. – Map: • For each node and its neighbors, output all paths through the node – Reduce: • none (figures: two example map inputs with their outputs; the second input produces Output: None)
  • 28. How to Count Triangles Better Idea [Schank ’07] – Only pivot on nodes who have smaller degrees than both neighbors. – Neighbors of high degree nodes tend to have small degrees
  • 29. Want to compute all open triads Data Needed: – Central node – Neighbors that have higher degree
  • 32. Want to compute all open triads Data Needed: – Central node – Neighbors that have higher degree – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently
  • 33. Want to compute all open triads Map: – Orient each edge to point to a node of higher degree, breaking ties arbitrarily but consistently – Given: Joe 56 Mary 78 – Output: <Key = Joe, Value = Mary> – Given: Alice 398 Bob 198 – Output: <Key = Bob, Value = Alice>
        map(key, value):
          split = value.split()
          # compare degrees as numbers, not as strings
          deg0, deg1 = int(split[1]), int(split[3])
          if deg1 > deg0 or (deg1 == deg0 and split[0] < split[2]):
            emit(split[0], split[2])
          if deg1 < deg0 or (deg1 == deg0 and split[0] > split[2]):
            emit(split[2], split[0])
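The map step on slide 33 can be tried outside a MapReduce framework with a small sketch. It assumes each input line has the form "name degree name degree", that degrees may carry thousands separators as in the sample data, and that ties are broken by name; `orient_edge` is a hypothetical helper that returns the pair instead of calling a framework's `emit`.

```python
def orient_edge(line):
    """Map step: return (lower-degree node, higher-degree node) for one edge."""
    name_a, deg_a, name_b, deg_b = line.split()
    # Degrees arrive as text; strip thousands separators and compare as ints.
    deg_a = int(deg_a.replace(",", ""))
    deg_b = int(deg_b.replace(",", ""))
    # Comparing (degree, name) tuples breaks degree ties by name.
    if (deg_a, name_a) < (deg_b, name_b):
        return (name_a, name_b)
    return (name_b, name_a)


print(orient_edge("Joe 56 Mary 78"))     # ('Joe', 'Mary')
print(orient_edge("Alice 398 Bob 198"))  # ('Bob', 'Alice')
```

Converting the degrees to integers matters: compared as strings, "198" sorts before "78", which would orient some edges the wrong way.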
  • 34. Want to compute all open triads Aggregate (Shuffle): – Collect all values with same key (nodes with higher degree) Computation: – Generate all 2-paths (friend of a friend relationships)
  • 35. Want to compute all open triads (build of slide 34; figure: example 2-paths)
  • 36. Want to compute all open triads Aggregate (Shuffle): – Collect all values with same key (nodes with higher degree) Computation: – Generate all 2-paths (friend of a friend relationships) – Given: key = Joe, value = {Mary, Justin, Alice} – Output: • (key = Joe, Value = (Mary, Justin)) • (key = Joe, Value = (Mary, Alice)) • (key = Joe, Value = (Justin, Alice))
        reduce(key, values):
          # pair each friend only with the friends after it: no self-pairs, no mirrored pairs
          for i, friend1 in enumerate(values):
            for friend2 in values[i+1:]:
              emit(key, (friend1, friend2))
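The three phases (map, shuffle, reduce) can be simulated in memory to check the example on slide 36. This is a sketch under assumptions: the edge lines reuse degrees from the sample data, the Joe to Justin and Joe to Alice edges are inferred from the slide's example key and values, and `map_edge`, `shuffle`, and `reduce_two_paths` are hypothetical names standing in for what a MapReduce framework would provide.

```python
from collections import defaultdict
from itertools import combinations


def map_edge(line):
    """Orient one 'name degree name degree' edge from lower to higher degree."""
    name_a, deg_a, name_b, deg_b = line.split()
    key_a = (int(deg_a.replace(",", "")), name_a)
    key_b = (int(deg_b.replace(",", "")), name_b)
    return (name_a, name_b) if key_a < key_b else (name_b, name_a)


def shuffle(pairs):
    """Group values by key, as the framework's shuffle phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_two_paths(key, values):
    """Emit every unordered pair of higher-degree neighbors of key."""
    return [(key, pair) for pair in combinations(values, 2)]


edges = [
    "Joe 56 Mary 78",
    "Joe 56 Justin 11,985,234",
    "Joe 56 Alice 398",
    "Alice 398 Bob 198",
]
groups = shuffle(map_edge(line) for line in edges)
for key, values in groups.items():
    for record in reduce_two_paths(key, values):
        print(record)
```

Using unordered pairs keeps the reducer's output at three records for Joe, matching the slide, whereas a plain double loop over the values would also emit self-pairs and mirrored duplicates.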
  • 37. Comparing Algorithms Edgelist MapOnly Algorithm: – MapOnly – Output from some nodes is quadratic Edge-at-a-time Algorithm: – Map & Reduce – More balanced output from each node
  • 38. Scoring Some suggestions are better than others: – Some people are already friends! – Or they used to be friends... – Connected through a friend with 1000s of friends – Connected through multiple friends – ...
  • 39. Spring Break!