Filtering: A Method for Solving Graph Problems in MapReduce

Benjamin Moseley (UIUC), Silvio Lattanzi (Google), Siddharth Suri (Yahoo!), Sergei Vassilvitskii (Yahoo!)

Large Scale Distributed Computation, Jan 2012

Overview

 • Introduction to the MapReduce model

 • Our setting

 • Our results

 • Open questions

Introduction to the MapReduce model

XXL Data

 • Huge amounts of data

 • The main problem is analyzing the information quickly

 • We need new tools

 • We need suitable, efficient algorithms

MapReduce

 • MapReduce is the platform of choice for processing massive data

 • Data are represented as < key, value > tuples

 • The mappers decide how the data is distributed

 • The reducers perform non-trivial computation locally

 [Figure: INPUT → MAP PHASE → REDUCE PHASE]

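To make the round structure concrete, here is a minimal single-process sketch of one MapReduce round, assuming the input is a list of < key, value > tuples; the mapper/reducer signatures and the degree-counting example are illustrative.

```python
from collections import defaultdict

def mapreduce_round(tuples, mapper, reducer):
    """Simulate one MapReduce round on a list of <key, value> tuples."""
    # Map phase: each tuple may emit any number of new tuples; the keys
    # emitted here are what decide how the data is distributed.
    shuffled = defaultdict(list)
    for key, value in tuples:
        for out_key, out_value in mapper(key, value):
            shuffled[out_key].append(out_value)
    # Reduce phase: each key's values are processed locally and
    # independently (i.e., in parallel across machines).
    output = []
    for key, values in shuffled.items():
        output.extend(reducer(key, values))
    return output

# Example: compute node degrees from a list of edges in one round.
edges = [("e1", (1, 2)), ("e2", (1, 3)), ("e3", (2, 3))]
degrees = mapreduce_round(
    edges,
    mapper=lambda k, e: [(e[0], 1), (e[1], 1)],
    reducer=lambda node, ones: [(node, sum(ones))],
)
print(degrees)  # [(1, 2), (2, 2), (3, 2)]
```
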
How can we model MapReduce?

   PRAM

 • No limit on the number of processors

 • Memory is uniformly accessible from any processor

 • No limit on the memory available

How can we model MapReduce?

   STREAMING

 • Just one processor

 • A limited amount of memory

 • No parallelization

MapReduce model

   [Karloff, Suri and Vassilvitskii]

 • N is the input size and ε > 0 is some fixed constant

 • Fewer than N^{1-ε} machines

 • Less than N^{1-ε} memory on each machine

   MRC^i: problems that can be solved in O(log^i N) rounds

Combining the map and reduce phases

 • Mappers and reducers work only on a subgraph

 • Keep the time in each round polynomial

 • Time is constrained by the number of rounds

Algorithmic challenges

 • No machine can see the entire input

 • No communication between machines during a phase

 • Total memory is N^{2-2ε}

MapReduce vs MUD algorithms

 • In the MUD framework each reducer operates on a stream of data

 • In MUD, each reducer is restricted to polylogarithmic space

Our setting

Our setting

 • We study the Karloff, Suri and Vassilvitskii model

 • We focus on the class MRC^0

Our setting

 • We assume dense graphs: m = n^{1+c} for some constant c > 0

 • There is empirical evidence that social networks are dense graphs
   [Leskovec, Kleinberg and Faloutsos]

Dense graph motivation

   [Leskovec, Kleinberg and Faloutsos]

 • They study 9 different social networks

 • They show that several graphs from different domains have n^{1+c} edges

 • The lowest value of c found is 0.08, and four graphs have c ≥ 0.5

Our results

Results

   Constant-round algorithms for:

 • Maximal matching

 • Minimum cut

 • 8-approximation for maximum weighted matching

 • 2-approximation for vertex cover

 • 3/2-approximation for edge cover

Notation

 • G = (V, E): input graph

 • n: number of nodes

 • m: number of edges

 • η: memory available on each machine

 • N: input size

Filtering

 • Part of the input is dropped, or filtered, in a first stage, in parallel

 • Next, some computation is performed on the filtered input

 • Finally, some patchwork is done to ensure a proper solution

 [Figure: FILTER → COMPUTE SOLUTION → PATCHWORK ON REST OF THE GRAPH]

Warm-up: compute the minimum spanning tree

   m = n^{1+c}

 • PARTITION EDGES: split the edges into n^{c/2} groups, so each machine receives n^{1+c/2} edges

 • COMPUTE MST: each machine computes a minimum spanning forest of its subgraph

 • SECOND ROUND: the n^{c/2} forests, with at most n^{1+c/2} edges in total, are sent to a single machine, which computes the final MST

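A minimal single-process sketch of this two-round scheme, assuming edges are given as (weight, u, v) triples over nodes 0..n-1; spanning_forest (Kruskal with union-find) stands in for the per-machine MST computation. Discarding an edge inside a group is safe because that edge is the heaviest on some cycle, so it cannot belong to the overall MST.

```python
import random

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def spanning_forest(n, edges):
    """Kruskal: minimum spanning forest of an arbitrary edge subset."""
    parent = list(range(n))
    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def filtering_mst(n, edges, num_machines):
    # Round 1: partition the edges; each "machine" keeps only the
    # minimum spanning forest of its group.
    groups = [[] for _ in range(num_machines)]
    for e in edges:
        groups[random.randrange(num_machines)].append(e)
    surviving = [e for g in groups for e in spanning_forest(n, g)]
    # Round 2: the surviving edges fit on one machine; compute the MST.
    return spanning_forest(n, surviving)

# Example: a triangle; the heaviest cycle edge is always filtered out.
print(filtering_mst(3, [(1, 0, 1), (2, 1, 2), (3, 0, 2)], num_machines=2))
```
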
Show that the algorithm is in MRC^0

 • The algorithm is correct

 • No edge of the final solution is discarded when computing a partial solution

 • The algorithm runs in two rounds

 • No more than O(n^{c/2}) machines are used

 • No machine uses more than O(n^{1+c/2}) memory

Maximal matching

 • The algorithmic difficulty is that each machine can only see the edges assigned to it

 • Is a partitioning-based algorithm feasible?

Partitioning algorithm

 [Figure: PARTITIONING → COMBINE MATCHINGS]

   Combining the matchings found on the individual parts does not, in general, give a maximal matching!

Algorithmic insight

 • Consider any subset of the edges E'

 • Let M' be a maximal matching on G[E']

 • The vertices left unmatched by M' form an independent set in G[E']

Algorithmic insight

 • We pick each edge with probability p

 • Find a matching on the sample, and then find a matching on the unmatched vertices

 • In dense portions of the graph, many edges will be sampled

 • Sparse portions of the graph are small and can be placed on a single machine

Algorithm

 • Sample each edge independently with probability p = 10 log n / n^{c/2}

 • Find a matching on the sample

 • Consider the induced subgraph on the unmatched vertices

 • Find a matching on this subgraph and output the union

 [Figure: SAMPLE → FIRST MATCHING → MATCHING ON UNMATCHED NODES → UNION]

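A minimal sketch of these steps, simulated in one process; greedy_matching is an assumed helper standing in for the single-machine matching computation, and log is taken to be the natural logarithm.

```python
import math, random

def greedy_matching(edges):
    """Maximal matching: keep each edge whose endpoints are both free."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching, matched

def filtered_maximal_matching(n, edges, c):
    # Step 1: sample each edge independently with p = 10 log n / n^(c/2).
    p = min(1.0, 10 * math.log(n) / n ** (c / 2))
    sample = [e for e in edges if random.random() < p]
    # Step 2: a maximal matching on the sample (fits on one machine w.h.p.).
    m1, matched = greedy_matching(sample)
    # Step 3: the induced subgraph on unmatched vertices (small w.h.p.);
    # match it and output the union. Every edge we never considered has
    # a matched endpoint, so the union is maximal.
    rest = [(u, v) for u, v in edges if u not in matched and v not in matched]
    m2, _ = greedy_matching(rest)
    return m1 + m2
```
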
Correctness

 • Consider the last step of the algorithm

 • All unmatched vertices are placed on a single machine

 • Every unmatched vertex is either matched in the last step or has only matched neighbors

Bounding the rounds

   Three rounds:

 • Sample the edges

 • Find a matching on a single machine for the sampled edges

 • Find a matching for the unmatched vertices

Bounding the memory

 • Each edge is sampled with probability p = 10 log n / n^{c/2}

 • By a Chernoff bound, the sampled graph has size Õ(n^{1+c/2}) with high probability

 • Can we bound the size of the induced subgraph on the unmatched vertices?

Bounding the memory

 • For a fixed induced subgraph with at least n^{1+c/2} edges, the probability that none of its edges is sampled is:

      (1 - 10 log n / n^{c/2})^{n^{1+c/2}} ≤ exp(-10 n log n)

 • Union bounding over all 2^n induced subgraphs shows that at least one edge is sampled from every such dense subgraph with probability at least

      1 - 2^n exp(-10 n log n) ≥ 1 - 1/n^{10}

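Spelled out, taking log to be the natural logarithm and using 1 - x ≤ e^{-x}:

```latex
% A fixed subgraph with at least n^{1+c/2} edges has no sampled edge
% with probability
\left(1 - \frac{10\log n}{n^{c/2}}\right)^{n^{1+c/2}}
  \le \exp\!\left(-\frac{10\log n}{n^{c/2}} \cdot n^{1+c/2}\right)
  = \exp(-10\, n\log n).

% Union bound over the 2^n induced subgraphs (valid for n >= 2):
2^{n}\exp(-10\, n\log n) \le \exp(-10\log n) = \frac{1}{n^{10}}.
```
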
Using less memory

 • Can we run the algorithm with less than Õ(n^{1+c/2}) memory?

 • We can amplify our sampling technique!

Using less memory

 • Sample as many edges as fit in memory

 • Find a matching

 • If the edges between unmatched vertices fit onto a single machine, find a matching on those vertices

 • Otherwise, recurse on the unmatched nodes

 • With n^{1+ε} memory, each iteration removes a factor of n^{ε} of the edges, resulting in O(c/ε) rounds

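A sketch of this recursion, reusing greedy_matching from the earlier sketch; memory_budget plays the role of the per-machine memory η = n^{1+ε}, and the factor 2 in the sampling probability is an illustrative safety margin so the sample fits in memory in expectation.

```python
import random

def iterated_matching(edges, memory_budget):
    """Low-memory variant: repeat sample-and-match until the leftover
    graph fits on one machine (greedy_matching as defined above)."""
    matching = []
    while len(edges) > memory_budget:
        p = memory_budget / (2.0 * len(edges))   # expected sample fits in memory
        sample = [e for e in edges if random.random() < p]
        m, matched = greedy_matching(sample)
        matching += m
        # Recurse on the induced subgraph of still-unmatched vertices.
        edges = [(u, v) for u, v in edges
                 if u not in matched and v not in matched]
    final, _ = greedy_matching(edges)            # the remainder fits in memory
    return matching + final
```
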
Parallel computation power

 • The maximal matching algorithm does not use parallelization

 • We use a single machine in every step

 • When is parallelization used?

Maximum weighted matching

 [Figure: a graph with weighted edges]

 • SPLIT THE GRAPH: partition the edges into subgraphs by weight

 • MATCHINGS: compute a maximal matching in each subgraph, in parallel

 • SCAN THE EDGES SEQUENTIALLY: combine the matchings by scanning their edges from heaviest subgraph to lightest, greedily keeping each edge whose endpoints are still free

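A single-process sketch of the split/match/scan pipeline. The geometric weight classes [2^i, 2^{i+1}) are an assumption about how the split is done (the slides only show the three stages), and weights are assumed to be at least 1.

```python
import math
from collections import defaultdict

def weighted_matching(edges):
    """edges: (weight, u, v) triples with weight >= 1."""
    # SPLIT THE GRAPH: group edges into geometric weight classes (assumed).
    classes = defaultdict(list)
    for w, u, v in edges:
        classes[math.floor(math.log2(w))].append((w, u, v))
    # MATCHINGS: a maximal matching inside each class; the classes are
    # independent, so this step could run on separate machines.
    per_class = {}
    for i, class_edges in classes.items():
        matched, m = set(), []
        for w, u, v in class_edges:
            if u not in matched and v not in matched:
                matched.update((u, v))
                m.append((w, u, v))
        per_class[i] = m
    # SCAN THE EDGES SEQUENTIALLY: heaviest class first, greedily keep
    # every edge whose endpoints are still free.
    taken, solution = set(), []
    for i in sorted(per_class, reverse=True):
        for w, u, v in per_class[i]:
            if u not in taken and v not in taken:
                taken.update((u, v))
                solution.append((w, u, v))
    return solution
```
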
Bounding the rounds and memory

   Four rounds:

 • Splitting the graph and running the maximal matchings: 3 rounds and Õ(n^{1+c/2}) memory

 • Computing the final solution: 1 round and O(n log n) memory

Approximation guarantee (sketch)

 • An edge in the solution can block at most 2 edges of smaller weight in each subgraph

 [Figure: an edge blocking two lighter edges in a weight-class subgraph]

 • We lose another factor of 2 because we do not consider the exact weights

Other algorithms

   Based on the same intuition:

 • 2-approximation for vertex cover

 • 3/2-approximation for edge cover

Minimum cut

 • Partitioning does not work, because we lose structural information

 • Sampling does not seem to work either

 • We can use the first steps of Karger's algorithm as a filtering technique

 • The random choices made in the early rounds succeed with high probability, whereas the later rounds have a much lower probability of success

 • RANDOM WEIGHT: assign a random weight to every edge

 • COMPRESS EDGES WITH SMALL WEIGHT: contract the edges of small weight, yielding a much smaller graph; this is done in parallel on independent copies of the graph

   From Karger, at least one compressed graph will contain a min cut w.h.p.

 • COMPUTE MINIMUM CUT: compute the minimum cut of each compressed graph on a single machine

   We will find the minimum cut w.h.p.

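A sketch of one such filtering copy, assuming nodes 0..n-1; contracting edges in a uniformly random order is equivalent to assigning random weights and compressing the smallest-weight edges first, and the stopping threshold target_size is illustrative. Running this on many independent copies and solving each small compressed graph on one machine recovers the minimum cut w.h.p., as the slides state.

```python
import random

def partial_contraction(n, edges, target_size):
    """First steps of Karger's algorithm: contract edges in random order
    until only target_size super-nodes remain; returns the edges of the
    compressed multigraph (self-loops vanish)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    remaining = n
    for u, v in sorted(edges, key=lambda e: random.random()):
        if remaining <= target_size:
            break
        ru, rv = find(u), find(v)
        if ru != rv:                      # contract one low-weight edge
            parent[ru] = rv
            remaining -= 1
    # Edges between distinct super-nodes survive the compression.
    return [(find(u), find(v)) for u, v in edges if find(u) != find(v)]
```
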
Empirical result (matching)

 [Figure: running time of the parallel algorithm vs. a streaming baseline as a function of the edge sampling probability p; legend: "Parallel Algorithm", "Streaming"]

Open problems

Open problems 1

 • Maximum matching

 • Shortest paths

 • Dynamic programming

Open problems 2

 • Algorithms for sparse graphs

 • Do connected components require more than 2 rounds?

 • Lower bounds

Thank you!

Contenu connexe

Dernier

DAKSHIN BIHAR GRAMIN BANK: REDEFINING THE DIGITAL BANKING EXPERIENCE WITH A U...
DAKSHIN BIHAR GRAMIN BANK: REDEFINING THE DIGITAL BANKING EXPERIENCE WITH A U...DAKSHIN BIHAR GRAMIN BANK: REDEFINING THE DIGITAL BANKING EXPERIENCE WITH A U...
DAKSHIN BIHAR GRAMIN BANK: REDEFINING THE DIGITAL BANKING EXPERIENCE WITH A U...Rishabh Aryan
 
西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造kbdhl05e
 
Design principles on typography in design
Design principles on typography in designDesign principles on typography in design
Design principles on typography in designnooreen17
 
Top 10 Modern Web Design Trends for 2025
Top 10 Modern Web Design Trends for 2025Top 10 Modern Web Design Trends for 2025
Top 10 Modern Web Design Trends for 2025Rndexperts
 
General Knowledge Quiz Game C++ CODE.pptx
General Knowledge Quiz Game C++ CODE.pptxGeneral Knowledge Quiz Game C++ CODE.pptx
General Knowledge Quiz Game C++ CODE.pptxmarckustrevion
 
原版1:1定制堪培拉大学毕业证(UC毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制堪培拉大学毕业证(UC毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制堪培拉大学毕业证(UC毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制堪培拉大学毕业证(UC毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
韩国SKKU学位证,成均馆大学毕业证书1:1制作
韩国SKKU学位证,成均馆大学毕业证书1:1制作韩国SKKU学位证,成均馆大学毕业证书1:1制作
韩国SKKU学位证,成均馆大学毕业证书1:1制作7tz4rjpd
 
Iconic Global Solution - web design, Digital Marketing services
Iconic Global Solution - web design, Digital Marketing servicesIconic Global Solution - web design, Digital Marketing services
Iconic Global Solution - web design, Digital Marketing servicesIconic global solution
 
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10uasjlagroup
 
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一diploma 1
 
1比1办理美国北卡罗莱纳州立大学毕业证成绩单pdf电子版制作修改
1比1办理美国北卡罗莱纳州立大学毕业证成绩单pdf电子版制作修改1比1办理美国北卡罗莱纳州立大学毕业证成绩单pdf电子版制作修改
1比1办理美国北卡罗莱纳州立大学毕业证成绩单pdf电子版制作修改yuu sss
 
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree 毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree ttt fff
 
group_15_empirya_p1projectIndustrial.pdf
group_15_empirya_p1projectIndustrial.pdfgroup_15_empirya_p1projectIndustrial.pdf
group_15_empirya_p1projectIndustrial.pdfneelspinoy
 
Apresentação Clamo Cristo -letra música Matheus Rizzo
Apresentação Clamo Cristo -letra música Matheus RizzoApresentação Clamo Cristo -letra música Matheus Rizzo
Apresentação Clamo Cristo -letra música Matheus RizzoCarolTelles6
 
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services DubaiDubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubaikojalkojal131
 
306MTAMount UCLA University Bachelor's Diploma in Social Media
306MTAMount UCLA University Bachelor's Diploma in Social Media306MTAMount UCLA University Bachelor's Diploma in Social Media
306MTAMount UCLA University Bachelor's Diploma in Social MediaD SSS
 
Pharmaceutical Packaging for the elderly.pdf
Pharmaceutical Packaging for the elderly.pdfPharmaceutical Packaging for the elderly.pdf
Pharmaceutical Packaging for the elderly.pdfAayushChavan5
 
Untitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxUntitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxmapanig881
 
Passbook project document_april_21__.pdf
Passbook project document_april_21__.pdfPassbook project document_april_21__.pdf
Passbook project document_april_21__.pdfvaibhavkanaujia
 

Dernier (20)

DAKSHIN BIHAR GRAMIN BANK: REDEFINING THE DIGITAL BANKING EXPERIENCE WITH A U...
DAKSHIN BIHAR GRAMIN BANK: REDEFINING THE DIGITAL BANKING EXPERIENCE WITH A U...DAKSHIN BIHAR GRAMIN BANK: REDEFINING THE DIGITAL BANKING EXPERIENCE WITH A U...
DAKSHIN BIHAR GRAMIN BANK: REDEFINING THE DIGITAL BANKING EXPERIENCE WITH A U...
 
西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造西北大学毕业证学位证成绩单-怎么样办伪造
西北大学毕业证学位证成绩单-怎么样办伪造
 
Design principles on typography in design
Design principles on typography in designDesign principles on typography in design
Design principles on typography in design
 
Top 10 Modern Web Design Trends for 2025
Top 10 Modern Web Design Trends for 2025Top 10 Modern Web Design Trends for 2025
Top 10 Modern Web Design Trends for 2025
 
General Knowledge Quiz Game C++ CODE.pptx
General Knowledge Quiz Game C++ CODE.pptxGeneral Knowledge Quiz Game C++ CODE.pptx
General Knowledge Quiz Game C++ CODE.pptx
 
原版1:1定制堪培拉大学毕业证(UC毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制堪培拉大学毕业证(UC毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制堪培拉大学毕业证(UC毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制堪培拉大学毕业证(UC毕业证)#文凭成绩单#真实留信学历认证永久存档
 
韩国SKKU学位证,成均馆大学毕业证书1:1制作
韩国SKKU学位证,成均馆大学毕业证书1:1制作韩国SKKU学位证,成均馆大学毕业证书1:1制作
韩国SKKU学位证,成均馆大学毕业证书1:1制作
 
Iconic Global Solution - web design, Digital Marketing services
Iconic Global Solution - web design, Digital Marketing servicesIconic Global Solution - web design, Digital Marketing services
Iconic Global Solution - web design, Digital Marketing services
 
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
CREATING A POSITIVE SCHOOL CULTURE CHAPTER 10
 
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
办理(麻省罗威尔毕业证书)美国麻省大学罗威尔校区毕业证成绩单原版一比一
 
1比1办理美国北卡罗莱纳州立大学毕业证成绩单pdf电子版制作修改
1比1办理美国北卡罗莱纳州立大学毕业证成绩单pdf电子版制作修改1比1办理美国北卡罗莱纳州立大学毕业证成绩单pdf电子版制作修改
1比1办理美国北卡罗莱纳州立大学毕业证成绩单pdf电子版制作修改
 
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree 毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲弗林德斯大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
group_15_empirya_p1projectIndustrial.pdf
group_15_empirya_p1projectIndustrial.pdfgroup_15_empirya_p1projectIndustrial.pdf
group_15_empirya_p1projectIndustrial.pdf
 
Apresentação Clamo Cristo -letra música Matheus Rizzo
Apresentação Clamo Cristo -letra música Matheus RizzoApresentação Clamo Cristo -letra música Matheus Rizzo
Apresentação Clamo Cristo -letra música Matheus Rizzo
 
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
原版美国亚利桑那州立大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services DubaiDubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
Dubai Calls Girl Tapes O525547819 Real Tapes Escort Services Dubai
 
306MTAMount UCLA University Bachelor's Diploma in Social Media
306MTAMount UCLA University Bachelor's Diploma in Social Media306MTAMount UCLA University Bachelor's Diploma in Social Media
306MTAMount UCLA University Bachelor's Diploma in Social Media
 
Pharmaceutical Packaging for the elderly.pdf
Pharmaceutical Packaging for the elderly.pdfPharmaceutical Packaging for the elderly.pdf
Pharmaceutical Packaging for the elderly.pdf
 
Untitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptxUntitled presedddddddddddddddddntation (1).pptx
Untitled presedddddddddddddddddntation (1).pptx
 
Passbook project document_april_21__.pdf
Passbook project document_april_21__.pdfPassbook project document_april_21__.pdf
Passbook project document_april_21__.pdf
 

En vedette

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Filtering: A Method for Solving Graph Problems in MapReduce

  • 1. Filtering: A Method for Solving Graph Problems in MapReduce. Benjamin Moseley (UIUC), Silvio Lattanzi (Google), Siddharth Suri (Yahoo!), Sergei Vassilvitskii (Yahoo!). Large Scale Distributed Computation, Jan 2012.
  • 2. Overview • Introduction to the MapReduce model • Our settings • Our results • Open questions
  • 3. Introduction to the MapReduce model
  • 4-5. XXL Data • Huge amounts of data • The main problem is to analyze the information quickly • We need new tools • ... and suitable, efficient algorithms
  • 6-9. MapReduce • MapReduce is the platform of choice for processing massive data • Data are represented as < key, value > tuples • The mapper decides how data is distributed • The reducer performs non-trivial computation locally [diagram: INPUT → MAP PHASE → REDUCE PHASE]
  • 10. How can we model MapReduce? PRAM: • no limit on the number of processors • memory is uniformly accessible from any processor • no limit on the memory available
  • 11. How can we model MapReduce? STREAMING: • just one processor • a limited amount of memory • no parallelization
  • 12-15. MapReduce model [Karloff, Suri and Vassilvitskii] • N is the input size and ε > 0 is some fixed constant • Fewer than N^{1-ε} machines • Less than N^{1-ε} memory on each machine • MRC^i: the class of problems that can be solved in O(log^i N) rounds
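To make the round structure concrete, here is a minimal Python simulation of one map/shuffle/reduce round. It is a sketch of the model only, not code from the talk; the names `mapreduce_round`, `mapper`, and `reducer` are our own.

```python
from collections import defaultdict

def mapreduce_round(records, mapper, reducer):
    """Simulate one MapReduce round: map, shuffle by key, reduce."""
    # Map phase: each record is mapped independently (in parallel on a
    # real cluster) to a list of <key, value> tuples.
    shuffle = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffle[key].append(value)
    # Reduce phase: each reducer sees every value for one key and
    # performs a non-trivial sequential computation locally.
    output = []
    for key, values in shuffle.items():
        output.extend(reducer(key, values))
    return output

# Toy usage: compute node degrees from an edge list.
edges = [(1, 2), (1, 3), (2, 3)]
degrees = mapreduce_round(
    edges,
    mapper=lambda e: [(e[0], 1), (e[1], 1)],
    reducer=lambda node, ones: [(node, sum(ones))],
)
print(degrees)  # [(1, 2), (2, 2), (3, 2)]
```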
  • 16-18. Combining map and reduce phase • Mapper and reducer work only on a subgraph • Keep the time in each round polynomial • Running time is constrained by the number of rounds
  • 19-21. Algorithmic challenges • No machine can see the entire input • No communication between machines during a phase • The total memory is N^{2-2ε}
  • 22. MapReduce vs MUD algorithms • In the MUD framework each reducer operates on a stream of data • In MUD, each reducer is restricted to only using polylogarithmic space
  • 23-25. Our settings • We study the Karloff, Suri and Vassilvitskii model • We focus on the class MRC^0
  • 26-27. Our settings • We assume we work with dense graphs: m = n^{1+c} for some constant c > 0 • There is empirical evidence that social networks are dense graphs [Leskovec, Kleinberg and Faloutsos]
  • 28-30. Dense graph motivation [Leskovec, Kleinberg and Faloutsos] • They study 9 different social networks • They show that several graphs from different domains have n^{1+c} edges • The lowest value of c found is 0.08, and four graphs have c ≥ 0.5
  • 31. Our results
  • 32. Results • Constant-round algorithms for: • maximal matching • minimum cut • 8-approximation for maximum weighted matching • 2-approximation for vertex cover • 3/2-approximation for edge cover
  • 33. Notation • G = (V, E): input graph • n: number of nodes • m: number of edges • η: memory available on each machine • N: input size
  • 34-36. Filtering • Part of the input is dropped, or filtered, in the first stage in parallel • Next, some computation is performed on the filtered input • Finally, some patchwork is done to ensure a proper solution
  • 37-41. Filtering [diagram: FILTER → COMPUTE SOLUTION → PATCHWORK ON REST OF THE GRAPH] — a generic skeleton of this template is sketched below.
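The template can be written as a short skeleton. This is a hypothetical sketch of the general shape only; `filter_step`, `solve`, and `patch` stand in for the problem-specific pieces described on the following slides.

```python
def filtering_template(graph, filter_step, solve, patch):
    """Generic filtering shape (sketch): drop part of the input in
    parallel, solve on the small filtered input on one machine, then
    patch the partial solution into a solution for the whole graph."""
    filtered = filter_step(graph)   # parallel: most of the input is dropped
    partial = solve(filtered)       # sequential: fits on a single machine
    return patch(graph, partial)    # fix up the rest of the graph
```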
  • 42-44. Warm-up: compute the minimum spanning tree (m = n^{1+c}) • PARTITION the EDGES across n^{c/2} machines, so each machine holds n^{1+c/2} edges
  • 45-47. Warm-up: compute the minimum spanning tree • COMPUTE an MST on each of the n^{c/2} machines • SECOND ROUND: the n^{c/2} partial MSTs, at most n^{1+c/2} edges in total, are collected on one machine, which computes the final MST
  • 48-52. Show that the algorithm is in MRC^0 • The algorithm is correct: no edge of the final solution is discarded in a partial solution • The algorithm runs in two rounds • No more than O(n^{c/2}) machines are used • No machine uses more than O(n^{1+c/2}) memory
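A single-process sketch of the two rounds under these bounds; the `DSU` and `kruskal` helpers are standard textbook pieces, not code from the talk.

```python
import random

class DSU:
    """Union-find, used by Kruskal's algorithm."""
    def __init__(self, nodes):
        self.parent = {v: v for v in nodes}
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def kruskal(nodes, edges):
    """Sequential minimum spanning forest on one machine."""
    dsu = DSU(nodes)
    return [e for e in sorted(edges, key=lambda e: e[2])
            if dsu.union(e[0], e[1])]

def filtered_mst(nodes, edges, machines):
    """Two-round filtering MST: partition the edges across machines, keep
    only each partition's local MSF edges, and finish on one machine.
    An edge discarded locally is the heaviest edge on some cycle, so by
    the cycle rule it is never in the global MST."""
    edges = list(edges)
    random.shuffle(edges)
    parts = [edges[i::machines] for i in range(machines)]
    survivors = [e for part in parts for e in kruskal(nodes, part)]  # round 1
    return kruskal(nodes, survivors)                                 # round 2
```

Each machine returns at most n - 1 edges in round one, so round two receives O(n^{1+c/2}) edges, matching the memory bound above.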
  • 53-54. Maximal matching • The algorithmic difficulty is that each machine can only see the edges assigned to it • Is a partitioning-based algorithm feasible?
  • 55-58. Partitioning algorithm [diagram: PARTITION the edges, find a matching per machine, then COMBINE MATCHINGS] • It is impossible to build a maximal matching using only the red (locally matched) edges!
  • 59-61. Algorithmic insight • Consider any subset E′ of the edges • Let M be a maximal matching on G[E′] • The vertices left unmatched by M form an independent set in G[E′]
  • 62-65. Algorithmic insight • We pick each edge with probability p • Find a matching on the sample, and then find a matching on the unmatched vertices • For dense portions of the graph, many edges should be sampled • Sparse portions of the graph are small and can be placed on a single machine
  • 66-69. Algorithm • Sample each edge independently with probability p = 10 log n / n^{c/2} • Find a matching on the sample • Consider the induced subgraph on the unmatched vertices • Find a matching on this graph and output the union
  • 70-74. Algorithm [diagram: SAMPLE → FIRST MATCHING → MATCHING ON UNMATCHED NODES → UNION] — see the sketch below.
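A hedged single-process sketch of the three rounds, assuming m = n^{1+c}; the function names are ours, and the sequential matching step is written out explicitly.

```python
import math
import random

def greedy_maximal_matching(edges):
    """Sequential maximal matching on a single machine."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def sampled_maximal_matching(n, edges, c):
    """Sketch of the sampling algorithm: sample each edge independently
    with p = 10 log n / n^(c/2), match the sample on one machine, then
    match the induced subgraph on the still-unmatched vertices."""
    p = min(1.0, 10 * math.log(n) / n ** (c / 2))
    sample = [e for e in edges if random.random() < p]        # round 1
    m1 = greedy_maximal_matching(sample)                      # round 2
    matched = {v for e in m1 for v in e}
    leftover = [(u, v) for u, v in edges
                if u not in matched and v not in matched]
    # W.h.p. the leftover induced subgraph has fewer than n^(1+c/2)
    # edges, so it fits on a single machine.
    return m1 + greedy_maximal_matching(leftover)             # round 3
```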
  • 75-77. Correctness • Consider the last step of the algorithm • All unmatched vertices are placed on one machine • After the last step, every vertex is either matched or has only matched neighbors
  • 78-80. Bounding the rounds • Three rounds: • sampling the edges • finding a matching on a single machine for the sampled edges • finding a matching for the unmatched vertices
  • 81-83. Bounding the memory • Each edge was sampled with probability p = 10 log n / n^{c/2} • By a Chernoff bound, the sampled graph has size Õ(n^{1+c/2}) with high probability • Can we bound the size of the induced subgraph on the unmatched vertices?
  • 84-85. Bounding the memory • For a fixed induced subgraph with at least n^{1+c/2} edges, the probability that none of its edges is sampled is (1 - 10 log n / n^{c/2})^{n^{1+c/2}} ≤ exp(-10 n log n) • Union bounding over all 2^n induced subgraphs shows that at least one edge is sampled from every such dense subgraph with probability at least 1 - 2^n exp(-10 n log n) ≥ 1 - 1/n^{10}
  • 86-87. Using less memory • Can we run the algorithm with less than Õ(n^{1+c/2}) memory? • We can amplify our sampling technique!
  • 88-92. Using less memory • Sample as many edges as fit in memory • Find a matching • If the edges between unmatched vertices fit onto a single machine, find a matching on those vertices • Otherwise, recurse on the unmatched nodes • With n^{1+ε} memory, each iteration removes a factor of n^{ε} of the edges, resulting in O(c/ε) rounds (a sketch follows)
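A sketch of the recursive variant, reusing `greedy_maximal_matching` from the sketch above; `memory_limit` plays the role of η, and the uniform sub-sampling is our simplification.

```python
import random

def low_memory_matching(edges, memory_limit):
    """Repeatedly sample as many edges as fit in memory, match them, and
    drop every edge incident to a matched vertex; once the leftover
    edges fit on one machine, finish there. With n^(1+eps) memory each
    pass shrinks the edge set by a polynomial factor, so the loop runs
    O(c/eps) times."""
    edges = list(edges)
    matching = []
    while len(edges) > memory_limit:
        sample = random.sample(edges, memory_limit)   # fill one machine
        matching += greedy_maximal_matching(sample)
        matched = {v for e in matching for v in e}
        edges = [(u, v) for u, v in edges
                 if u not in matched and v not in matched]
    return matching + greedy_maximal_matching(edges)  # final local round
```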
  • 93-95. Parallel computation power • The maximal matching algorithm does not use parallelization: we use a single machine in every step • When is parallelization used?
  • 96-101. Maximum weighted matching [diagram: an example weighted graph; SPLIT THE GRAPH into subgraphs by edge weight, compute MATCHINGS on each subgraph in parallel, then SCAN THE EDGES SEQUENTIALLY, from the heaviest matching down, to combine them]
  • 102. Bounding the rounds and memory • Four rounds in total: • splitting the graph and running the maximal matching: 3 rounds and Õ(n^{1+c/2}) memory • computing the final solution: 1 round and O(n log n) memory
  • 103-104. Approximation guarantee (sketch) • An edge in the solution can block at most 2 edges of smaller weight in each subgraph • We lose a factor of 2 because we do not consider the weights (a hedged sketch of the whole algorithm follows)
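A sketch consistent with these slides, reusing `greedy_maximal_matching` from above. Splitting the edges into geometric weight classes (powers of 2) is our assumption about the exact splitting rule, and we assume positive weights; treat this as an illustration of the shape of the algorithm, not the paper's exact procedure.

```python
import math

def approx_weighted_matching(edges):
    """Sketch of the constant-approximate weighted matching: split the
    edges into weight classes, find a maximal matching inside each class
    (in parallel on a real cluster), then greedily combine the surviving
    edges by scanning from the heaviest class down."""
    classes = {}
    for u, v, w in edges:                 # w > 0 assumed
        classes.setdefault(int(math.log2(w)), []).append((u, v))
    matched, matching = set(), []
    for level in sorted(classes, reverse=True):   # heaviest class first
        local = greedy_maximal_matching(classes[level])
        for u, v in local:                        # sequential final scan
            if u not in matched and v not in matched:
                matching.append((u, v))
                matched.update((u, v))
    return matching
```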
  • 105. Other algorithms • Based on the same intuition: • 2-approximation for vertex cover • 3/2-approximation for edge cover
  • 106-109. Minimum cut • Partitioning does not work, because we lose structural information • Sampling does not seem to work either • We can use the first steps of Karger's algorithm as a filtering technique • The random choices made in the early rounds succeed with high probability, whereas the later rounds have a much lower probability of success
  • 110-113. Minimum cut [diagram: assign each edge a RANDOM WEIGHT, then COMPRESS the EDGES WITH SMALL WEIGHT, producing several smaller compressed graphs] • From Karger, at least one compressed graph will contain a min cut w.h.p.
  • 114-115. Minimum cut [diagram: COMPUTE the MINIMUM CUT on each compressed graph] • We will find the minimum cut w.h.p. (a contraction sketch follows)
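For concreteness, a sketch of Karger-style contraction used as a filtering step. The slides implement it by giving edges random weights and compressing the lightest ones in parallel; the sketch below uses the equivalent one-edge-at-a-time contraction, and `contract_to`, `target`, and `trials` are our names, not the paper's.

```python
import random

def contract_to(num_nodes, edges, target):
    """Contract uniformly random edges until only `target` super-nodes
    remain; return the compressed multigraph with self-loops removed.
    Stopping early (target >> 2) is the 'safe' filtering phase."""
    parent = list(range(num_nodes))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    remaining = num_nodes
    while remaining > target and edges:
        u, v = random.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:                      # merge two super-nodes
            parent[ru] = rv
            remaining -= 1
            edges = [(a, b) for a, b in edges if find(a) != find(b)]
    return [(find(a), find(b)) for a, b in edges]

def karger_min_cut(num_nodes, edges, trials=200):
    """Finish on one machine: run full contraction many times and keep
    the smallest cut found; w.h.p. this is the minimum cut."""
    return min(len(contract_to(num_nodes, list(edges), 2))
               for _ in range(trials))
```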
  • 116. Empirical result (matching) [plot: empirical results for the matching algorithm as a function of the edge sampling probability; the remaining labels were garbled in extraction]
  • 117. Open problems
  • 118. Open problems 1 • Maximum matching • Shortest path • Dynamic programming
  • 119. Open problems 2 • Algorithms for sparse graphs • Do connected components require more than 2 rounds? • Lower bounds
  • 120. Thank you!