Fractality of Massive Graphs:
Scalable Analysis with
Sketch-Based Box-Covering
Algorithm
Takuya Akiba (Preferred Networks, Inc.)
Kenko Nakamura (Recruit Communications., Ltd.)
Taro Takaguchi (National Institute of Information and
Communications Technology)
*Work done while all authors were at National Institute of Informatics
Akiba+ | Fractality of Massive Graphs: Scalable Analysis with Sketch-Based Box-Covering Algorithm
Fractality of networks
Some real-world
networks are fractal.
[Song+, Nature’05]
▶ box := set of vertices within a radius of ℓ
▶ b(ℓ) := number of boxes needed to cover the whole graph
▶ a graph is said to be fractal ⇔ b(ℓ) ∝ ℓ^{−d}
Definition of Graph Fractality
← Fractal network model
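
As a rough illustration of the definition, the exponent d can be estimated from measured box counts by a straight-line fit in log-log space, since b(ℓ) ∝ ℓ^{−d} means log b(ℓ) is linear in log ℓ with slope −d. This is a minimal sketch and not the authors' fitting procedure; the function name `fit_fractal_dimension` and the sample values are hypothetical.

```python
import math

def fit_fractal_dimension(radii, box_counts):
    """Least-squares fit of log b = log c - d * log l; returns (d, c)."""
    xs = [math.log(l) for l in radii]
    ys = [math.log(b) for b in box_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The slope in log-log space is -d, so negate it.
    return -slope, math.exp(intercept)

# Hypothetical box counts that follow b(l) = 1000 * l^(-2) exactly.
radii = [1, 2, 4, 8]
counts = [1000 / l ** 2 for l in radii]
d, c = fit_fractal_dimension(radii, counts)
```

On real data the points only approximately follow a line, so the quality of the fit (for example against an exponential alternative) matters as much as the fitted exponent.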
▶ b(ℓ) := number of boxes needed to cover the whole graph
Box-Covering Problem
Box-Covering Problem: Determination of the fractality
▶ Minimize b(ℓ)
▶ Box-Covering Problem is NP-Hard
▶ Approximation algorithms are used
Box-Covering Problem
Previous Algorithms
computation time is too long!
infeasible for networks with millions of vertices
This Work
near-linear time complexity
works with tens of millions of vertices
Compared with Previous Method
Previous Naive Method [Song+’05]
▶ Step 1: Instantiate all boxes
BFS from each vertex
▶ Step 2: Solve set cover problem
Greedy algorithm with approximation ratio 1 + ln n
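
The two steps of the naive method can be sketched as follows. This is an illustrative reading of the slide, not the code of [Song+’05]: the graph is assumed to be an adjacency dictionary, a box is taken to be the ball of a given radius around a vertex, and the names `ball` and `greedy_box_cover` are hypothetical.

```python
from collections import deque

def ball(adj, source, radius):
    """Vertices within `radius` hops of `source`, found by a truncated BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def greedy_box_cover(adj, radius):
    """Greedy set cover over all balls; returns the chosen box centers."""
    # Step 1: instantiate every box explicitly (quadratic space in the worst case).
    boxes = {v: ball(adj, v, radius) for v in adj}
    # Step 2: repeatedly pick the box covering the most uncovered vertices.
    uncovered = set(adj)
    centers = []
    while uncovered:
        best = max(boxes, key=lambda v: len(boxes[v] & uncovered))
        centers.append(best)
        uncovered -= boxes[best]
    return centers

# A path graph 0-1-2-3-4-5-6 as a small example.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
centers = greedy_box_cover(path, radius=1)
```

Storing every ball and rescanning all boxes in each round is what makes this approach impractical on large graphs.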
Proposed Method
▶ Step 1: Instantiate Min-Hash of all boxes
Similar to algorithms for All-Distances Sketches
▶ Step 2: Solve set cover problem in the sketch-space
Near-linear time complexity by using BST and Heap
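
To give a feel for what a Min-Hash sketch of a box looks like, the following is a generic bottom-k sketch under stated assumptions: each vertex gets a random rank, and a set is summarized by its k smallest ranks. It is not the authors' construction (their method builds the sketches without materializing the boxes, which this example does not attempt), and the names `make_ranks`, `bottom_k`, `merge`, and `estimate_size` are hypothetical.

```python
import heapq
import random

def make_ranks(vertices, seed=0):
    """Assign each vertex an independent uniform random rank in [0, 1)."""
    rng = random.Random(seed)
    return {v: rng.random() for v in vertices}

def bottom_k(members, ranks, k):
    """Sorted list of the k smallest ranks among `members`."""
    return heapq.nsmallest(k, (ranks[v] for v in members))

def merge(sketch_a, sketch_b, k):
    """Sketch of the union of two sets, computed from their sketches alone."""
    return sorted(set(sketch_a) | set(sketch_b))[:k]

def estimate_size(sketch, k):
    """Exact when fewer than k ranks are stored; otherwise (k - 1) / k-th smallest rank."""
    if len(sketch) < k:
        return len(sketch)
    return (k - 1) / sketch[-1]

ranks = make_ranks(range(100))
a = bottom_k(range(0, 60), ranks, k=8)
b = bottom_k(range(40, 100), ranks, k=8)
union_sketch = merge(a, b, k=8)  # same as sketching the union directly
```

Because sketches of unions can be formed from sketches alone, coverage can be tracked and estimated in the sketch space with O(k) data per box instead of the full vertex sets.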
Experimental Results
Computation Time
Memory Usage
Environment:
Intel Xeon 2.67GHz, 96GB
At least 10 times faster than the previous algorithms
Panels: Flower model, BA model
Real Large Network
▶ Web graph with 1M vertices and 17M edges (in-2004)
– 11.7 hours in total
▶ Fractality analysis of a million-scale network for the first time
Summary
Background: Fractality of real-world networks
▶ Some real-world networks are fractal.
▶ Lack of an efficient algorithm
Proposed Method: Box-Covering on Min-Hash
▶ Avoid explicit representation of boxes
▶ Efficient Min-Hash computation: Similar to ADS
▶ Efficient Greedy by Binary Search Tree and Heap
▶ Fractality analysis of a network with 17M edges

Editor's Notes

  1. Welcome to my presentation. I am Kenko Nakamura, a software engineer at Recruit Communications. Today, I would like to talk about Fractality of Massive Graphs and Scalable Analysis with Sketch-Based Box-Covering Algorithm.
  2. For data mining on networks, we can use many kinds of network properties, such as vertex degree, average distance, and so on. As a non-local property, the fractality of complex networks was discovered in network science. The fractality of a network suggests that the network shows a self-similar structure, like the one in this figure.
  3. This is the definition of graph fractality. The set of vertices within a radius of ℓ is called a “box”. Then, if the number of boxes follows a power-law function of ℓ, the network is said to be fractal. This figure illustrates the comparison for a fractal network model. The plotted points of the numbers of boxes are closer to the power-law function than to the exponential function.
  4. Determination of the fractality is based on the box-covering problem. We have to minimize the number of boxes. However, it is known to be an NP-hard problem. So, to determine the fractality of networks, approximation algorithms are used.
  5. In previous algorithms, the computation time is too long. Because they generate all boxes explicitly, which takes quadratic space, they are infeasible for large-scale networks with millions of vertices. In this work, our algorithm achieves near-linear time complexity and works with tens of millions of vertices.
  6. Compared with the previous method, there are two differences. First, in our method all boxes are generated as Min-Hash sketches. This generation algorithm is similar to the one used for All-Distances Sketches. Second, the set cover problem is solved in the sketch space.
  7. These are the experimental results compared with previous methods. Our method is shown as the red lines. The figures are plotted in log-log scale. The left figures are for fractal networks, and the right figures are for non-fractal networks. They show that our algorithm can run at least 10 times faster than the previous algorithms.
  8. This is the experimental result for a large real-world network. This network is a crawled web graph with 1M vertices and 17M edges. A large part of the points fall on the line of the fitted power-law function, which suggests the fractality of this network. The fractality of a million-scale network is unveiled for the first time.