Graph Clustering
Based on Structural/Attribute Similarities
         Yang Zhou, Hong Cheng, Jeffrey Xu Yu

     Proc. of the VLDB Endowment, France, 2009




                 Thursday, August 16, 2012



                           Presenter
                     Waqas Nawaz

   Data and Knowledge Engineering Lab, Kyung Hee University, Korea
Agenda




                                              3/8
Data and Knowledge Engineering Lab            2
Introduction
 X = {x_1, …, x_N}: a set of data points
 S = (s_ij), i, j = 1, …, N: the similarity matrix, in which each element s_ij indicates the
  similarity between two data points x_i and x_j

 The goal of clustering is to divide the data points into several groups such that
  points in the same group are similar and points in different groups are dissimilar.

 Modeling the dataset as a graph

 From the graph perspective, the clustering problem is then formulated as a partition of
  the graph such that nodes in the same sub-graph are densely connected/homogeneous,
  while nodes in different sub-graphs are sparsely connected/heterogeneous.

 Distances and similarities are inversely related. The following slides talk only
  about similarities, but everything also works with distances.
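
A minimal sketch of the setup above, under assumptions the slides do not state: similarities are derived from Euclidean distances with a Gaussian kernel, and the graph keeps an edge wherever the similarity reaches a threshold. The names `similarity_matrix`, `similarity_graph`, `sigma` and `threshold` are hypothetical.

```python
import math

def similarity_matrix(points, sigma=1.0):
    """Gaussian similarity s_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)) (assumed kernel)."""
    n = len(points)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            S[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return S

def similarity_graph(S, threshold):
    """Model the dataset as a graph: keep edge (i, j) when s_ij >= threshold."""
    n = len(S)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if S[i][j] >= threshold}

# Two nearby points and one far away point (illustrative data only).
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
S = similarity_matrix(points)
edges = similarity_graph(S, threshold=0.5)
```

A small distance gives a similarity near 1 and a large distance gives a similarity near 0, which is the inverse relation the slide refers to.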



Motivation

 The identification of clusters, i.e. well-connected components in a
  graph, is useful in many applications, from biological
  function prediction to social community detection

[Figure: network of authors annotated with attributes of the authors; from manyeyes.alphaworks.ibm.com]
Objective

 A desired clustering of attributed graph should achieve a good
  balance between the following:

    Structural cohesiveness: Vertices within one cluster are close to each
     other in terms of structure, while vertices between clusters are
     distant from each other

    Attribute homogeneity: Vertices within one cluster have similar
     attribute values, while vertices between clusters have quite different
     attribute values


[Figure: structural cohesiveness balanced against attribute homogeneity]
Related Work

 Structure Based Clustering
    Normalized cuts [Shi and Malik, TPAMI 2000]
    Modularity [Newman and Girvan, Phys. Rev. E 2004]
    SCAN [Xu et al., KDD'07]
  The clusters generated have a rather random distribution of vertex
  properties within clusters

 Attribute Based Clustering
    K-SNAP [Tian et al., SIGMOD’08]
    Attributes compatible grouping
  The clusters generated have a rather loose intra-cluster structure

  Is there any way to consider both factors (structure and attribute)
  simultaneously while clustering? Yes.

Graph Clustering with Structure & Attribute (1/11)

 Structure-based Clustering
    Vertices with heterogeneous values in a cluster

 Attribute-based Clustering
    Loses much structure information

 Structural/Attribute Cluster
    Vertices with homogeneous values in a cluster
    Keep most structure information




Graph Clustering with Structure & Attribute (2/11)
 Example: A Coauthor Network

[Figure: coauthor network of authors r1 to r11 labeled with research topics, shown under three groupings: Attribute-based Cluster, Structural Clustering, and Structural/Attribute Cluster]
    r1, r2, r4, r5, r6, r7, r8: XML
    r3: XML, Skyline
    r9, r10, r11: Skyline
Graph Clustering with Structure & Attribute (3/11)

 Proposed Idea: Flow Diagram
    1. Start from the original graph G
    2. Transform vertex attributes to attribute edges, giving the augmented graph Ga
    3. A unified distance on edges
    4. Clustering on Ga
    5. Mapping onto the original graph
    6. Clustering on G: the desired clusters
Graph Clustering with Structure & Attribute (4/11)

 Attribute Augmented Coauthor Graph with Topics
[Figure: the coauthor network of authors r1 to r11 with their topics (XML, Skyline), shown as the original graph and as the modified graph augmented with topic vertices]

 Then we use the neighborhood random walk distance on the augmented
  graph to combine structural and attribute similarities
Neighborhood Random Walk (1/2)

[Figure: example graph on vertices A, B and C, shown with its adjacency matrix A (edge weights 1) and its transition matrix P (edge probabilities 1 and 1/2)]
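
A minimal sketch of how an adjacency matrix becomes a transition matrix: each row is divided by its sum, so the outgoing probabilities of every vertex add up to 1. The edge set below (A→B, B→A, B→C, C→A) is an assumption chosen only to reproduce edge labels 1 and 1/2; the graph actually drawn on the slide is not recoverable from the transcript.

```python
def transition_matrix(adj):
    """Row-normalise a weighted adjacency matrix into transition probabilities."""
    P = []
    for row in adj:
        total = sum(row)
        # A vertex with no outgoing edge keeps an all-zero row.
        P.append([w / total if total else 0.0 for w in row])
    return P

# Hypothetical directed graph on A, B, C with unit edge weights.
adj = [
    [0, 1, 0],  # A -> B
    [1, 0, 1],  # B -> A, B -> C
    [1, 0, 0],  # C -> A
]
P = transition_matrix(adj)
```

Vertex B has two outgoing edges, so each receives probability 1/2, while A and C each follow their single edge with probability 1.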
Neighborhood Random Walk (2/2)


[Figure: the example graph on vertices A, B and C at random walk steps t = 0, 1, 2 and 3, with transition probabilities 1 and 1/2 on its edges]
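
The step-by-step walk can be sketched as repeated multiplication of a probability distribution by the transition matrix. The matrix `P` below is a hypothetical three-vertex example (A→B, B→A or C, C→A), not necessarily the graph drawn on the slide, and the walker is assumed to start at A.

```python
def step(dist, P):
    """One random-walk step: new[j] = sum over i of dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# Hypothetical transition matrix on vertices A, B, C.
P = [
    [0.0, 1.0, 0.0],  # A -> B
    [0.5, 0.0, 0.5],  # B -> A or C
    [1.0, 0.0, 0.0],  # C -> A
]

dist = [1.0, 0.0, 0.0]  # t = 0: the walker is at A
history = [dist]
for t in range(3):      # t = 1, 2, 3
    dist = step(dist, P)
    history.append(dist)
```

Each entry of `history[t]` is the probability of standing on that vertex after t steps, so every distribution sums to 1.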
Graph Clustering with Structure & Attribute (5/11)

 The Kinds of Vertices and Edges
    Two kinds of vertices
         • The Structure Vertex Set V
         • The Attribute Vertex Set Va


    Two kinds of edges
         • The structure edges E
         • The attribute edges Ea


    The attribute augmented graph: Ga = (V ∪ Va, E ∪ Ea)
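
A minimal sketch of the augmentation step, assuming each attribute value becomes one attribute vertex and each structure vertex gets an attribute edge to every value it holds. The function name `augment`, the dictionary layout and the tiny coauthor sample are hypothetical.

```python
def augment(vertices, attributes):
    """Return (Va, Ea): attribute vertices and attribute edges.

    attributes maps a structure vertex to {attribute name: list of values}.
    """
    attr_vertices = set()
    attr_edges = set()
    for v in vertices:
        for name, values in attributes.get(v, {}).items():
            for value in values:
                av = (name, value)        # one attribute vertex per (attribute, value)
                attr_vertices.add(av)
                attr_edges.add((v, av))   # attribute edge from v to its value
    return attr_vertices, attr_edges

# Small sample in the spirit of the coauthor example (structure edges are assumed).
V = ["r1", "r3", "r9"]
E = {("r1", "r3"), ("r3", "r9")}
topics = {
    "r1": {"topic": ["XML"]},
    "r3": {"topic": ["XML", "Skyline"]},
    "r9": {"topic": ["Skyline"]},
}
Va, Ea = augment(V, topics)
augmented_vertices = set(V) | Va
augmented_edges = E | Ea
```

Two authors that share a topic become connected through the shared attribute vertex even when no coauthor edge links them directly.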




Graph Clustering with Structure & Attribute (6/11)

 New Clustering Framework
    1. Calculate the distance
    2. Initialize the cluster centroids
    3. Assign vertices to a cluster
    4. Update the cluster centroids
    5. Adjust edge weights automatically
    6. Re-calculate the distance matrix
    7. Repeat from step 3 until the objective function converges


Graph Clustering with Structure & Attribute (7/11)

 Transition Probability Matrix on Attribute Augmented Graph




      PA = [ PV  A ; B  O ]: the transition matrix in block form, where
      PV: probabilities from structure vertices to structure vertices
      A: probabilities from structure vertices to attribute vertices
      B: probabilities from attribute vertices to structure vertices
      O: probabilities from attribute vertices to attribute vertices; all entries are zero
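
A sketch of how the four blocks could be filled in. The probability formulas are assumptions based on one reading of the paper, since the slide shows only the block names: a structure vertex with neighbour set N(v) moves along a structure edge with probability w0 / (|N(v)|·w0 + w1 + … + wm) and to its value of attribute i with probability wi divided by the same total, while an attribute vertex moves uniformly to the structure vertices that hold its value. All function and variable names are hypothetical.

```python
def augmented_transition(struct_nbrs, attr_values, weights):
    """Transition probabilities on the augmented graph as a dict of rows.

    struct_nbrs: structure vertex -> list of structure neighbours
    attr_values: structure vertex -> list with one value per attribute
    weights:     [w0, w1, ..., wm], w0 for structure edges
    """
    w0, attr_w = weights[0], weights[1:]
    P = {}
    holders = {}  # attribute vertex -> structure vertices holding that value
    for v, values in attr_values.items():
        for i, value in enumerate(values):
            holders.setdefault((i, value), []).append(v)
    for v, nbrs in struct_nbrs.items():
        total = len(nbrs) * w0 + sum(attr_w)
        row = {u: w0 / total for u in nbrs}          # block PV
        for i, value in enumerate(attr_values[v]):
            row[(i, value)] = attr_w[i] / total      # block A
        P[v] = row
    for av, vs in holders.items():
        P[av] = {v: 1.0 / len(vs) for v in vs}       # block B; block O stays empty
    return P

# Illustrative data: a path r1 - r2 - r3 with a single attribute (topic).
struct_nbrs = {"r1": ["r2"], "r2": ["r1", "r3"], "r3": ["r2"]}
attr_values = {"r1": ["XML"], "r2": ["XML"], "r3": ["Skyline"]}
P = augmented_transition(struct_nbrs, attr_values, weights=[1.0, 1.0])
```

Attribute vertices are keyed as `(attribute index, value)` tuples, so no row of an attribute vertex ever points at another attribute vertex, matching the all-zero block O.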

Graph Clustering with Structure & Attribute (8/11)

 A Unified Distance Measure
    The unified neighborhood random walk distance:


    The matrix form of the neighborhood random walk distance:
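
The formula itself is missing from the transcript. As an assumption based on one reading of the paper, the matrix form sums powers of the transition matrix, R = Σ_{γ=1..l} c(1−c)^γ P^γ, with restart probability c and maximum walk length l. The sketch below implements that assumed form; `random_walk_distance`, `c` and `max_len` are hypothetical names and the matrix `P` is an illustrative three-vertex example.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def random_walk_distance(P, c=0.2, max_len=3):
    """R = sum over gamma = 1..max_len of c * (1 - c)**gamma * P**gamma (assumed form)."""
    n = len(P)
    R = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in P]          # P**1
    for gamma in range(1, max_len + 1):
        coeff = c * (1 - c) ** gamma
        for i in range(n):
            for j in range(n):
                R[i][j] += coeff * power[i][j]
        power = matmul(power, P)           # next power of P
    return R

P = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
]
R = random_walk_distance(P, c=0.2, max_len=3)
R1 = random_walk_distance(P, c=0.5, max_len=1)
```

Under this form a larger entry means the two vertices are reached from each other more easily, so the measure behaves like a closeness score even though it is called a distance.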


 Cluster Centroid Initialization
    Identify good initial centroids from the density point of view
     [Hinneburg and Keim, KDD 1998]

    Influence function of vi on vj


    Density function of vi
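
The two functions are missing from the transcript. A sketch under an assumption based on one reading of the paper: the influence of vi on vj is 1 − exp(−d(vi, vj)² / (2σ²)), the density of vi is the sum of its influences over all vertices, and the k densest vertices become the initial centroids. The names `densities`, `initial_centroids` and `sigma` are hypothetical, as is the score matrix `R`.

```python
import math

def densities(R, sigma=1.0):
    """Density of each vertex: sum of its influence values over all vertices."""
    n = len(R)
    return [
        sum(1.0 - math.exp(-R[i][j] ** 2 / (2 * sigma ** 2)) for j in range(n))
        for i in range(n)
    ]

def initial_centroids(R, k, sigma=1.0):
    """Pick the k vertices with the highest density as initial centroids."""
    dens = densities(R, sigma)
    return sorted(range(len(R)), key=lambda i: dens[i], reverse=True)[:k]

# Illustrative random-walk scores between three vertices.
R = [
    [0.0, 0.3, 0.1],
    [0.3, 0.0, 0.05],
    [0.1, 0.05, 0.0],
]
seeds = initial_centroids(R, k=2)
```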

Graph Clustering with Structure & Attribute (9/11)

 Clustering Process (K-means framework)
    Assign each vertex vi ∈ V to its closest centroid c*

    Update the centroid with the most centrally located vertex in
     each cluster:
        • Compute the “average point” v̄i of a cluster Vi
        • Find the new centroid whose random walk distance vector is the closest to
          the cluster average
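
The two steps above can be sketched as follows, assuming a precomputed matrix `R` of random-walk scores in which a larger value means closer, and assuming Euclidean distance between score vectors when looking for the vertex nearest to the cluster average. Keeping each centroid in its own cluster is also an assumption of this sketch.

```python
def assign(R, centroids):
    """Assign every vertex to the centroid with the largest random-walk score."""
    clusters = {c: [] for c in centroids}
    for v in range(len(R)):
        # Assumption: a centroid always stays in its own cluster.
        best = v if v in clusters else max(centroids, key=lambda c: R[v][c])
        clusters[best].append(v)
    return clusters

def update_centroid(R, members):
    """Return the member whose score vector is closest to the cluster average."""
    n = len(R)
    avg = [sum(R[v][j] for v in members) / len(members) for j in range(n)]
    def gap(v):
        return sum((R[v][j] - avg[j]) ** 2 for j in range(n))
    return min(members, key=gap)

# Illustrative scores: vertices 0, 1 are close to each other, as are 2, 3.
R = [
    [0.0, 0.4, 0.01, 0.01],
    [0.4, 0.0, 0.01, 0.01],
    [0.01, 0.01, 0.0, 0.4],
    [0.01, 0.01, 0.4, 0.0],
]
clusters = assign(R, centroids=[0, 2])
new_centroids = [update_centroid(R, members) for members in clusters.values()]
```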




Graph Clustering with Structure & Attribute (10/11)

 Edge Weight Definition
    Different types of edges may have different degrees of importance
        • Structure edge weight ω0 is fixed to 1.0 in the whole clustering process
        • Attribute edge weights ωi for i = 1, 2, ..., m
        • All weights are initialized to 1.0, but will be automatically updated during clustering

  Example: “Topic” has a more important role than “age”




Graph Clustering with Structure & Attribute (11/11)

 Weight Self-Adjustment
    A vote mechanism determines whether two vertices share an
     attribute value:


    Weight Increment:




    How does the weight adjustment affect clustering convergence?
        • Objective function
        • The weights are shown to be adjusted towards the direction of
          clustering convergence as the clusters are iteratively refined.
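
The vote and increment formulas are missing from the transcript. A sketch under an assumption based on one reading of the paper: a vertex votes for attribute i when it shares the value of its cluster centroid on that attribute, the increment is Δωi = m · votes_i / Σ_p votes_p, and the new weight is the average of the old weight and the increment. The function name and the sample data are hypothetical.

```python
def update_weights(weights, clusters, attr_values):
    """One round of attribute weight self-adjustment (assumed update rule).

    weights:     current attribute weights [w1, ..., wm]
    clusters:    centroid -> list of member vertices
    attr_values: vertex -> list of m attribute values
    """
    m = len(weights)
    votes = [0] * m
    for centroid, members in clusters.items():
        for v in members:
            for i in range(m):
                # Vote: v shares the centroid's value on attribute i.
                if attr_values[v][i] == attr_values[centroid][i]:
                    votes[i] += 1
    total = sum(votes)
    if total == 0:
        return list(weights)  # no agreement at all: leave weights unchanged
    return [0.5 * (w + m * votes[i] / total) for i, w in enumerate(weights)]

# Illustrative data with two attributes: topic and age.
clusters = {"a": ["a", "b"], "c": ["c", "d"]}
attr_values = {
    "a": ["XML", 30],
    "b": ["XML", 40],
    "c": ["Skyline", 30],
    "d": ["Skyline", 50],
}
new_weights = update_weights([1.0, 1.0], clusters, attr_values)
```

In this sample every cluster agrees on topic but not on age, so the topic weight rises above 1.0 and the age weight falls below it, while the weights still sum to m.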



Experimental Evaluation (1/5)

 Datasets
    Political Blogs Dataset: 1490 vertices, 19090 edges, one
     attribute: political leaning
    DBLP Dataset: 5000 vertices, 16010 edges, two attributes:
     prolific and topic

 Methods
      K-SNAP [Tian et al., SIGMOD'08]: attribute only
      S-Cluster: structure-based clustering
      W-Cluster: weighted distance function
      SA-Cluster: proposed method




Experimental Evaluation (2/5)

 Evaluation Metrics
    Density: intra-cluster structural cohesiveness




    Entropy: intra-cluster attribute homogeneity
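
The metric formulas are missing from the transcript. A sketch under assumptions based on one reading of the paper: density is the fraction of all edges whose two endpoints fall in the same cluster, and entropy is the attribute-weighted, cluster-size-weighted Shannon entropy (base 2) of attribute values inside each cluster. Names and sample data are hypothetical.

```python
import math
from collections import Counter

def density(clusters, edges):
    """Fraction of edges whose two endpoints lie in the same cluster."""
    label = {v: k for k, members in enumerate(clusters) for v in members}
    inside = sum(1 for u, v in edges if label[u] == label[v])
    return inside / len(edges)

def entropy(clusters, attr_values, weights):
    """Weighted average of per-cluster, per-attribute value entropy."""
    n = sum(len(members) for members in clusters)
    wsum = sum(weights)
    total = 0.0
    for i, w in enumerate(weights):
        for members in clusters:
            counts = Counter(attr_values[v][i] for v in members)
            h = -sum((k / len(members)) * math.log2(k / len(members))
                     for k in counts.values())
            total += (w / wsum) * (len(members) / n) * h
    return total

# Illustrative clustering of four vertices with one attribute (topic).
clusters = [["a", "b"], ["c", "d"]]
edges = [("a", "b"), ("c", "d"), ("b", "c")]
attr_values = {"a": ["XML"], "b": ["XML"], "c": ["Skyline"], "d": ["XML"]}
dens = density(clusters, edges)
ent = entropy(clusters, attr_values, weights=[1.0])
```

Higher density indicates stronger structural cohesiveness, and lower entropy indicates more homogeneous attribute values within clusters.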




Experimental Evaluation (3/5)

 Cluster Quality Evaluation




Experimental Evaluation (4/5)

 Cluster Quality Evaluation




Experimental Evaluation (5/5)

 Clustering Convergence




Conclusion
 Studied the problem of clustering a graph with multiple
  attributes using the attribute augmented graph

 A unified neighborhood random walk distance measures vertex
  closeness on the attribute augmented graph

 Theoretical analysis quantitatively estimates the
  contributions of attribute similarity

 The degree of contribution of each attribute is adjusted
  automatically towards the direction of clustering convergence



Critical Review
 Many algorithms have been proposed in the literature, but they
  consider either the structural or the attribute aspect when
  finding similarities among nodes in the graph

 This paper considers both aspects simultaneously, which better
  reflects the true nature of clusters and of similarity among
  different objects

 It relies on random walks on the graph, which require matrix
  manipulation (i.e. multiplication), so it becomes impractical
  for huge datasets

 Because the similarity is calculated iteratively, it does not
  scale to huge networks (graph datasets)
Feasible Improvements
 The iterative similarity calculation should be avoided
  by incorporating other feasible methods for the relevancy check

 It can scale to networks whose nodes are not densely
  connected with each other: such nodes have lower
  degree, so the similarity calculation is cheaper

 The augmentation process can be remodeled or avoided to reduce
  space complexity and running time




Questions




                                Suggestions…!

More Related Content

What's hot

A NOBEL HYBRID APPROACH FOR EDGE DETECTION
A NOBEL HYBRID APPROACH FOR EDGE  DETECTIONA NOBEL HYBRID APPROACH FOR EDGE  DETECTION
A NOBEL HYBRID APPROACH FOR EDGE DETECTIONijcses
 
DNR - Auto deep lab paper review ppt
DNR - Auto deep lab paper review pptDNR - Auto deep lab paper review ppt
DNR - Auto deep lab paper review ppttaeseon ryu
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...Sunny Kr
 
Double-constrained RPCA based on Saliency Maps for Foreground Detection in Au...
Double-constrained RPCA based on Saliency Maps for Foreground Detection in Au...Double-constrained RPCA based on Saliency Maps for Foreground Detection in Au...
Double-constrained RPCA based on Saliency Maps for Foreground Detection in Au...ActiveEon
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Matrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer VisionMatrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer VisionActiveEon
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Ha Phuong
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIWanjin Yu
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2Shrayes Ramesh
 
Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료taeseon ryu
 
Deformable DETR Review [CDM]
Deformable DETR Review [CDM]Deformable DETR Review [CDM]
Deformable DETR Review [CDM]Dongmin Choi
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesKeyon Vafa
 
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Christopher Morris
 
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs Christopher Morris
 
Recent Advances in Kernel-Based Graph Classification
Recent Advances in Kernel-Based Graph ClassificationRecent Advances in Kernel-Based Graph Classification
Recent Advances in Kernel-Based Graph ClassificationChristopher Morris
 
Section5 Rbf
Section5 RbfSection5 Rbf
Section5 Rbfkylin
 
Exact network reconstruction from consensus signals and one eigen value
Exact network reconstruction from consensus signals and one eigen valueExact network reconstruction from consensus signals and one eigen value
Exact network reconstruction from consensus signals and one eigen valueIJCNCJournal
 
PhD Thesis Defense Presentation: Robust Low-rank and Sparse Decomposition for...
PhD Thesis Defense Presentation: Robust Low-rank and Sparse Decomposition for...PhD Thesis Defense Presentation: Robust Low-rank and Sparse Decomposition for...
PhD Thesis Defense Presentation: Robust Low-rank and Sparse Decomposition for...ActiveEon
 
Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection aftab alam
 

What's hot (20)

A NOBEL HYBRID APPROACH FOR EDGE DETECTION
A NOBEL HYBRID APPROACH FOR EDGE  DETECTIONA NOBEL HYBRID APPROACH FOR EDGE  DETECTION
A NOBEL HYBRID APPROACH FOR EDGE DETECTION
 
CSMR11b.ppt
CSMR11b.pptCSMR11b.ppt
CSMR11b.ppt
 
DNR - Auto deep lab paper review ppt
DNR - Auto deep lab paper review pptDNR - Auto deep lab paper review ppt
DNR - Auto deep lab paper review ppt
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
 
Double-constrained RPCA based on Saliency Maps for Foreground Detection in Au...
Double-constrained RPCA based on Saliency Maps for Foreground Detection in Au...Double-constrained RPCA based on Saliency Maps for Foreground Detection in Au...
Double-constrained RPCA based on Saliency Maps for Foreground Detection in Au...
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
Matrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer VisionMatrix and Tensor Tools for Computer Vision
Matrix and Tensor Tools for Computer Vision
 
Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)Tutorial of topological data analysis part 3(Mapper algorithm)
Tutorial of topological data analysis part 3(Mapper algorithm)
 
Object Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet IIIObject Detection Beyond Mask R-CNN and RetinaNet III
Object Detection Beyond Mask R-CNN and RetinaNet III
 
ensembles_emptytemplate_v2
ensembles_emptytemplate_v2ensembles_emptytemplate_v2
ensembles_emptytemplate_v2
 
Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료Detection focal loss 딥러닝 논문읽기 모임 발표자료
Detection focal loss 딥러닝 논문읽기 모임 발표자료
 
Deformable DETR Review [CDM]
Deformable DETR Review [CDM]Deformable DETR Review [CDM]
Deformable DETR Review [CDM]
 
Training and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian ProcessesTraining and Inference for Deep Gaussian Processes
Training and Inference for Deep Gaussian Processes
 
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
Weisfeiler and Leman Go Neural: Higher-order Graph Neural Networks
 
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
Glocalized Weisfeiler-Lehman Graph Kernels: Global-Local Feature Maps of Graphs
 
Recent Advances in Kernel-Based Graph Classification
Recent Advances in Kernel-Based Graph ClassificationRecent Advances in Kernel-Based Graph Classification
Recent Advances in Kernel-Based Graph Classification
 
Section5 Rbf
Section5 RbfSection5 Rbf
Section5 Rbf
 
Exact network reconstruction from consensus signals and one eigen value
Exact network reconstruction from consensus signals and one eigen valueExact network reconstruction from consensus signals and one eigen value
Exact network reconstruction from consensus signals and one eigen value
 
PhD Thesis Defense Presentation: Robust Low-rank and Sparse Decomposition for...
PhD Thesis Defense Presentation: Robust Low-rank and Sparse Decomposition for...PhD Thesis Defense Presentation: Robust Low-rank and Sparse Decomposition for...
PhD Thesis Defense Presentation: Robust Low-rank and Sparse Decomposition for...
 
Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection Compressing Graphs and Indexes with Recursive Graph Bisection
Compressing Graphs and Indexes with Recursive Graph Bisection
 

Similar to Presentation on Graph Clustering (vldb 09)

Bitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingBitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingKyong-Ha Lee
 
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET Journal
 
Modelo de dados vetorial e matricial - slides
Modelo de dados vetorial e matricial - slidesModelo de dados vetorial e matricial - slides
Modelo de dados vetorial e matricial - slidesLuzianeRibeiroIndjai
 
Materials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationMaterials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationKAMAL CHOUDHARY
 
Data Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupData Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupOscar Corcho
 
Trends In Graph Data Management And Mining
Trends In Graph Data Management And MiningTrends In Graph Data Management And Mining
Trends In Graph Data Management And MiningSrinath Srinivasa
 
Integrating GIS utility data in the UK
Integrating GIS utility data in the UKIntegrating GIS utility data in the UK
Integrating GIS utility data in the UKAntArch
 
Cross domain sentiment classification via spectral feature alignment
Cross domain sentiment classification via spectral feature alignmentCross domain sentiment classification via spectral feature alignment
Cross domain sentiment classification via spectral feature alignmentlau
 
Query Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph DatabasesQuery Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph Databasesijdms
 
Ontology-based approach for BIM exchanges
Ontology-based approach for BIM exchangesOntology-based approach for BIM exchanges
Ontology-based approach for BIM exchangesManu Venugopal
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationDevansh16
 
Graph Space Viewer
Graph Space ViewerGraph Space Viewer
Graph Space Viewerrydark
 
Spatio textual similarity join
Spatio textual similarity joinSpatio textual similarity join
Spatio textual similarity joinIJDKP
 

Similar to Presentation on Graph Clustering (vldb 09) (20)

Bitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query ProcessingBitmap Indexes for Relational XML Twig Query Processing
Bitmap Indexes for Relational XML Twig Query Processing
 
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...IRJET-  	  Clustering of Hierarchical Documents based on the Similarity Deduc...
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc...
 
Modelo de dados vetorial e matricial - slides
Modelo de dados vetorial e matricial - slidesModelo de dados vetorial e matricial - slides
Modelo de dados vetorial e matricial - slides
 
Materials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationMaterials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum Computation
 
Data Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering GroupData Integration at the Ontology Engineering Group
Data Integration at the Ontology Engineering Group
 
Project TRAIN
Project TRAINProject TRAIN
Project TRAIN
 
Trends In Graph Data Management And Mining
Trends In Graph Data Management And MiningTrends In Graph Data Management And Mining
Trends In Graph Data Management And Mining
 
Topology in GIS
Topology in GISTopology in GIS
Topology in GIS
 
GraphREL: A Relational Graph Query Processor
GraphREL: A Relational Graph Query ProcessorGraphREL: A Relational Graph Query Processor
GraphREL: A Relational Graph Query Processor
 
Graph Theory and Databases
Graph Theory and DatabasesGraph Theory and Databases
Graph Theory and Databases
 
Raster data and Vector data
Raster data and Vector dataRaster data and Vector data
Raster data and Vector data
 
Integrating GIS utility data in the UK
Integrating GIS utility data in the UKIntegrating GIS utility data in the UK
Integrating GIS utility data in the UK
 
Cross domain sentiment classification via spectral feature alignment
Cross domain sentiment classification via spectral feature alignmentCross domain sentiment classification via spectral feature alignment
Cross domain sentiment classification via spectral feature alignment
 
Query Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph DatabasesQuery Optimization Techniques in Graph Databases
Query Optimization Techniques in Graph Databases
 
Ontology-based approach for BIM exchanges
Ontology-based approach for BIM exchangesOntology-based approach for BIM exchanges
Ontology-based approach for BIM exchanges
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
 
Graph Space Viewer
Graph Space ViewerGraph Space Viewer
Graph Space Viewer
 
Ijetcas14 314
Ijetcas14 314Ijetcas14 314
Ijetcas14 314
 
Geographical Information System (GIS)
Geographical Information System (GIS)Geographical Information System (GIS)
Geographical Information System (GIS)
 
Spatio textual similarity join
Spatio textual similarity joinSpatio textual similarity join
Spatio textual similarity join
 

Presentation on Graph Clustering (vldb 09)

  • 1. Graph Clustering Based on Structural/Attribute Similarities. Yang Zhou, Hong Cheng, Jeffrey Xu Yu. Proc. of the VLDB Endowment, France, 2009. Thursday, August 16, 2012. Presenter: Waqas Nawaz, Data and Knowledge Engineering Lab, Kyung Hee University, Korea
  • 2. Agenda 3/8 Data and Knowledge Engineering Lab 2
  • 3. Introduction
      • X = {x1, …, xN}: a set of data points.
      • S = (sij), i, j = 1, …, N: the similarity matrix, in which each element sij is the similarity between two data points xi and xj.
      • The goal of clustering is to divide the data points into several groups such that points in the same group are similar and points in different groups are dissimilar.
      • The dataset is modeled as a graph.
      • From the graph perspective, the clustering problem is formulated as a partition of the graph such that nodes in the same subgraph are densely connected (homogeneous) and only sparsely connected to (heterogeneous with respect to) the rest of the graph.
      • Distances and similarities are inverse to each other. The following slides talk only about similarities, but everything also works with distances.
  • 4. Motivation
      • Identifying clusters (well-connected components in a graph) is useful in many applications, from biological function prediction to social community detection.
      • Figure: attributes of authors (from manyeyes.alphaworks.ibm.com).
  • 5. Objective
      • A desired clustering of an attributed graph should achieve a good balance between the following:
          • Structural cohesiveness: vertices within one cluster are close to each other in terms of structure, while vertices in different clusters are distant from each other.
          • Attribute homogeneity: vertices within one cluster have similar attribute values, while vertices in different clusters have quite different attribute values.
      • Figure panels: Structural Cohesiveness, Attribute Homogeneity.
  • 6. Related Work
      • Structure-based clustering
          • Normalized cuts [Shi and Malik, TPAMI 2000]
          • Modularity [Newman and Girvan, Phys. Rev. 2004]
          • SCAN [Xu et al., KDD'07]
          • The clusters generated have a rather random distribution of vertex properties within clusters.
      • Attribute-based clustering
          • K-SNAP [Tian et al., SIGMOD'08]
          • Attribute-compatible grouping
          • The clusters generated have a rather loose intra-cluster structure.
      • Is there any way to consider both factors (structure and attribute) simultaneously while clustering? Yes.
  • 7. Graph Clustering with Structure & Attribute (1/11)
      • Structure-based clustering: vertices with heterogeneous values end up in one cluster.
      • Attribute-based clustering: much structure information is lost.
      • Structural/attribute clustering: vertices in a cluster have homogeneous values, and most structure information is kept.
  • 8. Graph Clustering with Structure & Attribute (2/11)
      • Example: a coauthor network.
      • Authors and topics: r1 to r8: XML (r3: XML and Skyline); r9 to r11: Skyline.
      • Figure panels: Attribute-based Cluster, Structural Clustering, Structural/Attribute Cluster.
  • 9. Graph Clustering with Structure & Attribute (3/11)
      • Proposed idea: flow diagram.
      • G → transform vertex attributes to attribute edges → Ga → a unified distance on edges → clustering on Ga → mapping onto the original graph → desired clusters (clustering on G).
  • 10. Graph Clustering with Structure & Attribute (4/11)
      • Attribute-augmented coauthor graph with topics.
      • Authors and topics: r1 to r8: XML (r3: XML and Skyline); r9 to r11: Skyline.
      • Figure panels: Original, Modified.
      • The neighborhood random walk distance is then used on the augmented graph to combine structural and attribute similarities.
  • 11. Neighborhood Random Walk (1/2)
      • Figure: a three-vertex example graph (A, B, C) with its adjacency matrix A and transition matrix P (edge weights 1; transition probabilities 1 and 1/2).
  • 12. Neighborhood Random Walk (2/2)
      • Figure: the random walk on the example graph at steps t = 0, 1, 2, 3.
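The two random walk slides can be illustrated with a minimal sketch. The exact matrix entries of the slide figure are not recoverable from the transcript, so the graph below (a path A - B - C with unit edge weights) and the helper names `transition_matrix` and `step` are assumptions chosen for illustration only.

```python
def transition_matrix(adj):
    """Row-normalize a weighted adjacency matrix into transition probabilities."""
    P = []
    for row in adj:
        total = sum(row)
        # A vertex without outgoing edges keeps an all-zero row.
        P.append([w / total if total else 0.0 for w in row])
    return P


def step(dist, P):
    """Advance a probability distribution over the vertices by one walk step."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


# Hypothetical example graph: path A - B - C, all edge weights 1.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
P = transition_matrix(adj)

dist = [1.0, 0.0, 0.0]           # the walker starts at A (t = 0)
history = [dist]
for t in range(1, 4):            # t = 1, 2, 3
    dist = step(dist, P)
    history.append(dist)

for t, d in enumerate(history):
    print(t, d)
```

From B the walker moves to A or C with probability 1/2 each, while A and C can only move to B, which is why the distribution alternates between the middle vertex and the two end vertices.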
  • 13. Graph Clustering with Structure & Attribute (5/11)
      • The kinds of vertices and edges
          • Two kinds of vertices: the structure vertex set V and the attribute vertex set Va.
          • Two kinds of edges: the structure edges E and the attribute edges Ea.
          • The attribute-augmented graph: Ga = (V ∪ Va, E ∪ Ea).
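A small sketch of how the augmented graph could be built from an attributed graph. It assumes one attribute vertex per (attribute, value) pair and one attribute edge from each structure vertex to every value it holds; the function name `augment`, the input layout, and the coauthor edges are hypothetical.

```python
def augment(vertices, edges, attributes):
    """Build the attribute-augmented graph (V, Va, E, Ea).

    vertices:   list of structure vertex ids
    edges:      list of (u, v) structure edges
    attributes: dict vertex -> dict attribute name -> value or list of values
    """
    Va = set()
    Ea = set()
    for v in vertices:
        for name, values in attributes.get(v, {}).items():
            if not isinstance(values, (list, tuple, set)):
                values = [values]
            for value in values:
                av = (name, value)   # one attribute vertex per (attribute, value)
                Va.add(av)
                Ea.add((v, av))      # attribute edge: v holds this value
    return list(vertices), sorted(Va), list(edges), sorted(Ea)


# Three authors from the coauthor example; the structure edges are made up.
V, Va, E, Ea = augment(
    ["r1", "r3", "r9"],
    [("r1", "r3"), ("r3", "r9")],
    {"r1": {"topic": "XML"},
     "r3": {"topic": ["XML", "Skyline"]},
     "r9": {"topic": "Skyline"}},
)
print(Va)
print(Ea)
```

Two authors that share a topic become reachable from each other through the shared attribute vertex, which is what lets a random walk on Ga mix structural and attribute closeness.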
  • 14. Graph Clustering with Structure & Attribute (6/11)
      • New clustering framework:
          • Calculate the distance.
          • Initialize the cluster centroids.
          • Assign vertices to a cluster.
          • Update the cluster centroids.
          • Adjust edge weights automatically.
          • Re-calculate the distance matrix.
          • Repeat until the objective function converges.
  • 15. Graph Clustering with Structure & Attribute (7/11)
      • Transition probability matrix on the attribute-augmented graph, made up of four blocks:
          • PV: probabilities from structure vertices to structure vertices
          • A: probabilities from structure vertices to attribute vertices
          • B: probabilities from attribute vertices to structure vertices
          • O: probabilities from attribute vertices to attribute vertices; all entries are zero
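The four blocks can be assembled as in the sketch below. The slide does not show the entry formulas, so the ones used here are an assumption: from a structure vertex v with neighbor set N(v), a structure edge gets w0 / (|N(v)|·w0 + w1 + … + wm) and the attribute edge of attribute i gets wi over the same denominator; an attribute vertex moves uniformly to the structure vertices that hold its value. The sketch also assumes exactly one value per attribute per vertex, and the function name is hypothetical.

```python
def augmented_transition_matrix(n, edges, attr_values, weights, w0=1.0):
    """Return (P, attr_vertices), where P has the block layout [[P_V, A], [B, O]].

    n:           number of structure vertices, labelled 0..n-1
    edges:       undirected structure edges (u, v)
    attr_values: attr_values[i][v] is the value of attribute i at vertex v
    weights:     weights[i] is the edge weight of attribute i
    """
    attr_vertices = sorted({(i, vals[v])
                            for i, vals in enumerate(attr_values)
                            for v in range(n)})
    index = {av: n + k for k, av in enumerate(attr_vertices)}
    size = n + len(attr_vertices)

    neighbors = [set() for _ in range(n)]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    P = [[0.0] * size for _ in range(size)]
    for v in range(n):
        denom = len(neighbors[v]) * w0 + sum(weights)   # assumed normalization
        for u in neighbors[v]:
            P[v][u] = w0 / denom                        # block P_V
        for i, vals in enumerate(attr_values):
            P[v][index[(i, vals[v])]] = weights[i] / denom   # block A
    for (i, value), row in index.items():
        holders = [v for v in range(n) if attr_values[i][v] == value]
        for v in holders:
            P[row][v] = 1.0 / len(holders)              # block B
    # Block O (attribute vertex to attribute vertex) stays all zero.
    return P, attr_vertices


P, attr_vertices = augmented_transition_matrix(
    3, [(0, 1), (1, 2)], [["XML", "XML", "Skyline"]], [1.0])
for row in P:
    print([round(x, 3) for x in row])
```

With this normalization every row sums to 1, so P is a valid transition matrix over the structure and attribute vertices together.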
  • 16. Graph Clustering with Structure & Attribute (8/11)
      • A unified distance measure
          • The unified neighborhood random walk distance [formula not preserved in transcript]
          • The matrix form of the neighborhood random walk distance [formula not preserved in transcript]
      • Cluster centroid initialization
          • Identify good initial centroids from the density point of view [Hinneburg and Keim, AAAI 1998].
          • Influence function of vi on vj [formula not preserved in transcript]
          • Density function of vi [formula not preserved in transcript]
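Because the formulas on this slide were lost in the transcript, the sketch below rests on assumed forms rather than on the slide itself: the matrix form is taken to be R = Σ_{γ=1..l} c(1−c)^γ P^γ with restart probability c and walk length limit l, the influence of vj on vi is taken to be 1 − exp(−R[i][j]² / (2σ²)), and the density of vi is the sum of influences over all structure vertices. The function names and the parameter values are hypothetical.

```python
import math


def matmul(X, Y):
    """Multiply two dense matrices given as lists of rows."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[t] * Y[t][j] for t in range(inner)) for j in range(cols)]
            for row in X]


def random_walk_distance(P, l, c):
    """Assumed matrix form: R = sum over gamma = 1..l of c * (1 - c)**gamma * P**gamma."""
    n = len(P)
    R = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in P]            # holds P**gamma, starting at gamma = 1
    for gamma in range(1, l + 1):
        coeff = c * (1 - c) ** gamma
        for i in range(n):
            for j in range(n):
                R[i][j] += coeff * power[i][j]
        power = matmul(power, P)
    return R


def initial_centroids(R, n_structure, k, sigma):
    """Pick the k structure vertices with the highest assumed density."""
    def density(i):
        return sum(1.0 - math.exp(-R[i][j] ** 2 / (2 * sigma ** 2))
                   for j in range(n_structure))
    ranked = sorted(range(n_structure), key=density, reverse=True)
    return ranked[:k]


# Transition matrix of a hypothetical path graph A - B - C.
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
R = random_walk_distance(P, l=2, c=0.5)
print(R)
print(initial_centroids(R, n_structure=3, k=1, sigma=1.0))
```

Note that under this form a larger R value means two vertices are closer, so the "distance" behaves like a proximity score. Each additional step costs one matrix multiplication, which is the scalability concern raised later in the critical review.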
  • 17. Graph Clustering with Structure & Attribute (9/11)
      • Clustering process (K-means framework)
          • Assign each vertex vi ∈ V to its closest centroid c* [formula not preserved in transcript].
          • Update the centroid with the most centrally located vertex in each cluster:
              • Compute the “average point” v̄i of a cluster Vi.
              • Find the new centroid whose random walk distance vector is the closest to the cluster average.
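One iteration of the assign and update steps might look like the sketch below. It assumes that a larger random walk value means "closer", that the average point is the element-wise mean of the members' rows of R, and that closeness to the average is measured by squared Euclidean distance. The matrix R and all names are made up for illustration.

```python
def assign(R, centroids, n_structure):
    """Label each structure vertex with the index of its closest centroid.

    Assumption: R holds proximities, so the closest centroid has the largest value.
    """
    return [max(range(len(centroids)), key=lambda j: R[v][centroids[j]])
            for v in range(n_structure)]


def update_centroids(R, labels, centroids, n_structure):
    """Replace each centroid by the member whose row of R is nearest the cluster average."""
    dim = len(R[0])
    updated = []
    for j in range(len(centroids)):
        members = [v for v in range(n_structure) if labels[v] == j]
        if not members:                      # keep the old centroid of an empty cluster
            updated.append(centroids[j])
            continue
        avg = [sum(R[v][t] for v in members) / len(members) for t in range(dim)]
        updated.append(min(
            members,
            key=lambda v: sum((R[v][t] - avg[t]) ** 2 for t in range(dim))))
    return updated


# Hypothetical proximity matrix with two obvious groups: {0, 1} and {2, 3}.
R = [[0.30, 0.20, 0.01, 0.00],
     [0.20, 0.30, 0.00, 0.01],
     [0.01, 0.00, 0.30, 0.20],
     [0.00, 0.01, 0.20, 0.30]]
centroids = [0, 2]
labels = assign(R, centroids, n_structure=4)
centroids = update_centroids(R, labels, centroids, n_structure=4)
print(labels, centroids)
```

Restricting the new centroid to an actual member vertex (rather than the average itself) keeps every centroid a real node of the graph, which is needed because distances are only defined between vertices.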
  • 18. Graph Clustering with Structure & Attribute (10/11)
      • Edge weight definition
          • Different types of edges may have different degrees of importance.
          • The structure edge weight ω0 is fixed to 1.0 in the whole clustering process.
          • The attribute edge weights are ωi for i = 1, 2, …, m.
          • All weights are initialized to 1.0, but are automatically updated during clustering.
          • Example: “topic” has a more important role than “age”.
  • 19. Graph Clustering with Structure & Attribute (11/11)
      • Weight self-adjustment
          • A vote mechanism determines whether two vertices share an attribute value [formula not preserved in transcript].
          • Weight increment [formula not preserved in transcript].
      • How does the weight adjustment affect clustering convergence?
          • Objective function [formula not preserved in transcript].
          • The weights are shown to be adjusted in the direction of clustering convergence as the clusters are iteratively refined.
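A sketch of one possible weight update follows. The vote and increment formulas are missing from the transcript, so the rule used here is an assumption: a vertex votes for attribute i when it shares the value of its cluster centroid on that attribute, the increment is the vote count of attribute i divided by the average vote count over all m attributes, and the new weight is the mean of the old weight and the increment. The function name and the toy data are hypothetical.

```python
def update_weights(weights, attr_values, labels, centroids):
    """Return adjusted attribute weights (assumed vote-based rule, see the lead-in).

    weights:     current attribute weights, one per attribute
    attr_values: attr_values[i][v] is the value of attribute i at vertex v
    labels:      labels[v] is the cluster index of vertex v
    centroids:   centroids[j] is the centroid vertex of cluster j
    """
    m = len(weights)
    votes = []
    for i in range(m):
        vals = attr_values[i]
        # A vertex votes for attribute i if it agrees with its centroid on it.
        votes.append(sum(1 for v, j in enumerate(labels)
                         if vals[v] == vals[centroids[j]]))
    total = sum(votes)
    if total == 0:                    # no agreement anywhere: leave weights as they are
        return list(weights)
    return [0.5 * (weights[i] + m * votes[i] / total) for i in range(m)]


topic = ["XML", "XML", "Skyline", "Skyline"]
age = ["a", "b", "b", "a"]
new_weights = update_weights([1.0, 1.0], [topic, age],
                             labels=[0, 0, 1, 1], centroids=[0, 2])
print(new_weights)
```

In this toy run every vertex agrees with its centroid on topic but only half agree on age, so the topic weight rises and the age weight falls, while the weights still sum to m.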
  • 20. Experimental Evaluation (1/5)
      • Datasets
          • Political Blogs dataset: 1490 vertices, 19090 edges, one attribute (political leaning)
          • DBLP dataset: 5000 vertices, 16010 edges, two attributes (prolific and topic)
      • Methods
          • K-SNAP [Tian et al., SIGMOD'08]: attribute only
          • S-Cluster: structure-based clustering
          • W-Cluster: weighted function
          • SA-Cluster: proposed method
  • 21. Experimental Evaluation (2/5)
      • Evaluation metrics
          • Density: intra-cluster structural cohesiveness
          • Entropy: intra-cluster attribute homogeneity
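The two metrics can be sketched as follows. The slide gives only their meaning, so the concrete definitions are assumptions: density is the fraction of structure edges whose endpoints fall in the same cluster, and entropy is the Shannon entropy (base 2) of each attribute inside each cluster, averaged by cluster size and by attribute weight. Function names and data are hypothetical.

```python
import math
from collections import Counter


def density(edges, labels):
    """Fraction of structure edges with both endpoints in the same cluster."""
    inside = sum(1 for u, v in edges if labels[u] == labels[v])
    return inside / len(edges)


def entropy(attr_values, weights, labels):
    """Weighted average of per-cluster attribute entropies (lower is more homogeneous)."""
    n = len(labels)
    clusters = {}
    for v, j in enumerate(labels):
        clusters.setdefault(j, []).append(v)
    total_w = sum(weights)
    result = 0.0
    for i, vals in enumerate(attr_values):
        for members in clusters.values():
            counts = Counter(vals[v] for v in members)
            h = -sum((c / len(members)) * math.log2(c / len(members))
                     for c in counts.values())
            result += (weights[i] / total_w) * (len(members) / n) * h
    return result


edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 0, 1, 1]
topic = ["XML", "XML", "Skyline", "Skyline"]
age = ["a", "b", "b", "a"]
print(density(edges, labels))
print(entropy([topic], [1.0], labels))
print(entropy([topic, age], [1.0, 1.0], labels))
```

A good structural/attribute clustering should score high on density and low on entropy at the same time, which is the balance described in the Objective slide.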
  • 22. Experimental Evaluation (3/5): Cluster quality evaluation (figure)
  • 23. Experimental Evaluation (4/5): Cluster quality evaluation (figure)
  • 24. Experimental Evaluation (5/5): Clustering convergence (figure)
  • 25. Conclusion
      • The paper studies the problem of clustering a graph with multiple attributes by working on the attribute-augmented graph.
      • A unified neighborhood random walk distance measures vertex closeness on the attribute-augmented graph.
      • Theoretical analysis quantitatively estimates the contributions of attribute similarity.
      • The degree of contribution of each attribute is adjusted automatically in the direction of clustering convergence.
  • 26. Critical Review
      • Many algorithms have been proposed in the literature, but they consider either the structural or the attribute aspect alone when finding similarities among nodes in the graph.
      • This paper considers both aspects simultaneously, which better reflects the true nature of a cluster and of the similarity among objects.
      • It relies on a random walk on the graph, which requires matrix manipulation (i.e. multiplication), so it becomes impractical for huge datasets.
      • Because the similarity is calculated iteratively, the method does not scale to huge networks (graph datasets).
  • 27. Feasible Improvements
      • The iterative similarity calculation should be avoided by incorporating other feasible methods for the relevancy check.
      • The method can scale to networks whose nodes are not densely connected: vertices then have lower degree, and the similarity calculation is cheaper.
      • The augmentation process can be remodeled or avoided to reduce space complexity and running time.
  • 28. Questions and Suggestions