Frequent subgraph discovery for a single large graph
Agenda
•   Motivation
•   Summary of existing approaches
•   Support computations
•   Comparison and Evaluation
Background
• Frequent subgraph mining
  – Graph-transaction setting (for graph datasets)
     • Many small graphs
  – Single-graph setting
     • One big graph


• New problem for single-graph setting
  – Definition of support
Challenge
• Difficulty of defining the support in a large
  graph
  – The anti-monotone property is required for pruning
    the search space


• Anti-monotone
  – $A \subseteq B \Rightarrow \mathrm{sup}(A) \ge \mathrm{sup}(B)$
Subgraph Support
• The most intuitive definition
   – Count of embeddings in input graph
       • Not anti-monotone
   [Figure: example patterns with embedding counts 1, 2, 2]
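The claim that a plain embedding count is not anti-monotone can be checked on a tiny case. The sketch below is only an illustration under stated assumptions: the data graph (a path B - A - B), the two patterns, and the helper `count_embeddings` are hypothetical and not taken from the slides. It counts injective, label- and edge-preserving mappings by brute force.

```python
from itertools import permutations

def count_embeddings(pattern_labels, pattern_edges, graph_labels, graph_edges):
    """Count injective mappings of pattern nodes to graph nodes that
    preserve node labels and map every pattern edge onto a graph edge."""
    graph_edge_set = {frozenset(e) for e in graph_edges}
    pattern_nodes = list(pattern_labels)
    count = 0
    # Try every ordered choice of distinct graph nodes as images.
    for image in permutations(graph_labels, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        labels_ok = all(pattern_labels[v] == graph_labels[mapping[v]]
                        for v in pattern_nodes)
        edges_ok = all(frozenset((mapping[u], mapping[v])) in graph_edge_set
                       for u, v in pattern_edges)
        if labels_ok and edges_ok:
            count += 1
    return count

# Hypothetical data graph: the path B - A - B.
graph_labels = {1: "B", 2: "A", 3: "B"}
graph_edges = [(1, 2), (2, 3)]

# Pattern "A" (a single node) and its super-pattern "A - B".
single_a = count_embeddings({"x": "A"}, [], graph_labels, graph_edges)
edge_ab = count_embeddings({"x": "A", "y": "B"}, [("x", "y")],
                           graph_labels, graph_edges)
print(single_a, edge_ab)
```

Here the single node A has one embedding while the larger pattern A - B has two, so the count grows when the pattern grows, which is exactly what anti-monotonicity forbids.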
Motivation
• Suggest a new definition of support for
  subgraph that
  – Resulting support is anti-monotone
  – Support can be computed efficiently


• Three Support computation algorithms
  – Overlap based (2)
  – Minimum image based (1)
Agenda
• Motivation
• Summary of existing approaches
• Support computations
  – Simple overlap (overlap-based method)
  – Harmful overlap (overlap-based method)
  – Minimum image
• Comparison and Evaluation
Overlap based support
• Support is the size of a maximum independent set (MIS)
  – Find the overlaps between embeddings
  – Find the size of a maximum independent node set
Overlap
• Two embeddings overlap if they share at least one node

• $V_1 \cap V_2 \neq \emptyset$
    ($V_1$, $V_2$: node sets of the two embeddings)

  An embedding is an occurrence of the pattern in the input graph
Overlap Graph
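• $O = (V_O, E_O)$
   – $V_O$: set of embeddings as its node set
   – $E_O = \{(f_1, f_2) \mid f_1, f_2 \in V_O \wedge f_1 \not\equiv f_2 \wedge V_1 \cap V_2 \neq \emptyset\}$,
     where $V_1$, $V_2$ are the node sets of $f_1$, $f_2$
   – If two embeddings share at least one node, the corresponding
     nodes of the overlap graph are connected by an edge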
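The construction of the overlap graph can be sketched in a few lines. This is a minimal illustration, assuming each embedding is given simply as the set of data-graph nodes it covers and that the listed embeddings are pairwise non-equivalent; the function name `overlap_graph` and the sample embeddings are hypothetical.

```python
from itertools import combinations

def overlap_graph(embeddings):
    """Return the edge set of the overlap graph.

    embeddings: list of sets, each holding the data-graph nodes covered
    by one embedding. Nodes of the overlap graph are list indices."""
    edges = set()
    for i, j in combinations(range(len(embeddings)), 2):
        # An edge is added when the two node sets intersect.
        if embeddings[i] & embeddings[j]:
            edges.add((i, j))
    return edges

# Hypothetical example: embeddings 0 and 1 share node 2, embedding 2 is disjoint.
embeddings = [{1, 2}, {2, 3}, {4, 5}]
overlap_edges = overlap_graph(embeddings)
print(overlap_edges)
```

The pairwise loop makes the quadratic number of overlap checks explicit, which is the cost the minimum image approach later avoids.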
Maximum Independent Set Support
• Independent node set of graph $G = (V, E)$
  – $I \subseteq V$ with $\forall u, v \in I: (u, v) \notin E$
  – A maximum independent node set need not be
    unique

  [Figure: two example graphs whose maximum independent node sets have size 1 and size 2]

• MIS-support = size of a maximum independent
  node set of the overlap graph
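For small overlap graphs the MIS-support can be computed by exhaustive search. The sketch below is an assumption-laden illustration, not the method used in the presented work: `mis_size` is a hypothetical helper, the two test graphs (a triangle and a three-node path) are chosen only to reproduce sizes 1 and 2, and the search is exponential, as expected for an NP-complete problem.

```python
from itertools import combinations

def mis_size(num_nodes, edges):
    """Size of a maximum independent node set, found by brute force.

    Nodes are 0 .. num_nodes - 1; edges is an iterable of node pairs."""
    edge_set = {frozenset(e) for e in edges}
    # Try the largest subsets first and stop at the first independent one.
    for size in range(num_nodes, 0, -1):
        for subset in combinations(range(num_nodes), size):
            if all(frozenset(pair) not in edge_set
                   for pair in combinations(subset, 2)):
                return size
    return 0

# Triangle: every pair of nodes is adjacent.
triangle = mis_size(3, [(0, 1), (1, 2), (0, 2)])
# Path 0 - 1 - 2: the two end nodes are independent.
path = mis_size(3, [(0, 1), (1, 2)])
print(triangle, path)
```

Applied to the edge set of an overlap graph, the returned size is the MIS-support of the pattern.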
Harmful Overlap Support(1/3)
• MIS-support
  – Considers every overlap as harmful


• Overlap is not necessarily harmful
  – What matters is that the anti-monotone property is preserved
Harmful Overlap Support(2/3)
• Harmful overlap graph $H = (V_H, E_H)$
  – $V_H$: set of embeddings as its node set
  – $E_H = \{(f_1, f_2) \mid f_1, f_2 \in V_H \wedge f_1 \not\equiv f_2 \wedge (V_1 = V_2 \vee \text{ancestors of } V_1 = \text{ancestors of } V_2)\}$,
    where $V_1$, $V_2$ are the node sets of $f_1$, $f_2$

• HO-support
  [Figure: example in which MIS-support = 1 and HO-support = 2]
Harmful Overlap(3/3)
• HO-support still satisfies the anti-monotone property

  [Figure: example with counts #A = 2, #B = 3, #AB = 2, #BAB = 2]
Note
• Harmful overlap is a weaker concept than
  simple overlap
  – HO-support is never lower than MIS-support




Experiment
• Support computation as part of the
  MoSS (Molecular Substructure Miner) program
  – IC93 dataset [7]
     • 1283 molecules form a connected component
  – Tic-Tac-Toe win dataset
     • Consists of 626 connected components
Result
• Vertical axis: number of frequent subgraphs
  whose support exceeds the threshold
• Horizontal axis: number of nodes (of the pattern)?
• For IC93
  – Up to 30% more frequent subgraphs with HO-support than with MIS-support
     • Due to heavy overlap
       of carbon atoms
• For Tic-Tac-Toe
  – Around 5% more
Minimum image based definition
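• Minimum image based support of p in g
  – For each pattern node, count the unique data-graph nodes it is
    mapped to over all embeddings; the support is the minimum count

  [Figure: graph with nodes 1 to 5]

  Embeddings (one column each, one row per pattern node)    Unique
  1    3    3    5                                          3
  2    2    4    4                                          2
  3    1    5    3                                          3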
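The minimum image computation is short enough to write out. The sketch below is a hedged illustration: it reads each embedding column of the table on this slide as one embedding of a three-node pattern (an interpretation of the slide, not something it states), and the pattern node names `v1`, `v2`, `v3` and the function `minimum_image_support` are hypothetical.

```python
def minimum_image_support(embeddings):
    """Minimum, over pattern nodes, of the number of distinct data-graph
    nodes that the pattern node is mapped to across all embeddings.

    embeddings: list of dicts mapping pattern node -> data-graph node."""
    if not embeddings:
        return 0
    pattern_nodes = embeddings[0].keys()
    return min(len({phi[v] for phi in embeddings}) for v in pattern_nodes)

# The four embeddings, one per table column (assumed reading of the slide).
embeddings = [
    {"v1": 1, "v2": 2, "v3": 3},
    {"v1": 3, "v2": 2, "v3": 1},
    {"v1": 3, "v2": 4, "v3": 5},
    {"v1": 5, "v2": 4, "v3": 3},
]
support = minimum_image_support(embeddings)
print(support)
```

The per-node image counts are 3, 2 and 3, so the support is 2. No pair of embeddings is ever compared, and no independent set has to be found.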
Benefits
I. Instead of $O(N^2)$ overlaps, an $O(N)$ data set
II. No NP-complete MIS problem to solve
III. Not necessary to compute all occurrences,
     only the images of all pattern nodes
Embedding of a Pattern
• Pattern $p = (V_p, E_p, \lambda_p)$
• Data graph $g = (V_g, E_g, \lambda_g)$
• Embedding function $\varphi: V_p \to V_g$
Three support measures
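• Simple overlap
   – A simple overlap of occurrences $\varphi$ and $\varphi'$ of pattern $p$ exists if
                 $\varphi(V_p) \cap \varphi'(V_p) \neq \emptyset$


• Harmful overlap
   – A harmful overlap of occurrences $\varphi$ and $\varphi'$ of pattern $p$ exists if
        $\exists v \in V_p: \varphi(v), \varphi'(v) \in \varphi(V_p) \cap \varphi'(V_p)$

• Minimum image based support of p in g
   – $\sigma_3(p, g) = \min_{v \in V_p} |\{\varphi_i(v) : \varphi_i \text{ is an embedding of } p \text{ in } g\}|$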
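The difference between the two overlap conditions is easiest to see as predicates on a pair of embeddings. The following is a minimal sketch under assumptions: embeddings are dicts from pattern node to data-graph node, and the function names and the sample two-node pattern are hypothetical.

```python
def simple_overlap(phi1, phi2):
    """True if the two embeddings share at least one data-graph node."""
    return bool(set(phi1.values()) & set(phi2.values()))

def harmful_overlap(phi1, phi2):
    """True if some pattern node v has both of its images, phi1[v] and
    phi2[v], inside the set of shared data-graph nodes."""
    shared = set(phi1.values()) & set(phi2.values())
    return any(phi1[v] in shared and phi2[v] in shared for v in phi1)

# Hypothetical embeddings of a two-node pattern with nodes v1, v2.
a = {"v1": 1, "v2": 2}
b = {"v1": 2, "v2": 3}   # shares node 2 with a, but via different pattern nodes
c = {"v1": 1, "v2": 3}   # shares node 1 with a, both times as image of v1

print(simple_overlap(a, b), harmful_overlap(a, b), harmful_overlap(a, c))
```

Embeddings `a` and `b` overlap simply but not harmfully, whereas `a` and `c` overlap harmfully. Every harmful overlap is also a simple overlap, so the harmful overlap graph has no more edges than the overlap graph.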
Comparison




$\sigma_1 = 1$ (simple overlap)  <  $\sigma_2 = 2$ (harmful overlap)  <  $\sigma_3 = 3$ (minimum image)
Experimental Setting
• Comparison of image-based and overlap-based
  algorithms

• Dataset
  – WebKB dataset (4 large graphs describing the structure of web
    pages)
Experiment Result
Conclusion
• Overlap-based support measures that are anti-monotone
• Minimum image based algorithm that is more
  efficient than previous ones
