NS-CUK Seminar: V.T. Hoang, Review on "Structure-Aware Transformer for Graph Representation Learning", ICML 2022
Slide 1
Van Thuy Hoang
Dept. of Artificial Intelligence,
The Catholic University of Korea
hoangvanthuy90@gmail.com
Proceedings of the 39th International Conference on Machine Learning
Slide 2
Background
Graph embedding maps graph vertices into a low-dimensional vector space while preserving different types of graph properties. Typical downstream tasks:
Node classification
Link prediction
Network visualization
Community detection
…
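As a minimal sketch of this idea (a hypothetical toy graph and hyperparameters, not from the slides), one can train an embedding table so that adjacent vertices end up with similar vectors:

```python
import torch

# Toy graph: 5 vertices on a path (hypothetical example).
num_nodes, dim = 5, 8
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 4]])

emb = torch.nn.Embedding(num_nodes, dim)   # one low-dimensional vector per vertex
opt = torch.optim.Adam(emb.parameters(), lr=0.01)

for _ in range(100):
    u, v = edges[:, 0], edges[:, 1]
    pos = (emb(u) * emb(v)).sum(-1)        # similarity of linked vertices
    neg = (emb(u) * emb(torch.randint(0, num_nodes, v.shape))).sum(-1)
    # Logistic loss: pull linked vertices together, push random pairs apart,
    # so the embedding preserves the graph's link structure.
    loss = -torch.nn.functional.logsigmoid(pos).mean() \
           - torch.nn.functional.logsigmoid(-neg).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The trained `emb.weight` can then feed tasks such as node classification or link prediction.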
Slide 3
Background
Unsupervised vs. Supervised
DeepWalk, LINE, node2vec, etc.
GCN, GraphSAGE, etc.
Euclidean vs. Non-Euclidean
Hyperbolic space (Tag2Vec, WWW’19)
Vector vs. Distribution
Using variance to model semantic uncertainty
Slide 4
Message-passing GNNs use neighbourhood aggregation (a minimal layer is sketched below).
Limitations:
Modeling long-range dependencies
Strong structural inductive bias
Over-smoothing
Over-squashing
We need architectures beyond aggregation.
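A minimal sketch of one message-passing layer in plain PyTorch, assuming a dense adjacency matrix and mean aggregation (names and shapes are illustrative, not the paper's code):

```python
import torch

def mp_layer(x: torch.Tensor, adj: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """x: [N, d] node features, adj: [N, N] adjacency, w: [d, d_out] weights."""
    deg = adj.sum(-1, keepdim=True).clamp(min=1)   # node degrees (avoid div by 0)
    agg = (adj @ x) / deg                          # mean over each node's neighbours
    return torch.relu(agg @ w)                     # transform + nonlinearity

x = torch.randn(4, 3)
adj = torch.tensor([[0., 1., 1., 0.],
                    [1., 0., 0., 1.],
                    [1., 0., 0., 1.],
                    [0., 1., 1., 0.]])
h = mp_layer(x, adj, torch.randn(3, 3))
```

Stacking k such layers lets a node see only its k-hop neighbourhood, which is exactly why long-range dependencies are hard and why over-smoothing and over-squashing arise.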
Slide 5
Graph Transformers can address some limitations of MP.
Key idea:
Encode graph structure by extracting a subgraph representation centered around each node
Encode positional relationships between nodes in the Transformer (see the sketch after this list)
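A hedged single-head sketch of this idea (weight names are illustrative, and the paper's exact formulation may differ, e.g. in how values and edge features are handled): GNN-extracted, structure-aware representations supply the queries and keys, so attention weights reflect local structure while attention itself remains global over all node pairs.

```python
import torch

def structure_aware_attention(x, struct, wq, wk, wv):
    """x: [N, d] raw node features; struct: [N, d] GNN outputs (subgraph reps)."""
    q, k = struct @ wq, struct @ wk           # structure-aware queries and keys
    v = x @ wv                                # values from the raw features
    scores = (q @ k.T) / k.shape[-1] ** 0.5   # all-pairs (global) attention
    return torch.softmax(scores, dim=-1) @ v
```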
Slide 6
Contributions
A self-attention mechanism that handles local structure by extracting a subgraph rooted at each node
Can leverage any GNN to extract subgraphs and create structure-aware node representations
An effortless enhancer of any GNN
Slide 7
Overview of an example SAT layer
Figure: an SAT layer that uses the k-subgraph GNN extractor as its structure extractor.
Slide 9
Structure-Aware Self-Attention
k-subtree GNN extractor:
extracts local structural information at node u by applying any existing GNN model and taking the output node representation at u as the subgraph representation at u
k-subgraph GNN extractor:
uses a GNN to directly compute the representation of the entire k-hop subgraph centered at u
Both extractors are sketched below.
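A hedged sketch of the two extractors in plain PyTorch (`tiny_gnn`, `khop_nodes`, and the mean-pooling choice are illustrative assumptions, not the paper's implementation):

```python
import torch

def tiny_gnn(x, adj):
    # Stand-in for "any existing GNN": one round of mean aggregation.
    deg = adj.sum(-1, keepdim=True).clamp(min=1)
    return torch.relu((adj @ x) / deg)

def k_subtree_extractor(gnn, x, adj):
    # Run the GNN on the whole graph; the output row at node u serves as
    # u's subgraph representation (its k-hop "subtree" view).
    return gnn(x, adj)                               # [N, d]

def k_subgraph_extractor(gnn, x, adj, khop_nodes):
    # For each node u, run the GNN on the induced k-hop subgraph around u
    # and pool its node outputs into one vector per node.
    reps = []
    for nodes in khop_nodes:                         # indices of u's k-hop subgraph
        sub_adj = adj[nodes][:, nodes]               # induced adjacency
        reps.append(gnn(x[nodes], sub_adj).mean(0))  # mean-pool (one possible choice)
    return torch.stack(reps)                         # [N, d]
```

The k-subtree variant is cheap (one GNN pass over the graph), while the k-subgraph variant runs the GNN once per node's induced subgraph, trading compute for a richer structural summary.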
Slide 11
SAT vs. sparse GNNs
Since SAT uses a GNN to extract structures, the experiments compare each original sparse GNN against the SAT model that uses that GNN as its structure extractor (the "base GNN").
Slide 12
Theorem
The distance between two nodes' representations after structure-aware attention is bounded in terms of the distance between their subgraph representations.
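The slide stops before the bound itself. As an illustrative schematic only (not the paper's exact statement), Lipschitz-style bounds of this kind take the form below, with $\varphi(u)$ denoting the subgraph representation at node $u$ and $C$ a constant depending on the attention parameters:

```latex
\left\| \mathrm{Attn}(u) - \mathrm{Attn}(v) \right\|_2 \;\le\; C \, \left\| \varphi(u) - \varphi(v) \right\|_2
```

Read this way, nodes whose extracted subgraphs look alike are guaranteed to receive similar outputs from the structure-aware attention.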