Networks represent a type of dataset that is ubiquitous in many
disciplines and areas. Examples are social networks (ties between
people), communication networks, trophic networks ("who eats who"), the
World Wide Web, computer networks, lexical networks (connections between
words), transport networks, metabolic networks (e.g., interactions
between proteins), neural networks, animal networks, citation networks,
affiliation networks (of people in groups), software dependency
networks, and many more. In this talk, we present ongoing work on
answering the question "Can the type of network be detected from the
network structure alone?" For instance, given a completely unlabeled
network dataset consisting only of nodes and edges, can we detect whether
the data represents a social network or a hyperlink network? We present
machine learning and statistical approaches to answering questions of
this type. The presented results will make use of data in the KONECT
project, one of the largest repositories of network datasets, curated at
the University of Namur.
Title: What Is the Difference between a Social and a Hyperlink Network? -- How the Type of Network Can Be Determined from the Network Structure Alone
1. naXys – Namur Centre for Complex Networks – Univ. of Namur
What Is the Difference between a Social and a Hyperlink Network?
How the Type of Network Can Be Determined from the Network Structure Alone
Jérôme KUNEGIS
University of Oxford, Department of Statistics, 2017-09-12
3. “Network Category” – J. Kunegis
Networks Are Everywhere
Cliché: “Everything is a Network”
It's a cliché because it's true:
– Social network, road network, lexical network, metabolic
network, trophic network, affiliation network, citation
network, hyperlink network, etc., etc., etc.
4. Network Categories
From https://github.com/kunegis/konect-handbook
5. Collections of Network Datasets
SNAP
– by Jure Leskovec, Stanford Univ. (~2009)
– several hundred networks; not systematic
– Available for download
– Some statistics available
KONECT
– by Jérôme Kunegis, Univ. of Namur (~2011)
– 1000+ networks, but only 200+ unipartite
– Most networks available for download
– Many statistics available
ICON
– by Aaron Clauset, Univ. of Colorado (~2016)
– 4000+ datasets
– Not available for download (“index”)
6. Datasets in KONECT
In this work: 165 non-bipartite networks (out of 194 non-bipartite networks in KONECT)
7. Network Statistics
A statistic is a real number that characterizes a network
Examples:
– Average degree (d)
– Number of triangles (t)
– Diameter (δ)
– Clustering coefficient (c)
– Gini coefficient of degree distribution (G)
– Degree assortativity (ρ)
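As a rough illustration of the statistics listed above, the following sketch computes four of them (d, t, c, G) from a plain edge list. It is not the KONECT implementation: the function name `basic_statistics` is hypothetical, the graph is assumed to be simple and undirected, c is assumed to be the global (transitivity) variant 3t / s, and nodes without edges are not represented. Diameter and assortativity are left out to keep the example short.

```python
from itertools import combinations


def basic_statistics(edges):
    """Compute d, t, c and G for a simple undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    degrees = sorted(len(neighbours) for neighbours in adj.values())
    total_degree = sum(degrees)  # equals twice the number of edges

    # Average degree d.
    avg_degree = total_degree / n

    # Number of triangles t: each triangle is seen once from each corner.
    corner_hits = sum(
        1
        for neighbours in adj.values()
        for a, b in combinations(neighbours, 2)
        if b in adj[a]
    )
    triangles = corner_hits // 3

    # Clustering coefficient c = 3t / s, where s is the number of wedges.
    wedges = sum(k * (k - 1) // 2 for k in degrees)
    clustering = 3 * triangles / wedges if wedges else 0.0

    # Gini coefficient G of the degree sequence (degrees sorted ascending).
    weighted = sum(i * k for i, k in enumerate(degrees, start=1))
    gini = 2 * weighted / (n * total_degree) - (n + 1) / n

    return {"d": avg_degree, "t": triangles, "c": clustering, "G": gini}
```

For a triangle with one pendant node attached, `basic_statistics([(1, 2), (2, 3), (1, 3), (3, 4)])` gives d = 2, t = 1, c = 0.6 and G = 0.1875.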
8. More Statistics
– Number of wedges (s)
– Number of squares (q)
– Number of claws (z)
– Number of crosses (x)
– Maximum degree (d_max)
– Relative maximum degree (d_MR = d_max / d)
– Number of degree-1 nodes (d₁)
– 50-percentile effective diameter (δ_0.5)
– Relative edge distribution entropy (H_er)
– Bipartivity (b_A = 1 – λ_min[A] / λ_max[A])
– Normalized two-star count (s_d = s / (n d (d – 1) / 2))
– Eigenvalues of certain matrices (a = λ_2[L], |λ_max[A]|, …)
– etc.
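Several of the statistics above depend only on the degree sequence, so they can be sketched in a few lines. The example below assumes that wedges, claws and crosses are counted as 2-stars, 3-stars and 4-stars (not necessarily induced), so that a node of degree k contributes C(k, 2), C(k, 3) and C(k, 4) respectively; the function name `degree_based_statistics` is hypothetical. Squares and the spectral statistics need the full graph and are not covered.

```python
from math import comb


def degree_based_statistics(degrees):
    """Statistics that can be derived from the degree sequence alone."""
    n = len(degrees)
    d = sum(degrees) / n   # average degree
    d_max = max(degrees)   # maximum degree

    # A node of degree k is the centre of C(k, j) stars with j leaves.
    s = sum(comb(k, 2) for k in degrees)  # wedges (2-stars)
    z = sum(comb(k, 3) for k in degrees)  # claws (3-stars), by assumption
    x = sum(comb(k, 4) for k in degrees)  # crosses (4-stars), by assumption

    d_1 = sum(1 for k in degrees if k == 1)  # number of degree-1 nodes

    # Size-independent variants, following the formulas on the slide.
    d_mr = d_max / d
    denominator = n * d * (d - 1) / 2
    s_d = s / denominator if denominator else float("nan")

    return {
        "s": s, "z": z, "x": x,
        "d_max": d_max, "d_MR": d_mr, "d_1": d_1, "s_d": s_d,
    }
```

For the degree sequence `[1, 2, 2, 3]` this gives s = 5, z = 1, x = 0, d_MR = 1.5 and s_d = 1.25.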
9. Distribution of Clustering Coefficient (c)
[Figure: distribution of c by network category. Categories labelled in the figure: Communication, Interaction, Hyperlink, Online social]
10. Distribution of Gini Coefficient (G)
[Figure: distribution of G by network category. Categories labelled in the figure: Online social, Infrastructure, Interaction, Human social]
11. Distribution of Diameter (δ)
[Figure: distribution of δ by network category. Categories labelled in the figure: Infrastructure, Hyperlink, Citation]
13. Statistical Testing
Kolmogorov–Smirnov test on each pair of categories; a cell is non-white when the statistic
differs significantly between the two categories (p < 0.10). The base colour is chosen in HSL:
hue denotes the network statistic; saturation and lightness are constant. The shown colour is
interpolated between the base colour and white for 0 ≤ p ≤ 0.10.
[Figure legend: statistics, each at a fixed position]
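The pairwise test can be sketched as follows: for one statistic, collect its values over the networks of two categories and compare the two empirical distributions. This is a minimal sketch, not the code behind the slide. The function names are hypothetical, and the p-value uses the textbook asymptotic Kolmogorov series, which is only approximate for the small samples typical of a single category; a library routine such as `scipy.stats.ks_2samp` would be the safer choice in practice.

```python
import math


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance both samples past the value v (this handles ties).
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value for a two-sample KS statistic d with sample sizes n and m."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the true value is very close to 1
    total = sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))
```

With the threshold from the slide, a cell for two categories would be coloured when `ks_pvalue(ks_statistic(a, b), len(a), len(b)) < 0.10`, where `a` and `b` hold the values of one statistic in the two categories.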
17. Feature Engineering
Find size-independent formulations of statistics
– E.g., c instead of t
Avoid highly correlated statistics
– E.g., keep only one of G and P
Find statistics that are easy to compute
– E.g., algebraic connectivity (a) needs O(n²) runtime
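One way to act on the second guideline is a greedy filter that keeps a statistic only if it is not strongly correlated with any statistic already kept. The sketch below is an assumption about how this could be done, not the procedure used in the talk: the function names, the use of Pearson correlation and the threshold of 0.9 are all illustrative choices.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep statistics whose |correlation| with every kept one is below threshold.

    `columns` maps a statistic name to its values, one value per network,
    with all lists in the same network order.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[other])) < threshold for other in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the result depends on the order of the columns: the statistic listed first wins over later ones it is correlated with, so cheaper or more interpretable statistics would go first.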
18. Thank You
What We Want:
– More datasets, in particular, more diverse categories!
– More statistics: both ideas, and code
Contribute
– konect.math.fundp.ac.be (temporary URL!)
– Ask me about Stu: our build tool for doing all of this
https://github.com/kunegis/konect-toolbox
https://github.com/kunegis/konect-analysis
https://github.com/kunegis/konect-extr
https://github.com/kunegis/konect-handbook
https://github.com/kunegis/konect-www
https://github.com/kunegis/stu
For more news about KONECT: follow @KONECTproject
Jérôme Kunegis <jerome.kunegis@unamur.be>