Formations & Deformations of Social Network Graphs

Shalin Hai-Jew
Kansas State University
Aesthesia
March 2, 2017
Marianna Kistler Beach Museum of Art
Kansas State University
(updated)

 Social network graphs are node-link (vertex-edge; entity-relationship) diagrams that show relationships between people and groups. Open-source tools like NodeXL Basic (available on Microsoft’s CodePlex) enable the capture of network data from select social media platforms through third-party add-ons and social media APIs. From social groups, relational clusters are extracted with clustering algorithms, which identify intensities of connections. Visually, structural relational data is conveyed with layout algorithms in two-dimensional space. Using these various layout options and built-in visual design features, it is possible to aesthetically “deform” the network graph data for visual effects. This presentation introduces novel datasets and novel data visualizations.
node
(vertex)
(ego, entity)
link
(edge)
(relationship)
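
In code, the same vocabulary can be sketched as a list of directed edges between named vertices. The account names below are made up for illustration and are not from the presenter's datasets.

```python
# Hypothetical accounts; each edge is (source, target), e.g. "alice follows bob".
edges = [("alice", "bob"), ("bob", "alice"), ("carol", "alice")]

# Vertices (nodes) are every name that appears at either end of an edge (link).
vertices = sorted({name for edge in edges for name in edge})

# Adjacency list: for each vertex, the vertices it points to.
adjacency = {v: [] for v in vertices}
for source, target in edges:
    adjacency[source].append(target)
```

An edge list like this is roughly what the rows of an "Edges" worksheet hold: one relationship per row.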

Network Graph
Challenges
Challenge 1:
Can you spot the nodes and the
links in the following network
graphs (particularly in the
deformed ones)?
Challenge 2:
How many network graphs are
in this slideshow?
(Of course, some are hidden.)

SETTING THE STAGE:
“NATURAL” FORMS of NETWORK
GRAPHS
 Part 1: Formations w/
alphanumeric labels
 to get a sense of what social network
graphs look like
 Part 2: Formations w/o
alphanumeric labels
 to get a sense of layout algorithms and
grouping algorithms
NETWORK GRAPH DEFORMATIONS
 Part 3: Deformations
 to get a sense of what’s possible with the
data visualizations

mass_media article network on Wikipedia (1 deg.)

#media hashtag network on Twitter

#food hashtag network on Twitter (lim. 200 Tweets)

#media related tags network on Flickr (1.5 deg.), with subgraph images

“life” keyword search on Twitter, basic network, with subgraph images

(without alphanumeric labels)
(based on common built-in
layout algorithms…
and clustering representations)

polar graph layout algorithm
polar absolute layout algorithm

treemap
(grouping /
clustering)
(with
Sugiyama
layout
of groups /
clusters)

packed
rectangles
(grouping /
clustering)
(with grid
layout
of groups /
clusters)

packed
rectangles
(grouping /
clustering)
(with Harel-
Koren
Fast Multiscale
layout
of groups /
clusters)

force-
directed
(grouping /
clustering)
(with
Fruchterman-
Reingold
force-based
layout
of groups /
clusters)
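
To give a feel for what a force-based layout does, here is a minimal sketch in the spirit of Fruchterman-Reingold: all nodes repel, linked nodes attract, and a cooling "temperature" limits each move. It is a simplified illustration, not NodeXL's implementation; the function name, frame size, and cooling rate are assumptions.

```python
import math
import random

def force_layout(nodes, edges, width=1.0, height=1.0, iterations=50, seed=0):
    """Return {node: (x, y)} from a basic Fruchterman-Reingold-style loop."""
    rng = random.Random(seed)                    # seeded for repeatable output
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))   # ideal distance between nodes
    temperature = width / 10.0                   # caps movement per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                push = k * k / d
                disp[u][0] += dx / d * push
                disp[u][1] += dy / d * push
                disp[v][0] -= dx / d * push
                disp[v][1] -= dy / d * push
        # Linked nodes attract with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = d * d / k
            disp[u][0] -= dx / d * pull
            disp[u][1] -= dy / d * pull
            disp[v][0] += dx / d * pull
            disp[v][1] += dy / d * pull
        # Move each node, limited by the temperature, and keep it in the frame.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + disp[n][0] / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + disp[n][1] / length * step))
        temperature *= 0.95                      # simple geometric cooling (assumption)
    return {n: (p[0], p[1]) for n, p in pos.items()}

layout = force_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
```

Changing the seed or the iteration count changes the picture, which is one reason re-running a layout on the same data can produce a different-looking graph.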

force-
directed
(grouping /
clustering)
(with
Fruchterman-
Reingold
layout
of groups /
clusters)

force-
directed
(grouping /
clustering)
(with
Fruchterman-
Reingold
layout
of groups /
clusters)

Clauset-Newman-Moore
Wakita-Tsurumi
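
Both of these clustering methods work by greedily merging groups to raise a score called modularity. As a hedged sketch of what that score measures (not the tool's own code), the function below computes modularity for an undirected, unweighted edge list and a given grouping; the function name and input shapes are assumptions.

```python
def modularity(edges, groups):
    """Modularity Q of an undirected, unweighted graph for a given partition.

    edges  -- list of (u, v) pairs, each undirected edge listed once
    groups -- dict mapping every vertex to a group label
    """
    m = len(edges)
    internal = {}     # number of edges with both ends inside the group
    degree_sum = {}   # total degree of the group's vertices
    for u, v in edges:
        for node in (u, v):
            g = groups[node]
            degree_sum[g] = degree_sum.get(g, 0) + 1
        if groups[u] == groups[v]:
            internal[groups[u]] = internal.get(groups[u], 0) + 1
    # Q = sum over groups of (internal edge share - expected share).
    return sum(internal.get(g, 0) / m - (degree_sum[g] / (2 * m)) ** 2
               for g in degree_sum)

# Two triangles joined by a single bridging edge.
demo_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {v: "A" for v in range(6)}
```

Splitting the two triangles into separate groups scores higher than lumping everything together, which is the kind of difference a greedy method follows when it decides which groups to merge.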

Girvan-Newman
(for smaller graphs)
Connected Components
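
Grouping by connected components can be sketched with a breadth-first search that ignores edge direction (so these are "weakly" connected components). The function name is hypothetical, and returning the largest group first is an assumption made for illustration.

```python
from collections import deque

def connected_components(vertices, edges):
    """Group vertices into weakly connected components, largest first."""
    neighbors = {v: set() for v in vertices}
    for u, v in edges:                 # edge direction is ignored here
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(members))
    components.sort(key=len, reverse=True)
    return components

demo_components = connected_components(
    ["a", "b", "c", "d", "e", "f"],
    [("a", "b"), ("c", "b"), ("d", "e")],
)
```

A vertex with no edges at all, such as "f" here, ends up as its own one-member component.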

Motifs
(subgraph micro structures)
Vertex Attribute: PageRank
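
As a rough sketch of how a PageRank value per vertex can be computed, the function below uses plain power iteration on a directed edge list. The damping factor of 0.85 and the fixed iteration count are conventional defaults assumed here, not settings taken from NodeXL.

```python
def pagerank(vertices, edges, damping=0.85, iterations=100):
    """Return {vertex: score} using basic power iteration."""
    n = len(vertices)
    out_links = {v: [] for v in vertices}
    for source, target in edges:
        out_links[source].append(target)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices with no out-links is spread evenly over all.
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n
                    for v in vertices}
        # Each vertex passes its rank on, split equally among its out-links.
        for u in vertices:
            for v in out_links[u]:
                new_rank[v] += damping * rank[u] / len(out_links[u])
        rank = new_rank
    return rank

demo_rank = pagerank(["a", "b", "c"], [("b", "a"), ("c", "a"), ("a", "b")])
```

In this toy graph, "a" receives links from both other vertices and scores highest, while "c" receives none and scores lowest.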

 Data worksheets: Edges, vertices, groups,
group vertices, overall (summary) metrics,
and additional worksheets depending on
the social media data source
 All expressed in row data in related worksheets
 Basic edge data: Dyadic followership,
relational reciprocation, relationship type,
dates (UTC) of the relationship, URLs,
#hashtags, and others
 Basic vertex data: Name, image URLs, in-
degree, out-degree, betweenness
centrality, closeness centrality,
eigenvector centrality, PageRank,
clustering coefficient, reciprocated vertex
pair ratio, and others
 Clustering: Group (cluster) partitioning by
empirically observed low-dimension
distance-based measures (which are
variable)
 Motifs: Mini-subgraphs that show dyadic,
triadic, tetradic, … node relationships in
the social network (as a general
definition)
 In NodeXL, the “Group by Motif” visualization
shows three types of small-group node
relationships: fan motifs (a central node as a
connector to otherwise unconnected nodes), D-
connector motifs (dyadic nodes connected by
multiple intermediary nodes), and clique motifs
(with interconnected nodes)
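
The fan motif described above can be approximated in a few lines. This sketch treats a "fan" as a hub with at least two neighbors that connect to nothing else; that reading, the threshold, and the function name are assumptions for illustration and may differ from NodeXL's exact rules.

```python
def find_fans(edges, min_leaves=2):
    """Return {hub: sorted leaves} for hubs with enough single-link neighbors."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue                   # self-loops do not count as neighbors
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    fans = {}
    for hub, adjacent in neighbors.items():
        # A leaf is a neighbor whose only connection is this hub.
        leaves = sorted(n for n in adjacent if len(neighbors[n]) == 1)
        if len(leaves) >= min_leaves:
            fans[hub] = leaves
    return fans

demo_fans = find_fans([("h", "x"), ("y", "h"), ("h", "z"), ("h", "k"), ("k", "m")])
```

Here "h" is reported as a fan hub with leaves x, y, and z, while "k" is not, because it has only one single-link neighbor.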

anything goes…
except no outright manual manipulation of
the image or its elements…
except no placement of an external
background image in the graph pane…
except no faux or simulated data…
except no data manipulation…
except no visual editing or post-production
outside the tool…
except no graph image rotation…
except no inclusion of words or lettering or
numbering…

1. Extracting social network data
from a social media platform (via
third-party add-ons to NodeXL)
2. Data processing
• Processing graph metrics
• Identifying sub-structures such as
groups or clusters, motifs, or connected
components (through clustering
algorithms)
3. Creating graph visualizations in
the graph pane with layout
algorithms
4. Analyzing the data visualizations
5. Deforming the visualizations
based on the NodeXL tools alone

 Selection of data for extraction from social
media platform
 …with rate limiting, built-in tool limits, and other
limits, and …with user-set parameters for the
data extraction type and seeding terms
 Data limiting
 Selection of data processing measures
 Selection of grouping algorithm
 Layout algorithm
 Autofill selections from columns
 Dynamic filters
 Group effects
 Graph options
 Scale
 Zoom
 Layout iteration (with or without updates
to the data processing)
 Element selection / highlighting
 Resizing the graph pane
 Fluorescing colors
 with RGB (red, green, blue) or HSL (hue,
saturation, lightness) swap outs
 …all within data limits, machine
processing limits, and parameter pre-sets
in the software (at every step in the
sequence)…on glass monitors with light-
emitting phosphors
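
The RGB and HSL color swaps mentioned above can be tried outside the tool with Python's standard `colorsys` module. This is a small sketch of rotating a color's hue; the helper name is hypothetical, and note that `colorsys` orders the values as hue, lightness, saturation (HLS).

```python
import colorsys

def shift_hue(rgb, degrees):
    """Rotate the hue of an (r, g, b) color whose channels run 0-255."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)      # note the H, L, S order
    h = (h + degrees / 360.0) % 1.0             # hue wraps around the color wheel
    return tuple(round(c * 255) for c in colorsys.hls_to_rgb(h, l, s))

green = shift_hue((255, 0, 0), 120)             # pure red rotated by 120 degrees
```

Rotating hue while holding lightness and saturation fixed is one way to swap a whole palette without changing how bright or vivid the graph looks.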

data structures, clustering
algorithms, and layout
algorithms account for a
majority of the visual
differences and effects…
and the decorative elements in
the visuals affect only a small
portion

it’s the residua that enable
some of the cooler visual
effects;
however, there are some
“anchor points,” too, beyond
which changes cannot be made;
in experimentation, it’s easy to
end up in a visually
irredeemable place, but the
“reset all” buttons exist for a
reason

sometimes,
a glance at network graph
metrics is sufficient to let you
know what visualization
possibilities are available (with
enough experience);
sometimes,
a glance at network graph
metrics can be highly
misleading about what
visualizations are possible

also, an early graph map of the
data (without graph metrics,
without grouping) can
reveal a lot about the data’s
social groupness and
connectivity;
the “hard fun” in this endeavor
though is the pursuit of visual
surprise

…but first experiment very broadly
to actually learn the tool and its
behaviors (don’t lock in to some
“go to’s” simply because you know
how those work)
…some data extractions take days,
and processing some visualizations
from large datasets can take days
(so schedule time on backup
computers)

…all available underlying data
should be processed because
there may be interesting
patterns available for discovery
…graph metrics include vertex
degree, in-degree, out-degree,
betweenness and closeness
centralities, eigenvector
centrality, PageRank, edge
reciprocation, group metrics,
and other details;
…there is also geolocational
data, time data, and other
scraped data
…there are subgraph images
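
A few of the metrics listed above are simple enough to sketch by hand. The function below counts in-degree and out-degree and reports the share of edges that are answered by an edge in the opposite direction; that definition of reciprocation and the function name are assumptions here, and NodeXL's own definitions may differ in detail.

```python
def degree_metrics(edges):
    """Return (in_degree, out_degree, reciprocated_edge_ratio)."""
    unique = set(edges)                          # collapse duplicate edges
    in_degree, out_degree = {}, {}
    for source, target in unique:
        out_degree[source] = out_degree.get(source, 0) + 1
        in_degree[target] = in_degree.get(target, 0) + 1
    # An edge is reciprocated when the reverse edge also exists.
    reciprocated = sum(1 for s, t in unique if s != t and (t, s) in unique)
    ratio = reciprocated / len(unique) if unique else 0.0
    return in_degree, out_degree, ratio

demo_in, demo_out, demo_ratio = degree_metrics(
    [("a", "b"), ("b", "a"), ("c", "a"), ("d", "a")]
)
```

In this toy data, "a" is followed by three accounts but follows back only one of them, so half of the edges are reciprocated.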

…and it’s pretty important to visualize
data in different ways (and with text
labels) to exploit all the meaning that
can be found in that data…and to
engage with
the underlying data, and not
the visualization alone per se
…all network graphs have to be read
along with the underlying datasets for
actual research-quality meaning;

…often without any reference to
x- or y-axes, just spatiality in a
two-dimensional plane
…where physical proximity
sometimes matters
…where sizes of objects
sometimes matter
…where colors of objects
sometimes matter
…where connecting lines always
matter
…where arrows on the ends of
lines always matter

But why?
 …focusing on the journey, not the destination
 …learning the tool, every last function and the
practical and theoretical limits
 …understanding how data and graph data
metrics relate to visualizations (and gaining a
sense of how data visualizations are perceived
and what they communicate)
 …making the tool do things that its maker(s)
did not intend (albeit in a friendly sense)
 …enjoying graphs that look like signals but are
actually just pleasant noise (with a small
amount of actual information)
 …digital doodling and pretty for pretty’s sake
But why not?
 …a cost in time
 …a cost in computer processing
 …alarming the makers of the tool with the
network graph “hairballs”

…
 When deforming social network graphs, you’re
playing to the following:
 the social media platform (and how people
are using it at that particular slice-in-time)
 the extracted social data (and serendipitous
aspects of that data)
 the software (NodeXL and APIs)
 how people perceive visually and their
tendency to see patterns
 your innate need for play, and
 your enjoyment at amusing others
The trick…and the secret
 Huh, how did I get here?!
 The trick is to remember your way in and
your way out of the deformations (which
ultimately means you learn the tool and its
many functionalities and how to
troubleshoot within the tool)
 The secret is that the eye candy (deformed
network graphs) is to motivate the learning
and defuse learning frustrations (while
learning network analysis and NodeXL) and
to increase learning persistence

 The presenter is using a version of
NodeXL that is between the free
and open-source NodeXL Basic (a
limited version) and the function-
added commercial NodeXL Pro…
on Excel 2016.
 NodeXL stands for Network Overview,
Discovery and Exploration for Excel. This
“template” add-on to Excel was formerly
known as NetViz.
 There is a server version available.
 The NodeXL template / add-on is available
on Microsoft’s CodePlex site.
 The third-party add-ons to NodeXL
enable access to social media
application programming
interfaces (APIs) and open-source
structures like MediaWiki.
 Data captured include unstructured
(image), semi-structured (text), and
structured data (numerical).

 How to define relationship and
depth of relationship
 Frequencies and types of interactions
 Conveying relationships with
shapes, lines, and placement in 2D
space
 Emplace data objects with some
likelihood of covering the 2D space and
with some balance (but not symmetry)
 Using colors strategically and non-
offensively (general neutrality for edges,
color for vertices)
 Using shapes and shape sizes strategically
and non-offensively
 Require data limits to enable
visualization in fixed physical space
(2D and 3D)
 Require alignment with human
visual capabilities and visual sense-
making (and understanding the
limits of perception with dense
data and visual occlusions)

 All data visualizations are original
and based on unique social media
datasets. The data visualizations
here are not from any prior
presentation or publication.
 Sometimes, one dataset was used for
multiple data visualizations.
 The social network platforms used here
include Twitter (microblogging site),
Wikipedia (MediaWiki understructure, a
crowd-sourced online encyclopedia), and
Flickr (video and image-sharing site).
 The social network graph types include
#hashtag networks, keyword search
networks, user networks, related tags
networks, and article-article networks.

 The social networks are all
directional single-mode graphs.
 The direction of relationships is
indicated.
 The nodes represent one type of thing
instead of multiple types.
 Clusters are created based on
inter-relating around topics of
shared interest, and such clusters
are captured through unsupervised
learning.
 Groups are not pre-labeled with any
classification but are just “Group 1,”
“Group 2,” and such in descending order.
 There are ways to cluster by vertex
attribute, connected component, or other
methods.
 These included sociograms that…
 consist of 30 – 100,000+ nodes/vertices
each
 contain 1 – 8,000+ groups (which vary
based on which clustering algorithms are
used).
 cover 1, 1.5, or 2 degrees when degree
is a definable parameter in the data
extraction.
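
The "degree" parameter can be illustrated with a small ego-network filter. This sketch follows one common reading: 1 degree keeps edges that touch the seed (ego), 1.5 degrees adds edges among the ego's direct neighbors, and 2 degrees adds edges from those neighbors out to their own neighbors. The function name and these exact cut-offs are assumptions, not a statement of how any particular importer works.

```python
def ego_network(edges, ego, level):
    """Return the subset of edges in an ego network at level 1.0, 1.5, or 2.0."""
    members = {ego}                    # the ego plus its direct neighbors
    for source, target in edges:
        if source == ego:
            members.add(target)
        elif target == ego:
            members.add(source)
    if level == 1.0:
        return [e for e in edges if ego in e]
    if level == 1.5:
        return [e for e in edges if e[0] in members and e[1] in members]
    if level == 2.0:
        return [e for e in edges if e[0] in members or e[1] in members]
    raise ValueError("level must be 1.0, 1.5, or 2.0")

demo_edges_ego = [("ego", "a"), ("b", "ego"), ("a", "b"), ("a", "c"), ("c", "d")]
```

With this toy data, the 1-degree network has two edges, the 1.5-degree network adds the a-b tie, and the 2-degree network also reaches "c".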

 So social network graphs represent
people’s relationships based on
various types of relating; they may be
understood
 at global network scale (most broad level)
 as various mixes of subgroups and motifs
 as individual nodes (egos) (most granular level)
 Relationships may be one-to-oneself
(isolate, reflexive self-loop), one-to-
one (dyadic), one-to-several, one-to-
many, several-to-several, several-to-
many, many-to-one, many-to-several,
many-to-many
 Relationships cost, so people are selective
when they connect… There is a trust
premium in every connection.
 Relationships may benefit their members, so
people are strategic and tactical when they
connect…
 Relationships are dynamic and changing over
time, with varying levels of speed-of-change
(especially on social media platforms).
 In a social ecology, the
interrelationships often determine
 power and capabilities; resource distribution;
information sharing

 “Relating” online includes the
following:
 Undeclared transient (ad hoc)
relationships:
 replying to, retweeting, commenting,
mentioning, collaboration, co-funding,
co-authorship, co-editing, co-tagging
digital contents, and others
 Declared formalized and announced
relationships:
 following, un-following, friending,
unfriending, relational status updates,
and others

 General types of available data on
social media include the following:
 Content data: text messages, audio,
photos, video, shared digital objects, and
others
 Trace data: who interacts with whom
(which enables drawing of the social
network graphs), when messages are
shared, when accounts are created, when
accounts are closed, and others
 Metadata: locational information, “folk”
tags linked to digital contents, auto-tags
linked to digital contents, system
information of those contacting social
media platforms, and others
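
To show how trace data becomes a network graph, here is a minimal sketch that turns "who mentions whom" into directed edges. The post format, the account names, and the simple `@name` pattern are assumptions for illustration; real platform data arrives through APIs with richer fields.

```python
import re

MENTION = re.compile(r"@(\w+)")        # naive pattern; would also match e-mail domains

def mention_edges(posts):
    """Turn (author, text) pairs into directed (author, mentioned) edges."""
    edges = []
    for author, text in posts:
        for name in MENTION.findall(text):
            edges.append((author.lower(), name.lower()))
    return edges

demo_posts = [
    ("Ana", "thanks @Ben and @cy!"),
    ("ben", "no mentions here"),
    ("cy", "@ana hello"),
]
demo_mentions = mention_edges(demo_posts)
```

Each resulting pair is one row of edge data; posts without mentions contribute no edges.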

 In general, people relate and connect
around shared interests and
similarity (homophily).
 Human similarity can be a predictor of long-
term relating and bonding.
 Some relate around heterophily or
interpersonal differences.
 In terms of online fame, the power
law applies—with a few garnering
most of the followership and
attention.
 Then the rest of the frequency curve involves
a long tail of those with few close friends.
 Actual reciprocal relationships are not so
common. The followed do not often follow
back.
 On social media, people often pose (perform
socially) and over-share for imagined
audiences.
 One-to-many virality does not truly
exist.
 It’s often the bigger entities (governments,
corporations) that push designed messages
one-to-many, and these often create trending
topics.
 Social influence is concentrated at cores.
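
One way to see the concentration described above in a given dataset is to measure what share of all incoming links goes to the most-followed accounts. This is a hedged sketch with made-up numbers; the function name and the 10% cut-off are assumptions.

```python
def top_share(in_degree, fraction=0.1):
    """Share of all incoming links held by the top `fraction` of accounts."""
    counts = sorted(in_degree.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    top_n = max(1, round(len(counts) * fraction))   # always at least one account
    return sum(counts[:top_n]) / total

# Hypothetical follower counts: one large account and nine small ones.
demo_followers = {"big": 91}
demo_followers.update({f"small{i}": 1 for i in range(9)})
demo_share = top_share(demo_followers)
```

A value near 1.0 means a handful of accounts hold nearly all the attention; a value near the fraction itself means attention is spread evenly.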

 Over time, there are predictable
evolutionary patterns with online
social networks.
 For one, “isolates” (singletons),
“whiskers,” and subgroups either meld
with a larger connected component in an
online community, or they simply
disappear. In other words, nodes move to
the core and connect with the social mass
or move out of that particular network.
 Mainline interests have to converge for
people to continue participation in a
community.

 Virtual relationships are
ephemeral, with varying degrees of
friending and unfriending.
 The average length of Facebook relationships
is said to be about three years.
 People looking for romance are
often entranced by ‘bots, who
stand in for actual people. People
may be “catfished” into
“relationships” by automata
(scripts).
 Also, a majority of people who encounter
Twitter ‘bots are unable to tell that they
are not people and will accept them as
friends (and give them access to their real
social networks).
 Predictive analytics have been
applied to the length of people’s
romantic relationships with fairly
high accuracy.
 The time period length of initial
interactivity is one indicator of overall
relationship longevity.

 Individually, people can be fairly
accurately profiled psychologically
by what they post online.
 People’s social circles may be used to
profile the individual even if he / she does
not have a direct online presence.
 There are geographical effects on
virtual connectivity. The physical
real has effects on the virtual.
 Likewise, language and culture have
effects on social media usage.
 The cyber-physical confluence exists.
 People’s geolocational check-ins
have been used to profile
individuals as to their lifestyles and
behaviors because people tend
towards habitual “patterns of life”
and times/places where they are
comfortable.
 With a few data points, people’s
likelihood of being in a particular place at
a certain time may be projected with
fairly high accuracy into the future (out
to a little over a year).

 Dr. Shalin Hai-Jew
 Instructional Designer
 iTAC, Kansas State University
 785-532-5262
 shalin@k-state.edu
 For more information about social
network graphs and the analytics aspects,
please see “Beauty as a Bridge to NodeXL”
(on SlideShare).
 Thanks to Dr. Brent Chamberlain and the
sponsors of “Aesthesia” for including me.
 Thanks also to the Social Media Research
Foundation (SMRF), which promotes
“Open Tools, Open Data, Open Scholarship
for Social Media” and enables free and
open access to NodeXL.
 The presenter has no tie to either SMRF or
CodePlex.
 Challenge 2: So how many graphs are in
this slideshow? 372.