Social network graphs are node-link (vertex-edge; entity-relationship) diagrams that show relationships between people and groups. Open-source tools like NodeXL Basic (available on Microsoft’s CodePlex) enable the capture of network data from select social media platforms through third-party add-ons and social media APIs. From social groups, relational clusters are extracted with clustering algorithms which identify intensities of connections. Visually, structural relational data is conveyed with layout algorithms in two-dimensional space. Using these various layout options and built-in visual design features, it is possible to aesthetically “deform” the network graph data for visual effects. This presentation introduces novel datasets and novel data visualizations.
Formations & Deformations of Social Network Graphs
1. Shalin Hai-Jew
Kansas State University
Aesthesia
March 2, 2017
Marianna Kistler Beach Museum of Art
Kansas State University
(updated)
2.
node
(vertex)
(ego, entity)
link
(edge)
(relationship)
3. Network Graph
Challenges
Challenge 1:
Can you spot the nodes and the
links in the following network
graphs (particularly in the
deformed ones)?
Challenge 2:
How many network graphs are
in this slideshow?
(Of course, some are hidden.)
4. SETTING THE STAGE:
“NATURAL” FORMS of NETWORK
GRAPHS
Part 1: Formations w/
alphanumeric labels
to get a sense of what social network
graphs look like
Part 2: Formations w/o
alphanumeric labels
to get a sense of layout algorithms and
grouping algorithms
NETWORK GRAPH DEFORMATIONS
Part 3: Deformations
to get a sense of what’s possible with the
data visualizations
36. Data worksheets: Edges, vertices, groups,
group vertices, overall (summary) metrics,
and additional worksheets depending on
the social media data source
All expressed in row data in related worksheets
Basic edge data: Dyadic followership,
relational reciprocation, relationship type,
dates (UTC) of the relationship, URLs,
#hashtags, and others
Basic vertex data: Name, image URLs, in-
degree, out-degree, betweenness
centrality, closeness centrality,
eigenvector centrality, PageRank,
clustering coefficient, reciprocated vertex
pair ratio, and others
Clustering: Group (cluster) partitioning by
empirically observed low-dimension
distance-based measures (which are
variable)
Motifs: Mini-subgraphs that show dyadic,
triadic, tetradic, … node relationships in
the social network (as a general
definition)
In NodeXL, the “Group by Motif” visualization
shows three types of small-group node
relationships: fan motifs (a central node as a
connector to otherwise unconnected nodes), D-
connector motifs (dyadic nodes connected by
multiple intermediary nodes), and clique motifs
(with interconnected nodes)
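The vertex metrics listed above can be derived from the edge rows alone. The following is a minimal sketch, not NodeXL's own code: the edge rows and the function name `vertex_metrics` are hypothetical, and the reciprocated vertex pair ratio is computed under one common reading (mutually connected neighbors divided by all neighbors).

```python
from collections import defaultdict

# Hypothetical edge rows, as they might appear in an "Edges" worksheet:
# (source, target) is read as "source follows target".
EDGES = [
    ("ann", "bob"),
    ("bob", "ann"),
    ("cat", "ann"),
    ("dan", "ann"),
    ("ann", "dan"),
    ("eve", "bob"),
]


def vertex_metrics(edges):
    """Return {vertex: {"in_degree", "out_degree", "reciprocated_pair_ratio"}}."""
    out_neighbors = defaultdict(set)
    in_neighbors = defaultdict(set)
    for source, target in edges:
        out_neighbors[source].add(target)
        in_neighbors[target].add(source)

    metrics = {}
    for v in sorted(set(out_neighbors) | set(in_neighbors)):
        neighbors = out_neighbors[v] | in_neighbors[v]
        # Neighbors connected to v in both directions.
        mutual = out_neighbors[v] & in_neighbors[v]
        metrics[v] = {
            "in_degree": len(in_neighbors[v]),
            "out_degree": len(out_neighbors[v]),
            "reciprocated_pair_ratio": len(mutual) / len(neighbors) if neighbors else 0.0,
        }
    return metrics


if __name__ == "__main__":
    for vertex, row in vertex_metrics(EDGES).items():
        print(vertex, row)
```

Each result row corresponds to one row of a vertices worksheet; centralities such as betweenness or PageRank need more machinery and are left out of this sketch.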
37. anything goes…
except no outright manual manipulation of
the image or its elements…
except no placement of an external
background image in the graph pane…
except no faux or simulated data…
except no data manipulation…
except no visual editing or post-production
outside the tool…
except no graph image rotation…
except no inclusion of words or lettering or
numbering…
55. 1. Extracting social network data
from a social media platform (via
third-party add-ons to NodeXL)
2. Data processing
• Processing graph metrics
• Identifying sub-structures such as
groups or clusters, motifs, or connected
components (through clustering
algorithms)
3. Creating graph visualizations in
the graph pane with layout
algorithms
4. Analyzing the data visualizations
5. Deforming the visualizations
based on the NodeXL tools alone
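Step 3 above relies on a layout algorithm, which is simply a rule that assigns each vertex an (x, y) position. As a rough illustration, here is a sketch of one of the simplest such rules, a circle layout. The function name and the `radius` parameter are assumptions made for this example, and this is not how NodeXL implements its layouts.

```python
import math


def circle_layout(vertices, radius=1.0):
    """Place vertices evenly around a circle; returns {vertex: (x, y)}."""
    n = len(vertices)
    positions = {}
    for i, v in enumerate(vertices):
        # Spread the vertices over the full circle in input order.
        angle = 2 * math.pi * i / n
        positions[v] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


if __name__ == "__main__":
    for vertex, (x, y) in circle_layout(["a", "b", "c", "d"]).items():
        print(f"{vertex}: ({x:.2f}, {y:.2f})")
```

Force-directed layouts replace the fixed rule with an iterative simulation, which is why re-running the layout can yield a different picture from the same data.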
73. Selection of data for extraction from social
media platform
…with rate limiting, built-in tool limits, and other
limits, and …with user-set parameters for the
data extraction type and seeding terms
Data limiting
Selection of data processing measures
Selection of grouping algorithm
Layout algorithm
Autofill selections from columns
Dynamic filters
Group effects
Graph options
Scale
Zoom
Layout iteration (with or without updates
to the data processing)
Element selection / highlighting
Resizing the graph pane
Fluorescing colors
with RGB (red, green, blue) or HSL (hue,
saturation, lightness) swap outs
…all within data limits, machine
processing limits, and parameter pre-sets
in the software (at every step in the
sequence)…on glass monitors with light-
emitting phosphors
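The RGB and HSL color "swap outs" mentioned above amount to converting a color between the two models and changing one component. A minimal sketch using Python's standard `colorsys` module follows (note that the module orders the components as HLS rather than HSL). The function name `rotate_hue` is hypothetical and is not a NodeXL feature.

```python
import colorsys


def rotate_hue(rgb, degrees):
    """Rotate the hue of an 8-bit RGB color, keeping lightness and saturation."""
    r, g, b = (channel / 255 for channel in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    h = (h + degrees / 360) % 1.0
    rotated = colorsys.hls_to_rgb(h, l, s)
    return tuple(round(channel * 255) for channel in rotated)


if __name__ == "__main__":
    # Pure red rotated by a third of the color wheel.
    print(rotate_hue((255, 0, 0), 120))
```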
91. data structures, clustering
algorithms, and layout
algorithms account for a
majority of the visual
differences and effects…
and the decorative elements in
the visuals affect only a small
portion
110. it’s the residua that enable
some of the cooler visual
effects;
however, there are some
“anchor points,” too, beyond
which changes cannot be made;
in experimentation, it’s easy to
end up in a visually
irredeemable place, but the
“reset all” buttons exist for a
reason
128. sometimes,
a glance at network graph
metrics is sufficient to let you
know what visualization
possibilities are available (with
enough experience);
sometimes,
a glance at network graph
metrics can be highly
misleading about what
visualizations are possible
129. also, an early graph map of the
data (without graph metrics,
without grouping) can
reveal a lot about the data’s
social groupness and
connectivity;
the “hard fun” in this endeavor
though is the pursuit of visual
surprise
147. …but first experiment very broadly
to actually learn the tool and its
behaviors (don’t lock in to some
“go to’s” simply because you know
how those work)
…some data extractions take days,
and processing some visualizations
from large datasets can take days
(so schedule time on backup
computers)
165. …all available underlying data
should be processed because
there may be interesting
patterns available for discovery
…graph metrics include vertex
degree, in-degree, out-degree,
betweenness and closeness
centralities, eigenvector
centrality, PageRank, edge
reciprocation, group metrics,
and other details;
…there is also geolocational
data, time data, and other
scraped data
…there are subgraph images
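To make one of the listed graph metrics concrete, the sketch below computes closeness centrality with a breadth-first search. It uses one common definition (number of reachable vertices divided by the sum of shortest-path lengths to them). Tools differ in how they normalize this value, so the numbers are illustrative and are not claimed to match NodeXL's output. The adjacency data is hypothetical.

```python
from collections import deque

# Hypothetical undirected path graph: a - b - c
ADJACENCY = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}


def closeness_centrality(adjacency, start):
    """Reachable-vertex count divided by the sum of shortest-path lengths."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adjacency.get(v, ()):
            if w not in distance:
                distance[w] = distance[v] + 1
                queue.append(w)
    total = sum(distance.values())
    # An isolated vertex reaches nobody, so its closeness is reported as 0.
    return (len(distance) - 1) / total if total else 0.0


if __name__ == "__main__":
    for vertex in ADJACENCY:
        print(vertex, round(closeness_centrality(ADJACENCY, vertex), 3))
```

The middle vertex scores highest because it is one step from everyone else, which is the intuition behind the metric.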
183. …and it’s pretty important to visualize
data in different ways (and with text
labels) to exploit all the meaning that
can be found in that data…and to
engage with the underlying data, not
the visualization alone
…all network graphs have to be read
along with the underlying datasets for
actual research-quality meaning
201. …often without any reference to
x- or y-axes, just spatiality in a
two-dimensional plane
…where physical proximity
sometimes matters
…where sizes of objects
sometimes matter
…where colors of objects
sometimes matter
…where connecting lines always
matter
…where arrows on the ends of
lines always matter
219. But why?
…focusing on the journey, not the destination
…learning the tool, every last function and the
practical and theoretical limits
…understanding how data and graph data
metrics relate to visualizations (and gaining a
sense of how data visualizations are perceived
and what they communicate)
…making the tool do things that its maker(s)
did not intend (albeit in a friendly sense)
…enjoying graphs that look like signals but are
actually just pleasant noise (with a small
amount of actual information)
…digital doodling and pretty for pretty’s sake
But why not?
…a cost in time
…a cost in computer processing
…alarming the makers of the tool with the
network graph “hairballs”
220. …
When deforming social network graphs, you’re
playing to the following:
the social media platform (and how people
are using it at that particular slice-in-time)
the extracted social data (and serendipitous
aspects of that data)
the software (NodeXL and APIs)
how people perceive visually and their
tendency to see patterns
your innate need for play, and
your enjoyment at amusing others
The trick…and the secret
Huh, how did I get here?!
The trick is to remember your way in and
your way out of the deformations (which
ultimately means you learn the tool and its
many functionalities and how to
troubleshoot within the tool)
The secret is that the eye candy (deformed
network graphs) is to motivate the learning
and defuse learning frustrations (while
learning network analysis and NodeXL) and
to increase learning persistence
238. The presenter is using a version of
NodeXL that is between the free
and open-source NodeXL Basic (a
limited version) and the function-
added commercial NodeXL Pro…
on Excel 2016.
NodeXL stands for Network Overview,
Discovery and Exploration for Excel. This
“template” add-on to Excel was formerly
known as .NetMap.
There is a server version available.
The NodeXL template / add-on is available
on Microsoft’s CodePlex site.
The third-party add-ons to NodeXL
enable access to social media
application programming
interfaces (APIs) and open-source
structures like MediaWiki.
Data captured include unstructured
(image), semi-structured (text), and
structured data (numerical).
239. How to define relationship and
depth of relationship
Frequencies and types of interactions
Conveying relationships with
shapes, lines, and placement in 2D
space
Placing data objects with some
likelihood of covering the 2D space and
with some balance (but not symmetry)
Using colors strategically and non-offensively
(general neutrality for edges,
color for vertices)
Using shapes and shape sizes strategically
and non-offensively
Require data limits to enable
visualization in fixed physical space
(2D and 3D)
Require alignment with human
visual capabilities and visual sense-
making (and understanding the
limits of perception with dense
data and visual occlusions)
240. All data visualizations are original
and based on unique social media
datasets. The data visualizations
here are not from any prior
presentation or publication.
Sometimes, one dataset was used for
multiple data visualizations.
The social network platforms used here
include Twitter (microblogging site),
Wikipedia (MediaWiki understructure, a
crowd-sourced online encyclopedia), and
Flickr (video and image-sharing site).
The social network graph types include
#hashtag networks, keyword search
networks, user networks, related tags
networks, and article-article networks.
241. The social networks are all
directional single-mode graphs.
The direction of each relationship is
indicated.
The nodes represent one type of a thing
instead of multiple types.
Clusters are created based on
inter-relating around topics of
shared interest, and such clusters
are captured through unsupervised
learning.
Groups are not pre-labeled with any
classification but are just “Group 1,”
“Group 2,” and such in descending order.
There are ways to cluster by vertex
attribute, connected component, or other
methods.
These included sociograms that…
consist of 30 – 100,000+ nodes/vertices
each
contain 1 – 8,000+ groups (which vary
based on which clustering algorithms are
used).
consist of 1 - 1.5 - 2 degrees when degree
is a definable parameter in the data
extraction.
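The 1, 1.5, and 2 degree settings above control how far the extraction reaches out from a seed account (the ego). The sketch below assumes the conventional reading of those levels: 1 keeps only edges touching the ego, 1.5 adds edges among the ego's direct contacts (alters), and 2 adds the alters' own connections. The function name and sample edges are hypothetical.

```python
# Hypothetical directed edges around a seed account named "ego".
SAMPLE_EDGES = [
    ("ego", "a"),
    ("b", "ego"),
    ("a", "b"),
    ("a", "c"),
    ("c", "d"),
]


def ego_network(edges, ego, level):
    """Return the directed edges of an ego network at level 1, 1.5, or 2."""
    if level not in (1, 1.5, 2):
        raise ValueError("level must be 1, 1.5, or 2")
    # Alters are the vertices directly connected to the ego in either direction.
    alters = {t for s, t in edges if s == ego} | {s for s, t in edges if t == ego}
    alters.discard(ego)
    members = alters | {ego}
    if level == 1:
        # Only edges that touch the ego.
        return [(s, t) for s, t in edges if ego in (s, t)]
    if level == 1.5:
        # Edges touching the ego plus edges among the ego's alters.
        return [(s, t) for s, t in edges if s in members and t in members]
    # Level 2: every edge that touches the ego or one of its alters.
    return [(s, t) for s, t in edges if s in members or t in members]


if __name__ == "__main__":
    for level in (1, 1.5, 2):
        print(level, ego_network(SAMPLE_EDGES, "ego", level))
```

In the sample, the edge from "c" to "d" never appears because neither endpoint is the ego or one of its alters.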
257. So social network graphs represent
people’s relationships based on
various types of relating; they may be
understood
at global network scale (broadest level)
as various mixes of subgroups and motifs
as individual nodes (egos) (most granular level)
Relationships may be one-to-oneself
(isolate, reflexive self-loop), one-to-
one (dyadic), one-to-several, one-to-
many, several-to-several, several-to-
many, many-to-one, many-to-several,
many-to-many
Relationships cost, so people are selective
when they connect… There is a trust
premium in every connection.
Relationships may benefit their members, so
people are strategic and tactical when they
connect…
Relationships are dynamic and changing over
time, with varying levels of speed-of-change
(especially on social media platforms).
In a social ecology, the
interrelationships often determine
power and capabilities; resource distribution;
information sharing
258. “Relating” online includes the
following:
Undeclared transient (ad hoc)
relationships:
replying to, retweeting, commenting,
mentioning, collaboration, co-funding,
co-authorship, co-editing, co-tagging
digital contents, and others
Declared formalized and announced
relationships:
following, un-following, friending,
unfriending, relational status updates,
and others
259. General types of available data on
social media include the following:
Content data: text messages, audio,
photos, video, shared digital objects, and
others
Trace data: who interacts with whom
(which enables drawing of the social
network graphs), when messages are
shared, when accounts are created, when
accounts are closed, and others
Metadata: locational information, “folk”
tags linked to digital contents, auto-tags
linked to digital contents, system
information of those contacting social
media platforms, and others
277. In general, people relate and connect
around shared interests and
similarity (homophily).
Human similarity can be a predictor of long-
term relating and bonding.
Some relate around heterophily or
interpersonal differences.
In terms of online fame, the power
law applies—with a few garnering
most of the followership and
attention.
Then the rest of the frequency curve involves
a long tail of those with few close friends.
Actual reciprocal relationships are not so
common. The followed do not often follow-
back.
On social media, people often pose (perform
socially) and over-share for imagined
audiences.
One-to-many virality does not truly
exist.
It’s often the bigger entities (governments,
corporations) that push designed messages
one-to-many and so create trending topics.
Social influence is concentrated at cores.
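Two of the claims above, that a few accounts garner most of the followership and that follow-backs are uncommon, can be checked against any extracted edge list. The sketch below is a rough illustration on made-up data. The function names and the `FOLLOWS` sample are assumptions for this example, and the sample is far too small to demonstrate a power law.

```python
from collections import Counter

# Hypothetical "follows" edges: eight users follow one hub account.
FOLLOWS = [(f"user{i}", "hub") for i in range(8)] + [
    ("user0", "user1"),
    ("hub", "user0"),
]


def top_share(edges, top_k):
    """Fraction of all in-links received by the top_k most-followed vertices."""
    in_degree = Counter(target for _, target in edges)
    total = sum(in_degree.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in in_degree.most_common(top_k))
    return top / total


def reciprocated_fraction(edges):
    """Fraction of distinct directed edges whose reverse edge also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for s, t in edge_set if s != t and (t, s) in edge_set)
    return mutual / len(edge_set)


if __name__ == "__main__":
    print("share of in-links held by the top account:", top_share(FOLLOWS, 1))
    print("fraction of reciprocated edges:", reciprocated_fraction(FOLLOWS))
```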
278. Over time, there are predictable
evolutionary patterns with online
social networks.
For one, “isolates” (singletons),
“whiskers,” and subgroups either meld
with a larger connected component in an
online community, or they simply
disappear. In other words, nodes move to
the core and connect with the social mass
or move out of that particular network.
Mainline interests have to converge for
people to continue participation in a
community.
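Isolates and connected components, as described above, can be found with a simple graph traversal. The sketch below treats edges as undirected (weak connectivity) and lists components from largest to smallest. The function name and sample data are hypothetical.

```python
# Hypothetical community: one core group, one small subgroup, one isolate.
VERTICES = ["a", "b", "c", "d", "e", "f"]
LINKS = [("a", "b"), ("b", "c"), ("d", "e")]


def connected_components(vertices, edges):
    """Group vertices into weakly connected components, largest first."""
    neighbors = {v: set() for v in vertices}
    for s, t in edges:
        neighbors.setdefault(s, set()).add(t)
        neighbors.setdefault(t, set()).add(s)

    seen, components = set(), []
    for v in neighbors:
        if v in seen:
            continue
        # Depth-first traversal collecting everything reachable from v.
        stack, component = [v], set()
        while stack:
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(neighbors[u] - component)
        seen |= component
        components.append(component)
    return sorted(components, key=len, reverse=True)


if __name__ == "__main__":
    components = connected_components(VERTICES, LINKS)
    isolates = sorted(v for c in components if len(c) == 1 for v in c)
    print("component sizes:", [len(c) for c in components])
    print("isolates:", isolates)
```

Comparing component sizes across extractions taken at different times is one way to watch small groups merge into the core or drop out.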
279. Virtual relationships are
ephemeral, with varying degrees of
friending and unfriending.
The average length of Facebook relationships is said
to be about three years.
People looking for romance are
often entranced by ‘bots, who
stand in for actual people. People
may be “catfished” into
“relationships” by automata
(scripts).
Also, a majority of people who encounter
Twitter ‘bots are unable to tell that they
are not people and will accept them as
friends (and give them access to their real
social networks).
Predictive analytics have been
applied to the length of people’s
romantic relationships with fairly
high accuracy.
The time period length of initial
interactivity is one indicator of overall
relationship longevity.
280. Individually, people can be fairly
accurately profiled psychologically
by what they post online.
People’s social circles may be used to
profile the individual even if he / she does
not have a direct online presence.
There are geographical effects on
virtual connectivity. The physical
real has effects on the virtual.
Likewise, language and culture have
effects on social media usage.
The cyber-physical confluence exists.
People’s geolocational check-ins
have been used to profile
individuals as to their lifestyles and
behaviors because people tend
towards habitual “patterns of life”
and times/places where they are
comfortable.
With a few data points, people’s
likelihood of being in a particular place at
a certain time may be projected with
fairly high accuracy into the future (out
to a little over a year).
298. Dr. Shalin Hai-Jew
Instructional Designer
iTAC, Kansas State University
785-532-5262
shalin@k-state.edu
For more information about social
network graphs and the analytics aspects,
please see “Beauty as a Bridge to NodeXL”
(on SlideShare).
Thanks to Dr. Brent Chamberlain and the
sponsors of “Aesthesia” for including me.
Thanks also to the Social Media Research
Foundation (SMRF), which promotes
“Open Tools, Open Data, Open Scholarship
for Social Media” and enables free and
open access to NodeXL.
The presenter has no tie to either SMRF or
CodePlex.
Challenge 2: So how many graphs are in
this slideshow? 372.