Big Data Visualization
Kwan-Liu Ma
Professor of Computer Science and Chair of the Graduate Group in Computer Science (GGCS) at the University of California-Davis
January 22nd 2014
We are entering a data-rich era. Advanced computing, imaging, and sensing technologies enable scientists to study natural and physical phenomena with unprecedented precision, resulting in an explosive growth of data. The volume of information collected about Web and mobile device users is expected to be even greater. To make sense of such vast amounts of data and make the most of it for knowledge discovery and decision making, we need a new set of tools beyond conventional data mining and statistical analysis. One such tool is visualization. I will present visualizations designed for gleaning insight from massive data and guiding complex data analysis tasks. I will show case studies using data from cyber/homeland security, large-scale scientific simulations, medicine, and sociological studies.
Big Data Visualization Meetup - South Bay
http://www.meetup.com/Big-Data-Visualisation-South-Bay/
6. Large Scientific Data Visualization
As we move to Exascale, it's no longer feasible to store most of the data for post processing!
We must do:
• In situ visualization
• Parallel visualization that is highly scalable
• In situ data reduction and triage
• In situ data processing for interactive data exploration and analysis
12. Network Simplification/Characterization
Using centrality sensitivity
• Friendster social network: links between cluster centers exhibit negative sensitivity (red)
• Astrophysics co-author network: one competitive network (red) and one collaborative network (blue)
TVCG 18(1) 2012
13. The Graph Layout Problem
• The cost of displaying a graph
• The hairball problem of large graph layouts
– Large, dense graphs become a mess
– Inefficient use of space
– Details cluttered
• Solutions
– Filtering
– Clustering
– Abstraction
– Focus+context
Figure: California data (6,107 nodes, 15,160 edges), high dimensional embedding method
14. A Fast Graph Layout Method
• Hierarchically cluster the nodes (if no clustering given)
• Traverse the hierarchy to order the nodes
• Place the nodes in that order along a space filling curve
Figure: Hilbert curves of order 1, 2, 3, 4, 5, and 11
TVCG 14(6) 2008
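The three steps on this slide can be illustrated with a minimal Python sketch. It is not the implementation from the cited paper: the clustering hierarchy is assumed to be given as nested lists, and the names `hilbert_d2xy`, `leaf_order`, and `curve_layout` are hypothetical helpers introduced here. The index-to-coordinate conversion is the standard iterative Hilbert curve mapping, and the nodes are simply spread evenly along the curve.

```python
def hilbert_d2xy(side, d):
    """Map distance d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two; 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves connect end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def leaf_order(tree):
    """Depth-first traversal of a nested-list hierarchy; returns leaves in order."""
    if not isinstance(tree, (list, tuple)):
        return [tree]
    ordered = []
    for child in tree:
        ordered.extend(leaf_order(child))
    return ordered


def curve_layout(tree, order=3):
    """Place the leaves of the hierarchy evenly along a Hilbert curve.

    order is the curve order (grid side is 2**order). Assumption: node
    labels are hashable and unique.
    """
    nodes = leaf_order(tree)
    side = 2 ** order
    cells = side * side
    if len(nodes) > cells:
        raise ValueError("curve order too small for the number of nodes")
    return {
        node: hilbert_d2xy(side, i * cells // len(nodes))
        for i, node in enumerate(nodes)
    }


if __name__ == "__main__":
    # Hypothetical two-level clustering of five nodes.
    hierarchy = [["a", "b"], ["c", ["d", "e"]]]
    for node, xy in curve_layout(hierarchy, order=2).items():
        print(node, xy)
```

Because nodes that are close in the hierarchy are close in the traversal order, and points that are close along the curve are close in the plane, clusters tend to end up in compact regions. Once the ordering is known, placement is a single pass over the nodes.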
15. Fast Graph Layout
A graph with 6,107 nodes and 15,160 edges
• Space filling curve: Hilbert
• Space filling curve: Gosper
• Treemap
• High dimensional embedding: 0.19s
• LinLog (force directed): 10,737s
One-time clustering: 0.5 seconds
Layout + rendering: 0.0005 seconds
16. Fast Graph Layout
Internet connectivity: 41,928 nodes, 218,080 edges
• Space filling curve: Hilbert
• Space filling curve: Gosper
• Treemap
• FM3: 40.8s
• GRIP: 6.87s
One-time clustering: 18.87 seconds
Layout + rendering: 0.0036 seconds
19. Time-Varying Networks
• Almost all networks found in real-world applications are time-varying
• Both nodes and edges can change
• Visualization methods:
– Animations
– Small multiples visualization
– Difference visualization
– Storyline visualization
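As a small illustration of what a difference visualization has to compute, the sketch below compares two snapshots of an undirected network and reports which nodes and edges appeared or disappeared. It is only an assumed data-preparation step, not a method from the talk: `snapshot_diff` is a hypothetical helper, each snapshot is assumed to be a list of edge pairs with comparable node labels, and isolated nodes are not represented.

```python
def normalize(edges):
    """Treat edges as undirected by storing each pair in sorted order."""
    return {tuple(sorted(edge)) for edge in edges}


def snapshot_diff(old_edges, new_edges):
    """Return the nodes and edges added or removed between two snapshots."""
    old, new = normalize(old_edges), normalize(new_edges)
    old_nodes = {node for edge in old for node in edge}
    new_nodes = {node for edge in new for node in edge}
    return {
        "added_nodes": new_nodes - old_nodes,
        "removed_nodes": old_nodes - new_nodes,
        "added_edges": new - old,
        "removed_edges": old - new,
    }


if __name__ == "__main__":
    # Two made-up snapshots of a tiny network.
    t0 = [("a", "b"), ("b", "c")]
    t1 = [("b", "a"), ("c", "d")]
    for key, value in snapshot_diff(t0, t1).items():
        print(key, sorted(value))
```

A renderer could then draw added elements in one color and removed elements in another on a shared layout.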
21. Storyline Visualization
• Consisting of a series of lines, going from left to right along the time-axis, that converge and diverge in the course of their paths.
• Each line represents a unique entity (character) in the data.
• The starting & ending points of each line represent the lifespan of the corresponding entity.
• Lines are bundled together during the time period of their interaction.
• Existing algorithms:
1. Rules and heuristics based [Ogawa & Ma 2008]
2. Genetic algorithm [Tanahashi & Ma 2012]
3. Convex quadratic optimization [Liu et al. 2013]
4. Greedy algorithms
26. Enron Scandal Email Data
1230 days, 1264 employees, 495,408 messages, and 3478 email clusters
Video
27. Current Projects
• Dynamic network visualization [Biological science, Internet, social networks]
• Visual recommendations and predictive analysis [Transportation]
• Visual analytics for cyber and airborne intelligence
• Remote and collaborative visualization
• Volume data visualization [Flow simulation, biomedical imaging, NDT]
• Health record visualization
• Visual analysis of driving behaviors and energy use [Transportation]
• Visualization for scientific storytelling
• Massively parallel visualization
• In situ visualization and data reduction
• Visualizing large scale computing [Scientific computing, cloud computing]
• Video visualization [Security]
• Uncertainty visualization
• Visualization interface design