In this talk, I reflect on the tasks commonly involved in crafting visualizations and show examples of different applications of information/data visualization. Along the way, I share my workflow, point out common pitfalls, and provide recommendations.
These slides are from my guest lecture in the InfoVis class at the UC Berkeley iSchool on Apr 11, 2016. Thank you, Prof. Marti Hearst, for the invitation.
1. WHAT TO EXPECT WHEN YOU ARE VISUALIZING
Krist Wongsuphasawat / @kristw
Based on true stories
Forever querying
Never-ending cleaning
Hopelessly prototyping
Last-minute coding
and many more…
2. Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Information Visualization
Univ. of Maryland
IBM
Microsoft
Data Visualization Scientist
Twitter
19. DATA SOURCES
Open data
Publicly available
Internal data
Private, owned by the client’s organization
Self-collected data
Manual collection, site scraping, etc.
Combinations of the above
20. MANY FORMS OF DATA
Standalone files
txt, csv, tsv, json, Google Docs, …, pdf*
APIs
better quality with more overhead
Databases
doesn’t necessarily mean the data are organized
Big data
bigger pain
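For the standalone-file case, a minimal Python sketch might parse CSV, TSV, and JSON into the same list-of-dicts shape so that later steps do not depend on the source format. The helper name `load_records` is hypothetical, and it takes already-read text rather than a file path to keep the example self-contained; PDFs and Google Docs would need a separate export or extraction step that is not shown.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Parse CSV, TSV, or JSON text into a list of dicts (hypothetical helper)."""
    fmt = fmt.lower()
    if fmt in ("csv", "tsv"):
        delimiter = "," if fmt == "csv" else "\t"
        reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
        return [dict(row) for row in reader]
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single JSON object or an array of objects.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format: {fmt}")


csv_rows = load_records("country,count\nTHA,3\nUSA,5\n", "csv")
tsv_rows = load_records("country\tcount\nTHA\t3\n", "tsv")
json_rows = load_records('[{"country": "THA", "count": 3}]', "json")
```

CSV and TSV values come back as strings (`"3"`) while JSON keeps numbers (`3`), which is one reason type normalization shows up again during data wrangling.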
23. CHALLENGES
Get relevant Tweets
hashtag: #oscars
keywords: “spotlight” (movie name)
Too big
Need to aggregate & reduce size
Slow
Long processing time (hours)
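The "too big" and "slow" challenges are usually tackled by filtering early and aggregating before anything reaches the visualization. The sketch below assumes each Tweet is a dict with a `text` field and an ISO 8601 `created_at` field (an assumption, not the actual Twitter data format), uses made-up sample Tweets, and reduces the matching ones to counts per hour. At the scale described on the slide, the same filter-then-count logic would run as a distributed batch job; only the core logic is shown here.

```python
from collections import Counter
from datetime import datetime

# Search terms taken from the slide's examples.
TERMS = ("#oscars", "spotlight")


def is_relevant(text):
    """Case-insensitive substring match (note: also matches e.g. '#oscars2016')."""
    lowered = text.lower()
    return any(term in lowered for term in TERMS)


def hourly_counts(tweets):
    """Keep relevant Tweets and reduce them to one count per hour."""
    counts = Counter()
    for tweet in tweets:
        if is_relevant(tweet["text"]):
            ts = datetime.fromisoformat(tweet["created_at"])
            hour = ts.replace(minute=0, second=0, microsecond=0)
            counts[hour.isoformat()] += 1
    return counts


# Made-up sample Tweets for illustration.
sample = [
    {"text": "Congrats Spotlight! #Oscars", "created_at": "2016-02-29T04:05:00"},
    {"text": "Best picture goes to...", "created_at": "2016-02-29T04:10:00"},
    {"text": "Watching the #oscars tonight", "created_at": "2016-02-29T01:59:30"},
]
per_hour = hourly_counts(sample)
```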
30. DATA WRANGLING
Clean
A clean dataset? Joking, right?
Filter
Less is more
Parse, Format, Correct, etc.
Change country codes from 3-letter to 2-letter
Correct the time of day based on each user’s timezone
etc.
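The two parsing examples on the slide might look like this in Python. The mapping is a hypothetical four-entry subset of ISO 3166-1 alpha-3 to alpha-2 codes (a real pipeline would load the full table), and each user's timezone is assumed to be available as a fixed UTC offset in seconds; zones with daylight saving time would need named IANA zones via `zoneinfo` instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical subset of ISO 3166-1 alpha-3 -> alpha-2 codes; load the full table in practice.
ALPHA3_TO_ALPHA2 = {"THA": "TH", "USA": "US", "GBR": "GB", "JPN": "JP"}


def to_alpha2(code: str) -> Optional[str]:
    """Convert a 3-letter country code to 2 letters, or None if it is unknown."""
    return ALPHA3_TO_ALPHA2.get(code.strip().upper())


def local_hour(utc_iso: str, utc_offset_seconds: int) -> int:
    """Treat a naive timestamp as UTC, shift it by the user's fixed offset, return the local hour."""
    utc_time = datetime.fromisoformat(utc_iso).replace(tzinfo=timezone.utc)
    user_tz = timezone(timedelta(seconds=utc_offset_seconds))
    return utc_time.astimezone(user_tz).hour


# A Tweet posted at 04:05 UTC was posted at 11:05 local time in Bangkok (UTC+7).
bangkok_hour = local_hour("2016-02-29T04:05:00", 7 * 3600)
```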
31. EXPECT A LOT OF TIME WITH DATA WRANGLING
70–80% of the time
“Data Janitor”
32. RECOMMENDATIONS
Always assume you will have to do it again
document the process, automate it
Reusable scripts
break a gigantic do-it-all function into smaller ones
Reusable data
keep it for future projects
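One way to follow these recommendations is to split the pipeline into small, single-purpose functions and to cache intermediate results on disk so the next project can reuse them. The sketch below is illustrative only: the step names, the `cache` directory, and the JSON cache format are assumptions.

```python
import json
from pathlib import Path

# Hypothetical default location for reusable intermediate data.
CACHE_DIR = Path("cache")


def drop_incomplete(rows):
    """Remove rows that have any missing or empty value."""
    return [r for r in rows if all(v not in (None, "") for v in r.values())]


def lowercase_strings(rows):
    """Lowercase string fields so later steps compare values consistently."""
    return [
        {k: v.lower() if isinstance(v, str) else v for k, v in r.items()}
        for r in rows
    ]


def pipeline(rows):
    """Compose small, testable steps instead of one do-it-all function."""
    return lowercase_strings(drop_incomplete(rows))


def cached(name, compute, cache_dir=CACHE_DIR):
    """Load `name` from the JSON cache if present; otherwise compute, save, and return it."""
    path = Path(cache_dir) / f"{name}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

A call such as `cached("oscars_clean", lambda: pipeline(raw_rows))` (with `raw_rows` standing in for whatever the loading step produced) recomputes only on the first run and reads the saved file afterwards.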
35. TYPE OF PROJECTS
Explanatory / Exploratory
Storytelling: General Public, See what data can tell us
Analytics Tools: PMs, Data Scientists, Understand product usage
Inspirations: General Public, Get inspired
39. STORYTELLING : WHAT TO EXPECT
timely
Deadlines are strict, and projects can be triggered by unexpected events.
wide audience
easy to explain and understand, multi-device support
one-off projects
content screening
98. ANALYTICS TOOLS : WHAT TO EXPECT
richer, more features
to support exploration of complex data
more technical audience
product managers, engineers, data scientists
accuracy
designed for dynamic input
long-term projects
124–130. CLIENT EVENTS
Event name: client : page : section : component : element : action
Log data in Hadoop, aggregated into Client events count
Engineers & Data Scientists search, query, and find events; results are returned, e.g., for web home * * impression*:
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Problems
search can be better
10,000+ event types
not everybody knows the hierarchy (What are all sections under web:home?)
one graph / event, x 10,000
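The search-and-aggregate flow above can be sketched in memory with Python's `collections.Counter` and `fnmatch` wildcards. This is not the Hadoop-based system from the talk; the log lines are made up, and `parse_event` and `search` are hypothetical helpers that only illustrate the six-level naming and a wildcard query.

```python
from collections import Counter
from fnmatch import fnmatchcase

LEVELS = ("client", "page", "section", "component", "element", "action")


def parse_event(name):
    """Split 'web : home : wtf : - : - : impression' into its six named levels."""
    parts = [p.strip() for p in name.split(":")]
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} levels, got {name!r}")
    return dict(zip(LEVELS, parts))


def search(counts, pattern):
    """Return events whose names match a shell-style wildcard pattern.

    Note: '*' also matches ':' here, so this is a loose filter.
    """
    return {name: n for name, n in counts.items() if fnmatchcase(name, pattern)}


# Made-up log lines standing in for the raw client-event logs.
log_lines = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:home:-:-:impression",
    "iphone:home:home:-:-:impression",
]
counts = Counter(log_lines)  # aggregate: one count per event name
web_home = search(counts, "web:home:*:*:*:impression")
```

A flat wildcard search like this still assumes the user already knows which names exist, which is exactly the "not everybody knows" problem listed above.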
136–137. HOW TO VISUALIZE?
Client event collection (Engineers & Data Scientists)
client : page : section : component : element : action
See
narrow down
Interactions
search box => filter
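The "narrow down" and "search box => filter" ideas can be sketched as a nested tree of event-name levels plus a simple text filter. This is a minimal illustration, not the tool from the talk: the event names are made up, and `build_tree`, `children`, and `filter_events` are hypothetical helpers.

```python
from collections import defaultdict


def make_node():
    """A tree node whose missing children are created on first access."""
    return defaultdict(make_node)


def build_tree(event_names):
    """Nest 'client:page:section:...' names level by level."""
    root = make_node()
    for name in event_names:
        current = root
        for part in name.split(":"):
            current = current[part.strip()]
    return root


def children(tree, path):
    """List the next level below `path`, e.g. all sections under web:home."""
    current = tree
    for part in path:
        if part not in current:  # avoid creating empty nodes while reading
            return []
        current = current[part]
    return sorted(current)


def filter_events(event_names, query):
    """Search-box filter: keep names that contain every whitespace-separated term."""
    terms = query.lower().split()
    return [n for n in event_names if all(t in n.lower() for t in terms)]


# Made-up event names for illustration.
events = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:profile:tweets:-:-:impression",
    "iphone:home:home:-:-:click",
]
tree = build_tree(events)
sections = children(tree, ["web", "home"])  # answers "What are all sections under web:home?"
```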
166. Read the details in:
Krist Wongsuphasawat and Jimmy Lin.
“Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter.”
Proc. IEEE Conference on Visual Analytics Science and Technology (VAST), 2014.
HOW TO MAKE IT WORK?
168. WORKFLOW
Requested / Identify needs
Design & Prototype
Make it work for a sample dataset
Refine & Generalize
Productionize
Document & Release
Maintain & Support
Keep it running, feature requests & bug fixes
184. TAKE-AWAY
Getting data and data wrangling are time-consuming.
Different projects, different requirements
Storytelling, Product insights, Art, etc.
Combine visualization with other skills
HCI, Design, Stats, ML, etc.
Expect the unexpected
Learn and improve
do more with less time
grow the team, expand skills, improve tooling
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
185. ACKNOWLEDGEMENT
Nicolas Garcia Belmonte, Robert Harris, Miguel Rios,
Simon Rogers, Jimmy Lin, Linus Lee, Chuang Liu,
and many colleagues at Twitter.
Lastly, thanks to my wife for taking care of our 3-month-old baby so that I had time to prepare these slides.