Krist Wongsuphasawat / @kristw
Data visualization
A quick tour for data science enthusiasts
Data visualization
What is it about?
What is it good for?
How is it related to data science?
Example projects
…
1. What is it about?
“A picture is worth more than a thousand words.”
- someone once said
Data => Picture
Data => Visual display
Helps the audience consume a lot of information rapidly
2. What is it good for?
Example / History
data
location (lat,lon => x,y), quantity of troops (width), direction (color)
time (x), temperature (y)
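The encodings listed above can be read as small mapping functions from data values to visual properties. The following is a minimal sketch, assuming a plain linear scale; the function name `linear_scale` and every domain and range number are hypothetical and not taken from the talk.

```python
def linear_scale(domain, range_):
    """Return a function that maps a value in `domain` linearly onto `range_`."""
    d0, d1 = domain
    r0, r1 = range_

    def scale(value):
        t = (value - d0) / (d1 - d0)  # position of value within the domain, 0..1
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical extents: longitude -> x in pixels, troop count -> line width.
x = linear_scale((20.0, 40.0), (0.0, 800.0))
width = linear_scale((0, 400_000), (0.0, 40.0))
```

With these assumed extents, `x(30.0)` lands in the middle of the canvas and `width(100_000)` gives a quarter of the maximum line width.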
Example / Cholera epidemic
List of deceased patients
Mr. Smith, who lived at 11 Sunny St.
Miss White, who lived at 23 Cloudy Rd.
Mr. Jones, who liv...
John Snow
What is it good for?
Storytelling
Communicate known information
Exploratory data analysis
Explore data to reveal insights
More powerful
Visualization = Visual display + Interaction
3. How is it related to data science?
Turn data into
valuable insights
data product
interesting stories
Process diagram: raw data, data wrangling, exploratory data analysis, in-depth analysis, report results, communication...
Output: insights, products, stories
4. Example projects
4.1 Ballon d’Or
FIFA released voting data
• 3 voters / country
• National team captain
• National team coach
• Journalist (media)
• Each voter selects 3 players for ...
Process diagram (repeated): data wrangling, exploratory data analysis, in-depth analysis, report results, communication...
Output: insights, products, stories
Data Wrangling
• Given data are tables in PDF.
• Extract to CSV.
• Format data to desired format.
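The "format data to desired format" step could look like the sketch below, which reshapes one row per voter into one row per vote. This is only an illustration: the column names (`country`, `role`, `first`, `second`, `third`), the placeholder rows, and the function `to_long_format` are hypothetical, since the talk does not show the layout of the extracted CSV.

```python
import csv
import io

# Hypothetical extract of the PDF tables: one row per voter.
RAW_CSV = """country,role,first,second,third
Country A,captain,Player X,Player Y,Player Z
Country A,coach,Player Y,Player X,Player Z
"""


def to_long_format(text):
    """Reshape one row per voter into one row per (voter, rank, player)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for rank, column in enumerate(("first", "second", "third"), start=1):
            rows.append({
                "country": record["country"],
                "role": record["role"],
                "rank": rank,
                "player": record[column],
            })
    return rows


votes = to_long_format(RAW_CSV)
```

One row per vote is usually easier to group and count (by player, by role, by country) than the wide layout.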
Demo / Ballon d’Or
https://medium.com/@kristw/who-voted-for-who-diving-into-ballon-dor-voting-data-e09138ba9712
4.2 Public-facing vis & New Year 2013
interactive.twitter.com
Geo
Heatmap
Low density
High density
Geo
San Francisco
flickr.com/photos/twitteroffice/8798020541
Low density
High density
Geo
San Francisco
Rebuild the world
based on
tweet volumes
twitter.github.io/interactive/andes/
How are these phrases used in Tweets?
Is there any pattern?
Process diagram (repeated): data wrangling, exploratory data analysis, in-depth analysis, report results, communication...
Output: insights, products, stories
Big data wrangling
Having all Tweets
How people think I feel.
How people think I feel. How I really feel.
Having all Tweets
• Too much data; we want only relevant Tweets
• Tweets that contain “สวัสดีปีใหม่” (“Happy New Year”)
• variations (casual or stretched spellings): หวัดดีปีใหม่, หวัดดีปีหม่ายยย
• typos: ห...
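Filtering for the greeting and its listed variations can be sketched with a regular expression. This is a minimal example, assuming the filter runs on plain tweet text; the names `NEW_YEAR` and `is_relevant` are hypothetical, and typos are left out because they would need looser matching (for example edit distance), which this sketch does not attempt.

```python
import re

# Greeting ("สวัสดี" or the casual "หวัดดี"), then "ปี", then either "ใหม่"
# or the stretched spelling "หม่าย" with its last character repeated.
NEW_YEAR = re.compile(r"(?:สวัสดี|หวัดดี)ปี(?:ใหม่|หม่าย+)")


def is_relevant(tweet_text):
    """Return True if the text contains one of the greeting variants."""
    return NEW_YEAR.search(tweet_text) is not None
```

On the cluster the same predicate would be expressed in whatever tool runs there; the pattern is the part that carries over.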
Workflow
• Data Storage: Hadoop Cluster
• Tool: Pig / Hive / Scalding (slow)
• Smaller dataset on your laptop
• Tool: node.js / python / etc. (fast)
• Final dataset
Exploratory Data Analysis
Improve design
for releasing to public
Demo / New Year 2013
twitter.github.io/interactive/newyear2014/
Another fun fact:
Developed using 2012 data, then updated with new data on Jan 2, 2013
4.3 Data Analysis Tool
Process diagram (repeated): data wrangling, exploratory data analysis, in-depth analysis, report results, communication...
Output: insights, products, stories
Logging user activities
Users use Twitter.
Product Managers are curious.
Engineers write Twitter and instrument it.
Log data in Hadoop
What is being logged?
Activities:
• tweet from home timeline on twitter.com
• tweet from search page on iPhone
• sign up
• log in
• retweet
• etc.
Organize?
log event a.k.a. “client event” [Lee et al. 2012]
client : page : section : component : element : action
web : home : timeline : tweet_box :...
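The six-level naming scheme lends itself to a small parser. This is a minimal sketch, assuming names are separated by colons with optional spaces, as on the slide; the `ClientEvent` type and `parse_event_name` function are hypothetical helpers, not part of the logging system described in the talk.

```python
from collections import namedtuple

ClientEvent = namedtuple(
    "ClientEvent",
    ["client", "page", "section", "component", "element", "action"],
)


def parse_event_name(name):
    """Split a six-level client event name on ':' and trim the spaces."""
    parts = [part.strip() for part in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 levels, got {len(parts)}: {name!r}")
    return ClientEvent(*parts)


event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
```

Having named fields makes it straightforward to group or filter events by any level, such as all events on one page or all events with one action.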
Twitter for Banana
Count page visits
banana : home : - : - : - : impression
home page
User sessions
Figure: four user sessions, each running from start to end. Each lettered box is a client event. Session#1 contains A and B, Session#2 contains A and B, Session#3 contains A and C, Session#4 contains A.
Funnel
home page
profile page
Funnel analysis
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
1 job
home page
profile page...
Funnel analysis
banana : home : - : - : - : impression
banana : profile : - : - : - : impression banana : search : - : - : ...
Funnel analysis
banana : home : - : - : - : impression
banana : profile : - : - : - : impression banana : search : - : - : ...
Goal
banana : home : - : - : - : impression
… ……
1 job => all funnels, visualized
home page
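The goal of getting every funnel from one job, rather than one job per funnel, can be approximated by counting how often each page-to-page step occurs across all sessions. This is a minimal in-memory sketch; the sample sessions and the function name `count_transitions` are hypothetical, and it does not reproduce the Hadoop job used in the talk.

```python
from collections import Counter


def count_transitions(sessions):
    """Count every consecutive (from_page, to_page) step across all sessions."""
    counts = Counter()
    for pages in sessions:
        for current_page, next_page in zip(pages, pages[1:]):
            counts[(current_page, next_page)] += 1
    return counts


# Hypothetical sessions: ordered page impressions per visit.
sessions = [
    ["home", "profile"],
    ["home", "search", "profile"],
    ["home", "profile", "search"],
]
transitions = count_transitions(sessions)
```

The home-to-profile funnel is then `transitions[("home", "profile")]`, and any other two-step funnel can be read from the same counter without another pass over the data.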
User sessions
Figure: the same four sessions, each running from start to end. Session#1 contains A and B, Session#2 contains A and B, Session#3 contains A and C, Session#4 contains A.
Aggregate
Figure (animation frames): the 4 sessions, made of events A, B and C between start and end, are merged step by step into a single diagram in which repeated events are drawn once. The final frame shows the same aggregation for 4,000,000 sessions.
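One way to picture the aggregation is as a prefix tree in which sessions sharing the same leading events share a branch, with a count on each node. This is a sketch under that assumption; the talk shows the merged picture but does not spell out the data structure, and the event order inside each sample session below is hypothetical.

```python
def aggregate(sessions):
    """Merge event sequences into a nested dict: event -> {"count", "children"}."""
    root = {"count": 0, "children": {}}
    for events in sessions:
        root["count"] += 1
        node = root
        # Every session is wrapped in explicit start and end markers.
        for event in ["start", *events, "end"]:
            child = node["children"].setdefault(
                event, {"count": 0, "children": {}}
            )
            child["count"] += 1
            node = child
    return root


# Hypothetical event order inside each of the four sessions.
tree = aggregate([["A", "B"], ["A", "B"], ["A", "C"], ["A"]])
a = tree["children"]["start"]["children"]["A"]
```

Because each session only walks one path and increments counters, the same function works unchanged whether it is fed 4 sessions or several million, although at that scale the work would be distributed rather than run in one process.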
Demo / Flying Sessions
Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging
Infrastruc...
Data visualization
What is it about?
Data => Visual display + Interaction
What is it good for?
Exploratory data analysis & ...
Thank you
Questions?
Data Visualization: A Quick Tour for Data Science Enthusiasts

This was a talk given at Chulalongkorn University in Bangkok, Thailand on January 16, 2015 as a part of CodeMania X2 "Data Science 101"


  1. Krist Wongsuphasawat / @kristw. Data visualization: A quick tour for data science enthusiasts
  2. Data visualization: What is it about? What is it good for? How is it related to data science? Example projects …
  3. 1. What is it about?
  4. “A picture is worth more than a thousand words.” - someone once said
  5. Data => Picture
  6. Data => Visual display
  7. Data => Visual display: helps the audience consume a lot of information rapidly
  8. 2. What is it good for?
  9. Example / History
  10. data
  11. location (lat,lon => x,y), quantity of troops (width), direction (color) time (x), temperature (y)
  12. Example / Cholera epidemic
  13. List of deceased patients ! Mr. Smith, who lived at 11 Sunny St. Miss White, who lived at 23 Cloudy Rd. Mr. Jones, who lived at 30 Rainy St. Mrs. Robinson, who lived at 34 Windy Rd. … data
  14. John Snow
  15. What is it good for? Storytelling Communicate known information Exploratory data analysis Explore data to reveal insights
  16. More powerful Visualization = Visual display + Interaction
  17. 3. How is it related to data science?
  18. Turn data into valuable insights data product interesting stories
  19. data wrangling output insights, products, stories exploratory data analysis report results raw data in-depth analysis
  20. data wrangling output insights, products, stories exploratory data analysis report results in-depth analysis communication, storytelling raw data
  21. 4. Example projects
  22. 4.1 Ballon d’Or
  23. FIFA released voting data
  24. Rules: • 3 voters / country • National team captain • National team coach • Journalist (media) • Each voter selects 3 players for 1st, 2nd and 3rd place
  25. data wrangling output insights, products, stories exploratory data analysis report results in-depth analysis communication, storytelling raw data
  26. Data Wrangling: • Given data are tables in PDF. • Extract to CSV. • Format data to desired format.
  27. Demo / Ballon d’Or https://medium.com/@kristw/who-voted-for-who-diving-into-ballon-dor-voting-data-e09138ba9712
  28. 4.2 Public-facing vis & New year 2013
  29. interactive.twitter.com
  30. Geo Heatmap Low density High density
  31. Geo San Francisco flickr.com/photos/twitteroffice/8798020541 Low density High density
  32. Geo San Francisco Rebuild the world based on tweet volumes twitter.github.io/interactive/andes/
  33. How are these phrases used in Tweets? Is there any pattern?
  34. data wrangling output insights, products, stories exploratory data analysis report results in-depth analysis communication, storytelling raw data
  35. Big data wrangling
  36. Having all Tweets How people think I feel.
  37. How people think I feel. How I really feel. Having all Tweets
  38. Challenges: • Too much data; we want only relevant Tweets • Tweets that contain “สวัสดีปีใหม่” (“Happy New Year”) • variations: หวัดดีปีใหม่, หวัดดีปีหม่ายยย • typos: หวัดตีปีใหม่ • Need to aggregate & reduce size • Long processing time (hours)
  39. Hadoop Cluster Data Storage Workflow
  40. Hadoop Cluster Pig / Hive / Scalding (slow) Data Storage Tool Workflow
  41. Hadoop Cluster Pig / Hive / Scalding (slow) Data Storage Tool Workflow
  42. Workflow: Data Storage: Hadoop Cluster. Tool: Pig / Hive / Scalding (slow). Smaller dataset on your laptop.
  43. Workflow: Data Storage: Hadoop Cluster. Tool: Pig / Hive / Scalding (slow). Smaller dataset on your laptop. Tool: node.js / python / etc. (fast). Final dataset.
  44. Exploratory Data Analysis
  45. Improve design for releasing to public
  46. Demo / New Year 2013 twitter.github.io/interactive/newyear2014/
  47. Another fun fact: Developed using 2012 data Then update data on Jan 2, 2013
  48. 4.3 Data Analysis Tool
  49. data wrangling output insights, products, stories exploratory data analysis report results in-depth analysis communication, storytelling raw data
  50. Logging user activities
  51. Users use Twitter.
  52. Users use Twitter. Product Managers are curious.
  53. Users use Twitter. Product Managers are curious. Engineers write Twitter and instrument it. Log data in Hadoop
  54. What is being logged? Activities: tweet
  55. What is being logged? Activities: tweet from home timeline on twitter.com, tweet from search page on iPhone
  56. What is being logged? Activities: tweet from home timeline on twitter.com, tweet from search page on iPhone, sign up, log in, retweet, etc.
  57. Organize?
  58. log event a.k.a. “client event” [Lee et al. 2012]
  59. log event a.k.a. “client event” client : page : section : component : element : action web : home : timeline : tweet_box : button : tweet 1) User ID 2) Timestamp 3) Event name 4) Event detail [Lee et al. 2012]
  60. Twitter for Banana
  61. Count page visits banana : home : - : - : - : impression home page
  62. User sessions (each lettered box is a client event; each session runs from start to end): Session#1 contains A and B, Session#2 contains A and B, Session#3 contains A and C, Session#4 contains A
  63. Funnel home page profile page
  64. Funnel analysis: banana : home : - : - : - : impression (home page), banana : profile : - : - : - : impression (profile page). 1 job, 1 hour
  65. Funnel analysis banana : home : - : - : - : impression banana : profile : - : - : - : impression banana : search : - : - : - : impression home page profile page search page 2 jobs 2 hours
  66. Funnel analysis banana : home : - : - : - : impression banana : profile : - : - : - : impression banana : search : - : - : - : impression home page profile page search page Specify all funnels manually! n jobs n hours
  67. Goal banana : home : - : - : - : impression … …… 1 job => all funnels, visualized home page
  68. User sessions Session#1 A B start end Session#4 start end A Session#2 B start end A Session#3 C start end A
  69. Aggregate 4 sessions A BB C start end endend A A end A
  70. Aggregate A BB C start end endend end 4 sessions
  71. Aggregate C start end endend end A B 4 sessions
  72. Aggregate C start end endend end A B 4 sessions
  73. Aggregate C start end endend A B end 4 sessions
  74. Aggregate C start endend A B end 4 sessions
  75. Aggregate C start endend A B end 4 sessions
  76. Aggregate start endend A CB end 4 sessions
  77. Aggregate 4,000,000 sessions endend A CB end start
  78. Demo / Flying Sessions Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter by Krist Wongsuphasawat and Jimmy Lin. in Proc. IEEE Conference on Visual Analytics Science and Technology (VAST), Paris, France, 13 November, 2014
  79. Data visualization. What is it about? Data => Visual display + Interaction. What is it good for? Exploratory data analysis & storytelling. How is it related to data science? It is one of the skills often utilized in the process. Example projects: interactive.twitter.com. @kristw / kristw.yellowpigz.com
  80. Thank you
  81. Questions?
