The document discusses information visualization for large-scale data workflows. It describes principles for workflow visualization: visualizations should be latent and modular and should use a consistent visual language. It mentions tools for workflow management, including Azkaban and White Elephant, and visualization tools such as Tableau, RStudio Shiny, GoogleVis, and D3.js. Example visualizations include topic modeling, model selection comparisons, and joint variable distributions.
Information Visualization for Large-Scale Data Workflows
1. Information Visualization for Large-Scale Data Workflows
Michael Conover
Senior Data Scientist, LinkedIn
@vagabondjack
reasonengine.wordpress.com
Wednesday, October 9, 2013
3. Elegant Complexity
Credit:
Pedro Cruz, University of Coimbra
David Crandall, Indiana University
John Nelson, IDV Solutions
4. Intellectual Dividends
Realistic Mental Models
Verification of Assumptions
Shortened Iteration Cycles
Improved Predictive Performance
Product Insights
Clarity of Communication
14. [Figure: two density plots, each with x-axis "Standard Normal" and y-axis "Density"; panels labeled 100,000 and 1,000,000]
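The two panels are labeled 100,000 and 1,000,000. Only those labels survive, so this is an assumption, but the slide appears to compare density estimates at two sample sizes. Below is a minimal Python sketch of that comparison using only the standard library. It draws standard normal samples, builds a histogram density estimate (count divided by n times the bin width), and reports the largest gap from the analytic density. The names `histogram_density` and `max_abs_error` are hypothetical, and the bin range, bin count, and seed are arbitrary illustrative choices.

```python
import math
import random


def normal_pdf(x):
    """Analytic standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def histogram_density(samples, lo=-5.0, hi=5.0, bins=100):
    """Histogram density estimate: count / (n * bin_width) for each bin."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            # min() guards against float rounding pushing s just below hi into bin `bins`.
            counts[min(int((s - lo) / width), bins - 1)] += 1
    n = len(samples)
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, [c / (n * width) for c in counts]


def max_abs_error(n, seed=0):
    """Largest gap between the histogram estimate and the true pdf, over all bins."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    centers, dens = histogram_density(samples)
    return max(abs(d - normal_pdf(c)) for c, d in zip(centers, dens))


if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        print(f"n={n:>9,}  max |estimate - pdf| = {max_abs_error(n):.4f}")
```

Both sizes trace the bell curve. In runs like this, the tenfold larger sample mostly reduces bin-to-bin noise rather than changing the shape. That is one argument for plotting a sample instead of every record.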
15. A Lens On The Joint Distribution
[Figure: scatter plot; axes log(Connections) and log(EndorsementPagerank)]
geom_point()
16. A Lens On The Joint Distribution
[Figure: semi-transparent scatter plot; axes log(Connections) and log(EndorsementPagerank)]
geom_point(alpha=1/5)
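Slides 15 and 16 move from `geom_point()` to `geom_point(alpha=1/5)` to fight overplotting. The sketch below assumes the graphics device composites same-colored points with the standard "over" operator. Under that assumption, k stacked points with opacity α reach a combined opacity of 1 − (1 − α)^k. The helper names are hypothetical. The code computes how many overlapping points it takes before extra points no longer change the rendered pixel.

```python
def stacked_opacity(alpha, k):
    """Opacity of a pixel covered by k identical points, each drawn with opacity
    `alpha` and composited with the standard 'over' operator (an assumption)."""
    return 1.0 - (1.0 - alpha) ** k


def points_to_saturate(alpha, threshold=0.99):
    """Smallest number of overlapping points whose combined opacity reaches
    `threshold`; beyond this, extra points look the same on screen."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    k = 0
    while stacked_opacity(alpha, k) < threshold:
        k += 1
    return k


if __name__ == "__main__":
    alpha = 1 / 5  # matches geom_point(alpha=1/5)
    for k in (1, 2, 5, 10, 20):
        print(k, round(stacked_opacity(alpha, k), 3))
    print("saturates at", points_to_saturate(alpha), "points")
```

With alpha=1/5, about 21 overlapping points already render at 99% opacity. Transparency therefore separates sparse regions from moderate ones but flattens the dense core. Counting points per cell, as `geom_bin2d` does on slide 17, avoids that ceiling.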
17. A Lens On The Joint Distribution
[Figure: 2-D binned counts; axes log(Connections) and log(EndorsementPagerank); fill legend "count" with ticks at 25, 50, 75, 100]
geom_bin2d(bins=35)
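`geom_bin2d` counts the points that fall in each cell of a rectangular grid and maps each count to fill color. Below is a minimal Python sketch of the counting step. It assumes equal-width cells spanning the data range, with the maximum value clamped into the last cell. ggplot2's exact bin edges may differ. The function name `bin2d` and the synthetic data in the usage block are hypothetical.

```python
import math
import random
from collections import Counter


def bin2d(xs, ys, bins=35):
    """Count points per cell of a bins x bins grid of equal-width cells
    spanning the data range (a simplified stand-in for geom_bin2d's stat)."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    # Fall back to width 1.0 when every value is identical (zero-width range).
    x_w = (x_hi - x_lo) / bins or 1.0
    y_w = (y_hi - y_lo) / bins or 1.0
    counts = Counter()
    for x, y in zip(xs, ys):
        # Clamp so the maximum value lands in the last cell, not one past it.
        i = min(int((x - x_lo) / x_w), bins - 1)
        j = min(int((y - y_lo) / y_w), bins - 1)
        counts[(i, j)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical heavy-tailed data standing in for connection counts and a score.
    rng = random.Random(1)
    connections = [rng.paretovariate(1.5) * 10 for _ in range(10_000)]
    scores = [c ** 0.8 * rng.lognormvariate(0, 0.5) for c in connections]
    xs = [math.log(c) for c in connections]
    ys = [math.log(s) for s in scores]
    cells = bin2d(xs, ys, bins=35)
    print(len(cells), "non-empty cells; densest cell holds", max(cells.values()), "points")
```

Taking `math.log` of each value before binning reproduces the log-scaled axes on the slide. Zero counts would need separate handling, since log(0) is undefined.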
18. A Lens On The Joint Distribution
[Figure: scatter plot colored by Class (Negative, Positive); axes log(Connections) and log(EndorsementPagerank)]
geom_point(alpha=1/5, aes(color=label))
19. A Lens On The Joint Distribution
[Figure: density contours colored by Class (Negative, Positive); axes log(Connections) and log(EndorsementPagerank)]
geom_density2d(aes(color=label), bins=10)
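`geom_density2d` draws contour lines of a two-dimensional kernel density estimate. Mapping `color=label` splits the data, so each class gets its own estimate. Below is a minimal Python sketch of the density part. It uses a product of Gaussian kernels with fixed, hand-picked bandwidths, which is an assumption: ggplot2 picks bandwidths automatically by default. Contour extraction, which the slide's `bins=10` most likely controls, is left out. The function names and toy data are hypothetical.

```python
import math
from collections import defaultdict


def gaussian_kde2d(points, hx, hy):
    """Return f(x, y), a 2-D kernel density estimate over `points` using a
    product of Gaussian kernels with fixed bandwidths hx and hy."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * hx * hy)

    def density(x, y):
        total = 0.0
        for px, py in points:
            u = (x - px) / hx
            v = (y - py) / hy
            total += math.exp(-0.5 * (u * u + v * v))
        return norm * total

    return density


def kde_by_class(xs, ys, labels, hx=0.5, hy=0.5):
    """Fit one density per class label, mirroring how aes(color=label)
    splits a layer into one group per label."""
    groups = defaultdict(list)
    for x, y, lab in zip(xs, ys, labels):
        groups[lab].append((x, y))
    return {lab: gaussian_kde2d(pts, hx, hy) for lab, pts in groups.items()}


def density_grid(f, x_lo, x_hi, y_lo, y_hi, n=50):
    """Evaluate f on an n x n grid (rows indexed by y); contour lines would be
    traced through a grid like this."""
    gx = [x_lo + i * (x_hi - x_lo) / (n - 1) for i in range(n)]
    gy = [y_lo + j * (y_hi - y_lo) / (n - 1) for j in range(n)]
    return [[f(x, y) for x in gx] for y in gy]


if __name__ == "__main__":
    # Hypothetical toy data: two clusters labeled like the slide's classes.
    xs = [0.0, 0.4, 0.8, 2.6, 3.0, 3.4]
    ys = [0.2, 0.0, 0.5, 2.8, 3.1, 2.9]
    labels = ["Negative"] * 3 + ["Positive"] * 3
    for lab, f in kde_by_class(xs, ys, labels).items():
        print(lab, round(f(0.4, 0.2), 4), round(f(3.0, 3.0), 4))
```

Because each class is estimated separately, the contours show where each class concentrates, even in regions where the overplotted scatter of slide 18 hides one color under the other.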
20. A Lens On The Joint Distribution
Marginal Histograms
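A marginal histogram shows one variable's distribution on its own, next to the joint view. When the joint view is already binned, each marginal is just the joint counts summed over the other axis. Below is a minimal Python sketch of that step. It assumes the joint counts arrive as a hypothetical `{(i, j): count}` dictionary of grid cells, and the function name is also hypothetical.

```python
from collections import Counter


def marginals_from_joint(joint):
    """Collapse a joint {(i, j): count} grid into the two marginal histograms
    by summing out the other axis."""
    x_marg, y_marg = Counter(), Counter()
    for (i, j), count in joint.items():
        x_marg[i] += count
        y_marg[j] += count
    return x_marg, y_marg


if __name__ == "__main__":
    # Hypothetical 2 x 2 joint counts, e.g. from a 2-D binning step.
    joint = {(0, 0): 2, (0, 1): 1, (1, 1): 3}
    x_marg, y_marg = marginals_from_joint(joint)
    print("x marginal:", dict(sorted(x_marg.items())))
    print("y marginal:", dict(sorted(y_marg.items())))
```

Both marginals always add up to the same total as the joint grid. This makes them a cheap consistency check on a binned plot.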