4. Will Talk About:
• Data pre-processing tools
• Visualization tools and techniques
• How to make great looking charts
• What makes visuals effective
• How to avoid visualization mistakes
5. Will NOT Talk About:
• How to collect performance data
• Cool ASH queries
• How to program in R
• Statistics
• Machine Learning
• What the data actually means
• How to explain the results to your boss
The goal is to... Structure = trends, repetitions, outliers, etc. High-bandwidth information channel. Apply pattern-matching skills and prior knowledge to the analysis of data.
Just a photo. Add a list of resources at the end. R is my favorite but there are many many others.
3 data preparation techniques
You can also pivot and apply pre-analysis. The goal is, on one hand, to get all the data you are going to need, so you won’t have to move back and forth between the database and R. On the other hand, you want to minimize the amount of data you have to copy over the network. And since we are DB experts and R newbies, most cleanup activities are easier for us in the DB than elsewhere.
Example from Greg Rahn blog post: http://structureddata.org/2011/12/20/visualizing-active-session-history-ash-data-with-r/
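A minimal sketch of the "filter and aggregate in the database first" idea, using Python's built-in sqlite3 module as a stand-in for the real database. The table name, column names, and rows are invented for illustration; they are not taken from the blog post above.

```python
import sqlite3

# Hypothetical stand-in for a table of sampled sessions; all names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_minute TEXT, wait_class TEXT)")
rows = [
    ("10:00", "CPU"), ("10:00", "CPU"), ("10:00", "User I/O"),
    ("10:01", "CPU"), ("10:01", "User I/O"), ("10:01", "User I/O"),
    ("10:01", "Idle"),
]
conn.executemany("INSERT INTO samples VALUES (?, ?)", rows)

# Filter and aggregate inside the database, so only the summary is copied out.
summary = conn.execute(
    """
    SELECT sample_minute, wait_class, COUNT(*) AS sessions
    FROM samples
    WHERE wait_class <> 'Idle'
    GROUP BY sample_minute, wait_class
    ORDER BY sample_minute, wait_class
    """
).fetchall()

for row in summary:
    print(row)
```

Seven raw rows become four summary rows here. On real data the reduction is usually far larger, which is the point of doing the cleanup before the data leaves the database.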
Re-shape makes pivoting easy. Sometimes you don’t know you should filter out data until you have started working on it in R.
I don’t really want the buffer cache data; it’s too large and will distort all my charts.
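The two notes above, pivoting and dropping a series that would swamp the chart, can be sketched outside R as well. This is a plain-Python illustration, not the R reshape approach itself; the `pivot` helper, the metric names, and the values are all hypothetical.

```python
from collections import defaultdict

# Invented long-format rows: (sample_time, metric, value).
long_rows = [
    ("10:00", "cpu", 2), ("10:00", "io", 1), ("10:00", "buffer_cache", 9000),
    ("10:01", "cpu", 1), ("10:01", "io", 2), ("10:01", "buffer_cache", 9100),
]

def pivot(rows, exclude=()):
    """Turn (key, column, value) rows into {key: {column: value}}, skipping excluded columns."""
    wide = defaultdict(dict)
    for key, column, value in rows:
        if column in exclude:
            continue  # drop the series that would dominate the scale
        wide[key][column] = value
    return dict(wide)

# One row per sample time, one column per remaining metric.
wide = pivot(long_rows, exclude={"buffer_cache"})
print(wide)
```

Filtering during the pivot keeps the remaining metrics on a comparable scale, so a single large series does not flatten everything else on the chart.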
Perl is awesome for processing lines of text and can be used to aggregate (with hash maps), filter, etc. So are sed and awk. Also, data that is not from the database sometimes doesn’t look like a table, so you can’t massage it with R easily. Frits Hoogland has a wonderful example of using sed to extract wait information out of a 10046 trace file: http://fritshoogland.wordpress.com/2012/01/18/using-r-and-oracle-tracefiles/
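As a rough sketch of the same line-oriented approach, here is the hash-map aggregation written in Python rather than Perl or sed. The sample lines are invented, and the assumed shape of a WAIT line (`nam='...' ela= <microseconds>`) is a simplification that should be checked against a real trace file before relying on it.

```python
import re
from collections import defaultdict

# Invented sample lines in the assumed shape of a trace file; not real output.
sample_trace = """\
=====================
WAIT #3: nam='db file sequential read' ela= 120 file#=4 block#=131 blocks=1 obj#=74 tim=1000
WAIT #3: nam='db file sequential read' ela= 80 file#=4 block#=132 blocks=1 obj#=74 tim=1200
WAIT #3: nam='SQL*Net message from client' ela= 500 driver id=1 #bytes=1 p3=0 obj#=74 tim=1900
some other line that is not a wait
"""

# Capture the event name and the elapsed time from each WAIT line.
WAIT_LINE = re.compile(r"^WAIT #\d+: nam='([^']+)' ela=\s*(\d+)")

def total_wait_by_event(lines):
    """Sum elapsed time per wait event, ignoring lines that are not WAIT lines."""
    totals = defaultdict(int)
    for line in lines:
        match = WAIT_LINE.match(line)
        if match:
            totals[match.group(1)] += int(match.group(2))
    return dict(totals)

totals = total_wait_by_event(sample_trace.splitlines())
for event, elapsed in sorted(totals.items()):
    print(event, elapsed)
```

The result is a small event-to-total table that loads into R or any charting tool easily, unlike the raw trace.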
Shape of data – distribution, common values, outliers. Charts should be useful, but not necessarily sexy.
You also need at least two solutions, but that’s for later
We can see what look like failed exports (but we don’t know when they failed), that our largest database has a large variance in export times, that most databases have export times far outside the average, and where the 75th percentile falls.
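The numbers behind a chart like this can be sketched with Python's statistics module, assuming the chart is a box-plot-style summary per database. The database names and durations are invented, with zeros standing in for failed exports.

```python
import statistics

# Invented export durations in minutes per database; zeros stand in for failed exports.
export_minutes = {
    "small_db": [12, 14, 13, 0, 15, 13, 14, 12],
    "large_db": [95, 180, 60, 240, 0, 130, 75, 210],
}

def box_summary(values):
    """Return the five numbers a box plot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}

summaries = {name: box_summary(values) for name, values in export_minutes.items()}
for name, summary in summaries.items():
    spread = summary["q3"] - summary["q1"]  # interquartile range
    print(name, summary, "IQR:", spread)
```

The distance between the first and third quartiles is the box's height in the plot, so the larger database's wider spread shows up directly as a taller box.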
We pay attention to what is interesting. And what is interesting is the story, the outliers, the changes, the discoveries. http://headrush.typepad.com/creating_passionate_users/2005/12/but_is_it_inter.html
From Baron Schwartz’s blog: http://www.xaprb.com/blog/2011/01/15/sleep-while-you-can-because-it-wont-last-long/ Showing the number of blog posts on MySQL over time. Clearly we are running out of blog posts. This is extrapolating without a model to explain what you are looking at. Just drawing a line through data is not enough; you need a model.
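A small sketch of why a line through the data is not a model: fit a least-squares line to a declining series, extend it far enough, and it predicts something impossible. The counts below are invented for illustration and are not the data from the blog post.

```python
# Invented monthly post counts with a downward drift.
months = [0, 1, 2, 3, 4, 5]
posts = [100, 92, 85, 75, 68, 60]

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(months, posts)

# Extend the line well beyond the observed range.
forecast = {month: slope * month + intercept for month in (6, 12, 24)}
for month, value in forecast.items():
    print(month, round(value, 1))
```

The line fits the six observed points well, yet by month 24 it forecasts a negative number of posts. Nothing in the fit knows that counts cannot go below zero; only a model of the underlying process does.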