Visualizing database performance hotsos 13-v2

Visualizing Database
Performance with R

Gwen Shapira, Senior Consultant
February, 2013

About Me
– Oracle ACE Director
– Member of Oak Table
– 14 years of IT

– Performance Tuning
– Troubleshooting
– Hadoop

– Presents, Blogs, Tweets
– @gwenshap

2 © 2013 Pythian

About Pythian
• Recognized Leader:
– Global industry-leader in remote database administration services and
consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server
– Work with over 250 multinational companies such as Forbes.com, Fox
Sports, Nordion and Western Union to help manage their complex IT
deployments
• Expertise:
– Pythian’s data experts are the elite in their field. We have the highest
concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2
Microsoft MVPs.
– Pythian holds 7 Specializations under Oracle Platinum Partner
program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
• Global Reach & Scalability:
– Around the clock global remote support for DBA and consulting, systems
administration, special projects or emergency response


Will Talk About:
• Data pre-processing tools
• Visualization tools and techniques
• How to make great looking charts
• What makes visuals effective
• How to avoid visualization mistakes

Will NOT Talk About:
• How to collect performance data
• Cool ASH queries
• How to program in R
• Statistics
• Machine Learning
• What the data actually means
• How to explain the results to your boss

Why Visualize?
• Yet another analysis tool
• But more fun
• Highly effective

• Communications tool, too
• But not at the same time


R Studio


Getting Data In Shape


Use the DB, Luke
Aggregate

Scale

Filter


Getting DB Data to R
library(RJDBC)

# load the Oracle JDBC driver from its jar file
drv <- JDBC("oracle.jdbc.driver.OracleDriver",
            "/Users/grahn/code/jdbc/ojdbc6.jar")

conn <- dbConnect(drv,
                  "jdbc:oracle:thin:@zulu.us.oracle.com:1521:orcl",
                  "grahn", "grahn")

# import the data into a data.frame
lfs <- dbGetQuery(conn,
  "select SAMPLE_ID, TIME_WAITED
   from ashdump
   where EVENT='log file sync'
   order by SAMPLE_ID")


With R
"NAME","SNAP_TIME","BYTES"
"free memory",12-03-09 00:00:00,645935368
"KGH: NO ACCESS",12-03-09 00:00:00,325214880
"db_block_hash_buckets",12-03-09 00:00:00,186650624
"free memory",12-03-09 00:00:00,134211304
"shared_io_pool",12-03-09 00:00:00,536870912
"log_buffer",12-03-09 00:00:00,16924672
"buffer_cache",12-03-09 00:00:00,21676163072
"fixed_sga",12-03-09 00:00:00,2238472
"JOXLE",12-03-10 04:00:01,27349056
"free memory",12-03-10 04:00:01,105800192
"free memory",12-03-10 04:00:01,192741376
"PX msg pool",12-03-10 04:00:01,8192000


Reshape
shared_pool <- read.csv("~/shapira/shared_pool.csv")
install.packages("reshape")
library(reshape)
# one row per SNAP_TIME, one column per NAME, keeping the max BYTES
max_shared_pool <-
  cast(shared_pool, SNAP_TIME ~ NAME, max)

Time               free memory  log_buffer  buffer_cache
12-03-09 00:00:00  645935368    16924672    21676163072
12-03-09 04:00:00  192741376
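
The same long-to-wide step can be sketched in base R, for cases where installing reshape is not an option. This is an illustration that swaps tapply() in for cast(), not the method used on the slide; the rows are a few of the sample CSV rows from the earlier slide, inlined as text (with the date column assumed to be plain text).

```r
# A few rows from the sample CSV, inlined so the sketch is self-contained
csv_text <- 'NAME,SNAP_TIME,BYTES
free memory,12-03-09 00:00:00,645935368
free memory,12-03-09 00:00:00,134211304
log_buffer,12-03-09 00:00:00,16924672
free memory,12-03-10 04:00:01,105800192
free memory,12-03-10 04:00:01,192741376'

shared_pool <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# One row per SNAP_TIME, one column per NAME, each cell holding the max BYTES.
# Combinations that never occur come back as NA.
wide <- tapply(shared_pool$BYTES,
               list(shared_pool$SNAP_TIME, shared_pool$NAME),
               max)
print(wide)
```

The result is a matrix rather than a data frame, which is usually enough for a quick look or a boxplot.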


With R

out of scale


Select Subset of data
max_shared_pool <-
  subset(max_shared_pool, select = -c(buffer_cache))

# set the margins first so the long pool names fit on the y axis
par(mar=c(4,6,2,1))
boxplot(
  (max_shared_pool)/1024/1024,
  xlab="Size in MBytes",
  horizontal=TRUE,
  las=1
)

With R


More Subsets
SAMPLE_ID  TIME_WAITED  WAIT_CLASS   EVENT
10526629   14929        User I/O     cell single block physical read
10526629   5015         User I/O     cell single block physical read
10465699   21572        Concurrency  library cache: mutex X
10465699   65938        Concurrency  library cache: mutex X


Filtering Data
new <- subset (old, row filter, column filter)

phys_io <- subset(ash,
                  WAIT_CLASS == "User I/O",
                  select = -c(EVENT))

SAMPLE_ID  TIME_WAITED  WAIT_CLASS
10526629   14929        User I/O
10526629   5015         User I/O
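
A minimal sketch of combining a row filter with a column filter in one subset() call. The data frame is built by hand from the four sample rows shown on the slides; the 10000 threshold and the `slow_io` name are assumptions made for illustration.

```r
# Hand-built sample, using the rows shown on the slides
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938),
  WAIT_CLASS  = c("User I/O", "User I/O", "Concurrency", "Concurrency"),
  EVENT       = c("cell single block physical read",
                  "cell single block physical read",
                  "library cache: mutex X",
                  "library cache: mutex X"),
  stringsAsFactors = FALSE
)

# Row conditions can be chained with &, and select can name the columns to keep
slow_io <- subset(ash,
                  WAIT_CLASS == "User I/O" & TIME_WAITED > 10000,
                  select = c(SAMPLE_ID, TIME_WAITED))
print(slow_io)
```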


Another Filtering Syntax
short_waits <- subset(ash, ash$TIME_WAITED < 10000)

short_waits <- ash[ash$TIME_WAITED < 10000,]

Not a typo! The trailing comma selects all columns.

SAMPLE_ID  TIME_WAITED  WAIT_CLASS  EVENT
10526629   5015         User I/O    cell single block physical read
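
The two filtering syntaxes are not fully interchangeable once the data contains missing values. The sketch below uses a made-up three-row data frame (`ash_na` is a hypothetical name) to show the difference.

```r
# Made-up sample with one missing wait time
ash_na <- data.frame(
  SAMPLE_ID   = c(1, 2, 3),
  TIME_WAITED = c(5015, NA, 65938)
)

# subset() drops rows where the condition evaluates to NA
by_subset <- subset(ash_na, TIME_WAITED < 10000)

# Bracket indexing keeps them, as rows filled with NA
by_bracket <- ash_na[ash_na$TIME_WAITED < 10000, ]

# Wrapping the condition in which() makes the bracket form behave like subset()
by_which <- ash_na[which(ash_na$TIME_WAITED < 10000), ]

nrow(by_subset)   # 1
nrow(by_bracket)  # 2
nrow(by_which)    # 1
```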


Summarize with DDPLY
install.packages("plyr")
library(plyr)

ash2 <- ddply(ash, "SAMPLE_ID", summarise,
              N=length(TIME_WAITED),
              mean=mean(TIME_WAITED),
              max=max(TIME_WAITED))

SAMPLE_ID  N  mean   max
10526629   2  9972   14929
10465699   2  43755  65938
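
For comparison, a base-R sketch of the same per-sample summary using aggregate() instead of ddply(). This is a swapped-in technique, shown on the four sample rows from the slides, and it sorts the groups by SAMPLE_ID.

```r
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938)
)

# aggregate() applies the function per SAMPLE_ID; returning a named vector
# yields one statistic per name
ash2 <- aggregate(TIME_WAITED ~ SAMPLE_ID, data = ash,
                  FUN = function(x) c(N = length(x), mean = mean(x), max = max(x)))

# The statistics come back as a single matrix column; flatten it into plain columns
ash2 <- cbind(ash2["SAMPLE_ID"], as.data.frame(ash2$TIME_WAITED))
print(ash2)
```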


Cheating for DBAs
library(sqldf)

# sqldf runs the query in SQLite by default, so the average is avg(), not mean()
ash2 <- sqldf("select
  SAMPLE_ID, count(*) N,
  avg(TIME_WAITED), max(TIME_WAITED)
  from ash
  where WAIT_CLASS = 'User I/O'
  group by SAMPLE_ID")


When all else fails
Text is text.
Frits Hoogland converts 10046 trace to CSV for R with sed (extended regex syntax):

s/^(WAIT) #([0-9]*): nam='(.*)' ela= *([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) obj#=([0-9-]*) tim=([0-9]*)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/
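
The same expression can be tried from inside R with sub(), which helps when checking the pattern against a few lines before running sed over a whole trace file. The trace line below is made up for illustration and only mimics the general shape of a 10046 WAIT line.

```r
# Hypothetical WAIT line, invented for this sketch
trace_line <- "WAIT #3: nam='db file sequential read' ela= 1234 file#=1 block#=2 blocks=1 obj#=123 tim=1234567890"

# Same pattern as the sed expression, split across strings for readability
pattern <- paste0(
  "^(WAIT) #([0-9]*): nam='(.*)' ela= *([0-9]*) ",
  "[0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) ",
  "obj#=([0-9-]*) tim=([0-9]*)$"
)

# Backreferences need doubled backslashes inside an R string
delimited <- sub(pattern, "\\1|\\2|\\3|\\4|\\5|\\6|\\7|\\8|\\9", trace_line)

# Split on the literal pipe to get the nine fields
fields <- strsplit(delimited, "|", fixed = TRUE)[[1]]
print(fields)
```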


Exploring Data


Directions to Explore
• Shape of data
• Correlations
• Changes over time


The Goal of Analysis is a Story
• Who
• What
• When
• Where
• Why
• Why
• Why
• Why
• Why

Boxplot
• Initial step
• Identify outliers
• Compare groups
• Summarize

Chart callouts: "75% of exports take less than 600m" and "Fail?"


For Example:

WHAT?


How is it done?
ash <- read.csv('~/Downloads/ash1.csv')

boxplot(ash$TIME_WAITED/1000000 ~
ash$WAIT_CLASS,
xlab="Wait Class",
ylab="Time Waited (s)",
cex.axis=1.2)
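
To read the numbers behind a boxplot rather than eyeball them, boxplot() can be called with plot = FALSE. The sketch below uses made-up wait times with one obvious outlier; the data values are assumptions for illustration only.

```r
# Made-up wait times (microseconds) with one extreme User I/O value
ash <- data.frame(
  WAIT_CLASS  = factor(c(rep("User I/O", 5), rep("Concurrency", 3))),
  TIME_WAITED = c(1000, 2000, 3000, 4000, 100000, 20000, 30000, 40000)
)

# plot = FALSE returns the statistics instead of drawing the boxes
b <- boxplot(TIME_WAITED ~ WAIT_CLASS, data = ash, plot = FALSE)

b$names            # group labels, in factor-level order
b$stats            # per group: lower whisker, lower hinge, median, upper hinge, upper whisker
b$out              # values that would be drawn as outlier points
b$names[b$group]   # the group each outlier belongs to
```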


Scatter Plot
• Incredibly versatile
• Use to:
– Show changes over time
– Show correlations
– Highlight trends
– Find model
– Pretty much everything


WHAT?


Log Data


How is it done?
install.packages("ggplot2")
library(ggplot2)
ggplot(ash,
aes(SAMPLE_ID,TIME_WAITED,
color=factor(WAIT_CLASS)))+geom_point();
ggplot(ash,
aes(SAMPLE_ID,log(TIME_WAITED),
color=factor(WAIT_CLASS)))+geom_point();
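
One thing to check before log-transforming wait times is whether any of them are zero, since log(0) is -Inf in R. The values below are made up; adding 1 before taking the log is one common workaround, offered here as an option rather than as what the slides did.

```r
# Made-up wait times in microseconds, including a zero-length wait
time_waited <- c(0, 500, 15000, 1000000)

# The zero becomes -Inf
plain_log <- log(time_waited)

# Shifting by 1 keeps zero finite; base 10 reads as orders of magnitude
shifted_log <- log10(time_waited + 1)

print(plain_log)
print(shifted_log)
```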


Only "Small Waits"

Chart callout: 500us physical IO?


Filtering

small_waits <- ash[ash$TIME_WAITED<15000,]

ggplot(small_waits,
  aes(SAMPLE_ID,TIME_WAITED,
  color=factor(WAIT_CLASS))) + geom_point()


Smoothing


Smoothing
# smoothed trend only
ggplot(ash,
  aes(SAMPLE_ID,TIME_WAITED/1000000,
  color=factor(WAIT_CLASS))) +
  geom_smooth()

# points with the smoothed trend on top
ggplot(ash,
  aes(SAMPLE_ID,TIME_WAITED/1000000,
  color=factor(WAIT_CLASS))) +
  geom_point() +
  geom_smooth()
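
The idea behind smoothing can also be tried without ggplot2. The sketch below uses base R's lowess() on synthetic data; it is a related local-regression smoother and not necessarily the method geom_smooth() picks, and the trend, noise level and `f` span are assumptions chosen for illustration.

```r
set.seed(42)  # reproducible synthetic data
sample_id   <- 1:200
time_waited <- 5 + 0.02 * sample_id + rnorm(200, sd = 1)  # rising trend plus noise

# lowess() returns the smoothed curve as x/y pairs, sorted by x;
# f is the fraction of points used for each local fit
smoothed <- lowess(sample_id, time_waited, f = 0.5)

plot(sample_id, time_waited, col = "grey",
     xlab = "SAMPLE_ID", ylab = "Time Waited (s)")
lines(smoothed, lwd = 2)
```

A larger `f` gives a flatter curve, a smaller one follows the noise more closely.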


Data over Time

Chart callout: 11gR2!


Finding Correlation


Regression (is not Causation)


How?
concurr2 <-
  ddply(concurr,.(SAMPLE_ID), summarise,
    N=length(TIME_WAITED),
    max=max(TIME_WAITED))

ggplot(concurr2,aes(N,max/1000000)) +
  geom_point() +
  geom_smooth(method=lm) +
  xlab("Number of Samples") +
  ylab("Max Time Waited (s)")
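
The line that geom_smooth(method=lm) draws can be inspected directly by fitting the model with lm(). The sketch below uses synthetic data with a known relationship so the fitted coefficients are easy to check; `max_wait` is a hypothetical column name.

```r
# Synthetic data with an exact linear relationship: max_wait = 1 + 2 * N
concurr2 <- data.frame(N = 1:10)
concurr2$max_wait <- 1 + 2 * concurr2$N

# Fit the same kind of straight line the plot overlays
fit <- lm(max_wait ~ N, data = concurr2)

coef(fit)                            # intercept and slope of the fitted line
cor(concurr2$N, concurr2$max_wait)   # strength of the linear association
```

A high correlation here only says the two columns move together; it does not say which one drives the other.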

Heatmap
• Values as "blocks" in a matrix
• Clearer than scatter plot for large amounts of data
• Shows less information
• Performance data made sexy


Heatmap


How?
# group by WAIT_CLASS as well, so it is still there for the filter and the y axis
ash2 <- ddply(ash,.(SAMPLE_ID, WAIT_CLASS),
  summarise,N=length(TIME_WAITED),
  max=max(TIME_WAITED))
ash2 <- ash2[ash2$WAIT_CLASS %in%
  c("Concurrency","User I/O","Other"),]
ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) +
  geom_tile(aes(fill = log(N))) +
  scale_fill_gradient(low = "green", high = "red")
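
A heatmap is a matrix of values drawn as colored tiles, which can be seen by building that matrix explicitly. The base-R sketch below counts samples per SAMPLE_ID and WAIT_CLASS with table() and draws the result with image(); the six rows are made up for illustration.

```r
# Made-up samples: one row per sampled wait
ash <- data.frame(
  SAMPLE_ID  = c(1, 1, 1, 2, 2, 3),
  WAIT_CLASS = c("User I/O", "User I/O", "Concurrency",
                 "User I/O", "Other", "Concurrency")
)

# table() counts rows per (SAMPLE_ID, WAIT_CLASS) cell and fills gaps with 0
counts <- table(ash$SAMPLE_ID, ash$WAIT_CLASS)
print(counts)

# image() draws the matrix as tiles; log1p keeps the empty cells finite
image(x = seq_len(nrow(counts)), y = seq_len(ncol(counts)),
      z = log1p(unclass(counts)), axes = FALSE,
      xlab = "SAMPLE_ID", ylab = "")
axis(1, at = seq_len(nrow(counts)), labels = rownames(counts))
axis(2, at = seq_len(ncol(counts)), labels = colnames(counts), las = 1)
```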


FACT
"Even irrelevant neuroscience information in an explanation of a
psychological phenomenon may interfere with people’s abilities to
critically consider the underlying logic of this explanation."

Numerical quantities focus on
expected values –

graphical summaries on unexpected
values

--John Tukey


Our goal is an interesting presentation.
What is “Interesting”?

• Surprise
• Beauty
• Stories
• Visuals
• Counterintuitive
• Variety


Bad Visualizations Lie
1. Omit important data
2. Distort data
3. Misleading
4. Confusing
5. Fake correlations and Bad models


Eye-API
• Good:
– distances
– locations
– length
– high contrast
• Bad:
– shades
– relative area
– angles


Make it Beautiful – for Geeks
• Contrast
• Reduce noise
• Few colors
• Few fonts
• Lots of Data
• More Signal
• Less Noise


IMPORTant R Libraries
• reshape
• plyr
• ggplot2
• sqldf
• http://blog.revolutionanalytics.com/2013/02/10-r-packages-every-data-scientist-should-know-about.html


Thank you – Q&A
To contact us
sales@pythian.com

1-877-PYTHIAN

To follow us
http://www.pythian.com/blog

http://www.facebook.com/pages/The-Pythian-Group/163902527671

@pythian

http://www.linkedin.com/company/pythian
