Visualizing Database Performance with R


Gwen Shapira, Senior Consultant
February, 2013
About Me
– Oracle ACE Director
– Member of Oak Table
– 14 years of IT
– Performance Tuning
– Troubleshooting
– Hadoop
– Presents, Blogs, Tweets
– @gwenshap


2          © 2013 Pythian
About Pythian
•   Recognized Leader:
    – Global industry-leader in remote database administration services and
      consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server
    – Work with over 250 multinational companies such as Forbes.com, Fox
      Sports, Nordion and Western Union to help manage their complex IT
      deployments
•   Expertise:
    – Pythian’s data experts are the elite in their field. We have the highest
      concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2
      Microsoft MVPs.
    – Pythian holds 7 Specializations under Oracle Platinum Partner
      program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
•   Global Reach & Scalability:
    – Around the clock global remote support for DBA and consulting, systems
      administration, special projects or emergency response

Will Talk About:
•   Data pre-processing tools
•   Visualization tools and techniques
•   How to make great looking charts
•   What makes visuals effective
•   How to avoid visualization mistakes
Will NOT Talk About:
•   How to collect performance data
•   Cool ASH queries
•   How to program in R
•   Statistics
•   Machine Learning
•   What the data actually means
•   How to explain the results to your boss
Why Visualize?
• Yet another analysis tool
• But more fun
• Highly effective


• Communications tool, too
• But not at the same time




Reveal Structure in Data
Visualization Tools
R Studio




Getting Data In Shape




Use the DB, Luke
• Aggregate
• Scale
• Filter




Getting DB Data to R
library(RJDBC)
drv <- JDBC("oracle.jdbc.driver.OracleDriver",
            "/Users/grahn/code/jdbc/ojdbc6.jar")

# thin-driver URL format: jdbc:oracle:thin:@host:port:SID
conn <- dbConnect(drv,
                  "jdbc:oracle:thin:@zulu.us.oracle.com:1521:orcl",
                  "grahn", "grahn")

# import the data into a data.frame
lfs <- dbGetQuery(conn,
                  "select SAMPLE_ID, TIME_WAITED
                   from ashdump
                   where EVENT='log file sync'
                   order by SAMPLE_ID")

With R
"NAME","SNAP_TIME","BYTES"
"free memory",12-03-09 00:00:00,645935368
"KGH: NO ACCESS",12-03-09 00:00:00,325214880
"db_block_hash_buckets",12-03-09 00:00:00,186650624
"free memory",12-03-09 00:00:00,134211304
"shared_io_pool",12-03-09 00:00:00,536870912
"log_buffer",12-03-09 00:00:00,16924672
"buffer_cache",12-03-09 00:00:00,21676163072
"fixed_sga",12-03-09 00:00:00,2238472
"JOXLE",12-03-10 04:00:01,27349056
"free memory",12-03-10 04:00:01,105800192
"free memory",12-03-10 04:00:01,192741376
"PX msg pool",12-03-10 04:00:01,8192000

Reshape
shared_pool <- read.csv("~/shapira/shared_pool.csv")
install.packages("reshape")
library(reshape)
# one row per SNAP_TIME, one column per NAME; max() resolves duplicate entries
max_shared_pool <-
    cast(shared_pool, SNAP_TIME ~ NAME, max)

     Time                free memory       log_buffer   buffer_cache
     12-03-09 00:00:00   645935368         16924672     21676163072
     12-03-10 04:00:01   192741376
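
The long-to-wide step above can also be sketched in base R, which may help when the reshape package is not installed. This is an illustrative stand-in rather than the method used in the talk: the data frame hand-copies a few rows from the sample CSV, and `tapply` takes the place of `cast`.

```r
# A few rows copied from the sample CSV above (long format: one row per measurement)
shared_pool <- data.frame(
  NAME = c("free memory", "free memory", "log_buffer",
           "free memory", "free memory"),
  SNAP_TIME = c("12-03-09 00:00:00", "12-03-09 00:00:00", "12-03-09 00:00:00",
                "12-03-10 04:00:01", "12-03-10 04:00:01"),
  BYTES = c(645935368, 134211304, 16924672, 105800192, 192741376),
  stringsAsFactors = FALSE
)

# Wide format: one row per SNAP_TIME, one column per NAME.
# max() resolves the duplicate "free memory" rows within a snapshot.
wide <- with(shared_pool, tapply(BYTES, list(SNAP_TIME, NAME), max))
print(wide)
```

Combinations that never occur in the data (here, `log_buffer` in the second snapshot) come back as `NA`, which matches the empty cells in the table above.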




With R

         out of scale




Select Subset of data
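# drop the out-of-scale buffer_cache column and the non-numeric SNAP_TIME column
max_shared_pool <-
subset(max_shared_pool, select = -c(SNAP_TIME, buffer_cache))

# set the plot margins before drawing, not inside the boxplot() call
par(mar=c(4,6,2,1))
boxplot(
  max_shared_pool/1024/1024,
  xlab="Size in MBytes",
  horizontal=TRUE,
  las=1
)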
With R




More Subsets
SAMPLE_ID   TIME_WAITED   WAIT_CLASS    EVENT
10526629    14929         User I/O      cell single block physical read
10526629    5015          User I/O      cell single block physical read
10465699    21572         Concurrency   library cache: mutex X
10465699    65938         Concurrency   library cache: mutex X




Filtering Data
new <- subset (old, row filter, column filter)

phys_io <- subset(ash,
                    WAIT_CLASS == "User I/O",
                    select = -c(EVENT))

SAMPLE_ID             TIME_WAITED              WAIT_CLASS
10526629              14929                    User I/O
10526629              5015                     User I/O
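
To try the filter without a database at hand, here is a self-contained sketch. The `ash` data frame is typed in by hand from the sample rows on the "More Subsets" slide; it is an assumption that the real data has this shape.

```r
# Sample rows copied from the "More Subsets" slide
ash <- data.frame(
  SAMPLE_ID = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938),
  WAIT_CLASS = c("User I/O", "User I/O", "Concurrency", "Concurrency"),
  EVENT = c("cell single block physical read", "cell single block physical read",
            "library cache: mutex X", "library cache: mutex X"),
  stringsAsFactors = FALSE
)

# The second argument filters rows; select= keeps columns,
# or drops them when prefixed with a minus sign
phys_io <- subset(ash, WAIT_CLASS == "User I/O", select = -c(EVENT))
print(phys_io)
```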




Another Filtering Syntax
short_waits <- subset(ash, ash$TIME_WAITED < 10000)

short_waits <- ash[ash$TIME_WAITED < 10000,]


                     Not a Typo!



SAMPLE_ID    TIME_WAITED   WAIT_CLASS   EVENT
10526629     5015          User I/O     cell single block physical read
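
The trailing comma matters because a data frame is indexed as `[rows, columns]`: the condition goes in the row slot, and the empty column slot means "all columns". Without the comma, R would treat the condition as a column selector. A small sketch on hand-typed sample rows (illustrative data, not the talk's data set):

```r
ash <- data.frame(
  SAMPLE_ID = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938),
  WAIT_CLASS = c("User I/O", "User I/O", "Concurrency", "Concurrency")
)

# Rows go before the comma, columns after it; an empty slot means "all"
short_waits <- ash[ash$TIME_WAITED < 10000, ]

# The same row filter, keeping only two columns
short_cols <- ash[ash$TIME_WAITED < 10000, c("SAMPLE_ID", "TIME_WAITED")]

print(short_waits)
print(short_cols)
```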




Summarize with DDPLY
install.packages("plyr")
library(plyr)


ash2 <- ddply(ash, "SAMPLE_ID", summarise,
   N=length(TIME_WAITED),
   mean=mean(TIME_WAITED),
   max=max(TIME_WAITED))
SAMPLE_ID       N                  MEAN     MAX
10526629        2                  9972     14929
10465699        2                  43755    65938
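
For readers without plyr, the same split-summarise-combine idea can be sketched in base R. This is a substitute technique, not what `ddply` does internally, and the `ash` rows are hand-typed from the earlier sample slide.

```r
ash <- data.frame(
  SAMPLE_ID = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938)
)

# Split into one data frame per SAMPLE_ID, summarise each, then stack the results
ash2 <- do.call(rbind, lapply(split(ash, ash$SAMPLE_ID), function(g) {
  data.frame(SAMPLE_ID = g$SAMPLE_ID[1],
             N = nrow(g),
             mean = mean(g$TIME_WAITED),
             max = max(g$TIME_WAITED))
}))
print(ash2)
```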



Cheating for DBAs
library(sqldf)

# SQLite, sqldf's default backend, spells the mean aggregate avg()
ash2 <- sqldf("select
SAMPLE_ID, count(*) N,
avg(TIME_WAITED), max(TIME_WAITED)
from ash
where WAIT_CLASS='User I/O'
group by SAMPLE_ID")


When all else fails
Text is text.
Frits Hoogland converts 10046 traces to CSV for R with
sed (extended regular expressions):


s/^(WAIT) #([0-9]*): nam='(.*)' ela= *([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) obj#=([0-9-]*) tim=([0-9]*)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/
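
To see what the substitution produces, the same pattern can be applied from inside R with `sub()`. This is a sketch under assumptions: the trace line below is made up for illustration, and the replacement uses the `\1` to `\9` back-references that the capture groups in the sed expression imply.

```r
# Hypothetical 10046 WAIT line, invented for illustration
line <- "WAIT #3: nam='db file sequential read' ela= 1234 file#=1 block#=2 blocks=1 obj#=123 tim=1234567890"

# Same pattern as the sed expression; each (...) group becomes one output field
pattern <- paste0(
  "^(WAIT) #([0-9]*): nam='(.*)' ela= *([0-9]*) ",
  "[0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) ",
  "obj#=([0-9-]*) tim=([0-9]*)$"
)
csv_line <- sub(pattern, "\\1|\\2|\\3|\\4|\\5|\\6|\\7|\\8|\\9", line)
print(csv_line)

# Parse the pipe-delimited text into a data frame (columns V1..V9)
waits <- read.table(text = csv_line, sep = "|",
                    stringsAsFactors = FALSE, comment.char = "")
print(waits)
```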



Exploring Data




Directions to Explore
• Shape of data
• Correlations
• Changes over time




The Goal of Analysis is a Story
                         •      Who
                         •      What
                         •      When
                         •      Where
                         •      Why
                         •      Why
                         •      Why
                         •      Why
                         •      Why
Boxplot
•    Initial step
•    Identify outliers
•    Compare groups
•    Summarize

Chart annotations: "75% of exports take less than 600m"; "Fail?"
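
The numbers behind a boxplot can be inspected directly with `boxplot.stats()`. The sketch below uses made-up run times (not the export data from the chart) to show where a statement like "75% take less than X" comes from: the upper edge of the box is roughly the 75th percentile.

```r
# Made-up run times in minutes, for illustration only
run_minutes <- c(120, 180, 240, 300, 360, 420, 480, 540, 600, 2400)

# boxplot.stats() returns the five numbers the box and whiskers are drawn from,
# plus the points flagged as outliers
stats <- boxplot.stats(run_minutes)

# lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# values more than 1.5 box-lengths beyond the hinges
print(stats$out)
```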




For Example:

                                WHAT?




How it's done?
ash <- read.csv('~/Downloads/ash1.csv')

boxplot(ash$TIME_WAITED/1000000 ~
         ash$WAIT_CLASS,
         xlab="Wait Class",
         ylab="Time Waited (s)",
         cex.axis=1.2)



Scatter Plot
• Incredibly versatile
• Use to:
     –   Show changes over time
     –   Show correlations
     –   Highlight trends
     –   Find model
     –   Pretty much everything




WHAT?




Log Data




How it's done?
install.packages("ggplot2")
library(ggplot2)
ggplot(ash,
    aes(SAMPLE_ID,TIME_WAITED,
    color=factor(WAIT_CLASS)))+geom_point();
ggplot(ash,
    aes(SAMPLE_ID,log(TIME_WAITED),
    color=factor(WAIT_CLASS)))+geom_point();
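
Why the log helps: wait times often span several orders of magnitude, so on the raw scale a few huge waits flatten everything else against the axis. A tiny sketch with made-up values (it uses `log10` for readability, whereas the plot above uses the natural `log`):

```r
# Made-up wait times in microseconds, spanning several orders of magnitude
time_waited <- c(200, 500, 800, 1500, 15000, 2000000)

# On the raw scale one huge wait dominates the range
print(range(time_waited))

# On a log scale the small waits spread out and become distinguishable
print(round(log10(time_waited), 2))
```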


Only “Small Waits”




                                500us Physical IO?




Filtering

small_waits <- ash[ash$TIME_WAITED<15000,]

ggplot(small_waits,aes(SAMPLE_ID,TIME_WAITED,
  color=factor(WAIT_CLASS))) + geom_point()




Smoothing




Smoothing
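# smoothed trend only
ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,
  color=factor(WAIT_CLASS))) +
  geom_smooth()

# points with the smoothed trend on top; each "+" must end a line, not start one
ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,
  color=factor(WAIT_CLASS))) +
  geom_point() +
  geom_smooth()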




Data over Time

                 11gR2!




Finding Correlation




Regression (is not Causation)




How?
concurr2 <-
ddply(concurr,.(SAMPLE_ID), summarise,
  N=length(TIME_WAITED),
  mean=mean(TIME_WAITED),
  max=max(TIME_WAITED));

ggplot(concurr2,aes(N,max/1000000)) +
  geom_point() +
  geom_smooth(method=lm) +
  xlab("Number of Samples") +
  ylab("Max Time Waited (s)")
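
The line drawn by `geom_smooth(method=lm)` is an ordinary least-squares fit, so its coefficients can be read off with `lm()` directly. The sketch below uses a made-up summary table shaped like `concurr2`; the values are illustrative and not from the talk's data.

```r
# Made-up per-sample summary, shaped like concurr2 above (values are illustrative)
concurr2 <- data.frame(
  N   = c(1, 2, 3, 4, 5, 6),
  max = c(1.1, 1.9, 3.2, 3.9, 5.1, 6.0) * 1e6
)

# Fit max wait (in seconds) as a linear function of the number of samples
fit <- lm(I(max / 1e6) ~ N, data = concurr2)

# Intercept and slope of the fitted line
print(coef(fit))

# Strength of the linear association; a high value still says nothing about cause
print(cor(concurr2$N, concurr2$max))
```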
Heatmap
• Values as "blocks" in a matrix
• Clearer than scatter plot for large amounts of data
• Shows less information
• Performance data made sexy


Heatmap




How?
# group by WAIT_CLASS as well, so the column survives for filtering and plotting
ash2 <- ddply(ash,.(SAMPLE_ID, WAIT_CLASS),
  summarise,N=length(TIME_WAITED),
   mean=mean(TIME_WAITED),
   max=max(TIME_WAITED))
ash2 <- ash2[ash2$WAIT_CLASS %in%
  c("Concurrency","User I/O","Other"),]
ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) +
  geom_tile(aes(fill = log(N))) +
  scale_fill_gradient(low = "green", high = "red")
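
Underneath, the heatmap is a matrix of counts: one row per wait class, one column per sample. A base R sketch with made-up samples (illustrative only) shows the matrix that the tiles encode as color:

```r
# Made-up ASH-like samples for illustration
ash <- data.frame(
  SAMPLE_ID  = c(1, 1, 1, 2, 2, 3),
  WAIT_CLASS = c("User I/O", "User I/O", "Concurrency",
                 "User I/O", "Other", "Concurrency")
)

# Rows are wait classes, columns are sample IDs, cells are sample counts
counts <- table(ash$WAIT_CLASS, ash$SAMPLE_ID)
print(counts)
```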



Presenting Your Data




FACT
"Even irrelevant neuroscience information in an explanation of a
psychological phenomenon may interfere with people's abilities to
critically consider the underlying logic of this explanation."
Numerical quantities focus on
     expected values –

     graphical summaries on unexpected
     values

        --John Tukey




Our goal is an interesting presentation.
What is “Interesting”?

•    Surprise
•    Beauty
•    Stories
•    Visuals
•    Counterintuitive
•    Variety



Bad Visualizations Lie
1.   Omit important data
2.   Distort data
3.   Misleading
4.   Confusing
5.   Fake correlations and Bad models




Bad vs. Good Visuals




Eye-API
• Good:
     –   distances
     –   locations
     –   length
     –   high contrast
• Bad:
     –   shades
     –   relative area
     –   angles




Good or Bad?




#1 Mistake – Throw a line on Data




Avoid Pie Charts




Infographics always have Pie Charts




Which is better?




Creativity is Allowed




Make it Beautiful – for Geeks
•    Contrast
•    Reduce noise
•    Few colors
•    Few fonts
•    Lots of Data
•    More Signal
•    Less Noise



IMPORTant R Libraries
•    reshape
•    plyr
•    ggplot2
•    sqldf
•    http://blog.revolutionanalytics.com/2013/02/10-r-packages-every-data-scientist-should-know-about.html




Other Visualization Tools
•    R + R Studio
•    Excel
•    Gephi
•    JIT, D3.js
•    ggobi




Thank you – Q&A
To contact us
        sales@pythian.com

        1-877-PYTHIAN

To follow us
        http://www.pythian.com/blog

        http://www.facebook.com/pages/The-Pythian-Group/163902527671

        @pythian

        http://www.linkedin.com/company/pythian

Visualizing database performance hotsos 13-v2

  • 1. Visualizing Database Performance with R Gwen Shapira, Senior Consultant February, 2013
  • 2. About Me – Oracle ACE Director – Member of Oak Table – 14 years of IT – Performance Tuning – Troubleshooting – Hadoop – Presents, Blogs, Tweets – @gwenshap 2 © 2013 Pythian
  • 3. About Pythian • Recognized Leader: – Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server – Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments • Expertise: – Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2 Microsoft MVPs. – Pythian holds 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC • Global Reach & Scalability: – Around the clock global remote support for DBA and consulting, systems administration, special projects or emergency response 3 © 2013 Pythian
  • 4. Will Talk About: • Data pre-processing tools • Visualization tools and techniques • How to make great looking charts • What makes visuals effective • How to avoid visualization mistakes
  • 5. Will NOT Talk About: • How to collect performance data • Cool ASH queries • How to program in R • Statistics • Machine Learning • What the data actually means • How to explain the results to your boss
  • 6. Why Visualize? • Yet another analysis tool • But more fun • Highly effective • Communications tool, too • But not at the same time
  • 9. R Studio
  • 10. Getting Data In Shape
  • 11. Use the DB, Luke Aggregate Scale Filter
  • 12. Getting DB Data to R library(RJDBC) drv <- JDBC("oracle.jdbc.driver.OracleDriver", "/Users/grahn/code/jdbc/ojdbc6.jar") conn <- dbConnect(drv, "jdbc:oracle:thin:@zulu.us.oracle.com:1521:orcl", "grahn", "grahn") # import the data into a data.frame lfs <- dbGetQuery(conn, "select SAMPLE_ID, TIME_WAITED from ashdump where EVENT='log file sync' order by SAMPLE_ID")
  • 13. With R "NAME","SNAP_TIME","BYTES" "free memory",12-03-09 00:00:00,645935368 "KGH: NO ACCESS",12-03-09 00:00:00,325214880 "db_block_hash_buckets",12-03-09 00:00:00,186650624 "free memory",12-03-09 00:00:00,134211304 "shared_io_pool",12-03-09 00:00:00,536870912 "log_buffer",12-03-09 00:00:00,16924672 "buffer_cache",12-03-09 00:00:00,21676163072 "fixed_sga",12-03-09 00:00:00,2238472 "JOXLE",12-03-10 04:00:01,27349056 "free memory",12-03-10 04:00:01,105800192 "free memory",12-03-10 04:00:01,192741376 "PX msg pool",12-03-10 04:00:01,8192000
  • 14. Reshape shared_pool <- read.csv("~/shapira/shared_pool.csv") install.packages("reshape") library(reshape) max_shared_pool <- cast(shared_pool, SNAP_TIME ~ NAME, max) Time free memory log_buffer buffer_cache 12-03-09 00:00:00 645935368 16924672 21676163072 12-03-09 04:00:00 192741376
  • 15. With R out of scale
  • 16. Select Subset of data max_shared_pool <- subset(max_shared_pool, select = -c(buffer_cache)) par(mar=c(4,6,2,1)) # widen the left margin for the long pool names boxplot(max_shared_pool/1024/1024, xlab="Size in MBytes", horizontal=TRUE, las=1)
  • 17. With R
  • 18. More Subsets SAMPLE_ID TIME_WAITED WAIT_CLASS EVENT 10526629 14929 User I/O cell single block physical read 10526629 5015 User I/O cell single block physical read 10465699 21572 Concurrency library cache: mutex X 10465699 65938 Concurrency library cache: mutex X
  • 19. Filtering Data new <- subset(old, row filter, column filter) phys_io <- subset(ash, WAIT_CLASS == "User I/O", select = -c(EVENT)) SAMPLE_ID TIME_WAITED WAIT_CLASS 10526629 14929 User I/O 10526629 5015 User I/O
  • 20. Another Filtering Syntax short_waits <- subset(ash, ash$TIME_WAITED < 10000) short_waits <- ash[ash$TIME_WAITED < 10000,] Not a Typo! SAMPLE_ID TIME_WAITED WAIT_CLASS EVENT 10526629 5015 User I/O cell single block physical read
  • 21. Summarize with DDPLY install.packages("plyr") library(plyr) ash2 <- ddply(ash, "SAMPLE_ID", summarise, N=length(TIME_WAITED), mean=mean(TIME_WAITED), max=max(TIME_WAITED)) SAMPLE_ID N MEAN MAX 10526629 2 9972 14929 10465699 2 43755 65938
  • 22. Cheating for DBAs library(sqldf) ash2 <- sqldf("select SAMPLE_ID, count(*) N, avg(TIME_WAITED), max(TIME_WAITED) from ash where WAIT_CLASS='User I/O' group by SAMPLE_ID")
  • 23. When all else fails Text is text. Frits Hoogland converts 10046 trace to CSV for R with SED: s/^(WAIT) #([0-9]*): nam='(.*)' ela= *([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) obj#=([0-9-]*) tim=([0-9]*)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/
  • 24. Exploring Data
  • 25. Directions to Explore • Shape of data • Correlations • Changes over time
  • 26. The Goal of Analysis is a Story • Who • What • When • Where • Why • Why • Why • Why • Why
  • 27. Boxplot • Initial step • Identify outliers • Compare groups • Summarize • Chart annotations: "75% of exports take less than 600m", "Fail?"
  • 28. For Example: WHAT?
  • 29. How it's done ash <- read.csv('~/Downloads/ash1.csv') boxplot(ash$TIME_WAITED/1000000 ~ ash$WAIT_CLASS, xlab="Wait Class", ylab="Time Waited (s)", cex.axis=1.2)
  • 30. Scatter Plot • Incredibly versatile • Use to: – Show changes over time – Show correlations – Highlight trends – Find model – Pretty much everything
  • 31. WHAT?
  • 32. Log Data
  • 33. How it's done install.packages("ggplot2") library(ggplot2) ggplot(ash, aes(SAMPLE_ID, TIME_WAITED, color=factor(WAIT_CLASS))) + geom_point() ggplot(ash, aes(SAMPLE_ID, log(TIME_WAITED), color=factor(WAIT_CLASS))) + geom_point()
  • 34. Only "Small Waits" 500us Physical IO?
  • 36. Smoothing
  • 38. Data over Time 11gR2 !
  • 39. Finding Correlation
  • 40. Regression (is not Causation)
  • 41. How? concurr2 <- ddply(concurr, .(SAMPLE_ID), summarise, N=length(TIME_WAITED), mean=mean(TIME_WAITED), max=max(TIME_WAITED)) ggplot(concurr2, aes(N, max/1000000)) + geom_point() + geom_smooth(method=lm) + xlab("Number of Samples") + ylab("Max Time Waited (s)")
  • 42. Heatmap • Values as "blocks" in a matrix • Clearer than scatter plot for large amounts of data • Shows less information • Performance data made sexy
  • 43. Heatmap
  • 44. How? ash2 <- ddply(ash, .(SAMPLE_ID, WAIT_CLASS), summarise, N=length(TIME_WAITED), mean=mean(TIME_WAITED), max=max(TIME_WAITED)) # group by WAIT_CLASS too, so the column survives for the filter and the y axis ash2 <- ash2[ash2$WAIT_CLASS %in% c("Concurrency","User I/O","Other"),] ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) + geom_tile(aes(fill = log(N))) + scale_fill_gradient(low = "green", high = "red")
  • 45. Presenting Your Data
  • 46. FACT "Even irrelevant neuroscience information in an explanation of a psychological phenomenon may interfere with people’s abilities to critically consider the underlying logic of this explanation."
  • 47. Numerical quantities focus on expected values – graphical summaries on unexpected values --John Tukey
  • 48. Our goal is an interesting presentation. What is “Interesting”? • Surprise • Beauty • Stories • Visuals • Counterintuitive • Variety
  • 49. Bad Visualizations Lie 1. Omit important data 2. Distort data 3. Misleading 4. Confusing 5. Fake correlations and Bad models
  • 50. Bad vs. Good Visuals
  • 51. Eye-API • Good: – distances – locations – length – high contrast • Bad: – shades – relative area – angles
  • 52. Good or Bad?
  • 53.
  • 54. #1 Mistake – Throw a line on Data
  • 55.
  • 56. Avoid Pie Charts
  • 57. Infographics always have Pie Charts
  • 58. Which is better?
  • 59. Creativity is Allowed
  • 60. Make it Beautiful – for Geeks • Contrast • Reduce noise • Few colors • Few fonts • Lots of Data • More Signal • Less Noise
  • 61. IMPORTant R Libraries • reshape • plyr • ggplot2 • sqldf • http://blog.revolutionanalytics.com/2013/02/10-r-packages-every-data-scientist-should-know-about.html
  • 62. Other Visualization Tools • R + R Studio • Excel • Gephi • JIT, D3.js • ggobi
  • 63. Thank you – Q&A To contact us sales@pythian.com 1-877-PYTHIAN To follow us http://www.pythian.com/blog http://www.facebook.com/pages/The-Pythian-Group/163902527671 @pythian http://www.linkedin.com/company/pythian
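
The filtering and summarizing steps from slides 19 to 21 can be tried without installing anything. The following is a minimal sketch in base R, assuming a data frame shaped like the four-row table on slide 18; the `split()`/`lapply()` summary is a base-R stand-in for the `ddply()` call on slide 21, not the deck's own code.

```r
# Toy ASH sample: the four rows shown in the slide-18 table.
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938),
  WAIT_CLASS  = c("User I/O", "User I/O", "Concurrency", "Concurrency"),
  EVENT       = c("cell single block physical read",
                  "cell single block physical read",
                  "library cache: mutex X",
                  "library cache: mutex X"),
  stringsAsFactors = FALSE
)

# Row filter plus column filter, as on the "Filtering Data" slide.
phys_io <- subset(ash, WAIT_CLASS == "User I/O", select = -c(EVENT))

# Bracket syntax: the trailing comma means "keep every column".
short_waits <- ash[ash$TIME_WAITED < 10000, ]

# Base-R stand-in for the ddply() summary: one row per SAMPLE_ID.
ash2 <- do.call(rbind, lapply(split(ash, ash$SAMPLE_ID), function(d) {
  data.frame(SAMPLE_ID = d$SAMPLE_ID[1],
             N    = length(d$TIME_WAITED),
             mean = mean(d$TIME_WAITED),
             max  = max(d$TIME_WAITED))
}))
rownames(ash2) <- NULL
print(ash2)
```

On these four rows the per-sample means come out as 9972 and 43755, matching the table on slide 21. With plyr installed, the `ddply()` form from the slide should give the same summary in one call.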

Editor's Notes

  1. The goal is to... Structure = trends, repetitions, outliers, etc. High-bandwidth information channel. Apply pattern-matching skills and prior knowledge to the analysis of data.
  2. Just a photo. Add a list of resources at the end. R is my favorite, but there are many, many others.
  3. 3 data preparation techniques
  4. You can also pivot and apply pre-analysis. The goal is, on one hand, to get all the data you are going to need, so you won’t have to move back and forth between the database and R, and on the other hand, to minimize the amount of data you have to copy over the network. And as DB experts and R newbies, we find most cleanup activities easier in the DB than elsewhere.
  5. Example from Greg Rahn’s blog post: http://structureddata.org/2011/12/20/visualizing-active-session-history-ash-data-with-r/
  6. Reshape makes pivoting easy. Sometimes you don’t know you should have filtered out data until you start working on it in R.
  7. I don’t really want the buffer cache data; it’s too large and will distort all my charts.
  8. Perl is awesome for processing lines of text and can be used to aggregate (with hash maps), filter, etc. So are sed and AWK. Also, data that is not from the database sometimes doesn’t look like a table, so you can’t massage it with R easily. Frits Hoogland has a wonderful example of using sed to extract wait information out of a 10046 trace file: http://fritshoogland.wordpress.com/2012/01/18/using-r-and-oracle-tracefiles/
  9. Shape of data – distribution, common values, outliers. Charts should be useful, but not necessarily sexy.
  10. You also need at least two solutions, but that’s for later
  11. We can see what look like failed exports (but not when they failed), that our largest database has a large variance in export times, that most databases have export times far outside the average, and where the 75th percentile falls.
  12. Yury Velikanov: http://www.pythian.com/blog/upgraded-to-11gr2-congrats-you-are-in-direct-reads-trouble/
  13. Published by Greg Rahn: http://structureddata.org/2011/12/20/visualizing-active-session-history-ash-data-with-r/
  14. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778755/
  15. We pay attention to what is interesting. And what is interesting is the story, the outliers, the changes, the discoveries. http://headrush.typepad.com/creating_passionate_users/2005/12/but_is_it_inter.html
  16. From Baron Schwartz’s blog: http://www.xaprb.com/blog/2011/01/15/sleep-while-you-can-because-it-wont-last-long/ The chart shows the number of blog posts on MySQL over time. Clearly we are running out of blog posts. This is extrapolating without a model to explain what you are looking at. Just drawing a line through data is not enough; you need a model.
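
The warning in the last note can be sketched in a few lines of base R. The monthly counts below are made up purely for illustration (they are not the data behind the chart in the talk), and `posts`, `fit` and `future` are hypothetical names. The point is only that a straight line fitted with `lm()` will cheerfully extrapolate to values that cannot exist.

```r
# Made-up monthly counts, for illustration only (not the data from the chart).
posts <- data.frame(
  month = 1:12,
  count = c(120, 115, 118, 104, 99, 101, 90, 86, 88, 75, 71, 68)
)

# "Throw a line on the data": an ordinary least-squares fit.
fit <- lm(count ~ month, data = posts)

# Extrapolate well past the observed range of months.
future <- data.frame(month = c(24, 36, 48))
future$predicted <- predict(fit, newdata = future)
print(future)

# A count can never be negative, yet the fitted line predicts one.
print(any(future$predicted < 0))
```

The fit itself is fine inside months 1 to 12. What goes wrong is the assumption that the same straight line keeps holding outside that range, which is exactly the kind of claim that needs a model behind it.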