R for Pirates
     Mandi Walls
      @lnxchk
 EscConf, Boston, MA
  October 27, 2011
whoami

• stats misfit
• R tinkerer
• large-farm runner
• not a professional statistician :D
What is R
• Scripting language for stats work
• Inspired by earlier S (for statistics)
  developed at AT&T
• FOSS
• Syntax inherits through Algol family, so
  looks somewhat like C/C++
What Does R Do?

•   Manipulate data

•   Complex Modeling and
    Computation

•   Graphics and
    Visualization
Why R?


• WHY NOT!?
But Other Math Stuff!
•   Mathematica
•   MatLab
•   Minitab
•   MAPLE
•   Excel (yes. shutup h8rs. ask your CFOs what they
    use)
•   R provides sophisticated statistical and modeling
    capabilities, and is extendible through your own code
Get R


• Available for Linux, Mac, Windows
• http://www.r-project.org/
Fire!

•   R console on Mac

•   Interactive interpreter
    for your R needs

•   Can also run from the
    command line: R
R Basics
•   R considers all elements
    to be vectors

•   A single number is a
    one-element vector

•   Use <- for assignment

•   Use c() to concatenate
    values into a vector
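A small sketch of the basics above. The numbers are illustrative only and the variable names are assumptions, not taken from the slide screenshots:

```r
# A single number is stored as a vector of length one
x <- 8
length(x)

# c() concatenates values into a longer vector
loads <- c(3.79, 3.11, 2.94, 4.81)
length(loads)

# c() also appends to an existing vector
loads <- c(loads, 2.5)
length(loads)
```

Typing a variable name on its own at the console prints its contents.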
Let’s see that again
Practice Datasets


•   data()

•   shows the sample sets
    included with your R
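For instance, assuming the standard datasets package that ships with R is available, one of the bundled sets can be loaded and inspected as below. mtcars is picked here only as an example; the slide does not name a particular dataset.

```r
# Load one of the bundled practice datasets into the workspace
data(mtcars)

# Peek at the first few rows and check its size
head(mtcars, 3)
nrow(mtcars)
ncol(mtcars)
```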
Functions

•   Looks familiar!

•   Let’s see one!

•   “evencount” counts the number of even ints in a vector
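The function shown on the slide is not preserved in this text. A minimal sketch of what an evencount function could look like, written from the one-line description above rather than copied from the talk:

```r
# Count how many elements of a vector are even integers
evencount <- function(x) {
  k <- 0
  for (n in x) {
    # %% is the modulo operator; a remainder of 0 means n is even
    if (n %% 2 == 0) k <- k + 1
  }
  k  # the last expression evaluated is the return value
}

evencount(c(1, 2, 3, 4, 6))
```

Because R operates on whole vectors, sum(x %% 2 == 0) gives the same count without an explicit loop.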
Datatypes
•   Vectors, the important ones

•   Scalars are really single-element vectors

•   Character strings

•   Matrices, rectangular arrays of numbers

•   Lists

•   Tables, useful for data transitions and temp work
Vectors
•   R’s most-used data structure

•   All elements in a vector must have the same mode
    or data type

•   To add values to a vector, you concatenate into it
    with the c() function

•   Many mathematical functions can be applied to a whole
    vector; vectors can also be traversed like arrays

•   Index starts at 1, not 0!
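A short sketch of those vector properties, using made-up values:

```r
v <- c(2, 4, 6, 8)

# Math is applied element by element across the whole vector
doubled <- v * 2
sum(v)
mean(v)

# Traversing like an array; the first index is 1
total <- 0
for (i in seq_along(v)) {
  total <- total + v[i]
}
v[1]

# Mixing types forces everything to a single mode
mixed <- c(1, "a")
mode(mixed)
```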
Scalars

•   One-element vectors

    > x <- 8

    > x[1]

    [1] 8

•   also climb your rigging


Character Strings
•   Single-element vectors
    with mode character
    > y <- "abc"
    > length(y)
    [1] 1
    > mode(y)
    [1] "character"

•   Can do normal string
    things, like
    > t <- paste("yo","dawg")
    > t
    [1] "yo dawg"
    > u <- strsplit(t,"")
    > u
    [[1]]
    [1] "y" "o" " " "d" "a" "w" "g"
Matrices
•   Two-dimensional array

    > m <- rbind(c(1,4),c(2,2))

    > m
           [,1] [,2]
    [1,]      1    4
    [2,]      2    2
    > m[1,2]
    [1] 4
    > m[1,]
    [1] 1 4
Lists
•   Contain elements of different types

•   Have a particular syntax

    > x <- list(u=2, v="abc")
    > x
    $u
    [1] 2

    $v
    [1] "abc"

    > x$u
    [1] 2
Data Frames
•   Matrices are limited to only a single type for all elements
•   A data frame can contain different types of data; it can be read
    in from a file or created on the fly
    > df <- data.frame(list(kids=c("Olivia","Madison"),ages=c(10,8)))
    > df
         kids ages
    1  Olivia   10
    2 Madison    8
    > df$ages
    [1] 10  8
Putting R to Work

•   Read in a log file:
    > access <- read.table("access.log", header=FALSE)
    > head(access)
                V1 V2 V3                    V4     V5                          V6  V7   V8
    1 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500]  GET /menu/menu.js HTTP/1.1 401  401
    2 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500]  GET /menu/menu.js HTTP/1.1 200 1970
    3 192.168.1.10  -  - [23/Oct/2011:07:03:33 -0500] GET /menu/menu.css HTTP/1.1 200 2258
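To try this without a log file on disk, the same three requests can be fed to read.table() as text. This is a sketch that assumes the log quotes the request string, as the common Apache format does, which is why it lands in the single column V6:

```r
# Three sample lines in the shape of the log shown above
log_lines <- c(
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 401 401',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.js HTTP/1.1" 200 1970',
  '192.168.1.10 - - [23/Oct/2011:07:03:33 -0500] "GET /menu/menu.css HTTP/1.1" 200 2258'
)

# text= reads from a character vector instead of a file
access <- read.table(text = log_lines, header = FALSE, stringsAsFactors = FALSE)
ncol(access)

# Column 7 holds the return code; table() counts each distinct value
codes <- table(access[, 7])
codes
```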
Fun with Plots
• This plot series is going to
   make use of the “return
   codes” from the access log

• We’ll do a series of plots
   that gradually get more
   sophisticated

• This first one is a basic bar plot of
   the counts; it’s not much fun
Barplot
barplot(table(access[,7]))
Barplot v2
barplot(table(access[,7]), ylab="Number of Pages", xlab="Return Code",
        main="Plot of Return Codes")
Barplot v3
codes <- table(access[,7])
barplot(codes, ylab="Number of Pages", xlab="Return Code",
        main="Plot of Return Codes", col=heat.colors(length(codes)))
Barplot v4




Source: wikipedia, http://en.wikipedia.org/wiki/Bar_%28establishment%29
Writing Graphical Output to Files
•   Set up the output target by calling a graphics device function:

•   pdf(), png(), jpeg(), etc.

•   jpeg("/var/www/images/returncodes-date.jpg")

•   Call the plot function you have chosen, then call dev.off()

•   Can be used in batch mode to create graphics from your data
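Put together, the open, plot, close sequence might look like the sketch below. The output path and the sample return codes are made up for illustration, and pdf() is used because it does not depend on extra image libraries being installed:

```r
# Hypothetical output location; tempdir() keeps the example self-contained
out_file <- file.path(tempdir(), "returncodes.pdf")

# Made-up return codes standing in for the access log column
codes <- table(c(200, 200, 401, 404, 200))

pdf(out_file)                      # open the file as the graphics target
barplot(codes, xlab = "Return Code", ylab = "Number of Pages",
        main = "Plot of Return Codes")
invisible(dev.off())               # close the device so the file is written

file.exists(out_file)
```

Swapping pdf() for png() or jpeg() changes only the device call; the plotting code stays the same.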
Shopping is Hard, Let’s Do Math
•   Read in some load averages (one-min)

    loadavg<-read.table("load_avg.txt")

    head(loadavg)
        V1
    1 3.79
    2 3.11
    3 2.94
    4 4.81
Summary Stats
•   Summarize the data with one function call

•   Gives the min, max, mean, median, and quartiles
    summary(loadavg)
           V1
     Min.   :0.760
     1st Qu.:1.390
     Median :1.970
     Mean   :2.302
     3rd Qu.:3.080
     Max.   :5.070
Summary Stats as Boxplot
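The plot on this slide is not preserved. A sketch of how a box plot of the load averages could be drawn, using only the four values visible in the head() output earlier, so the numbers will not match the full summary:

```r
# Only the four load averages shown by head(loadavg); the full file is not available
loadavg <- data.frame(V1 = c(3.79, 3.11, 2.94, 4.81))

# boxplot() draws the plot and also returns the numbers behind it
bp <- boxplot(loadavg$V1, ylab = "One-Minute Load Average",
              main = "Box Plot of One-Minute Load Average")

# Rows of bp$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats
median(loadavg$V1)
```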
Same Thing, 3 Datacenters
    > cpu<-read.table("cpu")
    > head(cpu)
        V1  V2
    1 3.78 smq
    2 2.57 smq
    3 3.69 smq
    4 0.86 smq

    > boxplot(cpu[,1] ~ cpu[,2], xlab="Load Average at Time t, by Datacenter",
              ylab="One-Minute Load Average",
              main="Box Plot of One-Minute Load Average, FEs", col=topo.colors(3))

•   Looks like there are outliers. That could spell
    trouble! You found them with R awesomeness.
    Hooray!
Running R in Your Workflow
•   The little bit of boxplotting we did earlier, in a script:
[mandi@mandi ~]$ cat sample.R
#!/usr/bin/env Rscript
cpu<-read.table("cpu")
jpeg("./sample.jpg")
boxplot(cpu[,1] ~ cpu[,2], xlab="Load Average at Time t, by Datacenter",
        ylab="One-Minute Load Average",
        main="Box Plot of One-Minute Load Average, FEs", col=heat.colors(3))
dev.off()
[mandi@mandi ~]$ Rscript sample.R > /dev/null
[mandi@mandi ~]$ ls -l sample.jpg
-rw-rw-r-- 1 mandi staff 20137 Oct 24 20:44 sample.jpg
Hey!


•   I made a graph with a
    script!
What Else?
•   R can read data input from a variety of files with regular
    formats

•   R can also fetch data from the internet using the url()
    function

•   R has a number of functions available for dealing with
    reading data, creating data frames or other structures, and
    converting string text into numerical data modes

•   Extended packages provide support for structured data
    formats like JSON.
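A brief sketch of two of these points: reading regularly formatted text into a data frame and converting strings to numbers. The host names and values are invented, and the url() line is left as a comment because the address is a placeholder:

```r
# Comma-separated text standing in for a file with a regular format
csv_text <- "host,load\nweb01,3.79\nweb02,2.57"
hosts <- read.csv(text = csv_text, stringsAsFactors = FALSE)
hosts

# Numbers that arrived as strings can be converted to a numeric mode
raw <- c("3.79", "2.57", "0.86")
nums <- as.numeric(raw)
sum(nums)

# url() opens a connection that the read functions accept, for example:
# remote <- read.csv(url("http://example.com/load.csv"))  # placeholder address
```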
References
• http://www.slideshare.net/dataspora/an-interactive-introduction-to-r-programming-language-for-statistics
• http://www.harding.edu/fmccown/R/
• Art of R Programming, Norman Matloff, Copyright
  2011 No Starch Press
• Statistical Analysis with R, John M. Quick, Copyright
  2011 Packt Publishing

