R - scripted data
History
Language
Packages
Tools
RPubs
Slidify
Shiny
A Brief History of R
– 1976 S - Bell Labs; Fortran
– John Chambers
– 1988 S Version 3; C language
● 1991 R Created
– Ross Ihaka and Robert Gentleman
● 1993 R Announced
– 1993 S licensed to StatSci (now Insightful)
● 2000 R Version 1.0.0 released
– 2004 S purchased from Lucent ($2 million)
– 2008 TIBCO acquires Insightful ($25 million)
Other “Stats” Tools
● R – additional, commercial support
Oracle: “Big Data Appliance” - R + Hadoop
+ Linux + NoSQL + Exadata(H/W)
IBM: R executing in Hadoop (massively
parallel in-database analytics)
● SAS (SAS Institute) dev. 1966, 1st rel 1972
● SPSS (IBM) 1st rel 1968
Model Development and
Execution Comparison
http://inside-bigdata.com/2014/06/25/revolution-r-enterprise-vs-sas-performance/
Oracle + INTEL Libraries
https://blogs.oracle.com/R/entry/oracle_r_distribution_performance_benchmark
Language
● Derivative of S (S-PLUS)
● Portable (includes Playstation 3)
● Interpreted, calls into C libraries
● Functional!
● GPL
● 40-year-old technology
● Open Source (you want it, you do it)
Data Types
● Symbols refer to objects
● Object attributes
– names
– dimnames
– dimensions
– class
– length
– user defined attributes/metadata
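A minimal sketch of reading and setting the attributes listed above. The vector contents and the "units" attribute are made up for illustration.

```r
# A named numeric vector; "units" is a user-defined attribute (hypothetical).
x <- c(a = 1.5, b = 2.5, c = 3.5)
attr(x, "units") <- "cm"

names(x)         # "a" "b" "c"
length(x)        # 3
class(x)         # "numeric"
attributes(x)    # a list holding $names and $units

# Setting dim and dimnames on a plain vector turns it into a matrix.
m <- 1:6
dim(m) <- c(2, 3)
dimnames(m) <- list(c("r1", "r2"), c("c1", "c2", "c3"))
is.matrix(m)     # TRUE
m["r2", "c3"]    # 6
```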
Data Types
● Object types – single class, except list
– List
(may have mixed classes)
– Vectors
(scalar is a vector of length 1)
– Matrices
(vector with 'dimension' attribute)
(column major order)
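The points above can be checked at the console. This is a small sketch with made-up values: a scalar is a length-1 vector, a vector holds a single class while a list may mix classes, and a matrix is filled column by column.

```r
# A "scalar" is just a vector of length 1.
s <- 42
is.vector(s)      # TRUE
length(s)         # 1

# Vectors hold a single class: mixed input is coerced (here to character).
v <- c(1, "a", TRUE)
class(v)          # "character"

# Lists may hold elements of different classes.
l <- list(1L, "two", TRUE)
sapply(l, class)  # "integer" "character" "logical"

# Matrices are stored in column-major order.
m <- matrix(1:6, nrow = 2)
m[, 1]            # 1 2   (first column)
m[1, ]            # 1 3 5 (first row)
as.vector(m)      # 1 2 3 4 5 6
```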
Data Types
● Object types
– Factors
● Categorical data (like an enumeration)
– Data frames
● Special list, each element has same length
● Elements are columns; their length is the number of rows
● Each element (column) has its own type
● row.names() attribute to name the rows
● Convert to matrix with data.matrix()
● Load with read.table(), read.csv()
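A short sketch of factors and data frames, using invented data. The column names and the inline CSV text are assumptions for illustration; in practice read.csv() would usually be given a file path.

```r
# A factor stores categories as integer codes plus a set of levels.
sizes <- factor(c("low", "high", "low", "mid"),
                levels = c("low", "mid", "high"))
levels(sizes)       # "low" "mid" "high"
as.integer(sizes)   # 1 3 1 2

# A data frame is a list of equal-length columns, each with its own type.
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 score = c(9.5, 7, 8.25),
                 stringsAsFactors = FALSE)
row.names(df) <- c("first", "second", "third")
is.list(df)         # TRUE
sapply(df, class)   # id: "integer", name: "character", score: "numeric"
dim(df)             # 3 3

# Convert the numeric columns to a matrix.
num <- data.matrix(df[, c("id", "score")])

# read.csv() can also parse inline text, which is handy for small examples.
csv <- read.csv(text = "x,y\n1,2\n3,4")
```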
Data Types
● Object “atomic” classes
– character
– numeric (double precision real)
– integer
– complex
– logical (booleans)
Numeric includes Inf and NaN; integer has neither
1 / Inf == 0 !
any atomic class can be NA
is.na(NaN) is TRUE, is.nan(NA) is FALSE
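The special values can be explored directly, as in the sketch below. Note that in R only double-precision (and complex) values carry Inf and NaN; integer overflow produces NA instead.

```r
# The five atomic classes listed above.
class("a")      # "character"
class(1.5)      # "numeric"
class(2L)       # "integer"
class(1i)       # "complex"
class(TRUE)     # "logical"

# Inf and NaN are double-precision values.
1 / 0           # Inf
1 / Inf == 0    # TRUE
0 / 0           # NaN

# Integer arithmetic has no Inf: overflow yields NA (normally with a warning).
overflow <- suppressWarnings(.Machine$integer.max + 1L)
is.na(overflow) # TRUE

# NA exists for every atomic class; NaN counts as NA, but not the reverse.
typeof(NA_integer_)     # "integer"
typeof(NA_character_)   # "character"
is.na(NaN)      # TRUE
is.nan(NA)      # FALSE
```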
Data Types
● Dates
– “Date” class
– Days since epoch (1970-01-01)
● Times
– “POSIXct” or “POSIXlt” class
– Seconds since epoch
● Coerce from string with as.Date(); back to string with format()
● Generic functions include 'weekdays()',
'months()', 'quarters()'
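A brief sketch of the date and time classes. The date used here is arbitrary, and the names returned by weekdays() and months() depend on the session locale, so no output is shown for them.

```r
# Dates: stored as days since 1970-01-01.
d <- as.Date("2014-06-25")
class(d)          # "Date"
unclass(d)        # 16246
format(d)         # "2014-06-25" (Date back to string)
format(d + 30)    # "2014-07-25" (date arithmetic is in days)

weekdays(d)       # locale-dependent day name
months(d)         # locale-dependent month name
quarters(d)       # "Q2"

# Times: POSIXct is seconds since the epoch, POSIXlt is a list of fields.
ct <- as.POSIXct("2014-06-25 12:00:00", tz = "UTC")
as.numeric(ct)    # seconds since 1970-01-01 00:00:00 UTC
lt <- as.POSIXlt(ct)
lt$hour           # 12
lt$year + 1900    # 2014
```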
Operators
● Grouping: ()
● Assignment: to<-from AND from->to
● Vectorized: + - ! * / ^ %% & |
● ~ ? : %/% %*% %o% %x% %in% < > == >=
<= && ||
● Element access: [[]] [] $
● Function argument types:
– symbol, symbol=default, ...
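A few of these operators in use, as a minimal sketch. The function scale_by and all values are hypothetical and only serve to show a default argument and the "..." argument.

```r
# Assignment works in both directions.
y <- 5
5 -> z

# Arithmetic and logical operators are vectorized.
x <- c(1, 2, 3, 4)
x * 2            # 2 4 6 8
x %% 2           # 1 0 1 0 (modulo)
7 %/% 2          # 3       (integer division)
x > 2 & x < 4    # FALSE FALSE TRUE FALSE
3 %in% x         # TRUE

# %*% is the matrix product; %o% is the outer product.
m <- matrix(1:4, nrow = 2)
m %*% m          # rows: (7, 15) and (10, 22)
dim(x %o% x)     # 4 4

# Element access: [ ] keeps the container, [[ ]] and $ extract the element.
l <- list(a = 1:3, b = "z")
class(l["a"])    # "list"
l[["a"]]         # 1 2 3
l$b              # "z"

# Argument types: plain symbol, symbol with default, and "..." for extras.
scale_by <- function(v, factor = 2, ...) {
  v * factor + sum(...)
}
scale_by(x)            # 2 4 6 8
scale_by(x, 3, 1, 1)   # 5 8 11 14
```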
Control Structures
● if, else
● for
● while
● repeat
● break, next, return
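The control structures above in a small sketch. The helper first_negative is a hypothetical name used only to show an early return().

```r
# for with if and next: sum the odd numbers from 1 to 10.
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  total <- total + i
}
total                     # 25

# while: count down to zero.
n <- 3
while (n > 0) {
  n <- n - 1
}

# repeat runs until break is called.
k <- 1
repeat {
  k <- k * 2
  if (k > 100) break
}
k                         # 128

# return() exits a function early.
first_negative <- function(v) {
  for (value in v) {
    if (value < 0) return(value)
  }
  NULL                    # no negative value found
}
first_negative(c(4, -2, -7))   # -2
```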
Apply
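● apply – apply a function over the margins of an array
● lapply – apply a function over a list / vector, returns a list
● sapply – like lapply, but simplifies the result to a vector or matrix
● tapply – apply a function over a ragged array (groups defined by a factor)
● mapply – apply a function over multiple lists / vectors in parallel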
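A minimal sketch of each member of the apply family, using small made-up inputs so the results are easy to verify by hand.

```r
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)        # row sums: 9 12
apply(m, 2, sum)        # column sums: 3 7 11

lst <- list(a = 1:3, b = 4:6)
lapply(lst, mean)       # a list: $a is 2, $b is 5
sapply(lst, mean)       # a named vector: a = 2, b = 5

# tapply groups the values by a factor (a "ragged array") before applying.
values <- c(10, 20, 30, 40)
groups <- c("x", "y", "x", "y")
tapply(values, groups, sum)   # x = 40, y = 60

# mapply walks several vectors in parallel.
mapply(function(a, b) a + b, 1:3, 4:6)   # 5 7 9
```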
Functions
● Functions are objects
● Functional closure consists of:
– Formal argument list
– Function body (definition)
– Environment
● Each of these can be assigned to: formals(), body(), environment()
● Assigning to environment() can eliminate
unwanted environment capture
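The three parts of a closure can be inspected and reassigned, as sketched below. The functions f and make_fn and the object big are hypothetical; the last step assumes the global environment has no object named big.

```r
f <- function(x, y = 2) x + y

# Inspect the three parts of the closure.
formals(f)$y       # 2
body(f)            # x + y
environment(f)     # the environment where f was defined

# Each part can be assigned to.
body(f) <- quote(x * y)
f(3)               # 6
formals(f)$y <- 10
f(3)               # 30

# A function created inside another captures everything in that scope,
# including objects it never uses.
make_fn <- function() {
  big <- numeric(1e6)     # captured by the returned closure
  function(x) x + 1
}
g <- make_fn()
exists("big", envir = environment(g), inherits = FALSE)   # TRUE

# Reassigning the environment drops the captured objects.
environment(g) <- globalenv()
exists("big", envir = environment(g), inherits = FALSE)   # FALSE
g(1)               # 2
```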
Packages
● CRAN (Comprehensive R Archive Network)
– Main site, includes R download
● Bioconductor
– Analysis of genomic data
– Next generation high-throughput
sequencing
● R-forge
● GitHub and Personal repositories
Packages
● Analysis
– Statistical analysis (stats, linprog)
● Linear (and generalized linear) modeling
● Tree models
● Analysis of variance
– Machine learning (caret, kernlab)
● Classification and clustering (random forests, k-means, knn, etc.)
● Training and predictions
● Cross validation and error analysis
Packages
● Graphics
– Base graphics
● Plot: plot, hist, ...
● Annotate: text, lines, points, axis, ...
– lattice
● Single command: xyplot, bwplot, ...
– ggplot2
● Single command: qplot
● Defining objects: aesthetics, geoms
● Chain commands: ggplot, geom_*, ...
Packages
● Data visualization
– rCharts (GitHub), converts visualizations to
JavaScript (e.g. d3.js)
http://www.google.com/trends/explore#q=R%20language%2C%20Data%20Visualization%2C%20D3.js%2C%20Processing.js&cmpt=q
Tools
● Command line
● RStudio (can run on remote Linux server)
● RKWard
● R Commander (Tcl/Tk)
● JGR – Java GUI for R
● Rattle – RGtk2
Tools
● Debugging
– Print statements!
– Interactive tools:
● traceback() – stack trace on error
● debug() – flags function for stepping
● browser() - stops function and enters debug
● trace() - insert trace statements
● recover() - modify error behavior, can
browse call stack
Tools
● Profiling
– “We should forget about small efficiencies,
say about 97% of the time: premature
optimization is the root of all evil”
– Donald Knuth
– system.time() - CPU, wall times
– Rprof() - use summaryRprof() to see results
● Do not use Rprof() and system.time()
together
● Calls to C/Fortran libraries not profiled
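A rough sketch of both profiling tools, run one after the other rather than nested. The functions slow_sum and fast_sum are hypothetical, the timings vary by machine, and the example assumes an R build with profiling support.

```r
# Two ways to compute the same sum: an explicit loop and a vectorized call.
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}
fast_sum <- function(n) sum(sqrt(seq_len(n)))

# system.time() reports user CPU, system CPU, and elapsed (wall) time.
timing <- system.time(slow_sum(5e6))
timing[["elapsed"]]

# Rprof() samples the call stack to a file; summaryRprof() reads it back.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
invisible(slow_sum(5e6))
Rprof(NULL)                      # stop profiling
prof <- summaryRprof(prof_file)
head(prof$by.self)               # time spent in each function itself
```

The vectorized fast_sum does its work inside compiled code, so a profile of it would show little detail, which matches the last point above.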
Data Exploration
● Script it!
– If you can't repeat it, it didn't happen
● Get the data (ingest)
– Functions to download, uncompress,
unarchive, store, read, and organize
● Clean the data
– Handle missing and incomplete data,
impute values, identify outliers
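The cleaning steps might look like the sketch below. The data are invented, and both the median imputation and the 1.5 x IQR outlier rule are illustrative choices, not the only options.

```r
# Hypothetical raw data with one missing value and one suspicious value.
raw <- data.frame(id = 1:6, value = c(10, 12, NA, 11, 95, 9))

# Count the missing values per column.
colSums(is.na(raw))              # id: 0, value: 1

# Option 1: drop incomplete rows.
complete <- raw[complete.cases(raw), ]

# Option 2: impute missing values with the median of the observed ones.
imputed <- raw
imputed$value[is.na(imputed$value)] <- median(raw$value, na.rm = TRUE)

# Flag outliers beyond 1.5 * IQR from the quartiles.
q <- quantile(raw$value, c(0.25, 0.75), na.rm = TRUE)
fence <- 1.5 * diff(q)
is_outlier <- !is.na(raw$value) &
  (raw$value < q[1] - fence | raw$value > q[2] + fence)
raw$id[is_outlier]               # 5
```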
Data Exploration
● Look at the data (models, visualization)
– Model – regressions (linear, logistic),
clustering, ANOVA
– Refine models and plot the result
● Look for systematic issues – unexpected
trends, bias, unexplained variance, error
estimates, residual analysis
● Explore complexity – number of explanatory
factors
– Plot the models
● What does it look like?
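A minimal sketch of this loop using the built-in mtcars data set. The choice of variables (mpg, wt, hp) is an assumption made for illustration.

```r
# Fit a simple linear regression: fuel economy against weight.
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)                   # intercept and slope
summary(fit)$r.squared      # share of variance explained

# Residual analysis: look for structure the model missed.
res <- residuals(fit)
summary(res)

# Explore complexity: add a second explanatory variable and compare.
fit2 <- lm(mpg ~ wt + hp, data = mtcars)
anova(fit, fit2)            # does the extra term reduce residual variance?

# Plot the data, the fitted line, and the residuals (interactive use only).
if (interactive()) {
  plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Miles per gallon")
  abline(fit)
  plot(fitted(fit), res, xlab = "Fitted", ylab = "Residual")
}
```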
Reproducible Research
● Allows others to validate the work
● Ensures that the results are accepted
● Reduces the chance of errors propagating
– http://youtu.be/7gYIs7uYbMo
– 2010 Anil Potti resigns from Duke after
research was found flawed (off by 1!)
● Clinical trials based on the flawed research
were finally cancelled
● Closed data, non-reproducible research
exacerbated the problem
Reproducible Research
● Don't do things by hand – especially editing
spreadsheets to “clean up” data (removing
outliers, validating, editing) or downloading
files
● Actions taken by hand need very detailed
documentation to reproduce – such as
download sites, which files were downloaded,
and where they were saved
● GUIs are convenient, but can't be repeated
Reproducible Research
● Capture the steps in a script:
– download.file("http://...", "localfile.zip")
● Can be repeated as long as the link is
available. Can keep and manage the
downloaded file if that is an issue
– Use version control
● Capture small steps at a time (git is good
for this!)
● Can track changes and revert if needed
● Can use GitHub, Bitbucket, SourceForge to
publish the results as well
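One way to script the download step so that it can be rerun safely is sketched here. The helper name fetch_once is hypothetical, and the URL is left elided as on the slide.

```r
# Hypothetical helper: download a file only when no local copy exists,
# so rerunning the script keeps and reuses the managed copy.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Usage (not run here; the URL is elided in the source):
# zip_path <- fetch_once("http://...", "localfile.zip")
# unzip(zip_path, exdir = "data")
```

Keeping the script itself under version control records where the data came from, while the downloaded file can be kept or regenerated as needed.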
Reproducible Research
● Capture environment – OS, tools, versions
● Don't save outputs – regenerate
– Ok to cache results while in use, but don't
store the results, just the code+data that
produced it
– If you keep intermediate files, document
how they were created
● Set random seed
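Setting the seed and recording the environment can both be done from the script, as in this small sketch. The seed value is arbitrary.

```r
# The same seed reproduces the same "random" numbers.
set.seed(2014)
a <- rnorm(3)
set.seed(2014)
b <- rnorm(3)
identical(a, b)       # TRUE

# Capture the environment: R version, platform, and attached packages.
info <- sessionInfo()
info$R.version$version.string
info$platform
```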
Sharing Research
● R Markdown – markdown with embedded R
– knitr package executes the R fragments
and embeds the code and results into
markdown, which can convert to HTML or
PDF
– Literate programming!
● Hosted documentation
– RPubs (rpubs.com)
– GitHub gh-pages (github.io)
Sharing Research
● Embedded presentations
– Author using slidify package
– R Markdown with embedded R code
– Creates HTML5 presentation slide deck
– Can include inline quizzes
Data Products
● Interactive visualizations
– shiny, shinyapps packages
– RStudio includes interactive display of
shiny applications during development
– Generates bootstrap + HTML5 + javascript
+ d3 application
● Hosted!
– Hosted at shinyapps.io
– Private? Server images available (for
purchase)

R - the language

  • 1. R - scripted data History Language Packages Tools RPubs Slidify Shiny
  • 2. A Brief History of R – 1976 S - Bell Labs; Fortran – John Chambers – 1988 S Version 3; C language ● 1991 R Created – Ross Ihaka and Robert Gentleman ● 1993 R Announced – 1993 S licensed to StatSci (now Insightful) ● 2000 R Version 1.0.0 released – 2004 S purchased from Lucent (2MM) – 2008 TIBCO acquires Insightful (25MM)
  • 3. Other “Stats” Tools ● R – additional, commercial support Oracle: “Big Data Appliance” - R + Hadoop + Linux + NoSQL + Exadata(H/W) IBM: R executing in Hadoop (massively parallel in-databse analytics) ● SAS (SAS Institute) dev. 1966, 1st rel 1972 ● SPSS (IBM) 1st rel 1968
  • 4. Model Development and Execution Comparison http://inside-bigdata.com/2014/06/25/revolution-r-enterprise-vs-sas-performance/
  • 5. Oracle + INTEL Libraries https://blogs.oracle.com/R/entry/oracle_r_distribution_performance_benchmark
  • 6. Language ● Derviative of S (S PLUS) ● Portable (includes Playstation 3) ● Interpreted, calls into C libraries ● Functional! ● GPL ● 40 year old technology ● Open Source (you want it, you do it)
  • 7. Data Types ● Symbols refer to objects ● Object attributes – names – dimnames – dimensions – class – length – user defined attributes/metadata
  • 8. Data Types ● Object types – single class, except list – List (may have mixed classes) – Vectors (scalar is a vector of length 1) – Matrices (vector with 'dimension' attribute) (column major order)
  • 9. Data Types ● Object types – Factors ● Categorical data (like an enumeration) – Data frames ● Special list, each element has same length ● Elements are columns with length rows ● Each elements (column) has its own type ● row.names() attribute to name the rows ● Convert to matrix with data.matrix() ● Load with read.table(), read.csv()
  • 10. Data Types ● Object “atomic” classes – character – numeric (double precision real) – integer – complex – logical (booleans) Numeric and Integer include Inf and NaN 1 / Inf == 0 ! any class can be NA NaN is NA, NA is not NaN
  • 11. Data Types ● Dates – “Date” class – Days since epoch (1970-01-01) ● Times – “POSIXct” or “POSIXlt” class – Seconds since epoch ● Coerce to string with as.Date() ● Generic functions include 'weekdays()', months()', 'quarters()'
  • 12. Operators ● Grouping: () ● Assignment: to<-from AND from->to ● Vectorized: + - ! * / ^ %% & | ● ~ ? : %/% %*% %o% %x% %in% < > == >= <= && || ● Element access: [[]] [] $ ● Function argument types: – symbol, symbol=default, ...
  • 13. Control Structures ● if, else ● for ● while ● repeat ● break, next, return
  • 14. Apply ● apply – apply functions over arrays ● lapply – apply functions over list / vector ● sapply – apply function to data frames ● tapply – apply function over ragged array ● mapply – apply function to multiple objects
  • 15. Functions ● Functions are objects ● Functional closure consists of: – Formal argument list – Function body (definition) – Environment ● Each of these can be assigned to ● Assign to environment can eliminate unwanted environment capture
  • 16. Packages ● CRAN (Comprehensive R Archive Network) – Main site, includes R download ● Bioconductor – Analysis of genomic data – Next generation high-throughput sequencing ● R-forge ● GitHub and Personal repositories
  • 17. Packages ● Analysis – Statistical analysis (stats, linprog) ● Linear (and general linear) modeling ● Tree models ● Analysis of variance – Machine learning (caret, kernlab) ● Clustering (forests, k-means, knn, etc) ● Training and predictions ● Cross validation and error analysis
  • 18. Packages ● Graphics – Base graphics ● Plot: plot, hist, ... ● Annotate: text, lines, points, axis, ... – lattice ● Single command: xyplot, bwplot, ... – ggplot2 ● Single command: qplot ● Defining objects: aesthetics, geoms ● Chain commands: ggplot, geom_*, ...
  • 19. Packages ● Data visualization – rCharts (GitHub), converts visualizations to Javascript (e.g. d3.js) http://www.google.com/trends/explore#q=R%20language%2C%20Data%20Visualization%2C%20D3.js%2C%20Processing.js&cmpt=q
  • 20. Tools ● Command line ● RStudio (can run on remote Linux server) ● RKWard ● R Commander (tcl/tk) ● JGR – Java (GUI for R) ● Rattle - RGtk2
  • 21. Tools ● Debugging – Print statements! – Interactive tools: ● traceback() – stack trace on error ● debug() – flags function for stepping ● browser() - pauses the function and enters the debugger ● trace() - insert trace statements ● recover() - modify error behavior (options(error = recover)), can browse call stack
  • 22. Tools ● Profiling – “We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil” – Donald Knuth – system.time() - CPU, wall times – Rprof() - use summaryRprof() to see results ● Do not use Rprof() and system.time() together ● Calls to C/Fortran libraries not profiled
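A hedged sketch of both profiling tools, used separately as the slide advises. `slow_sum` is a hypothetical, deliberately slow function, and the iteration counts are arbitrary; a very fast machine may collect few profiler samples.

```r
# A deliberately slow function to give the timers something to measure.
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# system.time(): user CPU, system CPU and elapsed (wall) time in seconds.
timing <- system.time(result <- slow_sum(1e6))
timing[["elapsed"]]

# Rprof(): sample the call stack to a file, then summarize it.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)              # start profiling
invisible(slow_sum(2e7))
Rprof(NULL)                   # stop profiling
summaryRprof(prof_file)$by.self
```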
  • 23. Data Exploration ● Script it! – If you can't repeat it, it didn't happen ● Get the data (ingest) – Functions to download, uncompress, unarchive, store, read, and organize ● Clean the data – Handle missing and incomplete data, impute values, identify outliers
  • 24. Data Exploration ● Look at the data (models, visualization) – Model – regressions (linear, logistic), clustering, ANOVA – Refine models and plot the result ● Look for systematic issues – unexpected trends, bias, unexplained variance, error estimates, residual analysis ● Explore complexity – number of explanatory factors – Plot the models ● What does it look like?
  • 25. Reproducible Research ● Allows others to validate the work ● Ensures that the results are accepted ● Reduces the chance of errors propagating – http://youtu.be/7gYIs7uYbMo – 2010 Anil Potti resigns from Duke after research was found flawed (off by 1!) ● Clinical trials based on the flawed research were finally cancelled ● Closed data, non-reproducible research exacerbated the problem
  • 26. Reproducible Research ● Don't do things by hand – especially editing spreadsheets to “clean up” data (removing outliers, validating, editing) or downloading files ● Actions taken by hand need very detailed documentation to reproduce – such as download sites and where the files were downloaded to ● GUIs are convenient, but can't be repeated
  • 27. Reproducible Research ● Capture the steps in a script: – download.file("http://...", "localfile.zip") ● Can be repeated as long as the link is available. Can keep and manage the downloaded file if that is an issue – Use version control ● Capture small steps at a time (git is good for this!) ● Can track changes and revert if needed ● Can use GitHub, BitBucket, SourceForge to publish the results as well
  • 28. Reproducible Research ● Capture environment – OS, tools, versions ● Don't save outputs – regenerate – Ok to cache results while in use, but don't store the results, just the code+data that produced it – If you keep intermediate files, document how they were created ● Set random seed
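Two of these points, setting the random seed and capturing the environment, take only a few lines. The seed value 1234 is an arbitrary choice for illustration.

```r
# Setting the seed makes "random" results repeatable.
set.seed(1234)
a <- rnorm(5)
set.seed(1234)
b <- rnorm(5)
identical(a, b)           # TRUE: same seed, same draws

# Record the environment (R version, OS, loaded packages) with the results.
info <- sessionInfo()
info$R.version$version.string
```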
  • 29. Sharing Research ● R Markdown – markdown with embedded R – knitr package executes the R fragments and embeds the code and results into markdown, which can convert to HTML or PDF – Literate programming! ● Hosted documentation – RPubs (rpubs.com) – GitHub gh-pages (github.io)
  • 30. Sharing Research ● Embedded presentations – Author using slidify package – R Markdown with embedded R code – Creates HTML5 presentation slide deck – Can include inline quizzes
  • 31. Data Products ● Interactive visualizations – shiny, shinyapps packages – RStudio includes interactive display of shiny applications during development – Generates bootstrap + HTML5 + JavaScript + d3 application ● Hosted! – Hosted at shinyapps.io – Private? Server images available (for purchase)