Revolution Confidential
Revolution Analytics
R and Data Science
Joseph B Rickert
September 25, 2014
What is R?
• Most widely used data analysis software
• Used by 2M+ data scientists, statisticians and analysts
• Most powerful statistical programming language
• Flexible, extensible and comprehensive for productivity
• Platform for beautiful and unique data visualizations
• As seen in the New York Times, Twitter and FlowingData
• Thriving open-source community
• Leading edge of analytics research
www.revolutionanalytics.com/what-r
OPEN SOURCE R
R’s popularity is growing rapidly
• R usage growth: Rexer Data Miner Survey, 2007-2013
• Language popularity: R ranked #9 in IEEE Spectrum's Top Programming Languages (July 2014)
Poll Question #1
What statistical programming languages/platforms are you most familiar with? (Choose all that apply.)
A) R
B) SAS
C) SPSS
D) KXEN
E) Statistica
Tools for Data Science
Source: O’Reilly Data Science Survey
R is among the highest-paid IT skills in the US
Sources: Dice Tech Salary Survey, January 2014; O’Reilly Strata 2013 Data Science Salary Survey
Photo by Ksayer1 on Flickr.
Why R for Data Science?
Algorithms
```r
# Excerpt from the body of stats::glm(), R's generalized linear model function.
# Not runnable on its own: mt, mf, Y, method, family, control, etc. are set up
# earlier inside glm().

# Build the design matrix (an empty matrix if the model has no terms)
X <- if (!is.empty.model(mt))
    model.matrix(mt, mf, contrasts)
else matrix(, NROW(Y), 0L)

# Validate prior weights
weights <- as.vector(model.weights(mf))
if (!is.null(weights) && !is.numeric(weights))
    stop("'weights' must be a numeric vector")
if (!is.null(weights) && any(weights < 0))
    stop("negative weights not allowed")

# Validate the offset: exactly one value per observation
offset <- as.vector(model.offset(mf))
if (!is.null(offset)) {
    if (length(offset) != NROW(Y))
        stop(gettextf("number of offsets is %d should equal %d (number of observations)",
                      length(offset), NROW(Y)), domain = NA)
}
mustart <- model.extract(mf, "mustart")
etastart <- model.extract(mf, "etastart")

# Dispatch to the fitting function (glm.fit by default)
fit <- eval(call(if (is.function(method)) "method" else method,
    x = X, y = Y, weights = weights, start = start, etastart = etastart,
    mustart = mustart, offset = offset, family = family,
    control = control, intercept = attr(mt, "intercept") > 0L))

# With both an offset and an intercept, refit an intercept-only model
# to obtain the null deviance
if (length(offset) && attr(mt, "intercept") > 0L) {
    fit2 <- eval(call(if (is.function(method)) "method" else method,
        x = X[, "(Intercept)", drop = FALSE], y = Y, weights = weights,
        offset = offset, family = family, control = control,
        intercept = TRUE))
    if (!fit2$converged)
        warning("fitting to calculate the null deviance did not converge -- increase 'maxit'?")
    fit$null.deviance <- fit2$deviance
}

# Attach optional components and metadata, then set the class
if (model)
    fit$model <- mf
fit$na.action <- attr(mf, "na.action")
if (x)
    fit$x <- X
if (!y)
    fit$y <- NULL
fit <- c(fit, list(call = call, formula = formula, terms = mt,
    data = data, offset = offset, control = control, method = method,
    contrasts = attr(X, "contrasts"), xlevels = .getXlevels(mt, mf)))
class(fit) <- c(fit$class, c("glm", "lm"))
fit
```
Task Views
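The glm() excerpt on the Algorithms slide is internal plumbing. One way to exercise it from user code is a Poisson regression with an exposure offset, which passes through both the offset check and the intercept-only refit used for the null deviance. This is a minimal sketch on simulated data: the variable names (`exposure`, `x`, `counts`) and the true coefficients 0.5 and 0.3 are illustrative assumptions, not taken from the talk.

```r
# Simulated count data with varying exposure (all values hypothetical)
set.seed(1)
n <- 200
exposure <- runif(n, 1, 10)                  # e.g. observation time per unit
x <- rnorm(n)                                # a single covariate
counts <- rpois(n, lambda = exposure * exp(0.5 + 0.3 * x))
d <- data.frame(counts, x, exposure)

# offset(log(exposure)) is picked up by model.offset(mf) inside glm()
fit <- glm(counts ~ x + offset(log(exposure)),
           family = poisson(link = "log"), data = d)

summary(fit)$coefficients   # estimates should land near 0.5 and 0.3
fit$null.deviance           # from the intercept-only refit (fit2), since an offset is present
class(fit)                  # c(fit$class, "glm", "lm"); glm.fit sets no class of its own
```

Because the offset enters the linear predictor with a fixed coefficient of 1, the fitted coefficients describe rates per unit of exposure rather than raw counts.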
R Growth
Put this astonishing growth in perspective:
• SAS v9.3 contains ~1,200 commands that are roughly equivalent to R functions
• R packages contain a median of 5 functions
• Therefore R has ~36,820 functions
• During 2013 alone, R added more functions than SAS Institute has written in its entire history!
(Bob Muenchen)
5882 packages as of 9/25/14
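The growth estimate above is a product: number of packages times median functions per package. A quick check in R, using only the numbers on the slide: dividing ~36,820 by 5 implies about 7,364 packages, which is more than the 5,882 shown for 9/25/14, so the estimate appears to rest on a different package count that the slide does not state. Multiplying a count by a median is also only a rough proxy for a total (the mean would be the correct multiplier), so both results below are order-of-magnitude figures.

```r
# Figures taken from the slide
median_funs_per_pkg <- 5
claimed_total       <- 36820
sas_commands        <- 1200
pkgs_on_slide       <- 5882

# Package count implied by the ~36,820 estimate
implied_packages <- claimed_total / median_funs_per_pkg
implied_packages                      # 7364

# The same estimate using the 5,882-package count shown on the slide
est_from_slide_count <- pkgs_on_slide * median_funs_per_pkg
est_from_slide_count                  # 29410

# Either way, the estimate is many times SAS's ~1,200 commands
est_from_slide_count / sas_commands
```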
Why R for Data Science?
Visualizations
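As a small illustration of the visualization point, the sketch below draws a scatter plot with a least-squares line using only base R graphics and the built-in `mtcars` data. It writes to a PDF in a temporary directory so that it also runs on a machine without a display; the file name is a hypothetical choice.

```r
# Open a PDF graphics device (works without a screen)
out_file <- file.path(tempdir(), "mtcars_scatter.pdf")   # hypothetical output path
pdf(out_file, width = 6, height = 4)

# Scatter plot of fuel economy against weight (wt is in 1000 lbs)
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight (mtcars)", pch = 19)

# Overlay the ordinary least-squares fit
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)

invisible(dev.off())    # close the device, which finishes writing the file
file.exists(out_file)
```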
Why R for Data Science?
Programming
• Scripting
• Functional programming
• Parallel programming
• Data structures
• Objects
• Data types
• Regular expressions
• Data connections
• Interfaces to other languages
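A few of the programming features listed on the Programming slide, sketched in base R. The `account` class, its `print` method, and the sample strings and CSV rows are hypothetical examples made up for illustration; parallel programming and interfaces to other languages are not covered here.

```r
# --- Functional programming: functions are ordinary values ---
squares <- vapply(1:5, function(i) i^2, numeric(1))      # 1 4 9 16 25
evens   <- Filter(function(i) i %% 2 == 0, 1:10)          # 2 4 6 8 10
total   <- Reduce(`+`, 1:10)                              # 55
pairs   <- Map(function(a, b) paste(a, b), c("x", "y"), c("1", "2"))

# --- Data structures and objects: a list tagged with a (hypothetical) S3 class ---
make_account <- function(owner, balance) {
  structure(list(owner = owner, balance = balance), class = "account")
}
print.account <- function(x, ...) {          # S3 method found by print()
  cat("Account of", x$owner, "with balance", format(x$balance), "\n")
  invisible(x)
}
acct <- make_account("Ada", 100)
print(acct)

# --- Regular expressions: extract every run of digits ---
txt  <- "Order 42: 3 items, total 19 dollars"
nums <- regmatches(txt, gregexpr("[0-9]+", txt))[[1]]    # "42" "3" "19"

# --- Data connections: read CSV lines through a text connection ---
con <- textConnection(c("id,value", "1,2.5", "2,3.5", "3,4.0"))
df  <- read.csv(con)
close(con)
mean(df$value)
```

The same `read.csv()` call works on file and URL connections, which is what makes the connection abstraction useful: the reading code does not care where the bytes come from.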
Why R for Data Science?
Data Manipulation
“It's often said that 80% of the effort of analysis is spent just getting the data ready to analyse, the process of data cleaning. Data cleaning is not only a vital first step, but it is often repeated multiple times over the course of an analysis as new problems come to light.” (Hadley Wickham, Tidy Data)
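A common cleaning step is reshaping a "wide" table (one column per year) into the "long" layout that Wickham calls tidy, with one row per observation. A minimal sketch using base R's `stats::reshape()` on a made-up three-country table; the column names and values are hypothetical.

```r
# Wide layout: one row per country, one column per year (hypothetical data)
wide <- data.frame(
  country = c("A", "B", "C"),
  y2013   = c(10, 20, 30),
  y2014   = c(12, 25, NA)      # a missing value to clean up
)

# Long layout: one row per country-year observation
long <- reshape(wide,
                direction = "long",
                varying   = c("y2013", "y2014"),
                v.names   = "value",
                timevar   = "year",
                times     = c(2013, 2014),
                idvar     = "country")

# Typical follow-up cleaning: drop incomplete rows, sort, reset row names
long <- long[complete.cases(long), ]
long <- long[order(long$country, long$year), ]
rownames(long) <- NULL
long
```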
Why R for Data Science?
R Integrates
• Web applications
• Internet graphics
  - D3
  - Plotly
• Other languages
  - C, C++
  - Java
• BI tools
• Databases
  - SQL
  - MongoDB
Poll Question #2
What data platforms do you connect to regularly? (Choose all that apply.)
A) Hadoop
B) Spark
C) Cloud-based (Azure/AWS/Google)
D) Data warehouses
E) Servers (grid or cluster)
Why R for Data Science?
R Scales
• Hadoop
• Servers & clusters
• Data warehouses
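Connecting R to Hadoop or a data warehouse needs additional packages or products that are not shown here. On a single machine, the `parallel` package that ships with R illustrates the same split-apply-combine idea behind "servers & clusters": `makeCluster()` starts worker processes, `parLapply()` farms tasks out to them, and `stopCluster()` shuts them down. The bootstrap task and the choice of two workers are assumptions for illustration.

```r
library(parallel)

# Hypothetical unit of work: one bootstrap estimate of a mean.
# Each task seeds itself, so results do not depend on which worker runs it.
boot_mean <- function(seed, n = 1000) {
  set.seed(seed)
  x <- rnorm(n, mean = 5)
  mean(sample(x, replace = TRUE))
}

cl  <- makeCluster(2)                     # two local worker processes
res <- parLapply(cl, 1:20, boot_mean)     # distribute 20 tasks across the workers
stopCluster(cl)

estimates <- unlist(res)
mean(estimates)                           # should be close to 5
```

Seeding inside each task is a simple design choice that makes the parallel run reproduce a sequential `lapply(1:20, boot_mean)` exactly; for large jobs, `clusterSetRNGStream()` in the same package is the more usual way to give workers independent random streams.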
Poll Question #3
What types of models do you work with most? (Choose all that apply.)
A) Linear models / Regression / GLM
B) Decision Trees / Random Forests
C) Survival Models
D) GBM
E) Time Series models
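Several of the model families in this poll can be fitted with functions in R's base stats package. Decision trees and survival models come from the rpart and survival packages, which are normally installed with R as recommended packages, so the sketch checks for them before use. The datasets are R's built-in examples and the model specifications are illustrative choices; GBM and random forests need contributed CRAN packages and are omitted.

```r
# Linear regression: fuel economy on weight and horsepower
lm_fit <- lm(mpg ~ wt + hp, data = mtcars)

# Logistic regression (a GLM): manual transmission (am = 1) vs. weight
glm_fit <- glm(am ~ wt, family = binomial, data = mtcars)

# Time series: seasonal ARIMA(0,1,1)(0,1,1)[12] on log airline passengers
ts_fit <- arima(log(AirPassengers), order = c(0, 1, 1),
                seasonal = list(order = c(0, 1, 1), period = 12))
ts_fc  <- predict(ts_fit, n.ahead = 12)   # 12-month forecast on the log scale
exp(ts_fc$pred)                           # back-transformed point forecasts

# Decision tree, if rpart is available
if (requireNamespace("rpart", quietly = TRUE)) {
  tree_fit <- rpart::rpart(Species ~ ., data = iris)
}

# Cox proportional hazards model, if survival is available
if (requireNamespace("survival", quietly = TRUE)) {
  cox_fit <- survival::coxph(survival::Surv(time, status) ~ age,
                             data = survival::lung)
}
```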
Let’s look at some code.
www.revolutionanalytics.com 
1.855.GET.REVO 
Twitter: @RevolutionR
Why is R Right for Data Science?
• R is open source
• R is a powerful language
  - Data manipulation
  - Computational statistics
  - Machine learning
• R is an innovation engine
• R has a rich and expanding ecosystem
Q&A / Resources
R Code and Markdown Files 
https://github.com/joseph-rickert/DataScienceRWebinar 
What is R? 
revolutionanalytics.com/what-is-r 
Companies using R 
revolutionanalytics.com/companies-using-r 
AcademyR training 
revolutionanalytics.com/AcademyR 
AcademyR Certification 
revolutionanalytics.com/AcademyR-certification 
Contact Revolution Analytics 
revolutionanalytics.com/contact-us
Thank you 
Revolution Analytics is the leading commercial provider of software and support for the popular open source R statistics language.

From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

R and Data Science

  • 1. Revolution Confidential. Revolution Analytics: R and Data Science. Joseph B Rickert, September 25, 2014
  • 2. What is R?
      • Most widely used data analysis software
      • Used by 2M+ data scientists, statisticians and analysts
      • Most powerful statistical programming language
      • Flexible, extensible and comprehensive for productivity
      • Platform for beautiful and unique data visualizations
      • As seen in the New York Times, Twitter and Flowing Data
      • Thriving open-source community
      • Leading edge of analytics research
      Source: www.revolutionanalytics.com/what-r
  • 4. R’s popularity is growing rapidly
      • R usage growth: Rexer Data Miner Survey, 2007-2013
      • Language popularity: R ranks #9 in IEEE Spectrum's Top Programming Languages (July 2014)
  • 5. Poll Question #1: What are the statistical programming languages/platforms you are most familiar with? (Choose all that apply.)
      A) R
      B) SAS
      C) SPSS
      D) KXEN
      E) Statistica
  • 6. Tools for Data Science (source: O’Reilly Data Science Survey)
  • 7. R is among the highest-paid IT skills in the US (sources: Dice Tech Salary Survey, January 2014; O’Reilly Strata 2013 Data Science Salary Survey)
  • 8. Photo by Ksayer1 on Flickr.
  • 9. Why R for Data Science? Algorithms and Task Views. The slide shows an excerpt from the source of R's glm() function:

      X <- if (!is.empty.model(mt))
          model.matrix(mt, mf, contrasts)
      else matrix(, NROW(Y), 0L)
      weights <- as.vector(model.weights(mf))
      if (!is.null(weights) && !is.numeric(weights))
          stop("'weights' must be a numeric vector")
      if (!is.null(weights) && any(weights < 0))
          stop("negative weights not allowed")
      offset <- as.vector(model.offset(mf))
      if (!is.null(offset)) {
          if (length(offset) != NROW(Y))
              stop(gettextf("number of offsets is %d should equal %d (number of observations)",
                            length(offset), NROW(Y)), domain = NA)
      }
      mustart <- model.extract(mf, "mustart")
      etastart <- model.extract(mf, "etastart")
      fit <- eval(call(if (is.function(method)) "method" else method,
                       x = X, y = Y, weights = weights, start = start,
                       etastart = etastart, mustart = mustart,
                       offset = offset, family = family, control = control,
                       intercept = attr(mt, "intercept") > 0L))
      if (length(offset) && attr(mt, "intercept") > 0L) {
          fit2 <- eval(call(if (is.function(method)) "method" else method,
                            x = X[, "(Intercept)", drop = FALSE], y = Y,
                            weights = weights, offset = offset, family = family,
                            control = control, intercept = TRUE))
          if (!fit2$converged)
              warning("fitting to calculate the null deviance did not converge -- increase 'maxit'?")
          fit$null.deviance <- fit2$deviance
      }
      if (model) fit$model <- mf
      fit$na.action <- attr(mf, "na.action")
      if (x) fit$x <- X
      if (!y) fit$y <- NULL
      fit <- c(fit, list(call = call, formula = formula, terms = mt,
                         data = data, offset = offset,
                         control = control, method = method,
                         contrasts = attr(X, "contrasts"),
                         xlevels = .getXlevels(mt, mf)))
      class(fit) <- c(fit$class, c("glm", "lm"))
      fit
  • 10. R Growth
      Put this astonishing growth in perspective:
      • SAS v9.3 contains ~1,200 commands that are roughly equivalent to R functions
      • R packages contain a median of 5 functions
      • Therefore R has ~36,820 functions
      • During 2013 alone, R added more functions than SAS Institute has written in its entire history!
      (Bob Muenchen; 5,882 packages as of 9/25/14)
  • 11. Why R for Data Science? Visualizations
  • 12. Why R for Data Science?
      • Scripting
      • Functional programming
      • Parallel programming
      • Data structures
      • Objects
      • Data types
      • Regular expressions
      • Data connections
      • Interfaces to other programming languages
  • 13. Why R for Data Science? Data Manipulation
      "It's often said that 80% of the effort of analysis is spent just getting the data ready to analyse, the process of data cleaning. Data cleaning is not only a vital first step, but it is often repeated multiple times over the course of an analysis as new problems come to light." (Hadley Wickham, Tidy Data)
  • 14. Why R for Data Science? R Integrates
      • Web applications
      • Internet graphics
          • D3
          • Plotly
      • Other languages
          • C, C++
          • Java
      • BI tools
      • Databases
          • SQL
          • MongoDB
  • 15. Poll Question #2: What are the data platforms that you are connecting to regularly? (Choose all that apply.)
      A) Hadoop
      B) Spark
      C) Cloud-based (Azure/AWS/Google)
      D) Data warehouses
      E) Servers (grid or cluster)
  • 16. Why R for Data Science? R Scales
      • Hadoop
      • Servers & clusters
      • Data warehouses
  • 17. Poll Question #3: What are the types of models that you are working with most? (Choose all that apply.)
      A) Linear models / Regression / GLM
      B) Decision trees / Random forests
      C) Survival models
      D) GBM
      E) Time series models
  • 18. Let’s look at some code. (www.revolutionanalytics.com | 1.855.GET.REVO | Twitter: @RevolutionR)
  • 19. Why is R Right for Data Science?
      • R is open source
      • R is a powerful language
          • Data manipulation
          • Computational statistics
          • Machine learning
      • R is an innovation engine
      • R has a rich and expanding ecosystem
  • 20. Q&A / Resources
      • R code and Markdown files: https://github.com/joseph-rickert/DataScienceRWebinar
      • What is R?: revolutionanalytics.com/what-is-r
      • Companies using R: revolutionanalytics.com/companies-using-r
      • AcademyR training: revolutionanalytics.com/AcademyR
      • AcademyR certification: revolutionanalytics.com/AcademyR-certification
      • Contact Revolution Analytics: revolutionanalytics.com/contact-us
  • 21. Thank you. Revolution Analytics is the leading commercial provider of software and support for the popular open source R statistics language. (www.revolutionanalytics.com | 1.855.GET.REVO | Twitter: @RevolutionR)
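Slide 9 shows part of the source of R's glm() function. The sketch below calls glm() itself on the built-in mtcars dataset to fit a logistic regression of transmission type (am) on weight (wt). The dataset, the formula and the 3,000 lb example car are illustrative choices, not material from the webinar.

```r
# Illustrative only: logistic regression with stats::glm on built-in data.
# am = transmission (0 = automatic, 1 = manual), wt = weight in 1000 lbs.
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# The class vector assigned at the end of the slide 9 excerpt.
print(class(fit))

# Coefficients are on the log-odds scale; 'converged' reports whether the
# iteratively reweighted least squares fit finished within its iteration limit.
print(coef(fit))
print(fit$converged)

# Predicted probability of a manual transmission for a hypothetical 3,000 lb car.
p_manual <- predict(fit, newdata = data.frame(wt = 3), type = "response")
print(p_manual)
```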
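Slide 11 presents R visualizations as images only. As a minimal base-graphics sketch, assuming nothing beyond a standard R installation, the following writes a scatter plot with a fitted least-squares line to a temporary PDF file. The variables plotted and the output location are illustrative.

```r
# Write a scatter plot with a fitted regression line to a temporary PDF file.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)

plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight (mtcars)", pch = 19)

# Overlay the least-squares line from a simple linear model.
abline(lm(mpg ~ wt, data = mtcars), col = "red", lwd = 2)

invisible(dev.off())   # close the PDF device so the file is flushed to disk
cat("Plot written to", out_file, "\n")
```

The PDF device is used here because it is available in every R build; interactive or web graphics would need other devices or packages.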
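Slide 12 lists language features rather than showing them. A short base R sketch touching three of them (functional programming, regular expressions, and data structures/objects via an S3 class) might look like this. Every name in it, including the `run_summary` class and the sample log line, is made up for illustration.

```r
# Functional programming: functions are values that can be passed around.
square  <- function(x) x^2
squares <- unlist(Map(square, 1:5))                 # c(1, 4, 9, 16, 25)
evens   <- Filter(function(x) x %% 2 == 0, squares) # c(4, 16)
total   <- Reduce(`+`, squares)                     # 55

# Regular expressions: extract every (possibly decimal) number from a string.
log_line <- "fit converged in 7 iterations, deviance 29.68"
matches  <- regmatches(log_line, gregexpr("[0-9]+(\\.[0-9]+)?", log_line))[[1]]
numbers  <- as.numeric(matches)                     # c(7, 29.68)

# Data structures and objects: a list tagged with an S3 class,
# plus a print method that R dispatches to automatically.
summary_obj <- structure(list(n = length(squares), total = total),
                         class = "run_summary")
print.run_summary <- function(x, ...) {
  cat("run_summary: n =", x$n, "total =", x$total, "\n")
  invisible(x)
}

print(evens)
print(numbers)
print(summary_obj)
```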
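The Wickham quote on slide 13 describes cleaning data and then getting it into a tidy shape. A base R sketch of both steps on a made-up table: trim stray whitespace, turn a sentinel string into NA, convert text to numbers, and then use stats::reshape() to go from one column per year to one row per site and year. The column names and the "n/a" sentinel are assumptions for the example.

```r
# Toy wide table with typical problems: stray whitespace, numbers stored as
# text, and a sentinel string for missing values. Values are made up.
raw <- data.frame(
  site  = c(" A", "B ", "C"),
  y2013 = c("10", "20", "n/a"),
  y2014 = c("12", "n/a", "31"),
  stringsAsFactors = FALSE
)

# Cleaning: trim whitespace, map the sentinel to NA, convert text to numbers.
clean <- raw
clean$site <- trimws(clean$site)
for (col in c("y2013", "y2014")) {
  vals <- clean[[col]]
  vals[vals == "n/a"] <- NA
  clean[[col]] <- as.numeric(vals)
}

# Tidying: one row per (site, year) observation instead of one column per year.
tidy <- reshape(clean,
                direction = "long",
                varying   = c("y2013", "y2014"),
                v.names   = "value",
                timevar   = "year",
                times     = c(2013, 2014),
                idvar     = "site")
rownames(tidy) <- NULL
print(tidy)
```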
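Slide 14 lists SQL databases among the systems R integrates with. One common route, shown here as an assumption rather than as what the webinar demonstrated, is the DBI interface with the RSQLite driver. The sketch requires both packages to be installed and uses an in-memory database so nothing is written to disk.

```r
# Assumes DBI and RSQLite are installed: install.packages(c("DBI", "RSQLite"))
library(DBI)

# Open an in-memory SQLite database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a built-in data frame into a table, then summarize it with SQL.
dbWriteTable(con, "mtcars", mtcars)
by_cyl <- dbGetQuery(con, "
  SELECT cyl, COUNT(*) AS n, AVG(mpg) AS avg_mpg
  FROM mtcars
  GROUP BY cyl
  ORDER BY cyl")
print(by_cyl)

# Release the connection when finished.
dbDisconnect(con)
```

Because DBI separates the interface from the driver, the same dbGetQuery() call should work against other back ends once a suitable driver package is swapped in for RSQLite.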
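Slides 12 and 16 mention parallel programming and scaling out to servers and clusters. The parallel package that ships with R gives a small-scale taste of this. The sketch spreads 20 independent k-means starts over two local worker processes and keeps the best solution. The task, the worker count and the helper name `one_start` are illustrative assumptions; Hadoop and data-warehouse back ends are not covered here.

```r
library(parallel)

# Hypothetical helper: one k-means run from a single random start.
# Runs are independent, so they can be farmed out to worker processes.
one_start <- function(seed, data, k) {
  set.seed(seed)
  stats::kmeans(data, centers = k, nstart = 1)
}

X <- scale(mtcars)   # standardize columns so no single variable dominates

# Start a local PSOCK cluster with two worker R processes.
cl <- makeCluster(2)
fits <- parLapply(cl, 1:20, one_start, data = X, k = 3)
stopCluster(cl)

# Keep the solution with the smallest total within-cluster sum of squares.
wss  <- vapply(fits, function(f) f$tot.withinss, numeric(1))
best <- fits[[which.min(wss)]]
print(table(best$cluster))
```

Seeding inside each task makes every start reproducible regardless of which worker runs it. makeCluster() also accepts a character vector of host names, which is one way the same code could be pointed at other machines.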
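Slide 19 lists computational statistics among R's strengths. A nonparametric bootstrap is a compact example: resample the data with replacement many times, recompute the statistic each time, and read a percentile confidence interval off the resulting distribution. The sketch uses mtcars$mpg, B = 2000 resamples and a 95% interval purely as illustrative choices.

```r
# Nonparametric bootstrap percentile interval for the median of mpg.
set.seed(42)                 # make the resampling reproducible
x <- mtcars$mpg
B <- 2000                    # number of bootstrap resamples (illustrative)

# Resample with replacement and recompute the median B times.
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))

# 95% percentile interval from the bootstrap distribution.
ci <- quantile(boot_medians, probs = c(0.025, 0.975))
print(median(x))
print(ci)
```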

Editor's Notes

  1. Image reference: http://www.facebook.com/notes/facebook-engineering/visualizing-friendships/469716398919
  2. Dice Tech Salary Survey, January 2014 O’Reilly Strata 2013 Data Science Salary Survey