R in Production: the products
Yasmin Lucero, PhD
Senior Statistician, Gravity-AOL
useR! 2014
Outline
• Internal products
• 1. one-off analysis
• 2. automated reports
• 3. internal R packages
• 4. internal dashboards
• External products
• 1. customer facing web-app
• 2. analytical backend service
• Ops: managing an R environment
Internal Product 1: one-off analytical product
http://rpubs.com/nathanesau1/21383
Nathan Esau
Hilary Parker
Internal Product 2: automated reports
Thursday morning:
Automated Business Reporting with R (Zhengying (Doro) Lour)
R + bash + email
R + markdown + web server
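A minimal sketch of the "R + markdown" pattern, assuming a scheduler such as cron runs the script with `Rscript` each night; the bash/email half (for example piping the file to `mail`) is left out. `write_daily_report()`, the `sales` data and its `region`/`revenue` columns are hypothetical stand-ins.

```r
# Hypothetical report generator: aggregates revenue by region and writes
# a small Markdown report that a web server can serve or a script can email.
write_daily_report <- function(df, path, title = "Daily report") {
  totals <- aggregate(revenue ~ region, data = df, FUN = sum)
  lines <- c(
    paste("#", title),
    "",
    paste("Generated:", format(Sys.time(), "%Y-%m-%d %H:%M")),
    "",
    "| region | revenue |",
    "|---|---|",
    sprintf("| %s | %.2f |", as.character(totals$region), totals$revenue)
  )
  writeLines(lines, path)
  invisible(path)
}

# Toy input standing in for a nightly database extract (hypothetical data)
sales <- data.frame(region = c("east", "west", "east"),
                    revenue = c(10, 5, 2.5))
out <- write_daily_report(sales, file.path(tempdir(), "report.md"))
```

Keeping the report logic in a function makes it easy to test, and the cron entry stays a one-liner.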
Internal Product 3: the internal R package
• Data APIs
• Business-specific metrics
• Custom plotting functions
• Custom data manipulation utilities
Thursday morning:
An R tools platform in Cosmetic Industry (Jean-Francois Collin)
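A sketch of what one "business-specific metric" in an internal package might look like, documented with roxygen2 comments (`#'`, `@param`, `@return`, `@export`) so the help page is generated from the source. `conversion_rate()` and its definition are hypothetical examples, not a metric from the talk.

```r
#' Conversion rate (hypothetical business metric)
#'
#' @param converted Logical or 0/1 numeric vector, one element per visitor.
#' @param na.rm Drop missing values before computing the rate?
#' @return The proportion of visitors who converted.
#' @export
conversion_rate <- function(converted, na.rm = TRUE) {
  if (!is.logical(converted) && !is.numeric(converted)) {
    stop("`converted` must be logical or 0/1 numeric")
  }
  # One shared definition keeps every report and dashboard consistent
  mean(as.logical(converted), na.rm = na.rm)
}

conversion_rate(c(TRUE, FALSE, TRUE, NA))
```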
Internal Product 4: the internal dashboard
Gravity-AOL
External Product 1: customer-facing web app
Wednesday afternoon:
Rapid Prototyping with R/Shiny at McKinsey (Aaron Horowitz)
http://www.showmeshiny.com/
External Product 2: analytical back-end
Wednesday afternoon:
Deploying R into Business Intelligence and Real-time Applications (Louis Bajuk-Yorgan)
Zillow’s Big Data and Real-time Services in R (Yeng Bun)
[Figure: CARD.com architecture diagram. Components: Artwork & Brands, Bank Partner, Transactions, CARD.COM Site / App, CARD.COM AdTech Platform, APIs, RTB ad exchanges, CARD.COM Analytics Platform, Members, Visitors. Cycle labeled 1, 2, 3: predict, deploy, learn.]
Details: card.com/useR-2014
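One common shape for an R analytical back-end, sketched with base R only and loosely following the learn/predict labels in the CARD.com diagram: fit a model offline, serialise it with `saveRDS()`, and load it once in the long-running scoring process. How requests reach that process (Rserve, an HTTP wrapper, a queue) is out of scope; the `cars` model and the `score()` handler are hypothetical stand-ins, not the architecture described in the talks.

```r
# "Learn" (offline batch job): fit a model and serialise it to disk
fit <- lm(dist ~ speed, data = cars)
model_path <- file.path(tempdir(), "stopping_model.rds")
saveRDS(fit, model_path)

# "Predict" (serving process): load the model once at start-up ...
model <- readRDS(model_path)

# ... then score each incoming request; score() is a hypothetical handler
score <- function(request) {
  unname(predict(model, newdata = data.frame(speed = request$speed)))
}

score(list(speed = c(10, 20)))
```

Loading the model once, rather than per request, keeps latency down; retraining just means writing a new `.rds` file and restarting or reloading the service.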
More good example applications:
• http://blog.revolutionanalytics.com/2014/06/how-data-driven-companies-use-r-to-compete.html
Ops: Managing an R Environment
• Overall: not complex, but there are pain points:
• R library management
• CRAN, non-CRAN and internal packages
• Version management
• Dependency management (pulling all dependencies)
• Non-R dependencies (especially C++ and Java)
• Hardware specifications: How much RAM is enough?
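For the version-management pain point, a small base-R check can at least fail fast when a host's library has drifted. `check_min_versions()` and the version strings below are hypothetical; `sessionInfo()` is another base option for recording what a job actually ran with.

```r
# Hypothetical helper: does this host meet the minimum package versions the
# production code was tested against? Missing packages count as failures.
check_min_versions <- function(required) {
  vapply(names(required), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
    utils::packageVersion(pkg) >= package_version(required[[pkg]])
  }, logical(1))
}

check_min_versions(c(stats = "3.0.0", notARealPackage = "1.0"))

# Record the full library (package name + version) so two hosts can be diffed
snapshot <- installed.packages()[, c("Package", "Version")]
write.csv(snapshot, file.path(tempdir(), "r-library-snapshot.csv"),
          row.names = FALSE)
```

This only covers R packages; the non-R dependencies (C++ and Java toolchains) still need to be tracked separately.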
Conclusion: Why R?
• Plotting
• Rich analytical library
• More than a DSL: end-to-end functionality, from data APIs to web apps
• Solid IDE support
• Sturdy, stable, easy-to-support platform
• Rapid prototyping
yasmin.lucero@gmail.com
Thanks.
Tools: plotting
• Major frameworks
• Base graphics
• lattice
• ggplot2
• Useful utilities
• grid/gridExtra/gtable
• latticeExtra
• Color: RColorBrewer/munsell/colorspace/dichromat
• gplots (the ‘g’ school)
• plotrix
• Custom plots
• plot.ts
• maps
• igraph (network visualization)
• ggmap
• ggvis: interactive graphics
• rCharts: interactive graphics, wraps JavaScript libraries; not on CRAN yet (look on GitHub)
• rgl (3D)/scatterplot3d
• vcd (categorical data)
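A base-graphics sketch (no packages needed) of a plot produced by a production script: a simulated weekly series drawn with `plot()`, which dispatches to `plot.ts` for `ts` objects, rendered to a PDF file device so it works on a headless server. The data and file name are hypothetical; lattice or ggplot2 versions would look different but follow the same open-device/plot/close pattern.

```r
# Simulated weekly series (hypothetical data)
set.seed(42)
visits <- ts(100 + cumsum(rnorm(52)), start = c(2014, 1), frequency = 52)

# Render to a file device so the script also works without a display
out_file <- file.path(tempdir(), "weekly_visits.pdf")
pdf(out_file, width = 7, height = 4)
plot(visits, main = "Weekly visits (simulated)", xlab = "Year", ylab = "Visits")
abline(h = mean(visits), lty = 2, col = "grey40")  # dashed reference line at the mean
invisible(dev.off())
```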
Tools: data manipulation
• Base R features
• Data structures: the data.frame
• Vectorized data manipulation: apply, tapply, lapply…
• Data structures: ts
• Comprehensive, elegant missing data handling (NA)
• Packages
• Wickham school: reshape2/plyr/dplyr/tidyr
• data.table
• Time series: zoo, xts, lubridate
• Spatial data tools: sp/maptools
• The ‘G’ school: gdata
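A short illustration of the base R features listed above: `tapply()` for grouped summaries, `split()` plus `lapply()` for arbitrary per-group work, and explicit `NA` handling. The `orders` data is a made-up example.

```r
# Toy order data (hypothetical) with one missing amount
orders <- data.frame(
  customer = c("a", "b", "a", "c", "b"),
  amount   = c(10, 20, NA, 5, 15)
)

# tapply(): one summary per group; na.rm = TRUE is passed on to sum()
totals <- tapply(orders$amount, orders$customer, sum, na.rm = TRUE)

# split() + lapply(): apply any function to each group's rows
orders_per_customer <- lapply(split(orders, orders$customer), nrow)

# NA propagates through arithmetic unless it is handled explicitly
sum(orders$amount)                # NA
sum(orders$amount, na.rm = TRUE)  # 50
```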
Tools: Data interfaces
• Connections: read.table(); url()
• DBI: RPostgreSQL; RMySQL; RSQLite; …
• RODBC; RJDBC (e.g., Vertica, Redshift)
• Native: rredis; rmongodb; prestodb; RCassandra; RHadoop; …
• yaml, XML, rjson, RJSONIO
• MS Excel: xlsx, XLConnect
• SAS, SYSTAT, SPSS, Stata, …: foreign
• RCurl
• RProtoBuf: Efficient cross-language data serialization in R
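`read.table()` works on any connection, which is what makes the "Connections" bullet useful: the same call reads a file, a `url()`, a `pipe()` or, as in this sketch, an in-memory `textConnection()`. The CSV content is hypothetical. With the DBI packages installed, an equivalent data.frame would come from `DBI::dbGetQuery(con, sql)`; that is left out so the sketch runs on base R alone.

```r
# In-memory text stands in for a file or url() connection (hypothetical data)
con <- textConnection(c(
  "date,visits",
  "2014-06-30,120",
  "2014-07-01,135",
  "2014-07-02,128"
))

# colClasses converts columns on the way in, including to Date
visits <- read.table(con, header = TRUE, sep = ",",
                     colClasses = c("Date", "integer"))
close(con)

str(visits)
```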
Tools: Package development
• Package development:
• package.skeleton(); tools (base package)
• pkgKitten (CRAN): improvements to package.skeleton
• devtools (CRAN): miscellaneous and very useful tools
• gtools: various R programming tools
• roxygen2 (CRAN): literate documentation
• testthat/testR: unit testing
• IDEs: RStudio, Eclipse (StatET), Tinn-R, Emacs ESS, …
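The quickest route to an internal package with base tools alone is `utils::package.skeleton()`, which turns objects in an environment into a package directory. A sketch, assuming a hypothetical `clean_ids()` helper and package name `internaltools`; devtools, pkgKitten and roxygen2 streamline the later steps.

```r
# A hypothetical utility that several analysts keep copy-pasting
clean_ids <- function(x) toupper(trimws(x))

pkg_root <- tempdir()
# Creates <pkg_root>/internaltools with DESCRIPTION, NAMESPACE, R/ and man/
utils::package.skeleton(name = "internaltools", list = "clean_ids",
                        environment = environment(),
                        path = pkg_root, force = TRUE)

# Next steps, from a shell, after editing DESCRIPTION and the man/ pages:
#   R CMD build internaltools
#   R CMD INSTALL internaltools   (installs straight from the source directory)
```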
Tools: Web development & reporting
• Shiny
• Interactive documents
• knitr
• Sweave
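Sweave ships with base R (in utils), so a reproducible report needs no extra packages. A minimal sketch that writes a hypothetical `.Rnw` file using the built-in `cars` data and weaves it to LaTeX; running pdflatex on the result is the next step and is omitted. knitr can process the same `.Rnw` format with `knitr::knit()`. Shiny apps need the shiny package and are not shown.

```r
rnw <- file.path(tempdir(), "report.Rnw")
tex <- file.path(tempdir(), "report.tex")

# A tiny Sweave document: one hidden-code chunk and one inline \Sexpr
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<summary, echo=FALSE>>=",
  "summary(cars$speed)",
  "@",
  "The mean stopping distance is \\Sexpr{round(mean(cars$dist), 1)} feet.",
  "\\end{document}"
), rnw)

# Evaluate the R code and write the woven LaTeX to `tex`
utils::Sweave(rnw, output = tex, quiet = TRUE)
```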
Tools: parallel computing
• parallel: base R package (since R 2.14.0) that collects features formerly spread across packages such as multicore and snow
• Revolution Analytics
• Map-Reduce: rmr/RHadoop
• H2O (0xdata)
• SparkR (not on CRAN yet; look on GitHub)
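A sketch of the base parallel package on an embarrassingly parallel job: bootstrap replicates of a sample mean, each seeded inside the task so results do not depend on which worker runs it. `boot_mean()` and the data are hypothetical. `mclapply()` forks processes, which Windows does not support, so the sketch falls back to one core there; `makeCluster()` with `parLapply()` is the portable socket-based alternative.

```r
library(parallel)

# Hypothetical CPU-bound task: one bootstrap replicate of the sample mean.
# Seeding inside the task makes each replicate reproducible on any worker.
boot_mean <- function(seed, x) {
  set.seed(seed)
  mean(sample(x, replace = TRUE))
}

set.seed(1)
x <- rnorm(1000)

# Forking is unavailable on Windows, where mc.cores must be 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L
boots <- unlist(mclapply(1:200, boot_mean, x = x, mc.cores = cores))

quantile(boots, c(0.025, 0.975))  # percentile bootstrap interval
```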
Tools: big or out of memory computing
• dplyr: supports database-backed data structures
• ff: supports file-based data
• biglm: bounded-memory regression; bigmemory: shared-memory matrices
• HadoopStreaming
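The idea behind bounded-memory tools such as biglm (which fits a regression chunk by chunk, adding data with `update()`) can be shown in base R: read a large file through an open connection a fixed number of rows at a time and keep only running totals. A sketch with a hypothetical `chunked_mean()` and a toy CSV standing in for a file too large to load at once.

```r
chunked_mean <- function(path, chunk_size = 2500L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  readLines(con, n = 1)            # consume the header line
  total <- 0
  n <- 0
  repeat {
    # Each call continues from the connection's current position
    chunk <- tryCatch(
      read.csv(con, header = FALSE, nrows = chunk_size, col.names = "value"),
      error = function(e) NULL     # treat a read error at end of input as "done"
    )
    if (is.null(chunk) || nrow(chunk) == 0) break
    total <- total + sum(chunk$value)
    n <- n + nrow(chunk)
  }
  total / n
}

path <- tempfile(fileext = ".csv")
write.csv(data.frame(value = 1:10000), path, row.names = FALSE)
chunked_mean(path)
```

Only one chunk is in memory at a time, so peak usage depends on `chunk_size`, not on the file size.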
Tools: memory profiling
• lineprof
• profr
• proftools
• object.size()
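A quick base-R look at `object.size()`: the same million values take about twice the memory as doubles as they do as integers, and a data.frame's size includes its columns. Caveat from the R docs: `object.size()` does not detect memory shared between objects, so summing sizes can overstate real usage. The vectors here are arbitrary examples.

```r
set.seed(1)
ints <- sample.int(1e6)          # integer vector: 4 bytes per element
dbls <- as.numeric(ints)         # double vector: 8 bytes per element
df   <- data.frame(id = ints, value = dbls)

sizes <- vapply(list(ints = ints, dbls = dbls, df = df),
                function(obj) as.numeric(object.size(obj)), numeric(1))
sizes / 2^20                     # approximate sizes in MB
format(object.size(df), units = "MB")
```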


Speaker notes

  1. Introduce self. State the goal of the presentation: an overview of the ways that R is being used. Define ‘product’ for the non-business folks (a deliverable).
  2. Bread and butter for many; everyone does some of this, and even non-primary R users often turn to R for it. Why R: R has always tried to be a platform for statistical analysis.
  3. R fits neatly into this kind of pipeline; there are useful command-line utilities.
  4. This product is basically an extension of the automated reporting idea.