RRE: Faster than SAS
Results from Benchmarking
Thomas W. Dinsmore, Revolution Analytics
John Wallace, DataSong
Polling Question
Do you currently use:
– A) R or Revolution R Enterprise (RRE)
– B) SAS
– C) Both
– D) Neither
Benchmarking RRE vs. SAS
Background
Approach
Results
Discussion
Revolution R Enterprise
Open source R
Commercially supported distribution
Enhanced for enterprise use:
– Scalable analytics
– Developer tools
– Integration tools
– Deployment tools
2012: Allstate Benchmark
Poisson Regression, 150MM rows
Runtime, Minutes:
– SAS PROC GENMOD (16 cores): 300
– RRE (20 cores): 6
Criticism: “Apples to Oranges”
Most SAS/STAT PROCs (including PROC
GENMOD) run single-threaded.
SAS/STAT: 91 PROCs
• 69 single threaded
• 13 multi-threaded
• 9 distributed (if you license SAS HP Statistics)
2013: SAS Benchmark
PROC HPGENSELECT
– SAS/STAT
– SAS High Performance Statistics
Massive grid (140/144 nodes)
– 16 cores per node
– 2,240/2,304 cores
Conclusion: SAS on 2,304 cores is competitive
with RRE on 20 cores.
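The core counts quoted above follow from the node counts at 16 cores per node. A quick arithmetic check, sketched in Python (the variable names are illustrative and not from the presentation):

```python
# Core counts implied by the 2013 SAS benchmark slide: 16 cores per node.
CORES_PER_NODE = 16
sas_nodes = (140, 144)   # node counts quoted on the slide
rre_cores = 20           # RRE core count quoted in the slide's conclusion

sas_cores = [n * CORES_PER_NODE for n in sas_nodes]
core_ratio = sas_cores[-1] / rre_cores

print(sas_cores)    # [2240, 2304]
print(core_ratio)   # 115.2
```

On these figures the SAS grid used roughly 115 times as many cores as the RRE run.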
Honest Benchmarking
Compare RRE and SAS/STAT performance
– Same data
– Same environment
– Same tasks
Test under real-world conditions
Make the test fair and transparent
Data
• Manufactured data
• Reproducible in any environment
• Designed to emulate “typical” working data
• “Entity” tables: 1MM, 5MM rows
• “Predict” tables: 10MM, 50MM rows
[Diagram labels: Fact, Predict, Entity 1, Entity 2, Entity key, 571 Columns, 21 Columns]
Benchmarking Environment
SAS 9.4:
• Base
• STAT
• Grid Manager
Commodity servers:
• 4 cores
• 16GB Memory
Gbit network
CentOS
RRE 7.0
Platform LSF 9
Analytic Tasks
Task                                    | SAS Capability    | RRE Capability
Descriptive Statistics                  | PROC SURVEYMEANS  | rxSummary
Median and Deciles                      | PROC SURVEYMEANS  | rxQuantile
Frequency Distribution                  | PROC FREQ         | rxCube
Linear Regression (Numeric predictors)  | PROC REG, HPREG   | rxLinMod
Linear Regression (Mixed predictors)    | PROC GENMOD       | rxLinMod
Stepwise Linear (100 predictors)        | PROC REG          | rxLinMod/rxStepControl
Logistic Regression                     | PROC LOGISTIC     | rxLogit
Generalized Linear                      | PROC GENMOD       | rxGLM
K-Means Clustering                      | PROC FASTCLUS     | rxKMeans
Score                                   | PROC SCORE        | rxPredict
Preparation
Generated data with randomized procedure
Loaded data into native formats:
– RRE: XDF file
– SAS: SAS DATA set
Generation and load times not included
No meaningful differences
RRE: 42 Times Faster Than SAS 9.4
Complete script: ten analytic tasks.
Runtime, Seconds (N=5,000,000):
– SAS 9.4: 5,192 (~1 hour, 26 minutes)
– RRE: 124 (~2 minutes)
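The headline multiple can be reproduced from the two runtimes on this slide. A minimal sketch in Python (the helper `as_minutes` is illustrative, not part of the benchmark scripts):

```python
# Total runtimes for the ten-task script at N = 5,000,000 (from the slide).
sas_seconds = 5192
rre_seconds = 124

speed_multiple = sas_seconds / rre_seconds

def as_minutes(seconds):
    """Convert seconds to whole minutes and leftover seconds."""
    return divmod(seconds, 60)

print(round(speed_multiple))      # 42
print(as_minutes(sas_seconds))    # (86, 32)
print(as_minutes(rre_seconds))    # (2, 4)
```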
RRE: Linear Scalability
Runtime, Seconds, by # Rows in Entity Table:
– RRE 7: 68 (1,000,000 rows), 124 (5,000,000 rows)
– SAS 9.4: 623 (1,000,000 rows), 5,192 (5,000,000 rows)
RRE: consistent performance with increased data volume.
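To see how runtime grows with data volume, the four chart values can be compared directly. The sketch below is in Python and assumes each value pairs with the 1MM-row or 5MM-row entity table as shown; that pairing is inferred from the chart labels rather than stated on the slide.

```python
# Runtimes in seconds, read from the chart labels. Pairing each value with a
# row count is an assumption based on the 1MM and 5MM entity tables.
runtimes = {
    "RRE 7":   {1_000_000: 68,  5_000_000: 124},
    "SAS 9.4": {1_000_000: 623, 5_000_000: 5192},
}

def growth(series, small=1_000_000, large=5_000_000):
    """Return how many times longer the larger run took."""
    return series[large] / series[small]

for product, series in runtimes.items():
    print(product, round(growth(series), 2))
```

Under that assumption, a fivefold increase in rows corresponds to about 1.82 times the runtime for RRE and about 8.33 times for SAS.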
RRE: Up to 350X Faster Than SAS
RRE Speed Multiple by task (N=5MM):
– Stats: 213
– Quintiles: 185
– Freq: 351
– Lin Reg 1: 39
– Lin Reg 2: 37
– Step Lin: 19
– Logistic: 58
– GLM: 18
– Kmeans 1: 101
– Kmeans 2: 32
Why is RRE faster than SAS?
RRE supports scalable computing out of the
box
– Multi-threaded processing
– Distributed processing
Legacy SAS is mostly single-threaded
– DATA Step processing
– Most SAS/STAT PROCs
SAS HP PROCs
9 new SAS PROCs
Bundled into SAS 9.4
Designed for scalability
Multiple operating modes:
– Single machine
– Distributed (must license SAS HP
Statistics)
HP PROCs: Minimal Improvement
Linear regression, 20 predictors
Runtime, Seconds (N=5,000,000):
– SAS: PROC REG: 267.17
– SAS: PROC HPREG: 253.82
– RRE: rxLinMod: 6.8
HPREG running in single machine mode.
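The size of the HPREG improvement can be worked out from the three runtimes. This Python sketch assumes 267.17 s belongs to PROC REG, 253.82 s to PROC HPREG, and 6.8 s to rxLinMod. The extracted chart text lists values and legend entries separately, so that assignment is an inference from the slide title.

```python
# Runtimes in seconds. Which bar each value belongs to is an assumption:
# the extracted chart text lists values and legend entries separately.
runtime = {"PROC REG": 267.17, "PROC HPREG": 253.82, "rxLinMod": 6.8}

hpreg_gain = 1 - runtime["PROC HPREG"] / runtime["PROC REG"]
rre_vs_reg = runtime["PROC REG"] / runtime["rxLinMod"]
rre_vs_hpreg = runtime["PROC HPREG"] / runtime["rxLinMod"]

print(f"{hpreg_gain:.1%}")      # 5.0%
print(round(rre_vs_reg, 1))     # 39.3
print(round(rre_vs_hpreg, 1))   # 37.3
```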
Summary
• RRE is faster than Legacy SAS:
– Same tasks
– Same hardware
• RRE speed:
– Efficient engineering
– Multi-threaded and distributed processing
• SAS performance claims:
– Massive hardware requirements
– Force you to license more software from SAS
– Don’t apply to Legacy SAS
Polling Question
Which of the following analytic software
benefits is most important to you:
– A) Completing projects faster
– B) Building better predictive models
– C) High performance with low infrastructure costs
DataSong at a Glance
John Wallace, Founder & CEO
• Background
– Approaching $1 trillion in revenue analyzed. $3 billion in marketing spend under our lens.
– Experienced 60+ person team based in San Francisco with offices in Seattle, Los Angeles, Singapore, and India.
– Founded in 2003 with a proven history of solving difficult analytics problems. Evolved from consulting through close partnerships with our clients.
• Our Offerings
– Customer interaction insight that powers applications for customer-level revenue attribution, targeting, media optimization.
– Descriptive and predictive modeling of hidden trends and relationships in big data.
– Custom development including applications, process automation, and decision support solutions.
DataSong Offerings
Hosted Applications
● Revenue Attribution
● Customer Targeting
● Marketing Planning
We know Big Data. We analyze and provide the “so what”.
DataSong Architecture
• ETL
• N marketing channels
• Behavioral variables
• Promotional data
• Overlay data
• Functions to read Hadoop output; xdf creation
• Exploratory data analysis
• GAM survival models
• Scoring for inference
• Scoring for prediction
• 5 billion scores per day per customer
DATASONG DATA FORMAT (DDF)
CUSTOM VARIABLES (PMML)
Where Speed Matters
3 key dimensions
● how many rows
● how many variables
● how many iterations of a model
Trade-offs for speed
● Sampling variance
● Test fewer features
● Have less understanding of the signal
The third dimension means any benchmark time must be multiplied by N, the number of model iterations.
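As a rough illustration of multiplying a benchmark by N, the Python sketch below scales the single-run totals from the benchmark (5,192 s for SAS 9.4, 124 s for RRE) by a hypothetical iteration count. The value of 50 iterations is an assumption chosen for illustration and does not come from the slides.

```python
# Single-run totals from the benchmark (seconds, N = 5,000,000).
single_run = {"SAS 9.4": 5192, "RRE": 124}

def total_hours(seconds_per_run, iterations):
    """Wall-clock hours if the same workload is repeated `iterations` times."""
    return seconds_per_run * iterations / 3600

iterations = 50  # hypothetical number of model iterations, not from the slides
for product, seconds in single_run.items():
    print(product, round(total_hours(seconds, iterations), 1))
```

With that assumed count, the totals come to about 72.1 hours for SAS 9.4 and about 1.7 hours for RRE.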
Thank You

Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 

Dernier (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 

Is Revolution R Enterprise Faster than SAS? Benchmarking Results Revealed

  • 1. RRE: Faster than SAS. Results from Benchmarking. Thomas W. Dinsmore, Revolution Analytics; John Wallace, DataSong
  • 2. Polling Question: Do you currently use: A) R or Revolution R Enterprise (RRE); B) SAS; C) Both; D) Neither
  • 3. Benchmarking RRE vs. SAS: Background; Approach; Results; Discussion
  • 4. Revolution R Enterprise: Open source R; commercially supported distribution; enhanced for enterprise use (scalable analytics, developer tools, integration tools, deployment tools)
  • 5. 2012: Allstate Benchmark. [Bar chart: runtime in minutes, SAS PROC GENMOD vs. RRE, Poisson regression on 150MM rows; data labels: 6 and 300]
  • 6. Criticism: “Apples to Oranges” (20 cores vs. 16 cores)
  • 7. Most SAS/STAT PROCs (including PROC GENMOD) run single-threaded. SAS/STAT: 91 PROCs (69 single-threaded; 13 multi-threaded; 9 distributed, if you license SAS HP Statistics)
  • 9. 2013: SAS Benchmark. PROC HPGENSELECT (SAS/STAT; SAS High Performance Statistics). Massive grid (140/144 nodes): 16 cores per node; 2,240/2,304 cores. Conclusion: SAS on 2,304 cores is competitive with RRE on 20 cores.
  • 10. Honest Benchmarking: Compare RRE and SAS/STAT performance (same data, same environment, same tasks). Test under real-world conditions. Make the test fair and transparent.
  • 11. Data: Manufactured data; reproducible in any environment; designed to emulate “typical” working data; “Entity” tables: 1MM, 5MM rows; “Predict” tables: 10MM, 50MM rows. [Diagram labels: Fact, Predict, Entity 1, Entity 2, Entity key, 571 Columns, 21 Columns]
  • 12. Benchmarking Environment: SAS 9.4 (Base, STAT, Grid Manager); commodity servers (4 cores, 16GB memory); Gbit network; CentOS; RRE 7.0; Platform LSF 9
  • 13. Analytic Tasks (Task: SAS Capability / RRE Capability)
      – Descriptive Statistics: PROC SURVEYMEANS / rxSummary
      – Median and Deciles: PROC SURVEYMEANS / rxQuantile
      – Frequency Distribution: PROC FREQ / rxCube
      – Linear Regression (Numeric predictors): PROC REG, HPREG / rxLinMod
      – Linear Regression (Mixed predictors): PROC GENMOD / rxLinMod
      – Stepwise Linear (100 predictors): PROC REG / rxLinMod with rxStepControl
      – Logistic Regression: PROC LOGISTIC / rxLogit
      – Generalized Linear: PROC GENMOD / rxGLM
      – K-Means Clustering: PROC FASTCLUS / rxKMeans
      – Score: PROC SCORE / rxPredict
  • 14. Preparation: Generated data with randomized procedure. Loaded data into native formats (RRE: XDF file; SAS: SAS DATA set). Generation and load times not included. No meaningful differences.
  • 15. RRE: 42 Times Faster Than SAS 9.4. [Bar chart: runtime in seconds, N=5,000,000; RRE: 124 (~2 minutes); SAS 9.4: 5,192 (~1 hour, 26 minutes)] Complete script: ten analytic tasks.
  • 16. RRE: Linear Scalability. [Line chart: runtime in seconds vs. # rows in Entity table (0 to 5,000,000), RRE 7 and SAS 9.4; data labels: 68, 124, 623, 5,192] RRE: consistent performance with increased data volume.
  • 17. RRE: Up to 350X Faster Than SAS. [Bar chart: RRE speed multiple by task, N=5MM]
      – Stats: 213
      – Quintiles: 185
      – Freq: 351
      – Lin Reg 1: 39
      – Lin Reg 2: 37
      – Step Lin: 19
      – Logistic: 58
      – GLM: 18
      – Kmeans 1: 101
      – Kmeans 2: 32
  • 18. Why is RRE faster than SAS? RRE supports scalable computing out of the box (multi-threaded processing; distributed processing). Legacy SAS is mostly single-threaded (DATA Step processing; most SAS/STAT PROCs).
  • 19. SAS HP PROCs: 9 new SAS PROCs; bundled into SAS 9.4; designed for scalability; multiple operating modes: single machine, or distributed (must license SAS HP Statistics)
  • 20. HP PROCs: Minimal Improvement. [Bar chart: runtime in seconds, N=5,000,000, linear regression with 20 predictors; series: SAS PROC HPREG, SAS PROC REG, RRE rxLinMod; data labels: 6.8, 267.17, 253.82] HPREG running in single machine mode.
  • 21. Summary
      – RRE is faster than Legacy SAS: same tasks; same hardware
      – RRE speed: efficient engineering; multi-threaded and distributed processing
      – SAS performance claims: massive hardware requirements; force you to license more software from SAS; don’t apply to Legacy SAS
  • 22. Polling Question: Which of the following analytic software benefits is most important to you: A) Completing projects faster; B) Building better predictive models; C) High performance with low infrastructure costs
  • 24. DataSong at a Glance
      Background
      – Approaching $1 trillion in revenue analyzed. $3 billion in marketing spend under our lens.
      – Experienced 60+ person team based in San Francisco with offices in Seattle, Los Angeles, Singapore, and India.
      – Founded in 2003 with a proven history of solving difficult analytics problems. Evolved from consulting through close partnerships with our clients.
      Our Offerings
      – Customer interaction insight that powers applications for customer-level revenue attribution, targeting, media optimization.
      – Descriptive and predictive modeling of hidden trends and relationships in big data.
      – Custom development including applications, process automation, and decision support solutions.
  • 25. DataSong Offerings: Hosted Applications (Revenue Attribution; Customer Targeting; Marketing Planning). We know Big Data. We analyze and provide the “so what”.
  • 26. DataSong Architecture: ETL; N marketing channels; behavioral variables; promotional data; overlay data; functions to read Hadoop output; xdf creation; exploratory data analysis; GAM survival models; scoring for inference; scoring for prediction; 5 billion scores per day per customer. [Diagram labels: DATASONG DATA FORMAT (DDF); CUSTOM VARIABLES (PMML)]
  • 27. Where Speed Matters
      3 key dimensions: how many rows; how many variables; how many iterations of a model
      Trade-offs for speed: sampling variance; test fewer features; have less understanding of the signal
      This 3rd dimension means we must multiply any benchmark by N
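
The headline figures on slides 15, 16 and 27 can be checked with a little arithmetic. The Python sketch below is an illustration only, not part of the original deck. The 124 s and 5,192 s runtimes come from slide 15. Assigning 68 s and 623 s to the 1MM-row runs of RRE and SAS is an assumption read off the flattened chart on slide 16, and the iteration count N = 20 is purely hypothetical.

```python
# Runtimes in seconds for the full ten-task script at 5MM entity rows (slide 15).
RRE_5MM = 124
SAS_5MM = 5192

# ASSUMPTION: slide 16's remaining data labels are the 1MM-row runs,
# 68 s for RRE and 623 s for SAS. The flattened chart does not say so explicitly.
RRE_1MM = 68
SAS_1MM = 623


def speed_multiple(slow_seconds, fast_seconds):
    """How many times faster the fast run is than the slow run."""
    return slow_seconds / fast_seconds


def as_hms(seconds):
    """Format a whole number of seconds as hours, minutes, seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


def total_runtime(per_run_seconds, iterations):
    """Slide 27: a benchmark is paid once per model iteration."""
    return per_run_seconds * iterations


if __name__ == "__main__":
    print("Speed multiple at 5MM rows:", round(speed_multiple(SAS_5MM, RRE_5MM), 1))
    print("RRE:", as_hms(RRE_5MM), "| SAS:", as_hms(SAS_5MM))

    # Runtime growth when the row count grows 5x (under the assumption above).
    print("RRE runtime growth:", round(RRE_5MM / RRE_1MM, 2))
    print("SAS runtime growth:", round(SAS_5MM / SAS_1MM, 2))

    n = 20  # hypothetical number of model iterations
    print("RRE over N runs:", as_hms(total_runtime(RRE_5MM, n)))
    print("SAS over N runs:", as_hms(total_runtime(SAS_5MM, n)))
```

Under these assumptions the ratio at 5MM rows rounds to the 42x quoted on slide 15, and a five-fold increase in rows costs RRE less than a two-fold increase in runtime. Multiplying by N shows why slide 27 treats iteration count as a dimension in its own right: a gap measured in minutes per run becomes a gap measured in hours or days per project.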