Batter Up! Advanced Sports Analytics with R and Storm 
Meeting the Real-Time Analytics Opportunity Head-On 
Bill Jacobs 
VP Product Marketing 
Revolution Analytics 
@bill_jacobs 
December 11, 2014 
Allen Day 
Principal Data Scientist 
MapR Technologies 
@allenday 
Vineet Sharma 
Dir., Partner Marketing 
MapR Technologies
Who Am I? 
Bill Jacobs, VP Product Marketing 
Revolution Analytics 
@bill_jacobs
Polling Question #1: 
Who Are You? (choose one) 
–Statistician or modeler 
–Data Scientist 
–Hadoop Expert 
–Application builder 
–Data guru 
–Business user 
–Baseball fan
Sports Analytics as Analogy. 
Sports Teams Are Like Other Corporations. 
–Great Value Achievable With Data 
–Vast Range of Data Sources 
–Timely Analysis Amplifies Value 
And apologies if you came to learn whom to bet upon in next year’s season.
Game Changing Big Data Analytics Applications 
Marketing: Clickstream & Campaign Analyses 
Digital Media: Recommendation Engines 
Social Media: Sentiment Analysis 
Retail: Purchase Prediction 
Insurance: Fraud, Waste and Abuse 
Healthcare Delivery: Treatment Outcome Prediction 
Risk Analysis: Insurance Underwriting 
Manufacturing: Predictive Maintenance 
Operations: Supply Chain Optimization 
Econometrics: Market Prediction 
Marketing: Mix and Price Optimization 
Life Sciences: Pharmacogenetics 
Transportation: Asset Utilization
Polling Question #2: 
Which Languages or Tools Are In Use for Analytics? (check all that apply) 
–R 
–SAS or SPSS 
–Python 
–Java 
–BI tools including: MSTR, Qlik, Tableau, Business Objects, Cognos 
–Salford Systems or MATLAB 
–H2O, RapidMiner, KNIME or similar 
–Other data mining tools 
–Other programming languages 
–None or Don’t know
WELCOME & INTRODUCTIONS 
R Open Source 
-Language, Community, Collaboration 
-Robert Gentleman & Ross Ihaka, 1993 
-Version 1.0 released 2000 
-2.5 Million Global Users 
-Over 4,800 add-on “Packages” 
-Why R? 
  -R in Universities = New Talent 
  -Emerging Modeling/Visualization 
  -Lower Cost Alternative 
  -Open Source = Flexible & Innovative 
  -Access to Free Packages
R is exploding in popularity & functionality 
R Usage Growth: Rexer Data Miner Survey, 2007-2013 
70% of data miners report using R 
R is the first choice of more data miners than any other software 
Source: www.rexeranalytics.com
Innovate with R 
Most widely used data analysis software 
•Used by 2M+ data scientists, statisticians and analysts 
Most powerful statistical programming language 
•Flexible, extensible and comprehensive for productivity 
Create beautiful and unique data visualizations 
•As seen in New York Times, Twitter and Flowing Data 
Thriving open-source community 
•Leading edge of analytics research 
Fills the talent gap 
•New graduates prefer R 
White Paper: “R is Hot” bit.ly/r-is-hot
Polling Question #3: 
How are you using R today? (choose one) 
–Not using R 
–Studying R now 
–Initial R project(s) underway 
–R is widely used for exploration & modeling 
–R is deployed into production
Revolution Analytics In A Nutshell 
Our Vision: 
R is becoming the de facto standard for enterprise predictive analytics 
Our Mission: 
Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges
Revolution Analytics Builds & Delivers: 
Software Products: 
Stable Distributions 
Broad Platform Support 
Big Data Analytics in R 
Application Integration 
Deployment Platforms 
Agile Development Tooling 
Future Platform Support 
Support & Services 
Commercial Support Programs 
Training Programs 
Professional Services 
Academic Support Programs 
IP Indemnification
Revolution R Advantages for Analytics Professionals: 
Broadly-used, scalable R language 
Large (2M+), collaborative, young R analytics community 
Largest repository of statistical & analytical algorithms 
Big data analytics capabilities 
–Scales from workstations to Hadoop 
–Transparent parallelism 
–Cross platform compatibility 
–Multi-platform architectures 
Broadens career opportunities
Revolution R Advantages for Business Executives 
Viable Alternative to Legacy Analytics Solutions 
–Predictable Time To Results 
–Simplified Licensing 
–All-Inclusive Environment 
Lower Staffing Costs 
Controllable Open Source Risks 
–Support 
–IP Infringement Protections
Revolution R Advantages for IT Organizations 
Consistency Across Platforms Avoids Sprawl 
Support for Workstations, Servers, Hadoop, EDWs and Grids 
Heterogeneous Architecture Capabilities 
Integrates With Major BI & Application Tools 
Streamline Model Deployment 
Run Complex Analytics in the “Data Lake” 
Be a “Good Citizen” in shared systems 
Commercial Support Reduces Project Risks 
Quick Start Programs Accelerate Results 
Platform Continuity Future-Proofs Architectures
Revolution R Enterprise: Predictive Analytics Across Huge Data in Hadoop 
[Diagram labels: Exploration, Visualization, Predictive Modeling; YARN; HDFS]
Polling Question #4: 
Stage of Hadoop Adoption? (choose one) 
–No Need 
–Studying 
–Setting-Up Hadoop 
–Experimenting with Hadoop 
–Deploying Hadoop Now 
–Hadoop in Production
© 2014 MapR Technologies 18 
Introducing: 
Vineet Sharma 
Director, Partner Marketing 
MapR Technologies
MapR + Revolution Leverages MapR As A Scalable Enterprise R Engine. 
• Plus: 
– Run RRE Analytics In MapR Hadoop Without Change 
– Eliminate Need To Design Parallel Software or “Think In MapReduce” 
– Leverage All Revolution R Enterprise Pre-Parallelized Algorithms 
– Enable Users To Build Custom Apps That Leverage Hadoop’s Parallelism 
– Slash Data Movement by Analyzing Data Inside the MapR Data Platform 
– Expand Deployment and Integration Options 
[Diagram labels: Rapid Adoption of R; MapR Enterprise Data Platform Capabilities; Broad Adoption of Hadoop for Big Data Analytics]
Desktop Users with Analytical Access to Huge Data in Hadoop 
[Diagram labels: Predictive Modeling Algorithms, MapR FS, Data]
MapR: Best Solution for Customer Success 
Top Ranked 
Exponential Growth 
500+ Customers 
Premier Investors 
• >2x annual bookings 
• 80% of accounts expand 3X 
• 90% software licenses 
• < 1% lifetime churn 
• > $1B in incremental revenue generated by 1 customer
The Power of the Open Source Community 
[Diagram: MapR Data Platform with the Apache Hadoop and OSS ecosystem] 
Layers: Management; Apache Hadoop and OSS Ecosystem (Execution Engines; Data Governance and Operations); Security; MapR Data Platform (MapR-FS, MapR-DB) 
Categories: Batch; Streaming; NoSQL & Search; ML, Graph; SQL; Data Integration & Access; Workflow & Data Governance; Provisioning & coordination 
Components: YARN, Pig, Cascading, Spark, Spark Streaming, Storm*, HBase, Solr, Juju, Savannah*, Mahout, MLLib, GraphX, MapReduce v1 & v2, Tez*, Accumulo*, Hive, Impala, Shark, Drill, Sqoop, Sentry*, Oozie, ZooKeeper, Flume, Knox*, Falcon*, Whirr, HttpFS, Hue 
* Certification/support planned
MapR Distribution for Hadoop 
[Diagram: MapR Data Platform (MapR-FS, MapR-DB) with the Apache Hadoop and OSS ecosystem, as on the previous slide] 
Enterprise-grade 
• High availability 
• Data protection 
• Disaster recovery 
Interoperability 
• Standard file access 
• Standard database access 
• Pluggable services 
• Broad developer support 
Performance 
• 2X to 7X higher performance 
• Consistent, low latency 
Multi-tenancy 
• Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators 
Security 
• Enterprise security authorization 
• Wire-level authentication 
• Data governance 
Operational 
• Ability to support predictive analytics, real-time database operations, and support high arrival rate data
Revolution R Enterprise and MapR Hadoop 
[Diagram labels: R on the Desktop, BI and other Apps, Via Browsers, Mobile; Edge Node; YARN, MapReduce; MapR Data Platform; Management; Apache Hadoop & OSS Ecosystem: ZooKeeper, Oozie, Hue, Pig, Hive, Impala, Shark, Flume, HttpFS, Cascading, Solr, Juju, Mahout, MLLib, Storm, Spark, Spark Streaming, Sqoop, Whirr, HBase, YARN, Drill, Tez, Knox, Sentry, Falcon]
Introducing: 
Allen Day 
Principal Data Scientist 
MapR Technologies 
@allenday
Talk Overview 
• Agile Real-time Stats 
• R + Storm 
github.com/allenday/R-Storm 
• DEMO 
github.com/allenday/hadoop-summit-r-storm-demo-public 
• How to do it? 
• Q & A 
Agile Methods + Advanced Statistics + Continuous Real-time Delivery 
@allenday
Architecting R into the Storm Application Development Process
Quick intro 
• Allen Day, Principal Data Scientist [ @allenday ] 
7yr Hadoop dev, 12yr R dev/author 
PhD, Human Genetics, UCLA Medicine
What’s Storm? What’s R? 
• What’s Storm? 
– Processes a data stream. Akin to UNIX pipe + tee & merge commands 
– Runs on a cluster. Fault-tolerant and designed to scale out 
– Used for: real-time analytics & machine learning 
• What’s R? 
– Programming language with advanced statistics libraries 
– Does not scale out. Can scale up 
– Used for: prototyping, data modeling, visualization 
How to combine these?
R outside, Storm inside: not practical. Why? 
• Model-building and QA are done on data snapshots 
• However, R => Hadoop is realistic. Key difference: referenced data can be static 
– Use MapR snapshots for dev and QA 
– See also: RHIPE (Purdue) and RHadoop (RevolutionAnalytics) 
[Diagram labels: R, Storm, User]
Storm outside, R inside: a good fit 
• Enables separation of concerns 
– Independently manage modeling, ops timelines, and version control 
– Integrate as needed 
• Enables role specialization 
– R built-ins allow faster iteration and more concise stats-type code 
– Do DevOps with specific SW engineering tech, e.g. Java 
[Diagram labels: Storm, R, User]
Q: Who really likes statistics? 
A: Baseball fans 
A: Team Managers = Portfolio Managers
Famous Vintage Data 
Oakland Athletics, 2002 Season 
20 consecutive wins, the American League record at the time of this talk
Goal: Detect “Moneyball” 2002 Winning Streak
Methods: Change Point Detection 
Find natural breakpoints in a time-series set of data points 
R packages implement this: 
– changepoint: more sensitive, but not streaming 
– bcp: streaming, but less sensitive
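To make the idea of a "natural breakpoint" concrete, here is a minimal sketch in Java (the language of the Storm side of this demo). It uses a simple cumulative-sum (CUSUM) estimate of a single shift in the mean. This is an assumption made for illustration only: it is not the algorithm used by the changepoint or bcp packages, and the class name, method name, and sample values are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration: single mean-shift change point via cumulative sums (CUSUM). */
class CusumChangePoint {

    /**
     * Returns the index of the first observation after the most likely
     * mean shift, or -1 if the series has fewer than two points or is flat.
     */
    static int detect(double[] series) {
        if (series.length < 2) {
            return -1;
        }
        double mean = Arrays.stream(series).average().orElse(0.0);
        double cumulative = 0.0;
        double largest = 0.0;
        int bestIndex = -1;
        // S_k = sum over i <= k of (x_i - mean); |S_k| peaks at the last point before the shift.
        for (int k = 0; k < series.length - 1; k++) {
            cumulative += series[k] - mean;
            if (Math.abs(cumulative) > largest) {
                largest = Math.abs(cumulative);
                bestIndex = k + 1;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        // Made-up score differences: a losing stretch followed by a winning stretch.
        double[] scoreDiff = {-2, -1, -3, -2, -1, 4, 3, 5, 4, 3};
        System.out.println("Change point at index " + detect(scoreDiff));
    }
}
```

A batch estimate like this needs the whole series up front, which mirrors the "not streaming" trade-off noted above; a streaming detector would instead update its estimate as each game arrives.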
Methods: R+Storm Demo Architecture 
Oakland A’s Data (accelerated) → Storm Bolt (R online change point detector) → Storm Bolt (write to Jetty) → Jetty Webserver → Browser (D3.js) → Us 
GIFs to MapR Filesystem 
github.com/allenday/hadoop-summit-r-storm-demo-public
Demo 
– Raw data (score difference between A’s and opponent) 
– 50-game sliding window/buffer to detect change points 
– Cumulative history with detected break points
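The 50-game sliding window/buffer can be pictured with a small sketch. The Java class below is a hypothetical illustration, not code from the demo repository: it keeps the most recent N score differences, evicts the oldest once the buffer is full, and exposes a snapshot that a change point detector could consume. The class name, its methods, and the sample data are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical illustration: fixed-capacity sliding window over a stream of score differences. */
class SlidingWindow {
    private final int capacity;
    private final Deque<Double> buffer = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingWindow(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Adds the newest observation, evicting the oldest once the window is full. */
    void add(double value) {
        buffer.addLast(value);
        sum += value;
        if (buffer.size() > capacity) {
            sum -= buffer.removeFirst();
        }
    }

    boolean isFull() {
        return buffer.size() == capacity;
    }

    int size() {
        return buffer.size();
    }

    /** Mean of the values currently in the window (0 if empty). */
    double mean() {
        return buffer.isEmpty() ? 0.0 : sum / buffer.size();
    }

    /** Snapshot of the window contents, oldest first, e.g. to hand to a detector. */
    double[] toArray() {
        return buffer.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        SlidingWindow window = new SlidingWindow(50);
        for (int game = 1; game <= 162; game++) {
            window.add(game % 3 == 0 ? -1.0 : 2.0); // made-up score differences
            if (window.isFull()) {
                // A change point detector would run on window.toArray() here.
            }
        }
        System.out.println(window.size() + " games buffered, mean " + window.mean());
    }
}
```

Bounding the buffer keeps the per-tuple work constant no matter how long the stream runs, which is what lets a detector that is not itself streaming be applied repeatedly inside a bolt.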
Methods Details: How it’s done 
• Uses R-Storm binding github.com/allenday/R-Storm 
– Storm package on CRAN cran.r-project.org/web/packages/Storm 
Producer: Storm (dev team) → R (stats team) → Consumer: Storm (dev team, pure Java)
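Methods Details: Easy integration 
R: lambda function 
library(Storm) 
storm = Storm$new(); 
# Replace the default handler with a function applied to each incoming tuple 
storm$lambda = function(s) { 
  t = s$tuple; 
  t$output = vector(length=1); 
  t$output[1] = "tada!"; 
  s$emit(t); 
}; 
storm$run();  # enter the tuple-processing loop (as in the R-Storm README) 
Storm: extend ShellBolt 
public static class MyRBolt extends ShellBolt implements IRichBolt { 
    public MyRBolt() { 
        // Run the R script as a multilang subprocess of this bolt 
        super("Rscript", "my.R"); 
    } 
    // IRichBolt also requires declareOutputFields(...) and getComponentConfiguration() 
}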
Results 
• Change points are identified, but none for winning streak 
– Not using score difference, anyway 
• Time to integrate with the modeling team! 
– Send @kunpognr or @allenday a pull request on GitHub 
• Applicable to many other use cases, e.g. 
– Security (fraud detection, intrusion detection) 
– Marketing (intent to purchase / social media streams) 
– Customer Support (help desk voice calls) 
Discussion
Polling Question #5 
How important will Real-Time analytical apps become? (choose one) 
–Uncertain 
–Not important 
–Necessary 
–Critical
Real-Time and Internet of Things: Foundation of a Compelling Trend for 2015 
Big Data Analytics Meets The Internet of Things 
–Transactions + 
–Human Behavior + 
–Internet of Things: Sensors 
… and extracting value using 
–Traditional Statistics 
–Visualization 
–Machine Learning 
… plus adaptability 
–Real-Time 
–Agile Modeling & Fast Model Execution 
–Production Capable, Stable and Secure 
–Rapidly-Evolving Data Science
Where Does Real Time Impact The Analytical Lifecycle? 
Data Engineering 
–Collection and Ingest 
–“Blending” 
Modeling 
–Aggregation, Segmentation & Exploration 
–Model Development & Optimization 
–Testing & Validation 
Operationalization 
–Deployment & Scoring 
–Delivery 
–Monitoring & Evaluation
Typical Analytical Lifecycle 
Ingest 
Explore 
Model 
Deploy 
Score 
Act 
Measure 
Model 
Score
More Complex Event Driven Analytical Cycle 
Historic Ingest 
Explore 
Model 
Deploy 
Act 
Measure 
Data Analytics & Process Design 
Scoring 
Event Ingest 
Transform 
“Blend” 
Append 
Improve 
Enrich
Real-Time Analytics Best Practices 
Develop a Common Lexicon for Real-Time 
Discriminate Between Needs of Each Stage in Lifecycle 
–Data Ingest & Manipulation and Enrichment 
–Data Source / Repository Integration Needs 
–Processes that “Fill the Lake” 
–Processes that “Act on the Stream” 
–Vastly different computationally 
–Big differences in data ingest volume & latency 
Start with Tractable Goals 
–Anticipate Growing Requirements: Microbatched >> Interactive >> Autonomy 
Build for today, Architect for tomorrow
Real-Time Realities 
Plan for Diverse Needs 
–Real-Time Score Retrieval, Scoring, Modeling 
–Wide-Ranging Performance – Microbatch – Interactive - Autonomous 
Fragmentation 
–Data Delivery Systems Pre-Exist 
–Will Vary Widely by Vertical Market 
–Competing Proprietary Solutions 
Growing Demand 
–Numerous high-value targets 
–“The next step”: Put big data analytics to work
What’s Needed 
Real-Time Performance… plus… 
Agility 
–Deployment models 
–Organization 
–Infrastructure 
–Analytics 
Manageable Costs 
–Hadoop 
–Open Source R 
Production Platform(s) 
–Proven 
–Performant
Next Steps… 
www.revolutionanalytics.com 
Whitepaper: Revolution R for Hadoop: 
–http://www.revolutionanalytics.com/whitepaper/delivering-value-big-data-revolution-r-enterprise-and-hadoop 
–…or http://bit.ly/1ua43bu 
www.maprtech.com 
Resources: 
R foundation URL: www.r-project.org 
Download Revolution R: http://mran.revolutionanalytics.com/download/ 
Learn about Apache Storm: https://storm.apache.org/ 
R-Storm bindings: github.com/allenday/R-Storm 
Storm package on CRAN: cran.r-project.org/web/packages/Storm
Thank you. 
www.revolutionanalytics.com 
1.855.GET.REVO 
Twitter: @RevolutionR

Revolution R Enterprise - Portland R User Group, November 2013
 
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
C-BAG Big Data Meetup Chennai Oct.29-2014 Hortonworks and Concurrent on Casca...
 
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
 
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata The Value of the Modern Data Architecture with Apache Hadoop and Teradata
The Value of the Modern Data Architecture with Apache Hadoop and Teradata
 
Big Data in Action – Real-World Solution Showcase
 Big Data in Action – Real-World Solution Showcase Big Data in Action – Real-World Solution Showcase
Big Data in Action – Real-World Solution Showcase
 

Plus de Revolution Analytics

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudRevolution Analytics
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureRevolution Analytics
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondRevolution Analytics
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with RRevolution Analytics
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudRevolution Analytics
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution Analytics
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in RRevolution Analytics
 
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Revolution Analytics
 

Plus de Revolution Analytics (16)

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the Cloud
 
R at Microsoft
R at MicrosoftR at Microsoft
R at Microsoft
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
 
A Step Towards Reproducibility in R
A Step Towards Reproducibility in RA Step Towards Reproducibility in R
A Step Towards Reproducibility in R
 
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
Introducing Revolution R Open: Enhanced, Open Source R distribution from Revo...
 
R and Data Science
R and Data ScienceR and Data Science
R and Data Science
 

Dernier

HONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsHONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsMichael W. Hawkins
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Dave Litwiller
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLSeo
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...lizamodels9
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyEthan lee
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsApsara Of India
 
Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Roland Driesen
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageMatteo Carbone
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...noida100girls
 
Event mailer assignment progress report .pdf
Event mailer assignment progress report .pdfEvent mailer assignment progress report .pdf
Event mailer assignment progress report .pdftbatkhuu1
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesDipal Arora
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurSuhani Kapoor
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Delhi Call girls
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communicationskarancommunications
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 

Dernier (20)

HONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael HawkinsHONOR Veterans Event Keynote by Michael Hawkins
HONOR Veterans Event Keynote by Michael Hawkins
 
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
Enhancing and Restoring Safety & Quality Cultures - Dave Litwiller - May 2024...
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for Success
 
It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 
Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usage
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
 
Event mailer assignment progress report .pdf
Event mailer assignment progress report .pdfEvent mailer assignment progress report .pdf
Event mailer assignment progress report .pdf
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
 
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service JamshedpurVIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
VIP Call Girl Jamshedpur Aashi 8250192130 Independent Escort Service Jamshedpur
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
 

Batter Up! Advanced Sports Analytics with R and Storm

  • 1. Batter Up! Advanced Sports Analytics with R and Storm Meeting the Real-Time Analytics Opportunity Head-On Bill Jacobs VP Product Marketing Revolution Analytics @bill_jacobs December 11, 2014 Allen Day Principal Data Scientist MapR Technologies @allenday Vineet Sharma Dir., Partner Marketing MapR Technologies
  • 2. Who Am I? Bill Jacobs, VP Product Marketing Revolution Analytics @bill_jacobs
  • 3. Polling Question #1: Who Are You? (choose one) –Statistician or modeler –Data Scientist –Hadoop Expert –Application builder –Data guru –Business user –Baseball fan
  • 4. Sports Analytics as Analogy. Sports Teams Are Like Other Corporations. –Great Value Achievable With Data –Vast Range of Data Sources –Timely Analysis Amplifies Value And apologies if you came to learn whom to bet upon in next year’s season.
  • 5. Game Changing Big Data Analytics Applications Marketing: Clickstream & Campaign Analyses Digital Media: Recommendation Engines Social Media: Sentiment Analysis Retail: Purchase Prediction Insurance: Fraud Waste and Abuse Healthcare Delivery: Treatment Outcome Prediction Risk Analysis: Insurance Underwriting Manufacturing: Predictive Maintenance Operations: Supply Chain Optimization Econometrics: Market Prediction Marketing: Mix and Price Optimization Life Sciences: Pharmacogenetics Transportation: Asset Utilization
  • 6. Polling Question #2: What Language or Tools Are In Use for Analytics (check all that apply) –R –SAS or SPSS –Python –Java –BI tools including: MSTR, Qlik, Tableau, Business Objects, Cognos –Salford Systems or MATLAB –H2O, RapidMiner, KNIME or similar –Other data mining tools –Other programming languages –None or Don’t know
  • 7. WELCOME & INTRODUCTIONS R Open Source -Language, Community, Collaboration -Robert Gentleman & Ross Ihaka, 1993 -Version 1.0 released 2000 -2.5 Million Global Users -Over 4,800 add-on “Packages” -Why R? R in Universities = New Talent Emerging Modeling/Visualization Lower Cost Alternative Open Source = Flexible & Innovative Access to Free Packages
  • 8. R is exploding in popularity & functionality R Usage Growth Rexer Data Miner Survey, 2007-2013 70% of data miners report using R R is the first choice of more data miners than any other software Source: www.rexeranalytics.com
  • 9. Innovate with R Most widely used data analysis software •Used by 2M+ data scientists, statisticians and analysts Most powerful statistical programming language •Flexible, extensible and comprehensive for productivity Create beautiful and unique data visualizations •As seen in New York Times, Twitter and Flowing Data Thriving open-source community •Leading edge of analytics research Fills the talent gap •New graduates prefer R White Paper R is Hot bit.ly/r-is-hot
  • 10. Polling Question #3: How are you using R today? (choose one) –Not using R –Studying R now –Initial R project(s) underway –R is widely used for exploration & modeling –R is deployed into production
  • 11. Revolution Analytics In A Nutshell Our Vision: R is becoming the de facto standard for enterprise predictive analytics Our Mission: Drive enterprise adoption of R by providing enhanced R products tailored to meet enterprise challenges
  • 12. Revolution Analytics Builds & Delivers: Software Products: Stable Distributions Broad Platform Support Big Data Analytics in R Application Integration Deployment Platforms Agile Development Tooling Future Platform Support Support & Services Commercial Support Programs Training Programs Professional Services Academic Support Programs IP Indemnification
  • 13. Revolution R Advantages for Analytics Professionals: Broadly-used, scalable R language Large (2M+), collaborative, young R analytics community Largest repository of statistical & analytical algorithms Big data analytics capabilities –Scales from workstations to Hadoop –Transparent parallelism –Cross platform compatibility –Multi-platform architectures Broadens career opportunities
  • 14. Revolution R Advantages for Business Executives Viable Alternative to Legacy Analytics Solutions –Predictable Time To Results –Simplified Licensing –All-Inclusive Environment Lower Staffing Costs Controllable Open Source Risks –Support –IP Infringement Protections
  • 15. Revolution R Advantages for IT Organizations Consistency Across Platforms Avoids Sprawl Support for Workstations, Servers, Hadoop, EDWs and Grids Heterogeneous Architecture Capabilities Integrates With Major BI & Application Tools Streamline Model Deployment Run Complex Analytics in the “Data Lake” Be a “Good Citizen” in shared systems Commercial Support Reduces Project Risks Quick Start Programs Accelerate Results Platform Continuity Future-Proofs Architectures
  • 16. YARN Revolution R Enterprise: Predictive Analytics Across Huge Data in Hadoop Exploration Visualization Predictive Modeling HDFS
  • 17. Polling Question #4: Stage of Hadoop Adoption? (choose one) –No Need –Studying –Setting-Up Hadoop –Experimenting with Hadoop –Deploying Hadoop Now –Hadoop in Production
  • 18. © 2014 MapR Technologies 18 Introducing: Vineet Sharma Director, Partner Marketing MapR Technologies
  • 19. MapR + Revolution Leverages MapR As A Scalable Enterprise R Engine. • Plus: – Run RRE Analytics In MapR Hadoop Without Change – Eliminate Need To Design Parallel Software or “Think In MapReduce” – Leverage All Revolution R Enterprise Pre-Parallelized Algorithms – Enable Users To Build Custom Apps That Leverage Hadoop’s Parallelism – Slash Data Movement by Analyzing Data Inside the MapR Data Platform – Expand Deployment and Integration Options Rapid Adoption of R MapR Enterprise Data Platform Capabilities Broad Adoption of Hadoop for Big Data Analytics
  • 20. Predictive Modeling Algorithms MapR FS Data Desktop Users with Analytical Access to Huge Data in Hadoop
  • 21. MapR: Best Solution for Customer Success Top Ranked Exponential Growth 500+ Customers Premier Investors >2x annual bookings 80% of accounts expand 3X 90% software licenses < 1% lifetime churn > $1B in incremental revenue generated by 1 customer
  • 22. Management MapR Data Platform APACHE HADOOP AND OSS ECOSYSTEM Security YARN Pig Cascading Spark Batch Spark Streaming Storm* Streaming HBase Solr NoSQL & Search Juju Provisioning & coordination Savannah* Mahout MLLib ML, Graph GraphX MapReduce v1 & v2 EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Workflow & Data Governance Tez* Accumulo* Hive Impala Shark Drill SQL Sqoop Sentry* Oozie ZooKeeper Flume Knox* Falcon* Whirr Data Integration & Access HttpFS Hue MapR-FS MapR-DB * Certification/support planned The Power of the Open Source Community
  • 23. MapR Distribution for Hadoop Management MapR Data Platform APACHE HADOOP AND OSS ECOSYSTEM Security YARN Pig Cascading Spark Batch Spark Streaming Storm* Streaming HBase Solr NoSQL & Search Juju Provisioning & coordination Savannah* Mahout MLLib ML, Graph GraphX MapReduce v1 & v2 EXECUTION ENGINES DATA GOVERNANCE AND OPERATIONS Workflow & Data Governance Tez* Accumulo* Hive Impala Shark Drill SQL Sqoop Sentry* Oozie ZooKeeper Flume Knox* Falcon* Whirr Data Integration & Access HttpFS Hue Enterprise-grade Interoperability Performance Multi-tenancy Security Operational MapR-FS MapR-DB • Standard file access • Standard database access • Pluggable services • Broad developer support • Enterprise security authorization • Wire-level authentication • Data governance • Ability to support predictive analytics, real-time database operations, and support high arrival rate data • Ability to logically divide a cluster to support different use cases, job types, user groups, and administrators • 2X to 7X higher performance • Consistent, low latency • High availability • Data protection • Disaster recovery * Certification/support planned
  • 24. Management APACHE HADOOP & OSS ECOSYSTEM ZooKeeper Oozie Hue Pig Hive Impala Shark Flume HttpFS Cascading Solr Juju Mahout MLLib Storm Spark Streaming Sqoop Whirr HBase YARN Drill Tez Knox Sentry Spark Falcon Revolution R Enterprise and MapR Hadoop Edge Node BI and other Apps R on the Desktop Via Browsers Mobile YARN MapReduce MapR Data Platform
  • 25. Introducing: Allen Day Principal Data Scientist MapR Technologies @allenday
  • 26. Talk Overview • Agile Real-time Stats • R + Storm github.com/allenday/R-Storm • DEMO • How to do it? • Q & A @allenday Agile Methods Advanced Statistics Continuous Real-time Delivery github.com/allenday/hadoop-summit-r-storm-demo-public
  • 27. Architecting R into the Storm Application Development Process
  • 28. Quick intro • Allen Day, Principal Data Scientist [ @allenday ] 7yr Hadoop dev, 12yr R dev/author PhD, Human Genetics, UCLA Medicine
  • 29. What’s Storm? What’s R? • What’s Storm? – Processes a data stream. Akin to UNIX pipe + tee & merge commands – Runs on a cluster. Fault-tolerant and designed to scale out – Used for: real-time analytics & machine learning • What’s R? – Programming language with advanced statistics libraries – Does not scale out. Can scale up – Used for: prototyping, data modeling, visualization How to combine these?
  • 30. R outside, Storm inside: not practical. Why? • Model-building and QA is done on data snapshots • However, R => Hadoop is realistic. Key difference: referenced data can be static – Use MapR snapshots for dev and QA – See also: RHIPE (Purdue) and RHadoop (RevolutionAnalytics) R Storm User
  • 31. Storm outside, R inside: a good fit • Enables separation of concerns – Independently manage modeling, ops timelines, and version control – Integrate as needed • Enables role specialization – R built-ins allow faster iteration and more concise stats-type code – Do DevOps with specific SW engineering tech, e.g. Java Storm R User
  • 32. Q: Who really likes statistics? A: Baseball fans A: Team Managers = Portfolio Managers
  • 33. Famous Vintage Data Oakland Athletics 2002 Season 20 consecutive wins – the current record
  • 34. Goal: Detect “Moneyball” 2002 Winning Streak
  • 35. Methods: Change Point Detection Find natural breakpoints in a time-series set of data points R packages implement this: changepoint: more sensitive, but not streaming bcp: streaming, but less sensitive
  • 36. GIFs to MapR Filesystem Methods: R+Storm Demo Architecture Storm Bolt R online change point detector Storm Bolt (write to Jetty) Oakland A’s Data (accelerated) Jetty Webserver Browser (D3.js) Us github.com/allenday/hadoop-summit-r-storm-demo-public
  • 37. 50-game sliding window/buffer to detect change points Cumulative history with detected break points Raw data (score difference between A’s and opponent) Demo
  • 38. Methods Details: How it’s done • Uses R-Storm binding github.com/allenday/R-Storm – Storm package on CRAN cran.r-project.org/web/packages/Storm Storm (dev team) R (stats team) Storm (dev team, pure Java) Producer Consumer
  • 39. Methods Details: Easy integration R: lambda function storm = Storm$new(); storm$lambda = function(s) { t = s$tuple; t$output = vector(length=1); t$output[1] = "tada!"; s$emit(t); } Storm: extend ShellBolt public static class MyRBolt extends ShellBolt implements IRichBolt { public MyRBolt() { super("Rscript", "my.R"); } }
  • 40. Results • Change points are identified, but none for winning streak – Not using score difference, anyway • Time to integrate with the modeling team! – Send @kunpognr or @allenday a pull request on GitHub • Applicable to many other use cases, e.g. – Security (fraud detection, intrusion detection) – Marketing (intent to purchase / social media streams) – Customer Support (help desk voice calls) Discussion
  • 41. Polling Question #5 How important will Real-Time analytical apps become? (choose one) –Uncertain –Not important –Necessary –Critical
  • 42. Real-Time and Internet of Things: Foundation of a Compelling Trend for 2015 Big Data Analytics Meets The Internet of Things –Transactions + –Human Behavior + –Internet of Things: Sensors … and extracting value using –Traditional Statistics –Visualization –Machine Learning … plus adaptability –Real-Time –Agile Modeling & Fast Model Execution –Production Capable, Stable and Secure –Rapidly-Evolving Data Science
  • 43. Where Does Real Time Impact The Analytical Lifecycle? Data Engineering –Collection and Ingest –“Blending” Modeling –Aggregation, Segmentation & Exploration –Model Development & Optimization –Testing & Validation Operationalization –Deployment & Scoring –Delivery –Monitoring & Evaluation
  • 44. Typical Analytical Lifecycle Ingest Explore Model Deploy Score Act Measure Model Score
  • 45. More Complex Event Driven Analytical Cycle Historic Ingest Explore Model Deploy Act Measure Data Analytics & Process Design Scoring Event Ingest Transform “Blend” Append Improve Enrich
  • 46. Real-Time Analytics Best Practices Develop a Common Lexicon for Real-Time Discriminate Between Needs of Each Stage in Lifecycle –Data Ingest & Manipulation and Enrichment –Data Source / Repository Integration Needs –Processes that “Fill the Lake” –Process that “Act on the Stream” –Vastly different computationally –Big differences in data ingest volume & latency Start with Tractable Goals –Anticipate Growing Requirements: Microbatched >> Interactive >> Autonomy Build for today, Architect for tomorrow
  • 47. Real-Time Realities Plan for Diverse Needs –Real-Time Score Retrieval, Scoring, Modeling –Wide-Ranging Performance – Microbatch – Interactive - Autonomous Fragmentation –Data Delivery Systems Pre-Exist –Will Vary Widely by Vertical Market –Competing Proprietary Solutions Growing Demand –Numerous high-value targets –“The next step”: Put big data analytics to work
  • 48. What’s Needed Real-Time Performance… plus… Agility –Deployment models –Organization –Infrastructure –Analytics Manageable Costs –Hadoop –Open Source R Production Platform(s) –Proven –Performant
  • 49. Next Steps… www.revolutionanalytics.com Whitepaper: Revolution R for Hadoop: http://www.revolutionanalytics.com/whitepaper/delivering-value-big-data-revolution-r-enterprise-and-hadoop …or http://bit.ly/1ua43bu www.maprtech.com Resources: R foundation URL: www.r-project.org Download Revolution R: http://mran.revolutionanalytics.com/download/ Learn about Apache Storm: https://storm.apache.org/ R-Storm bindings: github.com/allenday/R-Storm Storm package on CRAN: cran.r-project.org/web/packages/Storm
  • 50. Thank you. www.revolutionanalytics.com 1.855.GET.REVO Twitter: @RevolutionR
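
Slides 35 and 37 describe change point detection over a 50-game sliding window of score differences. The idea can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the demo's code: the class name `SlidingWindowChangePoint` and its methods are hypothetical, the detector is a simple single mean-shift search (least squares), not the algorithm used by the `changepoint` or `bcp` R packages, and the sample values are made up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: single mean-shift search over a fixed-size sliding window. */
final class SlidingWindowChangePoint {
    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();

    SlidingWindowChangePoint(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must be >= 2");
        }
        this.capacity = capacity;
    }

    /** Adds one observation, evicting the oldest one when the window is full. */
    void add(double value) {
        if (window.size() == capacity) {
            window.removeFirst();
        }
        window.addLast(value);
    }

    /**
     * Returns the split index k (1..n-1) that minimizes the summed squared
     * error of x[0..k) and x[k..n) around their own means, or -1 if n < 2.
     * It always returns some split; a real detector would also need a
     * threshold or penalty to decide whether the shift is significant.
     */
    static int bestSplit(double[] x) {
        int n = x.length;
        if (n < 2) {
            return -1;
        }
        double total = 0.0, totalSq = 0.0;
        for (double v : x) {
            total += v;
            totalSq += v * v;
        }
        double leftSum = 0.0, leftSq = 0.0;
        double bestCost = Double.POSITIVE_INFINITY;
        int best = -1;
        for (int k = 1; k < n; k++) {
            leftSum += x[k - 1];
            leftSq += x[k - 1] * x[k - 1];
            double rightSum = total - leftSum;
            double rightSq = totalSq - leftSq;
            // Sum of squared deviations of each segment from its own mean.
            double cost = (leftSq - leftSum * leftSum / k)
                        + (rightSq - rightSum * rightSum / (n - k));
            if (cost < bestCost) {
                bestCost = cost;
                best = k;
            }
        }
        return best;
    }

    /** Runs the split search on the current window contents. */
    int bestSplitInWindow() {
        double[] x = window.stream().mapToDouble(Double::doubleValue).toArray();
        return bestSplit(x);
    }

    public static void main(String[] args) {
        // Window size 50 mirrors the slide; the score differences are invented.
        SlidingWindowChangePoint detector = new SlidingWindowChangePoint(50);
        double[] scoreDiffs = {-1, -2, -1, -2, 3, 4, 3, 4};
        for (double d : scoreDiffs) {
            detector.add(d);
        }
        System.out.println("best split index: " + detector.bestSplitInWindow());
    }
}
```

In the architecture on slide 36 this logic lives in R behind a Storm bolt; the Java version here only illustrates the windowing and split search, which is why it avoids any Storm or R dependency.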