David Smith
Revolution Analytics
@revodavid

Real-Time Big Data Analytics
From Deployment to Production

Buzzword Bingo!

 REAL TIME
 BIG DATA
 PREDICTIVE ANALYTICS

Photo: Sarah&Boston (flickr: pocheco) Creative Commons BY-SA 2.0

Predictive Analytics Model

Factors -> Scoring Rules -> Scores

Factors
 User ID
 Browser
 Time/Date / Location
 Any known information
 Previous purchases
 Friend data

Scoring Rules
 Decision Tree
 Logistic Regression
 Neural Network
 Predictive Model
 K-means clustering
 Ensemble Model

Scores
 Product of most interest
 Offer of most likely sale
 Most relevant Selection
 Prediction or link
 Forecast sale value
 Optimal Bid

"IO VAPOURA" by Jaya Prime flickr.com/photos/sanjayaprime/4924462993 CC-BY 2.0

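The factors-to-score flow on this slide can be illustrated with a small sketch. It assumes a logistic regression as the scoring rule; the factor names, intercept, and weights are hypothetical and do not come from the slides.

```python
import math

# Hypothetical coefficients, as if estimated offline; not taken from the slides.
INTERCEPT = -2.0
WEIGHTS = {
    "previous_purchases": 0.8,
    "friends_who_bought": 0.5,
    "is_mobile_browser": -0.3,
}

def score(factors):
    """Return a purchase probability for one visitor from a dict of factors."""
    # Missing factors default to 0.0 so partially known visitors can still be scored.
    z = INTERCEPT + sum(w * factors.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

visitor = {"previous_purchases": 2, "friends_who_bought": 1}
print(round(score(visitor), 3))
```

Any of the other scoring rules listed (a decision tree, an ensemble) would slot into the same place: a function from known factors to a score.
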
Real-time Deployment
1. Data distillation
2. Model development and validation
3. Model deployment
4. Real-time model scoring
5. Model refresh

"CLOCK" by Heiko Klingele flickr.com/photos/divdax/3458668053/ CC-BY 2.0

1. Data Distillation in Hadoop

Unstructured Data (Log Files, Sensor Streams, Language Text)
 -> HDFS Load
 -> Map-Reduce (rmr)
 -> Structured Data
 -> Analytics Data Mart

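The slide names rmr for the map-reduce step. As a language-neutral illustration of the same idea, here is a minimal in-memory sketch in Python of distilling raw log lines into one structured row per user. The log format (timestamp, user ID, event) and the sample lines are assumptions made for the example; a real job would run on Hadoop rather than in a single process.

```python
from collections import defaultdict

def map_line(line):
    """Map step: emit (user_id, 1) for each purchase event; skip malformed lines."""
    parts = line.split()
    if len(parts) >= 3 and parts[2] == "purchase":
        yield parts[1], 1

def reduce_counts(user_id, values):
    """Reduce step: collapse all values for one key into a single structured row."""
    return user_id, sum(values)

def map_reduce(lines):
    """Run map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())

# Hypothetical raw log lines: timestamp, user ID, event.
RAW_LOG = [
    "2012-10-23T10:00:01 u42 view",
    "2012-10-23T10:00:05 u42 purchase",
    "2012-10-23T10:01:00 u7 purchase",
    "garbled line",
    "2012-10-23T10:02:00 u42 purchase",
]

purchases_per_user = map_reduce(RAW_LOG)
print(purchases_per_user)
```
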
2. The Model Development Cycle

Structured Data -> Model Development Cycle -> Predictive Model

The cycle:
 Feature Selection / Sampling / Aggregation
 Variable Transformation
 Model Estimation
 Model Refinement
 Model Comparison / Benchmarking

R White Paper: bit.ly/r-is-hot

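A minimal sketch of the sampling, estimation, and comparison/benchmarking steps of the cycle. The data is synthetic and the two candidate models (a mean baseline and a one-variable least-squares line) are toy stand-ins chosen for the example; none of this comes from the slides.

```python
import random

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Candidate model: ordinary least squares with one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def rmse(model, xs, ys):
    """Root mean squared error of a fitted model on held-out data."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

# Synthetic structured data (assumption): y depends linearly on x plus noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Sampling: hold out the last 50 rows for benchmarking.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# Estimation and comparison: fit each candidate, then rank by held-out error.
candidates = {"mean baseline": fit_mean, "linear": fit_line}
results = {name: rmse(fit(train_x, train_y), test_x, test_y)
           for name, fit in candidates.items()}
best = min(results, key=results.get)
print(best, results)
```

Refinement would repeat the loop with new features or transformations until the benchmark stops improving.
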
3. Deployment Options

Factors -> Scores

 Unknown factors
   SQL / Rules Engine
   Code (C++, Java, R, Hadoop)
   PMML Engine
 Factors known in advance
   Batch Lookup Tables

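The two deployment paths can be sketched as follows: when the factors are known in advance, scores are computed in batch and served from a lookup table; when they are not, the model code runs at request time. The scoring function, its coefficients, and the customer IDs are hypothetical.

```python
import math

def score(previous_purchases):
    """Hypothetical scoring rule: logistic function of one factor."""
    z = -1.5 + 0.6 * previous_purchases
    return 1.0 / (1.0 + math.exp(-z))

# Factors known in advance: purchase history for existing customers (made-up data).
CUSTOMER_HISTORY = {"u1": 0, "u2": 3}

# Batch step: precompute every score once, offline.
LOOKUP_TABLE = {cid: score(n) for cid, n in CUSTOMER_HISTORY.items()}

def get_score(customer_id, previous_purchases=None):
    """Serve from the lookup table if possible, otherwise score on the fly."""
    if customer_id in LOOKUP_TABLE:
        return LOOKUP_TABLE[customer_id]
    if previous_purchases is None:
        raise ValueError("unknown customer and no factors supplied")
    return score(previous_purchases)

print(get_score("u2"))          # served from the batch lookup table
print(get_score("visitor", 1))  # computed at request time
```

The lookup path costs one dictionary read per request; the on-the-fly path pays for model evaluation each time but handles factors that only exist at request time.
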
Why did I buy that blender?
 Just browsing in the mall
 TV ad / magazine ad
 Coupon in the mail
 “Just moved” promo email
 Webstore recommendation
 Browsing catalog

UpStream: Attribution Modeling

4. Model Scoring

     • Exploratory data analysis
     • Time-to-event models
     • GAM survival models

UPSTREAM DATA FORMAT
     • ETL
     • Marketing channel data
     • Behavioral variables
     • Promotional data
     • Overlay data

CUSTOM VARIABLES (PMML)
     • Scoring for inference
     • Scoring for prediction
     • 5 billion scores per day per retailer

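To put the 5 billion scores per day figure in real-time terms, the arithmetic below converts it to an average rate of about 57,870 scores per second. The worker estimate is an illustration only: it assumes each worker scores one record at a time at a fixed latency, and both the latency and the peak factor are hypothetical parameters.

```python
import math

SCORES_PER_DAY = 5_000_000_000   # figure from the slide
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

avg_scores_per_second = SCORES_PER_DAY / SECONDS_PER_DAY

def workers_needed(latency_ms, peak_factor=1.0):
    """Serial workers needed to sustain the load at a given per-score latency."""
    per_worker_per_second = 1000.0 / latency_ms
    return math.ceil(avg_scores_per_second * peak_factor / per_worker_per_second)

print(f"{avg_scores_per_second:,.0f} scores per second on average")
print(workers_needed(latency_ms=1.0), "workers at 1 ms per score")
```
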
5. Model refresh

 Factors
 Scores
 Actual Outcomes

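One way to decide when to refresh is to compare past scores with the outcomes that actually followed. The sketch below uses the Brier score (mean squared error between predicted probability and the 0/1 outcome) and flags a refresh when it exceeds a threshold. The choice of metric and the threshold of 0.25, which is what a constant prediction of 0.5 would earn, are assumptions for the example and are not stated in the slides.

```python
def brier_score(scores, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / len(scores)

def needs_refresh(scores, outcomes, threshold=0.25):
    """Flag the model for re-estimation when recent predictions are too far off."""
    return brier_score(scores, outcomes) > threshold

# Hypothetical recent scores and what actually happened (1 = sale, 0 = no sale).
print(needs_refresh([0.9, 0.1, 0.8], [1, 0, 1]))   # predictions track outcomes
print(needs_refresh([0.9, 0.9, 0.1], [0, 0, 1]))   # predictions have drifted
```
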
Big Data                 Real Time
Kilobytes/Sec            Seconds
Megabytes/Sec            Milliseconds
Gigabytes, Terabytes     Minutes
Petabytes                Minutes
Exabytes                 Hours

 PREDICTIVE ANALYTICS
 BIG DATA
 REAL TIME

Real-Time Big Data Predictive Analytics:
From Deployment to Production

David Smith
@revodavid

The leading enterprise provider of software and services for Open Source R

Booth 618 / Office Hours Weds 1:30PM

www.revolutionanalytics.com
+1 650 646 9545
Twitter: @RevolutionR

Contenu connexe

Tendances

Predicting Customer Behavior With Big Data
Predicting Customer Behavior With Big Data Predicting Customer Behavior With Big Data
Predicting Customer Behavior With Big Data Pactera_US
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introductionSujaMaryD
 
Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsRavi Teja
 
On Big Data Analytics - opportunities and challenges
On Big Data Analytics - opportunities and challengesOn Big Data Analytics - opportunities and challenges
On Big Data Analytics - opportunities and challengesPetteri Alahuhta
 
Come diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniCome diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniDonatella Cambosu
 
Big data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & ChallengesBig data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & ChallengesShilpi Sharma
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science suresh sood
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challengesfazail amin
 
Ehr challenges [bigdata]
Ehr challenges [bigdata]Ehr challenges [bigdata]
Ehr challenges [bigdata]Nesma Almoazamy
 
BigData Analytics_1.7
BigData Analytics_1.7BigData Analytics_1.7
BigData Analytics_1.7Rohit Mittal
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big datahktripathy
 
How to design ai functions to the cloud native infra
How to design ai functions to the cloud native infraHow to design ai functions to the cloud native infra
How to design ai functions to the cloud native infraChun Myung Kyu
 
Data science and visualization lab presentation
Data science and visualization lab presentationData science and visualization lab presentation
Data science and visualization lab presentationiHub Research
 
BIG DATA & DATA ANALYTICS
BIG  DATA & DATA  ANALYTICSBIG  DATA & DATA  ANALYTICS
BIG DATA & DATA ANALYTICSNAGARAJAGIDDE
 
Big data Presentation
Big data PresentationBig data Presentation
Big data PresentationAswadmehar
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data AnalyticsS P Sajjan
 
Demystify big data data science
Demystify big data  data scienceDemystify big data  data science
Demystify big data data scienceMahesh Kumar CV
 

Tendances (20)

Predicting Customer Behavior With Big Data
Predicting Customer Behavior With Big Data Predicting Customer Behavior With Big Data
Predicting Customer Behavior With Big Data
 
Unit i big data introduction
Unit  i big data introductionUnit  i big data introduction
Unit i big data introduction
 
Apply (Big) Data Analytics & Predictive Analytics to Business Application
Apply (Big) Data Analytics & Predictive Analytics to Business ApplicationApply (Big) Data Analytics & Predictive Analytics to Business Application
Apply (Big) Data Analytics & Predictive Analytics to Business Application
 
Tools for Unstructured Data Analytics
Tools for Unstructured Data AnalyticsTools for Unstructured Data Analytics
Tools for Unstructured Data Analytics
 
On Big Data Analytics - opportunities and challenges
On Big Data Analytics - opportunities and challengesOn Big Data Analytics - opportunities and challenges
On Big Data Analytics - opportunities and challenges
 
Come diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo PellegriniCome diventare data scientist - Paolo Pellegrini
Come diventare data scientist - Paolo Pellegrini
 
Big data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & ChallengesBig data - Key Enablers, Drivers & Challenges
Big data - Key Enablers, Drivers & Challenges
 
Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science  Data Science Innovations : Democratisation of Data and Data Science
Data Science Innovations : Democratisation of Data and Data Science
 
Big data deep learning: applications and challenges
Big data deep learning: applications and challengesBig data deep learning: applications and challenges
Big data deep learning: applications and challenges
 
Ehr challenges [bigdata]
Ehr challenges [bigdata]Ehr challenges [bigdata]
Ehr challenges [bigdata]
 
BigData Analytics_1.7
BigData Analytics_1.7BigData Analytics_1.7
BigData Analytics_1.7
 
Exploring Big Data Analytics Tools
Exploring Big Data Analytics ToolsExploring Big Data Analytics Tools
Exploring Big Data Analytics Tools
 
Big Data analytics
Big Data analyticsBig Data analytics
Big Data analytics
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
How to design ai functions to the cloud native infra
How to design ai functions to the cloud native infraHow to design ai functions to the cloud native infra
How to design ai functions to the cloud native infra
 
Data science and visualization lab presentation
Data science and visualization lab presentationData science and visualization lab presentation
Data science and visualization lab presentation
 
BIG DATA & DATA ANALYTICS
BIG  DATA & DATA  ANALYTICSBIG  DATA & DATA  ANALYTICS
BIG DATA & DATA ANALYTICS
 
Big data Presentation
Big data PresentationBig data Presentation
Big data Presentation
 
Presentation on Big Data Analytics
Presentation on Big Data AnalyticsPresentation on Big Data Analytics
Presentation on Big Data Analytics
 
Demystify big data data science
Demystify big data  data scienceDemystify big data  data science
Demystify big data data science
 

En vedette

Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An OverviewMachinePulse
 
Mobile Commerce - como aprender, medir e converter
Mobile Commerce - como aprender, medir e converterMobile Commerce - como aprender, medir e converter
Mobile Commerce - como aprender, medir e converterVictor Lima
 
The 2012 Future of Open Source Survey Results
The 2012 Future of Open Source Survey ResultsThe 2012 Future of Open Source Survey Results
The 2012 Future of Open Source Survey ResultsBlack Duck by Synopsys
 
Gercek Zamanli Odeme Sistemleri Analitigi
Gercek Zamanli Odeme Sistemleri AnalitigiGercek Zamanli Odeme Sistemleri Analitigi
Gercek Zamanli Odeme Sistemleri AnalitigiHakan ERDOGAN
 
Conversion Optimization with Realtime Payment Analytics - 2014-11-19
Conversion Optimization with Realtime Payment Analytics - 2014-11-19Conversion Optimization with Realtime Payment Analytics - 2014-11-19
Conversion Optimization with Realtime Payment Analytics - 2014-11-19Hakan ERDOGAN
 
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspectiveBig Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspectiveEMC
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopSkillspeed
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning Pranya Prabhakar
 
Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010Daniel Abadi
 
Design patterns in MapReduce
Design patterns in MapReduceDesign patterns in MapReduce
Design patterns in MapReduceAkhilesh Joshi
 
Predictive Analytics on Big Data. DIY or BUY?
Predictive Analytics on Big Data. DIY or BUY?Predictive Analytics on Big Data. DIY or BUY?
Predictive Analytics on Big Data. DIY or BUY?Apigee | Google Cloud
 
Population Health Management, Predictive Analytics, Big Data and Text Analytics
Population Health Management, Predictive Analytics, Big Data and Text AnalyticsPopulation Health Management, Predictive Analytics, Big Data and Text Analytics
Population Health Management, Predictive Analytics, Big Data and Text AnalyticsFrank Wang
 
Evaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsEvaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsTeradata Aster
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiProfessor Lili Saghafi
 
Predictive Analytics - How to get stuff out of your Crystal Ball
Predictive Analytics - How to get stuff out of your Crystal BallPredictive Analytics - How to get stuff out of your Crystal Ball
Predictive Analytics - How to get stuff out of your Crystal BallDATAVERSITY
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoopVarun Narang
 

En vedette (20)

Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Mobile Commerce - como aprender, medir e converter
Mobile Commerce - como aprender, medir e converterMobile Commerce - como aprender, medir e converter
Mobile Commerce - como aprender, medir e converter
 
The 2012 Future of Open Source Survey Results
The 2012 Future of Open Source Survey ResultsThe 2012 Future of Open Source Survey Results
The 2012 Future of Open Source Survey Results
 
R2DOCX example
R2DOCX exampleR2DOCX example
R2DOCX example
 
Gercek Zamanli Odeme Sistemleri Analitigi
Gercek Zamanli Odeme Sistemleri AnalitigiGercek Zamanli Odeme Sistemleri Analitigi
Gercek Zamanli Odeme Sistemleri Analitigi
 
Conversion Optimization with Realtime Payment Analytics - 2014-11-19
Conversion Optimization with Realtime Payment Analytics - 2014-11-19Conversion Optimization with Realtime Payment Analytics - 2014-11-19
Conversion Optimization with Realtime Payment Analytics - 2014-11-19
 
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspectiveBig Data & Analytics MapReduce/Hadoop – A programmer’s perspective
Big Data & Analytics MapReduce/Hadoop – A programmer’s perspective
 
Predicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via HadoopPredicting Consumer Behaviour via Hadoop
Predicting Consumer Behaviour via Hadoop
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
 
Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010Daniel Abadi HadoopWorld 2010
Daniel Abadi HadoopWorld 2010
 
Design patterns in MapReduce
Design patterns in MapReduceDesign patterns in MapReduce
Design patterns in MapReduce
 
Predictive Analytics on Big Data. DIY or BUY?
Predictive Analytics on Big Data. DIY or BUY?Predictive Analytics on Big Data. DIY or BUY?
Predictive Analytics on Big Data. DIY or BUY?
 
Population Health Management, Predictive Analytics, Big Data and Text Analytics
Population Health Management, Predictive Analytics, Big Data and Text AnalyticsPopulation Health Management, Predictive Analytics, Big Data and Text Analytics
Population Health Management, Predictive Analytics, Big Data and Text Analytics
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
Evaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics PlatformsEvaluating Big Data Predictive Analytics Platforms
Evaluating Big Data Predictive Analytics Platforms
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
From Business Intelligence to Predictive Analytics
From Business Intelligence to Predictive AnalyticsFrom Business Intelligence to Predictive Analytics
From Business Intelligence to Predictive Analytics
 
Big data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili SaghafiBig data and Predictive Analytics By : Professor Lili Saghafi
Big data and Predictive Analytics By : Professor Lili Saghafi
 
Predictive Analytics - How to get stuff out of your Crystal Ball
Predictive Analytics - How to get stuff out of your Crystal BallPredictive Analytics - How to get stuff out of your Crystal Ball
Predictive Analytics - How to get stuff out of your Crystal Ball
 
Seminar_Report_hadoop
Seminar_Report_hadoopSeminar_Report_hadoop
Seminar_Report_hadoop
 

Similaire à Real-time Big Data Analytics: From Deployment to Production

Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionReal-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionRevolution Analytics
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Kun Le
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Sybase Türkiye
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBigDataCloud
 
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase
 
Today's BI and Data Mining ecosystem
Today's BI and Data Mining ecosystemToday's BI and Data Mining ecosystem
Today's BI and Data Mining ecosystemJosep Arroyo
 
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data European Data Forum
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureOdinot Stanislas
 
Today's bi and data mining ecosystem v2
Today's bi and data mining ecosystem v2Today's bi and data mining ecosystem v2
Today's bi and data mining ecosystem v2Josep Arroyo
 
Analyzing Multi-Structured Data
Analyzing Multi-Structured DataAnalyzing Multi-Structured Data
Analyzing Multi-Structured DataDataWorks Summit
 
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL DatabaseScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL DatabaseScaleBase
 
Scaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data DistributionScaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data DistributionScaleBase
 
Marshall Sponder - Social Media Monitoring Analytics - Measure13
Marshall Sponder - Social Media Monitoring Analytics - Measure13Marshall Sponder - Social Media Monitoring Analytics - Measure13
Marshall Sponder - Social Media Monitoring Analytics - Measure13Our Social Times
 
IBM Cognos - Vad handlar egentligen prediktiv analys om?
IBM Cognos - Vad handlar egentligen prediktiv analys om?IBM Cognos - Vad handlar egentligen prediktiv analys om?
IBM Cognos - Vad handlar egentligen prediktiv analys om?IBM Sverige
 
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...Foviance
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureOdinot Stanislas
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
BSC 3362 - Big Data and Social Analytics - IOD Conference (IBM)
BSC 3362 - Big Data and Social Analytics - IOD Conference (IBM)BSC 3362 - Big Data and Social Analytics - IOD Conference (IBM)
BSC 3362 - Big Data and Social Analytics - IOD Conference (IBM)Mark Heid
 
An Integrated Framework for Parameter-based Optimization of Scientific Workflows
An Integrated Framework for Parameter-based Optimization of Scientific WorkflowsAn Integrated Framework for Parameter-based Optimization of Scientific Workflows
An Integrated Framework for Parameter-based Optimization of Scientific Workflowsvijayskumar
 

Similaire à Real-time Big Data Analytics: From Deployment to Production (20)

Real-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to ProductionReal-time Big Data Analytics: From Deployment to Production
Real-time Big Data Analytics: From Deployment to Production
 
Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...Best practices for building and deploying predictive models over big data pre...
Best practices for building and deploying predictive models over big data pre...
 
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
Farklı Ortamlarda Büyük Veri Kavramı -Big Data by Sybase
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
 
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOutScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
ScaleBase Webinar 8.16: ScaleUp vs. ScaleOut
 
Today's BI and Data Mining ecosystem
Today's BI and Data Mining ecosystemToday's BI and Data Mining ecosystem
Today's BI and Data Mining ecosystem
 
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
EDF2013: Selected Talk: Bryan Drexler: The 80/20 Rule and Big Data
 
Big Data and Implications on Platform Architecture
Big Data and Implications on Platform ArchitectureBig Data and Implications on Platform Architecture
Big Data and Implications on Platform Architecture
 
Today's bi and data mining ecosystem v2
Today's bi and data mining ecosystem v2Today's bi and data mining ecosystem v2
Today's bi and data mining ecosystem v2
 
Analyzing Multi-Structured Data
Analyzing Multi-Structured DataAnalyzing Multi-Structured Data
Analyzing Multi-Structured Data
 
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL DatabaseScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
ScaleBase Webinar: Methods and Challenges to Scale Out a MySQL Database
 
Scaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data DistributionScaling MySQL: Benefits of Automatic Data Distribution
Scaling MySQL: Benefits of Automatic Data Distribution
 
Marshall Sponder - Social Media Monitoring Analytics - Measure13
Marshall Sponder - Social Media Monitoring Analytics - Measure13Marshall Sponder - Social Media Monitoring Analytics - Measure13
Marshall Sponder - Social Media Monitoring Analytics - Measure13
 
Barak regev
Barak regevBarak regev
Barak regev
 
IBM Cognos - Vad handlar egentligen prediktiv analys om?
IBM Cognos - Vad handlar egentligen prediktiv analys om?IBM Cognos - Vad handlar egentligen prediktiv analys om?
IBM Cognos - Vad handlar egentligen prediktiv analys om?
 
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
Neil Mason presents on Data Mining and Predictive Analytics at Emetrics San F...
 
Big Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the FutureBig Data Beyond Hadoop*: Research Directions for the Future
Big Data Beyond Hadoop*: Research Directions for the Future
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
BSC 3362 - Big Data and Social Analytics - IOD Conference (IBM)
BSC 3362 - Big Data and Social Analytics - IOD Conference (IBM)BSC 3362 - Big Data and Social Analytics - IOD Conference (IBM)
BSC 3362 - Big Data and Social Analytics - IOD Conference (IBM)
 
An Integrated Framework for Parameter-based Optimization of Scientific Workflows
An Integrated Framework for Parameter-based Optimization of Scientific WorkflowsAn Integrated Framework for Parameter-based Optimization of Scientific Workflows
An Integrated Framework for Parameter-based Optimization of Scientific Workflows
 

Plus de Revolution Analytics

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudRevolution Analytics
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureRevolution Analytics
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondRevolution Analytics
 
The Value of Open Source Communities
The Value of Open Source CommunitiesThe Value of Open Source Communities
The Value of Open Source CommunitiesRevolution Analytics
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with RRevolution Analytics
 
The Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceThe Business Economics and Opportunity of Open Source Data Science
The Business Economics and Opportunity of Open Source Data ScienceRevolution Analytics
 
Taking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudTaking R Analytics to SQL and the Cloud
Taking R Analytics to SQL and the CloudRevolution Analytics
 
The Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorThe Network structure of R packages on CRAN & BioConductor
The Network structure of R packages on CRAN & BioConductorRevolution Analytics
 
The network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalThe network structure of cran 2015 07-02 final
The network structure of cran 2015 07-02 finalRevolution Analytics
 
Simple Reproducibility with the checkpoint package
Simple Reproducibilitywith the checkpoint packageSimple Reproducibilitywith the checkpoint package
Simple Reproducibility with the checkpoint packageRevolution Analytics
 

Real-time Big Data Analytics: From Deployment to Production

  • 1. Real-Time Big Data Analytics: From Deployment to Production. David Smith, Revolution Analytics, @revodavid
  • 2. [No text on slide]
  • 3. Buzzword Bingo! REAL TIME / BIG DATA / PREDICTIVE ANALYTICS
  • 4. Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0
  • 5. Predictive Analytics Model
       Factors: User ID; Browser; Time/Date / Location; Any known information; Previous purchases; Friend data
       Scoring Rules (Predictive Model): Decision Tree; Logistic Regression; Neural Network; K-means clustering; Ensemble Model
       Scores: Product of most interest; Offer of most likely sale; Most relevant Selection; Prediction or link; Forecast sale value; Optimal Bid
       Image: "IO VAPOURA" by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
  • 6. Real-time Deployment
       1. Data distillation
       2. Model development and validation
       3. Model deployment
       4. Real-time model scoring
       5. Model refresh
       Image: "CLOCK" by Heiko Klingele, flickr.com/photos/divdax/3458668053/, CC-BY 2.0
  • 7. 1. Data Distillation in Hadoop
       Diagram labels: Log Files; Sensor Streams; Language Text; Unstructured Data; HDFS Load; Map-Reduce; rmr; Structured Data; Analytics Data Mart
  • 8. 2. The Model Development Cycle
       Cycle steps: Feature Selection; Sampling; Aggregation; Variable Transformation; Model Estimation; Model Refinement; Model Comparison / Benchmarking
       Other labels: Structured Data; Predictive Model; R
       White Paper: bit.ly/r-is-hot
  • 9. 3. Deployment Options
       Unknown factors: SQL / Rules Engine; Code (C++, Java, R, Hadoop); PMML Engine
       Factors known in advance: Batch Lookup Tables
       Inputs and outputs: Factors; Scores
  • 10. Why did I buy that blender? Just browsing in the mall; TV ad / magazine ad; Coupon in the mail; “Just moved” promo email; Webstore recommendation; Browsing catalog
  • 12. 4. Model Scoring
       • Exploratory data analysis • Time-to-event models • GAM survival models
       UPSTREAM DATA / CUSTOM VARIABLES / FORMAT (PMML)
       • ETL • Marketing channel data • Behavioral variables • Promotional data • Overlay data
       • Scoring for inference • Scoring for prediction • 5 billion scores per day per retailer
  • 13. 5. Model refresh: Factors; Scores; Actual Outcomes
  • 14. Big Data: Kilobytes/Sec; Megabytes/Sec; Gigabytes; Terabytes; Petabytes; Exabytes
       Real Time: Seconds; Milliseconds; Minutes; Minutes; Hours
  • 16. Real-Time Big Data Predictive Analytics: From Deployment to Production. David Smith, @revodavid
       The leading enterprise provider of software and services for Open Source R
       Booth 618 / Office Hours Weds 1:30PM
       www.revolutionanalytics.com; +1 650 646 9545; Twitter: @RevolutionR
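
Slide 9 separates factors known in advance (scores can be precomputed in batch and served from lookup tables) from unknown factors (scores must be computed on demand by a rules or scoring engine). The sketch below illustrates that split in Python. It is only an illustration under stated assumptions: the logistic-regression coefficients, the two factor names, and the helper names `score`, `build_lookup_table`, and `score_request` are all hypothetical and do not come from the talk.

```python
import math
from typing import Dict, Tuple

# Hypothetical scoring rule: a logistic regression with made-up coefficients.
INTERCEPT = -2.0
COEFFS: Dict[str, float] = {
    "previous_purchases": 0.8,
    "is_mobile_browser": -0.3,
}


def score(factors: Dict[str, float]) -> float:
    """Compute a score on demand from the factors of one request."""
    z = INTERCEPT + sum(COEFFS.get(name, 0.0) * value
                        for name, value in factors.items())
    return 1.0 / (1.0 + math.exp(-z))


def build_lookup_table(max_purchases: int) -> Dict[Tuple[int, int], float]:
    """Batch step: precompute scores for factor combinations known in advance."""
    table = {}
    for purchases in range(max_purchases + 1):
        for mobile in (0, 1):
            table[(purchases, mobile)] = score(
                {"previous_purchases": purchases, "is_mobile_browser": mobile}
            )
    return table


# Built offline, then shipped to the serving tier.
LOOKUP = build_lookup_table(max_purchases=5)


def score_request(purchases: int, mobile: int) -> float:
    """Serve from the lookup table when possible, else score on demand."""
    key = (purchases, mobile)
    if key in LOOKUP:
        return LOOKUP[key]
    return score({"previous_purchases": purchases, "is_mobile_browser": mobile})
```

The table lookup costs one dictionary access per request, which is why it suits factors that can be enumerated ahead of time; combinations outside the table fall back to the on-demand path.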

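Slide 13 lists factors, scores, and actual outcomes as the ingredients of model refresh. One plausible way to use them, sketched here as an assumption rather than as the speaker's method, is to compare recent scores against the outcomes that actually occurred and flag the model for refresh when the error drifts above a baseline. The Brier score, the `tolerance` default, and the function names are choices made for this illustration.

```python
from typing import Sequence


def brier_score(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted scores and 0/1 outcomes."""
    if len(scores) != len(outcomes) or not scores:
        raise ValueError("scores and outcomes must be non-empty and equal length")
    return sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / len(scores)


def needs_refresh(scores: Sequence[float], outcomes: Sequence[int],
                  baseline: float, tolerance: float = 0.05) -> bool:
    """Flag a refresh when recent error exceeds the baseline by more than tolerance.

    `baseline` is the error measured when the model was validated (hypothetical).
    """
    return brier_score(scores, outcomes) > baseline + tolerance
```

A scheduled job could call `needs_refresh` on each new batch of observed outcomes and trigger re-estimation of the model when it returns `True`.
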
Editor's notes

  1. Get out your buzzword bingo cards!
  2. Data as the “new oil”: a valuable commodity. Big Data is crude oil: messy, hard to get at, with contaminants in it.
  3. Start off with stuff we know in real time.
  4. Model development process. Not just about computational speed; also about developer productivity.
  5. Demographics: consumer, product, market. Actions: web clicks, email clicks, mobile app usage, call center logs, social, search … Outcomes: impressions, touches, orders (retail, online, mobile). Strategic allocation.
  6. Outcome is “buying” instead of “dying”
  7. From Revolution Analytics. We help companies deploy predictive models created in R to real-time production systems.