Real-time Big Data Analytics: From Deployment to Production


  1. Real-Time Big Data Analytics: From Deployment to Production. David Smith, Revolution Analytics, @revodavid
  2. [Slide with no text]
  3. Buzzword Bingo! REAL TIME / BIG DATA / PREDICTIVE ANALYTICS
  4. Photo: Sarah&Boston (flickr: pocheco), Creative Commons BY-SA 2.0
  5. Predictive Analytics. Factors: user ID; browser; time/date/location; previous purchases; friend data; any known information. Predictive model: decision tree; logistic regression; neural network; k-means clustering; scoring rules; ensemble model. Scores (prediction): product of most interest; offer of most likely sale; most relevant selection or link; forecast sale value; optimal bid. Image: ”IO VAPOURA” by Jaya Prime, flickr.com/photos/sanjayaprime/4924462993, CC-BY 2.0
  6. Real-time Deployment: (1) Data distillation; (2) Model development and validation; (3) Model deployment; (4) Real-time model scoring; (5) Model refresh. Image: "CLOCK" by Heiko Klingele, flickr.com/photos/divdax/3458668053/, CC-BY 2.0
  7. 1. Data Distillation in Hadoop. Unstructured data (log files, sensor streams, language/text) → HDFS load → Map-Reduce (rmr) → structured data → analytics data mart
  8. 2. The Model Development Cycle. Structured data → predictive model, through a cycle that includes: feature selection; sampling; aggregation; variable transformation; model estimation; model refinement; model comparison / benchmarking. R white paper: bit.ly/r-is-hot
  9. 3. Deployment Options. Unknown factors: SQL / rules engine; code (C++, Java, R, Hadoop); PMML engine. Factors known in advance: batch; lookup tables. Output: scores
  10. Why did I buy that blender? Just browsing in the mall; TV ad / magazine ad; coupon in the mail; “Just moved” promo email; webstore recommendation; browsing catalog
  11. UpStream: Attribution Modeling
  12. 4. Model Scoring. UpStream data format: ETL; marketing channel data; behavioral variables; promotional data; overlay data. Custom variables (PMML). Model: exploratory data analysis; time-to-event models; GAM survival models. Scoring: scoring for inference; scoring for prediction; 5 billion scores per day per retailer
  13. 5. Model refresh. Factors; scores; actual outcomes
  14. Big Data: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes. Real Time: Seconds, Milliseconds, Minutes, Hours
  15. PREDICTIVE ANALYTICS / BIG DATA / REAL TIME
  16. Real-Time Big Data Predictive Analytics: From Deployment to Production. David Smith, @revodavid. The leading enterprise provider of software and services for Open Source R. Booth 618 / Office Hours Weds 1:30PM. www.revolutionanalytics.com, +1 650 646 9545, Twitter: @RevolutionR
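
Slide 9 contrasts two deployment paths: precomputing scores in batch when the factors are known in advance, and evaluating the model at request time when they are not. The sketch below is one possible illustration in Python, not the deck's implementation. The coefficients, factor names, and function names are all hypothetical; a real deployment would load a model fitted offline (for example in R, or exported as PMML).

```python
import math
from itertools import product

# Hypothetical logistic-regression coefficients (assumed for illustration only).
COEFFICIENTS = {
    "intercept": -2.0,
    "returning_customer": 1.2,
    "mobile_browser": -0.4,
    "minutes_on_site": 0.15,
}

def score(factors):
    """Return the predicted purchase probability for one dict of factor values."""
    z = COEFFICIENTS["intercept"]
    for name, value in factors.items():
        z += COEFFICIENTS[name] * value
    return 1.0 / (1.0 + math.exp(-z))

def build_lookup_table():
    """Batch path: score every combination of the factors known in advance."""
    table = {}
    for returning, mobile in product((0, 1), repeat=2):
        table[(returning, mobile)] = score(
            {"returning_customer": returning, "mobile_browser": mobile}
        )
    return table

LOOKUP = build_lookup_table()

def score_request(returning, mobile, minutes_on_site=None):
    """Serve a score: table lookup if possible, model evaluation otherwise."""
    if minutes_on_site is None:
        # All factors were known in advance, so the score is already computed.
        return LOOKUP[(returning, mobile)]
    # A factor only known at request time forces a real-time evaluation.
    return score(
        {
            "returning_customer": returning,
            "mobile_browser": mobile,
            "minutes_on_site": minutes_on_site,
        }
    )
```

The lookup path costs one dictionary access per request, while the real-time path pays for a model evaluation but can use factors that only exist once the request arrives.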

Editor's Notes

  • Get out your buzzword bingo cards!
  • Data as the “new oil”: a valuable commodity. Big Data is crude oil: messy, hard to get at, and full of contaminants.
  • Start off with stuff we know in real time.
  • Model development process: not just about computational speed, but also about developer productivity.
  • Demographics: consumer, product, market. Actions: web clicks, email clicks, mobile app usage, call center logs, social, search … Outcomes: impressions, touches, orders (retail, online, mobile). Strategic allocation.
  • Outcome is “buying” instead of “dying”
  • From Revolution Analytics. We help companies deploy predictive models created in R to real-time production systems.
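
The note that the outcome is “buying” instead of “dying” refers to the time-to-event models on slide 12: survival analysis applied to purchases. As a rough illustration only, the sketch below computes a Kaplan-Meier curve, a much simpler estimator than the GAM survival models named on the slide, and not the method used in the deck. The data and the function name are hypothetical.

```python
def kaplan_meier(durations, purchased):
    """Estimate P(no purchase yet) after each observed purchase time.

    durations: time from first marketing touch to purchase or end of observation.
    purchased: True if a purchase was observed, False if the customer was
               still unconverted when observation ended (censored).
    """
    event_times = sorted({t for t, e in zip(durations, purchased) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Customers still being observed and not yet converted just before t.
        at_risk = sum(1 for d in durations if d >= t)
        # Purchases that happened exactly at t.
        events = sum(1 for d, e in zip(durations, purchased) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: days until purchase, with two censored customers.
days = [2, 3, 3, 5, 8, 8]
bought = [True, True, False, True, False, True]
curve = kaplan_meier(days, bought)
```

Each point on the curve is the estimated share of customers who have not yet bought by that day, so one minus the value is the cumulative conversion estimate.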
