Machine Learning in Big Data
- Look forward or be left behind
V. William Porto
Hadoop Summit Dublin 2016
Overview of RedPoint Global
2  RedPoint Global Inc. 2016 Confidential
Launched in 2006
Founded and staffed by industry veterans
Headquarters: Wellesley, Massachusetts
Offices in US, UK, Australia, Philippines
Global customer base
Serves most major industries
Overview of RedPoint Global
MAGIC QUADRANT: Data Quality
MAGIC QUADRANT: Integrated Marketing Management
MAGIC QUADRANT: Multichannel Campaign Management
MAGIC QUADRANT: Digital Marketing Hubs
FORRESTER WAVE™: Cross-channel Campaign Management
FORRESTER WAVE™: Data Quality Solutions
With apologies to Gary Larson
Hadoop
Machine Learning – why bother?
“If you have always done it that way, it is probably wrong” - Charles Kettering
Machine Learning – keeping ahead of the curve
• Three basic tenets for success in today’s world
• Prediction - you need to learn and use what you’ve learned
• Optimization - the world is a dynamic place
• Automation - because people don’t scale well
Machine Learning – what really is it all about?
• Learning vs. instruction
• Humans learn instinctively – computers not so much
• Intelligent Systems
• Memory
• Prediction (modeling)
• Assessment
• Feedback
• Adaptation
Data Modeling – what, why, how
• Regression – what happened in the past
• Prediction – what will happen in the future
“Prediction is very difficult – especially if it’s about the future”
- Niels Bohr
Data Modeling – what, why, how
The wide world of data modeling
• Supervised models
• you have historical data and known correlated outputs (truth)
• Unsupervised models
• historical data, but may not have (or trust) associated outputs
Decision Trees
Major Assumption: the world is discrete
• fast, easy to understand, no linearity assumptions
• ‘human time’ required, unbalanced and/or large trees
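To make the "world is discrete" assumption concrete, here is a minimal sketch of the smallest possible decision tree: a single threshold split (a "stump") on one numeric feature. The feature values and labels are hypothetical illustration data, not taken from the talk, and `best_split` is a name introduced here.

```python
def best_split(values, labels):
    """Find the threshold for the rule 'predict 1 if value <= threshold else 0'
    that makes the fewest mistakes. Returns (threshold, error_count)."""
    best = None
    for threshold in sorted(set(values)):
        errors = sum((v <= threshold) != bool(y) for v, y in zip(values, labels))
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


# Hypothetical data: days since last purchase, and whether the customer stayed.
days_since_purchase = [5, 10, 20, 40, 60, 80]
retained = [1, 1, 1, 0, 0, 0]

threshold, errors = best_split(days_since_purchase, retained)
print(f"retained if days <= {threshold} ({errors} training errors)")
```

A full tree repeats this search inside each resulting branch, which is where the unbalanced and/or large trees mentioned above can come from.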
Standard Linear Models
Assumption: the world is linear
• the real world really isn’t linear
• not all errors are equal
• easy to get misleading results
Which line is best?
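One way to see how a linear fit can mislead is to watch a single outlier move the line. The sketch below implements ordinary least squares for one input variable in plain Python; the data points are made up for illustration and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
clean_ys = [0.0, 1.0, 2.0, 3.0, 4.0]      # exactly y = x
outlier_ys = [0.0, 1.0, 2.0, 3.0, 14.0]   # same data, last point off by 10

clean_slope, clean_intercept = fit_line(xs, clean_ys)
outlier_slope, outlier_intercept = fit_line(xs, outlier_ys)

print("clean fit:   y = %.1f x + %.1f" % (clean_slope, clean_intercept))
print("outlier fit: y = %.1f x + %.1f" % (outlier_slope, outlier_intercept))
```

Because squared error penalizes large residuals heavily, one bad point out of five triples the slope here. Both lines are "best" under least squares; neither answers which one is best for the business question.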
Generalized ‘Non-Linear’ Models
Assumptions
• underlying functional mapping is known
• all errors are equal
• data is ‘well-conditioned’
• ‘standard’ error distribution
• Polynomials
• Exponentials (e.g., Gaussian, Poisson)
• Piece-wise linear
Non-Linear Models
Assumption: data is representative
• ‘universal’ modeling tools
• fast execution
• no linearity assumptions
• lots of parameters, many techniques
• difficult to explain
Artificial Neural Network
User Story: Predict Retention / Attrition
Historical Behavioral Data
Customer Rating | Retention | Customer Name | Loyalty Member | Days Since Last Purchase | Immediate Relatives | Household Children | Customer ID | Latest Purchase Price | Latest Purchase Item ID | Region Code | Customer Capture Method | Customer Contact Code | Domicile
1 1 Allen, Geraldine yes 29 0 2 24160 211.39 B5 MW 2 6 St Louis, MO
1 1 Anderson, Harry no 48 0 3 19952 26.55 E12 NE 3 New York, NY
1 1 Andrews, Cynthia yes 63 1 0 13502 77.95 D7 NE 10 6 Hudson, NY
1 0 Andrews, Thomas Jr no 39 0 0 112050 0 A36 SW Los Angeles, CA
1 1 Appleton, Mary yes 53 2 3 11769 51.49 C101 NE D Bayside, Queens, NY
1 0 Ashbury, Jeffrey no 47 1 0 PC 17757 29.99 C62 C64 NE 124 New York, NY
1 1 Aston, Mrs. yes 18 1 0 PC 17757 29.99 C62 C64 NE 4 New York, NY
1 1 Barber, Ellen yes 26 0 2 19877 78.85 S 6
1 1 Barkley, Henry no 80 0 0 27042 30 A23 NE B Yorktown, PA
1 0 Baumann, David no 0 0 PC 17318 25.99 NE New York, NY
1 1 Bazzeno, Alice yes 32 0 1 11813 76.95 D15 C 8 34
1 0 Beattie, Mr. Samuel no 36 0 0 13050 75.29 C6 C A 11 Winnipeg, MN
1 1 Beckworth, June yes 47 1 1 11751 52.49 D35 NE 5 New York, NY
1 1 Behr, John no 26 0 0 111369 30 C148 NE 5 New York, NY
1 1 Biden, Roseanne yes 42 0 0 PC 17757 127.99 C 4
1 1 Bird, Ellen yes 29 0 0 PC 17483 18.95 C97 S 8
1 0 Birnbaum, Jason no 25 0 0 13905 26 C 148 San Francisco, CA
User Story: Predict Customer Retention / Attrition
Machine Learning Processing Chain - Training
User Story: Predict Retention / Attrition
Machine Learning Processing Chain - Prediction
Reward predicted ‘retainees’ with targeted product offerings
Give potential attrition customers special incentives to stay with the business
User Story: Accurate vs. Useful Prediction
Sparse data + Least-Squares (Linear) Classifier
• Task: predict chance of purchasing a sundry item
• Result: ‘best’ model always predicts “none”
• Analysis: LS algorithm assumes all errors are equal
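Bread | Cake & Pie | Chocolate | Coffee | Cookie | Diesel | Juice & Smoothies | Lubricants | Milk | Other Bakery | Premium | Sandwich | Snack | Tea | Total Transaction | Total Revenue
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3000
0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2000
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 1800
0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 4800
0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 100
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1828
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 16460
0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1000
0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1500
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 4600
0 | 0 | 0 | 0 | 0 | 11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 11 | 19381.5
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1860
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 3000
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 9838.82
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 22 | 11000
0 | 0 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 19 | 18225
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 500
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 800
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 7 | 7990
0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 3820
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 55 | 43230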
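The gap between an accurate model and a useful one can be shown with a few lines of arithmetic. The sketch below uses hypothetical labels (5 buyers in 100 transactions), not the table above, and compares a model that always predicts "none" with one that flags some buyers at the cost of false alarms.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual buyers (label 1) that the model found."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / sum(y_true)


# Hypothetical sparse outcome: 5 purchases out of 100 transactions.
y_true = [1] * 5 + [0] * 95

always_none = [0] * 100                          # the 'best' least-squares answer
useful_pred = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85  # finds 4 buyers, 10 false alarms

print("always none:", accuracy(y_true, always_none), recall(y_true, always_none))
print("useful:     ", accuracy(y_true, useful_pred), recall(y_true, useful_pred))
```

The constant model scores 95% accuracy while finding no buyers at all; the second model is less accurate (89%) yet finds 80% of them. Treating every error as equal hides that difference.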
Clustering/Segmentation – group think
Collaborative Filtering
Relationship Matrix
Personalization – not really
!=
Clustering/Segmentation
Similarity?
Customer | Browser | Gender | Age Sector | Income Sector | Married | Children | Homeowner | Recent Baby Clothes Purchase
George | IE9 | M | 0 | A | N | 0 | 1 | N
Carol | Chrome | F | 1 | B | Y | 1 | 0 | Y
Mary | IE9 | F | 0 | A | N | 1 | 0 | Y
Dist(George,Carol) = 8
Dist(George,Mary) = 4
Dist(Carol,Mary) = 4
Can you afford to target (George, Mary) the same way as (Carol, Mary)?
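The slide does not name the distance measure, but the three values shown are consistent with a simple count of mismatched attributes (a Hamming distance). The sketch below reproduces them under that assumption and also lists which attributes differ, which is the point of the question: equal distances can hide very different kinds of difference.

```python
FIELDS = ("browser", "gender", "age_sector", "income_sector",
          "married", "children", "homeowner", "recent_baby_clothes")

customers = {
    "George": ("IE9", "M", 0, "A", "N", 0, 1, "N"),
    "Carol": ("Chrome", "F", 1, "B", "Y", 1, 0, "Y"),
    "Mary": ("IE9", "F", 0, "A", "N", 1, 0, "Y"),
}


def hamming(a, b):
    """Number of attributes on which two customers differ."""
    return sum(x != y for x, y in zip(a, b))


def differing_fields(a, b):
    """Names of the attributes on which two customers differ."""
    return [name for name, x, y in zip(FIELDS, a, b) if x != y]


for left, right in (("George", "Carol"), ("George", "Mary"), ("Carol", "Mary")):
    a, b = customers[left], customers[right]
    print(left, right, hamming(a, b), differing_fields(a, b))
```

George and Mary differ on gender, children, home ownership and the recent purchase, while Carol and Mary differ on browser, age sector, income sector and marital status. Both pairs are at distance 4.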
Clustering/Segmentation
Basic Question – which one describes the data the best?
[Figure panels: Raw data (how many clusters are there?), Two Clusters, Four Clusters, Six Clusters]
Clustering/Segmentation with Statistics
• relatively simple
• data distribution assumptions
• initialization dependencies
[Figure: three scatter plots on 0-100 axes: Raw Data, Ellipsoidal Clustering, K-Means Clustering]
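The "initialization dependencies" bullet can be demonstrated with a tiny one-dimensional k-means. This is a minimal sketch on made-up points, with `kmeans_1d` as a hypothetical helper; the same data converges to two different groupings depending only on the starting centers.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


points = [1.0, 2.0, 11.0, 12.0, 21.0, 22.0]   # three natural groups, but k = 2

from_low_start = kmeans_1d(points, [1.0, 2.0])
from_high_start = kmeans_1d(points, [21.0, 22.0])

print(from_low_start)    # centers when both seeds start in the low group
print(from_high_start)   # centers when both seeds start in the high group
```

Starting low merges the middle group with the high one; starting high merges it with the low one. Both results are stable, so the algorithm alone cannot say which segmentation is right.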
Clustering/Segmentation – data driven
• let the data speak for itself
• multiple data projection ‘views’
• important boundary relationships (“swing voters”)
Customer Demographics
User Story: Clustering / Segmentation
ML Clustering - Training
ML Clustering - Processing New Data
Model Selection – how to choose?
• Basic Model Type (prediction or segmentation)
• inputs + correlated outputs
• inputs only?
• Basic Questions:
• what to use for my problem?
• parameters?
• is this the best choice?
• could I do better, and how?
Optimization – Evolving better solutions
• Simulated Evolution
• fast, efficient search
• always have a solution
• arbitrary ‘evaluation’ functions
• can start with existing solution(s)
• Variation – alter model type, parameters
• Assessment – how well does the model work?
• Selection – survival of the fittest
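The variation / assessment / selection cycle above can be sketched as a small loop. This is an illustrative toy, not the optimizer used in the talk: it evolves a list of numeric parameters, and the function name `evolve`, the Gaussian mutation, and the example evaluation function are all assumptions made for the example.

```python
import random


def evolve(evaluate, initial, generations=50, offspring=10, step=0.5, seed=0):
    """Keep the best solution found so far and try random variations of it."""
    rng = random.Random(seed)
    best, best_score = list(initial), evaluate(initial)
    history = [best_score]                     # best score after each generation
    for _ in range(generations):
        # Variation: perturb every parameter of the current best solution.
        candidates = [[g + rng.gauss(0.0, step) for g in best]
                      for _ in range(offspring)]
        for candidate in candidates:
            score = evaluate(candidate)        # Assessment
            if score > best_score:             # Selection: survival of the fittest
                best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


target = [3.0, -2.0]


def evaluate(params):
    """Arbitrary score: negative absolute error (not differentiable at the optimum)."""
    return -sum(abs(p - t) for p, t in zip(params, target))


best, best_score, history = evolve(evaluate, initial=[0.0, 0.0])
print(best, best_score)
```

Because the current best is never discarded, a usable solution exists at every generation, and the loop can be seeded with an existing solution simply by passing it as `initial`.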
Evolutionary Optimization – Evaluation Function
• can use any measurable data
• no continuity assumptions
• no differentiability assumptions
• no symmetry assumptions
Evaluation table (rows: Prediction, columns: Reality (Truth))
Prediction | Sunshine | Hurricane
Sunshine | 20 | -1000
Hurricane | 5 | 50
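The Sunshine/Hurricane values (20, -1000, 5, 50) can be used directly as an evaluation function. The sketch below assumes the first value of each pair is the prediction and the second is reality, so that predicting sunshine when a hurricane arrives costs 1000; that orientation is an inference from the numbers, and the ten-day weather sequence is made up.

```python
# Payoff for each (prediction, reality) pair; orientation is an assumption.
PAYOFF = {
    ("sunshine", "sunshine"): 20,
    ("sunshine", "hurricane"): -1000,
    ("hurricane", "sunshine"): 5,
    ("hurricane", "hurricane"): 50,
}


def total_payoff(predictions, truths):
    """Sum the payoff of every prediction against what actually happened."""
    return sum(PAYOFF[(p, t)] for p, t in zip(predictions, truths))


def accuracy(predictions, truths):
    """Fraction of days predicted correctly."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)


# Hypothetical history: nine sunny days and one hurricane.
truths = ["sunshine"] * 9 + ["hurricane"]
always_sunshine = ["sunshine"] * 10
always_hurricane = ["hurricane"] * 10

for name, preds in (("always sunshine", always_sunshine),
                    ("always hurricane", always_hurricane)):
    print(name, accuracy(preds, truths), total_payoff(preds, truths))
```

The forecaster that is right 90% of the time ends with a payoff of -820, while the one that is right 10% of the time ends with +95. An optimizer scored on this table, rather than on accuracy, would prefer the second.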
User Story: Optimizing Classification Models
Task: Predict Retention/Attrition
[Figure: Model Performance Optimization, Performance (0-100) vs. Generation (0-6)]
Classification Accuracy: 62.00, 70.2, 72.3, 73.4, 75.2
Test Set Error (RMS): 34.8, 28.8, 24.5, 22.1, 20.9
17 Potential input features
(customer demographics)
2 outputs (retention/attrition)
1300 Training Samples (50 – 50, A / B Split)
1300 Test Samples (naïve test data)
Use Case – Fully Adaptive Feedback (Next Best Offer)
DB: Historical User Behavior (stimulus/response)
Train / Update Model
Non-Adaptive (Fixed) Mode: Randomized A/B/C Offer Selection
Adaptive ML Mode: ML Prediction Offer Selection
Operation (Trigger)
Ad / Offer (stimulus)
Feedback Cycle
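The feedback cycle in this diagram can be sketched as a small class with the two modes it names. This is a toy stand-in, not the product's implementation: the "ML prediction" is replaced by a smoothed observed response rate per offer, and the class name `OfferSelector`, its methods, and the response counts are hypothetical.

```python
import random


class OfferSelector:
    """Chooses an offer in fixed (random A/B/C) or adaptive mode and learns from feedback."""

    def __init__(self, offers, adaptive, seed=0):
        self.offers = list(offers)
        self.adaptive = adaptive
        self.rng = random.Random(seed)
        self.shown = {offer: 0 for offer in self.offers}
        self.accepted = {offer: 0 for offer in self.offers}

    def response_rate(self, offer):
        # Smoothed acceptance rate, so an offer never shown starts at 0.5.
        return (self.accepted[offer] + 1) / (self.shown[offer] + 2)

    def select(self):
        if not self.adaptive:
            return self.rng.choice(self.offers)            # randomized A/B/C
        return max(self.offers, key=self.response_rate)    # predicted best offer

    def feedback(self, offer, accepted):
        # Store the stimulus/response pair; this is the model update step.
        self.shown[offer] += 1
        self.accepted[offer] += int(accepted)


selector = OfferSelector(["A", "B", "C"], adaptive=True)
# Hypothetical history: each offer shown 10 times with 1, 7 and 3 acceptances.
for offer, wins in (("A", 1), ("B", 7), ("C", 3)):
    for i in range(10):
        selector.feedback(offer, accepted=i < wins)

print(selector.select())
```

In practice the two modes are often mixed, so that some traffic keeps exploring offers the model currently rates poorly; that blend is a design choice and is not stated in the slides.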
Five Keys to Successful Machine Learning
• Let the data speak for itself – don’t force fit your models
• Remember, not all errors are equal – use this to your advantage
• True learning requires continual adaptation!
• Automate the process with feedback – remove the “man-in-the-loop”
• Trust the optimization process – it really works!
Q&A
Contact Info
Visit: www.redpoint.net
Bill Porto
Sr. Engineering Analyst
RedPoint Global Inc.
vwporto@redpoint.net
Want More Information about this topic?
Fill out your card or go to redpoint.net/hadoopeurope

Contenu connexe

Tendances

Point processing
Point processingPoint processing
Point processingpanupriyaa7
 
M2M - Machine to Machine Technology
M2M - Machine to Machine TechnologyM2M - Machine to Machine Technology
M2M - Machine to Machine TechnologySamip jain
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningLior Rokach
 
IoT Communication Protocols
IoT Communication ProtocolsIoT Communication Protocols
IoT Communication ProtocolsPradeep Kumar TS
 
Cyber Physical System: Architecture, Applications and Research Challenges
Cyber Physical System: Architecture, Applicationsand Research ChallengesCyber Physical System: Architecture, Applicationsand Research Challenges
Cyber Physical System: Architecture, Applications and Research ChallengesSyed Hassan Ahmed
 
IoT Enabling Technologies
IoT Enabling TechnologiesIoT Enabling Technologies
IoT Enabling TechnologiesPrakash Honnur
 
Internet of things
Internet of thingsInternet of things
Internet of thingsPalak Sood
 
IOT System Management with NETCONF-YANG.pptx
IOT System Management with NETCONF-YANG.pptxIOT System Management with NETCONF-YANG.pptx
IOT System Management with NETCONF-YANG.pptxArchanaPandiyan
 
Biometric security using cryptography
Biometric security using cryptographyBiometric security using cryptography
Biometric security using cryptographySampat Patnaik
 
20 Latest Computer Science Seminar Topics on Emerging Technologies
20 Latest Computer Science Seminar Topics on Emerging Technologies20 Latest Computer Science Seminar Topics on Emerging Technologies
20 Latest Computer Science Seminar Topics on Emerging TechnologiesSeminar Links
 
Using prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannUsing prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannswapnac12
 
digital image processing
digital image processingdigital image processing
digital image processingAbinaya B
 
Digital Image Processing Fundamental
Digital Image Processing FundamentalDigital Image Processing Fundamental
Digital Image Processing FundamentalThuong Nguyen Canh
 
Diabetes prediction using machine learning
Diabetes prediction using machine learningDiabetes prediction using machine learning
Diabetes prediction using machine learningdataalcott
 
Internet of Things (IOT) - Technology and Applications
Internet of Things (IOT) - Technology and ApplicationsInternet of Things (IOT) - Technology and Applications
Internet of Things (IOT) - Technology and ApplicationsDr. Mazlan Abbas
 
Frequency Domain Image Enhancement Techniques
Frequency Domain Image Enhancement TechniquesFrequency Domain Image Enhancement Techniques
Frequency Domain Image Enhancement TechniquesDiwaker Pant
 

Tendances (20)

Point processing
Point processingPoint processing
Point processing
 
M2M - Machine to Machine Technology
M2M - Machine to Machine TechnologyM2M - Machine to Machine Technology
M2M - Machine to Machine Technology
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Semantic Web
Semantic WebSemantic Web
Semantic Web
 
IoT Communication Protocols
IoT Communication ProtocolsIoT Communication Protocols
IoT Communication Protocols
 
Cyber Physical System: Architecture, Applications and Research Challenges
Cyber Physical System: Architecture, Applicationsand Research ChallengesCyber Physical System: Architecture, Applicationsand Research Challenges
Cyber Physical System: Architecture, Applications and Research Challenges
 
IoT Enabling Technologies
IoT Enabling TechnologiesIoT Enabling Technologies
IoT Enabling Technologies
 
Internet of things
Internet of thingsInternet of things
Internet of things
 
IOT System Management with NETCONF-YANG.pptx
IOT System Management with NETCONF-YANG.pptxIOT System Management with NETCONF-YANG.pptx
IOT System Management with NETCONF-YANG.pptx
 
Stroke Prediction
Stroke PredictionStroke Prediction
Stroke Prediction
 
Biometric security using cryptography
Biometric security using cryptographyBiometric security using cryptography
Biometric security using cryptography
 
20 Latest Computer Science Seminar Topics on Emerging Technologies
20 Latest Computer Science Seminar Topics on Emerging Technologies20 Latest Computer Science Seminar Topics on Emerging Technologies
20 Latest Computer Science Seminar Topics on Emerging Technologies
 
Using prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbannUsing prior knowledge to initialize the hypothesis,kbann
Using prior knowledge to initialize the hypothesis,kbann
 
Iot
IotIot
Iot
 
digital image processing
digital image processingdigital image processing
digital image processing
 
Digital Image Processing Fundamental
Digital Image Processing FundamentalDigital Image Processing Fundamental
Digital Image Processing Fundamental
 
Diabetes prediction using machine learning
Diabetes prediction using machine learningDiabetes prediction using machine learning
Diabetes prediction using machine learning
 
Internet of Things (IOT) - Technology and Applications
Internet of Things (IOT) - Technology and ApplicationsInternet of Things (IOT) - Technology and Applications
Internet of Things (IOT) - Technology and Applications
 
Frequency Domain Image Enhancement Techniques
Frequency Domain Image Enhancement TechniquesFrequency Domain Image Enhancement Techniques
Frequency Domain Image Enhancement Techniques
 
Cognitive computing
Cognitive computing Cognitive computing
Cognitive computing
 

En vedette

Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyDataWorks Summit
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningLars Marius Garshol
 
Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016alanfgates
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalCaserta
 
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016alanfgates
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015alanfgates
 
Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016alanfgates
 
Hortonworks apache training
Hortonworks apache trainingHortonworks apache training
Hortonworks apache trainingalanfgates
 
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015alanfgates
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...DataWorks Summit
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big DataDataWorks Summit
 
Py data scikit-production
Py data scikit-productionPy data scikit-production
Py data scikit-productionTuri, Inc.
 

En vedette (17)

Harnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case StudyHarnessing Hadoop Distuption: A Telco Case Study
Harnessing Hadoop Distuption: A Telco Case Study
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Fast Distributed Online Classification
Fast Distributed Online Classification Fast Distributed Online Classification
Fast Distributed Online Classification
 
Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016Hive2.0 big dataspain-nov-2016
Hive2.0 big dataspain-nov-2016
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobal
 
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
 
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
Hive & HBase for Transaction Processing Hadoop Summit EU Apr 2015
 
Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016Keynote apache bd-eu-nov-2016
Keynote apache bd-eu-nov-2016
 
Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
Hortonworks apache training
Hortonworks apache trainingHortonworks apache training
Hortonworks apache training
 
The Heterogeneous Data lake
The Heterogeneous Data lakeThe Heterogeneous Data lake
The Heterogeneous Data lake
 
Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015Hive acid-updates-strata-sjc-feb-2015
Hive acid-updates-strata-sjc-feb-2015
 
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
Apache Flume - Streaming data easily to Hadoop from any source for Telco oper...
 
Machine Learning in Big Data
Machine Learning in Big DataMachine Learning in Big Data
Machine Learning in Big Data
 
Real-World NoSQL Schema Design
Real-World NoSQL Schema DesignReal-World NoSQL Schema Design
Real-World NoSQL Schema Design
 
Py data scikit-production
Py data scikit-productionPy data scikit-production
Py data scikit-production
 

Similaire à Machine Learning in Big Data

Propensity Modelling for Banks
Propensity Modelling for BanksPropensity Modelling for Banks
Propensity Modelling for BanksProfinit
 
Advanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesAdvanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesDATAVERSITY
 
Presentation for the Nexus Conference on the Internet of Things and the Evolu...
Presentation for the Nexus Conference on the Internet of Things and the Evolu...Presentation for the Nexus Conference on the Internet of Things and the Evolu...
Presentation for the Nexus Conference on the Internet of Things and the Evolu...Lora Cecere
 
Supply Chain 2030: Presentation by Lora Cecere at CLX Conference
Supply Chain 2030: Presentation by Lora Cecere at CLX ConferenceSupply Chain 2030: Presentation by Lora Cecere at CLX Conference
Supply Chain 2030: Presentation by Lora Cecere at CLX ConferenceLora Cecere
 
Optimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckOptimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckTamrMarketing
 
Business and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupBusiness and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupKen Tucker
 
20151008 REx Predictive presentation v 1 0 - distributed
20151008 REx Predictive presentation v 1 0 - distributed20151008 REx Predictive presentation v 1 0 - distributed
20151008 REx Predictive presentation v 1 0 - distributedJefferson Lynch
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & MoreThe New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & MoreAggregage
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?Ganes Kesari
 
Future of Supply Chain Technologies
Future of Supply Chain TechnologiesFuture of Supply Chain Technologies
Future of Supply Chain TechnologiesLora Cecere
 
The Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninThe Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninInside Analysis
 
Entering the Data Analytics industry
Entering the Data Analytics industryEntering the Data Analytics industry
Entering the Data Analytics industryGramener
 
Clarity First - Problem Solving
Clarity First - Problem Solving Clarity First - Problem Solving
Clarity First - Problem Solving TKMG, Inc.
 
Powering Supply Chain Transformation Through Analytics Innovation
Powering Supply Chain Transformation Through Analytics InnovationPowering Supply Chain Transformation Through Analytics Innovation
Powering Supply Chain Transformation Through Analytics Innovationloracecere1
 
Organisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesOrganisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesBarry Magee
 
Organisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesOrganisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesBarry Magee
 
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)Chief Analytics Officer Forum
 

Similaire à Machine Learning in Big Data (20)

Propensity Modelling for Banks
Propensity Modelling for BanksPropensity Modelling for Banks
Propensity Modelling for Banks
 
Advanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use CasesAdvanced Analytics: Graph Database Use Cases
Advanced Analytics: Graph Database Use Cases
 
Presentation for the Nexus Conference on the Internet of Things and the Evolu...
Presentation for the Nexus Conference on the Internet of Things and the Evolu...Presentation for the Nexus Conference on the Internet of Things and the Evolu...
Presentation for the Nexus Conference on the Internet of Things and the Evolu...
 
Supply Chain 2030: Presentation by Lora Cecere at CLX Conference
Supply Chain 2030: Presentation by Lora Cecere at CLX ConferenceSupply Chain 2030: Presentation by Lora Cecere at CLX Conference
Supply Chain 2030: Presentation by Lora Cecere at CLX Conference
 
Optimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deckOptimize supply chains using machine learning superpowers webinar deck
Optimize supply chains using machine learning superpowers webinar deck
 
Business and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April MeetupBusiness and Data Analytics Collaborative April Meetup
Business and Data Analytics Collaborative April Meetup
 
20151008 REx Predictive presentation v 1 0 - distributed
20151008 REx Predictive presentation v 1 0 - distributed20151008 REx Predictive presentation v 1 0 - distributed
20151008 REx Predictive presentation v 1 0 - distributed
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & MoreThe New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
The New Tech Toolbelt: Digital Twins, IoT, Cobots, & More
 
Big Data and E-Commerce
Big Data and E-CommerceBig Data and E-Commerce
Big Data and E-Commerce
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?
 
Future of Supply Chain Technologies
Future of Supply Chain TechnologiesFuture of Supply Chain Technologies
Future of Supply Chain Technologies
 
The Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine LearninThe Sky’s the Limit – The Rise of Machine Learnin
The Sky’s the Limit – The Rise of Machine Learnin
 
Entering the Data Analytics industry
Entering the Data Analytics industryEntering the Data Analytics industry
Entering the Data Analytics industry
 
Clarity First - Problem Solving
Clarity First - Problem Solving Clarity First - Problem Solving
Clarity First - Problem Solving
 
Why Embracing Digital Transformation Keeps Manufacturers Ahead of the Competi...
Why Embracing Digital Transformation Keeps Manufacturers Ahead of the Competi...Why Embracing Digital Transformation Keeps Manufacturers Ahead of the Competi...
Why Embracing Digital Transformation Keeps Manufacturers Ahead of the Competi...
 
Powering Supply Chain Transformation Through Analytics Innovation
Powering Supply Chain Transformation Through Analytics InnovationPowering Supply Chain Transformation Through Analytics Innovation
Powering Supply Chain Transformation Through Analytics Innovation
 
Organisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesOrganisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven Practices
 
Organisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven PracticesOrganisational Transformation with Data-Driven Practices
Organisational Transformation with Data-Driven Practices
 
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
PWC presentation at the Chief Analytics Officer Forum East Coast USA (#CAOForum)
 

Plus de DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

Machine Learning in Big Data

  • 1. Machine Learning in Big Data - Look forward or be left behind
      V. William Porto
      Hadoop Summit Dublin 2016
  • 2. Overview of RedPoint Global
      RedPoint Global Inc. 2016 Confidential
      Launched in 2006
      Founded and staffed by industry veterans
      Headquarters: Wellesley, Massachusetts
      Offices in US, UK, Australia, Philippines
      Global customer base
      Serves most major industries
  • 3. Overview of RedPoint Global
      MAGIC QUADRANT: Data Quality
      MAGIC QUADRANT: Integrated Marketing Management
      MAGIC QUADRANT: Multichannel Campaign Management
      MAGIC QUADRANT: Digital Marketing Hubs
      FORRESTER WAVE™: Cross-channel Campaign Management
      FORRESTER WAVE™: Data Quality Solutions
  • 4. Hadoop (with apologies to Gary Larson)
      RedPoint Global Inc. 2015 Confidential
  • 5. Machine Learning – why bother?
      “If you have always done it that way, it is probably wrong” - Charles Kettering
  • 6. Machine Learning – keeping ahead of the curve
      • Three basic tenets for success in today’s world
      • Prediction - you need to learn and use what you’ve learned
      • Optimization - the world is a dynamic place
      • Automation - because people don’t scale well
  • 7. Machine Learning – what really is it all about?
      • Learning vs. instruction
      • Humans learn instinctively – computers not so much
      • Intelligent Systems
      • Memory
      • Prediction (modeling)
      • Assessment
      • Feedback
      • Adaptation
  • 8. Data Modeling – what, why, how
      • Regression – what happened in the past
      • Prediction – what will happen in the future
      “Prediction is very difficult – especially if it’s about the future” - Niels Bohr
  • 9. Data Modeling – what, why, how
      The wide world of data modeling
      • Supervised models
      • you have historical data and known correlated outputs (truth)
      • Unsupervised models
      • historical data, but may not have (or trust) associated outputs
  • 10. Decision Trees
      Major Assumption: the world is discrete
      • fast, easy to understand, no linearity assumptions
      • ‘human time’ required, unbalanced and/or large trees
  • 11. Standard Linear Models
      Assumption: the world is linear
      • the real world really isn’t linear
      • not all errors are equal
      • easy to get misleading results
      Which line is best?
  • 12. Generalized ‘Non-Linear’ Models
      Assumptions
      • underlying functional mapping is known
      • all errors are equal
      • data is ‘well-conditioned’
      • ‘standard’ error distribution
      • Polynomials
      • Exponentials (e.g., Gaussian, Poisson)
      • Piece-wise linear
  • 13. Non-Linear Models
      Assumption: data is representative
      • ‘universal’ modeling tools
      • fast execution
      • no linearity assumptions
      • lots of parameters, many techniques
      • difficult to explain
      Figure: Artificial Neural Network
  • 14. User Story: Predict Retention / Attrition
      Historical Behavioral Data
      Columns: Customer Rating | Retention | Customer Name | Loyalty Member | Days Since Last Purchase | Immediate Relatives | Household Children | Customer ID | Latest Purchase Price | Latest Purchase Item ID | Region Code | Customer Capture Method | Customer Contact Code | Domicile
      1 1 Allen, Geraldine yes 29 0 2 24160 211.39 B5 MW 2 6 St Louis, MO
      1 1 Anderson, Harry no 48 0 3 19952 26.55 E12 NE 3 New York, NY
      1 1 Andrews, Cynthia yes 63 1 0 13502 77.95 D7 NE 10 6 Hudson, NY
      1 0 Andrews, Thomas Jr no 39 0 0 112050 0 A36 SW Los Angeles, CA
      1 1 Appleton, Mary yes 53 2 3 11769 51.49 C101 NE D Bayside, Queens, NY
      1 0 Ashbury, Jeffrey no 47 1 0 PC 17757 29.99 C62 C64 NE 124 New York, NY
      1 1 Aston, Mrs. yes 18 1 0 PC 17757 29.99 C62 C64 NE 4 New York, NY
      1 1 Barber, Ellen yes 26 0 2 19877 78.85 S 6
      1 1 Barkley, Henry no 80 0 0 27042 30 A23 NE B Yorktown, PA
      1 0 Baumann, David no 0 0 PC 17318 25.99 NE New York, NY
      1 1 Bazzeno, Alice yes 32 0 1 11813 76.95 D15 C 8 34
      1 0 Beattie, Mr. Samuel no 36 0 0 13050 75.29 C6 C A 11 Winnipeg, MN
      1 1 Beckworth, June yes 47 1 1 11751 52.49 D35 NE 5 New York, NY
      1 1 Behr, John no 26 0 0 111369 30 C148 NE 5 New York, NY
      1 1 Biden, Roseanne yes 42 0 0 PC 17757 127.99 C 4
      1 1 Bird, Ellen yes 29 0 0 PC 17483 18.95 C97 S 8
      1 0 Birnbaum, Jason no 25 0 0 13905 26 C 148 San Francisco, CA
  • 15. User Story: Predict Customer Retention / Attrition
      Machine Learning Processing Chain - Training
  • 16. User Story: Predict Retention / Attrition
      Machine Learning Processing Chain - Prediction
      Reward predicted ‘retainees’ with targeted product offerings
      Give potential attrition customers special incentives to stay with the business
  • 17. User Story: Accurate vs. Useful Prediction
      Sparse data + Least-Squares (Linear) Classifier
      • Task: predict chance of purchasing a sundry item
      • Result: ‘best’ model always predicts “none”
      • Analysis: LS algorithm assumes all errors are equal
      Bread Cake & Pie Chocolate Coffee Cookie Diesel Juice & Smoothies Lubricants Milk Other Bakery Premium Sandwich Snack Tea Total Transaction Total Revenue
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3000
      0 0 0 0 0 3 0 0 0 0 0 0 0 0 3 2000
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 1800
      0 0 0 0 0 5 0 0 0 0 0 0 0 0 6 4800
      0 0 0 2 0 0 0 0 0 0 0 0 0 0 2 100
      0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1828
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 13 16460
      0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 1000
      0 0 0 0 0 2 0 0 0 0 0 0 0 0 2 1500
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 4600
      0 0 0 0 0 11 0 0 0 0 0 0 0 0 11 19381.5
      0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1860
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 3000
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 18 9838.82
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 22 11000
      0 0 0 0 0 5 0 0 0 0 0 0 0 0 19 18225
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 500
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 800
      0 0 0 0 0 0 0 0 0 0 0 1 0 0 7 7990
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 3820
      0 0 0 0 0 1 0 0 0 0 0 0 0 0 55 43230
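The "accurate vs. useful" point on slide 17 can be illustrated with a small sketch. The labels below are made up for illustration (they are not taken from the slide's table), and the helper names `accuracy` and `recall` are this sketch's own. It shows how a model that always predicts "none" can look accurate on sparse data while finding no buyers at all.

```python
# Hypothetical labels: 1 = customer bought the sundry item, 0 = did not.
# Purchases are rare (5 out of 100), loosely mimicking sparse purchase data.
labels = [1] * 5 + [0] * 95


def accuracy(predictions, truth):
    """Fraction of predictions that match the truth."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


def recall(predictions, truth):
    """Fraction of actual purchases that the model identified."""
    hits = sum(p == 1 and t == 1 for p, t in zip(predictions, truth))
    return hits / sum(truth)


# The "best" model under an all-errors-are-equal criterion: always predict "none".
always_none = [0] * len(labels)

acc = accuracy(always_none, labels)
rec = recall(always_none, labels)
print(f"accuracy={acc:.2f} recall={rec:.2f}")
```

High accuracy here comes entirely from the majority class; a criterion that weights a missed purchase more heavily than a false alarm would rank this model very differently.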
  • 18. Clustering/Segmentation – group think
      Collaborative Filtering
      Relationship Matrix
  • 19. Personalization – not really
      !=
  • 20. Clustering/Segmentation Similarity?
      Customer | Browser | Gender | Age Sector | Income Sector | Married | Children | Homeowner | Recent Baby Clothes Purchase
      George | IE9 | M | 0 | A | N | 0 | 1 | N
      Carol | Chrome | F | 1 | B | Y | 1 | 0 | Y
      Mary | IE9 | F | 0 | A | N | 1 | 0 | Y
      Dist(George,Carol) = 8
      Dist(George,Mary) = 4
      Dist(Carol,Mary) = 4
      Can you afford to target (George,Mary) the same way as (Carol,Mary)?
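The slide does not name the distance it uses, but the three values are consistent with simply counting the attributes on which two customers differ (a Hamming distance). The following is a minimal sketch under that assumption; the `customers` dictionary and the `dist` and `differing` helpers are names chosen here for illustration.

```python
# Attribute values for each customer, in the slide's column order
# (everything after the customer name).
customers = {
    "George": ("IE9", "M", 0, "A", "N", 0, 1, "N"),
    "Carol": ("Chrome", "F", 1, "B", "Y", 1, 0, "Y"),
    "Mary": ("IE9", "F", 0, "A", "N", 1, 0, "Y"),
}


def dist(a, b):
    """Count the attributes on which two customers differ (Hamming distance)."""
    return sum(x != y for x, y in zip(customers[a], customers[b]))


def differing(a, b):
    """Return the positions of the attributes that differ."""
    return [i for i, (x, y) in enumerate(zip(customers[a], customers[b])) if x != y]


print(dist("George", "Carol"), dist("George", "Mary"), dist("Carol", "Mary"))
print(differing("George", "Mary"), differing("Carol", "Mary"))
```

Under this reading, (George, Mary) and (Carol, Mary) are equally far apart, yet they differ on completely different attributes, which is the reason a single distance number can be a poor basis for targeting.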
  • 21. Clustering/Segmentation
      Basic Question – which one describes the data the best?
      How many clusters are there?
      Figure panels: Raw data, Two Clusters, Four Clusters, Six Clusters
  • 22. Clustering/Segmentation with Statistics
      • relatively simple
      • data distribution assumptions
      • initialization dependencies
      Figure panels: Raw Data, Ellipsoidal Clustering, K-Means Clustering
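The "initialization dependencies" bullet can be made concrete with a toy example. This is a minimal sketch of plain k-means (Lloyd's algorithm) in pure Python, not the tooling used in the talk; the four points and both sets of starting centers are invented for illustration.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means: alternate nearest-center assignment and center updates."""
    clusters = []
    for _step in range(iterations):
        clusters = [[] for _center in centers]
        for p in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(len(centers)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (unchanged if the cluster is empty).
        centers = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return clusters


def sse(clusters):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for cluster in clusters:
        mean = [sum(c) / len(cluster) for c in zip(*cluster)]
        total += sum(sum((a - m) ** 2 for a, m in zip(p, mean)) for p in cluster)
    return total


# Four corners of a wide rectangle (hypothetical data).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

poor = kmeans(points, centers=[(2, 0), (2, 1)])      # splits top from bottom
good = kmeans(points, centers=[(0, 0.5), (4, 0.5)])  # splits left from right
print(sse(poor), sse(good))
```

Both runs converge, but to different partitions with very different within-cluster error, so the result depends on where the centers started.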
  • 23. Clustering/Segmentation – data driven
      • let the data speak for itself
      • multiple data projection ‘views’
      • important boundary relationships (“swing voters”)
      Figure: Customer Demographics
  • 24. User Story: Clustering / Segmentation
      ML Clustering - Training
      ML Clustering – Processing New Data
  • 25. Model Selection – how to choose?
      • Basic Model Type (prediction or segmentation)
      • inputs + correlated outputs
      • inputs only?
      • Basic Questions:
      • what to use for my problem?
      • parameters?
      • is this the best choice?
      • could I do better, and how?
  • 26. Optimization – Evolving better solutions
      • Simulated Evolution
      • fast, efficient search
      • always have a solution
      • arbitrary ‘evaluation’ functions
      • can start with existing solution(s)
      • Variation – alter model type, parameters
      • Assessment – how well does the model work?
      • Selection – survival of the fittest
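The variation, assessment, and selection cycle on slide 26 can be sketched as a short loop. This is a generic illustration rather than RedPoint's implementation: the function `evolve`, its parameters, and the toy `score` function are all assumptions, and only numeric parameters are varied here (the slide also mentions altering the model type).

```python
import random


def evolve(evaluate, initial, generations=50, offspring_per_parent=4,
           step=0.5, seed=0):
    """Minimal evolutionary search over lists of numeric parameters."""
    rng = random.Random(seed)
    population = [list(p) for p in initial]  # can start from existing solutions
    history = []
    for _generation in range(generations):
        # Variation: each parent produces randomly perturbed offspring.
        candidates = list(population)
        for parent in population:
            for _child in range(offspring_per_parent):
                candidates.append([x + rng.gauss(0.0, step) for x in parent])
        # Assessment: rank every candidate with the evaluation function.
        candidates.sort(key=evaluate, reverse=True)
        # Selection: keep the fittest. Parents compete with their offspring,
        # so the best score never gets worse and a solution always exists.
        population = candidates[:len(initial)]
        history.append(evaluate(population[0]))
    return population[0], history


def score(params):
    """Hypothetical evaluation: higher is better, with a peak at (3, -2)."""
    x, y = params
    return -abs(x - 3.0) - abs(y + 2.0)


best, history = evolve(score, initial=[[0.0, 0.0], [1.0, 1.0]])
print(best, history[-1])
```

The evaluation function is only ever called, never differentiated, so any measurable score can be plugged in.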
  • 27. Evolutionary Optimization – Evaluation Function
      • can use any measurable data
      • no continuity assumptions
      • no differentiability assumptions
      • no symmetry assumptions
      Payoff matrix, Prediction vs. Reality (Truth), over Sunshine / Hurricane: 20, -1000, 5, 50
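An asymmetric evaluation function of the kind slide 27 describes can be sketched as a lookup table. The flattened slide text does not make clear which axis is prediction and which is reality, so the orientation below is an assumption: predicting sunshine when a hurricane arrives is taken to be the -1000 case. The forecast data is invented for illustration.

```python
# Payoff for each (prediction, reality) pair.
# ASSUMPTION: rows are predictions and columns are reality; the slide's
# four values are 20, -1000, 5, 50 in reading order.
PAYOFF = {
    ("sunshine", "sunshine"): 20,
    ("sunshine", "hurricane"): -1000,
    ("hurricane", "sunshine"): 5,
    ("hurricane", "hurricane"): 50,
}


def total_payoff(predictions, reality):
    """Evaluate a forecaster by summed payoff instead of error count."""
    return sum(PAYOFF[(p, r)] for p, r in zip(predictions, reality))


def accuracy(predictions, reality):
    """Fraction of correct predictions, treating all errors as equal."""
    return sum(p == r for p, r in zip(predictions, reality)) / len(reality)


# Hypothetical ten days: nine sunny, one hurricane.
reality = ["sunshine"] * 9 + ["hurricane"]
always_sunshine = ["sunshine"] * 10
always_hurricane = ["hurricane"] * 10

print(accuracy(always_sunshine, reality), total_payoff(always_sunshine, reality))
print(accuracy(always_hurricane, reality), total_payoff(always_hurricane, reality))
```

Under these assumed payoffs, the forecaster that is right 90% of the time scores far worse than the one that is right 10% of the time, which is the sense in which the evaluation function need not be symmetric.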
  • 28. User Story: Optimizing Classification Models
      Task: Predict Retention/Attrition
      Chart: Model Performance Optimization (Performance vs. Generation)
      Classification Accuracy: 62.0, 70.2, 72.3, 73.4, 75.2
      Test Set Error (RMS): 34.8, 28.8, 24.5, 22.1, 20.9
      17 potential input features (customer demographics)
      2 outputs (retention/attrition)
      1300 training samples (50–50, A/B split)
      1300 test samples (naïve test data)
  • 29. Use Case – Fully Adaptive Feedback (Next Best Offer)
      Diagram labels:
      DB
      Historical User Behavior (stimulus/response)
      Train / Update Model
      Non-Adaptive (Fixed) Mode
      Randomized A/B/C Offer Selection
      Adaptive ML Mode
      ML Prediction Offer Selection
      Operation (Trigger)
      Ad / Offer (stimulus)
      Feedback Cycle
  • 30. Five Keys to Successful Machine Learning
      • Let the data speak for itself – don’t force-fit your models
      • Remember, not all errors are equal – use this to your advantage
      • True learning requires continual adaptation!
      • Automate the process with feedback – remove the “man-in-the-loop”
      • Trust the optimization process – it really works!
  • 31. Q&A
      Contact Info
      Visit: www.redpoint.net
      Bill Porto, Sr. Engineering Analyst, RedPoint Global Inc.
      vwporto@redpoint.net
      Want more information about this topic? Fill out your card or go to redpoint.net/hadoopeurope