Live predictions with
schemaless data at scale
Ondrej Brichta
CONFIDENTIAL
WE ARE HIRING
Our problem at Exponea
What we have:
● Terabytes of customer data
● Means to process all the data fast
● Customers with business understanding (or our consultants)
What we needed:
● Probabilistic classifier
○ For any chosen event or attribute
○ Able to process any given data and return a reasonable output without human interaction
Our problem at Exponea
One specific problem:
● Which banner is the best one to show right now?
● Web page personalization?
Solution?
In-session prediction
● Predict each customer's probability of buying, live, while they are online
What steps do we need to do and automate?
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
1. Business understanding
Key questions:
● What problem do we have? - We want to optimize our campaigns, ...
● What are our use cases? - Should I show this banner? ...send this email?
○ Translate this use case into machine learning - predict whether the customer will buy something
● Whom are we creating this model for? - NOT FOR US, our customers!
Hard or impossible to automate, but we can delegate this to customers or consultants as easier problems.
● Create a few templates to choose from
○ At least a few that are easy to use and a few advanced ones that allow modifying the model
2. Data understanding
● Closely related to business understanding
● Most important part of machine learning!
● Key questions
○ What data do I have?
○ Is the data valuable?
○ What do I want to predict?
○ Should I train on all data or only on a subset? <- crucial
● Changes here have the largest influence on the outcome
2. Data understanding
Automation?
Yes, but difficult...
2. Data understanding
● Have presets for each template
○ We have in-session predictions just for this problem
● Experiment with variables (date ranges, data filters, ...)
○ Variables default to the values that work best across all projects; the user can adjust them manually
● Have a model that provides the variables the user did not select
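A minimal sketch of how a preset with adjustable variables might look. The class, field names, and the 90-day default are assumptions made for illustration; the talk does not describe Exponea's actual data structures.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class PredictionTemplate:
    """Hypothetical preset: a target event plus training-data variables."""
    name: str
    target_event: str
    training_days: int = 90        # assumed default, not taken from the talk
    excluded_events: tuple = ()    # data filter: event types to ignore

    def training_window(self, today: date) -> tuple[date, date]:
        """Date range of the data that would be used for training."""
        return today - timedelta(days=self.training_days), today


# The preset ships with defaults that should work for most projects.
IN_SESSION = PredictionTemplate(name="in_session", target_event="purchase")


def customize(template: PredictionTemplate, **overrides) -> PredictionTemplate:
    """Return a copy with user overrides applied; the preset itself is untouched."""
    return replace(template, **overrides)


if __name__ == "__main__":
    mine = customize(IN_SESSION, training_days=30, excluded_events=("page_ping",))
    print(mine.training_window(date(2024, 1, 31)))
```

Keeping the preset immutable means a user's experiment with date ranges or filters cannot change the defaults other projects rely on.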
3. Data preparation
● Cleaning the data - have some common structure for easy selection
● Derive attributes - have general easy aggregates (count, first, last,...)
● Dataset balancing - easy rules to determine which technique to use
● Aggregates - have a model which selects which complex aggregates to use
● Reducing dimensionality - feature selection
● Mapping data to “clean” values - most frequent items, vector indexer
● Casting data to vectors - vector assembler
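A few of the steps above (simple aggregates, mapping to the most frequent values, casting to a vector) can be sketched in plain Python. This is an illustration under assumed event shapes, not the pipeline from the talk; the "vector indexer" and "vector assembler" on the slide are stood in for by hand-written helpers with hypothetical names.

```python
from collections import Counter


def derive_aggregates(events, attribute):
    """count / first / last of one attribute over time-ordered events.

    Events are plain dicts; the attribute may be absent (schemaless data).
    """
    values = [e[attribute] for e in events if attribute in e]
    return {
        "count": len(values),
        "first": values[0] if values else None,
        "last": values[-1] if values else None,
    }


def most_frequent_index(values, top_k):
    """Map the top_k most frequent values to 1..top_k; anything else maps to 0."""
    counts = Counter(values)
    # Sort by descending count, then by value so ties are deterministic.
    ranked = sorted(counts, key=lambda v: (-counts[v], str(v)))[:top_k]
    return {value: i + 1 for i, value in enumerate(ranked)}


def to_vector(events, attribute, index):
    """Assemble the aggregates into a fixed-length numeric vector."""
    agg = derive_aggregates(events, attribute)
    return [
        float(agg["count"]),
        float(index.get(agg["first"], 0)),
        float(index.get(agg["last"], 0)),
    ]


if __name__ == "__main__":
    events = [
        {"type": "view", "category": "shoes"},
        {"type": "view"},                      # attribute missing
        {"type": "cart", "category": "hats"},
        {"type": "view", "category": "shoes"},
    ]
    index = most_frequent_index(["shoes", "hats", "shoes", "bags"], top_k=2)
    print(to_vector(events, "category", index))
```

Capping the index at the most frequent values keeps the vector size bounded no matter how many distinct values customers send.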
3. Data preparation
● Customers connect from different platforms
● Customers log in after a few events
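Customers connecting from different platforms and logging in only after a few events means that anonymous events have to be attached to the right customer later. One common way to do this is a union-find structure over identifiers; the sketch below assumes each event carries a list of the identifiers known at that moment (the `ids` field and the identifier formats are hypothetical, and the talk does not say how Exponea merges identities).

```python
class IdentityGraph:
    """Union-find over identifiers (cookies, user ids) of the same customer."""

    def __init__(self):
        self._parent = {}

    def find(self, identifier):
        """Return the representative identifier of the customer."""
        self._parent.setdefault(identifier, identifier)
        while self._parent[identifier] != identifier:
            # Path halving keeps later lookups short.
            self._parent[identifier] = self._parent[self._parent[identifier]]
            identifier = self._parent[identifier]
        return identifier

    def link(self, a, b):
        """Record that two identifiers belong to the same customer."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a


def merge_events(events, graph):
    """Group event names by resolved customer, including pre-login events."""
    for event in events:
        first, *rest = event["ids"]
        graph.find(first)
        for other in rest:
            graph.link(first, other)
    merged = {}
    for event in events:
        merged.setdefault(graph.find(event["ids"][0]), []).append(event["name"])
    return merged


if __name__ == "__main__":
    events = [
        {"ids": ["cookie:web1"], "name": "view"},             # anonymous, web
        {"ids": ["cookie:app7"], "name": "view"},             # anonymous, app
        {"ids": ["cookie:web1", "user:42"], "name": "login"},
        {"ids": ["cookie:app7", "user:42"], "name": "login"},
    ]
    print(merge_events(events, IdentityGraph()))
```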
4. Modeling
5. Evaluation
● How good is my model?
● ROC, F1, Lift curve
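For reference, the three metrics can be computed from labels and scores in a few lines. This is a plain-Python sketch with made-up data, using the pairwise definition of ROC AUC (quadratic in the number of samples, so suitable only for small sets); it says nothing about which tooling was used in the talk.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(y_true, scores, fraction):
    """Positive rate in the top `fraction` by score, divided by the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


if __name__ == "__main__":
    y_true = [1, 0, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(f1_score(y_true, y_pred), roc_auc(y_true, scores), lift_at(y_true, scores, 0.25))
```

Evaluating `lift_at` over a range of fractions gives the points of a lift curve.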
6. Deployment
How do you want to use it? Must it be fast? Online or Offline?
6. Deployment
For us, Fast and Online!
6. Deployment
Other deployments:
● Fast and Offline -> precomputed results for each customer
● Slow and Online -> improved results with slower algorithms
● Slow and Offline -> you messed up. Rework your solution.
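The difference between "fast and online" and "fast and offline" can be sketched as follows. The logistic model, its hand-written weights, and the event names are all assumptions for illustration; a real model would be trained, and the talk does not show the production scorer.

```python
import math

# Hypothetical weights of a tiny logistic model; not taken from the talk.
WEIGHTS = {"bias": -3.0, "views": 0.2, "cart_adds": 1.5}


def score(features):
    """Purchase probability from session features via a logistic function."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


class OnlineScorer:
    """Fast and online: constant-time state update per event, then rescore."""

    def __init__(self):
        self._sessions = {}

    def on_event(self, customer_id, event_type):
        features = self._sessions.setdefault(customer_id, {"views": 0, "cart_adds": 0})
        if event_type == "view":
            features["views"] += 1
        elif event_type == "cart_add":
            features["cart_adds"] += 1
        # Unknown event types leave the features unchanged.
        return score(features)


def precompute(batch_features):
    """Fast and offline: score everyone in a batch job, then serve by lookup."""
    return {customer_id: score(f) for customer_id, f in batch_features.items()}


if __name__ == "__main__":
    scorer = OnlineScorer()
    print(scorer.on_event("c1", "view"))
    print(scorer.on_event("c1", "cart_add"))    # probability rises within the session
    table = precompute({"c1": {"views": 1, "cart_adds": 1}})
    print(table["c1"])                          # same value, but frozen at batch time
```

The online path reacts to every event in the session, while the lookup table is only as fresh as the last batch run.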
App overview
Ask me anything
ondrej.brichta@exponea.com

Live predictions with schemaless data at scale. MLMU Kosice, Exponea
