Data Science:
Use Cases and Tools
Alexey Grigorev
28/05/2020
mlbookcamp.com
Plan
● Use cases
○ Advertisement
○ Moderation in Online Classifieds
● Base skills
Advertisement
[Diagram: an Exchange asks several DSPs (DSP1, DSP2, DSP3, ...) for bids. One declines (❌), the others bid $0.10 and $0.09, and the $0.10 bid goes back to the Exchange.]
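The bidding round in the diagram can be sketched in a few lines. This is a minimal illustration, not how any particular exchange works: the function name `run_auction` and the assignment of bids to specific DSPs are assumptions, and the slides do not say what price the winner actually pays, so the sketch only picks the highest bid.

```python
from typing import Dict, Optional, Tuple


def run_auction(bids: Dict[str, Optional[float]]) -> Optional[Tuple[str, float]]:
    """Return (winning DSP, bid) for the highest bid, or None if nobody bid."""
    # None means the DSP declined to bid (the ❌ in the diagram).
    valid = {dsp: bid for dsp, bid in bids.items() if bid is not None}
    if not valid:
        return None
    winner = max(valid, key=valid.get)
    return winner, valid[winner]


# Hypothetical round: which DSP placed which bid is assumed for illustration.
result = run_auction({"DSP1": 0.10, "DSP2": 0.09, "DSP3": None})
print(result)
```

Here the $0.10 bid wins, and a round in which every DSP declines returns `None`.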
DSP1
● What do we know about the user?
● What are they interested in?
● What’s the probability of click and conversion?
● How much are we willing to pay for it?
● Should we show any ad?
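One way to read the questions above is as a single pricing rule: multiply the click probability, the conversion probability after a click, and the value of a conversion, then bid somewhat less than that. The sketch below assumes this rule. The names `decide_bid`, `margin`, and `min_bid`, and all the numbers, are hypothetical and not taken from the slides.

```python
def expected_value(p_click, p_conversion, conversion_value):
    """Expected revenue of one impression, in dollars."""
    # p_conversion is the probability of conversion given that the user clicked.
    return p_click * p_conversion * conversion_value


def decide_bid(p_click, p_conversion, conversion_value, margin=0.2, min_bid=0.01):
    """Return the bid in dollars, or None when no ad should be shown."""
    bid = expected_value(p_click, p_conversion, conversion_value) * (1 - margin)
    if bid < min_bid:
        return None  # the impression is not worth bidding on
    return round(bid, 4)


# Assumed inputs: 5% click probability, 10% conversion after a click, $25 per conversion.
print(decide_bid(p_click=0.05, p_conversion=0.10, conversion_value=25.0))
# A user who is very unlikely to click gets no bid at all.
print(decide_bid(p_click=0.0001, p_conversion=0.10, conversion_value=25.0))
```

With these assumed inputs the first call bids about $0.10 and the second returns `None`, which corresponds to the "Should we show any ad?" question.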
Probability of click
Estimate the probability that the user clicks on the ad
● Device characteristics (e.g. OS)
● Geography (country, city)
● Demography (gender, age)
● History: visited pages, installed apps
Probability of conversion
Estimate the probability that the user will buy the product after clicking
● Device characteristics (e.g. OS)
● Geography (country, city)
● Demography (gender, age)
● History: visited pages, installed apps
● Features of the advertiser (how convenient the landing page is, etc.)
Tools
For data scientists
● SQL (AWS Athena)
● Apache Spark
● Scikit-Learn
● In-house tools (e.g. FTRL)
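FTRL in the list above most likely refers to FTRL-Proximal, an online learning algorithm often used for click prediction. The slides do not show the in-house implementation, so the following is only a generic sketch of the per-coordinate update for logistic regression on binary features. The class name, hyperparameter values, and the two toy impressions are all assumptions.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression on binary features."""

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # adjusted sum of gradients per feature
        self.n = defaultdict(float)  # sum of squared gradients per feature

    def _weight(self, feature):
        z, n = self.z[feature], self.n[feature]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularisation keeps weak features at exactly zero
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, features):
        """Estimated click probability for a list of active feature names."""
        score = sum(self._weight(f) for f in features)
        score = max(min(score, 35.0), -35.0)  # clip to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, clicked):
        """Learn from one impression; clicked is 1 or 0."""
        p = self.predict(features)
        g = p - clicked  # log-loss gradient for a feature whose value is 1
        for f in features:
            w = self._weight(f)
            sigma = (math.sqrt(self.n[f] + g * g) - math.sqrt(self.n[f])) / self.alpha
            self.z[f] += g - sigma * w
            self.n[f] += g * g


# Toy data (made up): Android users click, iOS users do not.
impressions = [
    (["os=android", "country=DE"], 1),
    (["os=ios", "country=DE"], 0),
]
model = FTRLProximal()
for _ in range(200):
    for features, clicked in impressions:
        model.update(features, clicked)

p_android = model.predict(["os=android", "country=DE"])
p_ios = model.predict(["os=ios", "country=DE"])
print(p_android > p_ios)
```

Features are plain strings such as `os=android`, so categorical data like device, geography, and history can be fed in without a separate encoding step.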
Classified Advertisement
[Figure: mock classified listings, each with the placeholder description "Such description. So much text"]
Problems
● Illegal goods
● NSFW content
● Duplicates
● Spam
● Fraud
Moderation
[Diagram: listing → automated moderation (ML) → ad queue → moderation panel (MP) → moderators. Automated moderation consists of duplicate detection, illegal items detection, and other models.]
Illegal goods
● Analyse the title and description
● Analyse the image
Duplicates
● How similar the listing is to other listings
● IP addresses, device signature
● How many other ads the user has posted
● City, category
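The first signal above, how similar a listing is to other listings, can be approximated with a simple text measure. The sketch below uses Jaccard similarity on word sets as a stand-in. The slides do not say which similarity measure is used in practice, and the function names, the 0.8 threshold, and the sample listings are assumptions.

```python
import re


def tokens(text):
    """Lowercased set of word tokens in a listing text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Share of words that two texts have in common, between 0 and 1."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_duplicates(new_listing, existing, threshold=0.8):
    """Return the existing listings whose similarity reaches the threshold."""
    return [old for old in existing if jaccard(new_listing, old) >= threshold]


existing = [
    "iPhone 11 64GB black, like new",
    "Mountain bike, 21 gears, red",
]
print(find_duplicates("IPHONE 11 64gb black like new!!", existing))
```

The other signals in the list (IP addresses, device signature, number of ads per user, city, category) would typically be combined with such a similarity score as input features to a model rather than checked one by one.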
Tools
For data scientists
● SQL (AWS Athena)
● Scikit-Learn
● TensorFlow, Apache MXNet
● Flask
Base skills
● SQL, data manipulation
● Git
● Python
● NumPy, Pandas, Scikit-Learn
● Training and validating models
● Microservices, Flask, Docker
How to learn?
● Come up with a problem ⇐ Important! Focus on the problem
● Look for a solution (tools, libraries, tutorials)
● Solve the problem
● …
● Profit
mlbookcamp.com
● Learn ML by doing projects
● http://bit.ly/mlbookcamp
● Get 40% off with code “grigorevpc”
● Twitter: @Al_Grigor (book give-away this Sunday!)
Machine Learning Bookcamp
That’s all!
Questions?
