Leveraging Machine Learning Techniques for
the Vehicle Auction Industry
Raji Balasubramaniyan, PhD
Senior Data Scientist
Manheim, Inc.,
Manheim | Proprietary and Confidential 1
Overview
• Automobile auction
– Manheim
• Introduce the ML use cases
– Churn rate
– Recommendations
– Forecasting
• How to approach a problem?
– Tools and algorithms used
• Q&A
Manheim, Inc., Automobile auction
Providing auction services for the physical sale of
automobiles as well as online tools to connect wholesale
vehicle buyers and sellers.
Leader in the wholesale vehicle auction industry: 85% of
vehicle auction business happens at Manheim.
We have over 100 locations across the US and Canada.
About 15 million cars go through auction every year.
ML use case 1: Predicting Churn rate
• What is Churn?
– Churn rate refers to the proportion of members who leave during a
given time period
• Motto: Make the customer happy
– If the customer is happy, he/she won't churn.
• Why is it important?
– It helps us predict and analyze the parameters that drive
customers away, and helps the sales force team focus on those parameters
and coach the customer
Predicting Churn rate: The approach
• Step 1
– Create profiles for current and cancelled members by collecting their
behavior data for the last 6 months
• Activity, transactions, messages, response time, etc.
• Step 2
– Segment the customers according to their behavior
• Unsupervised clustering
• Step 3
– For every segment, perform supervised learning to select the parameters
that distinguish current members from cancelled members
• Logistic regression, neural net
• Step 4
– Include sentiment analysis to add another score
Algorithms: Unsupervised K-means clustering
• Given a set of observations (x1, x2, …, xn), where each
observation is a d-dimensional real vector consisting of one
member's parameters, k-means clustering aims to partition
the n observations into k (≤ n) sets S = {Successful Seller,
Successful Buyer, Buyer at risk, Seller at risk, Undecided} so as
to minimize the within-cluster sum of squares (WCSS).
In other words, its objective is to find:
\[ \underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2 \]
• where μi is the mean of points in Si.
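The objective above can be illustrated with a minimal sketch of Lloyd's algorithm in plain Python. The two-feature dealer vectors, the choice of k = 2 and the function names are assumptions made for illustration, not the setup used in the talk.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda i: sq_dist(p, centroids[i]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares for the final partition.
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss

# Hypothetical dealer profiles: [activity score, transactions].
dealers = [[0, 0], [0, 1], [1, 0], [1, 1],
           [10, 10], [10, 11], [11, 10], [11, 11]]
centroids, labels, wcss = kmeans(dealers, k=2)
print(labels, wcss)
```

In practice the features would be scaled first, since k-means is sensitive to the units of each parameter.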
Algorithms: Logistic regression
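If the log-odds of P, the probability of churn, are viewed as a linear function of an explanatory variable, or a linear
combination of explanatory variables, then the logistic regression function can be
written as
\[ P = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \alpha_1 + \dots + \beta_n \alpha_n)}} \]
where
α1…αn are the parameters influencing churn and β0…βn are the fitted coefficients.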
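As a rough illustration of how such a model could be fitted, the sketch below trains a logistic regression by batch gradient descent in plain Python. The two features (scaled activity and scaled response time), the toy labels and the learning rate are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(beta, x):
    """Churn probability; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            err = predict_proba(beta, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / len(X) for b, g in zip(beta, grad)]
    return beta

# Hypothetical, pre-scaled features: [activity, response time].
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
     [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [0, 0, 0, 1, 1, 1]  # 1 = cancelled member, 0 = current member

beta = fit_logistic(X, y)
print(beta)
print(predict_proba(beta, [0.9, 0.1]), predict_proba(beta, [0.1, 0.9]))
```

The sign and size of each fitted coefficient give a first indication of which parameters push a member toward cancelling.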
Algorithms: Neural net
Given the task of assigning a user to one of 5 groups, learning
means using a set of factors to find the f* ∈ F that solves the task in
an optimal sense.
Our training data consists of N dealers from each of the 5 groups.
Inputs and weights:
x1: Activity (weight w1)
x2: Number of messages (weight w2)
x3: Response time (weight w3)
xn: etc. (weight wn)
Σ wn xn → Output
Our cost function is the mean-squared error;
training tries to minimize the average squared
error between the network's output and the target value.
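The weighted sum and the mean-squared-error cost can be sketched with a single linear neuron trained by gradient descent. This is a deliberately reduced illustration: the actual network architecture is not given in the slides, and the inputs, targets and learning rate here are hypothetical.

```python
def neuron_output(weights, inputs):
    """Weighted sum of the inputs: sum of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))

def mean_squared_error(weights, X, targets):
    errors = [neuron_output(weights, x) - t for x, t in zip(X, targets)]
    return sum(e * e for e in errors) / len(errors)

def train(X, targets, lr=0.1, epochs=500):
    """Gradient descent on the MSE of a single linear neuron."""
    weights = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        errors = [neuron_output(weights, x) - t for x, t in zip(X, targets)]
        grads = [2.0 / n * sum(e * x[j] for e, x in zip(errors, X))
                 for j in range(len(weights))]
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights

# Hypothetical scaled inputs: [activity, number of messages, response time].
X = [[1.0, 0.2, 0.5], [0.3, 0.9, 0.1], [0.6, 0.4, 0.8], [0.1, 0.7, 0.3]]
targets = [0.9, 0.2, 0.7, 0.1]  # hypothetical target scores

mse_before = mean_squared_error([0.0, 0.0, 0.0], X, targets)
weights = train(X, targets)
mse_after = mean_squared_error(weights, X, targets)
print(mse_before, mse_after)
```

A classifier over the 5 groups would add hidden layers and one output per group, but the cost being minimized has the same form.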
Algorithms: Sentiment analysis
Sentiment analysis refers to the use of natural language processing, text analysis and
computational linguistics to identify and extract subjective information in source
materials. We used a Naïve Bayes model.
We have two training groups G = {'Cancel', 'Member'}; D = messages.
Example terms: tk = {"like", "love", "hate", "bad", "worst", "interesting-to-me", "not-interesting-to-me", …, k terms}
The goal is to find the best group for a message D using the maximum a posteriori (MAP)
group Gmap:
\[ G_{map} = \underset{G}{\arg\max}\; P(G \mid D) = \underset{G}{\arg\max}\; P(G) \prod_{k} P(t_k \mid G) \]
tk is a term;
Dm is the set of messages from 'Member';
Dmk is the subset of Dm that contains tk;
Dc is the set of messages from 'Cancel' (cancelled members);
Dck is the subset of Dc that contains tk.
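A minimal sketch of the MAP decision, assuming a Bernoulli-style Naïve Bayes in which the likelihood of a term is estimated from the fraction of a group's messages that contain it (for example |Dmk| / |Dm| for 'Member'). The add-one smoothing, the tokenizer and the sample messages are assumptions added for illustration.

```python
import math

VOCABULARY = ["like", "love", "hate", "bad", "worst"]

def tokens(message):
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,!?") for w in message.lower().split()}

def train(messages_by_group):
    """Estimate P(G) and P(t_k | G) with add-one smoothing."""
    total = sum(len(msgs) for msgs in messages_by_group.values())
    model = {}
    for group, msgs in messages_by_group.items():
        prior = len(msgs) / total
        likelihood = {}
        for term in VOCABULARY:
            containing = sum(1 for m in msgs if term in tokens(m))
            likelihood[term] = (containing + 1) / (len(msgs) + 2)
        model[group] = (prior, likelihood)
    return model

def classify(model, message):
    """Return the group with the highest log-posterior score."""
    words = tokens(message)
    best_group, best_score = None, -math.inf
    for group, (prior, likelihood) in model.items():
        score = math.log(prior)
        for term in VOCABULARY:
            p = likelihood[term]
            score += math.log(p if term in words else 1.0 - p)
        if score > best_score:
            best_group, best_score = group, score
    return best_group

# Hypothetical training messages.
model = train({
    "Member": ["I love the new search", "like the auction tools", "love it"],
    "Cancel": ["worst experience", "I hate the fees", "bad support and bad fees"],
})
print(classify(model, "I hate this, worst site"))
print(classify(model, "love the tools"))
```

Working in log space avoids numerical underflow when the vocabulary grows to many terms.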
The Result
• Every dealer will be assigned to a group
• He/she will have 3 different health scores (health score = 1 − churn rate)
– 0-30 days health score (calculated using the last 30 days of data)
– 30-60 days health score (calculated using data from 30-60 days ago)
– 60+ days health score (calculated using data from 60-120 days ago)
• The sales force will be alerted when a successful user
falls into a risk category. They will look into the parameter that
pushed the user into the risk category
– Example: less activity in the last 30 days
• The marketing team will take risk-category users and target
promotion schemes at them
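The health-score bookkeeping might look like the following sketch. The window labels, the 0.5 risk threshold and the alert rule are hypothetical choices for illustration; the slides do not state how the risk boundary is set.

```python
def health_scores(churn_by_window):
    """Health score per window, defined as 1 - churn rate."""
    return {window: 1.0 - churn for window, churn in churn_by_window.items()}

def needs_alert(scores, threshold=0.5):
    """Flag a dealer who was healthy 30-60 days ago but is at risk now."""
    return scores["0-30"] < threshold <= scores["30-60"]

# Hypothetical predicted churn rates for two dealers.
dealer_a = health_scores({"0-30": 0.7, "30-60": 0.2, "60+": 0.1})
dealer_b = health_scores({"0-30": 0.1, "30-60": 0.1, "60+": 0.2})

print(needs_alert(dealer_a), needs_alert(dealer_b))
```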
ML use case 2: Recommendation
What is a recommendation system?
Recommender systems are a subclass of information
filtering systems that seek to predict the 'rating' or
'preference' that a user would give to an item.
Goal
Suggest relevant content to the users
Recommendation: The Approach
• Step 1
– Segment customers according to their transaction patterns
• Step 2
– For every segment, create a user profile per customer
• Step 3
– Match the user profile with the vehicle profile and arrive at a matching score
• Step 4
– Rank the relevant content
• Step 5
– Combine profile matching and ranking and provide recommendations
The approach: Segment the customers
Segment the customers according to their behavior
• Franchise dealer, Independent, Wholesaler
K-means or any other clustering technique could be used for this
purpose.
Our objective is to find the best group that every dealer
belongs to:
\[ \underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2 \]
where μi is the mean of points in Si, and
S = {different customer segments}
The approach: Creating user profile and
Matching
• Create user profiles by collecting each dealer's transaction pattern over a
period of time
• For every user profile, perform vehicle filtering using content-based
collaborative filtering
– User-Item collaborative filtering: relevant content recommendation
• Customers who bought car X also bought car Y
– 2010 Honda Accord vs. 2010 Toyota Camry
– User-User collaborative filtering: you may also like these
• How closely the profiles of Dealer A and Dealer B match
A similarity or co-rating matrix is used to arrive at the relevant content
matching correlations
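Both kinds of matching can be sketched in a few lines of plain Python: cosine similarity between dealers' purchase profiles for the user-user case, and co-purchase counts for the "bought X, also bought Y" case. The dealer names, purchase counts and the choice of cosine similarity are assumptions; the slides do not say which similarity measure was used.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical purchase counts per dealer and vehicle model.
purchases = {
    "dealer_a": {"Honda Accord": 3, "Toyota Camry": 2},
    "dealer_b": {"Honda Accord": 2, "Toyota Camry": 1, "Ford F-150": 1},
    "dealer_c": {"Ford F-150": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse purchase profiles."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

def co_purchase_counts(purchases):
    """Count how many dealers bought each pair of models."""
    counts = Counter()
    for basket in purchases.values():
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts

sim_ab = cosine(purchases["dealer_a"], purchases["dealer_b"])
sim_ac = cosine(purchases["dealer_a"], purchases["dealer_c"])
pairs = co_purchase_counts(purchases)
print(sim_ab, sim_ac, pairs.most_common(1))
```

Dealers with a high similarity score can be shown each other's purchases, while frequently co-purchased models feed the item-level suggestions.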
The approach: Ranking scores using regression
Customer need score
Once we have filtered the profiles that are relevant to the users, rank/sort
the vehicles according to some goal so that the most relevant content appears on top
• Example: Suggest items that make more profit for the customers in
the retail market; in this case the regression goal is profit.
\[ \text{Profit} = \beta_0 + \beta_1 \alpha_1 + \dots + \beta_n \alpha_n \]
where
α1…αn can be the buying price at auction, the retail selling price,
detailing work done on the cars, etc., and β0…βn are the fitted coefficients.
Result
Suggest relevant cars to the dealers when they log in to the site
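The ranking step can be sketched as scoring each filtered vehicle with an already-fitted linear model and sorting by predicted profit. The coefficients, field names and vehicle records below are hypothetical; in practice the coefficients would come from a regression fitted on historical sales.

```python
# Hypothetical coefficients of a fitted linear regression for profit.
COEFFICIENTS = {
    "intercept": -200.0,      # e.g. fixed fees
    "auction_price": -1.0,
    "retail_price": 1.0,
    "detailing_cost": -1.0,
}

def predicted_profit(vehicle):
    """Linear model: intercept plus the weighted sum of the features."""
    return COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * vehicle[name]
        for name in ("auction_price", "retail_price", "detailing_cost")
    )

def rank_vehicles(vehicles):
    """Sort vehicles so the highest predicted profit comes first."""
    return sorted(vehicles, key=predicted_profit, reverse=True)

# Hypothetical vehicles that already passed profile matching.
vehicles = [
    {"id": "V1", "auction_price": 9000, "retail_price": 11500, "detailing_cost": 300},
    {"id": "V2", "auction_price": 12000, "retail_price": 13500, "detailing_cost": 500},
    {"id": "V3", "auction_price": 7000, "retail_price": 10000, "detailing_cost": 400},
]

ranking = [v["id"] for v in rank_vehicles(vehicles)]
print(ranking)
```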
ML use case 3: Forecasting
• How many transactions is a buyer going to make in the next few
weeks?
– Given the past year's transaction history for a buyer, how many cars will the
dealer buy in the next few auctions or online?
– Which year, make and model will the dealer buy?
– In which auction and region will he/she buy?
• How many users are going to churn in the next few months?
– How many will move from the risk category to the successful category?
– How many will move to the risk category?
– How many non-active users will move to the active category?
Synopsis: Time series and ARIMA
A time series can be viewed as a combination of signal and noise. The signal
could follow different patterns, such as:
• Mean reversion
• The series tends to move back toward the mean over time
• Sinusoidal oscillation
• Etc.
It could also have a seasonal component.
An ARIMA model can be viewed as a "filter" that tries to separate the
signal from the noise; the signal is then extrapolated into the
future to obtain forecasts.
ARIMA models are a very general class of models for forecasting a
time series.
The Approach: ARIMA
Autoregressive integrated moving average (ARIMA) model for calculating the forecast.
A non-seasonal ARIMA model is classified as an ARIMA(p,d,q) model,
where:
p is the number of autoregressive terms
d is the number of non-seasonal differences needed for stationarity
q is the number of moving average terms.
A seasonal ARIMA model is classified as an ARIMA(p,d,q)x(P,D,Q) model, where
P = number of seasonal autoregressive (SAR) terms
D = number of seasonal differences
Q = number of seasonal moving average (SMA) terms
According to the signal type, we developed an automatic forecast parameter selection algorithm that chooses
different p, P, d, D and q, Q values and selects the combination with the lowest RMSE using the 80-20 rule.
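The selection loop can be sketched as a grid search scored by RMSE on a hold-out. This sketch assumes the 80-20 rule means fitting on the first 80% of the series and evaluating on the last 20%. To stay self-contained it uses a toy stand-in for ARIMA, restricted to p in {0, 1}, q = 0 and no seasonal terms; the series, the function names and the candidate grid are hypothetical, and a real implementation would plug in a full seasonal ARIMA fit.

```python
import math

def difference(series):
    return [b - a for a, b in zip(series, series[1:])]

def forecast(train, p, d, steps):
    """Toy ARIMA(p, d, 0) forecast with p in {0, 1} (illustrative only)."""
    if p not in (0, 1):
        raise ValueError("this sketch only supports p = 0 or p = 1")
    levels = [train]
    for _ in range(d):
        levels.append(difference(levels[-1]))
    z = levels[-1]
    mean = sum(z) / len(z)
    phi = 0.0
    if p == 1:
        # Lag-1 coefficient estimated on the mean-centered series.
        c = [v - mean for v in z]
        den = sum(a * a for a in c[:-1])
        phi = sum(a * b for a, b in zip(c, c[1:])) / den if den else 0.0
    preds, last = [], z[-1]
    for _ in range(steps):
        last = mean + phi * (last - mean)
        preds.append(last)
    # Undo the differencing by cumulative summation, level by level.
    for level in reversed(levels[:-1]):
        base, integrated = level[-1], []
        for inc in preds:
            base += inc
            integrated.append(base)
        preds = integrated
    return preds

def rmse(actual, predicted):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, predicted)) / len(actual))

def select_order(series, candidates, train_fraction=0.8):
    """Return (rmse, (p, d)) for the candidate with the lowest hold-out RMSE."""
    split = int(len(series) * train_fraction)
    train, test = series[:split], series[split:]
    return min((rmse(test, forecast(train, p, d, len(test))), (p, d))
               for p, d in candidates)

# Hypothetical weekly counts: linear trend plus an alternating component.
series = [100 + 5 * t + (3 if t % 2 else -3) for t in range(50)]
candidates = [(p, d) for p in (0, 1) for d in (0, 1)]
best_rmse, best_order = select_order(series, candidates)
print(best_order, best_rmse)
```

Scoring on a hold-out rather than on the training fit penalizes parameter combinations that only memorize the history.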
One Example
[Figure: two plots of weekly counts (x-axis: Weeks, y-axis: count, roughly 40,000 to 90,000). Panel titles: "Example 4, c(0, 0, 0), S(1, 0, 0)" (weeks 0 to 100) and "80/20" (weeks 0 to 60).]
Summary
• We used various ML techniques and implemented them for
vehicle auction industry use cases.
• The choice of algorithm determines the success of the results,
and depending on the use case, various algorithms can be
used
• Extracting, cleaning and normalizing the data forms the
crucial layer in determining the success of the use case
Acknowledgement
• Dr. Stephane Pinel
• Sonar Team
• Manheim
Q&A
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

Raji Balasubramaniyan, Senior Data Scientist, Manheim at MLconf ATL - 9/18/15

  • 1. Leveraging Machine Learning Techniques for the Vehicle Auction Industry. Raji Balasubramaniyan, PhD, Senior Data Scientist, Manheim, Inc. Manheim | Proprietary and Confidential
  • 2. Overview • Automobile auctions: Manheim • ML use cases: churn rate, recommendations, forecasting • How to approach a problem: tools and algorithms used • Q&A
  • 3. Manheim, Inc.: Automobile auction. Manheim provides auction services for the physical sale of automobiles, as well as online tools that connect wholesale vehicle buyers and sellers. It is the leader in the wholesale vehicle auction industry: 85% of vehicle auction business happens at Manheim. We have over 100 locations across the US and Canada. About 15 million cars go through auction every year.
  • 4. ML use case 1: Predicting churn rate • What is churn? Churn rate refers to the proportion of members who leave during a given time period. • Motto: make the customer happy. A happy customer won't churn. • Why is it important? Predicting churn lets us analyze the parameters that drive customers away, and helps the sales force focus on those parameters and coach the customer.
  • 5. Predicting churn rate: The approach • Step 1: Create profiles for current and cancelled members by collecting their behavior data for the last 6 months (activity, transactions, messages, response time, etc.). • Step 2: Segment the customers according to their behavior (unsupervised clustering). • Step 3: For every segment, perform supervised learning to select the parameters that distinguish current members from cancelled members (logistic regression, neural net). • Step 4: Include sentiment analysis to add another score.
  • 6. Algorithms: Unsupervised k-means clustering • Given a set of observations (x1, x2, …, xn), where each observation is a d-dimensional real vector consisting of one member's parameters, k-means clustering aims to partition the n observations into k (≤ n) sets S = {Successful Seller, Successful Buyer, Buyer at risk, Seller at risk, Undecided} so as to minimize the within-cluster sum of squares (WCSS). In other words, its objective is to find $\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$, where μi is the mean of points in Si.
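The WCSS objective on this slide can be illustrated with a minimal sketch of Lloyd's algorithm, the usual heuristic for k-means. This is not the implementation used at Manheim: the two-feature dealer profiles, the choice of k = 2, and seeding the centroids with the first k points are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Assumption: seed the centroids with the first k points (deterministic, illustration only).
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        assignments = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

def wcss(points, centroids, assignments):
    """Within-cluster sum of squares: the quantity k-means tries to minimize."""
    return sum(math.dist(p, centroids[a]) ** 2 for p, a in zip(points, assignments))

# Hypothetical dealer profiles: (weekly activity, messages sent).
profiles = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
centroids, labels = kmeans(profiles, k=2)
total_wcss = wcss(profiles, centroids, labels)
```

In a real pipeline the features would be scaled first, since Euclidean distance lets the feature with the largest range dominate the clustering.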
  • 7. Algorithms: Logistic regression. Logistic regression expresses the probability P in terms of a linear function of an explanatory variable, or a linear combination of explanatory variables α1…αn, the parameters influencing churn.
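The slide's own formula is not reproduced here, so the following is only a sketch of the standard logistic function applied to a linear combination of inputs. It assumes that α1…αn play the role of explanatory variables (passed as `features`); the `weights`, the `intercept`, and the example values are hypothetical.

```python
import math

def churn_probability(features, weights, intercept=0.0):
    """Standard logistic function of a linear combination of the features."""
    z = intercept + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, already-fitted weights for (activity, messages, response time).
weights = [-0.8, -0.3, 0.5]
active_dealer = churn_probability([5.0, 4.0, 1.0], weights)
quiet_dealer = churn_probability([0.5, 0.0, 6.0], weights)
```

With these made-up weights, more activity and more messages lower the predicted churn probability, while a slower response time raises it.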
  • 8. Algorithms: Neural net. Given the task of assigning a user to one of 5 groups, learning means using a set of factors to find f* ∈ F that solves the task in an optimal sense. Our training data consists of N dealers from each of the 5 groups. Inputs: x1: activity; x2: number of messages; x3: response time; …; xn: etc. Each input xi is weighted by wi (w1, w2, w3, …, wn), and the weighted sum Σ wi xi produces the output. Our cost function is the mean squared error, which measures the average squared error between the network's output and the target value; training minimizes it.
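As a rough illustration of the weighted sum and the mean squared error cost, here is a sketch of a single linear unit trained by gradient descent. It is a deliberate simplification, not the network from the talk: there is one output rather than 5 groups, no hidden layer or activation function, and the samples, targets, learning rate, and epoch count are hypothetical.

```python
def neuron_output(inputs, weights):
    """Weighted sum of the inputs: sum over i of w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))

def mean_squared_error(samples, targets, weights):
    """Average squared difference between the unit's output and the target."""
    errors = [neuron_output(x, weights) - t for x, t in zip(samples, targets)]
    return sum(e ** 2 for e in errors) / len(errors)

def train(samples, targets, weights, learning_rate=0.1, epochs=200):
    """Plain batch gradient descent on the mean squared error."""
    weights = list(weights)
    n = len(samples)
    for _ in range(epochs):
        errors = [neuron_output(x, weights) - t for x, t in zip(samples, targets)]
        for j in range(len(weights)):
            # Partial derivative of the MSE with respect to w_j.
            gradient = 2.0 / n * sum(e * x[j] for e, x in zip(errors, samples))
            weights[j] -= learning_rate * gradient
    return weights

# Hypothetical training data: two input factors per dealer, one numeric target.
samples = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
targets = [1.0, 0.0, 1.0]
initial_error = mean_squared_error(samples, targets, [0.0, 0.0])
trained = train(samples, targets, [0.0, 0.0])
final_error = mean_squared_error(samples, targets, trained)
```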
  • 9. Algorithms: Sentiment analysis. Sentiment analysis refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials. We used a Naïve Bayes model. We have two training groups, G = {"Cancel", "Member"}, and D = messages. Example terms: tk = {"like", "love", "hate", "bad", "worst", "interesting-to-me", "not-interesting-to-me", …, k terms}. The goal is to find the best group for a message D using the maximum a posteriori (MAP) group Gmap, where tk is a term; Dm is the set of messages from "Member"; Dmk is the subset of Dm that contains tk; Dc is the set of messages from "Cancelled Member"; Dck is the subset of Dc that contains tk.
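The MAP rule on this slide can be sketched with a small Bernoulli Naïve Bayes classifier built on the same counts (the number of messages in a group, and how many of them contain a term). The exact estimator used in the talk is not shown, so the add-one (Laplace) smoothing, the whitespace tokenizer, and the training messages below are assumptions.

```python
import math

def train(messages_by_group, vocabulary):
    """Estimate P(group) and P(term present | group) from document counts."""
    total = sum(len(msgs) for msgs in messages_by_group.values())
    model = {}
    for group, msgs in messages_by_group.items():
        docs = [set(m.lower().split()) for m in msgs]
        prior = len(docs) / total
        # Assumption: add-one smoothing, (|D_gk| + 1) / (|D_g| + 2),
        # so no probability is exactly 0 or 1.
        likelihood = {
            term: (sum(term in d for d in docs) + 1) / (len(docs) + 2)
            for term in vocabulary
        }
        model[group] = (prior, likelihood)
    return model

def classify(model, message):
    """Return the maximum a posteriori group for a message."""
    words = set(message.lower().split())

    def log_posterior(group):
        prior, likelihood = model[group]
        score = math.log(prior)
        for term, p in likelihood.items():
            # Bernoulli model: absent terms contribute log(1 - p).
            score += math.log(p if term in words else 1.0 - p)
        return score

    return max(model, key=log_posterior)

# Hypothetical training messages for the two groups.
training = {
    "Member": ["love the new search", "like this auction", "love it"],
    "Cancel": ["hate the fees", "worst service", "bad support hate it"],
}
vocabulary = ["like", "love", "hate", "bad", "worst"]
model = train(training, vocabulary)
happy = classify(model, "i love this")
unhappy = classify(model, "hate the worst fees")
```

Working in log space avoids numeric underflow when the vocabulary grows to thousands of terms.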
  • 10. The Result • Every dealer is assigned to a group. • Each dealer has 3 different health scores (1 − churn rate): a 0-30 days health score (calculated using the last 30 days of data); a 30-60 days health score (calculated using data from the last 30-60 days); a 60+ days health score (calculated using data from the last 60-120 days). • The sales force is alerted when a successful user falls into a risk category, and looks into the parameter that put the user there. Example: less activity in the last 30 days. • The marketing team takes the risk-category users and aims promotion schemes at them.
  • 11. ML use case 2: Recommendation. What is a recommendation system? Recommender systems are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item. Goal: suggest relevant content to the users.
  • 12. Recommendation: The approach • Step 1: Segment customers according to their transaction patterns. • Step 2: For every segment, create a user profile per customer. • Step 3: Match each user profile with vehicle profiles and arrive at a matching score. • Step 4: Rank the relevant content. • Step 5: Combine profile matching and ranking to provide recommendations.
  • 13. The approach: Segment the customers. Segment the customers according to their behavior (franchise dealer, independent, wholesaler). K-means or any other clustering technique could be used for this purpose. Our objective is to find the best group for every dealer: $\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - \mu_i \rVert^2$, where μi is the mean of points in Si and S = {different customer segments}.
  • 14. The approach: Creating user profiles and matching • Create user profiles by collecting each dealer's transaction pattern over a period of time. • For every user profile, perform vehicle filtering using content-based collaborative filtering: user-item collaborative filtering (relevant content recommendation: customers who bought car X also bought car Y, e.g. 2010 Honda Accord vs. 2010 Toyota Camry) and user-user collaborative filtering ("You may also like these": how closely the profiles of Dealer A and Dealer B match). • A similarity or co-rating matrix is used to arrive at relevant content-matching correlations.
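The "customers who bought car X also bought car Y" idea can be sketched with a small similarity matrix built from purchase histories. The talk does not say which similarity measure was used, so cosine similarity over binary purchase vectors is an assumption here, as are the dealer names and purchase data.

```python
import math
from collections import defaultdict

def item_similarity(purchases):
    """Cosine similarity between vehicles, based on which dealers bought them."""
    buyers = defaultdict(set)
    for dealer, items in purchases.items():
        for item in items:
            buyers[item].add(dealer)
    similarity = {}
    for a in buyers:
        for b in buyers:
            if a != b:
                overlap = len(buyers[a] & buyers[b])
                similarity[(a, b)] = overlap / math.sqrt(len(buyers[a]) * len(buyers[b]))
    return similarity

def recommend(purchases, dealer, similarity, top_n=3):
    """Rank vehicles the dealer has not bought by similarity to those already bought."""
    owned = purchases[dealer]
    scores = defaultdict(float)
    for (a, b), s in similarity.items():
        if a in owned and b not in owned and s > 0:
            scores[b] += s
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]

# Hypothetical purchase histories.
purchases = {
    "dealer_a": {"2010 Honda Accord", "2010 Toyota Camry"},
    "dealer_b": {"2010 Honda Accord", "2010 Toyota Camry", "2012 Ford F-150"},
    "dealer_c": {"2010 Honda Accord"},
    "dealer_d": {"2012 Ford F-150"},
}
similarity = item_similarity(purchases)
suggestions = recommend(purchases, "dealer_c", similarity)
```

The same function gives a user-user variant if the dictionary is inverted so that keys are vehicles and values are the dealers who bought them.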
  • 15. The approach: Ranking scores using regression (customer need score). Once we have filtered the profiles that are relevant to the users, rank/sort the vehicles according to some goal so that the most relevant content appears on top. • Example: suggest items that make more profit for the customers in the retail market; in this case the regression goal is profit, and α1…αn can be the buying price at auction, the retail selling price, the detailing work done on the cars, etc. Result: suggest relevant cars to the dealers when they log in to the site.
  • 16. ML use case 3: Forecasting • How many transactions is a buyer going to make in the next few weeks? Given the past year's transaction history for a buyer, how many cars will the dealer buy in the next few auctions or online? Which year, make and model will the dealer buy? At which auction and in which region will the dealer buy? • How many users are going to churn in the next few months? How many will move from the risk category to the successful category? How many will move to the risk category? How many inactive users will move to the active category?
  • 17. Synopsis: Time series and ARIMA. A time series can be viewed as a combination of signal and noise. The signal could follow different patterns, and it could also have a seasonal component: • Mean reversion: the series tends to move back toward the mean over time • Sinusoidal oscillation • Etc. An ARIMA model can be viewed as a "filter" that tries to separate the signal from the noise; the signal is then extrapolated into the future to obtain forecasts. ARIMA models are a very general class of models for forecasting a time series that can be made stationary by differencing.
  • 18. The approach: ARIMA. We use an autoregressive integrated moving average (ARIMA) model to calculate the forecast. A non-seasonal ARIMA model is classified as an ARIMA(p,d,q) model, where: p is the number of autoregressive terms; d is the number of non-seasonal differences needed for stationarity; q is the number of moving average terms. A seasonal ARIMA model is classified as an ARIMA(p,d,q)x(P,D,Q) model, where: P is the number of seasonal autoregressive (SAR) terms; D is the number of seasonal differences; Q is the number of seasonal moving average (SMA) terms. According to the signal type, we developed an automatic forecast-parameter selection algorithm that tries different p, P, d, D and q, Q values and selects the combination with the lowest RMSE using the 80-20 rule.
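The selection step can be sketched as a holdout comparison, assuming that the "80-20 rule" means fitting on the first 80% of the series and scoring forecasts on the last 20%. To keep the sketch self-contained, the candidates are simple stand-in forecasters rather than ARIMA fits; in practice each candidate would be one ARIMA(p,d,q)x(P,D,Q) configuration fitted with a forecasting library. All function names and the weekly counts are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Stand-in forecasters (NOT ARIMA): each maps (train, horizon) -> list of forecasts.
def mean_forecast(train, horizon):
    return [sum(train) / len(train)] * horizon

def naive_forecast(train, horizon):
    return [train[-1]] * horizon

def seasonal_naive(period):
    def forecast(train, horizon):
        # Repeat the last full season of the training data.
        return [train[len(train) - period + (h % period)] for h in range(horizon)]
    return forecast

def select_model(series, candidates, train_fraction=0.8):
    """Fit on the first 80%, score on the last 20%, keep the lowest RMSE."""
    split = int(len(series) * train_fraction)
    train, test = series[:split], series[split:]
    scores = {name: rmse(test, f(train, len(test))) for name, f in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical weekly counts with a 4-week cycle.
weekly_counts = [50, 60, 80, 70] * 5
candidates = {
    "mean": mean_forecast,
    "naive": naive_forecast,
    "seasonal_naive_4": seasonal_naive(4),
}
best, scores = select_model(weekly_counts, candidates)
```

Keeping the split chronological matters: shuffling the series before splitting would leak future values into the training set.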
  • 19. One example. [Figure: two plots of count against weeks. The first is titled "period Example 4, c(0, 0, 0), S(1, 0, 0)"; the second is titled "80/20".]
  • 20. Summary • We implemented various ML techniques for vehicle auction industry use cases. • The choice of algorithm determines the success of the results; depending on the use case, various algorithms can be used. • Extracting, cleaning and normalizing the data forms the crucial layer in determining the success of a use case.
  • 21. Acknowledgement • Dr. Stephane Pinel • Sonar Team • Manheim
  • 22. Q&A