Insurance Churn Prediction
Leveraging Machine Learning for Enhanced Customer Retention
Presented by: Tanmay Badgujar
Introduction
• The Insurance sector is undergoing rapid transformation,
propelled by technological advancements, shifting
consumer preferences, and a fiercely competitive market.
• Policyholder churn, the phenomenon of policyholders
terminating their relationship with an insurance company,
presents distinct challenges and opportunities. When an
insurance company loses policyholders, it can significantly
impact its revenue and market position.
Through data-driven insights and predictive modeling, this presentation aims to showcase my
Machine Learning Capstone Project focused on predicting customer churn in the Retail Sector.
Why Retail Domain?
I chose the Retail Domain for my Capstone Project because:
Consumer Behavior: Retail is all about understanding consumer behavior. Predicting how customers make their
purchasing decisions and what influences them is like solving an intriguing puzzle.
Dynamic Market: The retail market is full of rules and regulations. These rules keep changing and adapting to these
changes is a challenge, but it also keeps things interesting.
Data Privacy: Customer data in retail is confidential. We need to figure out how to analyze it without compromising
privacy, making it a complex but fascinating task.
Diverse Customers: Every customer’s shopping needs are different. Managing relationships with a diverse customer
base adds another layer of complexity.
Technological Advancements: New technologies are always emerging, especially in retail. Figuring out how to
leverage these technological innovations to enhance the customer shopping experience is part of the adventure. 😊
Project’s Significance and
its Benefits to the Insurance Company
• Better Customer Experience: By predicting churn, we can create personalized strategies that
improve relationships with customers and make them happier.
• Saving Money: It’s cheaper to keep existing customers than to find new ones. By predicting and
reducing churn, we can keep more customers and increase our profits.
• Reducing Risk: By figuring out which customers might leave, we can take steps to prevent it. This
helps us manage risks and plan better.
• Staying Competitive: By managing churn effectively, our insurance company can stand out by
offering services that are tailored to each customer’s needs. This gives us an edge over our
competitors.
• Long-Term Success: Our project does not just help keep customers; it also helps the insurance
company succeed in the long run. By focusing on customers and reducing churn, we are building a
business that lasts.
Dataset Information
Our dataset is a comprehensive collection of data, consisting of 9134 records. Each record
represents a unique customer, contributing to the depth and breadth of our analysis.
The dataset includes a diverse set of 24 features, each offering valuable insights into
customer behavior, preferences, and their insurance policies. These features form the
foundation of our predictive modeling.
Exploratory Data Analysis (EDA)
• Exploring the data allowed us to gain a comprehensive overview of
the data's structure. It uncovered potential patterns, helped us
identify key trends and get essential insights from the dataset.
• Throughout the EDA process, we analyzed the distribution of
individual features, investigated correlations, and explored any
inherent relationships between variables.
• Visualizations also played a crucial role in providing a clear
representation of the data, offering insights into customer behavior
and identifying the factors that may contribute to customer churn.
• First, we checked the dataset for null values and duplicate rows. There were none, so the
dataset was clean to begin with.
• In our exploratory data analysis, we discovered an imbalance in our target variable,
“Response”. Over 7000 individuals had not responded, creating a class imbalance. This
insight will guide our next steps in addressing this issue for a more balanced and accurate
predictive model.
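These checks can be sketched with pandas as below. This is a minimal illustration on a tiny made-up frame, not the project's data: only the 'Response' column name comes from the slides, and the 'Income' column and all values are placeholders.

```python
import pandas as pd

# Tiny illustrative frame; only the 'Response' column name comes from the slides.
df = pd.DataFrame({
    "Income": [48000, 0, 62000, 35000, 51000],
    "Response": ["No", "No", "Yes", "No", "No"],
})

# Count missing values per column and fully duplicated rows.
null_counts = df.isnull().sum()
duplicate_rows = int(df.duplicated().sum())

# Class balance of the target variable.
response_counts = df["Response"].value_counts()

print(null_counts)
print("duplicates:", duplicate_rows)
print(response_counts)
```

A large gap between the 'No' and 'Yes' counts in the last output is the class imbalance described above.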
Exploratory Data Analysis (EDA)
Visualizations
Most responses fall under the “No” category, with
Two-Door Cars having the highest count of over
3500.
There are significantly fewer “Yes” responses across
all vehicle classes.
The x-axis represents different ranges of CLV, and
the y-axis represents the frequency of customers
falling within those ranges.
The majority of customers have a CLV between 0
and 10,000.
As the customer lifetime value increases, the
frequency decreases sharply.
• Churn Customers Demographics: This table shows the percentage of churned customers by
employment status and gender. For example, 70.31% of female and 74.03% of male churned
customers are retired.
• Response by Education: This bar graph shows the count of customers based on their education
levels: Bachelor, College, High School or Below, Master, Doctor. The highest count is for customers
with a Bachelor’s degree, and the lowest is for those with a Doctorate.
• The x-axis represents the days, marked at intervals of 5 up to 35 days.
• The y-axis represents the sum or number of policies, ranging from 0 to 1000.
• The orange line represents the trend in policies.
• For most of the duration (up to day 30), the policy count fluctuates between approximately 500 and
just under a thousand.
• After day 30, there’s a significant drop in policy count.
This heatmap provides insights into the relationships between different variables, which could be useful for
understanding patterns and dependencies in the data.
Red indicates positive correlation while blue indicates negative correlation.
Variables like “Customer Lifetime Value”, “Income”, “Monthly Premium Auto”, “Months Since Last Claim”,
“Months Since Policy Inception”, “Number of Open Complaints”, “Number of Policies” and “Total Claim
Amount” are being compared.
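The numbers behind such a heatmap are a pairwise correlation matrix. A minimal sketch, assuming Pearson correlation (the slides do not name the method) and using made-up values for three of the listed columns:

```python
import pandas as pd

# Made-up numeric columns; names follow the slide, values are placeholders.
df = pd.DataFrame({
    "Monthly Premium Auto": [60, 75, 90, 110, 130],
    "Total Claim Amount": [280, 350, 430, 520, 640],
    "Months Since Last Claim": [30, 4, 18, 9, 22],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))
```

The resulting matrix is symmetric with ones on the diagonal. One way to draw it, for example, is seaborn's heatmap function with a red-blue colormap such as "coolwarm".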
Preprocessing
• Dropping Irrelevant Columns: The ‘Customer’ and ‘Effective To Date’ columns are
dropped from the dataframe ‘df’ as they are deemed irrelevant for the analysis.
• Encoding Target Variable: The ‘Response’ column, which is the target variable, is
encoded from ‘Yes’ or ‘No’ to ‘1’ or ‘0’. This is done to facilitate machine learning
algorithms which work better with numerical data.
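The two preprocessing steps above might look like this in pandas. This is a sketch: the column names are the ones named on the slide, while the rows and the 'Income' column are placeholders.

```python
import pandas as pd

# Placeholder rows; 'Customer', 'Effective To Date' and 'Response' are named in the slides.
df = pd.DataFrame({
    "Customer": ["AA10001", "BB20002", "CC30003"],
    "Effective To Date": ["1/1/11", "1/2/11", "1/3/11"],
    "Income": [48000, 0, 62000],
    "Response": ["No", "Yes", "No"],
})

# Drop the columns deemed irrelevant for the analysis.
df = df.drop(columns=["Customer", "Effective To Date"])

# Encode the target: 'Yes' -> 1, 'No' -> 0.
df["Response"] = df["Response"].map({"Yes": 1, "No": 0})
print(df)
```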
Splitting the data into X and y
• In this step, we partitioned the dataset into two components: X and y.
• The variable X encompasses all independent variables, representing the features
that contribute to our predictions.
• On the other hand, y encapsulates the dependent variable or target variable,
serving as the outcome we aim to predict.
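A short sketch of this split, assuming the target column is 'Response' and is already encoded as 0/1. The slides do not say how categorical features were handled, so the placeholder frame below uses numeric columns only.

```python
import pandas as pd

# Placeholder data with an already-encoded target column.
df = pd.DataFrame({
    "Income": [48000, 0, 62000, 35000],
    "Number of Policies": [1, 3, 2, 8],
    "Response": [0, 1, 0, 0],
})

X = df.drop(columns=["Response"])  # independent variables (features)
y = df["Response"]                 # dependent variable (target)
print(X.shape, y.shape)
```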
Train-Test Split
• We then split the dataset into training data and testing data.
• We did an 80:20 split, meaning 80% of our data is Training Data and 20% of our data is
Testing Data. So, our test size was set to 0.2.
• We set the random state to 123, so the same split is reproduced on every run.
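The split described above can be sketched with scikit-learn's train_test_split. The data here is synthetic stand-in data; only test_size=0.2 and random_state=123 come from the slides.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# 80:20 split with a fixed seed so the partition is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)
print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```

Passing stratify=y would keep the class ratio the same in both parts; the slides do not say whether this was done.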
Standard Scaler
• We used Standard Scaler to standardize the features of the dataset.
• This rescales each feature to zero mean and unit variance, so all features are on a comparable scale.
• Standardization is crucial for certain machine learning algorithms, promoting optimal
model performance by mitigating the influence of varying magnitudes among features.
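A minimal sketch of standardization with scikit-learn's StandardScaler, which computes z = (x - mean) / std per feature. The values are placeholders. Fitting on the training data only and reusing those statistics for the test data is an assumption about the workflow, not something the slides state.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Placeholder features on very different scales (e.g. income vs. a small count).
X_train = np.array([[20000.0, 1.0], [40000.0, 3.0], [60000.0, 5.0]])
X_test = np.array([[50000.0, 2.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.round(3))
print(X_test_scaled.round(3))
```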
Over-Sampling with SMOTE
• We had data imbalance within our target variable. Initially, we evaluated our model's
accuracy in the presence of this imbalance.
• Then, to rectify the issue of imbalance, we implemented the Synthetic Minority Over-
Sampling Technique (SMOTE) as an oversampling method.
• We then compared the model accuracies before and after addressing the data imbalance using
SMOTE, providing valuable insights into the impact of this preprocessing technique.
• Distribution of our y_train before oversampling:
Not Churned Churned
6261 1046
• Distribution of our y_train after oversampling:
Not Churned Churned
6261 6261
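With the counts above, balancing the classes means creating 6261 - 1046 = 5215 synthetic churned samples. The idea behind SMOTE can be sketched in NumPy as follows. This is a simplified illustration on toy data, not the library implementation used in the project (the slides do not name one; the imbalanced-learn package provides a SMOTE class with fit_resample). The function name smote_sketch and its parameters are hypothetical.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    random minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                      # starting samples
    picked = neighbours[base, rng.integers(0, k, size=n_new)]  # neighbour to move toward
    gap = rng.random((n_new, 1))                               # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy imbalanced training set: 12 majority rows, 4 minority rows.
rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(12, 2))
X_minor = rng.normal(3.0, 0.5, size=(4, 2))

synthetic = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor), k=3)
X_minor_balanced = np.vstack([X_minor, synthetic])
print(len(X_major), len(X_minor_balanced))  # 12 12
```

Each synthetic point lies on the line segment between two real minority points, so no new region of feature space is invented.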
Applying Machine Learning Algorithms
This insurance company churn problem is a binary classification problem.
Models used:
• Logistic Regression : Logistic Regression is a powerful tool in binary classification. It is very good at modeling the
probability of an event occurring, making it suitable for scenarios where understanding the likelihood of customers
churning is essential.
• Support Vector Classification (SVC) : Support Vector Classification is a robust algorithm employed for
classification tasks, especially when there's a need for clear separation between classes. In the context of customer
churn prediction, it draws distinct decision boundaries between loyal customers and potential churners.
• Random Forest Classification: Random Forest Classification is a powerful algorithm used for both classification
and regression tasks. It operates by constructing multiple decision trees during training and outputting the class
that is the mode of the classes for classification, or mean prediction for regression. In the context of customer
churn prediction, it can handle a large number of features and identify the most significant ones, making it
effective in predicting customer churn.
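A sketch of fitting and comparing the three models with scikit-learn. The data is generated with make_classification as a stand-in for the insurance dataset, and the hyperparameters are library defaults (plus max_iter=1000 for Logistic Regression), which are assumptions and not the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data standing in for the insurance dataset.
X, y = make_classification(n_samples=400, n_features=8, random_state=123)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "Random Forest": RandomForestClassifier(random_state=123),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

The accuracies printed here describe the synthetic data only and say nothing about the project's results.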
Model Selection and Considerations
The Random Forest model was chosen for its high accuracy of 98% and its resistance to
overfitting. This was supported by consistently high cross-validation scores, averaging
around 0.984. Because of its performance and stability, the Random Forest model
was found to be the best fit for this analysis.
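A cross-validation check of this kind can be sketched with scikit-learn's cross_val_score. The slides do not state the number of folds, so cv=5 is an assumption, as are the synthetic data and n_estimators=50.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; not the project's dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=123)

model = RandomForestClassifier(n_estimators=50, random_state=123)

# One accuracy score per fold; a small spread across folds suggests a stable model.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(cv_scores.round(3), "mean:", round(cv_scores.mean(), 3))
```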
Conclusion
• With the help of several insights, patterns and trends in our data, we’ve used Machine Learning to
address the intricate challenge of predicting Customer Churn.
• This project offers significant benefits to insurance companies:
  • By predicting potential churners, insurance companies can adopt proactive strategies to
retain valuable customers. This involves personalized interventions, loyalty programs, and
targeted communication to address customer concerns and enhance satisfaction.
  • Understanding the factors influencing customer churn enables insurance companies to tailor
their services to meet individual needs. This level of personalization fosters stronger customer
relationships, increases loyalty, and enhances the overall customer experience.
Thank You !

obat aborsi Bontang wa 081336238223 jual obat aborsi cytotec asli di Bontang6...
 
The Significance of Transliteration Enhancing
The Significance of Transliteration EnhancingThe Significance of Transliteration Enhancing
The Significance of Transliteration Enhancing
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI  MANAJEMEN OF PENYAKIT TETANUS.pptMATERI  MANAJEMEN OF PENYAKIT TETANUS.ppt
MATERI MANAJEMEN OF PENYAKIT TETANUS.ppt
 

Insurance Churn Prediction Data Analysis Project

  • 1.
  • 2. Insurance Churn Prediction: Leveraging Machine Learning for Enhanced Customer Retention. Presented by: Tanmay Badgujar
  • 3. Introduction
      • The insurance sector is undergoing rapid transformation, propelled by technological advancements, shifting consumer preferences, and a fiercely competitive market.
      • Policyholder churn, the phenomenon of policyholders terminating their relationship with an insurance company, presents distinct challenges and opportunities. When an insurance company loses policyholders, its revenue and market position can suffer significantly.
      Through data-driven insights and predictive modeling, this presentation showcases my Machine Learning Capstone Project focused on predicting customer churn in the Retail Sector.
  • 4. Why Retail Domain? I chose the Retail Domain for my Capstone Project because:
      • Consumer Behavior: Retail is all about understanding consumer behavior. Predicting how customers make their purchasing decisions and what influences them is like solving an intriguing puzzle.
      • Dynamic Market: The retail market is full of rules and regulations. These rules keep changing, and adapting to them is a challenge, but it also keeps things interesting.
      • Data Privacy: Customer data in retail is confidential. We need to figure out how to analyze it without compromising privacy, which makes it a complex but fascinating task.
      • Diverse Customers: Every customer’s shopping needs are different. Managing relationships with a diverse customer base adds another layer of complexity.
      • Technological Advancements: New technologies are always emerging, especially in retail. Figuring out how to leverage these innovations to enhance the customer shopping experience is part of the adventure. 😊
  • 5. Project’s Significance and its Benefits to the Insurance Company
      • Better Customer Experience: By predicting churn, we can create personalized strategies that improve relationships with customers and make them happier.
      • Saving Money: It is cheaper to keep existing customers than to find new ones. By predicting and reducing churn, we can keep more customers and increase our profits.
      • Reducing Risk: By figuring out which customers might leave, we can take steps to prevent it. This helps us manage risks and plan better.
      • Staying Competitive: By managing churn effectively, our insurance company can stand out by offering services that are tailored to each customer’s needs. This gives us an edge over our competitors.
      • Long-Term Success: Our project doesn’t just help keep customers; it also helps the insurance company succeed in the long run. By focusing on customers and reducing churn, we are building a business that lasts.
  • 6. Dataset Information
      Our dataset consists of 3914 records, each representing a unique customer. It includes 24 features that describe customer behavior, preferences, and insurance policies. These features form the foundation of our predictive modeling.
  • 7. Exploratory Data Analysis (EDA)
      • Exploring the data gave us a comprehensive overview of its structure. It uncovered potential patterns and helped us identify key trends and essential insights.
      • Throughout the EDA process, we analyzed the distribution of individual features, investigated correlations, and explored relationships between variables.
      • Visualizations also played a crucial role in representing the data clearly, offering insights into customer behavior and identifying the factors that may contribute to customer churn.
  • 8. Exploratory Data Analysis (EDA)
      • First, we checked the dataset for null values and duplicate rows. There were none, so the dataset was clean to begin with.
      • We then discovered an imbalance in our target variable, “Response”. Over 7000 individuals had not responded, creating a class imbalance. This insight guides our next steps: the imbalance has to be addressed to obtain a more balanced and accurate predictive model.
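The checks described on this slide can be sketched in pandas roughly as follows. The slides do not show the original code, so the toy frame and its values are hypothetical stand-ins; only the “Response” column name comes from the presentation.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Customer": ["A1", "B2", "C3", "D4", "E5"],
    "Response": ["No", "No", "Yes", "No", "No"],
    "Income": [48000, 0, 22000, 61000, 35000],
})

# Total number of missing cells and of fully duplicated rows.
null_count = int(df.isnull().sum().sum())
duplicate_count = int(df.duplicated().sum())

# Absolute and relative class frequencies of the target variable.
class_counts = df["Response"].value_counts()
class_share = df["Response"].value_counts(normalize=True)

print(null_count, duplicate_count)
print(class_counts)
```

A strongly skewed `class_share` is the signal that resampling, or metrics other than plain accuracy, may be worth considering.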
  • 9. Visualizations
      • Response by vehicle class: Most responses fall under the “No” category, with Two-Door Cars having the highest count at over 3500. There are significantly fewer “Yes” responses across all vehicle classes.
      • Customer lifetime value (CLV) distribution: The x-axis represents ranges of CLV, and the y-axis represents the number of customers falling within each range. The majority of customers have a CLV between 0 and 10,000. As CLV increases, the frequency decreases sharply.
  • 10. Visualizations
      • Churned Customer Demographics: This table shows the percentage of churned customers by employment status and gender. For example, 70.31% of female and 74.03% of male churned customers are retired.
      • Response by Education: This bar graph shows the count of customers by education level: Bachelor, College, High School or Below, Master, Doctor. The highest count is for customers with a Bachelor’s degree, and the lowest is for those with a Doctorate.
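A percentage table of this kind can be produced with a normalized cross-tabulation. The sketch below uses made-up rows, and the column names `EmploymentStatus` and `Gender` are assumptions rather than names confirmed by the slides.

```python
import pandas as pd

# Hypothetical churned-customer rows; column names are assumed.
churned = pd.DataFrame({
    "EmploymentStatus": ["Retired", "Retired", "Employed",
                         "Retired", "Unemployed", "Retired"],
    "Gender": ["F", "F", "F", "M", "M", "M"],
})

# normalize="columns" makes each gender column sum to 1,
# so multiplying by 100 gives the share of each status per gender.
pct = pd.crosstab(
    churned["EmploymentStatus"], churned["Gender"], normalize="columns"
) * 100

print(pct.round(2))
```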
  • 11. Visualizations
      • The x-axis represents days, marked at intervals of 5 up to 35 days.
      • The y-axis represents the total number of policies, ranging from 0 to 1000.
      • The orange line shows the trend in policies.
      • For most of the duration (up to day 30), the policy count fluctuates between approximately 500 and just under 1000.
      • After day 30, there is a significant drop in policy count.
  • 12. Visualizations
      This heatmap shows the pairwise correlations between the numeric variables, which helps in understanding patterns and dependencies in the data. Red indicates positive correlation, while blue indicates negative correlation. The variables compared are “Customer Lifetime Value”, “Income”, “Monthly Premium Auto”, “Months Since Last Claim”, “Months Since Policy Inception”, “Number of Open Complaints”, “Number of Policies” and “Total Claim Amount”.
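The matrix behind such a heatmap is a pairwise correlation table. A minimal sketch follows, assuming Pearson correlation (the slides do not state the method) and using synthetic values for three of the listed columns:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in data: claim amount is built to track the premium.
premium = rng.uniform(60, 300, size=200)
toy = pd.DataFrame({
    "Monthly Premium Auto": premium,
    "Total Claim Amount": premium * 4 + rng.normal(0, 50, size=200),
    "Income": rng.uniform(0, 100000, size=200),
})

# DataFrame.corr() defaults to Pearson correlation.
corr = toy.corr()
print(corr.round(2))
```

The resulting frame can then be handed to a plotting routine such as seaborn's `heatmap` with a diverging red/blue colormap.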
  • 13. Preprocessing
      • Dropping Irrelevant Columns: The ‘Customer’ and ‘Effective To Date’ columns are dropped from the dataframe ‘df’ because they are not relevant to the analysis.
      • Encoding Target Variable: The ‘Response’ column, which is the target variable, is encoded from ‘Yes’ or ‘No’ to ‘1’ or ‘0’, because machine learning algorithms work with numerical data.
      Splitting the data into X and y
      • In this step, we partitioned the dataset into two components: X and y.
      • X contains all independent variables, the features that contribute to our predictions.
      • y contains the dependent (target) variable, the outcome we aim to predict.
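These preprocessing steps might look like the following in pandas. The column names ‘Customer’, ‘Effective To Date’ and ‘Response’ come from the slide; the rows and the two feature columns are illustrative values only.

```python
import pandas as pd

# Illustrative rows; only the column names are taken from the slides.
df = pd.DataFrame({
    "Customer": ["A1", "B2", "C3", "D4"],
    "Effective To Date": ["2/24/11", "1/31/11", "2/19/11", "1/20/11"],
    "Income": [48000, 0, 22000, 61000],
    "Monthly Premium Auto": [69, 94, 108, 106],
    "Response": ["No", "Yes", "No", "Yes"],
})

# Drop the identifier and date columns.
df = df.drop(columns=["Customer", "Effective To Date"])

# Encode the target: Yes -> 1, No -> 0.
df["Response"] = df["Response"].map({"Yes": 1, "No": 0})

# Separate features (X) from the target (y).
X = df.drop(columns=["Response"])
y = df["Response"]
```

Any remaining categorical features would also need encoding before modeling; the slides do not say how that was handled.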
  • 14. Train-Test Split
      • We then split the dataset into training data and testing data.
      • We used an 80:20 split: 80% of the data is training data and 20% is testing data, so the test size was set to 0.2.
      • We set the random state to 123 so that the split is reproducible across runs.
      Standard Scaler
      • We used Standard Scaler to standardize the features of the dataset.
      • This puts all features on a comparable scale.
      • Standardization is important for certain machine learning algorithms because it reduces the influence of differing magnitudes among features.
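A sketch of the split and scaling with scikit-learn, using the `test_size=0.2` and `random_state=123` values from the slide. The data is synthetic, and fitting the scaler on the training portion only is an assumption about good practice, not something the slides state.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic features with very different magnitudes.
X = rng.normal(loc=[50.0, 1000.0], scale=[5.0, 300.0], size=(100, 2))
y = np.array([0] * 80 + [1] * 20)

# 80:20 split, reproducible through the fixed random state.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Fit on training data only, then apply the same transform to the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

After scaling, each training feature has mean 0 and standard deviation 1; the test features are close to that but not exactly, since they reuse the training statistics.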
  • 15. Over-Sampling with SMOTE
      • We had data imbalance within our target variable. Initially, we evaluated our model's accuracy in the presence of this imbalance.
      • Then, to rectify the imbalance, we applied the Synthetic Minority Over-Sampling Technique (SMOTE) as an oversampling method.
      • We then compared the model accuracies before and after addressing the imbalance with SMOTE, to see the impact of this preprocessing technique.
      • Distribution of our y_train before oversampling: Not Churned 6261, Churned 1046
      • Distribution of our y_train after oversampling: Not Churned 6261, Churned 6261
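The slides do not show how SMOTE was invoked; libraries such as imbalanced-learn provide a ready-made implementation. To illustrate the idea itself, here is a minimal NumPy sketch of the core step: each synthetic row is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sketch` and the toy data are hypothetical.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base sample and one of its neighbours for every new row.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])

rng = np.random.default_rng(1)
X_major = rng.normal(0.0, 1.0, size=(60, 2))  # toy majority class
X_minor = rng.normal(3.0, 1.0, size=(10, 2))  # toy minority class

# Generate enough synthetic rows to match the majority count.
X_new = smote_sketch(X_minor, n_new=len(X_major) - len(X_minor))
X_res = np.vstack([X_major, X_minor, X_new])
y_res = np.array([0] * len(X_major) + [1] * (len(X_minor) + len(X_new)))
```

With the counts reported on the slide, balancing the classes would mean generating 6261 - 1046 = 5215 synthetic churned rows, applied to the training data only.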
  • 16. Applying Machine Learning Algorithms
      This insurance company churn problem is a binary classification problem. Models used:
      • Logistic Regression: Logistic Regression is a standard tool for binary classification. It models the probability of an event occurring, which makes it suitable when understanding the likelihood of a customer churning is essential.
      • Support Vector Classification (SVC): Support Vector Classification is a robust algorithm for classification tasks, especially when a clear separation between classes is needed. For churn prediction, it draws a decision boundary between loyal customers and likely churners.
      • Random Forest Classification: Random Forest can be used for both classification and regression. It builds multiple decision trees during training and outputs the mode of the trees' predicted classes (for classification) or the mean of their predictions (for regression). For churn prediction, it can handle a large number of features and identify the most significant ones.
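A rough sketch of fitting and comparing the three model families with scikit-learn. The data comes from `make_classification` rather than the insurance dataset, and the hyperparameters shown are assumptions, since the slides do not list the settings used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary classification data as a stand-in.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=123
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# Hyperparameters here are illustrative, not the author's.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVC": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=123),
}

# Fit each model and record its accuracy on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

On imbalanced data, metrics such as precision and recall for the churned class are usually worth reporting alongside accuracy.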
  • 17. Model Selection and Considerations
      The Random Forest model was chosen for its high accuracy of 98% and its resistance to overfitting. This was confirmed by consistently high cross-validation scores, averaging around 0.984. Because of this performance and stability, the Random Forest model was found to be the best fit for this analysis.
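Cross-validation scores like the ones cited here can be computed with scikit-learn's `cross_val_score`. This is a sketch on synthetic data; the 5-fold setting and the forest size are assumptions, as the slides do not give them.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=123
)

model = RandomForestClassifier(n_estimators=50, random_state=123)

# One accuracy score per fold; cv=5 is an assumed setting.
cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_score = cv_scores.mean()

print(cv_scores.round(3), round(mean_score, 3))
```

Scores that stay close to each other across folds suggest the model's performance does not depend heavily on one particular split.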
  • 18. Conclusion
      • Using the insights, patterns and trends found in our data, we applied Machine Learning to the challenge of predicting customer churn.
      • This project offers significant benefits to the insurance company:
          • By predicting potential churners, insurance companies can adopt proactive strategies to retain valuable customers. This involves personalized interventions, loyalty programs, and targeted communication to address customer concerns and enhance satisfaction.
          • Understanding the factors that influence customer churn enables insurance companies to tailor their services to individual needs. This level of personalization fosters stronger customer relationships, increases loyalty, and enhances the overall customer experience.