BANK CUSTOMER SEGMENTATION
Research Project 1
INTRODUCTION
 I obtained this dataset from Kaggle. It consists of bank transaction records.
 Most banks have a large customer base whose members differ in age, income, values,
lifestyle, and more.
 Customer segmentation is the process of dividing a customer
dataset into specific groups based on shared traits.
 This process allows financial institutions to better understand
their customers and tailor their products, services, and
marketing strategies to meet the unique requirements of each
segment.
 Customer understanding should be a living, breathing part of
everyday business, with insights underpinning the full range of
banking operations.
CONTENT
 Importing Libraries
 Dataset Features
 EDA (Exploratory Data Analysis)
 Visualization
 Manipulating Data
 Dealing with “Null” Values
 Encoding the Categorical Data
 KMeans
 DBSCAN
 Conclusion
IMPORTING LIBRARIES
We will be using the following libraries:
 Pandas:
It is used for data processing and analysis.
 Pandas DataFrame:
It is a two-dimensional tabular data structure
with labeled axes (rows and columns).
 Seaborn:
It is used for data visualization.
 NumPy:
It is a Python library used for working
with arrays.
 Matplotlib.pyplot:
It is used for making plots.
DATASET FEATURES
 TransactionID
 CustomerID
 CustomerDOB
 CustGender
 CustLocation
 CustAccountBalance
 TransactionDate
 TransactionTime
 TransactionAmount (INR)
EDA (EXPLORATORY DATA ANALYSIS)
 Exploratory Data Analysis (EDA) is a crucial phase in the data analysis process, where analysts and data
scientists examine and summarize the main characteristics of a dataset.
 EDA plays a pivotal role in hypothesis generation, data cleaning, and guiding the selection of appropriate
modeling techniques, ultimately facilitating more informed and effective decision-making processes based
on a solid understanding of the data at hand.
 There are some null values in “CustomerDOB”, “CustGender” and
“CustAccountBalance”. We will handle them later.
 Next we use the describe function, which returns the count, mean, minimum,
maximum, and other summary statistics for each numeric column.
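The two checks above can be sketched as follows. This is a minimal illustration on a few invented rows, not the Kaggle file itself; only the column names come from the slides.

```python
import pandas as pd

# Toy rows standing in for the real dataset; every value here is invented.
df = pd.DataFrame({
    "CustomerDOB": ["10/1/94", None, "26/11/96", "14/9/73"],
    "CustGender": ["F", "M", None, "M"],
    "CustAccountBalance": [17819.05, 2270.69, None, 866503.21],
    "TransactionAmount (INR)": [25.0, 27999.0, 459.0, 2060.0],
})

# Count the null values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Summary statistics (count, mean, std, min, quartiles, max) for numeric columns.
summary = df.describe()
print(summary)
```

By default, `describe` skips non-numeric columns, which is why only the balance and amount columns appear in its output.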
VISUALIZATION
 Seaborn: It is useful for making plots.
1. Heat map (correlation matrix): A heat map shows the correlation between each pair of numeric
columns in the dataset.
2. Histplot: This type of plot displays the distribution of a dataset by dividing it into bins and representing
the frequency of data points within each bin with bars, providing insights into the underlying data
distribution.
3. The histplot of customer gender shows that there are more male customers than
female customers.
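The numbers behind both plots can be computed with pandas alone. The sketch below uses invented values and only the column names from the slides; it shows the data a heat map and a gender histplot would draw, not the plots themselves.

```python
import pandas as pd

# Invented rows for illustration only.
df = pd.DataFrame({
    "CustGender": ["M", "F", "M", "M", "F"],
    "CustAccountBalance": [100.0, 250.0, 300.0, 420.0, 500.0],
    "TransactionAmount (INR)": [12.0, 18.0, 35.0, 33.0, 60.0],
})

# Correlation is defined for numeric columns only, so select them first.
corr = df.select_dtypes(include="number").corr()
print(corr)

# Frequency of each gender: the bar heights a histplot of this column would show.
gender_counts = df["CustGender"].value_counts()
print(gender_counts)
```

To draw the figures, `corr` can be passed to `seaborn.heatmap(corr, annot=True)` and the gender column to `seaborn.histplot(data=df, x="CustGender")`.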
MANIPULATING DATA
 Manipulating data involves transforming, cleaning, or organizing information within a dataset to extract
meaningful insights.
 I converted the “TransactionDate” column to the datetime type.
 From this column I created three new columns: “transaction_year”, “transaction_month” and
“transaction_day”.
 Finally, I dropped the columns that are not useful for the machine
learning model.
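A minimal sketch of these steps, assuming the dates are stored as day/month/two-digit-year text. The slides state neither the date format nor which columns were dropped, so both are assumptions made for illustration.

```python
import pandas as pd

# Invented rows; the date format is an assumption.
df = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "TransactionDate": ["2/8/16", "21/10/16", "5/9/16"],
    "TransactionAmount (INR)": [25.0, 459.0, 2060.0],
})

# Convert the text column to datetime.
df["TransactionDate"] = pd.to_datetime(df["TransactionDate"], format="%d/%m/%y")

# Derive the three new columns from the datetime accessor.
df["transaction_year"] = df["TransactionDate"].dt.year
df["transaction_month"] = df["TransactionDate"].dt.month
df["transaction_day"] = df["TransactionDate"].dt.day

# Drop columns that add nothing for clustering (an illustrative choice).
df = df.drop(columns=["TransactionID", "TransactionDate"])
print(df)
```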
DEALING WITH “NULL” VALUES
 As we saw in EDA, there are some null values in “CustAccountBalance” and “CustGender”.
 I filled the null values in “CustAccountBalance” with 0, because account balance is a very sensitive part
of a transaction and filling it with an assumed value could mislead us.
 “CustGender” is a categorical column, so its null values can’t be filled with the mean or median.
Instead, I filled them with the mode (the most frequent value) of that column.
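Both fills can be sketched in a few lines of pandas. The rows are invented; only the column names and the fill rules come from the slides.

```python
import pandas as pd

# Invented rows with gaps in both columns.
df = pd.DataFrame({
    "CustGender": ["M", "F", None, "M", None],
    "CustAccountBalance": [1500.0, None, 320.5, None, 80.0],
})

# Missing balances become 0.
df["CustAccountBalance"] = df["CustAccountBalance"].fillna(0)

# mode() returns a Series because ties are possible; take its first entry.
gender_mode = df["CustGender"].mode()[0]
df["CustGender"] = df["CustGender"].fillna(gender_mode)
print(df)
```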
ENCODING THE CATEGORICAL DATA
 The process of converting categorical data into numerical form is called “categorical encoding”.
 There are several methods of categorical encoding, such as label encoding and one-hot encoding.
 I chose label encoding because one-hot encoding would make the data too complicated.
 After dropping some columns, only two categorical columns remain that we
have to encode as numeric columns: “CustGender” and
“CustLocation”.
 This is how our data looks after all preprocessing and categorical encoding.
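Label encoding can be sketched with pandas category codes. The slides do not say which encoder was used, so this is one possible way to do it, shown on invented rows.

```python
import pandas as pd

# Invented rows; only the column names come from the slides.
df = pd.DataFrame({
    "CustGender": ["M", "F", "M", "F"],
    "CustLocation": ["MUMBAI", "DELHI", "BANGALORE", "MUMBAI"],
})

for column in ["CustGender", "CustLocation"]:
    # Categories are sorted alphabetically; each value becomes its category's index.
    df[column] = df[column].astype("category").cat.codes
print(df)
```

Label encoding keeps one column per feature, whereas `pd.get_dummies` would create one column per distinct value. The trade-off is that the integer codes imply an ordering among locations that has no real meaning, which distance-based methods such as KMeans will still treat as numeric distance.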
KMEANS
 K-means clustering is a popular unsupervised machine learning algorithm used for partitioning a
dataset into a set of distinct, non-overlapping subgroups or clusters.
 The primary goal of K-means is to group similar data points together and assign them to clusters based
on certain features or attributes.
 Choosing the number of clusters is one of the most
important steps in the KMeans algorithm.
 One method for choosing the number of
clusters is the Elbow Method.
 Elbow Method: It involves plotting the
Within-Cluster Sum of Squares (WCSS)
against different values of k and identifying
the "elbow point," where the reduction in
WCSS starts to slow down.
 For this dataset, the elbow method
suggests 2 clusters, which correspond to
customer gender: “Male” and “Female”.
This split is not very helpful.
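The elbow method can be illustrated without any clustering library. The sketch below runs a basic Lloyd's k-means written in NumPy on synthetic two-blob data and records the WCSS for each k. The helper `kmeans_wcss` and the data are hypothetical and are not the code used in the project.

```python
import numpy as np

def kmeans_wcss(points, k, n_iter=20, seed=0):
    """Run a basic Lloyd's k-means and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each center to the mean of its points; leave it if the cluster is empty.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return float((distances.min(axis=1) ** 2).sum())

# Synthetic data: two well-separated blobs, so the elbow should appear at k = 2.
rng = np.random.default_rng(42)
points = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2)),
    rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2)),
])

wcss = {k: kmeans_wcss(points, k) for k in range(1, 7)}
for k, value in wcss.items():
    print(k, round(value, 1))
```

WCSS falls sharply from k = 1 to k = 2 and only slightly afterwards, which is the elbow. In scikit-learn, the fitted `KMeans` estimator exposes the same quantity as its `inertia_` attribute.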
 After studying the dataset, I found that the customer location column contains twenty unique
locations.
 So I decided to make 20 clusters, because this grouping makes more sense for the machine learning model.
 After making the twenty clusters, I checked the “Silhouette Score” metric.
 This metric is used to assess the quality of the clusters produced by a clustering method.
 The silhouette score for this clustering is 69.83% (about 0.70 on the metric's scale of -1 to 1), which is a decent score.
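To show what the silhouette score measures, here is a small NumPy sketch of the standard definition. For each point, a is the mean distance to the other points in its own cluster, b is the smallest mean distance to any other cluster, and the point's score is (b - a) / max(a, b). The helper `mean_silhouette` and the four points are hypothetical; scikit-learn provides the same metric as `sklearn.metrics.silhouette_score`.

```python
import numpy as np

def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (ranges from -1 to 1)."""
    distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = []
    for i in range(len(points)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)  # usual convention for a single-point cluster
            continue
        # a: mean distance to the other points in the same cluster.
        a = distances[i, same].sum() / (same.sum() - 1)
        # b: smallest mean distance to the points of any other cluster.
        b = min(distances[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Four invented points forming two obvious pairs.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good_labels = np.array([0, 0, 1, 1])  # each pair in its own cluster
bad_labels = np.array([0, 1, 0, 1])   # each cluster mixes the two pairs

good_score = mean_silhouette(points, good_labels)
bad_score = mean_silhouette(points, bad_labels)
print(good_score, bad_score)
```

The sensible grouping scores close to 1, while the mixed grouping scores below 0. A negative score means points sit, on average, closer to another cluster than to their own.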
DBSCAN
 DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a popular unsupervised
machine learning algorithm used for clustering spatial data points based on their density distribution.
 Unlike K-means, DBSCAN does not require specifying the number of clusters in advance. Instead, it
defines clusters as dense regions separated by areas of lower point density.
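The idea of dense regions separated by sparse ones can be sketched in plain NumPy. The function below is a simplified, hypothetical implementation for illustration only. The parameter names `eps` and `min_samples` mirror those of scikit-learn's `DBSCAN`, and the points are invented.

```python
import numpy as np

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id; -1 marks noise."""
    n = len(points)
    distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # A point's neighborhood includes the point itself.
    neighbors = [np.flatnonzero(distances[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unlabeled core point.
        labels[i] = cluster
        frontier = [i]
        while frontier:
            current = frontier.pop()
            for j in neighbors[current]:
                if labels[j] == -1:
                    labels[j] = cluster
                    # Only core points keep expanding the cluster.
                    if is_core[j]:
                        frontier.append(j)
        cluster += 1
    return labels

# Two tight groups and one isolated point.
points = np.array([
    [0.0, 0.0], [0.0, 0.5], [0.5, 0.0],
    [5.0, 5.0], [5.0, 5.5], [5.5, 5.0],
    [20.0, 20.0],
])
labels = dbscan(points, eps=1.0, min_samples=3)
print(labels.tolist())
```

The number of clusters is never passed in. It follows from `eps` and `min_samples`, and the isolated point is reported as noise rather than forced into a cluster.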
CONCLUSION
 The KMeans algorithm works better than
DBSCAN (Density-Based Spatial Clustering of
Applications with Noise) on this dataset.
 We made 20 clusters with the KMeans algorithm based on
customer location. These help the bank
target the locations with the most transactions, or the
largest transaction amounts, with
advertising promotions, new offers, or
new policies.
 The DBSCAN algorithm does not give good results: its
silhouette score is negative.
 The negative score is likely because
DBSCAN copes poorly with data whose density varies widely or that has many dimensions.
 Together, these results are enough to choose the KMeans
algorithm over the DBSCAN algorithm.
THANK YOU!!!

Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Customer Segmentation

  • 2. INTRODUCTION  I got this dataset from Kaggle website. This dataset is all about transactions.  Most banks have a large customer base - with different characteristics in terms of age, income, values, lifestyle, and more.  Customer segmentation is the process of dividing a customer dataset into specific groups based on shared traits.  This process allows financial institutions to better understand their customers and tailor their products, services, and marketing strategies to meet the unique requirements of each segment.  Customer understanding should be a living, breathing part of everyday business, with insights underpinning the full range of banking operations.
  • 3. CONTENT  Importing Libraries  Dataset Features  ​EDA (Exploratory Data Analysis)  Visualization  ​Manipulating Data  Dealing with “Null” Values  Encoding the Categorical Data  KMeans  DBSCAN  Conclusion
  • 4. IMPORTING LIBRARIES We will be using the following libraries :  Pandas Library :- It is useful for Data Processing and Analysis.  Pandas Data frame :- It is a Two-Dimensional tabular data structured with labeled axes(rows and columns).  Seaborn :- It is useful for Data Visualization.  Numpy :- It is a Python library used for working with Arrays.  Matplotlib.pyplot :- It is useful for making Plots.
  • 5. DATASET FEATURES  TransactionID  CustomerID  CustomerDOB  CustGender  CustLocation  CustAccountBalance  TransactionDate  TransactionTime  TransactionAmount (INR)
  • 6. EDA (EXPLORATORY DATA ANALYSIS)  Exploratory Data Analysis (EDA) is a crucial phase in the data analysis process, where analysts and data scientists examine and summarize the main characteristics of a dataset.  EDA plays a pivotal role in hypothesis generation, data cleaning, and guiding the selection of appropriate modeling techniques, ultimately facilitating more informed and effective decision-making processes based on a solid understanding of the data at hand.
  • 7. ▪ As we can see, there are some null values in “CustomerDOB”, “CustGender”, and “CustAccountBalance”. We will treat them later. ▪ Next we use the describe function, which returns the count, mean, minimum, maximum, and other summary statistics for each numeric column.
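
The null check and the describe call on this slide can be sketched as follows. The column names come from the dataset feature list; the rows are invented stand-ins, since the slide shows only a screenshot of the real output.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the transactions table; all values are invented.
df = pd.DataFrame({
    "CustomerDOB": ["10/1/94", None, "26/11/96", "14/9/73"],
    "CustGender": ["F", "M", None, "M"],
    "CustAccountBalance": [17819.05, 2270.69, np.nan, 866503.21],
    "TransactionAmount (INR)": [25.0, 27999.0, 459.0, 2060.0],
})

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Summary statistics (count, mean, std, min, quartiles, max).
# With mixed column types, describe() reports the numeric columns only.
summary = df.describe()
print(summary)
```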
  • 8. VISUALIZATION ▪ Seaborn: It is useful for making plots. 1. Heat map or correlation matrix: With the help of a heat map we can see the correlation between each pair of columns in the dataset. 2. Histplot: This type of plot displays the distribution of a dataset by dividing it into bins and representing the frequency of data points within each bin with bars, providing insight into the underlying data distribution. 3. As we can see in the histplot of customer gender, there are more male customers than female customers.
  • 9. MANIPULATING DATA ▪ Manipulating data involves transforming, cleaning, or organizing information within a dataset to extract meaningful insights. ▪ I converted the “TransactionDate” column to the datetime type. ▪ From this column I created three new columns: “transaction_year”, “transaction_month”, and “transaction_day”. ▪ After that, I dropped the columns that are not useful for the machine learning model.
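
The date handling described on this slide might look like the sketch below. The day/month/two-digit-year format string and the choice of which columns to drop are assumptions, since the slide names neither.

```python
import pandas as pd

# Toy stand-in for the transactions table; all values are invented.
df = pd.DataFrame({
    "TransactionID": ["T1", "T2", "T3"],
    "TransactionDate": ["2/8/16", "21/9/16", "5/10/16"],
    "TransactionAmount (INR)": [25.0, 459.0, 2060.0],
})

# Parse the text dates. The day/month/two-digit-year format is an assumption.
df["TransactionDate"] = pd.to_datetime(df["TransactionDate"], format="%d/%m/%y")

# Derive the three new columns named on the slide.
df["transaction_year"] = df["TransactionDate"].dt.year
df["transaction_month"] = df["TransactionDate"].dt.month
df["transaction_day"] = df["TransactionDate"].dt.day

# Drop columns that are not needed for clustering (an illustrative choice).
df = df.drop(columns=["TransactionID", "TransactionDate"])
print(df)
```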
  • 10. DEALING WITH “NULL” VALUES ▪ As we saw in EDA, there are some null values in “CustAccountBalance” and “CustGender”. ▪ I filled the null values in “CustAccountBalance” with 0, because account balance is a very sensitive part of a transaction and filling it with an assumed value would mislead us. ▪ “CustGender” is a categorical column, so its null values can’t be filled with the mean or median. Instead, they are filled with the mode (the most frequent value) of that column.
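
A minimal sketch of the two fills described on this slide, on invented rows:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the transactions table; all values are invented.
df = pd.DataFrame({
    "CustGender": ["M", "F", None, "M", "M"],
    "CustAccountBalance": [1200.0, np.nan, 560.5, np.nan, 90.0],
})

# Numeric column: replace missing balances with 0, as the slide describes.
df["CustAccountBalance"] = df["CustAccountBalance"].fillna(0)

# Categorical column: replace missing values with the most frequent category.
# mode() returns a Series (there can be ties), so take its first entry.
gender_mode = df["CustGender"].mode()[0]
df["CustGender"] = df["CustGender"].fillna(gender_mode)

print(df)
```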
  • 11. ENCODING THE CATEGORICAL DATA ▪ The process of converting categorical data into numerical form is called “Categorical Encoding”. ▪ There are a few methods of categorical encoding, such as label encoding and one-hot encoding. ▪ I chose label encoding instead of one-hot encoding because one-hot encoding, which adds one new column per category, makes the data too complicated. ▪ After dropping some columns, there are only two categorical columns left that we have to encode as numeric columns: “CustGender” and “CustLocation”. ▪ This is how our data looks after all preprocessing and categorical encoding.
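
Label encoding of the two remaining categorical columns could be sketched as below. Using scikit-learn's `LabelEncoder` is an assumption (the slide does not say which tool was used), and the location names are invented examples.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the preprocessed table; all values are invented.
df = pd.DataFrame({
    "CustGender": ["M", "F", "M", "F"],
    "CustLocation": ["MUMBAI", "DELHI", "PUNE", "DELHI"],
})

# Fit one encoder per column and keep it, so codes can be mapped back later.
encoders = {}
for column in ["CustGender", "CustLocation"]:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

print(df)
print(list(encoders["CustLocation"].classes_))
```

`LabelEncoder` assigns codes in sorted order of the category names, and `encoders[column].inverse_transform(...)` recovers the original strings. The scikit-learn documentation describes `LabelEncoder` as intended for target labels; `OrdinalEncoder` is the equivalent tool for feature columns.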
  • 12. KMEANS ▪ K-means clustering is a popular unsupervised machine learning algorithm used for partitioning a dataset into a set of distinct, non-overlapping subgroups or clusters. ▪ The primary goal of K-means is to group similar data points together and assign them to clusters based on certain features or attributes. ▪ Deciding the number of clusters is one of the most critical parts of the KMeans algorithm. ▪ One method for deciding the number of clusters is called the Elbow Method. ▪ Elbow Method: It involves plotting the Within-Cluster Sum of Squares (WCSS) against different values of k and identifying the "elbow point," where the reduction in WCSS starts to slow down. ▪ For this dataset, the elbow method suggests 2 clusters, which correspond to customer gender (“Male” and “Female”). This is not very helpful or meaningful.
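
The elbow method can be sketched with scikit-learn as follows. The data here is synthetic (three invented blobs), not the bank dataset, and the range of k values is an arbitrary choice. `inertia_` is the attribute in which scikit-learn's `KMeans` stores the WCSS.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated 2-D blobs (invented for illustration).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit KMeans for each candidate k and record the WCSS (inertia_).
k_values = range(1, 8)
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(X)
    wcss.append(model.inertia_)

# The elbow is the k after which WCSS stops dropping sharply.
for k, value in zip(k_values, wcss):
    print(k, round(value, 1))
```

Passing `list(k_values)` and `wcss` to `matplotlib.pyplot.plot` gives the usual elbow chart. On this synthetic data the sharp drop ends at the number of blobs.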
  • 13. ▪ After observing and studying the dataset, I found that there are twenty unique locations in the customer location column. ▪ So I decided to make 20 clusters, since this makes more sense for the machine learning model. ▪ After making twenty clusters, I checked the “Silhouette Score” metric. ▪ This metric is used to assess the quality of clusters in clustering methods. ▪ The silhouette score for this algorithm is 69.83% (0.6983 on the metric's scale of -1 to 1), which is a decent score.
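
Fitting KMeans with a fixed number of clusters and scoring it with the silhouette metric might look like this sketch. It uses synthetic data and 4 clusters to stay small; the slides use 20 clusters on the encoded bank data, so the score printed here is unrelated to the 69.83% reported above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: four well-separated 2-D blobs (invented for illustration).
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [8.0, 8.0], [0.0, 8.0], [8.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.6, size=(40, 2)) for c in centers])

# Number of clusters chosen up front (the slides use 20).
n_clusters = 4
model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
labels = model.fit_predict(X)

# Silhouette score lies in [-1, 1]; values near 1 mean compact,
# well-separated clusters, and values below 0 suggest misassigned points.
score = silhouette_score(X, labels)
print(f"silhouette score: {score:.4f}")
```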
  • 14. DBSCAN ▪ DBSCAN, or Density-Based Spatial Clustering of Applications with Noise, is a popular unsupervised machine learning algorithm used for clustering spatial data points based on their density distribution. ▪ Unlike K-means, DBSCAN does not require specifying the number of clusters in advance. Instead, it defines clusters as dense regions separated by areas of lower point density.
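
A minimal DBSCAN sketch on synthetic data: two dense groups plus a few isolated points. The `eps` and `min_samples` values are illustrative assumptions tuned to this toy data; the slides do not state the settings used on the bank dataset.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

# Synthetic data: two dense blobs and three isolated points (all invented).
rng = np.random.default_rng(7)
dense_a = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(60, 2))
dense_b = rng.normal(loc=[5.0, 5.0], scale=0.2, size=(60, 2))
outliers = np.array([[2.5, 2.5], [10.0, -3.0], [-4.0, 8.0]])
X = np.vstack([dense_a, dense_b, outliers])

# eps: neighborhood radius; min_samples: points needed to form a dense region.
model = DBSCAN(eps=0.5, min_samples=5)
labels = model.fit_predict(X)

# DBSCAN labels noise points as -1; they do not belong to any cluster.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)

# Score only the clustered points; the silhouette needs at least 2 clusters.
mask = labels != -1
score = silhouette_score(X[mask], labels[mask]) if n_clusters >= 2 else None
print("silhouette (noise excluded):", score)
```

Note that no cluster count is passed in: the two clusters emerge from the density settings. If the noise label -1 is left in when calling `silhouette_score`, it is treated as one more cluster, which can lower the score; whether that played a role in the score reported in these slides is not known.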
  • 15. CONCLUSION ▪ The KMeans algorithm works better than DBSCAN (Density-Based Spatial Clustering of Applications with Noise) on this dataset. ▪ We made 20 clusters with the KMeans algorithm based on customer location. These help the bank target the locations where most transactions, or the largest transaction amounts, occur, for example through advertising promotions or new offers and policies. ▪ The DBSCAN algorithm does not give good results, as its silhouette score is negative. ▪ I attribute the negative silhouette score to DBSCAN not being well suited to high-density datasets. ▪ All of this is enough to choose the KMeans algorithm over the DBSCAN algorithm.