SlideShare une entreprise Scribd logo
1  sur  6
DATA MINING


Data that has relevance for managerial decisions is accumulating at an incredible rate due to a
host of technological advances. Electronic data capture has become inexpensive and ubiquitous
as a by-product of innovations such as the internet, e-commerce, electronic banking, point-of-
sale devices, bar-code readers, and intelligent machines.
Such data is often stored in data warehouses and data marts specifically intended for
management decision support. Data mining is a rapidly growing field that is concerned with
developing techniques to assist managers to make intelligent use of these repositories.
A number of successful applications have been reported in areas such as credit rating, fraud
detection, database marketing, customer relationship management, and stock market
investments.
The field of data mining has evolved from the disciplines of statistics and artificial intelligence.


Definition

Term for confluence of ideas from statistics and computer science (machine
learning and database methods) applied to large databases in science, engineering
and business.



Gartner Group
• “Data mining is the process of discovering meaningful new
correlations, patterns and trends by sifting through large amounts of
data stored in repositories, using pattern recognition technologies as well
as statistical and mathematical techniques.”
Drivers
• Market: From focus on product/service to focus on customer

• IT: From focus on up-to-date balances to focus on patterns in transactions - Data
Warehouses -OLAP

• Dramatic drop in storage costs : Huge databases – e.g Walmart: 20 million
transactions/day, 10 terabyte database, Blockbuster: 36 million households

• Automatic Data Capture of Transactions
– e.g. Bar Codes , POS devices, Mouse clicks, Location
data (GPS, cell phones)
• Internet: Personalized interactions, longitudinal
Data
Process
1. Develop understanding of application, goals
2. Create dataset for study (often from Data
Warehouse)
3. Data Cleaning and Preprocessing
4. Data Reduction and projection
5. Choose Data Mining task
6. Choose Data Mining algorithms
7. Use algorithms to perform task
8. Interpret and iterate thru 1-7 if necessary
Data Mining step 4-8
9. Deploy: integrate into operational systems.
SEMMA Methodology
• Sample from data sets, Partition into Training, Validation and Test
datasets

• Explore data set statistically and graphically

• Modify:Transform variables, Impute missing values

• Model: fit models e.g. regression, classfication tree, neural net

• Assess: Compare models using Partition, Test datasets

Customer Relationship Management
• Target Marketing
• Attrition Prediction/Churn Analysis
• Fraud Detection
• Credit Scoring
Target marketing
• Business problem: Use list of prospects for direct mailing campaign
• Solution: Use Data Mining to identify most promising respondents
combining demographic and geographic data with data on past purchase
behavior
• Benefit: Better response rate, savings in campaign cost
Example: Fleet Financial Group
• Redesign of customer service infrastructure, including $38 million investment in
data warehouse and marketing automation

• Used logistic regression to predict response probabilities to home-equity product
for sample of 20,000 customer profiles from 15 million customer base

• Used to predict profitable customers and customers who would be unprofitable
even if they respond
Churn Analysis: Telcos
• Business Problem: Prevent loss of customers, avoid adding churn-prone
customers

• Solution: Use neural nets, time series analysis to identify typical patterns of
telephone usage of likely-to-defect and likely-to-churn customers

• Benefit: Retention of customers, more effective promotions

Example: IDEA CELLULAR
• CHURN/Customer Profiling System implemented as part of major custom data
warehouse solution

• Preventive action based on customer characteristics and known cases of churning
and non-churning customers identify significant characteristics for churn

• Early detection Customer Profiling Systems based on usage pattern matching
with known cases of churn customers.

Fraud Detection
• Business problem: Fraud increases costs or reduces revenue
• Solution: Use logistic regression, neural nets to identify characteristics
of fraudulent cases to prevent in future or prosecute more vigorously
• Benefit: Increased profits by reducing undesirable customers
Risk Analysis
• Business problem: Reduce risk of loans to delinquent customers
• Solution: Use credit scoring models using discriminant analysis to
create score functions that separate out risky customers
• Benefit: Decrease in cost of bad debts

Finance
• Business problem: Pricing of corporate bonds depends on several
factors, risk profile of company , seniority of debt, dividends, prior
history, etc.
• Solution Approach: Through Data Mining , develop more accurate
models of predicting prices.

E-commerce and Internet
• Collaborative Filtering
• From Clicks to Customers


Recommendation Systems
• Business opportunity: Users rate items (Amazon.com, CDNOW.com,
MovieFinder.com) on the web. How to use information from other users to infer
ratings for a particular user?
• Solution: Use of a technique known as collaborative filtering
• Benefit: Increase revenues by cross selling, up selling

Clicks to Customers
• Business problem: 50% of Dell’s clients order their computer through the web.
However, the retention rate is 0.5%, i.e. of visitors of Dell’s web page become
customers.
• Solution Approach: Through the sequence of their clicks, cluster customers and
design website, interventions to maximize the number of customers who eventually
buy.
• Benefit: Increase revenues
Emerging Major Data Mining applications
• Spam
• Bioinformatics/Genomics
• Medical History Data – Insurance Claims
• Personalization of services in e-commerce
• RF Tags
• Security :
– Container Shipments
– Network Intrusion Detection




DATA STORAGE

Core Concepts
• Types of Data:
– Numeric
• Continuous – ratio and interval
• Discrete

• Need for Binning
– Categorical – order and unordered
– Binary

• Overfitting and Generalization
• Regularization: Penalty for model complexity
• Distance
• Curse of Dimensionality
• Random and stratified sampling, resampling
• Loss Functions




Typical characteristics of mining data
• “Standard” format is spreadsheet:
– Row=observation unit, Column=variable
• Many rows, many columns
• Many rows moderate number of columns (e.g. tel. calls)
• Many columns, moderate number of rows (e.g.genomics)
• Opportunistic (often by-product of transactions)
– Not from designed experiments
– Often has outliers, missing data




Techniques
• Supervised Techniques
– Classification:
• k-Nearest Neighbors, Naïve Bayes, Classification Trees
• Discriminant Analysis, Logistic Regression, Neural Nets
– Prediction (Estimation):
• Regression, Regression Trees, k-Nearest Neighbors


• Unsupervised Techniques
– Cluster Analysis, Principal Components
– Association Rules, Collaborative Filtering

Contenu connexe

Tendances

Data Mining – analyse Bank Marketing Data Set
Data Mining – analyse Bank Marketing Data SetData Mining – analyse Bank Marketing Data Set
Data Mining – analyse Bank Marketing Data Set
Mateusz Brzoska
 
Data Mining – analyse Bank Marketing Data Set by WEKA.
Data Mining – analyse Bank Marketing Data Set by WEKA.Data Mining – analyse Bank Marketing Data Set by WEKA.
Data Mining – analyse Bank Marketing Data Set by WEKA.
Mateusz Brzoska
 

Tendances (20)

Data Mining in Marketing
Data Mining in MarketingData Mining in Marketing
Data Mining in Marketing
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Data Mining – analyse Bank Marketing Data Set
Data Mining – analyse Bank Marketing Data SetData Mining – analyse Bank Marketing Data Set
Data Mining – analyse Bank Marketing Data Set
 
DATA MINING IN RETAIL SECTOR
DATA MINING IN RETAIL SECTORDATA MINING IN RETAIL SECTOR
DATA MINING IN RETAIL SECTOR
 
Data mining
Data miningData mining
Data mining
 
Data mining in Telecommunications
Data mining in TelecommunicationsData mining in Telecommunications
Data mining in Telecommunications
 
Data warehousing and data mining
Data warehousing and data miningData warehousing and data mining
Data warehousing and data mining
 
Data Mining and Business Analytics by Seyed Ziae Mousavi Mojab
Data Mining and Business Analytics by Seyed Ziae Mousavi MojabData Mining and Business Analytics by Seyed Ziae Mousavi Mojab
Data Mining and Business Analytics by Seyed Ziae Mousavi Mojab
 
Data mining in retail industry
Data mining in retail industryData mining in retail industry
Data mining in retail industry
 
Data mining tutorial
Data mining tutorialData mining tutorial
Data mining tutorial
 
Data Mining – analyse Bank Marketing Data Set by WEKA.
Data Mining – analyse Bank Marketing Data Set by WEKA.Data Mining – analyse Bank Marketing Data Set by WEKA.
Data Mining – analyse Bank Marketing Data Set by WEKA.
 
intro_to_business_analytics_and_data_science_ver 1.0
intro_to_business_analytics_and_data_science_ver 1.0intro_to_business_analytics_and_data_science_ver 1.0
intro_to_business_analytics_and_data_science_ver 1.0
 
Data mining on Financial Data
Data mining on Financial DataData mining on Financial Data
Data mining on Financial Data
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data Warehousing
 
Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
 
Unit 4 Advanced Data Analytics
Unit 4 Advanced Data AnalyticsUnit 4 Advanced Data Analytics
Unit 4 Advanced Data Analytics
 
Data mining
Data miningData mining
Data mining
 
Datamining and Business Analytics
Datamining and Business Analytics Datamining and Business Analytics
Datamining and Business Analytics
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 
Data mining
Data miningData mining
Data mining
 

En vedette

Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Esteban Donato
 
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
IJCSIS Research Publications
 
Webpage Classification
Webpage ClassificationWebpage Classification
Webpage Classification
PacharaStudio
 
Random forest
Random forestRandom forest
Random forest
Ujjawal
 

En vedette (20)

Hadoop - Intro
Hadoop - IntroHadoop - Intro
Hadoop - Intro
 
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
 
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
 
Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.Brief Introduction about Hadoop and Core Services.
Brief Introduction about Hadoop and Core Services.
 
Introduction to conventional machine learning techniques
Introduction to conventional machine learning techniquesIntroduction to conventional machine learning techniques
Introduction to conventional machine learning techniques
 
35 algorithm-types
35 algorithm-types35 algorithm-types
35 algorithm-types
 
Winnow vs perceptron
Winnow vs perceptronWinnow vs perceptron
Winnow vs perceptron
 
Webpage Classification
Webpage ClassificationWebpage Classification
Webpage Classification
 
Comparative evaluation of four multi label classification algorithms in class...
Comparative evaluation of four multi label classification algorithms in class...Comparative evaluation of four multi label classification algorithms in class...
Comparative evaluation of four multi label classification algorithms in class...
 
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION
DATA MINING CLASSIFICATION ALGORITHMS FOR KIDNEY DISEASE PREDICTION
 
Random forest
Random forestRandom forest
Random forest
 
Scaling up Machine Learning Algorithms for Classification
Scaling up Machine Learning Algorithms for ClassificationScaling up Machine Learning Algorithms for Classification
Scaling up Machine Learning Algorithms for Classification
 
HiveServer2
HiveServer2HiveServer2
HiveServer2
 
R Workshop for Beginners
R Workshop for BeginnersR Workshop for Beginners
R Workshop for Beginners
 
Machine Learning With R
Machine Learning With RMachine Learning With R
Machine Learning With R
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Building Random Forest at Scale
Building Random Forest at ScaleBuilding Random Forest at Scale
Building Random Forest at Scale
 
Comparative analysis of algorithms classification and methods the presentatio...
Comparative analysis of algorithms classification and methods the presentatio...Comparative analysis of algorithms classification and methods the presentatio...
Comparative analysis of algorithms classification and methods the presentatio...
 
Introduction to Machine Learning & Classification
Introduction to Machine Learning & ClassificationIntroduction to Machine Learning & Classification
Introduction to Machine Learning & Classification
 
An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)An Interactive Introduction To R (Programming Language For Statistics)
An Interactive Introduction To R (Programming Language For Statistics)
 

Similaire à Data mining

Similaire à Data mining (20)

Deep learning
Deep learningDeep learning
Deep learning
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
Telecom Data Analytics
Telecom Data AnalyticsTelecom Data Analytics
Telecom Data Analytics
 
Data mining wrhousing-lec
Data mining wrhousing-lecData mining wrhousing-lec
Data mining wrhousing-lec
 
Big data
Big dataBig data
Big data
 
Operationalize analytics through modern data strategy
Operationalize analytics through modern data strategyOperationalize analytics through modern data strategy
Operationalize analytics through modern data strategy
 
Internet of things, Big Data and Analytics 101
Internet of things, Big Data and Analytics 101Internet of things, Big Data and Analytics 101
Internet of things, Big Data and Analytics 101
 
Machine Learning in Customer Analytics
Machine Learning in Customer AnalyticsMachine Learning in Customer Analytics
Machine Learning in Customer Analytics
 
Trends in data analytics
Trends in data analyticsTrends in data analytics
Trends in data analytics
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
Machine Learning in Cyber Security
Machine Learning in Cyber SecurityMachine Learning in Cyber Security
Machine Learning in Cyber Security
 
Big Data solution for multi-national Bank
Big Data solution for multi-national BankBig Data solution for multi-national Bank
Big Data solution for multi-national Bank
 
Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise Deteo. Data science, Big Data expertise
Deteo. Data science, Big Data expertise
 
Customer 360
Customer 360Customer 360
Customer 360
 
Artificial Intelligence high ROI case studies from around the world: approach...
Artificial Intelligence high ROI case studies from around the world: approach...Artificial Intelligence high ROI case studies from around the world: approach...
Artificial Intelligence high ROI case studies from around the world: approach...
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017
 
Use of Analytics to recover from COVID19 hit economy
Use of Analytics to recover from COVID19 hit economyUse of Analytics to recover from COVID19 hit economy
Use of Analytics to recover from COVID19 hit economy
 
uae views on big data
  uae views on  big data  uae views on  big data
uae views on big data
 
Business intelligence and big data
Business intelligence and big dataBusiness intelligence and big data
Business intelligence and big data
 
Bdml ecom
Bdml ecomBdml ecom
Bdml ecom
 

Plus de Kinshook Chaturvedi (20)

Working and functions_of_rbi[1]
Working and functions_of_rbi[1]Working and functions_of_rbi[1]
Working and functions_of_rbi[1]
 
Role of idfc_in_infrastucture_finance
Role of idfc_in_infrastucture_financeRole of idfc_in_infrastucture_finance
Role of idfc_in_infrastucture_finance
 
Mutual funds
Mutual fundsMutual funds
Mutual funds
 
Iifcl ppt
Iifcl pptIifcl ppt
Iifcl ppt
 
Basel ii norms.ppt
Basel ii norms.pptBasel ii norms.ppt
Basel ii norms.ppt
 
Retail banking pres
Retail banking presRetail banking pres
Retail banking pres
 
Presentation on lic of india
Presentation on lic of indiaPresentation on lic of india
Presentation on lic of india
 
Mfi dfi
Mfi dfiMfi dfi
Mfi dfi
 
Management of np as imt
Management of np as imtManagement of np as imt
Management of np as imt
 
Life insurance in india final raja
Life insurance in india final rajaLife insurance in india final raja
Life insurance in india final raja
 
Financial inclusion
Financial inclusionFinancial inclusion
Financial inclusion
 
Corporate banking v2
Corporate banking v2Corporate banking v2
Corporate banking v2
 
Corporate banking latest
Corporate banking latestCorporate banking latest
Corporate banking latest
 
Financial mgt exercises
Financial mgt exercisesFinancial mgt exercises
Financial mgt exercises
 
Csac10[1].p
Csac10[1].pCsac10[1].p
Csac10[1].p
 
Csac08[1].p
Csac08[1].pCsac08[1].p
Csac08[1].p
 
Csac05[1].p
Csac05[1].pCsac05[1].p
Csac05[1].p
 
Csac14[1].p
Csac14[1].pCsac14[1].p
Csac14[1].p
 
Csac06[1].p
Csac06[1].pCsac06[1].p
Csac06[1].p
 
Xyber001 16 12_09 (2)
Xyber001 16 12_09 (2)Xyber001 16 12_09 (2)
Xyber001 16 12_09 (2)
 

Dernier

Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in PakistanChallenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
vineshkumarsajnani12
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
daisycvs
 
Mckinsey foundation level Handbook for Viewing
Mckinsey foundation level Handbook for ViewingMckinsey foundation level Handbook for Viewing
Mckinsey foundation level Handbook for Viewing
Nauman Safdar
 

Dernier (20)

Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...
Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...
Ooty Call Gril 80022//12248 Only For Sex And High Profile Best Gril Sex Avail...
 
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 
Kalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book nowKalyan Call Girl 98350*37198 Call Girls in Escort service book now
Kalyan Call Girl 98350*37198 Call Girls in Escort service book now
 
PARK STREET 💋 Call Girl 9827461493 Call Girls in Escort service book now
PARK STREET 💋 Call Girl 9827461493 Call Girls in  Escort service book nowPARK STREET 💋 Call Girl 9827461493 Call Girls in  Escort service book now
PARK STREET 💋 Call Girl 9827461493 Call Girls in Escort service book now
 
Lucknow Housewife Escorts by Sexy Bhabhi Service 8250092165
Lucknow Housewife Escorts  by Sexy Bhabhi Service 8250092165Lucknow Housewife Escorts  by Sexy Bhabhi Service 8250092165
Lucknow Housewife Escorts by Sexy Bhabhi Service 8250092165
 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League City
 
Marel Q1 2024 Investor Presentation from May 8, 2024
Marel Q1 2024 Investor Presentation from May 8, 2024Marel Q1 2024 Investor Presentation from May 8, 2024
Marel Q1 2024 Investor Presentation from May 8, 2024
 
Pre Engineered Building Manufacturers Hyderabad.pptx
Pre Engineered  Building Manufacturers Hyderabad.pptxPre Engineered  Building Manufacturers Hyderabad.pptx
Pre Engineered Building Manufacturers Hyderabad.pptx
 
JAJPUR CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN JAJPUR ESCORTS
JAJPUR CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN JAJPUR  ESCORTSJAJPUR CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN JAJPUR  ESCORTS
JAJPUR CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN JAJPUR ESCORTS
 
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in PakistanChallenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
 
Phases of Negotiation .pptx
 Phases of Negotiation .pptx Phases of Negotiation .pptx
Phases of Negotiation .pptx
 
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubai
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur DubaiUAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubai
UAE Bur Dubai Call Girls ☏ 0564401582 Call Girl in Bur Dubai
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 
Mckinsey foundation level Handbook for Viewing
Mckinsey foundation level Handbook for ViewingMckinsey foundation level Handbook for Viewing
Mckinsey foundation level Handbook for Viewing
 
Buy gmail accounts.pdf buy Old Gmail Accounts
Buy gmail accounts.pdf buy Old Gmail AccountsBuy gmail accounts.pdf buy Old Gmail Accounts
Buy gmail accounts.pdf buy Old Gmail Accounts
 
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptxQSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with Culture
 
WheelTug Short Pitch Deck 2024 | Byond Insights
WheelTug Short Pitch Deck 2024 | Byond InsightsWheelTug Short Pitch Deck 2024 | Byond Insights
WheelTug Short Pitch Deck 2024 | Byond Insights
 
New 2024 Cannabis Edibles Investor Pitch Deck Template
New 2024 Cannabis Edibles Investor Pitch Deck TemplateNew 2024 Cannabis Edibles Investor Pitch Deck Template
New 2024 Cannabis Edibles Investor Pitch Deck Template
 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
 

Data mining

  • 1. DATA MINING Data that has relevance for managerial decisions is accumulating at an incredible rate due to a host of technological advances. Electronic data capture has become inexpensive and ubiquitous as a by-product of innovations such as the internet, e-commerce, electronic banking, point-of- sale devices, bar-code readers, and intelligent machines. Such data is often stored in data warehouses and data marts specifically intended for management decision support. Data mining is a rapidly growing field that is concerned with developing techniques to assist managers to make intelligent use of these repositories. A number of successful applications have been reported in areas such as credit rating, fraud detection, database marketing, customer relationship management, and stock market investments. The field of data mining has evolved from the disciplines of statistics and artificial intelligence. Definition Term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science, engineering and business. Gartner Group • “Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” Drivers • Market: From focus on product/service to focus on customer • IT: From focus on up-to-date balances to focus on patterns in transactions - Data Warehouses -OLAP • Dramatic drop in storage costs : Huge databases – e.g Walmart: 20 million transactions/day, 10 terabyte database, Blockbuster: 36 million households • Automatic Data Capture of Transactions – e.g. Bar Codes , POS devices, Mouse clicks, Location data (GPS, cell phones)
  • 2. • Internet: Personalized interactions, longitudinal Data Process 1. Develop understanding of application, goals 2. Create dataset for study (often from Data Warehouse) 3. Data Cleaning and Preprocessing 4. Data Reduction and projection 5. Choose Data Mining task 6. Choose Data Mining algorithms 7. Use algorithms to perform task 8. Interpret and iterate thru 1-7 if necessary Data Mining step 4-8 9. Deploy: integrate into operational systems. SEMMA Methodology • Sample from data sets, Partition into Training, Validation and Test datasets • Explore data set statistically and graphically • Modify:Transform variables, Impute missing values • Model: fit models e.g. regression, classfication tree, neural net • Assess: Compare models using Partition, Test datasets Customer Relationship Management • Target Marketing • Attrition Prediction/Churn Analysis • Fraud Detection • Credit Scoring Target marketing • Business problem: Use list of prospects for direct mailing campaign • Solution: Use Data Mining to identify most promising respondents combining demographic and geographic data with data on past purchase
  • 3. behavior • Benefit: Better response rate, savings in campaign cost Example: Fleet Financial Group • Redesign of customer service infrastructure, including $38 million investment in data warehouse and marketing automation • Used logistic regression to predict response probabilities to home-equity product for sample of 20,000 customer profiles from 15 million customer base • Used to predict profitable customers and customers who would be unprofitable even if they respond Churn Analysis: Telcos • Business Problem: Prevent loss of customers, avoid adding churn-prone customers • Solution: Use neural nets, time series analysis to identify typical patterns of telephone usage of likely-to-defect and likely-to-churn customers • Benefit: Retention of customers, more effective promotions Example: IDEA CELLULAR • CHURN/Customer Profiling System implemented as part of major custom data warehouse solution • Preventive action based on customer characteristics and known cases of churning and non-churning customers identify significant characteristics for churn • Early detection Customer Profiling Systems based on usage pattern matching with known cases of churn customers. Fraud Detection • Business problem: Fraud increases costs or reduces revenue • Solution: Use logistic regression, neural nets to identify characteristics of fraudulent cases to prevent in future or prosecute more vigorously • Benefit: Increased profits by reducing undesirable customers Risk Analysis
  • 4. • Business problem: Reduce risk of loans to delinquent customers • Solution: Use credit scoring models using discriminant analysis to create score functions that separate out risky customers • Benefit: Decrease in cost of bad debts Finance • Business problem: Pricing of corporate bonds depends on several factors, risk profile of company , seniority of debt, dividends, prior history, etc. • Solution Approach: Through Data Mining , develop more accurate models of predicting prices. E-commerce and Internet • Collaborative Filtering • From Clicks to Customers Recommendation Systems • Business opportunity: Users rate items (Amazon.com, CDNOW.com, MovieFinder.com) on the web. How to use information from other users to infer ratings for a particular user? • Solution: Use of a technique known as collaborative filtering • Benefit: Increase revenues by cross selling, up selling Clicks to Customers • Business problem: 50% of Dell’s clients order their computer through the web. However, the retention rate is 0.5%, i.e. of visitors of Dell’s web page become customers. • Solution Approach: Through the sequence of their clicks, cluster customers and design website, interventions to maximize the number of customers who eventually buy. • Benefit: Increase revenues Emerging Major Data Mining applications • Spam • Bioinformatics/Genomics
  • 5. • Medical History Data – Insurance Claims • Personalization of services in e-commerce • RF Tags • Security : – Container Shipments – Network Intrusion Detection DATA STORAGE Core Concepts • Types of Data: – Numeric • Continuous – ratio and interval • Discrete • Need for Binning – Categorical – order and unordered – Binary • Overfitting and Generalization
  • 6. • Regularization: Penalty for model complexity • Distance • Curse of Dimensionality • Random and stratified sampling, resampling • Loss Functions Typical characteristics of mining data • “Standard” format is spreadsheet: – Row=observation unit, Column=variable • Many rows, many columns • Many rows moderate number of columns (e.g. tel. calls) • Many columns, moderate number of rows (e.g.genomics) • Opportunistic (often by-product of transactions) – Not from designed experiments – Often has outliers, missing data Techniques • Supervised Techniques – Classification: • k-Nearest Neighbors, Naïve Bayes, Classification Trees • Discriminant Analysis, Logistic Regression, Neural Nets – Prediction (Estimation): • Regression, Regression Trees, k-Nearest Neighbors • Unsupervised Techniques – Cluster Analysis, Principal Components – Association Rules, Collaborative Filtering