SlideShare une entreprise Scribd logo
1  sur  22
Télécharger pour lire hors ligne
presented to fwPASS on 1/26/2010

DATA MINING – A BETTER WAY
TO DESIGN A STIMULUS
PROGRAM LIKE “CASH FOR
CLUNKERS”
About Me
 Work for Systemental as a Consultant and
  Software Developer
 Software development to support Corporate
  business process improvement since 2000
  (Lean or Continuous Improvement Initiatives)
   .Net since 2004
 President, fwPASS.org
 Mfg. Eng. Technology degrees from Ball State
  University
 Six Sigma Black Belt, Certified
What We Will cover

 Data mining – what is it?
 “Cash for Clunkers”
 Other examples
   Amazon.com
   Coke Freestyle
 Basic Data Mining Concepts
 Demo time
Wikipedia

Data mining is the process of extracting
 patterns from data. Data mining is becoming
 an increasingly important tool to transform
 these data into information. It is commonly
 used in a wide range of profiling practices,
 such as marketing, surveillance, fraud
 detection and scientific discovery.
Cash for Clunkers

    Columbia City: SR 30 & SR 9
Objectives of “Cash for
Clunkers”
 Jump start automotive sector sales
   Specifically higher mileage vehicles
 Get gas guzzlers off the street
Cash for Clunkers

  How did they decide who to target and
   how?
  How would you do it?
  Where did the data come from?
  Where should the data come from?
Who to target?

 Anyone, everyone, or targeted
 Self qualified
 Organic growth or just “pull up” existing sales
 Convert foreign sales to GM
   Conflict of interest? – Government motors
 Discriminatory?
Estimating the effectiveness

 Affect of “pull up” vs. organic growth
 Peripheral commercial effect
 Estimation of payback
   Sales, plates and excise tax
   Income tax from lay-off recalls
   Reduction of unemployment
   Auto Insurance
 Reduction in tax revenue at gas pumps
Data content and source

 Public records
 CAFE
 GM Data
 Industry sponsored studies
Amazon.com
SQL Server 2005 Data Mining

 Nine algorithms (3rd party pluggable)
 Both Modeling and exploration in VS
 Integrated tools: SS*S
 API
 Data Mining Extensions to SQL (DMX)
Type of analysis

 Optimization vs. Predictive
 Descriptive – provides deeper understanding
  of existing data
 Predictive – provides insight to understand
  probability of future conditions
Data Mining Objective

 Classification – assign data to known classes
    (discrete)
   Segmentation – clustering in similar groups
   Estimation – predicting continuous values
   Association – what events occur together
   Forecasting – time series estimating of future
Algorithms

1.   Decision Trees (attributes from the tree)
2.   Naive Bayes (uses all attributes)
3.   Clustering
4.   Linear Regression
5.   Logistic Regression
6.   Neural Nets
7.   Sequence Clustering
8.   Time Series
9.   Association Rules (discrete only)
DMX

 Column syntax: Name, data type, content
  type, [usage]
 Case being analyzed – key
 Content type: key, key sequence, key time,
  discrete, continuous, discretized (# of
  buckets)
 Usage: Input, predict, predict-only (not to
  build any other part of model)
Structure

 Datamart, DW, cube
   Data source
     Mining Structure (which fields)
       Mining Models (algorithms, attributes)
         Viewers (tree, clusters, discrimination, classification)
Training the model

 SSIS Percentage Sampling Data Flow
  Component
 Training, Testing
 Estimating error
Demos

 Visual Studio
 SSMS
 Win Client
 Web Client
Miscellaneous

 Sequence or timing
 Prediction + measure of confidence
 Caution: Over-fitting the model
 Nested tables ex: transactional detail data
   Key is never foreign key to case table
   Key is what table is about
References
   http://dean-o.blogspot.com/
   http://abbottanalytics.blogspot.com/
   http://www.thearling.com/umass/index_frame.htm
   http://www.thearling.com/text/dmtechniques/dmtechniques.htm
   MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise
   http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20M
    ining%20Web%20Controls%20Library
   http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?Rele
    aseId=34035
   Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and
    Stephen Forte – Chapter 20
Thank you!

 Website
   http://www.systemental.com
 Blogs
   http://dean-o.blogspot.com/
   http://practicalhoshin.blogspot.com
 Twitter
   http://www.twitter.com/deanwillson
 Email
   dean@systemental.com
 LinkedIn
   http://www.linkedin.com/in/deanwillson

Contenu connexe

Similaire à Data Mining for Stimulus Program Design

Building Predictive Analytics on Big Data Platforms
Building Predictive Analytics on Big Data PlatformsBuilding Predictive Analytics on Big Data Platforms
Building Predictive Analytics on Big Data PlatformsOlha Hrytsay
 
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALSecrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALMark Tabladillo
 
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...DataWorks Summit
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18Harvinder Atwal
 
Data imputation for unstructured dataset
Data imputation for unstructured datasetData imputation for unstructured dataset
Data imputation for unstructured datasetVibhore Agarwal
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Miningllangit
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Mark Tabladillo
 
Presentation Title
Presentation TitlePresentation Title
Presentation Titlebutest
 
Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013nkabra
 
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...Big Data Week
 
Mastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisMastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisTeradata Aster
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014Mark Tabladillo
 
Data Mining 2008
Data Mining 2008Data Mining 2008
Data Mining 2008llangit
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Miningllangit
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Miningllangit
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalHarvinder Atwal
 

Similaire à Data Mining for Stimulus Program Design (20)

KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16KNIME Meetup 2016-04-16
KNIME Meetup 2016-04-16
 
Building Predictive Analytics on Big Data Platforms
Building Predictive Analytics on Big Data PlatformsBuilding Predictive Analytics on Big Data Platforms
Building Predictive Analytics on Big Data Platforms
 
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham ALSecrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL
 
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi...
 
DataOps: Nine steps to transform your data science impact Strata London May 18
DataOps: Nine steps to transform your data science impact  Strata London May 18DataOps: Nine steps to transform your data science impact  Strata London May 18
DataOps: Nine steps to transform your data science impact Strata London May 18
 
Data imputation for unstructured dataset
Data imputation for unstructured datasetData imputation for unstructured dataset
Data imputation for unstructured dataset
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Mining
 
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411
 
Presentation Title
Presentation TitlePresentation Title
Presentation Title
 
Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013Big data in marketing at harvard business club nick1 june 15 2013
Big data in marketing at harvard business club nick1 june 15 2013
 
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...
BDW16 London - Scott Krueger, skyscanner - Does More Data Mean Better Decisio...
 
Mastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and AnalysisMastering MapReduce: MapReduce for Big Data Management and Analysis
Mastering MapReduce: MapReduce for Big Data Management and Analysis
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014
 
Data engineering design patterns
Data engineering design patternsData engineering design patterns
Data engineering design patterns
 
Data Mining 2008
Data Mining 2008Data Mining 2008
Data Mining 2008
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Mining
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Mining
 
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder AtwalDataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
DataOps - Big Data and AI World London - March 2020 - Harvinder Atwal
 

Plus de Dean Willson

Intro to the Internet of Things using Netduino
Intro to the Internet of Things using NetduinoIntro to the Internet of Things using Netduino
Intro to the Internet of Things using NetduinoDean Willson
 
Index Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for SuccessIndex Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for SuccessDean Willson
 
Automating sql server daily health checks
Automating sql server daily health checksAutomating sql server daily health checks
Automating sql server daily health checksDean Willson
 
Visual Studio 2012 Productivity Tools
Visual Studio 2012 Productivity ToolsVisual Studio 2012 Productivity Tools
Visual Studio 2012 Productivity ToolsDean Willson
 
Intro to Powershell
Intro to PowershellIntro to Powershell
Intro to PowershellDean Willson
 
Continuous improvement in a professional organization
Continuous improvement in a professional organizationContinuous improvement in a professional organization
Continuous improvement in a professional organizationDean Willson
 
Database Source Control
Database Source ControlDatabase Source Control
Database Source ControlDean Willson
 
Career Transitions - Ball State University, Six Sigma Speakers Series
Career Transitions - Ball State University, Six Sigma Speakers SeriesCareer Transitions - Ball State University, Six Sigma Speakers Series
Career Transitions - Ball State University, Six Sigma Speakers SeriesDean Willson
 
Introduction to SQL Server 2008 Management Data Warehouse (MDW)
Introduction to SQL Server 2008 Management Data Warehouse (MDW)Introduction to SQL Server 2008 Management Data Warehouse (MDW)
Introduction to SQL Server 2008 Management Data Warehouse (MDW)Dean Willson
 
Implementing ASP.NET Role Based Security
Implementing ASP.NET Role Based SecurityImplementing ASP.NET Role Based Security
Implementing ASP.NET Role Based SecurityDean Willson
 
Introduction to SSRS Report Builder
Introduction to SSRS Report BuilderIntroduction to SSRS Report Builder
Introduction to SSRS Report BuilderDean Willson
 
Designing For Occasionally Connected Apps Slideshare
Designing For Occasionally Connected Apps SlideshareDesigning For Occasionally Connected Apps Slideshare
Designing For Occasionally Connected Apps SlideshareDean Willson
 

Plus de Dean Willson (12)

Intro to the Internet of Things using Netduino
Intro to the Internet of Things using NetduinoIntro to the Internet of Things using Netduino
Intro to the Internet of Things using Netduino
 
Index Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for SuccessIndex Reorganization and Rebuilding for Success
Index Reorganization and Rebuilding for Success
 
Automating sql server daily health checks
Automating sql server daily health checksAutomating sql server daily health checks
Automating sql server daily health checks
 
Visual Studio 2012 Productivity Tools
Visual Studio 2012 Productivity ToolsVisual Studio 2012 Productivity Tools
Visual Studio 2012 Productivity Tools
 
Intro to Powershell
Intro to PowershellIntro to Powershell
Intro to Powershell
 
Continuous improvement in a professional organization
Continuous improvement in a professional organizationContinuous improvement in a professional organization
Continuous improvement in a professional organization
 
Database Source Control
Database Source ControlDatabase Source Control
Database Source Control
 
Career Transitions - Ball State University, Six Sigma Speakers Series
Career Transitions - Ball State University, Six Sigma Speakers SeriesCareer Transitions - Ball State University, Six Sigma Speakers Series
Career Transitions - Ball State University, Six Sigma Speakers Series
 
Introduction to SQL Server 2008 Management Data Warehouse (MDW)
Introduction to SQL Server 2008 Management Data Warehouse (MDW)Introduction to SQL Server 2008 Management Data Warehouse (MDW)
Introduction to SQL Server 2008 Management Data Warehouse (MDW)
 
Implementing ASP.NET Role Based Security
Implementing ASP.NET Role Based SecurityImplementing ASP.NET Role Based Security
Implementing ASP.NET Role Based Security
 
Introduction to SSRS Report Builder
Introduction to SSRS Report BuilderIntroduction to SSRS Report Builder
Introduction to SSRS Report Builder
 
Designing For Occasionally Connected Apps Slideshare
Designing For Occasionally Connected Apps SlideshareDesigning For Occasionally Connected Apps Slideshare
Designing For Occasionally Connected Apps Slideshare
 

Dernier

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 

Dernier (20)

Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

Data Mining for Stimulus Program Design

  • 1. presented to fwPASS on 1/26/2010 DATA MINING – A BETTER WAY TO DESIGN A STIMULUS PROGRAM LIKE “CASH FOR CLUNKERS”
  • 2. About Me  Work for Systemental as a Consultant and Software Developer  Software development to support Corporate business process improvement since 2000 (Lean or Continuous Improvement Initiatives)  .Net since 2004  President, fwPASS.org  Mfg. Eng. Technology degrees from Ball State University  Six Sigma Black Belt, Certified
  • 3. What We Will cover  Data mining – what is it?  “Cash for Clunkers”  Other examples  Amazon.com  Coke Freestyle  Basic Data Mining Concepts  Demo time
  • 4. Wikipedia Data mining is the process of extracting patterns from data. Data mining is becoming an increasingly important tool to transform these data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, fraud detection and scientific discovery.
  • 5. Cash for Clunkers Columbia City: SR 30 & SR 9
  • 6. Objectives of “Cash for Clunkers”  Jump start automotive sector sales  Specifically higher mileage vehicles  Get gas guzzlers off the street
  • 7. Cash for Clunkers  How did they decide who to target and how?  How would you do it?  Where did the data come from?  Where should the data come from?
  • 8. Who to target?  Anyone, everyone, or targeted  Self qualified  Organic growth or just “pull up” existing sales  Convert foreign sales to GM  Conflict of interest? – Government motors  Discriminatory?
  • 9. Estimating the effectiveness  Affect of “pull up” vs. organic growth  Peripheral commercial effect  Estimation of payback  Sales, plates and excise tax  Income tax from lay-off recalls  Reduction of unemployment  Auto Insurance  Reduction in tax revenue at gas pumps
  • 10. Data content and source  Public records  CAFE  GM Data  Industry sponsored studies
  • 12. SQL Server 2005 Data Mining  Nine algorithms (3rd party pluggable)  Both Modeling and exploration in VS  Integrated tools: SS*S  API  Data Mining Extensions to SQL (DMX)
  • 13. Type of analysis  Optimization vs. Predictive  Descriptive – provides deeper understanding of existing data  Predictive – provides insight to understand probability of future conditions
  • 14. Data Mining Objective  Classification – assign data to known classes (discrete)  Segmentation – clustering in similar groups  Estimation – predicting continuous values  Association – what events occur together  Forecasting – time series estimating of future
  • 15. Algorithms 1. Decision Trees (attributes from the tree) 2. Naive Bayes (uses all attributes) 3. Clustering 4. Linear Regression 5. Logistic Regression 6. Neural Nets 7. Sequence Clustering 8. Time Series 9. Association Rules (discrete only)
  • 16. DMX  Column syntax: Name, data type, content type, [usage]  Case being analyzed – key  Content type: key, key sequence, key time, discrete, continuous, discretized (# of buckets)  Usage: Input, predict, predict-only (not to build any other part of model)
  • 17. Structure  Datamart, DW, cube  Data source  Mining Structure (which fields)  Mining Models (algorithms, attributes)  Viewers (tree, clusters, discrimination, classification)
  • 18. Training the model  SSIS Percentage Sampling Data Flow Component  Training, Testing  Estimating error
  • 19. Demos  Visual Studio  SSMS  Win Client  Web Client
  • 20. Miscellaneous  Sequence or timing  Prediction + measure of confidence  Caution: Over-fitting the model  Nested tables ex: transactional detail data  Key is never foreign key to case table  Key is what table is about
  • 21. References  http://dean-o.blogspot.com/  http://abbottanalytics.blogspot.com/  http://www.thearling.com/umass/index_frame.htm  http://www.thearling.com/text/dmtechniques/dmtechniques.htm  MSDN webcast: Applying SQL Server 2005 Data Mining to Enterprise  http://msftasprodsamples.codeplex.com/wikipage?title=SS2005!Data%20M ining%20Web%20Controls%20Library  http://msftasprodsamples.codeplex.com/Release/ProjectReleases.aspx?Rele aseId=34035  Programming SQL Server 2005, Microsoft Press, Andrew J. Brust and Stephen Forte – Chapter 20
  • 22. Thank you!  Website  http://www.systemental.com  Blogs  http://dean-o.blogspot.com/  http://practicalhoshin.blogspot.com  Twitter  http://www.twitter.com/deanwillson  Email  dean@systemental.com  LinkedIn  http://www.linkedin.com/in/deanwillson