4. Resources
   Microsoft Visual Studio 2008 (NOT 2010)
   SQL Server 2008 (NOT Express Edition)
   MSSQL Server Community Projects & Samples: http://www.codeplex.com/SqlServerSamples
   Adventure Works Databases for SQL Server 2008: http://msftdbprodsamples.codeplex.com/
   Adventure Works Sample Data Warehouse Documentation: http://technet.microsoft.com/en-us/library/ms124623(SQL.90).aspx
   SQL Authority Adventure Works Tutorial: http://blog.sqlauthority.com/2008/08/10/sql-server-2008-download-and-install-samples-database-adventureworks-2005-detail-tutorial/
5. Adventure Works
   Example database of a fictional company named "Adventure Works"
   SSAS integration (SQL Server Analysis Services)
   Finance
      Franchises
      Currency rates (daily exchange rates)
   Sales
      Reseller
      Contracts
6. Available Scenarios: DM/DW Scenarios
   Mining scenarios
      Forecasting: bikes by region/time
      Targeted mailing campaign: algorithms for demographic data (age, region, volume, etc.)
      Market basket analysis: "suggesting a product"
      Sequence clustering
7. Available Scenarios: OLAP Scenarios
   Financial reporting
   Actual versus budget
   Product profitability analysis
   Sales force performance
   Trend/growth analysis
   Promotion effectiveness
   Source: http://msdn.microsoft.com/en-us/library/ms124623.aspx
8. Adventure Works Data Warehouse
   Data from the OLTP DB plus an additional "external" data source
   Synchronization via the available SSIS packages
   Copy of the actual (live) data
   Can be changed and merged for mining
11. Data Mining Applied with AW DB
   Read and try it out!
   Preparation
      1. Get Visual Studio 2008
      2. Get SQL Server 2008
      3. Install the Adventure Works database (DW)
   Homework: http://msdn.microsoft.com/en-us/library/ms167167.aspx
12. Data Mining Applied with AW DB
   Don't get confused*
   "SQL Server Business Intelligence Development Studio" is the combination of Microsoft Visual Studio 2008 + SQL Server 2008 (not Express) with the "Business Intelligence" feature installed
   (*Everybody is confused here the first time!)
15. View: vTargetMail
-- vTargetMail supports targeted mailing data model
-- Uses vDMPrep to determine if a customer buys a bike and joins to DimCustomer
CREATE VIEW [dbo].[vTargetMail]
AS
    SELECT
        c.[CustomerKey],
        -- [...]
        CASE x.[Bikes] WHEN 0 THEN 0 ELSE 1 END AS [BikeBuyer]
    FROM [dbo].[DimCustomer] c
    INNER JOIN (
        SELECT [CustomerKey], [Region], [Age],
               Sum(CASE [EnglishProductCategoryName] WHEN 'Bikes' THEN 1 ELSE 0 END) AS [Bikes]
        FROM [dbo].[vDMPrep]
        GROUP BY [CustomerKey], [Region], [Age]
    ) AS [x]
        ON c.[CustomerKey] = x.[CustomerKey];
GO
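The BikeBuyer flag in the view comes from a conditional sum per customer followed by a CASE that collapses the count to 0 or 1. The sketch below replays that pattern on a toy table using Python's built-in sqlite3 module. The table name prep, its two columns, and the sample rows are hypothetical stand-ins for vDMPrep, not the real Adventure Works schema, and SQLite is used here only because it runs without a server.

```python
import sqlite3

# Hypothetical stand-in for vDMPrep: one row per purchased item.
rows = [
    (1, "Bikes"),
    (1, "Accessories"),
    (2, "Clothing"),
    (3, "Bikes"),
    (3, "Bikes"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prep (CustomerKey INTEGER, Category TEXT)")
conn.executemany("INSERT INTO prep VALUES (?, ?)", rows)

# Same shape as the view: count bike purchases per customer in a
# subquery, then turn any non-zero count into the flag 1.
query = """
SELECT x.CustomerKey,
       CASE x.Bikes WHEN 0 THEN 0 ELSE 1 END AS BikeBuyer
FROM (SELECT CustomerKey,
             SUM(CASE Category WHEN 'Bikes' THEN 1 ELSE 0 END) AS Bikes
      FROM prep
      GROUP BY CustomerKey) AS x
ORDER BY x.CustomerKey
"""
bike_buyers = conn.execute(query).fetchall()
conn.close()

print(bike_buyers)
```

Customer 3 bought two bikes but still gets the flag 1, which is the point of the outer CASE: the mining model needs a yes/no target, not a count.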
18. Algorithm Overview
   Used to identify relationships between columns (Column 1, Column 2, Column 3)
   Most cases: 4 steps
      Analyze
      Create model (training)
      Verify model (testing)
      Predict future data
19. Decision Trees
   Also: classification trees
   Partition the data
   Can detect non-linear relationships
   Machine learning technique
      Separate the data into a training set and a testing set
      The training set is used to create the model based on certain criteria
      The test set is used to verify the model
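The training/testing split can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the algorithm SQL Server Analysis Services uses: the data is synthetic, the "tree" is a single split on age (a decision stump), and the helper names split_train_test, fit_stump, and accuracy are invented for this example.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_stump(train):
    """Pick the age threshold that misclassifies the fewest training rows."""
    best_threshold, best_errors = None, None
    for threshold in sorted({age for age, _ in train}):
        errors = sum((age >= threshold) != bool(label) for age, label in train)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def accuracy(threshold, rows):
    """Share of rows where 'age >= threshold' matches the label."""
    hits = sum((age >= threshold) == bool(label) for age, label in rows)
    return hits / len(rows)


# Synthetic data (an assumption): (age, bought_bike), buyers are aged 40+.
data = [(age, int(age >= 40)) for age in range(20, 60)]

train, test = split_train_test(data)      # steps 1-2: prepare and train
threshold = fit_stump(train)
test_accuracy = accuracy(threshold, test)  # step 3: verify on unseen rows

print(threshold, test_accuracy)
```

The model only ever sees the training rows, so the accuracy on the held-out test rows is an honest estimate of how the split would behave on new customers.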
20. Decision Trees: Example
   Overall response rate: 2.6%
      Male: 3.0%
         Income > $30,000: 3.6%
         Income < $30,000: 2.3%
      Female: 2.9%
         Age > 40: 3.8%
         Age < 40: 3.2%
   Trained tree: segments with a response rate > 3.5%
      Males with income > $30,000
      Females aged 40+
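Reading the target segments off the example tree amounts to walking down to the leaves and keeping those above the 3.5% cutoff. The sketch below encodes the response rates from the slide as a nested dictionary. The dictionary layout, the "All" label for the root, and the function name leaves_above are assumptions made for illustration.

```python
# Response rates in percent, taken from the example slide.
tree = {
    "segment": "All", "rate": 2.6,
    "children": [
        {"segment": "Male", "rate": 3.0, "children": [
            {"segment": "Male, income > $30,000", "rate": 3.6, "children": []},
            {"segment": "Male, income < $30,000", "rate": 2.3, "children": []},
        ]},
        {"segment": "Female", "rate": 2.9, "children": [
            {"segment": "Female, age > 40", "rate": 3.8, "children": []},
            {"segment": "Female, age < 40", "rate": 3.2, "children": []},
        ]},
    ],
}


def leaves_above(node, min_rate):
    """Return the leaf segments whose response rate exceeds min_rate."""
    if not node["children"]:
        return [node["segment"]] if node["rate"] > min_rate else []
    found = []
    for child in node["children"]:
        found.extend(leaves_above(child, min_rate))
    return found


targets = leaves_above(tree, 3.5)
print(targets)
```

Both selected leaves beat the overall 2.6% rate, although neither parent node (Male at 3.0%, Female at 2.9%) clears the cutoff on its own, which is why the second split is worth making.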
21. Pros and Cons of Decision Trees
   Pros
      Very flexible, white-box model
      Occam's razor: KISS (Keep it simple, stupid!)
      Little preparation and few resources needed
   Cons
      Can be tuned to death
      Can take a long time to build
      Training data must be selected wisely: false training yields false results
      A big tree might require disk swapping