Key Principles Of Data Mining

1. Key Principles of Data Mining Presentation by Tobie Muir (Data-Decisions) Henry Stewart Briefing: An Introduction to Marketing Analytics London, 23rd June 2010

3. These datasets can be incomprehensibly large – too large to analyse without the aid of computer-driven processes.

4. The role of data mining is to introduce (semi) automated computer-driven processes and statistical techniques to extract meaningful patterns from such data, with the goal of improving the business in question. A classic example in marketing is using data mining insights to achieve the same revenue with a smaller marketing budget.

5. For very large datasets data mining can focus on a sample within a dataset – instead of analysing millions (billions!) of records, which can be computationally expensive / slow – we analyse a subset of this data in the hope that patterns prevalent in the subset also apply to the entire dataset.

6. Careful analysis is then required to determine whether any patterns found are meaningful: they could be spurious or coincidental, or a pattern may exist only in the subset and not in the full dataset. Copyright © 2010 Data-Decisions Ltd
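As a rough illustration of slides 5 and 6, the sketch below draws a random sample and compares a response rate measured on the sample with the rate in the full dataset. The data is synthetic, and the function name `sample_rate_check`, the 5% response rate and the sample size are assumptions made for this example, not part of the presentation.

```python
import math
import random


def sample_rate_check(records, sample_size, seed=0):
    """Compare a response rate measured on a random sample with the full dataset.

    `records` is a list of 0/1 flags (1 = customer responded).
    """
    rng = random.Random(seed)
    sample = rng.sample(records, sample_size)
    sample_rate = sum(sample) / sample_size
    full_rate = sum(records) / len(records)
    # Standard error of a proportion under simple random sampling
    # (the finite-population correction is ignored in this sketch).
    std_err = math.sqrt(sample_rate * (1 - sample_rate) / sample_size)
    return sample_rate, full_rate, std_err


# Hypothetical population: roughly 5% of 200,000 customers responded.
rng = random.Random(42)
population = [1 if rng.random() < 0.05 else 0 for _ in range(200_000)]

sample_rate, full_rate, std_err = sample_rate_check(population, 5_000)
print(f"sample rate {sample_rate:.4f}, full rate {full_rate:.4f}, std err {std_err:.4f}")
```

If the sample rate sits within a couple of standard errors of the full rate, the sample is probably representative for that measure. In practice the full rate is usually not available, which is why the standard error matters.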

8. Business intelligence tools can also encompass the extraction, storage, visualisation and distribution of business information, not just the analysis of business data.

10. Online analytical processing (OLAP)

11. Statistical analysis and forecasting

15. Optimise Best Media Mix

17. Cross-Sell

19. The CRISP data mining process. CRISP-DM stands for Cross-Industry Standard Process for Data Mining. It was developed by the CRISP-DM consortium, consisting of DaimlerChrysler (formerly Daimler-Benz), SPSS (formerly ISL), and NCR. The idea was to standardise the process of data mining across the industry: a common pattern for the process was established among all collaborators, and CRISP-DM was also a mechanism to introduce uniform terminology and differentiation. CRISP-DM 1.0 was rolled out in Aug 2000, including detailed documentation. To the right is the standard six-part CRISP model of how the data mining process occurs, taken from that documentation. The model highlights the relationships and interdependencies between all six phases; the data mining process is a dynamic one.

20. The CRISP data mining process: Phase 1 and 2. 1. Business understanding: We begin by understanding the requirements of the project from the business perspective. What does the company in question want to achieve or get out of this? What are the priorities? How will we measure the outcome? We conclude this phase by producing a preliminary plan to tackle the established objectives. 2. Data understanding: The data understanding phase has two broad aims. The first is to test the data (on which the analysis will be based) in order to identify any quality issues. The second is to discover any initial insights into the data that might provide additional meaningful information. Some basic data visualisation (scatter plots, bar charts, distribution analysis) is a great way to get to grips with the data, spot any immediate patterns, and test the general data sufficiency, which leads logically on to the next phase, Data Preparation.

21. The CRISP data mining process: Phase 3 and 4. 3. Data preparation: The data preparation phase does exactly as its name suggests: this is the phase when the initial (raw) data is modified to produce the final dataset upon which the analysis will take place. Data preparation covers all activities that turn the raw data into the final dataset, ready for the modelling phase, including merging separate datasets and further data pooling, table/record/attribute selection, missing-value imputation, data cleaning, removal of spurious data, and transformation. It is also advisable to consider how to partition the data into modelling and testing segments (typically a 70/30 split, depending on data volumes). Data preparation, in my experience, is the most time-consuming, but absolutely ESSENTIAL, phase of the entire CRISP process.
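Two of the preparation steps named on this slide, missing-value imputation and the 70/30 modelling/testing partition, can be sketched as follows. This is a minimal illustration on made-up records. The field names, the choice of median imputation and the helper names `impute_median` and `split_train_test` are assumptions for this example; the presentation does not prescribe a particular method.

```python
import random
import statistics


def impute_median(rows, field):
    """Replace missing (None) values of `field` with the median of the observed values."""
    observed = [row[field] for row in rows if row[field] is not None]
    median = statistics.median(observed)
    return [{**row, field: median if row[field] is None else row[field]} for row in rows]


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and partition them into modelling and testing segments."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical raw data: every tenth customer has a missing spend value.
raw = [{"customer_id": i, "spend": None if i % 10 == 0 else float(i)} for i in range(100)]

clean = impute_median(raw, "spend")
train, test = split_train_test(clean)
print(len(train), "modelling rows,", len(test), "testing rows")
```

Shuffling before the split avoids accidentally putting, say, all the oldest customers in the modelling segment. The fixed seed keeps the partition reproducible between runs.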

22. The CRISP data mining process: Phase 3 and 4. 4. Modelling: The modelling phase is the heart of the CRISP model. This is the point when we take the prepared dataset and apply (typically) several modelling techniques. We use several techniques because no single technique is perfect, and the range of results gathered should overcome the limitations of any one particular model. There is some interaction between phases 3 and 4: different techniques may require the data in different forms, so it may be necessary to prepare the data in multiple ways for the various models. We will cover some of the different modelling techniques later in the presentation.

23. The CRISP data mining process: Phase 5. 5. Evaluation: There are many different techniques and methods for evaluating the models created during the modelling phase. First and foremost you are looking to compare the model error rates or, inversely, the model accuracy rates. These are estimated by how well the models perform on the test data (data that was omitted during the model-building phase). There are a number of ways to measure this, but most methods simply amount to providing a score that allows you to choose the model with the lowest error rate. Lift charts provide a very effective way to visualise and compare model performance over the test set. This is also a good way to assess whether you may need to combine models to arrive at an overall better solution.
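The two evaluation ideas on this slide, comparing error rates on the test data and reading a lift chart, can be sketched as below. The test labels, the scores for the two models, the 0.5 classification threshold and the function names are all hypothetical values chosen for illustration.

```python
def error_rate(actual, predicted):
    """Fraction of test records where the predicted class differs from the actual class."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


def cumulative_lift(actual, scores, n_bins=5):
    """Cumulative lift per bin: response rate in the top-scored records
    divided by the overall response rate."""
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)]
    overall = sum(ranked) / len(ranked)
    lifts = []
    for b in range(1, n_bins + 1):
        top = ranked[: round(len(ranked) * b / n_bins)]
        lifts.append((sum(top) / len(top)) / overall)
    return lifts


# Hypothetical test set (1 = responded) and scores from two candidate models.
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
scores_b = [0.2, 0.9, 0.8, 0.4, 0.6, 0.1, 0.7, 0.3, 0.05, 0.5]

# Classify as a responder when the score is at least 0.5 (assumed threshold).
err_a = error_rate(actual, [1 if s >= 0.5 else 0 for s in scores_a])
err_b = error_rate(actual, [1 if s >= 0.5 else 0 for s in scores_b])
lift_a = cumulative_lift(actual, scores_a)
lift_b = cumulative_lift(actual, scores_b)

print("error rates:", err_a, err_b)
print("lift, model A:", [round(x, 2) for x in lift_a])
print("lift, model B:", [round(x, 2) for x in lift_b])
```

A lift above 1 in the first bins means that targeting the top-scored customers finds more responders than random selection would. The last bin always has a lift of 1, because it covers the whole test set.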

25. The CRISP data mining process: Phase 6. 6. Deployment: The deployment phase consolidates the results that the model produces into a form that is usable by the customer. Even when a data mining exercise is undertaken simply to increase knowledge of the data, any knowledge gained must be presented in a way that is of use to the customer. Depending on the nature of the data mining project, the deployment phase can range from simply generating a report all the way through to implementing a repeatable data mining process across the enterprise. It is not unusual for the customer (as opposed to the data analyst) to perform the deployment phase, and in either case it is important that the customer understands the actions that need to be carried out in order to make best use of the models created.

27. Artificial neural nets

28. K-nearest neighbour

29. Support vectors

30. Linear regression

31. Logistic regression

32. Discriminant analysis

34. Artificial neural nets

35. Conceptual clustering

39. Models need to be evaluated to see that the results produced are compatible with the project objectives.

41. Conclusion: “Data mining is the process of finding patterns in your data which you can use to do your business better.” Data mining is a subset of a much larger sphere known as Business Intelligence, which includes data parsing, visualisation, OLAP and data warehousing. Advanced analytics encompasses Data Mining but also includes non-customer-focussed activities that require mathematical and statistical approaches. CRISP is an established, proven Data Mining framework. The key emphasis in Data Mining must be on understanding; also, never underestimate the importance or amount of work involved in data mining. No model is ever perfect; each is only the starting point for future iterative improvements.

43. http://msdn.microsoft.com/en-s/library/ms175428(SQL.100).aspx

44. http://fbhalper.wordpress.com/2010/01/04/five-predictions-for-advanced-analytics-in-2010/

46. Applied Data Mining: Statistical Methods for Business and Industry (Paolo Giudici)

47. Data Mining Techniques: for Marketing, Sales and Customer Relationship Management (Berry and Linoff)

Tobie Muir (Managing Director) E. tobie@data-decisions.co.uk T. 0208 144 7422 / 07903 525358 W. data-decisions.co.uk
