Data Mining – analyse Bank Marketing Data Set by WEKA.
Author: Mateusz Brzoska
ID: M*********
Supervisor: Daming Shi
24/04/2015
Middlesex University 2015
A thesis submitted in partial fulfilment of the requirements for the degree of Bachelor of Science.
Table of Contents
1. Knowledge Discovery in Databases
2. Data Mining
2.1. Overview
2.2. Data Mining Methods
2.3. WEKA Methods
2.4. The problems of knowledge discovery
3. WEKA Software
4. Bank Marketing Data Set
4.1. Description of Data Set
4.2. Cleaning Data Set
4.3. Visualization of Data Set and Examining Data
4.4. Discovering potentially useful patterns from a data set
5. Conclusion
6. References
Abstract
Our lives, and the networks and computers we use, are filled with giant volumes of data. Scientific
institutions, businesses, and government agencies spend huge sums on collecting and storing data,
yet only a small fraction of these data will ever be used. Data structures are very often too complex
to analyse effectively or too big to manage. In the corporate and business world, customer data are
increasingly recognised as a strategic asset, and in today’s competitive world it is becoming more
and more important to be able to extract the useful knowledge hidden in these data and to act on it.
The growing use of computers means that systems generate ever larger amounts of data. Data mining,
one of the processes within Decision Support Systems (DSS), deals with discovering new, interesting
and useful patterns and the relationships between them, in order to solve problems involving large
volumes of data. Data are exploited to establish general knowledge about a group rather than
knowledge about specific individuals, although pattern analysis may also be used to recognise
anomalous individual behaviour such as criminal activity. Since data mining is a natural activity to
carry out on large data sets, one of its biggest target markets is the entire data-warehousing,
data-mart, and decision-support community, including professionals from industries such as
manufacturing, telecommunications, retail, health care, transportation and insurance. In the computer
industry, data mining is already among the fastest-growing fields. Its greatest strengths lie in its
wide range of techniques and methodologies, which can be applied to a host of problem sets. Data
mining has not only survived but matured and been adapted for practical use in the business world.
To study the data mining techniques and methodologies that can be applied to reach specific goals,
this project focuses on data mining as one of the processes of Decision Support Systems, concerned
with collecting data and discovering knowledge from them. It shows the techniques, algorithms and
rules used to achieve a specific goal on the Bank Marketing Data Set. The WEKA software is used to
show how to analyse the data, and the data mining techniques used in the project are explained.
Aims
1. To study techniques and methodologies in data mining that can be applied to a specific goal:
predicting whether a client will subscribe (yes/no) to a term deposit (variable y).
2. To analyse a data set of interest for clustering, classification, learning dependencies and
prediction, using algorithms such as k-means, soft k-means, and decision trees.
3. To process the data and achieve a satisfactory final result.
Objectives
1. To study Knowledge Discovery in Databases (KDD) as the process of discovering useful patterns and
knowledge from data sources.
2. To understand the need for analyses of large, complex, information-rich data sets.
3. To describe the decision-making process based on knowledge gained from data mining.
4. To provide essential information about patterns, techniques and methods.
5. To demonstrate relevant algorithms for these techniques and their operation on data.
6. To prepare results and conclusions.