A Simulated Decision Trees Algorithm (SDT)
Mohamed H. Farrag¹, Maha M. Hana², Mona M. Nasr²
¹ Theodor Bilharz Research Institute, Ministry of Scientific Research, Cairo, Egypt
eng.m.hassan@gmail.com
² Faculty of Computers and Information, Helwan University, Cairo, Egypt
mahana_eg@yahoo.com, m.nasr@helwan.edu.eg
Abstract — The volume of customer information stored in databases has increased dramatically in the last few years. Data mining is a good approach for dealing with this volume of information and enhancing the customer services process. One of the most important and powerful data mining techniques is the decision trees algorithm. It is appropriate for large and sophisticated business areas, but it is complicated, costly, and not easy to use by non-specialists in the field. To overcome this problem, SDT is proposed: a simple, powerful, and low-cost methodology that simulates the decision trees algorithm for different business scopes and natures. The SDT methodology consists of three phases. The first phase is data preparation, which prepares the data for the computations; the second phase is the SDT algorithm, which simulates the decision trees algorithm to find the most important rules that distinguish a specific type of customer; the third phase visualizes the results and rules for better understanding and clarification. In this paper the SDT methodology is tested on a dataset of 1000 instances of German Credit Data describing the customers of a German bank. SDT successfully selects the most important rules and paths that reach the selected ratio for the tested cluster of customers, with interesting remarks and findings.
Keywords: Customer Relationship Management, Data Mining, Decision Trees, Customer Services
I. INTRODUCTION
The Simulated Decision Trees (SDT) methodology is a simple and powerful form of multiple-variable analysis and classification that aims to classify customer data and produce the most important characteristics that distinguish a specific type of customer.
This paper presents the SDT methodology, which retains the main features of decision trees while being suitable for small and medium organizations and different business fields, and easy to use even by non-specialists, for distinguishing customer clusters in order to enhance the customer services process.
This paper is organized as follows: section two reviews the related work, section three discusses the SDT methodology, and section four explains the experiment. The results and the evaluation of the SDT methodology are presented in section five, and the conclusion and future work are given in the last section.
II. RELATED WORK
Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, potentially useful, and ultimately understandable patterns in data. The knowledge discovery process comprises six phases: data selection, data cleansing, enrichment, data transformation or encoding, applying data mining tools, and displaying the discovered information.
Customer Relationship Management (CRM) is defined as: "Communication with the customer, developing products and services that customers need, and selling or supporting them in manners that customers desire" [1]. CRM is an enterprise approach to understanding and influencing customer behavior through meaningful communications in order to improve customer acquisition, retention, loyalty, and profitability, and to build a profitable long-term relationship with customers in the fields of marketing, sales, customer services and support.
The CRM framework is divided into two classes. The first is the operational class, which is organized around business automation. The second is the analytical class, which is organized around customer characteristics and behaviors and helps an organization effectively allocate resources to the most profitable group of customers, the target customers.
The CRM framework has four dimensions. The first is customer identification, which is concerned with target customer analysis and customer segmentation; the second is customer attraction, which is concerned with direct marketing; the third is customer retention, which is concerned with one-to-one marketing and loyalty programs; and the fourth is customer development, which is concerned with customer lifetime value analysis, up/cross-selling, and market basket analysis.
Organizations need to discover the hidden knowledge in their stored data and use it to acquire and retain potential customers and to maximize customer return value. This can be achieved with data mining tools, which help the organization better discriminate among customers and effectively allocate resources to the most profitable group. Data mining is also an important tool for transforming customer data into meaningful patterns that help in predicting and distinguishing different customer clusters. Data mining is defined from different views and scopes. The first definition is "A branch of the applied informatics, which allows us to sift through large amounts of structured or unstructured data in attempt to find hidden patterns and/or rules". The second definition is "The data mining methods as tools for searching databases with special algorithms to identify general patterns which can be used in the classification of the individual observations and making predictions". Another definition is "Data mining is the search for valuable information in large volumes of data" [1].
Data mining has several roles in the business process. The first role is association, which aims to establish relationships between items that occur together in a given record. The second is classification, which is used to build a model that predicts future customer behavior by classifying customer data into a number of predefined classes based on certain criteria. The third is clustering, which segments a heterogeneous population into a number of more homogeneous clusters. The fourth is forecasting, which estimates future values based on a record's patterns and deals with continuously valued outcomes. The fifth is regression, a statistical estimation technique that maps each data object to a real value, providing a predicted value. The sixth is sequence discovery, which is concerned with identifying associations or patterns over time. The seventh is visualization, which can be defined as the presentation of data so that users can view complex patterns.
Data mining algorithms: the previous data mining roles can be realized through the following algorithms. An association rule is defined as "a way to find interesting associations among large sets of data items". It can also be defined as "if/then statements that help to uncover relationships between seemingly unrelated data in a relational database or other information warehouse". An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an item found in the data; a consequent is an item found in combination with the antecedent [5].
A decision tree's structure is a hierarchy of branches within branches that produces the characteristic inverted-tree form; the nested hierarchy of branches is called a decision tree. Decision trees are a simple yet powerful form of multiple-variable analysis. They provide unique capabilities to supplement, complement, and substitute for a variety of data mining tools and techniques [4].
Genetic Algorithms (GAs) are adaptive heuristic search algorithms premised on the evolutionary ideas of natural selection and genetics. Artificial Neural Networks (ANN) are an information processing paradigm; the key element of this model is the novel structure of the information processing system, which is composed of a large number of highly interconnected processing elements working in harmony to solve specific problems. K-Nearest Neighbor (K-NN) is a nonparametric method in which no parameters are estimated; the output variables can be interval variables, in which case the K-NN algorithm is used for prediction, or categorical variables (nominal or ordinal), in which case it is used for classification.
Linear discriminant analysis (LDA) and logistic regression (LR) are widely used multivariate statistical methods for analyzing data with categorical outcome variables. Both are appropriate for developing linear classification models. The difference between the two methods is that LR makes no assumptions on the distribution of the explanatory data, whereas LDA is designed to give better results when the normality assumptions are fulfilled.
Data mining methods fall into two groups, depending on the nature of their use and results. The first group, predictive methods, uses some variables to predict unknown or future values of other variables. The second group, descriptive methods, finds human-interpretable patterns that describe the data.
III. SIMULATED DECISION TREES METHODOLOGY
The proposed methodology is a simulation of the decision trees structure that deals with multiple-variable datasets to identify the most important customer characteristics distinguishing a specific type of customer, supported by a visualization aid for the results.
Figure (1): SDT Methodology Phases (Preparation Phase → Algorithm Phase → Visualize Results Phase)
A. Preparation Phase
The preparation phase consists of three steps, as shown in Figure (2).
Figure (2): Preparation Phase (Preprocessing Data → Enter Attributes and their Sets → Enter Experimental Values)
1. Preprocessing Data
This step is concerned with data preparation, which adapts the dataset values to be suitable for computer calculations. It enumerates the dataset attributes, revises the data for odd and missing values, and divides each attribute's values into two to four groups according to the range of the values and the density distribution of each group.
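A minimal sketch of this discretization step is shown below, assuming the dataset is held in a pandas DataFrame and that the group boundaries have already been chosen from the value range and density distribution; the function name and boundaries are illustrative, with the example cut points taken from the "Age in years" sets used later in the experiment (Table (4)).

```python
import pandas as pd

def discretize(df: pd.DataFrame, column: str, boundaries, labels) -> pd.DataFrame:
    """Replace a numerical column with two to four labelled qualitative sets."""
    # pd.cut assigns each value to the half-open interval it falls into,
    # producing one qualitative set code per row.
    df[column] = pd.cut(df[column], bins=boundaries, labels=labels, right=False)
    return df

# Illustrative use, mirroring Table (4): "Age in years" -> A316..A319.
# df = discretize(df, "Age in years",
#                 boundaries=[0, 20, 35, 50, 200],
#                 labels=["A316", "A317", "A318", "A319"])
```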
2. Enter Attributes and their Sets
This step is concerned with entering the attribute names of the dataset and their short names for use by the algorithm, and then entering the ranges of each set of each attribute.
3. Enter Experimental Values
This step has the following sub-steps:
- Step1: Select the attributes on which the algorithm will be applied.
- Step2: Select the main attribute, one of whose sets will serve as the start node.
- Step3: Select the start set node from the sets of the main attribute chosen in Step2.
- Step4: Enter the tested ratio "R", against which the algorithm compares and ranks the resulting paths and rules.
- Step5: Enter the studied case value "C" in numeric format; the algorithm performs its calculations on the instances of this studied case in the dataset.
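These five inputs can be grouped into a small parameter structure; the sketch below is purely illustrative and its field names are assumptions rather than part of the paper (the experiment in section IV fills them with concrete values).

```python
from dataclasses import dataclass

@dataclass
class SDTParameters:
    selected_attributes: list   # Step1: attributes the algorithm will combine
    main_attribute: str         # Step2: attribute supplying the start node
    start_set: str              # Step3: chosen set of the main attribute
    tested_ratio: float         # Step4: acceptance threshold "R"
    studied_case: int           # Step5: numeric code "C" of the studied cluster
```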
B. Simulated Decision Trees Algorithm Phase
The SDT algorithm is a proposed algorithm consisting of six procedures, as shown in Figure (3); a compact sketch combining them is given after the procedure descriptions below.
Figure (3): Algorithm Procedures
- Procedure1: Generate Sets Paths Seeds:
In this step the algorithm searches for the sets of each selected attribute and then generates the seeds of each potential path, starting with the selected node, using these attributes.
- Procedure2: Generate Potential Paths:
In this step the algorithm generates all potential paths and rules from the previous path seeds, then removes repetitive sets and eliminates redundant sets of the same attribute in each path, so that each attribute contributes at most one set to each path.
- Procedure3: Generate Query Statements:
In this step the algorithm generates the selection query statement of each path using the attribute names and their sets.
- Procedure4: Adding a Condition for the Selected Studied Case:
In this step the algorithm takes the previously generated query statements, adds the studied case condition to each of them, and runs them on the sample dataset to calculate the number of instances of each path.
- Procedure5: Generate Ratio:
In this step the algorithm generates the ratio of each path by dividing the number of instances in the dataset that belong to the path and to the studied case, denoted n, by the total number of instances in the dataset that belong to the path over all cases, denoted N.
- Procedure6: Select Valid Paths:
In this step the algorithm keeps only the paths whose ratio is greater than or equal to the tested ratio "R".
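The sketch below combines the six procedures, assuming the preprocessed dataset is a pandas DataFrame whose columns already hold the qualitative set codes and whose cluster label sits in a column named customer, and reusing the SDTParameters structure assumed earlier; the names and the DataFrame filtering are illustrative stand-ins for the SQL-style query statements that the paper itself generates.

```python
import itertools
import pandas as pd

def sdt(df: pd.DataFrame, params) -> list:
    """Enumerate potential paths from the start node and keep those with ratio >= R."""
    # Procedure 1: seed every path with the chosen start node.
    start = (params.main_attribute, params.start_set)

    # Procedure 2: build all potential paths; each selected attribute contributes
    # at most one of its sets to a path, which removes repeated/redundant sets.
    attr_sets = {a: sorted(df[a].dropna().unique()) for a in params.selected_attributes}
    paths = []
    for r in range(len(attr_sets) + 1):
        for attrs in itertools.combinations(attr_sets, r):
            for sets in itertools.product(*(attr_sets[a] for a in attrs)):
                paths.append([start] + list(zip(attrs, sets)))

    results = []
    for path in paths:
        # Procedure 3: the conjunction of path conditions plays the role of the
        # generated query statement.
        mask = pd.Series(True, index=df.index)
        for attribute, set_code in path:
            mask &= df[attribute] == set_code
        # Procedure 4: add the studied-case condition and count instances.
        n = int((mask & (df["customer"] == params.studied_case)).sum())
        # Procedure 5: ratio = instances of the path in the studied case divided
        # by all instances of the path.
        N = int(mask.sum())
        ratio = n / N if N else 0.0
        # Procedure 6: keep only valid paths.
        if ratio >= params.tested_ratio:
            results.append((path, ratio))
    return sorted(results, key=lambda item: item[1], reverse=True)
```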
C. Visualize Results Phase
This phase visualizes the results by displaying and drawing the selected paths among all paths whose ratio is greater than or equal to the tested ratio "R".
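A minimal sketch of how a valid path might be rendered for the user as a chain of nodes is shown below; it is purely illustrative, as the paper's own tool draws the path graphically (see Figure (4) in the experiment).

```python
def draw_path(path: list, ratio: float) -> None:
    """Render a valid path as a left-to-right chain of attribute = set nodes."""
    chain = " -> ".join(f"{attribute} = {set_code}" for attribute, set_code in path)
    print(f"{chain}   (ratio = {ratio:.2%})")

# Example output for the path of Figure (4) / Table (12), path 162:
#   Job = A173 -> Duration in month = A301 -> Credit amount = A305   (ratio = 79.21%)
```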
IV. EXPERIMENT
Data Set Description
The algorithm is tested on the German Credit Data dataset, whose source is Professor Dr. Hans Hofmann, Institute for Statistics, Hamburg University. The dataset contains 1000 instances divided into two clusters of customers, where clustering is based on measuring customer risk. The first cluster, the "Low-Risk" cluster, has 700 instances, while the second cluster, the "High-Risk" cluster, has 300 instances. The dataset has twenty attributes of two kinds: 7 numerical attributes and 13 qualitative attributes. These attributes differ in nature and scope; their names and kinds are shown in Table (1).
Table (1): Attribute Names and Status
#   Attribute                                                   Status
1   Status of existing checking account                         qualitative
2   Duration in month                                           numerical
3   Credit history                                              qualitative
4   Purpose                                                     qualitative
5   Credit amount                                               numerical
6   Savings account/bonds                                       qualitative
7   Present employment since                                    qualitative
8   Installment rate in percentage of disposable income         numerical
9   Personal status and sex                                     qualitative
10  Other debtors / guarantors                                  qualitative
11  Present residence since                                     numerical
12  Property                                                    qualitative
13  Age in years                                                numerical
14  Other installment plans                                     qualitative
15  Housing                                                     qualitative
16  Number of existing credits at this bank                     numerical
17  Job                                                         qualitative
18  Number of people being liable to provide maintenance for    numerical
19  Telephone                                                   qualitative
20  Foreign worker                                              qualitative
A. Data Preparation Phase
This phase detects and classifies the dataset attributes, assigns the observed values from the dataset to each attribute, and cleans the dataset of any missing values.
1. Preprocessing Data
This step consists of three sub-steps that ensure each attribute used is a qualitative attribute having two to four sets, defined according to its values and the density distribution of instances in each group. The numerical attributes that need to be converted to qualitative attributes are shown in Table (2).
Table (2): Numerical Attributes
Attribute                                                    Status
Duration in month                                            numerical
Credit amount                                                numerical
Installment rate in percentage of disposable income          numerical
Present residence since                                      numerical
Age in years                                                 numerical
Number of existing credits at this bank                      numerical
Number of people being liable to provide maintenance for     numerical
- Step1: Convert the numerical attributes to qualitative ones.
- Step2: Divide the new attributes into two to four groups according to the range of their values and the density distribution of each group.
- Step3: Replace the numeric values in the dataset with the new qualitative values.
Table (3): Example of Dataset Attributes and their Sets after Preprocessing Data and Converting Numerical Attributes
#   Attribute                    Set    Set value / range
2   Duration in month            A301   ... <= 12
                                 A302   13 ... <= 24
                                 A303   25 ... <= 36
                                 A304   >= 37
3   Credit history               A30    no credits taken / all credits paid back duly
                                 A31    all credits at this bank paid back duly
                                 A32    existing credits paid back duly till now
                                 A33    delay in paying off in the past
                                 A34    critical account / other credits existing (not at this bank)
5   Credit amount                A305   <= 5000
                                 A306   5001 ... < 12000
                                 A307   > 12000
10  Other debtors / guarantors   A101   none
                                 A102   co-applicant
                                 A103   guarantor
13  Age in years                 A316   ... < 20 years
                                 A317   20 <= ... < 35 years
                                 A318   35 <= ... < 50 years
                                 A319   >= 50 years
17  Job                          A171   unemployed / unskilled - non-resident
                                 A172   unskilled - resident
                                 A173   skilled employee / official
                                 A174   management / self-employed / highly qualified employee / officer
2. Enter Attributes and their Sets
This step first enters the dataset attributes, their short names, and their codes for use by the algorithm. For example, the attribute "Age in Years" has code "13" and short name "AIY". It then enters the names, short names, and ranges of the attribute sets, as shown in Table (4).
Table (4): Example of "Age in Years" Attribute Sets
Set    Range
A316   ... < 20 years
A317   20 <= ... < 35 years
A318   35 <= ... < 50 years
A319   >= 50 years
3. Enter Experimental Values
This step has five sub-steps that ensure all parameters and inputs needed by the algorithm are clearly defined.
- Step1
This step selects the attributes from the attribute list on which the algorithm will run, together with their sets. The attributes selected in this experiment are shown in Table (5).
Table (5): Selected Attributes to Test by the Algorithm
Selected Attributes
Credit Amount
Credit History
Duration In month
Housing
- Step2
Step2 selects the main attribute, one of whose sets will be used as the start node. In this experiment the selected attribute is "Job".
- Step3
Step3 selects the start set node from the "Job" attribute sets to be the start node of the algorithm. The selected set of the "Job" attribute is "A173", which refers to "skilled employee / official".
- Step4
Step4 enters the selected ratio "R" that every accepted potential path must reach or exceed, where a path's ratio is obtained by dividing the number of its instances that belong to the tested case by the total number of its instances. In this experiment the selected ratio is 65%.
- Step5
Step5 enters the tested case "C" in numeric format (1, 2, or 3) to indicate the selected cluster on whose instances the calculations are performed. In this experiment the tested case is "1", which corresponds to "Low-Risk Customers".
B. Simulated Decision Trees Algorithm Procedures (SDT)
- Procedure1: Generate Sets Paths Seeds:
This procedure gets the sets of each selected attribute, as shown in Table (6), and sets the start node, as shown in Table (7).
Table (6): Selected Attributes Sets
Attribute            Set     Set range
Credit amount        A305    <= 5000
                     A306    5001 ... < 12000
                     A307    >= 12000
Credit history       A30     no credits taken / all credits paid back duly
                     A31     all credits at this bank paid back duly
                     A32     existing credits paid back duly till now
                     A33     delay in paying off in the past
Duration in month    A301    ... <= 12
                     A302    13 ... <= 24
                     A303    25 ... <= 36
                     A304    >= 37
Housing              A151    rent
                     A152    own
                     A153    for free
Table (7): Start set node.
Attribute Set Set range
Job A173 skilled employee / official
- Procedure2: Generate Potential Paths:
This procedure generates all potential paths from the previous path seeds, starting with the selected node and using the selected attributes; it then removes repetitive sets and eliminates redundant sets of the same attribute in each path, so that each attribute contributes at most one set to each potential path, as shown in Table (8). On completing this procedure the algorithm yields 272 potential paths.
Table (8): Example of Generated Potential Paths and Rules
Path ID   Path (attribute = set)
1         Job = A173
2         Job = A173, Credit amount = A305
3         Job = A173, Credit amount = A306
4         Job = A173, Credit amount = A307
5         Job = A173, Credit history = A30
- Procedure3: Generate Query Statements:
This procedure generates the selection query statement of each path using the attribute names and their sets; the statements are run on the dataset to calculate the number of instances belonging to each potential path, as shown in Table (9).
Table (9): Example of the Selection Query Statement for Each Potential Path
Id Path
1 job='A173'
2 job='A173' and Credit Amount='A305'
3 job='A173' and Credit Amount='A306'
4 job='A173' and Credit Amount='A307'
5 job='A173' and Credit History='A30'
- Procedure4: Adding a Condition for the Selected Studied Case:
This procedure takes the previously generated query statements, adds the tested case value to each of them, and then calculates the number of instances of each path, as shown in Table (10); a sketch of this counting step against a relational store is given after Procedure6.
Table (10): Example of the Selection Query Statement for Each Potential Path with the Tested Case Condition
Id   Path selection query statement for the tested case
1    Count (*) from samples where job='A173' and customer=1
2    Count (*) from samples where job='A173' and Credit Amount='A305' and customer=1
3    Count (*) from samples where job='A173' and Credit Amount='A306' and customer=1
4    Count (*) from samples where job='A173' and Credit Amount='A307' and customer=1
5    Count (*) from samples where job='A173' and Credit History='A30' and customer=1
- Procedure5: Generate Ratio:
This procedure generates the ratio of each path by dividing the number of instances in the dataset that belong to the path and to the tested case, denoted n, by the total number of instances in the dataset that belong to the path over all cases, denoted N, as shown in Table (11).
Table (11): Example of the Path Query Statements for the Tested Case and their Ratios
Id     Path query statement for the tested case        Ratio
15 job='A173' and housing='A152' 0.75663717
241 job='A173' and housing='A152' 0.75663717
2 job='A173' and ca='A305' 0.72952381
18 job='A173' and ca='A305' 0.72952381
33 job='A173' and ca='A305' 0.72952381
- Procedure6: Select Valid Paths:
This procedure keeps only the potential paths whose ratio is greater than or equal to the selected ratio "R" = 65%, as shown in Table (12).
Table (12): Example of the Accepted Paths and Rules with Ratio ≥ the Selected Ratio "R" = 65%
Path id Valid paths Ratio>=65%
146 job='A173' and ch='A34' and ca='A305' 0.8516129
9 job='A173' and ch='A34' 0.83783784
145 job='A173' and ch='A34' 0.83783784
162 job='A173' and dm='A301' and ca='A305' 0.79207921
10 job='A173' and dm='A301' 0.7902439
The algorithm selects only 43 of the 272 potential paths as valid paths, i.e. paths whose ratio is greater than or equal to the selected ratio "R" = 65%.
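The counting and ratio steps of Procedures 4 to 6 can be reproduced directly against a relational store. The sketch below uses sqlite3 and assumes the preprocessed data sits in a table named samples with a customer column holding the cluster code, following the naming of Table (10) and the abbreviated column names of Table (11); the full SELECT syntax is spelled out here, whereas the tables abbreviate it, and the helper name is an illustrative assumption.

```python
import sqlite3

def count_path_instances(db_path: str, conditions, studied_case=None) -> int:
    """Count dataset instances matching a path, optionally restricted to one cluster."""
    where = " and ".join(f"{column}=?" for column, _ in conditions)
    values = [set_code for _, set_code in conditions]
    if studied_case is not None:
        where += " and customer=?"
        values.append(studied_case)
    with sqlite3.connect(db_path) as connection:
        (count,) = connection.execute(
            f"select count(*) from samples where {where}", values).fetchone()
    return count

# Ratio of path 2 from Table (9): n / N, with n restricted to the tested case.
# n = count_path_instances("credit.db", [("job", "A173"), ("ca", "A305")], studied_case=1)
# N = count_path_instances("credit.db", [("job", "A173"), ("ca", "A305")])
# ratio = n / N   # Table (11) reports 0.72952381 for this path
```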
C. Visualize Results Phase
This phase displays and draws the selected paths among all paths whose ratio is greater than or equal to the selected ratio "R". For example, path number 162 is drawn as shown in Figure (4).
Figure (4): Example of Drawing Path Number 162 (Job = A173 → Duration in month = A301 → Credit amount = A305)
V. RESULTS AND EVALUATION OF SDT
The Simulated Decision Trees (SDT) methodology is a simple and powerful form of multiple-variable analysis and classification that classifies customer data to produce the most important characteristics distinguishing a specific type of customer. The SDT methodology has two main characteristics: first, it has no special requirements, and second, its implementation is flexible enough to adapt to completely different data domains and dataset structures.
Our initial findings are encouraging: the field test conducted on our dataset showed that the algorithm is accurate and that the resulting target-oriented campaign is effective. SDT helps turn raw data into meaningful patterns that increase knowledge and awareness of the business, and it enables users to deploy that knowledge as a simple, powerful set of human-readable rules. Many benefits and advantages can be gained by applying this work in small and medium organizations across different business areas such as banks, call centers, markets, and training centers.
Further practical features include the ability to produce results on screen or as printed reports, the ability to make a backup at the end of each procedure, the ability to reload the last experimental values and change the selected ratio "R" or the tested case "C", suitability for the needs and possibilities of decision makers, ease of use, ease of modification, and low cost.
VI. CONCLUSION AND FUTURE WORK
The SDT methodology is a newly proposed methodology for producing the most important characteristics and rules, from different customer attributes, that distinguish a specific cluster of customers. It has several advantages. The first is simplicity: the user can select the needed attributes without restriction, find the relationships between them, and change the selected ratio according to business requirements. The second is efficiency, for two reasons: complex alternatives are expressed clearly, and new information can be explored and modified from the output. The third is a visualization aid: representing and visualizing the decision alternatives and possible outcomes helps in comprehending sequential decisions and outcome dependencies. The fourth is harmony, which comes from merging SDT with other project management tools to enhance the customer relationship management process.
It is suggested that the SDT methodology be tested on different data domains and that its accuracy and capacity be studied across different scopes and natures.
REFERENCES
[1] Gramatikov, M. (2003), Data Mining Techniques and the Decision Making Process in the Bulgarian Public Administration, NISP Acee Conference, Bucharest, Romania.
[2] Srivastava, J. (2000, January), Data Mining for Customer Relationship Management, ACM SIGKDD Explorations Newsletter, Volume 1, No. 2.
[3] Agrawal, D. (2007, November), Building Profitable Customer Relationships with Data Mining, CSI Research Journal of India.
[4] Jia-Lang Seng & Chen, T. C. (2010, December), An Analytic Approach to Select Data Mining for Business Decision, Expert Systems with Applications, Volume 37, Pages 8042–8057.
[5] Giha, E. F., Singh, P. Y., & Ewe, T. H. (2006, June), Mining Generalized Customer Profiles, AIML 06 International Conference, Sharm El Sheikh, Egypt, Volume 6, Pages 141–147.
[6] Hsieh, N. & Chu, K. (2009), Enhancing Consumer Behavior Analysis by Data Mining Techniques, International Journal of Information and Management Sciences, Volume 20, No. 1, Pages 39–53.
[7] İkizler, N., & Güvenir, H. (2001), Mining Interesting Rules in Bank Loans Data, Proceedings of the Tenth Turkish Symposium on Artificial Intelligence and Neural Networks, Pages 238–246.
[8] Bartok, J., Habala, O., Bednar, P., Gazak, M. & Hluchý, L. (2010), Data Mining and Integration for Predicting Significant Meteorological Phenomena, ICCS 2010, Procedia Computer Science, Volume 1, No. 1, Pages 37–46.
[9] Çiflikli, C., & Kahya-Özyirmidokuz, E. (2010, December), Implementing a Data Mining Solution for Enhancing Carpet Manufacturing Productivity, Knowledge-Based Systems, Volume 23, No. 8, Pages 783–788.
[10] Pauray, S. M. (2010), Mining Top-k Frequent Closed Itemsets over Data Streams Using the Sliding Window Model, Expert Systems with Applications, Volume 37, Pages 6968–6973.
[11] Kamrunnahar, M., & Urquidi-Macdonald, M. (2010, March), Prediction of Corrosion Behavior Using Neural Network as a Data Mining Tool, Corrosion Science, Volume 52, No. 3, Pages 669–677.
[12] Thanuja, V., Venkateswarlu, B., & Anjaneyulu, G. S. G. N. (2011, June), Applications of Data Mining in Customer Relationship Management, Journal of Computer and Mathematical Sciences, Volume 2, No. 3, Pages 399–580.
Contenu connexe

Tendances

An effective pre processing algorithm for information retrieval systems
An effective pre processing algorithm for information retrieval systemsAn effective pre processing algorithm for information retrieval systems
An effective pre processing algorithm for information retrieval systemsijdms
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceIJDKP
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentIJDKP
 
LABELING CUSTOMERS USING DISCOVERED KNOWLEDGE CASE STUDY: AUTOMOBILE INSURAN...
LABELING CUSTOMERS USING DISCOVERED KNOWLEDGE  CASE STUDY: AUTOMOBILE INSURAN...LABELING CUSTOMERS USING DISCOVERED KNOWLEDGE  CASE STUDY: AUTOMOBILE INSURAN...
LABELING CUSTOMERS USING DISCOVERED KNOWLEDGE CASE STUDY: AUTOMOBILE INSURAN...ijmvsc
 
Proficiency comparison ofladtree
Proficiency comparison ofladtreeProficiency comparison ofladtree
Proficiency comparison ofladtreeijcsa
 
THE EFFECTIVENESS OF DATA MINING TECHNIQUES IN BANKING
THE EFFECTIVENESS OF DATA MINING TECHNIQUES IN BANKINGTHE EFFECTIVENESS OF DATA MINING TECHNIQUES IN BANKING
THE EFFECTIVENESS OF DATA MINING TECHNIQUES IN BANKINGcsijjournal
 
Enhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaEnhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaIJDKP
 
Incentive compatible privacy preserving data
Incentive compatible privacy preserving dataIncentive compatible privacy preserving data
Incentive compatible privacy preserving dataIEEEFINALYEARPROJECTS
 
Data imputation for unstructured dataset
Data imputation for unstructured datasetData imputation for unstructured dataset
Data imputation for unstructured datasetVibhore Agarwal
 
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...IJDKP
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data miningDevakumar Jain
 
Performance Analysis of Selected Classifiers in User Profiling
Performance Analysis of Selected Classifiers in User ProfilingPerformance Analysis of Selected Classifiers in User Profiling
Performance Analysis of Selected Classifiers in User Profilingijdmtaiir
 
The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in ...
The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in ...The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in ...
The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in ...IJMIT JOURNAL
 
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASECONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASEIJwest
 
Literature review of attribute level and
Literature review of attribute level andLiterature review of attribute level and
Literature review of attribute level andIJDKP
 
km ppt neew one
km ppt neew onekm ppt neew one
km ppt neew oneSahil Jain
 

Tendances (20)

An effective pre processing algorithm for information retrieval systems
An effective pre processing algorithm for information retrieval systemsAn effective pre processing algorithm for information retrieval systems
An effective pre processing algorithm for information retrieval systems
 
Recommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduceRecommendation system using bloom filter in mapreduce
Recommendation system using bloom filter in mapreduce
 
A statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environmentA statistical data fusion technique in virtual data integration environment
A statistical data fusion technique in virtual data integration environment
 
LABELING CUSTOMERS USING DISCOVERED KNOWLEDGE CASE STUDY: AUTOMOBILE INSURAN...
LABELING CUSTOMERS USING DISCOVERED KNOWLEDGE  CASE STUDY: AUTOMOBILE INSURAN...LABELING CUSTOMERS USING DISCOVERED KNOWLEDGE  CASE STUDY: AUTOMOBILE INSURAN...
LABELING CUSTOMERS USING DISCOVERED KNOWLEDGE CASE STUDY: AUTOMOBILE INSURAN...
 
Proficiency comparison ofladtree
Proficiency comparison ofladtreeProficiency comparison ofladtree
Proficiency comparison ofladtree
 
Z36149154
Z36149154Z36149154
Z36149154
 
THE EFFECTIVENESS OF DATA MINING TECHNIQUES IN BANKING
THE EFFECTIVENESS OF DATA MINING TECHNIQUES IN BANKINGTHE EFFECTIVENESS OF DATA MINING TECHNIQUES IN BANKING
THE EFFECTIVENESS OF DATA MINING TECHNIQUES IN BANKING
 
Av24317320
Av24317320Av24317320
Av24317320
 
Ch35
Ch35Ch35
Ch35
 
Enhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaEnhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging area
 
Incentive compatible privacy preserving data
Incentive compatible privacy preserving dataIncentive compatible privacy preserving data
Incentive compatible privacy preserving data
 
Data imputation for unstructured dataset
Data imputation for unstructured datasetData imputation for unstructured dataset
Data imputation for unstructured dataset
 
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
INTEGRATED ASSOCIATIVE CLASSIFICATION AND NEURAL NETWORK MODEL ENHANCED BY US...
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
Performance Analysis of Selected Classifiers in User Profiling
Performance Analysis of Selected Classifiers in User ProfilingPerformance Analysis of Selected Classifiers in User Profiling
Performance Analysis of Selected Classifiers in User Profiling
 
The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in ...
The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in ...The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in ...
The Use of Genetic Algorithm, Clustering and Feature Selection Techniques in ...
 
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASECONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
CONFIGURING ASSOCIATIONS TO INCREASE TRUST IN PRODUCT PURCHASE
 
Literature review of attribute level and
Literature review of attribute level andLiterature review of attribute level and
Literature review of attribute level and
 
km ppt neew one
km ppt neew onekm ppt neew one
km ppt neew one
 
Clustering
ClusteringClustering
Clustering
 

Similaire à A simulated decision trees algorithm (sdt)

data mining and data warehousing
data mining and data warehousingdata mining and data warehousing
data mining and data warehousingSunny Gandhi
 
DISCOVERY OF ACTIONABLE PATTERNS THROUGH SCALABLE VERTICAL DATA SPLIT METHOD ...
DISCOVERY OF ACTIONABLE PATTERNS THROUGH SCALABLE VERTICAL DATA SPLIT METHOD ...DISCOVERY OF ACTIONABLE PATTERNS THROUGH SCALABLE VERTICAL DATA SPLIT METHOD ...
DISCOVERY OF ACTIONABLE PATTERNS THROUGH SCALABLE VERTICAL DATA SPLIT METHOD ...IJDKP
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)Kartik Kalpande Patil
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Reviewijdpsjournal
 
Study of Data Mining Methods and its Applications
Study of  Data Mining Methods and its ApplicationsStudy of  Data Mining Methods and its Applications
Study of Data Mining Methods and its ApplicationsIRJET Journal
 
AHP Based Data Mining for Customer Segmentation Based on Customer Lifetime Value
AHP Based Data Mining for Customer Segmentation Based on Customer Lifetime ValueAHP Based Data Mining for Customer Segmentation Based on Customer Lifetime Value
AHP Based Data Mining for Customer Segmentation Based on Customer Lifetime ValueIIRindia
 
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...Editor IJMTER
 
Scalable Action Mining Hybrid Method for Enhanced User Emotions in Education ...
Scalable Action Mining Hybrid Method for Enhanced User Emotions in Education ...Scalable Action Mining Hybrid Method for Enhanced User Emotions in Education ...
Scalable Action Mining Hybrid Method for Enhanced User Emotions in Education ...IJCI JOURNAL
 
Using Data Mining Techniques in Customer Segmentation
Using Data Mining Techniques in Customer SegmentationUsing Data Mining Techniques in Customer Segmentation
Using Data Mining Techniques in Customer SegmentationIJERA Editor
 
Paper id 29201413
Paper id 29201413Paper id 29201413
Paper id 29201413IJRAT
 
What Is Data Mining How It Works, Benefits, Techniques.pdf
What Is Data Mining How It Works, Benefits, Techniques.pdfWhat Is Data Mining How It Works, Benefits, Techniques.pdf
What Is Data Mining How It Works, Benefits, Techniques.pdfAgile dock
 
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVEDATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVEIJDKP
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESIJCSES Journal
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerIJERA Editor
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfDr. Radhey Shyam
 

Similaire à A simulated decision trees algorithm (sdt) (20)

data mining and data warehousing
data mining and data warehousingdata mining and data warehousing
data mining and data warehousing
 
DISCOVERY OF ACTIONABLE PATTERNS THROUGH SCALABLE VERTICAL DATA SPLIT METHOD ...
DISCOVERY OF ACTIONABLE PATTERNS THROUGH SCALABLE VERTICAL DATA SPLIT METHOD ...DISCOVERY OF ACTIONABLE PATTERNS THROUGH SCALABLE VERTICAL DATA SPLIT METHOD ...
DISCOVERY OF ACTIONABLE PATTERNS THROUGH SCALABLE VERTICAL DATA SPLIT METHOD ...
 
Data Mining
Data MiningData Mining
Data Mining
 
Ez36937941
Ez36937941Ez36937941
Ez36937941
 
knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)knowledge discovery and data mining approach in databases (2)
knowledge discovery and data mining approach in databases (2)
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
 
Study of Data Mining Methods and its Applications
Study of  Data Mining Methods and its ApplicationsStudy of  Data Mining Methods and its Applications
Study of Data Mining Methods and its Applications
 
AHP Based Data Mining for Customer Segmentation Based on Customer Lifetime Value
AHP Based Data Mining for Customer Segmentation Based on Customer Lifetime ValueAHP Based Data Mining for Customer Segmentation Based on Customer Lifetime Value
AHP Based Data Mining for Customer Segmentation Based on Customer Lifetime Value
 
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R...
 
Scalable Action Mining Hybrid Method for Enhanced User Emotions in Education ...
Scalable Action Mining Hybrid Method for Enhanced User Emotions in Education ...Scalable Action Mining Hybrid Method for Enhanced User Emotions in Education ...
Scalable Action Mining Hybrid Method for Enhanced User Emotions in Education ...
 
Using Data Mining Techniques in Customer Segmentation
Using Data Mining Techniques in Customer SegmentationUsing Data Mining Techniques in Customer Segmentation
Using Data Mining Techniques in Customer Segmentation
 
Paper id 29201413
Paper id 29201413Paper id 29201413
Paper id 29201413
 
What Is Data Mining How It Works, Benefits, Techniques.pdf
What Is Data Mining How It Works, Benefits, Techniques.pdfWhat Is Data Mining How It Works, Benefits, Techniques.pdf
What Is Data Mining How It Works, Benefits, Techniques.pdf
 
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVEDATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
DATA MINING IN EDUCATION : A REVIEW ON THE KNOWLEDGE DISCOVERY PERSPECTIVE
 
Seminar Presentation
Seminar PresentationSeminar Presentation
Seminar Presentation
 
Hu3414421448
Hu3414421448Hu3414421448
Hu3414421448
 
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIESA SURVEY ON DATA MINING IN STEEL INDUSTRIES
A SURVEY ON DATA MINING IN STEEL INDUSTRIES
 
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using RapidminerStudy and Analysis of K-Means Clustering Algorithm Using Rapidminer
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer
 
KIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdfKIT-601 Lecture Notes-UNIT-1.pdf
KIT-601 Lecture Notes-UNIT-1.pdf
 
Data Mining
Data MiningData Mining
Data Mining
 

Dernier

Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 

Dernier (20)

Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 

A simulated decision trees algorithm (sdt)

  • 1. 411 A Simulated Decision Trees Algorithm (SDT) Mohamed H. Farrag1 , Maha M. Hana2 , Mona M. Nasr2 1 Theodor Bilharz Research Institute, Ministry of Scientific Research, Cairo, Egypt eng.m.hassan@gmail.com 2 Faculty of Computers and Information, Helwan University, Cairo, Egypt mahana_eg@yahoo.com, m.nasr@helwan.edu.eg Abstract — The customer's information contained in databases has increased dramatically in the last few years. Data mining is a good approach to deal with this volume of information to enhance the process of customer services. One of the most important and powerful techniques of data mining is decision trees algorithm. It appropriate for large and sophisticated business area but it's complicated, high cost and not easy to use by not specialists in the field. To overcome this problem SDT is proposed which is a simple, powerful and low-cost proposed methodology to simulate the decision trees algorithm for different business scopes and nature. SDT methodology consists of three phases. The first phase is the data preparation which prepare data for computing calculations, the second phase is SDT algorithm which represents a simulation of decision trees algorithm to find the most important rules that distinguish specific type of customers, the third phase is to visualize results and rules for better understanding and clarifying the results. In this paper SDT methodology is tested by a dataset consists of 1000 instants for German Credit Data belongs to one of German bank customers. SDT selects the most important rules and paths that reaches the selected ratio and tested cluster of customers successfully with interesting remarks and finding. Keywords: Customer relationship Management, Data Mining, Decision Trees, Customer services I. INTRODUCTION Simulated Decision trees methodology SDT is a simple and powerful form of multiple variable analyses and classification methodology that aims to classifying customer data to produce the most important characteristics that distinguish specific type of customers. This Paper is a trial to presents SDT methodology that contains all decision trees features but at the same time is suitable for small and medium organization scopes, different business fields, ease of use even by not specialists in the field for distinguishing customers' clusters to enhance customer services process. This Paper is organized as follows: section two demonstrates the related work, section three discusses SDT methodology and section four explains the experiment. While the results and the evaluation of SDT methodology are indicated in fifth section, finally conclusion and future work are in the last section. II. RELATED WORK Knowledge Discovery in Database (KDD) is the non-trivial process of identifying valid, potentially useful, and ultimately understandable patterns in data. The knowledge discovery process comprises six phases as Data selection, Data cleansing, Enrichment, Data transformation or encoding, Appling data mining tools and Displaying of the discovered information. Customer Relationship Management (CRM) is defined as: "Communication with the customer, developing products and services that customers need, and selling or supporting them in manners that customer’s desire"[1]. 
CRM is an Enterprise approach to understanding and influencing customer behavior through meaningful communications in order to improve customer acquisition, customer retention, customer loyalty, and customer profitability to build a profitable long term relationship with the customers in the field of marketing, sales, customer services and support. The CRM framework classified in two classifications the first one is operational classification which is according to the business automation. While the second classification is the analytical classification which is according to the customer characteristics and behaviors to help an organization to effectively allocate resources to the most profitable group of customers or the target customers. The CRM framework has four dimensions the first one is customer identification which concern with target customer analysis and customer segmentation, second is customer attraction which concern with direct marketing, the third is customer retention which concern with one to one marketing
  • 2. 412 and loyalty programs while customer development is the fourth one concern with customer lifetime value analysis, up cross selling and market basket analysis. Organization need to discover the hidden knowledge in the stored data to use it to acquire and retain potential customers and maximize customer return value that could be by using data mining tools which could help the organization to better discriminate and effectively allocate resources to the most profitable group of customers. Also, Data Mining is important tool to transform customer's data into meaning patterns to help in predicting and distinguishing different customer clusters. Data Mining is defined from different views and scopes. The first definition is "A branch of the applied informatics, which allows us to sift through large amounts of structured or unstructured data in attempt to find hidden patterns and/or rules". The second definition is "The data mining methods as tools for searching databases with special algorithms to identify general patterns which can be used in the classification of the individual observations and making predictions". Another definition is "Data mining is the search for valuable information in large volumes of data” [1]. Data mining has several roles in business process. First role is association which aims to establishing relationships between items which exist together in a given record. Second role is classification which use for building a model to predict future customer behaviors through classifying customer data into a number of predefined classes based on certain criteria. Third role is clustering which is to segment a heterogeneous population into a number of more homogenous clusters. Fourth role is forecasting which estimates the future value based on a record’s patterns. It deals with continuously valued outcomes. Fifth role is regression which is a kind of statistical estimation technique used to map each data object to a real value provide prediction value. Sixth role is sequence discovery which is concerned with identification of associations or patterns over time. Seventh role is visualization which can be defined as the presentation of data so that users can view complex patterns. Data Mining Algorithms the previous data mining model could be used through the next algorithms: Association Rule is defined as "a way to find interesting associations among large sets of data items". Also can be defined as "if /then statements that help to uncover relationships between seemingly unrelated data in a relational database or other information warehouse". An association rule has two parts, an antecedent (if) and a consequent (then). An antecedent is an item found in the data. A consequent is an item that is found in combination with the antecedent [5]. Decision Trees structure is a hierarchy of branches within branches that produces the characteristic inverted decision trees structure form. The nested hierarchy of branches is called a Decision Tree. Decision Trees structure is simple, although powerful form of multiple variable analyses. They provide unique capabilities to supplement, to complement, and to substitute a variety of data mining tools and techniques [4]. Genetic Algorithms (GAs) are adaptive heuristic search algorithm premised on the evolutionary ideas of natural selection and genetic. Artificial Neural Networks (ANN) is an information processing pattern. The key element of this model is the novel structure of the information processing system. 
It is composed of a large number of highly interconnected processing elements working in harmony to solve specific problems. K-Nearest Neighbor (K-NN) is a nonparametric method in the sense that no parameters are estimated; when the output variables are interval variables the K-NN algorithm is used for prediction, while when the output variables are categorical, either nominal or ordinal, it is used for classification. Linear discriminant analysis (LDA) and logistic regression (LR) are widely used multivariate statistical methods for the analysis of data with categorical outcome variables, and both are appropriate for the development of linear classification models. The difference between the two methods is that LR makes no assumptions about the distribution of the explanatory data, whereas LDA was developed to give better results when the normality assumptions are fulfilled.
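For illustration, a conventional decision tree of the kind described above can be trained with an off-the-shelf library. The following minimal sketch assumes the scikit-learn and pandas libraries and uses illustrative column names and toy rows standing in for a credit dataset; it is not part of the SDT methodology itself.

```python
# Illustrative sketch: training a conventional decision tree classifier on
# credit-like data. Column names and values are assumptions for illustration.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data standing in for a credit dataset; a real run would load the full file.
data = pd.DataFrame({
    "age":           [23, 45, 31, 52, 28, 40, 36, 60],
    "credit_amount": [1200, 8000, 3000, 15000, 2500, 7000, 4000, 9000],
    "duration":      [12, 36, 24, 48, 12, 24, 36, 24],
    "risk":          [1, 0, 1, 0, 1, 0, 1, 0],   # 1 = low risk, 0 = high risk
})

X = data[["age", "credit_amount", "duration"]]
y = data["risk"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

print("Accuracy:", tree.score(X_test, y_test))
print(export_text(tree, feature_names=list(X.columns)))  # human-readable rules
```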
Data mining has two kinds of methods, depending on the nature of use and the nature of the results. The first is predictive methods, which use some variables to predict unknown or future values of other variables. The second is descriptive methods, which find human-interpretable patterns that describe the data.
III. SIMULATED DECISION TREES METHODOLOGY
The proposed methodology is a simulation of the decision trees structure that deals with multiple-variable datasets to identify the most important customer characteristics distinguishing a specific type of customer, together with a visualization aid to present the results. SDT consists of three phases: a preparation phase, an algorithm phase, and a visualize results phase.
Figure (1): SDT Methodology Phases
A. Preparation Phase:
Figure (2): Preparation Phase
The preparation phase consists of three steps, as shown in Figure (2).
1. Preprocessing Data
This step concerns data preparation and aims to adapt the dataset values so that they are suitable for computer calculations. It enumerates the dataset attributes, revises the data for odd and missing values, and divides each attribute into 2 to 4 groups according to the range of its values and the density distribution of each group.
2. Enter Attributes and Its Sets
This step concerns entering the attribute names of the dataset and their short names to be used by the algorithm, and then entering the ranges of each set in each attribute.
3. Enter Experimental Values
This step has the following sub-steps:
- Step 1: Select the attributes on which the algorithm will be applied.
- Step 2: Select the main attribute from which one set will be chosen as a start node.
- Step 3: Select the start set node from the sets of the main attribute chosen in Step 2.
- Step 4: Enter the tested ratio "R", against which the algorithm compares and ranks the resulting paths and rules.
- Step 5: Enter the studied case value in numeric format, represented as "C"; the algorithm performs its calculations using the instances of this studied case in the dataset.
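For concreteness, the preparation-phase inputs can be held in simple in-memory structures. The sketch below assumes Python dictionaries and illustrative variable names; the layout is one possible representation, not a prescribed format of the SDT implementation.

```python
# Illustrative sketch of the preparation-phase inputs (names and layout are
# assumptions for illustration, not a prescribed data format).

# Selected attributes and their sets: short name -> {set code: range description}
attribute_sets = {
    "ca": {"A305": "<= 5000", "A306": "5001-12000", "A307": "> 12000"},         # Credit amount
    "ch": {"A30": "no credits/all paid", "A31": "all paid at this bank",
           "A32": "existing paid duly", "A33": "delay in the past"},             # Credit history
    "dm": {"A301": "<= 12", "A302": "13-24", "A303": "25-36", "A304": ">= 37"},  # Duration in month
    "housing": {"A151": "rent", "A152": "own", "A153": "for free"},
}

# Main attribute and the start set node chosen from it
start_attribute = "job"
start_set = "A173"          # skilled employee / official

# Experimental values
R = 0.65                    # tested ratio: accepted paths must reach at least this ratio
C = 1                       # studied case: cluster 1 = low-risk customers
```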
B. A Simulated Decision Trees Algorithm Phase:
The SDT algorithm is a proposed algorithm consisting of six procedures, as shown in Figure (3).
Figure (3): Algorithm Procedures
- Procedure 1: Generate Sets Paths Seeds: In this step the algorithm searches for the sets of each selected attribute and then generates the seeds of each candidate path, starting from the selected node and using these attributes.
- Procedure 2: Generate Potential Paths: In this step the algorithm generates all possible paths and rules from the previous path seeds, then removes repetitive sets and eliminates redundant sets of the same attribute in each path, so that each attribute contributes only one set per path.
- Procedure 3: Generate Query Statements: In this step the algorithm generates the selection query statement for each path using the attribute names and their sets.
- Procedure 4: Adding Condition for Selected Studied Case: In this step the algorithm takes the previously generated query statements, adds the condition for the studied case value on the sample dataset, and then calculates the number of instances for each path.
- Procedure 5: Generate Ratio: In this step the algorithm generates the ratio for each path by dividing the number of instances matching the path that belong to the studied case, denoted n, by the total number of instances in the sample dataset matching the path, denoted N.
- Procedure 6: Select Valid Paths: In this step the algorithm keeps only the paths whose ratio is greater than or equal to the tested ratio "R".
C. Visualize Results Phase
This phase aims to visualize the results by displaying and drawing the selected path from all paths whose ratio is greater than or equal to the tested ratio "R".
IV. EXPERIMENT
Data Set Description
The algorithm uses the German Credit Data dataset; the source is Professor Dr. Hans Hofmann, Institute for Statistics, Hamburg University. The dataset contains 1000 instances divided into two clusters of customers, where clustering is based on measuring the customers' risk. The first cluster, the "Low-Risk cluster", has 700 instances, while the second, the "High-Risk cluster", has 300 instances. The dataset has twenty attributes of two kinds: 7 numerical attributes and 13 qualitative attributes, differing in nature and scope. The attribute names and their kinds are shown in Table (1).
Table (1): Attributes Names and Status
#   Attribute                                                   Status
1   Status of existing checking account                         qualitative
2   Duration in month                                           numerical
3   Credit history                                              qualitative
4   Purpose                                                     qualitative
5   Credit amount                                               numerical
6   Savings account/bonds                                       qualitative
7   Present employment since                                    qualitative
8   Installment rate in percentage of disposable income         numerical
9   Personal status and sex                                     qualitative
10  Other debtors / guarantors                                  qualitative
11  Present residence since                                     numerical
12  Property                                                    qualitative
13  Age in years                                                numerical
14  Other installment plans                                     qualitative
15  Housing                                                     qualitative
16  Number of existing credits at this bank                     numerical
17  Job                                                         qualitative
18  Number of people being liable to provide maintenance for    numerical
19  Telephone                                                   qualitative
20  Foreign worker                                              qualitative
A. Data Preparation Phase
This phase aims to detect and classify the dataset attributes, assign the observation values from the dataset to each attribute, and clean the dataset of any missing values.
1. Preprocessing Data
This step consists of three sub-steps to ensure that each attribute used is a qualitative attribute having 2 to 4 sets, according to its values and the density distribution of the instances in each group:
- Step 1: Convert the numerical attributes into qualitative ones. The numerical attributes that need to be converted are shown in Table (2).
- Step 2: Divide the new attributes into 2 to 4 groups according to the range of their values and the density distribution of each group.
- Step 3: Replace the numeric values in the dataset with the new qualitative values.
Table (2): Numerical Attributes
Attribute                                                   Status
Duration in month                                           numerical
Credit amount                                               numerical
Installment rate in percentage of disposable income         numerical
Present residence since                                     numerical
Age in years                                                numerical
Number of existing credits at this bank                     numerical
Number of people being liable to provide maintenance for    numerical
Table (3): Example of Dataset Attributes and Their Sets after Preprocessing Data and Converting Numerical Attributes
#   Attribute                    Set     Set min and max value
2   Duration in month            A301    ... <= 12
                                 A302    13 ... <= 24
                                 A303    25 ... <= 36
                                 A304    >= 37
3   Credit history               A30     no credits taken / all credits paid back duly
                                 A31     all credits at this bank paid back duly
                                 A32     existing credits paid back duly till now
                                 A33     delay in paying off in the past
                                 A34     critical account / other credits existing (not at this bank)
5   Credit amount                A305    <= 5000
                                 A306    5001 ... < 12000
                                 A307    > 12000
10  Other debtors / guarantors   A101    none
                                 A102    co-applicant
                                 A103    guarantor
13  Age in years                 A316    ... < 20 years
                                 A317    20 <= ... < 35 years
                                 A318    35 <= ... < 50 years
                                 A319    >= 50 years
17  Job                          A171    unemployed / unskilled - non-resident
                                 A172    unskilled - resident
                                 A173    skilled employee / official
                                 A174    management / self-employed / highly qualified employee / officer
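One possible implementation of the Step 1 conversion, mapping numerical attributes to the qualitative sets of Table (3), is sketched below with pandas. The column names, helper function, and example rows are illustrative assumptions rather than the authors' code.

```python
# Illustrative sketch of Step 1: converting numerical attributes into the
# qualitative sets of Table (3). Column names and example values are assumptions.
import pandas as pd

def to_set(value, cut_points, labels):
    """Return the set code whose upper bound is the first cut point >= value."""
    for cut, label in zip(cut_points, labels[:-1]):
        if value <= cut:
            return label
    return labels[-1]          # value exceeds all cut points

def preprocess(df):
    df = df.copy()
    # Duration in month -> A301..A304 (<=12, 13-24, 25-36, >=37)
    df["duration"] = df["duration"].apply(to_set, args=([12, 24, 36], ["A301", "A302", "A303", "A304"]))
    # Credit amount -> A305..A307 (<=5000, 5001-12000, >12000)
    df["credit_amount"] = df["credit_amount"].apply(to_set, args=([5000, 12000], ["A305", "A306", "A307"]))
    # Age in years -> A316..A319 (<20, 20-34, 35-49, >=50)
    df["age"] = df["age"].apply(to_set, args=([19, 34, 49], ["A316", "A317", "A318", "A319"]))
    return df

# Example usage with a few rows standing in for the full dataset
sample = pd.DataFrame({"duration": [6, 30, 48], "credit_amount": [1500, 7000, 20000], "age": [19, 42, 55]})
print(preprocess(sample))
```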
2. Enter Attributes and Its Sets
This step aims first to enter the dataset attributes, their short names, and their codes to be used by the algorithm. For example, the attribute "Age in Years" has the code "13" and the short name "AIY". Second, it enters the attribute set names, their short names, and their ranges, as shown in Table (4).
Table (4): Example of the "Age in Years" Attribute Sets
Set     Range
A316    ... < 20 years
A317    20 <= ... < 35 years
A318    35 <= ... < 50 years
A319    >= 50 years
3. Enter Experimental Values
This step has five sub-steps to ensure that all parameters and inputs needed by the algorithm are clearly defined:
- Step 1 sets the attributes, chosen from the attribute list, with which the algorithm will run, together with their sets. For example, the attributes selected in this experiment are shown in Table (5).
Table (5): Selected Attributes to Test by the Algorithm
Selected Attributes
Credit amount
Credit history
Duration in month
Housing
- Step 2 selects the main attribute from which one set will be chosen as a start node. For example, the selected attribute is "Job".
- Step 3 selects the start set node from the "Job" attribute sets to be the start node of the algorithm. For example, the selected set from the "Job" attribute is "A173", which refers to "skilled employee / official".
- Step 4 enters the selected ratio "R" that all accepted candidate paths must reach; a path's ratio is obtained by dividing the number of instances of the tested case that belong to the path by the total number of instances that belong to the path. For example, the selected ratio is 65%.
- Step 5 enters the tested case "C" in numeric format (1, 2, or 3) to represent the selected cluster on whose instances the calculation is performed. For example, the tested case is "1", which corresponds to "Low-Risk Customers".
B. Simulated Decision Trees Algorithm Procedures (SDT):
- Procedure 1: Generate Sets Paths Seeds: This procedure gets the sets of each selected attribute, as shown in Table (6), and sets the start node, as shown in Table (7).
Table (6): Selected Attributes Sets
Attribute           Set     Set range
Credit amount       A305    <= 5000
                    A306    5001 ... < 12000
                    A307    > 12000
Credit history      A30     no credits taken / all credits paid back duly
                    A31     all credits at this bank paid back duly
                    A32     existing credits paid back duly till now
                    A33     delay in paying off in the past
Duration in month   A301    ... <= 12
                    A302    13 ... <= 24
                    A303    25 ... <= 36
                    A304    >= 37
Housing             A151    rent
                    A152    own
                    A153    for free
Table (7): Start Set Node
Attribute   Set     Set range
Job         A173    skilled employee / official
- Procedure 2: Generate Potential Paths: This procedure generates all candidate paths from the previous path seeds, starting from the selected node and using these attributes; it then removes repetitive sets and eliminates redundant sets of the same attribute in each path, so that each candidate path contains only one set per attribute, as shown in Table (8).
By completing this procedure, the algorithm produces 272 candidate paths.
Table (8): Example of Generated Candidate Paths and Rules
Path ID   Attribute name(s) and set(s)
1         Job A173
2         Job A173, Credit amount A305
3         Job A173, Credit amount A306
4         Job A173, Credit amount A307
5         Job A173, Credit history A30
- Procedure 3: Generate Query Statements: This procedure generates the selection query statement for each path, using the attribute names and their sets, to run on the dataset and calculate the number of instances belonging to each candidate path, as shown in Table (9).
Table (9): Example of the Selection Query Statement for Each Candidate Path
Id   Path
1    job='A173'
2    job='A173' and Credit Amount='A305'
3    job='A173' and Credit Amount='A306'
4    job='A173' and Credit Amount='A307'
5    job='A173' and Credit History='A30'
- Procedure 4: Adding Condition for Selected Studied Case: This procedure takes the previously generated query statements, adds the tested case value to each of them, and then calculates the number of instances for each path, as shown in Table (10).
Table (10): Example of the Selection Query Statement for Each Candidate Path with the Tested Case Condition
Id   Path query statement for the tested case
1    Count (*) from samples where job='A173' and customer=1
2    Count (*) from samples where job='A173' and Credit Amount='A305' and customer=1
3    Count (*) from samples where job='A173' and Credit Amount='A306' and customer=1
4    Count (*) from samples where job='A173' and Credit Amount='A307' and customer=1
5    Count (*) from samples where job='A173' and Credit History='A30' and customer=1
- Procedure 5: Generate Ratio: This procedure generates the ratio for each path by dividing the number of instances that belong to the path and to the tested case, denoted "n", by the total number of instances that belong to the path for all cases in the dataset, denoted "N", as shown in Table (11).
Table (11): Example of Candidate Paths and Their Ratios
Id    Path query statement for the tested case        Ratio
15    job='A173' and housing='A152'                   0.75663717
241   job='A173' and housing='A152'                   0.75663717
2     job='A173' and ca='A305'                        0.72952381
18    job='A173' and ca='A305'                        0.72952381
33    job='A173' and ca='A305'                        0.72952381
- Procedure 6: Select Valid Paths: This procedure keeps only the candidate paths whose ratio is greater than or equal to the selected ratio "R" = 65%, as shown in Table (12).
Table (12): Example of the Accepted Paths and Rules with Ratio >= the Selected Ratio "R" = 65%
Path id   Valid path                                  Ratio >= 65%
146       job='A173' and ch='A34' and ca='A305'       0.8516129
9         job='A173' and ch='A34'                     0.83783784
145       job='A173' and ch='A34'                     0.83783784
162       job='A173' and dm='A301' and ca='A305'      0.79207921
10        job='A173' and dm='A301'                    0.7902439
Of the 272 candidate paths, the algorithm selects only 43 valid paths whose ratio is greater than or equal to the selected ratio "R" = 65%.
C. Visualize Results Phase
This phase aims to display and draw the selected path from all paths whose ratio is greater than or equal to the selected ratio "R", for example drawing path number 162, as shown in Figure (4).
Figure (4): Example of Drawing Path Number 162
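To make the six procedures concrete, the following sketch outlines one possible implementation of candidate-path generation, ratio calculation, and valid-path selection over a dataset held in a pandas DataFrame. The column names, the itertools-based enumeration, and the helper functions are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the SDT procedures (path generation, ratio, selection).
# Column names and data are assumptions standing in for the German Credit dataset.
from itertools import combinations, product
import pandas as pd

def generate_paths(start_attr, start_set, attribute_sets):
    """Procedures 1-2: enumerate candidate paths; one set per attribute per path."""
    paths = [{start_attr: start_set}]                       # the start node alone
    attrs = list(attribute_sets)
    for r in range(1, len(attrs) + 1):
        for combo in combinations(attrs, r):                # attributes joining the path
            for sets in product(*(attribute_sets[a] for a in combo)):
                path = {start_attr: start_set}
                path.update(dict(zip(combo, sets)))
                paths.append(path)
    return paths

def path_ratio(df, path, case_column, case_value):
    """Procedures 3-5: count matching instances with and without the case condition."""
    mask = pd.Series(True, index=df.index)
    for attr, s in path.items():
        mask &= df[attr] == s
    N = int(mask.sum())                                      # instances matching the path, all cases
    n = int((mask & (df[case_column] == case_value)).sum())  # instances also in the studied case
    return (n / N) if N else 0.0

def select_valid_paths(df, paths, case_column, case_value, R):
    """Procedure 6: keep paths whose ratio reaches the tested ratio R."""
    scored = [(p, path_ratio(df, p, case_column, case_value)) for p in paths]
    return sorted([pr for pr in scored if pr[1] >= R], key=lambda pr: pr[1], reverse=True)

# Tiny example with assumed columns: job, housing, customer (1 = low risk)
df = pd.DataFrame({
    "job":      ["A173", "A173", "A173", "A172", "A173"],
    "housing":  ["A152", "A152", "A151", "A152", "A152"],
    "customer": [1, 1, 2, 1, 1],
})
sets = {"housing": ["A151", "A152", "A153"]}
paths = generate_paths("job", "A173", sets)
for path, ratio in select_valid_paths(df, paths, "customer", 1, R=0.65):
    print(path, round(ratio, 2))
```

The same filtering by "R" that Procedure 6 describes appears here as a simple threshold over the scored paths; in the paper's experiment this step reduces 272 candidate paths to 43 valid ones.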
V. RESULTS AND EVALUATION OF SDT
The Simulated Decision Trees methodology (SDT) is a simple and powerful form of multiple variable analysis and classification that aims to classify customer data and produce the most important characteristics that distinguish a specific type of customer. The SDT methodology has two main characteristics: first, it does not have any special requirements, and second, its implementation is flexible enough to adapt to completely different data domains and dataset structures. Our initial findings show interesting results: the field test conducted on our dataset indicates that the algorithm is accurate and that a target-oriented campaign based on it is effective. SDT helps turn raw data into meaningful patterns that increase knowledge and awareness of the business, and enables users to deploy that knowledge as a simple and powerful set of human-readable rules. Many benefits and advantages can be gained by applying this work in small and medium organizations in different business areas such as banks, call centers, markets and training centers. These include the ability to produce results on screen or as printed reports, the ability to make a backup at the end of each procedure, the ability to load the last experimental values and change the selected ratio "R" or the tested case "C", satisfying the needs and possibilities of decision makers, ease of use, ease of making changes, and low cost.
VI. CONCLUSION AND FUTURE WORK
The SDT methodology is a newly proposed methodology for producing, from different customer attributes, the most important characteristics and rules that distinguish a specific cluster of customers. It has many advantages. The first is simplicity: the ability to select the needed attributes without restriction, to find the relationships between them, and to change the selected ratio according to business requirements. The second is efficiency, for two reasons: it expresses complex alternatives clearly, and it allows exploring and modifying new information from the output. The third is the visualization aid, which represents and visualizes the decision alternatives and possible outcomes; this feature is helpful in comprehending sequential decisions and outcome dependencies. The fourth is harmony, which comes from merging SDT with other project management tools to enhance the process of customer relationship management. It is suggested that SDT be tested with different data domains to study its accuracy and capacity in different scopes and natures.
REFERENCES
[1] Gramatikov, M., (2003), Data Mining Techniques and the Decision Making Process in the Bulgarian Public Administration, NISPAcee Conference, Bucharest, Romania.
[2] Srivastava, J., (2000 January), Data Mining for Customer Relationship Management, ACM SIGKDD Explorations Newsletter, Volume 1, No 2.
[3] Agrawal, D., (2007 November), Building Profitable Customer Relationships with Data Mining, CSI Research Journal of India.
[4] Jia-Lang Seng & Chen, T. C., (2010 December), An Analytic Approach to Select Data Mining for Business Decision, Expert Systems with Applications, Volume 37, Pages 8042–8057.
[5] Giha, E. F., Singh, P. Y., & Ewe, T. H., (2006 June), Mining Generalized Customer Profiles, AIML 06 International Conference, Sharm El Sheikh, Egypt, Volume 6, Pages 141–147.
[6] Hsieh, N., & Chu, K., (2009), Enhancing Consumer Behavior Analysis by Data Mining Techniques, International Journal of Information and Management Sciences, Volume 20, No 1, Pages 39–53.
[7] İkizler, N., & Güvenir, H., (2001), Mining Interesting Rules in Bank Loans Data, Proceedings of the Tenth Turkish Symposium on Artificial Intelligence and Neural Networks, Pages 238–246.
[8] Bartok, J., Habala, O., Bednar, P., Gazak, M., & Hluchý, L., (2010), Data Mining and Integration for Predicting Significant Meteorological Phenomena, ICCS 2010, Procedia Computer Science, Volume 1, No 1, Pages 37–46.
[9] Çiflikli, C., & Kahya-Özyirmidokuz, E., (2010 December), Implementing a Data Mining Solution for Enhancing Carpet Manufacturing Productivity, Knowledge-Based Systems, Volume 23, No 8, Pages 783–788.
[10] Pauray, S. M., (2010), Mining Top-k Frequent Closed Item Sets over Data Streams Using the Sliding Window Model, Expert Systems with Applications, Volume 37, Pages 6968–6973.
[11] Kamrunnahar, M., & Urquidi-Macdonald, M., (2010 March), Prediction of Corrosion Behavior Using Neural Network as a Data Mining Tool, Corrosion Science, Volume 52, No 3, Pages 669–677.
[12] Thanuja, V., Venkateswarlu, B., & Anjaneyulu, G. S. G. N., (2011 June), Applications of Data Mining in Customer Relationship Management, Journal of Computer and Mathematical Sciences, Volume 2, No 3, Pages 399–580.