2. Content
Introduction
What is Chronic Kidney Disease (CKD)
Data Mining & Classification
Role of Attribute Selection
Literature Review
Dataset Used
Performance Parameters
Results & Discussion
Conclusion
References
3. Introduction
As past records show, the number of deaths in India due to
chronic kidney disease (CKD) was 5.21 million in 2008, and this
number could rise further to 7.63 million by 2020 [4].
There is a need to detect chronic kidney disease at an early
stage, before it gets worse.
To reduce the mortality rate, an efficient technique is required
to predict and classify it.
4. Need of Study
General problems:
A large amount of storage is required for the complete dataset
Long computation time
Insufficient accuracy
Aim of study:
To predict chronic kidney disease more accurately and faster
using a reduced set of attributes.
5. What is Chronic Kidney Disease (CKD)
Structural or functional abnormalities of the kidneys for
>3 months, as manifested by either:
1. Kidney damage, with or without decreased GFR, as defined by
pathologic abnormalities, or
markers of kidney damage, including abnormalities in the
composition of the blood or urine or abnormalities in
imaging tests
2. GFR < 60 ml/min/1.73 m², with or without kidney damage,
where GFR is the glomerular filtration rate.
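The two criteria above can be written down as a small predicate. This is only a sketch: the `KidneyFindings` record, its field names and the function `meets_ckd_definition` are hypothetical, and the code encodes nothing beyond the definition as stated (either criterion persisting for more than 3 months).

```python
from dataclasses import dataclass

# Threshold and duration taken from the CKD definition above.
GFR_THRESHOLD = 60.0     # ml/min/1.73 m^2
MIN_DURATION_MONTHS = 3  # abnormalities must persist for more than 3 months


@dataclass
class KidneyFindings:
    """Hypothetical summary of one patient's findings."""
    gfr: float              # glomerular filtration rate, ml/min/1.73 m^2
    kidney_damage: bool     # pathologic abnormality or marker of kidney damage
    duration_months: float  # how long the abnormality has persisted


def meets_ckd_definition(f: KidneyFindings) -> bool:
    """True if either criterion of the definition holds for more than 3 months."""
    if f.duration_months <= MIN_DURATION_MONTHS:
        return False
    damaged = f.kidney_damage        # criterion 1: damage, with or without decreased GFR
    low_gfr = f.gfr < GFR_THRESHOLD  # criterion 2: GFR < 60, with or without damage
    return damaged or low_gfr


print(meets_ckd_definition(KidneyFindings(gfr=45, kidney_damage=False, duration_months=6)))  # True
print(meets_ckd_definition(KidneyFindings(gfr=90, kidney_damage=True, duration_months=2)))   # False
```

Either criterion alone is sufficient, which is why the two checks are combined with `or` after the duration check.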
6. Stages in Progression of Chronic Kidney Disease and Therapeutic Strategies
[Figure: progression from Normal to Increased risk, Damage, Decreased GFR, Kidney failure and CKD death (with complications), paired with strategies: screening for CKD risk factors; CKD risk reduction and screening for CKD; diagnosis and treatment, treating comorbid conditions and slowing progression; estimating progression, treating complications and preparing for replacement; replacement by dialysis and transplant]
7. Data Mining & Classification
Data mining refers to extracting meaningful
information from hidden patterns in a dataset [2].
Data mining techniques are very useful in health
informatics [16, 17].
Data mining classification techniques play a vital role
in classifying various diseases from symptoms and
the results of medical tests.
8. Attribute Selection
Before inducing a model, we almost always do input
engineering.
The most useful part of this is attribute selection (also
called feature selection):
Select relevant attributes
Remove redundant and/or irrelevant attributes
Select the most “relevant” subset of attributes according to
some selection criteria.
Why?
9. Reasons for Attribute Selection
Simpler model
More transparent
Easier to interpret
Faster model induction
Structural knowledge
Knowing which attributes are important may be inherently
important to the application
Reduced storage requirements
What about the accuracy?
11. Filter Method
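Evaluates attributes using general characteristics of the data, independently of the learning algorithm
Results in either:
A ranked list of attributes
Typical when each attribute is evaluated individually
Must select how many to keep
Or a selected subset of attributes, found by a search such as
Forward selection
Best first
Random search, such as a genetic algorithm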
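A filter can produce either of the two outputs listed above. The sketch below shows both on toy data: `rank_attributes` scores each attribute on its own, and `forward_select` grows a subset by forward selection using a merit in the style of correlation-based feature selection (the idea behind WEKA's CfsSubsetEval), which rewards correlation with the class and penalises correlation between attributes. Assumptions: all attributes are numeric, absolute Pearson correlation stands in for the correlation measure CfsSubsetEval actually uses, and the function names and data are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either column is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_attributes(columns, target):
    """Ranked-list output: each attribute is scored on its own by |r| with the class."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


def cfs_merit(subset, columns, target):
    """CFS-style merit k*r_cf / sqrt(k + k*(k-1)*r_ff) of an attribute subset."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = mean(abs(pearson(columns[a], target)) for a in subset)  # attribute-class
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = mean(abs(pearson(columns[a], columns[b])) for a, b in pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def forward_select(columns, target, min_gain=1e-9):
    """Subset output: greedy forward selection that stops once the merit stops improving."""
    selected, best = [], 0.0
    remaining = list(columns)
    while remaining:
        cand = max(remaining, key=lambda a: cfs_merit(selected + [a], columns, target))
        merit = cfs_merit(selected + [cand], columns, target)
        if merit - best <= min_gain:  # tolerance guards against floating-point ties
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected


target = [0, 0, 0, 1, 1, 1]
columns = {
    "a": [1, 2, 1, 5, 6, 5],       # tracks the class closely
    "a_copy": [1, 2, 1, 5, 6, 5],  # redundant duplicate of "a"
    "noise": [3, 1, 4, 1, 5, 9],   # weakly related to the class
}
print(rank_attributes(columns, target))  # ['a', 'a_copy', 'noise']
print(forward_select(columns, target))   # ['a']
```

The ranked list places the redundant `a_copy` near the top because it is judged in isolation, while the subset search stops after `a`, since adding a duplicate does not raise the merit.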
12. Wrapper Method
“Wraps around” the learning algorithm: subsets are scored by the learner's own performance
Always evaluates subsets
Returns the best subset of attributes
Uses the same search methods as the filter method
The wrapper approach is generally more accurate but
also more computationally expensive
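The wrapper idea can be sketched with a learner as simple as 1-nearest-neighbour, which is what WEKA's IBk does with its default k = 1. Here each candidate subset is scored by the classifier's leave-one-out accuracy and subsets are grown greedily. WEKA's WrapperSubsetEval estimates accuracy by k-fold cross-validation instead, so leave-one-out is a simplification; the function names and toy data are hypothetical.

```python
import math


def one_nn_predict(train_X, train_y, x, attrs):
    """1-nearest-neighbour prediction using only the attribute indices in attrs."""
    best_label, best_dist = None, math.inf
    for row, label in zip(train_X, train_y):
        dist = sum((row[i] - x[i]) ** 2 for i in attrs)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, attrs):
    """Leave-one-out accuracy of 1-NN restricted to the given attribute subset."""
    if not attrs:
        return 0.0
    correct = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        if one_nn_predict(train_X, train_y, X[i], attrs) == y[i]:
            correct += 1
    return correct / len(X)


def wrapper_forward_select(X, y):
    """Greedy forward selection where the learner itself scores every candidate subset."""
    selected, best_acc = [], 0.0
    while True:
        candidates = [a for a in range(len(X[0])) if a not in selected]
        if not candidates:
            break
        scored = [(loo_accuracy(X, y, selected + [a]), a) for a in candidates]
        acc, attr = max(scored, key=lambda t: t[0])
        if acc <= best_acc:
            break  # no candidate improves the estimated accuracy
        selected.append(attr)
        best_acc = acc
    return selected, best_acc


# Toy data: attribute 0 separates the classes, attribute 1 is noise.
X = [[1.0, 7.0], [1.2, 1.0], [0.9, 4.0], [5.0, 6.5], [5.3, 0.5], [4.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]
print(wrapper_forward_select(X, y))  # ([0], 1.0)
```

Every call to `loo_accuracy` runs the learner once per instance, and it is called once per candidate attribute at every step. This repeated training is exactly why the wrapper approach costs more than a filter.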
13. Literature Review
Researcher | Year | Classifier | Accuracy | Remarks
K. R. Lakshmi [6] | 2014 | ANN | 93.8521% | Performed better than decision tree and logistic regression classifiers
Naganna Chetty [7] | 2015 | Naïve Bayes, SMO, IBK | 99%, 98.25%, 100% | Attribute reduction using wrapper method
S. Vijayarani [8] | 2015 | SVM | 76.32% | 584 instances and six attributes
L. Jerlin Rubini [9] | 2015 | Multilayer Perceptron | 99.75% | Performed better than radial basis function network and logistic regression
Uma N. Dulhare [10] | 2016 | Naïve Bayes | 97.5% | Attribute reduction using OneR
Huseyin Polat [11] | 2017 | SVM | 98.5% | Attribute reduction
Wala A. [12] | 2017 | Decision tree | 99% | Missing values replaced with mean
16. RESULT AND DISCUSSION
Tool
WEKA 3.8 (Waikato Environment for Knowledge Analysis)
Classifiers
J48, DecisionTable and IBK
Attribute Selection
CfsSubsetEval, ClassifierSubsetEval and WrapperSubsetEval
Search Techniques
Greedy stepwise and best-first search (GreedyStepwise, BestFirst)
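The difference between the two search strategies can be illustrated on a small, made-up merit table. Greedy stepwise search stops at the first local optimum, whereas best-first search keeps a queue of open subsets and can back up to an earlier one; this sketch stops after `max_stale` consecutive expansions that do not improve the best merit (WEKA's BestFirst exposes a comparable setting, `searchTermination`). The functions and merit values are hypothetical; in practice the merit would come from an evaluator such as CfsSubsetEval or WrapperSubsetEval.

```python
import heapq
from itertools import count


def greedy_stepwise(n_attrs, merit):
    """Forward hill climbing: add the best attribute until the merit stops improving."""
    current, best = frozenset(), merit(frozenset())
    while True:
        options = [current | {a} for a in range(n_attrs) if a not in current]
        if not options:
            return current, best
        cand = max(options, key=merit)
        if merit(cand) <= best:
            return current, best
        current, best = cand, merit(cand)


def best_first(n_attrs, merit, max_stale=5):
    """Forward best-first search that may back up to earlier open subsets."""
    start = frozenset()
    best_set, best = start, merit(start)
    tie = count()  # tie-breaker so heapq never has to compare frozensets
    open_list = [(-best, next(tie), start)]
    visited = {start}
    stale = 0
    while open_list and stale < max_stale:
        _, _, node = heapq.heappop(open_list)  # most promising open subset
        improved = False
        for a in range(n_attrs):
            child = node | {a}
            if a in node or child in visited:
                continue
            visited.add(child)
            m = merit(child)
            heapq.heappush(open_list, (-m, next(tie), child))
            if m > best:
                best_set, best, improved = child, m, True
        stale = 0 if improved else stale + 1
    return best_set, best


# Made-up merits: attributes 1 and 2 are only useful together.
MERIT = {
    frozenset(): 0.0,
    frozenset({0}): 0.5, frozenset({1}): 0.4, frozenset({2}): 0.4,
    frozenset({0, 1}): 0.45, frozenset({0, 2}): 0.45, frozenset({1, 2}): 0.9,
    frozenset({0, 1, 2}): 0.6,
}
merit = MERIT.__getitem__
print(greedy_stepwise(3, merit))  # (frozenset({0}), 0.5)
print(best_first(3, merit))       # (frozenset({1, 2}), 0.9)
```

Greedy search commits to attribute 0 and stops, while best-first search later returns to the open subset `{1}` and finds the better pair `{1, 2}`.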
17. RESULT OF J48, DECISION TABLE AND IBK CLASSIFIERS ON CKD
Algorithm | Accuracy | Precision | Recall | Kappa Statistic | Execution Time (s) | RMSE
J48 | 99% | 0.990 | 0.990 | 0.9786 | 0.13 | 0.0807
Decision Table | 99% | 0.990 | 0.990 | 0.9786 | 0.46 | 0.2507
IBK | 95.75% | 0.962 | 0.958 | 0.9113 | 0.01 | 0.2056
General observations:
• J48 and Decision Table both provide 99% accuracy
• J48 gives the lowest RMSE (0.0807)
• IBK has the shortest execution time
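Accuracy, precision, recall and the kappa statistic in the table above can all be derived from a confusion matrix. The sketch below computes them. The counts are hypothetical, chosen for illustration, and are not the study's results. It assumes that precision and recall are averaged over classes weighted by class frequency (which is how WEKA reports weighted averages). RMSE is omitted because it depends on the classifier's class-probability estimates, not on the confusion matrix alone.

```python
def metrics_from_confusion(cm):
    """Accuracy, weighted precision, weighted recall and Cohen's kappa.

    cm[i][j] is the number of instances of actual class i predicted as class j.
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    actual = [sum(cm[i]) for i in range(k)]                         # row totals
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column totals
    correct = sum(cm[c][c] for c in range(k))

    accuracy = correct / n
    # Per-class values weighted by each class's actual frequency
    precision = sum(actual[c] / n * (cm[c][c] / predicted[c] if predicted[c] else 0.0)
                    for c in range(k))
    recall = sum(actual[c] / n * (cm[c][c] / actual[c] if actual[c] else 0.0)
                 for c in range(k))
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    p_expected = sum(actual[c] * predicted[c] for c in range(k)) / (n * n)
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return accuracy, precision, recall, kappa


# Hypothetical counts for a two-class problem (ckd, notckd).
cm = [[248, 2],
      [2, 148]]
acc, prec, rec, kappa = metrics_from_confusion(cm)
print(f"accuracy={acc:.4f} precision={prec:.3f} recall={rec:.3f} kappa={kappa:.4f}")
# accuracy=0.9900 precision=0.990 recall=0.990 kappa=0.9787
```

Kappa is lower than accuracy because part of the observed agreement would be expected by chance alone, given how unevenly the two classes are distributed.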
20. Comparison of Accuracy for J48, Decision Table and IBK Classifiers with Original and Reduced Dataset
21. CONCLUSION
The accuracy of IBK on the original dataset is 95.75%,
while on the dataset reduced by 72% it reaches 100% accuracy
using the WrapperSubsetEval attribute evaluator with best-first
search.
J48 and Decision Table provide better results than IBK on the
original dataset,
while IBK performs better on the reduced dataset than on the
original dataset.
IBK can therefore be used to predict CKD efficiently and quickly
with reduced attributes.
22. References
[1] L. Jena and N. Ku. Kamila, "Distributed data mining classification algorithms for prediction of chronic-kidney-disease," International Journal of Emerging Research in Management & Technology, vol. 4, no. 11, pp. 110-118, November 2015.
[2] K. Chandel, V. Kunwar, S. Sabitha, T. Choudhury, and S. Mukherjee, "A comparative study on thyroid disease detection using K-nearest neighbor and Naive Bayes classification techniques," CSI Transactions on ICT, vol. 4, no. 2-4, pp. 313-319, 2016.
[3] Sudhir B. Jagtap, "Census data mining and data analysis using WEKA," arXiv preprint arXiv:1310.4647, 2013.
[4] S. Dilli Arasu and R. Thirumalaiselvi, "Review of Chronic Kidney Disease based on Data Mining Techniques," International Journal of Applied Engineering Research, vol. 12, pp. 13498-13505, 2017.
[5] S. Zeynu and Shruti Patil, "Survey on Prediction of Chronic Kidney Disease Using Data Mining Classification Techniques and Feature Selection," International Journal of Pure and Applied Mathematics, vol. 118, no. 8, pp. 149-156, 2018.
[6] K. R. Lakshmi, Y. Nagesh, and M. Veera Krishna, "Performance comparison of three data mining techniques for predicting kidney dialysis survivability," International Journal of Advances in Engineering & Technology, vol. 7, pp. 242-254, 2014.
[7] N. Chetty, Kunwar Singh Vaisla, and Sithu D. Sudarsan, "Role of attributes selection in classification of Chronic Kidney Disease patients," International Conference on Computing, Communication and Security (ICCCS), IEEE, 2015.
[8] S. Vijayarani and S. Dhayanand, "Data mining classification algorithms for kidney disease prediction," International Journal on Cybernetics and Informatics (IJCI), 2015.
[9] L. Jerlin Rubini and Dr. P. Eswaran, "Generating comparative analysis of early stage prediction of Chronic Kidney Disease," International Journal of Modern Engineering Research (IJMER), vol. 5, no. 7, pp. 49-55, July 2015.
[10] Uma N. Dulhare and Mohammad Ayesha, "Extraction of action rules for chronic kidney disease using Naïve Bayes classifier," IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), IEEE, 2016.
[11] H. Polat, Homay Danaei Mehr, and Aydin Cetin, "Diagnosis of chronic kidney disease based on support vector machine by feature selection methods," Journal of Medical Systems, February 2017.
[12] W. Abedalkhader and Noora Abdulrahman, "Missing Data Classification of Chronic Kidney Disease," International Journal of Data Mining & Knowledge Management Process (IJDKP), vol. 7, no. 5/6, November 2017.
[13] Abeer Y. Al-Hyari, "Chronic Kidney Disease Prediction System Using Classifying Data Mining Techniques," Library of University of Jordan, 2012.
[14] Jiliang Tang, Salem Alelyani, and Huan Liu, "Feature selection for classification: A review," Data Classification: Algorithms and Applications, 2014.
[15] Geoffrey Holmes, Andrew Donkin, and Ian H. Witten, "Weka: A machine learning workbench," Proceedings of the 1994 Second Australian and New Zealand Conference on Intelligent Information Systems, IEEE, 1994.
[16] Mary K. Obenshain, "Application of data mining techniques to healthcare data," Infection Control & Hospital Epidemiology, vol. 25, no. 8, pp. 690-695, 2004.
[17] Li-Chen Cheng, Ya-Han Hu, and Shr-Han Chiou, "Applying the Temporal Abstraction Technique to the Prediction of Chronic Kidney Disease Progression," Journal of Medical Systems, vol. 41, April 2017.
[18] Neeraj Bhargava, Girja Sharma, Ritu Bhargava, and Manish Mathuria, "Decision tree analysis on J48 algorithm for data mining," International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, pp. 1114-1119, June 2013.
[19] Hongjun Lu and Hongyan Liu, "Decision tables: Scalable classification exploring RDBMS capabilities," Proceedings of the 26th International Conference on Very Large Data Bases (VLDB '00), 2000.