Data mining is the process of discovering
interesting patterns and knowledge from large volumes of
data. Heart disease or cardiovascular disease is the class of
diseases that involve the heart or blood vessels (arteries and
veins). Today most countries face high and increasing rates of
heart disease and it has become a leading cause of
debilitation and death worldwide. In many countries heart
disease is viewed as a "second epidemic" that has replaced
infectious diseases as the main cause of death. Diagnosing
heart disease involves a complete medical evaluation,
including history and physical examination, and early
diagnosis can help reduce the rate of mortality
(Thaksin University, 2006). Decision tree algorithms are among
the best data mining techniques for diagnosing heart disease. Most
researchers have applied the J48 Decision Tree, which is based
on Gain Ratio and binary discretization. Gini Index and
Information Gain are two other successful types of Decision
Tree that are less often used in the diagnosis of heart
disease. Other discretization techniques, the voting method,
and reduced error pruning are known to produce more accurate
Decision Trees. This research work investigates the results after
applying a range of techniques to different types of Decision
Trees in order to get better performance in heart disease
diagnosis. To evaluate the performance of the alternative
Decision Trees, sensitivity, specificity, and accuracy are
calculated. This research work proposes a model that
outperforms the J48 Decision Tree and the Bagging
algorithm in the diagnosis of heart disease.
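The three evaluation measures named in the abstract follow directly from the counts of a binary confusion matrix. The following is a minimal sketch in Python; the function name `diagnostic_metrics` and the example counts are illustrative assumptions, not figures from this study.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp: diseased patients correctly flagged; fn: diseased patients missed;
    tn: healthy patients correctly cleared; fp: healthy patients flagged.
    """
    sensitivity = tp / (tp + fn)   # share of diseased patients detected
    specificity = tn / (tn + fp)   # share of healthy patients cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, acc = diagnostic_metrics(tp=40, fn=10, tn=45, fp=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

The sketch assumes each class has at least one patient; an empty class would need explicit handling to avoid division by zero.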
Published In: IJCSN Journal, Volume 5, Issue 6
Date of Publication: December 2016
Pages: 885-892
Figures: 01
Tables: 06
Mudasir Manzoor Kirmani : SKUAST-K, J&K, India.
Syed Immamul Ansarullah : MANUU, Hyderabad, India.
Keywords: Data Mining, Decision Tree, Discretization, Heart Disease
Decision Tree is one of the best data mining techniques
used in the diagnosis of heart disease, but compared to
other data mining algorithms its accuracy still leaves room
for improvement. This research work systematically tested
Decision Tree type and voting to identify a more robust,
more accurate method. Applying voting increases the accuracy
of different types of Decision Tree. The Gini Index Decision
Tree can enhance the accuracy of the diagnosis of heart
disease.
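To make the compared ingredients concrete, the sketch below computes the three split criteria mentioned in this work (Gini Index, Information Gain, and Gain Ratio) for one candidate split, and combines several predictions by voting. It is a minimal illustration under stated assumptions: the helper names, the toy labels, and the reading of "voting" as simple majority voting over the labels predicted by several trees are all hypothetical and are not taken from the authors' implementation.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini_gain(parent, children):
    """Reduction in Gini impurity achieved by splitting parent into children."""
    n = len(parent)
    return gini(parent) - sum(len(ch) / n * gini(ch) for ch in children)


def information_gain(parent, children):
    """Reduction in entropy achieved by splitting parent into children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


def gain_ratio(parent, children):
    """Information gain divided by the split information of the partition."""
    n = len(parent)
    split_info = -sum(len(ch) / n * log2(len(ch) / n) for ch in children if ch)
    return information_gain(parent, children) / split_info if split_info else 0.0


def majority_vote(predictions):
    """Return the label predicted most often (one prediction per classifier)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy labels (hypothetical): 8 patients split into two branches of 4.
parent = ["sick"] * 4 + ["healthy"] * 4
children = [["sick", "sick", "sick", "healthy"],
            ["sick", "healthy", "healthy", "healthy"]]

print("Gini gain:       ", gini_gain(parent, children))
print("Information gain:", information_gain(parent, children))
print("Gain ratio:      ", gain_ratio(parent, children))
print("Voted label:     ", majority_vote(["sick", "healthy", "sick"]))
```

A tree learner would evaluate one of these criteria for every candidate split and keep the best-scoring one; the voting step would then be applied to the labels returned by the individual trees.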