Heart Disease Prediction with Feature Engineering, SMOTE Augmentation, and Interpretable Deep Learning Models
Keywords:
Heart Disease Prediction, Machine Learning, Deep Learning, Cleveland Heart Disease Dataset (CHDD), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Voting Ensemble, Feature Selection, SMOTE, Regularized Neural Network, SHAP, Mobile Health Application.
Abstract
Cardiovascular disease remains a leading cause of death worldwide, which calls for accurate and effective prediction methods that support early detection and prevention. This work applies machine learning and deep learning to the Cleveland Heart Disease dataset (CHDD) to produce simple and reliable predictions of heart disease. Pre-processing included missing-value replacement, outlier detection and removal, and standardisation, after which the classes were balanced using SMOTE (Synthetic Minority Over-sampling Technique). Feature selection with Analysis of Variance (ANOVA), Chi-square, and Mutual Information produced refined feature subsets (SF-1, SF-2, SF-3), which were evaluated with classifiers including Logistic Regression, Support Vector Machines, K-Nearest Neighbours, and Voting Ensembles, among others. The simple, single classifiers reached accuracies of around 90-91%. In contrast, a Regularized Deep Feedforward Neural Network (DNN) with Swish activation, the AdamW optimizer, and SMOTE oversampling raised accuracy substantially, to 98%, with balanced precision, recall, and F1-score. A SHAP explainer was used to improve model interpretability, and the final model was packaged as a mobile application for medical professionals to use in real time. The findings suggest that, compared with individual ML or DL models, hybrid ML-DL pipelines have potential for cardiovascular risk prediction applications that are reliable, affordable, and easily scalable.
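The class-balancing step named in the abstract can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' implementation: the function name `smote`, the toy data, and the parameter values are assumptions made for illustration, and the code uses only the Python standard library. Each synthetic sample is placed at a random position on the line segment between a minority-class point and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    Each sample is interpolated between a randomly chosen minority point
    and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up feature vectors (not taken from the Cleveland dataset).
majority = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.1, 0.0], [0.2, 0.3]]
minority = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1]]

# Generate enough synthetic points to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
```

In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be used, and oversampling is usually applied to the training split only so that synthetic points do not leak into the evaluation data.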
License
Copyright (c) 2025 Abdul Hamid, Dr. Saurabh Mandloi

This work is licensed under a Creative Commons Attribution 4.0 International License.
IJOSCIENCE follows an Open Journal Access policy. Authors retain the copyright of the original work and grant the rights of publication to the publisher with the work simultaneously licensed under a Creative Commons CC BY License that allows others to distribute, remix, adapt, and build upon your work, even commercially, as long as they credit you for the original creation. Authors are permitted to post their work in institutional repositories, social media or other platforms.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.