Heart Disease Prediction with Feature Engineering, SMOTE Augmentation, and Interpretable Deep Learning Models

Authors

  • Abdul Hamid
  • Dr. Saurabh Mandloi

Keywords:

Heart Disease Prediction, Machine Learning, Deep Learning, Cleveland Heart Disease Dataset (CHDD), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Voting Ensemble, Feature Selection, SMOTE, Regularized Neural Network, SHAP, Mobile Health Application.

Abstract

Cardiovascular disease remains a leading cause of death worldwide, which makes effective and accurate prediction methods essential for early detection and prevention. This work applies machine learning (ML) and deep learning (DL) techniques to the Cleveland Heart Disease Dataset (CHDD) to produce simple and reliable heart disease predictions. The data were pre-processed by imputing missing values, detecting and removing outliers, and standardizing features, and the classes were then balanced with SMOTE (Synthetic Minority Over-sampling Technique). Feature selection with Analysis of Variance (ANOVA), Chi-square, and Mutual Information yielded three refined feature subsets (SF-1, SF-2, SF-3), which were evaluated with Logistic Regression, Support Vector Machines, K-Nearest Neighbors, Voting Ensembles, and other classifiers. These conventional classifiers achieved accuracies of around 90–91%. A Regularized Deep Feedforward Neural Network (DNN) with Swish activation, the AdamW optimizer, and SMOTE oversampling raised accuracy substantially, to 98%, with balanced precision, recall, and F1-score. A SHAP explainer was used to improve model interpretability, and the final model was packaged as a mobile application that medical professionals can use in real time. The results suggest that, compared with standalone ML or DL models, hybrid ML-DL pipelines have the potential to deliver cardiovascular risk prediction applications that are reliable, affordable, and easily scalable.
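The abstract names several building blocks without showing how they work. The block below is a minimal, standard-library-only Python sketch (not the authors' code) of three of them: SMOTE's interpolation step, the one-way ANOVA F statistic used to score a feature during selection, and the Swish activation together with a single AdamW update using decoupled weight decay. The function names (`smote`, `anova_f`, `swish`, `adamw_step`) and every default hyperparameter (k = 5, learning rate, betas, weight decay) are illustrative assumptions; the abstract does not report the values the authors used.

```python
import math
import random


# --- 1. SMOTE: synthesize minority-class rows by interpolation ---
def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows, each placed on the line segment between a
    randomly chosen minority row and one of its k nearest minority neighbours.
    Hypothetical helper; k=5 is an assumed default, not the paper's setting."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority rows by Euclidean distance to x; keep the k closest.
        others = [minority[j] for j in range(len(minority)) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor drawn uniformly from [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# --- 2. ANOVA F-score: between-class vs within-class variance of one feature ---
def anova_f(values, labels):
    """One-way ANOVA F statistic for a single numeric feature across class labels.
    A higher F means the class means differ more relative to the spread inside
    each class, so the feature is ranked as more informative."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, g = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {y: sum(vs) / len(vs) for y, vs in groups.items()}
    ss_between = sum(len(vs) * (means[y] - grand_mean) ** 2 for y, vs in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, vs in groups.items() for v in vs)
    return (ss_between / (g - 1)) / (ss_within / (n - g))


# --- 3. Swish activation and one AdamW update for a scalar weight ---
def swish(x, beta=1.0):
    """Swish(x) = x * sigmoid(beta * x); beta = 1 gives the SiLU variant."""
    return x / (1.0 + math.exp(-beta * x))


def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update at step t (t starts at 1). Weight decay is applied to w
    directly, decoupled from the adaptive gradient step. Defaults are assumed."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentred) estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections for early steps
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

For example, `anova_f([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])` returns 121.5, because the two class means (2 and 11) are far apart relative to the small within-class spread. In a production pipeline these steps would usually come from libraries such as `imblearn.over_sampling.SMOTE`, `sklearn.feature_selection.f_classif`, `torch.nn.SiLU`, and `torch.optim.AdamW`; the sketch only makes the arithmetic visible. SMOTE is normally fitted on the training split alone, so that no synthetic row is interpolated from a test sample.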

Author Biographies

Abdul Hamid

M.Tech Scholar

Department of Computer Science and Engineering

Sam Global University, Raisen

Bhopal, M.P., India

Dr. Saurabh Mandloi

Head of Department

Department of Computer Science and Technology

Sam Global University, Raisen

Bhopal, M.P., India

Published

09/16/2025

How to Cite

Hamid, A., & Mandloi, S. (2025). Heart Disease Prediction with Feature Engineering, SMOTE Augmentation, and Interpretable Deep Learning Models. SMART MOVES JOURNAL IJOSCIENCE, 11(9), 1–8. Retrieved from https://ijoscience.com/index.php/ojsscience/article/view/568