Optimizing Heart Disease Prediction Using SMOTE, Decision Tree, and Random Forest: A Regional Analysis Approach
DOI:
https://doi.org/10.55537/cosie.v5i2.1675
Keywords:
SMOTE, heart disease, Decision Tree, Random Forest, data imbalance, regional analysis
Abstract
Heart disease remains the leading cause of mortality worldwide, including in Indonesia. However, developing accurate predictive models is often hindered by class imbalance in medical datasets, where negative cases significantly outnumber positive cases. This study optimizes heart disease prediction by applying SMOTE (Synthetic Minority Oversampling Technique) separately within each region before training Decision Tree and Random Forest classifiers on the "Heart Attack Prediction in Indonesia" dataset from Kaggle, which includes rural and urban region attributes. Following the CRISP-DM framework, SMOTE was applied to each region independently to capture local distributional differences and reduce regional bias. Results show that regional SMOTE significantly improved recall and F1-scores for both algorithms, particularly in rural areas, where Random Forest recall increased from 60.50% to 70.34%. Statistical significance was confirmed through paired t-tests and Wilcoxon signed-rank tests on 5-fold cross-validation results (p < 0.001). Fairness analysis using Demographic Parity Difference (DPD) and Equalized Odds Difference (EOD) confirmed equitable performance across regional populations (DPD < 0.005, EOD < 0.008). Random Forest consistently outperformed Decision Tree, achieving the highest F1-score of 66.19% in urban regions after SMOTE. These findings indicate that regional SMOTE effectively enhances model sensitivity toward the minority class while maintaining spatial fairness in heart disease prediction.
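The core idea of regional SMOTE can be sketched in plain NumPy. The minority class is oversampled separately inside each region, so synthetic patients are interpolated only between real patients from the same region. This is a minimal illustration under stated assumptions, not the authors' code. The function names `smote` and `regional_smote`, the 1:1 balancing target, `k = 5` neighbours, and the toy `rural`/`urban` data are all hypothetical. In practice the `SMOTE` class from imbalanced-learn (via `fit_resample`) would typically replace the hand-written interpolation. The balanced arrays would then be passed to Decision Tree and Random Forest classifiers.

```python
import numpy as np

# Minimal sketch of region-wise SMOTE. Function names, the 1:1 balancing
# target and the toy data below are hypothetical, not the study's code.


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(rng)
    n = len(X_min)
    if n < 2 or n_new <= 0:
        # Too few minority samples to interpolate (or nothing to add).
        return np.empty((0, X_min.shape[1]))
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]   # k nearest minority neighbours
    base = rng.integers(0, n, size=n_new)       # seed sample for each new point
    nb = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def regional_smote(X, y, region, k=5, seed=0):
    """Oversample the minority class separately within each region, so that
    synthetic samples are interpolated only between same-region patients."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts, r_parts = [], [], []
    for r in np.unique(region):
        mask = region == r
        Xr, yr = X[mask], y[mask]
        X_parts.append(Xr)
        y_parts.append(yr)
        r_parts.append(region[mask])
        classes, counts = np.unique(yr, return_counts=True)
        if len(classes) < 2:
            continue                            # nothing to balance in this region
        minority = classes[np.argmin(counts)]
        n_new = counts.max() - counts.min()     # bring the classes to 1:1
        X_new = smote(Xr[yr == minority], n_new, k=k, rng=rng)
        X_parts.append(X_new)
        y_parts.append(np.full(len(X_new), minority))
        r_parts.append(np.full(len(X_new), r))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(r_parts)


# Toy example: 120 rural patients (30 positive), 80 urban patients (20 positive).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
region = np.array(["rural"] * 120 + ["urban"] * 80)
y = np.zeros(200, dtype=int)
y[:30] = 1
y[120:140] = 1

X_bal, y_bal, r_bal = regional_smote(X, y, region)
for r in ("rural", "urban"):
    m = r_bal == r
    print(r, np.bincount(y_bal[m]))             # rural [90 90], urban [60 60]
```

Synthetic points are built from training records. A resampling step like this would therefore normally run inside each cross-validation training fold only. The validation fold stays untouched, so recall and F1 are measured on real patients.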
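The two fairness metrics named in the abstract can be computed directly from predictions and a region label. The sketch below assumes the common between-groups definitions, the same ones behind Fairlearn's `demographic_parity_difference` and `equalized_odds_difference`. Under these definitions, DPD is the largest gap in positive-prediction rate across regions. EOD is the larger of the across-region gaps in true-positive rate and false-positive rate. It is an assumption that the study used exactly these variants. The helper names mirror Fairlearn's but use a simplified, hypothetical signature, and the toy arrays are hypothetical.

```python
import numpy as np

# Minimal sketch of the two group-fairness gaps; the toy arrays are hypothetical.


def demographic_parity_difference(y_pred, group):
    """Largest across-group gap in the positive-prediction rate P(y_hat = 1 | group)."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)


def equalized_odds_difference(y_true, y_pred, group):
    """Larger of the across-group gaps in true-positive and false-positive rate.

    Assumes every group contains at least one positive and one negative case.
    """
    tprs, fprs = [], []
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        tprs.append(yp[yt == 1].mean())  # recall (sensitivity) within the group
        fprs.append(yp[yt == 0].mean())  # false-alarm rate within the group
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
group = np.array(["rural"] * 4 + ["urban"] * 4)

dpd = demographic_parity_difference(y_pred, group)      # 0.75 - 0.25 = 0.5
eod = equalized_odds_difference(y_true, y_pred, group)  # max(1.0 - 0.5, 0.5 - 0.0) = 0.5
print(dpd, eod)
```

Values close to zero, such as DPD < 0.005 and EOD < 0.008, mean that rural and urban patients receive positive predictions and correct detections at nearly the same rates.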
Copyright (c) 2026 Ryan Harrys Pratama, Ade Surya Budiman, Amin Nur Rais

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.



