Interpretable Machine Learning Framework for Diabetes Risk Assessment in Women with Enhanced Feature Engineering

25 April 2026


Volume 12, Issue 4, April 2026

Author(s): Ms. Snehal Shah, Ms. Roshani Ladwa, Ms. Divya Patel

Authors Affiliations:

Assistant Professor, School of Computing and Technology (authors 1, 2, 3)

The Institute of Advanced Research, Gandhinagar, India

DOI: 10.2015/IJIRMF/202604023 | Paper ID: IJIRMF202604023


Abstract: Type 2 diabetes mellitus (T2DM) is a major global health concern, affecting over 537 million adults worldwide according to the International Diabetes Federation. Women are often at elevated risk because of conditions such as gestational diabetes, polycystic ovarian syndrome (PCOS), and hormonal fluctuations, so early identification of high-risk individuals is crucial for timely intervention. In this work, a machine learning-based framework is developed to predict diabetes risk in women. The study addresses common issues reported in earlier research, including class imbalance, limited use of meaningful features, and lack of interpretability. To improve prediction performance, six composite features were designed: BMI-Glucose Index (BGI), Insulin Resistance Proxy (IRP), Cardiometabolic Score (CABS), Glucose-Age Synergy (GASI), Hereditary-Obstetric Risk Score (HORS), and Vascular-Adiposity Ratio (VATR). In addition, SMOTE was applied inside the cross-validation folds to prevent data leakage, and Winsorization was used to reduce the influence of extreme values. Five machine learning models were evaluated. Among them, the Random Forest and XGBoost models performed best, achieving accuracies of 86.1% and 85.9%, respectively. XGBoost provided a balanced outcome, with a sensitivity of 82.0% and a specificity of 88.2%. For interpretability, SHAP and LIME were used, and both methods consistently highlighted IRP, Glucose, and BGI as the most influential features. Overall, the proposed approach outperformed several existing methods while maintaining interpretability, with an accuracy of 86.07%. Because it uses only commonly available clinical features, the model can be applied in real healthcare settings to improve early diabetes risk assessment in women.
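The leakage-free ordering mentioned in the abstract (clipping bounds and oversampling derived from the training fold only) can be illustrated with a minimal, standard-library sketch. Everything specific here is an assumption made for illustration and not a setting reported by the authors: the helper names (`winsor_limits`, `clip_rows`, `smote_like`, `leakage_free_folds`), the 5th/95th percentile bounds, k = 3 neighbours, and the five unstratified folds. `smote_like` is a simplified stand-in for SMOTE (Chawla et al., 2002), not the authors' code or a library implementation.

```python
import random

def winsor_limits(values, lower=0.05, upper=0.95):
    """Return (lo, hi) clipping bounds using simple rank-based percentiles."""
    s = sorted(values)
    return s[int(lower * (len(s) - 1))], s[int(upper * (len(s) - 1))]

def clip_rows(rows, limits):
    """Clip every feature column to its precomputed (lo, hi) bounds."""
    return [[min(max(v, lo), hi) for v, (lo, hi) in zip(row, limits)]
            for row in rows]

def smote_like(X, y, minority=1, k=3, rng=None):
    """Simplified SMOTE: add points interpolated between a minority row and
    one of its k nearest minority neighbours until both classes are equal."""
    rng = rng or random.Random(0)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_needed = (len(y) - len(minority_rows)) - len(minority_rows)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_needed, 0)):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (m for m in minority_rows if m is not base),
            key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        X_new.append([a + gap * (b - a) for a, b in zip(base, other)])
        y_new.append(minority)
    return X_new, y_new

def leakage_free_folds(X, y, n_folds=5, seed=0):
    """Yield (X_train, y_train, X_test, y_test) for each fold.
    Clipping bounds and synthetic samples come from training rows only,
    so no information from the held-out fold reaches the training data."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    for f in range(n_folds):
        test_idx = set(idx[f::n_folds])
        X_tr = [X[i] for i in idx if i not in test_idx]
        y_tr = [y[i] for i in idx if i not in test_idx]
        X_te = [X[i] for i in idx if i in test_idx]
        y_te = [y[i] for i in idx if i in test_idx]
        limits = [winsor_limits(col) for col in zip(*X_tr)]  # train-only bounds
        X_tr, X_te = clip_rows(X_tr, limits), clip_rows(X_te, limits)
        X_tr, y_tr = smote_like(X_tr, y_tr, rng=rng)  # test fold is never resampled
        yield X_tr, y_tr, X_te, y_te
```

Each yielded training set is class-balanced, while the held-out fold keeps its original class distribution, so the evaluation metrics are computed on real records only. Oversampling the full dataset before splitting would instead place synthetic points, interpolated from held-out records, into the training data.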

Keywords: XGBoost, Random Forest, SMOTE, SHAP, LIME, Machine Learning, Type 2 Diabetes Mellitus, Feature Engineering, PIMA Dataset

Cite this Article/Paper as: Ms. Snehal Shah, Ms. Roshani Ladwa, Ms. Divya Patel (2026); Interpretable Machine Learning Framework for Diabetes Risk Assessment in Women with Enhanced Feature Engineering, International Journal for Innovative Research in Multidisciplinary Field, ISSN(O): 2455-0620, Vol-12, Issue-4, Available on – https://www.ijirmf.com/

References:

1. Kachhia, J. A. (2026). Enhancing early diabetes screening through ML and explainable AI. IEEE i-COSTE 2025.
2. Islam, M., Tisha, N. T., Alom, M. R., Oyshe, K. U., & Rahaman, M. A. (2025). An explainable AI-based ensemble ML framework for early-stage diabetes prediction. IEEE 2025.
3. Abu-Shareha, A. A., et al. (2026). Diabetes prediction using hybrid supervised and unsupervised techniques on the PIMA dataset. JAIT, 6, 79–87. https://doi.org/10.37965/jait.2025.0899
4. IDF. (2021). IDF Diabetes Atlas (10th ed.). https://www.diabetesatlas.org
5. Pradhan, D., et al. (2025). Therapeutic interventions for diabetes mellitus. Current Diabetes Reviews, 21(8).
6. Lowe, W. L., et al. (2019). Association of gestational diabetes with maternal disorders. JAMA, 320(10), 1005–1016.
7. Smith, J. W., et al. (1988). Using the ADAP algorithm to forecast the onset of diabetes. Proc Annual Symposium Computer Application Medical Care, 261–265.
8. Sharma, T., & Shah, M. (2021). A comprehensive review of ML techniques on diabetes detection. Visual Computing for Industry, Biomedicine and Art, 4(1), 30.
9. Wee, B. F., et al. (2024). Diabetes detection based on ML and deep learning. Multimedia Tools and Applications, 83(8), 24153–24185.
10. Tanim, S. A., et al. (2025). Explainable deep learning for diabetes with DeepNetX2. Biomedical Signal Processing and Control, 99, 106902.
11. Enríquez-Ortega, D., et al. (2025). Enhancing diabetes diagnosis through ML. Applied Sciences, 15(18).
12. Toleva, B., et al. (2025). Effective methodology for diabetes prediction with class imbalance. Bioengineering, 12(1).
13. Saihood, Q., & Sonuc, E. (2023). Early detection of diabetes using ensemble ML. Turkish J. Electrical Engineering, 31(4), 722–738.
14. Maniruzzaman, M., et al. (2020). Comparative approaches for the classification of diabetes mellitus. Computer Methods Programs Biomedicine, 152, 23–34.
15. Domingos, P. (2012). A few useful things to know about ML. Communications ACM, 55(10), 78–87.
16. Matthews, D. R., et al. (1985). Homeostasis model assessment. Diabetologia, 28, 412–419.
17. Zou, Q., et al. (2018). Predicting diabetes mellitus with ML techniques. Frontiers in Genetics, 9, 515.
18. Chang, V., et al. (2023). PIMA Indians’ diabetes classification based on ML. Neural Computing and Applications, 35(22), 16157.
19. Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. NeurIPS 30.
20. Lundberg, S. M., et al. (2020). From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2, 56–67.
21. Shams, M. Y., et al. (2025). A novel RFE-GRU model for diabetes using PIMA. Scientific Reports, 15(1), 982.
22. Talari, P., et al. (2024). Hybrid feature selection for early prediction of type 2 diabetes. PLOS ONE, 19(1), e0292100.
23. Chawla, N. V., et al. (2002). SMOTE: Synthetic Minority Over-sampling Technique. JAIR, 16, 321–357.
24. DeLong, E. R., et al. (1988). Comparing areas under correlated ROC curves. Biometrics, 44(3), 837–845.
25. Alkalifah, B., et al. (2025). ML-based regression for diabetes levels. Heliyon, 11(1).
26. Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. KDD, 785–794.
27. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
28. Sarma, A. D., & Devi, M. (2025). AI in diabetes management. Hormones, 1–16.
29. Mittal, R., et al. (2025). ML for early detection of type 1 diabetes. Int. J. Molecular Sciences, 26(9).
30. Kaur, R., & Rani, R. (2020). Comparative analysis of ML algorithms for diabetes. J. Healthcare Engineering, 2020.


