Huzain Azis, Nurul Rismayanti


Anaemia is a widespread blood disorder characterized by a deficiency of red blood cells or hemoglobin, and it can lead to severe health complications if not diagnosed and treated promptly. This research develops a machine learning model that predicts anaemia from hemoglobin levels and image pixel distributions, using a dataset from Kaggle. The dataset's features include the percentages of red, green, and blue pixels in each image and the hemoglobin level. Data pre-processing involved removing irrelevant columns, encoding categorical variables, and scaling numerical features. We then trained a Random Forest Classifier and evaluated its performance using 5-fold cross-validation. The model achieved a mean accuracy of 97.05%, precision of 97.02%, recall of 97.05%, and F1-score of 96.88%, indicating reliable prediction of anaemia on this dataset. Correlation heatmaps, 3D PCA, parallel coordinates plots, 3D t-SNE, and violin plots were used to examine feature relationships and distributions. These results underscore the potential of machine learning to provide a non-invasive, cost-effective diagnostic tool for anaemia, especially in resource-limited settings. Future research should address dataset imbalance and potential biases, explore additional features, and test other machine learning models to further improve predictive accuracy. This study contributes to medical diagnostics by demonstrating that hemoglobin levels and image data can be combined effectively for anaemia prediction, supporting earlier detection and treatment.

Keywords: Anaemia, Hemoglobin, Machine Learning, Random Forest.
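
The reported accuracy and recall are identical (97.05%). One reading consistent with this is that precision, recall, and F1 were averaged over classes with weights proportional to class support, since support-weighted recall always equals accuracy. The abstract does not state the averaging scheme, so this is an assumption. The sketch below computes such weighted scores per fold and averages them across folds. The function names and the labels are hypothetical and made up for illustration; they are not the study's code or data.

```python
from collections import Counter
from statistics import mean


def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1 for one fold."""
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))
    scores = {
        "accuracy": sum(t == p for t, p in pairs) / n,
        "precision": 0.0,
        "recall": 0.0,
        "f1": 0.0,
    }
    # Iterate over the classes that occur in the true labels.
    for label, support in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = support / n  # share of this class among the true labels
        scores["precision"] += weight * prec
        scores["recall"] += weight * rec
        scores["f1"] += weight * f1
    return scores


def mean_over_folds(folds):
    """Average each metric over (y_true, y_pred) pairs, one pair per fold."""
    per_fold = [weighted_scores(t, p) for t, p in folds]
    return {name: mean(s[name] for s in per_fold) for name in per_fold[0]}


# Made-up labels for two small folds (illustration only, not the study's data).
folds = [
    (["Yes", "Yes", "No", "No", "No"], ["Yes", "No", "No", "No", "No"]),
    (["Yes", "No", "No", "No", "No"], ["No", "No", "No", "No", "No"]),
]
scores = mean_over_folds(folds)
print(scores)
```

With these toy folds, recall matches accuracy while precision and F1 differ from it, the same pattern seen in the reported figures. If the study instead used macro or binary averaging, the coincidence of accuracy and recall would need a different explanation.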




Copyright (c) 2024 Huzain Azis, Nurul Rismayanti