Comparative Performance of IndoBERT and IndoLEM Baseline Models for Post-Disaster Health Information Extraction from Indonesian Online News
DOI:
https://doi.org/10.55537/cosie.v4i3.1174
Keywords:
Natural Disasters, Health Impacts, IndoBERT, NER, text mining
Abstract
Natural disasters often have significant impacts on public health, yet systematic monitoring of post-disaster diseases in Indonesia remains limited. This study compares the performance of two Named Entity Recognition (NER) approaches in extracting health impacts, affected locations, and disaster types from Indonesian-language online news articles. The first is IndoBERT, fine-tuned on 1,137 manually validated disaster-related news articles. The second comprises the baseline models from the IndoLEM benchmark, namely mBERT and XLM-RoBERTa (XLM-R), used without domain-specific training. Evaluation results show that IndoBERT outperforms the baseline models, achieving 90.00% accuracy and an F1-score of 88.26%, compared to mBERT (72.93%) and XLM-R (76.44%). Further analysis of the extracted entities reveals spatial and temporal disease trends: floods in Java are consistently associated with diarrhea and skin diseases, while volcanic eruptions in eastern Indonesia are linked to respiratory infections and hypertension. These findings highlight the importance of selecting appropriate models to support data-driven public health monitoring systems in disaster-prone regions.
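As an illustration of how an F1-score for NER can be computed, the following is a minimal sketch that scores predicted BIO tags against gold tags at the entity level. It is not the authors' evaluation code: the abstract does not state whether scoring was done per entity or per token, the tag names (`DISASTER`, `LOC`, `HEALTH`) are hypothetical labels chosen to mirror the three entity types mentioned above, and the example sentence is made up rather than taken from the dataset.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence of BIO tags into a set of entity spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-") or tag.startswith("I-"):
            # Lenient choice: a stray I- tag also opens a new span.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def precision_recall_f1(gold: List[List[str]], pred: List[List[str]]):
    """Entity-level scores: a prediction counts only on exact type and boundary match."""
    gold_spans, pred_spans = set(), set()
    for sent_id, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(sent_id, *s) for s in bio_to_spans(g)}
        pred_spans |= {(sent_id, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up sentence: "Banjir di Jawa Tengah memicu diare"
# ("Flooding in Central Java triggers diarrhea").
gold = [["B-DISASTER", "O", "B-LOC", "I-LOC", "O", "B-HEALTH"]]
# Hypothetical prediction that truncates the location entity.
pred = [["B-DISASTER", "O", "B-LOC", "O", "O", "B-HEALTH"]]

precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

In this sketch the truncated location span counts as both a false positive and a false negative, so two of three entities match on each side. Exact-match scoring of this kind is stricter than token-level accuracy, which may be one reason the two metrics reported in the abstract differ.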
License
Copyright (c) 2025 Nalar Istiqomah, Fanny Novika

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.