dc.contributor.author
Petri, Javier
dc.contributor.author
Barcena Barbeira, Pilar
dc.contributor.author
Pesce, Martina
dc.contributor.author
Xhardez, Verónica

dc.contributor.author
Laje, Rodrigo

dc.contributor.author
Cotik, Viviana Erica

dc.date.available
2025-06-26T15:17:45Z
dc.date.issued
2025-06
dc.identifier.citation
Petri, Javier; Barcena Barbeira, Pilar; Pesce, Martina; Xhardez, Verónica; Laje, Rodrigo; et al.; Low-cost algorithms for clinical notes phenotype classification to enhance epidemiological surveillance: A case study; Academic Press Inc Elsevier Science; Journal Of Biomedical Informatics; 166; 6-2025; 1-14
dc.identifier.issn
1532-0464
dc.identifier.uri
http://hdl.handle.net/11336/264687
dc.description.abstract
Objective: Our study aims to enhance epidemic intelligence through event-based surveillance in an emerging pandemic context. We classified electronic health records (EHRs) from La Rioja, Argentina, focusing on predicting COVID-19-related categories in a scenario with limited disease knowledge, evolving symptoms, non-standardized coding practices, and restricted training data due to privacy issues. Methods: Using natural language processing techniques, we developed rapid, cost-effective methods suitable for implementation with limited resources. We annotated a corpus for training and testing classification models, ranging from simple logistic regression to more complex fine-tuned transformers. Results: The transformer-based, Spanish-adapted models BETO Clínico and RoBERTa Clínico, further pre-trained with an unannotated portion of our corpus, were the best-performing models (F1 = 88.13% and 87.01%). A simple logistic regression (LR) model ranked third (F1 = 85.09%), outperforming more complex models like XGBoost and BiLSTM. Data classified as COVID-confirmed using LR and BETO Clínico exhibit a stronger time-series Pearson correlation with official COVID-19 case counts from the National Health Surveillance System (SNVS 2.0) in La Rioja province than the International Classification of Diseases (ICD-10) codes do (0.840, 0.873, and 0.663; p-values < 3x10^-7). Both models also correlate well with the ICD-10 codes assigned to the clinical notes, for confirmed cases (0.940 and 0.902) and for suspected cases (0.960 and 0.954), with p-values < 3x10^-18. Conclusion: This study shows that simple, resource-efficient methods can achieve results comparable to complex approaches. BETO Clínico and LR strongly correlate with official data, revealing uncoded confirmed cases at the pandemic’s onset. Our results suggest that annotating a smaller set of EHRs and training a simple model may be more cost-effective than manual coding. This points to potentially efficient strategies in public health emergencies, particularly in resource-limited settings, and provides valuable insights for future epidemic response efforts.
dc.format
application/pdf
dc.language.iso
eng
dc.publisher
Academic Press Inc Elsevier Science

dc.rights
info:eu-repo/semantics/restrictedAccess
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.subject
BIONLP FOR SPANISH
dc.subject
TEXT CLASSIFICATION
dc.subject
SPANISH EHRS
dc.subject
EPIDEMIC INTELLIGENCE
dc.subject
EVENT-BASED SURVEILLANCE
dc.subject
MACHINE LEARNING
dc.subject
TRANSFORMERS
dc.subject.classification
Information Sciences and Bioinformatics

dc.subject.classification
Computer and Information Sciences

dc.subject.classification
NATURAL AND EXACT SCIENCES

dc.subject.classification
Health Policy and Services

dc.subject.classification
Health Sciences

dc.subject.classification
MEDICAL AND HEALTH SCIENCES

dc.subject.classification
Epidemiology

dc.subject.classification
Health Sciences

dc.subject.classification
MEDICAL AND HEALTH SCIENCES

dc.title
Low-cost algorithms for clinical notes phenotype classification to enhance epidemiological surveillance: A case study
dc.type
info:eu-repo/semantics/article
dc.type
info:ar-repo/semantics/artículo
dc.type
info:eu-repo/semantics/publishedVersion
dc.date.updated
2025-06-25T11:51:01Z
dc.journal.volume
166
dc.journal.pagination
1-14
dc.journal.pais
United States

dc.description.fil
Fil: Petri, Javier. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
dc.description.fil
Fil: Barcena Barbeira, Pilar. Universidad de Buenos Aires. Facultad de Medicina. Departamento de Salud Pública; Argentina
dc.description.fil
Fil: Pesce, Martina. Universidad de Buenos Aires. Facultad de Medicina. Departamento de Salud Pública; Argentina
dc.description.fil
Fil: Xhardez, Verónica. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Centro Interdisciplinario de Estudios En Ciencia Tecnología E Innovación;
dc.description.fil
Fil: Laje, Rodrigo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
dc.description.fil
Fil: Cotik, Viviana Erica. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
dc.journal.title
Journal Of Biomedical Informatics

dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://linkinghub.elsevier.com/retrieve/pii/S1532046425000243
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1016/j.jbi.2025.104795