dc.contributor.author
Petri, Javier  
dc.contributor.author
Barcena Barbeira, Pilar  
dc.contributor.author
Pesce, Martina  
dc.contributor.author
Xhardez, Verónica  
dc.contributor.author
Laje, Rodrigo  
dc.contributor.author
Cotik, Viviana Erica  
dc.date.available
2025-06-26T15:17:45Z  
dc.date.issued
2025-06  
dc.identifier.citation
Petri, Javier; Barcena Barbeira, Pilar; Pesce, Martina; Xhardez, Verónica; Laje, Rodrigo; et al.; Low-cost algorithms for clinical notes phenotype classification to enhance epidemiological surveillance: A case study; Academic Press Inc Elsevier Science; Journal Of Biomedical Informatics; 166; 6-2025; 1-14  
dc.identifier.issn
1532-0464  
dc.identifier.uri
http://hdl.handle.net/11336/264687  
dc.description.abstract
Objective: Our study aims to enhance epidemic intelligence through event-based surveillance in an emerging pandemic context. We classified electronic health records (EHRs) from La Rioja, Argentina, focusing on predicting COVID-19-related categories in a scenario with limited disease knowledge, evolving symptoms, non-standardized coding practices, and restricted training data due to privacy issues. Methods: Using natural language processing techniques, we developed rapid, cost-effective methods suitable for implementation with limited resources. We annotated a corpus for training and testing classification models, ranging from simple logistic regression to more complex fine-tuned transformers. Results: The transformer-based, Spanish-adapted models BETO Clínico and RoBERTa Clínico, further pre-trained with an unannotated portion of our corpus, were the best-performing models (F1 = 88.13% and 87.01%). A simple logistic regression (LR) model ranked third (F1 = 85.09%), outperforming more complex models like XGBoost and BiLSTM. Data classified as COVID-confirmed using LR and BETO Clínico exhibit a stronger time-series Pearson correlation with official COVID-19 case counts from the National Health Surveillance System (SNVS 2.0) in La Rioja province than the correlation observed between the International Classification of Diseases (ICD-10) codes and the SNVS 2.0 data (0.840, 0.873, and 0.663, respectively; p-values < 3×10^-7). Both models correlate well with the ICD-10 codes assigned to the clinical notes, for confirmed cases (0.940 and 0.902) and for suspected cases (0.960 and 0.954), with p-values < 3×10^-18. Conclusion: This study shows that simple, resource-efficient methods can achieve results comparable to complex approaches. BETO Clínico and LR strongly correlate with official data, revealing uncoded confirmed cases at the pandemic’s onset. Our results suggest that annotating a smaller set of EHRs and training a simple model may be more cost-effective than manual coding. This points to potentially efficient strategies in public health emergencies, particularly in resource-limited settings, and provides valuable insights for future epidemic response efforts.  
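The time-series comparison described in the abstract relies on the Pearson correlation coefficient. A minimal sketch of that computation follows, assuming two aligned series of weekly case counts. The function name `pearson` and both example series are hypothetical and invented for illustration; they are not data or code from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly counts, invented for illustration only.
classifier_weekly = [2, 5, 9, 14, 20, 17, 11]  # notes a model labels as confirmed
official_weekly = [1, 4, 10, 15, 22, 16, 9]    # official surveillance counts
r = pearson(classifier_weekly, official_weekly)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the classifier-derived counts rise and fall together with the official counts. The p-values quoted in the abstract would require an additional significance test, which this sketch does not cover.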
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
Academic Press Inc Elsevier Science  
dc.rights
info:eu-repo/semantics/restrictedAccess  
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/  
dc.subject
BIONLP FOR SPANISH  
dc.subject
TEXT CLASSIFICATION  
dc.subject
SPANISH EHRS  
dc.subject
EPIDEMIC INTELLIGENCE  
dc.subject
EVENT-BASED SURVEILLANCE  
dc.subject
MACHINE LEARNING  
dc.subject
TRANSFORMERS  
dc.subject.classification
Information Science and Bioinformatics  
dc.subject.classification
Computer and Information Sciences  
dc.subject.classification
NATURAL AND EXACT SCIENCES  
dc.subject.classification
Health Policy and Services  
dc.subject.classification
Health Sciences  
dc.subject.classification
MEDICAL AND HEALTH SCIENCES  
dc.subject.classification
Epidemiology  
dc.subject.classification
Health Sciences  
dc.subject.classification
MEDICAL AND HEALTH SCIENCES  
dc.title
Low-cost algorithms for clinical notes phenotype classification to enhance epidemiological surveillance: A case study  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2025-06-25T11:51:01Z  
dc.journal.volume
166  
dc.journal.pagination
1-14  
dc.journal.pais
United States  
dc.description.fil
Fil: Petri, Javier. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina  
dc.description.fil
Fil: Barcena Barbeira, Pilar. Universidad de Buenos Aires. Facultad de Medicina. Departamento de Salud Pública; Argentina  
dc.description.fil
Fil: Pesce, Martina. Universidad de Buenos Aires. Facultad de Medicina. Departamento de Salud Pública; Argentina  
dc.description.fil
Fil: Xhardez, Verónica. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Centro Interdisciplinario de Estudios En Ciencia Tecnología E Innovación;  
dc.description.fil
Fil: Laje, Rodrigo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina  
dc.description.fil
Fil: Cotik, Viviana Erica. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina  
dc.journal.title
Journal Of Biomedical Informatics  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://linkinghub.elsevier.com/retrieve/pii/S1532046425000243  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1016/j.jbi.2025.104795