dc.contributor.author
Rolando, Matias  
dc.contributor.author
Raggio, Victor  
dc.contributor.author
Naya, Hugo  
dc.contributor.author
Spangenberg, Lucia  
dc.contributor.author
Cagnina, Leticia Cecilia  
dc.date.available
2025-10-09T14:11:56Z  
dc.date.issued
2025-02  
dc.identifier.citation
Rolando, Matias; Raggio, Victor; Naya, Hugo; Spangenberg, Lucia; Cagnina, Leticia Cecilia; A labeled medical records corpus for the timely detection of rare diseases using machine learning approaches; Nature; Scientific Reports; 15; 1; 2-2025; 1-10  
dc.identifier.issn
2045-2322  
dc.identifier.uri
http://hdl.handle.net/11336/273229  
dc.description.abstract
Rare diseases (RDs) are a group of pathologies that individually affect fewer than 1 in 2000 people but collectively impact around 7% of the world’s population. Most of them affect children, are chronic and progressive, and have no specific treatment. RD patients face diagnostic challenges, with an average diagnosis time of 5 years, multiple specialist visits, and invasive procedures. This ‘diagnostic odyssey’ can be detrimental to their health. Machine learning (ML) has the potential to improve healthcare by providing more personalized and accurate patient management, diagnoses, and, in some cases, treatments. Leveraging the MIMIC-III database and additional medical notes from different sources such as in-house data, PubMed, and ChatGPT, we propose a labeled dataset for early RD detection in hospital settings. Applying various supervised ML methods, including logistic regression, decision trees, support vector machines (SVM), deep learning methods (LSTM and CNN), and Transformers (BERT), we validated the use of the proposed resource, achieving a 92.7% F-measure and a 96% AUC using SVM. These findings highlight the potential of ML in redirecting RD patients towards more accurate diagnostic pathways and present a corpus that can be used for future development and refinement.  
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
Nature  
dc.rights
info:eu-repo/semantics/openAccess  
dc.rights.uri
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/  
dc.subject
RARE DISEASES  
dc.subject
MACHINE LEARNING  
dc.subject
ARTIFICIAL CORPUS  
dc.subject.classification
Information Sciences and Bioinformatics  
dc.subject.classification
Computer and Information Sciences  
dc.subject.classification
NATURAL AND EXACT SCIENCES  
dc.title
A labeled medical records corpus for the timely detection of rare diseases using machine learning approaches  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2025-10-08T09:59:15Z  
dc.journal.volume
15  
dc.journal.number
1  
dc.journal.pagination
1-10  
dc.journal.pais
United Kingdom  
dc.description.fil
Fil: Rolando, Matias. Instituto Pasteur de Montevideo; Uruguay  
dc.description.fil
Fil: Raggio, Victor. Universidad de la República; Uruguay  
dc.description.fil
Fil: Naya, Hugo. Universidad de la República; Uruguay. Instituto Pasteur de Montevideo; Uruguay  
dc.description.fil
Fil: Spangenberg, Lucia. Universidad de la República; Uruguay. Instituto Pasteur de Montevideo; Uruguay  
dc.description.fil
Fil: Cagnina, Leticia Cecilia. Universidad Nacional de San Luis. Facultad de Ciencias Físico Matemáticas y Naturales. Departamento de Informática. Laboratorio Investigación y Desarrollo en Inteligencia Computacional; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Luis; Argentina  
dc.journal.title
Scientific Reports  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://www.nature.com/articles/s41598-025-90450-0  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1038/s41598-025-90450-0