
dc.contributor.author
Nievas Offidani, Mauro Andrés  
dc.contributor.author
Delrieux, Claudio Augusto  
dc.date.available
2025-12-18T10:32:27Z  
dc.date.issued
2023-12-23  
dc.identifier.citation
Nievas Offidani, Mauro Andrés; Delrieux, Claudio Augusto; Dataset of clinical cases, images, image labels and captions from open access case reports from PubMed Central (1990–2023); Elsevier; Data in Brief; 52; 110008; 23-12-2023; 1-10  
dc.identifier.issn
2352-3409  
dc.identifier.uri
http://hdl.handle.net/11336/278085  
dc.description.abstract
This paper details the acquisition, structure and preprocessing of the MultiCaRe Dataset, a multimodal case report dataset containing data from 75,382 open access PubMed Central articles spanning 1990 to 2023. The dataset includes 96,428 clinical cases, 135,596 images, and their corresponding labels and captions. Data extraction was performed using APIs and packages such as Biopython, requests, BeautifulSoup, the BioC API for PMC and the Europe PMC RESTful API. Image labels were created from the contents of their corresponding captions, using Spark NLP for Healthcare and manual annotations. Images were preprocessed with OpenCV to remove borders and to split figures containing multiple images. The data were analyzed and described, and a subset was randomly selected for quality assessment. The dataset's structure allows for seamless integration of different types of data, making it a valuable resource for training or fine-tuning medical language, computer vision or multimodal models.  
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
Elsevier  
dc.rights
info:eu-repo/semantics/openAccess  
dc.rights.uri
https://creativecommons.org/licenses/by/2.5/ar/  
dc.subject
Multimodal  
dc.subject
Medical  
dc.subject
Healthcare  
dc.subject
Radiology  
dc.subject.classification
Other Computer and Information Sciences  
dc.subject.classification
Computer and Information Sciences  
dc.subject.classification
NATURAL AND EXACT SCIENCES  
dc.title
Dataset of clinical cases, images, image labels and captions from open access case reports from PubMed Central (1990–2023)  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2025-11-03T15:20:18Z  
dc.journal.volume
52  
dc.journal.number
110008  
dc.journal.pagination
1-10  
dc.journal.pais
Netherlands  
dc.journal.ciudad
Amsterdam  
dc.description.fil
Affiliation: Nievas Offidani, Mauro Andrés. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras; Argentina  
dc.description.fil
Affiliation: Delrieux, Claudio Augusto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Eléctrica y de Computadoras; Argentina  
dc.journal.title
Data in Brief  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://linkinghub.elsevier.com/retrieve/pii/S2352340923010351  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1016/j.dib.2023.110008