CONICET Digital Institutional Repository
Article

Dataset of clinical cases, images, image labels and captions from open access case reports from PubMed Central (1990–2023)

Nievas Offidani, Mauro Andrés; Delrieux, Claudio Augusto
Publication date: 23/12/2023
Publisher: Elsevier
Journal: Data in Brief
ISSN: 2352-3409
Language: English
Resource type: Published article
Subject classification:
Other Computer and Information Sciences

Abstract

This paper details the acquisition, structure and preprocessing of the MultiCaRe Dataset, a multimodal case report dataset containing data from 75,382 open access PubMed Central articles published between 1990 and 2023. The dataset includes 96,428 clinical cases, 135,596 images, and their corresponding labels and captions. Data were extracted using several APIs and packages, including Biopython, requests, BeautifulSoup, the BioC API for PMC and the Europe PMC RESTful API. Image labels were created from the contents of the corresponding captions, using Spark NLP for Healthcare and manual annotations. Images were preprocessed with OpenCV to remove borders and to split figures containing multiple images. The data were then analyzed and described, and a randomly selected subset was used for quality assessment. The dataset's structure allows for seamless integration of different types of data, making it a valuable resource for training or fine-tuning medical language, computer vision or multimodal models.
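The border-removal step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' OpenCV pipeline: it uses plain Python lists in place of image arrays, assumes a grayscale image with a near-uniform border, and the names `content_bounds`, `remove_border`, `border_value` and `tolerance` are hypothetical, introduced here only for illustration.

```python
from typing import List, Tuple

# Assumption: a grayscale image is a list of rows of 0-255 pixel values.
Image = List[List[int]]


def content_bounds(
    image: Image, border_value: int = 0, tolerance: int = 10
) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) of the non-border region.

    A pixel counts as border when it lies within `tolerance` of
    `border_value`. `bottom` and `right` are exclusive indices.
    An image that is entirely border yields (0, 0, 0, 0).
    """

    def is_content(pixel: int) -> bool:
        return abs(pixel - border_value) > tolerance

    # Rows that contain at least one content pixel.
    rows = [r for r, row in enumerate(image) if any(is_content(p) for p in row)]
    if not rows:
        return (0, 0, 0, 0)

    # Columns that contain at least one content pixel.
    width = len(image[0])
    cols = [c for c in range(width) if any(is_content(row[c]) for row in image)]
    return (rows[0], rows[-1] + 1, cols[0], cols[-1] + 1)


def remove_border(
    image: Image, border_value: int = 0, tolerance: int = 10
) -> Image:
    """Crop the image to the bounding box of its content pixels."""
    top, bottom, left, right = content_bounds(image, border_value, tolerance)
    return [row[left:right] for row in image[top:bottom]]


# A 4x5 image with a black frame around a 2x2 block of content.
framed = [
    [0, 0, 0, 0, 0],
    [0, 120, 200, 0, 0],
    [0, 90, 255, 0, 0],
    [0, 0, 0, 0, 0],
]
cropped = remove_border(framed)
print(cropped)  # [[120, 200], [90, 255]]
```

For a white frame, the same sketch can be called with `border_value=255`. Real figures usually need a larger `tolerance` to absorb compression noise near the edges.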
Keywords: Multimodal, Medical, Healthcare, Radiology
Associated files
Size: 1.977 MB
Format: PDF
License
info:eu-repo/semantics/openAccess. Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution 2.5 Unported (CC BY 2.5)
Identifiers
URI: http://hdl.handle.net/11336/278085
URL: https://linkinghub.elsevier.com/retrieve/pii/S2352340923010351
DOI: http://dx.doi.org/10.1016/j.dib.2023.110008
Collections
Articles (ICIC)
Articles of the INSTITUTO DE CS. E INGENIERIA DE LA COMPUTACION
Citation
Nievas Offidani, Mauro Andrés; Delrieux, Claudio Augusto; Dataset of clinical cases, images, image labels and captions from open access case reports from PubMed Central (1990–2023); Elsevier; Data in Brief; 52; 110008; 23-12-2023; 1-10