Class imbalance on medical image classification: towards better evaluation practices for discrimination and calibration performance

Mosquera, Candelaria; Ferrer, Luciana; Milone, Diego Humberto; Luna, Daniel; Ferrante, Enzo

doi:10.1007/s00330-024-10834-0

Article

Publication date: 06/2024

Publisher: Springer

Journal: European Radiology

e-ISSN: 1432-1084

Language: English

Resource type: Published article

Subject classification:

Information Sciences and Bioinformatics

Abstract

This work aims to assess standard evaluation practices used by the research community for evaluating medical imaging classifiers, with a specific focus on the implications of class imbalance. The analysis is performed on chest X-rays as a case study and encompasses a comprehensive model performance definition, considering both discriminative capabilities and model calibration.

We conduct a concise literature review to examine prevailing scientific practices used when evaluating X-ray classifiers. Then, we perform a systematic experiment on two major chest X-ray datasets to showcase a didactic example of the behavior of several performance metrics under different class ratios and highlight how widely adopted metrics can conceal performance in the minority class.

Our literature study confirms that: (1) even when dealing with highly imbalanced datasets, the community tends to use metrics that are dominated by the majority class; and (2) it is still uncommon to include calibration studies for chest X-ray classifiers, despite their importance in the context of healthcare. Moreover, our systematic experiments confirm that current evaluation practices may not reflect model performance in real clinical scenarios and suggest complementary metrics to better reflect the performance of the system in such scenarios.

Our analysis underscores the need for enhanced evaluation practices, particularly in the context of class-imbalanced chest X-ray classifiers. We recommend the inclusion of complementary metrics such as the area under the precision-recall curve (AUC-PR), adjusted AUC-PR, and balanced Brier score, to offer a more accurate depiction of system performance in real clinical scenarios, considering metrics that reflect both discrimination and calibration performance.

This study underscores the critical need for refined evaluation metrics in medical imaging classifiers, emphasizing that prevalent metrics may mask poor performance in minority classes, potentially impacting clinical diagnoses and healthcare outcomes.
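The recommended metrics can be illustrated with a minimal NumPy sketch. The abstract does not define them, so the formulations below are assumptions rather than the authors' definitions: AUC-PR is approximated by step-wise average precision (assuming no tied scores), the adjusted AUC-PR rescales it so that a chance-level classifier (AUC-PR equal to the prevalence) scores 0, and the balanced Brier score averages the per-class Brier scores so that each class carries equal weight. All function names are hypothetical.

```python
import numpy as np


def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    order = np.argsort(-y_score, kind="stable")   # rank cases by decreasing score
    hits = y_true[order]
    precision_at_k = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    # Average the precision values observed at the rank of each positive case.
    return float(np.sum(precision_at_k * hits) / hits.sum())


def adjusted_auc_pr(y_true, y_score):
    """Assumed adjustment: 0 at chance level (prevalence), 1 for a perfect ranking."""
    prevalence = float(np.mean(y_true))
    return (average_precision(y_true, y_score) - prevalence) / (1.0 - prevalence)


def brier_score(y_true, y_prob):
    """Standard Brier score: mean squared error of the predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def balanced_brier_score(y_true, y_prob):
    """Assumed formulation: unweighted mean of the Brier score within each class."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    sq_err = (y_prob - y_true) ** 2
    return float(0.5 * (sq_err[y_true == 1].mean() + sq_err[y_true == 0].mean()))


# Toy data with 10% prevalence and a model that always predicts probability 0.
labels = np.array([1] + [0] * 9)
always_negative = np.zeros(10)
print(brier_score(labels, always_negative))           # 0.1
print(balanced_brier_score(labels, always_negative))  # 0.5
```

In this toy case the standard Brier score looks good because it is dominated by the nine negative cases, while the balanced variant exposes that every positive case is assigned probability 0. This is the kind of masking effect the abstract describes.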

Keywords: Deep learning, Computer-assisted diagnosis, X-rays, Prevalence

Associated files

Size: 1.118Mb

Format: PDF

License

Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)

Identifiers

URI: http://hdl.handle.net/11336/258289

URL: https://link.springer.com/10.1007/s00330-024-10834-0

DOI: http://dx.doi.org/10.1007/s00330-024-10834-0

Collections

Articles (ICC)
Articles from INSTITUTO DE INVESTIGACION EN CIENCIAS DE LA COMPUTACION

Articles (SINC(I))
Articles from INST. DE INVESTIGACION EN SEÑALES, SISTEMAS E INTELIGENCIA COMPUTACIONAL

Citation

Mosquera, Candelaria; Ferrer, Luciana; Milone, Diego Humberto; Luna, Daniel; Ferrante, Enzo; Class imbalance on medical image classification: towards better evaluation practices for discrimination and calibration performance; Springer; European Radiology; 6-2024; 1-9