Identifying Highly Relevant Entries in Datasets: A Relevance-Based Classification

Delbianco, Fernando Andrés; Tohmé, Fernando Abel

doi:10.1007/s00357-025-09513-6

Article

Publication date: 07/2025

Publisher: Springer

Journal: Journal of Classification

ISSN: 0176-4268

Language: English

Resource type: Published article

Subject classification:

Statistics and Probability

Abstract

In this paper, we present a methodology to classify entries in datasets based on their relevance for answering specific queries. It employs a repeated individualized inference approach to identify entries with significant Shapley values, that is, entries that contribute to accurate answers to queries about other entries in the dataset. This information is captured in three matrices: a general relevance matrix, a Shapley value matrix, and a significant Shapley value matrix. Since information in datasets is usually non-homogeneously distributed, relevance is often concentrated in a few entries. This is observed, in particular, in a representative case study.
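As a rough illustration of what a Shapley value matrix over dataset entries could look like, the sketch below computes exact Shapley values on a toy dataset. Everything in it is an assumption made for illustration and is not taken from the paper: the data in `DATA`, the nearest-neighbour predictor, the negative squared error used as the value function, the baseline prediction of 0.0 for an empty coalition, and the function names `shapley_values`, `nn_value`, and `shapley_matrix`. The significance step mentioned in the abstract is not reproduced, since the abstract does not describe it.

```python
from itertools import permutations
from math import factorial

# Toy dataset of (x, y) pairs; purely illustrative, not from the paper.
DATA = [(0.0, 1.0), (0.1, 1.1), (0.3, 0.9), (5.0, 9.0)]


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = []
        prev = value(frozenset())
        for p in order:
            coalition.append(p)
            cur = value(frozenset(coalition))
            phi[p] += cur - prev
            prev = cur
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}


def nn_value(data, target):
    """Hypothetical value function for a query about entry `target`.

    The coalition predicts the target's y with the y of its member that is
    nearest in x; the value is the negative squared prediction error.
    An empty coalition falls back to a baseline prediction of 0.0 (assumption).
    """
    x_t, y_t = data[target]

    def value(coalition):
        if not coalition:
            pred = 0.0
        else:
            nearest = min(sorted(coalition), key=lambda j: abs(data[j][0] - x_t))
            pred = data[nearest][1]
        return -((pred - y_t) ** 2)

    return value


def shapley_matrix(data):
    """Row i holds the Shapley values of the other entries for the query on entry i."""
    n = len(data)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = shapley_values(others, nn_value(data, i))
        for j, v in phi.items():
            matrix[i][j] = v
    return matrix


if __name__ == "__main__":
    for row in shapley_matrix(DATA):
        print([round(v, 3) for v in row])
```

Exact enumeration grows factorially with the number of entries, so a sketch like this is only practical for very small datasets; larger ones would need a sampling-based approximation.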

Keywords: Conformal prediction, Individualized inference, Synthetic data, Shapley values

Associated files

Size: 1.298Mb

Format: PDF

License

Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)

Identifiers

URI: http://hdl.handle.net/11336/268335

URL: https://link.springer.com/10.1007/s00357-025-09513-6

DOI: http://dx.doi.org/10.1007/s00357-025-09513-6

Collections

Articles (INMABB)
Articles from INST.DE MATEMATICA BAHIA BLANCA (I)

Citation

Delbianco, Fernando Andrés; Tohmé, Fernando Abel; Identifying Highly Relevant Entries in Datasets: A Relevance-Based Classification; Springer; Journal Of Classification; 7-2025; 1-21