dc.contributor.author
Basgall, María José
dc.contributor.author
Naiouf, Ricardo Marcelo
dc.contributor.author
Fernández, Alberto
dc.date.available
2022-01-20T10:20:51Z
dc.date.issued
2021-08
dc.identifier.citation
Basgall, María José; Naiouf, Ricardo Marcelo; Fernández, Alberto; FDR2-BD: A fast data reduction recommendation tool for tabular big data classification problems; Molecular Diversity Preservation International; Electronics; 10; 15; 8-2021; 1-19
dc.identifier.issn
2079-9292
dc.identifier.uri
http://hdl.handle.net/11336/150370
dc.description.abstract
In this paper, we present FDR2-BD, a methodological data condensation approach for reducing tabular big datasets in classification problems. The key to our proposal is to analyze data in a dual way (vertical and horizontal), so as to provide a smart combination of feature selection, which generates dense clusters of data, and uniform sampling reduction, which keeps only a few representative samples from each problem area. Its main advantage is that the model’s predictive quality is kept within a range determined by a user-defined threshold. Its robustness is built on a hyper-parametrization process in which all data are taken into consideration by following a k-fold procedure. It is also fast and scalable, since it uses fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, thus outperforming state-of-the-art solutions such as FCNN_MR, which barely reach 70%. The most promising outcome is that the representativeness of the original data is maintained, with predictive quality values within around 1% of the baseline.
dc.format
application/pdf
dc.language.iso
eng
dc.publisher
Molecular Diversity Preservation International
dc.rights
info:eu-repo/semantics/openAccess
dc.rights.uri
https://creativecommons.org/licenses/by/2.5/ar/
dc.subject
APACHE SPARK
dc.subject
BIG DATA
dc.subject
CLASSIFICATION
dc.subject
DATA REDUCTION
dc.subject
PREPROCESSING TECHNIQUES
dc.subject.classification
Computer Science
dc.subject.classification
Computer and Information Sciences
dc.subject.classification
NATURAL AND EXACT SCIENCES
dc.title
FDR2-BD: A fast data reduction recommendation tool for tabular big data classification problems
dc.type
info:eu-repo/semantics/article
dc.type
info:ar-repo/semantics/artículo
dc.type
info:eu-repo/semantics/publishedVersion
dc.date.updated
2022-01-06T14:57:17Z
dc.journal.volume
10
dc.journal.number
15
dc.journal.pagination
1-19
dc.journal.pais
Switzerland
dc.description.fil
Affiliation: Basgall, María José. Universidad de Granada; Spain. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
dc.description.fil
Affiliation: Naiouf, Ricardo Marcelo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina
dc.description.fil
Affiliation: Fernández, Alberto. Universidad de Granada; Spain
dc.journal.title
Electronics
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/2079-9292/10/15/1757
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.3390/electronics10151757
Associated files