
dc.contributor.author
Basgall, María José  
dc.contributor.author
Naiouf, Ricardo Marcelo  
dc.contributor.author
Fernández, Alberto  
dc.date.available
2022-01-20T10:20:51Z  
dc.date.issued
2021-08  
dc.identifier.citation
Basgall, María José; Naiouf, Ricardo Marcelo; Fernández, Alberto; FDR2-BD: A fast data reduction recommendation tool for tabular big data classification problems; Molecular Diversity Preservation International; Electronics; 10; 15; 8-2021; 1-19  
dc.identifier.issn
2079-9292  
dc.identifier.uri
http://hdl.handle.net/11336/150370  
dc.description.abstract
This paper presents FDR2-BD, a methodological data condensation approach for reducing tabular big datasets in classification problems. The key to the proposal is to analyze the data in two directions (vertical and horizontal), combining feature selection, which generates dense clusters of data, with uniform sampling reduction, which keeps only a few representative samples from each problem area. Its main advantage is that the model's predictive quality is kept within a range determined by a user-defined threshold. Its robustness is built on a hyper-parametrization process in which all data are taken into consideration by following a k-fold procedure. It is also fast and scalable, because it uses fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, outperforming state-of-the-art solutions such as FCNN_MR, which barely reach 70%. The most promising outcome is that the representativeness of the original data is maintained, with prediction quality values within about 1% of the baseline.  
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
Molecular Diversity Preservation International  
dc.rights
info:eu-repo/semantics/openAccess  
dc.rights.uri
https://creativecommons.org/licenses/by/2.5/ar/  
dc.subject
APACHE SPARK  
dc.subject
BIG DATA  
dc.subject
CLASSIFICATION  
dc.subject
DATA REDUCTION  
dc.subject
PREPROCESSING TECHNIQUES  
dc.subject.classification
Computer Science  
dc.subject.classification
Computer and Information Sciences  
dc.subject.classification
NATURAL AND EXACT SCIENCES  
dc.title
FDR2-BD: A fast data reduction recommendation tool for tabular big data classification problems  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2022-01-06T14:57:17Z  
dc.journal.volume
10  
dc.journal.number
15  
dc.journal.pagination
1-19  
dc.journal.pais
Switzerland  
dc.description.fil
Affiliation: Basgall, María José. Universidad de Granada; Spain. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina  
dc.description.fil
Affiliation: Naiouf, Ricardo Marcelo. Universidad Nacional de La Plata. Facultad de Informática. Instituto de Investigación en Informática Lidi; Argentina  
dc.description.fil
Affiliation: Fernández, Alberto. Universidad de Granada; Spain  
dc.journal.title
Electronics  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://www.mdpi.com/2079-9292/10/15/1757  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.3390/electronics10151757
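
The abstract describes a recommendation that keeps predictive quality within a user threshold while sampling each problem area uniformly. The following is a minimal sketch of that general idea only, not the authors' FDR2-BD implementation: it omits feature selection, the k-fold hyper-parametrization, and Apache Spark, and it treats each class as a "problem area". The names `stratified_uniform_sample`, `recommend_reduction`, and the `evaluate` callback are hypothetical and introduced here for illustration.

```python
import random
from collections import defaultdict


def stratified_uniform_sample(rows, labels, keep_fraction, seed=0):
    """Uniformly sample keep_fraction of the rows within each class.

    Assumption: classes stand in for the "problem areas" of the abstract.
    At least one row per class is always kept.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    kept = []
    for indices in by_class.values():
        n_keep = max(1, round(len(indices) * keep_fraction))
        kept.extend(rng.sample(indices, n_keep))
    kept.sort()
    return [rows[i] for i in kept], [labels[i] for i in kept]


def recommend_reduction(rows, labels, evaluate, threshold,
                        fractions=(0.01, 0.05, 0.1, 0.25, 0.5)):
    """Return the smallest tried keep fraction whose quality loss is within threshold.

    evaluate(rows, labels) is a hypothetical user-supplied callback that
    trains a model on the given data and returns a quality score
    (higher is better) measured on data the caller holds out.
    """
    baseline = evaluate(rows, labels)
    for fraction in sorted(fractions):
        reduced_rows, reduced_labels = stratified_uniform_sample(
            rows, labels, fraction)
        score = evaluate(reduced_rows, reduced_labels)
        if baseline - score <= threshold:
            return fraction, score, baseline
    # No tried fraction met the threshold: recommend keeping all data.
    return 1.0, baseline, baseline
```

A keep fraction of 0.05 corresponds to a 95% reduction. In practice `evaluate` would wrap the training and scoring of a real classifier; the sketch leaves that choice to the caller.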