dc.contributor.author
Amil, Pablo
dc.contributor.author
Almeira, Nahuel

dc.contributor.author
Masoller, Cristina
dc.date.available
2021-02-17T16:09:18Z
dc.date.issued
2019-11-26
dc.identifier.citation
Amil, Pablo; Almeira, Nahuel; Masoller, Cristina; Outlier Mining Methods Based on Graph Structure Analysis; Frontiers Media S.A.; Frontiers in Physics; 7; 194; 26-11-2019; 1-11
dc.identifier.issn
2296-424X
dc.identifier.uri
http://hdl.handle.net/11336/125807
dc.description.abstract
Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that also has practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient only for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset and the links have associated weights that are the distances between the nodes. The first method then assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that it is parameter-free and therefore does not require any training; the IsoMap method, on the other hand, has two integer parameters and, when they are appropriately selected, performs similarly to or better than all the other methods tested.
dc.format
application/pdf
dc.language.iso
eng
dc.publisher
Frontiers Media S.A.

dc.rights
info:eu-repo/semantics/openAccess
dc.rights.uri
https://creativecommons.org/licenses/by/2.5/ar/
dc.subject
ANOMALY DETECTION
dc.subject
COMPLEX NETWORKS
dc.subject
MACHINE LEARNING
dc.subject
OUTLIER MINING
dc.subject
PERCOLATION
dc.subject
SUPERVISED LEARNING
dc.subject
UNSUPERVISED LEARNING
dc.subject.classification
Other Computer and Information Sciences

dc.subject.classification
Computer and Information Sciences

dc.subject.classification
NATURAL AND EXACT SCIENCES

dc.title
Outlier Mining Methods Based on Graph Structure Analysis
dc.type
info:eu-repo/semantics/article
dc.type
info:ar-repo/semantics/artículo
dc.type
info:eu-repo/semantics/publishedVersion
dc.date.updated
2020-12-01T16:24:10Z
dc.journal.volume
7
dc.journal.number
194
dc.journal.pagination
1-11
dc.journal.pais
Switzerland

dc.description.fil
Fil: Amil, Pablo. Universitat Politecnica de Catalunya; Spain
dc.description.fil
Fil: Almeira, Nahuel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomia y Física. Sección Física. Grupo de Teoria de la Materia Condensada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Física Enrique Gaviola. Universidad Nacional de Córdoba. Instituto de Física Enrique Gaviola; Argentina
dc.description.fil
Fil: Masoller, Cristina. Universitat Politecnica de Catalunya; Spain
dc.journal.title
Frontiers in Physics
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://www.frontiersin.org/article/10.3389/fphy.2019.00194/full
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.3389/fphy.2019.00194
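
The percolation idea summarized in the abstract can be illustrated with a minimal sketch. It is one possible reading of that description, not the authors' implementation: the record does not state the scoring rule, so the sketch assumes that links are removed from longest to shortest and that an element's score is the weight of the link whose removal detaches it from the largest remaining component. Euclidean distance, the function names `percolation_outlier_scores` and `_components`, and the toy data are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def _components(nodes, neighbors):
    """Connected components of the subgraph induced by `nodes`."""
    unseen = set(nodes)
    components = []
    while unseen:
        start = unseen.pop()
        component = {start}
        stack = [start]
        while stack:
            current = stack.pop()
            for nxt in neighbors[current]:
                if nxt in unseen:
                    unseen.discard(nxt)
                    component.add(nxt)
                    stack.append(nxt)
        components.append(component)
    return components


def percolation_outlier_scores(points):
    """Assumed rule: score = weight of the link whose removal detaches
    the element from the largest (giant) component."""
    n = len(points)
    # Complete weighted graph; weight = Euclidean distance (an assumption).
    links = sorted(
        ((math.dist(points[i], points[j]), i, j)
         for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    neighbors = {i: set(range(n)) - {i} for i in range(n)}
    in_giant = set(range(n))
    scores = [0.0] * n

    # Fragment the graph by removing links from longest to shortest.
    for weight, i, j in links:
        neighbors[i].discard(j)
        neighbors[j].discard(i)
        if i not in in_giant or j not in in_giant:
            continue  # this link cannot affect the tracked component
        giant = max(_components(in_giant, neighbors), key=len)
        for node in in_giant - giant:
            scores[node] = weight  # node detached at this link weight
        in_giant = giant

    # Elements that never detached receive the smallest link weight.
    for node in in_giant:
        scores[node] = links[-1][0] if links else 0.0
    return scores


# Toy data: a 3x3 grid of points plus one distant point (index 9).
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = percolation_outlier_scores(points)
```

With this toy data the isolated point at (10, 10) receives the largest score, equal to its distance to the nearest grid point. The sketch recomputes components after every removal, which is only practical for small datasets; a union-find structure would be a natural choice for larger ones.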