Outlier Mining Methods Based on Graph Structure Analysis

Amil, Pablo; Almeira, Nahuel; Masoller, Cristina

doi:10.3389/fphy.2019.00194


dc.contributor.author

Amil, Pablo

dc.contributor.author

Almeira, Nahuel

dc.contributor.author

Masoller, Cristina

dc.date.available

2021-02-17T16:09:18Z

dc.date.issued

2019-11-26

dc.identifier.citation

Amil, Pablo; Almeira, Nahuel; Masoller, Cristina; Outlier Mining Methods Based on Graph Structure Analysis; Frontiers Media S.A.; Frontiers in Physics; 7; 194; 26-11-2019; 1-11

dc.identifier.issn

2296-424X

dc.identifier.uri

http://hdl.handle.net/11336/125807

dc.description.abstract

Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that also has practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset, and the links have associated weights that are the distances between the nodes. Then, the first method assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm, and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that it is parameter-free and therefore does not require any training; on the other hand, the IsoMap method has two integer parameters, and when they are appropriately selected, the method performs similarly to or better than all the other methods tested.
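The percolation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact scoring rule, so the code assumes that links are removed from heaviest to lightest and that an element's score is the weight of the link whose removal separates it from the largest connected component. The function names `percolation_scores` and `_components`, the Euclidean distance, and the brute-force component search (suitable only for small datasets) are all choices made for this illustration.

```python
import math
from itertools import combinations


def _components(nodes, neighbours):
    """Return the connected components (as sets) of the graph restricted to `nodes`."""
    unseen = set(nodes)
    components = []
    while unseen:
        start = unseen.pop()
        component = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for other in neighbours[node]:
                if other in unseen:
                    unseen.remove(other)
                    component.add(other)
                    stack.append(other)
        components.append(component)
    return components


def percolation_scores(points):
    """Assumed rule: score = weight of the link whose removal detaches
    the element from the largest connected component."""
    n = len(points)
    # Complete graph: every pair of elements is linked, weighted by distance.
    edges = sorted(
        ((math.dist(points[i], points[j]), i, j)
         for i, j in combinations(range(n), 2)),
        reverse=True,
    )
    neighbours = {i: set(range(n)) - {i} for i in range(n)}
    giant = set(range(n))  # current largest component
    scores = [0.0] * n

    for weight, i, j in edges:  # remove links from heaviest to lightest
        neighbours[i].discard(j)
        neighbours[j].discard(i)
        if i not in giant or j not in giant:
            continue  # this link was outside the giant component
        largest = max(_components(giant, neighbours), key=len)
        for node in giant - largest:
            scores[node] = weight  # node just detached at this weight
        giant = largest
        if len(giant) == 1:
            (last,) = giant
            scores[last] = weight
            break
    return scores


# Four points on a unit square plus one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
scores = percolation_scores(points)
# The distant point (index 4) detaches first and receives the largest score.
```

Under this assumed rule, elements in dense regions keep short links to the largest component until late in the removal process and so receive low scores, while an isolated element loses its last connection early, at a large weight. No parameter has to be chosen, which is consistent with the parameter-free property claimed in the abstract.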

dc.format

application/pdf

dc.language.iso

eng

dc.publisher

Frontiers Media S.A.

dc.rights

info:eu-repo/semantics/openAccess

dc.rights.uri

https://creativecommons.org/licenses/by/2.5/ar/

dc.subject

ANOMALY DETECTION

dc.subject

COMPLEX NETWORKS

dc.subject

MACHINE LEARNING

dc.subject

OUTLIER MINING

dc.subject

PERCOLATION

dc.subject

SUPERVISED LEARNING

dc.subject

UNSUPERVISED LEARNING

dc.subject.classification

Other Computer and Information Sciences

dc.subject.classification

Computer and Information Sciences

dc.subject.classification

NATURAL AND EXACT SCIENCES

dc.title

Outlier Mining Methods Based on Graph Structure Analysis

dc.type

info:eu-repo/semantics/article

dc.type

info:ar-repo/semantics/artículo

dc.type

info:eu-repo/semantics/publishedVersion

dc.date.updated

2020-12-01T16:24:10Z

dc.journal.volume

7

dc.journal.number

194

dc.journal.pagination

1-11

dc.journal.pais

Switzerland

dc.description.fil

Affiliation: Amil, Pablo. Universitat Politecnica de Catalunya; Spain

dc.description.fil

Affiliation: Almeira, Nahuel. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomia y Física. Sección Física. Grupo de Teoria de la Materia Condensada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Instituto de Física Enrique Gaviola. Universidad Nacional de Córdoba. Instituto de Física Enrique Gaviola; Argentina

dc.description.fil

Affiliation: Masoller, Cristina. Universitat Politecnica de Catalunya; Spain

dc.journal.title

Frontiers in Physics

dc.relation.alternativeid

info:eu-repo/semantics/altIdentifier/url/https://www.frontiersin.org/article/10.3389/fphy.2019.00194/full

dc.relation.alternativeid

info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.3389/fphy.2019.00194

Associated files

Size: 2.704Mb

Format: PDF
