Repositorio Institucional
Repositorio Institucional
CONICET Digital
  • Inicio
  • EXPLORAR
    • AUTORES
    • DISCIPLINAS
    • COMUNIDADES
  • Estadísticas
  • Novedades
    • Noticias
    • Boletines
  • Ayuda
    • General
    • Datos de investigación
  • Acerca de
    • CONICET Digital
    • Equipo
    • Red Federal
  • Contacto
JavaScript is disabled for your browser. Some features of this site may not work without it.
  • INFORMACIÓN GENERAL
  • RESUMEN
  • ESTADISTICAS
 
Artículo

Outlier Mining Methods Based on Graph Structure Analysis

Amil, Pablo; Almeira, NahuelIcon ; Masoller, Cristina
Fecha de publicación: 26/11/2019
Editorial: Frontiers Media S.A.
Revista: Frontiers in Physics
ISSN: 2296-424X
Idioma: Inglés
Tipo de recurso: Artículo publicado
Clasificación temática:
Otras Ciencias de la Computación e Información

Resumen

Outlier detection in high-dimensional datasets is a fundamental and challenging problem across disciplines that has also practical implications, as removing outliers from the training set improves the performance of machine learning algorithms. While many outlier mining algorithms have been proposed in the literature, they tend to be valid or efficient for specific types of datasets (time series, images, videos, etc.). Here we propose two methods that can be applied to generic datasets, as long as there is a meaningful measure of distance between pairs of elements of the dataset. Both methods start by defining a graph, where the nodes are the elements of the dataset, and the links have associated weights that are the distances between the nodes. Then, the first method assigns an outlier score based on the percolation (i.e., the fragmentation) of the graph. The second method uses the popular IsoMap non-linear dimensionality reduction algorithm, and assigns an outlier score by comparing the geodesic distances with the distances in the reduced space. We test these algorithms on real and synthetic datasets and show that they either outperform, or perform on par with other popular outlier detection methods. A main advantage of the percolation method is that is parameter free and therefore, it does not require any training; on the other hand, the IsoMap method has two integer number parameters, and when they are appropriately selected, the method performs similar to or better than all the other methods tested.
Palabras clave: ANOMALY DETECTION , COMPLEX NETWORKS , MACHINE LEARNING , OUTLIER MINING , PERCOLATION , SUPERVISED LEARNING , UNSUPERVISED LEARNING
Ver el registro completo
 
Archivos asociados
Thumbnail
 
Tamaño: 2.704Mb
Formato: PDF
.
Descargar
Licencia
info:eu-repo/semantics/openAccess Excepto donde se diga explícitamente, este item se publica bajo la siguiente descripción: Creative Commons Attribution 2.5 Unported (CC BY 2.5)
Identificadores
URI: http://hdl.handle.net/11336/125807
URL: https://www.frontiersin.org/article/10.3389/fphy.2019.00194/full
DOI: http://dx.doi.org/10.3389/fphy.2019.00194
Colecciones
Articulos(IFEG)
Articulos de INST.DE FISICA ENRIQUE GAVIOLA
Citación
Amil, Pablo; Almeira, Nahuel; Masoller, Cristina; Outlier Mining Methods Based on Graph Structure Analysis; Frontiers Media S.A.; Frontiers in Physics; 7; 194; 26-11-2019; 1-11
Compartir
Altmétricas
 

Enviar por e-mail
Separar cada destinatario (hasta 5) con punto y coma.
  • Facebook
  • X Conicet Digital
  • Instagram
  • YouTube
  • Sound Cloud
  • LinkedIn

Los contenidos del CONICET están licenciados bajo Creative Commons Reconocimiento 2.5 Argentina License

https://www.conicet.gov.ar/ - CONICET

Inicio

Explorar

  • Autores
  • Disciplinas
  • Comunidades

Estadísticas

Novedades

  • Noticias
  • Boletines

Ayuda

Acerca de

  • CONICET Digital
  • Equipo
  • Red Federal

Contacto

Godoy Cruz 2290 (C1425FQB) CABA – República Argentina – Tel: +5411 4899-5400 repositorio@conicet.gov.ar
TÉRMINOS Y CONDICIONES