Repositorio Institucional
Repositorio Institucional
CONICET Digital
  • Inicio
  • EXPLORAR
    • AUTORES
    • DISCIPLINAS
    • COMUNIDADES
  • Estadísticas
  • Novedades
    • Noticias
    • Boletines
  • Ayuda
    • General
    • Datos de investigación
  • Acerca de
    • CONICET Digital
    • Equipo
    • Red Federal
  • Contacto
JavaScript is disabled for your browser. Some features of this site may not work without it.
  • INFORMACIÓN GENERAL
  • RESUMEN
  • ESTADISTICAS
 
Artículo

First Steps towards Data-Driven Adversarial Deduplication

Paredes, José NicolásIcon ; Simari, GerardoIcon ; Martinez, Maria VaninaIcon ; Falappa, Marcelo AlejandroIcon
Fecha de publicación: 27/07/2018
Editorial: MDPI AG
Revista: Information (Switzerland)
ISSN: 2078-2489
Idioma: Inglés
Tipo de recurso: Artículo publicado
Clasificación temática:
Ciencias de la Computación

Resumen

In traditional databases, the entity resolution problem (which is also known as deduplication)refers to the task of mapping multiple manifestations of virtual objects totheir corresponding real-worldentities. When addressing this problem, in both theory and practice, it is widely assumed that suchsets of virtual objects appear as the result of clerical errors, transliterations, missing or updatedattributes, abbreviations, and so forth. In this paper, we address this problem under the assumptionthat this situation is caused by malicious actors operating in domains in which they do not wishto be identified, such as hacker forums and markets in which the participants are motivated toremain semi-anonymous (though they wish to keep their true identities secret, they find it useful forcustomers to identify their products and services). We are therefore in the presence of a different, andeven more challenging, problem that we refer to as adversarial deduplication. In this paper, we studythis problem via examples that arise from real-world data on malicious hacker forums and marketsarising from collaborations with a cyber threat intelligence company focusing on understanding thiskind of behavior. We argue that it is very difficult—if not impossible—to find ground truth data onwhich to build solutions to this problem, and develop a set of preliminary experiments based ontraining machine learning classifiers that leverage text analysis to detect potential cases of duplicateentities. Our results are encouraging as a first step towards building tools that human analysts canuse to enhance their capabilities towards fighting cyber threats.
Palabras clave: ADVERSARIAL DEDUPLICATION , CYBER THREAT INTELLIGENCE , MACHINE LEARNING CLASSIFIERS
Ver el registro completo
 
Archivos asociados
Thumbnail
 
Tamaño: 1.412Mb
Formato: PDF
.
Descargar
Licencia
info:eu-repo/semantics/openAccess Excepto donde se diga explícitamente, este item se publica bajo la siguiente descripción: Creative Commons Attribution 2.5 Unported (CC BY 2.5)
Identificadores
URI: http://hdl.handle.net/11336/89020
URL: https://www.mdpi.com/2078-2489/9/8/189
DOI: http://dx.doi.org/10.3390/info9080189
Colecciones
Articulos(CCT - BAHIA BLANCA)
Articulos de CTRO.CIENTIFICO TECNOL.CONICET - BAHIA BLANCA
Citación
Paredes, José Nicolás; Simari, Gerardo; Martinez, Maria Vanina; Falappa, Marcelo Alejandro; First Steps towards Data-Driven Adversarial Deduplication; MDPI AG; Information (Switzerland); 9; 8; 27-7-2018; 189-204
Compartir
Altmétricas
 

Enviar por e-mail
Separar cada destinatario (hasta 5) con punto y coma.
  • Facebook
  • X Conicet Digital
  • Instagram
  • YouTube
  • Sound Cloud
  • LinkedIn

Los contenidos del CONICET están licenciados bajo Creative Commons Reconocimiento 2.5 Argentina License

https://www.conicet.gov.ar/ - CONICET

Inicio

Explorar

  • Autores
  • Disciplinas
  • Comunidades

Estadísticas

Novedades

  • Noticias
  • Boletines

Ayuda

Acerca de

  • CONICET Digital
  • Equipo
  • Red Federal

Contacto

Godoy Cruz 2290 (C1425FQB) CABA – República Argentina – Tel: +5411 4899-5400 repositorio@conicet.gov.ar
TÉRMINOS Y CONDICIONES