Repositorio Institucional
Repositorio Institucional
CONICET Digital
  • Inicio
  • EXPLORAR
    • AUTORES
    • DISCIPLINAS
    • COMUNIDADES
  • Estadísticas
  • Novedades
    • Noticias
    • Boletines
  • Ayuda
    • General
    • Datos de investigación
  • Acerca de
    • CONICET Digital
    • Equipo
    • Red Federal
  • Contacto
JavaScript is disabled for your browser. Some features of this site may not work without it.
  • INFORMACIÓN GENERAL
  • RESUMEN
  • ESTADISTICAS
 
Artículo

Fake opinion detection: how similar are crowdsourced datasets to real data?

Fornaciari, Tommaso; Cagnina, Leticia CeciliaIcon ; Rosso, Paolo; Poesio, Massimo
Fecha de publicación: 12/2020
Editorial: Springer
Revista: Language Resources And Evaluation
ISSN: 1574-020X
e-ISSN: 1574-0218
Idioma: Inglés
Tipo de recurso: Artículo publicado
Clasificación temática:
Ciencias de la Computación

Resumen

Identifying deceptive online reviews is a challenging tasks for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because normally it is not possible to know whether reviews are genuine. A common workaround involves collecting (supposedly) truthful reviews online and adding them to a set of deceptive reviews obtained through crowdsourcing services. Models trained this way are generally successful at discriminating between ‘genuine’ online reviews and the crowdsourced deceptive reviews. It has been argued that the deceptive reviews obtained via crowdsourcing are very different from real fake reviews, but the claim has never been properly tested. In this paper, we compare (false) crowdsourced reviews with a set of ‘real’ fake reviews published on line. We evaluate their degree of similarity and their usefulness in training models for the detection of untrustworthy reviews. We find that the deceptive reviews collected via crowdsourcing are significantly different from the fake reviews published online. In the case of the artificially produced deceptive texts, it turns out that their domain similarity with the targets affects the models’ performance, much more than their untruthfulness. This suggests that the use of crowdsourced datasets for opinion spam detection may not result in models applicable to the real task of detecting deceptive reviews. As an alternative method to create large-size datasets for the fake reviews detection task, we propose methods based on the probabilistic annotation of unlabeled texts, relying on the use of meta-information generally available on the e-commerce sites. Such methods are independent from the content of the reviews and allow to train reliable models for the detection of fake reviews.
Palabras clave: CROWDSOURCING , DECEPTION DETECTION , GROUND TRUTH , PROBABILISTIC LABELING
Ver el registro completo
 
Archivos asociados
Tamaño: 312.7Kb
Formato: PDF
.
Solicitar
Licencia
info:eu-repo/semantics/restrictedAccess Excepto donde se diga explícitamente, este item se publica bajo la siguiente descripción: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)
Identificadores
URI: http://hdl.handle.net/11336/143656
URL: https://link.springer.com/article/10.1007%2Fs10579-020-09486-5
DOI: https://doi.org/10.1007/s10579-020-09486-5
Colecciones
Articulos(CCT - SAN LUIS)
Articulos de CTRO.CIENTIFICO TECNOL.CONICET - SAN LUIS
Citación
Fornaciari, Tommaso; Cagnina, Leticia Cecilia; Rosso, Paolo; Poesio, Massimo; Fake opinion detection: how similar are crowdsourced datasets to real data?; Springer; Language Resources And Evaluation; 54; 4; 12-2020; 1019-1058
Compartir
Altmétricas
 

Enviar por e-mail
Separar cada destinatario (hasta 5) con punto y coma.
  • Facebook
  • X Conicet Digital
  • Instagram
  • YouTube
  • Sound Cloud
  • LinkedIn

Los contenidos del CONICET están licenciados bajo Creative Commons Reconocimiento 2.5 Argentina License

https://www.conicet.gov.ar/ - CONICET

Inicio

Explorar

  • Autores
  • Disciplinas
  • Comunidades

Estadísticas

Novedades

  • Noticias
  • Boletines

Ayuda

Acerca de

  • CONICET Digital
  • Equipo
  • Red Federal

Contacto

Godoy Cruz 2290 (C1425FQB) CABA – República Argentina – Tel: +5411 4899-5400 repositorio@conicet.gov.ar
TÉRMINOS Y CONDICIONES