Institutional Repository
CONICET Digital
Article

Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval

Maisonnave, Mariano; Delbianco, Fernando Andrés; Tohmé, Fernando Abel; Maguitman, Ana Gabriela
Publication date: 05/2021
Publisher: Elsevier Science
Journal: Information Processing & Management
ISSN: 0306-4573
Language: English
Resource type: Published article
Subject classification:
Computer Science

Abstract

Topic-based retrieval is the task of seeking and retrieving material related to a topic of interest. This task involves two subtasks: selecting query terms and ranking the retrieved results. Supervised approaches to assessing the importance of a term in a topic or class have proven effective for guiding the query-term selection subtask. This article analyzes and evaluates FDD, a supervised term-weighting scheme that can be applied to query-term selection in topic-based retrieval. FDD weights terms based on two factors representing the descriptive and discriminating power of the terms with respect to the given topic. It then combines these two factors through an adjustable parameter that makes it possible to favor different aspects of retrieval, such as precision, recall, or a balance between the two. Preliminary studies have demonstrated the potential of FDD to identify useful query terms. However, those studies limited the analysis to a single domain represented by a single data set with binary categories and did not compare FDD to other recently formulated term-weighting techniques. The contributions of this article are the following: (1) it presents an extensive analysis of the behavior of FDD as a function of its adjustable parameter; (2) it compares FDD against eighteen traditional and state-of-the-art weighting schemes; (3) it evaluates the performance of disjunctive queries built by combining terms selected using the analyzed methods; (4) it makes a full data set and the full code publicly available to replicate the reported analysis and foster future research in the area. The analysis and evaluations are performed on three data sets: two well-known text data sets, namely 20 Newsgroups and Reuters-21578, and the newly released data set.
It can be concluded that, despite its simplicity, FDD is competitive with state-of-the-art methods and has the important advantage of offering flexibility in adapting to specific task goals. The results also demonstrate that FDD offers a useful mechanism for exploring different approaches to building complex queries.
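
The abstract describes the weighting idea only in words, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that descriptive power can be estimated as the fraction of on-topic documents containing a term, that discriminating power can be estimated as the fraction of documents containing the term that are on-topic, and that the two are combined with a weighted harmonic mean in the style of the F-beta measure. These definitions, the function names `term_weights` and `disjunctive_query`, the parameter `beta`, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def term_weights(docs, labels, topic, beta=1.0):
    """Weight each term for `topic` from labeled documents.

    Assumed definitions (not taken from the abstract):
      descr  = fraction of on-topic documents that contain the term
      discr  = fraction of documents containing the term that are on-topic
      weight = F-beta-style weighted harmonic mean of descr and discr
    """
    in_topic = Counter()  # document frequency of each term within the topic
    overall = Counter()   # document frequency of each term in the collection
    n_topic = 0
    for text, label in zip(docs, labels):
        terms = set(text.lower().split())
        overall.update(terms)
        if label == topic:
            n_topic += 1
            in_topic.update(terms)

    b2 = beta * beta
    weights = {}
    for term, df_topic in in_topic.items():
        descr = df_topic / n_topic
        discr = df_topic / overall[term]
        # beta > 1 emphasizes descr, beta < 1 emphasizes discr.
        weights[term] = (1 + b2) * descr * discr / (b2 * discr + descr)
    return weights


def disjunctive_query(weights, k=3):
    """Join the k highest-weighted terms with OR (ties broken alphabetically)."""
    top = sorted(weights, key=lambda t: (-weights[t], t))[:k]
    return " OR ".join(top)


# Hypothetical toy corpus with two topics.
docs = [
    "graphics card driver update",
    "graphics rendering engine",
    "card game rules",
    "hockey game tonight",
]
labels = ["comp", "comp", "rec", "rec"]

weights = term_weights(docs, labels, topic="comp", beta=1.0)
print(disjunctive_query(weights, k=2))
```

Under these assumptions, a term that appears in every on-topic document and in no other document receives the maximum weight of 1, while a term that is frequent in the topic but also common elsewhere is penalized through the discriminating factor. Varying `beta` shifts the ranking toward terms that cover the topic broadly or toward terms that are specific to it, which is the kind of trade-off the adjustable parameter in the abstract is said to control.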
Keywords: TERM WEIGHTING, VARIABLE EXTRACTION, INFORMATION RETRIEVAL, QUERY-TERM SELECTION, TOPIC-BASED RETRIEVAL
Associated files
Size: 1.139Mb
Format: PDF
License
info:eu-repo/semantics/restrictedAccess Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)
Identifiers
URI: http://hdl.handle.net/11336/135329
URL: https://www.sciencedirect.com/science/article/abs/pii/S0306457320309729
DOI: http://dx.doi.org/10.1016/j.ipm.2020.102483
URL: https://arxiv.org/abs/2007.06616
Collections
Articles (ICIC)
Articles from INSTITUTO DE CS. E INGENIERIA DE LA COMPUTACION
Articles (INMABB)
Articles from INST.DE MATEMATICA BAHIA BLANCA (I)
Citation
Maisonnave, Mariano; Delbianco, Fernando Andrés; Tohmé, Fernando Abel; Maguitman, Ana Gabriela; Assessing the behavior and performance of a supervised term-weighting technique for topic-based retrieval; Elsevier Science; Information Processing & Management; 58; 3; 5-2021; 1-17; 102483
CONICET content is licensed under a Creative Commons Attribution 2.5 Argentina License

https://www.conicet.gov.ar/ - CONICET

Godoy Cruz 2290 (C1425FQB) CABA – República Argentina – Tel: +5411 4899-5400 repositorio@conicet.gov.ar