Article
Topic relevance and diversity in information retrieval from large datasets: A multi-objective evolutionary algorithm approach
Publication date:
08/2018
Publisher:
Elsevier Science
Journal:
Applied Soft Computing
ISSN:
1568-4946
Language:
English
Resource type:
Published article
Abstract
Enabling effective information search is an increasingly pressing problem, as technology enhances the ability to publish information rapidly and large quantities of information become instantly available for retrieval. In this scenario, topical search is the process of searching for material that is relevant to a given topic. Multi-objective Evolutionary Algorithms have demonstrated great potential for addressing the topical search problem in very large datasets. In an evolutionary approach to topical search, a population of queries is automatically generated from a given topic, and the population of queries then evolves towards successively better candidate queries. Despite the promise of this approach, previous studies have revealed a common genotypic phenomenon: throughout evolution, the population tends to converge to almost identical sets of terms. This situation reduces the solution set to a few queries and leads to the exploration of a very limited region of the search space, which constitutes a limitation when users require different options from a topical search tool. This paper proposes and evaluates strategies to favor diversity in evolutionary topical search. These strategies rely on novel fitness functions, different parameterization for the crossover and mutation rates, and the use of multiple populations to favor diversity preservation. Experiments conducted using these strategies in combination with the NSGA-II algorithm on a dataset of more than 350,000 labeled web pages indicate that the proposed strategies show great promise for searching very large datasets, helping to achieve query and search result diversity without giving up precision.
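The convergence phenomenon described in the abstract (a population of queries collapsing to almost identical sets of terms) can be made concrete with a simple measure of genotypic diversity. The sketch below is illustrative only and is not taken from the paper: it assumes each query is a plain list of terms and uses mean pairwise Jaccard distance as the diversity score. The function names `jaccard_distance` and `population_diversity` and the sample queries are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two term collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty queries are treated as identical (an assumption).
        return 0.0
    return 1.0 - len(a & b) / len(union)


def population_diversity(queries):
    """Mean pairwise Jaccard distance over a population of queries.

    0.0 means every query uses the same terms; 1.0 means no two
    queries share any term.
    """
    pairs = list(combinations(queries, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical populations, not data from the paper.
converged = [["solar", "energy", "panel"] for _ in range(3)]
diverse = [["solar", "energy"], ["wind", "turbine"], ["solar", "panel"]]

print(population_diversity(converged))  # identical term sets
print(population_diversity(diverse))    # mostly disjoint term sets
```

A score of this kind could in principle serve as an additional objective alongside precision in a multi-objective setting, but the fitness functions actually proposed in the paper are not specified in this record.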
Collections
Articles (CCT - BAHIA BLANCA)
Articles of CTRO.CIENTIFICO TECNOL.CONICET - BAHIA BLANCA
Citation
Cecchini, Rocío Luján; Lorenzetti, Carlos Martin; Maguitman, Ana Gabriela; Ponzoni, Ignacio; Topic relevance and diversity in information retrieval from large datasets: A multi-objective evolutionary algorithm approach; Elsevier Science; Applied Soft Computing; 69; 8-2018; 749-770