Article
Distributed Text Search using Suffix Arrays
Arroyuelo, Diego; Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo
Publication date:
11/07/2014
Publisher:
Elsevier Science
Journal:
Parallel Computing
ISSN:
0167-8191
Language:
English
Resource type:
Published article
Abstract
Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, where it is necessary to efficiently process intensive-traffic streams of on-line queries. This paper proposes strategies to enable such services by means of suffix arrays. We introduce techniques for deploying suffix arrays on clusters of distributed-memory processors and then study the processing of multiple queries on the distributed data structure. Even though the cost of individual search operations in sequential (non-distributed) suffix arrays is low in practice, the problem of processing multiple queries on distributed-memory systems, so that hardware resources are used efficiently, is relevant to services aimed at achieving high query throughput at low operational costs. Our theoretical and experimental performance studies show that our proposals are suitable solutions for building efficient and scalable on-line search services based on suffix arrays.
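As a minimal illustration of the data structure the abstract refers to, the sketch below builds a suffix array for a small text and uses binary search to locate every occurrence of a substring. It is a sequential, naive version written for clarity only; it does not reproduce the distributed strategies proposed in the paper, and the names `build_suffix_array` and `find_occurrences` are hypothetical, chosen for this example.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction (assumption: small inputs). Sorting materialises the
    suffixes, so this is far slower than the linear-time algorithms used in
    practice.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return the sorted start positions at which pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)                                  # [5, 3, 1, 0, 4, 2]
print(find_occurrences(text, sa, "ana"))   # [1, 3]
```

Because all suffixes that start with the pattern are contiguous in the sorted order, a search needs only two binary searches over the array, which is what keeps individual queries cheap in the sequential setting.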
Keywords:
Suffix Arrays, Distributed Systems, Distributed Text Search
Collections
Articles (CCT - SAN LUIS)
Articles of CTRO.CIENTIFICO TECNOL.CONICET - SAN LUIS
Citation
Arroyuelo, Diego; Bonacic, Carolina; Gil Costa, Graciela Verónica; Marín, Mauricio; Navarro, Gonzalo; Distributed Text Search using Suffix Arrays; Elsevier Science; Parallel Computing; 40; 9; 11-7-2014; 471-495