Distributed search based on self-indexed compressed text

Arroyuelo, Diego; Gil Costa, Graciela Verónica; González, Senén; Marín, Mauricio; Oyarzún, Mauricio

doi:10.1016/j.ipm.2011.01.008

dc.contributor.author

Arroyuelo, Diego

dc.contributor.author

Gil Costa, Graciela Verónica

dc.contributor.author

González, Senén

dc.contributor.author

Marín, Mauricio

dc.contributor.author

Oyarzún, Mauricio

dc.date.available

2023-05-11T14:04:14Z

dc.date.issued

2012-03

dc.identifier.citation

Arroyuelo, Diego; Gil Costa, Graciela Verónica; González, Senén; Marín, Mauricio; Oyarzún, Mauricio; Distributed search based on self-indexed compressed text; Pergamon-Elsevier Science Ltd; Information Processing & Management; 48; 5; 3-2012; 819-827

dc.identifier.issn

0306-4573

dc.identifier.uri

http://hdl.handle.net/11336/197197

dc.description.abstract

Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting lists caching. We show that a text self-index (i.e., an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out the posting lists generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.

dc.format

application/pdf

dc.language.iso

eng

dc.publisher

Pergamon-Elsevier Science Ltd

dc.rights

info:eu-repo/semantics/openAccess

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/2.5/ar/

dc.subject

QUERY PROCESSING

dc.subject

SELF-INDEXED COMPRESSED TEXT

dc.subject

SNIPPET EXTRACTION

dc.subject

WAVELET TREES

dc.subject

WEB SEARCH ENGINES

dc.subject.classification

Computer Science

dc.subject.classification

Computer and Information Sciences

dc.subject.classification

NATURAL AND EXACT SCIENCES

dc.title

Distributed search based on self-indexed compressed text

dc.type

info:eu-repo/semantics/article

dc.type

info:ar-repo/semantics/artículo

dc.type

info:eu-repo/semantics/publishedVersion

dc.date.updated

2023-04-20T12:33:05Z

dc.journal.volume

48

dc.journal.number

5

dc.journal.pagination

819-827

dc.journal.pais

Netherlands

dc.journal.ciudad

Amsterdam

dc.description.fil

Fil: Arroyuelo, Diego. Yahoo! Research Latin America; Chile

dc.description.fil

Fil: Gil Costa, Graciela Verónica. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - San Luis; Argentina. Universidad Nacional de San Luis; Argentina. Yahoo! Research Latin America; Chile

dc.description.fil

Fil: González, Senén. Yahoo! Research Latin America; Chile

dc.description.fil

Fil: Marín, Mauricio. Universidad de Santiago de Chile; Chile. Yahoo! Research Latin America; Chile

dc.description.fil

Fil: Oyarzún, Mauricio. Universidad de Santiago de Chile; Chile

dc.journal.title

Information Processing & Management

dc.relation.alternativeid

info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0306457311000094

dc.relation.alternativeid

info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1016/j.ipm.2011.01.008

Associated files

Size: 249.9Kb

Format: PDF