Repositorio Institucional
Repositorio Institucional
CONICET Digital
  • Inicio
  • EXPLORAR
    • AUTORES
    • DISCIPLINAS
    • COMUNIDADES
  • Estadísticas
  • Novedades
    • Noticias
    • Boletines
  • Ayuda
    • General
    • Datos de investigación
  • Acerca de
    • CONICET Digital
    • Equipo
    • Red Federal
  • Contacto
JavaScript is disabled for your browser. Some features of this site may not work without it.
  • INFORMACIÓN GENERAL
  • RESUMEN
  • ESTADISTICAS
 
Artículo

Distributed search based on self-indexed compressed text

Arroyuelo, Diego; Gil Costa, Graciela VerónicaIcon ; González, Senén; Marín, Mauricio; Oyarzún, Mauricio
Fecha de publicación: 03/2012
Editorial: Pergamon-Elsevier Science Ltd
Revista: Information Processing & Management
ISSN: 0306-4573
Idioma: Inglés
Tipo de recurso: Artículo publicado
Clasificación temática:
Ciencias de la Computación

Resumen

Query response times within a fraction of a second in Web search engines are feasible due to the use of indexing and caching techniques, which are devised for large text collections partitioned and replicated into a set of distributed-memory processors. This paper proposes an alternative query processing method for this setting, which is based on a combination of self-indexed compressed text and posting lists caching. We show that a text self-index (i.e.; an index that compresses the text and is able to extract arbitrary parts of it) can be competitive with an inverted index if we consider the whole query process, which includes index decompression, ranking and snippet extraction time. The advantage is that within the space of the compressed document collection, one can carry out the posting lists generation, document ranking and snippet extraction. This significantly reduces the total number of processors involved in the solution of queries. Alternatively, for the same amount of hardware, the performance of the proposed strategy is better than that of the classical approach based on treating inverted indexes and corresponding documents as two separate entities in terms of processors and memory space.
Palabras clave: QUERY PROCESSING , SELF-INDEXED COMPRESSED TEXT , SNIPPET EXTRACTION , WAVELET TREES , WEB SEARCH ENGINES
Ver el registro completo
 
Archivos asociados
Thumbnail
 
Tamaño: 249.9Kb
Formato: PDF
.
Descargar
Licencia
info:eu-repo/semantics/openAccess Excepto donde se diga explícitamente, este item se publica bajo la siguiente descripción: Atribución-NoComercial-SinDerivadas 2.5 Argentina (CC BY-NC-ND 2.5 AR)
Identificadores
URI: http://hdl.handle.net/11336/197197
URL: http://www.sciencedirect.com/science/article/pii/S0306457311000094
DOI: http://dx.doi.org/10.1016/j.ipm.2011.01.008
Colecciones
Articulos(CCT - SAN LUIS)
Articulos de CTRO.CIENTIFICO TECNOL.CONICET - SAN LUIS
Citación
Arroyuelo, Diego; Gil Costa, Graciela Verónica; González, Senén; Marín, Mauricio; Oyarzún, Mauricio; Distributed search based on self-indexed compressed text; Pergamon-Elsevier Science Ltd; Information Processing & Management; 48; 5; 3-2012; 819-827
Compartir
Altmétricas
 

Enviar por e-mail
Separar cada destinatario (hasta 5) con punto y coma.
  • Facebook
  • X Conicet Digital
  • Instagram
  • YouTube
  • Sound Cloud
  • LinkedIn

Los contenidos del CONICET están licenciados bajo Creative Commons Reconocimiento 2.5 Argentina License

https://www.conicet.gov.ar/ - CONICET

Inicio

Explorar

  • Autores
  • Disciplinas
  • Comunidades

Estadísticas

Novedades

  • Noticias
  • Boletines

Ayuda

Acerca de

  • CONICET Digital
  • Equipo
  • Red Federal

Contacto

Godoy Cruz 2290 (C1425FQB) CABA – República Argentina – Tel: +5411 4899-5400 repositorio@conicet.gov.ar
TÉRMINOS Y CONDICIONES