Institutional Repository
CONICET Digital
Article

Robust front-end for audio, visual and audio–visual speech classification

Terissi, Lucas Daniel; Sad, Gonzalo Daniel; Gómez, Juan Carlos
Publication date: 06/2018
Publisher: Springer
Journal: International Journal of Speech Technology
ISSN: 1381-2416
Language: English
Resource type: Published article
Subject classification:
Electrical and Electronic Engineering

Abstract

This paper proposes a robust front-end for speech classification which can be employed with acoustic, visual or audio–visual information interchangeably. Wavelet multiresolution analysis is employed to represent the temporal input data associated with speech information. These wavelet-based features are then used as inputs to a Random Forest classifier to perform the speech classification. The performance of the proposed speech classification scheme is evaluated in different scenarios, namely, considering only acoustic information, only visual information (lip-reading), and fused audio–visual information. These evaluations are carried out over three different audio–visual databases, two of them public and the remaining one compiled by the authors of this paper. Experimental results show that good performance is achieved with the proposed system over the three databases and for the different kinds of input information considered. In addition, the proposed method performs better than other methods reported in the literature over the same two public databases. All the experiments were implemented using the same configuration parameters. These results also indicate that the proposed method performs satisfactorily without requiring the tuning of the wavelet decomposition parameters or of the Random Forest classifier parameters for each particular database and input modality.
Keywords: AUDIO–VISUAL SPEECH RECOGNITION, RANDOM FORESTS, WAVELET DECOMPOSITION
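
As an illustration only, the following is a minimal sketch of the kind of pipeline the abstract describes: per-band statistics of a wavelet multiresolution decomposition used as features for a Random Forest classifier. It assumes the PyWavelets and scikit-learn libraries; the wavelet family, decomposition level, per-band statistics, forest size and the synthetic data are placeholder choices, not the configuration reported in the paper.

    import numpy as np
    import pywt
    from sklearn.ensemble import RandomForestClassifier

    def wavelet_features(signal, wavelet="db4", level=4):
        # Decompose the temporal signal (acoustic, visual or fused stream)
        # into approximation + detail bands via multiresolution analysis.
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        # Summarise each band with simple statistics (mean, std, energy).
        return np.array([s for c in coeffs for s in (c.mean(), c.std(), np.sum(c ** 2))])

    # Placeholder data: 20 "utterances" of 512 samples each, with binary labels.
    rng = np.random.default_rng(0)
    X_raw = [rng.standard_normal(512) for _ in range(20)]
    y = rng.integers(0, 2, size=20)

    X = np.vstack([wavelet_features(s) for s in X_raw])
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print(clf.predict(X[:3]))

For fused audio–visual input, one plausible (though not necessarily the authors') approach would be to concatenate the per-band feature vectors of the acoustic and visual streams before training.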
 
Associated files
Size: 1.823 MB
Format: PDF
License
info:eu-repo/semantics/openAccess Except where explicitly stated otherwise, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)
Identifiers
URI: http://hdl.handle.net/11336/88897
URL: https://link.springer.com/article/10.1007/s10772-018-9504-y
DOI: http://dx.doi.org/10.1007/s10772-018-9504-y
Collections
Articulos (CIFASIS)
Articulos de CENTRO INT.FRANCO ARG.D/CS D/L/INF.Y SISTEM.
Citation
Terissi, Lucas Daniel; Sad, Gonzalo Daniel; Gómez, Juan Carlos; Robust front-end for audio, visual and audio–visual speech classification; Springer; International Journal of Speech Technology; 21; 2; 6-2018; 293-307