Complementary models for audio-visual speech classification

Sad, Gonzalo Daniel; Terissi, Lucas Daniel; Gómez, Juan C.

doi:10.1007/s10772-021-09944-7

Article

Publication date: 03/2022

Publisher: Springer

Journal: International Journal of Speech Technology

ISSN: 1381-2416

e-ISSN: 1572-8110

Language: English

Resource type: Published article

Subject classification:

Other Computer and Information Sciences

Abstract

A novel scheme for disambiguating conflicting classification results in audio-visual speech recognition applications is proposed in this paper. The classification scheme can be implemented with both generative and discriminative models and can be used with different input modalities, viz. audio only, visual only, and audio-visual information. The proposed scheme consists of a cascade connection of a standard classifier, trained with instances of each particular class, followed by a complementary model, which is trained with instances of all the remaining classes. The performance of the proposed recognition system is evaluated on three publicly available audio-visual datasets, using a generative model, namely a hidden Markov model, and three discriminative techniques, viz. random forests, support vector machines, and adaptive boosting. The experimental results are promising: for the three datasets, the different models, and the different input modalities, the recognition rates improve on those of other methods reported in the literature for the same datasets.

Keywords: AUDIO-VISUAL SPEECH, CLASSIFIER COMBINATION, COMPLEMENTARY MODELS, SPEECH CLASSIFICATION
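The abstract describes the cascade only at a high level, and this record does not state how the complementary model's output is used to resolve a conflict. What follows is a minimal, hypothetical sketch under stated assumptions: each class is modeled by a diagonal Gaussian (a stand-in for the hidden Markov models, random forests, support vector machines, and AdaBoost classifiers evaluated in the paper); a conflict is declared when the two best standard-model scores differ by less than a hypothetical threshold `margin`; and the conflict is resolved by preferring the candidate whose standard model fits the input well while its complementary model, trained on all the remaining classes, fits it poorly. The names `ComplementaryCascade`, `fit_diag_gaussian`, `log_likelihood`, and `margin` are illustrative and do not come from the paper.

```python
import math
from collections import defaultdict


def fit_diag_gaussian(samples, var_floor=1e-3):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to a list of vectors."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[d] for s in samples) / n for d in range(dim)]
    # Floor the variance so a degenerate dimension cannot produce a zero division.
    var = [max(sum((s[d] - mean[d]) ** 2 for s in samples) / n, var_floor)
           for d in range(dim)]
    return mean, var


def log_likelihood(model, x):
    """Log-density of vector x under a diagonal Gaussian (mean, var)."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


class ComplementaryCascade:
    """Hypothetical cascade: standard per-class models, then complementary models."""

    def __init__(self, margin=2.0):
        # Hypothetical ambiguity threshold on the log-likelihood gap between
        # the two best standard-model candidates.
        self.margin = margin

    def fit(self, X, y):
        by_class = defaultdict(list)
        for x, label in zip(X, y):
            by_class[label].append(x)
        self.classes = sorted(by_class)  # needs at least two classes
        # Standard model for class c: trained only on instances of c.
        self.standard = {c: fit_diag_gaussian(by_class[c]) for c in self.classes}
        # Complementary model for class c: trained on all the remaining classes.
        self.complementary = {
            c: fit_diag_gaussian([x for x, label in zip(X, y) if label != c])
            for c in self.classes
        }
        return self

    def predict(self, x):
        scores = {c: log_likelihood(self.standard[c], x) for c in self.classes}
        ranked = sorted(self.classes, key=scores.get, reverse=True)
        best, runner_up = ranked[0], ranked[1]
        if scores[best] - scores[runner_up] >= self.margin:
            return best  # unambiguous: keep the standard classifier's decision
        # Conflict: score each candidate by how much better its own class model
        # explains x than the model of "all other classes" does.
        return max((best, runner_up),
                   key=lambda c: scores[c] - log_likelihood(self.complementary[c], x))
```

Subtracting the complementary log-likelihood turns each candidate's score into a log-likelihood ratio between "class c" and "any other class", which is one plausible way to exploit a model trained on the remaining classes; the paper may combine the two stages differently.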

Associated files

Size: 1.358 MB

Format: PDF

License

Except where otherwise noted, this item is published under the following license: Creative Commons Attribution-NonCommercial-ShareAlike 2.5 Unported (CC BY-NC-SA 2.5)

Identifiers

URI: http://hdl.handle.net/11336/210949

DOI: http://dx.doi.org/10.1007/s10772-021-09944-7

Collections

Articles (CIFASIS)
Articles from CENTRO INT.FRANCO ARG.D/CS D/L/INF.Y SISTEM.

Citation

Sad, Gonzalo Daniel; Terissi, Lucas Daniel; Gómez, Juan C.; Complementary models for audio-visual speech classification; Springer; International Journal of Speech Technology; 25; 1; 3-2022; 231-249
