dc.contributor.author
Mitra, Vikramjit
dc.contributor.author
Franco, Horacio
dc.contributor.author
Stern, Richard M.
dc.contributor.author
Van Hout, Julien
dc.contributor.author
Ferrer, Luciana
dc.contributor.author
Graciarena, Martin
dc.contributor.author
Wang, Wen
dc.contributor.author
Vergyri, Dimitra
dc.contributor.author
Alwan, Abeer
dc.contributor.author
Hansen, John H. L.
dc.contributor.other
Watanabe, Shinji
dc.contributor.other
Delcroix, Marc
dc.contributor.other
Metze, Florian
dc.contributor.other
Hershey, John R.
dc.date.available
2022-07-26T14:48:46Z
dc.date.issued
2017
dc.identifier.citation
Mitra, Vikramjit; Franco, Horacio; Stern, Richard M.; Van Hout, Julien; Ferrer, Luciana; et al.; Robust features in deep-learning-based speech recognition; Springer Nature Switzerland AG; 2017; 183-212
dc.identifier.isbn
978-3-319-64679-4
dc.identifier.uri
http://hdl.handle.net/11336/163169
dc.description.abstract
Recent progress in deep learning has revolutionized speech recognition research, with Deep Neural Networks (DNNs) becoming the new state of the art for acoustic modeling. DNNs offer significantly lower speech recognition error rates than the previously used Gaussian Mixture Models (GMMs). Unfortunately, DNNs are data sensitive, and unseen data conditions can deteriorate their performance. Acoustic distortions such as noise, reverberation, and channel differences add variation to the speech signal, which in turn impacts DNN acoustic model performance. A straightforward solution to this issue is to train the DNN models with these types of variation, which typically yields quite impressive performance. However, anticipating such variation is not always possible; in these cases, DNN recognition performance can deteriorate quite sharply. To avoid subjecting acoustic models to such variation, robust features have traditionally been used to create an invariant representation of the acoustic space. Most commonly, robust feature-extraction strategies have explored three principal areas: (a) enhancing the speech signal, with the goal of improving the perceptual quality of speech; (b) reducing the distortion footprint, with signal-theoretic techniques used to learn the distortion characteristics and subsequently filter them out of the speech signal; and (c) leveraging knowledge from auditory neuroscience and psychoacoustics, by using robust features inspired by auditory perception. In this chapter, we present prominent robust feature-extraction strategies explored by the speech recognition research community and discuss their relevance to coping with data-mismatch problems in DNN-based acoustic modeling. We present results demonstrating the efficacy of robust features in the new paradigm of DNN acoustic models, and we discuss future directions in feature design for making speech recognition systems more robust to unseen acoustic conditions. Note that the approaches discussed in this chapter focus primarily on single-channel data.
dc.format
application/pdf
dc.language.iso
eng
dc.publisher
Springer Nature Switzerland AG
dc.rights
info:eu-repo/semantics/restrictedAccess
dc.rights.uri
https://creativecommons.org/licenses/by-nc-sa/2.5/ar/
dc.subject
SPEECH RECOGNITION
dc.subject
ROBUST FEATURES
dc.subject
DEEP LEARNING
dc.subject.classification
Information Science and Bioinformatics
dc.subject.classification
Computer and Information Sciences
dc.subject.classification
NATURAL AND EXACT SCIENCES
dc.title
Robust features in deep-learning-based speech recognition
dc.type
info:eu-repo/semantics/publishedVersion
dc.type
info:eu-repo/semantics/bookPart
dc.type
info:ar-repo/semantics/parte de libro
dc.date.updated
2022-07-25T15:38:32Z
dc.journal.pagination
183-212
dc.journal.pais
Switzerland
dc.journal.ciudad
Cham
dc.description.fil
Fil: Mitra, Vikramjit. SRI International; United States
dc.description.fil
Fil: Franco, Horacio. SRI International; United States
dc.description.fil
Fil: Stern, Richard M.. Carnegie Mellon University; United States
dc.description.fil
Fil: Van Hout, Julien. SRI International; United States
dc.description.fil
Fil: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
dc.description.fil
Fil: Graciarena, Martin. SRI International; United States
dc.description.fil
Fil: Wang, Wen. SRI International; United States
dc.description.fil
Fil: Vergyri, Dimitra. SRI International; United States
dc.description.fil
Fil: Alwan, Abeer. University of California at Los Angeles; United States
dc.description.fil
Fil: Hansen, John H. L.. University of Texas; United States
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/https://link.springer.com/chapter/10.1007/978-3-319-64680-0_8
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/https://doi.org/10.1007/978-3-319-64680-0_8
dc.conicet.paginas
436
dc.source.titulo
New era for robust speech recognition: exploiting deep learning