Article
Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems
Publication date:
02/2015
Publisher:
Elsevier Science
Journal:
Speech Communication
ISSN:
0167-6393
Language:
English
Resource type:
Published article
Abstract
We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak® computer-assisted language learning (CALL) software. The system uses both prosodic and spectral features to detect the level of stress (unstressed, primary or secondary) for each syllable in a word. Features are computed on the vowels and include normalized energy, pitch, spectral tilt, and duration measurements, as well as log-posterior probabilities obtained from the frame-level mel-frequency cepstral coefficients (MFCCs). Gaussian mixture models (GMMs) are used to represent the distribution of these features for each stress class. The system is trained on utterances by L1-English children and tested on English speech from L1-English children and L1-Japanese children with variable levels of English proficiency. Since it is trained on data from L1-English speakers, the system can be used on English utterances spoken by speakers of any L1 without retraining. Furthermore, automatically determined stress patterns are used as the intended target; therefore, hand-labeling of training data is not required. This allows us to use a large amount of data for training the system. Our algorithm results in an error rate of approximately 11% on English utterances from L1-English speakers and 20% on English utterances from L1-Japanese speakers. We show that all features, both spectral and prosodic, are necessary for achievement of optimal performance on the data from L1-English speakers; MFCC log-posterior probability features are the single best set of features, followed by duration, energy, pitch and finally, spectral tilt features. For English utterances from L1-Japanese speakers, energy, MFCC log-posterior probabilities and duration are the most important features.
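The classification step described in the abstract, one GMM per stress class scored against a per-vowel feature vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional feature vector, the diagonal covariances, the optional class priors, the function names, and every numeric parameter below are assumptions made up for illustration, and a real system would estimate the GMM parameters from training data.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """Log-likelihood of x under a GMM given as (weight, mean, var) components."""
    terms = [math.log(w) + diag_gaussian_logpdf(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # Log-sum-exp over components, shifted by the max for numerical stability.
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify_stress(x, class_gmms, log_priors=None):
    """Return the class whose GMM score (plus optional log prior) is highest."""
    scores = {}
    for label, gmm in class_gmms.items():
        prior = log_priors[label] if log_priors else 0.0  # priors are an assumption
        scores[label] = gmm_loglik(x, gmm) + prior
    return max(scores, key=scores.get), scores

# Hypothetical toy parameters, NOT values from the paper.
# Assumed features for one vowel: [normalized duration, normalized energy].
TOY_GMMS = {
    "unstressed": [(0.6, [-1.0, -1.0], [0.5, 0.5]), (0.4, [-0.5, -0.2], [0.4, 0.6])],
    "secondary": [(1.0, [0.2, 0.1], [0.5, 0.5])],
    "primary": [(0.5, [1.0, 1.0], [0.5, 0.5]), (0.5, [1.5, 0.6], [0.6, 0.4])],
}

label, scores = classify_stress([1.2, 0.9], TOY_GMMS)
print(label, {k: round(v, 2) for k, v in scores.items()})
```

In this sketch, a long and loud vowel falls closest to the toy "primary" model, while a short and quiet one falls closest to "unstressed". Comparing log-likelihoods rather than raw likelihoods keeps the scores numerically stable when more features are added.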
Collections
Articles (OCA CIUDAD UNIVERSITARIA)
Articles from OFICINA DE COORDINACION ADMINISTRATIVA CIUDAD UNIVERSITARIA
Citation
Ferrer, Luciana; Bratt, Harry; Richey, Colleen; Franco, Horacio; Abrash, Victor; et al.; Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems; Elsevier Science; Speech Communication; 69; 2-2015; 31-45