Show simple item record

dc.contributor.author
Ferrer, Luciana  
dc.contributor.author
Bratt, Harry  
dc.contributor.author
Richey, Colleen  
dc.contributor.author
Franco, Horacio  
dc.contributor.author
Abrash, Victor  
dc.contributor.author
Precoda, Kristin  
dc.date.available
2018-03-06T21:32:52Z  
dc.date.issued
2015-02  
dc.identifier.citation
Ferrer, Luciana; Bratt, Harry; Richey, Colleen; Franco, Horacio; Abrash, Victor; et al.; Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems; Elsevier Science; Speech Communication; 69; 2-2015; 31-45  
dc.identifier.issn
0167-6393  
dc.identifier.uri
http://hdl.handle.net/11336/38100  
dc.description.abstract
We present a system for detection of lexical stress in English words spoken by English learners. This system was designed to be part of the EduSpeak® computer-assisted language learning (CALL) software. The system uses both prosodic and spectral features to detect the level of stress (unstressed, primary, or secondary) for each syllable in a word. Features are computed on the vowels and include normalized energy, pitch, spectral tilt, and duration measurements, as well as log-posterior probabilities obtained from the frame-level mel-frequency cepstral coefficients (MFCCs). Gaussian mixture models (GMMs) are used to represent the distribution of these features for each stress class. The system is trained on utterances by L1-English children and tested on English speech from L1-English children and L1-Japanese children with variable levels of English proficiency. Since it is trained on data from L1-English speakers, the system can be used on English utterances spoken by speakers of any L1 without retraining. Furthermore, automatically determined stress patterns are used as the intended target; therefore, hand-labeling of training data is not required. This allows us to use a large amount of data for training the system. Our algorithm results in an error rate of approximately 11% on English utterances from L1-English speakers and 20% on English utterances from L1-Japanese speakers. We show that all features, both spectral and prosodic, are necessary for achieving optimal performance on the data from L1-English speakers; MFCC log-posterior probability features are the single best set of features, followed by duration, energy, pitch, and finally spectral tilt features. For English utterances from L1-Japanese speakers, energy, MFCC log-posterior probabilities and duration are the most important features.
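The abstract outlines the core classification idea: per-vowel prosodic and spectral features are scored against one Gaussian mixture model per stress class, and the best-scoring class is chosen for each syllable. Below is a minimal sketch of that idea, assuming scikit-learn's GaussianMixture; the feature layout, component count, and helper names are illustrative assumptions, not the authors' EduSpeak® implementation.

```python
# Illustrative sketch only: one GMM per stress class over per-vowel features,
# with the most likely class chosen per syllable. Not the authors' EduSpeak code.
import numpy as np
from sklearn.mixture import GaussianMixture

STRESS_CLASSES = ("unstressed", "primary", "secondary")

def train_stress_gmms(features_by_class, n_components=4):
    """Fit one GMM per stress class.

    features_by_class maps a class name to an (n_syllables, n_features)
    array of per-vowel features (e.g. normalized energy, pitch, spectral
    tilt, duration, MFCC log-posteriors); the feature set and component
    count here are assumptions made for illustration.
    """
    gmms = {}
    for label, feats in features_by_class.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmms[label] = gmm.fit(feats)
    return gmms

def classify_syllable(gmms, feature_vector, log_priors=None):
    """Return the stress class whose GMM assigns the highest
    (optionally prior-weighted) log-likelihood to one syllable."""
    x = np.atleast_2d(feature_vector)
    best_label, best_score = None, -np.inf
    for label, gmm in gmms.items():
        score = gmm.score_samples(x)[0]  # per-sample log-likelihood
        if log_priors is not None:
            score += log_priors[label]
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```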
dc.format
application/pdf  
dc.language.iso
eng  
dc.publisher
Elsevier Science  
dc.rights
info:eu-repo/semantics/openAccess  
dc.rights.uri
https://creativecommons.org/licenses/by-nc-nd/2.5/ar/  
dc.subject
Computer-Assisted Language Learning  
dc.subject
Gaussian Mixture Models  
dc.subject
Lexical Stress Detection  
dc.subject
Mel Frequency Cepstral Coefficients  
dc.subject
Prosodic Features  
dc.subject.classification
Computer Science
dc.subject.classification
Computer and Information Sciences
dc.subject.classification
NATURAL AND EXACT SCIENCES
dc.title
Classification of lexical stress using spectral and prosodic features for computer-assisted language learning systems  
dc.type
info:eu-repo/semantics/article  
dc.type
info:ar-repo/semantics/artículo  
dc.type
info:eu-repo/semantics/publishedVersion  
dc.date.updated
2018-03-06T17:43:29Z  
dc.journal.volume
69  
dc.journal.pagination
31-45  
dc.journal.pais
Netherlands
dc.journal.ciudad
Amsterdam  
dc.description.fil
Fil: Ferrer, Luciana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. SRI International; United States
dc.description.fil
Fil: Bratt, Harry. SRI International; United States
dc.description.fil
Fil: Richey, Colleen. SRI International; United States
dc.description.fil
Fil: Franco, Horacio. SRI International; United States
dc.description.fil
Fil: Abrash, Victor. SRI International; United States
dc.description.fil
Fil: Precoda, Kristin. SRI International; United States
dc.journal.title
Speech Communication  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/url/http://www.sciencedirect.com/science/article/pii/S0167639315000151  
dc.relation.alternativeid
info:eu-repo/semantics/altIdentifier/doi/http://dx.doi.org/10.1016/j.specom.2015.02.002