Speech Emotion Recognition Using Convolutional-Recurrent Neural Networks with Attention Model

YAWEI MU, LUIS A. HERNÁNDEZ GÓMEZ, ANTONIO CANO MONTES, CARLOS ALCARAZ MARTÍNEZ, XUETIAN WANG, HONGMIN GAO

Abstract


Speech Emotion Recognition (SER) plays an important role in human-computer interaction and assistive technologies. In this paper, a new method is proposed that uses distributed Convolutional Neural Networks (CNN) to automatically learn affect-salient features from raw spectral information, and then applies a Bidirectional Recurrent Neural Network (BRNN) to capture the temporal dependencies in the CNN output. Finally, an attention mechanism is applied to the output sequence of the BRNN to focus on the emotion-pertinent parts of an utterance. This attention mechanism not only improves classification accuracy but also makes the model more interpretable. Experimental results show that this approach achieves 64.08% weighted accuracy and 56.41% unweighted accuracy for four-class emotion classification on the IEMOCAP dataset, outperforming previously reported results on this dataset.
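The pipeline described above (CNN feature extraction over the spectrogram, a bidirectional RNN over time, and attention-weighted pooling before classification) can be sketched as follows. This is an illustrative PyTorch implementation under assumed hyperparameters (layer counts, filter sizes, and hidden dimensions are placeholders, not the authors' exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CRNNWithAttention(nn.Module):
    """Hypothetical sketch of the CNN -> BRNN -> attention pipeline;
    all layer sizes are illustrative, not the paper's exact settings."""

    def __init__(self, n_mels=40, rnn_hidden=64, n_emotions=4):
        super().__init__()
        # CNN learns affect-salient features from raw spectral frames
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),            # pool frequency axis, keep time
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        feat_dim = 32 * (n_mels // 4)
        # Bidirectional RNN models temporal dynamics of the CNN features
        self.brnn = nn.GRU(feat_dim, rnn_hidden, batch_first=True,
                           bidirectional=True)
        # Attention scores each time step's relevance to the emotion
        self.attn = nn.Linear(2 * rnn_hidden, 1)
        self.classifier = nn.Linear(2 * rnn_hidden, n_emotions)

    def forward(self, spec):                 # spec: (batch, 1, n_mels, time)
        h = self.conv(spec)                  # (batch, 32, n_mels // 4, time)
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)
        out, _ = self.brnn(h)                # (batch, time, 2 * rnn_hidden)
        weights = F.softmax(self.attn(out), dim=1)   # (batch, time, 1)
        context = (weights * out).sum(dim=1)         # attention-weighted pooling
        return self.classifier(context), weights.squeeze(-1)

model = CRNNWithAttention()
logits, attn = model(torch.randn(2, 1, 40, 100))
print(logits.shape, attn.shape)
```

The returned attention weights sum to 1 over time for each utterance, which is what allows them to be inspected for interpretability: high-weight frames indicate the parts of the utterance the classifier treated as emotion-pertinent.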

Keywords


Speech emotion recognition, Distributed convolution neural networks, Bidirectional recurrent neural networks, Attention mechanism


DOI
10.12783/dtcse/cii2017/17273
