Salient feature selection for speech emotion recognition with an attention-based LSTM

Hu Tingting; Feng Yaqin; Shen Lingjie; Wang Wei

doi:10.16300/j.cnki.1000-3630.2019.04.010

The salient feature selection by attention mechanism based LSTM in speech emotion recognition

Abstract

Conventional speech emotion recognition relies on acoustic feature sets that are large and contain many features unrelated to emotion, so selecting the emotion-relevant features is important. This study combines an attention mechanism with a Long Short Term Memory (LSTM) network, selects features according to the attention weights, and runs experiments on two datasets. The results show that: (1) the attention-based LSTM raises the recognition rate by 5.4% over the LSTM alone, so the algorithm effectively improves recognition performance; (2) the attention mechanism is an effective feature selection method: it selects a subset of acoustic features with practical physical meaning, and this subset improves recognition accuracy over the original common feature set while reducing its dimensionality; (3) analysis of the selected features shows that voiced segment length, unvoiced segment length, Mel-frequency cepstral coefficients (MFCC), and fundamental frequency (F0) are strongly related to emotion recognition.
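The idea of selecting features by attention weight can be illustrated with a minimal sketch. It is not the authors' implementation: in the paper the weights come from a trained attention-based LSTM, whereas here the raw scores are made-up numbers, the feature names (including `jitter`) are hypothetical, and averaging weights over utterances before taking the top `k` is an assumed selection rule.

```python
import math

def softmax(scores):
    """Normalize raw attention scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def select_features(names, score_rows, k):
    """Average per-utterance attention weights and keep the k highest-weighted features."""
    weight_rows = [softmax(row) for row in score_rows]
    n_rows = len(weight_rows)
    mean_weights = [
        sum(row[j] for row in weight_rows) / n_rows for j in range(len(names))
    ]
    ranked = sorted(zip(names, mean_weights), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical feature names and invented raw attention scores (one row per utterance).
names = ["voiced_segment_length", "unvoiced_segment_length", "mfcc_1", "f0_mean", "jitter"]
score_rows = [
    [2.0, 1.5, 1.0, 1.2, -1.0],
    [1.8, 1.6, 0.9, 1.1, -0.5],
]

selected = select_features(names, score_rows, k=3)
for name, weight in selected:
    print(f"{name}: {weight:.3f}")
```

A classifier would then be retrained on the selected subset only, which is how a reduced feature set could be compared against the full one.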
