Abstract:
The traditional approaches to speech emotion recognition use the acoustic features characterized by large amount of data and redundancy. So, it is of great significance to choose the important phonetic features related to emotion. In this study, the attention mechanism is combined with Long Short Term Memory (LSTM) to conduct feature selection according to the attention parameters. The results show that:(1) the recognition rate of the attention mechanism based LSTM is increased by 5.4% compared with the single LSTM model, so this algorithm effectively improves the recognition accuracy; (2) the attention mechanism is an effective feature selection method, by which, the subsets of acoustic features with practical physical significance can be selected to improve the recognition accuracy and reduce the dimension compared with the original common feature set; (3) according to the selection results, the acoustic features are analyzed, and it is found that the emotion recognition is correlated with the features of voiced segment length, unvoiced segment length, fundamental frequency F0 and Mel-frequency cepstral coefficients.