Deep speech feature extraction method based on deep neural networks

Abstract: To improve the performance of continuous speech recognition systems, this paper applies a deep auto-encoder neural network to speech feature extraction. Sparse auto-encoders are stacked to form a deep auto-encoder (DAE), which extracts the essential features of the speech signal in two steps: greedy layer-wise pre-training followed by fine-tuning. The recognition system uses context-dependent triphone models, and the phoneme error rate (PER) is taken as the criterion of system performance. Simulation results show that, as measured by PER, the deep features extracted by the DAE perform better than both the traditional Mel-Frequency Cepstral Coefficient (MFCC) features and optimized MFCC features.
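As a rough illustration of stacking sparse auto-encoders with greedy layer-wise pre-training, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that each layer is a sigmoid sparse auto-encoder trained by full-batch gradient descent on squared reconstruction error plus weight decay and a KL-divergence sparsity penalty. The hyperparameters (`rho`, `beta`, `lam`, `lr`, `epochs`), the layer sizes, the helper names, and the random 39-dimensional "frames" standing in for normalized MFCC vectors are all hypothetical. The fine-tuning stage is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, n_hidden, rho=0.05, beta=3.0, lam=1e-4,
                             lr=0.5, epochs=200, seed=0):
    """Train one sparse auto-encoder layer on X (n_samples x n_features, values in [0, 1]).

    Loss = mean 0.5*||x_rec - x||^2 + (lam/2)*||W||^2 + beta * sum_j KL(rho || rho_hat_j).
    Returns only the encoder weights, which is all the stack needs.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    r = np.sqrt(6.0 / (d + n_hidden + 1))            # small uniform initialisation range
    W1 = rng.uniform(-r, r, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-r, r, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                      # hidden codes
        Xr = sigmoid(H @ W2 + b2)                     # reconstruction of the input
        rho_hat = np.clip(H.mean(axis=0), 1e-6, 1 - 1e-6)  # mean activation per hidden unit
        d2 = (Xr - X) * Xr * (1 - Xr)                 # output error through the sigmoid
        sparse_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))  # d KL / d rho_hat
        d1 = (d2 @ W2.T + sparse_grad) * H * (1 - H)  # hidden error incl. sparsity term
        W2 -= lr * (H.T @ d2 / n + lam * W2); b2 -= lr * d2.mean(axis=0)
        W1 -= lr * (X.T @ d1 / n + lam * W1); b1 -= lr * d1.mean(axis=0)
    return W1, b1

def pretrain_stack(X, layer_sizes, **kw):
    """Greedy layer-wise pre-training: each layer is trained on the previous layer's codes."""
    params, H = [], X
    for size in layer_sizes:
        W, b = train_sparse_autoencoder(H, size, **kw)
        params.append((W, b))
        H = sigmoid(H @ W + b)
    return params

def encode(X, params):
    """Pass frames through the stacked encoders to obtain the deep features."""
    H = X
    for W, b in params:
        H = sigmoid(H @ W + b)
    return H

# Hypothetical usage: 200 random frames in [0, 1] standing in for normalised 39-dim MFCCs.
frames = np.random.default_rng(1).random((200, 39))
params = pretrain_stack(frames, [64, 32, 13], epochs=100)
feats = encode(frames, params)
print(feats.shape)  # (200, 13)
```

In a full system, the pre-trained stack would then be fine-tuned (commonly by back-propagation through all layers), and the output of `encode` would replace MFCCs as the input features to the triphone-based recognizer.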