DBLSTM-DCNN based conversion from bone conduction voice to air conduction voice
-
Abstract
Speech signals transmitted through air are severely distorted in strong-noise environments. To address this problem, this paper proposes a model for converting bone conduction voice to air conduction voice based on a deep bidirectional long short-term memory network combined with a deep convolutional neural network (DBLSTM-DCNN). The model uses the DBLSTM layers to capture and retain the hidden information of adjacent consecutive frames, and then uses the DCNN layers to extract feature information in the frequency domain. This approach addresses the lack of naturalness in the converted voice, which is caused by the severe loss of high-frequency components in bone conduction voice. Experimental results on objective measures, namely perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and log-spectral distance (LSD), confirm that the model converts bone conduction voice to air conduction voice effectively.
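Of the three objective measures, LSD is the simplest to state directly. The sketch below shows one commonly used form: the root-mean-square difference, in dB, between the log power spectra of the reference and converted signals, averaged over frames. The abstract does not specify the exact LSD variant used in the experiments, so the function name `log_spectral_distance`, the `eps` floor, and the list-of-frames input layout are assumptions made for illustration only.

```python
import math


def log_spectral_distance(ref_power, est_power, eps=1e-10):
    """Frame-averaged log-spectral distance in dB (illustrative definition).

    ref_power, est_power: sequences of frames, each frame a sequence of
    non-negative power values per frequency bin. Shapes must match.
    eps: small floor (an assumption) that avoids log10(0) on empty bins.
    """
    if len(ref_power) != len(est_power) or len(ref_power) == 0:
        raise ValueError("inputs must be non-empty with equal frame counts")

    per_frame = []
    for ref_frame, est_frame in zip(ref_power, est_power):
        if len(ref_frame) != len(est_frame) or len(ref_frame) == 0:
            raise ValueError("frames must be non-empty with equal bin counts")
        # Squared difference of the log power spectra, bin by bin.
        squared = [
            (10.0 * math.log10(r + eps) - 10.0 * math.log10(e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        # Root mean square over frequency bins for this frame.
        per_frame.append(math.sqrt(sum(squared) / len(squared)))

    # Average over frames.
    return sum(per_frame) / len(per_frame)
```

Under this definition a lower value means the converted spectrum is closer to the air conduction reference, and identical spectra give a distance of zero. In practice the power spectra would come from a short-time Fourier transform of time-aligned reference and converted signals.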