Time-scale modification of segmented speech

Abstract: Conventional synchronized overlap-add (SOLA) time-scale modification degrades noticeably when the modification rate is high and the sampling rate is low: the synthesized speech becomes markedly less intelligible. The cause is that SOLA treats all parts of the signal alike and ignores the perceptually important ones, so heavy compression or expansion damages how the speech is perceived. This paper proposes a segmented time-scale modification method built on the observation that the rate of spectral change and the signal energy both play a critical role in speech perception. The method first partitions the speech into perceptually sensitive, sub-sensitive, and non-sensitive portions according to these two measures, then applies SOLA to each portion with its own modification rate. A subjective preference test indicates that the proposed method outperforms conventional SOLA, and experiments show a marked improvement in speech quality at high modification rates and low sampling rates.
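
The partition-then-scale idea can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the function names `classify_frames` and `allocate_factors`, the two thresholds, the rule mapping energy and spectral change to the three classes, and the per-class weights are all hypothetical. The sketch only shows how per-class duration factors might be chosen so that their frame-weighted mean matches a requested overall factor; the SOLA step itself is not shown.

```python
# Illustrative sketch only. Thresholds, class rules, and weights are
# assumptions for demonstration, not values or rules from the paper.

def classify_frames(energies, fluxes, energy_thresh, flux_thresh):
    """Label each frame from its energy and its spectral-change measure."""
    labels = []
    for energy, flux in zip(energies, fluxes):
        high_energy = energy >= energy_thresh
        fast_change = flux >= flux_thresh
        if high_energy and fast_change:
            labels.append("sensitive")        # assumed: both cues present
        elif not high_energy and not fast_change:
            labels.append("non_sensitive")    # assumed: both cues absent
        else:
            labels.append("sub_sensitive")    # assumed: exactly one cue present
    return labels


def allocate_factors(labels, overall_factor, weights):
    """Choose one duration factor per class.

    Each class c gets factor_c = 1 + weights[c] * k, where k is solved so
    that the frame-weighted mean of the factors equals overall_factor.
    A smaller weight keeps a class closer to its original duration.
    """
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    weighted = sum(n * weights[c] for c, n in counts.items())
    if weighted == 0:
        raise ValueError("no frames, or all weights are zero")
    k = (overall_factor - 1.0) * len(labels) / weighted
    factors = {c: 1.0 + weights[c] * k for c in counts}
    if min(factors.values()) <= 0:
        raise ValueError("requested compression is too strong for these weights")
    return factors


# Toy per-frame measurements (made-up numbers).
energies = [0.9, 0.8, 0.1, 0.1, 0.7, 0.2]
fluxes = [0.9, 0.1, 0.1, 0.2, 0.8, 0.9]
labels = classify_frames(energies, fluxes, energy_thresh=0.5, flux_thresh=0.5)

# Hypothetical weights: sensitive frames are modified least.
weights = {"sensitive": 0.5, "sub_sensitive": 1.0, "non_sensitive": 1.5}
factors = allocate_factors(labels, overall_factor=1.5, weights=weights)
```

With these toy inputs each class holds two frames, and a 1.5x overall expansion is split into a mild stretch for the sensitive frames and a stronger one for the non-sensitive frames. A real system would derive the energy and spectral-change measures from the signal and pass each segment to SOLA with its own factor.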