Target-speaker pitch trajectory extraction from mixed speech using harmonic saliency and speaker timbre features

Abstract: Extracting the pitch trajectory of a target speaker from mixed speech is a key technology for applications such as speech monitoring, voice access control, and dialog management. To improve the accuracy of pitch trajectory tracking, strengthen resistance to octave errors, and reduce system complexity, the multipitch estimation is built around the harmonic product spectrum, while both octave error correction and pitch grouping take the vowel segment as their basic unit and combine harmonic saliency with the speaker's timbre features. Experiments on the MIREX2005 speech data set show that all four MIREX multipitch estimation performance metrics exceed 75%, and that the accuracy of pitch grouping in mixed speech can reach 92%.
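For readers unfamiliar with the harmonic product spectrum named in the abstract, the following is a minimal sketch of the textbook single-pitch form of the technique. It is not the multipitch estimator evaluated in the paper, and it omits octave correction and pitch grouping. The function name `hps_pitch`, its parameters, and the search range of 60 to 500 Hz are assumptions made for illustration only.

```python
import numpy as np


def hps_pitch(frame, sample_rate, num_harmonics=4, fmin=60.0, fmax=500.0):
    """Estimate one fundamental frequency (Hz) from a single frame.

    Hypothetical helper: a textbook harmonic product spectrum, not the
    paper's multipitch method.
    """
    # Window the frame and zero-pad the FFT for a finer frequency grid.
    windowed = frame * np.hanning(len(frame))
    n_fft = 4 * len(frame)
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))

    # Keep only bins whose highest considered harmonic is still in range.
    limit = len(spectrum) // num_harmonics
    hps = spectrum[:limit].copy()

    # Multiply the spectrum by copies of itself downsampled by 2, 3, ...
    # Bin k of the product then combines the magnitudes at k, 2k, 3k, ...
    for h in range(2, num_harmonics + 1):
        hps *= spectrum[::h][:limit]

    # Search for the largest product inside the assumed pitch range.
    freqs = np.arange(limit) * sample_rate / n_fft
    mask = (freqs >= fmin) & (freqs <= fmax)
    if not mask.any():
        raise ValueError("no frequency bins inside the search range")
    best = int(np.argmax(np.where(mask, hps, -np.inf)))
    return float(freqs[best])
```

The product peaks where energy is present at a candidate frequency and at all of its integer multiples, which is why a bin at half the true pitch scores poorly: its odd multiples fall between the harmonics. A bin at twice the true pitch can still score well when the upper harmonics are strong, which is one source of the octave errors the abstract refers to.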