Abstract:
Tracking the pitch trajectory of a target speaker in hybrid speech is important for speech monitoring, voice access, and dialog management. To improve the accuracy of pitch trajectory tracking and strengthen octave error suppression while reducing system complexity, the harmonic product spectrum is used for multipitch estimation. Both octave error correction and pitch grouping operate on vowel segments as the basic unit and use harmonic saliency and the speaker's timbre features. In an evaluation on the MIREX2005 speech data set, all four performance indexes of the multipitch estimation exceed 75%, and the accuracy of pitch grouping in hybrid speech reaches up to 92%.
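
As an illustration of the harmonic product spectrum mentioned above, the following is a minimal sketch of the textbook single-pitch form: the magnitude spectrum is multiplied by copies of itself downsampled by 2, 3, ..., so that energy at the harmonics reinforces the bin of the fundamental. It is not the multipitch estimator evaluated in this work, and it omits the vowel-segment octave correction and pitch grouping. The function name `hps_pitch`, the Hann window, the zero-padding factor, the number of harmonics, and the 60 to 500 Hz search band are all assumptions chosen for the example.

```python
import numpy as np

def hps_pitch(frame, sample_rate, num_harmonics=4, fmin=60.0, fmax=500.0):
    """Estimate one fundamental frequency (Hz) from a single frame.

    Hypothetical helper: parameter names and defaults are illustrative
    assumptions, not values taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)

    # Window the frame and zero-pad the FFT for finer frequency spacing.
    windowed = frame * np.hanning(len(frame))
    n_fft = 4 * len(frame)
    spectrum = np.abs(np.fft.rfft(windowed, n=n_fft))

    # Keep only the bins for which every downsampled copy is defined.
    limit = len(spectrum) // num_harmonics
    hps = spectrum[:limit].copy()

    # Multiply by the spectrum downsampled by h = 2..num_harmonics, so
    # bin k accumulates the magnitudes at bins k, 2k, 3k, ...
    for h in range(2, num_harmonics + 1):
        hps *= spectrum[::h][:limit]

    # Restrict the peak search to the assumed pitch range.
    freqs = np.arange(limit) * sample_rate / n_fft
    in_band = (freqs >= fmin) & (freqs <= fmax)
    peak_bin = int(np.argmax(np.where(in_band, hps, 0.0)))
    return float(freqs[peak_bin])
```

Calling `hps_pitch(frame, 16000)` on a voiced frame returns a single candidate pitch in Hz. Because sub-multiples and multiples of the true pitch can still produce competing peaks, a complete system needs a separate octave error correction stage such as the one summarized above.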