
A text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion

  • Abstract: At present, speaker verification technology for financial payment in China has not been widely adopted at the societal level, owing to the lack of datasets and to recognition techniques that do not yet meet security requirements. To address these problems, this paper records the SHALCAS-WXSD22B dataset for Chinese digit-string text-dependent speaker verification, intended for research on digit-string voiceprint recognition in financial payment scenarios, and proposes a text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion, which improves the reliability of text-dependent speaker verification. In experiments on the digit-string SHALCAS-WXSD22B-d006 and SHALCAS-WXSD22B-d007 corpora, the proposed framework achieves best equal error rates of 0.88% and 1.05%, relative reductions of 17% and 20% compared with the ECAPA-TDNN baseline model, and meets the voiceprint recognition security requirements of payment scenarios. The experimental results show that the proposed framework not only achieves better recognition accuracy and security performance, but also improves the performance of other log-Mel recognition models used within the framework, including ResNet34.
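The paper does not detail its fusion scheme here; as a minimal sketch of one plausible variant, F0 can be estimated per frame and appended as an extra row to the log-Mel feature matrix. The function names `autocorr_f0` and `fuse_features` and all parameter values below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def autocorr_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Crude per-frame F0 estimate via autocorrelation (illustrative only)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sr / fmax)                         # shortest plausible pitch period
    hi = min(int(sr / fmin), len(ac) - 1)       # longest plausible pitch period
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

def fuse_features(log_mel, f0):
    """Append a log-compressed F0 track as one extra row to the log-Mel matrix."""
    f0_row = np.log1p(np.asarray(f0, dtype=float))[np.newaxis, :]  # shape (1, T)
    return np.vstack([log_mel, f0_row])                            # (n_mels + 1, T)

# Toy usage: a 200 Hz sine frame and a dummy 80-band log-Mel matrix of 10 frames.
sr = 16000
t = np.arange(1024) / sr
f0 = autocorr_f0(np.sin(2 * np.pi * 200.0 * t), sr)      # close to 200 Hz
fused = fuse_features(np.zeros((80, 10)), np.full(10, f0))  # shape (81, 10)
```

In practice a robust F0 tracker (e.g. pYIN) with voiced/unvoiced handling would replace the autocorrelation toy above; the fusion step itself is just a frame-aligned concatenation.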

     

    Abstract: The speaker verification technique for financial payments in China has not been widely adopted at the societal level, due to the lack of datasets and to models that do not meet security requirements. In this paper, a text-dependent speaker verification framework based on transfer learning and fundamental frequency feature fusion is proposed to address these problems on the self-recorded SHALCAS-WXSD22B dataset. In experiments on the digit-string SHALCAS-WXSD22B-d006 and SHALCAS-WXSD22B-d007 corpora, the best equal error rates achieved by the proposed framework are 0.88% and 1.05%. Compared with the ECAPA-TDNN baseline model, this corresponds to relative equal error rate reductions of 17% and 20%, and the framework achieves the security indicators required in the field of financial payments. The experimental results show that the proposed method not only has better recognition accuracy and higher security performance than the baseline methods, but can also be applied to other log-Mel models, including ResNet34.
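For reference, the equal error rate (EER) quoted above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be computed from verification trial scores (pure NumPy; the toy scores below are illustrative, not from the paper's experiments):

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Equal error rate: the threshold where false acceptance == false rejection."""
    thresholds = np.sort(np.unique(np.concatenate([genuine_scores, impostor_scores])))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])    # false rejects
    idx = np.argmin(np.abs(far - frr))  # closest crossing on the discrete sweep
    return (far[idx] + frr[idx]) / 2

# Toy usage: perfectly separated scores yield an EER of 0.
genuine = np.array([0.9, 0.8, 0.85, 0.7, 0.95])
impostor = np.array([0.1, 0.4, 0.2, 0.35, 0.05])
eer = compute_eer(genuine, impostor)  # → 0.0
```

A "relative reduction of 17%" in this metric means the EER shrinks by 17% of the baseline's value (e.g. from about 1.06% to 0.88%), not by 17 percentage points.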

     

