Bird call recognition based on feature fusion and attention mechanism

Abstract: Bird call recognition takes preprocessed recordings of various bird species as input and uses a network model to identify the corresponding species. In real-world environments, bird call recognition faces two problems: a single audio feature cannot fully describe the characteristics of bird calls, and the network model's ability to learn features is limited. This paper presents a bird call recognition method based on feature fusion and an attention mechanism. First, Mel-frequency cepstral coefficients (MFCC) and power-normalized cepstral coefficients (PNCC) are extracted during feature extraction. Second, the two features are fused using mean and variance normalization to obtain a new fusion feature, MPFC. Then, with ResNet-50 as the backbone network, a lightweight coordinate attention mechanism is inserted into its residual modules, yielding an improved model, the residual coordinate attention network (ResCA). Finally, the fusion feature is fed into ResCA, ResNet-50, ResNeSt-50, DenseNet-121, and EfficientNet-B0 for comparative experiments on two datasets, Birdsdata and BirdCLEF. The results show that the fusion feature has better representational ability than either single feature and improves the recognition rate to some extent, and that the improved network also achieves good recognition performance.
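The abstract does not specify how the two normalized feature matrices are combined into MPFC. The NumPy sketch below shows one plausible reading. It applies per-coefficient mean and variance normalization over the frames of a recording, then concatenates the results along the coefficient axis. The concatenation step, the function names `mean_variance_normalize` and `fuse_mpfc`, and the random matrices standing in for real MFCC and PNCC extraction are all assumptions for illustration.

```python
import numpy as np


def mean_variance_normalize(feat: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-coefficient mean and variance normalization.

    feat has shape (num_coeffs, num_frames). Each row (one coefficient) is
    shifted to zero mean and scaled to unit variance across frames.
    """
    mean = feat.mean(axis=1, keepdims=True)
    std = feat.std(axis=1, keepdims=True)
    return (feat - mean) / (std + eps)


def fuse_mpfc(mfcc: np.ndarray, pncc: np.ndarray) -> np.ndarray:
    """Hypothetical MPFC-style fusion: normalize each feature, then stack them.

    Assumption: both matrices share the same frame count and are concatenated
    along the coefficient axis; the paper's exact fusion rule is not given here.
    """
    if mfcc.shape[1] != pncc.shape[1]:
        raise ValueError("MFCC and PNCC must have the same number of frames")
    return np.concatenate(
        [mean_variance_normalize(mfcc), mean_variance_normalize(pncc)], axis=0
    )


# Random stand-ins with deliberately different scales (not real features).
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=-20.0, scale=15.0, size=(20, 100))
pncc = rng.normal(loc=3.0, scale=0.5, size=(20, 100))
fused = fuse_mpfc(mfcc, pncc)
print(fused.shape)  # (40, 100)
```

Normalizing before fusing matters because MFCC relies on log compression while PNCC relies on a power-law nonlinearity. Their raw coefficients therefore sit on different scales, and without normalization one feature could dominate the fused input simply because its values are larger.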

     

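The coordinate attention step mentioned in the abstract can be sketched as a NumPy forward pass on a single feature map of shape (channels, height, width). For audio, height and width would be the frequency and time axes. The block first average-pools along each spatial direction separately and passes the concatenated result through a shared reducing 1x1 convolution with a hard-swish nonlinearity. It then splits the output back into height and width parts and produces two sigmoid gates that re-weight the input. This is only an illustration under several assumptions. Batch normalization is omitted, weights are random rather than trained, and the reduced width `max(8, C // 32)` follows a common default. The placement of the block on the residual branch before the skip addition is also a guess, because the abstract only says the mechanism is inserted into ResNet-50's residual modules. `init_ca_params`, `residual_block_with_ca`, and the single 1x1 stand-in for the bottleneck branch are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def h_swish(x):
    # Hard-swish nonlinearity: x * ReLU6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


def coordinate_attention(x, w1, b1, wh, bh, ww, bw):
    """Coordinate attention on x of shape (C, H, W); returns (y, g_h, g_w).

    w1: (M, C) shared 1x1 conv reducing C channels to M.
    wh, ww: (C, M) 1x1 convs mapping back to C channels for each direction.
    Batch normalization after w1 is omitted in this sketch.
    """
    _, H, _ = x.shape
    z_h = x.mean(axis=2)                      # (C, H): pool over width
    z_w = x.mean(axis=1)                      # (C, W): pool over height
    z = np.concatenate([z_h, z_w], axis=1)    # (C, H + W)
    f = h_swish(w1 @ z + b1[:, None])         # (M, H + W)
    f_h, f_w = f[:, :H], f[:, H:]
    g_h = sigmoid(wh @ f_h + bh[:, None])     # (C, H) gate along height
    g_w = sigmoid(ww @ f_w + bw[:, None])     # (C, W) gate along width
    # y[c, i, j] = x[c, i, j] * g_h[c, i] * g_w[c, j]
    y = x * g_h[:, :, None] * g_w[:, None, :]
    return y, g_h, g_w


def init_ca_params(C, reduction=32, min_mid=8, rng=None):
    """Random (untrained) parameters for the hypothetical CA block."""
    if rng is None:
        rng = np.random.default_rng()
    M = max(min_mid, C // reduction)
    return (rng.normal(0.0, 0.1, (M, C)), np.zeros(M),
            rng.normal(0.0, 0.1, (C, M)), np.zeros(C),
            rng.normal(0.0, 0.1, (C, M)), np.zeros(C))


def residual_block_with_ca(x, branch, ca_params):
    """Assumed placement: CA re-weights the residual branch before the skip add."""
    out, _, _ = coordinate_attention(branch(x), *ca_params)
    return np.maximum(x + out, 0.0)           # ReLU after the identity shortcut


rng = np.random.default_rng(0)
x = rng.normal(size=(64, 16, 32))             # (channels, frequency, time)
params = init_ca_params(64, rng=rng)
branch_w = rng.normal(0.0, 0.1, (64, 64))


def branch(t):
    # Stand-in 1x1 conv for ResNet-50's 1x1-3x3-1x1 bottleneck branch.
    return np.einsum("oc,chw->ohw", branch_w, t)


y = residual_block_with_ca(x, branch, params)
print(y.shape)  # (64, 16, 32)
```

Pooling each direction separately, rather than globally as in squeeze-and-excitation, lets the gates keep positional information along one axis. For spectrogram-like inputs, that means a position along frequency and a position along time.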