Abstract:
Bird call recognition takes preprocessed bird sounds as input and identifies the corresponding bird species with a network model. In real natural environments, recognition based on a single audio feature is limited: the feature obtained from preprocessing cannot fully describe the characteristics of bird calls, and the learning ability of the network model is poor. This paper presents a bird call recognition method based on feature fusion and an attention mechanism. First, Mel-frequency cepstral coefficients (MFCC) and power-normalized cepstral coefficients (PNCC) are extracted in the bird call preprocessing stage. Second, the two features are fused using mean and variance normalization to obtain a new fusion feature called MPFC. Then, with ResNet-50 as the backbone network, a coordinate attention mechanism is inserted into its residual module to obtain an improved attention residual network model called ResCA. Finally, the fusion feature is fed to ResCA, ResNet-50, ResNeSt-50, DenseNet-121, and EfficientNet-B0 for comparison on two datasets, Birdsdata and BirdCLEF. The results show that the fusion feature characterizes bird calls better than a single feature and improves the recognition rate. The improved network also achieves better recognition performance.
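
The fusion step can be illustrated with a minimal sketch. The abstract does not specify how the normalized features are combined, so this example assumes per-coefficient mean and variance normalization over time followed by stacking along the coefficient axis. The function names `mean_variance_normalize` and `fuse_features`, the 13-coefficient shapes, and the random stand-in data are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_variance_normalize(features, eps=1e-8):
    """Scale each coefficient (row) to zero mean and unit variance over time."""
    mean = features.mean(axis=1, keepdims=True)
    std = features.std(axis=1, keepdims=True)
    return (features - mean) / (std + eps)


def fuse_features(mfcc, pncc):
    """Normalize both feature matrices, then stack them along the coefficient axis.

    Assumption: concatenation is used as the fusion operation; the paper's
    actual MPFC construction may differ.
    """
    if mfcc.shape[1] != pncc.shape[1]:
        raise ValueError("MFCC and PNCC must have the same number of frames")
    return np.concatenate(
        [mean_variance_normalize(mfcc), mean_variance_normalize(pncc)], axis=0
    )


# Random stand-ins for real features, shaped (coefficients, frames).
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=5.0, scale=3.0, size=(13, 200))
pncc = rng.normal(loc=-2.0, scale=0.5, size=(13, 200))

mpfc = fuse_features(mfcc, pncc)
print(mpfc.shape)
```

Normalizing before stacking puts both feature types on a comparable scale, so neither dominates the fused input purely because of its numeric range.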