Abstract:
A multi-task learning residual convolutional network (MTL-ResCNN) is proposed for sound source localization and depth estimation, addressing the difficulty of estimating source depth with localization methods based on planar microphone arrays. The network has two output branches, one for sound source localization and one for depth estimation, and takes functional beamforming (FBF) imaging results as input features. A high-resolution, side-lobe-free target map is designed as the network label to improve on the source identification performance of functional beamforming. The distance between the source plane and the measurement array is uniformly discretized into depth classes, and the source depth is estimated from the depth-class probabilities output by the network. Simulation results show that, across the five frequencies in the test set, the proposed method achieves a localization accuracy of no less than 96.95%, an average distance error of less than 0.0034 m, and a classification accuracy of more than 99.05%, indicating that it can accurately locate the sound source and estimate the source depth. In addition, the method identifies sound sources effectively and generalizes well even at a low signal-to-noise ratio.
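
The depth-estimation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the depth range (0.5 m to 1.5 m), the number of classes (11), the helper names, and the arg-max decision rule are all assumptions chosen for illustration, since the abstract only states that the depth is uniformly discretized and estimated from the class probabilities.

```python
import numpy as np

# Hypothetical values: the abstract does not give the depth range or class count.
Z_MIN, Z_MAX, N_CLASSES = 0.5, 1.5, 11


def depth_class_centers(z_min, z_max, n_classes):
    """Uniformly discretize the array-to-source-plane distance into depth classes."""
    return np.linspace(z_min, z_max, n_classes)


def depth_to_class(z, centers):
    """Label a true depth with the index of the nearest depth class."""
    return int(np.argmin(np.abs(centers - z)))


def softmax(logits):
    """Numerically stable softmax over the depth-branch outputs."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def estimate_depth(logits, centers):
    """Assumed decision rule: pick the depth of the most probable class."""
    probs = softmax(np.asarray(logits, dtype=float))
    return float(centers[int(np.argmax(probs))])


centers = depth_class_centers(Z_MIN, Z_MAX, N_CLASSES)
label = depth_to_class(0.82, centers)   # training label for a source at 0.82 m

# Stand-in for the network's depth-branch output, peaked at the true class.
fake_logits = np.zeros(N_CLASSES)
fake_logits[label] = 5.0
estimated = estimate_depth(fake_logits, centers)
```

With a uniform grid of this kind, the depth resolution is bounded by the class spacing (0.1 m in this hypothetical setup), so a finer grid trades more output classes for a smaller quantization error.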