Abstract:
In practical applications, the performance of ship-radiated noise recognition systems is often degraded by underwater noise from biological activity, other ships operating nearby, and natural phenomena such as wind and waves. This paper proposes an end-to-end recognition method with learnable parameters that integrates time-frequency analysis and image processing techniques to enhance the feature extraction capability of underwater target recognition systems. The method uses a 1D convolutional neural network (CNN) to implement the short-time Fourier transform (STFT) kernel and a 2D CNN to implement 2D Gabor filter banks, and it constructs Gabor convolutional layers that process Log-Mel spectrograms to generate optimized time-frequency representations. During training, the system learns the parameters of the conversion from waveform to time-frequency representation, enabling it to capture more information about the target ship-radiated noise. Results on the DeepShip dataset indicate that, compared with the reference algorithm CNN10, the proposed method improves recognition accuracy from 65.49% to 72.24% while increasing the total number of parameters by only 1.06 million, and that it can reuse PANNs (Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition) through transfer learning. Together, these properties improve training speed and recognition accuracy, and the method demonstrates better robustness under different interference factors, such as natural environmental noise, low cutoff frequency, and time-frequency variation.
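As a rough illustration of how an STFT can be expressed as a bank of 1D convolution filters, and of what a single 2D Gabor kernel looks like, the following framework-free sketch may help. It is not the paper's implementation: the function names (`stft_conv_kernels`, `conv1d`, `power_spectrogram`, `gabor_kernel`), the Hann window, the isotropic Gaussian envelope, and all sizes are assumptions chosen for illustration. In a trainable model, values like these would presumably serve only as the initial weights of convolutional layers and then be updated by backpropagation.

```python
import math

def stft_conv_kernels(n_fft):
    """Build Hann-windowed cosine/sine filters, one pair per frequency bin.

    A 1D convolution layer initialised with these weights reproduces the
    STFT before training changes them (Hann window is an assumption).
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    n_bins = n_fft // 2 + 1
    real = [[window[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
            for k in range(n_bins)]
    imag = [[-window[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
            for k in range(n_bins)]
    return real, imag

def conv1d(signal, kernel, stride):
    """Strided 'valid' cross-correlation, which is what a conv layer computes."""
    width = len(kernel)
    return [sum(kernel[i] * signal[start + i] for i in range(width))
            for start in range(0, len(signal) - width + 1, stride)]

def power_spectrogram(signal, n_fft, hop):
    """Power spectrogram indexed as [bin][frame], computed by convolution only."""
    real, imag = stft_conv_kernels(n_fft)
    spec = []
    for k_real, k_imag in zip(real, imag):
        re = conv1d(signal, k_real, hop)
        im = conv1d(signal, k_imag, hop)
        spec.append([r * r + i * i for r, i in zip(re, im)])
    return spec

def gabor_kernel(size, sigma, theta, wavelength):
    """Real part of a 2D Gabor filter; `size` is assumed to be odd."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * x_rot / wavelength))
        kernel.append(row)
    return kernel

# Example: a tone centred on bin 8 of a 64-point transform.
n_fft, hop = 64, 32
tone = [math.cos(2 * math.pi * 8 * n / n_fft) for n in range(256)]
spec = power_spectrogram(tone, n_fft, hop)
peak_bin = max(range(len(spec)), key=lambda k: spec[k][0])

# Example: one 5x5 Gabor kernel oriented at 45 degrees.
kernel = gabor_kernel(size=5, sigma=1.5, theta=math.pi / 4, wavelength=4.0)
```

Here `peak_bin` is the frequency bin carrying the most energy in the first frame, which for this test tone is the bin it was generated at. A bank of such Gabor kernels with different `theta` and `wavelength` values would be convolved over the Log-Mel spectrogram as if it were an image.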