Time-frequency feature extraction methods for urban environmental noise sources
Abstract
To meet the demand for precise and efficient control of urban environmental noise, a high-performance technology for automatically identifying noise sources is needed. Sound feature extraction is an important foundation of environmental noise identification and affects the performance of the subsequent identification model. This study compared how several popular time-frequency feature extraction methods affect the performance of a proposed Convolutional Recurrent Neural Network (CRNN) model for identifying urban environmental noise sources. The methods were linear-scaled Short-Time Fourier Transform (STFT), two Mel-scaled STFT variants (Mel-STFT and Log-Mel-scaled STFT, LM-STFT), and Constant-Q Transform (CQT). On this basis, a CRNN model using LM-STFT features was proposed for urban environmental noise source identification. The experimental results indicated that: 1) the feature extraction method affects the performance of the environmental noise source identification model; 2) all four types of time-frequency features performed well in the proposed CRNN model, each with an accuracy of over 88.1%, confirming their effectiveness for urban environmental noise source identification; 3) the three features that incorporate filter bank transformations (Mel-STFT, LM-STFT, and CQT) performed significantly better than the linear-scaled STFT, with accuracies exceeding 91.3%; 4) the Mel-scaled STFT methods performed noticeably better than CQT. Among them, the CRNN model based on LM-STFT features achieved an accuracy of 94.0%, demonstrating significantly superior performance in the environmental noise source identification task.
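The abstract names four time-frequency representations but not how they are computed or parameterized. As a rough illustration only, the sketch below derives each one from a raw waveform with NumPy. The frame length (1024), hop (512), 64 Mel bands, the HTK-style Mel formula, the CQT range (84 bins from 32.70 Hz, 12 per octave) and the naive per-bin CQT loop are all assumptions chosen for readability, not the settings used in this study; practical pipelines would more likely use an optimized library implementation.

```python
import numpy as np

# All parameter values below are illustrative assumptions; the abstract does
# not report the frame length, hop size, number of Mel bands or CQT range.
SR = 22050      # sampling rate in Hz (assumed)
N_FFT = 1024    # STFT frame length in samples (assumed)
HOP = 512       # hop size in samples (assumed)
N_MELS = 64     # number of Mel bands (assumed)


def stft_mag(y, n_fft=N_FFT, hop=HOP):
    """Linear-scaled STFT magnitude, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.abs(np.fft.rfft(frames, axis=0))


def hz_to_mel(f):
    # HTK-style Mel formula (one common convention, assumed here)
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=SR, n_fft=N_FFT, n_mels=N_MELS):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m:m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def mel_stft(y):
    """Mel-STFT: power spectrogram projected onto the Mel filter bank."""
    return mel_filterbank() @ stft_mag(y) ** 2


def log_mel_stft(y, eps=1e-10):
    """LM-STFT: Mel-STFT in decibels (eps avoids taking the log of zero)."""
    return 10.0 * np.log10(mel_stft(y) + eps)


def cqt_mag(y, sr=SR, fmin=32.70, n_bins=84, bins_per_octave=12, hop=HOP):
    """Naive constant-Q transform magnitude, shape (n_bins, n_frames).

    Each bin k uses its own complex kernel of length N_k = ceil(Q * sr / f_k),
    so every bin spans the same number of cycles (constant Q). This direct
    double loop is slow and meant only for illustration.
    """
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    freqs = fmin * 2.0 ** (np.arange(n_bins) / bins_per_octave)
    lengths = np.ceil(q * sr / freqs).astype(int)
    n_max = int(lengths.max())  # the lowest bin has the longest kernel
    n_frames = 1 + (len(y) - n_max) // hop
    out = np.zeros((n_bins, n_frames))
    for k, (f_k, n_k) in enumerate(zip(freqs, lengths)):
        n = np.arange(n_k)
        kernel = np.hanning(n_k) * np.exp(-2j * np.pi * f_k * n / sr) / n_k
        offset = (n_max - n_k) // 2  # centre shorter kernels in the longest window
        for t in range(n_frames):
            start = t * hop + offset
            out[k, t] = np.abs(np.dot(y[start:start + n_k], kernel))
    return out
```

Run on a one-second 440 Hz sine sampled at 22,050 Hz, this sketch gives a 513 by 42 linear STFT magnitude, 64 by 42 Mel-STFT and LM-STFT matrices, and a CQT whose strongest bin is bin 45 (about 440 Hz). Both the Mel filter bank and the CQT give finer frequency resolution at low frequencies than at high ones, whereas the linear STFT spaces its bins uniformly.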