Abstract:
To address the class imbalance prevalent in Voice Pathology Detection (VPD), this study introduces a pathological voice detection model that oversamples the minority class with the Synthetic Minority Oversampling Technique (SMOTE). SMOTE transforms the original imbalanced dataset into a class-balanced one, and the Edited Nearest Neighbor (ENN) method then removes the noisy samples that SMOTE oversampling generates, thereby improving the classification performance of traditional models. Mel-frequency cepstral coefficients (MFCC) and linear predictive coefficients (LPC) are extracted from the original voice signals and fed into a convolutional neural network (CNN), which is trained, validated, and tested on the class-balanced dataset to assess VPD classification performance. The study also compares the VPD model's performance on the class-balanced and original datasets. Because the original dataset is imbalanced, accuracy alone does not give a complete picture of classifier performance, so several evaluation metrics are used for the comparison. The results show that, with an identical network structure, the VPD model performs substantially better on the SMOTE-ENN class-balanced dataset than on the original dataset. The study also explores feature fusion, combining MFCC and LPC into a single input feature for VPD; the comparison shows that the fused feature yields higher accuracy than either feature alone.
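The resampling step described above can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation: the function names (`smote`, `enn`, `nearest`), the neighbor count `k=3`, the class labels, and the two-dimensional toy vectors standing in for MFCC/LPC features are all assumptions made for illustration. The sketch interpolates new minority samples between nearest minority neighbors (SMOTE), then drops every sample whose label disagrees with the majority of its nearest neighbors (ENN).

```python
import math
import random


def distance(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(i, data, k):
    """Indices of the k rows of `data` closest to row i (row i excluded)."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: distance(data[i], data[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Binary-class SMOTE: add synthetic minority rows until classes match."""
    rng = random.Random(seed)
    rows = [x for x, label in zip(X, y) if label == minority]
    n_new = (len(y) - len(rows)) - len(rows)  # majority count minus minority count
    synthetic = []
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(rows))           # random minority sample
        j = rng.choice(nearest(i, rows, k))    # one of its minority neighbors
        gap = rng.random()                     # position along the segment i -> j
        synthetic.append([a + gap * (b - a) for a, b in zip(rows[i], rows[j])])
    return X + synthetic, y + [minority] * len(synthetic)


def enn(X, y, k=3):
    """Edited Nearest Neighbor: keep a row only if most neighbors share its label."""
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in nearest(i, X, k)]
        if 2 * votes.count(y[i]) > len(votes):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy imbalanced data (hypothetical): 10 samples of class 0, 40 of class 1.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(10)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(40)]
y = [0] * 10 + [1] * 40

X_bal, y_bal = smote(X, y, minority=0)
X_clean, y_clean = enn(X_bal, y_bal)
print("after SMOTE:", y_bal.count(0), y_bal.count(1))
print("after ENN:  ", y_clean.count(0), y_clean.count(1))
```

In practice, a maintained library implementation of combined SMOTE and ENN resampling would likely be preferable to hand-written code. Resampling is normally applied to the training split only, so that synthetic samples do not leak into validation or test data.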