Pathological voice detection based on feature fusion and SMOTE oversampling

于海畅; 许涛; 钟林堃; 吴季聪

doi:10.16300/j.cnki.1000-3630.24052201

Abstract: To address the data imbalance that commonly arises in voice pathology detection (VPD), this study proposes a pathological voice detection model based on the synthetic minority oversampling technique (SMOTE). SMOTE converts the original imbalanced dataset into a class-balanced one, and the edited nearest neighbor (ENN) method then removes noisy samples generated by the oversampling, with the aim of improving the classification performance of conventional models. Mel-frequency cepstral coefficients (MFCC) and linear predictive coefficients (LPC) are extracted separately from the original voice signals and fed into a convolutional neural network (CNN) for training, in order to verify and evaluate VPD classification performance on the class-balanced dataset. The performance of the VPD model on the class-balanced dataset is also compared with its performance on the original dataset. Because the original dataset is imbalanced, accuracy alone does not give a complete picture of classifier performance, so the comparison also uses precision, recall, specificity, G-mean, and F₁ score. The results show that, with an identical network structure, VPD trained on the class-balanced dataset produced by SMOTE-ENN performs substantially better than VPD trained on the original dataset. In addition, MFCC and LPC are fused into a new feature and fed into the VPD model for comparison; the fused feature yields higher accuracy than either single feature.
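
The resampling step described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`smote`, `enn`, `knn_indices`), the two-class setting, the neighbor counts, the majority-vote cleaning rule, and the toy 2-D data are all assumptions made for illustration. SMOTE is sketched as interpolation between a minority sample and one of its nearest minority neighbors, and ENN as removal of any sample whose label disagrees with most of its nearest neighbors.

```python
import math
import random


def knn_indices(points, query_idx, k, candidates):
    """Indices of the k candidates nearest to points[query_idx], excluding itself."""
    q = points[query_idx]
    others = [j for j in candidates if j != query_idx]
    others.sort(key=lambda j: math.dist(q, points[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Oversample the minority class of a two-class set until both classes match.

    Assumes at least two minority samples, so every sample has a neighbor.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)  # majority count minus minority count
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(knn_indices(X, i, k, min_idx))
        gap = rng.random()
        # The synthetic point lies on the segment between sample i and neighbor j.
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out


def enn(X, y, k=3):
    """Keep only samples whose label agrees with most of their k nearest neighbors."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        neigh = knn_indices(X, i, k, everyone)
        agree = sum(1 for j in neigh if y[j] == y[i])
        if agree * 2 > len(neigh):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy 2-D data (illustrative only): 8 samples of class 0 and 3 of class 1.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2],
     [0.4, 0.0], [0.0, 0.4], [0.3, 0.4], [0.2, 0.3],
     [2.0, 2.0], [2.2, 2.1], [2.1, 2.3]]
y = [0] * 8 + [1] * 3

X_bal, y_bal = smote(X, y, minority=1, k=2)   # balance the classes
X_clean, y_clean = enn(X_bal, y_bal, k=3)     # then remove noisy samples
print(y_bal.count(0), y_bal.count(1), len(X_clean))
```

In practice the resampling would be applied to the training features only (for example, MFCC or LPC vectors), so that synthetic samples do not leak into the evaluation data; that split is likewise an assumption here, since the abstract does not state it.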

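The argument that accuracy is not sufficient on an imbalanced dataset can be made concrete with a short sketch that computes the metrics named in the abstract from a binary confusion matrix. The function name `imbalance_metrics`, the confusion-matrix counts, and the definition of G-mean as the square root of recall times specificity are assumptions for illustration; the abstract does not give the formulas or any counts.

```python
import math


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def imbalance_metrics(tp, fp, tn, fn):
    """Metrics from binary confusion-matrix counts (positive = class of interest)."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # also called sensitivity
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        # Assumed definition: geometric mean of recall and specificity.
        "g_mean": math.sqrt(recall * specificity),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical counts: 90 positive and 10 negative samples, with a classifier
# that labels almost every sample as positive.
m = imbalance_metrics(tp=88, fp=9, tn=1, fn=2)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these hypothetical counts the accuracy is 0.89, yet the specificity is 0.10 and the G-mean is about 0.31, which shows how a classifier biased toward the larger class can look acceptable on accuracy while failing on the smaller class.
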