Tone recognition for Uyghur Putonghua learners based on SVM-LSTM fusion

索帅红; 古力努尔·艾尔肯; 郭文明; 艾斯卡尔·艾木都拉

doi:10.16300/j.cnki.1000-3630.26010501

Abstract: Because the Uyghur phonological system lacks tone categories defined primarily by fundamental frequency (F0) variation, Uyghur learners of Putonghua often struggle to acquire Putonghua tones: they control tone contours inaccurately, confuse tones, and under-reduce the neutral tone. To improve tone recognition for these learners, this study combines a support vector machine (SVM) with a long short-term memory (LSTM) network. From a self-constructed speech corpus of Uyghur Putonghua learners, acoustic parameters including F0, formants, syllable duration, and delta (difference) features were extracted. The SVM first classifies learners into pronunciation proficiency groups; the LSTM then performs five-class tone classification (the four lexical tones plus the neutral tone); finally, the grouping result is introduced as prior information at the decision level through weighted score fusion. The SVM achieved 84.3% accuracy in proficiency grouping, and the LSTM alone achieved 78.9% accuracy in tone classification; with the grouping priors incorporated, the fused model reached 83.6%, an absolute gain of 4.7 percentage points. These results suggest that learner proficiency priors complement dynamic F0 modeling and improve tone recognition for Uyghur Putonghua learners. The approach may offer technical support for automated Putonghua pronunciation assessment and instructional feedback.
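The delta features and the decision-level fusion described in the abstract can be sketched as follows. The abstract does not give the fusion formula, the fusion weight, or how the group prior is represented, so everything below is an assumption: `delta` is a standard regression-style delta over a ±n-frame window, and `fuse` is a hypothetical fusion in which the SVM's group posterior weights a per-group tone prior P(t | g), which is then linearly interpolated with the LSTM posterior using a weight w. The function names, label order, weight value, and all numbers are illustrative, not values from the paper.

```python
from typing import Sequence

# Hypothetical label order: four lexical tones plus the neutral tone.
TONES = ["T1", "T2", "T3", "T4", "T0"]


def delta(contour: Sequence[float], n: int = 1) -> list[float]:
    """Regression-style delta over a +/-n frame window; edge frames are repeated."""
    T = len(contour)
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = []
    for t in range(T):
        num = 0.0
        for k in range(1, n + 1):
            nxt = contour[min(t + k, T - 1)]  # clamp at the last frame
            prv = contour[max(t - k, 0)]      # clamp at the first frame
            num += k * (nxt - prv)
        out.append(num / denom)
    return out


def fuse(
    p_lstm: Sequence[float],
    p_group: Sequence[float],
    tone_prior_by_group: Sequence[Sequence[float]],
    weight: float = 0.3,
) -> list[float]:
    """Hypothetical fusion: (1 - w) * p_lstm(t) + w * sum_g p_group(g) * P(t | g)."""
    n_tones = len(p_lstm)
    # Marginalize the per-group tone prior over the SVM's group posterior.
    prior = [0.0] * n_tones
    for g, pg in enumerate(p_group):
        for t in range(n_tones):
            prior[t] += pg * tone_prior_by_group[g][t]
    # Linearly interpolate the two distributions, renormalizing for numerical safety.
    fused = [(1 - weight) * pl + weight * pr for pl, pr in zip(p_lstm, prior)]
    total = sum(fused)
    return [f / total for f in fused]


if __name__ == "__main__":
    print(delta([200.0, 210.0, 230.0, 260.0]))        # toy F0 contour in Hz
    p_lstm = [0.10, 0.40, 0.35, 0.10, 0.05]           # toy LSTM posterior
    p_group = [0.2, 0.8]                              # toy SVM group posterior
    priors = [[0.2] * 5, [0.1, 0.2, 0.5, 0.1, 0.1]]   # toy P(t | g)
    p = fuse(p_lstm, p_group, priors, weight=0.3)
    best = max(range(len(p)), key=p.__getitem__)
    print(TONES[best])
```

With these toy numbers the LSTM alone prefers index 1, while the fused distribution prefers index 2, which illustrates how a group prior can override a weak acoustic preference. In practice the weight would be tuned on held-out data, and other fusion rules (for example, group-specific weights or log-linear combination) are equally plausible readings of the abstract.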