Research on Tone Recognition of Uyghur Standard Chinese Learners Using SVM-LSTM Fusion

Abstract: Because the Uyghur phonological system has no tonal categories based on fundamental frequency (F0) variation, Uyghur Standard Chinese learners (USCLs) tend to show inaccurate tone contour control, tone confusion, and insufficient neutral-tone reduction when acquiring Standard Chinese tones. To improve tone recognition accuracy for USCLs, this study models their tones with a method that combines a Support Vector Machine (SVM) and a Long Short-Term Memory (LSTM) network. From a self-constructed corpus of Standard Chinese speech produced by USCLs, we extract acoustic parameters including F0, formants, syllable duration, and differential features (e.g., ΔF0, ΔF1). The SVM first assigns learners to pronunciation proficiency groups; the LSTM then performs five-class tone classification; finally, the learner grouping prior is introduced at the decision level through weighted fusion. In the experiments, the SVM reached 84.3% accuracy on the proficiency grouping task and the LSTM reached 78.9% accuracy on the tone classification task. With the learner grouping prior added, the fused model reached 83.6% accuracy, a gain of 4.7 percentage points over the LSTM alone. These results indicate that learner proficiency priors can complement dynamic F0 modeling and improve tone recognition performance for USCLs. The study can offer some technical support for Standard Chinese pronunciation assessment and instructional feedback.
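The decision-level step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the fusion formula, so everything here is an assumption for illustration: the function name `fuse_with_group_prior`, the linear interpolation rule, the `weight` value, the two proficiency groups, and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_with_group_prior(tone_posteriors, group_probs, group_tone_priors, weight=0.3):
    """Linearly interpolate tone posteriors with a group-conditioned tone prior.

    tone_posteriors   : shape (5,), e.g. softmax output of a tone classifier
    group_probs       : shape (G,), e.g. proficiency-group probabilities
    group_tone_priors : shape (G, 5), assumed per-group prior over the five tones
    weight            : assumed fusion weight given to the prior (0 = ignore it)
    """
    tone_posteriors = np.asarray(tone_posteriors, dtype=float)
    group_probs = np.asarray(group_probs, dtype=float)
    group_tone_priors = np.asarray(group_tone_priors, dtype=float)

    # Expected tone prior, averaged over the learner's group membership.
    prior = group_probs @ group_tone_priors

    # Weighted sum of the two score vectors, renormalised to sum to 1.
    fused = (1.0 - weight) * tone_posteriors + weight * prior
    return fused / fused.sum()


# Hypothetical scores for one syllable (five tone classes).
lstm_scores = [0.30, 0.32, 0.18, 0.15, 0.05]
# Hypothetical group probabilities for the speaker (two groups assumed).
svm_groups = [0.8, 0.2]
# Hypothetical per-group tone priors; each row sums to 1.
priors = [
    [0.40, 0.15, 0.15, 0.20, 0.10],
    [0.20, 0.20, 0.20, 0.20, 0.20],
]

fused_scores = fuse_with_group_prior(lstm_scores, svm_groups, priors)
lstm_choice = int(np.argmax(lstm_scores))
fused_choice = int(np.argmax(fused_scores))
```

In this toy case the classifier alone narrowly prefers the second class, while the fused scores prefer the first, which shows how a group prior can overturn a close decision. A log-linear combination would be an equally plausible reading of "weighted fusion".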

     
