Abstract:
Because the Uyghur phonological system lacks tonal categories based on fundamental frequency (F0) variation, Uyghur-speaking Standard Chinese learners (USCLs) often struggle with tone acquisition: they produce inaccurate tone contours, confuse tones, and under-reduce neutral tones. To improve tone recognition accuracy for USCLs, this study proposes a hybrid approach that combines a Support Vector Machine (SVM) with a Long Short-Term Memory (LSTM) network. From a self-constructed corpus of Standard Chinese speech produced by USCLs, we extracted acoustic parameters including F0, formants, syllable duration, and differential features (e.g., ΔF0, ΔF1). An SVM was first used to classify learners into proficiency groups, and an LSTM was then applied to five-class tone classification (the four lexical tones plus the neutral tone). Learner grouping priors were further integrated at the decision level through weighted score fusion. In experiments, the SVM achieved 84.3% accuracy in proficiency grouping and the LSTM achieved 78.9% accuracy in tone classification. Incorporating the grouping priors through weighted fusion raised overall tone recognition accuracy to 83.6%, a gain of 4.7 percentage points over the LSTM alone. These findings indicate that learner proficiency priors can complement dynamic F0 modeling and improve tone recognition for USCLs. The study provides technical support for automated Standard Chinese pronunciation assessment and data-informed instructional feedback.
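The abstract does not state the exact fusion rule. The following is a minimal sketch of decision-level weighted score fusion. It assumes that the LSTM outputs a five-class tone posterior, that the SVM outputs group probabilities (for example, via probability calibration), and that each proficiency group has a tone prior vector. The group names, prior values, tone labels (including "T0" for the neutral tone), and the weight `alpha` are all hypothetical illustrations, not values from the study.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label order: four lexical tones plus the neutral tone ("T0").
TONES: List[str] = ["T1", "T2", "T3", "T4", "T0"]


def normalize(scores: Sequence[float]) -> List[float]:
    """Rescale non-negative scores so they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def fuse_scores(
    lstm_post: Sequence[float],
    group_post: Dict[str, float],
    group_tone_prior: Dict[str, Sequence[float]],
    alpha: float = 0.7,
) -> List[float]:
    """Assumed fusion rule (hypothetical, not confirmed by the study):

    fused(t) = alpha * P_lstm(t | x) + (1 - alpha) * sum_g P_svm(g | x) * P(t | g)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(lstm_post)
    # Marginalize the group-conditional tone priors over the SVM group posterior.
    prior = [0.0] * n
    for group, p_group in group_post.items():
        row = group_tone_prior[group]
        if len(row) != n:
            raise ValueError(f"prior for group {group!r} has wrong length")
        for t in range(n):
            prior[t] += p_group * row[t]
    # Linear weighted combination, renormalized to a probability distribution.
    fused = [alpha * p + (1.0 - alpha) * q for p, q in zip(lstm_post, prior)]
    return normalize(fused)


def predict_tone(
    lstm_post: Sequence[float],
    group_post: Dict[str, float],
    group_tone_prior: Dict[str, Sequence[float]],
    alpha: float = 0.7,
) -> Tuple[str, List[float]]:
    """Return the highest-scoring tone label and the fused distribution."""
    fused = fuse_scores(lstm_post, group_post, group_tone_prior, alpha)
    best = max(range(len(fused)), key=fused.__getitem__)
    return TONES[best], fused


if __name__ == "__main__":
    # Illustrative numbers only: the LSTM is torn between T2 and T3.
    lstm_post = [0.05, 0.40, 0.45, 0.05, 0.05]
    group_post = {"low": 0.2, "high": 0.8}
    group_tone_prior = {
        "low": [0.2, 0.2, 0.2, 0.2, 0.2],
        "high": [0.1, 0.6, 0.1, 0.1, 0.1],
    }
    print(predict_tone(lstm_post, group_post, group_tone_prior, alpha=1.0)[0])  # T3
    print(predict_tone(lstm_post, group_post, group_tone_prior, alpha=0.7)[0])  # T2
```

When `alpha=1.0`, the fusion reduces to the plain LSTM decision (T3 in this toy case). Lowering `alpha` lets the group prior pull an ambiguous T2/T3 posterior toward T2. In practice, `alpha` would be tuned on held-out data. A log-linear combination is a common alternative to the linear rule assumed here.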