Abstract:
This study proposes an approach to the automatic recognition of short Chinese tones that fuses prosodic and cepstral features to improve the tone recognition rate. The fused features comprise seven prosodic features and their statistical parameters, derived from different models, together with four MFCC log posterior probabilities computed from four Gaussian mixture models (GMMs). The experiments proceed in two steps. First, a classifier based on prosodic features and a classifier based on cepstral features are combined to classify tones, and each classifier is assigned a weight in order to examine the contributions of prosodic and cepstral features to tone classification. Second, seven reduced prosodic features based on different models and four log posterior probabilities, obtained from frame-level MFCCs modeled by GMMs, are concatenated into a single fused feature. The tone classification performance of five classifiers, namely GMM with two configurations, back-propagation neural network (BPNN), support vector machine (SVM), and convolutional neural network (CNN), is then compared using three metrics: accuracy, unweighted average recall (UAR), and Cohen's kappa coefficient. The results show that (1) cepstral features improve the recognition rate of Chinese tone classification, with a weight of 0.11 in the overall tone classification; and (2) the CNN, a deep learning method, outperforms the other classifiers when using the fused features, reaching a recognition rate of 87.6%, an improvement of 5.87% over the GMM baseline system. These findings indicate that cepstral features provide complementary information for tone classification and thereby improve the recognition rate. The method could also be applied to related research on prosody detection and paralinguistic information detection.
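For readers unfamiliar with the three evaluation metrics named above, the following is a minimal sketch of how accuracy, UAR, and Cohen's kappa are conventionally computed from true and predicted tone labels. It uses the standard textbook definitions and is not the study's own evaluation code. The function names and the toy label lists (tones coded 1 to 4) are assumptions made purely for illustration and are not data from the experiments.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls; every class counts equally,
    regardless of how many samples it has."""
    recalls = []
    for c in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for the
    agreement expected by chance: (p_o - p_e) / (1 - p_e).
    Undefined when p_e equals 1 (all labels and predictions in one class)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the marginal label distributions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (tones 1-4); NOT results from the study.
y_true = [1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [1, 1, 2, 3, 3, 3, 4, 1]

print("accuracy:", accuracy(y_true, y_pred))
print("UAR:", unweighted_average_recall(y_true, y_pred))
print("kappa:", round(cohen_kappa(y_true, y_pred), 4))
```

On balanced data such as this toy example, accuracy and UAR coincide; they diverge when some tones are much rarer than others, which is one reason to report both.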