Uyghur Dialect Recognition Combining an Attention Mechanism and Causal Convolutional Networks

Abstract: When the conventional x-vector model generates segment-level representations of dialect speech, it does not account for the fact that different frame-level features contribute unequally to dialect discrimination. To address this problem, and in view of the agglutinative nature of the Uyghur language, a Uyghur dialect recognition method combining an attention mechanism with causal convolutional networks is proposed. First, a multi-layer causal convolutional network models the dialect speech sequence; next, dilated convolution kernels enlarge the receptive field and extend the sampling range; finally, attention pooling produces the segment-level features of the dialect speech. Experimental results on Uyghur dialect recognition show that the proposed method improves recognition accuracy by 23.19 percentage points over the standard x-vector model.
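
The three steps named in the abstract (causal convolution, dilation, attention pooling) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the kernel size of 3, the three layers with dilations 1, 2, 4, the ReLU activation, the tanh-based attention scoring, and all names (`causal_dilated_conv1d`, `attention_pool`, `encode`, `receptive_field`) are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np


def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution (illustrative, not the paper's code).

    x: (T, C_in) frame-level features; w: (K, C_in, C_out) kernel.
    The output at frame t uses only frames t, t-d, ..., t-(K-1)*d.
    """
    K, c_in, c_out = w.shape
    T = x.shape[0]
    pad = (K - 1) * dilation
    # Pad on the left only, so no future frame can leak into the output.
    xp = np.concatenate([np.zeros((pad, c_in)), x], axis=0)
    y = np.zeros((T, c_out))
    for k in range(K):
        start = k * dilation  # tap k looks back (K-1-k)*dilation frames
        y += xp[start:start + T] @ w[k]
    return y


def attention_pool(h, W, v):
    """Attention pooling: a weighted mean over frames.

    h: (T, D) frame-level features; W: (D, A) and v: (A,) are assumed
    scoring parameters. Returns the (D,) segment vector and (T,) weights.
    """
    scores = np.tanh(h @ W) @ v
    scores = scores - scores.max()  # numerical stability for the softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ h, alpha


def encode(x, conv_weights, W, v):
    """Stack causal layers with dilations 1, 2, 4, ... and pool with attention."""
    h = x
    for i, w in enumerate(conv_weights):
        h = np.maximum(causal_dilated_conv1d(h, w, dilation=2 ** i), 0.0)  # ReLU
    return attention_pool(h, W, v)


def receptive_field(kernel_size, num_layers):
    """Frames seen by one output frame when the dilation doubles per layer."""
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))


# Hypothetical sizes: 50 frames, 24-dim input features, 16 channels, 8-dim attention.
rng = np.random.default_rng(0)
T, F, D, A = 50, 24, 16, 8
x = rng.standard_normal((T, F))
conv_weights = [
    rng.standard_normal((3, F, D)) * 0.1,
    rng.standard_normal((3, D, D)) * 0.1,
    rng.standard_normal((3, D, D)) * 0.1,
]
W = rng.standard_normal((D, A)) * 0.1
v = rng.standard_normal(A)

segment, alpha = encode(x, conv_weights, W, v)
print(segment.shape, receptive_field(3, 3))
```

With these assumed settings, each output frame of the third layer covers 15 input frames, whereas three undilated layers of the same kernel size would cover only 7. The attention weights `alpha` sum to 1, so the segment vector is a weighted mean of frame-level features rather than the unweighted statistics a standard x-vector pooling layer computes.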