Speaker recognition model based on multi-granularity spatio-temporal attention mechanism

ZHU Wenbo; WU Jing; JIN Hao; YE Weizhang; ZHU Zhen

doi:10.16300/j.cnki.1000-3630.23060601

ZHU Wenbo, WU Jing, JIN Hao, et al. Speaker recognition model based on multi-granularity spatio-temporal attention mechanism[J]. Technical Acoustics, 2025, 44(1): 93-101. DOI: 10.16300/j.cnki.1000-3630.23060601

Abstract

Deep learning is widely applied in speaker recognition. However, current models suffer from low recognition rates and large, complex parameter sets, which makes lightweight speaker recognition difficult to achieve. To address this issue, this paper proposes a speaker recognition model based on multi-granularity spatio-temporal attention, named the Multi-granularity Hybrid Compression Network (MGHC-NET). It consists of a multi-granularity mixing module (MGMM), a spatio-temporal attention module, and a channel compression module. The MGMM and the spatio-temporal attention module capture local temporal context features and spatial correlation features from a multi-scale modeling perspective, and couple the correlations between different spatio-temporal features at multiple granularities to strengthen global spatio-temporal modeling. Meanwhile, the channel compression module aggregates speaker representations from different channels with context-dependent representations to reduce the overall number of model parameters. Five-fold cross-validation experiments on multiple public datasets show that the proposed method effectively improves speaker recognition accuracy while reducing the number of parameters, and that it achieves the best performance among the compared mainstream models. The model therefore has significant application value for lightweight speaker recognition.
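The abstract names three components but does not give their layer definitions, so the following is only a minimal NumPy sketch of how such a block could fit together, not the authors' MGHC-NET. It assumes that the MGMM is a set of parallel depthwise temporal convolutions with different kernel sizes (granularities) whose outputs are averaged, that spatio-temporal attention is a softmax weight over frames combined with a sigmoid gate over channels, and that channel compression is a 1x1 projection to fewer channels. The function names, tensor sizes, residual connection, and pooling step are all hypothetical.

```python
import numpy as np

# Hypothetical sketch, NOT the authors' MGHC-NET: every layer design here is an assumption.
rng = np.random.default_rng(0)


def conv1d_same(x, w):
    """Depthwise 1-D convolution (cross-correlation, as in most DL libraries)
    along time with zero 'same' padding. x: (C, T); w: (C, k), k odd."""
    k = w.shape[1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.empty_like(x)
    for t in range(x.shape[1]):
        out[:, t] = np.sum(xp[:, t:t + k] * w, axis=1)
    return out


def multi_granularity_mix(x, kernels):
    """Assumed MGMM: parallel temporal convolutions with different kernel
    sizes (granularities), fused by averaging the branch outputs."""
    return np.mean([conv1d_same(x, w) for w in kernels], axis=0)


def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def spatio_temporal_attention(x):
    """Assumed attention: a softmax weight per frame (temporal) and a sigmoid
    gate per channel (spatial), both computed from the features themselves."""
    t_att = softmax(x.mean(axis=0), axis=0)        # (T,), sums to 1 over frames
    c_att = 1.0 / (1.0 + np.exp(-x.mean(axis=1)))  # (C,), values in (0, 1)
    # Multiply by T so that uniform temporal weights (1/T each) leave the scale unchanged.
    return x * c_att[:, None] * t_att[None, :] * x.shape[1]


def channel_compress(x, proj):
    """Assumed compression: a 1x1 convolution, i.e. a linear map from C input
    channels to C_out < C channels applied at every frame. proj: (C_out, C)."""
    return proj @ x


def mghc_block(x, params):
    h = multi_granularity_mix(x, params["kernels"])
    h = spatio_temporal_attention(h) + x  # residual connection (an assumption)
    return channel_compress(h, params["proj"])


C, T, C_OUT = 64, 200, 16  # illustrative sizes, not taken from the paper
params = {
    "kernels": [0.1 * rng.standard_normal((C, k)) for k in (3, 5, 7)],
    "proj": rng.standard_normal((C_OUT, C)) / np.sqrt(C),
}
features = rng.standard_normal((C, T))  # e.g. 64 filterbank channels x 200 frames
out = mghc_block(features, params)
embedding = out.mean(axis=1)  # temporal average pooling -> fixed-size speaker vector
n_params = sum(w.size for w in params["kernels"]) + params["proj"].size
print(out.shape, embedding.shape, n_params)  # (16, 200) (16,) 1984
```

Under these assumed sizes, the compression projection holds 16 x 64 = 1,024 weights, whereas a 1x1 projection that kept all 64 channels would need 64 x 64 = 4,096; reducing the output channel count is one simple way a channel compression stage can cut a model's parameter count.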