A singing voice conversion method using adaptive data augmentation

谈林涛; 简志华

doi:10.16300/j.cnki.1000-3630.25092501


Abstract

Conventional singing voice conversion models do not sufficiently disentangle content information from singer identity features. The source singer's singing style is therefore hard to remove completely and the target singer's characteristics are learned inadequately, which makes the converted singing sound unnatural. Data augmentation has become a key part of model training for improving generalization and robustness, but existing methods are limited in variety and struggle to preserve the useful information in singing voices. To address this, this paper proposes ADA-SVC, a singing voice conversion model with adaptive data augmentation. The method introduces an adaptive data augmentation module during training that, based on acoustic principles, dynamically generates high-quality samples with identical content and slightly altered timbre, so that the model learns to distinguish content from identity features and disentangles them effectively. In addition, a singer encoder extracts singer information, a pitch extractor models pitch information, and the prior encoder, posterior encoder, and Flow module from VITS are combined to perform end-to-end singing voice conversion. Experiments show that ADA-SVC improves the MCD metric by 8.7% over the So-VITS model, that its subjective similarity MOS is significantly better than those of the baseline and ablation models, and that conversion quality is clearly improved.
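The abstract reports its objective result as a relative MCD improvement but does not say how either quantity is computed. As a rough illustration only, the sketch below uses the commonly cited definition of mel-cepstral distortion (MCD) over time-aligned frames and assumes the percentage is the relative drop in MCD against the baseline. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumptions (not from the paper): frames are already time-aligned and
    the 0th (energy) coefficient has already been dropped by the caller.
    """
    if not ref_frames or len(ref_frames) != len(conv_frames):
        raise ValueError("need the same non-zero number of frames")
    # Standard scaling constant: (10 / ln 10) * sqrt(2)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        if len(ref) != len(conv):
            raise ValueError("frames must have the same number of coefficients")
        squared_diff = sum((r - c) ** 2 for r, c in zip(ref, conv))
        total += scale * math.sqrt(squared_diff)
    return total / len(ref_frames)


def relative_improvement(baseline_mcd, model_mcd):
    """Percentage drop in MCD relative to the baseline (lower MCD is better).

    Assumption: this is one plausible reading of an 'improves MCD by X%' claim.
    """
    return (baseline_mcd - model_mcd) / baseline_mcd * 100.0


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    reference = [[1.0, 0.5], [0.8, 0.2]]
    converted = [[0.9, 0.4], [0.7, 0.3]]
    print(mel_cepstral_distortion(reference, converted))
    print(relative_improvement(10.0, 9.13))
```

Under this reading, a lower MCD means the converted spectrum is closer to the reference, so a positive relative improvement corresponds to a smaller distortion than the baseline.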
