

Hierarchical Residual Transformer for Sonar Image Generation


    Abstract: To address the scarcity of side-scan sonar image data for seabed sediments, this study proposes a sonar image generation framework based on a Hierarchical Residual Transformer (HR-Transformer). The model employs a hierarchical attention mechanism and a multi-scale feature fusion strategy to generate sonar images with high fidelity in both visual appearance and structural detail. Experiments on a self-acquired dataset of side-scan sonar seabed sediment images demonstrate that the HR-Transformer balances the authenticity of microscopic details against that of macroscopic structures: the generated images attain a Peak Signal-to-Noise Ratio (PSNR) of 33.96 dB and a Structural Similarity Index (SSIM) of 0.94, improvements of 15.4% and 16.0%, respectively, over the baseline models. The frequency-domain loss remains stable below 0.015 and the perceptual loss reaches 0.01, confirming that the model learns the characteristics of seabed sediment side-scan sonar images efficiently. The framework also significantly outperforms existing methods in physical consistency, both in microscopic texture (e.g., the continuity of sand ripples) and in macroscopic structure (e.g., the pore distribution of reefs). Supported by these quantitative results, the study offers a new paradigm for seabed sediment data generation and lays a technical foundation for future directions such as multimodal fusion (e.g., integrating multibeam sonar and LiDAR data through cross-modal attention to construct composite geometric-textural maps) and dynamic environment modeling (e.g., coupling Transformer-LSTM temporal modules to capture patterns of seabed evolution).
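For readers unfamiliar with the reported metrics, the following minimal NumPy sketch shows how a PSNR value and a simple frequency-domain loss can be computed for a generated image against a reference. This is illustrative only, not the authors' evaluation code: the exact form of the paper's frequency-domain loss is not given in the abstract, so a mean absolute difference of normalized FFT magnitude spectra is assumed here as a stand-in.

```python
import numpy as np

def psnr(ref, gen, max_val=1.0):
    """Peak Signal-to-Noise Ratio (dB) between reference and generated images."""
    mse = np.mean((ref - gen) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def frequency_domain_loss(ref, gen):
    """Mean absolute difference of max-normalized 2-D FFT magnitude spectra.

    An assumed, simplified proxy for a frequency-domain loss; the paper's
    actual formulation may differ.
    """
    f_ref = np.abs(np.fft.fft2(ref))
    f_gen = np.abs(np.fft.fft2(gen))
    f_ref /= f_ref.max()
    f_gen /= f_gen.max()
    return float(np.mean(np.abs(f_ref - f_gen)))

# Synthetic demo: a "generated" image as the reference plus mild Gaussian noise.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
gen = np.clip(ref + rng.normal(0.0, 0.02, ref.shape), 0.0, 1.0)

print(f"PSNR: {psnr(ref, gen):.2f} dB")          # noise sigma 0.02 gives roughly 30-40 dB
print(f"Freq loss: {frequency_domain_loss(ref, gen):.4f}")
```

At noise level sigma = 0.02 the PSNR lands in the mid-30 dB range, the same regime as the 33.96 dB reported in the abstract, which gives a rough sense of the residual error such a score implies.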

     

