Abstract:
To address the scarcity of side-scan sonar image data for seabed sediments, this study proposes a sonar image generation framework based on a Hierarchical Residual Transformer (HR-Transformer) model. The model employs a hierarchical attention mechanism and a multi-scale feature fusion strategy to generate sonar images with high fidelity in both visual appearance and structural detail. Experiments on a self-acquired dataset of side-scan sonar seabed sediment images demonstrate that the HR-Transformer balances the authenticity of microscopic detail with that of macroscopic structure. The generated images attain a Peak Signal-to-Noise Ratio (PSNR) of 33.96 dB and a Structural Similarity Index (SSIM) of 0.94, improvements of 15.4% and 16.0%, respectively, over benchmark models. The frequency-domain loss remains stable below 0.015 and the perceptual loss falls to 0.01, confirming that the model learns the characteristics of seabed sediment side-scan sonar images efficiently. The framework significantly outperforms existing methods in physical consistency, in both microscopic texture (e.g., the continuity of sand ripples) and macroscopic structure (e.g., the pore distribution of reefs). Supported by these quantitative results, this study provides a new paradigm for seabed sediment data generation and lays a technical foundation for future research directions such as multimodal fusion (e.g., integrating multibeam sonar and LiDAR data through cross-modal attention to construct geometric-textural composite maps) and dynamic environment modeling (e.g., combining Transformer-LSTM temporal modules to capture patterns of seabed evolution).
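As a point of reference for the PSNR and SSIM figures quoted above, the sketch below shows how these two metrics are conventionally computed with scikit-image. The paper does not specify its evaluation code, so this is a minimal illustrative baseline under assumed conventions (single-channel uint8 images of matching size; the function name `evaluate_pair` is hypothetical), not the authors' implementation.

```python
# Illustrative computation of the PSNR/SSIM metrics reported in the abstract.
# Assumption: real and generated sonar images are single-channel uint8 arrays
# of the same shape; the paper's actual evaluation pipeline is not specified.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(real: np.ndarray, generated: np.ndarray) -> tuple[float, float]:
    """Return (PSNR in dB, SSIM in [0, 1]) for one real/generated image pair."""
    psnr = peak_signal_noise_ratio(real, generated, data_range=255)
    ssim = structural_similarity(real, generated, data_range=255)
    return psnr, ssim

# Usage with synthetic stand-in data (not sonar imagery):
rng = np.random.default_rng(0)
real = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
noise = rng.integers(-5, 6, size=real.shape)
fake = np.clip(real.astype(int) + noise, 0, 255).astype(np.uint8)
print(evaluate_pair(real, fake))  # prints PSNR (dB) and SSIM for the pair
```

In practice such metrics would be averaged over the full test split of the side-scan sonar dataset; the abstract's single figures (33.96 dB, 0.94) presumably report such averages.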