Deformable medical image registration is essential in medical image analysis. Recent transformer-based registration methods have achieved high registration accuracy. However, these methods often rely on patch embedding at the beginning of encoding, which limits their ability to capture detailed anatomical structure and to exploit local semantic relationships within individual patches. Here, we propose a novel Dual-feet Encoder (DFEnc) that asynchronously models semantic information from the moving and fixed images at multiple scales through two separate branches in three steps. At each step, features from adjacent resolution levels are processed by a Single Step Hybrid Extractor (SSHExt), which performs patch convolution to preserve local information, followed by several transformer blocks to capture global context. Dense connections are employed to enhance semantic awareness across adjacent feature resolution levels. Additionally, we introduce a Feature Fusion-based Decoder (FFDec) that progressively fuses features from the fixed and moving images and generates an intermediate deformation field at each stage, enabling accurate alignment through stepwise warping and refinement. Extensive ablation studies demonstrate the effectiveness of the proposed DFEnc, SSHExt, and FFDec. Compared with the state-of-the-art AutoFuse-Trans method, our approach improves Dice by 1.41%, 2.21%, and 7.80% on the ACDC, OASIS, and Abdomen CT datasets, respectively, while maintaining relatively low computational cost. These results suggest the utility of the proposed approach for broad research and clinical applications.
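The stepwise warping in the decoder can be illustrated with a toy 1-D example: each stage predicts an incremental displacement field, which is composed with the running field before re-warping the moving signal. This is only a minimal pure-Python sketch of displacement-field composition under linear interpolation; the function names (`warp_1d`, `compose`) and the two-stage setup are illustrative assumptions, not the paper's 3-D implementation.

```python
def warp_1d(signal, disp):
    """Sample `signal` at positions x + disp[x] with linear interpolation.

    Positions outside the signal are clamped to the border (a common
    padding choice for spatial warping; assumed here for simplicity).
    """
    n = len(signal)
    out = []
    for x in range(n):
        pos = min(max(x + disp[x], 0.0), n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        out.append((1 - w) * signal[lo] + w * signal[hi])
    return out


def compose(disp_a, disp_b):
    """Compose two displacement fields.

    Warping first by disp_a and then by disp_b equals one warp by
    total[x] = disp_b[x] + disp_a(x + disp_b[x]),
    where disp_a is linearly interpolated at the displaced position.
    """
    n = len(disp_a)
    composed = []
    for x in range(n):
        pos = min(max(x + disp_b[x], 0.0), n - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo
        composed.append(disp_b[x] + (1 - w) * disp_a[lo] + w * disp_a[hi])
    return composed


# Toy data: the moving signal is the fixed signal shifted by 3 samples.
fixed = [0, 0, 0, 1, 2, 3, 2, 1, 0, 0]
moving = [1, 2, 3, 2, 1, 0, 0, 0, 0, 0]

# Two coarse-to-fine stages each recover part of the shift (-2, then -1);
# their composition is a uniform displacement of -3.
stage1 = [-2.0] * 10
stage2 = [-1.0] * 10
total = compose(stage1, stage2)
warped = warp_1d(moving, total)
# Away from the clamped left border, `warped` matches `fixed`.
```

Composing incremental fields rather than predicting one large field in a single shot is the usual motivation for stage-wise refinement: each stage only has to model a small residual deformation.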