LiDAR semantic segmentation is a key task in advanced autonomous driving systems. Projection-based methods offer real-time potential thanks to their efficiency, but they suffer from inevitable 3D information loss and rely on time-consuming post-processing, which limits overall performance. To address this, we propose MFINet, a real-time semantic segmentation network based on multi-view fusion and 2D-3D interaction enhancement. It adopts a three-branch architecture that integrates a 3D Point View (3D-PV), a 2D Bird’s Eye View (2D-BEV), and a 2D Range View (2D-RV) to make full use of both 2D and 3D representations. From 3D to 2D, we design a 3D Point Feature Projector (3DPFP) that injects 3D features into the 2D BEV and RV pseudo-images to retain effective 3D information. From 2D to 3D, we design a Feature Enhancement (FE) module that exploits the strengths of 2D representations for extracting geometric and semantic features. We also introduce a 2D-3D Fusion Head (FH) to aggregate point features from multiple views. In addition, we incorporate a Multi-Scale Dilated Attention (MSDA) module with a sliding-window strategy to enhance feature discrimination. Extensive experiments on the SemanticKITTI and NuScenes benchmarks demonstrate that MFINet outperforms existing methods on SemanticKITTI and the NuScenes validation set, and achieves competitive results on the NuScenes test set.
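As a reading aid, the 3D-to-2D direction of the 3DPFP can be pictured as scattering per-point features into a pseudo-image grid. Below is a minimal PyTorch sketch of that idea; the function name, grid size, coordinate ranges, and per-cell max pooling are illustrative assumptions, not details taken from the paper.

```python
import torch

def project_points_to_bev(xyz, feats, grid=(512, 512),
                          x_range=(-50.0, 50.0), y_range=(-50.0, 50.0)):
    """Scatter per-point 3D features into a 2D BEV pseudo-image.

    xyz:   (N, 3) point coordinates in metres.
    feats: (N, C) per-point 3D features.
    Returns a (C, H, W) BEV feature map, max-pooled per grid cell.
    Grid size and ranges are illustrative, not the paper's settings.
    """
    H, W = grid
    C = feats.shape[1]
    # Map x/y coordinates to integer grid indices, clamped to the image.
    u = ((xyz[:, 0] - x_range[0]) / (x_range[1] - x_range[0]) * W).long().clamp_(0, W - 1)
    v = ((xyz[:, 1] - y_range[0]) / (y_range[1] - y_range[0]) * H).long().clamp_(0, H - 1)
    cell = (v * W + u).unsqueeze(0).expand(C, -1)          # (C, N) flat cell index
    # Max-reduce the features of all points that fall into the same cell.
    bev = feats.new_full((C, H * W), float('-inf'))
    bev.scatter_reduce_(1, cell, feats.t(), reduce='amax')
    bev[bev == float('-inf')] = 0.0                         # empty cells -> zero
    return bev.view(C, H, W)
```

Max pooling per cell is one common reduction; mean pooling or a learned per-cell encoder would be equally plausible given the abstract's level of detail. The reverse, 2D-to-3D direction is sketched after this paragraph.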

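In the opposite direction, the 2D-3D Fusion Head (FH) can be read as gathering features from the 2D maps back to each point and fusing them with the 3D branch before per-point classification. The sketch below assumes bilinear grid_sample gathering and a small MLP head; the channel widths and the concatenation-based fusion are hypothetical choices, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionHead(nn.Module):
    """Aggregate per-point features from the point, BEV, and RV branches.

    Channel widths and the MLP design are illustrative assumptions.
    """
    def __init__(self, c_point, c_bev, c_rv, num_classes, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(c_point + c_bev + c_rv, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),
        )

    @staticmethod
    def _sample(fmap, uv):
        # fmap: (C, H, W); uv: (N, 2) normalized to [-1, 1] for grid_sample.
        grid = uv.view(1, -1, 1, 2)                        # (1, N, 1, 2)
        out = F.grid_sample(fmap.unsqueeze(0), grid, align_corners=False)
        return out.squeeze(0).squeeze(-1).t()              # (N, C) per-point features

    def forward(self, point_feats, bev_map, rv_map, bev_uv, rv_uv):
        # Concatenate the 3D point features with features gathered
        # bilinearly from the BEV and RV pseudo-images, then classify.
        fused = torch.cat([point_feats,
                           self._sample(bev_map, bev_uv),
                           self._sample(rv_map, rv_uv)], dim=1)
        return self.mlp(fused)                             # (N, num_classes) logits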