An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection
1 Conclusion
In this paper, we propose an efficient transformer for 3D lane detection that uses a decomposed attention mechanism to learn lane and BEV representations simultaneously. The mechanism decomposes the cross-attention between image-view and BEV features into two parts: cross-attention between image-view and lane features, and cross-attention between lane and BEV features. Both parts are supervised with ground-truth lane lines. This yields a more accurate view transformation than methods based on inverse perspective mapping (IPM), and more accurate lane detection than the traditional two-stage pipeline.
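The decomposition can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper. It assumes single-head scaled dot-product attention without learned projections, and it omits the ground-truth lane supervision. The names `cross_attention`, `decomposed_attention`, and all tensor sizes are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys, values):
    # Single-head scaled dot-product attention (assumption: no learned projections).
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights


def decomposed_attention(image_feats, lane_queries, bev_queries):
    # Step 1: lane queries attend to image-view features.
    lane_feats, lane_img_w = cross_attention(lane_queries, image_feats, image_feats)
    # Step 2: BEV queries attend to the lane features from step 1,
    # so BEV features never attend to image-view features directly.
    bev_feats, bev_lane_w = cross_attention(bev_queries, lane_feats, lane_feats)
    return lane_feats, bev_feats, lane_img_w, bev_lane_w


# Hypothetical sizes: flattened image tokens, lane queries, flattened BEV cells, channels.
n_img, n_lane, n_bev, d = 600, 12, 400, 32
rng = np.random.default_rng(0)
image_feats = rng.standard_normal((n_img, d))
lane_queries = rng.standard_normal((n_lane, d))
bev_queries = rng.standard_normal((n_bev, d))

lane_feats, bev_feats, lane_img_w, bev_lane_w = decomposed_attention(
    image_feats, lane_queries, bev_queries
)

# Number of attention-weight entries: direct BEV-to-image attention
# versus the two smaller attention maps of the decomposition.
direct_cost = n_bev * n_img
decomposed_cost = n_lane * n_img + n_bev * n_lane
```

With these illustrative sizes, direct BEV-to-image attention would need 240,000 attention weights, whereas the two decomposed maps need 12,000 in total. The saving grows as the number of lane queries shrinks relative to the image and BEV resolutions.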