An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection

First Author
Institution1
Institution1 address
firstauthor@i1.org
Second Author
Institution2
First line of institution2 address
secondauthor@i2.org

1 Conclusion

In this paper, we propose an efficient transformer for 3D lane detection that uses a decomposed attention mechanism to learn lane and BEV representations simultaneously. The mechanism decomposes the cross-attention between image-view and BEV features into two parts: cross-attention between image-view and lane features, and cross-attention between lane and BEV features. Both parts are supervised with ground-truth lane lines. This allows for a more accurate view transformation than IPM-based methods and more accurate lane detection than the traditional two-stage pipeline.
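The decomposition summarized above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this paper: it assumes single-head scaled dot-product attention without learned projections, and the names `cross_attention`, `decomposed_attention`, and all tensor sizes are hypothetical choices made for illustration. The supervision with ground-truth lane lines is omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Single-head scaled dot-product cross-attention (illustrative assumption).
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights

def decomposed_attention(bev_queries, lane_queries, image_feats):
    # Step 1: lane queries attend to image-view features.
    lane_feats, lane_img_attn = cross_attention(lane_queries, image_feats, image_feats)
    # Step 2: BEV queries attend to the resulting lane features.
    bev_feats, bev_lane_attn = cross_attention(bev_queries, lane_feats, lane_feats)
    return bev_feats, lane_feats, lane_img_attn, bev_lane_attn

# Hypothetical sizes: image tokens, BEV cells, lane queries, channels.
n_img, n_bev, n_lane, d = 600, 400, 12, 32
rng = np.random.default_rng(0)
image_feats = rng.standard_normal((n_img, d))
bev_queries = rng.standard_normal((n_bev, d))
lane_queries = rng.standard_normal((n_lane, d))

bev_feats, lane_feats, lane_img_attn, bev_lane_attn = decomposed_attention(
    bev_queries, lane_queries, image_feats
)

# Number of attention-weight entries in each formulation.
direct_cost = n_bev * n_img                 # direct BEV-to-image attention
decomposed_cost = n_lane * (n_img + n_bev)  # lane-to-image plus BEV-to-lane
```

In this sketch, the two attention maps have `n_lane * n_img` and `n_bev * n_lane` entries, respectively, so when the number of lane queries is much smaller than the number of image tokens and BEV cells, the decomposed form computes far fewer attention weights than a direct `n_bev * n_img` map.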