IIRC, DETR generates a sequence to predict the bounding boxes of objects. I think this paradigm could be applied to such models. “Think before you locate” could be a new path to explore.
https://github.com/Jamie-Stirling/RetNet (unofficial implementation)
I also want to share some resources.
For PyTorch,
For TPU,
Indeed, it would be great if the authors did so. I personally found some unofficial implementations: