TubeDETR: Spatio-Temporal Video Grounding with Transformers - Citation Graph | Papersgraph