Phrase Extraction and Grounding (PEG) requires a model to extract phrases from text and, at the same time, locate the corresponding objects in an image.
(Image credit: Papersgraph)
This paper formulates PEG as a dual detection problem and proposes a novel DQ-DETR model, which introduces dual queries to probe different features from the image and the text for object prediction and phrase mask prediction. With a ResNet-101 backbone, it establishes new state-of-the-art results on all visual grounding benchmarks.
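To make the "dual detection" output concrete, here is a minimal sketch of how paired predictions might be decoded into (phrase, box) results. It is not the DQ-DETR implementation: the `DualPrediction` class, the `decode_phrase` and `decode_pairs` helpers, the 0.5 threshold, and the example scores are all hypothetical, chosen only to illustrate pairing one box with one phrase mask over the text tokens.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), normalized


@dataclass
class DualPrediction:
    """Hypothetical output of one query pair: a box plus per-token mask scores."""
    box: Box
    mask_scores: List[float]  # one score in [0, 1] per text token


def decode_phrase(tokens: List[str], mask_scores: List[float],
                  threshold: float = 0.5) -> str:
    """Join the tokens whose mask score reaches the (assumed) threshold."""
    if len(tokens) != len(mask_scores):
        raise ValueError("need exactly one mask score per token")
    return " ".join(t for t, s in zip(tokens, mask_scores) if s >= threshold)


def decode_pairs(tokens: List[str], predictions: List[DualPrediction],
                 threshold: float = 0.5) -> List[Tuple[str, Box]]:
    """Turn predictions into (phrase, box) pairs, skipping empty phrases."""
    pairs = []
    for pred in predictions:
        phrase = decode_phrase(tokens, pred.mask_scores, threshold)
        if phrase:  # a prediction that selects no tokens grounds nothing
            pairs.append((phrase, pred.box))
    return pairs


# Made-up example: a caption and three predictions, one of which is empty.
tokens = ["a", "man", "holding", "a", "red", "umbrella"]
predictions = [
    DualPrediction((0.10, 0.20, 0.45, 0.95), [0.9, 0.8, 0.1, 0.0, 0.1, 0.0]),
    DualPrediction((0.40, 0.05, 0.80, 0.50), [0.0, 0.1, 0.2, 0.9, 0.8, 0.9]),
    DualPrediction((0.00, 0.00, 1.00, 1.00), [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]),
]
pairs = decode_pairs(tokens, predictions)
for phrase, box in pairs:
    print(phrase, box)
```

In this sketch each prediction carries both halves of the PEG answer, so the extracted phrase and its located object stay aligned by construction rather than being matched afterwards.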