Grounded Situation Recognition aims to produce a structured image summary that describes the primary activity (verb), its relevant entities (nouns), and their bounding-box groundings.
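The output described above can be pictured as a small structured record. The sketch below is purely illustrative (the class name, roles, and coordinates are made up, not from any dataset schema); it only shows the verb / role-to-noun / role-to-box shape of a grounded situation frame:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A bounding box as (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]

@dataclass
class GroundedSituation:
    """Illustrative container for one grounded situation frame."""
    verb: str                                                      # primary activity, e.g. "riding"
    roles: Dict[str, Optional[str]] = field(default_factory=dict)  # role -> noun (None if unfilled)
    boxes: Dict[str, Optional[Box]] = field(default_factory=dict)  # role -> grounding (None if not visible)

frame = GroundedSituation(
    verb="riding",
    roles={"agent": "man", "vehicle": "horse", "place": "field"},
    boxes={
        "agent": (40.0, 20.0, 180.0, 300.0),
        "vehicle": (30.0, 120.0, 260.0, 360.0),
        "place": None,  # some roles (e.g. "place") may have no box
    },
)
print(frame.verb)            # -> riding
print(frame.roles["agent"])  # -> man
```

Note that a role can carry a noun without a grounding, which is why the noun map and the box map are kept separate.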
These leaderboards are used to track progress in Grounded Situation Recognition.
Use these libraries to find Grounded Situation Recognition models and implementations.
A novel approach in which the two processes of activity classification and entity estimation are interactive and complementary, achieving state of the art on all evaluation metrics on the SWiG dataset.
This paper studies semantic sparsity in situation recognition, the task of producing structured summaries of what is happening in images, including activities, objects and the roles objects play within the activity.
A model based on Graph Neural Networks is proposed that efficiently captures joint dependencies between roles using neural networks defined on a graph, significantly outperforming existing work as well as multiple baselines.
A Joint Situation Localizer is proposed and it is found that jointly predicting situations and groundings with end-to-end training handily outperforms independent training on the entire grounding metric suite with relative gains between 8% and 32%.
Situation Recognition (SR) is a fine-grained action recognition task where the model is expected not only to predict the salient action of the image, but also to predict the values of all semantic roles associated with that action. Predicting semantic roles is very challenging: a vast variety of entities can fill a given semantic role. Existing work has focused on dependency-modelling architectures to address this issue. Inspired by the success of query-based visual reasoning (e.g., Visual Question Answering), we propose to address semantic role prediction as a query-based visual reasoning problem. However, existing query-based reasoning methods do not handle inter-dependent queries, which is a unique requirement of semantic role prediction in SR. To the best of our knowledge, we therefore propose the first set of methods to address inter-dependent queries in query-based visual reasoning. Extensive experiments demonstrate the effectiveness of our proposed method, which achieves outstanding performance on the Situation Recognition task. Furthermore, leveraging query inter-dependency, our methods improve upon a state-of-the-art method that answers queries separately. Our code: https://github.com/thilinicooray/context-aware-reasoning-for-sr
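The inter-dependent-query idea above can be sketched with plain dot-product attention: role queries first exchange information with each other (so each query is conditioned on the others) before reading from the image. This is a minimal numpy sketch under assumed shapes, not the paper's architecture; all names and dimensions are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Scaled dot-product attention: each query returns a mix of the values."""
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d))
    return weights @ values

rng = np.random.default_rng(0)
img_feats = rng.standard_normal((49, 64))    # e.g. a 7x7 grid of image features
role_queries = rng.standard_normal((3, 64))  # one query per semantic role

# Step 1: queries attend to each other -> inter-dependency between roles.
ctx_queries = attend(role_queries, role_queries, role_queries)

# Step 2: context-aware queries read from the image (cross-attention).
role_repr = attend(ctx_queries, img_feats, img_feats)
print(role_repr.shape)  # -> (3, 64)
```

Answering queries separately would skip step 1; the contrast the abstract draws is exactly between these two variants.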
The attention mechanism of the model enables accurate verb classification by effectively capturing high-level semantic features of an image, and allows the model to flexibly deal with the complicated, image-dependent relations between entities for improved noun classification and localization.
A novel two-stage framework that exploits bidirectional relations between verbs and roles is proposed, and extensive experimental results show that it outperforms other state-of-the-art methods under various metrics.
A novel SituFormer for GSR consisting of a Coarse-to-Fine Verb Model (CFVM) and a Transformer-based Noun Model (TNM), a transformer-based semantic role detection model that detects all roles in parallel.
A cross-attention-based Transformer known as ClipSitu XTF outperforms the existing state of the art by a large margin of 14.1% on semantic role labelling (value) for top-1 accuracy on the imSitu dataset.
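The core mechanism behind such cross-attention role labelling can be sketched as follows: a query built from verb and role embeddings attends over image patch features, and a linear head scores the noun vocabulary. This is only a schematic sketch, assuming random stand-in features; the actual ClipSitu XTF architecture, dimensions, and CLIP backbone are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(1)
d, num_nouns = 64, 10

patch_feats = rng.standard_normal((50, d))  # stand-in for image patch embeddings
verb_emb = rng.standard_normal(d)           # embedding of the predicted verb
role_emb = rng.standard_normal(d)           # embedding of one semantic role

# Verb-conditioned role query: the same role can mean different things
# under different verbs, so the query mixes both embeddings.
query = (verb_emb + role_emb)[None, :]

# Cross-attention: the role query reads from the image patches.
scores = query @ patch_feats.T / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
role_feat = weights @ patch_feats           # shape (1, d)

# A linear head scores the noun vocabulary; top-1 is the predicted noun value.
W = rng.standard_normal((d, num_nouns))     # hypothetical classifier weights
pred_noun = int(np.argmax(role_feat @ W))
print(0 <= pred_noun < num_nouns)  # -> True
```

Top-1 accuracy, the metric quoted above, simply checks whether this argmax matches the annotated noun for each role.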