A task in which an algorithm generates judgment scores that correlate with human judgments.
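To make the task concrete, evaluation metrics are typically scored by rank correlation (e.g., Kendall's tau) between the metric's outputs and human ratings. The following is a minimal sketch with hypothetical scores; the five metric outputs and human ratings below are invented for illustration.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall rank correlation between two equal-length score lists
    (simple tau-a variant, no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

metric_scores = [0.31, 0.58, 0.44, 0.72, 0.15]  # hypothetical metric outputs
human_ratings = [2, 4, 3, 5, 1]                 # hypothetical human judgments
print(kendall_tau(metric_scores, human_ratings))  # → 1.0 (identical ranking)
```

A metric whose scores order captions exactly as humans do reaches tau = 1.0; random scores hover near 0.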
This work reports the surprising empirical finding that CLIP (Radford et al., 2021), a cross-modal model pretrained on 400M web-sourced image-caption pairs, enables robust automatic evaluation of image captioning without the need for references.
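The reference-free score here is, at its core, a rescaled image-text cosine similarity over CLIP embeddings. The sketch below assumes the embeddings have already been extracted by a CLIP-like encoder; the vectors are hypothetical placeholders, not real CLIP features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def reference_free_score(image_emb, caption_emb, w=2.5):
    # CLIPScore-style rescaling: w * max(cos, 0), so the score is
    # non-negative; w = 2.5 follows the published formulation.
    return w * max(cosine(image_emb, caption_emb), 0.0)

image_emb   = [0.2, 0.5, 0.8]  # hypothetical image embedding
caption_emb = [0.1, 0.6, 0.7]  # hypothetical caption embedding
print(round(reference_free_score(image_emb, caption_emb), 3))
```

Because no reference captions appear anywhere in the computation, the score can be applied to candidate captions for arbitrary images.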
This work proposes the negative Gaussian cross-mutual information over CLIP features as a unified metric, termed Mutual Information Divergence (MID), and extensively compares it with competing metrics on carefully generated and human-annotated judgments in text-to-image generation and image captioning tasks.
A novel metric for measuring scene graph similarity is introduced; combined with an improved scene graph parser, it achieves state-of-the-art (SOTA) results on multiple benchmark datasets for these tasks.