A task in which an algorithm generates judgment scores that correlate with human judgments.
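To make the task concrete, evaluation metrics are typically scored by rank correlation (e.g., Kendall's tau) between the metric's outputs and human ratings. The following is a minimal sketch with hypothetical scores; the five metric outputs and human ratings below are invented for illustration.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall rank correlation between two equal-length score lists
    (simple tau-a variant, no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

metric_scores = [0.31, 0.58, 0.44, 0.72, 0.15]  # hypothetical metric outputs
human_ratings = [2, 4, 3, 5, 1]                 # hypothetical human judgments
print(kendall_tau(metric_scores, human_ratings))  # → 1.0 (identical ranking)
```

A metric whose scores order captions exactly as humans do reaches tau = 1.0; random scores hover near 0.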
This work reports the surprising empirical finding that CLIP (Radford et al., 2021), a cross-modal model pretrained on 400M web-sourced image-caption pairs, enables robust automatic evaluation of image captioning without the need for references.
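The reference-free score here is, at its core, a rescaled image-text cosine similarity over CLIP embeddings. The sketch below assumes the embeddings have already been extracted by a CLIP-like encoder; the vectors are hypothetical placeholders, not real CLIP features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def reference_free_score(image_emb, caption_emb, w=2.5):
    # CLIPScore-style rescaling: w * max(cos, 0), so the score is
    # non-negative; w = 2.5 follows the published formulation.
    return w * max(cosine(image_emb, caption_emb), 0.0)

image_emb   = [0.2, 0.5, 0.8]  # hypothetical image embedding
caption_emb = [0.1, 0.6, 0.7]  # hypothetical caption embedding
print(round(reference_free_score(image_emb, caption_emb), 3))
```

Because no reference captions appear anywhere in the computation, the score can be applied to candidate captions for arbitrary images.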
This work proposes the negative Gaussian cross-mutual information over CLIP features as a unified metric, termed Mutual Information Divergence (MID), and extensively compares it with competing metrics on carefully generated and human-annotated judgments in text-to-image generation and image captioning tasks.
A novel metric for measuring scene graph similarity is introduced; combined with an improved scene graph parser, it achieves state-of-the-art (SOTA) results on multiple benchmark datasets for these tasks.