A task in which an algorithm judges which of several candidate samples is better, in agreement with human judgment.
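To make the evaluation protocol concrete, a metric for this task is commonly scored by how often its pairwise preference matches the human one. The sketch below is a minimal illustration under that assumption; the function name `pairwise_agreement` and its list-based interface are hypothetical, not a standard from any specific benchmark.

```python
def pairwise_agreement(scores_a, scores_b, human_prefers_a):
    # human_prefers_a[i] is True when annotators judged sample a_i
    # better than b_i; the metric "agrees" when its score ordering
    # matches (ties count against the metric in this sketch).
    correct = sum((sa > sb) == pref
                  for sa, sb, pref in zip(scores_a, scores_b, human_prefers_a))
    return correct / len(human_prefers_a)

print(pairwise_agreement([0.9, 0.2, 0.7], [0.4, 0.6, 0.1],
                         [True, False, True]))  # -> 1.0
```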
This work reports the surprising empirical finding that CLIP (Radford et al., 2021), a cross-modal model pretrained on 400M image-caption pairs from the web, can be used for robust automatic evaluation of image captioning without the need for references.
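As a rough illustration of how a reference-free metric can be built on CLIP, the sketch below scores a caption against an image using only their CLIP embeddings. The rescaled-cosine form w * max(cos, 0) with w = 2.5 follows the CLIP-S definition in the CLIPScore paper (Hessel et al., 2021); the function name and the assumption of precomputed embeddings are placeholders, not the paper's released implementation.

```python
import numpy as np

def clip_s(image_emb: np.ndarray, caption_emb: np.ndarray, w: float = 2.5) -> float:
    """Reference-free caption score from precomputed CLIP embeddings."""
    v = image_emb / np.linalg.norm(image_emb)
    c = caption_emb / np.linalg.norm(caption_emb)
    # Clamp negative similarities at zero and rescale, per CLIP-S.
    return w * max(float(v @ c), 0.0)

# Toy usage with random stand-ins for real CLIP embeddings.
rng = np.random.default_rng(0)
image_emb, caption_emb = rng.normal(size=512), rng.normal(size=512)
print(clip_s(image_emb, caption_emb))
```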
This work proposes the negative Gaussian cross-mutual information computed from CLIP features as a unified metric, coined Mutual Information Divergence (MID), and extensively compares it with competing metrics using carefully generated or human-annotated judgments on text-to-image generation and image captioning tasks.
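To make the Gaussian cross-mutual-information idea concrete, here is a minimal sketch that scores a single image-text feature pair by its pointwise mutual information under Gaussians fitted to reference CLIP features. It illustrates the quantity MID is built on, not the authors' exact estimator; `gaussian_pmi`, the ridge term `eps`, and the reference-set interface are assumptions made for this example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pmi(x, y, X_ref, Y_ref, eps=1e-6):
    """Pointwise Gaussian MI of an (image, text) CLIP-feature pair.

    Fits Gaussians to reference feature sets X_ref, Y_ref (each n x d)
    and to their concatenation, then scores a candidate pair (x, y) by
    log p(x, y) - log p(x) - log p(y). The ridge eps keeps covariances
    well-conditioned when n is small relative to d.
    """
    def fit(Z):
        d = Z.shape[1]
        return multivariate_normal(Z.mean(axis=0),
                                   np.cov(Z, rowvar=False) + eps * np.eye(d))

    joint = fit(np.concatenate([X_ref, Y_ref], axis=1))
    px, py = fit(X_ref), fit(Y_ref)
    return joint.logpdf(np.concatenate([x, y])) - px.logpdf(x) - py.logpdf(y)

# Toy usage: correlated synthetic features stand in for CLIP outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
Y = 0.5 * X + rng.normal(size=(1000, 8))
print(gaussian_pmi(X[0], Y[0], X, Y))
```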