3260 papers • 126 benchmarks • 313 datasets
Confidence calibration – the problem of predicting probability estimates representative of the true correctness likelihood – is important for classification models in many applications. Two commonly used calibration metrics are the Expected Calibration Error (ECE) and the Maximum Calibration Error (MCE).
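Both metrics compare predicted confidence with empirical accuracy inside confidence bins: ECE is the sample-weighted average of the per-bin gap, and MCE is the largest per-bin gap. Below is a minimal NumPy sketch of this computation; the equal-width binning and the default of 15 bins are illustrative choices.

```python
# Minimal sketch of Expected / Maximum Calibration Error (ECE / MCE),
# assuming equal-width confidence bins (the bin count is illustrative).
import numpy as np

def calibration_errors(confidences, predictions, labels, n_bins=15):
    """confidences: top-class probability per sample; predictions: predicted
    class per sample; labels: true class per sample."""
    confidences = np.asarray(confidences, dtype=float)
    accuracies = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bin_edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece, mce = 0.0, 0.0
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        gap = abs(accuracies[in_bin].mean() - confidences[in_bin].mean())
        ece += in_bin.mean() * gap   # weight by the fraction of samples in the bin
        mce = max(mce, gap)          # worst-case gap over bins
    return ece, mce
```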
These leaderboards are used to track progress in Classifier Calibration
Use these libraries to find Classifier Calibration models and implementations
No subtasks available.
Instead of randomly dropping parts of the network as in MC-dropout, Masksembles relies on a fixed number of binary masks, which are parameterized in a way that allows the correlations between individual models to be controlled.
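As an illustration of the masking idea (a hedged sketch, not the authors' implementation), the snippet below forms a small ensemble from a fixed set of binary channel masks: each member applies its own mask to the features before a shared linear head, and the members' softmax outputs are averaged. The mask count, feature size, keep fraction, and shared head are illustrative assumptions.

```python
# Sketch of mask-based ensembling in the spirit of Masksembles: a fixed set of
# binary channel masks defines the ensemble members (nothing is re-sampled at
# test time, unlike MC-dropout); mask overlap loosely controls member correlation.
import numpy as np

rng = np.random.default_rng(0)
n_members, n_features, keep_frac = 4, 128, 0.5  # illustrative sizes

# Generate the masks once and keep them fixed.
masks = (rng.random((n_members, n_features)) < keep_frac).astype(np.float32)

def member_softmax(features, member_id, weights, bias):
    """One ensemble member: mask the features, then apply a shared linear head."""
    logits = (features * masks[member_id]) @ weights + bias
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(features, weights, bias):
    """Average the members' predictive distributions."""
    return np.mean([member_softmax(features, m, weights, bias)
                    for m in range(n_members)], axis=0)
```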
This work presents a new approach to multi-class probability estimation by turning inductive and cross Venn–Abers predictors (IVAPs and CVAPs) into multi-class probabilistic predictors, which are experimentally more accurate than both uncalibrated predictors and existing calibration methods.
It is shown that on most tasks the best self-supervised models outperform supervised pre-training, confirming a recently observed trend in the literature; ImageNet Top-1 accuracy is found to be highly correlated with transfer to many-shot recognition, but increasingly less so for few-shot recognition, object detection, and dense prediction.
This work presents a novel framework to measure and calibrate biased (or miscalibrated) confidence estimates of object detection methods and proposes a new measure to evaluate miscalibration of object detectors.
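To illustrate how such a measure can look, the sketch below extends ECE-style binning from confidence alone to confidence plus box position, averaging the confidence/precision gap over occupied bins. This is a hedged approximation of a position-dependent detection calibration error, not the paper's exact definition; the bin counts and the matched-to-ground-truth indicator are assumed inputs.

```python
# Sketch of a position-dependent detection calibration error: detections are
# binned by confidence and by normalised box centre, and the gap between mean
# confidence and the fraction of matched (true-positive) detections is averaged.
import numpy as np

def detection_ece(confidences, matched, centers_xy, n_conf_bins=10, n_pos_bins=5):
    """confidences: (n,) detection scores; matched: (n,) 1 if matched to ground
    truth; centers_xy: (n, 2) box centres normalised to [0, 1]."""
    confidences = np.asarray(confidences, dtype=float)
    matched = np.asarray(matched, dtype=float)
    centers_xy = np.asarray(centers_xy, dtype=float)
    conf_bin = np.clip((confidences * n_conf_bins).astype(int), 0, n_conf_bins - 1)
    pos_bin = np.clip((centers_xy * n_pos_bins).astype(int), 0, n_pos_bins - 1)
    ece = 0.0
    for c in range(n_conf_bins):
        for i in range(n_pos_bins):
            for j in range(n_pos_bins):
                sel = (conf_bin == c) & (pos_bin[:, 0] == i) & (pos_bin[:, 1] == j)
                if sel.any():
                    gap = abs(matched[sel].mean() - confidences[sel].mean())
                    ece += sel.mean() * gap  # weight by fraction of detections in the bin
    return ece
```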
This article proposes expanding classifiers' scores to higher dimensions to boost the calibrator's performance, and finds that multi-score calibration outperforms single-score calibration in the majority of experiments, including two real-world datasets.
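The contrast can be sketched with scikit-learn: a Platt-style calibrator fit on a single score versus the same calibrator fit on several scores per example (for instance, the probabilities of the top few classes). The feature choices below are illustrative placeholders, not the article's exact setup.

```python
# Sketch contrasting single-score and multi-score calibration with a logistic
# (Platt-style) calibrator; the score features are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_single_score_calibrator(top_score, correct):
    """top_score: (n,) one score per example; correct: (n,) 0/1 labels."""
    return LogisticRegression().fit(np.asarray(top_score).reshape(-1, 1), correct)

def fit_multi_score_calibrator(score_matrix, correct):
    """score_matrix: (n, d) several scores per example, e.g. the sorted
    probabilities of the top-d classes."""
    return LogisticRegression().fit(np.asarray(score_matrix), correct)
```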
This work proposes a three-stage framework that makes it possible to explicitly and effectively address the challenges of generalized and incremental few-shot learning; the proposed framework is evaluated on four challenging benchmark datasets for image and video few-shot classification and obtains state-of-the-art results.
Hidden heterogeneity (HH) can serve as a useful diagnostic tool for identifying when local calibration methods would be beneficial, and two similarity-weighted calibration methods that address HH by adapting locally to each test item are introduced.
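A hedged sketch of the locally adaptive idea (not the paper's exact methods): the calibrated confidence for a test item is a similarity-weighted average of correctness over a calibration set, here with an RBF kernel whose bandwidth is an illustrative choice.

```python
# Sketch of similarity-weighted (local) calibration: calibration examples that
# are similar to the test item contribute more to its calibrated confidence.
import numpy as np

def similarity_weighted_confidence(test_feat, calib_feats, calib_correct, bandwidth=1.0):
    """test_feat: (d,) feature vector; calib_feats: (n, d) calibration features;
    calib_correct: (n,) 0/1 correctness of the model on the calibration set."""
    d2 = np.sum((np.asarray(calib_feats) - np.asarray(test_feat)) ** 2, axis=1)
    weights = np.exp(-d2 / (2.0 * bandwidth ** 2))          # RBF similarity
    return float(np.sum(weights * calib_correct) / (np.sum(weights) + 1e-12))
```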
A novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR) adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model, and achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
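The sketch below illustrates the general recipe in a simplified, centralized form (in the federated setting the per-class Gaussian statistics would be aggregated from clients rather than computed on raw features): estimate a Gaussian per class over feature vectors, sample virtual representations, and re-fit only the classifier head. The feature inputs, sample counts, and the logistic-regression head are illustrative assumptions.

```python
# Sketch of classifier re-calibration with virtual representations: per-class
# Gaussians over features are estimated, virtual features are sampled from them,
# and a fresh classifier head is fitted on the virtual data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

def refit_head_with_virtual_features(feats, labels, n_virtual_per_class=500, seed=0):
    """feats: (n, d) penultimate-layer features; labels: (n,) class ids."""
    rng = np.random.default_rng(seed)
    virtual_x, virtual_y = [], []
    for c in np.unique(labels):
        class_feats = feats[labels == c]
        mean = class_feats.mean(axis=0)
        cov = np.cov(class_feats, rowvar=False)
        samples = rng.multivariate_normal(mean, cov, size=n_virtual_per_class)
        virtual_x.append(samples)
        virtual_y.append(np.full(n_virtual_per_class, c))
    X, y = np.vstack(virtual_x), np.concatenate(virtual_y)
    return LogisticRegression(max_iter=1000).fit(X, y)  # the re-calibrated head
```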
This work proves, for several notions of calibration, that solving the reduced problem minimizes the corresponding notion of miscalibration in the full problem, allowing the use of non-parametric recalibration methods that would otherwise fail in higher dimensions.
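One concrete instance of such a reduction (a hedged sketch, not necessarily the paper's construction) is top-label, or confidence, calibration: the k-dimensional problem is reduced to calibrating a single scalar confidence, so a one-dimensional non-parametric method such as isotonic regression can be used.

```python
# Sketch of reducing multi-class calibration to a 1-D problem (top-label
# confidence) and recalibrating it with non-parametric isotonic regression.
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_confidence_calibrator(probs, labels):
    """probs: (n, k) predicted probabilities; labels: (n,) true classes."""
    conf = probs.max(axis=1)                              # 1-D reduction
    correct = (probs.argmax(axis=1) == labels).astype(float)
    return IsotonicRegression(out_of_bounds="clip").fit(conf, correct)

def apply_confidence_calibrator(iso, probs):
    """Replace each row's top-class probability with its calibrated value and
    rescale the remaining classes so the row still sums to one."""
    probs = np.asarray(probs, dtype=float)
    conf = probs.max(axis=1)
    new_conf = iso.predict(conf)
    top = probs.argmax(axis=1)
    scale = (1.0 - new_conf) / np.clip(1.0 - conf, 1e-12, None)
    out = probs * scale[:, None]
    out[np.arange(len(probs)), top] = new_conf
    return out
```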