Given a set of candidate models, the goal of Model Selection is to select the model that best approximates the observed data and captures its underlying regularities. Model Selection criteria are defined such that they strike a balance between the goodness of fit and the generalizability or complexity of the models. Source: Kernel-based Information Criterion
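As a minimal illustration of how such criteria work in practice, the sketch below scores candidate polynomial models with AIC, trading residual error against parameter count; the data, candidate set, and Gaussian-noise AIC form are illustrative and not taken from the cited source.

```python
# Criterion-based model selection sketch: fit polynomials of increasing
# degree, score each by AIC, and keep the degree with the lowest score.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.2, size=x.size)

def aic(y_true, y_pred, n_params):
    # Gaussian log-likelihood up to a constant: n*log(RSS/n) + 2k.
    n = y_true.size
    rss = np.sum((y_true - y_pred) ** 2)
    return n * np.log(rss / n) + 2 * n_params

scores = {}
for degree in range(1, 7):  # candidate models of increasing complexity
    coeffs = np.polyfit(x, y, degree)
    scores[degree] = aic(y, np.polyval(coeffs, x), n_params=degree + 1)

best = min(scores, key=scores.get)
print(f"selected polynomial degree: {best}")  # typically 2 for this data
```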
These leaderboards are used to track progress in Model Selection
No benchmarks available.
Use these libraries to find Model Selection models and implementations
No subtasks available.
This work proposes BERTScore, an automatic evaluation metric for text generation that correlates better with human judgments and provides stronger model selection performance than existing metrics.
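A hedged usage sketch with the bert-score package; the candidate and reference strings are made up for illustration, while the score call and its per-sentence precision/recall/F1 tensors follow the package's documented API.

```python
# pip install bert-score
from bert_score import score

candidates = ["the cat sat on the mat"]        # system outputs (illustrative)
references = ["a cat was sitting on the mat"]  # human references (illustrative)

# Returns precision, recall, and F1 tensors, one entry per sentence pair.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```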
This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.
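A sketch of launching a single DomainBed training run from Python; the module path, dataset name, and flag spellings follow the repository's README, but treat them as assumptions that may differ across checkouts.

```python
# Hypothetical launcher for one DomainBed run (repo must be cloned/installed);
# flag names are assumptions based on the DomainBed README.
import subprocess

subprocess.run([
    "python", "-m", "domainbed.scripts.train",
    "--data_dir", "./domainbed/data/MNIST/",
    "--dataset", "ColoredMNIST",
    "--algorithm", "ERM",   # the baseline the paper finds hard to beat
    "--test_env", "2",      # held-out domain used for evaluation
], check=True)
```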
This work presents Population Based Training, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance.
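A condensed, illustrative sketch of the exploit/explore loop at the heart of Population Based Training; the toy objective, population size, and perturbation schedule are stand-ins rather than the paper's setup.

```python
# Toy Population Based Training: members train in parallel, and laggards
# periodically copy (exploit) a stronger member's state and perturb its
# hyperparameters (explore), all within one fixed training budget.
import random

def train_step(theta, lr):
    # Surrogate objective: performance approaches 1.0 at a rate set by lr.
    return theta + lr * (1.0 - theta)

population = [{"theta": 0.0, "lr": random.uniform(0.01, 1.0)} for _ in range(8)]

for step in range(100):
    for m in population:
        m["theta"] = train_step(m["theta"], m["lr"])
    if step % 10 == 0:  # periodic exploit/explore
        population.sort(key=lambda m: m["theta"], reverse=True)
        for loser in population[-2:]:
            winner = random.choice(population[:2])
            loser["theta"] = winner["theta"]  # exploit: copy weights
            loser["lr"] = min(1.0, winner["lr"] * random.choice([0.8, 1.2]))  # explore

best = max(population, key=lambda m: m["theta"])
print(f"best score {best['theta']:.3f} with lr {best['lr']:.3f}")
```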
The standard splits for the CholecT50 and CholecT45 datasets are introduced and compared with existing uses of the datasets, and a metrics library, ivtmetrics, is developed for model evaluation on surgical action triplets.
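A hedged sketch of the ivtmetrics API; the class, method, and result-key names below follow the package's README and may differ between versions, and the targets/predictions are random stand-ins.

```python
# pip install ivtmetrics -- names per the README; treat them as assumptions.
import numpy as np
import ivtmetrics

metric = ivtmetrics.Recognition(num_class=100)  # 100 triplet classes

# Accumulate binary targets and predicted probabilities frame by frame,
# then mark the end of the video before computing average precision.
targets = np.random.randint(0, 2, size=(8, 100))
predictions = np.random.rand(8, 100)
metric.update(targets, predictions)
metric.video_end()

results = metric.compute_video_AP("ivt")  # "ivt" = full triplet component
print(results["mAP"])
```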
This work proposes a new CNN architecture which introduces an adaptation layer and an additional domain confusion loss to learn a representation that is both semantically meaningful and domain invariant, and shows that a domain confusion metric can be used for model selection to determine the dimension of the adaptation layer and its best position in the CNN architecture.
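An illustrative PyTorch sketch of that idea: an extra adaptation layer whose source and target activations are pulled together by a domain confusion term (a linear-kernel MMD here for brevity; the paper's architecture and loss differ in detail).

```python
import torch
import torch.nn as nn

class AdaptNet(nn.Module):
    def __init__(self, in_dim=256, adapt_dim=64, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU())
        self.adapt = nn.Linear(128, adapt_dim)      # adaptation layer
        self.classifier = nn.Linear(adapt_dim, n_classes)

    def forward(self, x):
        a = self.adapt(self.features(x))
        return self.classifier(a), a

def domain_confusion(a_src, a_tgt):
    # Squared MMD with a linear kernel: distance between mean embeddings.
    return (a_src.mean(0) - a_tgt.mean(0)).pow(2).sum()

model = AdaptNet()
x_src, y_src = torch.randn(32, 256), torch.randint(0, 10, (32,))
x_tgt = torch.randn(32, 256)                        # unlabeled target batch

logits, a_src = model(x_src)
_, a_tgt = model(x_tgt)
loss = nn.functional.cross_entropy(logits, y_src) \
     + 0.25 * domain_confusion(a_src, a_tgt)        # weight is illustrative
loss.backward()
```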
Metric-learn is an open-source Python package implementing supervised and weakly-supervised distance metric learning algorithms, which makes it easy to perform cross-validation, model selection, and pipelining with other machine learning estimators.
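Because metric-learn estimators follow the scikit-learn API, they compose directly with pipelines and cross-validation; a minimal sketch with NCA (other learners in the package are drop-in replacements):

```python
from metric_learn import NCA
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Learn a metric with NCA, then classify with k-NN in the learned space.
pipe = make_pipeline(NCA(max_iter=100), KNeighborsClassifier(n_neighbors=3))
print(cross_val_score(pipe, X, y, cv=5).mean())
```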
The goal of this work is to provide a comprehensive range of statistical tools and open-source software for nonparametric conditional density estimation (CDE) and method assessment, which can accommodate different types of settings and be easily fit to the problem at hand.
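The cited work ships its own toolkit; as a stand-in that illustrates the CDE task itself, the sketch below fits a conditional density p(y | x) with statsmodels' conditional KDE.

```python
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariateConditional

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=500)

# Estimate p(y | x) with continuous ('c') kernels and reference bandwidths.
cde = KDEMultivariateConditional(endog=[y], exog=[x],
                                 dep_type="c", indep_type="c",
                                 bw="normal_reference")
print(cde.pdf(endog_predict=[0.0], exog_predict=[0.5]))
```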