3260 papers • 126 benchmarks • 313 datasets
Data summarization is a central problem in machine learning: given a large dataset, compute a small summary that faithfully represents the whole. Source: How to Solve Fair k-Center in Massive Data Models
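As a concrete illustration of summarization in the k-center setting the source paper mentions, here is a minimal sketch of Gonzalez's classic greedy 2-approximation, which repeatedly adds the point farthest from the current summary. This is a standard baseline, not the fair or massive-data algorithm from the paper itself.

```python
import numpy as np

def greedy_k_center(points, k, seed=0):
    """Gonzalez's greedy 2-approximation for k-center:
    repeatedly pick the point farthest from the current summary."""
    rng = np.random.default_rng(seed)
    n = len(points)
    centers = [int(rng.integers(n))]
    # distance from every point to its nearest chosen center so far
    dists = np.linalg.norm(points - points[centers[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))          # farthest point joins the summary
        centers.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return centers

# three well-separated clusters; k=3 should pick one point from each
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]])
print(greedy_k_center(pts, 3))
```

Each iteration costs O(n), so the whole summary is computed in O(nk) time, which is why greedy k-center is a common starting point for massive-data variants.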
This work proposes to distill both images and their labels simultaneously, assigning each synthetic sample a "soft" label (a distribution over labels), and demonstrates that text distillation outperforms other methods across multiple datasets.
A fast and accurate data selection method is presented, in which the selected samples are optimized to span the subspace of all the data; it has linear complexity in the number of data points and no parameters to tune.
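One common way to select samples that span the data's subspace is column subset selection via QR with column pivoting. The sketch below is an illustration of that general idea, not the paper's exact algorithm; the function name `select_spanning_rows` is mine.

```python
import numpy as np
from scipy.linalg import qr

def select_spanning_rows(X, k):
    """Pick k rows of X that approximately span its row space,
    via QR with column pivoting applied to X^T. Pivoting greedily
    chooses the column (here: row of X) with the largest residual
    norm, a standard surrogate for subspace-spanning selection."""
    _, _, piv = qr(X.T, pivoting=True)
    return piv[:k]

# two duplicated directions; k=2 should recover one row per direction
X = np.array([[2., 0., 0.], [2., 0., 0.], [0., 1., 0.], [0., 1., 0.]])
print(select_spanning_rows(X, 2))
```

Pivoted QR costs O(ndk)-ish work for k pivots on an n-by-d matrix, which is why linear-time methods like the one summarized above are attractive at scale.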
This work introduces a more robust and flexible meta-learning algorithm for distillation, as well as an effective first-order strategy based on convex optimization layers, and shows it to be more effective than the prior image-based approach to dataset distillation.
A new Hermite-series-based sequential estimator for the Spearman rank correlation coefficient is described, and an exponentially weighted estimator is introduced, which allows the local nonparametric correlation of a bivariate data stream to be tracked.
Simulation studies and tests on real data show the Gauss-Hermite-based algorithms to be competitive with a leading existing algorithm, and they provide a solution to online distribution function and online quantile function estimation on data streams.
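To make "online quantile estimation on a data stream" concrete, here is a minimal Robbins-Monro-style quantile tracker: it nudges the estimate up or down by a small step depending on which side of it each new sample falls. This is a generic illustration of one-pass quantile tracking, not the Hermite-series method summarized above.

```python
import random

def track_quantile(stream, tau, lr=0.01, q0=0.0):
    """One-pass stochastic approximation of the tau-quantile:
    at the true quantile, a sample falls at or below q with
    probability tau, so the expected update lr*(tau - 1{x<=q})
    is zero and the estimate hovers there."""
    q = q0
    for x in stream:
        q += lr * (tau - (1.0 if x <= q else 0.0))
    return q

random.seed(0)
# median of Uniform(0, 1) samples should settle near 0.5
print(track_quantile((random.random() for _ in range(50_000)), 0.5))
```

The step size `lr` trades tracking speed against noise in the estimate, the same locality trade-off the exponentially weighted estimator above makes for correlation.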
This work formalizes the standard table summarization problem, which deals with tables conforming to a single predefined schema, and proposes a mixed hierarchical-attention-based encoder-decoder model that leverages the structure of the tables in addition to their content.
Experimental results on a real-world dataset and an image dataset show that the diversity of the samples produced under fairness constraints is not far from the unconstrained case, and a theoretical explanation for this is provided.
This paper proposes a novel online algorithm that can compute the nonparametric correlations 10 to 1,000 times faster than the corresponding batch algorithm, and it can compute them based either on all past observations or on fixed-size sliding windows.
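For context on what the online algorithm speeds up, here is the naive sliding-window baseline it is compared against: buffer the last `window` pairs and recompute Spearman's rho from scratch on every update. The class name `SlidingSpearman` is mine; the point is the per-update recomputation cost that the proposed algorithm avoids.

```python
from collections import deque
from scipy.stats import spearmanr

class SlidingSpearman:
    """Naive fixed-size sliding-window Spearman estimator:
    keep the last `window` (x, y) pairs and recompute the rank
    correlation in full each time a pair arrives. An online
    algorithm instead updates the estimate incrementally."""
    def __init__(self, window):
        self.xs = deque(maxlen=window)
        self.ys = deque(maxlen=window)

    def update(self, x, y):
        self.xs.append(x)
        self.ys.append(y)
        if len(self.xs) < 3:
            return float("nan")   # rho undefined for fewer than 3 pairs
        rho, _ = spearmanr(self.xs, self.ys)
        return rho

sw = SlidingSpearman(window=5)
for i in range(1, 11):            # y is a monotone function of x
    rho = sw.update(i, i * i)
print(rho)
```

Each update here re-sorts the window, so the cost grows with window size; that is exactly the overhead that makes an incremental formulation 10 to 1,000 times faster.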