This paper proposes a novel sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, and provides insights on cache access patterns, data compression, and sharding to build XGBoost, a scalable tree boosting system.
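As a concrete, hedged illustration of the kind of system described, here is a minimal sketch of training a gradient-boosted model on sparse input with histogram-based approximate split finding using the xgboost library; the synthetic data and hyperparameters are purely illustrative, not taken from the paper.

```python
# Minimal sketch: XGBoost on sparse (CSR) input with histogram/quantile-based splits.
# Data and hyperparameters are illustrative, not taken from the paper.
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

rng = np.random.default_rng(0)
X = sp.random(1000, 50, density=0.05, format="csr", random_state=0)  # mostly-zero features
y = (np.asarray(X.sum(axis=1)).ravel() > 0.5).astype(int)            # toy binary target

model = xgb.XGBClassifier(
    tree_method="hist",   # approximate split finding via feature histograms
    n_estimators=100,
    max_depth=6,
)
model.fit(X, y)
print(model.predict_proba(X[:5]))
```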
This work proposes DNABERT-2, a refined genome foundation model that adopts an efficient tokenizer and employs multiple strategies to overcome input length constraints, reduce time and memory expenditure, and enhance model capability; the tokenizer is built on Byte Pair Encoding (BPE), a statistics-based data compression algorithm that constructs tokens.
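To make the BPE idea concrete, here is a toy learner in plain Python that repeatedly merges the most frequent adjacent symbol pair; it is a simplified sketch, not the production tokenizer DNABERT-2 actually uses.

```python
# Toy Byte Pair Encoding (BPE): repeatedly merge the most frequent adjacent symbol pair.
from collections import Counter

def learn_bpe(sequences, num_merges=10):
    corpus = [list(seq) for seq in sequences]   # start from single-character symbols
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append((a, b))
        merged = a + b
        new_corpus = []
        for symbols in corpus:
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus.append(out)
        corpus = new_corpus
    return merges, corpus

merges, tokenized = learn_bpe(["ACGTACGTACGT", "ACGGACGG"], num_merges=4)
print(merges)      # learned merge rules, e.g. starting with ('A', 'C')
print(tokenized)   # the corpus re-segmented into multi-character tokens
```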
This work proposes a simple and general alternative to approximating an unknown subspace with a locally linear, and potentially multiscale, dictionary: it instead uses pieces of spheres, or spherelets, to approximate the subspace locally.
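As a toy numerical illustration (not the paper's spherelets estimator), the snippet below fits a single sphere to noisy 2-D points by algebraic least squares and projects the points onto it, standing in for the local sphere fits that replace local linear approximations.

```python
# Toy sphere fit by algebraic least squares; illustrative only.
import numpy as np

def fit_sphere(X):
    """X: (n, d) points. Solve 2*c.x + (r^2 - |c|^2) = |x|^2 as a linear system."""
    n, d = X.shape
    A = np.hstack([2 * X, np.ones((n, 1))])
    b = np.sum(X**2, axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    c, t = sol[:d], sol[d]
    return c, np.sqrt(t + c @ c)

def project_to_sphere(X, c, r):
    """Project points radially onto the fitted sphere (a local spherical approximation)."""
    diff = X - c
    return c + r * diff / np.linalg.norm(diff, axis=1, keepdims=True)

# noisy points near a circle of radius 2 centered at (1, -1)
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.stack([1 + 2 * np.cos(theta), -1 + 2 * np.sin(theta)], axis=1)
X += 0.05 * rng.normal(size=X.shape)
c, r = fit_sphere(X)
print(c, r)  # close to (1, -1) and 2
print(np.linalg.norm(X - project_to_sphere(X, c, r), axis=1).mean())  # ~noise level
```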
It is shown that nonlinear transforms built on Swin-transformers can achieve better compression efficiency than transforms built on convolutional neural networks (ConvNets), while requiring fewer parameters and shorter decoding time.
It is shown that quantization errors in norm have a much higher influence on inner products than quantization errors in direction, and that a small quantization error does not necessarily lead to good performance in maximum inner product search (MIPS); norm-explicit quantization (NEQ) is therefore proposed, a general paradigm that improves existing vector quantization (VQ) techniques for MIPS.
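A rough sketch of the norm-explicit idea follows: store each vector's norm and direction separately and quantize them with separate budgets, so the more influential norm error is controlled explicitly. The codebook size and the per-dimension int8 direction quantizer below are stand-ins, not NEQ's actual vector quantizers.

```python
# Sketch: quantize norms and unit directions separately, then approximate inner products.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 64))   # database vectors
q = rng.normal(size=64)             # query

norms = np.linalg.norm(X, axis=1)
dirs = X / norms[:, None]           # unit directions

# quantize norms with a small scalar codebook (256 uniform levels)
levels = np.linspace(norms.min(), norms.max(), 256)
norm_codes = np.abs(norms[:, None] - levels[None, :]).argmin(axis=1)
norms_hat = levels[norm_codes]

# quantize directions per dimension to int8-like levels (stand-in for a vector quantizer)
scale = np.abs(dirs).max()
dirs_hat = np.round(dirs / scale * 127) / 127 * scale

# approximate inner products: <x, q> ~= ||x||_hat * <dir_hat, q>
approx = norms_hat * (dirs_hat @ q)
exact = X @ q
print("top-10 overlap:", len(set(np.argsort(-approx)[:10]) & set(np.argsort(-exact)[:10])))
```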
This work argues that for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all the subsets, and indicates that the deep convolutional network derived from this objective is significantly more efficient to construct and learn in the spectral domain.
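For concreteness, here is a hedged numpy sketch of a coding-rate-difference objective as it is commonly written, ΔR = R(whole set) − Σ_j (n_j/n) R(class j subset); the ε value and toy data are illustrative.

```python
# Sketch of a rate-reduction objective over labeled features Z of shape (d, n).
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z Z^T)."""
    d, n = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)[1]

def rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(whole set) - sum_j (n_j/n) * R(class-j subset)."""
    _, n = Z.shape
    whole = coding_rate(Z, eps)
    per_class = sum((np.sum(labels == c) / n) * coding_rate(Z[:, labels == c], eps)
                    for c in np.unique(labels))
    return whole - per_class

# toy usage: random unit-norm features for 3 classes
rng = np.random.default_rng(0)
Z = rng.normal(size=(16, 300))
Z /= np.linalg.norm(Z, axis=0)
labels = np.repeat([0, 1, 2], 100)
print(rate_reduction(Z, labels))
```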
This paper adopts ideas from knowledge distillation and neural image compression to compress intermediate feature representations more efficiently and shows that the learned feature representations can be tuned to serve multiple downstream tasks.
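As a generic, hedged sketch of this kind of setup (not the paper's actual architecture or loss), the snippet below squeezes an intermediate feature map through a narrow, coarsely quantized bottleneck and trains it to reconstruct the original teacher features, a distillation-style objective.

```python
# Generic sketch (not the paper's model): compress an intermediate feature map through a
# narrow quantized bottleneck and train it to mimic the original features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureBottleneck(nn.Module):
    def __init__(self, channels=256, bottleneck=32):
        super().__init__()
        self.encoder = nn.Conv2d(channels, bottleneck, kernel_size=1)
        self.decoder = nn.Conv2d(bottleneck, channels, kernel_size=1)

    def forward(self, feats):
        code = self.encoder(feats)
        # coarse quantization with a straight-through estimator so gradients still flow
        quantized = torch.round(code * 16) / 16
        code = code + (quantized - code).detach()
        return self.decoder(code)

teacher_feats = torch.randn(4, 256, 14, 14)      # features from a frozen backbone (stand-in)
model = FeatureBottleneck()
recon = model(teacher_feats)
loss = F.mse_loss(recon, teacher_feats)          # distillation-style reconstruction loss
loss.backward()
print(float(loss))
```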
This work makes the first attempt at an algorithm for sandwiching the rate-distortion (R-D) function of a general (not necessarily discrete) source that requires only i.i.d. data samples, indicating theoretical room for improving state-of-the-art image compression methods by at least one dB in PSNR at various bitrates.
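For reference, the rate-distortion function being sandwiched is the standard information-theoretic quantity (generic notation, not the paper's):

```latex
R(D) \;=\; \inf_{P_{\hat{X}\mid X}\,:\;\mathbb{E}\left[d(X,\hat{X})\right]\le D} \; I\!\left(X;\hat{X}\right)
```

An upper bound and a lower bound estimated from samples then sandwich this curve.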
This work proposes a new framework called BottleFit, which, in addition to targeted DNN architecture modifications, includes a novel training strategy to achieve high accuracy even at strong compression rates, and applies BottleFit to cutting-edge DNN models for image classification.
The present article aims to introduce neural compression to a broader machine learning audience by reviewing the necessary background in information theory and computer vision, and providing a curated guide through the essential ideas and methods in the literature thus far.