You can read these blog posts to get an overview of the approaches:
A Visual Survey of Data Augmentation in NLP
These leaderboards are used to track progress in Text Augmentation
No benchmarks available.
Use these libraries to find Text Augmentation models and implementations
No datasets available.
No subtasks available.
EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion; experiments show that EDA improves performance for both convolutional and recurrent neural networks.
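The four operations are simple enough to sketch directly; below is a minimal, illustrative Python version. It assumes a user-supplied synonym map (the reference EDA implementation draws synonyms from WordNet), and all function names are hypothetical.

```python
import random

def synonym_replacement(words, synonyms, n=1):
    # Replace up to n words that have an entry in the synonym map.
    out = words[:]
    candidates = [i for i, w in enumerate(out) if w in synonyms]
    for i in random.sample(candidates, min(n, len(candidates))):
        out[i] = random.choice(synonyms[out[i]])
    return out

def random_insertion(words, synonyms, n=1):
    # Insert a synonym of a random word at a random position, n times.
    out = words[:]
    for _ in range(n):
        donors = [w for w in out if w in synonyms]
        if not donors:
            break
        syn = random.choice(synonyms[random.choice(donors)])
        out.insert(random.randrange(len(out) + 1), syn)
    return out

def random_swap(words, n=1):
    # Swap the words at two random positions, n times.
    out = words[:]
    for _ in range(n):
        if len(out) < 2:
            break
        i, j = random.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p=0.1):
    # Delete each word with probability p, but never return an empty sentence.
    out = [w for w in words if random.random() > p]
    return out or [random.choice(words)]
```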
It is shown that crop and rotate provide improvements over models trained on non-augmented data for the majority of languages, especially those with rich case-marking systems.
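For intuition, here is a toy sketch of the two operations, assuming they act on dependency parses represented here as (token, head) lists; the cited work's actual tree-morphing procedure is more involved and needs a real dependency parser.

```python
import random

# A parse is a list of (token, head) pairs, 1-indexed; head 0 marks the root.
def children(parse, head):
    return [i for i, (_, h) in enumerate(parse, start=1) if h == head]

def subtree(parse, idx):
    # Collect idx plus all of its descendants.
    keep, frontier = {idx}, [idx]
    while frontier:
        for c in children(parse, frontier.pop()):
            keep.add(c)
            frontier.append(c)
    return keep

def crop(parse):
    # Keep the root plus one randomly chosen argument subtree.
    root = children(parse, 0)[0]
    args = children(parse, root)
    if not args:
        return [parse[root - 1][0]]
    keep = subtree(parse, random.choice(args)) | {root}
    return [tok for i, (tok, _) in enumerate(parse, start=1) if i in keep]

def rotate(parse):
    # Shuffle the order of the root's argument subtrees around the root.
    root = children(parse, 0)[0]
    blocks = [sorted(subtree(parse, c)) for c in children(parse, root)]
    random.shuffle(blocks)
    order = [i for b in blocks for i in b]
    order.insert(random.randrange(len(order) + 1), root)
    return [parse[i - 1][0] for i in order]

# Example: "she saw the dog" with "saw" as root.
parse = [("she", 2), ("saw", 0), ("the", 4), ("dog", 2)]
print(crop(parse))    # e.g. ['saw', 'the', 'dog']
print(rotate(parse))  # e.g. ['the', 'dog', 'saw', 'she']
```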
This work retrofits a language model with a label-conditional architecture, which allows the model to augment sentences without breaking label compatibility and improves classifiers based on convolutional or recurrent neural networks.
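A rough sketch of the idea, assuming the simplest label-conditioning scheme: prepend a label token to each training sentence, fine-tune a generative language model on the result, and sample per label. The control tokens, model choice, and helper names below are illustrative, not the paper's actual architecture.

```python
from transformers import pipeline

# Hypothetical control tokens; in practice they are added to the tokenizer
# and the model is fine-tuned on "<label> text" lines before sampling.
LABELS = ["<positive>", "<negative>"]

def format_for_finetuning(texts, labels):
    # One training line per example: the label token followed by the text.
    return [f"{label} {text}" for text, label in zip(texts, labels)]

def augment(generator, label, n=3):
    # Prompting with the label token keeps samples label-compatible.
    outputs = generator(label, do_sample=True, num_return_sequences=n,
                        max_new_tokens=30)
    return [o["generated_text"].removeprefix(label).strip() for o in outputs]

generator = pipeline("text-generation", model="gpt2")  # fine-tune first in practice
print(augment(generator, LABELS[0]))
```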
A Pairwise Augmentation (PairAug) approach that contains an Inter-patient Augmentation (InterAug) branch and an Intra-patient Augmentation (IntraAug) branch, which generate radiology images using synthesised yet plausible reports derived from a Large Language Model (LLM).
The proposed method can make use of arbitrary, non-deterministic transformation functions, is robust to misspecified user input, and is trained on unlabeled data; it can be used to perform data augmentation for any end discriminative model.
A sequence-to-sequence generation-based data augmentation framework that leverages an utterance's semantically equivalent alternatives in the training data to produce diverse utterances that help improve the language understanding module.
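One way to set this up, sketched under the assumption that each utterance carries a semantic annotation: group utterances by identical semantics and treat every ordered pair within a group as a (source, target) rewriting example for seq2seq fine-tuning. Field and function names are illustrative.

```python
from collections import defaultdict
from itertools import permutations

def build_pairs(examples):
    # examples: dicts like {"text": "...", "semantics": "..."}.
    # Utterances sharing the same semantics become rewriting pairs.
    by_sem = defaultdict(list)
    for ex in examples:
        by_sem[ex["semantics"]].append(ex["text"])
    pairs = []
    for utterances in by_sem.values():
        pairs.extend(permutations(utterances, 2))  # every ordered pair
    return pairs  # (source, target) pairs for seq2seq fine-tuning
```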
This engineering work focuses on the use of practical, robust, scalable and easy-to-implement data augmentation pre-processing techniques similar to those that are successful in computer vision.
This work compares augmentation based on global error statistics with one based on per-word unigram statistics of ASR errors and concludes that it is better to only pay attention to the global substitution, deletion, and insertion rates.
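A minimal sketch of the global-statistics variant: corrupt each token independently using overall substitution, deletion, and insertion rates. The rates and the vocabulary source here are placeholders, not the paper's measured values.

```python
import random

def corrupt(tokens, vocab, p_sub=0.05, p_del=0.03, p_ins=0.02):
    # Apply global ASR-style errors token by token.
    out = []
    for tok in tokens:
        r = random.random()
        if r < p_del:
            continue                          # deletion
        if r < p_del + p_sub:
            out.append(random.choice(vocab))  # substitution
        else:
            out.append(tok)                   # kept unchanged
        if random.random() < p_ins:
            out.append(random.choice(vocab))  # insertion after this token
    return out
```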
The effect of different approaches to text augmentation is studied to provide insights for practitioners and researchers on making augmentation choices for classification use cases; the use of mixup further improves the performance of all text-based augmentations and reduces the effects of overfitting on the tested deep learning model.
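For reference, a short sketch of mixup as commonly applied to text: interpolate fixed-size sentence embeddings and one-hot labels with a Beta-distributed coefficient. The tested model's exact setup may differ; shapes and names below are illustrative.

```python
import torch

def mixup(embeddings, one_hot_labels, alpha=0.2):
    # embeddings: (batch, dim); one_hot_labels: (batch, num_classes).
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(embeddings.size(0))
    # Convex combination of each example with a randomly paired one.
    mixed_x = lam * embeddings + (1 - lam) * embeddings[perm]
    mixed_y = lam * one_hot_labels + (1 - lam) * one_hot_labels[perm]
    return mixed_x, mixed_y
```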