A new task for testing the long-sequence modeling capabilities and efficiency of language models.
(Image credit: SCROLLS: Standardized CompaRison Over Long Language Sequences)
These leaderboards are used to track progress in Long-Range Modeling.
Use these libraries to find Long-Range Modeling models and implementations.
The Structured State Space sequence model (S4) is proposed based on a new parameterization for the SSM, and it is shown that it can be computed much more efficiently than prior approaches while preserving their theoretical strengths.
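To make the summary concrete, here is a minimal NumPy sketch of the generic discrete SSM that S4 parameterizes, computed both as a sequential recurrence and as a causal convolution. The toy matrices and function names are illustrative assumptions, not S4's actual parameterization, whose contribution is computing this kernel far more efficiently for structured state matrices.

```python
import numpy as np

# Illustrative discrete SSM (not the actual S4 parameterization):
#   x_k = A x_{k-1} + B u_k,   y_k = C x_k
def ssm_recurrence(A, B, C, u):
    """Run the SSM as a sequential recurrence over an input sequence u."""
    x, ys = np.zeros(A.shape[0]), []
    for u_k in u:
        x = A @ x + B * u_k          # state update
        ys.append(C @ x)             # readout
    return np.array(ys)

def ssm_convolution_kernel(A, B, C, L):
    """Unroll the same SSM into a length-L kernel K = (CB, CAB, CA^2B, ...);
    computing K efficiently for structured A is what S4 contributes."""
    K, Ak_B = [], B.copy()
    for _ in range(L):
        K.append(C @ Ak_B)
        Ak_B = A @ Ak_B
    return np.array(K)

rng = np.random.default_rng(0)
N, L = 4, 16
A = rng.normal(size=(N, N)) * 0.3    # toy, roughly stable state matrix
B, C = rng.normal(size=N), rng.normal(size=N)
u = rng.normal(size=L)
K = ssm_convolution_kernel(A, B, C, L)
y_conv = np.convolve(u, K)[:L]       # causal convolution with kernel K
assert np.allclose(ssm_recurrence(A, B, C, u), y_conv)
```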
A simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D are presented, and a powerful feature extractor named MS-G3D is developed; built on it, the model outperforms previous state-of-the-art methods on three large-scale datasets.
A systematic and unified benchmark, LRA, specifically focused on evaluating model quality under long-context scenarios, is proposed, paving the way toward a better understanding of this class of efficient Transformer models.
This paper introduces Mega, a simple, theoretically grounded, single-head gated attention mechanism equipped with an (exponential) moving average to incorporate the inductive bias of position-aware local dependencies into the position-agnostic attention mechanism.
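The sketch below shows the damped exponential moving average at the core of Mega's inductive bias. This is a single-channel toy with made-up parameter values; Mega learns multi-dimensional, per-channel alpha and delta and feeds the smoothed sequence into its gated attention.

```python
import numpy as np

# Toy damped EMA of the form used in Mega (single channel, illustrative):
#   h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1}
# where alpha is the EMA weight and delta a damping factor (learned in Mega).
def damped_ema(x, alpha=0.5, delta=1.0):
    h, out = 0.0, []
    for x_t in x:
        h = alpha * x_t + (1.0 - alpha * delta) * h
        out.append(h)
    return np.array(out)

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 6, 50)) + 0.1 * rng.normal(size=50)
smoothed = damped_ema(x, alpha=0.3)  # position-aware local summary of the input
```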
A new SSM layer, H3, is proposed that is explicitly designed for language modeling; it achieves promising initial results, with lower perplexity than Transformers, and outperforms Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.
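A rough structural sketch of how such a layer can be wired, under the assumption that it computes Q * SSM_diag(SSM_shift(K) * V), i.e. a shift SSM over keys, elementwise combination with values, a diagonal SSM, and a query gate. The toy SSMs below are scalar single-pole stand-ins, not the paper's parameterization.

```python
import numpy as np

def shift_ssm(x):
    """Toy shift SSM: delays the sequence by one step."""
    return np.concatenate([[0.0], x[:-1]])

def diag_ssm(x, lam=0.9):
    """Toy diagonal SSM: a single-pole recurrence h_t = lam*h_{t-1} + x_t."""
    h, out = 0.0, []
    for x_t in x:
        h = lam * h + x_t
        out.append(h)
    return np.array(out)

def h3_layer(q, k, v):
    # Shift-then-gate structure lets the layer recall and compare nearby
    # tokens the way attention does, without computing attention scores.
    return q * diag_ssm(shift_ssm(k) * v)

rng = np.random.default_rng(0)
L = 12
q, k, v = (rng.normal(size=L) for _ in range(3))
y = h3_layer(q, k, v)   # one output per position, computed recurrently
```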
This work introduces SCROLLS, a suite of tasks that require reasoning over long texts; it examines existing long-text datasets and handpicks ones where the text is naturally long, prioritizing tasks that involve synthesizing information across the input.
This work shows that one can match the performance of S4 even without the low-rank correction, and thus with diagonal state matrices, and proposes a new diagonal state space model (DSS) that is conceptually simpler and straightforward to implement.
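To illustrate why diagonal state matrices simplify things, here is a small NumPy sketch (illustrative, not the exact DSS parameterization): with A = diag(lambda), the kernel entries K_k = C A^k B = sum_n C_n * lambda_n^k * B_n are just elementwise powers, so the whole length-L kernel is a single Vandermonde matrix-vector product.

```python
import numpy as np

def diagonal_ssm_kernel(lam, B, C, L):
    """Kernel of an SSM with diagonal state matrix A = diag(lam):
    a (L, N) Vandermonde matrix times the elementwise product B * C."""
    powers = lam[None, :] ** np.arange(L)[:, None]   # powers[k, n] = lam_n**k
    return powers @ (B * C)                          # K_k for k = 0..L-1

rng = np.random.default_rng(0)
N, L = 8, 32
lam = rng.uniform(0.5, 0.99, size=N)                 # toy stable diagonal entries
B, C = rng.normal(size=N), rng.normal(size=N)
K = diagonal_ssm_kernel(lam, B, C, L)

# Sanity check against the dense computation with A = diag(lam):
A = np.diag(lam)
K_dense = np.array([C @ np.linalg.matrix_power(A, k) @ B for k in range(L)])
assert np.allclose(K, K_dense)
```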
This work systematically describes various design choices in parameterizing and computing diagonal SSMs, and performs a controlled empirical study ablating the effects of these choices.
A simple yet effective Spatial Calibration Module (SCM) is introduced for accurate WSOL, incorporating the semantic similarities of patch tokens and their spatial relationships into a unified diffusion model, with a learnable parameter to dynamically adjust the semantic correlations and spatial context intensities for effective information propagation.
A state space layer, S5, is proposed that leverages efficient and widely implemented parallel scans, allowing S5 to match the computational efficiency of S4 while achieving state-of-the-art performance on several long-range sequence modeling tasks.
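A brief sketch of why such a linear recurrence admits a parallel scan: the update x_k = a_k * x_{k-1} + b_k is an affine map, and affine maps compose associatively, so the pairs (a, b) can be combined in any bracketing by a tree-structured parallel scan. The reference scan below is sequential for clarity, and all names are illustrative.

```python
import numpy as np

def combine(e1, e2):
    """Compose two affine updates x -> a*x + b; apply e1 first, then e2.
    This operator is associative, which is what enables a parallel scan."""
    a1, b1 = e1
    a2, b2 = e2
    return (a2 * a1, a2 * b1 + b2)

def scan(elems):
    """Sequential reference scan; a parallel implementation (as in S5)
    combines elements tree-style using the same associative operator."""
    out, acc = [], (1.0, 0.0)        # identity element of the composition
    for e in elems:
        acc = combine(acc, e)
        out.append(acc[1])           # accumulated state x_k (with x_0 = 0)
    return np.array(out)

rng = np.random.default_rng(0)
a, b = rng.uniform(0.5, 1.0, 16), rng.normal(size=16)
x_scan = scan(list(zip(a, b)))

# Check against the plain sequential recurrence:
x, x_loop = 0.0, []
for a_k, b_k in zip(a, b):
    x = a_k * x + b_k
    x_loop.append(x)
assert np.allclose(x_scan, np.array(x_loop))
```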