These leaderboards are used to track progress in Playing the Game of 2048.
Use these libraries to find Playing the Game of 2048 models and implementations.
This work shows that extrapolation can be enabled simply by changing the position representation method; finding that current methods do not allow efficient extrapolation, it introduces a simpler and more efficient position representation method, Attention with Linear Biases (ALiBi).
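The mechanism behind ALiBi is simple enough to show in a few lines: instead of positional embeddings, a head-specific linear penalty on the query-key distance is added directly to the attention scores. Below is a minimal single-head NumPy sketch; the function name, shapes, and slope value are illustrative, not taken from the paper.

    # Minimal sketch of ALiBi-style attention biasing (single head, illustrative).
    import numpy as np

    def alibi_attention(q, k, v, slope=0.5):
        seq_len, d = q.shape
        scores = q @ k.T / np.sqrt(d)                                  # (seq_len, seq_len)
        # Linear bias: penalize attention in proportion to key-query distance.
        distance = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
        scores = scores - slope * np.abs(distance)
        # Causal mask so each position attends only to itself and earlier positions.
        causal = np.tril(np.ones((seq_len, seq_len), dtype=bool))
        scores = np.where(causal, scores, -np.inf)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return weights @ v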
GShard enabled scaling a multilingual neural machine translation Transformer model with Sparsely-Gated Mixture-of-Experts beyond 600 billion parameters using automatic sharding; it is demonstrated that such a giant model can be trained efficiently on 2048 TPU v3 accelerators in 4 days, achieving far superior quality for translation from 100 languages to English compared to the prior art.
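As a rough illustration of the sparsely-gated Mixture-of-Experts idea, here is a toy top-2 routing layer in NumPy. It runs on one device and omits expert capacity limits and the automatic sharding that GShard actually provides; all names and shapes are assumptions made for the sketch.

    # Toy sparsely-gated top-2 Mixture-of-Experts routing (no sharding, no capacity limits).
    import numpy as np

    def moe_layer(x, expert_weights, gate_weights, top_k=2):
        # x: (tokens, d_model); expert_weights: (n_experts, d_model, d_model)
        logits = x @ gate_weights                              # (tokens, n_experts)
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs = probs / probs.sum(axis=-1, keepdims=True)
        out = np.zeros_like(x)
        for t, token in enumerate(x):
            top = np.argsort(probs[t])[-top_k:]                # indices of the top-k experts
            gate = probs[t, top] / probs[t, top].sum()         # renormalized gate weights
            for g, e in zip(gate, top):
                out[t] += g * (token @ expert_weights[e])      # dispatch token to chosen experts
        return out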
The best known 2048-playing program to date is developed, confirming the effectiveness of the introduced methods for discrete-state Markov decision problems.
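For context on the underlying decision problem, the sketch below implements the deterministic part of a 2048 move (the row-merge rule that produces the afterstate before a random tile is spawned). It is an illustrative helper, not code from the paper.

    # Deterministic 2048 row merge: the "afterstate" before the random tile spawn.
    def merge_left(row):
        tiles = [t for t in row if t != 0]          # slide non-empty tiles to the left
        merged, score, i = [], 0, 0
        while i < len(tiles):
            if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
                merged.append(tiles[i] * 2)         # equal neighbours merge exactly once
                score += tiles[i] * 2
                i += 2
            else:
                merged.append(tiles[i])
                i += 1
        return merged + [0] * (len(row) - len(merged)), score

    # merge_left([2, 2, 4, 0]) -> ([4, 4, 0, 0], 4)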
LSTR provides an effective and efficient method for modeling long videos with fewer heuristics; this is validated by extensive empirical analysis, and LSTR achieves state-of-the-art performance on three standard online action detection benchmarks.
A novel spatial-separated curve rendering network enables efficient, high-resolution image harmonization for the first time; it reduces parameters by more than 90% compared with previous methods while still achieving state-of-the-art performance on both the synthesized iHarmony4 and real-world DIH test sets.
A new algorithm, Stochastic MuZero, is introduced; it learns a stochastic model incorporating afterstates and uses this model to perform a stochastic tree search, while maintaining the superhuman performance of standard MuZero in the game of Go.
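A schematic sketch of the afterstate decomposition that such stochastic planning relies on: an action maps deterministically to an afterstate, and a chance node then averages over the environment's random outcomes (for example, where a new tile appears in 2048). All function names here are illustrative, not the paper's API.

    # Schematic afterstate expansion for a stochastic planning step.
    def expand_chance_node(state, action, transition, chance_outcomes, value_fn):
        afterstate = transition(state, action)          # deterministic part of the move
        # Expected value over the chance node's outcomes (e.g. possible tile spawns),
        # each weighted by its probability.
        expected = sum(p * value_fn(s) for s, p in chance_outcomes(afterstate))
        return afterstate, expected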
This paper uses large batch sizes, powered by the Layer-wise Adaptive Rate Scaling (LARS) algorithm, for efficient usage of massive computing resources, and empirically evaluates the approach on two neural networks, AlexNet and ResNet-50, trained on the ImageNet-1k dataset while preserving state-of-the-art test accuracy.
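The core of LARS is a layer-wise trust ratio that rescales the global learning rate by the ratio of the weight norm to the gradient norm, which is what keeps very large batches stable. A minimal single-layer NumPy sketch follows; momentum is omitted and the constants are illustrative.

    # Minimal LARS update for one layer (momentum omitted for brevity).
    import numpy as np

    def lars_step(w, grad, lr=0.1, trust_coef=0.001, weight_decay=5e-4):
        g = grad + weight_decay * w
        # Layer-wise trust ratio: scale the global LR by ||w|| / ||g||.
        trust_ratio = trust_coef * np.linalg.norm(w) / (np.linalg.norm(g) + 1e-12)
        return w - lr * trust_ratio * g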
This study shows that using the Adam optimization algorithm with a batch size of up to 2048 is a viable choice for carrying out large-scale machine learning computations.
Based on a genetic algorithm, a polar code tailored to belief propagation decoding is constructed that approaches the SCL error-rate performance without any modifications to the decoding algorithm itself.
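As a hedged illustration of the search procedure, the sketch below runs a generic genetic algorithm over candidate frozen-bit patterns of a polar code, using an assumed helper estimate_bler (the simulated block error rate under a fixed BP decoder) as the fitness function. It shows the general shape of such a search, not the paper's exact algorithm.

    # Generic GA over polar-code frozen-bit patterns; estimate_bler is an assumed helper.
    import random

    def evolve_frozen_sets(n, k, estimate_bler, pop_size=20, generations=50):
        # Each individual is a set of n-k frozen bit positions out of n.
        population = [set(random.sample(range(n), n - k)) for _ in range(pop_size)]
        for _ in range(generations):
            population.sort(key=estimate_bler)              # lower simulated BLER is fitter
            parents = population[: pop_size // 2]
            children = []
            while len(parents) + len(children) < pop_size:
                a, b = random.sample(parents, 2)
                # Crossover: draw a new frozen set from the union of two parents.
                children.append(set(random.sample(sorted(a | b), n - k)))
            population = parents + children
        return min(population, key=estimate_bler)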
An atrous convolutional encoder-decoder is trained to denoise electron micrographs and outperforms existing methods' best mean squared error and structural similarity index performance.
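A toy PyTorch sketch of an atrous (dilated) convolutional encoder-decoder denoiser is given below; dilation widens the receptive field without pooling. The channel counts and depth are illustrative assumptions, not the architecture from the paper.

    # Toy dilated-convolution encoder-decoder for single-channel image denoising.
    import torch.nn as nn

    class DilatedDenoiser(nn.Module):
        def __init__(self):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=2, dilation=2), nn.ReLU(),   # atrous conv
                nn.Conv2d(32, 32, 3, padding=4, dilation=4), nn.ReLU(),   # wider receptive field
            )
            self.decoder = nn.Sequential(
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 3, padding=1),               # reconstruct the clean image
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))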