3260 papers • 126 benchmarks • 313 datasets
These leaderboards are used to track progress in arithmetic reasoning.
This work introduces Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency, which leverages grouped-query attention (GQA) for faster inference, coupled with sliding window attention (SWA) to effectively handle sequences of arbitrary length with a reduced inference cost.
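As a rough illustration (not Mistral's implementation), the difference between full causal attention and sliding window attention comes down to the attention mask: under SWA, each token attends only to the most recent `window` positions, so per-token cost stays bounded as the sequence grows. The function names below are illustrative.

```python
import numpy as np

def causal_mask(n):
    # Standard causal mask: token i attends to every token j <= i.
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n, window):
    # Sliding window attention: token i attends only to the last
    # `window` positions (i - window < j <= i), capping per-token cost.
    i = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    return (j <= i) & (j > i - window)

# With n=6 and window=3, the last token attends to positions 3, 4, 5 only,
# whereas under the full causal mask it attends to all six positions.
full = causal_mask(6)
swa = sliding_window_mask(6, window=3)
```

Information from positions outside the window still propagates across layers, since each layer shifts the receptive field back by another `window` tokens.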
This work develops and releases Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters, which may be a suitable substitute for closed-source models.
Experimental results demonstrate that Zero-shot-CoT, using the same single prompt template, significantly outperforms standard zero-shot LLM performance on diverse benchmark reasoning tasks, including arithmetic, symbolic reasoning, and other logical reasoning tasks, without any hand-crafted few-shot examples.
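A minimal sketch of the two-stage Zero-shot-CoT prompting scheme, assuming the trigger and extraction phrases from the paper; the model call itself is omitted, so the helper below only builds the prompt strings:

```python
def zero_shot_cot_prompts(question):
    # Stage 1: append the trigger phrase to elicit step-by-step reasoning.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."

    # Stage 2: feed the model's reasoning back in, followed by an
    # answer-extraction phrase, to read off the final numeric answer.
    def answer_prompt(reasoning):
        return (reasoning_prompt + " " + reasoning +
                "\nTherefore, the answer (arabic numerals) is")

    return reasoning_prompt, answer_prompt
```

In use, an LLM would be called once on `reasoning_prompt` and once on `answer_prompt(reasoning)`; no task-specific few-shot examples are needed.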
This paper presents Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter.
This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting with comparisons and summaries and provides systematic resources to help beginners.
Batch prompting, a simple yet effective prompting approach that enables the LLM to run inference in batches, instead of one sample at a time, is proposed, which reduces both token and time costs while retaining downstream performance.
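A minimal sketch of batch prompting, with an assumed `Q[i]`/`A[i]` indexing convention (the paper's exact template may differ): several questions are packed into one prompt, and the single completion is parsed back into per-question answers.

```python
import re

def build_batch_prompt(questions):
    # Pack K questions into one prompt, indexed so the answers can be
    # split back out; one model call replaces K separate calls.
    lines = [f"Q[{i + 1}]: {q}" for i, q in enumerate(questions)]
    lines.append("Answer each question in order as A[1], A[2], ...")
    return "\n".join(lines)

def split_batch_answers(completion, k):
    # Parse "A[i]: ..." lines of the completion back into a list,
    # leaving an empty string for any index the model skipped.
    answers = dict(re.findall(r"A\[(\d+)\]:\s*(.+)", completion))
    return [answers.get(str(i + 1), "") for i in range(k)]
```

Shared instructions and exemplars are amortized across the batch, which is where the token savings come from.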
LLM-Adapters is presented, an easy-to-use framework that integrates various adapters into LLMs and can execute adapter-based PEFT methods for different tasks, demonstrating that adapter-based PEFT in smaller-scale LLMs with few extra trainable parameters yields comparable, and in some cases superior, performance to powerful LLMs in zero-shot inference on reasoning tasks.
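As a rough illustration of the adapter family of PEFT methods (a generic bottleneck adapter, not LLM-Adapters' specific modules), the sketch below shows why so few parameters are trained: only a small down-projection/up-projection pair is learned, inserted around a residual connection, while the base model's weights stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_adapter(d_model, bottleneck):
    # Bottleneck adapter: down-project, nonlinearity, up-project, residual.
    # Only these two small matrices are trained; the frozen LLM weights
    # are untouched. Zero-initializing `up` makes the adapter start as
    # an identity map, so training begins from the base model's behavior.
    return {
        "down": rng.normal(0.0, 0.02, (d_model, bottleneck)),
        "up": np.zeros((bottleneck, d_model)),
    }

def adapter_forward(h, adapter):
    # h: (batch, d_model) hidden states from a frozen transformer layer.
    z = np.maximum(h @ adapter["down"], 0.0)  # ReLU bottleneck
    return h + z @ adapter["up"]              # residual connection
```

For `d_model=4096` and `bottleneck=64`, the adapter adds roughly 0.5M parameters per layer, orders of magnitude fewer than full fine-tuning.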
This work highlights the potential of seamlessly unifying explicit rule learning via CoNNs and implicit pattern learning in LMs, paving the way for true symbolic comprehension capabilities.
SciGen is the first dataset that assesses the arithmetic reasoning capabilities of generation models on complex input structures, i.e., tables from scientific articles and their corresponding descriptions. One of the main bottlenecks for this task is the lack of proper automatic evaluation metrics.
This work constructs a new large-scale benchmark, Geometry3K, consisting of 3,002 geometry problems with dense annotation in formal language, and proposes a novel geometry solving approach with formal language and symbolic reasoning, called Interpretable Geometry Problem Solver (Inter-GPS).