Automated Theorem Proving

3260 papers • 126 benchmarks • 313 datasets

The goal of Automated Theorem Proving is to automatically generate a proof, given a conjecture (the target theorem) and a knowledge base of known facts, all expressed in a formal language. Automated Theorem Proving is useful in a wide range of applications, including the verification and synthesis of software and hardware systems. Source: Learning to Prove Theorems by Learning to Generate Theorems
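As a small illustration of this setup, the Lean snippet below states a conjecture whose hypotheses play the role of the knowledge base, followed by a proof term. The theorem name `chain` and the propositions `p`, `q`, `r` are made up for this sketch; an automated prover would be expected to find the final line on its own.

```lean
-- Knowledge base: the known facts hpq, hqr, and hp, given as hypotheses.
-- Conjecture: r holds.
theorem chain (p q r : Prop) (hpq : p → q) (hqr : q → r) (hp : p) : r :=
  -- Proof: apply hpq to hp to get q, then apply hqr to get r.
  hqr (hpq hp)
```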

Benchmarks

These leaderboards are used to track progress in Automated Theorem Proving.

miniF2F-test

miniF2F-valid

HolStep (Conditional)

Libraries

Use these libraries to find Automated Theorem Proving models and implementations.

eleutherai/gpt-neox

2 papers 6,434

Datasets

MiniF2F

Geometry3K

MED

Kinship

HOList

HolStep

Subtasks

No subtasks available.

Most implemented papers

Holophrasm: a neural Automated Theorem Prover for higher-order logic

Daniel Whalen • Sun Aug 07 2016

Holophrasm exploits the formalism of the Metamath language and explores partial proof trees using a neural-network-augmented bandit algorithm and a sequence-to-sequence model for action enumeration.

55
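The bandit-style exploration of a partial proof tree mentioned above can be sketched roughly as follows. This is a generic UCB-style rule with a prior weight, not Holophrasm's actual formula; the function names, the dictionary fields, and the constant `c` are all assumptions made for illustration, and the `prior` field stands in for a score that a neural network might assign to each candidate proof step.

```python
import math


def ucb_score(mean_value, visits, parent_visits, prior, c=1.0):
    """Mean value so far plus a prior-weighted exploration bonus.

    Hypothetical scoring rule: the bonus shrinks as a child is visited more
    often and grows with the prior assigned to that child.
    """
    bonus = c * prior * math.sqrt(math.log(parent_visits + 1) / (visits + 1))
    return mean_value + bonus


def select_child(children, c=1.0):
    """Pick the child of a proof-tree node with the highest UCB-style score."""
    parent_visits = sum(child["visits"] for child in children)

    def score(child):
        # Unvisited children have no observed value yet, so their mean is 0.
        mean = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        return ucb_score(mean, child["visits"], parent_visits, child["prior"], c)

    return max(children, key=score)


# Two candidate proof steps: one well explored, one unvisited with a high prior.
candidates = [
    {"name": "step_a", "visits": 10, "value_sum": 5.0, "prior": 0.1},
    {"name": "step_b", "visits": 0, "value_sum": 0.0, "prior": 0.9},
]
chosen = select_child(candidates)
```

With these toy numbers the unvisited, high-prior step is selected; setting `c=0` removes the exploration bonus and the rule falls back to the best observed mean.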

Benchmarks (continued)

HOList benchmark

HolStep (Unconditional)

Metamath set.mm

miniF2F-curriculum

CompCert

CoqGym

LeanDojo Benchmark

ProofNet

GamePad Environment

Proof Artifact Co-training for Theorem Proving with Language Models

Yuhuai Wu, Jesse Michael Han, Stanislas Polu, Edward W. Ayers, Jason M. Rute • Wed Feb 10 2021

This paper proposes PACT, a general methodology for extracting abundant self-supervised data from kernel-level proof terms for co-training alongside the usual tactic prediction objective, and applies it to Lean, an interactive proof assistant that hosts some of the most sophisticated formalized mathematics to date.

143 0

Llemma: An Open Language Model For Mathematics

Stella Biderman, S. Welleck, Hailey Schoelkopf, Zhangir Azerbayev, Keiran Paster, Marco Dos Santos, S. McAleer, Albert Q. Jiang, Jia Deng • Sun Oct 15 2023

Llemma is a large language model for mathematics that outperforms all known open base models, as well as the unreleased Minerva model suite on an equi-parameter basis, and is capable of tool use and formal theorem proving without any further finetuning.

400 0

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

Kunhao Zheng, Jesse Michael Han, Stanislas Polu • Mon Aug 30 2021

The miniF2F benchmark currently targets Metamath, Lean, Isabelle, and HOL Light and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad, as well as material from high-school and undergraduate mathematics courses.

290 0

Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs

Yuhuai Wu, Guillaume Lample, Jiacheng Liu, S. Welleck, Timothée Lacroix, Albert Qiaochu Jiang, J. Zhou, Wenda Li, M. Jamnik • Thu Oct 20 2022

This paper introduces Draft, Sketch, and Prove (DSP), a method that maps informal proofs to formal proof sketches and uses the sketches to guide an automated prover by directing its search to easier sub-problems.

251 0

LeanDojo: Theorem Proving with Retrieval-Augmented Language Models

Alex Gu, Anima Anandkumar, R. Prenger, Aidan M. Swope, Kaiyu Yang, Rahul Chalamala, Peiyang Song, Shixing Yu, Saad Godil • Mon Jun 26 2023

This paper introduces LeanDojo: an open-source Lean playground consisting of toolkits, data, models, and benchmarks, and develops ReProver (Retrieval-Augmented Prover): an LLM-based prover augmented with retrieval for selecting premises from a vast math library.

359 0
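The idea of retrieving premises for a goal can be sketched in a few lines. This is only a toy illustration under stated assumptions: a bag-of-words vector stands in for whatever learned encoder a real retrieval-augmented prover would use, the three-entry `toy_library` is invented for the example, and none of the function names come from LeanDojo or ReProver.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: whitespace-token counts (a stand-in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_premises(goal, library, k=2):
    """Return the names of the k premises whose statements best match the goal."""
    goal_vec = embed(goal)
    ranked = sorted(
        library,
        key=lambda name: cosine(goal_vec, embed(library[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical miniature "math library": premise name -> statement text.
toy_library = {
    "add_comm": "a + b = b + a",
    "mul_comm": "a * b = b * a",
    "le_refl": "a ≤ a",
}
top_premises = retrieve_premises("x + y = y + x", toy_library, k=2)
```

In a full system the retrieved premises would be passed to the prover alongside the goal; here the ranking step is shown in isolation.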

DeepMath - Deep Sequence Models for Premise Selection

François Chollet, Christian Szegedy, G. Irving, Alexander A. Alemi, J. Urban, N. Eén • Mon Jun 13 2016

A two-stage approach is proposed that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models.

250 0

Learning to Prove Theorems by Learning to Generate Theorems

Jia Deng, Mingzhe Wang • Sun Feb 16 2020

This work proposes to learn a neural generator that automatically synthesizes theorems and proofs for the purpose of training a theorem prover, and demonstrates that synthetic data from this approach improves the theorem provers and advances the state of the art of automated theorem proving in Metamath.

56 0

Measuring Systematic Generalization in Neural Proof Generation with Transformers

Koustuv Sinha, C. Pal, Siva Reddy, Nicolas Gontier • Tue Sep 29 2020

It is observed that models that are not trained to generate proofs are better at generalizing to problems based on longer proofs, which suggests that Transformers have efficient internal reasoning strategies that are harder to interpret.

67 0
