These leaderboards track progress on logical fallacy detection and classification.
This paper presents an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher.
This paper introduces Natural Language to First-Order Logic (NL2FOL), a framework to autoformalize natural language to FOL step by step using Large Language Models (LLMs), and uses Satisfiability Modulo Theory solvers to reason about the logical validity of natural language statements.
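As a loose illustration of the validity-checking step that NL2FOL delegates to an SMT solver, the sketch below checks a formula against every truth assignment. This is a deliberate simplification (propositional rather than first-order logic, exhaustive enumeration rather than an SMT solver), and all function names are hypothetical:

```python
from itertools import product

def implies(a, b):
    # Material implication: a -> b
    return (not a) or b

def is_valid(formula, n_vars):
    """A formula is valid iff it holds under every truth assignment."""
    return all(formula(*vals) for vals in product([False, True], repeat=n_vars))

# Modus ponens: ((p -> q) and p) -> q  -- a valid inference
modus_ponens = lambda p, q: implies(implies(p, q) and p, q)

# Affirming the consequent: ((p -> q) and q) -> p  -- a classic fallacy
affirming_consequent = lambda p, q: implies(implies(p, q) and q, p)

print(is_valid(modus_ponens, 2))          # True
print(is_valid(affirming_consequent, 2))  # False
```

An SMT solver performs the analogous check symbolically (by asking whether the formula's negation is satisfiable), which scales to first-order formulas where enumeration is impossible.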
This paper formalizes prior theoretical work on logical fallacies into a comprehensive three-stage evaluation framework of detection, coarse-grained, and fine-grained classification, and employs three families of robust and explainable methods based on prototype reasoning, instance-based reasoning, and knowledge injection.
This work presents a Case-Based Reasoning method that classifies new instances of logical fallacy by language-model-driven retrieval and adaptation of historical cases, and designs four complementary strategies to enrich the model's input representation with external information about goals, explanations, counterarguments, and argument structure.
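The retrieval step of such a case-based approach can be sketched as nearest-neighbor lookup over a case base. The toy version below uses bag-of-words cosine similarity as a stand-in for the paper's language-model-based retrieval, and the case base, labels, and function names are all hypothetical:

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector -- a crude stand-in for an LM-based encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def classify_by_retrieval(query, case_base):
    """Label a new argument with the label of its most similar historical case."""
    best = max(case_base, key=lambda c: cosine(bow(query), bow(c["text"])))
    return best["label"]

case_base = [
    {"text": "everyone believes it so it must be true", "label": "ad populum"},
    {"text": "he is a bad person so his argument is wrong", "label": "ad hominem"},
]
print(classify_by_retrieval("everyone thinks it is true", case_base))
```

The adaptation stage of full case-based reasoning would then adjust the retrieved case's reasoning to the new input, rather than simply copying its label as done here.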
Findings indicate that both GPT-3.5 and GPT-4 can adjust their opinions through reasoning; however, when presented with logical fallacies, GPT-3.5 and GPT-4 are erroneously convinced 41% and 69% more often, respectively, than when valid logical reasoning is used.
A closer look at the self-verification abilities of LLMs in the context of logical reasoning, focusing on their ability to identify logical fallacies, suggests that existing LLMs struggle to detect fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
This work presents OlympiadBench, an Olympiad-level bilingual multimodal scientific benchmark, featuring 8,476 problems from Olympiad-level mathematics and physics competitions, including the Chinese college entrance exam, and implements a comprehensive assessment methodology to accurately evaluate model responses.