Competitive Programming with Large Reasoning Models
s1: Simple test-time scaling
On the Diagram of Thought
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge
Combining Induction and Transduction for Abstract Reasoning
GSM-Symbolic: Understanding the Limitations of Mathematical...
Schrodinger's Memory: Large Language Models
One Thousand and One Pairs: A "novel" challenge for...
A Careful Examination of Large Language Model Performance on Grade School Arithmetic
Iterative Reasoning Preference Optimization
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
WorldSense: A Synthetic Benchmark for Grounded Reasoning in Large Language Models
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks
Preventing Language Models From Hiding Their Reasoning
Large Language Models can Learn Rules
Large Language Models Cannot Self-Correct Reasoning Yet
The Jiminy Advisor: Moral Agreements among Stakeholders Based on Norms and Argumentation | Journal of Artificial Intelligence Research
Advances in apparent conceptual physics reasoning in GPT-4
Improving Factuality and Reasoning in Language Models through Multiagent Debate
FOLIO: Natural Language Reasoning with First-Order Logic