Discriminator-Guided Multi-step Reasoning with Language Models
Muhammad Khalifa, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Lu Wang
by Muhammad Khalifa and Lajanugen Logeswaran
Abstract
In the context of multi-step reasoning, language model (LM) probabilities are often miscalibrated: solutions assigned high probability are not always correct. As a result, greedy decoding, the standard decoding method for reasoning tasks, often yields incorrect solutions. In addition, methods such as self-consistency and verifiers rely on sampling from the LM distribution and do not tackle the underlying issue. To address this, we introduce Guiding Multi-step ReAsoning with a CorrectnEss Discriminator (GRACE), a stepwise decoding approach that nudges the model towards producing correct reasoning steps. GRACE employs a discriminator model, trained to differentiate correct steps from incorrect ones, to adjust decoding preferences based on the correctness of each reasoning step. Importantly, GRACE requires no fine-tuning or re-training of the LM. Compared with conventional decoding strategies on four popular math reasoning benchmarks, GRACE achieves significant improvements in both final-answer accuracy and step correctness, outperforming both greedy decoding and self-consistency.
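To make the stepwise procedure concrete, here is a minimal Python sketch of discriminator-guided decoding. The callables `lm_sample_steps` and `discriminator_score`, the mixing weight `beta`, and the stopping heuristic are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
from typing import Callable, List, Tuple

def grace_decode(
    question: str,
    lm_sample_steps: Callable[[str, int], List[Tuple[str, float]]],  # (step_text, log_prob)
    discriminator_score: Callable[[str, str], float],  # score of a step given the prefix
    num_candidates: int = 20,
    max_steps: int = 10,
    beta: float = 0.5,  # assumed weight balancing LM and discriminator scores
) -> str:
    """Greedily build a solution one reasoning step at a time."""
    prefix = question
    for _ in range(max_steps):
        # Propose candidate next steps by sampling from the LM.
        candidates = lm_sample_steps(prefix, num_candidates)
        if not candidates:
            break
        # Combine LM log-probability with the discriminator's correctness
        # score; this linear combination is an assumption for illustration.
        best_step, _ = max(
            candidates,
            key=lambda c: (1 - beta) * c[1] + beta * discriminator_score(prefix, c[0]),
        )
        prefix += "\n" + best_step
        if "answer is" in best_step.lower():  # crude stopping heuristic
            break
    return prefix
```

Because the discriminator only re-ranks candidate steps, the underlying LM stays frozen, which is what makes the approach fine-tuning-free.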
High-level Overview
Discriminator Training
1. Sampling: Sample incorrect solutions from the LM.
2. Alignment: Align each sampled solution with the reference solution to create step-level training examples for the discriminator.
3. Learning: Train the discriminator with a max-margin loss (a minimal sketch follows this list).
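The sketch below shows one way to write the max-margin objective in step 3 with PyTorch: the discriminator is pushed to score each correct step above its aligned incorrect step by a margin. The function name, tensor layout, and margin value are illustrative assumptions.

```python
import torch

def max_margin_loss(
    pos_scores: torch.Tensor,  # discriminator scores for correct steps
    neg_scores: torch.Tensor,  # scores for the aligned incorrect steps
    margin: float = 1.0,       # assumed margin hyperparameter
) -> torch.Tensor:
    # Hinge loss: zero once every correct step outscores its paired
    # incorrect step by at least `margin`.
    return torch.clamp(margin - (pos_scores - neg_scores), min=0.0).mean()
```

Pairing each correct step with an incorrect one sampled from the same prefix keeps the loss focused on exactly the distinction the decoder needs at inference time.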
Results Summary
We evaluate GRACE on four math reasoning tasks (GSM8K, SVAMP, MultiArith, and MathQA-Gain) and two symbolic reasoning tasks (Coin Flip and Tracking Shuffled Objects). Using GRACE for multi-step solution decoding outperforms both greedy decoding and vanilla self-consistency with temperature sampling.
[Figure: performance on the math reasoning tasks]
[Figure: performance on the symbolic reasoning tasks]
Citation
If you find our paper useful, please cite us:
@article{khalifa2023discriminator,
  title={Discriminator-Guided Multi-step Reasoning with Language Models},
  author={Khalifa, Muhammad and Logeswaran, Lajanugen and Lee, Moontae and Lee, Honglak and Wang, Lu},
  journal={arXiv preprint arXiv:2305.14934},
  year={2023}
}