Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
With the widespread use of large language models (LLMs) in NLP tasks,
researchers have discovered the potential of Chain-of-Thought (CoT) to assist
LLMs in accomplishing complex reasoning tasks by generating intermediate steps.
However, human thought processes are often non-linear rather than simple
sequential chains of thought. Therefore, we propose Graph-of-Thought (GoT)
reasoning, which models human thought processes not only as a chain but also as
a graph. By representing thought units as nodes and connections between them as
edges, our approach captures the non-sequential nature of human thinking and
allows for a more realistic modeling of thought processes. Similar to
Multimodal-CoT, we model GoT reasoning as a two-stage framework, generating
rationales first and then producing the final answer. Specifically, we employ
an additional graph-of-thoughts encoder for GoT representation learning and
fuse the GoT representation with the original input representation through a
gated fusion mechanism. We implement a GoT reasoning model on the T5
pre-trained model and evaluate its performance on a text-only reasoning task
(GSM8K) and a multimodal reasoning task (ScienceQA). Our model achieves
significant improvements of 3.41% and 5.08% over the strong CoT baseline on
the GSM8K test set with the T5-base and T5-large architectures, respectively.
Additionally, on the ScienceQA test set, our model boosts accuracy over the
state-of-the-art Multimodal-CoT from 84.91% to 91.54% with the T5-base model
and from 91.68% to 92.77% with the T5-large model. Experiments show that GoT
achieves results comparable to Multimodal-CoT (large), which has over 700M
parameters, despite using a backbone with fewer than 250M parameters,
demonstrating the effectiveness of GoT.
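
To make the gated fusion step described above more concrete, the following is a minimal sketch of one possible implementation in PyTorch. It is illustrative only: the module name (GatedFusion), tensor shapes, and the exact gating form (a sigmoid gate over the concatenated representations, followed by a convex combination) are assumptions for the sketch, and it presumes the GoT-encoder output has already been aligned to the same sequence length and hidden size as the original input representation; the paper's actual formulation may differ.

import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative gated fusion of text-encoder and GoT-encoder states.

    Hypothetical sketch: combines the two representations with a learned
    sigmoid gate, as one plausible reading of the fusion described in the
    abstract (not the paper's exact implementation).
    """

    def __init__(self, hidden_size: int):
        super().__init__()
        # Gate is computed from the concatenation of both representations.
        self.gate_proj = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, text_states: torch.Tensor, got_states: torch.Tensor) -> torch.Tensor:
        # text_states, got_states: (batch, seq_len, hidden_size), assumed pre-aligned.
        gate = torch.sigmoid(self.gate_proj(torch.cat([text_states, got_states], dim=-1)))
        # Element-wise convex combination of the two representations.
        return gate * text_states + (1.0 - gate) * got_states

if __name__ == "__main__":
    fusion = GatedFusion(hidden_size=768)   # 768 = T5-base hidden size
    text = torch.randn(2, 16, 768)
    got = torch.randn(2, 16, 768)
    fused = fusion(text, got)
    print(fused.shape)                      # torch.Size([2, 16, 768])

The fused states would then be passed to the decoder in place of the plain text-encoder states, so the first stage can generate rationales (and the second stage the final answer) conditioned on both the input text and the graph-of-thoughts representation.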