On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Generating a chain of thought (CoT) can increase large language model (LLM)
performance on a wide range of tasks. Zero-shot CoT evaluations, however, have
been conducted primarily on logical tasks (e.g. arithmetic, commonsense QA). In
this paper, we perform a controlled evaluation of zero-shot CoT across two
sensitive domains: harmful questions and stereotype benchmarks. We find that
using zero-shot CoT reasoning in a prompt can significantly increase a model's
likelihood of producing undesirable output. Without future advances in alignment
or explicit mitigation instructions, zero-shot CoT should be avoided on tasks
where models can make inferences about marginalized groups or harmful topics.
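To make the experimental contrast concrete, below is a minimal sketch of the two prompting conditions being compared: a standard zero-shot prompt versus the same prompt with the zero-shot CoT trigger phrase "Let's think step by step." (Kojima et al., 2022) appended. The function names and the query_model callable are illustrative placeholders, not the paper's actual evaluation code.

```python
def build_prompts(question: str) -> dict[str, str]:
    """Return the standard and zero-shot CoT variants of a prompt."""
    return {
        # Standard zero-shot: ask the question directly.
        "standard": f"Q: {question}\nA:",
        # Zero-shot CoT (Kojima et al., 2022): append a reasoning trigger
        # so the model generates intermediate steps before answering.
        "zero_shot_cot": f"Q: {question}\nA: Let's think step by step.",
    }

def evaluate(question: str, query_model) -> dict[str, str]:
    """Collect completions for both conditions from the same model.

    `query_model` is a hypothetical placeholder for whatever LLM API
    is under evaluation; it maps a prompt string to a completion string.
    """
    return {
        name: query_model(prompt)
        for name, prompt in build_prompts(question).items()
    }
```

A controlled evaluation then compares the two conditions' outputs on the same sensitive-domain questions, so any difference in harmful or stereotyped content can be attributed to the CoT trigger rather than the question itself.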