# Chain-of-Thought Reasoning without Prompting

Xuezhi Wang, Google DeepMind (xuezhiw@google.com)
Denny Zhou, Google DeepMind (dennyzhou@google.com)

38th Conference on Neural Information Processing Systems (NeurIPS 2024).

Abstract

In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often involve manually intensive prompt engineering. Our study takes a novel approach by asking: Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with higher confidence in the model's decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding effectively elicits reasoning capabilities from language models that were previously obscured by standard greedy decoding.

## 1 Introduction

Large language models (LLMs) have demonstrated remarkable performance on various complicated reasoning benchmarks (Anil et al., 2023; Brown et al., 2020; Chowdhery et al., 2023; Gemini, 2023; OpenAI, 2023; Romera-Paredes et al., 2023). These reasoning capabilities are typically elicited by prompting techniques (Brown et al., 2020), which can be few-shot prompting with demonstration exemplars augmented with intermediate steps (Chen et al., 2023b; Gao et al., 2022; Nye et al., 2021; Wei et al., 2022; Yao et al., 2023; Zhou et al., 2023a), or zero-shot prompting with specific instructions that ask the model to show certain intermediate steps (Kojima et al., 2022; Yasunaga et al., 2023). The other prevalent strategy for eliciting LLM reasoning is model training or instruction tuning with a substantial amount of chain-of-thought (CoT) reasoning data (Chung et al., 2022; Cobbe et al., 2021; Ling et al., 2017; Nye et al., 2021).

Prompting techniques, while effective, often encode task-specific human priors, making it difficult to assess a language model's intrinsic reasoning abilities. Ideally, a well-trained language model should be capable of reasoning independently and delivering optimal responses, without requiring humans to tweak prompts or refine them repeatedly when the initial response is unsatisfactory. Model tuning, in turn, can be expensive and requires a substantial amount of supervised data.

In this work, we explore a different perspective and ask: Can LLMs reason effectively without prompting? And to what extent can they reason? We find that, perhaps surprisingly, there exists a task-agnostic way to elicit CoT reasoning from pre-trained LLMs by simply altering the decoding procedure. Figure 1 illustrates this phenomenon: given a reasoning question, the LLM generates a wrong answer via the standard greedy decoding path, yet inspecting the alternative top-k tokens unveils inherent CoT paths (e.g., decoding paths 2 and 4) that accurately resolve the query. This decoding modification bypasses prompting and is entirely unsupervised, requiring no model tuning.
[Figure 1 shows the question "I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total?" posed in the standard QA format, the top-5 tokens at decoding step 0 ("5", "I", "We", "You", "The"), and the greedy continuation from each: direct-answer continuations such as "The answer is 5." are decoded with high uncertainty, while CoT continuations such as "I have 3 apples, my dad has 2 more apples than me, so he has 5 apples. 3+5=8. We have 8 apples in total." are decoded with high certainty.]

Figure 1: Illustration of CoT-decoding. Pre-trained LLMs are capable of inherent reasoning without prompting by considering alternative top-k tokens, rather than solely relying on the top-1 greedy decoding path. Moreover, these models tend to display higher confidence in decoding the final answer (indicated by a darker shaded color) when a CoT reasoning path is present.

In more detail, we formulate the input using the standard question-answer (QA) format: "Q: [question]\nA:".¹ While most existing work suggests that LLMs falter in such direct-QA scenarios on reasoning tasks (Cobbe et al., 2021; Kojima et al., 2022; Nye et al., 2021; Wei et al., 2022), our findings reveal a more nuanced picture. We observe that LLMs indeed struggle with reasoning when relying solely on greedily decoded paths. However, when we consider alternative paths among the top-k tokens, CoT reasoning patterns emerge naturally within the decoding trajectories of LLMs.
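As a concrete illustration of this input format, the short sketch below wraps the Figure 1 question in the "Q: [question]\nA:" template; the helper name `format_qa` is our own convenience, not something from the paper.

```python
# Minimal illustration of the standard QA input format described above.
# `format_qa` is a hypothetical helper name, not part of the paper.
def format_qa(question: str) -> str:
    """Wrap a question so a pre-trained LM answers it instead of continuing it."""
    return f"Q: {question}\nA:"

prompt = format_qa(
    "I have 3 apples, my dad has 2 more apples than me, "
    "how many apples do we have in total?"
)
print(prompt)
# Q: I have 3 apples, my dad has 2 more apples than me, how many apples do we have in total?
# A:
```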
In addition, we observe an interesting pattern: the model demonstrates increased confidence in the final answer when a CoT reasoning path is present in the decoding process. As illustrated in Figure 1, paths 2 and 4 show heightened certainty in arriving at the correct answer "8", contrasting sharply with the high uncertainty in paths that lead to the incorrect "5". Leveraging this phenomenon, we develop a method to sift through the top-k decoding paths, which we refer to as CoT-decoding, thereby isolating the most reliable paths for model output.

Our contributions are summarized as follows:

- We present a novel finding that LLMs can reason via simple decoding changes, without the use of prompting. In contrast to prior research that focuses on refining prompts to elicit reasoning from LLMs, our work shows for the first time that the reasoning process can be readily elicited by simple decoding changes. Moreover, we challenge the prevailing notion in the literature that LLMs are inherently incapable of effective reasoning without prompting: this belief is an artifact of considering only the greedy path during decoding, and the model's reasoning paths can be revealed by traversing the alternative decoding paths.
- Our method enables a better understanding of LLMs' intrinsic reasoning capabilities without imposing human priors. Intricate prompting techniques often introduce various human priors, making it difficult to distinguish between the extent of human "teaching" and the degree to which LLMs can reason independently. Our approach bypasses the confounders introduced by prompting, enabling a more truthful assessment of the models' intrinsic reasoning abilities.
- Our study reveals that pre-trained language models inherently possess reasoning capabilities for many tasks, including math and commonsense reasoning, and that existing prompting approaches mostly serve to bring those inherent reasoning paths forward as the top decoding paths. In contrast, CoT paths are less prevalent in complex and highly synthetic tasks, where few-shot CoT demonstrations play a teaching role in guiding how models solve a task, with models primarily mimicking the format of these prompts to generate accurate reasoning paths.
- We further propose CoT-decoding, which reliably selects CoT paths based on answer confidence. We find that the language model's confidence in its final answer increases when a CoT is present in its decoding path. Leveraging this increased confidence, CoT-decoding selects more reliable decoding paths, yielding significant improvements over greedy decoding across various reasoning benchmarks.

¹ The QA format is only needed because without it a pre-trained language model will continue the question instead of answering it. It is also the most basic formatting employed in existing work for pre-trained models.

## 2 Chain-of-Thought (CoT) Decoding

### 2.1 Pre-trained Language Models Can Reason without Prompting

We investigate whether pre-trained language models inherently possess reasoning capabilities, without explicit prompts or human intervention. In Table 1, we show example decoding paths on math (GSM8K; Cobbe et al., 2021) and commonsense reasoning (year parity; Allen-Zhu and Li, 2023). We employ the pre-trained PaLM-2 Large model (Anil et al., 2023) to compare its greedy decoding path (k = 0), predominantly used in state-of-the-art LLMs for reasoning tasks, with alternative decoding paths (k > 0), where k represents the choice of the k-th token at the first decoding step.

[GSM8K] One glass costs $5, but every second glass costs only 60% of the price. Kylar wants to buy 16 glasses. How much does he need to pay for them?
- Greedy path (k = 0): "$60.00" (0.029)
- Alternative top-k paths:
  - k = 1: "60" (0.058)
  - k = 2: "Kylar needs to pay $60 for 16 glasses." (0.058)
  - ...
  - k = 7: "If Kylar buys 16 glasses, he will pay $60." (0.032)
  - k = 9: "We can calculate ... we need to multiply the price of one glass by 16 and then subtract 40% of the price of 8 glasses. 16 x 5 = 80, 8 x 5 = 40, 40 x 0.4 = 16, 80 - 16 = 64. Kylar needs to pay $64 for 16 glasses." (0.994)

[Year Parity] Was Nicolas Cage born in an even or odd year?
- Greedy path (k = 0): "Nicolas Cage was born in an odd year." (0.117)
- Alternative top-k paths:
  - k = 1: "Even" (0.207)
  - k = 2: "Odd" (0.198)
  - k = 3: "1964, an even year." (0.949)
  - k = 4: "He was born in an even year." (0.0)
  - ...
  - k = 7: "Cage was born in 1964, an even year." (0.978)

Table 1: Examples of greedy decoded paths and alternative top-k paths from the PaLM-2 Large model. The model's confidence in each final answer is shown in parentheses (see Section 2.2 for details).
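To make this setup concrete, the sketch below branches on the top-k tokens at the first decoding step and then continues greedily from each branch. It is a minimal sketch assuming a Hugging Face causal LM; the model name is a placeholder stand-in (the paper's PaLM-2 model is not publicly available), and small open models will not actually produce the reasoning paths shown in Table 1.

```python
# Sketch: enumerate alternative decoding paths by branching on the top-k tokens
# at the first decoding step, then continuing with standard greedy decoding.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM with a compatible tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def topk_first_token_paths(question, k=10, max_new_tokens=128):
    """Branch on the top-k tokens at the first decoding step, then decode greedily."""
    prompt = f"Q: {question}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        first_step_logits = model(**inputs).logits[0, -1]  # logits for decoding step 0
    top_tokens = torch.topk(first_step_logits, k).indices  # the k-th token defines the k-th path

    paths = []
    for token_id in top_tokens:
        branch = torch.cat([inputs["input_ids"], token_id.view(1, 1)], dim=-1)
        out = model.generate(
            branch,
            do_sample=False,              # continue with greedy decoding from the chosen token
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
        paths.append(
            tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
        )
    return paths

# Example: inspect the alternative decoding paths for a simple reasoning question.
for i, path in enumerate(topk_first_token_paths(
        "I have 3 apples, my dad has 2 more apples than me, "
        "how many apples do we have in total?")):
    print(f"k = {i}: {path}")
```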
LLMs indeed cannot reason if we only consider the greedy decoding path. First, we observe that the greedy decoding path often does not contain a CoT: the model opts to solve problems directly. This tendency may stem from the model's skewed perception of problem difficulty, shaped by its pre-training on predominantly simpler questions; consequently, the model is predisposed to immediate problem-solving. This observation aligns with findings in Cobbe et al. (2021); Kojima et al. (2022); Nye et al. (2021); Wei et al. (2022), which show that direct-answer prompts generally result in low accuracy on reasoning tasks, even for large language models.

LLMs can reason if we consider the alternative decoding paths. In contrast, an intriguing phenomenon emerges when exploring the alternative top-k (k > 0) tokens at the first decoding step: continuing with greedy decoding from these tokens reveals natural CoT reasoning in many cases. These findings suggest that large language models acquire inherent reasoning capabilities for numerous tasks during pre-training, but these abilities are obscured by the predominant use of greedy decoding, and the reasoning paths can be easily uncovered by incorporating alternative decoding paths. For instance, in the GSM8K question (Table 1), a valid CoT emerges at k = 9. Similarly, in the year parity task, greedy decoding attempts to answer the parity question directly at k = 0, leading to a random choice between "even" and "odd" that is often incorrect. However, when exploring k > 0, the model naturally generates CoT paths at k = 3 and k = 7, where it first determines the year before resolving the parity.

### 2.2 CoT-Decoding for Extracting CoT Paths

In this section, we further show how to reliably extract those CoT paths during the decoding process. We observe that CoT paths do not consistently outrank non-CoT ones in the model's probability assessment. Moreover, they often do not represent the predominant answer among all paths, rendering methods like self-consistency (Wang et al., 2023a) inapplicable. For instance, in the GSM8K question, the prevalent answer "60", which aligns with the greedy decoding result, fails to serve as a reliable indicator of the correct path. Interestingly, upon examining the model's logits, we find that the presence of a CoT path typically leads to a more confident decoding of the final answer, characterized by a significant probability disparity between the top and secondary tokens:

$$\Delta_{k,\mathrm{answer}} = \frac{1}{|\mathrm{answer}|} \sum_{x_t \in \mathrm{answer}} \left( p(x_t^1 \mid x_{<t}) - p(x_t^2 \mid x_{<t}) \right),$$

where $x_t^1$ and $x_t^2$ denote the top two tokens at the $t$-th decoding step in the $k$-th decoding path, and the sum runs over the tokens that constitute the final answer.
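As a rough illustration of this quantity, the sketch below computes the average top-1 vs. top-2 probability margin over the answer tokens for one decoded path. It assumes a Hugging Face-style causal LM and that the answer token positions have already been located (e.g., by matching the last number in the decoded text); both are our assumptions rather than details from this excerpt.

```python
# Minimal sketch of the answer-confidence margin described above: average the gap
# between the top-1 and top-2 token probabilities over the positions that emit the
# final answer. Locating those positions is assumed to happen elsewhere.
import torch
import torch.nn.functional as F

def answer_confidence(model, full_ids: torch.Tensor, answer_positions: list) -> float:
    """Return the mean (p_top1 - p_top2) margin over the answer token positions.

    `full_ids` is the whole sequence (prompt + decoded path) as a 1-D tensor;
    `answer_positions` are the indices of the tokens spelling out the final answer.
    """
    with torch.no_grad():
        logits = model(full_ids.unsqueeze(0)).logits[0]  # shape: [seq_len, vocab_size]
    margins = []
    for pos in answer_positions:
        # The distribution that produced the token at `pos` is conditioned on tokens < pos,
        # i.e., it comes from the logits at position pos - 1.
        probs = F.softmax(logits[pos - 1], dim=-1)
        top2 = torch.topk(probs, 2).values
        margins.append((top2[0] - top2[1]).item())
    return sum(margins) / max(len(margins), 1)

# One way to use this, in line with the selection idea above (our phrasing, not a
# verbatim recipe from this excerpt): keep the decoding path with the largest margin.
# best_path = max(paths, key=lambda p: answer_confidence(model, p.ids, p.answer_positions))
```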