Chain-of-thought, tree-of-thought, and graph-of-thought: Prompting techniques explained
In this article, we'll provide a brief overview of a family of prompting techniques: starting with chain-of-thought (CoT), extending it to tree search with tree-of-thought (ToT), and finally generalizing it to arbitrary graphs with graph-of-thought (GoT) prompting.
Created on June 15|Last edited on June 27
Chain-of-thought (CoT) is a simple prompting technique that often leads to more accurate LLM outputs. Put simply, a chain-of-thought prompt instructs the model to reason step by step (a chain of thought) instead of rushing toward the answer.
In this report, we'll look at CoT and a few related prompting techniques so we can understand how they work and how later techniques evolved from CoT.
Chain-of-thought prompting
The authors behind Chain-of-Thought (CoT) (2201.11903) had a simple motivation: scaling LLMs did not, by itself, have the desired effect on model performance for complex reasoning tasks. The technique relies on two key ideas:
- Complex arithmetic tasks benefit from being broken down into small intermediate steps expressed as natural language instructions.
- Prompting (in-context few-shot learning) can be used to adapt a model to a different task by providing input-output examples in the main prompt rather than fine-tuning for each separate task.
The chain-of-thought technique combines both of these ideas by providing the model with a prompt consisting of triples of the form <input, chain of thought, output>, where a chain of thought is a series of intermediate natural-language reasoning steps that lead to the final output.

Figure 1: Illustrated chain-of-thought prompting from Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2201.11903)
From Figure 1, we can see how the model produces two different outputs depending on how it is prompted. The provided input-output example involves a simple reasoning step where we determine the required operations (5 + 2 x 3 = 11), but the actual question requires an additional subtraction step (23 - 20 + 6 = 9).
- In standard prompting (on the left), we provide input-output examples without any reasoning steps. The model tries to repeat the same math operations without realizing that the second question is different, and so arrives at the wrong answer, 27.
- With chain-of-thought prompting (on the right), we explicitly spell out the reasoning step: "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11" (highlighted in blue in Figure 1). The model then produces the correct answer, 9.
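To make the triple structure concrete, here is a minimal sketch of assembling a CoT few-shot prompt using the tennis-ball example from Figure 1. The helper `build_cot_prompt` is hypothetical, not from the paper; it just shows the <input, chain of thought, output> layout.

```python
def build_cot_prompt(question: str) -> str:
    """Assemble a one-shot chain-of-thought prompt (illustrative sketch)."""
    example_input = (
        "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?"
    )
    # The chain of thought sits between the example input and its output,
    # so the model imitates the reasoning before answering the new question.
    chain_of_thought = (
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is "
        "6 tennis balls. 5 + 6 = 11."
    )
    example_output = "The answer is 11."
    return (
        f"{example_input}\nA: {chain_of_thought} {example_output}\n\n"
        f"Q: {question}\nA:"
    )

prompt = build_cot_prompt(
    "The cafeteria had 23 apples. If they used 20 to make lunch and "
    "bought 6 more, how many apples do they have?"
)
```

The resulting string would be sent to the model as-is; the trailing "A:" invites the model to continue with its own reasoning chain.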
Expressed with the graph-of-thoughts library's operations API (this snippet comes from the repository's sorting example, where `operations` and `utils` are provided), CoT is simply a linear chain of operations:

def cot() -> operations.GraphOfOperations:
    # CoT is a linear graph: generate one answer, score it,
    # and compare against the ground truth.
    operations_graph = operations.GraphOfOperations()
    operations_graph.append_operation(operations.Generate(1, 1))
    operations_graph.append_operation(operations.Score(1, False, utils.num_errors))
    operations_graph.append_operation(operations.GroundTruth(utils.test_sorting))
    return operations_graph
Tree-of-thought prompting
Shortly after CoT, the Tree-of-Thoughts (ToT) technique (2305.10601) was proposed. It frames a reasoning task as a tree search problem: intermediate tree states are partial solutions that the model can refer to and build on while working toward the final solution, using standard tree-traversal techniques like branching and backtracking. Let's break down the technique:
- Decompose the problem into intermediate steps: at every branch, ToT decomposes the given problem into intermediate thoughts. We can either sample multiple thoughts from a given step via CoT, or propose new thoughts using a dedicated generator.
- Evaluate the generated states: given several candidate states, an evaluator scores each one, serving as a heuristic for the overall problem. Instead of hand-defined formulas, the model itself reasons about each state. States can be evaluated independently, by calculating a value for each, or jointly, by voting across all states at a given level/depth.
- Search across evaluated states: with candidate next states generated and scored, we can use standard search algorithms to determine which branch to explore next. In the paper, the authors considered depth-first search (DFS) and breadth-first search (BFS) as two viable techniques.
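The three steps above can be sketched as a toy BFS loop. Here `propose` and `evaluate` are stand-ins for LLM calls (the real generator and evaluator are prompts): this hypothetical version extends a running sum toward a target and scores states by distance to it.

```python
def propose(state, choices=(1, 2, 3)):
    # Thought generator: branch a partial solution into candidate next states.
    return [state + c for c in choices]

def evaluate(state, target=10):
    # State evaluator: a heuristic score, higher is better (closer to target).
    return -abs(target - state)

def tot_bfs(start, steps=4, beam=2, target=10):
    frontier = [start]
    for _ in range(steps):
        # Expand every surviving state, then prune to the best `beam` states.
        candidates = [s for state in frontier for s in propose(state)]
        candidates.sort(key=lambda s: evaluate(s, target), reverse=True)
        frontier = candidates[:beam]
    return frontier[0]

result = tot_bfs(0)
```

The beam width plays the role of "how many thoughts to keep per level"; setting it to 1 collapses the tree back into a greedy chain of thought.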

Figure 2: Illustrated Tree-Of-Thought prompting from Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2305.10601)
As with CoT, let's walk through an example. In the paper, the authors consider the Game of 24, a mathematical reasoning challenge where the goal is to combine four numbers with basic arithmetic operations (+, -, *, /) to obtain 24. In Figure 2, starting from "4 9 10 13", the model generates several viable next steps. We can generate these steps using either value or propose prompting, and then decide which branch to explore next based on the scores.
In this particular example, the language model is asked to suggest next steps with the prompt "Possible next steps:", yielding candidates such as (10 - 4 = 6) and (4 + 9 = 13). The model is then used again to evaluate each state with the prompt "Evaluate if the given numbers can reach 24 (sure/likely/impossible) <...>", and based on its outputs we choose the next state using the given search strategy (DFS/BFS).
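To see what the "propose" step produces, here is a purely symbolic sketch of it for Game of 24 (in the paper an LLM plays this role). From the remaining numbers, it combines any two with an arithmetic operation, yielding the candidate next states:

```python
from itertools import permutations

def propose_steps(numbers):
    """Enumerate candidate next states: pick two numbers, apply an operation."""
    states = []
    for a, b in permutations(numbers, 2):
        rest = list(numbers)
        rest.remove(a)
        rest.remove(b)
        ops = [(f"{a} + {b}", a + b), (f"{a} - {b}", a - b), (f"{a} * {b}", a * b)]
        if b != 0:
            ops.append((f"{a} / {b}", a / b))
        for expr, val in ops:
            # Each state is the step taken plus the numbers still in play.
            states.append((expr, sorted(rest + [val])))
    return states

steps = propose_steps([4, 9, 10, 13])
```

Both candidates from Figure 2 appear among the proposals: `("10 - 4", [6, 9, 13])` and `("4 + 9", [10, 13, 13])`. The evaluator would then score each state before the search continues.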
In the graph-of-thoughts operations API, the same sorting task becomes a tree: branch, score, prune, repeat (note that `KeepBestN` plays the role of the pruning step):

def tot() -> operations.GraphOfOperations:
    operations_graph = operations.GraphOfOperations()
    # Root: branch into 20 candidate thoughts, score each, keep the best.
    operations_graph.append_operation(operations.Generate(1, 20))
    operations_graph.append_operation(operations.Score(1, False, utils.num_errors))
    keep_best_1 = operations.KeepBestN(1, False)
    operations_graph.append_operation(keep_best_1)
    for _ in range(1):
        # Each level: expand the surviving thought into 20 new ones,
        # score them, and keep the best across this level and the previous one.
        operations_graph.append_operation(operations.Generate(1, 20))
        operations_graph.append_operation(operations.Score(1, False, utils.num_errors))
        keep_best_2 = operations.KeepBestN(1, False)
        keep_best_2.add_predecessor(keep_best_1)
        operations_graph.append_operation(keep_best_2)
        keep_best_1 = keep_best_2
    operations_graph.append_operation(operations.KeepBestN(1, False))
    operations_graph.append_operation(operations.GroundTruth(utils.test_sorting))
    return operations_graph
Graph-of-thought prompting
Can we generalize this approach further, so that it is not limited to trees? The Graph-of-Thoughts (GoT) technique (2308.09687) proposes one such solution, modeling the flow of information as an arbitrary graph structure. The overall flow is the same as in ToT: a thought generator suggests possible steps, an evaluator scores the various thoughts, and a ranker selects the next thought to pursue. What graphs add over trees are many-to-one edges, so several thoughts can be merged (aggregated) into one.

Figure 3: Comparison of Graph-Of-Thoughts prompting with other techniques from Graph of Thoughts: Solving Elaborate Problems with Large Language Models (2308.09687)
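The aggregation pattern that distinguishes GoT can be shown with a toy sketch. Mirroring the sorting task, each branch sorts its own sublist independently, and an aggregation node then merges the partial solutions into one final thought, an edge pattern a tree cannot express:

```python
import heapq

def branch_sort(sublist):
    # Each branch refines its own partial solution independently.
    return sorted(sublist)

def aggregate(*partials):
    # Aggregation: a many-to-one merge of several sorted partial
    # solutions into a single thought.
    return list(heapq.merge(*partials))

left, right = branch_sort([9, 1, 7]), branch_sort([4, 8, 2])
final = aggregate(left, right)
```

In the actual technique the merge is itself an LLM call; here `heapq.merge` just stands in for it to make the graph shape concrete.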
def got() -> operations.GraphOfOperations:
    operations_graph = operations.GraphOfOperations()
    plans = operations.Generate(2, 1)
    operations_graph.append_operation(plans)  # generate the sublists
    for i in range(1, 3):
        # One branch per sublist: select it, sort it, score it, keep the best.
        list_id = f"List {i}"
        sub_list = operations.Selector(
            lambda thoughts, list_id=list_id: [
                thought for thought in thoughts if thought.state["part"] == list_id
            ]
        )
        sub_list.add_predecessor(plans)
        operations_graph.add_operation(sub_list)
        sort_sub_list = operations.Generate(1, 5)
        sort_sub_list.add_predecessor(sub_list)
        operations_graph.add_operation(sort_sub_list)
        score_sub_list = operations.Score(1, False, utils.num_errors)
        score_sub_list.add_predecessor(sort_sub_list)
        operations_graph.add_operation(score_sub_list)
        keep_best_sub_list = operations.KeepBestN(1, False)
        keep_best_sub_list.add_predecessor(score_sub_list)
        operations_graph.add_operation(keep_best_sub_list)
    # Merge the sorted sublists back into a single thought, then refine it.
    final_aggregate = operations.Aggregate(10)
    operations_graph.append_operation(final_aggregate)
    operations_graph.append_operation(operations.Score(1, False, utils.num_errors))
    keep_best_aggregate_final = operations.KeepBestN(1, False)
    operations_graph.append_operation(keep_best_aggregate_final)
    operations_graph.append_operation(operations.Generate(1, 10))
    score_aggr_3 = operations.Score(1, False, utils.num_errors)
    score_aggr_3.add_predecessor(keep_best_aggregate_final)
    operations_graph.append_operation(score_aggr_3)
    operations_graph.append_operation(operations.KeepBestN(1, False))
    operations_graph.append_operation(operations.GroundTruth(utils.test_sorting))
    return operations_graph
Conclusion
In this article, we went through a brief overview of a series of papers on thought prompting, from chain-of-thought to tree-of-thought to graph-of-thought, and saw how each technique generalizes the last.
To see the full suite of Weights & Biases features, please check out this short 5-minute guide. If you want more reports covering the math and "from-scratch" code implementations, let us know in the comments down below or on our forum ✨!
Check out these other reports on Fully Connected covering other LLM-related topics like Model Adaptation.
What Are Intrinsic Dimensions? The Secret Behind LoRA
This article provides a brief overview of intrinsic dimensions and how they enable low-rank adaptation. We also provide code samples which use Weights & Biases for interactive visualizations.
A Brief Introduction to LoRA
This article gives an overview of LoRA (Low-Rank Adaptation) of Large Language Models, using W&B for interactive visualizations. It includes code samples for you to follow.
AdaLoRA: Adaptive Budget Allocation for LoRA
This article provides an overview of "Adaptive Budget Allocation for Parameter Efficient Fine-Tuning" using W&B for interactive visualizations. It includes code samples for you to follow!
What is QLoRA?
This article provides an overview of "QLoRA: Efficient Finetuning of Quantized LLMs" using W&B for interactive visualizations. It includes code samples for you to follow!