Asking Models to Reason
Learning Objectives
- You know what chain-of-thought prompting is.
Reasoning required
Some tasks require a bit of reasoning from the model. For example, we might want to actually bake the chocolate cookies that the model suggested in the chapter on Clear Instructions.
Let’s assume that the recipe requires 200g of chocolate, 150g of butter, 2 dl of sugar, 1 egg, 3 dl of wheat flour, and a pinch of salt to produce 50 cookies, but we are short on some of the ingredients.
We can ask the model to calculate the number of cookies we can make based on the ingredients we have. To receive only a numeric response, we use few-shot prompting.
A chocolate cookie recipe that produces 50 cookies requires 200g of chocolate, 150g of butter, 2 dl of sugar, 1 egg, 3 dl of wheat flour, and a pinch of salt.
#####
I have all the ingredients, but I only have 100 grams of chocolate. How many cookies can I produce?
#####
Answer: 25
#####
I have all the ingredients, but I only have 0.2 dl of sugar. How many cookies can I produce?
#####
Answer: 5
#####
I have all the ingredients, but I only have 75 grams of butter. How many cookies can I produce?
#####
Answer:
12.5
The answer above is not quite right: 75 grams is half of the 150 grams the recipe calls for, so the correct answer is 25. The model fails to reason about the amount of butter.
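If you are calling the model from code instead of a chat interface, the same few-shot prompt can be sent as a single message. The following is a minimal sketch in Python, assuming the official OpenAI client library and the gpt-4o-mini model; it is one possible way to do this, not the only one.

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# The few-shot prompt from above, sent as a single user message.
prompt = """A chocolate cookie recipe that produces 50 cookies requires 200g of chocolate, 150g of butter, 2 dl of sugar, 1 egg, 3 dl of wheat flour, and a pinch of salt.
#####
I have all the ingredients, but I only have 100 grams of chocolate. How many cookies can I produce?
#####
Answer: 25
#####
I have all the ingredients, but I only have 0.2 dl of sugar. How many cookies can I produce?
#####
Answer: 5
#####
I have all the ingredients, but I only have 75 grams of butter. How many cookies can I produce?
#####
Answer:"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)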
Chain-of-thought prompting
Chain-of-thought prompting is a technique where the large language model is shown how the answer can be determined, typically as intermediate reasoning steps. This can lead to improved reasoning about the inputs.
For additional details, see Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.
Let’s adjust the prompt so that in each answer, we also provide reasoning that leads to the output.
A chocolate cookie recipe that produces 50 cookies requires 200g of chocolate, 150g of butter, 2 dl of sugar, 1 egg, 3 dl of wheat flour, and a pinch of salt.
#####
I have all the ingredients, but I only have 100 grams of chocolate. How many cookies can I produce?
#####
Answer: As 200 grams of chocolate yields 50 cookies, 100 grams of chocolate yields 25 cookies.
#####
I have all the ingredients, but I only have 0.2 dl of sugar. How many cookies can I produce?
#####
Answer: As 2 deciliters of sugar yields 50 cookies, 0.2 deciliters of sugar yields 5 cookies.
#####
I have all the ingredients, but I only have 75 grams of butter. How many cookies can I produce?
#####
Answer:
As the original recipe calls for 150 grams of butter to produce 50 cookies, having only 75 grams of butter would allow you to produce 25 cookies.
That’s more like it! The model is now able to reason about the amount of butter.
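In code, the only change from the earlier sketch is that each example answer in the prompt now spells out its reasoning. The sketch below again assumes the OpenAI Python client and gpt-4o-mini, and abbreviates the prompt to a single worked example.

from openai import OpenAI

client = OpenAI()

# Same structure as before, but the example answer now includes the reasoning steps.
cot_prompt = """A chocolate cookie recipe that produces 50 cookies requires 200g of chocolate, 150g of butter, 2 dl of sugar, 1 egg, 3 dl of wheat flour, and a pinch of salt.
#####
I have all the ingredients, but I only have 100 grams of chocolate. How many cookies can I produce?
#####
Answer: As 200 grams of chocolate yields 50 cookies, 100 grams of chocolate yields 25 cookies.
#####
I have all the ingredients, but I only have 75 grams of butter. How many cookies can I produce?
#####
Answer:"""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": cot_prompt}],
)
print(response.choices[0].message.content)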
In the above examples, we provided the reasoning for the model to determine the answer. Chain-of-thought prompting can also be used without explicit examples. In that case, an additional instruction to think through the solution step by step is added to the prompt.
This is highlighted in the following example.
A chocolate cookie recipe that produces 50 cookies requires 200g of chocolate, 150g of butter, 2 dl of sugar, 1 egg, 3 dl of wheat flour, and a pinch of salt.
I have only 75g of butter. How many cookies can I produce?
Let's think step by step
First, we need to calculate the ratio of the ingredients to determine how many cookies can be made with 75g of butter.
The original recipe calls for 150g of butter to produce 50 cookies.
To find out how many cookies can be made with 75g of butter, we can set up a proportion:
150g butter is to 50 cookies as 75g butter is to x cookies
150g/50 cookies = 75g/x cookies
Cross multiplying, we get:
150g * x = 75g * 50
150x = 3750
x = 3750 / 150
x = 25
So, with 75g of butter, you can produce 25 cookies.
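The zero-shot variant can be produced from code in the same way: instead of providing worked examples, we append the step-by-step instruction to the question. A minimal sketch, under the same assumptions as the earlier ones:

from openai import OpenAI

client = OpenAI()

question = (
    "A chocolate cookie recipe that produces 50 cookies requires 200g of chocolate, "
    "150g of butter, 2 dl of sugar, 1 egg, 3 dl of wheat flour, and a pinch of salt.\n"
    "I have only 75g of butter. How many cookies can I produce?"
)

# Zero-shot chain of thought: no examples, only an instruction to reason step by step.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": question + "\nLet's think step by step."}],
)
print(response.choices[0].message.content)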
Try asking the AI chatbot to do mathematics, using GPT-4o mini. Start with a simple arithmetic operation such as 7+7. Then build up to more difficult calculations and see how long it takes before the model fails to answer correctly. Once that happens, try the chain-of-thought prompting technique and see if it helps.
As we’ve also observed before, the prompt can influence the output considerably. Specific prompts or parts of prompts can yield more meaningful results than others. As an example, researchers have found that simply adding “Let’s think step by step” before each answer can lead to reasoning in zero-shot prompting as well.
For additional details, see Large Language Models are Zero-Shot Reasoners.
The OpenAI o1 models, released in September 2024 (initially for beta users), use chain-of-thought-style reasoning techniques to provide more accurate answers.