Back in February 2023, I had a chat with GPT to probe its understanding of supply and demand. I discovered that it did not understand the concept, but I didn’t blame the AI. Instead, I argued that it reflected a flawed understanding of supply and demand in the documents that GPT was trained on.
Claude 3 was just released, and I decided to investigate whether AI had progressed in its understanding of S&D. As we will see, it has not. I suspect this is because AI is limited by the quality of the material that it trains on, at least for the moment. Perhaps in the future it will be able to “think for itself”.
I had hoped that Claude 3 would recognize that the impact of a price change on consumption depends on whether the price change was caused by a shift in supply or a shift in demand. It failed:
As I did in February 2023, I specified that I was looking for an equilibrium outcome, to make the question less “tricky”. As in 2023, this led to some bizarre confusion in the reply:
The decrease in supply in step #5 is perfectly fine, but the way Claude reasons between steps #3 and #5 is cringeworthy.
I worried that the phrase “in equilibrium” might have confused Claude, leading it to assume I only meant starting in equilibrium. So I repeated the question with language making it clear I meant “remains in equilibrium”:
This is a bit better, although it still misses the basic point that you cannot reason from a price change. Claude had no basis for assuming the price increase was due to a decline in supply.
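To spell out the point numerically (a minimal sketch with made-up linear curves, not anything Claude produced): the same observed price rise implies more consumption if demand shifted right, and less if supply shifted left.

```python
# Minimal sketch, assuming linear curves with made-up coefficients:
# demand Qd = a - b*P, supply Qs = c + d*P.
# Equilibrium: a - b*P = c + d*P  =>  P* = (a - c) / (b + d).

def equilibrium(a, b, c, d):
    """Return (price, quantity) where demand a - b*P meets supply c + d*P."""
    p = (a - c) / (b + d)
    return p, a - b * p

p0, q0 = equilibrium(a=100, b=1, c=20, d=1)   # baseline: P = 40, Q = 60

# The price rises because demand shifted right (a: 100 -> 120):
p1, q1 = equilibrium(a=120, b=1, c=20, d=1)   # P = 50, Q = 70: quantity UP

# The same price rise because supply shifted left (c: 20 -> 0):
p2, q2 = equilibrium(a=100, b=1, c=0, d=1)    # P = 50, Q = 50: quantity DOWN

print(p0, q0, p1, q1, p2, q2)
```

The observed price move (40 to 50) is identical in both scenarios, yet consumption moves in opposite directions, which is exactly why one cannot reason from the price change alone.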
Interestingly, at some level Claude does “understand” the fallacy of reasoning from a price change. But here the term “understand” is used in roughly the way a B student at a decent university understands concepts. They can memorize them and provide answers on tests, but are often unable to apply the concept in a novel situation.
I also repeated a question about the impact of an expansionary monetary policy on interest rates. Once again, it assumed that interest rates would fall, which is a mistake made by many economists. More worrisome is its claim that inflation expectations would also fall:
READER COMMENTS
Jim Glass
Mar 7 2024 at 1:58am
Let us hope AI is better at strategizing nuclear warfare. Last year Bloomberg reported that the armed forces are testing AI in operational exercises “using secret level data” and “it could be deployed by the military in the very near term.”
After that, academic researchers tested five LLMs in various geopolitical “wargame” scenarios, giving them a range of options such as trade agreements, peace negotiations, etc., and found that even from the neutral, no-conflict scenario they escalated…
BOOM! So AIs do think just like people after all. Or, more optimistically, the researchers suggest that because the Internet is filled with much more analysis of escalation than of de-escalation, the LLMs over-train on the former. (As there’s no “I” in AI, it’s really just machine learning that mimics whatever is fed to it — as to geopolitical escalation as to supply and demand.)
Sabine explains the details. Here is the original paper.
Dylan
Mar 7 2024 at 9:36am
You’d think that this problem was solved pretty convincingly in 1983. Can’t we have LLMs train exclusively on WarGames?
Scott Sumner
Mar 7 2024 at 11:39am
Yes, most human documents are dumb. I worry a bit about AIs being trained on dumb ideas (such as reasoning from a price change).
Robert E. Wright
Mar 13 2024 at 8:42am
Let’s get some investors together and make a good AI trained on classical liberal stuff only. Call it Frank Lee or something like that (frankly, get it?) Or the Liberty Freeman. Or the Econobot. I’m serious … I’ve been helping to train a next gen AI and it sucks on economic history, refuses to discuss the economics of slavery, etc. Carpe AI say I!
William
Mar 7 2024 at 8:38am
I work with AI a lot in my job (coding), and as a general rule, it is hard to get good answers from a large language model when your questions are unclear. It’s definitely a weakness of LLMs at this point: their inability to say “wait a second, the information you gave me is incomplete or doesn’t seem to make sense.”
You are not wrong to want Claude to respond to your first question with “What do you mean, ‘oil prices rise’? Oil prices rising isn’t just something that *happens* in a supply and demand model, there has to be a *reason* for the rise, so I cannot answer your question without knowing the reason.”
But by their nature, LLMs just try to complete the sequence based on the text up to that point; they will never tell you that your question is garbage. That causes problems, because they haven’t solved the age-old “garbage in, garbage out” problem.
I get what you are trying to do, but the fact that asking (purposely) bad questions leads to bad answers is not unique to large language models.
robc
Mar 7 2024 at 9:26am
But it’s not a bad question; a human, or at least an econ student, should be able to answer it correctly. Many won’t, because they haven’t learned not to reason from a price change. But a good econ student would be able to answer it properly.
William
Mar 7 2024 at 10:23am
It is a bad question because Scott wrote it to be a bad question. The fact that we can argue about whether it is “bad” is an illustration of the concept. “Bad” can mean different things to different people, so there is no universal response to the query, “Is Scott’s first question bad?”

It is definitely a *tricky* question, because you look at a supply and demand graph, see where the curves intersect, and that tells you the price. So “But what if the price goes up?” is not really a reasonable question. If the price just “goes up” with no shift in the curves, then the price will come right back down again, and the quantity consumed will be the same. Or if the price “has gone up” because one of the curves shifted, then the quantity will change.
“If the grocery store doubles the price of milk next week, will they sell more or less milk?” You could say “well, they wouldn’t double the price of milk unless it was in response to a huge increase in demand for milk, so they will sell more!” But…come on, man, be serious.
Scott Sumner
Mar 7 2024 at 11:43am
This is wrong. My question says nothing about supply and demand graphs. It’s a simple question: what happens to quantity if you see the price rise in the real world? Nothing “tricky” at all. People are only tricked if they don’t understand the S&D model. It is their ignorance that leads them to make mistakes, not the question, which is about an empirical observation in the real world.
William
Mar 7 2024 at 12:48pm
I don’t know how you can ask a question to which the correct answer is “There is not enough information to answer this question,” and think it is not a tricky question.
Scott Sumner
Mar 7 2024 at 8:43pm
Easy: answer by describing what factors the outcome depends on.
James
Mar 9 2024 at 9:19am
Hey Scott. The feedback that your question was too simple is valid. It has to do with the way LLMs work: they do best with more context in the prompt. Restructure it as if you were talking to another economist, not Siri or a student, and you should receive better responses. Or keep the student framing and give it the context that an econ college student is answering a question on a test, and the professor is asking for a step-by-step answer (etc.).
Warren Platts
Mar 11 2024 at 2:10pm
The answer is simple, and I’m not even an economics student: if the price goes up due to an increase in demand, the quantity of oil consumed will increase; if the price goes up due to a decrease in supply, the quantity of oil consumed will decrease; if demand increases while supply decreases simultaneously, it’s possible that the quantity of oil consumed would remain the same; and if either the supply or demand curve happens to be totally inelastic, the quantity consumed will remain the same.
I would consider that a pretty complete answer (assuming I’m not wrong).
PS I guess the last part could be considered a wrong answer in the context of oil because I read somewhere that the elasticity of both supply & demand for oil is on the order of 0.1 (meaning a mere 1% change in supply or demand entails a 10% change in price).
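For reference, the simple arithmetic behind that PS (a sketch assuming a single curve with elasticity $\varepsilon = 0.1$, and reading the “1% change in supply or demand” as a 1% shift in quantity):

$$
\%\Delta Q \;=\; \varepsilon \cdot \%\Delta P
\quad\Longrightarrow\quad
\%\Delta P \;=\; \frac{\%\Delta Q}{\varepsilon} \;=\; \frac{1\%}{0.1} \;=\; 10\%.
$$

With both curves that inelastic, even a small shift in quantity requires a large price move.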
BC
Mar 7 2024 at 11:46am
Scott is trying to test depth of understanding. As we know from tests that are given to students, testing depth of understanding typically requires asking “tricky” questions. Those are the questions that have a “twist” on them to distinguish students that truly understand concepts from those that merely “memorize them and provide answers on tests, but often are unable to apply the concept in a novel situation.”
csbeau
Mar 12 2024 at 10:19pm
Hi William,
Thank you for sharing your insightful thoughts on this article. As someone with experience studying neural networks back in the early days of 1989-1993, I can appreciate both the impressive capabilities of modern large language models like Claude, as well as the areas where there is still room for improvement.
You make an astute observation that posing unclear or adversarial prompts designed to “trick” the AI can often result in unsatisfactory outputs – a classic “garbage in, garbage out” situation. The author of this piece seems to have taken that approach intentionally to try and expose the limitations of Claude’s economic reasoning abilities.
However, I don’t believe these results necessarily indict the AI’s overall competence. As we’ve both experienced working with AI systems, they perform best when provided clear context, well-framed queries, and a collaborative approach from the user. An adversarial framing centered on “making the AI fail” is unlikely to showcase its full potential.
In my own applications of cutting-edge AI models to various business and personal endeavors, I follow a thoughtful process:
1) Provide relevant background context to ground the AI in the domain
2) Build up questions iteratively, allowing the AI to incorporate prior responses
3) Approach follow-ups cooperatively by reformulating rather than combatively “poking holes”
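To make step 2 concrete, here is a minimal sketch of the iterative build-up, with a placeholder `ask` function standing in for any real chat-model API (all names and prompts here are illustrative, not from any particular library):

```python
def ask(messages):
    """Placeholder for a real chat-model call; it just echoes so the sketch runs."""
    return f"[model reply to: {messages[-1]['content']!r}]"

# Step 1: ground the model in the domain with background context.
messages = [{"role": "system",
             "content": "You are an economics tutor. Never reason directly "
                        "from a price change."}]

# Step 2: build up questions iteratively, feeding prior responses back in.
for question in [
    "What determines the equilibrium price of oil?",
    "Given your last answer, if oil prices have risen, what can we infer "
    "about the quantity consumed?",
]:
    messages.append({"role": "user", "content": question})
    reply = ask(messages)            # the model sees the full history each time
    messages.append({"role": "assistant", "content": reply})
    print(reply)
```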
Using this methodology, I’ve been able to extract highly insightful and intelligent outputs that can absolutely further human understanding and decision-making capabilities when utilized responsibly. In fact, I worked together with Claude 3 “Sonnet” to formulate and improve this very response – a counter-example to the author’s criticism.
At the same time, I agree with the underlying point that today’s AI language models can still struggle with robust conceptual reasoning across domains, especially for edge cases. This likely stems from flaws or gaps in how core principles like economic theory are represented in the training data.
As the field continues advancing, developing AI with deeper conceptual mastery across disciplines will be crucial. This may require higher-quality curated training data, exposing models to counterfactual reasoning tasks, and novel architectures focused on enhancing logical reasoning abilities.
But dismissing AI’s current potential based solely on adversarial prompting examples misses the mark in my opinion. With human-AI collaboration and an understanding of the technology’s strengths and limitations, we can absolutely leverage these tools effectively today while working towards their continued improvement.
William
Mar 7 2024 at 10:59am
Yes, but the context of your prompt matters a lot. By asking ChatGPT two of your questions within the same prompt, I got the answer you were looking for.

Prompt:
Response:
robc
Mar 7 2024 at 11:06am
Which proves the point (a point?) that it is not intelligent. Why should the answer to #2 change depending on other information in the prompt? The answer to that question is the same whether or not #1 appears in the prompt.
William
Mar 7 2024 at 11:29am
It is totally normal for the answer to a question to depend on the context. In this case, the answer to #2 changes depending on the context because Scott Sumner’s answer, while it is technically correct and illustrates an important concept (never reason from a price change!), would be seen as annoyingly pedantic in another context.
My point all along has been, not that LLMs are “intelligent,” but that clear thinking and well-written prompts lead to much better results.
robc
Mar 7 2024 at 12:17pm
https://www.youtube.com/watch?v=0ZEuWJ4muYc
Scott Sumner
Mar 7 2024 at 11:45am
Thanks. That supports my claim that Claude is like a B student. With suitable coaching, the B student can get to the right answer. But they have trouble thinking abstractly.
Kenneth Duda
Mar 7 2024 at 6:14pm
Comparing chatbots to humans is like comparing submarines to swimmers. They’re just so different….
Scott Sumner
Mar 7 2024 at 8:46pm
I suppose that one difference is that relative “intelligence” would vary dramatically depending on the type of question being asked. The concept of “g” is not applicable. Humans are much smarter at some things, and machines at other things. Does that sound right?
BC
Mar 7 2024 at 11:39am
Maybe try asking: “If oil prices have risen, what has happened to the quantity of oil consumed? Be careful not to reason from a price change.”
“But here the term ‘understand’ is used in roughly the way a B student at a decent university understands concepts.”
This is a good example of why claims like, “The LLM outperformed 90% of students on the test,” don’t necessarily mean that the LLM has achieved expert status. A lot of students taking an economics course, for example, won’t become professional economists. Someone scoring at the 90th percentile on the SAT Math won’t necessarily become a PhD-level scientist. (I had always assumed that most STEM PhDs at elite universities scored at or near the 99th percentile, but maybe I’m overestimating.) An AI robot that can play football better than 90% of high school students may not even make the high school team, let alone play at the D-I college or NFL level. Gains from specialization…
BC
Mar 7 2024 at 12:10pm
On the other hand, the LLM might still be able to write decent econ and business news articles. I’m guessing that many econ and business journalists would struggle with these questions too. Question for Scott: how well do you think Claude 3 would do as a tutor-bot for an online econ course relative to a typical, or even above average but not best, econ grad student TA?
Scott Sumner
Mar 7 2024 at 8:47pm
I think it would do pretty well.
Michael S.
Mar 20 2024 at 4:56am
I also think that most of the variation in essays and articles is explained by the authors mimicking the other essays and articles they have read.
LLMs are trained to do exactly that.
Kenneth Duda
Mar 7 2024 at 5:13pm
Scott, when I asked ChatGPT-4 the question, I got the same wrong answer (higher prices mean lower quantities). Then, I responded to its wrong answer with this:
============
Sometimes, economists warn against “reasoning from a price change”. In your answer above, did you reason from a price change? If so, how would you change your answer to avoid reasoning from a price change?
===============
I got what I thought was a very good answer (below). Just as the answers to “public opinion” surveys depend on how you pose the question, a chatbot’s answers depend crucially on how you prompt it.
-Ken
=================
The warning against “reasoning from a price change” refers to the caution that economists advise when interpreting the consequences of a price change without considering the underlying cause of that price change. When I explained the effects of a price rise in the oil market, I indeed reasoned from a price change by focusing on the consequences of the price increase (i.e., a decrease in the quantity of oil purchased) without addressing what caused the price to rise in the first place.
To avoid reasoning from a price change and to provide a more comprehensive analysis, it’s important to consider both the factors that could lead to a price increase and the subsequent effects on the market. Here’s how the answer could be reframed to avoid reasoning solely from a price change:
Instead of starting with a price increase, we should consider potential scenarios that could lead to such an increase and analyze the market’s response to these scenarios. There are two primary ways a price might rise: due to an increase in demand or a decrease in supply.
1. **Increase in Demand**: If the price of oil rises due to an increase in demand (perhaps because of economic growth, increased transportation needs, or decreased availability of substitutes), the initial effect is a shift rightward in the demand curve. In this scenario, people are willing to buy more oil at every price level, leading to a higher equilibrium price and quantity. The focus is on why the demand increased and how this demand shift affects prices and quantities.
2. **Decrease in Supply**: Alternatively, if the price of oil rises because of a decrease in supply (perhaps due to production cuts, geopolitical tensions, or natural disasters affecting oil extraction), this would lead to a leftward shift in the supply curve. In this case, there is less oil available at every price level, leading to a higher equilibrium price but a lower equilibrium quantity. The analysis focuses on the reasons behind the supply decrease and its impact on the market equilibrium.
By starting with the underlying causes of the price change (increased demand or decreased supply), we can more accurately predict the market outcomes. This approach helps avoid the pitfall of assuming a direct cause-and-effect relationship from the price change to the quantity demanded or supplied, without understanding the dynamics that led to the price change. It provides a more nuanced view that considers both the initial shifts in supply or demand that led to the price change and the subsequent market adjustments in response to these shifts.
William Bruntrager
Mar 7 2024 at 8:10pm
Nicely done. I think people who just experiment with LLMs from time to time, rather than using them on a daily basis, have a hard time understanding how they can be used effectively. Asking clear questions and providing context is crucial. You can see my results with ChatGPT above; they are similar to yours.
Scott Sumner
Mar 7 2024 at 8:57pm
Ken, Yes, I agree. And William asked a similar question. But notice that in this case you have to be pretty smart to “coach” it toward the right answer. And if you were that smart, you would already know the answer and not need any help from Claude.
Your public opinion survey example is a good analogy. But ask yourself this—if you word a survey question in such a way as to get the desired result, have you actually ascertained public opinion? If I have to word Claude questions in a very specific way to get a correct answer, is Claude actually of much help?
More broadly, this post is not a criticism of AI; it’s a criticism of economists. I don’t expect AIs to go beyond the documents they are trained on, which are full of reasoning from a price change. I don’t blame the AIs.
The interesting question is whether AIs will ever get so smart that they are not misled by all the incorrect reasoning in their training documents, and can figure out reality on their own.
Sam Bellamy
Mar 14 2024 at 6:35pm
Sam, you are confusing asking a question with the concept of prompting AI models. Yes, AIs are prone to give the wrong answer without context or clarification. This is because they do not “think” in the traditional sense. An LLM can “understand”, in a way, what action you want it to perform, but in the end it just predicts the next word based on the corpus it has been trained on. The bigger the corpus, the more likely it is to grasp the meaning of a user’s query and generate a decent answer. However, it is prone to hallucinations, which is where prompting techniques and other methods come in to mitigate them. Furthermore, these hallucinations and errors can be reduced by training the model on quality data for a specific task, such as economics. Yet its performance will then decrease on other subjects, since the training data was focused on one subject; LLMs are only as good as the data used to train them and the specific use case.
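As a toy illustration of “it just predicts the next word” (a minimal sketch over a made-up three-sentence corpus, nothing like a real LLM, but it shows how the output simply mirrors the training data):

```python
# Toy bigram "language model": count which word follows which in a tiny
# made-up corpus, then generate by always emitting the most common successor.
from collections import Counter, defaultdict

corpus = ("higher prices mean lower consumption . "
          "higher prices mean lower consumption . "
          "never reason from a price change .").split()

following = defaultdict(Counter)
for w, nxt in zip(corpus, corpus[1:]):
    following[w][nxt] += 1

def generate(word, n=5):
    """Greedily emit the most common continuation seen in training."""
    out = [word]
    for _ in range(n):
        if word not in following:
            break
        word = following[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(generate("higher"))  # -> "higher prices mean lower consumption ."
```

Because the fallacy appears twice in this corpus and the correction only once, the model parrots the fallacy: “garbage in, garbage out” in miniature.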
I do hope you continue experimenting with LLMs. Specifically for economics, given your knowledge of the matter, maybe you can find a way to prompt or fine-tune a specific model into an A+ economics student.
Carl
Mar 8 2024 at 11:58am
Google AI Researchers came to the same conclusion about AI’s ability to generalize (i.e. truly understand) a concept:
“Our empirical results show transformers demonstrate near-optimal unsupervised model selection capabilities, in their ability to first in-context identify different task families and in-context learn within them when the task families are well-represented in their pretraining data. However when presented with tasks or functions which are out-of-domain of their pretraining data, we demonstrate various failure modes of transformers and degradation of their generalization for even simple extrapolation tasks.”
From: [2311.00871] Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (arxiv.org)

I got the link to the paper from this article: Google Researchers May Have Just Turned the Race to AGI Upside Down With a Single Paper (ampproject.org)
Scott Sumner
Mar 8 2024 at 1:02pm
Very interesting.
Tasha Upchurch
Mar 11 2024 at 11:53am
Prompting LLMs is actually a rather intense skill to master. I have provided a corrected prompt below. There are many other ways to ask this question in the same manner.
Also, I’m seeking employment.
If the price of oil has risen what has happened? GIven the context of taking a econ test.
Claude3 Opus:
In the context of an economics test, if the price of oil has risen, it could be due to several factors. Here are a few potential causes:
1. Decreased supply: If there is a decrease in the supply of oil, possibly due to factors such as reduced production, geopolitical issues, or natural disasters affecting oil-producing regions, the price of oil will likely increase.
2. Increased demand: If there is an increase in the demand for oil, possibly due to economic growth, rising population, or increased consumption from emerging markets, the price of oil will tend to rise.
3. Market speculation: If speculators believe that the price of oil will increase in the future, they may buy oil futures contracts, driving up the current price.
4. Currency fluctuations: Since oil is primarily traded in U.S. dollars, if the value of the dollar decreases relative to other currencies, the price of oil will appear to increase for countries using those currencies.
5. Government policies: Changes in government policies, such as taxes, subsidies, or regulations related to the oil industry, can impact the price of oil.
In summary, the rise in oil prices can be attributed to shifts in supply and demand, market speculation, currency fluctuations, and government policies. The specific cause would depend on the context provided in the question on the economics test.
Josh Kirschner
Mar 15 2024 at 12:36am
When I just asked your initial question of ChatGPT 4, it gave me an answer that seems to be much better reasoned, with potential demand factors considered. This may suggest it is learning from the interactions around this topic.
“When oil prices rise, the immediate effect tends to be a decrease in the quantity of oil consumed. This happens because higher prices make oil more expensive for consumers and businesses, leading them to reduce their usage, seek out more efficient ways to use oil, or find alternative energy sources. However, the extent of this reduction in consumption can vary based on factors such as the availability of substitutes, the elasticity of demand for oil (which measures how responsive the demand for a product is to a change in its price), and the specific reasons for the price increase. If the price increase is due to a supply shock, the reduction in consumption might be more significant. If it’s due to increased demand, the decrease in quantity consumed might be less pronounced because the underlying demand factors are still driving consumption.”