The longer a conversation goes, the more likely it is that a large language model (LLM) will go astray. A research paper from Philippe Laban, Hiroaki Hayashi, Yingbo Zhou, and Jennifer Neville finds that in multi-turn exchanges, most models lose aptitude and their unreliability skyrockets:
> We find that LLMs often make assumptions in early turns and prematurely attempt to generate final solutions, on which they overly rely. In simpler terms, we discover that when LLMs take a wrong turn in a conversation, they get lost and do not recover.
Effectively, these models talk when they should listen. The researchers found that LLMs generate overly verbose responses, which leads them to…
- Speculate about missing details instead of asking questions
- Propose final answers too early
- Over-explain their guesses
- Build on their own incorrect past outputs (see the sketch after this list)
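That last failure mode is partly mechanical: with typical chat APIs, every turn re-sends the full conversation history, so the model's own earlier guesses become part of every later prompt. Here is a minimal sketch, assuming the OpenAI Python SDK; the model name, the prompts, and the `ask()` helper are illustrative, not taken from the paper:

```python
# Minimal multi-turn chat loop: each turn re-sends the entire history,
# so an early wrong guess by the model stays in context for every
# later turn. Model name, prompts, and ask() are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_text: str) -> str:
    """Append the user turn, call the model, and keep its reply in the history."""
    messages.append({"role": "user", "content": user_text})
    resp = client.chat.completions.create(
        model="gpt-4o",        # illustrative model name
        messages=messages,     # full history, including the model's past guesses
    )
    reply = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

# An underspecified first turn invites the model to guess at the missing
# details; that guess is then part of the prompt for every turn after it.
print(ask("Write a function that parses the log file."))
print(ask("Actually, the logs are JSON lines, not CSV."))
```

Nothing in this loop prunes or corrects the history: once a premature guess lands in `messages`, every later reply is conditioned on it, which is the dynamic the list above describes.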
The takeaway: these aren’t answer machines or reasoning engines; they’re conversation engines. They are great at interpreting a request and at generating stylistically appropriate responses. What happens in between can get messy. And sometimes, the more they talk, the worse it gets.