RE: LeoThread 2025-08-13 21:50
You are viewing a single comment's thread:
Noticing a trend where due to extensive benchmarking on long-term tasks, models are becoming overly agentic.
0
0
0.000
You are viewing a single comment's thread:
Noticing a trend where due to extensive benchmarking on long-term tasks, models are becoming overly agentic.
For coding, they tend to think too long, unnecessarily list and grep files across entire repos, conduct repeated web searches, and over-analyze even incomplete code, often returning after several minutes for simple inquiries.
Oh!
In which model(s) have you witnessed such a behavior? It seems that Claude 4.0 has actually improved in that respect, compared to its 3.7 version.
While this is okay for long-running tasks, it's not ideal for iterative development or quick checks, like confirming indexing or spotting simple errors.
This often leads to having to stop them with prompts like "Stop overthinking, focus on this single file, avoid tool usage, and don't over-engineer."
As the default shifts towards an "ultrathink" mode, there's a growing need for methods to convey intent or stakes---from "just have a quick look" to "spend 30 minutes ensuring certainty."