RE: LeoThread 2025-08-13 21:50

Noticing a trend where, due to extensive benchmarking on long-horizon tasks, models are becoming overly agentic.

For coding, they tend to think too long, unnecessarily list and grep files across entire repos, run repeated web searches, and over-analyze even incomplete code, often taking several minutes to answer simple inquiries.

Oh!

In which model(s) have you witnessed such behavior? It seems that Claude 4.0 has actually improved in that respect compared to version 3.7.

While this is okay for long-running tasks, it's not ideal for iterative development or quick checks, like confirming indexing or spotting simple errors.

This often leads to having to stop them with prompts like "Stop overthinking, focus on this single file, avoid tool usage, and don't over-engineer."

As the default shifts towards an "ultrathink" mode, there's a growing need for ways to convey intent or stakes, ranging from "just have a quick look" to "spend 30 minutes ensuring certainty."
