RE: LeoThread 2025-08-13 21:50

@andypathy 42

14 days ago

LeoFinance

I'm noticing a trend: because of extensive benchmarking on long-horizon tasks, models are becoming overly agentic.

@andypathy 42

14 days ago

For coding, they tend to think for too long, needlessly list and grep files across entire repos, run repeated web searches, and over-analyze even incomplete code, often taking several minutes to answer simple questions.

@ijatz 64

14 days ago

Oh!

In which model(s) have you seen this behavior? Claude 4.0 actually seems to have improved in that respect compared to 3.7.

@andypathy 42

14 days ago

While this is okay for long-running tasks, it's not ideal for iterative development or quick checks, like confirming indexing or spotting simple errors.

@andypathy 42

14 days ago

This often leads to having to stop them with prompts like "Stop overthinking, focus on this single file, avoid tool usage, and don't over-engineer."

@andypathy 42

14 days ago

As the default shifts toward an "ultrathink" mode, there's a growing need for ways to convey intent or stakes, ranging from "just have a quick look" to "spend 30 minutes ensuring certainty."