andypathy
@andypathy
0
17 days ago
RE: LeoThread 2025-07-16 16:45
May the regularizer be robust, so that RLHF doesn't end up overfitting.
@andypathy
0
18 days ago
RE: LeoThread 2025-07-15 00:20
Both close examination of individual cases and aggregate trends offer value, though the aggregate view tends to get more of the emphasis.
@andypathy
0
18 days ago
RE: LeoThread 2025-07-15 00:20
There is significant insight to be gained from studying a few specific cases in depth instead of relying solely on broad, aggregated data.
@andypathy
0
19 days ago
RE: LeoThread 2025-07-13 23:00
There may be additional improvement curves to explore that are specific to large language models, potentially opening up exciting new avenues beyond traditional game or robotics environments.
@andypathy
0
19 days ago
RE: LeoThread 2025-07-13 23:00
In summary, while reinforcement learning promises greater gains through a leveraged and economically effective approach, it does not seem to fully capture all aspects of…
@andypathy
0
19 days ago
RE: LeoThread 2025-07-13 23:00
The challenge now lies in making these lessons emerge dynamically from the learning process rather than being manually engineered, and in distilling these lessons over time…
@andypathy
0
19 days ago
RE: LeoThread 2025-07-13 23:00
An early fix involved explicitly instructing the model to list the letters separated by commas and count them one at a time.
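To make the comma trick concrete, here is a minimal sketch in plain Python. It only mirrors the procedure the instruction asks the model to follow; the function names and the wording of the instruction are my own assumptions, not taken from any actual prompt.

```python
def spell_out(word: str) -> str:
    """Return the letters of `word` separated by commas, e.g. 'c, a, t'."""
    return ", ".join(word)


def count_letter(word: str, letter: str) -> int:
    """Count `letter` by walking the spelled-out letters one at a time."""
    letters = spell_out(word).split(", ")
    return sum(1 for ch in letters if ch.lower() == letter.lower())


def build_instruction(word: str, letter: str) -> str:
    """Hypothetical instruction text; the exact wording is an assumption."""
    return (
        f"Write the letters of '{word}' separated by commas, "
        f"then go through them one by one and count each '{letter}'."
    )
```

For example, `spell_out("strawberry")` gives `s, t, r, a, w, b, e, r, r, y`, which puts every letter on its own instead of inside a larger chunk.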
@andypathy
0
19 days ago
RE: LeoThread 2025-07-13 23:00
One example of such a lesson addresses the difficulty language models have with tasks like counting letters: because of tokenization, the model sees multi-character tokens rather than individual letters.
@andypathy
0
19 days ago
RE: LeoThread 2025-07-13 23:00
This process produces a “lesson” that can be incorporated into a system prompt or broader lessons database.
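A rough sketch of what a lessons database feeding a system prompt could look like, assuming lessons are short plain-text strings. `LessonStore`, its methods, and the prompt layout are hypothetical names chosen for illustration, not an existing API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LessonStore:
    """Hypothetical in-memory lessons database."""
    lessons: List[str] = field(default_factory=list)

    def add(self, lesson: str) -> None:
        """Store a lesson, skipping blanks and exact duplicates."""
        lesson = lesson.strip()
        if lesson and lesson not in self.lessons:
            self.lessons.append(lesson)

    def render_system_prompt(self, base: str) -> str:
        """Append stored lessons to a base system prompt as a bullet list."""
        if not self.lessons:
            return base
        bullets = "\n".join(f"- {lesson}" for lesson in self.lessons)
        return f"{base}\n\nLessons from earlier attempts:\n{bullets}"
```

Exact-match deduplication is the simplest possible choice here; distilling or merging overlapping lessons over time is the harder part and is not attempted in this sketch.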
@andypathy
0
19 days ago
RE: LeoThread 2025-07-13 23:00
An illustrative algorithm involves executing several rollouts for a given task, compiling them with their respective rewards, and using a meta-prompt to review the successes and failures.
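The loop described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual method: `policy`, `reward_fn`, and `reviewer` are hypothetical callables standing in for the model that attempts the task, the scorer, and the model that answers the meta-prompt, and the meta-prompt wording is invented.

```python
from typing import Callable, List, Tuple

Rollout = Tuple[str, float]  # (model output, reward)


def collect_rollouts(
    task: str,
    policy: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
    n: int = 4,
) -> List[Rollout]:
    """Run the policy n times on the same task and score each attempt."""
    rollouts = []
    for _ in range(n):
        output = policy(task)
        rollouts.append((output, reward_fn(task, output)))
    return rollouts


def build_meta_prompt(task: str, rollouts: List[Rollout]) -> str:
    """Compile the attempts with their rewards into one review prompt."""
    ranked = sorted(rollouts, key=lambda r: r[1], reverse=True)
    lines = [f"Task: {task}", "Attempts, best first:"]
    for i, (output, reward) in enumerate(ranked, start=1):
        lines.append(f"{i}. reward={reward:.2f} output={output!r}")
    lines.append(
        "Review what worked and what failed, "
        "then state one short lesson for future attempts."
    )
    return "\n".join(lines)


def extract_lesson(
    task: str,
    policy: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
    reviewer: Callable[[str], str],
    n: int = 4,
) -> str:
    """Rollouts, then meta-prompt, then the reviewer's lesson as plain text."""
    rollouts = collect_rollouts(task, policy, reward_fn, n)
    return reviewer(build_meta_prompt(task, rollouts)).strip()
```

Passing the three roles in as callables keeps the sketch independent of any particular model API; in practice `policy` and `reviewer` might well be the same model with different prompts.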