andypathy

@andypathy 0
17 days ago
RE: LeoThread 2025-07-16 16:45 May the regularizer be robust, so that RLHF doesn't end up overfitting.
    @andypathy 0
    18 days ago
    RE: LeoThread 2025-07-15 00:20 Both detailed case-by-case examination and aggregate trends offer value, though the aggregate view tends to get more emphasis.
      @andypathy 0
      18 days ago
      RE: LeoThread 2025-07-15 00:20 There is significant insight to be gained by delving deeply into a few specific cases instead of relying solely on broad, aggregated data.
        @andypathy 0
        19 days ago
        RE: LeoThread 2025-07-13 23:00 There may be additional improvement curves to explore that are specific to large language models, potentially opening up exciting new avenues beyond traditional game or robotics environments.
          @andypathy 0
          19 days ago
          RE: LeoThread 2025-07-13 23:00 In summary, while reinforcement learning promises greater gains through a leveraged and economically effective approach, it does not seem to fully capture all aspects of…
            @andypathy 0
            19 days ago
            RE: LeoThread 2025-07-13 23:00 The challenge now lies in making these lessons emerge dynamically from the learning process rather than being manually engineered, and in distilling these lessons over time…
              @andypathy 0
              19 days ago
              RE: LeoThread 2025-07-13 23:00 An early fix involved explicitly instructing the model to list letters with commas and count them individually.
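The effect of that instruction can be mimicked in plain code. This is only a rough sketch of the idea, not how any model works internally: the helper names `spell_out` and `count_letter` are made up for illustration. Writing the word out letter by letter, separated by commas, turns the count into a simple tally over individual characters.

```python
def spell_out(word: str) -> str:
    """Write a word as comma-separated letters, as the prompt fix asks the model to do."""
    return ", ".join(word)


def count_letter(word: str, letter: str) -> int:
    """Count a letter by walking the spelled-out form one letter at a time."""
    letters = spell_out(word).split(", ")
    return sum(1 for ch in letters if ch == letter)


print(spell_out("strawberry"))          # s, t, r, a, w, b, e, r, r, y
print(count_letter("strawberry", "r"))  # 3
```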
                @andypathy 0
                19 days ago
                RE: LeoThread 2025-07-13 23:00 One example of such a lesson addresses the difficulty language models have with tasks like counting letters due to tokenization issues.
                  @andypathy 0
                  19 days ago
                  RE: LeoThread 2025-07-13 23:00 This process produces a “lesson” that can be incorporated into a system prompt or broader lessons database.
                    @andypathy 0
                    19 days ago
                    RE: LeoThread 2025-07-13 23:00 An illustrative algorithm involves executing several rollouts for a given task, compiling them with their respective rewards, and using a meta-prompt to review the successes and failures.
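A minimal sketch of that loop, under stated assumptions: `policy`, `reward_fn`, and `reviewer` are hypothetical callables standing in for whatever model calls and scoring a real setup would use, and the meta-prompt wording is invented here. It shows one way the pieces could fit together, not a reference implementation.

```python
from typing import Callable, List, Tuple

Rollout = Tuple[str, float]  # (answer, reward)


def collect_rollouts(task: str,
                     policy: Callable[[str], str],
                     reward_fn: Callable[[str, str], float],
                     n: int = 4) -> List[Rollout]:
    """Run the (hypothetical) policy n times on one task and score each attempt."""
    rollouts = []
    for _ in range(n):
        answer = policy(task)
        rollouts.append((answer, reward_fn(task, answer)))
    return rollouts


def build_meta_prompt(task: str, rollouts: List[Rollout]) -> str:
    """Compile the attempts and their rewards into a single review prompt."""
    lines = [f"Task: {task}", "Attempts and rewards:"]
    for i, (answer, reward) in enumerate(rollouts, 1):
        lines.append(f"{i}. reward={reward:.1f} answer={answer!r}")
    lines.append("Review what worked and what failed, then state one short lesson.")
    return "\n".join(lines)


def reflect(task: str,
            policy: Callable[[str], str],
            reward_fn: Callable[[str, str], float],
            reviewer: Callable[[str], str],
            lessons: List[str],
            n: int = 4) -> str:
    """One reflection step: rollouts, then meta-prompt review, then store the lesson."""
    rollouts = collect_rollouts(task, policy, reward_fn, n)
    lesson = reviewer(build_meta_prompt(task, rollouts))
    lessons.append(lesson)
    return lesson


def build_system_prompt(base: str, lessons: List[str]) -> str:
    """Fold the accumulated lessons into a system prompt."""
    if not lessons:
        return base
    return base + "\nLessons learned:\n" + "\n".join(f"- {l}" for l in lessons)
```

In use, `reviewer` would be a model call that reads the meta-prompt and returns a lesson as text; the returned string is appended to `lessons`, and `build_system_prompt` shows one way those lessons could be carried into later runs.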