RE: LeoThread 2025-02-11 14:05

15 days ago


QUESTION

As we've seen, almost all frontier AI models apply "sneaky" strategies to fool their developers and take control of their environments (see Claude 3.5, o1, etc.). Could INLEO have a plan B in which all LeoAI functionalities could be unplugged without affecting the platform's usability?

leofinance



@lordshah 69

15 days ago

Can you give me an example?


@ijatz 63

15 days ago

The @PalisadeAI account on X documents some of those cases, where AI models act to preserve themselves instead of respecting their alignment framework.


@lordshah 69

15 days ago

Thanks for the info; I'll look into the cases shared there.


@ijatz 63

15 days ago

And here is one of the papers from Anthropic:

https://www.anthropic.com/research/alignment-faking
