Not All AI Models Are Made Equal

I'm sure you know that. They differ between the companies that make them available, and they differ between models from the same company.

But I'm not here to tell you all that. I'm here to tell you about a funny thing (funny in the end, at least) that just happened to me, involving AIs.

So... I was tracking down a nasty bug that happens with my charts when I resize the window. I'll likely nail it down in the end, but not being particularly skilled in this area, I used an AI to investigate what was going on.

I wasn't paying attention to which AI model was selected. Usually ChatGPT is pretty decent at finding little bugs and explaining how to fix them without being too verbose about it, which would consume tokens. And this sounded like one of those cases...

Something I'm not sure I like about ChatGPT, in general, is that if you upload a code file, it doesn't seem to process it unless it really has to. Maybe it's trained to be lazy about it; maybe the researchers know that most people upload files that aren't needed, or that are too big to really matter. And processing them takes computing power and, therefore, tokens.

But that also makes ChatGPT guess a lot before actually taking a look at the code.

That happened today too. Its first guess turned out to be wrong (it never actually admitted that). It seemed on the right track, though... I would have guessed the same way at first.

Then it actually looked at the code and said something like: "Oh wait, you use this stuff, which means we need to approach the problem differently."

So far, it sounds almost scientific, apart from the guesswork.

But here's where the funny part starts. It began talking about things as if they were in my code and... they weren't. I mean function names, calls, everything.

Despite not remembering what it was talking about, I thought it was possible I had forgotten about, or left in by mistake, some old piece of code that was now unused. So I searched for the function names. Nowhere to be found. I told it that. It insisted I had them and showed me more "evidence", none of it from my code.

Ok, I said, if I don't have it, maybe it's in the D3 code I load in my HTML. Very unlikely, since that file is minified, but hey, let's ask ChatGPT. On this, it agreed with me. But it still insisted the code in question came from the file I had uploaded minutes ago.

The only logical explanation I could find, other than ChatGPT having a really bad day, was that I had used the conversational model, not the one specialized in coding. I promptly switched, but I haven't continued in a new session yet; I wanted to write this first, while events are fresh in my mind.

I wonder if these models should decline queries outside their competence when better (free) models from the same company are available to serve them, rather than attempting an answer to the best of their ability and making fools of themselves.

In my already fairly extensive experience of using AI models to help with coding, I've run into many situations where they start "breaking". A fresh session usually helps (kind of like a reboot for a computer), but that also means providing the context again (like reloading the programs and reopening the documents you were working on). And you can't do that many times a day, due to token restrictions, at least not for free.

Posted Using INLEO



18 comments

Lol that was fun... Actually, sometimes it fucks up, like ignoring what you write and going off on its own path. You need to make it notice its mistake.


Hallucination is still a major problem for AI.


It is, unfortunately. Especially dangerous when they don't want to admit their mistake and persist with the hallucination.


Yep, I tried to. But it doubled down on its hallucination, as amr said below. That's another problem with AIs: sometimes they are convinced of their own hallucinations. I think this is the first time, for me, that it was so... obvious, and I couldn't convince it that it was wrong. They are wrong many times, but they usually admit it when they are.


AI really said "trust me bro" and doubled down on nonsense. Sometimes when I'm chatting with ChatGPT it feels like arguing with a confident kindergartener who just learned a new word. It once made a big mistake, and someone said it was wrong; I fact-checked and it was completely wrong, yet it had stated it to me like fact, and only apologized later.


Good parallel with a kindergartener. It might be the case, sometimes. They are just as stubborn, lol. I don't want to think about what they'll do when they reach their teens.


In their teens it'll be a catastrophe 😂😂😂😂 but lowkey I love AI capabilities


Artificial intelligence models like ChatGPT always store your first conversation and build subsequent answers on top of it. This causes confusion and deviation from context across different topics, and that's the biggest flaw in these models, in my opinion.


They have different ways of approaching this. For example, Claude doesn't store context beyond the current session (at least, as far as we know). Once you reach the length limit of your current session, if you still have tokens to use, you can open a new session and start over. Technically, that makes more sense, since the amount of context an AI can reliably use without forgetting or distorting things is pretty limited. Once that limit is exceeded, pretty much nothing it says based on previous context is reliable anymore, until you start a new session and provide fresh context.


You're very right, I've noticed this with my account too...

It always gives responses based on what we've built up from the initial conversation.


Lol, you'll have to keep providing evidence that it's wrong before the AI accepts it. This could be potentially dangerous in the medical field, as in giving a false diagnosis. A lazy AI and a lazy human interacting is not a good combo. Better to always verify :)


I actually wanted to upload the file again; maybe that would have helped. But I had already reached my limit of uploads per day (2). And pasting the module wouldn't have been a solution either: it would have generated a "prompt too long" error. So... I had to leave it at that, with each of us keeping our own opinion. Note that these models don't evolve from interacting with us beyond the context of the conversation; they need to be re-trained to learn new things.


It's tough when there are so many models. It's easy to get things mixed up, and I think it's hard to deal with sessions. Sometimes you want the previous stuff, and other times, you don't.


Yeah, I thought the selected model would stick, but it looks like it doesn't. For example, it's now introducing GPT-5 to me for the first time.

"I think it's hard to deal with sessions. Sometimes you want the previous stuff, and other times, you don't."

I've gotten used to this, as I've gotten used to many other tweaks to squeeze more out of these models. At some point this won't be a problem anymore, as the tech barriers fall, but until then the workarounds are ok, most of the time.


You did well with the post, thanks for sharing
