LLMs are not good at counting, are they?

LLMs are not good at counting or at referencing their own prompt. At least that was the case until recently, and it was a task at which LLMs failed colossally.

Even smaller current open-source models, such as the Chinese GLM-4-0414 (32B) from zAI, still struggle with it.

Surprisingly, though, GPT-5 (Copilot) can solve the task.

GLM-4.6 now manages it as well, but only in thinking mode.

In general, arithmetic is not a strength of LLMs, but if we are honest, it is not a strength of human intelligence either.

Children have to spend years memorizing multiplication tables and complicated pen-and-paper algorithms just to be able to handle larger arithmetic problems at all.

Despite the brain's enormous computing power, most people can barely solve small counting tasks in the single- and double-digit range; beyond that, without aids and years of training, they are usually lost.

Why do you think language models, and humans too, miscount so often? Imagine a language model that could fall back on the computing power of computers and combine the two capabilities.
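To make that last thought concrete, here is a purely hypothetical sketch in Python of what such a combination could look like. Nothing in it is a real API: `ask_model` is just a stand-in for an LLM that, instead of guessing a number, emits a call to a deterministic `count_letter` tool.

```python
# Hypothetical tool-use sketch: the "model" only decides which tool to call;
# the actual counting is done deterministically in ordinary code.

def count_letter(text: str, letter: str) -> int:
    """Deterministic tool: count exact (case-sensitive) occurrences of a letter."""
    return text.count(letter)

TOOLS = {"count_letter": count_letter}

def ask_model(prompt: str) -> dict:
    """Stand-in for an LLM that answers with a tool call instead of a guessed number."""
    return {"tool": "count_letter", "args": {"text": prompt, "letter": "e"}}

prompt = "Count the number of the letter 'e' in this prompt!"
call = ask_model(prompt)
result = TOOLS[call["tool"]](**call["args"])
print(result)  # prints 6
```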


[Image glm-4-counting-1.jpg: GLM-4-0414 (32B) failing to count the number of a letter in the prompt]

https://chat.z.ai/

GPT-5 (Copilot)

Prompt:
Count the number of the letter 'e' in this prompt!

Thinking:

Counting occurrences of 'e'
I'm working through counting 'e's in the prompt. It says to count only the lowercase "e," so I'll focus on those. I'll break down the text into words and check each one. For example, “the” has 1 'e', “number” has 1 'e', and “letter” has 2 'e's. Just focusing on these, I get a total of 6 'e's. I’m double-checking my breakdown and considering if I might be missing something, but it seems like I’ve covered all the ‘e’ counts based on the prompt.

Confirming letter count
Okay, after reviewing it, I can confirm that there are 6 lowercase 'e's in the prompt. No need for any long explanation; it's just a straightforward answer. 6 it is!

Answer:
6
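
For reference, the answer is easy to check deterministically. A minimal Python check on the exact prompt string quoted above (counting only the lowercase 'e', as GPT-5 did) gives the same result:

```python
prompt = "Count the number of the letter 'e' in this prompt!"
print(prompt.count("e"))  # prints 6, matching GPT-5's answer
```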




7 comments

Maybe it's that the part of the brain that does mathematics isn't well developed, or rather not used often. I think LLMs are mirroring the abilities of their creators :D


The LLMs and AI seem to be fairly inaccurate. There is a lot more work to be done before they are ready, as you have to double-check everything.
