Apple researchers show that AI can't even solve grade-school math problems very well

Apple researchers have confirmed some serious logical faults in generative AI's reasoning, especially when it comes to numbers and math. It seems AI isn't as "smart" as many believe: the models tested couldn't deliver stellar results even on basic elementary-school math problems.

A newly published paper from six Apple researchers, titled "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models," shows that the mathematical reasoning of advanced large language models (LLMs) can be inaccurate and fragile.

The researchers started with GSM8K, a dataset of more than 8,000 high-quality, linguistically diverse grade-school math word problems.

This is a common benchmark for testing LLMs. The researchers then slightly altered the wording of the problems without changing their underlying logic, and called the resulting test GSM-Symbolic.


The first set of tests recorded a performance drop of between 0.3 percent and 9.2 percent. The second set, in which some problems included a statement that had nothing to do with the answer, showed "catastrophic performance drops" ranging from about 17.5 percent to a massive 65.7 percent.
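
To make the two kinds of changes concrete, here is a minimal Python sketch of how such variants could be generated. It is only an illustration, not the researchers' code: the template, the names, the number ranges, and the `make_variant` helper are all assumptions made up for this example.

```python
import random

# Hypothetical template: the wording, names, and number ranges are
# illustrative only and are not taken from the paper.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{distractor}"
    "How many apples does {name} have in total?"
)

NAMES = ["Sophie", "Liam", "Ava", "Noah"]


def make_variant(rng, with_distractor=False):
    """Return (question_text, correct_answer) for one randomized variant."""
    name = rng.choice(NAMES)
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    distractor = ""
    if with_distractor:
        # This sentence mentions a number but has no bearing on the answer.
        smaller = rng.randint(1, 5)
        distractor = f"{smaller} of the apples are a bit smaller than average. "
    question = TEMPLATE.format(name=name, a=a, b=b, distractor=distractor)
    # The underlying logic never changes: the answer is always a + b.
    return question, a + b


if __name__ == "__main__":
    rng = random.Random(0)
    for flag in (False, True):
        question, answer = make_variant(rng, with_distractor=flag)
        print(question, "->", answer)
```

Every variant calls for the same single addition, so a solver that truly reasons through the problem should be unaffected by the swapped names and numbers or by the extra sentence.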

For some people, this is not at all surprising. I've personally seen AI struggle with simple tasks involving numbers. According to the researchers, the models don't so much solve math problems as rely on simple "pattern matching," converting statements into operations without truly grasping what they mean.

It seems the AI tended to fail at simple math problems because the wording was confusing to it or didn't follow the exact pattern it expected. All in all, AI appears to give only the illusion of "reasoning," relying instead on hoarding data and then processing it.

But what does that mean for the bigger picture? We've all been far too focused on AI recently, and some people seem to expect wonders from it (I'm guilty of similar thinking myself). It has serious limitations, though, and I'm not sure they can be mitigated. Of course, I'm not an AI scientist, but I'm very curious to see where AI's growth will stall (well, apart from math, that is!).
