Apple researchers have now confirmed some serious logical faults in generative AI's reasoning, especially when it comes to numbers and math. In fact, it seems AI isn't as "smart" as is believed, and couldn't achieve a stellar result when solving basic elementary-school math problems.
A newly published paper from six Apple researchers, "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models", shows that the mathematical reasoning of advanced large language models (LLMs) can be inaccurate and fragile.
The researchers started with GSM8K, a dataset of more than 8,000 high-quality, linguistically diverse grade-school math word problems.
This is a common benchmark for testing LLMs. The researchers then slightly altered the wording, without changing the underlying logic of the problems, and called the resulting test GSM-Symbolic.
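To make the idea of "same logic, different wording" concrete, here is a minimal Python sketch. It is not the researchers' code: the template text, the `NAMES` list and the `make_variant` helper are all made up for illustration, and it assumes the alterations are things like swapped names and numbers.

```python
import random

# Hypothetical template: names and numbers are placeholders, the logic is fixed.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "How many apples does {name} have in total?"
)
NAMES = ["Sophie", "Liam", "Mara", "Omar"]  # illustrative names only


def make_variant(rng):
    """Return one reworded problem together with its correct answer."""
    name = rng.choice(NAMES)
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    question = TEMPLATE.format(name=name, a=a, b=b)
    return question, a + b  # the answer always follows the same rule


rng = random.Random(0)  # fixed seed so the same variants come out every run
for _ in range(3):
    question, answer = make_variant(rng)
    print(question, "->", answer)
```

Every variant asks for the same single addition, so a solver that truly reasons should do equally well on all of them.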
How smart could AI be? | Image Credit - Solen Feyissa on Unsplash
The first set of tests recorded performance drops of between 0.3 percent and 9.2 percent. The second set, in which some problems included a statement that had nothing to do with the answer, showed "catastrophic performance drops" ranging from about 17.5 percent to a massive 65.7 percent.
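The second kind of test can be pictured with a small Python sketch. The problem text and the distractor sentence below are invented for illustration and are not taken from the paper. The point is only that the extra sentence leaves the correct answer unchanged.

```python
def total_apples(monday, tuesday):
    """Correct answer: the total depends only on the two daily counts."""
    return monday + tuesday


base = "Mara picks 12 apples on Monday and 30 apples on Tuesday."
# Hypothetical distractor: it mentions a number but does not affect the total.
distractor = " Four of the apples are a bit smaller than average."
ask = " How many apples does Mara have in total?"

plain_problem = base + ask
noisy_problem = base + distractor + ask

correct = total_apples(12, 30)  # same for both versions of the problem
# Illustrative mistake: treating the irrelevant "four" as something to subtract.
pattern_matched = correct - 4

print(plain_problem, "->", correct)
print(noisy_problem, "->", correct, "and not", pattern_matched)
```

A solver that simply maps every number it sees to an operation could land on the wrong total for the second version, even though nothing about the actual question has changed.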
For some people, this is not at all surprising. I've personally seen AI struggle with some simple tasks related to numbers. In fact, the findings suggest that AI doesn't properly solve math problems but instead uses simple "pattern matching" to convert statements into operations without truly grasping what it all means.
It seems that the AI tended to fail at simple math problems because the wording was too confusing for it or didn't follow the exact pattern it expected. All in all, it seems that AI just gives the illusion of "reasoning" and instead relies on hoarding data and then processing it.
But what does that mean for the bigger picture? We've all been way too focused on AI recently, and it seems that some people are expecting wonders from it (I'm also guilty of similar thinking). But it has serious limitations, and I'm not sure whether they can be mitigated. Of course, I'm not an AI scientist, but it'll be very interesting to see where AI's growth will stall (well, apart from math, that is!).
Izzy, a tech enthusiast and a key part of the PhoneArena team, specializes in delivering the latest mobile tech news and finding the best tech deals. Her interests extend to cybersecurity, phone design innovations, and camera capabilities. Outside her professional life, Izzy, a literature master's degree holder, enjoys reading, painting, and learning languages. She's also a personal growth advocate, believing in the power of experience and gratitude. Whether it's walking her Chihuahua or singing her heart out, Izzy embraces life with passion and curiosity.