Anthropic AI finds massive security flaws worldwide

9,999 Views | 116 Replies | Last: 27 days ago by ErnestEndeavor
KingofHazor
bmks270 said:

KingofHazor said:

I've used Claude and several other AIs quite a bit in an attempt to find help in doing scholarly research. The positive is that they, Claude in particular, can suggest ideas that I had not even considered. Nor, as best I can tell, has anyone else ever considered them. In other words, Claude appears to have original ideas.

The bad is that the net output is worthless. Every idea, no matter how original, has to be anchored in some reality. Claude will cite articles in support of his/its novel ideas, but the articles turn out not to exist. Claude readily admits that it is hallucinating, but does so in a very friendly, disarming manner.

It raises the question, in my mind at least, how much the output of these AIs can be completely trusted. I came across an article recently in which the author claimed that these flaws cannot be cured but are baked into the very hardware of the AIs. Is that correct? I have no idea. But his thesis is that we are quickly reaching the ceiling for the AIs, rather than the exponential improvement that many AI bros are claiming.

My personal experience, using AIs for things like scholarly research, and mundane things like shopping for the best prices, is that AI output cannot be trusted to be accurate at all.



Hallucinations are baked in because it's really a next word predictor based on training data. It doesn't know facts from fiction. It doesn't use logic or reasoning. It's interesting that some of the ideas appear to you to be novel. Maybe because its training is on word associations, and your training is in a research field?

AI is really good at code because code is so structured. The prediction of the next word is a lot easier as a result.

It just returns words that look a lot like words in the training data.

Good points, but it also just makes **** up. I can't figure out how that is done based on a statistical algorithm, or however the AIs work.
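The "next word predictor" description above can be made concrete with a toy sketch. This is illustrative only (real models use neural networks over tokens, not bigram counts): given a word, it returns whatever most often followed that word in its training text, with no notion of truth at all.

```python
from collections import Counter, defaultdict

# Toy illustration of "next word prediction": given one word, pick the
# word that most often followed it in the training text. No facts, no
# logic, just co-occurrence statistics.
corpus = ("the cat sat on the mat the cat ate the fish "
          "the dog sat on the rug").split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict(word):
    """Return the continuation seen most often in training, or None."""
    options = follows[word]
    return options.most_common(1)[0][0] if options else None

print(predict("the"))  # -> "cat" (the word that followed "the" most often)
```

The model happily "completes" any prompt from these statistics, which is also why it can emit fluent text with no grounding in fact.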
BusterAg
Over_ed said:

BusterAg said:

I think that it is short-sighted to think that there is no possible fix for the hallucinations thing.

Again, one way to sniff them out is to turn the AI on itself. It gives you an argument, you ask it why that argument could potentially be wrong, and zero in on the things that it identifies.

Once you get the AI to do that well automatically, a lot of these hallucinations may go away.

Or, there may be some other way forward that we haven't even thought of yet. But to say that it is an impossible problem to fix is short-sighted, in my opinion.

I agree the hallucinations are overblown and can generally be avoided/minimized.

First, an AI running in several roles, providing its own validation. Then a second AI checking the work of the first.

A pain to set up, but little additional work once it's in place.

My prediction is that someone else will provide a service to set it up for you sometime before 2027.
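The two-stage setup described above (a model drafting and critiquing its own answer, then a second pass checking the result) can be sketched as a simple control loop. Here `ask` is a hypothetical stand-in for a real chat-completion call; the canned replies just make the flow runnable.

```python
# Sketch of the self-check loop: draft an answer, ask for a critique,
# then revise in light of the critique. `ask` is a hypothetical stub
# standing in for a real LLM API call.
def ask(prompt):
    canned = {
        "draft": "Smith (2021) shows X.",
        "critique": "The Smith (2021) citation may not exist; verify it.",
        "revise": "X is plausible, but no verifiable source was found.",
    }
    for key, reply in canned.items():
        if prompt.startswith(key):
            return reply
    return ""

def answer_with_self_check(question):
    draft = ask(f"draft: {question}")
    critique = ask(f"critique: why might this be wrong? {draft}")
    return ask(f"revise: {draft} | critique: {critique}")

print(answer_with_self_check("What supports X?"))
# -> "X is plausible, but no verifiable source was found."
```

With a real API behind `ask`, the critique step is where fabricated citations tend to get flagged, which is the whole point of turning the AI on itself.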
Stmichael
Jeeper79 said:

Stmichael said:

Logos Stick said:

Claude is now the best software engineer and top cybersecurity expert in the world. With Mythos, I can have Claude quickly decompile and disassemble any code on my desktop and find vulnerabilities. This is a serious national security issue, imo.


And yet, every software engineer points out that LLMs only understand coding on a small scale and have no concept of data structures, organization, ease of maintenance, etc. Left to its own devices, Claude will generate a pile of junk code that will take twice as long to fix as it would to simply start fresh.
I generally agree, but they could probably be taught, the same as a human.

Can LLMs speak up when they don't know something and need to be shown? Or can they only confidently spout nonsense?


LLMs don't actually understand things the way you or I do. An LLM has an excessively large number of statistical relationships between words and parts of words that (pretty closely) approximate the semblance of speech. It's trying to guess every word it says based on context clues and statistics rather than a confident understanding of the thing it's trying to say. That's how it can screw up basic arithmetic that a simple handheld calculator can perform easily: it's not actually doing math, just guessing what the answer is. So, can it be taught? Not in its current state, nor even with the current model. The idea of axiomatic facts is a foreign concept to LLMs.

As to why they hallucinate so much, it's a consequence of their goal-seeking "training." A non-answer is the same as a wrong answer to it, but a guess has a very small chance of being right accidentally. Thus, if the model has nothing to point it in the right direction, it will guess as if that's the confident right answer.
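The incentive described above can be put in numbers. This is a simplified scoring model (my own toy assumption, not any lab's actual training objective): a right answer scores 1, while wrong answers and non-answers both score 0.

```python
# Under a grader that scores a correct answer 1 and everything else 0,
# abstaining ("I don't know") never beats guessing: even a wild guess
# has some nonzero chance of accidentally scoring.
def expected_reward(p_correct, abstain):
    if abstain:
        return 0.0       # a non-answer scores nothing, guaranteed
    return p_correct     # wrong guesses score 0, right ones score 1

p = 0.05  # chance a blind guess happens to be right
assert expected_reward(p, abstain=False) > expected_reward(p, abstain=True)
print("guessing beats abstaining whenever p_correct > 0")
```

Under this kind of scoring, confident guessing strictly dominates saying "I don't know", which is one common explanation for why models trained against such objectives hallucinate rather than abstain.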
3 Toed Pete
Just imagine how much money the next democratic President can sell this to Iran for. Or China.
Deputy Travis Junior
This is very outdated. Reinforcement learning (post training) is a cycle that trains/teaches AI models to perform better at certain tasks.

Results have been incredible. Teaching LLMs via RL is why somebody who's never programmed can now create a mobile app in a weekend.
Logos Stick
Stmichael said:

Logos Stick said:

Claude is now the best software engineer and top cybersecurity expert in the world. With Mythos, I can have Claude quickly decompile and disassemble any code on my desktop and find vulnerabilities. This is a serious national security issue, imo.


And yet, every software engineer points out that LLMs only understand coding on a small scale and have no concept of data structures, organization, ease of maintenance, etc. Left to its own devices, Claude will generate a pile of junk code that will take twice as long to fix as it would to simply start fresh.



The engineers doing that are either ignorant or full of you know what and trying to protect their jobs. It's understandable, I can't blame them. Small scale?! Opus 4.6 can now do single context analysis of up to 1 million tokens, which is about 800k lines of code. Claude writes its own code now. Opus has solved the coding problem. The next models will be even better because it's an exponential curve.
bmks270
Logos Stick said:

Stmichael said:

Logos Stick said:

Claude is now the best software engineer and top cybersecurity expert in the world. With Mythos, I can have Claude quickly decompile and disassemble any code on my desktop and find vulnerabilities. This is a serious national security issue, imo.


And yet, every software engineer points out that LLMs only understand coding on a small scale and have no concept of data structures, organization, ease of maintenance, etc. Left to its own devices, Claude will generate a pile of junk code that will take twice as long to fix as it would to simply start fresh.



The engineers doing that are either ignorant or full of you know what and trying to protect their jobs. It's understandable, I can't blame them. Small scale?! Opus 4.6 can now do single context analysis of up to 1 million tokens, which is about 800k lines of code. Claude writes its own code now. Opus has solved the coding problem. The next models will be even better because it's an exponential curve.


Lol according to who?

Anthropic and OpenAI?
Stmichael
Deputy Travis Junior said:

This is very outdated. Reinforcement learning (post training) is a cycle that trains/teaches AI models to perform better at certain tasks.

Results have been incredible. Teaching LLMs via RL is why somebody who's never programmed can now create a mobile app in a weekend.

Reinforcement learning was part of the mix from the start. Nothing has changed. The kind of coding that goes on with LLMs, while "functional", has a strong tendency to break down and be difficult to follow, and thus to repair. A competent software engineer who has a strong grasp of programming principles can not only put together better code using snippets of publicly available code, but the result will also be much cleaner and easier to fix and build on later.

But let's take this another direction: How's AI handling law or engineering? Try asking it to put together a business proposal for you and then see how bad it is at even basic math. Ask it to draw you a diagram for a woodworking project, or an isometric drawing for machining something.

It's crap. This junk isn't even close to delivering on one tenth of the promises that the grifters who built them have made.
Stmichael
Logos Stick said:

Stmichael said:

Logos Stick said:

Claude is now the best software engineer and top cybersecurity expert in the world. With Mythos, I can have Claude quickly decompile and disassemble any code on my desktop and find vulnerabilities. This is a serious national security issue, imo.


And yet, every software engineer points out that LLMs only understand coding on a small scale and have no concept of data structures, organization, ease of maintenance, etc. Left to its own devices, Claude will generate a pile of junk code that will take twice as long to fix as it would to simply start fresh.



The engineers doing that are either ignorant or full of you know what and trying to protect their jobs. It's understandable, I can't blame them. Small scale?! Opus 4.6 can now do single context analysis of up to 1 million tokens, which is about 800k lines of code. Claude writes its own code now. Opus has solved the coding problem. The next models will be even better because it's an exponential curve.

Go on then, write me some fancy code. If it's as good as you say, you should be absolutely bursting with productivity and making millions off over-employment.
ntxVol
Stmichael said:

Deputy Travis Junior said:

This is very outdated. Reinforcement learning (post training) is a cycle that trains/teaches AI models to perform better at certain tasks.

Results have been incredible. Teaching LLMs via RL is why somebody who's never programmed can now create a mobile app in a weekend.

Reinforcement learning was part of the mix from the start. Nothing has changed. The kind of coding that goes on with LLMs, while "functional", has a strong tendency to break down and be difficult to follow, and thus to repair. A competent software engineer who has a strong grasp of programming principles can not only put together better code using snippets of publicly available code, but the result will also be much cleaner and easier to fix and build on later.

But let's take this another direction: How's AI handling law or engineering? Try asking it to put together a business proposal for you and then see how bad it is at even basic math. Ask it to draw you a diagram for a woodworking project, or an isometric drawing for machining something.

It's crap. This junk isn't even close to delivering on one tenth of the promises that the grifters who built them have made.
I've been one of the most vocal detractors of AI on here for a while. It has come a long way in a short time. I've started using Claude Code recently and I don't write much code anymore. That's actually a small part of my job anyway, but Claude is really good at that. It's what many AI platforms were first built for. As far as the other things, I don't know. I can't speak to those types of tasks, but I don't ask it to do anything like that.
Logos Stick
Stmichael said:

Deputy Travis Junior said:

This is very outdated. Reinforcement learning (post training) is a cycle that trains/teaches AI models to perform better at certain tasks.

Results have been incredible. Teaching LLMs via RL is why somebody who's never programmed can now create a mobile app in a weekend.

Reinforcement learning was part of the mix from the start. Nothing has changed. The kind of coding that goes on with LLMs, while "functional", has a strong tendency to break down and be difficult to follow, and thus to repair. A competent software engineer who has a strong grasp of programming principles can not only put together better code using snippets of publicly available code, but the result will also be much cleaner and easier to fix and build on later.


Not true. Reinforcement learning wasn't baked in from day one. Early GPT models just did unsupervised pre-training on a ton of text, then maybe some supervised fine-tuning on instruction data. That's it.

A competent engineer using an LLM as a powerful assistant routinely produces cleaner, more modular results than they could by hand in the same time. That is obvious to anyone who actually uses it.
Deputy Travis Junior
Bad at basic math? Have you looked at nothing since 2023? Modern AI is acing math olympiad problem sets and is starting to solve Erdős problems. Next-token prediction leading to 2+2=3 is far in the past.

The "AI can't write software" claim is hopelessly wrong and goes against everything the best programmers on earth are saying. My personal anecdote on coding: Replit just put the finishing touches on a mobile app that I directed it to develop. I registered an LLC for it a half hour ago and will be publishing it to app stores in a couple weeks after beta testing + some legal stuff. Took me probably 15 hours and a hundred bucks of token credits to build.

If you already knew about RL, not sure why you said you can't teach models. That's exactly what RL is.
Logos Stick
ntxVol said:

Stmichael said:

Deputy Travis Junior said:

This is very outdated. Reinforcement learning (post training) is a cycle that trains/teaches AI models to perform better at certain tasks.

Results have been incredible. Teaching LLMs via RL is why somebody who's never programmed can now create a mobile app in a weekend.

Reinforcement learning was part of the mix from the start. Nothing has changed. The kind of coding that goes on with LLMs, while "functional", has a strong tendency to break down and be difficult to follow, and thus to repair. A competent software engineer who has a strong grasp of programming principles can not only put together better code using snippets of publicly available code, but the result will also be much cleaner and easier to fix and build on later.

But let's take this another direction: How's AI handling law or engineering? Try asking it to put together a business proposal for you and then see how bad it is at even basic math. Ask it to draw you a diagram for a woodworking project, or an isometric drawing for machining something.

It's crap. This junk isn't even close to delivering on one tenth of the promises that the grifters who built them have made.
I've been one of the most vocal detractors of AI on here for a while. It has come a long way in a short time. I've started using Claude Code recently and I don't write much code anymore. That's actually a small part of my job anyway, but Claude is really good at that. It's what many AI platforms were first built for. As far as the other things, I don't know. I can't speak to those types of tasks, but I don't ask it to do anything like that.


The days of the pure developer are numbered - tomorrow's builder will be one part product thinker, one part systems architect, and one part AI wrangler.
BillYeoman
3 Toed Pete said:

Just imagine how much money the next democratic President can sell this to Iran for. Or China.


And coal miners learning to transition their skills to coding….
Burn-It
Sell all electronic securities, including retirement accounts, buy physical assets, and ignore or fight the IRS's claims until AI doomsday occurs.

Sounds crazy-town, but once an AI agent wipes out all financial institutional data, everyone is screwed: the world goes dark, the IRS has no power, and the only transactional currency is physical.

If you think Anthropic recently giving 10 of the largest US financial institutions and AI chip manufacturers access to its latest AI agent is an accident, you're sticking your head in the sand. Anthropic sees the danger it built and is trying to put the toothpaste back in the tube.
AKA 13-0
Traces of Texas
Speaking of massive security flaws worldwide, China has them.


China Supercomputer Hacked, 10 Petabytes of Data Stolen
“I must say as to what I have seen of Texas, it is the garden spot of the world. The best land & best prospects for health I ever saw is here, and I do believe it is a fortune to any man to come here.” —– David Crockett
Logos Stick
According to "whom".
Logos Stick
It's a national security threat.

SquirrellyDan
Stmichael said:

Deputy Travis Junior said:

This is very outdated. Reinforcement learning (post training) is a cycle that trains/teaches AI models to perform better at certain tasks.

Results have been incredible. Teaching LLMs via RL is why somebody who's never programmed can now create a mobile app in a weekend.

Reinforcement learning was part of the mix from the start. Nothing has changed. The kind of coding that goes on with LLMs, while "functional", has a strong tendency to break down and be difficult to follow, and thus to repair. A competent software engineer who has a strong grasp of programming principles can not only put together better code using snippets of publicly available code, but the result will also be much cleaner and easier to fix and build on later.

But let's take this another direction: How's AI handling law or engineering? Try asking it to put together a business proposal for you and then see how bad it is at even basic math. Ask it to draw you a diagram for a woodworking project, or an isometric drawing for machining something.

It's crap. This junk isn't even close to delivering on one tenth of the promises that the grifters who built them have made.


Not sure how much you've tried out the newer models. What you're saying is simply wrong.
Deputy Travis Junior
To anybody who's still dissing AI, I can only advise you all to start shelling out $20/month, turn on reasoning, learn how to write good prompts + build reusable/agentic processes, and then don't come up for air.

We're kicking off one of the greatest entrepreneurial periods in human history; I mean if you're creative, these things give you business superpowers. You can literally build a complex app that looks professionally designed for $100-200 of tokens, and then let another agent handle marketing for another $30-50/month. That means that if you're driven, you can try a dozen business ideas for less than a single paycheck, and you don't have to quit your current job, work 100-hour Silicon Valley weeks, or manage any employees to do it either. You just solo tinker and magic happens.
flakrat
DTP02 said:

https://www.nytimes.com/2026/04/07/opinion/anthropic-ai-claude-mythos.html?unlocked_article_code=1.ZVA.DuZ1.tSnsJb7Od3ZD&smid=nytcore-android-share

Quote:

Anthropic said it found critical exposures in every major operating system and Web browser, many of which run power grids, waterworks, airline reservation systems, retailing networks, military systems and hospitals all over the world.

If this A.I. tool were, indeed, to become widely available, it would mean the ability to hack any major infrastructure system (a hard and expensive effort that was once essentially the province only of private-sector experts and intelligence organizations) will be available to every criminal actor, terrorist organization and country, no matter how small.


According to the writer, Anthropic developed an AI that got so good, so quickly at finding security flaws that it "scared them" and they're no longer planning a wide release out of fear of the inevitability of misuse. Instead they reached out to the govt and other major tech companies to help ensure the weaknesses it found were shored up.

Scary stuff, and the province where many future battles will be fought I'm sure. Makes me want to be a Luddite.

If they truly wanted to turn this into a positive, they'd train the AI to find and suggest fixes to the security flaws. They would then release them to the public as a sign of good will.

Waiting....
ts5641
AI will end us eventually.
dude95
flakrat said:

DTP02 said:

https://www.nytimes.com/2026/04/07/opinion/anthropic-ai-claude-mythos.html?unlocked_article_code=1.ZVA.DuZ1.tSnsJb7Od3ZD&smid=nytcore-android-share

Quote:

Anthropic said it found critical exposures in every major operating system and Web browser, many of which run power grids, waterworks, airline reservation systems, retailing networks, military systems and hospitals all over the world.

If this A.I. tool were, indeed, to become widely available, it would mean the ability to hack any major infrastructure system (a hard and expensive effort that was once essentially the province only of private-sector experts and intelligence organizations) will be available to every criminal actor, terrorist organization and country, no matter how small.


According to the writer, Anthropic developed an AI that got so good, so quickly at finding security flaws that it "scared them" and they're no longer planning a wide release out of fear of the inevitability of misuse. Instead they reached out to the govt and other major tech companies to help ensure the weaknesses it found were shored up.

Scary stuff, and the province where many future battles will be fought I'm sure. Makes me want to be a Luddite.

If they truly wanted to turn this into a positive, they'd train the AI to find and suggest fixes to the security flaws. They would then release them to the public as a sign of good will.

Waiting....

It can find and fix security flaws. The current version does that right now. The question is: what about the code bases whose owners aren't using it to fix them? If Chase Bank is 99.99999% secure right now, the fear is that hackers could use the new model to find the remaining security flaws before Chase has time to use it to fix them. Now walk through this theory for every business on the internet.

but I also saw there was this
Over_ed
flakrat said:

DTP02 said:

https://www.nytimes.com/2026/04/07/opinion/anthropic-ai-claude-mythos.html?unlocked_article_code=1.ZVA.DuZ1.tSnsJb7Od3ZD&smid=nytcore-android-share

Quote:

Anthropic said it found critical exposures in every major operating system and Web browser, many of which run power grids, waterworks, airline reservation systems, retailing networks, military systems and hospitals all over the world.

If this A.I. tool were, indeed, to become widely available, it would mean the ability to hack any major infrastructure system (a hard and expensive effort that was once essentially the province only of private-sector experts and intelligence organizations) will be available to every criminal actor, terrorist organization and country, no matter how small.


According to the writer, Anthropic developed an AI that got so good, so quickly at finding security flaws that it "scared them" and they're no longer planning a wide release out of fear of the inevitability of misuse. Instead they reached out to the govt and other major tech companies to help ensure the weaknesses it found were shored up.

Scary stuff, and the province where many future battles will be fought I'm sure. Makes me want to be a Luddite.

If they truly wanted to turn this into a positive, they'd train the AI to find and suggest fixes to the security flaws. They would then release them to the public as a sign of good will.

Waiting....

You can stop waiting.

This is occurring, but with unintended consequences. So many bugs are being reported that some companies have stopped taking AI bug reports altogether. Other companies, which offered "bounties" for bugs, have stopped. Software was even crappier than most thought. :-)
Logos Stick

The bigger problem is all the companies who are not in the project. The foundational software is initially being targeted here, which makes sense. If Mythos is eventually released to the public without strong safeguards, it will create strong pressure on non-participating software companies to adopt it for their own security assessments.

The fact that several of these foundational companies are Anthropic competitors tells you all you need to know about the model's capabilities.

For this and other reasons, we are approaching the end of the optional "free era" of LLMs.
No Spin Ag
BigRobSA said:

AI, taking over the world!?



Tttttrrrrrruuuuuummmmmmmpppppp!!!!!!1

Dude, how could you forget, "Thanks, Obummer!"
There are in fact two things, science and opinion; the former begets knowledge, the latter ignorance. Hippocrates
ErnestEndeavor
Independent researchers have now shown that the same flaws published by Anthropic were discoverable using tiny open source models already available to the public, and at a tiny fraction of the cost of the Mythos model.

Anthropic is brilliant at marketing their products. It is possible they have some super secret capability that only the top 40 tech and software companies can possibly be trusted to use, but most likely this is a marketing push to get Anthropic embedded into those companies as part of an ecosystem.

I love a lot of the use cases for AI and there's so much that's helpful for people. I just don't trust a damn word the big companies say about anything anymore. We have been lied to so many times about capabilities and it's just getting old. Their products can be so useful for what they are and I wish they would just market what they can actually do. I suppose the problem with that is what they can actually do isn't profitable so they always have to market based on future potential.

In my opinion this is also a play to government interests and the general public to push a too big to fail narrative so when they inevitably financially fall apart there will be support for massive bailouts.
AustinAg2K
Over_ed said:

flakrat said:

DTP02 said:

https://www.nytimes.com/2026/04/07/opinion/anthropic-ai-claude-mythos.html?unlocked_article_code=1.ZVA.DuZ1.tSnsJb7Od3ZD&smid=nytcore-android-share

Quote:

Anthropic said it found critical exposures in every major operating system and Web browser, many of which run power grids, waterworks, airline reservation systems, retailing networks, military systems and hospitals all over the world.

If this A.I. tool were, indeed, to become widely available, it would mean the ability to hack any major infrastructure system (a hard and expensive effort that was once essentially the province only of private-sector experts and intelligence organizations) will be available to every criminal actor, terrorist organization and country, no matter how small.


According to the writer, Anthropic developed an AI that got so good, so quickly at finding security flaws that it "scared them" and they're no longer planning a wide release out of fear of the inevitability of misuse. Instead they reached out to the govt and other major tech companies to help ensure the weaknesses it found were shored up.

Scary stuff, and the province where many future battles will be fought I'm sure. Makes me want to be a Luddite.

If they truly wanted to turn this into a positive, they'd train the AI to find and suggest fixes to the security flaws. They would then release them to the public as a sign of good will.

Waiting....

You can stop waiting.

This is occurring, but with unintended consequences. So many bugs are being reported that some companies have stopped taking AI bug reports altogether. Other companies, which offered "bounties" for bugs, have stopped. Software was even crappier than most thought. :-)

Open source projects have stopped taking AI bug reports because millions of PRs are being created and 99% of them are total garbage. It's not because AI is finding so much stuff wrong. It's because 14-year-old kids are using it to try and up their street cred by saying, "Look at how many PRs I just created!!! I'm a badass!"

Also, any decent developer already knows most software is crap, and since AI is trained off of that software...
AustinAg2K
ErnestEndeavor said:

Their products can be so useful for what they are and I wish they would just market what they can actually do. I suppose the problem with that is what they can actually do isn't profitable so they always have to market based on future potential.


This is what I think is going on. AI can do a lot of cool stuff, but these companies aren't close to profitable. They lose money on every request, even from paying customers. If they charged the actual cost right now, no one would use it. Maybe costs will eventually fall enough that they can turn a profit, but right now the only way they can keep going is to get more investment. These press releases are to drive more investment.
solishu
ErnestEndeavor said:

Independent researchers have now shown that the same flaws published by Anthropic were discoverable using tiny open source models already available to the public, and at a tiny fraction of the cost of the Mythos model.

Anthropic is brilliant at marketing their products. It is possible they have some super secret capability that only the top 40 tech and software companies can possibly be trusted to use, but most likely this is a marketing push to get Anthropic embedded into those companies as part of an ecosystem.

I love a lot of the use cases for AI and there's so much that's helpful for people. I just don't trust a damn word the big companies say about anything anymore. We have been lied to so many times about capabilities and it's just getting old. Their products can be so useful for what they are and I wish they would just market what they can actually do. I suppose the problem with that is what they can actually do isn't profitable so they always have to market based on future potential.

In my opinion this is also a play to government interests and the general public to push a too big to fail narrative so when they inevitably financially fall apart there will be support for massive bailouts.

Generally I'd agree with this, but it seems incongruent with the fact that Anthropic, which is on the outs with the federal government and is suing them, would be the one being hyped by Bessent as the "doomsday machine".
Stmichael
SquirrellyDan said:

Stmichael said:

Deputy Travis Junior said:

This is very outdated. Reinforcement learning (post training) is a cycle that trains/teaches AI models to perform better at certain tasks.

Results have been incredible. Teaching LLMs via RL is why somebody who's never programmed can now create a mobile app in a weekend.

Reinforcement learning was part of the mix from the start. Nothing has changed. The kind of coding that goes on with LLM's, while "functional", has a strong tendency to break down and be difficult to follow and thus to repair. A competent software engineer who has a strong grasp of programming principles can not only put together better code using snippets of publicly available code, but it will also be much cleaner and easier to fix and build on later.

But let's take this in another direction: how is AI handling law or engineering? Try asking it to put together a business proposal for you and then see how bad it is at even basic math. Ask it to draw you a diagram for a woodworking project, or an isometric drawing for machining something.

It's crap. This junk isn't even close to delivering on one tenth of the promises that the grifters who built them have made.


Not sure how much you've tried out the newer models. What you're saying is simply wrong.



This picture is less than a year old, generated from a prompt asking for a simple children's poster of the ABCs, with a picture of something that begins with each letter. A prompt that a 5-year-old could handle, and the supposedly world-altering technology fails it with absolute confidence.

That's because such a prompt requires understanding notions like what the alphabet is and that the letters go in order. The model does nothing of the sort. It breaks your prompt down into relational connections and tries to fit the output to those parameters. It doesn't start with reasoning like "I know what the alphabet is, so I can begin by filling in all those letters first."

How about math next?

For randomly generated arithmetic that the model has not specifically been trained on, LLMs have success rates in the single digits at best. Even math-specific AI fails around 20% of the time on simple calculations that a four-function child's calculator would get right 100% of the time.

This means that for anything that requires any sort of problem solving or independent thought, AI is just guessing. Putting any kind of confidence in that is foolish at best, and downright dangerous more often than not.
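For anyone who wants to check this claim rather than take either side's word for it, the test is easy to run yourself: generate random arithmetic problems, ask the model for each answer, and score against exact integer arithmetic. A minimal sketch in Python, where `ask_model` is a hypothetical stand-in for whatever chat API you use (the `exact_stub` below exists only to verify that the harness itself works):

```python
import random
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_problem(digits=6):
    """Generate a random arithmetic problem the model is unlikely to have memorized."""
    a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
    b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
    op = random.choice(list(OPS))
    return f"{a} {op} {b}", OPS[op](a, b)

def score(ask_model, n=100, digits=6):
    """Fraction of randomly generated problems the model answers exactly."""
    correct = 0
    for _ in range(n):
        prompt, expected = random_problem(digits)
        answer = ask_model(f"Compute {prompt}. Reply with only the integer.")
        try:
            correct += int(answer.strip()) == expected
        except ValueError:
            pass  # a non-numeric reply counts as wrong
    return correct / n

# Stub standing in for a real chat-completion call. A "model" that
# computes the answer exactly should score 1.0, which sanity-checks
# the harness before you point it at an actual API.
def exact_stub(prompt):
    expr = prompt.removeprefix("Compute ").split(".")[0]
    return str(eval(expr))  # fine for a throwaway test harness
```

Swap `exact_stub` for a real API call and vary `digits` upward; the interesting part is how quickly accuracy decays as the operands get longer.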
Stmichael
How long do you want to ignore this user?
AG


And if anyone's keen for a laugh, this video of ChatGPT trying to play chess was a riot. It's like watching a really smart 3 year old who only kinda knows the rules playing against a grandmaster who is just happy the kid is having fun.
BenFiasco14
How long do you want to ignore this user?
AG
It will uncover the MJ file
CNN is an enemy of the state and should be treated as such.
Logos Stick
How long do you want to ignore this user?
For the others on this board....

That image about integer arithmetic comes from a 2023 research paper titled "GPT Can Solve Mathematical Problems Without a Calculator".

Might as well publish an example from 1990.

Grok 4, released in July of last year, scored 100% on the 2025 AIME. The AIME is a notoriously difficult math competition exam used to qualify students for the USA Mathematical Olympiad. Grok 4 aced it!

Your claim is just cherry-picking an old, narrow benchmark to make a broad negative point about AI capabilities. It was legitimate criticism in 2023-2024, but it doesn't hold water with frontier 2025-2026 models.
KingofHazor
How long do you want to ignore this user?
Logos Stick said:

For the others on this board....

That image about integer arithmetic comes from a 2023 research paper titled "GPT Can Solve Mathematical Problems Without a Calculator".

Might as well publish an example from 1990.

Grok 4, released in July of last year, scored 100% on the 2025 AIME. The AIME is a notoriously difficult math competition exam used to qualify students for the USA Mathematical Olympiad. Grok 4 aced it!

Your claim is just cherry picking an old, narrow benchmark to make a broad negative point about AI capabilities. It was legit criticism in 2023-2024, but it doesn't hold water with frontier 2025-2026 models.

You guys can claim AI accuracy all you want, but my frequent use of Gemini, Claude, Grok, ChatGPT, and Elicit shows that they remain replete with all kinds of errors and are absolutely untrustworthy.

The anecdotal stories of AIs failing some test and then passing it a year later with flying colors sound like the AIs are being revised specifically to pass the tests they failed, without fixing the underlying problems that cause them to fail many different types of tests. It's reminiscent of stock traders tweaking their models until they fit 100% of the historical data, only to have those models fail on real-time trades.
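That backtesting analogy is the textbook picture of overfitting, and you can demonstrate it without any market data: fit a curve that passes exactly through every "historical" point, then evaluate it one step into the future. A small illustrative sketch in Python using Lagrange interpolation (all the numbers here are made up; the "trend" and noise levels are arbitrary choices for the demo):

```python
import random

def lagrange_fit(points):
    """Return a function interpolating exactly through every (x, y) in points."""
    def f(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return f

random.seed(0)
# "Historical data": a gentle linear trend plus noise.
train = [(x, 0.5 * x + random.gauss(0, 0.3)) for x in range(10)]
model = lagrange_fit(train)

# Exact on every point it was tuned to ...
train_err = max(abs(model(x) - y) for x, y in train)
# ... but check it against the underlying trend one step ahead.
future_err = abs(model(10) - 0.5 * 10)
```

The interpolant reproduces every training point exactly, yet a degree-9 polynomial extrapolated past its data typically swings far away from the simple trend that generated it: perfect hindsight, useless foresight, which is the same failure mode as a trading model tuned to its backtest.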
 