The launch of OpenAI’s long-anticipated new model, GPT-5, is off to a rocky start, to say the least.
Even forgiving the errors in charts and voice demos during yesterday’s livestreamed presentation of the new model (actually four separate models, and a ‘Thinking’ mode that can be engaged for three of them), a number of user reports have emerged since GPT-5’s release showing it erring badly when solving relatively simple problems that preceding OpenAI models, and rivals from competing AI labs, answer correctly.
For example, data scientist Colin Fraser posted screenshots showing GPT-5 getting a math proof wrong (whether 8.888 repeating is equal to 9; it is, of course, not).
It also failed on a simple algebra problem that elementary schoolers could probably nail: 5.9 = x + 5.11.
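Both problems can be checked with exact arithmetic. The snippet below is a minimal sketch using Python's standard `decimal` and `fractions` modules; the helper names `solve_for_x` and `repeating_eights` are illustrative and do not come from the screenshots or from any OpenAI tooling.

```python
from decimal import Decimal
from fractions import Fraction


def solve_for_x(total: str, addend: str) -> Decimal:
    """Solve total = x + addend exactly, avoiding binary floating-point error."""
    return Decimal(total) - Decimal(addend)


def repeating_eights() -> Fraction:
    """Return 8.888... (8 repeating) as an exact fraction: 8 + 8/9 = 80/9."""
    return Fraction(8) + Fraction(8, 9)


# 5.9 = x + 5.11  ->  x = 5.9 - 5.11
x = solve_for_x("5.9", "5.11")

# Distance between 9 and 8.888...; a nonzero gap means the two are not equal.
gap = Fraction(9) - repeating_eights()

print(x)    # 0.79
print(gap)  # 1/9
```

The algebra answer is 0.79, and 8.888 repeating falls short of 9 by exactly 1/9, so the two numbers are not equal.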
Using GPT-5 to judge OpenAI’s own erroneous presentation charts also did not yield helpful or correct responses.
It also failed on this trickier math word problem below (which, to be fair, stumped this human at first…though Elon Musk’s Grok 4 AI answered it correctly. For a hint, think of the fact that flagstones in this case can’t be divided into smaller portions. They must remain intact as 80 separate units, so no halves or quarters).
The older 4o model performed better for me on at least one of these math problems. Unfortunately, OpenAI is slowly deprecating those older models, including the former default GPT-4o and the powerful reasoning model o3, for users of ChatGPT, though they’ll continue to be available in the application programming interface (API) for developers for the foreseeable future.
Not as good at coding as benchmarks indicate
Even though OpenAI’s internal benchmarks and some third-party external ones have shown GPT-5 to outperform all other models at coding, in real-world usage Anthropic’s recently updated Claude Opus 4.1 seems to do a better job at “one-shotting” certain tasks, that is, completing the user’s desired application or software build to their specifications on the first attempt. See an example below from developer Justin Sun, posted to X:
Opus 4.1’s one-shot attempt at “create a 3d capybara petting zoo” – 8 minutes total
This was honestly pretty insane, not only are the capybaras way cuter and moving, there are individual pet affinity levels, a day/night switcher, feeding, and even a screenshot feature pic.twitter.com/FiKTO3FKK4
— justin (@justinsunyt) August 7, 2025
In addition, a report from security firm SPLX found that OpenAI’s internal safety layer left major gaps in areas like business alignment and vulnerability to prompt injection and obfuscated logic attacks.
While anecdotal, checking the temperature on how the model is faring with early AI adopters seems to indicate a chilly reception.
AI influencer and former Googler Bilawal Sidhu posted a poll on X asking for a “vibe check” from his followers and the wider userbase, and so far, with 172 votes in, the overwhelming response is “Kinda mid.”
Alright, GPT-5 vibe check
— Bilawal Sidhu (@bilawalsidhu) August 7, 2025
The overwhelming consensus on GPT-5 from both X and the Reddit AMA are overwhelmingly negative
Most users are disgruntled about the broken model picker and non-pro users not having access to legacy models
What are your initial thoughts on GPT-5?
— AI Leaks and News (@AILeaksAndNews) August 8, 2025
A bit sad how the GPT-5 launch is going so far, especially after the long wait and high expectations
– The automatic switching between models (the router) seems partly broken/unreliable
– It’s unclear exactly which model you’re actually interacting with (standard or mini,…
— Tibor Blaho (@btibor91) August 8, 2025
Competition waiting in the wings
Thus, the sentiment toward ChatGPT-5 is far from universally positive, highlighting a major problem for OpenAI as it faces increasing competition from major U.S. rivals like Google and Anthropic, and a growing list of free, open source and powerful Chinese LLMs offering features that many U.S. models lack.
Take the Alibaba Qwen Team of AI researchers, who just today updated their highly performant Qwen 3 model to have a 1 million token context, giving users the ability to exchange nearly 4x as much information with the model in a single back-and-forth interaction as GPT-5 offers.
Given that OpenAI’s other big release this week, the new open source gpt-oss models, also received a mixed reception from early users, things are not looking up right now for the number one dedicated AI company by users (700 million weekly active users of ChatGPT as of this month).
Indeed, this is also exemplified by users of the betting market Polymarket overwhelmingly deciding, following the release of GPT-5, that Google would likely have the best AI model by the end of this month, August 2025.
Other power users like Otherside AI co-founder and CEO Matt Shumer, who received early access to GPT-5 and blogged about it favorably in a review, opined that views would shift as more people figured out the best ways to use the new model and adjusted their integration approaches:
A lot of folks who are having a bad experience are using GPT-5 in agent harnesses that aren’t yet optimized for it.
For every new model release, there’s a time lag between release + when companies that integrate the model have it actually working well.
Agent firms rush to…
— Matt Shumer (@mattshumer_) August 8, 2025
While it’s still early days for GPT-5, and the sentiment could change dramatically as more users get their hands on it and try it for different tasks, the early indications do not suggest this is a “home run” release for OpenAI in the same way that prior releases such as GPT-4, or even the more recent 4o and o3, were. And that’s a concerning indicator for a company that just raised yet another funding round, but remains unprofitable due to its high costs of research and development.